Review of correlational studies in linguistics

Articles from the first edition of the Annual Review of Linguistics are appearing online this week.  Bob Ladd, Dan Dediu and I wrote a review of correlations in linguistics.

We review a number of recent studies that have identified either correlations between different linguistic features (e.g., implicational universals) or correlations between linguistic features and nonlinguistic properties of speakers or their environment (e.g., effects of geography on vocabulary). We compare large-scale quantitative studies with more traditional theoretical and historical linguistic research and identify divergent assumptions and methods that have led linguists to be skeptical of correlational work. We also attempt to demystify statistical techniques and point out the importance of informed critiques of the validity of statistical approaches. Finally, we describe various methods used in recent correlational studies to deal with the fact that, because of contact and historical relatedness, individual languages in a sample rarely represent independent data points, and we show how these methods may allow us to explore linguistic prehistory to a greater time depth than is possible with orthodox comparative reconstruction.  Whether researchers are for or against these new techniques, understanding them is becoming increasingly necessary to interface with discussions in the field.
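To make the non-independence problem concrete, here is a minimal sketch. It assumes a toy sample in which each language carries a family label; the data and the function names (`naive_proportion`, `one_per_family`) are hypothetical and do not come from the paper. One crude but transparent control is to sample a single language per family before counting. The review describes more sophisticated methods.

```python
import random
from collections import defaultdict

# Hypothetical toy sample: (language, family, has_feature)
SAMPLE = [
    ("A1", "FamA", True), ("A2", "FamA", True), ("A3", "FamA", True),
    ("A4", "FamA", True), ("A5", "FamA", True),
    ("B1", "FamB", False), ("C1", "FamC", False), ("D1", "FamD", True),
]

def naive_proportion(sample):
    """Share of languages with the feature, treating every language as independent."""
    return sum(has for _, _, has in sample) / len(sample)

def one_per_family(sample, rng):
    """Keep one randomly chosen language per family (a crude control for relatedness)."""
    by_family = defaultdict(list)
    for lang, fam, has in sample:
        by_family[fam].append((lang, fam, has))
    return [rng.choice(members) for members in by_family.values()]

def family_controlled_proportion(sample, n_draws=1000, seed=0):
    """Average the proportion over many one-language-per-family subsamples."""
    rng = random.Random(seed)
    draws = [naive_proportion(one_per_family(sample, rng)) for _ in range(n_draws)]
    return sum(draws) / n_draws

print(naive_proportion(SAMPLE))              # 0.75: inflated by the large family A
print(family_controlled_proportion(SAMPLE))  # 0.5: one vote per family
```

The single large family pushes the naive estimate up; giving each family one vote removes that inflation, at the cost of discarding most of the data.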

One of the most fun parts of putting the paper together was drawing this diagram (below) of all the links that we discuss.  It turns out that there are a lot of complicated links between linguistic and social variables!  I’m currently working on methods to disentangle this web.

[Figure: diagram of the links between linguistic and social variables discussed in the review]

We also include three appendices as supplementary materials.  First, a list of electronic databases relevant for cross-cultural statistical comparisons.  Secondly, a very brief introduction to statistical hypothesis testing, which could be useful for linguists who are not familiar with statistical approaches.  Thirdly, a discussion of robustness and validity in statistical approaches to linguistics.
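As a taste of the kind of hypothesis test that the second appendix introduces, here is a minimal permutation-test sketch. It assumes two groups of languages coded for one binary feature; the data are hypothetical and the code is not taken from the paper. The test asks how often a random relabelling of the languages produces a difference in feature frequency at least as large as the observed one. Note that it treats languages as independent data points, which is exactly the assumption the review warns about.

```python
import random

def diff_in_proportions(values, labels):
    """Proportion with the feature in group 1 minus the proportion in group 0."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)

def permutation_p_value(values, labels, n_perm=10000, seed=0):
    """Two-sided p-value: share of label shuffles with a difference at least as extreme."""
    rng = random.Random(seed)
    observed = abs(diff_in_proportions(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(diff_in_proportions(values, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 1 = language has the feature; labels mark two areas
values = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(diff_in_proportions(values, labels))  # 0.75
print(permutation_p_value(values, labels))  # roughly 0.01
```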

Other reviews also look interesting, for example, Johansson on Language abilities of Neandertals, Fisher and Vernes on genetics and linguistics, de Vos on village sign languages and Kroll et al. on bilingualism.

Ladd, D. R., Roberts, S. G., and Dediu, D. (2015). Correlational studies in typological and historical linguistics. Annual Review of Linguistics, 1(1).

SpecGram Essential Guide to Linguistics: electronic version

The Speculative Grammarian Essential Guide to Linguistics is now available in electronic format.  In the highest tradition of satire, this book gives a unique insight into the world of linguistics.  It is crucial reading for any linguist who is trying to maintain a sense of perspective (or for those seeking the comfort of realising their own perspective is relatively grounded).

There’s also a special discount for the readers of Replicated Typo! Follow this link for 16.8% off.

From the blurb:

The book is written for linguists, by linguists. It’s about Linguistics and Language, but it’s not a textbook. Rather, it takes a sidelong look at all that is humorous about the field. Containing over 150 articles, poems, cartoons, humorous ads and book announcements—plus a generous sprinkling of quotes, proverbs and other witticisms—the book discovers things to laugh about in most major subfields of Linguistics.

What people have been saying:

“Don’t wait for Jon Stewart or Louis C.K. to do something with linguistics: it ain’t gonna happen. Just get this book and give a copy to everyone who needs a laugh.”

—Stephen Dodson, Languagehat

“[The Speculative Grammarian Essential Guide to Linguistics] will be a symbolic expression of your inner linguistic nerd.”

—Phaedra Royle, on Linguist List

“Complete with a choose-your-own-career-in-linguistics adventure game (German-sign-language-shaped dice not included), this is the ultimate gift for the budding language student, the jaded academic or the holistic forensic linguist.”

—Sean Roberts, A Replicated Typo

And just in time for Christmas.

PhD Opportunities: The Wellsprings of Linguistic Diversity

PhD positions are available at ANU, working with a team of people investigating diversity and cultural evolution.  The call is below:

Applications are now being sought for three PhD positions on the project ‘The Wellsprings of Linguistic Diversity’, funded by the Australian Research Council for the period mid-2014 to mid-2019.

Each PhD position will undertake substantial fieldwork on variation in a particular speech community: Western Arnhem Land (Bininj Gun-wok and neighbouring areas), Vanuatu (Sa and adjoining languages, South Pentecost Island), and Samoa (Samoan). Support will include a four-year stipend ($29,844 p/a), generous fieldwork funding, and embedding of the doctoral research in the dynamic team setting of the project, as well as the newly established ARC Centre of Excellence for the Dynamics of Language.  Positions will start in early February 2015.

The project is led by Prof. Nick Evans, with a project team including postdocs Dr Murray Garde, Dr Ruth Singer and Dr Dineke Schokkin and doctoral scholar Eri Kashima (fieldworkers), postdoc Dr Mark Ellison (computational modelling), and consultants Profs. Miriam Meyerhoff and Catherine Travis (variationist sociolinguistics) and Emeritus Prof. Andy Pawley (Samoan).

The project’s goal is to understand why linguistic diversity evolves differently in different parts of the world, through a combination of detailed sociolinguistic case studies of small-scale speech communities in their anthropological setting and computational modelling of how micro-variation engenders macro-variation over iterations of transmission. The three high-diversity field sites are western Arnhem Land (Bininj Gun-wok and neighbouring languages), the Morehead district of Southern New Guinea (Nen, Nambu, Idi), and South Pentecost Island, Vanuatu (Sa and neighbouring languages).  Samoa (Samoan) supplies a low-diversity comparator to Vanuatu, and controls from small speech communities in global languages (English and Spanish) will be obtained by other investigators on the project.

A fuller description of the project can be downloaded from http://chl.anu.edu.au/school/laureate.php

General information about the doctoral program in School of Culture, History and Language at the ANU College of Asia and the Pacific can be found at http://chl.anu.edu.au/school/students_phd.php

Specific enquiries should be directed to Nick Evans (nicholas.evans@anu.edu.au) and completed application dossiers sent to geoff.sjollema@anu.edu.au. Completed applications should include the following information:
(a)    CV with educational qualifications, any publications and other relevant experience (e.g. fieldwork, relevant internships)
(b)    a two-page statement setting out your preferred field site or sites, what skills and personal attributes you will bring to the project, and what you see as the most interesting and challenging issues you will need to solve
(c)    if available, other materials supporting your case (e.g. relevant articles or other materials)

Deadline:  Aug 3rd 2014, midnight, AEST

Once awards are made, successful applicants will be notified and then guided through making a formal application for enrolment status through the regular ANU system.

How to speak Neanderthal

This week there’s an article in New Scientist by Dan Dediu, Scott Moisik and me about exploring Neandertal language.  It discusses the idea that if Neandertals spoke modern languages, and if there was cultural contact between us and them, then ancient human languages may have been affected by Neandertal language (borrowing, contact effects etc.).  If this happened, then we may be able to detect these effects in today’s languages. The article and a recent blog post explain the idea, but I’ll cover some of the more technical stuff here.

Obviously, this is a very controversial idea:  the time scale is much longer than the usual linguistic reconstruction and we have no direct evidence for Neandertals speaking complex languages.  We’re definitely in for some flak.  So, this post briefly covers what we actually did.

Our EvoLang paper (and a full paper in prep) asks whether one necessary condition for coming anywhere near providing evidence for this idea is true:  are there differences between current languages that were in contact (outside of Africa) and languages that were not in contact (inside Africa)?  This has been addressed before, for different reasons (Cysouw & Comrie, 2009), but with a smaller sample of data.

Can we detect traces of contact with Neandertals in present day languages? One condition for this is there being statistical differences between contact and non-contact languages.

Using data from WALS, we ran a few tests:

  1. STRUCTURE analysis:  what is the most likely number of ‘founder’ populations that gives rise to the current diversity we see in African and Eurasian languages?  Do the estimated founder populations align with African and non-African languages?
  2. K-means clustering:  does a ‘natural’ statistical division between the world’s languages reflect a division between African and non-African languages? (is it better than chance and better than other continents?  Also run on phonetic data from PHOIBLE and lexical data from the ASJP)
  3. Weighted multidimensional scaling: If we compress WALS to a few dimensions, does the first dimension reflect a distinction between African and non-African languages?
  4. Phylogenetic reconstruction:  We reconstruct the cultural evolution of present-day language families to see if African and non-African languages have different cultural evolutionary biases (e.g. more likely to move towards or away from particular traits).  We used 3 phylogenies (WALS, Ethnologue, Glottolog), 3 branch length scaling assumptions (Grafen’s method, NNLS and UPGMA) and 3 methods of ancestral state reconstruction (Maximum parsimony, Maximum likelihood (BayesTraits) and Maximum likelihood (APE)).  We searched for features that have opposing biases in African and non-African languages that are bigger than 95% of all comparisons and are robust across all assumptions.
  5. Support Vector Machine learning:  We trained a Support Vector Machine (a supervised machine learning algorithm) to tell the difference between African and non-African languages.  We assessed its performance on unseen data, and also extracted the most decisive linguistic features for making the distinction.  We also estimated the number of features needed to get good results.
  6. Binary classification trees: This algorithm finds linguistic features to divide the data into sub-sets in a way that maximises the ease of differentiating African and non-African languages.
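The idea behind test 2 can be illustrated with a minimal sketch. It assumes languages are coded as complete 0/1 vectors over a few binary typological features (real WALS data is sparse, one of the problems discussed below). The code runs a plain two-cluster k-means (Lloyd's algorithm with a deterministic farthest-first start) and then measures how well the clusters line up with an area labelling. The data, feature coding and function names are hypothetical; this is not the authors' pipeline.

```python
from collections import Counter

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    assignment = None
    for _ in range(n_iter):
        new_assignment = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_assignment == assignment:
            break  # converged: no language changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean(members)
    return assignment

def purity(assignment, labels):
    """Share of languages whose label matches the majority label of their cluster."""
    total = 0
    for cluster in set(assignment):
        cluster_labels = [lab for a, lab in zip(assignment, labels) if a == cluster]
        total += Counter(cluster_labels).most_common(1)[0][1]
    return total / len(labels)

# Hypothetical binary feature vectors, one row per language
languages = [
    [1, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1],
    [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 1],
]
areas = ["Africa"] * 3 + ["Eurasia"] * 3
clusters = kmeans(languages, k=2)
print(clusters, purity(clusters, areas))  # [0, 0, 0, 1, 1, 1] 1.0
```

A high purity on its own says little; as the post describes, the agreement also has to beat chance and beat the same analysis run on other continent pairs.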
Results of a multidimensional scaling analysis of WALS, with African and non-African languages grouped by bag plots. The results differentiate African and non-African languages better than chance (p < 0.001) and better than other continent pairs (p = 0.004), but NOT better than 95% of linguistic variables (p = 0.06).
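The three p-values in the caption each compare one observed separation score against a different reference set: random labellings, other continent pairs, and other linguistic variables. A common way to turn such a comparison into a number is an empirical p-value. The sketch below assumes a separation score has already been computed for each comparison; the scores and the function name are made up for illustration and are not the study's values.

```python
def empirical_p_value(observed, reference):
    """Share of reference scores at least as large as the observed one (with +1 smoothing)."""
    at_least = sum(1 for r in reference if r >= observed)
    return (at_least + 1) / (len(reference) + 1)

observed_score = 0.62  # hypothetical Africa vs non-Africa separation
other_pairs = [0.31, 0.45, 0.50, 0.38, 0.64, 0.29, 0.41, 0.35, 0.47]  # hypothetical
print(empirical_p_value(observed_score, other_pairs))  # 0.2
```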

The detailed results will appear in our paper, but here’s what we conclude:

  • Some of the tests result in positive answers.  For example, the support vector machine analysis could differentiate between African and non-African typologies with 93% accuracy.  However, the algorithm needs at least 50 linguistic variables to make this distinction, so it’s unclear whether it’s picking up on actual differences, or just gaps in the data.
  • While some tests passed, our criterion was that ALL of the tests should pass for us to be at all confident of a statistical difference between African and non-African languages.  Some tests fail, so we can’t support this.
  • However, most of the problems we ran into were due to a lack of data.  We could get better estimates if we had more typological data of better quality from existing languages.  Another problem was implicational universals  – particular typological variables are correlated because they affect each other (e.g. verb-object order and prepositions/postpositions), causing patterns in the world’s languages that are confounded with geographic areas.
  • There’s a bigger question of whether, in theory, we can tell the difference between drift, contact effects, areal effects and language death.  Contact with Neandertals may just be too far into the past, with too many human languages dying in the meantime, to make this distinction possible.
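The 93% figure is an accuracy on unseen data. The evaluation protocol can be sketched with leave-one-out cross-validation. To keep the example dependency-free, a nearest-centroid classifier stands in for the support vector machine the authors actually used, and the data are hypothetical. The point is the protocol (never scoring a language the model was trained on), not the classifier.

```python
def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([row for row, y in zip(train_x, train_y) if y == label])
        d = sum((a - b) ** 2 for a, b in zip(x, c))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def leave_one_out_accuracy(xs, ys):
    """Hold out each language in turn, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Hypothetical binary feature vectors and area labels
xs = [[1, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
ys = ["Africa"] * 3 + ["Eurasia"] * 3
print(leave_one_out_accuracy(xs, ys))  # 1.0
```

With real, patchy typological data, high held-out accuracy can still come from which features happen to be missing in which area, which is the worry raised above.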

So, our conclusion is that any attempt to reconstruct Neandertal languages will fail with the current data and theory we have.  Not surprising, really.  The interesting thing, for me, is that we actually have methods that can give us quantitative answers about this idea, and the answer might change as we document more languages and develop theories about historical change and contact.  As Chris Knight described our EvoLang presentation, this is one of my “most exciting and least conclusive” studies.

The Past, Present and Future of Language Evolution Research

During this year’s EvoLang conference, a book was launched with perspectives on the last conference. The past, present and future of language evolution research (McCrohon, Thompson, Verhoef & Yamauchi, 2014) is a volume of student responses to EvoLang9 in Kyoto. It includes basic reviews and criticism, synthesis of current approaches, experiments and sociological perspectives.

It makes for interesting reading. What comes across in all the papers is a drive for collaboration and integration of fields and ideas, as the diagram from the contribution by Barceló-Coblijn and Martin shows. These are serious attempts to understand what has been learned so far and find new perspectives that incorporate empirical evidence. Many papers see neuroscientific evidence as a key to expanding many areas of research.

[Figure: diagram from the contribution by Barceló-Coblijn and Martin]

Continue reading “The Past, Present and Future of Language Evolution Research”

Empirical Advances in Language Evolution

This is a guest post by Jeremy Collins

Hauser, Yang, Berwick, Tattersall, Ryan, Watumull, Chomsky and Lewontin have recently published an article entitled ‘The Mystery of Language Evolution’ (see also Sean’s post), in which they argue that theories of language evolution today are ‘accompanied by a poverty of evidence’ and that ‘the most fundamental questions about the origins and evolution of our linguistic capacity remain as mysterious as ever’.  Rather than criticise their article, I thought I would summarise what I think some of the empirical advances have been, in defence of the field.  A few well-known lines of research seem to have fleshed out some details of how language evolved, even if they are still in their infancy.

1. Vocal learning in other species. 

Culturally transmitted song has evolved multiple times in various bird species, dolphins and bats.  Although Hauser et al. dismiss bird song as irrelevant because it is ‘finite’ and lacks compositional meaning (p.6), these species shed light on why culturally transmitted vocalisation evolved in humans.  For instance, these species typically live in groups of unrelated individuals who co-operate in foraging.  The complexity of their learnt song may have evolved in the context of recognising and being altruistic towards kin (Sharp et al. 2005) (or, by extension, towards any unrelated members who exploit this altruism by managing to acquire the song of the group).  In a similar way, much of the complexity and cultural variability of human language may have developed in the context of in-group identification, such as our ability to detect subtle variations in accent (Fitch 2004).  While sexual selection is an important reason for the evolution of vocal learning in some of these species, it is unlikely to be the main driving force in humans given the lack of sexual dimorphism in language use, in contrast with songbirds (Fitch 2004), although the role of language in human pair bonding is similar to the role of vocal learning in pair bonding in monogamous parrot species (Pepperberg 1999).  Pepperberg (1999) showed that African Grey Parrots can learn to use spoken words, correctly answer questions involving abstract semantic categories, and show some understanding of syntax, indicating that bird vocal learning is not necessarily as qualitatively different from human language acquisition as Hauser et al. suggest.

2. The genetics of language. 

The precise relationships between genes and language are unknown, as the authors say; but specific language disorders at least show that syntax and fluency of speech are heritable, which is an advance in its own right.  Vocabulary size and vocabulary acquisition patterns (e.g. rate of learning words at different ages in infancy) have also been shown to be heritable (Stromswold 2001). Although the genes involved are not genes ‘for’ these specific linguistic traits, they are likely to have been selected for partly in the context of language use, given the vast difference in syntactic complexity and vocabulary size between human languages and the languages that non-human primates, such as Kanzi or Nim Chimpsky, can acquire.
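Twin studies of the kind reviewed by Stromswold (2001) estimate heritability by comparing how similar identical (MZ) and fraternal (DZ) twins are on a trait. One classic back-of-the-envelope estimate is Falconer's formula, which assumes purely additive genetic effects and equally shared environments for both twin types; the review may rely on more elaborate models, and the correlations in the worked example are illustrative, not figures from it.

$$h^2 \approx 2\,(r_{MZ} - r_{DZ})$$

For example, with hypothetical twin correlations $r_{MZ} = 0.8$ and $r_{DZ} = 0.5$ on a vocabulary measure, $h^2 \approx 2 \times (0.8 - 0.5) = 0.6$.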

3. The neurobiology of language and tool use. 

The neural circuitry for language is likely to have been co-opted in part from the transmission and use of tools; they both involve complex motor actions and have been suggested to use similar areas of the brain such as Broca’s area, which is activated in experiments involving complex tool manufacture (Higuchi et al. 2009), and whose lateralisation often differs in left-handed individuals (Knecht et al. 2000).  The prevalence of gesture alongside spoken language, the fact that we can acquire complex sign languages, and the range of innate gestures in gorillas and chimpanzees (contrasted with their very limited vocal learning) suggest that gesture may have been a platform for the evolution of language, and manual dexterity for the evolution of recursive syntax in particular (Arbib 2012).  If the authors want an evolutionary origin for ‘discrete infinity’, this is one candidate.

4. The study of sound symbolism. 

Three lines of evidence suggest that sound-symbolism helped spoken language evolve: robust sound-meaning pairings tested across 6000 languages, controlling for language family and region (such as proximal demonstratives and words for ‘small’ using a front vowel) (Blasi et al. 2014); rich systems of ideophones, namely words similar to onomatopoeia but which go beyond sound in being able to depict appearance, texture, motion, tastes, and emotions, in language families in Africa, Southeast Asia and the Americas (Dingemanse 2012); and innate associations of sounds and shapes independent of language, as suggested by ideophones, and the bouba/kiki and similar tests (Ramachandran 2013).

5. The study of the diversity of grammar. 

As an example, grammatical categories regularly develop from simpler, lexical categories, in ways that recur across many language families: e.g. pre-/post-positions develop from abstract nouns and verbs, adjectives develop from forms of nouns and verbs, tense and aspect markers develop from adverbs or nominalizers (e.g. the development of English ‘-ing’ from a nominal affix to a gerund marker to a participle marker), and so on (Heine and Kuteva 2007).  Cross-linguistic work can therefore shed light on what the first languages may have been like, such as having more weakly differentiated grammatical categories (e.g. collapsing adjectives or adpositions with nouns and verbs). Studies on patterns of basic word order suggest that subject-object-verb order is likely to have been used in early languages, given its dominance in spoken languages today when controlling for geography and language family (Gell-Mann and Ruhlen 2011, Dryer 1992), and the way that people spontaneously converge on that word order when gesturing (Goldin-Meadow et al. 2008).  Languages spoken by small populations tend to develop case-marking and other complex morphology (Lupyan and Dale 2010), suggesting that this may also have been a feature of early languages.  Increasingly detailed surveys of linguistic diversity can help generate hypotheses like these, and hopefully soon allow ways of testing them.
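Claims like ‘SOV is dominant when controlling for language family’ rest on counting genealogical units (families, or smaller units such as Dryer's genera) rather than individual languages. A minimal sketch, assuming each language is tagged with a genus and a basic word order (toy data, hypothetical function names), shows how the two counts can diverge:

```python
from collections import Counter, defaultdict

def language_counts(sample):
    """Count word orders treating every language as an independent data point."""
    return Counter(order for _, _, order in sample)

def genus_counts(sample):
    """Give each genus one vote: its most frequent order (ties broken alphabetically)."""
    by_genus = defaultdict(list)
    for _, genus, order in sample:
        by_genus[genus].append(order)
    votes = Counter()
    for orders in by_genus.values():
        counts = Counter(orders)
        top = max(sorted(counts), key=counts.get)
        votes[top] += 1
    return votes

# Hypothetical sample: (language, genus, basic word order)
WORD_ORDER_SAMPLE = [
    ("L1", "G1", "SVO"), ("L2", "G1", "SVO"), ("L3", "G1", "SVO"),
    ("L4", "G1", "SVO"), ("L5", "G1", "SVO"),
    ("L6", "G2", "SOV"), ("L7", "G3", "SOV"), ("L8", "G4", "SOV"), ("L9", "G5", "VSO"),
]
print(language_counts(WORD_ORDER_SAMPLE).most_common(1))  # [('SVO', 5)]
print(genus_counts(WORD_ORDER_SAMPLE).most_common(1))     # [('SOV', 3)]
```

Dryer's method additionally checks that a preference holds across several large geographic areas; this sketch covers only the genealogical step.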

Hauser et al.’s paper has some valid criticisms of the field (such as of models of the cultural evolution of compositionality, and the evidence for Neanderthal language), but I think that their assessment that ‘the fundamental questions remain as mysterious as ever’ is too pessimistic. Others have noted that none of the authors were at the last Evolution of Language conference, which is not surprising given what I remember of meeting Charles Yang, the second author on that paper, at the previous conference in Kyoto.  He was sitting gloomily at dinner with a group of Japanese generativists, who were not talking.  I asked him whether he had enjoyed any of the talks, and he said ‘Almost none.  Their notion of language is so…impoverished.’ He brightened up when the conversation turned back to Chomsky, whom he had had dinner with recently.  ‘We drank a lot of wine.  And Noam had two desserts.’

 

Jeremy Collins designs kitchens and bathrooms at the Max Planck Institute for Psycholinguistics.  His homepage is here.

References

Arbib, M. A. (2012). Tool use and constructions. Behavioral and Brain Sciences, 35(4):218–219.

Blasi, D. et al. (2014). Sound symbolism and the origins of language. In Cartmill, Roberts, Lyn & Cornish (Eds.), The Evolution of Language: Proceedings of the 10th EvoLang Conference.

Dingemanse, M. (2012). Advances in the cross-linguistic study of ideophones. Language and Linguistics Compass, 6(10):654–672. doi:10.1002/lnc3.361.

Dryer, M. (1992). The Greenbergian word order correlations. Language, 68(1):81–138.

Fitch, W. T. (2004). The evolution of language. In: The Cognitive Neurosciences (3rd Edition, Ed. by Gazzaniga, M.). Cambridge, MA: MIT Press

Gell-Mann, M. and Ruhlen, M. (2011). The origin and evolution of word order. Proceedings of the National Academy of Sciences, 108(42):17290–17295.

Goldin-Meadow, S., So, W. C., Özyürek, A., and Mylander, C. (2008). The natural order of events: How speakers of different languages represent events nonverbally. Proceedings of the National Academy of Sciences, 105(27):9163–9168.

Higuchi, S., Chaminade, T., Imamizu, H., and Kawato, M. (2009). Shared neural correlates for language and tool use in Broca’s area. Neuroreport, 20(15):1376–1381.

Knecht, S., Dräger, B., Deppe, M., Bobe, L., Lohmann, H., Flöel, A., Ringelstein, E.-B., and Henningsen, H. (2000). Handedness and hemispheric language dominance in healthy humans. Brain, 123(12):2512–2518.

Lupyan, G. and Dale, R. (2010). Language structure is partly determined by social structure. PLoS ONE, 5(1):e8559.

Pepperberg, I.M. (1999). The Alex Studies: Cognitive and Communicative Abilities of Grey Parrots. Harvard.

Sharp, S.P., McGowan, A., Wood, M.J., and Hatchwell, B.J. (2005).  Learned kin recognition cues in a social bird.  Nature, 434:1127-1130

Stromswold, K. (2001). The heritability of language: A review and metaanalysis of twin, adoption, and linkage studies. Language, 77(4):647–723.

The Mystery of Language Evolution: We can’t know more until we do

Hauser, Yang, Berwick, Tattersall, Ryan, Watumull, Chomsky and Lewontin have a co-authored article on The Mystery of Language Evolution. It’s a review of current directions in the field with the basic message that we don’t yet understand enough for empirical evidence from animal studies, archaeology, palaeontology, genetics or modelling to inform theories of language evolution.  Here I summarise the paper and offer some criticisms.

The core language phenotype of interest, according to the authors, is discrete infinity as exemplified in recursive operations found in combinatorial phonology and hierarchical syntax. The authors argue that the methods of evolutionary biology cannot yet be adequately applied to the evolution of this phenotype.

The paper begins with an illustration of the methods of evolutionary biology in a case where this kind of inference is possible. Túngara frogs have a very simple communication system (males croak to attract females), and we know a lot about the mechanisms underlying production and perception and how they link to fitness. However, the obvious adaptive hypothesis (perception adapted after production) was shown to be wrong by comparison with living sister species (they had similar perception, but not production capacities, so production adapted to perception). This method is hard to apply to language evolution, because we don’t have a good idea of the mechanisms involved and we have no sister species to compare ourselves to.

Specifically, the authors focus on 4 domains of inquiry, which they claim cannot contribute to theories of language evolution.

Continue reading “The Mystery of Language Evolution: We can’t know more until we do”

EvoLang Preview: Detecting differences between the languages of humans and Neandertals

This year’s EvoLang is busy – around 100 talks in 4 parallel sessions and 40 posters.  Replicated Typo is hosting a series of EvoLang previews to help people decide on what to go and see.  If you’d like to post a preview of your own presentation, please get in touch with sean.roberts@mpi.nl.

Roberts, Dediu & Levinson.  Detecting differences between the languages of Neanderthals and modern humans.  Thursday, 17:45, session A.

Recently, Dediu & Levinson (2013) argued that, given recent genetic and archaeological evidence, the default assumption should be that Neandertals spoke modern languages (not protolanguages).  Dediu will be giving a talk on this work in the same session.  My talk will discuss whether there are methods that can test these ideas.  Is there any way to estimate what Neandertal languages were like?  It’s a  controversial topic, but could have big implications for the field.

Continue reading “EvoLang Preview: Detecting differences between the languages of humans and Neandertals”

The great language game: Confusing languages

This is a guest post by Hedvig Skirgård.

The Great Language Game, have you heard of it? It’s an online game where players compete in matching audio clips to the correct language. The game was created by Lars Yencken earlier this year and has become very popular. Data generated by the game can be used to map which languages players find hardest to tell apart, and to support what we’ve known all along: Portuguese does sound a bit Slavic!
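Mapping which languages are hard to tell apart boils down to counting, for each pair of languages, how often players picked one when the clip was actually the other. A minimal sketch, assuming a log of (true language, guessed language) pairs; the log format and function names are hypothetical, and the game's actual data may be structured differently:

```python
from collections import Counter

def confusion_pairs(guesses):
    """Count wrong guesses per unordered language pair."""
    counts = Counter()
    for true_lang, guessed in guesses:
        if true_lang != guessed:
            counts[tuple(sorted((true_lang, guessed)))] += 1
    return counts

def confusion_rate(guesses, a, b):
    """Share of clips in a or b that were mistaken for the other language of the pair."""
    relevant = [(t, g) for t, g in guesses if t in (a, b)]
    mistaken = sum(1 for t, g in relevant if {t, g} == {a, b})
    return mistaken / len(relevant) if relevant else 0.0

# Hypothetical guess log: (language of the clip, language the player chose)
log = [
    ("Portuguese", "Russian"), ("Portuguese", "Portuguese"), ("Russian", "Portuguese"),
    ("Spanish", "Spanish"), ("Spanish", "Italian"), ("Portuguese", "Polish"),
    ("Russian", "Russian"),
]
print(confusion_pairs(log).most_common(1))             # [(('Portuguese', 'Russian'), 2)]
print(confusion_rate(log, "Portuguese", "Russian"))    # 0.4
```

Raw counts favour languages that simply appear more often in the game, which is why the second function normalises by the number of clips played for the pair.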

Continue reading “The great language game: Confusing languages”

Syntax of Mind Conference

A conference on the Syntax of Mind is taking place April 17-19 in Vienna, immediately following Evolang.  Registration is free and they are accepting abstracts for talks and posters.  From the website:

This conference will provide a state-of-the-art update on this fast-moving field, and will focus on overlaps between and evolution of spoken language and music, research employing artificial grammar learning, and comparative work in these areas with a wide range of animal species.