Beyond Quantification: Digital Criticism and the Search for Patterns

I’ve collected some recent posts (from New Savanna) on patterns into a working paper. It’s online at SSRN. Here’s the abstract and the introduction.

Abstract: Literary critics seek patterns, whether patterns in individual texts or patterns in large collections of texts. Valid patterns are taken as indices of causal mechanisms of one sort or another. Most abstractly, a pattern emerges or is enacted as some machine makes its way in an environment. An ecological niche is a pattern “traced” by an organism in its environment. Literary texts are themselves patterns traced by writers (and readers) through their life worlds. Patterns are frequently described through visualizations. The concept of pattern thus dissolves the apparent conflict between quantification and meaning, for quantification is but a means to describing a pattern. It is up to the critic to determine whether or not a pattern is meaningful by identifying the mechanism that produced the pattern. Examples from Shakespeare and Joseph Conrad.

Introduction: Patterns and Descriptions

There is a sense, of course, in which I’ve been aware of and have been perceiving and thinking about patterns all my life. They are ubiquitous, after all. But it wasn’t until I began studying cognitive science with the late David Hays that “pattern” became a term of art. Hays and his students were developing a network model of cognitive structure – such models became common in the 1970s. Such networks admit of two general kinds of computational process, path tracing and pattern recognition. Path tracing is computationally easy, while pattern recognition is not. Human beings, however, are very good at perceiving and recognizing patterns.

What put the idea before me, though, as something demanding specific thought, are remarks Franco Moretti made in coming to grips with his work on the network analysis of plot structure. In Network Theory, Plot Analysis (Literary Lab Pamphlet 2, 2011, p. 11) he noted that he “did not need network theory; but I probably needed networks…. What I took from network theory were less concepts than visualization.” We then examine the visualizations to determine whether or not they indicate patterns that are worth further exploration.

Talk to the Animals: BBC series on Animal Communication

The BBC have a new series on animal communication. So far they’ve covered mongooses, hippopotamuses, vervet monkeys, chimpanzees, dolphins and other animals, but I haven’t actually watched it yet.

Someone’s uploaded the first episode to YouTube here:

But of course you can watch it on iPlayer too if you’re in the UK.

PhD Opportunities: The Wellsprings of Linguistic Diversity

PhD positions are available at ANU, working with a team of people investigating diversity and cultural evolution.  The call is below:

Applications are now being sought for three PhD positions on the project ‘The Wellsprings of Linguistic Diversity’, funded by the Australian Research Council for the period mid-2014 to mid-2019.

Each PhD position will undertake substantial fieldwork on variation in a particular speech community: Western Arnhem Land (Bininj Gun-wok and neighbouring areas), Vanuatu (Sa and adjoining languages, South Pentecost Island), and Samoa (Samoan). Support will include a four-year stipend ($29,844 p/a), generous fieldwork funding, and embedding of the doctoral research in the dynamic team setting of the project, as well as the newly established ARC Centre of Excellence for the Dynamics of Language.  Positions will start in early February 2015.

The project is led by Prof. Nick Evans, and the project team includes postdocs Dr Murray Garde, Dr Ruth Singer, and Dr Dineke Schokkin and doctoral scholar Eri Kashima (fieldworkers), postdoc Dr Mark Ellison (computational modelling), and consultants Profs. Miriam Meyerhoff and Catherine Travis (variationist sociolinguistics) and Emeritus Prof. Andy Pawley (Samoan).

The project’s goal is to understand why linguistic diversity evolves differentially in different parts of the world, through a combination of detailed sociolinguistic case-studies of small-scale speech communities in their anthropological setting, and computational modelling of how micro-variation engenders macro-variation over iterations of transmission. The three high-diversity field sites are western Arnhem Land (Bininj Gun-wok and neighbouring languages), Morehead district of Southern New Guinea (Nen, Nambu, Idi), and South Pentecost Island, Vanuatu (Sa and neighbouring languages).  Samoa (Samoan) supplies a low-diversity comparator to the Vanuatu site, and controls from small speech communities in global languages (English and Spanish) will be obtained by other investigators on the project.

A fuller description of the project can be downloaded from http://chl.anu.edu.au/school/laureate.php

General information about the doctoral program in School of Culture, History and Language at the ANU College of Asia and the Pacific can be found at http://chl.anu.edu.au/school/students_phd.php

Specific enquiries should be directed to Nick Evans (nicholas.evans@anu.edu.au) and completed application dossiers sent to geoff.sjollema@anu.edu.au. Completed applications should include the following information:
(a)    CV with educational qualifications, any publications and other relevant experience (e.g. fieldwork, relevant internships)
(b)    a two-page statement setting out your preferred field site or sites, what skills and personal attributes you will bring to the project, and what you see as the most interesting and challenging issues you will need to solve
(c)    if available, other materials supporting your case (e.g. relevant articles or other materials)

Deadline:  Aug 3rd 2014, midnight, AEST

Once awards are made, successful applicants will be notified and then guided through making a formal application for enrolment status through the regular ANU system.

How to speak Neanderthal

This week there’s an article about exploring Neandertal language in the New Scientist by Dan Dediu, Scott Moisik and me.  It discusses the idea that if Neandertals spoke modern languages, and if there was cultural contact between us and them, then ancient human languages may have been affected by Neandertal language (borrowing, contact effects etc.).  If this happened, then we may be able to detect these effects in today’s languages. The article and a recent blog post explain the idea, but I’ll cover some of the more technical stuff here.

Obviously, this is a very controversial idea:  the time scale is much longer than the usual linguistic reconstruction and we have no direct evidence for Neandertals speaking complex languages.  We’re definitely in for some flak.  So, this post briefly covers what we actually did.

Our EvoLang paper (and a full paper in prep) asks whether one necessary condition for coming anywhere near providing evidence for this idea is true:  Are there differences between current languages that were in contact (outside of Africa) and languages that were not in contact (inside Africa)?  This has been addressed before, for different reasons (Cysouw & Comrie, 2009: pdf), but with a smaller sample of data.

Can we detect traces of contact with Neandertals in present day languages? One condition for this is there being statistical differences between contact and non-contact languages.

Using data from WALS, we ran a few tests:

  1. STRUCTURE analysis:  what is the most likely number of ‘founder’ populations that gives rise to the current diversity we see in African and Eurasian languages?  Do the estimated founder populations align with African and non-African languages?
  2. K-means clustering:  does a ‘natural’ statistical division between the world’s languages reflect a division between African and non-African languages? (is it better than chance and better than other continents?  Also run on phonetic data from PHOIBLE and lexical data from the ASJP)
  3. Weighted multidimensional scaling: If we compress WALS to a few dimensions, does the first dimension reflect a distinction between African and non-African languages?
  4. Phylogenetic reconstruction:  We reconstruct the cultural evolution of present-day language families to see if African and non-African languages have different cultural evolutionary biases (e.g. more likely to move towards or away from particular traits).  We used 3 phylogenies (WALS, Ethnologue, Glottolog), 3 branch length scaling assumptions (Grafen’s method, NNLS and UPGMA) and 3 methods of ancestral state reconstruction (Maximum parsimony, Maximum likelihood (BayesTraits) and Maximum likelihood (APE)).  We searched for features that have opposing biases in African and non-African languages that are bigger than 95% of all comparisons and are robust across all assumptions.
  5. Support Vector Machine learning:  We trained a Support Vector Machine (a supervised machine learning algorithm) to tell the difference between African and non-African languages.  We assessed the performance on unseen data, and also extracted the most decisive linguistic features for making the distinction.  We estimated the number of features needed to get good results.
  6. Binary classification trees: This algorithm finds linguistic features to divide the data into sub-sets in a way that maximises the ease of differentiating African and non-African languages.
Results of a multidimensional scaling analysis of WALS, with African and non-African languages grouped by bag plots. The results differentiate African and non-African languages better than chance (p < 0.001) and better than other continent pairs (p = 0.004), but NOT better than 95% of linguistic variables (p = 0.06).
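
To make the “better than chance” comparisons above a bit more concrete, here is a minimal sketch of a permutation test in Python. It is not the analysis from the paper: the feature matrix is made-up toy data, and the function names (`hamming`, `separation`, `permutation_p`) and the separation statistic itself are assumptions chosen purely for illustration. The idea is to measure how much further apart languages are between groups than within groups, and then ask how often randomly shuffled group labels do at least as well.

```python
import random


def hamming(a, b):
    """Proportion of features on which two languages differ.

    Features coded as None (missing) in either language are skipped.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x != y for x, y in shared) / len(shared)


def separation(features, labels):
    """Mean between-group distance minus mean within-group distance."""
    between, within = [], []
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming(features[i], features[j])
            (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_p(features, labels, n_perm=2000, seed=1):
    """Return the observed separation and a permutation p-value."""
    rng = random.Random(seed)
    observed = separation(features, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if separation(features, shuffled) >= observed:
            hits += 1
    # Add one to both counts so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 8 languages, 4 binary typological features each.
features = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
labels = ["Africa"] * 4 + ["Other"] * 4

observed, p = permutation_p(features, labels)
print(f"separation = {observed:.3f}, permutation p = {p:.3f}")
```

The same shuffling logic could in principle be applied to other continent pairs or to other statistics (for example, a coordinate from a scaling analysis), which is the spirit of the “better than other continents” comparisons, though the details here are only a sketch.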

The detailed results will appear in our paper, but here’s what we conclude:

  • Some of the tests result in positive answers.  For example, the support vector machine analysis could differentiate between African and non-African typologies with 93% accuracy.  However, the algorithm needs at least 50 linguistic variables to make this distinction, so it’s unclear whether it’s picking up on actual differences, or just gaps in the data.
  • While some tests passed, our criterion was that ALL of the tests should pass for us to be at all confident of a statistical difference between African and non-African languages.  Some tests fail, so we can’t support this.
  • However, most of the problems we ran into were due to a lack of data.  We could get better estimates if we had more typological data of better quality from existing languages.  Another problem was implicational universals  – particular typological variables are correlated because they affect each other (e.g. verb-object order and prepositions/postpositions), causing patterns in the world’s languages that are confounded with geographic areas.
  • There’s a bigger question of whether, in theory, we can tell the difference between drift, contact effects, areal effects and language death.  Contact with Neandertals may just be too far into the past, with too many human languages dying in the meantime, to make this distinction possible.

So, our conclusion is that any attempt to reconstruct Neandertal languages will fail with the current data and theory we have.  Not surprising, really.  The interesting thing, for me, is that we actually have methods that can give us quantitative answers about this idea, and the answer might change as we document more languages and develop theories about historical change and contact.  As Chris Knight described our EvoLang presentation, this is one of my “most exciting and least conclusive” studies.

The Past, Present and Future of Language Evolution Research

During this year’s EvoLang conference, a book was launched with perspectives on the last conference. The past, present and future of language evolution research (McCrohon, Thompson, Verhoef & Yamauchi, 2014) is a volume of student responses to EvoLang9 in Kyoto. It includes basic reviews and criticism, synthesis of current approaches, experiments and sociological perspectives.

It makes for interesting reading. What comes across in all the papers is a drive for collaboration and integration of fields and ideas, as the diagram from the contribution by Barceló-Coblijn and Martin shows. These are serious attempts to understand what has been learned so far and find new perspectives that incorporate empirical evidence. Many papers see neuroscientific evidence as a key to expanding many areas of research.

[Image: diagram from the contribution by Barceló-Coblijn and Martin]

Empirical Advances in Language Evolution

This is a guest post by Jeremy Collins

Hauser, Yang, Berwick, Tattersall, Ryan, Watumull, Chomsky and Lewontin have recently published an article entitled ‘The Mystery of Language Evolution’ (see also Sean’s post), in which they argue that theories of language evolution today are ‘accompanied by a poverty of evidence’ and that ‘the most fundamental questions about the origins and evolution of our linguistic capacity remain as mysterious as ever’.  Rather than criticise their article, I thought I would summarise what I think some of the empirical advances have been, in defence of the field.  A few well-known lines of research seem to have fleshed out some details of how language evolved, even if they are still in their infancy.

1. Vocal learning in other species. 

Culturally transmitted song has evolved multiple times in various bird species, dolphins and bats.  Although Hauser et al. dismiss bird song as irrelevant in that it is ‘finite’ and lacks compositional meaning (p.6), these species shed light on why culturally transmitted vocalisation evolved in humans.  These species typically live in groups of unrelated individuals, for instance, who co-operate in foraging.  The complexity of their learnt song may have evolved in the context of recognising and being altruistic towards kin (Sharp et al. 2005) (or by extension any unrelated members who exploit this altruism by managing to acquire the song of the group).  In a similar way, much of the complexity and cultural variability of human language may have developed in the context of in-group identification, such as our ability to detect subtle variations in accent (Fitch 2004).  While sexual selection is an important reason for the evolution of vocal learning in some of these species, it is unlikely to be the main driving force in humans given the lack of sexual dimorphism in language use, in contrast with song birds (Fitch 2004), although its role in human pair bonding is similar to pair bonding in monogamous parrot species (Pepperberg 1999).  Pepperberg (1999) showed that African Grey Parrots can learn to use spoken words and correctly answer questions involving abstract semantic categories, and with some understanding of syntax, showing how bird vocal learning is not necessarily as qualitatively different from human language acquisition as Hauser et al. suggest.

2. The genetics of language. 

The precise relationships between genes and language are unknown, as the authors say; but specific language disorders at least show that syntax and fluency of speech are heritable, which is an advance in its own right.  Vocabulary size and vocabulary acquisition patterns (e.g. rate of learning words at different ages in infancy) have also been shown to be heritable (Stromswold 2001). Although these are not genes ‘for’ these specific linguistic traits, they are likely to have been selected for partly in the context of language use, given the vast difference in syntactic complexity and vocabulary size between human languages and languages that primates, such as Kanzi or Nim Chimpsky, can acquire.

3. The neurobiology of language and tool use. 

The neural circuitry for language is likely to have been co-opted in part from the transmission and use of tools; they both involve complex motor actions and have been suggested to use similar areas of the brain such as Broca’s area, which is activated in experiments involving complex tool manufacture (Higuchi et al. 2009), and which is often lateralised differently in the brain in left-handed individuals (Knecht et al. 2000).  The prevalence of gesture in spoken languages, the fact that we can acquire complex sign languages, and the range of innate gestures in gorillas and chimpanzees (contrasted with their absence of vocalization) suggest that gesture may have been a platform for the evolution of language, and manual dexterity for the evolution of recursive syntax in particular (Arbib 2012).  If the authors want an evolutionary origin for ‘discrete infinity’, this is one candidate.

4. The study of sound symbolism. 

Three lines of evidence suggest that sound-symbolism helped spoken language evolve: robust sound-meaning pairings tested across 6000 languages, controlling for language family and region (such as proximal demonstratives and words for ‘small’ using a front vowel) (Blasi et al. 2014); rich systems of ideophones, namely words similar to onomatopoeia but which go beyond sound in being able to depict appearance, texture, motion, tastes, and emotions, in language families in Africa, Southeast Asia and the Americas (Dingemanse 2012); and innate associations of sounds and shapes independent of language, as suggested by ideophones, and the bouba/kiki and similar tests (Ramachandran 2013).

5. The study of the diversity of grammar. 

As an example, grammatical categories regularly develop from simpler, lexical categories, in ways that recur across many language families: e.g. pre-/post-positions develop from abstract nouns and verbs, adjectives develop from forms of nouns and verbs, tense and aspect markers develop from adverbs or nominalizers (e.g. the development of English ‘-ing’ from a nominal affix to a gerund marker to a participle marker), and so on (Heine and Kuteva 2007).  Cross-linguistic work can therefore shed light on what the first languages may have been like, such as having more weakly differentiated grammatical categories (e.g. collapsing adjectives or adpositions with nouns and verbs). Studies on patterns of basic word order suggest that subject-object-verb order is likely to have been used, given its dominance in spoken languages today when controlling for geography and language family (Gell-Mann and Ruhlen 2011, Dryer 1992), and the way that people spontaneously converge on that word order when gesturing (Goldin-Meadow et al. 2008).  Languages spoken by small populations tend to develop case-marking and other complex morphology (Lupyan and Dale 2010), suggesting that this may also have been a feature of early languages.  Increasingly detailed surveys of linguistic diversity can help generate hypotheses like these, and hopefully soon allow ways of testing them.

Hauser et al.’s paper has some valid criticisms of the field (such as of models of the cultural evolution of compositionality, and the evidence for Neanderthal language), but I think that their assessment that ‘the fundamental questions remain as mysterious as ever’ is too pessimistic. Others have noted that none of the authors were at the last Evolution of Language conference, which is not surprising given what I remember of meeting Charles Yang, the second author on that paper, at the previous conference in Kyoto.  He was sitting gloomily at dinner with a group of Japanese generativists, who were not talking.  I asked him whether he had enjoyed any of the talks, and he said ‘Almost none.  Their notion of language is so…impoverished.’ He brightened up when the conversation turned back to Chomsky, whom he had had dinner with recently.  ‘We drank a lot of wine.  And Noam had two desserts.’

 

Jeremy Collins designs kitchens and bathrooms at the Max Planck Institute for Psycholinguistics.  His homepage is here.

References

Arbib M. A. (2012) Tool use and constructions.  Behav Brain Sci. 35(4):218-9.

Blasi et al. (2014) Sound symbolism and the origins of language.  In Cartmill, Roberts, Lyn & Cornish (Eds.) The Evolution of Language: Proceedings of the 10th EvoLang Conference.

Dingemanse, M. (2012). Advances in the cross-linguistic study of ideophones. Language and Linguistics Compass, 6(10):654–672. doi:10.1002/lnc3.361.

Dryer, M. (1992). The Greenbergian word order correlations. Language, pages 81–138.

Fitch, W. T. (2004). The evolution of language. In: The Cognitive Neurosciences (3rd Edition, Ed. by Gazzaniga, M.). Cambridge, MA: MIT Press

Gell-Mann, M. and Ruhlen, M. (2011). The origin and evolution of word order. Proceedings of the National Academy of Sciences, 108(42):17290–17295.

Goldin-Meadow, S., So, W. C., Özyürek, A., and Mylander, C. (2008). The natural order of events: How speakers of different languages represent events nonverbally. Proceedings of the National Academy of Sciences, 105(27):9163–9168.

Higuchi, S., Chaminade, T., Imamizu, H., and Kawato, M. (2009). Shared neural correlates for language and tool use in Broca’s area. Neuroreport, 20(15):1376–1381.

Knecht, S., Dräger, B., Deppe, M., Bobe, L., Lohmann, H., Flöel, A., Ringelstein, E.-B., and Henningsen, H. (2000). Handedness and hemispheric language dominance in healthy humans. Brain, 123(12):2512–2518.

Lupyan, G. and Dale, R. (2010). Language structure is partly determined by social structure. PLoS ONE, 5(1):e8559.

Pepperberg, I.M. (1999). The Alex Studies: Cognitive and Communicative Abilities of Grey Parrots. Harvard.

Sharp, S.P., McGowan, A., Wood, M.J., and Hatchwell, B.J. (2005).  Learned kin recognition cues in a social bird.  Nature, 434:1127-1130

Stromswold, K. (2001). The heritability of language: A review and metaanalysis of twin, adoption, and linkage studies. Language, 77(4):647–723.

The Mystery of Language Evolution: We can’t know more until we do

Hauser, Yang, Berwick, Tattersall, Ryan, Watumull, Chomsky and Lewontin have a co-authored article on The Mystery of Language Evolution. It’s a review of current directions in the field with the basic message that we don’t yet understand enough for empirical evidence from animal studies, archaeology, palaeontology, genetics or modelling to inform theories of language evolution.  Here I summarise the paper and offer some criticisms.

The core language phenotype of interest, according to the authors, is discrete infinity as exemplified in recursive operations found in combinatorial phonology and hierarchical syntax. The authors argue that the methods of evolutionary biology cannot yet be adequately applied to the evolution of this phenotype.

The paper begins with an illustration of the methods of evolutionary biology in a case where this kind of inference is possible. Túngara frogs (pictured above) have a very simple communication system (males croak to attract females), and we know a lot about the mechanisms underlying production and perception and how it links to fitness. However, the obvious adaptive hypothesis (perception adapted after production) was proven wrong by comparison with living sister species (they had similar perception, but not production capacities, so production adapted to perception). This method is hard to apply to language evolution, because we don’t have a good idea of the mechanisms involved and we have no sister-species to compare ourselves to.

Specifically, the authors focus on 4 domains of inquiry, which they claim cannot contribute to theories of language evolution.

New Evidence for Neanderthal Language Announced (on April 1st…)

In keeping with Sean’s previous EvoLang Preview, here is some news related to Neanderthals and language evolution:

As Andrew Lamont writes on the official LINGUIST List Blog:

The controversy over whether Neanderthals possessed a capacity for language may have been resolved. After years of speculation by evolutionary anthropologists and geneticists, a group of linguists has announced today that they have uncovered written evidence proving the Neanderthal capacity for language.

[…]

Schmaltz’ team was able to identify and translate two texts left by Neanderthals. The first, a recent discovery in Spain, is a fragment of a teenager’s diary. It reads oog.oog.oog and has been translated as ‘[Dear diary, I feel] emotionally distant. [I wish I had my own cave]’.

Read the whole thing here.

EvoLang Preview: Detecting differences between the languages of humans and Neandertals

This year’s EvoLang is busy – around 100 talks in 4 parallel sessions and 40 posters.  Replicated Typo is hosting a series of EvoLang previews to help people decide on what to go and see.  If you’d like to post a preview of your own presentation, please get in touch with sean.roberts@mpi.nl.

Roberts, Dediu & Levinson.  Detecting differences between the languages of Neanderthals and modern humans.  Thursday, 17:45, session A.

Recently, Dediu & Levinson (2013) argued that, given recent genetic and archaeological evidence, the default assumption should be that Neandertals spoke modern languages (not protolanguages).  Dediu will be giving a talk on this work in the same session.  My talk will discuss whether there are methods that can test these ideas.  Is there any way to estimate what Neandertal languages were like?  It’s a  controversial topic, but could have big implications for the field.

Linguistic Phylogenies Support Back-Migration from Beringia to Asia

It finally happened! A press release from PLOS landed in my inbox with the words “Language Evolution” in the title!

The paper’s “Linguistic Phylogenies Support Back-Migration from Beringia to Asia” by Sicoli and Holton. Given that PLOS have released this as a press release, the media may well pick it up, so I’ve made a quick and easy-to-read list of details which probably won’t reach the papers:

What? Phylogenetic models applied to linguistic data to make inferences about human migration into and out of North America.
Why? Hypothesis testing/model fitting an Out-of-Beringia (to Asia) hypothesis compared to an Out-of-Central Asia (to North America) hypothesis.
Languages? North American languages and Central Siberian languages – about 40 languages (2 Yeniseian languages, 37 Na-Dene languages and Haida (isolate))
What’s the data? Binary coded 116 typological features (26 of which were excluded later for being “uninformative”). Data from Sherzer’s An areal-typological study of American Indian languages north of Mexico, the Alaska Native Language Archive and other grammars
Methods? Bayesian likelihood modelling (using Markov Chain Monte Carlo methods in MrBayes) and neighbour joining distance methods (using NeighborNet and SplitsTree4)
Results? The Out-of-Beringia model fits better (the results section is massive; you should go and read it if you’re interested in the details). This model supports the story that there was a back-migration into Asia from Beringia, which is in contrast to recent arguments that the connections between Na-Dene languages and Yeniseian languages show that the Native Americans migrated from Central Asia.
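
For readers who haven’t worked with binary-coded typological data, here is a minimal sketch in Python of the kind of preprocessing that distance-based methods start from. It is not the authors’ pipeline (they used MrBayes, NeighborNet and SplitsTree4). The language names and feature values are invented, the function names are hypothetical, and treating “uninformative” as “takes the same value in every language” is my assumption rather than the paper’s stated criterion.

```python
def drop_uninformative(matrix):
    """Remove feature columns that have the same value in every language."""
    n_features = len(matrix[0])
    keep = [j for j in range(n_features) if len({row[j] for row in matrix}) > 1]
    return [[row[j] for j in keep] for row in matrix]


def distance_matrix(matrix):
    """Pairwise proportion of features on which two languages differ."""
    n = len(matrix)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diffs = sum(a != b for a, b in zip(matrix[i], matrix[j]))
            dist[i][j] = dist[j][i] = diffs / len(matrix[i])
    return dist


# Hypothetical data: 4 made-up languages coded for 5 binary features.
coded = {
    "lang_a": [1, 0, 1, 1, 0],
    "lang_b": [1, 0, 0, 1, 0],
    "lang_c": [1, 1, 0, 0, 0],
    "lang_d": [1, 1, 1, 0, 0],
}
names = list(coded)
informative = drop_uninformative([coded[name] for name in names])
dist = distance_matrix(informative)

for name, row in zip(names, dist):
    print(name, [round(d, 2) for d in row])
```

A matrix like `dist` is the sort of input that neighbour-joining and network methods build on; the Bayesian analysis works from the coded features directly rather than from distances.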

 

This seems to be a reasonably solid piece of work, though I should leave it to someone else to assess the legitimacy of the statistical analysis/results. It’s nice to see also that the press release does state that: “the authors cannot conclusively determine the migration pattern just from these results, and state that this study does not necessarily contradict the popular tale of hunters entering the New World through Beringia, it at the very least indicates that migration may not have been a one-way trip.” Back-migration is rarely considered when testing hypotheses using models for serial-founder effects – and I think this must happen more than we often assume in linguistic phylogenies.