Literary History, the Future: Kemp Malone, Corpus Linguistics, Digital Archaeology, and Cultural Evolution

In scientific prognostication we have a condition analogous to a fact of archery—the farther back you draw your longbow, the farther ahead you can shoot.
– Buckminster Fuller

The following remarks are rather speculative in nature, as many of my remarks tend to be. I’m sketching large conclusions on the basis of only a few anecdotes. But those conclusions aren’t really conclusions at all, not in the sense that they are based on arguments presented prior to them. I’ve been thinking about cultural evolution for years, and about the need to apply sophisticated statistical techniques to large bodies of text—really, all the texts we can get, in all languages—by way of investigating cultural evolution.

So it is no surprise that this post arrives at cultural evolution and concludes with remarks on how the human sciences will have to change their institutional ways to support that kind of research. Conceptually, I was there years ago. But now we have a younger generation of scholars who are going down this path, and it is by no means obvious that the profession is ready to support them. Sure, funding is there for “digital humanities”, so deans and department chairs can get funding and score points for successful hires. But you can’t build a profound new intellectual enterprise on financially driven institutional gamesmanship alone.

You need a vision, and though I’d like to be proved wrong, I don’t see that vision, certainly not on the web. That’s why I’m writing this post. Consider it a sequel to an article I published back in 1976 with my teacher and mentor, David Hays: Computational Linguistics and the Humanist. This post presupposes the conceptual framework of that article but neither restates nor endorses its specific recommendations (given in the form of a hypothetical program for simulating the “reading” of texts).

The world has changed since then and in ways neither Hays nor I anticipated. This post reflects those changes and takes as its starting point a recent web discussion about recovering the history of literary studies by using the largely statistical techniques of corpus linguistics in a kind of digital archaeology. But like Tristram Shandy, I approach that starting point indirectly, by way of a digression.

Who’s Kemp Malone?

Back in the ancient days when I was still an undergraduate, and we tied an onion in our belts as was the style at the time, I was at an English Department function at Johns Hopkins and someone pointed to an old man and said, in hushed tones, “that’s Kemp Malone.” Who is Kemp Malone, I thought? From his Wikipedia bio:

Born in an academic family, Kemp Malone graduated from Emory College as it then was in 1907, with the ambition of mastering all the languages that impinged upon the development of Middle English. He spent several years in Germany, Denmark and Iceland. When World War I broke out he served two years in the United States Army and was discharged with the rank of Captain.

Malone served as President of the Modern Language Association, and other philological associations … and was etymology editor of the American College Dictionary, 1947.

Who’d have thought the Modern Language Association was a philological association?

3rd Linguistic Conference for Doctoral Students: Interdisciplinary Perspectives on Language, Discourse, and Culture

Here’s a link to another conference that might be of interest:

The 3rd Linguistic Conference for Doctoral Students will take place at Heidelberg University, Germany, on 5–6 April 2013. The overarching topic of the conference will be “Interdisciplinary Perspectives on Language, Discourse, and Culture.” The deadline for submissions is 15 February.

I’ve included the Call for Papers below (The Call for Papers can also be downloaded here):


Ways to protolanguage conference

The third Ways to Protolanguage conference has released its call for papers. It will take place in Wrocław, Poland, on 25–26 May. The deadline for submission is 31 March.

Plenary speakers include Sue Savage-Rumbaugh, Robin Dunbar, Peter Gärdenfors, Josep Call and Tomasz P. Krzeszowski.

More details on the website here: http://www.wsf.edu.pl/57793.xml


The Role of Foreigner-Directed Speech in Language Evolution

After all this talk of spurious cross-cultural correlations, it might be time to direct the discussion back to ways of reducing our over-reliance on statistical tendencies. Sean and James ran a workshop at this year’s EvoLang on how constructive, idiographic and experimental approaches also need to be considered when investigating how linguistic and social structure are linked.

With this in mind, I present my poster from EHBEA earlier this year, which explains some experiments I did for my MSc thesis. I was trying to test the hypothesis that having more second-language speakers in a linguistic population might affect the cultural transmission of that language. This hypothesis is an attempt to explain the large-scale correlation found by Lupyan & Dale (2010): the larger a language’s population, the less morphologically complex that language tends to be. The idea is that larger language populations will have more second-language speakers, and will therefore be more susceptible to the learning biases of adult learners.

There is some experimental evidence about the differences between adult and child learners, some of which I look at here, but in this study I looked at the role foreigner-directed speech might play in the use of language in a community with many second-language speakers.


Lupyan, G., & Dale, R. (2010). Language structure is partly determined by social structure. PLoS ONE, 5(1), e8559.

The origin of language in gesture–speech unity

In honor of a new book entitled “How Language Began” by David McNeill, the author has been blogging about the origin of language in gesture–speech unity over at the Cambridge Extra/Linguist List part of the CUP site. These blog posts are lengthy and thought-provoking, and include very thorough reading lists for the interested.

Part 1: Language and Imagery

Part 2: Gesture-first

Part 3: Mead’s Loop (1)

Part 4: Mead’s Loop (2). Wider consequences.

I’m not sure if there’s any more coming, but I wish more authors and professors would take the time to have a good old blog on open access platforms about their work.

New Perspectives on Duality of Patterning

For those of you who might be interested: Language and Cognition has a special issue on the nature and emergence of duality of patterning (paywall access, sorry!). One of Hockett’s (1960) design features, duality of patterning is the property of human language that enables parts of language to be recombined in a systematic way to create new forms. In the introductory paper, de Boer, Sandler & Kirby (2012) identify two distinct levels where we see duality of patterning: combinatorial (meaningless sounds can be combined into meaningful morphemes and words) and compositional (morphemes and words can be combined to create new constructions with different meanings). For Hockett, duality of patterning is not only a design feature of language (in that all human languages have it) but also a unique characteristic of human language.

These two assumptions have been challenged on several fronts. First, simple combinatorial structure is found in systems of primate vocalisations, albeit restricted to a relatively limited set of signals. Meanwhile, Al-Sayyid Bedouin Sign Language (ABSL) does not have a conventionalised level of meaningless elements (although it does have compositional structure at the levels of morphology and syntax). These two examples offer important insights into the duality of patterning debate:

We see then from the case of ABSL that the need to express a large set of signals does not necessarily lead to combinatorial structure, while conversely from the animal systems, it appears that combinatorial structure does not necessarily need a very large set of signals to emerge. As combinatorial structure is the main defining characteristic of duality of patterning, it appears that both the status of duality of patterning as a design feature of language and the evolutionary pathways leading to it need to be rethought. (de Boer, Sandler & Kirby, 2012: 252).
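To make the two levels in this definition concrete, here is a toy sketch in Python. Everything in it is hypothetical: the three-consonant, three-vowel inventory, the CVCV word template and the three-entry LEXICON are invented for illustration and are not taken from de Boer, Sandler & Kirby (2012). The function word_forms stands for the combinatorial level (meaningless segments combining into many possible forms), and compose stands, very naively, for the compositional level (meaningful morphemes combining into larger constructions).

```python
from itertools import product

# Hypothetical inventory of meaningless segments (not from any real language)
CONSONANTS = ["p", "t", "k"]
VOWELS = ["a", "i", "u"]

# Hypothetical lexicon: a few forms that have been assigned meanings (morphemes)
LEXICON = {"pata": "DOG", "kiku": "BIG", "tupi": "PLURAL"}


def syllables():
    """All CV syllables buildable from the segment inventory."""
    return [c + v for c, v in product(CONSONANTS, VOWELS)]


def word_forms():
    """Combinatorial level: meaningless syllables combine into CVCV forms."""
    syls = syllables()
    return [first + second for first, second in product(syls, repeat=2)]


def compose(*forms):
    """Compositional level: a construction's meaning is built from the
    meanings of its parts (here, naively, by listing them in order)."""
    return " + ".join(LEXICON[form] for form in forms)


if __name__ == "__main__":
    print(len(syllables()), "syllables")
    print(len(word_forms()), "possible word forms")
    print("every morpheme is built from meaningless parts:",
          all(form in word_forms() for form in LEXICON))
    print(compose("kiku", "pata", "tupi"))
```

With three consonants and three vowels the sketch yields 9 syllables and 81 two-syllable forms, which shows why combinatorial structure is often associated with large signal inventories, even though the ABSL and primate cases discussed above suggest that the association is not a necessary one.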

The rest of the special issue is divided between theoretical and experimental/modelling contributions. The abstracts and links to the papers (again, paywall, sorry!) are posted below. In summary, the general picture emerging from these papers is that duality of patterning is not a clear-cut design feature of language, nor is it necessarily unique to our capacity for language. Furthermore, we should show a greater appreciation of the role that cultural evolution plays:

An apparent point of consensus from the papers in this special issue is that we should not see duality of patterning as a feature hard-wired into an innate language faculty, but rather as arising from multiple pressures operating on language as it emerges and changes in socially interacting populations. When we talk about the evolution of this design of language, then, we are referring more to cultural rather than biological evolution […] It appears that duality of patterning is a rather general state towards which sufficiently complex systems of signals evolve for different reasons: distinctiveness, learnability and a tendency to keep meaningful distinctions, while at the same time trying to make one’s utterances sound similar to those of others in the population. Thus, multiple cognitive processes seem to lead to duality of patterning and therefore, there are probably multiple evolutionary pathways that lead to duality of patterning as well. (de Boer, Sandler & Kirby, 2012: 257).


miR-941 – The new Language Gene

Sorry for the hyperbole in the title, but now that I’ve got your attention: researchers at the University of Edinburgh have found a gene, implicated in human brain development, that humans have but chimpanzees don’t.

The study compared the human genome to 11 other species of mammals, including chimpanzees, gorillas, mice and rats, and found that miR-941 is unique to humans.

miR-941 is now being billed as a gene that contributed to early humans’ development of tool use and language. In contrast to the likes of FoxP2, this gene has a very specific function rather than broadly down-regulating other functions. It is said to be the only gene discovered so far that has such a specific function while being present only in humans. It is active in two areas of the brain that control our linguistic abilities and our decision-making.

The authors estimate that it emerged between six and one million years ago and that it emerged fully functional out of non-coding genetic material (“junk DNA”) in a very short interval of evolutionary time.

I’m sure the studies where they implant it into mice will start soon. Watch this space.

References

Hu, H. Y., He, L., Fominykh, K., Yan, Z., Guo, S., Zhang, X., … & Khaitovich, P. (2012). Evolution of the human-specific microRNA miR-941. Nature Communications, 3, 1145.

Is ambiguity dysfunctional for communicatively efficient systems?

Following yesterday’s post, where I argued that degeneracy emerges as a design solution to ambiguity pressures, a Reddit commenter pointed me to a cool paper by Piantadosi et al. (2012) that contains the following quote:

The natural approach has always been: Is [language] well designed for use, understood typically as use for communication? I think that’s the wrong question. The use of language for communication might turn out to be a kind of epiphenomenon… If you want to make sure that we never misunderstand one another, for that purpose language is not well designed, because you have such properties as ambiguity. If we want to have the property that the things that we usually would like to say come out short and simple, well, it probably doesn’t have that property (Chomsky, 2002: 107).

The paper itself argues against Chomsky’s position, claiming that ambiguity allows for more efficient communication systems. First, looking at ambiguity from the perspective of coding theory, Piantadosi et al. argue that any good communication system will leave out information already present in the context (assuming the context is informative about the intended meaning). Their second point, which they test through a corpus analysis of English, Dutch and German, is that as long as the context can resolve some ambiguities, ambiguity will be exploited to make communication easier. In short, ambiguity emerges as a result of trade-offs between ease of production and ease of comprehension, with communication systems favouring hearer inference over speaker effort:

The essential asymmetry is: inference is cheap, articulation expensive, and thus the design requirements are for a system that maximizes inference. (Hence … linguistic coding is to be thought of less like definitive content and more like interpretive clue.) (Levinson, 2000: 29).

If this asymmetry exists, and hearers are good at disambiguating in context, then a direct result of such a trade-off should be that linguistic units requiring less effort are more ambiguous. This is what their corpus analysis of word length, word frequency and phonotactic probability found:

We tested predictions of this theory, showing that words and syllables which are more efficient are preferentially re-used in language through ambiguity, allowing for greater ease overall. Our regression on homophones, polysemous words, and syllables – though similar – are theoretically and statistically independent. We therefore interpret positive results in each as strong evidence for the view that ambiguity exists for reasons of communicative efficiency (Piantadosi et al., 2012: 288).
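The coding-theory half of the argument can be illustrated with a toy calculation. The Python sketch below assumes an invented joint distribution JOINT over two contexts and two meanings of an ambiguous form such as “bank”; the probabilities are purely illustrative and do not come from Piantadosi et al. It computes the hearer’s uncertainty about meaning before context is considered, H(M), and after it, H(M|C) = H(C,M) − H(C). Whatever the context already supplies is information the signal itself need not carry.

```python
from collections import defaultdict
from math import log2

# Hypothetical joint distribution P(context, meaning) for one ambiguous form;
# the numbers are made up for illustration, not taken from any corpus.
JOINT = {
    ("money talk", "FINANCE"): 0.45,
    ("money talk", "RIVERSIDE"): 0.05,
    ("fishing trip", "FINANCE"): 0.05,
    ("fishing trip", "RIVERSIDE"): 0.45,
}


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def marginal(joint, index):
    """Marginal distribution over element `index` of each (context, meaning) key."""
    totals = defaultdict(float)
    for key, p in joint.items():
        totals[key[index]] += p
    return dict(totals)


h_meaning = entropy(marginal(JOINT, 1).values())   # H(M)
h_context = entropy(marginal(JOINT, 0).values())   # H(C)
# Chain rule: H(M|C) = H(C, M) - H(C)
h_meaning_given_context = entropy(JOINT.values()) - h_context

print(f"H(M)   = {h_meaning:.3f} bits")
print(f"H(M|C) = {h_meaning_given_context:.3f} bits")
```

With these numbers the script prints H(M) = 1.000 bits and H(M|C) = 0.469 bits: context removes more than half of the uncertainty, so a single short form can serve both meanings at little cost to the hearer.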

At some point, I’d like to offer a more comprehensive overview of this paper, but that will have to wait until I’ve read more of the literature. Until then, here are some graphs of the results from their paper:


Chocolate Consumption, Traffic Accidents and Serial Killers

Last month a paper was published reporting a correlation between chocolate consumption and the number of Nobel laureates.

EDIT: I now see the article may not be accessible to everyone. Here’s a summary: Messerli suggests that, because some flavonoids found in chocolate have been linked to improved cognition, one might expect a country that eats more chocolate on average to produce more Nobel laureates on average. Indeed, Messerli finds a linear correlation between the two variables. While the tone of the short paper is not entirely serious, we’ve previously reported on many spurious correlations, and on why they’re so easy to find between cultural variables. The chocolate/laureate correlation looked like one of these spurious findings, so we set out to debunk it by showing correlations with some less expected variables. If it is indeed spurious, then papers like the one criticised here are dangerous because they give credence to this questionable method while producing media-grabbing headlines.
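Why are such correlations so easy to find? One well-known reason, often called Galton’s problem, is that cultures are not independent data points: neighbouring populations borrow traits from one another. The Python sketch below illustrates this under a deliberately crude, hypothetical assumption (it is not the model used in Roberts & Winters 2012): populations sit along a line, and each takes its neighbour’s value of a trait plus a random change. Two traits generated this way have no causal connection, yet the correlation between them across populations is typically far larger than for traits drawn independently for every population. The function names and the parameters n_pops, trials and seed are illustrative choices.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def independent_trait(n, rng):
    """Trait values drawn independently for each of n populations."""
    return [rng.gauss(0, 1) for _ in range(n)]


def diffusing_trait(n, rng):
    """Hypothetical diffusion model: populations sit on a line and each one
    takes its neighbour's trait value plus a random change (a random walk)."""
    values = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        values.append(values[-1] + rng.gauss(0, 1))
    return values


def mean_abs_r(make_trait, n_pops=50, trials=500, seed=1):
    """Average |r| between two traits generated independently of each other."""
    rng = random.Random(seed)
    rs = []
    for _ in range(trials):
        a = make_trait(n_pops, rng)
        b = make_trait(n_pops, rng)  # no causal link to a
        rs.append(abs(pearson(a, b)))
    return mean(rs)


if __name__ == "__main__":
    print("independent traits, mean |r|:", round(mean_abs_r(independent_trait), 2))
    print("diffusing traits,   mean |r|:", round(mean_abs_r(diffusing_trait), 2))
```

The exact numbers depend on the seed and on how strongly traits diffuse, but the gap between the two printed values is the point: a test that treats each country as an independent observation will overstate the evidence for a relationship.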

James and I wrote a response article, but it’s just been rejected, citing ‘lack of space’ (Dorothy Bishop has also posted a recent response). Here are the 175 words we submitted. Amongst the 4 statistical tests, try spotting the 6 hidden references to chocolate:

Chocolate consumption (CC) correlates with the number of Nobel laureates (NL) per capita [1]. However, correlation studies are a rocky road, and it’s easy to fudge correlation and causation. Our data mars the previous inference. Average IQ [2] does not correlate with CC (r = 0.27, p = 0.21). CC correlates with the (log) number of serial and rampage killers per capita [3] (r = 0.52, p = 0.02, fig. 1). NL correlates with the annual road fatalities per capita [3] (r = -0.55, p = 0.0066). Controlling for GDP [3] and mean temperature [4], CC is not a significant predictor of NL (F(1,19) = 3.6, p = 0.07). These correlations are unlikely to be causal, so why are they robust? Cultural phenomena diffuse in a way that leads to spurious correlations between independent variables [5]. While flavonoids may aid cognition, there is little evidence to suggest there is an isomorphic link between individual-level benefits and widespread population-level effects. The original work may have elicited snickers, but it is receiving media coverage. If researchers declare this to be a robust approach, then it’s a slippery slope to a world of pure imagination.

And here’s a longer version of the paper: ChocolateSerialKillers_WintersRoberts (including three more puns).
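For readers wondering what “controlling for GDP and mean temperature” involves in the submitted text: one standard way to do it, given here as an assumption about the kind of analysis rather than a record of the exact one run, is to compare a reduced regression of NL on GDP and mean temperature with a full regression that also includes CC, using a nested-model F test:

$$F = \frac{(\mathrm{RSS}_{\mathrm{reduced}} - \mathrm{RSS}_{\mathrm{full}})/q}{\mathrm{RSS}_{\mathrm{full}}/(n - k)}$$

Here RSS is the residual sum of squares of each model, q is the number of predictors added (q = 1 for CC alone), n is the number of countries and k is the number of parameters in the full model. With q = 1 this F equals the square of the t statistic for the CC coefficient. If the full model has an intercept plus three predictors (k = 4), the reported F(1,19) would correspond to n = 23 countries.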


References

[1] Messerli, F. H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 367(16), 1562–1564. DOI: 10.1056/NEJMon1211064

[2] Lynn, R. and Vanhanen, T. (2002). IQ and the wealth of nations. Praeger Publishers.

[3] Wikipedia:

http://en.wikipedia.org/wiki/List_of_serial_killers_by_number_of_victims

http://en.wikipedia.org/wiki/List_of_rampage_killers

http://en.wikipedia.org/wiki/List_of_countries_by_traffic-related_death_rate

http://en.wikipedia.org/wiki/List_of_countries_by_GDP_%28nominal%29_per_capita

[4] Mitchell, T. D., Hulme, M., and New, M. (2002). Climate data for political areas. Area, 34, 109–112.

[5] Roberts, S. and Winters, J. (2012). Constructing knowledge: Nomothetic approaches to language evolution. In McCrohon, L., Fujimura, T., Fujita, K., Martin, R., Okanoya, K., Suzuki, R., and Yusa, N., editors, Five Approaches to Language Evolution: Proceedings of the Workshops of the 9th International Conference on the Evolution of Language. World Scientific. pp. 148–157.
