Ways to protolanguage conference

The third Ways to Protolanguage conference has released its call for papers.  It will take place in Wrocław, Poland from 25–26 May.  The deadline for submission is 31 March.

Plenary speakers include Sue Savage-Rumbaugh, Robin Dunbar, Peter Gärdenfors, Josep Call and Tomasz P. Krzeszowski.

More details on the website here: http://www.wsf.edu.pl/57793.xml

 

Chocolate Consumption, Traffic Accidents and Serial Killers

Last month there was a paper published about a correlation between chocolate consumption and Nobel Laureates.

EDIT: I now see the article may not be accessible to everyone.  Here’s a summary: Messerli suggests that, because some flavonoids that are found in chocolate have been linked to improved cognition, one might expect a country that eats more chocolate on average to produce more Nobel Laureates on average.  Indeed, Messerli finds a linear correlation between the two variables.  While the tone of the short paper is not entirely serious, we’ve previously reported on many spurious correlations, and why they’re so easy to find between cultural variables. The chocolate/Laureate correlation looked like one of these spurious findings, so we set out to debunk it by showing correlations with some less expected variables.  If the correlation is indeed spurious, then papers like the one criticised here are dangerous because they give credence to this questionable method, while producing media-grabbing headlines.

Me and James wrote a response article, but it’s just been rejected, citing ‘lack of space’ (Dorothy Bishop has also posted a recent response).  Here are the 175 words we submitted.  Amongst the 4 statistical tests, try spotting the 6 hidden references to chocolate:

Chocolate consumption (CC) correlates with the number of Nobel laureates (NL) per capita [1].  However, correlation studies are a rocky road, and it’s easy to fudge correlation and causation. Our data mars the previous inference.  Average IQ [2] does not correlate with CC (r=0.27, p=0.21).  CC correlates with the (log) number of serial [3] and rampage [3] killers per capita (r = 0.52, p=0.02, fig. 1). NL correlates with the annual road fatalities per capita [3] (r=-0.55, p=0.0066).  Controlling for GDP [3] and mean temperature [4], CC is not a significant predictor of NL (F(1,19) = 3.6, p = 0.07). These correlations are unlikely to be causal, so why are they robust? Cultural phenomena diffuse in a way that leads to spurious correlations between independent variables [5].  While flavonoids may aid cognition, there is little evidence to suggest there is an isomorphic link between individual-level benefits and widespread population-level effects. The original work may have elicited snickers, but it is receiving media coverage.  If researchers declare this to be a robust approach, then it’s a slippery slope to a world of pure imagination.

And here’s a longer version of the paper: ChocolateSerialKillers_WintersRoberts (including three more puns).

 

References

[1] Messerli, F.H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine. DOI: 10.1056/NEJMon1211064

[2] Lynn, R. and Vanhanen, T. (2002). IQ and the wealth of nations. Praeger Publishers.

[3] Wikipedia:

http://en.wikipedia.org/wiki/List_of_serial_killers_by_number_of_victims;

http://en.wikipedia.org/wiki/List_of_rampage_killers; http://en.wikipedia.org/wiki/List_of_countries_by_traffic-related_death_rate

http://en.wikipedia.org/wiki/List_of_countries_by_GDP_%28nominal%29_per_capita

[4] Mitchell, T.D., Hulme, M., and New, M. (2002). Climate data for political areas. Area 34: 109-112.

[5] Roberts, S. and Winters, J. (2012). Constructing knowledge: Nomothetic approaches to language evolution. In McCrohon, L., Fujimura, T., Fujita, K., Martin, R., Okanoya, K., Suzuki, R., and Yusa, N., editors, Five Approaches to Language Evolution: Proceedings of the Workshops of the 9th International Conference on the Evolution of Language. World Scientific. pp 148-157.

 

Messerli, F. (2012). Chocolate Consumption, Cognitive Function, and Nobel Laureates. New England Journal of Medicine, 367 (16), 1562-1564. DOI: 10.1056/NEJMon1211064

Cultural Evolution: Headspace with Dr. Kenny Smith

Edinburgh’s student radio station, Fresh Air, has a show called “Headspace” which aims to discuss ideas related to how we perceive, act, learn, communicate and think. Today’s episode was all about Cultural Evolution and features an extended discussion with Kenny Smith.

Readers can also listen to our very own Rachael Bailes talking about Animal Learning a couple of weeks ago:

The Simming Problem

I’m currently reading Iain M. Banks’ latest Culture novel The Hydrogen Sonata (quotes, but no spoilers ahead).  It has a discussion of the ethics of simulating individuals, what Banks calls the Simming Problem.  As someone who uses modelling to study cultural evolution, it struck a chord.  Those who’ve read Culture novels will be familiar with this kind of issue, but for the non-initiated, the Culture is a hyper-advanced race with out-of-this-galaxy Artificial Intelligences called Minds who usually end up tangling with the affairs of other, lesser cultures in times of crisis.  A useful tool would be the simulation of events to pick the best course of action.  However, this brings with it some ethical concerns:

“Most problems, even seemingly really tricky ones, could be handled by simulations which happily modelled slippery concepts like public opinion or the likely reactions of alien societies by the appropriate use of some especially cunning and devious algorithms… nothing more processor-hungry than the right set of equations…

But not always.  Sometimes, if you were going to have any hope of getting useful answers, there really was no alternative to modelling the individuals themselves, at the sort of scale and level of complexity that meant they each had to exhibit some kind of discrete personality, and that was where the Problem kicked in.

Once you’d created your population of realistically reacting and – in a necessary sense – cogitating individuals, you had – also in a sense – created life.  The particular parts of whatever computational substrate you’d devoted to the problem now held beings; virtual beings capable of reacting so much like the back-in-reality beings they were modelling – because how else were they to do so convincingly without also hoping, suffering, rejoicing, caring, living and dreaming?

By this reasoning, then, you couldn’t just turn off your virtual environment and the living, thinking creatures it contained at the completion of a run or when a simulation had reached the end of its useful life; that amounted to genocide.”

Uh oh.  Given the number of simulations I’ve ended, this might make me pretty much a war criminal.  Seriously, though, even if this is a genuine problem, we’re so far away from modelling at this level it’s not worth losing any sleep over. This topic is linked to a lot of philosophical work on what constitutes life and sentience.  People have even developed ethics codes for simulationists.  The same issue is raised in Open problems in artificial life, which discusses biological and virtual life (Bedau et al. 2000, p. 375):

“It is worth noting that public protocols govern the responsible treatment of human and animal research subjects. The lack of analogous protocols in artificial life may be no serious problem today, but as we create more sophisticated living entities we will have to face the responsibility of treating them appropriately.”

Banks also discusses two other problems:  If you make your simulations too realistic, you’re basically left with the same problem as in reality.  Finally, there’s the Chaos Problem – basically that even if you didn’t think that simulated beings really had a right to life and you ran loads of simulations, different runs of the same simulation might give you different results.  There’s no telling which one will actually match up with reality.  The alternative is to

“access the summed total of galactic history and analyse, compare and contrast the current situation relative to similar ones from the past… Its official title was Constructive Historical Integrative Analysis.”

… Sounds a lot like Bayesian Phylogenetics to me.  The questions raised here, though, are often the source of conflict between approaches to cultural evolution studies.  How useful are abstract models?  How do we interpret the results of abstract models? How valid is it to model populations rather than individuals?  How useful is it to model things as realistically as possible? Should we only be using real data?  How do we integrate real data and abstract models?

But of course, Banks has the last word:

“In the end, though, there was another name the Minds used, amongst themselves, for this technique, which was Just Guessing.”

The final correlation: Bayesian Causal Graphs as an alternative to Phylogenetics

I vowed never to look at any more spurious correlations.  But there is time for one final foray into the world of acacia trees and traffic accidents.

Some of my previous posts showed correlations between bizarre variables such as alcohol consumption and morphological complexity, acacia trees and tonal languages, and the sonority of a language and the amount of extramarital sex that its speakers indulge in.  Of course, the point was that cultural variables are likely to be correlated, even when they are not causally related, because of the way they spread.  As populations migrate, they bring whole bundles of cultural features with them.  See the article I wrote with James about this here.

There are some methods that try to account for this, such as Bayesian Phylogenetics.  However, these models are quite complicated and can take a lot of time to run.  Mostly, they are concerned with one or two cultural traits that we have some reason to think are linked.  However, what do we do if we’re not even sure what we should be controlling for?

One possible method is Bayesian causal graphs.  This is a method of figuring out the most likely causal graph given the correlations between variables.  R has a package to build causal graphs called pcalg which is quite straightforward to use (manual here).  I loaded up as many variables as I could and aggregated them by country.  The causal graph is calculated by computing the partial correlations between all sets of variables, then figuring out which ones are most significant.  Here’s what I came up with (visualised with Gephi):

Some interesting things come out.  First, some elements make intuitive sense, like the contemporary pathogen prevalence relying on the historical pathogen prevalence and the Gini coefficient (a measure of income inequality within the country).  Variables like the number of frost days, mean growing season and mean temperature are linked.

Interestingly, this analysis suggests that linguistic diversity and road fatalities are not causally linked, although there is a strong correlation between them.  Also, tonal languages and the presence of acacia trees are not causally linked.  This is good news!
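To make the idea of a strong correlation that does not become a causal link a bit more concrete, here is a minimal Python sketch.  It is not the pcalg implementation: it only conditions on one other variable at a time, and it uses a fixed cut-off on the size of the (partial) correlation rather than a significance test.  The variable names, the numbers and the 0.3 threshold are all made up for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def skeleton(data, threshold=0.3):
    """Undirected links that survive conditioning on any one other variable."""
    names = sorted(data)
    r = {frozenset(p): pearson(data[p[0]], data[p[1]])
         for p in combinations(names, 2)}
    links = set()
    for pair, r_pair in r.items():
        if abs(r_pair) < threshold:
            continue  # not even correlated to begin with
        a, b = sorted(pair)
        explained_away = any(
            abs(partial_corr(r_pair, r[frozenset((a, c))], r[frozenset((b, c))])) < threshold
            for c in names if c not in pair
        )
        if not explained_away:
            links.add(pair)
    return links


# Made-up toy data: "gdp" drives both other variables, which are
# otherwise unrelated to each other.
toy = {
    "gdp":       [1, 2, 3, 4, 5, 6, 7, 8],
    "chocolate": [2, 1, 2, 5, 6, 5, 6, 9],
    "laureates": [2, 3, 2, 3, 4, 5, 8, 9],
}

print(round(pearson(toy["chocolate"], toy["laureates"]), 2))  # 0.84
print(sorted(tuple(sorted(link)) for link in skeleton(toy)))
# [('chocolate', 'gdp'), ('gdp', 'laureates')]
```

In the toy data, chocolate and laureates correlate at 0.84, but the link disappears once gdp is controlled for, so only the two links to gdp remain.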

5-HTTLPR is a genotype that has been linked to collectivism.  Me and James worked on a hypothesis that the link between these two things came about because of a difference in migration patterns.  The current graph suggests that there is no direct causal link between collectivism and the 5-HTTLPR genotype, but they are linked through the levels of current migration.  The link with population density and long/short term orientation also fits with our hypothesis of more independent people migrating into harsher climates.  Me and James are currently working on a paper that uses statistics and modelling to argue this case.

Overall, the following picture falls out.  Ecological factors, such as the availability of water, dictate the kind of interaction dynamics that are prioritised between people.  This leads to different kinds of communication pressures which changes language in different ways:

There may be feedback in the other direction, as well.  That is, coevolution!

While the graph above looks impressive and makes sense in some areas, it is quite unstable.  Taking out some variables or cases leads to different links.  Some links are more stable than others.  For example, the link between water availability and masculinity was quite robust.

It may be possible to do a kind of leave-one-out analysis, create lots of causal graphs and then bootstrap a most likely causal graph.  That is, each edge would be probabilistic, just like branches on a phylogenetic tree.  The advantage is that the graphs take almost no time to compute, even with relatively large datasets.  Also, the graphs get more accurate the more variables you put in.
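Here is a rough Python sketch of what such a leave-one-out analysis might look like.  To keep it short, the graph estimator is only a stand-in that links two variables when their correlation passes a threshold; a real analysis would plug in a proper causal graph routine instead.  The variable names, values and threshold are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(cases, threshold=0.5):
    """Stand-in graph estimator: link two variables if |r| >= threshold."""
    names = sorted(cases[0])
    columns = {name: [case[name] for case in cases] for name in names}
    return {
        frozenset((a, b))
        for a, b in combinations(names, 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    }


def leave_one_out_support(cases, estimate_links):
    """Share of leave-one-out runs in which each link appears."""
    counts = Counter()
    for i in range(len(cases)):
        subset = cases[:i] + cases[i + 1:]  # drop case i (e.g. one country)
        counts.update(estimate_links(subset))
    return {link: n / len(cases) for link, n in counts.items()}


# Made-up cases: the last one is an outlier that creates the
# apparent links between "c" and the other two variables.
cases = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 2, "b": 3, "c": 1},
    {"a": 3, "b": 5, "c": 2},
    {"a": 4, "b": 6, "c": 1},
    {"a": 5, "b": 8, "c": 3},
    {"a": 20, "b": 30, "c": 20},
]

support = leave_one_out_support(cases, correlated_pairs)
for link, share in sorted(support.items(), key=lambda item: sorted(item[0])):
    print(sorted(link), round(share, 2))
```

The link between a and b turns up in every run, while the links to c vanish as soon as the outlier is left out.  The resulting shares play the role of support values on the branches of a phylogenetic tree.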

And now I really, really promise not to do any more spurious correlations.

… although keep an eye out for an article by me and James in the New England Journal of Medicine on a correlation between chocolate consumption and serial killers.

 

Roberts, S. & Winters, J. (2012) Constructing Knowledge: Nomothetic approaches to language evolution. In L. McCrohon, T. Fujimura, K. Fujita, R. Martin, K. Okanoya, R. Suzuki, N. Yusa, Five Approaches to Language Evolution: Proceedings of the Workshops of the 9th International Conference on the Evolution of Language. Evolang9 Organizing Committee

Workshop on Investigating Protolanguage, Utrecht

Timed to coincide with Marieke Schouwstra’s PhD defense, there will be a workshop on investigating protolanguage at the University of Utrecht, the Netherlands on the 30th November.  Here’s the blurb:

What did language look like in the early stages of language evolution? Recently, researchers have started to investigate this question in the lab, using human participants. From these experiments, conclusions have been formulated about the properties of protolanguage (an intermediate stage in the emergence of human language). What are the different accounts of protolanguage? And does it make sense to talk about an intermediate stage in the emergence of language?

More information here:

http://www.phil.uu.nl/~mariekes/workshop-protolanguage/index.html

PLM2012 Coverage: Dirk Geeraerts: Corpus Evidence for Non-Modularity

The first plenary talk at this year’s Poznań Linguistic Meeting was by Dirk Geeraerts, who is professor of linguistics at the University of Leuven, Belgium.

In his talk, he discussed the possibility that corpus studies could yield evidence against the supposed modularity of language and mind endorsed by, for example, Generative linguists (you can find the abstract here).

Geeraerts began his talk by stating that there seems to be a paradigm shift in linguistics from an analysis of structure that is based on introspection to analyses of behaviour based on quantitative linguistic studies. More and more researchers are adopting quantified corpus-based analyses, which test hypotheses using statistical testing of language behaviour. As a data-set they use experimental data or large corpora.

Multifactoriality

One further trend Geeraerts identified in this paradigm shift is that these kinds of analyses become more and more multifactorial in that they include multiple different factors which are both internal and external to language. Importantly, this way of doing linguistics is fundamentally different than the mainstream late 20th century view of linguistics.

What is important to note here when comparing this trend to other approaches to studying language is that multifactoriality goes against Chomsky’s idea of grammar as an ideal mental system that can be studied through introspection. In the traditional view, it is supposed that there is some kind of ideal language system which everyone has access to. This line of reasoning then justifies introspection as a method of studying the whole system of language and making valid generalizations about it. However, this goes against the emerging corpus linguistic view of language. On this view a random speaker is not representative of the linguistic community as a whole. The linguistic system is not homogeneous across all speakers, and therefore introspection doesn’t suffice.

Modularity

The main thrust of Geeraerts’ talk was that research within this emerging paradigm might also call into question the assumption of the modularity of the mind (as advocated, for example, by Jerry Fodor or Neil Smith): the view of the mind as a compartmentalized system consisting of discrete components or modules (for example, the visual system, language) plus a central processor.


Evolutionary Linguistics conferences in Beijing and Geneva

Two recent calls for papers in evolutionary linguistics:

Conference in Evolutionary Linguistics 2012.  November 9th-11th, 2012, Peking University. Submission deadline: September 1st.

The keynote speakers include Prof. William S.-Y. Wang, William Labov and Morten Christiansen.

Session on Origin of language and human cognition at the International Congress of Linguistics. July 22nd-27th, Geneva. Submission deadline: September 1st.

I found out about these through MusiCoLinguistics.  Confusingly, some publicised calls for the Peking University conference link to the conference in Geneva.

Visualising language similarity through translation statistics

A tweet put me on to UNESCO’s Index Translationum – World Bibliography of Translation.  It’s a list of books that have been translated from one language into another.  I wondered if there was a way to use this to look at language similarity which took bilingualism into account.  Essentially, if two languages are very different and there are few bilingual speakers, then there should be a lot of translations.  If two languages are spoken bilingually by many people, then there should be less cause for translations.  Of course, there are economic, cultural and political factors, too, but let’s see how far we can get.  Here’s a visualisation of the data using Gephi:


At first, some predictions are not borne out.  There are 3616 publications translated from Spanish into Catalan, while there are 9244 publications from Spanish into English.  This suggests that Spanish and Catalan are closer.  Of course, there are only 12 publications translated from Spanish to Hindi, but it’s unlikely that this is being caused by a large Spanish-Hindi bilingual community.  That is, low numbers could mean no need to translate because of language similarity, or that there is no economic or cultural incentive for translating between them (or lack of data).

Still, we can put the translation matrix into a clustering algorithm and create a cluster diagram.  Using the inverse of the (log) number of publications as a distance measure (so that languages with lots of translated books are closer), we get some sensible clusterings:

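As a small illustration of the distance measure described above, here is a Python sketch that applies it to the three Spanish counts quoted earlier.  It assumes a natural log (the base isn't stated) and only looks at one direction of translation; a clustering algorithm would need a full symmetric matrix, and how the two directions should be combined is left open here.  The function name is just for illustration.

```python
import math

# Counts quoted in the post: publications translated from Spanish
# into each of these languages.
from_spanish = {"Catalan": 3616, "English": 9244, "Hindi": 12}


def translation_distance(count):
    """Inverse of the (natural) log of the number of translated publications."""
    if count < 2:
        # log(1) = 0 would mean dividing by zero; 0 has no log at all.
        raise ValueError("need at least 2 publications")
    return 1 / math.log(count)


distances = {lang: translation_distance(n) for lang, n in from_spanish.items()}

# Languages with more translations come out as closer to Spanish.
ranking = sorted(distances, key=distances.get)
print(ranking)  # ['English', 'Catalan', 'Hindi']
```

Because of the log, the gap between English and Catalan is small compared with the gap between either of them and Hindi.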