Future tense and saving money: no correlation when controlling for cultural evolution

This week our paper on future tense and saving money is published (Roberts, Winters & Chen, 2015).  In this paper we test a previous claim by Keith Chen about whether the language people speak influences their economic decisions (see Chen’s TED talk here or paper).  We find that at least part of the previous study’s claims are not robust to controlling for historical relationships between cultures. We suggest that large-scale cross-cultural patterns should always take cultural history into account.

Does language influence the way we think?

There is a longstanding debate about whether the constraints of the languages we speak influence the way we behave. In 2012, Keith Chen discovered a correlation between the way a language allows people to talk about future events and their economic decisions: speakers of languages which make an obligatory grammatical distinction between the present and the future are less likely to save money.


Phylogenetics in linguistics: the biggest intellectual fraud since Chomsky?

A few weeks ago, Roger Blench gave a talk at the MPI entitled ‘New mathematical methods’ in linguistics constitute the greatest intellectual fraud in the discipline since Chomsky.  The title is controversial, to say the least!  The talk argued, amongst other things, that phylogenetic methods are less transparent and less replicable than traditional historical reconstruction.  Here I argue against those points.

I felt like I should respond online, because Roger Blench made the talk slides available online (a similar set of arguments is expressed more fully here).


Rice, collectivism and cultural history

Today I published a short commentary on a recent paper which found correlations between rice growing and collectivism (Talhelm et al., 2014).  We’ve written about collectivism before (and here).  However, while this may sound like a spurious correlation, there’s more to it:  The theory is that communities which engage in more intensive practices, and therefore require help and collaboration of others, are biased towards a collectivist attitude (as opposed to an individualist attitude).  Rice growing is more intensive than wheat growing, and requires more extensive irrigation, both of which may require collaboration from neighbours.

The really interesting thing about Talhelm et al.’s study is that they look at data within a single country – China.  They also find correlations at the county level: Neighbouring counties which differ in the proportion of rice grown (the so-called rice-wheat border) differ in a range of sociological measures of individualism.

Still, the study did not directly control for possible shared history – either of farming practices or social attitudes.  I was recently a reviewer for another commentary on the paper, and decided to look a little deeper.


Evolang 11: Call for papers

The next Evolution of Language Conference will take place in New Orleans on March 21-24, 2016.  The call for papers is now open.

The deadline for submissions is September 4th.  See the call for papers for more details.

This year there are some notable changes, including double blind reviewing, electronic proceedings and the possibility of adding supplementary materials.

I’m looking forward to it already!

Causality in linguistics: Nodes and edges in causal graphs

This coming week I’ll be at the Causality in the Language Sciences conference.  One of the topics of discussion will be how to integrate theories of causality into linguistic work.  Bayesian Causal Graphs are a core approach to causality, and seem like a useful framework for thinking about linguistic problems.  However, it’s not entirely clear whether all questions in linguistics can be represented using causal graphs.  In this post, I’ll discuss some possible uses of Bayesian Causal Graphs, and test the fit of some actual data to some causal structures.  (And please forgive my basic understanding of causality theory!)

Causal graphs are composed of states connected by edges.  A change or activation of a state causes a change in another.  States and causes can be categorical and absolute, or statistical and even complex in their relations.  Causal graphs are often introduced with the following kind of structure, taken from Pearl’s seminal book on Causality.  The season causes it to rain (in winter) and causes the sprinkler to come on (in summer).  Both the sprinkler being on and rain independently cause the grass to be wet.  If the grass is wet, the grass becomes slippery:

Screen Shot 2015-04-11 at 22.06.52

This example is easy to understand because each state is binary and (in this simple world) each causal effect is immediate and direct.  However, finding a similar example for linguistics is tricky.  Linguists may simply not agree on what the nodes are or what the edges represent.
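As a rough sketch, the sprinkler graph described above can be written down as a small directed graph in Python.  The node names and the two helper functions (`descendants` and `parents`) are illustrative choices, not taken from Pearl's book or from any particular causal-graph library.

```python
# Pearl's sprinkler example as a directed acyclic graph.
# Keys are causes; values are the states they directly affect.
CAUSAL_GRAPH = {
    "season": ["rain", "sprinkler"],
    "rain": ["wet"],
    "sprinkler": ["wet"],
    "wet": ["slippery"],
    "slippery": [],
}


def descendants(graph, node):
    """Return every state that `node` can influence, directly or indirectly."""
    seen = set()
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def parents(graph, node):
    """Return the direct causes of `node`."""
    return {cause for cause, effects in graph.items() if node in effects}


if __name__ == "__main__":
    print(sorted(descendants(CAUSAL_GRAPH, "season")))
    print(sorted(parents(CAUSAL_GRAPH, "wet")))
```

Writing the graph out like this makes the assumptions explicit: rain and the sprinkler share a cause (the season) but neither appears among the other's descendants, so neither can influence the other.  Agreeing on an equivalent list of nodes and edges is exactly the step that is hard for linguistic examples.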


How spurious correlations arise from inheritance and borrowing (with pictures)

James and I have written about Galton’s problem in large datasets.  Because two modern languages can have a common ancestor, the traits that they exhibit aren’t independent observations.  This can lead to spurious correlations: patterns in the data that are statistical artefacts rather than indications of causal links between traits.

However, I’ve often felt like we haven’t articulated the general concept very well.  For an upcoming paper, we created some diagrams that try to present the problem in its simplest form.

Spurious correlations can be caused by cultural inheritance 

Gproblem2

Above is an illustration of how cultural inheritance can lead to spurious correlations.  At the top are three independent historical cultures, each of which has a bundle of various traits which are represented as coloured shapes.  Each trait is causally independent of the others.  On the right is a contingency table for the colours of triangles and squares.  There is no particular relationship between the colour of triangles and the colour of squares.  However, over time these cultures split into new cultures.  Along the bottom of the graph are the currently observable cultures.  We now see a pattern has emerged in the raw numbers (pink triangles occur with orange squares, and blue triangles occur with red squares).  The mechanism that brought about this pattern is simply that the traits are inherited together, with some combinations replicating more often than others: there is no causal mechanism whereby pink triangles are more likely to cause orange squares.
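The inheritance scenario can also be sketched as a toy simulation.  The trait values and the number of descendants per ancestor below are made up for illustration (they are not read off the diagram), and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical ancestral cultures: (triangle colour, square colour).
ANCESTORS = [("pink", "orange"), ("blue", "red"), ("pink", "red")]
# Hypothetical number of present-day cultures descended from each ancestor.
N_DESCENDANTS = [4, 4, 1]


def inherit(ancestors, n_descendants):
    """Each descendant copies its ancestor's whole bundle of traits unchanged."""
    modern = []
    for traits, n in zip(ancestors, n_descendants):
        modern.extend([traits] * n)
    return modern


def contingency(cultures):
    """Count how often each (triangle, square) combination occurs."""
    return Counter(cultures)


if __name__ == "__main__":
    modern = inherit(ANCESTORS, N_DESCENDANTS)
    print("ancestral:", dict(contingency(ANCESTORS)))
    print("modern:   ", dict(contingency(modern)))
```

No step in the simulation lets a triangle colour affect a square colour, yet the modern counts are dominated by two combinations.  The nine modern cultures look like nine observations, but they reflect only three independent historical events.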

Spurious correlations can be caused by borrowing

Gproblem_HorizontalB

Above is an illustration of how borrowing (or areal effects or horizontal cultural inheritance) can lead to spurious correlations.  Three cultures (left to right) evolve over time (top to bottom).  Each culture has a bundle of various traits which are represented as coloured shapes.  Each trait is causally independent of the others.  On the right is a count of the number of cultures with both blue triangles and red squares.  In the top generation, only one out of three cultures has both.  Over some period of time, the blue triangle is borrowed from the culture on the left to the culture in the middle, and then from the culture in the middle to the culture on the right.  By the end, all cultures have blue triangles and red squares.  The mechanism that brought about this pattern is simply that one trait spread through the population: there is no causal mechanism whereby blue triangles are more likely to cause red squares.
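The borrowing scenario can be sketched in a few lines of Python.  The starting triangle colours of the middle and right-hand cultures are invented for illustration, and the function names are hypothetical.

```python
# Three hypothetical cultures, left to right: (triangle colour, square colour).
# Only the left-hand culture starts with both a blue triangle and a red square.
GENERATION_0 = [("blue", "red"), ("green", "red"), ("yellow", "red")]


def borrow_triangle(cultures, donor, recipient):
    """Return a new generation in which `recipient` copies the donor's triangle."""
    new_generation = list(cultures)
    new_generation[recipient] = (cultures[donor][0], cultures[recipient][1])
    return new_generation


def count_both(cultures, triangle="blue", square="red"):
    """Count cultures that have both the given triangle and square colours."""
    return sum(1 for t, s in cultures if t == triangle and s == square)


if __name__ == "__main__":
    generation_1 = borrow_triangle(GENERATION_0, donor=0, recipient=1)
    generation_2 = borrow_triangle(generation_1, donor=1, recipient=2)
    for generation in (GENERATION_0, generation_1, generation_2):
        print(generation, count_both(generation))
```

Only the triangle is ever copied, and the square colour never changes, so the growing co-occurrence count is produced entirely by one trait spreading between neighbours.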

A similar effect would be caused by a bundle of causally unrelated features being borrowed, as shown below.

Gproblem_Horizontal

Persister: A sci-fi novel about cultural evolution and academic funding

Someone has written a sci-fi space opera about a serial killer that targets researchers of cultural evolution which is also a satire on the state of academic funding systems.

That’s quite an action-packed sentence.

Persister: Space Funding Crisis I by Casey Hattrey is a short novel set in the 45th century about a cultural evolution researcher named Arianne. By this point, the decision process for academic funding takes so long that the only sensible option is to cryogenically freeze yourself while you wait for the decision to come in. The cost of this, and the fierce competition for funding in a pan-galactic community, has made the Central Academic Funding Council Administration the most powerful force in the galaxy. Now, Arianne has been woken from cryo-sleep, not to be given a grant, but to investigate a series of gruesome murders. Someone has been killing the top researchers in the field of cultural evolution.

"In space, no one can hear you apply for funding"


Tone and Humidity: FAQ

Everett, Blasi & Roberts (2015) review literature on how inhaling dry air affects phonation, suggesting that lexical tone is harder to produce and perceive in dry environments.  This leads to a prediction that languages should adapt to this pressure, so that lexical tone should not be found in dry climates, and the paper presents statistical evidence in favour of this prediction.

Below are some frequently asked questions about the study (see also the previous blog post explaining the statistics).


Tone and humidity

Does the weather affect the languages we speak?

This week, Caleb Everett, Damian Blasi and I have a paper out in PNAS (also available here) on the effects of humidity on the production and perception of lexical tone, and the subsequent predictions about the distribution of tone across the world.

Screen Shot 2015-01-14 at 17.27.53
Map of humidity (lighter = more humid) with complex tone languages in red and non-complex tone languages in blue.

The basic principle behind studies of cultural evolution is that a selective pressure on communication can transform the structures of a language over time.  What we explore is whether speaking in dry environments exerts a pressure to avoid using sounds that are more difficult to produce or comprehend, leading to those sounds being selected against.

Edit: See also this FAQ page


Serotonin and short-term/long-term orientation

This week I discovered that an analysis using Causal Graphs that James and I did in 2013 has been backed up by more recent data.  This demonstrates the power of Causal Graph analysis, which we’ll be discussing in our workshop on Causality in the Language Sciences (submission deadline extended!)

A recent paper demonstrates a correlation between various genetic factors and life history strategies (Minkov & Bond, 2015).  Minkov & Bond find that the prevalence of three gene polymorphisms (5-HTTLPR serotonin transporter gene, the androgen receptor gene AR and the dopamine receptor gene DRD4) correlates with measures of how willing people are to take risks, such as long-term/short-term orientation.

Screen Shot 2015-01-14 at 09.34.06

We’ve written before about 5-HTTLPR (here and here), which was previously associated with individualism/collectivism.  However, the paper above, and a previous one in 2014 by Minkov, Blagoev & Bond, find that the correlation is stronger for long-term/short-term orientation.

What’s interesting for us is that James and I predicted this in our 2013 paper on spurious correlations (the one with acacia trees and traffic accidents).  Here’s figure 4 from our paper, which was generated using a causal graph algorithm (explained in more detail in this post):

Screen Shot 2015-01-14 at 09.29.09

The relevant part is here, which predicts that 5-HTTLPR prevalence is causally related to Long-term/short-term orientation, but is causally independent from collectivism:

Screen Shot 2015-01-14 at 09.30.56

We suggested that the relationship between 5-HTTLPR and collectivism is mediated by the probability of migrating into harsher climates (a kind of risk-taking), and produced a computational model to demonstrate the principle (we also did some analyses which showed that measures of climate are correlated with 5-HTTLPR, but we haven’t reported these).

The more recent papers above also suggest that the genetic traits are linked with long-term/short-term orientation, but did so by greatly expanding the sample of genetic prevalence.  So how did we get our result?  In our analysis, we averaged 5-HTTLPR prevalence across countries, which is not realistic.  This makes me worried that the correlations are being inflated by non-independence of the samples.

The authors are confident of the robustness of the correlation:

“If all these associations were spurious, their association would be miraculous, especially at the national-regional level. If there is no real association between the LHSGF and the reported measures of LHS and TO, what then explains the extremely high correlations?”

However, as our paper argues, spurious correlations are more likely when datapoints are linked through historical descent or borrowing (Galton’s problem).  In the case of this paper, genetic traits are obviously historically related, and it’s likely that cultural values and life history strategies are also culturally transmitted.

I tried testing whether the correlation is robust to historical or contact relationships.  I used geographic proximity as a proxy for how closely related different cultures are.  For each country, I found the geographic coordinates of the capital city.  The graphs below demonstrate that there’s at least some geographic clustering (and a hint of a founder effect for the genetic data, as predicted by our migration model):

Geographic distribution of the genetic index
Geographic distribution of the life history strategy index

I then calculated the distance between each pair of countries in terms of geography (great circle distance), the national life history strategy index and the genetic factors.  (For the genetic factors, I did a principal components analysis, as in Minkov & Bond, and used the first component, which had an eigenvalue of 2.62 and explained 65.5% of the variance, compared to Minkov & Bond’s 2.04 and 68%.)
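For readers who want to reproduce the distance step, here is a minimal sketch in Python.  It assumes the haversine formula on a spherical Earth for the great circle distance and an absolute difference for the two index distances.  The original analysis script may have computed these differently, and the function names and capital-city coordinates are illustrative.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; assumes a spherical Earth


def great_circle_miles(a, b):
    """Haversine distance in miles between two (latitude, longitude) points in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def absolute_difference(x, y):
    """Distance between two countries on a one-dimensional index."""
    return abs(x - y)


def distance_matrix(items, distance):
    """Square matrix of pairwise distances between all items."""
    return [[distance(p, q) for q in items] for p in items]


if __name__ == "__main__":
    # Approximate coordinates of three capital cities (illustrative only).
    capitals = [(51.5074, -0.1278), (48.8566, 2.3522), (35.6762, 139.6503)]
    for row in distance_matrix(capitals, great_circle_miles):
        print([round(d) for d in row])
```

The same `distance_matrix` helper can be reused with `absolute_difference` on a list of index scores, which gives one matrix per measure with countries in the same order.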

This gives us three distance matrices:  distance in miles, distance in life history strategy and distance in genetic traits.  I then used a Mantel test to compare these.
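A Mantel test correlates the corresponding entries of two distance matrices and estimates significance by shuffling the rows and columns of one matrix together.  The sketch below is a plain-Python illustration of that idea, not the script used for the results in this post.  The function names, the one-sided test and the default of 999 permutations are assumptions.

```python
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permute(matrix, order):
    """Reorder rows and columns together, i.e. relabel the countries."""
    return [[matrix[i][j] for j in order] for i in order]


def mantel(a, b, n_perm=999, seed=0):
    """Return (r, one-sided p) for the correlation between matrices a and b."""
    rng = random.Random(seed)
    vector_b = upper_triangle(b)
    observed = pearson(upper_triangle(a), vector_b)
    order = list(range(len(a)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(upper_triangle(permute(a, order)), vector_b) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)
```

Permuting whole rows and columns, rather than individual cells, keeps each country's set of distances intact.  That is what lets the test cope with the fact that the entries of a distance matrix are not independent of each other.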

Genetic and life history measures are correlated (r = 0.88, p < 0.0001), as in the paper above (in the regression, r = 0.78-0.84).  Both the genetic and life history measures were correlated with geographic distance (r = 0.36, p < 0.0001; r = 0.27, p = 0.0003), which suggests that they are not independent (i.e. a country is likely to be more similar to its neighbour than a distant culture).

However, there is still a significant correlation between genetic and life history measures when controlling for geographic distance (r = 0.87, p = 0.0001).  In fact, the correlation is barely affected at all when partialling out the geographic distance.
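Controlling for geography can be done with a partial Mantel test.  One common formulation, sketched below, computes the partial correlation between two distance matrices given a third and permutes one of the matrices to get a p-value.  This is an illustration under those assumptions rather than the analysis script itself, and all function names and example values are hypothetical.

```python
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_r(a, b, c):
    """Correlation between vectors a and b after partialling out c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / ((1 - r_ac ** 2) * (1 - r_bc ** 2)) ** 0.5


def partial_mantel(a, b, c, n_perm=999, seed=0):
    """Return (partial r, one-sided p) for matrices a and b, controlling for c."""
    rng = random.Random(seed)
    vector_b, vector_c = upper_triangle(b), upper_triangle(c)
    observed = partial_r(upper_triangle(a), vector_b, vector_c)
    order = list(range(len(a)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)
        shuffled = [[a[i][j] for j in order] for i in order]
        if partial_r(upper_triangle(shuffled), vector_b, vector_c) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)


def abs_distances(values):
    """Pairwise absolute differences between one-dimensional scores."""
    return [[abs(p - q) for q in values] for p in values]


if __name__ == "__main__":
    # Hypothetical scores for seven countries (not the data from the post).
    genetic = [1, 4, 2, 8, 5, 9, 3]
    life_history = [1.1, 3.8, 2.3, 8.1, 4.7, 9.2, 3.1]
    location = [3, 1, 9, 2, 7, 5, 8]
    print(partial_mantel(abs_distances(genetic), abs_distances(life_history),
                         abs_distances(location)))
```

If the partial correlation stays close to the raw correlation, as it does in the results above, then geographic distance accounts for little of the association between the two measures.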

So, it appears that the correlation is somewhat robust to controlling for non-independence.  But will it play out in the long term?

Source data and analysis script: MikovBond_Mantel

Edit: Michael Minkov has been in touch, and argues that psychological phenomena, such as happiness, values, attitudes etc. can’t be borrowed across cultures.  They depend on particular economic conditions, which also can’t be borrowed in the same way that a word or an artefact can be borrowed.

Edit2: Above, I used raw distance, but log distance is probably a better measure.  Both genetic index and life history index are more strongly correlated with log geographic distance (r = 0.42, p < 0.0001; r = 0.35, p < 0.0001).  However, there’s not much difference in the correlation between genetic and life history measures when controlling for log geographic distance (r = 0.86, p < 0.0001).