Degeneracy emerges as a design feature in response to ambiguity pressures

Two weeks ago my supervisor, Simon Kirby, gave a talk on some of the work that’s been going on in the LEC. Much of his talk focused on one of the key areas in language evolution research: the emergence of the basic design features that underpin language as a system of communication. He gave several examples of these design features, mostly drawn from the eminent linguist, Charles Hockett, before moving on to one of the main areas of focus over the past few years: compositionality (the ability for complex expressions to derive their meaning from the combined meaning of their parts; see Michael’s post and Sean’s post for some good previous coverage). Simon’s argument is that compositionality, as well as some other design features of language, emerge from two competing constraints: a pressure to be useful (expressivity) and a pressure to be learned (compressibility).

The general gist of the talk was that by varying the relative pressures of these two constraints we can evolve very different systems of communication. To get something approaching language we thus need to reach a balance between learning and use. First, naïve learning is required because it forces language to adapt to the learning bottleneck imposed by the maturational constraints on child learning. Still, even with this inter-generational learning pressure, language isn’t merely a passive task of remembering and reproducing a set of forms and meanings. Instead, we also need to account for usage dynamics: here, the system must be expressive, insofar as its signals can differentiate between meanings within a language.

From Kirby, Cornish & Smith’s (2008) work we know that a language heavily biased towards maximal expressivity is very much like the initial generation of their experiments: there is an idiosyncratic set of one-form to one-meaning pairs without any systematic structure. It’s expressive because every possible meaning in the space has its own label. By contrast, a stronger bias towards learnability results in highly compressible languages: that is, we see highly underspecified systems of communication, with the most extreme example being one-form to all-meanings. The result of balancing these two forces over Iterated Learning (henceforth, IL) is the emergence of compositionality: a learnable, yet highly structured communication system that is the result of a pressure to generalise over a set of novel stimuli.
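To make the trade-off concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: the three made-up languages, the nonsense syllables, and the two crude measures (`expressivity` and `lexicon_size`) are not the stimuli or the measures used in Kirby, Cornish & Smith (2008).

```python
# Toy illustration only: the languages and both measures are hypothetical
# stand-ins, not the ones used in Kirby, Cornish & Smith (2008).
from itertools import product

SHAPES = ["square", "circle", "triangle"]
COLOURS = ["red", "blue", "black"]
MEANINGS = list(product(SHAPES, COLOURS))  # 9 meanings in total

# Each signal is a tuple of "morphemes" (made-up syllables).
# Holistic: one unrelated label per meaning.
holistic = {m: (f"w{i}",) for i, m in enumerate(MEANINGS)}
# Degenerate: one label for every meaning.
degenerate = {m: ("ba",) for m in MEANINGS}
# Compositional: one morpheme for the shape plus one for the colour.
shape_morph = {"square": "ki", "circle": "lo", "triangle": "nu"}
colour_morph = {"red": "pa", "blue": "te", "black": "mi"}
compositional = {(s, c): (shape_morph[s], colour_morph[c]) for s, c in MEANINGS}


def expressivity(language):
    """Fraction of meanings that can be told apart by their signal."""
    return len(set(language.values())) / len(language)


def lexicon_size(language):
    """Distinct morphemes a learner must memorise (lower = more compressible)."""
    return len({morph for signal in language.values() for morph in signal})


for name, lang in [
    ("holistic", holistic),
    ("degenerate", degenerate),
    ("compositional", compositional),
]:
    print(f"{name:14s} expressivity={expressivity(lang):.2f} "
          f"lexicon_size={lexicon_size(lang)}")
```

On these crude measures the holistic language is fully expressive but needs nine labels, the degenerate language needs one label but distinguishes nothing, and the compositional language stays fully expressive with only six morphemes.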

Cultural Evolution: Headspace with Dr. Kenny Smith

Edinburgh’s student radio station, Fresh Air, has a show called “Headspace” which aims to discuss ideas related to how we perceive, act, learn, communicate and think. Today’s episode was all about Cultural Evolution and features an extended discussion with Kenny Smith.

Readers can also listen to our very own Rachael Bailes talking about Animal Learning a couple of weeks ago:

The Simming Problem

I’m currently reading Iain M. Banks’ latest Culture novel The Hydrogen Sonata (quotes, but no spoilers ahead). It has a discussion of the ethics of simulating individuals, what Banks calls the Simming Problem. As someone who uses modelling to study cultural evolution, it struck a chord. Those who’ve read the Culture novels will be familiar with this kind of issue, but for the uninitiated, the Culture is a hyper-advanced civilisation with out-of-this-galaxy Artificial Intelligences called Minds who usually end up tangling with the affairs of other, lesser cultures in times of crisis. A useful tool would be the simulation of events to pick the best course of action. However, this brings with it some ethical concerns:

“Most problems, even seemingly really tricky ones, could be handled by simulations which happily modelled slippery concepts like public opinion or the likely reactions of alien societies by the appropriate use of some especially cunning and devious algorithms… nothing more processor-hungry than the right set of equations…

But not always.  Sometimes, if you were going to have any hope of getting useful answers, there really was no alternative to modelling the individuals themselves, at the sort of scale and level of complexity that meant they each had to exhibit some kind of discrete personality, and that was where the Problem kicked in.

Once you’d created your population of realistically reacting and – in a necessary sense – cogitating individuals, you had – also in a sense – created life.  The particular parts of whatever computational substrate you’d devoted to the problem now held beings; virtual beings capable of reacting so much like the back-in-reality beings they were modelling – because how else were they to do so convincingly without also hoping, suffering, rejoicing, caring, living and dreaming?

By this reasoning, then, you couldn’t just turn off your virtual environment and the living, thinking creatures it contained at the completion of a run or when a simulation had reached the end of its useful life; that amounted to genocide.”

Uh oh.  Given the number of simulations I’ve ended, this might make me pretty much a war criminal.   Seriously, even if this is a genuine problem, we’re so far away from modelling at this level it’s not worth losing any sleep over. This topic is linked to a lot of philosophical work on what constitutes life and sentience.  However, people have developed ethics codes for simulationists.  The Simming Problem is also mentioned in Open problems in artificial life, which discusses biological and virtual life (Bedau et al. 2006, p. 375):

“It is worth noting that public protocols govern the responsible treatment of human and animal research subjects. The lack of analogous protocols in artificial life may be no serious problem today, but as we create more sophisticated living entities we will have to face the responsibility of treating them appropriately.”

Banks also discusses two other problems:  If you make your simulations too realistic, you’re basically left with the same problem as in reality.  Finally, there’s the Chaos Problem – basically that even if you didn’t think that simulated beings really had a right to life and you ran loads of simulations, different runs of the same simulation might give you different results.  There’s no telling which one will actually match up with reality.  The alternative is to

“access the summed total of galactic history and analyse, compare and contrast the current situation relative to similar ones from the past… Its official title was Constructive Historical Integrative Analysis.”

… Sounds a lot like Bayesian Phylogenetics to me.  The questions raised here, though, are often the source of conflict between approaches to cultural evolution studies.  How useful are abstract models?  How do we interpret the results of abstract models? How valid is it to model populations rather than individuals?  How useful is it to model things as realistically as possible? Should we only be using real data?  How do we integrate real data and abstract models?

But of course, Banks has the last word:

“In the end, though, there was another name the Minds used, amongst themselves, for this technique, which was Just Guessing.”

Arguments against a “prometheus” scenario

The Biological Origin of Linguistic Diversity:

From some of the minds that brought you Chater et al. (2009) comes a new and exciting paper in PLoS ONE.

Chater et al. (2009) used a computational model to show that biological adaptations for language are impossible because language changes too rapidly through cultural evolution for natural selection to be able to act.

This new paper, Baronchelli et al. (2012), uses similar models to first argue that if language changes quickly then “neutral genes” are selected for because biological evolution cannot act upon linguistic features when they are too much of a “moving target”. Secondly they show that if language changes slowly in order to facilitate coding of linguistic features in the genome, then two isolated subpopulations who originally spoke the same language will diverge biologically through genetic assimilation after they linguistically diverge, which they inevitably will.

The paper argues that because we can observe so much diversity in the world’s languages, yet children can acquire any language they are immersed in, only the model which supports the selection of “neutral genes” is plausible. Because of this, a hypothesis in which domain-general cognitive abilities facilitate language is much more plausible than one that posits a biologically specified, special-purpose language system.

A Prometheus scenario:

Baronchelli et al. (2012) use the results of their models to argue against what they call a “Prometheus” scenario. This is a scenario in which “a single mutation (or very few) gave rise to the language faculty in an early human ancestor, whose descendants then dispersed across the globe.”

I wonder if “Prometheus scenario” is an established term in this context, because I can’t find much by googling it. It seems an odd term to use given that Prometheus was the Titan who “stole” fire and other cultural tools from the Gods to be used by humans. Since Prometheus was a Titan, he couldn’t pass his genes on to humans, and rather the beginning and proliferation of fire and civilization happened through a process of learning and cultural transmission. I know this is just meant to be an analogy, and presumably the Promethean aspect of it alludes to it happening suddenly, but I can’t help but feel that the term “Prometheus scenario” should be given to the hypothesis that language is the result of cultural evolution acting upon domain-general processes, rather than one which supports a genetically defined language faculty in early humans.

References

Baronchelli, A., Chater, N., Pastor-Satorras, R., & Christiansen, M. H. (2012). The biological origin of linguistic diversity. PLoS ONE, 7(10). PMID: 23118922

Chater, N., Reali, F., & Christiansen, M. H. (2009). Restrictions on biological adaptation in language evolution. Proceedings of the National Academy of Sciences, 106(4), 1015-1020.

Spontaneous Imitation of Human Speech

Lately, there has been a string of news articles about animals imitating human speech sounds. First, there was an account of the nine-year-old beluga whale named NOC who was recorded making unusually low, clipped bursts of noise. Then, today, news came from the University of Vienna of an Asian elephant named Koshik using his trunk to imitate Korean words. Koshik does attempt to match both the pitch and timbre of the human voice, though the researchers doubt his phrases carry any meaning beyond an attempt at social affiliation.

The more interesting aspect of NOC’s speech is that, unlike the dolphins that are trained to imitate human noises or computer generated whistles, it is the first recorded spontaneous imitation of human speech. Similarly, the marine animals previously studied were raised primarily in captivity. NOC is not only a wild beluga whale, but his speech was also recorded in the wild. The study, published in Current Biology, can be found here (only the abstract is available for free).

Links

http://www.sciencedaily.com/releases/2012/11/121101121534.htm

http://www.bbc.co.uk/news/science-environment-20026938

Sam Ridgway, Donald Carder, Michelle Jeffries, Mark Todd. Current Biology – 23 October 2012 (Vol. 22, Issue 20, pp. R860-R861)

Angela S. Stoeger, Daniel Mietchen, Sukhun Oh, Shermin de Silva, Christian T. Herbst, Soowhan Kwon, W. Tecumseh Fitch. “An Asian Elephant Imitates Human Speech.” Current Biology, 2012; DOI:10.1016/j.cub.2012.09.022

Taking the “icon” out of Emoticon

For some years now Simon Garrod and Nicolas Fay, among others, have been looking at the emergence of symbolic graphical signs out of iconic ones, using communication experiments which simulate repeated use of a sign.

Garrod et al. (2007) use a ‘pictionary’ style paradigm where participants have to graphically depict one of 16 concepts without using words, so that their partner can identify it. This process is repeated to see whether, with repeated usage, participants come to rely on the shared memory of the representation rather than the representation itself, to the point where an iconic depiction of an item could become an arbitrary, symbolic one.

Garrod et al. (2007) showed that simple repetition is not enough to allow an arbitrary system to emerge and that feedback and interaction are required between communicators. The amount of interaction afforded to participants was shown to affect the emergence of signs due to a process of grounding. The signs that emerged from this process of interaction were shown to be arbitrary as participants not involved directly in the interaction were shown to have trouble interpreting the outcome signs.

The experimental evidence then shows that icons do indeed evolve into symbols as a consequence of the  shared memory of the representation rather than the representation itself.  Which is all well and good, but can this process be seen in the real world? YES!

I was talking to a friend on Skype and he started typing repeated right round brackets:

))))))))

At first I just thought he had some problem with keys sticking on his keyboard, but after he did it two or three times I finally asked, and he explained that they were smileys. Upon further questioning, it seems that this has become a norm in Russian internet chat: their emoticons have lost their eyes, presumably through the same process that Garrod et al. (2007) showed above.

They have also created an intensification system based on this slightly more arbitrary symbol, whereby the more brackets you repeat, the happier or sadder you are. Among those in the UK and America, the need to intensify an emoticon has stayed well within the realms of iconicity, with : D meaning “very happy” and D: meaning “oh God, WHHHHHYYYYY”. Japan has a completely different emoticon system altogether which focusses on the eyes: ^_^ meaning happy and u_u meaning sad. Some have argued that this is because in Japan people tend to look to the eyes for emotional cues, whereas Americans tend to look to the mouth, as backed up by SCIENCE (Yuki et al. 2007).

I’d be interested to see if norms have been established in other countries, either iconic or not.

Refs

Garrod S, Fay N, Lee J, Oberlander J, & Macleod T (2007). Foundations of representation: where might graphical symbol systems come from? Cognitive science, 31 (6), 961-87 PMID: 21635324

Yuki, M., Maddux, W., & Masuda, T. (2007). Are the windows to the soul the same in the East and West? Cultural differences in using the eyes and mouth as cues to recognize emotions in Japan and the United States Journal of Experimental Social Psychology, 43 (2), 303-311 DOI: 10.1016/j.jesp.2006.02.004

The final correlation: Bayesian Causal Graphs as an alternative to Phylogenetics

I vowed never to look at any more spurious correlations.  But there is time for one final foray into the world of acacia trees and traffic accidents.

Some of my previous posts showed correlations between bizarre variables such as alcohol consumption and morphological complexity, acacia trees and tonal languages, and the sonority of a language and the amount of extramarital sex that its speakers indulge in. Of course, the point was that cultural variables are likely to be correlated, even when they are not causally related, because of the way they spread. As populations migrate, they bring whole bundles of cultural features with them. See the article I wrote with James about this here.

There are some methods that try to account for this, such as Bayesian Phylogenetics.  However, these models are quite complicated and can take a lot of time to run.  Mostly, they are concerned with one or two cultural traits that we have some reason to think are linked.  However, what do we do if we’re not even sure what we should be controlling for?

One possible method is Bayesian causal graphs. This is a method of figuring out the most likely causal graph given the correlations between variables. R has a package to build causal graphs called pcalg which is quite straightforward to use (manual here). I loaded up as many variables as I could and aggregated them by country. The causal graph is calculated by computing the partial correlations between pairs of variables, conditioned on sets of the other variables, then figuring out which links remain significant. Here’s what I came up with (visualised with Gephi):

Some interesting things come out.  First, some elements make intuitive sense, like the contemporary pathogen prevalence relying on the historical pathogen prevalence and the Gini coefficient (a measure of income inequality within the country).  Variables like the number of frost days, mean growing season and mean temperature are linked.

Interestingly, this analysis suggests that linguistic diversity and road fatalities are not causally linked, although there is a strong correlation between them.  Also, tonal languages and the presence of acacia trees are not causally linked.  This is good news!
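The difference between a strong correlation and a direct link can be illustrated with a first-order partial correlation. The following is a minimal Python sketch on simulated data, not the pcalg API and not the analysis behind the graph above: the variables `x`, `y` and `z` and the helper functions are hypothetical, and the real procedure conditions on larger sets of variables and applies significance tests.

```python
# Hypothetical sketch: x and y share a common cause z, so they correlate
# strongly, yet the link disappears once z is controlled for.
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling for a single variable zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # shared cause (e.g. population history)
x = [zi + rng.gauss(0, 1) for zi in z]    # trait 1 depends only on z
y = [zi + rng.gauss(0, 1) for zi in z]    # trait 2 depends only on z

print(f"correlation(x, y)       = {pearson(x, y):.3f}")
print(f"partial corr(x, y | z)  = {partial_corr(x, y, z):.3f}")
```

The raw correlation comes out at around 0.5 by construction, while the partial correlation given `z` is close to zero, which is the pattern that leads a graph-building procedure to drop the direct edge between the two traits.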

5-HTTLPR is a genotype that has been linked to collectivism.  Me and James worked on a hypothesis that the link between these two things came about because of a difference in migration patterns.  The current graph suggests that there is no direct causal link between collectivism and the 5-HTTLPR genotype, but they are linked through the levels of current migration.  The link with population density and long/short term orientation also fits with our hypothesis of more independent people migrating into harsher climates.  Me and James are currently working on a paper that uses statistics and modelling to argue this case.

Overall, the following picture falls out.  Ecological factors, such as the availability of water, dictate the kind of interaction dynamics that are prioritised between people.  This leads to different kinds of communication pressures which changes language in different ways:

There may be feedback in the other direction, as well.  That is, coevolution!

While the graph above looks impressive and makes sense in some areas, it is quite unstable.  Taking out some variables or cases leads to different links.  Some links are more stable than others.  For example, the link between water availability and masculinity was quite robust.

It may be possible to do a kind of leave-one-out analysis, create lots of causal graphs and then bootstrap a most likely causal graph.  That is, each edge would be probabilistic, just like branches on a phylogenetic tree.  The advantage is that the graphs take almost no time to compute, even with relatively large datasets.  Also, the graphs get more accurate the more variables you put in.
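A rough sketch of how that leave-one-out idea might look in Python is given below. It is only an assumption about how such an analysis could be set up: `learn_edges` is a deliberately crude stand-in (it links variables whose correlation exceeds a threshold) for a real causal graph learner such as the one in pcalg, and the data are invented.

```python
# Hypothetical sketch of leave-one-out edge support. learn_edges is a crude
# stand-in for a real causal graph learner; the data below are made up.
import math
from collections import Counter
from itertools import combinations


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def learn_edges(data, threshold=0.7):
    """Stand-in learner: link two variables if |r| exceeds the threshold."""
    return {
        (a, b)
        for a, b in combinations(sorted(data), 2)
        if abs(pearson(data[a], data[b])) > threshold
    }


def edge_support(data, learner=learn_edges):
    """Proportion of leave-one-out graphs in which each edge appears."""
    n_cases = len(next(iter(data.values())))
    counts = Counter()
    for left_out in range(n_cases):
        # Rebuild the dataset without one case (e.g. one country).
        subset = {
            var: vals[:left_out] + vals[left_out + 1:]
            for var, vals in data.items()
        }
        counts.update(learner(subset))
    return {edge: count / n_cases for edge, count in counts.items()}


data = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [2, 4, 5, 8, 10, 13, 14, 16],  # tracks "a" closely
    "c": [5, 1, 4, 2, 8, 3, 7, 6],      # only loosely related to "a" and "b"
}

for edge, support in sorted(edge_support(data).items()):
    print(edge, f"{support:.2f}")
```

Each edge then carries a support value between 0 and 1, which could be read in the same way as support values on the branches of a phylogenetic tree. Swapping `learner` for a proper causal graph routine would leave the rest of the loop unchanged.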

And now I really, really promise not to do any more spurious correlations.

… although keep an eye out for an article by me and James in the New England Journal of Medicine on a correlation between chocolate consumption and serial killers.

Roberts, S. & Winters, J. (2012) Constructing Knowledge: Nomothetic approaches to language evolution. In L. McCrohon, T. Fujimura, K. Fujita, R. Martin, K. Okanoya, R. Suzuki, N. Yusa, Five Approaches to Language Evolution: Proceedings of the Workshops of the 9th International Conference on the Evolution of Language. Evolang9 Organizing Committee

1st International Winter School on Evolution

I don’t think anyone’s posted this yet:

1st International Winter School on Evolution – March 11th – 15th, 2013 University of Lisbon

The International Winter School on Evolution aims to better prepare a future generation for inter- and transdisciplinary evolution research by providing courses on cutting edge research in biological and sociocultural evolutionary sciences for Master, Doctoral and Postdoctoral students. Emphasis lies on topics that are currently underrepresented in (post)graduate curricula.

International experts will teach 9 courses on critical aspects of biological and socio-cultural evolution. The Winter School courses are centred around the following themes:

  • Macroevolution and the major transitions
  • Symbiogenesis, lateral gene transfer and hybridization
  • Language evolution

Visiting speakers include:

  • Michael Arnold
  • Folmer Bokma
  • Bill Croft
  • Daniel Dor
  • William Martin
  • Eörs Szathmáry
  • Mónica Tamariz
  • Douglas P. Zook

More info here: http://evolutionschool.fc.ul.pt

Workshop on Investigating Protolanguage, Utrecht

Timed to coincide with Marieke Schouwstra’s PhD defense, there will be a workshop on investigating protolanguage at the University of Utrecht, the Netherlands, on the 30th of November. Here’s the blurb:

What did language look like in the early stages of language evolution? Recently, researchers have started to investigate this question in the lab, using human participants. From these experiments, conclusions have been formulated about the properties of protolanguage (an intermediate stage in the emergence of human language). What are the different accounts of protolanguage? And does it make sense to talk about an intermediate stage in the emergence of language?

More information here:

http://www.phil.uu.nl/~mariekes/workshop-protolanguage/index.html