Linguistic interactions in the UK

I just heard a talk by social network creator extraordinaire Clio Andris about redefining regional boundaries in the UK based on telecommunications data.  Her group took data from 12 billion telephone calls made over the space of a month and created a social network based on this (Ratti et al., 2010).  This network was then used to calculate how closely connected two neighbouring locations were.  By optimising the spectral modularity, the best-fitting boundaries could be defined.

Here’s a video demonstration:

The data is fascinating, but there is little explanation.  Here’s one of the maps (left) compared with a map of regional accents and a map of rail transport links (right):

A perceptual map of dialects, from Montgomery, C. (2007) Northern English Dialects: A perceptual approach, PhD thesis. pdf

 

A comparison of the two experiments.

One of the first things that struck me was the similarity with a map of regional accents (apologies for the quality of the accent map – I couldn’t find the one I was looking for).  Apparently, people are talking to people that sound like them.  Or, people who talk to each other sound like each other.  This isn’t covered in the paper, but seems like an important issue.

Secondly, the rail links also seem to form the ‘backbones’ of the communications regions.  This is also mentioned in the paper.  However, these two features are linked.

Coming from Wales, the important fit here is the three-way split in Wales.  South Wales feels like a different country to North Wales – culturally and linguistically.  However, both are linked by having large amounts of natural resources: coal in South Wales and slate in North Wales.  This led to massive migration into cities in the north and south, and rail links were set up to extract these resources to London or the nearest ports:  Cardiff in the south and Liverpool in the north.  Thus, it’s still a real pain to get from North Wales to South Wales.  The same picture is somewhat true of the east and west sides of the north of England.

So, the natural resources concentrated people and transport links.  However, they also concentrated political views.  The large migrant community in Wales, working for little pay in large mine institutions, became unionised.  Socialism emerged, promoting political movements that led to the minimum wage.

The point being, natural resources, transport links and politics are connected, with some being historically dependent on each other.  This is, perhaps, precisely why splitting the nation by who speaks to whom is a good measure of political regions.  It would be fascinating to see how linguistic divisions interact with these variables.

Ratti, Carlo, Sobolevsky, Stanislav, Calabrese, Francesco, Andris, Clio, Reades, Jonathan, Martino, Mauro, Claxton, Rob, & Strogatz, Steven H. (2010). Redrawing the map of Great Britain from a network of human interaction. PLoS ONE, 5

Creative cultural transmission as chaotic sampling

This post was chosen as an Editor's Selection for ResearchBlogging.org

Last week I attended a lecture by Liz Bradley on chaos.  Chaos has been used to create variations on musical and dance sequences (Dabby, 2008; Bradley & Stuart, 1998).  I was interested to see whether this technique could be iterated and applied to birdsong or other culturally transmitted systems.  I present a model of creative cultural transmission based on this.

Cultural Evolution and the Impending Singularity

Prof. Alfred Hubler is an actual mad professor who is a danger to life as we know it.  In a talk this evening he went from ball bearings in castor oil to hyper-advanced machine intelligence and from some bits of string to the boundary conditions of the universe.  Hubler suggests that he is building a hyper-intelligent computer.  However, will hyper-intelligent machines actually give us a better scientific understanding of the universe, or will they just spend their time playing Tetris?

Let him take you on a journey…

Categorising languages through network modularity

Today I’ve been learning more about network structure (from Cris Moore) and I’ve applied my poor understanding and overconfidence to find language families from etymology data!

Here’s what I understand so far (see Clauset, Moore, & Newman, 2008):  The modularity of a division of a network measures how well it separates the network into ‘communities’: it is high when more edges fall within modules than you would expect by chance.  The optimal division is the one that maximises this score.  In principle, you can search all the possible clusterings to find this optimum, though I’m still hazy on how this is actually done.  You can also extend this to find hierarchies like phylogenetics, but without some assumptions.  Luckily, there’s a network analysis program called Gephi that does this automatically!
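To make the idea concrete, here is a minimal sketch of scoring a division by modularity and brute-forcing the best two-way split.  It assumes the standard Newman definition (the fraction of edges inside each module minus the fraction expected by chance), and the six-node graph is a made-up toy, not the etymology data.  The function names are my own, and exhaustive search like this only works for tiny graphs; real tools rely on faster heuristics.

```python
from itertools import product


def modularity(edges, community_of):
    """Newman modularity of a division of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community_of: dict mapping node -> label.
    """
    m = len(edges)
    internal = {}    # number of edges with both ends in the same community
    degree_sum = {}  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


def best_two_way_split(nodes, edges):
    """Try every division into at most two groups and keep the best score."""
    first, rest = nodes[0], nodes[1:]
    best_q, best_split = float("-inf"), None
    # Pin the first node to group 0 so mirror-image labellings are skipped.
    for labels in product((0, 1), repeat=len(rest)):
        community_of = {first: 0, **dict(zip(rest, labels))}
        q = modularity(edges, community_of)
        if q > best_q:
            best_q, best_split = q, community_of
    return best_q, best_split


# Hypothetical toy graph: two triangles joined by a single bridge (c-d).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

q, split = best_two_way_split(nodes, edges)
print(round(q, 3), split)
```

On this toy graph the best split puts each triangle in its own module, and lumping everything into one module scores zero.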

Academic Networking

Who are the movers and shakers in your field?  You can use social network theory on your bibliographies to find out:

Today I learned about some studies looking at social networks constructed from bibliographic data (from Mark Newman, see Newman 2001 or Said et al. 2008).  Nodes on a graph represent authors and edges are added if those authors have co-authored a paper.

I scripted a little tool to construct such a graph from bibtex files – the bibliographic data files used with latex.  The Language Evolution and Computation Bibliography – a list of the most relevant papers in the field – is available in bibtex format.
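As a rough illustration of the idea (not the actual tool), here is a minimal sketch that builds a co-authorship graph from author fields and ranks authors by how many distinct co-authors they have.  It assumes each entry's author field has already been pulled out of the bibtex file as a string with names separated by " and ", which is the bibtex convention.  The function names and the placeholder authors are made up for the example.

```python
from itertools import combinations


def coauthor_graph(author_fields):
    """Map each author to the set of people they have co-authored with."""
    graph = {}
    for field in author_fields:
        # bibtex separates the names in an author field with " and ".
        authors = sorted({name.strip() for name in field.split(" and ")
                          if name.strip()})
        for author in authors:
            graph.setdefault(author, set())  # keep solo authors as lone nodes
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def rank_by_degree(graph):
    """Authors sorted by number of distinct co-authors, then by name."""
    return sorted(graph, key=lambda author: (-len(graph[author]), author))


# Placeholder author fields, one per paper (hypothetical data).
papers = [
    "Author A and Author B",
    "Author A and Author C and Author D",
    "Author E",
]

graph = coauthor_graph(papers)
ranking = rank_by_degree(graph)
print(ranking)
```

Counting distinct co-authors (node degree) is only one way to find the movers and shakers; other centrality measures could be swapped in at the ranking step.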

You can look at the program using the online Academic Networking application that I scripted today, or upload your own bibtex file to find out who the movers and shakers are in your field.  Soon, I hope to add an automatic graph-visualisation, too.

Laryngeal Air Sacs

So, I got a request from a friend of mine to make an abstract on the fly for a poster for Friday. I stayed up until 3am and banged this out. Tonight, I hope to write the poster justifying it into being. A lot of the work here builds on Bart de Boer’s work, with which I am pretty familiar, but much of it also started with a wonderful series of posts over on Tetrapod Zoology. Rather than describe air sacs here, I’m just going to link to that – I highly suggest the series!

Here’s the abstract I wrote up, once you’ve read that article on air sacs in primates. Any feedback would be greatly appreciated – I’ll try to make a follow-up post with the information that I gather tonight and tomorrow morning on the poster, as well.

Re-dating the loss of laryngeal air sacs in hominins

Laryngeal air sacs are a product of convergent evolution in many different species of primates, cervids, bats, and other mammals. In the case of Homo sapiens, they have been lost. This has been argued to have happened before Homo heidelbergensis, due to a loss of the bulla in the hyoid bone from Australopithecus afarensis (Martínez et al., 2008), at a range of 500kya to 3.3mya (de Boer, to appear). Justifications for the loss of laryngeal air sacs include infection, the ability to modify breathing patterns and reduce the need for an anti-hyperventilating device (Hewitt et al., 2002), and the selection against air sacs as they are disadvantageous for subtle, timed, and distinct sounds (de Boer, to appear). Further, it has been suggested that the loss goes against the significant correlation of air sac retention to evolutionary growth in body mass (Hewitt et al., 2002).

I argue that the loss of air sacs may have occurred more recently (less than 600kya), as the loss of the bulla in the hyoid does not exclude the possibility of air sacs, as in cervids, where laryngeal air sacs can herniate between two muscles (Frey et al., 2007).  Further, I argue that using weight measurements of living species to justify the loss of air sacs despite a gain in body mass is unfounded given archaeological evidence, which suggests that the laryngeal air sacs may have been lost only after size reduction in Homo sapiens from Homo heidelbergensis.

Finally, I suggest two further justifications for loss of the laryngeal air sacs in Homo sapiens. First, the linguistic niche of hunting in the environment in which early hominin hunters have been posited to exist – the savannah – would have been better suited to higher frequency, directional calls as opposed to lower frequency, multidirectional calls. The loss of air sacs would have then been directly advantageous, as lower frequencies produced by air sac vocalisations over bare ground have been shown to favour multidirectional over targeted utterances (Frey and Gebler, 2003). Secondly, the reuse of air stored in air sacs could have possibly been disadvantageous toward sustained, regular heavy breathing, as would occur in a similar hunting environment.

References:

Boer, B. de. (to appear). Air sacs and vocal fold vibration: Implications for evolution of speech.

Fitch, T. (2006). Production of Vocalizations in Mammals. Encyclopedia of Language and Linguistics. Elsevier.

Frey, R., & Gebler, A. (2003). The highly specialized vocal tract of the male Mongolian gazelle (Procapra gutturosa Pallas, 1777–Mammalia, Bovidae). Journal of Anatomy, 203(5), 451-71. Retrieved June 1, 2011, from http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1571182&tool=pmcentrez&rendertype=abstract.

Frey, Roland, Gebler, Alban, Fritsch, G., Nygrén, K., & Weissengruber, G. E. (2007). Nordic rattle: the hoarse vocalization and the inflatable laryngeal air sac of reindeer (Rangifer tarandus). Journal of Anatomy, 210(2), 131-159. doi: 10.1111/j.1469-7580.2006.00684.x.

Martínez, I., Arsuaga, J. L., Quam, R., Carretero, J. M., Gracia, A., & Rodríguez, L. (2008). Human hyoid bones from the middle Pleistocene site of the Sima de los Huesos (Sierra de Atapuerca, Spain). Journal of Human Evolution, 54(1), 118-24. doi: 10.1016/j.jhevol.2007.07.006.

Hewitt, G., MacLarnon, A., & Jones, K. E. (2002). The functions of laryngeal air sacs in primates: a new hypothesis. Folia Primatologica: International Journal of Primatology, 73(2-3), 70-94. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12207055.


Sound good? I hope so! That’s all for now.

Statistics and Symbols in Mimicking the Mind

MIT recently held a symposium on the current status of AI, which apparently has seen precious little progress in recent decades. The discussion, it seems, ground down to a squabble over the prevalence of statistical techniques in AI and a call for a revival of work on the sorts of rule-governed models of symbolic processing that once dominated much of AI and its sibling, computational linguistics.

Briefly, from the early days in the 1950s up through the 1970s both disciplines used models built on carefully hand-crafted symbolic knowledge. The computational linguists built parsers and sentence generators and the AI folks modeled specific domains of knowledge (e.g. diagnosis in selected medical domains, naval ships, toy blocks). Initially these efforts worked like gang-busters. Not that they did much by Star Trek standards, but they actually did something and they did things never before done with computers. That’s exciting, and fun.

In time, alas, the excitement wore off and there was no more fun. Just systems that got too big and failed too often and they still didn’t do a whole heck of a lot.

Then, starting, I believe, in the 1980s, statistical models were developed that, yes, worked like gang-busters. And these models actually did practical tasks, like speech recognition and then machine translation. That was a blow to the symbolic methodology because these programs were “dumb.” They had no knowledge crafted into them, no rules of grammar, no semantics. Just routines that learned while gobbling up terabytes of example data. Thus, as Google’s Peter Norvig points out, machine translation is now dominated by statistical methods. No grammars and parsers carefully hand-crafted by linguists. No linguists needed.

What a bummer. For machine translation is THE prototype problem for computational linguistics. It’s the problem that set the field in motion and has been a constant arena for research and practical development. That’s where much of the handcrafted art was first tried, tested, and, in a measure, proved. For it to now be dominated by statistics . . . bummer.

So that’s where we are. And that’s what the symposium was chewing over.

The end of universals?

Woah, I just read some of the responses to Dunn et al. (2011) “Evolved structure of language shows lineage-specific trends in word-order universals” (Language Log here, Replicated Typo coverage here).  It’s come in for a lot of flak.  One concern raised at the LEC was that, considering an extreme interpretation, there may be no effect of universal biases on language structure.  This goes against Generativist approaches, but also the Evolutionary approach adopted by LEC-types.  For instance, Kirby, Dowman & Griffiths (2007) suggest that there are weak universal biases which are amplified by culture.  But there should be some trace of universality nonetheless.

Below is the relationship diagram for Indo-European and Uto-Aztecan feature dependencies from Dunn et al.  Bolder lines indicate stronger dependencies.  They appear to have different dependencies: only one is shared (Genitive-Noun and Object-Verb).

However, I looked at the median Bayes Factors for each of the possible dependencies (available in the supplementary materials).  These are the raw numbers that the above diagrams are based on.  If the dependencies’ strengths rank in roughly the same order in two families, they will have a high Spearman rank correlation.

Spearman Rank Correlation    Indo-European      Austronesian
Uto-Aztecan                  0.39, p = 0.04     0.25, p = 0.19
Indo-European                n/a                -0.13, p = 0.49

Spearman rank correlation coefficients and p-values for Bayes Factors for different dependency pairs in different language families.  Bantu was excluded because of missing feature data.
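For anyone wanting to try this kind of comparison themselves, here is a minimal sketch of a Spearman rank correlation using only the standard library.  It ranks each list (tied values share their average rank) and then takes the ordinary Pearson correlation of the ranks.  The two lists of numbers are invented purely for illustration and are not the Bayes Factors from Dunn et al.; the sketch also does not compute p-values.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented dependency strengths for the same five feature pairs in two
# hypothetical families (NOT values from the supplementary materials).
family_a = [0.5, 2.0, 7.5, 1.2, 12.0]
family_b = [0.9, 1.1, 3.0, 4.0, 20.0]

rho = spearman(family_a, family_b)
print(round(rho, 2))
```

Because only the rank order matters, two families can have very different absolute strengths and still correlate highly, which is the point of the comparison in the table.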

Although the Indo-European and Uto-Aztecan families have different strong dependencies, they have similar rankings of those dependencies.  That is, two features with a weak dependency in Indo-European languages tend to have a weak dependency in Uto-Aztecan languages, and the same is true of strong dependencies.  The same is true to some degree for Uto-Aztecan and Austronesian languages, although that correlation is not statistically significant (p = 0.19).  This might suggest that there are, in fact, universal weak biases lurking beneath the surface. Lucky for us.

However, this does not hold between Indo-European and Austronesian language families.  Actually, I have no idea whether a simple correlation between Bayes Factors makes any sense after hundreds of computer hours of advanced phylogenetic statistics, but the differences may be less striking than the diagram suggests.

UPDATE:

As Simon Greenhill points out below, the statistics are not at all conclusive.  However, I’m adding the graphs for all Bayes Factors (these are made directly from the Bayes Factors in the Supplementary Material):

Austronesian:                                                             Bantu:

Indo-European:                                                            Uto-Aztecan:

Dunn, Michael, Greenhill, Simon J., Levinson, Stephen C., & Gray, Russell D. (2011). Evolved structure of language shows lineage-specific trends in word-order universals. Nature, 473, 79-82

Cultural Evolution: Brought to you by Bacardi

Didn’t I say that alcohol affects language evolution?

 

 

The video is actually a pretty good summary of many of the main issues surrounding cultural evolution and self domestication. Surprisingly, Bacardi have actually done some research on this:

I cannot wait to make a Bacardi-WALS data cocktail.