A Replicated Word Cloud

I found this cool website for generating really awesome looking word clouds: http://www.wordle.net/. Here is a word cloud for this website:

Apparently, the clouds are based on word frequency, but I don’t think it can be pulling much data from this website if Levinson is coming out as one of the most frequently used words. Still, it entertained me for a few minutes. H/T: Glossographia.

More on Phoneme Inventory Size and Demography

Following Sean’s comment about using a regression to see how well phoneme inventory size is predicted once geographic spread is included alongside population size, I decided to look at the stats a bit more closely (original post is here). It’s fairly easy to perform multiple regression in R, which, in the case of my data, gave highly significant results (p<0.001) for the intercept, area and population (residual standard error = 9.633 on 393 degrees of freedom; adjusted R-squared = 0.1084). I then plotted scatterplots for each pair of variables. As you can see below, this is fairly useful as a quick summary but it is also messy and confusing. Another problem is that the pairs plot shows the original data and not the fitted linear model.
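For anyone who wants to see what those summary numbers mean, here is a minimal sketch of a multiple regression in Python rather than R. Everything in it is an assumption made for illustration: the data are synthetic (not my dataset), the log-scale predictors and the variable names are made up, and the sample size of 396 is chosen only so that three coefficients leave 393 residual degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption: NOT the dataset from the post).
# 396 observations minus 3 coefficients gives 393 residual degrees of freedom.
n = 396
log_area = rng.normal(8.0, 2.0, n)   # hypothetical log geographic spread
log_pop = rng.normal(10.0, 3.0, n)   # hypothetical log speaker population
phonemes = 20.0 + 0.5 * log_area + 0.8 * log_pop + rng.normal(0.0, 9.0, n)

# Design matrix: intercept column, then the two predictors.
X = np.column_stack([np.ones(n), log_area, log_pop])

# Ordinary least squares fit of phonemes on area and population.
beta, *_ = np.linalg.lstsq(X, phonemes, rcond=None)

residuals = phonemes - X @ beta
df_resid = n - X.shape[1]

rss = float(residuals @ residuals)                      # residual sum of squares
tss = float(((phonemes - phonemes.mean()) ** 2).sum())  # total sum of squares

rse = (rss / df_resid) ** 0.5                           # residual standard error
r2 = 1.0 - rss / tss
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid          # adjusted R-squared

print("coefficients (intercept, area, population):", beta)
print("residual standard error:", rse, "on", df_resid, "degrees of freedom")
print("adjusted R-squared:", adj_r2)
```

The adjusted R-squared penalises the plain R-squared for the number of predictors, which is why it is the figure worth quoting when comparing a population-only model with one that also includes area.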

Continue reading “More on Phoneme Inventory Size and Demography”

Some Links #17: The Return of Whorf

The famous Klingon linguist, Whorf, has returned with his theories on linguistic relativity (I know, terrible joke).

The Largest Whorfian Study Ever. The Lousy Linguist looks at the paper Ways to go: Methodological considerations in Whorfian studies on motion events. As you can probably guess, the paper deals with the methodological issues surrounding linguistic relativity. It’s all interesting stuff, bringing to light important questions about how the brain handles language. I’m fairly lay when it comes to this topic, so for more background on the current events, see similar posts over at Language Log: Never Mind the Conclusions, What’s the Evidence? and SLA Blog: Linguistic Relativity, Whorf, Linguistic Relativity.

But Science Doesn’t Work That Way: Miller & Chomsky (1963). Many of you who read this blog will be familiar with the position taken by Melody’s post over at Child’s Play: against a strong nativist position in language acquisition. It’s the first part in a series of posts so I’ll reserve judgement on her conclusions until she’s finished. But much of her post is drawn from a brilliant paper by Scholz and Pullum (2005): Irrational Nativist Exuberance. Key paragraph:

Do we really want to say that phonemes are ‘innate’?

I haven’t yet addressed how we know — with all but certainty — that the model Miller and Chomsky used had to be a poor approximation of human learning capabilities.  It has to do with phonemes.

Experiments have shown that people are remarkably sensitive to the transitional probabilities between phonemes in their native languages, both when speaking and when listening to speech.  If Miller and Chomsky’s assessment of probabilistic learning is correct, then the problem of “parameter estimation” should apply not only to learning the probabilities between words, but also to learning the probabilities between phonemes.  Given that people do learn to predict phonemes, Miller and Chomsky’s logic would force us to conclude that not only must ‘grammar’ be innate, but the particular distribution of phonemes in English (and every other language) must be innate as well.

We only get to this absurdist conclusion because Miller & Chomsky’s argument mistakes philosophical logic for science (which is, of course, exactly what intelligent design does).  So what’s the difference between philosophical logic and science? Here’s the answer, in Einstein’s words, “No amount of experimentation can ever prove me right; a single experiment can prove me wrong.”

PLoS Blogs. Yet another blogging network. This time it’s with the Public Library of Science. The most notable move, for me at least, is Neuroanthropology. The move doesn’t seem to have affected their ability to produce good articles, the latest of which concerns Uner Tan Syndrome (I’m sure there was a documentary about this on BBC…).

Hap Map 3: more people ~ more genetic variation. Razib has a cool read on the new HapMap dataset. The current paper (Integrating common and rare genetic variation in diverse human populations) looked for variants across the genome in 11 populations, consisting of 1184 samples. It’s been especially useful with less common variants. As with previous versions, you can also explore the data. Here’s the conclusion from the paper:

With improvements in sequencing technology, low-frequency variation is becoming increasingly accessible. This greater resolution will no doubt expand our ability to identify genes and variants associated with disease and other human traits. This study integrates CNPs and lower-frequency SNPs with common SNPs in a more diverse set of human populations than was previously available. The results underscore the need to characterize population-genetic parameters in each population, and for each stratum of allele frequency, as it is not possible to extrapolate from past experience with common alleles. As expected, lower-frequency variation is less shared across populations, even closely related ones, highlighting the importance of sampling widely to achieve a comprehensive understanding of human variation.

Mathematics: From the Birth of Numbers. Someone gave this in to the charity store I work at: it’s a brilliant book by Jan Gullberg on (surprise, surprise) the history of mathematics. The first chapter was on mathematics and language, so I had to pick it up, and not just for that chapter alone, as there are plenty of gaps in my mathematical knowledge I’m sure this will clear up.

Two new Greenhill Papers

Simon Greenhill has just announced two new papers on applying phylogenetic techniques to the study of culture. No doubt I’ll be blogging about these at some point in the future. Below are the abstracts:

Continue reading “Two new Greenhill Papers”

Guardian Science Blogs

Some smart moves by the Guardian. They’ve created their own mini science blog network, containing some top names and proven bloggers. There are currently five blogs: Punctuated Equilibrium, Political Science, The Lay Scientist, Life and Physics. The fifth blog, in case you were concerned about my ability to count, is going to rotate between various bloggers, the first of whom is the brilliant Mo Costandi of Neurophilosophy. I would normally subscribe to each of these blogs individually, so it’s nice to see them all under one digital roof of science-blogging goodness.

Btw, here’s the RSS feed for all the blogs: http://www.guardian.co.uk/science/scienceblogs/roundup/rss.

Phoneme Inventory Size and Demography

It has long been established that demography drives evolutionary processes (see Hawks, 2008 for a good overview). Similar attempts are also being made to describe cultural (Shennan, 2000; Henrich, 2004; Richerson & Boyd, 2009) and linguistic (Nettle, 1999a; Wichmann & Holman, 2009; Vogt, 2009) processes by considering the effects of population size and other demographic variables. Even though these ideas are hardly new, until recently there was a ceiling on the amount of data one person could draw upon. In linguistics, this paucity of data is being remedied through large-scale projects, such as WALS, Ethnologue and UPSID, that bring together a vast body of linguistic fieldwork from around the world. A recent study by Lupyan & Dale (2010) provides a solid direction for how this might be utilised. Here, the authors compare the structural properties of more than 2000 languages with three demographic variables: a language’s speaker population, its geographic spread and the number of linguistic neighbours. The salient point is that certain differences in structural features correspond to the underlying demographic conditions.

With that said, a few months ago I found myself wondering about a particular feature, the phoneme inventory size, and its potential relationship to the underlying demographic conditions of a speech community. What piqued my interest was that two languages I retain a passing interest in, Kayardild and Pirahã, both contain small phonological inventories and have small speaker communities. The question is: is there a correlation between the population size of a language and its number of phonemes? Despite work hinting at such a relationship (e.g. Trudgill, 2004), there is little in the way of empirical evidence to support such claims. Hay & Bauer (2007) perhaps represent the most comprehensive attempt at an investigation, reporting a statistical correlation between the number of speakers of a language and its phoneme inventory size.

In their paper, the authors provide some evidence for the claim that the more speakers a language has, the larger its phoneme inventory. I won’t go into the sub-divisions of vowels (e.g. separating monophthongs, extra monophthongs and diphthongs) and consonants (e.g. obstruents), as it would extend the post by about 1000 words. The key result is that the vowel inventory and consonant inventory are both correlated with population size (the authors also rule out language families driving the results). As they note:

That vowel inventory and consonant inventory are both correlated with population size is quite remarkable. This is especially so because consonant inventory and vowel inventory do not correlate with one another at all in this data-set (rho=.01, p=.86). Maddieson (2005) also reports that there is no correlation between vowel and consonant inventory size in his sample of 559 languages. Despite the fact that there is no link between vowel inventory and consonant inventory size, both are significantly correlated with the size of the population of speakers.

Using their paper as a springboard, I decided to look at how other demographic factors might influence the size of the phoneme inventory, namely: population density and the degree of social interconnectedness.
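For readers unfamiliar with the rho statistic quoted above: it is a rank correlation, which asks whether two variables rise and fall together without assuming a straight-line relationship. Here is a minimal sketch, assuming toy numbers invented purely for illustration (they are not from Hay & Bauer or from my data); the function names `average_ranks` and `spearman_rho` are my own.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical toy data: speaker populations and phoneme inventory sizes.
speakers = [150, 900, 4_000, 25_000, 300_000, 2_000_000, 60_000_000]
inventory = [11, 19, 17, 24, 31, 28, 40]

print("rho =", spearman_rho(speakers, inventory))
```

Because only the ranks matter, the enormous range in speaker numbers (hundreds to tens of millions) does not dominate the result the way it would in an ordinary correlation on the raw values.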

Continue reading “Phoneme Inventory Size and Demography”

The Rap Guide to Human Nature

Last year at Edinburgh’s Fringe Festival several of my friends and I saw this brilliant show called the Rap Guide To Evolution (which I briefly blogged about here). Well, this year the same rapper, one Baba Brinkman, was back with a new show: the Rap Guide to Human Nature. Unlike most rap attempts at explaining science, it doesn’t sound like a bad Beastie Boys and Sugarhill Gang pastiche of awkward cadence and simple rhyming. Also, it includes some brilliant interludes from various scientists and researchers, including Olivia Judson and David Sloan Wilson. Here is a video of him performing at Binghamton University (I suppose his audience isn’t straight outta Compton):

Some Links #16: Why I want to Falcon Punch (some) BBC Science Writers

I’m not normally one for violent resolutions to sloppy science, but in taking inspiration from one such perpetrator I’m promoting a Falcon Punch policy. Above is a graphical example of a successful Falcon Punch: the goal being to hurl your target onwards and upwards into a flaming ball of scientific shame.

Space is the final frontier for evolution, study claims. I had planned on writing a more substantial article on how yet another science writer, in this case one Howard Falcon-Lang, is claiming that Darwin has once again been felled by a new study. Greg Laden, however, beat me to the punch with a damning critique:

Continue reading “Some Links #16: Why I want to Falcon Punch (some) BBC Science Writers”

Massive Science Blogging Aggregator

If you are keen on keeping up with the ever-changing science blog ecosystem, then a must-visit website is the newly created ScienceBlogging.org:

Created by Anton Zuiker (MisterSugar), Bora Zivkovic (A Blog Around The Clock) and Dave Munger (Word Munger), the site aggregates all the major science group blogs, blogging networks, aggregators and services. Great stuff. They also have a blog.