We live in an age where we have more data on more languages than ever before, and more data to link it with from other domains. This should make it easier to test hypotheses involving adaptation, and also to spot new patterns that might be explained by adaptation. For example, the proposed link between climate and tone languages could never have been investigated without massive global databases. However, there is not much discussion of the overall approach to research in this area.
This week I published a paper in a special issue on the Adaptive Value of Languages, outlining the maximum robustness approach to these problems. I then try to apply this approach to the debate about the link between tone and climate.
In a nutshell, I suggest that research should be:
Robust
Instead of aiming for the most valid test for a hypothesis, we should consider as many sources of data and as many processes as possible. Agreement between them supports a theory, but differences can also highlight which parts of a theory are weak.
Causal
Researchers should be more explicit about the causal effects in their hypotheses. Formal tools from causal graph theory can help formulate tests, recognise weaknesses and avoid talking past each other.
Incremental
Realistically, a single paper can’t be the final word on a topic, and shouldn’t aim to. Statistical studies of large-scale, cross-cultural data are very complicated, and we should expect small steps to establishing causality.
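To make the "Causal" point a little more concrete, here is a minimal sketch of writing a hypothesis down as a directed graph and asking which variables sit upstream of both the proposed cause and the proposed effect. The variable names and edges are hypothetical placeholders chosen for illustration, not the graph from the paper, and the helper functions `ancestors` and `common_causes` are my own names rather than part of any causal inference library.

```python
# Hypothetical causal graph: each key is assumed to cause the variables in its list.
# These edges are illustrative assumptions, not claims made in the paper.
graph = {
    "humidity": ["tone"],
    "latitude": ["humidity", "language_family"],
    "language_family": ["tone"],
}


def ancestors(graph, node):
    """Return every variable with a directed path into `node`."""
    parents = {src for src, dsts in graph.items() if node in dsts}
    found = set(parents)
    for parent in parents:
        found |= ancestors(graph, parent)
    return found


def common_causes(graph, x, y):
    """Return variables upstream of both x and y.

    These are only candidate confounders: a shared ancestor that reaches y
    solely through x does not open a confounding path.
    """
    return ancestors(graph, x) & ancestors(graph, y)


print(sorted(common_causes(graph, "humidity", "tone")))
```

Even a toy graph like this forces the hypothesis to state which variables need controlling for, which is where two researchers can see whether they are actually disagreeing about the same claim.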
I apply these ideas to the debate about tone and climate. Caleb Everett also published a paper in this issue showing that speakers in drier regions use vowels less frequently in their basic vocabulary. I test whether the original link with tone and the new link with vowels hold up when using different data sources and different statistical frameworks. The correlation with tone is not robust, while the correlation with vowels seems more promising.
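The basic logic of a robustness check can be sketched in a few lines: run the same test on each data source and see whether the results point the same way. The sketch below is a deliberate simplification, assuming a plain Pearson correlation and checking only whether the sign agrees. The source names and numbers are made-up placeholders, not data or results from the paper, and the analyses in the paper use different statistical frameworks.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def sign_agreement(sources):
    """Correlate each source separately and report whether all signs match."""
    rs = {name: pearson(xs, ys) for name, (xs, ys) in sources.items()}
    signs = {r > 0 for r in rs.values()}
    return rs, len(signs) == 1


# Placeholder values only: invented to show a disagreement between sources.
toy_sources = {
    "source_a": ([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]),
    "source_b": ([1, 2, 3, 4, 5], [5, 3, 4, 2, 1]),
}

rs, agree = sign_agreement(toy_sources)
for name, r in rs.items():
    print(name, round(r, 2))
print("signs agree:", agree)
```

When the sources disagree, as these invented ones do, that is not just a failure: it points to which dataset or coding decision the result depends on.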
I then suggest some ideas for alternative methodological approaches to this theory that could be tested. For example:
- An iterated artificial learning experiment
- A phonetic study of vowel systems
- A historical case-study of 5 Bantu languages
- A corpus study of tone use in Cantonese and conversational repair in Mandarin
- A corpus study of Larry King’s speech