A few weeks ago, Roger Blench gave a talk at the MPI entitled “‘New mathematical methods’ in linguistics constitute the greatest intellectual fraud in the discipline since Chomsky”. The title is controversial, to say the least! The talk argued, amongst other things, that phylogenetic methods are less transparent and less replicable than traditional historical reconstruction. Here I argue against those points.
I felt I should respond online, because Roger Blench made the talk slides available online (a similar set of arguments is more fully expressed here).
Having attended the talk, I can say that the title is clearly tongue-in-cheek. The talk was quite light-hearted and he was very easy to talk to. He certainly didn’t accuse any particular researcher of actual fraud (“fraud” only appears in the title, not in the actual talk). By “fraud”, he seems to mean “overblown claims not supported by evidence” (and he is not alone in this view of phylogenetics). However, I was surprised to find the talk slides were made public, where things can be taken out of context.
It was a fun talk to be at, because it made clear to me why some people distrust phylogenetic methods. Roger Blench made 3 basic points (the points were made in detail with examples, but I will summarise them here). He argued that phylogenetic methods were useless because they (1) were not replicable and (2) were not transparent. Regarding (1), he argued that changes to parameters or data can lead to differences in results (though, to me, this might be better characterised as a lack of robustness rather than a lack of replicability), and it’s impossible to tell which are better. Regarding (2), he argued that the method itself is opaque and mysterious, because it relies on cognacy judgements which are quite subjective (he gave an example of one historical linguist who claimed that it was much easier to find cognates after lunch!). That is, garbage in, garbage out.
He also argued that many phylogenetic studies don’t try to validate their trees by aligning them with anthropological or archaeological data (and I agree that several studies could be improved by doing such validation). That is, the graphs look pretty, but to an expert linguist they don’t reveal anything new. More specifically, they offer limited falsifiability. Worse, they may actually mislead the reader if the visualisations do not reflect actual processes of change. Blench argued that the pretty graphs have ‘bamboozled’ the editors and reviewers of high-impact journals (who do not have expertise in linguistics) into accepting the studies.
Blench stressed that he is not anti-quantitative, nor a Luddite (indeed, I talked with him afterwards about a quantitative method for tracing borrowings, for which he had several good insights), but suggested that he feels like he hasn’t learned anything from phylogenetics.
In my mind, transparency and replicability are the strong points of phylogenetic techniques. Statistical methods require precise definitions. This includes defining the assumptions behind the analysis, defining the measurements and defining how the assumptions and measures lead to the conclusion (the method). Therefore, another researcher should be able to reproduce the precise results of a study given the same data, assumptions and method. For example, given the same data and parameter settings (and random starting seeds), two researchers could produce precisely the same phylogenetic tree using a Bayesian phylogenetic approach. This means the results can be replicated, an important step in any science.
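To make the point about seeds concrete, here is a minimal sketch in Python. It is not a Bayesian phylogenetic analysis: it is a much cruder stochastic search that samples random trees and keeps the best-scoring one. The cognate matrix, the scoring function and all names are hypothetical and only serve to illustrate that a random procedure becomes exactly repeatable once the data, the parameters and the seed are fixed.

```python
import random

# Hypothetical toy data: 1 = the language has a reflex in that cognate class.
COGNATES = {
    "A": [1, 1, 0, 1, 0, 1],
    "B": [1, 1, 0, 0, 0, 1],
    "C": [0, 1, 1, 0, 1, 0],
    "D": [0, 0, 1, 0, 1, 0],
}

def distance(x, y):
    """Proportion of cognate classes on which two languages differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def leaves(tree):
    """List the language names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def score(tree):
    """Sum, over internal nodes, of the mean distance between the two
    daughter groups (lower = similar languages are joined earlier)."""
    if isinstance(tree, str):
        return 0.0
    left, right = tree
    pairs = [(a, b) for a in leaves(left) for b in leaves(right)]
    here = sum(distance(COGNATES[a], COGNATES[b]) for a, b in pairs) / len(pairs)
    return here + score(left) + score(right)

def random_tree(names, rng):
    """Join randomly chosen clusters until one binary tree remains."""
    clusters = list(names)
    while len(clusters) > 1:
        i, j = sorted(rng.sample(range(len(clusters)), 2))
        right = clusters.pop(j)  # pop the larger index first
        left = clusters.pop(i)
        clusters.append((left, right))
    return clusters[0]

def search(seed, n_samples=50):
    """Stochastic search: sample random trees and keep the best-scoring one."""
    rng = random.Random(seed)  # all randomness flows from this one seed
    names = sorted(COGNATES)
    candidates = [random_tree(names, rng) for _ in range(n_samples)]
    return min(candidates, key=score)

# Two "researchers" with the same data, settings and seed get the same tree.
print(search(seed=1) == search(seed=1))
```

A different seed or a different value of n_samples may or may not give a different tree, which is a question about robustness rather than replicability.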
Furthermore, while some statistical methods may seem opaque without a knowledge of mathematics, they can be precisely communicated. They are arguably more transparent than analyses which were the result of an individual researcher combining deep knowledge of several domains without fully explaining the process. For example, cognacy judgements often rely on a deep knowledge of the language as well as its history and culture and the surrounding geopolitical landscape. These judgements are invaluable for subsequent quantitative work, yet the data and assumptions that go into the judgements are often left implicit. This obscures the research method and also makes it difficult to reproduce the same results. This can lead to disagreements that focus on the skill or authority of the researcher, which is not productive.
In contrast, when assumptions, measures, methods and results are precisely defined, researchers can focus on them directly. For example, if a researcher takes an issue with one of the assumptions, they can make a different assumption, then use the same measures and methods to produce alternative results. The two results can be directly compared to determine whether the assumption has a crucial impact on the results, or whether the alternative assumption leads to better results (e.g. a better ‘fit’ to the data or a more efficient explanation). Researchers can also directly test the impact of certain data or steps in the method. That is, arguments can focus on core scientific elements rather than on opinion or prestige of researchers. In this way, quantitative methods can achieve better transparency and replication in a productive way.
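As a sketch of what this looks like in practice, suppose two researchers disagree about whether suspected loanwords should count as evidence of relatedness. The data, the list of suspected loans and the distance measure below are all hypothetical; the only point is that the disputed assumption is a single explicit argument, so its effect on the result can be checked directly.

```python
# Hypothetical cognate-class codes per meaning: equal codes = judged cognate.
DATA = {
    "A": {"hand": 1, "water": 1, "fire": 1, "sugar": 1, "tea": 1},
    "B": {"hand": 1, "water": 1, "fire": 2, "sugar": 2, "tea": 2},
    "C": {"hand": 2, "water": 2, "fire": 1, "sugar": 1, "tea": 1},
}

# The assumption under dispute: are these meanings likely to be borrowed?
SUSPECTED_LOANS = frozenset({"sugar", "tea"})

def distance(lang1, lang2, exclude=frozenset()):
    """Proportion of compared meanings where the two languages differ."""
    meanings = [m for m in DATA[lang1] if m not in exclude]
    differing = sum(DATA[lang1][m] != DATA[lang2][m] for m in meanings)
    return differing / len(meanings)

def closest_relative(lang, exclude=frozenset()):
    """The other language with the smallest distance to lang."""
    others = [o for o in DATA if o != lang]
    return min(others, key=lambda o: distance(lang, o, exclude))

# Same data, same measure, same method: only the assumption changes.
print(closest_relative("A"))                           # C
print(closest_relative("A", exclude=SUSPECTED_LOANS))  # B
```

Here the assumption turns out to be crucial: with all meanings included A groups with C, and with the suspected loans excluded A groups with B. The debate can then focus on the evidence for or against those two items being borrowed.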
However, this is not to say that qualitative judgements cannot achieve this potential. All that is required is that the assumptions, methods and underlying measures are precisely defined. For example, while there is a large amount of disagreement in the classification of languages into historical trees, the Glottolog classification has a rigidly defined sequence of judgements that decide how to place a language or dialect on a tree. If you accept the assumptions behind this process, then you should accept the classification. If you disagree with the classification, you should be able to identify either an assumption or a specific judgement that you don’t agree with. You can then bring evidence against this particular assumption or judgement and determine how the classification should change – there is no need to reject the entire tree, nor necessarily any other classification.
In a similar way, the decisions that go into seemingly more subjective measures such as cognacy judgements could be explicitly stated. For example, the LexStat method (List, 2012) is a computational method for identifying cognates based on linguistic criteria and a set of assumptions (implemented in LingPy). It produces replicable judgements which are derived in a transparent way. It’s probable that expert historical linguists would disagree with the results obtained, but rather than dismissing the method, they should be able to define an additional set of data and assumptions which would produce more agreeable results. This might include coding archaeological or anthropological evidence that certain languages were or were not in contact, known population movements or knowledge about certain semantic domains that refer to items that were traded. This is essentially quantifying the knowledge in a way that could be used with the statistical methods. Historical linguists have a wealth of critical knowledge about language, and this could find much broader impact if it were combined with the transparency and reproducibility of quantitative methods.
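To illustrate what an explicitly stated cognacy procedure looks like, here is a deliberately crude sketch. It is not LexStat and does not use LingPy: LexStat works with sound classes and language-specific scoring, whereas this stand-in simply clusters orthographic forms by normalised edit distance. The word list and the threshold are my own illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]

def normalised(a, b):
    """Edit distance scaled to the range 0..1 by the longer form."""
    return edit_distance(a, b) / max(len(a), len(b))

def cluster(words, threshold=0.5):
    """Single linkage: a form joins every cluster containing at least one
    member within the threshold, and those clusters are merged."""
    clusters = []
    for lang, form in words.items():
        matches = [c for c in clusters
                   if any(normalised(form, other) <= threshold for _, other in c)]
        merged = [(lang, form)]
        for c in matches:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    return clusters

# Orthographic forms for 'water' (spelling, not phonetic transcription).
WATER = {
    "English": "water", "German": "wasser", "Dutch": "water",
    "French": "eau", "Spanish": "agua", "Italian": "acqua",
}

for group in cluster(WATER, threshold=0.5):
    print(sorted(lang for lang, _ in group))
```

With a threshold of 0.5 this groups English, German and Dutch together and Spanish with Italian, but leaves French eau on its own, even though it descends from the same Latin word as agua and acqua. An expert would rightly object, but the objection can be aimed at something specific: the distance measure, the threshold, or the lack of information about regular sound change.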
Having said this, the talk by Blench makes it clear that this kind of synergy is not taking place. Those using mathematical models may have to spend more time justifying and clarifying their work. At the same time, learning the mathematical principles is not so hard. As we argued in our paper on correlational studies, understanding the mathematical methods in linguistics is becoming more relevant not only for conducting research, but also for engaging in debate.
On a positive note, I understand that Blench is working together with an evolutionary biologist to work out a mathematical model which reflects their assumptions and theories about how languages change and diversify. I look forward to seeing this model and how it compares to phylogenetic models.
The talk slides are quite easy to follow if you’d like more detail on Blench’s arguments.
This may be a bit naive, but the simple fact that computational methods require machine-readable data should give them a headstart when it comes to transparency and replication. Furthermore, as noted by Blench, people using these methods tend to be distinct from the ones collecting the data; while this may be a problem when interpreting the data, it also promotes making data publicly available. I would consider both of these properties of data – machine-readable and publicly available – to be part of good scientific practice, and not really the norm in linguistics so far. So if nothing else, computational methods in linguistics may advance the field in terms of research data management.
True, though making data available is risky: researchers who put a lot of time into data collection want to see a return on their investment in terms of co-authorship. Also, some types of linguistic data cannot be ethically made public. One solution is a greater number of direct collaborations between data collectors and data analysers, which I think would be productive too.
I guess that’s what I hope increased interest in the data may fix. If availability of data becomes crucial, because the methods require it, the rewards for publishing data should increase.
Exactly: we need to make it standard that people publish the data they base their analyses upon! Otherwise, how do we expect to achieve repeatability? It makes me almost mad that people can still get away with publishing results without the data: it’s like saying that one converted water into wine or dust into gold. If nobody tries to repeat the claims, one can just claim anything… But this is also a problem of traditional historical linguistics, where people are used to asserting that “X” is cognate with “Y” on their own authority, when what we would really need is a direct statement of sound correspondences (preferably in the form of an alignment analysis). So I can only agree with Robert: if we follow the path of making things machine-readable and publicly available, we are coming much closer to good scientific practice and will also advance the field of traditional historical linguistics.