Crowdsourcing can say if speech therapy works

Crowdsourcing can be a cheap, unbiased way to determine if patients with speech disorders are pronouncing sounds correctly, report researchers.

“Because large crowdsourced samples can be obtained quickly, easily, and inexpensively, speech researchers could find it beneficial to use crowdsourcing technology in place of traditional methods of collecting speech ratings,” says lead author Tara McAllister Byun, an assistant professor in New York University’s Steinhardt School of Culture, Education, and Human Development.

Research in linguistics and psychology has reported that using crowdsourcing not only saves time and money, but can also enhance scientific rigor.

The study, published in the Journal of Communication Disorders, suggests that these benefits can also be extended to studies of the nature and treatment of speech disorders.

In speech disorders research, unbiased listeners are needed to evaluate patients’ progress over the course of treatment by listening to speech sounds and rating or coding them.

Because speech-language pathologists and other trained professionals are often used as raters, collecting the ratings can be costly. It can also be a challenge to find raters who are not part of the research and are therefore unbiased.

Is that the right ‘R’?

Amazon Mechanical Turk (AMT) is an online crowdsourcing platform developed by Amazon as a tool for completing routine tasks better performed by humans than computers.

AMT now has hundreds of thousands of workers and roughly 10,000 requesters, or employers, and anyone can use its standardized interface to post or complete electronic tasks. Although it was not originally designed for behavioral research, AMT has been used successfully in linguistics and psychology research.

Modeling studies have shown that even when individual responses to a task are not highly accurate, aggregated or crowdsourced responses from a large number of people generally converge with those of experts. In this study, the researchers tested the validity of having AMT users rate speech sounds, compared with ratings collected from experienced listeners.

Listeners were asked to rate recordings of 100 words containing the “r” sound, collected from children who had trouble pronouncing the sound and were working to correct it in speech therapy. Twenty-five experienced listeners and 153 AMT listeners scored the “r” sounds as correct or incorrect. Data from experienced listeners were collected over a period of three months, while data gathering using AMT took a mere 23 hours.

Recruiting listeners

The researchers found that when responses were aggregated, there was a very high level of overall agreement. When items were classified as correct or incorrect based on the majority vote across all listeners in a group, the AMT group and the experienced listener group were in agreement on all but seven of 100 items.
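The majority-vote comparison described above can be sketched in a few lines of Python. This is an illustration only, not the researchers' code: the function names, the three toy words, and the ratings are all made up, and treating a tie as "incorrect" is an assumption the article does not address.

```python
def majority_vote(ratings):
    """Return True if more than half of the binary ratings are 'correct'.

    Assumption: a tie counts as 'incorrect'.
    """
    return sum(ratings) > len(ratings) / 2


def group_agreement(group_a, group_b):
    """Count items on which the two groups' majority votes match.

    Each group maps an item name to a list of True/False ratings.
    Returns (number of matching items, number of items compared).
    """
    items = sorted(set(group_a) & set(group_b))
    matches = sum(
        majority_vote(group_a[item]) == majority_vote(group_b[item])
        for item in items
    )
    return matches, len(items)


# Toy, made-up ratings (True = "r" judged correct); not data from the study.
experienced = {
    "red": [True, True, False],
    "rope": [False, False, True],
    "run": [True, False, True],
}
crowd = {
    "red": [True, True, True, False, True],
    "rope": [False, True, False, False, False],
    "run": [False, False, True, False, True],
}

matches, total = group_agreement(experienced, crowd)
print(f"Groups agree on {matches} of {total} items")
```

In this toy example the two groups disagree only on "run", where the crowd's majority leans toward "incorrect".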

In a further analysis, the researchers sought to determine how many AMT listeners were needed to obtain valid responses that converged with those of experienced listeners. They found that samples of nine or more AMT listeners demonstrated a level of performance consistent with typical expectations for experienced listeners.
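One way to explore the sample-size question is to draw random subsets of listeners repeatedly and check how often each subset's majority vote matches a reference rating. The sketch below does this on simulated data. The article does not describe the study's actual procedure, so everything here is an assumption for illustration: the function names, the 60 simulated listeners, their 75 percent per-item accuracy, and the number of random draws.

```python
import random


def majority_labels(ratings_by_listener, listeners):
    """Majority vote per item across the chosen listeners (a tie counts as incorrect)."""
    n_items = len(ratings_by_listener[listeners[0]])
    return [
        sum(ratings_by_listener[name][i] for name in listeners) > len(listeners) / 2
        for i in range(n_items)
    ]


def mean_agreement(ratings_by_listener, reference, sample_size, n_draws, rng):
    """Average share of items where a random subsample's majority vote matches the reference."""
    all_listeners = list(ratings_by_listener)
    total = 0.0
    for _ in range(n_draws):
        chosen = rng.sample(all_listeners, sample_size)
        labels = majority_labels(ratings_by_listener, chosen)
        total += sum(a == b for a, b in zip(labels, reference)) / len(reference)
    return total / n_draws


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated reference ratings for 100 items (made up, not study data).
reference = [rng.random() < 0.5 for _ in range(100)]

# Simulated listeners: each matches the reference on any item with
# probability 0.75 (a hypothetical accuracy chosen for illustration).
listeners = {
    f"listener_{k}": [r if rng.random() < 0.75 else not r for r in reference]
    for k in range(60)
}

results = {n: mean_agreement(listeners, reference, n, 200, rng) for n in (1, 3, 9, 15)}
for n, agreement in results.items():
    print(f"{n:>2} listeners: mean agreement {agreement:.2f}")
```

Odd sample sizes are used so that ties cannot occur. In the simulation, agreement with the reference rises as more listeners are pooled, which is the pattern the aggregation argument predicts; the specific numbers say nothing about the study's real data.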


While using AMT for speech ratings has some limitations, including a lack of control over sound quality and the risk of inattentive or uncooperative raters, the researchers concluded that using AMT for speech-language pathology research could have a substantial impact on the process of gathering speech ratings.

“A key advantage of using crowdsourcing to recruit listeners for speech rating tasks is the speed and ease with which ratings can be obtained,” says McAllister Byun.

“However, using crowdsourcing for speech data rating is not merely a question of convenience; it also has the potential to improve speech research by expanding access to independent listeners, thereby reducing bias.”

In addition to McAllister Byun, study authors include Peter Halpin, an assistant professor of applied statistics at NYU Steinhardt, and Daniel Szeredi, a doctoral student in NYU’s department of linguistics. The National Institutes of Health supported this research.

Source: New York University