As we make our way through 2020, trying to stay on top of COVID-19 updates and Zoom meetings, I find myself reflecting on how far artificial intelligence, and Natural Language Processing (NLP) in particular, has come over the last 30 years. These days, chatbots start conversations on websites, our phones transcribe spoken messages automatically, and smart home devices respond to simple voice commands. It’s remarkable to think that so many of the technological conveniences we enjoy today got their start decades ago. Yet, as was the case back then, it is the continued advances in data science and artificial intelligence that have turned technology into the products and applications we all benefit from today.

A Short History of Speech Recognition

By the time I was gathering speech data for an artificial intelligence system design in the 1990s, Bell Laboratories, IBM, the US Department of Defense and DARPA, among others, had already made major advances in expanding the vocabulary that speech recognition systems could handle. Progress was relatively slow at first: the field started in the 1950s with machines that could recognize a single voice speaking numerical digits, and by the 1980s had reached devices that could interpret several thousand words. By the beginning of the 2000s, however, personal computers with faster processors had pushed speech recognition accuracy close to 80%, laying the groundwork for many of the voice-powered applications we use today.

Since then, Google has led the field with the launch of its Voice Search application, which is capable of predicting what a user is actually saying. In large part, Google’s success has come from collecting and analyzing data from billions of user searches and using that information to fine-tune the app, making it more accurate and easier for consumers to use. Along the way, Amazon, Apple, Microsoft and many other companies have entered the race for speech-to-text accuracy, which is now close to 95% overall.

Solving Bias Using Technology

As many of us can attest, there is still room for improvement in speech recognition. According to research by Dr. Tatman, published by the North American Chapter of the Association for Computational Linguistics (NAACL), Google’s speech recognition is 13% more accurate for men than it is for women, and regularly performs worse for non-white people or people with dialects. As in many cases, it all comes back to the data. Speech models have been built mostly on data from white adult males, leaving minorities, women and children to struggle with the current technology. In many ways, racial bias in technology is not a new problem. Kodak struggled until the mid-90s to produce color film that flattered all skin tones. Even then, the issue was addressed only because Kodak received complaints from corporate furniture and chocolate manufacturers that their products were not showing the correct brown tones when photographed.

Last month, it was refreshing to hear from voices in AI academia that “biases in the data used for algorithmic decision-making can be avoided through technological solutions and by improving diversity among the people who create and use AI software.”

Making Sure All Voices Are Heard

NLP and AI will truly come of age when priority is given to the role that humans must play in guarding against bias. Bias can be policed only through dialogue between machines and humanity in all of its diverse forms.

Now, and especially moving forward, massive amounts of data will continue to be the foundation of natural language processing and artificial intelligence. I remain proud of my past and present participation in this technological journey at AICorp and Voice Processing Corporation (now Nuance). I look forward to a future where a person’s voice is the dominant interface, provided that fully inclusive data allows all voices to be clearly recognized.