Of particular interest to me at the moment are how we process auditory signals and extract semantic information from them, and how to democratize access to linguistic information, research techniques, and speech technologies, which have in some cases been locked into black-box proprietary software or esoteric web interfaces. To those ends, I am currently engaged with a few research projects.
Phonotactic Probability and Auditory Nonword Processing
One of the more interesting abilities in our speech perception arsenal is determining whether a "word" we hear is actually part of our language. But what properties of such a stimulus signal this to us? And what makes the determination more difficult? Answering these questions will provide a clearer window into the way we process auditory signals and extract semantic information from them. One of the ways I would like to examine this is through phonotactic probability, or the likelihood that a given sequence of phones occurs in a language. This would shed light on whether there is a point, before we finish perceiving a word, at which we determine its meaning (or lack thereof), and whether we may characterize certain segment sequences as "characteristic" of a language we speak. But such a phonotactic probability metric still needs to be created from contemporary linguistic resources, along with an interface that allows users to painlessly calculate the metric.
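To make the idea concrete, here is a minimal sketch of one simple formulation of phonotactic probability, estimated from biphone (phone pair) frequencies in a transcribed corpus. The function names, the toy corpus, and the choice of conditional biphone probabilities are illustrative assumptions, not the metric the project will ultimately define:

```python
from collections import Counter

def biphone_probabilities(corpus):
    """Estimate P(second phone | first phone) from a corpus of
    phone-transcribed words, e.g. ["k", "ae", "t"] for "cat"."""
    biphone_counts = Counter()
    first_counts = Counter()
    for word in corpus:
        for first, second in zip(word, word[1:]):
            biphone_counts[(first, second)] += 1
            first_counts[first] += 1
    return {
        (first, second): count / first_counts[first]
        for (first, second), count in biphone_counts.items()
    }

def word_probability(word, probs):
    """Multiply conditional biphone probabilities across a candidate
    word; an unattested biphone yields 0, flagging a phonotactic gap."""
    p = 1.0
    for pair in zip(word, word[1:]):
        p *= probs.get(pair, 0.0)
    return p
```

A nonword built from high-probability biphones of the language would score relatively high under such a metric, while one containing an unattested sequence scores zero, which is one way a listener-facing tool could distinguish "word-like" nonwords from clearly foreign ones.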
Vowel Merger Assessment
A reliable quantification of vowel merger has implications for a number of fields: dialectology, for studying mergers in progress; second language speech learning, to help learners and users more closely match the vowel targets of their target language; and sociophonetics, to facilitate examining variation in vowel production across different speakers of a language. To that end, one of the projects I'm working on is a collaboration with Drs. Geoffrey Stewart Morrison and Benjamin V. Tucker, analyzing and comparing current vowel overlap metrics. These metrics use the formant values, and optionally duration, in a dataset of vowels to compare vowel categories pairwise and determine to what extent the categories overlap. The problem, though, is that there is no consensus on which of the many proposed metrics, if any, provides the most accurate and precise results while also accounting for the density of the data points. The project, then, focuses on finding which of these metrics is the most suitable for the field to converge on for general-purpose application.
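As a concrete illustration of what such a metric computes, below is a minimal numpy sketch of one overlap measure in common use in the merger literature, the Pillai-Bartlett trace ("Pillai score"), applied to two vowel categories described by F1 and F2. This is only one of the candidate metrics under comparison, and the function and data layout here are my own illustrative assumptions:

```python
import numpy as np

def pillai_score(formants_a, formants_b):
    """Pillai-Bartlett trace for two vowel categories, each given as an
    (n, 2) array of [F1, F2] values in Hz. With two groups and two
    dimensions the score ranges from roughly 0 (fully merged categories)
    to 1 (well-separated categories)."""
    groups = [np.asarray(formants_a, float), np.asarray(formants_b, float)]
    grand_mean = np.vstack(groups).mean(axis=0)
    # Between-groups (hypothesis) sums-of-squares-and-cross-products matrix
    H = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                              g.mean(axis=0) - grand_mean) for g in groups)
    # Within-groups (error) SSCP matrix
    E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
    return float(np.trace(H @ np.linalg.inv(H + E)))
```

Note that a raw Pillai score like this reflects only the category means and pooled spread; whether and how a metric should also account for the density of the data points is exactly the kind of question the comparison project addresses.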
New Approaches to Forced Alignment
One of the tools we use in phonetic research is forced alignment, which automatically labels the word-level and phone-level segments in a piece of speech. Many of the freely available tools for this task do an adequate job, but they rely on the Hidden Markov Model Toolkit (HTK), which can be cumbersome to install and get working correctly. Neural networks, and especially deep networks, are meanwhile seeing a resurgence in machine learning and have become a state-of-the-art technique, so they seem worth investigating as the backend of a forced aligner. Such a tool may provide better alignment results than the HTK-based aligners, and could leverage more easily accessible and freer software libraries.
Equitable Access to Speech Technologies
Speech technology is often only available to speakers of majority languages, which can be problematic. Speech synthesis, for instance, is used in technologies such as screen readers, digital assistants, and general text-to-speech programs. However, this software can be difficult to find, or even non-existent, for non-majority languages. This technology gap could perhaps be filled using free and open source software. After a preliminary investigation into the feasibility of automatically creating a speech synthesis voice for an arbitrary language with minimal data, using free and open source tools and readily accessible language material, it seems to me that the systems aren't there yet for full automation. This has implications for the development of speech synthesis systems by non-experts and for computer accessibility, and further research and development is needed in this area to allow more equitable access to speech technologies.