Date of Award
5-2023
Document Type
Thesis
Publisher
Santa Clara : Santa Clara University, 2023.
Department
Computer Science and Engineering
First Advisor
Nam Ling
Second Advisor
Ahmed Amer
Abstract
Motor neuron diseases (MNDs) are a rare family of diseases, afflicting 268,000 people worldwide, yet they carry a high fatality rate: one-third of people die within a year of diagnosis and 50% die within two years (MND Association, 2022). MNDs rapidly and progressively impair muscle movement, making everyday activities like walking, chewing, and speaking almost impossible. In collaboration with famed physicist Dr. Stephen Hawking, Intel Labs developed an assistive communications platform known as ACAT to simulate speech and facilitate electronic tasks. However, the original ACAT can be slow to use, leading to awkward pauses in conversations. This paper presents a solution: a machine learning pipeline that listens in on conversations and generates full-sentence responses that more accurately simulate human speech in real time. Our pipeline consists of three main phases: (i) voice activity detection; (ii) diarization; and (iii) response generation. A significant benefit of this technique is that it allows flexible substitution of components. Our results show that this speech generation method can significantly improve conversational flow, partly by adapting to user feedback to create more accurate results. To increase efficacy further, we plan to fine-tune the voice activity detector and diarizer models, enhance our GUI and integrate it into ACAT, and upgrade the response generator.
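The three-phase structure and the claim of flexible component substitution can be illustrated with a minimal sketch. This is not the thesis implementation: every name below (`Pipeline`, `energy_vad`, `alternating_diarizer`, `echo_generator`) is hypothetical, and the three stand-in components are deliberately trivial placeholders for real voice activity detection, diarization, and transformer-based response models.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

# Hypothetical types for illustration only; not taken from ACAT or the thesis.
Segment = Tuple[float, float]   # (start_sec, end_sec) of detected speech
Turn = Tuple[str, str]          # (speaker_label, text)


@dataclass
class Pipeline:
    """Three swappable phases: VAD, then diarization, then response generation."""
    detect_speech: Callable[[List[float]], List[Segment]]
    diarize: Callable[[List[float], List[Segment]], List[Turn]]
    generate: Callable[[List[Turn]], str]

    def respond(self, audio: List[float]) -> str:
        segments = self.detect_speech(audio)
        if not segments:            # nothing was said, so nothing to answer
            return ""
        turns = self.diarize(audio, segments)
        return self.generate(turns)


def energy_vad(audio: List[float], threshold: float = 0.1, rate: int = 4) -> List[Segment]:
    """Toy VAD: contiguous runs of samples whose magnitude exceeds a threshold."""
    segments, start = [], None
    for i, sample in enumerate(audio):
        loud = abs(sample) > threshold
        if loud and start is None:
            start = i
        elif not loud and start is not None:
            segments.append((start / rate, i / rate))
            start = None
    if start is not None:           # speech ran to the end of the buffer
        segments.append((start / rate, len(audio) / rate))
    return segments


def alternating_diarizer(audio: List[float], segments: List[Segment]) -> List[Turn]:
    """Toy diarizer: alternates two speaker labels; the text is a placeholder."""
    return [(f"speaker_{i % 2}", f"<speech {s:.2f}-{e:.2f}s>")
            for i, (s, e) in enumerate(segments)]


def echo_generator(turns: List[Turn]) -> str:
    """Toy generator: addresses whoever spoke last."""
    speaker, _ = turns[-1]
    return f"Reply to {speaker}"


pipeline = Pipeline(energy_vad, alternating_diarizer, echo_generator)
audio = [0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.0, 0.0]
reply = pipeline.respond(audio)

# Substituting one phase leaves the other two untouched.
swapped = replace(pipeline, generate=lambda turns: f"{len(turns)} turns heard")
swapped_reply = swapped.respond(audio)
```

Because each phase is passed in as a plain callable, a different detector, diarizer, or generator can be substituted without changing the other two, which is the property the abstract highlights.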
Recommended Citation
Quazi, Kairan, "ACAT 2.0: An AI Transformer-Based Approach to Predictive Speech Generation" (2023). Computer Science and Engineering Senior Theses. 246.
https://scholarcommons.scu.edu/cseng_senior/246