Developing a Multi-Task Learning (MTL) model for low-resource Speech Emotion Recognition (SER), leveraging ASR objectives and multilingual datasets to distill linguistic and prosodic features.
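A minimal sketch of the multi-task idea, assuming a shared wav2vec2 encoder feeding a frame-level CTC (ASR) head and a mean-pooled emotion head; the checkpoint name, vocabulary size, and four emotion classes are illustrative assumptions, not the project's actual configuration:

```python
# Sketch: shared wav2vec2 encoder with an ASR (CTC) head and an emotion head.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class MultiTaskSER(nn.Module):
    def __init__(self, vocab_size=32, num_emotions=4):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        hidden = self.encoder.config.hidden_size
        self.ctc_head = nn.Linear(hidden, vocab_size)        # frame-level ASR logits
        self.emotion_head = nn.Linear(hidden, num_emotions)  # utterance-level logits

    def forward(self, input_values):
        hidden_states = self.encoder(input_values).last_hidden_state
        ctc_logits = self.ctc_head(hidden_states)   # (batch, frames, vocab)
        pooled = hidden_states.mean(dim=1)          # mean-pool over time
        emotion_logits = self.emotion_head(pooled)  # (batch, num_emotions)
        return ctc_logits, emotion_logits
```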
Investigated duration reduction in ASR/TTS systems by analyzing homophone and lexical frequency effects, and developed 3D visualizations of acoustic word clusters using wav2vec2, t-SNE, and the Montreal Forced Aligner (MFA).
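A hedged sketch of the visualization step: one mean-pooled wav2vec2 embedding per word-aligned clip, projected to 3D with t-SNE. The checkpoint name and perplexity are assumptions, and the word boundaries would come from MFA TextGrids (alignment parsing is omitted here):

```python
# Sketch: wav2vec2 embeddings for MFA-aligned word clips, projected to 3D.
import torch
from sklearn.manifold import TSNE
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def embed(waveform_16k):
    """Return one embedding vector for a 16 kHz word clip (mean over frames)."""
    inputs = extractor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        frames = model(inputs.input_values).last_hidden_state  # (1, T, H)
    return frames.mean(dim=1).squeeze(0).numpy()

def project_3d(embeddings):
    """Project stacked (N, H) word embeddings to 3D for the cluster plot."""
    return TSNE(n_components=3, perplexity=30).fit_transform(embeddings)
```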
• Architecting a generalized multilingual SER model that approaches specialized systems, targeting average performance within 20% of single-language SOTA models, by fine-tuning Wav2Vec2 with a multi-task ASR objective.
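An illustrative training step for this fine-tuning: a weighted sum of CTC (ASR) loss and cross-entropy emotion loss. The weight `lambda_ctc`, the batch keys, and the `MultiTaskSER` model from the earlier sketch are all assumptions:

```python
# Sketch: joint CTC + emotion loss for multi-task fine-tuning.
import torch.nn.functional as F

def train_step(model, optimizer, batch, lambda_ctc=0.3):
    ctc_logits, emotion_logits = model(batch["input_values"])
    # F.ctc_loss expects (T, batch, vocab) log-probabilities
    log_probs = F.log_softmax(ctc_logits, dim=-1).transpose(0, 1)
    asr_loss = F.ctc_loss(log_probs, batch["transcript_ids"],
                          batch["input_lengths"], batch["target_lengths"])
    ser_loss = F.cross_entropy(emotion_logits, batch["emotion_labels"])
    loss = ser_loss + lambda_ctc * asr_loss  # emotion is the primary task
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```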
• Created an efficient, on-device conversational assistant for AAC (Augmentative and Alternative Communication) users, packaged as a 5GB deployable model that runs without an internet connection, by fine-tuning and quantizing LLMs such as Llama-3-8B for local inference.
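One plausible way to get an 8B model into a ~5GB footprint is 4-bit NF4 quantization via bitsandbytes; the config and checkpoint below are assumptions, since the bullet does not state the exact quantization method used:

```python
# Sketch: load Llama-3-8B with 4-bit NF4 quantization for local inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights shrink 8B params to a few GB
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
```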
• Reduced emergency-vehicle delay at intersections, cutting average wait times by 45% compared to a standard fixed-time controller, by implementing and comparing a suite of RL algorithms (from tabular Q-Learning to DDQN) in the SUMO-RL environment.
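A minimal sketch of the simplest end of that suite: tabular Q-learning against a Gymnasium-style traffic-signal environment. It assumes observations have been discretized into hashable states (SUMO-RL's raw observations are vectors) and that the environment follows the standard 5-tuple step API:

```python
# Sketch: tabular Q-learning loop with epsilon-greedy exploration.
import random
from collections import defaultdict

def q_learning(env, episodes=100, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(lambda: [0.0] * env.action_space.n)
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy: explore sometimes, otherwise take the greedy action
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = max(range(env.action_space.n), key=lambda a: q[state][a])
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # standard Q-learning temporal-difference update
            best_next = max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q
```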
• Built a scalable search and summarization system by scraping 50,000 Wikipedia summaries, indexing them with Apache Solr, and deploying a Flask server with a React frontend on GCP.
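A sketch of the index-and-search path using pysolr behind a Flask route; the Solr core name "wiki" and the field names are assumptions, and the real schema may differ:

```python
# Sketch: index Wikipedia summaries into Solr and expose a /search endpoint.
import pysolr
from flask import Flask, jsonify, request

solr = pysolr.Solr("http://localhost:8983/solr/wiki", always_commit=True)
app = Flask(__name__)

def index_summaries(docs):
    """docs: list of {'id': ..., 'title': ..., 'summary': ...} dicts."""
    solr.add(docs)

@app.route("/search")
def search():
    query = request.args.get("q", "*:*")
    results = solr.search(f"summary:({query})", rows=10)
    return jsonify([{"title": r["title"], "summary": r["summary"]} for r in results])
```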
• Designed an intelligent response pipeline using zero-shot classification for message categorization, routing casual conversation to a BlenderBot-based chatbot and summarizing Solr query results with T5.
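A hedged sketch of that routing logic: a zero-shot classifier decides whether a message is casual chat or an information request, then dispatches to BlenderBot or to the Solr-plus-T5 path. The model names, candidate labels, and the `solr_lookup` helper are illustrative assumptions:

```python
# Sketch: classify the message, then route to chitchat or search+summarize.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
chatbot = pipeline("text2text-generation", model="facebook/blenderbot-400M-distill")
summarizer = pipeline("summarization", model="t5-small")

def respond(message, solr_lookup):
    labels = ["casual conversation", "information request"]
    top = classifier(message, candidate_labels=labels)["labels"][0]
    if top == "casual conversation":
        return chatbot(message)[0]["generated_text"]
    # otherwise summarize the concatenated Solr hits for the query
    hits = solr_lookup(message)  # hypothetical helper returning matched summaries
    return summarizer(" ".join(hits), max_length=80)[0]["summary_text"]
```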
• Developed a C++ program that applies a 3×3 kernel to a 2D float array stored as a flattened 1D vector, using OpenMP multithreading to achieve 70% parallel efficiency on 64 processors in an HPC environment.
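The project itself is C++/OpenMP; the Python/NumPy analogue below just illustrates the core flat-index arithmetic (element (r, c) lives at buffer[r*W + c]), with a comment marking the outer loop an OpenMP `parallel for` would split across threads:

```python
# Python analogue of the C++/OpenMP stencil: 3x3 kernel over a flat 1-D buffer.
import numpy as np

def convolve_flat(buf, H, W, kernel):
    """Apply a 3x3 kernel to a flattened HxW float array; zero-padded borders."""
    out = np.zeros(H * W, dtype=np.float32)
    for r in range(H):                  # in C++, OpenMP parallelizes this loop
        for c in range(W):
            acc = 0.0
            for kr in range(-1, 2):
                for kc in range(-1, 2):
                    rr, cc = r + kr, c + kc
                    if 0 <= rr < H and 0 <= cc < W:
                        acc += kernel[kr + 1][kc + 1] * buf[rr * W + cc]
            out[r * W + c] = acc
    return out
```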
Dynamically visualizes 618 datasets from the WHO API as line charts; built with the Next.js 13 framework.
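The front end is Next.js; the Python sketch below only shows the shape of the data the charts presumably consume, assuming the public WHO GHO OData API is the source. The indicator code shown (life expectancy) is just an example:

```python
# Sketch: fetch one WHO GHO indicator as (country, year, value) tuples.
import requests

def fetch_indicator(code="WHOSIS_000001"):
    url = f"https://ghoapi.azureedge.net/api/{code}"
    data = requests.get(url, timeout=30).json()["value"]
    # each record carries a spatial dimension (country), a year, and a value
    return [(d["SpatialDim"], d["TimeDim"], d["NumericValue"]) for d in data]
```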