Atharva Vikas Jadhav

Research Engineer | NLP & Voice AI | Applied Machine Learning

Computer Science researcher specializing in low-resource speech synthesis, acoustic modeling, and end-to-end NLP pipelines. Passionate about building language technologies that adapt to human nuances.

About Me

I am an AI engineer and researcher driven by the challenge of building accessible, high-fidelity language technologies. By combining computational linguistics with production-ready machine learning, I design speech systems capable of handling the complexities of real-world audio, from heavily accented regional dialects to noisy acoustic environments.

My core expertise lies in acoustic modeling, fine-tuning generative TTS pipelines, and building multi-task architectures that maximize performance on limited data. My work spans extracting phonetic boundaries for language-learning applications, aligning deep speech embeddings to study human cognition, and stitching together real-time ASR-to-TTS inference graphs. In each case I start from the science of speech processing, making sure the models I build are theoretically sound before scaling them for deployment.

Work Experience

University at Buffalo

Research Assistant – Dialectal Speech Processing

Sep 2025 – May 2026
  • Architected an end-to-end speech-to-speech (S2S) conversational tool for clinical simulations, helping students practice colloquial Puerto Rican Spanish.
  • Addressed low-resource data scarcity by building staging pipelines (forced alignment) and fine-tuning Coqui XTTS-v2 and Gemma models on HPC clusters.
  • Deployed the streaming ASR → LLM → TTS pipeline on OpenStack infrastructure using LiveKit and WebRTC.

Research Assistant – Cognitive Modeling & Acoustic Analysis

Mar 2025 – May 2026
  • Engineered a cognitive model simulating human word recognition using Dynamic Time Warping (DTW) and probabilistic updating to resolve acoustic ambiguity.
  • Extracted precise phonetic boundaries via the Montreal Forced Aligner (MFA) and generated contextual word embeddings using Wav2Vec2 and HuBERT.
  • Visualized high-dimensional acoustic clusters with t-SNE projections to evaluate model focus and linguistic trends across transformer layers.

Research Assistant – Applied NLP & Conversational Agents

Mar 2025 – May 2026
  • Developed "SPICA," an Augmentative and Alternative Communication (AAC) framework, by fine-tuning Llama and Gemma LLMs to dynamically retrieve user knowledge.
  • Architected end-to-end data pipelines involving web scraping and LLM-assisted annotation to construct a 4.1-hour South Asian rhythmic dataset.
  • Trained ML classification models reaching 70% accuracy on rhythmic analysis. The rhythm work and SPICA led to accepted co-authored papers at AAAI EAIM and ACM IUI, respectively.

Crimsonbeans Ltd

Software Engineer

Dec 2021 – Apr 2024
  • Integrated Silero VAD for ultra-low-latency speech endpoint detection on embedded devices. Engineered state-trigger transmissions via Bluetooth to mobile clients, resolving critical cross-talk issues across concurrent audio streams.
  • Developed production-grade PyTorch LSTM predictive models and Flask APIs to forecast high-frequency supply and demand vectors, reducing consumer budget overruns by 8%.
  • Architected and scaled a distributed microservice notification framework on Kubernetes and GCP, maintaining 98% uptime for thousands of daily data-driven alerts.

Research & Publications

Modeling Cohort and Rhyme Competition in the Visual World Paradigm

Psychonomic Society 2026

Derived real-time probabilistic estimates of acoustic ambiguity as speech unfolds, using DTW-aligned Wav2Vec2.0 XLS-R embeddings and Bayesian updating. Simulated human looking behavior and reproduced classic cohort and rhyme eye-movement patterns across both clean and noisy real-world speech tokens.

Manuscript Under Review

Scalable and Personalized Conversational Agent Framework for AAC Users

ACM IUI 2026

Built an end-to-end ASR-to-TTS conversational framework combining Deepgram Nova 3 and Google TTS via structured SSML orchestration. Fine-tuned LLMs on HPC clusters to evaluate 200 synthetic user profiles, advancing personalized Augmentative and Alternative Communication (AAC).

Low-Resource Rhythm Learning of South Asian Rhythmic Structures

AAAI EAIM 2026

Engineered an end-to-end data acquisition and machine learning pipeline to build a 4.1-hour Nattuvangam acoustic dataset, achieving 70% classification accuracy on complex temporal patterns.

Key Projects

Multilingual Speech Emotion Recognition

Thesis Defended

Defended a master's thesis on a unified multilingual speech emotion recognition (SER) model. Added an auxiliary ASR multi-task learning objective that guards against in-domain overfitting without adding inference-time overhead. Achieved zero-shot cross-lingual generalization to English, Italian, and Farsi targets under tightly constrained training data.

Manuscript under conference review
PyTorch | Multi-Task Learning | Acoustic Modeling

Caribbean Spanish S2S Framework

Interactive Tool

Developed, end to end, a streaming Puerto Rican dialect speech-to-speech simulator that helps clinical nursing and pharmacy cohorts overcome regional language barriers. Features specialized TTS alignment and LLM conversational inversion.

Traffic Light Automation via RL

Benchmarked Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) agents in the SumoRL microscopic traffic simulation environment. Optimized intersection phase vectors to dynamically reduce emergency vehicle idle delays by 45%.

SumoRL | PPO/DQN

Wikipedia Intelligent Search

Built a RAG extraction and search platform indexing 50,000+ localized documents on a distributed Apache Solr topology running on GCP instances. Coupled the ingestion system with T5 models used zero-shot to generate context-aware summaries.

RAG | Apache Solr | GCP Cloud

Technical Skills

Speech & Voice AI

Wav2Vec2 / HuBERT | Coqui XTTS-v2 | Deepgram Nova 3 | Google TTS API | SSML Orchestration | Acoustic Alignment

Machine Learning & NLP

PyTorch | Transformers | Hugging Face | LLM Fine-tuning | RAG Systems | Multi-task Learning

Languages & Frameworks

Python | JavaScript (React) | C++ | SQL | Flask

Systems & Infrastructure

WebRTC / LiveKit | Kubernetes | Maze (OpenStack) | GCP | Apache Solr

Education

University at Buffalo, SUNY

Aug 2024 – May 2026

MS in Computer Science & Engineering (Research Track) | GPA: 3.81/4.0

Courses: Reinforcement Learning, NLP, Information Retrieval, Computational Linguistics, Parallel & Distributed Processing

Symbiosis International University

MS in Computer Application, AI & Data Science | GPA: 8.83/10.0

2020 – 2022

Bachelor in Computer Application, Software Engineering | GPA: 7.93/10.0

2017 – 2020