About Me

I am Anuj Diwan (अनुज दिवाण), a second-year PhD student in the Computer Science Department at the University of Texas at Austin. I am fortunate to be co-advised by Prof. David Harwath and Prof. Eunsol Choi, and I am part of the UT NLP group. For Summer 2023, I am a Student Researcher on the Brain team at Google DeepMind, mentored by Dr. Yu Zhang and Ankur Bapna. In 2022, I spent a wonderful summer at FAIR (Meta AI), mentored by Dr. Abdelrahman Mohamed.

My research interests are in Speech Processing and Natural Language Processing. I’m currently working on speech-to-speech translation and parameter-efficient continual learning for speech tasks. I’ve worked on vision-language compositionality and multilingual+code-switching speech recognition in the past.

I received my undergraduate B.Tech degree (with Honors) in Computer Science and a Minor in Statistics from IIT Bombay in 2021, where I had a great time working with Prof. Preethi Jyothi and Prof. Sunita Sarawagi. I interned at Adobe Research India in Summer 2020 and at ICTEAM/INMA, UCLouvain, Belgium in Summer 2019.

In my spare time, I enjoy reading, quizzing, solving word games, and watching the latest movies and TV shows.

Interests

  • Speech Recognition
  • Natural Language Processing
  • Artificial Intelligence
  • Machine Learning
Education

  • PhD in Computer Science, 2021–present

    University of Texas at Austin

  • B.Tech in Computer Science and Engineering with Honors, 2017–2021

    Indian Institute of Technology Bombay

  • Minor in Statistics, 2017–2021

    Indian Institute of Technology Bombay


Curriculum Vitae (Last updated Nov 2022)



Publications

(2023). Unit-based Speech-to-Speech Translation Without Parallel Data. Preprint.


(2022). Continual Learning for On-Device Speech Recognition using Disentangled Conformers. ICASSP 2023.


(2022). Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality. EMNLP 2022.


(2022). Zero-shot Video Moment Retrieval With Off-the-Shelf Models. TL4NLP@NeurIPS 2022.


(2021). Reduce and Reconstruct: ASR for Low-Resource Phonetic Languages. Interspeech 2021.


(2021). Low Resource ASR: The surprising effectiveness of High Resource Transliteration. Interspeech 2021.


(2021). Multilingual and code-switching ASR challenges for low resource Indian languages. Interspeech 2021.



Experience

Student Researcher, Google DeepMind (Summer 2023)
May 2023 – Present, Mountain View, CA
AI Research Intern, FAIR (Meta AI) (Summer 2022)
May 2022 – Dec 2022, Seattle, WA
Research Intern, Adobe Research India (Summer 2020)
Apr 2020 – Jul 2020, Bangalore, India
Research Intern, ICTEAM/INMA, UCLouvain (Summer 2019)
May 2019 – Jul 2019, Louvain-la-Neuve, Belgium