Anuj Diwan

I am a fourth-year PhD student in the Computer Science Department at the University of Texas at Austin. I am fortunate to be co-advised by Prof. David Harwath and Prof. Eunsol Choi. I am part of the broader UT NLP group.

My research interests are in the fields of Speech and Natural Language Processing. My current research focuses on stylistic and multilingual speech generation.

I received my B.Tech degree (with Honours) in Computer Science and a Minor in Statistics from IIT Bombay in 2021, where I had a wonderful time working with Prof. Preethi Jyothi and Prof. Sunita Sarawagi.

I have also spent some time interning at Google DeepMind (Summer 2023, with Yu Zhang and Ankur Bapna), Meta AI (Summer 2022, with Abdelrahman Mohamed, Wei-Ning Hsu and Ching-Feng Yeh), Adobe Research India (Summer 2020) and UCLouvain (Summer 2019).

In my spare time, I enjoy reading, quizzing, solving word games, and watching the latest movies and TV shows.

Publications

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, Andy T. Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, et al.

Preprint

Textless Speech-to-Speech Translation With Limited Parallel Data

Anuj Diwan, Anirudh Srinivasan, David Harwath, Eunsol Choi

EMNLP 2024 Findings

Continual Learning for On-Device Speech Recognition using Disentangled Conformers

Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed

ICASSP 2023

Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality

Anuj Diwan*, Layne Berry*, Eunsol Choi, David Harwath, Kyle Mahowald (*Equal contribution)

EMNLP 2022

Zero-shot Video Moment Retrieval With Off-the-Shelf Models

Anuj Diwan*, Puyuan Peng*, Raymond J. Mooney (*Equal contribution)

TL4NLP@NeurIPS 2022

Reduce and Reconstruct: ASR for Low-Resource Phonetic Languages

Anuj Diwan, Preethi Jyothi

Interspeech 2021

Low Resource ASR: The surprising effectiveness of High Resource Transliteration

Shreya Khare*, Ashish Mittal*, Anuj Diwan*, Sunita Sarawagi, Preethi Jyothi, Samarth Bharadwaj (*Equal contribution)

Interspeech 2021

MUCS 2021: Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages

Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan

Interspeech 2021

Education

2021-present

PhD in Computer Science

University of Texas at Austin

Advisors: Prof. David Harwath and Prof. Eunsol Choi

2017-2021

B.Tech in Computer Science and Engineering (with Honours)

Indian Institute of Technology Bombay

Minor in Statistics

Experience

Summer 2023

Student Researcher

Google DeepMind, Mountain View, CA

Worked on multilingual speech generation.

Summer 2022

AI Research Intern

FAIR, Meta AI, Seattle, WA

Worked on continual learning for on-device speech recognition.

Summer 2020

Research Intern

Adobe Research, Bangalore, India

Summer 2019

Research Intern

UCLouvain, Belgium