Michele Mancusi

Senior Research Scientist, PhD

Sony

Hello world! 👋

I’m Michele Mancusi, a Senior Research Scientist at Sony. My work focuses on deep learning for generative models in speech, audio, and music, using Large Language Models (LLMs) and diffusion models to push the boundaries of what’s possible in audio technology.

Before joining Sony, I gained valuable experience as an intern at Microsoft and Musixmatch. At Microsoft, I worked on deep learning for unsupervised speech separation, and at Musixmatch, I focused on deep learning for singing voice detection.

I earned my Ph.D. from Sapienza University of Rome under the supervision of Prof. Emanuele RodolĂ  as a member of the Gladia research group. My doctoral research centered on music generation, source separation, and Natural Language Processing (NLP), contributing to advancements in the field of generative AI.

Interests
  • Deep Learning
  • Signal Processing
  • Generative AI
  • Music Generation
  • Source Separation
  • NLP
  • Speech Synthesis
Education

  • Ph.D., Sapienza University of Rome
Work Experience

Sony
Senior Research Scientist
Apr 2024 – Present · Stuttgart, Germany
  • Research on deep learning generative models for speech and audio, using LLMs and diffusion models.

Sony
Visiting Research Scientist
Nov 2023 – Jan 2024 · Stuttgart, Germany
  • Worked with Dr. Stefan Uhlich in the AI, Speech and Sound Group. Conducted research on deep learning for effects removal and timbre transfer with diffusion models.

Microsoft
Research Scientist Intern
Jun 2023 – Sep 2023 · Redmond, Washington, USA
  • Worked with Dr. Sebastian Braun in the Audio and Acoustics Research Group. Conducted research on deep learning for unsupervised speech separation.

Musixmatch
Data Scientist Intern
Sep 2022 – Mar 2023 · Bologna, Italy
  • Worked with Dr. Loreto Parisi in the AI Team. Conducted research on deep learning for singing voice detection.

Publications

Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer
ICASSP 2025
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
ICASSP 2025
High-Resolution Speech Restoration with Latent Diffusion Model
ICASSP 2025
Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
Oral (top 1%) @ ICLR 2024
Accelerating transformer inference for translation via parallel decoding
ACL 2023

Achievements

  • First-author paper selected among the top 1% submissions for an oral presentation at ICLR 2024
  • Awarded €50,000 in AWS credits for the best generative AI research project
  • Awarded €20,000 for the best machine translation research project
  • Recognized as one of the top doctoral research projects and awarded research funding

Academic Experience

  • Lectured in the Deep Learning and Applied AI MSc course and mentored students on their Master’s theses
  • Delivered a guest lecture on the Latent Autoregressive Source Separation paper at the invitation of Prof. Ronald Coifman

Attended

AAAI 2023: The 37th AAAI Conference on Artificial Intelligence
4th International Summer School on AI and Games
IRDTA: 4th International School on Deep Learning
ACDL: 3rd Advanced Course on Data Science & Machine Learning

Skills

  • Lightning
  • Weights & Biases
  • PyTorch
  • Hydra
  • AWS SageMaker
  • Azure ML
  • Slurm
  • HTCondor

Contact