Théo Lepage

Ph.D. student in Machine Learning → learning robust representations for speaker & language recognition

Paris, Île-de-France, France
637 followers · 500+ connections

Experience

  • Research Scientist Intern

    Siemens Healthineers

    7 months

    Princeton, New Jersey, United States

    • Focused on state-of-the-art deep learning models for MR image enhancement (denoising and super-resolution)
    • Designed a CNN architecture that leverages the attention mechanism of Vision Transformers and recovers more detail than the solution currently used in the product

  • Research Student

    LSE - EPITA Systems Laboratory

    2 years 1 month

    Paris Region, France

    • Worked on self-supervised methods applied to speaker and language recognition while giving monthly "lightning" talks on my progress (supervised by Dr. Réda Dehak)
    • Developed a label-efficient non-contrastive speaker verification model that outperforms its supervised counterpart when fine-tuned with only 2% of labeled data
    • Our work led to a publication and an oral presentation at INTERSPEECH 2022 (one of the top conferences in the field)

  • Software Developer Intern

    CNRS - Centre national de la recherche scientifique

    5 months

    Paris, Île-de-France, France

    • Contributed to real-time digital holography software (C++ / CUDA) used for retinal blood flow analysis in a medical setting
    • Our work resulted in a 20x speedup (from 500 to 10,000 FPS), which substantially improved the contrast and quality of output images
    • Our refactoring and addition of unit tests improved stability and allowed the project to be released as open source
    • Founding member of the 'Digital Holography' association, created to sustain the development of the software

  • Teaching Assistant

    EPITA: Ecole d'Ingénieur en Informatique

    1 year 1 month

    Paris Region, France

    • Taught Unix concepts and the C and Rust programming languages to undergraduates through weekly graded practicals

Education

  • Sorbonne Université

    Doctor of Philosophy (Ph.D.), Artificial Intelligence

    • Conducting research on "Learning speech and speaker representations for robust speaker and language recognition"
    • Supported by the French ANR 'APATE' project (Forensic Deepfakes Detection Toolbox)
    • Supervised by Dr. Réda Dehak and Prof. Thierry Géraud (LRE-EPITA)

  • École pour l'informatique et les Techniques Avancées (EPITA)

    Master of Engineering (M.Eng.), Computer Science

    • International section
    • Exchange semester in Spring 2019 at California State University Monterey Bay (CSUMB)
    • Signal processing and machine learning (IMAGE major) + scientific research specialization (RDI major)

Publications

  • Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations

    Odyssey 2024

    Self-Supervised Learning (SSL) frameworks became the standard for learning robust class representations by benefiting from large unlabeled datasets. For Speaker Verification (SV), most SSL systems rely on contrastive-based loss functions. We explore different ways to improve the performance of these techniques by revisiting the NT-Xent contrastive loss. Our main contribution is the definition of the NT-Xent-AM loss and the study of the importance of Additive Margin (AM) in SimCLR and MoCo SSL methods to further separate positive from negative pairs. Despite class collisions, we show that AM enhances the compactness of same-speaker embeddings and reduces the number of false negatives and false positives on SV. Additionally, we demonstrate the effectiveness of the symmetric contrastive loss, which provides more supervision for the SSL task. Implementing these two modifications to SimCLR improves performance and results in 7.85% EER on VoxCeleb1-O, outperforming other equivalent methods.

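    To make the loss concrete, here is a minimal PyTorch sketch of an NT-Xent loss with an additive margin on the positive pairs, in the spirit of the NT-Xent-AM objective described above. The function name, temperature, and margin values are illustrative assumptions, not the paper's code.

    ```python
    import torch
    import torch.nn.functional as F

    def nt_xent_am(z1: torch.Tensor, z2: torch.Tensor,
                   tau: float = 0.07, margin: float = 0.1) -> torch.Tensor:
        """z1, z2: (N, D) embeddings of two augmented views of N utterances."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        sim = z1 @ z2.t()  # (N, N) cosine similarities; positives on the diagonal
        eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
        # Additive margin: subtract m from positive similarities only, which
        # pushes same-speaker embeddings to be more compact before the softmax.
        logits = torch.where(eye, sim - margin, sim) / tau
        targets = torch.arange(sim.size(0), device=sim.device)
        return F.cross_entropy(logits, targets)
    ```

    The symmetric contrastive loss mentioned in the abstract can then be sketched as averaging both anchor directions, e.g. 0.5 * (nt_xent_am(z1, z2) + nt_xent_am(z2, z1)).
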
  • Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models

    INTERSPEECH 2024

    Recent advancements in Self-Supervised Learning (SSL) have shown promising results in Speaker Verification (SV). However, narrowing the performance gap with supervised systems remains an ongoing challenge. Several studies have observed that speech representations from large-scale ASR models contain valuable speaker information. This work explores the limitations of fine-tuning these models for SV using an SSL contrastive objective in an end-to-end approach. Then, we propose a framework to learn speaker representations in an SSL context by fine-tuning a pre-trained WavLM with a supervised loss using pseudo-labels. Initial pseudo-labels are derived from an SSL DINO-based model and are iteratively refined by clustering the model embeddings. Our method achieves 0.99% EER on VoxCeleb1-O, establishing the new state-of-the-art on self-supervised SV. As this performance is close to our supervised baseline of 0.94% EER, this contribution is a step towards supervised performance on SV with SSL.

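    As a rough illustration of the iterative refinement loop described above, the sketch below clusters embeddings to obtain pseudo speaker labels and alternates with supervised fine-tuning. Here k-means stands in for whatever clustering algorithm is actually used, and finetune_and_embed is a hypothetical placeholder for fine-tuning WavLM with a supervised loss and re-extracting embeddings.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def finetune_and_embed(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray:
        # Placeholder: the real pipeline would fine-tune the pre-trained model
        # (e.g., WavLM) on the pseudo-labels and recompute utterance embeddings.
        return embeddings

    def refine_pseudo_labels(embeddings: np.ndarray, n_clusters: int,
                             n_iterations: int = 3) -> np.ndarray:
        """embeddings: (N, D) utterance embeddings from an initial SSL model."""
        labels = np.zeros(len(embeddings), dtype=int)
        for _ in range(n_iterations):
            # 1. Cluster current embeddings into pseudo speaker identities.
            labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
            # 2. Fine-tune with a supervised loss on the pseudo-labels, then
            #    re-extract embeddings for the next clustering round.
            embeddings = finetune_and_embed(embeddings, labels)
        return labels
    ```
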
  • Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification

    INTERSPEECH 2023

    Most state-of-the-art self-supervised speaker verification systems rely on a contrastive-based objective function to learn speaker representations from unlabeled speech data. We explore different ways to improve the performance of these methods by: (1) revisiting how positive and negative pairs are sampled through a "symmetric" formulation of the contrastive loss; (2) introducing margins similar to AM-Softmax and AAM-Softmax that have been widely adopted in the supervised setting. We demonstrate the effectiveness of the symmetric contrastive loss which provides more supervision for the self-supervised task. Moreover, we show that Additive Margin and Additive Angular Margin allow reducing the overall number of false negatives and false positives by improving speaker separability. Finally, by combining both techniques and training a larger model we achieve 7.50% EER and 0.5804 minDCF on the VoxCeleb1 test set, which outperforms other contrastive self-supervised methods on speaker verification.

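    A compact sketch of the two modifications above, assuming a SimCLR-style setup: an additive angular margin (cos θ becomes cos(θ + m)) applied to positive pairs, and the symmetric formulation that averages the loss over both anchor directions. All names and hyperparameter values are assumptions.

    ```python
    import torch
    import torch.nn.functional as F

    def symmetric_contrastive_aam(z1: torch.Tensor, z2: torch.Tensor,
                                  tau: float = 0.07, margin: float = 0.2) -> torch.Tensor:
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        cos = (z1 @ z2.t()).clamp(-1 + 1e-7, 1 - 1e-7)  # (N, N) cosine similarities
        n = cos.size(0)
        eye = torch.eye(n, dtype=torch.bool, device=cos.device)
        # Additive angular margin on positives: cos(theta) -> cos(theta + m),
        # a stricter penalty than the plain additive margin cos(theta) - m.
        cos_m = torch.cos(torch.acos(cos) + margin)
        logits = torch.where(eye, cos_m, cos) / tau
        targets = torch.arange(n, device=cos.device)
        # Symmetric formulation: both views serve as anchors, losses averaged.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))
    ```
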
  • Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning

    INTERSPEECH 2022

    State-of-the-art speaker verification systems are inherently dependent on some kind of human supervision as they are trained on massive amounts of labeled data. However, manually annotating utterances is slow, expensive and not scalable to the amount of data available today. In this study, we explore self-supervised learning for speaker verification by learning representations directly from raw audio. The objective is to produce robust speaker embeddings that have small intra-speaker and large inter-speaker variance. Our approach is based on recent information maximization learning frameworks and an intensive data augmentation pre-processing step. We evaluate the ability of these methods to work without contrastive samples before showing that they achieve better performance when combined with a contrastive loss. Furthermore, we conduct experiments to show that our method reaches competitive results compared to existing techniques and can get better performances compared to a supervised baseline when fine-tuned with a small portion of labeled data.

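    The "information maximization learning frameworks" referenced above are exemplified by VICReg-style objectives combining invariance, variance, and covariance terms; the hedged sketch below shows that general shape, with loss weights and details that may differ from the paper's exact formulation.

    ```python
    import torch
    import torch.nn.functional as F

    def vicreg_style_loss(z1: torch.Tensor, z2: torch.Tensor,
                          w_inv: float = 25.0, w_var: float = 25.0,
                          w_cov: float = 1.0) -> torch.Tensor:
        n, d = z1.shape
        inv = F.mse_loss(z1, z2)  # invariance: two views of one utterance agree

        def variance(z):
            # Hinge keeps each embedding dimension's std above 1 (anti-collapse).
            std = torch.sqrt(z.var(dim=0) + 1e-4)
            return torch.relu(1.0 - std).mean()

        def covariance(z):
            # Penalize off-diagonal covariance to decorrelate dimensions.
            z = z - z.mean(dim=0)
            cov = (z.t() @ z) / (n - 1)
            off_diag = cov - torch.diag(torch.diagonal(cov))
            return (off_diag ** 2).sum() / d

        return (w_inv * inv
                + w_var * (variance(z1) + variance(z2))
                + w_cov * (covariance(z1) + covariance(z2)))
    ```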

Languages

  • English

    Professional working proficiency

  • French

    Native or bilingual proficiency
