PhD student · machine learning & computer vision

Killian Steunou

I'm a PhD student at Institut Polytechnique de Paris and Moments Lab, where I work on efficient omni-modal learning for generalized video understanding — in short: how to make video models that see enough while computing less.

Before the PhD, I studied mathematics and statistics (Toulouse School of Economics), then vision and learning (Master MVA, ENS Paris-Saclay), with research internships at Idemia, CLS, and JoliBrain. I care about reproducible research, open source, and models that survive contact with the real world.

Paris, France

Started my industrial PhD on efficient omni-modal learning for generalized video understanding, joint between Institut Polytechnique de Paris and Moments Lab.

Joined Idemia as a deep learning research intern, working on SAM 2 for end-to-end multi-object tracking.

Completed the MVA master's (Mathématiques, Vision, Apprentissage) at ENS Paris-Saclay.

also on Google Scholar

full details in the CV

Experience

PhD student — Moments Lab

Nov 2025 – present

Efficient omni-modal learning for generalized video understanding, in an industrial research setting.

Education

PhD, machine learning — Institut Polytechnique de Paris

Nov 2025 – present

Master MVA (Mathématiques, Vision, Apprentissage) — ENS Paris-Saclay

2024 – 2025

Optimal transport, convex optimization, deep learning, graphical & generative models.

M1 applied mathematics & statistics — Toulouse School of Economics

2023 – 2024

Exchange semester — University of Copenhagen

2022 – 2023

NLP, energy economics, blockchain business development.

Double bachelor, applied mathematics & economics — Toulouse School of Economics

2019 – 2022

more on GitHub

Video Background Removal

Automatic video background removal built on MobileSAM, with an interactive demo.

segmentation · video

joliGEN (contributor)

Integrated ControlNet-inspired edge controls and SAM-based masking into JoliBrain's generative toolkit.

generative AI · open source

Video Object Detection

Zero-shot object detection in videos with OWL-ViT, driven by natural-language prompts.

detection · zero-shot

Nail Bite Detection

A macOS menu-bar app that spots nail biting from the webcam in real time, to make the habit visible.

macOS · computer vision

Audio Visual Transcription

Fast subtitling for audio and video with OpenAI Whisper, behind a simple interface.

speech · whisper

MathViz

A Streamlit app visualizing ideas from mathematics, statistics, ML, and algorithms.

education · visualization

Research reports (MVA & TSE coursework)

Score-based generative networks for large-scale optimal transport

SCONES reproduction · optimal transport

Test-time training with masked autoencoders, online

TTT-MAE extension · test-time adaptation

An end-to-end transformer model for 3D object detection

3DETR reproduction · 3D vision

Are generative classifiers more robust to adversarial attacks?

robustness · generative classifiers

Toxic gas characterization under humidity-driven domain shift

multi-task learning · adversarial adaptation

Convergence of SGD for training with sliced Wasserstein losses

optimization · generative modeling

Happy to talk about efficient vision systems, multimodal learning, evaluation, or research collaborations. The fastest way to reach me is email.

contact@killian-steunou.com
