Industrial PhD / Computer Vision / Multimodal Learning

Killian Steunou

Building efficient visual systems for the multimodal world.

I am an industrial PhD student at Institut Polytechnique de Paris and Moments Lab, working on efficient omni-modal learning for generalized video understanding.

PhD: Industry and academia, since November 2025
Vision: Video understanding, tracking, representation learning
Open: Reproducible research and transparent engineering
Current thesis

Efficient omni-modal learning for generalized video understanding.

Research themes

Efficiency, multimodal learning, tracking, and deployment-aware model design.

Working style

Scientific rigor, clear writing, reproducible pipelines, and pragmatic engineering.

About

A research practice shaped by efficiency, clarity, and deployment constraints.

I am currently pursuing an industrial PhD in machine learning at Institut Polytechnique de Paris and Moments Lab. The work sits at the intersection of modern multimodal models and the practical limits that define whether they remain useful in the real world.

I am especially drawn to problems in computer vision and deep learning because visual perception feels foundational to intelligence, both human and artificial. I like models that can scale, adapt, and still remain legible enough to improve.

Current focus

Efficient omni-modal learning for generalized video understanding, with an emphasis on scalable training and practical inference.

Research values

Open-source practices, transparent experimentation, and reproducible pipelines that other researchers can meaningfully build on.

Experience

Industrial research, product-minded experimentation, and model-building in production settings.

Machine Learning PhD Student

Moments Lab

November 2025 - Present

  • Researching efficient omni-modal learning strategies for generalized video understanding.
  • Working on problems where representation quality, inference cost, and deployment realism all matter.

Deep Learning Research Engineer Intern

Idemia

April 2025 - October 2025

  • Explored how SAM 2 can strengthen end-to-end multi-object tracking systems.
  • Built segmentation-aware and proposal-aware tracking variants around MOTIP-style models.

AI Research Intern

Collecte Localisation Satellites

April 2024 - August 2024

  • Developed tooling to fine-tune foundation models on remote sensing datasets.
  • Benchmarked segmentation performance and summarized the rapidly changing literature.

Machine Learning Engineer Intern

JoliBrain

February 2023 - July 2023

  • Integrated ControlNet-style controls and zero-shot detection models into joliGEN.
  • Helped ship documentation and product-facing material around the tool.

Software Developer Intern

French Ministry of Agriculture

May 2022 - August 2022

  • Created agreste, an R package and R Shiny workflow for statistical publication automation.
  • Handled the project end to end, from requirements gathering to delivery.

Education

A mathematics and statistics foundation, refined through machine learning and vision research.

Machine Learning PhD

Institut Polytechnique de Paris

November 2025 - Present

  • Research topic: efficient omni-modal learning for generalized video understanding.

Master 2 Mathématiques, Vision, Apprentissage

ENS Paris-Saclay

September 2024 - March 2025

  • Optimal transport, convex optimization, deep learning, graphical models, and generative models.

Master 1 Applied Mathematics and Statistics

Toulouse School of Economics

September 2023 - April 2024

  • Econometrics, probability, optimization for ML, time series, and data science in Python.

Gap Year

University of Copenhagen

September 2022 - January 2023

  • Natural language processing, blockchain business development, energy economics, and tax policy.

Double Bachelor: Applied Mathematics and Economics

Toulouse School of Economics

September 2019 - April 2022

  • Linear algebra, analysis, statistics, econometrics, optimization, programming, and economics.

Research

Selected work across adversarial robustness, generative modeling, test-time adaptation, and 3D vision.

Sparse Representations Improve Adversarial Robustness of Neural Network Classifiers

We show, theoretically and empirically, that classifiers built on sparse PCA (SPCA) can be more robust than PCA-based alternatives under adversarial attack.

Preprint Code

Score-Based Generative Neural Networks for Large-Scale Optimal Transport

I reproduced the SCONES framework and evaluated how score-based modeling changes the behavior of regularized transport on synthetic distributions.

Report Code

Test-Time Training with Masked Autoencoders

I extended TTT-MAE with an online setting and studied how adaptation behaves when distribution shift keeps evolving at inference time.

Report Code

An End-to-End Transformer Model for 3D Object Detection

We reproduced 3DETR on SUN RGB-D and explored how a lean transformer detector behaves when extended with RGB information.

Report

Projects

Tools, experiments, and interfaces that turn research ideas into usable artifacts.

macOS, Computer vision

Nail Bite Detection App

A macOS menu bar app that detects nail biting from your webcam in real time to make the habit visible and measurable.

Whisper, Speech

Audio Visual Transcription

A tool for quickly subtitling audio and video content with OpenAI Whisper, exposed through a simple demo interface.

Segmentation, Video

Video Background Removal

A background removal workflow based on MobileSAM, designed to automatically segment and clean video streams.

Detection, Zero-shot

Video Object Detection

An implementation of OWL-ViT for zero-shot object detection in videos using natural-language prompts.

Open source, Generative AI

Contribution to joliGEN

I integrated ControlNet-inspired edge controls and SAM-based masking into joliGEN, then helped improve its documentation.

Education, Visualization

MathViz

A Streamlit application that visualizes ideas from mathematics, statistics, machine learning, and algorithms.

Graph, Scraping

Wikipedia Graph

A French-language concept graph built by scraping linked Wikipedia topics and exporting the result for graph exploration.

Language, Algorithms

Word Ladder Generator

A French word-ladder generator that computes the shortest sequence of one-letter edits between two words.

Writing

Long-form notes that turn research trends and evaluation tools into something easier to reason about.

Efficiency Follows Capability: A Decade of Video Understanding Research Trends

An analysis of how efficiency moved from the margins to the center of video understanding research from 2015 to 2025.

NLP Metrics for Image & Video Captioning: A Visual Guide

A worked visual walkthrough of n-grams, TF-IDF, BLEU, ROUGE-L, METEOR, and CIDEr for captioning research.

Demos

Interactive spaces for testing models and interfaces beyond the paper.

Audio Visual Transcription

Whisper-powered subtitling for audio and video with a fast demo workflow.

Video Background Removal

Interactive segmentation-based background removal for video.

MathViz

Visual explanations for mathematics, statistics, machine learning, and algorithms.

Contact

If you care about efficient vision systems, robust evaluation, or research that has to ship, we should talk.

I am always interested in thoughtful conversations around machine learning research, multimodal systems, visual understanding, and the engineering decisions that make models usable outside the lab.

Video understanding, Computer vision, Multimodal ML, Research collaborations