Research Projects

Score-Based Generative Neural Networks for Large-Scale Optimal Transport

Authors: Max Daniels, Tyler Maunu, Paul Hand

Original Paper Download Report GitHub Repository

Abstract: This paper investigates the integration of score-based generative models into regularized optimal transport (OT) for addressing the computational challenges of large-scale OT problems. Optimal transport often becomes computationally intractable for large datasets. Regularized OT, using methods like the Sinkhorn algorithm, introduces entropy-based regularization to enhance efficiency. However, these methods suffer from limitations, including averaging artifacts when deriving transport maps. To address this, \cite{daniels2022scorebasedgenerativeneuralnetworks} propose a hybrid approach combining score-based generative models with regularized OT. Their method follows the work of \cite{seguy2018largescaleoptimaltransportmapping}, using neural networks to approximate the optimal dual variables and employing Langevin dynamics \cite{langevinsampling} for conditional sampling, enabling direct sampling from the Sinkhorn coupling without averaging artifacts. The contributions of this work are both theoretical and practical: they build on existing formulations of $f$-divergence-regularized OT and introduce a novel numerical framework, SCONES (Sinkhorn Conditional Neural Sampling), for efficiently approximating and sampling OT plans. We validate their approach through experiments on toy distributions, comparing its performance against the barycentric projection method of \cite{seguy2018largescaleoptimaltransportmapping} in terms of accuracy and computational efficiency. Additionally, we analyze the effects of relevant hyperparameters to understand their influence on the method's performance. This work extends the applicability of OT methods to large-scale, high-dimensional problems in machine learning.

Test Time Training with Masked Autoencoders

Authors: Yossi Gandelsman, Yu Sun, Xinlei Chen, Alexei A Efros

Original Paper Download Report GitHub Repository

Abstract: Generalization under distribution shifts remains a critical challenge in computer vision. Test-time training (TTT) addresses this by adapting models dynamically during deployment, using self-supervised learning to improve performance on unseen test distributions. This report evaluates the TTT-MAE framework, which integrates Masked Autoencoders into TTT, for its effectiveness on the ImageNet-C benchmark. Through experiments, we confirm TTT-MAE's capability to enhance robustness under all corruption types. We analyze failure cases, hypothesizing a decoupling between the reconstruction and classification tasks. Additionally, an online variant of TTT-MAE, where encoder weights are not reset between test samples, demonstrates notable improvements, suggesting potential for cumulative adaptation. Despite resource constraints influencing the evaluation protocol, the findings provide critical insights into TTT-MAE's strengths and limitations, paving the way for future refinements to optimize its performance in real-world applications.
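To make the adaptation mechanism concrete, here is a minimal sketch of one test-time training step: a gradient step on a self-supervised masked-reconstruction loss for a single test sample. A toy linear map stands in for the ViT-based Masked Autoencoder; the model, mask, and learning rate are illustrative assumptions, not the TTT-MAE implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the MAE encoder/decoder: a single linear map W
# (hypothetical; the real model is a ViT-based Masked Autoencoder).
d = 16
W = rng.normal(scale=0.1, size=(d, d))

# Deterministic mask keeping 1 in 4 input coordinates "visible".
mask = (np.arange(d) % 4 == 0).astype(float)

def masked_recon_loss(W, x, mask):
    x_in = x * mask                   # zero out masked coordinates
    x_hat = W @ x_in                  # reconstruct the full input
    return np.mean((x_hat - x) ** 2)

def ttt_step(W, x, mask, lr=0.1):
    """One test-time training step: gradient descent on the
    self-supervised masked-reconstruction loss for this test sample."""
    x_in = x * mask
    x_hat = W @ x_in
    grad = 2.0 * np.outer(x_hat - x, x_in) / d   # dL/dW for the MSE above
    return W - lr * grad

x = rng.normal(size=d)                # one "test sample"
loss_before = masked_recon_loss(W, x, mask)
W = ttt_step(W, x, mask)
loss_after = masked_recon_loss(W, x, mask)
```

The online variant discussed above corresponds to carrying the updated `W` over to the next test sample instead of resetting it.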

Are Generative Classifiers More Robust to Adversarial Attacks?

Authors: Yingzhen Li, John Bradshaw, Yash Sharma

Original Paper Download Report GitHub Repository

Abstract: This report presents our work on the article \textit{Are Generative Classifiers More Robust to Adversarial Attacks?} \cite{li2019generativeclassifiersrobustadversarial}. We implemented the authors' experiments on MNIST \cite{deng2012mnist} and applied the methods to the German Traffic Sign Recognition Benchmark dataset \cite{gtsrb} under black-box adversarial attacks; we were unable to conclude whether generative classifiers are more robust to adversarial attacks than discriminative classifiers.
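For intuition about the attack side, the sketch below applies the classic fast gradient sign method (FGSM) to a toy logistic classifier: perturb the input by `eps` in the sign of the input gradient of the loss. The report itself uses black-box attacks, so this white-box example with made-up weights is purely illustrative:

```python
import numpy as np

# Hypothetical fixed logistic "classifier" parameters.
w = np.array([1.0, -2.0, 0.5])
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x, y):
    """Binary cross-entropy loss of the logistic model on (x, y)."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(x, y, eps=0.1):
    """FGSM: x + eps * sign(grad_x loss), pushing the loss up."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w              # closed-form input gradient for logistic loss
    return x + eps * np.sign(grad_x)

x = np.array([0.5, 0.2, -0.1])
y = 1.0
x_adv = fgsm(x, y)
```

A small signed step per coordinate suffices to raise the loss, which is why even simple gradient-based perturbations are a meaningful robustness probe.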

Toxic Gas Characterization

Authors: Killian Steunou

Abstract: This paper addresses the challenge of toxic gas identification and characterization using sensor data significantly influenced by varying humidity levels, which present a notable distribution shift between training and test datasets. To bridge this domain gap, we explore and benchmark several modeling strategies, including strategic data partitioning that emulates test-set humidity distributions, standard machine learning models (Random Forest and XGBoost), a two-stage classification-regression approach, and an end-to-end deep learning architecture called RAMTNet, specifically designed for multi-task regression. Additionally, we implement Unsupervised Domain Adaptation through adversarial training, encouraging domain-invariant feature representations. Our experiments reveal that careful simulation of test conditions and domain adaptation substantially mitigate overfitting caused by humidity-induced distribution shifts. The best-performing approach, a two-stage model combining classification and regression, achieves a weighted RMSE of $0.154256$, surpassing the provided baseline ($0.1567$).

Download Report GitHub Repository
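The evaluation metric above can be illustrated with a short sketch. Note that the weight vector and data here are hypothetical; the challenge's exact weighting scheme is not reproduced:

```python
import numpy as np

def weighted_rmse(y_true, y_pred, weights):
    """Weighted root-mean-square error: sqrt of the weighted mean
    of squared errors, with weights normalized by their sum."""
    weights = np.asarray(weights, dtype=float)
    sq = weights * (y_true - y_pred) ** 2
    return float(np.sqrt(sq.sum() / weights.sum()))

# Illustrative targets, predictions, and (hypothetical) per-target weights.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
w = np.array([1.0, 1.0, 2.0])
score = weighted_rmse(y_true, y_pred, w)
```

With all weights equal, this reduces to the ordinary RMSE.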

Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses

Authors: Eloi Tanguy

Original Paper Download Report GitHub Repository

Abstract: In this report, we verify the theoretical results of \cite{tanguy2024convergencesgdtrainingneural} on the convergence of neural networks trained with the sliced Wasserstein distance by generating 2D data points and FashionMNIST images using the proposed algorithm. We also study and experiment with the proposed alternative to SGD, called Noise Projected SGD (NPSGD).
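The sliced Wasserstein distance itself is straightforward to approximate by Monte Carlo: project both point clouds onto random directions and average the resulting 1D Wasserstein distances, each of which reduces to matching sorted samples. A minimal NumPy sketch for the 2-Wasserstein case (projection count and data are illustrative):

```python
import numpy as np

def sliced_wasserstein2(X, Y, n_proj=100, seed=0):
    """Monte-Carlo sliced 2-Wasserstein distance between two equal-size
    point clouds: average squared 1D W2 over random unit directions."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)       # random unit direction
        x1d = np.sort(X @ theta)             # 1D OT = match sorted samples
        y1d = np.sort(Y @ theta)
        total += np.mean((x1d - y1d) ** 2)
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
Y = rng.normal(size=(200, 2)) + np.array([3.0, 0.0])  # shifted cloud
```

Because each 1D projection is solved exactly by sorting, this objective is cheap enough to serve as a training loss, which is what makes the SGD convergence analysis practically relevant.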

An End-to-End Transformer Model for 3D Object Detection

Authors: Ishan Misra, Rohit Girdhar, Armand Joulin

Original Paper Download Report

Abstract: 3DETR \cite{3detr} is an end-to-end Transformer-based object detection model for 3D point clouds. It makes minimal modifications to the Transformer architecture of DETR \cite{detr}, a model for 2D images, to adapt it to 3D point data. At the same time, it is a single-stage model, meaning it does not require a pretrained model as DETR does. Compared with other detection methods for 3D data, it uses fewer hand-tuned hyperparameters, fewer hand-designed inductive biases, and non-parametric queries, making the model much easier to implement. In this report, we explain the method, evaluate the performance of 3DETR, study the impact of the test-time number of queries, and experiment with a version of the model that uses the RGB values of point clouds.
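The non-parametric queries mentioned above are seeded from the point cloud itself, via farthest point sampling rather than learned embeddings. A minimal NumPy sketch of that greedy sampling step (the point cloud and query count here are illustrative, not 3DETR's actual settings):

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy farthest point sampling: repeatedly pick the point
    farthest from the set already chosen, giving k well-spread seeds."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = [int(rng.integers(n))]                       # random start
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))                        # farthest point so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

pts = np.random.default_rng(2).normal(size=(500, 3))
idx = farthest_point_sampling(pts, 32)
query_points = pts[idx]                                   # seeds for the queries
```

Because the sampled locations cover the cloud evenly, varying `k` at test time directly changes how many candidate object locations the decoder attends to, which is the hyperparameter studied in the report.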