Authors: Max Daniels, Tyler Maunu, Paul Hand
Abstract: This paper investigates the integration of score-based generative models into regularized optimal transport (OT) to address the computational challenges of large-scale OT problems. Optimal transport often becomes computationally intractable for large datasets. Regularized OT, using methods like the Sinkhorn algorithm, introduces entropy-based regularization to improve efficiency, but these methods suffer from limitations, including averaging artifacts when deriving transport maps. To address this, \cite{daniels2022scorebasedgenerativeneuralnetworks} propose a hybrid approach combining score-based generative models with regularized OT. Their method follows the work of \cite{seguy2018largescaleoptimaltransportmapping}, using neural networks to approximate the optimal dual variables and employing Langevin dynamics \cite{langevinsampling} for conditional sampling, which enables direct sampling from the Sinkhorn coupling without averaging artifacts. The contributions of this work are both theoretical and practical: the authors build on existing formulations of $f$-divergence-regularized OT and introduce a novel numerical framework, SCONES (Sinkhorn Conditional Neural Sampling), for efficiently approximating and sampling OT plans. We validate their approach through experiments on toy distributions, comparing its performance against the barycentric projection method of \cite{seguy2018largescaleoptimaltransportmapping} in terms of accuracy and computational efficiency. Additionally, we analyze the effects of the relevant hyperparameters to understand their influence on the method's performance. This work extends the applicability of OT methods to large-scale, high-dimensional problems in machine learning.
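For concreteness, the sketch below illustrates the two ingredients the method combines, on toy data: Sinkhorn iterations for entropy-regularized OT between discrete samples, and unadjusted Langevin dynamics driven by a score function. This is an illustrative NumPy sketch, not the authors' implementation; the function names, step sizes, and Gaussian toy target are our assumptions (in SCONES, the Langevin step would use the score of the conditional density of the Sinkhorn coupling, supplied by a score-based generative model).

# Minimal sketch (not the authors' code) of the two ingredients of SCONES,
# on toy data with hypothetical shapes and step sizes.
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized OT between discrete measures a and b with cost C.
    Returns the coupling P and the dual scaling vectors (u, v)."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                 # alternating dual updates
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]       # Sinkhorn coupling
    return P, (u, v)

def langevin_sample(score_fn, y0, step=1e-2, n_steps=500, rng=None):
    """Unadjusted Langevin dynamics: y <- y + step*score(y) + sqrt(2*step)*noise.
    In SCONES the score of the conditional pi(y|x) would be used; here
    score_fn is any callable returning grad log density."""
    rng = rng or np.random.default_rng(0)
    y = y0.copy()
    for _ in range(n_steps):
        y += step * score_fn(y) + np.sqrt(2 * step) * rng.standard_normal(y.shape)
    return y

# Toy usage: OT between two empirical 1D clouds, then Langevin targeting N(0, 1).
rng = np.random.default_rng(0)
x, y = rng.normal(0, 1, 50), rng.normal(2, 1, 60)
C = (x[:, None] - y[None, :]) ** 2
P, _ = sinkhorn(np.ones(50) / 50, np.ones(60) / 60, C)
samples = langevin_sample(lambda z: -z, np.zeros(1000))  # score of N(0,1) is -z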
Authors: Yossi Gandelsman, Yu Sun, Xinlei Chen, Alexei A Efros
Abstract: Generalization under distribution shifts remains a critical challenge in computer vision. Test-time training (TTT) addresses this by adapting models dynamically during deployment, using self-supervised learning to improve performance on unseen test distributions. This report evaluates the TTT-MAE framework, which integrates Masked Autoencoders into TTT, for its effectiveness on the ImageNet-C benchmark. Through experiments, we confirm TTT-MAE's ability to enhance robustness under all corruption types. We analyze failure cases, hypothesizing a decoupling between the reconstruction and classification tasks. Additionally, an online variant of TTT-MAE, in which encoder weights are not reset between test samples, demonstrates notable improvements, suggesting potential for cumulative adaptation. Despite resource constraints influencing the evaluation protocol, the findings provide critical insights into TTT-MAE's strengths and limitations, paving the way for future refinements to optimize its performance in real-world applications.
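The schematic sketch below reflects our reading of the TTT-MAE test-time loop, including the online variant; the tiny linear modules are placeholders for the real ViT encoder, MAE decoder, and classification head, and the step count, learning rate, and masking ratio are illustrative, not the paper's settings.

# Schematic sketch of the TTT-MAE loop (our reading of the protocol, with
# placeholder modules standing in for the ViT encoder, MAE decoder, and head).
import copy
import torch
import torch.nn as nn

encoder = nn.Linear(64, 32)      # stand-in for the ViT encoder
decoder = nn.Linear(32, 64)      # stand-in for the MAE decoder
head = nn.Linear(32, 10)         # stand-in for the classifier head

def reconstruction_loss(x):
    mask = (torch.rand_like(x) > 0.75).float()     # keep ~25% of "patches"
    recon = decoder(encoder(x * mask))
    return ((recon - x) ** 2 * (1 - mask)).mean()  # loss on masked entries only

def ttt_predict(x, steps=10, lr=1e-3, online=False):
    """Adapt the encoder on one test sample via the self-supervised MAE
    objective, then classify. If online=False, weights are reset afterwards."""
    saved = copy.deepcopy(encoder.state_dict())
    opt = torch.optim.SGD(encoder.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        reconstruction_loss(x).backward()
        opt.step()
    with torch.no_grad():
        pred = head(encoder(x)).argmax(-1)
    if not online:                  # the online variant keeps adapted weights
        encoder.load_state_dict(saved)
    return pred

pred = ttt_predict(torch.randn(1, 64))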
Authors: Yingzhen Li, John Bradshaw, Yash Sharma
Abstract: This report presents our work on the article \textit{Are Generative Classifiers More Robust to Adversarial Attacks?} \cite{li2019generativeclassifiersrobustadversarial}. We implemented the authors' experiments on MNIST \cite{deng2012mnist} and applied the methods to the German Traffic Sign Recognition Benchmark dataset \cite{gtsrb} under black-box adversarial attacks. We were unable to conclude whether generative classifiers are more robust to adversarial attacks than discriminative classifiers.
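As a minimal illustration of the generative decision rule at stake (the paper's classifiers are VAE-based deep Bayes models, not the Gaussians used here), the sketch below fits one class-conditional density per class and classifies via Bayes' rule, $\arg\max_k \log p(x \mid y = k) + \log p(y = k)$.

# Minimal illustration of generative classification (Gaussians stand in for
# the learned class-conditional likelihoods of the paper's deep Bayes models).
import numpy as np
from scipy.stats import multivariate_normal

def fit_generative(X, y, n_classes):
    """Fit one Gaussian p(x|y=k) per class plus class priors p(y=k)."""
    params = []
    for k in range(n_classes):
        Xk = X[y == k]
        params.append((Xk.mean(0),
                       np.cov(Xk.T) + 1e-6 * np.eye(X.shape[1]),
                       len(Xk) / len(X)))
    return params

def predict(X, params):
    """Classify via argmax_k log p(x|y=k) + log p(y=k)."""
    scores = np.stack([multivariate_normal.logpdf(X, m, S) + np.log(p)
                       for m, S, p in params], axis=-1)
    return scores.argmax(-1)

# Toy usage on two well-separated 2D clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.repeat([0, 1], 100)
acc = (predict(X, fit_generative(X, y, 2)) == y).mean()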
Authors: Killian Steunou
Abstract: This paper addresses the challenge of toxic gas identification and characterization from sensor data that is significantly influenced by varying humidity levels, which introduce a notable distribution shift between the training and test datasets. To bridge this domain gap, we explore and benchmark several modeling strategies: strategic data partitioning that emulates the test-set humidity distribution, standard machine learning models (Random Forest and XGBoost), a two-stage classification-regression approach, and an end-to-end deep learning architecture called RAMTNet, specifically designed for multi-task regression. Additionally, we implement unsupervised domain adaptation through adversarial training, encouraging domain-invariant feature representations. Our experiments reveal that careful simulation of test conditions and domain adaptation substantially mitigate the overfitting caused by humidity-induced distribution shifts. The best-performing approach, a two-stage model combining classification and regression, achieves a weighted RMSE of $0.154256$, improving on the provided baseline of $0.1567$.
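A hedged sketch of the two-stage idea follows; the feature layout, class count, and model settings are illustrative stand-ins rather than the report's exact pipeline. A first model classifies which gas is present, then a per-class regressor predicts its concentration.

# Sketch of a two-stage classification-regression pipeline on synthetic data
# (illustrative only; not the report's features, classes, or hyperparameters).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))          # stand-in for sensor + humidity features
gas = rng.integers(0, 3, 500)          # which gas is present (stage-1 target)
conc = rng.random(500)                 # its concentration (stage-2 target)

clf = RandomForestClassifier(n_estimators=100).fit(X, gas)
regs = {k: RandomForestRegressor(n_estimators=100).fit(X[gas == k], conc[gas == k])
        for k in range(3)}

def predict(X_new):
    """Stage 1 picks the gas class; stage 2 routes each sample to that
    class's dedicated concentration regressor."""
    labels = clf.predict(X_new)
    out = np.empty(len(X_new))
    for k, reg in regs.items():
        m = labels == k
        if m.any():
            out[m] = reg.predict(X_new[m])
    return labels, out

labels, concentrations = predict(rng.normal(size=(50, 8)))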
Authors: Eloi Tanguy
Abstract: In this report, we verify the theoretical results of \cite{tanguy2024convergencesgdtrainingneural} on the convergence of neural networks trained with the sliced Wasserstein distance, by generating 2D data points and FashionMNIST images with the proposed algorithm. We also study and experiment with the proposed alternative to SGD, an optimizer called Noise Projected SGD (NPSGD).
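For reference, the sliced Wasserstein distance used as the training loss can be estimated by Monte Carlo over random projection directions, as in the illustrative sketch below (our own minimal version for equal-size samples, not the report's implementation).

# Monte Carlo estimate of the sliced Wasserstein-2 distance between two
# equal-size empirical clouds (illustrative, not the report's code).
import numpy as np

def sliced_wasserstein2(X, Y, n_projections=100, rng=None):
    """SW_2^2 between clouds X, Y in R^d: project onto random unit
    directions, then compare the sorted 1D projections."""
    rng = rng or np.random.default_rng(0)
    theta = rng.standard_normal((n_projections, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    Xp, Yp = X @ theta.T, Y @ theta.T          # 1D projections, one per column
    # 1D W_2^2 between sorted samples, averaged over the directions
    return ((np.sort(Xp, axis=0) - np.sort(Yp, axis=0)) ** 2).mean()

rng = np.random.default_rng(0)
X, Y = rng.normal(0, 1, (256, 2)), rng.normal(1, 1, (256, 2))
sw2 = sliced_wasserstein2(X, Y)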
Authors: Ishan Misra, Rohit Girdhar, Armand Joulin
Abstract: 3DETR \cite{3detr} is an end-to-end Transformer-based object detection model for 3D point clouds. It makes minimal modifications to the Transformer architecture of DETR \cite{detr}, a model for 2D images, to adapt it to 3D point data. At the same time, it is a single-stage model, and unlike DETR it does not require a pretrained backbone. Compared with other detection methods for 3D data, it uses fewer hand-tuned hyperparameters, fewer hand-designed inductive biases, and non-parametric queries, making the model much easier to implement. In this report, we explain the method, evaluate the performance of 3DETR, study the impact of the number of queries used at test time, and experiment with a version of the model that uses the RGB values of the point clouds.
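As an aside on the non-parametric queries: rather than learning a fixed set of query embeddings, 3DETR seeds its queries at locations sampled from the input cloud itself via farthest point sampling; the sketch below is an illustrative NumPy version of that sampling step, not the model's implementation.

# Farthest point sampling: the mechanism behind 3DETR's non-parametric
# queries, which are seeded at points drawn from the input cloud itself.
import numpy as np

def farthest_point_sampling(points, n_queries, rng=None):
    """Greedy FPS: repeatedly pick the point farthest from the selected set."""
    rng = rng or np.random.default_rng(0)
    selected = [rng.integers(len(points))]
    dists = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(n_queries - 1):
        idx = int(dists.argmax())              # farthest remaining point
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return points[selected]

cloud = np.random.default_rng(0).random((2048, 3))   # toy point cloud
queries = farthest_point_sampling(cloud, n_queries=128)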