Theory of Replicable ML

Course Description

Replicability is vital to ensuring that scientific conclusions are reliable, but failures of replicability have been a major issue in nearly every scientific field, and machine learning is no exception. In this course, we will study replicability as a property of learning algorithms and other statistical algorithms, developing a theory of replicable learning. We will cover recent formalizations of replicability and their relationships to other common stability notions such as differential privacy and adaptive generalization. We will survey replicable algorithms for fundamental learning tasks and discuss the limitations of replicable algorithms. If time permits, we will discuss replicability in other settings, such as reinforcement learning and clustering, or related stability notions such as list replicability and global stability.

Lecture Notes

Lecture 1: Introduction, Markov, Chebyshev [notes]
Lecture 2: Hoeffding [notes]
Lecture 3: Statistical queries, PAC learning for finite classes [notes]
Lecture 4: More SQ algorithms [notes]
Lecture 5: Replicable SQs [notes]
Lecture 6/7: Replicable SQ lower-bound [notes]
Lecture 8: Replicable SQ lower-bound, adaptive statistical queries [notes]
Lecture 9: Overfitting with adaptive SQs [notes]
Lecture 10: Uniform Change One stability [notes]
Lecture 11: UCO => Generalization [notes]
Lecture 12: TV stability, differential privacy [notes]
Lecture 13: Stability => expected generalization [notes]
Lecture 14: Adaptive composition [notes]
Lecture 15: Adaptive composition continued [notes]
Lecture 16: Exponential mechanism [notes]
Lecture 17: DP => high probability generalization [notes]
Lecture 18: DP => high probability generalization (!!!) [notes]
Lecture 19: Accuracy and generalization for adaptive SQs [notes]
Lecture 20: Replicable heavy hitters [notes]
Lecture 21: Replicable learning for finite hypothesis classes [notes]

Project Ideas

https://jess-sorrell.github.io/Courses/Replicable-ML/project_ideas.pdf

Resources

Adaptive Data Analysis course notes. https://adaptivedataanalysis.com/about/
The Algorithmic Foundations of Differential Privacy. https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
Reproducibility in Learning. https://arxiv.org/abs/2201.08430
Stability is Stable: Connections between Replicability, Privacy, and Adaptive Generalization. https://arxiv.org/abs/2303.12921
Replicability in High Dimensional Statistics. https://arxiv.org/abs/2406.02628
Generalization in Adaptive Data Analysis and Holdout Reuse. https://arxiv.org/abs/1506.02629
Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing. https://arxiv.org/abs/1604.03924
User-Level Privacy via Correlated Sampling. https://arxiv.org/abs/2110.11208
Replicability in Reinforcement Learning. https://arxiv.org/abs/2305.19562
Replicable Reinforcement Learning. https://arxiv.org/abs/2305.15284
Replicable Clustering. https://arxiv.org/abs/2302.10359
Replicability and Stability in Learning. https://arxiv.org/abs/2304.03757

EN.601.774 Theory of Replicable ML

Course Info

Office Hours:
Jess - Friday 12pm (though if no one shows up in the first 15 min, I may leave) or by appointment
Iliana - by appointment