This page is under construction and will be updated as the program is finalized.
Property testing asks when we can make reliable global decisions about massive objects (such as graphs, probability distributions, and Boolean functions) by inspecting only tiny parts of them. This simple idea has led to a rich theory, with deep connections to combinatorics, complexity theory, analysis of Boolean functions, and sublinear-time computation.
Many classical questions in the field, especially those motivated by Boolean-function query models and complexity theory, are now reaching a mature stage. This workshop will look ahead to new models and problem formulations that arise naturally in data science, statistics, and machine learning.
The goal is to bring together researchers in property testing, learning theory, complexity theory, statistics, and machine learning to revisit classical problems through modern lenses and highlight promising new directions.
Schedule
The schedule below is tentative and subject to change.
| June 25, 8:30am-11:00am | |
|---|---|
| 8:30-8:35 | Opening remarks |
| 8:35-9:25 |
TBAShivam Nadimpalli (MIT)TBA |
| 9:25-10:15 |
TBAErik Waingarten (University of Pennsylvania)TBA |
| 10:15-11:00 |
Your data is blue, but can you prove it? Interactive proofs for distribution properties.Tal Herman (UC Berkeley and MIT)Given i.i.d. samples from an unknown distribution, what can we learn about the distribution? How many samples are required to determine whether the distribution admits various properties? These are the central questions of the field of distribution testing, and they have spurred a rich body of work, establishing a deep understanding of the computational complexity associated with such tasks, and their many applications across TCS. In this talk we explore a novel perspective on this topic: what is the complexity of verifying claims about distributions? Suppose some untrusted party allegedly drew many samples from the distribution, ran some complicated analysis on the samples, and claims that the distribution admits a certain property (for example, that the distribution has high entropy, is monotone, is far from uniform, etc.). Can they provide a concise proof of their claim? Can we verify their proof with fewer resources than required for testing the property? We review a recent line of work addressing these questions, where verification is considered via interactive proof systems (as defined in the seminal work of Goldwasser, Micali, and Rackoff '85, and introduced to this setting by Chiesa and Gur '17). We will discuss efficient proof systems for rich families of distribution properties, known lower bounds and limitations of such proofs, and present avenues for future research and possible applications for verification of data-intensive algorithms. |
| June 26, 8:30am-11:00am | |
| 8:30-9:20 |
TBARocco Servedio (Columbia University)TBA |
| 9:20-10:10 |
Quality control in sublinear timeCassandra Marcussen (Harvard University)In this talk, I will introduce "quality control problems." This is a class of problems that assess when it is safe to apply an algorithm that works well on average to a specific input. Specifically, a quality control algorithm for a distribution D and a real-valued quality measuring function rho that is concentrated around 1 for inputs from D, takes an input x and rejects every x whose quality is not close to 1, while accepting most inputs from D. Our goal is to design quality control algorithms that perform this assessment efficiently, for example, in sublinear time. Quality control problems are thus asymmetric hypothesis testing problems with worst-case soundness guarantees ("no false positives") and completeness on average ("low probability of false negatives"). An input passed by the quality control algorithm is thus safe to run on downstream algorithms whose correctness is assured on high-quality instances. In this talk, I will describe some quality control algorithms that use k-clique counts (for constant k) as the quality measure on random graphs G_{n,p}. I will show that quality control algorithms are provably faster than worst-case algorithms for clique-count estimation. I will also highlight two key techniques behind these results: composability and graph quasirandomness. Based on joint work with Ronitt Rubinfeld and Madhu Sudan. |
| 10:10-11:00 |
Testable Sanitization via Boundary GeometryManolis Zampetakis (Yale University)Machine learning models are prone to attacks where an adversary changes the model's prediction by adding a small perturbation to the input. Two such well-known threats in classification are: 1) backdoor attacks, where a malicious trainer plants a hidden structure in the model that allows them to flip the label by small alterations of the input, and 2) adversarial examples, which arise even in honestly trained models, where an adversary can efficiently find a nearby point with the opposite label for most of the input points. Both of these liabilities share a common geometric root: the decision boundary of the model passes too close to too many data points. In this work, we introduce testable sanitization, a framework that either improves the decision boundary of a classifier or rejects it as too risky to deploy. More formally, we ask for an algorithm that, given oracle access to a classifier f and sample access to a data distribution D, either accepts and outputs a function g which is ε-close to f and robust to small perturbations, or rejects. The algorithm performs testable sanitization of the function class F if it is guaranteed to accept whenever f belongs to F. Next, we construct an algorithm that performs testable sanitization of several interesting function classes. First, our algorithm performs testable sanitization of the class of functions with a small decision boundary for accuracy ε that depends on the boundary size. We prove that this dependence is essentially optimal, i.e., no testable sanitization algorithm of this class can achieve arbitrary accuracy ε. Second, we show our algorithm achieves testable sanitization of the class of functions with a small and smooth decision boundary, for arbitrary accuracy ε. Finally, we show that our results apply to the class of halfspaces and of polynomial threshold functions. |