Uniform Learnability, Model Selection, and Neural Networks

Andrew Barron (Yale)

A variety of pattern recognition methods, including polynomial discriminants and neural networks, share the property of universal statistical consistency for arbitrary joint distributions of inputs and outputs. Indeed, the probability of error of the estimated discriminant converges to the Bayes optimal probability of error in the limit of large sample size when the size of the model is chosen adaptively. An index of resolvability quantifies the rate of convergence in terms of a complexity versus approximation trade-off. Though the convergence is not uniform over all distributions, the index of resolvability identifies interesting nonparametric classes for which the convergence is uniform at a polynomial rate. We discuss the relationship of these conclusions to computational learning theory.
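
As a rough sketch of the complexity versus approximation trade-off (following the standard formulation of minimum-complexity estimation due to Barron and Cover, which the abstract itself does not spell out), the index of resolvability of a target distribution p, relative to a countable list of candidate models \Gamma_n with description lengths L_n(q), may be written

    R_n(p) \;=\; \min_{q \in \Gamma_n} \left\{ \frac{L_n(q)}{n} \;+\; D(p \,\|\, q) \right\},

where D(p \| q) is the relative entropy measuring approximation error and L_n(q)/n is the complexity charge per sample. In the density estimation setting, the estimator selected by complexity-penalized likelihood converges at a rate bounded by R_n(p); the analogous index governs the rate at which the adaptively selected discriminant approaches the Bayes error.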