Historical notes on semi-parametric theory and estimation

August 29, 2025

This post gathers notes on the history of several topics, focusing on semi-parametric efficiency theory and related estimators. Some notes are more fleshed out than others, and I intend to add to them as I learn more. BibTeX entries are provided at the bottom.

Statistical Functionals and von Mises calculus

The concept of a statistical functional has its origins in von Mises (1947). There are also two earlier works by von Mises, published in French in 1936 and 1939. Interestingly, the author of both is given as “de Mises”, a French rendering of “von Mises”. Fernholz (1983) has a nice discussion. Following her notation, let $X_1, \dots, X_n$ be an i.i.d. sample from a law with distribution function $F$, and let $F_n$ denote the empirical distribution function. A statistical functional $T$ maps distribution functions to real numbers, and the associated statistic is $T_n = T(F_n)$. von Mises proposed expanding $T_n$ around $F$ in the form $$ T(F_n) = T(F) + T'_{F}\left( F_n - F \right) + \mathsf{Rem}_n(F_n - F), $$ where $T'_F$ is a derivative of the functional at $F$ and $\mathsf{Rem}_n$ is a remainder term. The derivative von Mises considered was a type of Gateaux derivative. Subsequent work considered expansions based on alternative notions of derivative, which apply to larger classes of functionals. See, along these lines, Filippova (1962), Reeds (1976), Huber (1977), and Serfling (1980).

Since then, the term “von Mises expansion” has been used to refer to this style of expansion for more general statistical functionals; see, e.g., Kennedy (2024).
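As a small worked illustration (the choice of functional is mine, not taken from the sources above), take the variance functional $T(F) = \mathrm{Var}_F(X)$, with mean $\mu$ and variance $\sigma^2$ under $F$. Its Gateaux derivative at $F$ in the direction $G - F$ is $\int \{(x - \mu)^2 - \sigma^2\}\, d(G - F)(x)$, so with $G = F_n$ the linear term is the sample average of $(X_i - \mu)^2 - \sigma^2$. Since $T(F_n) = n^{-1}\sum_i (X_i - \mu)^2 - (\bar X_n - \mu)^2$, the remainder is exactly $\mathsf{Rem}_n(F_n - F) = -(\bar X_n - \mu)^2$, which is $O_P(n^{-1})$. The Python sketch below checks this identity numerically on simulated normal data; the function names are hypothetical helpers written for this example.

```python
import random
import statistics


def variance_functional(xs):
    """T(G) = Var_G(X) evaluated at the empirical distribution of xs (divisor n)."""
    return statistics.pvariance(xs)


def von_mises_terms(xs, mu, sigma2):
    """Split T(F_n) - T(F) into the linear term T'_F(F_n - F) and the remainder.

    For T(F) = Var_F(X) the linear term is the sample average of the
    influence function x -> (x - mu)^2 - sigma2, where mu and sigma2 are
    the true mean and variance under F.
    """
    n = len(xs)
    linear = sum((x - mu) ** 2 - sigma2 for x in xs) / n
    remainder = variance_functional(xs) - sigma2 - linear
    return linear, remainder


# Usage: draw from a normal law with known mean and variance.
mu, sigma2 = 2.0, 9.0
rng = random.Random(1)
xs = [rng.gauss(mu, sigma2 ** 0.5) for _ in range(1000)]
linear, remainder = von_mises_terms(xs, mu, sigma2)
xbar = statistics.fmean(xs)
print(linear, remainder, -(xbar - mu) ** 2)  # last two agree up to rounding
```

This is the sense in which the expansion delivers asymptotic normality: the remainder is of smaller order than $n^{-1/2}$, so $\sqrt{n}\{T(F_n) - T(F)\}$ has the same limit as $\sqrt{n}$ times the linear term.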

Sample Splitting

To my knowledge, the first instance of sample splitting in which the two parts swap roles is Schick (1986), who proposes splitting a sample into two parts, estimating a nuisance parameter in each part, and then evaluating each estimate on the part that was left out. He writes “We divide the ample [sic] in two equal parts, obtain an estimate of the score function from each part, and evaluate the estimate of the score function obtained from the first part only with observations from the second part and vice versa.”

The idea of splitting a sample into two parts, using one for estimation and the other for prediction, predates Schick, but he seems to have been the first to rotate the roles of the splits and average the results. He cites Bickel (1982) as using the single-split strategy. Pfanzagl (1982) also describes this approach being used by Hasminskii and Ibragimov (1979), but I haven’t been able to track down a copy of their original paper.
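To make the rotate-and-average recipe concrete, here is a minimal Python sketch. It is not Schick's estimator (he estimated score functions); as a stand-in it uses a toy functional, the integrated squared density $\int f(x)^2\,dx$, with a Gaussian kernel density estimate as the nuisance. Each half fits the density, the other half evaluates it, and the two fold-wise estimates are averaged. The kernel, the normal-reference bandwidth, and the function names are all assumptions made for illustration.

```python
import math
import random
import statistics


def gaussian_kde(train, h):
    """Return a Gaussian kernel density estimate fitted on `train` with bandwidth h."""
    norm = 1.0 / (len(train) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - t) / h) ** 2) for t in train)

    return density


def cross_fit_integrated_density_sq(sample):
    """Two-fold cross-fitted estimate of the functional int f(x)^2 dx.

    Each half fits the nuisance (a density estimate), the other half
    evaluates it, and the two fold-wise estimates are averaged.
    """
    half = len(sample) // 2
    folds = [sample[:half], sample[half:]]
    estimates = []
    for k in (0, 1):
        train, held_out = folds[k], folds[1 - k]
        # Normal-reference rule-of-thumb bandwidth (an illustrative choice).
        h = 1.06 * statistics.stdev(train) * len(train) ** (-1 / 5)
        f_hat = gaussian_kde(train, h)
        # Mean of f_hat over held-out points estimates int f_hat(x) f(x) dx.
        estimates.append(statistics.fmean(f_hat(x) for x in held_out))
    # Swap roles and average the two fold-wise estimates.
    return statistics.fmean(estimates)


# Usage: i.i.d. standard normal data, whose order is already random.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
est = cross_fit_integrated_density_sq(sample)
truth = 1.0 / (2.0 * math.sqrt(math.pi))  # int phi(x)^2 dx for N(0, 1)
print(est, truth)
```

For standard normal data the target is $1/(2\sqrt{\pi}) \approx 0.282$. Kernel smoothing biases the estimate slightly downward, while evaluating on the held-out half avoids the upward bias that comes from evaluating a density estimate at the very points used to fit it.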

One-step estimation

See Pfanzagl (1982) for an early example of a one-step estimator.

BibTeX

Presented in chronological order.

@article{vonmises1936functionals,
    author = {Mis\'es, R. de},
    title = {Les lois de probabilit\'e pour les fonctions statistiques},
    journal = {Annales de l'institut Henri Poincar\'e},
    pages = {185--212},
    publisher = {Institut Henri Poincar\'e et les Presses Universitaires de France},
    volume = {6},
    number = {3-4},
    year = {1936},
    zbl = {0016.31204},
    language = {fr},
    url = {https://www.numdam.org/item/AIHP_1936__6_3-4_185_0/}
}

@article{vonmises1939functionals,
    author = {Mises, R. de},
    journal = {Bulletin de la Société Mathématique de France},
    keywords = {probability theory},
    language = {fr},
    pages = {177--184},
    publisher = {Société mathématique de France},
    title = {Sur les fonctions statistiques},
    url = {http://eudml.org/doc/86726},
    volume = {67},
    year = {1939}
}

@article{vonmises1947functionals,
    author = {R. {von Mises}},
    title = {{On the Asymptotic Distribution of Differentiable Statistical Functions}},
    volume = {18},
    journal = {The Annals of Mathematical Statistics},
    number = {3},
    publisher = {Institute of Mathematical Statistics},
    pages = {309--348},
    year = {1947},
    doi = {10.1214/aoms/1177730385},
    url = {https://doi.org/10.1214/aoms/1177730385}
}

@article{filippova1962mises,
    author = {Filippova, A. A.},
    title = {{Mises}' Theorem on the Asymptotic Behavior of Functionals of Empirical Distribution Functions and Its Statistical Applications},
    journal = {Theory of Probability \& Its Applications},
    volume = {7},
    number = {1},
    pages = {24--57},
    year = {1962},
    doi = {10.1137/1107003},
    url = {https://doi.org/10.1137/1107003}
}

@phdthesis{reeds1976vonmises,
    author = {Reeds, J. A.},
    title = {On the definition of von {Mises} functionals},
    school = {Harvard University},
    address = {Cambridge, MA},
    year = {1976}
}

@book{huber1977robust,
    author = {Huber, P. J.},
    title = {Robust Statistical Procedures},
    series = {Regional Conference Series in Applied Mathematics},
    number = {27},
    publisher = {SIAM},
    address = {Philadelphia, PA},
    year = {1977}
}

@inproceedings{hasminskii1979functionals,
    author = {Hasminskii, R. and Ibragimov, I. A.},
    title = {On the nonparametric estimation of functionals},
    booktitle = {Proceedings of the Second Prague Symposium on Asymptotic Statistics},
    editor = {Mandl, P. and Hu{\v{s}}kov{\'a}, M.},
    pages = {41--51},
    year = {1979},
    publisher = {North-Holland},
    address = {Amsterdam}
}

@book{serfling1980approximation,
    author = {Serfling, R. J.},
    title = {Approximation Theorems of Mathematical Statistics},
    publisher = {John Wiley \& Sons},
    year = {1980}
}

@article{bickel1982adaptive,
    author = {P. J. Bickel},
    doi = {10.1214/aos/1176345863},
    journal = {The Annals of Statistics},
    number = {3},
    pages = {647--671},
    publisher = {Institute of Mathematical Statistics},
    title = {On Adaptive Estimation},
    url = {https://doi.org/10.1214/aos/1176345863},
    volume = {10},
    year = {1982}
}

@book{pfanzagl1982theory,
    author = {Pfanzagl, J.},
    title = {Contributions to a General Asymptotic Statistical Theory},
    year = {1982},
    publisher = {Springer New York},
    address = {New York, NY}
}

@book{fernholz1983vonmises,
    author = {Fernholz, Luisa Turrin},
    title = {{von Mises} Calculus for Statistical Functionals},
    publisher = {Springer New York},
    address = {New York, NY},
    year = {1983}
}

@article{schick1986efficientestimation,
    author = {Anton Schick},
    title = {{On Asymptotically Efficient Estimation in Semiparametric Models}},
    volume = {14},
    journal = {The Annals of Statistics},
    number = {3},
    publisher = {Institute of Mathematical Statistics},
    pages = {1139--1151},
    year = {1986},
    doi = {10.1214/aos/1176350055},
    url = {https://doi.org/10.1214/aos/1176350055}
}

@incollection{kennedy2024review,
    author = {Edward H. Kennedy},
    title = {Semiparametric doubly robust targeted double machine learning: a review},
    booktitle = {Handbook of Statistical Methods for Precision Medicine},
    chapter = {10},
    publisher = {Chapman and Hall/CRC},
    year = {2024}
}