https://arxiv.org/abs/2009.00513 (2020)
“Library digitization has made more than a hundred thousand
19th-century English-language books available to the public.
Do the books which have been digitized reflect the population
of published books? An affirmative answer would allow book
and literary historians to use holdings of major digital
libraries as proxies for the population of published works,
sparing them the labor of collecting a representative sample.
We address this question by taking advantage of exhaustive
bibliographies of novels published for the first time in the
British Isles in 1836 and 1838, identifying which of these
novels have at least one digital surrogate in the Internet
Archive, HathiTrust, Google Books, and the British Library.
We find that digital surrogate availability is not random.
Certain kinds of novels, notably novels written by men and
novels published in multivolume format, have digital
surrogates available at distinctly higher rates than other kinds
of novels.”
“We find that of the 126 novels, 106 (84%) have at least one digital
surrogate available in the major digital libraries.”
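The percentage in the quoted finding can be checked directly from the two counts it reports. The following is a minimal sketch of that arithmetic; the function name `surrogate_share` and the variable names are hypothetical and do not come from the paper.

```python
def surrogate_share(with_surrogate: int, total: int) -> float:
    """Return the fraction of titles that have at least one digital surrogate."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= with_surrogate <= total:
        raise ValueError("with_surrogate must be between 0 and total")
    return with_surrogate / total


# Counts taken from the quoted sentence: 106 of 126 novels.
TOTAL_NOVELS = 126
NOVELS_WITH_SURROGATE = 106

share = surrogate_share(NOVELS_WITH_SURROGATE, TOTAL_NOVELS)
missing = TOTAL_NOVELS - NOVELS_WITH_SURROGATE

print(f"share with a surrogate: {share:.1%}")  # 84.1%, reported as 84%
print(f"novels without a surrogate: {missing}")  # 20
```

The counts imply that 20 of the 126 novels have no digital surrogate in any of the four libraries named in the abstract.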