Public Libraries in the United States, a Statistical Portrait
Robert E. Molyneux
Abstract
Data used in this study come from the US National Center for Education Statistics (NCES) and are based on the data from the PLDF3 dataset documented elsewhere on this site. PLDF3 is a recompilation of the annual public library data collected by NCES from 1987-2001. This analysis is the first examination of these data and uses PLDF3 but focusses on data from 1990-2001 for reasons that are discussed shortly.
There will be two kinds of examinations of these data: a static analysis for the year 2001 and a further consideration of the data to be used for the trend analysis for the period 1990-2001. Those two pages have a good bit of detail about the data themselves. Results of the trend analysis are reported more simply on the "Trends Results" page. Readers uninterested in the discussion of the vagaries of these data should begin at this latter page.
Summary of Results
Trends examined so far show that during the period from 1990-2001 the libraries examined added staff, increased expenditures in real terms, and also increased expenditures per capita.
Static analysis of data from these PLDF3 libraries in 2001 shows, as might be expected, a skewed distribution on five key variables examined. That is, there are a few large libraries and many small ones, a common occurrence with library distributions. A result of this skewness is that the libraries in the top 25%, as measured by these five variables, had more of what the variables measure than the lowest 75%. For instance, the top 25% of public library staffs employed a total of 111,000 people while the bottom 75% employed just over 19,000. These results have implications. In analyzing trends, we will want to know what the average or typical library does over time. "Average," though, is often casually used to mean the arithmetic mean of a distribution, and in a skewed distribution the arithmetic mean does not indicate what is typical. For example, the arithmetic mean of the number of full-time staff at all public libraries in 2001 is 15.5 while the median, or central value, is 3.75. That is, half the libraries employ more than 3.75 staff members and half employ fewer. In effect, the large libraries pull the arithmetic mean far above this central value.
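The gap between mean and median in a skewed distribution can be illustrated with a small sketch. The staff counts below are invented for illustration and are not PLDF3 data; the variable names are likewise hypothetical.

```python
from statistics import mean, median

# Hypothetical staff counts for ten libraries (NOT PLDF3 data):
# many small libraries and one very large one.
staff = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 8.0, 20.0, 150.0]

avg = mean(staff)    # arithmetic mean, pulled upward by the largest library
mid = median(staff)  # central value: half the libraries fall on each side

# Share of all staff employed by the top 25% of libraries.
ranked = sorted(staff, reverse=True)
top_n = max(1, len(ranked) // 4)
top_share = sum(ranked[:top_n]) / sum(ranked)

print(f"mean={avg:.2f} median={mid:.2f} top-quartile share={top_share:.0%}")
```

With data shaped like this, the mean lands well above the median, which is the pattern described above for the 2001 staff figures.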
The largest libraries are quite large. The top public libraries in these quantitative measures are urban libraries. The top five include three libraries in the New York City area, the Boston Public Library, and the Los Angeles County Library. The quantitative measures used here make these libraries look like the members of the Association of Research Libraries (ARL). Of course, public libraries and academic libraries have different missions, and these few numbers tell only a part of the story of how those missions make the libraries behave in different ways. Those differences will become clear with analysis of more of the public library data than these summary data. Two notable exceptions to the observation about different missions are The New York Public Library (NYPL) and Boston Public Library, both of which are public libraries and members of ARL. NYPL is also the largest public library in the United States.
How are these five variables related? It turns out that the correlations between them are high. Not reported here are the correlations between the logarithmic transformations of the raw variables, which are also high. The log transforms were calculated because they are often used as a way of treating skewed data and because of a characteristic of correlations. These correlations are done as a first approximation.
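As a rough sketch of this kind of comparison, the following computes a Pearson correlation on raw values and on their base-10 logarithms. The choice of the Pearson coefficient is an assumption, since the text does not name the measure used, and the paired values are invented for illustration rather than drawn from PLDF3. One well-known property of the Pearson coefficient is that a few extreme values can dominate it; whether that is the characteristic meant above is also an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical paired values for six libraries (NOT PLDF3 data).
staff = [1.0, 2.0, 3.0, 5.0, 20.0, 150.0]
expenditures = [40_000, 90_000, 110_000, 260_000, 1_100_000, 9_000_000]

r_raw = pearson(staff, expenditures)

# The log transform compresses the long upper tail, so the single largest
# library no longer dominates the calculation.
r_log = pearson([math.log10(v) for v in staff],
                [math.log10(v) for v in expenditures])

print(f"raw r={r_raw:.3f} log r={r_log:.3f}")
```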
The driving force behind the static analysis is to learn enough about these libraries and their data to begin to analyze trends over as much of the period of the data as possible. As a result of consideration of the characteristics of the data outlined here and given in more detail in the static analysis of the 2001 data (and of other years not reported), the PLDF3 dataset was further pared down for this first look at trends. Of the 9,766 libraries ever reporting in PLDF3, we have 8,282 libraries that reported each year from 1990-2001. The creation of the first analytic dataset for trend analysis is discussed separately. Briefly, in an attempt to get the libraries from as many states as possible, the data used here had to begin in 1990 when the last state, Tennessee, reported. Libraries not reporting each year were dropped for now; otherwise it would be unclear whether observed changes resulted from a changing pool of libraries or from underlying changes in the condition of libraries. Given that Minnesota did not report in 2001, none of its libraries is included. The resulting dataset exhibits the same kinds of skewing and correlation patterns seen in PLDF3. This first analysis leaves out some libraries but has the advantage of being a good first set of libraries to examine. As mentioned, the analysis of trends is found at "Trends Results".
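The rule of keeping only libraries that reported in every year from 1990 through 2001 can be sketched as follows. The library identifiers and the record layout are hypothetical and do not reflect the actual PLDF3 file structure.

```python
# Hypothetical (library_id, report_year) records, NOT the PLDF3 layout.
records = (
    [("LIB_A", year) for year in range(1990, 2002)]    # reported every year
    + [("LIB_B", year) for year in range(1990, 2001)]  # missing 2001
    + [("LIB_C", year) for year in range(1995, 2002)]  # began reporting in 1995
)

required_years = set(range(1990, 2002))  # 1990 through 2001 inclusive

# Collect the set of years in which each library reported.
years_by_library = {}
for library_id, year in records:
    years_by_library.setdefault(library_id, set()).add(year)

# Keep a library only if its reporting years cover every required year.
balanced = sorted(
    library_id
    for library_id, years in years_by_library.items()
    if required_years <= years
)

print(balanced)
```

Holding the pool of libraries fixed in this way means that any change seen across years reflects changes within the same libraries rather than changes in which libraries are counted.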
Regions
One aspect of the tapestry of libraries in the US is the differences among states and regions. There are many ways to supply library service, and an important task is to understand what those differences are to the extent we can discern them with these data. The NCES data include information on economic regions, and summary statistics on libraries in those regions show regional differences. After considering these regional data, I am more skeptical of their value in analyzing libraries and did a state breakdown, which illustrates through these data the extraordinary variety of ways the states provide for public library service. This question awaits further work.
May 20, 2004
Analysis of 2001 data
Analyzing Trends
Trends Results
Tables