
Public Libraries in the United States in 2001


The Data

The 2001 NCES survey of public libraries reported data collected from 9,133 libraries in 49 states (Minnesota did not report and is not included in this analysis), and one each from the District of Columbia, the Northern Marianas, Palau, and the Virgin Islands. These "2001" data actually cover each state's local reporting "year," and those years vary considerably across the country. This has two consequences. First, a few states reported very late, so that nominally 2001 data were reported as late as the fall of 2002 and were not published until June 2003. Second, most analyses of library data ignore the variation in reporting years, particularly in trend analysis, where year-to-year changes are measured over years that differ between states but are consistent within each state. That variation could affect those trends in unknown ways.

The NCES data as published include imputed values. That is, in order to make national estimates, NCES fills in estimated values where data are missing. Imputation is a well-known and respected technique for dealing with missing values, and NCES imputed the missing values for Minnesota to provide national estimates. The dataset used for this analysis, PLDF3, however, has had the imputations removed, so totals presented here may differ from totals published elsewhere using imputed data. The NCES state summary data, with imputations, may be used where appropriate.

The numbers discussed so far are the numbers of libraries reporting in the different categories and datasets. The number reporting for each variable will vary, however, because not all libraries report all variables for all years. "Missing" values are common in this dataset, as they are in most other datasets. The number reporting for any given set of variables, conventionally called "N", will be noted wherever possible, either in a column or row of a table or within parentheses. Generally, the larger the N, the fewer the difficulties with the number reported. In certain procedures, such as the correlations that follow, the N is reported for the pair of variables correlated.
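The idea of a per-variable N can be sketched in a few lines of Python. The records and the `reporting_n` helper below are hypothetical illustrations, not part of PLDF3 or of any NCES tool; `None` stands in for a value a library did not report.

```python
# Hypothetical records: None marks a value a library did not report.
libraries = [
    {"BKVOL": 25000, "TOTSTAFF": 3.5, "TOTOPEXP": 120000},
    {"BKVOL": 8000, "TOTSTAFF": None, "TOTOPEXP": None},
    {"BKVOL": 5000, "TOTSTAFF": 1.0, "TOTOPEXP": None},
    {"BKVOL": 410000, "TOTSTAFF": 42.0, "TOTOPEXP": 2500000},
]


def reporting_n(records, variable):
    """Count the records that report a non-missing value for `variable`."""
    return sum(1 for record in records if record.get(variable) is not None)


# N differs by variable even though the same four libraries are in the file.
n_by_variable = {
    name: reporting_n(libraries, name)
    for name in ("BKVOL", "TOTSTAFF", "TOTOPEXP")
}
print(n_by_variable)
```

In this made-up file all four libraries report BKVOL, three report TOTSTAFF, and only two report TOTOPEXP, so any statistic on TOTOPEXP rests on half the libraries.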

The Distribution of Libraries in the US, 2001

Results of basic descriptive analysis are reported in tables discussed separately. For the static analysis there and for the discussion of the distribution, two different sets of data for 2001 are used. The first consists of data from all 9,133 libraries reporting in 2001 less the 140 libraries in Minnesota, or 8,993 libraries. The second consists of the "1990" data for libraries with a span of 'A'. (Note that because Minnesota did not report in 2001, no Minnesota library has a span of 'A', so some procedure will have to be developed to include the experience of Minnesota libraries from 1990-2000.) Descriptive statistics on this latter group are presented along with the 2001 data. The "1990" data will be used for the trend analysis, so the question arises: is this subset of 8,406 libraries like all the libraries in PLDF3 reporting in 2001? For this analysis and the correlations which follow, five key variables have been selected from the 74 available:

  1. BKVOL - book and serial volumes held
  2. POPU_LSA - population served in the library's legal service area
  3. TOTINCM - total library income
  4. TOTOPEXP - total library operating expenditures
  5. TOTSTAFF - total paid employees

The tables below show one practical implication of the basic statistics discussed with the distribution statistics. The national data for the two datasets indicate a highly skewed distribution: there are a few big libraries and many small ones. This result is not surprising and is in keeping with expectations. If we divide each of these five variables into quartiles, the relatively few libraries above the 75th percentile account for far more of each variable than the lower 75% combined. The quartile table is repeated on the trends page with a similar table from the "1990" data.

Upper Quartile and the Lower Three Quartiles
PLDF3 Dataset for 2001
Variable    Amount in Highest 25%    Amount in Lowest 75%
BKVOL       596,949,844              152,199,120
POPU_LSA    234,422,863              38,910,067
TOTINCM     $7,164,681,020           $885,146,273
TOTOPEXP    $6,611,312,519           $800,052,014
TOTSTAFF    110,724                  19,340
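A split of this kind can be sketched as follows. This is a minimal illustration, not the procedure used to build the table: the staff figures are invented, `split_totals` is a hypothetical helper, and the upper quartile is assumed to mean the largest 25% of observations by rank.

```python
def split_totals(values):
    """Return (total for the largest quarter of observations, total for the rest).

    Assumption: the upper quartile is taken as the top 25% of observations
    by rank, which may differ from the cut-off used for the published table.
    """
    ordered = sorted(values, reverse=True)
    top_count = len(ordered) // 4
    return sum(ordered[:top_count]), sum(ordered[top_count:])


# Hypothetical paid-staff counts for twelve libraries (skewed, like the real data).
staff = [0.5, 1, 1, 2, 2, 3, 4, 6, 10, 25, 60, 300]

top_total, rest_total = split_totals(staff)
top_share = top_total / (top_total + rest_total)
print(top_total, rest_total, round(top_share, 3))
```

With these invented figures the three largest libraries hold 385 of the 414.5 staff, about 93% of the total, which is the same pattern the table shows on a national scale.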

In describing this kind of distribution, the median--or central value--is often included because it captures something important about the distribution that the mean--the arithmetic average--obscures. Note for instance TOTSTAFF below where the mean number of paid employees is 15.5 but the median number is 3.75. Those two numbers present a different picture of what is going on in an "average" library, don't they?

Summary Statistics for TOTSTAFF
Variable Mean Median 3rd Quartile
TOTSTAFF 15.5 3.75 11.15
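The gap between mean and median in a skewed distribution is easy to reproduce with Python's standard `statistics` module. The staff counts below are invented for illustration, and the third quartile uses the module's default "exclusive" method, which is an assumption and may not match the method behind the table above.

```python
from statistics import mean, median, quantiles

# Hypothetical paid-staff counts: many small libraries and a few very large ones.
staff = [0.5, 1, 1, 2, 2, 3, 4, 6, 10, 25, 60, 300]

staff_mean = mean(staff)
staff_median = median(staff)
# quantiles(..., n=4) returns the three quartile cut points; index 2 is the third quartile.
staff_q3 = quantiles(staff, n=4)[2]

print(round(staff_mean, 2), staff_median, staff_q3)
```

Here the mean is roughly ten times the median, and it even exceeds the third quartile, so more than three quarters of these made-up libraries are smaller than the "average" one.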

The Five Largest, 2001

The largest of these libraries are quite sizeable. Here are the five largest libraries in terms of BKVOL, that is, the number of volumes held in the library. Note that if one of the other three variables were used instead of BKVOL, the order would be slightly different. For instance, Total Operating Expenditures for the Brooklyn Public Library ($77,054,618) are higher than those of the Queens Borough Public Library ($76,785,290), which is second on this list. And Boston Public Library has the third largest number of volumes, but the population of its service area, its staff, and its total expenditures are considerably smaller than those of the others at the top of the list. But, as the next section on correlations shows, overall there is a high degree of consistency between these variables. These data show that support for libraries in the New York area is impressive. The Los Angeles Public Library is the sixth largest in volumes, so the Los Angeles area has created two large libraries. Note that Boston, with a smaller population served, has managed to build quite a large library, too. These five, however, are not like most public libraries.

The Five Largest Public Libraries, 2001
Library Name                            BKVOL         POPU_LSA     TOTOPEXP        TOTSTAFF
New York Public Library                 19,080,814    3,313,573    $215,379,047    2,821.88
Queens Borough Public Library           9,595,051     2,229,379    $76,785,290     1,209.34
Boston Public Library                   7,736,451     589,141      $40,093,539     562.61
County of Los Angeles Public Library    7,670,531     3,484,800    $70,181,538     1,206.00
Brooklyn Public Library                 6,777,703     2,465,326    $77,054,618     1,153.25

Correlations, 2001

Calculating the correlations between each pair of these five variables shows that the relationships between them are strong. That is, the more a library has of any one, the more it has of each of the others. Correlations are presented between these five variables for the two datasets' 2001 data. Correlations were also calculated on logarithmic and square root transforms of both datasets; all show similarly high correlations. Note that the table below is symmetrical about the diagonal of 1s running from the upper left to the lower right, where each variable is correlated with itself and, hence, the correlation is 1. This symmetry reflects the fact that the correlation between BKVOL and TOTOPEXP is the same as the correlation between TOTOPEXP and BKVOL. This table is repeated with a similar table of correlations for the "1990" data on the trends page.

Correlations Between Pairs of the Five Variables in PLDF3, 2001
            BKVOL      POPU_LSA   TOTINCM    TOTOPEXP   TOTSTAFF
BKVOL       1.00       .86        .96        .96        .96
            (8,772)    (8,772)    (8,713)    (8,719)    (8,724)
POPU_LSA    .86        1.00       .88        .87        .90
            (8,772)    (8,991)    (8,759)    (8,764)    (8,791)
TOTINCM     .96        .88        1.00       .99        .98
            (8,713)    (8,759)    (8,759)    (8,726)    (8,720)
TOTOPEXP    .96        .87        .99        1.00       .98
            (8,719)    (8,764)    (8,764)    (8,764)    (8,722)
TOTSTAFF    .96        .90        .98        .98        1.00
            (8,724)    (8,791)    (8,720)    (8,722)    (8,791)

N is included in parentheses under each correlation. The N on the diagonal of 1s is the number of libraries reporting that variable in this dataset. One way of looking at a correlation is as a measure of how constant the ratio between two variables is. If one pair has 1 of A and 2 of B, a second pair has 2 of A and 4 of B, a third has 6 of A and 12 of B, and so on, the correlation would be a positive correlation of 1. In the real world, such relationships vary because of a host of factors that made-up examples ignore. Correlations range from a maximum of 1 to a minimum of -1. The highest correlation in this table is that between total library income and total library expenditures, which is not hard to understand. An example of a negative correlation might be that as one eats more vegetables, one's weight declines. Correlations require pairs of values: if a library reports TOTSTAFF but not TOTOPEXP, that library's data will not be used for the correlation between those two variables, but its TOTSTAFF will still be used in the correlations with the other variables for which it reports values.
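The A-and-B example and the handling of missing pairs can be sketched together. The function below is a hypothetical illustration of a Pearson correlation with pairwise deletion, written with the standard library only; it is not the software used for the table, and the sample values are invented.

```python
import math


def pairwise_correlation(xs, ys):
    """Pearson correlation over the pairs where both values are reported.

    Returns (r, n), where n is the number of complete pairs actually used.
    None marks a missing value; a pair with either value missing is dropped.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y), n


# A and B keep a constant 1:2 ratio; the last two libraries each miss one value.
a = [1, 2, 6, None, 10]
b = [2, 4, 12, 8, None]
r, n = pairwise_correlation(a, b)
print(r, n)

# Reversing the sign of B gives a perfect negative correlation.
r_neg, _ = pairwise_correlation(a, [-v if v is not None else None for v in b])
print(r_neg)
```

Five libraries are listed, but only three complete pairs enter the calculation, which is why each cell of the table carries its own N.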


Regions, 2001

The NCES data include a variable, OBEREG, which identifies the economic regions developed by the Bureau of Economic Analysis in the Department of Commerce. As a result, these library data can be used with an established economic series. It is not clear that these economic regions will be useful in describing how libraries vary, but they are a convenient place to start. These regions are:

BEA Regions and States
Region            States
New England       Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont
Mideast           Delaware, District of Columbia, Maryland, New Jersey, New York, Pennsylvania
Great Lakes       Illinois, Indiana, Michigan, Ohio, Wisconsin
Plains            Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, South Dakota
Southeast         Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, Virginia, West Virginia
Southwest         Arizona, New Mexico, Oklahoma, Texas
Rocky Mountain    Colorado, Idaho, Montana, Utah, Wyoming
Far West          Alaska, California, Hawaii, Nevada, Oregon, Washington
Outlying Areas    Guam, Commonwealth of the Northern Marianas Islands, Republic of Palau, Virgin Islands

Here are summary data for these regions for the same five variables:

Summary Library Data for BEA Regions
Region                  BKVOL             POPU_LSA          TOTINCM                 TOTOPEXP                TOTSTAFF
                        Mean     Median   Mean     Median   Mean        Median      Mean        Median      Mean  Median
New England (N=1,303)   49,453   21,805   11,510   4,416    $378,284    $100,466    $360,512    $97,476     6.6   2.4
Mideast (N=1,580)       100,236  31,556   29,553   9,186    $1,102,291  $207,560    $1,040,797  $188,472    17.0  4.5
Great Lakes (N=1,878)   83,624   31,450   23,335   7,756    $1,030,689  $241,555    $903,479    $187,441    15.2  4.5
Plains (N=1,628)        36,442   13,549   8,622    1,663    $263,277    $36,812     $240,192    $34,558     5.2   1.1
Southeast (N=1,097)     129,208  49,561   61,112   22,989   $1,292,392  $310,816    $1,203,059  $279,757    24.1  7.1
Southwest (N=770)       72,666   24,664   37,529   7,361    $702,157    $99,387     $671,589    $93,092     13.3  3.0
Rocky Mountain (N=394)  65,816   24,025   22,854   5,548    $718,714    $111,024    $667,676    $101,023    11.8  2.8
Far West (N=479)        219,506  56,864   99,356   16,774   $2,847,731  $506,185    $2,617,587  $464,300    38.2  8.0
Outlying Areas (N=4)    199,478  199,478  131,618  131,618  $1,913,108  $1,913,108  $1,813,365  $1,813,365  31.0  31.0

Some values are affected by rounding. The N's in parentheses are the total number of public libraries in each region. However, except for most of the figures for POPU_LSA, none of these summary data reflect data from all libraries reporting to NCES in 2001. For Outlying Areas, only two libraries contribute values to this table (N = 2), which is why the median and mean are equal.
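A regional summary of this shape can be sketched with a simple group-and-summarize step. The rows and the `summarize_by_region` helper below are hypothetical, with invented staff counts; the sketch also shows why a group with only two reported values has the same mean and median.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (region, TOTSTAFF) rows; None marks a value that was not reported.
rows = [
    ("Plains", 1.0), ("Plains", 1.5), ("Plains", 13.0),
    ("Far West", 8.0), ("Far West", 68.0), ("Far West", None),
    ("Outlying Areas", 12.0), ("Outlying Areas", 50.0),
]


def summarize_by_region(rows):
    """Return {region: (n_reporting, mean, median)}, skipping missing values."""
    groups = defaultdict(list)
    for region, value in rows:
        if value is not None:
            groups[region].append(value)
    return {
        region: (len(values), mean(values), median(values))
        for region, values in groups.items()
    }


summary = summarize_by_region(rows)
for region, (n, avg, mid) in summary.items():
    print(f"{region}: N={n}, mean={avg:.1f}, median={mid:.1f}")
```

With two values the median is the midpoint of the pair, which is also their mean, so the two statistics must coincide; with three or more skewed values, as in the made-up Plains rows, they separate.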

Note the evidence of skewness found in each of these variables in each region: the mean is well above the median in nearly every case. Skewness is a constant aspect of library data. Libraries in New England and the Plains states serve the smallest populations and, by these measures, are smaller than those in the other regions. The Far West, on the other hand, tends to have larger libraries serving larger populations. It now appears to me that state-level analysis will be more productive, and a first approximation has been done using the 2001 data for the states in the same format as this table for regions.



May 20, 2004