Quick Links
Public Libraries in the US
Analysis of 2001 data
Trends Results
Tables

Analyzing Trends in Public Libraries in the United States


Abstract

This page discusses characteristics of public libraries, as revealed through the data, that must be taken into account when analyzing libraries over the period 1990-2002 if we are to reach sound conclusions about trends. The result is five groups of libraries for analyzing trends in the "1990" dataset of public libraries. This dataset is a subset of PLDF3.

The trend results themselves are discussed separately on the "Trends Results" page.

Since 1990, we have seen a period of economic boom, and the results available so far indicate generally good trends: libraries have added staff, and expenditures have kept up with inflation. From my work with academic library data, I have seen economic cycles play out in measures of library growth: in good times, libraries have more money because of higher tax revenues or gifts, and in bad times, they have less. Two examples of this relationship appear in the analysis of 12 ARL libraries from 1908-2003. I expect the same kinds of relationships to hold for public libraries. That is, when older data are added to these data, we will find the same relationship to the business cycle for public libraries.


The Data

Specifying a set of data for trend analysis is more complex than taking one year's worth of data. In compiling the public library data for longitudinal analysis for PLDF2, it became clear that no variable uniquely identified each library, so an important task was creating such a key. With that done in PLDF3, it is now possible to analyze trends on consistent sets of libraries over time. Without such a key, changes in the levels of variables from year to year might reflect real changes in underlying conditions, or they might merely reflect a changing set of libraries being analyzed each year; we would have no way of knowing which. Once the key variable existed, a new variable classifying libraries by the years they reported could be calculated. PLDF3 has the variable span, which groups libraries by their reporting behavior over the years. Libraries with a span of 'A' reported in all years the data were collected from their state; 'S' libraries reported in some years; and 'E' libraries are a subset of 'S' libraries that reported in some years, including the first and last years (the 'End' years) of their state's data. In the counts on this page, the 'S' figures exclude the 'E' libraries. 'A' libraries can be used to show the behavior of this consistent set of libraries over time and, with the addition of the 'E' libraries, could be used to calculate changes from the beginning year to 2001.
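As a minimal sketch of how the span codes described above could be assigned, assume each library's record has been reduced to the set of years it reported, and that its state's first and last reporting years are known. The function `classify_span` is hypothetical, not the code used to build PLDF3. It returns 'E' in preference to 'S', matching the counts on this page, where 'S' excludes the 'E' libraries.

```python
from typing import Iterable


def classify_span(years_reported: Iterable[int], first_year: int, last_year: int) -> str:
    """Hypothetical helper: classify one library's reporting pattern as 'A', 'E', or 'S'.

    first_year and last_year are the first and last years the library's state reported.
    """
    reported = set(years_reported)
    if all(year in reported for year in range(first_year, last_year + 1)):
        return "A"  # reported in every year the state's data were collected
    if first_year in reported and last_year in reported:
        return "E"  # has gaps, but reported in both end years
    return "S"      # reported in some years only


print(classify_span(range(1987, 2002), 1987, 2001))         # A
print(classify_span([1987, 1990, 1995, 2001], 1987, 2001))  # E
print(classify_span([1993, 1994, 2001], 1987, 2001))        # S
```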

However, the states' public libraries began reporting to NCES in different years. Nineteen states reported in 1987, and the last, Tennessee, started reporting in 1990. As a result, a span of 'A' for a state that began in 1987 covers a different set of years than a span of 'A' for Tennessee, and an analysis of trends starting in 1987 would include a different set of states' data each year through 1990. To make a consistent set of libraries for trend analysis, a second dataset of libraries reporting from 1990 through 2002 was created, and the variable span was recalculated for those libraries using 1990 as the starting year. These are called the "1990" data here to denote their first year. The basic descriptive statistics reported here for the PLDF3 data are also calculated for 2002 for the libraries with a span of 'A' in the "1990" data. This is a slightly smaller group of libraries, but it is the group that will be examined for trend information, so a comparison with all the PLDF3 data for 2002 may be useful. But first, a bit more on these two datasets.
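The effect of recalculating span from a common start year can be illustrated with a hypothetical library from a state that began reporting in 1987 and that skipped 1988. In this sketch, the helper `span_in_window` and both reporting histories are illustrative assumptions; the helper ignores years outside the window before classifying.

```python
def span_in_window(years_reported, start: int, end: int) -> str:
    """Hypothetical helper: span code computed only over the years start..end."""
    reported = {year for year in years_reported if start <= year <= end}  # drop years outside the window
    if all(year in reported for year in range(start, end + 1)):
        return "A"
    if start in reported and end in reported:
        return "E"
    return "S"


# Hypothetical library in a state that began reporting in 1987; it skipped 1988.
early_state_library = [1987] + list(range(1989, 2003))
# Hypothetical Tennessee library that reported every year from 1990.
tennessee_library = list(range(1990, 2003))

print(span_in_window(early_state_library, 1987, 2002))  # E (state's own start year)
print(span_in_window(early_state_library, 1990, 2002))  # A (common 1990 start)
print(span_in_window(tennessee_library, 1990, 2002))    # A
```

A gap before 1990 no longer counts against a library once the window starts in 1990, which is one way the recalculation can move libraries into 'A' at the expense of 'E' and 'S'.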

Characteristics of the Libraries

How do the two ways of grouping the public libraries compare in 2002? Here are some basic statistics on their spans:

Comparison of Two Public Library Datasets for 2001 and 2002

| Dataset | Total number of libraries | Libraries with span = 'A' | Libraries with span = 'E' | Libraries with span = 'S' |
|---|---|---|---|---|
| PLDF3 in 2001 | 8,993 | 7,829 | 184 | 980 |
| "1990" in 2001 | 8,993 | 8,282 | 127 | 584 |
| PLDF3 in 2002 | 9,139 | 7,808 | 269 | 1,062 |
| "1990" in 2002 | 9,139 | 8,259 | 130 | 750 |

Although the two datasets contain the same libraries in each year (8,993 in 2001 and 9,139 in 2002), their classification into span categories differs. Recalculating span from 1990 instead of from each state's beginning year puts more libraries in span 'A', at the expense of the other two categories. There are other ways to look at the entire set of data in PLDF3, but the libraries in the "1990" dataset that reported every year will provide the basic set of data for this first look at trends. The first work on these data at this URL used the "1990" data for 1990-2001; with the addition of the 2002 data, it is appropriate to compare the two versions before going on. The 2002 data have 146 more libraries reporting than the 2001 data. However, libraries with a span of 'A' fell by 21 in PLDF3, from 7,829 to 7,808, and by 23 in the "1990" data, from 8,282 to 8,259. How can this be? None of the 146 additional libraries could have reported in every year, and some of the 'A' libraries drop out for various reasons. As a result, the number of 'A' libraries must decline while the other categories stay the same or rise.
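The arithmetic in the paragraph above can be checked directly against the comparison table. This short sketch copies the published counts into a dictionary. The names `SPAN_COUNTS` and `changes` are only for illustration. The sketch confirms that the 'A', 'E', and 'S' counts add up to each total, then computes the 2001-to-2002 changes.

```python
# Span counts copied from the comparison table: (total, A, E, S).
SPAN_COUNTS = {
    ("PLDF3", 2001): (8_993, 7_829, 184, 980),
    ("1990", 2001): (8_993, 8_282, 127, 584),
    ("PLDF3", 2002): (9_139, 7_808, 269, 1_062),
    ("1990", 2002): (9_139, 8_259, 130, 750),
}

# The three span categories should account for every library.
for key, (total, a, e, s) in SPAN_COUNTS.items():
    assert a + e + s == total, key


def changes(dataset: str) -> tuple[int, int]:
    """Illustrative helper: 2001-to-2002 change in total libraries and in span 'A' libraries."""
    total_01, a_01, _, _ = SPAN_COUNTS[(dataset, 2001)]
    total_02, a_02, _, _ = SPAN_COUNTS[(dataset, 2002)]
    return total_02 - total_01, a_02 - a_01


for dataset in ("PLDF3", "1990"):
    print(dataset, changes(dataset))  # PLDF3 (146, -21), then 1990 (146, -23)
```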

The 9,139 libraries in 2002 are not the only libraries in the dataset. In fact, 9,809 public libraries have reported to NCES in this series at some point, and their experiences will have to be taken into account in any complete analysis. Because Minnesota did not report in 2001, no library from Minnesota can have a span of 'A' or 'E'. This fact must also be considered in subsequent analysis so that the experience of the libraries in that state is not lost in looking at national trends. For the first look, though, we will take the 8,259 libraries in the current version of the "1990" dataset that have a span of 'A', because this set of libraries should be the most straightforward place to start.

Let's look at this subset of the "1990" libraries.

The discussion of the distribution of the 2001 PLDF3 data included a table comparing the upper quartile's sums for five variables with those of the lower three quartiles. That first table is repeated here for comparison with the "1990" data:

Upper Quartile and the Lower Three Quartiles
PLDF3 Dataset for 2002

| Variable | Sum for highest 25% | Sum for lowest 75% |
|---|---|---|
| BKVOL | 596,949,844 | 152,199,120 |
| POPU_LSA | 234,422,863 | 38,910,067 |
| TOTINCM | $7,164,681,020 | $885,146,273 |
| TOTOPEXP (total operating expenditures) | $6,611,312,519 | $800,052,014 |
| TOTSTAFF (total staff) | 110,724 | 19,340 |

Upper Quartile and the Lower Three Quartiles
"1990" Dataset for 2001, Span = 'A'

| Variable | Sum for highest 25% | Sum for lowest 75% |
|---|---|---|
| BKVOL | 579,051,181 | 151,718,029 |
| POPU_LSA | 224,396,018 | 38,566,378 |
| TOTINCM | $6,959,308,713 | $898,022,441 |
| TOTOPEXP | $6,423,157,406 | $812,997,221 |
| TOTSTAFF | 107,194 | 19,505 |

These two datasets have the same basic structure. Both are skewed and have heavy upper tails: in each, the libraries in the highest quartile account for roughly 80 to 90 percent of the total for every variable shown. Although statistics calculated from these two views of the data will differ, their general structure is similar.
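One way to quantify the skew in the two upper-quartile tables is the share of each variable's total held by the highest quartile. This sketch simply copies the sums from the tables; `top_share` and the dictionary names are illustrative, not part of PLDF3.

```python
# Sums copied from the two upper-quartile tables: (highest 25%, lowest 75%).
PLDF3 = {
    "BKVOL": (596_949_844, 152_199_120),
    "POPU_LSA": (234_422_863, 38_910_067),
    "TOTINCM": (7_164_681_020, 885_146_273),
    "TOTOPEXP": (6_611_312_519, 800_052_014),
    "TOTSTAFF": (110_724, 19_340),
}
SPAN_A_1990 = {
    "BKVOL": (579_051_181, 151_718_029),
    "POPU_LSA": (224_396_018, 38_566_378),
    "TOTINCM": (6_959_308_713, 898_022_441),
    "TOTOPEXP": (6_423_157_406, 812_997_221),
    "TOTSTAFF": (107_194, 19_505),
}


def top_share(top: float, rest: float) -> float:
    """Illustrative helper: fraction of the combined total held by the highest quartile."""
    return top / (top + rest)


for name, table in (("PLDF3", PLDF3), ('"1990" span A', SPAN_A_1990)):
    for variable, (top, rest) in table.items():
        print(f"{name:<14} {variable:<9} {top_share(top, rest):.1%}")
```

Run as written, every share falls between about 79 and 89 percent, with BKVOL lowest and the two dollar variables highest in both datasets.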

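Abbreviations: BKVOL, book and serial volumes; POPU_LSA, population of the legal service area; TOTINCM, total operating income; TOTOPEXP, total operating expenditures; TOTSTAFF, total staff.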


The patterns of correlations in these two sets of data for 2001 are also similar. Again, the table of correlations from the discussion of distributions in 2001 is repeated here for comparison. Given how strong the relationships are, I will not repeat these correlations for 2002 at this time. Rather, I will press forward with the trend analysis using the data through 2002. The quartiles used are described at the bottom of the page under "Groups by Size".

Correlations Among Five Variables in PLDF3, 2001

| | BKVOL | POPU_LSA | TOTINCM | TOTOPEXP | TOTSTAFF |
|---|---|---|---|---|---|
| BKVOL | 1.00 (8,772) | .86 (8,772) | .96 (8,713) | .96 (8,719) | .96 (8,724) |
| POPU_LSA | .86 (8,772) | 1.00 (8,991) | .88 (8,759) | .87 (8,764) | .90 (8,791) |
| TOTINCM | .96 (8,713) | .88 (8,759) | 1.00 (8,759) | .99 (8,726) | .98 (8,720) |
| TOTOPEXP | .96 (8,719) | .87 (8,764) | .99 (8,726) | 1.00 (8,764) | .98 (8,722) |
| TOTSTAFF | .96 (8,724) | .90 (8,791) | .98 (8,720) | .98 (8,722) | 1.00 (8,791) |
Correlations Among Five Variables in "1990", 2001, Span = 'A'

| | BKVOL | POPU_LSA | TOTINCM | TOTOPEXP | TOTSTAFF |
|---|---|---|---|---|---|
| BKVOL | 1.00 (8,143) | .86 (8,143) | .96 (8,093) | .96 (8,105) | .96 (8,105) |
| POPU_LSA | .86 (8,143) | 1.00 (8,282) | .88 (8,129) | .87 (8,140) | .90 (8,158) |
| TOTINCM | .96 (8,093) | .88 (8,129) | 1.00 (8,129) | 1.00 (8,107) | .98 (8,096) |
| TOTOPEXP | .96 (8,105) | .87 (8,140) | 1.00 (8,107) | 1.00 (8,140) | .98 (8,105) |
| TOTSTAFF | .96 (8,105) | .90 (8,158) | .98 (8,096) | .98 (8,105) | 1.00 (8,158) |

Some values are affected by rounding. The N in parentheses is the number of public libraries with values for both variables in each pair. The correlation of TOTOPEXP with TOTINCM in the "1990" data is .99599 to five places, which rounds to 1.00.
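The differing N's arise because each correlation uses only the libraries that report both variables (pairwise deletion). A pair's N is therefore the same whichever variable is listed first, and it can be no larger than either variable's own count on the diagonal. The sketch below assumes missing values are stored as `None`. The function `pairwise_corr` and the sample figures are hypothetical, not survey values.

```python
import math
from typing import Optional, Sequence


def pairwise_corr(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> tuple[float, int]:
    """Hypothetical helper: Pearson r over libraries with values for both variables, plus that N."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two libraries with both values")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical income and expenditure figures; None marks a missing report.
income = [100.0, 250.0, None, 400.0, 900.0]
expend = [95.0, 240.0, 300.0, None, 870.0]
r, n = pairwise_corr(income, expend)
print(f"r = {r:.2f}, N = {n}")
```

Only three of the five hypothetical libraries report both figures, so N is 3 even though each variable alone has four values.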

The two sets of data are similar, and the "1990" span = 'A' data include more libraries than the span = 'A' libraries in the PLDF3 dataset (8,259 versus 7,808 in 2002). We will therefore use the "1990" data as the starting point for analyzing trends in these variables over the years 1990-2002.


Groups by Size

The libraries in the "1990" data have been divided into groups to take into account what the data show about their characteristics. The skewed distributions of the five key variables examined mean that the very largest libraries would dominate any overall figures and obscure what is going on in smaller libraries. The relatively high correlations also suggest that grouping libraries by any one of these variables would tend to produce similar groups.

A second reason to group like libraries is to damp fluctuations in the apparent behavior of the data that could occur as a result of missing values or individual libraries having unusual years. Grouping will tend to smooth out the experiences of these libraries. It could also keep potentially important results from being noticed in this initial analysis, so there are tradeoffs to using this technique, as there are to any technique.

Five groups were created by separating the libraries by the size of the population served (POPU_LSA), and the mean of each group will be examined using line charts and tables. The five groups are the four quartiles and the libraries above the 95th percentile. The last two groups are not mutually exclusive: the 95th percentile libraries are also in the fourth quartile. There are 413 of these 95th percentile libraries, and over 1990-2002 they are, roughly, the libraries serving more than about 100,000 people in their legal service areas.
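A minimal sketch of computing each group's yearly mean follows. It assumes the data are held as hypothetical (library, year, group, value) rows, with missing values stored as `None`. Because the 95th percentile libraries also belong to the fourth quartile, such a library would contribute one row per group. The helper `group_means` is illustrative. Skipping missing values rather than treating them as zero is an assumption here, not necessarily the method behind the trend charts.

```python
from collections import defaultdict
from statistics import mean


def group_means(rows):
    """Hypothetical helper: mean value per (group, year), ignoring missing (None) values."""
    buckets = defaultdict(list)
    for _library, year, group, value in rows:
        if value is not None:  # a missing report is left out, not counted as zero
            buckets[(group, year)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Hypothetical rows: (library, year, group, TOTSTAFF)
rows = [
    ("lib_a", 1990, "1st Quartile", 1.0),
    ("lib_b", 1990, "1st Quartile", 2.0),
    ("lib_c", 1990, "1st Quartile", None),
    ("lib_a", 1991, "1st Quartile", 1.5),
    ("lib_b", 1991, "1st Quartile", 2.5),
    ("lib_c", 1991, "1st Quartile", 3.5),
]
for (group, year), avg in group_means(rows).items():
    print(group, year, avg)
```

The 1990 mean here rests on two libraries and the 1991 mean on three. This is one way a missing value can shift a group's apparent level from year to year.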

The groups were formed from the quartiles in 1996, and each library's 1996 quartile was assigned to it for all years. 1996 is the middle year of the period 1990 to 2002 (and the latter of the two middle years of the original 1990-2001 period). An analysis of quartiles in 1990, 1996, and 2001 showed that the groups formed in 1996 had fewer changes in quartile than those formed in the other two years. The aim in forming groups is to put like libraries together so that they can be analyzed together; however, a library can move from one quartile to another for several reasons, discussed separately in greater detail. Using the middle year resulted in the fewest such changes over the period. The quartiles are:

Criteria Used to Create Groups for the First Analysis of Trends

| Group | N | POPU_LSA in 1996 |
|---|---|---|
| 95th percentile | 413 | > 106,822.9 |
| 4th Quartile | 2,065 (1,652 without the top 5%) | > 20,240.9 |
| 3rd Quartile | 2,065 | > 6,859.9 and < 20,241 |
| 2nd Quartile | 2,064 | > 2,212 and < 6,860 |
| 1st Quartile | 2,065 | < 2,212.1 |
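The cutoffs in the table can be turned into a group assignment that is computed once from each library's 1996 POPU_LSA and then carried to every year. This sketch assumes POPU_LSA is a whole number, so the fractional cutoffs (such as 2,212.1) simply separate adjacent integers. The helper `assign_groups` and the sample libraries are hypothetical.

```python
# 1996 POPU_LSA cutoffs taken from the criteria table.
P95_CUTOFF = 106_822.9
Q4_CUTOFF = 20_240.9
Q3_CUTOFF = 6_859.9
Q2_CUTOFF = 2_212.1


def assign_groups(popu_lsa_1996: int) -> list[str]:
    """Hypothetical helper: every group a library belongs to, from its 1996 population served."""
    if popu_lsa_1996 > Q4_CUTOFF:
        groups = ["4th Quartile"]
        if popu_lsa_1996 > P95_CUTOFF:
            groups.append("95th percentile")  # these libraries are also in the 4th quartile
        return groups
    if popu_lsa_1996 > Q3_CUTOFF:
        return ["3rd Quartile"]
    if popu_lsa_1996 >= Q2_CUTOFF:
        return ["2nd Quartile"]
    return ["1st Quartile"]


# Hypothetical libraries and their 1996 POPU_LSA; the groups are reused for all years.
popu_1996 = {"lib_a": 1_500, "lib_b": 12_000, "lib_c": 250_000}
groups_by_library = {lib: assign_groups(pop) for lib, pop in popu_1996.items()}
print(groups_by_library)
```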



January 25, 2005
Back to NCES index
Return to NCLIS Homepage