As one of the newest (in terms of public awareness) emerging advances in technology, Big Data is being employed and explored by government, business, and even the Library of Congress. Leslie Johnston, Acting Director of the National Digital Information Infrastructure & Preservation Program at the Library of Congress, presented the following at the Georgetown University Law School symposium on Big Data last January.
We still have collections. But what we also have is Big Data, which requires us to rethink the infrastructure that is needed to support Big Data services. Our community used to expect researchers to come to us, ask us questions about our collections, and use our digital collections in our environment. Now our collections are, more often than not, self-service.
Obviously, the Library of Congress has significant influence on our profession and on the complexion of libraries across America. One has to pause and wonder what impact Big Data will have on the small library.
Big Data is described by our friends at Wikipedia as:
Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to “spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions.”
As of 2012, limits on the size of data sets that are feasible to process in a reasonable amount of time were on the order of exabytes (10¹⁸ bytes) of data. Scientists regularly encounter limitations due to large data sets in many areas, including meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research. The limitations also affect Internet search, finance and business informatics. Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks. The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, every day 2.5 quintillion (2.5×10¹⁸) bytes of data were created. The challenge for large enterprises is determining who should own big data initiatives that straddle the entire organization.
Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead “massively parallel software running on tens, hundreds, or even thousands of servers”. …
Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. The target moves due to constant improvement in traditional DBMS technology as well as new databases like NoSQL and their ability to handle larger amounts of data. With this difficulty, new platforms of “big data” tools are being developed to handle various aspects of large quantities of data.
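Before moving on, it is worth translating the quote’s scale figures into more familiar units: 2.5 quintillion bytes per day is 2.5 exabytes, or 2.5 million terabytes. A quick back-of-the-envelope check (a Python sketch, using decimal units, for illustration only):

```python
# Scale of the figures quoted above: 2.5 quintillion bytes created per day.
bytes_per_day = 2.5e18   # 2.5 quintillion bytes
terabyte = 1e12          # decimal terabyte, 10^12 bytes
exabyte = 1e18           # decimal exabyte, 10^18 bytes

print(bytes_per_day / exabyte)   # 2.5 exabytes per day
print(bytes_per_day / terabyte)  # 2,500,000 terabytes per day

# "Doubling every 40 months" compounds fast: over 10 years (120 months)
# that is 120/40 = 3 doublings, an 8x increase in per-capita storage.
doublings = 120 / 40
print(2 ** doublings)            # 8.0
```

Even the largest library server closet is not built on that scale.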
OK, that was a big quote, but the most important part is this: “The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.” Big Data, in other words, is SO FAR beyond the capabilities and resources of smaller libraries that it may create a huge polarization in library services. As the quote also notes, “Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time.”
Curate! Store! Search! Share! Transfer! These are ALL functions of librarianship and libraries, and LoC is telling us that Big Data is pushing them “beyond the ability of commonly used software” to handle. So where does that leave small libraries without the capability to develop their own digital library services, like the Douglas County (CO) Library System’s cloud library project? It leaves them right back in the 20th Century, where they have lingered too long.
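The “massively parallel software running on tens, hundreds, or even thousands of servers” that the quote invokes usually means the MapReduce pattern: split the data into chunks, process each chunk independently, then merge the partial results. A minimal single-machine sketch in Python (illustrative only; systems like Hadoop run these same two steps across many servers):

```python
from collections import Counter
from multiprocessing import Pool

def count_words(chunk):
    """Map step: count the words in one chunk of text, independently."""
    return Counter(chunk.split())

def merge(counters):
    """Reduce step: merge the partial counts into one combined result."""
    total = Counter()
    for c in counters:
        total += c
    return total

if __name__ == "__main__":
    # Stand-in for a data set too large for one process to scan quickly.
    chunks = ["big data big libraries", "data services data tools"]
    with Pool(2) as pool:
        partials = pool.map(count_words, chunks)  # chunks processed in parallel
    print(merge(partials)["data"])  # 3
```

The point of the pattern is that no single machine ever has to hold or scan the whole data set, which is exactly the capability a small library’s desktop tools lack.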
The progression of Big Data relative to library collections clearly points toward everything being available over the Internet and on WiFi and mobile devices. Earlier fears about librarians becoming obsolete were not unfounded. Not only are Google, Amazon, and the National Digital Library working to make information ubiquitous; now the Library of Congress is helping promote Big Data that could reduce small libraries to nothing more than remote terminals for Big Data.