By now, you are familiar with “big data,” or datasets so large that they cannot be analyzed by conventional analytical methods. You may have heard of “long data,” which is data that has a temporal context. I work with long data when I analyze hiring patterns over time in workforce data. There is also “small data.” Small data are datasets that describe a current condition. For example, if you have a smart home appliance such as a smart thermostat or a home security system, that appliance is constantly monitoring data such as the temperature or whether a door is open. The appliance does not store the data (or may store it for only a short time) and reacts only when the data changes. As you can see, there are many types of data.
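To make the small-data idea concrete, here is a minimal Python sketch of an appliance that keeps only its most recent reading and reacts when that reading changes. The function name `react_to_changes`, the `on_change` callback, and the `threshold` value are all hypothetical names chosen for illustration; they do not describe how any particular thermostat or security system works.

```python
from typing import Callable, Iterable, Optional


def react_to_changes(
    readings: Iterable[float],
    on_change: Callable[[float, float], None],
    threshold: float = 0.5,
) -> None:
    """Call on_change(previous, current) when a reading moves by more than threshold.

    Only the latest reading is kept in memory; no history is stored.
    """
    last: Optional[float] = None
    for value in readings:
        if last is not None and abs(value - last) > threshold:
            on_change(last, value)
        last = value  # overwrite the previous reading instead of storing it


if __name__ == "__main__":
    # Hypothetical temperature readings in degrees Fahrenheit.
    sample = [70.0, 70.1, 72.0, 72.0, 69.5]
    react_to_changes(sample, lambda old, new: print(f"Changed: {old} -> {new}"))
```

The design point is that the loop never builds a list of past readings. That is the difference between small data and the stored datasets discussed in the rest of this post.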
So, what do you call data that is collected but isn’t analyzed or made readily available? “Dark data” is becoming a major issue for organizations as the ability to collect data far outpaces the ability to analyze it. According to a CIO.com article, “Shedding Light on Dark Data,” dark data can arise because the organization may not be aware that the data is being collected.
Dark data can also arise when different parts of the organization do not collaborate on data collection or data analysis. In addition, the organization may not have the tools or the analytical skills to analyze the collected data. Finally, dark data may arise because the organization lacks good documentation of its datasets. Dark data is like an attic or storage shed, where boxes of information are stored away with the promise that we will, one day, get around to working with the data.
The federal government is a massive collector of information. Data.gov currently has 131,941 datasets and I am sure that is a small fraction of the total number of government datasets. An even smaller fraction of government datasets has been translated into APIs. As I have documented in earlier postings, apps built on government APIs have become a vital part of the U.S. economy and provide valuable public services. The next big challenge for federal government data is to explore and utilize the dark data that exists in all of the government agencies. Much progress has been made, but more work is needed to unearth the hidden treasures in the federal government’s dark data.

*API – Application Programming Interface; how software programs and databases share data and functions with each other. Check out APIs in Government for more information.

Each week, “The API Briefing” will showcase government APIs and the latest API news and trends. Visit this column every week to learn how government APIs are transforming government and improving government services for the American people. If you have ideas for a topic or have questions about APIs, please contact me via email. All opinions are my own and do not reflect the opinions of the USDA and GSA.