The Data Briefing: Tales from the Dark Side of Data

Apr 27, 2016

There are many scary tales in the world of knowledge management and data management. Tales of missing data that was lost through the administrative cracks, such as the story of the missing Apollo 11 moonwalk tapes that most likely were erased by accident. Or the 36-year search for the original Wright Brothers’ patent, which was happily re-discovered this month. As more data is being created at ever-increasing speed and complexity, there will be more missing data horror stories.

Data is easier to create now than ever before in history. The government has always been a major creator and collector of data. Whenever I think of government data, I think of the enormous government warehouse at the end of Raiders of the Lost Ark. That warehouse has grown even larger as more and more data has been lost in the endless rows of storage. I wrote about the problem of “dark data” in a column last year:

“When different parts of the organization are not collaborating on data collection or data analysis, they could also create dark data. Also, the organization may not have the tools or the analytical skills to analyze the collected data. Finally, dark data may happen because the organization does not have good documentation on the organizational datasets. Dark data is like an attic or storage shed, where boxes of information are stored away with the promise that we will, one day, get around to working with the data.”

Dark data, if kept for too long, can become toxic data. At a panel for a recent government-wide event, military officials discussed the dangers of storing data past its useful life: “Datasets created and stored before the development of advanced cybersecurity protections can potentially offer easy pathways for hackers.” Systems to store and utilize the data have to be maintained way past their useful life, which also increases the risk of hackers penetrating other, more modern systems through well-known vulnerabilities in the older systems. Like chains, which are only as strong as their weakest links, computer networks are only as secure as their most vulnerable network component.

I’ve worked on several projects where a data solution was implemented to meet an immediate need. Some data modeling and analysis was done, but not enough long-range planning to help future-proof the datasets. As a result, data ends up locked into old systems whose limitations prevent the office from adopting newer, more effective systems. In one project, we just had to cut our losses because it was impossible to migrate the data from the old system into the new database. That past data is now locked away with very little hope of making it accessible again.

Sometimes, data science reads like an Edgar Allan Poe story. Datasets are locked away in a forgotten (computer) prison or trapped behind a (data silo) brick wall. Datasets grow toxic over time and turn into menaces to network security.

Data has to be managed, stored and used carefully and effectively. The ability to create and collect data has become a blessing to the government: data helps us make better-informed decisions and is a major component of today’s global economy. However, data can also be a curse when bad decisions are based on expired or lost data. The key is to have clear objectives for collecting and using the data, along with a management plan for the lifecycle of the data. Data horror stories are not good for government or the American public.
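For readers who want something concrete, here is a minimal Python sketch of what a lifecycle check on a dataset inventory might look like. It is only an illustration: the `DatasetRecord` fields, the one-year "unused" threshold, and the function names are all assumptions made for this example, not an agency standard or an existing tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DatasetRecord:
    """One row in a hypothetical dataset inventory."""
    name: str
    owner: Optional[str]          # who is responsible for the data
    purpose: Optional[str]        # documented objective for collecting it
    created: date
    retention_years: int          # how long the data is meant to be kept
    last_used: Optional[date] = None


def retention_end(record: DatasetRecord) -> date:
    """Date on which the planned retention period ends."""
    target_year = record.created.year + record.retention_years
    try:
        return record.created.replace(year=target_year)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return record.created.replace(year=target_year, day=28)


def lifecycle_flags(record: DatasetRecord, today: date) -> List[str]:
    """Return warning labels suggesting a dataset is going dark or toxic."""
    flags = []
    if not record.owner:
        flags.append("no owner")
    if not record.purpose:
        flags.append("no documented purpose")
    if today >= retention_end(record):
        flags.append("past retention period")
    # Assumed threshold: never used, or not used in the last 365 days.
    if record.last_used is None or (today - record.last_used).days > 365:
        flags.append("unused for over a year")
    return flags


if __name__ == "__main__":
    legacy = DatasetRecord(
        name="legacy_survey",      # hypothetical dataset
        owner=None,
        purpose=None,
        created=date(2005, 2, 28),
        retention_years=7,
    )
    for flag in lifecycle_flags(legacy, date(2016, 4, 27)):
        print(f"{legacy.name}: {flag}")
```

The code matters less than the habit it encodes: every dataset carries an owner, a stated purpose, and an end date, and someone reviews the list on a schedule.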

Each week, The Data Briefing showcases the latest federal data news and trends. Dr. William Brantley is the Training Administrator for the U.S. Patent and Trademark Office’s Global Intellectual Property Academy. You can find out more about his personal work in open data, analytics, and related topics at BillBrantley.com. All opinions are his own and do not reflect the opinions of the USPTO or GSA.