Data.gov’s Data Pipeline Explained

Dec 4, 2014

In case you missed it: the Data.gov team recently hosted two DigitalGov University webinars to help agencies and open data advocates understand how to get data onto Data.gov and how to implement the Open Data Policy’s metadata schema updates. Both webinars aim to help government data publishers make more data discoverable to the American people. You can watch the webinars and find supplemental resources below.

Project Open Data Metadata Schema v1.1 Updates

Executive Order 13642 and OMB Memorandum M-13-13 require all executive departments and agencies to list all agency data that can be made public in a publicly available open data catalog with consistent metadata. In the more than a year since the Open Data Policy was released, agencies and the public have suggested several updates to the metadata schema. Each proposed change was rigorously discussed in its own issue thread and at a July government-wide offsite session dedicated to the update. The result is version 1.1 of the metadata schema required under the Open Data Policy. Starting in February 2015, federal agencies will be required to list their datasets using version 1.1.

Data.gov’s October 15th “Project Open Data Metadata Updates” webinar reviews metadata schema v1.0 as required under the Open Data Policy, walks step by step through the updates to the schema, and rounds up tools and resources to help data stewards, IT personnel, and all agency staff with their v1.1 metadata updates.
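As a rough illustration of what one of these updates can look like in practice, here is a minimal Python sketch that reshapes two fields of a v1.0 dataset record into their v1.1 form. It assumes that v1.0 stored `publisher` as a plain string and split the contact into `contactPoint` (a name) and `mbox` (an email), while v1.1 uses an `org:Organization` object and a `vcard:Contact` object with `fn` and `hasEmail`. The function name `upgrade_contact_fields` and the sample record are hypothetical, and the sketch ignores every other change in v1.1; check the official schema before relying on any of it.

```python
from typing import Any


def upgrade_contact_fields(record: dict[str, Any]) -> dict[str, Any]:
    """Reshape the publisher and contact fields of a v1.0 dataset record for v1.1.

    Assumption-based sketch: only these two fields are handled; all other
    v1.0 -> v1.1 changes (distributions, formats, etc.) are left untouched.
    """
    upgraded = dict(record)  # shallow copy so the caller's record is not mutated

    # Assumed v1.0 shape: publisher is a plain string.
    # Assumed v1.1 shape: publisher is an organization object.
    publisher = upgraded.get("publisher")
    if isinstance(publisher, str):
        upgraded["publisher"] = {"@type": "org:Organization", "name": publisher}

    # Assumed v1.0 shape: "contactPoint" holds a name and "mbox" holds an email.
    # Assumed v1.1 shape: both fold into a single vCard contact object.
    contact = upgraded.get("contactPoint")
    if isinstance(contact, str):
        new_contact: dict[str, str] = {"@type": "vcard:Contact", "fn": contact}
        email = upgraded.pop("mbox", None)
        if email:
            # hasEmail is written as a mailto: URI
            new_contact["hasEmail"] = email if email.startswith("mailto:") else "mailto:" + email
        upgraded["contactPoint"] = new_contact

    return upgraded


if __name__ == "__main__":
    old = {
        "title": "Example dataset",  # hypothetical record for illustration
        "publisher": "Example Agency",
        "contactPoint": "Jane Doe",
        "mbox": "jane.doe@example.gov",
    }
    print(upgrade_contact_fields(old))
```

Returning a copy rather than editing the record in place makes it easy to compare the v1.0 and v1.1 versions side by side while checking a conversion.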

As of December 3, 2014, the Data.gov catalog supports both version 1.0 and 1.1 of the metadata schema, to provide agencies adequate transition time to version 1.1 by the February 2015 deadline.
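Supporting both versions at once means a consumer has to tell them apart. The sketch below shows one way to do that for a `data.json` file, assuming that a v1.0 file is a bare JSON array of dataset records and a v1.1 file is a catalog object whose `conformsTo` value names the v1.1 schema and whose records sit under a `dataset` key. The function `load_catalog` is a hypothetical illustration, not how the Data.gov catalog itself detects versions.

```python
import json
from typing import Any

# Assumption: the v1.1 "conformsTo" URI contains "v1.1",
# e.g. https://project-open-data.cio.gov/v1.1/schema
V11_MARKER = "v1.1"


def load_catalog(raw: str) -> tuple[str, list[dict[str, Any]]]:
    """Parse a data.json document and return (schema_version, dataset_records).

    Assumed shapes: v1.0 is a bare JSON array of datasets; v1.1 is a catalog
    object with a "conformsTo" URI and the datasets under "dataset".
    """
    doc = json.loads(raw)
    if isinstance(doc, list):
        return "1.0", doc
    if isinstance(doc, dict):
        records = list(doc.get("dataset", []))
        if V11_MARKER in str(doc.get("conformsTo", "")):
            return "1.1", records
        # An object without a recognizable conformsTo: keep the records, flag the version
        return "unknown", records
    return "unknown", []
```

Returning the records alongside the detected version lets downstream code work with a single list of datasets regardless of which schema the source publishes.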

Data Harvesting 101

The Data.gov team also recently held a more introductory webinar about Data.gov and how agencies’ data is added to the Data.gov catalog. Data.gov is the United States’ central clearinghouse to search and discover over 130,000 open government datasets. Data.gov does not host data directly, but rather aggregates metadata about open data resources in one centralized location.

Once an open data source meets the necessary format and metadata requirements, the Data.gov team can pull directly from it as a Harvest Source, synchronizing that source’s metadata on Data.gov as often as every 24 hours.
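To make the idea of synchronizing a Harvest Source more concrete, here is a minimal Python sketch of two steps a harvester might take: deciding whether a source is due for another pass, and working out which records to create, update, or delete. This is not Data.gov’s harvester. It assumes records are keyed by their `identifier` field and that a changed `modified` value signals an update; the function names `is_due` and `plan_sync` and the in-memory `catalog` map are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

# "As often as every 24 hours": treated here as the minimum gap between harvests.
HARVEST_INTERVAL = timedelta(hours=24)


def is_due(last_run: Optional[datetime], now: datetime) -> bool:
    """A source is due if it was never harvested or the interval has elapsed."""
    return last_run is None or now - last_run >= HARVEST_INTERVAL


def plan_sync(source: list[dict[str, Any]], catalog: dict[str, str]) -> dict[str, list[str]]:
    """Diff a harvest source's records against what the catalog already holds.

    source:  records fetched from the source; assumed to carry "identifier"
             and "modified" fields.
    catalog: hypothetical map of identifier -> "modified" value seen at the last harvest.
    """
    incoming = {r["identifier"]: r.get("modified", "") for r in source}
    return {
        # New identifiers that the catalog has never seen
        "create": sorted(i for i in incoming if i not in catalog),
        # Known identifiers whose "modified" value has changed
        "update": sorted(i for i in incoming if i in catalog and catalog[i] != incoming[i]),
        # Identifiers that have disappeared from the source
        "delete": sorted(i for i in catalog if i not in incoming),
    }


if __name__ == "__main__":
    now = datetime(2014, 12, 4, tzinfo=timezone.utc)
    print(is_due(now - timedelta(hours=25), now))  # True
    source = [
        {"identifier": "a", "modified": "2014-12-01"},
        {"identifier": "b", "modified": "2014-12-03"},
    ]
    catalog = {"b": "2014-11-20", "c": "2014-10-01"}
    print(plan_sync(source, catalog))  # {'create': ['a'], 'update': ['b'], 'delete': ['c']}
```

Keying on `identifier` keeps the comparison cheap, but it only works if each source keeps identifiers stable and unique across harvests.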

The November 5th “How to Get Your Agency’s Data onto Data.gov” webinar walks step by step through how federal, geospatial, and non-federal data flows onto Data.gov, covers the data requirements for getting your government data onto Data.gov, and reviews tools and resources to assist data stewards, IT personnel, and all agency staff.

Additional Resources

Stay Tuned

Register for Data.gov’s upcoming webinar on using the Inventory.data.gov tool to host your metadata, “How to use Inventory.Data.Gov,” on Tuesday, December 16th from 1 pm to 2 pm, and stay tuned for additional Data Harvesting documentation. As always, you can reach the Data.gov team at Data.gov/contact.