Technical details

Learn about the automated processes behind the Site Scanning Program.

The Site Scanning Program runs a set of automated processes that together make up the project and produce its data. The basic flow of events is as follows:

Building the Federal Website Index

Every day, a comprehensive list of public federal websites is assembled as the Federal Website Index.

Description of the Federal Website Index.
Direct download of the current Federal Website Index.
Process description, including details about the sources used, how the list is combined, and which criteria are used to remove entries.
Process description in table form, with each step and its resulting files described in order.
List of the source files that are combined to make the index.
Snapshots from each step in the assembly process, including which URLs are removed at each step and which remain.
Data dictionary for the Federal Website Index.
Analysis report for the assembly process.
Analysis report for the completed Federal Website Index.
Task repository.
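
As a rough illustration of the assembly process described above (combine source lists, remove entries by criteria, and keep a snapshot of each step), here is a minimal sketch. It is not the program's actual code: the function names, the snapshot keys, and the `ignore_prefixes` removal rule are all hypothetical, chosen only to show the shape of the pipeline.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL or bare hostname to a lowercase hostname."""
    url = url.strip().lower()
    if "://" not in url:
        # Prefix "//" so urlsplit treats a bare hostname as the network location.
        url = "//" + url
    return urlsplit(url).hostname or ""


def build_index(sources, ignore_prefixes=("staging.", "dev.")):
    """Combine source lists into one index and record a snapshot per step.

    `sources` maps a (hypothetical) source name to a list of URLs.
    `ignore_prefixes` is an illustrative removal criterion, not the real one.
    """
    snapshots = {}

    # Step 1: combine every source list into one list of hostnames.
    combined = [normalize(u) for urls in sources.values() for u in urls]
    combined = [h for h in combined if h]
    snapshots["combined"] = list(combined)

    # Step 2: remove duplicates.
    deduped = sorted(set(combined))
    snapshots["deduped"] = deduped

    # Step 3: remove entries that match the removal criteria,
    # noting which entries were removed and which remain.
    kept = [h for h in deduped if not h.startswith(ignore_prefixes)]
    snapshots["removed"] = [h for h in deduped if h not in kept]
    snapshots["final"] = kept

    return kept, snapshots
```

Keeping a snapshot for every step mirrors the published per-step snapshots: a reader can see which URLs dropped out at which stage rather than only the final list.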

Running the scans

Every day, each Initial URL in the Federal Website Index is loaded in a virtual browser and the results are recorded. The recorded results make up the Site Scanning data.

Scanning process description, including what criteria are used to create each field of data.
Direct download of the complete Site Scanning dataset.
Other download options.
Archived snapshots.
API documentation.
Data dictionary for the Site Scanning data.
Explanation of possible scan statuses.
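
The scanning step described above can be sketched as a loop that loads each Initial URL and records one result per URL. This is an illustrative sketch only. The `load_page` callable stands in for the virtual browser, and the `ScanResult` fields and the status strings (`"completed"`, `"timeout"`, `"error"`) are assumptions made for the example; the actual fields and statuses are defined in the data dictionary and the scan status explanation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class ScanResult:
    initial_url: str
    scan_status: str                   # hypothetical status label
    final_url: Optional[str] = None    # URL after any redirects
    status_code: Optional[int] = None  # HTTP status of the final response


def scan_all(
    initial_urls: Iterable[str],
    load_page: Callable[[str], Tuple[str, int]],
) -> List[ScanResult]:
    """Load every URL with `load_page` and record the outcome.

    `load_page` is a stand-in for the virtual browser: it takes a URL and
    returns (final_url, status_code), or raises if the page cannot be loaded.
    """
    results = []
    for url in initial_urls:
        try:
            final_url, status_code = load_page(url)
            results.append(ScanResult(url, "completed", final_url, status_code))
        except TimeoutError:
            results.append(ScanResult(url, "timeout"))
        except Exception:
            # One failing site must not stop the rest of the scan.
            results.append(ScanResult(url, "error"))
    return results
```

Passing the loader in as a parameter keeps the loop independent of any particular browser tooling, and recording a status for failures means every Initial URL still gets a row in the output.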

Analyzing and snapshotting the data

Every day, after the Federal Website Index is assembled and the scans have run, a further series of discrete actions generates a set of analytical reports.
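
One simple kind of analytical report is a breakdown of scan results by status. The sketch below assumes each scan result is a dictionary with a `scan_status` key; that field name and the report layout are assumptions for illustration, not a description of the program's actual reports.

```python
from collections import Counter


def status_report(rows):
    """Summarize scan results as a count and percentage per status.

    `rows` is an iterable of dicts that each carry a "scan_status" key
    (an assumed field name for this example).
    """
    counts = Counter(row["scan_status"] for row in rows)
    total = sum(counts.values())
    # most_common() orders statuses from most to least frequent.
    return {
        status: {"count": n, "percent": round(100 * n / total, 1)}
        for status, n in counts.most_common()
    }
```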

Project repositories

Project source code and documentation are available at the following locations.
