Digital.gov Guide
Understanding the Site Scanning program
Technical details
Learn about the automated processes behind the site scanning program.
The Site Scanning program maintains a number of automated processes that, together, constitute the entire project and seek to deliver useful data. The basic flow of these events is as follows:
Building the Federal Website Index
- Every day, a comprehensive list of public federal websites is assembled as the Federal Website Index.
- Description of the Federal Website Index.
- Direct download of the current Federal Website Index.
- Process description, including details about the sources used, how the list is combined, and which criteria are used to remove entries.
- List of the source files that are combined to make the index.
- Snapshots from each step in the assembly process, including which URLs are removed at each step and which remain.
- Data dictionary for the Federal Website Index.
- Summary report for the assembly process.
- Summary report for the completed Federal Website Index.
- Task repository.
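The assembly idea described above can be sketched as follows: merge several source lists of URLs, normalize them so duplicates collapse, then remove entries that fail the inclusion criteria. The source names and the filtering rule here are illustrative assumptions, not the program's actual sources or criteria.

```python
def normalize(url: str) -> str:
    """Lowercase and strip scheme/trailing slash so duplicate entries collapse."""
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url.rstrip("/")

def assemble_index(sources: dict) -> list:
    """Merge source lists, deduplicate, and drop entries failing the criteria."""
    merged = set()
    for name, urls in sources.items():
        merged.update(normalize(u) for u in urls)
    # Illustrative removal criterion: keep only .gov hostnames.
    return sorted(u for u in merged if u.split("/")[0].endswith(".gov"))

sources = {
    "dotgov-registry": ["https://gsa.gov", "http://usa.gov/"],
    "analytics-list": ["GSA.gov", "example.com"],
}
print(assemble_index(sources))  # → ['gsa.gov', 'usa.gov']
```

The real process publishes a snapshot after each assembly step, so each removal stage can be audited; the sketch collapses those stages into one pass for brevity.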
Running the Scans
- Every day, the Federal Website Index is then scanned. This is done by loading each Initial URL in a virtual browser and noting the results. This information is the Site Scanning data.
- Scanning process description, including what criteria are used to create each field of data.
- Direct download of the complete Site Scanning dataset.
- Other download options.
- Archived snapshots.
- API documentation.
- Data dictionary for the Site Scanning data.
- Explanation of possible scan statuses.
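The scan loop described above can be sketched as visiting each Initial URL and recording an outcome per site. The real scans load each URL in a virtual (headless) browser; here a plain fetch function is stubbed in so the flow is self-contained, and the status names are illustrative rather than the program's actual scan statuses.

```python
def scan(urls, fetch):
    """Return a {url: status} mapping by attempting to load each URL."""
    results = {}
    for url in urls:
        try:
            response = fetch(url)
            results[url] = "completed" if response["ok"] else "invalid_response"
        except TimeoutError:
            results[url] = "timeout"
        except Exception:
            results[url] = "unknown_error"
    return results

# Stub standing in for a headless-browser page load (hypothetical behavior).
def fake_fetch(url):
    if url == "https://slow.gov":
        raise TimeoutError
    return {"ok": url.endswith(".gov")}

print(scan(["https://gsa.gov", "https://slow.gov"], fake_fetch))
# → {'https://gsa.gov': 'completed', 'https://slow.gov': 'timeout'}
```

Recording a status even for failed loads matters: the dataset covers every site in the index, not just the sites that responded.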
Analyzing and Snapshotting the Data
- Every day, after the Federal Website Index is assembled and the scans have run, a further set of discrete actions generates analytical reports from the results.
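The analysis step above can be sketched as aggregating the raw per-site scan records into a small summary report. The record fields and metrics here are illustrative assumptions, not the program's actual report contents.

```python
from collections import Counter

def summarize(records):
    """Build a summary report from per-site scan records."""
    statuses = Counter(r["status"] for r in records)
    completed = [r for r in records if r["status"] == "completed"]
    https_count = sum(1 for r in completed if r.get("uses_https"))
    return {
        "total_sites": len(records),
        "status_counts": dict(statuses),
        # Share of successfully scanned sites serving HTTPS (illustrative metric).
        "https_share": round(https_count / len(completed), 2) if completed else None,
    }

records = [
    {"url": "gsa.gov", "status": "completed", "uses_https": True},
    {"url": "usa.gov", "status": "completed", "uses_https": True},
    {"url": "slow.gov", "status": "timeout"},
]
print(summarize(records))
# → {'total_sites': 3, 'status_counts': {'completed': 2, 'timeout': 1}, 'https_share': 1.0}
```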
Other Useful Links
- Technical users may also find the following helpful.
Project Repositories
- Project source code and documentation are available at the following locations.