Digital.gov Guide
Understanding the Site Scanning program

Understand the data
Learn about the various types of data collected from scanned websites.
The Site Scanning engine runs against the full list of federal government websites and analyzes various aspects of them.
The scans operate without authentication over the public internet. Using a headless browser (a browser without a graphical interface), they load each Target URL and inspect what would normally be returned to a user who is visiting that page with a web browser. The results of these inspections form the data that Site Scanning makes available.
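The inspection step can be sketched in miniature. The real scans use a headless browser; here a plain HTTP fetch stands in for it, and a throwaway local server (included so the sketch is self-contained) plays the part of a scanned site. The `scan` function and the demo routes are illustrative assumptions, not the program's actual code.

```python
# Minimal sketch: fetch a Target URL and record the response code and
# whether the request was redirected -- two of the "General" data points.
import http.server
import threading
import urllib.request

class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in for a scanned website: /old redirects to /new."""
    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><head><title>Demo</title></head></html>")

    def log_message(self, *args):
        pass  # silence per-request logging

def scan(url):
    """Fetch a URL, following redirects as a browser would, and record results."""
    with urllib.request.urlopen(url) as resp:
        return {
            "status": resp.status,               # final server response code
            "final_url": resp.geturl(),          # where redirects ended up
            "redirected": resp.geturl() != url,  # did the Target URL redirect?
        }

# Start the stand-in site on an ephemeral local port.
server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

result = scan(f"http://127.0.0.1:{server.server_port}/old")
print(result["status"])      # 200 after the redirect is followed
print(result["redirected"])  # True
server.shutdown()
```

Because `urllib.request` follows redirects automatically, comparing the final URL to the requested one is enough to detect that a redirect occurred; the actual scans record considerably more detail about the redirect chain.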
The scans currently collect the following data about each Target URL. A complete data dictionary with much more detail can be found in the program's documentation hub.
| General | USWDS | DAP | SEO | Third Party Services |
| --- | --- | --- | --- | --- |
| Server Response Code | Presence of USWDS components | Presence of DAP snippet | Meta Description Tags | Presence of Third Party Services |
| Redirects | USWDS Version | Customizations of the Snippet | Presence of Robots.txt | Number of Third Party Services |
| Domain | Degree of Implementation | | Elements of the Robots.txt | |
| Agency | | | Presence of Sitemap.xml | |
| Bureau | | | Elements of Sitemap.xml | |
| 404 Configuration | | | Canonical URL | |
| IPv6 Compliance | | | | |
| Underlying Technology | | | | |
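Several of the checks above amount to looking for known markers in a page's HTML. The sketch below shows that idea for three of them; the marker strings reflect real conventions (USWDS class names start with `usa-`, and the DAP snippet loads a script from `dap.digitalgov.gov`), but the function itself is a hypothetical illustration, not the program's detection logic.

```python
# Illustrative checks against raw page HTML for three table entries:
# presence of USWDS components, the DAP snippet, and a meta description tag.
import re

def analyze_html(html: str) -> dict:
    """Run a few simple marker checks against a fetched page's HTML."""
    return {
        # USWDS components use CSS class names prefixed with "usa-"
        "uswds_detected": bool(re.search(r'class="[^"]*\busa-', html)),
        # The DAP snippet loads its analytics script from dap.digitalgov.gov
        "dap_detected": "dap.digitalgov.gov" in html,
        # SEO: is a <meta name="description"> tag present?
        "meta_description": bool(
            re.search(r'<meta\s+name="description"', html, re.IGNORECASE)
        ),
    }

# Hypothetical page fragment exercising all three checks.
page = """<html><head>
<meta name="description" content="Example agency site">
<script src="https://dap.digitalgov.gov/Universal-Federated-Analytics-Min.js"></script>
</head><body><button class="usa-button">Go</button></body></html>"""

print(analyze_html(page))
# {'uswds_detected': True, 'dap_detected': True, 'meta_description': True}
```

String and regex matching like this is only a first approximation; the production scans inspect the rendered page in a headless browser, so they can also catch markup injected by JavaScript after load.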
Have ideas for what else we should be scanning for? Please file an issue or add your idea to the list of proposed future scans!