Digital.gov Guide

Understanding the Site Scanning program

This program automatically generates data about the health and best practices of federal websites.

Understand the data

Learn about the various types of data collected from scanned websites.

The Site Scanning engine runs against the full list of federal government websites and analyzes various aspects of them.

The scans operate without authentication over the public internet. Using a headless browser (a browser without a graphical interface), they load each Target URL and inspect what would normally be returned to a user who is visiting that page with a web browser. The results of these inspections form the data that Site Scanning makes available.
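A minimal sketch of that kind of unauthenticated, headless-browser check is below. It is an illustration only, not the program's actual engine, and it assumes Python with the Playwright library and a placeholder Target URL.

```python
# Illustrative sketch only: load one Target URL in a headless browser and
# record a few facts a visitor's browser would also see.
# Assumes Python with Playwright installed (pip install playwright,
# then playwright install chromium).
from playwright.sync_api import sync_playwright

TARGET_URL = "https://example.gov"  # placeholder Target URL

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # no graphical interface
    page = browser.new_page()
    response = page.goto(TARGET_URL)   # unauthenticated request over the public internet

    status_code = response.status     # e.g., 200, 301, 404
    final_url = page.url              # differs from TARGET_URL if redirects occurred
    html = page.content()             # rendered HTML available for further checks

    print(status_code, final_url, len(html))
    browser.close()
```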

The scans currently collect the following data about each Target URL. A complete data dictionary with much more detail can be found in the program's documentation hub.

General: Server Response Code, Redirects, Domain, Agency, Bureau, 404 Configuration, IPv6 Compliance, Underlying Technology

USWDS: Presence of USWDS components, USWDS Version, Degree of Implementation

DAP: Presence of DAP snippet, Customizations of the Snippet

SEO: Meta Description Tags, Presence of Robots.txt, Elements of the Robots.txt, Presence of Sitemap.xml, Elements of Sitemap.xml, Canonical URL

Third Party Services: Presence of Third Party Services, Number of Third Party Services
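To make a few of these facts concrete, the sketch below probes a site for robots.txt and sitemap.xml and looks for a reference to the DAP analytics script in the homepage HTML. It is a simplified illustration, not how the Site Scanning engine works; it assumes Python with the requests library and a placeholder domain.

```python
# Illustrative sketch only: rough approximations of a few scanned facts.
# Assumes Python with the requests library; the real scans collect far more
# detail and use a headless browser rather than plain HTTP requests.
import requests

DOMAIN = "example.gov"  # placeholder domain

def url_exists(url: str) -> bool:
    """Return True if the URL answers with a 200 status code."""
    try:
        return requests.get(url, timeout=10).status_code == 200
    except requests.RequestException:
        return False

# Presence of Robots.txt and Sitemap.xml (SEO facts)
has_robots = url_exists(f"https://{DOMAIN}/robots.txt")
has_sitemap = url_exists(f"https://{DOMAIN}/sitemap.xml")

# Rough check for the DAP snippet: look for the Universal Federated Analytics
# script reference in the homepage HTML (a simplification of the real scan).
homepage = requests.get(f"https://{DOMAIN}/", timeout=10)
has_dap = "Universal-Federated-Analytics" in homepage.text

print({"robots.txt": has_robots, "sitemap.xml": has_sitemap, "dap": has_dap})
```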

Have ideas for what else we should be scanning for? Please file an issue or add your idea to the list of proposed future scans!