At the Digital Analytics Program (DAP), some of the most frequently asked questions we get are “how can I get access to the DAP data?” and “what do I do with all this data?” We all know that data is knowledge, and knowledge is power, but once we have access to it and realize that it is, indeed, oceans of data, how do we not “drown” in it, and, perhaps more importantly, how do we make sense of it?
What Questions Are You Trying to Answer?
In research and data analysis, a hypothesis-driven approach is one of the main methods for using data to test and, ultimately, prove (or disprove) assertions. To do that, researchers collect a sufficient amount of data on the subject and then approach it with a specific hypothesis in mind. On the flip side, there’s also exploratory data analysis, where data analysts “dive” into data in search of patterns (or the lack thereof). Exploratory data analysis often creates a foundation for hypotheses, or feeds assertions that then get tested with a hypothesis-driven approach.
So what does this all mean? It means that when you attempt an analysis of multidimensional website traffic data, you, too, should approach it with a goal in mind, or, at minimum, with specific questions you want answered. Hence, when people ask us, “what do I do with all this data?” our response is “what questions are you trying to answer?”
Example: Real-Time Gov-Wide Visitors on #TaxDay
Let’s apply this to a concrete example. On April 14, 2015, in a conversation with my team about gov-wide website traffic on #TaxDay, I casually asserted that the number of real-time visitors on April 15, 2015, might reach 300,000. The assertion was based on the fact that in almost three years of the DAP’s life, we’ve seen several big spikes (up to 220K real-time visitors), all of which happened before IRS.gov became part of the DAP. So, I figured, now that the DAP monitors IRS.gov traffic in addition to other large government websites, perhaps we’d see a big spike in real-time visitor traffic on #TaxDay, because lots of people would be online filing their tax returns and/or extensions on the last day. My hypothesis was not based on any previous exploratory analysis.
To see if we’d reach a new record of real-time traffic in the DAP on #TaxDay, our DigitalGov team live-blogged and monitored the public dashboard hourly throughout the day yesterday, reporting on the real-time users on .gov websites. And… the hypothesis did not hold true (it was rejected by reality). The highest number of real-time visitors gov-wide on #TaxDay 2015 was just shy of 200K.
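For readers who like to see the logic spelled out, the check itself is simple: take the hourly readings and compare the highest one against the hypothesized number. Here is a minimal Python sketch of that comparison. The function name `peak_reaches` and the hourly readings are assumptions made up for illustration; they are not actual DAP figures and not part of any DAP tool. Only the 300,000 threshold comes from the assertion above.

```python
from typing import Sequence


def peak_reaches(readings: Sequence[int], threshold: int) -> bool:
    """Return True if the highest reading meets or exceeds the threshold."""
    if not readings:
        raise ValueError("need at least one reading")
    return max(readings) >= threshold


# Invented hourly readings, for illustration only; NOT actual DAP data.
hourly_readings = [120_000, 150_000, 180_000, 195_000, 170_000]

# The hypothesized peak from the assertion above.
hypothesized_peak = 300_000

# With these made-up readings the peak stays below the threshold,
# so the hypothesis would be rejected.
print(peak_reaches(hourly_readings, hypothesized_peak))
```

The useful part is not the code but the discipline behind it: the threshold was stated before the data came in, so the result cannot be reinterpreted after the fact.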
What Happens After the Hypothesis Is Tested
So now that the results of my hypothesis are known, the big question is “why?”
Well, the data tells us that for the last three months, the “Where’s My Refund?” page has consistently been among the top-performing pages, with the highest number of real-time users and a total of 115M+ pageviews since February 1, 2015. IRS-related pages overall have consistently dominated the top 20 active pages over the last three months, suggesting that people were filing their taxes online throughout that period and then, naturally, spent most of their time wondering when they would get their refund. We did see a spike in downloads of the extension application [PDF] yesterday, which makes sense, but the number of visitors filling out the application was not high enough to bring us anywhere close to 300K real-time visitors.
Interestingly, the uptick in overall traffic to .gov websites on #TaxDay was modest compared to previous events driven by weather, space shuttle launches, and asteroid fly-bys. Hmmm… that may be a good topic for a different blog post.
The DAP yields a lot of data, which may be overwhelming, but it is a goldmine for exploratory and hypothesis-driven analyses. With the right questions in mind, this data can help the government better understand its visitors and what brings them to agency websites, and, ultimately, continuously create better web content and digital services for them.