Content may be king, but with HTTPS migrations, the prevalence of mobile, the increasing importance of site speed, and countless other recent developments, the world of technical SEO has been seeing more than its fair share of shakeups recently. And while marketers are being pressured more than ever to report on ROI and use data to build their case when pitching improvements, their Google Analytics accounts are not being maintained well enough to accomplish this.
In the past few months, my fellow Dragon (read: co-author) Caitlin Boroden and I have been researching the many newly-emerging issues related to maintaining clean Google Analytics reporting. The challenges, which seem to be at an all-time high, are making our ability to provide accurate and easy-to-understand data reporting more difficult than ever. From our vantage, it seems to boil down to four key areas.
HTTPS Migration = Lost Data
It’s not just Google that’s pushing for HTTPS migration; champions of privacy all over the web are calling for the change, and major websites like Bing, Reddit and Netflix are answering the call. While referral traffic from most major websites seems to be working properly, others are wreaking havoc on our data.
Most of us already know that Bing organic search traffic has been misreporting as referral traffic, but as companies continue their mass migration to HTTPS, there are other surprises out there as well. For example, you may not have noticed, but Wikipedia went dark in Google Analytics when it migrated to HTTPS on June 12th:
If you’re seeing sharp drops in referral traffic, it pays to see if a major referral source has recently migrated to HTTPS. In cases where no announcement was made, I like to enter the URL in Google search and look at the backlinks to both the HTTPS and HTTP versions. Check whether the number of references to the HTTP version far outweighs the number to the HTTPS version and, if you can, find out how old the links to the HTTPS version are. In the case of Wikipedia, a Google search shows 81,800 links to the HTTPS version and 599,000 links to the HTTP version.
As for the loss of reporting data, we have three options:
- Convince all website owners that switch from HTTP to HTTPS to implement the meta referrer tag,
- Migrate your own website to HTTPS (data is not lost from one HTTPS site to another), or
- Accept the loss of data.
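If you want to know whether a referring site has implemented the meta referrer tag mentioned in the first option, one rough way is to look for it in the page's HTML. The snippet below is only a sketch using Python's standard library; the sample page and the `origin` value are illustrative assumptions, and a real check would fetch the live page and would also need to consider HTTP headers, which this ignores.

```python
from html.parser import HTMLParser


class MetaReferrerParser(HTMLParser):
    """Collects the content of every <meta name="referrer"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.policies = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "referrer":
            self.policies.append(attr_map.get("content", "").strip().lower())


def referrer_policies(html):
    """Return the list of meta referrer values declared in the HTML string."""
    parser = MetaReferrerParser()
    parser.feed(html)
    return parser.policies


# Hypothetical page source, used only for illustration.
sample_page = (
    '<html><head><meta name="referrer" content="origin"></head>'
    "<body></body></html>"
)
print(referrer_policies(sample_page))
```

An empty list suggests the page declares no meta referrer tag, in which case clicks from that HTTPS page to an HTTP site are likely to arrive without referrer information.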
Migrating to HTTPS has several advantages and may be worth it if your management team:
- Buys into the potential benefits of HTTPS migration, and
- Understands the demands on IT – both in implementation and troubleshooting afterwards, and
- Is prepared to accept the short-term traffic loss that goes hand in hand with the migration.
All said, HTTPS migration is a very hard sell. For the short term, at least, the marketing team will likely have to accept the loss of data.
Data Attribution Assignment is Blurring
This issue came up at DragonSearch a few days ago, when Jason White was researching Google’s new Tweet Box in search. One thing he found was that click traffic from links in tweets that appear in SERPs is credited as Organic Search traffic, rather than as Twitter – Social. (For the full research on this subject, check out his blog post.)
With embedding, the issue can be further complicated. For example, are links from embedded tweets in blog posts social traffic or referral? (Social.) If a user clicks on a link in an embedded YouTube video, does YouTube or the embedding site get the referral? (Plot twist: it’s direct traffic.) There seems to be no hard and fast rule; you’ll have to test as you go.
Additionally, be sure to check your Organic Search traffic sources for unexpected reporting. For example, if you see a spike in traffic on a month when your company is hiring aggressively, you may find that many job search websites are counted as organic search:
In Google Analytics, you’re allowed to add a search engine to your reporting list, but so far as I can tell, there’s no way to remove the ones that are already set.
It’s also worth remembering that the traffic source in cookies is stored for six months. This means that whatever brought your first visit (organic search, direct, referral, social, etc.) will remain the attributed source for future visits for quite some time! (Unless, of course, they found you first on their mobile device, then followed up with a desktop visit…)
Direct Traffic Has Become a Dumping Ground
When Google Analytics is in doubt, it drops the session into the Direct Traffic bucket. This problem is so extensive that it’s made the metric virtually unusable. While some of the data dropped into this source type is related to fixable, technical issues, there’s a whole lot going on that we have little or no control over, and usually no way of fixing in Google Analytics. Common examples of “dirty” Direct Traffic include:
- Organic Search Traffic (An experiment by Groupon showed that as much as 60% of direct traffic is organic search)
- Clicks on links in e-mails where no UTM parameters are defined
- Links from YouTube Videos
- Clicks on links in PDFs
- Links from shortened URLs (depending on the shortener)
- HTTPS – HTTP referral issues
- Traffic related to technical issues (ex: double reporting from virtual pages)
- Google has confirmed that a direct-traffic reporting issue applies to campaign attribution
- Anything else – anything at all – that Google doesn’t understand
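Of the items above, untagged e-mail links are among the easiest to fix yourself. As a rough sketch (not a prescribed method), the helper below appends the standard `utm_source`, `utm_medium` and `utm_campaign` parameters to a link before it goes into an e-mail. The function name, the example URL and the campaign values are all made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_utm(url, source, medium, campaign):
    """Return the URL with UTM campaign parameters appended to its query string."""
    utm = {
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    }
    parts = urlparse(url)
    # Keep existing query parameters, but drop any UTM values we are replacing.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in utm
    ]
    pairs.extend(utm.items())
    return urlunparse(parts._replace(query=urlencode(pairs)))


# Hypothetical newsletter link.
tagged = add_utm(
    "https://www.example.com/offers?id=7", "newsletter", "email", "july_promo"
)
print(tagged)
```

Running every outbound e-mail link through something like this keeps those clicks out of the Direct Traffic bucket, provided the naming is applied consistently across campaigns.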
Until recently, and for months beforehand, Caitlin and I had also seen PDFs appearing in Real-Time reporting as direct traffic when clicked on from organic search results. However, a few weeks ago, they disappeared! We didn’t see a correlating traffic drop, even on sites that have more PDFs than HTML-based content pages. Was the traffic translating to direct traffic in day-to-day reporting beforehand? Who can tell?
Keeping your direct traffic free of issues requires constant analytics monitoring – overlooking a reporting issue for even a week can result in an embarrassingly large traffic spike or drop. (Get those alerts in place!) For example, I’m working with a client right now whose data shows a sudden 153.17% spike in their direct traffic. From what I can tell so far, the GA code has ended up on an internal-only website that employees use for data entry. Face a few problems like this, and building accurate year-over-year reporting becomes a daunting challenge.
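The logic behind such an alert is simple enough to sketch. The example below assumes you have exported session counts for two comparable periods; the numbers and the 50% threshold are hypothetical, and this is not how Google Analytics implements its own alerts.

```python
def percent_change(previous, current):
    """Percentage change from the previous period to the current one."""
    if previous == 0:
        raise ValueError("previous period has no sessions; change is undefined")
    return (current - previous) / previous * 100.0


def needs_alert(previous, current, threshold_pct=50.0):
    """True when traffic moved up or down by at least threshold_pct percent."""
    return abs(percent_change(previous, current)) >= threshold_pct


# Hypothetical weekly direct-traffic sessions: last week vs. this week.
last_week, this_week = 1000, 2500
print(percent_change(last_week, this_week))
print(needs_alert(last_week, this_week))
```

Checking the absolute change means the same rule catches both spikes (such as tracking code landing on an internal site) and drops (such as a referrer going dark).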
Barring technical disasters, how much of “Direct Traffic” is actually direct? To put it in perspective, take a look at the example below. I’ve compared all traffic to direct traffic, for new vs. returning visitors, for one website, over an eight-month period:
If Direct Traffic is coming primarily from users who type the address in the URL bar or click through a bookmark, shouldn’t the percentage of returning visitors be MUCH higher than the average percentage for the overall website? Instead, we see a difference of 0.51%. (Also, check out the numbers under “New Sessions” – 100.10% and 100.11%? Are you kidding me??!?)
To put the nail in the coffin for Direct Traffic, take a look at the landing page visits. In the case above, Google is reporting 3,694 different landing pages that include all sorts of crazy, session-specific, and long URLs that aren’t likely at all to be bookmarked or typed in by hand. On top of that, the home page is the #3 result by traffic volume, accounting for only 8.69% of all direct traffic. Seems awfully dubious, if you ask me.
Again, of course, this goes back to the way Google builds cookies – any search that begins with direct traffic can stay reported as direct for up to six months. But if that’s the case, it’s very important to take that into consideration, and to pass that knowledge along to management and our clients.
Google Analytics Data is Being Polluted
If you haven’t already turned on filters for your bot traffic, do it now; the amount of noise it creates can be astounding. The example below (from an account I worked on early this year) is fairly extreme, but I’ve seen several cases where a high-traffic website saw a 50% drop overall as bot traffic was purged from data collection.
Filtering this traffic is extremely easy. From the GA view of the site, go to Admin –> View –> View Settings, and check the box pictured below. Save it, and your new traffic will be protected from corruption.
Finally, I’m sure that most of us have fallen victim to Referral Spam at least a few times in our recent analytical adventures; Adam Singer of the Google Analytics team has been saying for months now that they’re working on a solution for this issue. In the meantime, the best, simplest, clearest article I’ve read on filtering out this data is by Carlos Escalera. I recommend checking it out and giving his solution a shot. Even with a fairly simple, in-house designed advanced filter, some accounts can see an incredible cleansing of their data:
Organic Search Spam
While most marketers have seen and heard about their share of referral spam, what you may not have noticed yet is that, over the past few months, organic search spam has also become an increasingly large issue. Caitlin and I have seen the anomalies across a number of different clients, and it seems that no Google Analytics account is safe. The spam becomes apparent when you take a look at the keyword data within the Organic Channel of your analytics account, with keywords ranging from risqué to bizarre:
I’d like to point out that the metrics you would expect to see with spam traffic (huge bounce rate, brief time on site) don’t seem to be present here – and the hostnames all check out as legitimate as well (which means Carlos Escalera’s nifty tricks won’t work). The bounce rates are decent, the page views are, on average, more than one, and the average session duration is 2+ minutes.
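For context, a hostname check of the kind referred to above boils down to keeping only hits whose hostname belongs to your own site. The sketch below shows the general idea only; it is not Carlos Escalera’s actual filter, and the `example.com` pattern and sample hostnames are hypothetical.

```python
import re

# Hypothetical pattern: accept only the site's own hostname, with or without "www.".
VALID_HOSTNAME = re.compile(r"^(www\.)?example\.com$", re.IGNORECASE)


def is_valid_hit(hostname):
    """True when the hit's hostname matches the site's own domain."""
    return bool(VALID_HOSTNAME.match(hostname or ""))


# Made-up hostnames as they might appear in a hostname report.
hostnames = ["www.example.com", "example.com", "spam-site.xyz", "(not set)"]
kept = [host for host in hostnames if is_valid_hit(host)]
print(kept)
```

Spam that never touches your site usually reports a missing or foreign hostname and is dropped by a rule like this. The organic search spam described here reports a legitimate hostname, so it passes straight through.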
Here’s how you can take a look at this issue for yourself in three easy steps:
- Navigate to Acquisition > Overview within Google Analytics.
- Select Organic Search.
- Take a look at the keywords driving traffic to your site. You’re bound to find some that are fishy.
The amount of traffic coming from these sources is the hardest thing to track down. As the screenshot above shows, it would appear that only 68 spam organic visits affected the site in the two-month timeframe analyzed (on a site that has thousands of visitors). However, we can’t be sure there aren’t even more visits simply being added into the (not provided) bucket, and there’s not much we can do to stop it from running rampant in the future. To date, we know of no solution for this problem.