Referral spam is a technique in which spammers send repeated requests to a website using a fake referrer URL, so that the spammer’s own site shows up in that website’s analytics as a traffic source, advertising it to the site owner.
I’ve noticed over the last few months that the amount of referral spam showing up in Google Analytics has increased to the point where it really skews the data. There has recently been some buzz about this issue around the web, and more and more posts have been popping up with advice on techniques for filtering out this traffic, or even stopping it from hitting the server in the first place.
This can be very frustrating and waste a lot of time: in some cases it has grown to as much as 5-10% of overall traffic, even on sites with substantial monthly visitor counts. As a result, the data has to be manipulated more to get a good sense of where traffic is actually coming from. It can also be annoying to keep explaining to a client what semalt and 100dollars-seo are, and why they keep showing up in the client’s analytics when they aren’t actually helping.
Since Google hasn’t yet taken any steps to help users deal with this issue, we decided to take matters into our own hands and look for a solution that was easy to implement and solved the problem without creating new difficulties. So off we went, on a little journey…
Failed Solution 1: Updating the .htaccess file
Back in February of this year, the first method we tried was editing the .htaccess file to prevent the traffic from hitting the server in the first place. This took only a few steps for our development team: add all of the offending domains and banish them from showing up forever. Once this was implemented, however, we noticed a huge slowdown on a couple of sites, and the more pages a site had, the worse its speed got. The referrer was being checked multiple times on every request, and that added up quickly. In our case, the average time to first byte for each file rose from about 100-150ms before the change to about 500-600ms after it.
As a result, we reverted the change right away and put up with the skewed data until we could find another workable solution.
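For readers who haven’t written referrer-blocking rules, the check they perform on each request can be modeled in a few lines of Python. This is a rough sketch of the logic only, not Apache syntax and not the rules we actually used: the function names and the blocklist are hypothetical, and it assumes a request is refused whenever the Referer header’s host is a blocked domain or a subdomain of one. Note that the comparison runs once per listed domain on every request, which is consistent with the per-file slowdown we saw.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; in practice this would be the full list of spam domains.
BLOCKED_DOMAINS = ["semalt.com", "100dollars-seo.com", "buttons-for-your-website.com"]


def is_blocked_referrer(referer: str, blocked_domains: list[str]) -> bool:
    """Return True if the Referer URL's host is a blocked domain or a subdomain of one."""
    # .hostname is lowercased by urlsplit, and is None for an empty or relative URL.
    host = urlsplit(referer).hostname or ""
    # One comparison per blocked domain, repeated for every request the server handles
    # (pages, images, scripts and stylesheets alike).
    for domain in blocked_domains:
        if host == domain or host.endswith("." + domain):
            return True
    return False


def handle_request(headers: dict[str, str]) -> int:
    """Return an HTTP status code: 403 for spam referrers, 200 otherwise."""
    if is_blocked_referrer(headers.get("Referer", ""), BLOCKED_DOMAINS):
        return 403
    return 200


print(handle_request({"Referer": "http://semalt.com/page"}))  # 403
print(handle_request({"Referer": "https://www.google.com/"}))  # 200
print(handle_request({}))  # 200
```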
Failed Solution 2: Adding Domains to the Referral Exclusion List
Another option we looked at was adding the unwanted domains to the Referral Exclusion List under Admin –> Tracking Info.
Well, it turns out that this didn’t work either: it doesn’t actually remove the Users and Sessions, but simply moves them into your Direct Traffic stats instead. After running this option for a few more days, I came across an article by Black Belt Robots that confirmed this was indeed the case, and that pointed me towards my current solution.
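The difference between excluding a referral and filtering it out can be illustrated with a simplified model of session attribution. This is a hypothetical sketch, not Google’s actual attribution logic: it assumes that a session with no campaign data whose referrer is on the exclusion list gets credited to “(direct)”, which matches the behavior we observed. The point it shows is that the session is relabeled, not dropped, so your session counts stay inflated.

```python
from urllib.parse import urlsplit


def attribute_session(referer: str, exclusion_list: set[str]) -> str:
    """Return the source a session would be credited to (simplified, hypothetical model)."""
    host = urlsplit(referer).hostname or ""
    if not host or host in exclusion_list:
        # Excluded referrer: the session still counts, it is just credited to direct.
        return "(direct)"
    return host


sessions = ["http://semalt.com/", "https://example.org/post", "http://semalt.com/x"]
exclusions = {"semalt.com"}
sources = [attribute_session(r, exclusions) for r in sessions]
print(len(sources), sources)  # 3 ['(direct)', 'example.org', '(direct)']
```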
After these two failed trials and tribulations, we’ve managed to find a solution that isn’t complex and doesn’t take too much time.
Our Steps to Success!
Step 1: Set Up Bot Filtering
One thing you should definitely do as a baseline is go to the Admin tab of your Analytics account and, under the specific View, open View Settings and check Bot Filtering to exclude all hits from known bots and spiders (although I’m curious to know how much of an impact this has at the moment).
Do this at a bare minimum, and then continue on to the next steps to remove more of the culprits, which this setting doesn’t seem to catch.
Step 2: Filter by Campaign Source
After not getting the results we wanted from the Referral Exclusion List, we tried setting up a new Filter instead. As far as we can tell, this actually removed the unwanted traffic completely (again, props to Black Belt Robots).
To set up this filter, start by going to your Google Analytics View, and create a new Custom Filter that excludes by Campaign Source (we named it Exclude Spam Referrals).
Fill out the Filter Pattern field with the pesky domains that you need to exclude, and then feel free to click Verify this Filter to ensure that it will indeed remove that traffic.
Some of the most offensive domains we have found are listed below, already formatted as Filter Patterns (regular expressions):
event-tracking\.com|best-seo-offer\.com|free-share-buttons\.com|buy-cheap-online\.info|get-free-traffic-now\.com|free-social-buttons\.com|buttons-for-your-website\.com|weburlopener\.com|webmaster-traffic\.com|100dollars-seo\.com
it-max\.com\.ua|billiard-classic\.com\.ua|ci\.ua|mirobuvi\.com\.ua|simple-share-buttons\.com|social-buttons\.com
Because the Filter Pattern field is limited to 255 characters, we had to set up two separate filters to cover all of the offending domains.
To build a complete list of the domains that may be causing you issues, go to your Referrals report (under Acquisition) and look for strange-sounding domains; they’re pretty easy to spot.
We now check about once a month for any new spammy domains that need to be added to the existing filters, and recommend you do the same.
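Rather than splitting the list by hand, a short script can escape the dots and pack the domains into as few patterns as fit under the 255-character limit. This is a minimal sketch; the function name is hypothetical, and it assumes every entry is a plain domain made of letters, digits, hyphens and dots, so only the dots need escaping. It also skips blank entries, because an empty alternative (as in `a|` or `||`) matches every campaign source and would exclude all of your traffic.

```python
import re

MAX_PATTERN_LENGTH = 255  # character limit of the Filter Pattern field


def build_filter_patterns(domains: list[str], max_len: int = MAX_PATTERN_LENGTH) -> list[str]:
    """Greedily pack domains into 'a\\.com|b\\.com' patterns no longer than max_len."""
    patterns: list[str] = []
    current = ""
    for domain in domains:
        # Escape dots so "." matches only a literal dot; hyphens need no escaping here.
        escaped = domain.strip().lower().replace(".", r"\.")
        if not escaped:
            continue  # a blank entry would become an empty alternative that matches everything
        if len(escaped) > max_len:
            raise ValueError(f"domain too long for one pattern: {domain}")
        candidate = f"{current}|{escaped}" if current else escaped
        if len(candidate) <= max_len:
            current = candidate
        else:
            patterns.append(current)  # this pattern is full; start a new one
            current = escaped
    if current:
        patterns.append(current)
    return patterns


SPAM_DOMAINS = [
    "event-tracking.com", "best-seo-offer.com", "free-share-buttons.com",
    "buy-cheap-online.info", "get-free-traffic-now.com", "free-social-buttons.com",
    "buttons-for-your-website.com", "weburlopener.com", "webmaster-traffic.com",
    "100dollars-seo.com", "it-max.com.ua", "billiard-classic.com.ua", "ci.ua",
    "mirobuvi.com.ua", "simple-share-buttons.com", "social-buttons.com",
]

patterns = build_filter_patterns(SPAM_DOMAINS)
for number, pattern in enumerate(patterns, start=1):
    print(f"Filter {number} ({len(pattern)} chars): {pattern}")

# Sanity check: every domain should be caught by at least one pattern.
missed = [d for d in SPAM_DOMAINS if not any(re.search(p, d) for p in patterns)]
print("Domains not covered:", missed)
```

Each printed pattern can be pasted into its own Exclude filter; the greedy packing may split the list at a different point than our hand-made split, which doesn’t matter as long as every domain is covered.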
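Part of this monthly check can be scripted. A minimal sketch, assuming you have copied the source column of the Referrals report into a list: the hypothetical helper below reports which sources none of your current filter patterns would catch, so you only have to eyeball the leftovers. It uses Python’s `re.search`, which matches anywhere in the string; Google Analytics’ regex dialect is not identical to Python’s, so treat the result as a guide rather than a guarantee.

```python
import re


def find_unfiltered_sources(sources: list[str], patterns: list[str]) -> list[str]:
    """Return the referral sources that none of the existing filter patterns match."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    # A pattern counts as a match if it is found anywhere in the source (partial match).
    return [s for s in sources if not any(rx.search(s) for rx in compiled)]


# Hypothetical example data: existing filter patterns and a month of referral sources.
existing_patterns = [r"semalt\.com|100dollars-seo\.com", r"social-buttons\.com"]
referral_sources = ["semalt.com", "free-social-buttons.com", "example.org", "spammy-new-site.example"]

print(find_unfiltered_sources(referral_sources, existing_patterns))
# ['example.org', 'spammy-new-site.example']
```

Legitimate referrers such as `example.org` will also show up in the output, so the final call on what counts as spam is still a manual one.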
Step 3: Add an Annotation
Keep in mind that filters don’t change your historical data; they only apply to data collected after the filter is set up. As a result, it’s a good idea to add a new Annotation every time you do something significant that may affect your analytics.
All we did here was set up a new annotation for the date we created these filters (we had also done this for each of our earlier failed attempts, which helped us track what might have been causing issues and what was actually leading towards progress).
Alternate Solution:
Although we haven’t tried this one ourselves yet, I’ve come across another solution, from Distilled, that appears to be even simpler to set up. It was published just a couple of days ago.
It’s based on the premise that most spam sessions fall into one or both of the following categories:
- Invalid hostname (i.e. not your site)
- Screen resolution = “(not set)”
and thus, two filters, one for each category, can be set up to ensure that this traffic is excluded.
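Hostname Inclusion: a Custom Filter that includes only traffic whose Hostname matches your own site’s domain(s).
Screen Resolution Exclusion: a Custom Filter that excludes traffic whose Screen Resolution is “(not set)”.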
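The combined effect of these two filters can be modeled as below. This is a hypothetical sketch rather than Distilled’s exact setup: `example.com` stands in for your own domain, the `Session` type is made up for illustration, the anchored hostname pattern is our own choice, and real Google Analytics filters are configured in the Admin interface rather than in code. A session survives only if its hostname is your site and its screen resolution is set.

```python
import re
from dataclasses import dataclass


@dataclass
class Session:
    hostname: str
    screen_resolution: str


# Hypothetical include pattern for your own site, anchored so look-alike hosts don't pass.
VALID_HOSTNAME = re.compile(r"^(www\.)?example\.com$", re.IGNORECASE)


def keep_session(s: Session) -> bool:
    """Apply the hostname include filter, then the screen resolution exclude filter."""
    if not VALID_HOSTNAME.search(s.hostname):
        return False  # invalid hostname: not your site
    if s.screen_resolution == "(not set)":
        return False  # spam sessions often report no screen resolution
    return True


sessions = [
    Session("www.example.com", "1920x1080"),
    Session("example.com", "(not set)"),
    Session("spam-site.example", "1366x768"),
    Session("(not set)", "(not set)"),
]
kept = [s for s in sessions if keep_session(s)]
print(len(kept))  # 1
```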
Since we’ve already created our other filters, we won’t be fiddling with this for now, but we’ll definitely keep it in mind for newer Analytics accounts, should the solution prove to have staying power.
Conclusion:
The good news is that, even though this issue can be a huge pain, you do have options. I recommend giving these solutions a try (and avoiding the mistakes I made).
Please let me know about any other issues or solutions you’ve found, so that others don’t have to go down the same broken path and can enjoy referral spam-free Google Analytics data!