How to get better Google Analytics Data in 2014

December 30, 2013

Avoid sampling and get more accurate data from Google Analytics

With the new year and annual reports approaching, we chose to focus on an issue that impedes data accuracy for large websites that use Google Analytics (GA). In a nutshell: in its free version, GA limits the amount of processing it performs on the raw data it collects, in order to save bandwidth and processing power.

What is sampling?

Sampling, put simply, is a method of measuring only part of the data and using it to infer a measurement over the entire data set, which is more efficient than processing everything. This method is quite reliable as long as the sampled set is representative of the entire population. Google Analytics uses sampling in some cases when presenting reports.
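To make the idea concrete, here is a minimal sketch of estimating a total from a random sample. The data is invented for illustration (one million visits, 3% of which converted), and `estimate_total` is a hypothetical helper, not anything GA exposes.

```python
import random

def estimate_total(population, sample_size, rng):
    """Estimate how many items in `population` are True from a random sample."""
    sample = rng.sample(population, sample_size)
    hits = sum(sample)
    # Scale the sample count up by the inverse of the sampling fraction.
    return hits * len(population) / sample_size

rng = random.Random(2014)  # fixed seed so the run is repeatable
# Hypothetical data: 1,000,000 visits, exactly 30,000 of which converted.
visits = [i < 30_000 for i in range(1_000_000)]
estimate = estimate_total(visits, 250_000, rng)
print(f"True conversions: 30000, estimated from sample: {estimate:.0f}")
```

With a representative sample of this size, the estimate should land close to the true figure without every visit being processed.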

When does sampling occur in Google Analytics?

GA pre-aggregates the data required for all the standard reports that are accessible through the left-hand-side menu, so they can be generated quickly without compromising accuracy. However, in many cases you’ll want to drill down beyond the standard reports by adding secondary dimensions, applying a segment, or creating customized reports. In these cases, Google goes to the raw session data, but limits the amount of data selected to 250K pre-filtered visits in order to speed up performance. When looking at large datasets, you might see a yellow notification in the top right-hand corner saying that the report is based on a certain percentage of the data.
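As a rough illustration of the arithmetic, the percentage shown in that notification can be approximated by dividing the sample base by the number of visits in the date range. This is a sketch that assumes GA simply caps the report at the sample base. `estimated_sample_rate` is a hypothetical helper, and GA's actual selection logic may differ.

```python
def estimated_sample_rate(total_visits, sample_base=250_000):
    """Approximate fraction of visits that a sampled report is based on.

    Assumption: the report is capped at `sample_base` visits.
    """
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return min(1.0, sample_base / total_visits)

# Hypothetical property with 5 million visits in the selected date range.
rate = estimated_sample_rate(5_000_000)
print(f"Report based on about {rate:.1%} of visits")
```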

 


When should you be concerned?

In most cases, as stated above, sampling provides a reliable proxy of the full data. However, you should start being cautious with the numbers you see in the following cases:

  • When precision is key. Mostly, we use stats to get insights on user behavior and make smarter design and marketing decisions based on trends and comparisons. However, sometimes you need exact figures, for example when paying royalties based on content consumption. In these cases, “close enough” might not be good enough.
  • When looking at the long tail. Extrapolations tend to even out with large numbers. However, the more you drill down (for instance, looking at how many unique visitors from a certain campaign visited the FAQ page before buying a product), the less you can rely on the numbers you see.
  • When the sample rate is low. The lower the percentage of actual visits that the report is based on, the less confidence you can have in the data. Once the sample rate drops below 10%, and certainly below 1%, you should question the numbers.
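The long-tail point can be quantified with a back-of-the-envelope calculation. The sketch below assumes simple random sampling of visits and uses the standard normal approximation for a proportion. GA's real sampling scheme may behave differently, and `relative_margin` is a hypothetical helper.

```python
import math

def relative_margin(sample_size, share, z=1.96):
    """Approximate 95% relative margin of error for an extrapolated count.

    sample_size: number of visits in the sample (e.g. 250_000)
    share: fraction of visits that match the drill-down
    Assumes simple random sampling; ignores the finite-population correction.
    """
    expected_hits = sample_size * share
    return z * math.sqrt((1 - share) / expected_hits)

# Hypothetical shares: a broad segment versus a long-tail drill-down.
for share in (0.03, 0.0001):
    margin = relative_margin(250_000, share)
    print(f"share {share:.2%}: about +/-{margin:.0%}")
```

Under these assumptions, a segment covering 3% of visits is extrapolated to within roughly 2%, while a drill-down matching only 0.01% of visits can be off by nearly 40%.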

What can be done?

Go for higher precision.  The first thing that you can always do is double the sample base (from 250K to 500K visits) with the control icon displayed above the yellow notification. This makes the report a bit slower, but the delay is tolerable.


 

Measure shorter date ranges.  When reporting over a long date range (some of you might be running your annual reports soon…), you can reduce sampling considerably by running your reports over shorter periods (e.g. one per month) and then adding up the results in Excel or another tool.
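If you script your reporting, the splitting and adding up can be sketched as follows. The function names are hypothetical, and `fetch_metric` stands in for whatever you use to retrieve a figure for a date range (an export, an API call, a spreadsheet lookup); it is not a GA function.

```python
import calendar
from datetime import date, timedelta

def monthly_ranges(start, end):
    """Yield (first_day, last_day) pairs covering start..end, one per calendar month."""
    current = start
    while current <= end:
        last_dom = calendar.monthrange(current.year, current.month)[1]
        chunk_end = min(date(current.year, current.month, last_dom), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)

def sum_over_ranges(ranges, fetch_metric):
    """Add up an additive metric fetched separately for each date range."""
    return sum(fetch_metric(first, last) for first, last in ranges)

ranges = list(monthly_ranges(date(2013, 1, 1), date(2013, 12, 31)))
for first, last in ranges[:2]:
    print(first, "to", last)
```

Note that this only works for metrics that add up, such as visits or pageviews. Unique visitors cannot be summed across months, because the same person may be counted in more than one month.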

Structure your GA account.  Standard reports are unsampled for any profile. Therefore, if there is a certain segment of data that you look at a lot (e.g. new visitors, organic traffic), create a specific view (profile) for it. However, this will not help you with non-standard reports, which are based on raw data from the entire property (everything that’s tracked with the same UA-xxxxx-yy code). Therefore, if you have large numbers of visitors, use separate properties for your website, apps, blog, etc.

Consider using multiple trackers.  In some cases you’ll want high accuracy for a subset of your data (e.g. registered users’ activity) while still being able to see it within a consolidated view of all visitor activity. In cases like this, you can use an additional tracker just for those subsets. Another use case for this method: if you’re tracking your website and mobile apps under separate properties, you can use an additional tracker to send all of your data to a Universal web property for measuring cross-platform user behavior.