Sampling

What is sampling?

Sampling is the practice of analyzing a part (a sample) of a larger data set and using the results to characterize the whole. A sample can't give 100% accurate information about the data set, but it is usually accurate enough to reveal trends in the data, such as spikes and drops. Because a sample is much smaller than the data set it represents, sampling also reduces the load on the systems doing the processing.
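To make the idea concrete, here is a minimal Python sketch that estimates a total from a random sample. It assumes simple random sampling, which is not necessarily how any particular analytics tool draws its sample, and the function name `estimate_total` is made up for this illustration.

```python
import random


def estimate_total(values, sample_fraction, seed=0):
    """Estimate sum(values) from a simple random sample of the values.

    Hypothetical helper for illustration; a fixed seed keeps runs repeatable.
    """
    rng = random.Random(seed)
    sample_size = max(1, int(len(values) * sample_fraction))
    sample = rng.sample(values, sample_size)
    # Scale the sample total up by the inverse of the sampling rate.
    return sum(sample) * len(values) / sample_size


# Made-up data: pageviews per session for 10,000 sessions.
pageviews = [i % 10 for i in range(10_000)]

print("true total:     ", sum(pageviews))
print("10% sample est.:", estimate_total(pageviews, 0.10))
```

The estimate from the 10% sample will usually land close to the true total without matching it exactly, which is the tradeoff described above: less data to process, at the cost of some accuracy.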

Sampling is also used in statistics when it’s impossible or impractical to analyze all the available data. Instead, a small, randomly selected subset is analyzed to keep the work manageable. Sampling isn’t something to fear, but in Google Analytics in particular, sampled data is not always trustworthy. For this reason, it’s worth your time to understand when sampling occurs, how it affects your work, and how it can be avoided.

How does sampling work?

Sampling occurs when you filter or manipulate a particularly large amount of data covering a huge number of visits, or when you apply complex or multiple filters in custom reports or advanced segments to a large number of visits.

What are tools and methods to implement sampling?

You can always tell when sampling is being used because of the yellow line at the top of the report. If the percentage shown is below 100%, assume the report is sampled.

The level of sampling in your data can be adjusted, but sampling cannot be turned off. The adjustment is a tradeoff between higher precision and faster processing. If you use a larger sample to increase the accuracy of your data, you'll have to wait longer for your report to load. If you use a smaller sample because you are short of time, your data might not be entirely accurate, just accurate enough to help you spot trends.
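The tradeoff between sample size and accuracy can be put in rough numbers. The sketch below uses the textbook standard error of a proportion under simple random sampling. That is an assumption made for illustration, not a description of how Google Analytics computes its figures, and the 2% conversion rate is invented.

```python
import math


def standard_error(p, n):
    """Standard error of a proportion p estimated from a simple random sample of n sessions."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical 2% conversion rate, estimated from samples of different sizes.
for n in (1_000, 10_000, 100_000):
    print(f"sample of {n:>7} sessions: standard error {standard_error(0.02, n):.5f}")
```

Under this assumption, quadrupling the sample only halves the error, so a much larger sample buys a modest gain in precision at a real cost in processing time.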

Follow the steps below to increase the accuracy of your data:

Step 1: Select the dotted square icon, just below the date range at the top right.

Alternatively, to avoid sampling altogether, select a date range that includes fewer than 500,000 sessions. If you want to analyze a wider date range, export the data for two or more shorter date ranges to Excel or CSV and combine them there.
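The export-and-combine approach can be sketched in Python. The helper names `split_date_range` and `combine_exports` are hypothetical, and the sketch assumes each export has already been loaded as a mapping from page path to session count. Choosing chunks short enough to stay under the session limit is left to you.

```python
from collections import Counter
from datetime import date, timedelta


def split_date_range(start, end, max_days):
    """Split the inclusive range [start, end] into consecutive chunks of at most max_days days."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


def combine_exports(exports):
    """Add up per-page session counts from several exports (one dict per date range)."""
    total = Counter()
    for export in exports:
        total.update(export)  # Counter.update adds counts rather than replacing them
    return dict(total)


# Example: a quarter split into chunks of at most 31 days.
for chunk_start, chunk_end in split_date_range(date(2024, 1, 1), date(2024, 3, 31), 31):
    print(chunk_start, "to", chunk_end)

# Example with made-up session counts from two exports.
print(combine_exports([{"/home": 10, "/blog": 5}, {"/home": 7}]))
```

Adding exports together like this only works for metrics that are additive across date ranges, such as sessions or pageviews. Unique-user counts cannot simply be summed, because the same visitor may appear in more than one range.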

Another solution is to create several profiles or views that each track a smaller part of your site data. Within those profiles or views, you won't hit the 500,000-session limit as quickly as in the main profile.

Note: Flow Visualization reports are sampled after 100,000 sessions, and Multi-Channel Funnel reports are sampled after 1 million conversions.
