March 10, 2017

To understand users, we need to know how they consume content. The key metric in understanding this consumption has been the “pageview,” traditionally an event signaled when a user navigates to a new page. 

With the rise of single-page applications and of related content served via infinite scroll rather than link navigation, this traditional “pageview” – the heart of most Google Analytics dashboards – often doesn’t tell us enough about content consumption.

Most of the articles and Stack Overflow answers I’ve come across on tracking pageviews with infinite scroll in Google Analytics simply suggest firing a traditional Google Analytics pageview when a new piece of content is scrolled to:

ga("set", "page", uri); // set the Google Analytics “page” to our scrolled-to content’s uri
ga("send", "pageview");

This approach is problematic for a couple of reasons:

1) We have no way to differentiate a traditional pageview from a scroll pageview to understand how our users are consuming content.

2) The piece of content we’re telling GA about might have hit-level custom dimensions or metrics that differ from those of the current page.

 

Issue #1: We need to differentiate traditional pageviews from scroll pageviews

The crux of the problem is this: we want one metric to reflect the total consumption of site content, yet we want to be able to accurately dissect behavior. Additionally, we want to show investors, advertisers, and other stakeholders the highest possible pageview count. We want to do so not merely because it’s the most impressive number, but because it’s a representation of the total consumption of content, regardless of how it is served.

This is why I recommend that every time we send a GA “pageview” hit for scrolled-to content, we also send a “Scroll Pageview” event along with the content’s uri. The snippet now simply becomes:

ga("set", "page", uri);
ga("send", "pageview");
ga("send", "event", "Scroll Pageview", uri);

If your content doesn’t have a uri, you can send a page name instead, but it will be more difficult to make calculations against your Google Analytics pageviews (tied to a path). It’s good practice to give each piece of content a canonical uri that never changes, even if that uri is fictional.

With this approach, all of our “pageview” numbers in Google Analytics will now by default be the greatest value (i.e., total content consumed). When we want to compare traditional and scroll pageviews, we simply run two queries and take the difference. In pseudo-code, this looks like:

Query #1 – get all pageviews by content uri

metrics: ga:pageviews
dimensions: ga:pagePath

Query #2 – get scroll pageviews by content uri

metrics: ga:totalEvents
dimensions: ga:pagePath
filters: ga:eventCategory==Scroll Pageview

You can easily calculate the traditional pageviews by taking the difference between these queries in your backend or on a spreadsheet (if you’re not already using the Google Analytics Spreadsheet Add-On tool for Google Sheets, I’d highly recommend it).
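If you do the subtraction in code rather than a spreadsheet, a minimal sketch might look like the following. The function name traditionalPageviews is hypothetical, the sample numbers are made up, and the rows are assumed to arrive as [pagePath, count] pairs in the same order as the dimensions and metrics in the two queries above.

```javascript
// Hypothetical helper (not part of any GA library).
// Assumes each row is [pagePath, count], with count as a string or number.
function traditionalPageviews(pageviewRows, scrollEventRows) {
  // Index the scroll pageview events by content uri for quick lookup.
  const scrollCounts = new Map(
    scrollEventRows.map(([path, count]) => [path, Number(count)])
  );
  const result = {};
  for (const [path, total] of pageviewRows) {
    // Content with no scroll events was only ever viewed traditionally.
    result[path] = Number(total) - (scrollCounts.get(path) || 0);
  }
  return result;
}

// Made-up sample data for illustration only.
const allPageviews = [["/article-a", "120"], ["/article-b", "80"]];
const scrollEvents = [["/article-b", "30"]];
console.log(traditionalPageviews(allPageviews, scrollEvents));
// /article-a: 120 traditional pageviews, /article-b: 50
```

Content that appears only in Query #1 keeps its full count, since it was never reached by scrolling.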

For some cases, this is as far as you need to go, but for many, we still need to deal with a second issue.

Issue #2: We need to tell GA about hit-level custom dimensions or metrics

For many content-driven sites, we provide GA with some extra hit-level data so that we can understand how particular types of content are performing. Some examples I’ve worked with are tags, date published, cost to produce, and info on the writer. For a traditional pageview, you set these just before you send your pageview:

// On page load
ga("set", "dimension1", dimension1_value);
ga("set", "dimension2", dimension2_value);
ga("send", "pageview");

For infinite scroll pageviews, make sure to set the content-specific, hit-level dimensions and metrics before each hit you send to GA for a piece of content. For example, you wouldn’t want to send a pageview for a scrolled-to piece of content with the tags or published date of the page the user originally loaded.

Taking this into account, the full analytics code should look something like:

// On page load
ga("set", "dimension1", dimension1_value); // Session-level
ga("set", "dimension2", dimension2_value); // Hit-level
ga("send", "pageview");

// Whatever other code is executed on your page

// On scroll to a piece of content
ga("set", "page", uri);
ga("set", "dimension2", new_dimension2_value);
ga("send", "pageview");
ga("send", "event", "Scroll Pageview", uri);

Note: Remember to only fire pageviews the first time a user scrolls to the new content! You can do this by setting an HTML data attribute or building an array of fired uris.
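As one way to do the “array of fired uris” bookkeeping, here is a minimal sketch that uses a Set instead of an array. The names createScrollTracker and trackOnce are hypothetical, and ga is passed in as an argument (rather than read from the global) only so the function can be tried out with a stub.

```javascript
// Hypothetical factory (not part of analytics.js).
// `alreadyTracked` lets you seed the uri of the page that was loaded,
// so scrolling back to it doesn't fire a second pageview.
function createScrollTracker(ga, alreadyTracked = []) {
  const firedUris = new Set(alreadyTracked);

  return function trackOnce(uri) {
    if (firedUris.has(uri)) {
      return false; // already reported this content, do nothing
    }
    firedUris.add(uri);
    ga("set", "page", uri);
    ga("send", "pageview");
    ga("send", "event", "Scroll Pageview", uri);
    return true; // hits were sent
  };
}

// In the browser you might wire it up like this (assumed usage):
// const trackOnce = createScrollTracker(window.ga, [location.pathname]);
// trackOnce("/next-article");
```

Any hit-level dimensions for the scrolled-to content would still need to be set before the pageview is sent.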

Finally, be careful if not all of your content has a particular custom dimension set. Values applied with ga("set", ...) stay on the tracker for subsequent hits, so a dimension you don’t overwrite will be falsely attributed to the next piece of content. If your first piece of content has dimensions 2 and 3, but your second only has 2, you must set dimension3 to an empty string or null:

// On page load
ga("set", "dimension1", dimension1_value); // Session-level
ga("set", "dimension2", dimension2_value); // Hit-level
ga("set", "dimension3", dimension3_value); // Hit-level
ga("send", "pageview");

// Whatever other code is executed on your page

// On scroll to a piece of content
ga("set", "page", uri);
ga("set", "dimension2", new_dimension2_value);
ga("set", "dimension3", null); // GA will then ignore this dimension
ga("send", "pageview");
ga("send", "event", "Scroll Pageview", uri);
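To avoid forgetting a dimension, one option is to loop over every hit-level dimension your site uses and reset the ones the new content lacks. This is only a sketch: setHitDimensions is a hypothetical helper, the list of dimension names is an assumption about your own setup, and ga is passed in as an argument so the function can be run against a stub.

```javascript
// Hypothetical helper (not part of analytics.js).
// `dimensionNames` lists every hit-level dimension the site uses;
// `values` holds only the ones this piece of content actually has.
function setHitDimensions(ga, dimensionNames, values) {
  for (const name of dimensionNames) {
    // Missing dimensions are set to null so the previous content's
    // value is not carried over to this hit.
    const value = values[name] === undefined ? null : values[name];
    ga("set", name, value);
  }
}

// Assumed usage before sending the scroll pageview:
// setHitDimensions(window.ga, ["dimension2", "dimension3"], { dimension2: "news" });
```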

There are several related tasks that are beyond the scope of this blog post because they have their own issues and framework dependencies, e.g., how to spy on content in scroll events, changing the url using history.pushState(), getting info on the scrolled content from the backend with AJAX, etc.

Hopefully this is helpful for those serving content dynamically who want to both show total content consumed and understand consumption simultaneously.