Correctly Calculating GA4 Sessions in BigQuery

Recreating GA4 Basic Metrics in BigQuery series

In this post, I will share how you can recreate the “Session” metrics in BigQuery.

There are three ways to get the “Session” metrics in GA4 and each one tells a different story.

In addition to the “Session” metrics there are also “Engaged Session” metrics, that I will cover in a future post.

For now, let us focus on “Session” metrics. The three ways of calculating are:

  1. Using session_start event

  2. Using ga_session_id event parameter

  3. Using ga_session_id event parameter and user_pseudo_id column

Using session_start event

Every time a new session starts, GA4 fires a session start event. You can count the number of session start events for the time frame you need to calculate by using the COUNT function. Here is the sample SQL

SELECT COUNT(*)
FROM your_project_id.your_dataset_id.your_table_id 
WHERE event_name = "session_start" 

So when I ran this SQL on my dataset, I got 896 session_start events for a particular day.

However, GA4 shows 893 sessions on that particular day.

So why is it different? We will cover this later but now let’s take a look at the second method of calculating sessions.

Using ga_session_id event parameter

As you might recall from one of my previous emails, GA4 stores a session_id ,based on timestamp, in a parameter called ga_session_id. To count sessions based on this field simply do a distinct count of the ga_session_id value, which is stored in the int_value column of event parameters. Distinct makes sure that you count one session_id only once.

Before you can use the values stored in event parameters, make sure to unnest the event_params column. Here is the sample SQL:

SELECT COUNT(DISTINCT value.int_value)
FROM your_project_id.your_dataset_id.your_table_id, unnest(event_params)
WHERE key = 'ga_session_id'

Running this SQL on my dataset, I got 882 as the session count.

Note: Even though the name of this field suggests that it is unique but it is not. More than one user can have the same ga_session_id so using this field is wrong to begin with and hence ignore the metrics based on this field. Instead, use the next method that builds on this method.

Using a combination of ga_session_id and user_pseudo_id

Subscribe to keep reading

This content is free, but you must be subscribed to GA4 BigQuery Newsletter For Marketing Analytics to continue reading.

Already a subscriber?Sign In.Not now

Reply

or to participate.