GA4 Cohort Analysis in BigQuery

Recreating GA4 Metrics and Reports in BigQuery

A group of users who share a common characteristic is called a Cohort. For example, you can create a cohort of all the users who were acquired on a particular date or a cohort of all the users who visited your site on a particular date/week/month, etc. In GA4, you can build a cohort exploration to understand the behavior of a cohort.

Here is a cohort report showing you the users who did any activity on a particular date (in rows). In the columns, you shave the retention of those users.

For example, on May 13th (first row of data) there were 629 users who did any activity on the site. I used “any event” as cohort criteria and “any event” as return criteria to see the activity. Of those 629, only 37 came back the following day, and only 17 the following day, and so on.

If you are not familiar with how this exploration works then join my GA4 Newsletter, I will discuss it there. In this post, I will share how you can create similar reports using BigQuery.

In GA4 there are a few things you have to do.

  1. Decide the granularity of the data. In my example, I am using it daily.

  2. Get the list of users who belong to the cohort. In my example, I am looking at users on a particular day, who did any activity on the site.

  3. Use that list to query if the user returned on the following day.

  4. Repeat this for the date range you want to build the report for.

  5. Repeat the above 4 steps for each of the dates in different rows.

Check out the past editions of this newsletter to see how I write the various queries. I am using the concepts that I have covered in the past.

Sample Query

SELECT distinct user_pseudo_id 
FROM `optizent-test-project.analytics_XXXXX.events_20240413` 
where is_active_user = true

The above query gives you a list of active users on a given date, 13th May in the example above.

select count(distinct user_pseudo_id) 
FROM `optizent-test-project.analytics_289314775.events_202405*` 
where is_active_user = true
and user_pseudo_id in ('the users selected in the first query')
and _TABLE_SUFFIX BETWEEN '14' and '19'

This query gives you the count of users who came on 13and and also came on the 6 days following the main date.

If you want to see the complete sample query, become a premium subscriber of this newsletter. Premium membership gives you full access to all the SQL and advanced concepts.

Use the sample queries to practice how to use GA4 data in BigQuery. Stay tuned for more such queries and use cases

Thank you,
Anil Batra, Optizent.com

Reply

or to participate.