Navigating the Google Analytics 4 API: Quotas, Filters, and Python Integration

The new edition of Google Analytics, with a fundamentally different approach to data structure, was introduced in the summer of 2019 under the name Google Analytics App + Web. In autumn 2020, Google took it out of beta and rebranded it as Google Analytics 4. While it is still not advisable to move your entire measurement setup to the new edition, as it is still under active development, it definitely merits attention as Google keeps introducing new features.

GA4; Source: Cross Masters

The new Google Analytics 4 is built around events and their parameters. As the name suggests, it combines data from apps and web analytics into a single database. Moreover, it serves as a flexible tool for cross-platform analysis. To learn more about Google Analytics 4, check out our previous blog on GA4 APIs: Getting Started. In this article, we will dig deeper into several GA4 API topics: the GA4 API quota limits, GA4 API filters, GA4 API metrics and dimensions, and using the GA4 API with Python.

What are GA4 API quotas?

GA4 quotas are categorized into three request classes: Core, Realtime, and Funnel. As the documentation puts it, "API requests of Core methods charge Core quotas. API requests of Realtime methods charge Realtime quotas. One request will not charge both Core and Realtime quotas." The quotas exist to ensure that resources are shared fairly among Google users.

The official Google Analytics documentation explains a bit about these quotas. It contains a long list of numbers, most of which will not matter much to Looker Studio users. The two quotas from this lengthy list with the most impact on Looker Studio reports are concurrent requests and hourly tokens.

Concurrent requests are the easier of the two to understand: the more visitors access your reports at the same time, the faster you hit the quota. The second factor is the number of visualizations on your reports, together with the complexity of the data you are consuming. Filters, large amounts of data, and frequent interaction with your report all count against your quotas.

How to deal with the request limits

You can limit user access to reports or reduce the number of charts to stay within the quotas for longer, but you do not want to restrict your company's ability to generate actionable insights.

GA4 Limits; Source: LookerStudio.VIP

Apart from accessing GA4 data directly in Looker Studio, one option is to extract the data from the data source. However, this also comes with its own constraints and may require repeated manual refreshes to keep the data up to date.

As for better solutions, some prefer to use third-party connectors for GA4 and Looker Studio, as they typically come with a simplified interface and do not require writing SQL to get the data.

Another option is to export your data to BigQuery. Which route to take depends on your situation, so let's take a closer look at how BigQuery fits in to help you decide.

BigQuery (BQ) is a Google Cloud Platform service that provides a serverless data warehouse for storing and analyzing data. With it, you can export your data once to your own cloud project and then query it on a pay-per-use basis, thus avoiding the GA4 API limits.

Approach to setting up reports based on BQ GA4 data

Since the data exported to BigQuery is raw and has a distinctive structure with nested fields, you will need some transformation work to get the fields you need and to keep resource consumption efficient.

BigQuery; Source: Dataedo

The common workflow for creating reports using BQ GA4 data is as follows:

  • Define the data you need for your report and write a query (or queries) to create a separate table (or tables), which will then serve as data sources in Looker Studio (see the sketch after this list). Never point dashboards directly at your raw GA4 events export table; you will most likely not use all of the data, and it will increase the charges tremendously.
  • Schedule the query (or queries) so the tables are refreshed automatically at the desired frequency.
  • Create a Looker Studio data source based on the tables. You may also need to create new custom fields manually in the data source, as not everything can be covered in query results. For example, for an in-depth report you may add several fields, some of which could distort the other metrics you want to include.
  • Build the report based on the data source(s) you defined.
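As a rough illustration of the first two steps, the sketch below creates (or replaces) a small aggregated table from the raw events export using the google-cloud-bigquery Python client. All project, dataset, and table names are placeholders, and in practice you would typically register the SQL as a BigQuery scheduled query rather than run it by hand.

from google.cloud import bigquery

# Placeholder project; point this at your own Google Cloud project.
client = bigquery.Client(project="your-project")

# Flatten the raw GA4 export into a small daily table that Looker Studio
# can read cheaply. Dataset and table names are illustrative.
query = """
CREATE OR REPLACE TABLE `your-project.reporting.daily_pageviews` AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS date,
  device.category AS device_category,
  COUNT(*) AS page_views,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `your-project.analytics_123456.events_*`
WHERE event_name = 'page_view'
GROUP BY date, device_category
"""

client.query(query).result()  # waits until the table is (re)created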

Further parameters in a single query

Quotas report

One of the new features provided by the GA4 API is the ability to view the quotas for a specific property. Because requests in the GA4 API cost different numbers of tokens depending on their complexity, this information serves as a warning when you are approaching any of the thresholds, or simply as a report on how many tokens a given request consumed. If you run too many complex requests, you will hit the limit of your available quotas and may have to wait until the next hour or day, depending on the type of quota you have reached.

Quota reports; Source: Supermetrics

The quota information can be obtained by setting the optional parameter returnPropertyQuota to true. As a result, an additional object is added to the results at the same level as rows or headers. The response includes information on five quotas. Among other things, it tells you how many tokens the current query consumed, as well as how many tokens remain per day and per hour.
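As a minimal sketch using the google-analytics-data Python library introduced later in this article (the property ID, token path, and report fields are placeholders), the quota information can be requested and read like this:

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import DateRange, Dimension, Metric, RunReportRequest

# Placeholders: point these at your own service account key and property.
client = BetaAnalyticsDataClient.from_service_account_file("service_account_token.json")
request = RunReportRequest(
    property="properties/<your-property-id>",
    dimensions=[Dimension(name="country")],
    metrics=[Metric(name="activeUsers")],
    date_ranges=[DateRange(start_date="7daysAgo", end_date="yesterday")],
    return_property_quota=True,  # adds a property quota object to the response
)
response = client.run_report(request)

# Each quota is reported as a consumed/remaining pair.
quota = response.property_quota
print("Tokens consumed by this query:", quota.tokens_per_day.consumed)
print("Tokens remaining today:", quota.tokens_per_day.remaining)
print("Tokens remaining this hour:", quota.tokens_per_hour.remaining)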

Notes on filters

Next, a note about the filters in the GA4 API. The metric and dimension filters are separated in the same way as in the original API. There is no filtersExpression parameter that would allow simple filters to be written on a single line.

However, dimension and metric filters are now more flexible when it comes to combining multiple filters. In the original API, you can combine multiple filters on only one level, specifying either that both filters must hold (using the 'and' operator) or that it is enough for only one of the filters to be satisfied (using the 'or' operator).

Filters; Source: VakulskiGroup

In the GA4 API, you can chain several of these operators using a fixed set of parameters: andGroup, orGroup, and notExpression. For instance, if you want to analyze more than one group of users or events where at least one group is specified by more than one condition (e.g., people who visited a certain page on a mobile phone, or people coming from a specific source for whom an error occurred), you can easily do it within the GA4 API, whereas you would need separate requests with the original API.
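As an illustrative sketch with the Python library used later in this article (the dimension names and values are made up for the example), such a nested filter could look like this; the resulting expression is passed as the dimension_filter of a RunReportRequest:

from google.analytics.data_v1beta.types import Filter, FilterExpression, FilterExpressionList

# (visited /pricing on a mobile device) OR (came from the "newsletter" source):
# an or_group whose first member is itself an and_group of two conditions.
dimension_filter = FilterExpression(
    or_group=FilterExpressionList(expressions=[
        FilterExpression(and_group=FilterExpressionList(expressions=[
            FilterExpression(filter=Filter(
                field_name="pagePath",
                string_filter=Filter.StringFilter(value="/pricing"))),
            FilterExpression(filter=Filter(
                field_name="deviceCategory",
                string_filter=Filter.StringFilter(value="mobile"))),
        ])),
        FilterExpression(filter=Filter(
            field_name="sessionSource",
            string_filter=Filter.StringFilter(value="newsletter"))),
    ])
)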

GA4 API using Python

So far, we have discussed the JSON-structured queries used in the request body of cURL, HTTP, or JavaScript requests. However, it is possible to run the requests in Python as well, using the Google Python library google-analytics-data. Running the Python code requires a path to an access token (a key file) for the service account and the ID of a GA4 property to which this service account has access.

GA4 Data using Python; Source: YouTube

Below is a sample script, along with instructions on how to prepare your environment. To run the Python code, first set up a virtual environment, then prepare a file with the path to the service account's access token, after which you can run the Python code with your property_id filled in. For the setup on Windows, run the commands below in the Command Prompt. Replace the placeholder <your-env> with the chosen name of your environment.

pip install virtualenv
virtualenv <your-env>
<your-env>\Scripts\activate
<your-env>\Scripts\pip.exe install google-analytics-data pandas python-dotenv

To prepare access to the data, you need a service account that has access to the GA4 property data. The local path to its key file has to be saved under the parameter SERVICE_TOKEN_PATH in a .env file located in the same folder as the Python code. The file must have the following format:

SERVICE_TOKEN_PATH="C:/Users/YourUser/Documents/service_account_token.json"

Once you have set up the virtual environment, prepared the .env file, and set the GA4 property ID in the property_id variable, you can run the Python code to get the sample results. The code loads and prints data on active users and the number of sessions by date, country, and city, starting from the beginning of 2021.

from dotenv import load_dotenv
import os
import pandas as pd
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import DateRange, Dimension, Metric, RunReportRequest

# Settings
# (.env file is located in the same folder and contains
# SERVICE_TOKEN_PATH=[local-path-to-Google-service-token-with-data-access])
load_dotenv()
SERVICE_TOKEN_PATH = os.getenv('SERVICE_TOKEN_PATH')
property_id = '<set-your-property-ID-here>'


def sample_run_report(property_id):
    """Runs a simple report on a Google Analytics 4 property."""
    client = BetaAnalyticsDataClient.from_service_account_file(SERVICE_TOKEN_PATH)
    request = RunReportRequest(
        property=f"properties/{property_id}",
        dimensions=[Dimension(name='date'), Dimension(name='country'), Dimension(name='city')],
        metrics=[Metric(name='activeUsers'), Metric(name='sessions')],
        date_ranges=[DateRange(start_date='2021-01-01', end_date='yesterday')],
    )
    return client.run_report(request)


def sample_extract_data(response):
    """Extracts data from a GA4 Data API response as a pandas DataFrame."""
    # Each row carries its dimension values first, then its metric values.
    data_dict = {}
    for idx, row in enumerate(response.rows):
        data_dict[idx] = ([dim.value for dim in row.dimension_values]
                          + [met.value for met in row.metric_values])

    # Column names come from the dimension and metric headers, in the same order.
    columns_list = ([dim_header.name for dim_header in response.dimension_headers]
                    + [met_header.name for met_header in response.metric_headers])
    return pd.DataFrame.from_dict(data_dict, orient='index', columns=columns_list)


if __name__ == "__main__":
    query_response = sample_run_report(property_id)
    print(sample_extract_data(query_response))

Conclusion 

The Google Analytics Data API, which was recently released, is designed to query data from Google Analytics 4. In this article, we summarized some of its features, like query limitations, filters, and querying using Python. The basic parameters and the structure of the results are very similar between GA4 and Universal Analytics; however, some new features have been added (like the quotas report), and there are numerous steps towards consistency among dimensions and metrics and between the different types of queries. Thus, the GA Data API seems very easy to transition to, while at the same time offering the new experience of working with the new Google Analytics.