I want to share how I've automated my SEO monitoring using Python and the Google Search Console API. In this guide, I walk you through the code I use to track and analyze keyword performance over time, so you can spot trends, identify strengths and gaps, and find opportunities to boost your website's visibility and drive more traffic.
The Importance of Tracking SEO Performance
When I’m managing multiple keywords and numerous pages, tracking SEO performance becomes essential. It gives me clear insights into which keywords and pages are driving traffic and engagement.
By keeping an eye on performance metrics, I can identify the areas that are doing well and spot content that needs improvement. This helps me make informed decisions about where to focus my optimization efforts.
Having comprehensive oversight allows me to prioritize tasks, allocate resources effectively, and uncover opportunities to boost my website’s visibility.
Ultimately, consistently tracking SEO performance ensures that my strategy stays targeted and effective, maximizing my site’s potential in a competitive landscape.
Here’s how this Python script has personally helped me:
Tracking multiple keywords across various pages used to be such a hassle, but this script completely changed the game for me. It automates everything by pulling data directly from the Google Search Console API, so I no longer have to gather it all manually.
What I really love is how it lets you monitor keyword rankings over different time frames—whether it’s weekly, monthly, or yearly. This makes it super easy to spot trends and adjust to any shifts in search behavior right away.
The insights it provides are incredibly useful, too. It shows you which pages are performing well and which ones need more attention. It’s also great for spotting opportunities you might’ve missed, helping you make smarter decisions based on real data.
In a nutshell, automating this process makes it easier to keep your SEO efforts consistent and scalable. It saves time, improves your site’s visibility, and helps you stay ahead of the competition.
What Insights It Offers
By tracking your keyword rankings regularly, you can quickly tell if your website traffic is increasing or declining. It helps you figure out whether these changes are linked to shifts in your keyword rankings.
If traffic drops, looking at both impressions and rankings helps you dig deeper. It could mean fewer people are searching for those terms, or your site may have dropped in the rankings.
External factors can also come into play—things like search engine algorithm updates, AI-generated summaries in the search results (AI Overviews), or new SERP features. Even if your rankings hold steady, these factors can affect your click-through rates (CTR) and, in turn, your traffic.
If ranking changes or search demand aren’t the issue, it’s smart to check for these external influences.
Being aware of these changes allows you to fine-tune your strategy so your SEO stays effective, even when the search landscape shifts.
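One way to put that diagnosis into practice: given week-over-week position and CTR per query (the numbers below are hypothetical), you can flag queries whose rankings held steady while CTR fell, a signal to investigate SERP-level changes rather than ranking losses. A minimal sketch:

```python
import pandas as pd

# Hypothetical week-over-week data for a few queries
df = pd.DataFrame({
    'query': ['python seo', 'gsc api', 'rank tracker'],
    'position_prev': [3.1, 5.0, 8.2],
    'position_curr': [3.2, 5.1, 12.4],
    'ctr_prev': [0.08, 0.05, 0.02],
    'ctr_curr': [0.05, 0.049, 0.02],
})

# Rankings held steady (moved less than 1 spot) but CTR dropped more than 20%:
# a hint that SERP changes, not rankings, may be costing clicks
stable = (df['position_curr'] - df['position_prev']).abs() < 1
ctr_drop = df['ctr_curr'] < df['ctr_prev'] * 0.8
flagged = df[stable & ctr_drop]['query'].tolist()
print(flagged)  # ['python seo']
```

The thresholds (1 position, 20%) are arbitrary starting points; tune them to your site's volatility.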
Tools and Prerequisites
Before diving into the automation process, ensure you have the following:
- Python 3.x: The programming language used for scripting.
- Google Cloud Project: To access the Google Search Console API.
- Google Search Console Account: With your website verified.
- Python Libraries:
google-auth-oauthlib
google-auth-httplib2
google-api-python-client
pandas
You can install the required Python libraries using pip:
pip install google-auth-oauthlib google-auth-httplib2 google-api-python-client pandas
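If you prefer pinning dependencies per project, the same list can live in a requirements.txt (a common convention, not something the original setup requires):

```text
google-auth-oauthlib
google-auth-httplib2
google-api-python-client
pandas
```

Then install with `pip install -r requirements.txt`.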
Setting Up Google Search Console API
1. Create a Google Cloud Project
- Visit the Google Cloud Console.
- Create a new project or select an existing one.
2. Enable the Search Console API
- Navigate to APIs & Services > Library.
- Search for “Google Search Console API” and enable it.
3. Set Up OAuth 2.0 Credentials
- Go to APIs & Services > Credentials.
- Click Create Credentials > OAuth client ID.
- Select Desktop App and provide a name.
- Download the JSON file containing your client credentials and save it securely.
4. Verify Your Site in Search Console
- Ensure your website (e.g., https://www.yourdomain.com) is verified in Google Search Console.
With these steps completed, you’re ready to integrate the API with your Python script.
Understanding the Python Script
The provided Python script automates the process of fetching, processing, and exporting SEO data from the Google Search Console API. Let's break down each component to understand its functionality and how it contributes to comprehensive SEO monitoring.
1. Authentication and Authorization
import os
import datetime
import calendar
import pandas as pd
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# Define the OAuth 2.0 scopes
SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly']

# Path to the OAuth 2.0 Client ID JSON file downloaded from Google Cloud Console
CLIENT_SECRET_FILE = '/path/to/your/client_secret.json'
TOKEN_FILE = 'token.json'

def get_credentials():
    """Handles OAuth 2.0 authentication using the client secret file and a cached token."""
    creds = None
    if os.path.exists(TOKEN_FILE):
        creds = Credentials.from_authorized_user_file(TOKEN_FILE, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
            creds = flow.run_local_server(port=8080)
        with open(TOKEN_FILE, 'w') as token:
            token.write(creds.to_json())
    return creds
Explanation:
- Imports: Essential libraries for authentication, API interaction, data manipulation, and more.
- SCOPES: Defines the permissions required. Here, webmasters.readonly allows read-only access to Search Console data.
- Client Secret: The JSON file downloaded from Google Cloud containing your OAuth 2.0 credentials.
- get_credentials() Function:
- Checks if a token file exists to reuse existing credentials.
- If not, initiates the OAuth flow to obtain new credentials.
- Saves the credentials for future use, avoiding repeated authentication prompts.
2. Fetching Search Console Data
EXCLUDE_QUERY_REGEX = (
    "YOUR BRAND SEARCH QUERY PATTERN"
)
EXCLUDE_URL_REGEX = (
    "URLS_WANT_TO_EXCLUDE"
)

def fetch_search_console_data(service, site_url, start_date, end_date):
    """Fetches Search Console data for the specified date range, dimensions, and filters."""
    # Note: the Search Analytics API always returns clicks, impressions, ctr,
    # and position for each row; there is no 'metrics' field in the request body.
    payload = {
        'startDate': start_date,
        'endDate': end_date,
        'dimensions': ['query'],
        'searchType': 'web',
        'rowLimit': 10000,
        'dimensionFilterGroups': [
            {
                'groupType': 'and',
                'filters': [
                    {
                        'dimension': 'query',
                        'operator': 'excludingRegex',
                        'expression': EXCLUDE_QUERY_REGEX
                    },
                    {
                        'dimension': 'page',
                        'operator': 'excludingRegex',
                        'expression': EXCLUDE_URL_REGEX
                    },
                    {
                        'dimension': 'country',
                        'operator': 'equals',
                        'expression': 'ind'  # ISO 3166-1 alpha-3 code (India)
                    },
                ]
            }
        ]
    }
    try:
        response = service.searchanalytics().query(siteUrl=site_url, body=payload).execute()
    except Exception as e:
        print(f"An error occurred while fetching data: {e}")
        return pd.DataFrame()

    results = []
    for row in response.get('rows', []):
        data = {}
        # Map each requested dimension to the corresponding entry in row['keys']
        for i, dimension in enumerate(payload['dimensions']):
            data[dimension] = row['keys'][i]
        # Metrics are returned on every row
        data['impressions'] = row.get('impressions', 0)
        data['clicks'] = row.get('clicks', 0)
        data['ctr'] = row.get('ctr', 0)
        data['position'] = round(row.get('position', 0), 2)  # Round to 2 decimal places
        results.append(data)

    return pd.DataFrame.from_dict(results)
Explanation:
- EXCLUDE_QUERY_REGEX & EXCLUDE_URL_REGEX: Regular expressions to filter out irrelevant queries and URLs, ensuring data quality by excluding branded or non-essential terms and pages.
- fetch_search_console_data() Function:
- Constructs a payload defining the date range, dimensions (queries), and filters; the API returns clicks, impressions, CTR, and position for each row.
- Executes the query using the Search Console API.
- Parses the response, extracting relevant data into a structured pandas DataFrame for further analysis.
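To make the response shape concrete, here is the same row-to-DataFrame parsing applied to a hand-written stand-in for what searchanalytics().query() returns (the queries and numbers are made up):

```python
import pandas as pd

# A hand-written stand-in for a Search Analytics API response
response = {
    'rows': [
        {'keys': ['python seo'], 'clicks': 12, 'impressions': 340, 'ctr': 0.035, 'position': 7.416},
        {'keys': ['gsc api'], 'clicks': 4, 'impressions': 90, 'ctr': 0.044, 'position': 3.251},
    ]
}

dimensions = ['query']  # each entry in row['keys'] lines up with a requested dimension
results = []
for row in response.get('rows', []):
    data = {dimensions[i]: row['keys'][i] for i in range(len(dimensions))}
    data['impressions'] = row.get('impressions', 0)
    data['clicks'] = row.get('clicks', 0)
    data['ctr'] = row.get('ctr', 0)
    data['position'] = round(row.get('position', 0), 2)
    results.append(data)

df = pd.DataFrame(results)
print(df[['query', 'position']])
```

With more dimensions (say, ['query', 'page']), row['keys'] simply grows by one entry per dimension, which is why the loop maps keys positionally.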
3. Calculating Date Ranges
def get_mtd_date_ranges():
    """
    Calculates the start and end dates for:
    - Current Month MTD: from the first day of the current month to 3 days ago.
    - Last Month MTD: from the first day of the previous month to the same day number minus 3 days.
    - Last Year MTD: from the first day of the same month last year to the same day number minus 3 days.
    Also calculates the 7-day windows used for average positions, including last week.
    """
    today = datetime.date.today()
    three_days_ago = today - datetime.timedelta(days=3)
    ten_days_ago = today - datetime.timedelta(days=10)

    # Current Month MTD
    first_day_current_month = today.replace(day=1)
    current_mtd_start = first_day_current_month.strftime('%Y-%m-%d')
    current_mtd_end = three_days_ago.strftime('%Y-%m-%d')

    # Last Month MTD (clamp the day number for shorter months)
    last_month_date = today.replace(day=1) - datetime.timedelta(days=1)
    first_day_last_month = last_month_date.replace(day=1)
    last_month_day = min(today.day, calendar.monthrange(last_month_date.year, last_month_date.month)[1])
    last_month_mtd_end_date = last_month_date.replace(day=last_month_day)
    last_month_mtd_end = (last_month_mtd_end_date - datetime.timedelta(days=3)).strftime('%Y-%m-%d')
    last_month_mtd_start = first_day_last_month.strftime('%Y-%m-%d')

    # Last Year MTD (day=1 is always valid, so no leap-year handling is needed here)
    first_day_last_year = today.replace(year=today.year - 1, day=1)
    last_year_mtd_end_date = first_day_last_year.replace(
        day=min(today.day, calendar.monthrange(first_day_last_year.year, first_day_last_year.month)[1])
    )
    last_year_mtd_end = (last_year_mtd_end_date - datetime.timedelta(days=3)).strftime('%Y-%m-%d')
    last_year_mtd_start = first_day_last_year.strftime('%Y-%m-%d')

    # Current 7-day average position window: 7 days ending 3 days ago
    position_current_end = three_days_ago
    position_current_start = position_current_end - datetime.timedelta(days=6)
    position_current_date_range = [
        (position_current_start + datetime.timedelta(days=i)).strftime('%Y-%m-%d')
        for i in range(7)
    ]

    # Last week's window: 7 days ending 10 days ago
    position_last_week_end = ten_days_ago
    position_last_week_start = position_last_week_end - datetime.timedelta(days=6)
    position_last_week_date_range = [
        (position_last_week_start + datetime.timedelta(days=i)).strftime('%Y-%m-%d')
        for i in range(7)
    ]

    # Last month's window: the same 7-day window in the previous month
    last_month_position_end = last_month_mtd_end_date - datetime.timedelta(days=3)
    last_month_position_start = last_month_position_end - datetime.timedelta(days=6)
    position_last_month_date_range = [
        (last_month_position_start + datetime.timedelta(days=i)).strftime('%Y-%m-%d')
        for i in range(7)
    ]

    # Last year's window: the same 7-day window one year ago
    last_year_position_end = last_year_mtd_end_date - datetime.timedelta(days=3)
    last_year_position_start = last_year_position_end - datetime.timedelta(days=6)
    position_last_year_date_range = [
        (last_year_position_start + datetime.timedelta(days=i)).strftime('%Y-%m-%d')
        for i in range(7)
    ]

    # Log the date ranges for clarity
    print("\nDate Ranges for Impressions:")
    print(f"Current Month MTD: {current_mtd_start} to {current_mtd_end}")
    print(f"Last Month MTD: {last_month_mtd_start} to {last_month_mtd_end}")
    print(f"Last Year MTD: {last_year_mtd_start} to {last_year_mtd_end}")
    print("\nDate Ranges for 7-Day Average Positions:")
    print(f"Current Position 7-Day Avg: {position_current_date_range[0]} to {position_current_date_range[-1]}")
    print(f"Last Week 7-Day Avg Position: {position_last_week_date_range[0]} to {position_last_week_date_range[-1]}")
    print(f"Last Month 7-Day Avg Position: {position_last_month_date_range[0]} to {position_last_month_date_range[-1]}")
    print(f"Last Year 7-Day Avg Position: {position_last_year_date_range[0]} to {position_last_year_date_range[-1]}")

    return {
        'impressions': {
            'current_mtd': (current_mtd_start, current_mtd_end),
            'last_month_mtd': (last_month_mtd_start, last_month_mtd_end),
            'last_year_mtd': (last_year_mtd_start, last_year_mtd_end)
        },
        'positions': {
            'current_avg': position_current_date_range,
            'last_week_avg': position_last_week_date_range,
            'last_month_avg': position_last_month_date_range,
            'last_year_avg': position_last_year_date_range
        }
    }
Explanation:
- get_mtd_date_ranges() Function:
- Current Month MTD (Month-to-Date): From the first day of the current month to three days ago, because GSC data typically appears with a 2-3 day delay.
- Last Month MTD: From the first day of the previous month to the same day number minus three days, ensuring consistency even if the months have different lengths.
- Last Year MTD: Mirrors the current month MTD but for the same period in the previous year.
- 7-Day Averages: Calculates date ranges for current, last week, last month, and last year to compute average keyword positions, providing a comparative analysis over time.
- Output: Returns a dictionary containing all relevant date ranges, facilitating organized data fetching and processing.
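As a worked example of the MTD arithmetic (using a fixed date instead of datetime.date.today() so the result is reproducible), suppose today is 2024-03-15:

```python
import datetime
import calendar

today = datetime.date(2024, 3, 15)  # fixed date for reproducibility
three_days_ago = today - datetime.timedelta(days=3)

# Current Month MTD: first of the month through 3 days ago
current_mtd = (today.replace(day=1), three_days_ago)

# Last Month MTD: first of the previous month through the same day number minus 3 days
last_month_date = today.replace(day=1) - datetime.timedelta(days=1)  # 2024-02-29
last_month_day = min(today.day, calendar.monthrange(last_month_date.year, last_month_date.month)[1])
last_month_mtd = (
    last_month_date.replace(day=1),
    last_month_date.replace(day=last_month_day) - datetime.timedelta(days=3),
)

print(current_mtd)     # (2024-03-01, 2024-03-12)
print(last_month_mtd)  # (2024-02-01, 2024-02-12)
```

Note the min() clamp: if today were the 31st and last month had 30 days, the day number would be clamped to 30 before subtracting the 3-day lag.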
4. Processing Impressions and Positions
def fetch_impressions_data(service, site_url, start_date, end_date):
    """Fetches and aggregates impressions per query for the specified date range."""
    df = fetch_search_console_data(service, site_url, start_date, end_date)
    if df.empty or 'query' not in df.columns or 'impressions' not in df.columns:
        return pd.DataFrame(columns=['query', 'impressions'])
    # Aggregate impressions by query
    return df.groupby('query')['impressions'].sum().reset_index()

def calculate_7_day_average_for_position(service, site_url, date_range):
    """Calculates the 7-day average position for each query over the specified date range."""
    # Collect the daily positions per query.
    # fetch_position_for_date() is expected to return a one-day DataFrame with a
    # 'query' column and a 'position_<date>' column.
    query_position_data = {}
    for date in date_range:
        df_day = fetch_position_for_date(service, site_url, date)
        for _, row in df_day.iterrows():
            query = row['query']
            position = row[f'position_{date}']
            query_position_data.setdefault(query, []).append(position)

    # Average the collected daily positions for each query
    average_position_data = [
        {'query': query, 'average_position': round(sum(positions) / len(positions), 2)}
        for query, positions in query_position_data.items()
    ]
    return pd.DataFrame(average_position_data)
Explanation:
- fetch_impressions_data() Function:
- Retrieves impressions data for a specific date range.
- Aggregates impressions per query, providing a clear view of visibility and reach.
- calculate_7_day_average_for_position() Function:
- Iterates through each day in the specified date range to fetch position data.
- Aggregates positions per query and computes the average over the 7-day window.
- Provides insights into keyword performance trends, smoothing out daily fluctuations.
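One gap worth noting: the script calls fetch_position_for_date() but never defines it. Below is a minimal sketch of what such a helper could look like, assuming it queries a single day and returns a DataFrame with 'query' and position_<date> columns, the shape calculate_7_day_average_for_position() expects. This is my reconstruction, not the author's original:

```python
import pandas as pd

def fetch_position_for_date(service, site_url, date):
    """Fetch average position per query for a single day (a sketch, not the original)."""
    payload = {
        'startDate': date,
        'endDate': date,
        'dimensions': ['query'],
        'searchType': 'web',
        'rowLimit': 10000,
    }
    try:
        response = service.searchanalytics().query(siteUrl=site_url, body=payload).execute()
    except Exception as e:
        print(f"An error occurred while fetching positions for {date}: {e}")
        return pd.DataFrame(columns=['query', f'position_{date}'])
    rows = response.get('rows', [])
    # One row per query, with the position column named after the date
    return pd.DataFrame(
        [{'query': r['keys'][0], f'position_{date}': round(r.get('position', 0), 2)} for r in rows],
        columns=['query', f'position_{date}'],
    )
```

For consistency with the rest of the script, you may also want to apply the same dimensionFilterGroups used in fetch_search_console_data() here.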
5. Exporting Data to Excel
def export_to_excel(df_filtered):
    """Exports the data to an Excel file."""
    print("\nExporting the data to an Excel file 'SEO_queries_export.xlsx'...")
    df_filtered.to_excel('SEO_queries_export.xlsx', index=False)
    print("Data has been exported successfully!")
Explanation:
- export_to_excel() Function:
- Takes the final DataFrame containing all processed data.
- Exports the data to an Excel file named SEO_queries_export.xlsx, facilitating easy sharing and further analysis using spreadsheet tools.
6. Complete Python Script
def main():
    """Main function that fetches data, compares positions, and exports to Excel."""
    credentials = get_credentials()
    service = build('searchconsole', 'v1', credentials=credentials)
    site_url = 'YOURDOMAIN'  # e.g. 'https://www.yourdomain.com' or 'sc-domain:yourdomain.com'

    # Fetch date ranges for impressions and positions
    date_ranges = get_mtd_date_ranges()
    current_mtd_start, current_mtd_end = date_ranges['impressions']['current_mtd']
    last_month_mtd_start, last_month_mtd_end = date_ranges['impressions']['last_month_mtd']
    last_year_mtd_start, last_year_mtd_end = date_ranges['impressions']['last_year_mtd']

    # Fetch impressions data for MTD, last month MTD, and last year MTD
    print("\nFetching Current Month MTD Impressions...")
    df_current_mtd_impressions = fetch_impressions_data(service, site_url, current_mtd_start, current_mtd_end)
    df_current_mtd_impressions.rename(columns={'impressions': f'MTD Impressions ({current_mtd_start} to {current_mtd_end})'}, inplace=True)

    print("Fetching Last Month MTD Impressions...")
    df_last_month_mtd_impressions = fetch_impressions_data(service, site_url, last_month_mtd_start, last_month_mtd_end)
    df_last_month_mtd_impressions.rename(columns={'impressions': f'Last Month MTD Impressions ({last_month_mtd_start} to {last_month_mtd_end})'}, inplace=True)

    print("Fetching Last Year MTD Impressions...")
    df_last_year_mtd_impressions = fetch_impressions_data(service, site_url, last_year_mtd_start, last_year_mtd_end)
    df_last_year_mtd_impressions.rename(columns={'impressions': f'Last Year MTD Impressions ({last_year_mtd_start} to {last_year_mtd_end})'}, inplace=True)

    # Merge impressions data and fill missing values with 0
    df_impressions_final = pd.merge(df_current_mtd_impressions, df_last_month_mtd_impressions, on='query', how='left')
    df_impressions_final = pd.merge(df_impressions_final, df_last_year_mtd_impressions, on='query', how='left')
    df_impressions_final.fillna(0, inplace=True)

    # Fetch and calculate position averages
    print("\nCalculating 7-day average positions...")
    position_date_ranges = date_ranges['positions']

    # Current 7-day average position
    current_position_date_range = position_date_ranges['current_avg']
    df_current_avg = calculate_7_day_average_for_position(service, site_url, current_position_date_range)
    df_current_avg.rename(columns={'average_position': f'Current Position (7-day avg {current_position_date_range[0]} to {current_position_date_range[-1]})'}, inplace=True)

    # Last week's 7-day average position
    last_week_position_date_range = position_date_ranges['last_week_avg']
    df_last_week_avg = calculate_7_day_average_for_position(service, site_url, last_week_position_date_range)
    df_last_week_avg.rename(columns={'average_position': f'Last Week 7-day Avg Position ({last_week_position_date_range[0]} to {last_week_position_date_range[-1]})'}, inplace=True)

    # Last month's 7-day average position
    last_month_position_date_range = position_date_ranges['last_month_avg']
    df_last_month_avg = calculate_7_day_average_for_position(service, site_url, last_month_position_date_range)
    df_last_month_avg.rename(columns={'average_position': f'Last Month 7-day Avg Position ({last_month_position_date_range[0]} to {last_month_position_date_range[-1]})'}, inplace=True)

    # Last year's 7-day average position
    last_year_position_date_range = position_date_ranges['last_year_avg']
    df_last_year_avg = calculate_7_day_average_for_position(service, site_url, last_year_position_date_range)
    df_last_year_avg.rename(columns={'average_position': f'Last Year 7-day Avg Position ({last_year_position_date_range[0]} to {last_year_position_date_range[-1]})'}, inplace=True)

    # Merge the position averages into a single DataFrame
    df_final = pd.merge(df_current_avg, df_last_week_avg, on='query', how='left')
    df_final = pd.merge(df_final, df_last_month_avg, on='query', how='left')
    df_final = pd.merge(df_final, df_last_year_avg, on='query', how='left')

    # Drop duplicate queries, keeping only unique ones
    df_final = df_final.drop_duplicates(subset=['query'])

    # Merge impressions data with position data
    df_final = pd.merge(df_final, df_impressions_final, on='query', how='left')

    # Remove the clicks column from the export if present
    columns_to_remove = ['clicks']
    df_final = df_final.drop(columns=[col for col in columns_to_remove if col in df_final.columns], errors='ignore')

    # Rearrange columns: query first, then impressions, then positions
    impressions_columns = [
        f'MTD Impressions ({current_mtd_start} to {current_mtd_end})',
        f'Last Month MTD Impressions ({last_month_mtd_start} to {last_month_mtd_end})',
        f'Last Year MTD Impressions ({last_year_mtd_start} to {last_year_mtd_end})'
    ]
    position_columns = [
        f'Current Position (7-day avg {current_position_date_range[0]} to {current_position_date_range[-1]})',
        f'Last Week 7-day Avg Position ({last_week_position_date_range[0]} to {last_week_position_date_range[-1]})',
        f'Last Month 7-day Avg Position ({last_month_position_date_range[0]} to {last_month_position_date_range[-1]})',
        f'Last Year 7-day Avg Position ({last_year_position_date_range[0]} to {last_year_position_date_range[-1]})'
    ]
    other_columns = [col for col in df_final.columns if col not in impressions_columns + position_columns + ['query']]
    df_final = df_final[['query'] + impressions_columns + position_columns + other_columns]

    # Remove the 'page' column if it exists
    if 'page' in df_final.columns:
        df_final = df_final.drop(columns=['page'])

    # Show the first 10 rows as a preview
    print("\nSample of the first 10 queries with comparison of ranks and impressions:")
    print(df_final.head(10))

    # Wait for user confirmation before exporting
    confirm_export = input("\nDo you want to export all the data to Excel? (y/n): ").strip().lower()
    if confirm_export != 'y':
        print("Export canceled.")
        return

    export_to_excel(df_final)

if __name__ == '__main__':
    main()
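The merge-and-fill pattern main() relies on is worth seeing in isolation: a left join on query keeps every query from the current period, and fillna(0) turns queries that are absent in the comparison period into explicit zeros rather than NaNs (the data here is hypothetical):

```python
import pandas as pd

# Hypothetical current-period and comparison-period impressions
current = pd.DataFrame({'query': ['python seo', 'gsc api'], 'MTD Impressions': [340, 90]})
last_month = pd.DataFrame({'query': ['python seo'], 'Last Month MTD Impressions': [300]})

# Left join: every query from the current period survives
merged = pd.merge(current, last_month, on='query', how='left')

# Queries with no last-month data become 0 instead of NaN
merged.fillna(0, inplace=True)
print(merged)
```

Here 'gsc api' has no last-month row, so its comparison column is filled with 0, which is what lets the final report distinguish "new query" from "missing data" at a glance.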