Google API to Amazon S3

Use the GoogleApiToS3Operator transfer to make requests to any Google API that supports discovery and save the response to a file in Amazon S3.

Prerequisite Tasks

To use these operators, you must do a few things:

Operators

Google Sheets to Amazon S3 transfer operator

This example loads data from Google Sheets and saves it to an Amazon S3 file.

tests/system/amazon/aws/example_google_api_sheets_to_s3.py

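from airflow.providers.amazon.aws.transfers.google_api_to_s3 import GoogleApiToS3Operator

# GOOGLE_SHEET_ID, GOOGLE_SHEET_RANGE, s3_bucket and s3_key are defined elsewhere in the example DAG.
task_google_sheets_values_to_s3 = GoogleApiToS3Operator(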
    task_id="google_sheet_data_to_s3",
    google_api_service_name="sheets",
    google_api_service_version="v4",
    google_api_endpoint_path="sheets.spreadsheets.values.get",
    google_api_endpoint_params={"spreadsheetId": GOOGLE_SHEET_ID, "range": GOOGLE_SHEET_RANGE},
    s3_destination_key=f"s3://{s3_bucket}/{s3_key}",
)

For more information about the endpoint used here, see the Google Sheets API v4 reference for spreadsheets.values.get.
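The spreadsheets.values.get endpoint returns a ValueRange object with range, majorDimension and values fields, where values is a list of rows. Assuming the operator stores that response in S3 as JSON text, a downstream task could turn it into records as in the sketch below. The helper value_range_to_records, the assumption that the first row holds column headers, and the sample payload are all illustrative.

```python
import json
from typing import Any, Dict, List


def value_range_to_records(json_text: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: convert a Sheets ValueRange JSON document into dicts.

    Assumes majorDimension is ROWS and that the first row holds column headers.
    """
    payload = json.loads(json_text)
    rows = payload.get("values", [])  # "values" can be missing for an empty range
    if not rows:
        return []
    header, *data_rows = rows
    records = []
    for row in data_rows:
        # Trailing empty cells may be omitted by the API, so pad short rows with "".
        # Cells beyond the header width are dropped by zip.
        padded = list(row) + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


sample = json.dumps(
    {
        "range": "Sheet1!A1:C3",
        "majorDimension": "ROWS",
        "values": [["name", "team", "score"], ["ana", "red", "10"], ["bo", "blue"]],
    }
)
records = value_range_to_records(sample)
```

Here records holds one dict per data row, with the missing score for the second row filled in as an empty string.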

Google YouTube to Amazon S3

This is a more advanced example DAG for GoogleApiToS3Operator. It uses XCom to pass data between tasks in order to retrieve specific information about YouTube videos.

The first task searches a YouTube channel (YOUTUBE_CHANNEL_ID) for videos published within a given time range (YOUTUBE_VIDEO_PUBLISHED_AFTER, YOUTUBE_VIDEO_PUBLISHED_BEFORE). Because results are paginated and only the first page is requested, it returns at most 50 videos. The task saves the response in Amazon S3 and also pushes it to XCom.
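The publishedAfter and publishedBefore parameters of the YouTube Data API search.list method take RFC 3339 timestamps, for example 2024-01-01T00:00:00Z. A minimal sketch for deriving values for YOUTUBE_VIDEO_PUBLISHED_AFTER and YOUTUBE_VIDEO_PUBLISHED_BEFORE from a range of calendar dates in UTC follows; the helper published_window and the choice of an inclusive end day are assumptions for illustration.

```python
from datetime import date, datetime, time, timezone
from typing import Tuple

RFC3339_UTC = "%Y-%m-%dT%H:%M:%SZ"


def published_window(start: date, end: date) -> Tuple[str, str]:
    """Hypothetical helper: RFC 3339 bounds covering start..end (both days inclusive), UTC."""
    if end < start:
        raise ValueError("end must not be before start")
    after = datetime.combine(start, time.min, tzinfo=timezone.utc)
    # Last second of the end day, so videos published that day are included.
    before = datetime.combine(end, time(23, 59, 59), tzinfo=timezone.utc)
    return after.strftime(RFC3339_UTC), before.strftime(RFC3339_UTC)


YOUTUBE_VIDEO_PUBLISHED_AFTER, YOUTUBE_VIDEO_PUBLISHED_BEFORE = published_window(
    date(2024, 1, 1), date(2024, 1, 31)
)
```

Formatting in UTC with a literal Z suffix keeps the strings valid RFC 3339 without having to render a numeric offset.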

tests/system/amazon/aws/example_google_api_youtube_to_s3.py

video_ids_to_s3 = GoogleApiToS3Operator(
    task_id="video_ids_to_s3",
    google_api_service_name="youtube",
    google_api_service_version="v3",
    google_api_endpoint_path="youtube.search.list",
    gcp_conn_id=conn_id_name,
    google_api_endpoint_params={
        "part": "snippet",
        "channelId": YOUTUBE_CHANNEL_ID,
        "maxResults": 50,
        "publishedAfter": YOUTUBE_VIDEO_PUBLISHED_AFTER,
        "publishedBefore": YOUTUBE_VIDEO_PUBLISHED_BEFORE,
        "type": "video",
        "fields": "items/id/videoId",
    },
    google_api_response_via_xcom="video_ids_response",
    s3_destination_key=f"https://s3.us-west-2.amazonaws.com/{s3_bucket_name}/youtube_search",
    s3_overwrite=True,
)
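The examples on this page give s3_destination_key either as an s3://bucket/key URL or as a path-style https://s3.<region>.amazonaws.com/bucket/key URL. The sketch below shows how both forms break down into a bucket and a key. The helper split_s3_destination is hypothetical and only handles these two forms; Airflow's own URL parsing lives in its S3 hook and may behave differently.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_destination(url: str) -> Tuple[str, str]:
    """Hypothetical helper: return (bucket, key) for s3:// or path-style https S3 URLs."""
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        # s3://<bucket>/<key>
        bucket, key = parsed.netloc, parsed.path.lstrip("/")
    elif parsed.scheme == "https" and parsed.netloc.startswith("s3"):
        # Path-style: https://s3.<region>.amazonaws.com/<bucket>/<key>
        bucket, _, key = parsed.path.lstrip("/").partition("/")
    else:
        raise ValueError(f"unsupported S3 URL: {url!r}")
    if not bucket or not key:
        raise ValueError(f"S3 URL must include a bucket and a key: {url!r}")
    return bucket, key


print(split_s3_destination("https://s3.us-west-2.amazonaws.com/my-bucket/youtube_search"))
```

Virtual-hosted URLs such as https://my-bucket.s3.us-west-2.amazonaws.com/key are deliberately rejected here, since the examples only use the path-style form.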

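The video IDs are then handed on through XCom: the next task reads additional endpoint parameters from the video_ids XCom key (google_api_endpoint_params_via_xcom), retrieves the requested fields (YOUTUBE_VIDEO_FIELDS) for those videos, and saves the result in Amazon S3 (s3_bucket_name).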

tests/system/amazon/aws/example_google_api_youtube_to_s3.py

video_data_to_s3 = GoogleApiToS3Operator(
    task_id="video_data_to_s3",
    google_api_service_name="youtube",
    google_api_service_version="v3",
    gcp_conn_id=conn_id_name,
    google_api_endpoint_path="youtube.videos.list",
    google_api_endpoint_params={
        "part": YOUTUBE_VIDEO_PARTS,
        "maxResults": 50,
        "fields": YOUTUBE_VIDEO_FIELDS,
    },
    google_api_endpoint_params_via_xcom="video_ids",
    s3_destination_key=f"https://s3.us-west-2.amazonaws.com/{s3_bucket_name}/youtube_videos",
    s3_overwrite=True,
)
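Because the search request sets fields to items/id/videoId, its response has the shape {"items": [{"id": {"videoId": ...}}]}, whereas videos.list expects its id parameter as a comma-separated list of video IDs. The sketch below shows that transformation as plain functions. It assumes the operator merges the dictionary stored under the video_ids XCom key into google_api_endpoint_params the way dict.update would. The helper names and the static parameter values are hypothetical; in a DAG, the output of videos_list_params would be pushed to XCom under the key video_ids by a task between the two operators.

```python
from typing import Any, Dict, List


def extract_video_ids(search_response: Dict[str, Any]) -> List[str]:
    """Pull video IDs out of a search.list response trimmed with fields=items/id/videoId."""
    return [
        item["id"]["videoId"]
        for item in search_response.get("items", [])  # "items" may be absent if nothing matched
        if "videoId" in item.get("id", {})
    ]


def videos_list_params(search_response: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical: build the dict that would be pushed to XCom under "video_ids"."""
    return {"id": ",".join(extract_video_ids(search_response))}


def merge_params(static: Dict[str, Any], from_xcom: Dict[str, Any]) -> Dict[str, Any]:
    """Assumed dict.update-style merge, done on a copy so the static params stay unchanged."""
    merged = dict(static)
    merged.update(from_xcom)
    return merged


search_response = {"items": [{"id": {"videoId": "abc123"}}, {"id": {"videoId": "def456"}}]}
params = merge_params(
    {"part": "snippet,statistics", "maxResults": 50, "fields": "items(id,snippet/title)"},
    videos_list_params(search_response),
)
print(params["id"])  # abc123,def456
```

If the search finds no videos, id ends up as an empty string; a real DAG may want to skip the videos.list call in that case.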
