Skip to content

Latest commit

 

History

History
119 lines (75 loc) · 8.01 KB

google-sheets.md

File metadata and controls

119 lines (75 loc) · 8.01 KB

Google Sheets

Sync overview

The Google Sheets Source is configured to pull data from a single Google Sheets spreadsheet. To replicate multiple spreadsheets, you can create multiple instances of the Google Sheets Source in your Airbyte instance.

Output schema

Each sheet in the selected spreadsheet will be output as a separate stream. Each selected column in the sheet is output as a string field.

Airbyte only supports replicating Grid sheets. See the Google Sheets API docs for more info on all available sheet types.

Note: Sheet names and column headers must contain only alphanumeric characters or _, as specified in the Airbyte Protocol. If your sheet or column header is named e.g: "the data", you'll need to change it to "the_data" for it to be synced by Airbyte. This restriction does not apply to non-header cell values: those can contain any unicode characters. This limitation is temporary and future versions of Airbyte will support more permissive naming patterns.

Data type mapping

Integration Type Airbyte Type Notes
any type string

Features

This section should contain a table with the following format:

Feature Supported?(Yes/No) Notes
Full Refresh Sync Yes
Incremental Sync Coming soon
Namespaces No

Performance considerations

At the time of writing, the Google API rate limit is 100 requests per 100 seconds per user and 500 requests per 100 seconds per project. Airbyte batches requests to the API in order to efficiently pull data and respects these rate limits. It is recommended that you use the same service user (see the "Creating a service user" section below for more information on how to create one) for no more than 3 instances of the Google Sheets Source to ensure high transfer speeds.

Getting Started (Airbyte Cloud)

To configure the connector you'll need to:

  • Authorize your Google account via OAuth
  • The ID of the spreadsheet you'd like to sync

Setup Guide

Authorization

Click on the "Sign in with Google" button and authorize via your Google account.

Sheet ID

you'll need the ID of the Spreadsheet you'd like to sync. To get it, navigate to the spreadsheet in your browser, then copy the portion of the URL which comes after "/d" and before "/edit" or "/view". This is the highlighted portion of the screenshot below:

Getting started (Airbyte OSS)

Requirements

To configure the Google Sheets Source for syncs, you'll need the following:

  • Enable the Google Sheets API for your personal or organization account
  • Enable the Google Drive API for your personal or organization account
  • Create a service account with permissions to access the Google Sheets and Drive APIs
  • Create a Service Account Key for the Service Account
  • Share the spreadsheets you'd like to sync with the Service Account created above
  • The ID of the spreadsheet you'd like to sync

Setup guide

Enable the Google Sheets and Google Drive APIs

Follow the Google documentation for enabling and disabling APIs to enable the Google Sheets and Google Drive APIs. This connector only needs Drive to find the spreadsheet you ask us to replicate; it does not look at any of your other files in Drive.

The video below illustrates how to enable the APIs:

{% embed url="https://youtu.be/Fkfs6BN5HOo" caption="" %}

Create a Service Account and Service Account Key

Follow the Google documentation for creating a service account with permissions as Project Viewer, following the section titled Service Accounts, NOT OAuth 2.0. In the "Grant this service account access to project" section of the wizard, grant the service account the role of Project > Viewer. The video below also illustrates how you can create a Service Account and Key:

{% embed url="https://youtu.be/-RZiNY2RHDM" caption="" %}

You'll notice that once you create the key, your browser will automatically download a JSON file. This is the credentials JSON file that you'll input in the Airbyte UI later in this process, so keep it around.

****

Share your spreadsheet with the Service Account

Once you've created the Service Account, you need to explicitly give it access to your spreadsheet. If your spreadsheet is viewable by anyone with its link, no further action is needed. If this is not the case, then in the "Credentials" tab on the left side of your Google API Dashboard, copy the email address of the Service Account you just created. Then, in the Google sheets UI, click the "share" button and share the spreadsheet with the service account. The video below illustrates this process.

{% embed url="https://youtu.be/GyomEw5a2NQ" caption="" %}

The spreadsheet ID of your Google Sheet

Finally, you'll need the ID of the Spreadsheet you'd like to sync. To get it, navigate to the spreadsheet in your browser, then copy the portion of the URL which comes after "/d" and before "/edit" or "/view". This is the highlighted portion of the screenshot below:

Setting up in the Airbyte UI

The Airbyte UI will ask for two things:

  1. The spreadsheet ID
  2. The content of the credentials JSON you created in the "Create a Service Account and Service Account Key" step above. This should be as simple as opening the file and copy-pasting all its contents into this field in the Airbyte UI.

Changelog

Version Date Pull Request Subject
0.2.9 2022-01-25 9208 Update title and descriptions
0.2.7 2021-09-27 8470 Migrate to the CDK
0.2.6 2021-09-27 6354 Support connecting via Oauth webflow
0.2.5 2021-09-12 5972 Fix full_refresh test by adding supported_sync_modes to Stream initialization
0.2.4 2021-08-05 5233 Fix error during listing sheets with diagram only
0.2.3 2021-06-09 3973 Add AIRBYTE_ENTRYPOINT for Kubernetes support
0.2.2 2021-04-20 2994 Formatting spec
0.2.1 2021-04-03 2726 Fix base connector versioning
0.2.0 2021-03-09 2238 Protocol allows future/unknown properties
0.1.7 2021-01-21 1762 Fix issue large spreadsheet
0.1.6 2021-01-27 1668 Adopt connector best practices
0.1.5 2020-12-30 1438 Implement backoff
0.1.4 2020-11-30 1046 Add connectors using an index YAML file