
req_snowflake

NOTE: THIS DRIVER/CONNECTOR IS NOT OFFICIALLY AFFILIATED WITH SNOWFLAKE, NOR DOES IT HAVE OFFICIAL SUPPORT FROM THEM.

An Elixir driver for Snowflake, the cloud data platform.

It supports both pure-Elixir JSON decoding and decoding Arrow files via snowflake_arrow_elixir, a Rust-backed library that decodes the Arrow streaming format into Elixir terms.

Usage

Mix.install([
  {:req_snowflake, github: "joshuataylor/req_snowflake"}
])

# With plain string query
Req.new()
|> ReqSnowflake.attach(
  username: "rosebud",
  password: "hunter2",
  account_name: "foobar",
  region: "us-east-1",
  warehouse: "compute_wh", # optional
  role: "myrole", # optional
  database: "mydb", # optional
  schema: "myschema" # optional,
  session_parameters: %{} # Passing in session parameters from
  parallel_downloads: 10 # optional, but recommended. Defaults to 5 (what the other connectors default to).
)
|> Req.post!(snowflake_query: "select L_ORDERKEY, L_PARTKEY from snowflake_sample_data.tpch_sf1.lineitem limit 2")
|> Map.get(:body)
#=>
# %ReqSnowflake.Result{
#   columns: ["L_ORDERKEY", "L_PARTKEY"],
#   total_rows: 2,
#   rows: [[3_000_001, 14406], [3_000_002, 34422]],
#   success: true
# }

# With query parameters for inserting
Req.new()
|> ReqSnowflake.attach(
  username: "rosebud",
  password: "hunter2",
  account_name: "foobar",
  region: "us-east-1",
  warehouse: "compute_wh", # optional
  role: "myrole", # optional
  database: "mydb", # optional
  schema: "myschema" # optional
)
|> Req.post!(
  query: "INSERT INTO \"foo\".\"bar\".\"baz\" (\"hello\") VALUES (?)",
  bindings: %{"1" => %{type: "TEXT", value: "xxx"}}
)
#=>
# %ReqSnowflake.Result{
#   columns: ["number of rows inserted"],
#   total_rows: 1,
#   rows: [[1]],
#   success: true
# }
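
The bindings map uses 1-based positional string keys mapping to %{type: ..., value: ...} maps. As a rough sketch only (the table and column names below are placeholders, not part of the library), binding two parameters looks like this:

# Sketch: "foo"."bar"."baz" and its columns are placeholders.
Req.new()
|> ReqSnowflake.attach(
  username: "rosebud",
  password: "hunter2",
  account_name: "foobar",
  region: "us-east-1"
)
|> Req.post!(
  snowflake_query: "INSERT INTO \"foo\".\"bar\".\"baz\" (\"hello\", \"world\") VALUES (?, ?)",
  bindings: %{
    "1" => %{type: "TEXT", value: "xxx"},
    "2" => %{type: "TEXT", value: "yyy"}
  }
)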

What this is

It uses the Snowflake REST API to communicate with Snowflake, pinning an earlier client version so that results are returned as JSON (with support for Arrow when using snowflake_arrow). The REST API is used by the Python, Golang, NodeJS and other connectors to send requests to Snowflake, so it is stable and changes are communicated.

This library does not use the Snowflake SQL API, which is limited in its implementation and features.

Right now the library doesn't support MFA, so you'll need to either use private key auth or connect using a username & password. Private key auth is highly recommended, as it makes credential rotation easier.

One major note when using Ecto: you will need to enable Snowflake's QUOTED_IDENTIFIERS_IGNORE_CASE setting, documented here: https://docs.snowflake.com/en/sql-reference/identifiers-syntax.html#third-party-tools-and-case-insensitive-identifier-resolution

Note that this can be set at the account level or, if needed, at the session level, as shown below.
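
For example, a minimal sketch of setting it for a single session via session_parameters (the connection details are placeholders):

# Sketch: enable QUOTED_IDENTIFIERS_IGNORE_CASE for this session only.
Req.new()
|> ReqSnowflake.attach(
  username: "rosebud",
  password: "hunter2",
  account_name: "foobar",
  region: "us-east-1",
  session_parameters: %{QUOTED_IDENTIFIERS_IGNORE_CASE: true}
)
|> Req.post!(snowflake_query: "select 1")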

Features

  • Running queries and inserts.
  • JSON row decoding, using Flow for much faster parallel decoding. Using jiffy is also recommended; benchmarking shows a 2x speedup due to the size of the JSON files Snowflake returns.
  • Arrow row decoding, using snowflake_arrow (see the sketch after this list).
  • Passing async queries and getting back a query ID, or polling for results.
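
As a sketch of the Arrow path (the snowflake_arrow dependency source below is an assumption, so check that project for the canonical one; the connection details are placeholders):

Mix.install([
  {:req_snowflake, github: "joshuataylor/req_snowflake"},
  # Assumed dependency source for the Arrow decoder.
  {:snowflake_arrow, github: "joshuataylor/snowflake_arrow"}
])

Req.new()
|> ReqSnowflake.attach(
  username: "rosebud",
  password: "hunter2",
  account_name: "foobar",
  region: "us-east-1",
  arrow: true # decode result sets from Arrow instead of JSON
)
|> Req.post!(snowflake_query: "select 1")
|> Map.get(:body)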

Options

There are a lot of options you can pass in, including Snowflake session parameters, as documented below; a sketch combining several of them follows the list.

  • snowflake_query string required

    Your Snowflake query, e.g. select L_ORDERKEY, L_PARTKEY from snowflake_sample_data.tpch_sf1.lineitem limit 2.

  • username string required

    Your Snowflake username.

  • password string required

    Your Snowflake password.

  • account_name string required

    Your account name; this is found before the region name in the URL. For https://abc1234.us-east-1.snowflakecomputing.com this would be abc1234.

  • region string required

    Your Snowflake region; this is found after your account name in the URL (us-east-1 in the example above).

  • arrow (boolean) optional

    Whether or not to use Arrow. You must have snowflake_arrow included in your project for this to work.

  • cache_token optional

    Caches the login token between queries for up to 10 minutes, the standard login token lifetime for Snowflake. If you change a connection parameter (other than the query), you will be logged in again.

  • warehouse string optional

    The warehouse to use. If none is provided, the user's default warehouse is used.

  • role string optional

    The role to use. If none is provided, the user's default role is used.

  • database string optional

    The database to use. If none is provided, the user's default database is used.

  • schema string optional

    The schema to use. If none is provided, the user's default schema is used.

  • application_name string optional

    The application name to pass. By default, no application name is sent.

  • bindings map optional

    Query bind parameters, as a map of 1-based positional string keys to %{type: ..., value: ...} maps (see the insert example above).

  • session_parameters map optional

    You can pass any session parameters from https://docs.snowflake.com/en/sql-reference/parameters.html.

    Example: session_parameters: %{ROWS_PER_RESULTSET: 50} will limit each result set to 50 rows.

  • parallel_downloads integer optional

    How many result chunks to download from S3 in parallel. Defaults to 5, matching the other connectors.

  • async boolean optional

    Runs the query in async mode, returning the query ID.

  • async_poll boolean optional

    Runs the query in async mode, then polls for the result every 5000ms (unless overridden by async_poll_interval).

  • async_poll_interval integer optional

    The polling interval in milliseconds when running in async mode. Defaults to 5000.

  • async_poll_timeout integer optional

    How many times to poll for the result before giving up.

  • download_chunks boolean optional

    Whether to download the result chunks or just return the base64-encoded data.

  • table boolean optional

    If true, results are returned with the expectation that you will process them with the table library. Defaults to false. kino_db sets this to true, as it uses Table by default.

    If you are dealing with a lot of data, using Table is highly recommended: you can stream chunks out as you need them, saving memory, CPU and bandwidth.

  • cache_results boolean optional

    Whether to cache downloaded chunks locally, so that re-iterating over the same dataset does not redownload the data. Defaults to false, but kino_db enables it so that paging does not redownload chunks.

  • json_library module optional

    When decoding JSON, jiffy has been shown to be about 2x faster than Jason and to use less memory for larger JSON blobs, as Snowflake can return large JSON files. Examples: json_library: Jason or json_library: :jiffy (:jiffy is an atom because it's an Erlang library). Defaults to JSON.
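
To illustrate how several of these options fit together, here is a minimal sketch of an async query that polls for its result, decoding with jiffy (connection details are placeholders; as in the usage example above, the options are passed to attach):

Req.new()
|> ReqSnowflake.attach(
  username: "rosebud",
  password: "hunter2",
  account_name: "foobar",
  region: "us-east-1",
  json_library: :jiffy, # use jiffy for faster decoding of large JSON responses
  parallel_downloads: 10, # download up to 10 result chunks at once
  async_poll: true, # run async, then poll for the result
  async_poll_interval: 2_000, # poll every 2 seconds instead of the 5000ms default
  async_poll_timeout: 30 # give up after 30 polls
)
|> Req.post!(snowflake_query: "select * from snowflake_sample_data.tpch_sf1.lineitem limit 100")
|> Map.get(:body)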

Table Integration

Using table makes it easy to manipulate and stream over the data. If you download all the chunks at once over millions of rows, you can end up using a lot of memory.

With table, you can enumerate over the data and only the relevant chunks will be downloaded and decoded.

To get the first 200 rows, you could do the following:

Req.new()
|> ReqSnowflake.attach(
  # your details here!
)
|> Req.post!(
  snowflake_query: "select * from foo.bar.baz"
)
|> Map.get(:body)
|> Table.to_rows()
|> Enum.slice(0, 200)

Which results in the following:

[
%{
  "ROW_NUMBER" => 688156,
  "SF_ARRAY" => nil,
  "SF_BOOLEAN" => false,
  "SF_DATE" => ~D[2024-04-19],
  "SF_DECIMAL_38_2" => 2146.89,
  "SF_FLOAT" => nil,
  "SF_FLOAT_TWO_PRECISION" => nil,
  "SF_INTEGER" => 8562093803,
  "SF_OBJECT" => nil,
  "SF_TIMESTAMP" => ~N[2025-04-20 07:31:58.064000],
  "SF_TIMESTAMP_LTZ" => nil,
  "SF_TIMESTAMP_NTZ" => nil,
  "SF_VARCHAR" => "ylPPdH0leSF3f5lK9kEC",
  "SF_VARIANT_JSON" => "{\n  \"key_eV6kL4tVYXLFrEyyDNol\": true\n}"
},
...
]
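
Because only the chunks you touch are downloaded and decoded, you can also stream over the rows and stop early. A minimal sketch, reusing the setup above (the column name is taken from the example output):

Req.new()
|> ReqSnowflake.attach(
  # your details here!
)
|> Req.post!(snowflake_query: "select * from foo.bar.baz")
|> Map.get(:body)
|> Table.to_rows()
|> Stream.map(fn row -> row["SF_VARCHAR"] end)
|> Enum.take(100)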

kino_db

kino_db integration is supported out of the box, allowing you to connect to an account, run queries and slice the data. Thanks to slicing, only the chunks that are needed are downloaded, so results appear in the table very quickly.

Short term roadmap

  • Add this to db_connection_snowflake as a generic Snowflake library for db_connection.
  • Add this to ecto_snowflake as a Snowflake library for Ecto.

Medium term roadmap

  • Add support for MFA
  • Add support for private key auth
  • Add support for telemetry

Thanks

I just want to thank the open source community, especially db_connection/ecto/ecto_sql/postgrex for being amazing and for letting me copy most of the decoding code from them. I also want to thank @wojtekmach for the awesome work on req, and for the help converting this library to req.

Thanks to @lewisvrobinson for the logo.

License

Copyright (C) 2022 Josh Taylor

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
