TiDB Cloud + Snowflake Demo: Step-by-Step Guide

You can find the code here: https://github.com/Mini256/tidb-snowflake-e-commerce-demo

And visit our live demo here: https://tidb-snowflake-e-commerce-demo.vercel.app

Demo Proposal


In this demo, we use TiDB Cloud and Snowflake to build an online e-commerce system that combines TiDB's powerful real-time HTAP capability with Snowflake's offline analysis capability for the large volume of data in the system. TiDB stores business data such as orders, express (shipping) records, and user data, while Snowflake serves as the data warehouse for archiving cold data and for data warehouse modeling. The demo builds a simple real-time dashboard to monitor today's product sales, analyzes users' spending power to compute user labels from their purchasing behavior, and computes popular items at different price grades from item sales, so as to build a simple item recommendation feature.

In the traditional e-commerce scenario, especially for data reporting and real-time monitoring, a third-party data warehouse was often used to pre-aggregate data, which was then loaded into MySQL for querying. As the business grows and report forms become more diversified, the scalability of MySQL becomes a bottleneck. Besides, multi-node MySQL database and table sharding schemes are highly complex and very difficult to operate and maintain.

Compared to MySQL, TiDB offers lower latency and better scalability for transactional workloads in many scenarios, especially when serving as the underlying storage for large e-commerce sales promotions.

  1. TiDB supports high-speed access and provides good batch query capability.
  2. Row-based storage with precise index lookups delivers millisecond responses, and the HTAP architecture can support future business growth.
  3. TiDB supports real-time analysis of data that cannot be ETLed in time, such as the current day's orders and express records.

Business

Generate business data.

  1. Order business: generate data for the orders table.
  2. Express business: generate data for the express table. The express table and the orders table are stored in separate TiDB databases and can be joined on order_id (see the generation sketch after this list).
  3. Other businesses (ignored at the demo stage): use other databases (such as MySQL or Oracle) and synchronize their data to a unified TiDB cluster in real time through synchronization tools.
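
The following is a minimal sketch of the generation step in Python, assuming the pymysql package and a reachable TiDB instance. The table and column names here are illustrative (the demo repo defines the real schema), and for brevity both tables share one connection even though the demo keeps them in separate databases.

```python
# Minimal data-generation sketch. Table/column names are assumptions;
# the demo repo defines the real schema.
import random
import uuid

import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root",
                       password="", database="ECOMMERCE")

with conn.cursor() as cur:
    for _ in range(100):
        order_id = uuid.uuid4().hex
        # One order row with a random user and amount.
        cur.execute(
            "INSERT INTO orders (order_id, user_id, amount) VALUES (%s, %s, %s)",
            (order_id, random.randint(1, 1000), round(random.uniform(5, 500), 2)),
        )
        # The express row carries the same order_id, which is the join
        # key between the two tables.
        cur.execute(
            "INSERT INTO expresses (order_id, status) VALUES (%s, %s)",
            (order_id, random.choice(["CREATED", "SHIPPED", "DELIVERED"])),
        )
conn.commit()
```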

Data center

Collect business data and provide data services.

  1. Collect business data through data integration tools such as DM and Flink CDC, and write it to the unified TiDB cluster.
  2. Data service: query user labels and recommend hot-selling products to users.
    1. Generate user labels based on purchase behavior: whether or not a user is a high-net-worth customer.
    2. Generate two batches of weekly hot-selling products from the sales data: hot-selling products with a high unit price and hot-selling products with a low unit price.
    3. Recommend high-unit-price hot-selling products to high-net-worth users, and low-unit-price hot-selling products to low-net-worth users.
  3. Unified view: users can query their own orders, and the express information associated with those orders, in real time.
  4. Real-time analysis: data analysts analyze the current day's transaction data (see the query sketch after this list).
    1. The total amount and number of transactions for the day.
    2. The same aggregates grouped by commodity type.
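
As a sketch of the real-time analysis, the day's figures can be queried straight from TiDB. Column names such as create_time, item_type, and amount are assumptions; the demo's schema may differ.

```python
# Real-time dashboard queries against TiDB. Column names are assumptions.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root",
                       password="", database="ECOMMERCE")
with conn.cursor() as cur:
    # Total amount and number of transactions for the current day.
    cur.execute(
        "SELECT COALESCE(SUM(amount), 0), COUNT(*) FROM orders "
        "WHERE create_time >= CURDATE()"
    )
    total_amount, order_count = cur.fetchone()
    print(f"today: {order_count} orders, {total_amount} total")

    # The same aggregates, grouped by commodity type.
    cur.execute(
        "SELECT item_type, SUM(amount), COUNT(*) FROM orders "
        "WHERE create_time >= CURDATE() GROUP BY item_type"
    )
    for item_type, amount, count in cur.fetchall():
        print(item_type, amount, count)
```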

Data warehouse

Use Snowflake to complete data warehouse modeling and cold data archiving.

  1. Cold data archiving: archive yesterday's orders and express data from TiDB into Snowflake.
  2. Data warehouse modeling: calculate each user's monthly bill to get monthly aggregated expenditure data (a sketch follows this list).
  3. Machine learning (ignored at the demo stage): mine the transaction data through machine learning.
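
A minimal sketch of the modeling step, assuming the snowflake-connector-python package, the SNOWSQL_* environment variables from section 5 of this guide, and an archived ORDERS table; the table and column names are illustrative.

```python
# Monthly-bill modeling on Snowflake. The SNOWSQL_* values come from
# section 5 of this guide; ORDERS and its columns are assumptions.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWSQL_ACCOUNT"],
    user=os.environ["SNOWSQL_USER"],
    password=os.environ["SNOWSQL_PWD"],
    warehouse=os.environ["SNOWSQL_WAREHOUSE"],
    database=os.environ["SNOWSQL_DATABASE"],
    schema=os.environ["SNOWSQL_SCHEMA"],
)
cur = conn.cursor()
# Summarize each user's expenditure per month from the archived orders.
cur.execute("""
    SELECT USER_ID,
           DATE_TRUNC('MONTH', CREATE_TIME) AS BILL_MONTH,
           SUM(AMOUNT) AS MONTHLY_SPEND
    FROM ORDERS
    GROUP BY USER_ID, DATE_TRUNC('MONTH', CREATE_TIME)
""")
for user_id, month, spend in cur.fetchall():
    print(user_id, month, spend)
```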

Before we start

On the first page of the live demo, you will see that we need a custom backend endpoint to support this demo.

We recommend creating a Gitpod workspace as the backend by clicking the link in the helper text on the first page.

The backend needs TiDB and Snowflake connections; you can use your existing instances to try this demo.

If you have no TiDB or Snowflake instances, don't worry.

In that case, you can create trial accounts for TiDB Cloud, Snowflake, and ETLeap (for the data pipeline from TiDB to Snowflake).

Run demo backend

You can click the Gitpod button on the GitHub repo, or click the URL on the live demo page.

Then authorize Gitpod, and a new workspace will start automatically.

We define scripts in .gitpod.yml so that everything runs automatically.

The backend server serves port 8080 if it starts successfully. In a Gitpod workspace, however, the port needs to be forwarded; you can get the public URL for port 8080 from the workspace's Ports view.


The Gitpod workspace will open a preview tab of the live demo once port 8080 is ready. We recommend copying the URL or endpoint there and pasting it into our live demo website in your own browser.

The Gitpod preview browser tab has many limitations, so we recommend using your own browser tab instead of the Gitpod preview tab.
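
Before pasting the endpoint into the live demo, you can sanity-check that the forwarded URL is reachable. The URL below is a hypothetical example of a Gitpod-forwarded address; use the one from your own workspace.

```python
# Quick reachability check for the forwarded backend endpoint.
# The URL is a hypothetical example; use the one from your workspace.
import requests

endpoint = "https://8080-yourworkspace.gitpod.io"
resp = requests.get(endpoint, timeout=10)
print(resp.status_code)  # any HTTP response means the port is reachable
```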

Visit and config live demo

1. Visit the home page and configure the endpoint

Visit our live demo here: https://tidb-snowflake-e-commerce-demo.vercel.app

Paste and check the endpoint URL you got from the previous step.

You will see the Config Stepper if the backend is not fully ready.


Click the Walkthrough button and you will get guidance for creating a TiDB Cloud cluster.

2. Configure TiDB Connection

2.1 Register TiDB Cloud

We recommend TiDB Cloud here instead of a self-hosted TiDB instance.

  1. If you do not have a TiDB Cloud account, click TiDB Cloud to sign up for an account.

  2. Sign in with your TiDB Cloud account.

  3. To create a Developer Tier cluster free for one year, you can either select the Developer Tier plan on the plan page or click Create a Cluster (Dev Tier).

  4. On the Create a Cluster (Dev Tier) page, set up your cluster name, password, cloud provider (for now, only AWS is available for Developer Tier), and region (a nearby region is recommended). Then click Create to create your cluster.

  5. Your TiDB Cloud cluster will be created in approximately 5 to 15 minutes. You can check the creation progress at Active Clusters.

  6. After creating a cluster, on the Active Clusters page, click the name of your newly created cluster to navigate to the cluster control panel.


  7. Click Connect to create a traffic filter (a list of client IPs allowed for TiDB connection).


  8. In the popup window, click Add Your Current IP Address to fill in your current IP address, and then click Create Filter to create a traffic filter.

  9. Copy the string to connect with a SQL client for later use.


Tip: for Gitpod workspace usage, we can set an Allow All traffic filter here by adding the IP address 0.0.0.0/0.
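
You can verify the connection string from step 9 before configuring the demo. This is a minimal sketch using pymysql, with placeholder host and password; depending on your cluster, TLS options may also be required.

```python
# Verify the TiDB Cloud connection details from the Connect dialog.
import pymysql

conn = pymysql.connect(
    host="<tidb cloud host>",  # from the connection string you copied
    port=4000,                 # TiDB's default MySQL-protocol port
    user="root",
    password="<your password>",
)
with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")
    print(cur.fetchone()[0])   # prints a TiDB-flavored MySQL version
```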

2.2 Configure TiDB Connection

Configure the TiDB Cloud Host Address and password.

In this demo we will create a database named ECOMMERCE.


Click Continue after everything is set.

You will see the next step if the backend successfully connects to the TiDB instance. If you see an error message, don't worry: check your input and make sure every field is correct. Do remember to set an Allow All traffic filter if you use TiDB Cloud.

3. Create Schema on TiDB

Click the Create button and the schema will be created.

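For illustration only (the actual DDL is defined by the demo backend), the schema step amounts to creating the business tables in the ECOMMERCE database, along the lines of:

```python
# Illustrative schema creation; the real DDL lives in the demo repo.
import pymysql

conn = pymysql.connect(host="<tidb cloud host>", port=4000,
                       user="root", password="<your password>")
with conn.cursor() as cur:
    cur.execute("CREATE DATABASE IF NOT EXISTS ECOMMERCE")
    # A simplified orders table; the demo defines more tables and columns.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS ECOMMERCE.orders (
            order_id VARCHAR(32) PRIMARY KEY,
            user_id BIGINT,
            item_type VARCHAR(64),
            amount DECIMAL(10, 2),
            create_time DATETIME
        )
    """)
```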

4. Import initial data

Click the Import button and the initial data will be imported.


You can check the status anytime.


5. Configure Snowflake Connection

Click the Walkthrough button and you will get guidance for creating a Snowflake cluster.


5.1 Create Snowflake Cluster

  1. Create a Snowflake account at snowflake.com.

  2. Configure the edition, cloud provider, and region.

  3. Get the details from the link in the email that is automatically sent to you after submitting:

    Then we can get (for example):

    • SNOWSQL_USERNAME
    • SNOWSQL_PASSWORD
    • SNOWSQL_ROLE: ACCOUNTADMIN
  4. Get the Snowflake URL from the link in the email that is automatically sent to you after activating your account.

    Then we can get (for example):

    • SNOWSQL_HOST: GQ01328.ap-northeast-1.aws.snowflakecomputing.com
    • SNOWSQL_ACCOUNT: GQ01328
  5. Visit the Snowflake management page, choose Admin > Partner Connect in the left navigation, and then choose ETLeap:

    This step prepares the upcoming data pipeline from TiDB to Snowflake.

    Note: A Snowflake trial account will expire after 30 days and ETLeap will expire after 15 days.


  6. Click Connect:


    Then we can get (for example):

    • SNOWSQL_DATABASE: PC_ETLEAP_DB
    • SNOWSQL_WAREHOUSE: PC_ETLEAP_WH
    • SNOWSQL_SCHEMA: PUBLIC
  7. Activate ETLeap.

  8. After everything is set, we have all these values:

    SNOWSQL_HOST=GQ01328.ap-northeast-1.aws.snowflakecomputing.com
    SNOWSQL_ACCOUNT=GQ01328
    SNOWSQL_WAREHOUSE=PC_ETLEAP_WH
    SNOWSQL_DATABASE=PC_ETLEAP_DB
    SNOWSQL_SCHEMA=PUBLIC
    SNOWSQL_USER=<admin username>
    SNOWSQL_ROLE=ACCOUNTADMIN
    SNOWSQL_PWD=<admin password>
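
As a quick check, assuming the variables above are exported in your shell and the snowflake-connector-python package is installed, you can confirm the account details work before entering them in the demo:

```python
# Verify the SNOWSQL_* values with a trivial query.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWSQL_ACCOUNT"],
    user=os.environ["SNOWSQL_USER"],
    password=os.environ["SNOWSQL_PWD"],
    role=os.environ["SNOWSQL_ROLE"],
    warehouse=os.environ["SNOWSQL_WAREHOUSE"],
    database=os.environ["SNOWSQL_DATABASE"],
    schema=os.environ["SNOWSQL_SCHEMA"],
)
cur = conn.cursor()
cur.execute("SELECT CURRENT_VERSION()")
print(cur.fetchone()[0])  # prints the Snowflake version if all values work
```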
    

5.2 Input Snowflake Info

Click Continue and the backend will check the connection.


6. Create Snowflake Schema

Click the Create button and the schema will be created on Snowflake.

7. Finish configuration

After everything is set, you can click the Continue button.

You'll be directed to the demo console page.


8. Create Pipeline between TiDB and Snowflake

After the initial configuration, you will be directed to the Console - Pipeline page.

On this page, we will perform offline analysis on Snowflake and create a pipeline between TiDB and Snowflake, so that Snowflake's analysis results are written back to TiDB.

You can edit your endpoint at any time by clicking the edit icon in the header.

You can switch to other pages using the left navigation.


8.1 ETL data from TiDB to Snowflake

🚀 You can easily synchronize data from TiDB to Snowflake using the following proven ETL tools (click a link to check out its step-by-step guide):

  • ETLeap
  • AirByte (Coming soon!)

8.2 Perform offline analysis on Snowflake

You will see three tabs; each tab performs a different calculation job.

You can run each calculation by clicking its button.

After the calculations finish, item labels and user labels will display correctly on the Items page and the Recommended Items page.
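
The demo's jobs run as SQL on Snowflake. As an illustration only (not the demo's actual job; the table names and the spend threshold are assumptions), a user-label calculation could look like:

```python
# Illustrative user-label job on Snowflake; not the demo's actual SQL.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWSQL_ACCOUNT"],
    user=os.environ["SNOWSQL_USER"],
    password=os.environ["SNOWSQL_PWD"],
    warehouse=os.environ["SNOWSQL_WAREHOUSE"],
    database=os.environ["SNOWSQL_DATABASE"],
    schema=os.environ["SNOWSQL_SCHEMA"],
)
# Label users by total spend; 1000 is an arbitrary demo threshold.
conn.cursor().execute("""
    CREATE OR REPLACE TABLE USER_LABELS AS
    SELECT USER_ID,
           IFF(SUM(AMOUNT) > 1000, 'high', 'low') AS USER_LABEL
    FROM ORDERS
    GROUP BY USER_ID
""")
```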


8.3 Write back data to TiDB

You can easily write analysis data back from Snowflake to TiDB by clicking the button.

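Conceptually, the write-back reads the computed results from Snowflake and upserts them into TiDB; the backend does this for you when you click the button. A hedged sketch (the table names are assumptions):

```python
# Conceptual write-back sketch: Snowflake -> TiDB. Names are assumptions.
import os

import pymysql
import snowflake.connector

sf = snowflake.connector.connect(
    account=os.environ["SNOWSQL_ACCOUNT"],
    user=os.environ["SNOWSQL_USER"],
    password=os.environ["SNOWSQL_PWD"],
    warehouse=os.environ["SNOWSQL_WAREHOUSE"],
    database=os.environ["SNOWSQL_DATABASE"],
    schema=os.environ["SNOWSQL_SCHEMA"],
)
tidb = pymysql.connect(host="<tidb cloud host>", port=4000,
                       user="root", password="<your password>",
                       database="ECOMMERCE")

# Read the computed labels from Snowflake, then upsert them into TiDB.
rows = sf.cursor().execute(
    "SELECT USER_ID, USER_LABEL FROM USER_LABELS").fetchall()
with tidb.cursor() as cur:
    cur.executemany(
        "INSERT INTO user_labels (user_id, user_label) VALUES (%s, %s) "
        "ON DUPLICATE KEY UPDATE user_label = VALUES(user_label)",
        rows,
    )
tidb.commit()
```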

8.4 Next step

After writing the data back to TiDB, you have finished all the initial steps of this demo.

Then you can visit the Recommend Items page to take a look at the labeled items, or visit any other page you like.