Skip to content

GHC Cheat Sheet

Beeke-Marie Nelke edited this page Sep 27, 2023 · 50 revisions

Legend Set Up

Step 1: Gain access to your Workshop Studio AWS account

  1. Open AWS Workshop Studio at https://catalog.us-east-1.prod.workshops.aws/join

  2. Choose your preferred sign-in method (SSO or using your Amazon retail account)

image
  1. Enter the code provided by the event organizer into the text box. You will find a paper printout at your table.
image
  1. Read and agree to the Terms and Conditions by selecting I agree with the terms and conditions and choose Join Event.
image

Step 2: Access the console of your AWS account

  1. Access the console of your AWS account for the event by clicking the link in the sidebar.
image

Step 3: Type in "cloud formation" in the search bar at the top and click on the cloud formation tile.

image

Step 4: Click on "Legend" under stack name.

image

Step 5: Navigate to the Output tab and click on the Studio URL to launch the Legend Studio Workbench.

image

Step 6: Choose Legend project and create workspace

  1. Choose the LegendStarterModel Legend project by clicking on the drop down.
image
  1. Hit the "+" symbol to create a workspace to start working in Legend. You can give the workspace any name you want.
image image
  1. Click on "Edit Workspace" to get started with Legend Studio.
image

You are all set!

Data Modeling Guide

Edit the Relational Schema (Follow along)

First, let's create a new table in our relational schema so that we can bring in the new surface temperature dataset.

Build Joins (Follow along)

Now that the table schema for surface temperature is defined, we need to join it with the country dataset. We will add a join for the two tables into the relationship schema.

Create a New Class (Let's do this together)

Let's create a new SurfaceTemperature class.

  1. Make sure that the model package is highlighted/selected.
image
  1. Click on the "+" icon and select Class.
image
  1. Name the new Class SurfaceTemperature. All in one word, camelcase. Click "create" when done.
image
  1. Add 3 attributes to your class by hitting the "+" sign: country, year, temperatureChange. Please see screenshot for datatype and cardinality.
image
  1. Hit F9 or click on the Hammer icon at the bottom right to compile your new code.
image

Map the New Class (Let's do this together)

Now, we will map our new _SurfaceTemperature _ Class to our data.

  1. Navigate to the SustainabilityStatisticsMapping in the mapping package.
image
  1. Drag and drop the SurfaceTemperature Class in the Mapping Editor (where also the other Classes are listed). Hit "Create" in the dialog that pops up.
image
  1. Navigate to the store package, and select the SustainabilityStatisticsDatabase. Make sure the SurfaceTemperature Class is selected in the Mapping Editor (you should see a property and source panel). Now drag the store into the source panel on the right-hand side. Select the surface_temperature_change table from the drop down that appears.
image
  1. Drag and drop the columns of the surface temperature datasets (on the right) to the respective attributes (on the left). See screenshot for details.
image
  1. Hit F9 or click on the Hammer icon at the bottom right to compile your new code.
image

Build the relationship between our 2 datasets

  1. Use the Country Class to build the relationship between ProductionBasedCarbonDioxideEmissions Class and the SurfaceTemperature Class. Navigate to the Country Class and add the surfaceTemperature attribute. See screenshot for details.
image
  1. Now update your Country Class Mapping. Navigate to the SustainabilityStatisticsMapping, select the Country Class and map the country_surface_temperature_join to the surfaceTemperature attribute.
image

Visualize in a Diagram (Let's do this together)

  1. Navigate to the "EmissionsModelDiagram" under the model package.
  2. You can now drag and drop the SurfaceTemperature class onto the diagram. You can see that the relationship to the Country class was added for you based on the association that we modeled earlier. Tip: You can move the class box and the relationship lines around to make the diagram pretty.
image

Data Queries

  • Query 1 - Part 1: Get the number of distinct countries that have production based carbon dioxide emission statistics.
  • Query 1 - Part 2: Get the list of distinct countries that have production-based carbon dioxide emission statistics.
  • Query 2: Get the top 10 countries with highest carbon dioxide emissions in 2021.
  • Query 3: Get top 10 countries by average emissions across all years available in the data set.
  • Query 4: List all countries and their surface temperature change in 2020.
  • Query 5: Get top 10 countries with largest surface temperature change between 2016 and 2021.
  • Query 6: Query and sort countries by average (1)surface temperature & then by average (2)carbon dioxide across all years since 2001

Mini User Guide for Data Queries

Launch queries

To create a query, right-click on any class.

image

Add attributes

Add attributes you want to query by dragging & dropping them into the "fetch structure panel".

image

Apply operations

Apply operations on your data, by clicking on the "sigma" symbol. Options include:

  • Sum: Adds up all values
  • Average: Returns the average
  • Count: Counts values
  • Distinct count: Returns distinct count value
image

Add filter

Add a filter to your query by dragging & dropping an attribute to the "filter" panel. Click on the drop down to see available filter options. Alternatively, you can add a filter by right-clicking on the table values. Note: the filter specs will not be persisted in the query.

image image

Sort query results

To sort query results, navigate to the "result set modifier" and add filter conditions. Alternatively, you can sort the results directly in the table by clicking on the column headers. Note: the filter specs will not be persisted in the query.

image image image

Remove duplicates

To remove duplicate values to create a unique set of values, click on "result set modifier" and check the "Eliminate duplicate rows" box.

image

Restrict number of rows

To specify in your query how many rows you'd like to have returned go to the "result set modifier" and enter the desired value under "Limit results".

image

Resources

Deck

GraceHopper_Final.pdf

Data Model

Last Resort in case you get stuck during the exercises or your Legend instances crashes. Please refer to this document to see the entire data model code (including data queries). You can copy/paste this in your Studio project when hitting F8 to enter text mode and then F9 to compile. Make sure that the any other code is deleted before doing this.

Query Results

Query 1 - Part 1: Get the number of distinct countries that have production based carbon dioxide emission statistics. image

Query 1 - Part 2: Get the list of distinct countries that have production-based carbon dioxide emission statistics. image image

Query 2: Get the top 10 countries with highest carbon dioxide emissions in 2021. image image

Query 3: Get top 10 countries by average emissions across all years available in the data set. image image

Query 4: List all countries and their surface temperature change in 2020. image

Query 5: Get top 10 countries with largest surface temperature change between 2016 and 2021. image image

Query 6: Query and sort countries by average (1)surface temperature & then by average (2)carbon dioxide across all years since 2001 image

Legend Website

https://legend.finos.org/

Clone this wiki locally