Load data from the Million Song Dataset into a final dimensional model stored in S3.
Updated May 17, 2020 - Python
A command line tool for inspecting parquet files with PyArrow.
UniParc dataset describing ~300 million protein sequences converted into relational tables accessible through Google BigQuery (and as Parquet files).
Project on MapReduce for the Μ111 - Big Data Management course, NKUA, Spring 2023.
A Quarto notebook requesting a parquet file stored in S3
Projects on managing large data sets (Data Science)
Helps you visualize Hadoop file formats.
Glue Data Quality Example - Deploy to your AWS Account w/ Terraform to test
Data Engineering project on how to build a data lake on S3 using the Chicago Taxi Dataset
Merge Parquet Files on S3 with this AWS Lambda Function
Create files in formats such as "orc", "parquet", "xlsx", and "json" with Python
Processing and exporting data from EPW files into other formats.
A fast and simple command-line (CLI) tool to convert a Parquet file to an Apache Arrow file
Upstream classifier image preprocessing
Scrapes the data from rpi-imager-stats daily
Udacity Data Engineering Nanodegree Program - My Submission of Project: Data Lake
Simple utility package to convert EDF/EDF+ files into Apache Parquet format.
Integrative Project: Big Data | Henry Bootcamp: Data Science Track | Cohort DataFT 17