
Data Validation w/ TensorFlow Data Validation & Preprocessing w/ Apache Beam (via TensorFlow Transform)


kaizenlabs/TensorFlow-Preprocessing-Apache-Beam


TensorFlow Data Validation & Pre-Processing w/ Apache Beam


[TensorFlow Extended](https://www.tensorflow.org/tfx/guide/tft)


Background

TensorFlow Data Validation (TFDV) helps developers understand, validate, and monitor their ML data at scale. TFDV is used to analyze and validate petabytes of data at Google every day, and has a proven track record in helping TFX users maintain the health of their ML pipelines.

When applying machine learning to real-world datasets, a lot of effort goes into preprocessing the data into a suitable format. This includes converting between formats, tokenizing and stemming text, forming vocabularies, and performing numerical operations such as normalization. tf.Transform handles all of these steps and uses Apache Beam to run the preprocessing.

Detailed developer documentation on TensorFlow Extended: https://www.tensorflow.org/tfx

Approach

In this Python 2.0 notebook, we use a CSV file containing air quality data for Belgrade. With TensorFlow Data Validation, we infer a schema that future datasets can be validated against, which protects the integrity of the data fed into the model. We also make a copy of the data and drop a column to show how anomalies are reported, and we define two environments (one for training, one for serving). The environments give the schema flexibility around our target: we're trying to predict "soot" in the air, given so2, no2, and pm10.
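The workflow above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the file paths passed in are hypothetical, and dropping the target column ("soot") to provoke an anomaly is an assumption, since the text does not say which column is dropped. The TFDV import is done lazily inside the function so the CSV helper can be used without the library installed.

```python
import csv

FEATURES = ["so2", "no2", "pm10"]  # input columns named in the text
TARGET = "soot"                    # column we want to predict


def write_copy_without_column(src_path, dst_path, column):
    """Write a copy of a CSV file with one column removed; return kept names."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        kept = [name for name in reader.fieldnames if name != column]
        # extrasaction="ignore" silently skips the dropped column in each row.
        writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
    return kept


def validate_against_inferred_schema(train_csv, new_csv, target=TARGET):
    """Infer a schema from train_csv and check new_csv against it."""
    # Imported lazily: TFDV is a heavy dependency.
    import tensorflow_data_validation as tfdv

    train_stats = tfdv.generate_statistics_from_csv(data_location=train_csv)
    schema = tfdv.infer_schema(statistics=train_stats)

    new_stats = tfdv.generate_statistics_from_csv(data_location=new_csv)
    # Without environments, a column missing from new_csv is typically
    # reported as an anomaly.
    anomalies = tfdv.validate_statistics(statistics=new_stats, schema=schema)

    # Declare two environments and mark the target as absent when serving.
    schema.default_environment.append("TRAINING")
    schema.default_environment.append("SERVING")
    tfdv.get_feature(schema, target).not_in_environment.append("SERVING")
    serving_anomalies = tfdv.validate_statistics(
        statistics=new_stats, schema=schema, environment="SERVING")
    return schema, anomalies, serving_anomalies
```

In a notebook, the two anomaly results can be inspected with `tfdv.display_anomalies(...)`. If the dropped column is the target, the second result should no longer flag it, because the schema now says the target is not expected in the SERVING environment.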

Later on, we create metadata from our schema, which Apache Beam can then use to perform data preprocessing transformations via tf.Transform.
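As an illustration of this step, the sketch below wraps a schema in `DatasetMetadata` and runs a tf.Transform `preprocessing_fn` through Beam on in-memory rows. Several things here are assumptions rather than the notebook's code: scaling the three inputs to [0, 1] is just one example of a transformation, the fallback feature spec treats every column as a scalar float, and a schema inferred by TFDV may need its feature shapes adjusted before it can be passed in as `schema`. Passing a list of dicts as input also depends on the installed tf.Transform version. The pure-Python `min_max_scale` helper is included only to show what the scaling computes over a full pass of the data.

```python
import tempfile

FEATURES = ["so2", "no2", "pm10"]  # input columns named in the text
TARGET = "soot"                    # column we want to predict


def min_max_scale(values):
    """Pure-Python reference: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values to scale")
    return [(v - lo) / (hi - lo) for v in values]


def preprocessing_fn(inputs):
    """tf.Transform callback: scale each input column, pass the target through."""
    import tensorflow_transform as tft  # lazy import: heavy dependency

    outputs = {name: tft.scale_to_0_1(inputs[name]) for name in FEATURES}
    outputs[TARGET] = inputs[TARGET]
    return outputs


def transform_rows(rows, schema=None):
    """Run preprocessing_fn over a list of row dicts using Apache Beam."""
    import tensorflow as tf
    import tensorflow_transform.beam as tft_beam
    from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

    if schema is None:
        # Assumed fallback: every column is a scalar float.
        feature_spec = {
            name: tf.io.FixedLenFeature([], tf.float32)
            for name in FEATURES + [TARGET]
        }
        schema = schema_utils.schema_from_feature_spec(feature_spec)
    raw_metadata = dataset_metadata.DatasetMetadata(schema)

    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
        transformed_dataset, transform_fn = (
            (rows, raw_metadata)
            | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
    transformed_rows, transformed_metadata = transformed_dataset
    return transformed_rows
```

A call such as `transform_rows([{"so2": 10.0, "no2": 20.0, "pm10": 30.0, "soot": 5.0}, ...])` would run the analyze and transform phases in one go. The min and max used for scaling come from a full pass over the dataset, which is why the work is expressed as a Beam pipeline rather than a per-row function.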

Links

  • KaizenTek - IT Consulting & Cloud Professional Services
