TensorFlow with SeaweedFS

Simplest Example

import tensorflow as tf
import os

os.environ["S3_ENDPOINT"] = "http://localhost:8333"

# ...

train_dataset = tf.data.TFRecordDataset(filenames=[
    "s3://bucketname/path/to/file1.tfrecord",
    "s3://bucketname/path/to/file2.tfrecord",
]).map(record_parser).batch(BATCH_SIZE)

# ...

model.fit(train_dataset, ...)

TensorFlow on SeaweedFS S3

TensorFlow already supports S3 (Old Link)

Here is an adaptation of it with unnecessary content removed.

Configuration

When reading or writing data on S3 with your TensorFlow program, the behavior can be controlled by various environment variables:

S3_ENDPOINT: Overrides the default S3 endpoint. For SeaweedFS, set it to the address of the S3 gateway, for example http://localhost:8333.

To read or write objects in a bucket that is not publicly accessible, AWS credentials must be provided through one of the following methods:

Set credentials in the AWS credentials profile file on the local system, located at: ~/.aws/credentials on Linux, macOS, or Unix, or C:\Users\USERNAME\.aws\credentials on Windows.
Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
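As a quick check of which of the two methods above is in effect on a machine, a minimal sketch along these lines can help. The function name `credential_source`, the `default` profile name, and the order of the checks are assumptions made for illustration; this is not part of TensorFlow or SeaweedFS.

```python
import configparser
import os
from pathlib import Path


def credential_source(env=None, credentials_file=None, profile="default"):
    """Return "environment", "profile file", or None (hypothetical helper)."""
    env = os.environ if env is None else env

    # Method 2 from the list above: both environment variables are set.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"

    # Method 1 from the list above: the credentials profile file.
    if credentials_file is None:
        credentials_file = Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(credentials_file)  # a missing file is silently ignored
    if (
        parser.has_section(profile)
        and parser.has_option(profile, "aws_access_key_id")
        and parser.has_option(profile, "aws_secret_access_key")
    ):
        return "profile file"

    return None


print(credential_source())
```

A result of `None` means that neither method is configured, which is only sufficient when the bucket is publicly accessible.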

Example Setup

Using the above information, we can configure TensorFlow to communicate with an S3 endpoint by setting the following environment variables:

S3_ENDPOINT=http://localhost:8333
AWS_ACCESS_KEY_ID=XXXXX                 # Credentials if configured
AWS_SECRET_ACCESS_KEY=XXXXX
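The same variables can also be set from Python before any S3 path is opened. The sketch below assumes a SeaweedFS S3 gateway at http://localhost:8333, as in the example above; `seaweedfs_s3_env` is a hypothetical helper name, not an existing API.

```python
import os


def seaweedfs_s3_env(endpoint="http://localhost:8333", access_key=None, secret_key=None):
    """Build the environment variables shown above as a plain dict (hypothetical helper)."""
    env = {"S3_ENDPOINT": endpoint}
    # Credentials are only added when both values are supplied,
    # i.e. when the S3 gateway has credentials configured.
    if access_key is not None and secret_key is not None:
        env["AWS_ACCESS_KEY_ID"] = access_key
        env["AWS_SECRET_ACCESS_KEY"] = secret_key
    return env


# Apply the settings to the current process before TensorFlow accesses S3.
os.environ.update(seaweedfs_s3_env())
print(os.environ["S3_ENDPOINT"])
```

Keeping the values in a dict makes it easy to pass the same settings to a child process, for example through the `env` argument of `subprocess.run`.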

Usage

Once setup is completed, TensorFlow can interact with S3 in a variety of ways. Anywhere there is a TensorFlow IO function, an S3 URL can be used.
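As an illustration of using an S3 URL with a TensorFlow IO function, the following sketch writes a small text object and reads it back through `tf.io.gfile.GFile`. The function name `roundtrip` and the object path are placeholders chosen for this example, and the `gfile` parameter is an assumption added so the logic can be tried against local files without TensorFlow installed.

```python
def roundtrip(path, payload, gfile=None):
    """Write payload to path, read it back, and return what was read."""
    if gfile is None:
        # Imported lazily so the function can be tested with a stand-in module.
        import tensorflow as tf
        gfile = tf.io.gfile

    with gfile.GFile(path, "w") as f:
        f.write(payload)
    with gfile.GFile(path, "r") as f:
        return f.read()


# Against a running SeaweedFS S3 gateway (the bucket and path are placeholders):
# print(roundtrip("s3://bucketname/path/to/hello.txt", "hello"))
```

If the call returns the same string that was written, both reading and writing through the configured endpoint work.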

Smoke Test

To test your setup, stat a file:

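from tensorflow.python.lib.io import file_io

# Stat a path in the bucket to confirm that TensorFlow can reach the S3 endpoint
print(file_io.stat('s3://bucketname/path/'))

You should see output similar to this (the exact representation depends on the TensorFlow version):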

<tensorflow.python.pywrap_tensorflow_internal.FileStatistics; proxy of <Swig Object of type 'tensorflow::FileStatistics *' at 0x10c2171b0> >

Reading Data

import tensorflow as tf

filenames = ["s3://bucketname/path/to/file1.tfrecord",
             "s3://bucketname/path/to/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)

TensorFlow Tools

Many TensorFlow tools, such as TensorBoard or model serving, can also take S3 URLs as arguments:

tensorboard --logdir s3://bucketname/path/to/model/
tensorflow_model_server --port=9000 --model_name=model --model_base_path=s3://bucketname/path/to/model/export/

This enables an end-to-end workflow using S3 for all data needs.
