
tf_cnn_benchmarks: High performance benchmarks

tf_cnn_benchmarks contains implementations of several popular convolutional models and is designed to be as fast as possible. It supports running on a single machine or in distributed mode across multiple hosts. See the High-Performance Models guide for more information.

These models utilize many of the strategies in the TensorFlow Performance Guide. Benchmark results can be found here.

These models are designed for performance. For models that have clean and easy-to-read implementations, see the TensorFlow Official Models.

Getting Started

To run ResNet50 with synthetic data, without distortions, on a single GPU, run

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet50 --variable_update=parameter_server

Note that the master branch of tf_cnn_benchmarks requires the latest nightly version of TensorFlow. You can install the nightly version by running pip install tf-nightly-gpu in a clean environment, or by installing TensorFlow from source. We sometimes create a branch of tf_cnn_benchmarks, in the form cnn_tf_vX.Y_compatible, that is compatible with TensorFlow version X.Y. For example, branch cnn_tf_v1.9_compatible works with TensorFlow 1.9.
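
For example, to use the benchmarks with a released TensorFlow version rather than a nightly build, you could install that release and check out the matching compatibility branch (a sketch using the TensorFlow 1.9 branch named above):

# Install the matching TensorFlow release, then switch to the compatible branch
pip install tensorflow-gpu==1.9
git checkout cnn_tf_v1.9_compatible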

Some important flags are listed below; an example command that combines several of them follows the list.

  • model: Model to use, e.g. resnet50, inception3, vgg16, and alexnet.
  • num_gpus: Number of GPUs to use.
  • data_dir: Path to the data to process. If not set, synthetic data is used. To use ImageNet data, use these instructions as a starting point.
  • batch_size: Batch size for each GPU.
  • variable_update: The method for managing variables: parameter_server, replicated, distributed_replicated, or independent.
  • local_parameter_device: Device to use as parameter server: cpu or gpu.
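
As an illustration, the following combines several of these flags; the specific values (vgg16, 4 GPUs, batch size 64, CPU parameter server) are arbitrary choices for this sketch:

python tf_cnn_benchmarks.py --num_gpus=4 --batch_size=64 --model=vgg16 \
--variable_update=parameter_server --local_parameter_device=cpu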

To see the full list of flags, run python tf_cnn_benchmarks.py --help.

To run ResNet50 with real data on 8 GPUs, run:

python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
--model=resnet50 --optimizer=momentum --variable_update=replicated \
--nodistortions --gradient_repacking=8 --num_gpus=8 \
--num_epochs=90 --weight_decay=1e-4 --data_dir=${DATA_DIR} --use_fp16 \
--train_dir=${CKPT_DIR}

This will train a ResNet-50 model on ImageNet with an effective batch size of 2048 (256 per GPU across 8 GPUs). The model should train to around 76% accuracy.
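
To check the accuracy of the resulting checkpoints, the script can also be run in evaluation mode. This is a sketch, assuming your checkout supports the --eval flag, and reusing the directories from the training command above:

python tf_cnn_benchmarks.py --model=resnet50 --num_gpus=8 --batch_size=256 \
--data_dir=${DATA_DIR} --train_dir=${CKPT_DIR} --eval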

Running the tests

To run the tests, run

pip install portpicker
python run_tests.py && python run_tests.py --run_distributed_tests

Note the tests require portpicker.

The command above runs a subset of tests that is both fast and fairly comprehensive. Alternatively, all the tests can be run, but this will take a long time:

python run_tests.py --full_tests && python run_tests.py --full_tests --run_distributed_tests

We will run all tests on every PR before merging them, so it is not necessary to pass --full_tests when running tests yourself.

To run an individual test, such as method testParameterServer of test class TfCnnBenchmarksTest of module benchmark_cnn_test, run

python -m unittest -v benchmark_cnn_test.TfCnnBenchmarksTest.testParameterServer

About

I created this repository out of curiosity about how well Apple Metal performs on sample deep learning models compared to Nvidia CUDA.
