Python | Tensorflow Benchmark Class | tf.test.Benchmark()

Chirag Shilwant
3 min read · Jul 6, 2021


Tensorflow is an open-source machine learning library developed by Google. One of its applications is to develop deep neural networks.

tf.test.Benchmark is an abstract class that provides helpers for TensorFlow benchmarks. It can be used to compare different versions of a function or library.

Syntax: tf.test.Benchmark()

Methods:

run_op_benchmark

Run an op or tensor in the given session. Report the results.

Syntax:
run_op_benchmark(
    sess, op_or_tensor, feed_dict=None, burn_iters=2, min_iters=10,
    store_trace=False, store_memory_usage=True, name=None, extras=None, mbs=0
)
Args:

sess: Session object to use for timing.
op_or_tensor: Operation or Tensor to benchmark.
feed_dict: A dict of values to feed for each op iteration (see the feed_dict parameter of Session.run).
burn_iters: Number of burn-in iterations to run.
min_iters: Minimum number of iterations to use for timing.
store_trace: Boolean, whether to run an extra untimed iteration and store the trace of that iteration in the returned extras. The trace will be stored as a string in Google Chrome trace format in the extras field "full_trace_chrome_format". Note that the trace will not be stored in the test_log_pb2.TestResults proto.
store_memory_usage: Boolean, whether to run an extra untimed iteration, calculate memory usage, and store that in the extras fields.
name: (optional) Override the BenchmarkEntry name with name. Otherwise it is inferred from the top-level method name.
extras: (optional) Dict mapping string keys to additional benchmark info. Values may be either floats or values that are convertible to strings.
mbs: (optional) The number of megabytes moved by this op, used to calculate the op's throughput.
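To build intuition for what these arguments control, here is a minimal, hypothetical pure-Python sketch of the timing pattern that run_op_benchmark follows: run burn_iters untimed warm-up iterations, then time min_iters iterations and report statistics. The function name simple_benchmark is invented for illustration, and details differ from TensorFlow's actual implementation (for instance, the reported wall_time there is not simply the mean, as the sample outputs later in this article show).

```python
import statistics
import time

def simple_benchmark(fn, burn_iters=2, min_iters=10, mbs=0):
    # Hypothetical sketch of run_op_benchmark's timing pattern.
    # Burn-in iterations run untimed to warm caches and lazy initialization.
    for _ in range(burn_iters):
        fn()
    # Timed iterations: record the wall-clock duration of each run.
    deltas = []
    for _ in range(min_iters):
        start = time.time()
        fn()
        deltas.append(time.time() - start)
    wall_time = statistics.mean(deltas)
    return {
        "iters": min_iters,
        "wall_time": wall_time,
        "extras": {
            "wall_time_mean": wall_time,
            "wall_time_stdev": statistics.stdev(deltas),
        },
        # Throughput is derived from mbs (megabytes moved per run), if given.
        "throughput": (mbs / wall_time) if mbs else 0.0,
    }

result = simple_benchmark(lambda: sum(range(10_000)))
print(result["iters"], result["throughput"])
```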

Example

Let us consider a simple operation, transposed matrix multiplication (i.e., A * B.T), to understand how to use the above class and method for benchmarking.

We will use two different functions to compute A * B.T and benchmark both of them to compare their results. The first uses tf.linalg.matmul with transpose_b=True. The second does a manual transpose of B using tf.transpose and then multiplies with A using tf.linalg.matmul.
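Before benchmarking, it helps to see why the two formulations are equivalent: each entry of A * B.T is the dot product of a row of A with a row of B, which is exactly what a fused transpose_b kernel exploits. The helpers matmul_bt, transpose, and matmul below are illustrative pure-Python functions (not TensorFlow APIs) checking this identity on a tiny example:

```python
def matmul_bt(a, b):
    # A @ B.T: each output entry is the dot product of a row of A
    # with a row of B, so both operands are read row-wise.
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]

def transpose(m):
    # Swap rows and columns of a nested-list matrix.
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Naive matrix product of nested-list matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in transpose(b)]
            for row in a]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# The fused form and the explicit-transpose form agree.
assert matmul_bt(A, B) == matmul(A, transpose(B))
print(matmul_bt(A, B))  # [[17, 23], [39, 53]]
```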

Note: The code snippets below will not run on an online IDE. Try running them on Google Colab or locally with TensorFlow installed on the system.

Code in Python 3

1. Benchmarking the first function (tf.linalg.matmul with transpose_b=True)

import tensorflow as tf

def matmul_transpose(x, y):
    return tf.matmul(x, y, transpose_b=True)

def get_args(i=1024, j=1024, k=1024):
    return tf.random.normal((i, j)), tf.random.normal((k, j))

def benchmark_matmul_impl(f, **kwargs):
    with tf.Graph().as_default() as graph:
        x, y = get_args(**kwargs)
        output = f(x, y)
        with tf.compat.v1.Session(graph=graph) as sess:
            bm = tf.test.Benchmark()
            bm_result = bm.run_op_benchmark(sess, output)
            return bm_result

# Passing matmul_transpose, which is a function, as an argument.
results = benchmark_matmul_impl(matmul_transpose)
print(results)

Output:

INFO:tensorflow:Benchmark [TensorFlowBenchmark.run_op_benchmark] iters: 10, wall_time: 0.0728269, cpu_time: -1,throughput: 0, extras: {'allocator_maximum_num_bytes_cpu': 12582912}, metrics: None
entry {
name: "TensorFlowBenchmark.run_op_benchmark"
iters: 10
wall_time: 0.07282686233520508
extras {
key: "allocator_maximum_num_bytes_cpu"
value {
double_value: 12582912.0
}
}
}
{'iters': 10, 'wall_time': 0.07282686233520508, 'extras': {'allocator_maximum_num_bytes_cpu': 12582912, 'wall_time_mean': 0.07259857654571533, 'wall_time_stdev': 0.0007383372491640008}, 'name': None, 'throughput': 0.0}

2. Benchmarking the second function (manual transpose of B using tf.transpose, then multiplication with A)

import tensorflow as tf

def matmul_manual_transpose(x, y):
    return tf.matmul(x, tf.transpose(y, (1, 0)))

def get_args(i=1024, j=1024, k=1024):
    return tf.random.normal((i, j)), tf.random.normal((k, j))

def benchmark_matmul_impl(f, **kwargs):
    with tf.Graph().as_default() as graph:
        x, y = get_args(**kwargs)
        output = f(x, y)
        with tf.compat.v1.Session(graph=graph) as sess:
            bm = tf.test.Benchmark()
            bm_result = bm.run_op_benchmark(sess, output)
            return bm_result

# Passing matmul_manual_transpose, which is a function, as an argument.
results = benchmark_matmul_impl(matmul_manual_transpose)
print(results)

Output:

INFO:tensorflow:Benchmark [TensorFlowBenchmark.run_op_benchmark] iters: 10, wall_time: 0.071124, cpu_time: -1,throughput: 0, extras: {'allocator_maximum_num_bytes_cpu': 12582920}, metrics: None
entry {
name: "TensorFlowBenchmark.run_op_benchmark"
iters: 10
wall_time: 0.07112395763397217
extras {
key: "allocator_maximum_num_bytes_cpu"
value {
double_value: 12582920.0
}
}
}
{'iters': 10, 'wall_time': 0.07112395763397217, 'extras': {'allocator_maximum_num_bytes_cpu': 12582920, 'wall_time_mean': 0.07271370887756348, 'wall_time_stdev': 0.00417619888858891}, 'name': None, 'throughput': 0.0}

Comparison:

Running the above code on a CPU, tf.linalg.matmul with transpose_b=True gave a wall time of 0.0728269, whereas the manual transpose of B using tf.transpose followed by multiplication with A gave a wall time of 0.071124. Both approaches therefore performed roughly equally well, with the second approach (manual transpose of B, then multiplication with A) taking slightly less time.
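To put the gap in perspective, we can compute the relative difference from the two wall times reported above (numbers taken from the sample runs; your results will vary):

```python
wall_time_fused = 0.0728269   # tf.linalg.matmul with transpose_b=True
wall_time_manual = 0.071124   # tf.transpose followed by tf.linalg.matmul

# Relative speedup of the manual-transpose approach on this particular run.
speedup_pct = (wall_time_fused - wall_time_manual) / wall_time_fused * 100
print(f"{speedup_pct:.1f}%")  # 2.3%
```

Note that the second run's reported wall_time_stdev (about 0.0042 s) is larger than the absolute gap (about 0.0017 s), so this difference is within run-to-run noise, consistent with the conclusion that the two approaches perform about equally.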
