System information

  • OS Platform and Distribution: Linux Ubuntu 16.04, Intel(R) Core(TM) i3-8100 CPU

  • TensorFlow installed from (source or binary): binary

  • TensorFlow version (use command below): 1.12.0

  • Python version: python-2.7.12

  • Bazel version: N/A (installed from binary)

  • GCC/Compiler version: N/A (installed from binary)

  • CUDA/cuDNN version: N/A (CPU version)

  • GPU model and memory: N/A (CPU version)

Describe the current behavior

Downloaded "Mobilenet_V1_1.0_224" from https://www.tensorflow.org/lite/models and ran a latency test on both the original frozen model file and the .tflite file converted by the TensorFlow Lite tools.
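For reference, a minimal sketch of the conversion step (assuming the TF 1.12 tf.contrib.lite.TFLiteConverter API; the file paths are placeholders):

# Conversion sketch: frozen graph -> .tflite (paths are placeholders).
import tensorflow as tf

converter = tf.contrib.lite.TFLiteConverter.from_frozen_graph(
    'mobilenet_v1_1.0_224/frozen_graph.pb',
    input_arrays=['input'],
    output_arrays=['MobilenetV1/Predictions/Reshape_1'])
tflite_model = converter.convert()
with open('mobilenet_v1_1.0_224.tflite', 'wb') as f:
    f.write(tflite_model)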

Frozen model test

import timeit

import tensorflow as tf

# Import the graph from the frozen model file
# (frozen_model_file is the path to the downloaded .pb).
with tf.gfile.GFile(frozen_model_file, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Session() as sess:
    tf.import_graph_def(graph_def, name='')

    input_tensor = sess.graph.get_tensor_by_name('input:0')
    output_tensor = sess.graph.get_tensor_by_name('MobilenetV1/Predictions/Reshape_1:0')

    test_case = load_test_data(data_file, resize)
    # Warm-up run so one-time initialization is not timed.
    res = sess.run(output_tensor, feed_dict={input_tensor: test_case})

    s = timeit.default_timer()
    for i in range(1000):
        res = sess.run(output_tensor, feed_dict={input_tensor: test_case})
    e = timeit.default_timer()
    print("avg cost {} s".format((e - s) / 1000))

I got an average cost of 0.0266642200947 s.
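Both tests call a small helper, load_test_data; for completeness, a hypothetical sketch of it (assuming a single image, resized to the network input size and normalized to [-1, 1] as the float MobileNet V1 expects):

import numpy as np
from PIL import Image

def load_test_data(data_file, resize):
    # Load one image, resize it to the network input size, map the
    # pixels to [-1, 1] (float MobileNet V1 preprocessing), and add
    # a batch dimension -> shape (1, H, W, 3), dtype float32.
    img = Image.open(data_file).convert('RGB').resize(resize)
    arr = np.asarray(img, dtype=np.float32) / 127.5 - 1.0
    return np.expand_dims(arr, axis=0)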

TFLite model test (converted by the TensorFlow Lite converter)

import timeit

import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate tensors
# (model_file is the path to the converted .tflite file).
interpreter = tf.contrib.lite.Interpreter(model_path=model_file)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

print(input_details)
print(output_details)

# Prepare the input data (a real test image; the random-data variant is commented out below).
input_shape = input_details[0]['shape']
input_size = (input_shape[1], input_shape[2])
print("size: ", input_size)

# input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
input_data = load_test_data(data_file, resize)

interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])

print("Got {} by prob: {}".format(labels[np.argmax(output_data)], np.max(output_data)))

s = timeit.default_timer()
for i in range(200):
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])
e = timeit.default_timer()
print("avg cost {} s".format((e - s)/200))
print("Got {} by prob: {}".format(labels[np.argmax(output_data)], np.max(output_data)))

Then I got an average cost of 0.0471310245991 s, and for the quantized model I even got an average cost of 0.133453620672 s.
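For the quantized model run the input has to be uint8 rather than float32; a hedged sketch of how the feed is adapted (assuming input_details exposes the input's (scale, zero_point) quantization parameters; quantized_input is a hypothetical name):

import numpy as np

# The quantized model expects uint8 input; map the float32 test image
# through the input's (scale, zero_point) before setting the tensor,
# using the standard mapping q = real / scale + zero_point.
scale, zero_point = input_details[0]['quantization']
quantized_input = (input_data / scale + zero_point).astype(np.uint8)
interpreter.set_tensor(input_details[0]['index'], quantized_input)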

I expected the latency to improve, but it seems to be the opposite.

Do the TensorFlow Lite tools only optimize models for specific embedded platforms, such as those the official benchmarks run on? Or did I do something wrong that causes this performance degradation?