Problem installing TensorFlow on Xavier (Solved)

I installed the newly released JetPack 4.0 and tried to install TensorFlow.
I tried a couple of the installation methods listed on the TensorFlow website (pip and building from source), and neither worked.
pip: could not find a version that satisfies the requirement TensorFlow
build from source:
1. Bazel does not support Ubuntu 18.04, so I built Bazel from source.
2. TensorFlow could not be compiled successfully.
Could you suggest how to install TensorFlow on Xavier?
Thank you very much.

Deb

Hi guodebby,

Please see whether JasonAtNvidia/JetsonTFBuild, an assistance script to build TensorFlow on an NVIDIA Jetson module, helps:
https://github.com/JasonAtNvidia/JetsonTFBuild


I built tensorflow on Xavier.
https://github.com/naisy/JetsonXavier/tree/JetPack4.0_python2.7/JetPack4.0/python2.7/binary

It can be installed after joining the two parts with the cat command.

cat tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl.part1 tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl.part2 > tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl
pip install tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl
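
For anyone who would rather do the join step from Python than with cat, here is a minimal sketch. It assumes the two .part files are in the current directory and simply concatenates them in order, which is all cat does here. The helper name join_parts is made up for this example and is not part of the repository.

```python
import shutil
from pathlib import Path


def join_parts(part_paths, output_path):
    """Concatenate the part files, in the given order, into output_path."""
    output_path = Path(output_path)
    with output_path.open("wb") as out:
        for part in part_paths:
            with Path(part).open("rb") as src:
                # Stream the bytes so a large wheel is never held in memory at once.
                shutil.copyfileobj(src, out)
    return output_path


if __name__ == "__main__":
    wheel = "tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl"
    parts = [wheel + ".part1", wheel + ".part2"]
    if all(Path(p).exists() for p in parts):
        join_parts(parts, wheel)
        print("wrote", wheel)
    else:
        print("part files not found in the current directory")
```

The order of the parts matters: part1 must come before part2, otherwise pip will reject the resulting file.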

Parameters:
TF_NEED_JEMALLOC=1
TF_NEED_CUDA=1
TF_CUDA_COMPUTE_CAPABILITIES=7.2,6.2,5.3
TF_NEED_TENSORRT=1
TF_NCCL_VERSION=1
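
As a rough illustration of how parameters like these are typically used: the sketch below assumes they are exported as environment variables before TensorFlow's configure step runs. The build script linked in this post is the authoritative version; the helper names build_env and as_exports are hypothetical and exist only for this example.

```python
import os

# Values copied from the parameter list in this post.
BUILD_PARAMS = {
    "TF_NEED_JEMALLOC": "1",
    "TF_NEED_CUDA": "1",
    "TF_CUDA_COMPUTE_CAPABILITIES": "7.2,6.2,5.3",
    "TF_NEED_TENSORRT": "1",
    "TF_NCCL_VERSION": "1",
}


def build_env(params, base=None):
    """Return a copy of the base environment with the build parameters added."""
    env = dict(os.environ if base is None else base)
    env.update(params)
    return env


def as_exports(params):
    """Render the parameters as shell export lines, in insertion order."""
    return ["export {}={}".format(key, value) for key, value in params.items()]


if __name__ == "__main__":
    for line in as_exports(BUILD_PARAMS):
        print(line)
```

The dictionary returned by build_env could be passed as env= to subprocess.run when launching the configure step from the TensorFlow source tree, assuming that is how the build is driven.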

build script:
https://github.com/naisy/JetsonXavier/blob/JetPack4.0_python2.7/JetPack4.0/python2.7/scripts/build_tensorflow.sh

@naisy

Could you please build one for TensorFlow v1.8 with Python v3.6?

Hi AerialRoboticsGuru,

For python 3.6, I have only built 1.6, so I will update it.
https://github.com/naisy/JetsonXavier/tree/JetPack4.0_python3.6/JetPack4.0/python3.6/binary

I built tensorflow 1.8 with Python 3.6.
https://github.com/naisy/JetsonXavier/tree/JetPack4.0_python3.6/JetPack4.0/python3.6/binary

cat tensorflow-1.8.0-cp36-cp36m-linux_aarch64.whl.part1 tensorflow-1.8.0-cp36-cp36m-linux_aarch64.whl.part2 > tensorflow-1.8.0-cp36-cp36m-linux_aarch64.whl
pip3 install tensorflow-1.8.0-cp36-cp36m-linux_aarch64.whl

Parameters:
TF_NEED_JEMALLOC=1
TF_NEED_CUDA=1
TF_CUDA_COMPUTE_CAPABILITIES=7.2,6.2,5.3
TF_NEED_TENSORRT=1
TF_NCCL_VERSION=1

Hi Naisy,

I tried installing TensorFlow on the Jetson Xavier using

https://github.com/naisy/JetsonXavier/tree/JetPack4.0_python2.7

After an hour of installation, I got the following error.

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “Xavier”
CUDA Driver Version / Runtime Version 10.0 / 10.0
CUDA Capability Major/Minor version number: 7.2
Total amount of global memory: 15827 MBytes (16596103168 bytes)
( 8) Multiprocessors, ( 64) CUDA Cores/MP: 512 CUDA Cores
GPU Max Clock rate: 1500 MHz (1.50 GHz)
Memory Clock rate: 1500 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
arm64
./join_tensorflow_whl.sh: line 26: syntax error: unexpected end of file
./install_tensorflow.sh: line 27: syntax error: unexpected end of file

Hi ankitpurohit,

These files need a ‘fi’ to close the if statement.
I updated these files. Sorry.
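
For reference, "unexpected end of file" is what bash reports when a block such as if ... fi is still open at the end of the script. Running bash -n on a script checks its syntax without executing it. The sketch below is a deliberately naive Python check that counts if and fi keywords. It ignores quoting and heredocs, the function name unclosed_ifs is made up, and the sample script is hypothetical, not the actual content of join_tensorflow_whl.sh.

```python
import re

# Keywords that can precede a nested `if` on the same line.
LEADING_KEYWORDS = ("then", "else", "do")


def unclosed_ifs(script_text):
    """Return how many `if` keywords have no matching `fi`.

    Naive: only the first word of each command is inspected, and
    everything after a '#' is treated as a comment.
    """
    depth = 0
    for raw_line in script_text.splitlines():
        line = raw_line.split("#", 1)[0]
        for command in re.split(r"[;&|]+", line):
            words = command.split()
            while words and words[0] in LEADING_KEYWORDS:
                words = words[1:]
            if not words:
                continue
            if words[0] == "if":
                depth += 1
            elif words[0] == "fi":
                depth -= 1
    return depth


if __name__ == "__main__":
    # Hypothetical script with a missing `fi`.
    broken = 'if [ -f wheel.part1 ]; then\n  echo "joining"\n'
    print(unclosed_ifs(broken))
    print(unclosed_ifs(broken + "fi\n"))
```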

To continue manually:

cd JetsonXavier/JetPack4.0/python2.7/binary
cat tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl.part1 tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl.part2 > tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl
sudo su
pip install tensorflow-1.10.1-cp27-cp27mu-linux_aarch64.whl

Thank you for doing this. I would love to know how you built this because I tried and failed horribly.
I tried using this information:
https://github.com/JasonAtNvidia/JetsonTFBuild

but I could never get the build to complete successfully.

On my Mac I’m running Python v3.6.5, Keras v2.2.0, and Tensorflow v1.8.0. I created a LeNet architecture and I’m training on the MNIST dataset. Although slow it will complete the training. I’ve never had it fail.

When I tried the same experiment on the Jetson (Python v3.6.5, Keras v2.2.0, TensorFlow v1.10.0), the training failed about 50% of the time with the error “Input to reshape is a tensor with xxx values, but the requested shape has xxx.” I’m using the exact same program as on the Mac.

That is why I wanted to go back to Tensorflow v1.8.0 on the Jetson and repeat the experiment.
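
For readers unfamiliar with the reshape error quoted above: a reshape is only valid when the number of values in the input equals the product of the requested dimensions. The sketch below shows that rule in plain Python. It only illustrates what the message means and makes no claim about why the requested shape is sometimes wrong on the Jetson. The function names and the numbers in the example are illustrative.

```python
from functools import reduce
from operator import mul


def element_count(shape):
    """Number of elements implied by a shape tuple (1 for an empty tuple)."""
    return reduce(mul, shape, 1)


def check_reshape(num_values, shape):
    """Raise ValueError if num_values cannot fill the requested shape."""
    requested = element_count(shape)
    if num_values != requested:
        raise ValueError(
            "Input to reshape has {} values, but the requested shape has {}".format(
                num_values, requested
            )
        )
    return shape


if __name__ == "__main__":
    # 128 values fit a (128, 1) shape exactly.
    print(check_reshape(128, (128, 1)))
    try:
        # A shape with a zero dimension holds no elements, so this fails.
        check_reshape(128, (0,))
    except ValueError as err:
        print(err)
```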

Hi Naisy,

Thank you so much for your hard work.
I succeeded in installing TensorFlow.

This is the final line of installation.

Successfully installed absl-py-0.5.0 astor-0.7.1 backports.weakref-1.0.post1 gast-0.2.0 grpcio-1.15.0 markdown-3.0.1 numpy-1.14.5 protobuf-3.6.1 setuptools-39.1.0 tensorboard-1.10.0 tensorflow-1.10.1 termcolor-1.1.0 werkzeug-0.14.1

Thank you.

Hi Aerial Robotics Guru,

1. Requirements for building tensorflow:

  • numpy (pip package).
  • mock (pip package).
  • Java 8 is required for bazel. (Not required for TF execution)
  • bazel is required. (Not required for TF execution)

In addition, patches may be applied to the source code.
https://github.com/naisy/JetsonXavier/blob/JetPack4.0_python3.6/JetPack4.0/python3.6/scripts/build_tensorflow.sh
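
The requirements in item 1 can be checked before starting a long build. This is a minimal sketch, assuming the pip packages are importable as numpy and mock and that the tools are on PATH as java and bazel. It only checks presence, not versions (so it will not tell you whether Java is version 8), and the function name is hypothetical.

```python
import importlib.util
import shutil


def check_build_requirements(modules=("numpy", "mock"), tools=("java", "bazel")):
    """Map each requirement name to True if it was found on this machine."""
    found = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        found[name] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH.
        found[name] = shutil.which(name) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_build_requirements().items():
        print("{:<6} {}".format(name, "found" if ok else "MISSING"))
```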

2. Training MNIST data using LeNet model in Keras:
As far as I tried, there was no problem.

  • Environment
# remove naisy build tensorflow
pip3 uninstall tensorflow
# install official tensorflow
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp40 tensorflow-gpu
# install keras-2.2.0
pip3 install --upgrade keras==2.2.0
  • Source code (mnist_lenet.py)
# https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py
'''Trains a simple convnet on the MNIST dataset.

Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28


def LeNet(input_shape, num_classes):
    model = Sequential()
    model.add(Conv2D(20, kernel_size=5, strides=1,
                     activation='relu',
                     input_shape=input_shape))
    model.add(MaxPooling2D(2, strides=2))
    model.add(Conv2D(50, kernel_size=5, strides=1, activation='relu'))
    model.add(MaxPooling2D(2, strides=2))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(500, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.SGD(),
                  metrics=['accuracy'])
    return model


def default_cnn(input_shape, num_classes):
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    return model


# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

#model = default_cnn(input_shape, num_classes)
model = LeNet(input_shape, num_classes)

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
  • Training
python mnist_lenet.py 
  • Result
Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-10-03 05:34:14.234838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-10-03 05:34:14.235162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.46GiB freeMemory: 9.55GiB
2018-10-03 05:34:14.235332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-03 05:34:15.064031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-03 05:34:15.064223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-10-03 05:34:15.064312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-10-03 05:34:15.064639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9066 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
60000/60000 [==============================] - 12s 196us/step - loss: 1.4815 - acc: 0.5106 - val_loss: 0.3588 - val_acc: 0.9084
Epoch 2/12
60000/60000 [==============================] - 6s 92us/step - loss: 0.4331 - acc: 0.8666 - val_loss: 0.2002 - val_acc: 0.9450
Epoch 3/12
60000/60000 [==============================] - 5s 91us/step - loss: 0.2958 - acc: 0.9104 - val_loss: 0.1520 - val_acc: 0.9561
Epoch 4/12
60000/60000 [==============================] - 5s 91us/step - loss: 0.2391 - acc: 0.9277 - val_loss: 0.1248 - val_acc: 0.9622
Epoch 5/12
60000/60000 [==============================] - 5s 90us/step - loss: 0.2048 - acc: 0.9381 - val_loss: 0.1072 - val_acc: 0.9676
Epoch 6/12
60000/60000 [==============================] - 5s 90us/step - loss: 0.1834 - acc: 0.9453 - val_loss: 0.0963 - val_acc: 0.9724
Epoch 7/12
60000/60000 [==============================] - 5s 89us/step - loss: 0.1656 - acc: 0.9501 - val_loss: 0.0864 - val_acc: 0.9737
Epoch 8/12
60000/60000 [==============================] - 5s 89us/step - loss: 0.1541 - acc: 0.9541 - val_loss: 0.0790 - val_acc: 0.9762
Epoch 9/12
60000/60000 [==============================] - 5s 89us/step - loss: 0.1416 - acc: 0.9572 - val_loss: 0.0738 - val_acc: 0.9776
Epoch 10/12
60000/60000 [==============================] - 5s 88us/step - loss: 0.1339 - acc: 0.9593 - val_loss: 0.0683 - val_acc: 0.9786
Epoch 11/12
60000/60000 [==============================] - 5s 88us/step - loss: 0.1255 - acc: 0.9612 - val_loss: 0.0648 - val_acc: 0.9797
Epoch 12/12
60000/60000 [==============================] - 5s 88us/step - loss: 0.1204 - acc: 0.9641 - val_loss: 0.0614 - val_acc: 0.9807
Test loss: 0.06142731437981129
Test accuracy: 0.9807

Interesting. This is what I got the first time. Since then I have tried 2 other times with the same result.

Using TensorFlow backend.
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 11s 1us/step
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-10-03 12:14:29.103370: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-10-03 12:14:29.103925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.45GiB freeMemory: 9.51GiB
2018-10-03 12:14:29.104045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-03 12:14:30.841893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-03 12:14:30.842226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-10-03 12:14:30.842295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-10-03 12:14:30.842868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8951 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
60000/60000 [==============================] - 37s 613us/step - loss: 0.2631 - acc: 0.9188 - val_loss: 0.0581 - val_acc: 0.9804
Epoch 2/12
45824/60000 [=====================>........] - ETA: 4s - loss: 0.0915 - acc: 0.9737
Traceback (most recent call last):
  File "mnist_cnn.py", line 66, in <module>
    validation_data=(x_test, y_test))
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/engine/training.py", line 1042, in fit
    validation_steps=validation_steps)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2661, in __call__
    return self._call(inputs)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2631, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1382, in __call__
    run_metadata_ptr)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 128 values, but the requested shape has 0
  [[Node: training/Adadelta/gradients/loss/dense_2_loss/Sum_1_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _class=["loc:@training/Adadelta/gradients/loss/dense_2_loss/Sum_1_grad/Tile"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adadelta/gradients/loss/dense_2_loss/Neg_grad/Neg, training/Adadelta/gradients/loss/dense_2_loss/Sum_1_grad/DynamicStitch/_81)]]

And another attempt.

Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-10-03 12:39:22.554559: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-10-03 12:39:22.554937: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.45GiB freeMemory: 8.54GiB
2018-10-03 12:39:22.555035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-03 12:39:24.011406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-03 12:39:24.011757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-10-03 12:39:24.011977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-10-03 12:39:24.012565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8082 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
60000/60000 [==============================] - 26s 433us/step - loss: 1.6841 - acc: 0.4387 - val_loss: 0.4450 - val_acc: 0.8889
Epoch 2/12
47360/60000 [======================>.......] - ETA: 3s - loss: 0.5245 - acc: 0.8352
Traceback (most recent call last):
  File "mnist_cnn.py", line 90, in <module>
    validation_data=(x_test, y_test))
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/engine/training.py", line 1042, in fit
    validation_steps=validation_steps)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2661, in __call__
    return self._call(inputs)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2631, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1382, in __call__
    run_metadata_ptr)
  File "/home/nvidia/.virtualenvs/dl4cv/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 128 values, but the requested shape has 0
  [[Node: training/SGD/gradients/loss/dense_2_loss/Sum_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _class=["loc:@training/SGD/gradients/loss/dense_2_loss/Sum_grad/Tile"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/loss/dense_2_loss/truediv_grad/Sum_1, training/SGD/gradients/loss/dense_2_loss/Sum_grad/DynamicStitch/_63)]]

Hi AerialRoboticsGuru,

I tried it with JetPack 4.1 DP, which was released today, and there was no problem.
I do not use virtualenv; could that be the difference?

Hi naisy,

I’m still on the original JetPack4.0. Initially I thought the virtualenv may be causing the problem. I ran a few tests yesterday. I deleted my original virtual environment and created a fresh one. I ran the mnist_lenet.py program both within the virtual environment and outside.

without virtual environment
pass - 8
fail - 2

with virtual environment
pass - 9
fail - 1

My conclusion is that they are the same. I could repeat and try again with TensorFlow 1.8. However, at this time, if I can get an 80-90% success rate then I am okay with it. As I mentioned before, on my Mac I’ve never seen a training failure. There may still be an issue with TensorFlow running on the ARM architecture. I appreciate your efforts.
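
As a quick sanity check on the "they are the same" conclusion, the counts above (8 pass / 2 fail without the virtual environment, 9 pass / 1 fail with it) can be put through a two-sided Fisher's exact test. The choice of test is an assumption made for this example and was not part of the original experiment. The sketch uses only the standard library (math.comb needs Python 3.8 or later).

```python
from math import comb


def fisher_exact_two_sided(pass_a, fail_a, pass_b, fail_b):
    """Two-sided Fisher's exact test p-value for a 2x2 pass/fail table."""
    n_a = pass_a + fail_a
    n_b = pass_b + fail_b
    fails = fail_a + fail_b
    total = n_a + n_b

    def prob(k):
        # Hypergeometric probability that exactly k failures fall in group A.
        return comb(fails, k) * comb(total - fails, n_a - k) / comb(total, n_a)

    lowest = max(0, fails - n_b)
    highest = min(fails, n_a)
    observed = prob(fail_a)
    # Sum every outcome that is no more likely than the observed one.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    p_value = fisher_exact_two_sided(8, 2, 9, 1)
    print("failure rate without venv:", 2 / 10)
    print("failure rate with venv:", 1 / 10)
    print("two-sided p-value:", round(p_value, 4))
```

With only 10 runs per group the p-value comes out at 1.0, so these counts give no evidence that the virtual environment makes a difference, which is consistent with the conclusion above.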

Hi naisy,

Did you remove the link for v1.8?

https://github.com/naisy/JetsonXavier/tree/JetPack4.0_python3.6/JetPack4.0/python3.6/binary

Hi AerialRoboticsGuru,

Sorry, it has moved to JetPack4.1 now.
https://github.com/naisy/JetsonXavier/tree/JetPack4.1_python3.6/JetPack4.1/python3.6/binary
It is the same binary as JetPack4.0.

I will rebuild the repository, but I think that it can be traced from this URL.

Hi naisy,

Thanks for the TensorFlow build. Over the weekend I upgraded to JetPack 4.1 and now have TensorFlow v1.8.0 installed. Unfortunately it doesn’t seem to have fixed my original error. TensorFlow still fails in approximately 1 out of 10 runs of my model. At this point I’m just moving on.

Thanks,
Andrew

Thanks @naisy,

I was able to install the TF binary from https://developer.download.nvidia.com/compute/redist/jp/v44/tensorflow/
using the guide provided for JetPack 4.4 DP, "Installing TensorFlow for Jetson Platform" (NVIDIA Deep Learning Frameworks Documentation).

I created one virtual environment for TensorFlow 1.15 and another for 2.1, and ran the MNIST training test above in each.

A.) Tensorflow 1.15.2+nv20.4 and Keras 2.2.4

name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.377
pciBusID: 0000:00:00.0
2020-05-28 08:30:34.817542: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-28 08:30:34.829256: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-05-28 08:30:34.840522: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-05-28 08:30:34.844160: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-05-28 08:30:34.855435: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-05-28 08:30:34.862722: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-05-28 08:30:34.863544: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-05-28 08:30:34.863877: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:30:34.864209: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:30:34.864295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-05-28 08:30:34.864432: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-28 08:30:36.604711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-28 08:30:36.604872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-05-28 08:30:36.604987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-05-28 08:30:36.605639: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:30:36.605954: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:30:36.606213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23419 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
WARNING:tensorflow:From /home/nv/.virtualenvs/tf1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /home/nv/.virtualenvs/tf1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /home/nv/.virtualenvs/tf1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
2020-05-28 08:30:46.768505: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-05-28 08:30:48.003721: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
60000/60000 [==============================] - 30s 495us/step - loss: 1.4332 - acc: 0.5256 - val_loss: 0.3754 - val_acc: 0.8997
Epoch 2/12
60000/60000 [==============================] - 9s 153us/step - loss: 0.4515 - acc: 0.8605 - val_loss: 0.2204 - val_acc: 0.9361
Epoch 3/12
60000/60000 [==============================] - 9s 153us/step - loss: 0.3161 - acc: 0.9019 - val_loss: 0.1652 - val_acc: 0.9497
Epoch 4/12
60000/60000 [==============================] - 9s 151us/step - loss: 0.2524 - acc: 0.9224 - val_loss: 0.1372 - val_acc: 0.9579
Epoch 5/12
60000/60000 [==============================] - 9s 150us/step - loss: 0.2158 - acc: 0.9340 - val_loss: 0.1164 - val_acc: 0.9631
Epoch 6/12
60000/60000 [==============================] - 9s 149us/step - loss: 0.1895 - acc: 0.9415 - val_loss: 0.1018 - val_acc: 0.9681
Epoch 7/12
60000/60000 [==============================] - 9s 149us/step - loss: 0.1707 - acc: 0.9477 - val_loss: 0.0904 - val_acc: 0.9719
Epoch 8/12
60000/60000 [==============================] - 9s 149us/step - loss: 0.1586 - acc: 0.9513 - val_loss: 0.0859 - val_acc: 0.9730
Epoch 9/12
60000/60000 [==============================] - 9s 149us/step - loss: 0.1441 - acc: 0.9562 - val_loss: 0.0763 - val_acc: 0.9752
Epoch 10/12
60000/60000 [==============================] - 9s 148us/step - loss: 0.1380 - acc: 0.9570 - val_loss: 0.0708 - val_acc: 0.9781
Epoch 11/12
60000/60000 [==============================] - 9s 150us/step - loss: 0.1271 - acc: 0.9614 - val_loss: 0.0663 - val_acc: 0.9787
Epoch 12/12
60000/60000 [==============================] - 9s 149us/step - loss: 0.1252 - acc: 0.9611 - val_loss: 0.0636 - val_acc: 0.9803
Test loss: 0.06360115777691826
Test accuracy: 0.9803

B.) Tensorflow 2.1.0+nv20.4 and Keras 2.2.4

pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.17GiB deviceMemoryBandwidth: 82.08GiB/s
2020-05-28 08:34:44.917703: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-28 08:34:44.917834: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-05-28 08:34:44.920948: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-05-28 08:34:44.921789: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-05-28 08:34:44.925813: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-05-28 08:34:44.928751: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-05-28 08:34:44.928902: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-05-28 08:34:44.929129: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:34:44.929415: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:34:44.929522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-28 08:34:44.929653: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-05-28 08:34:48.372985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-28 08:34:48.373151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-05-28 08:34:48.373230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-05-28 08:34:48.374116: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:34:48.374413: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-05-28 08:34:48.374735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20798 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
WARNING:tensorflow:From /home/nv/.virtualenvs/tf2/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /home/nv/.virtualenvs/tf2/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /home/nv/.virtualenvs/tf2/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
2020-05-28 08:34:55.436745: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-05-28 08:34:56.412336: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
60000/60000 [==============================] - 23s 389us/step - loss: 1.4081 - acc: 0.5403 - val_loss: 0.3864 - val_acc: 0.8945
Epoch 2/12
60000/60000 [==============================] - 7s 122us/step - loss: 0.4643 - acc: 0.8542 - val_loss: 0.2252 - val_acc: 0.9362
Epoch 3/12
60000/60000 [==============================] - 8s 132us/step - loss: 0.3197 - acc: 0.9018 - val_loss: 0.1689 - val_acc: 0.9505
Epoch 4/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.2579 - acc: 0.9205 - val_loss: 0.1375 - val_acc: 0.9583
Epoch 5/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.2193 - acc: 0.9325 - val_loss: 0.1208 - val_acc: 0.9632
Epoch 6/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.1957 - acc: 0.9401 - val_loss: 0.1067 - val_acc: 0.9676
Epoch 7/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.1744 - acc: 0.9469 - val_loss: 0.0939 - val_acc: 0.9709
Epoch 8/12
60000/60000 [==============================] - 9s 147us/step - loss: 0.1648 - acc: 0.9504 - val_loss: 0.0867 - val_acc: 0.9740
Epoch 9/12
60000/60000 [==============================] - 9s 147us/step - loss: 0.1517 - acc: 0.9540 - val_loss: 0.0810 - val_acc: 0.9756
Epoch 10/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.1408 - acc: 0.9562 - val_loss: 0.0748 - val_acc: 0.9768
Epoch 11/12
60000/60000 [==============================] - 9s 145us/step - loss: 0.1328 - acc: 0.9594 - val_loss: 0.0696 - val_acc: 0.9779
Epoch 12/12
60000/60000 [==============================] - 9s 146us/step - loss: 0.1270 - acc: 0.9610 - val_loss: 0.0660 - val_acc: 0.9795
Test loss: 0.06595891129858791
Test accuracy: 0.9795