Hello!
I just recently got the Jetson TX2 developer kit, and I would really like to use TensorFlow on it. I followed JetsonHacks’ tutorials on installing, and I had no problems during install. I tried both this:
and this:
I have a fresh full install of Jetpack 3.1 on the board as well. I have tried python 2 and 3 with TensorFlow.
The issue I’m encountering only seems to occur when trying to make more sophisticated models with convolutional neural networks. If you are familiar with the TensorFlow examples, then I have been using “minst_softmax.py” without a problem, however, “mnist_deep.py” always outputs this in the terminal:
nvidia@tegra-ubuntu:~/Desktop/tensorflow-r1.3/tensorflow/examples/tutorials/mnist$ ./mnist_deep.py Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes. Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes. Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes. Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes. Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz Saving graph to: /tmp/tmpyJvseo 2017-12-02 00:56:23.092487: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero 2017-12-02 00:56:23.092610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate (GHz) 1.3005 pciBusID 0000:00:00.0 Total memory: 7.67GiB Free memory: 5.76GiB 2017-12-02 00:56:23.092659: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 2017-12-02 00:56:23.092684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y 2017-12-02 00:56:23.092710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0) step 0, training accuracy 0.04 step 100, training accuracy 0.86 step 200, training accuracy 0.96 step 300, training accuracy 0.94 step 400, training accuracy 0.86 step 500, training accuracy 0.92 step 600, training accuracy 0.96 step 700, training accuracy 0.96 step 800, training accuracy 0.96 step 900, training accuracy 1 2017-12-02 00:57:17.461035: E tensorflow/stream_executor/cuda/cuda_driver.cc:1068] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED 2017-12-02 00:57:17.461146: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED 2017-12-02 00:57:17.461188: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED Traceback (most recent call last): File "./mnist_deep.py", line 177, in <module> tf.app.run(main=main, argv=[sys.argv[0]] + unparsed) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run _sys.exit(main(_sys.argv[:1] + flags_passthrough)) File "./mnist_deep.py", line 169, in main x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 541, in eval return _eval_using_default_session(self, feed_dict, self.graph, session) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4085, in _eval_using_default_session return session.run(tensors, feed_dict) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run run_metadata_ptr) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run feed_dict_tensor, options, run_metadata) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run options, run_metadata) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.NotFoundError: No algorithm worked! [[Node: conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](reshape/Reshape, conv1/Variable/read)]] [[Node: Mean_1/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_79_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]] Caused by op u'conv1/Conv2D', defined at: File "./mnist_deep.py", line 177, in <module> tf.app.run(main=main, argv=[sys.argv[0]] + unparsed) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run _sys.exit(main(_sys.argv[:1] + flags_passthrough)) File "./mnist_deep.py", line 138, in main y_conv, keep_prob = deepnn(x) File "./mnist_deep.py", line 64, in deepnn h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) File "./mnist_deep.py", line 106, in conv2d return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 397, in conv2d data_format=data_format, name=name) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op op_def=op_def) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op original_op=self._default_original_op, op_def=op_def) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__ self._traceback = self._graph._extract_stack() # pylint: disable=protected-access NotFoundError (see above for traceback): No algorithm worked! [[Node: conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](reshape/Reshape, conv1/Variable/read)]] [[Node: Mean_1/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_79_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]] 2017-12-02 00:57:17.738653: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED 2017-12-02 00:57:17.738769: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED 2017-12-02 00:57:17.738800: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED 2017-12-02 00:57:17.738824: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED nvidia@tegra-ubuntu:~/Desktop/tensorflow-r1.3/tensorflow/examples/tutorials/mnist$ Keep in mind, that I have only worked with sample files from Tensorflow, and therefore I would believe they work as intended, and that the error lies somewhere in my setup.
If any of you have any idea of what the fault might be, please let me know!
Thanks in advance.