CUDA_ERROR_LAUNCH_FAILED error when running TensorFlow mnist example

nikko4913 · December 2, 2017, 3:30pm

Hello!
I just recently got the Jetson TX2 developer kit, and I would really like to use TensorFlow on it. I followed JetsonHacks’ tutorials on installing, and I had no problems during install. I tried both this:

and this:

I have a fresh full install of JetPack 3.1 on the board as well, and I have tried TensorFlow with both Python 2 and Python 3.

The issue seems to occur only when I try more sophisticated models that use convolutional neural networks. Of the TensorFlow examples, “mnist_softmax.py” runs without a problem, but “mnist_deep.py” always outputs this in the terminal:

nvidia@tegra-ubuntu:~/Desktop/tensorflow-r1.3/tensorflow/examples/tutorials/mnist$ ./mnist_deep.py
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
Saving graph to: /tmp/tmpyJvseo
2017-12-02 00:56:23.092487: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-02 00:56:23.092610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.76GiB
2017-12-02 00:56:23.092659: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-12-02 00:56:23.092684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-12-02 00:56:23.092710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
step 0, training accuracy 0.04
step 100, training accuracy 0.86
step 200, training accuracy 0.96
step 300, training accuracy 0.94
step 400, training accuracy 0.86
step 500, training accuracy 0.92
step 600, training accuracy 0.96
step 700, training accuracy 0.96
step 800, training accuracy 0.96
step 900, training accuracy 1
2017-12-02 00:57:17.461035: E tensorflow/stream_executor/cuda/cuda_driver.cc:1068] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2017-12-02 00:57:17.461146: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
2017-12-02 00:57:17.461188: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
Traceback (most recent call last):
  File "./mnist_deep.py", line 177, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./mnist_deep.py", line 169, in main
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 541, in eval
    return _eval_using_default_session(self, feed_dict, self.graph, session)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4085, in _eval_using_default_session
    return session.run(tensors, feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: No algorithm worked!
  [[Node: conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](reshape/Reshape, conv1/Variable/read)]]
  [[Node: Mean_1/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_79_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'conv1/Conv2D', defined at:
  File "./mnist_deep.py", line 177, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./mnist_deep.py", line 138, in main
    y_conv, keep_prob = deepnn(x)
  File "./mnist_deep.py", line 64, in deepnn
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
  File "./mnist_deep.py", line 106, in conv2d
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 397, in conv2d
    data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): No algorithm worked!
  [[Node: conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](reshape/Reshape, conv1/Variable/read)]]
  [[Node: Mean_1/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_79_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

2017-12-02 00:57:17.738653: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
2017-12-02 00:57:17.738769: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
2017-12-02 00:57:17.738800: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
2017-12-02 00:57:17.738824: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
nvidia@tegra-ubuntu:~/Desktop/tensorflow-r1.3/tensorflow/examples/tutorials/mnist$

Keep in mind that I have only run the sample files that ship with TensorFlow, so I assume they work as intended and that the error lies somewhere in my setup.

If any of you have any idea of what the fault might be, please let me know!

Thanks in advance.
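
A note on the traceback above: it fails at line 169 of mnist_deep.py, where the whole test set (mnist.test.images) is fed through the network in a single evaluation call. On the TX2 the GPU shares memory with the CPU, so one large evaluation is a plausible place for a failure, although nothing in this thread confirms that as the cause. As a way to narrow things down, here is a minimal sketch of evaluating in smaller batches. The names batched_accuracy and eval_batch and the default batch size of 500 are hypothetical, not part of the TensorFlow example.

```python
def batched_accuracy(eval_batch, images, labels, batch_size=500):
    """Average accuracy over a dataset, evaluated batch_size examples at a time.

    eval_batch(images_slice, labels_slice) is a caller-supplied function that
    returns the mean accuracy (a float in [0, 1]) for that slice.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = len(images)
    if total == 0:
        raise ValueError("no examples to evaluate")
    correct = 0.0
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        # Weight each batch by its size so a short final batch is not over-counted.
        correct += eval_batch(images[start:end], labels[start:end]) * (end - start)
    return correct / total
```

In mnist_deep.py the callback could wrap the existing evaluation, for example `lambda xs, ys: accuracy.eval(feed_dict={x: xs, y_: ys, keep_prob: 1.0})`, assuming the accuracy tensor and placeholders keep the names used in the stock script. If the batched run succeeds where the single call fails, that points toward memory pressure rather than a broken install.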

AastaLLL · December 4, 2017, 2:51am

Hi,

We can run ./mnist_deep.py successfully.
Our environment is JetPack 3.1 + cuDNN v7 + this TF wheel.

Could you also try this setting on your side?

nvidia@tegra-ubuntu:/media/nvidia/NVIDIA/tensorflow/tensorflow/examples/tutorials/mnist$ python mnist_deep.py
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
Saving graph to: /tmp/tmpX6a0Wf
2017-12-04 02:44:00.519511: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:879] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2017-12-04 02:44:00.519625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 4.19GiB
2017-12-04 02:44:00.519671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-12-04 02:44:00.519697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-12-04 02:44:00.519722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2017-12-04 02:44:00.519752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:657] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0. Your kernel may not have been built with NUMA support.
step 0, training accuracy 0.08
step 100, training accuracy 0.9
step 200, training accuracy 0.84
step 300, training accuracy 0.92
step 400, training accuracy 0.96
step 500, training accuracy 0.92
step 600, training accuracy 0.98
step 700, training accuracy 0.92
step 800, training accuracy 0.98
step 900, training accuracy 0.96
step 1000, training accuracy 0.9
step 1100, training accuracy 0.98
step 1200, training accuracy 1
step 1300, training accuracy 0.94
step 1400, training accuracy 0.96
step 1500, training accuracy 0.98
step 1600, training accuracy 0.96
step 1700, training accuracy 0.96
step 1800, training accuracy 1
step 1900, training accuracy 0.94
step 2000, training accuracy 1
step 2100, training accuracy 0.94
step 2200, training accuracy 0.98
step 2300, training accuracy 1
step 2400, training accuracy 1
step 2500, training accuracy 0.98
step 2600, training accuracy 0.98
step 2700, training accuracy 0.96
step 2800, training accuracy 0.98
step 2900, training accuracy 1
step 3000, training accuracy 0.98
step 3100, training accuracy 0.96
step 3200, training accuracy 1
step 3300, training accuracy 0.98
step 3400, training accuracy 0.98
step 3500, training accuracy 1
step 3600, training accuracy 0.98
step 3700, training accuracy 1
step 3800, training accuracy 0.98
step 3900, training accuracy 0.96
step 4000, training accuracy 0.98
step 4100, training accuracy 0.98
step 4200, training accuracy 0.96
step 4300, training accuracy 0.96
step 4400, training accuracy 1
step 4500, training accuracy 0.94
step 4600, training accuracy 1
step 4700, training accuracy 1
step 4800, training accuracy 0.96
step 4900, training accuracy 1
step 5000, training accuracy 1
step 5100, training accuracy 0.96
step 5200, training accuracy 0.98
step 5300, training accuracy 1
step 5400, training accuracy 0.98
step 5500, training accuracy 0.98
step 5600, training accuracy 1
step 5700, training accuracy 1
step 5800, training accuracy 0.98
step 5900, training accuracy 1
step 6000, training accuracy 1
step 6100, training accuracy 1
step 6200, training accuracy 0.98
step 6300, training accuracy 1
step 6400, training accuracy 0.98
step 6500, training accuracy 1
step 6600, training accuracy 1
step 6700, training accuracy 1
step 6800, training accuracy 0.98
step 6900, training accuracy 1
step 7000, training accuracy 1
step 7100, training accuracy 1
step 7200, training accuracy 1
step 7300, training accuracy 1
step 7400, training accuracy 1
step 7500, training accuracy 1
step 7600, training accuracy 0.98
step 7700, training accuracy 1
step 7800, training accuracy 0.98
step 7900, training accuracy 0.98
step 8000, training accuracy 0.98
step 8100, training accuracy 1
step 8200, training accuracy 1
step 8300, training accuracy 1
step 8400, training accuracy 1
step 8500, training accuracy 1
step 8600, training accuracy 0.98
step 8700, training accuracy 1
step 8800, training accuracy 0.98
step 8900, training accuracy 0.98
step 9000, training accuracy 1
step 9100, training accuracy 1
step 9200, training accuracy 1
step 9300, training accuracy 1
step 9400, training accuracy 1
step 9500, training accuracy 1
step 9600, training accuracy 0.98
step 9700, training accuracy 1
step 9800, training accuracy 1
step 9900, training accuracy 1
step 10000, training accuracy 1
step 10100, training accuracy 0.98
step 10200, training accuracy 1
step 10300, training accuracy 1
step 10400, training accuracy 1
step 10500, training accuracy 1
step 10600, training accuracy 1
step 10700, training accuracy 1
step 10800, training accuracy 1
...

https://github.com/peterlee0127/tensorflow-tx2

Thanks

nikko4913 · December 4, 2017, 3:12pm

Hello.

Thanks for the reply. It seems the only difference between our setups is that you have cuDNN v7 and I have version 6, which comes with JetPack 3.1. I will try updating and see if that makes a difference. Thank you!
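
Since the comparison here hinges on which cuDNN version is actually installed, a quick check can help before and after updating. The sketch below reads the version macros (CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL) that the cuDNN 6 and 7 headers define. The default path /usr/include/cudnn.h is an assumption and may differ on a given JetPack install; the function names are hypothetical.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cuDNN header text, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def read_cudnn_version(path="/usr/include/cudnn.h"):
    """Read the header at path (an assumed location) and parse its version."""
    try:
        with open(path) as handle:
            return parse_cudnn_version(handle.read())
    except IOError:
        # Header not found or unreadable at this path.
        return None


if __name__ == "__main__":
    print("cuDNN version:", read_cudnn_version())
```

A result of None only means the header was not found or not recognised at that path, not that cuDNN is missing.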

nikko4913 · December 5, 2017, 8:27pm

Hello again.

I have now flashed the Jetson with JetPack 3.1, installed TensorRT by following the Debian install guide at the link you provided, installed the wheel file, and tried to run “mnist_deep.py”. It gives me the same error as in my first post.

So this, unfortunately, didn’t help. Just a few hours ago, the repo with the wheel file for TensorFlow 1.3 was updated, and there is now a wheel file for TensorFlow 1.4. I’ll try that one too, but I’d still like to get to the bottom of this issue.

Any other ideas on what might be wrong?

Thanks

AastaLLL · December 7, 2017, 5:39am

Hi,

Which TensorFlow wheel do you use?
Have you tried the wheel file mentioned in comment #2?

Thanks.
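
To answer the question about which wheel is in use, it can help to report what each interpreter actually imports, since the thread mentions both Python 2 and Python 3. A minimal sketch, where describe_tensorflow is a hypothetical helper name; it only reads the module's `__version__` and `__file__` attributes:

```python
import importlib


def describe_tensorflow():
    """Return a one-line description of the TensorFlow this interpreter imports."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return "tensorflow: not importable from this interpreter"
    version = getattr(tf, "__version__", "unknown")
    location = getattr(tf, "__file__", "unknown")
    return "tensorflow: version {} loaded from {}".format(version, location)


if __name__ == "__main__":
    print(describe_tensorflow())
```

Running it once with `python` and once with `python3` shows whether the two interpreters pick up different installs.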
