(launcher) xgy@xgy:~$ tao detectnet_v2 train --help
2021-09-08 14:36:21,791 [INFO] root: Registry: ['nvcr.io']
2021-09-08 14:36:21,882 [WARNING] tlt.components.docker_handler.docker_handler: Docker will run the commands as root. If you would like to retain your local host permissions, please add the "user":"UID:GID" in the DockerOptions portion of the "/home/xgy/.tao_mounts.json" file. You can obtain your users UID and GID by using the "id -u" and "id -g" commands on the terminal.
2021-09-08 14:36:23,290 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
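As an aside, the warning above asks for a "user" entry in the DockerOptions section of ~/.tao_mounts.json. A minimal sketch of what that file could look like — the mount paths are placeholders (not from this thread), and "1000:1000" stands in for whatever `id -u` and `id -g` report on your host:

```json
{
    "Mounts": [
        {
            "source": "/home/xgy/tao-experiments",
            "destination": "/workspace/tao-experiments"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}
```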
$ tao detectnet_v2 train --help
~/.tao_mounts.json wasn't found. Falling back to obtain mount points and docker configs from ~/.tlt_mounts.json. Please note that this will be deprecated going forward.
2021-09-08 14:57:15,369 [INFO] root: Registry: ['nvcr.io']
2021-09-08 14:57:19,576 [WARNING] tlt.components.docker_handler.docker_handler: Docker will run the commands as root. If you would like to retain your local host permissions, please add the "user":"UID:GID" in the DockerOptions portion of the "/home/morganh/.tlt_mounts.json" file. You can obtain your users UID and GID by using the "id -u" and "id -g" commands on the terminal.
Using TensorFlow backend.
usage: detectnet_v2 train [-h] [--num_processes NUM_PROCESSES] [--gpus GPUS]
                          [--gpu_index GPU_INDEX [GPU_INDEX ...]] [--use_amp]
                          [--log_file LOG_FILE] [-e EXPERIMENT_SPEC_FILE]
                          [-r RESULTS_DIR] [-n MODEL_NAME] [-v] -k KEY
                          {calibration_tensorfile,dataset_convert,evaluate,export,inference,prune,train} ...
optional arguments:
  -h, --help            show this help message and exit
  --num_processes NUM_PROCESSES, -np NUM_PROCESSES
                        The number of horovod child processes to be spawned.
                        Default is -1 (equal to --gpus).
  --gpus GPUS           The number of GPUs to be used for the job.
  --gpu_index GPU_INDEX [GPU_INDEX ...]
                        The indices of the GPUs to be used.
  --use_amp             Flag to enable Auto Mixed Precision.
  --log_file LOG_FILE   Path to the output log file.
  -e EXPERIMENT_SPEC_FILE, --experiment_spec_file EXPERIMENT_SPEC_FILE
                        Path to spec file. Absolute path or relative to
                        working directory. If not specified, default spec from
                        spec_loader.py is used.
  -r RESULTS_DIR, --results_dir RESULTS_DIR
                        Path to a folder where experiment outputs should be
                        written.
  -n MODEL_NAME, --model_name MODEL_NAME
                        Name of the model file. If not given, then defaults to
                        model.hdf5.
  -v, --verbose         Set verbosity level for the logger.
  -k KEY, --key KEY     The key to load pretrained weights and save
                        intermediate snapshots and final model.
It seems that your CPU is a bit old. What is the CPU info? Also, please search for "Illegal instruction" on the TAO forum; some users have hit the same issue before. Unfortunately, TAO does not support such CPUs.
but it can work well in conversational AI:

(launcher) xgy@xgy:~$ tao text_classification dataset_convert -h
2021-09-08 15:08:41,843 [INFO] root: Registry: ['nvcr.io']
2021-09-08 15:08:41,929 [WARNING] tlt.components.docker_handler.docker_handler: Docker will run the commands as root. If you would like to retain your local host permissions, please add the "user":"UID:GID" in the DockerOptions portion of the "/home/xgy/.tao_mounts.json" file. You can obtain your users UID and GID by using the "id -u" and "id -g" commands on the terminal.
[NeMo W 2021-09-08 07:08:46 experimental:27] Module <class 'nemo.collections.nlp.modules.common.megatron.megatron_bert.MegatronBertEncoder'> is experimental, not ready for production and is not fully supported. Use at your own risk.
INFO: Generating new fontManager, this may take some time...
usage: text_classification [-h] -r RESULTS_DIR [-k KEY]
                           [-e EXPERIMENT_SPEC_FILE] [-g GPUS]
                           [-m RESUME_MODEL_WEIGHTS] [-o OUTPUT_SPECS_DIR]
                           {dataset_convert,evaluate,export,finetune,infer,infer_onnx,train,download_specs}
Train Adapt Optimize Toolkit
positional arguments:
  {dataset_convert,evaluate,export,finetune,infer,infer_onnx,train,download_specs}
                        Subtask for a given task/model.
optional arguments:
  -h, --help            show this help message and exit
  -r RESULTS_DIR, --results_dir RESULTS_DIR
                        Path to a folder where the experiment outputs should
                        be written. (DEFAULT: ./)
  -k KEY, --key KEY     User specific encoding key to save or load a .tlt
                        model.
  -e EXPERIMENT_SPEC_FILE, --experiment_spec_file EXPERIMENT_SPEC_FILE
                        Path to the experiment spec file.
  -g GPUS, --gpus GPUS  Number of GPUs to use. The default value is 1.
  -m RESUME_MODEL_WEIGHTS, --resume_model_weights RESUME_MODEL_WEIGHTS
                        Path to a pre-trained model or model to continue
                        training.
  -o OUTPUT_SPECS_DIR, --output_specs_dir OUTPUT_SPECS_DIR
                        Path to a target folder where experiment spec files
                        will be downloaded.

2021-09-08 15:08:47,234 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
My environment is virtualized with Proxmox; can TAO be used in it? Here is the CPU info:

xgy@xgy:/data/xgy/worksapce/tao/cv_samples_v1.2.0$ cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
     16  Common KVM processor
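One quick thing to check here: the "Illegal instruction" crash is commonly caused by a missing AVX instruction set, which the TensorFlow build inside the TAO CV containers generally requires. A sketch of how to verify this from inside the VM (the note about the Proxmox CPU type is an assumption based on general KVM behaviour, not something confirmed in this thread):

```shell
# Check whether the guest CPU exposes the AVX flag; TAO CV tasks
# typically crash with "Illegal instruction" when it is absent.
# Note (assumption): the generic "Common KVM processor" (kvm64) model
# hides most host CPU flags; selecting the "host" CPU type in the
# Proxmox VM settings usually passes AVX through from the host.
if grep -qw avx /proc/cpuinfo; then
    echo "AVX supported"
else
    echo "AVX missing"
fi
```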