I am new to TLT and I received this error while training detectnet_v2. I cannot work out which parameter or configuration setting caused it.
Using TensorFlow backend. 2020-03-09 07:56:03.176802: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2020-03-09 07:56:03.246178: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-03-09 07:56:03.246602: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5bd6a90 executing computations on platform CUDA. Devices: 2020-03-09 07:56:03.246623: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1050, Compute Capability 6.1 2020-03-09 07:56:03.248490: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299965000 Hz 2020-03-09 07:56:03.248824: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5cf20f0 executing computations on platform Host. Devices: 2020-03-09 07:56:03.248845: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined> 2020-03-09 07:56:03.248984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493 pciBusID: 0000:01:00.0 totalMemory: 3.95GiB freeMemory: 3.65GiB 2020-03-09 07:56:03.249025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2020-03-09 07:56:03.249707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-03-09 07:56:03.249727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2020-03-09 07:56:03.249739: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2020-03-09 07:56:03.249862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3439 MB memory) -> physical GPU 
(device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1) 2020-03-09 07:56:03,250 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at /workspace/file/tlt/pc/specs/detectnet_v2_train_resnet18_kitti.txt. 2020-03-09 07:56:03,251 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/file/tlt/pc/specs/detectnet_v2_train_resnet18_kitti.txt WARNING:tensorflow:From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version. Instructions for updating: Use eager execution and: `tf.data.TFRecordDataset(path)` 2020-03-09 07:56:03,259 [WARNING] tensorflow: From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version. Instructions for updating: Use eager execution and: `tf.data.TFRecordDataset(path)` 2020-03-09 07:56:03,319 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 50 samples with a batch size of 4; each epoch will therefore take one extra step. WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer. 2020-03-09 07:56:03,323 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer. WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/__init__.py:91: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. 
Instructions for updating: Deprecated in favor of operator or tf.math.divide. 2020-03-09 07:56:03,378 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/__init__.py:91: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Deprecated in favor of operator or tf.math.divide. __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== input_1 (InputLayer) (None, 3, 384, 1248) 0 __________________________________________________________________________________________________ conv1 (Conv2D) (None, 64, 192, 624) 9472 input_1[0][0] __________________________________________________________________________________________________ bn_conv1 (BatchNormalization) (None, 64, 192, 624) 256 conv1[0][0] __________________________________________________________________________________________________ activation_1 (Activation) (None, 64, 192, 624) 0 bn_conv1[0][0] __________________________________________________________________________________________________ block_1a_conv_1 (Conv2D) (None, 64, 96, 312) 36928 activation_1[0][0] __________________________________________________________________________________________________ block_1a_bn_1 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_1[0][0] __________________________________________________________________________________________________ activation_2 (Activation) (None, 64, 96, 312) 0 block_1a_bn_1[0][0] __________________________________________________________________________________________________ block_1a_conv_2 (Conv2D) (None, 64, 96, 312) 36928 activation_2[0][0] __________________________________________________________________________________________________ block_1a_conv_shortcut (Conv2D) (None, 64, 96, 312) 4160 
activation_1[0][0] __________________________________________________________________________________________________ block_1a_bn_2 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_2[0][0] __________________________________________________________________________________________________ block_1a_bn_shortcut (BatchNorm (None, 64, 96, 312) 256 block_1a_conv_shortcut[0][0] __________________________________________________________________________________________________ add_1 (Add) (None, 64, 96, 312) 0 block_1a_bn_2[0][0] block_1a_bn_shortcut[0][0] __________________________________________________________________________________________________ activation_3 (Activation) (None, 64, 96, 312) 0 add_1[0][0] __________________________________________________________________________________________________ block_1b_conv_1 (Conv2D) (None, 64, 96, 312) 36928 activation_3[0][0] __________________________________________________________________________________________________ block_1b_bn_1 (BatchNormalizati (None, 64, 96, 312) 256 block_1b_conv_1[0][0] __________________________________________________________________________________________________ activation_4 (Activation) (None, 64, 96, 312) 0 block_1b_bn_1[0][0] __________________________________________________________________________________________________ block_1b_conv_2 (Conv2D) (None, 64, 96, 312) 36928 activation_4[0][0] __________________________________________________________________________________________________ block_1b_bn_2 (BatchNormalizati (None, 64, 96, 312) 256 block_1b_conv_2[0][0] __________________________________________________________________________________________________ add_2 (Add) (None, 64, 96, 312) 0 block_1b_bn_2[0][0] activation_3[0][0] __________________________________________________________________________________________________ activation_5 (Activation) (None, 64, 96, 312) 0 add_2[0][0] 
__________________________________________________________________________________________________ block_2a_conv_1 (Conv2D) (None, 128, 48, 156) 73856 activation_5[0][0] __________________________________________________________________________________________________ block_2a_bn_1 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_1[0][0] __________________________________________________________________________________________________ activation_6 (Activation) (None, 128, 48, 156) 0 block_2a_bn_1[0][0] __________________________________________________________________________________________________ block_2a_conv_2 (Conv2D) (None, 128, 48, 156) 147584 activation_6[0][0] __________________________________________________________________________________________________ block_2a_conv_shortcut (Conv2D) (None, 128, 48, 156) 8320 activation_5[0][0] __________________________________________________________________________________________________ block_2a_bn_2 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_2[0][0] __________________________________________________________________________________________________ block_2a_bn_shortcut (BatchNorm (None, 128, 48, 156) 512 block_2a_conv_shortcut[0][0] __________________________________________________________________________________________________ add_3 (Add) (None, 128, 48, 156) 0 block_2a_bn_2[0][0] block_2a_bn_shortcut[0][0] __________________________________________________________________________________________________ activation_7 (Activation) (None, 128, 48, 156) 0 add_3[0][0] __________________________________________________________________________________________________ block_2b_conv_1 (Conv2D) (None, 128, 48, 156) 147584 activation_7[0][0] __________________________________________________________________________________________________ block_2b_bn_1 (BatchNormalizati (None, 128, 48, 156) 512 block_2b_conv_1[0][0] 
__________________________________________________________________________________________________ activation_8 (Activation) (None, 128, 48, 156) 0 block_2b_bn_1[0][0] __________________________________________________________________________________________________ block_2b_conv_2 (Conv2D) (None, 128, 48, 156) 147584 activation_8[0][0] __________________________________________________________________________________________________ block_2b_bn_2 (BatchNormalizati (None, 128, 48, 156) 512 block_2b_conv_2[0][0] __________________________________________________________________________________________________ add_4 (Add) (None, 128, 48, 156) 0 block_2b_bn_2[0][0] activation_7[0][0] __________________________________________________________________________________________________ activation_9 (Activation) (None, 128, 48, 156) 0 add_4[0][0] __________________________________________________________________________________________________ block_3a_conv_1 (Conv2D) (None, 256, 24, 78) 295168 activation_9[0][0] __________________________________________________________________________________________________ block_3a_bn_1 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_1[0][0] __________________________________________________________________________________________________ activation_10 (Activation) (None, 256, 24, 78) 0 block_3a_bn_1[0][0] __________________________________________________________________________________________________ block_3a_conv_2 (Conv2D) (None, 256, 24, 78) 590080 activation_10[0][0] __________________________________________________________________________________________________ block_3a_conv_shortcut (Conv2D) (None, 256, 24, 78) 33024 activation_9[0][0] __________________________________________________________________________________________________ block_3a_bn_2 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_2[0][0] __________________________________________________________________________________________________ 
block_3a_bn_shortcut (BatchNorm (None, 256, 24, 78) 1024 block_3a_conv_shortcut[0][0] __________________________________________________________________________________________________ add_5 (Add) (None, 256, 24, 78) 0 block_3a_bn_2[0][0] block_3a_bn_shortcut[0][0] __________________________________________________________________________________________________ activation_11 (Activation) (None, 256, 24, 78) 0 add_5[0][0] __________________________________________________________________________________________________ block_3b_conv_1 (Conv2D) (None, 256, 24, 78) 590080 activation_11[0][0] __________________________________________________________________________________________________ block_3b_bn_1 (BatchNormalizati (None, 256, 24, 78) 1024 block_3b_conv_1[0][0] __________________________________________________________________________________________________ activation_12 (Activation) (None, 256, 24, 78) 0 block_3b_bn_1[0][0] __________________________________________________________________________________________________ block_3b_conv_2 (Conv2D) (None, 256, 24, 78) 590080 activation_12[0][0] __________________________________________________________________________________________________ block_3b_bn_2 (BatchNormalizati (None, 256, 24, 78) 1024 block_3b_conv_2[0][0] __________________________________________________________________________________________________ add_6 (Add) (None, 256, 24, 78) 0 block_3b_bn_2[0][0] activation_11[0][0] __________________________________________________________________________________________________ activation_13 (Activation) (None, 256, 24, 78) 0 add_6[0][0] __________________________________________________________________________________________________ block_4a_conv_1 (Conv2D) (None, 512, 24, 78) 1180160 activation_13[0][0] __________________________________________________________________________________________________ block_4a_bn_1 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_1[0][0] 
__________________________________________________________________________________________________ activation_14 (Activation) (None, 512, 24, 78) 0 block_4a_bn_1[0][0] __________________________________________________________________________________________________ block_4a_conv_2 (Conv2D) (None, 512, 24, 78) 2359808 activation_14[0][0] __________________________________________________________________________________________________ block_4a_conv_shortcut (Conv2D) (None, 512, 24, 78) 131584 activation_13[0][0] __________________________________________________________________________________________________ block_4a_bn_2 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_2[0][0] __________________________________________________________________________________________________ block_4a_bn_shortcut (BatchNorm (None, 512, 24, 78) 2048 block_4a_conv_shortcut[0][0] __________________________________________________________________________________________________ add_7 (Add) (None, 512, 24, 78) 0 block_4a_bn_2[0][0] block_4a_bn_shortcut[0][0] __________________________________________________________________________________________________ activation_15 (Activation) (None, 512, 24, 78) 0 add_7[0][0] __________________________________________________________________________________________________ block_4b_conv_1 (Conv2D) (None, 512, 24, 78) 2359808 activation_15[0][0] __________________________________________________________________________________________________ block_4b_bn_1 (BatchNormalizati (None, 512, 24, 78) 2048 block_4b_conv_1[0][0] __________________________________________________________________________________________________ activation_16 (Activation) (None, 512, 24, 78) 0 block_4b_bn_1[0][0] __________________________________________________________________________________________________ block_4b_conv_2 (Conv2D) (None, 512, 24, 78) 2359808 activation_16[0][0] 
__________________________________________________________________________________________________ block_4b_bn_2 (BatchNormalizati (None, 512, 24, 78) 2048 block_4b_conv_2[0][0] __________________________________________________________________________________________________ add_8 (Add) (None, 512, 24, 78) 0 block_4b_bn_2[0][0] activation_15[0][0] __________________________________________________________________________________________________ activation_17 (Activation) (None, 512, 24, 78) 0 add_8[0][0] __________________________________________________________________________________________________ output_bbox (Conv2D) (None, 12, 24, 78) 6156 activation_17[0][0] __________________________________________________________________________________________________ output_cov (Conv2D) (None, 3, 24, 78) 1539 activation_17[0][0] ================================================================================================== Total params: 11,203,023 Trainable params: 11,193,295 Non-trainable params: 9,728 __________________________________________________________________________________________________ target/truncation is not updated to match the crop areaif the dataset contains target/truncation. target/truncation is not updated to match the crop areaif the dataset contains target/truncation. target/truncation is not updated to match the crop areaif the dataset contains target/truncation. target/truncation is not updated to match the crop areaif the dataset contains target/truncation. 
2020-03-09 07:56:26,632 [INFO] iva.detectnet_v2.scripts.train: Found 50 samples in training set
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 37, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 633, in main
  File "./detectnet_v2/scripts/train.py", line 557, in run_experiment
  File "./detectnet_v2/scripts/train.py", line 480, in train_gridbox
  File "./detectnet_v2/scripts/train.py", line 354, in build_validation_graph
  File "./detectnet_v2/dataloader/default_dataloader.py", line 198, in get_dataset_tensors
  File "./detectnet_v2/dataloader/utilities.py", line 181, in extract_tfrecords_features
StopIteration

This is the config file. I changed all pedestrian to person, because my training dataset is labelled as person.
random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/file/tlt/pc/experiment/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/file/Python/annotate_kitti/poc_mrt_frames_Images_KITTI"
  }
  image_extension: "jpg"
  target_class_mapping { key: "car" value: "car" }
  target_class_mapping { key: "cyclist" value: "cyclist" }
  target_class_mapping { key: "pedestrian" value: "person" }
  target_class_mapping { key: "person_sitting" value: "person" }
  target_class_mapping { key: "van" value: "car" }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "car"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "cyclist"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "person"
    value {
      clustering_config {
        coverage_threshold: 0.00749999983236
        dbscan_eps: 0.230000004172
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/file/tlt/pc/experiment/pretrained_resnet18/tlt_resnet18_detectnet_v2_v1/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: true
  activation {
    activation_type: "relu"
  }
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap { key: "car" value: 0.699999988079 }
  minimum_detection_ground_truth_overlap { key: "cyclist" value: 0.5 }
  minimum_detection_ground_truth_overlap { key: "person" value: 0.5 }
  evaluation_box_config {
    key: "car"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "cyclist"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "car"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives { name: "cov" initial_weight: 1.0 weight_target: 1.0 }
    objectives { name: "bbox" initial_weight: 10.0 weight_target: 10.0 }
  }
  target_classes {
    name: "cyclist"
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives { name: "cov" initial_weight: 1.0 weight_target: 1.0 }
    objectives { name: "bbox" initial_weight: 10.0 weight_target: 1.0 }
  }
  target_classes {
    name: "person"
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives { name: "cov" initial_weight: 1.0 weight_target: 1.0 }
    objectives { name: "bbox" initial_weight: 10.0 weight_target: 10.0 }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "car"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "cyclist"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}
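
The traceback ends in a bare StopIteration raised from extract_tfrecords_features while the validation graph is being built. In Python, that exception is what next() raises on an exhausted or empty iterator, so one thing that seems worth ruling out is that the tfrecords_path pattern (or the fold selected by validation_fold: 0) yields no records. The sketch below is only a guess at a diagnostic, not a description of how TLT works internally: the helper names matching_tfrecords and first_item are hypothetical, and it only checks that the glob pattern from the spec matches files on disk.

```python
import glob
import os


def matching_tfrecords(pattern):
    """Return the sorted regular files matched by a glob pattern."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def first_item(iterable):
    """Return the first item, or None instead of raising StopIteration."""
    return next(iter(iterable), None)


if __name__ == "__main__":
    # Pattern copied from tfrecords_path in the spec file above.
    pattern = "/workspace/file/tlt/pc/experiment/tfrecords/kitti_trainval/*"
    files = matching_tfrecords(pattern)
    print("files matched:", len(files))
    for path in files:
        # A zero-byte shard would also leave a reader with nothing to iterate.
        print(path, os.path.getsize(path), "bytes")
    if first_item(files) is None:
        print("No files matched; the path or the container mount may be wrong.")
```

If the pattern does match non-empty files, the next suspect would be how the records were split into folds when the TFRecords were generated, which this sketch does not inspect.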