SSD-300 tensorflow mAP is very low due to unstable loss

Question

EverydayDeveloper

October 24, 2021, 14:18

I am training an SSD-300 model on images that I have resized to 300x300. I am using the default settings from the GitHub repo https://github.com/balancap/SSD-Tensorflow. The loss is unstable during training. I have trained for 50,000 steps, and the mAP I currently get is 0.26 (VOC 2007) and 0.24 (VOC 2012).
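For readers comparing the two numbers: assuming they refer to the two usual PASCAL VOC average-precision definitions (the 11-point interpolated AP of VOC 2007 and the area-under-curve AP used from VOC 2010 onward, often labelled VOC 2012), the sketch below shows how each is computed for one class from a precision/recall curve. The function names are hypothetical and this is not the evaluation code of the repo; mAP is the mean of the per-class values.

```python
def ap_voc07(recall, precision):
    """11-point interpolated AP: average of the best precision
    reachable at recall >= 0.0, 0.1, ..., 1.0."""
    ap = 0.0
    for i in range(11):
        threshold = i / 10.0
        candidates = [p for r, p in zip(recall, precision) if r >= threshold]
        ap += (max(candidates) if candidates else 0.0) / 11.0
    return ap


def ap_voc12(recall, precision):
    """Area under the monotonically decreasing precision envelope."""
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # Make precision non-increasing when read from left to right.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas between consecutive recall values.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


# A detector that finds half of the objects with no false positives.
print(ap_voc07([0.5], [1.0]), ap_voc12([0.5], [1.0]))
```

The two definitions generally give slightly different values for the same detections, so a small gap between the two reported figures is expected.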

Train set: 1500 images. Test set: 300 images.

Current parameters:

!python train_ssd_network.py --dataset_name=pascalvoc_2007 --dataset_split_name=train --model_name=ssd_300_vgg --save_summaries_secs=60 --save_interval_secs=600 --weight_decay=0.00004 --optimizer=adam --learning_rate=0.01 --batch_size=2 --gpu_memory_fraction=0.9 --learning_rate_decay_factor=0.94 --num_classes=3 --checkpoint_exclude_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box --eval_training_data=True
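To see what these flags imply for the step size over a 50,000-step run, here is a minimal sketch of a staircase exponential decay using the values from the command (--learning_rate=0.01, --learning_rate_decay_factor=0.94, --batch_size=2) and the 1500-image training set. The decay interval of 2 epochs, the assumption that an epoch is counted as 1500 samples, and the function name are illustrative guesses, not values confirmed in this post.

```python
def decayed_learning_rate(step, base_lr=0.01, decay_factor=0.94,
                          num_samples=1500, batch_size=2,
                          epochs_per_decay=2.0):
    """Staircase exponential decay:
    lr = base_lr * decay_factor ** floor(step / decay_steps)."""
    # epochs_per_decay and num_samples are assumptions, not from the post.
    decay_steps = int(num_samples / batch_size * epochs_per_decay)
    return base_lr * decay_factor ** (step // decay_steps)


for step in (0, 10_000, 50_000):
    print(step, decayed_learning_rate(step))
```

Under these assumptions the rate stays at 0.01 for the first 1500 steps, which is fairly high for Adam when fine-tuning from a checkpoint, so trying a smaller starting value may be worthwhile.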

What can I do to get a better accuracy (mAP)?

Example of the loss during training (it has even reached 80):

W1024 13:57:41.660651 140239494461312 deprecation.py:323] From train_ssd_network.py:256: batch (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.batch(batch_size)` (or `padded_batch(...)` if `dynamic_pad=True`).
WARNING:tensorflow:From train_ssd_network.py:292: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

W1024 13:57:41.676577 140239494461312 module_wrapper.py:139] From train_ssd_network.py:292: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

WARNING:tensorflow:From train_ssd_network.py:292: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

W1024 13:57:41.676797 140239494461312 module_wrapper.py:139] From train_ssd_network.py:292: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

WARNING:tensorflow:From /content/gdrive/MyDrive/Training_SSD/SSD-1/deployment/model_deploy.py:194: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W1024 13:57:41.677163 140239494461312 module_wrapper.py:139] From /content/gdrive/MyDrive/Training_SSD/SSD-1/deployment/model_deploy.py:194: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /content/gdrive/MyDrive/Training_SSD/SSD-1/deployment/model_deploy.py:194: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.

W1024 13:57:41.677324 140239494461312 module_wrapper.py:139] From /content/gdrive/MyDrive/Training_SSD/SSD-1/deployment/model_deploy.py:194: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.

WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W1024 13:57:41.679852 140239494461312 deprecation.py:323] From /usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /content/gdrive/MyDrive/Training_SSD/SSD-1/nets/ssd_vgg_300.py:476: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
W1024 13:57:41.998192 140239494461312 deprecation.py:323] From /content/gdrive/MyDrive/Training_SSD/SSD-1/nets/ssd_vgg_300.py:476: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
WARNING:tensorflow:From /content/gdrive/MyDrive/Training_SSD/SSD-1/nets/ssd_vgg_300.py:642: The name tf.losses.add_loss is deprecated. Please use tf.compat.v1.losses.add_loss instead.

W1024 13:57:42.408573 140239494461312 module_wrapper.py:139] From /content/gdrive/MyDrive/Training_SSD/SSD-1/nets/ssd_vgg_300.py:642: The name tf.losses.add_loss is deprecated. Please use tf.compat.v1.losses.add_loss instead.

WARNING:tensorflow:From train_ssd_network.py:307: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.

W1024 13:57:42.419716 140239494461312 module_wrapper.py:139] From train_ssd_network.py:307: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.

WARNING:tensorflow:From train_ssd_network.py:308: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

W1024 13:57:42.420833 140239494461312 module_wrapper.py:139] From train_ssd_network.py:308: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

WARNING:tensorflow:From /content/gdrive/MyDrive/Training_SSD/SSD-1/tf_utils.py:105: The name tf.train.exponential_decay is deprecated. Please use tf.compat.v1.train.exponential_decay instead.

W1024 13:57:42.625701 140239494461312 module_wrapper.py:139] From /content/gdrive/MyDrive/Training_SSD/SSD-1/tf_utils.py:105: The name tf.train.exponential_decay is deprecated. Please use tf.compat.v1.train.exponential_decay instead.

WARNING:tensorflow:From /content/gdrive/MyDrive/Training_SSD/SSD-1/tf_utils.py:144: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

W1024 13:57:42.629828 140239494461312 module_wrapper.py:139] From /content/gdrive/MyDrive/Training_SSD/SSD-1/tf_utils.py:144: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From /content/gdrive/MyDrive/Training_SSD/SSD-1/tf_utils.py:245: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

W1024 13:57:42.630920 140239494461312 module_wrapper.py:139] From /content/gdrive/MyDrive/Training_SSD/SSD-1/tf_utils.py:245: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

WARNING:tensorflow:From train_ssd_network.py:367: The name tf.summary.merge is deprecated. Please use tf.compat.v1.summary.merge instead.

W1024 13:57:43.817304 140239494461312 module_wrapper.py:139] From train_ssd_network.py:367: The name tf.summary.merge is deprecated. Please use tf.compat.v1.summary.merge instead.

WARNING:tensorflow:From train_ssd_network.py:372: The name tf.GPUOptions is deprecated. Please use tf.compat.v1.GPUOptions instead.

W1024 13:57:43.820022 140239494461312 module_wrapper.py:139] From train_ssd_network.py:372: The name tf.GPUOptions is deprecated. Please use tf.compat.v1.GPUOptions instead.

WARNING:tensorflow:From train_ssd_network.py:373: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

W1024 13:57:43.820249 140239494461312 module_wrapper.py:139] From train_ssd_network.py:373: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From train_ssd_network.py:375: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

W1024 13:57:43.820408 140239494461312 module_wrapper.py:139] From train_ssd_network.py:375: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From /content/gdrive/MyDrive/Training_SSD/SSD-1/tf_utils.py:226: The name tf.gfile.IsDirectory is deprecated. Please use tf.io.gfile.isdir instead.

W1024 13:57:43.963253 140239494461312 module_wrapper.py:139] From /content/gdrive/MyDrive/Training_SSD/SSD-1/tf_utils.py:226: The name tf.gfile.IsDirectory is deprecated. Please use tf.io.gfile.isdir instead.

WARNING:tensorflow:From /content/gdrive/MyDrive/Training_SSD/SSD-1/tf_utils.py:230: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

W1024 13:57:43.963784 140239494461312 module_wrapper.py:139] From /content/gdrive/MyDrive/Training_SSD/SSD-1/tf_utils.py:230: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

INFO:tensorflow:Fine-tuning from /content/gdrive/MyDrive/Training_SSD/SSD-1/checkpoints/ssd_300_vgg.ckpt/ssd_300_vgg.ckpt. Ignoring missing vars: False
I1024 13:57:43.963922 140239494461312 tf_utils.py:230] Fine-tuning from /content/gdrive/MyDrive/Training_SSD/SSD-1/checkpoints/ssd_300_vgg.ckpt/ssd_300_vgg.ckpt. Ignoring missing vars: False
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py:742: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
W1024 13:57:44.120857 140239494461312 deprecation.py:323] From /usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py:742: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2021-10-24 13:57:44.436826: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-10-24 13:57:44.440876: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000170000 Hz
2021-10-24 13:57:44.441070: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56449f9cb9c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-10-24 13:57:44.441100: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-10-24 13:57:44.442817: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-10-24 13:57:44.554870: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-24 13:57:44.555802: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56449f9cb640 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-10-24 13:57:44.555833: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2021-10-24 13:57:44.556006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-24 13:57:44.556564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2021-10-24 13:57:44.556867: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-10-24 13:57:44.558049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-10-24 13:57:44.559113: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-10-24 13:57:44.559464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-10-24 13:57:44.560805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-10-24 13:57:44.561773: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-10-24 13:57:44.564919: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-10-24 13:57:44.565038: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-24 13:57:44.565658: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-24 13:57:44.566169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2021-10-24 13:57:44.566234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-10-24 13:57:44.567361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-10-24 13:57:44.567389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2021-10-24 13:57:44.567399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2021-10-24 13:57:44.567544: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-24 13:57:44.568127: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-24 13:57:44.568645: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-10-24 13:57:44.568687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14652 MB memory) - physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
INFO:tensorflow:Restoring parameters from /content/gdrive/MyDrive/Training_SSD/SSD-1/checkpoints/ssd_300_vgg.ckpt/ssd_300_vgg.ckpt
I1024 13:57:45.783673 140239494461312 saver.py:1284] Restoring parameters from /content/gdrive/MyDrive/Training_SSD/SSD-1/checkpoints/ssd_300_vgg.ckpt/ssd_300_vgg.ckpt
INFO:tensorflow:Running local_init_op.
I1024 13:57:46.017776 140239494461312 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1024 13:57:46.075058 140239494461312 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Starting Session.
I1024 13:57:47.806292 140239494461312 learning.py:754] Starting Session.
INFO:tensorflow:Saving checkpoint to path /content/gdrive/MyDrive/Training_SSD/SSD-1/log_30000/model.ckpt
I1024 13:57:47.882676 140237141432064 supervisor.py:1117] Saving checkpoint to path /content/gdrive/MyDrive/Training_SSD/SSD-1/log_30000/model.ckpt
INFO:tensorflow:Starting Queues.
I1024 13:57:47.896139 140239494461312 learning.py:768] Starting Queues.
INFO:tensorflow:global_step/sec: 0
I1024 13:57:51.071433 140237149824768 supervisor.py:1099] global_step/sec: 0
2021-10-24 13:57:51.662253: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-10-24 13:57:52.787163: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
INFO:tensorflow:Recording summary at step 1.
I1024 13:57:55.259541 140235672987392 supervisor.py:1050] Recording summary at step 1.
INFO:tensorflow:global step 10: loss = 9.6467 (0.115 sec/step)
I1024 13:57:56.277639 140239494461312 learning.py:507] global step 10: loss = 9.6467 (0.115 sec/step)
INFO:tensorflow:global step 20: loss = 0.7245 (0.106 sec/step)
I1024 13:57:57.399851 140239494461312 learning.py:507] global step 20: loss = 0.7245 (0.106 sec/step)
INFO:tensorflow:global step 30: loss = 9.5159 (0.109 sec/step)
I1024 13:57:58.544558 140239494461312 learning.py:507] global step 30: loss = 9.5159 (0.109 sec/step)
INFO:tensorflow:global step 40: loss = 0.6637 (0.106 sec/step)
I1024 13:57:59.686780 140239494461312 learning.py:507] global step 40: loss = 0.6637 (0.106 sec/step)
INFO:tensorflow:global step 50: loss = 0.7424 (0.140 sec/step)
I1024 13:58:00.898716 140239494461312 learning.py:507] global step 50: loss = 0.7424 (0.140 sec/step)
INFO:tensorflow:global step 60: loss = 21.9683 (0.141 sec/step)
I1024 13:58:02.276094 140239494461312 learning.py:507] global step 60: loss = 21.9683 (0.141 sec/step)
INFO:tensorflow:global step 70: loss = 0.6486 (0.132 sec/step)
I1024 13:58:03.593588 140239494461312 learning.py:507] global step 70: loss = 0.6486 (0.132 sec/step)
INFO:tensorflow:global step 80: loss = 9.6484 (0.135 sec/step)
I1024 13:58:04.992696 140239494461312 learning.py:507] global step 80: loss = 9.6484 (0.135 sec/step)
INFO:tensorflow:global step 90: loss = 0.6877 (0.114 sec/step)
I1024 13:58:06.135541 140239494461312 learning.py:507] global step 90: loss = 0.6877 (0.114 sec/step)
INFO:tensorflow:global step 100: loss = 4.4349 (0.116 sec/step)
I1024 13:58:07.301742 140239494461312 learning.py:507] global step 100: loss = 4.4349 (0.116 sec/step)
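Each value above is logged for a single step, and with --batch_size=2 that presumably means a loss computed on only two images, so large swings from step to step are plausible even when training is progressing. A rough way to judge the trend is to extract the values from the log and smooth them. The sketch below assumes the log format shown above; the helper names are hypothetical.

```python
import re

# Matches e.g. "INFO:tensorflow:global step 10: loss = 9.6467 (0.115 sec/step)".
# Anchoring on the INFO prefix skips the duplicate "I1024 ..." copies.
LOSS_RE = re.compile(r"^INFO:tensorflow:global step (\d+): loss = ([0-9.]+)")


def parse_losses(log_text):
    """Return a list of (step, loss) pairs, one per logged step."""
    pairs = []
    for line in log_text.splitlines():
        match = LOSS_RE.match(line)
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


def moving_average(values, window):
    """Trailing mean over up to `window` most recent values."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


SAMPLE_LOG = """\
INFO:tensorflow:global step 10: loss = 9.6467 (0.115 sec/step)
I1024 13:57:56.277639 140239494461312 learning.py:507] global step 10: loss = 9.6467 (0.115 sec/step)
INFO:tensorflow:global step 20: loss = 0.7245 (0.106 sec/step)
INFO:tensorflow:global step 60: loss = 21.9683 (0.141 sec/step)
INFO:tensorflow:global step 100: loss = 4.4349 (0.116 sec/step)
"""

pairs = parse_losses(SAMPLE_LOG)
losses = [loss for _, loss in pairs]
print(pairs)
print(moving_average(losses, window=2))
```

On a full 50,000-step log a much larger window (for example a few hundred logged values) gives a clearer picture of whether the loss is actually decreasing.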

Topic one-shot-learning object-detection tensorflow

Category Data Science
