TensorFlow: Unable to save a checkpoint every 2 global steps while training an SSD object detection model

Question

Rahul_Saini

May 31, 2022, 10:01

python models/object_detection/train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_inception_v2_coco.config

INFO:tensorflow:Restoring parameters from /home/rahul/Downloads/ssd_inception_v2_coco_2018_01_28/model.ckpt
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path training/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global step 1: loss = 14.1708 (73.929 sec/step)
INFO:tensorflow:global step 2: loss = 13.3957 (26.779 sec/step)
INFO:tensorflow:global_step/sec: 0.0207936
INFO:tensorflow:Recording summary at step 2.
INFO:tensorflow:global step 3: loss = 13.2996 (34.331 sec/step)
INFO:tensorflow:global step 4: loss = 12.6129 (27.737 sec/step)
INFO:tensorflow:global step 5: loss = 12.0835 (28.638 sec/step)
INFO:tensorflow:global step 6: loss = 11.9736 (29.535 sec/step)
INFO:tensorflow:global_step/sec: 0.0333812
INFO:tensorflow:Recording summary at step 6.
INFO:tensorflow:global step 7: loss = 11.3325 (35.411 sec/step)
INFO:tensorflow:global step 8: loss = 10.9632 (28.500 sec/step)
INFO:tensorflow:global step 9: loss = 10.8758 (27.419 sec/step)
INFO:tensorflow:global step 10: loss = 11.1301 (25.544 sec/step)
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
rahul@rahul-pc:~/crack_detection_ssd_inception$

In the run above, a checkpoint is saved only at step 0 and at step 10 (the final step). I would like to save a checkpoint every 2 global steps; for this test I retrained the model with num_steps set to only 10. Where can the checkpoint-saving interval be changed?

I tried changing save_checkpoints_steps and save_checkpoints_secs in object_detection/model_main.py and object_detection/model_lib_test.py, without success.

Topic object-detection tensorflow

Category Data Science

Piyush Singh · Accepted Answer · October 20, 2018, 14:56

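Use the tf.train.Saver class (TensorFlow 1.x) and call save() inside your training loop:

import tensorflow as tf  # TensorFlow 1.x API

saver = tf.train.Saver()  # create after the model variables are defined
with tf.Session() as sess:
    for step in range(1, 11):  # 10 steps, as in the question
        # Run your ops here with sess.run()
        if step % 2 == 0:  # save every 2 steps
            saver.save(sess, 'my-model', global_step=step)

The global_step argument only appends the step number to the checkpoint file name (my-model-2, my-model-4, and so on); it does not schedule saving by itself. Calling save() on every second iteration of the loop is what creates checkpoint files with the prefix my-model every 2 steps.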

Reference: https://www.tensorflow.org/guide/saved_model
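
For anyone who wants to try the periodic-save pattern in isolation, here is a minimal sketch. It makes several assumptions that are not from the question: it uses the tf.compat.v1 API so that it can also run under TensorFlow 2.x, the "training" op is a dummy that only increments the global step, and the helper name train_with_periodic_checkpoints is purely illustrative.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph/session API (assumption: available in your install)


def train_with_periodic_checkpoints(ckpt_dir, num_steps=10, save_every=2):
    """Run a dummy loop and save a checkpoint every `save_every` global steps.

    Hypothetical helper for illustration; returns the saved checkpoint prefixes.
    """
    saved = []
    graph = tf.Graph()
    with graph.as_default():
        global_step = tf1.train.get_or_create_global_step()
        # Stand-in for a real train op: it only increments the global step.
        train_op = tf1.assign_add(global_step, 1)
        # max_to_keep=None keeps every checkpoint file instead of only the last 5.
        saver = tf1.train.Saver(max_to_keep=None)
        prefix = os.path.join(ckpt_dir, "model.ckpt")
        with tf1.Session(graph=graph) as sess:
            sess.run(tf1.global_variables_initializer())
            for _ in range(num_steps):
                sess.run(train_op)
                step = int(sess.run(global_step))
                if step % save_every == 0:
                    # global_step only sets the "-<step>" suffix of the file name.
                    saved.append(saver.save(sess, prefix, global_step=step))
    return saved


saved_paths = train_with_periodic_checkpoints(tempfile.mkdtemp())
print([os.path.basename(p) for p in saved_paths])
```

This only applies when you write the training loop yourself. The train.py script in the question runs its own loop internally, so there the save interval would have to be changed wherever that script configures its training call.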
