Tuning SSD MobileNet for better performance

I'm using TensorFlow's SSD MobileNet V2 object detection code and have so far been disappointed by the results I've gotten. I'm hoping that somebody can take a look at what I've done so far and suggest how I might improve them:

Dataset

I'm training on two classes from Open Images V5 (OIV5), containing 2352 instances of "Lemon" and 2009 instances of "Cheese". I have read in several places that "state of the art" results can be achieved with a few thousand instances.
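
As a sanity check, this is roughly how the per-class counts can be verified straight from the generated records (a minimal sketch assuming the standard Object Detection API tf.Example format with an 'image/object/class/text' feature, TF 1.12, and locally downloaded shards; the path is a placeholder):

    import collections
    import glob

    import tensorflow as tf

    # Count ground-truth boxes per class across the training shards.
    counts = collections.Counter()
    for shard in glob.glob("train.record-*-of-00010"):
        for serialized in tf.python_io.tf_record_iterator(shard):
            example = tf.train.Example()
            example.ParseFromString(serialized)
            labels = example.features.feature["image/object/class/text"].bytes_list.value
            counts.update(label.decode("utf-8") for label in labels)

    # The totals should roughly line up with the instance counts quoted above
    # once the train and validation shards are both included.
    print(counts)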

Train / validation parameters

Next up I'll list my config file, which is basically the same as the default. The only changes I made were a) changed num_classes and b) increased the l2_regularizer weight, because the model was overfitting during training and validation loss started to increase after only ~25,000 iterations.

model {
  ssd {
    num_classes: 2
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.05
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00007
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00007
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 3
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 24
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "gs://MY_DIR/data/train.record-?????-of-00010"
  }
  label_map_path: "gs://MY_DIR/data/label_map.pbtxt"
}

eval_config: {
  num_examples: 870
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "gs://MY_DIR/data/val.record-?????-of-00010"
  }
  label_map_path: "gs://MY_DIR/data/label_map.pbtxt"
  shuffle: true
  num_readers: 1
}
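
For what it's worth, the exponential decay above barely changes the learning rate within the 200k training steps. A quick plain-Python sketch of how it evaluates (mirroring the continuous form of tf.train.exponential_decay; with staircase decay the rate would simply stay at 0.004 for the whole run):

    # Sketch of the exponential_decay_learning_rate from train_config above.
    initial_learning_rate = 0.004
    decay_steps = 800720
    decay_factor = 0.95

    def learning_rate(step):
        return initial_learning_rate * decay_factor ** (step / float(decay_steps))

    for step in (0, 25000, 100000, 200000):
        print(step, round(learning_rate(step), 6))
    # 0       0.004
    # 25000   0.003994
    # 100000  0.003974
    # 200000  0.003949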

Cluster setup

I didn't touch the cloud config file, but thought I'd include it for completeness.

trainingInput:
  runtimeVersion: "1.12"
  scaleTier: CUSTOM
  masterType: standard_gpu
  workerCount: 5
  workerType: standard_gpu
  parameterServerCount: 3
  parameterServerType: standard

Problem / Question

With this setup, I'm achieving a mAP that generally doesn't go above 25%. The best is mAP@0.50IOU, which briefly touches 30% before falling again at around 25k iterations.

As mentioned before, validation loss drops from roughly 12 to 7 (arbitrary units), but then starts increasing again around 25k iterations as well.
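
For reference, a minimal sketch of how these curves can be pulled out of the eval event files (assuming the standard COCO metric tags written by the Object Detection API under TF 1.12; the event-file path is a placeholder):

    import tensorflow as tf

    # Metric tags assumed to be written by the Object Detection API's COCO eval.
    TAGS = {"DetectionBoxes_Precision/mAP",
            "DetectionBoxes_Precision/mAP@.50IOU"}

    # Walk the eval event file and print (step, tag, value) for the mAP curves.
    for event in tf.train.summary_iterator("eval_dir/events.out.tfevents.EXAMPLE"):
        for value in event.summary.value:
            if value.tag in TAGS:
                print(event.step, value.tag, value.simple_value)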

Although I'm not sure exactly what results I should expect, these numbers seem low. I'm not even sure whether I should be looking to improve my dataset or my training hyperparameters. I'll accept any answer that helps put me on the right track. Please let me know if I've forgotten to include any pertinent information.

Answer

Try changing your data augmentation techniques; I believe ssd_random_crop is causing the problem.

Try using the options below in place of your current data_augmentation_options:

    data_augmentation_options {
      random_horizontal_flip {
      }
    }

    data_augmentation_options {
      random_rotation90 {
      }
    }

    data_augmentation_options {
      random_rgb_to_gray {
      }
    }

    data_augmentation_options {
      random_distort_color {
      }
    }
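
If you want to preview roughly what these options do before a full training run, here is a rough stand-in using plain TensorFlow image ops (this is not the Object Detection API's own preprocessor, box handling is omitted, and the jitter ranges are only illustrative):

    import tensorflow as tf

    def preview_augmentations(image):
        """Rough visual stand-in for the four options above.

        Assumes a float32 RGB image in [0, 1]; the real pipeline uses the
        Object Detection API preprocessor, which also adjusts the boxes.
        """
        # random_horizontal_flip
        image = tf.image.random_flip_left_right(image)
        # random_rotation90 (applied with probability 0.5 here)
        image = tf.cond(tf.less(tf.random_uniform([]), 0.5),
                        lambda: tf.image.rot90(image),
                        lambda: image)
        # random_rgb_to_gray (applied with probability 0.1 here)
        image = tf.cond(tf.less(tf.random_uniform([]), 0.1),
                        lambda: tf.image.grayscale_to_rgb(
                            tf.image.rgb_to_grayscale(image)),
                        lambda: image)
        # random_distort_color: brightness / saturation / hue / contrast jitter
        image = tf.image.random_brightness(image, max_delta=32.0 / 255.0)
        image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
        image = tf.image.random_hue(image, max_delta=0.2)
        image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
        return tf.clip_by_value(image, 0.0, 1.0)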
