How to specify steps_per_epoch and validation_steps on an infinite dataset?
I have a huge CSV dataset, about 200 GB in size. I'm using CsvDataset to build a dataset generator that streams the data from disk while training the model. I want all of the data to be consumed in each epoch, so what should I pass for the steps_per_epoch and validation_steps parameters?
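For reference on what I understand so far: steps_per_epoch is just the number of batches that make up one full pass over the data, i.e. ceil(num_examples / batch_size), and validation_steps is the same for the validation split. A minimal sketch of the arithmetic (the example counts here are made-up placeholders, not my real row counts):

```python
import math

# Hypothetical counts for illustration only -- my real dataset is 200 GB,
# so these would have to come from counting the actual CSV rows.
num_train_examples = 9_000
num_val_examples = 1_000
batch_size = 100

# One epoch = enough steps to see every example exactly once.
steps_per_epoch = math.ceil(num_train_examples / batch_size)
validation_steps = math.ceil(num_val_examples / batch_size)

print(steps_per_epoch)   # 90
print(validation_steps)  # 10
```

The open question is how to get those counts cheaply for a dataset that only exists as files on disk.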
Here is my Keras code that uses the dataset:
import pathlib
import tensorflow as tf
from tensorflow.keras import layers

training_csvs = sorted(str(p) for p in pathlib.Path('.').glob('path-to-data/Train_DS/*/*.csv'))
training_csvs
training_dataset = tf.data.experimental.CsvDataset(
    training_csvs,
    record_defaults=defaults,
    compression_type=None,
    buffer_size=None,
    header=True,
    field_delim=',',
    # use_quote_delim=True,
    # na_value=,
    select_cols=selected_indices
)
print(type(training_dataset))
for features in training_dataset.take(1):
    print('Training samples before mapping')
    print(features)
validate_ds = training_dataset.map(preprocess).take(10).batch(100).repeat()
train_ds = training_dataset.map(preprocess).skip(10).take(90).batch(100).repeat()
model = tf.keras.Sequential([
    layers.Dense(256, activation='elu'),
    layers.Dense(128, activation='elu'),
    layers.Dense(64, activation='elu'),
    layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['accuracy'])
history = model.fit(train_ds,
                    validation_data=validate_ds,
                    validation_steps=20,  # I think it's wrong, on each epoch only 20 batches would be used...
                    steps_per_epoch=20,
                    epochs=20,
                    verbose=1)
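Since CsvDataset streams records from disk, TensorFlow cannot tell me its length up front, so I assume I would need one preliminary pass over the files to get the record count that steps_per_epoch is based on. A minimal sketch of that idea using only the standard library (count_csv_rows is a hypothetical helper, not part of my pipeline):

```python
import csv
import pathlib
import tempfile

def count_csv_rows(paths):
    """Count data rows across a list of CSV files, excluding each file's header."""
    total = 0
    for p in paths:
        with open(p, newline='') as f:
            rows = sum(1 for _ in csv.reader(f))
        total += max(rows - 1, 0)  # subtract the header line
    return total

# Tiny self-contained demo with a throwaway file (made-up data):
with tempfile.TemporaryDirectory() as d:
    tmp = pathlib.Path(d) / 'tiny.csv'
    tmp.write_text('a,b\n1,2\n3,4\n')
    print(count_csv_rows([tmp]))  # 2
```

For 200 GB this pre-pass is slow but only needs to run once; the result could be cached and plugged into the steps_per_epoch arithmetic.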
Topic: epochs, keras, tensorflow, machine-learning
Category: Data Science