Models / general guidelines for joining two datasets via timestamp

So I have two datasets (dataset1, dataset2) that are related in this way:

  1. (Think of a checkout counter in a supermarket) A customer comes in with an item.
  2. The cashier immediately registers the item in dataset1, including the current time.
  3. The cashier does the usual cashier things (scanning the item, waiting for the customer to pay; the point is that this takes some time).
  4. After the customer has paid, the cashier registers the sale in dataset2, again including the current time, so the two timestamps of the same sale differ.
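Because the real data has no ground truth, one way to study the problem is to simulate the process above while keeping the true purchase id on the side. The sketch below is a hypothetical toy model, not the real system. The names `Event` and `simulate`, the minute-based timestamps, the mean of 3 minutes between customers, the uniform payment delay of up to 25 minutes and the 5% chance of a forgotten entry are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    cashier: str
    minute: float     # hypothetical unit: minutes since the store opened
    purchase_id: int  # ground truth only; the real datasets have no such column


def simulate(n_purchases=20, cashiers=("alice", "bob"), max_delay=25.0,
             p_miss1=0.05, p_miss2=0.05, seed=0):
    """Toy model of the checkout process (all parameters are assumptions).

    Returns (dataset1, dataset2) as lists of Event, each sorted by time.
    """
    rng = random.Random(seed)
    dataset1, dataset2 = [], []
    clock = {c: 0.0 for c in cashiers}
    for pid in range(n_purchases):
        cashier = rng.choice(cashiers)
        # Step 2: the item is registered when the customer arrives.
        # The cashier does not wait for earlier payments (issue 3).
        clock[cashier] += rng.expovariate(1 / 3.0)  # mean 3 min between customers
        t1 = clock[cashier]
        # Steps 3-4: paying takes a while (pennies!), up to max_delay minutes.
        t2 = t1 + rng.uniform(0.5, max_delay)
        # Issue 4: the cashier sometimes forgets one of the two entries.
        if rng.random() >= p_miss1:
            dataset1.append(Event(cashier, t1, pid))
        if rng.random() >= p_miss2:
            dataset2.append(Event(cashier, t2, pid))
    # Both logs are kept in time order, so payments appear in completion order.
    dataset1.sort(key=lambda e: e.minute)
    dataset2.sort(key=lambda e: e.minute)
    return dataset1, dataset2
```

Any join method can then be run on the two lists with `purchase_id` hidden, and its output compared with the known pairs.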

Now, I would like to join dataset1 with dataset2, but there are some issues:

  1. Apart from the timestamps, the only link between the two datasets is the cashier's name.

  2. A customer may pay in pennies, which take forever to count, so the same purchase can be registered in dataset2 up to ~25 mins after it was registered in dataset1.

  3. The cashier does not wait for a customer to finish paying before serving the next one. Suppose there are two customers (customer1, customer2). The cashier registers customer1's purchase in dataset1, then serves customer2 and registers customer2's purchase in dataset1 before customer1 has finished paying. When customer1 finally finishes paying, the cashier registers the payment in dataset2. Customer2 may actually finish paying before customer1, so customer2's payment may enter dataset2 before customer1's.

  4. The cashier is not the best: sometimes he forgets to enter the purchase in dataset1, and sometimes he forgets to enter it in dataset2.

  5. To state it explicitly, there are no unique IDs I can use to join the two datasets.
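Given these constraints, a simple baseline has two steps: (a) generate candidate pairs that share a cashier and whose dataset2 timestamp falls between 0 and 25 minutes after the dataset1 timestamp, and (b) choose a one-to-one matching among the candidates, leaving records unmatched when nothing fits (issue 4). The sketch below uses a greedy rule (shortest gap first), which is a heuristic and not guaranteed to be optimal. The function name `greedy_join`, the `(cashier, minute)` tuple layout and the 25-minute window are assumptions. A globally optimal variant would treat step (b) as a minimum-cost bipartite assignment, for example with `scipy.optimize.linear_sum_assignment`, adding dummy rows and columns whose cost is the penalty for leaving a record unmatched.

```python
from collections import defaultdict


def greedy_join(dataset1, dataset2, max_delay=25.0):
    """Heuristically match dataset1 records to dataset2 records.

    Both inputs are lists of (cashier, minute) tuples (an assumed layout).
    Returns a sorted list of (i, j) index pairs; unmatched records are absent.
    """
    # (a) Candidate generation: same cashier, payment not before registration,
    #     and within the maximum plausible delay.
    by_cashier = defaultdict(list)
    for j, (cashier, t2) in enumerate(dataset2):
        by_cashier[cashier].append((t2, j))
    candidates = []
    for i, (cashier, t1) in enumerate(dataset1):
        for t2, j in by_cashier[cashier]:
            gap = t2 - t1
            if 0 <= gap <= max_delay:
                candidates.append((gap, i, j))

    # (b) Greedy one-to-one assignment: accept the shortest gaps first and
    #     never reuse a record from either dataset.
    candidates.sort()
    used1, used2, pairs = set(), set(), []
    for gap, i, j in candidates:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            pairs.append((i, j))
    return sorted(pairs)
```

For example, with `d1 = [("alice", 0.0), ("alice", 2.0), ("bob", 1.0)]` and `d2 = [("alice", 5.0), ("bob", 30.0), ("alice", 20.0)]`, `greedy_join(d1, d2)` returns `[(0, 2), (1, 0)]`, i.e. the out-of-order pairing from issue 3, while bob stays unmatched because his 29-minute gap exceeds the window. The in-order pairing has the same total gap (23 minutes), so the data alone cannot decide between the two; only the tie-breaking rule does.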

I'm looking for general methods and tricks for joining the two datasets. I'm not hoping for a perfect join, but a metric of how good the join is would be very nice. I come from a math background, so feel free to hit me with mathematically heavy models (if any exist).
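For the quality metric, two simple checks may help. `ambiguity_profile` needs no ground truth: it counts how many dataset2 candidates each dataset1 record has within the window. Records with exactly one candidate can be joined unambiguously, records with none are probably missing their partner (or the window is too narrow), and records with several depend on the tie-breaking rule. `precision_recall` needs a set of known true pairs, e.g. a hand-labelled sample or simulated data, and reports the share of proposed pairs that are correct and the share of true pairs that were found. Both function names, the `(cashier, minute)` tuple layout, the 25-minute window and the convention of returning 1.0 for an empty set are assumptions for this sketch.

```python
from collections import Counter, defaultdict


def ambiguity_profile(dataset1, dataset2, max_delay=25.0):
    """For each dataset1 record, count the dataset2 records that could match it.

    Records are (cashier, minute) tuples (an assumed layout). Returns a Counter
    mapping 'number of candidates' -> 'number of dataset1 records'.
    """
    times2 = defaultdict(list)
    for cashier, t2 in dataset2:
        times2[cashier].append(t2)
    counts = Counter()
    for cashier, t1 in dataset1:
        n = sum(1 for t2 in times2[cashier] if 0 <= t2 - t1 <= max_delay)
        counts[n] += 1
    return counts


def precision_recall(predicted_pairs, true_pairs):
    """Score a join against known true (i, j) pairs.

    By the convention assumed here, an empty set scores 1.0.
    """
    predicted, truth = set(predicted_pairs), set(true_pairs)
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(truth) if truth else 1.0
    return precision, recall
```

On the records `[("alice", 0.0), ("alice", 2.0), ("bob", 1.0)]` and `[("alice", 5.0), ("bob", 30.0), ("alice", 20.0)]`, `ambiguity_profile` returns `Counter({2: 2, 0: 1})`: both alice records have two candidates and bob's has none.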

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.