What visualization should I choose for Monte Carlo simulations of timeline events?

I wasn't sure whether to post this question on Cross Validated or here, but since it belongs to a bigger project related to Data Science, I chose this site.

I will present a simplified version of my working project, since the original is too complicated and domain-specific.

Let's say that we have a timeline of 1 hour (60 minutes). During this period a job runs and creates user notifications at random points in time. I have written a Monte Carlo simulation to study this process.

My main questions:

  1. How well are those notifications spread over the 60-minute period?
  2. Are there parts of the timeline that contain no notifications, while the rest of the notifications are clustered at specific times?
  3. How are the answers above affected when I change the random functions in the implementation?

Pseudocode for the Monte Carlo simulation, which mimics the actual code:

Repeat one million times:
    number_of_notifications = get_random_number_of_notifications()
    previous_point = 0
    for i in range(number_of_notifications):
        interval = get_random_interval()
        new_point = previous_point + interval
        previous_point = new_point

Note: In the current implementation, two or more notifications can fall within the same minute.
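For concreteness, here is a minimal runnable sketch of this simulation in R. The two random functions are placeholders chosen only for illustration (the real ones are domain-specific), and the number of repetitions is reduced:

set.seed(1)
n_runs <- 1000   # reduced from one million for illustration

# placeholder random functions; the real ones are domain-specific
get_random_number_of_notifications <- function() sample(10:100, 1)
get_random_interval <- function() rexp(1, rate = 1)   # minutes between notifications

simulate_run <- function(run_id) {
  n <- get_random_number_of_notifications()
  times <- cumsum(replicate(n, get_random_interval()))   # notification times on the timeline
  keep <- times[times <= 60]                             # keep only events inside the 60-minute window
  data.frame(ID = rep(run_id, length(keep)), TIME = keep)
}

events <- do.call(rbind, lapply(seq_len(n_runs), simulate_run))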

First, I thought that a histogram of the notification minutes across all simulations would help me answer the first question. But then I realized that I could have one simulation with all events in the first half of the hour and another with all events in the second half, and the histogram would misleadingly suggest that the notifications are well spread.

Then I thought it might be nice to also plot the min, max, average, and standard deviation of the intervals in each simulation. But would that be enough to answer the questions?
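For reference, those per-simulation interval statistics could be computed from the sketch above roughly like this (the gaps are derived from the event times, with the first gap measured from the start of the hour):

# per-simulation summary statistics of the gaps between consecutive notifications
stats_per_run <- aggregate(TIME ~ ID, data = events, FUN = function(t) {
  iv <- diff(c(0, sort(t)))
  c(min = min(iv), max = max(iv), mean = mean(iv), sd = sd(iv))
})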

What kinds of visualizations should I try to get insights about the notifications in the Monte Carlo simulation?



Instead of (or in addition to) the histogram, you may check the density plot of the distribution of the intervals between events. In the R snippet below, the process id is identified by ID and the interval length by INT_LEN.
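One way to build such a df (assuming a data frame events with one row per simulated notification, a run id ID and an event time TIME in minutes, as in the sketch in the question) could be:

# derive the interval before each notification within its run;
# the first interval is measured from the start of the hour
events <- events[order(events$ID, events$TIME), ]
events$INT_LEN <- ave(events$TIME, events$ID, FUN = function(t) c(t[1], diff(t)))
df <- events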

library(lattice)

densityplot(~ INT_LEN, groups = ID, data = df,
            plot.points = FALSE,
            scales = list(x = list(rot = 90, cex = 0.9), y = list(cex = 0.9)),
            par.strip.text = list(cex = 0.8),
            ylab = "density", xlab = "Interval between Events",
            main = "Interval Length Density per Process")

[figure: Interval Length Density per Process]

This shows the distribution of the intervals (in my simple case there are two different distributions). One group of processes has normally distributed intervals with a mean of 1.2 (seconds), so you may argue that there are about 60 / 1.2 = 50 events per minute on average.

But this may not tell the whole story, as the time sequence is lost in this view.

To recover the time series information, I select a base interval length at which the distribution of events within the interval no longer matters. I use one minute, but if this is too coarse, go smaller. The point is that it makes no difference whether all events fall at the beginning of the interval (minute) or are uniformly distributed within it.

This choice allows the data to be aggregated at the interval level and presented as a time series. The graph below shows the count of events per minute.
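As a rough sketch of that aggregation (again assuming the events data frame with ID and TIME from above), counting notifications per minute and per run could look like:

# bucket each notification into a one-minute bin and count per bin and run
events$MINUTE_NO <- floor(events$TIME)
df <- aggregate(list(CNT = events$MINUTE_NO),
                by = list(ID = events$ID, MINUTE_NO = events$MINUTE_NO),
                FUN = length)
# minutes with no notifications are simply absent here; merge with a full
# ID x minute grid if zero counts should appear in the plot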

xy <- xyplot(CNT ~ MINUTE_NO, groups = ID, data = df,
             type = c("o", "g"),
             scales = list(rot = 90, x = list(cex = 0.9),
                           y = list(rot = 0, cex = 0.8, alternating = c(1, 1),
                                    tck = c(1, 0), relation = "free")),
             par.strip.text = list(cex = 0.8),
             ylab = "event count per minute", xlab = "time",
             main = "Time Series - Event Count per Minute")
print(xy)

[figure: Time Series - Event Count per Minute]

Here you can spot a possible trend in the distribution of the events.
