Similarity Measure of Simulated Time Series vs Observed Time Series

In my work I have an observed time series and several simulated ones. I want to compare the light curves and check for similarity to find out which simulated curve fits best, that is, which parameters simulate the light curve best. At the moment I do this with the cross-correlation function from numpy. But I am not sure that is the best option, because the light curve with the highest cross-correlation coefficient does not always look like the …
Category: Data Science
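
One possible reason the best-correlated curve can look wrong is that a correlation coefficient ignores amplitude and offset. The sketch below is only an illustration under assumptions: the three curves are made-up placeholder values, and it uses a zero-lag Pearson correlation and a root-mean-square error rather than numpy's cross-correlation function.

```python
import math

def pearson(x, y):
    """Zero-lag Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def rmse(x, y):
    """Root-mean-square error between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Placeholder light curves (hypothetical values, not from the question).
observed = [1.0, 1.2, 1.8, 2.5, 1.9, 1.3, 1.0]
sim_scaled = [2 * v + 1.0 for v in observed]        # same shape, wrong amplitude and offset
sim_close = [1.1, 1.2, 1.7, 2.4, 2.0, 1.2, 1.0]     # slightly different shape, right scale

for name, sim in [("sim_scaled", sim_scaled), ("sim_close", sim_close)]:
    print(f"{name}: correlation={pearson(observed, sim):.4f}, rmse={rmse(observed, sim):.4f}")
```

Here the rescaled curve correlates essentially perfectly with the observation while having the larger error, so reporting an error-based measure alongside the correlation may help when amplitude matters.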

Stack as many industrial components as possible in a crate

The exact problem is a crate of industrial parts, made by injection molding in very high quantities. The objective is to put as many parts as possible in one crate. This is done by a small robotic arm that takes the part from the injection molding machine, cools it a little and puts it in the crate. The shape of the parts can be a bit complex. It can be basically anything that can be molded into a 2 parts …
Category: Data Science

How to interpret data projected on the sum of the first few principal components weighted by eigenvalues?

I have simulation time series data of a molecule from Molecular dynamics and I want to visualize the very high-dimensional trajectory in two dimensions and also identify some clusters. The problem is that when I do PCA, the first 20 eigenvectors are needed to explain 80% of the variance. Is it possible to add the first 10 eigenvectors and get a single vector V1 and add up the 11th to 20th vector as V2 all components weighted by their eigen …
Category: Data Science

Building a simulator for continuous state, discrete action reinforcement learning

I am trying to build a simulator that optimizes the performance and temperature of a device. I want the device to perform well, but without making the device too hot. If the device becomes too hot, I want the internal circuitry to push down the device performance to reduce the temperature. It is hard to perform repeated ground truth experiments on the device so I need to build a simulator in which to train the agent. I am new to …
Category: Data Science

Any books or resources about how to approach "purely synthetic expressions" of physical phenomena?

Over and over again I come to think that "it's cumbersome to collect empirical data". Yet it's often viewed as a necessity for explaining empirical phenomena. But then I idealize that: It would be so nice if I could describe a phenomenon simply by describing a simple model with variable parameters and then generate instances from it to describe the empirical phenomenon. But I've been puzzled by the validation phase of this in particular, since often "validation" means "to compare to …
Category: Data Science

How to find out if two datasets are close to each other?

I have the following three datasets.
data_a=[0.21,0.24,0.36,0.56,0.67,0.72,0.74,0.83,0.84,0.87,0.91,0.94,0.97]
data_b=[0.13,0.21,0.27,0.34,0.36,0.45,0.49,0.65,0.66,0.90]
data_c=[0.14,0.18,0.19,0.33,0.45,0.47,0.55,0.75,0.78,0.82]
data_a is real data and the other two are simulated. I am trying to check which one (data_b or data_c) most closely resembles data_a. Currently I do this visually and with the ks_2samp test (Python): I plot the CDF of the real data against the CDF of the simulated data and judge by eye which one is closest. Above is the cdf of data_a …
Category: Data Science
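
The visual CDF comparison described above can be turned into a number: the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between two empirical CDFs. The following is a minimal standard-library sketch using the three lists from the question; the helper names `ecdf` and `ks_statistic` are introduced here for illustration and are not part of any library.

```python
from bisect import bisect_right

data_a = [0.21, 0.24, 0.36, 0.56, 0.67, 0.72, 0.74, 0.83, 0.84, 0.87, 0.91, 0.94, 0.97]
data_b = [0.13, 0.21, 0.27, 0.34, 0.36, 0.45, 0.49, 0.65, 0.66, 0.90]
data_c = [0.14, 0.18, 0.19, 0.33, 0.45, 0.47, 0.55, 0.75, 0.78, 0.82]

def ecdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def ks_statistic(sample_1, sample_2):
    """Largest vertical gap between the two empirical CDFs."""
    f1, f2 = ecdf(sample_1), ecdf(sample_2)
    # Both CDFs are step functions, so the gap only changes at observed values.
    points = sorted(set(sample_1) | set(sample_2))
    return max(abs(f1(x) - f2(x)) for x in points)

ks_ab = ks_statistic(data_a, data_b)
ks_ac = ks_statistic(data_a, data_c)
print(f"KS(data_a, data_b) = {ks_ab:.3f}")
print(f"KS(data_a, data_c) = {ks_ac:.3f}")
```

A smaller statistic means the empirical CDFs lie closer together. With samples this small the difference between two statistics may not be meaningful on its own; `scipy.stats.ks_2samp` reports a p-value alongside the same statistic.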

How to build a simulator for a physical machine given a set of datapoints of its behaviour?

I have a database with millions of datapoints describing the behaviour of a heat pump. For every second, I know various temperature, pressure, mass flow and power measurements as a response to the signals sent by a controller. In other words, I have records of what the machine is being told to do and what it actually does. I would like to build a simulator that, given a set of artificial inputs (e.g. coming from a web page) attempts to simulate …
Category: Data Science

Pull Random Numbers from my Data (Python)

Let's imagine I have a series of numbers that represents cash flows into some account over the past 30 days. This data is non-normal but it does represent some distribution. I would like to pull "new" numbers from this distribution in an effort to create a Monte Carlo simulation based on the numerical data I have. How can I accomplish this? I've seen methods where you assume the data is normal & pull numbers based on some …
Category: Data Science
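
One way to draw "new" numbers without assuming normality is to resample the observed values with replacement (a bootstrap). The sketch below is an illustration under assumptions: the cash-flow values are made-up placeholders, and `simulate_totals` is a hypothetical helper name.

```python
import random

# Hypothetical daily cash flows; placeholder values, not data from the question.
cash_flows = [120.0, 80.5, 310.0, 95.0, 40.0, 560.0, 75.0, 130.0, 220.0, 60.0]

def simulate_totals(observed, horizon, n_runs, seed=0):
    """Simulate n_runs totals over `horizon` days by resampling observed values."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.choices(observed, k=horizon)) for _ in range(n_runs)]

totals = sorted(simulate_totals(cash_flows, horizon=30, n_runs=1000))
low = totals[int(0.05 * len(totals))]
high = totals[int(0.95 * len(totals))]
print(f"90% of simulated 30-day totals fall between {low:.1f} and {high:.1f}")
```

Resampling can only reproduce values that were actually observed. If smoother draws are wanted, fitting a kernel density estimate (for example with `scipy.stats.gaussian_kde` and its `resample` method) is a common alternative.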

How can I build a simulation environment that assess different risk policies?

I work in fin-tech and would like to build some sort of simulation program to assess how different inputs will impact net revenue. For example, if we create new policies based on ML scores, how would those have impacted our loss and revenue metrics? While we can and do run online experiments, it would be desirable to simulate these impacts ahead of time. Aside from something like reinforcement learning, I was thinking that Monte Carlo simulations might be the best …
Category: Data Science

What is the difference between domain randomization and data augmentation?

Domain randomization (https://arxiv.org/abs/1703.06907) is used to create a synthetic dataset with enough variance that it will encompass unseen real data, as just one variation. I am trying to understand how this is different from applying data augmentation techniques to a synthetically generated dataset.
Category: Data Science

Understanding how to simulate a statistic

This solution describes how to simulate statistics to find a confidence interval. A journalist called 1000 people in town to ask who they will be voting for out of candidates A and B. The observed value came out to be 511 votes for A and 489 votes for B. This makes us think that candidate A will win. But we need to know if this sample is truly representative of the underlying population distribution. To find this, we simulate this …
Category: Data Science
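
A simulation of this kind can be sketched as follows. This is not necessarily the procedure used in the solution the question refers to; it assumes the true support for A equals the observed share 511/1000 and repeatedly re-runs a poll of 1000 voters, and `simulate_shares` is a name chosen for illustration.

```python
import random

def simulate_shares(p_a, n_voters, n_sims, seed=0):
    """Simulate n_sims polls of n_voters people; return A's vote share in each poll."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    shares = []
    for _ in range(n_sims):
        votes_a = sum(rng.random() < p_a for _ in range(n_voters))
        shares.append(votes_a / n_voters)
    return shares

shares = sorted(simulate_shares(p_a=511 / 1000, n_voters=1000, n_sims=2000))
lower = shares[int(0.025 * len(shares))]   # 2.5th percentile
upper = shares[int(0.975 * len(shares))]   # 97.5th percentile
print(f"Approximate 95% interval for A's share: {lower:.3f} to {upper:.3f}")
```

If the resulting interval includes 0.5, the poll alone does not settle which candidate is ahead.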

Visualization of multiple Markov models

I am working on a project where we compare over 10 different Markov models, each representing a different treatment plan. Most often single models are visualized with a decision tree or transition state diagram. However, with multiple different models what are potential visualizations that could communicate the transition states that differentiate each model? I have seen other people use a table to depict different models and the transition states. For clarity, I am not referring to a transition probabilities chart …
Category: Data Science

Ways to simulate weather data over several periods (Python or R)?

I have a time series dataset that has several variables for a state/province for fixed periods of time. That is for state A, there are samples from April 2017 to July 2019. Of course, I thought adding precipitation and temperature variables would be a great idea. I tried finding some relevant external data but most of it is abstract and spread out. How would one simulate dynamic data in Python with varying means, highs and lows for say six months …
Category: Data Science

Finding similarity between two datasets

I have two datasets. One is the actual percentage of white population in counties in an American state and the other is the simulated percentage of white population in counties in an American state. Bits about my simulation: It is a random simulation done on a California map with two different agents, white and minority. Their total population is based on the real white to minority ratio in California. For example if there is 70% white and 30% minority in California then …
Category: Data Science

Estimating the value of $\pi$ with a Monte Carlo dartboard: $<$ or $\leq$?

I'm trying to figure out which is the proper way to estimate $\pi$ using the Monte Carlo method randomly distributing points in a square that also contains an inscribed circle. Some sources say to use the comparison of $\sqrt{x^2+y^2}\le 1$, while others use $\sqrt{x^2+y^2}<1$. Here's some example code from a Wikipedia article:

    def monte_carlo_pi(nsamples):
        acc = 0
        for i in range(nsamples):
            x = random.random()
            y = random.random()
            if (x**2 + y**2) < 1.0:
                acc += 1
        return 4.0 * …
Category: Data Science
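
To see how much the choice of comparison matters in practice, both variants can be run on the same random points. The sketch below is a self-contained version written for this comparison, not the code from the article; the `inclusive` and `seed` parameters are additions for illustration, and the estimator is assumed to be four times the fraction of points that fall inside the quarter circle.

```python
import random

def monte_carlo_pi(nsamples, inclusive=False, seed=None):
    """Estimate pi from points drawn uniformly in the unit square."""
    rng = random.Random(seed)
    acc = 0
    for _ in range(nsamples):
        x = rng.random()
        y = rng.random()
        d2 = x * x + y * y
        # The square root is not needed: sqrt(d2) < 1 exactly when d2 < 1.
        inside = d2 <= 1.0 if inclusive else d2 < 1.0
        if inside:
            acc += 1
    return 4.0 * acc / nsamples

strict = monte_carlo_pi(100_000, inclusive=False, seed=1)
loose = monte_carlo_pi(100_000, inclusive=True, seed=1)
print(f"strict (<):  {strict:.5f}")
print(f"loose  (<=): {loose:.5f}")
```

The two estimates can differ only if a sampled point lands exactly on the circle. For a continuous uniform distribution that event has probability zero, and with floating-point samples it is extremely rare, so either comparison should give effectively the same estimate.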

ML/Statistical Model to Analyse the Distribution

Consider the sample dataset below:

ShopID  Transactions  dist_to_shop
S1      15478         0
S2      12345         0.41
S3      17865         0.11
S4      35479         0.57
S5      74589         0.35

The dataset consists of the ShopID, Transactions and dist_to_shop (in meters) fields. Assuming all the shops belong to one retailer, I would like to find out the distribution of transactions/people visits to the other shops, by assigning weights/business rules on the basis of the distance. For example, the weights can be given as: 0-200 Meters = 40% …
Category: Data Science

Modeling uncertainty from Logistic Regression

Logistic regression is one part of a simulation pipeline that I use for some scenario analysis. The dataset that this is based on is not small but relatively noisy, and has only one explanatory variable/feature. Of course I can say something about this uncertainty using frequentist or Bayesian methods but I would like to use this in the sequential simulation step as well, to get a fairer final estimate. What I'm planning on doing should work but is somewhat computationally expensive …
Category: Data Science

Testing Multi-Arm Bandits on Historical Data

Suppose I want to test a multi-arm bandit algorithm in the contextual setting on a set of historical data. For simplicity, let's assume there are only two arms A and B and suppose the rewards are binary. Furthermore, suppose I have a data set where users were shown one of the two arms and I have a record of the rewards. What would be the best approach to simulating the scenario of running the algorithm online? I was thinking of …
Category: Data Science

What visualization should I choose for Monte Carlo simulations of timeline events?

I wasn't sure if I should open this question in Cross Validated or here. But since the question belongs to a bigger project related to Data Science, I chose this one. I will present a simplified version of my working project, since the original is too complicated and domain specific. Let's say that we have a timeline of 1 hour (60 minutes). During this period a job starts running and creates user notifications at random points. I have written a …
Category: Data Science
