Considering Bayesian posterior inference, which distribution does Monte Carlo sampling take samples from: the posterior or the prior? The posterior is intractable because the denominator (the evidence) is an integral over all possible values of theta. So, if Monte Carlo samples from the posterior distribution, I am confused as to how the posterior distribution is known, given that it is intractable. Could someone please explain what I am missing? If Monte Carlo samples from the prior distribution, how do the samples approximate the posterior distribution?
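For reference, a standard statement of the quantity in question (not specific to any particular model): the posterior is
$$p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{\int p(D \mid \theta')\,p(\theta')\,d\theta'},$$
and it is the integral in the denominator (the evidence) that is typically intractable. MCMC-style Monte Carlo methods construct a chain whose stationary distribution is this posterior while only ever evaluating the unnormalized numerator $p(D \mid \theta)\,p(\theta)$.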
As I understand it, in reinforcement learning, off-policy Monte Carlo control is when the state-action value function $Q(s,a)$ is estimated as a weighted average of the observed returns. However, in Q-learning the value of $Q(s, a)$ is estimated as the maximum expected return. Why is this not used in Monte Carlo control? Suppose I have a simple 2-dimensional bridge game, where the objective is to get from A to B. I can move left, right, up or down. Let's say …
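For reference, the two updates being contrasted, in their standard textbook forms (Sutton and Barto), not taken from the question itself: off-policy Monte Carlo control with weighted importance sampling averages full observed returns $G$,
$$Q(s,a) \leftarrow Q(s,a) + \frac{W}{C(s,a)}\bigl(G - Q(s,a)\bigr),$$
where $W$ is the importance-sampling ratio and $C(s,a)$ the cumulative sum of the weights, while Q-learning bootstraps from the maximum estimated value of the next state,
$$Q(s,a) \leftarrow Q(s,a) + \alpha\bigl(r + \gamma \max_{a'} Q(s',a') - Q(s,a)\bigr).$$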
In the Monte Carlo-based action value estimation problem for a deterministic policy (estimation of $q_{\pi}(s,a)$), the estimation problem seems not to be well-defined, because $q_{\pi}(s,a)$ by definition means the value of an arbitrary action $a$ at a given state $s$ when that initial action $a$ is applied at that state and actions from policy $\pi$ are followed at the subsequent states. But, in a real application under a given deterministic policy $\pi$, how can you choose the initial action $a$ arbitrarily at state …
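For reference, the definition being discussed, in its standard form:
$$q_\pi(s,a) = \mathbb{E}\bigl[G_t \mid S_t = s,\ A_t = a,\ A_{t+1}, A_{t+2}, \ldots \sim \pi\bigr],$$
i.e. the first action $a$ is imposed regardless of what the deterministic $\pi(s)$ would pick, and $\pi$ is only followed from the next state onward; Monte Carlo methods usually make this estimable under a deterministic policy via exploring starts, which begin episodes from randomly chosen state-action pairs.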
Assume I have recorded the duration of 10 tasks and built the table below using this data:

Duration    Tasks with that duration
4 days      5
6 days      2
8 days      2
10 days     1

Looking at this table, one can easily conclude that there's a 50% chance that a task will last 4 days. Therefore, my Monte Carlo simulation will yield "4 days" as the task duration 50% of the time. However, there's …
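A minimal sketch of the sampling step described above, assuming nothing beyond the durations and counts in the table:

```python
import numpy as np

rng = np.random.default_rng(0)

durations = np.array([4, 6, 8, 10])   # days
counts    = np.array([5, 2, 2, 1])    # number of tasks observed with each duration
probs     = counts / counts.sum()     # empirical probabilities: 0.5, 0.2, 0.2, 0.1

# Each simulated task duration is drawn with its empirical probability,
# so "4 days" comes up about 50% of the time over many draws.
simulated = rng.choice(durations, size=10_000, p=probs)
print(simulated.mean())               # average simulated duration, roughly 5.8 days
```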
I was trying to figure out what a Markov chain Monte Carlo (MCMC) method is. From what I understand, it is a way of computing an approximation of a probability distribution which cannot be computed exactly. So we keep sampling from a probability distribution in order to be more accurate, reducing the variance of the estimate by increasing the number of samples, and these samples are given by Gibbs sampling. This step-by-step process is a Markov chain, but I don't really get the details …
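A minimal sketch of the Gibbs-sampling loop being described, using a toy target (a standard bivariate normal with correlation rho, chosen purely for illustration): each step draws one coordinate from its conditional distribution given the current value of the other, and the sequence of draws is the Markov chain.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                          # correlation of the toy bivariate normal target
n_samples, burn_in = 20_000, 1_000

x, y = 0.0, 0.0
samples = []
for t in range(n_samples + burn_in):
    # Conditionals of a standard bivariate normal: x | y ~ N(rho*y, 1 - rho^2)
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    if t >= burn_in:               # discard the first draws as burn-in
        samples.append((x, y))

samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])  # should be close to rho
```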
I have a text classifier with 3 dropout layers. I tried to use the Monte Carlo Dropout (MCD) technique to improve its performance; however, its performance hasn't improved. MCD improved performance when classifying hand-written digits on the MNIST dataset. Now I wonder whether there is simply no room for improving my text classifier, or whether I have selected an incorrect dropout rate. How do I find the optimal dropout rate for Monte Carlo Dropout? In particular: should I use the same dropout rate during both …
The code below runs without any problem; however, when I run the same code as a Monte Carlo analysis with 1000 runs, it gives an IndexError. Can someone explain why this happens? Thanks

```python
X = df1.drop("Gender", axis=1)
y = df1.Gender
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
nb = CategoricalNB()
nb.fit(X_train, y_train)
nb_pred = nb.predict(X_test)
nb_accuracy = accuracy_score(y_test, nb_pred)
nb_accuracy
```

output: 0.6279486413854882

```python
X = df1.drop("Gender", axis=1)
y = df1.Gender
for i in range(1000):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=i)
    # CategoricalNB Naive Bayes model
    nb = …
```
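Without df1 this can't be confirmed, but a likely cause is that CategoricalNB indexes each feature's categories by the integer codes it saw during fit; when a random split puts a category value into the test set that never appeared in the training set, predict can raise an IndexError. A minimal sketch of one possible workaround, assuming the features are 0-based integer codes and a scikit-learn version that has the min_categories parameter (0.24+):

```python
from sklearn.naive_bayes import CategoricalNB

# Tell CategoricalNB how many categories each feature can take overall, so that
# a category seen only in the test split still has a valid index at predict time.
n_categories = X.max(axis=0).to_numpy() + 1   # assumes 0-based integer codes per feature
nb = CategoricalNB(min_categories=n_categories)
nb.fit(X_train, y_train)
nb_pred = nb.predict(X_test)
```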
I am new to training reinforcement learning agents. I have read about the PPO algorithm and used the stable-baselines library to train an agent with PPO. So my question is: how do I evaluate a trained RL agent? For a regression or classification problem I have metrics like r2_score or accuracy, etc. Are there any such metrics for RL, or how do I test the agent and conclude whether it is trained well or badly? Thanks
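There is no single accuracy-like number, but a common starting point is the mean episode return over fresh evaluation episodes. A minimal sketch, assuming stable-baselines3 (the older stable-baselines has a similar evaluate_policy helper) and a separate evaluation environment eval_env:

```python
from stable_baselines3.common.evaluation import evaluate_policy

# Average undiscounted episode return over fresh evaluation episodes,
# acting deterministically so exploration noise does not blur the estimate.
mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=20, deterministic=True
)
print(f"mean episode reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```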
I have a dataset that I've been playing around with for school. I have gotten very good results with a bunch of methods (Ridge, Lasso, ElasticNet, SVM, Bagging, Stacking and even a NN). Now I have a range of different coefficients for my predictors. Is it a good idea to use them as my priors (I did so, and I think the result has been OK), or should I use noninformative priors instead? If it is a bad idea, could you explain …
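For what it's worth, a minimal sketch of what "using them as priors" might look like in a Bayesian linear regression, assuming PyMC and the hypothetical names X, y and beta_prev (the coefficients taken from an earlier fit, e.g. Ridge); the prior scale of 1.0 is an arbitrary placeholder:

```python
import pymc as pm

# X: (n, p) predictor matrix, y: (n,) response,
# beta_prev: (p,) coefficients from an earlier fit, used only to centre the priors.
with pm.Model() as model:
    beta = pm.Normal("beta", mu=beta_prev, sigma=1.0, shape=len(beta_prev))
    intercept = pm.Normal("intercept", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=1.0)
    mu = intercept + pm.math.dot(X, beta)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    trace = pm.sample(1000, tune=1000)
```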
I am trying to understand an MCMC program. I manage to run it, but I am trying to understand the meaning of some of the parameters in the analysis. The code is something like this:

```python
# Nsamples
nsamp = 50000
# Burn-in
skip = 300
# temperature at which to sample
temp = 2
# Gelman-Rubin for convergence
GRstop = 0.01
# check the GR criterion every this number of steps
checkGR = 500
# 1 if single cpu, otherwise it is given by nproc -> mpi -np # …
```
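For context on what GRstop = 0.01 is checking (my reading of the standard Gelman-Rubin diagnostic, not confirmed from this particular program): with $m$ chains of length $n$, $W$ the average within-chain variance and $B$ equal to $n$ times the variance of the per-chain means,
$$\hat{R} = \sqrt{\frac{\tfrac{n-1}{n}\,W + \tfrac{1}{n}\,B}{W}},$$
and sampling is typically stopped once $\hat{R} - 1$ falls below a threshold such as 0.01, checked here every checkGR = 500 steps.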
So I can generate X from an arbitrary CDF F(x) by the procedure above. Can it be generalized to two variables? How, exactly? If not, what's the best way to generate (X,Y) from an arbitrary CDF F(x,y) or PDF f(x,y)?
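A sketch of the standard generalization (sequential/conditional inversion), assuming the marginal and conditional CDFs can be computed and inverted: draw independent $U_1, U_2 \sim \mathrm{Uniform}(0,1)$ and set
$$X = F_X^{-1}(U_1), \qquad Y = F_{Y \mid X}^{-1}(U_2 \mid X),$$
where $F_X(x) = \lim_{y \to \infty} F(x,y)$ is the marginal CDF of $X$ and $F_{Y \mid X}(y \mid x) = \dfrac{\partial F(x,y)/\partial x}{f_X(x)}$ is the conditional CDF of $Y$ given $X = x$.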
Let's imagine I have a series of numbers that represents cash flows into some account over the past 30 days in some time window. This data is non-normal, but it does represent some distribution. I would like to pull "new" numbers from this distribution in an effort to create a Monte Carlo simulation based on the numerical data I have. How can I accomplish this? I've seen methods where you assume the data is normal and pull numbers based on some …
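A minimal sketch of two common ways to do this without a normality assumption, using only numpy/scipy; daily_cash_flows is a hypothetical name for the 30 observed values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
cash_flows = np.asarray(daily_cash_flows)   # the 30 observed values

# Option 1: bootstrap -- resample the observed values with replacement,
# here building 10,000 simulated 30-day windows.
sim_bootstrap = rng.choice(cash_flows, size=(10_000, 30), replace=True)

# Option 2: smooth the empirical distribution with a Gaussian KDE and sample from it.
kde = stats.gaussian_kde(cash_flows)
sim_kde = kde.resample(10_000)              # shape (1, 10_000)
```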
I understand how to use MC dropout from this answer, but I don't understand how MC dropout works, what its purpose is, and how it differs from normal dropout.
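A minimal sketch of the mechanical difference, assuming a trained Keras model that contains Dropout layers and a batch of inputs x: normal dropout is switched off at prediction time, whereas MC dropout keeps it switched on and averages several stochastic forward passes, using the spread across passes as an uncertainty estimate.

```python
import numpy as np

# model: a trained tf.keras model containing Dropout layers; x: a batch of inputs.
T = 50  # number of stochastic forward passes

# training=True keeps the Dropout layers active at inference time.
preds = np.stack([model(x, training=True).numpy() for _ in range(T)])

mean_pred = preds.mean(axis=0)     # MC-dropout prediction
uncertainty = preds.std(axis=0)    # spread across passes ~ predictive uncertainty
```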
Watching this video (11:30) that presents the simplest algorithm for reinforcement learning, Monte Carlo policy evaluation, which says in general: the first time a state is visited, increment $N(s)$: $N(s) = N(s) + 1$, and increment the state's total return by the episode's return from that point on: $S(s) = S(s) + G_t$. The state's value is estimated by the mean return over many episodes: $V(s) = S(s) / N(s)$; by the law of large numbers, $V(s) \to V_{\text{true}}(s)$ as $N(s) \to \infty$. My question is: should the environment …
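A minimal sketch of the first-visit update described above, assuming episodes are available as lists of (state, reward) pairs, where reward is the reward received after visiting that state:

```python
from collections import defaultdict

def first_visit_mc_evaluation(episodes, gamma=1.0):
    """episodes: list of [(state, reward), ...] trajectories generated under the policy."""
    N = defaultdict(int)     # visit counts
    S = defaultdict(float)   # accumulated returns
    V = {}
    for episode in episodes:
        # Compute the return G_t from each time step, working backwards.
        G, returns = 0.0, []
        for _, reward in reversed(episode):
            G = reward + gamma * G
            returns.append(G)
        returns.reverse()
        seen = set()
        for (state, _), G_t in zip(episode, returns):
            if state in seen:        # first-visit: count each state once per episode
                continue
            seen.add(state)
            N[state] += 1
            S[state] += G_t
            V[state] = S[state] / N[state]
    return V
```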
I work in fin-tech and would like to build some sort of simulation program to assess how different inputs will impact net revenue. For example, if we create new policies based on ML scores, how would those have impacted our loss and revenue metrics? While we can and do run online experiments, it would be desirable to simulate these impacts ahead of time. Aside from something like reinforcement learning, I was thinking that Monte Carlo simulations might be the best …
First of all, sorry if this is not the proper place to ask, but I have been trying to create some dummy data in order to run a Student's t-test as well as a Welch t-test and then run a Monte Carlo simulation. The problem is, I am only given the sample size and standard deviation of the 2 populations. How can I go about creating some sort of representation of this data so that I can run these tests? I wish …
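A minimal sketch of one way this is often set up, assuming normal populations; note that sample sizes and standard deviations alone are not enough to simulate, so the group means mu1 and mu2 below are extra assumptions (all numbers are placeholders):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Given quantities (sample sizes and standard deviations); the means are assumed.
n1, sd1, mu1 = 30, 4.0, 10.0
n2, sd2, mu2 = 25, 7.0, 12.0

n_sims, rejections = 5000, 0
for _ in range(n_sims):
    x1 = rng.normal(mu1, sd1, size=n1)
    x2 = rng.normal(mu2, sd2, size=n2)
    # equal_var=False gives the Welch t-test; equal_var=True gives the Student's t-test.
    t_stat, p_value = stats.ttest_ind(x1, x2, equal_var=False)
    rejections += p_value < 0.05

print("estimated power of the Welch test:", rejections / n_sims)
```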
I am training an RL agent using the PPO algorithm for a control problem. The objective of the agent is to maintain the temperature in a room. It is an episodic task with an episode length of 9 hrs and a step size (an action being taken) of 15 mins. During training, from a given state the agent takes an action. Then I check the temperature of the room after 15 mins (one step), and if this temperature is within limits, I give the action …
Hi, I am training an RL agent for a control problem. The objective of the agent is to maintain the temperature in a zone. It is an episodic task with an episode length of 10 hrs and actions taken every 15 mins. Ambient weather is one of the state variables during training. For the training process, a profile of ambient temperature has been generated for each hour of the day and used for training. I have trained the agent using PPO …
I am training an RL agent for a control problem using the PPO algorithm, with the stable-baselines library. The objective of the agent is to maintain a temperature of 24 deg in a zone, and it takes actions every 15 mins. The length of an episode is 9 hrs. I have trained the model for 1 million steps and the rewards have converged, so I assume that the agent is trained enough. I have done some experiments and have a few questions …
I think I might need the help of this valuable community for a task. I have been given a dataset of 100 numerical independent variables (IVs) that predict the output for 200 numerical values (from Monte Carlo simulation results). Which statistical technique should I start exploring and trying on my dataset? The number of observations can be increased to give more points and enhance the learning of an algorithm. From this, I would like to learn a few insights, such as the multiple collinearity …
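Since multicollinearity is mentioned, one small concrete starting point (a sketch, assuming the 100 IVs are in a pandas DataFrame X) is to compute variance inflation factors with statsmodels:

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# X: DataFrame with the 100 numerical independent variables.
X_const = add_constant(X)  # VIFs are usually computed with an intercept included
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(X_const.shape[1])],
    index=X_const.columns,
)
# VIF values above roughly 10 are often taken as a sign of strong collinearity.
print(vif.sort_values(ascending=False).head(10))
```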