Latent variables with thresholds

There are many ML techniques for estimating latent variables, such as the EM algorithm. Is there a technique that allows a threshold (bounded range) for each latent variable? I have a feature space with 10 variables $(X_1,\dots,X_{10})$ and the outcome $Y$. Seven of the $X$ features are known (I have their observations) and three are unknown. Each of the unknown variables can lie within a range from 0 up to a positive constant. What ML technique would you recommend for …
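One way to encode such thresholds, assuming the goal is maximum-likelihood estimation of the three unknowns, is to treat them as parameters and optimize under box constraints. A minimal sketch with `scipy.optimize.minimize` and the L-BFGS-B solver; the objective `neg_log_lik`, the bound `c`, and the toy data are all hypothetical placeholders, not the question's actual model:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X_known = rng.normal(size=(100, 7))   # the 7 observed features (toy data)
y = rng.normal(size=100)              # the outcome (toy data)

def neg_log_lik(z, X_known, y):
    # Hypothetical objective: replace with the actual likelihood of your model.
    pred = X_known.sum(axis=1) + z.sum()
    return np.sum((y - pred) ** 2)

c = 5.0                               # assumed upper bound for each latent variable
res = minimize(neg_log_lik, x0=np.full(3, c / 2), args=(X_known, y),
               method="L-BFGS-B",
               bounds=[(0.0, c)] * 3)  # threshold constraint: 0 <= z_k <= c
z_hat = res.x
```

The same bounded update could, in principle, replace the M-step inside an EM loop if a full latent-variable treatment is needed.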
Category: Data Science

Math behind MSE = bias^2 + variance

Based on the Deep Learning book: $$\text{MSE} = \mathbb{E}\big[(\hat\theta_m - \theta)^2\big] = \mathrm{Bias}(\hat\theta_m)^2 + \mathrm{Var}(\hat\theta_m),$$ where $m$ is the number of samples in the training set, $\theta$ is the true parameter, and $\hat\theta_m$ is the estimated parameter. I can't get from the first expression to the second. Further, I don't understand how the first expression is obtained. Note: $\mathrm{Bias}(\hat\theta_m) = \mathbb{E}[\hat\theta_m] - \theta$. Also, how are bias and variance evaluated in classification?
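For reference, a sketch of the standard derivation (nothing here is specific to the book's notation; the key point is that $\theta$ is a fixed constant, so only $\hat\theta_m$ is random):

```latex
\begin{aligned}
\mathbb{E}\big[(\hat\theta_m - \theta)^2\big]
  &= \mathbb{E}[\hat\theta_m^2] - 2\theta\,\mathbb{E}[\hat\theta_m] + \theta^2 \\
  &= \big(\mathbb{E}[\hat\theta_m^2] - \mathbb{E}[\hat\theta_m]^2\big)
   + \big(\mathbb{E}[\hat\theta_m]^2 - 2\theta\,\mathbb{E}[\hat\theta_m] + \theta^2\big) \\
  &= \mathrm{Var}(\hat\theta_m) + \big(\mathbb{E}[\hat\theta_m] - \theta\big)^2
   = \mathrm{Var}(\hat\theta_m) + \mathrm{Bias}(\hat\theta_m)^2
\end{aligned}
```

The second line just adds and subtracts $\mathbb{E}[\hat\theta_m]^2$, which splits the expansion into the variance and the squared bias.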
Category: Data Science

Subtraction of two variances (scores)

I was wondering: when we treat the variances of two populations as random variables in their own right (or as scores), would it be correct to simply take the difference, $V_{\text{subtract}} = V_{\text{pop1}} - V_{\text{pop2}}$ (e.g. $V_{\text{subtract}} = 1 - 0.5 = 0.5$)? If so, I am wondering what that says about the actual standard error in terms of this subtracted variance score, if we know the total sample sizes of population 1 and population 2 respectively, which …
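One standard fact worth keeping in view (it may or may not be what the truncated question is after): for independent populations, variances add under subtraction of the scores themselves, and the standard error of a difference of sample means uses that sum, not the difference:

```latex
\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y),
\qquad
\mathrm{SE}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{V_1}{n_1} + \frac{V_2}{n_2}}
```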
Category: Data Science

GridSearchCV best coefficients do not match well with the expected line

I wrote a program to find the best combination of coefficients to describe a variable. However, the coefficients from GridSearchCV do not match the expected line well. This is a sample of my data:

pipe = make_pipeline(process, SelectKBest(f_regression), model)
gs = GridSearchCV(pipe, params, n_jobs=-1, cv=5, return_train_score=False)
gs.fit(x_train, y_train)
fin = gs.best_estimator_.steps[2][1]
coef = fin.coef_
intercept = fin.intercept_

and these are the coefficients given. Then if I plot the line with the coefficients:

xplot = 16.15589 + 1.13934372*df_loc.ChargeAmount + 1.605411*df_loc.PatientPrice + 6.81365603*df_loc.LastCost …
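One likely culprit (an assumption, since the `params` grid and the `process` step are not shown): after `SelectKBest`, the final model's `coef_` is aligned with the *selected* features, not with the original columns, so pairing the coefficients with the original column names by position misattributes them. A sketch of mapping them back, continuing from the fitted `gs` above and assuming `x_train` is a pandas DataFrame:

```python
import numpy as np

best = gs.best_estimator_
selector = best.steps[1][1]        # the SelectKBest step of the pipeline
model = best.steps[2][1]           # the final regression step

mask = selector.get_support()      # boolean mask over the original feature columns
selected_names = np.array(x_train.columns)[mask]
for name, c in zip(selected_names, model.coef_):
    print(name, c)                 # coefficient paired with the feature it belongs to
```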
Category: Data Science

How can I compute the correlation between two estimators?

I'm working with several estimators of all kinds. I want to stack these estimators, and ideally they should have low correlation with one another. I suppose the correlation method depends on the type of dependent variable, i.e. whether it's categorical or numerical. In my case it's categorical, and the estimators are classifiers. How can I compute the correlation between two estimators?
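For classifiers, one common proxy for "correlation" is agreement between their predictions on a held-out set, e.g. Cohen's kappa (one reasonable choice among several; the two models below are arbitrary stand-ins):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Two stand-in classifiers; compare their predictions on the same held-out data
pred_a = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_val)
pred_b = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).predict(X_val)

# kappa near 1 -> the classifiers mostly agree; near 0 -> chance-level agreement
print(cohen_kappa_score(pred_a, pred_b))
```

Low kappa between base models is a rough signal that stacking them may add diversity.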
Category: Data Science

Offline/Batch Reinforcement Learning: Doubly Robust Off-policy Estimator takes huge values

Context: my team and I are working on an RL problem for a specific application. We have data collected from user interactions (states, actions, etc.). It is too costly for us to emulate agents, so we decided to concentrate on offline RL techniques. For this, we are currently using the RL-Coach library by Intel, which offers support for batch/offline RL. More specifically, to evaluate policies in offline settings, we train a DDQN-BCQ model and evaluate the learned policies using offline …
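For reference, the one-step (contextual-bandit) form of the doubly robust estimator combines a learned value model with an importance-weighted correction; huge values typically come from the importance ratio blowing up wherever the target policy puts mass on actions the behavior policy almost never took. A minimal sketch, not RL-Coach's implementation (the sequential version used in RL adds a recursion over time steps):

```python
import numpy as np

def doubly_robust(rewards, q_hat, v_hat, pi_target, pi_behavior):
    """One-step doubly robust estimate of the target policy's value.

    rewards     : observed rewards for the logged actions
    q_hat       : model estimate Q(s, a) for each logged (state, action) pair
    v_hat       : model estimate E_{a ~ pi_target}[Q(s, a)] for each state
    pi_target   : target-policy probability of each logged action
    pi_behavior : behavior-policy probability of each logged action
    """
    rho = pi_target / pi_behavior       # importance ratio; the usual source of blow-ups
    return np.mean(v_hat + rho * (rewards - q_hat))
```

Clipping or self-normalizing `rho` is a common mitigation when the estimate explodes.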
Category: Data Science

Pearson correlation coefficient - is correlation estimator acceptable?

As far as I know, in theory we use the Pearson correlation when we want to check the correlation between two variables that are both continuous or both discrete. For a mixed case it's not so easy to use it to compute the correlation coefficient. On the other hand, we have the sample Pearson correlation estimator, which lets us handle the mixed case without any problems (based on samples). Does the Pearson correlation coefficient give deceptive results in this case?
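For concreteness, the sample estimator the question presumably refers to is the usual plug-in formula, which only requires paired observations and makes no continuity assumption:

```latex
\hat{r} \;=\; \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                   {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
                    \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```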
Category: Data Science

Observation Operator - Data Assimilation

In data assimilation, one assumes the existence of an observation operator $\mathcal{H}$ that maps the model-state vector $\mathbf{x}_b$ to $\mathbf{y}_b$ (the model-equivalent of the observations $\mathbf{y}_o$), according to a reference I'm using to develop a preliminary understanding of DA. Can someone please elaborate on the precise meaning of "model-equivalent of the observations $\mathbf{y}_o$", and the methods one can use to estimate the operator $\mathcal{H}$?
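In the simplest linear case, $\mathcal{H}$ is just a matrix that picks out (or interpolates) the state components corresponding to where the instruments actually measure, so $\mathbf{y}_b = H\mathbf{x}_b$ can be compared directly with $\mathbf{y}_o$. A toy sketch, assuming a 5-element state observed by two hypothetical instruments:

```python
import numpy as np

x_b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # model state on a 5-point grid

# Linear observation operator: one instrument sits at grid point 1,
# the other halfway between grid points 3 and 4 (simple interpolation row).
H = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.5, 0.5]])

y_b = H @ x_b                                # model-equivalent of the observations
y_o = np.array([2.1, 4.4])                   # what the instruments actually report
innovation = y_o - y_b                       # the misfit that DA schemes work with
```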
Category: Data Science

Using recurrent neural networks for modeling errors in IMUs

Inertial measurement units (IMUs), usually composed of accelerometers and gyroscopes, are well known to have inherent errors in their data, originating from bias, random-walk noise, temperature dependence, etc., creating a highly non-linear dependence. Typically, extended Kalman filters are used to estimate and remove these errors for stable measurement of orientations and angular velocities; but even this is not entirely accurate, as some higher-order errors are ignored or approximated, and the fact that the Markov assumption ignores the effect …
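A minimal sketch of the kind of model the question points at: a recurrent network that reads windows of raw 6-axis IMU samples and regresses an error term to subtract. The layer sizes and the 6-in/3-out shapes are assumptions for illustration, not a recommendation:

```python
import tensorflow as tf

# Input: sequences of raw IMU samples (3-axis accel + 3-axis gyro = 6 channels).
# Output: an estimated 3-axis error term for the end of each window.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(None, 6)),  # recurrence captures time-correlated noise
    tf.keras.layers.Dense(3),                         # predicted error to subtract downstream
])
model.compile(optimizer="adam", loss="mse")
```

Trained against a higher-grade reference sensor, such a network can in principle learn the temperature- and history-dependent error structure an EKF approximates away.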
Category: Data Science

What is the difference between a Categorical Column and a Dense Column?

In TensorFlow, there are 9 different feature columns, arranged into three groups: categorical, dense and hybrid. From reading the guide, I understand categorical columns are used to represent discrete input data with a numerical value. It gives the example of a categorical identity column:

ID    Represented using one-hot encoding
0     [1, 0, 0, 0]
1     [0, 1, 0, 0]
2     [0, 0, 1, 0]
3     [0, 0, 0, 1]

But you also have a dense column called …
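For concreteness, this is roughly how the two kinds of column are declared with the `tf.feature_column` API (the feature names here are made up for illustration):

```python
import tensorflow as tf

# Categorical: discrete IDs in [0, 4), one-hot encoded via an indicator column
id_col = tf.feature_column.categorical_column_with_identity(key="ID", num_buckets=4)
id_onehot = tf.feature_column.indicator_column(id_col)

# Dense: a plain numeric feature that is fed to the model as-is
price = tf.feature_column.numeric_column(key="price")
```

The categorical column describes discrete values that get encoded; the dense (numeric) column passes real-valued inputs straight through.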
Category: Data Science

How to save a TensorFlow model using estimator.export_savedmodel()

How can I save the TensorFlow model using estimator.export_savedmodel()? Especially, what should I put inside the serving_input_receiver_fn()? I have created a custom Estimator based on the VGGNet architecture; I am using my own images and doing some transformations on them (you can see these in _parse_function()). I have read the documentation here, but I am not exactly sure what to write for my code (please see below). Ultimately I want to save the model and use TensorFlow Serving. from …
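A minimal sketch of a serving_input_receiver_fn for the TF 1.x Estimator API. It assumes the model_fn reads its images from a feature named "x" and that VGG-style 224x224x3 float inputs are expected; both are assumptions to adjust to your own _parse_function():

```python
import tensorflow as tf

def serving_input_receiver_fn():
    # Raw image tensors the client will send at serving time (TF 1.x placeholder)
    images = tf.placeholder(dtype=tf.float32,
                            shape=[None, 224, 224, 3],  # assumed VGG input shape
                            name="input_images")
    return tf.estimator.export.ServingInputReceiver(
        features={"x": images},              # key must match what model_fn expects
        receiver_tensors={"images": images})

# estimator is the custom Estimator from the question
estimator.export_savedmodel("export_dir", serving_input_receiver_fn)
```

If the client sends serialized tf.Example protos instead of raw tensors, the receiver would parse them with tf.parse_example before handing features to the model.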
Category: Data Science

'DecisionTreeClassifier' object has no attribute 'importances_'

I have this code to visualize the most important features of each model:

dtc = DecisionTreeClassifier(min_samples_split=7, random_state=111)
rfc = RandomForestClassifier(n_estimators=31, random_state=111)
trained_model = dtc.fit(features_train, labels_train)
trained_model.fit(features_train, labels_train)
predictions = trained_model.predict(features_test)
importances = trained_model.feature_importances_
std = np.std([trained_model.feature_importances_ for trained_model in trained_model.estimators_], axis=0)
indices = np.argsort(importances)[::-1]
for f in range(features_train.shape[1]):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))
plt.figure()
plt.title("Feature importances")
plt.bar(range(features_train.shape[1]), importances[indices], color="r", yerr=std[indices], align="center")
plt.xticks(range(features_train.shape[1]), indices)
plt.xlim([-1, features_train.shape[1]])
plt.show()

Using RandomForestClassifier this code runs fine, but when …
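The failing line is the `std` computation: only ensembles such as RandomForestClassifier expose an `estimators_` list to take a spread over, so it raises an AttributeError for a single DecisionTreeClassifier. One way to make the same plotting code work for both (a sketch, assuming zero-width error bars are acceptable for a single tree):

```python
import numpy as np

if hasattr(trained_model, "estimators_"):
    # Ensemble: spread of importances across the individual trees
    std = np.std([tree.feature_importances_ for tree in trained_model.estimators_],
                 axis=0)
else:
    # Single tree: no ensemble to take a spread over, so no error bars
    std = np.zeros_like(trained_model.feature_importances_)
```

The rest of the plotting code, including `yerr=std[indices]`, can then stay unchanged.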
Category: Data Science

Plotting estimates of Tail Dependence Coefficient in R

I want to plot the following estimate of the tail dependence coefficient: $$\hat{\lambda}_n=\frac{1}{k}\sum_{j=1}^n I_{\{X_j>X_{(n-k)},\,Y_j>X_{(n-k)}\}},$$ where $I_{(\cdot)}$ is an indicator function and $X_{(1)}>\dots>X_{(n)}$ are the order statistics. If I consider this example

n=1000; alpha=4
U=runif(n); phi=0.8; sigma=0.1
X=(1-U)^(-1/alpha)
Z=rnorm(n)
Y=phi*X+sigma*abs(Z)

then $\lambda=(0.8)^4=0.4096$. When I plot $\hat{\lambda}_n$ vs $k$, the graph seems to stabilize around $0.50$, which is not true. See the graph below.

n=length(X)
lambda_hat=c()
lambda_hat=sapply(1:(n-1), FUN = function(i, minXY=pmin(X,Y), Xsort=sort(X))
  length(minXY[minXY>Xsort[floor(n-i)]])/i)
k=1:(n-1)
plot(k, lambda_hat, type = "l", lwd=2, col="blue",
     main="Tdc of X and Y", ylab="Estimates", xlab="Order …
Category: Data Science

Estimating forward velocity for a swimmer

With a modern IMU with 9 degrees of freedom, collecting accelerometer, magnetometer and gyroscope data on 3 axes, what would be the best approach to filtering the data and handling it to accurately estimate the forward velocity of the swimmer? My approach (see the sketch below) was to:

1. Use a 3-point moving average to get rid of any vibrations caused by unneeded movements
2. Use a median average to get rid of repetitive movements such as shakes or water resistance
3. Perform integration …
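A rough sketch of steps 1 and 3 in Python; the window size, the toy data and the naive cumulative-sum integration are assumptions, and in practice raw integration of accelerometer data drifts quickly without some form of correction:

```python
import numpy as np

def estimate_forward_velocity(accel_forward, dt, window=3):
    # Step 1: 3-point moving average to suppress vibration
    kernel = np.ones(window) / window
    smoothed = np.convolve(accel_forward, kernel, mode="same")
    # Step 3: naive numerical integration of acceleration -> velocity
    # (drifts over time; zero-velocity updates or a Kalman filter usually correct it)
    return np.cumsum(smoothed) * dt

velocity = estimate_forward_velocity(np.random.randn(500), dt=0.01)
```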
Category: Data Science

How to estimate the mutual information numerically?

Suppose I have a sample $\{z_i\}_{i\in[0,N]} = \{(x_i,y_i)\}_{i\in[0,N]}$ which comes from a probability distribution $p_z(z)$. How can I use it to estimate the mutual information between $X$ and $Y$? $$MI(X,Y) = \int_Y \int_X p_z(x,y) \log{ \left(\frac{p_z(x,y)}{p_x(x)\,p_y(y)} \right) } \, dx \, dy,$$ where $p_x$ and $p_y$ are the marginal distributions of $X$ and $Y$: $p_x(x) = \int_Y p_z(x,y)\,dy$ and $p_y(y) = \int_X p_z(x,y)\,dx$.
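The simplest numerical approach is the plug-in (histogram) estimator: bin the sample, estimate the joint and marginal probabilities from counts, and evaluate the sum form of the integral. A sketch (the bin count is an arbitrary choice, and this estimator is biased for small samples; k-nearest-neighbor estimators are a common refinement):

```python
import numpy as np

def mi_histogram(x, y, bins=30):
    # Plug-in estimate of mutual information in nats, from a 2D histogram
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of X (column vector)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of Y (row vector)
    nz = pxy > 0                              # avoid log(0) on empty bins
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

x = np.random.randn(1000)
y = x + np.random.randn(1000)                 # correlated toy data
print(mi_histogram(x, y))
```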
Category: Data Science
