Latent variables with thresholds

There are many ML techniques for estimating latent variables, such as the EM algorithm. Is there a technique that allows a threshold on each of the latent variables? I have a feature space with 10 variables $(X_1,\dots,X_{10})$ and an outcome $Y$. Seven of the $X$ features are known (I have their observations) and three are unknown. Each unknown feature can take values in a range from 0 up to a positive constant. What ML technique would you recommend for …
Category: Data Science

Math behind MSE = bias^2 + variance

Based on the deeplearningbook: $$MSE = E[(\hat{\theta}_m - \theta)^2] = Bias(\hat{\theta}_m)^2 + Var(\hat{\theta}_m)$$ where $m$ is the number of samples in the training set, $\theta$ is the actual parameter in the training set and $\hat{\theta}_m$ is the estimated parameter. I can't get to the second expression. Further, I don't understand how the first expression is obtained. Note: $Bias(\hat{\theta}_m)^2 = E(\hat{\theta}_m^2) - \theta^2$. Also, how are bias and variance evaluated in classification?
Category: Data Science
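
For the question above, the decomposition can be obtained by adding and subtracting $E[\hat{\theta}_m]$ inside the square. The following is a sketch, writing $\hat{\theta}_m$ for the estimated parameter and assuming, as is usual, that $\theta$ is a fixed (non-random) quantity and that $Bias(\hat{\theta}_m) = E[\hat{\theta}_m] - \theta$:

$$
\begin{aligned}
MSE &= E[(\hat{\theta}_m - \theta)^2] \\
    &= E\big[(\hat{\theta}_m - E[\hat{\theta}_m] + E[\hat{\theta}_m] - \theta)^2\big] \\
    &= E\big[(\hat{\theta}_m - E[\hat{\theta}_m])^2\big] + 2\,(E[\hat{\theta}_m] - \theta)\,E\big[\hat{\theta}_m - E[\hat{\theta}_m]\big] + (E[\hat{\theta}_m] - \theta)^2 \\
    &= Var(\hat{\theta}_m) + 0 + Bias(\hat{\theta}_m)^2
\end{aligned}
$$

The cross term vanishes because $E\big[\hat{\theta}_m - E[\hat{\theta}_m]\big] = 0$. Under this definition the squared bias is $(E[\hat{\theta}_m] - \theta)^2$, which in general is not the same as $E(\hat{\theta}_m^2) - \theta^2$.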

Subtraction of two variances (scores)

I was wondering whether it would be correct to say that, when we treat the variances of two populations as random variables themselves (or as scores), we can simply take a resultant variance V_subtract = V_pop1 - V_pop2 (e.g. V_subtract = 1 - 0.5 = 0.5). If so, I am wondering what that says about the actual standard error in terms of this subtracted variance score, if we know the total sample size of population 1 and population 2 respectively which …
Category: Data Science

GridSearchCV best coefficients do not match well with the perfect line

I wrote a program to find the best combination of coefficients to describe a variable. However, the coefficients from GridSearchCV do not match well with the expected line. This is a sample of my data: pipe = make_pipeline(process, SelectKBest(f_regression), model) gs=GridSearchCV(pipe,params,n_jobs=-1,cv=5, return_train_score = False);, y_train) fin = gs.best_estimator_.steps[2][1]; coef = fin.coef_; intercept = fin.intercept_ and these are the coefficients given: Then if I plot the line with the coefficients: xplot = 16.15589 + 1.13934372*df_loc.ChargeAmount + 1.605411*df_loc.PatientPrice + 6.81365603*df_loc.LastCost …
Category: Data Science

How can I measure the correlation between two estimators?

I'm working with several estimators of all kinds. I want to stack these estimators, and stacking works best if they have low correlation between them. I suppose that the correlation method depends on the type of the dependent variable, that is, whether it is categorical or numerical. In my case it is categorical, and the estimators are classifiers. How can I measure the correlation between two estimators?
Category: Data Science
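
For classifiers like those in the question above, one option (an assumption here, not something the question specifies) is to compare the models on a held-out set through their error patterns: the Pearson correlation of the 0/1 error indicators, plus a simple disagreement rate. A minimal sketch, with hypothetical function names and toy labels:

```python
from math import sqrt


def error_correlation(y_true, pred_a, pred_b):
    """Pearson correlation between the 0/1 error indicators of two classifiers."""
    err_a = [int(p != t) for p, t in zip(pred_a, y_true)]
    err_b = [int(p != t) for p, t in zip(pred_b, y_true)]
    n = len(y_true)
    mean_a = sum(err_a) / n
    mean_b = sum(err_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(err_a, err_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in err_a) / n
    var_b = sum((b - mean_b) ** 2 for b in err_b) / n
    if var_a == 0 or var_b == 0:
        # Undefined when a classifier is always right or always wrong.
        return float("nan")
    return cov / sqrt(var_a * var_b)


def disagreement(pred_a, pred_b):
    """Fraction of samples on which the two classifiers predict different labels."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Toy held-out labels and predictions (made up for illustration).
y_true = ["a", "b", "a", "c", "b", "a"]
pred_1 = ["a", "b", "c", "c", "a", "a"]
pred_2 = ["a", "c", "c", "c", "b", "a"]

corr = error_correlation(y_true, pred_1, pred_2)
dis = disagreement(pred_1, pred_2)
print(corr, dis)
```

Lower error correlation and higher disagreement suggest that the two classifiers make different mistakes, which is the property stacking tries to exploit.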

Offline/Batch Reinforcement Learning: Doubly Robust Off-policy Estimator takes huge values

Context: My team and I are working on an RL problem for a specific application. We have data collected from user interactions (states, actions, etc.). It is too costly for us to emulate agents, so we decided to concentrate on Offline RL techniques. For this, we are currently using the RL-Coach library by Intel, which offers support for Batch/Offline RL. More specifically, to evaluate policies in offline settings, we train a DDQN-BCQ model and evaluate the learned policies using Offline …
Category: Data Science

Pearson correlation coefficient - is correlation estimator acceptable?

As far as I know from theory, we use the Pearson correlation to check the correlation between two variables that are both continuous or both discrete. For a mixed case, it is not so easy to use it to compute the correlation coefficient. On the other hand, we have Pearson correlation estimators, which can be calculated from samples in the mixed case without any problems. Does the Pearson correlation coefficient give deceptive results in this case?
Category: Data Science

Observation Operator - Data Assimilation

In data assimilation, one assumes the existence of an observation operator $\mathcal{H}$ that maps the model-state vector $\bf{x_b}$ to $\bf{y_b}$ (the model-equivalent of the observations $\bf{y_o}$), according to a reference I'm using to develop a preliminary understanding of DA. Can someone please elaborate on the precise meaning of "model-equivalent of the observations $\bf{y_o}$" and on the methods one can use to estimate the operator $\mathcal{H}$?
Category: Data Science

Using recurrent neural networks for modeling errors in IMUs

Inertial measurement units (IMUs), usually composed of accelerometers and gyroscopes, are well known to have inherent errors in their data, originating from bias, random walk noise, temperature dependence etc., creating a highly non-linear dependence. Typically, extended Kalman filters are used to estimate and remove these errors for stable measurement of orientations and angular velocities, but even this is not entirely accurate, as some higher order errors are ignored or approximated, and the fact that the Markov assumption ignores the effect …
Category: Data Science

What is the difference between a Categorical Column and a Dense Column?

In TensorFlow, there are 9 different feature columns, arranged into three groups: categorical, dense and hybrid. From reading the guide, I understand categorical columns are used to represent discrete input data with a numerical value. It gives the example of a categorical column called categorical identity column:

ID   Represented using one-hot encoding
0    [1, 0, 0, 0]
1    [0, 1, 0, 0]
2    [0, 0, 1, 0]
3    [0, 0, 0, 1]

But you also have a dense column called …
Category: Data Science

How to save a TensorFlow model using estimator.export_savedmodel()

How can I save a TensorFlow model using estimator.export_savedmodel()? In particular, what should I put inside serving_input_receiver_fn()? I have created a custom Estimator based on the VGGNet architecture; I am using my own images and applying some transformations to them (you can see them in _parse_function()). I have read the documentation here, but I am not sure exactly what to write for my code (please see below). Ultimately I want to save the model and use TensorFlow Serving. from …
Category: Data Science

'DecisionTreeClassifier' object has no attribute 'importances_'

I have this code to visualize the most important features of each model: dtc = DecisionTreeClassifier(min_samples_split=7, random_state=111) rfc = RandomForestClassifier(n_estimators=31, random_state=111) trained_model =, labels_train), labels_train) predictions = trained_model.predict(features_test) importances = trained_model.feature_importances_ std = np.std([trained_model.feature_importances_ for trained_model in trained_model.estimators_], axis=0) indices = np.argsort(importances)[::-1] for f in range(features_train.shape[1]): print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]])) plt.figure() plt.title("Feature importances")[1]), importances[indices], color="r", yerr=std[indices], align="center") plt.xticks(range(features_train.shape[1]), indices) plt.xlim([-1, features_train.shape[1]]) With RandomForestClassifier this code runs fine, but when …
Category: Data Science

Plotting estimates of Tail Dependence Coefficient in R

I want to plot the following estimate of the tail dependence coefficient: $\hat{\lambda}_n=\frac{1}{k}\sum_{j=1}^nI_{\{X_j>X_{(n-k)},Y_j>X_{(n-k)}\}}$ where $I_{(.)}$ is an indicator function and $X_{(1)}>...>X_{(n)}$ are order statistics. If I consider this example n=1000; alpha=4; U=runif(n); phi=0.8; sigma=0.1; X=(1-U)^{(-1/alpha)} ; Z=rnorm(n) ; Y=phi*X+sigma*abs(Z) ; then $\lambda=(0.8)^4=0.4096$. When I plot $\hat{\lambda}_n$ vs $k$, the graph seems to stabilize around $0.50$, which is not the true value. See below the graph. n=length(X); lambda_hat=c(); lambda_hat=sapply(1:(n-1), FUN = function(i,minXY=pmin(X,Y),Xsort=sort(X)) length(minXY[minXY>Xsort[floor(n-i)]])/i); k=1:(n-1) plot(k,lambda_hat,type = "l",lwd=2,col="blue", main="Tdc of X and Y",ylab="Estimates",xlab="Order …
Category: Data Science
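
The estimator in the question above can also be transcribed into Python as a cross-check. This is a sketch only: it follows the R snippet's convention (ascending sort, threshold `Xsort[n-k]`), which is an assumption about the intended order statistic, and the names `tail_dependence_estimates` and `simulate` are illustrative.

```python
import random


def tail_dependence_estimates(x, y):
    """Return [lambda_hat(k) for k = 1..n-1], mirroring the R snippet's logic."""
    n = len(x)
    x_sorted = sorted(x)  # ascending, like sort(X) in R
    min_xy = [min(a, b) for a, b in zip(x, y)]  # pmin(X, Y)
    estimates = []
    for k in range(1, n):
        # R's Xsort[n - k] is 1-based, so the Python index is n - k - 1.
        threshold = x_sorted[n - k - 1]
        # min(X_j, Y_j) > threshold means both X_j and Y_j exceed it.
        count = sum(m > threshold for m in min_xy)
        estimates.append(count / k)
    return estimates


def simulate(n=1000, alpha=4.0, phi=0.8, sigma=0.1, seed=0):
    """Draw (X, Y) as in the question: Pareto-type X, Y = phi*X + sigma*|Z|."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.random()
        x = (1.0 - u) ** (-1.0 / alpha)
        y = phi * x + sigma * abs(rng.gauss(0.0, 1.0))
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate()
    lam = tail_dependence_estimates(xs, ys)
    # Print a few values of lambda_hat(k) for inspection.
    for k in (10, 50, 100, 500):
        print(k, lam[k - 1])
```

Estimates of this kind are typically read off for small `k` relative to `n`, since larger `k` pulls the threshold out of the tail.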

Estimating forward velocity for a swimmer

With a modern IMU with 9 degrees of freedom collecting accelerometer, magnetometer and gyroscope data on 3 axes, what would be the best approach to filtering and handling the data to accurately estimate the forward velocity of the swimmer? My approach was to: 1. Use a 3-point moving average to get rid of any vibrations caused by unneeded movements 2. Use a median average to get rid of repetitive movements such as shakes or water resistance 3. Perform integration …
Category: Data Science

How to estimate the mutual information numerically?

Suppose I have a sample $\{z_i\}_{i\in[0,N]} = \{(x_i,y_i)\}_{i\in[0,N]}$ which comes from a probability distribution $p_z(z)$. How can I use it to estimate the mutual information between X and Y? $MI(X,Y) = \int_Y \int_X p_z(x,y) \log{ \left(\frac{p_z(x,y)}{p_x(x)\,p_y(y)} \right) }\,dx\,dy$ where $p_x$ and $p_y$ are the marginal distributions of X and Y: $p_x(x) = \int_Y p_z(x,y)\,dy$ and $p_y(y) = \int_X p_z(x,y)\,dx$.
Category: Data Science
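
One simple way to approach the question above is a plug-in estimate: bin the sample into a 2-D histogram, replace $p_z$, $p_x$ and $p_y$ by the empirical bin frequencies, and turn the integral into a sum. The sketch below assumes equal-width bins and a hypothetical function name; the result depends on the number of bins and is biased for small samples, so it is a starting point rather than a definitive estimator.

```python
from collections import Counter
from math import log


def mutual_information_hist(xs, ys, bins=10):
    """Plug-in estimate of MI(X, Y) in nats from an equal-width 2-D histogram."""

    def bin_index(value, lo, hi):
        if hi == lo:
            return 0  # degenerate case: all values identical
        idx = int((value - lo) / (hi - lo) * bins)
        return min(idx, bins - 1)  # put the maximum into the last bin

    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    bx = [bin_index(v, x_lo, x_hi) for v in xs]
    by = [bin_index(v, y_lo, y_hi) for v in ys]

    joint = Counter(zip(bx, by))  # counts per (x-bin, y-bin) cell
    count_x = Counter(bx)         # marginal counts for X
    count_y = Counter(by)         # marginal counts for Y

    mi = 0.0
    for (i, j), c in joint.items():
        # p_ij / (p_i * p_j) = (c/n) / ((cx/n) * (cy/n)) = c * n / (cx * cy)
        mi += (c / n) * log(c * n / (count_x[i] * count_y[j]))
    return mi


# Tiny illustrative samples: perfectly dependent vs. independent on a 2x2 grid.
dependent = mutual_information_hist([0, 0, 1, 1], [0, 0, 1, 1], bins=2)
independent = mutual_information_hist([0, 0, 1, 1], [0, 1, 0, 1], bins=2)
print(dependent, independent)
```

Nearest-neighbour estimators are a common alternative for continuous data; scikit-learn's `mutual_info_regression`, for example, is based on that family of methods.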

