How to interpret a linear regression effects graph?

Could someone tell me how to interpret the following graph? It shows the effects of the variables in a linear regression, but its interpretation is not clear to me. Why is only half a graph shown for workingday? Why doesn't weathersit have whiskers? Why is holiday simply a line at 0? Here is a brief summary of the variables: workingday: 1 if the day is neither a weekend nor a holiday, otherwise it is …
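For intuition only (not the asker's data), here is a minimal sketch with synthetic variables named after the ones in the question: a binary regressor can only produce two points on an effect plot, and a near-constant one produces an essentially flat line at 0.

    import numpy as np
    import statsmodels.api as sm

    # Synthetic stand-in for the bike-sharing-style data in the question.
    rng = np.random.default_rng(0)
    n = 500
    workingday = rng.integers(0, 2, n)          # binary: only 0 or 1
    temp = rng.uniform(0.0, 1.0, n)             # continuous
    holiday = np.zeros(n); holiday[:10] = 1     # almost always 0
    y = 100 + 40 * workingday + 200 * temp + 5 * holiday + rng.normal(0, 20, n)

    X = sm.add_constant(np.column_stack([workingday, temp, holiday]))
    fit = sm.OLS(y, X).fit()
    print(fit.params)

    # The "effect" of a predictor is roughly coefficient * observed values:
    # - workingday takes only the values 0 and 1, so its effect plot collapses
    #   to two points, which can render as only part of the axis ("half a graph").
    # - a variable that is almost constant (holiday ~ 0) contributes ~0 for
    #   nearly every row, which renders as a flat line at zero.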
Category: Data Science

Can absolute or relative contributions from X be calculated for a multiplicative model? $\log y \sim \log x_1 + \log x_2$

(How) can absolute or relative contributions be calculated for a multiplicative (log-log) model? Relative contributions from a linear (additive) model: e.g., there are 3 contributors to $y$ (given by the three additive terms): $$y = \beta_1 x_{1} + \beta_2 x_{2} + \alpha$$ In this case, I would interpret the absolute contribution of $x_1$ to $y$ to be $\beta_1 x_{1}$, and the relative contribution of $x_1$ to $y$ to be $$\frac{\beta_1 x_{1}}{y}$$ (assuming everything is positive). Relative contributions from a log-log …
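For reference (and keeping the same positivity assumption), exponentiating the log-log model shows why the additive decomposition only survives on the log scale:

$$\log y = \beta_1 \log x_1 + \beta_2 \log x_2 + \alpha \quad\Longleftrightarrow\quad y = e^{\alpha}\, x_1^{\beta_1}\, x_2^{\beta_2}$$

On the original scale the contribution of $x_1$ is the multiplicative factor $x_1^{\beta_1}$ (a ratio, not an amount), while an additive relative share analogous to $\frac{\beta_1 x_1}{y}$ exists only for the log-transformed response, e.g. $\frac{\beta_1 \log x_1}{\log y}$.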
Category: Data Science

How to interpret two continuous variables output using GAM?

I really need help with GAMs. I have to find out whether an association is linear or non-linear by using a GAM. The predictor variable is temperature at lag 0 and the outcome is cardiovascular admissions (a count variable). I have tried a lot, but I am not able to understand how to interpret the graph and output that I am getting. I fitted this model using the mgcv package:

    model1 <- gam(cvd ~ s(templg0), family = poisson)
    summary(model1)
    plot(model1)

So here is the output for summary that …
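As a rough Python analogue of that mgcv call (a sketch on made-up data, using the pygam package rather than mgcv), one way to judge linearity is to look at the effective degrees of freedom of the smooth and at its partial-effect plot: an edf close to 1 suggests an essentially linear association.

    import numpy as np
    import matplotlib.pyplot as plt
    from pygam import PoissonGAM, s   # assumes the pygam package is installed

    # Hypothetical data: X = temperature at lag 0, y = daily CVD admission counts.
    X = np.random.uniform(-5, 35, 1000).reshape(-1, 1)
    y = np.random.poisson(np.exp(1.5 + 0.02 * X[:, 0]))

    gam = PoissonGAM(s(0)).fit(X, y)
    gam.summary()                      # effective dof near 1 => essentially linear

    XX = gam.generate_X_grid(term=0)
    plt.plot(XX[:, 0], gam.partial_dependence(term=0, X=XX))
    plt.xlabel("temperature (lag 0)"); plt.ylabel("partial effect (log scale)")
    plt.show()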
Category: Data Science

Interpreting ROC curves across k-fold cross-validation

I have used a MARS model (multivariate adaptive regression splines) and evaluated it with k-fold cross-validation, obtaining the following graph. How should this model be interpreted? I understand that in fold 6 the model obtains a better AUC, but why? What is the interpretation of this? Thanks to all.
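For context, a minimal sketch of how such per-fold AUCs are usually produced (sklearn, with a logistic regression standing in for the MARS model): the fold-to-fold variation mostly reflects which observations happen to land in each test fold, so it is the mean and spread of the AUCs, rather than the best fold, that characterises the model.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression   # stand-in for the MARS model
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=500, random_state=0)
    aucs = []
    for train, test in StratifiedKFold(n_splits=6, shuffle=True, random_state=0).split(X, y):
        model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        aucs.append(roc_auc_score(y[test], model.predict_proba(X[test])[:, 1]))

    # Report the mean and spread rather than singling out the best fold.
    print(np.mean(aucs), np.std(aucs))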
Category: Data Science

Drastic drop in Somers' D? Why?

I need to find the correlation between the ratings assigned by two coaches to the same group of 40 players. I have tabulated the results as below: the Somers' D is 50%. However, for the case below, the Somers' D is 94.7%. My question is: both scenarios have 2 deviations, so why is the Somers' D in the first scenario so much lower than in the second?
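A minimal sketch (made-up ratings, not the tabulated data) of computing Somers' D in Python with scipy.stats.somersd, which may help show why the same number of deviations can cost very different amounts:

    import numpy as np
    from scipy.stats import somersd   # available in SciPy >= 1.7

    # Hypothetical ratings of the same players by two coaches (not the asker's data).
    coach_a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    coach_b = np.array([1, 2, 3, 4, 5, 6, 8, 7])   # one swapped pair near the top

    res = somersd(coach_a, coach_b)
    print(res.statistic, res.pvalue)

    # Somers' D is built from concordant vs. discordant (and tied) pairs, so the
    # *position* of a disagreement matters: the same number of deviations can flip
    # very different numbers of pairs, which is how 2 deviations can yield 50% in
    # one table and 94.7% in another.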
Category: Data Science

How can I disaggregate the impact of a group of variables using machine learning?

I have a problem where the target variable Y (continuous, values: 0-1) is controlled by a large number of variables. These variables can be grouped by the nature of the data: Group 1 - x1, x2, x3, x4; Group 2 - x5, x6, x7; Group 3 - x8, x9, x10, x12. After modeling Y ~ X, I would like to disaggregate the impact of these groups. For example, I want a plot like the famous Hawkins and Sutton plot of climate change …
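One common way to get such a group-level decomposition is to sum per-observation SHAP values over the columns in each group; a hedged sketch on hypothetical data, with the group definitions taken from the question:

    import numpy as np
    import pandas as pd
    import shap
    from sklearn.ensemble import RandomForestRegressor

    # Hypothetical data with the column groups from the question.
    cols = [f"x{i}" for i in range(1, 13)]
    X = pd.DataFrame(np.random.rand(1000, 12), columns=cols)
    y = 0.5 * X["x1"] + 0.3 * X["x5"] + 0.2 * X["x8"] + np.random.normal(0, 0.05, 1000)

    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    sv = shap.TreeExplainer(model).shap_values(X)   # shape: (n_samples, n_features)

    groups = {"Group 1": ["x1", "x2", "x3", "x4"],
              "Group 2": ["x5", "x6", "x7"],
              "Group 3": ["x8", "x9", "x10", "x12"]}

    # Per-observation contribution of each group = sum of its members' SHAP values.
    group_contrib = {g: sv[:, [cols.index(c) for c in names]].sum(axis=1)
                     for g, names in groups.items()}
    print({g: float(np.abs(v).mean()) for g, v in group_contrib.items()})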
Category: Data Science

Using ML Interpretability Techniques for Data Analysis Instead of Strictly Model Analysis

Hope you lot are doing alright. I have been looking into Explainable AI and model interpretability lately, and I had an idea but am wondering whether it would constitute a valid use case. There is a data analysis project happening at work where we're trying to analyze data we had on hand to determine the factors that affect our KPOs and possibly derive useful actionable insights. Instead of moving forward with manually evaluating correlation and doing EDA that way, I …
Category: Data Science

Interpretation of a VAR model: impulse response function and lag order p

For example, I have three time series: Y, X1, X2. After using time-series cross-validation and BIC/AIC to determine the best lag order p for the VAR model, I got p = 1 to estimate the model. I know that to explain the model we can use the impulse response function, while variance decomposition explains the variance of the forecast errors. I am confused about p and the interpretation of the impulse response function. Based …
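A hedged statsmodels sketch of the usual workflow (the file name is an assumption, the column names are from the question): p = 1 only fixes how many lags enter each equation, while the impulse responses are computed recursively from the fitted VAR and can be traced over any horizon.

    import pandas as pd
    from statsmodels.tsa.api import VAR

    # Assumes a DataFrame with the stationary series Y, X1, X2.
    df = pd.read_csv("series.csv")            # hypothetical file name

    model = VAR(df[["Y", "X1", "X2"]])
    results = model.fit(maxlags=8, ic="bic")  # BIC picks the lag order, e.g. p = 1
    print(results.k_ar)                       # chosen lag order

    irf = results.irf(10)                     # responses traced 10 periods ahead,
    irf.plot(orth=True)                       # even though the VAR itself has p = 1

    fevd = results.fevd(10)                   # forecast-error variance decomposition
    fevd.plot()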
Category: Data Science

Why does SHAP's TreeExplainer with "interventional" method not match exact SHAP?

I am trying to understand the concepts/definitions behind the SHAP method of explaining model predictions. In particular, I've read the original SHAP paper and the TreeExplainer paper. The original paper lays out a particular, well-defined set of values for a given model prediction on tabular data, which can be computed exactly (although this is very slow in practice, so the paper/package gives various other algorithms as "approximations"). In the TreeExplainer paper, Algorithms 1 and 2 for "TreeExplainer with path-dependent feature …
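A small comparison sketch (hypothetical data and model, not from either paper): TreeExplainer with feature_perturbation="interventional" against SHAP's brute-force Exact explainer over the same background sample, which is one way to quantify how far apart the two sets of values actually are.

    import numpy as np
    import shap
    import xgboost

    # Small hypothetical setup: a tree model plus a fixed background sample.
    X, y = shap.datasets.california(n_points=200)
    model = xgboost.XGBRegressor(n_estimators=50).fit(X, y)
    background = X.iloc[:50]

    # TreeExplainer with the "interventional" perturbation (marginalizes over background).
    tree_expl = shap.TreeExplainer(model, data=background,
                                   feature_perturbation="interventional")
    sv_tree = tree_expl.shap_values(X.iloc[:5])

    # Brute-force Shapley values with an independent masker over the same background.
    exact_expl = shap.explainers.Exact(model.predict, shap.maskers.Independent(background))
    sv_exact = exact_expl(X.iloc[:5]).values

    print(np.abs(sv_tree - sv_exact).max())   # how closely the two agree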
Category: Data Science

How to visualize segmented labels in an already existing graph?

I am working on a project where I have to segment microscopic images using multi-class segmentation (3 classes). Now let's say that I am segmenting solid, liquid and gas images (this is a made-up example because I am not allowed to discuss anything about the project :( ), and I have pixel-wise segmentations of solid, liquid and gas. But now, once I have segmented these images, I want to put the results on …
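A common approach is to overlay the predicted class mask on the original image with transparency; a minimal matplotlib sketch, where the file names and class encoding (0 = background, 1-3 = solid/liquid/gas) are assumptions:

    import numpy as np
    import matplotlib.pyplot as plt

    # image: the original micrograph; mask: per-pixel class labels of the same size.
    image = plt.imread("micrograph.png")        # hypothetical file
    mask = np.load("predicted_mask.npy")        # hypothetical file, same H and W

    plt.imshow(image, cmap="gray")
    # Hide background pixels so only the segmented classes are coloured on top.
    overlay = np.ma.masked_where(mask == 0, mask)
    plt.imshow(overlay, cmap="viridis", alpha=0.5, vmin=1, vmax=3)
    plt.colorbar(ticks=[1, 2, 3], label="solid / liquid / gas")
    plt.axis("off")
    plt.show()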
Category: Data Science

How to interpret my logistic regression result with statsmodels

So I'm doing a logistic regression with statsmodels and sklearn. My result confuses me a bit. I used a feature selection algorithm in the previous step, which tells me to only use feature1 for my regression. The results are the following: So the model predicts everything as 1, and my p-value is < 0.05, which makes it a pretty good indicator to me. But the accuracy score is < 0.6, which means it basically doesn't say anything. Can you …
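A hedged sketch of what may be going on (synthetic data, with feature1 as the only regressor): with imbalanced classes, a coefficient can be clearly significant while the 0.5-threshold predictions still label almost everything as 1, so accuracy barely beats the majority-class baseline.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical single-feature setup (the question kept only feature1).
    rng = np.random.default_rng(0)
    feature1 = rng.normal(size=1000)
    p = 1 / (1 + np.exp(-(0.8 + 0.4 * feature1)))   # intercept pushes most labels to 1
    y = rng.binomial(1, p)

    X = sm.add_constant(feature1)
    fit = sm.Logit(y, X).fit(disp=0)
    print(fit.summary())                  # feature1 can be highly significant ...

    pred = (fit.predict(X) >= 0.5).astype(int)
    print("accuracy:", (pred == y).mean())
    print("majority-class baseline:", max(y.mean(), 1 - y.mean()))
    # ... yet the 0.5 threshold may still label (almost) everything as 1, so accuracy
    # barely beats the baseline. Significance and predictive accuracy answer
    # different questions.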
Category: Data Science

Why do I get this result with a chi-square test?

I have a question about the chi-squared independence test. I'm working on a dataset and I'm interested in finding the link between product categories and gender, so I plotted my contingency table. I found that the p-value is $1.54 \times 10^{-5}$, implying that my variables are associated. I don't really understand how this is possible, because the proportions of men and women in each category are very similar.
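A sketch with a made-up contingency table (not the asker's) illustrating the usual explanation: with a large sample, even a one-percentage-point difference in proportions yields a small p-value, so it helps to report an effect size such as Cramér's V alongside the test.

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical table (gender x product category): nearly identical proportions,
    # but a very large sample size.
    table = np.array([[10200,  9800, 15100],
                      [ 9800, 10200, 14900]])

    chi2, p, dof, expected = chi2_contingency(table)
    n = table.sum()
    cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
    print(p, cramers_v)
    # The p-value comes out well below 0.05 even though the splits differ by only
    # about one percentage point; the tiny Cramér's V shows the association is
    # practically negligible. The test only asks whether proportions are *exactly*
    # equal, not whether the difference matters.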
Category: Data Science

Practical interpretation of PCA for a supplier analysis

I am using PCA to validate and research a set of 13 suppliers against about 50 variables and performance indicators, compared with an ideal "wish" supplier, mostly based on G. Janker's book on factor analysis for a supplier rating system. In RStudio I perform the PCA on my data with prcomp. My question concerns practical statements about the outcomes of the PCA and its factors. My goal is to identify the performance indicators, …
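A rough Python analogue of the prcomp workflow (sklearn, with a hypothetical suppliers.csv standing in for the real data): the loadings say which performance indicators drive each component, and the component scores let each supplier be compared with the "wish" supplier on those indicators.

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Hypothetical matrix: 13 suppliers (+ the ideal "wish" supplier) x ~50 indicators.
    df = pd.read_csv("suppliers.csv", index_col=0)    # hypothetical file

    Z = StandardScaler().fit_transform(df)            # analogue of prcomp(scale. = TRUE)
    pca = PCA(n_components=5).fit(Z)

    print(pca.explained_variance_ratio_)              # how much each PC captures

    # Loadings: which indicators drive each component.
    loadings = pd.DataFrame(pca.components_.T, index=df.columns,
                            columns=[f"PC{i+1}" for i in range(5)])
    print(loadings["PC1"].abs().sort_values(ascending=False).head(10))

    # Supplier scores on the components; comparing each supplier's score with the
    # "wish" supplier's score shows where it falls short on the dominant indicators.
    scores = pd.DataFrame(pca.transform(Z), index=df.index)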
Category: Data Science

Feature importance by removing all other features?

For neural network feature importance, can I zero out all features except one in order to gauge that feature's importance? I know shuffling a feature is one approach. For example, keeping only the 4th feature:

    feature_4 = [
        [0., 0., 0., 1.15, 0.],
        [0., 0., 0., 1.76, 0.],
        [0., 0., 0., 2.31, 0.],
        [0., 0., 0., 0.94, 0.],
    ]
    _, probabilities = model.predict(feature_4)

The non-linear output of activation functions worries me, because the activation of the whole is not equal to the sum of the individual activations:

    from scipy.special import expit  # aka sigmoid
    >>> expit(2.0)
    0.8807970779778823
    >>> expit(1.0) + expit(1.0)
    1.4621171572600098

…
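Zeroing out the other features pushes the inputs far outside the training distribution, which is why shuffling is usually preferred; a minimal, model-agnostic permutation-importance sketch, where model, X_val and y_val are hypothetical and predict is assumed to return class probabilities:

    import numpy as np
    from sklearn.metrics import log_loss

    def permutation_importance(model, X_val, y_val, n_repeats=5, seed=0):
        """Shuffle one column at a time: the feature's information is removed
        while every row stays inside the training distribution."""
        rng = np.random.default_rng(seed)
        base = log_loss(y_val, model.predict(X_val))   # assumes probabilities
        scores = np.zeros(X_val.shape[1])
        for j in range(X_val.shape[1]):
            for _ in range(n_repeats):
                X_perm = X_val.copy()
                rng.shuffle(X_perm[:, j])              # shuffles column j in place
                scores[j] += log_loss(y_val, model.predict(X_perm)) - base
        return scores / n_repeats                      # larger = more important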
Category: Data Science

Decision Trees and SHAP Values

I've recently been using some (optimal) decision tree methods in R, such as 'evtree' and 'iai'. Both of these provide really nice interpretable plots. Out of the 12 covariates I have in my model, the optimal tree (say, for example, using 'evtree') is typically described by 3-4 covariates. However, when I calculate the Shapley values for the evtree, it is unusual that many of the remaining 8-9 covariates, which are not in the optimal tree, often have a very high …
Category: Data Science

Answering the question of "WHY" using AI?

We have seen lots of natural phenomena happening all over the world. Given the great progress in technology, and in particular AI, how can I employ ML to answer the question of WHY? In other words, without a human interpreting the result, can a machine explain why something is happening or not? If we feed a machine lots of input, from synthesized data to actual data, does the machine answer any question, or does it just …
Category: Data Science

Shapley values for channel attribution equal to linear attribution

I am looking into Shapley values for online marketing attribution. Recently, many articles seem to have been written about this particular approach to attribution (there are more): https://medium.com/analytics-vidhya/the-shapley-value-approach-to-multi-touch-attribution-marketing-model-e345b35f3359 And e.g.: https://blog.dataiku.com/step-up-your-marketing-attribution-with-game-theory It seems that, at least in certain cases, the result will be identical to linear attribution, so I am trying to get more information on whether this is to be expected / correct. The problem: the Shapley value approach for online marketing attribution in these articles seems …
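A brute-force sketch of the Shapley computation on a toy journey (hypothetical channels and value functions): whenever the characteristic function treats every channel in a path symmetrically, the averaging over orderings splits the credit equally, which is exactly linear attribution; an asymmetric value function breaks the tie.

    import math
    from itertools import permutations

    def shapley_values(players, value):
        """Brute-force Shapley values: average each player's marginal
        contribution over all orderings of the players."""
        shap = {p: 0.0 for p in players}
        for order in permutations(players):
            seen = frozenset()
            for p in order:
                shap[p] += value(seen | {p}) - value(seen)
                seen = seen | {p}
        n_fact = math.factorial(len(players))
        return {p: v / n_fact for p, v in shap.items()}

    # Hypothetical journey "Search -> Social -> Email" worth 1 conversion.
    # If any non-empty subset of the touched channels gets the full credit
    # (a symmetric value function), the Shapley split is equal ...
    symmetric = lambda s: 1.0 if s else 0.0
    print(shapley_values(["Search", "Social", "Email"], symmetric))
    # -> 1/3 each, i.e. exactly linear attribution.

    # ... whereas an asymmetric value function breaks the tie.
    asymmetric = lambda s: 1.0 if "Search" in s else (0.3 if s else 0.0)
    print(shapley_values(["Search", "Social", "Email"], asymmetric))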
Category: Data Science

How do I interpret the output of a linear regression model in R?

I have the following linear regression model and its analysis. There are a few errors, but I am not very sure what they are; I have not succeeded in finding them so far. First, the 95% confidence interval for the slope should be …, so the calculation is wrong. Second, I'm not sure about the interpretation of the confidence interval. How would you interpret it in this context?
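For reference, the textbook interval this check presumably refers to is

$$\hat{\beta}_1 \pm t_{n-2,\,0.975}\,\mathrm{SE}(\hat{\beta}_1),$$

where $\mathrm{SE}(\hat{\beta}_1)$ is the "Std. Error" reported next to the slope in R's summary() output and $t_{n-2,\,0.975}$ is obtained with qt(0.975, df = n - 2). A standard contextual reading is: we are 95% confident that a one-unit increase in the predictor is associated with a change in the mean response somewhere between the lower and upper limits of that interval.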
Category: Data Science
