How to interpret GAM output for two continuous variables?

I really need help with GAM. I have to find out whether an association is linear or non-linear using a GAM. The predictor variable is temperature at lag0 and the outcome is cardiovascular admissions (a count variable). I have tried a lot, but I cannot work out how to interpret the graph and the output that I am getting.

I tried this code using the mgcv package:

library(mgcv)
model1 <- gam(cvd ~ s(templg0), family = poisson)
summary(model1)
plot(model1)

So here is the output for summary that I am getting:

Family: poisson 
Link function: log 

Formula:
cvd ~ s(templg0)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 3.195669   0.004877   655.2   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
             edf Ref.df Chi.sq  p-value    
s(templg0) 3.422  4.295  57.23 2.93e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.0152   Deviance explained = 1.68%
UBRE =  1.016  Scale est. = 1         n = 1722

Can someone please explain the output in detail? What is this output telling me? Can someone also explain what the plot (picture attached) is showing? Please be kind, as I have invested a lot of time but cannot work out how to interpret this.



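The z-value in the "Parametric coefficients" table belongs to the intercept only, not to temperature. The intercept is on the log scale, so 3.196 corresponds to roughly exp(3.196) ≈ 24 admissions on average. The effect of temperature at lag0 is reported under "Approximate significance of smooth terms": the chi-square statistic of 57.23 with a p-value of 2.93e-11 indicates that the smooth of temperature differs significantly from a flat (zero) function. Note, however, that the model explains only 1.68% of the deviance, so the effect is statistically significant but accounts for little of the variation in admissions. The direction of the effect cannot be read from the summary; for that you need the graph, which shows the estimated smooth s(templg0) on the scale of the linear predictor (the log scale). The edf of 3.422 gives a hint about linearity: an edf close to 1 corresponds to a straight line, while larger values indicate a more wiggly, non-linear curve.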
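Because the summary and the default plot are both on the log scale, it can help to look at predicted counts instead. The following is only a minimal sketch: the original data are not shown, so `dat` is simulated here as a stand-in, and the shape of the simulated temperature effect is an arbitrary assumption. With real data you would skip the simulation and use your own data frame.

```r
library(mgcv)

# Simulated stand-in data (assumption: the real cvd/templg0 data are not available here)
set.seed(1)
n <- 1722
templg0 <- runif(n, -5, 35)
cvd <- rpois(n, lambda = exp(3.2 + 0.05 * sin(templg0 / 6)))
dat <- data.frame(cvd = cvd, templg0 = templg0)

model1 <- gam(cvd ~ s(templg0), family = poisson, data = dat)

# The intercept is on the log scale; exponentiate it to get a count
baseline <- exp(coef(model1)[["(Intercept)"]])
print(baseline)

# Predict over a grid of temperatures on the link (log) scale,
# then back-transform the fit and an approximate 95% band to counts
grid <- data.frame(templg0 = seq(min(dat$templg0), max(dat$templg0),
                                 length.out = 100))
pred <- predict(model1, newdata = grid, type = "link", se.fit = TRUE)
grid$fit   <- as.numeric(exp(pred$fit))
grid$lower <- as.numeric(exp(pred$fit - 1.96 * pred$se.fit))
grid$upper <- as.numeric(exp(pred$fit + 1.96 * pred$se.fit))

# Expected admissions as a function of temperature
plot(grid$templg0, grid$fit, type = "l",
     ylim = range(grid$lower, grid$upper),
     xlab = "templg0", ylab = "expected admissions")
lines(grid$templg0, grid$lower, lty = 2)
lines(grid$templg0, grid$upper, lty = 2)
```

The band is computed on the log scale and then exponentiated, so it always stays positive, which is why `type = "link"` is used rather than `type = "response"` for the interval.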


What you are doing here is smoothing-spline regression. Have a look at the book "An Introduction to Statistical Learning", Chapter 7.5, for a very good overview of the method.

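The s() function in the GAM formula lets you specify how the smooth term is fitted. You have not supplied k, so a default is chosen (10 for a smooth of a single variable with the default basis). The mgcv documentation describes k as: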

the dimension of the basis used to represent the smooth term. The default depends on the number of variables that the smooth is a function of. k should not be less than the dimension of the null space of the penalty for the term (see null.space.dimension), but will be reset if it is. See choose.k for further information.

Loosely speaking, a GAM builds the curve from several simple pieces (basis functions, their number set by k) spread along the x-axis, and a penalty keeps the result from becoming too wiggly. This lets a GAM model strong non-linearity. If you want to check linearity in your data, you should try different values of k and look at the plot.

GAM Example:

library(gam)
library(ISLR)
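df <- ISLR::Auto

# GAM with a smoothing spline for mpg; in the gam package the second
# argument of s() is the target degrees of freedom (here 10), not a knot count
gam5 <- gam(horsepower ~ s(mpg, 10), data = df)
summary(gam5)
plot(gam5, se = TRUE)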

[Figure: fitted smooth s(mpg, 10) for horsepower, with standard-error bands]

GAM Result:

Would you conclude that the relationship is linear? No. Between about mpg = 0 and mpg = 20 the relation is roughly linear, and the same holds between about mpg = 20 and mpg = 40. But linearity does not hold over the entire range of the data. So I would treat these segments separately, e.g. with dummy encoding and interaction terms.

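Note that the y-axis is rescaled: the plot shows the smooth term centered around zero, i.e. deviations from the average, not horsepower itself. So the y-axis has no natural interpretation in the original units here.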

Comparison to non-parametric (NP) estimation:

To deal with non-linearity, non-parametric regression is an obvious alternative. What happens if we do NP?

# Nonparametric regression
library(SemiPar)
fit <- spm(df$horsepower ~ f(df$mpg)) 
plot(fit)

[Figure: fitted SemiPar curve of horsepower against mpg]

NP Result:

As you can see, NP delivers almost the same result. However, the y-axis in this figure has a natural interpretation, which can be useful.

What about your problem:

First, make sure that you try different values for k in s(..., k = ), i.e. different basis dimensions (numbers of knots), and see how the figure changes. Also have a look at the book to understand the background.
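Trying several values of k could look roughly like this. This is a sketch on simulated data, since the real data are not shown; the data frame `dat`, the shape of the simulated effect and the particular values in `ks` are all assumptions made for illustration.

```r
library(mgcv)

# Simulated stand-in data (assumption: replace with your own cvd/templg0)
set.seed(2)
n <- 1722
templg0 <- runif(n, -5, 35)
cvd <- rpois(n, lambda = exp(3.2 + 0.0005 * (templg0 - 15)^2))
dat <- data.frame(cvd = cvd, templg0 = templg0)

# Fit the same Poisson GAM with different basis dimensions k
ks <- c(3, 5, 10, 20)
fits <- lapply(ks, function(k) {
  gam(cvd ~ s(templg0, k = k), family = poisson, data = dat)
})

# Effective degrees of freedom of the smooth for each k
edfs <- sapply(fits, function(m) summary(m)$edf)
print(data.frame(k = ks, edf = edfs))

# One panel per k to compare the shape of the fitted smooth
op <- par(mfrow = c(2, 2))
for (i in seq_along(fits)) {
  plot(fits[[i]], main = paste("k =", ks[i]))
}
par(op)
```

If the curve and the edf stop changing much once k is large enough, the basis is not what limits the fit, and the shape you see is driven by the data and the smoothness penalty.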

In your current figure, I see some kinks at about x=10 and x=20. However, I would not call this severe non-linearity (although there is some non-linearity in the data). As a rule of thumb, if you can draw a straight line across the whole plotted range (along the x-axis) that stays inside the confidence bands everywhere, you can argue for a linear relationship.
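The visual check can be complemented by comparing a model with a purely linear temperature term against the model with a smooth term. The sketch below uses simulated data with a deliberately curved effect (an assumption, as the real data are not shown); `m_lin` and `m_smooth` are illustrative names. The comparison of two GAMs with anova() is approximate, so treat it as a guide rather than an exact test.

```r
library(mgcv)

# Simulated stand-in data with a curved temperature effect (assumption)
set.seed(3)
n <- 1722
templg0 <- runif(n, -5, 35)
cvd <- rpois(n, lambda = exp(3.2 + 0.001 * (templg0 - 15)^2))
dat <- data.frame(cvd = cvd, templg0 = templg0)

# Linear term versus smooth term for temperature
m_lin    <- gam(cvd ~ templg0,    family = poisson, data = dat)
m_smooth <- gam(cvd ~ s(templg0), family = poisson, data = dat)

# Lower AIC indicates the better trade-off between fit and complexity
aic <- AIC(m_lin, m_smooth)
print(aic)

# Approximate test of the linear model against the smooth model
print(anova(m_lin, m_smooth, test = "Chisq"))

# An edf close to 1 would mean the smooth is essentially a straight line
edf_smooth <- summary(m_smooth)$edf
print(edf_smooth)
```

If the linear model has a similar or lower AIC and the edf of the smooth is close to 1, there is little evidence against linearity; a clearly lower AIC for the smooth model together with an edf well above 1 points to a non-linear association.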
