Can absolute or relative contributions from X be calculated for a multiplicative model? $\log y \sim \log x_1 + \log x_2$

(How) can absolute or relative contributions be calculated for a multiplicative (log-log) model? Relative contributions from a linear (additive) model: e.g., there are 3 contributors to $y$ (given by the three additive terms): $$y = \beta_1 x_{1} + \beta_2 x_{2} + \alpha$$ In this case, I would interpret the absolute contribution of $x_1$ to $y$ to be $\beta_1 x_{1}$, and the relative contribution of $x_1$ to $y$ to be $$\frac{\beta_1 x_{1}}{y}$$ (assuming everything is positive). Relative contributions from a log-log …
Category: Data Science
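One common convention (a sketch, not the only answer): since the log-log model is additive in log space, report contributions as shares of $\log y$; on the original scale each $x_i$ contributes a multiplicative *factor* $x_i^{\beta_i}$. The coefficients and inputs below are made-up illustration values:

```python
import numpy as np

# Hypothetical fitted model: log(y) = b1*log(x1) + b2*log(x2) + a
b1, b2, a = 0.5, 1.5, 0.2
x1, x2 = 4.0, 9.0

log_y = b1 * np.log(x1) + b2 * np.log(x2) + a

# Absolute contribution of x1 in log space, and its relative share of log(y)
abs_x1 = b1 * np.log(x1)
rel_x1 = abs_x1 / log_y

# On the original scale the model is multiplicative:
#   y = exp(a) * x1**b1 * x2**b2
# so x1 contributes the multiplicative factor x1**b1.
factor_x1 = x1 ** b1
print(rel_x1, factor_x1)
```

Note the caveat carried over from the additive case: the "share of $\log y$" interpretation only behaves nicely when all terms (and $\log y$ itself) are positive.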

Log odds understanding

Here is my understanding of one reason why we prefer log odds over odds & probability. Please let me know if I got it right. Reasons why we choose log-odds: the range of probability values is $[0,1]$, the range of odds values is $[0,\infty)$, and the range of log-odds values is $(-\infty,+\infty)$. Probability & odds lie in a restricted range, while log-odds don't. When values lie in a restricted range, the correlation between variables falls. Following is an example where correlation between 'Hours spent …
Category: Data Science
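The three ranges in the question can be checked directly. A minimal sketch with a few arbitrary probabilities, showing that odds are non-negative while log-odds are unbounded and symmetric around $p = 0.5$:

```python
import numpy as np

p = np.array([0.01, 0.25, 0.5, 0.75, 0.99])

odds = p / (1 - p)       # lies in [0, inf): never negative
log_odds = np.log(odds)  # lies in (-inf, +inf): any real value

# Symmetry: swapping p and 1-p flips the sign of the log-odds,
# e.g. p=0.01 and p=0.99 give -log(99) and +log(99).
print(odds)
print(log_odds)
```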

Log odds preference query

The range of probability values: $[0,1]$; the range of odds values: $[0,\infty)$; the range of log-odds values: $(-\infty,+\infty)$. We use the log of odds instead of odds or probability in logistic regression because data can be modeled better when values lie in an unrestricted range. What exactly is the issue with values that lie in a restricted range?
Category: Data Science
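One concrete issue the question is circling: a linear predictor $b_0 + b_1 x$ is unbounded, so if you model the probability directly it can leave $[0,1]$. Treating the linear predictor as a log-odds and inverting with the sigmoid always yields a valid probability. A small sketch with made-up coefficients:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary linear model b0 + b1*x evaluated at extreme inputs.
x = np.array([-10.0, 0.0, 10.0])
b0, b1 = 0.5, 0.8
linear_pred = b0 + b1 * x   # unbounded: falls outside [0, 1] at the extremes

# Interpreting linear_pred as log-odds and inverting keeps every
# prediction strictly inside (0, 1).
p = sigmoid(linear_pred)
print(linear_pred)
print(p)
```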

How to justify logarithmically scaled frequency for tf in tf-idf?

I am studying tf-idf (term frequency - inverse document frequency). The original logic for tf was straightforward: count of term t / number of total terms in the document. However, I came across the log-scaled frequency: log(1 + count of term t in the document). Please refer to Wikipedia. It does not include the number of total terms in a document. For example, say, document 1 has 10 words in total and one of them is "happy". Using the …
Category: Data Science
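The difference the question points at can be made concrete. With the raw definition, one occurrence of "happy" in a 10-word document weighs 100× more than one occurrence in a 1000-word document; with the log-scaled variant from the question, both documents get the same tf:

```python
import math

def raw_tf(count, total_terms):
    # Original definition: count of term t / total terms in the document
    return count / total_terms

def log_tf(count):
    # Log-scaled variant from the question: log(1 + count); note that
    # document length does not appear at all.
    return math.log(1 + count)

# "happy" appears once in a 10-word document and once in a 1000-word one.
print(raw_tf(1, 10), raw_tf(1, 1000))  # differ by a factor of 100
print(log_tf(1), log_tf(1))            # identical
```

Whether ignoring document length is acceptable usually depends on a separate normalization step (e.g. normalizing the final tf-idf vectors), which is how libraries such as scikit-learn handle it when `sublinear_tf` is enabled.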

XGBoost non-linear regression

Is it possible to use the XGBoost regressor to do non-linear regression? I know of the objectives linear and logistic. The linear objective works very well with the gblinear booster. This made me wonder whether it is possible to use XGBoost for non-linear regressions such as logarithmic or polynomial regression. a) Is it generally possible to do polynomial regression, like in a CNN, where XGBoost approximates the data by generating an $n$-degree polynomial function? b) If a) is generally not possible, would it be possible …
Category: Data Science
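Background worth noting: with the tree booster (gbtree), XGBoost does not generate polynomials at all; it approximates any target, including a logarithmic one, as a sum of piecewise-constant trees. A minimal NumPy sketch of that idea, gradient boosting with depth-1 trees (stumps) on squared error, fitting $y = \log x$ (a toy stand-in, not XGBoost's actual implementation):

```python
import numpy as np

# Toy data: a logarithmic curve, which a linear booster cannot fit but a
# tree ensemble can approximate as a step function.
X = np.linspace(1.0, 100.0, 200)
y = np.log(X)

def fit_stump(x, residual):
    """Find the depth-1 split minimizing squared error on the residual."""
    best_sse, best = np.inf, None
    for thr in np.unique(x)[1:]:
        left, right = residual[x < thr], residual[x >= thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (thr, left.mean(), right.mean())
    return best

# Boosting loop: each stage fits a stump to the current residuals and
# adds a shrunken copy of its predictions (learning rate 0.1).
pred, lr = np.zeros_like(y), 0.1
for _ in range(200):
    thr, pl, pr = fit_stump(X, y - pred)
    pred += lr * np.where(X < thr, pl, pr)

mse = np.mean((y - pred) ** 2)
print(mse)  # far below the variance of y: the fit is clearly non-linear
```

The same mechanism is why plain `xgboost.XGBRegressor` with the default gbtree booster handles non-linear targets out of the box, with the caveat that trees cannot extrapolate beyond the training range.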

RANSAC and R2: why is the R2 score negative?

I was experimenting with curve_fit, RANSAC and stuff, trying to learn the basics, and there is one thing I don't understand. Why is the R2 score negative here? import numpy as np import warnings import matplotlib.pyplot as plt from sklearn.metrics import r2_score from sklearn.base import BaseEstimator from sklearn.linear_model import RANSACRegressor from scipy.optimize import OptimizeWarning from scipy.optimize import curve_fit class LogarithmicRegression(BaseEstimator): def __init__(self, log_base=np.log): self.__log_base = log_base def __log_expr(self, x, a, b, c): with warnings.catch_warnings(): warnings.simplefilter("ignore", RuntimeWarning) return a * self.__log_base(x+c) + …
Category: Data Science
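Despite the name, $R^2$ is not the square of anything in this context: `sklearn.metrics.r2_score` computes $1 - SS_{res}/SS_{tot}$, which goes negative whenever the model predicts worse than a constant equal to the mean of the targets. A minimal sketch of the same formula:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([4.0, 3.0, 2.0, 1.0])  # reversed: worse than the mean

ss_res = ((y_true - y_pred) ** 2).sum()         # 20.0
ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # 5.0

r2 = 1 - ss_res / ss_tot
print(r2)  # -3.0: residuals are 4x larger than the mean-only baseline
```

With RANSAC this happens easily: the score may be evaluated on points the estimator treated as outliers, or the wrapped estimator's fit may simply be poor on that subset, so a negative value is expected behavior rather than a bug.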

How to maximize a log linear regression equation satisfying a constraint?

I have a log-linear equation of the form $y = w_1(\log{X_1}) + w_2(\log{X_2}) + ... + w_n(\log{X_n})$. How can I find the values of the $X_i$ that maximize $y$ subject to the constraint $(X_1+X_2+...+X_n \le t)$ in Python or Excel? I tried using Excel Solver but ran into issues where one of the $X$ variables became 0 in the process and gave undefined results.
Category: Data Science
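When all weights are positive this problem has a closed form, which also explains the Solver failure: the objective is $-\infty$ at $X_i = 0$, so a solver that steps onto the boundary hits an undefined $\log 0$. Setting up the Lagrangian gives $w_i / X_i = \lambda$ for every $i$, and the budget constraint binds, so $X_i = t \cdot w_i / \sum_j w_j$. A sketch with made-up weights, spot-checked against random feasible points:

```python
import numpy as np

w = np.array([2.0, 3.0, 5.0])  # made-up positive weights
t = 10.0

# Lagrange conditions w_i / x_i = lambda with sum(x) = t give the
# closed-form optimum (valid when all w_i > 0):
x_opt = t * w / w.sum()  # [2, 3, 5]

def objective(x):
    return (w * np.log(x)).sum()

# Sanity check: random positive points on the budget simplex never
# beat the closed-form solution.
rng = np.random.default_rng(0)
best_random = max(
    objective(t * d / d.sum())
    for d in rng.uniform(0.01, 1.0, size=(1000, 3))
)
print(x_opt, objective(x_opt) >= best_random)
```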

Normalizing variables with logarithmic shape

A simple model with two variables [A, B] to train, let's say, a logistic regression or any other classification model: A: flat distribution from 0 to 100. B: a logarithmic distribution from 0 to a few thousand. What would be the proper way to normalize this? Should I flatten B first? Should I cap B at some limit and treat all points above it as the maximum? I read you carefully. Thanks in advance.
Category: Data Science
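A common recipe (a sketch, not the only valid approach): apply `log1p` to the skewed variable to compress its tail, then put both features on a comparable scale. The data below is synthetic, with a lognormal sample standing in for B:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.uniform(0, 100, 5000)                       # flat, 0..100
B = rng.lognormal(mean=4.0, sigma=1.0, size=5000)   # heavy right tail

def skew(v):
    """Sample skewness: third moment of the standardized values."""
    z = (v - v.mean()) / v.std()
    return (z ** 3).mean()

B_log = np.log1p(B)  # log1p is safe at 0, unlike plain log

# Scale both features afterwards (z-score shown; min-max also works).
A_z = (A - A.mean()) / A.std()
B_z = (B_log - B_log.mean()) / B_log.std()

print(skew(B), skew(B_log))  # the transform removes most of the skew
```

Capping (winsorizing) B at a percentile is a reasonable alternative when the tail is mostly noise, but the log transform preserves the ordering information in the tail instead of discarding it.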

About

Geeks Mental is a community that publishes articles and tutorials about the Web, Android, Data Science, new techniques, and Linux security.