Suppose I have a matrix $X$ and a dependent vector $y$ whose entries are each in $\{0,1\}$, dependent on the corresponding row of $X$. Given this dataset, I'd like to learn a model so that, given some other dataset $X'$, I could predict the average of the dependent-variable vector $y'$. Note that I'm only interested in the response at the aggregate level of the entire dataset. One way of doing so would be to train a calibrated binary classifier $X \to y$, apply it to $X'$, …
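For what it's worth, here is a minimal sketch of that approach in scikit-learn; the names (`X`, `y`, `X_new`) and the logistic-regression base model are placeholders, not part of the original question:

```python
# Sketch: train a calibrated classifier, then estimate the aggregate mean of y' on new data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)
X_new = rng.normal(size=(300, 5))          # the "other dataset" X'

# Calibration matters here: we average predicted probabilities, so they must be well scaled.
clf = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method="isotonic", cv=5)
clf.fit(X, y)

# Estimate average(y') by averaging predicted P(y = 1 | x) over the rows of X'.
estimated_mean = clf.predict_proba(X_new)[:, 1].mean()
print(f"estimated average of y': {estimated_mean:.3f}")
```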
I’m trying to build a model to predict the amount of sales of a product for the next few days. This question is about whether I should use the tail of the series as the test set and train models on the rest of the data, or whether I should create a test set by picking dates at random, as usual. Reading about classical time-series models (ARIMA), they recommend the first approach (using the last days as the test set), but …
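To make the two strategies concrete, here is a small sketch assuming the daily sales live in an array ordered by date (all names are illustrative); scikit-learn's `TimeSeriesSplit` implements the rolling-origin version of the "tail as test set" idea:

```python
# Sketch: tail hold-out vs. random split for a time-ordered target.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, train_test_split

sales = np.arange(100)                      # stand-in for daily sales, ordered in time

# Option 1: hold out the tail of the series (what the ARIMA literature recommends).
train, test = sales[:-14], sales[-14:]      # last 14 days as the test set

# Option 2: random split; for time series this can leak future information into training.
train_rand, test_rand = train_test_split(sales, test_size=0.2, shuffle=True, random_state=0)

# Rolling-origin evaluation: every test fold comes strictly after its training fold.
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(sales):
    print(f"train up to day {train_idx[-1]}, test days {test_idx[0]}-{test_idx[-1]}")
```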
I have read some articles and realized that many of them claim, for example, that DL is better than ML for large amounts of data. Typically: "The performance of machine learning algorithms decreases as the amount of data increases" (source). Another one says the performance of ML models will plateau (source). As far as I understand, the more data, the better: it helps us fit complex models without overfitting, and the algorithms learn the data better, thus inferring decent patterns …
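One way to probe such claims empirically is a learning curve: performance as a function of training-set size. A sketch, assuming a scikit-learn classifier and a synthetic dataset (everything here is illustrative, not from the cited articles):

```python
# Sketch: check the "more data -> better, until a plateau" claim with a learning curve.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
)
for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:5d} training samples -> CV accuracy {score:.3f}")
```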
(Note: my question is about a problem stated in the following lecture video: https://youtu.be/ERL17gbbSwo?t=413.) Hi, I hope this is the right forum for this kind of question. I'm currently following the geometric deep learning lectures from geometricdeeplearning.com and find the topics fascinating. As I want to really dive in, I also wanted to follow up on the questions they pose to the students. In particular, my question revolves around creating invariant functions using the G-smoothing operator (to enforce invariance, …
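To make my mental model of G-smoothing concrete, here is a toy sketch: average an arbitrary function over the orbit of a (small, finite) group, here cyclic shifts of a vector. The function `f` and the choice of group are my own placeholders, not from the lectures:

```python
# Toy G-smoothing: averaging f over the group of cyclic shifts makes it shift-invariant.
import numpy as np

def f(x):
    # Some arbitrary, non-invariant function of a vector.
    return float(np.dot(x, np.arange(len(x))))

def g_smooth(f, x):
    # Average f over all cyclic shifts of x; the result is invariant to cyclic shifts of x.
    return float(np.mean([f(np.roll(x, k)) for k in range(len(x))]))

x = np.array([1.0, 2.0, 3.0, 4.0])
print(f(x), f(np.roll(x, 1)))                       # f is not shift-invariant
print(g_smooth(f, x), g_smooth(f, np.roll(x, 1)))   # the smoothed version is
```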
I have seen classification CNNs that train with numerous images for a subset of labels (i.e. number of images >> number of labels); however, is it still possible to use CNNs when the number of images equals the number of labels? Specifically, consider having N settings that you can control to generate a unique image. Is it possible to make a CNN that can describe the mapping? (Is a CNN the right architecture to use?)
In the LightGBM paper, the authors make use of a newly developed sampling method, GOSS, to reduce the number of data instances needed for finding the best split of a given feature in a tree node. They give an estimate of the error made by sampling instead of using the entire data (Theorem 3.2 in https://www.microsoft.com/en-us/research/wp-content/uploads/2017/11/lightgbm.pdf). I am interested in the proof of this theorem, for which the paper refers to "supplementary materials". Where can I find those?
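Not the supplementary proof, but to fix notation for anyone answering, this is the GOSS sampling step as I read the paper: keep the top-$a$ fraction of instances by $|$gradient$|$, sample a $b$ fraction of the rest, and up-weight the sampled small-gradient instances by $(1-a)/b$. A sketch with my own variable names:

```python
# Sketch of the GOSS sampling step (my reading of the paper, names are my own).
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=np.random.default_rng(0)):
    n = len(gradients)
    top_k, rand_k = int(a * n), int(b * n)
    order = np.argsort(-np.abs(gradients))
    top_idx = order[:top_k]                          # large-gradient instances, always kept
    rest_idx = rng.choice(order[top_k:], size=rand_k, replace=False)
    idx = np.concatenate([top_idx, rest_idx])
    weights = np.ones(len(idx))
    weights[top_k:] = (1.0 - a) / b                  # compensate for down-sampling the rest
    return idx, weights

grads = np.random.default_rng(1).normal(size=1000)
idx, w = goss_sample(grads)
print(len(idx), w[:3], w[-3:])
```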
So let's say I have data with numerical variables A, B and C. I believe that the value of A has an effect on B. I also believe that A and B both have an effect on C. I don't think C has an effect on either A or B. I want to use machine learning to predict A, B and C. I obviously have A and B as training data, and I have other variables as training data too. …
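One way to encode that assumed structure (A → B, {A, B} → C) is to chain models and feed predictions forward. A minimal sketch with plain scikit-learn regressors; the data-generating process and variable names below are made up for illustration:

```python
# Sketch: chained regressors following the assumed causal order A -> B -> C.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
other = rng.normal(size=(500, 3))                     # the "other variables"
A = other @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=500)
B = 2.0 * A + rng.normal(size=500)                    # B depends on A
C = A - 0.5 * B + rng.normal(size=500)                # C depends on A and B

model_A = LinearRegression().fit(other, A)
model_B = LinearRegression().fit(np.column_stack([other, A]), B)
model_C = LinearRegression().fit(np.column_stack([other, A, B]), C)

# At prediction time the chain is applied in causal order, feeding predictions forward.
A_hat = model_A.predict(other)
B_hat = model_B.predict(np.column_stack([other, A_hat]))
C_hat = model_C.predict(np.column_stack([other, A_hat, B_hat]))
```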
Let's say our data is discrete-valued and belongs to one of $K$ classes. The underlying probability distribution is assumed to be a categorical/multinoulli distribution given as $p(\textbf{x}) = \prod_{k = 1}^{K}\mu_{k}^{x_{k}}$, where $\textbf{x}$ is a one-hot vector given as $\textbf{x} = [x_{1}\; x_{2}\; \dots\; x_{K}]^{T}$ and $\boldsymbol{\mu} = [\mu_{1}\; \dots\; \mu_{K}]^{T}$ are the parameters. Suppose $D = \{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{N}\}$ is our data. The log likelihood is: $\log p(D|\boldsymbol{\mu}) = \sum_{k = 1}^{K} …
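In case it helps to have the expression written out, my understanding of the standard form of that log likelihood and its maximizer (writing $N_{k} = \sum_{n} x_{nk}$ for the count of class $k$) is:

$$\log p(D|\boldsymbol{\mu}) = \sum_{n = 1}^{N}\sum_{k = 1}^{K} x_{nk}\log\mu_{k} = \sum_{k = 1}^{K} N_{k}\log\mu_{k},$$

and maximizing subject to $\sum_{k}\mu_{k} = 1$ with a Lagrange multiplier $\lambda$ gives

$$\frac{N_{k}}{\mu_{k}} - \lambda = 0 \;\Rightarrow\; \hat{\mu}_{k} = \frac{N_{k}}{N}.$$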
The definition of inductive bias says that "the inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs given inputs that it has not encountered." The inductive bias of candidate elimination says that "the target concept c is contained in the given hypothesis space H." My question is: how does this inductive bias help us predict the next instance in a given dataset?
This is the first time I have posted here. I am looking for some feedback or perspective on this question. To make it simple, let's just talk about linear models. We know the maximum-likelihood solution of the $l_1$-regularized objective is the same as the Bayesian MAP estimate with a Laplace prior on each parameter. I'll show it here for convenience. For vector $Y$ with $n$ observations, matrix $X$, parameters $\beta$, and noise $\epsilon$, $$Y = X\beta + \epsilon,$$ the …
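For reference, here is how I understand that correspondence written out, assuming Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ and a Laplace prior with scale $b$ on each $\beta_j$ (notation beyond $Y$, $X$, $\beta$, $\epsilon$ is mine):

$$p(\beta_j) = \frac{1}{2b}\exp\!\left(-\frac{|\beta_j|}{b}\right), \qquad -\log p(\beta \mid Y, X) = \frac{1}{2\sigma^2}\lVert Y - X\beta\rVert_2^2 + \frac{1}{b}\lVert\beta\rVert_1 + \text{const},$$

so the MAP estimate coincides with the $l_1$-penalized least-squares solution with penalty weight $\lambda = 2\sigma^2/b$ (up to the scaling convention chosen for $\lambda$).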
When studying kernel methods a few years ago, I got a bit confused by the concepts of feature space, hypothesis space and reproducing kernel Hilbert space. Recently, I thought a little about the questions I asked myself back then (with newly acquired math background) and noticed that some things are still unclear to me. I appreciate help and pointers to good - mathematical - literature. Let's consider the following learning problem: we are given a training sample $((x_1, y_1), \dots, …
I am interested in fitting data (regression rather than classification) whose individual targets are vectors via an XGBoost-type model. However, currently Python's xgboost.XGBRegressor model only supports scalar targets. Looking at the original paper defining the algorithm, it seems possible we could just extend their methods using a vectorized form: Paper here. Following their notation, if one simply assumed that $f_t(x_i)$ is a vector in $\mathbb{R}^k$, I think the multi-dimensional analogue of equation (6) would be something like: $$\tilde{\mathcal{L}}^{(t)}(q) …
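Not the vectorized extension described above, but a common workaround in the meantime is to wrap XGBRegressor so that one independent model is fit per output dimension (which does not share tree structure across targets the way the vectorized objective would). A sketch with made-up data:

```python
# Sketch: vector-valued regression via one XGBoost model per output dimension.
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
Y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=500),
                     X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=500)])  # targets in R^2

model = MultiOutputRegressor(XGBRegressor(n_estimators=200, max_depth=3))
model.fit(X, Y)
print(model.predict(X[:3]).shape)            # (3, 2): one prediction vector per row
```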
We're using a whole year's data to predict a certain target variable. The pipeline is: data → one-hot encoding of the categorical variables → MinMaxScaler → PCA (to choose a subset of 2000 components out of the 15k) → MLPRegressor. When we do a ShuffleSplit cross-validation, everything is hunky-dory (r^2 scores above 0.9 and low error rates); however, in real life they're not going to use the data in the same format (e.g. a whole year's data), but rather a …
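For reference, here is a sketch of that pipeline wired up so that every preprocessing step is re-fit inside each CV split; the column names, sizes and hyperparameters are placeholders (e.g. 3 PCA components instead of the real 2000):

```python
# Sketch: one-hot -> MinMaxScaler -> PCA -> MLPRegressor, evaluated with ShuffleSplit CV.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import ShuffleSplit, cross_val_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cat": rng.choice(list("abc"), size=400),
    "num1": rng.normal(size=400),
    "num2": rng.normal(size=400),
})
y = df["num1"] * 2 + (df["cat"] == "a") + 0.1 * rng.normal(size=400)

pre = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["cat"]),
     ("scale", MinMaxScaler(), ["num1", "num2"])],
    sparse_threshold=0.0,                     # keep the output dense for PCA / MLP
)
pipe = Pipeline([
    ("pre", pre),
    ("pca", PCA(n_components=3)),             # stand-in for the 2000 components
    ("mlp", MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)),
])
cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
print(cross_val_score(pipe, df, y, cv=cv, scoring="r2").mean())
```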
Let's say you have a dataset, and you split it into 80% training and 20% testing. Naturally, you want to find the optimal hyperparameters for your model, so with the training set you plan to do cross-validation and search the parameter space. CatBoost has something called the eval set, which is used to help avoid overfitting, but I have a fundamental question on how to use it appropriately. Say you do 10-fold CV. So now we have 10 iterations where 90% …
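To make the question concrete, here is one pattern people commonly use (a sketch, not an official CatBoost recommendation): inside each CV fold, carve a further validation slice out of that fold's training portion and use it only as the eval set for early stopping.

```python
# Sketch: eval_set inside a CV loop, used only for early stopping within each fold.
import numpy as np
from catboost import CatBoostClassifier
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

scores = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    X_tr, X_val, y_tr, y_val = train_test_split(
        X[train_idx], y[train_idx], test_size=0.15, random_state=0)
    model = CatBoostClassifier(iterations=500, verbose=False)
    model.fit(X_tr, y_tr, eval_set=(X_val, y_val), early_stopping_rounds=30)
    scores.append(model.score(X[test_idx], y[test_idx]))   # fold accuracy
print(np.mean(scores))
```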
I've read a book chapter that walks you through all the steps involved in an end-to-end machine learning project. After doing all the practical exercises, I'm still not quite sure that my way of thinking about the whole process is right. I've tried to depict it in the following flowchart. Is this the right way of thinking about all the steps in an ML project? Is there something missing?
Imagine that we have infinite computation power, an infinite amount of data, and an infinite amount of time to wait for a model to learn. In such a scenario, we want to binary-classify some data. My question is: would all classification models (we can leave out linear models because they won't be able to learn non-linear boundaries) perform similarly? In other words, are all the problems (in principle) solvable by each (non-linear) classification algorithm the …
MNIST dataset with 60,000 training samples and 10,000 test samples. Neural network #1: accuracy on the training set 99.53%, accuracy on the test set 99.31%. Neural network #2: accuracy on the training set 100.0%, accuracy on the test set 99.19%. Which neural network is better if other parameters are unknown? I have seen that many studies focus on accuracy on the test set and rarely report accuracy on the training set. The first neural network is better …
I might be in danger of having my question closed as "not clear what I'm asking for," but here goes. Suppose we have a simple feedforward network. It has a few layers, each with a "reasonable" number of neurons; nothing complicated. Let's say the output has size $n$, and there is no final activation function on the output. The network will have an "easier" time training to produce some outputs relative to others. In particular, outputs close to 0, …
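A quick numerical check of that intuition (purely illustrative, with my own layer sizes and Xavier-style initialization): at initialization, the raw outputs of such a network concentrate around zero, so targets near zero are "closer" to where training starts.

```python
# Sketch: output distribution of a freshly initialized feedforward net with no output activation.
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    return rng.normal(0, np.sqrt(2.0 / (n_in + n_out)), size=(n_in, n_out)), np.zeros(n_out)

def forward(x, layers):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)            # ReLU on hidden layers, none on the output
    return x

layers = [init_layer(32, 64), init_layer(64, 64), init_layer(64, 4)]   # output size n = 4
outputs = forward(rng.normal(size=(10000, 32)), layers)
print(outputs.mean(), outputs.std())          # mean near 0, modest spread
```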
Given M binary variables and R samples, what is the maximum number of leaves in a decision tree? My first assumption was that the worst case would be a leaf for each sample, thus R leaves at most. Am I wrong, and should there be some connection with the number of variables M? I know that the maximum depth of a decision tree is M, as a variable can appear only once in a branch, but I don't see the …
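My current reading of the bound, written out as a one-liner (happy to be corrected): a tree over M binary features cannot distinguish more than 2^M inputs, and it never needs more leaves than there are samples, so the number of leaves would be at most min(2^M, R).

```python
# Illustration of that reasoning with small numbers.
M, R = 5, 100
print(min(2 ** M, R))   # 32 leaves at most in this example
```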
I'm trying to predict a list of numbers, e.g. [23,55,198,200,64]. The data I have includes multiple things, among them: the numbers from the previous run (these numbers come from scientific experiments), and a list of all previous lists of numbers. So, for example, if two runs ago we got [22,24,77,187,21], and in the run after that we got [90,22,76,88,29], we would now have the list [[22,24,77,187,21],[90,22,76,88,29]]. The important thing is that it doesn't matter what order the numbers are in. [22,24,77,187,21] …
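One way to respect the "order doesn't matter" constraint is to represent each run by order-invariant features (sorted values and/or summary statistics) before feeding it to a model. A sketch, with placeholder names:

```python
# Sketch: permutation-invariant representation of each run.
import numpy as np

previous_runs = [[22, 24, 77, 187, 21], [90, 22, 76, 88, 29]]

def invariant_features(run):
    run = np.asarray(run, dtype=float)
    # Sorting (or summary statistics) makes the representation independent of ordering.
    return np.concatenate([np.sort(run), [run.mean(), run.std(), run.min(), run.max()]])

X = np.stack([invariant_features(r) for r in previous_runs])
print(X.shape)                                # one fixed-length, order-invariant row per run
```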