Let's say I have 10,000 training points, 100,000,000 points to impute, and 5-10 prediction variables/parameters, all numeric (for now). The target variable is numeric and skew-normal with outliers. I want to use SVM, but I'm new to it, so I would appreciate any opinions.
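A minimal sketch of what that setup could look like with scikit-learn's SVR, assuming the labelled rows and the rows to impute live in arrays called X_train, y_train and X_missing (all names and data below are illustrative, not from the question):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 7))                        # 5-10 numeric predictors
y_train = np.expm1(X_train[:, 0]) + rng.normal(size=10_000)   # skewed numeric target
X_missing = rng.normal(size=(100_000, 7))                     # stand-in for the rows to impute

# Scaling matters for RBF kernels; the epsilon-insensitive loss is fairly robust to outliers.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X_train, y_train)
y_imputed = model.predict(X_missing)   # with 1e8 rows this would be done in chunks
```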
I'm writing a data science report, and I want to find an existing distribution to fit the sample. I got a good-looking result, but when I used the KS test to check the fit, I got a low p-value (1.2e-4), so I should definitely reject the model. But whatever distribution/model you use to fit a sample, you cannot expect a perfect result, especially when working with a huge amount of data. So what does the KS test do in a data science report? …
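For reference, a hedged sketch of the workflow being described, fitting a candidate distribution and running the one-sample KS test with scipy (the lognormal choice and the synthetic sample are assumptions for illustration):

```python
import numpy as np
from scipy import stats

sample = stats.lognorm.rvs(s=0.6, size=50_000, random_state=0)

params = stats.lognorm.fit(sample)                          # maximum-likelihood fit
D, p_value = stats.kstest(sample, "lognorm", args=params)   # one-sample KS test against the fit
print(D, p_value)   # with tens of thousands of points, tiny deviations already give tiny p-values
```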
Which nonparametric outlier detection method do you suggest for detecting the outliers (red points) in these plots? I have tested the standard-deviation rule, the IQR rule, etc., but with no good results. It is just one vector containing both normal points and outliers. Thanks for your help.
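As a point of comparison, here is a small sketch of two common nonparametric options on a single numeric vector, a robust MAD rule and scikit-learn's LocalOutlierFactor (the vector and the thresholds are illustrative assumptions, not the data from the plots):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 500), [8.0, 9.5, -7.2]])   # bulk plus a few outliers

# Median-absolute-deviation rule: flag points far from the median in robust units.
med = np.median(x)
mad = np.median(np.abs(x - med))
robust_z = 0.6745 * (x - med) / mad
mad_outliers = np.abs(robust_z) > 3.5

# Density-based alternative that makes no distributional assumption at all.
lof = LocalOutlierFactor(n_neighbors=20)
lof_outliers = lof.fit_predict(x.reshape(-1, 1)) == -1
```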
How can I partition the multidimensional time-series data in the figure below into segments using an unsupervised algorithm, so that the information within a segment remains consistent while the information in adjacent segments differs? Note that the algorithm should be adaptive, because we do not know how many segments each time series should be divided into. The data can be found here.
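One possible approach, sketched under the assumption that a change-point library such as `ruptures` is acceptable: PELT with a penalty term picks the number of segments on its own rather than requiring it up front (the synthetic signal below merely stands in for the data in the figure):

```python
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(0)
# Stand-in for the multidimensional series: 1000 time steps, 4 channels, 3 regimes.
signal = np.concatenate([
    rng.normal(0, 1, (400, 4)),
    rng.normal(3, 1, (350, 4)),
    rng.normal(-2, 1, (250, 4)),
])

algo = rpt.Pelt(model="rbf", min_size=20).fit(signal)
breakpoints = algo.predict(pen=10)   # the penalty, not a preset count, controls how many segments appear
print(breakpoints)
```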
I am aiming to assess the effect of BMI (continuous) on certain biomarkers (also continuous) while adjusting for several relevant variables (mixed categorical and continuous) using multiple regression. My data are non-normal, which I believe violates one of the key assumptions of multiple linear regression. While I think the regression can still be performed, I think the non-normality affects the significance testing, which is an issue for me. I think I can transform the data and then perform the regression, but I'm not sure …
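A hedged sketch of the transform-then-regress idea with statsmodels; the variable names (biomarker, bmi, age, sex) and the log transform are assumptions for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "bmi": rng.normal(27, 4, n),
    "age": rng.normal(50, 10, n),
    "sex": rng.choice(["F", "M"], n),
})
df["biomarker"] = np.exp(0.05 * df["bmi"] + 0.01 * df["age"] + rng.normal(0, 0.3, n))

# Log-transforming a right-skewed outcome often brings the residuals closer to normal,
# which is what the significance tests actually rely on.
model = smf.ols("np.log(biomarker) ~ bmi + age + C(sex)", data=df).fit()
print(model.summary())
```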
Let's scope this to just classification. It's clear that if you fully grow out a decision tree with no regularization (e.g. no max depth limit, no pruning), it will overfit the training data, driving the training error to zero even past the Bayes error*. Is this universally true for all non-parametric methods? *Assuming the model has access to the "right" features.
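A quick illustration of the premise on a synthetic dataset (purely illustrative): an unregularized tree memorizes the training set while test accuracy stays near the noise floor of the problem:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=None, ccp_alpha=0.0, random_state=0).fit(X_tr, y_tr)
print(tree.score(X_tr, y_tr))   # 1.0: every training point is fit, including the flipped labels
print(tree.score(X_te, y_te))   # noticeably lower, reflecting the injected label noise
```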
I am trying to run a Kruskal-Wallis test on multiple columns of my data, and for that I wrote a function:

    import pandas as pd
    from scipy.stats import mstats

    var = ['a','b','c','d','e','f','g','h','i','j','k','l','m',
           'n','o','p','q','r','s','t','u','v','w','x','y','z']

    def kruskawallis_test(column):
        # keep only the grouping column and the target
        k_test = train.loc[:, [column, 'SalePrice']]
        # pivot so that each group level becomes its own column of SalePrice values
        x = pd.pivot_table(k_test, index=k_test.index, values='SalePrice', columns=column)
        for i in range(x.shape[1]):
            var[i] = x.iloc[:, i]
            var[i] = var[i][~var[i].isnull()].tolist()
        H, pval = mstats.kruskalwallis(var[0], var[1], var[2], var[3])
        return pval

The problem I am facing is that every column has a different number of groups, so var[0], var[1], var[2], var[3] will not be correct for every column. As far as I know, mstats.kruskalwallis() takes one input vector per group, each containing the values to be compared from a particular column. …
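One possible way around the fixed var[0]..var[3] indexing, sketched under the assumption that `train` is the asker's DataFrame with a 'SalePrice' column: build one list of values per group and unpack them with `*`, so the number of groups can vary by column:

```python
from scipy.stats import mstats

def kruskal_by_column(column):
    # one list of SalePrice values per level of the grouping column
    groups = [g['SalePrice'].dropna().tolist() for _, g in train.groupby(column)]
    H, pval = mstats.kruskalwallis(*groups)   # accepts any number of group vectors
    return pval
```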
I am interested in parametric and non-parametric machine learning algorithms, their advantages and disadvantages, and also their main differences regarding computational complexity. In particular I am interested in the parametric Gaussian Mixture Model (GMM) and non-parametric kernel density estimation (KDE). I found out that if a "small" number of data points is used, then parametric methods (like GMM/EM) are the better choice, but if the number of data points grows much larger, then non-parametric algorithms are better. …
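A small sketch comparing the two on the same 1-D sample; both expose a log-density that can be evaluated on a grid (the component count and bandwidth here are illustrative assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 400), rng.normal(3, 1.0, 600)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)   # fixed number of parameters
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(x)   # effective complexity grows with the data

grid = np.linspace(-5, 7, 200).reshape(-1, 1)
gmm_logdens = gmm.score_samples(grid)
kde_logdens = kde.score_samples(grid)
```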
I'm currently taking a course, "Introduction to Machine Learning", which covers the following topics: linear regression, overfitting, classification problems, parametric & non-parametric models, Bayesian & non-Bayesian models, generative classification, neural networks, SVM, boosting & bagging, and unsupervised learning. I've asked the course staff for some reading material on those subjects, but I would like to hear some more recommendations for books (or any other material) that give more intuition about the listed topics to start with, and also some books …
I have seen researchers using Pearson's correlation coefficient to find the relevant features -- keeping the features that have a high correlation value with the target. The implication is that the correlated features contribute more information for predicting the target in classification problems, whereas we remove features that are redundant or have a negligible correlation value. Q1) Should features that are highly correlated with the target variable be included in or removed from classification problems? Is there a …
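For concreteness, a hedged sketch of that screening step, ranking features by their absolute Pearson correlation with the target (the dataset is only a stand-in; with a binary target this is the point-biserial correlation):

```python
from sklearn.datasets import load_breast_cancer

df = load_breast_cancer(as_frame=True).frame   # numeric features plus a binary 'target' column

corr_with_target = df.corr()["target"].drop("target")
print(corr_with_target.abs().sort_values(ascending=False).head(10))
```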
I am currently reading 'Mastering Machine Learning with scikit-learn', 2nd edition, by Packt. In the 'Lazy Learning and Non-Parametric Models' topic of Chapter 3, 'Classification and Regression with k-Nearest Neighbors', there is a paragraph stating: Non-parametric models can be useful when training data is abundant and you have little prior knowledge about the relationship between the response and the explanatory variables. kNN makes only one assumption: instances that are near each other are likely to have similar values of the response variable. …
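A minimal sketch of that single assumption in action, using scikit-learn's kNN classifier on a synthetic dataset (everything here is illustrative): the "training" step only stores the data, and predictions come from the labels of nearby points:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)   # no parametric form is fitted
print(knn.score(X_te, y_te))
```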
I am looking for a recommendation for basic introductory material on Bayesian non-parametric methods, specifically the Dirichlet Process / Chinese Restaurant Process. I am looking for material that covers the modeling part as well as the inference part from the ground up. Most of the material I found on the internet is slightly advanced and skips the inference part, which is usually harder to grasp.
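For orientation while reading, here is a short sketch of the Chinese Restaurant Process generative story (my own illustrative code, not taken from any particular reference): each customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to alpha:

```python
import numpy as np

def crp_sample(n_customers, alpha, seed=0):
    rng = np.random.default_rng(seed)
    tables = []          # current table sizes
    assignments = []
    for _ in range(n_customers):
        probs = np.array(tables + [alpha], dtype=float)
        probs /= probs.sum()
        choice = rng.choice(len(probs), p=probs)
        if choice == len(tables):   # open a new table
            tables.append(1)
        else:
            tables[choice] += 1
        assignments.append(choice)
    return assignments

print(crp_sample(20, alpha=1.0))   # the number of tables (clusters) grows adaptively with the data
```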
So my doubt is basically about linear regression: we try to fit a straight line or a curve to a given training set. Now, I believe that whenever the number of features (independent variables) increases, the number of parameters also increases, and hence computing these parameters is computationally expensive. So I guess that's the reason we move to non-linear models!? Is my understanding right? And my next doubt is, in overfitting for linear regression, we say that the model memorizes. What I understand is that the parameters …
Regarding the methodology for finding confidence and/or prediction intervals in, let's say, a regression problem, I know of two main options: checking normality of the estimates/predictions distribution and applying the well-known Gaussian methods to find those intervals if the distribution is Gaussian; or applying non-parametric methodologies like bootstrapping, so that we do not need to assume/check/care whether our distribution is normal. With this in mind, I would basically always go for the second one because: it is meant to be generic, as …
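A hedged sketch of the second option, a residual bootstrap for a prediction interval around a linear model, with no normality assumption anywhere (the model choice, sample, and interval level are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X[:, 0] + rng.standard_t(df=3, size=200)        # deliberately heavy-tailed noise

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)
x_new = np.array([[5.0]])

boot_preds = []
for _ in range(2000):
    idx = rng.integers(0, len(y), len(y))                 # resample (x, y) pairs
    m = LinearRegression().fit(X[idx], y[idx])
    boot_preds.append(m.predict(x_new)[0] + rng.choice(residuals))   # add a resampled residual

lower, upper = np.percentile(boot_preds, [2.5, 97.5])     # ~95% prediction interval at x_new
```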
Let's say I have a set of 1D time series whose values have been sampled at equidistant time steps with timestamps $1, 2, 3, \ldots$; they all have the same length and are somewhat similar in shape. I want to apply non-parametric regression (e.g. with Gaussian Processes or kernel regression) to the time series in order to infer values for timestamps that lie between the sample timestamps (e.g. $5.3$). The obvious way of doing this would be to simply build a regression model for …
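For the per-series option, a minimal sketch with scikit-learn's Gaussian Process regressor, querying a timestamp between samples such as $5.3$ (the kernel choice and the synthetic series are assumptions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

t = np.arange(1, 21, dtype=float).reshape(-1, 1)            # timestamps 1, 2, ..., 20
y = np.sin(0.5 * t[:, 0]) + 0.05 * np.random.default_rng(0).normal(size=20)

kernel = 1.0 * RBF(length_scale=2.0) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, y)

mean, std = gp.predict(np.array([[5.3]]), return_std=True)  # interpolated value plus uncertainty
```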