scipy

How to make a Gaussian distribution in Python considering mean, variance, skewness and kurtosis?

rb173

June 3, 2022, 10:04

np.random.normal(mean,sigma,size) creates a Gaussian distribution based only on mean and variance. I want to create a distribution based on function_name(mean,sigma,skew,kurtosis,size). I tried scipy.stats.gengamma but I don't understand how to use it. It takes two parameters, a and c, and creates a distribution, but it is difficult to tell which values of a and c give a particular skewness and kurtosis. Can anyone explain how to use gengamma or any other way to create …

Topic: distribution scipy python

Category: Data Science
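
One way to see which skewness and kurtosis a given (a, c) pair produces is to ask the distribution for its moments directly. The sketch below uses `scipy.stats.gengamma.stats` with `moments="mvsk"`; the helper name `gengamma_shape_summary` and the scanned (a, c) pairs are made up for illustration, and the kurtosis SciPy reports is the excess kurtosis (0 for a normal distribution).

```python
from scipy.stats import gengamma


def gengamma_shape_summary(a, c):
    """Return (mean, variance, skewness, excess kurtosis) of gengamma(a, c)."""
    mean, var, skew, kurt = gengamma.stats(a, c, moments="mvsk")
    return float(mean), float(var), float(skew), float(kurt)


# Scan a few illustrative (a, c) pairs to see how the shape statistics move.
for a, c in [(1.0, 1.0), (2.0, 1.0), (2.0, 2.0)]:
    print(a, c, gengamma_shape_summary(a, c))
```

A table built this way can be searched (or fed to a root finder) for the pair closest to a target skewness and kurtosis; mean and variance can then be set afterwards with the `loc` and `scale` arguments.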

Dendrogram: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Sam.H

June 1, 2022, 10:03

I am trying to plot a Dendrogram to cluster data but this error is stopping me. My data is here. I first chose columns to work with: df_euro = pd.read_csv('https://assets.datacamp.com/production/repositories/655/datasets/2a1f3ab7bcc76eef1b8e1eb29afbd54c4ebf86f2/eurovision-2016.csv') samples = df_euro.iloc[:, 2:7].values[:42] country_names = df_euro.iloc[:, 1].values[:42] # Calculate the linkage: mergings mergings = linkage(samples , method = 'complete') # Plot the dendrogram dendrogram( mergings, labels = y, leaf_rotation = 90, leaf_font_size = 6 ) plt.show() But I'm getting this error which I can't understand. I googled it and …

Topic: unsupervised-learning scipy clustering machine-learning

Category: Data Science
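
The excerpt passes `labels = y`, but `y` is never defined in the code shown. Assuming `y` is a pandas Series defined elsewhere, a likely fix is to pass a plain list or NumPy array with one label per sample, such as `country_names`. The sketch below uses random stand-in data instead of the Eurovision file and `no_plot=True` so it runs without matplotlib.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Stand-in data: 6 samples with 5 features each (not the Eurovision dataset).
rng = np.random.default_rng(0)
samples = rng.random((6, 5))
country_names = [f"country_{i}" for i in range(6)]  # one label per sample

# Calculate the linkage, as in the question.
mergings = linkage(samples, method="complete")

# Pass a list (or array) of labels rather than a pandas Series.
# no_plot=True returns the layout without drawing anything.
result = dendrogram(mergings, labels=country_names, no_plot=True)
leaf_labels = result["ivl"]  # labels in the order the leaves would be drawn
print(leaf_labels)
```

If the labels do come from a DataFrame column, converting with `.values` or `.tolist()` before the call has the same effect.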

Evaluate Dendrogram Statistical Significance

Mirko

May 3, 2022, 10:33

I have N=21 objects and each one has about 80 possible non-NaN descriptors. I carried out a hierarchical clustering on the objects and I obtained this dendrogram. I want some kind of 'confidence' index for the dendrogram or for each node. I saw many dendrograms with Bootstrap values (as far as I understand it is the same as Monte Carlo Cross-Validation, but I might be wrong), and I think that in my case they could be used as well. …

Topic: confidence bootstraping scipy python clustering

Category: Data Science

Scipy curve_fit and method "dogbox"

zipline86

April 30, 2022, 05:02

I am trying to duplicate this paper's feature engineering for user activity. They take 14 days of accumulated user activity and keep the parameters (2 parameters) that fit a sigmoid to it. I would like to do the same except with 7 days of activity. http://hanj.cs.illinois.edu/pdf/kdd18_cyang.pdf They use the formula below and keep the parameters x0 and k as features. from scipy.optimize import curve_fit import numpy as np def sigmoid(x, x0, k): y = 1 / (1 + np.exp(-k*(x-x0))) return …

Topic: scipy

Category: Data Science
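
A minimal sketch of the sigmoid fit on 7 days of data, using `curve_fit` with `method="dogbox"`. The activity values here are synthetic (generated from a known sigmoid), and the sketch assumes the accumulated activity has already been scaled to the range 0 to 1; the starting guess `p0` is also an assumption, not something taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(x, x0, k):
    """Logistic curve with midpoint x0 and steepness k."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))


# Synthetic, noise-free "accumulated activity" for days 0..6.
days = np.arange(7, dtype=float)
activity = sigmoid(days, 3.0, 1.5)

# Start from the middle of the window with a moderate slope.
p0 = [np.median(days), 1.0]
(x0_hat, k_hat), _ = curve_fit(sigmoid, days, activity, p0=p0, method="dogbox")
print(x0_hat, k_hat)
```

With only 7 points and 2 parameters the fit is sensitive to the starting guess, so passing `p0` (and, with `dogbox`, optional `bounds`) tends to matter more than it would on longer series.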

How to sample a dataframe or numpy array with a particular interval?

NEEK

April 26, 2022, 03:20

I have the following dataframe:

A  B1    B2    B3    B4    B5    B6    B7
0  0     0     0     0     0     0     0
1  444   325   479   502   630   458   588
2  1200  1255  1101  1259  1365  1400  1100
3  2092  1764  2103  2359  2245  2397  2487
4  2586  2232  2549  2597  2628  2718  2770
5  2951  2762  2924  2757  2903  2934  2963

I want to sample the columns uniformly. For example, I want to divide the interval 0 to 1 for …

Topic: numpy data scipy pandas statistics

Category: Data Science

The mean and standard deviation aren't the same as those of the input data I provided after sampling

codebreaker12

April 25, 2022, 14:21

I have a log-normal mean and a standard deviation. After I converted them to the underlying normal distribution's parameters mu and sigma, I sampled from the log-normal distribution. However, when I take the mean and standard deviation of the sampled data, I don't get the values I plugged in at first. This only happens when the log-normal mean is much smaller than the log-normal standard deviation; otherwise it works. How do I prevent this from happening and get the input …

Topic: distribution probability scipy sampling python

Category: Data Science
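
The conversion described in this question can be sketched as follows, using the standard identities sigma^2 = ln(1 + s^2/m^2) and mu = ln(m) - sigma^2/2 for a log-normal with mean m and standard deviation s. The function names and the example values (mean 1, standard deviation 10) are illustrative, not taken from the post.

```python
import numpy as np


def lognormal_to_normal_params(mean, std):
    """Convert a log-normal mean/std into mu/sigma of the underlying normal."""
    sigma2 = np.log1p((std / mean) ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    return mu, np.sqrt(sigma2)


def normal_to_lognormal_moments(mu, sigma):
    """Inverse mapping: exact mean/std of the log-normal with these parameters."""
    mean = np.exp(mu + 0.5 * sigma**2)
    std = mean * np.sqrt(np.expm1(sigma**2))
    return mean, std


# Illustrative case where the mean is much smaller than the standard deviation.
mu, sigma = lognormal_to_normal_params(mean=1.0, std=10.0)

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=mu, sigma=sigma, size=100_000)
print("theoretical:", normal_to_lognormal_moments(mu, sigma))
print("sample:     ", samples.mean(), samples.std())
```

When the standard deviation is much larger than the mean, sigma is large and the distribution is very heavy-tailed, so the sample mean and especially the sample standard deviation converge slowly. A mismatch in that regime does not by itself mean the conversion is wrong; checking the round trip analytically, as `normal_to_lognormal_moments` does, separates the two effects.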

What kind of hypothesis testing in Python can be used to validate that 4 job titles are significantly different based on their skillset?

Justin Schmidt

April 18, 2022, 12:41

I have 4 job titles, for each of which I scraped hundreds of job descriptions and classified them by if they contain words related to a predefined list of skills. For each job description, I now have a True/False parameter if they mention one of the skills. How can I validate that there is a significant difference between job descriptions that represent different job titles? I'm very new to this topic and all I could think of is using dummy …

Topic: web-scraping hypothesis-testing scipy python categorical-data

Category: Data Science
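
One possible approach for a single skill is a chi-square test of independence on a contingency table of job title against the True/False flag. The sketch below uses `scipy.stats.chi2_contingency`; the counts are invented purely for illustration and do not come from the question.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: four job titles. Columns: [mentions the skill, does not mention it].
# These counts are made up for the example.
table = np.array(
    [
        [120, 80],
        [90, 110],
        [60, 140],
        [150, 50],
    ]
)

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3g}")
```

A small p-value suggests that the share of descriptions mentioning the skill differs between titles. Repeating the test for many skills would call for a multiple-comparison correction.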

Error when drawing random numbers from a custom continuous distribution using scipy.rv_continuous

pelegs

April 14, 2022, 14:34

I am trying to generate a sample of random numbers from a custom distribution $$ p(x) = x^{n}e^{-xtn}. $$ After reading the tutorial on scipy's website, I wrote a subclass which I called kbayes: class kbayes(rv_continuous): def _pdf(self, x, t, n): p = x**n * np.exp(-t*n*x) s = np.sum(p) return p/s The line s=np.sum(p) is there to normalize the distribution. The pdf seems to be ok when I check it on some numbers: running the following code ks = np.logspace(-5, …

Topic: numpy scipy python statistics

Category: Data Science
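
Normalizing with `np.sum(p)` makes the result depend on whichever array of points happens to be passed to `_pdf`, so it is not a true density. For positive t and n, the target $x^{n}e^{-xtn}$ on x >= 0 is a gamma density with shape n + 1 and rate t*n, so the constant can be written in closed form. The sketch below is one way to do that; the class name `KBayes`, the support starting at 0, and the values of t and n are assumptions.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma, rv_continuous


class KBayes(rv_continuous):
    """p(x) proportional to x**n * exp(-t*n*x) on x >= 0, normalized analytically."""

    def _pdf(self, x, t, n):
        # Normalizing constant of a gamma density: (t*n)**(n+1) / Gamma(n+1).
        log_norm = (n + 1) * np.log(t * n) - gammaln(n + 1)
        return x**n * np.exp(log_norm - t * n * x)


kbayes = KBayes(a=0.0, name="kbayes")  # a=0.0 sets the lower end of the support

t, n = 2.0, 3.0  # illustrative parameter values
xs = np.linspace(0.1, 3.0, 5)
custom_pdf = kbayes.pdf(xs, t, n)

# The same density expressed through scipy.stats.gamma.
reference_pdf = gamma.pdf(xs, a=n + 1, scale=1.0 / (t * n))

# Sampling through the built-in gamma is much faster than the generic
# rv_continuous machinery, which inverts the CDF numerically.
samples = gamma.rvs(a=n + 1, scale=1.0 / (t * n), size=1000, random_state=0)
```

If only samples are needed, the `gamma.rvs` call alone is enough and the subclass can be dropped.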

How does one feed graph optimization problems into Python's anneal function in SciPy?

user2896468

April 9, 2022, 15:50

I am interested in graph problems like 2-color, max-clique, stable sets, etc but the documentation for scipy.optimize.anneal seems to be for ordinary functions. How would one apply this library towards graph formulations?

Topic: scipy graphs optimization python

Category: Data Science

How do I properly write scipy.stats.binom.cdf() details

Polina

April 8, 2022, 07:17

I need to calculate the probability of my random variable being $\le 0$. It's a binomial distribution, $10000$ trials, probability of success is $\frac{10}{19}$ (roughly $0.53$). How do I properly use the scipy.stats.binom.cdf() to do that? I've tried the following: stats.binom(10000, a).cdf(0) But it gives me an answer $0$. I feel like I might be missing something about the formula itself.

Topic: probability scipy python statistics

Category: Data Science
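
The call in this question is well-formed; the answer really is indistinguishable from 0 in floating point. With 10000 trials, P(X <= 0) = P(X = 0) = (9/19)^10000, which is roughly 10^-3245 and far below the smallest positive double. A sketch that shows this by working in log space (variable names are illustrative):

```python
import math

from scipy import stats

n_trials = 10000
p_success = 10 / 19

dist = stats.binom(n_trials, p_success)

# Underflows: the true value is far smaller than any positive double.
prob_zero = dist.cdf(0)

# P(X <= 0) equals P(X = 0), so the log-PMF gives the same quantity in log space.
log_prob_zero = dist.logpmf(0)
log10_prob_zero = log_prob_zero / math.log(10)

print(prob_zero)
print(log10_prob_zero)
```

If the intended quantity was something like "at most half the trials succeed", the threshold passed to `cdf` should be that count (for example `dist.cdf(5000)`) rather than 0.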

Vectorize scipy.stats.norm.logpdf

Vellyxenya

April 8, 2022, 07:06

I am trying to train a Bayesian NN and at some point I need to compute log-likelihoods for some data points, according to a multivariate diagonal Gaussian distribution with parameters (mu, sigma). I have 2 problems: I don't know the size of the values in advance (note that I am guaranteed that 'values', 'mu' and 'rho' are the same size), but they could either be 1D or 2D, which forces me to have an ugly if statement. Ideally …

Topic: numpy scipy

Category: Data Science
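
`scipy.stats.norm.logpdf` already broadcasts over arrays of any shape, so a diagonal Gaussian log-likelihood can be written without branching on the number of dimensions. The sketch below sums over the last axis; the function name `diag_gaussian_loglik` is made up, and it takes `sigma` directly because the excerpt does not show how `rho` maps to `sigma`.

```python
import numpy as np
from scipy.stats import norm


def diag_gaussian_loglik(values, mu, sigma):
    """Log-likelihood under independent normals, summed over the last axis.

    1D inputs give a scalar; 2D inputs of shape (batch, dim) give one
    value per row.
    """
    return norm.logpdf(values, loc=mu, scale=sigma).sum(axis=-1)


# 1D case: a single 2-dimensional point.
ll_1d = diag_gaussian_loglik(np.array([0.0, 1.0]), np.zeros(2), np.ones(2))

# 2D case: a batch of 3 points, same code path.
ll_2d = diag_gaussian_loglik(np.zeros((3, 2)), np.zeros((3, 2)), np.ones((3, 2)))
print(ll_1d, ll_2d)
```

Replacing `.sum(axis=-1)` with `.sum()` gives one total log-likelihood regardless of input shape, if that is what the training loss needs.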

Stemming/lemmatization for German words

johnnydoe

April 7, 2022, 00:02

I have a huge dataset of German words and their frequency in a text corpus (so words like "der", "die", "das" have a very high frequency, whereas terminology-like words have a very low frequency). Different forms of the same word, such as plural or 3rd person forms do appear, but there is no guarantee that this happens for every word. I tried using spacy.load('de_core_news_sm') but it says it can't find the model. Other older posts don't mention anything reliable in …

Topic: nltk scipy nlp python

Category: Data Science

Find the right balance between price of a property and agent fee

TPPZ

March 17, 2022, 08:01

When buying a property, I would like to know when it is better for an estate agent to get a higher fee from me compared to the seller if we get a deal with a lower amount. As an example, let's say that: the property asking price is €350k; the agent fee for the buyer is 3%; the agent fee for the seller is 3%. All of the above could be parameters. I would like e.g. to offer €300k (50k less …

Topic: linear-programming numpy scipy optimization python

Category: Data Science

Why does the 1st derivative appear to lag the slope of the fit in Scipy's Savitzky-Golay filter?

quant

March 9, 2022, 18:08

I have a simple script that performs the Savitzky-Golay filter on a toy dataset of forex prices from yahoo finance: import scipy.signal price_series = pandas.read_csv('AUDUSD=X.csv').set_index('Date')['Close'] splinal_fit = scipy.signal.savgol_filter(price_series, window_length=21, polyorder=2, deriv=0, mode='mirror') splinal_fit = pandas.Series(splinal_fit, index=price_series.index, name='fit') splinal_deriv = scipy.signal.savgol_filter(price_series, window_length=21, polyorder=2, deriv=1, axis=0, delta=1) splinal_deriv = pandas.Series(splinal_deriv, index=price_series.index, name='fit') The fit and derivatives look broadly sensible, however, the x-axis seems skewed. Here is what I ran to plot the derivative alongside the original fit: import matplotlib.pyplot as plt mask …

Topic: convolution scipy python

Category: Data Science

How to make scipy.optimize.basinhopping find the global optimal point

mon

March 4, 2022, 08:00

Question: I am trying to find the global optimal point of the function (reading Python for Finance, 2nd edition, Chapter 11, Mathematical Tools). def fm(p): x, y = p return (np.sin(x) + 0.05 * x ** 2 + np.sin(y) + 0.05 * y ** 2) scipy.optimize.basinhopping says it finds the global minimum: "Find the global minimum of a function using the basin-hopping algorithm". However, it looks like it does not find the global optimal point. Why is this and how can make …

Topic: scipy optimization

Category: Data Science
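
Basin-hopping is a stochastic heuristic, so it reports the lowest local minimum it happened to visit rather than a guaranteed global one. A minimal sketch on the function from this question follows; the starting point, `niter` and `stepsize` are assumptions chosen for illustration, not the values used in the post.

```python
import numpy as np
from scipy.optimize import basinhopping


def fm(p):
    """Objective from the question: two decoupled 1D terms."""
    x, y = p
    return np.sin(x) + 0.05 * x**2 + np.sin(y) + 0.05 * y**2


# stepsize sets how far each random hop moves before the local minimizer
# runs; niter sets how many hops are attempted.
result = basinhopping(fm, x0=[0.0, 0.0], niter=100, stepsize=2.0)
print(result.x, result.fun)
```

For this function the lowest basin sits near x = y = -1.43 with a value of about -1.776. If a run ends somewhere else, a larger `stepsize`, more iterations, or a few different starting points are the usual things to try, since hops that are too short rarely leave the current basin.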

Feature Selection: How to select categorical features in a regression problem

user140259

February 27, 2022, 15:17

I am reviewing information for feature selection based on filter methods. I got info (link1, link2, link3, link4, link5) for: Numerical input, numerical output; Categorical input, categorical output; Numerical input, categorical output. However, I'm having a hard time finding information on: Categorical input, numerical output (categorical features in a regression problem). I would be grateful if you could pass me information about it, please, or the name of the function that could carry out this task.

Topic: scipy feature-selection python

Category: Data Science

Which is the best algorithm for entity extraction for unstructured document

Rajesh das

February 2, 2022, 06:01

I have unstructured documents from which I have to extract information such as buyer name, seller name, expiry date, buying date etc. I had planned to use spaCy (custom entity recognition, following this blog: https://medium.com/@manivannan_data/how-to-train-ner-with-custom-training-data-using-spacy-188e0e508c6). But it seems that sometimes the buyer name is predicted as the seller name and vice versa, and sometimes multiple wrongly predicted values end up in a single entity when I pass the whole document content. FYI, these documents have approx. 2-20 pages, so they have a lot of content. Can someone share if we …

Topic: scipy python machine-learning

Category: Data Science

p-value of chi squared test is exactly 0.0

Michele Papucci

January 27, 2022, 22:05

I need to do a chi square test of two of my dataset's categorical variables. These two variables have basically the same meaning but come from two different sources, so my idea is to use a chi square test to see how "similar" or correlated these two variables really are. To do so, I've written code in Python, but the p-value I get from it is exactly 0, which sounds a little strange to me. The code is: from scipy.stats …

Topic: chi-square-test pvalue scipy python

Category: Data Science

Optimizing an averaged perceptron algorithm using numpy and scipy instead of dictionaries

QuantumHoneybees

December 29, 2021, 01:46

So I'm trying to write an averaged perceptron algorithm (page 48 here for the equation) in Python. Instead of storing the historical weights, I simply accumulate the weights and then multiply by the consistency counter, $c$; that is the variable w_accum. My implementation initially had the weight vectors and x represented as dictionaries where a feature is in the dictionary only if it's active, which was supposed to be the most efficient way I could think of. Here is that code: def …

Topic: numpy perceptron implementation scipy optimization

Category: Data Science

Create Period column based on a date column where the first month is 1, second 2, etc

conradoov

November 10, 2021, 07:48

I have a dataset with many projects' monthly expenditures (cost curve), like this one:

Project    Date     Expenditure(USD)
Project A  12-2020  500
Project A  01-2021  1257
Project A  02-2021  125889
Project A  03-2021  102447
Project A  04-2021  1248
Project A  05-2021  1222
Project A  06-2021  856
Project B  01-2021  5589
Project B  02-2021  52874
Project B  03-2021  5698745
Project B  04-2021  2031487
Project B  05-2021  2359874
Project B  06-2021  25413
Project B  07-2021  2014
Project B  08-2021  2569

Using Python, I …

Topic: project-planning data-science-model python-3.x scipy dataset

Category: Data Science
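
A minimal pandas sketch of the Period column described in the title: each project's first month becomes 1, the next month 2, and so on. It assumes the dates are strings in MM-YYYY format as shown in the excerpt, and uses only a subset of the rows from the post to stay short.

```python
import pandas as pd

# A subset of the rows shown in the question.
df = pd.DataFrame(
    {
        "Project": ["Project A"] * 4 + ["Project B"] * 3,
        "Date": ["12-2020", "01-2021", "02-2021", "03-2021",
                 "01-2021", "02-2021", "03-2021"],
        "Expenditure(USD)": [500, 1257, 125889, 102447, 5589, 52874, 5698745],
    }
)

# Turn each date into a running month number (year * 12 + month).
dates = pd.to_datetime(df["Date"], format="%m-%Y")
month_number = dates.dt.year * 12 + dates.dt.month

# Period = months elapsed since the project's first month, starting at 1.
first_month = month_number.groupby(df["Project"]).transform("min")
df["Period"] = month_number - first_month + 1
print(df)
```

Counting elapsed months, rather than numbering rows with `cumcount`, keeps the periods correct even if a project has a month with no recorded expenditure.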
