Need a random process/distribution where I can pass a certain level of bias for producing an outcome

This is my first question here, so please let me know if I'm not being clear. My objective: a startup sportsbook wants to test its algorithm to see how it manages game lines for incoming bets placed on a particular game. For example, as bets come in for a particular team, the algorithm checks the book to see if it can cover, and when the book is lopsided it adjusts the line/odds, giving the other team more favorable odds to balance the book …
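A minimal sketch of one way to simulate this, assuming a single bias parameter (here the hypothetical name p_team_a) controls the probability that an incoming bet backs team A; numpy's weighted sampling lets the algorithm shift that probability as the line moves:

import numpy as np

rng = np.random.default_rng(42)

def simulate_bets(p_team_a=0.5, n_bets=1_000):
    # each entry is the team an incoming bet backs; p_team_a is the bias knob
    return rng.choice(["A", "B"], size=n_bets, p=[p_team_a, 1 - p_team_a])

bets = simulate_bets(p_team_a=0.65)   # a lopsided flow of bets
print((bets == "A").mean())           # ~0.65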
Category: Data Science

Splitting train/test sets by an identifier?

I know sklearn has train_test_split() to split a dataset into a train and a test set. But I read that if your dataset is updated regularly, even a fixed random seed won't help: the split is recomputed over the changed rows, so each update produces a different train/test partition. Over time, your ML algorithms will have seen the whole dataset, defeating the purpose of the train/test split because the models end up training on data that was previously used for testing. The book I'm reading (Hands-On …
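One common fix (and, if I recall correctly, the one that book proposes) is to make test-set membership a deterministic function of a stable row identifier rather than a random draw, so a row never migrates between sets as the dataset grows. A minimal sketch, assuming each row carries a stable identifier:

from zlib import crc32

def in_test_set(identifier, test_ratio=0.2):
    # hash the id and compare the hash fraction against the ratio;
    # the decision is stable across dataset updates
    return crc32(str(identifier).encode()) / 2**32 < test_ratio

# ids: your stable identifiers (column name is up to you)
train_ids = [i for i in ids if not in_test_set(i)]
test_ids = [i for i in ids if in_test_set(i)]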
Category: Data Science

Create a random table with a given p-value on a chi-square independence test

I want to randomly create a table of data that has a predefined p-value and chi-square statistic under a chi-square test of independence. For example, this table would have a p-value of 1 on a chi-square independence test: [[25, 25], [25, 25]]. Trying out some random values, I see that [[50, 0], [30, 20]] has a p-value of 2.02E-6 and a chi-square statistic of 22.56. But how would I do it the other way around? I have a given p-value of 0.05, for example, and from that I want to get …
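One way to invert the problem, sketched for the special case of a symmetric 2x2 table with all margins fixed at n: the chi-square statistic then reduces to 8*(a - n/2)**2/n with one degree of freedom, so the target statistic from scipy's inverse survival function can be solved for the one free cell a directly (rounding a to an integer makes the resulting p-value only approximate):

import numpy as np
from scipy.stats import chi2, chi2_contingency

def table_for_p(target_p, n=50):
    # invert the target p-value into a target chi-square statistic (df=1),
    # then solve 8*(a - n/2)**2 / n == target_stat for the free cell a
    target_stat = chi2.isf(target_p, df=1)
    a = int(round(n / 2 + np.sqrt(target_stat * n / 8)))
    return np.array([[a, n - a], [n - a, a]])

table = table_for_p(0.05)
stat, p, dof, expected = chi2_contingency(table, correction=False)
print(table, stat, p)   # p lands near 0.05, up to the integer rounding of a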
Category: Data Science

Why should the initialization of weights and bias be chosen around 0?

I read this: "To train our neural network, we will initialize each parameter $W^{(l)}_{ij}$ and each $b^{(l)}_{i}$ to a small random value near zero (say, according to a $\text{Normal}(0, \epsilon^2)$ distribution for some small $\epsilon$, say 0.01)" from the Stanford deep learning tutorials, in the 7th paragraph of the Backpropagation Algorithm page. What I don't understand is: why should the initialization of the weights or biases be around 0?
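For concreteness, a two-line numpy sketch of the recipe the tutorial describes (the layer sizes here are arbitrary); note the standard deviation is $\epsilon$, so the variance is $\epsilon^2$:

import numpy as np

rng = np.random.default_rng(0)
eps = 0.01
fan_in, fan_out = 64, 32                           # arbitrary layer sizes
W = rng.normal(0.0, eps, size=(fan_out, fan_in))   # W ~ Normal(0, eps^2)
b = rng.normal(0.0, eps, size=fan_out)             # b ~ Normal(0, eps^2)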
Category: Data Science

Cannot clone object <keras.wrappers.scikit_learn.KerasRegressor object at 0x7fdc9c3ba550>

Trying to hyper-tune an ANN, but I get an error when calling fit (grid1.fit(X_train, y_train)). Below is the code:

def create_model(dropout_rate, weight_constraint, optimizer, init, layers, activation):
    model = Sequential()
    model.add(Dense(nodes, input_dim=171, kernel_initializer=init,
                    activation='relu', kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, kernel_initializer=init, activation='relu'))
    # note: this should use the optimizer argument, not the global optimizers list
    model.compile(loss='mse', optimizer=optimizer, metrics=['mean_absolute_error'])
    return model

model = KerasRegressor(build_fn=create_model, verbose=0)

# hyperparameters
layers = [[50], [50, 20], [50, 30, 15], [70, 45, 15, 5]]
optimizers = ['rmsprop', 'adam']
dropout_rate = [0.1, 0.2, 0.3, 0.4]
init = ['glorot_uniform', 'normal', 'uniform']
epochs = [150, 500]
batches = [5, 10, 20]
weight_constraint = [1, 2, 3]
param_dist = dict(optimizer=optimizers, …
Category: Data Science

How to choose the random seed?

I understand this question can be strange, but how do I pick the final random_seed for my classifier? Below is an example. It uses the SGDClassifier from sklearn on the iris dataset, and GridSearchCV to find the best random_state:

from sklearn.linear_model import SGDClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV

iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

parameters = {'random_state': [1, 42, 999, 123456]}
sgd = SGDClassifier(max_iter=20, shuffle=True) …
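For what it's worth, a common sanity check is not to tune the seed but to measure how much the score moves across seeds; a sketch using the same iris/SGD setup:

import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

X, y = datasets.load_iris(return_X_y=True)
scores = [cross_val_score(SGDClassifier(max_iter=20, shuffle=True,
                                        random_state=seed), X, y, cv=5).mean()
          for seed in range(10)]
print(f"mean={np.mean(scores):.3f} +/- {np.std(scores):.3f} across seeds")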
Category: Data Science

What is the objective that is optimized with Random Search?

I have recently learned about Random Search (sklearn.model_selection.RandomizedSearchCV in Python) and have been thinking about the theory behind the optimization process. In particular, my question is: given that one performs Random Search on a certain algorithm (say, a random forest), what are the best hyperparameters based on? More specifically, in what sense are they the "best" hyperparameters for the model? Do they maximize the accuracy of the model? If not, what (performance) criterion is optimized? Or is it entropy/Gini?
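For reference, the criterion RandomizedSearchCV maximizes is whatever its scoring argument names (falling back to the estimator's default score when omitted), evaluated as the mean over cross-validation folds. A minimal sketch on synthetic data:

from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, random_state=0)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(50, 300),
                         "max_depth": randint(2, 12)},
    n_iter=20,
    scoring="accuracy",   # the objective: mean CV accuracy of each sampled config
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)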
Category: Data Science

Is shuffling data really necessary for training?

I don't mean the case where the data, if sampled sequentially, would have labels like [1111122223333]. In that case the network learns to predict everything as 1, then everything as 2, and so on, and it cannot learn anything useful. What I mean is: assume you have the ImageNet 2012 dataset and you shuffle it once, so the labels and images are in a shuffled order. Since the dataset is huge, can the network really remember the previous epoch's predictions and overfit? Or: I shuffle the data 5 …
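For clarity, the two regimes being compared, sketched with a stand-in array dataset: shuffling once fixes a single order forever, while reshuffling draws a fresh permutation each epoch:

import numpy as np

X, y = np.arange(100).reshape(50, 2), np.arange(50)   # stand-in dataset
batch_size = 8
for epoch in range(5):
    perm = np.random.default_rng(epoch).permutation(len(X))  # fresh order per epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        # ... one gradient step on (xb, yb)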
Category: Data Science

How to compute modulo of a hash?

Let's say I have a set of users in my database whose IDs are GUIDs. I use xxhash to generate a fixed-length hash for each value, so that I can then "bucketize" them and do random sampling with the help of the modulo function. That said, if I have a hash such as 367b50760441849e, I want to be able to use hash % 20 == 0 to randomly pick 5% of the population …
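A minimal sketch of the arithmetic, assuming Python and the xxhash package (the function and variable names here are made up for illustration): parse the hex digest into an integer, then reduce it modulo the bucket count:

import xxhash

print(int("367b50760441849e", 16) % 20)   # parse the hex digest, then take the modulo

def bucket(guid: str, n_buckets: int = 20) -> int:
    # xxh64 exposes the digest directly as a 64-bit unsigned integer
    return xxhash.xxh64(guid.encode()).intdigest() % n_buckets

# keep roughly 5% of users: those landing in 1 of the 20 buckets
sample = [u for u in user_guids if bucket(u) == 0]   # user_guids: your id list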
Category: Data Science

RL Sutton book, initial estimate of q*(a) for 10 arm testbed

The Sutton book does not mention what the initial estimate of q*(a) is before the first reward is received. In this code repo, which seems to accompany the book (Sutton code repo), it is initialized to 0, per the snippet below:

def __init__(self, kArm=10, epsilon=0., initial=0., stepSize=0.1,
             sampleAverages=False, UCBParam=None, gradient=False,
             gradientBaseline=False, trueReward=0.):

But the explanation for Figure 2.1, which shows the distribution of rewards for the 10 arms of the bandit, says: "Figure 2.1: An example bandit problem …
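For context, the testbed itself is easy to reproduce from the book's description; a sketch with q*(a) drawn from Normal(0, 1), rewards drawn from Normal(q*(a), 1), and estimates started at 0 to match the repo's initial=0. default:

import numpy as np

rng = np.random.default_rng(0)
k = 10
q_true = rng.normal(0.0, 1.0, size=k)   # q*(a) ~ Normal(0, 1), per Figure 2.1
q_est = np.zeros(k)                     # initial estimates, matching initial=0.
counts = np.zeros(k)

a = int(rng.integers(k))                # pull some arm
r = rng.normal(q_true[a], 1.0)          # reward ~ Normal(q*(a), 1)
counts[a] += 1
q_est[a] += (r - q_est[a]) / counts[a]  # sample-average update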
Category: Data Science

What is the most efficient method for hyperparameter optimization in scikit-learn?

An overview of the hyperparameter optimization process in scikit-learn is here. Exhaustive grid search will find the optimal set of hyperparameters within the grid; the downside is that it is slow. Random search is faster than grid search but has unnecessarily high variance. There are also additional strategies in other packages, including scikit-optimize, auto-sklearn, and scikit-hyperband. What is the most efficient (finding reasonably performant parameters quickly) method for hyperparameter optimization in scikit-learn? Ideally, I would like working code …
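One cheap middle ground worth knowing about, sketched below under the assumption of scikit-learn >= 0.24: successive halving, a Hyperband-style strategy that ships with scikit-learn itself, spends little budget on bad candidates and progressively more on promising ones:

from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (activates the class)
from sklearn.model_selection import HalvingRandomSearchCV

X, y = make_classification(n_samples=1_000, random_state=0)
search = HalvingRandomSearchCV(
    RandomForestClassifier(random_state=0),
    {"max_depth": randint(2, 12), "min_samples_split": randint(2, 20)},
    factor=3,        # each round keeps the best ~1/3 of candidates with more budget
    cv=5,
    random_state=0,
).fit(X, y)
print(search.best_params_)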
Category: Data Science

Why would one crossvalidate the random state number?

Still learning about machine learning, I've stumbled across a Kaggle kernel (link) that I cannot understand. Here are lines 72 and 73:

parameters = {
    'solver': ['lbfgs'],
    'max_iter': [1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000],
    'alpha': 10.0 ** -np.arange(1, 10),
    'hidden_layer_sizes': np.arange(10, 15),
    'random_state': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
}
clf = GridSearchCV(MLPClassifier(), parameters, n_jobs=-1)

As you can see, the random_state parameter is being tested across 10 values. What is the point of doing this? If one model performs better with some random_state, does it make any sense to use this particular parameter on …
Category: Data Science

How to label train_data?

I have an assignment with four files: 1) train_data.csv: the training file, containing two fields (text, id). 2) train_label.csv: the label file, containing two fields (id, label). 3) test_data.csv: the test file, containing two fields (text, id). 4) sample_submission.csv: the file that needs to be submitted. This should clearly be multilabel classification, but whenever I try to identify labels in the train data, no labels show up. How can I remove noise from train_data? Any type …
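Going by the file descriptions above, the labels live in a separate file keyed by id, so a minimal pandas sketch (assuming the shared column is literally named id) would attach them to the training text:

import pandas as pd

train_data = pd.read_csv("train_data.csv")    # columns: text, id
train_label = pd.read_csv("train_label.csv")  # columns: id, label
train = train_data.merge(train_label, on="id", how="inner")
print(train[["text", "label"]].head())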
Category: Data Science

Epoch greedy algorithm for contextual bandits

I'm reading the following paper on the epoch-greedy algorithm for the contextual bandit problem, and I have two questions: http://hunch.net/~jl/projects/interactive/sidebandits/bandit.pdf I'm unsure how they've used the Bernstein inequality on page 6 to conclude $\mu_{n}(\mathcal{H},1) \leq c^{-1} \sqrt{k \ln(m)/n}$. Could someone please elaborate on this? Bernstein's inequality seems to bound, with high probability, the deviation of a sum of random variables from its mean, whereas the regret bound $\mu_{n}(\mathcal{H},1)$ is defined as the expected regret from the empirically best …
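For reference, one standard form of Bernstein's inequality (the paper may apply a slightly different variant) for independent zero-mean random variables $X_1,\dots,X_n$ with $|X_i| \le M$ is

$$\Pr\!\left(\sum_{i=1}^{n} X_i > t\right) \;\le\; \exp\!\left(-\frac{t^2/2}{\sum_{i=1}^{n} \mathbb{E}[X_i^2] + Mt/3}\right),$$

and presumably the paper applies it to the empirical reward estimates, then converts the high-probability deviation bound into a bound on the expectation $\mu_{n}(\mathcal{H},1)$ by integrating over the failure probability.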
Category: Data Science

Testing Multi-Arm Bandits on Historical Data

Suppose I want to test a multi-arm bandit algorithm in the contextual setting on a set of historical data. For simplicity, let's assume there are only two arms A and B and suppose the rewards are binary. Furthermore, suppose I have a data set where users were shown one of the two arms and I have a record of the rewards. What would be the best approach to simulating the scenario of running the algorithm online? I was thinking of …
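One standard approach is replay-style offline evaluation: step through the log and count only the rounds where the algorithm happens to pick the same arm the log shows. A sketch, where policy.choose and policy.update are hypothetical method names for whatever bandit implementation is under test:

import numpy as np

def replay_evaluate(policy, logged_rounds):
    # logged_rounds: iterable of (context, shown_arm, reward) from the historical data
    matched = []
    for context, shown_arm, reward in logged_rounds:
        if policy.choose(context) == shown_arm:   # only matched rounds are usable
            matched.append(reward)
            policy.update(context, shown_arm, reward)
    return np.mean(matched) if matched else float("nan")

Note that this estimate is unbiased when the logged arms were chosen uniformly at random; if the logging policy was biased, some form of importance weighting is needed.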
Category: Data Science

Multi-arm bandit problem for bernoulli reward distribution

Suppose that in the multi-arm bandit problem I know my rewards are distributed as $0$ or $1$, i.e. according to a Bernoulli distribution, rather than merely satisfying the condition that they lie in the range $[0,1]$. Does anyone know if we can do better with our confidence bounds under this restricted condition? In particular, how does the upper confidence bound algorithm change, and what is the corresponding upper bound on the expected regret? Can someone provide links to a paper or a set …
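For the Bernoulli case specifically, the refinement I'm aware of is KL-UCB (Garivier and Cappé, 2011), which replaces the Hoeffding-style confidence box with a binomial-KL confidence set and asymptotically matches the Lai-Robbins lower bound, roughly $\sum_{a:\Delta_a>0} \Delta_a \ln(T)/\mathrm{kl}(\mu_a, \mu^*)$ regret. A sketch of the per-arm index computed by bisection:

import math

def bernoulli_kl(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, pulls, t):
    # largest q >= p_hat with pulls * kl(p_hat, q) <= log(t), found by bisection
    budget = math.log(max(t, 2))
    lo, hi = p_hat, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# each round t, pull the arm maximizing kl_ucb_index(mean_reward[a], pulls[a], t)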
Category: Data Science

HOW TO: Deep Neural Network weight initialization

Given a difficult learning task (e.g. high dimensionality, inherent data complexity), deep neural networks become hard to train. To ease many of the problems one might:
- normalize and handpick quality data
- choose a different training algorithm (e.g. RMSprop instead of gradient descent)
- pick a cost function with steeper gradients (e.g. cross-entropy instead of MSE)
- use a different network structure (e.g. convolutional layers instead of feedforward)
I have heard that there are clever ways to initialize better weights. For example, you can choose …
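For what the truncated sentence is likely getting at, the two best-known schemes are Glorot/Xavier initialization, which scales weights by the fan-in and fan-out to keep activation variance roughly constant across layers, and He initialization, which is tailored to ReLU units. A numpy sketch of both:

import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(fan_in, fan_out):
    # Glorot/Xavier: limit chosen so activation variance stays roughly constant
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

def he_normal(fan_in, fan_out):
    # He et al.: variance 2/fan_in compensates for ReLU zeroing half the inputs
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

W1 = glorot_uniform(784, 256)
W2 = he_normal(256, 128)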
Category: Data Science

Interpreting the results of randomized PCA in scikit-learn

I'm using scikit-learn to do a genome-wide association study with a feature vector of about 100K SNPs. My goal is to tell the biologists which SNPs are "interesting". RandomizedPCA really improved my models, but I'm having trouble interpreting the results. Can scikit-learn tell me which features are used in each component?
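Yes, at least in newer scikit-learn versions, where RandomizedPCA became PCA(svd_solver='randomized'): the fitted components_ array has shape (n_components, n_features), so each row gives every SNP's loading on that component. A sketch, with X standing in for the samples-by-SNPs matrix:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=10, svd_solver="randomized", random_state=0)
pca.fit(X)   # X: n_samples x ~100K SNP columns (your data)

# rank SNPs by the magnitude of their loading on the first component
top = np.argsort(np.abs(pca.components_[0]))[::-1][:20]
print("20 SNPs with the largest absolute loading on component 1:", top)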
Category: Data Science
