parallel

Model Parallelism not working in Inception v3 with Keras and TensorFlow

Reuben_v1

May 27, 2022, 11:01

I have been stuck on this problem for a while now. I have an AWS setup with 500 GB of RAM and about 7 GPUs. The issue is that each time I try to run my Keras code with TensorFlow as the back end, it runs out of memory. I have found the reason for this as well: each GPU has only 12 GB of memory, whereas my model needs more than that. So, how …

Topic: gpu keras tensorflow python parallel

Category: Data Science

How to vectorize this loop process

SrtoPeixet

April 27, 2022, 19:14

Hi guys, I want to ask if anyone knows how to vectorize this code to make it faster.

    loss = 0
    total_steps = 0
    for i in range(len(distances)):
        for j in range(len(distances)):
            for k in range(len(distances)):
                if not ((i == j) | (i == k) | (j == k)):
                    if similarities[i][j] >= similarities[i][k]:
                        loss += (distances[i][j] - distances[i][k]).clip(min=0)
                    else:
                        loss += (distances[i][k] - distances[i][j]).clip(min=0)
                    total_steps += 1
    return (loss / total_steps)

Topic: pytorch loss-function python parallel

Category: Data Science
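
A minimal sketch of how the triple loop in the question above could be vectorized with broadcasting. It uses NumPy rather than PyTorch for brevity, and it assumes that `total_steps` is incremented once per triplet of distinct indices, which the flattened excerpt does not make certain. The function name `ordering_loss_vectorized` is hypothetical.

```python
import numpy as np

def ordering_loss_vectorized(distances, similarities):
    """Vectorized form of the triple loop over distinct (i, j, k)."""
    d = np.asarray(distances, dtype=float)
    s = np.asarray(similarities, dtype=float)
    n = d.shape[0]

    # diff[i, j, k] = d[i, j] - d[i, k]
    diff = d[:, :, None] - d[:, None, :]
    # cond[i, j, k] is True where similarities[i][j] >= similarities[i][k]
    cond = s[:, :, None] >= s[:, None, :]
    # Pick the sign the loop would use, then clip negative values to zero
    per_triplet = np.maximum(np.where(cond, diff, -diff), 0.0)

    # Keep only triplets whose three indices are pairwise distinct
    idx = np.arange(n)
    i, j, k = idx[:, None, None], idx[None, :, None], idx[None, None, :]
    valid = (i != j) & (i != k) & (j != k)

    # Assumes n >= 3; otherwise there are no valid triplets to average over
    return per_triplet[valid].sum() / valid.sum()
```

This builds arrays of shape (n, n, n), so memory grows cubically with the number of points. The same indexing pattern should carry over to PyTorch tensors, but that is not checked here.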

Specifying number of threads using XGBoost.train

LauritsT

March 24, 2022, 19:27

When using the xgboost.train() function, all the threads are used. I would like to use a specific number. Unfortunately, this function accepts neither the nthread nor the n_jobs parameter. How can I control the number of threads being used? Thanks. // Edit: It seems that I found a solution. In contrast to how one provides the nthread (or n_jobs) parameter to XGBClassifier or XGBRegressor, by adding this parameter directly in the brackets as xgb.XGBRegressor(nthread=n), then as indicated on …

Topic: xgboost parallel processing

Category: Data Science
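
The rest of the edit in the question above is cut off, so the following is not necessarily the solution the author found. It is a small sketch of one documented route: `nthread` is a general XGBoost parameter and can be passed inside the `params` dict given to `xgboost.train`. The helper names `make_train_params` and `train_with_threads` are hypothetical, and the `reg:squarederror` objective is only a placeholder.

```python
def make_train_params(n_threads, **other_params):
    """Build the params dict for xgboost.train with an explicit thread count."""
    params = dict(other_params)
    params["nthread"] = n_threads  # read by XGBoost from the params dict
    return params

def train_with_threads(X, y, n_threads=2, num_boost_round=10):
    """Train a booster while limiting XGBoost to n_threads threads."""
    import xgboost as xgb  # imported lazily so the helper above has no dependency

    dtrain = xgb.DMatrix(X, label=y)
    params = make_train_params(n_threads, objective="reg:squarederror")
    return xgb.train(params, dtrain, num_boost_round=num_boost_round)
```

For example, `train_with_threads(X, y, n_threads=4)` would request four threads for training, with `X` and `y` being whatever feature matrix and labels are at hand.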

How to loop through multiple lists/dict?

spectre

November 23, 2021, 10:55

I have the following code which finds the best value of k parameter in the KNNImputer. Basically it is looping through the list of k_value and for each element, it is fitting the KNNImputer to the model and in the end appending the result to an empty dataframe. lire_model = LinearRegression() k_value = [1,3,5,7,9,11, 13, 15, 17, 19, 21] k_value_results = pd.DataFrame(columns = ['k', 'mse', 'rmse', 'mae', 'r2']) scoring_list = ['neg_mean_squared_error', 'neg_root_mean_squared_error', 'neg_mean_absolute_error', 'r2'] for s in k_value: imputer = …

Topic: hyperparameter-tuning grid-search python parallel

Category: Data Science

Methodology for parallelising linked data?

MeridarchGekkota

November 3, 2021, 19:06

If I have some form of data that can have inherent links to all other data in the set, but I wish to parallelise this data in order to reduce computation time or to reduce the size of any particular piece of data currently being worked on, is there a methodology to split it into chunks without reducing the validity of the data? For example, assume I have a grid of crime across the whole of a country. …

Topic: methodology optimization parallel

Category: Data Science

How to load and run feature selection on a dataset with 5,000 samples and 500,000 features?

applebanana

May 21, 2021, 02:21

I have a dataset with 5000 samples and 500,000 features (all categorical with a cardinality of 3). Two problems I'm trying to solve: Loading the dataset - I can't load it in memory despite using a computing cluster, so I'm assuming I should use a parallelization library like Dask, Spark, or Vaex. Is this the best idea? Feature selection - how to run feature selection within a parallelization library? Can this be done with Dask, Spark, Vaex?

Topic: parallel machine-learning

Category: Data Science

How to run two different models in single frame?

amin sama

January 11, 2021, 07:07

I have mask_detector.model and yolov3 social distancing weights. I want to run them simultaneously on a single webcam stream. How can I run both together, i.e. the mask detection model and the social distancing model?

Topic: data-science-model deep-learning python parallel machine-learning

Category: Data Science

Parallel hyperparameter optimization techniques?

Oooaaa

January 4, 2021, 22:08

Most hyperparameter optimization techniques want to evaluate points one by one. I have an expensive optimization problem, but I can run hundreds of evaluations in parallel. The dimension of the problem is around 20-30. My variables are mostly continuous. Is there any technique with an open-source, documented implementation available for this kind of problem?

Topic: bayesian hyperparameter optimization parallel

Category: Data Science
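
As a baseline for the question above, here is a minimal sketch of batch-parallel random search using only the standard library. It is plain random search, not a Bayesian method, and is shown only because every point in a batch can be evaluated at the same time. The `objective` function is a hypothetical cheap stand-in for the expensive function, and the search space is assumed to be the unit hypercube.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def objective(x):
    # Hypothetical stand-in for the expensive function (minimum at 0.5 in every dimension)
    return sum((xi - 0.5) ** 2 for xi in x)

def parallel_random_search(f, dim, n_rounds, batch_size, seed=0, max_workers=8):
    """Evaluate batches of random points in [0, 1]^dim concurrently, keep the best."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(n_rounds):
            # Draw a whole batch up front so its evaluations can overlap
            batch = [[rng.random() for _ in range(dim)] for _ in range(batch_size)]
            for x, y in zip(batch, pool.map(f, batch)):
                if y < best_y:
                    best_x, best_y = x, y
    return best_x, best_y
```

Threads are used here to keep the sketch simple. For a CPU-bound pure-Python objective, a process pool or a cluster scheduler would likely be the better fit.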

Efficiently Sending Two Series to a Function For Strings with an application to String Matching (Dice Coefficient)

PythonNoob

August 4, 2020, 10:54

I am using a Dice Coefficient based function to calculate the similarity of two strings: def dice_coefficient(a,b): try: if not len(a) or not len(b): return 0.0 except: return 0.0 if a == b: return 1.0 if len(a) == 1 or len(b) == 1: return 0.0 a_bigram_list = [a[i:i+2] for i in range(len(a)-1)] b_bigram_list = [b[i:i+2] for i in range(len(b)-1)] a_bigram_list.sort() b_bigram_list.sort() lena = len(a_bigram_list) lenb = len(b_bigram_list) matches = i = j = 0 while (i < lena and j …

Topic: jaccard-coefficient pandas python parallel efficiency

Category: Data Science

Is there a straightforward way to run pandas.DataFrame.isin in parallel?

Therriault

August 2, 2020, 12:40

I have a modeling and scoring program that makes heavy use of the DataFrame.isin function of pandas, searching through lists of Facebook "like" records of individual users for each of a few thousand specific pages. This is the most time-consuming part of the program, more so than the modeling or scoring pieces, simply because it only runs on one core while the rest runs on a few dozen simultaneously. Though I know I could manually break up the dataframe into …

Topic: pandas python parallel performance

Category: Data Science
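
The manual approach mentioned in the question above, splitting the dataframe into chunks, might look like the following sketch. The helper name `parallel_isin` is hypothetical. Whether a thread pool actually speeds up `isin` depends on how much of the work releases the GIL, so this shows the chunk-and-concatenate pattern and makes no claim about speedup.

```python
import numpy as np
import pandas as pd
from concurrent.futures import ThreadPoolExecutor

def parallel_isin(df, values, n_chunks=4):
    """Apply DataFrame.isin to row chunks concurrently and reassemble the result."""
    # Row boundaries that split the frame into n_chunks contiguous pieces
    bounds = np.linspace(0, len(df), n_chunks + 1, dtype=int)
    chunks = [df.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # pool.map preserves chunk order, so concat restores the original row order
        parts = list(pool.map(lambda chunk: chunk.isin(values), chunks))
    return pd.concat(parts)
```

The result should match `df.isin(values)` row for row. Replacing the thread pool with a process pool is possible, at the cost of pickling each chunk.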

What needs to be done to make n_jobs work properly in sklearn, in particular in ElasticNetCV?

OldSchool

May 22, 2020, 19:01

The constructor of sklearn.linear_model.ElasticNetCV takes n_jobs as an argument. Quoting the documentation here: n_jobs: int, default=None. Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. However, running the simple program below on my 4-core machine (spec details below) shows performance is best when n_jobs = None, progressively deteriorating as you increase n_jobs all the way to n_jobs = -1 (supposedly requesting all …

Topic: elastic-net cross-validation scikit-learn parallel machine-learning

Category: Data Science

Pytorch Distributed Computing - Recommendations/Resources/Courses?

Mason Acree

April 29, 2020, 02:25

I would like to get into some distributed computing for processing Pytorch CNN models. I am completely fresh in this field and want to get some recommendations as to where I should start researching and learning techniques in distributed computing specifically for Deep Learning. My motivation is that I have access to a lot of personal Windows 10 Desktops with great hardware, a few Ubuntu Linux machines of my own and then my personal desktop that is rigged with great …

Topic: pytorch deep-learning distributed parallel

Category: Data Science

Parallelization of a MIMO linear filter

marco

April 5, 2020, 13:51

I would like to implement a Multi Input Multi Output filtering operation, acting as fast as possible on batches of data. Here is my current implementation: def lfilter_mimo(b, a, u_in): batch_size, seq_len, in_ch = u_in.shape # [B, T, I] out_ch, _, _ = a.shape y_out = np.zeros_like(u_in, shape=(batch_size, seq_len, out_ch)) for out_idx in range(out_ch): for in_idx in range(in_ch): y_out[:, :, out_idx] += scipy.signal.lfilter(b[out_idx, in_idx, :], a[out_idx, in_idx, :], u_in[:, :, in_idx], axis=-1) return y_out # [B, T, O] For another …

Topic: numpy scipy parallel

Category: Data Science

Multiple keras models parallel - time efficient

Lara Larsen

December 21, 2019, 10:02

I am trying to load two different keras models in parallel. I tried to use the functional API model: input1 = Input(inputShapeOfModel1) input2 = Input(inputShapeOfModel2) output1 = model1(input1) output2 = model2(input2) parallelModel = Model([input1,input2], [output1,output2]) This works but it does not run in parallel actually. Inference time is just the sum of each model's individual inference time. My question is should this run concurrently? I also tried to load them in different py files with gpu memory options. Still I …

Topic: gpu keras tensorflow computer-vision parallel

Category: Data Science

Parallel active optimization

Mark

December 12, 2019, 01:01

I'm trying to optimize an expensive function for which I can choose sample points. The difficulty is that many function evaluations may be computed in parallel, taking varying amounts of time. I don't know which keywords to search for to find existing literature (or implementations). So at a given time, I might have already computed function values at 18 points, with 15 still being computed, and I want to start evaluating the function at another point. Without the running jobs, I could make a …

Topic: sampling optimization parallel

Category: Data Science

Would writing a decision tree algorithm in Pytorch or Tensorflow be faster than with Numpy?

Nicolas Gervais

September 17, 2019, 14:15

Since these libraries can turn CPU arrays into GPU tensors, could you parallelize (and therefore accelerate) the calculations for a decision tree? I am considering making a decision tree class written in Tensorflow/Pytorch for a school project, but I want to be certain that it makes sense.

Topic: numpy pytorch tensorflow decision-trees parallel

Category: Data Science

GPU Accelerated Data Processing for R in Windows

Jesse Maher

September 11, 2019, 09:31

I'm currently taking a paper on Big Data which has us utilising R heavily for data analysis. I happen to have a GTX1070 in my pc for gaming reasons. Thus, I thought it would be really cool if I could use that to speed up some of the processing for some of the stuff my lecturers have me doing, but it really doesn't seem easy to do this at all. I've installed gpuR, CUDA, Rtools, and a few other bits …

Topic: gpu parallel r

Category: Data Science

Wikipedia says CUDA 8.0 is compatible with my GeForce GTX 670M, but TensorFlow raises an error: GTX 670M's Compute Capability is < 3.0

JarsOfJam-Scheduler

August 5, 2019, 07:52

According to Wikipedia, the GeForce GTX 670M has a Compute Capability of 2.1 (and a Fermi micro-architecture), which is confirmed by TensorFlow (I can read "2.1" in the error it raises). Wikipedia says that CUDA 8.0 supports compute capabilities from 2.0 to 5.x (Fermi micro-architecture included). It even says that it's the "last version with support for compute capability 2.x (Fermi)". However, the error raised by TensorFlow says that the CUDA version I am using supports compute capabilities of at least... 3.0... …

Topic: distribution gpu tensorflow parallel

Category: Data Science

Updating Weight Using Updates on Related Data

Varun Chhangani

June 28, 2019, 06:14

Suppose $$ x = Ay $$ where $x$ is $M \times 1$, $y$ is $N \times 1$, and $A$ is $M \times N$. We have the data $x$ and would like to know what $y$ is. However, the matrix $A$ is too large for computing a pseudo-inverse, so we would like to approximate $A^{-1}$ using machine learning, since that can be parallelized. For parallelization, we divide the given problem into: $$ x^l = A^l y $$ where $x = [x^1 , x^2,\dots,x^L]^T$ …

Topic: mathematics gradient-descent distributed parallel

Category: Data Science

Can parallel computing be utilized for boosting?

Indominus

January 17, 2019, 03:05

Since boosting is sequential, does that mean we cannot use multi-processing or multi-threading to speed it up? If my computer has multiple CPU cores, is there any way to utilize these extra resources in boosting?

Topic: boosting parallel

Category: Data Science
