Timing sequence in MapReduce

I'm running tests on the MapReduce algorithm in different environments, such as Hadoop and MongoDB, using different types of data. What are the different methods or techniques for measuring the execution time of a query? If I'm inserting a huge amount of data, say 2-3 GB, how can I measure the time for the process to complete?
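One environment-agnostic approach is to wrap the operation in a wall-clock timer on the client side. Below is a minimal sketch using pymongo (the database, collection, and pipeline are placeholder assumptions); on Hadoop, the job start/finish times reported in the ResourceManager web UI, or simply prefixing the command with the Unix `time` utility, give comparable numbers:

```
import time
from pymongo import MongoClient

# Placeholder connection, database, and collection names.
client = MongoClient("mongodb://localhost:27017")
coll = client["testdb"]["events"]

start = time.perf_counter()
# Any insert, query, or aggregation can be timed the same way.
result = list(coll.aggregate([{"$group": {"_id": "$key", "n": {"$sum": 1}}}]))
elapsed = time.perf_counter() - start
print(f"operation took {elapsed:.3f} s")
```

For bulk loads, timing around insert_many the same way gives the load time for the 2-3 GB case; MongoDB's explain() output and the Hadoop job counters can break the total down further.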
Category: Data Science

Levenshtein distance vs simple for loop

I have recently begun studying different data science principles and have taken a particular interest of late in fuzzy matching. By way of preface, I'd like to include smarter fuzzy searching in a proprietary language named "4D" at my workplace, so access to libraries is pretty much nonexistent. It's also worth noting that the client side is currently single-threaded, so taking advantage of multi-threaded matrix manipulations is out of the question. I began studying the Levenshtein algorithm and got that …
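For reference, the textbook dynamic-programming formulation of Levenshtein needs no libraries and no threads, so it ports directly to a single-threaded environment such as 4D; a minimal Python sketch using two rolling rows:

```
def levenshtein(a, b):
    # Classic dynamic-programming edit distance: O(len(a)*len(b)) time,
    # two rolling rows instead of the full matrix to save memory.
    if len(a) < len(b):
        a, b = b, a          # ensure b is the shorter string
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # -> 3
```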
Category: Data Science

More efficient way to create frequency column based on different groupings

I have the code below, which calculates a frequency for each column element (relative to its own column) and adds all five frequencies together in a new column. The code works but is very slow, and the majority of the processing time is spent in this function. Any ideas on how to accomplish the same goal more efficiently?

```
Create_Freq <- function(Word_List) {
  library(dplyr)
  Word_List$AvgFreq <- (Word_List %>% add_count(FirstLet))[, "n"] +
    (Word_List %>% add_count(SecLet))[, "n"] +
    (Word_List %>% add_count(ThirdtLet))[, "n"] +
    (Word_List %>% add_count(FourLet))[, "n"] +
    (Word_List %>% add_count(FifthLet))[, "n"]
  return(Word_List)
}
```
Category: Data Science

In between CNN and MLP: neural network architecture for a "close to convolutional" problem?

I am looking to approximate a forward problem (expensive to calculate precisely) using a NN. Input and output are vectors of identical length. Although not linear, the output somewhat resembles a convolution with a kernel, but the kernel is not constant: it varies smoothly with the offset along the vector. I can only provide a limited training set, so I'm looking for a way to exploit this smoothness. Correct me if I'm wrong (I'm completely new to ML/NN), but in …
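To make the structure concrete, here is a small NumPy sketch of the kind of forward map described: a convolution-like operator whose kernel varies smoothly with position along the vector (the Gaussian kernel parametrization is invented purely for illustration):

```
import numpy as np

def smooth_kernel_forward(x, half_width=3):
    # Convolution-like map: each output sample is a weighted sum of a
    # local input window, but the weights drift smoothly with position.
    n = len(x)
    y = np.zeros(n)
    offsets = np.arange(-half_width, half_width + 1)
    for i in range(n):
        # Width of the Gaussian-shaped kernel varies smoothly along the vector.
        sigma = 1.0 + 0.5 * np.sin(2 * np.pi * i / n)
        w = np.exp(-(offsets / sigma) ** 2)
        w /= w.sum()
        idx = np.clip(i + offsets, 0, n - 1)  # clamp at the boundaries
        y[i] = np.dot(w, x[idx])
    return y

y = smooth_kernel_forward(np.random.rand(128))
```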
Category: Data Science

Techniques to increase the evaluation speed of a neural network

This is somewhat of an open-ended question and in some respects a literature request (I would love to be pointed to a survey paper if one exists). Suppose I am constructing a neural network to make some arbitrary prediction (either categorical or numeric; it doesn't matter). With this network I am concerned primarily with the speed of evaluation. Obviously I want the network to give predictions that are as accurate as possible, but I'm more than willing to sacrifice some accuracy if it …
Category: Data Science

Efficient method of performing within matrix similarity

I want to compute a similarity comparison for each entry in a dataset against every other entry that is labeled as class 1 (excluding the current entry if its own label is 1). So, consider a matrix of training data that has columns for ID and class/label, and then a bunch of data columns:

```
ID  Label  var1  var2  var3  ...  varN
1   1      0.26  0.44  0.2        0.11
2   0      0.13  0.34  0.14       0.21
3   1      0.22  0.34  0.45       0.57
…
```
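One vectorized way to do this, as a sketch (assuming the feature columns sit in a NumPy array and cosine similarity is an acceptable metric):

```
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the real data: X holds the feature columns, y the labels.
X = np.array([[0.26, 0.44, 0.20, 0.11],
              [0.13, 0.34, 0.14, 0.21],
              [0.22, 0.34, 0.45, 0.57]])
y = np.array([1, 0, 1])

pos = np.where(y == 1)[0]          # indices of class-1 rows
S = cosine_similarity(X, X[pos])   # each row vs. every class-1 row

# Exclude self-comparisons for rows that are themselves class 1.
for col, row in enumerate(pos):
    S[row, col] = np.nan

mean_sim = np.nanmean(S, axis=1)   # e.g. average similarity to class 1
```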
Category: Data Science

Set value for column based on two other columns in pandas dataframe

I have a dataframe of contracts with different order dates, and I need to create a new column that assigns a number to each contract if it has more than one order date. For example, my sample dataframe looks something like this:

```
df = pd.DataFrame({
    'contract': ['123A', '123A', '123A', '123A', '123B', '123B', '123C'],
    'prod': ['X1', 'M1', 'V1', 'D1', 'A1', 'B1', 'C1'],
    'date': ['2019-04-17', '2019-07-02', '2019-04-17', '2019-07-02',
             '2019-04-17', '2019-09-01', '2019-08-02'],
    'revenue': [5688, 113932, 5688, 49157, 5002, 892, 9000],
})
```

I need my final table to have another column with a unique contract id for each date. The final table from the above should look something like this: contract date header_contract 123A …
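One possible approach, sketched below (the exact suffix scheme is a guess at the desired output, since the example is truncated): rank each contract's distinct dates with groupby plus factorize, and only suffix contracts that have more than one date:

```
# df as constructed above.
# Number each distinct date within a contract (1, 2, ...).
df['date_rank'] = df.groupby('contract')['date'].transform(
    lambda s: s.factorize()[0] + 1)

# Only contracts with more than one distinct order date get a suffix.
multi = df.groupby('contract')['date'].transform('nunique') > 1
df['header_contract'] = df['contract'].where(
    ~multi, df['contract'] + '-' + df['date_rank'].astype(str))
```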
Category: Data Science

What is the difference in computational cost at inference time between object detection and semantic segmentation?

I am aware that YOLO (v1-5) is a real-time object detection model with moderately good overall prediction performance. I know that UNet and its variants are efficient semantic segmentation models that are also fast and have good prediction performance. I cannot find any resources comparing the inference speed of these two approaches. It seems to me that semantic segmentation, which classifies every pixel in an image, is clearly a harder problem than object detection, which draws bounding boxes around objects …
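Published head-to-head numbers are scarce, partly because speed depends on the backbone, input resolution, and hardware more than on the task itself. Absent a survey, one can measure directly; a minimal PyTorch timing harness (the single-layer models below are stand-ins — substitute real YOLO and UNet implementations):

```
import time
import torch

@torch.no_grad()
def mean_latency(model, input_shape=(1, 3, 512, 512), warmup=10, runs=50):
    model.eval()
    x = torch.randn(*input_shape)
    for _ in range(warmup):          # warm-up iterations stabilize timings
        model(x)
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - start) / runs

# Stand-in networks; replace with real detector / segmenter checkpoints.
detector = torch.nn.Conv2d(3, 16, 3, padding=1)
segmenter = torch.nn.Conv2d(3, 16, 3, padding=1)
print(mean_latency(detector), mean_latency(segmenter))
```

On a GPU, add torch.cuda.synchronize() around the timed region, since kernel launches return before the work actually finishes.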
Category: Data Science

Can I say that a trained neural network model with fewer parameters requires fewer resources during real-world inference?

Let us imagine that we have two trained neural network models with different architectures (e.g., different types of layers). The first model (a) uses 1D convolutional layers together with fully connected layers and has 10 million learnable parameters. The second model (b) uses 2D convolutional layers and has only 1 million parameters in total. Both models achieve equal scores on the same input data set. Can I say that model b, with fewer parameters, is more favourable because it has less …
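Parameter count is easy to read off, but it is only a proxy for inference cost: convolutional layers reuse a small weight set across many positions, so a 1M-parameter 2D CNN can still perform more multiply-accumulates per input than a 10M-parameter model dominated by fully connected weights. A small PyTorch sketch for counting parameters (the toy architectures are illustrative only):

```
import torch.nn as nn

# Toy stand-ins; the real models (a) and (b) would go here.
model_a = nn.Sequential(nn.Conv1d(1, 64, 5, padding=2), nn.ReLU(),
                        nn.Flatten(), nn.Linear(64 * 128, 512))
model_b = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 8, 3, padding=1))

def n_params(model):
    return sum(p.numel() for p in model.parameters())

print(n_params(model_a), n_params(model_b))  # parameter counts only
```

The more reliable comparison is measured latency and peak memory on the target hardware with representative input sizes.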
Category: Data Science

Deep learning on cloud

I am trying to implement some deep learning models with a large amount of data, around 10 gigabytes. However, my laptop and the free tier of Colab crash when trying to load it. Do you think it is worth buying Colab Pro? Do you suggest any other solutions? My worry is that Colab Pro is offered only in the US and Canada, while I am in Europe. Thanks in advance.
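Whichever service you choose, the crash on load is often avoidable by streaming the data instead of reading all 10 GB at once; a sketch with pandas (the file name, chunk size, and column name are placeholders):

```
import pandas as pd

# Process a large CSV without holding it all in memory.
chunk_means = []
for chunk in pd.read_csv("big_dataset.csv", chunksize=100_000):
    # Fit incrementally, accumulate statistics, or write features here;
    # "target" is a hypothetical column name.
    chunk_means.append(chunk["target"].mean())

print(sum(chunk_means) / len(chunk_means))
```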
Category: Data Science

Efficiently Sending Two Series to a Function For Strings with an application to String Matching (Dice Coefficient)

I am using a Dice coefficient based function to calculate the similarity of two strings:

```
def dice_coefficient(a, b):
    try:
        if not len(a) or not len(b):
            return 0.0
    except:
        return 0.0
    if a == b:
        return 1.0
    if len(a) == 1 or len(b) == 1:
        return 0.0
    a_bigram_list = [a[i:i+2] for i in range(len(a) - 1)]
    b_bigram_list = [b[i:i+2] for i in range(len(b) - 1)]
    a_bigram_list.sort()
    b_bigram_list.sort()
    lena = len(a_bigram_list)
    lenb = len(b_bigram_list)
    matches = i = j = 0
    while (i < lena and j …
```
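On the "sending two Series" part of the title: for a pure-Python function like this, a list comprehension over zipped columns is usually faster than DataFrame.apply(axis=1), which pays per-row pandas overhead. A sketch (column names are placeholders; dice_coefficient is the function above, once completed past the truncation):

```
import pandas as pd

df = pd.DataFrame({"name_a": ["martha", "marhta", "jones"],
                   "name_b": ["martha", "martha", "johns"]})

# Element-wise similarity of two string columns; zip avoids the per-row
# Series construction that df.apply(axis=1) would incur.
df["dice"] = [dice_coefficient(a, b)
              for a, b in zip(df["name_a"], df["name_b"])]
```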
Category: Data Science

When is a Model Underfitted?

Logic often states that by underfitting a model, its capacity to generalize is increased. That said, clearly at some point underfitting causes a model to become worse regardless of the complexity of the data. How do you know when your model has struck the right balance and is not underfitting the data it seeks to model? Note: this is a follow-up to my question, "Why Is Overfitting Bad?"
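One standard diagnostic, as a hedged sketch: sweep model capacity and compare training and validation error; where both are poor and close together, the model is underfitting. An example with scikit-learn on a toy dataset:

```
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

X, y = make_regression(n_samples=500, n_features=20, noise=10.0,
                       random_state=0)

# Sweep regularization strength: large alpha = low capacity (underfit zone).
alphas = np.logspace(-2, 4, 7)
train_scores, val_scores = validation_curve(
    Ridge(), X, y, param_name="alpha", param_range=alphas, cv=5)

for a, tr, va in zip(alphas, train_scores.mean(1), val_scores.mean(1)):
    print(f"alpha={a:10.2f}  train R^2={tr:.3f}  val R^2={va:.3f}")
# Underfitting shows up where both train and validation scores are low.
```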
Category: Data Science

Memory efficient encoding logic for group categories

I have a huge dataset of categorical data. It comprises alerts having multiple properties. Each alert belongs to a group, and some even belong to multiple groups. It looks somewhat like this:

```
   GroupID       System  State  TimeStamp  etc...
0  [1, 2, 3, 4]  A       REC    ...
1  [1, 2, 3, 4]  A       SNT    ...
2  [2, 4]        B       REC
3  [2, 4]        B       PND
4  [2, 4]        B       COM
5  [2, 4]        B       SNT
6  [2]           C       RCV
7  …
```
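For the multi-group membership specifically, one memory-friendly option (a sketch, assuming the GroupID column holds Python lists) is a sparse multi-hot encoding:

```
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({"GroupID": [[1, 2, 3, 4], [1, 2, 3, 4], [2, 4], [2]]})

# sparse_output=True stores the 0/1 membership matrix in compressed sparse
# form, which stays small even with many groups and many alerts.
mlb = MultiLabelBinarizer(sparse_output=True)
membership = mlb.fit_transform(df["GroupID"])
print(mlb.classes_)          # [1 2 3 4]
print(membership.toarray())  # dense view, for illustration only
```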
Category: Data Science

Ways to speed up Python code for data science purposes

Although it might sound like a purely techie question, I would like to know which approaches you usually try out for very data-science-like processes when you need to speed them up (given that data retrieval is not a problem, everything fits in memory, etc.). Some of these could be the following, but I would like to receive feedback about any others: good practices such as always using NumPy when possible for numeric operations and …
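As a concrete baseline for the NumPy point: replacing a Python-level loop with a single vectorized call is often a one-to-two-orders-of-magnitude win on numeric work. A small sketch:

```
import timeit
import numpy as np

x = np.random.rand(1_000_000)

def loop_sum_sq():
    total = 0.0
    for v in x:          # pure-Python loop: one interpreter step per element
        total += v * v
    return total

def numpy_sum_sq():
    return float(np.dot(x, x))  # vectorized: one BLAS call, no Python loop

print(timeit.timeit(loop_sum_sq, number=3))
print(timeit.timeit(numpy_sum_sq, number=3))
```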
Category: Data Science

How can I calculate efficiency for predictive models based on accuracy or error over time?

I was wondering if I could express the efficiency of prognostic models in terms of their accuracy (error, e.g. MAPE or MSE) over time [sec]. So let's imagine I have the following results for different predictive models:

```
model  MSE   MAE   MAPE    predicting time [sec]
LSTM   0.12  0.13  15.67%  456789
GRU    0.06  0.05   5.89%  688741
RNN    0.45  0.51  25.33%   55555
```

What is the best way to illustrate the efficiency of the predictive models over predicting time? Is the following equation right? How about its unit …
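One common way to present it, sketched below with the table's numbers, is a scatter of error against prediction time; the trade-off stays visible without collapsing it into a single ratio whose unit (e.g., %/sec) is hard to interpret:

```
import matplotlib.pyplot as plt

models = ["LSTM", "GRU", "RNN"]
mape = [15.67, 5.89, 25.33]   # %
t = [456789, 688741, 55555]   # seconds

plt.scatter(t, mape)
for m, xi, yi in zip(models, t, mape):
    plt.annotate(m, (xi, yi))
plt.xscale("log")             # times span an order of magnitude
plt.xlabel("predicting time [s]")
plt.ylabel("MAPE [%]")
plt.title("Error vs. prediction time")
plt.show()
```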
Category: Data Science

Finding synergies among observations of equal length

Assume we have a set $I$ with 20 different items (we call them $I_0$, $I_1$ up to $I_{19}$). We also have $n$ observations $O \in I^{n\times 8}$; so each observation is a subset of $I$ with exactly 8 items and is labeled with a score. Just as an illustration, here are some made-up observations with their scores: $O_1=\{I_0, I_8, I_9, I_{10}, I_{14}, I_{15}, I_{16}, I_{17}\};s_1=0.995$ $O_2=\{I_0, I_1, I_2, I_3, I_4, I_5, I_6, I_7\};s_2=0.667$ $O_3=\{I_2, I_3, I_9, I_{15}, I_{16}, I_{17}, …
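A common first step for this kind of problem (a sketch of one possible framing, not necessarily the intended analysis) is to encode each observation as a 20-dimensional 0/1 vector and regress the scores on it; extending the design matrix with pairwise product columns would then expose synergies as interaction coefficients:

```
import numpy as np
from sklearn.linear_model import LinearRegression

n_items = 20
# The two example observations from above; real use needs many more rows,
# otherwise the fit is under-determined.
observations = [
    ({0, 8, 9, 10, 14, 15, 16, 17}, 0.995),
    ({0, 1, 2, 3, 4, 5, 6, 7}, 0.667),
]

# One row per observation, one 0/1 column per item.
X = np.zeros((len(observations), n_items))
y = np.array([score for _, score in observations])
for row, (items, _) in enumerate(observations):
    X[row, list(items)] = 1.0

coef = LinearRegression().fit(X, y).coef_  # per-item additive effect estimates
```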
Topic: efficiency
Category: Data Science

RIMS declining efficiency

For many years I was getting efficiency of about 81% (after backing into the numbers with Brewers Friend) doing simple non-recirculated infusions at 2 qt per lb with a continuous fly sparge. About a year ago I built an electric RIMS (240 volt, 5500 watt, PID-controlled) system so that I could do some step mashes and manage temperatures better. I like the setup, but my efficiency has gone down to 61% with continuous recirculation and the RIMS firing as needed …
Category: Mac

What Calcium ppm is required in the mash for alpha-amylase stability and mash efficiency?

I recently moved and my water here is quite soft with a very low pH. As such, I've taken to adding many of my brewing salts in the brew kettle and only using salts in the mash to balance the pH. However, with a very low pH in the water to begin with, brewing a very dark beer means I'm going to leave as much calcium out of the mash as possible (because calcium lowers the pH). I do add …
Category: Mac
