I see a lot of job opportunities in the field of data science, but I'm not sure of the difference between a data scientist and a deep learning algorithm developer. Can someone explain that to me?
I have manually created a random data set around some mean value, and I have tried to use gradient-descent linear regression to predict this simple mean value. I have done exactly as in the manual, and for some reason my predictor coefficients are going to infinity, even though it worked for another case. Why, in this case, can it not predict a simple 1.4 value? clear all; n=10000; t=1.4; sigma_R = t*0.001; min_value_t = t-sigma_R; max_value_t = t+sigma_R; y_data …
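Since the original MATLAB snippet is truncated, here is a hedged Python reconstruction of the setup that reproduces the symptom: with a squared-error loss, a learning rate that is too large for the data makes the iterates overshoot and blow up, while a smaller rate converges to 1.4. The names and rates below are illustrative, not taken from the question.

```python
import random

random.seed(0)
t = 1.4
sigma = 0.001 * t
y = [t + random.uniform(-sigma, sigma) for _ in range(1000)]

def gradient_descent(y, lr, steps=200):
    """Fit a single constant c to y by minimizing (1/n) * sum (c - y_i)^2."""
    c = 0.0
    n = len(y)
    for _ in range(steps):
        grad = (2.0 / n) * sum(c - yi for yi in y)  # dL/dc
        c -= lr * grad
    return c

# the update is c <- (1 - 2*lr)*c + 2*lr*mean(y), so the multiplier |1 - 2*lr|
# decides convergence: 0.8 for lr=0.1 (converges), -2 for lr=1.5 (diverges)
c_good = gradient_descent(y, lr=0.1)
c_bad = gradient_descent(y, lr=1.5)
```

The same effect is amplified when the features are not rescaled, which is a common reason the "same" code works on one dataset and diverges on another.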
I'm given a dataset of transactions and asked to find insights for businesses. I'm extremely new to ML / data science and have only been experimenting with KMeans. The dataset has the following features: merchant ID, transaction date, military time, amount, card amount paid, merchant name, town, area code, client ID, age band, gender code, province, average income 3 months, card value spending, card tapped. Ignoring NULL data, what type of analysis can I do on this data? I have …
I am analyzing a portfolio of about 225 stocks and have gotten data for each of them based on their "Price/Earnings ratio", "Return on Assets", and "Earnings per share growth". I would like to cluster these stocks based on their attributes into 3 or 4 groups. However, there are substantial outliers in the data set. Instead of removing them altogether I would like to keep them in. What ML algorithm would be best suited for this? I have been told …
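One option often suggested for outlier-heavy data is k-medoids, since the cluster centers are actual data points and an L1 (Manhattan) distance damps the pull of extremes; a minimal from-scratch sketch with invented stock rows, not the asker's data:

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def k_medoids(points, k, iters=20):
    medoids = points[:k]  # naive init: the first k points
    for _ in range(iters):
        # assign each point to its nearest medoid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: manhattan(p, medoids[j]))
            clusters[i].append(p)
        # new medoid = the member minimizing total distance within its cluster
        new = [min(c, key=lambda m: sum(manhattan(m, q) for q in c)) if c else old
               for c, old in zip(clusters, medoids)]
        if new == medoids:
            break
        medoids = new
    return medoids

# made-up (P/E, ROA %, EPS growth %) rows; the last row is a gross outlier
stocks = [(12, 5, 10), (13, 6, 11), (40, 1, 50), (42, 8, 55), (500, -30, 990)]
meds = k_medoids(stocks, k=2)
```

Because medoids must be real data points, a single extreme stock cannot drag a center off into empty space the way it drags a k-means centroid; scaling the three ratios to comparable ranges first would still matter.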
I'm supposed to find an algorithm that, given a bunch of points in the Euclidean plane, returns the tightest (smallest) origin-centered upright equilateral triangle that fits all the given points inside it, such that if I input some random new point, the algorithm will return $+$ if the point is inside the triangle and $-$ if not. Someone has suggested that I go over all the possible points and find the point with …
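One way to make the suggested approach concrete: writing the triangle as an intersection of three half-planes with fixed outward edge normals, the tightest fit is just the largest projection of any training point onto those normals. A hedged sketch; the exact normal vectors are an assumption about the triangle's orientation:

```python
import math

# assumed outward edge normals for an origin-centered upright equilateral
# triangle: each triangle is {x : x·n <= r for all three normals n}
U = (math.sqrt(3) / 2, -0.5)
W = (-math.sqrt(3) / 2, -0.5)
V = (0.0, 1.0)
NORMALS = (U, W, V)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def fit_triangle(points):
    """Smallest r whose triangle contains every given point."""
    return max(dot(p, n) for p in points for n in NORMALS)

def classify(point, r):
    """'+' if the point lies inside the fitted triangle, '-' otherwise."""
    return '+' if all(dot(point, n) <= r for n in NORMALS) else '-'

r = fit_triangle([(0, 1), (0.5, -0.2), (-0.3, 0.1)])
```

Fitting is a single O(n) pass and each query is O(1), so no search over candidate triangles is needed.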
Let $X=\mathbb{R}^{2}$. Let $u=\left(\frac{\sqrt{3}}{2},-\frac{1}{2}\right),\ w=\left(-\frac{\sqrt{3}}{2},-\frac{1}{2}\right),\ v=\left(0,1\right)$ and $C=H=\left\{h\left(r\right)=\left\{\left(x_{1},x_{2}\right)\mid\left(x_{1},x_{2}\right)\cdot u\le r,\ \left(x_{1},x_{2}\right)\cdot w\le r,\ \left(x_{1},x_{2}\right)\cdot v\le r\right\}\right\}$ for $r>0$, the set of all origin-centered upright equilateral triangles. Describe an algorithm $L$ that learns $C$ using $H$. State the time and sample complexity of your algorithm and prove them. I was faced with this question in a homework assignment and I'm a bit confused. My solution is: let $D$ be our dataset. Learner algorithm: maxDistance …
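Since the class is nested in the single parameter $r$, one standard pattern (analogous to learning a threshold on the line) is the tightest-fit ERM learner; a hedged sketch assuming realizability, with $\hat{r}$ and $m$ being my notation rather than the assignment's:

```latex
% Tightest-fit learner: the smallest hypothesis consistent with sample D
\hat{r} \;=\; \max_{(x,\,+)\in D}\ \max\left\{\, x\cdot u,\ x\cdot w,\ x\cdot v \,\right\}
```

Each example is processed in constant time, so the learner runs in $O(m)$ time on $m$ samples. Because $h(\hat{r})\subseteq h(r^{*})$ for the target $r^{*}$, the learner errs only one-sidedly, on points whose largest projection falls in $(\hat{r}, r^{*}]$; an interval-style argument then suggests a sample complexity on the order of $\frac{1}{\varepsilon}\ln\frac{1}{\delta}$.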
I understand how a decision tree is constructed (in the ID3 algorithm) using criteria such as entropy, the Gini index, and variance reduction. But the formulae for these criteria do not care about optimization metrics such as accuracy, recall, AUC, kappa, F1-score, and others. R and Python packages allow me to optimize for such metrics when I construct a decision tree. What do they do differently for each of these metrics? Where does the change happen? Is there a pattern to …
When using the MATLAB command 'fitctree' for classification, if I change the order of the attributes I do not get the same tree, and thus not the same classification error. Why? Does the CART algorithm take into account the order in which the attributes are introduced?
I am working on a project for clustering air objects based on their trajectories. I would like to train a model on a dataset of different flying objects' trajectories so that later I can predict what type of object it is from its trajectory data. The trajectory data include 4 things (altitude, longitude, latitude, and time). Based on a set of such data we may be able to classify objects as planes, rockets, missiles, etc. What I cannot figure out is …
I have a problem with an extremely large dataset (who doesn't?) which is stored in chunks such that there is low variance across chunks (i.e., the chunks are sort of representative). I wanted to play around with algorithms to do some classification in an asynchronous fashion, but I wanted to code it up myself. A sample code would look like:

start a master
distribute 10 chunks on 10 slaves
while some criterion is not met
    for each s in slave: …
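A minimal runnable sketch of the loop above, assuming a toy model (a single least-squares weight) and using threads as stand-ins for the 10 slaves; all names, data, and rates here are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
import random

random.seed(1)
TRUE_W = 3.0
# 10 representative chunks of (x, y) pairs with y ~= TRUE_W * x
chunks = [[(x, TRUE_W * x + random.gauss(0, 0.01)) for x in range(20)]
          for _ in range(10)]

def chunk_gradient(w, chunk):
    """Gradient of mean squared error (1/n) * sum (w*x - y)^2 on one chunk."""
    n = len(chunk)
    return sum(2 * (w * x - y) * x for x, y in chunk) / n

w = 0.0
with ThreadPoolExecutor(max_workers=10) as pool:
    for _ in range(100):  # "while some criterion is not met"
        # each "slave" computes a gradient on its own chunk
        grads = list(pool.map(lambda c: chunk_gradient(w, c), chunks))
        # the master averages the gradients and takes a step
        w -= 0.001 * (sum(grads) / len(grads))
```

This version synchronizes every round; a truly asynchronous variant would let the master apply each slave's gradient as it arrives, at the cost of stale updates.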
I am redesigning some of the classical algorithms for the Hadoop/MapReduce framework. I was wondering if there is any established approach for denoting Big-O-style expressions to measure time complexity? For example, hypothetically, a simple average calculation of n (= 1 billion) numbers is an O(n) + C operation using a simple for loop, or O(log n); I am assuming division to be a constant-time operation for the sake of simplicity. If I break this massively parallelizable algorithm up for MapReduce, by dividing data over …
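To make the map/reduce decomposition of the average concrete, here is a toy sequential simulation: each of p mappers touches n/p numbers (O(n/p) wall-clock per mapper when run in parallel), and the reduce combines p partial (sum, count) pairs, which is O(p) sequentially or O(log p) as a tree reduction. The data and p are arbitrary:

```python
def map_phase(chunk):
    """Emit one (sum, count) pair per chunk: O(len(chunk)) work."""
    return (sum(chunk), len(chunk))

def reduce_phase(partials):
    """Combine p partial pairs into the global average: O(p) work."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

data = list(range(1, 101))               # 1..100, true average 50.5
p = 10
chunks = [data[i::p] for i in range(p)]  # split across p "mappers"
avg = reduce_phase([map_phase(c) for c in chunks])
```

So a reasonable parallel cost notation for this job is O(n/p + p), or O(n/p + log p) with a combiner tree, which is the kind of expression the PRAM and BSP literature uses.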
I am working with a dataset that has been coded and categorized, so that each data point has a set of coded characteristics. An example data point would be something like the following: Quality; Service & Support; Price. Each data point can have multiple codes associated with it. What I'm looking to do is identify the "intersections" between the data points so that I can answer questions like the following: when a data point has "Quality" as a …
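A small sketch of one way to compute those intersections with plain pair counting (the example points are invented): count each unordered pair of codes per data point, and conditional rates such as "of the points coded Quality, what fraction are also coded Service & Support" fall out of the counts:

```python
from itertools import combinations
from collections import Counter

# each data point is the set of codes assigned to it
points = [
    {"Quality", "Service & Support", "Price"},
    {"Quality", "Price"},
    {"Service & Support"},
    {"Quality", "Service & Support"},
]

code_counts = Counter()
pair_counts = Counter()
for codes in points:
    code_counts.update(codes)
    # count every unordered pair of codes appearing together on one point
    pair_counts.update(frozenset(p) for p in combinations(sorted(codes), 2))

def cooccurrence_rate(a, b):
    """Fraction of points containing code `a` that also contain code `b`."""
    return pair_counts[frozenset((a, b))] / code_counts[a]
```

For larger code vocabularies the same idea generalizes to association-rule mining (support/confidence/lift, e.g. the Apriori algorithm).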
My background knowledge: basically, supervised learning is based on labeled data. Using the labeled data, the machine can study and determine results for unlabeled data. To do that, for example, if we are handling an image problem, manpower is essentially needed to crop the raw photos, label them, and upload them to the server as the fundamental labeled data. I know it sounds weird, but I'm just curious whether there are any algorithms/systems to create labels automatically for supervised learning.
I want to cluster a dataset without prior knowledge of the correct number of clusters. For different algorithms (e.g. k-means, GMM, ...) I can iterate through different values and try to find the best solution for any given algorithm (e.g. elbow curve, silhouette coefficient, etc.). But I get very different results, as expected, with different algorithms: k-means is good for spherical clusters, density-based approaches for totally different cluster shapes. Now the actual question: how do I select the "best" unsupervised machine learning …
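One common way to compare clusterings produced by different algorithms on the same data is an internal index such as the silhouette coefficient, which is computed from the labels and distances alone, regardless of which algorithm produced the labels; a minimal from-scratch sketch on toy 2-D points:

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mean_dist(p, group):
    return sum(dist(p, q) for q in group) / len(group)

def silhouette(points, labels):
    """Mean silhouette over all points; near 1 is good, below 0 is bad."""
    score = 0.0
    for p, lab in zip(points, labels):
        own = [q for q, l in zip(points, labels) if l == lab and q is not p]
        a = mean_dist(p, own) if own else 0.0          # cohesion
        b = min(mean_dist(p, [q for q, l in zip(points, labels) if l == o])
                for o in set(labels) if o != lab)      # separation
        score += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return score / len(points)

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette(points, [0, 0, 1, 1])  # respects the two tight groups
bad = silhouette(points, [0, 1, 0, 1])   # mixes clusters across the gap
```

Caveat: internal indices embed their own geometric bias (silhouette favors compact, separated clusters), so they complement rather than replace domain judgment when comparing, say, k-means against a density-based method.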
I'm trying to find a way to predict/calculate how a shape (e.g. the outline of a glacier) will change in the future, based on its history (previous shapes) and additional factors (e.g. Δtemperature). In my example: I have the shape/coordinates of a glacier and an average temperature at 1970, 1985, 2000, 2015. How can I give an estimate of what that shape will look like in 2030, based on the previous shapes and a predicted temperature? The shapes would ideally come in …
Here is a hypothetical simplified dataframe of my problem, which would be low-dimensional (20-ish features), containing some made-up information about certain dog breeds:

Breed   Min_Weight  Max_Weight  Min_Height  Max_Height  is_friendly  grp
Husky   10          20          30          35          True         working
Poodle  8           17          15          30          False        terrier

The algorithm would receive some information about a dog, and it would need to identify the k-closest dog breeds based on the input data. It needs to be high performance. Example: the algorithm receives an unknown breed …
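A hedged sketch of the k-closest lookup with brute-force Euclidean distance over the numeric columns (the extra breed row is made up; real use would normalize the features and might swap in a KD-tree or an approximate-nearest-neighbor index for speed):

```python
# made-up (Min_Weight, Max_Weight, Min_Height, Max_Height) rows per breed
breeds = {
    "Husky":  (10, 20, 30, 35),
    "Poodle": (8, 17, 15, 30),
    "Beagle": (9, 14, 33, 38),  # invented extra row for illustration
}

def k_closest(query, k=2):
    """Return the k breed names nearest to the query feature vector."""
    def d(feats):
        return sum((a - b) ** 2 for a, b in zip(query, feats)) ** 0.5
    return sorted(breeds, key=lambda name: d(breeds[name]))[:k]

nearest = k_closest((10, 19, 31, 34), k=2)
```

With 20-ish features and a modest number of breeds, even brute force is microseconds per query; categorical columns like is_friendly and grp would need encoding (or a mixed-type distance such as Gower) before they can contribute.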
I have a big dataset with 10 columns and about 100,000 rows. Every 5 rows represent a person being tracked and the data related to this tracking, such as time, velocity, etc. The last two columns are the longitude and latitude for that person. To test the model, the test set has the fifth row for each person missing its longitude and latitude. What's the best way to approach this problem? For example, the test set looks like: id …
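One simple baseline for the missing fifth row, sketched under the assumption that each person's position drifts roughly linearly over their five tracked rows (the times and longitudes below are invented):

```python
def linear_extrapolate(ts, ys, t_new):
    """Least-squares line through (ts, ys), evaluated at t_new."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    slope = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
             / sum((t - mt) ** 2 for t in ts))
    return my + slope * (t_new - mt)

# four observed rows for one person; the fifth row's longitude is missing
times = [1, 2, 3, 4]
lons = [30.0, 30.2, 30.4, 30.6]
lon5 = linear_extrapolate(times, lons, 5)
```

A learned model (e.g. gradient boosting or an RNN over the four known rows, with velocity as a feature) can then be judged against this per-person extrapolation baseline.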
df11[['COMPONENT_ID','FIRMWARE','SERIAL','CRP0_VDDN']].head() Consider that I have these four columns to analyse. I want to form, say, 3-5 clusters of COMPONENT_IDs with similar characteristics. I want this to happen based on the remaining features, or just CRP0_VDDN in relation to the COMPONENT_IDs. How can I do this?
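A hedged sketch of one route, with invented values: aggregate CRP0_VDDN per COMPONENT_ID (the mean here) and run a tiny 1-D k-means on those aggregates; in practice one would standardize and run sklearn's KMeans over all the remaining numeric features instead of this from-scratch loop:

```python
# invented (COMPONENT_ID, CRP0_VDDN) rows standing in for df11
rows = [("C1", 1.01), ("C1", 0.99), ("C2", 1.02), ("C2", 1.00),
        ("C3", 3.10), ("C3", 3.05), ("C4", 5.00), ("C4", 5.10)]

# aggregate the measurement per component (mean over its rows)
per_comp = {}
for cid, v in rows:
    per_comp.setdefault(cid, []).append(v)
means = {cid: sum(vs) / len(vs) for cid, vs in per_comp.items()}

def kmeans_1d(values, centers, iters=20):
    """Plain 1-D k-means: assign to nearest center, recompute, repeat."""
    for _ in range(iters):
        groups = {c: [] for c in centers}
        for v in values:
            groups[min(centers, key=lambda c: abs(c - v))].append(v)
        centers = sorted(sum(g) / len(g) if g else c
                         for c, g in groups.items())
    return centers

centers = kmeans_1d(list(means.values()), centers=[0.0, 3.0, 6.0])
assignment = {cid: min(range(len(centers)),
                       key=lambda i: abs(centers[i] - m))
              for cid, m in means.items()}
```

The assignment dict then gives one cluster label per COMPONENT_ID, which can be joined back onto df11.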
I am trying to interpret a black box model. This model is a random forest that I am using to make predictions. I have read that LIME is a way to interpret black box models, but I don't quite know how to interpret the following graphs: If someone could help me to interpret them or tell me how to do it, it would be of great help. Thank you.
Assume that every neural network can be recast as a sequence of layers (https://arxiv.org/abs/2106.14587 has a chapter on how to do this). Assume that layer U has N neurons. The set of possible activities of layer U forms an N-dimensional vector space. Each concrete state of layer U (in the sense of activities) can be described by an N-dimensional vector (point) in this space. Assume that the NN performs inference or learns, and assume that some first-order theory (set of variables and functions and …