hierarchical-data-format

Using PCA to cluster multidimensional data (RFM variables)

Pads

May 3, 2022 01:00

So I am performing k-means clustering on RFM variables (Recency, Frequency, Monetary). The RFM variables are in the form of quantiles (1-4). I used PCA and found the PCA components. I then used the elbow method to find the optimal number of clusters and used that number in the k-means algorithm. Could anyone tell me whether this is a correct method? Further, when I plot the clusters, the axes range from -3 to 3 and I …
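A minimal sketch of the elbow step, assuming the RFM scores are stored as (R, F, M) integer tuples in 1-4 (the sample rows are hypothetical). It uses plain Lloyd's k-means in standard-library Python instead of scikit-learn (whose `KMeans` reports the same quantity as `inertia_`), and it skips PCA: with three variables scored 1-4 there are at most 4^3 = 64 distinct combinations, so clustering the raw scores is a useful baseline to compare against. If PCA is kept, the component scores are centred on zero, so negative values on the plot axes are expected.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(points)), k)  # start from k distinct points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new.append(tuple(sum(v) / len(members) for v in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Hypothetical RFM quartile scores (R, F, M), each 1-4.
rfm = [(4, 4, 4), (4, 3, 4), (1, 1, 1), (1, 2, 1), (2, 1, 1), (3, 3, 4)]

# Elbow method: within-cluster sum of squares for each candidate k.
elbow = {k: kmeans(rfm, k)[2] for k in range(1, 7)}
for k, inertia in elbow.items():
    print(k, round(inertia, 3))
```

The "elbow" is the k after which adding clusters stops reducing the inertia much; with only 64 possible score combinations it is worth checking that the chosen clusters differ meaningfully, not just numerically.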

Topic: pca hierarchical-data-format k-means clustering

Category: Data Science

Proof related to Ward's Method

user134944

April 23, 2022 15:07

According to Ward's method, which says:
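For reference, since the statement in the question is cut off: Ward's method merges, at each step, the pair of clusters whose union gives the smallest increase in the total within-cluster sum of squares. For clusters A and B with sizes n_A and n_B and centroids m_A and m_B, that increase has the closed form

```latex
\Delta(A, B) = \sum_{x \in A \cup B} \lVert x - m_{A \cup B} \rVert^2
             - \sum_{x \in A} \lVert x - m_A \rVert^2
             - \sum_{x \in B} \lVert x - m_B \rVert^2
             = \frac{n_A n_B}{n_A + n_B} \, \lVert m_A - m_B \rVert^2
```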

Topic: data hierarchical-data-format clustering data-mining machine-learning

Category: Data Science

What is the best way to identify the proper hierarchy of this data?

Steven Cunden

April 15, 2022 11:02

Topic: hierarchical-data-format clustering

Category: Data Science

Advice on dealing with very large datasets - HDF5, Python

munieq11

February 25, 2022 13:03

I've recently started working on an application for visualizing really big datasets. While reading online, it became apparent that most people use HDF5 for storing big, multi-dimensional datasets, as it supports many dimensions, has no file size limits and is portable across operating systems. My question is how best to deal with very large files. I am working with three-dimensional datasets in which every dimension has a large number of elements (example size: 62,500 x 500,000 …
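One common approach is to never load the whole dataset at once: read it in blocks along the first axis and process each block before reading the next. A minimal sketch follows; it relies only on the dataset object supporting `len()` and slicing, which `h5py.Dataset` does (slicing an h5py dataset reads just that selection from disk as a NumPy array). The block size and the file and dataset names in the comment are hypothetical, and reads are usually fastest when the block shape lines up with the dataset's chunk shape (`Dataset.chunks` in h5py).

```python
def iter_blocks(dataset, block_rows):
    """Yield (start, block) pairs covering `dataset` along its first axis.

    Works with anything that supports len() and slicing, e.g. an h5py.Dataset,
    where each slice is read from disk only when it is requested.
    """
    if block_rows <= 0:
        raise ValueError("block_rows must be positive")
    n = len(dataset)
    for start in range(0, n, block_rows):
        yield start, dataset[start:min(start + block_rows, n)]


# Usage with h5py (assumed installed; "big.h5" and "data" are hypothetical):
#   import h5py
#   with h5py.File("big.h5", "r") as f:
#       for start, block in iter_blocks(f["data"], 1024):
#           ...  # e.g. compute per-block statistics for the visualization

# The same generator on an in-memory list, to show the slicing pattern:
rows = list(range(10))
blocks = [block for _, block in iter_blocks(rows, 4)]
print(blocks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```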

Topic: hierarchical-data-format python bigdata

Category: Data Science

How to train/test/validate hierarchical classifiers?

Isa

February 15, 2022 14:55

I am writing an algorithm that detects activities from wearable data. I would like to try a hierarchical approach (Local Classifier Per Parent Node structure). At the first level I determine the intensity of the activity (1 classifier), and at the second level I determine the activity label (3 classifiers). However, I am struggling with how to approach training, testing and validation for such a structure. What I have done so far: split the data into 2 …
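A minimal sketch of one way to evaluate a Local Classifier Per Parent Node hierarchy end to end, assuming every sample carries both labels (intensity and activity). The data are split once, before either level is trained, so held-out samples are never seen by any classifier; at test time each sample is routed through the predicted intensity, so level-1 errors propagate to level 2 as they would in deployment. The `fit` argument and the toy nearest-centroid learner are hypothetical stand-ins for whatever models you actually train, and the rows are made up.

```python
import random


def split_once(samples, test_frac=0.2, seed=0):
    """Shuffle once and split; every level of the hierarchy uses this split."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_frac)
    return [samples[i] for i in idx[n_test:]], [samples[i] for i in idx[:n_test]]


def fit_lcpn(train, fit):
    """Fit the parent on all rows and one child per parent class on its rows.

    `fit(xs, ys)` stands in for any learner and must return a predict callable.
    """
    parent = fit([x for x, _, _ in train], [inten for _, inten, _ in train])
    children = {}
    for inten in {i for _, i, _ in train}:
        rows = [(x, act) for x, i, act in train if i == inten]
        children[inten] = fit([x for x, _ in rows], [a for _, a in rows])
    return parent, children


def predict_lcpn(x, parent, children):
    """Route a sample: the predicted intensity selects the level-2 classifier."""
    intensity = parent(x)
    return intensity, children[intensity](x)


def hierarchical_accuracy(test, parent, children):
    """Share of test samples whose whole path (intensity, activity) is correct."""
    hits = sum(predict_lcpn(x, parent, children) == (inten, act)
               for x, inten, act in test)
    return hits / len(test)


def nearest_centroid_fit(xs, ys):
    """Toy 1-D learner: predict the label whose mean x is closest."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return lambda x: min(means, key=lambda lab: abs(x - means[lab]))


# Hypothetical rows: (acceleration magnitude, intensity, activity).
data = [(9.0, "high", "running"), (8.5, "high", "running"),
        (6.0, "high", "cycling"), (6.5, "high", "cycling"),
        (3.0, "low", "walking"), (2.5, "low", "walking"),
        (0.5, "low", "sitting"), (1.0, "low", "sitting")]
train, test = split_once(data, test_frac=0.25)
parent, children = fit_lcpn(train, nearest_centroid_fit)
print(hierarchical_accuracy(test, parent, children))
```

A validation set for tuning can be carved out of `train` with the same function, leaving `test` untouched until the final evaluation.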

Topic: activity-recognition training cross-validation hierarchical-data-format machine-learning

Category: Data Science

Looking for an algorithm to perform classification on multivariate grouped time series

Meowcapone

December 2, 2021 19:54

I will be grateful for any help. I have multivariate time series, each with a unique ID. There is also a variable giving the trend type of each ID, judged from the point of view of a single variable that we consider important. The problem is that I need to understand how the behaviour (or trends) of the other variables (the ID's other time series) affects whether the ID falls into a specific trend category. I …

Topic: decision-trees hierarchical-data-format classification time-series clustering

Category: Data Science

How to cluster/group these data points (using K-Means or hierarchical clustering)

asmgx

December 2, 2021 05:26

I have genes from different species: Gene A, Gene B, Gene C, ..., Gene Z. Some genes are similar to each other: A & G are 96% similar, C & H are 92% similar, G & B are 89% similar, G & T are 85% similar, ..., K & F are 52% similar. I want to classify these genes into groups of species: Species A, B, T, G are the same species; Species C, H, N, R, …
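Because the data are pairwise similarities rather than feature vectors, k-means does not apply directly, and hierarchical clustering is the better fit. A minimal sketch of single-linkage grouping: link every pair whose similarity is at or above a threshold and take connected components with union-find, which gives the same groups as cutting a single-linkage dendrogram at that similarity. Only the pairs quoted in the question are used (unlisted pairs are treated as below the threshold), and the 0.85 threshold is an assumption chosen for illustration.

```python
def group_by_similarity(items, similarities, threshold):
    """Single-linkage groups: connected components of pairs with sim >= threshold."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), sim in similarities.items():
        if sim >= threshold:
            parent[find(a)] = find(b)  # union the two components

    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


genes = ["A", "B", "C", "F", "G", "H", "K", "T"]
sims = {("A", "G"): 0.96, ("C", "H"): 0.92, ("G", "B"): 0.89,
        ("G", "T"): 0.85, ("K", "F"): 0.52}
print(group_by_similarity(genes, sims, threshold=0.85))
# [['A', 'B', 'G', 'T'], ['C', 'H'], ['F'], ['K']]
```

Raising or lowering the threshold plays the role of choosing where to cut the dendrogram.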

Topic: hierarchical-data-format feature-extraction k-means

Category: Data Science

Input Features of a Hierarchical Structure

TKTK

October 1, 2021 10:20

I have input features with a hierarchical structure. Each feature consists of a header element and 0 to n subfeatures of the same structure. There is no upper limit on n, and n can differ from feature to feature. It should also be possible to establish relationships between features with different numbers of subfeatures. How can I format this data so that it can be used to train different (machine) learning algorithms? Example of one input feature …
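One way to turn a variable-size tree into a fixed-length vector is recursive pooling: encode each node as its own header values followed by the mean of its subfeatures' encodings, so any number of children maps to the same size. A minimal sketch, assuming each header is a fixed-length numeric vector, a `{"header": ..., "children": ...}` node layout, and a chosen maximum depth (all assumptions, since the example in the question is cut off). Learned variants of the same idea include tree-structured recursive neural networks.

```python
def encode(node, max_depth, header_len, depth=0):
    """Fixed-length vector for a node: header + mean of child encodings.

    `node` is {"header": [numbers], "children": [nodes]} (an assumed layout).
    The result always has header_len * (max_depth - depth + 1) values.
    """
    header = [float(v) for v in node["header"]]
    if len(header) != header_len:
        raise ValueError("every header must have header_len values")
    if depth == max_depth:
        return header  # subfeatures below max_depth are ignored
    kids = [encode(c, max_depth, header_len, depth + 1) for c in node["children"]]
    if kids:
        pooled = [sum(col) / len(kids) for col in zip(*kids)]
    else:
        pooled = [0.0] * (header_len * (max_depth - depth))  # no subfeatures
    return header + pooled


def leaf(header):
    return {"header": header, "children": []}


small = {"header": [1.0, 0.0], "children": [leaf([2.0, 2.0])]}
big = {"header": [1.0, 1.0],
       "children": [leaf([0.0, 2.0]), leaf([4.0, 0.0]),
                    {"header": [2.0, 1.0], "children": [leaf([6.0, 6.0])]}]}
print(encode(small, max_depth=2, header_len=2))  # [1.0, 0.0, 2.0, 2.0, 0.0, 0.0]
print(encode(big, max_depth=2, header_len=2))    # [1.0, 1.0, 2.0, 1.0, 2.0, 2.0]
```

Mean pooling keeps the vector length independent of n, at the cost of losing the order and exact count of subfeatures; sum or max pooling are drop-in alternatives.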

Topic: deep-learning hierarchical-data-format machine-learning

Category: Data Science

Use dummy variables to create a rank variable in R

Marvin Aliaga

July 12, 2021 09:00

I have a series of multiple-response (dummy) variables describing causes for cancelled visits. A visit can have multiple reasons for cancellation. My goal is to create a single, mutually exclusive variable from the dummy variables in a hierarchical way. For example, in my sample data below the rank of my variables is as follows: Medical, NoID and Refuse. E.g., if a visit was cancelled for both medical and lack-of-ID reasons, I would like to recode …
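The recode described is "first match in priority order": walk the reasons from highest to lowest rank and return the first one that is flagged. A minimal sketch in Python; in R, `dplyr::case_when()` expresses the same rule, because it evaluates its conditions in order and uses the first one that is true. The reason names follow the ranking in the question; the row format and the `"None"` fallback for visits with no flagged reason are assumptions.

```python
PRIORITY = ["Medical", "NoID", "Refuse"]  # highest rank first, as in the question


def primary_reason(row, priority=PRIORITY, default="None"):
    """Return the highest-ranked reason whose dummy is 1, else `default`."""
    for reason in priority:
        if row.get(reason, 0) == 1:
            return reason
    return default


visits = [
    {"Medical": 1, "NoID": 1, "Refuse": 0},
    {"Medical": 0, "NoID": 1, "Refuse": 1},
    {"Medical": 0, "NoID": 0, "Refuse": 1},
    {"Medical": 0, "NoID": 0, "Refuse": 0},
]
print([primary_reason(v) for v in visits])
# ['Medical', 'NoID', 'Refuse', 'None']
```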

Topic: dummy-variables ranking hierarchical-data-format r

Category: Data Science

What are the advantages of HDF compared to alternative formats?

IharS

July 2, 2021 00:43

What are the advantages of HDF compared to alternative formats? What are the main data science tasks where HDF is really suitable and useful?

Topic: hierarchical-data-format data-formats

Category: Data Science

Should I scale or normalise my dataset before clustering?

Karan Khurana

May 17, 2021 10:53

So I have a dataset whose variables are measured in milligrams, kilograms and quintals. Should I use StandardScaler or MinMaxScaler to scale the dataset?
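Because k-means uses Euclidean distance, a variable recorded in milligrams tends to dominate one recorded in quintals simply because its values, and therefore its differences, are numerically much larger, so some scaling is needed. Standardization (z-scores, what scikit-learn's `StandardScaler` computes, using the population standard deviation) centres each column on 0 with unit variance; min-max scaling (`MinMaxScaler` with its default range) maps each column to [0, 1] and is more sensitive to outliers, since a single extreme value sets the range. A standard-library sketch of both on made-up columns:

```python
from statistics import mean, pstdev


def standardize(col):
    """Z-scores with the population std (ddof=0), like StandardScaler."""
    mu, sd = mean(col), pstdev(col)
    return [(x - mu) / sd if sd else 0.0 for x in col]  # constant column -> 0


def min_max(col):
    """Rescale to [0, 1], like MinMaxScaler with its default feature_range."""
    lo, hi = min(col), max(col)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in col]


dose_mg = [250.0, 500.0, 750.0, 1000.0]  # hypothetical column in milligrams
weight_q = [1.2, 0.8, 1.5, 1.1]          # hypothetical column in quintals

for name, col in [("dose_mg", dose_mg), ("weight_q", weight_q)]:
    print(name,
          [round(v, 3) for v in standardize(col)],
          [round(v, 3) for v in min_max(col)])
```

After either transform both columns span comparable ranges, so neither dominates the distance; standardization is the more common default for k-means when the data contain outliers.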

Topic: feature-scaling hierarchical-data-format k-means clustering

Category: Data Science

Can we define a partitioning of the data into K clusters by cutting the branches of the tree at some level below the root node?

cristid9

January 27, 2021 17:04

Assume we have a dendrogram (hierarchical clustering tree). Can we define a partitioning of the data into K clusters by cutting the branches of the tree at some level below the root node?
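Yes: for an agglomerative tree with n leaves, keeping only the first n - K merges, which is the same as cutting the tree at a height between the (n - K)-th and (n - K + 1)-th merge heights, leaves exactly K clusters, assuming the merge heights are monotone (as they are for single, complete, average and Ward linkage). SciPy exposes this cut as `scipy.cluster.hierarchy.fcluster` with `criterion="maxclust"`. A standard-library sketch, assuming the merges are listed in order using SciPy's id convention (leaves are 0..n-1, and the cluster created by merge i gets id n + i); the five-point merge list is made up.

```python
def cut_to_k(n_leaves, merges, k):
    """Flat clustering with k clusters from an ordered list of merges.

    merges[i] = (a, b) joins clusters a and b into a new cluster n_leaves + i
    (SciPy linkage numbering). Applying only the first n_leaves - k merges is
    equivalent to cutting the tree just below merge number n_leaves - k + 1.
    """
    if not 1 <= k <= n_leaves:
        raise ValueError("k must be between 1 and the number of leaves")
    members = {i: [i] for i in range(n_leaves)}
    for i, (a, b) in enumerate(merges[: n_leaves - k]):
        members[n_leaves + i] = members.pop(a) + members.pop(b)
    return sorted(sorted(m) for m in members.values())


# Five points; merge order: (0,1) -> 5, (2,3) -> 6, (5,6) -> 7, (7,4) -> 8 (root).
merges = [(0, 1), (2, 3), (5, 6), (7, 4)]
for k in (1, 2, 3):
    print(k, cut_to_k(5, merges, k))
# 1 [[0, 1, 2, 3, 4]]
# 2 [[0, 1, 2, 3], [4]]
# 3 [[0, 1], [2, 3], [4]]
```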

Topic: hierarchical-data-format clustering

Category: Data Science

Different representations of dendrograms

Noppawee Apichonpongpan

August 3, 2020 14:58

I have a dendrogram represented in a format I don't understand:

(K_5:1.000030e+00,((K_1:2.000000e-05,(K_2:1.000000e-05,K_3:1.000000e-05):1.000000e-05):1.000000e-05,K_4:3.000000e-05)0.806:1.000000e+00):0.000000e+00;

I am not sure how to interpret the above. It is an output of hierarchical clustering. K_1, K_2, K_3, K_4, K_5 are the data points. I have other dendrograms represented in the following format (we start with one big cluster and split a cluster at each step):

[x_1,x_2,x_3,x_4,x_5]
[x_1,x_2][x_3,x_4,x_5]
[x_1,x_2][x_3,x_5][x_4]
[x_1][x_2][x_3,x_5][x_4]
[x_1][x_2][x_3][x_5][x_4]

I want a way to convert between these two representations.
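The first representation is the Newick tree format: nested parentheses give the tree shape, `name:number` gives each branch length, and a bare number after a closing parenthesis (the `0.806` here) is an internal-node label, commonly a support value. To get the top-down split sequence, compute each internal node's height (its distance down to the farthest leaf) and repeatedly split the current cluster whose node is highest. A minimal standard-library sketch of the Newick-to-splits direction; the parser only handles what appears in this string (no quoted names or comments), so a library such as Biopython's `Bio.Phylo` is the safer choice for general Newick files.

```python
def parse_newick(text):
    """Parse a simple Newick string into {"name", "length", "children"} dicts."""
    s = text.strip().rstrip(";")
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip ")"
        name = read_until(":,()")  # leaf name or internal label such as 0.806
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            length = float(read_until(",()"))
        return {"name": name, "length": length, "children": children}

    return node()


def leaves(n):
    """Leaf names under a node, left to right."""
    if not n["children"]:
        return [n["name"]]
    return [x for c in n["children"] for x in leaves(c)]


def height(n):
    """Distance from a node down to its farthest leaf."""
    return max((height(c) + c["length"] for c in n["children"]), default=0.0)


def split_sequence(root):
    """Top-down steps: split the highest remaining internal node each time."""
    clusters = [root]
    steps = [[leaves(root)]]
    while any(c["children"] for c in clusters):
        target = max((c for c in clusters if c["children"]), key=height)
        i = next(j for j, c in enumerate(clusters) if c is target)
        clusters[i:i + 1] = target["children"]
        steps.append([leaves(c) for c in clusters])
    return steps


tree = parse_newick("(K_5:1.000030e+00,((K_1:2.000000e-05,(K_2:1.000000e-05,"
                    "K_3:1.000000e-05):1.000000e-05):1.000000e-05,"
                    "K_4:3.000000e-05)0.806:1.000000e+00):0.000000e+00;")
for step in split_sequence(tree):
    print(step)
```

For this tree the first split separates K_5 from the rest, then K_4, then K_1, and finally K_2 from K_3, which is the same kind of sequence as the bracketed format.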

Topic: hierarchical-data-format clustering

Category: Data Science

Finding the best "depth" of ICD9 codes with pseudo-hierarchical clustering

Sam Castillo

March 25, 2020 10:57

Here is a common problem in health care modeling. Did I just invent a new algorithm, or has someone already thought of this? The goal is to find the most homogeneous partition of patients by medical costs using ICD9 codes. There are 13,000 individual codes in the data set, so using the full code leaves many codes with only a few observations. ICD9 codes have a nested hierarchical structure. For instance, all infectious diseases are 001-139, one particular disease …
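A common baseline to compare against (not the cost-homogeneity search the question describes) is a frequency-based roll-up: map each code to its most specific prefix that still has at least `min_count` observations, never going above the three-digit category. A minimal sketch, assuming numeric ICD-9 codes stored as digit strings without the decimal point (e.g. `25001` for 250.01) and treating `min_count` as a tuning parameter; V and E codes would need their own category length, and a cost-based test could replace the count test.

```python
from collections import Counter

CATEGORY_LEN = 3  # numeric ICD-9 categories are three digits (e.g. 250)


def roll_up(codes, min_count):
    """Map each code to its longest prefix (>= 3 chars) seen >= min_count times."""
    prefix_counts = Counter()
    for code in codes:
        for n in range(CATEGORY_LEN, len(code) + 1):
            prefix_counts[code[:n]] += 1
    mapping = {}
    for code in set(codes):
        mapping[code] = code[:CATEGORY_LEN]  # fallback: the category itself
        for n in range(len(code), CATEGORY_LEN - 1, -1):
            if prefix_counts[code[:n]] >= min_count:
                mapping[code] = code[:n]
                break
    return mapping


codes = ["25001", "25001", "25002", "4019", "4011"]  # hypothetical patient codes
print(sorted(roll_up(codes, min_count=2).items()))
# [('25001', '25001'), ('25002', '2500'), ('4011', '401'), ('4019', '401')]
```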

Topic: embeddings hierarchical-data-format clustering

Category: Data Science

Question About Writing My Own Distance Function for a Distance Matrix (For Clustering)

Tim Weah

October 5, 2019 15:03

I am currently implementing a clustering algorithm on millions of data entries about the users of a mobile game. Many of the features I plan to use are unique to this game (data that can only be analyzed if one knows the game well), so I believe it is best for my data that I come up with a new function to generate the distance matrix that I plan on using in the …
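Two points are worth checking before building this. First, standard k-means does not take a distance matrix: it averages points, which only matches squared Euclidean distance, so a custom distance pairs better with k-medoids or hierarchical clustering. Second, a full pairwise matrix for n users has n(n - 1)/2 entries, about 5 × 10^11 for one million users, so at that scale you would cluster a sample or use a method that never materializes the matrix. A minimal k-medoids sketch (alternating assignment and medoid update) that accepts any distance callable; `game_distance` and its `level`/`style` fields are hypothetical placeholders for your game-specific features.

```python
def k_medoids(points, k, dist, max_iter=100):
    """Alternating k-medoids on any distance function; returns (medoids, labels)."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # naive initialization: the first k points
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i in range(n):
            # A medoid always stays in its own cluster, so none ends up empty.
            nearest = i if i in clusters else min(medoids, key=lambda m: d[i][m])
            clusters[nearest].append(i)
        # New medoid = member with the smallest total distance to its cluster.
        new = [min(members, key=lambda c: sum(d[c][j] for j in members))
               for members in clusters.values()]
        if set(new) == set(medoids):
            break
        medoids = new
    labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
    return medoids, labels


def game_distance(a, b):
    """Hypothetical game-specific metric: scaled level gap + play-style mismatch."""
    return abs(a["level"] - b["level"]) / 10 + (a["style"] != b["style"])


users = [{"level": 10, "style": "pvp"}, {"level": 12, "style": "pvp"},
         {"level": 11, "style": "pvp"}, {"level": 50, "style": "builder"},
         {"level": 48, "style": "builder"}, {"level": 52, "style": "builder"}]
medoids, labels = k_medoids(users, 2, game_distance)
print(medoids, labels)
```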

Topic: distance hierarchical-data-format k-means clustering bigdata

Category: Data Science

Handling hierarchical categorical independent variables

Wickkiey

August 31, 2019 11:29

I have data with categorical attributes that have a huge number of categories. For example, main_column, sub_column1 and sub_column2 are 3 hierarchical attributes. If I create dummy variables for these columns, the column count increases to 1000. How should I handle hierarchical attributes like these in a classification problem? Thanks!!
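One option that avoids 1,000 dummy columns is to encode each level of the hierarchy as a single numeric column, keying every level by its full path so that the same sub-category name under different parents stays distinct. A minimal standard-library sketch using frequency (count) encoding, one of several choices; per-level target encoding is a common alternative, and either way the counts or means should be computed on the training data only. The column names come from the question; the rows are made up.

```python
from collections import Counter

LEVELS = ["main_column", "sub_column1", "sub_column2"]  # coarsest level first


def path_count_encode(rows, levels=LEVELS):
    """One numeric column per level: how often each row's path prefix occurs."""
    paths = [[tuple(r[c] for c in levels[: i + 1]) for i in range(len(levels))]
             for r in rows]
    counts = Counter(p for row_paths in paths for p in row_paths)
    return [[counts[p] for p in row_paths] for row_paths in paths]


rows = [
    {"main_column": "A", "sub_column1": "x", "sub_column2": "1"},
    {"main_column": "A", "sub_column1": "x", "sub_column2": "2"},
    {"main_column": "A", "sub_column1": "y", "sub_column2": "1"},
    {"main_column": "B", "sub_column1": "x", "sub_column2": "1"},
]
for r, enc in zip(rows, path_count_encode(rows)):
    print(r, enc)
```

This turns three high-cardinality columns into three numeric features; tree-based classifiers usually handle such encodings well.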

Topic: dummy-variables hierarchical-data-format classification pandas

Category: Data Science

Best classification technique for following kind of data set

Alex Skarulis

March 14, 2019 15:24

I have a large table where each record or row represents a single salesperson, and there are 50 columns or dimensions where each column represents one of 50 products potentially sold by any given salesperson, with one final column representing their total compensation as a percentile of their salesperson peers. The values within each column range from 0 to 100, reflective of the salesperson's percentile performance in sales for that product, and then in the final column, percentile in total …

Topic: pca hierarchical-data-format classification k-means clustering

Category: Data Science

Machine learning for predicting HTML Elements on a web page?

Mathias Mamsch

February 28, 2019 13:50

My goal is to implement an assistant for crawling web data for users who don't understand anything about HTML or the DOM. I will show a web page to the user, and the user has to select what data on the page he is interested in (or what data he is not interested in). Example: if the user clicks on a cell inside a table, it is very likely he wants to extract all elements in that column. He might only be …

Topic: hierarchical-data-format predictive-modeling data-mining machine-learning

Category: Data Science

Using an ontology to recognize named entities in free text

xApple

September 29, 2018 08:46

I'm trying to solve a fairly basic problem in NLP efficiently. What tool or software package would you use to identify the words, or groups of words, in a free text that are part of a given ontology? Let's imagine the inputs are the following dummy ontology: And this publication's abstract: This study evaluates the addition of metformin to standard of care in locally advanced and metastatic prostate cancer, half the patients will receive metformin in combination with standard treatment, …
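A simple baseline is dictionary (gazetteer) matching: lowercase and tokenize the text, then at each position take the longest run of tokens that is an ontology term. The question's ontology is not shown, so the terms below are hypothetical; for large ontologies, spaCy's `PhraseMatcher` or an Aho-Corasick automaton performs the same kind of matching more efficiently.

```python
import re


def find_terms(text, terms):
    """Greedy longest-match lookup of multi-word ontology terms in free text."""
    vocab = {tuple(t.lower().split()) for t in terms}
    max_len = max(len(v) for v in vocab)
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in vocab:
                found.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return found


ontology = ["metformin", "prostate cancer", "metastatic prostate cancer"]  # hypothetical
abstract = ("This study evaluates the addition of metformin to standard of care in "
            "locally advanced and metastatic prostate cancer, half the patients will "
            "receive metformin in combination with standard treatment")
print(find_terms(abstract, ontology))
# ['metformin', 'metastatic prostate cancer', 'metformin']
```

Longest match means "metastatic prostate cancer" wins over the shorter "prostate cancer" when both are in the ontology.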

Topic: named-entity-recognition hierarchical-data-format nlp

Category: Data Science

Efficient dynamic clustering

alexT

May 21, 2018 20:48

I have a set of data points from the unit interval (i.e. a 1-dimensional dataset of numerical values). I receive additional data points online, and moreover the values of some data points might change dynamically. I'm looking for an ideal clustering algorithm that can handle these issues efficiently. I know sequential k-means clustering copes with the addition of new instances, and I suppose with minor modification it can work with dynamic instance values (i.e. first taking the modified instance from the respective cluster, …
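The modification described (take the changed point out of its cluster, then reinsert it with its new value) can be done in O(k) per update by keeping a running sum and count per cluster instead of recomputing means. A minimal sketch for 1-D values, assuming k and the initial centroids are given (the values below are hypothetical). Like sequential k-means, it never reassigns points that are not touched, so over time it can drift from what a full batch rerun would produce; an occasional batch pass is one way to correct that.

```python
class DynamicKMeans1D:
    """Sequential k-means on scalars that also supports removing/updating points."""

    def __init__(self, initial_centroids):
        self.centroids = [float(c) for c in initial_centroids]
        self.sums = [0.0] * len(self.centroids)
        self.counts = [0] * len(self.centroids)
        self.members = {}  # point id -> (cluster index, value)

    def _refresh(self, c):
        # An emptied cluster keeps its last centroid instead of dividing by zero.
        if self.counts[c]:
            self.centroids[c] = self.sums[c] / self.counts[c]

    def add(self, pid, value):
        """Assign a new point to the nearest centroid and update that mean."""
        c = min(range(len(self.centroids)),
                key=lambda j: abs(value - self.centroids[j]))
        self.sums[c] += value
        self.counts[c] += 1
        self.members[pid] = (c, value)
        self._refresh(c)
        return c

    def remove(self, pid):
        """Take a point out of its cluster and update that mean."""
        c, value = self.members.pop(pid)
        self.sums[c] -= value
        self.counts[c] -= 1
        self._refresh(c)

    def update(self, pid, new_value):
        """A changed value is a removal from the old cluster plus a re-insertion."""
        self.remove(pid)
        return self.add(pid, new_value)


km = DynamicKMeans1D([0.2, 0.8])  # hypothetical starting centroids
for pid, v in enumerate([0.1, 0.15, 0.3, 0.7, 0.9]):
    km.add(pid, v)
print([round(c, 3) for c in km.centroids])  # [0.183, 0.8]
km.update(2, 0.85)  # point 2 changes from 0.3 to 0.85
print([round(c, 3) for c in km.centroids])  # [0.125, 0.817]
```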

Topic: hierarchical-data-format algorithms k-means clustering machine-learning

Category: Data Science