Features of the Fourier transform for machine learning

I intend to extract features from time-domain measurement data and feed them to machine learning algorithms to detect anomalies. In the time domain, I extract the mean, RMS, skewness and standard deviation. I also want to compute a Fourier transform and extract features from it. Intuitively, I would pick the mean frequency and the peak frequency for different frequency bands. Unfortunately, I can't find any literature on the topic, or other people who extracted features from the Fourier transform …
Category: Data Science
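
One possible way to compute the band-wise features this question describes is sketched below with NumPy. The function name `band_features`, the band edges and the synthetic two-tone signal are assumptions made for illustration; the mean frequency is taken here as the power-weighted average frequency within each band, which is only one of several reasonable definitions.

```python
import numpy as np

def band_features(signal, fs, bands):
    """Return a (mean_frequency, peak_frequency) pair for each frequency band."""
    signal = np.asarray(signal, dtype=float)
    # Remove the DC offset so the 0 Hz bin does not dominate the lowest band.
    spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    features = []
    for low, high in bands:
        mask = (freqs >= low) & (freqs < high)
        power = spectrum[mask]
        total = power.sum()
        if total == 0.0:
            # Empty or silent band: no meaningful frequency to report.
            features.append((float("nan"), float("nan")))
            continue
        mean_freq = float((freqs[mask] * power).sum() / total)
        peak_freq = float(freqs[mask][np.argmax(power)])
        features.append((mean_freq, peak_freq))
    return features

# Synthetic test signal (assumed): 50 Hz and 220 Hz tones sampled at 1 kHz.
fs = 1000.0
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 50.0 * t) + 0.5 * np.sin(2 * np.pi * 220.0 * t)
feats = band_features(x, fs, bands=[(0.0, 100.0), (100.0, 500.0)])
```

The resulting pairs can be appended to the time-domain features (mean, RMS, skewness, standard deviation) to form one feature vector per measurement window.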

How do I use TF*IDF scores for my machine learning model?

I have applied TF*IDF on the 'Ad-topic line' column of my dataset. For every ad-topic line, I get the same output: Firstly, I am unable to make sense of the output. The TF*IDF values are mentioned to the right, but what exactly are the numbers in brackets? I plan to use these for my logistic regression model for classification. How exactly do I feed these values to the algorithm?
Category: Data Science

What is the suggested way to create features (Mel-Spectrograms) from speech signal for classification with ResNet?

At the moment I have this piece of code, which cuts a spectrogram into fixed-length tensors:
    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in range(0, len(l[0][0]), n):
            if i + n < len(l[0][0]):
                yield l.narrow(2, i, n)
The following piece of code downsamples the audio, creates mel spectrograms and takes their log, applies cepstral mean and variance normalization, then cuts the spectrogram with the code above into fixed-length pieces and appends it …
Category: Data Science

Extract features from parts of one image

I have several parts of one image that share one caption. I need to do image captioning by evaluating every part of the image to which the caption belongs. Do I need to extract the features from the parts of the image and pass them to the model together with the caption, or how else can I do it? For example, the dataset I have consists of the parts of the image, which is divided into three parts “beach, sea, …
Category: Data Science

Log mel energies

I want to convert a mel spectrogram to log mel energies. What I used is:
    y, sr = librosa.load(filename, sr=16000)
    mel_spectrogram = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, n_fft=1024, hop_length=512, power=2)
    log_mel_spectrogram = librosa.power_to_db(mel_spectrogram)
I thought this converts to mel energies, but I found this line of code:
    log_mel_spectrogram = 20.0 / power * np.log10(np.maximum(mel_spectrogram, sys.float_info.epsilon))
My question is: what is the difference between log-mel spectrograms and log mel energies, and which line of code should I use?
Category: Data Science
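
As a hedged illustration of how the two expressions in this question relate: with `power=2`, the factor `20.0 / power` equals 10, so the second line is a plain `10 * log10` conversion of the power values. `librosa.power_to_db` also computes `10 * log10(S / ref)`, but with its own floor (`amin`) and, by default, clipping relative to the maximum (`top_db`), so small differences can appear at very low energies. The sketch below uses only NumPy and a synthetic array as a stand-in for a mel power spectrogram; it does not call librosa.

```python
import sys
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a mel power spectrogram (n_mels x frames); not real audio.
mel_power = rng.uniform(1e-6, 1.0, size=(128, 20))
power = 2.0
floor = sys.float_info.epsilon

# Expression quoted in the question: 20 / power * log10(max(S, eps)).
log_mel_a = 20.0 / power * np.log10(np.maximum(mel_power, floor))
# Plain power-to-decibel conversion: 10 * log10(max(S, eps)).
log_mel_b = 10.0 * np.log10(np.maximum(mel_power, floor))

same = bool(np.allclose(log_mel_a, log_mel_b))
```

Under these assumptions both lines yield the same log-scaled mel energies, so the choice mainly affects how very small values are floored or clipped.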

Calculate features on stationary time-series data

I am trying to create a deep learning model that predicts the future price of cryptocurrencies based on past data. I downloaded the Open, High, Low, Close and Volume (OHLCV) data from Yahoo Finance and made it stationary by differencing it. Now I also want to calculate some technical indicators from the OHLCV data, for example the simple or exponential moving average. I'm guessing that the calculated features also need to be stationary. Is that correct? So do I …
Category: Data Science

Extract features from low resolution

I have medical images and need to extract features from the layer before the classification layer, using VGG for example, but the resolution of the images is not sufficient. Will the features be unaffected if I do not improve the resolution, or do I need to improve the resolution before extracting the features? For color images, I was extracting the features with VGG using this preprocessing: preprocess = T.Compose([ T.Resize(256, interpolation=3), T.CenterCrop(224), T.ToTensor(), T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, …
Category: Data Science

What are some good methods to forecast future revenue on categorical and value based data?

I have monthly snapshots (3 years) of all the contract data. It includes the following information: Contract status [Categorical]: proposed, tracked, submitted, won, lost, etc.; Contract stages [Categorical]: prospecting, engaged, tracking, submitted, etc.; Duration of contract [Date/Time]: months and years; Bid start date [Date/Time]: date (but this changes when the contracts are delayed); Contract value [Numerical]: value of the contract in local currency; Future revenue projection [Numerical]: currency value breakdown of revenue for the next 5 years (this value is …
Category: Data Science

Task of regression on graphs

Which tools are available to extract features from a graph? After that, I would like to perform regression on those features. Initially, I was thinking about using the adjacency matrix of the graph, but maybe there is a smarter way of doing feature extraction on graphs.
Category: Data Science
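
As one hedged illustration of going beyond the raw adjacency matrix, the sketch below derives a few simple structural features from it with NumPy. It assumes an undirected, unweighted graph without self-loops; the function names `node_features` and `graph_features` and the choice of features (degree, triangle count, local clustering coefficient, density) are illustrative, not a recommendation of a specific tool.

```python
import numpy as np

def node_features(adj):
    """Per-node degree, triangle count and local clustering coefficient."""
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)
    # diag(A^3) counts closed walks of length 3; each triangle through a node
    # is traversed in two directions, hence the division by 2.
    triangles = np.diagonal(adj @ adj @ adj) / 2.0
    possible = degree * (degree - 1) / 2.0
    clustering = np.divide(
        triangles, possible, out=np.zeros_like(triangles), where=possible > 0
    )
    return np.column_stack([degree, triangles, clustering])

def graph_features(adj):
    """Fixed-length graph summary: node count, density, mean degree, mean clustering."""
    per_node = node_features(adj)
    n = per_node.shape[0]
    density = per_node[:, 0].sum() / (n * (n - 1)) if n > 1 else 0.0
    return np.array([n, density, per_node[:, 0].mean(), per_node[:, 2].mean()])

# Example graph (assumed): a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
per_node = node_features(adj)
summary = graph_features(adj)
```

A fixed-length summary vector like `summary` can be used as a regression input when each sample is a whole graph; the per-node rows suit node-level regression.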

MR images segmentation for feature extraction

I have datasets of brain MR images with tumours; the tumours have already been selected manually by a physicist using ImageJ. I have read about segmentation, but I still don't understand how features are extracted from a segmented image. Should the images contain only the tumour on a black background, as shown in the images below, so that feature extraction is run on the whole image? Or are features extracted only from the region of interest using …
Category: Data Science

Mapping between original feature space and an interpretable feature space

I'm reading the following really interesting paper, https://arxiv.org/pdf/1602.04938.pdf, on local interpretable model explanations. On page 3, in section 3.3 (Sampling for Local Exploration), the authors mention obtaining perturbed samples $z' \in \{0,1\}^{d'}$. The paper then says "we recover the sample in the original representation $z \in \mathbb{R}^{d}$ and obtain $f(z)$", with no indication of how this is done. Surely the map is not injective? If not, how would you know you recovered the correct sample? To this end, I am wondering how …
Category: Data Science

How to align sliding windows to extract features from multi-modal time series data?

I have two datasets that were collected at the same time but at different sampling frequencies: one is recorded at 128 Hz and the other at 512 Hz. I am trying to extract some features using the moving-window technique, but I have some problems. The frequencies of the two datasets are different. The timestamps are in Unix format with nanosecond resolution, hence there won't be any match at the start and end of each second or minute. One of the datasets …
Category: Data Science
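
One way to sidestep the mismatched timestamps, sketched here under stated assumptions, is to define the windows on a shared time grid and let each stream contribute whatever samples fall inside each window, instead of matching individual samples. The 1-second window length, the synthetic streams and the helper names `synthetic_stream` and `window_means` are hypothetical; the mean stands in for any per-window feature.

```python
import numpy as np

WINDOW_NS = 1_000_000_000  # window length in nanoseconds (assumed: 1 second)

def synthetic_stream(rate_hz, duration_s, offset_ns):
    """Hypothetical stream: nanosecond timestamps plus a constant-valued signal."""
    n = int(rate_hz * duration_s)
    timestamps = offset_ns + (np.arange(n) * 1e9 / rate_hz).astype(np.int64)
    values = np.full(n, float(rate_hz))
    return timestamps, values

def window_means(timestamps, values, edges):
    """Mean of `values` inside each half-open window [edges[k], edges[k+1])."""
    idx = np.searchsorted(timestamps, edges, side="left")
    return np.array([
        values[a:b].mean() if b > a else np.nan
        for a, b in zip(idx[:-1], idx[1:])
    ])

ts_slow, x_slow = synthetic_stream(128, 4, offset_ns=0)
ts_fast, x_fast = synthetic_stream(512, 4, offset_ns=1_234_567)

# Shared window edges covering only the time span present in both streams.
start = max(ts_slow[0], ts_fast[0])
stop = min(ts_slow[-1], ts_fast[-1])
edges = np.arange(start, stop + 1, WINDOW_NS)

# One row per window, one column per stream.
features = np.column_stack([
    window_means(ts_slow, x_slow, edges),
    window_means(ts_fast, x_fast, edges),
])
```

Because the windows are defined by time rather than by sample count, each row of `features` describes the same interval in both streams even though the 512 Hz stream contributes four times as many samples per window.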

Single image feature reduction at inference time: SVM

I am trying to train an SVM classifier using scikit-learn. At training time I want to reduce the feature vector dimension, and I have used PCA to do so:
    pp = PCA(n_components=400).fit(features)
    features = pp.transform(features)
PCA requires an m x n dataset to determine the variance, but at inference time I have only a single image and the corresponding 1-D feature vector. I am wondering how to reduce the feature vector at inference time so that it matches the training dimension. Or …
Category: Data Science
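
The usual pattern for this situation is to fit the projection once on the training set and reuse it unchanged at inference; with scikit-learn that means calling the already fitted `pp.transform` on the single vector reshaped to `(1, n_features)`, without fitting again. The NumPy sketch below mimics that fit-then-reuse split so it runs without scikit-learn; the synthetic data, the small dimensions and the helper name `project` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(50, 10))  # synthetic training features (m x n)
n_components = 3

# "Fit" step: learn the mean and principal axes from the training set only.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:n_components]  # shape (n_components, n_features)

def project(x):
    """Apply the stored projection to one vector or to a batch of vectors."""
    x = np.atleast_2d(x)  # a single 1-D vector becomes shape (1, n_features)
    return (x - mean) @ components.T

# Inference on a single feature vector reuses the stored mean and axes.
single = rng.normal(size=10)
reduced = project(single)  # shape (1, n_components)
```

The variance is estimated only during fitting, so a single sample is enough at inference time as long as the stored mean and components are kept alongside the trained classifier.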

Neural Network One-hot Feature concatenation

I'm trying to add features to a model with two one-hot encoded features. The features are defined like this:
    vocabulary = "ACGU"
    mapping_characters = list(vocabulary)
    integer_mapping = {x: i for i, x in enumerate(list(vocabulary))}
    n1 = [integer_mapping[word] for word in df[1][i]]
Afterwards I'd like to add an additional one-dimensional feature. If I use a concatenate layer, this means the model I'm using will go from (N, L, 4) dimensions per sample to (N, L, 5) dimensions, with the one-hot …
Category: Data Science
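
To make the shape change in this question concrete, here is a minimal NumPy sketch of concatenating a one-hot encoded sequence with one extra per-position feature along the last axis. The example sequence and the values in `extra` are hypothetical, and a single sample of shape (L, 4) is shown instead of a full (N, L, 4) batch.

```python
import numpy as np

vocabulary = "ACGU"
integer_mapping = {ch: i for i, ch in enumerate(vocabulary)}

def one_hot(sequence):
    """Encode a sequence over the vocabulary as an (L, 4) one-hot array."""
    idx = np.array([integer_mapping[ch] for ch in sequence])
    return np.eye(len(vocabulary))[idx]

seq = "GAUC"
extra = np.array([0.1, 0.7, 0.3, 0.9])  # hypothetical per-position feature, shape (L,)

# Add a trailing axis to `extra`, then concatenate along the feature axis.
combined = np.concatenate([one_hot(seq), extra[:, None]], axis=-1)  # shape (L, 5)
```

The first four columns stay one-hot and the fifth carries the additional value, which corresponds to the per-sample layout a concatenate layer would produce along the last axis.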

statistical significance test between binary label features

I have 667 features and I want to find the features whose values differ significantly between the two classes of a binary label before I apply a classification model (e.g. Naive Bayes or SVM), to improve how well the classification model learns. What I know is that if a feature's values overlap between the two classes, classification will be poor. Hence, I ran a two-sample t-test to calculate the statistical significance of each feature with respect to the binary class label. from scipy import stats p=[] failure …
Category: Data Science

I am looking for general image-based clustering methods

My task is to cluster some images. I decided to use the VGG model to extract the features and then use K-Means to cluster these features. My question: when I use VGG as a feature extractor, should I make sure that the VGG model was trained on this type of image before? Otherwise the VGG model does not generalize to all types of images, am I right? I am looking for a general method to cluster images regardless …
Category: Data Science

Feature extraction from relational database

In order to build a classifier, I need to extract a few features from data stored in a MySQL database. I need to join multiple tables, and it is taking a lot of time. I have joined two tables at a time and got results for multiple cases, which I now need to combine. Would writing a script be the best option? How do people extract features from large relational databases? Am I missing something? Thanks.
Category: Data Science

Is Self-Supervised Learning a task of Representation Learning?

Maybe a weird question but: Currently, I'm writing a seminar paper about Self Supervised Learning for time series data. For this paper, I have to find methods to prepare unlabelled time series data with SSL techniques to perform a classification task. In a scientific paper, I was able to find time series representation learning methods. Another SSL paper used one of those methods to do the classification on a specific dataset. Now I have to admit that I'm kind of …
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.