Mapping between original feature space and an interpretable feature space

I'm reading the following really interesting paper, https://arxiv.org/pdf/1602.04938.pdf, on Local Interpretable Model-agnostic Explanations (LIME). On page 3, in Section 3.3 ("Sampling for Local Exploration"), the authors describe obtaining perturbed samples $z' \in \{0,1\}^{d'}$. They then say "we recover the sample in the original representation $z \in \mathbb{R}^{d}$ and obtain $f(z)$", with no indication of how this is done. Surely the map is not injective? If not, how would you know you recovered the correct sample? To this end, I'm wondering how …
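One way to read the recovery step, at least for text, where the paper's interpretable representation is a binary vector marking the presence or absence of words: the map from the original space to the interpretable space is indeed not injective, but LIME never needs a global inverse. Each $z'$ is drawn by keeping a random subset of the nonzero components of $x'$, so the recovery only has to be defined relative to the instance $x$ being explained. The minimal sketch below assumes a bag-of-words view in which component $i$ of $z'$ corresponds to the $i$-th word of $x$ and a 0 means "drop that word"; the names `recover_text` and `sample_z_prime` are hypothetical and are not taken from the paper or from the `lime` package.

```python
import random


def recover_text(tokens, z_prime):
    """Map a binary interpretable vector z' back to a text instance z.

    tokens: words of the instance x being explained; component i of z'
    is assumed to correspond to tokens[i]. A 1 keeps the word, a 0 removes it.
    """
    if len(tokens) != len(z_prime):
        raise ValueError("z_prime needs one entry per interpretable component")
    return " ".join(tok for tok, keep in zip(tokens, z_prime) if keep)


def sample_z_prime(n_components, rng):
    """Keep a uniformly drawn number of components of x', chosen uniformly at random."""
    k = rng.randint(1, n_components)  # how many components to keep (inclusive bounds)
    kept = set(rng.sample(range(n_components), k))
    return [1 if i in kept else 0 for i in range(n_components)]


tokens = "this movie was great".split()
rng = random.Random(0)
z_prime = sample_z_prime(len(tokens), rng)
z = recover_text(tokens, z_prime)  # z is back in the original (text) space, so f(z) can be computed
```

Because every $z'$ is a "masked" version of $x'$, there is exactly one $z$ per $z'$ once $x$ is fixed, which is why the paper can speak of recovering the sample without ambiguity.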
Category: Data Science

Machine learning frameworks for tree-based models

Background: It's well known that PyTorch and TensorFlow are currently the most widely used frameworks for deep learning (DL) research. As far as I know, most researchers (applied or theoretical) who contribute to the field of DL perform their experiments with PyTorch. Specifically, its level of abstraction is just right for trying out custom architectures or models without having to build everything from scratch. Question: What about research in another popular field of machine learning, tree-based methods and ensembles? I am thinking …
Category: Data Science

What kind of regression model should I use?

My research question is to examine the effect of "receiving attention" from other members of an online community on "sustained participation" on the website. I decided to measure each user's "sustained participation" by calculating the average time difference between the user's submissions. I calculated it in the following way (the formula is not reproduced here). I measured "attention" as the total number of comments each user received across all the submissions he/she has posted. I also want to consider the total number of votes …
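Since the exact formula did not survive, here is a minimal sketch that assumes "average time difference" means the mean gap between consecutive submissions of the same user, and that each submission record carries the number of comments it received. The record layout `(user_id, timestamp, n_comments)` and the function name `user_metrics` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def user_metrics(submissions):
    """submissions: iterable of (user_id, timestamp: datetime, n_comments: int).

    Returns {user_id: (mean_gap_hours or None, total_comments)}.
    """
    times = defaultdict(list)
    comments = defaultdict(int)
    for user, ts, n_comments in submissions:
        times[user].append(ts)
        comments[user] += n_comments  # "attention": comments summed over all submissions

    metrics = {}
    for user, ts_list in times.items():
        ts_list.sort()  # consecutive gaps only make sense in chronological order
        gaps = [(b - a).total_seconds() / 3600 for a, b in zip(ts_list, ts_list[1:])]
        mean_gap = sum(gaps) / len(gaps) if gaps else None  # undefined for a single submission
        metrics[user] = (mean_gap, comments[user])
    return metrics


submissions = [
    ("u1", datetime(2021, 1, 2, 12, 0), 3),
    ("u1", datetime(2021, 1, 1, 0, 0), 2),
    ("u1", datetime(2021, 1, 1, 12, 0), 0),
    ("u2", datetime(2021, 1, 5, 9, 0), 4),
]
m = user_metrics(submissions)  # u1: gaps of 12 h and 24 h
```

Two things worth noting about this definition: the mean of consecutive gaps telescopes to (last − first) / (n − 1), so it depends only on the first and last submissions and the count; and it is undefined for users with a single submission, who will need explicit handling before any regression is fitted.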
Category: Data Science

Topic / concept learning difficulty prediction

I have been exploring the field of learning analytics. Many research papers focus on predicting course scores or grades, but I am looking for work on predicting which concepts or topics students find difficult to learn or to score well on. I have not found any research papers or datasets on this. Can someone point me to some research on "topic/concept difficulty prediction"?
Category: Data Science

Model does not learn after ternarization of weights, contrary to the paper mentioned below

I’m implementing the ‘Ternary Weight Networks’ paper by Fengfu Li and Bo Zhang (arXiv link: https://arxiv.org/abs/1605.04711). I’m training a simple ConvNet with linear layers on the MNIST dataset. Without ternarization, the exact same model converges with high accuracy, but after ternarization of the linear layers, the model does not perform well at all. It either gets stuck in a local optimum (in which it predicts all the classes with equal probability of 0.1), or gets up …
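For reference when debugging, the ternarization rule in the TWN paper, as I understand it, sets a threshold $\Delta \approx 0.7\,\mathbb{E}|W|$, zeroes weights with $|W_i| \le \Delta$, and maps the rest to $\pm\alpha$, where $\alpha$ is the mean magnitude of the weights above the threshold. Below is a minimal pure-Python sketch on a flat list of weights; the function name `ternarize` and the `delta_scale` parameter are my own, not from the paper.

```python
def ternarize(weights, delta_scale=0.7):
    """Ternarize a flat list of weights using the TWN threshold/scale approximation.

    delta = delta_scale * mean(|W|)
    alpha = mean of |W_i| over the weights with |W_i| > delta
    Returns (ternary_weights, alpha, delta).
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    delta = delta_scale * sum(abs(w) for w in weights) / len(weights)
    large = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(large) / len(large) if large else 0.0
    # Each weight becomes +alpha, 0 or -alpha depending on where it falls relative to delta
    ternary = [alpha if w > delta else -alpha if w < -delta else 0.0 for w in weights]
    return ternary, alpha, delta


tern, alpha, delta = ternarize([0.9, -0.6, 0.05, -0.02, 0.3])
```

In a network, this would be applied per layer in the forward pass. It is common practice (as in BinaryConnect-style training) to keep the full-precision weights and apply the optimizer update to them rather than to the ternarized copy; updating the ternary values directly is one plausible reason training could stall.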
Category: Data Science

Adversarial attacks on non-image data

Reading the literature on adversarial attacks in deep learning, it appears to be concentrated almost entirely on attacks against image classification models. Are there papers that describe attacks on non-image data? Searching arXiv for deep learning adversarial attacks seems to return only results related to image classification.
Category: Data Science

Dataset for IoT activities

First of all, I am not sure whether this is the correct Stack Exchange site for this question. If not, please let me know :) My question: I am looking for a dataset of IoT activities. Ideally, it should include network traces associated with the activity a device was performing at a given time. Here, 'activity' means whatever the device was doing on behalf of a user (e.g., (device, activity) pairs such as (smart camera, live streaming) or (smart switch, ON/OFF activity)). …
Category: Data Science

For a student who is a beginner in quantitative research and statistics, which is the better statistical tool to start with: R or IBM SPSS? Why?

I am currently writing my research design, but I am still undecided about which statistical tool to use for the data analysis. I tried searching online and found conflicting answers to my question. I have noticed that R (the programming language) and IBM SPSS (Statistical Package for the Social Sciences) are two tools that come up repeatedly when this question is asked. So, which is better? I need some insights so I can settle …
Category: Data Science

What does the term "seed lexicon" mean?

I am reading an NLP research paper and came across the phrase "seed lexicon". Could someone please explain it in detail? Edit: a sample paper is "Leveraging Affective Bidirectional Transformers for Offensive Language Detection"; see page 3, right column, line 5.
Category: Data Science

Should I reshuffle the training set when benchmarking neural networks?

I'm trying to set up a fair benchmark of various RNN models, each of which is trained until convergence with a fixed random seed. Because training is very costly, I can only run each model once and then compare their performance. By reshuffling the training set, I would change the loss surface every epoch, with the result that the models converge to minima that generalize better. But assuming that my random seed is fixed and the training …
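Reshuffling every epoch and a fixed seed are not mutually exclusive. A minimal sketch, assuming the data order is driven by its own dedicated random generator (the function name `epoch_orders` is hypothetical): seeding that generator once makes the whole sequence of per-epoch permutations reproducible, so every model in the benchmark can see identical data orders.

```python
import random


def epoch_orders(n_examples, n_epochs, seed):
    """Return the example order used in each epoch, reshuffled every epoch.

    A dedicated RNG, seeded once, makes the full sequence of permutations
    reproducible across runs and across models.
    """
    rng = random.Random(seed)
    orders = []
    for _ in range(n_epochs):
        order = list(range(n_examples))
        rng.shuffle(order)  # new permutation each epoch, drawn from the same seeded stream
        orders.append(order)
    return orders


run_a = epoch_orders(10, 3, seed=42)
run_b = epoch_orders(10, 3, seed=42)  # same seed, so the same sequence of shuffles
```

The reason for a separate generator rather than only a global seed: different architectures consume different numbers of random draws during weight initialization, so with a single shared stream the data order would silently differ between models even with "the same" seed.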
Category: Data Science

Can we consider the meta-features of a dataset as its embedding?

While reading some work on meta-learning, I had this doubt: can we consider the meta-features of a dataset as its embedding? Meta-features are a lower-dimensional representation that also tries to retain the properties of a dataset, and embeddings are essentially low-dimensional representations of some high-dimensional concept. Is it fair to use "embeddings" instead of "meta-features"? Or can we use "representation" instead of "meta-features"?
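To make the comparison concrete, here is a minimal sketch of a few hand-crafted meta-features. The particular choice (log number of instances, log number of features, number of classes, class entropy) is illustrative only, and the function name `simple_meta_features` is hypothetical. Like an embedding, the result is a fixed-length vector for a dataset of any size; unlike a learned embedding, nothing in it is trained.

```python
import math
from collections import Counter


def simple_meta_features(X, y):
    """Return a fixed-length vector of simple meta-features for a dataset.

    X: list of rows (lists of floats); y: list of class labels.
    Vector: [log #instances, log #features, #classes, class entropy in bits].
    """
    if not X or len(X) != len(y):
        raise ValueError("X must be non-empty and aligned with y")
    n, d = len(X), len(X[0])
    counts = Counter(y)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [math.log(n), math.log(d), float(len(counts)), entropy]


X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [0, 0, 1, 1]
vec = simple_meta_features(X, y)  # balanced binary labels give an entropy of 1 bit
```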
Category: Data Science

Developing a deep learning hybrid architecture for a particular problem is a highly complicated task

I am currently conducting research on an application of deep learning (sensor signal recognition). I spent about a year and a half sifting through the literature and noticed some research patterns. First came convolutional neural networks (CNNs): researchers applied CNNs to their problems and reported state-of-the-art results. Then LSTMs came into use; they were quickly adopted and declared state-of-the-art. Then the trend shifted: people began to use hybrid architectures and reported cutting-edge results. The current trend is to …
Category: Data Science

Interactive plot of topics over time

I am working on an NLP program to extract and analyze the topics of research papers based only on their abstracts. I would like a plot like the one in my original post (the example image is not reproduced here). However, when I click on a line, I would like a new page to open with a list of the top N most relevant research papers, showing the topic prevalence percentage next to each title, along with the abstract text. All of this in Python. Do you know a way to achieve this? …
Category: Data Science

[Guidance Needed] Research in data science

Imagine you're a research intern and an undergraduate student. You have some experience in data science but are new to research. Your task is to conduct research on vision transformers. Oh, and by the way, you're new to transformer concepts. You first learn transformers and strengthen your fundamentals. You implement and check the original vision transformer code. All fine! You even check and run the code of current SOTA papers on vision transformers. Now you have gained enough knowledge to conduct your own …
Category: Data Science

How to determine the abnormality of a specific variable by taking into account all the other variables in the data?

I have a machine learning/anomaly detection problem. I have a variable Y and several other variables X. The goal is to quantify the degree of abnormality of the values of Y, while taking into account the values of the other variables (i.e., the relationship between Y and X). Normally, an anomaly detection algorithm would find anomalies in the data as a whole (Y + X), but in my case I want to focus on Y because …
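One common way to score Y "given X", offered here as one possible approach rather than the only one, is to model the expected value of Y from X and then measure how far each observed Y falls from that expectation. The minimal sketch below assumes a linear relationship fitted by ordinary least squares with NumPy and scales the residuals by the median absolute deviation so that the anomalies themselves do not inflate the scale. The function name `conditional_anomaly_scores` and the synthetic data are hypothetical.

```python
import numpy as np


def conditional_anomaly_scores(X, y):
    """Score how unusual each y_i is given its X_i, via robustly scaled OLS residuals.

    X: array of shape (n, p); y: array of shape (n,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef  # deviation of Y from what X predicts
    # Median absolute deviation, rescaled to be comparable to a standard deviation
    mad = np.median(np.abs(residuals - np.median(residuals)))
    scale = 1.4826 * mad if mad > 0 else residuals.std()
    if scale == 0:
        return np.zeros_like(residuals)
    return np.abs(residuals) / scale


x = np.arange(20, dtype=float).reshape(-1, 1)
noise = 0.1 * np.array([(-1) ** i for i in range(20)])
y = 2 * x[:, 0] + 1 + noise
y[7] += 10  # an unusual Y value for its X, although neither value is extreme on its own
scores = conditional_anomaly_scores(x, y)
```

The linear model is only a placeholder: any regressor for E[Y | X] (or a conditional quantile model, if the spread of Y also depends on X) can be swapped in, and the score stays "how surprising is this Y for this X" rather than "how unusual is this (X, Y) point overall".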
Category: Data Science

Custom Loss Function Equation

I am trying to reproduce a research paper on a classification problem in which the authors introduce a custom loss function that I am unable to understand. I think I have to implement equation (8) or equation (7), and I am using the TensorFlow framework. However, I do not understand equation (8): a custom loss takes both the actual and the predicted values as inputs, but their expression is $k \log(\lambda + 1)$. Similarly, in equation (7), they …
Category: Data Science

How can one assess the generality of classification methods?

Suppose there is a classification task $A$ and many methods $M_1, M_2, M_3, \dots$ for it. Performance on $A$ is measured with a consistent metric; for instance, if $A$ is binary classification, the F-score or the ROC curve can be used. I surveyed a research area and found that $M_1$ is evaluated on dataset $D_1$ (open) using only pre-processing $P_1$ (this seems to be the seminal work). $M_2$ is evaluated on datasets $D_1$ (open) and $D_2$ (private) and compared …
Category: Data Science

Current problems in data science/machine learning connected with music

I'm in my 4th year now and am searching for a topic for my diploma thesis in mathematics and computer science. Are there interesting problems or research areas that I can explore? I'd probably like to work in a field connected to art, e.g., music recommendations (though, of course, Spotify already exists). It was suggested that I take the topic "Authorship attribution in classical music", but I think this topic is already well explored.
Topic: research
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.