does R2 diverge because of a lack of input dimensions?

I am trying to improve the R2 score between theoretical and real output values. In the picture you can see two cases: the blue one is an artificial case I completely master, with 7 dimensions as input and 1 dimension as output; the orange curve is a real case, also 7 inputs and 1 output. As you can see, the blue curve responds as expected: the more data I add, the better the prediction becomes. But in the orange case, the opposite happens. …
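A minimal sketch of how one might check this, assuming scikit-learn; X and y below are random placeholders for the 7-input / 1-output data, and RandomForestRegressor stands in for whatever model the asker actually uses. Plotting R2 against training-set size (a learning curve) makes it easy to see whether the validation score diverges as data are added:

    # Minimal sketch: R^2 as a function of training-set size (learning curve).
    # X (n_samples, 7) and y (n_samples,) are placeholders, not the asker's data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import learning_curve

    X = np.random.rand(500, 7)                               # placeholder inputs
    y = X @ np.random.rand(7) + 0.1 * np.random.randn(500)   # placeholder target

    sizes, train_scores, val_scores = learning_curve(
        RandomForestRegressor(n_estimators=100, random_state=0),
        X, y,
        train_sizes=np.linspace(0.1, 1.0, 8),
        scoring="r2",
        cv=5,
    )

    for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
        print(f"n={n:4d}  train R2={tr:.3f}  validation R2={va:.3f}")

If the validation R2 drops while the training R2 stays high as n grows, the issue is more likely noise or non-stationarity in the real data than a lack of input dimensions.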
Category: Data Science

Reinforcement Learning vs Retraining

I have created a complex ML model using supervised learning. For the sake of discussion, let's say my model identifies dogs and a human labels each output as "correct" or "not correct". Now I want to improve my model, and I would like to understand when to use Reinforcement Learning and when to use retraining. Approach 1 (Reinforcement Learning): my current understanding is that in reinforcement learning we create an agent whose goal is to maximize a reward. In my example, …
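A minimal sketch of the retraining option only, assuming scikit-learn; the features, labels, and SGDClassifier below are hypothetical placeholders, not the asker's model. The idea it illustrates is simply folding the human "correct"/"not correct" feedback back into the labelled set and updating the supervised model incrementally:

    # Minimal sketch: incremental retraining on human-corrected labels.
    # All arrays and the classifier choice are placeholders / assumptions.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    classes = np.array([0, 1])                  # 0 = "not a dog", 1 = "dog"
    clf = SGDClassifier(random_state=0)

    # Initial supervised training on the original labelled data.
    X_train = np.random.rand(1000, 16)          # placeholder features
    y_train = np.random.randint(0, 2, 1000)     # placeholder labels
    clf.partial_fit(X_train, y_train, classes=classes)

    # Later: a batch of predictions a human reviewer has checked and corrected.
    X_feedback = np.random.rand(50, 16)         # placeholder features
    y_feedback = np.random.randint(0, 2, 50)    # labels corrected by the reviewer
    clf.partial_fit(X_feedback, y_feedback)     # incremental retraining step

Reinforcement learning, by contrast, would treat the human judgement as a reward signal for an agent's actions rather than as a corrected label for supervised updates.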
Category: Data Science

what level of discrepancy do I target for a good interpolation?

I'm performing some interpolation comparisons and, of course, the quality of the training sample is a key parameter to monitor. In this case I can create the dataset myself, so I try to build a good one (i.e., the minimum number of samples that still gives me a predictive model). How many experiments are required to obtain a predictive model? To answer this question I looked at how sparsely the data are spread in the …
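A minimal sketch of one way to quantify that sparsity, assuming SciPy; X below is a hypothetical design matrix, not the asker's experiments. It scales each input dimension to [0, 1] and looks at nearest-neighbour distances, so large gaps flag regions of the input space the interpolation has never seen:

    # Minimal sketch: nearest-neighbour spacing as a coverage check.
    # X is a placeholder design matrix (200 experiments, 7 inputs).
    import numpy as np
    from scipy.spatial import cKDTree

    X = np.random.rand(200, 7)
    X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

    tree = cKDTree(X_scaled)
    dist, _ = tree.query(X_scaled, k=2)     # k=2: the first hit is the point itself
    nn = dist[:, 1]                         # distance to the nearest other sample

    print(f"mean nearest-neighbour gap: {nn.mean():.3f}")
    print(f"max  nearest-neighbour gap: {nn.max():.3f}  (large gaps = poorly covered regions)")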
Category: Data Science

What are the metrics to evaluate Data Quality?

Data quality refers to the overall utility of a dataset as a function of how easily it can be processed and analyzed for other uses, usually by a database, data warehouse, or data-analytics system. What techniques and metrics can be used to evaluate the quality of a dataset before building any ML or statistical model? Are there standard metrics for data quality? Some measures I thought of considering: columns where nearly all values are identical (stability), …
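A minimal sketch of a few such column-level checks, assuming pandas; the small DataFrame below is a made-up placeholder. It computes missingness (completeness), the "stability" measure mentioned above (near-constant columns), and duplicate rows (uniqueness):

    # Minimal sketch: simple data-quality metrics with pandas.
    # The DataFrame is a hypothetical placeholder, not real data.
    import pandas as pd

    df = pd.DataFrame({
        "age":    [34, 45, None, 29, 45],
        "status": ["A", "A", "A", "A", "B"],
        "id":     [1, 2, 3, 4, 4],
    })

    missing_rate = df.isna().mean()                  # completeness per column
    top_freq = df.apply(lambda s: s.value_counts(normalize=True, dropna=True).iloc[0])
    near_constant = top_freq[top_freq > 0.9]         # "stability": almost a single value
    duplicate_rows = df.duplicated().sum()           # uniqueness

    print("missing rate per column:\n", missing_rate)
    print("near-constant columns:\n", near_constant)
    print("duplicate rows:", duplicate_rows)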
Category: Data Science

Public information quality trends vs quantity

With the great increase in publicly viewable information (content) supported by the internet and modern communications, is the average quality of that information decreasing, roughly static, or increasing? To put some bounds on that, let's constrain it to human-accessible information, and let's weight it by the number of views each piece of content receives, on whatever medium. Clearly, we have vastly more information, but its visibility is uneven. Put another way, do we know for sure how public information …
Topic: data-quality
Category: Data Science

How to mathematically quantify the quality of a corpus?

I am working on a text classification project. I have around 60,000 text samples covering 40 intents. By calculating the frequency of each intent I found there is a class imbalance, but that is just a subjective judgement I made about my data. Apart from this, are there any mathematical approaches by which I can generate an overall report on the quality of my training data? I am mainly focused on finding (mathematically): if there is any …
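A minimal sketch of two objective imbalance measures, assuming NumPy; the label array below is a synthetic placeholder standing in for the 60,000 intent labels. It reports the imbalance ratio (largest class over smallest class) and the normalised Shannon entropy of the label distribution (1.0 means perfectly balanced):

    # Minimal sketch: quantifying class imbalance in a labelled corpus.
    # `labels` is a placeholder, not the asker's data.
    import numpy as np
    from collections import Counter

    labels = np.random.choice([f"intent_{i}" for i in range(40)],
                              size=60000, p=np.random.dirichlet(np.ones(40)))

    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()

    imbalance_ratio = counts.max() / counts.min()
    norm_entropy = -(p * np.log(p)).sum() / np.log(len(p))

    print(f"classes: {len(p)}, imbalance ratio: {imbalance_ratio:.1f}, "
          f"normalised entropy: {norm_entropy:.3f}")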
Category: Data Science

UI-based Tool for Qualitative Evaluation of Data Quality

Dear DS StackExchange community, I'm currently searching the web for a (near-)ready-to-use solution to perform a qualitative evaluation of features extracted from video data. In my head the tool looks something like the screenshot below (taken from the annotation tool Prodigy), in the sense that a video is displayed at the top and underneath one would see a plot of a corresponding feature (selected, e.g., via a drop-down menu) extracted from the video. This includes (nearly) every kind of data …
Category: Data Science

What do we mean by permissible transformations for the attribute types nominal, ordinal, interval, and ratio?

I am studying data mining and I stumbled upon the types of attributes: nominal, ordinal, interval, and ratio. The data mining book by Tan, Steinbach, and Kumar gives the permissible transformations as: nominal: any one-to-one mapping, e.g., a permutation of values; ordinal: new_value = f(old_value), where f is an order-preserving change of values; interval: new_value = a*old_value + b; ratio: new_value = a*old_value. I tried making sense of this, but could not really understand what it is trying to say. What I know: nominal attributes provide enough information to distinguish one …
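A minimal worked example of the rules quoted above (my own illustration, not from the book): Celsius to Fahrenheit is an interval transformation (new = a*old + b), while kg to lb is a ratio transformation (new = a*old), and the difference shows up in whether ratios of values keep their meaning:

    # Interval transformation: Fahrenheit = 1.8 * Celsius + 32.
    celsius = [10.0, 20.0, 40.0]
    fahrenheit = [1.8 * c + 32 for c in celsius]

    # Ratios of values are NOT preserved: 40 C is not "twice as hot" in F.
    print(celsius[2] / celsius[1], fahrenheit[2] / fahrenheit[1])   # 2.0 vs ~1.53

    # Ratio transformation: lb = 2.20462 * kg (b must be 0).
    kg = [10.0, 20.0, 40.0]
    lb = [2.20462 * x for x in kg]

    # Ratios ARE preserved, which is what makes weight a ratio attribute.
    print(kg[2] / kg[1], lb[2] / lb[1])                             # 2.0 vs 2.0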
Category: Data Science
