does R2 diverge because of a lack of input dimensions?

I am trying to improve the R2 score between theoretical and real output values. In the picture you can see two cases: the blue one is an artificial case I completely master, with 7 dimensions as input and 1 dimension as output; the orange curve is a real case, also 7 inputs and 1 output. As you can see, the blue curve responds as expected: the more data I add, the better the prediction becomes. But in the orange case, the opposite happens. …
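A minimal sketch of how one might check this, assuming scikit-learn; X and y below are random placeholders for the 7-input / 1-output data, and RandomForestRegressor stands in for whatever model the asker actually uses. Plotting R2 against training-set size (a learning curve) makes it easy to see whether the validation score diverges as data are added:

    # Minimal sketch: R^2 as a function of training-set size (learning curve).
    # X (n_samples, 7) and y (n_samples,) are placeholders, not the asker's data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import learning_curve

    X = np.random.rand(500, 7)                               # placeholder inputs
    y = X @ np.random.rand(7) + 0.1 * np.random.randn(500)   # placeholder target

    sizes, train_scores, val_scores = learning_curve(
        RandomForestRegressor(n_estimators=100, random_state=0),
        X, y,
        train_sizes=np.linspace(0.1, 1.0, 8),
        scoring="r2",
        cv=5,
    )

    for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
        print(f"n={n:4d}  train R2={tr:.3f}  validation R2={va:.3f}")

If the validation R2 drops while the training R2 stays high as n grows, the issue is more likely noise or non-stationarity in the real data than a lack of input dimensions.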
Category: Data Science

Reinforcement Learning vs Retraining

I have created a complex ML model using supervised learning. For the sake of discussion, let's say my model identifies dogs and a human labels each output as "correct" or "not correct". Now I want to improve my model, and I would like to understand when to use Reinforcement Learning and when to use retraining. Approach 1 (Reinforcement Learning): my current understanding is that in reinforcement learning we create an agent whose goal is to maximize a reward. In my example, …
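A minimal sketch of the retraining option only, assuming scikit-learn; the features, labels, and SGDClassifier below are hypothetical placeholders, not the asker's model. The idea it illustrates is simply folding the human "correct"/"not correct" feedback back into the labelled set and updating the supervised model incrementally:

    # Minimal sketch: incremental retraining on human-corrected labels.
    # All arrays and the classifier choice are placeholders / assumptions.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    classes = np.array([0, 1])                  # 0 = "not a dog", 1 = "dog"
    clf = SGDClassifier(random_state=0)

    # Initial supervised training on the original labelled data.
    X_train = np.random.rand(1000, 16)          # placeholder features
    y_train = np.random.randint(0, 2, 1000)     # placeholder labels
    clf.partial_fit(X_train, y_train, classes=classes)

    # Later: a batch of predictions a human reviewer has checked and corrected.
    X_feedback = np.random.rand(50, 16)         # placeholder features
    y_feedback = np.random.randint(0, 2, 50)    # labels corrected by the reviewer
    clf.partial_fit(X_feedback, y_feedback)     # incremental retraining step

Reinforcement learning, by contrast, would treat the human judgement as a reward signal for an agent's actions rather than as a corrected label for supervised updates.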
Category: Data Science

what level of discrepancy do I target for a good interpolation?

I'm performing some interpolation comparisons and, of course, the quality of the training sample is a key parameter to monitor. In this case I can create the dataset myself, so I try to build a good one (i.e., the minimum number of samples that still gives me a predictive model). How many experiments are required to obtain a predictive model? To answer this question I looked at how sparsely the data are spread in the …
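A minimal sketch of one way to quantify that sparsity, assuming SciPy; X below is a hypothetical design matrix, not the asker's experiments. It scales each input dimension to [0, 1] and looks at nearest-neighbour distances, so large gaps flag regions of the input space the interpolation has never seen:

    # Minimal sketch: nearest-neighbour spacing as a coverage check.
    # X is a placeholder design matrix (200 experiments, 7 inputs).
    import numpy as np
    from scipy.spatial import cKDTree

    X = np.random.rand(200, 7)
    X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

    tree = cKDTree(X_scaled)
    dist, _ = tree.query(X_scaled, k=2)     # k=2: the first hit is the point itself
    nn = dist[:, 1]                         # distance to the nearest other sample

    print(f"mean nearest-neighbour gap: {nn.mean():.3f}")
    print(f"max  nearest-neighbour gap: {nn.max():.3f}  (large gaps = poorly covered regions)")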
Category: Data Science

What are the metrics to evaluate Data Quality?

Data quality refers to the overall utility of a dataset as a function of how easily it can be processed and analyzed for other uses, usually by a database, data warehouse, or data-analytics system. What techniques and metrics can be used to evaluate the quality of a dataset before building any ML or statistical model? Are there standard metrics for data quality? Some measures I thought of considering: columns where nearly all values are identical (stability), …
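A minimal sketch of a few such column-level checks, assuming pandas; the small DataFrame below is a made-up placeholder. It computes missingness (completeness), the "stability" measure mentioned above (near-constant columns), and duplicate rows (uniqueness):

    # Minimal sketch: simple data-quality metrics with pandas.
    # The DataFrame is a hypothetical placeholder, not real data.
    import pandas as pd

    df = pd.DataFrame({
        "age":    [34, 45, None, 29, 45],
        "status": ["A", "A", "A", "A", "B"],
        "id":     [1, 2, 3, 4, 4],
    })

    missing_rate = df.isna().mean()                  # completeness per column
    top_freq = df.apply(lambda s: s.value_counts(normalize=True, dropna=True).iloc[0])
    near_constant = top_freq[top_freq > 0.9]         # "stability": almost a single value
    duplicate_rows = df.duplicated().sum()           # uniqueness

    print("missing rate per column:\n", missing_rate)
    print("near-constant columns:\n", near_constant)
    print("duplicate rows:", duplicate_rows)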
Category: Data Science

Public information quality trends vs quantity

With the great increase in publicly viewable information (content) supported by the internet and modern communications, is the average quality of that information decreasing, roughly static, or increasing? To put some bounds on that, let's constrain it to human-accessible information, and let's weight it by the number of views each piece of content receives, on whatever medium. Clearly, we have vastly more information, but its visibility is uneven. Put another way, do we know for sure how public information …
Topic: data-quality
Category: Data Science

How to mathematically quantify the quality of a corpus?

I am working on a text classification project. I have around 60,000 text samples covering 40 intents. By calculating the frequency of each intent I found there is a class imbalance, but that is just a subjective judgement I made about my data. Apart from this, are there any mathematical approaches by which I can generate an overall report on the quality of my training data? I am mainly focused on finding (mathematically): if there is any …
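A minimal sketch of two objective imbalance measures, assuming NumPy; the label array below is a synthetic placeholder standing in for the 60,000 intent labels. It reports the imbalance ratio (largest class over smallest class) and the normalised Shannon entropy of the label distribution (1.0 means perfectly balanced):

    # Minimal sketch: quantifying class imbalance in a labelled corpus.
    # `labels` is a placeholder, not the asker's data.
    import numpy as np
    from collections import Counter

    labels = np.random.choice([f"intent_{i}" for i in range(40)],
                              size=60000, p=np.random.dirichlet(np.ones(40)))

    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()

    imbalance_ratio = counts.max() / counts.min()
    norm_entropy = -(p * np.log(p)).sum() / np.log(len(p))

    print(f"classes: {len(p)}, imbalance ratio: {imbalance_ratio:.1f}, "
          f"normalised entropy: {norm_entropy:.3f}")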
Category: Data Science

UI-based Tool for Qualitative Evaluation of Data Quality

Dear DS StackExchange community, I'm currently searching the web for a (near-)ready-to-use solution to perform a qualitative evaluation of features extracted from video data. In my head the tool looks something like the screenshot below (taken from the annotation tool Prodigy), in the sense that a video is displayed at the top and underneath one would see a plot of a corresponding feature (selected, e.g., via a drop-down menu) extracted from the video. This includes (nearly) every kind of data …
Category: Data Science

What do we mean by permissible transformations for the attribute types nominal, ordinal, interval, and ratio?

I am studying data mining and I stumbled upon the types of attributes: nominal, ordinal, interval, and ratio. The data mining book by Tan, Steinbach, and Kumar gives the permissible transformations as: nominal: any one-to-one mapping, e.g., a permutation of values; ordinal: new_value = f(old_value), where f is an order-preserving change of values; interval: new_value = a*old_value + b; ratio: new_value = a*old_value. I tried making sense of this, but could not really understand what it is trying to say. What I know: nominal attributes provide enough information to distinguish one …
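A minimal worked example of the rules quoted above (my own illustration, not from the book): Celsius to Fahrenheit is an interval transformation (new = a*old + b), while kg to lb is a ratio transformation (new = a*old), and the difference shows up in whether ratios of values keep their meaning:

    # Interval transformation: Fahrenheit = 1.8 * Celsius + 32.
    celsius = [10.0, 20.0, 40.0]
    fahrenheit = [1.8 * c + 32 for c in celsius]

    # Ratios of values are NOT preserved: 40 C is not "twice as hot" in F.
    print(celsius[2] / celsius[1], fahrenheit[2] / fahrenheit[1])   # 2.0 vs ~1.53

    # Ratio transformation: lb = 2.20462 * kg (b must be 0).
    kg = [10.0, 20.0, 40.0]
    lb = [2.20462 * x for x in kg]

    # Ratios ARE preserved, which is what makes weight a ratio attribute.
    print(kg[2] / kg[1], lb[2] / lb[1])                             # 2.0 vs 2.0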
Category: Data Science
