I am trying to improve the R² score between theoretical and real output values. In the picture you can see two cases: the blue one is an artificial case that I fully control, with 7 dimensions as input and 1 dimension as output; the orange curve is a real case, also 7 inputs and 1 output. As you can see, the blue curve responds as expected: the more data I add, the better the prediction. BUT with the orange case it is the opposite. …
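For illustration only, here is a minimal sketch of how one might plot R² against training-set size to compare the two cases, assuming scikit-learn; the arrays X and y are hypothetical stand-ins for the 7-input, 1-output data described above.

```python
# Minimal sketch: validation R² as a function of training-set size (learning curve).
# X (n_samples, 7) and y (n_samples,) are placeholders, not the asker's real data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

X = np.random.rand(500, 7)          # stand-in for the 7-dimensional inputs
y = X @ np.random.rand(7)           # stand-in for the 1-dimensional output

sizes, train_scores, val_scores = learning_curve(
    RandomForestRegressor(random_state=0),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5,
    scoring="r2",
)

# If validation R² drops as the training size grows, the newer samples are
# probably noisier or distributed differently from the earlier ones.
for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:4d} samples -> mean validation R2 = {score:.3f}")
```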
I have created a complex ML model using supervised learning. For the sake of discussion, let's say my model identifies dogs and a human labels the output as "correct" or "not correct". Now I want to improve my model, and I would like to understand when to use Reinforcement Learning and when to use Retraining. Approach 1 (Reinforcement Learning) - My current understanding is that in reinforcement learning, we create an agent with the goal of maximizing reward. In my example, …
I'm performing some interpolation comparisons and, of course, "the quality" of the training sample is a key parameter to monitor. In this case I can create the dataset myself, so I try to create a good dataset (i.e. the minimum number of samples that still gives me a predictive model). How many experiments are required to generate a predictive model? To answer this question I tried to see how well the data are spread in the …
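One rough way to quantify how sparsely a sample covers the input space (not necessarily the asker's approach) is to look at nearest-neighbour distances; a minimal sketch assuming SciPy and a hypothetical 7-column design matrix X:

```python
# Sketch: measure coverage of the input space via nearest-neighbour distances.
import numpy as np
from scipy.spatial import cKDTree

X = np.random.rand(200, 7)   # hypothetical design matrix, 7 input dimensions

tree = cKDTree(X)
# distance to the nearest *other* point for every sample (k=2: self + neighbour)
dists, _ = tree.query(X, k=2)
nn = dists[:, 1]

print("mean nearest-neighbour distance:", nn.mean())
print("max  nearest-neighbour distance:", nn.max())
# A max much larger than the mean suggests holes in the design, i.e. regions
# where an interpolator would effectively have to extrapolate.
```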
Data quality refers to the overall utility of a dataset as a function of how easily it can be processed and analyzed for other uses, usually by a database, data warehouse, or data analytics system. What techniques and metrics can be used to evaluate the quality of a dataset considered for building an ML/statistical model? Are there any standard metrics for data quality? Some measures I thought of: columns where nearly all values are identical (stability) …
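As a rough sketch of a few per-column measures of the kind mentioned (near-constant columns, missingness, cardinality), assuming pandas and a hypothetical DataFrame df:

```python
# Sketch of simple per-column data-quality measures on a toy DataFrame.
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 1, 2],            # nearly constant
    "b": [0.1, 0.2, None, 0.4, 0.5],  # contains a missing value
    "c": ["x", "y", "x", "z", "y"],
})

report = pd.DataFrame({
    "missing_ratio": df.isna().mean(),
    # share of rows taken by the most frequent value; close to 1.0 means the
    # column is nearly constant ("stability" in the question's wording)
    "top_value_ratio": df.apply(lambda s: s.value_counts(normalize=True, dropna=True).iloc[0]),
    "n_unique": df.nunique(),
})
print(report)
```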
With the great increase of publicly viewable information (content) supported by the internet and modern communications, is the average quality of that information decreasing, roughly static, or increasing? To put some bounds on that, let's constrain it to human-accessible information, and let's weight it by the number of views each piece of content receives, on whatever medium. Clearly, we have vastly more information, but its visibility is uneven. Put another way, do we know for sure how public information …
I am working on a text classification project. I have around 60,000 text samples across 40 intents. By calculating the frequency of each intent I found there is class imbalance, but that is just a subjective judgement I made about my data. Apart from this, are there any mathematical approaches by which I can generate an overall report on the quality of my training data? I am mainly focused on finding (mathematically): if there is any …
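For illustration, a minimal sketch of two numbers that summarise class imbalance for a label set like the 40 intents; `labels` is a hypothetical list standing in for the intent label of each sample:

```python
# Sketch: imbalance ratio and normalised entropy of the label distribution.
import numpy as np
from collections import Counter

labels = ["greet"] * 5000 + ["bye"] * 300 + ["book_flight"] * 1200   # toy stand-in

counts = np.array(list(Counter(labels).values()), dtype=float)
probs = counts / counts.sum()

imbalance_ratio = counts.max() / counts.min()   # 1.0 means perfectly balanced
entropy = -(probs * np.log(probs)).sum()
balance = entropy / np.log(len(counts))         # normalised entropy in [0, 1]

print(f"imbalance ratio (max/min class count): {imbalance_ratio:.1f}")
print(f"normalised entropy of the label distribution: {balance:.3f}")
```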
Dear DS StackExchange community, I'm currently searching the web for a (near-)ready-to-use solution to perform a qualitative evaluation of features extracted from video data. In my head the tool looks something like the screenshot below (taken from the annotation tool prodigy): a video is displayed at the top, and underneath one would see a plot of a corresponding feature (selected e.g. via a drop-down menu) extracted from the video. This includes (nearly) every kind of data …
I am studying data mining and I stumbled upon types of attributes: nominal, ordinal, interval, and ratio. The data mining book by Tan, Steinbach, and Kumar lists the permissible transformations as: nominal: any one-to-one mapping, e.g. a permutation of values; ordinal: new_value = f(old_value), an order-preserving change of values; interval: new_value = a*old_value + b; ratio: new_value = a*old_value. I tried making sense of this but could not really understand what it is trying to say. What I know: nominal attributes provide enough information to distinguish one …
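A small worked example (not from the book) may help with the last two cases: Celsius to Fahrenheit is an interval transformation (a*old + b), and it does not preserve ratios, whereas metres to feet is a ratio transformation (a*old), and it does.

```python
# Interval transformation: F = 1.8*C + 32 (a = 1.8, b = 32).
c1, c2 = 10.0, 20.0
f1, f2 = 1.8 * c1 + 32, 1.8 * c2 + 32
print(c2 / c1, f2 / f1)     # 2.0 vs ~1.36 -> ratios are NOT preserved

# Ratio transformation: ft = 3.2808*m (a = 3.2808, b = 0).
m1, m2 = 10.0, 20.0
ft1, ft2 = 3.2808 * m1, 3.2808 * m2
print(m2 / m1, ft2 / ft1)   # 2.0 vs 2.0 -> ratios ARE preserved
```

This is why it is meaningful to say one rod is "twice as long" as another (ratio attribute) but not that 20 °C is "twice as hot" as 10 °C (interval attribute).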