Interpreting confidence interval results for datasets
I created a dataset automatically and want to check my interpretation of how much label noise it contains, based on a confidence interval.
I drew a random sample, annotated it manually, and found that 98% of the labels were correct. From these values I calculated a 99% confidence interval for the proportion of correct labels, which gave a lower bound of 0.9614 and an upper bound of 0.9949. Does this mean that the noise rate in the overall dataset lies between 1 - 0.9949 and 1 - 0.9614, i.e. roughly between 0.5% and 3.9%?
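To make the conversion I am asking about concrete, here is a minimal Python sketch. It assumes the interval is a binomial proportion interval; the Wilson score method and the sample size of 500 are assumptions for illustration only, since neither is fixed by the numbers above. The function names `wilson_interval` and `noise_bounds` are likewise made up for this example.

```python
import math


def wilson_interval(successes, n, z):
    """Wilson score interval for a binomial proportion (one possible method)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)
    )
    return center - half_width, center + half_width


def noise_bounds(acc_lower, acc_upper):
    """Turn accuracy bounds into noise-rate bounds (both as proportions).

    The upper accuracy bound gives the lower noise bound, and vice versa.
    """
    return 1 - acc_upper, 1 - acc_lower


# Bounds reported above, converted from accuracy to noise.
noise_lo, noise_hi = noise_bounds(0.9614, 0.9949)
print(f"noise: {noise_lo:.2%} to {noise_hi:.2%}")  # noise: 0.51% to 3.86%

# Hypothetical sample: 490 correct labels out of 500 (98%), z for 99% confidence.
# These counts are an assumption, so the bounds will differ from those above.
acc_lo, acc_hi = wilson_interval(successes=490, n=500, z=2.576)
print(noise_bounds(acc_lo, acc_hi))
```

The point of the sketch is the unit handling: the bounds are proportions, so they have to be multiplied by 100 before being read as percentages.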
Topic confidence text-classification dataset statistics
Category Data Science