Question on bootstrap sampling
I have a corpus of manually annotated (i.e., gold standard) documents and a collection of NLP systems' annotations on the text from that corpus. I want to do bootstrap sampling of the system and gold-standard annotations to approximate a mean and standard error for various measures, so that I can run a series of hypothesis tests, possibly with ANOVA.
The issue is how to do the sampling. I have 40 documents in the corpus, with ~44K manual annotations in the gold standard. I was thinking of using each document as a sampling unit and taking 60% of the documents for each sample (i.e., 24 documents per sample). However, each manually annotated document does not have the same number of annotations, so the resamples would not all contain the same number of annotations, which seems to violate having the same sample size for each sample. Something like the sketch below is what I had in mind.
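To make it concrete, here is a rough, untested sketch of the document-level resampling I have in mind. The names `gold`, `system`, and `metric` are placeholders: `gold` and `system` would be dicts mapping document IDs to lists of annotations, and `metric` is whichever measure I want to estimate.

```python
import random
import statistics

def bootstrap_metric(gold, system, metric, n_resamples=1000, sample_frac=0.6, seed=0):
    """Resample whole documents (the sampling unit) and recompute `metric`
    on the pooled annotations of each resample."""
    rng = random.Random(seed)
    doc_ids = list(gold.keys())
    n_docs = max(1, int(len(doc_ids) * sample_frac))  # e.g. 24 of my 40 documents
    scores = []
    for _ in range(n_resamples):
        # Draw documents with replacement; the number of annotations per
        # resample varies because documents differ in annotation count.
        sample = [rng.choice(doc_ids) for _ in range(n_docs)]
        gold_pool = [a for d in sample for a in gold[d]]
        sys_pool = [a for d in sample for a in system[d]]
        scores.append(metric(gold_pool, sys_pool))
    # Bootstrap estimate of the mean and its standard error.
    return statistics.mean(scores), statistics.stdev(scores)
```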
Any suggestions on how to achieve this bootstrap?
Topic bootstrapping nlp
Category Data Science