How to compare genetic profiles or VCF files in Python?
I have hundreds of VCF files, each containing the genome profile for one tissue. A portion of one VCF file is as follows:
I can read each VCF file into a dataframe, which gives me hundreds of dataframes. Each one has hundreds of columns and 40 to 50 thousand rows. I want to see how the ALT column differs between profiles (VCF files/dataframes) for records that match on the CHROM, POS, ID and REF columns. What would be the best way to compare these dataframes/VCF files to find similarities in the ALT column? Thanks in advance.
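To make the comparison concrete, here is a minimal sketch of one possible approach, not a definitive method. It assumes each profile is already loaded as a pandas dataframe with columns named exactly CHROM, POS, ID, REF and ALT, and that each (CHROM, POS, ID, REF) combination occurs at most once per profile (duplicates are dropped here). The helper names `build_alt_matrix` and `pairwise_alt_agreement`, the profile names, and the toy data are all hypothetical.

```python
from itertools import combinations

import pandas as pd

KEYS = ["CHROM", "POS", "ID", "REF"]  # columns that identify a variant site


def build_alt_matrix(profiles):
    """Align ALT values from many profiles into one table.

    profiles: dict mapping a profile name to a DataFrame with KEYS + ALT.
    Returns a DataFrame indexed by KEYS with one ALT column per profile;
    sites missing from a profile are NaN.
    """
    columns = []
    for name, df in profiles.items():
        alt = (
            df.drop_duplicates(subset=KEYS)  # assumption: keep first record per site
            .set_index(KEYS)["ALT"]
            .rename(name)
        )
        columns.append(alt)
    return pd.concat(columns, axis=1)  # outer join on the site keys


def pairwise_alt_agreement(alt_matrix):
    """Fraction of shared sites where two profiles have the same ALT."""
    names = list(alt_matrix.columns)
    result = pd.DataFrame(1.0, index=names, columns=names)
    for a, b in combinations(names, 2):
        shared = alt_matrix[[a, b]].dropna()  # sites present in both profiles
        frac = (shared[a] == shared[b]).mean() if len(shared) else float("nan")
        result.loc[a, b] = result.loc[b, a] = frac
    return result


# Toy data standing in for two real profiles (made up for illustration).
tissue_a = pd.DataFrame({
    "CHROM": ["chr1", "chr1", "chr2"],
    "POS": [100, 200, 300],
    "ID": ["rs1", "rs2", "rs3"],
    "REF": ["A", "C", "G"],
    "ALT": ["G", "T", "A"],
})
tissue_b = pd.DataFrame({
    "CHROM": ["chr1", "chr1", "chr2"],
    "POS": [100, 200, 400],
    "ID": ["rs1", "rs2", "rs4"],
    "REF": ["A", "C", "T"],
    "ALT": ["G", "G", "C"],
})

alt_matrix = build_alt_matrix({"tissue_a": tissue_a, "tissue_b": tissue_b})
agreement = pairwise_alt_agreement(alt_matrix)

# Sites where the profiles that have a call disagree on ALT.
differing = alt_matrix[alt_matrix.nunique(axis=1) > 1]
```

Only the four key columns and ALT are needed from each file, so the hundreds of sample columns can be dropped before alignment to keep memory use down. The resulting agreement table could then be passed to a clustering or heatmap routine if an overall picture of similarity is wanted.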
Topic dataframe bioinformatics pandas python
Category Data Science