Extract or subset hundreds of columns from a data frame

I need to extract many columns from a dataset. I have a very large csv file with thousands of columns and rows, and I read it into R using:

mydata <- read.csv(file = "file.csv", header = TRUE, sep = ",", row.names = 1)

Each column is a gene name. I know how to extract specific columns from my R data.frame using basic code like this:

mydata[ , c("GeneName1", "GeneName2")]

But my question is: how do I pull out hundreds of gene names, far too many to type in? They are listed in a txt file. I'm new, so please go easy on jargon and abbreviations.



You can also subset a dataframe in base R like this:

# Some dataframe
df = data.frame(a=c(1,2),b=c(1,2),c=c(1,2),d=c(1,2),e=c(1,2))
names(df)

# List of column names to select
colnamelist = c("a","b","c")

# Subset dataframe based on list of wanted columns
df = df[,colnames(df) %in% colnamelist]
names(df)

Result will be:

> names(df)
[1] "a" "b" "c"

The logic is: you select from a data frame with df[row, column]. Leaving the row part empty keeps every row, and colnames(df) %in% colnamelist keeps only the columns whose names appear in colnamelist (a character vector of the wanted names).
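To make the %in% step more concrete, here is a small sketch using a made-up data frame and a made-up name vector called wanted (neither comes from the question). It shows that %in% produces one TRUE/FALSE per column, that the columns come back in the data frame's own order rather than the order of the name vector, and that names not present in the data frame are silently ignored. The drop = FALSE argument is an addition here; it keeps the result a data frame even when only one column matches.

```r
# Toy data frame, made up for illustration
df <- data.frame(a = c(1, 2), b = c(1, 2), c = c(1, 2), d = c(1, 2), e = c(1, 2))

# Names we want; "z" is deliberately not a column of df
wanted <- c("c", "a", "z")

# One TRUE/FALSE per column of df
keep <- colnames(df) %in% wanted
print(keep)

# Keep only the TRUE columns; drop = FALSE keeps the result a data frame
sub <- df[, keep, drop = FALSE]
print(names(sub))
```

If the order of the name vector matters, or you want an error when a name is missing, indexing by name directly (df[, wanted]) behaves differently: it follows the order of wanted and stops on unknown names.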


Hopefully I've understood your question correctly.

I'm assuming your text file looks like this, with one gene name per line:

GeneName1
GeneName2

You can read that in using the readLines() function:

cols <- readLines("name_of_text_file")

That gives you cols as a character vector of those names:

> cols
[1] "GeneName1" "GeneName2"

You can then use it to subset the data frame, as in your example:

mydata[ , cols]
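With hundreds of names, a single typo or stray space in the text file is enough to make mydata[ , cols] stop with an "undefined columns selected" error. The following is a rough sketch of one way to check the names first. Everything in it is made up so that it runs on its own: the small mydata stands in for the real csv, the gene list is written to a temporary file, and the helper names missing_cols, found and subset_data are only illustrative.

```r
# Made-up stand-in for the real data; in practice this comes from read.csv()
mydata <- data.frame(
  GeneName1 = c(0.5, 1.2),
  GeneName2 = c(3.1, 0.0),
  GeneName3 = c(2.2, 4.8),
  row.names = c("sample1", "sample2")
)

# Write a small gene list to a temporary file so the example runs on its own;
# note the trailing space, the blank line, and a name that is not in the data
gene_file <- tempfile(fileext = ".txt")
writeLines(c("GeneName3 ", "", "GeneName1", "GeneName9"), gene_file)

# Read the names, strip surrounding whitespace, and drop empty lines
cols <- trimws(readLines(gene_file))
cols <- cols[cols != ""]

# Names from the file that have no matching column in the data
missing_cols <- setdiff(cols, colnames(mydata))
print(missing_cols)

# Subset using only the names that exist, in the order given in the file
found <- intersect(cols, colnames(mydata))
subset_data <- mydata[, found, drop = FALSE]
print(colnames(subset_data))
```

One thing to watch for if names are unexpectedly reported as missing: by default read.csv() rewrites column names that are not valid R names (for example, a hyphen becomes a dot), so the names in the text file may no longer match. Passing check.names = FALSE to read.csv() keeps the original names.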
