How to get a (descriptive) overview of a large database?

Question

Ben

December 9, 2020, 20:08

I'm facing a data frame with

~20k observations and
151 variables across
2078 subjects

To start with, I am primarily interested in what the data look like with respect to a single parameter. But I cannot put 2078 subjects on the x-axis and make a bar plot out of that.

What methods would be useful in this situation? I would prefer visualizations, but I suspect they won't be applicable. I'm afraid even non-visual methods won't be very helpful either.

Topic aggregation descriptive-statistics ggplot2 visualization r

Category Data Science

Erwan · Accepted Answer · December 9, 2020, 20:08

There's no way to get a complete summary of a large dataset like this. You have to work out what is relevant, break it down into more specific questions, and then find the best way to visualize each specific part on its own.

The first thing would be to plot the distribution of this parameter of interest across subjects and/or observations.
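As a rough sketch of that first step in R with ggplot2: the data below are simulated stand-ins, and the column names `subject` and `value` are assumptions, not the asker's actual variables. One histogram treats every observation as a data point; the other first reduces each subject to a single number (here the mean, which is only one possible choice) so that 2078 subjects become one distribution instead of 2078 bars.

```r
library(ggplot2)

set.seed(1)

# Simulated stand-in for the real data (hypothetical column names).
n_subjects <- 2078
n_obs <- 20000
dat <- data.frame(
  # every subject appears at least once, the rest are assigned at random
  subject = c(seq_len(n_subjects),
              sample(seq_len(n_subjects), n_obs - n_subjects, replace = TRUE)),
  value = rnorm(n_obs, mean = 50, sd = 10)
)

# Distribution of the parameter across all observations.
p_obs <- ggplot(dat, aes(x = value)) +
  geom_histogram(bins = 50) +
  labs(x = "parameter value", y = "number of observations")

# One summary value per subject, then the distribution across subjects.
per_subject <- aggregate(value ~ subject, data = dat, FUN = mean)
p_subj <- ggplot(per_subject, aes(x = value)) +
  geom_histogram(bins = 50) +
  labs(x = "per-subject mean", y = "number of subjects")

print(p_obs)
print(p_subj)
```

Swapping `mean` for `median`, `sd`, or `length` in the `aggregate()` call gives other per-subject views, such as the spread within each subject or the number of observations per subject.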

If you want to look at the individual level and there are too many values, you can simply pick a random subset (say 100 subjects) and plot those. Then repeat with a different random subset, so that you can distinguish real patterns from variation due to chance.
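The random-subset idea could look something like this in R. Again the data are simulated, and `subject`, `value`, and the helper names `sample_subjects()` and `plot_subjects()` are hypothetical, introduced only for illustration. A box plot per subject is one possible choice of individual-level plot.

```r
library(ggplot2)

set.seed(42)

# Simulated stand-in for the real data (hypothetical column names).
n_subjects <- 2078
n_obs <- 20000
dat <- data.frame(
  subject = c(seq_len(n_subjects),
              sample(seq_len(n_subjects), n_obs - n_subjects, replace = TRUE)),
  value = rnorm(n_obs, mean = 50, sd = 10)
)

# Keep all observations belonging to n randomly chosen subjects.
sample_subjects <- function(data, n = 100) {
  picked <- sample(unique(data$subject), n)
  data[data$subject %in% picked, ]
}

# One box per subject; x-axis labels are dropped because 100 ids are unreadable.
plot_subjects <- function(data) {
  ggplot(data, aes(x = factor(subject), y = value)) +
    geom_boxplot() +
    labs(x = "subject (random subset)", y = "parameter value") +
    theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
}

# Two independent draws: features that show up in both are less likely to be noise.
sub1 <- sample_subjects(dat, 100)
sub2 <- sample_subjects(dat, 100)
print(plot_subjects(sub1))
print(plot_subjects(sub2))
```

Each call to `sample_subjects()` draws a fresh subset, so running the last four lines a few more times gives additional views to compare.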
