Working with others: Tidy data vs Pretty data

When working with someone whose background and skills in data work may not be strong, how do you best make the case for tidy data over pretty data?

Some of what I want to ask about is discussed in this StackExchange post, but I want to focus more on the collaborative aspect.

Part of my job is to collect data from many different organizations that are all members of a common group. The spreadsheets they send me are untidy in ways as varied as the companies themselves; I'm sure you're familiar. I keep thinking that, if they're amenable, I could meet with each company and help them organize the data they have. My own superiors like this idea as well.

So the conversation I'm imagining is with, say, GizWidCo, whose product data lives in a spreadsheet that someone lovingly crafted to show the information they need at a glance: formatted in a way they find aesthetically pleasing, with adjacent tables, variables stored in column names or even in table headers, and cells that contain multiple values. That's how they've always done it. They don't want to spend all day running analyses; they just need this information stored somewhere. They're experts in widget design and gizmo production, not number crunching.
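For illustration only, here is a minimal sketch in standard-library Python of what "tidying" a sheet like GizWidCo's could look like. The rows, the column names (`colours`, `Q1 units`, `Q2 units`) and the helper functions are all invented for this example. Quarters stored in column names are melted into one row per product and quarter, and multi-valued colour cells are split into one row per colour; once the data are in that shape, a summary such as units per quarter takes only a few lines. (In pandas, `melt` does the first step in a single call.)

```python
from collections import defaultdict

# Hypothetical "pretty" sheet: quarters live in column names, and the
# colours cell packs several values into one cell.
pretty_rows = [
    {"product": "Gizmo A", "colours": "red; blue", "Q1 units": "120", "Q2 units": "95"},
    {"product": "Widget B", "colours": "green", "Q1 units": "40", "Q2 units": "60"},
]

UNITS_SUFFIX = " units"  # hypothetical naming convention for quarter columns


def tidy_units(rows):
    """Melt quarter columns into one (product, quarter, units) row each."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column.endswith(UNITS_SUFFIX):
                quarter = column[: -len(UNITS_SUFFIX)]  # "Q1 units" -> "Q1"
                tidy.append({"product": row["product"], "quarter": quarter, "units": int(value)})
    return tidy


def tidy_colours(rows):
    """Split multi-valued colour cells into one (product, colour) row each."""
    return [
        {"product": row["product"], "colour": colour.strip()}
        for row in rows
        for colour in row["colours"].split(";")
        if colour.strip()
    ]


def units_by_quarter(tidy_rows):
    """With tidy rows, a per-quarter total is a simple loop."""
    totals = defaultdict(int)
    for r in tidy_rows:
        totals[r["quarter"]] += r["units"]
    return dict(totals)


if __name__ == "__main__":
    print(units_by_quarter(tidy_units(pretty_rows)))  # {'Q1': 160, 'Q2': 155}
```

Keeping units and colours in two separate tidy tables avoids repeating each product's unit counts once per colour.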

Do you have any experience guiding someone toward better data management principles? I don't want to just tell someone how to do their job, and I don't want my influence to serve only my own needs either (even if getting them to send me better data is what prompted all this). My hope is to help someone who could get real value from improving their data management, and thereby improve every process that depends on it. I intend this as a win/win.

If you have any thoughts, perspective, or lived experience here, I would love to hear it. If this is too off-topic for a StackExchange question, I would be grateful to know the right forum. (For what it's worth, I first asked this on Cross Validated and they suggested here.)
