Data engineering: good and bad practices?

I'm a data analyst at a pretty big company and I'm having a really bad time with the data I'm given. I spend about 70% of my time working out where to find the data and how to pull it instead of analyzing it. I have to pull from tables that are sometimes 800 columns wide (600 with a ton of N/As) and that have little or no documentation. This is my first job, so I don't know what the standard is for how data engineers should design their databases and tables. As someone who uses data made available by a data engineering team, what would you expect from it?

I hate that, so to make my life easier I clean the tables and write queries that output clean (or almost clean) tables, which I can then query directly.
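A minimal sketch of that kind of cleaning layer, using an in-memory SQLite database from Python. The table `raw_orders`, its columns, and the view name `clean_orders` are all made up for illustration, and the real tables are of course far wider. The idea is only to show a view that drops an unused column, trims text, and turns the literal string 'N/A' into a proper NULL.

```python
import sqlite3

# Hypothetical example: none of these table or column names come from a real system.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (
        order_id    INTEGER,
        customer    TEXT,
        amount      TEXT,   -- stored as text, sometimes the literal 'N/A'
        legacy_flag TEXT    -- always empty, so the view leaves it out
    );
    INSERT INTO raw_orders VALUES
        (1, ' Alice ', '10.50', NULL),
        (2, 'Bob',     'N/A',   NULL),
        (3, NULL,      '7',     NULL);

    -- The cleaning logic lives in one place instead of in every analysis query.
    CREATE VIEW clean_orders AS
    SELECT order_id,
           TRIM(customer)                      AS customer,
           CAST(NULLIF(amount, 'N/A') AS REAL) AS amount
    FROM raw_orders
    WHERE customer IS NOT NULL;
    """
)

clean_rows = conn.execute(
    "SELECT order_id, customer, amount FROM clean_orders ORDER BY order_id"
).fetchall()
print(clean_rows)
```

Keeping the cleaning in a view rather than a copied table means the result always reflects the current raw data, at the cost of recomputing it on each query.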

What are the good practices in general? What do you expect from a good data engineering team as someone who depends on it? What can I do to improve the data quality overall?

Topic data-engineering etl sql

Category Data Science


The data engineering team should make it easy for others to access and query data. However, I would not say that they need to be experts on the data itself. If the data is complex, its documentation should be written and maintained by a domain expert, for example a business analyst, data scientist, or data analyst. Wearing my data science hat, I prefer data that is as raw as I can get. I need to know how my data has been preprocessed, so I want to do the preprocessing either myself or together with the data engineers :) For example, N/A values can actually be informative and helpful when building models. Hope that gives some context.
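To illustrate the point about N/A values, here is a small sketch in plain Python of one common way to keep that information: adding a "was this value missing?" flag before any imputation happens. The helper `add_missing_indicator` and the loan-style records are hypothetical, invented for this example only.

```python
def add_missing_indicator(rows, column):
    """Return copies of the rows with an extra `<column>_is_missing` flag (1 or 0)."""
    flagged_rows = []
    for row in rows:
        new_row = dict(row)  # copy, so the raw records stay untouched
        new_row[f"{column}_is_missing"] = 1 if row.get(column) is None else 0
        flagged_rows.append(new_row)
    return flagged_rows


# Made-up records; None stands in for N/A.
applications = [
    {"id": 1, "income": 52000, "defaulted": 0},
    {"id": 2, "income": None, "defaulted": 1},
    {"id": 3, "income": 48000, "defaulted": 0},
    {"id": 4, "income": None, "defaulted": 1},
]

flagged = add_missing_indicator(applications, "income")
for row in flagged:
    print(row["id"], row["income_is_missing"], row["defaulted"])
```

In this toy data the flag happens to line up with the target column, which is the kind of signal that disappears if N/As are dropped or filled in upstream without telling anyone.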
