Data science career problems: interaction of technical and social difficulties

Question

Stonecraft

February 11, 2020, 07:51

I have been trying to break into the bioinformatics space (and now, data science more generally).

Although there are numerous challenges in this field and I am constantly learning to deal with them, I have found that the most consistent and intractable challenges are interpersonal, and they are unlike anything that my previous experience prepared me for. In particular, I find it very difficult to balance getting the information I need (or think I need) to do a good analysis against maintaining productive relationships with people.

I have often had the sense that higher-ranking people than me were using their ability to act as gatekeepers of information to assert power over me, but it is almost never possible to be sure whether this is what is going on, or if I am making up excuses for my own lack of planning and follow-through.

A few examples.

When I was in grad school (before flunking out), I rotated in a bioinformatics lab where my attempts to pick a project always went in circles. If I suggested an experiment to the PI, the first question that I would need to answer before continuing was "What data do you want to use to answer this question?" That sounds perfectly reasonable, except that when I asked for access to databases that would contain such information, the question was "What question do you want to answer with access to this data?" Thus, I could never figure out a project, because I had no way of knowing what questions I had the theoretical ability to address. What was going on there? Did the PI simply not like me? Did I need to be more clever in terms of social politics in order to discover the necessary information through informal channels? Is it that I lack some basic research skill that would have vaulted me past this predicament?
In another lab, the PI wanted me to do some work before a deadline, but to do the work I needed access to some additional datasets, which had supposedly already been prepared, and thus did not need to be factored into the estimate of how long the project would take. However, when I asked my coworker for the data, they sent me the wrong data, and I had to go through several rounds of back-and-forth before I got the file I needed, by which time I was unable to complete the main assignment by the deadline. The result was being let go from the lab and flunking out of grad school. Again, it's hard for me to determine where I failed in my responsibilities, and whether I was undermined by colleagues. How much of the job of a data scientist is simply being resilient against, and planning for, unhelpful coworkers? How would one explain such a situation to a boss without giving the impression of trying to blame others for one's own failings?
In a third project, not in academia and not science related, I am being asked to do some analyses on a dataset that contains a lot of numbers whose origin and meaning I don't know. According to the person who assigned me the task, these numbers are not relevant to me and I don't need to know them. I'm not so sure that is the case, and have repeatedly asked that these numbers be explained to me, to no avail. I'm trying to work with what I was given, but it drives me insane to have to ignore data without even knowing what exactly it is. Similarly, the datasets keep getting moved to archive, and while there are always datasets of similar format that I can test my code on, I am not able to consistently work with one of them and keep using it as a reference. Instead I have to use the most current data. The result of this is that if I notice potential issues with a dataset, I need to get a response from my higher-ups before that dataset is moved to archive; otherwise I will have to find a new example in the most current data, and hope I get a response about that one before it goes to archive too. Of course, in theory, since all these datasets are formatted the same, it shouldn't matter which data I am using, and my code should work equally well. But I still want to be able to stick with one example until my question is answered, rather than constantly documenting new ones.

So these are the questions I need to figure out answers to:

1) How common is this sort of behavior in the data science world?

2) Are there good reasons for it, or is it just how people assert power in this space?

3) Is the problem most likely my approach to data, or my approach to people?

Based on your experience and my telling of the situations, what do you think is most likely?

Topic career

Category Data Science

Sammy · Accepted Answer · February 11, 2020, 07:51

1) How common is this sort of behavior in the data science world?

Very common across the board, and not specific to data science: it applies to any data request. I cannot comment on academia, and my experience is much broader than what is nowadays considered data science, but I have seen this a lot with data requests to different departments (e.g. controlling, sales, production, supply chain, finance, HR) in different industries (e.g. chemicals, metals, consumer goods, retail, utilities).

2) Are there good reasons for it, or is it just how people assert power in this space?

There are plenty of different reasons, e.g.

1. high number of incoming data requests,
2. high general workload,
3. unclear goal of data request,
4. bad quality of data requests,
5. non-ideal form of communication,
6. incorrect path of communication,
7. interpersonal issues (with you, your boss, or your department).

The important thing here is to anticipate these and request your data accordingly - which brings me to your third question.

3) Is the problem most likely my approach to data, or my approach to people?

From the list above you can directly draw some conclusions about how to request data properly:

Provide your requests in a way that helps the receiver to prioritize properly and process them efficiently. [addresses numbers 1 and 2 in the above list]
State clearly what data you need and what the purpose of your request is. [addresses numbers 3 and 4 in the above list]
Communicate in line with your company culture, e.g. choose the right level of politeness (this depends hugely on corporate culture and country, but very generally tech people might sometimes come across as too demanding, direct or even impolite). [addresses number 5 in the above list]
Follow the correct path of communication in your organization, e.g. involve superiors before approaching someone on the working level in a different department or team, have your boss approach the other department or team if appropriate, reach out to the people responsible for data provision (and not just anyone who has access to the data or "owns" it), and if in doubt agree with your boss on how to proceed. [addresses number 6 in the above list]
Keep interpersonal relationships in mind and approach people accordingly. [addresses number 7 in the above list]
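
If it helps to make the first four points concrete, they could be captured as a small checklist before sending a request. The following is only a rough sketch in Python, not an established tool or standard: the `DataRequest` class, its field names, and the example values are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataRequest:
    """Hypothetical checklist for a data request; all names are illustrative."""
    what: str        # the exact data needed (tables, fields, time range)
    purpose: str     # the question the data is supposed to answer
    needed_by: date  # deadline, so the receiver can prioritize
    recipient: str   # the person responsible for providing the data
    cc: list = field(default_factory=list)  # e.g. your supervisor

    def missing_fields(self):
        """Return the names of required text fields that were left blank."""
        required = {
            "what": self.what,
            "purpose": self.purpose,
            "recipient": self.recipient,
        }
        return [name for name, value in required.items() if not value.strip()]

    def render(self):
        """Format the request as a short plain-text message."""
        lines = [
            f"To: {self.recipient}",
            f"Cc: {', '.join(self.cc) if self.cc else '-'}",
            f"Data needed: {self.what}",
            f"Purpose: {self.purpose}",
            f"Needed by: {self.needed_by.isoformat()}",
        ]
        return "\n".join(lines)


if __name__ == "__main__":
    # Example values are made up for illustration only.
    request = DataRequest(
        what="Monthly sales by region, January to December 2019",
        purpose="Check whether the forecast error differs by region",
        needed_by=date(2020, 3, 1),
        recipient="Person responsible for sales data",
        cc=["My supervisor"],
    )
    if request.missing_fields():
        print("Fill in before sending:", request.missing_fields())
    else:
        print(request.render())
```

The only point of the sketch is that a request with a blank purpose or no deadline gets flagged before it goes out. Tone and interpersonal context (the last three points) cannot be automated this way.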

With regards to the examples you provided I would like to point out two things.

Example 1: As described above, it is helpful to explain the reasons for a data request. Failing to answer this question might lead to your request being deprioritized or declined.

Example 2: This can and does happen a lot. There are at least two things you can do about it: (1) include contingencies in your planning, and (2) flag potential and actual delays with your supervisor as soon as possible.

None of this is to say that it is all your fault, but do your part to make things go as smoothly as possible.

Shivi Kulshrestha · Accepted Answer · February 10, 2020, 20:26

Answering your questions below.

1) How common is this sort of behavior in the data science world?

Not all fingers are equal. Some people are selfish. Datasets are not personal to anyone; they are made public for us to use. Either search for the dataset online or report this behavior to your supervisor.

2) Are there good reasons for it, or is it just how people assert power in this space?

No good reasons. It is only a way for them to make sure they don't lose their importance and other people's dependency on them.

3) Is the problem most likely my approach to data, or my approach to people?

I assume that these are public data. Try searching online with the dataset name, or ask your guide or supervisor for help.
