Data science career problems: interaction of technical and social difficulties
I have been trying to break into the bioinformatics space (and now, data science more generally).
Although there are numerous challenges in this field and I am constantly learning to deal with them, I have found that the most consistent and intractable challenges are interpersonal, and they are unlike anything my previous experience prepared me for. In particular, I find it very difficult to balance getting the information I need (or think I need) to do a good analysis with maintaining productive relationships with people.
I have often had the sense that people who outrank me were using their position as gatekeepers of information to assert power over me, but it is almost never possible to be sure whether that is what is going on or whether I am making up excuses for my own lack of planning and follow-through.
A few examples.
When I was in grad school (before flunking out), I rotated in a bioinformatics lab where my attempts to pick a project always went in circles. If I suggested an experiment to the PI, the first question I needed to answer before continuing was "What data do you want to use to answer this question?" That sounds perfectly reasonable, except that when I asked for access to the databases that would contain such data, the response was "What question do you want to answer with access to this data?" Thus, I could never settle on a project, because I had no way of knowing which questions I was actually able to address. What was going on there? Did the PI simply not like me? Did I need to be more clever about social politics in order to discover the necessary information through informal channels? Or do I lack some basic research skill that would have vaulted me past this predicament?
In another lab, the PI wanted me to do some work before a deadline, but to do the work I needed access to some additional datasets. These had supposedly already been prepared, and so were not factored into the estimate of how long the project would take. However, when I asked my coworker for the data, they sent me the wrong data, and it took several rounds of back-and-forth before I got the file I needed, by which time I was unable to complete the main assignment by the deadline. The result was being let go from the lab and flunking out of grad school. Again, it's hard for me to determine where I failed in my responsibilities and whether I was undermined by colleagues. How much of the job of a data scientist is simply being resilient against, and planning for, unhelpful coworkers? How would one explain such a situation to a boss without giving the impression of trying to blame others for one's own failings?
In a third project, not in academia and not science related, I am being asked to do some analyses on a dataset that contains a lot of numbers whose origin and meaning I don't know. According to the person who assigned me the task, these numbers are not relevant to me and I don't need to understand them. I'm not so sure that is the case, and I have repeatedly asked that these numbers be explained to me, to no avail. I'm trying to work with what I was given, but it drives me insane to have to ignore data without even knowing what exactly it is. Similarly, the datasets keep getting moved to archive. While there are always datasets of a similar format that I can test my code on, I am not able to consistently work with one of them and keep using it as a reference. Instead I have to use the most current data. The result is that if I notice a potential issue with a dataset, I need to get a response from my higher-ups before that dataset is moved to archive. Otherwise I have to find a new example in the most current data, and hope I get a response about that one before it, too, goes to archive. Of course, in theory, since all these datasets are formatted the same way, it shouldn't matter which data I am using; my code should work equally well. But I still want to be able to stick with one example until my question is answered, rather than constantly documenting new ones.
So these are the questions I need to figure out answers to:
1) How common is this sort of behavior in the data science world?
2) Are there good reasons for it, or is it just how people assert power in this space?
3) Is the problem most likely my approach to data, or my approach to people?
Based on your experience and my telling of the situations, what do you think is most likely?
Topic career
Category Data Science