Feeling Stuck on a Beginner – Intermediate level

Question

Lucas

June 2, 2021 04:40

Over the past two years, I have been working as a full-time data scientist for a government company. As the sole data science team in the organization, our job is a hybrid between data science and machine learning engineering. We need to research and develop ML solutions for the organization's business problems as well as implement them in production environments. The problem is that I'm feeling stuck knowledge-wise and I don't know what I can do about it. Let me explain.

I have a major in computer science (B.Sc). Although I took some AI/ML courses during my major, I would attribute most of my data science education to the wonderful book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. During these past two years in the organization, I have gained a lot of experience in the field: I managed to deliver some fair, but far from perfect, ML solutions for a couple of the organization's business problems.

But alas, I still feel like I'm missing a big piece of the puzzle, and it is holding me back. I feel stuck somewhere between a beginner and an intermediate data scientist. I know all about the basic ML models and their basic intuitions and algorithms. I know the basics of deep learning and how to implement them in keras/tensorflow/pytorch. I know about CNNs, RNNs, and other basic deep learning architectures. I'm pretty proficient with pandas, numpy, and all the other common data preprocessing/wrangling/visualization libraries. And yet, despite all of that, I can't shake the feeling that I'm missing something important: the thing that would have made the difference on the previous ML problems I worked on, and that differentiates a professional data scientist from me. Sometimes I feel like, for lack of a better term, a 'stack overflow' data scientist. I mean, with every problem it's the same: I preprocess the data a bit (nothing too fancy or advanced), I try a couple of basic ML models (usually random forests/gradient boosting work best), and then I try to see if I can get better results with a deep learning approach. Finally, I do some hyperparameter optimization and start the process of implementing the model in production.

I know the primary suspect is my not-so-great math/statistics knowledge, but is it really? Obviously, I know the basic math behind the models (not that I see it as really critical at this point) and I know the basic concepts in statistics. Will improving either of these areas really make me a better data scientist in my day-to-day work? Because honestly, I don't think this is the answer. I'm not looking to do a master's in computer science. I'm looking more for some useful books, online courses, or anything else that might help.

To sum it up: how can I 'escape' this beginner area and become a next-level data scientist/ML engineer? One who can bring something unique to the table, beyond doing the basic and obvious stuff for each problem.

I would really appreciate any advice on this. Thanks in advance.

Topic books self-study deep-learning machine-learning

Category Data Science

yoav · Accepted Answer · June 2, 2021 04:40

I agree with what the previous answers stated, and would add one thing: being a professional data scientist, or a professional at all, means being able to deliver results efficiently. In this context, that can mean several things:

Knowing which data is relevant and whether more is needed (or would be useful) and, if so, how it can be collected.
Knowing whether the problem requires ML or whether a heuristic-based approach will be enough. Many of us fail here; professionals use the right tool.
Managing the project end to end, by knowing which uncertainties should be tackled first.
Timing the project and breaking it down into milestones/steps. Although the process is mostly similar between projects, the uncertainty a dataset brings can make this hard.
Designing a sustainable solution that won't require everlasting maintenance.
Communicating all of the above to your peers and stakeholders.

These are things that may come with experience and that can be learnt from others' experiences.
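
To make the second point in the list above concrete (checking whether a heuristic is enough before reaching for ML), here is a minimal sketch in plain Python. Everything in it is an assumption made up for illustration: the toy data, the 10% label noise, the threshold of 60, and the helper names `make_toy_data`, `majority_baseline` and `threshold_heuristic`. The idea is only that a trivial baseline and a one-line rule give you numbers that any model has to beat clearly before it is worth building and maintaining.

```python
import random


def make_toy_data(n=1000, seed=0):
    """Hypothetical data: one numeric feature x in [0, 100).

    The label is 1 when x > 60, with 10% of labels flipped as noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 100)
        label = 1 if x > 60 else 0
        if rng.random() < 0.1:  # simulate label noise
            label = 1 - label
        rows.append((x, label))
    return rows


def accuracy(predict, rows):
    """Fraction of rows where predict(x) matches the label."""
    correct = sum(1 for x, y in rows if predict(x) == y)
    return correct / len(rows)


def majority_baseline(train_rows):
    """Always predict the most common label seen in the training rows."""
    ones = sum(y for _, y in train_rows)
    majority = 1 if ones * 2 >= len(train_rows) else 0
    return lambda x: majority


def threshold_heuristic(threshold):
    """A one-line domain rule: predict 1 when the feature exceeds a threshold."""
    return lambda x: 1 if x > threshold else 0


rows = make_toy_data()
train, test = rows[:700], rows[700:]

baseline_acc = accuracy(majority_baseline(train), test)
heuristic_acc = accuracy(threshold_heuristic(60), test)

print(f"majority baseline accuracy: {baseline_acc:.2f}")
print(f"threshold heuristic accuracy: {heuristic_acc:.2f}")
```

On this toy data the rule is, by construction, close to the best any model could do, since the remaining errors come from the injected noise. On a real problem the comparison would be run on held-out data, and the gap between the heuristic and a trained model is what tells you whether the model earns its maintenance cost.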

Erwan · Accepted Answer · May 31, 2021 08:22

There's a good chance that your question will be closed, I'm afraid, but here are a few thoughts:

"would differentiate a professional data scientist from me"

A professional data scientist is somebody who does data science for a living, so you definitely belong to the club, congrats!

Seriously though, you seem to have at least some symptoms of impostor syndrome: your level is appropriate, you're able to do your job, yet you feel inadequate. The usual advice on AcademiaSE (it's very common in academia) is to deal with the psychological aspect, optionally with some professional help.

Now about the myth of "the real professional data scientist": data science has become vast and specialized. There's not even a clear definition of the scope of data science, let alone a shared understanding of which knowledge/skills a data scientist should have. Additionally, the field changes very fast, so it's humanly impossible to know everything.

What people usually recommend is to gain as much experience as possible, especially in your case, since it looks like you have already covered the theory fairly well. You can just pick a topic you'd like to dig deeper into and go for it.

For the record, I find browsing and answering questions on DataScienceSE a very good way to keep up, discover things that I didn't know, and progress. Why is answering useful, you ask? Because it forces me to (1) understand the problem and think about how I would address it, which is an intellectual ML design exercise that is always good to practice, and (2) explain things in a clear way, which is always a good exercise to check how clear things are in my mind.

Sammy · Accepted Answer · May 31, 2021 07:01

Just doing another online course, reading some stuff online, or adding stats knowledge to fill your self-perceived gaps is unlikely to solve anything. On the one hand, almost all online classes teach only the basics and probably won't add anything for you. On the other hand, additional stats knowledge or fancy stuff like self-driving cars is often of limited practical relevance.

My general advice is to start with the goal in mind. That is, think about your career and personal development path first and then find the right building blocks to support that path. A mentor or good career coach(*) can help with this. Similarly, I'd discuss with your employer what your development plan is.

Having said that, and generally speaking, I think digging into recent research papers in relevant areas or participating in data science competitions are two (not mutually exclusive) potential paths to pursue. However, beware that current research might be of limited practical relevance to your work, although it brings you up to speed with regard to state-of-the-art methods. Competitions, while strongly focused on feature engineering and models, are highly practical and can add something to your portfolio/CV.

(*) Beware that the coaching market has been flooded with many poorly trained amateurs.
