Ways to speed up Python code for data science purposes
Although it might sound like a purely technical question, I would like to know which approaches you usually try, for typical data-science workloads, when you need to speed up your code (assuming that data retrieval is not the bottleneck and that the data fits in memory). Some of these could be the following, but I would welcome feedback on any others:
- good practices such as using NumPy vectorized operations for numeric work instead of plain Python loops (sketch below)
- more good practices such as using `apply`, `applymap`, etc. instead of explicit loops when applying functions to the elements of lists, DataFrames, etc. (sketch below)
- Numba applied to native Python loops over NumPy arrays (sketch below)
- multiprocessing with the `multiprocessing` library, sized to the number of available logical cores (sketch below)
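For the first point, this is the kind of rewrite I have in mind: a minimal sketch comparing a plain Python loop with the equivalent vectorized NumPy expression (the function and array sizes are made up for illustration):

```python
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Pure-Python loop: every element access and addition goes through the interpreter.
def ssd_loop(a, b):
    total = 0.0
    for i in range(len(a)):
        total += (a[i] - b[i]) ** 2
    return total

# Vectorized NumPy version: the same arithmetic runs in compiled code.
def ssd_numpy(a, b):
    return np.sum((a - b) ** 2)
```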
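For the second point, a small sketch of what I mean by `apply`/`applymap` on a DataFrame (the column names are invented); I am aware these are still Python-level loops under the hood, so when a fully vectorized expression exists it is usually faster still:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100_000, 3), columns=["a", "b", "c"])

# Row-wise apply instead of an explicit loop over df.index.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Element-wise applymap over the whole DataFrame.
logged = df.applymap(np.log1p)

# Fully vectorized equivalents, usually faster than apply/applymap.
row_sums_fast = df["a"] + df["b"]
logged_fast = np.log1p(df)
```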
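For Numba, a minimal sketch of decorating the same made-up loop with `@njit` so it gets JIT-compiled:

```python
import numpy as np
from numba import njit

@njit
def ssd_numba(a, b):
    # Same loop as above, but compiled to machine code by Numba.
    total = 0.0
    for i in range(a.shape[0]):
        total += (a[i] - b[i]) ** 2
    return total

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)
ssd_numba(a, b)  # first call includes compilation; subsequent calls are fast
```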
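And for multiprocessing, a minimal sketch of spreading a CPU-bound function over the logical cores with a process pool (`heavy_task` is just a placeholder for real work):

```python
import multiprocessing as mp

def heavy_task(n):
    # Placeholder for a CPU-bound computation (e.g. fitting one model per data chunk).
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [10_000_000] * 8
    # One worker process per logical core.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        results = pool.map(heavy_task, inputs)
    print(results[:2])
```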
This is motivated by the fact that, since we mainly use Python with all its advantages, we do not want to switch to other languages like Scala or Julia unless there is no alternative.
Topic: python, efficiency, scalability
Category: Data Science