When would C become necessary for data analysis or data management?

I use Python in my day-to-day work as a research scientist and I am interested in learning C. In what situations would Python prove insufficient for manipulating data?



Most of the common libraries you would use for data manipulation (NumPy, Pandas, SciPy, Scikit-Learn) actually use C (or C++, or Fortran) under the hood.
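To see what "C under the hood" buys you in practice, here is a minimal sketch comparing a pure-Python loop with the equivalent NumPy call, which dispatches to a compiled loop internally:

```python
import numpy as np

# Summing a large sequence: the pure-Python sum() runs interpreted
# bytecode per element, while arr.sum() executes a compiled C loop
# inside NumPy. Both produce the same result.
data = list(range(1_000_000))
arr = np.array(data)

py_total = sum(data)       # interpreted, element by element
np_total = int(arr.sum())  # delegated to compiled code

assert py_total == np_total  # → both equal 499999500000
```

On arrays of this size the NumPy version is typically one to two orders of magnitude faster, which is exactly the speedup you would otherwise write C to get.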

There are even libraries such as CuPy, which implements a large part of the NumPy API but runs your code on a GPU. In my experience, using GPUs for speed is a much more common approach these days than writing the C/C++ version yourself.
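Because CuPy mirrors the NumPy API, a common pattern is to import whichever backend is available under a single alias and write the numeric code once. A minimal sketch (the `xp` alias is a convention, not a requirement):

```python
# Use CuPy when it is installed (and a CUDA GPU is available),
# otherwise fall back to NumPy on the CPU. The array code below
# is identical for both backends.
try:
    import cupy as xp
except ImportError:
    import numpy as xp

x = xp.linspace(0.0, 1.0, 5)   # [0.0, 0.25, 0.5, 0.75, 1.0]
y = float((x ** 2).sum())      # sum of squares
print(y)                       # → 1.875
```

This lets the same analysis script run on a laptop without a GPU and on a CUDA machine with one, with no code changes.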



In my opinion, you might need to "do it yourself" in one of three cases:

1. Speed

You need it to run faster than current libraries allow, e.g. if the clustering algorithms in Scikit-Learn are too slow for your data.

2. Memory

You need to use less memory than existing implementations, perhaps a specific method on your Pandas DataFrame uses more memory than you have available.

3. New Algorithms

You need something that is fairly fast or very low-level, and no existing library offers it. Before rolling your own compiled extension, though, I would normally suggest prototyping the idea in NumPy first.
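When you do reach the point of writing C, you don't always need a full extension module; Python's standard `ctypes` module can call into a compiled shared library directly. As a minimal illustration, here is a sketch that calls a C function (`sqrt` from the C math library) from Python; loading your own compiled `.so`/`.dll` works the same way. The library lookup assumes a typical Linux or macOS system:

```python
import ctypes
import ctypes.util

# Load the C math library. find_library("m") may return None on some
# systems; CDLL(None) then falls back to symbols already loaded into
# the current process (which include libm on most platforms).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double)
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

result = libm.sqrt(2.0)
print(result)  # → 1.4142135623730951
```

For anything beyond a handful of functions, tools like Cython or pybind11 are usually a more maintainable bridge than raw `ctypes`, but the idea is the same: the hot loop lives in compiled code, and Python orchestrates it.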
