When would C become necessary for data analysis or data management?

I use Python in my day-to-day work as a research scientist and I am interested in learning C. In what situations would Python prove insufficient for manipulating data?



Most of the common libraries you would use for data manipulation (NumPy, Pandas, SciPy, Scikit-Learn) actually use C (or C++, or Fortran) under the hood.
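To see what "C under the hood" buys you in practice, here is a minimal sketch comparing a pure-Python loop with the equivalent NumPy call, which dispatches to a compiled loop internally:

```python
import numpy as np

# Summing a large sequence: the pure-Python sum() runs interpreted
# bytecode per element, while arr.sum() executes a compiled C loop
# inside NumPy. Both produce the same result.
data = list(range(1_000_000))
arr = np.array(data)

py_total = sum(data)       # interpreted, element by element
np_total = int(arr.sum())  # delegated to compiled code

assert py_total == np_total  # → both equal 499999500000
```

On arrays of this size the NumPy version is typically one to two orders of magnitude faster, which is exactly the speedup you would otherwise write C to get.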

There are even libraries such as CuPy, which implements a large part of the NumPy API but runs your code on a GPU. In my experience, using GPUs for speed is a much more common approach these days than writing the C/C++ version yourself.
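Because CuPy mirrors the NumPy API, a common pattern is to import whichever backend is available under a single alias and write the numeric code once. A minimal sketch (the `xp` alias is a convention, not a requirement):

```python
# Use CuPy when it is installed (and a CUDA GPU is available),
# otherwise fall back to NumPy on the CPU. The array code below
# is identical for both backends.
try:
    import cupy as xp
except ImportError:
    import numpy as xp

x = xp.linspace(0.0, 1.0, 5)   # [0.0, 0.25, 0.5, 0.75, 1.0]
y = float((x ** 2).sum())      # sum of squares
print(y)                       # → 1.875
```

This lets the same analysis script run on a laptop without a GPU and on a CUDA machine with one, with no code changes.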



In my opinion, you might need to "do it yourself" in one of three cases:

1. Speed

You need it to run faster than current libraries allow, e.g. if the clustering algorithms in Scikit-Learn are too slow for your data.

2. Memory

You need to use less memory than existing implementations, perhaps a specific method on your Pandas DataFrame uses more memory than you have available.

3. New Algorithms

You need something that is fairly fast or very low-level, and no existing library offers it. Before rolling your own compiled extension, though, I would normally suggest prototyping the idea in NumPy first.
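When you do reach the point of writing C, you don't always need a full extension module; Python's standard `ctypes` module can call into a compiled shared library directly. As a minimal illustration, here is a sketch that calls a C function (`sqrt` from the C math library) from Python; loading your own compiled `.so`/`.dll` works the same way. The library lookup assumes a typical Linux or macOS system:

```python
import ctypes
import ctypes.util

# Load the C math library. find_library("m") may return None on some
# systems; CDLL(None) then falls back to symbols already loaded into
# the current process (which include libm on most platforms).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double)
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

result = libm.sqrt(2.0)
print(result)  # → 1.4142135623730951
```

For anything beyond a handful of functions, tools like Cython or pybind11 are usually a more maintainable bridge than raw `ctypes`, but the idea is the same: the hot loop lives in compiled code, and Python orchestrates it.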
