I want to learn how to construct data science packages on top of core packages. Is there a list of excellent data science packages I can learn from?

Short question

I want to learn how to construct data science packages on top of core packages. Is there a list of excellent data science packages I can learn from?

Long question

I recently came across an excellent video in which Joel Grus live-codes a neural network library in Python. As an inexperienced data scientist without a software engineering background, this was the first time I had seen a "complete" data science package built from scratch.

My data analysis code up to this point has been very straightforward: I split it into functions only here and there, and never into modules, even where that might have been beneficial. I have since tried to find more resources to teach myself this skill. The Hitchhiker's Guide to Python has been useful.

I gather that it is important to learn by reading excellent packages. The Hitchhiker's Guide does suggest some Python packages, but none that I could find were within data science (like Joel Grus's neural network library).

I am not asking for a list of the core data science packages of a language, e.g., numpy, scipy, sklearn, pandas, etc. Rather, I am asking for a list of packages built on top of those.

Does such a list already exist? If not, can (should) we create it here?

EDIT: In hindsight, sklearn is possibly the perfect example of what I'm looking for, since it's built on top of numpy and scipy.

