Creating & handling large matrices in Python?

I need to create a large matrix of size 400,000 × 400,000 and apply some transformations to it. I am not able to do this with Python on my laptop due to memory constraints. What technologies can I use to achieve this?

Topic: matrix, feature-engineering, tfidf

Category: Data Science


You can use a Dask array, which lets you create an array of essentially any size and apply transformations to it. Under the hood, it loads one chunk of the array at a time into memory, processes it, and then writes the result out. Check the following link:

https://examples.dask.org/array.html
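As a rough sketch of what that looks like (the chunk size and the use of random data here are just placeholders, not part of the original answer):

```python
import dask.array as da

# 400,000 x 400,000 float64 would be ~1.28 TB dense; Dask never materializes
# the whole thing. The chunk size (10,000 x 10,000) is an illustrative choice.
x = da.random.random((400_000, 400_000), chunks=(10_000, 10_000))

# Operations are lazy: this builds a task graph instead of allocating memory.
y = (x - x.mean()) ** 2

# Either reduce to something small before calling .compute(), or stream the
# full result to disk chunk by chunk, e.g. y.to_zarr("out.zarr").
# Note: a full reduction still processes every chunk, so it takes time.
total = y.sum().compute()
print(total)
```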


I don't know whether this is for work or a personal project, but cloud services can help you. Create a free account on Azure (you get $200 of free credit for the first month, and the account itself is free for 1 year).

You can run your project there using Azure Machine Learning, AutoML, or the Python SDK, whichever you choose. Use the free $200 credit within the first month for any large-scale project that needs a lot of compute or memory. The Azure documentation gives an idea of the available compute and memory options.


Are many of the entries in the matrix zero? If so, you can often deal with large matrices without using large amounts of memory: sparse matrix data structures exist, and so do algorithms for doing arithmetic with them.

SciPy includes support for sparse matrices: https://docs.scipy.org/doc/scipy/reference/sparse.html
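For example, a minimal sketch with `scipy.sparse`, assuming the matrix really is mostly zeros (a tf-idf term-document matrix usually is); the density and random values below are placeholders for illustration:

```python
import numpy as np
from scipy import sparse

n = 400_000
rng = np.random.default_rng(0)

# Build from (row, col, value) triplets in COO format, then convert to CSR
# for fast arithmetic and row slicing. Only the non-zeros are stored.
nnz = 1_000_000                       # ~0.0006% density, purely illustrative
rows = rng.integers(0, n, nnz)
cols = rng.integers(0, n, nnz)
vals = rng.random(nnz)
A = sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

print(A.shape, A.nnz)                 # (400000, 400000), ~1e6 stored values

# Typical transformations stay sparse or produce small dense results:
B = A.multiply(2.0)                   # element-wise scaling, still sparse
row_sums = np.asarray(A.sum(axis=1))  # dense 400,000 x 1 vector, small
v = rng.random(n)
y = A @ v                             # sparse matrix-vector product
```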


You can also buy GPUs, which can help make up for limited memory and speed up large computations.

Cloud services will help as well, but the variable costs can get high if your goal is to work with high-dimensional matrices.
