Python distributed machine learning

I occasionally train neural nets for my research, and they usually take quite a long time to run (especially when I'm working on my laptop).

I'm looking for a way to build the model on any computer and send it up to a server for training and have it return the graphs/accuracies/weights etc. I know there are paid solutions for this but I'm looking for a distributed solution I can run myself.

I have a server set up at home which is about to get a CPU and GPU upgrade. I'd like to set things up so that, whether I'm working on the LAN or remotely on my laptop, I can send code to the server, have it train the model, and get the results back (or have the server save the results if the sending machine is switched off).

Are there any existing solutions that accomplish something like this? I'm not tied to any specific library, but I'd prefer to stick with Python if possible.

Topic: neural-network, python, distributed, machine-learning

Category: Data Science


Sounds like you want something like Apache Spark with its Python API (PySpark). It is designed to run locally, on a single server, or in a distributed way, and its "distributedness" is hidden from you as much as possible.
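
As a rough, hedged sketch (the cluster URL spark://myserver:7077 below is just a placeholder), the same PySpark training script can run on your laptop or on your server simply by changing the master URL:

    # Minimal PySpark sketch; "spark://myserver:7077" is a hypothetical cluster URL.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import MultilayerPerceptronClassifier
    from pyspark.ml.linalg import Vectors

    # "local[*]" uses all cores on this machine; point it at the cluster URL
    # instead to run the exact same job distributed.
    spark = (SparkSession.builder
             .master("local[*]")          # or "spark://myserver:7077"
             .appName("toy-training")
             .getOrCreate())

    # Tiny toy dataset: (label, feature vector) rows.
    train = spark.createDataFrame(
        [(0.0, Vectors.dense([0.0, 1.1])),
         (1.0, Vectors.dense([2.0, 1.0])),
         (1.0, Vectors.dense([2.0, 1.3]))],
        ["label", "features"])

    # Small feed-forward network: 2 inputs -> 3 hidden units -> 2 output classes.
    mlp = MultilayerPerceptronClassifier(maxIter=100, layers=[2, 3, 2], seed=1)
    model = mlp.fit(train)
    print(model.weights)

    spark.stop()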

It has a big community: I took this MOOC in summer 2015 together with about 80,000 other participants, and it was a great introduction to the topic. The course is still open for enrollment, although it is no longer maintained by the instructors.


This shouldn't be terribly complicated.

Big picture

Assuming you have Linux on your server: SSH to your server from your work laptop, train the network, receive results (the trained network).
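
A rough sketch of that workflow from the laptop side (assuming passwordless SSH keys are set up; the hostname "myserver" and the file names are placeholders):

    # Hedged sketch: push a training script to the server, start it detached,
    # and pull the results back later. Hostname and paths are placeholders.
    import subprocess

    SERVER = "me@myserver"

    # 1. Copy the training script (and data) to the server.
    subprocess.run(["scp", "train.py", SERVER + ":~/jobs/train.py"], check=True)

    # 2. Start training under nohup so it keeps running if the laptop disconnects.
    subprocess.run(
        ["ssh", SERVER,
         "nohup python ~/jobs/train.py > ~/jobs/train.log 2>&1 &"],
        check=True)

    # 3. Later (from the LAN or remotely), pull the saved results back down.
    subprocess.run(["scp", SERVER + ":~/jobs/weights.pkl", "weights.pkl"],
                   check=True)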

Details

Train the network

If you want to stick with Python, there are two basic options that I am familiar with.

The first is PyBrain, a library built specifically for training neural nets. The syntax is fairly straightforward. The dataset structure is a bit unusual (especially if you are used to plain numpy arrays as in scikit-learn), but otherwise it works rather well. However, it doesn't support GPUs as far as I know, and I can't say how optimized or fast it is.
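
For illustration, a minimal PyBrain sketch on toy XOR-style data (written from memory of its documented API, so treat it as a starting point rather than a recipe):

    # Minimal PyBrain example: 2-3-1 network trained with backprop on toy data.
    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.datasets import SupervisedDataSet
    from pybrain.supervised.trainers import BackpropTrainer

    net = buildNetwork(2, 3, 1)          # 2 inputs -> 3 hidden units -> 1 output

    # PyBrain uses its own dataset class rather than plain numpy arrays.
    ds = SupervisedDataSet(2, 1)
    ds.addSample((0, 0), (0,))
    ds.addSample((0, 1), (1,))
    ds.addSample((1, 0), (1,))
    ds.addSample((1, 1), (0,))

    trainer = BackpropTrainer(net, ds)
    for epoch in range(100):
        error = trainer.train()          # one epoch of backprop; returns the error
    print("final error:", error)
    print(net.activate((0, 1)))          # run the trained network on one input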

The second is Google's TensorFlow. It's a bit heavyweight if you just want to run vanilla neural nets, but the syntax is also pretty easy to pick up (although very different from PyBrain's). It's also probably faster than anything else you could find in Python, and it supports GPU training.
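
As a hedged illustration, here is a minimal fully connected network using the tf.keras API bundled with recent TensorFlow releases (layer sizes, data, and hyperparameters are arbitrary placeholders):

    # Minimal tf.keras sketch; if a GPU is visible to TensorFlow, training
    # runs on it automatically.
    import numpy as np
    import tensorflow as tf

    # Toy data: 100 samples with 4 features, binary labels.
    X = np.random.rand(100, 4).astype("float32")
    y = (X.sum(axis=1) > 2).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    model.fit(X, y, epochs=5, batch_size=16, verbose=0)
    print(model.evaluate(X, y, verbose=0))   # [loss, accuracy]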

Get the results back

This depends on the library you choose (or you can, of course, write your own implementation). Both libraries mentioned above let you read out the trained weights, which you can save as CSV files and download once training finishes, or you could pickle them and get them back that way. TensorFlow also has TensorBoard, which should let you visualize the training and the network structure, but I haven't tried it yet, so I can't help on that front.
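
Continuing the tf.keras sketch above (assuming model is the trained network; the file names are arbitrary), saving the weights on the server so they can be copied back later might look like this:

    # Hedged sketch: dump the trained weights to disk, then fetch the files,
    # e.g. with scp as in the SSH sketch above.
    import pickle
    import numpy as np

    weights = model.get_weights()        # list of numpy arrays

    # Option 1: pickle everything into a single file.
    with open("weights.pkl", "wb") as f:
        pickle.dump(weights, f)

    # Option 2: one CSV per weight array, readable from anywhere.
    for i, w in enumerate(weights):
        np.savetxt("weights_%d.csv" % i, np.atleast_2d(w), delimiter=",")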
