PyTorch Distributed Computing - Recommendations/Resources/Courses?

Question

Mason Acree

April 29, 2020 02:25

I would like to get into distributed computing for processing PyTorch CNN models. I am completely new to this field and would like recommendations on where to start researching and learning distributed computing techniques, specifically for deep learning.

My motivation is that I have access to a lot of personal Windows 10 desktops with great hardware, a few Ubuntu Linux machines of my own, and my personal desktop, which is built with great hardware for heavy workloads. Until now I have only run separate scripts manually on the other machines via RDP or SSH and let them compute for a few hours to a few days. I am now at the point where it would make sense to combine all these resources, so that when I have a large dataset to generate or a PyTorch model to train, I can use 2-10x as many resources to run a single script collectively. (One heavy task I have in mind for the future is Neural Architecture Search (NAS), which seems simpler to parallelize across machines for each generation.)

My background is heavily Python and ML/DL focused, mainly through PyTorch. I also have hobbyist-level experience in networking, Linux VMs, dedicated servers, and computer hardware.

Other ideas: I am not familiar with them, but I am not sure whether other buzzwords such as grid computing or setting up a computer cluster would fit this scenario.

This is new to me but a long-term interest. Point me in the right direction! Better forums, courses, articles, books, etc.!

Edit 1: This could be a resource or recommendation for something PyTorch-specific, such as the PyTorch distributed computing tutorial pages, or something general for distributing any Python script. Ex: compute xx terabytes of spectrograms across 10 machines with a single call of a script.

Topic pytorch deep-learning distributed parallel

Category Data Science
