How to run an unmodified Python program on a GPU server with scheduled GPUs?

Say I have one server with 10 GPUs. I have a Python program which detects the available GPUs and uses all of them.

I have a couple of users who will run Python (machine learning or data mining) programs that use the GPUs.

I initially thought of using Hadoop, since YARN is good at managing resources, including GPUs, and it offers several scheduling strategies, such as fair, FIFO, and capacity scheduling.

I don't want hard-coded rules, e.g. user1 can only use gpu1 and user2 can only use gpu2.

I later found that Hadoop seems to require programs to be written in the MapReduce pattern, but my requirement is to run the code unmodified, just as we run it on Windows or a local desktop, or with as few modifications as possible.

What should I look into for running and scheduling Python programs on a machine with multiple GPUs?

Topic gpu cloud-computing apache-hadoop clustering

Category Data Science


A popular solution for job management in GPU environments is SLURM.

SLURM lets you specify the resources a job needs (e.g. 2 CPUs, 2 GB of RAM, 4 GPUs), and the job is scheduled for execution once those resources become available.
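As a concrete illustration, here is a minimal sketch of a batch script requesting exactly the resources from the example above. The job name, the program name (train.py), and the log file pattern are hypothetical, and it assumes the cluster's GRES configuration declares the GPUs (which is how SLURM tracks and allocates them):

```bash
#!/bin/bash
# Minimal SLURM batch script (hypothetical names; assumes GPUs are
# registered as a GRES resource in the cluster configuration).
#SBATCH --job-name=ml-train
#SBATCH --cpus-per-task=2      # 2 CPUs
#SBATCH --mem=2G               # 2 GB of RAM
#SBATCH --gres=gpu:4           # 4 of the server's 10 GPUs
#SBATCH --output=ml-train.%j.log

# With GPU GRES configured, SLURM sets CUDA_VISIBLE_DEVICES for the
# job, so an unmodified Python program that "detects available GPUs"
# will only see, and therefore only use, the 4 it was granted.
python train.py
```

This is also what makes SLURM a good fit for the "unmodified code" requirement: the scheduling happens around the program, not inside it.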

A job can be any program or script.
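For instance, a script like the sketch above (here called job.sh, a placeholder name) is submitted and monitored with SLURM's standard commands:

```bash
sbatch job.sh      # queue the job; prints the assigned job ID
squeue -u $USER    # list your pending and running jobs
scancel <jobid>    # cancel a queued or running job if needed
```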
