How to run an unmodified Python program on a GPU server with scheduled GPUs?

Say I have one server with 10 GPUs. I have a Python program which detects the available GPUs and uses all of them.

I have a couple of users who will run Python (machine learning or data mining) programs that use the GPUs.

I initially thought of using Hadoop, since YARN is good at managing resources, including GPUs, and it offers several scheduling strategies, such as fair, FIFO, and capacity scheduling.

I don't want hard-coded rules, e.g. user1 can only use gpu1 and user2 can only use gpu2.

I later found that Hadoop seems to require programs to be written in the MapReduce pattern, but my requirement is to run the code unmodified, just as we run it on Windows or a local desktop, or with as few modifications as possible.

What should I look into for running and scheduling Python programs on a machine with multiple GPUs?

Topic gpu cloud-computing apache-hadoop clustering

Category Data Science


A popular solution for job management in GPU environments is SLURM.

SLURM lets you specify the resources a job needs (e.g. 2 CPUs, 2 GB of RAM, 4 GPUs), and the job is scheduled for execution once those resources become available.
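As a concrete illustration, here is a minimal sketch of a batch script requesting exactly the resources from the example above. The job name, the program name (train.py), and the log file pattern are hypothetical, and it assumes the cluster's GRES configuration declares the GPUs (which is how SLURM tracks and allocates them):

```bash
#!/bin/bash
# Minimal SLURM batch script (hypothetical names; assumes GPUs are
# registered as a GRES resource in the cluster configuration).
#SBATCH --job-name=ml-train
#SBATCH --cpus-per-task=2      # 2 CPUs
#SBATCH --mem=2G               # 2 GB of RAM
#SBATCH --gres=gpu:4           # 4 of the server's 10 GPUs
#SBATCH --output=ml-train.%j.log

# With GPU GRES configured, SLURM sets CUDA_VISIBLE_DEVICES for the
# job, so an unmodified Python program that "detects available GPUs"
# will only see, and therefore only use, the 4 it was granted.
python train.py
```

This is also what makes SLURM a good fit for the "unmodified code" requirement: the scheduling happens around the program, not inside it.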

A job can be any program or script.
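For instance, a script like the sketch above (here called job.sh, a placeholder name) is submitted and monitored with SLURM's standard commands:

```bash
sbatch job.sh      # queue the job; prints the assigned job ID
squeue -u $USER    # list your pending and running jobs
scancel <jobid>    # cancel a queued or running job if needed
```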
