Ansible Role: marvel-nccr.slurm

An Ansible role that installs the slurm workload manager on Ubuntu (tested on 16.04, 18.04 and 20.04).

Note: This role configures the machine on which slurm runs to also act as the compute node (it therefore includes, for example, autodetection of the number of CPUs on that machine). The role is not yet designed to set up a compute cluster with multiple nodes; for that, see tools such as https://github.com/elasticluster/elasticluster.

The role:

  • Installs the slurm packages
  • Sets up the slurm configuration (/etc/slurm-llnl/slurm.conf) to dynamically use the correct platform resources (hostname, #CPUs, etc), configuring one node (named $HOSTNAME) and one partition (named slurm_partition_name).
  • Adds a slurm-resources script and start-up service that automatically set the correct platform resources at boot (required when creating a VM image whose instances may have different resources)
  • Starts the slurm services.
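
The configuration step above can be pictured with a short shell sketch. This is not the role's actual template or its slurm-resources script; it only illustrates, under stated assumptions, how one node line and one partition line in slurm.conf syntax could be derived from the hostname and CPU count. The function name render_slurm_lines and the partition name "debug" are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: NOT the role's real template.
# Print one node and one partition definition in slurm.conf syntax.
# Arguments: $1 = node name, $2 = CPU count, $3 = partition name.
render_slurm_lines() {
    printf 'NodeName=%s CPUs=%s State=UNKNOWN\n' "$1" "$2"
    printf 'PartitionName=%s Nodes=%s Default=YES MaxTime=INFINITE State=UP\n' \
        "$3" "$1"
}

# Detect the resources of the current machine, with fallbacks in case
# HOSTNAME is unset or nproc is unavailable.
node_name="${HOSTNAME:-$(uname -n)}"
cpu_count="$(nproc 2>/dev/null || echo 1)"

# "debug" is a placeholder partition name, not the role's default.
render_slurm_lines "$node_name" "$cpu_count" "debug"
```

Because the values are detected at run time, the same script yields different lines on machines with different resources, which is the situation the start-up service is meant to handle.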

To check that the services are running (assuming systemd is in use):

$ systemctl --type=service
...
slurmctld.service                  loaded active running Slurm controller daemon
slurmd.service                     loaded active running Slurm node daemon
...

To check the slurm node/partition:

$ scontrol show node
$ scontrol show partition

The reported resources should match those given by lscpu.
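
The comparison can also be scripted. The following is a minimal sketch, assuming the output of scontrol show node contains a CPUTot= field and that nproc reports the same logical CPU count as lscpu (it may differ if CPU affinity or container limits apply). The function name cpus_from_scontrol is hypothetical.

```shell
#!/bin/sh
# Extract the first CPUTot value from `scontrol show node` output on stdin.
cpus_from_scontrol() {
    sed -n 's/.*CPUTot=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Only run the comparison where slurm is actually installed.
if command -v scontrol >/dev/null 2>&1; then
    slurm_cpus="$(scontrol show node | cpus_from_scontrol)"
    machine_cpus="$(nproc)"
    if [ "$slurm_cpus" = "$machine_cpus" ]; then
        echo "OK: slurm and the machine both report $machine_cpus CPUs"
    else
        echo "Mismatch: slurm reports $slurm_cpus CPUs, machine has $machine_cpus"
    fi
fi
```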

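To enable/disable the slurm-resources start-up service:

$ systemctl enable slurm-resources   # run the service at start-up
$ systemctl disable slurm-resources  # do not run the service at start-up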

To alter slurm's resource configuration directly, you can run, for example:

$ slurm-resources -e restart_on_change=true -e slurm_max_cpus=2

This will update the resources defined for the node, set the maximum CPUs for the partition to 2 (independent of the CPUs available on the node), and restart the slurm services with the updated configuration (if the configuration has changed).

Installation

ansible-galaxy install marvel-nccr.slurm

Role Variables

See defaults/main.yml

Example Playbook

- hosts: servers
  roles:
  - role: marvel-nccr.slurm

Development and testing

This role uses Molecule and Docker for tests.

After installing Docker:

Clone the repository into a folder named marvel-nccr.slurm (the folder must have the same name as the role on Ansible Galaxy):

git clone https://github.com/marvel-nccr/ansible-role-slurm marvel-nccr.slurm
cd marvel-nccr.slurm

Then run:

pip install -r requirements.txt  # Installs molecule
molecule test  # runs tests

or use tox (see tox.ini):

pip install tox
tox

Code style

Code style is formatted and linted with pre-commit.

pip install pre-commit
pre-commit run --all-files

Deployment

Deployment to Ansible Galaxy is automated via GitHub Actions. Tag a release vX.Y.Z to initiate the CI and release workflow. Note that the release will only complete if the CI tests pass.
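
As a rough sketch of the tagging step, the helper below checks that a tag has the vX.Y.Z form before creating and pushing it. The function names is_release_tag and tag_release are hypothetical, and the sketch assumes that the remote is called origin and that pushing the tag is what triggers the workflow; check the repository's workflow files for the actual trigger.

```shell
#!/bin/sh
# Return success if the argument looks like vX.Y.Z (for example v1.2.3).
is_release_tag() {
    printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Create the tag locally and push it to the remote named "origin"
# (the remote name is an assumption).
tag_release() {
    is_release_tag "$1" || { echo "not a vX.Y.Z tag: $1" >&2; return 1; }
    git tag "$1" && git push origin "$1"
}
```

It would be called as tag_release v1.2.3 from inside the cloned repository.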

License

MIT

Contact

Please direct inquiries regarding Quantum Mobile and associated Ansible roles to the AiiDA mailing list.
