
Cylc 8 architecture security model and design decisions


Cylc 8 architecture

Several components are involved in the Cylc 8 architecture. These are as follows:

  • Proxy
  • Hub
  • UI Server
  • Workflow hosts (Workflow Service)
  • Job hosts
  • ZeroMQ

Proxy

A configurable HTTP proxy that provides access to the UI Servers.

Hub

Currently an unmodified Jupyter Hub, the Hub exists for the following purposes (a configuration sketch follows the list):

  • Authenticating users and identifying their roles/permissions
  • Re-authenticating users where applicable
  • Spawning UI servers belonging to specific users
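By way of illustration, a Hub deployment might be configured along the following lines. This is a minimal sketch only: the spawner class name is hypothetical, and the authenticator choice and certificate paths are assumptions rather than the actual Cylc 8 configuration.

```python
# jupyterhub_config.py -- minimal sketch, not the actual Cylc 8 configuration.
c = get_config()  # provided by JupyterHub's config loader

# Authenticate against the organisation's identity management (here: PAM).
c.JupyterHub.authenticator_class = 'jupyterhub.auth.PAMAuthenticator'

# Spawn one UI Server per authenticated user, running as that user.
# The class name below is hypothetical, standing in for whatever spawner
# the Cylc UI Server package provides.
c.JupyterHub.spawner_class = 'cylc_uiserver_spawner.CylcUIServerSpawner'

# Terminate browser connections with a site-signed certificate.
c.JupyterHub.ssl_cert = '/etc/pki/tls/certs/hub.example.org.crt'
c.JupyterHub.ssl_key = '/etc/pki/tls/private/hub.example.org.key'
```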

UI Server

A custom UI server, inspired by the Jupyter Notebook, that runs with the permissions of a regular system user and provides the web UI to the user's workflows. UI Servers may be located on the same host as the Hub or on other hosts. One UI Server exists per user. The UI Server:

  • Lists workflows
  • Allows interaction with specific workflows owned by the UI Server owner (stop, start, hold, edit triggers, etc.), both by the UI Server owner and by anyone authenticated with a role that permits that interaction.
  • Provides access to workflow logs
  • Provides 'rose edit' functionality, to allow editing of workflow parameters.

Workflow Service

The workflow host is the host and file system where the workflow files have been installed, and where cylc runs the workflows. A UI Server may have workflows across multiple hosts, but each workflow runs only on one host as the workflow service.

The Workflow Service is the cylc daemon that collects task requirements and runs tasks as required for the workflow, as defined in the workflow's suite.rc.

Job host

Host and file system where a workflow's remote jobs run. A workflow may run jobs on multiple job hosts, as well as background jobs on the same host as the workflow service. Job clients know how to connect to their workflow service.

ZeroMQ

ZeroMQ is used to provide reliable communication between a workflow service and its jobs, and between a workflow and its UI Server. Because messages pass through a messaging queue, they are robust against network interruptions. See more in the CurveZMQ Authentication for Cylc8 guide.
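As a rough illustration of that pattern (not Cylc's actual client code or protocol), a job-side message to its workflow service over a CurveZMQ-encrypted socket might look something like this; the endpoint, key handling and message format are assumptions:

```python
import zmq

# Minimal sketch of a job client sending a status message to its workflow
# service over a CurveZMQ-encrypted REQ/REP socket.
ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)

# Client key pair; in a real deployment the client would load the workflow
# service's *public* key from the workflow's run directory rather than
# generating a stand-in as done here.
client_public, client_secret = zmq.curve_keypair()
server_public, _server_secret = zmq.curve_keypair()  # stand-in for the real server key

sock.curve_publickey = client_public
sock.curve_secretkey = client_secret
sock.curve_serverkey = server_public

sock.connect('tcp://workflow-host.example.org:43001')  # hypothetical endpoint
sock.send_json({'command': 'message', 'task': 'task_a.1', 'status': 'succeeded'})
reply = sock.recv_json()  # would block until the workflow service replies
```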

Architectural considerations

Two primary principles lie behind the decisions made in this architecture:

  1. Workflows have to be able to run as their user, and submit their tasks (including to remote machines).
  2. Users have to be able to find, start, stop and edit all of their own workflows from a single location. Secondarily, users have to be able to find and interact with any other suites for which they have the necessary permissions.

In every case, tried-and-proven technologies have been preferred over custom work, and non-privileged actions over privileged actions. Intra-workflow permissions rely on UNIX file system permissions: the UI Server acts on a workflow as its user, workflows run only as their user, and jobs run only as their user. Only files with the execute bit set for the user can be executed, only files and directories with the write bit set can be written to, and so forth. Inter-workflow permissions rely on authentication at the Hub and authorization at the UI Server.
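To illustrate, any file access the UI Server, workflow service or job attempts is subject to these ordinary permission checks, enforced by the operating system rather than by Cylc; the path below is hypothetical:

```python
import os
import stat

# Illustration only: the OS enforces these checks whenever a process running
# as the user touches a file; nothing Cylc-specific is involved.
path = os.path.expanduser('~/cylc-run/my-workflow/log/suite/log')  # hypothetical path

readable = os.access(path, os.R_OK)   # can this user read the log?
writable = os.access(path, os.W_OK)   # can this user write to it?
mode = stat.S_IMODE(os.stat(path).st_mode) if os.path.exists(path) else None
print(readable, writable, oct(mode) if mode is not None else 'missing')
```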

Component security

User's browser connection to proxy/hub

The user's connection to the proxy and Hub will be via HTTPS or WebSockets over SSL/TLS (aka WebSockets over HTTPS, aka WSS) with a signed certificate, as arranged by the organisation.

Where the connection can use WSS, interaction with workflows and the UI will be appreciably faster than the equivalent over HTTPS. WebSockets over SSL/TLS are well supported by modern browsers, and HTTPS is available as a secure fallback option.
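For a sense of what the WSS connection amounts to, a secure WebSocket subscription (sketched here with the third-party `websockets` Python package and a hypothetical URL; in practice the browser's JavaScript client does this) is simply:

```python
import asyncio
import ssl
import websockets  # third-party package used purely for illustration

async def subscribe():
    # Verify the site-signed certificate as a browser would.
    ssl_ctx = ssl.create_default_context()
    # Hypothetical proxied UI Server endpoint.
    url = 'wss://hub.example.org/user/alice/subscriptions'
    async with websockets.connect(url, ssl=ssl_ctx) as ws:
        await ws.send('{"subscribe": "workflow-status"}')
        print(await ws.recv())

asyncio.run(subscribe())
```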

The authenticated user name will be sent with each request.

Proxy connection with (spawned) UI Server

UI Servers are hosted locally at an address like /user/{name}. The proxy forwards the established HTTPS/WSS connection from the Hub through to the UI Server, even where the UI Servers are hosted on different machines from the Hub or from each other. UI Servers are based on the Jupyter Notebook, a proven technology commonly used for interactive programming and sharing embedded code.

The UI Server builds on the Jupyter Notebook Server, which uses Tornado. Jupyter Notebooks also include a Notebook Kernel; in Cylc 8 this is replaced by the Cylc Workflow Service, which executes the workflow. Both the UI Server and the Cylc Workflow Service run as the user, with the user's permissions.
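As a rough sketch of that arrangement (not the actual UI Server code), a Tornado handler serving workflow data under the owning user's account might look like:

```python
import getpass
import tornado.ioloop
import tornado.web

# Minimal Tornado sketch of a per-user UI Server endpoint. The route, handler
# and response shape are illustrative; the real UI Server exposes a richer
# interface.

class WorkflowsHandler(tornado.web.RequestHandler):
    def get(self):
        # Everything here runs with the permissions of the UNIX user that
        # owns this UI Server process.
        self.write({'owner': getpass.getuser(), 'workflows': []})

def make_app():
    return tornado.web.Application([(r'/workflows', WorkflowsHandler)])

if __name__ == '__main__':
    make_app().listen(8888)  # in production the Hub's proxy fronts this port
    tornado.ioloop.IOLoop.current().start()
```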

Hub

Since the Hub is an unmodified Jupyter Hub, the Jupyter Hub Security Overview is generally relevant.

Authentication is performed by a Jupyter Hub authentication plugin for the organisation's host or site identity management, e.g. PAM, LDAP, OAuth (GitHub and Google accounts), etc. See Jupyter's Authenticators page for more detail.

Successful authentication will generate a token representing the user, their roles (if applicable) and their session. This is shared with the UI Server. Authentication state (and information) is encrypted with Fernet, as described on Jupyter's Authenticators page.
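Fernet is symmetric, authenticated encryption from the Python cryptography package; as a standalone illustration of the primitive (not the Hub's internal code):

```python
from cryptography.fernet import Fernet

# Standalone illustration of Fernet; JupyterHub uses the same primitive
# (keyed via JUPYTERHUB_CRYPT_KEY) to encrypt persisted auth state.
key = Fernet.generate_key()  # in JupyterHub this key comes from configuration
f = Fernet(key)

# Hypothetical payload standing in for encrypted authentication state.
token = f.encrypt(b'{"user": "alice", "roles": ["cylc-user"]}')
print(f.decrypt(token))
```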

Authorization at the UI Server

Cylc UI Servers are independent of each other and cannot share HTML fragments or code with one another. Unlike Jupyter notebooks, the HTML from the UI Server is not generated by users, and all user input displayed by the UI Server (such as workflow and task names) is HTML-escaped before display.
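Escaping user-supplied names before they reach the page is the standard defence against injected markup; in Python terms, something like:

```python
import html

# Sketch only: any user-controlled string (workflow names, task names, log
# fragments) is escaped before being embedded in the UI's HTML.
workflow_name = 'nasty<script>alert(1)</script>'
safe = html.escape(workflow_name)
print(safe)  # nasty&lt;script&gt;alert(1)&lt;/script&gt;
```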

With one exception, each UI Server provides an independent view of the workflows owned by the UI Server's owner. Any action the UI Server enables is performed by that UI Server's UNIX user.

The partial exception is the gscan-like functionality, which behaves differently from the Cylc UI Servers in that it provides a (read-only) view into all of the running (and stopped) workflows for multiple users.

For an authenticated user to perform actions on another user's UI Server, the user must be authorized to perform that action. Authorization is broken into three levels:

  • Read-only - a user may view the workflow, its logs and its full state, but make no changes
  • Execute - a user may stop, start, pause/hold, restart the workflow and tasks
  • Write - a user may perform edit triggers, and make other code-related changes to workflow tasks and suite.rc

The precise mechanics for user authorization for interacting with other users' workflows are still under development.
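For illustration only, a check against those three levels might look like the following; the level names, operation mapping and grant storage are placeholders, not the final design:

```python
# Hypothetical authorization check for actions on another user's UI Server.
LEVELS = {'read': 0, 'execute': 1, 'write': 2}

# Which level each operation requires (placeholder operation names).
OPERATION_LEVEL = {
    'view-workflow': 'read',
    'view-logs': 'read',
    'hold-workflow': 'execute',
    'stop-workflow': 'execute',
    'edit-trigger': 'write',
}

# e.g. grants held by alice's UI Server: bob may operate, carol may only look.
GRANTS = {'bob': 'execute', 'carol': 'read'}

def is_authorized(requesting_user: str, operation: str) -> bool:
    granted = GRANTS.get(requesting_user)
    if granted is None:
        return False
    return LEVELS[granted] >= LEVELS[OPERATION_LEVEL[operation]]

print(is_authorized('bob', 'stop-workflow'))   # True
print(is_authorized('carol', 'edit-trigger'))  # False
```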

Command line and UNIX-level access

User command line commands

User-executed commands go via the proxy. This allows remote commands and commands from other authorized users. Session management is handled via (...)?

Job-executed commands

Job-executed commands come from the parent Cylc Workflow Service and are run as the user on the machine.

Questions:

  • how are the workflow files actually deployed onto the workflow server?
  • if a workflow is started manually, but in a way equivalent to the UI Server starting it, does the "contact" file have to be registered with the UI Server in some way, or will it just scan over the equivalent of ~/cylc-run/*/ looking for contact files? (Is this how it will find stopped suites? Can we therefore just delete/move old ones when we don't want those suites to show up as existing and stopped?)
  • how are command-line level interactions managed?

Note

1.

I guess a couple of points for UI Server justification (?):

Serves the WUI, one per user (instead of Cylc 7's one per workflow scheduler). By not modifying the Hub to serve the UI, we avoid having to maintain our own JupyterHub. And/or, instead of a Hub with a central UI Server, this allows the WFS sync and WUI load to be divided by user and delegated to multiple other machines, and isolates workflow schedulers from WUI load. Open to criticism on these points, but just trying to further rationalise our choice of:

not centralising the UIS with the Hub and having each WFS authenticate directly with it; OR not decentralising the UIS by having each WFS authenticate directly with the Hub and each serve a WUI.

2.

Here's a big one: the UI Server must be able to see the filesystem where the workflows reside in order to present (e.g.) stopped suites and historical data. The Hub can be remote from there. Note this also rules out workflows serving their own UIs. At Cylc 7 we could do that because the GUIs could see the filesystem. (A scheduler can't serve its own UI if it is not running...)