job.sh improvement for creating share and work directories? #5567

Open
ColemanTom opened this issue Jun 2, 2023 · 3 comments · May be fixed by #5978
Labels
question Flag this as a question for the next Cylc project meeting.

@ColemanTom
Contributor

ColemanTom commented Jun 2, 2023

Problem

In Cylc (both 7 and 8) job.sh scripts, the mkdir commands for the share and work directories (I'm not sure where share/cycle is created) assume the directory is not a symbolic link.

    # Create share and work directories
    mkdir -p "${CYLC_WORKFLOW_SHARE_DIR}" || true
    mkdir -p "$(dirname "${CYLC_TASK_WORK_DIR}")" || true
    mkdir -p "${CYLC_TASK_WORK_DIR}"

Not so hypothetically, assume there are two Lustre disks on the HPC, a production disk and a failover disk, set up as below.

/g/sc/production_disk -> /g/sc/disk1
/g/sc/failover_production_disk -> /g/sc/disk2

Let's say CYLC_TASK_WORK_DIR -> /g/sc/production_disk/cylc-run/my_workflow/work.

Then, let's say a disk failover happens, so now

/g/sc/production_disk -> /g/sc/disk2
/g/sc/failover_production_disk -> /g/sc/disk2  # disk1 is offline

After the failover, the mkdir commands in job.sh fail to create these directories, yet everything downstream assumes they exist, so jobs fail later when the directories are missing.
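
One way to see the failure mode is a dangling symlink: the link itself exists, but its target does not, so `mkdir -p` on the link neither creates the target nor succeeds. The sketch below reproduces this in a throwaway directory. The `demo` paths are made up for illustration and only stand in for the disks described above.

```shell
# Sketch only: all paths live in a temporary directory and are hypothetical.
demo="$(mktemp -d)"
mkdir -p "${demo}/disk2"    # the only disk that is online after the failover

# The work dir is a symlink whose target does not exist on disk2 yet.
ln -s "${demo}/disk2/cylc-run/my_workflow/work" "${demo}/work"

# mkdir -p does not follow the dangling link, so it cannot create the directory.
if mkdir -p "${demo}/work" 2>/dev/null; then
    mkdir_status="created"
else
    mkdir_status="failed"
fi
echo "mkdir -p on a dangling symlink: ${mkdir_status}"
```

With GNU coreutils the error is typically reported as "File exists", which is easy to overlook when the command is followed by `|| true`.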

Proposed Solution

Be a bit more defensive and run mkdir on the path that the symlink resolves to.

    # Create share and work directories
    mkdir -p "$(readlink -m "${CYLC_WORKFLOW_SHARE_DIR}")" || true
    mkdir -p "$(readlink -m "$(dirname "${CYLC_TASK_WORK_DIR}")")" || true
    mkdir -p "$(readlink -m "${CYLC_TASK_WORK_DIR}")"

The same applies to the creation of any other directory that could be a symbolic link, as defined in global.cylc[install][symlink dirs].
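
As a rough check of the idea, the sketch below resolves a dangling symlink with `readlink -m` (a GNU coreutils option that canonicalizes paths even when components are missing) and creates the resolved path. The `demo` directory is hypothetical, and `CYLC_WORKFLOW_SHARE_DIR` is set by hand here rather than by Cylc.

```shell
# Sketch only: requires GNU readlink; paths are hypothetical stand-ins.
demo="$(mktemp -d)"
mkdir -p "${demo}/disk2"

# The share dir is a symlink to a location that does not exist yet.
ln -s "${demo}/disk2/cylc-run/my_workflow/share" "${demo}/share"
CYLC_WORKFLOW_SHARE_DIR="${demo}/share"    # set manually for this demo

# Resolve the link first, then create whatever it points at.
resolved="$(readlink -m "${CYLC_WORKFLOW_SHARE_DIR}")"
mkdir -p "${resolved}" || true

if [ -d "${CYLC_WORKFLOW_SHARE_DIR}" ]; then
    echo "share directory is now reachable through the symlink"
fi
```

Note that `readlink -m` is not available in every `readlink` implementation, so this approach assumes GNU coreutils on the job host.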

ps. whilst here, I don't quite get the need for two mkdir here

    mkdir -p "$(dirname "${CYLC_TASK_WORK_DIR}")" || true
    mkdir -p "${CYLC_TASK_WORK_DIR}"

    $ CYLC_TASK_WORK_DIR=/home/USER/cylc-run/foo/work/20110511T1800Z/t1  # (from docs)
    $ dirname "$CYLC_TASK_WORK_DIR"
    /home/USER/cylc-run/foo/work/20110511T1800Z

Based on the options, neither of those two should be symbolic links, so you should just be able to have the latter one and ditch the dirname "$CYLC_TASK_WORK_DIR" I think.
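
A quick way to check this, assuming no symlinks are involved: `mkdir -p` creates missing parent directories itself, so a single call on the task work directory should also create the cycle point directory above it. The path below is a hypothetical one under a temporary directory, modelled on the example from the docs.

```shell
# Sketch only: the work dir path is a made-up example under a temp directory.
demo="$(mktemp -d)"
CYLC_TASK_WORK_DIR="${demo}/cylc-run/foo/work/20110511T1800Z/t1"

# A single mkdir -p, with no separate call for the parent directory.
mkdir -p "${CYLC_TASK_WORK_DIR}"

if [ -d "$(dirname "${CYLC_TASK_WORK_DIR}")" ]; then
    echo "parent directory was created by the same call"
fi
```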

@MetRonnie MetRonnie added this to the cylc-8.2.0 milestone Jun 13, 2023
@oliver-sanders oliver-sanders modified the milestones: cylc-8.2.0, cylc-8.3.0 Jun 29, 2023
@oliver-sanders oliver-sanders added the question Flag this as a question for the next Cylc project meeting. label Jul 25, 2023
@ColemanTom
Contributor Author

Thinking about this, I had a suggestion. Any suggestions in the original were based on knowledge of Cylc7 only. A way to do this nicely in Cylc8 would be (making some assumptions).

  1. Pass through into the job script for each task, similar to a global-init-script in the global.cylc file, any resolved paths for a host. e.g. CYLC_WORKFLOW_SHARE_DIR_RESOLVED
  2. Change mkdir to something like mkdir -p "${CYLC_WORKFLOW_SHARE_DIR_RESOLVED:-"$CYLC_WORKFLOW_SHARE_DIR"}" || true, and similarly for the work directory one

This way it avoids the extra I/O of readlink and still needs just one mkdir, but instead of going via symlinks it uses the actual path, which is known because it is now in the global.cylc file rather than managed via an external tool (Rose).
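
A minimal sketch of how that fallback could behave, assuming the variable meant in step 2 is the `CYLC_WORKFLOW_SHARE_DIR_RESOLVED` name proposed in step 1. That variable is hypothetical and is not something Cylc exports today; both variables are set by hand here, using made-up paths under a temporary directory.

```shell
# Sketch only: CYLC_WORKFLOW_SHARE_DIR_RESOLVED is a hypothetical variable.
demo="$(mktemp -d)"
CYLC_WORKFLOW_SHARE_DIR="${demo}/cylc-run/my_workflow/share"

# Case 1: a resolved path was passed through, so mkdir uses it directly.
CYLC_WORKFLOW_SHARE_DIR_RESOLVED="${demo}/disk2/cylc-run/my_workflow/share"
mkdir -p "${CYLC_WORKFLOW_SHARE_DIR_RESOLVED:-${CYLC_WORKFLOW_SHARE_DIR}}" || true

# Case 2: no resolved path is available, so mkdir falls back to the usual one.
unset CYLC_WORKFLOW_SHARE_DIR_RESOLVED
mkdir -p "${CYLC_WORKFLOW_SHARE_DIR_RESOLVED:-${CYLC_WORKFLOW_SHARE_DIR}}" || true
```

The `${VAR:-default}` expansion is standard shell, so hosts without any configured symlink dirs would keep the current behaviour.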

Does that make sense as an idea?

@hjoliver
Member

hjoliver commented Jan 30, 2024

I think I missed this one when it was originally posted, sorry.

ps. whilst here, I don't quite get the need for two mkdir here

Yeah, same.

Does that make sense as an idea?

We'll need to think this through! Agreed needs a solution though... one for next project meeting, mid-Feb, if not before.

@ColemanTom
Contributor Author

Thanks. I don't really like the idea of the readlink approach, although I believe it would work. If we can somehow make use of the information contained in global.cylc instead, that would be the best option I can think of without dwelling on it too extensively (or knowing all the Cylc ins and outs).
