CSI volumes staging mount path collision between namespaces with CSI plugins that support staging #18741
I've tested it with
So the issue seems to be originating from https://github.com/hashicorp/nomad/blob/v1.6.2/client/pluginmanager/csimanager/volume.go#L170 which doesn't include the namespace, if any:
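To illustrate the collision, here is a minimal Go sketch (hypothetical helper names, not the actual Nomad implementation) of a staging path keyed only by plugin and volume ID, versus one that also includes the volume's namespace:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// stagingPathLegacy mirrors the pre-fix behavior: the staging path is keyed
// only by plugin and volume ID, so two volumes with the same ID in different
// namespaces resolve to the same directory on the host.
func stagingPathLegacy(dataDir, pluginID, volID string) string {
	return filepath.Join(dataDir, "client", "csi", "node", pluginID, "staging", volID)
}

// stagingPathNamespaced adds the volume's namespace as a path component,
// so same-ID volumes from different namespaces get distinct staging dirs.
func stagingPathNamespaced(dataDir, pluginID, ns, volID string) string {
	return filepath.Join(dataDir, "client", "csi", "node", pluginID, "staging", ns, volID)
}

func main() {
	// Two namespaces, same volume ID: legacy paths collide.
	a := stagingPathLegacy("/var/nomad/data", "ceph-csi", "testvol")
	b := stagingPathLegacy("/var/nomad/data", "ceph-csi", "testvol")
	fmt.Println(a == b) // collision

	// With the namespace in the path, they no longer collide.
	na := stagingPathNamespaced("/var/nomad/data", "ceph-csi", "A", "testvol")
	nb := stagingPathNamespaced("/var/nomad/data", "ceph-csi", "B", "testvol")
	fmt.Println(na == nb)
	fmt.Println(na)
}
```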
Thanks @the-nando and @ygg-drop for raising this issue; I'll add this to our roadmapping list.
I've tested this in my lab and see the same results with the staging directory. I also noticed another oddity when creating volumes in separate namespaces where the volume name is the same: only one volume was created on the storage side, and it was accessible by both jobs in different namespaces. Output from both scenarios is below.

Same ID in volume file

Volume 1
Volume 2
VOLUME1 - NS A
VOLUME2 - NS B
Job 1
Job 2

Same name in volume file

Volume 1
Volume 2
VOLUME1 - NS A
VOLUME2 - NS B
Controller log during volume create
Job 1 run and file created in volume
Job 2 run and ls of its volume
Hi folks, just a heads up that I'm picking this up (as well as #20424). But @ron-savoia I've split out #20530 around the
CSI volumes are namespaced. But the client does not include the namespace in the staging mount path. This causes CSI volumes with the same volume ID but different namespace to collide if they happen to be placed on the same host. Fixes: #18741
Initial draft PR is up here #20532. I think the upgrade path ends up being ok, but I need to do some end-to-end testing to verify that before marking this ready for review. Will do that testing early next week.
Upgrade testing didn't go so well, and I've at least broken unstaging when clients are upgraded before servers (which isn't the recommended upgrade path but we want to handle it gracefully). Going to do some test code rework that'll help debug this plus #20424 in finer detail.
CSI volumes are namespaced. But the client does not include the namespace in the staging mount path. This causes CSI volumes with the same volume ID but different namespace to collide if they happen to be placed on the same host. Fixes: #18741
After a bit of rework I've tested the upgrade path from 1.8.0-beta.1 to the patch I've got in #20532. Looks like this should work now and I'll mark it ready for review. Test details below.

Existing behavior

First, I started with 1.8.0-beta.1 and a running allocation that consumes a CSI volume. I see the following filesystem and mounts.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│ └── org.democratic-csi.nfs
├── node
│ └── org.democratic-csi.nfs
│ ├── per-alloc
│ │ └── 55113309-8135-fd80-112b-d9f0f2c4cc6f # consuming alloc
│ │ └── csi-volume-nfs
│ │ └── rw-file-system-single-node-writer # per-alloc mount point
│ │ └── test.txt
│ └── staging
│ └── csi-volume-nfs # staging is not namespaced
│ └── rw-file-system-single-node-writer # staging mount point
│ └── test.txt
└── plugins
├── 13bf0a2a-7866-7ded-8436-2c53f1268a41
│ └── csi.sock
└── 60e62cfc-0bbd-19ff-8f4d-a97b8e17d5cd
└── csi.sock
14 directories, 4 files
$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/55113309-8135-fd80-112b-d9f0f2c4cc6f/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

I then stop the job, so that we have a baseline behavior.
Note that this leaves behind the parent directories of the mount points. That's mostly harmless but not very tidy, so I've opened #20544 to follow up on fixing that.

Client upgrade

The first test is upgrading the client first (although this isn't our recommended approach). First I run the job, and I see the following filesystem and mounts.
Then I upgrade the client from 1.8.0-beta.1 to the patch in #20532 and restart. I checked the filesystem and mounts are unchanged after restoring the allocation (as expected). Then I stopped the job, and see that the old mounts and paths are cleaned up just as before (the claim is also released):
Then I started the job again, and see that staging is properly namespaced.
Server upgrade

Next I wiped the client and server and started over from a clean datadir and 1.8.0-beta.1. After deploying the job that consumes the volume, I have the following filesystem and mounts:
Then I upgrade the server to the patched version and restart it. Then, just to verify restore is good I restarted the client without upgrading. As expected, that's all good. Next I stopped the job, and everything unmounted as expected. The volume claim was also freed.
Then I re-ran the job (now with upgraded server but non-upgraded client), and see the old client behavior is still safely in place.
Lastly, I upgraded the client as well. I stopped the job, and restarted the job.
Bah, I missed that the unstage code path for the new code isn't quite working as expected. Need to fix that. The plugin returns no error in that case but it's not unstaging because the unstage path for some reason doesn't include the namespace. I've had a pass through the code but it's not yet obvious why this is missing. Will pick that up tomorrow morning.
Ok, the problem was that I was checking the existence of the staging path using the path inside the plugin container, which of course will never exist from the perspective of the CSI hook. With that adjustment, everything appears to be working as we want.

Mount / unmount with patched version

After running the job, our filesystem and mounts are as expected.
I stop the job and see the mounts are cleaned up.
Just to double-check everything is good, I grabbed the trace logs for the allocation and those look ok.
Client upgrade

Reset both hosts to 1.8.0-beta.1, start from a fresh datadir, and deploy the job.
Upgrade the client to the patched version and restart it. Restoring the alloc looks good in the trace logs.
Stop the job and see everything is unmounted as we'd hope.
Start the job again (still using the old server, but with the new client), and see the namespaced staging dir now.
Stop the job again, just to verify that the old server can't interfere with cleanup on a new client.
Server upgrade

Reset both client and server to 1.8.0-beta.1, start from a fresh data dir, and run the job.
Upgrade the server and restart it. Then stop the job. Everything is unmounted as expected.
Run the job again (without upgrading client), just to make sure a new server can't force incorrect behavior on an old client. This looks as expected -- the bug is still in place on the client but the mount works.
Now update the client and restart it, and stop the job. Everything is unmounted as expected.
Start the job again, showing that new server + new client results in namespaced staging as we'd expect.
CSI volumes are namespaced. But the client does not include the namespace in the staging mount path. This causes CSI volumes with the same volume ID but different namespace to collide if they happen to be placed on the same host. Fixes: #18741
CSI volumes are namespaced. But the client does not include the namespace in the staging mount path. This causes CSI volumes with the same volume ID but different namespace to collide if they happen to be placed on the same host. The per-allocation paths don't need to be namespaced, because an allocation can only mount volumes from its job's own namespace. Rework the CSI hook tests to have more fine-grained control over the mock on-disk state. Add tests covering upgrades from staging paths missing namespaces. Fixes: #18741
…e/1.6.x (#20572) CSI volumes are namespaced. But the client does not include the namespace in the staging mount path. This causes CSI volumes with the same volume ID but different namespace to collide if they happen to be placed on the same host. The per-allocation paths don't need to be namespaced, because an allocation can only mount volumes from its job's own namespace. Rework the CSI hook tests to have more fine-grained control over the mock on-disk state. Add tests covering upgrades from staging paths missing namespaces. Fixes: #18741 Co-authored-by: Tim Gross <tgross@hashicorp.com>
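The commit message's point about per-allocation paths can be seen in path terms: the per-alloc mount is keyed by the allocation ID, which is already unique across the cluster, so no namespace component is needed there. A hypothetical sketch (helper name is illustrative, not Nomad's actual code), matching the layout shown in the tree output above:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// allocPath mirrors the per-alloc mount layout from the tree output above.
// It is keyed by allocation ID, a UUID that is unique cluster-wide, so two
// allocations consuming same-ID volumes from different namespaces can never
// collide here, unlike the shared staging directory.
func allocPath(dataDir, pluginID, allocID, volID, usageMode string) string {
	return filepath.Join(dataDir, "client", "csi", "node", pluginID,
		"per-alloc", allocID, volID, usageMode)
}

func main() {
	p := allocPath("/var/nomad/data", "org.democratic-csi.nfs",
		"55113309-8135-fd80-112b-d9f0f2c4cc6f", "csi-volume-nfs",
		"rw-file-system-single-node-writer")
	fmt.Println(p)
}
```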
#20532 has been merged and will ship in Nomad 1.8.0 (with backports to supported versions)
Nomad version

Output from `nomad version`:
Operating system and Environment details
Ubuntu 23.04
Issue
CSI volumes are namespace-scoped in Nomad, yet Nomad does not include the namespace name in the mount path for CSI plugins that support staging (like `ceph-csi`). If there are namespaces with CSI volumes that have the same ID, then when jobs from those namespaces get scheduled on the same node, the staging mount paths for those volumes will collide.

Reproduction steps

1. Run a CSI plugin that supports staging (e.g. `ceph-csi`)
2. Create a volume `testvol` in both namespaces
3. Run a job in namespace A that mounts `testvol` and is constrained to a certain node
4. Run a job in namespace B that mounts `testvol` and is constrained to the same node as the job in namespace A

Expected Result

Both jobs should run successfully and each have access to the `testvol` CSI volume from their respective namespace.

Actual Result

I only tested with `multi-node-multi-writer` (CephFS volume) and the result was that `testvol` from namespace A was bind-mounted to the `per-alloc` directory for an allocation from a job from namespace B. This is a potential security issue.

The staging path looks like `$NOMAD_DATA_DIR/client/csi/node/$CSI_PLUGIN_ID/staging/testvol/rw-file-system-multi-node-multi-writer`. The namespace is not included in the path.

Job file (if appropriate)
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)