
EC filesystem fails to deploy with latest release of Rook-Ceph #8210

Closed
SalvoRusso8 opened this issue Jun 28, 2021 · 26 comments · Fixed by #8452
Comments

@SalvoRusso8

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
EC filesystem is not created

Expected behavior:
Everything correctly created

How to reproduce it (minimal and precise):
I created a K8s cluster on AKS with 6 nodes and 3 OSDs. Deploying a normal filesystem works just fine; I used the example crds.yaml, common.yaml,

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph # namespace:cluster
spec:
  cephVersion:
    image: ceph/ceph:v16.2.4
    allowUnsupported: false
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  waitTimeoutForHealthyOSDInMinutes: 10
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 1
    modules:
      - name: pg_autoscaler
        enabled: true
  dashboard:
    enabled: true
    ssl: true
  monitoring:
    enabled: false
    rulesNamespace: rook-ceph
  network:
  crashCollector:
    disable: false
  cleanupPolicy:
    confirmation: ""
    sanitizeDisks:
      method: quick
      dataSource: zero
      iteration: 1
    allowUninstallWithVolumes: false
  annotations:
  labels:
  resources:
  removeOSDsIfOutAndSafeToRemove: false
  storage: # cluster level storage configuration and selection
    storageClassDeviceSets:
    - name: set1
      count: 3
      portable: false
      placement:
        tolerations:
        - key: storage-node
          operator: Exists
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: agentpool
                operator: In
                values:
                - npstorage
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - rook-ceph-osd
                - key: app
                  operator: In
                  values:
                  - rook-ceph-osd-prepare
              topologyKey: kubernetes.io/hostname
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 30Gi
          storageClassName: managed-premium
          volumeMode: Block
          accessModes:
            - ReadWriteOnce
    useAllNodes: true
    useAllDevices: true
    allowMultiplePerNode: false
    config:
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0
    manageMachineDisruptionBudgets: false
    machineDisruptionBudgetNamespace: openshift-machine-api
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    livenessProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false
  • Operator's logs, if necessary
    2021-06-28 07:36:15.478740 I | ceph-file-controller: creating filesystem "myfs-ec"
    2021-06-28 07:36:15.478782 I | cephclient: creating filesystem "myfs-ec" with metadata pool "myfs-ec-metadata" and data pools [myfs-ec-data0]
    2021-06-28 07:36:17.223857 E | ceph-file-controller: failed to reconcile failed to create filesystem "myfs-ec": failed to create filesystem "myfs-ec": failed enabling ceph fs "myfs-ec": Error EINVAL: pool 'myfs-ec-data0' (id '3') is an erasure-coded pool. Use of an EC pool for the default data pool is discouraged; see the online CephFS documentation for more information. Use --force to override.

Environment:

  • OS (e.g. from /etc/os-release): Azure Kubernetes Service
  • Kernel (e.g. uname -a):
  • Cloud provider or hardware configuration: Azure
  • Rook version (use rook version inside of a Rook Pod): v1.6.2
  • Storage backend version (e.g. for ceph do ceph -v): 16.2.2
  • Kubernetes version (use kubectl version): 1.20.7
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): AKS
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK
@sp98
Contributor

sp98 commented Jun 30, 2021

This looks expected. Please check https://docs.ceph.com/en/latest/cephfs/createfs/#creating-pools (last point).
Also, https://tracker.ceph.com/issues/42450
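
In short, what those docs describe boils down to roughly this sequence (a sketch; pool and fs names are placeholders, and the last step is the --force that newer Rook releases no longer pass):

# replicated metadata pool and erasure-coded data pool
ceph osd pool create myfs-metadata 32
ceph osd pool create myfs-data 32 32 erasure
# EC pools need overwrites enabled before CephFS can use them
ceph osd pool set myfs-data allow_ec_overwrites true
# using an EC pool as the default data pool is only allowed with --force
ceph fs new myfs myfs-metadata myfs-data --force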

@SalvoRusso8
Author

SalvoRusso8 commented Jun 30, 2021

@sp98 So following the documentation at https://rook.io/docs/rook/v1.6/ceph-filesystem-crd.html#erasure-coded is not enough to deploy a working EC filesystem. Maybe some guidance on solving the problem with the metadata pool should be added.
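
For reference, the EC example on that page boils down to roughly the following (a replicated metadata pool plus a single erasure-coded data pool), which is the combination the operator now rejects:

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs-ec
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    # a single EC data pool becomes the default data pool, which triggers the EINVAL above
    - erasureCoded:
        dataChunks: 2
        codingChunks: 1
  metadataServer:
    activeCount: 1
    activeStandby: true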

@travisn
Member

travisn commented Jun 30, 2021

This is a result of #8130; a couple of users have also reported this in Slack.
@subhamkrai We should look at reverting that in the case of an EC pool for the filesystem.

@travisn
Member

travisn commented Jun 30, 2021

The workaround would be to use Rook v1.6.5 until this is fixed, since v1.6.6 removed the --force flag.

@subhamkrai
Contributor

@batrick is there another way apart from using --force?

@batrick
Contributor

batrick commented Jul 6, 2021

@subhamkrai do not use ec pools for the default (primary) data pool. Add a directory layout on root for a secondary ec data pool instead.

subhamkrai self-assigned this on Jul 9, 2021
@subhamkrai
Contributor

@batrick can you share more details about it, or maybe a doc link where I can get some ideas about directory layouts and primary/secondary pools?

@batrick
Contributor

batrick commented Jul 15, 2021

@subhamkrai
Contributor

@travisn will just adding this link to the doc be enough?

@travisn
Member

travisn commented Jul 19, 2021

First, let's update the Filesystem EC example to have two data pools:

spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - replicated:
        size: 3
    - erasureCoded:
        dataChunks: 2
        codingChunks: 1
  metadataServer:
    activeCount: 1
    activeStandby: true

This way Rook will create the pools and add them to the filesystem.

Then I understand the user will need to provision their CephFS volume and, inside the volume, set this attribute, where <ec-pool> is named something like myfs-data1, depending on the filesystem name:

setfattr -n ceph.dir.layout.pool -v <ec-pool> /mnt/cephfs/myssddir
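
For example, with the filesystem mounted at /mnt/cephfs (path and pool name illustrative), and verifying afterwards with getfattr:

# pin the directory (and everything created under it) to the EC data pool
setfattr -n ceph.dir.layout.pool -v myfs-data1 /mnt/cephfs/myssddir
# confirm the layout took effect
getfattr -n ceph.dir.layout /mnt/cephfs/myssddir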

@Madhu-1 Or does the CSI driver allow setting an attribute like this for the directory to use a pool?

@Madhu-1
Member

Madhu-1 commented Jul 21, 2021

subhamkrai added a commit to subhamkrai/rook that referenced this issue Aug 2, 2021 (and again on Aug 3, 2021)
when creating ec fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.
Also after this, user need to add a directory layout
on root for a secondary ec data pool.

Closes: rook#8210
Signed-off-by: subhamkrai <srai@redhat.com>
@zamnuts

zamnuts commented Aug 22, 2021

Specifying a replicated pool as the first dataPool, and an erasureCoded pool as the second, doesn't do the trick. The ceph-file-controller still complains about the data0 EC pool.

---
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: hdd-ec-filesystem
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    deviceClass: ssd
    replicated:
      size: 2
  dataPools:
    - failureDomain: host
      deviceClass: ssd
      replicated:
        size: 2
    - failureDomain: host
      deviceClass: hdd
      erasureCoded:
        dataChunks: 2
        codingChunks: 1
  metadataServer:
    activeCount: 1
    activeStandby: true

---

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-hdd-ec-fs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: hdd-ec-filesystem
  pool: hdd-ec-filesystem-data1

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph # namespace:cluster
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph # namespace:cluster
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph # namespace:cluster

  mounter: kernel
reclaimPolicy: Retain
allowVolumeExpansion: true

After applying both the CephFilesystem and the StorageClass, I get the error:

2021-08-22 03:23:23.615721 I | ceph-file-controller: creating filesystem "hdd-ec-filesystem"
2021-08-22 03:23:23.615767 I | cephclient: creating filesystem "hdd-ec-filesystem" with metadata pool "hdd-ec-filesystem-metadata" and data pools [hdd-ec-filesystem-data0 hdd-ec-filesystem-data1]
2021-08-22 03:23:26.441220 E | ceph-file-controller: failed to reconcile failed to create filesystem "hdd-ec-filesystem": failed to create filesystem "hdd-ec-filesystem": failed enabling ceph fs "hdd-ec-filesystem": Error EINVAL: pool 'hdd-ec-filesystem-data0' (id '30') is an erasure-coded pool. Use of an EC pool for the default data pool is discouraged; see the online CephFS documentation for more information. Use --force to override.

The CephFilesystem exists, but with an error:

# k get CephFilesystem --all-namespaces                   
NAMESPACE   NAME                ACTIVEMDS   AGE     PHASE
rook-ceph   hdd-ec-filesystem   1           3m56s   ReconcileFailed

...as does the sc:

k get sc
NAME                                  PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-hdd-ec-fs                   rook-ceph.cephfs.csi.ceph.com   Retain          Immediate           true                   4m20s

I reviewed the draft guidance in PR #8452 (and this thread). I'm not sure what else is missing here.


subhamkrai added a commit to subhamkrai/rook that referenced this issue Nov 8, 2021 (and again in several pushes on Nov 8 and 9, 2021)
When creating EC fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.

Closes: rook#8210
Signed-off-by: subhamkrai <srai@redhat.com>
@Omar007
Contributor

Omar007 commented Nov 14, 2021

Just hit this issue. From my understanding of the docs, it's not actually required to attach two data pools at all; a single EC data pool should also work, as long as allow_ec_overwrites is enabled on it.

That said, while it is not required to have both a replicated and an EC data pool (if I'm understanding the docs correctly), it does seem to be advised, potentially to improve read and write performance for small-object backtrace updates.

@BlaineEXE
Member

BlaineEXE commented Nov 15, 2021

Just hit this issue. From my understanding of the docs, it's not actually required to attach two data pools at all; a single EC data pool should also work, as long as allow_ec_overwrites is enabled on it.

That said, while it is not required to have both a replicated and an EC data pool (if I'm understanding the docs correctly), it does seem to be advised, potentially to improve read and write performance for small-object backtrace updates.

I'm not so sure that's true. The Ceph docs here https://docs.ceph.com/en/latest/rados/operations/erasure-code/#erasure-coding-with-overwrites say

Erasure coded pools do not support omap, so to use them with RBD and CephFS you must instruct them to store their data in an ec pool, and their metadata in a replicated pool.

@Omar007
Contributor

Omar007 commented Nov 15, 2021

Haha, I was just about to update my post, as I stumbled upon that segment as well while going through different parts of the docs.
Based on that, it does seem like both pool types are needed, and not just for an EC CephFS; the same applies to the RBD types. Which, as a side note, also means the rook-ceph-cluster chart cannot currently be used to define EC BlockPools.

I'm giving an EC RBD a shot now to see what happens.

EDIT: So yes, that does indeed also fail, so it is 100% required. In which case I'd think, based on the docs, that if Rook instantiates the RBDs using --data-pool and the filesystems using the equivalent of ceph fs new <fs_name> <metadata> <data>, it should work without manual intervention and editing 🤔
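
Roughly, I mean the equivalents of the following (names are placeholders):

# RBD: data objects go to the EC pool, image metadata/omap stays in the replicated pool
rbd create --size 1024 --data-pool my-ec-pool my-replicated-pool/my-image
# CephFS: this is the call that currently returns EINVAL unless --force is added
ceph fs new myfs myfs-metadata myfs-ec-data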

@travisn
Member

travisn commented Nov 15, 2021

Which, as a side note, also means the rook-ceph-cluster chart cannot currently be used to define EC BlockPools.

Would you mind opening a new issue for this?

@Omar007
Contributor

Omar007 commented Nov 15, 2021

Would you mind opening a new issue for this?

Done; #9179

@Omar007
Contributor

Omar007 commented Nov 17, 2021

I did one more test with block and fs pools; to summarise:

  • Metadata must always be on a replicated pool (due to the OMAP limitation), so any setup with an EC pool always needs an accompanying replicated pool.
  • EC Block Pools work as documented in the Rook and Ceph documentation; you need to define 2 CephBlockPool types and link them together in a StorageClass for Rook, which then results in the operator running the documented command and behaviour from the Ceph documentation (see the StorageClass sketch after the reference links below).
  • EC Filesystems DO NOT currently work as documented; one data pool should be sufficient, and any pool type should work for that single pool (as long as it has overwrites enabled if it's an EC pool). Configuring this in Rook seems (based on a quick glance at the code) to result in the operator running the documented command from the Ceph documentation, but this currently fails with exit code 22. According to several different Ceph docs I've read[1][2][3], this should be working. If my glance is correct and the operator is indeed running ceph fs new <fs_name> <metadata> <data> for this, then at this point I'm thinking this might be something for members of the Ceph team to get involved in, not just the Rook operator team.

What I haven't looked into yet is the last category, CephObjectStore. I can't figure out its setup/internal workings from the documentation; what I'm seeing there suggests it creates something like 6 pools per zone, so I have no idea how that maps onto Rook's CRD, which only defines a single metadata and data pool. I expect the first bullet to also apply here, but I'll need to test an object store to see what happens with pool setup/creation.

[1] https://docs.ceph.com/en/latest/cephfs/createfs/#using-erasure-coded-pools-with-cephfs
[2] https://docs.ceph.com/en/latest/rados/operations/erasure-code/#erasure-coding-with-overwrites
[3] https://docs.ceph.com/en/octopus/install/ceph-deploy/quick-cephfs/#using-erasure-coded-pools
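
To illustrate the second bullet: the two CephBlockPools (one replicated, one erasure-coded) are defined separately, and the StorageClass then links them via the pool and dataPool parameters, roughly like this (names are placeholders; the secret names assume the standard Rook-generated CSI secrets):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block-ec
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  # replicated pool that holds the RBD image metadata/omap
  pool: replicated-metadata-pool
  # erasure-coded pool that holds the actual data objects
  dataPool: ec-data-pool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true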

@Omar007
Contributor

Omar007 commented Nov 18, 2021

Looking at the possible options, I currently see two things that could be done to improve the situation:

  • The Ceph documentation should probably explicitly mention that EC pools can be used for the default data pool but require --force; it currently doesn't, and you only find out when you try it.
  • Rook could/should expose an option to enable filesystem creation with --force so this configuration can be realized without having to bypass Rook.

That said, I've run the operator's fs new command manually with --force (acknowledging the impact on backtrace updates) and it looks like the operator reconciles everything else properly 👍
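
Concretely, that manual step looks something like this (a sketch, assuming the standard rook-ceph-tools toolbox deployment and Rook's default pool naming):

# run the same fs new command the operator attempts, but with --force
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  ceph fs new <fs-name> <fs-name>-metadata <fs-name>-data0 --force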

@subhamkrai
Contributor

@Omar007 currently we are looking at other options where we don't have to use --force, because it has some side effects. You can look more at the PR linked to this issue, and maybe try if that works :-p

@Omar007
Contributor

Omar007 commented Nov 18, 2021

I've seen that PR, but the point is that, based on the docs, going that route should not be needed.
It's an option, sure, but if I'm setting up and attaching a second EC data pool only to then use file layouts to make it the root and thus basically the only real data pool, bypassing the original, then what's the point 🤔

And if there are side effects not mentioned in the docs or anywhere else, that should really be fixed as well (though honestly that's a Ceph documentation problem, not so much a Rook documentation problem).

To give some more context: in my case I'm talking about a pool that will run a mostly read-oriented workload, with files of several hundred MB for the small ones and several GB for the big ones. The one drawback currently mentioned in the docs is of no concern to me, so if the documentation is valid/complete, I do not want to deal with a more complicated configuration and unused pools for no reason, as this should be working/usable.

If there are other issues with this setup beyond that single documented one, I would love to see more information about them made available (or referenced, if it is hidden somewhere completely unrelated at the moment).

subhamkrai added a commit to subhamkrai/rook that referenced this issue Nov 24, 2021
When creating EC fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.

Also, changing the pool name in storageclass-ec file.

Closes: rook#8210
Signed-off-by: subhamkrai <srai@redhat.com>
@ajarr
Contributor

ajarr commented Nov 26, 2021

@subhamkrai do not use ec pools for the default (primary) data pool. Add a directory layout on root for a secondary ec data pool instead.

@batrick @kotreshhr as pointed out in the discussions above, is this documentation incorrect: "For CephFS, an erasure coded pool can be set as the default data pool during file system creation or via file layouts."? See https://docs.ceph.com/en/latest/rados/operations/erasure-code/#erasure-coding-with-overwrites

@subhamkrai
Contributor

@ajarr We are now adding the erasure-coded pool as a non-default data pool for CephFS, but while testing I didn't run the command for layouts, and we can see in the PR comment that data is going into the erasure-coded pool.
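
(One way to verify where the file data lands, with the path just an example:)

# which pool a given file's data objects are written to
getfattr -n ceph.file.layout.pool /mnt/cephfs/somefile
# per-pool usage
ceph df detail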

subhamkrai added a commit to subhamkrai/rook that referenced this issue Dec 7, 2021 (pushed twice with the same message)
When creating EC fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.

Also, changing the pool name in storageclass-ec file.

Closes: rook#8210
Signed-off-by: subhamkrai <srai@redhat.com>
@batrick
Contributor

batrick commented Dec 7, 2021

@subhamkrai do not use ec pools for the default (primary) data pool. Add a directory layout on root for a secondary ec data pool instead.

@batrick @kotreshhr as pointed out in the discussions above, is this documentation incorrect: "For CephFS, an erasure coded pool can be set as the default data pool during file system creation or via file layouts."? See https://docs.ceph.com/en/latest/rados/operations/erasure-code/#erasure-coding-with-overwrites

Strictly speaking, it's not incorrect, because you can use it for the default data pool, but you should not. That documentation should mention as much.

@batrick
Contributor

batrick commented Dec 7, 2021

(That documentation probably was not updated because it's part of doc/rados and not doc/cephfs.)

mergify bot pushed a commit that referenced this issue Dec 7, 2021
When creating EC fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.

Also, changing the pool name in storageclass-ec file.

Closes: #8210
Signed-off-by: subhamkrai <srai@redhat.com>
(cherry picked from commit 5bb29f1)
parth-gr pushed a commit to parth-gr/rook that referenced this issue Feb 22, 2022 (pushed twice with the same message)
When creating EC fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.

Also, changing the pool name in storageclass-ec file.

Closes: rook#8210
Signed-off-by: subhamkrai <srai@redhat.com>