
EC filesystem fails to deploy with latest release of Rook-Ceph #8210

Closed
SalvoRusso8 opened this issue Jun 28, 2021 · 26 comments · Fixed by #8452
Comments

@SalvoRusso8

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
EC filesystem is not created

Expected behavior:
Everything correctly created

How to reproduce it (minimal and precise):
I created a K8s cluster on AKS with 6 nodes and 3 OSDs. Deploying a normal filesystem works just fine; I used the example crds.yaml, common.yaml,

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph # namespace:cluster
spec:
  cephVersion:
    image: ceph/ceph:v16.2.4
    allowUnsupported: false
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  waitTimeoutForHealthyOSDInMinutes: 10
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 1
    modules:
      - name: pg_autoscaler
        enabled: true
  dashboard:
    enabled: true
    ssl: true
  monitoring:
    enabled: false
    rulesNamespace: rook-ceph
  network:
  crashCollector:
    disable: false
  cleanupPolicy:
    confirmation: ""
    sanitizeDisks:
      method: quick
      dataSource: zero
      iteration: 1
    allowUninstallWithVolumes: false
  annotations:
  labels:
  resources:
  removeOSDsIfOutAndSafeToRemove: false
  storage: # cluster level storage configuration and selection
    storageClassDeviceSets:
    - name: set1
      count: 3
      portable: false
      placement:
        tolerations:
        - key: storage-node
          operator: Exists
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: agentpool
                operator: In
                values:
                - npstorage
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - rook-ceph-osd
                - key: app
                  operator: In
                  values:
                  - rook-ceph-osd-prepare
              topologyKey: kubernetes.io/hostname
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 30Gi
          storageClassName: managed-premium
          volumeMode: Block
          accessModes:
            - ReadWriteOnce
    useAllNodes: true
    useAllDevices: true
    allowMultiplePerNode: false
    config:
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0
    manageMachineDisruptionBudgets: false
    machineDisruptionBudgetNamespace: openshift-machine-api
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    livenessProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false
  • Operator's logs, if necessary
    2021-06-28 07:36:15.478740 I | ceph-file-controller: creating filesystem "myfs-ec"
    2021-06-28 07:36:15.478782 I | cephclient: creating filesystem "myfs-ec" with metadata pool "myfs-ec-metadata" and data pools [myfs-ec-data0]
    2021-06-28 07:36:17.223857 E | ceph-file-controller: failed to reconcile failed to create filesystem "myfs-ec": failed to create filesystem "myfs-ec": failed enabling ceph fs "myfs-ec": Error EINVAL: pool 'myfs-ec-data0' (id '3') is an erasure-coded pool. Use of an EC pool for the default data pool is discouraged; see the online CephFS documentation for more information. Use --force to override.

Environment:

  • OS (e.g. from /etc/os-release): Azure Kubernetes Service
  • Kernel (e.g. uname -a):
  • Cloud provider or hardware configuration: Azure
  • Rook version (use rook version inside of a Rook Pod): v1.6.2
  • Storage backend version (e.g. for ceph do ceph -v): 16.2.2
  • Kubernetes version (use kubectl version): 1.20.7
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): AKS
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK
@sp98
Contributor

sp98 commented Jun 30, 2021

This looks expected. Please check https://docs.ceph.com/en/latest/cephfs/createfs/#creating-pools (last point).
Also, https://tracker.ceph.com/issues/42450
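
In short, what those docs describe boils down to roughly this sequence (a sketch; pool and fs names are placeholders, and the last step is the --force that newer Rook releases no longer pass):

# replicated metadata pool and erasure-coded data pool
ceph osd pool create myfs-metadata 32
ceph osd pool create myfs-data 32 32 erasure
# EC pools need overwrites enabled before CephFS can use them
ceph osd pool set myfs-data allow_ec_overwrites true
# using an EC pool as the default data pool is only allowed with --force
ceph fs new myfs myfs-metadata myfs-data --force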

@SalvoRusso8
Author

SalvoRusso8 commented Jun 30, 2021

@sp98 So following the documentation at https://rook.io/docs/rook/v1.6/ceph-filesystem-crd.html#erasure-coded is not enough to deploy a working EC filesystem. Maybe some guidance on solving the problem with the metadata pool should be added.
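
For reference, the EC example on that page boils down to roughly the following (a replicated metadata pool plus a single erasure-coded data pool), which is the combination the operator now rejects:

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs-ec
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    # a single EC data pool becomes the default data pool, which triggers the EINVAL above
    - erasureCoded:
        dataChunks: 2
        codingChunks: 1
  metadataServer:
    activeCount: 1
    activeStandby: true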

@travisn
Member

travisn commented Jun 30, 2021

This is a result of #8130; a couple of users have also reported this in Slack.
@subhamkrai We should look at reverting that in the case of an EC pool for the filesystem.

@travisn
Member

travisn commented Jun 30, 2021

The workaround would be to use Rook v1.6.5 until this is fixed, since v1.6.6 removed the --force flag.

@subhamkrai
Contributor

@batrick is there another way apart from using --force?

@batrick
Contributor

batrick commented Jul 6, 2021

@subhamkrai do not use ec pools for the default (primary) data pool. Add a directory layout on root for a secondary ec data pool instead.

subhamkrai self-assigned this on Jul 9, 2021
@subhamkrai
Contributor

@batrick can you share more details about it, or maybe a doc link where I can get some ideas about directory layouts and primary/secondary pools?

@batrick
Contributor

batrick commented Jul 15, 2021

@subhamkrai
Contributor

@travisn will just adding this link to the doc be enough?

@travisn
Member

travisn commented Jul 19, 2021

First, let's update the Filesystem EC example to have two data pools:

spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - replicated:
        size: 3
    - erasureCoded:
        dataChunks: 2
        codingChunks: 1
  metadataServer:
    activeCount: 1
    activeStandby: true

This way Rook will create the pools and add them to the filesystem.

Then I understand the user will need to provision their CephFS volume and, inside the volume, set this attribute, where <ec-pool> is named something like myfs-data1, depending on the filesystem name:

setfattr -n ceph.dir.layout.pool -v <ec-pool> /mnt/cephfs/myssddir
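
For example, with the filesystem mounted at /mnt/cephfs (path and pool name illustrative), and verifying afterwards with getfattr:

# pin the directory (and everything created under it) to the EC data pool
setfattr -n ceph.dir.layout.pool -v myfs-data1 /mnt/cephfs/myssddir
# confirm the layout took effect
getfattr -n ceph.dir.layout /mnt/cephfs/myssddir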

@Madhu-1 Or does the CSI driver allow setting an attribute like this for the directory to use a pool?

@Madhu-1
Member

Madhu-1 commented Jul 21, 2021

subhamkrai added a commit to subhamkrai/rook that referenced this issue Aug 2, 2021 (and again on Aug 3, 2021)
when creating ec fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.
Also after this, user need to add a directory layout
on root for a secondary ec data pool.

Closes: rook#8210
Signed-off-by: subhamkrai <srai@redhat.com>
@zamnuts

zamnuts commented Aug 22, 2021

Specifying a replicated pool as the first dataPool, and an erasureCoded pool as the second, doesn't do the trick. The ceph-file-controller still complains about the data0 EC pool.

---
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: hdd-ec-filesystem
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    deviceClass: ssd
    replicated:
      size: 2
  dataPools:
    - failureDomain: host
      deviceClass: ssd
      replicated:
        size: 2
    - failureDomain: host
      deviceClass: hdd
      erasureCoded:
        dataChunks: 2
        codingChunks: 1
  metadataServer:
    activeCount: 1
    activeStandby: true

---

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-hdd-ec-fs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: hdd-ec-filesystem
  pool: hdd-ec-filesystem-data1

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph # namespace:cluster
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph # namespace:cluster
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph # namespace:cluster

  mounter: kernel
reclaimPolicy: Retain
allowVolumeExpansion: true

After applying both the CephFilesystem and the StorageClass, I get the error:

2021-08-22 03:23:23.615721 I | ceph-file-controller: creating filesystem "hdd-ec-filesystem"
2021-08-22 03:23:23.615767 I | cephclient: creating filesystem "hdd-ec-filesystem" with metadata pool "hdd-ec-filesystem-metadata" and data pools [hdd-ec-filesystem-data0 hdd-ec-filesystem-data1]
2021-08-22 03:23:26.441220 E | ceph-file-controller: failed to reconcile failed to create filesystem "hdd-ec-filesystem": failed to create filesystem "hdd-ec-filesystem": failed enabling ceph fs "hdd-ec-filesystem": Error EINVAL: pool 'hdd-ec-filesystem-data0' (id '30') is an erasure-coded pool. Use of an EC pool for the default data pool is discouraged; see the online CephFS documentation for more information. Use --force to override.

The CephFilesystem exists, but with an error:

# k get CephFilesystem --all-namespaces                   
NAMESPACE   NAME                ACTIVEMDS   AGE     PHASE
rook-ceph   hdd-ec-filesystem   1           3m56s   ReconcileFailed

...as does the sc:

k get sc
NAME                                  PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-hdd-ec-fs                   rook-ceph.cephfs.csi.ceph.com   Retain          Immediate           true                   4m20s

I reviewed the draft guidance in PR #8452 (and this thread). I'm not sure what else is missing here.


subhamkrai added a commit to subhamkrai/rook that referenced this issue Nov 8, 2021 (and again in several pushes on Nov 8 and 9, 2021)
When creating EC fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.

Closes: rook#8210
Signed-off-by: subhamkrai <srai@redhat.com>
@Omar007
Contributor

Omar007 commented Nov 14, 2021

Just hit this issue. From my understanding of the docs, it's not actually required to attach two data pools at all; a single EC data pool should also work, as long as allow_ec_overwrites is enabled on it.

That said, while it is not required to have both a replicated and an EC data pool (if I'm understanding the docs correctly), it does seem to be advised, potentially to improve read and write performance for small-object backtrace updates.

@BlaineEXE
Member

BlaineEXE commented Nov 15, 2021

Just hit this issue. From my understanding of the docs, it's not actually required to attach two data pools at all; a single EC data pool should also work, as long as allow_ec_overwrites is enabled on it.

That said, while it is not required to have both a replicated and an EC data pool (if I'm understanding the docs correctly), it does seem to be advised, potentially to improve read and write performance for small-object backtrace updates.

I'm not so sure that's true. The Ceph docs here https://docs.ceph.com/en/latest/rados/operations/erasure-code/#erasure-coding-with-overwrites say

Erasure coded pools do not support omap, so to use them with RBD and CephFS you must instruct them to store their data in an ec pool, and their metadata in a replicated pool.

@Omar007
Contributor

Omar007 commented Nov 15, 2021

Haha, I was just about to update my post, as I stumbled upon that segment as well while going through different parts of the docs.
Based on that, it does seem like both pool types are needed, and not just for an EC CephFS; the same applies to the RBD types. Which, as a side note, also means the rook-ceph-cluster chart cannot currently be used to define EC BlockPools.

I'm giving an EC RBD a shot now to see what happens.

EDIT: So yes, that does indeed also fail, so it is 100% required. In which case I'd think, based on the docs, that if Rook instantiates the RBDs using --data-pool and the filesystems using the equivalent of ceph fs new <fs_name> <metadata> <data>, it should work without manual intervention and editing 🤔
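
Roughly, I mean the equivalents of the following (names are placeholders):

# RBD: data objects go to the EC pool, image metadata/omap stays in the replicated pool
rbd create --size 1024 --data-pool my-ec-pool my-replicated-pool/my-image
# CephFS: this is the call that currently returns EINVAL unless --force is added
ceph fs new myfs myfs-metadata myfs-ec-data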

@travisn
Member

travisn commented Nov 15, 2021

Which, as a side note, also means the rook-ceph-cluster chart cannot currently be used to define EC BlockPools.

Would you mind opening a new issue for this?

@Omar007
Contributor

Omar007 commented Nov 15, 2021

Would you mind opening a new issue for this?

Done; #9179

@Omar007
Contributor

Omar007 commented Nov 17, 2021

I did one more test with block and fs pools; to summarise:

  • Metadata must always be on a replicated pool (due to the OMAP limitation), so any setup with an EC pool always needs an accompanying replicated pool.
  • EC Block Pools work as documented in the Rook and Ceph documentation; you need to define 2 CephBlockPool types and link them together in a StorageClass for Rook, which then results in the operator running the documented command and behaviour from the Ceph documentation (see the StorageClass sketch after the reference links below).
  • EC Filesystems DO NOT currently work as documented; one data pool should be sufficient, and any pool type should work for that single pool (as long as it has overwrites enabled if it's an EC pool). Configuring this in Rook seems (based on a quick glance at the code) to result in the operator running the documented command from the Ceph documentation, but this currently fails with exit code 22. According to several different Ceph docs I've read[1][2][3], this should be working. If my glance is correct and the operator is indeed running ceph fs new <fs_name> <metadata> <data> for this, then at this point I'm thinking this might be something for members of the Ceph team to get involved in, not just the Rook operator team.

What I haven't looked into yet is the last category, CephObjectStore. I can't figure out its setup/internal workings from the documentation; what I'm seeing there suggests it creates something like 6 pools per zone, so I have no idea how that maps onto Rook's CRD, which only defines a single metadata and data pool. I expect the first bullet to also apply here, but I'll need to test an object store to see what happens with pool setup/creation.

[1] https://docs.ceph.com/en/latest/cephfs/createfs/#using-erasure-coded-pools-with-cephfs
[2] https://docs.ceph.com/en/latest/rados/operations/erasure-code/#erasure-coding-with-overwrites
[3] https://docs.ceph.com/en/octopus/install/ceph-deploy/quick-cephfs/#using-erasure-coded-pools
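
To illustrate the second bullet: the two CephBlockPools (one replicated, one erasure-coded) are defined separately, and the StorageClass then links them via the pool and dataPool parameters, roughly like this (names are placeholders; the secret names assume the standard Rook-generated CSI secrets):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block-ec
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  # replicated pool that holds the RBD image metadata/omap
  pool: replicated-metadata-pool
  # erasure-coded pool that holds the actual data objects
  dataPool: ec-data-pool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true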

@Omar007
Contributor

Omar007 commented Nov 18, 2021

Looking at the possible options, I currently see two things that could be done to improve the situation:

  • The Ceph documentation should probably explicitly mention that EC pools can be used for the default data pool but require --force; it currently doesn't, and you only find out when you try it.
  • Rook could/should expose an option to enable filesystem creation with --force so this configuration can be realized without having to bypass Rook.

That said, I've run the operator's fs new command manually with --force (acknowledging the impact on backtrace updates) and it looks like the operator reconciles everything else properly 👍
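
Concretely, that manual step looks something like this (a sketch, assuming the standard rook-ceph-tools toolbox deployment and Rook's default pool naming):

# run the same fs new command the operator attempts, but with --force
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
  ceph fs new <fs-name> <fs-name>-metadata <fs-name>-data0 --force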

@subhamkrai
Contributor

@Omar007 currently we are looking at other options where we don't have to use --force, because it has some side effects. You can look more at the PR linked to this issue, and maybe try if that works :-p

@Omar007
Contributor

Omar007 commented Nov 18, 2021

I've seen that PR, but the point is that, based on the docs, going that route should not be needed.
It's an option, sure, but if I'm setting up and attaching a second EC data pool only to then use file layouts to make it the root and thus basically the only real data pool, bypassing the original, then what's the point 🤔

And if there are side effects not mentioned in the docs or anywhere else, that should really be fixed as well (though honestly that's a Ceph documentation problem, not so much a Rook documentation problem).

To give some more context: in my case I'm talking about a pool that will run a mostly read-oriented workload, with files of several hundred MB for the small ones and several GB for the big ones. The one drawback currently mentioned in the docs is of no concern to me, so if the documentation is valid/complete, I do not want to deal with a more complicated configuration and unused pools for no reason, as this should be working/usable.

If there are other issues with this setup beyond that single documented one, I would love to see more information about them made available (or referenced, if it is hidden somewhere completely unrelated at the moment).

subhamkrai added a commit to subhamkrai/rook that referenced this issue Nov 24, 2021
When creating EC fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.

Also, changing the pool name in storageclass-ec file.

Closes: rook#8210
Signed-off-by: subhamkrai <srai@redhat.com>
@ajarr
Contributor

ajarr commented Nov 26, 2021

@subhamkrai do not use ec pools for the default (primary) data pool. Add a directory layout on root for a secondary ec data pool instead.

@batrick @kotreshhr as pointed out in the discussions above, is this documentation incorrect: "For CephFS, an erasure coded pool can be set as the default data pool during file system creation or via file layouts."? See https://docs.ceph.com/en/latest/rados/operations/erasure-code/#erasure-coding-with-overwrites

@subhamkrai
Contributor

@ajarr We are now adding the erasure-coded pool as a non-default data pool for CephFS, but while testing I didn't run the command for layouts, and we can see in the PR comment that data is going into the erasure-coded pool.
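
(One way to verify where the file data lands, with the path just an example:)

# which pool a given file's data objects are written to
getfattr -n ceph.file.layout.pool /mnt/cephfs/somefile
# per-pool usage
ceph df detail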

subhamkrai added a commit to subhamkrai/rook that referenced this issue Dec 7, 2021 (pushed twice with the same message)
When creating EC fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.

Also, changing the pool name in storageclass-ec file.

Closes: rook#8210
Signed-off-by: subhamkrai <srai@redhat.com>
@batrick
Contributor

batrick commented Dec 7, 2021

@subhamkrai do not use ec pools for the default (primary) data pool. Add a directory layout on root for a secondary ec data pool instead.

@batrick @kotreshhr as pointed out in the discussions above, is this documentation incorrect: "For CephFS, an erasure coded pool can be set as the default data pool during file system creation or via file layouts."? See https://docs.ceph.com/en/latest/rados/operations/erasure-code/#erasure-coding-with-overwrites

Strictly speaking, it's not incorrect, because you can use it for the default data pool, but you should not. That documentation should mention as much.

@batrick
Contributor

batrick commented Dec 7, 2021

(That documentation probably was not updated because it's part of doc/rados and not doc/cephfs.)

mergify bot pushed a commit that referenced this issue Dec 7, 2021
When creating EC fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.

Also, changing the pool name in storageclass-ec file.

Closes: #8210
Signed-off-by: subhamkrai <srai@redhat.com>
(cherry picked from commit 5bb29f1)
parth-gr pushed a commit to parth-gr/rook that referenced this issue Feb 22, 2022 (pushed twice with the same message)
When creating EC fs, create replicated pool as primary
pool and ec pool as secondary pool, creating ec pool
as primary is not encouraged and it will lead to failure.

Also, changing the pool name in storageclass-ec file.

Closes: rook#8210
Signed-off-by: subhamkrai <srai@redhat.com>