
file: allow to create CephFS dataPools with predefined names #9296

Merged
merged 1 commit into rook:master from cephfs-datapools on Dec 14, 2021

Conversation

degorenko
Contributor

Add an ability to specify CephFS dataPools as a map with a predefined
pool name.

Related-Issue: #9295
Signed-off-by: Denis Egorenko degorenko@mirantis.com

Description of your changes:

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

@degorenko degorenko changed the title ceph: allow to create CephFS dataPools with predefined names file: allow to create CephFS dataPools with predefined names Dec 2, 2021
@degorenko degorenko force-pushed the cephfs-datapools branch 2 times, most recently from ccca2e7 to 5224605 Compare December 2, 2021 15:43
Member

@BlaineEXE BlaineEXE left a comment

Overall, I think this seems good. It would be good to figure out how we can suggest users migrate from dataPools to dataPoolsMapped.

I also don't see changes to the documentation (Documentation/ceph-filesystem-crd.md file). On that note, it would be good to mention how users can migrate from dataPools to dataPoolsMapped in a new section in the doc, I think.

Also, I wonder if namedDataPools might be a better name than dataPoolsMapped.

Comment on lines 967 to 968
// The data pool settings, deprecated.
// At least DataPools or DataPoolsMapped should be specified
Member

I assume it is not supported for users to specify both of these?

Suggested change
// The data pool settings, deprecated.
// At least DataPools or DataPoolsMapped should be specified
// (Deprecated in favor of dataPoolsMapped) The data pool settings.
// Exactly one of DataPools or DataPoolsMapped (preferred) must be specified.

Contributor Author

It's debatable. TBH, giving a user the ability to specify both at the same time would not lead to any problems. At the least, we could check that the pools specified in the map do not intersect with the pools from the array. Wdyt?

Comment on lines 972 to 973
// The data pool settings with explicit data pool name.
// At least DataPools or DataPoolsMapped should be specified
Member

I assume it is not supported for users to specify both of these?

Suggested change
// The data pool settings with explicit data pool name.
// At least DataPools or DataPoolsMapped should be specified
// The data pool settings with explicit data pool name.
// Exactly one of DataPools or DataPoolsMapped (preferred) must be specified.

@@ -55,7 +55,7 @@ func createFilesystem(
return err
}

Member

I assume it shouldn't be supported for users to specify both of these. If so, we should also include a check to make sure that the user has not specified both DataPools and DataPoolsMapped. Does that make sense to you @travisn ?

Member

There are a number of important design points that come to mind with this feature. What about the following?

  • For backward compatibility, we should still support the dataPools. Let's keep it as a fully supported feature rather than deprecating it. Perhaps we eventually deprecate it, but with some unit tests it seems like it should be very simple to support.
  • Let's only allow either dataPools or namedDataPools. No need to support both in the same filesystem CR. The operator would fail the reconcile if both are specified.
  • The example manifests and docs could all be updated to use namedDataPools by default.
  • Ordering is important for adding the pools to cephfs. As seen in mds: create EC pool as secondary pool #8452, to support EC, a replica data pool must be created first, then the EC pool. So instead of a map we need to keep an ordered list. For the types.go, this might look like the following where the PoolSpec is an inline property:
type NamedPoolSpec struct {
	Name string `json:"name,omitempty"`
	PoolSpec
}
type FilesystemSpec struct {
	DataPools []PoolSpec `json:"dataPools"`
	NamedDataPools []NamedPoolSpec `json:"namedDataPools"`
...
}
  • For the implementation, before the ceph pools are created, we would translate all the dataPools into namedDataPools with the default pool names. Then all the pools can be processed the same. No need for a separate implementation in the operator code for creating the ceph pools.
  • If users desired to update their CRs to the named pools, they would simply need to rename dataPools to namedDataPools and set the name of each pool to the name that was being generated previously by rook.

Contributor Author

In that case we could go this way:

type NamedPoolSpec struct {
	Name string `json:"name,omitempty"`
	Default bool `json:"default"`
	PoolSpec
}
type FilesystemSpec struct {
	DataPools []PoolSpec `json:"dataPools"`
	NamedDataPools []NamedPoolSpec `json:"namedDataPools"`
...
}

In that case we don't need an ordered list, because ordering seems to me like a hardcode and a workaround. Before the filesystem is created, check that the default pool is present and that it is the only one. For backward compatibility, since right now the first element of the array is assumed to be the default, we don't have any problems with the transition to a new structure.

@travisn @BlaineEXE what do you think?

Member

Agreed, the designation of the first/default pool is the only thing cephfs cares about. The ordering of the additional data pools is unimportant to ceph.

I still like the idea of keeping the ordered list. There are several advantages:

  • No need for a new property to designate the default
  • Follows the same pattern we've already been using, we're just adding a name property and renaming it to namedDataPools.
  • Very easy to convert existing filesystem CRs to the new approach.

Member

Actually, we should be able to get away with not renaming it to namedDataPools, we should just be able to add the name to the existing pools. Then if the name is empty, we generate the name.

type NamedPoolSpec struct {
	Name string `json:"name,omitempty"`
	PoolSpec
}
type FilesystemSpec struct {
	DataPools []NamedPoolSpec `json:"dataPools"`
...
}
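
To make the user-facing shape concrete, a hypothetical filesystem CR under this proposal might look as follows (the values are illustrative, not taken from the PR; with PoolSpec inlined, name simply sits alongside the existing pool settings):

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    # name is optional; if omitted, rook keeps generating myfs-data<index>
    - name: replicated
      failureDomain: host
      replicated:
        size: 3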

Contributor Author

So, using a new default property would explicitly show users which pool is used as the default. And users would be freed from any ordering mistakes during pool removal. We could still follow the current pattern, mark only the 1st element of the list as default, and convert existing filesystem CRs to the new approach.

type NamedPoolSpec struct {
	Name string `json:"name,omitempty"`
	PoolSpec
}
type FilesystemSpec struct {
	DataPools []NamedPoolSpec `json:"dataPools"`
...
}

This can lead to backward compatibility problems for those who are using the rook types, because from the Golang perspective the dataPool description will change from just datapool[i] <- poolSpec to datapool[i].PoolSpec <- poolSpec. I think it is better not to mix them right now.

Member

After the initial creation of the filesystem, is there a way to change the primary pool? For example, removing one named pool wouldn't affect the other named pools. Or what is the issue with pool removal?

If we use the inline flag, I believe the compatibility would be fine, it would just be handled by the json deserialization inside the operator.

type NamedPoolSpec struct {
	Name     string `json:"name,omitempty"`
	PoolSpec `json:",inline"`
}

But another thought I'm having is that we could just add the name property to the PoolSpec. This would allow the user to specify pool names on the cephfs pools to override the default, and with #9209 for configuring the built-in ceph pool. It just wouldn't make sense in some areas like the rgw pool names, which I don't believe are configurable, so maybe that approach is less preferred.

Contributor Author

Ok, so if we are assuming that the first element of the dataPools array always comes as the default one (on creation) and we don't manage pool removal for cephfs, let's go your way.

Another reason for specifying a default property is adding validation to avoid issue #8452 - giving the user an explicit error message about what is going wrong, without double-checking the docs. But since the user is allowed to drop any pool from the spec without its actual removal, that will not work correctly. At the same time, we can add such validation on creation. Do you agree? I will post a comment in the related issue.

I'll post an update as discussed above, with auto transition to the new NamedPoolSpec structure; all pools present in the current dataPools will be named by the default pool name + their index.

Member

Looks like there is a request on #8452 to not prevent the creation in that case. Instead of blocking the case, logging a warning seems sufficient.

I'll post an update as discussed above, with auto transition to the new NamedPoolSpec structure; all pools present in the current dataPools will be named by the default pool name + their index.

Agreed on this approach, thanks.

Comment on lines 178 to 170
for name, pool := range spec.DataPoolsMapped {
poolName := name
err := cephclient.CreatePoolWithProfile(context, clusterInfo, clusterSpec, poolName, pool, "")
if err != nil {
return errors.Wrapf(err, "failed to update datapool %q", poolName)
}
}
Member

Suggested change
for name, pool := range spec.DataPoolsMapped {
poolName := name
err := cephclient.CreatePoolWithProfile(context, clusterInfo, clusterSpec, poolName, pool, "")
if err != nil {
return errors.Wrapf(err, "failed to update datapool %q", poolName)
}
}
for poolName, poolSpec := range spec.DataPoolsMapped {
err := cephclient.CreatePoolWithProfile(context, clusterInfo, clusterSpec, poolName, poolSpec, "")
if err != nil {
return errors.Wrapf(err, "failed to update datapool %q", poolName)
}
}

@degorenko
Contributor Author

@BlaineEXE yeah, I skipped the changes to documentation in case the overall approach didn't suit the community vision. Will add docs. For users we can raise an info/warning message that the old way is deprecated and they can switch to the new one. Also mention it in the docs.

Also, I wonder if namedDataPools might be a better name than dataPoolsMapped.

That was my first thought too :) Will update.


@degorenko
Contributor Author

@travisn @BlaineEXE updated. Maybe the last question we need to answer: do we need to create cephfs data pools with the cephfs name prefix? Or does the name come entirely from the spec?

@degorenko degorenko force-pushed the cephfs-datapools branch 2 times, most recently from 16c2741 to ce78486 Compare December 7, 2021 14:03
DataPools []PoolSpec `json:"dataPools"`
// (Deprecated in favor of namedDataPools) The data pool settings.
// +optional
DataPools []PoolSpec `json:"dataPools,omitempty"`
Member

Based on the discussion, I was thinking we would stick with only the dataPools and effectively be converting all of the FS data pools into named pools by changing this type to NamedPoolSpec. Since it's only internal json serialization by the operator, to the CRD it simply looks like the name is added, and no CRD type conversion needed.

Suggested change
DataPools []PoolSpec `json:"dataPools,omitempty"`
DataPools []NamedPoolSpec `json:"dataPools,omitempty"`

Contributor Author

In that case the name parameter should be optional for NamedPoolSpec - and if it is missing from the spec we will use the current default naming <cephfsName>-data<index>. So my question is still relevant: do we need to create cephfs data pools with the cephfs name prefix for NamedPoolSpec, e.g. <cephfsName>-<nameFromSpec>?

Contributor Author

Also, such behaviour forces users to add named pools only at the end of the current list; otherwise it may break the current naming.

Member

Correct, the name would be optional, and if blank rook would set the generated pool name as today.

If they set the name in a filesystem CR that already exists, and it is different from the existing generated pool name, it would certainly create a problem, so we should add a check for that case. Otherwise, setting the pool name would cause a new pool to be created and added to the filesystem, and the default-named pool(s) would remain.

Also, such behaviour forces users to add named pools only at the end of the current list; otherwise it may break the current naming.

If they add the name, maybe we should enforce that the name be set on all the pools in the list. This would allow them to be reordered, even for the existing default-named pools.

Contributor Author

If they set the name in a filesystem CR that already exists, and it is different from the existing generated pool name, it would certainly create a problem, so we should add a check for that case.

We can't add a check for that, since we are not tracking the diff for the spec (we only show it as a simple diff string from the comparer). Also, it is mostly the same as the situation when a user drops an old pool and adds a new named one - the real diff is impossible to track. So, I'd rather leave it up to the user and warn them to avoid such situations, like adding different/new names for already-present pools.

If they add the name, maybe we should enforce that the name be set on all the pools in the list. This would allow them to be reordered, even for the existing default-named pools.

I don't think that it is needed. Since we already have an ordered list, the user may just add new pools with predefined names and not touch the ones already present.

Again, my question is not answered: do we need to create cephfs data pools with the cephfs name prefix for NamedPoolSpec, e.g. <cephfsName>-<nameFromSpec>?

Member

If they set the name in a filesystem CR that already exists, and it is different from the existing generated pool name, it would certainly create a problem, so we should add a check for that case.

We can't add a check for that, since we are not tracking the diff for the spec (we only show it as a simple diff string from the comparer). Also, it is mostly the same as the situation when a user drops an old pool and adds a new named one - the real diff is impossible to track. So, I'd rather leave it up to the user and warn them to avoid such situations, like adding different/new names for already-present pools.

Agreed, not sure how to check for that.

If they add the name, maybe we should enforce that the name be set on all the pools in the list. This would allow them to be reordered, even for the existing default-named pools.

I don't think that it is needed. Since we already have an ordered list, the user may just add new pools with predefined names and not touch the ones already present.

It's not really necessary, but it could help avoid some error scenarios, or at least confusion around some pools having names and some not. We can skip the error check though, I'm fine with that.

Again, my question is not answered: do we need to create cephfs data pools with the cephfs name prefix for NamedPoolSpec, e.g. <cephfsName>-<nameFromSpec>?

This would certainly help ensure the pool names are unique when there are multiple filesystems with named pools, otherwise the pools could mistakenly overlap. My assumption with #9295 was that arbitrary pool names could be used, but sounds like you're interested in better names than data0, data1, ...

Btw I'm working on #9209 and the NamedPoolSpec will be useful for that implementation as well. I hope to have a PR open by tomorrow if not today. That change should set a foundation for what we want to do with this PR, so perhaps you wait for that to avoid too much implementation overlap.

Member

With #9363, hopefully this will be a much more straightforward implementation. The NamedPoolSpec is already passed down to the pool creation code, so it might basically look like:

  • Change the types.go for dataPools to NamedPoolSpec
  • In the filesystem.go, where you see dataPoolNames[i], implement the name override if specified instead of using the generated name.

Contributor Author

With #9363, hopefully this will be a much more straightforward implementation. The NamedPoolSpec is already passed down to the pool creation code, so it might basically look like:
Change the types.go for dataPools to NamedPoolSpec

Ok, let's wait. Then I'll just rebase once your PR is merged.

In the filesystem.go, where you see dataPoolNames[i], implement the name override if specified instead of using the generated name.

Actually, it is already done. But again, should we continue to use real pool names as <cephfsName>-<poolNameFromSpec> or just <poolNameFromSpec>?

Member

But again, should we continue to use real pool names as <cephfsName>-<poolNameFromSpec> or just <poolNameFromSpec>?

I'm thinking <cephfsName>-<poolNameFromSpec> to prevent conflicts between filesystems. wdyt?

Contributor Author

Yeah, agree. I'll update.
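
Putting the two decisions together (an optional name override, prefixed with the filesystem name), a rough sketch of what the helper might look like - assuming the generateDataPoolNames signature shown in the diff later in this thread, a dataPoolSuffix constant of "data", and the fmt package; illustrative only, not the merged code:

// generateDataPoolNames builds the real pool names, preferring the name from
// the spec (prefixed with the filesystem name) when one is set, and falling
// back to the generated <fsName>-data<index> name otherwise.
func generateDataPoolNames(f *Filesystem, spec cephv1.FilesystemSpec) []string {
	dataPoolNames := make([]string, len(spec.DataPools))
	for i, pool := range spec.DataPools {
		if pool.Name != "" {
			// predefined name from the spec, prefixed with the filesystem name
			dataPoolNames[i] = fmt.Sprintf("%s-%s", f.Name, pool.Name)
		} else {
			// generated default, e.g. myfs-data0 (dataPoolSuffix assumed to be "data")
			dataPoolNames[i] = fmt.Sprintf("%s-%s%d", f.Name, dataPoolSuffix, i)
		}
	}
	return dataPoolNames
}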

@mergify

mergify bot commented Dec 7, 2021

This pull request has merge conflicts that must be resolved before it can be merged. @degorenko please rebase it. https://rook.io/docs/rook/latest/development-flow.html#updating-your-fork

@degorenko
Contributor Author

@travisn is there actually any reason to wait for #9363? Defining NamedPoolSpec in the current change is not complex. Maybe we can go ahead and merge, since CI has passed and all comments are resolved?

Member

@travisn travisn left a comment

@travisn is there actually any reason to wait for #9363? Defining NamedPoolSpec in the current change is not complex. Maybe we can go ahead and merge, since CI has passed and all comments are resolved?

Yes, we could go ahead with it, it's independent enough. Just a few more small comments...

@@ -31,7 +31,8 @@ spec:
replicated:
size: 3
dataPools:
- failureDomain: host
- name: replicated-pool
Member

pool in the name seems redundant, how about removing the pool suffix from the examples?

Suggested change
- name: replicated-pool
- name: replicated

Contributor Author

But since it is just an example, I wanted to mention the pool suffix for the storage class. Otherwise, in the storage class we will have the pool name as just myfs-replicated. Having myfs-replicated-pool (with pool name replicated-pool) is a bit more user friendly, isn't it?

Member

In the storage class, the setting is: pool: myfs-replicated-pool, right? If the setting name is pool, isn't it clear it's a pool?

Contributor Author

Okay, you win :)
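
For context, this is where the final pool name surfaces in the CephFS StorageClass - a trimmed sketch of the example manifest (the CSI secret parameters are omitted here):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: myfs
  # the <fsName>-<poolName> result of the naming discussed above
  pool: myfs-replicated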

@@ -639,6 +639,14 @@ type PoolSpec struct {
Quotas QuotaSpec `json:"quotas,omitempty"`
}

// NamedPoolSpec represents the named ceph pool spec
type NamedPoolSpec struct {
// Name of pool
Member

Suggested change
// Name of pool
// Name of the pool

@@ -278,10 +278,16 @@ func downFilesystem(context *clusterd.Context, clusterInfo *cephclient.ClusterIn
}

// generateDataPoolNames generates data pool names by prefixing the filesystem name to the constant DataPoolSuffix,
// or takes the predefined name from the spec
func generateDataPoolNames(f *Filesystem, spec cephv1.FilesystemSpec) []string {
Member

How about a unit test for this name generation method?

Contributor Author

Sure, let me add it
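
A minimal sketch of what such a test might look like, under the same assumptions as the helper sketched above (testify assertions; NamedPoolSpec fields as shown earlier in the thread):

import (
	"testing"

	cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
	"github.com/stretchr/testify/assert"
)

func TestGenerateDataPoolNames(t *testing.T) {
	fs := &Filesystem{Name: "myfs", Namespace: "rook-ceph"}
	fsSpec := cephv1.FilesystemSpec{
		DataPools: []cephv1.NamedPoolSpec{
			// unnamed pool: falls back to the generated <fsName>-data<index>
			{PoolSpec: cephv1.PoolSpec{}},
			// named pool: prefixed with the filesystem name
			{Name: "replicated", PoolSpec: cephv1.PoolSpec{}},
		},
	}

	names := generateDataPoolNames(fs, fsSpec)
	assert.Equal(t, []string{"myfs-data0", "myfs-replicated"}, names)
}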

@@ -187,7 +191,7 @@ See the official cephfs mirror documentation on [how to add a bootstrap peer](ht
The pools allow all of the settings defined in the Pool CRD spec. For more details, see the [Pool CRD](ceph-pool-crd.md) settings. In the example above, there must be at least three hosts (size 3) and at least eight devices (6 data + 2 coding chunks) in the cluster.

* `metadataPool`: The settings used to create the filesystem metadata pool. Must use replication.
* `dataPools`: The settings to create the filesystem data pools. If multiple pools are specified, Rook will add the pools to the filesystem. Assigning users or files to a pool is left as an exercise for the reader with the [CephFS documentation](http://docs.ceph.com/docs/master/cephfs/file-layouts/). The data pools can use replication or erasure coding. If erasure coding pools are specified, the cluster must be running with bluestore enabled on the OSDs.
* `dataPools`: The settings to create the filesystem data pools. Optionally, pool name can be specified with `name` field. If multiple pools are specified, Rook will add the pools to the filesystem. Assigning users or files to a pool is left as an exercise for the reader with the [CephFS documentation](http://docs.ceph.com/docs/master/cephfs/file-layouts/). The data pools can use replication or erasure coding. If erasure coding pools are specified, the cluster must be running with bluestore enabled on the OSDs.
Member

Suggested change
* `dataPools`: The settings to create the filesystem data pools. Optionally, pool name can be specified with `name` field. If multiple pools are specified, Rook will add the pools to the filesystem. Assigning users or files to a pool is left as an exercise for the reader with the [CephFS documentation](http://docs.ceph.com/docs/master/cephfs/file-layouts/). The data pools can use replication or erasure coding. If erasure coding pools are specified, the cluster must be running with bluestore enabled on the OSDs.
* `dataPools`: The settings to create the filesystem data pools. Optionally, a pool name can be specified with the `name` field to override the generated name. If multiple pools are specified, Rook will add the pools to the filesystem. Assigning users or files to a pool is left as an exercise for the reader with the [CephFS documentation](http://docs.ceph.com/docs/master/cephfs/file-layouts/). The data pools can use replication or erasure coding. If erasure coding pools are specified, the cluster must be running with bluestore enabled on the OSDs.

Contributor Author

Optionally, a pool name can be specified with the name field to override the generated name.

How about:

Optionally, a pool name can be specified with the name field to override the default generated name.

@@ -5797,7 +5799,6 @@ spec:
type: object
x-kubernetes-preserve-unknown-fields: true
required:
- dataPools
Member

Should dataPools really be optional now?

Member

keeping required makes sense

Contributor Author

Agreed, it still stays required

@@ -4968,7 +4971,6 @@ spec:
type: object
x-kubernetes-preserve-unknown-fields: true
type: object
nullable: true
Member

We have had upgrade problems with things being turned from nullable to non-nullable. Should we keep this?

Member

We really do require at least one data pool, so not sure there would be an impact here either way.

Comment on lines -20 to +19
pool: myfs-ec-data1
pool: myfs-ec-erasurecoded
Member

There are also some examples in nfs.yaml and nfs-test.yaml IIRC. Those might have been gone for a bit, so rebasing and doing a search for myfs-data0 again might help

Contributor Author

nfs.yaml contains the following:

  rados:
    # RADOS pool where NFS configs are stored.
    # In this example the data pool for the "myfs" filesystem is used.
    # If using the object store example, the data pool would be "my-store.rgw.buckets.data".
    # Note that this has nothing to do with where exported file systems or object stores live.
    pool: myfs-data0

So I guess it is clear why such a name was used, isn't it?

// NamedPoolSpec represents the named ceph pool spec
type NamedPoolSpec struct {
// Name of the pool
Name string `json:"name,omitempty"`
Member

I'm still struggling a bit with calling this name since the name ends up being partly generated. At the end of the day, I'm not sure something like generateName would be very useful here; improving the docs is probably the best thing to do.

We still might want to make this struct CephFilesystem specific (i.e., not reuse the named pool spec for CephBlockPool) and have the doc text be something like "unique identifier for the pool within the filesystem; will be prefixed with the metadata name." We should also mention that we definitely recommend setting the pool name here, since it helps resolve errors that could arise from altering the pool list.

Member

I like using the NamedPoolSpec for both the CephBlockPool and the CephFilesystem. In the Filesystem docs it sounds good to just be clear that the filesystem name is prepended to the name.

Contributor Author

In the Filesystem docs it sounds good to just be clear that the filesystem name is prepended to the name.

Yeah, we may add a note that the filesystem name will be prepended to the pool name.

// +nullable
DataPools []PoolSpec `json:"dataPools"`
// The data pool settings, with optional predefined pool name.
// +optional
Member

I think this might be a carryover from the previous design?

Contributor Author

Looks like it; I'll return nullable as it was before.

@@ -187,7 +191,7 @@ See the official cephfs mirror documentation on [how to add a bootstrap peer](ht
The pools allow all of the settings defined in the Pool CRD spec. For more details, see the [Pool CRD](ceph-pool-crd.md) settings. In the example above, there must be at least three hosts (size 3) and at least eight devices (6 data + 2 coding chunks) in the cluster.

* `metadataPool`: The settings used to create the filesystem metadata pool. Must use replication.
* `dataPools`: The settings to create the filesystem data pools. If multiple pools are specified, Rook will add the pools to the filesystem. Assigning users or files to a pool is left as an exercise for the reader with the [CephFS documentation](http://docs.ceph.com/docs/master/cephfs/file-layouts/). The data pools can use replication or erasure coding. If erasure coding pools are specified, the cluster must be running with bluestore enabled on the OSDs.
* `dataPools`: The settings to create the filesystem data pools. Optionally, a pool name can be specified with the `name` field to override the default generated name. If multiple pools are specified, Rook will add the pools to the filesystem. Assigning users or files to a pool is left as an exercise for the reader with the [CephFS documentation](http://docs.ceph.com/docs/master/cephfs/file-layouts/). The data pools can use replication or erasure coding. If erasure coding pools are specified, the cluster must be running with bluestore enabled on the OSDs.
Member

@BlaineEXE BlaineEXE Dec 10, 2021

I think it would be good to be clearer here that, while the name is optional, we highly recommend setting it to prevent issues that can arise from modifying the spec in a way that causes Rook to lose the original pool ordering. I would also suggest adding name as a field under dataPools here, so that we can also document that the actual pool name stored in Ceph for each pool will be <metadata.name>-<name>. The section on name will be easier to read if it is separated. We can still mention here in the dataPools section that we recommend setting the name parameter on pools, as documented below.

Member

@BlaineEXE BlaineEXE left a comment

Just a couple small doc and example manifest updates. Thanks, @degorenko

@@ -187,7 +191,7 @@ See the official cephfs mirror documentation on [how to add a bootstrap peer](ht
The pools allow all of the settings defined in the Pool CRD spec. For more details, see the [Pool CRD](ceph-pool-crd.md) settings. In the example above, there must be at least three hosts (size 3) and at least eight devices (6 data + 2 coding chunks) in the cluster.

* `metadataPool`: The settings used to create the filesystem metadata pool. Must use replication.
* `dataPools`: The settings to create the filesystem data pools. If multiple pools are specified, Rook will add the pools to the filesystem. Assigning users or files to a pool is left as an exercise for the reader with the [CephFS documentation](http://docs.ceph.com/docs/master/cephfs/file-layouts/). The data pools can use replication or erasure coding. If erasure coding pools are specified, the cluster must be running with bluestore enabled on the OSDs.
* `dataPools`: The settings to create the filesystem data pools. Optionally, a pool name can be specified with the `name` field to override the default generated name, then final pool name will consist of filesystem name and pool name, eg. `<fsName>-<poolName>`. We are highly recommend to specify `name` to prevent issues that can arise from modifying the spec in a way that causes Rook to lose the original pool ordering. If multiple pools are specified, Rook will add the pools to the filesystem. Assigning users or files to a pool is left as an exercise for the reader with the [CephFS documentation](http://docs.ceph.com/docs/master/cephfs/file-layouts/). The data pools can use replication or erasure coding. If erasure coding pools are specified, the cluster must be running with bluestore enabled on the OSDs.
Member

I think users will be able to read the docs more easily if the name field has its own section. This way, we can be clear about other features of dataPools without creating a paragraph that is cumbersome to understand.

Suggested change
* `dataPools`: The settings to create the filesystem data pools. Optionally, a pool name can be specified with the `name` field to override the default generated name, then final pool name will consist of filesystem name and pool name, eg. `<fsName>-<poolName>`. We are highly recommend to specify `name` to prevent issues that can arise from modifying the spec in a way that causes Rook to lose the original pool ordering. If multiple pools are specified, Rook will add the pools to the filesystem. Assigning users or files to a pool is left as an exercise for the reader with the [CephFS documentation](http://docs.ceph.com/docs/master/cephfs/file-layouts/). The data pools can use replication or erasure coding. If erasure coding pools are specified, the cluster must be running with bluestore enabled on the OSDs.
* `dataPools`: The settings to create the filesystem data pools. Optionally (and we highly recommend), a pool name can be specified with the `name` field to override the default generated name; see more below. If multiple pools are specified, Rook will add the pools to the filesystem. Assigning users or files to a pool is left as an exercise for the reader with the [CephFS documentation](http://docs.ceph.com/docs/master/cephfs/file-layouts/). The data pools can use replication or erasure coding. If erasure coding pools are specified, the cluster must be running with bluestore enabled on the OSDs.
* `name`: (optional, and highly recommended) Override the default generated name of the pool. The final pool name will consist of the filesystem name and pool name, e.g., `<fsName>-<poolName>`. We highly recommend specifying `name` to prevent issues that can arise from modifying the spec in a way that causes Rook to lose the original pool ordering.

Contributor Author

Done

pool: myfs-data0
pool: myfs-replicated
Member

We should change myfs-data0 in other manifests (like nfs.yaml and nfs-test.yaml) also. Otherwise, the NFS examples won't successfully run with the new filesystem manifests.

Contributor Author

Ok, updated.

Add an ability to create data pools for CephFS with predefined
names.

Related-Issue: rook#9295
Signed-off-by: Denis Egorenko <degorenko@mirantis.com>
@BlaineEXE BlaineEXE merged commit 24a77d4 into rook:master Dec 14, 2021
mergify bot added a commit that referenced this pull request Dec 14, 2021
file: allow to create CephFS dataPools with predefined names (backport #9296)