pool: add rbd ec pool support in external cluster #9276

Merged (2 commits, Dec 14, 2021)
5 changes: 5 additions & 0 deletions .github/workflows/canary-integration-test.yml
@@ -59,6 +59,11 @@ jobs:
kubectl -n rook-ceph cp deploy/examples/create-external-cluster-resources.py $toolbox:/etc/ceph
timeout 10 sh -c "until kubectl -n rook-ceph exec $toolbox -- python3 /etc/ceph/create-external-cluster-resources.py --rbd-data-pool-name replicapool; do echo 'waiting for script to succeed' && sleep 1; done"
- name: test external script for erasure coded rbd pool
Member commented:
The CI only creates a single OSD, but the EC pool needs 3 OSDs. Therefore, any operation that actually requires the pool to be functional will hang, since Ceph cannot read from or write to the pool. With #9363 I will disable this test for now, and separately we will need to see if we can get 3 OSDs running in the CI so this test can be enabled again.

Contributor (author) commented:
Okay, but my question is why it isn't failing consistently: sometimes both the EC pool and replicapool are created, and sometimes they are not.

travisn (Member) commented:

When the pool controller creates the pool, there are basically two steps:

  1. Create the pool
  2. Init the pool

The EC pool is always created, but then hangs on initializing the pool. So if the EC pool is the last pool created, everything will appear to work since the initialization isn't critical to the test. But if any other pool hasn't been reconciled yet, it will never get created.

Contributor (author) commented:

Oh, I understand now... thanks @travisn
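
For reference, the two steps described in the thread above map roughly onto the following Ceph commands. This is only a sketch: the operator performs the equivalent calls through its own client code, and the pool name, PG counts, and profile shown here are illustrative.

    # Step 1: create the pool (an erasure coded pool in this scenario).
    # This succeeds even with a single OSD, because it only updates the OSD map.
    ceph osd pool create ec-pool 32 32 erasure
    ceph osd pool set ec-pool allow_ec_overwrites true

    # Step 2: initialize the pool for RBD use.
    # This writes objects into the pool, so it blocks indefinitely while the
    # cluster cannot place the EC chunks (e.g. 1 OSD available, k+m = 3 required).
    rbd pool init ec-pool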

run: |
toolbox=$(kubectl get pod -l app=rook-ceph-tools -n rook-ceph -o jsonpath='{.items[*].metadata.name}')
timeout 10 sh -c "until kubectl -n rook-ceph exec $toolbox -- python3 /etc/ceph/create-external-cluster-resources.py --rbd-data-pool-name=ec-pool --rbd-metadata-ec-pool-name=replicapool; do echo 'waiting for script to succeed' && sleep 1; done"
- name: run external script create-external-cluster-resources.py unit tests
run: |
kubectl -n rook-ceph exec $(kubectl get pod -l app=rook-ceph-tools -n rook-ceph -o jsonpath='{.items[0].metadata.name}') -- python3 -m unittest /etc/ceph/create-external-cluster-resources.py
91 changes: 76 additions & 15 deletions deploy/examples/create-external-cluster-resources.py
@@ -193,6 +193,8 @@ def gen_arg_parser(cls, args_to_parse=None):
help="Ceph Manager prometheus exporter endpoints (comma separated list of <IP> entries of active and standby mgrs)")
output_group.add_argument("--monitoring-endpoint-port", default="", required=False,
help="Ceph Manager prometheus exporter port")
output_group.add_argument("--rbd-metadata-ec-pool-name", default="", required=False,
help="Provides the name of erasure coded RBD metadata pool")

upgrade_group = argP.add_argument_group('upgrade')
upgrade_group.add_argument("--upgrade", action='store_true', default=False,
@@ -205,6 +207,49 @@ def gen_arg_parser(cls, args_to_parse=None):
args_to_parse = sys.argv[1:]
return argP.parse_args(args_to_parse)

def validate_rgw_metadata_ec_pool_name(self):
if self._arg_parser.rbd_metadata_ec_pool_name:
rbd_metadata_ec_pool_name = self._arg_parser.rbd_metadata_ec_pool_name
rbd_pool_name = self._arg_parser.rbd_data_pool_name

if rbd_pool_name == "":
raise ExecutionFailureException(
"Flag '--rbd-data-pool-name' should not be empty"
)

if rbd_metadata_ec_pool_name == "":
raise ExecutionFailureException(
"Flag '--rbd-metadata-ec-pool-name' should not be empty"
)

cmd_json = {
"prefix": "osd dump", "format": "json"
}
ret_val, json_out, err_msg = self._common_cmd_json_gen(cmd_json)
if ret_val != 0 or len(json_out) == 0:
raise ExecutionFailureException(
"{}".format(cmd_json['prefix']) + " command failed.\n" +
"Error: {}".format(err_msg if ret_val !=
0 else self.EMPTY_OUTPUT_LIST)
)
metadata_pool_exist, pool_exist = False, False

for key in json_out['pools']:
# if erasure_code_profile is empty and the pool name matches, it is a replicated pool
if key['erasure_code_profile'] == "" and key['pool_name'] == rbd_metadata_ec_pool_name:
metadata_pool_exist = True
# if erasure_code_profile is not empty and the pool name matches, it is an EC pool
if key['erasure_code_profile'] and key['pool_name'] == rbd_pool_name:
pool_exist = True

if not metadata_pool_exist:
raise ExecutionFailureException(
"Provided rbd_ec_metadata_pool name, {}, does not exist".format(rbd_metadata_ec_pool_name))
if not pool_exist:
raise ExecutionFailureException(
"Provided rbd_data_pool name, {}, does not exist".format(rbd_pool_name))
return rbd_metadata_ec_pool_name
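
For reference, the check above walks the pool entries of `ceph osd dump --format json` and treats an empty `erasure_code_profile` as a replicated pool and a non-empty one as an erasure coded pool. An abridged, illustrative shape of that JSON (only the two fields the validation reads; the names and profile are hypothetical):

    {
      "pools": [
        {"pool_name": "replicapool", "erasure_code_profile": ""},
        {"pool_name": "ec-pool", "erasure_code_profile": "ec-pool_ecprofile"}
      ]
    }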

def validate_rgw_endpoint_tls_cert(self):
if self._arg_parser.rgw_tls_cert_path:
with open(self._arg_parser.rgw_tls_cert_path, encoding='utf8') as f:
@@ -458,7 +503,7 @@ def create_cephCSIKeyring_cephFSProvisioner(self):
if self._arg_parser.restricted_auth_permission:
if cephfs_filesystem == "":
raise ExecutionFailureException(
"'cephfs_filesystem_name' not found, please set the '--cephfs-filesystem-name' flag")
"'cephfs_filesystem_name' not found, please set the '--cephfs-filesystem-name' flag")
cmd_json = {"prefix": "auth get-or-create",
"entity": entity,
"caps": ["mon", "allow r", "mgr", "allow rw",
@@ -492,9 +537,10 @@ def create_cephCSIKeyring_cephFSNode(self):
cmd_json = {"prefix": "auth get-or-create",
"entity": entity,
"caps": ["mon", "allow r",
"mgr", "allow rw",
"osd", "allow rw tag cephfs data={}".format(cephfs_filesystem),
"mds", "allow rw"],
"mgr", "allow rw",
"osd", "allow rw tag cephfs data={}".format(
cephfs_filesystem),
"mds", "allow rw"],
"format": "json"}
else:
cmd_json = {"prefix": "auth get-or-create",
@@ -518,7 +564,7 @@ def create_cephCSIKeyring_RBDProvisioner(self):
entity = "client.csi-rbd-provisioner"
if cluster_name:
entity = "client.csi-rbd-provisioner-{}".format(cluster_name)
cmd_json={}
cmd_json = {}
if self._arg_parser.restricted_auth_permission:
if rbd_pool_name == "":
raise ExecutionFailureException(
@@ -597,8 +643,10 @@ def get_cephfs_data_pool_details(self):
return

if matching_json_out:
self._arg_parser.cephfs_filesystem_name = str(matching_json_out['name'])
self._arg_parser.cephfs_metadata_pool_name = str(matching_json_out['metadata_pool'])
self._arg_parser.cephfs_filesystem_name = str(
matching_json_out['name'])
self._arg_parser.cephfs_metadata_pool_name = str(
matching_json_out['metadata_pool'])

if type(matching_json_out['data_pools']) == list:
# if the user has already provided data-pool-name,
@@ -635,7 +683,7 @@ def create_cephCSIKeyring_RBDNode(self):
entity = "client.csi-rbd-node"
if cluster_name:
entity = "client.csi-rbd-node-{}".format(cluster_name)
cmd_json={}
cmd_json = {}
if self._arg_parser.restricted_auth_permission:
if rbd_pool_name == "":
raise ExecutionFailureException(
@@ -758,6 +806,7 @@ def _gen_output_map(self):
self.out_map['MONITORING_ENDPOINT'], \
self.out_map['MONITORING_ENDPOINT_PORT'] = self.get_active_and_standby_mgrs()
self.out_map['RBD_POOL_NAME'] = self._arg_parser.rbd_data_pool_name
self.out_map['RBD_METADATA_EC_POOL_NAME'] = self.validate_rgw_metadata_ec_pool_name()
self.out_map['RGW_POOL_PREFIX'] = self._arg_parser.rgw_pool_prefix
if self._arg_parser.rgw_endpoint:
self.out_map['ACCESS_KEY'], self.out_map['SECRET_KEY'] = self.create_rgw_admin_ops_user()
@@ -811,13 +860,7 @@ def gen_json_out(self):
"userKey": self.out_map['CSI_RBD_NODE_SECRET_SECRET']
}
},
{
"name": "ceph-rbd",
"kind": "StorageClass",
"data": {
"pool": self.out_map['RBD_POOL_NAME']
}
},

{
"name": "monitoring-endpoint",
"kind": "CephCluster",
@@ -828,6 +871,24 @@ def gen_json_out(self):
}
]

if self.out_map['RBD_METADATA_EC_POOL_NAME']:
json_out.append({
"name": "ceph-rbd",
"kind": "StorageClass",
"data": {
"dataPool": self.out_map['RBD_POOL_NAME'],
"pool": self.out_map['RBD_METADATA_EC_POOL_NAME']
},
})
else:
json_out.append({
"name": "ceph-rbd",
"kind": "StorageClass",
"data": {
"pool": self.out_map['RBD_POOL_NAME']
},
})

# if 'ROOK_EXTERNAL_DASHBOARD_LINK' exists, then only add 'rook-ceph-dashboard-link' Secret
if self.out_map['ROOK_EXTERNAL_DASHBOARD_LINK']:
json_out.append({
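With the EC entry above, the importer ends up with both a metadata pool and a data pool for the ceph-rbd StorageClass. A hedged sketch of the resulting StorageClass as ceph-csi consumes it: `pool` carries the replicated metadata pool and `dataPool` the erasure coded data pool; the provisioner name, clusterID, and remaining parameters are illustrative and depend on the consumer cluster.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: ceph-rbd
    provisioner: rook-ceph.rbd.csi.ceph.com   # assumes the operator namespace is rook-ceph
    parameters:
      clusterID: rook-ceph                    # illustrative
      pool: replicapool                       # replicated pool holding RBD image metadata
      dataPool: ec-pool                       # erasure coded pool holding the image data
      imageFeatures: layering                 # illustrative
    reclaimPolicy: Delete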
2 changes: 1 addition & 1 deletion deploy/examples/pool-ec.yaml
@@ -10,7 +10,7 @@ metadata:
namespace: rook-ceph # namespace:cluster
spec:
# The failure domain will spread the replicas of the data across different failure zones
failureDomain: osd
failureDomain: host
# Make sure you have enough OSDs to support the replica size or sum of the erasure coding and data chunks.
# This is the minimal example that requires only 3 OSDs.
erasureCoded:
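For context, the minimal settings referenced by the comment above ("requires only 3 OSDs") presumably use 2 data chunks and 1 coding chunk, so every object needs k + m = 3 placements, one per host after this change. A sketch, with the chunk counts assumed from that comment rather than taken from the truncated diff:

    erasureCoded:
      dataChunks: 2      # k: data chunks per object
      codingChunks: 1    # m: coding (parity) chunks; k + m = 3 failure domains required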
1 change: 1 addition & 0 deletions tests/scripts/github-action-helper.sh
@@ -212,6 +212,7 @@ function deploy_cluster() {
kubectl create -f cluster-test.yaml
kubectl create -f object-test.yaml
kubectl create -f pool-test.yaml
kubectl create -f pool-ec.yaml
kubectl create -f filesystem-test.yaml
kubectl create -f rbdmirror.yaml
kubectl create -f filesystem-mirror.yaml
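
With pool-ec.yaml applied by the helper, both pools should appear in the cluster before the external script runs. A quick way to confirm from the toolbox (a sketch; the pool names follow the examples above):

    toolbox=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
    # ec-pool should list an erasure profile, replicapool should not
    kubectl -n rook-ceph exec "$toolbox" -- ceph osd pool ls detail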