Proposal: Data Disk restore from Image #824

Open
wants to merge 5 commits into
base: master
79 changes: 79 additions & 0 deletions docs/proposals/datadisk-snapshot-restore.md
@@ -0,0 +1,79 @@
---
title: Data Disk Restore From Snapshot
creation-date: 2025-04-05
status: implementable
authors:
- "@elankath"
reviewers:
- "@rishabh-11"
- "@unmarshall"
- "@kon-angelo"
---

# Data Disk Restore From Snapshot

## Table of Contents

- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Alternatives](#alternatives)

## Summary

Currently, neither the shoot spec nor the [MCM Azure](https://github.com/gardener/machine-controller-manager-provider-azure) provider supports restoring Azure data disks from snapshots.

## Motivation
The primary motivation is to support [Integration of vSMP MemoryOne in Azure #](https://github.com/gardener/gardener-extension-provider-azure/issues/788).
Contributor

I think that the proposal can stand on its own without any reference to vsmp.

Since this is already implemented for AWS via [Support for data volume snapshot ID](https://github.com/gardener/gardener-extension-provider-aws/pull/112), we should introduce the same enhancement
for Azure as well.

### Goals

1. Extend the provider-specific [WorkerConfig](https://github.com/gardener/gardener-extension-provider-azure/blob/master/docs/usage/usage.md#workerconfig) section in the shoot YAML with provider configuration for data disks, so that a data disk can be created from a snapshot.


## Proposal

### Shoot Specification

Currently, an Azure shoot spec does not support provider-specific configuration of data disks.
The following example shows the `WorkerConfig` as it exists at the time of this proposal:
```yaml
providerConfig:
  apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
  kind: WorkerConfig
  nodeTemplate: # (to be specified only if the node capacity would be different from cloudprofile info during runtime)
    capacity:
      cpu: 2
      gpu: 1
      memory: 50Gi
```
We propose that the worker config section be enhanced to support data disk configuration:
```yaml
providerConfig:
  apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
  kind: WorkerConfig
  dataVolumes: # <-- NEW SUB_SECTION
    - name: vsmp1
      snapshotName: snap-1234
  nodeTemplate: # (to be specified only if the node capacity would be different from cloudprofile info during runtime)
    capacity:
      cpu: 2
      gpu: 1
      memory: 50Gi
```

In the above, `snap-1234` is the name of a snapshot created by an external process or tool.
See [az-snapshot-create](https://learn.microsoft.com/en-us/cli/azure/snapshot?view=azure-cli-latest#az-snapshot-create).
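
To make the shape of the proposed `dataVolumes` sub-section concrete, here is a minimal Python sketch of how such a list could be read and validated. The `DataVolume` class, the `parse_data_volumes` helper, and the assumption that `snapshotName` may be omitted for a plain empty disk are all hypothetical illustrations and not part of the proposal or of any Gardener API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class DataVolume:
    """Provider-specific settings for one data disk (hypothetical model)."""

    name: str
    # Assumption: an absent snapshotName means "create an empty disk".
    snapshot_name: Optional[str] = None


def parse_data_volumes(worker_config: dict[str, Any]) -> list[DataVolume]:
    """Read the proposed `dataVolumes` list from a WorkerConfig mapping."""
    volumes: list[DataVolume] = []
    seen: set[str] = set()
    for entry in worker_config.get("dataVolumes", []):
        name = entry.get("name")
        if not name:
            raise ValueError("each dataVolumes entry needs a non-empty name")
        if name in seen:
            raise ValueError(f"duplicate data volume name: {name}")
        seen.add(name)
        volumes.append(
            DataVolume(name=name, snapshot_name=entry.get("snapshotName"))
        )
    return volumes


# The WorkerConfig from the example above, written as a plain mapping.
worker_config = {
    "apiVersion": "azure.provider.extensions.gardener.cloud/v1alpha1",
    "kind": "WorkerConfig",
    "dataVolumes": [{"name": "vsmp1", "snapshotName": "snap-1234"}],
}
print(parse_data_volumes(worker_config))
```

Rejecting duplicate names is a design choice of this sketch: the `name` is presumably what ties an entry to a data volume of the worker pool, so ambiguous entries are treated as an error rather than silently merged.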

The Azure `snapshotName` is distinct from the Azure `snapshotID`. The `snapshotID` is a fully qualified hierarchical
identifier that includes the Azure subscription ID, the resource group name and the `snapshotName`, for example:
`/subscriptions/<AzureSubscriptionID>/resourceGroups/<resourceGroupName>/providers/Microsoft.Compute/snapshots/<snapshotName>`

It would be tedious and error-prone to specify this in the shoot `WorkerConfig` section, so it is best that the MCM Azure provider take care
of forming the fully qualified Azure disk `snapshotID` and the Azure `MachineClass`, which the MCM Azure provider then operates on.
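
As an illustration of forming the fully qualified ID from its parts, the following Python sketch assembles it from a subscription ID, a resource group name and a snapshot name. The function name `snapshot_resource_id` and the sample values are hypothetical; the only fixed part is the Azure resource ID layout, in which snapshot resources live under `Microsoft.Compute/snapshots`. This is a sketch under those assumptions, not the MCM implementation.

```python
def snapshot_resource_id(
    subscription_id: str, resource_group: str, snapshot_name: str
) -> str:
    """Build the fully qualified Azure resource ID of a disk snapshot."""
    parts = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "snapshot_name": snapshot_name,
    }
    for label, value in parts.items():
        # A slash inside a segment would silently change the ID hierarchy.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Compute/snapshots/{snapshot_name}"
    )


# Placeholder values for illustration only.
print(
    snapshot_resource_id(
        "00000000-0000-0000-0000-000000000000", "example-rg", "snap-1234"
    )
)
```

Because the resource group is an explicit argument, the same helper works whether the snapshot lives in the shoot's resource group or in a different one.
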
Contributor

Two points:
a) I am not sure why you consider copy-pasting an Azure UID more error-prone than having MCM construct it. That being said, it is a trick used often by the Azure controllers, so we can do it too, but I don't necessarily find it easier.
b) The main issue is that you need to at least also add the resource group to the spec. Otherwise you are constrained to using disks only in the same resource group, and that would make it useless for the vsmp scenario, as I imagine they want to restore from some known resource in an RG different from the shoot's.

Author

a) I removed this statement.
b) I added `resourceGroup` as an optional field, though I am unsure whether it is needed for the vsmp case. They just appear to use the default resource group: `-g default` is specified everywhere in their script.