OADP-4855 Add cachePVC support for Kopia cache volume#2090
OADP-4855 Add cachePVC support for Kopia cache volume#2090mpryc wants to merge 1 commit intoopenshift:oadp-devfrom
Conversation
WalkthroughThis pull request introduces support for configurable Kopia repository cache PVCs via a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
|
/hold needs some more testing. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mpryc The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/nodeagent.go (1)
400-418:⚠️ Potential issue | 🟡 MinorAdd explicit guard for empty backup-repository config map name to match pattern in velero.go.
The code passes
backupRepoConfigMapNamedirectly toinstall.WithBackupRepoConfigMap()without verifying it's non-empty. While the name is conditionally populated, velero.go (line 435) shows the pattern used elsewhere in the codebase:if cmName.Name != ""before constructing the flag. Add the same guard before callinginstall.WithBackupRepoConfigMap()to ensure consistency and protect against unintended empty flag injection.
🤖 Fix all issues with AI agents
In `@bundle/manifests/oadp-operator.clusterserviceversion.yaml`:
- Around line 1055-1061: The RBAC rule incorrectly lists the resource
"persistentvolumerclaims" (typo) so it grants no access; update the resources
array used with the apiGroup "" to include the correct Kubernetes resource name
"persistentvolumeclaims" (and keep "persistentvolumes"), ensuring the
resources/verbs block around the rule that contains "persistentvolumes" and
"get" is updated to use "persistentvolumeclaims" instead of the misspelled
token.
In `@config/crd/bases/_.yaml`:
- Around line 1-13: The file _.yaml is an unused CRD scaffold containing empty
placeholders (apiVersion, kind, names, scope, versions) and should be deleted;
remove config/crd/bases/_.yaml from the repo (or move it to an archival
location) so dead scaffold CRD content is not kept alongside real CRDs
referenced by config/crd/kustomization.yaml.
🧹 Nitpick comments (1)
internal/controller/nodeagent.go (1)
130-145: Verify default StorageClass detection covers legacy annotation if required.
If any supported clusters still mark defaults with the beta annotation, this helper could miss them and cachePVC could be disabled. Confirm GA-only is acceptable or add a fallback check.🔧 Optional fallback for legacy default annotation
- if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" { + if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" || + sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] == "true" { return sc.Name, nil }
| - apiGroups: | ||
| - "" | ||
| resources: | ||
| - persistentvolumerclaims | ||
| - persistentvolumes | ||
| verbs: | ||
| - get |
There was a problem hiding this comment.
Fix invalid RBAC resource name.
persistentvolumerclaims isn’t a valid Kubernetes resource, so this rule won’t grant access as intended.
🔧 Proposed fix
- - persistentvolumerclaims
+ - persistentvolumeclaims📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - apiGroups: | |
| - "" | |
| resources: | |
| - persistentvolumerclaims | |
| - persistentvolumes | |
| verbs: | |
| - get | |
| - apiGroups: | |
| - "" | |
| resources: | |
| - persistentvolumeclaims | |
| - persistentvolumes | |
| verbs: | |
| - get |
🤖 Prompt for AI Agents
In `@bundle/manifests/oadp-operator.clusterserviceversion.yaml` around lines 1055
- 1061, The RBAC rule incorrectly lists the resource "persistentvolumerclaims"
(typo) so it grants no access; update the resources array used with the apiGroup
"" to include the correct Kubernetes resource name "persistentvolumeclaims" (and
keep "persistentvolumes"), ensuring the resources/verbs block around the rule
that contains "persistentvolumes" and "get" is updated to use
"persistentvolumeclaims" instead of the misspelled token.
config/crd/bases/_.yaml
Outdated
| --- | ||
| apiVersion: apiextensions.k8s.io/v1 | ||
| kind: CustomResourceDefinition | ||
| metadata: | ||
| annotations: | ||
| controller-gen.kubebuilder.io/version: v0.16.5 | ||
| spec: | ||
| group: "" | ||
| names: | ||
| kind: "" | ||
| plural: "" | ||
| scope: "" | ||
| versions: null |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, verify the file exists and see its content
find . -name "_.yaml" -path "*/crd/bases/*" | head -20Repository: openshift/oadp-operator
Length of output: 92
🏁 Script executed:
# Search for references to this file in kustomization.yaml files
rg "_.yaml" -g "kustomization.yaml" -B 2 -A 2Repository: openshift/oadp-operator
Length of output: 49
🏁 Script executed:
# Search for references to config/crd/bases directory in kustomization files
rg "config/crd/bases" -g "kustomization.yaml" -B 2 -A 2Repository: openshift/oadp-operator
Length of output: 49
🏁 Script executed:
# Check if config/crd/bases directory is included in any kustomization
rg "config/crd" -g "kustomization.yaml" -lRepository: openshift/oadp-operator
Length of output: 49
🏁 Script executed:
# Check kustomization directory structure to understand build patterns
find . -name "kustomization.yaml" -type f | head -20Repository: openshift/oadp-operator
Length of output: 566
🏁 Script executed:
# Search for any reference to bases/ directory in kustomization files more broadly
rg "bases/" -g "kustomization.yaml" -B 2 -A 2 | head -50Repository: openshift/oadp-operator
Length of output: 2596
🏁 Script executed:
# Check if there's any .gitignore or build configuration that might exclude this
cat config/crd/.gitignore 2>/dev/null || echo "no .gitignore in crd directory"Repository: openshift/oadp-operator
Length of output: 97
🏁 Script executed:
# Verify the file content matches what's in the review
cat -n config/crd/bases/_.yamlRepository: openshift/oadp-operator
Length of output: 388
Remove unused placeholder CRD file.
The _.yaml file is an empty scaffold that should either be populated with a real CRD or removed entirely. It is currently not referenced in config/crd/kustomization.yaml or any build configuration, so it poses no immediate risk; however, it should be cleaned up as dead code.
🤖 Prompt for AI Agents
In `@config/crd/bases/_.yaml` around lines 1 - 13, The file _.yaml is an unused
CRD scaffold containing empty placeholders (apiVersion, kind, names, scope,
versions) and should be deleted; remove config/crd/bases/_.yaml from the repo
(or move it to an archival location) so dead scaffold CRD content is not kept
alongside real CRDs referenced by config/crd/kustomization.yaml.
7930e31 to
3081300
Compare
Add cachePVC configuration to the DPA CRD under spec.configuration.nodeAgent, enabling dedicated PVC-backed cache volumes for Kopia repository cache during data movement restore operations. This prevents ephemeral storage exhaustion on nodes with limited root filesystem space during concurrent restores. - Add CachePVCConfig field to NodeAgentConfigMapSettings using upstream types.CachePVC directly to prevent field name drift with Velero - Serialize cachePVC config into the node-agent ConfigMap so Velero provisions per-restore cache PVCs instead of using pod root filesystem - Add default StorageClass resolution when cachePVC is configured without explicit storageClass, falling back to the cluster's default SC - Pass backup-repository-configmap to node-agent DaemonSet for cache volume size calculation (cacheLimitMB * 1.2) - Add RBAC for StorageClass list/get/watch - Add unit tests for ConfigMap assembly, default SC resolution, isNodeAgentCMRequired, and DaemonSet build with backup-repo configmap - Add user documentation with configuration examples Signed-off-by: Michal Pryc <mpryc@redhat.com> Co-authored-by: Claude <noreply@anthropic.com>
3081300 to
153dbff
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/cache_volume_pvc_configuration.md`:
- Around line 32-34: Add a language tag to the fenced code block containing
"cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)" to satisfy
MD040; change the opening fence from ``` to something like ```text (or ```none)
so the block reads ```text followed by the existing line and the closing ```,
ensuring the fenced block has an explicit language tag.
In `@internal/controller/nodeagent.go`:
- Around line 130-145: The getDefaultStorageClass function currently only checks
the "storageclass.kubernetes.io/is-default-class" annotation; update it to also
consider the legacy "storageclass.beta.kubernetes.io/is-default-class"
annotation when deciding a default StorageClass. In the loop over scList.Items
(function getDefaultStorageClass) test both
sc.Annotations["storageclass.kubernetes.io/is-default-class"] and
sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] for the value
"true" (treat either as indicating default) and return sc.Name when either is
present; keep error handling and the rest of the function unchanged.
| ``` | ||
| cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB) | ||
| ``` |
There was a problem hiding this comment.
Add a language tag to the fenced code block.
Markdownlint (MD040) flags fenced blocks without a language.
Suggested fix
-```
+```text
cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)
-```
+```📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ``` | |
| cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB) | |
| ``` |
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@docs/cache_volume_pvc_configuration.md` around lines 32 - 34, Add a language
tag to the fenced code block containing "cacheVolumeSize = cacheLimitMB * 1.2
(rounded up to the nearest GB)" to satisfy MD040; change the opening fence from
``` to something like ```text (or ```none) so the block reads ```text followed
by the existing line and the closing ```, ensuring the fenced block has an
explicit language tag.
| // getDefaultStorageClass returns the name of the cluster's default StorageClass, if one exists. | ||
| // A StorageClass is considered default if it has the annotation | ||
| // "storageclass.kubernetes.io/is-default-class" set to "true". | ||
| func (r *DataProtectionApplicationReconciler) getDefaultStorageClass() (string, error) { | ||
| scList := &storagev1.StorageClassList{} | ||
| if err := r.Client.List(r.Context, scList); err != nil { | ||
| return "", fmt.Errorf("failed to list storage classes: %w", err) | ||
| } | ||
| for i := range scList.Items { | ||
| sc := &scList.Items[i] | ||
| if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" { | ||
| return sc.Name, nil | ||
| } | ||
| } | ||
| return "", nil | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for default StorageClass annotation handling to keep behavior consistent.
rg -n "is-default-class" -SRepository: openshift/oadp-operator
Length of output: 2092
🏁 Script executed:
#!/bin/bash
# Also search for StorageClass handling in general
rg -n "StorageClass" --type go -A 2 -B 2 | head -100Repository: openshift/oadp-operator
Length of output: 7344
🏁 Script executed:
#!/bin/bash
# Check the exact file and context
cat -n internal/controller/nodeagent.go | sed -n '125,150p'Repository: openshift/oadp-operator
Length of output: 1356
🌐 Web query:
Kubernetes StorageClass default annotations storageclass.beta.kubernetes.io/is-default-class
💡 Result:
storageclass.beta.kubernetes.io/is-default-class is the legacy (beta) annotation that was used to mark a StorageClass as the cluster default (so PVCs that omit storageClassName get this class). It appears in older Kubernetes examples/docs. [3]
Today, the supported/current annotation is:
metadata:
annotations:
storageclass.kubernetes.io/is-default-class: "true"Kubernetes treats any other value (or absence) as not default. [2]
To switch defaults:
# remove default from an existing class
kubectl patch storageclass standard -p \
'{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
# set a new default
kubectl patch storageclass gold -p \
'{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'[2]
Notes:
- If more than one
StorageClassis marked default, Kubernetes uses the most recently created default class for PVCs that omitstorageClassName. [1] - Defaulting is applied by the DefaultStorageClass admission controller. [4]
Sources: Kubernetes StorageClasses concept doc [1], “Change the default StorageClass” task doc [2], older Kubernetes blog showing the beta annotation [3], dynamic provisioning doc [4].
Consider supporting the legacy StorageClass default annotation for compatibility.
While Kubernetes currently uses storageclass.kubernetes.io/is-default-class, the legacy annotation storageclass.beta.kubernetes.io/is-default-class was historically supported. Some older clusters may still use it, causing the default StorageClass lookup to fail and unexpectedly disabling cache PVC. Adding support for both annotations is a simple, non-invasive defensive measure. Note that similar code elsewhere in this project (e.g., tests/e2e/lib/virt_storage_helpers.go) also uses only the stable annotation.
Proposed fix
- if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
+ if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" ||
+ sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] == "true" {
return sc.Name, nil
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // getDefaultStorageClass returns the name of the cluster's default StorageClass, if one exists. | |
| // A StorageClass is considered default if it has the annotation | |
| // "storageclass.kubernetes.io/is-default-class" set to "true". | |
| func (r *DataProtectionApplicationReconciler) getDefaultStorageClass() (string, error) { | |
| scList := &storagev1.StorageClassList{} | |
| if err := r.Client.List(r.Context, scList); err != nil { | |
| return "", fmt.Errorf("failed to list storage classes: %w", err) | |
| } | |
| for i := range scList.Items { | |
| sc := &scList.Items[i] | |
| if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" { | |
| return sc.Name, nil | |
| } | |
| } | |
| return "", nil | |
| } | |
| // getDefaultStorageClass returns the name of the cluster's default StorageClass, if one exists. | |
| // A StorageClass is considered default if it has the annotation | |
| // "storageclass.kubernetes.io/is-default-class" set to "true". | |
| func (r *DataProtectionApplicationReconciler) getDefaultStorageClass() (string, error) { | |
| scList := &storagev1.StorageClassList{} | |
| if err := r.Client.List(r.Context, scList); err != nil { | |
| return "", fmt.Errorf("failed to list storage classes: %w", err) | |
| } | |
| for i := range scList.Items { | |
| sc := &scList.Items[i] | |
| if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" || | |
| sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] == "true" { | |
| return sc.Name, nil | |
| } | |
| } | |
| return "", nil | |
| } |
🤖 Prompt for AI Agents
In `@internal/controller/nodeagent.go` around lines 130 - 145, The
getDefaultStorageClass function currently only checks the
"storageclass.kubernetes.io/is-default-class" annotation; update it to also
consider the legacy "storageclass.beta.kubernetes.io/is-default-class"
annotation when deciding a default StorageClass. In the loop over scList.Items
(function getDefaultStorageClass) test both
sc.Annotations["storageclass.kubernetes.io/is-default-class"] and
sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] for the value
"true" (treat either as indicating default) and return sc.Name when either is
present; keep error handling and the rest of the function unchanged.
|
@mpryc: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Add cachePVC configuration to the DPA CRD under spec.configuration.nodeAgent, enabling dedicated PVC-backed cache volumes for Kopia repository cache during data movement restore operations. This prevents ephemeral storage exhaustion on nodes with limited root filesystem space during concurrent restores.
Why the changes were made
Fix OADP-4855