Skip to content

OADP-4855 Add cachePVC support for Kopia cache volume#2090

Open
mpryc wants to merge 1 commit intoopenshift:oadp-devfrom
mpryc:oadp-dev-copia-cache-update
Open

OADP-4855 Add cachePVC support for Kopia cache volume#2090
mpryc wants to merge 1 commit intoopenshift:oadp-devfrom
mpryc:oadp-dev-copia-cache-update

Conversation

@mpryc
Copy link
Contributor

@mpryc mpryc commented Feb 13, 2026

Add cachePVC configuration to the DPA CRD under spec.configuration.nodeAgent, enabling dedicated PVC-backed cache volumes for Kopia repository cache during data movement restore operations. This prevents ephemeral storage exhaustion on nodes with limited root filesystem space during concurrent restores.

  • Add CachePVCConfig field to NodeAgentConfigMapSettings using upstream types.CachePVC directly to prevent field name drift with Velero
  • Serialize cachePVC config into the node-agent ConfigMap so Velero provisions per-restore cache PVCs instead of using pod root filesystem
  • Add default StorageClass resolution when cachePVC is configured without explicit storageClass, falling back to the cluster's default SC
  • Pass backup-repository-configmap to node-agent DaemonSet for cache volume size calculation (cacheLimitMB * 1.2)
  • Add RBAC for StorageClass list/get/watch
  • Add unit tests for ConfigMap assembly, default SC resolution, isNodeAgentCMRequired, and DaemonSet build with backup-repo configmap
  • Add user documentation with configuration examples

Why the changes were made

Fix OADP-4855

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 13, 2026

Walkthrough

This pull request introduces support for configurable Kopia repository cache PVCs via a new CachePVCConfig field in the DataProtectionApplication CRD, includes default StorageClass resolution logic in the controller, adds caCertRef as a new CA certificate reference mechanism across multiple CRD schemas, and extends RBAC permissions to access StorageClasses.

Changes

Cohort / File(s) Summary
Type Definitions
api/v1alpha1/dataprotectionapplication_types.go
Adds CachePVCConfig field to NodeAgentConfigMapSettings struct and updates DeepCopyInto method to deep-copy the new field.
RBAC and Permissions
bundle/manifests/oadp-operator.clusterserviceversion.yaml, config/rbac/role.yaml, internal/controller/dataprotectionapplication_controller.go
Adds RBAC rules granting get, list, and watch permissions on storage.k8s.io/storageclasses across ClusterRole and kubebuilder markers.
CRD Schema Updates – CA Certificate Reference
bundle/manifests/oadp.openshift.io_dataprotectionapplications.yaml, bundle/manifests/oadp.openshift.io_dataprotectiontests.yaml, bundle/manifests/velero.io_backupstoragelocations.yaml, config/crd/bases/...
Introduces caCertRef field as an object containing key (required), name, and optional properties with atomic map semantics; deprecates caCert in favor of caCertRef across multiple CRD manifests and base schemas.
CRD Schema Updates – Cache PVC Configuration
bundle/manifests/oadp.openshift.io_dataprotectionapplications.yaml, config/crd/bases/oadp.openshift.io_dataprotectionapplications.yaml
Adds cachePVC field under nodeAgent.backupPVC with properties residentThresholdInMB (integer) and storageClass (string) to configure dedicated PVC for Kopia repository cache.
Controller Implementation
internal/controller/nodeagent.go
Introduces getDefaultStorageClass() method to resolve cluster default StorageClass; extends isNodeAgentCMRequired logic to consider CachePVCConfig; adds automatic default StorageClass resolution in updateNodeAgentCM; propagates backup repository config map name to DaemonSet via buildNodeAgentDaemonset.
Controller Tests
internal/controller/nodeagent_test.go
Adds comprehensive test coverage for default StorageClass resolution, cache PVC configuration validation, backup repository config map arguments, and CachePVCConfig-required logic across multiple test functions.
Dependencies
go.mod
Updates multiple transitive and direct dependencies including cloud.google.com/go storage, Google API libraries, Azure SDKs, OpenTelemetry components, and Velero/Kopia replacements.
Documentation
docs/cache_volume_pvc_configuration.md
New documentation file detailing cache volume PVC configuration, usage scenarios, YAML examples, PVC sizing calculations, and lifecycle behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@mpryc
Copy link
Contributor Author

mpryc commented Feb 13, 2026

/hold needs some more testing.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 13, 2026
@openshift-ci
Copy link

openshift-ci bot commented Feb 13, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mpryc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 13, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/controller/nodeagent.go (1)

400-418: ⚠️ Potential issue | 🟡 Minor

Add explicit guard for empty backup-repository config map name to match pattern in velero.go.

The code passes backupRepoConfigMapName directly to install.WithBackupRepoConfigMap() without verifying it's non-empty. While the name is conditionally populated, velero.go (line 435) shows the pattern used elsewhere in the codebase: if cmName.Name != "" before constructing the flag. Add the same guard before calling install.WithBackupRepoConfigMap() to ensure consistency and protect against unintended empty flag injection.

🤖 Fix all issues with AI agents
In `@bundle/manifests/oadp-operator.clusterserviceversion.yaml`:
- Around line 1055-1061: The RBAC rule incorrectly lists the resource
"persistentvolumerclaims" (typo) so it grants no access; update the resources
array used with the apiGroup "" to include the correct Kubernetes resource name
"persistentvolumeclaims" (and keep "persistentvolumes"), ensuring the
resources/verbs block around the rule that contains "persistentvolumes" and
"get" is updated to use "persistentvolumeclaims" instead of the misspelled
token.

In `@config/crd/bases/_.yaml`:
- Around line 1-13: The file _.yaml is an unused CRD scaffold containing empty
placeholders (apiVersion, kind, names, scope, versions) and should be deleted;
remove config/crd/bases/_.yaml from the repo (or move it to an archival
location) so dead scaffold CRD content is not kept alongside real CRDs
referenced by config/crd/kustomization.yaml.
🧹 Nitpick comments (1)
internal/controller/nodeagent.go (1)

130-145: Verify default StorageClass detection covers legacy annotation if required.
If any supported clusters still mark defaults with the beta annotation, this helper could miss them and cachePVC could be disabled. Confirm GA-only is acceptable or add a fallback check.

🔧 Optional fallback for legacy default annotation
-		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
+		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" ||
+			sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] == "true" {
 			return sc.Name, nil
 		}

Comment on lines 1055 to 1061
- apiGroups:
- ""
resources:
- persistentvolumerclaims
- persistentvolumes
verbs:
- get
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix invalid RBAC resource name.
persistentvolumerclaims isn’t a valid Kubernetes resource, so this rule won’t grant access as intended.

🔧 Proposed fix
-          - persistentvolumerclaims
+          - persistentvolumeclaims
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- apiGroups:
- ""
resources:
- persistentvolumerclaims
- persistentvolumes
verbs:
- get
- apiGroups:
- ""
resources:
- persistentvolumeclaims
- persistentvolumes
verbs:
- get
🤖 Prompt for AI Agents
In `@bundle/manifests/oadp-operator.clusterserviceversion.yaml` around lines 1055
- 1061, The RBAC rule incorrectly lists the resource "persistentvolumerclaims"
(typo) so it grants no access; update the resources array used with the apiGroup
"" to include the correct Kubernetes resource name "persistentvolumeclaims" (and
keep "persistentvolumes"), ensuring the resources/verbs block around the rule
that contains "persistentvolumes" and "get" is updated to use
"persistentvolumeclaims" instead of the misspelled token.

Comment on lines 1 to 13
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.16.5
spec:
group: ""
names:
kind: ""
plural: ""
scope: ""
versions: null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, verify the file exists and see its content
find . -name "_.yaml" -path "*/crd/bases/*" | head -20

Repository: openshift/oadp-operator

Length of output: 92


🏁 Script executed:

# Search for references to this file in kustomization.yaml files
rg "_.yaml" -g "kustomization.yaml" -B 2 -A 2

Repository: openshift/oadp-operator

Length of output: 49


🏁 Script executed:

# Search for references to config/crd/bases directory in kustomization files
rg "config/crd/bases" -g "kustomization.yaml" -B 2 -A 2

Repository: openshift/oadp-operator

Length of output: 49


🏁 Script executed:

# Check if config/crd/bases directory is included in any kustomization
rg "config/crd" -g "kustomization.yaml" -l

Repository: openshift/oadp-operator

Length of output: 49


🏁 Script executed:

# Check kustomization directory structure to understand build patterns
find . -name "kustomization.yaml" -type f | head -20

Repository: openshift/oadp-operator

Length of output: 566


🏁 Script executed:

# Search for any reference to bases/ directory in kustomization files more broadly
rg "bases/" -g "kustomization.yaml" -B 2 -A 2 | head -50

Repository: openshift/oadp-operator

Length of output: 2596


🏁 Script executed:

# Check if there's any .gitignore or build configuration that might exclude this
cat config/crd/.gitignore 2>/dev/null || echo "no .gitignore in crd directory"

Repository: openshift/oadp-operator

Length of output: 97


🏁 Script executed:

# Verify the file content matches what's in the review
cat -n config/crd/bases/_.yaml

Repository: openshift/oadp-operator

Length of output: 388


Remove unused placeholder CRD file.

The _.yaml file is an empty scaffold that should either be populated with a real CRD or removed entirely. It is currently not referenced in config/crd/kustomization.yaml or any build configuration, so it poses no immediate risk; however, it should be cleaned up as dead code.

🤖 Prompt for AI Agents
In `@config/crd/bases/_.yaml` around lines 1 - 13, The file _.yaml is an unused
CRD scaffold containing empty placeholders (apiVersion, kind, names, scope,
versions) and should be deleted; remove config/crd/bases/_.yaml from the repo
(or move it to an archival location) so dead scaffold CRD content is not kept
alongside real CRDs referenced by config/crd/kustomization.yaml.

@mpryc mpryc force-pushed the oadp-dev-copia-cache-update branch 2 times, most recently from 7930e31 to 3081300 Compare February 13, 2026 16:48
Add cachePVC configuration to the DPA CRD under spec.configuration.nodeAgent,
enabling dedicated PVC-backed cache volumes for Kopia repository cache during
data movement restore operations. This prevents ephemeral storage exhaustion
on nodes with limited root filesystem space during concurrent restores.

- Add CachePVCConfig field to NodeAgentConfigMapSettings using upstream
  types.CachePVC directly to prevent field name drift with Velero
- Serialize cachePVC config into the node-agent ConfigMap so Velero
  provisions per-restore cache PVCs instead of using pod root filesystem
- Add default StorageClass resolution when cachePVC is configured without
  explicit storageClass, falling back to the cluster's default SC
- Pass backup-repository-configmap to node-agent DaemonSet for cache
  volume size calculation (cacheLimitMB * 1.2)
- Add RBAC for StorageClass list/get/watch
- Add unit tests for ConfigMap assembly, default SC resolution,
  isNodeAgentCMRequired, and DaemonSet build with backup-repo configmap
- Add user documentation with configuration examples

Signed-off-by: Michal Pryc <mpryc@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
@mpryc mpryc force-pushed the oadp-dev-copia-cache-update branch from 3081300 to 153dbff Compare February 13, 2026 16:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@docs/cache_volume_pvc_configuration.md`:
- Around line 32-34: Add a language tag to the fenced code block containing
"cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)" to satisfy
MD040; change the opening fence from ``` to something like ```text (or ```none)
so the block reads ```text followed by the existing line and the closing ```,
ensuring the fenced block has an explicit language tag.

In `@internal/controller/nodeagent.go`:
- Around line 130-145: The getDefaultStorageClass function currently only checks
the "storageclass.kubernetes.io/is-default-class" annotation; update it to also
consider the legacy "storageclass.beta.kubernetes.io/is-default-class"
annotation when deciding a default StorageClass. In the loop over scList.Items
(function getDefaultStorageClass) test both
sc.Annotations["storageclass.kubernetes.io/is-default-class"] and
sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] for the value
"true" (treat either as indicating default) and return sc.Name when either is
present; keep error handling and the rest of the function unchanged.

Comment on lines +32 to +34
```
cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced code block.

Markdownlint (MD040) flags fenced blocks without a language.

Suggested fix
-```
+```text
 cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)
-```
+```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```
cacheVolumeSize = cacheLimitMB * 1.2 (rounded up to the nearest GB)
```
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@docs/cache_volume_pvc_configuration.md` around lines 32 - 34, Add a language
tag to the fenced code block containing "cacheVolumeSize = cacheLimitMB * 1.2
(rounded up to the nearest GB)" to satisfy MD040; change the opening fence from
``` to something like ```text (or ```none) so the block reads ```text followed
by the existing line and the closing ```, ensuring the fenced block has an
explicit language tag.

Comment on lines +130 to +145
// getDefaultStorageClass returns the name of the cluster's default StorageClass, if one exists.
// A StorageClass is considered default if it has the annotation
// "storageclass.kubernetes.io/is-default-class" set to "true".
func (r *DataProtectionApplicationReconciler) getDefaultStorageClass() (string, error) {
scList := &storagev1.StorageClassList{}
if err := r.Client.List(r.Context, scList); err != nil {
return "", fmt.Errorf("failed to list storage classes: %w", err)
}
for i := range scList.Items {
sc := &scList.Items[i]
if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
return sc.Name, nil
}
}
return "", nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for default StorageClass annotation handling to keep behavior consistent.
rg -n "is-default-class" -S

Repository: openshift/oadp-operator

Length of output: 2092


🏁 Script executed:

#!/bin/bash
# Also search for StorageClass handling in general
rg -n "StorageClass" --type go -A 2 -B 2 | head -100

Repository: openshift/oadp-operator

Length of output: 7344


🏁 Script executed:

#!/bin/bash
# Check the exact file and context
cat -n internal/controller/nodeagent.go | sed -n '125,150p'

Repository: openshift/oadp-operator

Length of output: 1356


🌐 Web query:

Kubernetes StorageClass default annotations storageclass.beta.kubernetes.io/is-default-class

💡 Result:

storageclass.beta.kubernetes.io/is-default-class is the legacy (beta) annotation that was used to mark a StorageClass as the cluster default (so PVCs that omit storageClassName get this class). It appears in older Kubernetes examples/docs. [3]

Today, the supported/current annotation is:

metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"

Kubernetes treats any other value (or absence) as not default. [2]

To switch defaults:

# remove default from an existing class
kubectl patch storageclass standard -p \
  '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

# set a new default
kubectl patch storageclass gold -p \
  '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

[2]

Notes:

  • If more than one StorageClass is marked default, Kubernetes uses the most recently created default class for PVCs that omit storageClassName. [1]
  • Defaulting is applied by the DefaultStorageClass admission controller. [4]

Sources: Kubernetes StorageClasses concept doc [1], “Change the default StorageClass” task doc [2], older Kubernetes blog showing the beta annotation [3], dynamic provisioning doc [4].


Consider supporting the legacy StorageClass default annotation for compatibility.

While Kubernetes currently uses storageclass.kubernetes.io/is-default-class, the legacy annotation storageclass.beta.kubernetes.io/is-default-class was historically supported. Some older clusters may still use it, causing the default StorageClass lookup to fail and unexpectedly disabling cache PVC. Adding support for both annotations is a simple, non-invasive defensive measure. Note that similar code elsewhere in this project (e.g., tests/e2e/lib/virt_storage_helpers.go) also uses only the stable annotation.

Proposed fix
-		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
+		if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" ||
+			sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] == "true" {
 			return sc.Name, nil
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// getDefaultStorageClass returns the name of the cluster's default StorageClass, if one exists.
// A StorageClass is considered default if it has the annotation
// "storageclass.kubernetes.io/is-default-class" set to "true".
func (r *DataProtectionApplicationReconciler) getDefaultStorageClass() (string, error) {
scList := &storagev1.StorageClassList{}
if err := r.Client.List(r.Context, scList); err != nil {
return "", fmt.Errorf("failed to list storage classes: %w", err)
}
for i := range scList.Items {
sc := &scList.Items[i]
if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" {
return sc.Name, nil
}
}
return "", nil
}
// getDefaultStorageClass returns the name of the cluster's default StorageClass, if one exists.
// A StorageClass is considered default if it has the annotation
// "storageclass.kubernetes.io/is-default-class" set to "true".
func (r *DataProtectionApplicationReconciler) getDefaultStorageClass() (string, error) {
scList := &storagev1.StorageClassList{}
if err := r.Client.List(r.Context, scList); err != nil {
return "", fmt.Errorf("failed to list storage classes: %w", err)
}
for i := range scList.Items {
sc := &scList.Items[i]
if sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true" ||
sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] == "true" {
return sc.Name, nil
}
}
return "", nil
}
🤖 Prompt for AI Agents
In `@internal/controller/nodeagent.go` around lines 130 - 145, The
getDefaultStorageClass function currently only checks the
"storageclass.kubernetes.io/is-default-class" annotation; update it to also
consider the legacy "storageclass.beta.kubernetes.io/is-default-class"
annotation when deciding a default StorageClass. In the loop over scList.Items
(function getDefaultStorageClass) test both
sc.Annotations["storageclass.kubernetes.io/is-default-class"] and
sc.Annotations["storageclass.beta.kubernetes.io/is-default-class"] for the value
"true" (treat either as indicating default) and return sc.Name when either is
present; keep error handling and the rest of the function unchanged.

@openshift-ci
Copy link

openshift-ci bot commented Feb 13, 2026

@mpryc: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.22-e2e-test-aws 153dbff link false /test 4.22-e2e-test-aws
ci/prow/4.21-e2e-test-cli-aws 153dbff link false /test 4.21-e2e-test-cli-aws

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-gen-bugfix ai-generated-test approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant