kubevirt-datamover-controller deployment #2087
Joeavaikath wants to merge 10 commits into openshift:oadp-dev from
Conversation
Walkthrough
Adds KubeVirt DataMover controller: reconcile logic and deployment management, RBAC and CSV entries, readiness condition and checks, singleton validation and warnings, related image/env wiring, Makefile/tooling updates, extensive unit tests, and two tech-debt design documents.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
No actionable comments were generated in the recent review. 🎉
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@internal/controller/kubevirt_datamover_controller.go`:
- Around line 46-48: The two package-level variables kdmDpaResourceVersion and
previousKubevirtDatamoverEnabled are global mutable state and cause race
conditions across concurrent reconciliations; move them into the reconciler's
instance state (e.g., fields on the KubevirtDatamoverReconciler struct) or
implement a per-DPA state map keyed by NamespacedName with synchronization
(sync.RWMutex) to guard reads/writes, then update all references in reconcile
logic to use the reconciler's fields or the synchronized map instead of the
package-level variables.
- Around line 96-99: The call to common.GetImagePullPolicy(...) logs the regex
error but lets execution continue with an invalid imagePullPolicy; update the
error handling in kubevirt_datamover_controller.go (around the
GetImagePullPolicy call where imagePullPolicy and err are set) to return the
error instead of just logging it—e.g., after calling GetImagePullPolicy with
r.dpa.Spec.ImagePullPolicy and kubevirtDatamoverControllerImage, propagate the
error back (return it from the reconcile function or wrap and return) so the
controller does not proceed with an invalid imagePullPolicy; remove or keep the
log as desired but ensure the function returns on error.
	kdmDpaResourceVersion            = ""
	previousKubevirtDatamoverEnabled = false
)
Race condition: Global state shared across concurrent reconciliations.
These package-level variables are shared across all reconciler invocations. Multiple DPA instances or concurrent reconcile loops will read/write these variables simultaneously without synchronization, leading to undefined behavior.
Consider storing this state on the reconciler struct or using a per-DPA state map with proper synchronization.
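For reference, a minimal sketch of that suggestion — per-DPA state keyed by NamespacedName behind a sync.RWMutex. The type and field names here are illustrative, not the controller's actual code:

```go
package sketch

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// kubevirtDatamoverState holds the per-DPA values that were previously
// package-level variables (names are assumptions for illustration).
type kubevirtDatamoverState struct {
	dpaResourceVersion string
	previouslyEnabled  bool
}

// dpaStateMap guards per-DPA state so concurrent reconciles don't race.
type dpaStateMap struct {
	mu    sync.RWMutex
	state map[types.NamespacedName]kubevirtDatamoverState
}

func (m *dpaStateMap) get(key types.NamespacedName) (kubevirtDatamoverState, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	s, ok := m.state[key]
	return s, ok
}

func (m *dpaStateMap) set(key types.NamespacedName, s kubevirtDatamoverState) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.state == nil {
		m.state = make(map[types.NamespacedName]kubevirtDatamoverState)
	}
	m.state[key] = s
}
```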
Documented this tech debt; it happens in multiple other places.
@Joeavaikath, thank you for documenting this tech debt. I understand this pattern exists in other parts of the codebase.
Would you like me to create a GitHub issue to track the refactoring of these global state variables across the codebase? This could help consolidate the effort to address this pattern in a future PR.
	imagePullPolicy, err := common.GetImagePullPolicy(r.dpa.Spec.ImagePullPolicy, kubevirtDatamoverControllerImage)
	if err != nil {
		r.Log.Error(err, "imagePullPolicy regex failed")
	}
Error from GetImagePullPolicy is logged but not returned.
If the regex validation fails, execution continues with a potentially invalid imagePullPolicy. Consider returning the error to surface configuration problems.
Proposed fix
imagePullPolicy, err := common.GetImagePullPolicy(r.dpa.Spec.ImagePullPolicy, kubevirtDatamoverControllerImage)
if err != nil {
- r.Log.Error(err, "imagePullPolicy regex failed")
+ return err
}
Add ConditionKubevirtDatamoverReady constant to track the health of the kubevirt-datamover-controller deployment when the kubevirt-datamover plugin is enabled. This condition will be set to:
- True when the controller is disabled or deployment is ready
- False when the deployment is not ready or not found

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add ReconcileKubevirtDatamoverController to manage the deployment of kubevirt-datamover-controller when the kubevirt-datamover plugin is enabled in spec.configuration.velero.defaultPlugins.

Key features:
- Automatic deployment creation/deletion based on plugin presence
- 3-tier image resolution (UnsupportedOverrides → env var → default)
- Security hardened with non-root user, read-only filesystem
- DPA resource version tracking triggers pod restarts on config changes
- Health probes on :8081 (healthz, readyz)
- Metrics endpoint on :8443

Service Account: oadp-kubevirt-datamover-controller-manager
Pattern: Follows vmfilerestore_controller.go implementation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
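As a rough illustration of the create/delete behavior described above, a controller-runtime sketch; the function shape, parameters, and mutate callback are assumptions, not the PR's implementation:

```go
package sketch

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// reconcileKubevirtDatamoverDeployment creates or updates the controller
// Deployment when the plugin is enabled, and deletes it when the plugin is
// removed from defaultPlugins.
func reconcileKubevirtDatamoverDeployment(ctx context.Context, c client.Client, namespace string, enabled bool, mutate func(*appsv1.Deployment) error) error {
	deploy := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "kubevirt-datamover-controller", Namespace: namespace},
	}
	if !enabled {
		// Plugin absent: remove the deployment if it exists.
		if err := c.Delete(ctx, deploy); err != nil && !apierrors.IsNotFound(err) {
			return err
		}
		return nil
	}
	// Plugin present: create the deployment or reconcile it back to the desired state.
	_, err := controllerutil.CreateOrUpdate(ctx, c, deploy, func() error {
		return mutate(deploy)
	})
	return err
}
```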
Track kubevirt-datamover-controller deployment health in DPA status conditions, enabling users to monitor the controller via: kubectl get dpa -o yaml

The KubevirtDatamoverReady condition reports:
- Status: True, Reason: ComponentDisabled - plugin not enabled
- Status: True, Reason: DeploymentReady - deployment is healthy
- Status: False, Reason: DeploymentNotReady - replicas not ready
- Status: False, Reason: ComponentNotFound - deployment missing

This condition is integrated into overall DPA readiness calculation, so DPA won't report as Reconciled until all enabled components (including kubevirt-datamover) are ready.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
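A small sketch of how the condition values listed above could be derived from a Deployment's status; the function name and the readiness check are assumptions, and the actual controller may compute readiness differently:

```go
package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// kubevirtDatamoverCondition maps the plugin state and deployment status to the
// KubevirtDatamoverReady condition values described in the commit message.
func kubevirtDatamoverCondition(enabled bool, deploy *appsv1.Deployment) metav1.Condition {
	switch {
	case !enabled:
		return metav1.Condition{Type: "KubevirtDatamoverReady", Status: metav1.ConditionTrue, Reason: "ComponentDisabled"}
	case deploy == nil:
		return metav1.Condition{Type: "KubevirtDatamoverReady", Status: metav1.ConditionFalse, Reason: "ComponentNotFound"}
	case deploy.Status.ReadyReplicas > 0 && deploy.Status.ReadyReplicas == deploy.Status.Replicas:
		return metav1.Condition{Type: "KubevirtDatamoverReady", Status: metav1.ConditionTrue, Reason: "DeploymentReady"}
	default:
		return metav1.Condition{Type: "KubevirtDatamoverReady", Status: metav1.ConditionFalse, Reason: "DeploymentNotReady"}
	}
}
```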
Add validation logic to prevent misconfigurations:

1. Cluster-wide singleton enforcement: Blocks DPA creation if another namespace already has kubevirt-datamover enabled. This prevents resource conflicts as the controller uses ClusterRole permissions and reconciles cluster-scoped DataUpload/DataDownload CRs.
   Error: "only a single instance of KubeVirt DataMover Controller can be installed across the entire cluster"

2. Soft dependency warning: Emits a warning event if kubevirt-datamover plugin is enabled without the kubevirt plugin. While not blocking, this guides users toward proper configuration:
   spec.configuration.velero.defaultPlugins:
   - kubevirt            # VM backup/restore
   - kubevirt-datamover  # Incremental backup support

Pattern: Follows vmfilerestore_controller.go validation (lines 308-360)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
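A simplified sketch of the singleton rule above, operating on a hypothetical flattened view of DPAs rather than the real API types (the helper type and function names are invented for illustration; the real check would list DataProtectionApplications via the client):

```go
package sketch

import "fmt"

// dpaPlugins is a hypothetical, simplified view of a DPA: its namespace and the
// defaultPlugins it enables.
type dpaPlugins struct {
	Namespace string
	Plugins   []string
}

// validateSingleton returns an error if any other namespace already enables the
// kubevirt-datamover plugin.
func validateSingleton(current dpaPlugins, all []dpaPlugins) error {
	for _, other := range all {
		if other.Namespace == current.Namespace {
			continue
		}
		for _, p := range other.Plugins {
			if p == "kubevirt-datamover" {
				return fmt.Errorf("only a single instance of KubeVirt DataMover Controller can be installed across the entire cluster (already enabled in namespace %s)", other.Namespace)
			}
		}
	}
	return nil
}
```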
Add RELATED_IMAGE_KUBEVIRT_DATAMOVER_CONTROLLER environment variable to enable image overrides for the kubevirt-datamover-controller.

Image resolution priority:
1. DPA spec.unsupportedOverrides[kubevirtDatamoverImageKey]
2. RELATED_IMAGE_KUBEVIRT_DATAMOVER_CONTROLLER env var
3. Default: quay.io/konveyor/kubevirt-datamover-controller:latest

This follows the same pattern as other controllers (velero, node-agent, vm-file-restore, non-admin-controller).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
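A minimal sketch of that resolution order; the env var name and default image come from the commit message, while the function name and the override key's string value are assumptions:

```go
package sketch

import "os"

const (
	kubevirtDatamoverImageKey     = "kubevirtDatamoverImageFqin" // assumed key value for unsupportedOverrides
	relatedImageEnv               = "RELATED_IMAGE_KUBEVIRT_DATAMOVER_CONTROLLER"
	defaultKubevirtDatamoverImage = "quay.io/konveyor/kubevirt-datamover-controller:latest"
)

// resolveKubevirtDatamoverImage applies the 3-tier priority:
// unsupportedOverrides, then the RELATED_IMAGE_* env var, then the default.
func resolveKubevirtDatamoverImage(unsupportedOverrides map[string]string) string {
	if img, ok := unsupportedOverrides[kubevirtDatamoverImageKey]; ok && img != "" {
		return img
	}
	if img := os.Getenv(relatedImageEnv); img != "" {
		return img
	}
	return defaultKubevirtDatamoverImage
}
```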
Add oadp-kubevirt-datamover-controller-manager to OLM bundle generation:

1. Makefile: Add to --extra-service-accounts for operator-sdk bundle
2. config/manifests/kustomization.yaml: Include RBAC resources

This ensures the OLM ClusterServiceVersion (CSV) includes:
- ServiceAccount: oadp-kubevirt-datamover-controller-manager
- ClusterRole: kubevirt-datamover-controller permissions
- ClusterRoleBinding: bind service account to cluster role

Required permissions (already defined in config/kubevirt-datamover-controller_rbac/):
- DataUploads (velero.io/v2alpha1) - reconcile incremental backups
- VirtualMachineBackups (backup.kubevirt.io) - coordinate with KubeVirt
- VirtualMachines (kubevirt.io) - read VM configurations
- PersistentVolumeClaims - manage data movement

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Regenerate OLM bundle after kubevirt-datamover changes:
  make generate
  make manifests
  make bundle

Changes:
- Add kubevirt-datamover-controller-manager ServiceAccount to CSV
- Include kubevirt-datamover RBAC resources
- Update deployment permissions and related resources

This commit contains only generated files and should not be manually edited.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
c6577cf to 09d0502
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@internal/controller/kubevirt_datamover_controller.go`:
- Around line 129-130: The function ensureKubevirtDatamoverRequiredLabels
mutates the package-level map kubevirtDatamoverDeploymentLabels via maps.Copy,
which can race; stop modifying the global map at runtime by either merging
kubevirtDatamoverControlPlaneLabel into kubevirtDatamoverDeploymentLabels once
at package initialization, or change ensureKubevirtDatamoverRequiredLabels to
create a new local map (clone kubevirtDatamoverDeploymentLabels into a fresh
map) and copy kubevirtDatamoverControlPlaneLabel into that local map before
using it — reference the function ensureKubevirtDatamoverRequiredLabels, the
global maps kubevirtDatamoverDeploymentLabels and
kubevirtDatamoverControlPlaneLabel, and the maps.Copy call when making the
change.
func ensureKubevirtDatamoverRequiredLabels(deploymentObject *appsv1.Deployment) {
	maps.Copy(kubevirtDatamoverDeploymentLabels, kubevirtDatamoverControlPlaneLabel)
Mutating package-level map can cause race conditions.
maps.Copy(kubevirtDatamoverDeploymentLabels, kubevirtDatamoverControlPlaneLabel) modifies the global kubevirtDatamoverDeploymentLabels map on every call. This is both unnecessary (the control plane label is already static) and unsafe with concurrent reconciliations.
Proposed fix
Either merge the labels at initialization time, or create a local copy:
func ensureKubevirtDatamoverRequiredLabels(deploymentObject *appsv1.Deployment) {
- maps.Copy(kubevirtDatamoverDeploymentLabels, kubevirtDatamoverControlPlaneLabel)
+ // Create merged labels locally to avoid mutating package-level state
+ requiredLabels := make(map[string]string)
+ maps.Copy(requiredLabels, kubevirtDatamoverDeploymentLabels)
+ maps.Copy(requiredLabels, kubevirtDatamoverControlPlaneLabel)
deploymentObjectLabels := deploymentObject.GetLabels()
if deploymentObjectLabels == nil {
- deploymentObject.SetLabels(kubevirtDatamoverDeploymentLabels)
+ deploymentObject.SetLabels(requiredLabels)
} else {
- for key, value := range kubevirtDatamoverDeploymentLabels {
+ for key, value := range requiredLabels {
deploymentObjectLabels[key] = value
}
deploymentObject.SetLabels(deploymentObjectLabels)
}
}

Or merge at package initialization:
var (
kubevirtDatamoverControlPlaneLabel = map[string]string{
kubevirtDatamoverControlPlaneKey: kubevirtDatamoverControlPlaneValue,
}
- kubevirtDatamoverDeploymentLabels = map[string]string{
+ kubevirtDatamoverDeploymentLabels = map[string]string{
"app.kubernetes.io/component": "manager",
"app.kubernetes.io/created-by": common.OADPOperator,
"app.kubernetes.io/instance": kubevirtDatamoverObjectName,
"app.kubernetes.io/managed-by": "kustomize",
"app.kubernetes.io/name": "deployment",
"app.kubernetes.io/part-of": common.OADPOperator,
+ kubevirtDatamoverControlPlaneKey: kubevirtDatamoverControlPlaneValue,
}
let's get an image up
The onboarding PR was merged openshift/release#73948
Mirroring jobs are failing: https://prow.ci.openshift.org/job-history/gs/test-platform-results/logs/periodic-image-mirroring-konveyor
Fix PR: openshift/release#74839
/retest
shubham-pampattiwar left a comment
Code looks good, follows established patterns. Missing unit tests though. Let's get test coverage in before merging.
still not working.. Let's get the image right folks
@weshayutin I can see the image in quay: https://quay.io/repository/konveyor/kubevirt-datamover-controller?tab=tags
ah.. the image pull was just slow
/woof
@kaovilai le bark, le review
Signed-off-by: Joseph <jvaikath@redhat.com>
New changes are detected. LGTM label has been removed.
Signed-off-by: Joseph <jvaikath@redhat.com>
docs/design/tech-debt readmes are just things coderabbitai pointed out. We follow the pattern in a few places, so I just doc'ed it up in case we want to take a closer look at them; can ignore and delete if it's not a big deal.
golangci-lint v2.6.1 pre-built binaries were compiled with Go 1.24, causing version mismatch errors when linting code targeting Go 1.25.0. Upgraded to v2.9.0 and added -a flag to force building from source with the current Go toolchain.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Joseph <jvaikath@redhat.com>
@Joeavaikath: all tests passed!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Joeavaikath, weshayutin.
Why the changes were made
Adds the controller, API, RBAC, and validation code for deploying the kubevirt-datamover-controller.
Builds on #2074
How to test the changes made
Once the quay image for kubevirt-datamover-controller is up:
This should trigger the deployment from the kubevirt-datamover-controller