@minjieqiu minjieqiu commented Jan 29, 2026

Description

This PR implements the SOK telemetry enhancement. ERD:
https://cisco-my.sharepoint.com/:w:/p/mqiu/IQBoVUuEEY1SR4rDjbja0iPuAeN5dxFG-K-ZPpvO6RoWJp0?e=n5R1Ow

What does this PR have in it?

Periodically (once per day) collects and sends SOK telemetry, which includes:

  1. SOK telemetry.
    a. SOK version.
    b. CPU/memory settings (limits and requests) of containers for standalone, searchheadcluster, indexercluster,
    clustermaster, clustermanager, licensemaster and licensemanager.
    c. License info (Splunk license ID and license type).
  2. Telemetry from other components, submitted to SOK by adding key/value pairs to the new telemetry configmap splunk-operator-manager-telemetry.

Key Changes

  • Created a new configmap, splunk-operator-manager-telemetry
  • Created a new controller that reconciles on the telemetry configmap
  • Renamed the telemetry app to app_tel_for_sok
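
As a rough illustration of how key/value pairs from the telemetry configmap could be folded into one payload alongside SOK's own data, consider the sketch below. It assumes that values may be JSON or plain strings and that one key is reserved for bookkeeping; `buildPayload`, `statusKey` and `sokKey` are hypothetical names, not the ones used in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	// statusKey is an assumed name for the configmap key that holds
	// bookkeeping (for example the last transmission time), not telemetry.
	statusKey = "status"
	// sokKey is an assumed name for the section holding SOK's own telemetry.
	sokKey = "sok"
)

// buildPayload merges SOK telemetry with entries that other components
// added to the configmap data. Illustrative only.
func buildPayload(sok map[string]any, cmData map[string]string) map[string]any {
	payload := map[string]any{sokKey: sok}
	for key, raw := range cmData {
		// Skip bookkeeping and never let a component overwrite the SOK section.
		if key == statusKey || key == sokKey {
			continue
		}
		var parsed any
		if err := json.Unmarshal([]byte(raw), &parsed); err != nil {
			// Not valid JSON: keep the raw string as submitted.
			payload[key] = raw
			continue
		}
		payload[key] = parsed
	}
	return payload
}

func main() {
	cmData := map[string]string{
		statusKey:    `{"lastTransmission":""}`,
		"componentA": `{"feature":"on"}`,
		"componentB": "plain text",
	}
	payload := buildPayload(map[string]any{"version": "example"}, cmData)
	out, err := json.Marshal(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the SOK section under a reserved key means a component cannot accidentally shadow the operator's own data by choosing the same key name.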

Highlight the updates in specific files

Testing and Verification

Tested on s1, c3 and m4.

How did you test these changes? What automated tests are added?
Added telemetry verification to the existing s1, c3 and m4 tests.

Related Issues

Jira tickets, GitHub issues, Support tickets...
https://splunk.atlassian.net/browse/CSPL-4371

PR Checklist

  • [✅] Code changes adhere to the project's coding standards.
  • [✅] Relevant unit and integration tests are included.
  • [✅] Documentation has been updated accordingly.
  • [✅] All tests pass locally.
  • [✅] The PR description follows the project's guidelines.

@github-actions
Contributor

github-actions bot commented Jan 29, 2026

CLA Assistant Lite bot: All contributors have signed the COC ✍️ ✅

@minjieqiu
Author

I have read the CLA Document and I hereby sign the CLA

@minjieqiu
Author

I have read the Code of Conduct and I hereby accept the Terms

@coveralls
Collaborator

coveralls commented Jan 29, 2026

Pull Request Test Coverage Report for Build 21936969034

Details

  • 364 of 479 (75.99%) changed or added relevant lines in 6 files are covered.
  • 398 unchanged lines in 21 files lost coverage.
  • Overall coverage decreased (-0.6%) to 85.79%

Changes Missing Coverage                    | Covered Lines | Changed/Added Lines | %
--------------------------------------------|---------------|---------------------|-------
pkg/splunk/client/enterprise.go             | 27            | 29                  | 93.1%
pkg/splunk/enterprise/names.go              | 0             | 6                   | 0.0%
internal/controller/telemetry_controller.go | 37            | 46                  | 80.43%
pkg/splunk/enterprise/telemetry.go          | 294           | 392                 | 75.0%

Files with Coverage Reduction                       | New Missed Lines | %
----------------------------------------------------|------------------|-------
pkg/splunk/client/minioclient.go                    | 3                | 97.81%
pkg/splunk/enterprise/afwscheduler.go               | 3                | 92.51%
pkg/splunk/splkcontroller/controller.go             | 3                | 91.67%
internal/controller/clustermanager_controller.go    | 5                | 93.18%
internal/controller/licensemanager_controller.go    | 5                | 92.5%
internal/controller/licensemaster_controller.go     | 5                | 92.5%
internal/controller/monitoringconsole_controller.go | 5                | 93.55%
internal/controller/searchheadcluster_controller.go | 5                | 92.41%
internal/controller/standalone_controller.go        | 5                | 92.5%
internal/controller/clustermaster_controller.go     | 7                | 90.12%

Totals
Change from base Build 21358635731 | -0.6%
Covered Lines                      | 11097
Relevant Lines                     | 12935

💛 - Coveralls

@minjieqiu minjieqiu marked this pull request as ready for review February 2, 2026 17:11
@minjieqiu minjieqiu changed the title from "[Draft]: Telemetry enhancement" to "Telemetry enhancement" Feb 2, 2026
@kasiakoziol
Collaborator

I think it might be worth adding or updating the docs.

"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

var _ = Describe("Telemetry Controller", func() {
Collaborator

We should have some controller test cases.

scopedLog.Info("Updated last transmission time in configmap", "newStatus", cm.Data[telStatusKey])
}

func collectResourceTelData(resources corev1.ResourceRequirements, data map[string]string) {
Collaborator

Should we refactor this code to make it easier to read, or use generics?
An example:

func collectDeploymentTelDataRefactored(ctx context.Context, client splcommon.ControllerClient, deploymentData map[string]interface{}) map[string][]splcommon.MetaObject {
	reqLogger := log.FromContext(ctx)
	scopedLog := reqLogger.WithName("collectDeploymentTelData")

	crWithTelAppList := make(map[string][]splcommon.MetaObject)
	scopedLog.Info("Start collecting deployment telemetry data")

	// Define all CR handlers in a slice
	handlers := []crListHandler{
		{kind: "Standalone", listFunc: listStandalones, checkTelApp: true},
		{kind: "LicenseManager", listFunc: listLicenseManagers, checkTelApp: true},
		{kind: "LicenseMaster", listFunc: listLicenseMasters, checkTelApp: true},
		{kind: "SearchHeadCluster", listFunc: listSearchHeadClusters, checkTelApp: true},
		{kind: "IndexerCluster", listFunc: listIndexerClusters, checkTelApp: false},
		{kind: "ClusterManager", listFunc: listClusterManagers, checkTelApp: true},
		{kind: "ClusterMaster", listFunc: listClusterMasters, checkTelApp: true},
		{kind: "MonitoringConsole", listFunc: listMonitoringConsoles, checkTelApp: false},
	}

	// Process each CR type using the same logic
	for _, handler := range handlers {
		processCRType(ctx, client, handler, deploymentData, crWithTelAppList, scopedLog)
	}

	return crWithTelAppList
}

// processCRType is the common processing logic for all CR types.
func processCRType(
	ctx context.Context,
	client splcommon.ControllerClient,
	handler crListHandler,
	deploymentData map[string]interface{},
	crWithTelAppList map[string][]splcommon.MetaObject,
	scopedLog logr.Logger, // requires importing github.com/go-logr/logr
) {
	items, err := handler.listFunc(ctx, client)
	if err != nil {
		scopedLog.Error(err, "Failed to list objects", "kind", handler.kind)
		return
	}

	if len(items) == 0 {
		return
	}

	// Create per-kind data map
	perKindData := make(map[string]interface{})
	deploymentData[handler.kind] = perKindData

	// Process each item
	for _, item := range items {
		scopedLog.Info("Collecting data", "kind", item.kind, "name", item.name, "namespace", item.namespace)

		crResourceData := make(map[string]string)
		perKindData[item.name] = crResourceData

		// Collect resource telemetry data
		if resources, ok := item.resources.(corev1.ResourceRequirements); ok {
			collectResourceTelData(resources, crResourceData)
		}

		// Kinds that do not track the telemetry app need no further handling
		if !handler.checkTelApp {
			continue
		}

		// Add to telemetry app list if the app is installed, otherwise log it
		if item.hasTelApp {
			crWithTelAppList[handler.kind] = append(crWithTelAppList[handler.kind], item.cr)
		} else {
			scopedLog.Info("Telemetry app is not installed for this CR", "kind", item.kind, "name", item.name)
		}
	}
}

// List functions for each CR type - these extract the common pattern

func listStandalones(ctx context.Context, client splcommon.ControllerClient) ([]crItem, error) {
	var list enterpriseApi.StandaloneList
	err := client.List(ctx, &list)
	if err != nil {
		return nil, err
	}

	items := make([]crItem, 0, len(list.Items))
	for i := range list.Items {
		cr := &list.Items[i]
		items = append(items, crItem{
			name:      cr.GetName(),
			namespace: cr.GetNamespace(),
			kind:      cr.Kind,
			resources: cr.Spec.CommonSplunkSpec.Resources,
			hasTelApp: cr.Status.TelAppInstalled,
			cr:        cr,
		})
	}
	return items, nil
}

func listLicenseManagers(ctx context.Context, client splcommon.ControllerClient) ([]crItem, error) {
	var list enterpriseApi.LicenseManagerList
	err := client.List(ctx, &list)
	if err != nil {
		return nil, err
	}

	items := make([]crItem, 0, len(list.Items))
	for i := range list.Items {
		cr := &list.Items[i]
		items = append(items, crItem{
			name:      cr.GetName(),
			namespace: cr.GetNamespace(),
			kind:      cr.Kind,
			resources: cr.Spec.CommonSplunkSpec.Resources,
			hasTelApp: cr.Status.TelAppInstalled,
			cr:        cr,
		})
	}
	return items, nil
}

func listLicenseMasters(ctx context.Context, client splcommon.ControllerClient) ([]crItem, error) {
	var list enterpriseApiV3.LicenseMasterList
	err := client.List(ctx, &list)
	if err != nil {
		return nil, err
	}

	items := make([]crItem, 0, len(list.Items))
	for i := range list.Items {
		cr := &list.Items[i]
		items = append(items, crItem{
			name:      cr.GetName(),
			namespace: cr.GetNamespace(),
			kind:      cr.Kind,
			resources: cr.Spec.CommonSplunkSpec.Resources,
			hasTelApp: cr.Status.TelAppInstalled,
			cr:        cr,
		})
	}
	return items, nil
}

func listSearchHeadClusters(ctx context.Context, client splcommon.ControllerClient) ([]crItem, error) {
	var list enterpriseApi.SearchHeadClusterList
	err := client.List(ctx, &list)
	if err != nil {
		return nil, err
	}

	items := make([]crItem, 0, len(list.Items))
	for i := range list.Items {
		cr := &list.Items[i]
		items = append(items, crItem{
			name:      cr.GetName(),
			namespace: cr.GetNamespace(),
			kind:      cr.Kind,
			resources: cr.Spec.CommonSplunkSpec.Resources,
			hasTelApp: cr.Status.TelAppInstalled,
			cr:        cr,
		})
	}
	return items, nil
}

func listIndexerClusters(ctx context.Context, client splcommon.ControllerClient) ([]crItem, error) {
	var list enterpriseApi.IndexerClusterList
	err := client.List(ctx, &list)
	if err != nil {
		return nil, err
	}

	items := make([]crItem, 0, len(list.Items))
	for i := range list.Items {
		cr := &list.Items[i]
		items = append(items, crItem{
			name:      cr.GetName(),
			namespace: cr.GetNamespace(),
			kind:      cr.Kind,
			resources: cr.Spec.CommonSplunkSpec.Resources,
			hasTelApp: false, // IndexerClusters don't track TelAppInstalled
			cr:        cr,
		})
	}
	return items, nil
}

func listClusterManagers(ctx context.Context, client splcommon.ControllerClient) ([]crItem, error) {
	var list enterpriseApi.ClusterManagerList
	err := client.List(ctx, &list)
	if err != nil {
		return nil, err
	}

	items := make([]crItem, 0, len(list.Items))
	for i := range list.Items {
		cr := &list.Items[i]
		items = append(items, crItem{
			name:      cr.GetName(),
			namespace: cr.GetNamespace(),
			kind:      cr.Kind,
			resources: cr.Spec.CommonSplunkSpec.Resources,
			hasTelApp: cr.Status.TelAppInstalled,
			cr:        cr,
		})
	}
	return items, nil
}

func listClusterMasters(ctx context.Context, client splcommon.ControllerClient) ([]crItem, error) {
	var list enterpriseApiV3.ClusterMasterList
	err := client.List(ctx, &list)
	if err != nil {
		return nil, err
	}

	items := make([]crItem, 0, len(list.Items))
	for i := range list.Items {
		cr := &list.Items[i]
		items = append(items, crItem{
			name:      cr.GetName(),
			namespace: cr.GetNamespace(),
			kind:      cr.Kind,
			resources: cr.Spec.CommonSplunkSpec.Resources,
			hasTelApp: cr.Status.TelAppInstalled,
			cr:        cr,
		})
	}
	return items, nil
}

func listMonitoringConsoles(ctx context.Context, client splcommon.ControllerClient) ([]crItem, error) {
	var list enterpriseApi.MonitoringConsoleList
	err := client.List(ctx, &list)
	if err != nil {
		return nil, err
	}

	items := make([]crItem, 0, len(list.Items))
	for i := range list.Items {
		cr := &list.Items[i]
		items = append(items, crItem{
			name:      cr.GetName(),
			namespace: cr.GetNamespace(),
			kind:      cr.Kind,
			resources: cr.Spec.CommonSplunkSpec.Resources,
			hasTelApp: false, // MonitoringConsoles don't track TelAppInstalled
			cr:        cr,
		})
	}
	return items, nil
}
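
The eight list functions above differ only in the list type and in whether `TelAppInstalled` is tracked, so the conversion loop is a natural fit for the generics option mentioned in this comment. Below is a self-contained sketch of that idea using stand-in types instead of the real CRDs. The `telemetrySource` interface, its method names and `toItems` are hypothetical, and the real code would still need one `client.List` call per list type.

```go
package main

import "fmt"

// resources is a stand-in for corev1.ResourceRequirements.
type resources struct{ cpuLimit, memLimit string }

// crItem mirrors the fields that the list functions above populate.
type crItem struct {
	name, namespace string
	resources       resources
	hasTelApp       bool
}

// telemetrySource is a hypothetical interface each CR type would implement.
type telemetrySource interface {
	GetName() string
	GetNamespace() string
	TelResources() resources
	TelAppInstalled() bool
}

// toItems converts a slice of any CR type into crItems. PT ties the element
// type T to its pointer type, because the methods use pointer receivers.
func toItems[T any, PT interface {
	*T
	telemetrySource
}](list []T) []crItem {
	items := make([]crItem, 0, len(list))
	for i := range list {
		cr := PT(&list[i]) // pointer into the slice, no copy of the element
		items = append(items, crItem{
			name:      cr.GetName(),
			namespace: cr.GetNamespace(),
			resources: cr.TelResources(),
			hasTelApp: cr.TelAppInstalled(),
		})
	}
	return items
}

// standalone is a stand-in for a CR that tracks the telemetry app.
type standalone struct {
	name, namespace string
	res             resources
	telApp          bool
}

func (s *standalone) GetName() string         { return s.name }
func (s *standalone) GetNamespace() string    { return s.namespace }
func (s *standalone) TelResources() resources { return s.res }
func (s *standalone) TelAppInstalled() bool   { return s.telApp }

// indexerCluster is a stand-in for a CR that does not track the telemetry app.
type indexerCluster struct {
	name, namespace string
	res             resources
}

func (c *indexerCluster) GetName() string         { return c.name }
func (c *indexerCluster) GetNamespace() string    { return c.namespace }
func (c *indexerCluster) TelResources() resources { return c.res }
func (c *indexerCluster) TelAppInstalled() bool   { return false }

func main() {
	standalones := []standalone{{name: "s1", namespace: "default", telApp: true}}
	indexers := []indexerCluster{{name: "idx", namespace: "default"}}
	fmt.Printf("%+v\n", toItems(standalones))
	fmt.Printf("%+v\n", toItems(indexers))
}
```

With this shape, each per-kind list function shrinks to a `client.List` call followed by `toItems(list.Items)`, and kinds that do not track the telemetry app simply return false from their accessor.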

Collaborator

The code has 47% test coverage; let's try to raise it to 90%.
