
feat: update v1beta2#255

Draft
jschoone wants to merge 2 commits into chore/cs-cleanup from feat/scs2-v1beta2

Conversation

@jschoone
Contributor

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in `fixes #<issue_number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squash commits
  • include documentation
  • add unit tests

@jschoone jschoone changed the base branch from main to chore/cs-cleanup February 18, 2026 08:44
Migrate openstack/scs2 and docker/scs2 cluster stacks to CAPI v1beta2:

CAPI v1beta2 Migration:
- ClusterClass, KubeadmControlPlaneTemplate, KubeadmConfigTemplate -> v1beta2
- Infrastructure resources (CAPO/CAPD) remain v1beta1 (providers not yet v1beta2)
- ref -> templateRef, workers template: wrapper removed
- extraArgs/kubeletExtraArgs converted from map to list of {name, value}
- oidcConfig patches use extraArgs/- (list-append) pattern
- apiServer: {} removed (fails minProperties:1, created dynamically by patches)
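
The map-to-list conversion and the list-append patch pattern can be sketched roughly as follows (a hedged illustration: the flag, variable name, and patch path are examples and depend on the template being patched):

```yaml
# Before (v1beta1): extraArgs as a map
clusterConfiguration:
  apiServer:
    extraArgs:
      profiling: "false"

# After (v1beta2): extraArgs as a list of {name, value}
clusterConfiguration:
  apiServer:
    extraArgs:
    - name: profiling
      value: "false"

# ClusterClass patch appending an entry via extraArgs/- (illustrative variable path)
jsonPatches:
- op: add
  path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/apiServer/extraArgs/-
  valueFrom:
    template: |
      name: oidc-issuer-url
      value: {{ .oidcConfig.issuerUrl }}
```

The list-append form avoids clobbering entries added by other patches, which is why creating `apiServer` dynamically (rather than pre-declaring `apiServer: {}`) works here.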

Variable Defaults Consolidation:
- All ClusterClass variable defaults moved to values.yaml
- Templates reference {{ .Values.variables.* }} instead of hardcoded values
- 20+ variables for openstack/scs2, 4 for docker/scs2
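
As a sketch of how a consolidated default might flow from values.yaml into a ClusterClass variable (variable name and schema are illustrative, not taken from the PR diff):

```yaml
# values.yaml (illustrative)
variables:
  flavor: "SCS-2V-4-20s"

# ClusterClass template (illustrative)
variables:
- name: flavor
  required: false
  schema:
    openAPIV3Schema:
      type: string
      default: {{ .Values.variables.flavor | quote }}
```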

New Features:
- Registry mirrors: registryMirrors array variable with containerd hosts.toml patches
- OIDC authentication: Full oidcConfig variable + 6 apiServer extraArgs patches
- certSANs: Extra Subject Alternative Names for API server cert
- AfterClusterUpgrade hook: Third lifecycle stage in clusteraddon.yaml
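
For context on the registryMirrors feature above, the containerd hosts.toml that such a patch would typically render looks roughly like this (mirror URL hypothetical):

```toml
# /etc/containerd/certs.d/docker.io/hosts.toml (hypothetical mirror)
server = "https://registry-1.docker.io"

[host."https://mirror.example.com"]
  capabilities = ["pull", "resolve"]
```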

Security Hardening:
- controller-manager: --profiling=false, --terminated-pod-gc-threshold=100
- scheduler: --profiling=false
- etcd: metrics exposed, auto-compaction, tuned election/heartbeat
- kube-proxy: metrics on 0.0.0.0:10249 (docker/scs2)

docker/scs2 (new stack):
- Complete new cluster stack for Docker provider with v1beta2
- Cilium 1.19.0 with Gateway API + SCTP support
- metrics-server 3.13.0
- Multi-version build support (versions.yaml: 1.32-1.35)

Addon Version Bumps (openstack/scs2):
- Cilium 1.18.5 -> 1.19.0
- openstack-cloud-controller-manager 2.34.1 -> 2.34.2
- openstack-cinder-csi 2.34.1 -> 2.34.3
- versions.yaml: key renames (occm/cinder_csi -> full chart names), ubuntu field

Documentation:
- Rewritten overview.md, configuration.md
- New quickstart.mdx (Docusaurus Tabs for provider selection)
- New versioning.md, build-system.md
- Removed outdated kamaji.md

Legacy stacks (minimal):
- openstack/scs: ubuntu field added to versions.yaml
- docker/scs: imageRepository variable in values.yaml, versions.yaml created

Assisted-by: Claude Code
Signed-off-by: Jan Schoone <jan.schoone@uhurutec.com>
@jschoone jschoone changed the title refactor(build): Replace build tooling with simplified script-based a… feat: update v1beta2 Feb 18, 2026
Member

@garloff garloff left a comment


Well, this is scs3 then, because we change all the variable names again.
"Old scs2" and "new scs2" is just trying to maximize customer confusion.
I honestly would hate this as customer.

Question:
Can we keep variable names etc. stable when going to v1beta2, or do we need to change them anyway? In the latter case, I'd say let's call it scs3.
Otherwise let's avoid another round of variable renames.
There is really only one variable missing from scs2 in my practical usage of the stack: serviceLoadBalancer. But adding things is not a problem (as long as the defaults are in line with old behavior).

```yaml
variables:
- name: flavor
  value: "SCS-2V-4-20s"
- name: rootDisk
```
Member


Is this the rootDisk for the workers? That would make sense then: SCS-4V-8 plus a 50GB cinder disk.
If it is for the controlPlane, then the example is a very bad idea: We have the flavor SCS-2V-4-20s, which has a fast local SSD/NVMe that costs a bit more than a 20GB cinder disk (in most clouds) but has the big advantage of low write latency and thus a stable etcd cluster. If you now attach a 50GB cinder root disk, you get the worst of both worlds: you pay for a local SSD/NVMe which you can't even see, and your etcd is not stable :-(
We had the variables controlPlaneFlavor, controlPlaneRootDisk, workerFlavor and workerRootDisk in the scs2 clusterClass. I would expect to see these variables here ...

Contributor Author


It's for all machines and can/should be configured for different control planes and machine deployments using variables.overrides.
This allows us to have the same variables for clusters with a hosted control plane, where controlPlane* variables make no sense.
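
For readers unfamiliar with the mechanism: Cluster API topology variables can be overridden per control plane and per MachineDeployment. A hedged sketch (flavor values and the worker class name are examples, not from this PR):

```yaml
spec:
  topology:
    variables:
    - name: flavor
      value: "SCS-2V-4-20s"        # default for all machines
    controlPlane:
      variables:
        overrides:
        - name: flavor
          value: "SCS-2V-4-20s"    # e.g. keep a local-disk flavor for etcd
    workers:
      machineDeployments:
      - class: default-worker
        name: md-0
        variables:
          overrides:
          - name: flavor
            value: "SCS-4V-8"
          - name: rootDisk
            value: 50
```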


| scs (old) | scs2 (new) | Notes |
|-----------|------------|-------|
| `controller_flavor` | `flavor` | Unified; override per CP or worker via topology overrides |
Member


OK, so we need scs3 if we change the variable names again.
Why oh why?
Also, I consider it a regression.
I regularly chose SCS-2V-4-20s flavors for the control plane (no rootDisk), and a flavor without root disk (and potentially more vCPUs/RAM) for workers.

Contributor Author


This allows us to have the same variables for clusters with a hosted control plane, where controlPlane* variables make no sense.
Variables can be overridden per machineDeployment and control plane.

@garloff
Member

garloff commented Feb 19, 2026

Sorry for my rant, but changing the clusterClass variables all the time looks like a very confused way of doing product management for cluster stacks. Communicating to users that you don't care about their pain for handling spurious changes all the time. Trying to get a Lennart-medal? Hiding it behind "old scs2" and "new scs2" adds insult to injury.
If the conversion to v1beta2 makes this hard to avoid, then let's use the opportunity to design a stable setup for the variables that we hopefully can keep stable for more than half a year and call this scs3. I would also object to merging flavor and rootDisk for workers and controlPlane. They have very different requirements.

@jschoone
Contributor Author

jschoone commented Feb 19, 2026

> Sorry for my rant, but changing the clusterClass variables all the time looks like a very confused way of doing product management for cluster stacks. Communicating to users that you don't care about their pain for handling spurious changes all the time. Trying to get a Lennart-medal? Hiding it behind "old scs2" and "new scs2" adds insult to injury. If the conversion to v1beta2 makes this hard to avoid, then let's use the opportunity to design a stable setup for the variables that we hopefully can keep stable for more than half a year and call this scs3. I would also object to merging flavor and rootDisk for workers and controlPlane. They have very different requirements.

This must be a new Cluster Stack anyway, as we said, because of v1beta2 and the changes we want to implement for the Workload Clusters' default load balancer configuration.
Sorry for not marking it as Draft, I did it now.

@jschoone jschoone marked this pull request as draft February 19, 2026 12:10
@garloff
Member

garloff commented Feb 19, 2026

OK, good, so let's work on a good scs3 clusterStack with v1beta2 then.

@jschoone
Contributor Author

There were just many discussions on the configuration at the hackathon which are not yet visible in this PR.
We already reviewed it together and it will change even more.
To avoid confusion I will find a working title for the new Cluster Stack.

> Well, this is scs3 then, because we change all the variable names again. "Old scs2" and "new scs2" is just trying to maximize customer confusion. I honestly would hate this as customer.
>
> Question: Can we keep variable names etc. stable with going to v1beta2 or do we need to change anyways? In the latter case, I'd say let's call it scs3. Otherwise let's avoid another round of variable renames. There is really only one variable missing from scs2 in my practical usage of the stack: serviceLoadBalancer. But adding things is not a problem (as long as the defaults are in line with old behavior).

Basically, adding variables breaks the idea of Cluster Stacks just the same, since the description no longer matches, but it can be mentioned in the docs with an "available from ..." note or something similar.

@Nils98Ar
Member

@jschoone @garloff @janiskemper

Side note: The cso validating webhook does not allow changing the ClusterStack of an existing cluster (e.g. scs -> scs2) and therefore needs to be disabled for cluster resources during the upgrade.

@janiskemper
Member

In general, my very strong opinion is that if we want people to upgrade their clusters from one version of a cluster stack to another, then it must be the same cluster stack.

I don't see any reason to create different cluster stacks if we want people to upgrade their clusters from one to another.

The only reason for me to create a new cluster stack, is if we want to maintain them in parallel because they both have valid, but different configuration.

If we introduce breaking changes in the configuration, then we need to think about upgrade paths (if we want people to upgrade) independently of the name of the cluster stack.

Signed-off-by: Nils Arnold <arnold@aov.de>
@Nils98Ar
Member

@janiskemper Would it be possible in general to use SemVer for ClusterStacks, for example, to distinguish between patch, minor, and major updates with breaking changes?

@janiskemper
Member

Not right now. But I don't think it is necessary. How we do it at Syself is that we use the Kubernetes minor versions to introduce new things. Within one Kubernetes minor version we stick to small changes and patches, while with new minor versions we introduce more things.

We haven't actually had breaking changes so far, but if they become necessary, and people should still be able to upgrade their clusters, I suggest introducing these breaking changes with a new Kubernetes minor version.

In this way, there is no need for semver versioning. This is in fact the reason why we didn't introduce it: the versioning based on the Kubernetes version already gives you a new "major" version every few months (based on e.g. "openstack-scs-1-34-v1").

@jschoone
Contributor Author

> not right now. But I don't think it is necessary. How we do it at Syself is that we use the Kubernetes minor versions to introduce some new stuff. Within one Kubernetes minor version we stick to small changes and patches, while with the new minor versions we introduce more things.
>
> We haven't actually had breaking changes so far, but if they are necessary, and people should still be able to upgrade their clusters, I suggest to introduce these breaking changes with a new Kubernetes minor version.
>
> In this way, there is no need for semver versioning. This is in fact the reason why we didn't introduce it, because the versioning based on the Kubernetes version already gives you a new "major" version every few months (based on e.g. "openstack-scs-1-34-v1").

Ah, so if we follow that pattern we need one dir per Kubernetes version, or at least one per group of Kubernetes versions with the same features. Could work; probably not more or less messy than now :D
Let's see how this could look.

@janiskemper
Member

we do in fact have one directory per Kubernetes minor version!

@jschoone
Contributor Author

jschoone commented Feb 20, 2026

> we do in fact have one directory per Kubernetes minor version!

Yes, I know. In this repo we replaced that because of code repetition, and because we always had the idea of handling breaking changes in different cluster stacks, not in their minor versions; with that approach the subdirectories make much more sense now. I think that can really solve the problems we now have, even if it still looks wrong to have the version directories in a git repo :D
I'll add a possible structure soon.

This was referenced Feb 22, 2026