Deploy codabench using kubernetes #2055
base: develop
charts/Chart.yaml
Outdated
    dependencies:
      - name: rabbitmq
        version: "14.7.0"
        repository: "oci://registry.cern.ch/kubeflow/charts"
We will replace this one with the original upstream chart https://hub.docker.com/r/bitnamicharts/rabbitmq. Same for redis below.
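For reference, a sketch of what the dependency entry might look like once it points at the upstream Bitnami chart. The OCI path below is the registry where Bitnami publishes its charts. Keeping version 14.7.0 is an assumption and should be checked against the versions available upstream.

```yaml
# charts/Chart.yaml (sketch, not the final change)
dependencies:
  - name: rabbitmq
    version: "14.7.0"  # assumption: same version as before; verify it exists upstream
    repository: "oci://registry-1.docker.io/bitnamicharts"
```

After editing the dependency, `helm dependency update` on the chart directory fetches the chart from the new location.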
    RUN poetry install

    # Copy the rest of the application code
    COPY . /app
Regarding the discussion about the Dockerfiles: we saw that the Django component's image was missing the code (https://github.com/codalab/codabench/blob/develop/Dockerfile), including on the latest develop branch. Therefore, we added a Node builder image here, as well as a step that copies the code.
As long as the container is self-contained and does not require a volume mount, it will be compatible with this chart.
5ade7ba to a7a58fa
    WORKDIR /app/
    COPY ./compute_worker/ ./
    COPY ./compute_worker/pyproject.toml ./
    COPY ./compute_worker/poetry.lock ./
The previous 3 lines are already done on line 29 with the ADD compute_worker .
Deploy Codabench Using Kubernetes
This PR aims to make Codabench deployable using Kubernetes.
Main Changes
To deploy Codabench using Kubernetes, we:
Issues this PR resolves
How to Test it
minikube start)
values.yaml properly
Observations
Testing was done mainly on 1.19.
To expose the Codabench UI, you will have to install an ingress controller and create an Ingress.
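As an illustration, a minimal Ingress could look like the sketch below. The Service name, port, hostname, and ingress class are all assumptions and need to be adapted to what the chart actually creates in your cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: codabench                    # hypothetical name
spec:
  ingressClassName: nginx            # assumption: ingress-nginx is the installed controller
  rules:
    - host: codabench.example.com    # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: codabench-django   # hypothetical Service name; check with `kubectl get svc`
                port:
                  number: 8000           # hypothetical port
```

On minikube, `minikube addons enable ingress` installs an NGINX ingress controller, which is one way to satisfy the controller requirement for local testing.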
This PR does not include a template for Minio yet. For now, testing requires an external S3 instance. We can also add the Minio chart if needed.
The Helm chart passes secret values directly via environment variables, which is not best practice, as it makes them visible in deployment tools like ArgoCD. If you want, we can address this in this PR or in a future one.
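One common alternative, shown here only as a sketch, is to keep the value in a Kubernetes Secret and reference it from the container spec, so the rendered Deployment contains a reference rather than the value itself. The Secret name and the key `DB_PASSWORD` are hypothetical and not taken from the chart.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: codabench-secrets        # hypothetical name
type: Opaque
stringData:
  DB_PASSWORD: change-me         # hypothetical key and placeholder value
---
# Fragment of a container spec inside a Deployment template (not a complete manifest)
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: codabench-secrets  # must match the Secret above
        key: DB_PASSWORD
```

The Secret itself can then be created outside the chart, so its contents never appear in the values passed to Helm.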
Inside values.yaml, there is a section (env) with environment variables that are currently passed to multiple pods. This is similar to the .env setup in docker-compose, but as we sometimes had a hard time debugging which variable is used where, we would like to separate those in a later PR.
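A possible direction for that later separation, purely as a hypothetical layout: keep a small shared block and move component-specific variables under each component. None of the keys or variable names below exist in the chart today.

```yaml
# values.yaml (hypothetical layout, for discussion only)
env:                          # shared by every pod, as today
  EXAMPLE_SHARED: "1"         # hypothetical variable
django:
  env:                        # would be rendered only into the django pods
    EXAMPLE_DJANGO_ONLY: "1"  # hypothetical variable
computeWorker:
  env:                        # would be rendered only into the compute-worker pods
    EXAMPLE_WORKER_ONLY: "1"  # hypothetical variable
```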
This PR makes Codabench deployable using Kubernetes, but it does not change how the compute-worker runs user submissions. We plan to open another PR that updates the compute-worker to run submissions using Kubernetes. We plan to make this configurable, with Docker as the default, so that it does not break any current deployments. Please note that, with the way the compute worker currently works, the Kubernetes Pods need to mount a volume from the host node, mount the Docker socket from the node, and run privileged Docker containers, which is not best practice. This interferes with storage on the node and with scheduling, because containers are created directly, bypassing the Kubernetes scheduler. Therefore, we will open another PR.
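To make the host dependency concrete, the sketch below shows the kind of `hostPath` mounts such a Pod needs. It is not the manifest from this PR: the Pod name, image, and the `/codabench` path are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: compute-worker                      # hypothetical name
spec:
  containers:
    - name: compute-worker
      image: example/compute-worker:latest  # placeholder image
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock   # lets the worker talk to the node's Docker daemon
        - name: submissions
          mountPath: /codabench             # hypothetical path
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock
        type: Socket
    - name: submissions
      hostPath:
        path: /codabench                    # hypothetical path on the node
        type: DirectoryOrCreate
```

Containers started through the mounted socket belong to the node's Docker daemon, not to the Pod, which is why the scheduler has no view of their resource usage.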
The PR also contains the Dockerfiles that we are using. Ideally, we should see if they are still compatible with deploying Codabench using the original docker-compose setup and keep a single Dockerfile for each component. The images were updated because, ideally, we don't want to mount the code from a PersistentVolume or the host node but rather have the image contain the code.
Checklist