refactor(docker) #1130
Conversation
- Replaced Trivy with Docker Scout for container vulnerability scanning in CI/CD documentation.
- Added a new guide for building Docker images, covering local builds, optimization, and troubleshooting.
- Introduced a reference guide for Docker production deployment, detailing configuration and usage.
- Enhanced the database configuration documentation to clarify the use of Alpine-based PostgreSQL.
- Updated the Docker installation guide to highlight production optimizations and service management.
- Created a troubleshooting guide for common Docker issues encountered while running Tux.
…script

- Introduced functions to validate build configuration, calculate SOURCE_DATE_EPOCH for reproducible builds, and generate BUILD_DATE in ISO 8601 format.
- Updated command usage in the script to include the new functionality for improved build management.

…ching improvements

- Added validation steps for build configuration and generation of the build date to ensure reproducibility.
- Replaced Trivy with Docker Scout for vulnerability scanning, improving security analysis during builds.
- Implemented a hybrid caching strategy using both registry and GitHub Actions cache for faster, more efficient builds.
- Updated Docker build steps to include SBOM and provenance for enhanced supply chain security.

- Updated CACHE_DIR to resolve relative paths to absolute, ensuring proper handling of cache directory permissions.

…p logic

- Improved database connection logic to allow a customizable host and port via environment variables.
- Added user-friendly messages indicating connection attempts and results.
- Streamlined the bot startup process by validating configuration before execution and ensuring proper signal handling.

… and directories

- Added entries for various development, test, and documentation files to prevent them from being included in Docker images.
- Included cache directories, plugins, example code, type stubs, coverage reports, and Nix development files for better build management.

- Introduced a new `compose.production.yaml` file for production deployments, removing development-specific features.
- Configured services for PostgreSQL and the Tux application, including health checks and environment variable management.
- Ensured proper logging and resource management for production use, with a focus on security and performance.

…and service management

- Changed the restart policy for the PostgreSQL and Adminer services to a more standard format.
- Updated the health check for the Tux service to use a simpler command for process verification.
- Added a stop grace period for services to ensure proper shutdown handling.
- Introduced a development profile for Adminer to better separate development tooling from production services.

…th common setup

- Introduced a common setup stage to share configurations across the base and production stages.
- Added common environment variables and user setup to streamline the build process.
- Updated the production stage to use dynamic Python version detection for cleanup.
- Enhanced the build process by excluding development dependencies from the production image.
**Reviewer's Guide**

Refactors the Docker build and deployment pipeline to share a common base-image stage, tighten production images (no dev dependencies, dynamic cleanup), introduce reproducible build metadata and Docker Scout-based scanning in CI, and split user-facing Docker docs into install, operations, production, troubleshooting, and image-building guides, with a dedicated production compose file.

**Flow diagram for updated GitHub Actions Docker CI pipeline**

```mermaid
flowchart TD
    trigger["Workflow trigger<br/>push / pull_request / dispatch"]
    subgraph Job_validate["Job: Validate Docker image"]
        v1["Checkout code"]
        v2["Setup Buildx"]
        v3["Login to registry (docker/login-action)"]
        v4["Generate Docker metadata (docker/metadata-action)"]
        v5["Generate PR version (docker.sh generate-pr-version)"]
        v6["Generate build date (docker.sh generate-build-date)"]
        v7["Build validation image<br/>target=production, load=true<br/>cache: registry + gha"]
        v8_pr["Docker Scout compare<br/>against production env"]
        v8_other["Docker Scout quickview"]
        v9_other["Docker Scout recommendations"]
    end
    subgraph Job_build["Job: Build and push production image"]
        b1["Checkout code"]
        b2["Setup Buildx"]
        b3["Login to registry (docker/login-action)"]
        b4["Generate Docker metadata"]
        b5["Generate release version<br/>(docker.sh generate-release-version)"]
        b6["Validate build config<br/>(docker.sh validate-build-config)"]
        b7["Calculate SOURCE_DATE_EPOCH<br/>(docker.sh calculate-source-date-epoch)"]
        b8["Generate build date<br/>(docker.sh generate-build-date)"]
        b9["Build for validation<br/>target=production, push=false, load=true<br/>cache: registry + gha"]
        b10["Docker Scout CVE scan<br/>critical,high only<br/>exit-code=1 on findings"]
        b11["Push validated image with attestations<br/>push=true, provenance=mode=max, sbom=true"]
    end
    trigger --> Job_validate
    trigger --> Job_build
    v1 --> v2 --> v3 --> v4 --> v5 --> v6 --> v7
    v7 --> v8_pr
    v7 --> v8_other
    v8_other --> v9_other
    b1 --> b2 --> b3 --> b4 --> b5 --> b6 --> b7 --> b8 --> b9 --> b10
    b10 -->|"no critical/high CVEs"| b11
    b10 -->|"critical/high CVEs found"| b10_fail["Fail job, do not push image"]
```
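The `generate-build-date` helper invoked in steps v6/b8 is small enough to sketch inline; this mirrors the one-liner shown in the script excerpts later in this review (UTC, ISO 8601 with a trailing `Z`):

```shell
# Emit the build date in UTC ISO 8601 form, e.g. 2025-01-01T12:34:56Z
generate_build_date() {
  date -u +'%Y-%m-%dT%H:%M:%SZ'
}

build_date=$(generate_build_date)
echo "$build_date"
```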
**Flow diagram for Containerfile multi-stage Docker build**

```mermaid
flowchart TD
    common["Stage common<br/>FROM python:3.13.8-slim@sha256:...<br/>labels, nonroot user, dpkg slimming,<br/>Python env vars"]
    base["Stage base<br/>FROM common<br/>apt-get build deps (curl, build-essential, git, etc.)"]
    build["Stage build<br/>FROM base<br/>install uv, sync deps (no default groups),<br/>copy app, create plugins dirs, build project"]
    dev["Stage dev<br/>FROM build<br/>optional zsh/dev tools,<br/>copy /app, set dev env (TLDR_CACHE_DIR),<br/>USER nonroot, entrypoint"]
    production["Stage production<br/>FROM common<br/>runtime-only deps, copy venv + app from build,<br/>set prod env, dynamic Python cleanup"]
    common --> base --> build
    build --> dev
    build --> production
```
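The stage graph above can be expressed as a skeleton Containerfile. This is an illustrative sketch only: the package list, paths, and UID are assumptions drawn from the review comments (UID/GID 1001, `python:3.13.8-slim`), not the project's actual Containerfile.

```dockerfile
# Sketch: shared "common" stage feeding base/build, with dev and production leaves
FROM python:3.13.8-slim AS common
RUN groupadd --gid 1001 nonroot && useradd --uid 1001 --gid 1001 --create-home nonroot
ENV PYTHONUNBUFFERED=1

FROM common AS base
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl build-essential git \
    && rm -rf /var/lib/apt/lists/*

FROM base AS build
WORKDIR /app
COPY . .
# install uv, sync dependencies, and build the project here

FROM build AS dev
USER nonroot
# dev-only tooling (zsh etc.) goes here

FROM common AS production
# runtime gets only the built artifacts, not the toolchain from "base"
COPY --from=build /app /app
USER nonroot
```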
**Flow diagram for updated Docker entrypoint startup logic**

```mermaid
flowchart TD
    start["Container start"]
    w1["wait_for_db()<br/>resolve POSTGRES_HOST/POSTGRES_PORT with defaults"]
    w2["Loop up to 30 attempts<br/>Python socket connect to db_host:db_port"]
    w3_ok["Connection successful"]
    w3_fail["Exceeded max attempts"]
    w4["Log timeout with host/port hints<br/>exit 1"]
    w5["Log progress with symbols<br/>⏳ and ✅"]
    s1["start_bot_with_retry()<br/>attempts = 0"]
    s2["Validate configuration (validate_config)"]
    s3_fail["Validation failed"]
    s3_ok["Validation succeeded"]
    s4_retry_check["attempts < MAX_STARTUP_ATTEMPTS?"]
    s5_wait["Sleep STARTUP_DELAY and retry"]
    s6_exit["Max attempts reached<br/>exit 1"]
    s7["Log 'Configuration validated. Starting bot...' "]
    s8["exec tux start<br/>(replace shell for proper signal handling)"]
    start --> w1 --> w2
    w2 -->|"success"| w3_ok --> w5 --> s1
    w2 -->|"after 30 failed attempts"| w3_fail --> w4
    s1 --> s2
    s2 -->|"invalid"| s3_fail --> s4_retry_check
    s4_retry_check -->|"yes"| s5_wait --> s1
    s4_retry_check -->|"no"| s6_exit
    s2 -->|"valid"| s3_ok --> s7 --> s8
```
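The `wait_for_db()` loop above can be sketched in Python. The function name and the 30-attempt limit come from the diagram; the connect timeout and delay values are assumptions:

```python
import socket
import time


def wait_for_db(host: str, port: int, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll host:port with a plain TCP connect until it accepts or attempts run out."""
    for _attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=2):
                return True  # port is accepting connections
        except OSError:
            time.sleep(delay)  # connection refused or timed out; retry
    return False  # exceeded max attempts
```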
**Dependency Review**

The following issues were found:

- License issues: `.github/workflows/docker.yml`
> **Note:** Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

**Walkthrough**

Restructures the Docker build (multi-stage Containerfile), expands `.dockerignore`, adds a production compose file (`compose.production.yaml`), enhances the entrypoint/startup logic and the TLDR cache path, adds build-metadata shell commands, revamps GitHub Actions with registry-backed caching and Docker Scout validation, and adds extensive Docker documentation.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    autonumber
    participant Dev as Developer / PR
    participant Actions as GitHub Actions (Buildx)
    participant Daemon as Runner Docker Daemon
    participant Scout as Docker Scout
    participant Registry as Container Registry (GHCR)
    Dev->>Actions: push PR / push to branch (trigger)
    Actions->>Actions: generate_build_date(), calculate_source_date_epoch()
    Actions->>Actions: build (validation, no-push) -- uses hybrid Buildx cache
    Actions->>Daemon: load image into local daemon for scanning
    Actions->>Scout: Docker Scout Compare / Quickview (scan loaded image)
    Scout-->>Actions: scan results (CVE / policy)
    alt scan passes
        Actions->>Registry: push validated image + SBOM & attestations
        Registry-->>Actions: push OK
    else scan fails
        Scout-->>Actions: report high-severity findings
        Actions-->>Dev: fail workflow (block push)
    end
```
**Estimated code review effort:** 🎯 4 (Complex) | ⏱️ ~45 minutes
**Pre-merge checks:** ✅ 3 checks passed.
Hey - I've found 1 issue, and left some high level feedback:
- The new `calculate_source_date_epoch` helper assumes numeric timestamps (milliseconds) but in the workflow it is passed `github.event.head_commit.timestamp` and `github.event.repository.created_at`, which are ISO 8601 strings, so the arithmetic `$((commit_timestamp / 1000))` will fail at runtime; consider parsing these strings with `date -d` or switching to an existing numeric field.
- The `tux` healthchecks in both `compose.yaml` and `compose.production.yaml` now rely on `ps aux | grep -q "tux start"`, which is somewhat brittle (e.g. changes in the command line, multiple matching processes); you might want to switch to a more robust check (e.g. `pgrep` on the main process or a lightweight internal HTTP/IPC health endpoint) to avoid false positives/negatives.
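To make the brittleness concrete, here is a hedged sketch contrasting the two checks; `sleep 30` stands in for the bot process, and `pgrep -f` is the suggested sturdier form:

```shell
# Stand-in for the long-running bot process
sleep 30 &
bot_pid=$!

# Brittle form: grep can match its own command line, hence the extra filter
ps aux | grep "sleep 30" | grep -v grep > /dev/null && echo "ps-based check matched"

# Sturdier form: pgrep -f matches full command lines without matching itself
if pgrep -f "sleep 30" > /dev/null; then
  status="healthy"
else
  status="unhealthy"
fi
echo "$status"

kill "$bot_pid" 2> /dev/null
```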
## Individual Comments
### Comment 1
<location> `.github/scripts/docker.sh:51-58` </location>
<code_context>
+
+# Calculate SOURCE_DATE_EPOCH for reproducible builds
+calculate_source_date_epoch() {
+ local commit_timestamp="${1:-}"
+ local repo_created_at="${2:-}"
+
+ # Calculate SOURCE_DATE_EPOCH for reproducible builds
+ # Priority: commit timestamp > repository creation date > current time
+ local source_date_epoch
+ if [ -n "$commit_timestamp" ]; then
+ # Use commit timestamp (convert from milliseconds to seconds)
+ source_date_epoch=$((commit_timestamp / 1000))
+ elif [ -n "$repo_created_at" ]; then
</code_context>
<issue_to_address>
**issue (bug_risk):** Timestamp handling for SOURCE_DATE_EPOCH is likely incorrect with GitHub context values
`github.event.head_commit.timestamp` and `github.event.repository.created_at` are ISO 8601 strings (e.g. `2025-01-01T12:34:56Z`), not millisecond Unix timestamps. Using `source_date_epoch=$((commit_timestamp / 1000))` will fail arithmetic expansion (and break under `set -e`) or misbehave.
To derive a Unix epoch, parse the timestamps instead, e.g.:
```bash
if [ -n "$commit_timestamp" ]; then
source_date_epoch=$(date -d "$commit_timestamp" +%s)
elif [ -n "$repo_created_at" ]; then
source_date_epoch=$(date -d "$repo_created_at" +%s)
else
source_date_epoch=$(date +%s)
fi
```
Also update the comment about "convert from milliseconds to seconds" to reflect this behavior. As implemented, `calculate_source_date_epoch` is likely to fail when used in the workflow.
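A quick check of the suggested fix (GNU `date`): ISO 8601 strings parse directly, where shell arithmetic on them would not:

```shell
ts="2025-01-01T12:34:56Z"
# GNU date converts the ISO 8601 string to a Unix epoch in seconds
epoch=$(date -u -d "$ts" +%s)
echo "$epoch"
```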
</issue_to_address>
**Codecov Report**

❌ Patch coverage is

**Additional details and impacted files**

```diff
@@            Coverage Diff             @@
##             main    #1130      +/-   ##
==========================================
- Coverage   40.24%   40.24%   -0.01%
==========================================
  Files         204      204
  Lines       14372    14373       +1
  Branches     1686     1686
==========================================
  Hits         5784     5784
- Misses       8588     8589       +1
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
Containerfile (1)

193-198: Potential typo in npm cache directory name.

Line 196 removes `/home/nonroot/.npm/_cacache_` with a trailing underscore. The standard npm cache directory is `_cacache` (no trailing underscore). This may silently fail to clean the intended directory.

🔎 Proposed fix

```diff
-  rm -rf /home/nonroot/.npm/_cacache_; \
+  rm -rf /home/nonroot/.npm/_cacache; \
```

docs/content/selfhost/install/docker.md (1)

59-62: Git clone path argument appears incorrect.

Line 60 shows `git clone https://github.com/allthingslinux/tux.git /opt` but line 61 then does `cd /opt/tux`. `git clone` only creates a subdirectory when no destination is given; specifying `/opt` as the destination clones directly into `/opt`, not `/opt/tux`.

🔎 Proposed fix

```diff
-git clone https://github.com/allthingslinux/tux.git /opt
+git clone https://github.com/allthingslinux/tux.git /opt/tux
 cd /opt/tux
```
🧹 Nitpick comments (6)
docs/content/selfhost/config/database.md (1)

23-24: Clarify the rationale for the base image comparison in the note.

The new "Base Images" note provides a useful comparison, but it would benefit from explaining why this choice matters to users. The current phrasing emphasizes the size difference but doesn't clarify the implications (e.g., reduced attack surface, dependency availability, performance trade-offs). This would help users understand the design decision.

Additionally, verify that hardcoding specific image versions (e.g., `python:3.13.8-slim`) won't become stale as the project evolves, particularly if the Docker build process intends to support multiple Python versions dynamically.

🔎 Suggested enhancement to clarify the note

Consider expanding the note to explain why base image choices matter:

```diff
 !!! note "Base Images"
-    The Tux application uses `python:3.13.8-slim` (Debian-based), while PostgreSQL uses `postgres:17-alpine` (Alpine-based) for a smaller database image size.
+    The Tux application uses `python:3.13.8-slim` (Debian-based) for better library compatibility, while PostgreSQL uses `postgres:17-alpine` (Alpine-based) for a smaller footprint and reduced dependencies. This keeps the database image lightweight while ensuring Python has sufficient compatibility for required packages.
```

.github/scripts/docker.sh (2)
49-69: Document the milliseconds assumption for timestamp inputs.

The function assumes `commit_timestamp` and `repo_created_at` are in milliseconds and divides by 1000. This should be documented in comments, and ideally validated. If callers pass seconds instead of milliseconds, the resulting epoch will be incorrect (e.g., year 1970).

🔎 Suggested improvement

```diff
 # Calculate SOURCE_DATE_EPOCH for reproducible builds
-# Priority: commit timestamp > repository creation date > current time
+# Priority: commit timestamp > repository creation date > current time
+# Note: Input timestamps are expected in MILLISECONDS (e.g., from GitHub API)
 calculate_source_date_epoch() {
   local commit_timestamp="${1:-}"
   local repo_created_at="${2:-}"

   local source_date_epoch
   if [ -n "$commit_timestamp" ]; then
-    # Use commit timestamp (convert from milliseconds to seconds)
+    # Use commit timestamp (convert from milliseconds to seconds for Unix epoch)
     source_date_epoch=$((commit_timestamp / 1000))
```
71-74: Consider writing to `GITHUB_OUTPUT` for consistency.

Unlike `calculate_source_date_epoch` and other functions that write to `$GITHUB_OUTPUT`, `generate_build_date` only echoes to stdout. If this is intentional (for command substitution usage), consider documenting the different usage pattern.

🔎 Optional: Add GITHUB_OUTPUT for workflow consistency

```diff
 # Generate BUILD_DATE in ISO 8601 format
 generate_build_date() {
-  date -u +'%Y-%m-%dT%H:%M:%SZ'
+  local build_date
+  build_date=$(date -u +'%Y-%m-%dT%H:%M:%SZ')
+  if [ -n "${GITHUB_OUTPUT:-}" ]; then
+    echo "build_date=$build_date" >> "$GITHUB_OUTPUT"
+  fi
+  echo "$build_date"
 }
```

src/tux/services/wrappers/tldr.py (1)
21-27: Good fix for relative path resolution, with a minor simplification opportunity.

The change correctly resolves relative paths to absolute paths to avoid permission issues in containerized environments. However, the logic can be slightly simplified, since `Path.resolve()` works correctly on both relative and absolute paths.

🔎 Simplified version

```diff
 # Configuration constants following 12-factor app principles
 # Resolve relative paths to absolute to avoid permission issues
 _cache_dir = os.getenv("TLDR_CACHE_DIR", ".cache/tldr")
-CACHE_DIR: Path = (
-    Path(_cache_dir).resolve()
-    if not Path(_cache_dir).is_absolute()
-    else Path(_cache_dir)
-)
+CACHE_DIR: Path = Path(_cache_dir).resolve()
```
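A tiny demonstration of the `resolve()` behavior the simplification relies on:

```python
from pathlib import Path

# resolve() yields an absolute path whether the input is relative or absolute
relative = Path(".cache/tldr").resolve()
already_absolute = Path("/var/cache/tldr").resolve()

print(relative.is_absolute())
print(already_absolute.is_absolute())
```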
`Path.resolve()` returns an absolute path for both relative and absolute inputs, making the conditional unnecessary. For an already-absolute path, `resolve()` simply returns it (after resolving any symlinks, which is generally desirable).

docker/entrypoint.sh (1)
101-116: Dead code: the `cleanup` function will never execute after `exec`.

Since `start_bot_with_retry` now uses `exec tux start` (line 97), the shell process is replaced and control never returns. This means:

- The trap handlers set on line 119 won't catch signals after exec
- The `BOT_PID` variable will never be set
- The `cleanup` function is effectively dead code

This isn't a bug (signals go directly to the bot process, which handles its own shutdown), but the cleanup function and trap setup are now misleading.

🔎 Suggested cleanup

Consider either:

Option A: Remove the dead code (delete the `cleanup` function and the `trap cleanup SIGTERM SIGINT` line entirely).

Option B: Keep a minimal trap for the pre-exec phase:

```diff
-cleanup() {
-  echo ""
-  echo "Received shutdown signal"
-  echo "Performing cleanup..."
-
-  # Kill any child processes
-  if [ -n "$BOT_PID" ]; then
-    echo "Stopping bot process (PID: $BOT_PID)..."
-    kill -TERM "$BOT_PID" 2> /dev/null || true
-    wait "$BOT_PID" 2> /dev/null || true
-  fi
-
-  echo "Cleanup complete"
-  exit 0
-}
+# Signal handler for graceful shutdown during startup phase (before exec)
+cleanup() {
+  echo ""
+  echo "Received shutdown signal during startup"
+  exit 0
+}
```
.github/workflows/docker.yml (1)

189-216: Pin the Docker Scout action to a specific SHA for supply chain security.

All instances of `docker/scout-action@v1` use a floating major version tag that will update with any v1.x release, creating supply chain risk. Consider pinning to a specific commit SHA (e.g., `f8c776824083494ab0d56b8105ba2ca85c86e4de` for v1.18.2), consistent with the approach used throughout this workflow for other actions like `docker/build-push-action`, `docker/setup-buildx-action`, and `docker/login-action`.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- `.dockerignore`
- `.github/scripts/docker.sh`
- `.github/workflows/docker.yml`
- `Containerfile`
- `compose.production.yaml`
- `compose.yaml`
- `docker/entrypoint.sh`
- `docs/content/developer/best-practices/ci-cd.md`
- `docs/content/developer/guides/docker-images.md`
- `docs/content/reference/docker-production.md`
- `docs/content/selfhost/config/database.md`
- `docs/content/selfhost/install/docker.md`
- `docs/content/selfhost/manage/docker.md`
- `docs/content/support/troubleshooting/docker.md`
- `src/tux/services/wrappers/tldr.py`
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
`**/*.py`: Use strict type hints with `Type | None` instead of `Optional[Type]`
Use NumPy docstrings for documenting functions and classes
Prefer absolute imports; relative imports allowed only within the same module
Organize imports in order: stdlib → third-party → local
Use 88 character line length
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
Always add imports to the top of the file unless absolutely necessary
Use async/await for I/O operations
Use custom exceptions for business logic with context logging and meaningful user messages
Use Pydantic for data validation
Keep files to a maximum of 1600 lines
Use one class or function per file when possible
Use descriptive filenames
Add appropriate logging to services and error handlers
Files:
src/tux/services/wrappers/tldr.py
🧬 Code graph analysis (1)
docker/entrypoint.sh (2)
- `src/tux/core/app.py` (1): `start` (124-177)
- `src/tux/core/bot.py` (1): `_post_ready_startup` (183-214)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Validate
- GitHub Check: Run All Tests (3.13.8)
- GitHub Check: Sourcery review
- GitHub Check: Seer Code Review
🔇 Additional comments (25)
docs/content/selfhost/config/database.md (1)

21-21: Verify the Python version matches the current Dockerfile after refactoring.

Line 21 references the Alpine-based `postgres:17-alpine`, which is correct. However, ensure that any references to specific Python versions (like `python:3.13.8-slim` mentioned later in the note block) accurately reflect the current Dockerfile, especially given the PR's mention of dynamic Python version detection and cleanup.

docs/content/developer/best-practices/ci-cd.md (2)
146-148: LGTM! The documentation update accurately reflects the switch from Trivy to Docker Scout for container vulnerability scanning, and the expanded description mentioning policy evaluation is helpful.

184-184: LGTM! Good addition noting the PR comparison feature for Docker Scout, which aligns with the workflow changes in this PR.
.dockerignore (1)
55-101: LGTM! Comprehensive and well-organized additions to the `.dockerignore`. The exclusions appropriately cover:
- Development and test artifacts
- Runtime-mounted directories (plugins, data)
- Documentation and build config files
- IDE-specific files
The inline comments explaining the rationale (e.g., plugins as runtime volumes) are helpful for maintainability.
docs/content/selfhost/manage/docker.md (1)
1-319: Well-structured operations guide. Comprehensive documentation covering service management, backup/restore, monitoring, and maintenance. The commands are practical, and the explanatory notes (like the Alpine container usage) are helpful.
docs/content/support/troubleshooting/docker.md (1)
1-177: LGTM! Well-structured troubleshooting guide with clear problem categories and actionable solutions. The step-by-step verification commands are helpful for debugging common Docker issues.
.github/scripts/docker.sh (1)
37-47: LGTM! Simple and effective validation for required build configuration.
docs/content/developer/guides/docker-images.md (1)
1-164: LGTM! Comprehensive guide covering local builds, multi-stage architecture, optimization tips, and troubleshooting. The build stages documentation aligns well with the Containerfile changes in this PR.
docker/entrypoint.sh (2)
16-47: LGTM! Good improvements to the database connection waiting logic:

- Sensible defaults for `POSTGRES_HOST` and `POSTGRES_PORT`
- Consistent defaults in both shell and Python code
- Improved error messaging with connection details for debugging
71-98: Good use of `exec` for proper signal handling.

Using `exec` to replace the shell process ensures signals (SIGTERM, SIGINT) are delivered directly to the bot process, which is the correct pattern for container entrypoints. The inline comments explaining this behavior are helpful.

compose.yaml (3)
7-7: LGTM on the restart policy change. Using `restart: false` is cleaner YAML syntax compared to the quoted string `'no'`. Both are semantically equivalent for disabling automatic restarts.

77-86: Healthcheck pattern looks correct, but consider edge cases.

The `grep -v grep | grep -q "tux start"` pattern correctly excludes the grep process itself. However, this check only verifies that a process with "tux start" in its command line exists; it doesn't confirm the bot is actually connected to Discord or functioning properly. The comment accurately describes this as a "lightweight" check, which is appropriate for container orchestration. The 40s `start_period` provides adequate time for bot initialization.

140-142: Good separation of dev tooling via profiles.

Adding `profiles: [dev]` to Adminer ensures it won't start by default in production-like deployments, requiring an explicit `--profile dev` to enable. This aligns well with the production/development separation documented in the PR.

.github/workflows/docker.yml (3)
167-172: Consider cache-to ordering for hybrid caching.

The hybrid cache configuration writes to both the registry and GHA cache. When `cache-to` has multiple entries, both will be updated. This is correct, but be aware that:

- Registry cache is shared across all branches/PRs (good for cache hits)
- GHA cache is branch-scoped by default

This setup should work well for your use case. The registry cache provides cross-branch sharing while the GHA cache serves as a fast fallback.
392-399: Good implementation of the test-before-push pattern.

The Docker Scout CVE scan with `exit-code: 1` will fail the workflow if critical or high-severity vulnerabilities are found before pushing. This is a solid security gate. Note that `continue-on-error` is not set here (unlike the comparison steps), so this will properly block the push on vulnerabilities.
404-449: The second build should be fast due to cache, but consider potential race conditions.

The comment correctly notes this rebuild will be fast due to the BuildKit cache. However, both the validation and push builds write to the same `buildcache` tag. In parallel builds (different PRs/branches), this could cause cache invalidation. For production releases triggered by tags, this is likely fine since they run sequentially. Just be aware if you expand this pattern.
Containerfile (4)
1-36: Well-structured common stage for shared configuration.

The common stage effectively centralizes:

- OCI labels for traceability
- Non-root user creation
- dpkg optimizations to reduce image size
- Python environment variables

This DRY approach ensures consistency between the base and production stages.
88-91: Good practice ensuring the plugin directory structure exists for runtime mounts.

Creating the plugin directories with `.gitkeep` ensures the mount points exist even though `.dockerignore` excludes the actual plugin files. This prevents mount failures at runtime.
200-218: Excellent use of dynamic Python version detection.

Using `python3 -c 'import sys; ...'` to detect the Python version at build time instead of hardcoding `python3.13` makes the Containerfile more maintainable when Python versions change. The `${PYTHON_VERSION:?}` and `${PYTHON_LIB_PATH:?}` parameter expansions will properly fail the build if the variable is empty, providing a safety net. The cleanup operations appropriately use `|| true` since some paths may not exist depending on installed packages.
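The detection pattern can be sketched outside the Containerfile; the lib path layout here is an assumption about the image, not taken from the project:

```shell
# Detect the running interpreter's major.minor version at build time
PYTHON_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
# ${VAR:?} aborts with an error if detection produced an empty string
PYTHON_LIB_PATH="/usr/local/lib/python${PYTHON_VERSION:?}"
echo "Python ${PYTHON_VERSION} -> ${PYTHON_LIB_PATH}"
```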
222-226: Fix healthcheck import statement: module does not exist.

The healthcheck command imports `tux.shared.config.env`, but this module does not exist in the codebase. The shared/config package only exports `CONFIG` and provides submodules like `loaders`, `generators`, and `settings`. Update the healthcheck to use a valid import such as `from tux.shared.config import CONFIG` instead. (Likely an incorrect or invalid review comment.)
docs/content/reference/docker-production.md (2)
36-44: Clear and helpful comparison table. The feature comparison table effectively communicates the differences between development and production configurations. This will help users choose the appropriate setup for their use case.
131-173: Security documentation accurately reflects the implementation.

The security features section correctly documents:

- Non-root user (UID/GID 1001), matching Containerfile lines 16-17
- Read-only filesystem with tmpfs, matching compose.yaml lines 89-90
- `no-new-privileges` security option, matching compose.yaml line 88
- Minimal base image with pinned digest, matching Containerfile line 4

Good alignment between documentation and implementation.
compose.production.yaml (1)
40-93: Production service configuration looks solid.

Good production practices implemented:

- Pre-built image usage (no local build)
- Read-only filesystem with tmpfs
- `no-new-privileges` security option
- Appropriate volume mounts (config/data only, not source)
- Health checks aligned with the development config

The `restart: unless-stopped` policy (line 70) is appropriate for the application container.

docs/content/selfhost/install/docker.md (2)
101-116: Clear separation of development and production deployment instructions.

The updated documentation clearly distinguishes between development and production deployment modes, with appropriate commands for each. The production section correctly references `compose.production.yaml`.

299-307: Good cross-references to related documentation. The updated links section provides comprehensive navigation to related docs, including the new production deployment reference and the Docker troubleshooting guide.
```bash
# Test build context size
docker build --dry-run -f Containerfile --target production . 2>&1 | grep "Sending build context"
```
`--dry-run` is not a valid `docker build` flag.

Same issue as in the troubleshooting documentation. The `docker build` command doesn't support `--dry-run`.
🔎 Suggested fix

```diff
 # Test build context size
-docker build --dry-run -f Containerfile --target production . 2>&1 | grep "Sending build context"
+# Start a build and observe the context size in the output
+docker build -f Containerfile --target production . 2>&1 | head -5
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
# Start a build and observe the context size in the output
docker build -f Containerfile --target production . 2>&1 | head -5
```
🤖 Prompt for AI Agents
docs/content/developer/guides/docker-images.md around lines 126-128: the example
uses the non-existent --dry-run flag on docker build; remove that flag and
replace the check with a valid approach — either run docker build without
--dry-run and capture the "Sending build context" output, or instead compute the
build-context size locally by streaming the directory to tar and measuring the
byte count (optionally converting to human-readable) and document that
alternative as the preferred, non-building method.
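The tar-based alternative described in the prompt can be sketched as follows. This is a minimal sketch: it approximates the context size by streaming the current directory through `tar`, and does not apply `.dockerignore` filtering, so it overestimates the true context.

```shell
# Approximate the Docker build context size without starting a build.
# Note: the real context excludes anything matched by .dockerignore,
# so this figure is an upper bound.
context_bytes=$(tar -cf - . 2>/dev/null | wc -c)
echo "Approximate build context: ${context_bytes} bytes"
```

On GNU systems, `numfmt --to=iec "${context_bytes}"` converts the byte count to a human-readable size.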
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/scripts/docker.sh (1)
37-47: Validation could be more robust. The function only checks that `git_sha` is non-empty. Consider adding validation for:
- SHA format (40-character hex string for full SHA, 7+ for short)
- SHA length constraints
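For reference, the format constraint can also be checked portably with `grep -E`; this is a sketch with a hypothetical helper name, whereas the review's enhanced validation example uses a bash `[[ =~ ]]` test.

```shell
# Hypothetical helper: accept only 7-40 lowercase hex characters,
# covering short (7+) and full (40) git SHA lengths.
is_valid_sha() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{7,40}$'
}

is_valid_sha "abc1234" && echo "short SHA accepted"
is_valid_sha "not-a-sha" || echo "invalid SHA rejected"
```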
🔎 Enhanced validation example
```diff
 validate_build_config() {
   local git_sha="${1}"

   # Validate that required build args are available
   if [ -z "$git_sha" ]; then
     echo "Error: GIT_SHA is required"
     exit 1
   fi
+
+  # Validate SHA format (hex characters only, 7-40 chars)
+  if ! [[ "$git_sha" =~ ^[0-9a-f]{7,40}$ ]]; then
+    echo "Error: GIT_SHA must be 7-40 hexadecimal characters"
+    exit 1
+  fi
+
   echo "✓ Build configuration validated"
 }
```

docker/entrypoint.sh (1)
70-98: Bot startup logic with exec is correct. Using `exec` to replace the shell process with `tux start` (line 97) ensures proper signal handling, which is essential for graceful shutdown in containers. The retry logic appropriately applies only to configuration validation, not bot runtime failures.

The `shellcheck disable=SC2093` comment (line 96) may be unnecessary since `exec tux start` doesn't trigger that rule. Consider removing it unless there's a specific reason.

🔎 Optional cleanup

```diff
 echo "Configuration validated. Starting bot..."
-# shellcheck disable=SC2093
 exec tux start
```
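The effect of `exec` can be demonstrated with plain `sh`, using `echo` as a stand-in for `tux start`: nothing after `exec` ever runs, and the exec'd command keeps the shell's PID, which is why it stays PID 1 in a container and receives SIGTERM directly from the runtime.

```shell
# exec replaces the shell: the second echo is never reached.
out=$(sh -c 'exec echo replaced; echo never-printed')
echo "$out"

# The exec'd process keeps the shell's PID (both lines print the same PID).
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
echo "$pids"
```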
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- .github/scripts/docker.sh
- Containerfile
- compose.production.yaml
- docker/entrypoint.sh
- docs/content/developer/guides/docker-images.md
- docs/content/reference/docker-production.md
- docs/content/selfhost/manage/docker.md
- docs/content/support/troubleshooting/docker.md
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/content/selfhost/manage/docker.md
- docs/content/developer/guides/docker-images.md
🧰 Additional context used
🧬 Code graph analysis (1)
docker/entrypoint.sh (1)
src/tux/core/app.py (1)
`start` (124-177)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Run All Tests (3.13.8)
- GitHub Check: Sourcery review
- GitHub Check: Seer Code Review
🔇 Additional comments (15)
compose.production.yaml (1)
10-39: PostgreSQL service configuration looks good. The restart policy has been updated to `unless-stopped`, addressing the previous concern about automatic restart in production. The pinned image digest, healthcheck configuration, and logging settings are appropriate for production deployment.

.github/scripts/docker.sh (2)
49-70: SOURCE_DATE_EPOCH calculation looks correct. The implementation now correctly parses ISO 8601 timestamp strings using `date -d`, addressing the previous concern about incorrect timestamp handling. The comments clearly document the expected input format.

Note: `date -d` is GNU-specific and won't work on BSD/macOS. Since GitHub Actions runs on Linux by default, this should be fine. If cross-platform compatibility is needed, consider adding a check.
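A minimal sketch of the conversion, with the BSD/macOS variant shown only as a comment (and assuming the input carries no fractional seconds):

```shell
# Convert an ISO 8601 commit timestamp to a Unix epoch for SOURCE_DATE_EPOCH.
iso_ts="2024-01-01T00:00:00Z"

# GNU date (Linux / GitHub Actions runners):
SOURCE_DATE_EPOCH=$(date -u -d "$iso_ts" +%s)
echo "SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH}"

# BSD/macOS equivalent would be:
#   date -u -j -f "%Y-%m-%dT%H:%M:%SZ" "$iso_ts" +%s
```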
72-75: Build date generation is correct. The ISO 8601 UTC format is appropriate for build timestamps.
docs/content/reference/docker-production.md (2)
22-34: Usage examples are clear and correct. The documentation properly shows how to deploy with specific tagged versions. The previous concern about a misleading "main" tag example has been addressed.
128-171: Security documentation is comprehensive. The security features section clearly documents the production hardening measures: non-root user, read-only filesystem, security options, and minimal base image. This aligns well with the actual implementation in the Containerfile.
docs/content/support/troubleshooting/docker.md (1)
131-141: Build context size check is now correct. The documentation has been updated to use a valid approach for checking build context size, addressing the previous concern about the invalid `--dry-run` flag.

Containerfile (6)
1-36: Common stage refactoring improves maintainability. Introducing a shared `common` stage that centralizes user setup, environment variables, and dpkg configuration is a solid improvement. This reduces duplication across the base, dev, and production stages.
84-91: Plugin directory scaffolding is well-documented. Creating empty plugin directory structures for runtime volume mounts is appropriate. The comments clearly explain why the directories exist despite plugins being excluded from the image via `.dockerignore`.
200-218: Dynamic Python version detection improves maintainability. Detecting the Python version at runtime (line 203) instead of hardcoding it is a significant improvement. This makes the Containerfile more maintainable and resilient to Python version changes. The comprehensive cleanup logic effectively reduces image size.
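The runtime detection pattern can be sketched like this; the site-packages path below is illustrative, not necessarily the exact one the Containerfile uses:

```shell
# Detect the interpreter's major.minor version instead of hardcoding it.
PY_VER=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
echo "Detected Python ${PY_VER}"

# Build version-dependent paths from the detected version (illustrative path):
SITE_PACKAGES="/app/.venv/lib/python${PY_VER}/site-packages"
echo "$SITE_PACKAGES"
```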
146-146: TLDR cache path improvement enhances reliability. Setting `TLDR_CACHE_DIR` to an absolute path `/app/.cache/tldr` (instead of a relative path) prevents permission issues, as mentioned in the PR objectives. This is a good defensive improvement.

Also applies to: 180-180
222-223: HEALTHCHECK import corrected. The healthcheck now correctly imports `tux.shared.config.settings`, resolving the previous issue with the non-existent `tux.shared.config.env` module.
80-80: The `--no-default-groups` flag is valid for `uv sync` and correctly excludes development dependencies in production builds.

docker/entrypoint.sh (3)
16-20: Database connection defaults improve usability. Providing sensible defaults for `POSTGRES_HOST` and `POSTGRES_PORT` aligns with the production compose configuration and improves the developer experience. The explicit echo of the target host/port aids debugging.
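The defaulting presumably uses standard POSIX parameter expansion; the default values here (`postgres` and `5432`) are assumptions for illustration, matching typical compose service naming:

```shell
# Fall back to compose-service defaults when the variables are unset
# (default values are illustrative assumptions).
POSTGRES_HOST="${POSTGRES_HOST:-postgres}"
POSTGRES_PORT="${POSTGRES_PORT:-5432}"
echo "Attempting database connection to ${POSTGRES_HOST}:${POSTGRES_PORT}..."
```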
22-48: Database readiness check is robust and informative. The updated `wait_for_db` function properly uses environment-derived connection parameters and provides helpful error messages including host/port details. The emoji-enhanced status messages improve readability.
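A readiness loop of this shape can be sketched with an injectable probe command; `true`/`false` stand in for a real check such as `pg_isready`, and the retry count and sleep interval are illustrative:

```shell
# Bounded wait loop; the probe command is passed in so it can be any
# readiness check, e.g. pg_isready -h "$POSTGRES_HOST" -p "$POSTGRES_PORT".
wait_for_db() {
  probe="$1"
  retries="${2:-5}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    if $probe; then
      echo "✅ Database is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "❌ Database unreachable after ${retries} attempts" >&2
  return 1
}

wait_for_db true 3
```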
101-119: Signal handlers may be unused after exec. The `cleanup` function checks `BOT_PID` (line 108), but this variable is never set in the script. Additionally, since `exec tux start` (line 97) replaces the shell process, the signal handlers registered at line 119 won't be invoked after the bot starts.

If the signal handlers are intended for future use or for validation-phase interruption, consider adding a comment explaining this. Otherwise, this may be dead code.
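If the child-process model were kept instead of `exec`, the handlers would need `BOT_PID` to actually be set. A sketch of that wiring, using `sleep` as a stand-in for the bot process (the `cleanup` body follows the script's apparent intent but is illustrative):

```shell
# Cleanup terminates the tracked child if it is still alive.
cleanup() {
  if [ -n "${BOT_PID:-}" ] && kill -0 "$BOT_PID" 2>/dev/null; then
    kill -TERM "$BOT_PID"
    wait "$BOT_PID" 2>/dev/null
  fi
}
trap cleanup TERM INT

sleep 30 &        # stand-in for: tux start &
BOT_PID=$!

cleanup           # invoked directly here to demonstrate the teardown
if kill -0 "$BOT_PID" 2>/dev/null; then
  echo "child still running"
else
  echo "child stopped"
fi
```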
```yaml
volumes:
  # Only mount configuration and data directories (not source code)
  # Note: entrypoint.sh is included in the image, no need to mount from host
  - ./config:/app/config:ro
  - ./src/tux/plugins:/app/tux/plugins:ro
  - ./assets:/app/assets:ro
  - ./data/cache:/app/.cache
  - ./data/temp:/app/temp
  - ./data/user-home:/home/nonroot
  # Note: Plugins are drop-in extensions - mount from host for easy updates
```
Volume mounts may fail in production without source checkout.
Lines 50-52 mount ./config, ./src/tux/plugins, and ./assets from the host. In a production deployment using pre-built images (the stated goal of this compose file), these directories may not exist on the host without a source checkout.
Consider either:
- Documenting that a partial checkout (config, plugins, assets directories) is required for production deployment
- Baking plugins and assets into the production image and only mounting config and data directories
- Providing a minimal production deployment bundle with only necessary directories
🔎 Alternative approach - minimal host mounts
```diff
 volumes:
   # Only mount configuration and data directories (not source code)
   # Note: entrypoint.sh is included in the image, no need to mount from host
   - ./config:/app/config:ro
-  - ./src/tux/plugins:/app/tux/plugins:ro
-  - ./assets:/app/assets:ro
   - ./data/cache:/app/.cache
   - ./data/temp:/app/temp
   - ./data/user-home:/home/nonroot
-  # Note: Plugins are drop-in extensions - mount from host for easy updates
+  # Note: For custom plugins/assets, create a deployment directory structure:
+  #   ./deploy/plugins:/app/tux/plugins:ro (optional)
+  #   ./deploy/assets:/app/assets:ro (optional)
```

🤖 Prompt for AI Agents
In compose.production.yaml around lines 47 to 56, the compose mounts ./config,
./src/tux/plugins and ./assets from the host which will fail when running
pre-built images without a source checkout; either document the requirement for
a partial host checkout, or change the image/compose to bake plugins and assets
into the production image and only mount runtime configuration and data (e.g.,
./config:/app/config:ro and data dirs), or provide and reference a minimal
deployment bundle (tar/dir) that contains only the required host paths;
implement one of these fixes and update the compose file and README/deploy docs
accordingly so production deployments do not rely on a full source tree existing
on the host.
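Whichever option is chosen, a lightweight mitigation is a preflight check before `docker compose up` that verifies the bind-mount sources exist on the host. The directory list comes from the compose snippet under review; the helper itself is illustrative:

```shell
# Fail fast if any host directory the compose file bind-mounts is missing.
check_mount_sources() {
  missing=0
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "missing bind-mount source: $dir" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "all bind-mount sources present"
  return "$missing"
}

# Directories from the compose file under review:
check_mount_sources ./config ./src/tux/plugins ./assets || true
```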
- Updated the test for guild creation to include SQLAlchemy's OperationalError in the suppression context, ensuring graceful failure handling when the service is disconnected.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/database/test_database_migrations.py (1)
310-311: Correctly broadens exception handling for disconnected database operations. The addition of `sqlalchemy.exc.OperationalError` is appropriate, as SQLAlchemy raises this when attempting operations on a disconnected service. The comment clearly explains the rationale.

Optional: Consider making the test more explicit
If you want to verify that a specific exception is raised (rather than just suppressing all expected errors), you could refactor to:
```diff
-# Operations should fail gracefully when not connected
-# SQLAlchemy raises OperationalError when trying to use a disconnected service
-with suppress(RuntimeError, ConnectionError, sqlalchemy.exc.OperationalError):
-    await guild_controller.create_guild(guild_id=TEST_GUILD_ID)
-# If we get here, the service should handle disconnection gracefully
+# Operations should fail gracefully when not connected
+# SQLAlchemy raises OperationalError when trying to use a disconnected service
+with pytest.raises(
+    (RuntimeError, ConnectionError, sqlalchemy.exc.OperationalError)
+):
+    await guild_controller.create_guild(guild_id=TEST_GUILD_ID)
```

This would assert that one of the expected exceptions is actually raised, making the test slightly more robust. However, the current approach is valid for testing graceful degradation.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/database/test_database_migrations.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use strict type hints with `Type | None` instead of `Optional[Type]`
Use NumPy docstrings for documenting functions and classes
Prefer absolute imports; relative imports allowed only within the same module
Organize imports in order: stdlib → third-party → local
Use 88 character line length
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
Always add imports to the top of the file unless absolutely necessary
Use async/await for I/O operations
Use custom exceptions for business logic with context logging and meaningful user messages
Use Pydantic for data validation
Keep files to a maximum of 1600 lines
Use one class or function per file when possible
Use descriptive filenames
Add appropriate logging to services and error handlers
Files:
tests/database/test_database_migrations.py
**/database/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/database/**/*.py: Use SQLModel for type-safe database operations
Use transactions for multi-step database operations
Use model-level validation in SQLModel models
Optimize database queries
Files:
tests/database/test_database_migrations.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run All Tests (3.13.8)
- GitHub Check: Sourcery review
🔇 Additional comments (1)
tests/database/test_database_migrations.py (1)
22-22: LGTM! Import correctly positioned. The import is properly placed in the third-party imports section and is necessary for the enhanced exception handling below.
Pull Request
Description
Provide a clear summary of your changes and reference any related issues. Include the motivation behind these changes and list any new dependencies if applicable.
If your PR is related to an issue, please include the issue number below:
Related Issue: Closes #
Type of Change:
Guidelines
My code follows the style guidelines of this project (formatted with Ruff)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation if needed
My changes generate no new warnings
I have tested this change
Any dependent changes have been merged and published in downstream modules
I have added all appropriate labels to this PR
I have followed all of these guidelines.
How Has This Been Tested? (if applicable)
Please describe how you tested your code, e.g. which commands you ran, with what arguments, and any relevant configuration (if applicable)
Screenshots (if applicable)
Please add screenshots to help explain your changes.
Additional Information
Please add any other information that is important to this PR.
Summary by Sourcery
Refine Docker-based deployment for Tux with a shared container base, clearer separation of dev vs production usage, and more robust CI image build, scan, and release workflows.
Build:
CI:
Documentation: