Git Caching Proxy — Design Doc
Overview
Two serving strategies from a single mirror:
- Protocol proxy: intercept git requests and serve them from a local mirror via `git http-backend`. Fast path with an `ls-remote` diff check; only fetch upstream when refs diverge.
- Snapshot distribution: periodically produce self-contained `tar.zst` archives (full checkout + .git history) for fast bootstrapping. Clients untar and are ready immediately, skipping git's expensive checkout computation.
Mirror (shared upstream)
git clone --mirror <upstream> /srv/git/repo.git
Config
# Protocol
git config protocol.version 2
git config uploadpack.allowFilter true
git config uploadpack.allowReachableSHA1InWant true
# Bitmaps — biggest win for upload-pack
git config repack.writeBitmaps true
git config pack.useBitmaps true
git config pack.useBitmapBoundaryTraversal true
# Commit graph (no --changed-paths; Bloom filters don't help upload-pack)
git config core.commitGraph true
git config gc.writeCommitGraph true
git config fetch.writeCommitGraph true
# Multi-pack-index (avoids full repack on every fetch)
git config core.multiPackIndex true
# Never unpack loose — keep fetched objects as packs
git config transfer.unpackLimit 1
git config fetch.unpackLimit 1
# Disable auto GC — maintenance is explicit
git config gc.auto 0
# Pack performance
git config pack.threads 0
git config pack.deltaCacheSize 512m
git config pack.windowMemory 1g
Maintenance
Use git maintenance for routine tasks — it handles incremental repacks, commit-graph writes, loose object packing, and ref compaction with sensible scheduling:
git maintenance register
git maintenance start
git config maintenance.strategy incremental
This sets up systemd timers / cron automatically. The incremental strategy runs:
- `commit-graph`: incremental split graph writes.
- `incremental-repack`: consolidates small packs via the multi-pack-index (expire, then a batch-size-limited repack). Avoids expensive full repacks.
- `loose-objects`: packs stale loose objects.
- `pack-refs`: compresses refs.
Keep a separate cron job for a periodic full repack (daily/weekly, during low-traffic windows) — git maintenance deliberately avoids these, but a single optimally-deltified pack is the best state for upload-pack serving:
# Full repack — schedule during low traffic
git repack -adb --write-midx --write-bitmap-index
Fetching
git fetch --prune --prune-tags
Locking considerations
- `upload-pack` (serving clients): read-only, no locks. Safe to run concurrently.
- `git fetch`: briefly locks `packed-refs`. The proxy server serializes fetches with its own internal lock.
- `commit-graph write`, `multi-pack-index write`: atomic file renames. Safe anytime.
- Full `repack -adb`: deletes old packs and swaps in new ones. In-flight `upload-pack` processes are safe (open fds survive unlink on Linux), but new readers during the swap window could fail. The multi-pack-index mitigates this via atomic midx updates. Schedule full repacks during low-traffic windows.
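The fetch serialization mentioned in the list above could be sketched with an advisory file lock. This is only an illustration, not the proxy's actual locking: it assumes `flock` from util-linux is available, and the `LOCKFILE` path, the 60-second wait, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: serialize mirror fetches with an advisory lock (flock from util-linux).
# MIRROR, LOCKFILE and the 60s timeout are assumptions for illustration.
set -euo pipefail

MIRROR="${MIRROR:-/srv/git/repo.git}"
LOCKFILE="${LOCKFILE:-/tmp/repo-mirror-fetch.lock}"

# Run the given command while holding an exclusive lock on LOCKFILE.
# Waits up to 60 seconds for the lock, then gives up with a non-zero status.
with_mirror_lock() {
  (
    flock -w 60 9 || { echo "timed out waiting for $LOCKFILE" >&2; exit 1; }
    "$@"
  ) 9>"$LOCKFILE"
}

# Fetch under the lock so two requests never update the mirror's refs at once.
locked_fetch() {
  with_mirror_lock git -C "$MIRROR" fetch --prune --prune-tags
}
```

The lock is released when the subshell exits, including on failure, so a crashed fetch does not leave a stale lock behind. The same wrapper could guard the full repack if it should never overlap with a fetch.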
Strategy 1: Protocol Proxy
Before proxying a client request, check if upstream has new refs:
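# --refs drops HEAD and peeled tags; for-each-ref prints the same "<sha><TAB><ref>" layout
UPSTREAM=$(git ls-remote --refs <upstream> | sort)
LOCAL=$(git for-each-ref --format='%(objectname)%09%(refname)' | sort)
[ "$UPSTREAM" != "$LOCAL" ] && git fetch --prune --prune-tags
ls-remote only exchanges the ref advertisement, so it is cheap when nothing changed. Then serve via git http-backend against the mirror.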
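As a slightly fuller sketch of this fast path, the check can be wrapped in functions that the proxy calls before handing a request to `git http-backend`. The function names and the `MIRROR` variable are hypothetical. The comparison relies on `git ls-remote --refs` (which omits HEAD and peeled tags) and `git for-each-ref` with a tab-separated format, so both listings share one layout.

```shell
#!/usr/bin/env bash
# Sketch of the fast-path freshness check. MIRROR and all function names are
# assumptions for illustration.
set -euo pipefail

MIRROR="${MIRROR:-/srv/git/repo.git}"

# Print "<sha><TAB><ref>" for every ref the upstream advertises.
upstream_refs() {
  git ls-remote --refs "$1" | LC_ALL=C sort
}

# Print the mirror's refs in the same layout.
local_refs() {
  git -C "$MIRROR" for-each-ref --format='%(objectname)%09%(refname)' | LC_ALL=C sort
}

# Succeed when two ref listings are identical.
same_refs() {
  [ "$1" = "$2" ]
}

# Fetch only when the listings diverge; report which path was taken.
sync_if_stale() {
  local url="$1"
  if same_refs "$(upstream_refs "$url")" "$(local_refs)"; then
    echo "up to date"
  else
    git -C "$MIRROR" fetch --prune --prune-tags
    echo "fetched"
  fi
}
```

A caller would run `sync_if_stale <upstream>` before serving. If `ls-remote` fails, the listings differ and the fetch is attempted, which surfaces the upstream error instead of silently serving stale refs.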
Strategy 2: Snapshot Distribution
Setup
Local clone from mirror — git hardlinks objects by default, so no disk duplication:
git clone /srv/git/repo.git /srv/snapshots/repo-full
Updating the snapshot clone
Over time a long-lived clone's object store drifts from the mirror's — the mirror repacks and deletes old packs, while the clone retains stale hardlinked pack files alongside new fetch packs.
Recommended: re-clone before each snapshot. Delete and re-clone from the mirror. Cheap because it's a local hardlink clone — essentially just cp -al on the object store plus checkout. Guarantees the snapshot always has a clean, compact object store matching the mirror's repacked state.
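One way to confirm that a fresh clone really shares storage with the mirror is to count object files with more than one hard link. The helper names below are hypothetical, and the check assumes both directories sit on the same filesystem, since git falls back to copying when it cannot hard-link.

```shell
#!/usr/bin/env bash
# Sketch: measure how much of a clone's object store is hard-linked.
# Function names are assumptions for illustration.
set -euo pipefail

# Count regular files under DIR that have more than one hard link.
count_hardlinked() {
  find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Count all regular files under DIR.
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Summarize sharing for a non-bare clone at the given path.
report_sharing() {
  local objects="$1/.git/objects"
  echo "$(count_hardlinked "$objects") of $(count_files "$objects") object files are hard-linked"
}
```

Running `report_sharing /srv/snapshots/repo-full` right after the re-clone should show nearly all files hard-linked; a low count suggests the clone copied objects instead.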
Snapshot cycle
rm -rf /srv/snapshots/repo-full
git clone /srv/git/repo.git /srv/snapshots/repo-full
cd /srv/snapshots/repo-full
REV=$(git rev-parse --short HEAD)
tar -cf - . | zstd -T0 -3 -o "/srv/snapshots/out/repo-${REV}.tar.zst"
tar stores hardlinked files as regular file content when their other links lie outside the archived tree, so the archive is fully self-contained.
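If clients download from the output directory while a snapshot is being written, they could pick up a half-written archive. A minimal sketch of one way around that, assuming the archive is written under a temporary name and renamed into place. The `make_snapshot` function and the `.partial` suffix are hypothetical, not part of the design above.

```shell
#!/usr/bin/env bash
# Sketch: build a snapshot archive and publish it with a rename.
# make_snapshot and the ".partial" suffix are assumptions for illustration.
set -euo pipefail

# Archive SRC_DIR into OUT_DIR/NAME.tar.zst and print the final path.
make_snapshot() {
  local src_dir="$1" out_dir="$2" name="$3"
  local final="$out_dir/$name.tar.zst"
  local tmp="$final.partial"
  mkdir -p "$out_dir"
  # Write under a temporary name so readers never see a truncated archive.
  tar -C "$src_dir" -cf - . | zstd -q -T0 -3 -f -o "$tmp"
  # A rename within one directory is atomic on POSIX filesystems.
  mv -f "$tmp" "$final"
  printf '%s\n' "$final"
}
```

For the cycle above this would be called as `make_snapshot /srv/snapshots/repo-full /srv/snapshots/out "repo-${REV}"`.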
Client usage
zstd -dc repo-abc123.tar.zst | tar xf -
git remote set-url origin <proxy-or-upstream>
git pull # catch up to latest if snapshot is slightly stale
Key decisions
| Decision | Rationale |
|---|---|
| `--mirror` for object store | Single source of truth; bare repo with all refs |
| `git maintenance` for routine tasks | Handles incremental repack, commit-graph, loose objects, pack-refs with sensible scheduling |
| Separate full repack cron | `git maintenance` avoids full repacks; a single optimal pack is best for upload-pack serving |
| Local clone for snapshots | Hardlinks objects from mirror: no disk duplication, self-contained from the start |
| Re-clone before snapshot | Clean object store matching mirror's repacked state; cheap because local clone just hardlinks |
| No `--changed-paths` on commit graph | Bloom filters are expensive to build and only help `git log -- <path>`, not upload-pack |
| `--split` commit graph | Incremental layers; only processes new commits per fetch |
| `ls-remote` diff check | Avoids unnecessary fetches when refs haven't changed |
| `zstd -T0 -3` | Good compression/speed tradeoff; `-T0` uses all cores |
| Full repack in low-traffic windows | Pack swap can briefly affect new readers; mitigated by multi-pack-index |