
@msukkari
Contributor

Clean up temporary Zoekt shard files on indexing failure to prevent disk space exhaustion.

When zoekt-git-index fails during repository indexing, it leaves behind .tmp shard files. These accumulate over time, especially for repos that repeatedly fail to index, leading to disk space issues. This PR adds logic to automatically remove these temporary files immediately after an indexing operation fails.


Linear Issue: SOU-306


When zoekt-git-index fails during repository indexing, it can leave behind
.tmp shard files that accumulate over time and fill up disk space. This is
especially problematic for large repos that repeatedly fail to index.

Changes:
- Add cleanupTempShards() function to zoekt.ts that removes temporary shard
  files (files with .tmp in their name) for a specific repository
- Call cleanupTempShards() in repoIndexManager.ts when indexGitRepository
  fails, before re-throwing the error
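Here is a minimal sketch of what a cleanup function like `cleanupTempShards()` could look like. It is an illustration, not the PR's code, because the real signature in zoekt.ts is not shown. It assumes two things. First, all shards live directly in one index directory (`indexDir`). Second, every shard file for a repository starts with a repository-specific prefix (`shardPrefix`, a hypothetical parameter). Following the rule described above, it treats any matching file whose name contains `.tmp` as a leftover temporary shard.

```typescript
import { readdir, rm } from "node:fs/promises";
import * as path from "node:path";

// Hypothetical sketch of cleanupTempShards(); the PR's actual signature is not shown.
// Assumptions: all shards live directly in `indexDir`, and every shard file for a repo
// starts with `shardPrefix`. Include a trailing delimiter in the prefix (e.g. "1_42_")
// so that repo 42 does not also match repo 420.
export async function cleanupTempShards(indexDir: string, shardPrefix: string): Promise<string[]> {
    const removed: string[] = [];
    let entries: string[];
    try {
        entries = await readdir(indexDir);
    } catch (err) {
        // No index directory yet means there is nothing to clean up.
        if ((err as NodeJS.ErrnoException).code === "ENOENT") {
            return removed;
        }
        throw err;
    }
    for (const name of entries) {
        // Same rule as described above: a temporary shard has ".tmp" in its name.
        if (name.startsWith(shardPrefix) && name.includes(".tmp")) {
            // force: true tolerates a file that disappeared between readdir and rm.
            await rm(path.join(indexDir, name), { force: true });
            removed.push(name);
        }
    }
    return removed;
}
```

The function only touches files that match the given repository's prefix. Because zoekt-git-index may still be writing temporary files for a repository while an indexing run is in progress, a sweep like this is only safe after that repository's attempt has finished. That is why it belongs in the failure path rather than in a periodic job that might run concurrently with indexing. The shard names used when testing this sketch are made up.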

This ensures that even if a repository consistently fails to index, the
temporary files created during each attempt are cleaned up.
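The call-site change follows a "clean up, then re-throw" pattern. The sketch below factors that pattern into a hypothetical `withCleanupOnFailure` helper to show the control flow. The PR describes the call as inline in repoIndexManager.ts, and the choice to log and swallow cleanup errors here is an assumption, not something the PR states.

```typescript
// Hypothetical helper illustrating the "clean up, then re-throw" pattern.
// `run` stands in for the indexGitRepository() call and `cleanup` for cleanupTempShards().
export async function withCleanupOnFailure<T>(
    run: () => Promise<T>,
    cleanup: () => Promise<unknown>,
): Promise<T> {
    try {
        // `return await` so a rejected promise is caught by this try/catch.
        return await run();
    } catch (err) {
        try {
            await cleanup();
        } catch (cleanupErr) {
            // Log and swallow cleanup errors so they never mask the original indexing error.
            console.warn("Failed to remove temporary shard files:", cleanupErr);
        }
        // Re-throw the original error so callers still see the indexing failure.
        throw err;
    }
}
```

At the call site, `run` would wrap the existing `measure(() => indexGitRepository(repo, this.settings, revisions, signal))` call. The inner try/catch keeps the retry and error-reporting logic seeing the real indexing failure, even if deleting the `.tmp` files also fails.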

Co-authored-by: michael <michael@sourcebot.dev>
@cursor

cursor bot commented Jan 28, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

@coderabbitai

coderabbitai bot commented Jan 28, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


```typescript
try {
    const { durationMs } = await measure(() => indexGitRepository(repo, this.settings, revisions, signal));
    const indexDuration_s = durationMs / 1000;
    logger.info(`Indexed ${repo.name} (id: ${repo.id}) in ${indexDuration_s}s`);
```
Contributor

@brendan-kellam commented Jan 28, 2026


I think this is a fair workaround and we should probably have a mechanism for cleaning up these files, but it doesn't really address the root cause for why these files are being created in the first place.
