feat(metadata-db): eager file_metadata deletion during compaction#1407
feat(metadata-db): eager file_metadata deletion during compaction#1407JohnSwan1503 wants to merge 7 commits intomainfrom
Conversation
…ile_metadata Migrations: - Add footer_cache table to store footer bytes separately from file_metadata - Add url column to file_metadata, denormalized from physical_tables metadata-db: - Update insert to use CTE that writes to both file_metadata and footer_cache - Add url parameter to insert/register_file functions - Update get_footer_bytes_by_id to read from footer_cache instead of file_metadata - Update select queries to use denormalized fm.url instead of joining physical_tables dump: - Add url field to ParquetFileWriterOutput struct - Add url parameter to commit_metadata function - Update all call sites in raw_dataset_writer, derived_dataset, and compactor common: - Update register_file call in physical.rs to pass url parameter Signed-off-by: John Swanson <jswanson@edgeandnode.com>
Introduces a new garbage collection pattern where file_metadata entries are eagerly deleted during compaction while preserving footer_cache entries until the Collector runs. This allows immediate query visibility changes while maintaining footer data for in-flight reads. **Migration changes:** - Drop `footer` column from `file_metadata` (now only in `footer_cache`) - Remove FK constraint from `footer_cache` to `file_metadata` - Remove FK constraint from `gc_manifest` to `file_metadata` - Remove CHECK constraint on `gc_manifest.expiration` **Compaction flow:** - `upsert_gc_manifest` now deletes from `file_metadata` via CTE and inserts into `gc_manifest` - File metadata disappears from queries immediately after compaction - Footer cache entries preserved for reads during grace period **Collector changes:** - No longer deletes from `file_metadata` (already done during compaction) - Now deletes from `footer_cache` after gc_manifest entries expire - Now deletes from `gc_manifest` after physical files are removed **Consistency check:** - Remove orphan file deletion logic (now handled by garbage collector) - Orphaned files may be pending GC via `gc_manifest` **Test utilities:** - Add `test-utils` feature flag to `metadata-db` - Add `count_footer_cache_by_location` method for test assertions - Update `sql_dataset_input_batch_size` test to verify new behavior Signed-off-by: John Swanson <jswanson@edgeandnode.com>
Remove duplicate doc comment and `url` parameter in `files::insert` that were accidentally introduced during merge from main. Signed-off-by: John Swanson <jswanson@edgeandnode.com>
|
This is a follow up of #1403 |
…tadata/deletion-logic
There was a problem hiding this comment.
Review Summary
This PR implements a significant architectural change to the garbage collection pattern, introducing eager deletion of file_metadata entries during compaction while preserving footer_cache entries for in-flight reads. The implementation is well-structured and the test coverage is comprehensive.
✅ Strengths
- Clear separation of concerns: The new pattern cleanly separates metadata visibility (immediate deletion from
file_metadata) from data availability (footer_cachepreservation) - Atomic operations: Good use of CTEs to ensure atomic transitions between tables
- Comprehensive test updates: The test clearly validates the new multi-phase lifecycle
- Removal of side effects: The consistency check function no longer performs deletions, improving code clarity
📝 Suggestions for Improvement
- Migration safety: The schema migration drops critical constraints - ensure deployment is carefully coordinated
- Return values: Consider having deletion methods return affected row counts for better observability
- Performance optimization: Use
UNION ALLinstead ofUNIONin the test utility query - Error handling: Consider adding validation for partial deletion scenarios
⚠️ Points to Consider
- The removal of FK constraints means the application must now maintain referential integrity - ensure all code paths handle this correctly
- Silent ignoring of non-existent file IDs in
upsert_gc_manifest- verify this is intended behavior - The collector now has more deletion responsibilities - ensure proper error handling and retry logic
Overall, this is a well-thought-out change that improves the garbage collection architecture. The code is clean and the test coverage demonstrates the new behavior effectively.
| .delete_file_ids(found_file_ids_to_paths.keys()) | ||
| // Delete from footer_cache (file_metadata was already deleted during compaction) | ||
| metadata_db | ||
| .delete_footer_cache(found_file_ids_to_paths.keys()) |
There was a problem hiding this comment.
Consider adding error handling or logging if the number of deleted entries doesn't match expectations. For example, if delete_footer_cache or delete_gc_manifest operations partially succeed, it might be worth logging the discrepancy.
|
|
||
| -- Remove the CHECK constraint on gc_manifest.expiration | ||
| -- This allows inserting entries with any expiration time | ||
| ALTER TABLE gc_manifest DROP CONSTRAINT gc_manifest_expiration_check; |
There was a problem hiding this comment.
Important: This migration makes breaking schema changes by dropping constraints and columns. Ensure you have:
- A rollback plan if needed
- Verified that no running processes depend on these constraints
- Considered the impact on any concurrent operations during deployment
The removal of FK constraints enables the new eager deletion pattern, but make sure to coordinate the deployment carefully.
There was a problem hiding this comment.
#1403 Prepares the metadata db for the schema changes made here in this PR's migrations.
| let sql = indoc::indoc! {r#" | ||
| WITH deleted_files AS ( |
There was a problem hiding this comment.
The CTE pattern here is elegant - atomically deleting from file_metadata and inserting into gc_manifest in a single query. This ensures consistency without needing an explicit transaction.
However, consider what happens if some file IDs don't exist in file_metadata. The deleted_files CTE will only return rows that were actually deleted, so non-existent IDs will be silently ignored. Is this the intended behavior?
| /// Deletes footer cache entries for the given file IDs. | ||
| /// | ||
| /// Called by the Collector when gc_manifest entries have expired | ||
| /// and physical files are ready to be deleted. | ||
| pub async fn delete_footer_cache( | ||
| &self, | ||
| file_ids: impl Iterator<Item = &FileId>, | ||
| ) -> Result<(), Error> { |
There was a problem hiding this comment.
These new deletion methods (delete_footer_cache and delete_gc_manifest) don't return information about how many rows were actually deleted. Consider returning the count of affected rows, which could be useful for validation and debugging:
| /// Deletes footer cache entries for the given file IDs. | |
| /// | |
| /// Called by the Collector when gc_manifest entries have expired | |
| /// and physical files are ready to be deleted. | |
| pub async fn delete_footer_cache( | |
| &self, | |
| file_ids: impl Iterator<Item = &FileId>, | |
| ) -> Result<(), Error> { | |
| pub async fn delete_footer_cache( | |
| &self, | |
| file_ids: impl Iterator<Item = &FileId>, | |
| ) -> Result<u64, Error> { | |
| let sql = "DELETE FROM footer_cache WHERE file_id = ANY($1)"; | |
| let result = sqlx::query(sql) | |
| .bind(file_ids.map(|id| **id).collect::<Vec<_>>()) | |
| .execute(&*self.pool) | |
| .await | |
| .map_err(Error::Database)?; | |
| Ok(result.rows_affected()) | |
| } |
| // 6. After compaction: | ||
| // - file_metadata: only 1 entry (the compacted file) since old files are eagerly deleted | ||
| // - physical files: 5 (4 original + 1 compacted, waiting for garbage collection) | ||
| // - footer_cache: 5 (preserved for all files until garbage collection) | ||
| let physical_file_count_after = test.file_count("sql_stream_ds", "even_blocks").await; | ||
| let file_metadata_count_after = test | ||
| .file_metadata_count("sql_stream_ds", "even_blocks") | ||
| .await; | ||
| let footer_cache_count_after = test | ||
| .footer_cache_count("sql_stream_ds", "even_blocks") | ||
| .await; | ||
|
|
||
| assert_eq!(file_count_after, 5); | ||
| assert_eq!( | ||
| file_metadata_count_after, 1, |
There was a problem hiding this comment.
Excellent test updates! The test now properly validates the new behavior where:
file_metadataentries are eagerly deleted during compactionfooter_cacheentries are preserved until garbage collection- Physical files remain until the collector runs
The assertions clearly document the expected state at each phase of the lifecycle.
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Signed-off-by: JohnSwan1503 <82048481+JohnSwan1503@users.noreply.github.com>
This reverts commit aa913c4. Bad bot! Signed-off-by: John Swanson <jswanson@edgeandnode.com>
Introduces a new garbage collection pattern where file_metadata entries are eagerly deleted during compaction while preserving footer_cache entries until the Collector runs. This allows immediate query visibility changes while maintaining footer data for in-flight reads.
Migration changes:
footercolumn fromfile_metadata(now only infooter_cache)footer_cachetofile_metadatagc_manifesttofile_metadatagc_manifest.expirationCompaction flow:
upsert_gc_manifestnow deletes fromfile_metadatavia CTE and inserts intogc_manifestCollector changes:
file_metadata(already done during compaction)footer_cacheafter gc_manifest entries expiregc_manifestafter physical files are removedConsistency check:
gc_manifestTest utilities:
test-utilsfeature flag tometadata-dbcount_footer_cache_by_locationmethod for test assertionssql_dataset_input_batch_sizetest to verify new behavior