Skip to content

Conversation

@el-yawd
Copy link

@el-yawd el-yawd commented Dec 11, 2025

Description

Related Issues

Closes #

Checklist when merging to main

  • No compiler warnings (if applicable)
  • Code is formatted with rustfmt
  • No useless or dead code (if applicable)
  • Code is easy to understand
  • Doc comments are used for all functions, enums, structs, and fields (where appropriate)
  • All tests pass
  • Performance has not regressed (assuming change was not to fix a bug)
  • Version number has been updated in helix-cli/Cargo.toml and helixdb/Cargo.toml

Additional Notes

Greptile Overview

Greptile Summary

This PR adds comprehensive support for upsert operations (UpsertN, UpsertE, UpsertV) to HelixDB, enabling update-or-insert semantics for nodes, edges, and vectors. The implementation follows the existing patterns for Add operations and includes changes across the grammar, parser, analyzer, code generator, and runtime engine.

Key Changes

  • Grammar: Added UpsertN, UpsertE, and UpsertV syntax matching AddN, AddE, and AddV patterns
  • Core Logic: Implemented upsert operations that check if an entity exists in the iterator, merging properties if found or creating new entities if not
  • Property Handling: Upsert merges new properties with existing ones, preserving unchanged properties and updating modified ones
  • Indexing: Properly maintains secondary indices and BM25 full-text search indices during both update and insert paths
  • Type Safety: Full type validation and error checking in the analyzer matching existing Add operations

Observations

  • The upsert mechanism relies on checking self.inner.next() to determine if an entity exists, but there's no clear documentation or examples showing how users should pass existing entities to trigger the update path vs the insert path
  • Integration tests verify parser compilation but don't test actual runtime upsert behavior (whether updates or inserts actually work correctly)
  • The PR checklist shows "Version number has been updated" is unchecked, but Cargo.toml files show version was bumped to 1.1.4
  • Secondary index update logic is split between handling changed properties (lines 108-147) and new properties (lines 149-172), which could be consolidated for clarity

Recommendations

  1. Add usage documentation and examples demonstrating both update and insert scenarios
  2. Add runtime tests that execute actual upsert operations and verify property merging, indexing, and database state
  3. Consider clarifying the expected usage pattern (e.g., does upsert require a preceding query step like existing <- N<Person>(id) followed by updated <- existing::UpsertN<Person>({...})?)

Important Files Changed

File Analysis

Filename Score Overview
helix-db/src/grammar.pest 5/5 Added grammar rules for UpsertN, UpsertE, and UpsertV operations, mirroring the structure of AddN, AddE, and AddV. Updated step and creation_stmt rules to include upsert operations.
helix-db/src/helix_engine/traversal_core/ops/util/upsert.rs 3/5 Implemented core upsert logic for nodes, edges, and vectors. The implementation checks if an entity exists in the iterator, updates it if found, or creates a new one if not. Handles property merging, secondary indices, and BM25 updates. However, lacks clear mechanism for how existing entities are identified and passed to upsert operations.
helix-db/src/helixc/analyzer/methods/infer_expr_type.rs 4/5 Added type inference logic for UpsertNode, UpsertEdge, and UpsertVector expressions, following the same pattern as Add operations with property validation and type checking.
helix-db/src/helixc/parser/creation_step_parse_methods.rs 5/5 Implemented parser methods for UpsertN, UpsertE, and UpsertV, following the same structure as Add operations. Includes comprehensive unit tests.
helix-db/tests/upsert_integration_test.rs 4/5 Comprehensive integration tests covering basic upsert operations, property handling, and complex scenarios. Tests verify parser compilation but don't test actual runtime upsert behavior (update-or-insert logic).

Sequence Diagram

sequenceDiagram
    participant User
    participant Parser
    participant Analyzer
    participant Generator
    participant Runtime
    participant Storage

    User->>Parser: UpsertN<Person>({name: "Alice"})
    Parser->>Parser: parse_upsert_node()
    Parser->>Analyzer: UpsertNode AST
    
    Analyzer->>Analyzer: Validate node type exists
    Analyzer->>Analyzer: Validate properties & types
    Analyzer->>Generator: Generate UpsertN code
    
    Generator->>Generator: Create upsert_n() call
    Generator->>Runtime: Execute traversal with upsert_n()
    
    Runtime->>Runtime: Check if entity in iterator
    
    alt Entity exists (iterator has item)
        Runtime->>Runtime: Merge properties (old + new)
        Runtime->>Storage: Update secondary indices
        Runtime->>Storage: Update BM25 index
        Runtime->>Storage: Update node in nodes_db
        Runtime-->>User: Return updated node
    else Entity does not exist (iterator empty)
        Runtime->>Runtime: Create new node with UUID
        Runtime->>Storage: Insert secondary indices
        Runtime->>Storage: Insert BM25 index
        Runtime->>Storage: Insert node in nodes_db
        Runtime-->>User: Return new node
    end
Loading

xav-db and others added 5 commits December 11, 2025 14:15
…compilation error handling to helix build (HelixDB#751)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>


This PR makes two key improvements:

**1. Removes static globals from config generation** - Replaces
process-wide `OnceLock` statics (`INTROSPECTION_DATA`,
`SECONDARY_INDICES`) with instance-scoped data flow through
`GeneratedSource`. This architectural change improves thread safety and
eliminates potential issues with concurrent compilation of multiple
instances.

**2. Adds Rust compilation error handling for Docker builds** - When a
Docker build fails due to Rust compilation errors (potentially from a
bug in the Helix code generator), the CLI now detects this, displays a
helpful error message, and offers to open a pre-filled GitHub issue with
diagnostic information.

- Introspection data and secondary indices now flow through
`GeneratedSource` instead of global statics
- New `DockerBuildError` enum distinguishes Rust compilation failures
from other Docker errors
- `Config::fmt_with_schema()` method replaces `Display` impl for code
generation with explicit parameters
- Minor optimization: spinner frames changed from `Vec` to array literal

<details><summary><h3>Important Files Changed</h3></summary>



File Analysis



| Filename | Score | Overview |
|----------|-------|----------|
| helix-db/src/helixc/analyzer/mod.rs | 5/5 | Removed static
`INTROSPECTION_DATA` and `SECONDARY_INDICES` OnceLocks, moving data into
`GeneratedSource` struct for thread-safe, instance-scoped operation. |
| helix-db/src/helixc/generator/mod.rs | 5/5 | Added
`introspection_data` and `secondary_indices` fields to `Source` struct;
Display impl now calls `fmt_with_schema` for proper code generation. |
| helix-db/src/helix_engine/traversal_core/config.rs | 4/5 | Refactored
config formatting - moved Display logic to `fmt_with_schema` method that
takes introspection data as parameters instead of reading from statics.
Contains outdated comment referencing removed `INTROSPECTION_DATA`. |
| helix-cli/src/commands/build.rs | 5/5 | Added Rust compilation error
handling during Docker builds with GitHub issue creation workflow for
user bug reporting. |
| helix-cli/src/docker.rs | 5/5 | Added `DockerBuildError` enum to
distinguish Rust compilation failures from other Docker errors, enabling
targeted error handling. |
| helix-cli/src/utils.rs | 5/5 | Changed spinner frames from `vec![]` to
array literal `[]` - minor performance improvement avoiding heap
allocation. |

</details>


</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant CLI as helix build
    participant Analyzer
    participant Generator
    participant Docker
    participant Browser

    User->>CLI: helix build instance_name
    CLI->>Analyzer: analyze(source)
    Analyzer->>Analyzer: Build introspection_data
    Analyzer->>Analyzer: Collect secondary_indices
    Analyzer-->>CLI: GeneratedSource with embedded data
    CLI->>Generator: fmt_with_schema(introspection_data, secondary_indices)
    Generator-->>CLI: Generated Rust code
    CLI->>Docker: build_image()
    
    alt Rust Compilation Error
        Docker-->>CLI: DockerBuildError::RustCompilation
        CLI->>CLI: handle_docker_rust_compilation_failure()
        CLI->>User: Display error, offer to create issue
        opt User accepts
            CLI->>Browser: Open GitHub issue with diagnostics
        end
    else Success
        Docker-->>CLI: Ok(())
        CLI-->>User: Build successful
    end
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
Needs some cleaning and better error handling tho
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (4)

  1. helix-db/src/helix_engine/traversal_core/ops/util/upsert.rs, line 309 (link)

    style: TODO comment indicates error handling needs improvement - operations continue even after errors occur, wasting resources. Consider early returns or using ? operator to propagate errors immediately.

  2. helix-db/src/helix_engine/traversal_core/ops/util/upsert.rs, line 62-73 (link)

    style: The upsert logic checks self.inner.next() to determine if an entity exists, but there's no clear mechanism or documentation explaining how users should pass an existing entity to the upsert operation. The tests only verify parser compilation, not actual update behavior. Consider adding documentation/examples showing the update path and runtime tests that verify both insert and update cases.

  3. helix-db/tests/upsert_integration_test.rs, line 232-254 (link)

    style: Test only verifies parsing, not runtime behavior. Add runtime tests that: (1) create an entity then upsert it to verify update, (2) upsert a non-existent entity to verify insert, (3) test property merging, (4) verify secondary indices and BM25 updates.

  4. helix-db/src/helix_engine/traversal_core/ops/util/upsert.rs, line 108-147 (link)

    style: When updating secondary indices for changed properties, if old.get(k) returns None, you continue without creating the new index. New properties are handled separately in the diff logic (lines 149-172). Consider consolidating this logic for clarity.

14 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

@xav-db
Copy link
Member

xav-db commented Dec 12, 2025

@el-yawd should add some helixql tests in hql-tests and unit tests

@xav-db xav-db changed the base branch from main to dev December 12, 2025 16:03
@el-yawd el-yawd force-pushed the upserts branch 2 times, most recently from 84d1096 to d42b17c Compare December 13, 2025 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants