-
Notifications
You must be signed in to change notification settings - Fork 55
Query regex #775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 3.x
Are you sure you want to change the base?
Query regex #775
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. 🗂️ Base branches to auto review (2)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the 📝 WalkthroughWalkthroughThe changes introduce regex query support and trigram index capabilities across the database adapter layer. New abstract methods in the base Adapter class expose regex (PCRE/POSIX) and trigram support flags, each adapter implements these with appropriate return values, and Query/Database classes add regex queries and trigram indexes with corresponding validation logic. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…hing with existing collections
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Database/Query.php (1)
254-301: AddTYPE_REGEXtoisMethod()to allow parsing regex queries
TYPE_REGEXis missing fromisMethod(), soparse()/parseQuery()will currently reject queries whosemethodis"regex"even though the type is otherwise supported. That makes the JSON/array parsing path inconsistent with the fluentQuery::regex()helper.Consider adding it to the
match:Proposed fix
public static function isMethod(string $value): bool { return match ($value) { self::TYPE_EQUAL, @@ self::TYPE_SELECT, self::TYPE_VECTOR_DOT, self::TYPE_VECTOR_COSINE, - self::TYPE_VECTOR_EUCLIDEAN => true, + self::TYPE_VECTOR_EUCLIDEAN, + self::TYPE_REGEX => true, default => false, }; }
🧹 Nitpick comments (6)
src/Database/Adapter/Mongo.php (1)
2744-2762: Regex/trigram capability flags for Mongo are consistent (minor naming nit)Declaring:
getSupportForPRCERegex(): truegetSupportForPOSIXRegex(): falsegetSupportForTrigramIndex(): falsematches Mongo’s actual capabilities and aligns with the new abstract methods in
Adapterand their use inPooland other adapters. The only nit is the “PRCE” spelling, which likely wants to be “PCRE”; since this is a newly introduced API, now would be the easiest time to correct the name acrossAdapter,Pool, and all adapters if you want to fix it.Also applies to: 3246-3249
tests/unit/Validator/IndexTest.php (1)
481-598: Trigram index validation test is thorough; consider using named args for clarityThe new
testTrigramIndexValidation()nicely exercises all key paths incheckTrigramIndexes():
- happy paths (single and multi string attributes),
- non‑string and mixed‑type failures,
- orders/lengths disallowed,
- and the “feature disabled” case.
Given the growing boolean flag list on
Index::__construct, the first validator uses a namedsupportForTrigramIndexes: trueargument, which is clear. For\$validatorNoSupport, you currently rely on a long positional boolean chain; mirroring the named argument style there (e.g.,supportForTrigramIndexes: false) would reduce the risk of future parameter‑order mistakes.src/Database/Adapter.php (1)
1446-1478: Consider fixing thegetSupportForPRCERegexname before the API solidifiesThe new capability surface looks good:
getSupportForTrigramIndex()as an abstract flag.- Separate PCRE vs POSIX regex flags plus a
getSupportForRegex()helper.However, the PCRE method is consistently named
getSupportForPRCERegex()(letters swapped) while the docblock correctly refers to “PCRE (Perl Compatible Regular Expressions)”. Because this is an abstract method on the coreAdaptertype, the typo becomes part of the public API and must be implemented everywhere.Before this spreads further, I’d strongly recommend renaming it to
getSupportForPCRERegex()and updating all implementations and call sites in this PR; doing so later will be a breaking change.To scope the rename, you can search for the current spelling:
#!/bin/bash rg -n "getSupportForPRCERegex" -C2tests/e2e/Adapter/Scopes/DocumentTests.php (1)
6552-7034:testFindRegexis thorough; consider a few small cleanupsThe test logic and coverage across engines/adapters look solid and consistent with the new
Query::regexsemantics. A few nits you might want to tidy up:
- Several
$patternlocals are assigned but never used (e.g., around Line 6713, 6727, 6741, 6753, 6767). They can be dropped to reduce noise.- The
$verifyRegexQueryhelper always uses PHP’s PCRE engine to validate results. That’s fine for the current pattern set (simple anchors,.*,$, colon, etc.), but if you later add more POSIX‑specific constructs, it may be worth documenting that assumption near the helper to avoid future mismatches.No functional issues; these are just clarity/maintenance tweaks.
src/Database/Database.php (1)
3671-3679: Trigram index creation gating and error messaging are consistent with existing patterns
- The new
case self::INDEX_TRIGRAMbranch correctly guards index creation withgetSupportForTrigramIndex()and throws a clear exception when unsupported.- The “Unknown index type” error text now includes
Database::INDEX_TRIGRAM, keeping user-facing guidance accurate.- Adding the trigram support flag into the
IndexValidatorcall increateIndex()keeps validation behavior aligned across create/alter flows.If you find yourself adding more index types later, consider building the allowed-type list for the exception message from the constants instead of hardcoding the concatenated string, but that’s optional.
Also applies to: 3721-3735
tests/e2e/Adapter/Scopes/IndexTests.php (1)
176-178: Consider documenting or using named parameter for the hardcodedfalsevalue.The hardcoded
falseon line 177 is not self-documenting. While PHP 8.0+ supports named arguments, at minimum a trailing comment would clarify what this boolean controls, improving maintainability.$database->getAdapter()->getSupportForMultipleFulltextIndexes(), $database->getAdapter()->getSupportForIdenticalIndexes(), - false, + false, // $supportForIndexAlias (or appropriate param name) $database->getAdapter()->getSupportForTrigramIndex()
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
src/Database/Adapter.phpsrc/Database/Adapter/MariaDB.phpsrc/Database/Adapter/Mongo.phpsrc/Database/Adapter/Pool.phpsrc/Database/Adapter/Postgres.phpsrc/Database/Adapter/SQL.phpsrc/Database/Adapter/SQLite.phpsrc/Database/Database.phpsrc/Database/Query.phpsrc/Database/Validator/Index.phpsrc/Database/Validator/Queries.phpsrc/Database/Validator/Query/Filter.phptests/e2e/Adapter/Scopes/DocumentTests.phptests/e2e/Adapter/Scopes/IndexTests.phptests/unit/Validator/IndexTest.php
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: ArnabChatterjee20k
Repo: utopia-php/database PR: 613
File: src/Database/Adapter/Postgres.php:1254-1319
Timestamp: 2025-07-01T11:31:37.438Z
Learning: In PostgreSQL adapter methods like getUpsertStatement, complexity for database-specific SQL generation is acceptable when the main business logic is properly separated in the parent SQL adapter class, following the adapter pattern where each database adapter handles its own SQL syntax requirements.
📚 Learning: 2025-10-16T08:48:36.715Z
Learnt from: fogelito
Repo: utopia-php/database PR: 739
File: src/Database/Adapter/Postgres.php:154-158
Timestamp: 2025-10-16T08:48:36.715Z
Learning: For the utopia-php/database repository, no migration scripts are needed for the collation change from utf8_ci to utf8_ci_ai in the Postgres adapter because there is no existing production data.
Applied to files:
src/Database/Adapter/Postgres.php
📚 Learning: 2025-10-03T02:04:17.803Z
Learnt from: abnegate
Repo: utopia-php/database PR: 721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.
Applied to files:
tests/e2e/Adapter/Scopes/IndexTests.phptests/unit/Validator/IndexTest.phptests/e2e/Adapter/Scopes/DocumentTests.phpsrc/Database/Database.php
📚 Learning: 2025-10-03T01:50:11.943Z
Learnt from: abnegate
Repo: utopia-php/database PR: 721
File: tests/e2e/Adapter/Scopes/AttributeTests.php:1329-1334
Timestamp: 2025-10-03T01:50:11.943Z
Learning: MongoDB has a 1024kb (1,048,576 bytes) limit for index entries. The MongoDB adapter's getMaxIndexLength() method should return this limit rather than 0.
Applied to files:
tests/e2e/Adapter/Scopes/IndexTests.php
📚 Learning: 2025-10-29T12:27:57.071Z
Learnt from: ArnabChatterjee20k
Repo: utopia-php/database PR: 747
File: src/Database/Adapter/Mongo.php:1449-1453
Timestamp: 2025-10-29T12:27:57.071Z
Learning: In src/Database/Adapter/Mongo.php, when getSupportForAttributes() returns false (schemaless mode), the updateDocument method intentionally uses a raw document without $set operator for replacement-style updates, as confirmed by the repository maintainer ArnabChatterjee20k.
Applied to files:
src/Database/Adapter/Mongo.php
📚 Learning: 2025-10-16T09:37:33.531Z
Learnt from: fogelito
Repo: utopia-php/database PR: 733
File: src/Database/Adapter/MariaDB.php:1801-1806
Timestamp: 2025-10-16T09:37:33.531Z
Learning: In the MariaDB adapter (src/Database/Adapter/MariaDB.php), only duplicate `_uid` violations should throw `DuplicateException`. All other unique constraint violations, including `PRIMARY` key collisions on the internal `_id` field, should throw `UniqueException`. This is the intended design to distinguish between user-facing document duplicates and internal/user-defined unique constraint violations.
Applied to files:
src/Database/Database.php
🧬 Code graph analysis (10)
src/Database/Adapter/SQLite.php (5)
src/Database/Adapter/Postgres.php (2)
getSupportForPRCERegex(2122-2125)getSupportForPOSIXRegex(2127-2130)src/Database/Adapter.php (2)
getSupportForPRCERegex(1459-1459)getSupportForPOSIXRegex(1467-1467)src/Database/Adapter/MariaDB.php (2)
getSupportForPRCERegex(2239-2242)getSupportForPOSIXRegex(2244-2247)src/Database/Adapter/Mongo.php (2)
getSupportForPRCERegex(2749-2752)getSupportForPOSIXRegex(2759-2762)src/Database/Adapter/Pool.php (2)
getSupportForPRCERegex(368-371)getSupportForPOSIXRegex(373-376)
src/Database/Adapter/SQL.php (2)
src/Database/Query.php (1)
Query(8-1195)src/Database/Adapter/Postgres.php (1)
getRegexOperator(2148-2151)
src/Database/Adapter.php (5)
src/Database/Adapter/Postgres.php (3)
getSupportForTrigramIndex(2132-2135)getSupportForPRCERegex(2122-2125)getSupportForPOSIXRegex(2127-2130)src/Database/Adapter/MariaDB.php (3)
getSupportForTrigramIndex(2234-2237)getSupportForPRCERegex(2239-2242)getSupportForPOSIXRegex(2244-2247)src/Database/Adapter/Mongo.php (3)
getSupportForTrigramIndex(3246-3249)getSupportForPRCERegex(2749-2752)getSupportForPOSIXRegex(2759-2762)src/Database/Adapter/Pool.php (3)
getSupportForTrigramIndex(378-381)getSupportForPRCERegex(368-371)getSupportForPOSIXRegex(373-376)src/Database/Adapter/SQLite.php (2)
getSupportForPRCERegex(1886-1889)getSupportForPOSIXRegex(1897-1900)
src/Database/Adapter/Pool.php (6)
src/Database/Adapter/Postgres.php (3)
getSupportForPRCERegex(2122-2125)getSupportForPOSIXRegex(2127-2130)getSupportForTrigramIndex(2132-2135)src/Database/Adapter.php (3)
getSupportForPRCERegex(1459-1459)getSupportForPOSIXRegex(1467-1467)getSupportForTrigramIndex(1451-1451)src/Database/Adapter/MariaDB.php (3)
getSupportForPRCERegex(2239-2242)getSupportForPOSIXRegex(2244-2247)getSupportForTrigramIndex(2234-2237)src/Database/Adapter/Mongo.php (3)
getSupportForPRCERegex(2749-2752)getSupportForPOSIXRegex(2759-2762)getSupportForTrigramIndex(3246-3249)src/Database/Adapter/SQLite.php (2)
getSupportForPRCERegex(1886-1889)getSupportForPOSIXRegex(1897-1900)src/Database/Mirror.php (1)
delegate(88-103)
src/Database/Adapter/Postgres.php (3)
src/Database/Adapter.php (3)
getSupportForPRCERegex(1459-1459)getSupportForPOSIXRegex(1467-1467)getSupportForTrigramIndex(1451-1451)src/Database/Database.php (1)
Database(37-8790)src/Database/Adapter/SQLite.php (2)
getSupportForPRCERegex(1886-1889)getSupportForPOSIXRegex(1897-1900)
tests/unit/Validator/IndexTest.php (2)
src/Database/Validator/Index.php (2)
Index(10-622)getDescription(76-79)src/Database/Validator/Query/Base.php (1)
getDescription(25-28)
src/Database/Adapter/Mongo.php (6)
src/Database/Query.php (1)
Query(8-1195)src/Database/Adapter/Postgres.php (3)
getSupportForPRCERegex(2122-2125)getSupportForPOSIXRegex(2127-2130)getSupportForTrigramIndex(2132-2135)src/Database/Adapter.php (3)
getSupportForPRCERegex(1459-1459)getSupportForPOSIXRegex(1467-1467)getSupportForTrigramIndex(1451-1451)src/Database/Adapter/MariaDB.php (3)
getSupportForPRCERegex(2239-2242)getSupportForPOSIXRegex(2244-2247)getSupportForTrigramIndex(2234-2237)src/Database/Adapter/Pool.php (3)
getSupportForPRCERegex(368-371)getSupportForPOSIXRegex(373-376)getSupportForTrigramIndex(378-381)src/Database/Adapter/SQLite.php (2)
getSupportForPRCERegex(1886-1889)getSupportForPOSIXRegex(1897-1900)
tests/e2e/Adapter/Scopes/DocumentTests.php (9)
src/Database/Query.php (3)
Query(8-1195)regex(1191-1194)or(798-801)tests/e2e/Adapter/Base.php (1)
getDatabase(46-46)tests/e2e/Adapter/MongoDBTest.php (1)
getDatabase(32-66)src/Database/Adapter/Postgres.php (4)
getSupportForPRCERegex(2122-2125)getSupportForPOSIXRegex(2127-2130)create(139-168)delete(178-186)src/Database/Adapter/MariaDB.php (4)
getSupportForPRCERegex(2239-2242)getSupportForPOSIXRegex(2244-2247)create(31-46)delete(56-67)src/Database/Adapter/Mongo.php (4)
getSupportForPRCERegex(2749-2752)getSupportForPOSIXRegex(2759-2762)create(328-331)delete(388-393)src/Database/Adapter/Pool.php (4)
getSupportForPRCERegex(368-371)getSupportForPOSIXRegex(373-376)create(133-136)delete(148-151)src/Database/Adapter/SQLite.php (4)
getSupportForPRCERegex(1886-1889)getSupportForPOSIXRegex(1897-1900)create(116-119)delete(129-132)src/Database/Validator/Queries/Document.php (1)
Document(10-44)
src/Database/Database.php (5)
src/Database/Adapter/Postgres.php (2)
getSupportForTrigramIndex(2132-2135)getSupportForObject(2222-2225)src/Database/Adapter.php (2)
getSupportForTrigramIndex(1451-1451)getSupportForObject(1080-1080)src/Database/Adapter/MariaDB.php (2)
getSupportForTrigramIndex(2234-2237)getSupportForObject(2138-2141)src/Database/Adapter/Mongo.php (2)
getSupportForTrigramIndex(3246-3249)getSupportForObject(2814-2817)src/Database/Adapter/Pool.php (2)
getSupportForTrigramIndex(378-381)getSupportForObject(603-606)
src/Database/Validator/Queries.php (3)
src/Database/Query.php (1)
Query(8-1195)tests/e2e/Adapter/Base.php (1)
Base(24-76)src/Database/Validator/Query/Base.php (1)
Base(7-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Adapter Tests (Pool)
- GitHub Check: Adapter Tests (Mirror)
🔇 Additional comments (19)
src/Database/Validator/Query/Filter.php (1)
326-344: TreatingTYPE_REGEXas a single‑value filter is consistentAdding
Query::TYPE_REGEXto the group that requires exactly one value and then delegating toisValidAttributeAndValues()aligns it with other single‑value string filters (search,startsWith, etc.) and correctly reuses existing type validation.src/Database/Query.php (1)
29-29:TYPE_REGEXconstant andregex()helper are wired consistentlyExposing
TYPE_REGEX, adding it toTYPES, and providingQuery::regex($attribute, $pattern)(single pattern value) is consistent with the other helper factories and with the filter validator’s expectation of exactly one value.Also applies to: 69-114, 1184-1194
src/Database/Adapter/SQL.php (1)
1771-1799: Regex operator integration into SQL adapters looks correctRouting
Query::TYPE_REGEXthroughgetSQLOperator()and introducing a dedicatedgetRegexOperator()(defaulting to'REGEXP'here and overridden in adapters like Postgres) cleanly extends the operator surface without affecting existing conditions or bindings.Also applies to: 2289-2295
src/Database/Adapter/Mongo.php (1)
2452-2477:TYPE_REGEXmapping to$regexis correct; be aware it bypassescreateSafeRegex()Mapping
Query::TYPE_REGEXto Mongo’s$regexoperator plugs the new query type cleanly intobuildFilter(), which will now produce:$filter[$attribute]['$regex'] = $pattern;Unlike the higher‑level text helpers (
contains,search,notSearch,not*With), this path does not go throughcreateSafeRegex(), so patterns are taken as raw PCRE. That’s appropriate for a low‑level regex primitive, but it means any safety/validation for user‑supplied patterns must be enforced at a higher layer.From the adapter’s perspective the wiring looks sound.
src/Database/Adapter/SQLite.php (1)
1880-1900: Explicitly declaring SQLite has no built‑in regex support looks correctBoth capability methods returning
falsematch SQLite’s lack of nativeREGEXPwithout user‑defined functions; this is a sensible, conservative default.src/Database/Validator/Queries.php (1)
80-127: RoutingTYPE_REGEXandTYPE_VECTOR_EUCLIDEANthrough filter validators is consistentAdding
Query::TYPE_VECTOR_EUCLIDEANandQuery::TYPE_REGEXto theMETHOD_TYPE_FILTERgroup cleanly integrates them into existing validation without special‑casing.src/Database/Adapter/Pool.php (1)
368-381: Delegation of regex/trigram capability flags is correctly wiredThe three new methods mirror the existing delegation pattern and keep the pool adapter’s capability surface in sync with concrete adapters.
src/Database/Adapter/Postgres.php (3)
155-157: LGTM!The
pg_trgmextension creation follows the established pattern for enabling PostgreSQL extensions alongsidepostgisandvector.
903-931: LGTM - Trigram index SQL generation correctly uses GIN with gin_trgm_ops.The implementation correctly:
- Maps
INDEX_TRIGRAMto'INDEX'for GIN index creation- Applies
gin_trgm_opsoperator class to each attribute for trigram-based similarity searchesNote: Trigram indexes intentionally don't include
_tenantprefix (lines 913-916) since GIN indexes work differently than B-tree indexes for multi-tenant filtering.
2145-2151: LGTM!The
~operator is the correct PostgreSQL POSIX regex match operator for case-sensitive pattern matching, aligning with the PR's documentation that PostgreSQL uses POSIX-style regex.src/Database/Validator/Index.php (3)
47-48: LGTM!The new
$supportForTrigramIndexesparameter follows the established pattern for index type support flags with a sensible default offalse.
142-144: LGTM!The trigram index validation check is correctly integrated into the validation chain.
470-506: LGTM!The
checkTrigramIndexesmethod correctly validates:
- Adapter support for trigram indexes
- String-only attribute type constraint
- No orders or lengths (as these aren't applicable to GIN trigram indexes)
The method intentionally allows multiple attributes (unlike vector/spatial/object indexes), which is appropriate since PostgreSQL GIN indexes can index multiple columns.
src/Database/Database.php (3)
79-88: NewINDEX_TRIGRAMconstant is well-integratedConstant name/value aligns with existing index type conventions and keeps index kinds centrally defined; no issues here.
1631-1646: Propagating trigram support into collection-level index validation looks correctPassing
$this->adapter->getSupportForTrigramIndex()intoIndexValidatorduringcreateCollection()ensures trigram indexes are validated against adapter capabilities at collection creation time, consistent with other feature flags (arrays, spatial, vectors, object, etc.). Argument ordering matches the other call sites, so this looks sound.
2778-2792: Update-attribute path now trigram-aware viaIndexValidatorExtending the
IndexValidatorcall inupdateAttribute()with bothgetSupportForObject()andgetSupportForTrigramIndex()keeps index validation consistent when attribute definitions change, preventing incompatible trigram indexes from slipping through on adapters that don’t support them. The added flags are in the same relative position as in other call sites, which helps avoid ctor-signature drift.tests/e2e/Adapter/Scopes/IndexTests.php (3)
269-271: Same hardcodedfalsepattern as above.Apply the same documentation improvement here for consistency.
652-699: Well-structured trigram index test with proper cleanup.The test correctly:
- Guards with early return when trigram support is unavailable
- Tests CRUD operations for trigram indexes
- Uses
try/finallyto ensure cleanup regardless of test outcome- Verifies index metadata (type, attributes) after creation
701-772: Thorough validation test covering key constraint scenarios.The test comprehensively validates:
- Type constraints (string-only attributes)
- Multi-attribute trigram indexes
- Mixed-type rejection
- Orders/lengths constraints
The use of
assertStringContainsStringfor error message verification is appropriate as it decouples the test from exact wording changes.
|
LGTM let's wait for @abnegate review. |
Query method added
Query::regex
Adapters supporting it
Mongodb, mariadb/mysql, postgres
Mongdb, postgres are case sensitive by default
mysql/mariadb are case insensitive by default
Index added and optimisaton
Mongodb leverage the already present index(https://www.mongodb.com/docs/manual/reference/operator/query/regex/#index-use)
Postgres uses Trigram index
Mysql doesn't use the index or have any external index. It will be always a full table scan
Postgres cases
~ means case sensitive regex match but we can manipulate from the pattern getting passed inside the regex to make it case insenstive.
WHERE username ~ '(?i)param';Optimisation -> using pg_trgm extension -> trigram index
by default pg_trgm is included in the postgres:16 onwards so no need to change the dockerfile
Trigram index -> An index (GIN or GiST) that stores trigrams instead of full strings
word -> author
trigrams ->
"** au**"
"aut"
"uth"
"tho"
"hor"
"**or **"
Difference in working of postgres regex engine
Mongodb and mysql/mariadb are PCRE based
so regex works a bit diffferent
Example -> \bWork\b works as word boundaray in general regex but \b is using a backspace escape in postgres posix based engine. And many more
Summary by CodeRabbit
Release Notes
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.