Skip to content

Conversation

@ruoliu2
Copy link

@ruoliu2 ruoliu2 commented Jan 17, 2026

Summary

  • Adds index_filter parameter support for the ES-compatible _field_caps endpoint
  • Parses ES Query DSL (term, bool, range, match, etc.) and converts to internal QueryAst
  • Passes query through to leaf nodes for future split-level filtering
  • Adds unit tests and REST API integration tests

Test plan

  • Unit tests for index_filter parsing (6 tests)
  • REST API integration tests for various query types
  • Verify invalid queries return 400 Bad Request
  • Existing field_caps tests still pass

Notes

This implementation accepts and parses the index_filter parameter for API compatibility with Elasticsearch. Full split-level document filtering (executing queries to prune splits with no matching documents) can be added as a follow-up enhancement.

Closes #5693

🤖 Generated with Claude Code

@ruoliu2 ruoliu2 marked this pull request as draft January 17, 2026 09:41
@ruoliu2 ruoliu2 marked this pull request as ready for review January 17, 2026 09:56
@ruoliu2 ruoliu2 force-pushed the feature/5693-index-filter-field-caps-api branch 3 times, most recently from 1753d4c to 60cde2c Compare January 17, 2026 11:13
Implements index_filter parameter support for the ES-compatible
_field_caps endpoint, allowing users to filter field capabilities
based on document queries.

Changes:
- Add query_ast field to ListFieldsRequest and LeafListFieldsRequest protos
- Parse index_filter from ES Query DSL and convert to QueryAst
- Pass query_ast through to leaf nodes for future filtering support
- Add unit tests for index_filter parsing
- Add REST API integration tests

Note: This implementation accepts and parses the index_filter parameter
for API compatibility. Full split-level document filtering will be
added as a follow-up enhancement.

Closes quickwit-oss#5693

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: ruo <ruoliu.dev@gmail.com>
@ruoliu2 ruoliu2 force-pushed the feature/5693-index-filter-field-caps-api branch from 60cde2c to 7173546 Compare January 17, 2026 11:16
search_body: FieldCapabilityRequestBody,
) -> Result<quickwit_proto::search::ListFieldsRequest, ElasticsearchError> {
// Parse index_filter if provided
let query_ast_json: Option<String> = if search_body.index_filter.is_null()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make a well-named function with the prototype, to implement this block of code?

(serde_json::Value) -> anyhow::Result<Option<QueryAst>

- quickwit
endpoint: fieldcaps/_field_caps?fields=name
json_body:
index_filter: {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kudos for checking these edge cases too!

// Full query-based split filtering is a TODO for future enhancement
if query_ast.is_some() {
tracing::debug!(
"index_filter provided for field capabilities, but split-level filtering is not yet \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not merge this if it is not implemented.

# Test _field_caps API with index_filter (term query)
method: [POST]
engines:
- quickwit
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you testing agains QW only? If this PR's goal is to increase the compatibility with ES it should be tested agains ES as well.

Comment on lines 314 to 319
/// If `query_ast` is provided (from ES-compatible `index_filter`), it indicates that
/// the caller wants to filter fields based on documents matching the query.
///
/// Note: Full query-based filtering is not yet implemented. When `query_ast` is provided,
/// fields from all documents are currently returned. This matches the ES behavior of
/// returning field capabilities for indices where at least one document matches.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ES spec sais the following:

Filter indices if the provided query rewrites to match_none on every shard.
IMPORTANT: The filtering is done on a best-effort basis, it uses index statistics and mappings to rewrite queries to match_none instead of fully running the request. For instance a range query over a date field can rewrite to match_none if all documents within a shard (including deleted documents) are outside of the provided range. However, not all queries can rewrite to match_none so this API may return an index even if the provided filter matches no document.

Given that in QW we don't have the necessary info to know much about the fields in the metadata, we need to open the splits. I think nevertheless that we need to aim for a lightweight query execution, even if that returns too many fields.

Comment on lines 201 to 217
let query_ast: QueryAst = elastic_query_dsl.try_into().map_err(|err: anyhow::Error| {
ElasticsearchError::new(
StatusCode::BAD_REQUEST,
format!("Failed to convert index_filter: {err}"),
None,
)
})?;

Some(serde_json::to_string(&query_ast).expect("QueryAst should be JSON serializable"))
};

Ok(quickwit_proto::search::ListFieldsRequest {
index_id_patterns,
fields: search_params.fields.unwrap_or_default(),
start_timestamp: search_params.start_timestamp,
end_timestamp: search_params.end_timestamp,
query_ast: query_ast_json,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to leverage the two metastore pruning capabilities:

  • refine start and end time using the query ast
  • extract tags from query ast and provide them to split listing call

…_filter

Address PR review comments for index_filter support in _field_caps API:

- Extract `parse_index_filter_to_query_ast()` function with clean prototype
- Implement split-level filtering via `split_matches_query()` using
  lightweight `query.count()` execution (no document materialization)
- Add proper async handling with ByteRangeCache, warmup(), and
  run_cpu_intensive() for Quickwit's async-only storage
- Add metastore-level pruning:
  - Tag extraction via `extract_tags_from_query()`
  - Time range extraction via `refine_start_end_timestamp_from_ast()`
- Build DocMapper only when query_ast is provided (no overhead for
  common path without index_filter)
- Fix REST API tests: use `json:` key (not `json_body:`), use lowercase
  term values to match tokenizer behavior
- Update tests to run against both quickwit and elasticsearch engines

Two-level filtering now implemented:
1. Metastore level: tags + time range from query AST
2. Split level: lightweight query execution for accurate filtering

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: ruo <ruoliu.dev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

index_filter support for field capabilities API elastic search compatible endpoints

3 participants