
Participants with local on-disk storage, but without OS page cache flush #774

@puzpuzpuz


According to the benchmark rules, "if it's a database with local on-disk storage, the first query should be run after dropping the page cache."

The following local disk-based participants do not drop the OS page cache before the first run of each query. This gives them an unfair advantage: data that should be read from disk may instead be served from the OS page cache, warmed by data loading or by earlier runs.

The corresponding scripts should be fixed so that all participants run under the same conditions.

For reference, the correct way to flush the page cache is:

sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
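A minimal sketch of how a run script might apply this, assuming a POSIX shell script that reads one query per line from standard input and runs each query `TRIES` times. `TRIES`, `run_query`, and `benchmark_queries` are hypothetical names, not taken from any participant's actual script. The cache is dropped once per query, right before its first (cold) run, so the remaining runs still measure the warm cache.

```shell
#!/bin/sh
# Sketch only: TRIES, run_query, and benchmark_queries are hypothetical
# names, not taken from any participant's actual benchmark script.

TRIES=3  # assumed number of runs per query

# Write dirty pages to disk, then drop the page cache, dentries, and inodes.
drop_caches() {
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
}

# Placeholder: replace with the participant's real client invocation.
run_query() {
    echo "running: $1"
}

# Read one query per line from stdin. Drop the caches once before the
# first (cold) run of each query; the remaining runs measure the warm cache.
benchmark_queries() {
    while IFS= read -r query; do
        [ -z "$query" ] && continue  # skip blank lines
        drop_caches
        i=1
        while [ "$i" -le "$TRIES" ]; do
            # Redirect stdin so a real client cannot consume the query list.
            run_query "$query" < /dev/null
            i=$((i + 1))
        done
    done
}
```

Usage would look like `benchmark_queries < queries.sql`, where `queries.sql` is a hypothetical file with one query per line and `run_query` is swapped for the participant's own client command.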

List

Note that the list may be incomplete.

  • chdb-dataframe | Reads parquet locally via Python (see #779, "chdb-dataframe: clear page cache between queries")
  • clickhouse-datalake | Uses clickhouse local, no OS cache flush
  • clickhouse-datalake-partitioned | Uses clickhouse local, no OS cache flush
  • duckdb-dataframe | Reads parquet locally via Python
  • elasticsearch | Clears ES query cache only, not OS page cache
  • hydra | PostgreSQL-based, no cache flush
  • locustdb | Disk-based (RocksDB), no cache flush (benchmark broken)
  • mongodb | Local installation, no cache flush
  • pandas | Reads parquet locally via Python
  • polars | Reads parquet locally via Python
  • polars-dataframe | Reads parquet locally via Python
  • tembo-olap | PostgreSQL-based, no cache flush
