Skip to content

fix(datafusion): keep hot runs in one session#833

Open
kumarUjjawal wants to merge 1 commit intoClickHouse:mainfrom
kumarUjjawal:fix/datafusion_hot_run
Open

fix(datafusion): keep hot runs in one session#833
kumarUjjawal wants to merge 1 commit intoClickHouse:mainfrom
kumarUjjawal:fix/datafusion_hot_run

Conversation

@kumarUjjawal
Copy link
Copy Markdown

Summary

This fixes the partitioned DataFusion runner so hot runs are really hot. See apache/datafusion#21696

Before this change, the script started a new datafusion-cli process for every try. That made each try use a cold process, so tries 2 and 3 did not keep the warmed metadata cache.

What changed

  • drop OS cache once before each query
  • build one SQL file with create.sql and the same query 3 times
  • run that file in one datafusion-cli session
  • skip the setup Elapsed lines from create.sql
  • keep failed later tries as null instead of mixing in setup timings

This follows the same main idea used by the DuckDB partitioned runner: one process per query, with repeated runs inside that process.

Thank you for review 🙏

Thank You for Your Contribution!

We appreciate your effort and contribution to the project. To ensure that your Pull Request (PR) adheres to our guidelines, please ensure to review the rules mentioned in our contribution guidelines:

ClickHouse/ClickBench Contribution Rules

Thank you for your attention to these details and for helping us maintain the quality and integrity of the project.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 20, 2026

CLA assistant check
All committers have signed the CLA.

@kumarUjjawal
Copy link
Copy Markdown
Author

@qoega If I could get your eyes on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants