rev-list: use merge-base --independent algorithm when possible #2082
derrickstolee wants to merge 3 commits into gitgitgadget:master from
Add a test that verifies that the 'git rev-list --maximal-only' option produces the same set of commits as 'git merge-base --independent'. This equivalence was noted when the feature was first created, but we are about to update the implementation to use a common algorithm in this case, where the user intention is identical.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
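To make the claimed equivalence concrete, here is a minimal C sketch on a hypothetical six-commit toy graph. It does not use Git's data structures, and every name in it is hypothetical. walk_maximal() follows the walk-based approach that the current --maximal-only implementation is described as using later in this series: visit everything reachable from the tips, flag each visited commit's parents (in the spirit of CHILD_VISITED), and keep the unflagged commits. independent_pairwise() keeps each tip that no other tip can reach, which is what 'git merge-base --independent' reports. It is a naive reachability check and is not Git's remove_redundant().

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_COMMITS 16
#define MAX_PARENTS 2

/*
 * Hypothetical toy history; parents[i] lists commit i's parents, -1 = none.
 * 3 forks from 1, 4 merges 2 and 3, and 5 builds on 3.
 */
static const int parents[][MAX_PARENTS] = {
	{ -1, -1 },	/* 0: root */
	{  0, -1 },	/* 1 */
	{  1, -1 },	/* 2 */
	{  1, -1 },	/* 3 */
	{  2,  3 },	/* 4: merge of 2 and 3 */
	{  3, -1 },	/* 5 */
};
#define NR_COMMITS ((int)(sizeof(parents) / sizeof(parents[0])))

/* Set seen[c] for every commit c reachable from start, start included. */
static void mark_reachable(int start, bool *seen)
{
	int stack[MAX_COMMITS];
	int top = 0;

	seen[start] = true;
	stack[top++] = start;
	while (top > 0) {
		int c = stack[--top];

		for (int p = 0; p < MAX_PARENTS; p++) {
			int parent = parents[c][p];

			if (parent >= 0 && !seen[parent]) {
				/* each commit is pushed at most once */
				assert(top < MAX_COMMITS);
				seen[parent] = true;
				stack[top++] = parent;
			}
		}
	}
}

/* Walk-based: keep visited commits that no visited commit lists as a parent. */
static void walk_maximal(const int *tips, int nr, bool *result)
{
	bool visited[MAX_COMMITS] = { false };
	bool child_visited[MAX_COMMITS] = { false };

	for (int i = 0; i < nr; i++)
		mark_reachable(tips[i], visited);
	for (int c = 0; c < NR_COMMITS; c++) {
		if (!visited[c])
			continue;
		for (int p = 0; p < MAX_PARENTS; p++)
			if (parents[c][p] >= 0)
				child_visited[parents[c][p]] = true;
	}
	for (int c = 0; c < NR_COMMITS; c++)
		result[c] = visited[c] && !child_visited[c];
}

/*
 * Pairwise: keep a tip only if no other tip reaches it (naive, one walk
 * per pair). The caller must zero-initialize result.
 */
static void independent_pairwise(const int *tips, int nr, bool *result)
{
	for (int i = 0; i < nr; i++) {
		bool redundant = false;

		for (int j = 0; j < nr && !redundant; j++) {
			bool seen[MAX_COMMITS] = { false };

			if (tips[j] == tips[i])
				continue;
			mark_reachable(tips[j], seen);
			redundant = seen[tips[i]];
		}
		if (!redundant)
			result[tips[i]] = true;
	}
}
```

With tips {4, 5, 2, 0}, both functions mark exactly commits 4 and 5: commit 2 is reachable from the merge 4, and the root 0 is reachable from everything. In this simplified model the two agree in general, because every walked commit is reachable from some tip. Only tips can escape the parent flag, and a tip escapes it exactly when no other tip reaches it.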
Add a performance test that compares 'git rev-list --maximal-only' against 'git merge-base --independent'. These two commands ask essentially the same question, but the rev-list implementation is more generic and hence slower. These performance tests demonstrate the current state and will also be used to show the equivalence after the upcoming change.

We also add a case with '--since' to force the generic walk logic for rev-list, even after that future change makes a simple walk use the merge-base algorithm.

When run on my copy of git.git, I see these results:

    Test                                      HEAD
    ----------------------------------------------
    6011.2: merge-base --independent          0.03
    6011.3: rev-list --maximal-only           0.06
    6011.4: rev-list --maximal-only --since   0.06

These numbers are low, but the --independent calculation is interesting here because the repository has many local branches that are actually independent. Running the same test on a fresh clone of the Linux kernel repository shows a larger difference between the algorithms, especially because the --independent algorithm is extremely fast when no independent references are selected:

    Test                                      HEAD
    ----------------------------------------------
    6011.2: merge-base --independent          0.00
    6011.3: rev-list --maximal-only           0.70
    6011.4: rev-list --maximal-only --since   0.70

Signed-off-by: Derrick Stolee <stolee@gmail.com>
The 'git rev-list --maximal-only' option filters the output to only independent commits. A commit is independent if it is not reachable from the other listed commits. Currently this is implemented by doing a full revision walk and marking parents with CHILD_VISITED to skip non-maximal commits.

The 'git merge-base --independent' command computes the same result using reduce_heads(), which uses the more efficient remove_redundant() algorithm. This is significantly faster because it avoids walking the entire commit graph.

Add a fast path in rev-list that detects when --maximal-only is the only interesting option and all input commits are positive (no revision ranges). In this case, use reduce_heads() directly instead of doing a full revision walk.

To preserve the rest of the output filtering, this computation is done opportunistically in a new prepare_maximal_independent() method when possible. If successful, it populates revs->commits with the list of independent commits and sets revs->no_walk to prevent any other walk from occurring. This lets the existing output code inside traverse_commit_list_filtered() handle any custom output. A new test is added to demonstrate that this output is preserved.

The fast path is only used when no other flags complicate the walk or output format: no UNINTERESTING commits, no limiting options (max-count, age filters, path filters, grep filters), no output formatting beyond plain OIDs, and no object listing flags.
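The eligibility rules listed above amount to a single predicate over the walk options and the inputs. The following is a minimal sketch of that predicate. It assumes a hypothetical, flattened struct walk_opts in place of Git's real struct rev_info. Neither the actual fields of rev_info nor the body of prepare_maximal_independent() are shown in this series description, so none of these names should be read as Git's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the relevant rev_info state. */
struct walk_opts {
	bool maximal_only;	/* --maximal-only was given */
	int max_count;		/* -1 when there is no --max-count limit */
	bool has_age_filter;	/* --since, --until and similar */
	bool has_path_filter;	/* pathspec limiting */
	bool has_grep_filter;	/* --grep, --author and similar */
	bool custom_format;	/* any output beyond plain OIDs */
	bool list_objects;	/* --objects and similar */
};

/* Hypothetical description of one command-line revision. */
struct input_commit {
	const char *name;
	bool negative;		/* e.g. ^ref, or the left side of A..B */
};

/* Return true when the reduce_heads()-style fast path would be safe. */
static bool can_use_independent_fast_path(const struct walk_opts *o,
					  const struct input_commit *in,
					  size_t nr)
{
	assert(o && (in || nr == 0));

	if (!o->maximal_only)
		return false;
	/* Limiting options change which commits are eligible for output. */
	if (o->max_count >= 0 || o->has_age_filter ||
	    o->has_path_filter || o->has_grep_filter)
		return false;
	if (o->custom_format || o->list_objects)
		return false;
	/* A negative (UNINTERESTING) input means a range, not a set of tips. */
	for (size_t i = 0; i < nr; i++)
		if (in[i].negative)
			return false;
	return true;
}
```

Adding '--since' sets has_age_filter in this model, so the predicate rejects the fast path. That matches why the p6011.4 case keeps exercising the generic walk.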
Running the p6011 performance test on my copy of git.git, I see the following improvement with this change:

    Test                                      HEAD~1   HEAD
    ------------------------------------------------------------
    6011.2: merge-base --independent          0.03     0.03 +0.0%
    6011.3: rev-list --maximal-only           0.06     0.03 -50.0%
    6011.4: rev-list --maximal-only --since   0.06     0.06 +0.0%

And for a fresh clone of the Linux kernel repository, I see:

    Test                                      HEAD~1   HEAD
    ------------------------------------------------------------
    6011.2: merge-base --independent          0.00     0.00 =
    6011.3: rev-list --maximal-only           0.70     0.00 -100.0%
    6011.4: rev-list --maximal-only --since   0.70     0.70 +0.0%

In both cases, the performance of the fast path now matches that of 'git merge-base --independent', as expected.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
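The percentage column is the relative change from HEAD~1 to HEAD, that is 100 * (new - old) / old. The following small C sketch recomputes that column from the displayed (rounded) timings and checks it against the reported cells. The zero-baseline handling is an assumption about how the perf aggregation reports that case: it prints "=" when both timings are 0.00 and "+inf" when only the old one is zero. Some special case is needed because the formula divides by zero there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Relative change from 'before' to 'after', formatted like the tables
 * above. Assumed zero-baseline convention: "=" when both are zero,
 * "+inf" when only 'before' is zero.
 */
static void format_change(double before, double after, char *buf, size_t len)
{
	assert(buf && len > 0);
	if (before > 0)
		snprintf(buf, len, "%+.1f%%", 100.0 * (after - before) / before);
	else if (after == 0)
		snprintf(buf, len, "=");
	else
		snprintf(buf, len, "+inf");
}

/* Check one reported cell against the recomputed value. */
static bool change_is(double before, double after, const char *reported)
{
	char buf[32];

	format_change(before, after, buf, sizeof(buf));
	return strcmp(buf, reported) == 0;
}
```

For example, change_is(0.06, 0.03, "-50.0%") and change_is(0.70, 0.00, "-100.0%") both hold, so every row in the two tables is consistent with the displayed timings.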
/submit

Submitted as pull.2082.git.1775482048.gitgitgadget@gmail.com

This patch series was integrated into seen via git@41520e7.

This branch is now known as

This patch series was integrated into seen via git@ba1589f.

This patch series was integrated into next via git@7a70817.

This patch series was integrated into seen via git@919b077.

There was a status update in the "Cooking" section about the branch
"git rev-list --maximal-only" has been optimized by borrowing the logic used by "git show-branch --independent", which computes the same kind of information much more efficiently.
Will merge to 'master'.
source: <pull.2082.git.1775482048.gitgitgadget@gmail.com>

This patch series was integrated into seen via git@42f852b.

This patch series was integrated into seen via git@c343f9c.

This patch series was integrated into master via git@c343f9c.

This patch series was integrated into next via git@c343f9c.

Congratulations! 🎉 Your patch series was merged into upstream via c343f9c. Note: this pull request will show as "Closed" rather than "Merged" because the merge happened in the upstream repository, not on GitHub. This is expected: your contribution has been accepted!
The --maximal-only option was added to 'git rev-list' in b4e8f60 (revision: add --maximal-only option, 2026-01-22), and the discussion [1] noted that 'git rev-list --maximal-only <refs>' acts the same as 'git merge-base --independent <refs>', assuming that no other walk modifiers are provided to the revision walk. Under those assumptions, the merge-base algorithm can be faster if the refs share most of their history.

[1] https://lore.kernel.org/git/pull.2032.v2.git.1769097958549.gitgitgadget@gmail.com/
This series updates the revision walk to use the merge-base algorithm when possible. It checks the rev_info struct for options that would change the walk and also looks for negative references. If none of these appear, the merge-base algorithm is used instead.
The series is broken into three patches that could theoretically be squashed into a single patch.
Thanks,
-Stolee
cc: gitster@pobox.com
cc: j6t@kdbg.org