Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>, me@ttaylorr.com
Cc: git@vger.kernel.org, dstolee@microsoft.com, gitster@pobox.com,
	peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com
Subject: Re: [PATCH v2 18/24] pack-bitmap-write: build fewer intermediate bitmaps
Date: Mon, 30 Nov 2020 13:41:44 -0500
Message-ID: <ea0c8c5d-6bc3-0dca-4fa1-fb461ed7ccb9@gmail.com> (raw)
In-Reply-To: <20201125014633.951649-1-jonathantanmy@google.com>

On 11/24/2020 8:46 PM, Jonathan Tan wrote:
> I've reviewed the commit message already [1]; now let's look at the code.
> 
> [1] https://lore.kernel.org/git/20201124060738.762751-1-jonathantanmy@google.com/
> 
>>  struct bb_commit {
>>  	struct commit_list *reverse_edges;
>> +	struct bitmap *commit_mask;
>>  	struct bitmap *bitmap;
>> -	unsigned selected:1;
>> +	unsigned selected:1,
>> +		 maximal:1;
> 
> The code itself probably should contain comments about the algorithm,
> but I'm not sure of the best way to do it. (E.g. I would say that
> "maximal" should be "When iteration in bitmap_builder_init() reaches
> this bb_commit, this is true iff none of its descendants has or will
> ever have the exact same commit_mask" - but then when do we explain why
> the commit_mask matters?)

Comments are tricky, as they are likely to go stale. In fact,
this algorithm changes dramatically later in this very series!

How much can we expect a reader to inspect the commit history to
discover the lengthy commit message? The message explains the
algorithm and its many subtleties versus reading comments that may
be too specific to this initial version.

At this point, "maximal" is a property that doesn't mean much
without inspecting the places where we set or check that bit.

>> +		if (c_ent->maximal) {
>> +			if (!c_ent->selected) {
>> +				bitmap_set(c_ent->commit_mask, num_maximal);
>> +				num_maximal++;
>> +			}
>> +
>> +			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
>> +			bb->commits[bb->commits_nr++] = commit;
> 
> So the order of bit assignments in the commit_mask and the order of
> commits in bb->commits are not the same. In the commit_mask, bits are
> first assigned for selected commits and then the rest for commits we
> discover to be maximal. But in bb->commits, the order follows the
> topologically-sorted iteration. This is fine, but might be worth a
> comment (to add to the already big comment burden...)

This one I waffled on a bit. There _is_ a difference between the
bitmask order and this list order. This isn't a problem as long as
no one attempts to use the bitmasks to navigate into this list.

Further, it is entirely possible that the tests to never demonstrate
that difference, so pointing out may help a future developer from
making that mistake. Could be good to add this comment over the
bb->commits definition:

	/*
	 * 'commits' stores the list of maximal commits, in visited
	 * order. This can be different than the bitmask order for
	 * the selected commits.
	 */

>> +# To ensure the logic for "maximal commits" is exercised, make
>> +# the repository a bit more complicated.
>> +#
>> +#    other                         master
>> +#      *                             *
>> +# (99 commits)                  (99 commits)
>> +#      *                             *
>> +#      |\                           /|
>> +#      | * octo-other  octo-master * |
>> +#      |/|\_________  ____________/|\|
>> +#      | \          \/  __________/  |
>> +#      |  | ________/\ /             |
>> +#      *  |/          * merge-right  *
>> +#      | _|__________/ \____________ |
>> +#      |/ |                         \|
>> +# (l1) *  * merge-left               * (r1)
>> +#      | / \________________________ |
>> +#      |/                           \|
>> +# (l2) *                             * (r2)
>> +#       \____________...____________ |
> 
> What does the ... represent? If a certain number of commits, it would be
> clearer to write that there.

The ... are unnecessary and should be ___ instead. Thanks.
 
>> +#                                   \|
>> +#                                    * (base)
> 
> OK - some of the crosses are unclear, but from the bitmask given below,
> I know where the lines should go.
> 
>> +#
>> +# The important part for the maximal commit algorithm is how
>> +# the bitmasks are extended. Assuming starting bit positions
>> +# for master (bit 0) and other (bit 1), and some flexibility
>> +# in the order that merge bases are visited, the bitmasks at
>> +# the end should be:
>> +#
>> +#      master: 1       (maximal, selected)
>> +#       other: 01      (maximal, selected)
>> +# octo-master: 1
>> +#  octo-other: 01
>> +# merge-right: 111     (maximal)
>> +#        (l1): 111
>> +#        (r1): 111
>> +#  merge-left: 1101    (maximal)
>> +#        (l2): 11111   (maximal)
>> +#        (r2): 111101  (maximal)
>> +#      (base): 1111111 (maximal)
> 
> This makes sense. (l1) and (r1) are not maximal because everything that
> can reach merge-right can also reach them.
> 
> [snip]
> 
>>  test_expect_success 'full repack creates bitmaps' '
>> -	git repack -ad &&
>> +	GIT_TRACE2_EVENT_NESTING=4 GIT_TRACE2_EVENT="$(pwd)/trace" \
>> +		git repack -ad &&
>>  	ls .git/objects/pack/ | grep bitmap >output &&
>> -	test_line_count = 1 output
>> +	test_line_count = 1 output &&
>> +	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
>> +	grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace
> 
> From the diagram and bit masks, I see that the important number for
> "maximal" is 7. Could this test be run twice - one without the crosses
> and one with, and we can verify that the difference between the maximal
> commits is 7? As it is, this 111 number is hard to verify.

This number _is_ hard to verify. It is fragile to many behaviors inside
the code of Git, including the selection algorithm and some hard-coded
limits (there's a reason we insert 100 commits on top of each side).
Further, this number changes as the algorithm is modified.

Perhaps the best way to recognize this number is that it changes by adding
5 to the previous number (these are the 5 "newly maximal" commits, since
two are already selected as tips).

This number changes again later, and the difference is justified by
the number of maximal commits dropping by 4.

Thanks,
-Stolee

  reply index

Thread overview: 174+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-11 19:41 [PATCH 00/23] pack-bitmap: bitmap generation improvements Taylor Blau
2020-11-11 19:41 ` [PATCH 01/23] ewah/ewah_bitmap.c: grow buffer past 1 Taylor Blau
2020-11-22 19:36   ` Junio C Hamano
2020-11-23 16:22     ` Taylor Blau
2020-11-24  2:48       ` Jeff King
2020-11-24  2:51         ` Jeff King
2020-12-01 22:56           ` Taylor Blau
2020-11-11 19:41 ` [PATCH 02/23] pack-bitmap: fix header size check Taylor Blau
2020-11-12 17:39   ` Martin Ågren
2020-11-11 19:42 ` [PATCH 03/23] pack-bitmap: bounds-check size of cache extension Taylor Blau
2020-11-12 17:47   ` Martin Ågren
2020-11-13  4:57     ` Jeff King
2020-11-13  5:26       ` Martin Ågren
2020-11-13 21:29       ` Taylor Blau
2020-11-13 21:39         ` Jeff King
2020-11-13 21:49           ` Taylor Blau
2020-11-13 22:11             ` Jeff King
2020-11-11 19:42 ` [PATCH 04/23] t5310: drop size of truncated ewah bitmap Taylor Blau
2020-11-11 19:42 ` [PATCH 05/23] rev-list: die when --test-bitmap detects a mismatch Taylor Blau
2020-11-11 19:42 ` [PATCH 06/23] ewah: factor out bitmap growth Taylor Blau
2020-11-11 19:42 ` [PATCH 07/23] ewah: make bitmap growth less aggressive Taylor Blau
2020-11-22 20:32   ` Junio C Hamano
2020-11-23 16:49     ` Taylor Blau
2020-11-24  3:00       ` Jeff King
2020-11-24 20:11         ` Junio C Hamano
2020-11-11 19:43 ` [PATCH 08/23] ewah: implement bitmap_or() Taylor Blau
2020-11-22 20:34   ` Junio C Hamano
2020-11-23 16:52     ` Taylor Blau
2020-11-11 19:43 ` [PATCH 09/23] ewah: add bitmap_dup() function Taylor Blau
2020-11-11 19:43 ` [PATCH 10/23] pack-bitmap-write: reimplement bitmap writing Taylor Blau
2020-11-11 19:43 ` [PATCH 11/23] pack-bitmap-write: pass ownership of intermediate bitmaps Taylor Blau
2020-11-11 19:43 ` [PATCH 12/23] pack-bitmap-write: fill bitmap with commit history Taylor Blau
2020-11-11 19:43 ` [PATCH 13/23] bitmap: add bitmap_diff_nonzero() Taylor Blau
2020-11-11 19:43 ` [PATCH 14/23] commit: implement commit_list_contains() Taylor Blau
2020-11-11 19:43 ` [PATCH 15/23] t5310: add branch-based checks Taylor Blau
2020-11-11 20:58   ` Derrick Stolee
2020-11-11 21:04     ` Junio C Hamano
2020-11-15 23:26       ` Johannes Schindelin
2020-11-11 19:43 ` [PATCH 16/23] pack-bitmap-write: rename children to reverse_edges Taylor Blau
2020-11-11 19:43 ` [PATCH 17/23] pack-bitmap-write: build fewer intermediate bitmaps Taylor Blau
2020-11-13 22:23   ` SZEDER Gábor
2020-11-13 23:03     ` Jeff King
2020-11-14  6:23       ` Jeff King
2020-11-11 19:43 ` [PATCH 18/23] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Taylor Blau
2020-11-11 19:44 ` [PATCH 19/23] pack-bitmap: factor out 'bitmap_for_commit()' Taylor Blau
2020-11-11 19:44 ` [PATCH 20/23] pack-bitmap: factor out 'add_commit_to_bitmap()' Taylor Blau
2020-11-11 19:44 ` [PATCH 21/23] pack-bitmap-write: use existing bitmaps Taylor Blau
2020-11-11 19:44 ` [PATCH 22/23] pack-bitmap-write: relax unique rewalk condition Taylor Blau
2020-11-11 19:44 ` [PATCH 23/23] pack-bitmap-write: better reuse bitmaps Taylor Blau
2020-11-17 21:46 ` [PATCH v2 00/24] pack-bitmap: bitmap generation improvements Taylor Blau
2020-11-17 21:46   ` [PATCH v2 01/24] ewah/ewah_bitmap.c: grow buffer past 1 Taylor Blau
2020-11-17 21:46   ` [PATCH v2 02/24] pack-bitmap: fix header size check Taylor Blau
2020-11-17 21:46   ` [PATCH v2 03/24] pack-bitmap: bounds-check size of cache extension Taylor Blau
2020-11-17 21:46   ` [PATCH v2 04/24] t5310: drop size of truncated ewah bitmap Taylor Blau
2020-11-17 21:46   ` [PATCH v2 05/24] rev-list: die when --test-bitmap detects a mismatch Taylor Blau
2020-11-17 21:46   ` [PATCH v2 06/24] ewah: factor out bitmap growth Taylor Blau
2020-11-17 21:47   ` [PATCH v2 07/24] ewah: make bitmap growth less aggressive Taylor Blau
2020-11-17 21:47   ` [PATCH v2 08/24] ewah: implement bitmap_or() Taylor Blau
2020-11-17 21:47   ` [PATCH v2 09/24] ewah: add bitmap_dup() function Taylor Blau
2020-11-17 21:47   ` [PATCH v2 10/24] pack-bitmap-write: reimplement bitmap writing Taylor Blau
2020-11-25  0:53     ` Jonathan Tan
2020-11-28 17:27       ` Taylor Blau
2020-11-17 21:47   ` [PATCH v2 11/24] pack-bitmap-write: pass ownership of intermediate bitmaps Taylor Blau
2020-11-25  1:00     ` Jonathan Tan
2020-11-17 21:47   ` [PATCH v2 12/24] pack-bitmap-write: fill bitmap with commit history Taylor Blau
2020-11-22 21:50     ` Junio C Hamano
2020-11-23 14:54       ` Derrick Stolee
2020-11-25  1:14     ` Jonathan Tan
2020-11-28 17:21       ` Taylor Blau
2020-11-30 18:33         ` Jonathan Tan
2020-11-17 21:47   ` [PATCH v2 13/24] bitmap: add bitmap_diff_nonzero() Taylor Blau
2020-11-22 22:01     ` Junio C Hamano
2020-11-23 20:19       ` Taylor Blau
2020-11-17 21:47   ` [PATCH v2 14/24] commit: implement commit_list_contains() Taylor Blau
2020-11-17 21:47   ` [PATCH v2 15/24] t5310: add branch-based checks Taylor Blau
2020-11-25  1:17     ` Jonathan Tan
2020-11-28 17:30       ` Taylor Blau
2020-11-17 21:47   ` [PATCH v2 16/24] pack-bitmap-write: rename children to reverse_edges Taylor Blau
2020-11-17 21:47   ` [PATCH v2 17/24] pack-bitmap.c: check reads more aggressively when loading Taylor Blau
2020-11-17 21:48   ` [PATCH v2 18/24] pack-bitmap-write: build fewer intermediate bitmaps Taylor Blau
2020-11-24  6:07     ` Jonathan Tan
2020-11-25  1:46     ` Jonathan Tan
2020-11-30 18:41       ` Derrick Stolee [this message]
2020-11-17 21:48   ` [PATCH v2 19/24] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Taylor Blau
2020-12-02  7:13     ` Jonathan Tan
2020-11-17 21:48   ` [PATCH v2 20/24] pack-bitmap: factor out 'bitmap_for_commit()' Taylor Blau
2020-12-02  7:17     ` Jonathan Tan
2020-11-17 21:48   ` [PATCH v2 21/24] pack-bitmap: factor out 'add_commit_to_bitmap()' Taylor Blau
2020-12-02  7:20     ` Jonathan Tan
2020-11-17 21:48   ` [PATCH v2 22/24] pack-bitmap-write: use existing bitmaps Taylor Blau
2020-12-02  7:28     ` Jonathan Tan
2020-12-02 16:21       ` Taylor Blau
2020-11-17 21:48   ` [PATCH v2 23/24] pack-bitmap-write: relax unique rewalk condition Taylor Blau
2020-12-02  7:44     ` Jonathan Tan
2020-12-02 16:30       ` Taylor Blau
2020-12-07 18:19         ` Jonathan Tan
2020-12-07 18:43           ` Derrick Stolee
2020-12-07 18:45             ` Derrick Stolee
2020-12-07 18:48           ` Jeff King
2020-11-17 21:48   ` [PATCH v2 24/24] pack-bitmap-write: better reuse bitmaps Taylor Blau
2020-12-02  8:08     ` Jonathan Tan
2020-12-02 16:35       ` Taylor Blau
2020-12-02 18:22         ` Derrick Stolee
2020-12-02 18:25           ` Taylor Blau
2020-12-07 18:26             ` Jonathan Tan
2020-12-07 18:24           ` Jonathan Tan
2020-12-07 19:20             ` Derrick Stolee
2020-11-18 18:32   ` [PATCH v2 00/24] pack-bitmap: bitmap generation improvements SZEDER Gábor
2020-11-18 19:51     ` Taylor Blau
2020-11-22  2:17       ` Taylor Blau
2020-11-22  2:28         ` Taylor Blau
2020-11-20  6:34   ` Martin Ågren
2020-11-21 19:37     ` Junio C Hamano
2020-11-21 20:11       ` Martin Ågren
2020-11-22  2:31         ` Taylor Blau
2020-11-24  2:43           ` Jeff King
2020-12-01 23:04             ` Taylor Blau
2020-12-01 23:37               ` Jonathan Tan
2020-12-01 23:43                 ` Taylor Blau
2020-12-02  8:11                   ` Jonathan Tan
2020-12-08  0:04 ` [PATCH v3 " Taylor Blau
2020-12-08  0:04   ` [PATCH v3 01/24] ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW() Taylor Blau
2020-12-08  0:04   ` [PATCH v3 02/24] pack-bitmap: fix header size check Taylor Blau
2020-12-08  0:04   ` [PATCH v3 03/24] pack-bitmap: bounds-check size of cache extension Taylor Blau
2020-12-08  0:04   ` [PATCH v3 04/24] t5310: drop size of truncated ewah bitmap Taylor Blau
2020-12-08  0:04   ` [PATCH v3 05/24] rev-list: die when --test-bitmap detects a mismatch Taylor Blau
2020-12-08  0:04   ` [PATCH v3 06/24] ewah: factor out bitmap growth Taylor Blau
2020-12-08  0:04   ` [PATCH v3 07/24] ewah: make bitmap growth less aggressive Taylor Blau
2020-12-08  0:04   ` [PATCH v3 08/24] ewah: implement bitmap_or() Taylor Blau
2020-12-08  0:04   ` [PATCH v3 09/24] ewah: add bitmap_dup() function Taylor Blau
2020-12-08  0:04   ` [PATCH v3 10/24] pack-bitmap-write: reimplement bitmap writing Taylor Blau
2020-12-08  0:05   ` [PATCH v3 11/24] pack-bitmap-write: pass ownership of intermediate bitmaps Taylor Blau
2020-12-08  0:05   ` [PATCH v3 12/24] pack-bitmap-write: fill bitmap with commit history Taylor Blau
2020-12-08  0:05   ` [PATCH v3 13/24] bitmap: implement bitmap_is_subset() Taylor Blau
2020-12-08  0:05   ` [PATCH v3 14/24] commit: implement commit_list_contains() Taylor Blau
2020-12-08  0:05   ` [PATCH v3 15/24] t5310: add branch-based checks Taylor Blau
2020-12-08  0:05   ` [PATCH v3 16/24] pack-bitmap-write: rename children to reverse_edges Taylor Blau
2020-12-08  0:05   ` [PATCH v3 17/24] pack-bitmap.c: check reads more aggressively when loading Taylor Blau
2020-12-08  0:05   ` [PATCH v3 18/24] pack-bitmap-write: build fewer intermediate bitmaps Taylor Blau
2020-12-08  0:05   ` [PATCH v3 19/24] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Taylor Blau
2020-12-08  0:05   ` [PATCH v3 20/24] pack-bitmap: factor out 'bitmap_for_commit()' Taylor Blau
2020-12-08  0:05   ` [PATCH v3 21/24] pack-bitmap: factor out 'add_commit_to_bitmap()' Taylor Blau
2020-12-08  0:05   ` [PATCH v3 22/24] pack-bitmap-write: use existing bitmaps Taylor Blau
2020-12-08  0:05   ` [PATCH v3 23/24] pack-bitmap-write: relax unique rewalk condition Taylor Blau
2020-12-08  0:05   ` [PATCH v3 24/24] pack-bitmap-write: better reuse bitmaps Taylor Blau
2020-12-08 20:56   ` [PATCH v3 00/24] pack-bitmap: bitmap generation improvements Junio C Hamano
2020-12-08 21:03     ` Taylor Blau
2020-12-08 22:03       ` Junio C Hamano
2020-12-08 22:03 ` [PATCH v4 " Taylor Blau
2020-12-08 22:03   ` [PATCH v4 01/24] ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW() Taylor Blau
2020-12-08 22:03   ` [PATCH v4 02/24] pack-bitmap: fix header size check Taylor Blau
2020-12-08 22:03   ` [PATCH v4 03/24] pack-bitmap: bounds-check size of cache extension Taylor Blau
2020-12-08 22:03   ` [PATCH v4 04/24] t5310: drop size of truncated ewah bitmap Taylor Blau
2020-12-08 22:03   ` [PATCH v4 05/24] rev-list: die when --test-bitmap detects a mismatch Taylor Blau
2020-12-08 22:03   ` [PATCH v4 06/24] ewah: factor out bitmap growth Taylor Blau
2020-12-08 22:03   ` [PATCH v4 07/24] ewah: make bitmap growth less aggressive Taylor Blau
2020-12-08 22:03   ` [PATCH v4 08/24] ewah: implement bitmap_or() Taylor Blau
2020-12-08 22:03   ` [PATCH v4 09/24] ewah: add bitmap_dup() function Taylor Blau
2020-12-08 22:03   ` [PATCH v4 10/24] pack-bitmap-write: reimplement bitmap writing Taylor Blau
2020-12-08 22:03   ` [PATCH v4 11/24] pack-bitmap-write: pass ownership of intermediate bitmaps Taylor Blau
2020-12-08 22:04   ` [PATCH v4 12/24] pack-bitmap-write: fill bitmap with commit history Taylor Blau
2020-12-08 22:04   ` [PATCH v4 13/24] bitmap: implement bitmap_is_subset() Taylor Blau
2020-12-08 22:04   ` [PATCH v4 14/24] commit: implement commit_list_contains() Taylor Blau
2020-12-08 22:04   ` [PATCH v4 15/24] t5310: add branch-based checks Taylor Blau
2020-12-08 22:04   ` [PATCH v4 16/24] pack-bitmap-write: rename children to reverse_edges Taylor Blau
2020-12-08 22:04   ` [PATCH v4 17/24] pack-bitmap.c: check reads more aggressively when loading Taylor Blau
2020-12-08 22:04   ` [PATCH v4 18/24] pack-bitmap-write: build fewer intermediate bitmaps Taylor Blau
2020-12-08 22:04   ` [PATCH v4 19/24] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Taylor Blau
2020-12-08 22:04   ` [PATCH v4 20/24] pack-bitmap: factor out 'bitmap_for_commit()' Taylor Blau
2020-12-08 22:05     ` Taylor Blau
2020-12-08 22:05   ` [PATCH v4 21/24] pack-bitmap: factor out 'add_commit_to_bitmap()' Taylor Blau
2020-12-08 22:05   ` [PATCH v4 22/24] pack-bitmap-write: use existing bitmaps Taylor Blau
2020-12-08 22:05   ` [PATCH v4 23/24] pack-bitmap-write: relax unique revwalk condition Taylor Blau
2020-12-08 22:05   ` [PATCH v4 24/24] pack-bitmap-write: better reuse bitmaps Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea0c8c5d-6bc3-0dca-4fa1-fb461ed7ccb9@gmail.com \
    --to=stolee@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=martin.agren@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git