* [PATCH 0/2] bitmap-format.txt: fix some formatting issues and include checksum info
@ 2022-06-02 13:52 Abhradeep Chakraborty via GitGitGadget
  2022-06-02 13:52 ` [PATCH 1/2] bitmap-format.txt: fix some formatting issues Abhradeep Chakraborty via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-02 13:52 UTC (permalink / raw)
  To: git; +Cc: Abhradeep Chakraborty

There are some issues in the bitmap-format html page. For example, some
nested lists are shown as top-level lists (e.g. in [1],
BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as
top-level list items).

The first commit fixes those. The second commit adds information about
the trailing checksum to the bitmap-format documentation.

[1] https://git-scm.com/docs/bitmap-format#_on_disk_format

Abhradeep Chakraborty (2):
  bitmap-format.txt: fix some formatting issues
  bitmap-format.txt: add information for trailing checksum

 Documentation/technical/bitmap-format.txt | 100 +++++++++++-----------
 1 file changed, 49 insertions(+), 51 deletions(-)

base-commit: 2668e3608e47494f2f10ef2b6e69f08a84816bcb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1246%2FAbhra303%2Ffix-doc-formatting-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1246/Abhra303/fix-doc-formatting-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1246
--
gitgitgadget

^ permalink raw reply	[flat|nested] 37+ messages in thread
* [PATCH 1/2] bitmap-format.txt: fix some formatting issues 2022-06-02 13:52 [PATCH 0/2] bitmap-format.txt: fix some formatting issues and include checksum info Abhradeep Chakraborty via GitGitGadget @ 2022-06-02 13:52 ` Abhradeep Chakraborty via GitGitGadget 2022-06-06 15:55 ` Junio C Hamano 2022-06-02 13:52 ` [PATCH 2/2] bitmap-format.txt: add information for trailing checksum Abhradeep Chakraborty via GitGitGadget 2022-06-07 17:43 ` [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Abhradeep Chakraborty via GitGitGadget 2 siblings, 1 reply; 37+ messages in thread From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-02 13:52 UTC (permalink / raw) To: git; +Cc: Abhradeep Chakraborty, Abhradeep Chakraborty From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> The asciidoc generated html for `Documentation/technical/bitmap- format.txt` is broken. This is mainly because `-` is used for nested lists (which is not allowed in asciidoc) instead of `*`. Fix these and also reformat it (e.g. removing some blank lines) for better readability of the html page. Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> --- Documentation/technical/bitmap-format.txt | 96 +++++++++++------------ 1 file changed, 45 insertions(+), 51 deletions(-) diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt index 04b3ec21785..110d7ddf8ed 100644 --- a/Documentation/technical/bitmap-format.txt +++ b/Documentation/technical/bitmap-format.txt @@ -39,7 +39,7 @@ MIDXs, both the bit-cache and rev-cache extensions are required. == On-disk format - - A header appears at the beginning: + * A header appears at the beginning: 4-byte signature: {'B', 'I', 'T', 'M'} @@ -48,35 +48,30 @@ MIDXs, both the bit-cache and rev-cache extensions are required. of the bitmap index (the same one as JGit). 
2-byte flags (network byte order) - The following flags are supported: - - - BITMAP_OPT_FULL_DAG (0x1) REQUIRED - This flag must always be present. It implies that the - bitmap index has been generated for a packfile or - multi-pack index (MIDX) with full closure (i.e. where - every single object in the packfile/MIDX can find its - parent links inside the same packfile/MIDX). This is a - requirement for the bitmap index format, also present in - JGit, that greatly reduces the complexity of the - implementation. - - - BITMAP_OPT_HASH_CACHE (0x4) - If present, the end of the bitmap file contains - `N` 32-bit name-hash values, one per object in the - pack/MIDX. The format and meaning of the name-hash is - described below. + - BITMAP_OPT_FULL_DAG (0x1) REQUIRED + This flag must always be present. It implies that the + bitmap index has been generated for a packfile or + multi-pack index (MIDX) with full closure (i.e. where + every single object in the packfile/MIDX can find its + parent links inside the same packfile/MIDX). This is a + requirement for the bitmap index format, also present in + JGit, that greatly reduces the complexity of the + implementation. + - BITMAP_OPT_HASH_CACHE (0x4) + If present, the end of the bitmap file contains + `N` 32-bit name-hash values, one per object in the + pack/MIDX. The format and meaning of the name-hash is + described below. 4-byte entry count (network byte order) - The total count of entries (bitmapped commits) in this bitmap index. 20-byte checksum - The SHA1 checksum of the pack/MIDX this bitmap index belongs to. - - 4 EWAH bitmaps that act as type indexes + * 4 EWAH bitmaps that act as type indexes Type indexes are serialized after the hash cache in the shape of four EWAH bitmaps stored consecutively (see Appendix A for @@ -84,7 +79,6 @@ MIDXs, both the bit-cache and rev-cache extensions are required. 
There is a bitmap for each Git object type, stored in the following order: - - Commits - Trees - Blobs @@ -97,39 +91,39 @@ MIDXs, both the bit-cache and rev-cache extensions are required. in a full set (all bits set), and the AND of all 4 bitmaps will result in an empty bitmap (no bits set). - - N entries with compressed bitmaps, one for each indexed commit + * N entries with compressed bitmaps, one for each indexed commit Where `N` is the total amount of entries in this bitmap index. Each entry contains the following: - - 4-byte object position (network byte order) - The position **in the index for the packfile or - multi-pack index** where the bitmap for this commit is - found. - - - 1-byte XOR-offset - The xor offset used to compress this bitmap. For an entry - in position `x`, a XOR offset of `y` means that the actual - bitmap representing this commit is composed by XORing the - bitmap for this entry with the bitmap in entry `x-y` (i.e. - the bitmap `y` entries before this one). - - Note that this compression can be recursive. In order to - XOR this entry with a previous one, the previous entry needs - to be decompressed first, and so on. - - The hard-limit for this offset is 160 (an entry can only be - xor'ed against one of the 160 entries preceding it). This - number is always positive, and hence entries are always xor'ed - with **previous** bitmaps, not bitmaps that will come afterwards - in the index. - - - 1-byte flags for this bitmap - At the moment the only available flag is `0x1`, which hints - that this bitmap can be re-used when rebuilding bitmap indexes - for the repository. - - - The compressed bitmap itself, see Appendix A. + ** 4-byte object position (network byte order) + The position **in the index for the packfile or + multi-pack index** where the bitmap for this commit is + found. + + ** 1-byte XOR-offset + The xor offset used to compress this bitmap. 
For an entry + in position `x`, a XOR offset of `y` means that the actual + bitmap representing this commit is composed by XORing the + bitmap for this entry with the bitmap in entry `x-y` (i.e. + the bitmap `y` entries before this one). + + Note that this compression can be recursive. In order to + XOR this entry with a previous one, the previous entry needs + to be decompressed first, and so on. + + The hard-limit for this offset is 160 (an entry can only be + xor'ed against one of the 160 entries preceding it). This + number is always positive, and hence entries are always xor'ed + with **previous** bitmaps, not bitmaps that will come afterwards + in the index. + + ** 1-byte flags for this bitmap + At the moment the only available flag is `0x1`, which hints + that this bitmap can be re-used when rebuilding bitmap indexes + for the repository. + + ** The compressed bitmap itself, see Appendix A. == Appendix A: Serialization format for an EWAH bitmap -- gitgitgadget ^ permalink raw reply related [flat|nested] 37+ messages in thread
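The header layout described in the patch above can be sketched in Python. This is only an illustration, not Git's implementation: the function name `parse_bitmap_header` and the returned dictionary keys are made up here, and the 2-byte version field (which sits between the signature and the flags in the upstream document but falls outside the quoted hunks) is assumed to be present.

```python
import struct

# Flag values as listed in the quoted documentation.
BITMAP_OPT_FULL_DAG = 0x1    # REQUIRED: pack/MIDX has full closure
BITMAP_OPT_HASH_CACHE = 0x4  # name-hash cache stored at the end of the file

# Network byte order: 4-byte signature, 2-byte version (assumed field),
# 2-byte flags, 4-byte entry count, 20-byte pack/MIDX checksum.
HEADER = struct.Struct(">4sHHI20s")


def parse_bitmap_header(data: bytes) -> dict:
    """Parse the fixed-size header at the start of a bitmap index (sketch)."""
    if len(data) < HEADER.size:
        raise ValueError("truncated bitmap header")
    signature, version, flags, entry_count, checksum = HEADER.unpack_from(data, 0)
    if signature != b"BITM":
        raise ValueError("bad signature: %r" % signature)
    if not flags & BITMAP_OPT_FULL_DAG:
        raise ValueError("BITMAP_OPT_FULL_DAG must always be set")
    return {
        "version": version,
        "entry_count": entry_count,
        "has_hash_cache": bool(flags & BITMAP_OPT_HASH_CACHE),
        "pack_checksum": checksum,
    }
```

Everything after these 32 bytes (type indexes, entries, optional hash cache, trailer) is left out of the sketch.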
* Re: [PATCH 1/2] bitmap-format.txt: fix some formatting issues 2022-06-02 13:52 ` [PATCH 1/2] bitmap-format.txt: fix some formatting issues Abhradeep Chakraborty via GitGitGadget @ 2022-06-06 15:55 ` Junio C Hamano 2022-06-07 10:25 ` Abhradeep Chakraborty 0 siblings, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2022-06-06 15:55 UTC (permalink / raw) To: Abhradeep Chakraborty via GitGitGadget Cc: git, Abhradeep Chakraborty, Vicent Marti, Taylor Blau "Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> > Cc: git@vger.kernel.org, Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> Identify those who may have input with "git log --no-merges" and add them here, perhaps? > The asciidoc generated html for `Documentation/technical/bitmap- > format.txt` is broken. This is mainly because `-` is used for nested > lists (which is not allowed in asciidoc) instead of `*`. Are we missing another step that must come much earlier than this patch? It seems to me that Documentation/Makefile does not even consider that we should feed this file to AsciiDoc. > Fix these and also reformat it (e.g. removing some blank lines) for > better readability of the html page. Do these blank lines hurt very badly how the end-result is formatted in HTML? Does the extra indentation between the line with "The following flags are supported" on it and the two bullet items in the header make the output better in significant way? These changes make the input text much harder to read, and are not very welcome, so unless they are part of "fixing generated HTML is broken", please omit them. As evidenced by the lack of HTML output in the build system, a lot more folks read this document in text than in HTML, and readability of the source matters. Thanks. 
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> > --- > Documentation/technical/bitmap-format.txt | 96 +++++++++++------------ > 1 file changed, 45 insertions(+), 51 deletions(-) > > diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt > index 04b3ec21785..110d7ddf8ed 100644 > --- a/Documentation/technical/bitmap-format.txt > +++ b/Documentation/technical/bitmap-format.txt > @@ -39,7 +39,7 @@ MIDXs, both the bit-cache and rev-cache extensions are required. > > == On-disk format > > - - A header appears at the beginning: > + * A header appears at the beginning: > > 4-byte signature: {'B', 'I', 'T', 'M'} > > @@ -48,35 +48,30 @@ MIDXs, both the bit-cache and rev-cache extensions are required. > of the bitmap index (the same one as JGit). > > 2-byte flags (network byte order) > - > The following flags are supported: > - > - - BITMAP_OPT_FULL_DAG (0x1) REQUIRED > - This flag must always be present. It implies that the > - bitmap index has been generated for a packfile or > - multi-pack index (MIDX) with full closure (i.e. where > - every single object in the packfile/MIDX can find its > - parent links inside the same packfile/MIDX). This is a > - requirement for the bitmap index format, also present in > - JGit, that greatly reduces the complexity of the > - implementation. > - > - - BITMAP_OPT_HASH_CACHE (0x4) > - If present, the end of the bitmap file contains > - `N` 32-bit name-hash values, one per object in the > - pack/MIDX. The format and meaning of the name-hash is > - described below. > + - BITMAP_OPT_FULL_DAG (0x1) REQUIRED > + This flag must always be present. It implies that the > + bitmap index has been generated for a packfile or > + multi-pack index (MIDX) with full closure (i.e. where > + every single object in the packfile/MIDX can find its > + parent links inside the same packfile/MIDX). 
This is a > + requirement for the bitmap index format, also present in > + JGit, that greatly reduces the complexity of the > + implementation. > + - BITMAP_OPT_HASH_CACHE (0x4) > + If present, the end of the bitmap file contains > + `N` 32-bit name-hash values, one per object in the > + pack/MIDX. The format and meaning of the name-hash is > + described below. > > 4-byte entry count (network byte order) > - > The total count of entries (bitmapped commits) in this bitmap index. > > 20-byte checksum > - > The SHA1 checksum of the pack/MIDX this bitmap index > belongs to. > > - - 4 EWAH bitmaps that act as type indexes > + * 4 EWAH bitmaps that act as type indexes > > Type indexes are serialized after the hash cache in the shape > of four EWAH bitmaps stored consecutively (see Appendix A for > @@ -84,7 +79,6 @@ MIDXs, both the bit-cache and rev-cache extensions are required. > > There is a bitmap for each Git object type, stored in the following > order: > - > - Commits > - Trees > - Blobs > @@ -97,39 +91,39 @@ MIDXs, both the bit-cache and rev-cache extensions are required. > in a full set (all bits set), and the AND of all 4 bitmaps will > result in an empty bitmap (no bits set). > > - - N entries with compressed bitmaps, one for each indexed commit > + * N entries with compressed bitmaps, one for each indexed commit > > Where `N` is the total amount of entries in this bitmap index. > Each entry contains the following: > > - - 4-byte object position (network byte order) > - The position **in the index for the packfile or > - multi-pack index** where the bitmap for this commit is > - found. > - > - - 1-byte XOR-offset > - The xor offset used to compress this bitmap. For an entry > - in position `x`, a XOR offset of `y` means that the actual > - bitmap representing this commit is composed by XORing the > - bitmap for this entry with the bitmap in entry `x-y` (i.e. > - the bitmap `y` entries before this one). > - > - Note that this compression can be recursive. 
In order to > - XOR this entry with a previous one, the previous entry needs > - to be decompressed first, and so on. > - > - The hard-limit for this offset is 160 (an entry can only be > - xor'ed against one of the 160 entries preceding it). This > - number is always positive, and hence entries are always xor'ed > - with **previous** bitmaps, not bitmaps that will come afterwards > - in the index. > - > - - 1-byte flags for this bitmap > - At the moment the only available flag is `0x1`, which hints > - that this bitmap can be re-used when rebuilding bitmap indexes > - for the repository. > - > - - The compressed bitmap itself, see Appendix A. > + ** 4-byte object position (network byte order) > + The position **in the index for the packfile or > + multi-pack index** where the bitmap for this commit is > + found. > + > + ** 1-byte XOR-offset > + The xor offset used to compress this bitmap. For an entry > + in position `x`, a XOR offset of `y` means that the actual > + bitmap representing this commit is composed by XORing the > + bitmap for this entry with the bitmap in entry `x-y` (i.e. > + the bitmap `y` entries before this one). > + > + Note that this compression can be recursive. In order to > + XOR this entry with a previous one, the previous entry needs > + to be decompressed first, and so on. > + > + The hard-limit for this offset is 160 (an entry can only be > + xor'ed against one of the 160 entries preceding it). This > + number is always positive, and hence entries are always xor'ed > + with **previous** bitmaps, not bitmaps that will come afterwards > + in the index. > + > + ** 1-byte flags for this bitmap > + At the moment the only available flag is `0x1`, which hints > + that this bitmap can be re-used when rebuilding bitmap indexes > + for the repository. > + > + ** The compressed bitmap itself, see Appendix A. > > == Appendix A: Serialization format for an EWAH bitmap ^ permalink raw reply [flat|nested] 37+ messages in thread
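The XOR-offset rule quoted above (entry `x` with offset `y` is XORed with entry `x-y`, which may itself need decompressing first) can be sketched as follows. This is a simplified model, not Git's code: bitmaps are plain Python integers rather than EWAH-compressed words, the function name `resolve_bitmaps` is hypothetical, and an offset of 0 is assumed to mean "stored as-is, no XOR".

```python
from typing import List, Tuple

MAX_XOR_OFFSET = 160  # hard limit stated in the documentation


def resolve_bitmaps(entries: List[Tuple[int, int]]) -> List[int]:
    """Turn (xor_offset, stored_bits) pairs into the actual bitmaps.

    Entries are processed in file order, so the entry at x - xor_offset
    has always been resolved already; this handles the "recursive"
    case without explicit recursion.
    """
    resolved: List[int] = []
    for x, (xor_offset, stored) in enumerate(entries):
        if not 0 <= xor_offset <= MAX_XOR_OFFSET:
            raise ValueError("XOR offset out of range at entry %d" % x)
        if xor_offset > x:
            raise ValueError("entry %d refers before the first entry" % x)
        if xor_offset == 0:
            # Assumption: offset 0 means the bitmap is stored verbatim.
            resolved.append(stored)
        else:
            resolved.append(stored ^ resolved[x - xor_offset])
    return resolved
```

Because offsets only ever point backwards, a single forward pass is enough; a reader never needs bitmaps that come later in the index.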
* Re: [PATCH 1/2] bitmap-format.txt: fix some formatting issues
  2022-06-06 15:55   ` Junio C Hamano
@ 2022-06-07 10:25     ` Abhradeep Chakraborty
  0 siblings, 0 replies; 37+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-07 10:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Abhradeep Chakraborty, Git, Vicent Marti, Taylor Blau,
	Kaartic Sivaraam, Derrick Stolee

Junio C Hamano <gitster@pobox.com> wrote:

> Identify those who may have input with "git log --no-merges" and add
> them here, perhaps?

Thanks, I have hopefully cc'd all the people who can give some input
about the patch, except Peff. I got to know that he took a break, so I
decided not to cc him (I will surely do so if you say). I would love to
hear from other people who have knowledge of asciidoc. I previously
informed Taylor and Kaartic about the patch but forgot to cc them :P

Another thing to note is that the checksum I included in the last commit
was suggested by Taylor himself. I was having trouble understanding some
portion of `load_bitmap_header()` (because I wasn't aware of the
trailing checksum) when he cleared my doubt by saying that a trailer
checksum exists, and he also suggested making a PR addressing that:

> I'm glad that it was helpful! If you think others may be confused by
> the same, feel free to write a patch modifying
> Documentation/technical/bitmap-format.txt to point out the trailing
> checksum.

Junio wrote:

> Are we missing another step that must come much earlier than this
> patch? It seems to me that Documentation/Makefile does not even
> consider that we should feed this file to AsciiDoc.

I think the same. At first, I thought this was intentional. When I ran
`make doc` (to test the resulting html file), it didn't generate any
html file for bitmap-format.txt. But thankfully there is an online
asciidoc editor[1] where you can check the resulting html file. You can
also check the resulting html by copy-pasting the content[2] of the
bitmap-format file from my github branch into that editor.

I will write a patch for it. The current broken page can be found at
https://git-scm.com/docs/bitmap-format

> Do these blank lines hurt very badly how the end-result is formatted
> in HTML? Does the extra indentation between the line with "The
> following flags are supported" on it and the two bullet items in the
> header make the output better in significant way?

Answering the first question: yes, those are necessary to improve the
html readability (you can verify that by including and removing the
blank lines in the editor and observing the changes). This ensures that
all the related paragraphs are contained in the same block.

The extra indentation is not necessary. I added it because I thought it
would be visually better for html page readers. If you think it does the
opposite, I can remove it.

I tried to use two bullets as little as possible (in most cases, nested
lists came under <pre> blocks, so I didn't have to use two bullets). But
in one case, I had to use them for nested lists (try the editor to see
the rendered output).

> These changes make the input text much harder to read, and are not
> very welcome, so unless they are part of "fixing generated HTML is
> broken", please omit them. As evidenced by the lack of HTML output
> in the build system, a lot more folks read this document in text than
> in HTML, and readability of the source matters.

Okay, I will remove the extra indentation then. But besides that, all
the changes are necessary. I admit that readability of the source
matters, but I think html pages are also important (even more important)
for people who don't have the source code and want to know the git
internals.

Thanks :)

[1] https://asciidoclive.com/edit/scratch/1
[2] https://github.com/Abhra303/git/blob/fix-doc-formatting/Documentation/technical/bitmap-format.txt

^ permalink raw reply	[flat|nested] 37+ messages in thread
* [PATCH 2/2] bitmap-format.txt: add information for trailing checksum
  2022-06-02 13:52 [PATCH 0/2] bitmap-format.txt: fix some formatting issues and include checksum info Abhradeep Chakraborty via GitGitGadget
  2022-06-02 13:52 ` [PATCH 1/2] bitmap-format.txt: fix some formatting issues Abhradeep Chakraborty via GitGitGadget
@ 2022-06-02 13:52 ` Abhradeep Chakraborty via GitGitGadget
  2022-06-07 17:43 ` [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Abhradeep Chakraborty via GitGitGadget
  2 siblings, 0 replies; 37+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-02 13:52 UTC (permalink / raw)
  To: git; +Cc: Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Bitmap file has a trailing checksum at the end of the file. However
there is no information in the bitmap-format documentation about it.

Add a trailer section to include the trailing checksum info in the
`Documentation/technical/bitmap-format.txt` file.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/technical/bitmap-format.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 110d7ddf8ed..6846e7221a7 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -125,6 +125,10 @@ MIDXs, both the bit-cache and rev-cache extensions are required.

     ** The compressed bitmap itself, see Appendix A.

+    * TRAILER:
+
+      Index checksum of the above contents.
+
 == Appendix A: Serialization format for an EWAH bitmap

 Ewah bitmaps are serialized in the same protocol as the JAVAEWAH
--
gitgitgadget

^ permalink raw reply related	[flat|nested] 37+ messages in thread
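A reader could check the trailer described in this patch roughly as sketched below. The patch only says "index checksum of the above contents"; the sketch assumes a 20-byte SHA-1 over every byte that precedes the trailer, which is what v2 of this series states. The function name `trailer_matches` is hypothetical.

```python
import hashlib

TRAILER_LEN = 20  # assumption: SHA-1 digest size, as stated in v2 of the series


def trailer_matches(data: bytes) -> bool:
    """Return True if the last 20 bytes are the SHA-1 of everything before them."""
    if len(data) < TRAILER_LEN:
        return False
    body, trailer = data[:-TRAILER_LEN], data[-TRAILER_LEN:]
    return hashlib.sha1(body).digest() == trailer
```

Note that this trailer covers the bitmap file itself and is distinct from the 20-byte pack/MIDX checksum stored in the header.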
* [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info
  2022-06-02 13:52 [PATCH 0/2] bitmap-format.txt: fix some formatting issues and include checksum info Abhradeep Chakraborty via GitGitGadget
  2022-06-02 13:52 ` [PATCH 1/2] bitmap-format.txt: fix some formatting issues Abhradeep Chakraborty via GitGitGadget
  2022-06-02 13:52 ` [PATCH 2/2] bitmap-format.txt: add information for trailing checksum Abhradeep Chakraborty via GitGitGadget
@ 2022-06-07 17:43 ` Abhradeep Chakraborty via GitGitGadget
  2022-06-07 17:43   ` [PATCH v2 1/3] bitmap-format.txt: feed the file to asciidoc to generate html Abhradeep Chakraborty via GitGitGadget
                     ` (4 more replies)
  2 siblings, 5 replies; 37+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-07 17:43 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Vicent Marti, Kaartic Sivaraam, Derrick Stolee,
	Junio C Hamano, Abhradeep Chakraborty

There are some issues in the bitmap-format html page. For example, some
nested lists are shown as top-level lists (e.g. in [1],
BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as
top-level list items). There is also a need to add info about the
trailing checksum to the docs.

Changes since v1:

 * add a new commit addressing html page generation for bitmap-format.txt
 * remove the extra indentation from the previous change
 * elaborate more on the trailing checksum (as suggested by Kaartic)

initial version:

 * the first commit fixes some formatting issues
 * information about the trailing checksum in the bitmap file is added
   to the bitmap-format doc.
[1] https://git-scm.com/docs/bitmap-format#_on_disk_format Abhradeep Chakraborty (3): bitmap-format.txt: feed the file to asciidoc to generate html bitmap-format.txt: fix some formatting issues bitmap-format.txt: add information for trailing checksum Documentation/Makefile | 1 + Documentation/technical/bitmap-format.txt | 24 +++++++++++------------ 2 files changed, 12 insertions(+), 13 deletions(-) base-commit: 2668e3608e47494f2f10ef2b6e69f08a84816bcb Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1246%2FAbhra303%2Ffix-doc-formatting-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1246/Abhra303/fix-doc-formatting-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/1246 Range-diff vs v1: -: ----------- > 1: a1b9bd9af90 bitmap-format.txt: feed the file to asciidoc to generate html 1: 976361e624a ! 2: cb919513c14 bitmap-format.txt: fix some formatting issues @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cac - The following flags are supported: - -- - BITMAP_OPT_FULL_DAG (0x1) REQUIRED -- This flag must always be present. It implies that the -- bitmap index has been generated for a packfile or -- multi-pack index (MIDX) with full closure (i.e. where -- every single object in the packfile/MIDX can find its -- parent links inside the same packfile/MIDX). This is a -- requirement for the bitmap index format, also present in -- JGit, that greatly reduces the complexity of the -- implementation. + - BITMAP_OPT_FULL_DAG (0x1) REQUIRED + This flag must always be present. It implies that the + bitmap index has been generated for a packfile or +@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. + requirement for the bitmap index format, also present in + JGit, that greatly reduces the complexity of the + implementation. 
- -- - BITMAP_OPT_HASH_CACHE (0x4) -- If present, the end of the bitmap file contains -- `N` 32-bit name-hash values, one per object in the -- pack/MIDX. The format and meaning of the name-hash is -- described below. -+ - BITMAP_OPT_FULL_DAG (0x1) REQUIRED -+ This flag must always be present. It implies that the -+ bitmap index has been generated for a packfile or -+ multi-pack index (MIDX) with full closure (i.e. where -+ every single object in the packfile/MIDX can find its -+ parent links inside the same packfile/MIDX). This is a -+ requirement for the bitmap index format, also present in -+ JGit, that greatly reduces the complexity of the -+ implementation. -+ - BITMAP_OPT_HASH_CACHE (0x4) -+ If present, the end of the bitmap file contains -+ `N` 32-bit name-hash values, one per object in the -+ pack/MIDX. The format and meaning of the name-hash is -+ described below. + - BITMAP_OPT_HASH_CACHE (0x4) + If present, the end of the bitmap file contains + `N` 32-bit name-hash values, one per object in the +@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. + described below. 4-byte entry count (network byte order) - @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cac Each entry contains the following: - - 4-byte object position (network byte order) -- The position **in the index for the packfile or -- multi-pack index** where the bitmap for this commit is -- found. -- ++ ** 4-byte object position (network byte order) + The position **in the index for the packfile or + multi-pack index** where the bitmap for this commit is + found. + - - 1-byte XOR-offset -- The xor offset used to compress this bitmap. For an entry -- in position `x`, a XOR offset of `y` means that the actual -- bitmap representing this commit is composed by XORing the -- bitmap for this entry with the bitmap in entry `x-y` (i.e. -- the bitmap `y` entries before this one). 
-- -- Note that this compression can be recursive. In order to -- XOR this entry with a previous one, the previous entry needs -- to be decompressed first, and so on. -- -- The hard-limit for this offset is 160 (an entry can only be -- xor'ed against one of the 160 entries preceding it). This -- number is always positive, and hence entries are always xor'ed -- with **previous** bitmaps, not bitmaps that will come afterwards -- in the index. -- ++ ** 1-byte XOR-offset + The xor offset used to compress this bitmap. For an entry + in position `x`, a XOR offset of `y` means that the actual + bitmap representing this commit is composed by XORing the +@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. + with **previous** bitmaps, not bitmaps that will come afterwards + in the index. + - - 1-byte flags for this bitmap -- At the moment the only available flag is `0x1`, which hints -- that this bitmap can be re-used when rebuilding bitmap indexes -- for the repository. -- ++ ** 1-byte flags for this bitmap + At the moment the only available flag is `0x1`, which hints + that this bitmap can be re-used when rebuilding bitmap indexes + for the repository. + - - The compressed bitmap itself, see Appendix A. -+ ** 4-byte object position (network byte order) -+ The position **in the index for the packfile or -+ multi-pack index** where the bitmap for this commit is -+ found. -+ -+ ** 1-byte XOR-offset -+ The xor offset used to compress this bitmap. For an entry -+ in position `x`, a XOR offset of `y` means that the actual -+ bitmap representing this commit is composed by XORing the -+ bitmap for this entry with the bitmap in entry `x-y` (i.e. -+ the bitmap `y` entries before this one). -+ -+ Note that this compression can be recursive. In order to -+ XOR this entry with a previous one, the previous entry needs -+ to be decompressed first, and so on. 
-+ -+ The hard-limit for this offset is 160 (an entry can only be -+ xor'ed against one of the 160 entries preceding it). This -+ number is always positive, and hence entries are always xor'ed -+ with **previous** bitmaps, not bitmaps that will come afterwards -+ in the index. -+ -+ ** 1-byte flags for this bitmap -+ At the moment the only available flag is `0x1`, which hints -+ that this bitmap can be re-used when rebuilding bitmap indexes -+ for the repository. -+ -+ ** The compressed bitmap itself, see Appendix A. ++ ** The compressed bitmap itself, see Appendix A. == Appendix A: Serialization format for an EWAH bitmap 2: ba534b5d486 ! 3: 2171d31fb2b bitmap-format.txt: add information for trailing checksum @@ Commit message ## Documentation/technical/bitmap-format.txt ## @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. - ** The compressed bitmap itself, see Appendix A. + ** The compressed bitmap itself, see Appendix A. + * TRAILER: + -+ Index checksum of the above contents. ++ Index checksum of the above contents. It is a 20-byte SHA1 checksum. + == Appendix A: Serialization format for an EWAH bitmap -- gitgitgadget ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v2 1/3] bitmap-format.txt: feed the file to asciidoc to generate html
  2022-06-07 17:43 ` [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Abhradeep Chakraborty via GitGitGadget
@ 2022-06-07 17:43   ` Abhradeep Chakraborty via GitGitGadget
  2022-06-07 18:39     ` Junio C Hamano
  2022-06-07 20:21     ` Taylor Blau
  2022-06-07 17:43   ` [PATCH v2 2/3] bitmap-format.txt: fix some formatting issues Abhradeep Chakraborty via GitGitGadget
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 37+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-07 17:43 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Vicent Marti, Kaartic Sivaraam, Derrick Stolee,
	Junio C Hamano, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Documentation/Makefile does not include bitmap-format.txt to generate
a html page using asciidoc.

Teach Documentation/Makefile to also generate a html page for
Documentation/technical/bitmap-format.txt file.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index d3f043f50d2..8d405a14330 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -94,6 +94,7 @@ TECH_DOCS += MyFirstContribution
 TECH_DOCS += MyFirstObjectWalk
 TECH_DOCS += SubmittingPatches
 TECH_DOCS += ToolsForGit
+TECH_DOCS += technical/bitmap-format
 TECH_DOCS += technical/bundle-format
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
--
gitgitgadget

^ permalink raw reply related	[flat|nested] 37+ messages in thread
* Re: [PATCH v2 1/3] bitmap-format.txt: feed the file to asciidoc to generate html 2022-06-07 17:43 ` [PATCH v2 1/3] bitmap-format.txt: feed the file to asciidoc to generate html Abhradeep Chakraborty via GitGitGadget @ 2022-06-07 18:39 ` Junio C Hamano 2022-06-08 15:02 ` Abhradeep Chakraborty 2022-06-07 20:21 ` Taylor Blau 1 sibling, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2022-06-07 18:39 UTC (permalink / raw) To: Abhradeep Chakraborty via GitGitGadget Cc: git, Taylor Blau, Vicent Marti, Kaartic Sivaraam, Derrick Stolee, Abhradeep Chakraborty "Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> > > Documentation/Makefile does not include bitmap-format.txt to generate > a html page using asciidoc. > > Teach Documentation/Makefile to also generate a html page for > Documentation/technical/bitmap-format.txt file. > > Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> > --- > Documentation/Makefile | 1 + > 1 file changed, 1 insertion(+) The change itself is obviously correct (assuming that it is worth passing the document to AsciiDoc, instead of reading it in text, that is). > diff --git a/Documentation/Makefile b/Documentation/Makefile > index d3f043f50d2..8d405a14330 100644 > --- a/Documentation/Makefile > +++ b/Documentation/Makefile > @@ -94,6 +94,7 @@ TECH_DOCS += MyFirstContribution > TECH_DOCS += MyFirstObjectWalk > TECH_DOCS += SubmittingPatches > TECH_DOCS += ToolsForGit > +TECH_DOCS += technical/bitmap-format > TECH_DOCS += technical/bundle-format > TECH_DOCS += technical/hash-function-transition > TECH_DOCS += technical/http-protocol Is bitmap-format the only one that is not fed to AsciiDoc, by the way? Are there other 'text-only' document that is worth converting to AsciiDoc? 
It is outside the scope of this series, of course, to actually
adjust them, but since you are already doing the homework, I
thought you might already know the answer, which may become a source
of inspiration for others to find something to work on.

Thanks.

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 1/3] bitmap-format.txt: feed the file to asciidoc to generate html
  2022-06-07 18:39 ` Junio C Hamano
@ 2022-06-08 15:02 ` Abhradeep Chakraborty
  0 siblings, 0 replies; 37+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-08 15:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Abhradeep Chakraborty, Git, Taylor Blau, Kaartic Sivaraam, Derrick Stolee

Junio C Hamano <gitster@pobox.com> wrote:

> Is bitmap-format the only one that is not fed to AsciiDoc, by the
> way? Are there other 'text-only' document that is worth converting
> to AsciiDoc?
>
> It is outside the scope of this series, of course, to actually
> adjust them, but since you are already doing the homework, I
> thought you might already know the answer, which may become a source
> of inspiration for others to find something to work on.

No, bitmap-format is not the only one. There are more text-only files.
Some that I have found so far are technical/chunk-format.txt and
technical/commit-graph.txt. There are others, but I don't know whether
they actually need html conversion. I think the two I mentioned are
worth having html pages for.

I was thinking of adding those in my commit, but later I thought it
would divert the patch series.

Thanks :)

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 1/3] bitmap-format.txt: feed the file to asciidoc to generate html 2022-06-07 17:43 ` [PATCH v2 1/3] bitmap-format.txt: feed the file to asciidoc to generate html Abhradeep Chakraborty via GitGitGadget 2022-06-07 18:39 ` Junio C Hamano @ 2022-06-07 20:21 ` Taylor Blau 1 sibling, 0 replies; 37+ messages in thread From: Taylor Blau @ 2022-06-07 20:21 UTC (permalink / raw) To: Abhradeep Chakraborty via GitGitGadget Cc: git, Vicent Marti, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty On Tue, Jun 07, 2022 at 05:43:32PM +0000, Abhradeep Chakraborty via GitGitGadget wrote: > From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> > > Documentation/Makefile does not include bitmap-format.txt to generate > a html page using asciidoc. > > Teach Documentation/Makefile to also generate a html page for > Documentation/technical/bitmap-format.txt file. I am glad to see us finally getting around to this ;). I proposed this back in: https://lore.kernel.org/git/b0bb2e8051f19ec47140fda6500e092e37c6bea8.1624314293.git.me@ttaylorr.com/ but I dropped it from later versions of that series, due in large part to some of the formatting issues that your series here fixes. Thanks, Taylor ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v2 2/3] bitmap-format.txt: fix some formatting issues 2022-06-07 17:43 ` [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Abhradeep Chakraborty via GitGitGadget 2022-06-07 17:43 ` [PATCH v2 1/3] bitmap-format.txt: feed the file to asciidoc to generate html Abhradeep Chakraborty via GitGitGadget @ 2022-06-07 17:43 ` Abhradeep Chakraborty via GitGitGadget 2022-06-07 20:51 ` Taylor Blau 2022-06-07 17:43 ` [PATCH v2 3/3] bitmap-format.txt: add information for trailing checksum Abhradeep Chakraborty via GitGitGadget ` (2 subsequent siblings) 4 siblings, 1 reply; 37+ messages in thread From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-07 17:43 UTC (permalink / raw) To: git Cc: Taylor Blau, Vicent Marti, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty, Abhradeep Chakraborty From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> The asciidoc generated html for `Documentation/technical/bitmap- format.txt` is broken. This is mainly because `-` is used for nested lists (which is not allowed in asciidoc) instead of `*`. Fix these and also reformat it (e.g. removing some blank lines) for better readability of the html page. Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> --- Documentation/technical/bitmap-format.txt | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt index 04b3ec21785..f22669b5916 100644 --- a/Documentation/technical/bitmap-format.txt +++ b/Documentation/technical/bitmap-format.txt @@ -39,7 +39,7 @@ MIDXs, both the bit-cache and rev-cache extensions are required. == On-disk format - - A header appears at the beginning: + * A header appears at the beginning: 4-byte signature: {'B', 'I', 'T', 'M'} @@ -48,9 +48,7 @@ MIDXs, both the bit-cache and rev-cache extensions are required. of the bitmap index (the same one as JGit). 
2-byte flags (network byte order) - The following flags are supported: - - BITMAP_OPT_FULL_DAG (0x1) REQUIRED This flag must always be present. It implies that the bitmap index has been generated for a packfile or @@ -60,7 +58,6 @@ MIDXs, both the bit-cache and rev-cache extensions are required. requirement for the bitmap index format, also present in JGit, that greatly reduces the complexity of the implementation. - - BITMAP_OPT_HASH_CACHE (0x4) If present, the end of the bitmap file contains `N` 32-bit name-hash values, one per object in the @@ -68,15 +65,13 @@ MIDXs, both the bit-cache and rev-cache extensions are required. described below. 4-byte entry count (network byte order) - The total count of entries (bitmapped commits) in this bitmap index. 20-byte checksum - The SHA1 checksum of the pack/MIDX this bitmap index belongs to. - - 4 EWAH bitmaps that act as type indexes + * 4 EWAH bitmaps that act as type indexes Type indexes are serialized after the hash cache in the shape of four EWAH bitmaps stored consecutively (see Appendix A for @@ -84,7 +79,6 @@ MIDXs, both the bit-cache and rev-cache extensions are required. There is a bitmap for each Git object type, stored in the following order: - - Commits - Trees - Blobs @@ -97,17 +91,17 @@ MIDXs, both the bit-cache and rev-cache extensions are required. in a full set (all bits set), and the AND of all 4 bitmaps will result in an empty bitmap (no bits set). - - N entries with compressed bitmaps, one for each indexed commit + * N entries with compressed bitmaps, one for each indexed commit Where `N` is the total amount of entries in this bitmap index. Each entry contains the following: - - 4-byte object position (network byte order) + ** 4-byte object position (network byte order) The position **in the index for the packfile or multi-pack index** where the bitmap for this commit is found. - - 1-byte XOR-offset + ** 1-byte XOR-offset The xor offset used to compress this bitmap. 
For an entry in position `x`, a XOR offset of `y` means that the actual bitmap representing this commit is composed by XORing the @@ -124,12 +118,12 @@ MIDXs, both the bit-cache and rev-cache extensions are required. with **previous** bitmaps, not bitmaps that will come afterwards in the index. - - 1-byte flags for this bitmap + ** 1-byte flags for this bitmap At the moment the only available flag is `0x1`, which hints that this bitmap can be re-used when rebuilding bitmap indexes for the repository. - - The compressed bitmap itself, see Appendix A. + ** The compressed bitmap itself, see Appendix A. == Appendix A: Serialization format for an EWAH bitmap -- gitgitgadget ^ permalink raw reply related [flat|nested] 37+ messages in thread
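For readers following the format text touched by this patch, the header layout and the XOR-offset rule quoted above can be sketched in Python. This is only an illustration under stated assumptions, not Git's implementation: the names `parse_header` and `resolve_xor` are hypothetical, bitmaps are modelled as plain Python integers rather than EWAH-compressed words, and `hash_len` defaults to the 20-byte SHA1 size given in the document.

```python
import struct

# Flag values as listed in the format description.
BITMAP_OPT_FULL_DAG = 0x1
BITMAP_OPT_HASH_CACHE = 0x4


def parse_header(data, hash_len=20):
    """Parse the fixed-size header at the start of a bitmap index.

    Layout (all integers in network byte order):
    4-byte signature, 2-byte version, 2-byte flags,
    4-byte entry count, then the pack/MIDX checksum.
    `hash_len` is an assumption of this sketch (20 for SHA1).
    """
    signature, version, flags, entry_count = struct.unpack_from(">4sHHI", data, 0)
    if signature != b"BITM":
        raise ValueError("bad signature")
    if version != 1:
        raise ValueError("unsupported version")
    if not flags & BITMAP_OPT_FULL_DAG:
        raise ValueError("BITMAP_OPT_FULL_DAG must be set")
    checksum = data[12:12 + hash_len]
    if len(checksum) != hash_len:
        raise ValueError("truncated header")
    return {
        "version": version,
        "flags": flags,
        "has_hash_cache": bool(flags & BITMAP_OPT_HASH_CACHE),
        "entry_count": entry_count,
        "pack_checksum": checksum,
    }


def resolve_xor(entries, index):
    """Return the actual bitmap for the entry at `index`.

    `entries` is a hypothetical in-memory list of
    (xor_offset, stored_bits) pairs. An offset of `y` at position `x`
    means the stored bits must be XORed with the resolved bitmap of
    entry `x - y`; an offset of 0 means the bitmap is stored as-is.
    """
    xor_offset, stored_bits = entries[index]
    if xor_offset == 0:
        return stored_bits
    if xor_offset > index:
        raise ValueError("XOR offset points before the first entry")
    # The referenced entry may itself be XOR-compressed, so recurse.
    return stored_bits ^ resolve_xor(entries, index - xor_offset)
```

Because an offset can only point at an earlier entry, the recursion in `resolve_xor` always terminates; a real reader would likely cache the resolved bitmaps instead of recomputing them.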
* Re: [PATCH v2 2/3] bitmap-format.txt: fix some formatting issues 2022-06-07 17:43 ` [PATCH v2 2/3] bitmap-format.txt: fix some formatting issues Abhradeep Chakraborty via GitGitGadget @ 2022-06-07 20:51 ` Taylor Blau 2022-06-07 22:02 ` Junio C Hamano 2022-06-08 15:40 ` Abhradeep Chakraborty 0 siblings, 2 replies; 37+ messages in thread From: Taylor Blau @ 2022-06-07 20:51 UTC (permalink / raw) To: Abhradeep Chakraborty via GitGitGadget Cc: git, Vicent Marti, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty On Tue, Jun 07, 2022 at 05:43:33PM +0000, Abhradeep Chakraborty via GitGitGadget wrote: > From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> > > The asciidoc generated html for `Documentation/technical/bitmap- > format.txt` is broken. This is mainly because `-` is used for nested > lists (which is not allowed in asciidoc) instead of `*`. > > Fix these and also reformat it (e.g. removing some blank lines) for > better readability of the html page. Hmm. When I render the HTML for this page and view it in my browser, the removed blank lines makes the contents of the section "2-byte flags (network byte order)" run together, and I think it hurts readability IMHO. Is there a way to keep those line breaks without significantly reformatting the source of this file? > Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> > --- > Documentation/technical/bitmap-format.txt | 20 +++++++------------- > 1 file changed, 7 insertions(+), 13 deletions(-) > > diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt > index 04b3ec21785..f22669b5916 100644 > --- a/Documentation/technical/bitmap-format.txt > +++ b/Documentation/technical/bitmap-format.txt > @@ -39,7 +39,7 @@ MIDXs, both the bit-cache and rev-cache extensions are required. 
> > == On-disk format > > - - A header appears at the beginning: > + * A header appears at the beginning: > > 4-byte signature: {'B', 'I', 'T', 'M'} Similarly, everything below the "A header appears at the beginning" list item appears in a <pre> element, so the rendered HTML looks more like plaintext to me. This isn't new from your patch, but I wonder if now is a good opportunity to make some light use of the formatting options that ASCIIDoc gives us to make the page read a little bit more easily when rendered as HTML. I don't want to compromise too much on the readability of the .txt file, though, so if there isn't a good way to strike this balance, then I trust you and think we should leave it as you have modified things here. Thanks, Taylor ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 2/3] bitmap-format.txt: fix some formatting issues 2022-06-07 20:51 ` Taylor Blau @ 2022-06-07 22:02 ` Junio C Hamano 2022-06-08 16:06 ` Abhradeep Chakraborty 2022-06-08 15:40 ` Abhradeep Chakraborty 1 sibling, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2022-06-07 22:02 UTC (permalink / raw) To: Taylor Blau Cc: Abhradeep Chakraborty via GitGitGadget, git, Vicent Marti, Kaartic Sivaraam, Derrick Stolee, Abhradeep Chakraborty Taylor Blau <me@ttaylorr.com> writes: > Similarly, everything below the "A header appears at the beginning" > list item appears in a <pre> element, so the rendered HTML looks more > like plaintext to me. True. Unless we are going to revamp the text in some major way so that we produce "true" HTML, not just the text source enclosed in a <pre></pre> pair, I would think we are better off keeping it not passed to AsciiDoc and leaving it in text format. After all, modern browsers, which I presume those who want HTML output files would read them with, can display plain text files just fine, don't they? > This isn't new from your patch, but I wonder if now is a good > opportunity to make some light use of the formatting options that > ASCIIDoc gives us to make the page read a little bit more easily when > rendered as HTML. There was some talk about asking those who are adept at website engineering to work on git-scm.com; it may be a good starting point to look at these text files that weren't originally written to be given to AsciiDoc and convert them to be true AsciiDoc sources. Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 2/3] bitmap-format.txt: fix some formatting issues
  2022-06-07 22:02 ` Junio C Hamano
@ 2022-06-08 16:06 ` Abhradeep Chakraborty
  0 siblings, 0 replies; 37+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-08 16:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Abhradeep Chakraborty, Git, Taylor Blau, Kaartic Sivaraam, Derrick Stolee

Junio C Hamano <gitster@pobox.com> wrote:

> True. Unless we are going to revamp the text in some major way so
> that we produce "true" HTML, not just the text source enclosed in a
> <pre></pre> pair, I would think we are better off keeping it not
> passed to AsciiDoc and leaving it in text format. After all, modern
> browsers, which I presume those who want HTML output files would
> read them with, can display plain text files just fine, don't they?

I am not sure whether that's a good idea or not. As I come from a web
dev background, I know that people get bored if they have to read a
long plain-text article. SEO also calls for well-designed articles, so
that people can spend more time reading with visual ease. Of course,
git doesn't need any SEO, as it is already very popular. But readers
want some visual satisfaction while reading docs. That's why some
people complain about GNU sites (git's site is beautiful, by the way).

Obviously, here I am using `people` to refer to non-git developers who
are curious about git internals.

Thanks :)

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 2/3] bitmap-format.txt: fix some formatting issues
  2022-06-07 20:51 ` Taylor Blau
  2022-06-07 22:02 ` Junio C Hamano
@ 2022-06-08 15:40 ` Abhradeep Chakraborty
  1 sibling, 0 replies; 37+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-08 15:40 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty, Git, Junio C Hamano, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> Hmm. When I render the HTML for this page and view it in my browser, the
> removed blank lines makes the contents of the section "2-byte flags
> (network byte order)" run together, and I think it hurts readability
> IMHO.

Honestly, I agree with you. I felt the same, but then I thought it is
still better than the currently broken page.

> Is there a way to keep those line breaks without significantly
> reformatting the source of this file?

I have limited knowledge of asciidoc. I removed those blank lines only
because they generate weird html output. I didn't find any other way to
fix that (with minimal source changes).

> This isn't new from your patch, but I wonder if now is a good
> opportunity to make some light use of the formatting options that
> ASCIIDoc gives us to make the page read a little bit more easily when
> rendered as HTML.

Yeah, quite sensible. I will surely look for a better way.

> I don't want to compromise too much on the readability of the .txt file,
> though, so if there isn't a good way to strike this balance, then I
> trust you and think we should leave it as you have modified things here.

This is one of the main reasons why I removed those blank lines and
other stuff. It is the minimum change needed to fix the html doc. But I
will look more into it.

Thanks :)

^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v2 3/3] bitmap-format.txt: add information for trailing checksum 2022-06-07 17:43 ` [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Abhradeep Chakraborty via GitGitGadget 2022-06-07 17:43 ` [PATCH v2 1/3] bitmap-format.txt: feed the file to asciidoc to generate html Abhradeep Chakraborty via GitGitGadget 2022-06-07 17:43 ` [PATCH v2 2/3] bitmap-format.txt: fix some formatting issues Abhradeep Chakraborty via GitGitGadget @ 2022-06-07 17:43 ` Abhradeep Chakraborty via GitGitGadget 2022-06-07 20:56 ` Taylor Blau 2022-06-07 18:28 ` [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Junio C Hamano 2022-06-10 10:54 ` [PATCH v3 " Abhradeep Chakraborty via GitGitGadget 4 siblings, 1 reply; 37+ messages in thread From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-07 17:43 UTC (permalink / raw) To: git Cc: Taylor Blau, Vicent Marti, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty, Abhradeep Chakraborty From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> Bitmap file has a trailing checksum at the end of the file. However there is no information in the bitmap-format documentation about it. Add a trailer section to include the trailing checksum info in the `Documentation/technical/bitmap-format.txt` file. Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> --- Documentation/technical/bitmap-format.txt | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt index f22669b5916..a43d2fe2bbf 100644 --- a/Documentation/technical/bitmap-format.txt +++ b/Documentation/technical/bitmap-format.txt @@ -125,6 +125,10 @@ MIDXs, both the bit-cache and rev-cache extensions are required. ** The compressed bitmap itself, see Appendix A. + * TRAILER: + + Index checksum of the above contents. It is a 20-byte SHA1 checksum. 
+ == Appendix A: Serialization format for an EWAH bitmap Ewah bitmaps are serialized in the same protocol as the JAVAEWAH -- gitgitgadget ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH v2 3/3] bitmap-format.txt: add information for trailing checksum 2022-06-07 17:43 ` [PATCH v2 3/3] bitmap-format.txt: add information for trailing checksum Abhradeep Chakraborty via GitGitGadget @ 2022-06-07 20:56 ` Taylor Blau 2022-06-08 16:15 ` Abhradeep Chakraborty 0 siblings, 1 reply; 37+ messages in thread From: Taylor Blau @ 2022-06-07 20:56 UTC (permalink / raw) To: Abhradeep Chakraborty via GitGitGadget Cc: git, Vicent Marti, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty On Tue, Jun 07, 2022 at 05:43:34PM +0000, Abhradeep Chakraborty via GitGitGadget wrote: > From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> > > Bitmap file has a trailing checksum at the end of the file. However > there is no information in the bitmap-format documentation about it. > > Add a trailer section to include the trailing checksum info in the > `Documentation/technical/bitmap-format.txt` file. > > Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> > --- > Documentation/technical/bitmap-format.txt | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt > index f22669b5916..a43d2fe2bbf 100644 > --- a/Documentation/technical/bitmap-format.txt > +++ b/Documentation/technical/bitmap-format.txt > @@ -125,6 +125,10 @@ MIDXs, both the bit-cache and rev-cache extensions are required. > > ** The compressed bitmap itself, see Appendix A. > > + * TRAILER: > + > + Index checksum of the above contents. It is a 20-byte SHA1 checksum. > + I assume by "Index checksum" you are referring to a checksum of the bitmap _index_'s contents. That term is used a little throughout pack-format.txt, but it's foreign to me. Assuming that's how you meant it, a more conventional term (I think) would be just "trailing checksum". It is also not guaranteed to be a SHA-1 checksum, if the repository which wrote the bitmap is in SHA-256 mode. 
So I would suggest that this addition just read: * TRAILER: Trailing checksum of the preceding contents. Thanks, Taylor ^ permalink raw reply [flat|nested] 37+ messages in thread
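The point about the trailer not being tied to SHA-1 can be illustrated with a small check. This is a hedged sketch, assuming the trailer is simply the hash of every byte that precedes it, computed with whichever hash the repository uses; the function name `verify_trailer` and its `algo` parameter are hypothetical and not part of Git.

```python
import hashlib


def verify_trailer(data, algo="sha1"):
    """Check that the last bytes of `data` hash the preceding contents.

    `algo` is "sha1" (20-byte trailer) or "sha256" (32-byte trailer);
    which one applies depends on the repository's object format.
    """
    hash_len = hashlib.new(algo).digest_size
    if len(data) < hash_len:
        raise ValueError("file is shorter than the trailing checksum")
    body, trailer = data[:-hash_len], data[-hash_len:]
    return hashlib.new(algo, body).digest() == trailer
```

Deriving the trailer length from the algorithm, rather than hard-coding 20, is what makes wording such as "Trailing checksum of the preceding contents" accurate for both hash modes.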
* Re: [PATCH v2 3/3] bitmap-format.txt: add information for trailing checksum
  2022-06-07 20:56 ` Taylor Blau
@ 2022-06-08 16:15 ` Abhradeep Chakraborty
  0 siblings, 0 replies; 37+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-08 16:15 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty, Git, Junio C Hamano, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> I assume by "Index checksum" you are referring to a checksum of the
> bitmap _index_'s contents.

Yeah, I meant a checksum of the bitmap file's contents.

> That term is used a little throughout
> pack-format.txt, but it's foreign to me. Assuming that's how you meant
> it, a more conventional term (I think) would be just "trailing
> checksum".

Actually, I copy-pasted it from the pack-format.txt file ;). I will
surely follow your suggestions.

> It is also not guaranteed to be a SHA-1 checksum, if the repository
> which wrote the bitmap is in SHA-256 mode. So I would suggest that this
> addition just read:
>
> * TRAILER:
>
> Trailing checksum of the preceding contents.

Got it. Thanks!

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info 2022-06-07 17:43 ` [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Abhradeep Chakraborty via GitGitGadget ` (2 preceding siblings ...) 2022-06-07 17:43 ` [PATCH v2 3/3] bitmap-format.txt: add information for trailing checksum Abhradeep Chakraborty via GitGitGadget @ 2022-06-07 18:28 ` Junio C Hamano 2022-06-07 20:58 ` Taylor Blau 2022-06-07 21:00 ` Junio C Hamano 2022-06-10 10:54 ` [PATCH v3 " Abhradeep Chakraborty via GitGitGadget 4 siblings, 2 replies; 37+ messages in thread From: Junio C Hamano @ 2022-06-07 18:28 UTC (permalink / raw) To: Abhradeep Chakraborty via GitGitGadget Cc: git, Taylor Blau, Vicent Marti, Kaartic Sivaraam, Derrick Stolee, Abhradeep Chakraborty "Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com> writes: > There are some issues in the bitmap-format html page. "First, it does not even exist!" before anything else ;-) > For example, some > nested lists are shown as top-level lists (e.g. [1]- Here > BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as > top-level list). There is also a need of adding info about trailing checksum > in the docs. > > Changes since v1: > > * a new commit addressing bitmap-format.txt html page generation is added Good. > * Remove extra indentation from the previous change Good. > * elaborate more about the trailing checksum (as suggested by Kaartic) Good. Will take a look (and audiences are requested to do so, too). Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info 2022-06-07 18:28 ` [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Junio C Hamano @ 2022-06-07 20:58 ` Taylor Blau 2022-06-07 21:00 ` Junio C Hamano 1 sibling, 0 replies; 37+ messages in thread From: Taylor Blau @ 2022-06-07 20:58 UTC (permalink / raw) To: Junio C Hamano Cc: Abhradeep Chakraborty via GitGitGadget, git, Vicent Marti, Kaartic Sivaraam, Derrick Stolee, Abhradeep Chakraborty On Tue, Jun 07, 2022 at 11:28:17AM -0700, Junio C Hamano wrote: > Will take a look (and audiences are requested to do so, too). I think this is on a good track. The rendered HTML still has much of its content inside of <pre> elements, but that may be an acceptable trade-off to maintain readability of the source material. If there's a way to make the rendered page more appealing without compromising on the readability of the source, I'd be in favor of that. But I trust Abhradeep's judgement here, so if there isn't, I'd be happy with the series (mostly) as-is. I left a textual suggestion on the third patch, which I'd like to adopt before picking this up (this will also give Abhradeep a chance to investigate the formatting improvements on patch 2/3). In the meantime, it's probably safe to drop Vicent Martí from the CC list, since he is no longer working on Git (though I miss him very much!). Thanks, Taylor ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info 2022-06-07 18:28 ` [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Junio C Hamano 2022-06-07 20:58 ` Taylor Blau @ 2022-06-07 21:00 ` Junio C Hamano 2022-06-08 17:12 ` Abhradeep Chakraborty 1 sibling, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2022-06-07 21:00 UTC (permalink / raw) To: Abhradeep Chakraborty via GitGitGadget Cc: git, Taylor Blau, Vicent Marti, Kaartic Sivaraam, Derrick Stolee, Abhradeep Chakraborty Junio C Hamano <gitster@pobox.com> writes: > "Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com> > writes: > >> There are some issues in the bitmap-format html page. > > "First, it does not even exist!" before anything else ;-) > >> For example, some >> nested lists are shown as top-level lists (e.g. [1]- Here >> BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as >> top-level list). There is also a need of adding info about trailing checksum >> in the docs. > ... No, this is not quite ready for production. Almost all the "indented" material are shown in fixed-width typewriter format in the resulting HTML output. Look how ugly the output from it is. Not your fault; it is mostly because when the original text was written, it was not even meant to be given to AsciiDoc. https://twitter.com/jch2355/status/1534276427607986178/photo/1 https://pbs.twimg.com/media/FUrYP2nakAAnRaH?format=png And as I already said, removal of the blank lines made it harder to see what is going on in the source, and because the output is pretty much straight copy of the source in the fixed-font, just like reading the source in the terminal, the output here is equally hard to read. 
https://twitter.com/jch2355/status/1534277664441511937/photo/1 https://pbs.twimg.com/media/FUrZZXUUsAEmEeT?format=png If we really want to give it to AsciiDoc, we'd need to reformat it more extensively, not just tweak it on the surface and making an equivalent of <pre>...</pre> slightly easier to read, which is what this patch does. Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info
  2022-06-07 21:00 ` Junio C Hamano
@ 2022-06-08 17:12 ` Abhradeep Chakraborty
  0 siblings, 0 replies; 37+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-08 17:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Abhradeep Chakraborty, Git, Taylor Blau, Kaartic Sivaraam, Derrick Stolee

Junio C Hamano <gitster@pobox.com> wrote:

> No, this is not quite ready for production.
>
> Almost all the "indented" material are shown in fixed-width
> typewriter format in the resulting HTML output.
>
> Look how ugly the output from it is. Not your fault; it is mostly
> because when the original text was written, it was not even meant to
> be given to AsciiDoc.

Actually, I am wondering how git-scm.com is able to produce an html
page for bitmap-format.txt (if it is not being passed to asciidoc). The
design of the asciidoc-generated html pages from `make docs` is not the
same as the design of the production html pages. Probably, production
uses some extra css to beautify the asciidoc-generated html files. So
the generated html file (production version) is not as bad as the
locally built one. I need some understanding of how git-scm works,
though, to verify this.

If you look at other locally built html pages, they look similar to the
bitmap-format html page. But in production, they are beautiful enough.

By the way, I forgot to mention that
https://git-scm.com/docs/pack-format#_original_version_1_pack_idx_files_have_the_following_format
also has some weird formatting issues. See the <pre> block after the
pack-idx structure drawing. There are other issues as well (such as
unnecessary indentation, e.g. here[1], the second block under "The
header is followed by number of object entries....").

Thanks :)

[1] https://git-scm.com/docs/pack-format#_pack_pack_files_have_the_following_format

^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v3 0/3] bitmap-format.txt: fix some formatting issues and include checksum info 2022-06-07 17:43 ` [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Abhradeep Chakraborty via GitGitGadget ` (3 preceding siblings ...) 2022-06-07 18:28 ` [PATCH v2 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Junio C Hamano @ 2022-06-10 10:54 ` Abhradeep Chakraborty via GitGitGadget 2022-06-10 10:54 ` [PATCH v3 1/3] bitmap-format.txt: feed the file to asciidoc to generate html Abhradeep Chakraborty via GitGitGadget ` (4 more replies) 4 siblings, 5 replies; 37+ messages in thread From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-10 10:54 UTC (permalink / raw) To: git Cc: Taylor Blau, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty There are some issues in the bitmap-format html page. For example, some nested lists are shown as top-level lists (e.g. [1]- Here BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as top-level list). There is also a need of adding info about trailing checksum in the docs. Changes since v2: The last two commits are updated to address the suggestions. These changes are - * previously omitted blank lines are re-added. In the updated commit, use of <pre> blocks are decreased. Description lists and + are used instead to add more than one paragraphs under lists. Readability of the source text might decrease due to the use of +. But other documentation files (e.g. git-add.txt) also use it to connect two paragraphs. So, I hope this is acceptable. 
* Information about trailing checksum is updated (as suggested by Taylor) Changes since v1: * a new commit addressing bitmap-format.txt html page generation is added * Remove extra indentation from the previous change * elaborate more about the trailing checksum (as suggested by Kaartic) initial version: * first commit fixes some formatting issues * information about trailing checksum in the bitmap file is added in the bitmap-format doc. [1] https://git-scm.com/docs/bitmap-format#_on_disk_format Abhradeep Chakraborty (3): bitmap-format.txt: feed the file to asciidoc to generate html bitmap-format.txt: fix some formatting issues bitmap-format.txt: add information for trailing checksum Documentation/Makefile | 1 + Documentation/technical/bitmap-format.txt | 113 ++++++++++++---------- 2 files changed, 63 insertions(+), 51 deletions(-) base-commit: 2668e3608e47494f2f10ef2b6e69f08a84816bcb Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1246%2FAbhra303%2Ffix-doc-formatting-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1246/Abhra303/fix-doc-formatting-v3 Pull-Request: https://github.com/gitgitgadget/git/pull/1246 Range-diff vs v2: 1: a1b9bd9af90 = 1: a1b9bd9af90 bitmap-format.txt: feed the file to asciidoc to generate html 2: cb919513c14 ! 2: c74b9a52c2a bitmap-format.txt: fix some formatting issues @@ Commit message format.txt` is broken. This is mainly because `-` is used for nested lists (which is not allowed in asciidoc) instead of `*`. - Fix these and also reformat it (e.g. removing some blank lines) for - better readability of the html page. + Fix these and also reformat it for better readability of the html page. 
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cac - - A header appears at the beginning: + * A header appears at the beginning: - 4-byte signature: {'B', 'I', 'T', 'M'} +- 4-byte signature: {'B', 'I', 'T', 'M'} ++ 4-byte signature: :: {'B', 'I', 'T', 'M'} ++ ++ 2-byte version number (network byte order): :: -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. +- 2-byte version number (network byte order) + The current implementation only supports version 1 of the bitmap index (the same one as JGit). - 2-byte flags (network byte order) -- +- 2-byte flags (network byte order) ++ 2-byte flags (network byte order): :: + The following flags are supported: -- - - BITMAP_OPT_FULL_DAG (0x1) REQUIRED + +- - BITMAP_OPT_FULL_DAG (0x1) REQUIRED ++ ** {empty} ++ BITMAP_OPT_FULL_DAG (0x1) REQUIRED: ::: ++ This flag must always be present. It implies that the bitmap index has been generated for a packfile or + multi-pack index (MIDX) with full closure (i.e. where @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. - requirement for the bitmap index format, also present in JGit, that greatly reduces the complexity of the implementation. -- - - BITMAP_OPT_HASH_CACHE (0x4) + +- - BITMAP_OPT_HASH_CACHE (0x4) ++ ** {empty} ++ BITMAP_OPT_HASH_CACHE (0x4): ::: ++ If present, the end of the bitmap file contains `N` 32-bit name-hash values, one per object in the -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. + pack/MIDX. The format and meaning of the name-hash is described below. - 4-byte entry count (network byte order) +- 4-byte entry count (network byte order) - ++ 4-byte entry count (network byte order): :: The total count of entries (bitmapped commits) in this bitmap index. 
- 20-byte checksum +- 20-byte checksum - ++ 20-byte checksum: :: The SHA1 checksum of the pack/MIDX this bitmap index belongs to. - - 4 EWAH bitmaps that act as type indexes -+ * 4 EWAH bitmaps that act as type indexes - - Type indexes are serialized after the hash cache in the shape - of four EWAH bitmaps stored consecutively (see Appendix A for -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. - - There is a bitmap for each Git object type, stored in the following - order: - - - Commits - - Trees - - Blobs -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. - in a full set (all bits set), and the AND of all 4 bitmaps will - result in an empty bitmap (no bits set). - +- Type indexes are serialized after the hash cache in the shape +- of four EWAH bitmaps stored consecutively (see Appendix A for +- the serialization format of an EWAH bitmap). +- +- There is a bitmap for each Git object type, stored in the following +- order: +- +- - Commits +- - Trees +- - Blobs +- - Tags +- +- In each bitmap, the `n`th bit is set to true if the `n`th object +- in the packfile or multi-pack index is of that type. +- +- The obvious consequence is that the OR of all 4 bitmaps will result +- in a full set (all bits set), and the AND of all 4 bitmaps will +- result in an empty bitmap (no bits set). +- - - N entries with compressed bitmaps, one for each indexed commit -+ * N entries with compressed bitmaps, one for each indexed commit - - Where `N` is the total amount of entries in this bitmap index. - Each entry contains the following: - +- +- Where `N` is the total amount of entries in this bitmap index. 
+- Each entry contains the following: +- - - 4-byte object position (network byte order) -+ ** 4-byte object position (network byte order) ++ * 4 EWAH bitmaps that act as type indexes +++ ++Type indexes are serialized after the hash cache in the shape ++of four EWAH bitmaps stored consecutively (see Appendix A for ++the serialization format of an EWAH bitmap). +++ ++There is a bitmap for each Git object type, stored in the following ++order: +++ ++ - Commits ++ - Trees ++ - Blobs ++ - Tags ++ +++ ++In each bitmap, the `n`th bit is set to true if the `n`th object ++in the packfile or multi-pack index is of that type. ++ ++ The obvious consequence is that the OR of all 4 bitmaps will result ++ in a full set (all bits set), and the AND of all 4 bitmaps will ++ result in an empty bitmap (no bits set). ++ ++ * N entries with compressed bitmaps, one for each indexed commit +++ ++Where `N` is the total amount of entries in this bitmap index. ++Each entry contains the following: ++ ++ ** {empty} ++ 4-byte object position (network byte order): :: The position **in the index for the packfile or multi-pack index** where the bitmap for this commit is found. - - 1-byte XOR-offset -+ ** 1-byte XOR-offset ++ ** {empty} ++ 1-byte XOR-offset: :: The xor offset used to compress this bitmap. For an entry in position `x`, a XOR offset of `y` means that the actual bitmap representing this commit is composed by XORing the -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. - with **previous** bitmaps, not bitmaps that will come afterwards - in the index. - + bitmap for this entry with the bitmap in entry `x-y` (i.e. + the bitmap `y` entries before this one). +- +- Note that this compression can be recursive. In order to +- XOR this entry with a previous one, the previous entry needs +- to be decompressed first, and so on. 
+- +- The hard-limit for this offset is 160 (an entry can only be +- xor'ed against one of the 160 entries preceding it). This +- number is always positive, and hence entries are always xor'ed +- with **previous** bitmaps, not bitmaps that will come afterwards +- in the index. +- - - 1-byte flags for this bitmap -+ ** 1-byte flags for this bitmap +++ ++NOTE: This compression can be recursive. In order to ++XOR this entry with a previous one, the previous entry needs ++to be decompressed first, and so on. +++ ++The hard-limit for this offset is 160 (an entry can only be ++xor'ed against one of the 160 entries preceding it). This ++number is always positive, and hence entries are always xor'ed ++with **previous** bitmaps, not bitmaps that will come afterwards ++in the index. ++ ++ ** {empty} ++ 1-byte flags for this bitmap: :: At the moment the only available flag is `0x1`, which hints that this bitmap can be re-used when rebuilding bitmap indexes for the repository. 3: 2171d31fb2b ! 3: b971558e1cb bitmap-format.txt: add information for trailing checksum @@ Commit message Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> ## Documentation/technical/bitmap-format.txt ## -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. +@@ Documentation/technical/bitmap-format.txt: in the index. ** The compressed bitmap itself, see Appendix A. -+ * TRAILER: -+ -+ Index checksum of the above contents. It is a 20-byte SHA1 checksum. ++ * {empty} ++ TRAILER: :: ++ Trailing checksum of the preceding contents. + == Appendix A: Serialization format for an EWAH bitmap -- gitgitgadget ^ permalink raw reply [flat|nested] 37+ messages in thread
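The XOR-offset rule quoted in the range-diff above can be sketched as follows. This is only an illustration, not Git's implementation: it assumes each entry's bitmap has already been EWAH-decoded into a Python integer, that entries are supplied in file order, and that an offset of 0 means the entry is stored without XOR compression (the text above does not spell that case out). The names `resolve_bitmaps` and `MAX_XOR_OFFSET` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hard limit from the format text: an entry can only be XORed against
# one of the 160 entries preceding it.
MAX_XOR_OFFSET = 160


def resolve_bitmaps(entries: Sequence[Tuple[int, int]]) -> List[int]:
    """Turn stored (xor_offset, bits) pairs into the actual bitmaps.

    `bits` is an already-decoded bitmap held as an int (an assumption
    made for this sketch). Entries are processed in file order, so the
    entry at position x - y is fully resolved before entry x needs it,
    which handles the recursive case described in the NOTE.
    """
    resolved: List[int] = []
    for x, (xor_offset, stored) in enumerate(entries):
        if xor_offset < 0 or xor_offset > MAX_XOR_OFFSET or xor_offset > x:
            raise ValueError(f"entry {x}: invalid XOR offset {xor_offset}")
        if xor_offset == 0:
            # Assumed meaning: the bitmap is stored as-is.
            resolved.append(stored)
        else:
            # XOR with the bitmap `xor_offset` entries before this one.
            resolved.append(stored ^ resolved[x - xor_offset])
    return resolved
```

Because offsets only ever point backwards, a single forward pass is enough; no explicit recursion is needed.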
* [PATCH v3 1/3] bitmap-format.txt: feed the file to asciidoc to generate html
  2022-06-10 10:54 ` [PATCH v3 " Abhradeep Chakraborty via GitGitGadget
@ 2022-06-10 10:54 ` Abhradeep Chakraborty via GitGitGadget
  2022-06-10 10:54 ` [PATCH v3 2/3] bitmap-format.txt: fix some formatting issues Abhradeep Chakraborty via GitGitGadget
  ` (3 subsequent siblings)
  4 siblings, 0 replies; 37+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-10 10:54 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano,
      Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Documentation/Makefile does not include bitmap-format.txt to generate
a html page using asciidoc. Teach Documentation/Makefile to also
generate a html page for Documentation/technical/bitmap-format.txt
file.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index d3f043f50d2..8d405a14330 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -94,6 +94,7 @@ TECH_DOCS += MyFirstContribution
 TECH_DOCS += MyFirstObjectWalk
 TECH_DOCS += SubmittingPatches
 TECH_DOCS += ToolsForGit
+TECH_DOCS += technical/bitmap-format
 TECH_DOCS += technical/bundle-format
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
--
gitgitgadget

^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v3 2/3] bitmap-format.txt: fix some formatting issues 2022-06-10 10:54 ` [PATCH v3 " Abhradeep Chakraborty via GitGitGadget 2022-06-10 10:54 ` [PATCH v3 1/3] bitmap-format.txt: feed the file to asciidoc to generate html Abhradeep Chakraborty via GitGitGadget @ 2022-06-10 10:54 ` Abhradeep Chakraborty via GitGitGadget 2022-06-15 2:27 ` Taylor Blau 2022-06-10 10:54 ` [PATCH v3 3/3] bitmap-format.txt: add information for trailing checksum Abhradeep Chakraborty via GitGitGadget ` (2 subsequent siblings) 4 siblings, 1 reply; 37+ messages in thread From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-10 10:54 UTC (permalink / raw) To: git Cc: Taylor Blau, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty, Abhradeep Chakraborty From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> The asciidoc generated html for `Documentation/technical/bitmap- format.txt` is broken. This is mainly because `-` is used for nested lists (which is not allowed in asciidoc) instead of `*`. Fix these and also reformat it for better readability of the html page. Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> --- Documentation/technical/bitmap-format.txt | 109 ++++++++++++---------- 1 file changed, 58 insertions(+), 51 deletions(-) diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt index 04b3ec21785..cd621379f42 100644 --- a/Documentation/technical/bitmap-format.txt +++ b/Documentation/technical/bitmap-format.txt @@ -39,19 +39,22 @@ MIDXs, both the bit-cache and rev-cache extensions are required. == On-disk format - - A header appears at the beginning: + * A header appears at the beginning: - 4-byte signature: {'B', 'I', 'T', 'M'} + 4-byte signature: :: {'B', 'I', 'T', 'M'} + + 2-byte version number (network byte order): :: - 2-byte version number (network byte order) The current implementation only supports version 1 of the bitmap index (the same one as JGit). 
- 2-byte flags (network byte order) + 2-byte flags (network byte order): :: The following flags are supported: - - BITMAP_OPT_FULL_DAG (0x1) REQUIRED + ** {empty} + BITMAP_OPT_FULL_DAG (0x1) REQUIRED: ::: + This flag must always be present. It implies that the bitmap index has been generated for a packfile or multi-pack index (MIDX) with full closure (i.e. where @@ -61,75 +64,79 @@ MIDXs, both the bit-cache and rev-cache extensions are required. JGit, that greatly reduces the complexity of the implementation. - - BITMAP_OPT_HASH_CACHE (0x4) + ** {empty} + BITMAP_OPT_HASH_CACHE (0x4): ::: + If present, the end of the bitmap file contains `N` 32-bit name-hash values, one per object in the pack/MIDX. The format and meaning of the name-hash is described below. - 4-byte entry count (network byte order) - + 4-byte entry count (network byte order): :: The total count of entries (bitmapped commits) in this bitmap index. - 20-byte checksum - + 20-byte checksum: :: The SHA1 checksum of the pack/MIDX this bitmap index belongs to. - - 4 EWAH bitmaps that act as type indexes - - Type indexes are serialized after the hash cache in the shape - of four EWAH bitmaps stored consecutively (see Appendix A for - the serialization format of an EWAH bitmap). - - There is a bitmap for each Git object type, stored in the following - order: - - - Commits - - Trees - - Blobs - - Tags - - In each bitmap, the `n`th bit is set to true if the `n`th object - in the packfile or multi-pack index is of that type. - - The obvious consequence is that the OR of all 4 bitmaps will result - in a full set (all bits set), and the AND of all 4 bitmaps will - result in an empty bitmap (no bits set). - - - N entries with compressed bitmaps, one for each indexed commit - - Where `N` is the total amount of entries in this bitmap index. 
- Each entry contains the following: - - - 4-byte object position (network byte order) + * 4 EWAH bitmaps that act as type indexes ++ +Type indexes are serialized after the hash cache in the shape +of four EWAH bitmaps stored consecutively (see Appendix A for +the serialization format of an EWAH bitmap). ++ +There is a bitmap for each Git object type, stored in the following +order: ++ + - Commits + - Trees + - Blobs + - Tags + ++ +In each bitmap, the `n`th bit is set to true if the `n`th object +in the packfile or multi-pack index is of that type. + + The obvious consequence is that the OR of all 4 bitmaps will result + in a full set (all bits set), and the AND of all 4 bitmaps will + result in an empty bitmap (no bits set). + + * N entries with compressed bitmaps, one for each indexed commit ++ +Where `N` is the total amount of entries in this bitmap index. +Each entry contains the following: + + ** {empty} + 4-byte object position (network byte order): :: The position **in the index for the packfile or multi-pack index** where the bitmap for this commit is found. - - 1-byte XOR-offset + ** {empty} + 1-byte XOR-offset: :: The xor offset used to compress this bitmap. For an entry in position `x`, a XOR offset of `y` means that the actual bitmap representing this commit is composed by XORing the bitmap for this entry with the bitmap in entry `x-y` (i.e. the bitmap `y` entries before this one). - - Note that this compression can be recursive. In order to - XOR this entry with a previous one, the previous entry needs - to be decompressed first, and so on. - - The hard-limit for this offset is 160 (an entry can only be - xor'ed against one of the 160 entries preceding it). This - number is always positive, and hence entries are always xor'ed - with **previous** bitmaps, not bitmaps that will come afterwards - in the index. - - - 1-byte flags for this bitmap ++ +NOTE: This compression can be recursive. 
In order to +XOR this entry with a previous one, the previous entry needs +to be decompressed first, and so on. ++ +The hard-limit for this offset is 160 (an entry can only be +xor'ed against one of the 160 entries preceding it). This +number is always positive, and hence entries are always xor'ed +with **previous** bitmaps, not bitmaps that will come afterwards +in the index. + + ** {empty} + 1-byte flags for this bitmap: :: At the moment the only available flag is `0x1`, which hints that this bitmap can be re-used when rebuilding bitmap indexes for the repository. - - The compressed bitmap itself, see Appendix A. + ** The compressed bitmap itself, see Appendix A. == Appendix A: Serialization format for an EWAH bitmap -- gitgitgadget ^ permalink raw reply related [flat|nested] 37+ messages in thread
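The header layout documented in the patch above (4-byte signature, 2-byte version, 2-byte flags, 4-byte entry count, 20-byte checksum, all in network byte order) can be read with a short sketch like the one below. It is not taken from Git's source: `parse_header` and `BitmapHeader` are hypothetical names, and the fixed 20-byte checksum assumes a SHA-1 repository, as the documentation text does.

```python
import struct
from typing import NamedTuple

# Flag values as listed in the documentation text.
BITMAP_OPT_FULL_DAG = 0x1
BITMAP_OPT_HASH_CACHE = 0x4

# ">" selects big-endian (network byte order) with no padding:
# signature (4s), version (H), flags (H), entry count (I), checksum (20s).
HEADER = struct.Struct(">4sHHI20s")


class BitmapHeader(NamedTuple):
    version: int
    flags: int
    entry_count: int
    pack_checksum: bytes


def parse_header(data: bytes) -> BitmapHeader:
    """Parse and sanity-check the fixed-size header at the start of `data`."""
    if len(data) < HEADER.size:
        raise ValueError("truncated bitmap header")
    signature, version, flags, entry_count, checksum = HEADER.unpack_from(data)
    if signature != b"BITM":
        raise ValueError("bad signature")
    if version != 1:
        raise ValueError(f"unsupported bitmap version {version}")
    if not flags & BITMAP_OPT_FULL_DAG:
        raise ValueError("BITMAP_OPT_FULL_DAG is required but not set")
    return BitmapHeader(version, flags, entry_count, checksum)
```

A caller could then test `header.flags & BITMAP_OPT_HASH_CACHE` to decide whether to look for the name-hash cache near the end of the file.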
* Re: [PATCH v3 2/3] bitmap-format.txt: fix some formatting issues
  2022-06-10 10:54 ` [PATCH v3 2/3] bitmap-format.txt: fix some formatting issues Abhradeep Chakraborty via GitGitGadget
@ 2022-06-15  2:27 ` Taylor Blau
  2022-06-15 14:28 ` Abhradeep Chakraborty
  0 siblings, 1 reply; 37+ messages in thread
From: Taylor Blau @ 2022-06-15 2:27 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano,
      Abhradeep Chakraborty

Hi Abhradeep,

On Fri, Jun 10, 2022 at 10:54:40AM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> ++
> +In each bitmap, the `n`th bit is set to true if the `n`th object
> +in the packfile or multi-pack index is of that type.
> +
> +	The obvious consequence is that the OR of all 4 bitmaps will result
> +	in a full set (all bits set), and the AND of all 4 bitmaps will
> +	result in an empty bitmap (no bits set).
> +
> +	* N entries with compressed bitmaps, one for each indexed commit
> ++
> +Where `N` is the total amount of entries in this bitmap index.
> +Each entry contains the following:

The new formatting looks terrific; it's much easier to read this in my
browser after generating the HTML version of these docs.

Two questions:

  - Are the hard-tabs added in this file required for ASCIIDoc to treat it
    correctly? They are a slight impediment to reading the source in my
    editor, but it's not a huge deal. It would just be nice if we could
    replace "\t" characters with two or four spaces or something.

  - The above hunk is the only one which rendered slightly oddly to me; it
    looks like the paragraph beginning with "The obvious consequence ..."
    is surrounded by a <pre> element, when it should be a continuation of
    the above paragraph ("In each bitmap ...").

Otherwise, this series is looking great. Let me know what you think!

Thanks,
Taylor

^ permalink raw reply [flat|nested] 37+ messages in thread
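The "obvious consequence" paragraph quoted in this reply can be checked mechanically. The sketch below is only an illustration under stated assumptions: the four type bitmaps are taken as already-decoded Python integers (bit `n` stands for the `n`th object), and `type_indexes_consistent` is a hypothetical helper, not a function from Git.

```python
def type_indexes_consistent(commits: int, trees: int, blobs: int,
                            tags: int, num_objects: int) -> bool:
    """Check the two properties stated in the documentation.

    The OR of the four type bitmaps must have every one of the
    `num_objects` bits set, and the AND of the four must be empty.
    """
    full = (1 << num_objects) - 1  # bitmap with all object bits set
    union = commits | trees | blobs | tags
    intersection = commits & trees & blobs & tags
    return union == full and intersection == 0
```

Since every object has exactly one type, a stricter check would also require the four bitmaps to be pairwise disjoint; the function above only tests what the quoted paragraph states.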
* Re: [PATCH v3 2/3] bitmap-format.txt: fix some formatting issues
  2022-06-15  2:27 ` Taylor Blau
@ 2022-06-15 14:28 ` Abhradeep Chakraborty
  0 siblings, 0 replies; 37+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-15 14:28 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty, Git, Junio C Hamano, Kaartic Sivaraam,
      Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> - Are the hard-tabs added in this file required for ASCIIDoc to treat it
>   correctly? They are a slight impediment to reading the source in my
>   editor, but it's not a huge deal. It would just be nice if we could
>   replace "\t" characters with two or four spaces or something.

No, they are not required for Asciidoc. But `git diff --check` was
complaining about the spaces. I don't know if that is related to my git
configuration settings. Moreover, other parts of the file didn't seem
to use spaces. For these reasons, I used tabs. But I can remove them if
you say so.

> - The above hunk is the only one which rendered slightly oddly to me; it
>   looks like the paragraph beginning with "The obvious consequence ..."
>   is surrounded by a <pre> element, when it should be a continuation of
>   the above paragraph ("In each bitmap ...").

Thanks for pointing that out. I don't know how it was missed. I am
correcting it.

Thanks :)

^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v3 3/3] bitmap-format.txt: add information for trailing checksum
  2022-06-10 10:54 ` [PATCH v3 " Abhradeep Chakraborty via GitGitGadget
  2022-06-10 10:54 ` [PATCH v3 1/3] bitmap-format.txt: feed the file to asciidoc to generate html Abhradeep Chakraborty via GitGitGadget
  2022-06-10 10:54 ` [PATCH v3 2/3] bitmap-format.txt: fix some formatting issues Abhradeep Chakraborty via GitGitGadget
@ 2022-06-10 10:54 ` Abhradeep Chakraborty via GitGitGadget
  2022-06-10 17:01 ` [PATCH v3 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Junio C Hamano
  2022-06-16  5:03 ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
  4 siblings, 0 replies; 37+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-10 10:54 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano,
      Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Bitmap file has a trailing checksum at the end of the file. However
there is no information in the bitmap-format documentation about it.

Add a trailer section to include the trailing checksum info in the
`Documentation/technical/bitmap-format.txt` file.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/technical/bitmap-format.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index cd621379f42..3f8cdd0ed91 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -138,6 +138,10 @@ in the index.
 
 	** The compressed bitmap itself, see Appendix A.
 
+	* {empty}
+	TRAILER: ::
+		Trailing checksum of the preceding contents.
+
 == Appendix A: Serialization format for an EWAH bitmap
 
 Ewah bitmaps are serialized in the same protocol as the JAVAEWAH
--
gitgitgadget

^ permalink raw reply related [flat|nested] 37+ messages in thread
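To make the TRAILER entry concrete, here is a minimal sketch of how a reader might verify it. It assumes a SHA-1 repository, so that the trailer is a 20-byte SHA-1 over everything that precedes it (an earlier round of this series described it as "a 20-byte SHA1 checksum"). `verify_trailer` is a hypothetical helper, not Git's own code.

```python
import hashlib

# Assumption: SHA-1 repository, so the trailing checksum is 20 bytes long.
TRAILER_LEN = 20


def verify_trailer(data: bytes) -> bool:
    """Return True if the last 20 bytes are the SHA-1 of all preceding bytes."""
    if len(data) < TRAILER_LEN:
        return False
    body, trailer = data[:-TRAILER_LEN], data[-TRAILER_LEN:]
    return hashlib.sha1(body).digest() == trailer
```

For a real file this could be called as `verify_trailer(open(path, "rb").read())`, where `path` names a `.bitmap` file; a streaming hash would be preferable for very large files.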
* Re: [PATCH v3 0/3] bitmap-format.txt: fix some formatting issues and include checksum info
  2022-06-10 10:54 ` [PATCH v3 " Abhradeep Chakraborty via GitGitGadget
  ` (2 preceding siblings ...)
  2022-06-10 10:54 ` [PATCH v3 3/3] bitmap-format.txt: add information for trailing checksum Abhradeep Chakraborty via GitGitGadget
@ 2022-06-10 17:01 ` Junio C Hamano
  2022-06-15  2:28 ` Taylor Blau
  2022-06-16  5:03 ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
  4 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2022-06-10 17:01 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Taylor Blau, Kaartic Sivaraam, Derrick Stolee,
      Abhradeep Chakraborty

"Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> There are some issues in the bitmap-format html page. For example, some
> nested lists are shown as top-level lists (e.g. [1]- Here
> BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as
> top-level list). There is also a need of adding info about trailing checksum
> in the docs.

Quite honestly, I am not sure if a piecemeal "let's make
<pre>...</pre> a bit prettier" is worth our time. Especially
relative to the importance of adding missing information to the
documentation.

So, if this round (I haven't looked at the formatting changes at all
yet) turns out to be still not doing the HTML properly, I'd suggest
shuffling the patches around, add missing information so that readers
can get the corrections in text regardless of the rest of HTMLify
effort. We'll see.

Thanks.

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v3 0/3] bitmap-format.txt: fix some formatting issues and include checksum info
  2022-06-10 17:01 ` [PATCH v3 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Junio C Hamano
@ 2022-06-15  2:28 ` Taylor Blau
  2022-06-15 22:41 ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Taylor Blau @ 2022-06-15 2:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaraam,
      Derrick Stolee, Abhradeep Chakraborty

On Fri, Jun 10, 2022 at 10:01:02AM -0700, Junio C Hamano wrote:
> "Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > There are some issues in the bitmap-format html page. For example, some
> > nested lists are shown as top-level lists (e.g. [1]- Here
> > BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as
> > top-level list). There is also a need of adding info about trailing checksum
> > in the docs.
>
> Quite honestly, I am not sure if a piecemeal "let's make
> <pre>...</pre> a bit prettier" is worth our time. Especially
> relative to the importance of adding missing information to the
> documentation.
>
> So, if this round (I haven't looked at the formatting changes at all
> yet) turns out to be still not doing the HTML properly, I'd suggest
> shuffling the patches around, add missing information so that readers
> can get the corrections in text regardless of the rest of HTMLify
> effort. We'll see.

This version of the series significantly improves the readability of the
generated HTML, and I only had a minor comment or two.

So I think that the improvement is worthwhile, though if others disagree
strongly, the third patch should get picked up regardless, since it
addresses a legitimate gap in our documentation.

Thanks,
Taylor

^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v3 0/3] bitmap-format.txt: fix some formatting issues and include checksum info
  2022-06-15  2:28 ` Taylor Blau
@ 2022-06-15 22:41 ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2022-06-15 22:41 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaraam,
      Derrick Stolee, Abhradeep Chakraborty

Taylor Blau <me@ttaylorr.com> writes:

> This version of the series significantly improves the readability of the
> generated HTML, and I only had a minor comment or two.

Yeah, I looked at the output and it is improved so much to the point
that the remaining paragraph or two that are still typeset in the
fixed font incorrectly start to look even irritating ;-)

I've tentatively queued it in my tree. I doubt that the topic is
ultra-urgent so if the remaining mark-up issues can be fixed before
the topic hits 'next', that would be great.

Thanks, both.

^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH v4 0/3] bitmap-format.txt: fix some formatting issues and include checksum info 2022-06-10 10:54 ` [PATCH v3 " Abhradeep Chakraborty via GitGitGadget ` (3 preceding siblings ...) 2022-06-10 17:01 ` [PATCH v3 0/3] bitmap-format.txt: fix some formatting issues and include checksum info Junio C Hamano @ 2022-06-16 5:03 ` Abhradeep Chakraborty via GitGitGadget 2022-06-16 5:03 ` [PATCH v4 1/3] bitmap-format.txt: feed the file to asciidoc to generate html Abhradeep Chakraborty via GitGitGadget ` (3 more replies) 4 siblings, 4 replies; 37+ messages in thread From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-16 5:03 UTC (permalink / raw) To: git Cc: Taylor Blau, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty There are some issues in the bitmap-format html page. For example, some nested lists are shown as top-level lists (e.g. [1]- Here BITMAP_OPT_FULL_DAG (0x1) and BITMAP_OPT_HASH_CACHE (0x4) are shown as top-level list). There is also a need of adding info about trailing checksum in the docs. Changes since v3: * spaces are used instead of tabs * fixed remaining <pre> blocks Changes since v2: The last two commits are updated to address the suggestions. These changes are - * previously omitted blank lines are re-added. In the updated commit, use of <pre> blocks are decreased. Description lists and + are used instead to add more than one paragraphs under lists. Readability of the source text might decrease due to the use of +. But other documentation files (e.g. git-add.txt) also use it to connect two paragraphs. So, I hope this is acceptable. 
* Information about trailing checksum is updated (as suggested by Taylor) Changes since v1: * a new commit addressing bitmap-format.txt html page generation is added * Remove extra indentation from the previous change * elaborate more about the trailing checksum (as suggested by Kaartic) initial version: * first commit fixes some formatting issues * information about trailing checksum in the bitmap file is added in the bitmap-format doc. [1] https://git-scm.com/docs/bitmap-format#_on_disk_format Abhradeep Chakraborty (3): bitmap-format.txt: feed the file to asciidoc to generate html bitmap-format.txt: fix some formatting issues bitmap-format.txt: add information for trailing checksum Documentation/Makefile | 1 + Documentation/technical/bitmap-format.txt | 203 ++++++++++++---------- 2 files changed, 108 insertions(+), 96 deletions(-) base-commit: 5699ec1b0aec51b9e9ba5a2785f65970c5a95d84 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1246%2FAbhra303%2Ffix-doc-formatting-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1246/Abhra303/fix-doc-formatting-v4 Pull-Request: https://github.com/gitgitgadget/git/pull/1246 Range-diff vs v3: 1: a1b9bd9af90 ! 1: 494c1c1bd52 bitmap-format.txt: feed the file to asciidoc to generate html @@ Documentation/Makefile: TECH_DOCS += MyFirstContribution TECH_DOCS += ToolsForGit +TECH_DOCS += technical/bitmap-format TECH_DOCS += technical/bundle-format + TECH_DOCS += technical/cruft-packs TECH_DOCS += technical/hash-function-transition - TECH_DOCS += technical/http-protocol 2: c74b9a52c2a ! 
2: 25512aa9c5b bitmap-format.txt: fix some formatting issues @@ Commit message Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> ## Documentation/technical/bitmap-format.txt ## +@@ Documentation/technical/bitmap-format.txt: An object is uniquely described by its bit position within a bitmap: + is defined as follows: + + o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2) +- +- The ordering between packs is done according to the MIDX's .rev file. +- Notably, the preferred pack sorts ahead of all other packs. +++ ++The ordering between packs is done according to the MIDX's .rev file. ++Notably, the preferred pack sorts ahead of all other packs. + + The on-disk representation (described below) of a bitmap is the same regardless + of whether or not that bitmap belongs to a packfile or a MIDX. The only @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. == On-disk format - - A header appears at the beginning: -+ * A header appears at the beginning: - +- - 4-byte signature: {'B', 'I', 'T', 'M'} -+ 4-byte signature: :: {'B', 'I', 'T', 'M'} -+ -+ 2-byte version number (network byte order): :: - +- - 2-byte version number (network byte order) - The current implementation only supports version 1 - of the bitmap index (the same one as JGit). - +- The current implementation only supports version 1 +- of the bitmap index (the same one as JGit). +- - 2-byte flags (network byte order) -+ 2-byte flags (network byte order): :: - - The following flags are supported: - +- +- The following flags are supported: +- - - BITMAP_OPT_FULL_DAG (0x1) REQUIRED -+ ** {empty} -+ BITMAP_OPT_FULL_DAG (0x1) REQUIRED: ::: -+ - This flag must always be present. It implies that the - bitmap index has been generated for a packfile or - multi-pack index (MIDX) with full closure (i.e. where -@@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required. 
- JGit, that greatly reduces the complexity of the - implementation. - +- This flag must always be present. It implies that the +- bitmap index has been generated for a packfile or +- multi-pack index (MIDX) with full closure (i.e. where +- every single object in the packfile/MIDX can find its +- parent links inside the same packfile/MIDX). This is a +- requirement for the bitmap index format, also present in +- JGit, that greatly reduces the complexity of the +- implementation. +- - - BITMAP_OPT_HASH_CACHE (0x4) -+ ** {empty} -+ BITMAP_OPT_HASH_CACHE (0x4): ::: -+ - If present, the end of the bitmap file contains - `N` 32-bit name-hash values, one per object in the - pack/MIDX. The format and meaning of the name-hash is - described below. - +- If present, the end of the bitmap file contains +- `N` 32-bit name-hash values, one per object in the +- pack/MIDX. The format and meaning of the name-hash is +- described below. +- - 4-byte entry count (network byte order) - -+ 4-byte entry count (network byte order): :: - The total count of entries (bitmapped commits) in this bitmap index. - +- The total count of entries (bitmapped commits) in this bitmap index. +- - 20-byte checksum - -+ 20-byte checksum: :: - The SHA1 checksum of the pack/MIDX this bitmap index - belongs to. - +- The SHA1 checksum of the pack/MIDX this bitmap index +- belongs to. +- - - 4 EWAH bitmaps that act as type indexes - - Type indexes are serialized after the hash cache in the shape @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cac - Each entry contains the following: - - - 4-byte object position (network byte order) -+ * 4 EWAH bitmaps that act as type indexes +- The position **in the index for the packfile or +- multi-pack index** where the bitmap for this commit is +- found. +- +- - 1-byte XOR-offset +- The xor offset used to compress this bitmap. 
For an entry +- in position `x`, a XOR offset of `y` means that the actual +- bitmap representing this commit is composed by XORing the +- bitmap for this entry with the bitmap in entry `x-y` (i.e. +- the bitmap `y` entries before this one). +- +- Note that this compression can be recursive. In order to +- XOR this entry with a previous one, the previous entry needs +- to be decompressed first, and so on. +- +- The hard-limit for this offset is 160 (an entry can only be +- xor'ed against one of the 160 entries preceding it). This +- number is always positive, and hence entries are always xor'ed +- with **previous** bitmaps, not bitmaps that will come afterwards +- in the index. +- +- - 1-byte flags for this bitmap +- At the moment the only available flag is `0x1`, which hints +- that this bitmap can be re-used when rebuilding bitmap indexes +- for the repository. +- +- - The compressed bitmap itself, see Appendix A. ++ * A header appears at the beginning: ++ ++ 4-byte signature: :: {'B', 'I', 'T', 'M'} ++ ++ 2-byte version number (network byte order): :: ++ ++ The current implementation only supports version 1 ++ of the bitmap index (the same one as JGit). ++ ++ 2-byte flags (network byte order): :: ++ ++ The following flags are supported: ++ ++ ** {empty} ++ BITMAP_OPT_FULL_DAG (0x1) REQUIRED: ::: ++ ++ This flag must always be present. It implies that the ++ bitmap index has been generated for a packfile or ++ multi-pack index (MIDX) with full closure (i.e. where ++ every single object in the packfile/MIDX can find its ++ parent links inside the same packfile/MIDX). This is a ++ requirement for the bitmap index format, also present in ++ JGit, that greatly reduces the complexity of the ++ implementation. ++ ++ ** {empty} ++ BITMAP_OPT_HASH_CACHE (0x4): ::: ++ ++ If present, the end of the bitmap file contains ++ `N` 32-bit name-hash values, one per object in the ++ pack/MIDX. The format and meaning of the name-hash is ++ described below. 
++ ++ 4-byte entry count (network byte order): :: ++ The total count of entries (bitmapped commits) in this bitmap index. ++ ++ 20-byte checksum: :: ++ The SHA1 checksum of the pack/MIDX this bitmap index ++ belongs to. ++ ++ * 4 EWAH bitmaps that act as type indexes ++ +Type indexes are serialized after the hash cache in the shape +of four EWAH bitmaps stored consecutively (see Appendix A for @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cac +There is a bitmap for each Git object type, stored in the following +order: ++ -+ - Commits -+ - Trees -+ - Blobs -+ - Tags ++ - Commits ++ - Trees ++ - Blobs ++ - Tags + ++ +In each bitmap, the `n`th bit is set to true if the `n`th object +in the packfile or multi-pack index is of that type. +++ ++The obvious consequence is that the OR of all 4 bitmaps will result ++in a full set (all bits set), and the AND of all 4 bitmaps will ++result in an empty bitmap (no bits set). + -+ The obvious consequence is that the OR of all 4 bitmaps will result -+ in a full set (all bits set), and the AND of all 4 bitmaps will -+ result in an empty bitmap (no bits set). -+ -+ * N entries with compressed bitmaps, one for each indexed commit ++ * N entries with compressed bitmaps, one for each indexed commit ++ +Where `N` is the total amount of entries in this bitmap index. +Each entry contains the following: + -+ ** {empty} -+ 4-byte object position (network byte order): :: - The position **in the index for the packfile or - multi-pack index** where the bitmap for this commit is - found. - -- - 1-byte XOR-offset -+ ** {empty} -+ 1-byte XOR-offset: :: - The xor offset used to compress this bitmap. For an entry - in position `x`, a XOR offset of `y` means that the actual - bitmap representing this commit is composed by XORing the - bitmap for this entry with the bitmap in entry `x-y` (i.e. - the bitmap `y` entries before this one). -- -- Note that this compression can be recursive. 
In order to -- XOR this entry with a previous one, the previous entry needs -- to be decompressed first, and so on. -- -- The hard-limit for this offset is 160 (an entry can only be -- xor'ed against one of the 160 entries preceding it). This -- number is always positive, and hence entries are always xor'ed -- with **previous** bitmaps, not bitmaps that will come afterwards -- in the index. -- -- - 1-byte flags for this bitmap ++ ** {empty} ++ 4-byte object position (network byte order): :: ++ The position **in the index for the packfile or ++ multi-pack index** where the bitmap for this commit is ++ found. ++ ++ ** {empty} ++ 1-byte XOR-offset: :: ++ The xor offset used to compress this bitmap. For an entry ++ in position `x`, a XOR offset of `y` means that the actual ++ bitmap representing this commit is composed by XORing the ++ bitmap for this entry with the bitmap in entry `x-y` (i.e. ++ the bitmap `y` entries before this one). ++ +NOTE: This compression can be recursive. In order to +XOR this entry with a previous one, the previous entry needs @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cac +with **previous** bitmaps, not bitmaps that will come afterwards +in the index. + -+ ** {empty} -+ 1-byte flags for this bitmap: :: - At the moment the only available flag is `0x1`, which hints - that this bitmap can be re-used when rebuilding bitmap indexes - for the repository. - -- - The compressed bitmap itself, see Appendix A. -+ ** The compressed bitmap itself, see Appendix A. ++ ** {empty} ++ 1-byte flags for this bitmap: :: ++ At the moment the only available flag is `0x1`, which hints ++ that this bitmap can be re-used when rebuilding bitmap indexes ++ for the repository. ++ ++ ** The compressed bitmap itself, see Appendix A. 
== Appendix A: Serialization format for an EWAH bitmap +@@ Documentation/technical/bitmap-format.txt: implementation: + - 4-byte number of words of the COMPRESSED bitmap, when stored + + - N x 8-byte words, as specified by the previous field +- +- This is the actual content of the compressed bitmap. +++ ++This is the actual content of the compressed bitmap. + + - 4-byte position of the current RLW for the compressed + bitmap 3: b971558e1cb ! 3: dbb86dca205 bitmap-format.txt: add information for trailing checksum @@ Commit message ## Documentation/technical/bitmap-format.txt ## @@ Documentation/technical/bitmap-format.txt: in the index. - ** The compressed bitmap itself, see Appendix A. + ** The compressed bitmap itself, see Appendix A. + * {empty} + TRAILER: :: -- gitgitgadget ^ permalink raw reply [flat|nested] 37+ messages in thread
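The XOR-offset rule quoted in the range-diff above can be sketched in a few lines. This is a minimal illustration, not Git's implementation. It assumes each bitmap has already been EWAH-decompressed and models it as a plain Python integer. The names `resolve_bitmaps` and `MAX_XOR_OFFSET` are hypothetical, and treating an offset of 0 as "stored uncompressed" is an assumption. Because an entry may only refer to one of the entries before it, a single forward pass handles the recursive case.

```python
# Sketch only: bitmaps are modelled as Python ints, one bit per object.
# MAX_XOR_OFFSET mirrors the documented hard limit of 160.
MAX_XOR_OFFSET = 160


def resolve_bitmaps(entries):
    """Return the real bitmap for every entry.

    `entries` is a list of (stored_bits, xor_offset) pairs in index order.
    An offset of 0 is assumed to mean "not XOR-compressed".
    """
    resolved = []
    for x, (stored, y) in enumerate(entries):
        if y == 0:
            resolved.append(stored)
            continue
        if y > MAX_XOR_OFFSET or y > x:
            raise ValueError(f"entry {x}: invalid XOR offset {y}")
        # Entry x is XORed with the already-resolved entry x - y, so a
        # chain of XOR-compressed entries unwinds in index order.
        resolved.append(stored ^ resolved[x - y])
    return resolved


example = [(0b1010, 0), (0b0110, 1), (0b0001, 2)]
resolved = resolve_bitmaps(example)
```

Resolving in index order avoids explicit recursion: by the time entry `x` is reached, entry `x-y` has already been expanded.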
* [PATCH v4 1/3] bitmap-format.txt: feed the file to asciidoc to generate html
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-16 5:03 UTC
To: git
Cc: Taylor Blau, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Documentation/Makefile does not include bitmap-format.txt to generate a
html page using asciidoc.

Teach Documentation/Makefile to also generate a html page for
Documentation/technical/bitmap-format.txt file.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index f2e7fc1daa5..4f801f4e4c9 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -94,6 +94,7 @@ TECH_DOCS += MyFirstContribution
 TECH_DOCS += MyFirstObjectWalk
 TECH_DOCS += SubmittingPatches
 TECH_DOCS += ToolsForGit
+TECH_DOCS += technical/bitmap-format
 TECH_DOCS += technical/bundle-format
 TECH_DOCS += technical/cruft-packs
 TECH_DOCS += technical/hash-function-transition
-- 
gitgitgadget

* [PATCH v4 2/3] bitmap-format.txt: fix some formatting issues
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-16 5:03 UTC
To: git
Cc: Taylor Blau, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

The asciidoc generated html for `Documentation/technical/bitmap-format.txt`
is broken. This is mainly because `-` is used for nested lists (which is
not allowed in asciidoc) instead of `*`.

Fix these and also reformat it for better readability of the html page.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/technical/bitmap-format.txt | 199 +++++++++++-----------
 1 file changed, 103 insertions(+), 96 deletions(-)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 04b3ec21785..49c8e819804 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -25,9 +25,9 @@ An object is uniquely described by its bit position within a bitmap:
 	is defined as follows:
 
 		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
-
-	The ordering between packs is done according to the MIDX's .rev file.
-	Notably, the preferred pack sorts ahead of all other packs.
++
+The ordering between packs is done according to the MIDX's .rev file.
+Notably, the preferred pack sorts ahead of all other packs.
 
 The on-disk representation (described below) of a bitmap is the same regardless
 of whether or not that bitmap belongs to a packfile or a MIDX. The only
@@ -39,97 +39,104 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 
 == On-disk format
 
-	- A header appears at the beginning:
-
-		4-byte signature: {'B', 'I', 'T', 'M'}
-
-		2-byte version number (network byte order)
-			The current implementation only supports version 1
-			of the bitmap index (the same one as JGit).
-
-		2-byte flags (network byte order)
-
-			The following flags are supported:
-
-			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
-			This flag must always be present. It implies that the
-			bitmap index has been generated for a packfile or
-			multi-pack index (MIDX) with full closure (i.e. where
-			every single object in the packfile/MIDX can find its
-			parent links inside the same packfile/MIDX). This is a
-			requirement for the bitmap index format, also present in
-			JGit, that greatly reduces the complexity of the
-			implementation.
-
-			- BITMAP_OPT_HASH_CACHE (0x4)
-			If present, the end of the bitmap file contains
-			`N` 32-bit name-hash values, one per object in the
-			pack/MIDX. The format and meaning of the name-hash is
-			described below.
-
-		4-byte entry count (network byte order)
-
-			The total count of entries (bitmapped commits) in this bitmap index.
-
-		20-byte checksum
-
-			The SHA1 checksum of the pack/MIDX this bitmap index
-			belongs to.
-
-	- 4 EWAH bitmaps that act as type indexes
-
-		Type indexes are serialized after the hash cache in the shape
-		of four EWAH bitmaps stored consecutively (see Appendix A for
-		the serialization format of an EWAH bitmap).
-
-		There is a bitmap for each Git object type, stored in the following
-		order:
-
-			- Commits
-			- Trees
-			- Blobs
-			- Tags
-
-		In each bitmap, the `n`th bit is set to true if the `n`th object
-		in the packfile or multi-pack index is of that type.
-
-		The obvious consequence is that the OR of all 4 bitmaps will result
-		in a full set (all bits set), and the AND of all 4 bitmaps will
-		result in an empty bitmap (no bits set).
-
-	- N entries with compressed bitmaps, one for each indexed commit
-
-		Where `N` is the total amount of entries in this bitmap index.
-		Each entry contains the following:
-
-		- 4-byte object position (network byte order)
-		The position **in the index for the packfile or
-		multi-pack index** where the bitmap for this commit is
-		found.
-
-		- 1-byte XOR-offset
-		The xor offset used to compress this bitmap. For an entry
-		in position `x`, a XOR offset of `y` means that the actual
-		bitmap representing this commit is composed by XORing the
-		bitmap for this entry with the bitmap in entry `x-y` (i.e.
-		the bitmap `y` entries before this one).
-
-		Note that this compression can be recursive. In order to
-		XOR this entry with a previous one, the previous entry needs
-		to be decompressed first, and so on.
-
-		The hard-limit for this offset is 160 (an entry can only be
-		xor'ed against one of the 160 entries preceding it). This
-		number is always positive, and hence entries are always xor'ed
-		with **previous** bitmaps, not bitmaps that will come afterwards
-		in the index.
-
-		- 1-byte flags for this bitmap
-		At the moment the only available flag is `0x1`, which hints
-		that this bitmap can be re-used when rebuilding bitmap indexes
-		for the repository.
-
-		- The compressed bitmap itself, see Appendix A.
+    * A header appears at the beginning:
+
+	4-byte signature: :: {'B', 'I', 'T', 'M'}
+
+	2-byte version number (network byte order): ::
+
+	    The current implementation only supports version 1
+	    of the bitmap index (the same one as JGit).
+
+	2-byte flags (network byte order): ::
+
+	    The following flags are supported:
+
+	    ** {empty}
+	    BITMAP_OPT_FULL_DAG (0x1) REQUIRED: :::
+
+		This flag must always be present. It implies that the
+		bitmap index has been generated for a packfile or
+		multi-pack index (MIDX) with full closure (i.e. where
+		every single object in the packfile/MIDX can find its
+		parent links inside the same packfile/MIDX). This is a
+		requirement for the bitmap index format, also present in
+		JGit, that greatly reduces the complexity of the
+		implementation.
+
+	    ** {empty}
+	    BITMAP_OPT_HASH_CACHE (0x4): :::
+
+		If present, the end of the bitmap file contains
+		`N` 32-bit name-hash values, one per object in the
+		pack/MIDX. The format and meaning of the name-hash is
+		described below.
+
+	4-byte entry count (network byte order): ::
+	    The total count of entries (bitmapped commits) in this bitmap index.
+
+	20-byte checksum: ::
+	    The SHA1 checksum of the pack/MIDX this bitmap index
+	    belongs to.
+
+    * 4 EWAH bitmaps that act as type indexes
++
+Type indexes are serialized after the hash cache in the shape
+of four EWAH bitmaps stored consecutively (see Appendix A for
+the serialization format of an EWAH bitmap).
++
+There is a bitmap for each Git object type, stored in the following
+order:
++
+	- Commits
+	- Trees
+	- Blobs
+	- Tags
+
++
+In each bitmap, the `n`th bit is set to true if the `n`th object
+in the packfile or multi-pack index is of that type.
++
+The obvious consequence is that the OR of all 4 bitmaps will result
+in a full set (all bits set), and the AND of all 4 bitmaps will
+result in an empty bitmap (no bits set).
+
+    * N entries with compressed bitmaps, one for each indexed commit
++
+Where `N` is the total amount of entries in this bitmap index.
+Each entry contains the following:
+
+	** {empty}
+	4-byte object position (network byte order): ::
+	    The position **in the index for the packfile or
+	    multi-pack index** where the bitmap for this commit is
+	    found.
+
+	** {empty}
+	1-byte XOR-offset: ::
+	    The xor offset used to compress this bitmap. For an entry
+	    in position `x`, a XOR offset of `y` means that the actual
+	    bitmap representing this commit is composed by XORing the
+	    bitmap for this entry with the bitmap in entry `x-y` (i.e.
+	    the bitmap `y` entries before this one).
++
+NOTE: This compression can be recursive. In order to
+XOR this entry with a previous one, the previous entry needs
+to be decompressed first, and so on.
++
+The hard-limit for this offset is 160 (an entry can only be
+xor'ed against one of the 160 entries preceding it). This
+number is always positive, and hence entries are always xor'ed
+with **previous** bitmaps, not bitmaps that will come afterwards
+in the index.
+
+	** {empty}
+	1-byte flags for this bitmap: ::
+	    At the moment the only available flag is `0x1`, which hints
+	    that this bitmap can be re-used when rebuilding bitmap indexes
+	    for the repository.
+
+	** The compressed bitmap itself, see Appendix A.
 
 == Appendix A: Serialization format for an EWAH bitmap
 
@@ -142,8 +149,8 @@ implementation:
 	- 4-byte number of words of the COMPRESSED bitmap, when stored
 
 	- N x 8-byte words, as specified by the previous field
-
-	This is the actual content of the compressed bitmap.
++
+This is the actual content of the compressed bitmap.
 
 	- 4-byte position of the current RLW for the compressed
 		bitmap
-- 
gitgitgadget

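For readers who want to check the header layout described in this patch against a real file, here is a small parsing sketch. It is an illustration under stated assumptions, not Git's own reader. The field order and sizes (4-byte signature, 2-byte version, 2-byte flags, 4-byte entry count, 20-byte checksum, all in network byte order) come from the documentation being patched. The function name `parse_bitmap_header` and the returned dictionary keys are hypothetical.

```python
import struct

# ">" selects network (big-endian) byte order with no padding:
# 4s = signature, H = version, H = flags, I = entry count, 20s = checksum.
HEADER = struct.Struct(">4sHHI20s")

BITMAP_OPT_FULL_DAG = 0x1
BITMAP_OPT_HASH_CACHE = 0x4


def parse_bitmap_header(data: bytes) -> dict:
    """Parse the fixed-size header at the start of a bitmap index."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for a bitmap header")
    signature, version, flags, entry_count, checksum = HEADER.unpack_from(data, 0)
    if signature != b"BITM":
        raise ValueError("bad signature")
    if version != 1:
        raise ValueError(f"unsupported version {version}")
    if not flags & BITMAP_OPT_FULL_DAG:
        raise ValueError("BITMAP_OPT_FULL_DAG is required but not set")
    return {
        "version": version,
        "entry_count": entry_count,
        "has_hash_cache": bool(flags & BITMAP_OPT_HASH_CACHE),
        "pack_checksum": checksum.hex(),
    }


# Synthetic header for demonstration: version 1, both flags set, 3 entries.
sample = b"BITM" + struct.pack(">HHI", 1, 0x5, 3) + bytes(20)
info = parse_bitmap_header(sample)
```

The 20-byte checksum width matches the SHA1 wording in the document; a repository using a different hash would need a different field width, which this sketch does not handle.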
* [PATCH v4 3/3] bitmap-format.txt: add information for trailing checksum
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-16 5:03 UTC
To: git
Cc: Taylor Blau, Kaartic Sivaraam, Derrick Stolee, Junio C Hamano, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Bitmap file has a trailing checksum at the end of the file. However
there is no information in the bitmap-format documentation about it.

Add a trailer section to include the trailing checksum info in the
`Documentation/technical/bitmap-format.txt` file.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/technical/bitmap-format.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 49c8e819804..7be5f2318ba 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -138,6 +138,10 @@ in the index.
 
 	** The compressed bitmap itself, see Appendix A.
 
+	* {empty}
+	TRAILER: ::
+		Trailing checksum of the preceding contents.
+
 == Appendix A: Serialization format for an EWAH bitmap
 
 Ewah bitmaps are serialized in the same protocol as the JAVAEWAH
-- 
gitgitgadget

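The new TRAILER entry says only that the file ends with a checksum of the preceding contents. A quick way to check that on a file might look like the sketch below. The patch does not name the hash algorithm, so SHA-1 is assumed here to match the 20-byte checksum in the header. The function name `verify_trailer` and its `hash_name` parameter are hypothetical.

```python
import hashlib


def verify_trailer(data: bytes, hash_name: str = "sha1") -> bool:
    """Return True if the last digest-sized bytes hash everything before them.

    Assumption: the trailer is a raw digest of all preceding bytes, and
    the algorithm is SHA-1 unless the caller says otherwise.
    """
    digest_size = hashlib.new(hash_name).digest_size
    if len(data) < digest_size:
        return False
    body, trailer = data[:-digest_size], data[-digest_size:]
    return hashlib.new(hash_name, body).digest() == trailer


# Synthetic example: 32 arbitrary bytes followed by their SHA-1 digest.
body = b"BITM" + bytes(28)
good = body + hashlib.sha1(body).digest()
bad = body + bytes(20)
```

For a real file, `verify_trailer(open(path, "rb").read())` would apply the same check, with `path` pointing at a `.bitmap` file.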
* Re: [PATCH v4 0/3] bitmap-format.txt: fix some formatting issues and include checksum info
From: Junio C Hamano @ 2022-06-16 18:53 UTC
To: Abhradeep Chakraborty via GitGitGadget
Cc: git, Taylor Blau, Kaartic Sivaraam, Derrick Stolee, Abhradeep Chakraborty

This version looks good and seems to format well. Well done.

Thanks. Will queue.

* Re: [PATCH v4 0/3] bitmap-format.txt: fix some formatting issues and include checksum info
From: Taylor Blau @ 2022-06-16 21:18 UTC
To: Junio C Hamano
Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau, Kaartic Sivaraam, Derrick Stolee, Abhradeep Chakraborty

On Thu, Jun 16, 2022 at 11:53:27AM -0700, Junio C Hamano wrote:
> This version looks good and seems to format well. Well done.

Agreed. Nice work, Abhradeep!

Thanks,
Taylor
