git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, stolee@gmail.com,
	git@jeffhostetler.com, Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH 0/3] Convert index writes to use hashfile API
Date: Fri, 26 Mar 2021 19:12:04 +0000	[thread overview]
Message-ID: <pull.916.git.1616785928.gitgitgadget@gmail.com> (raw)

As I prepare some ideas on index v5, one thing that strikes me as an
interesting direction to try is to use the chunk-format API. This would make
our extension model extremely simple (they become optional chunks, easily
identified by the table of contents).

But there is a huge hurdle to even starting that investigation: the index
uses its own hashing methods, separate from the hashfile API in csum-file.c!

The internals of the algorithms are mostly identical. The only possible
change is that the buffer sizes are different: 8KB for hashfile and 128KB in
read-cache.c. I was unable to find a performance difference in these two
implementations, despite testing on several repo sizes.

There is a subtle point about how the EOIE extension works in that it needs
a hash of just the previous extension data. This is solved by adding a new
"nested hashfile" mechanism that computes the hash at one level and then
passes the data below to another hashfile. (The good news is that this
extension will not need to exist at all if we use the chunk-format API to
manage extensions.)

Thanks, -Stolee

Derrick Stolee (3):
  csum-file: add nested_hashfile()
  read-cache: use hashfile instead of git_hash_ctx
  read-cache: delete unused hashing methods

 csum-file.c  |  22 +++++++
 csum-file.h  |   9 +++
 read-cache.c | 182 ++++++++++++++++-----------------------------------
 3 files changed, 89 insertions(+), 124 deletions(-)


base-commit: 142430338477d9d1bb25be66267225fb58498d92
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-916%2Fderrickstolee%2Findex-hashfile-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-916/derrickstolee/index-hashfile-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/916
-- 
gitgitgadget

             reply	other threads:[~2021-03-26 19:12 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-26 19:12 Derrick Stolee via GitGitGadget [this message]
2021-03-26 19:12 ` [PATCH 1/3] csum-file: add nested_hashfile() Derrick Stolee via GitGitGadget
2021-03-26 19:12 ` [PATCH 2/3] read-cache: use hashfile instead of git_hash_ctx Derrick Stolee via GitGitGadget
2021-03-29 15:04   ` Derrick Stolee
2021-03-29 19:10     ` Derrick Stolee
2021-03-26 19:12 ` [PATCH 3/3] read-cache: delete unused hashing methods Derrick Stolee via GitGitGadget
2021-03-26 20:16 ` [PATCH 0/3] Convert index writes to use hashfile API Derrick Stolee
2021-05-17 12:24 ` [PATCH v2 0/4] " Derrick Stolee via GitGitGadget
2021-05-17 12:24   ` [PATCH v2 1/4] hashfile: use write_in_full() Derrick Stolee via GitGitGadget
2021-05-17 12:24   ` [PATCH v2 2/4] csum-file.h: increase hashfile buffer size Derrick Stolee via GitGitGadget
2021-05-17 21:54     ` Junio C Hamano
2021-05-18  7:33       ` Jeff King
2021-05-18 14:44         ` Derrick Stolee
2021-05-18  7:31     ` Jeff King
2021-05-18  7:42       ` Jeff King
2021-05-17 12:24   ` [PATCH v2 3/4] read-cache: use hashfile instead of git_hash_ctx Derrick Stolee via GitGitGadget
2021-05-17 22:13     ` Junio C Hamano
2021-05-18 14:16       ` Derrick Stolee
2021-05-17 12:24   ` [PATCH v2 4/4] read-cache: delete unused hashing methods Derrick Stolee via GitGitGadget
2021-05-18 18:32   ` [PATCH v3 0/4] Convert index writes to use hashfile API Derrick Stolee via GitGitGadget
2021-05-18 18:32     ` [PATCH v3 1/4] hashfile: use write_in_full() Derrick Stolee via GitGitGadget
2021-05-18 18:32     ` [PATCH v3 2/4] csum-file.h: increase hashfile buffer size Derrick Stolee via GitGitGadget
2021-11-25 12:14       ` t4216-log-bloom.sh fails with -v (but not --verbose-log) Ævar Arnfjörð Bjarmason
2021-11-26  4:08         ` Jeff King
2021-11-29 13:49           ` Derrick Stolee
2021-05-18 18:32     ` [PATCH v3 3/4] read-cache: use hashfile instead of git_hash_ctx Derrick Stolee via GitGitGadget
2021-05-18 18:32     ` [PATCH v3 4/4] read-cache: delete unused hashing methods Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.916.git.1616785928.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --subject='Re: [PATCH 0/3] Convert index writes to use hashfile API' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).