git.vger.kernel.org archive mirror
From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net, git@jeffhostetler.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v2 3/4] read-cache: use hashfile instead of git_hash_ctx
Date: Tue, 18 May 2021 10:16:22 -0400
Message-ID: <1641be46-c8d7-08ae-ebe0-7f3eb3589b27@gmail.com>
In-Reply-To: <xmqqfsyl57q2.fsf@gitster.g>

On 5/17/2021 6:13 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> ...
>> mult-pack-indexes, and commit-graphs. Therefore, it seems prudent to
> 
> multi-pack, I would say.
> 
>> There are still some remaining: the extension headers are hashed for use
> 
> some remaining what?  I first read an unwritten word as "issues",
> but I think the answer is "uses of git_hash_ctx".

Thanks for pointing these out. I will fix them.

>> in the End of Index Entries (EOIE) extension. This use of the
>> git_hash_ctx is left as-is. There are multiple reasons to not use a
>> hashfile here, including ...
> 
>> In addition to the test suite passing, I computed indexes using the
>> previous binaries and the binaries compiled after this change, and found
>> the index data to be exactly equal. Finally, I did extensive performance
>> testing of "git update-index --force-write" on repos of various sizes,
>> including one with over 2 million paths at HEAD. These tests
>> demonstrated less than 1% difference in behavior, so the performance
>> should be considered identical.
> 
> Hmph, does that mean 128k buffer is overkill and if we wanted to
> unify the buffer sizes we should have used 8k instead?

The buffer was previously increased to 128k because it makes a
difference in performance when writing the index.

The thing I'm measuring here is the difference between the old
writing code and the new hashfile code. Using the hashfile API
(with an identical buffer size) has no meaningful performance
impact, which is the expected result.

I can make this clearer.
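
To illustrate why an identical buffer size should give identical
write behavior, here is a minimal sketch of a buffered, checksumming
writer. It is not Git's hashfile code: every name (toy_hashfile,
toy_write, toy_flush, toy_finalize) is hypothetical, FNV-1a stands in
for the real hash, and the flush only counts where a real writer
would call write(2).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical buffer size, matching the 128k discussed above. */
#define TOY_BUF_SIZE (128 * 1024)

/* Hypothetical stand-in for a hashfile: a buffer plus a running checksum. */
struct toy_hashfile {
	uint64_t hash;   /* FNV-1a state (an assumption; not the real hash) */
	size_t used;     /* bytes currently held in buf */
	size_t flushes;  /* how many times buf was drained */
	size_t total;    /* total bytes accepted so far */
	unsigned char buf[TOY_BUF_SIZE];
};

static void toy_init(struct toy_hashfile *f)
{
	f->hash = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
	f->used = 0;
	f->flushes = 0;
	f->total = 0;
}

/* Hash the buffered bytes and drain the buffer. */
static void toy_flush(struct toy_hashfile *f)
{
	size_t i;

	if (!f->used)
		return;
	for (i = 0; i < f->used; i++) {
		f->hash ^= f->buf[i];
		f->hash *= 1099511628211ULL; /* FNV-1a 64-bit prime */
	}
	/* A real writer would write the buffer to the file here. */
	f->flushes++;
	f->used = 0;
}

/* Copy data into the buffer, flushing each time it fills up. */
static void toy_write(struct toy_hashfile *f, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len) {
		size_t room = TOY_BUF_SIZE - f->used;
		size_t n = len < room ? len : room;

		memcpy(f->buf + f->used, p, n);
		f->used += n;
		f->total += n;
		p += n;
		len -= n;
		if (f->used == TOY_BUF_SIZE)
			toy_flush(f);
	}
}

/* Flush whatever remains and return the final checksum. */
static uint64_t toy_finalize(struct toy_hashfile *f)
{
	toy_flush(f);
	return f->hash;
}
```

In this sketch the number of flushes depends only on the buffer size
and the total number of bytes, not on how the caller splits up its
writes, so two callers that share a buffer size produce the same
sequence of flushes.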

> Wait, the removal of fsync has made things faster in general, hasn't
> it?  Did something else degrade performance to cancel that gain?

Are you thinking about [1], which was originally about changing
the fsync() calls, but ended up just making the existing behavior
more readable?

[1] https://lore.kernel.org/git/pull.914.v2.git.1616762291574.gitgitgadget@gmail.com/

I was focused on that change because I initially saw a performance
degradation when I did this refactor. It turns out that my measurements
were not robust to noise, which I have since remedied.
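
This message does not say how the measurements were made more robust.
As one common approach (an assumption, not necessarily what was done
here), a benchmark can collect several timing samples per binary,
compare medians rather than single runs, and treat differences below
a tolerance as noise. The helper names below are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles. */
static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a;
	double y = *(const double *)b;

	return (x > y) - (x < y);
}

/*
 * Median of n timing samples (n must be at least 1).
 * Sorts the array in place.
 */
static double median(double *samples, size_t n)
{
	qsort(samples, n, sizeof(*samples), cmp_double);
	if (n % 2)
		return samples[n / 2];
	return (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

/*
 * Return 1 if the relative difference between two medians is below
 * the given tolerance (for example 0.01 for 1%), 0 otherwise.
 * The baseline must be positive.
 */
static int within_noise(double baseline, double candidate, double tolerance)
{
	double diff = baseline > candidate ? baseline - candidate
					   : candidate - baseline;

	return diff / baseline < tolerance;
}
```

A single outlier sample (say, one slow run caused by a cold cache)
shifts a mean noticeably but leaves the median unchanged, which is
why the median is used in this sketch.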

> The patch looks an obvious improvement.  What was open-coded in
> longhand is now a well structured series of API calls and the result
> is much easier to follow and maintain.

That is the goal. I'm glad you agree.

Thanks,
-Stolee
