Re: [PATCH] csum-file: flush less often

From: Junio C Hamano <gitster@pobox.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, peff@peff.net, me@ttaylorr.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH] csum-file: flush less often
Date: Thu, 25 Mar 2021 11:46:20 -0700	[thread overview]
Message-ID: <xmqq8s6bvzpf.fsf@gitster.g> (raw)
In-Reply-To: <84ccabca-0bd3-d0cb-6b38-f96d75c0bbd6@gmail.com> (Derrick Stolee's message of "Thu, 25 Mar 2021 07:55:55 -0400")

Derrick Stolee <stolee@gmail.com> writes:

> But...is there something we could still do here?
>
> My confusion about flushing is mostly due to my error, but upon
> reflection the loop is doing a lot of different things, but most of
> the time we know which behavior we need at the start, in the middle,
> and at the end:
>
>      1. Fill the existing buffer with the beginning of 'buf'. If the
>         hashfile's buffer is full, then flush.

"But do not do this if f->buffer is empty, and we are writing out
more than sizeof(f->buffer)." is missing, isn't it?

>      2. Flush sizeof(f->buffer) chunks directly out of 'buf' as long as
>         possible.
>     
>      3. Copy the remaining bytes out of 'buf' into the hashfile's buffer.

It is debatable if the existing loop, which came mostly from Nico's
a8032d12 (sha1write: don't copy full sized buffers, 2008-09-02), is
too clever; I personally find it concise and readable enough, but my
reading is tainted.

If you use a couple of helpers to reduce the repeated "crc and hash"
pattern in your variant, it may become easier to follow than the
original, but I dunno.

> Here is a rewrite that more explicitly follows this flow:
>
> void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
> {
> 	const int full_buffer = sizeof(f->buffer);
> 	unsigned left = full_buffer - f->offset;
> 	unsigned nr = count > left ? left : count;
>
> 	/*
> 	 * Initially fill the buffer in a batch until it
> 	 * is full, then flush.
> 	 */
> 	if (f->do_crc)
> 		f->crc32 = crc32(f->crc32, buf, nr);
>
> 	memcpy(f->buffer + f->offset, buf, nr);

Here, if the f->buffer was empty, we end up memcpy a full bufferful
unconditionally.  Nico's original cleverly takes advantage of the
fact that 'nr' would be the full buffer size only when the f->buffer
was empty upon entry to the function and we have more byte than the
size of the buffer to copy out directly from 'buf'.

> 	f->offset += nr;
> 	count -= nr;
> 	buf = (char *) buf + nr;
>
> 	if (left == nr)
> 		hashflush(f);
>
> 	/*
> 	 * After filling the hashfile's buffer and flushing, take
> 	 * batches of full_buffer bytes directly from the input
> 	 * buffer.
> 	 */
> 	while (count >= full_buffer) {
> 		if (f->do_crc)
> 			f->crc32 = crc32(f->crc32, buf, full_buffer);
>
> 		the_hash_algo->update_fn(&f->ctx, buf, full_buffer);
> 		flush(f, buf, full_buffer);
>
> 		count -= full_buffer;
> 		buf = (char *) buf + full_buffer;
> 	}
>
> 	/*
> 	 * Capture any remaining bytes at the end of the input buffer
> 	 * into the hashfile's buffer. We do not need to flush because
> 	 * count is strictly less than full_buffer here.
> 	 */
> 	if (count) {
> 		if (f->do_crc)
> 			f->crc32 = crc32(f->crc32, buf, count);
>
> 		memcpy(f->buffer + f->offset, buf, count);
> 		f->offset = count;
> 	}
> 	
> 	if (f->base)
> 		hashwrite(f->base, buf, count);
> }
> ...
> So, I'm of two minds here:
>
>  1. This is embarassing. I wasted everyone's time for nothing. I can retract
>     this patch.
>
>  2. This is embarassing. I overstated the problem here. But we might be able
>     to eke out a tiny performance boost here.
>
> I'm open to either. I think we should default to dropping this patch unless
> someone thinks the rewrite above is a better organization of the logic. (I
> can then send a v2 including that version and an updated commit message.)

3. The current code around "if (nr == sizeof(f->buffer))" might be a
   bit too clever for readers who try to understand what is going
   on, and the whole "while" loop may deserve a comment based on
   what you wrote before your replacement implementation.