git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	gitster@pobox.com, jnareb@gmail.com, tboegi@web.de,
	mlbright@gmail.com, remi.galan-alfonso@ensimag.grenoble-inp.fr,
	pclouds@gmail.com, e@80x24.org, ramsay@ramsayjones.plus.com
Subject: Re: [PATCH v2 5/5] convert: add filter.<driver>.process option
Date: Wed, 27 Jul 2016 14:11:48 -0400	[thread overview]
Message-ID: <20160727181148.GC32219@sigill.intra.peff.net> (raw)
In-Reply-To: <5FE50D2C-5D97-4523-9BE2-88745B3F83EA@gmail.com>

On Wed, Jul 27, 2016 at 07:31:26PM +0200, Lars Schneider wrote:

> >> +	strbuf_grow(sb, size + 1);	// we need one extra byte for the packet flush
> > 
> > What happens if size is the maximum for size_t here (i.e., 4GB-1 on a
> > 32-bit system)?
> 
> Would that be an acceptable solution?
> 
> if (size + 1 > SIZE_MAX)
> 	return die("unrepresentable length for filter buffer");

No, because by definition "size" will wrap to 0. :)

You have to do:

  if (size > SIZE_MAX - 1)
	die("whoops");

> Can you point me to an example in the Git source how this kind of thing should
> be handled?

The strbuf code itself checks for overflows. So you could do:

  strbuf_grow(sb, size);
  ... fill up with size bytes ...
  strbuf_addch(sb, ...); /* extra byte for whatever */

That does mean _possibly_ making a second allocation just to add the
extra byte, but in practice it's not likely (unless the input exactly
matches the strbuf's growth pattern).

If you want to do it yourself, I think:

  strbuf_grow(sb, st_add(size, 1));

would work.

> >> +	while (
> >> +		bytes_read > 0 && 					// the last packet was no flush
> >> +		sb->len - total_bytes_read - 1 > 0 	// we still have space left in the buffer
> >> +	);
> > 
> > And I'm not sure if you need to distinguish between "0" and "-1" when
> > checking byte_read here.
> 
> We want to finish reading in both cases, no?

If we get "-1", that's from an unexpected EOF during the packet_read(),
because you set GENTLE_ON_EOF. So there's nothing left to read, and we
should break and return an error.

I guess "0" would come from a flush packet? Why would the filter send
back a flush packet (unless you were using them to signal end-of-input,
but then why does the filter have to send back the number of bytes ahead
of time?).

> > Why 8K? The pkt-line format naturally restricts us to just under 64K, so
> > why not take advantage of that and minimize the framing overhead for
> > large data?
> 
> I took inspiration from here for 8K MAX_IO_SIZE:
> https://github.com/git/git/blob/master/copy.c#L6
> 
> Is this read limit correct? Should I read 8 times to fill a pkt-line?

MAX_IO_SIZE is generally 8 _megabytes_, not 8K. The loop in copy.c just
haad to pick an arbitrary size for doing its read/write proxying.  I
think in practice you are not likely to get much benefit from going
beyond 8K or so there, just because operating systems tend to do things
in page-sizes anyway, which are usually 4K.

But since you are formatting the result into a form that has framing
overhead, anything up to LARGE_PACKET_MAX will see benefits (though
admittedly even 4 bytes per 8K is not much).

I don't think it's worth the complexity of reading 8 times, but just
using a buffer size of LARGE_PACKET_MAX-4 would be the most efficient.

I doubt it matters _that much_ in practice, but any time I see a magic
number I have to wonder at the "why". At least basing it on
LARGE_PACKET_MAX has some basis, whereas 8K is largely just made-up. :)

> > We do sometimes do "ret |= something()" but that is in cases where
> > "ret" is zero for success, and non-zero (usually -1) otherwise. Perhaps
> > your function's error-reporting is inverted from our usual style?
> 
> I thought it makes the code easier to read and the filter doesn't care
> at what point the error happens anyways. The filter either succeeds
> or fails. What style would you suggest?

I think that's orthogonal. I just mean that using zero for success puts
you in our usual style, and then accumulating errors can be done with
"|=".

I didn't look carefully at whether the accumulating style you're using
makes sense or not. But something like:

> >> +		ret &= write_in_full(out, &header, sizeof(header)) == sizeof(header);
> >> +		ret &= write_in_full(out, src, bytes_to_write) == bytes_to_write;

does mean that we call the second write() even if the first one failed.
That's a waste of time (albeit a minor one), but it also means you could
potentially cover up the value of "errno" from the first one (though in
practice I'd expect the second one to fail for the same reason).

-Peff

  reply	other threads:[~2016-07-27 18:11 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-22 15:48 [PATCH v1 0/3] Git filter protocol larsxschneider
2016-07-22 15:48 ` [PATCH v1 1/3] convert: quote filter names in error messages larsxschneider
2016-07-22 15:48 ` [PATCH v1 2/3] convert: modernize tests larsxschneider
2016-07-26 15:18   ` Remi Galan Alfonso
2016-07-26 20:40     ` Junio C Hamano
2016-07-22 15:49 ` [PATCH v1 3/3] convert: add filter.<driver>.useProtocol option larsxschneider
2016-07-22 22:32   ` Torsten Bögershausen
2016-07-24 12:09     ` Lars Schneider
2016-07-22 23:19   ` Ramsay Jones
2016-07-22 23:28     ` Ramsay Jones
2016-07-24 17:16     ` Lars Schneider
2016-07-24 22:36       ` Ramsay Jones
2016-07-24 23:22         ` Jakub Narębski
2016-07-25 20:32           ` Lars Schneider
2016-07-26 10:58             ` Jakub Narębski
2016-07-25 20:24         ` Lars Schneider
2016-07-23  0:11   ` Jakub Narębski
2016-07-23  7:27     ` Eric Wong
2016-07-26 20:00       ` Jeff King
2016-07-24 18:36     ` Lars Schneider
2016-07-24 20:14       ` Jakub Narębski
2016-07-24 21:30         ` Jakub Narębski
2016-07-25 20:16           ` Lars Schneider
2016-07-26 12:24             ` Jakub Narębski
2016-07-25 20:09         ` Lars Schneider
2016-07-26 14:18           ` Jakub Narębski
2016-07-23  8:14   ` Eric Wong
2016-07-24 19:11     ` Lars Schneider
2016-07-25  7:27       ` Eric Wong
2016-07-25 15:48       ` Duy Nguyen
2016-07-22 21:39 ` [PATCH v1 0/3] Git filter protocol Junio C Hamano
2016-07-24 11:24   ` Lars Schneider
2016-07-26 20:11     ` Jeff King
2016-07-27  0:06 ` [PATCH v2 0/5] " larsxschneider
2016-07-27  0:06   ` [PATCH v2 1/5] convert: quote filter names in error messages larsxschneider
2016-07-27 20:01     ` Jakub Narębski
2016-07-28  8:23       ` Lars Schneider
2016-07-27  0:06   ` [PATCH v2 2/5] convert: modernize tests larsxschneider
2016-07-27  0:06   ` [PATCH v2 3/5] pkt-line: extract and use `set_packet_header` function larsxschneider
2016-07-27  0:20     ` Junio C Hamano
2016-07-27  9:13       ` Lars Schneider
2016-07-27 16:31         ` Junio C Hamano
2016-07-27  0:06   ` [PATCH v2 4/5] convert: generate large test files only once larsxschneider
2016-07-27  2:35     ` Torsten Bögershausen
2016-07-27 13:32       ` Jeff King
2016-07-27 16:50         ` Lars Schneider
2016-07-27  0:06   ` [PATCH v2 5/5] convert: add filter.<driver>.process option larsxschneider
2016-07-27  1:32     ` Jeff King
2016-07-27 17:31       ` Lars Schneider
2016-07-27 18:11         ` Jeff King [this message]
2016-07-28 12:10           ` Lars Schneider
2016-07-28 13:35             ` Jeff King
2016-07-27  9:41     ` Eric Wong
2016-07-29 10:38       ` Lars Schneider
2016-07-29 11:24         ` Jakub Narębski
2016-07-29 11:31           ` Lars Schneider
2016-08-05 18:55         ` Eric Wong
2016-08-05 23:26           ` Lars Schneider
2016-08-05 23:38             ` Eric Wong
2016-07-27 23:31     ` Jakub Narębski
2016-07-29  8:04       ` Lars Schneider
2016-07-29 17:35         ` Junio C Hamano
2016-07-29 23:11           ` Jakub Narębski
2016-07-29 23:44             ` Lars Schneider
2016-07-30  9:32               ` Jakub Narębski
2016-07-28 10:32     ` Torsten Bögershausen
2016-07-27 19:08   ` [PATCH v2 0/5] Git filter protocol Jakub Narębski
2016-07-28  7:16     ` Lars Schneider
2016-07-28 10:42       ` Jakub Narębski
2016-07-28 13:29       ` Jeff King
2016-07-29  7:40         ` Jakub Narębski
2016-07-29  8:14           ` Lars Schneider
2016-07-29 15:57             ` Jeff King
2016-07-29 16:20               ` Lars Schneider
2016-07-29 16:50                 ` Jeff King
2016-07-29 17:43                   ` Lars Schneider
2016-07-29 18:27                     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160727181148.GC32219@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=larsxschneider@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=remi.galan-alfonso@ensimag.grenoble-inp.fr \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).