From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Claire Fousse <claire.fousse@ensimag.imag.fr>,
	git@vger.kernel.org, Sylvain Boulme <Sylvain.Boulme@imag.fr>,
	Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>
Subject: Re: [PATCH 01/10] strbuf_split: add a max parameter
Date: Mon, 13 Jun 2011 15:20:55 -0400
Message-ID: <20110613192055.GE17845@sigill.intra.peff.net>
In-Reply-To: <7voc21od0g.fsf@alter.siamese.dyndns.org>

On Mon, Jun 13, 2011 at 10:30:07AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I am tempted to just call this new one strbuf_split and update all
> > callers. There aren't that many.
> 
> Yes, that is indeed tempting, and because we have a new parameter the
> compiler will catch any new callers that pop up in a mismerge so that
> would be perfectly safe.

Should we also change the naming later in the series to remain
consistent with strbuf_add? IOW, to end up at:

  struct strbuf **strbuf_split(const char *buf, int len, int delim, int max);
  struct strbuf **strbuf_split_str(const char *s, int delim, int max);
  struct strbuf **strbuf_split_buf(const struct strbuf *, int delim, int max);

(though I think consistency would also dictate "splitstr" and "splitbuf"
without the extra underscore. Personally I find that form a bit unreadable.)
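
A rough sketch of that layering, under loud assumptions: it is not git's
code, it returns plain "char **" instead of "struct strbuf **", and the
names split_mem, split_str, split_sb and struct mini_strbuf are
hypothetical stand-ins. The point is only that the (buf, len) variant
can be the single real implementation, with the other two as thin
wrappers, the way strbuf_add relates to its siblings. Each piece keeps
its trailing delimiter, and max <= 0 means "no limit".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a strbuf; only the fields used here. */
struct mini_strbuf {
	size_t len;
	const char *buf;
};

/* Copy len bytes of s into a fresh NUL-terminated string. */
static char *dup_mem(const char *s, size_t len)
{
	char *out = malloc(len + 1);
	if (!out)
		abort();
	memcpy(out, s, len);
	out[len] = '\0';
	return out;
}

/* Base variant: split buf[0..len) into a NULL-terminated array. */
char **split_mem(const char *buf, size_t len, int delim, int max)
{
	size_t alloc = 2, pos = 0;
	char **ret = calloc(alloc, sizeof(*ret));
	const char *p = buf, *end = buf + len;

	assert(buf != NULL);
	if (!ret)
		abort();
	while (p < end) {
		const char *n = NULL;
		size_t piece_len;

		/* Stop searching once the last allowed piece is reached. */
		if (max <= 0 || pos + 1 < (size_t)max)
			n = memchr(p, delim, (size_t)(end - p));
		piece_len = n ? (size_t)(n - p) + 1 : (size_t)(end - p);
		if (pos + 1 >= alloc) {
			alloc *= 2;
			ret = realloc(ret, alloc * sizeof(*ret));
			if (!ret)
				abort();
		}
		ret[pos++] = dup_mem(p, piece_len);
		ret[pos] = NULL;
		p += piece_len;
	}
	return ret;
}

/* Wrapper for NUL-terminated strings, mirroring the "_str" name. */
char **split_str(const char *s, int delim, int max)
{
	return split_mem(s, strlen(s), delim, max);
}

/* Wrapper for the strbuf-like struct, mirroring the "_buf" name. */
char **split_sb(const struct mini_strbuf *sb, int delim, int max)
{
	return split_mem(sb->buf, sb->len, delim, max);
}
```

With max set to 2, split_str("a=b=c", '=', 2) gives "a=" followed by
"b=c", which is the shape a "key=value" parser wants when the value may
itself contain equals signs.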

> > -struct strbuf **strbuf_split(const struct strbuf *sb, int delim)
> > +struct strbuf **strbuf_split_max(const struct strbuf *sb, int delim, int max)
> >  {
> >  	int alloc = 2, pos = 0;
> >  	char *n, *p;
> > @@ -114,7 +114,10 @@ struct strbuf **strbuf_split(const struct strbuf *sb, int delim)
> >  	p = n = sb->buf;
> >  	while (n < sb->buf + sb->len) {
> >  		int len;
> > -		n = memchr(n, delim, sb->len - (n - sb->buf));
> > +		if (max <= 0 || pos + 1 < max)
> > +			n = memchr(n, delim, sb->len - (n - sb->buf));
> > +		else
> > +			n = NULL;
> >  		if (pos + 1 >= alloc) {
> >  			alloc = alloc * 2;
> >  			ret = xrealloc(ret, sizeof(struct strbuf *) * alloc);
> 
> Hmm, even when we know the value of max, we go exponential, and even do so
> by hand without using ALLOC_GROW(). Somewhat sad.

Thanks for reminding me. I noticed it wasn't using ALLOC_GROW, but
decided not to change it because I wanted to introduce an optimization
later on not to grow beyond max. But then I forgot. :)

The optimization I was going to do was to simply allocate "max" slots at
the beginning (if a max is given). You know you can't grow beyond that,
and in most splits with a max, the caller is expecting all of them to be
filled.
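
A minimal sketch of that pre-sizing idea, again with hypothetical names
(split_presized, grow_count) and plain "char **" in place of strbufs.
When max is positive the array gets max + 1 slots up front (the pieces
plus the NULL terminator), so the growth branch is only ever taken in
the unlimited case. The grow_count counter exists purely to make that
observable; it is not something a real implementation would carry.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of times the array grew during the most recent split. */
int grow_count;

/* Copy len bytes of s into a fresh NUL-terminated string. */
static char *dup_mem(const char *s, size_t len)
{
	char *out = malloc(len + 1);
	if (!out)
		abort();
	memcpy(out, s, len);
	out[len] = '\0';
	return out;
}

char **split_presized(const char *buf, size_t len, int delim, int max)
{
	/* With a limit, max pieces plus the terminator is the ceiling. */
	size_t alloc = max > 0 ? (size_t)max + 1 : 2;
	size_t pos = 0;
	char **ret = calloc(alloc, sizeof(*ret));
	const char *p = buf, *end = buf + len;

	assert(buf != NULL);
	if (!ret)
		abort();
	grow_count = 0;
	while (p < end) {
		const char *n = NULL;
		size_t piece_len;

		if (max <= 0 || pos + 1 < (size_t)max)
			n = memchr(p, delim, (size_t)(end - p));
		piece_len = n ? (size_t)(n - p) + 1 : (size_t)(end - p);
		if (pos + 1 >= alloc) {
			/* Not reached when max > 0: pos never exceeds max - 1 here. */
			alloc *= 2;
			ret = realloc(ret, alloc * sizeof(*ret));
			if (!ret)
				abort();
			grow_count++;
		}
		ret[pos++] = dup_mem(p, piece_len);
		ret[pos] = NULL;
		p += piece_len;
	}
	return ret;
}
```

The trade-off is that a caller passing a very large max on a short input
pays for slots it never fills, which is why this only makes sense if
callers with a max usually expect every slot to be used.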

But your two-pass patch below is also reasonable.

> Also do we currently rely on the bug that strbuf_split() returns (NULL,)
> instead of ("", NULL) when given an empty string?  If not, perhaps...

I assumed that behavior was not a bug (and even had to avoid a segfault
with it in a later series, as you saw). But thinking on it more, it
really is one; splitting even a single character without a delimiter
yields a non-NULL piece, and I think the empty string should do the
same.

>  strbuf.c |   50 +++++++++++++++++++++++++++++++-------------------
>  1 files changed, 31 insertions(+), 19 deletions(-)

I think your patch looks reasonable. In theory doing two passes over a
very large buffer (e.g., splitting lines from a large commit message)
might be slightly less efficient, but I imagine it is drowned out in the
noise of malloc'ing strbufs.

> +	for (pass = 0; pass < 2; pass++) {
> +		/* First pass counts, second pass allocates and fills */

Maybe it is just me, but I tend not to like writing multi-pass stuff
like this as a for-loop, but instead to factor it into a function with
an "actually allocate" parameter. I find it makes the code much more
obvious.
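
For illustration, here is one way that factoring could look. This is a
sketch and not the patch under review: split_walk and split_two_pass are
hypothetical names, pieces are plain "char *" rather than strbufs, and
the "actually allocate" switch is expressed as an output pointer that is
NULL on the counting pass. It also returns ("", NULL) for empty input,
matching the behavior discussed above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of s into a fresh NUL-terminated string. */
static char *dup_mem(const char *s, size_t len)
{
	char *out = malloc(len + 1);
	if (!out)
		abort();
	memcpy(out, s, len);
	out[len] = '\0';
	return out;
}

/*
 * Walk buf once. If out is NULL, only count the pieces; otherwise
 * also store a copy of each piece in out[]. Returns the piece count.
 */
static size_t split_walk(const char *buf, size_t len, int delim,
			 int max, char **out)
{
	size_t pos = 0;
	const char *p = buf, *end = buf + len;

	while (p < end) {
		const char *n = NULL;
		size_t piece_len;

		if (max <= 0 || pos + 1 < (size_t)max)
			n = memchr(p, delim, (size_t)(end - p));
		piece_len = n ? (size_t)(n - p) + 1 : (size_t)(end - p);
		if (out)
			out[pos] = dup_mem(p, piece_len);
		pos++;
		p += piece_len;
	}
	return pos;
}

char **split_two_pass(const char *buf, size_t len, int delim, int max)
{
	size_t count, slots;
	char **ret;

	assert(buf != NULL);
	count = split_walk(buf, len, delim, max, NULL);	/* pass 1: count */
	slots = count ? count : 1;	/* empty input gets one empty piece */
	ret = malloc((slots + 1) * sizeof(*ret));
	if (!ret)
		abort();
	if (count)
		split_walk(buf, len, delim, max, ret);	/* pass 2: fill */
	else
		ret[0] = dup_mem("", 0);
	ret[slots] = NULL;
	return ret;
}
```

Both passes share one loop body, so the counting and filling logic
cannot drift apart, and the array is allocated exactly once at its final
size.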

> +	if (!count) {
>  		t = xmalloc(sizeof(struct strbuf));
> -		strbuf_init(t, len);
> -		strbuf_add(t, p, len);
> -		ret[pos] = t;
> -		ret[++pos] = NULL;
> -		p = ++n;
> +		strbuf_init(t, 0);
> +		ret[0] = t;
>  	}

I think my test in 4/10 (which avoids the segfault by checking
explicitly for NULL in the caller) should go with this part, and then
4/10 can go away.

-Peff
