All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
Cc: git@vger.kernel.org, simon.rabourg@ensimag.grenoble-inp.fr,
	francois.beutin@ensimag.grenoble-inp.fr,
	antoine.queru@ensimag.grenoble-inp.fr,
	matthieu.moy@grenoble-inp.fr, mhagger@alum.mit.edu
Subject: Re: [PATCH 2/2] strbuf: allow to use preallocated memory
Date: Mon, 30 May 2016 14:13:52 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.20.1605301326530.4449@virtualbox> (raw)
In-Reply-To: <20160530103642.7213-3-william.duclot@ensimag.grenoble-inp.fr>

Hi William,

On Mon, 30 May 2016, William Duclot wrote:

> It is unfortunate that it is currently impossible to use a strbuf
> without doing a memory allocation. So code like
> 
> void f()
> {
>     char path[PATH_MAX];
>     ...
> }
> 
> typically gets turned into either
> 
> void f()
> {
>     struct strbuf path;
>     strbuf_add(&path, ...); <-- does a malloc
>     ...
>     strbuf_release(&path);  <-- does a free
> }
> 
> which costs extra memory allocations, or
> 
> void f()
> {
>     static struct strbuf path;
>     strbuf_add(&path, ...);
>     ...
>     strbuf_setlen(&path, 0);
> }
> 
> which, by using a static variable, avoids most of the malloc/free
> overhead, but makes the function unsafe to use recursively or from
> multiple threads. Those limitations prevent strbuf to be used in
> performance-critical operations.

This description is nice and verbose, but maybe something like this would
introduce the subject in a quicker manner?

	When working e.g. with file paths or with dates, strbuf's
	malloc()/free() dance of strbufs can be easily avoided: as
	a sensible initial buffer size is already known, it can be
	allocated on the heap.

The rest of the commit message flows nicely.

> diff --git a/strbuf.c b/strbuf.c
> index 1ba600b..527b986 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -1,6 +1,14 @@
>  #include "cache.h"
>  #include "refs.h"
>  #include "utf8.h"
> +#include <sys/param.h>

Why?

> +/**
> + * Flags
> + * --------------
> + */
> +#define STRBUF_OWNS_MEMORY 1
> +#define STRBUF_FIXED_MEMORY (1 << 1)

>From reading the commit message, I expected STRBUF_OWNS_MEMORY.
STRBUF_FIXED_MEMORY still needs to be explained.

> @@ -20,16 +28,37 @@ char strbuf_slopbuf[1];
>  
>  void strbuf_init(struct strbuf *sb, size_t hint)
>  {
> +	sb->flags = 0;
>  	sb->alloc = sb->len = 0;
>  	sb->buf = strbuf_slopbuf;
>  	if (hint)
>  		strbuf_grow(sb, hint);
>  }
>  
> +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf,
> +			      size_t path_buf_len, size_t alloc_len)
> +{
> +	if (!path_buf)
> +		die("you try to use a NULL buffer to initialize a strbuf");
> +
> +	strbuf_init(sb, 0);
> +	strbuf_attach(sb, path_buf, path_buf_len, alloc_len);
> +	sb->flags &= ~STRBUF_OWNS_MEMORY;
> +	sb->flags &= ~STRBUF_FIXED_MEMORY;

Shorter: sb->flags &= ~(STRBUF_OWNS_MEMORY | STRBUF_FIXED_MEMORY);

> +}
> +
> +void strbuf_wrap_fixed(struct strbuf *sb, char *path_buf,
> +		       size_t path_buf_len, size_t alloc_len)
> +{
> +	strbuf_wrap_preallocated(sb, path_buf, path_buf_len, alloc_len);
> +	sb->flags |= STRBUF_FIXED_MEMORY;
> +}

Rather than letting strbuf_wrap_preallocated() set sb->flags &=
~FIXED_MEMORY only to revert that decision right away, a static function
could be called by both strbuf_wrap_preallocated() and
strbuf_wrap_fixed().

>  void strbuf_release(struct strbuf *sb)
>  {
>  	if (sb->alloc) {
> -		free(sb->buf);
> +		if (sb->flags & STRBUF_OWNS_MEMORY)
> +			free(sb->buf);
>  		strbuf_init(sb, 0);
>  	}

Should we not reset the flags here, too?

> @@ -38,7 +67,11 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz)
>  {
>  	char *res;
>  	strbuf_grow(sb, 0);
> -	res = sb->buf;
> +	if (sb->flags & STRBUF_OWNS_MEMORY)
> +		res = sb->buf;
> +	else
> +		res = xmemdupz(sb->buf, sb->alloc - 1);

This looks like a usage to be avoided: if we plan to detach the buffer,
anyway, there is no good reason to allocate it on the heap first. I would
at least issue a warning here.

> @@ -51,6 +84,8 @@ void strbuf_attach(struct strbuf *sb, void *buf, size_t len, size_t alloc)
>  	sb->buf   = buf;
>  	sb->len   = len;
>  	sb->alloc = alloc;
> +	sb->flags |= STRBUF_OWNS_MEMORY;
> +	sb->flags &= ~STRBUF_FIXED_MEMORY;
>  	strbuf_grow(sb, 0);
>  	sb->buf[sb->len] = '\0';
>  }
> @@ -61,9 +96,32 @@ void strbuf_grow(struct strbuf *sb, size_t extra)
>  	if (unsigned_add_overflows(extra, 1) ||
>  	    unsigned_add_overflows(sb->len, extra + 1))
>  		die("you want to use way too much memory");
> -	if (new_buf)
> -		sb->buf = NULL;
> -	ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
> +	if ((sb->flags & STRBUF_FIXED_MEMORY) && sb->len + extra + 1 > sb->alloc)
> +		die("you try to make a string overflow the buffer of a fixed strbuf");

We try to avoid running over 80 columns/row. This message could be
more to the point: cannot grow fixed string

> +	/*
> +	 * ALLOC_GROW may do a realloc() if needed, so we must not use it on
> +	 * a buffer the strbuf doesn't own
> +	 */
> +	if (sb->flags & STRBUF_OWNS_MEMORY) {
> +		if (new_buf)
> +			sb->buf = NULL;
> +		ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
> +	} else {
> +		/*
> +		 * The strbuf doesn't own the buffer: to avoid to realloc it,
> +		 * the strbuf needs to use a new buffer without freeing the old
> +		 */
> +		if (sb->len + extra + 1 > sb->alloc) {
> +			size_t new_alloc = MAX(sb->len + extra + 1, alloc_nr(sb->alloc));
> +			char *buf = xmalloc(new_alloc);
> +			memcpy(buf, sb->buf, sb->alloc);
> +			sb->buf = buf;
> +			sb->alloc = new_alloc;
> +			sb->flags |= STRBUF_OWNS_MEMORY;
> +		}
> +	}
> +
>  	if (new_buf)
>  		sb->buf[0] = '\0';
>  }
> diff --git a/strbuf.h b/strbuf.h
> index 7987405..634759c 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -11,11 +11,16 @@
>   * A strbuf is NUL terminated for convenience, but no function in the
>   * strbuf API actually relies on the string being free of NULs.
>   *
> + * You can avoid the malloc/free overhead of `strbuf_init()`, `strbuf_add()` and
> + * `strbuf_release()` by wrapping pre-allocated memory (stack-allocated for
> + * example) using `strbuf_wrap_preallocated()` or `strbuf_wrap_fixed()`.
> + *
>   * strbufs have some invariants that are very important to keep in mind:
>   *
>   *  - The `buf` member is never NULL, so it can be used in any usual C
>   *    string operations safely. strbuf's _have_ to be initialized either by
> - *    `strbuf_init()` or by `= STRBUF_INIT` before the invariants, though.
> + *    `strbuf_init()`, `= STRBUF_INIT`, `strbuf_wrap_preallocated()` or
> + *    `strbuf_wrap_fixed()` before the invariants, though.
>   *
>   *    Do *not* assume anything on what `buf` really is (e.g. if it is
>   *    allocated memory or not), use `strbuf_detach()` to unwrap a memory
> @@ -62,13 +67,14 @@
>   * access to the string itself.
>   */
>  struct strbuf {
> +	unsigned int flags;
>  	size_t alloc;
>  	size_t len;
>  	char *buf;
>  };
>  
>  extern char strbuf_slopbuf[];
> -#define STRBUF_INIT  { 0, 0, strbuf_slopbuf }
> +#define STRBUF_INIT  { 0, 0, 0, strbuf_slopbuf }

If I am not mistaken, to preserve the existing behavior the initial flags
should be 1 (own memory).

BTW this demonstrates that it may not be a good idea to declare the
"flags" field globally but then make the actual flags private.

Also: similar use cases in Git used :1 flags (see e.g. the "configured"
field in credential.h).

Ciao,
Johannes

  reply	other threads:[~2016-05-30 12:14 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-30 10:36 [PATCH 0/2] strbuf: improve API William Duclot
2016-05-30 10:36 ` [PATCH 1/2] strbuf: add tests William Duclot
2016-05-30 11:26   ` Johannes Schindelin
2016-05-30 13:42     ` Simon Rabourg
2016-05-30 11:56   ` Matthieu Moy
2016-05-31  2:04   ` Michael Haggerty
2016-05-31  9:48     ` Simon Rabourg
2016-05-30 10:36 ` [PATCH 2/2] strbuf: allow to use preallocated memory William Duclot
2016-05-30 12:13   ` Johannes Schindelin [this message]
2016-05-30 13:20     ` William Duclot
2016-05-31  6:21       ` Johannes Schindelin
2016-05-31  3:05     ` Michael Haggerty
2016-05-31  6:41       ` Johannes Schindelin
2016-05-31  8:25         ` Michael Haggerty
2016-05-30 12:52   ` Matthieu Moy
2016-05-30 14:15     ` William Duclot
2016-05-30 14:34       ` Matthieu Moy
2016-05-30 15:16         ` William Duclot
2016-05-31  4:05     ` Michael Haggerty
2016-05-31 15:59       ` William Duclot
2016-06-03 14:04       ` William Duclot
2016-05-30 21:56   ` Mike Hommey
2016-05-30 22:46     ` William Duclot
2016-05-30 22:50       ` Mike Hommey
2016-05-31  6:34   ` Junio C Hamano
2016-05-31 15:45     ` William
2016-05-31 15:54       ` Matthieu Moy
2016-05-31 16:08         ` William Duclot
2016-05-30 11:32 ` [PATCH 0/2] strbuf: improve API Remi Galan Alfonso
2016-06-01  7:42   ` Jeff King
2016-06-01 19:50     ` David Turner
2016-06-01 20:09       ` Jeff King
2016-06-01 20:22         ` David Turner
2016-06-01 21:07     ` Jeff King
2016-06-02 11:11       ` Michael Haggerty
2016-06-02 12:58         ` Matthieu Moy
2016-06-02 14:22           ` William Duclot
2016-06-24 17:20         ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.20.1605301326530.4449@virtualbox \
    --to=johannes.schindelin@gmx.de \
    --cc=antoine.queru@ensimag.grenoble-inp.fr \
    --cc=francois.beutin@ensimag.grenoble-inp.fr \
    --cc=git@vger.kernel.org \
    --cc=matthieu.moy@grenoble-inp.fr \
    --cc=mhagger@alum.mit.edu \
    --cc=simon.rabourg@ensimag.grenoble-inp.fr \
    --cc=william.duclot@ensimag.grenoble-inp.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.