* [PATCH 0/2] strbuf: improve API @ 2016-05-30 10:36 William Duclot 2016-05-30 10:36 ` [PATCH 1/2] strbuf: add tests William Duclot ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: William Duclot @ 2016-05-30 10:36 UTC (permalink / raw) To: git; +Cc: simon.rabourg, francois.beutin, antoine.queru, matthieu.moy, mhagger This patch series implements an improvment of the strbuf API, allowing strbuf to use preallocated memory. This makes strbuf fit to be used in performance-critical operations. The first patch is simply a preparatory work, adding tests for existing strbuf implementation. Most of the work is made in the second patch: handle pre-allocated memory, extend the API, document it and test it. As said in the second commit, the idea comes from Michael Haggerty. William Duclot (2): strbuf: add tests strbuf: allow to use preallocated memory Makefile | 1 + strbuf.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----- strbuf.h | 31 +++++++++++++++++++++++++++++-- t/helper/test-strbuf.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ t/t0082-strbuf.sh | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 251 insertions(+), 7 deletions(-) ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 1/2] strbuf: add tests 2016-05-30 10:36 [PATCH 0/2] strbuf: improve API William Duclot @ 2016-05-30 10:36 ` William Duclot 2016-05-30 11:26 ` Johannes Schindelin ` (2 more replies) 2016-05-30 10:36 ` [PATCH 2/2] strbuf: allow to use preallocated memory William Duclot 2016-05-30 11:32 ` [PATCH 0/2] strbuf: improve API Remi Galan Alfonso 2 siblings, 3 replies; 38+ messages in thread From: William Duclot @ 2016-05-30 10:36 UTC (permalink / raw) To: git; +Cc: simon.rabourg, francois.beutin, antoine.queru, matthieu.moy, mhagger Test the strbuf API. Being used throughout all Git the API could be considered tested, but adding specific tests makes it easier to improve and extend the API. --- Makefile | 1 + t/helper/test-strbuf.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++ t/t0082-strbuf.sh | 19 ++++++++++++++ 3 files changed, 89 insertions(+) create mode 100644 t/helper/test-strbuf.c create mode 100755 t/t0082-strbuf.sh diff --git a/Makefile b/Makefile index 3f03366..dc84f43 100644 --- a/Makefile +++ b/Makefile @@ -613,6 +613,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree TEST_PROGRAMS_NEED_X += test-sha1 TEST_PROGRAMS_NEED_X += test-sha1-array TEST_PROGRAMS_NEED_X += test-sigchain +TEST_PROGRAMS_NEED_X += test-strbuf TEST_PROGRAMS_NEED_X += test-string-list TEST_PROGRAMS_NEED_X += test-submodule-config TEST_PROGRAMS_NEED_X += test-subprocess diff --git a/t/helper/test-strbuf.c b/t/helper/test-strbuf.c new file mode 100644 index 0000000..622f627 --- /dev/null +++ b/t/helper/test-strbuf.c @@ -0,0 +1,69 @@ +#include "git-compat-util.h" +#include "strbuf.h" + +/* + * Check behavior on usual use cases + */ +int test_usual(struct strbuf *sb) +{ + size_t size, old_alloc; + char *res, *old_buf, *str_test = malloc(5*sizeof(char)); + strbuf_grow(sb, 1); + strcpy(str_test, "test"); + old_alloc = sb->alloc; + strbuf_grow(sb, 1000); + if (old_alloc == sb->alloc) + die("strbuf_grow does not realloc the buffer as expected"); + old_buf = sb->buf; + res = strbuf_detach(sb, &size); + if (res != old_buf) + die("strbuf_detach does not return the expected buffer"); + free(res); + + strcpy(str_test, "test"); + strbuf_attach(sb, (void *)str_test, strlen(str_test), sizeof(str_test)); + res = strbuf_detach(sb, &size); + if (res != str_test) + die("strbuf_detach does not return the expected buffer"); + free(res); + strbuf_release(sb); + + return 0; +} + +int main(int argc, char *argv[]) +{ + size_t size = 1; + struct strbuf sb; + char str_test[5] = "test"; + char str_foo[7] = "foo"; + + if (argc != 2) + usage("test-strbuf mode"); + + if (!strcmp(argv[1], "basic_grow")) { + /* + * Check if strbuf_grow(0) allocate a new NUL-terminated buffer + */ + strbuf_init(&sb, 0); + strbuf_grow(&sb, 0); + if (sb.buf == strbuf_slopbuf) + die("strbuf_grow failed to alloc memory"); + strbuf_release(&sb); + if (sb.buf != strbuf_slopbuf) + die("strbuf_release does not reinitialize the strbuf"); + } else if (!strcmp(argv[1], "strbuf_check_behavior")) { + strbuf_init(&sb, 0); + return test_usual(&sb); + } else if (!strcmp(argv[1], "grow_overflow")) { + /* + * size_t overflow: should die() + */ + strbuf_init(&sb, 1000); + strbuf_grow(&sb, maximum_unsigned_value_of_type((size_t)1)); + } else { + usage("test-strbuf mode"); + } + + return 0; +} diff --git a/t/t0082-strbuf.sh b/t/t0082-strbuf.sh new file mode 100755 index 0000000..0800d26 --- /dev/null +++ b/t/t0082-strbuf.sh @@ -0,0 +1,19 @@ +#!/bin/sh + +test_description="Test the strbuf API. +" +. ./test-lib.sh + +test_expect_success 'basic test on strbuf_grow()' ' + test-strbuf basic_grow +' + +test_expect_success 'check strbuf behavior in usual use cases ' ' + test-strbuf strbuf_check_behavior +' + +test_expect_success 'overflow while calling strbuf_grow' ' + test_must_fail test-strbuf grow_overflow +' + +test_done -- 2.8.2.403.ge2646ba.dirty ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] strbuf: add tests 2016-05-30 10:36 ` [PATCH 1/2] strbuf: add tests William Duclot @ 2016-05-30 11:26 ` Johannes Schindelin 2016-05-30 13:42 ` Simon Rabourg 2016-05-30 11:56 ` Matthieu Moy 2016-05-31 2:04 ` Michael Haggerty 2 siblings, 1 reply; 38+ messages in thread From: Johannes Schindelin @ 2016-05-30 11:26 UTC (permalink / raw) To: William Duclot Cc: git, simon.rabourg, francois.beutin, antoine.queru, matthieu.moy, mhagger Hi William, On Mon, 30 May 2016, William Duclot wrote: > Test the strbuf API. Being used throughout all Git the API could be > considered tested, but adding specific tests makes it easier to improve > and extend the API. > --- The commit message makes sense. Please add your sign-off. > Makefile | 1 + > t/helper/test-strbuf.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++ > t/t0082-strbuf.sh | 19 ++++++++++++++ > 3 files changed, 89 insertions(+) > create mode 100644 t/helper/test-strbuf.c > create mode 100755 t/t0082-strbuf.sh > > diff --git a/Makefile b/Makefile > index 3f03366..dc84f43 100644 > --- a/Makefile > +++ b/Makefile > @@ -613,6 +613,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree > TEST_PROGRAMS_NEED_X += test-sha1 > TEST_PROGRAMS_NEED_X += test-sha1-array > TEST_PROGRAMS_NEED_X += test-sigchain > +TEST_PROGRAMS_NEED_X += test-strbuf > TEST_PROGRAMS_NEED_X += test-string-list > TEST_PROGRAMS_NEED_X += test-submodule-config > TEST_PROGRAMS_NEED_X += test-subprocess > diff --git a/t/helper/test-strbuf.c b/t/helper/test-strbuf.c > new file mode 100644 > index 0000000..622f627 > --- /dev/null > +++ b/t/helper/test-strbuf.c > @@ -0,0 +1,69 @@ > +#include "git-compat-util.h" > +#include "strbuf.h" > + > +/* > + * Check behavior on usual use cases > + */ > +int test_usual(struct strbuf *sb) I have to admit that I would prefer a more concrete name. And since your other tests are more fine-grained, maybe this one could be split into multiple separate ones, too? > +{ > + size_t size, old_alloc; > + char *res, *old_buf, *str_test = malloc(5*sizeof(char)); Our convention is to list the initialized variables first, the uninitialized ones after that, and for readability an empty line is recommended after the variable declaration block. > + strbuf_grow(sb, 1); > + strcpy(str_test, "test"); > + old_alloc = sb->alloc; > + strbuf_grow(sb, 1000); > + if (old_alloc == sb->alloc) > + die("strbuf_grow does not realloc the buffer as expected"); > + old_buf = sb->buf; > + res = strbuf_detach(sb, &size); > + if (res != old_buf) > + die("strbuf_detach does not return the expected buffer"); > + free(res); > + > + strcpy(str_test, "test"); > + strbuf_attach(sb, (void *)str_test, strlen(str_test), sizeof(str_test)); > + res = strbuf_detach(sb, &size); > + if (res != str_test) > + die("strbuf_detach does not return the expected buffer"); > + free(res); > + strbuf_release(sb); > + > + return 0; > +} > + > +int main(int argc, char *argv[]) > +{ > + size_t size = 1; > + struct strbuf sb; The common theme in our source code seems to initialize using STRBUF_INIT... Let's use that paradigm here, too? > + char str_test[5] = "test"; > + char str_foo[7] = "foo"; > + > + if (argc != 2) > + usage("test-strbuf mode"); A nice and convenient way to do command-line parsing is to use the parse-options API, in this case with OPT_CMDMODE. This would also give us a chance to document the command modes in a nice and succinct way: as help strings. > + > + if (!strcmp(argv[1], "basic_grow")) { > + /* > + * Check if strbuf_grow(0) allocate a new NUL-terminated buffer s/allocate/&s/ > + */ > + strbuf_init(&sb, 0); > + strbuf_grow(&sb, 0); > + if (sb.buf == strbuf_slopbuf) > + die("strbuf_grow failed to alloc memory"); > + strbuf_release(&sb); > + if (sb.buf != strbuf_slopbuf) > + die("strbuf_release does not reinitialize the strbuf"); > + } else if (!strcmp(argv[1], "strbuf_check_behavior")) { > + strbuf_init(&sb, 0); > + return test_usual(&sb); > + } else if (!strcmp(argv[1], "grow_overflow")) { > + /* > + * size_t overflow: should die() > + */ > + strbuf_init(&sb, 1000); > + strbuf_grow(&sb, maximum_unsigned_value_of_type((size_t)1)); A comment "If this does not die(), fall through to returning success, to indicate an error" might be nice here. > + } else { > + usage("test-strbuf mode"); > + } > + > + return 0; > +} > diff --git a/t/t0082-strbuf.sh b/t/t0082-strbuf.sh > new file mode 100755 > index 0000000..0800d26 > --- /dev/null > +++ b/t/t0082-strbuf.sh > @@ -0,0 +1,19 @@ > +#!/bin/sh > + > +test_description="Test the strbuf API. > +" This description does not need a new-line, and existing one-liner test descriptions seem not to be terminated by a period. The rest of this patch looks good. Ciao, Johannes ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] strbuf: add tests 2016-05-30 11:26 ` Johannes Schindelin @ 2016-05-30 13:42 ` Simon Rabourg 0 siblings, 0 replies; 38+ messages in thread From: Simon Rabourg @ 2016-05-30 13:42 UTC (permalink / raw) To: Johannes Schindelin Cc: William Duclot, git, francois beutin, antoine queru, matthieu moy, mhagger Hi Johannes, I'm William's teammate on this feature. Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Hi William, > > On Mon, 30 May 2016, William Duclot wrote: > > > Test the strbuf API. Being used throughout all Git the API could be > > considered tested, but adding specific tests makes it easier to improve > > and extend the API. > > --- > > The commit message makes sense. Please add your sign-off. > We forgot to add the sign-off, we will fix that in the V2. > > Makefile | 1 + > > t/helper/test-strbuf.c | 69 > > ++++++++++++++++++++++++++++++++++++++++++++++++++ > > t/t0082-strbuf.sh | 19 ++++++++++++++ > > 3 files changed, 89 insertions(+) > > create mode 100644 t/helper/test-strbuf.c > > create mode 100755 t/t0082-strbuf.sh > > > > diff --git a/Makefile b/Makefile > > index 3f03366..dc84f43 100644 > > --- a/Makefile > > +++ b/Makefile > > @@ -613,6 +613,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree > > TEST_PROGRAMS_NEED_X += test-sha1 > > TEST_PROGRAMS_NEED_X += test-sha1-array > > TEST_PROGRAMS_NEED_X += test-sigchain > > +TEST_PROGRAMS_NEED_X += test-strbuf > > TEST_PROGRAMS_NEED_X += test-string-list > > TEST_PROGRAMS_NEED_X += test-submodule-config > > TEST_PROGRAMS_NEED_X += test-subprocess > > diff --git a/t/helper/test-strbuf.c b/t/helper/test-strbuf.c > > new file mode 100644 > > index 0000000..622f627 > > --- /dev/null > > +++ b/t/helper/test-strbuf.c > > @@ -0,0 +1,69 @@ > > +#include "git-compat-util.h" > > +#include "strbuf.h" > > + > > +/* > > + * Check behavior on usual use cases > > + */ > > +int test_usual(struct strbuf *sb) > > I have to admit that I would prefer a more concrete name. And since your > other tests are more fine-grained, maybe this one could be split into > multiple separate ones, too? > We will rename this function. We thought that one complete function would be convenient to test the usual API's behaviour. We are not sure how that change would be useful? > > +{ > > + size_t size, old_alloc; > > + char *res, *old_buf, *str_test = malloc(5*sizeof(char)); > > Our convention is to list the initialized variables first, the > uninitialized ones after that, and for readability an empty line is > recommended after the variable declaration block. OK, seems more readable. > > + strbuf_grow(sb, 1); > > + strcpy(str_test, "test"); > > + old_alloc = sb->alloc; > > + strbuf_grow(sb, 1000); > > + if (old_alloc == sb->alloc) > > + die("strbuf_grow does not realloc the buffer as expected"); > > + old_buf = sb->buf; > > + res = strbuf_detach(sb, &size); > > + if (res != old_buf) > > + die("strbuf_detach does not return the expected buffer"); > > + free(res); > > + > > + strcpy(str_test, "test"); > > + strbuf_attach(sb, (void *)str_test, strlen(str_test), sizeof(str_test)); > > + res = strbuf_detach(sb, &size); > > + if (res != str_test) > > + die("strbuf_detach does not return the expected buffer"); > > + free(res); > > + strbuf_release(sb); > > + > > + return 0; > > +} > > + > > +int main(int argc, char *argv[]) > > +{ > > + size_t size = 1; > > + struct strbuf sb; > > The common theme in our source code seems to initialize using > STRBUF_INIT... Let's use that paradigm here, too? We will add a test to check that initializing with srtbuf_init(...) is the same as initializing with STRBUF_INIT. > > > + char str_test[5] = "test"; > > + char str_foo[7] = "foo"; > > + > > + if (argc != 2) > > + usage("test-strbuf mode"); > > A nice and convenient way to do command-line parsing is to use the > parse-options API, in this case with OPT_CMDMODE. This would also give us > a chance to document the command modes in a nice and succinct way: as help > strings. > True, we're going to make that change. > > + > > + if (!strcmp(argv[1], "basic_grow")) { > > + /* > > + * Check if strbuf_grow(0) allocate a new NUL-terminated buffer > > s/allocate/&s/ > > > + */ > > + strbuf_init(&sb, 0); > > + strbuf_grow(&sb, 0); > > + if (sb.buf == strbuf_slopbuf) > > + die("strbuf_grow failed to alloc memory"); > > + strbuf_release(&sb); > > + if (sb.buf != strbuf_slopbuf) > > + die("strbuf_release does not reinitialize the strbuf"); > > + } else if (!strcmp(argv[1], "strbuf_check_behavior")) { > > + strbuf_init(&sb, 0); > > + return test_usual(&sb); > > + } else if (!strcmp(argv[1], "grow_overflow")) { > > + /* > > + * size_t overflow: should die() > > + */ > > + strbuf_init(&sb, 1000); > > + strbuf_grow(&sb, maximum_unsigned_value_of_type((size_t)1)); > > A comment "If this does not die(), fall through to returning success, to > indicate an error" might be nice here. Agreed. > > + } else { > > + usage("test-strbuf mode"); > > + } > > + > > + return 0; > > +} > > diff --git a/t/t0082-strbuf.sh b/t/t0082-strbuf.sh > > new file mode 100755 > > index 0000000..0800d26 > > --- /dev/null > > +++ b/t/t0082-strbuf.sh > > @@ -0,0 +1,19 @@ > > +#!/bin/sh > > + > > +test_description="Test the strbuf API. > > +" > > This description does not need a new-line, and existing one-liner test > descriptions seem not to be terminated by a period. OK. > The rest of this patch looks good. > > Ciao, > Johannes > Thanks for the Review, Simon Rabourg ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] strbuf: add tests 2016-05-30 10:36 ` [PATCH 1/2] strbuf: add tests William Duclot 2016-05-30 11:26 ` Johannes Schindelin @ 2016-05-30 11:56 ` Matthieu Moy 2016-05-31 2:04 ` Michael Haggerty 2 siblings, 0 replies; 38+ messages in thread From: Matthieu Moy @ 2016-05-30 11:56 UTC (permalink / raw) To: William Duclot Cc: git, simon.rabourg, francois.beutin, antoine.queru, mhagger I agree with Johannes' remarks. I'd add two style nits: William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes: > --- /dev/null > +++ b/t/helper/test-strbuf.c > @@ -0,0 +1,69 @@ > +#include "git-compat-util.h" > +#include "strbuf.h" > + > +/* > + * Check behavior on usual use cases > + */ > +int test_usual(struct strbuf *sb) > +{ > + size_t size, old_alloc; > + char *res, *old_buf, *str_test = malloc(5*sizeof(char)); Spaces around binary operator. > +int main(int argc, char *argv[]) > +{ > + size_t size = 1; > + struct strbuf sb; > + char str_test[5] = "test"; > + char str_foo[7] = "foo"; > + > + if (argc != 2) > + usage("test-strbuf mode"); > + > + if (!strcmp(argv[1], "basic_grow")) { > + /* > + * Check if strbuf_grow(0) allocate a new NUL-terminated buffer > + */ > + strbuf_init(&sb, 0); > + strbuf_grow(&sb, 0); > + if (sb.buf == strbuf_slopbuf) > + die("strbuf_grow failed to alloc memory"); > + strbuf_release(&sb); > + if (sb.buf != strbuf_slopbuf) > + die("strbuf_release does not reinitialize the strbuf"); > + } else if (!strcmp(argv[1], "strbuf_check_behavior")) { > + strbuf_init(&sb, 0); > + return test_usual(&sb); I think the command ("strbuf_check_behavior") should have the same name as the function (test_usual). This avoids one indirection for the reader going from t*.sh file to the actual test code. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] strbuf: add tests 2016-05-30 10:36 ` [PATCH 1/2] strbuf: add tests William Duclot 2016-05-30 11:26 ` Johannes Schindelin 2016-05-30 11:56 ` Matthieu Moy @ 2016-05-31 2:04 ` Michael Haggerty 2016-05-31 9:48 ` Simon Rabourg 2 siblings, 1 reply; 38+ messages in thread From: Michael Haggerty @ 2016-05-31 2:04 UTC (permalink / raw) To: William Duclot, git Cc: simon.rabourg, francois.beutin, antoine.queru, matthieu.moy Hi, Cool that you are working on this! See my comments below. On 05/30/2016 12:36 PM, William Duclot wrote: > Test the strbuf API. Being used throughout all Git the API could be > considered tested, but adding specific tests makes it easier to improve > and extend the API. > --- > Makefile | 1 + > t/helper/test-strbuf.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++ > t/t0082-strbuf.sh | 19 ++++++++++++++ > 3 files changed, 89 insertions(+) > create mode 100644 t/helper/test-strbuf.c > create mode 100755 t/t0082-strbuf.sh > > diff --git a/Makefile b/Makefile > index 3f03366..dc84f43 100644 > --- a/Makefile > +++ b/Makefile > @@ -613,6 +613,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree > TEST_PROGRAMS_NEED_X += test-sha1 > TEST_PROGRAMS_NEED_X += test-sha1-array > TEST_PROGRAMS_NEED_X += test-sigchain > +TEST_PROGRAMS_NEED_X += test-strbuf > TEST_PROGRAMS_NEED_X += test-string-list > TEST_PROGRAMS_NEED_X += test-submodule-config > TEST_PROGRAMS_NEED_X += test-subprocess > diff --git a/t/helper/test-strbuf.c b/t/helper/test-strbuf.c > new file mode 100644 > index 0000000..622f627 > --- /dev/null > +++ b/t/helper/test-strbuf.c > @@ -0,0 +1,69 @@ > +#include "git-compat-util.h" > +#include "strbuf.h" > + > +/* > + * Check behavior on usual use cases > + */ > +int test_usual(struct strbuf *sb) Is there a particular reason that you pass `sb` into the function? Why not use a local variable? It wouldn't hurt to declare this function static, because it is only used within this one compilation unit. On the other hand, in this particular case I don't think it matters much one way or the other. > +{ > + size_t size, old_alloc; > + char *res, *old_buf, *str_test = malloc(5*sizeof(char)); There is no reason to multiply by `sizeof(char)` here, and we don't do it in our code. (According to the C standard, `sizeof(char)` is always 1.) > + strbuf_grow(sb, 1); > + strcpy(str_test, "test"); > + old_alloc = sb->alloc; > + strbuf_grow(sb, 1000); > + if (old_alloc == sb->alloc) > + die("strbuf_grow does not realloc the buffer as expected"); It's not ideal that your test depends on the details of the allocation policy of strbuf. If somebody were to decide that it makes sense for the initial allocation of a strbuf to be 1024 characters, this test would break. I know it's implausible, but it's better to remove this coupling. You could do it by ensuring that you request more space than is already allocated: strbuf_grow(sb, sb.alloc - sb.len + 1000); On the other hand, if you *want* to test the details of strbuf's allocation policy here, you would do it explicitly rather than as the side effect of this test. For example, before calling `strbuf_grow(sb, 1000)`, you could insert: if (sb.alloc > SOME_VALUE) die("strbuf_grow(sb, 1) allocated too much space"); Though my opinion is that if you want to write tests of strbuf's allocation policies, it should be in an entirely separate test. > + old_buf = sb->buf; > + res = strbuf_detach(sb, &size); Since you don't use `size`, you can pass `NULL` as the second argument of `strbuf_detach()`. The same below. Alternatively, maybe you want to add a test that `strbuf_detach()` sets `size` to the expected value. > + if (res != old_buf) > + die("strbuf_detach does not return the expected buffer"); > + free(res); > + > + strcpy(str_test, "test"); Near the beginning of the function you have this exact line, but here you do it again even though str_test wasn't touched between the two lines. One of them can be deleted... ...but in fact it would be easier to initialize str_test using our xstrdup() function: char *str_test = xstrdup("test"); > + strbuf_attach(sb, (void *)str_test, strlen(str_test), sizeof(str_test)); Since str_test is a `char *`, `sizeof(str_test)` returns the size of the pointer itself (e.g., 4 or 8 bytes, depending on your computer's architecture). What you want here is the size of the memory that it points at, which in this case is `strlen(str_test) + 1`. (You may be confusing this pattern with code that looks like this: char str_test[5]; In this case, `sizeof(str_test)` is indeed 5, because `str_test` is an array of characters rather than a pointer to a character. It's confusing.) Also, you don't need to cast `str_test` to `void *`. Any pointer can be converted to `void *` implicitly. > + res = strbuf_detach(sb, &size); > + if (res != str_test) > + die("strbuf_detach does not return the expected buffer"); > + free(res); > + strbuf_release(sb); > + > + return 0; > +} > + > +int main(int argc, char *argv[]) > +{ > + size_t size = 1; > + struct strbuf sb; > + char str_test[5] = "test"; > + char str_foo[7] = "foo"; `size`, `str_test`, and `str_foo` are all unused. You should compile using `DEVELOPER=1` to enable a bunch of compiler warnings that Git developers tend to use. Any code you write should compile without warnings or errors when compiled with that setting. See `Documentation/CodingGuidelines` for more info. > + > + if (argc != 2) > + usage("test-strbuf mode"); > + > + if (!strcmp(argv[1], "basic_grow")) { > + /* > + * Check if strbuf_grow(0) allocate a new NUL-terminated buffer > + */ > + strbuf_init(&sb, 0); > + strbuf_grow(&sb, 0); > + if (sb.buf == strbuf_slopbuf) > + die("strbuf_grow failed to alloc memory"); > + strbuf_release(&sb); > + if (sb.buf != strbuf_slopbuf) > + die("strbuf_release does not reinitialize the strbuf"); > + } else if (!strcmp(argv[1], "strbuf_check_behavior")) { > + strbuf_init(&sb, 0); > + return test_usual(&sb); > + } else if (!strcmp(argv[1], "grow_overflow")) { > + /* > + * size_t overflow: should die() > + */ > + strbuf_init(&sb, 1000); > + strbuf_grow(&sb, maximum_unsigned_value_of_type((size_t)1)); > + } else { > + usage("test-strbuf mode"); > + } Consider putting each test in a separate function, rather than implementing some tests inline and one as a function. Consistency makes code easier to read. > [...] Michael ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 1/2] strbuf: add tests 2016-05-31 2:04 ` Michael Haggerty @ 2016-05-31 9:48 ` Simon Rabourg 0 siblings, 0 replies; 38+ messages in thread From: Simon Rabourg @ 2016-05-31 9:48 UTC (permalink / raw) To: Michael Haggerty Cc: William Duclot, git, francois beutin, antoine queru, matthieu moy Hi Michael, Michael Haggerty <mhagger@alum.mit.edu> writes: > Hi, > > Cool that you are working on this! See my comments below. > > On 05/30/2016 12:36 PM, William Duclot wrote: > > Test the strbuf API. Being used throughout all Git the API could be > > considered tested, but adding specific tests makes it easier to improve > > and extend the API. > > --- > > Makefile | 1 + > > t/helper/test-strbuf.c | 69 > > ++++++++++++++++++++++++++++++++++++++++++++++++++ > > t/t0082-strbuf.sh | 19 ++++++++++++++ > > 3 files changed, 89 insertions(+) > > create mode 100644 t/helper/test-strbuf.c > > create mode 100755 t/t0082-strbuf.sh > > > > diff --git a/Makefile b/Makefile > > index 3f03366..dc84f43 100644 > > --- a/Makefile > > +++ b/Makefile > > @@ -613,6 +613,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree > > TEST_PROGRAMS_NEED_X += test-sha1 > > TEST_PROGRAMS_NEED_X += test-sha1-array > > TEST_PROGRAMS_NEED_X += test-sigchain > > +TEST_PROGRAMS_NEED_X += test-strbuf > > TEST_PROGRAMS_NEED_X += test-string-list > > TEST_PROGRAMS_NEED_X += test-submodule-config > > TEST_PROGRAMS_NEED_X += test-subprocess > > diff --git a/t/helper/test-strbuf.c b/t/helper/test-strbuf.c > > new file mode 100644 > > index 0000000..622f627 > > --- /dev/null > > +++ b/t/helper/test-strbuf.c > > @@ -0,0 +1,69 @@ > > +#include "git-compat-util.h" > > +#include "strbuf.h" > > + > > +/* > > + * Check behavior on usual use cases > > + */ > > +int test_usual(struct strbuf *sb) > > Is there a particular reason that you pass `sb` into the function? Why > not use a local variable? > The fact is that we currently use the same 'sb' variable for all the tests. We are calling the "test_usual" function twice: the first time after calling strbuf_init on sb and the second one after calling strbuf_wrap_preallocated. > It wouldn't hurt to declare this function static, because it is only > used within this one compilation unit. On the other hand, in this > particular case I don't think it matters much one way or the other. > Agreed. > > +{ > > + size_t size, old_alloc; > > + char *res, *old_buf, *str_test = malloc(5*sizeof(char)); > > There is no reason to multiply by `sizeof(char)` here, and we don't do > it in our code. (According to the C standard, `sizeof(char)` is always 1.) > Ok. > > + strbuf_grow(sb, 1); > > + strcpy(str_test, "test"); > > + old_alloc = sb->alloc; > > + strbuf_grow(sb, 1000); > > + if (old_alloc == sb->alloc) > > + die("strbuf_grow does not realloc the buffer as expected"); > > It's not ideal that your test depends on the details of the allocation > policy of strbuf. If somebody were to decide that it makes sense for the > initial allocation of a strbuf to be 1024 characters, this test would > break. I know it's implausible, but it's better to remove this coupling. > > You could do it by ensuring that you request more space than is already > allocated: > > strbuf_grow(sb, sb.alloc - sb.len + 1000); > > On the other hand, if you *want* to test the details of strbuf's > allocation policy here, you would do it explicitly rather than as the > side effect of this test. For example, before calling `strbuf_grow(sb, > 1000)`, you could insert: > > if (sb.alloc > SOME_VALUE) > die("strbuf_grow(sb, 1) allocated too much space"); Ok. > > Though my opinion is that if you want to write tests of strbuf's > allocation policies, it should be in an entirely separate test. > > > + old_buf = sb->buf; > > + res = strbuf_detach(sb, &size); > > Since you don't use `size`, you can pass `NULL` as the second argument > of `strbuf_detach()`. The same below. > > Alternatively, maybe you want to add a test that `strbuf_detach()` sets > `size` to the expected value. > True, we will put that test in the next version. > > + if (res != old_buf) > > + die("strbuf_detach does not return the expected buffer"); > > + free(res); > > + > > + strcpy(str_test, "test"); > > Near the beginning of the function you have this exact line, but here > you do it again even though str_test wasn't touched between the two > lines. One of them can be deleted... > > ...but in fact it would be easier to initialize str_test using our > xstrdup() function: > > char *str_test = xstrdup("test"); Ok. > > > + strbuf_attach(sb, (void *)str_test, strlen(str_test), sizeof(str_test)); > > Since str_test is a `char *`, `sizeof(str_test)` returns the size of the > pointer itself (e.g., 4 or 8 bytes, depending on your computer's > architecture). What you want here is the size of the memory that it > points at, which in this case is `strlen(str_test) + 1`. > > (You may be confusing this pattern with code that looks like this: > > char str_test[5]; > > In this case, `sizeof(str_test)` is indeed 5, because `str_test` is an > array of characters rather than a pointer to a character. It's confusing.) > In the first version, str_test was a static variable. We updated it but forgot to make the changes on the alloc_size passed to the strbuf_detach call. Indeed it's a big mistake. Thanks for pointing it out. > Also, you don't need to cast `str_test` to `void *`. Any pointer can be > converted to `void *` implicitly. Agreed. > > > + res = strbuf_detach(sb, &size); > > + if (res != str_test) > > + die("strbuf_detach does not return the expected buffer"); > > + free(res); > > + strbuf_release(sb); > > + > > + return 0; > > +} > > + > > +int main(int argc, char *argv[]) > > +{ > > + size_t size = 1; > > + struct strbuf sb; > > + char str_test[5] = "test"; > > + char str_foo[7] = "foo"; > > `size`, `str_test`, and `str_foo` are all unused. True. The fact is that we don't use them in all cases so they are not necessary for all the tests. We will change that in the next version. > > You should compile using `DEVELOPER=1` to enable a bunch of compiler > warnings that Git developers tend to use. Any code you write should > compile without warnings or errors when compiled with that setting. See > `Documentation/CodingGuidelines` for more info. > Thanks for this advice. > > + > > + if (argc != 2) > > + usage("test-strbuf mode"); > > + > > + if (!strcmp(argv[1], "basic_grow")) { > > + /* > > + * Check if strbuf_grow(0) allocate a new NUL-terminated buffer > > + */ > > + strbuf_init(&sb, 0); > > + strbuf_grow(&sb, 0); > > + if (sb.buf == strbuf_slopbuf) > > + die("strbuf_grow failed to alloc memory"); > > + strbuf_release(&sb); > > + if (sb.buf != strbuf_slopbuf) > > + die("strbuf_release does not reinitialize the strbuf"); > > + } else if (!strcmp(argv[1], "strbuf_check_behavior")) { > > + strbuf_init(&sb, 0); > > + return test_usual(&sb); > > + } else if (!strcmp(argv[1], "grow_overflow")) { > > + /* > > + * size_t overflow: should die() > > + */ > > + strbuf_init(&sb, 1000); > > + strbuf_grow(&sb, maximum_unsigned_value_of_type((size_t)1)); > > + } else { > > + usage("test-strbuf mode"); > > + } > > Consider putting each test in a separate function, rather than > implementing some tests inline and one as a function. Consistency makes > code easier to read. > Ok. > > [...] > > Michael > > Thanks, Simon Rabourg ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 10:36 [PATCH 0/2] strbuf: improve API William Duclot 2016-05-30 10:36 ` [PATCH 1/2] strbuf: add tests William Duclot @ 2016-05-30 10:36 ` William Duclot 2016-05-30 12:13 ` Johannes Schindelin ` (3 more replies) 2016-05-30 11:32 ` [PATCH 0/2] strbuf: improve API Remi Galan Alfonso 2 siblings, 4 replies; 38+ messages in thread From: William Duclot @ 2016-05-30 10:36 UTC (permalink / raw) To: git; +Cc: simon.rabourg, francois.beutin, antoine.queru, matthieu.moy, mhagger It is unfortunate that it is currently impossible to use a strbuf without doing a memory allocation. So code like void f() { char path[PATH_MAX]; ... } typically gets turned into either void f() { struct strbuf path; strbuf_add(&path, ...); <-- does a malloc ... strbuf_release(&path); <-- does a free } which costs extra memory allocations, or void f() { static struct strbuf path; strbuf_add(&path, ...); ... strbuf_setlen(&path, 0); } which, by using a static variable, avoids most of the malloc/free overhead, but makes the function unsafe to use recursively or from multiple threads. Those limitations prevent strbuf to be used in performance-critical operations. THE IDEA -------- The idea here is to enhance strbuf to allow it to use memory that it doesn't own (for example, stack-allocated memory), while (optionally) allowing it to switch over to using allocated memory if the string grows past the size of the pre-allocated buffer. API ENHANCEMENT --------------- All functions of the API can still be reliably called without knowledge of the initialization (normal/preallocated/fixed) with the exception that strbuf_grow() may die() if the string try to overflow a fixed buffer. The API contract is still respected: - The API users may peek strbuf.buf in-place until they perform an operation that makes it longer (at which point the .buf pointer may point at a new piece of memory). - The API users may strbuf_detach() to obtain a piece of memory that belongs to them (at which point the strbuf becomes empty), hence needs to be freed by the callers. Full credit to Michael Haggerty for the idea and most of the wording of this commit (http://mid.gmane.org/53512DB6.1070600@alum.mit.edu). The implementation and bugs are all mine. Signed-off by: William Duclot <william.duclot@ensimag.grenoble-inp.fr> Signed-off by: Simon Rabourg <simon.rabourg@ensimag.grenoble-inp.fr> Signed-off by: Matthieu Moy <matthieu.moy@grenoble-inp.fr> --- strbuf.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++---- strbuf.h | 31 +++++++++++++++++++++-- t/helper/test-strbuf.c | 42 +++++++++++++++++++++++++++++++ t/t0082-strbuf.sh | 28 +++++++++++++++++++++ 4 files changed, 162 insertions(+), 7 deletions(-) diff --git a/strbuf.c b/strbuf.c index 1ba600b..527b986 100644 --- a/strbuf.c +++ b/strbuf.c @@ -1,6 +1,14 @@ #include "cache.h" #include "refs.h" #include "utf8.h" +#include <sys/param.h> + +/** + * Flags + * -------------- + */ +#define STRBUF_OWNS_MEMORY 1 +#define STRBUF_FIXED_MEMORY (1 << 1) int starts_with(const char *str, const char *prefix) { @@ -20,16 +28,37 @@ char strbuf_slopbuf[1]; void strbuf_init(struct strbuf *sb, size_t hint) { + sb->flags = 0; sb->alloc = sb->len = 0; sb->buf = strbuf_slopbuf; if (hint) strbuf_grow(sb, hint); } +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf, + size_t path_buf_len, size_t alloc_len) +{ + if (!path_buf) + die("you try to use a NULL buffer to initialize a strbuf"); + + strbuf_init(sb, 0); + strbuf_attach(sb, path_buf, path_buf_len, alloc_len); + sb->flags &= ~STRBUF_OWNS_MEMORY; + sb->flags &= ~STRBUF_FIXED_MEMORY; +} + +void strbuf_wrap_fixed(struct strbuf *sb, char *path_buf, + size_t path_buf_len, size_t alloc_len) +{ + strbuf_wrap_preallocated(sb, path_buf, path_buf_len, alloc_len); + sb->flags |= STRBUF_FIXED_MEMORY; +} + void strbuf_release(struct strbuf *sb) { if (sb->alloc) { - free(sb->buf); + if (sb->flags & STRBUF_OWNS_MEMORY) + free(sb->buf); strbuf_init(sb, 0); } } @@ -38,7 +67,11 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz) { char *res; strbuf_grow(sb, 0); - res = sb->buf; + if (sb->flags & STRBUF_OWNS_MEMORY) + res = sb->buf; + else + res = xmemdupz(sb->buf, sb->alloc - 1); + if (sz) *sz = sb->len; strbuf_init(sb, 0); @@ -51,6 +84,8 @@ void strbuf_attach(struct strbuf *sb, void *buf, size_t len, size_t alloc) sb->buf = buf; sb->len = len; sb->alloc = alloc; + sb->flags |= STRBUF_OWNS_MEMORY; + sb->flags &= ~STRBUF_FIXED_MEMORY; strbuf_grow(sb, 0); sb->buf[sb->len] = '\0'; } @@ -61,9 +96,32 @@ void strbuf_grow(struct strbuf *sb, size_t extra) if (unsigned_add_overflows(extra, 1) || unsigned_add_overflows(sb->len, extra + 1)) die("you want to use way too much memory"); - if (new_buf) - sb->buf = NULL; - ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); + if ((sb->flags & STRBUF_FIXED_MEMORY) && sb->len + extra + 1 > sb->alloc) + die("you try to make a string overflow the buffer of a fixed strbuf"); + + /* + * ALLOC_GROW may do a realloc() if needed, so we must not use it on + * a buffer the strbuf doesn't own + */ + if (sb->flags & STRBUF_OWNS_MEMORY) { + if (new_buf) + sb->buf = NULL; + ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); + } else { + /* + * The strbuf doesn't own the buffer: to avoid to realloc it, + * the strbuf needs to use a new buffer without freeing the old + */ + if (sb->len + extra + 1 > sb->alloc) { + size_t new_alloc = MAX(sb->len + extra + 1, alloc_nr(sb->alloc)); + char *buf = xmalloc(new_alloc); + memcpy(buf, sb->buf, sb->alloc); + sb->buf = buf; + sb->alloc = new_alloc; + sb->flags |= STRBUF_OWNS_MEMORY; + } + } + if (new_buf) sb->buf[0] = '\0'; } diff --git a/strbuf.h b/strbuf.h index 7987405..634759c 100644 --- a/strbuf.h +++ b/strbuf.h @@ -11,11 +11,16 @@ * A strbuf is NUL terminated for convenience, but no function in the * strbuf API actually relies on the string being free of NULs. * + * You can avoid the malloc/free overhead of `strbuf_init()`, `strbuf_add()` and + * `strbuf_release()` by wrapping pre-allocated memory (stack-allocated for + * example) using `strbuf_wrap_preallocated()` or `strbuf_wrap_fixed()`. + * * strbufs have some invariants that are very important to keep in mind: * * - The `buf` member is never NULL, so it can be used in any usual C * string operations safely. strbuf's _have_ to be initialized either by - * `strbuf_init()` or by `= STRBUF_INIT` before the invariants, though. + * `strbuf_init()`, `= STRBUF_INIT`, `strbuf_wrap_preallocated()` or + * `strbuf_wrap_fixed()` before the invariants, though. * * Do *not* assume anything on what `buf` really is (e.g. if it is * allocated memory or not), use `strbuf_detach()` to unwrap a memory @@ -62,13 +67,14 @@ * access to the string itself. */ struct strbuf { + unsigned int flags; size_t alloc; size_t len; char *buf; }; extern char strbuf_slopbuf[]; -#define STRBUF_INIT { 0, 0, strbuf_slopbuf } +#define STRBUF_INIT { 0, 0, 0, strbuf_slopbuf } /** * Life Cycle Functions @@ -81,6 +87,25 @@ extern char strbuf_slopbuf[]; */ extern void strbuf_init(struct strbuf *, size_t); +/** + * Allow the caller to give a pre-allocated piece of memory for the strbuf + * to use. It is possible then to strbuf_grow() the string past the size of the + * pre-allocated buffer: a new buffer will be allocated. The pre-allocated + * buffer will never be freed. + */ +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf, + size_t path_buf_len, size_t alloc_len); + +/** + * Allow the caller to give a pre-allocated piece of memory for the strbuf + * to use and indicate that the strbuf must use exclusively this buffer, + * never realloc() it or allocate a new one. It means that the string can + * be manipulated but cannot overflow the pre-allocated buffer. The + * pre-allocated buffer will never be freed. + */ +void strbuf_wrap_fixed(struct strbuf *sb, char *path_buf, + size_t path_buf_len, size_t alloc_len); + /** * Release a string buffer and the memory it used. You should not use the * string buffer after using this function, unless you initialize it again. @@ -91,6 +116,8 @@ extern void strbuf_release(struct strbuf *); * Detach the string from the strbuf and returns it; you now own the * storage the string occupies and it is your responsibility from then on * to release it with `free(3)` when you are done with it. + * Must allocate a copy of the buffer in case of a preallocated/fixed buffer. + * Performance-critical operations have to be aware of this. */ extern char *strbuf_detach(struct strbuf *, size_t *); diff --git a/t/helper/test-strbuf.c b/t/helper/test-strbuf.c index 622f627..1aaacb5 100644 --- a/t/helper/test-strbuf.c +++ b/t/helper/test-strbuf.c @@ -61,6 +61,48 @@ int main(int argc, char *argv[]) */ strbuf_init(&sb, 1000); strbuf_grow(&sb, maximum_unsigned_value_of_type((size_t)1)); + } else if (!strcmp(argv[1], "preallocated_check_behavior")) { + strbuf_wrap_preallocated(&sb, (void *)str_test, + strlen(str_test), sizeof(str_test)); + return test_usual(&sb); + } else if (!strcmp(argv[1], "preallocated_NULL")) { + /* + * Violation of invariant "strbuf must not be NULL": should die() + */ + strbuf_wrap_preallocated(&sb, NULL, 0, sizeof(str_test)); + } else if (!strcmp(argv[1], "grow_fixed_overflow")) { + /* + * Overflowing the buffer of a fixed strbuf: should die() + */ + strbuf_wrap_fixed(&sb, (void *)str_foo, + strlen(str_foo), sizeof(str_foo)); + strbuf_grow(&sb, 3); + strbuf_grow(&sb, 1000); + } else if (!strcmp(argv[1], "grow_fixed_overflow_min")) { + /* + * Minimum strbuf_grow() for overflowing a fixed strbuf: should die() + */ + strbuf_wrap_fixed(&sb, (void *)str_foo, + strlen(str_foo), sizeof(str_foo)); + strbuf_grow(&sb, 4); + } else if (!strcmp(argv[1], "grow_fixed_success")) { + strbuf_wrap_fixed(&sb, (void *)str_foo, + strlen(str_foo), sizeof(str_foo)); + strbuf_grow(&sb, 3); + } else if (!strcmp(argv[1], "detach_fixed")) { + char *buf; + strbuf_wrap_fixed(&sb, (void *)str_test, + strlen(str_test), sizeof(str_test)); + buf = strbuf_detach(&sb, &size); + if (str_test == buf) + die("strbuf_detach does not copy the buffer"); + free(buf); + } else if (!strcmp(argv[1], "release_fixed")) { + strbuf_wrap_fixed(&sb, (void *)str_test, strlen(str_test), + sizeof(sb) + 1); + strbuf_release(&sb); + if (sb.buf != strbuf_slopbuf) + die("strbuf_release does not reinitialize the strbuf"); } else { usage("test-strbuf mode"); } diff --git a/t/t0082-strbuf.sh b/t/t0082-strbuf.sh index 0800d26..5de909d 100755 --- a/t/t0082-strbuf.sh +++ b/t/t0082-strbuf.sh @@ -16,4 +16,32 @@ test_expect_success 'overflow while calling strbuf_grow' ' test_must_fail test-strbuf grow_overflow ' +test_expect_success 'check preallocated strbuf behavior in usual use cases' ' + test-strbuf preallocated_check_behavior +' + +test_expect_success 'strbuf_wrap_preallocated NULL initialization' ' + test_must_fail test-strbuf preallocated_NULL +' + +test_expect_success 'strbuf_grow with wrap_fixed overflow' ' + test_must_fail test-strbuf grow_fixed_overflow +' + +test_expect_success 'strbuf_grow with wrap_fixed minimum overflow' ' + test_must_fail test-strbuf grow_fixed_overflow_min +' + +test_expect_success 'strbuf_grow with wrap_fixed in a successful case' ' + test-strbuf grow_fixed_success +' + +test_expect_success 'stbuf_detach with wrap_fixed memory' ' + test-strbuf detach_fixed +' + +test_expect_success 'stbuf_release with wrap_fixed memory' ' + test-strbuf release_fixed +' + test_done -- 2.8.2.403.ge2646ba.dirty ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 10:36 ` [PATCH 2/2] strbuf: allow to use preallocated memory William Duclot @ 2016-05-30 12:13 ` Johannes Schindelin 2016-05-30 13:20 ` William Duclot 2016-05-31 3:05 ` Michael Haggerty 2016-05-30 12:52 ` Matthieu Moy ` (2 subsequent siblings) 3 siblings, 2 replies; 38+ messages in thread From: Johannes Schindelin @ 2016-05-30 12:13 UTC (permalink / raw) To: William Duclot Cc: git, simon.rabourg, francois.beutin, antoine.queru, matthieu.moy, mhagger Hi William, On Mon, 30 May 2016, William Duclot wrote: > It is unfortunate that it is currently impossible to use a strbuf > without doing a memory allocation. So code like > > void f() > { > char path[PATH_MAX]; > ... > } > > typically gets turned into either > > void f() > { > struct strbuf path; > strbuf_add(&path, ...); <-- does a malloc > ... > strbuf_release(&path); <-- does a free > } > > which costs extra memory allocations, or > > void f() > { > static struct strbuf path; > strbuf_add(&path, ...); > ... > strbuf_setlen(&path, 0); > } > > which, by using a static variable, avoids most of the malloc/free > overhead, but makes the function unsafe to use recursively or from > multiple threads. Those limitations prevent strbuf to be used in > performance-critical operations. This description is nice and verbose, but maybe something like this would introduce the subject in a quicker manner? When working e.g. with file paths or with dates, strbuf's malloc()/free() dance of strbufs can be easily avoided: as a sensible initial buffer size is already known, it can be allocated on the heap. The rest of the commit message flows nicely. > diff --git a/strbuf.c b/strbuf.c > index 1ba600b..527b986 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -1,6 +1,14 @@ > #include "cache.h" > #include "refs.h" > #include "utf8.h" > +#include <sys/param.h> Why? > +/** > + * Flags > + * -------------- > + */ > +#define STRBUF_OWNS_MEMORY 1 > +#define STRBUF_FIXED_MEMORY (1 << 1) >From reading the commit message, I expected STRBUF_OWNS_MEMORY. STRBUF_FIXED_MEMORY still needs to be explained. > @@ -20,16 +28,37 @@ char strbuf_slopbuf[1]; > > void strbuf_init(struct strbuf *sb, size_t hint) > { > + sb->flags = 0; > sb->alloc = sb->len = 0; > sb->buf = strbuf_slopbuf; > if (hint) > strbuf_grow(sb, hint); > } > > +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf, > + size_t path_buf_len, size_t alloc_len) > +{ > + if (!path_buf) > + die("you try to use a NULL buffer to initialize a strbuf"); > + > + strbuf_init(sb, 0); > + strbuf_attach(sb, path_buf, path_buf_len, alloc_len); > + sb->flags &= ~STRBUF_OWNS_MEMORY; > + sb->flags &= ~STRBUF_FIXED_MEMORY; Shorter: sb->flags &= ~(STRBUF_OWNS_MEMORY | STRBUF_FIXED_MEMORY); > +} > + > +void strbuf_wrap_fixed(struct strbuf *sb, char *path_buf, > + size_t path_buf_len, size_t alloc_len) > +{ > + strbuf_wrap_preallocated(sb, path_buf, path_buf_len, alloc_len); > + sb->flags |= STRBUF_FIXED_MEMORY; > +} Rather than letting strbuf_wrap_preallocated() set sb->flags &= ~FIXED_MEMORY only to revert that decision right away, a static function could be called by both strbuf_wrap_preallocated() and strbuf_wrap_fixed(). > void strbuf_release(struct strbuf *sb) > { > if (sb->alloc) { > - free(sb->buf); > + if (sb->flags & STRBUF_OWNS_MEMORY) > + free(sb->buf); > strbuf_init(sb, 0); > } Should we not reset the flags here, too? > @@ -38,7 +67,11 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz) > { > char *res; > strbuf_grow(sb, 0); > - res = sb->buf; > + if (sb->flags & STRBUF_OWNS_MEMORY) > + res = sb->buf; > + else > + res = xmemdupz(sb->buf, sb->alloc - 1); This looks like a usage to be avoided: if we plan to detach the buffer, anyway, there is no good reason to allocate it on the heap first. I would at least issue a warning here. > @@ -51,6 +84,8 @@ void strbuf_attach(struct strbuf *sb, void *buf, size_t len, size_t alloc) > sb->buf = buf; > sb->len = len; > sb->alloc = alloc; > + sb->flags |= STRBUF_OWNS_MEMORY; > + sb->flags &= ~STRBUF_FIXED_MEMORY; > strbuf_grow(sb, 0); > sb->buf[sb->len] = '\0'; > } > @@ -61,9 +96,32 @@ void strbuf_grow(struct strbuf *sb, size_t extra) > if (unsigned_add_overflows(extra, 1) || > unsigned_add_overflows(sb->len, extra + 1)) > die("you want to use way too much memory"); > - if (new_buf) > - sb->buf = NULL; > - ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); > + if ((sb->flags & STRBUF_FIXED_MEMORY) && sb->len + extra + 1 > sb->alloc) > + die("you try to make a string overflow the buffer of a fixed strbuf"); We try to avoid running over 80 columns/row. This message could be more to the point: cannot grow fixed string > + /* > + * ALLOC_GROW may do a realloc() if needed, so we must not use it on > + * a buffer the strbuf doesn't own > + */ > + if (sb->flags & STRBUF_OWNS_MEMORY) { > + if (new_buf) > + sb->buf = NULL; > + ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); > + } else { > + /* > + * The strbuf doesn't own the buffer: to avoid to realloc it, > + * the strbuf needs to use a new buffer without freeing the old > + */ > + if (sb->len + extra + 1 > sb->alloc) { > + size_t new_alloc = MAX(sb->len + extra + 1, alloc_nr(sb->alloc)); > + char *buf = xmalloc(new_alloc); > + memcpy(buf, sb->buf, sb->alloc); > + sb->buf = buf; > + sb->alloc = new_alloc; > + sb->flags |= STRBUF_OWNS_MEMORY; > + } > + } > + > if (new_buf) > sb->buf[0] = '\0'; > } > diff --git a/strbuf.h b/strbuf.h > index 7987405..634759c 100644 > --- a/strbuf.h > +++ b/strbuf.h > @@ -11,11 +11,16 @@ > * A strbuf is NUL terminated for convenience, but no function in the > * strbuf API actually relies on the string being free of NULs. > * > + * You can avoid the malloc/free overhead of `strbuf_init()`, `strbuf_add()` and > + * `strbuf_release()` by wrapping pre-allocated memory (stack-allocated for > + * example) using `strbuf_wrap_preallocated()` or `strbuf_wrap_fixed()`. > + * > * strbufs have some invariants that are very important to keep in mind: > * > * - The `buf` member is never NULL, so it can be used in any usual C > * string operations safely. strbuf's _have_ to be initialized either by > - * `strbuf_init()` or by `= STRBUF_INIT` before the invariants, though. > + * `strbuf_init()`, `= STRBUF_INIT`, `strbuf_wrap_preallocated()` or > + * `strbuf_wrap_fixed()` before the invariants, though. > * > * Do *not* assume anything on what `buf` really is (e.g. if it is > * allocated memory or not), use `strbuf_detach()` to unwrap a memory > @@ -62,13 +67,14 @@ > * access to the string itself. > */ > struct strbuf { > + unsigned int flags; > size_t alloc; > size_t len; > char *buf; > }; > > extern char strbuf_slopbuf[]; > -#define STRBUF_INIT { 0, 0, strbuf_slopbuf } > +#define STRBUF_INIT { 0, 0, 0, strbuf_slopbuf } If I am not mistaken, to preserve the existing behavior the initial flags should be 1 (own memory). BTW this demonstrates that it may not be a good idea to declare the "flags" field globally but then make the actual flags private. Also: similar use cases in Git used :1 flags (see e.g. the "configured" field in credential.h). Ciao, Johannes ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 12:13 ` Johannes Schindelin @ 2016-05-30 13:20 ` William Duclot 2016-05-31 6:21 ` Johannes Schindelin 2016-05-31 3:05 ` Michael Haggerty 1 sibling, 1 reply; 38+ messages in thread From: William Duclot @ 2016-05-30 13:20 UTC (permalink / raw) To: Johannes Schindelin Cc: git, simon rabourg, francois beutin, antoine queru, matthieu moy, mhagger Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > On Mon, 30 May 2016, William Duclot wrote: > >> It is unfortunate that it is currently impossible to use a strbuf >> without doing a memory allocation. So code like >> >> void f() >> { >> char path[PATH_MAX]; >> ... >> } >> >> typically gets turned into either >> >> void f() >> { >> struct strbuf path; >> strbuf_add(&path, ...); <-- does a malloc >> ... >> strbuf_release(&path); <-- does a free >> } >> >> which costs extra memory allocations, or >> >> void f() >> { >> static struct strbuf path; >> strbuf_add(&path, ...); >> ... >> strbuf_setlen(&path, 0); >> } >> >> which, by using a static variable, avoids most of the malloc/free >> overhead, but makes the function unsafe to use recursively or from >> multiple threads. Those limitations prevent strbuf to be used in >> performance-critical operations. > > This description is nice and verbose, but maybe something like this would > introduce the subject in a quicker manner? > > When working e.g. with file paths or with dates, strbuf's > malloc()/free() dance of strbufs can be easily avoided: as > a sensible initial buffer size is already known, it can be > allocated on the heap. strbuf already allow to indicate a sensible initial buffer size thanks to strbuf_init() second parameter. The main perk of pre-allocation is to use stack-allocated memory, and not heap-allocated :) Unless I misunderstood your message? >> diff --git a/strbuf.c b/strbuf.c >> index 1ba600b..527b986 100644 >> --- a/strbuf.c >> +++ b/strbuf.c >> @@ -1,6 +1,14 @@ >> #include "cache.h" >> #include "refs.h" >> #include "utf8.h" >> +#include <sys/param.h> > > Why? For the MAX macro. It may be a teeny tiny overkill >> +/** >> + * Flags >> + * -------------- >> + */ >> +#define STRBUF_OWNS_MEMORY 1 >> +#define STRBUF_FIXED_MEMORY (1 << 1) > > From reading the commit message, I expected STRBUF_OWNS_MEMORY. > STRBUF_FIXED_MEMORY still needs to be explained. Yes, that seems right >> @@ -20,16 +28,37 @@ char strbuf_slopbuf[1]; >> >> void strbuf_init(struct strbuf *sb, size_t hint) >> { >> + sb->flags = 0; >> sb->alloc = sb->len = 0; >> sb->buf = strbuf_slopbuf; >> if (hint) >> strbuf_grow(sb, hint); >> } >> >> +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf, >> + size_t path_buf_len, size_t alloc_len) >> +{ >> + if (!path_buf) >> + die("you try to use a NULL buffer to initialize a strbuf"); >> + >> + strbuf_init(sb, 0); >> + strbuf_attach(sb, path_buf, path_buf_len, alloc_len); >> + sb->flags &= ~STRBUF_OWNS_MEMORY; >> + sb->flags &= ~STRBUF_FIXED_MEMORY; > > Shorter: sb->flags &= ~(STRBUF_OWNS_MEMORY | STRBUF_FIXED_MEMORY); Okay with me >> +} >> + >> +void strbuf_wrap_fixed(struct strbuf *sb, char *path_buf, >> + size_t path_buf_len, size_t alloc_len) >> +{ >> + strbuf_wrap_preallocated(sb, path_buf, path_buf_len, alloc_len); >> + sb->flags |= STRBUF_FIXED_MEMORY; >> +} > > Rather than letting strbuf_wrap_preallocated() set sb->flags &= > ~FIXED_MEMORY only to revert that decision right away, a static function > could be called by both strbuf_wrap_preallocated() and > strbuf_wrap_fixed(). Makes sense >> void strbuf_release(struct strbuf *sb) >> { >> if (sb->alloc) { >> - free(sb->buf); >> + if (sb->flags & STRBUF_OWNS_MEMORY) >> + free(sb->buf); >> strbuf_init(sb, 0); >> } > > Should we not reset the flags here, too? Well, strbuf_init() reset the flags. The only way to have !sb->alloc is that strbuf has been initialized and never used (even alloc_grow(0) set sb->alloc=1), so sb==STRBUF_INIT, so the flags don't have to be reset >> @@ -38,7 +67,11 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz) >> { >> char *res; >> strbuf_grow(sb, 0); >> - res = sb->buf; >> + if (sb->flags & STRBUF_OWNS_MEMORY) >> + res = sb->buf; >> + else >> + res = xmemdupz(sb->buf, sb->alloc - 1); > > This looks like a usage to be avoided: if we plan to detach the buffer, > anyway, there is no good reason to allocate it on the heap first. I would > at least issue a warning here. strbuf_detach() guarantees to return heap-allocated memory, that the caller can use however he want and that he'll have to free. If the strbuf doesn't own the memory, it cannot return the buf attribute directly because: - The memory belong to someone else (so the caller can't use it however he want) - The caller can't have the responsibility to free (because the memory belong to someone else) - The memory may not even be heap-allocated >> @@ -51,6 +84,8 @@ void strbuf_attach(struct strbuf *sb, void *buf, size_t >> len, size_t alloc) >> sb->buf = buf; >> sb->len = len; >> sb->alloc = alloc; >> + sb->flags |= STRBUF_OWNS_MEMORY; >> + sb->flags &= ~STRBUF_FIXED_MEMORY; >> strbuf_grow(sb, 0); >> sb->buf[sb->len] = '\0'; >> } >> @@ -61,9 +96,32 @@ void strbuf_grow(struct strbuf *sb, size_t extra) >> if (unsigned_add_overflows(extra, 1) || >> unsigned_add_overflows(sb->len, extra + 1)) >> die("you want to use way too much memory"); >> - if (new_buf) >> - sb->buf = NULL; >> - ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); >> + if ((sb->flags & STRBUF_FIXED_MEMORY) && sb->len + extra + 1 > sb->alloc) >> + die("you try to make a string overflow the buffer of a fixed strbuf"); > > We try to avoid running over 80 columns/row. This message could be > more to the point: cannot grow fixed string What is fixed is the buffer, not the string. I'll shrink that under 80 columns >> extern char strbuf_slopbuf[]; >> -#define STRBUF_INIT { 0, 0, strbuf_slopbuf } >> +#define STRBUF_INIT { 0, 0, 0, strbuf_slopbuf } > > If I am not mistaken, to preserve the existing behavior the initial flags > should be 1 (own memory). strbuf_slopbuf is a buffer that doesn't belong to any strbuf (because it's shared between all just-initialized strbufs). If STRBUF_OWNS_MEMORY was set, strbuf_slopbuf could be freed (which is impossible because it is shared AND even more because it is stack-allocated) > BTW this demonstrates that it may not be a good idea to declare the > "flags" field globally but then make the actual flags private. I'm not sure what you mean here? > Also: similar use cases in Git used :1 flags (see e.g. the "configured" > field in credential.h). I think that keeping an obscure `flags` attribute may be better, as they should only be useful for internal operations and the user shouldn't mess with it. Keeping it a `private` attribute, in a way ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 13:20 ` William Duclot @ 2016-05-31 6:21 ` Johannes Schindelin 0 siblings, 0 replies; 38+ messages in thread From: Johannes Schindelin @ 2016-05-31 6:21 UTC (permalink / raw) To: William Duclot Cc: git, simon rabourg, francois beutin, antoine queru, matthieu moy, mhagger Hi William, On Mon, 30 May 2016, William Duclot wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > > When working e.g. with file paths or with dates, strbuf's > > malloc()/free() dance of strbufs can be easily avoided: as > > a sensible initial buffer size is already known, it can be > > allocated on the heap. > > strbuf already allow to indicate a sensible initial buffer size thanks > to strbuf_init() second parameter. The main perk of pre-allocation > is to use stack-allocated memory, and not heap-allocated :) Sorry, my brain thought about the next point while my fingers typed "heap". It is "stack" I meant. > >> +#include <sys/param.h> > > > > Why? > > For the MAX macro. It may be a teeny tiny overkill Teeny tiny. You probably assumed that everybody compiles this on Linux, and since it compiles on your machine, no problem, right? > >> @@ -38,7 +67,11 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz) > >> { > >> char *res; > >> strbuf_grow(sb, 0); > >> - res = sb->buf; > >> + if (sb->flags & STRBUF_OWNS_MEMORY) > >> + res = sb->buf; > >> + else > >> + res = xmemdupz(sb->buf, sb->alloc - 1); > > > > This looks like a usage to be avoided: if we plan to detach the buffer, > > anyway, there is no good reason to allocate it on the heap first. I would > > at least issue a warning here. > > strbuf_detach() guarantees to return heap-allocated memory, that the caller > can use however he want and that he'll have to free. First of all, let's stop this "caller == he" thing right here and now. A better way is to use the "singular they": ... the caller can use however they want... > If the strbuf doesn't own the memory, it cannot return the buf attribute > directly because: [...] I know that. My point was that it is more elegant to allocate the buffer on the heap right away, if we already know that we want to detach it. I'll reply to Michael's comment on this. > - The memory belong to someone else (so the caller can't use it however > he want) s/belong/belongs/ (just because the 's' is often silent in spoken French does not mean that you can drop it from written English) > >> extern char strbuf_slopbuf[]; > >> -#define STRBUF_INIT { 0, 0, strbuf_slopbuf } > >> +#define STRBUF_INIT { 0, 0, 0, strbuf_slopbuf } > > > > If I am not mistaken, to preserve the existing behavior the initial flags > > should be 1 (own memory). > > strbuf_slopbuf is a buffer that doesn't belong to any strbuf (because it's > shared between all just-initialized strbufs). Yet we somehow already have handling for that, right? Makes me wonder why I did not see that handling converted to the STRBUF_OWNS_MEMORY way... > > BTW this demonstrates that it may not be a good idea to declare the > > "flags" field globally but then make the actual flags private. > > I'm not sure what you mean here? What I meant is that the global _INIT cannot set any flags, because the constants are not available globally. And that seems odd. If you ever want to introduce something like STRBUF_INIT_WRAP(buffer), you would *require* those flag constants to be public. > > Also: similar use cases in Git used :1 flags (see e.g. the "configured" > > field in credential.h). > > I think that keeping an obscure `flags` attribute may be better, as they > should only be useful for internal operations and the user shouldn't mess > with it. Keeping it a `private` attribute, in a way This is not C++. To do what you intend to do, you would require C++ style visibility scopes, where a constructor can use a private enum. If you truly want to go this way, I fear you will have *a lot more* to do than this 2-patch series: there are a *ton* of existing header files disagreeing with this goal. Ciao, Johannes ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 12:13 ` Johannes Schindelin 2016-05-30 13:20 ` William Duclot @ 2016-05-31 3:05 ` Michael Haggerty 2016-05-31 6:41 ` Johannes Schindelin 1 sibling, 1 reply; 38+ messages in thread From: Michael Haggerty @ 2016-05-31 3:05 UTC (permalink / raw) To: Johannes Schindelin, William Duclot Cc: git, simon.rabourg, francois.beutin, antoine.queru, matthieu.moy On 05/30/2016 02:13 PM, Johannes Schindelin wrote: > [...] >> @@ -38,7 +67,11 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz) >> { >> char *res; >> strbuf_grow(sb, 0); >> - res = sb->buf; >> + if (sb->flags & STRBUF_OWNS_MEMORY) >> + res = sb->buf; >> + else >> + res = xmemdupz(sb->buf, sb->alloc - 1); > > This looks like a usage to be avoided: if we plan to detach the buffer, > anyway, there is no good reason to allocate it on the heap first. I would > at least issue a warning here. I think this last case should be changed to res = xmemdupz(sb->buf, sb->len); Johannes, if this change is made then I think that there is a reasonable use case for calling `strbuf_detach()` on a strbuf that wraps a stack-allocated string, so I don't think that a warning is needed. I think this change makes sense. After all, once a caller detaches a string, it is probably not planning on growing/shrinking it anymore, so any more space than that would probably be wasted. In fact, since the caller has no way to ask how much extra space the detached string has allocated, it is almost guaranteed that the space would be wasted. Actually, that is not 100% certain. Theoretically, a caller might read `sb->alloc` *before* calling `strbuf_detach()`, and assume that the detached string has that allocated size. Or the caller might call `strbuf_grow()` then assume that the detached string has at least that much free space. I sure hope that no callers actually do that. The docstring for `strbuf_detach()` doesn't promise that it will work, and there is a pretty stern warning [1] that should hopefully have dissuaded developers from such a usage. But how could it be checked for sure? * Audit the callers of strbuf_detach(). But given that there are nearly 200 callers, that would be a huge amount of work. * On a test branch, change the existing implementation of strbuf_detach() to return newly-allocated memory of size `sb->len + 1` and free `sb->buf`, then run the test suite under valgrind. This would flush out examples of this antipattern in the test suite. It might seem like we don't have to worry about this, because existing callers only deal with strbufs that wrap heap-allocated strings. But such a caller might get a strbuf passed to it from a caller, and that caller might someday be modified to use stack-allocated strings. So I think that at least the valgrind test suggested above would be prudent. Michael [1] https://github.com/git/git/blob/f3913c2d03abc660140678a9e14dac399f847647/strbuf.h#L20-L23 ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-31 3:05 ` Michael Haggerty @ 2016-05-31 6:41 ` Johannes Schindelin 2016-05-31 8:25 ` Michael Haggerty 0 siblings, 1 reply; 38+ messages in thread From: Johannes Schindelin @ 2016-05-31 6:41 UTC (permalink / raw) To: Michael Haggerty Cc: William Duclot, git, simon.rabourg, francois.beutin, antoine.queru, matthieu.moy Hi Michael, On Tue, 31 May 2016, Michael Haggerty wrote: > On 05/30/2016 02:13 PM, Johannes Schindelin wrote: > > [...] > >> @@ -38,7 +67,11 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz) > >> { > >> char *res; > >> strbuf_grow(sb, 0); > >> - res = sb->buf; > >> + if (sb->flags & STRBUF_OWNS_MEMORY) > >> + res = sb->buf; > >> + else > >> + res = xmemdupz(sb->buf, sb->alloc - 1); > > > > This looks like a usage to be avoided: if we plan to detach the buffer, > > anyway, there is no good reason to allocate it on the heap first. I would > > at least issue a warning here. > > I think this last case should be changed to > > res = xmemdupz(sb->buf, sb->len); > > Johannes, if this change is made then I think that there is a reasonable > use case for calling `strbuf_detach()` on a strbuf that wraps a > stack-allocated string, so I don't think that a warning is needed. > > I think this change makes sense. After all, once a caller detaches a > string, it is probably not planning on growing/shrinking it anymore, so > any more space than that would probably be wasted. In fact, since the > caller has no way to ask how much extra space the detached string has > allocated, it is almost guaranteed that the space would be wasted. Ah, I did not think of that. (We could of course introduce strbuf_realloc_snugly() or some such, but I agree: why?) So the best counter-example to my objection might be something like this: -- snip -- Subject: git_pathdup(): avoid realloc() When generating Git paths, we already know the maximum length: MAX_PATH. Let's avoid realloc()-caused fragmentation by using a fixed buffer. As a side effect, we avoid overallocating the end result, too: previously, strbuf_detach() would not realloc() to avoid wasting the (sb->alloc - sb->len) bytes, while it malloc()s the precisely needed amount of memory when detaching fixed buffers. diff --git a/path.c b/path.c index 2511d8a..64fd3ee 100644 --- a/path.c +++ b/path.c @@ -426,7 +426,8 @@ const char *git_path(const char *fmt, ...) char *git_pathdup(const char *fmt, ...) { - struct strbuf path = STRBUF_INIT; + char buffer[MAX_PATH]; + struct strbuf path = STRBUF_WRAP_FIXED(buffer); va_list args; va_start(args, fmt); do_git_path(&path, fmt, args); -- snap -- Color me convinced, Dscho ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-31 6:41 ` Johannes Schindelin @ 2016-05-31 8:25 ` Michael Haggerty 0 siblings, 0 replies; 38+ messages in thread From: Michael Haggerty @ 2016-05-31 8:25 UTC (permalink / raw) To: Johannes Schindelin Cc: William Duclot, git, simon.rabourg, francois.beutin, antoine.queru, matthieu.moy On 05/31/2016 08:41 AM, Johannes Schindelin wrote: > Hi Michael, > > On Tue, 31 May 2016, Michael Haggerty wrote: > >> On 05/30/2016 02:13 PM, Johannes Schindelin wrote: >>> [...] >>>> @@ -38,7 +67,11 @@ char *strbuf_detach(struct strbuf *sb, size_t *sz) >>>> { >>>> char *res; >>>> strbuf_grow(sb, 0); >>>> - res = sb->buf; >>>> + if (sb->flags & STRBUF_OWNS_MEMORY) >>>> + res = sb->buf; >>>> + else >>>> + res = xmemdupz(sb->buf, sb->alloc - 1); >>> >>> This looks like a usage to be avoided: if we plan to detach the buffer, >>> anyway, there is no good reason to allocate it on the heap first. I would >>> at least issue a warning here. >> >> I think this last case should be changed to >> >> res = xmemdupz(sb->buf, sb->len); >> >> Johannes, if this change is made then I think that there is a reasonable >> use case for calling `strbuf_detach()` on a strbuf that wraps a >> stack-allocated string, so I don't think that a warning is needed. >> >> I think this change makes sense. After all, once a caller detaches a >> string, it is probably not planning on growing/shrinking it anymore, so >> any more space than that would probably be wasted. In fact, since the >> caller has no way to ask how much extra space the detached string has >> allocated, it is almost guaranteed that the space would be wasted. > > Ah, I did not think of that. (We could of course introduce > strbuf_realloc_snugly() or some such, but I agree: why?) So the best > counter-example to my objection might be something like this: > > -- snip -- > Subject: git_pathdup(): avoid realloc() > > When generating Git paths, we already know the maximum length: > MAX_PATH. Let's avoid realloc()-caused fragmentation by using a > fixed buffer. > > As a side effect, we avoid overallocating the end result, too: > previously, strbuf_detach() would not realloc() to avoid wasting > the (sb->alloc - sb->len) bytes, while it malloc()s the precisely > needed amount of memory when detaching fixed buffers. > > diff --git a/path.c b/path.c > index 2511d8a..64fd3ee 100644 > --- a/path.c > +++ b/path.c > @@ -426,7 +426,8 @@ const char *git_path(const char *fmt, ...) > > char *git_pathdup(const char *fmt, ...) > { > - struct strbuf path = STRBUF_INIT; > + char buffer[MAX_PATH]; > + struct strbuf path = STRBUF_WRAP_FIXED(buffer); > va_list args; > va_start(args, fmt); > do_git_path(&path, fmt, args); > -- snap -- I like the idea, but apparently MAX_PATH is not a hard limit that OSs must respect (e.g., see [1]). The existing implementation can generate paths longer than MAX_PATH whereas your replacement cannot. This would be easy to solve by using STRBUF_WRAP_MOVABLE() instead of STRBUF_WRAP_FIXED() in your patch. We would thereby accept the possibility that the strbuf might need to be expanded for really long paths. However, if that happens, then strbuf_detach() is likely to return the string in an overallocated memory area. So if there are callers who *really* care about not having overallocated memory, we might want a strbuf_detach_snugly(). Your (only half-serious) idea for strbuf_realloc_snugly() would have the disadvantage that it couldn't work with a fixed strbuf, whereas strbuf_detach_snugly() could. It would be sad for callers to have to worry about the distinction. Michael [1] http://insanecoding.blogspot.de/2007/11/pathmax-simply-isnt.html ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 10:36 ` [PATCH 2/2] strbuf: allow to use preallocated memory William Duclot 2016-05-30 12:13 ` Johannes Schindelin @ 2016-05-30 12:52 ` Matthieu Moy 2016-05-30 14:15 ` William Duclot 2016-05-31 4:05 ` Michael Haggerty 2016-05-30 21:56 ` Mike Hommey 2016-05-31 6:34 ` Junio C Hamano 3 siblings, 2 replies; 38+ messages in thread From: Matthieu Moy @ 2016-05-30 12:52 UTC (permalink / raw) To: William Duclot Cc: git, simon.rabourg, francois.beutin, antoine.queru, mhagger William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes: > multiple threads. Those limitations prevent strbuf to be used in prevent strbuf from being used ... > API ENHANCEMENT > --------------- > > All functions of the API can still be reliably called without > knowledge of the initialization (normal/preallocated/fixed) with the > exception that strbuf_grow() may die() if the string try to overflow a s/try/tries/ > @@ -20,16 +28,37 @@ char strbuf_slopbuf[1]; > > void strbuf_init(struct strbuf *sb, size_t hint) > { > + sb->flags = 0; > sb->alloc = sb->len = 0; > sb->buf = strbuf_slopbuf; > if (hint) > strbuf_grow(sb, hint); > } If you set flags = 0 here, existing callers will have all flags off, including OWNS_MEMORY. I *think* this is OK, as sb->buf is currently pointing to strbuf_slopbuf, which the the strbuf doesn't own. But that is too subtle to go without an explanatory comment IMHO. Also, doesn't this make the "new_buf" case useless in strbuf_grow? With your patch, the code looks like: void strbuf_grow(struct strbuf *sb, size_t extra) { int new_buf = !sb->alloc; ... if (sb->flags & STRBUF_OWNS_MEMORY) { if (new_buf) // <---------------------------------------- (1) sb->buf = NULL; ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); } else { /* * The strbuf doesn't own the buffer: to avoid to realloc it, * the strbuf needs to use a new buffer without freeing the old */ if (sb->len + extra + 1 > sb->alloc) { size_t new_alloc = MAX(sb->len + extra + 1, alloc_nr(sb->alloc)); char *buf = xmalloc(new_alloc); memcpy(buf, sb->buf, sb->alloc); sb->buf = buf; sb->alloc = new_alloc; sb->flags |= STRBUF_OWNS_MEMORY; } } if (new_buf) // <---------------------------------------- (2) sb->buf[0] = '\0'; } I think (1) is now dead code, since sb->alloc == 0 implies that STRBUF_OWNS_MEMORY is set. (2) seems redundant since you've just memcpy-ed the existing '\0' into the buffer. > +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf, > + size_t path_buf_len, size_t alloc_len) > +{ > + if (!path_buf) > + die("you try to use a NULL buffer to initialize a strbuf"); > + > + strbuf_init(sb, 0); > + strbuf_attach(sb, path_buf, path_buf_len, alloc_len); > + sb->flags &= ~STRBUF_OWNS_MEMORY; > + sb->flags &= ~STRBUF_FIXED_MEMORY; > +} strbuf_wrap_preallocated seem very close to strbuf_attach. I'd rather see a symmetric code sharing like void strbuf_attach_internal(struct strbuf *sb, ..., unsigned int flags) and then strbuf_attach() and strbuf_wrap_preallocated() become straightforward wrappers around it. This would avoid setting and then unsetting STRBUF_OWNS_MEMORY (the performance impact is probably small, but it looks weird to set the flag and then unset it right away). After your patch, there are differences between strbuf_wrap_preallocated() which I think are inconsistencies: * strbuf_attach() does not check for NULL buffer, but it doesn't accept them either if I read correctly. It would make sense to add the check to strbuf_attach(), but it's weird to have the performance-critical oriented function do the expensive stuff that the non-performance-critical one doesn't. * strbuf_attach() calls strbuf_release(), which allows reusing an existing strbuf. strbuf_wrap_preallocated() calls strbuf_init which would override silently any previous content. I think strbuf_attach() does the right thing here. (And I'm probably the one who misguided you to do this) In any case, you probably want to include calls to strbuf_attach() and strbuf_wrap_*() functions on existing non-empty strbufs. > +void strbuf_wrap_fixed(struct strbuf *sb, char *path_buf, > + size_t path_buf_len, size_t alloc_len) > +{ > + strbuf_wrap_preallocated(sb, path_buf, path_buf_len, alloc_len); > + sb->flags |= STRBUF_FIXED_MEMORY; > +} And this could become a 3rd caller of strbuf_attach_internal(). > @@ -61,9 +96,32 @@ void strbuf_grow(struct strbuf *sb, size_t extra) > if (unsigned_add_overflows(extra, 1) || > unsigned_add_overflows(sb->len, extra + 1)) > die("you want to use way too much memory"); > - if (new_buf) > - sb->buf = NULL; > - ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); > + if ((sb->flags & STRBUF_FIXED_MEMORY) && sb->len + extra + 1 > sb->alloc) > + die("you try to make a string overflow the buffer of a fixed strbuf"); > + > + /* > + * ALLOC_GROW may do a realloc() if needed, so we must not use it on > + * a buffer the strbuf doesn't own > + */ > + if (sb->flags & STRBUF_OWNS_MEMORY) { > + if (new_buf) > + sb->buf = NULL; > + ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); > + } else { > + /* > + * The strbuf doesn't own the buffer: to avoid to realloc it, > + * the strbuf needs to use a new buffer without freeing the old > + */ > + if (sb->len + extra + 1 > sb->alloc) { > + size_t new_alloc = MAX(sb->len + extra + 1, alloc_nr(sb->alloc)); > + char *buf = xmalloc(new_alloc); > + memcpy(buf, sb->buf, sb->alloc); I think you want to memcpy only sb->len + 1 bytes. Here, you're memcpy-ing the allocated-but-not-initialized part of the array. xmemdupz can probably simplify the code too (either you include the '\0' in what memcpy copies, or you let xmemdupz add it). > +/** > + * Allow the caller to give a pre-allocated piece of memory for the strbuf > + * to use. It is possible then to strbuf_grow() the string past the size of the > + * pre-allocated buffer: a new buffer will be allocated. The pre-allocated To make it clearer: "a new buffer will then be allocated"? > +/** > + * Allow the caller to give a pre-allocated piece of memory for the strbuf > + * to use and indicate that the strbuf must use exclusively this buffer, > + * never realloc() it or allocate a new one. It means that the string can > + * be manipulated but cannot overflow the pre-allocated buffer. The > + * pre-allocated buffer will never be freed. > + */ Perhaps say explicitly that although the allocated buffer has a fixed size, the string itself can grow as long as it does not overflow the buffer? > @@ -91,6 +116,8 @@ extern void strbuf_release(struct strbuf *); > * Detach the string from the strbuf and returns it; you now own the > * storage the string occupies and it is your responsibility from then on > * to release it with `free(3)` when you are done with it. > + * Must allocate a copy of the buffer in case of a preallocated/fixed buffer. > + * Performance-critical operations have to be aware of this. Better than just warn about performance, you can give the alternative. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 12:52 ` Matthieu Moy @ 2016-05-30 14:15 ` William Duclot 2016-05-30 14:34 ` Matthieu Moy 2016-05-31 4:05 ` Michael Haggerty 1 sibling, 1 reply; 38+ messages in thread From: William Duclot @ 2016-05-30 14:15 UTC (permalink / raw) To: Matthieu Moy; +Cc: git, simon rabourg, francois beutin, antoine queru, mhagger Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes: > William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes: > >> @@ -20,16 +28,37 @@ char strbuf_slopbuf[1]; >> >> void strbuf_init(struct strbuf *sb, size_t hint) >> { >> + sb->flags = 0; >> sb->alloc = sb->len = 0; >> sb->buf = strbuf_slopbuf; >> if (hint) >> strbuf_grow(sb, hint); >> } > > If you set flags = 0 here, existing callers will have all flags off, > including OWNS_MEMORY. > > I *think* this is OK, as sb->buf is currently pointing to > strbuf_slopbuf, which the the strbuf doesn't own. But that is too subtle > to go without an explanatory comment IMHO. Right > > Also, doesn't this make the "new_buf" case useless in strbuf_grow? > > With your patch, the code looks like: > > void strbuf_grow(struct strbuf *sb, size_t extra) > { > int new_buf = !sb->alloc; > ... > if (sb->flags & STRBUF_OWNS_MEMORY) { > if (new_buf) // <---------------------------------------- (1) > sb->buf = NULL; > ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); > } else { > /* > * The strbuf doesn't own the buffer: to avoid to realloc it, > * the strbuf needs to use a new buffer without freeing the old > */ > if (sb->len + extra + 1 > sb->alloc) { > size_t new_alloc = MAX(sb->len + extra + 1, alloc_nr(sb->alloc)); > char *buf = xmalloc(new_alloc); > memcpy(buf, sb->buf, sb->alloc); > sb->buf = buf; > sb->alloc = new_alloc; > sb->flags |= STRBUF_OWNS_MEMORY; > } > } > > if (new_buf) // <---------------------------------------- (2) > sb->buf[0] = '\0'; > } > > I think (1) is now dead code, since sb->alloc == 0 implies that > STRBUF_OWNS_MEMORY is set. (2) seems redundant since you've just > memcpy-ed the existing '\0' into the buffer. You're right for (1), I hadn't noticed that. For (2), we'll still have to set sb->buf[new_alloc-1]='\0' after the memcpy, if we have sb->alloc==0 then the memcpy won't copy it. >> +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf, >> + size_t path_buf_len, size_t alloc_len) >> +{ >> + if (!path_buf) >> + die("you try to use a NULL buffer to initialize a strbuf"); >> + >> + strbuf_init(sb, 0); >> + strbuf_attach(sb, path_buf, path_buf_len, alloc_len); >> + sb->flags &= ~STRBUF_OWNS_MEMORY; >> + sb->flags &= ~STRBUF_FIXED_MEMORY; >> +} > > strbuf_wrap_preallocated seem very close to strbuf_attach. I'd rather > see a symmetric code sharing like > > void strbuf_attach_internal(struct strbuf *sb, ..., unsigned int flags) > > and then strbuf_attach() and strbuf_wrap_preallocated() become > straightforward wrappers around it. > > This would avoid setting and then unsetting STRBUF_OWNS_MEMORY (the > performance impact is probably small, but it looks weird to set the flag > and then unset it right away). We'll refactor the code with Johannes' remarks and yours in mind > After your patch, there are differences between > strbuf_wrap_preallocated() which I think are inconsistencies: > > * strbuf_attach() does not check for NULL buffer, but it doesn't accept > them either if I read correctly. It would make sense to add the check > to strbuf_attach(), but it's weird to have the performance-critical > oriented function do the expensive stuff that the > non-performance-critical one doesn't. I agree that strbuf_attach should do the check (it seems strange that it doesn't already do it, as the "buffer never NULL" invariant is not new). I don't understand your "but" part, what "expensive stuff" are you talking about? > * strbuf_attach() calls strbuf_release(), which allows reusing an > existing strbuf. strbuf_wrap_preallocated() calls strbuf_init which > would override silently any previous content. I think strbuf_attach() > does the right thing here. > > In any case, you probably want to include calls to strbuf_attach() and > strbuf_wrap_*() functions on existing non-empty strbufs. > >> +void strbuf_wrap_fixed(struct strbuf *sb, char *path_buf, >> + size_t path_buf_len, size_t alloc_len) >> +{ >> + strbuf_wrap_preallocated(sb, path_buf, path_buf_len, alloc_len); >> + sb->flags |= STRBUF_FIXED_MEMORY; >> +} > > And this could become a 3rd caller of strbuf_attach_internal(). True. We'll take care of that when refactoring >> @@ -61,9 +96,32 @@ void strbuf_grow(struct strbuf *sb, size_t extra) >> if (unsigned_add_overflows(extra, 1) || >> unsigned_add_overflows(sb->len, extra + 1)) >> die("you want to use way too much memory"); >> - if (new_buf) >> - sb->buf = NULL; >> - ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); >> + if ((sb->flags & STRBUF_FIXED_MEMORY) && sb->len + extra + 1 > sb->alloc) >> + die("you try to make a string overflow the buffer of a fixed strbuf"); >> + >> + /* >> + * ALLOC_GROW may do a realloc() if needed, so we must not use it on >> + * a buffer the strbuf doesn't own >> + */ >> + if (sb->flags & STRBUF_OWNS_MEMORY) { >> + if (new_buf) >> + sb->buf = NULL; >> + ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); >> + } else { >> + /* >> + * The strbuf doesn't own the buffer: to avoid to realloc it, >> + * the strbuf needs to use a new buffer without freeing the old >> + */ >> + if (sb->len + extra + 1 > sb->alloc) { >> + size_t new_alloc = MAX(sb->len + extra + 1, alloc_nr(sb->alloc)); >> + char *buf = xmalloc(new_alloc); >> + memcpy(buf, sb->buf, sb->alloc); > > I think you want to memcpy only sb->len + 1 bytes. Here, you're > memcpy-ing the allocated-but-not-initialized part of the array. > > xmemdupz can probably simplify the code too (either you include the '\0' > in what memcpy copies, or you let xmemdupz add it). memcpying sb->len+1 bytes is indeed enough. xmemdupz can only allocate the same size it will copy. Here we want to allocate `new_alloc` bytes but only copy `sb->alloc` bytes (or sb->len+1 bytes, as you noticed) >> +/** >> + * Allow the caller to give a pre-allocated piece of memory for the strbuf >> + * to use. It is possible then to strbuf_grow() the string past the size >> of the >> + * pre-allocated buffer: a new buffer will be allocated. The pre-allocated > > To make it clearer: "a new buffer will then be allocated"? OK >> +/** >> + * Allow the caller to give a pre-allocated piece of memory for the strbuf >> + * to use and indicate that the strbuf must use exclusively this buffer, >> + * never realloc() it or allocate a new one. It means that the string can >> + * be manipulated but cannot overflow the pre-allocated buffer. The >> + * pre-allocated buffer will never be freed. >> + */ > > Perhaps say explicitly that although the allocated buffer has a fixed > size, the string itself can grow as long as it does not overflow the > buffer? That's what I meant by "the string can be manipulated but cannot overflow the pre-allocated buffer". I'll try to reformulate >> @@ -91,6 +116,8 @@ extern void strbuf_release(struct strbuf *); >> * Detach the string from the strbuf and returns it; you now own the >> * storage the string occupies and it is your responsibility from then on >> * to release it with `free(3)` when you are done with it. >> + * Must allocate a copy of the buffer in case of a preallocated/fixed >> buffer. >> + * Performance-critical operations have to be aware of this. > > Better than just warn about performance, you can give the alternative. I'm not sure what you mean, I don't think there really is an alternative for detaching a string? ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 14:15 ` William Duclot @ 2016-05-30 14:34 ` Matthieu Moy 2016-05-30 15:16 ` William Duclot 0 siblings, 1 reply; 38+ messages in thread From: Matthieu Moy @ 2016-05-30 14:34 UTC (permalink / raw) To: William Duclot Cc: git, simon rabourg, francois beutin, antoine queru, mhagger William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes: > Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes: > >> void strbuf_grow(struct strbuf *sb, size_t extra) >> { >> int new_buf = !sb->alloc; >> ... >> if (sb->flags & STRBUF_OWNS_MEMORY) { >> if (new_buf) // <---------------------------------------- (1) >> sb->buf = NULL; >> ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); >> } else { >> /* >> * The strbuf doesn't own the buffer: to avoid to realloc it, >> * the strbuf needs to use a new buffer without freeing the old >> */ >> if (sb->len + extra + 1 > sb->alloc) { >> size_t new_alloc = MAX(sb->len + extra + 1, alloc_nr(sb->alloc)); >> char *buf = xmalloc(new_alloc); >> memcpy(buf, sb->buf, sb->alloc); >> sb->buf = buf; >> sb->alloc = new_alloc; >> sb->flags |= STRBUF_OWNS_MEMORY; >> } >> } >> >> if (new_buf) // <---------------------------------------- (2) >> sb->buf[0] = '\0'; >> } >> >> I think (1) is now dead code, since sb->alloc == 0 implies that >> STRBUF_OWNS_MEMORY is set. (2) seems redundant since you've just >> memcpy-ed the existing '\0' into the buffer. > > You're right for (1), I hadn't noticed that. > For (2), we'll still have to set sb->buf[new_alloc-1]='\0' after the memcpy, if we > have sb->alloc==0 then the memcpy won't copy it. That sounds like one more reason to memcpy len + 1 bytes, and you'll get the '\0' copied. >> After your patch, there are differences between >> strbuf_wrap_preallocated() which I think are inconsistencies: >> >> * strbuf_attach() does not check for NULL buffer, but it doesn't accept >> them either if I read correctly. It would make sense to add the check >> to strbuf_attach(), but it's weird to have the performance-critical >> oriented function do the expensive stuff that the >> non-performance-critical one doesn't. > > I agree that strbuf_attach should do the check (it seems strange that it > doesn't already do it, as the "buffer never NULL" invariant is not new). > I don't understand your "but" part, what "expensive stuff" are you talking > about? "expensive stuff" was an exageration for "== NULL" test. It's not that expensive, but costs a tiny bit of CPU time. > xmemdupz can only allocate the same size it will copy. Indeed, so forget about it. >>> +/** >>> + * Allow the caller to give a pre-allocated piece of memory for the strbuf >>> + * to use and indicate that the strbuf must use exclusively this buffer, >>> + * never realloc() it or allocate a new one. It means that the string can >>> + * be manipulated but cannot overflow the pre-allocated buffer. The >>> + * pre-allocated buffer will never be freed. >>> + */ >> >> Perhaps say explicitly that although the allocated buffer has a fixed >> size, the string itself can grow as long as it does not overflow the >> buffer? > > That's what I meant by "the string can be manipulated but cannot overflow > the pre-allocated buffer". I'll try to reformulate Maybe "the string can grow, but cannot overflow"? >>> @@ -91,6 +116,8 @@ extern void strbuf_release(struct strbuf *); >>> * Detach the string from the strbuf and returns it; you now own the >>> * storage the string occupies and it is your responsibility from then on >>> * to release it with `free(3)` when you are done with it. >>> + * Must allocate a copy of the buffer in case of a preallocated/fixed >>> buffer. >>> + * Performance-critical operations have to be aware of this. >> >> Better than just warn about performance, you can give the alternative. > > I'm not sure what you mean, I don't think there really is an alternative for > detaching a string? So, is the comment above saying "You're doomed, there's no way you can get good performance anyway"? The alternative is just that you don't have to call strbuf_release since the caller can access the .buf field and is already the one responsible for freeing it when needed, and it's safe to just call strbuf_init() if one needs to re-initialize the stbuf structure. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 14:34 ` Matthieu Moy @ 2016-05-30 15:16 ` William Duclot 0 siblings, 0 replies; 38+ messages in thread From: William Duclot @ 2016-05-30 15:16 UTC (permalink / raw) To: Matthieu Moy; +Cc: git, simon rabourg, francois beutin, antoine queru, mhagger Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes: > William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes: > >> Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes: >>>> +/** >>>> + * Allow the caller to give a pre-allocated piece of memory for the >>>> strbuf >>>> + * to use and indicate that the strbuf must use exclusively this buffer, >>>> + * never realloc() it or allocate a new one. It means that the string >>>> can >>>> + * be manipulated but cannot overflow the pre-allocated buffer. The >>>> + * pre-allocated buffer will never be freed. >>>> + */ >>> >>> Perhaps say explicitly that although the allocated buffer has a fixed >>> size, the string itself can grow as long as it does not overflow the >>> buffer? >> >> That's what I meant by "the string can be manipulated but cannot overflow >> the pre-allocated buffer". I'll try to reformulate > > Maybe "the string can grow, but cannot overflow"? Seems OK >>>> @@ -91,6 +116,8 @@ extern void strbuf_release(struct strbuf *); >>>> * Detach the string from the strbuf and returns it; you now own the >>>> * storage the string occupies and it is your responsibility from then >>>> on >>>> * to release it with `free(3)` when you are done with it. >>>> + * Must allocate a copy of the buffer in case of a preallocated/fixed >>>> buffer. >>>> + * Performance-critical operations have to be aware of this. >>> >>> Better than just warn about performance, you can give the alternative. >> >> I'm not sure what you mean, I don't think there really is an alternative >> for >> detaching a string? > > So, is the comment above saying "You're doomed, there's no way you can > get good performance anyway"? > > The alternative is just that you don't have to call strbuf_release since > the caller can access the .buf field and is already the one responsible > for freeing it when needed, and it's safe to just call strbuf_init() if > one needs to re-initialize the stbuf structure. Actually, if the user _needs_ to detach(), then yes he's doomed. Or more realistically, he have to refactor so he'll don't need detach() (by being sure the strbuf is pre-allocated by himself for example). The strbuf_release() function is not the problem here, it does nothing problematic for performances. The problem is strbuf_wrap_preallocated(), but you're right I can add a comment so the user know when he don't have to call it. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 12:52 ` Matthieu Moy 2016-05-30 14:15 ` William Duclot @ 2016-05-31 4:05 ` Michael Haggerty 2016-05-31 15:59 ` William Duclot 2016-06-03 14:04 ` William Duclot 1 sibling, 2 replies; 38+ messages in thread From: Michael Haggerty @ 2016-05-31 4:05 UTC (permalink / raw) To: Matthieu Moy, William Duclot Cc: git, simon.rabourg, francois.beutin, antoine.queru On 05/30/2016 02:52 PM, Matthieu Moy wrote: > [...] I feel bad bikeshedding about names, especially since you took some of the original names from my RFC. But names are very important, so I think it's worth considering whether the current names could be improved upon. When reading this patch series, I found I had trouble remembering whether "preallocated" meant "preallocated and movable" or "preallocated and immovable". So maybe we should brainstorm alternatives to "preallocated" and "fixed". For example, * "growable"/"fixed"? Seems OK, though all strbufs are growable at least to the size of their initial allocation, so maybe "growable" is misleading. * "movable"/"fixed"? This maybe better captures the essence of the distinction. I'll use those names below for concreteness, without claiming that they are the best. > * strbuf_attach() calls strbuf_release(), which allows reusing an > existing strbuf. strbuf_wrap_preallocated() calls strbuf_init which > would override silently any previous content. I think strbuf_attach() > does the right thing here. Hmmm.... I think the best way to answer these questions is to think about use cases and make them as easy/consistent as possible. I expect that a very common use of strbuf_wrap_fixed() will be to wrap a stack-allocated string, like char pathbuf[PATH_MAX]; struct strbuf path; strbuf_wrap_fixed(&path, pathbuf, 0, sizeof(pathbuf)); In this use case, it would be a shame if `path` had to be initialized to STRBUF_INIT just because `strbuf_wrap_fixed()` calls `strbuf_release()` internally. But maybe we could make this use case easier still. If there were a macro #define STRBUF_FIXED_WRAPPER(sb, buf, len) { STRBUF_FIXED_MEMORY, sizeof(buf), (len), (buf) } then we could write char pathbuf[PATH_MAX]; struct strbuf path = STRBUF_FIXED_WRAPPER(pathbuf, 0); I think that would be pretty usable. One would have to be careful only to wrap arrays and not `char *` pointers, because `sizeof` wouldn't work on the latter. The BARF_UNLESS_AN_ARRAY macro could be used here to add a little safety. (It would be even nicer if we could write struct strbuf path = STRBUF_FIXED(PATH_MAX); and it would initialize both path and a pathbuf variable for it to wrap, but I don't think there is a way to implement such a macro. So I think the only way to squeeze this onto one line would be to make it look like DEFINE_FIXED_STRBUF(path, PATH_MAX); But that looks awful, so I think the two-line version above is preferable.) Similarly, there could be a macro #define STRBUF_MOVABLE_WRAPPER(sb, buf, len) { 0, sizeof(buf), (len), (buf) } If you provide macro forms like these for initializing strbufs, then I agree with Matthieu that the analogous functional forms should probably call strbuf_release() before wrapping the array. The functions might be named more like `strbuf_attach()` to emphasize their similarity to that existing function. Maybe strbuf_attach_fixed(struct strbuf *sb, void *s, size_t len, size_t alloc); strbuf_attach_movable(struct strbuf *sb, void *s, size_t len, size_t alloc); Michael ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-31 4:05 ` Michael Haggerty @ 2016-05-31 15:59 ` William Duclot 2016-06-03 14:04 ` William Duclot 1 sibling, 0 replies; 38+ messages in thread From: William Duclot @ 2016-05-31 15:59 UTC (permalink / raw) To: Michael Haggerty Cc: Matthieu Moy, git, simon.rabourg, francois.beutin, antoine.queru On Tue, May 31, 2016 at 06:05:45AM +0200, Michael Haggerty wrote: > On 05/30/2016 02:52 PM, Matthieu Moy wrote: > > [...] > > I feel bad bikeshedding about names, especially since you took some of > the original names from my RFC. But names are very important, so I think > it's worth considering whether the current names could be improved upon. > > When reading this patch series, I found I had trouble remembering > whether "preallocated" meant "preallocated and movable" or "preallocated > and immovable". So maybe we should brainstorm alternatives to > "preallocated" and "fixed". For example, > > * "growable"/"fixed"? Seems OK, though all strbufs are growable at least > to the size of their initial allocation, so maybe "growable" is misleading. > > * "movable"/"fixed"? This maybe better captures the essence of the > distinction. I'll use those names below for concreteness, without > claiming that they are the best. Yes, the names are debatable > > * strbuf_attach() calls strbuf_release(), which allows reusing an > > existing strbuf. strbuf_wrap_preallocated() calls strbuf_init which > > would override silently any previous content. I think strbuf_attach() > > does the right thing here. > > Hmmm.... > > I think the best way to answer these questions is to think about use > cases and make them as easy/consistent as possible. > > I expect that a very common use of strbuf_wrap_fixed() will be to wrap a > stack-allocated string, like > > char pathbuf[PATH_MAX]; > struct strbuf path; > > strbuf_wrap_fixed(&path, pathbuf, 0, sizeof(pathbuf)); > > In this use case, it would be a shame if `path` had to be initialized to > STRBUF_INIT just because `strbuf_wrap_fixed()` calls `strbuf_release()` > internally. > > But maybe we could make this use case easier still. If there were a macro > > #define STRBUF_FIXED_WRAPPER(sb, buf, len) { STRBUF_FIXED_MEMORY, > sizeof(buf), (len), (buf) } > > then we could write > > char pathbuf[PATH_MAX]; > struct strbuf path = STRBUF_FIXED_WRAPPER(pathbuf, 0); > > I think that would be pretty usable. One would have to be careful only > to wrap arrays and not `char *` pointers, because `sizeof` wouldn't work > on the latter. The BARF_UNLESS_AN_ARRAY macro could be used here to add > a little safety. That sounds like a good idea to me > [...] > If you provide macro forms like these for initializing strbufs, then I > agree with Matthieu that the analogous functional forms should probably > call strbuf_release() before wrapping the array. The functions might be > named more like `strbuf_attach()` to emphasize their similarity to that > existing function. Maybe > > strbuf_attach_fixed(struct strbuf *sb, void *s, size_t len, size_t > alloc); > strbuf_attach_movable(struct strbuf *sb, void *s, size_t len, size_t > alloc); I'm not a big fan of the idea that the macro and the function don't have the same behavior. Using "attach" instead of "wrap" or "init" may justify that ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-31 4:05 ` Michael Haggerty 2016-05-31 15:59 ` William Duclot @ 2016-06-03 14:04 ` William Duclot 1 sibling, 0 replies; 38+ messages in thread From: William Duclot @ 2016-06-03 14:04 UTC (permalink / raw) To: Michael Haggerty Cc: Matthieu Moy, git, simon.rabourg, francois.beutin, antoine.queru, Johannes.Schindelin On Tue, May 31, 2016 at 06:05:45AM +0200, Michael Haggerty wrote: > When reading this patch series, I found I had trouble remembering > whether "preallocated" meant "preallocated and movable" or "preallocated > and immovable". So maybe we should brainstorm alternatives to > "preallocated" and "fixed". For example, > > * "growable"/"fixed"? Seems OK, though all strbufs are growable at least > to the size of their initial allocation, so maybe "growable" is misleading. > > * "movable"/"fixed"? This maybe better captures the essence of the > distinction. I'll use those names below for concreteness, without > claiming that they are the best. > > [...] > > The functions might be > named more like `strbuf_attach()` to emphasize their similarity to that > existing function. Maybe > > strbuf_attach_fixed(struct strbuf *sb, void *s, size_t len, size_t > alloc); > strbuf_attach_movable(struct strbuf *sb, void *s, size_t len, size_t > alloc); Now that I am looking in detail into it, I am not so convinced by those names. Using "attach" suggests the same behavior as strbuf_attach(), which is _not_ the case to me: - The aim of the attach() function is to give ownership of a buffer allocated by the caller to the strbuf. - The aim of the wrap() functions is to give the right to use a buffer allocated by the caller to the strbuf, while keeping ownership. - For completion: the aim of the init() function is to let the strbuf manage its own buffer. I think that it is important to distinct those 3 use cases for the API user to be able to understand. And to describe this API extension, "wrap" seems clear to me. Another point that would makes me skeptical about using `strbuf_attach_preallocated()` is that the real difference with `strbuf_attach()` isn't in the allocation of the buffer parameter, it is in the ownership. Both takes a buffer allocated by the user as parameter (so a preallocated buffer), even thought `strbuf_attach_preallocated()` may use a stack-allocated one. So I come to the conclusion that even using the word "preallocated" may not be such a good idea, as even strbuf_attach() use a preallocated buffer. What I think would be the clearer would be: - strbuf_attach() (unchanged) - strbuf_wrap() (no need for the "preallocated") - strbuf_wrap_fixed() - STRBUF_WRAP - STRBUF_WRAP_FIXED The two last ones would be macros, equivalent to the functions except that they don't release the strbuf before initializing it. What do you think about this? ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 10:36 ` [PATCH 2/2] strbuf: allow to use preallocated memory William Duclot 2016-05-30 12:13 ` Johannes Schindelin 2016-05-30 12:52 ` Matthieu Moy @ 2016-05-30 21:56 ` Mike Hommey 2016-05-30 22:46 ` William Duclot 2016-05-31 6:34 ` Junio C Hamano 3 siblings, 1 reply; 38+ messages in thread From: Mike Hommey @ 2016-05-30 21:56 UTC (permalink / raw) To: William Duclot Cc: git, simon.rabourg, francois.beutin, antoine.queru, matthieu.moy, mhagger > struct strbuf { > + unsigned int flags; > size_t alloc; > size_t len; > char *buf; > }; Depending whether the size of strbuf matters, it /might/ be worth considering some packing here. malloc() usually returns buffers that can contain more data than what is requested. Which means allocation sizes could be rounded and that wouldn't change the amount of allocated memory. On glibc malloc_usable_size(malloc(1)) apparently returns 24. On jemalloc, it's 4 or 8. It's in the same ballbark with many allocators. So, it would be possible to round alloc such that it's always a multiple of, say, 4, and stick flags in the low, unused bits. Whether it's worth doing is another question. Mike ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 21:56 ` Mike Hommey @ 2016-05-30 22:46 ` William Duclot 2016-05-30 22:50 ` Mike Hommey 0 siblings, 1 reply; 38+ messages in thread From: William Duclot @ 2016-05-30 22:46 UTC (permalink / raw) To: Mike Hommey Cc: git, simon rabourg, francois beutin, antoine queru, matthieu moy, mhagger Mike Hommey <mh@glandium.org> writes: >> struct strbuf { >> + unsigned int flags; >> size_t alloc; >> size_t len; >> char *buf; >> }; > > Depending whether the size of strbuf matters, it /might/ be worth > considering some packing here. malloc() usually returns buffers that can > contain more data than what is requested. Which means allocation sizes > could be rounded and that wouldn't change the amount of allocated > memory. On glibc malloc_usable_size(malloc(1)) apparently returns 24. > On jemalloc, it's 4 or 8. It's in the same ballbark with many > allocators. > > So, it would be possible to round alloc such that it's always a multiple > of, say, 4, and stick flags in the low, unused bits. If I'm not mistaken, the memory allocated is not necessarily linear with the size asked, depending on the algorithm used by the allocator and/or the kernel. The system for exemple use powers of two, if the user asks for exactly 2^x bytes, adding the space for the flags would lead to an allocation of 2^(x+1) bytes. Way worse than storing an unsigned. If the allocator use a fibonnaci system, we can't even rely on multiples of 4 (or 2). I'm not sure the fibonnaci system is actually used by any allocator, but my point is that I'm not sure it is a good thing to rely on such low-level implementations. > Whether it's worth doing is another question. I'd say it is not, we generally lack time more than space, storing an unsigned seems very reasonable compared to operations involved in using malloc() leftovers :) Not even talking about growing buffers, so realloc(), so a whole new set of problems... ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 22:46 ` William Duclot @ 2016-05-30 22:50 ` Mike Hommey 0 siblings, 0 replies; 38+ messages in thread From: Mike Hommey @ 2016-05-30 22:50 UTC (permalink / raw) To: William Duclot Cc: git, simon rabourg, francois beutin, antoine queru, matthieu moy, mhagger On Tue, May 31, 2016 at 12:46:23AM +0200, William Duclot wrote: > Mike Hommey <mh@glandium.org> writes: > >> struct strbuf { > >> + unsigned int flags; > >> size_t alloc; > >> size_t len; > >> char *buf; > >> }; > > > > Depending whether the size of strbuf matters, it /might/ be worth > > considering some packing here. malloc() usually returns buffers that can > > contain more data than what is requested. Which means allocation sizes > > could be rounded and that wouldn't change the amount of allocated > > memory. On glibc malloc_usable_size(malloc(1)) apparently returns 24. > > On jemalloc, it's 4 or 8. It's in the same ballbark with many > > allocators. > > > > So, it would be possible to round alloc such that it's always a multiple > > of, say, 4, and stick flags in the low, unused bits. > > If I'm not mistaken, the memory allocated is not necessarily linear with > the size asked, depending on the algorithm used by the allocator and/or > the kernel. The system for exemple use powers of two, if the user asks > for exactly 2^x bytes, adding the space for the flags would lead to an > allocation of 2^(x+1) bytes. No, it would not. If you requested 129 bytes, you'd request 136 instead, which the allocator would round to the same power of two. If you requested 128, you'd still request 128. It's not about adding space in the allocated buffer for the flags, it's about needing less bits in `alloc` because those bits are effectively useless because of how allocators work. > Way worse than storing an unsigned. > If the allocator use a fibonnaci system, we can't even rely on multiples > of 4 (or 2). > I'm not sure the fibonnaci system is actually used by any allocator, but > my point is that I'm not sure it is a good thing to rely on such > low-level implementations. Allocators have constraints related to word sizes and alignment, so they are pretty much guaranteed to align things to powers of two. Mike ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-30 10:36 ` [PATCH 2/2] strbuf: allow to use preallocated memory William Duclot ` (2 preceding siblings ...) 2016-05-30 21:56 ` Mike Hommey @ 2016-05-31 6:34 ` Junio C Hamano 2016-05-31 15:45 ` William 3 siblings, 1 reply; 38+ messages in thread From: Junio C Hamano @ 2016-05-31 6:34 UTC (permalink / raw) To: William Duclot Cc: git, simon.rabourg, francois.beutin, antoine.queru, matthieu.moy, mhagger William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes: > The API contract is still respected: > > - The API users may peek strbuf.buf in-place until they perform an > operation that makes it longer (at which point the .buf pointer > may point at a new piece of memory). I think the contract is actually a bit stronger; the API reserves the right to free and reallocate a smaller chunk of memory if you make the string shorter, so peeked value of .buf will not be relied upon even if you didn't make it longer. > - The API users may strbuf_detach() to obtain a piece of memory that > belongs to them (at which point the strbuf becomes empty), hence > needs to be freed by the callers. Shouldn't you be honuoring another API contract? - If you allow an instance of strbuf go out of scope without taking the ownership of the string by calling strbuf_detach(), you must release the resource by calling strbuf_release(). As long as your "on stack strbuf" allows lengthening the string beyond the initial allocation (i.e. .alloc, not .len), the user of the API (i.e. the one that placed the strbuf on its stack) would not know when the implementation (i.e. the code in this patch) decides to switch to allocated memory, so it must call strbuf_release() before it leaves. Which in turn means that your implementation of strbuf_release() must be prepared to be take a strbuf that still has its string on the stack. On the other hand, if your "on stack strbuf" does not allow lengthening, I'd find such a "feature" pretty much useless. The caller must always test the remaining capacity, and switch to a dynamic strbuf, which is something the caller would expect the API implementation to handle silently. You obviously do not have to release the resource in such a case, but that is being convenient in the wrong part of the API. It would be wonderful if I can do: void func (void) { extern void use(char *[2]); /* * strbuf that uses 1024-byte on-stack buffer * initially, but it may be extended dynamically. */ struct strbuf buf = STRBUF_INIT_ON_STACK(1024); char *x[2]; strbuf_add(&buf, ...); /* add a lot of stuff */ x[0] = strbuf_detach(&buf, NULL); strbuf_add(&buf, ...); /* do some stuff */ x[1] = strbuf_detach(&buf, NULL); use(x); strbuf_release(&buf); } and add more than 2kb with the first add (hence causing buf to switch to dynamic scheme), the first _detach() gives the malloc()ed piece of memory to x[0] _and_ points buf.buf back to the on-stack buffer (and buf.alloc back to 1024) while setting buf.len to 0, so that the second _add() can still work purely on stack as long as it does not go beyond the 1024-byte on-stack buffer. > +/** > + * Flags > + * -------------- > + */ > +#define STRBUF_OWNS_MEMORY 1 > +#define STRBUF_FIXED_MEMORY (1 << 1) This is somewhat a strange way to spell two flag bits. Either spell them as 1 and 2 (perhaps in octal or hexadecimal), or spell them as 1 shifted by 0 and 1 to the left. Don't mix the notation. > @@ -20,16 +28,37 @@ char strbuf_slopbuf[1]; > > void strbuf_init(struct strbuf *sb, size_t hint) > { > + sb->flags = 0; > sb->alloc = sb->len = 0; > sb->buf = strbuf_slopbuf; > if (hint) > strbuf_grow(sb, hint); > } > > +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf, > + size_t path_buf_len, size_t alloc_len) > +{ > + if (!path_buf) > + die("you try to use a NULL buffer to initialize a strbuf"); What does "path" mean in the context of this function (and its "fixed" sibling)? ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-31 6:34 ` Junio C Hamano @ 2016-05-31 15:45 ` William 2016-05-31 15:54 ` Matthieu Moy 0 siblings, 1 reply; 38+ messages in thread From: William @ 2016-05-31 15:45 UTC (permalink / raw) To: Junio C Hamano Cc: git, simon.rabourg, francois.beutin, antoine.queru, matthieu.moy, mhagger On Mon, May 30, 2016 at 11:34:42PM -0700, Junio C Hamano wrote: > William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes: > >> The API contract is still respected: >> >> - The API users may peek strbuf.buf in-place until they perform an >> operation that makes it longer (at which point the .buf pointer >> may point at a new piece of memory). > > I think the contract is actually a bit stronger; the API reserves > the right to free and reallocate a smaller chunk of memory if you > make the string shorter, so peeked value of .buf will not be relied > upon even if you didn't make it longer. Right, anytime the string size change >> - The API users may strbuf_detach() to obtain a piece of memory that >> belongs to them (at which point the strbuf becomes empty), hence >> needs to be freed by the callers. > > Shouldn't you be honuoring another API contract? > > - If you allow an instance of strbuf go out of scope without taking > the ownership of the string by calling strbuf_detach(), you must > release the resource by calling strbuf_release(). > > As long as your "on stack strbuf" allows lengthening the string > beyond the initial allocation (i.e. .alloc, not .len), the user of > the API (i.e. the one that placed the strbuf on its stack) would not > know when the implementation (i.e. the code in this patch) decides > to switch to allocated memory, so it must call strbuf_release() > before it leaves. Which in turn means that your implementation of > strbuf_release() must be prepared to be take a strbuf that still has > its string on the stack. Well, my implementation does handle a strbuf that still has its string on the stack: the buffer won't be freed in this case (only a reset to STRBUF_INIT). Unless I misunderstood you? > On the other hand, if your "on stack strbuf" does not allow > lengthening, I'd find such a "feature" pretty much useless. The > caller must always test the remaining capacity, and switch to a > dynamic strbuf, which is something the caller would expect the API > implementation to handle silently. You obviously do not have to > release the resource in such a case, but that is being convenient > in the wrong part of the API. > > It would be wonderful if I can do: > > void func (void) > { > extern void use(char *[2]); > /* > * strbuf that uses 1024-byte on-stack buffer > * initially, but it may be extended dynamically. > */ > struct strbuf buf = STRBUF_INIT_ON_STACK(1024); > char *x[2]; > > strbuf_add(&buf, ...); /* add a lot of stuff */ > x[0] = strbuf_detach(&buf, NULL); > strbuf_add(&buf, ...); /* do some stuff */ > x[1] = strbuf_detach(&buf, NULL); > use(x); > > strbuf_release(&buf); > } > > and add more than 2kb with the first add (hence causing buf to > switch to dynamic scheme), the first _detach() gives the malloc()ed > piece of memory to x[0] _and_ points buf.buf back to the on-stack > buffer (and buf.alloc back to 1024) while setting buf.len to 0, > so that the second _add() can still work purely on stack as long as > it does not go beyond the 1024-byte on-stack buffer. I think that it's possible, but extends the API beyond what was originally intended by Michael. I don't see other use cases than treating several string sequentially, is it worth avoiding a attach_movable() (as suggested by Michael in place of wrap_preallocated())? You could do: void func (void) { extern void use(char *[2]); /* * strbuf that uses 1024-byte on-stack buffer * initially, but it may be extended dynamically. */ char on_stack[1024]; struct strbuf buf; char x[2]; strbuf_attach_movable(&sb, on_stack, 0, sizeof(on_stack)); strbuf_add(&buf, ...); /* add a lot of stuff */ x[0] = strbuf_detach(&buf, NULL); strbuf_attach_movable(&sb, on_stack, 0, sizeof(on_stack)); strbuf_add(&buf, ...); /* do some stuff */ x[1] = strbuf_detach(&buf, NULL); use(x); strbuf_release(&buf); } Which seems pretty close without needing the strbuf API to remember the original buffer or to even care about memory being on the stack or the heap. >> +/** >> + * Flags >> + * -------------- >> + */ >> +#define STRBUF_OWNS_MEMORY 1 >> +#define STRBUF_FIXED_MEMORY (1 << 1) > > This is somewhat a strange way to spell two flag bits. Either spell > them as 1 and 2 (perhaps in octal or hexadecimal), or spell them as > 1 shifted by 0 and 1 to the left. Don't mix the notation. Noted >> @@ -20,16 +28,37 @@ char strbuf_slopbuf[1]; >> >> void strbuf_init(struct strbuf *sb, size_t hint) >> { >> + sb->flags = 0; >> sb->alloc = sb->len = 0; >> sb->buf = strbuf_slopbuf; >> if (hint) >> strbuf_grow(sb, hint); >> } >> >> +void strbuf_wrap_preallocated(struct strbuf *sb, char *path_buf, >> + size_t path_buf_len, size_t alloc_len) >> +{ >> + if (!path_buf) >> + die("you try to use a NULL buffer to initialize a strbuf"); > > What does "path" mean in the context of this function (and its > "fixed" sibling)? That should be someting like `str` and `str_len` indeed ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-31 15:45 ` William @ 2016-05-31 15:54 ` Matthieu Moy 2016-05-31 16:08 ` William Duclot 0 siblings, 1 reply; 38+ messages in thread From: Matthieu Moy @ 2016-05-31 15:54 UTC (permalink / raw) To: William Cc: Junio C Hamano, git, simon.rabourg, francois.beutin, antoine.queru, mhagger William <william.duclot@ensimag.grenoble-inp.fr> writes: > On Mon, May 30, 2016 at 11:34:42PM -0700, Junio C Hamano wrote: > >> As long as your "on stack strbuf" allows lengthening the string >> beyond the initial allocation (i.e. .alloc, not .len), the user of >> the API (i.e. the one that placed the strbuf on its stack) would not >> know when the implementation (i.e. the code in this patch) decides >> to switch to allocated memory, so it must call strbuf_release() >> before it leaves. Which in turn means that your implementation of >> strbuf_release() must be prepared to be take a strbuf that still has >> its string on the stack. > > Well, my implementation does handle a strbuf that still has its > string on the stack: the buffer won't be freed in this case (only a > reset to STRBUF_INIT). > Unless I misunderstood you? I think Junio meant: void f() { struct strbuf sb; char buf[N]; strbuf_wrap_preallocated(&sb, buf, ...); strbuf_add(&sb, ...); // is it safe to just let sb reach the end of its scope? } To answer the last question, the user would need to know too much about the allocation policy, so in this case the user should call strbuf_release(), and let it chose whether to call "free()" (OWNS_MEMORY) or not. This is OK with your implementation, but the doc needs to reflect this. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/2] strbuf: allow to use preallocated memory 2016-05-31 15:54 ` Matthieu Moy @ 2016-05-31 16:08 ` William Duclot 0 siblings, 0 replies; 38+ messages in thread From: William Duclot @ 2016-05-31 16:08 UTC (permalink / raw) To: Matthieu Moy Cc: Junio C Hamano, git, simon.rabourg, francois.beutin, antoine.queru, mhagger On Tue, May 31, 2016 at 05:54:59PM +0200, Matthieu Moy wrote: > William <william.duclot@ensimag.grenoble-inp.fr> writes: > > > On Mon, May 30, 2016 at 11:34:42PM -0700, Junio C Hamano wrote: > > > >> As long as your "on stack strbuf" allows lengthening the string > >> beyond the initial allocation (i.e. .alloc, not .len), the user of > >> the API (i.e. the one that placed the strbuf on its stack) would not > >> know when the implementation (i.e. the code in this patch) decides > >> to switch to allocated memory, so it must call strbuf_release() > >> before it leaves. Which in turn means that your implementation of > >> strbuf_release() must be prepared to be take a strbuf that still has > >> its string on the stack. > > > > Well, my implementation does handle a strbuf that still has its > > string on the stack: the buffer won't be freed in this case (only a > > reset to STRBUF_INIT). > > Unless I misunderstood you? > > I think Junio meant: > > void f() > { > struct strbuf sb; > char buf[N]; > strbuf_wrap_preallocated(&sb, buf, ...); > strbuf_add(&sb, ...); > > // is it safe to just let sb reach the end of its scope? > } > > To answer the last question, the user would need to know too much about > the allocation policy, so in this case the user should call > strbuf_release(), and let it chose whether to call "free()" > (OWNS_MEMORY) or not. This is OK with your implementation, but the doc > needs to reflect this. Okay, I understand. The idea was that the only change in the API is the initialization: the API user can then use the strbuf like they could before this patch, with the same responsabilities (having to release the strbuf when they don't need it anymore). Maybe a clearer documentation about the life and death of a strbuf (initialization, use and release) would be useful, yes ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 0/2] strbuf: improve API 2016-05-30 10:36 [PATCH 0/2] strbuf: improve API William Duclot 2016-05-30 10:36 ` [PATCH 1/2] strbuf: add tests William Duclot 2016-05-30 10:36 ` [PATCH 2/2] strbuf: allow to use preallocated memory William Duclot @ 2016-05-30 11:32 ` Remi Galan Alfonso 2016-06-01 7:42 ` Jeff King 2 siblings, 1 reply; 38+ messages in thread From: Remi Galan Alfonso @ 2016-05-30 11:32 UTC (permalink / raw) To: William Duclot Cc: git, simon rabourg, francois beutin, antoine queru, matthieu moy, mhagger William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes: > This patch series implements an improvment of the strbuf API, allowing > strbuf to use preallocated memory. This makes strbuf fit to be used > in performance-critical operations. > > The first patch is simply a preparatory work, adding tests for > existing strbuf implementation. > Most of the work is made in the second patch: handle pre-allocated > memory, extend the API, document it and test it. Seems interesting, however do you have any test/example that would show the difference of performance between using these optimizations and not using them? Such values would make a nice addition to help convince people that your series is interesting to have and use. Thanks, Rémi ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 0/2] strbuf: improve API 2016-05-30 11:32 ` [PATCH 0/2] strbuf: improve API Remi Galan Alfonso @ 2016-06-01 7:42 ` Jeff King 2016-06-01 19:50 ` David Turner 2016-06-01 21:07 ` Jeff King 0 siblings, 2 replies; 38+ messages in thread From: Jeff King @ 2016-06-01 7:42 UTC (permalink / raw) To: Remi Galan Alfonso Cc: William Duclot, git, simon rabourg, francois beutin, antoine queru, matthieu moy, mhagger On Mon, May 30, 2016 at 01:32:08PM +0200, Remi Galan Alfonso wrote: > William Duclot <william.duclot@ensimag.grenoble-inp.fr> writes: > > This patch series implements an improvment of the strbuf API, allowing > > strbuf to use preallocated memory. This makes strbuf fit to be used > > in performance-critical operations. > > > > The first patch is simply a preparatory work, adding tests for > > existing strbuf implementation. > > Most of the work is made in the second patch: handle pre-allocated > > memory, extend the API, document it and test it. > > Seems interesting, however do you have any test/example that would > show the difference of performance between using these optimizations > and not using them? > > Such values would make a nice addition to help convince people that > your series is interesting to have and use. I'll second the request for actual numbers. I'm a little dubious that malloc overhead is actually a significant place we are spending time, or if there is simply a superstitious avoidance of using strbufs. A huge number of strbufs are used for filenames, where we're about to make a syscall anyway. If your allocator for a 4K page is not competitive with a context switch, I suspect the best solution is to get a new allocator. So I wonder if we have some less-invasive alternatives: 1. Ship a faster allocator library with git, and use its malloc by default. 2. Do caching tricks for strbufs used in tight loops. For example, have strbuf_release() throw its buffer into a last-used cache, and let the next strbuf_grow() use that cache entry. This cuts malloc() out of the loop. You'd probably want to protect the cache with a mutex, though. Most of git isn't thread-safe, but a few parts are, and strbufs are low-level enough that they might get called. I have no idea if those ideas would work. But I wouldn't want to start looking into either of them without some idea of how much time we're actually spending on strbuf mallocs (or how much time we would spend if strbufs were used in some proposed sites). -Peff ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 0/2] strbuf: improve API 2016-06-01 7:42 ` Jeff King @ 2016-06-01 19:50 ` David Turner 2016-06-01 20:09 ` Jeff King 2016-06-01 21:07 ` Jeff King 1 sibling, 1 reply; 38+ messages in thread From: David Turner @ 2016-06-01 19:50 UTC (permalink / raw) To: Jeff King, Remi Galan Alfonso Cc: William Duclot, git, simon rabourg, francois beutin, antoine queru, matthieu moy, mhagger On Wed, 2016-06-01 at 03:42 -0400, Jeff King wrote: > 2. Do caching tricks for strbufs used in tight loops. For example, > have strbuf_release() throw its buffer into a last-used cache, > and > let the next strbuf_grow() use that cache entry. This cuts > malloc() > out of the loop. > > You'd probably want to protect the cache with a mutex, though. ... or make the last-used cache be thread-local. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 0/2] strbuf: improve API 2016-06-01 19:50 ` David Turner @ 2016-06-01 20:09 ` Jeff King 2016-06-01 20:22 ` David Turner 0 siblings, 1 reply; 38+ messages in thread From: Jeff King @ 2016-06-01 20:09 UTC (permalink / raw) To: David Turner Cc: Remi Galan Alfonso, William Duclot, git, simon rabourg, francois beutin, antoine queru, matthieu moy, mhagger On Wed, Jun 01, 2016 at 03:50:29PM -0400, David Turner wrote: > On Wed, 2016-06-01 at 03:42 -0400, Jeff King wrote: > > 2. Do caching tricks for strbufs used in tight loops. For example, > > have strbuf_release() throw its buffer into a last-used cache, > > and > > let the next strbuf_grow() use that cache entry. This cuts > > malloc() > > out of the loop. > > > > You'd probably want to protect the cache with a mutex, though. > > > ... or make the last-used cache be thread-local. Good idea. I almost didn't mention threading at all, because I'd be surprised if malloc lock contention is a serious bottleneck for us anywhere. It's hard to really say much concrete because it's not clear to me where people think strbuf allocation _is_ a bottleneck (or again, _would be_ if we were to use it more). If we imagine a loop like this: for (i = 0; i < n; i++) { struct strbuf buf = STRBUF_INIT; do_something_with(&buf); strbuf_release(&buf); } then yes, we'll call malloc() and free() a lot of times. But unless your libc malloc is terrible, those calls are all probably just taking a mutex and reusing the same block from a free list. The strbuf case is actually really friendly to allocators because our blocks tend to be predictable sizes (they're sized by ALLOC_GROW, which has a deterministic set of block sizes). In practice, I'd imagine loops get more complicated than that. They call sub-functions which allocate a strbuf, and sometimes don't release it until other strbufs have been allocated, etc. The proposal in this thread is basically using the call stack to mirror the allocation patterns. So when the pattern of allocations matches an actual stack, that's good, but I'd expect a reasonably smart allocator to perform similarly. And when it's not a true stack, the stack-based strbufs are going to end up with wasted stack space hanging around even after we've free()'d the memory and could reuse it. And I'd still expect an allocator based off a linked list or something to handle such cases pretty well. There are also other non-big-O factors at play in modern systems, like CPU cache footprints. Is heap memory cheaper or more expensive to access than stack? I can imagine stack is kept in the cache better, but if you're reusing the same heap block over and over, it probably stays in cache, too. But if you have unused stack buffers that _could_ be reused (but won't be because you'd have to manually feed them to a new strbuf), that hurts your footprint. And on top of that, the stack proposals I've seen generally involve over-sizing the stack buffers out of a fear of actually calling malloc. And then on top of that, there is the question of whether anything involving a strbuf allocation is actually a tight enough loop to even _care_ about cache dynamics. So again, I'm slightly negative on anything that makes the code even slightly more complex, especially the call sites, if we can't show actual measurable improvement numbers. Sorry, this turned into a mini-rant, and it's not really directed at you, David. Your email was just what spurred me to put some of these thoughts into words. :) -Peff ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 0/2] strbuf: improve API 2016-06-01 20:09 ` Jeff King @ 2016-06-01 20:22 ` David Turner 0 siblings, 0 replies; 38+ messages in thread From: David Turner @ 2016-06-01 20:22 UTC (permalink / raw) To: Jeff King Cc: Remi Galan Alfonso, William Duclot, git, simon rabourg, francois beutin, antoine queru, matthieu moy, mhagger On Wed, 2016-06-01 at 16:09 -0400, Jeff King wrote: > On Wed, Jun 01, 2016 at 03:50:29PM -0400, David Turner wrote: > > > On Wed, 2016-06-01 at 03:42 -0400, Jeff King wrote: > > > 2. Do caching tricks for strbufs used in tight loops. For > > > example, > > > have strbuf_release() throw its buffer into a last-used > > > cache, > > > and > > > let the next strbuf_grow() use that cache entry. This cuts > > > malloc() > > > out of the loop. > > > > > > You'd probably want to protect the cache with a mutex, > > > though. > > > > > > ... or make the last-used cache be thread-local. > > Good idea. > > I almost didn't mention threading at all, because I'd be surprised if > malloc lock contention is a serious bottleneck for us anywhere. > > It's hard to really say much concrete because it's not clear to me > where > people think strbuf allocation _is_ a bottleneck (or again, _would > be_ > if we were to use it more). > > If we imagine a loop like this: > > for (i = 0; i < n; i++) { > struct strbuf buf = STRBUF_INIT; > do_something_with(&buf); > strbuf_release(&buf); > } > > then yes, we'll call malloc() and free() a lot of times. But unless > your > libc malloc is terrible, those calls are all probably just taking a > mutex and reusing the same block from a free list. The strbuf case is > actually really friendly to allocators because our blocks tend to be > predictable sizes (they're sized by ALLOC_GROW, which has a > deterministic set of block sizes). > > In practice, I'd imagine loops get more complicated than that. They > call > sub-functions which allocate a strbuf, and sometimes don't release it > until other strbufs have been allocated, etc. The proposal in this > thread is basically using the call stack to mirror the allocation > patterns. So when the pattern of allocations matches an actual stack, > that's good, but I'd expect a reasonably smart allocator to perform > similarly. And when it's not a true stack, the stack-based strbufs > are > going to end up with wasted stack space hanging around even after > we've > free()'d the memory and could reuse it. And I'd still expect an > allocator based off a linked list or something to handle such cases > pretty well. > > There are also other non-big-O factors at play in modern systems, > like > CPU cache footprints. Is heap memory cheaper or more expensive to > access > than stack? I can imagine stack is kept in the cache better, but if > you're reusing the same heap block over and over, it probably stays > in > cache, too. But if you have unused stack buffers that _could_ be > reused > (but won't be because you'd have to manually feed them to a new > strbuf), > that hurts your footprint. And on top of that, the stack proposals > I've > seen generally involve over-sizing the stack buffers out of a fear of > actually calling malloc. > > And then on top of that, there is the question of whether anything > involving a strbuf allocation is actually a tight enough loop to even > _care_ about cache dynamics. > > So again, I'm slightly negative on anything that makes the code even > slightly more complex, especially the call sites, if we can't show > actual measurable improvement numbers. > > Sorry, this turned into a mini-rant, and it's not really directed at > you, David. Your email was just what spurred me to put some of these > thoughts into words. :) I agree with all this stuff anyway. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 0/2] strbuf: improve API 2016-06-01 7:42 ` Jeff King 2016-06-01 19:50 ` David Turner @ 2016-06-01 21:07 ` Jeff King 2016-06-02 11:11 ` Michael Haggerty 1 sibling, 1 reply; 38+ messages in thread From: Jeff King @ 2016-06-01 21:07 UTC (permalink / raw) To: Remi Galan Alfonso Cc: William Duclot, git, simon rabourg, francois beutin, antoine queru, matthieu moy, mhagger On Wed, Jun 01, 2016 at 03:42:18AM -0400, Jeff King wrote: > I have no idea if those ideas would work. But I wouldn't want to start > looking into either of them without some idea of how much time we're > actually spending on strbuf mallocs (or how much time we would spend if > strbufs were used in some proposed sites). So I tried to come up with some numbers. Here's an utterly silly use of strbufs, but one that I think should over-emphasize the effect of any improvements we make: diff --git a/Makefile b/Makefile index 7a0551a..72b968a 100644 --- a/Makefile +++ b/Makefile @@ -579,6 +579,7 @@ PROGRAM_OBJS += shell.o PROGRAM_OBJS += show-index.o PROGRAM_OBJS += upload-pack.o PROGRAM_OBJS += remote-testsvn.o +PROGRAM_OBJS += foo.o # Binary suffix, set to .exe for Windows builds X = diff --git a/foo.c b/foo.c index e69de29..b62dd97 100644 --- a/foo.c +++ b/foo.c @@ -0,0 +1,18 @@ +#include "git-compat-util.h" +#include "strbuf.h" + +int main(void) +{ + const char *str = "this is a string that we'll repeatedly insert"; + size_t len = strlen(str); + + int i; + for (i = 0; i < 1000000; i++) { + struct strbuf buf = STRBUF_INIT; + int j; + for (j = 0; j < 500; j++) + strbuf_add(&buf, str, len); + strbuf_release(&buf); + } + return 0; +} That takes about 3.5 seconds to run git-foo on my machine. Here's where perf says the time goes: # Children Self Command Shared Object Symbol # ........ ........ ....... ................. .......................... # 35.62% 34.58% git-foo git-foo [.] strbuf_add 22.70% 22.14% git-foo git-foo [.] strbuf_grow 19.85% 19.04% git-foo libc-2.22.so [.] __memcpy_avx_unaligned 9.47% 9.17% git-foo git-foo [.] main 4.88% 4.68% git-foo git-foo [.] memcpy@plt 3.21% 3.12% git-foo libc-2.22.so [.] realloc 1.75% 1.71% git-foo libc-2.22.so [.] _int_realloc 1.42% 0.00% git-foo [unknown] [.] 0x676e697274732061 0.82% 0.79% git-foo git-foo [.] xrealloc 0.61% 0.59% git-foo git-foo [.] memory_limit_check 0.45% 0.00% git-foo [unknown] [.] 0000000000000000 0.32% 0.00% git-foo [unknown] [.] 0x0000000000000fff 0.32% 0.32% git-foo [unknown] [k] 0x00007f591b44a2f0 0.31% 0.31% git-foo [unknown] [k] 0x0000000000404719 0.30% 0.00% git-foo [unknown] [.] 0x0000000000001ffe 0.30% 0.30% git-foo libc-2.22.so [.] _int_free 0.30% 0.28% git-foo libc-2.22.so [.] _int_malloc So malloc and free are pretty low. It looks like realloc is more, probably because of the memcpys it has to do. I don't think that would be much better in a stack-based system (because we'd realloc there, too). What would help is simply using a larger initial size (for _this_ made-up benchmark; I'm not convinced it would be all that good in the real world). But either way, most of the time is actually spent in the strbuf functions themselves (interesting, if you replace strbuf_add with strbuf_addf, we spend much more time dealing with the formatted input). We can probably micro-optimize them some. Here's a short and hacky patch to inline strbuf_grow, and to use compiler intrinsics to do the overflow checks (which obviously isn't portable, but we can easily hide it behind a macro and fall back to the existing scheme): diff --git a/strbuf.c b/strbuf.c index 1ba600b..4f163cb 100644 --- a/strbuf.c +++ b/strbuf.c @@ -55,19 +55,25 @@ void strbuf_attach(struct strbuf *sb, void *buf, size_t len, size_t alloc) sb->buf[sb->len] = '\0'; } -void strbuf_grow(struct strbuf *sb, size_t extra) +static inline void strbuf_grow_inline(struct strbuf *sb, size_t extra) { int new_buf = !sb->alloc; - if (unsigned_add_overflows(extra, 1) || - unsigned_add_overflows(sb->len, extra + 1)) + if (__builtin_add_overflow(extra, sb->len, &extra) || + __builtin_add_overflow(extra, 1, &extra)) die("you want to use way too much memory"); if (new_buf) sb->buf = NULL; - ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); + ALLOC_GROW(sb->buf, extra, sb->alloc); if (new_buf) sb->buf[0] = '\0'; } +void strbuf_grow(struct strbuf *sb, size_t extra) +{ + strbuf_grow_inline(sb, extra); +} +#define strbuf_grow strbuf_grow_inline + void strbuf_trim(struct strbuf *sb) { strbuf_rtrim(sb); That drops my best-of-five for git-foo from: real 0m3.377s user 0m3.376s sys 0m0.000s to: real 0m3.107s user 0m3.104s sys 0m0.000s So that's at least measurable. Again, though, I'm not convinced it actually matters much in the real world. But from what I see here, I'd be surprised if the stack-buffer thing actually generates measurable improvements here _or_ in the real world. -Peff ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 0/2] strbuf: improve API 2016-06-01 21:07 ` Jeff King @ 2016-06-02 11:11 ` Michael Haggerty 2016-06-02 12:58 ` Matthieu Moy 2016-06-24 17:20 ` Jeff King 0 siblings, 2 replies; 38+ messages in thread From: Michael Haggerty @ 2016-06-02 11:11 UTC (permalink / raw) To: Jeff King, Remi Galan Alfonso Cc: William Duclot, git, simon rabourg, francois beutin, antoine queru, matthieu moy On 06/01/2016 11:07 PM, Jeff King wrote: > On Wed, Jun 01, 2016 at 03:42:18AM -0400, Jeff King wrote: > >> I have no idea if those ideas would work. But I wouldn't want to start >> looking into either of them without some idea of how much time we're >> actually spending on strbuf mallocs (or how much time we would spend if >> strbufs were used in some proposed sites). > > So I tried to come up with some numbers. > > Here's an utterly silly use of strbufs, but one that I think should > over-emphasize the effect of any improvements we make: > > diff --git a/Makefile b/Makefile > index 7a0551a..72b968a 100644 > --- a/Makefile > +++ b/Makefile > @@ -579,6 +579,7 @@ PROGRAM_OBJS += shell.o > PROGRAM_OBJS += show-index.o > PROGRAM_OBJS += upload-pack.o > PROGRAM_OBJS += remote-testsvn.o > +PROGRAM_OBJS += foo.o > > # Binary suffix, set to .exe for Windows builds > X = > diff --git a/foo.c b/foo.c > index e69de29..b62dd97 100644 > --- a/foo.c > +++ b/foo.c > @@ -0,0 +1,18 @@ > +#include "git-compat-util.h" > +#include "strbuf.h" > + > +int main(void) > +{ > + const char *str = "this is a string that we'll repeatedly insert"; > + size_t len = strlen(str); > + > + int i; > + for (i = 0; i < 1000000; i++) { > + struct strbuf buf = STRBUF_INIT; > + int j; > + for (j = 0; j < 500; j++) > + strbuf_add(&buf, str, len); > + strbuf_release(&buf); > + } > + return 0; > +} Thanks for generating actual data. Your benchmark could do two things to stress malloc()/free() even more. I'm not claiming that this makes the benchmark more typical of real-world use, but it maybe gets us closer to the theoretical upper limit on improvement. 1. Since strbuf_grow() allocates new space in geometrically increasing sizes, the number of mallocs needed to do N strbuf_add()s increases only like log(N). So decreasing the "500" would increase the fraction of the time spent on allocations. 2. Since the size of the string being appended increases the time spent copying bytes, without appreciably changing the number of allocations (also because of geometric growth), decreasing the length of the string would also increase the fraction of the time spent on allocations. I put those together as options, programmed several variants of the string-concatenating loop, and added a perf script to run them; you can see the full patch here: https://github.com/mhagger/git/commit/b417935a4425e0f2bf62e59a924dc652bb2eae0c The guts look like this: > int j; > if (variant == 0) { > /* Use buffer allocated a single time */ > char *buf = big_constant_lifetime_buf; > > for (j = 0; j < reps; j++) > strcpy(buf + j * len, str); > } else if (variant == 1) { > /* One correct-sized buffer malloc per iteration */ > char *buf = xmalloc(reps * len + 1); > > for (j = 0; j < reps; j++) > strcpy(buf + j * len, str); > > free(buf); > } else if (variant == 2) { > /* Conventional use of strbuf */ > struct strbuf buf = STRBUF_INIT; > > for (j = 0; j < reps; j++) > strbuf_add(&buf, str, len); > > strbuf_release(&buf); > } else if (variant == 3) { > /* strbuf initialized to correct size */ > struct strbuf buf; > strbuf_init(&buf, reps * len); > > for (j = 0; j < reps; j++) > strbuf_add(&buf, str, len); > > strbuf_release(&buf); > } else if (variant == 4) { > /* > * Simulated fixed strbuf with correct size. > * This code only works because we know how > * strbuf works internally, namely that it > * will never realloc() or free() the buffer > * that we attach to it. > */ > struct strbuf buf = STRBUF_INIT; > strbuf_attach(&buf, big_constant_lifetime_buf, 0, reps * len + 1); > > for (j = 0; j < reps; j++) > strbuf_add(&buf, str, len); > > /* No strbuf_release() here! */ > } I ran this for a short string ("short") and a long string ("this is a string that we will repeatedly insert"), and also concatenating the string 5, 20, or 500 times. The number of loops around the whole program is normalized to make the total number of concatenations approximately constant. Here are the full results: time (s) Test 0 1 2 3 4 ---------------------------------------------------- 5 short strings 1.64 3.37 8.72 6.08 3.65 20 short strings 1.72 2.12 5.43 4.01 3.39 500 short strings 1.62 1.61 3.36 3.26 3.10 5 long strings 2.08 6.64 13.09 8.50 3.79 20 long strings 2.16 3.33 7.03 4.72 3.55 500 long strings 2.04 2.10 3.61 3.33 3.26 Column 0 is approximately the "bare metal" approach, with a pre-allocated buffer and no strbuf overhead. Column 1 is like column 0, except allocating a correctly-sized buffer each time through the loop. This increases the runtime by as much as 219%. Column 2 is a naive use of strbuf, where each time through the loop the strbuf is reinitialized to STRBUF_INIT, and managing the space is entirely left to strbuf. Column 3 is like column 2, except that it initializes the strbuf to the correct size right away using strbuf_init(). This reduces the runtime relative to column 2 by as much as 35%. Column 4 uses a simulated "fixed strbuf", where the fixed-size buffer is big enough for the full string (thus there are no calls to malloc()/realloc()/free()). The comparison between columns 0 and 4 shows that using a strbuf costs as much as 123% more than using a simple char array, even if the strbuf doesn't have to do any memory allocations. The comparison between columns 3 and 4 shows savings a reduction in runtime of up to 55% from using a "fixed strbuf" rather than a pre-sized conventional strbuf. I think this is the comparison that is most relevant to the current discussion. Of course strbuf manipulation (especially of small numbers of strings) is unlikely to be a big fraction of the workload of any Git command, so this is far from proof that this optimization is worthwhile in terms of code complexity. But I am still moderately in favor of the idea, for two reasons: 1. The amount of added code complexity is small and quite encapsulated. 2. The ability to use strbufs without having to allocate memory might make enough *psychological* difference that it encourages some devs to use strbufs where they would otherwise have done manual memory management. I think this would be a *big* win in terms of potential bugs and security vulnerabilities avoided. Michael ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 0/2] strbuf: improve API 2016-06-02 11:11 ` Michael Haggerty @ 2016-06-02 12:58 ` Matthieu Moy 2016-06-02 14:22 ` William Duclot 2016-06-24 17:20 ` Jeff King 1 sibling, 1 reply; 38+ messages in thread From: Matthieu Moy @ 2016-06-02 12:58 UTC (permalink / raw) To: Michael Haggerty Cc: Jeff King, Remi Galan Alfonso, William Duclot, git, simon rabourg, francois beutin, antoine queru Michael Haggerty <mhagger@alum.mit.edu> writes: > 1. The amount of added code complexity is small and quite > encapsulated. Actually, STRBUF_OWNS_MEMORY can even be seen as a simplification if done properly: we already have the case where the strbuf does not own the memory with strbuf_slopbuf. I already pointed places in strbuf_grow() which could be simplified after the patch. Re-reading the code it seems at lesat the call to strbuf_grow(sb, 0); in strbuf_detach becomes useless. The same in strbuf_attach() probably is, too. So, the final strbuf.[ch] code might not be "worse" that the previous. I'm unsure about the complexity of the future code using the new API. I don't forsee cases where using the new API would lead to a high maintenance cost, but I don't claim I considered all possible uses. > 2. The ability to use strbufs without having to allocate memory might > make enough *psychological* difference that it encourages some > devs to use strbufs where they would otherwise have done manual > memory management. I think this would be a *big* win in terms of > potential bugs and security vulnerabilities avoided. Note that this can also be seen as a counter-argument, since it may psychologically encourage people to micro-optimize code and use contributors/reviewers neurons to spend time on "shall this be on-stack or malloced?". I think we already have a tendency to micro-optimize non-critical code too much in Git's codebase, so it's not necessarily a step in the right direction. In conclusion, I don't have a conclusion, sorry ;-). -- Matthieu Moy http://www-verimag.imag.fr/~moy/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 0/2] strbuf: improve API 2016-06-02 12:58 ` Matthieu Moy @ 2016-06-02 14:22 ` William Duclot 0 siblings, 0 replies; 38+ messages in thread From: William Duclot @ 2016-06-02 14:22 UTC (permalink / raw) To: Matthieu Moy Cc: Michael Haggerty, Jeff King, Remi Galan Alfonso, git, simon rabourg, francois beutin, antoine queru On Thu, Jun 02, 2016 at 02:58:22PM +0200, Matthieu Moy wrote: > Michael Haggerty <mhagger@alum.mit.edu> writes: > >> 1. The amount of added code complexity is small and quite >> encapsulated. > > Actually, STRBUF_OWNS_MEMORY can even be seen as a simplification if > done properly: we already have the case where the strbuf does not own > the memory with strbuf_slopbuf. I already pointed places in > strbuf_grow() which could be simplified after the patch. Re-reading the > code it seems at lesat the call to strbuf_grow(sb, 0); in strbuf_detach > becomes useless. The same in strbuf_attach() probably is, too. > > So, the final strbuf.[ch] code might not be "worse" that the previous. > > I'm unsure about the complexity of the future code using the new API. I > don't forsee cases where using the new API would lead to a high > maintenance cost, but I don't claim I considered all possible uses. > >> 2. The ability to use strbufs without having to allocate memory might >> make enough *psychological* difference that it encourages some >> devs to use strbufs where they would otherwise have done manual >> memory management. I think this would be a *big* win in terms of >> potential bugs and security vulnerabilities avoided. > > Note that this can also be seen as a counter-argument, since it > may psychologically encourage people to micro-optimize code and use > contributors/reviewers neurons to spend time on "shall this be on-stack > or malloced?". > > I think we already have a tendency to micro-optimize non-critical code > too much in Git's codebase, so it's not necessarily a step in the right > direction. > > In conclusion, I don't have a conclusion, sorry ;-). Thank you all for your input and your tests, those are very valuable! Me and Simon have to take a decision, as this contribution is part of a school project that comes to an end. We won't have the time to create tests that are representative of the use of strbuf in the Git codebase (partly because we lack knowledge of the codebase) to settle on whether this API improvement is useful or not. Jeff made very good points, and the tests we ran ourselves seem to agree with yours. That being said, the tests made by Michael, more detailed, suggest that there may be room for an improvement of the strbuf API (even thought that's to confront to the reality of the codebase). Having little time, we will refactor the patch and send it as a V2: this way, if it appears one day that improving the API with stack-allocated memory is indeed useful, the patch will be ready to merge :) ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 0/2] strbuf: improve API 2016-06-02 11:11 ` Michael Haggerty 2016-06-02 12:58 ` Matthieu Moy @ 2016-06-24 17:20 ` Jeff King 1 sibling, 0 replies; 38+ messages in thread From: Jeff King @ 2016-06-24 17:20 UTC (permalink / raw) To: Michael Haggerty Cc: Remi Galan Alfonso, William Duclot, git, simon rabourg, francois beutin, antoine queru, matthieu moy On Thu, Jun 02, 2016 at 01:11:56PM +0200, Michael Haggerty wrote: > On 06/01/2016 11:07 PM, Jeff King wrote: > > On Wed, Jun 01, 2016 at 03:42:18AM -0400, Jeff King wrote: > > > >> I have no idea if those ideas would work. But I wouldn't want to start > >> looking into either of them without some idea of how much time we're > >> actually spending on strbuf mallocs (or how much time we would spend if > >> strbufs were used in some proposed sites). > > > > So I tried to come up with some numbers. > > > > Here's an utterly silly use of strbufs, but one that I think should > > over-emphasize the effect of any improvements we make: > [...] > > Thanks for generating actual data. Thanks for following up on this, and sorry for my slow response. It got put in my "to think about and respond to later" pile (never a good place to be :) ). > Your benchmark could do two things to stress malloc()/free() > even more. I'm not claiming that this makes the benchmark more typical > of real-world use, but it maybe gets us closer to the theoretical upper > limit on improvement. Yeah, you're right. I added more writes to try to actually stimulate the strbuf to grow, but if the point is to call malloc in a tight loop, it works against that (I agree that malloc-in-a-tight-loop does not necessarily represent a real workload, but I think the point of this exercise is to make sure we can get a useful speedup even under theoretical conditions). > I ran this for a short string ("short") and a long string ("this is a > string that we will repeatedly insert"), and also concatenating the > string 5, 20, or 500 times. The number of loops around the whole program > is normalized to make the total number of concatenations approximately > constant. Here are the full results: I suspect "5 short" and "5 long" are the only really interesting cases. For the cases we're talking about, the author clearly expects most input to fit into a single small-ish allocation (or else they would not bother with the fixed strbuf in the first place). So "500 long" is really just exercising memcpy. > Test 0 1 2 3 4 > ---------------------------------------------------- > 5 short strings 1.64 3.37 8.72 6.08 3.65 > 20 short strings 1.72 2.12 5.43 4.01 3.39 > 500 short strings 1.62 1.61 3.36 3.26 3.10 > 5 long strings 2.08 6.64 13.09 8.50 3.79 > 20 long strings 2.16 3.33 7.03 4.72 3.55 > 500 long strings 2.04 2.10 3.61 3.33 3.26 > > > Column 0 is approximately the "bare metal" approach, with a > pre-allocated buffer and no strbuf overhead. > > Column 1 is like column 0, except allocating a correctly-sized buffer > each time through the loop. This increases the runtime by as much as 219%. Makes sense. malloc() isn't free (no pun intended). And I think this shows why the 20/500 cases aren't all that interesting, as we really just end up measuring memcpy. > Column 2 is a naive use of strbuf, where each time through the loop the > strbuf is reinitialized to STRBUF_INIT, and managing the space is > entirely left to strbuf. I'd think this would be about the same as column 1 for our interesting cases, modulo some strbuf overhead. But it's way higher. That implies that either: 1. Strbuf overhead is high (and I think in a really tight loop like this that things like our overflow checks might start to matter; we could be doing those with intrinsics, for example). 2. We're doing multiple mallocs per strbuf, which implies our initial sizes could be a bit more aggressive (it's not like the 30-byte "short" case is that huge). > Column 3 is like column 2, except that it initializes the strbuf to the > correct size right away using strbuf_init(). This reduces the runtime > relative to column 2 by as much as 35%. That should show us the effects of multiple-allocations (my (2) above). And indeed, it counts for some of it but not all. > Column 4 uses a simulated "fixed strbuf", where the fixed-size buffer is > big enough for the full string (thus there are no calls to > malloc()/realloc()/free()). > > The comparison between columns 0 and 4 shows that using a strbuf costs > as much as 123% more than using a simple char array, even if the strbuf > doesn't have to do any memory allocations. Yeah, I can believe there is overhead in all of the "are we big enough" checks (though I'd hope that for the no-allocation cases we simply rely on snprintf to tell us "yes, you fit"). > The comparison between columns 3 and 4 shows savings a reduction in > runtime of up to 55% from using a "fixed strbuf" rather than a pre-sized > conventional strbuf. I think this is the comparison that is most > relevant to the current discussion. I'm including a patch at the end of this mail that implements a "variant 5", which essentially keeps a single-buffer cache of the last-released strbuf, and reuses it. Other than that, it is identical to 2 (no tricks, not even a size hint, in the caller). That skips the malloc/free cycle entirely for loops like this (or any where your malloc/free are matched, and you're not using multiple strbufs or interleaving detached ones). Here are the numbers from my machine with that variant added in: Test 0 1 2 3 4 5 ------------------------------------------------------------ 5 short strings 1.99 3.95 8.80 5.96 3.56 3.53 20 short strings 2.00 2.49 5.44 3.83 3.29 3.22 500 short strings 1.86 1.90 3.21 3.00 2.99 3.00 5 long strings 2.22 5.30 12.07 7.47 3.71 3.50 20 long strings 2.31 2.97 6.99 4.37 3.39 3.32 500 long strings 2.14 2.16 3.50 3.14 3.14 3.46 Unsurprisingly, it performs about as well as the fixed buffer on this test, because this is the optimal case for it. The question remains open whether it would do so in real code practice (I also ignored threading in my patch, which might add some overhead). But it also shows that glibc malloc is kind of expensive. I expected it to be able to optimize this case pretty well. I tried running your tests against tcmalloc. It's definitely better than glibc, but still not as fast as either variants 4 or 5. > Of course strbuf manipulation (especially of small numbers of strings) > is unlikely to be a big fraction of the workload of any Git command, so > this is far from proof that this optimization is worthwhile in terms of > code complexity. But I am still moderately in favor of the idea, for two > reasons: > > 1. The amount of added code complexity is small and quite > encapsulated. Actually, this is the part I am most concerned about. The code itself is encapsulated, but it adds a new decision to every caller: should I be using a stack buffer? How big? I already hate the "hint" field for the same reason. It's just one more complication to think about during writing and review. That's why I prefer something like my variant-5, even if it ends up being more code (e.g., to do thread-local storage). It means things Just Work, and the caller does not have to care about micro-optimizing. > 2. The ability to use strbufs without having to allocate memory might > make enough *psychological* difference that it encourages some > devs to use strbufs where they would otherwise have done manual > memory management. I think this would be a *big* win in terms of > potential bugs and security vulnerabilities avoided. Yeah, I agree there is definitely a psychological difference. I was hoping to provide evidence that it's merely superstition to combat that. Unfortunately, this conversation has left me less certain than ever. :) -Peff --- diff --git a/strbuf.c b/strbuf.c index 1ba600b..ec40c1a 100644 --- a/strbuf.c +++ b/strbuf.c @@ -18,6 +18,10 @@ int starts_with(const char *str, const char *prefix) */ char strbuf_slopbuf[1]; +int use_buf_cache; +static char *buf_cache; +static size_t alloc_cache; + void strbuf_init(struct strbuf *sb, size_t hint) { sb->alloc = sb->len = 0; @@ -29,7 +33,12 @@ void strbuf_init(struct strbuf *sb, size_t hint) void strbuf_release(struct strbuf *sb) { if (sb->alloc) { - free(sb->buf); + if (use_buf_cache && !buf_cache && sb->len < 4096) { + buf_cache = sb->buf; + alloc_cache = sb->alloc; + } else { + free(sb->buf); + } strbuf_init(sb, 0); } } @@ -61,8 +70,16 @@ void strbuf_grow(struct strbuf *sb, size_t extra) if (unsigned_add_overflows(extra, 1) || unsigned_add_overflows(sb->len, extra + 1)) die("you want to use way too much memory"); - if (new_buf) - sb->buf = NULL; + if (new_buf) { + if (buf_cache) { + sb->buf = buf_cache; + sb->alloc = alloc_cache; + buf_cache = NULL; + alloc_cache = 0; + } else { + sb->buf = NULL; + } + } ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); if (new_buf) sb->buf[0] = '\0'; diff --git a/t/helper/test-strbuf-perf.c b/t/helper/test-strbuf-perf.c index 13c437c..516905a 100644 --- a/t/helper/test-strbuf-perf.c +++ b/t/helper/test-strbuf-perf.c @@ -13,6 +13,8 @@ #include "git-compat-util.h" #include "strbuf.h" +extern int use_buf_cache; + int main(int argc, char *argv[]) { const char *str; @@ -55,6 +57,9 @@ int main(int argc, char *argv[]) } fprintf(stderr, "count = '%d'\n", count); + if (variant == 5) + use_buf_cache = 1; + big_constant_lifetime_buf = xmalloc(total_len + 1); for (i = 0; i < count; i++) { int j; @@ -72,7 +77,7 @@ int main(int argc, char *argv[]) strcpy(buf + j * len, str); free(buf); - } else if (variant == 2) { + } else if (variant == 2 || variant == 5) { /* Conventional use of strbuf */ struct strbuf buf = STRBUF_INIT; diff --git a/t/perf/p0003-strbuf.sh b/t/perf/p0003-strbuf.sh index 2be1908..5530197 100755 --- a/t/perf/p0003-strbuf.sh +++ b/t/perf/p0003-strbuf.sh @@ -5,7 +5,7 @@ test_description='Benchmark different uses of strbuf' test_perf_default_repo -for variant in $(seq 0 4) +for variant in $(seq 0 5) do export variant test_perf "variant $variant, 5 short strings" ' @@ -13,7 +13,7 @@ do ' done -for variant in $(seq 0 4) +for variant in $(seq 0 5) do export variant test_perf "variant $variant, 20 short strings" ' @@ -21,7 +21,7 @@ do ' done -for variant in $(seq 0 4) +for variant in $(seq 0 5) do export variant test_perf "variant $variant, 500 short strings" ' @@ -29,7 +29,7 @@ do ' done -for variant in $(seq 0 4) +for variant in $(seq 0 5) do export variant test_perf "variant $variant, 5 long strings" ' @@ -37,7 +37,7 @@ do ' done -for variant in $(seq 0 4) +for variant in $(seq 0 5) do export variant test_perf "variant $variant, 20 long strings" ' @@ -45,7 +45,7 @@ do ' done -for variant in $(seq 0 4) +for variant in $(seq 0 5) do export variant test_perf "variant $variant, 500 long strings" ' ^ permalink raw reply related [flat|nested] 38+ messages in thread
end of thread, other threads:[~2016-06-24 17:20 UTC | newest] Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-05-30 10:36 [PATCH 0/2] strbuf: improve API William Duclot 2016-05-30 10:36 ` [PATCH 1/2] strbuf: add tests William Duclot 2016-05-30 11:26 ` Johannes Schindelin 2016-05-30 13:42 ` Simon Rabourg 2016-05-30 11:56 ` Matthieu Moy 2016-05-31 2:04 ` Michael Haggerty 2016-05-31 9:48 ` Simon Rabourg 2016-05-30 10:36 ` [PATCH 2/2] strbuf: allow to use preallocated memory William Duclot 2016-05-30 12:13 ` Johannes Schindelin 2016-05-30 13:20 ` William Duclot 2016-05-31 6:21 ` Johannes Schindelin 2016-05-31 3:05 ` Michael Haggerty 2016-05-31 6:41 ` Johannes Schindelin 2016-05-31 8:25 ` Michael Haggerty 2016-05-30 12:52 ` Matthieu Moy 2016-05-30 14:15 ` William Duclot 2016-05-30 14:34 ` Matthieu Moy 2016-05-30 15:16 ` William Duclot 2016-05-31 4:05 ` Michael Haggerty 2016-05-31 15:59 ` William Duclot 2016-06-03 14:04 ` William Duclot 2016-05-30 21:56 ` Mike Hommey 2016-05-30 22:46 ` William Duclot 2016-05-30 22:50 ` Mike Hommey 2016-05-31 6:34 ` Junio C Hamano 2016-05-31 15:45 ` William 2016-05-31 15:54 ` Matthieu Moy 2016-05-31 16:08 ` William Duclot 2016-05-30 11:32 ` [PATCH 0/2] strbuf: improve API Remi Galan Alfonso 2016-06-01 7:42 ` Jeff King 2016-06-01 19:50 ` David Turner 2016-06-01 20:09 ` Jeff King 2016-06-01 20:22 ` David Turner 2016-06-01 21:07 ` Jeff King 2016-06-02 11:11 ` Michael Haggerty 2016-06-02 12:58 ` Matthieu Moy 2016-06-02 14:22 ` William Duclot 2016-06-24 17:20 ` Jeff King
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.