All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] strbuf.[ch]: add STRBUF_HINT_SIZE, don't hardcode 8192
@ 2021-07-07 10:38 Ævar Arnfjörð Bjarmason
  2021-07-07 10:38 ` [PATCH 1/3] strbuf.[ch]: add STRBUF_HINT_SIZE macro = 8192 Ævar Arnfjörð Bjarmason
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-07 10:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

A rather trivial cleanup series to avoid hardcoding 8192 in N places
in strbuf.[ch], instead we add a STRBUF_HINT_SIZE macro. Then make
strbuf_fread() take a "hint" instead of "size" (like most similar
functions) so we can pass a size of "0" meaning the default hint to
it.

Ævar Arnfjörð Bjarmason (3):
  strbuf.[ch]: add STRBUF_HINT_SIZE macro = 8192
  strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE
  strbuf.[ch]: make strbuf_fread() take hint, not size

 builtin/am.c |  4 ++--
 cache-tree.c |  2 +-
 commit.c     |  2 +-
 strbuf.c     |  9 +++++----
 strbuf.h     | 14 +++++++++++++-
 5 files changed, 22 insertions(+), 9 deletions(-)

-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] strbuf.[ch]: add STRBUF_HINT_SIZE macro = 8192
  2021-07-07 10:38 [PATCH 0/3] strbuf.[ch]: add STRBUF_HINT_SIZE, don't hardcode 8192 Ævar Arnfjörð Bjarmason
@ 2021-07-07 10:38 ` Ævar Arnfjörð Bjarmason
  2021-07-07 22:33   ` Junio C Hamano
  2021-07-07 10:38 ` [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE Ævar Arnfjörð Bjarmason
  2021-07-07 10:38 ` [PATCH 3/3] strbuf.[ch]: make strbuf_fread() take hint, not size Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-07 10:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

In b449f4cfc97 (Rework strbuf API and semantics., 2007-09-06) the
first hardcoding of 8192 appeared in strbuf.[ch], then in
f1696ee398e (Strbuf API extensions and fixes., 2007-09-10) another one
was added, and in b4e04fb66e8 (strbuf: add strbuf_read_once to read
without blocking, 2015-12-15) a third.

Let's factor that out into a STRBUF_HINT_SIZE macro, and add a
strbuf_hint() helper macro for "hint ? hint : STRBUF_HINT_SIZE".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 strbuf.c |  6 +++---
 strbuf.h | 11 +++++++++++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 4df30b45494..7e9f5fdc4de 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -517,7 +517,7 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
 	size_t oldlen = sb->len;
 	size_t oldalloc = sb->alloc;
 
-	strbuf_grow(sb, hint ? hint : 8192);
+	strbuf_grow(sb, strbuf_hint(hint));
 	for (;;) {
 		ssize_t want = sb->alloc - sb->len - 1;
 		ssize_t got = read_in_full(fd, sb->buf + sb->len, want);
@@ -532,7 +532,7 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
 		sb->len += got;
 		if (got < want)
 			break;
-		strbuf_grow(sb, 8192);
+		strbuf_grow(sb, STRBUF_HINT_SIZE);
 	}
 
 	sb->buf[sb->len] = '\0';
@@ -544,7 +544,7 @@ ssize_t strbuf_read_once(struct strbuf *sb, int fd, size_t hint)
 	size_t oldalloc = sb->alloc;
 	ssize_t cnt;
 
-	strbuf_grow(sb, hint ? hint : 8192);
+	strbuf_grow(sb, strbuf_hint(hint));
 	cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
 	if (cnt > 0)
 		strbuf_setlen(sb, sb->len + cnt);
diff --git a/strbuf.h b/strbuf.h
index 223ee2094af..ca3c47966a0 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -72,6 +72,17 @@ struct strbuf {
 extern char strbuf_slopbuf[];
 #define STRBUF_INIT  { .alloc = 0, .len = 0, .buf = strbuf_slopbuf }
 
+/**
+ * Various functions take a `size_t hint` to give a hint about the
+ * file size, to avoid reallocs. This is the default hint size when
+ * `0` is given. 
+ *
+ * The strbuf_hint() convenience macro is used internally in the
+ * API. DO NOT USE any expression with side-effect for 'size'.
+ */
+#define STRBUF_HINT_SIZE 8192
+#define strbuf_hint(size) ((size) ? (size) : STRBUF_HINT_SIZE)
+
 /*
  * Predeclare this here, since cache.h includes this file before it defines the
  * struct.
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE
  2021-07-07 10:38 [PATCH 0/3] strbuf.[ch]: add STRBUF_HINT_SIZE, don't hardcode 8192 Ævar Arnfjörð Bjarmason
  2021-07-07 10:38 ` [PATCH 1/3] strbuf.[ch]: add STRBUF_HINT_SIZE macro = 8192 Ævar Arnfjörð Bjarmason
@ 2021-07-07 10:38 ` Ævar Arnfjörð Bjarmason
  2021-07-07 20:10   ` Eric Sunshine
  2021-07-07 22:37   ` Junio C Hamano
  2021-07-07 10:38 ` [PATCH 3/3] strbuf.[ch]: make strbuf_fread() take hint, not size Ævar Arnfjörð Bjarmason
  2 siblings, 2 replies; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-07 10:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Change a couple of users of strbuf_init() that pass a hint of 8192 to
pass STRBUF_HINT_SIZE instead.

Both of these hardcoded occurrences pre-date the use of the strbuf
API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
2007-09-06).

In both cases the exact choice of 8192 is rather arbitrary, e.g. for
commit buffers I think 1024 or 2048 would probably be a better
default (this commit message is getting this commit close to the
former, but I daresay it's already way above the average for git
commits).

In any case, if we ever tweak STRBUF_HINT_SIZE we'll probably do so on
the basis of some codease-wide performance tests, so replacing the
hardcoded value with STRBUF_HINT_SIZE should be safe, they're the same
now, and if we change STRBUF_HINT_SIZE in the future this is one of
the main codepaths we'll be testing.

Aside: It's unfortunate that we don't take a "hint" of "0" in
strbuf_init() to mean "default" and e.g. "-1" to mean the
strbuf_slopbuf (a in STRBUF_INIT). I considered adding that, or
splitting them up so you'd do strbuf_init(&buf) for the
strbuf_init(&buf, 0) case, or strbuf_init_alloc() for the
strbuf_init(&buf, N) case, where N > 0. But that would be a big change
across the codebase, so let's punt on that for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache-tree.c | 2 +-
 commit.c     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 45e58666afc..d69f6d1c66f 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -335,7 +335,7 @@ static int update_one(struct cache_tree *it,
 	/*
 	 * Then write out the tree object for this level.
 	 */
-	strbuf_init(&buffer, 8192);
+	strbuf_init(&buffer, STRBUF_HINT_SIZE);
 
 	i = 0;
 	while (i < entries) {
diff --git a/commit.c b/commit.c
index 8ea55a447fa..b3aab46072a 100644
--- a/commit.c
+++ b/commit.c
@@ -1521,7 +1521,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 	/* Not having i18n.commitencoding is the same as having utf-8 */
 	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
 
-	strbuf_init(&buffer, 8192); /* should avoid reallocs for the headers */
+	strbuf_init(&buffer, STRBUF_HINT_SIZE); /* should avoid reallocs for the headers */
 	strbuf_addf(&buffer, "tree %s\n", oid_to_hex(tree));
 
 	/*
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] strbuf.[ch]: make strbuf_fread() take hint, not size
  2021-07-07 10:38 [PATCH 0/3] strbuf.[ch]: add STRBUF_HINT_SIZE, don't hardcode 8192 Ævar Arnfjörð Bjarmason
  2021-07-07 10:38 ` [PATCH 1/3] strbuf.[ch]: add STRBUF_HINT_SIZE macro = 8192 Ævar Arnfjörð Bjarmason
  2021-07-07 10:38 ` [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE Ævar Arnfjörð Bjarmason
@ 2021-07-07 10:38 ` Ævar Arnfjörð Bjarmason
  2021-07-07 11:47   ` Han-Wen Nienhuys
  2 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-07 10:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Change the strbuf_fread() function to take a "size_t hint" instead of
a "size_t size", for consistency with e.g. strbuf_read(). We can then
change callers that explicitly passed in our default hint of 8192.

The strbuf_fread() has not taken a hint ever since it was added in
b449f4cfc97 (Rework strbuf API and semantics., 2007-09-06), it was
left out when strbuf_read() learned to do it in f1696ee398e (Strbuf
API extensions and fixes., 2007-09-10).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/am.c | 4 ++--
 strbuf.c     | 3 ++-
 strbuf.h     | 3 ++-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 0b2d886c81b..295b9c4080c 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -781,7 +781,7 @@ static int stgit_patch_to_mail(FILE *out, FILE *in, int keep_cr)
 	}
 
 	strbuf_reset(&sb);
-	while (strbuf_fread(&sb, 8192, in) > 0) {
+	while (strbuf_fread(&sb, 0, in) > 0) {
 		fwrite(sb.buf, 1, sb.len, out);
 		strbuf_reset(&sb);
 	}
@@ -898,7 +898,7 @@ static int hg_patch_to_mail(FILE *out, FILE *in, int keep_cr)
 	}
 
 	strbuf_reset(&sb);
-	while (strbuf_fread(&sb, 8192, in) > 0) {
+	while (strbuf_fread(&sb, 0, in) > 0) {
 		fwrite(sb.buf, 1, sb.len, out);
 		strbuf_reset(&sb);
 	}
diff --git a/strbuf.c b/strbuf.c
index 7e9f5fdc4de..af3af7622d1 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -498,10 +498,11 @@ void strbuf_add_percentencode(struct strbuf *dst, const char *src, int flags)
 	}
 }
 
-size_t strbuf_fread(struct strbuf *sb, size_t size, FILE *f)
+size_t strbuf_fread(struct strbuf *sb, size_t hint, FILE *f)
 {
 	size_t res;
 	size_t oldalloc = sb->alloc;
+	size_t size = strbuf_hint(hint);
 
 	strbuf_grow(sb, size);
 	res = fread(sb->buf + sb->len, 1, size, f);
diff --git a/strbuf.h b/strbuf.h
index ca3c47966a0..7d178e3c8de 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -442,13 +442,14 @@ void strbuf_addftime(struct strbuf *sb, const char *fmt,
 
 /**
  * Read a given size of data from a FILE* pointer to the buffer.
+ * if the size is 0 the default hint is used.
  *
  * NOTE: The buffer is rewound if the read fails. If -1 is returned,
  * `errno` must be consulted, like you would do for `read(3)`.
  * `strbuf_read()`, `strbuf_read_file()` and `strbuf_getline_*()`
  * family of functions have the same behaviour as well.
  */
-size_t strbuf_fread(struct strbuf *sb, size_t size, FILE *file);
+size_t strbuf_fread(struct strbuf *sb, size_t hint, FILE *file);
 
 /**
  * Read the contents of a given file descriptor. The third argument can be
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] strbuf.[ch]: make strbuf_fread() take hint, not size
  2021-07-07 10:38 ` [PATCH 3/3] strbuf.[ch]: make strbuf_fread() take hint, not size Ævar Arnfjörð Bjarmason
@ 2021-07-07 11:47   ` Han-Wen Nienhuys
  2021-07-07 21:27     ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Han-Wen Nienhuys @ 2021-07-07 11:47 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Jeff King

On Wed, Jul 7, 2021 at 12:39 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> Change the strbuf_fread() function to take a "size_t hint" instead of
> a "size_t size", for consistency with e.g. strbuf_read(). We can then

Making strbuf_fread and strbuf_read consistent is weird because they
do different things. strbuf_read reads from fd until it's exhausted,
while strbuf_fread reads up to the size specified. So the argument
isn't really a hint.

shouldn't you rename it to strbuf_fread_once, analogous to strbuf_read_once ?

bonus points for making the argument ordering consistent between
strbuf_fread and strbuf_read

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE
  2021-07-07 10:38 ` [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE Ævar Arnfjörð Bjarmason
@ 2021-07-07 20:10   ` Eric Sunshine
  2021-07-07 22:37   ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Sunshine @ 2021-07-07 20:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Junio C Hamano, Jeff King

On Wed, Jul 7, 2021 at 6:38 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> Change a couple of users of strbuf_init() that pass a hint of 8192 to
> pass STRBUF_HINT_SIZE instead.
>
> Both of these hardcoded occurrences pre-date the use of the strbuf
> API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
> af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
> 2007-09-06).
>
> In both cases the exact choice of 8192 is rather arbitrary, e.g. for
> commit buffers I think 1024 or 2048 would probably be a better
> default (this commit message is getting this commit close to the
> former, but I daresay it's already way above the average for git
> commits).
>
> In any case, if we ever tweak STRBUF_HINT_SIZE we'll probably do so on
> the basis of some codease-wide performance tests, so replacing the

Did you mean? s/codease/code/

> hardcoded value with STRBUF_HINT_SIZE should be safe, they're the same
> now, and if we change STRBUF_HINT_SIZE in the future this is one of
> the main codepaths we'll be testing.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] strbuf.[ch]: make strbuf_fread() take hint, not size
  2021-07-07 11:47   ` Han-Wen Nienhuys
@ 2021-07-07 21:27     ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2021-07-07 21:27 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Ævar Arnfjörð Bjarmason, git, Jeff King

Han-Wen Nienhuys <hanwen@google.com> writes:

> shouldn't you rename it to strbuf_fread_once, analogous to strbuf_read_once ?

Good point.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] strbuf.[ch]: add STRBUF_HINT_SIZE macro = 8192
  2021-07-07 10:38 ` [PATCH 1/3] strbuf.[ch]: add STRBUF_HINT_SIZE macro = 8192 Ævar Arnfjörð Bjarmason
@ 2021-07-07 22:33   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2021-07-07 22:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> In b449f4cfc97 (Rework strbuf API and semantics., 2007-09-06) the
> first hardcoding of 8192 appeared in strbuf.[ch], then in
> f1696ee398e (Strbuf API extensions and fixes., 2007-09-10) another one
> was added, and in b4e04fb66e8 (strbuf: add strbuf_read_once to read
> without blocking, 2015-12-15) a third.
>
> Let's factor that out into a STRBUF_HINT_SIZE macro, and add a
> strbuf_hint() helper macro for "hint ? hint : STRBUF_HINT_SIZE".

I guess hiding 8k that appears twice in strbuf.c behind a macro or
symbolic constant might makes sense, but 

 - STRBUF_HINT_SIZE is a misnomer.  It is the default when the
   caller didn't give you a useful hint.  It is STRBUF_NO_HINT_SIZE,
   because it is the size that is used when there is no hint, or
   better yet, call it STRBUF_DEFAULT_READ_SIZE, perhaps?

 - strbuf_hint() is a misnomer for the same reason.
   strbuf_read_size(hint) perhaps, but for two or three callers,
   spelling out (hint ? hint : STRBUF_DEFAULT_READ_SIZE) is easier
   to understand.

 - Both STRBUF_HINT_SIZE and strbuf_hint() have no reason to be
   visible outside the implementation of strbuf.c; the whole reason
   why such a "the caller does not care, so let's use this logic to
   come up with the default value" exists is so that caller does not
   have to know and instead pass "0", which has nothing to do with
   the actual value used.

 - Also, there is no reason why strbuf_read() and strbuf_read_once()
   must share the same default when the caller did not give any
   hint; replacing these occurrences of 8k with a single constant
   may give a false impression that they must match.  This is merely
   an observation and not an objection, because the caller by
   passing 0 is saying they do not care at all, so while we can use
   arbitrary and different random values in these two functions, we
   can use the same arbitrary value as well.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE
  2021-07-07 10:38 ` [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE Ævar Arnfjörð Bjarmason
  2021-07-07 20:10   ` Eric Sunshine
@ 2021-07-07 22:37   ` Junio C Hamano
  2021-07-07 23:09     ` Junio C Hamano
                       ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Junio C Hamano @ 2021-07-07 22:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change a couple of users of strbuf_init() that pass a hint of 8192 to
> pass STRBUF_HINT_SIZE instead.
>
> Both of these hardcoded occurrences pre-date the use of the strbuf
> API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
> af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
> 2007-09-06).
>
> In both cases the exact choice of 8192 is rather arbitrary, e.g. for
> commit buffers I think 1024 or 2048 would probably be a better
> default (this commit message is getting this commit close to the
> former, but I daresay it's already way above the average for git
> commits).

Yes, they are arbitrary within the context of these callers.

I do not think using STRBUF_HINT_SIZE macro in them is the right
thing to do at all, as there is no reason to think that the best
value for the write chunk sizes in these codepath has any linkage to
the best value for the read chunk sizes used by strbuf_read() at
all.  When benchmarking reveals that the best default size for
strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how
do you tell that it also happens to be the best write buffer size
for the cache-tree writeout codepath (answer: you don't)?




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE
  2021-07-07 22:37   ` Junio C Hamano
@ 2021-07-07 23:09     ` Junio C Hamano
  2021-07-07 23:22     ` Randall S. Becker
  2021-07-12 17:58     ` Jeff King
  2 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2021-07-07 23:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Change a couple of users of strbuf_init() that pass a hint of 8192 to
>> pass STRBUF_HINT_SIZE instead.
>>
>> Both of these hardcoded occurrences pre-date the use of the strbuf
>> API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
>> af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
>> 2007-09-06).
>>
>> In both cases the exact choice of 8192 is rather arbitrary, e.g. for
>> commit buffers I think 1024 or 2048 would probably be a better
>> default (this commit message is getting this commit close to the
>> former, but I daresay it's already way above the average for git
>> commits).
>
> Yes, they are arbitrary within the context of these callers.
>
> I do not think using STRBUF_HINT_SIZE macro in them is the right
> thing to do at all, as there is no reason to think that the best
> value for the write chunk sizes in these codepath has any linkage to
> the best value for the read chunk sizes used by strbuf_read() at
> all.  When benchmarking reveals that the best default size for
> strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how
> do you tell that it also happens to be the best write buffer size
> for the cache-tree writeout codepath (answer: you don't)?

Having said all that, I wouldn't be so opposed to an approach that

 - declares that we need only one "default I/O buffer size";

 - declares that the appropriate size for it is 8192;

 - #define DEFAULT_IO_SIZE 8192;

 - does something like your [PATCH 1/3], but not limited to strbuf
   API, and

 - covers also the writeout codepath of cache-tree, etc. that uses
   hardcoded I/O buffer size.

The biggest trouble I had with the posted patches, especially the
[PATCH 2/3], was that I am quite sure that you wouldn't have used
STRBUF_HINT_SIZE in the cache-tree writeout codepath or commit-tree
codepath if they didn't use strbuf as a convenient way to get an
elastic buffer.  The more relevant commonality across codepaths that
use 8192 is that the constant is used for sizing the I/O buffer, and
I got an impression that the 3-patch series posted did an incomplete
job of touching some that happen to use strbuf.

An approach that concentrated on the "right" commonality, i.e. we
have hardcoded magic constants for I/O buffer sizing, would have
covered copy.c, diff-delta.c, http-backend.c etc. that do not use
strbuf API where they have hardcoded 8k constants.

Thanks.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE
  2021-07-07 22:37   ` Junio C Hamano
  2021-07-07 23:09     ` Junio C Hamano
@ 2021-07-07 23:22     ` Randall S. Becker
  2021-07-12 17:58     ` Jeff King
  2 siblings, 0 replies; 12+ messages in thread
From: Randall S. Becker @ 2021-07-07 23:22 UTC (permalink / raw)
  To: 'Junio C Hamano',
	'Ævar Arnfjörð Bjarmason'
  Cc: git, 'Jeff King'

On July 7, 2021 6:38 PM, Junio C Hamano wrote:
>To: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>Cc: git@vger.kernel.org; Jeff King <peff@peff.net>
>Subject: Re: [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE
>
>Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Change a couple of users of strbuf_init() that pass a hint of 8192 to
>> pass STRBUF_HINT_SIZE instead.
>>
>> Both of these hardcoded occurrences pre-date the use of the strbuf
>> API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
>> af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
>> 2007-09-06).
>>
>> In both cases the exact choice of 8192 is rather arbitrary, e.g. for
>> commit buffers I think 1024 or 2048 would probably be a better default
>> (this commit message is getting this commit close to the former, but I
>> daresay it's already way above the average for git commits).
>
>Yes, they are arbitrary within the context of these callers.
>
>I do not think using STRBUF_HINT_SIZE macro in them is the right thing to do at all, as there is no reason to think that the best value for
>the write chunk sizes in these codepath has any linkage to the best value for the read chunk sizes used by strbuf_read() at all.  When
>benchmarking reveals that the best default size for
>strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how do you tell that it also happens to be the best write buffer size for
>the cache-tree writeout codepath (answer: you don't)?

And benchmark results are going to be highly platform dependent, as we have seen with our exotic platform.
-Randall



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE
  2021-07-07 22:37   ` Junio C Hamano
  2021-07-07 23:09     ` Junio C Hamano
  2021-07-07 23:22     ` Randall S. Becker
@ 2021-07-12 17:58     ` Jeff King
  2 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2021-07-12 17:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, git

On Wed, Jul 07, 2021 at 03:37:56PM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > Change a couple of users of strbuf_init() that pass a hint of 8192 to
> > pass STRBUF_HINT_SIZE instead.
> >
> > Both of these hardcoded occurrences pre-date the use of the strbuf
> > API. See 5242bcbb638 (Use strbuf API in cache-tree.c, 2007-09-06) and
> > af6eb82262e (Use strbuf API in apply, blame, commit-tree and diff,
> > 2007-09-06).
> >
> > In both cases the exact choice of 8192 is rather arbitrary, e.g. for
> > commit buffers I think 1024 or 2048 would probably be a better
> > default (this commit message is getting this commit close to the
> > former, but I daresay it's already way above the average for git
> > commits).
> 
> Yes, they are arbitrary within the context of these callers.
> 
> I do not think using STRBUF_HINT_SIZE macro in them is the right
> thing to do at all, as there is no reason to think that the best
> value for the write chunk sizes in these codepath has any linkage to
> the best value for the read chunk sizes used by strbuf_read() at
> all.  When benchmarking reveals that the best default size for
> strbuf_read() is 16k, you'd update STRBUF_HINT_SIZE to 16k, but how
> do you tell that it also happens to be the best write buffer size
> for the cache-tree writeout codepath (answer: you don't)?

Being cc'd on this series, I feel compelled to respond with some review.
But I'm in such agreement with what you said here (and downthread, and
also in your response to patch 1) that I can only add a lame "me too". :)

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-07-12 17:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-07 10:38 [PATCH 0/3] strbuf.[ch]: add STRBUF_HINT_SIZE, don't hardcode 8192 Ævar Arnfjörð Bjarmason
2021-07-07 10:38 ` [PATCH 1/3] strbuf.[ch]: add STRBUF_HINT_SIZE macro = 8192 Ævar Arnfjörð Bjarmason
2021-07-07 22:33   ` Junio C Hamano
2021-07-07 10:38 ` [PATCH 2/3] strbuf.h API users: don't hardcode 8192, use STRBUF_HINT_SIZE Ævar Arnfjörð Bjarmason
2021-07-07 20:10   ` Eric Sunshine
2021-07-07 22:37   ` Junio C Hamano
2021-07-07 23:09     ` Junio C Hamano
2021-07-07 23:22     ` Randall S. Becker
2021-07-12 17:58     ` Jeff King
2021-07-07 10:38 ` [PATCH 3/3] strbuf.[ch]: make strbuf_fread() take hint, not size Ævar Arnfjörð Bjarmason
2021-07-07 11:47   ` Han-Wen Nienhuys
2021-07-07 21:27     ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.