From: Elijah Newren <newren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Martin Ågren" <martin.agren@gmail.com>,
	"Andrzej Hunt" <ajrhunt@google.com>, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH 06/10] dir.c: get rid of lazy initialization
Date: Mon, 4 Oct 2021 06:45:00 -0700
Message-ID: <CABPp-BFpyyJ-e8p5fbmCvyaEsfUow=RP45Nw0ckiwNEvVC4zrg@mail.gmail.com>
In-Reply-To: <patch-06.10-2b243d91696-20211004T002226Z-avarab@gmail.com>

On Sun, Oct 3, 2021 at 5:46 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> Remove the "Lazy initialization" in prep_exclude() left behind by
> aceb9429b37 (prep_exclude: remove the artificial PATH_MAX limit,
> 2014-07-14).
>
> Now that every caller who sets up a "struct dir_struct" is using the
> DIR_INIT macro we can rely on it to have done the initialization. As
> noted in an analysis of the previous control flow[1], an earlier
> passing of "dir->basebuf.buf" to strncmp() wasn't buggy, as we'd
> only reach that code on subsequent invocations of prep_exclude(),
> i.e. after this strbuf_init() had been run. But keeping track of that
> makes for hard-to-read code. Let's just rely on the initialization
> instead.

Having read through the link previously, this all makes sense to me,
but I'm not sure if this paragraph motivates the change without that
context.  Maybe another reader can comment.
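
For anyone reading without that context, here is a rough sketch of
why an initializer macro makes the lazy check unnecessary.  It is a
toy model, not the code in dir.c or strbuf.h: every toy_* and TOY_*
name below is made up for illustration, and the real structs have
more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a growable string buffer; NOT git's real strbuf. */
struct toy_strbuf {
	size_t alloc;	/* bytes allocated; 0 means "buf" is the shared empty buffer */
	size_t len;
	char *buf;	/* never NULL once the initializer macro has been used */
};

/* Shared empty string that freshly initialized buffers point at. */
static char toy_slopbuf[1];

#define TOY_STRBUF_INIT { .buf = toy_slopbuf }

/* Toy stand-in for "struct dir_struct" (hypothetical, heavily trimmed). */
struct toy_dir {
	int flags;
	struct toy_strbuf basebuf;
};

#define TOY_DIR_INIT { .basebuf = TOY_STRBUF_INIT }

/* Nonzero if "path" starts with the current base; safe on the first call. */
static int toy_base_matches(const struct toy_dir *dir, const char *path)
{
	assert(dir->basebuf.buf);	/* holds by construction, no lazy init */
	return !strncmp(dir->basebuf.buf, path, dir->basebuf.len);
}
```

With a plain "{ 0 }" initializer, basebuf.buf in this model would
start out NULL, and as far as I know handing a null pointer to
strncmp() is undefined in ISO C even for a zero length.  That is the
kind of ordering argument the reader no longer has to follow once
the initializer sets the pointer up front.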

> This does change the behavior of this code in that it won't be
> pre-growing the strbuf to a size of PATH_MAX. I think that's OK.
>
> That we were using PATH_MAX at all is just a relic from this being a
> fixed buffer from way back in f87f9497486 (git-ls-files: --exclude
> mechanism updates., 2005-07-24).
>
> Pre-allocating PATH_MAX was the opposite of an optimization in this
> case. I logged all "basebuf.buf" values when running the test suite,
> and by far the most common one (around 80%) is "", which we now won't
> allocate at all for, and just use the "strbuf_slopbuf".
>
> The second most common one was "a/", followed by other common cases of
> short relative paths. So using the default "struct strbuf" growth
> pattern is a much better allocation optimization in this case.
>
> 1. https://lore.kernel.org/git/87sfxhohsj.fsf@evledraar.gmail.com/
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  dir.c | 8 --------
>  dir.h | 4 +++-
>  2 files changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 39fce3bcba7..efc87c2e405 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1550,14 +1550,6 @@ static void prep_exclude(struct dir_struct *dir,
>         if (dir->pattern)
>                 return;
>
> -       /*
> -        * Lazy initialization. All call sites currently just
> -        * memset(dir, 0, sizeof(*dir)) before use. Changing all of
> -        * them seems lots of work for little benefit.
> -        */
> -       if (!dir->basebuf.buf)
> -               strbuf_init(&dir->basebuf, PATH_MAX);
> -
>         /* Read from the parent directories and push them down. */
>         current = stk ? stk->baselen : -1;
>         strbuf_setlen(&dir->basebuf, current < 0 ? 0 : current);
> diff --git a/dir.h b/dir.h
> index ff3b4a7f602..e3757c6099e 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -342,7 +342,9 @@ struct dir_struct {
>         unsigned visited_directories;
>  };
>
> -#define DIR_INIT { 0 }
> +#define DIR_INIT { \
> +       .basebuf = STRBUF_INIT, \
> +}
>
>  struct dirent *readdir_skip_dot_and_dotdot(DIR *dirp);
>
> --
> 2.33.0.1404.g83021034c5d

Wahoo!  Nice code cleanup.
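
To illustrate the allocation argument from the commit message, here
is a minimal sketch of a grow-on-demand buffer.  Again this is a toy
model with made-up toy_* names, and the growth formula is an
assumption chosen for the example rather than a statement about what
strbuf does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy growable buffer; NOT git's real strbuf. */
struct toy_strbuf {
	size_t alloc;	/* 0 means "buf" points at the shared empty buffer */
	size_t len;
	char *buf;
};

static char toy_slopbuf[1];

#define TOY_STRBUF_INIT { .buf = toy_slopbuf }

/* Make room for "extra" more bytes plus the trailing NUL. */
static void toy_strbuf_grow(struct toy_strbuf *sb, size_t extra)
{
	size_t need = sb->len + extra + 1;
	size_t new_alloc;
	char *p;

	if (need <= sb->alloc)
		return;
	new_alloc = (sb->alloc + 16) * 3 / 2;	/* assumed growth policy */
	if (new_alloc < need)
		new_alloc = need;
	/* The shared empty buffer was never malloc'd, so never realloc it. */
	p = realloc(sb->alloc ? sb->buf : NULL, new_alloc);
	if (!p)
		abort();
	sb->buf = p;
	sb->alloc = new_alloc;
	sb->buf[sb->len] = '\0';
}

/* Append "s"; appending "" leaves the buffer on the shared empty string. */
static void toy_strbuf_addstr(struct toy_strbuf *sb, const char *s)
{
	size_t n = strlen(s);

	if (!n)
		return;
	toy_strbuf_grow(sb, n);
	assert(sb->len + n + 1 <= sb->alloc);
	memcpy(sb->buf + sb->len, s, n + 1);	/* copies the NUL as well */
	sb->len += n;
}

/* Free heap memory, if any, and return to the initialized state. */
static void toy_strbuf_release(struct toy_strbuf *sb)
{
	if (sb->alloc)
		free(sb->buf);
	sb->alloc = sb->len = 0;
	sb->buf = toy_slopbuf;
}
```

In this model an empty base never touches the allocator at all, and
a short one like "a/" gets a couple dozen bytes, whereas the removed
strbuf_init(&dir->basebuf, PATH_MAX) call asked for PATH_MAX up
front regardless of what ended up in the buffer.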
