All of lore.kernel.org
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Fedor Biryukov <fedor.birjukov@gmail.com>,
	Philip Oakley <philipoakley@iee.email>,
	Phillip Wood <phillip.wood123@gmail.com>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options
Date: Thu, 30 Sep 2021 18:53:49 -0700	[thread overview]
Message-ID: <CABPp-BGi03JunRaMF_8SJKC00byOnq1kL3JyYhKWatz8-B4RsA@mail.gmail.com> (raw)
In-Reply-To: <87lf3etaih.fsf@evledraar.gmail.com>

On Thu, Sep 30, 2021 at 7:15 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Wed, Sep 29 2021, Elijah Newren wrote:
>
> > On Wed, Sep 29, 2021 at 11:32 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
> >> On Wed, Sep 29 2021, Elijah Newren wrote:
> >>
...
> > As per the next patch:
> >
> > int unpack_trees(..., struct unpack_trees_options *o)
> > {
> >     struct dir_struct dir = DIR_INIT;
> >     ...
> >     if (!o->preserve_ignored) {
> >         /* Setup 'dir', make o->dir point to it */
> >         ....
> >         o->dir = &dir;
> >     }
> >     ...
> >     if (o->dir)
> >         /* cleanup */
> >     ....
> > }
> >
> > The caller doesn't touch o->dir (other than initializing it to zeros);
> > unpack_trees() is wholly responsible for it.  I'd kind of like to
> > entirely remove dir from unpack_trees_options(), but I need a way of
> > passing it down through all the other functions in unpack-trees.c, and
> > leaving it in unpack_trees_options seems the easiest way to do so.  So
> > I just marked it as "for internal use only".
>
> I think I understand *how* it works, I'm puzzled by why you went for
> this whole level of indirection when you're using a struct on the stack
> in the end anyway, just ... put that in "struct unpack_trees_options"?
>
> Anyway, I see I have only myself to blame here, as you added these leak
> fixes in the v2 in response to some of my offhand comments.
>
> FWIW I then went on to do some deeper fixes not just on these leaks but
> the surrounding leaks, which will be blocked by 2/11 & 05/11 of this
> topic for a while. I suppose I only have myself to blame :)
>
> Below is a patch-on-top that I think makes this whole thing much simpler
> by doing away with the pointer entirely.
>
> I suppose this is also a partial reply to
> https://lore.kernel.org/git/CABPp-BG_qigBoirMGR-Yk9Niyxt0UmYCEqojsYxbSEarLAmraA@mail.gmail.com/;
> but I quite dislike this pattern of including a pointer like this where
> it's not needed just for the practicalities of memory management.
>
> I.e. here you use DIR_INIT. In my local patches to fix up the wider
> memory leaks in this area I've got DIR_INIT also using a STRBUF_INIT,
> and DIR_INIT will in turn be referenced by a
> UNPACK_TREES_OPTIONS_INIT. It's quite nice if you're having to
> initialize with "UNPACK_TREES_OPTIONS_INIT" have that initialization
> work all the way down the chain, and not need e.g. a manual
> strbuf_init(), dir_init() etc.

And you can keep using UNPACK_TREES_OPTIONS_INIT, because the
unpack_trees_opts->dir should be initialized to NULL.

> I removed the dir_init() in ce93a4c6127 (dir.[ch]: replace dir_init()
> with DIR_INIT, 2021-07-01),

I might be going on a tangent here, but looking at that patch, I'm
worried that dir_init() was buggy and that you perpetuated that bug
with DIR_INIT.  Note that dir_struct has a struct strbuf basebuf
member, which neither dir_init() or DIR_INIT initialize properly
(using either strbuf_init() or STRBUF_INIT).  As far as I can tell,
dir.c relies on either strbuf_add() calls to just happen to work with
this incorrectly initialized strbuf, or else use the strbuf_init()
call in prep_exclude() to do so, using the following snippet:

    if (!dir->basebuf.buf)
        strbuf_init(&dir->basebuf, PATH_MAX);

However, earlier in that same function we see

    if (stk->baselen <= baselen &&
        !strncmp(dir->basebuf.buf, base, stk->baselen))
            break;

So either that function can never have dir->basebuf.buf be NULL and
the strbuf_init() is dead code, or else it's possible for us to
trigger a segfault.  If it's the former, it may just be a ticking time
bomb that will transform into the latter with some other change,
because it's not at all obvious to me how dir->basebuf gets
initialized appropriately to avoid that strncmp call.  Perhaps there
is some invariant where exclude_stack is only set up by previous calls
to prep_exclude() and those won't set up exclude_stack until first
initializing basebuf.  But that really at least deserves a comment
about how we're abusing basebuf, and would probably be cleaner if we
initialized basebuf to STRBUF_INIT.

> but would probably need to bring it back, of

If you need to bring it back, it's unrelated to my changes here, and
would only be because of the lack of basebuf initialization above.

> course you need some "release()" method for the
> UNPACK_TREES_OPTIONS_INIT, which in turn needs to call the dir_release()
> (well, "dir_clear()" in that case), and it needs to call
> "strbuf_release()". It's just nicer if that boilerplate is all on
> destruction, but not also on struct/object setup.

The caller should *not* be initializing or tearing down
unpack_trees_options->dir beyond setting that field to NULL; it should
then leave it alone.

> We do need that setup in some cases (although a lot could just be
> replaced by lazy initialization), but if we don't....
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index a7e1712d236..de5cc6cd025 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1694,15 +1694,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>         static struct cache_entry *dfc;
>         struct pattern_list pl;
>         int free_pattern_list = 0;
> -       struct dir_struct dir = DIR_INIT;
>
>         if (o->reset == UNPACK_RESET_INVALID)
>                 BUG("o->reset had a value of 1; should be UNPACK_TREES_*_UNTRACKED");
>
>         if (len > MAX_UNPACK_TREES)
>                 die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
> -       if (o->dir)
> -               BUG("o->dir is for internal use only");

I think this was an important check that you've tossed without
replacement.  Historically, callers set up and tweaked o->dir with
various values.  With my patch, we are no longer allowing that..which
introduces a transition problem -- people might have written or are
now writing patches that make new calls of unpack_trees() previous to
this change of mine, but submit them after this change of mine gets
merged.  Without this check I added, they'd probably just do a
mechanical `o->dir->` change to `o->dir.` and assume it's good...and
then possibly have ugly bugs to hunt down.

So, I think it's helpful to have a check that provides an early
warning that tweaking o->dir is not only no longer required, but also
no longer allowed.

>         trace_performance_enter();
>         trace2_region_enter("unpack_trees", "unpack_trees", the_repository);
> @@ -1718,9 +1715,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>                 BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with preserved ignored files");
>
>         if (!o->preserve_ignored) {
> -               o->dir = &dir;
> -               o->dir->flags |= DIR_SHOW_IGNORED;
> -               setup_standard_excludes(o->dir);
> +               o->dir.flags |= DIR_SHOW_IGNORED;
> +               setup_standard_excludes(&o->dir);
>         }
>
>         if (!core_apply_sparse_checkout || !o->update)
> @@ -1884,10 +1880,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  done:
>         if (free_pattern_list)
>                 clear_pattern_list(&pl);
> -       if (o->dir) {
> -               dir_clear(o->dir);
> -               o->dir = NULL;
> -       }
> +       dir_clear(&o->dir);

Unconditionally calling dir_clear()...

>         trace2_region_leave("unpack_trees", "unpack_trees", the_repository);
>         trace_performance_leave("unpack_trees");
>         return ret;
> @@ -2153,8 +2146,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce,
>         pathbuf = xstrfmt("%.*s/", namelen, ce->name);
>
>         memset(&d, 0, sizeof(d));
> -       if (o->dir)
> -               d.exclude_per_dir = o->dir->exclude_per_dir;
> +       d.exclude_per_dir = o->dir.exclude_per_dir;
>         i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
>         if (i)
>                 return add_rejected_path(o, ERROR_NOT_UPTODATE_DIR, ce->name);
> @@ -2201,8 +2193,7 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
>         if (ignore_case && icase_exists(o, name, len, st))
>                 return 0;
>
> -       if (o->dir &&
> -           is_excluded(o->dir, o->src_index, name, &dtype))
> +       if (is_excluded(&o->dir, o->src_index, name, &dtype))

Unconditionally calling is_excluded()...

>                 /*
>                  * ce->name is explicitly excluded, so it is Ok to
>                  * overwrite it.
> diff --git a/unpack-trees.h b/unpack-trees.h
> index 71ffb7eeb0c..a8afbb20170 100644
> --- a/unpack-trees.h
> +++ b/unpack-trees.h
> @@ -5,6 +5,7 @@
>  #include "strvec.h"
>  #include "string-list.h"
>  #include "tree-walk.h"
> +#include "dir.h"
>
>  #define MAX_UNPACK_TREES MAX_TRAVERSE_TREES
>
> @@ -95,7 +96,7 @@ struct unpack_trees_options {
>         struct index_state result;
>
>         struct pattern_list *pl; /* for internal use */
> -       struct dir_struct *dir; /* for internal use only */
> +       struct dir_struct dir; /* for internal use only */
>         struct checkout_metadata meta;
>  };
>

Not only did you drop the important safety check that o->dir not be
setup by the caller (which needs to be reinstated in some form), your
solution also involves unconditionally calling dir_clear() and
is_excluded().  It is not clear to me that those calls are safe...and
that they will continue to be safe in the future.  Even if it is safe
and will continue to be, I don't think this should be squashed into my
patches.  I think it should be a separate patch with its own commit
message that explicitly calls out this assumption.  Especially since
this is dir.c, which is an area where attempting to fix one very
simple little bug results in years of refactoring and fixing all kinds
of historical messes, sometimes waiting a year and a half for
responses to RFCs/review requests, and where we have to sometimes just
give up on attempting to understand the purpose of various bits of
code and instead rely on the regression tests and hope they are good
enough.  I still think that dir.c deserves a little warning at the
top, like the one I suggested in [1].

[1] https://lore.kernel.org/git/CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com/

  reply	other threads:[~2021-10-01  1:54 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-18 23:15 [PATCH 0/6] Fix various issues around removal of untracked files/directories Elijah Newren via GitGitGadget
2021-09-18 23:15 ` [PATCH 1/6] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
2021-09-19 13:44   ` Ævar Arnfjörð Bjarmason
2021-09-20 14:48     ` Elijah Newren
2021-09-18 23:15 ` [PATCH 2/6] Split unpack_trees 'reset' flag into two for untracked handling Elijah Newren via GitGitGadget
2021-09-19 13:48   ` Ævar Arnfjörð Bjarmason
2021-09-20 15:20     ` Elijah Newren
2021-09-20 10:19   ` Phillip Wood
2021-09-20 16:05     ` Elijah Newren
2021-09-20 18:11       ` Phillip Wood
2021-09-24  2:27         ` Elijah Newren
2021-09-18 23:15 ` [PATCH 3/6] unpack-trees: avoid nuking untracked dir in way of unmerged file Elijah Newren via GitGitGadget
2021-09-18 23:15 ` [PATCH 4/6] unpack-trees: avoid nuking untracked dir in way of locally deleted file Elijah Newren via GitGitGadget
2021-09-19 13:52   ` Ævar Arnfjörð Bjarmason
2021-09-20 16:12     ` Elijah Newren
2021-09-18 23:15 ` [PATCH 5/6] Comment important codepaths regarding nuking untracked files/dirs Elijah Newren via GitGitGadget
2021-09-24 11:47   ` Luke Diamand
2021-09-24 13:41     ` Elijah Newren
2021-09-18 23:15 ` [PATCH 6/6] Documentation: call out commands that nuke untracked files/directories Elijah Newren via GitGitGadget
2021-09-19 10:52   ` Philip Oakley
2021-09-19 13:36     ` Philip Oakley
2021-09-20 16:29       ` Elijah Newren
2021-09-24  6:37 ` [PATCH v2 0/6] Fix various issues around removal of " Elijah Newren via GitGitGadget
2021-09-24  6:37   ` [PATCH v2 1/6] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
2021-09-24  6:37   ` [PATCH v2 2/6] Change unpack_trees' 'reset' flag into an enum Elijah Newren via GitGitGadget
2021-09-24 17:35     ` Junio C Hamano
2021-09-26  6:50       ` Elijah Newren
2021-09-24  6:37   ` [PATCH v2 3/6] unpack-trees: avoid nuking untracked dir in way of unmerged file Elijah Newren via GitGitGadget
2021-09-24  6:37   ` [PATCH v2 4/6] unpack-trees: avoid nuking untracked dir in way of locally deleted file Elijah Newren via GitGitGadget
2021-09-24  6:37   ` [PATCH v2 5/6] Comment important codepaths regarding nuking untracked files/dirs Elijah Newren via GitGitGadget
2021-09-24 17:50     ` Eric Sunshine
2021-09-26  6:35       ` Elijah Newren
2021-09-24  6:37   ` [PATCH v2 6/6] Documentation: call out commands that nuke untracked files/directories Elijah Newren via GitGitGadget
2021-09-27 16:33   ` [PATCH v3 00/11] Fix various issues around removal of " Elijah Newren via GitGitGadget
2021-09-27 16:33     ` [PATCH v3 01/11] t2500: add various tests for nuking untracked files Elijah Newren via GitGitGadget
2021-09-27 16:33     ` [PATCH v3 02/11] checkout, read-tree: fix leak of unpack_trees_options.dir Elijah Newren via GitGitGadget
2021-09-27 16:33     ` [PATCH v3 03/11] read-tree, merge-recursive: overwrite ignored files by default Elijah Newren via GitGitGadget
2021-12-13 17:12       ` Jack O'Connor
2021-12-13 20:10         ` Elijah Newren
2021-09-27 16:33     ` [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options Elijah Newren via GitGitGadget
2021-09-29  9:22       ` Ævar Arnfjörð Bjarmason
2021-09-29 15:35         ` Elijah Newren
2021-09-29 18:30           ` Ævar Arnfjörð Bjarmason
2021-09-30  4:25             ` Elijah Newren
2021-09-30 14:04               ` Ævar Arnfjörð Bjarmason
2021-10-01  1:53                 ` Elijah Newren [this message]
2021-10-01  8:15                   ` Ævar Arnfjörð Bjarmason
2021-10-01  9:53                     ` Ævar Arnfjörð Bjarmason
2021-10-01 18:50                     ` Elijah Newren
2021-10-02  8:44                       ` Ævar Arnfjörð Bjarmason
2021-10-03 22:21                         ` Ævar Arnfjörð Bjarmason
2021-10-04 13:45                           ` Elijah Newren
2021-10-04 13:45                         ` Elijah Newren
2021-10-04 14:07                           ` Ævar Arnfjörð Bjarmason
2021-10-04 14:57                             ` Elijah Newren
2021-09-27 16:33     ` [PATCH v3 05/11] unpack-trees: make dir an internal-only struct Elijah Newren via GitGitGadget
2021-09-27 16:33     ` [PATCH v3 06/11] Remove ignored files by default when they are in the way Elijah Newren via GitGitGadget
2021-09-27 16:33     ` [PATCH v3 07/11] Change unpack_trees' 'reset' flag into an enum Elijah Newren via GitGitGadget
2021-09-27 16:33     ` [PATCH v3 08/11] unpack-trees: avoid nuking untracked dir in way of unmerged file Elijah Newren via GitGitGadget
2021-09-27 16:33     ` [PATCH v3 09/11] unpack-trees: avoid nuking untracked dir in way of locally deleted file Elijah Newren via GitGitGadget
2021-09-27 16:33     ` [PATCH v3 10/11] Comment important codepaths regarding nuking untracked files/dirs Elijah Newren via GitGitGadget
2021-09-27 16:33     ` [PATCH v3 11/11] Documentation: call out commands that nuke untracked files/directories Elijah Newren via GitGitGadget
2021-09-27 20:36     ` [PATCH v3 00/11] Fix various issues around removal of " Junio C Hamano
2021-09-27 20:41       ` Elijah Newren
2021-09-27 21:31         ` Elijah Newren
2021-09-30 14:00     ` Phillip Wood
     [not found]     ` <aaa8ea3b-0902-f9e6-c1a4-0ca2b1b2f57b@gmail.com>
2021-10-01  2:08       ` Elijah Newren
2021-10-04  1:11     ` [RFC PATCH v4 00/10] " Ævar Arnfjörð Bjarmason
2021-10-04  1:11       ` [RFC PATCH v4 01/10] t2500: add various tests for nuking untracked files Ævar Arnfjörð Bjarmason
2021-10-04  1:11       ` [RFC PATCH v4 02/10] read-tree, merge-recursive: overwrite ignored files by default Ævar Arnfjörð Bjarmason
2021-10-04  1:11       ` [RFC PATCH v4 03/10] unpack-trees: introduce preserve_ignored to unpack_trees_options Ævar Arnfjörð Bjarmason
2021-10-04  1:11       ` [RFC PATCH v4 04/10] unpack-trees: rename "dir" to "private_dir" Ævar Arnfjörð Bjarmason
2021-10-04  1:11       ` [RFC PATCH v4 05/10] Remove ignored files by default when they are in the way Ævar Arnfjörð Bjarmason
2021-10-04  1:11       ` [RFC PATCH v4 06/10] Change unpack_trees' 'reset' flag into an enum Ævar Arnfjörð Bjarmason
2021-10-04  1:11       ` [RFC PATCH v4 07/10] unpack-trees: avoid nuking untracked dir in way of unmerged file Ævar Arnfjörð Bjarmason
2021-10-04  1:11       ` [RFC PATCH v4 08/10] unpack-trees: avoid nuking untracked dir in way of locally deleted file Ævar Arnfjörð Bjarmason
2021-10-04  1:11       ` [RFC PATCH v4 09/10] Comment important codepaths regarding nuking untracked files/dirs Ævar Arnfjörð Bjarmason
2021-10-04  1:11       ` [RFC PATCH v4 10/10] Documentation: call out commands that nuke untracked files/directories Ævar Arnfjörð Bjarmason
2021-10-04 14:38       ` [RFC PATCH v4 00/10] Fix various issues around removal of " Elijah Newren
2021-10-04 16:08         ` Ævar Arnfjörð Bjarmason
2021-10-05  7:40           ` Elijah Newren
2021-10-04 18:17         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BGi03JunRaMF_8SJKC00byOnq1kL3JyYhKWatz8-B4RsA@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=avarab@gmail.com \
    --cc=fedor.birjukov@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=philipoakley@iee.email \
    --cc=phillip.wood123@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.