From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E49EC433F5 for ; Fri, 1 Oct 2021 01:54:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8144861A59 for ; Fri, 1 Oct 2021 01:54:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351375AbhJABzs (ORCPT ); Thu, 30 Sep 2021 21:55:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51466 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230283AbhJABzr (ORCPT ); Thu, 30 Sep 2021 21:55:47 -0400 Received: from mail-ed1-x534.google.com (mail-ed1-x534.google.com [IPv6:2a00:1450:4864:20::534]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBB1DC06176A for ; Thu, 30 Sep 2021 18:54:03 -0700 (PDT) Received: by mail-ed1-x534.google.com with SMTP id l8so29251866edw.2 for ; Thu, 30 Sep 2021 18:54:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=TXs6baQrXZzRhdkD2XZagjTlsqIMp7PBPF7vu4ZOIfo=; b=c24g2FeRjGzpxOay44QqzQuHYgHd3YK/87x0+vcsnHfW3uTjUp3Fy4lZYH3TNpdxxD /x0MkmzxXeSJu2mhuVTCtVo6Fha/JQVpalORi62HdjGltOwxK3XT+GTiS1w6ukd942yr ei3A+EYICh2J77yUx+LjGCTaza8oOexRjQ2/c+YawlRh06oCzoAJnjffqt6du2/DWPby Av4SjOFtPwMiZ3TkoeDgCZeKZL/ex8em/OIAAt0GfALjJKTIf2Hs7KTQ0EfayQDskTXR HcN/syx1ZIVRs4WbnAHcqcPDlbazHg9/Mgf0PqIDzPPlTu5zNnya8Lre4ksSOEJrquQJ Y1FA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=TXs6baQrXZzRhdkD2XZagjTlsqIMp7PBPF7vu4ZOIfo=; b=3dmwmY8Gt4U/QuMETa7IOj1aju6YPKikQber6/WOAEiYZlPTylRxsGhMWhFCw1llG9 lq4hebM8D1WL54KSfG5Zlk3VSHwoAMcO2VJb3BIHcA9lH1v5CzGSX4UE/LmfJRKYz7vE LuWUGjBEg0Zs4rmDoyUapnuJ0+Nv5AsEsNc0TfuyJpXCB7bQg0ybBN+xhF4IEpO4VTgQ Ojww7MdFrAc9OQTXRMFD9DQ8KskzaJ2K+qYDTDKZnDOSIagMqTz37flOFM19ug31HpvI PxTQ+hejEuBeMKgtTR1IYVjUFyBZRj3YtBp/KS/yGkIO2U/84wpmm4gy4JVi7xLQ9VuX ROaw== X-Gm-Message-State: AOAM530AA4GdGa5oyneA1sSnbiKE0JQJwcPxrwRZWHCcIsVovUqzkjJx zv1nzsGJtb5+yYjmi6tFVw/q9566J2krVEuY9Dddix4UpiV1gA== X-Google-Smtp-Source: ABdhPJwdwHutqbifo0RkxsNjoLyZyoASyeoHYc63pO7bOxF2oZTScEIXPFKQUhpvbdKTdA93A8GWBUP8nRNSpbHdVis= X-Received: by 2002:a50:f60a:: with SMTP id c10mr11183855edn.250.1633053242207; Thu, 30 Sep 2021 18:54:02 -0700 (PDT) MIME-Version: 1.0 References: <87ilyjviiy.fsf@evledraar.gmail.com> <87ee97utaq.fsf@evledraar.gmail.com> <87lf3etaih.fsf@evledraar.gmail.com> In-Reply-To: <87lf3etaih.fsf@evledraar.gmail.com> From: Elijah Newren Date: Thu, 30 Sep 2021 18:53:49 -0700 Message-ID: Subject: Re: [PATCH v3 04/11] unpack-trees: introduce preserve_ignored to unpack_trees_options To: =?UTF-8?B?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= Cc: Elijah Newren via GitGitGadget , Git Mailing List , Fedor Biryukov , Philip Oakley , Phillip Wood , Eric Sunshine Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Thu, Sep 30, 2021 at 7:15 AM =C3=86var Arnfj=C3=B6r=C3=B0 Bjarmason wrote: > > On Wed, Sep 29 2021, Elijah Newren wrote: > > > On Wed, Sep 29, 2021 at 11:32 AM =C3=86var Arnfj=C3=B6r=C3=B0 Bjarmason > > wrote: > >> > >> On Wed, Sep 29 2021, Elijah Newren wrote: > >> ... > > As per the next patch: > > > > int unpack_trees(..., struct unpack_trees_options *o) > > { > > struct dir_struct dir =3D DIR_INIT; > > ... > > if (!o->preserve_ignored) { > > /* Setup 'dir', make o->dir point to it */ > > .... > > o->dir =3D &dir; > > } > > ... > > if (o->dir) > > /* cleanup */ > > .... > > } > > > > The caller doesn't touch o->dir (other than initializing it to zeros); > > unpack_trees() is wholly responsible for it. I'd kind of like to > > entirely remove dir from unpack_trees_options(), but I need a way of > > passing it down through all the other functions in unpack-trees.c, and > > leaving it in unpack_trees_options seems the easiest way to do so. So > > I just marked it as "for internal use only". > > I think I understand *how* it works, I'm puzzled by why you went for > this whole level of indirection when you're using a struct on the stack > in the end anyway, just ... put that in "struct unpack_trees_options"? > > Anyway, I see I have only myself to blame here, as you added these leak > fixes in the v2 in response to some of my offhand comments. > > FWIW I then went on to do some deeper fixes not just on these leaks but > the surrounding leaks, which will be blocked by 2/11 & 05/11 of this > topic for a while. I suppose I only have myself to blame :) > > Below is a patch-on-top that I think makes this whole thing much simpler > by doing away with the pointer entirely. > > I suppose this is also a partial reply to > https://lore.kernel.org/git/CABPp-BG_qigBoirMGR-Yk9Niyxt0UmYCEqojsYxbSEar= LAmraA@mail.gmail.com/; > but I quite dislike this pattern of including a pointer like this where > it's not needed just for the practicalities of memory management. > > I.e. here you use DIR_INIT. In my local patches to fix up the wider > memory leaks in this area I've got DIR_INIT also using a STRBUF_INIT, > and DIR_INIT will in turn be referenced by a > UNPACK_TREES_OPTIONS_INIT. It's quite nice if you're having to > initialize with "UNPACK_TREES_OPTIONS_INIT" have that initialization > work all the way down the chain, and not need e.g. a manual > strbuf_init(), dir_init() etc. And you can keep using UNPACK_TREES_OPTIONS_INIT, because the unpack_trees_opts->dir should be initialized to NULL. > I removed the dir_init() in ce93a4c6127 (dir.[ch]: replace dir_init() > with DIR_INIT, 2021-07-01), I might be going on a tangent here, but looking at that patch, I'm worried that dir_init() was buggy and that you perpetuated that bug with DIR_INIT. Note that dir_struct has a struct strbuf basebuf member, which neither dir_init() or DIR_INIT initialize properly (using either strbuf_init() or STRBUF_INIT). As far as I can tell, dir.c relies on either strbuf_add() calls to just happen to work with this incorrectly initialized strbuf, or else use the strbuf_init() call in prep_exclude() to do so, using the following snippet: if (!dir->basebuf.buf) strbuf_init(&dir->basebuf, PATH_MAX); However, earlier in that same function we see if (stk->baselen <=3D baselen && !strncmp(dir->basebuf.buf, base, stk->baselen)) break; So either that function can never have dir->basebuf.buf be NULL and the strbuf_init() is dead code, or else it's possible for us to trigger a segfault. If it's the former, it may just be a ticking time bomb that will transform into the latter with some other change, because it's not at all obvious to me how dir->basebuf gets initialized appropriately to avoid that strncmp call. Perhaps there is some invariant where exclude_stack is only set up by previous calls to prep_exclude() and those won't set up exclude_stack until first initializing basebuf. But that really at least deserves a comment about how we're abusing basebuf, and would probably be cleaner if we initialized basebuf to STRBUF_INIT. > but would probably need to bring it back, of If you need to bring it back, it's unrelated to my changes here, and would only be because of the lack of basebuf initialization above. > course you need some "release()" method for the > UNPACK_TREES_OPTIONS_INIT, which in turn needs to call the dir_release() > (well, "dir_clear()" in that case), and it needs to call > "strbuf_release()". It's just nicer if that boilerplate is all on > destruction, but not also on struct/object setup. The caller should *not* be initializing or tearing down unpack_trees_options->dir beyond setting that field to NULL; it should then leave it alone. > We do need that setup in some cases (although a lot could just be > replaced by lazy initialization), but if we don't.... > > diff --git a/unpack-trees.c b/unpack-trees.c > index a7e1712d236..de5cc6cd025 100644 > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -1694,15 +1694,12 @@ int unpack_trees(unsigned len, struct tree_desc *= t, struct unpack_trees_options > static struct cache_entry *dfc; > struct pattern_list pl; > int free_pattern_list =3D 0; > - struct dir_struct dir =3D DIR_INIT; > > if (o->reset =3D=3D UNPACK_RESET_INVALID) > BUG("o->reset had a value of 1; should be UNPACK_TREES_*_= UNTRACKED"); > > if (len > MAX_UNPACK_TREES) > die("unpack_trees takes at most %d trees", MAX_UNPACK_TRE= ES); > - if (o->dir) > - BUG("o->dir is for internal use only"); I think this was an important check that you've tossed without replacement. Historically, callers set up and tweaked o->dir with various values. With my patch, we are no longer allowing that..which introduces a transition problem -- people might have written or are now writing patches that make new calls of unpack_trees() previous to this change of mine, but submit them after this change of mine gets merged. Without this check I added, they'd probably just do a mechanical `o->dir->` change to `o->dir.` and assume it's good...and then possibly have ugly bugs to hunt down. So, I think it's helpful to have a check that provides an early warning that tweaking o->dir is not only no longer required, but also no longer allowed. > trace_performance_enter(); > trace2_region_enter("unpack_trees", "unpack_trees", the_repositor= y); > @@ -1718,9 +1715,8 @@ int unpack_trees(unsigned len, struct tree_desc *t,= struct unpack_trees_options > BUG("UNPACK_RESET_OVERWRITE_UNTRACKED incompatible with p= reserved ignored files"); > > if (!o->preserve_ignored) { > - o->dir =3D &dir; > - o->dir->flags |=3D DIR_SHOW_IGNORED; > - setup_standard_excludes(o->dir); > + o->dir.flags |=3D DIR_SHOW_IGNORED; > + setup_standard_excludes(&o->dir); > } > > if (!core_apply_sparse_checkout || !o->update) > @@ -1884,10 +1880,7 @@ int unpack_trees(unsigned len, struct tree_desc *t= , struct unpack_trees_options > done: > if (free_pattern_list) > clear_pattern_list(&pl); > - if (o->dir) { > - dir_clear(o->dir); > - o->dir =3D NULL; > - } > + dir_clear(&o->dir); Unconditionally calling dir_clear()... > trace2_region_leave("unpack_trees", "unpack_trees", the_repositor= y); > trace_performance_leave("unpack_trees"); > return ret; > @@ -2153,8 +2146,7 @@ static int verify_clean_subdirectory(const struct c= ache_entry *ce, > pathbuf =3D xstrfmt("%.*s/", namelen, ce->name); > > memset(&d, 0, sizeof(d)); > - if (o->dir) > - d.exclude_per_dir =3D o->dir->exclude_per_dir; > + d.exclude_per_dir =3D o->dir.exclude_per_dir; > i =3D read_directory(&d, o->src_index, pathbuf, namelen+1, NULL); > if (i) > return add_rejected_path(o, ERROR_NOT_UPTODATE_DIR, ce->n= ame); > @@ -2201,8 +2193,7 @@ static int check_ok_to_remove(const char *name, int= len, int dtype, > if (ignore_case && icase_exists(o, name, len, st)) > return 0; > > - if (o->dir && > - is_excluded(o->dir, o->src_index, name, &dtype)) > + if (is_excluded(&o->dir, o->src_index, name, &dtype)) Unconditionally calling is_excluded()... > /* > * ce->name is explicitly excluded, so it is Ok to > * overwrite it. > diff --git a/unpack-trees.h b/unpack-trees.h > index 71ffb7eeb0c..a8afbb20170 100644 > --- a/unpack-trees.h > +++ b/unpack-trees.h > @@ -5,6 +5,7 @@ > #include "strvec.h" > #include "string-list.h" > #include "tree-walk.h" > +#include "dir.h" > > #define MAX_UNPACK_TREES MAX_TRAVERSE_TREES > > @@ -95,7 +96,7 @@ struct unpack_trees_options { > struct index_state result; > > struct pattern_list *pl; /* for internal use */ > - struct dir_struct *dir; /* for internal use only */ > + struct dir_struct dir; /* for internal use only */ > struct checkout_metadata meta; > }; > Not only did you drop the important safety check that o->dir not be setup by the caller (which needs to be reinstated in some form), your solution also involves unconditionally calling dir_clear() and is_excluded(). It is not clear to me that those calls are safe...and that they will continue to be safe in the future. Even if it is safe and will continue to be, I don't think this should be squashed into my patches. I think it should be a separate patch with its own commit message that explicitly calls out this assumption. Especially since this is dir.c, which is an area where attempting to fix one very simple little bug results in years of refactoring and fixing all kinds of historical messes, sometimes waiting a year and a half for responses to RFCs/review requests, and where we have to sometimes just give up on attempting to understand the purpose of various bits of code and instead rely on the regression tests and hope they are good enough. I still think that dir.c deserves a little warning at the top, like the one I suggested in [1]. [1] https://lore.kernel.org/git/CABPp-BFiwzzUgiTj_zu+vF5x20L0=3D1cf25cHwk7K= ZQj2YkVzXw@mail.gmail.com/