From: Emily Shaffer <emilyshaffer@google.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Albert Cui <albertcui@google.com>,
Phillip Wood <phillip.wood123@gmail.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Matheus Tavares Bernardino <matheus.bernardino@usp.br>,
Jonathan Nieder <jrnieder@gmail.com>,
Jacob Keller <jacob.keller@gmail.com>,
Atharva Raykar <raykar.ath@gmail.com>,
Derrick Stolee <stolee@gmail.com>,
Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [RFC PATCH 0/2] submodule: test what happens if submodule.superprojectGitDir isn't around
Date: Tue, 23 Nov 2021 12:08:12 -0800
Message-ID: <YZ1KLNwsxx7IR1+5@google.com>
In-Reply-To: <RFC-cover-0.2-00000000000-20211117T113134Z-avarab@gmail.com>
On Wed, Nov 17, 2021 at 12:43:38PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Nov 16 2021, Emily Shaffer wrote:
>
> > [...]
> > A couple things. Firstly, a semantics change *back* to the semantics of
> > v3 - we map from gitdir to gitdir, *not* from common dir to common dir,
> > so that theoretically a submodule with multiple worktrees in multiple
> > superproject worktrees will be able to figure out which worktree of the
> > superproject it's in. (Realistically, that's not really possible right
> > now, but I'd like to change that soon.)
> >
> > Secondly, a rewording of comments and commit messages to indicate that
> > this isn't a cache of some expensive operation, but rather intended to
> > be the source of truth for all submodules. I also added a fifth commit
> > rewriting `git rev-parse --show-superproject-working-tree` to
> > demonstrate what that means in practice - but from a practical
> > standpoint, I'm a little worried about that fifth patch. More details in
> > the patch 5 description.
> >
> > I did discuss Ævar's idea of relying on in-process filesystem digging to
> > find the superproject's gitdir with the rest of the Google team, but in
> > the end decided that there are some worries about filesystem digging in
> > this way (namely, some ugly interactions with network drives that are
> > actually already an issue for Googler Linux machines). Plus, the allure
> > of being able to definitively know that we're a submodule is pretty
> > strong. ;) But overall, this is the direction I'd prefer to keep going
> > in, rather than trying to guess from the filesystem going forward.
>
> Did you try running the ad-hoc benchmark I included in [1] on that
> Google NFS? I've dealt with some slow-ish network filesystems, but if
> it's slower than AIX's local FS (where I couldn't see a difference) I'd
> put money on it being a cross-Atlantic mount or something :)
>
> Re your:
>
> "this isn't a cache of some expensive operation, but rather intended to be the source of truth for all submodules."
>
> In your 5/5 it says, in seeming contradiction to this:
>
> This commit may be more of an RFC - to demonstrate what life looks like
> if we use submodule.superprojectGitDir as the source of truth. But since
> 'git rev-parse --show-superproject-working-tree' is used in a lot of
> scripts in the wild[1], I'm not so sure it's a great example.
>
> To be honest, I'd prefer to die("Try running 'git submodule update'")
> here, but I don't think that's very script-friendly. However, falling
> back on the old implementation kind of undermines the idea of treating
> submodule.superprojectGitDir as the point of truth.
>
> Most of what I've been suggesting in my [1] and related is that I'm
> confused about if & how this is a pure caching mechanism.
>
> Removing mentions of it being a cache but it seemingly still being a
> cache at the tip of this series has just added to that confusion for
> me :)
Yeah, I think including that patch was a bad choice on my part. I was
really hopeful that I could show off "look, we don't ever need to hunt
in the FS above us", but for established repos that's a bad idea
(because lots of people are already using 'git rev-parse
--show-superproject-working-tree' in scripts, like I mentioned). So I
think it was a mistake to include it at all. Rather, I think it's
probably a better idea to treat that particular entry point as "legacy"
and implement other things using 'submodule.superprojectGitDir'
directly.
What patch 5 actually illustrates is: "I'm saying that this new config
isn't a cache, but look, here's how I can treat it like a cache that
might be invalid, and here's how I can fall back on a potentially
expensive operation anyway." I think I could have illustrated the idea
better with something like "here's a brand new 'git rev-parse
--show-superproject-gitdir'" which reads the new config directly.
So, sorry about that.
>
> Anyway. While I do think this caching mechanism is probably
> unnecessary in the short to medium term, i.e. it seems to the extent
> that it was ever needed was due to some bridging of *.sh<->*.c that
> we're *this* close to eliminating anyway.
>
> But maybe I'm wrong. The benchmark I suggested above on that Google
> NFS might be indicative. I don't really see how something that'll be
> doing a bunch of FS ops anyway is going to be noticeably slower with
> that approach, but maybe opening the index/tree of the superproject is
> more expensive than I'm expecting.
>
> In any case, all of that's not the hill I'm picking to die on. If
> you'd like to go ahead with this cache-or-not-a-cache then sure, I
> won't belabor that point.
Yeah, I think I would. I've heard some serious reservations from others
on my team about trying to use filesystem traversal here at all, so I
think that would be an uphill battle.
>
> I *do* strongly think if we're doing so though that we should have
> something like this on top. I.e. let's test what happens if we do and
> don't have this "caching" variable, which is demonstrably easy to do.
>
> Benchmarking the two gives me:
>
> $ git hyperfine -L rev HEAD~0 -L s true,false -s 'make -j8 all' '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR={s} ./t7412-submodule-absorbgitdirs.sh)'
> Benchmark 1: (cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=true ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0
> Time (mean ± σ): 545.9 ms ± 1.6 ms [User: 490.3 ms, System: 114.0 ms]
> Range (min … max): 543.5 ms … 548.1 ms 10 runs
>
> Benchmark 2: (cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=false ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0
> Time (mean ± σ): 537.9 ms ± 11.4 ms [User: 476.8 ms, System: 117.6 ms]
> Range (min … max): 532.7 ms … 570.1 ms 10 runs
>
> Summary
> '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=false ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0' ran
> 1.01 ± 0.02 times faster than '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=true ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0'
>
> I.e. not using the cache is either indistinguishable or a bit faster
> (the "a bit faster" is definitely due to just running less test code
> though).
Yeah, once again, I think it is better to treat "git rev-parse
--show-superproject-working-tree" as "legacy" and to rely solely on the
config for new options. Then "what happens without this variable" is as
simple as "we treat it like a standalone repository with no
superproject", and is not a question of performance at all.
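
To make that reading concrete, here is a minimal sketch. It is not code
from the series: the helper name is made up for illustration, and it
reads a config file passed as an argument rather than the submodule's
own config. The only thing taken from the series is the key name
'submodule.superprojectGitDir'.

```shell
#!/bin/sh
# Sketch only. superproject_gitdir_from_config is a hypothetical helper;
# submodule.superprojectGitDir is the key proposed in this series.

# Print the superproject gitdir recorded in the given config file.
# If the key is unset, print nothing and return non-zero: the caller
# should treat the repository as standalone and must NOT fall back to
# walking up the filesystem.
superproject_gitdir_from_config () {
	config_file=$1
	git config --file "$config_file" --get submodule.superprojectGitDir
}
```

A caller would branch on the exit status alone, e.g.
'if gitdir=$(superproject_gitdir_from_config "$cfg"); then ...; fi',
with the else branch meaning "no superproject" rather than "go and
look for one".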
- Emily
>
> I'm sending this before the CI run[2] finishes (which now tests both
> modes), but both of these work for me locally on a full test suite
> run.
>
> 1. https://lore.kernel.org/git/211109.86v912dtfw.gmgdl@evledraar.gmail.com/
> 2. https://github.com/avar/git/runs/4237446991?check_suite_focus=true
>
> Ævar Arnfjörð Bjarmason (2):
> submodule tests: fix potentially broken "config .. --unset"
> submodule: add test mode for checking absence of "superProjectGitDir"
>
> ci/run-build-and-tests.sh | 1 +
> git-submodule.sh | 2 +-
> submodule.c | 7 +++++++
> t/lib-submodule-superproject.sh | 24 ++++++++++++++++++++++++
> t/t7406-submodule-update.sh | 13 ++++++-------
> t/t7412-submodule-absorbgitdirs.sh | 19 ++++++-------------
> 6 files changed, 45 insertions(+), 21 deletions(-)
> create mode 100644 t/lib-submodule-superproject.sh
>
> --
> 2.34.0.796.g2c87ed6146a
>