From: Emily Shaffer <emilyshaffer@google.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Albert Cui <albertcui@google.com>,
	Phillip Wood <phillip.wood123@gmail.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Matheus Tavares Bernardino <matheus.bernardino@usp.br>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Jacob Keller <jacob.keller@gmail.com>,
	Atharva Raykar <raykar.ath@gmail.com>,
	Derrick Stolee <stolee@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [RFC PATCH 0/2] submodule: test what happens if submodule.superprojectGitDir isn't around
Date: Tue, 23 Nov 2021 12:08:12 -0800
Message-ID: <YZ1KLNwsxx7IR1+5@google.com>
In-Reply-To: <RFC-cover-0.2-00000000000-20211117T113134Z-avarab@gmail.com>

On Wed, Nov 17, 2021 at 12:43:38PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Nov 16 2021, Emily Shaffer wrote:
> 
> > [...]
> > A couple things. Firstly, a semantics change *back* to the semantics of
> > v3 - we map from gitdir to gitdir, *not* from common dir to common dir,
> > so that theoretically a submodule with multiple worktrees in multiple
> > superproject worktrees will be able to figure out which worktree of the
> > superproject it's in. (Realistically, that's not really possible right
> > now, but I'd like to change that soon.)
> >
> > Secondly, a rewording of comments and commit messages to indicate that
> > this isn't a cache of some expensive operation, but rather intended to
> > be the source of truth for all submodules. I also added a fifth commit
> > rewriting `git rev-parse --show-superproject-working-tree` to
> > demonstrate what that means in practice - but from a practical
> > standpoint, I'm a little worried about that fifth patch. More details in
> > the patch 5 description.
> >
> > I did discuss Ævar's idea of relying on in-process filesystem digging to
> > find the superproject's gitdir with the rest of the Google team, but in
> > the end decided that there are some worries about filesystem digging in
> > this way (namely, some ugly interactions with network drives that are
> > actually already an issue for Googler Linux machines). Plus, the allure
> > of being able to definitively know that we're a submodule is pretty
> > strong. ;) But overall, this is the direction I'd prefer to keep going
> > in, rather than trying to guess from the filesystem going forward.
> 
> Did you try running the ad-hoc benchmark I included in [1] on that
> Google NFS? I've dealt with some slow-ish network filesystems, but if
> it's slower than AIX's local FS (where I couldn't see a difference) I'd
> put money on it being a cross-Atlantic mount or something :)
> 
> Re your:
> 
>     "this isn't a cache of some expensive operation, but rather intended to
>     be the source of truth for all submodules."
> 
> In your 5/5 it says, in seeming contradiction to this:
> 
>     This commit may be more of an RFC - to demonstrate what life looks like
>     if we use submodule.superprojectGitDir as the source of truth. But since
>     'git rev-parse --show-superproject-working-tree' is used in a lot of
>     scripts in the wild[1], I'm not so sure it's a great example.
> 
>     To be honest, I'd prefer to die("Try running 'git submodule update'")
>     here, but I don't think that's very script-friendly. However, falling
>     back on the old implementation kind of undermines the idea of treating
>     submodule.superprojectGitDir as the point of truth.
> 
> Most of what I've been suggesting in my [1] and related is that I'm
> confused about if & how this is a pure caching mechanism.
> 
> Removing mentions of it being a cache but it seemingly still being a
> cache at the tip of this series has just added to that confusion for
> me :)

Yeah, I think it was a bad choice for me to include that patch. I was
really hopeful that I could show off "look, we don't need to ever hunt
in the FS above us", but for established repos, that's a bad idea
(because lots of people are already using this 'git rev-parse
--show-superproject-working-tree' thing in scripts, like I mentioned).
Rather, I think it's probably a better idea to treat that particular
entry point as "legacy" and implement other things using
'submodule.superprojectGitDir' directly.

Because what patch 5 actually illustrates is: "I'm saying that this new
config isn't a cache, but look, here's how I can treat it like a cache
that might be invalid and here's how I can fall back on a potentially
expensive operation anyway." I think I could have illustrated it a
little better with something like "here's a brand new 'git rev-parse
--show-superproject-gitdir'" which directly calls on the new config.

So, sorry about that.

> 
> Anyway. I do think this caching mechanism is probably unnecessary in
> the short to medium term, i.e. to the extent that it was ever needed,
> that seems to have been due to some bridging of *.sh<->*.c that
> we're *this* close to eliminating anyway.
> 
> But maybe I'm wrong. The benchmark I suggested above on that Google
> NFS might be indicative. I don't really see how something that'll be
> doing a bunch of FS ops anyway is going to be noticeably slower with
> that approach, but maybe opening the index/tree of the superproject is
> more expensive than I'm expecting.
> 
> In any case, all of that's not the hill I'm picking to die on. If
> you'd like to go ahead with this cache-or-not-a-cache then sure, I
> won't belabor that point.

Yeah, I think I would. I've heard some serious reservations from others
on my team about trying to use filesystem traversal here at all, so I
think that would be an uphill battle.

> 
> I *do* strongly think, though, that if we're doing so we should have
> something like this on top. I.e. let's test what happens if we do and
> don't have this "caching" variable, which is demonstrably easy to do.
> 
> Benchmarking the two gives me:
> 
>     $ git hyperfine -L rev HEAD~0 -L s true,false -s 'make -j8 all' '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR={s} ./t7412-submodule-absorbgitdirs.sh)'
>     Benchmark 1: (cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=true ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0
>       Time (mean ± σ):     545.9 ms ±   1.6 ms    [User: 490.3 ms, System: 114.0 ms]
>       Range (min … max):   543.5 ms … 548.1 ms    10 runs
>      
>     Benchmark 2: (cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=false ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0
>       Time (mean ± σ):     537.9 ms ±  11.4 ms    [User: 476.8 ms, System: 117.6 ms]
>       Range (min … max):   532.7 ms … 570.1 ms    10 runs
>      
>     Summary
>       '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=false ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0' ran
>         1.01 ± 0.02 times faster than '(cd t && GIT_TEST_SUBMODULE_CACHE_SUPERPROJECT_DIR=true ./t7412-submodule-absorbgitdirs.sh)' in 'HEAD~0'
> 
> I.e. not using the cache is either indistinguishable or a bit faster
> (the "a bit faster" is definitely due to just running less test code
> though).

Yeah, once again, I think it is better to treat "git rev-parse
--show-superproject-working-tree" as "legacy" and to rely solely on the
config for new options, meaning that "what happens without this
variable" is as simple as "we treat it like it's a standalone repository
with no superproject", rather than a performance difference at all.
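
As a rough sketch of that behavior (not code from the series), a caller
could consult only the config and never walk the filesystem. The key name
submodule.superprojectGitDir is the one proposed in this series; the
helper names superproject_gitdir and describe_repo are hypothetical and
exist only for illustration.

```shell
#!/bin/sh
# Sketch only: decide "submodule or standalone" purely from config.
# ASSUMPTION: submodule.superprojectGitDir is the key proposed in this
# series; it is not guaranteed to exist in any released Git.

# Hypothetical helper: print the recorded superproject gitdir, if any.
# "git config --get" exits non-zero when the key is unset.
superproject_gitdir () {
	git -C "${1:-.}" config --get submodule.superprojectGitDir
}

# Hypothetical helper: an unset key means "standalone repository",
# with no fallback to looking at parent directories.
describe_repo () {
	if dir=$(superproject_gitdir "$1")
	then
		echo "submodule of: $dir"
	else
		echo "standalone"
	fi
}
```

Calling describe_repo with a repository path (or with no argument, for
the current directory) reports "standalone" whenever the key is absent,
which is the "no superproject" behavior described above.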

 - Emily

> 
> I'm sending this before the CI run[2] finishes (which now tests both
> modes), but both of these work for me locally on a full test suite
> run.
> 
> 1. https://lore.kernel.org/git/211109.86v912dtfw.gmgdl@evledraar.gmail.com/
> 2. https://github.com/avar/git/runs/4237446991?check_suite_focus=true
> 
> Ævar Arnfjörð Bjarmason (2):
>   submodule tests: fix potentially broken "config .. --unset"
>   submodule: add test mode for checking absence of "superProjectGitDir"
> 
>  ci/run-build-and-tests.sh          |  1 +
>  git-submodule.sh                   |  2 +-
>  submodule.c                        |  7 +++++++
>  t/lib-submodule-superproject.sh    | 24 ++++++++++++++++++++++++
>  t/t7406-submodule-update.sh        | 13 ++++++-------
>  t/t7412-submodule-absorbgitdirs.sh | 19 ++++++-------------
>  6 files changed, 45 insertions(+), 21 deletions(-)
>  create mode 100644 t/lib-submodule-superproject.sh
> 
> -- 
> 2.34.0.796.g2c87ed6146a
> 
