git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Kyle Meyer <kyle@kyleam.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [BUG?] ls-files -o now traverses nested repo when given multiple pathspecs
Date: Wed, 4 Dec 2019 09:30:41 -0800	[thread overview]
Message-ID: <CABPp-BFG3FkTkC=L1v97LUksndkOmCN8ZhNJh5eoNdquE7v9DA@mail.gmail.com> (raw)
In-Reply-To: <87fti15agv.fsf@kyleam.com>

Hi Kyle,

Thanks for the clear report and pointing out relevant commit(s) and
discussion.  Apologies in advance if I come off a bit ranty; it's
directed to dir.c and its API and not you.  It seems to be quite the
mess...

On Tue, Dec 3, 2019 at 2:08 PM Kyle Meyer <kyle@kyleam.com> wrote:
>
> 89a1f4aaf7 (dir: if our pathspec might match files under a dir, recurse
> into it, 2019-09-17) introduced a change in behavior in terms of
> traversing untracked nested repositories.  Say we have a repository that
> contains a single untracked repository with untracked content:
>
>     $ git init && git init a && touch a/x
>
> Calling ls-files with the nested repository as the sole pathspec does
> not recurse into that repository:
>
>     $ git ls-files --other a
>     a/
>
> However, as of 89a1f4aaf7, adding an additional pathspec results in the
> nested repository being traversed:
>
>     $ git ls-files --other a foo
>     a/
>     a/x
>
> Reading 89a1f4aaf7 and skimming the patch series and related thread [*],
> I haven't found anything that makes me think this change in behavior was
> intentional.

Oh man, I guess I shouldn't be surprised by another area of the code
that depends on dir.c but lacks any meaningful tests of its behavior,
violated the existing contract[1], and depends on the side effects of
other bugs -- bugs which don't cover all cases and thus causes it to
get different behavior depending on things that otherwise shouldn't
matter.  Behold, *before* my changes to dir.c:

$ git --version
2.23.0
$ find . -type f | grep -v .git
./empty
./untracked_repo/empty
./untracked_dir/empty
./world
$ git ls-files
world
$ git ls-files -o
empty
untracked_dir/empty
untracked_repo/

So, as you say, it wouldn't traverse into untracked_repo/.  Now we
start adding pathspecs:

$ git ls-files -o untracked_repo
untracked_repo/

As you mentioned, it won't traverse into it even when specified...

$ git ls-files -o untracked_repo/
untracked_repo/empty

...except that it does traverse into this directory if the user tab
completes the name or otherwise manually adds a trailing slash.
Weird, let's try multiple pathspecs:

$ git ls-files -o untracked_dir untracked_repo
untracked_dir/empty
untracked_repo/

$ git ls-files -o untracked_dir untracked_repo/
untracked_dir/empty
untracked_repo/

So it will traverse into the untracked_repo when specified as
'untracked_repo/' but not if there are more than one pathspec given?!?
 And it traverses into an untracked directory regardless of the
trailing slash?  <sarcasm>What a paragon of consistency...</sarcasm>


At least my changes in git-2.24.0 made the behavior consistent; it'll
always traverse into a directory that matches a given pathspec.  As
for whether that's desirable or not when the pathspec is a submodule,
I'm not certain.  My fixes to dir.c stalled out for over a year and a
half despite a few reports that they had fixed issues people were
continuing to report in the wild because the whole traversal logic was
such a mess and there were so many dependencies on existing behavior
built up with that mess that it was really hard to determine what
"correct" behavior was and others seemed to be unwilling to wade
through the muck to figure out where sanity might lie so they could
help give me pointers or opinions on what "correct" was[2].

But here are some possibilities that at least sound sane:

A) ls-files -o should traverse into untracked submodules.  This case
is easy; the code already does that.

B) ls-files -o should NOT traverse into untracked submodules AND
should not even report them.  If so, fix looks like this:

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index f069a028ce..f144d44d8b 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -301,6 +301,9 @@ static void show_files(struct repository *repo,
struct dir_struct *dir)
        int i;
        struct strbuf fullname = STRBUF_INIT;

+       if (!recurse_submodules)
+               dir->flags |= DIR_SKIP_NESTED_GIT;
+
        /* For cached/deleted files we don't need to even do the readdir */
        if (show_others || show_killed) {
                if (!show_others)

C) ls-files -o should NOT traverse into untracked submodules, but
should at least report their directory name.  If so, the fix is
probably to move the DIR_NO_GITLINKS if-block within
dir.c:treat_directory() to put it right next to the
DIR_SKIP_NESTED_GIT if-block (and maybe even partially combine the
two) so that it comes before some of the other code in that function.
Might have to be careful about checking for the presence of trailing
slashes.


It seems like it should be clear which one of these is "correct", but
I seem to be short a few brain cycles right now...either that, or
maybe my very rare use of submodules and nested repositories means I
just don't have enough context to answer.  Maybe it's obvious to
someone else...


Elijah

[1] https://lore.kernel.org/git/xmqqefjp6sko.fsf@gitster-ct.c.googlers.com/
[2] https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/

  reply	other threads:[~2019-12-04 17:30 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-03 22:08 [BUG?] ls-files -o now traverses nested repo when given multiple pathspecs Kyle Meyer
2019-12-04 17:30 ` Elijah Newren [this message]
2019-12-04 19:42   ` Junio C Hamano
2019-12-04 20:04   ` Kyle Meyer
2019-12-08  5:31     ` Kyle Meyer
2019-12-08  5:42       ` Elijah Newren
2019-12-08  7:46         ` Elijah Newren
2019-12-08 22:59           ` Kyle Meyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BFG3FkTkC=L1v97LUksndkOmCN8ZhNJh5eoNdquE7v9DA@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=kyle@kyleam.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).