Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
* [BUG?] ls-files -o now traverses nested repo when given multiple pathspecs
@ 2019-12-03 22:08 Kyle Meyer
  2019-12-04 17:30 ` Elijah Newren
  0 siblings, 1 reply; 4+ messages in thread
From: Kyle Meyer @ 2019-12-03 22:08 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

89a1f4aaf7 (dir: if our pathspec might match files under a dir, recurse
into it, 2019-09-17) introduced a change in behavior in terms of
traversing untracked nested repositories.  Say we have a repository that
contains a single untracked repository with untracked content:

    $ git init && git init a && touch a/x

Calling ls-files with the nested repository as the sole pathspec does
not recurse into that repository:

    $ git ls-files --other a
    a/

However, as of 89a1f4aaf7, adding an additional pathspec results in the
nested repository being traversed:

    $ git ls-files --other a foo
    a/
    a/x

Reading 89a1f4aaf7 and skimming the patch series and related thread [*],
I haven't found anything that makes me think this change in behavior was
intentional.

[*]: https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG?] ls-files -o now traverses nested repo when given multiple pathspecs
  2019-12-03 22:08 [BUG?] ls-files -o now traverses nested repo when given multiple pathspecs Kyle Meyer
@ 2019-12-04 17:30 ` Elijah Newren
  2019-12-04 19:42   ` Junio C Hamano
  2019-12-04 20:04   ` Kyle Meyer
  0 siblings, 2 replies; 4+ messages in thread
From: Elijah Newren @ 2019-12-04 17:30 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: Git Mailing List

Hi Kyle,

Thanks for the clear report and pointing out relevant commit(s) and
discussion.  Apologies in advance if I come off a bit ranty; it's
directed to dir.c and its API and not you.  It seems to be quite the
mess...

On Tue, Dec 3, 2019 at 2:08 PM Kyle Meyer <kyle@kyleam.com> wrote:
>
> 89a1f4aaf7 (dir: if our pathspec might match files under a dir, recurse
> into it, 2019-09-17) introduced a change in behavior in terms of
> traversing untracked nested repositories.  Say we have a repository that
> contains a single untracked repository with untracked content:
>
>     $ git init && git init a && touch a/x
>
> Calling ls-files with the nested repository as the sole pathspec does
> not recurse into that repository:
>
>     $ git ls-files --other a
>     a/
>
> However, as of 89a1f4aaf7, adding an additional pathspec results in the
> nested repository being traversed:
>
>     $ git ls-files --other a foo
>     a/
>     a/x
>
> Reading 89a1f4aaf7 and skimming the patch series and related thread [*],
> I haven't found anything that makes me think this change in behavior was
> intentional.

Oh man, I guess I shouldn't be surprised by another area of the code
that depends on dir.c but lacks any meaningful tests of its behavior,
violated the existing contract[1], and depends on the side effects of
other bugs -- bugs which don't cover all cases and thus causes it to
get different behavior depending on things that otherwise shouldn't
matter.  Behold, *before* my changes to dir.c:

$ git --version
2.23.0
$ find . -type f | grep -v .git
./empty
./untracked_repo/empty
./untracked_dir/empty
./world
$ git ls-files
world
$ git ls-files -o
empty
untracked_dir/empty
untracked_repo/

So, as you say, it wouldn't traverse into untracked_repo/.  Now we
start adding pathspecs:

$ git ls-files -o untracked_repo
untracked_repo/

As you mentioned, it won't traverse into it even when specified...

$ git ls-files -o untracked_repo/
untracked_repo/empty

...except that it does traverse into this directory if the user tab
completes the name or otherwise manually adds a trailing slash.
Weird, let's try multiple pathspecs:

$ git ls-files -o untracked_dir untracked_repo
untracked_dir/empty
untracked_repo/

$ git ls-files -o untracked_dir untracked_repo/
untracked_dir/empty
untracked_repo/

So it will traverse into the untracked_repo when specified as
'untracked_repo/' but not if there are more than one pathspec given?!?
 And it traverses into an untracked directory regardless of the
trailing slash?  <sarcasm>What a paragon of consistency...</sarcasm>


At least my changes in git-2.24.0 made the behavior consistent; it'll
always traverse into a directory that matches a given pathspec.  As
for whether that's desirable or not when the pathspec is a submodule,
I'm not certain.  My fixes to dir.c stalled out for over a year and a
half despite a few reports that they had fixed issues people were
continuing to report in the wild because the whole traversal logic was
such a mess and there were so many dependencies on existing behavior
built up with that mess that it was really hard to determine what
"correct" behavior was and others seemed to be unwilling to wade
through the muck to figure out where sanity might lie so they could
help give me pointers or opinions on what "correct" was[2].

But here are some possibilities that at least sound sane:

A) ls-files -o should traverse into untracked submodules.  This case
is easy; the code already does that.

B) ls-files -o should NOT traverse into untracked submodules AND
should not even report them.  If so, fix looks like this:

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index f069a028ce..f144d44d8b 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -301,6 +301,9 @@ static void show_files(struct repository *repo,
struct dir_struct *dir)
        int i;
        struct strbuf fullname = STRBUF_INIT;

+       if (!recurse_submodules)
+               dir->flags |= DIR_SKIP_NESTED_GIT;
+
        /* For cached/deleted files we don't need to even do the readdir */
        if (show_others || show_killed) {
                if (!show_others)

C) ls-files -o should NOT traverse into untracked submodules, but
should at least report their directory name.  If so, the fix is
probably to move the DIR_NO_GITLINKS if-block within
dir.c:treat_directory() to put it right next to the
DIR_SKIP_NESTED_GIT if-block (and maybe even partially combine the
two) so that it comes before some of the other code in that function.
Might have to be careful about checking for the presence of trailing
slashes.


It seems like it should be clear which one of these is "correct", but
I seem to be short a few brain cycles right now...either that, or
maybe my very rare use of submodules and nested repositories means I
just don't have enough context to answer.  Maybe it's obvious to
someone else...


Elijah

[1] https://lore.kernel.org/git/xmqqefjp6sko.fsf@gitster-ct.c.googlers.com/
[2] https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG?] ls-files -o now traverses nested repo when given multiple pathspecs
  2019-12-04 17:30 ` Elijah Newren
@ 2019-12-04 19:42   ` Junio C Hamano
  2019-12-04 20:04   ` Kyle Meyer
  1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2019-12-04 19:42 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Kyle Meyer, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> C) ls-files -o should NOT traverse into untracked submodules, but
> should at least report their directory name.

I think this probably is the most sensible.  

The top-level directory of a working tree of a repository other than
the current one may exist in the working tree.  It is very tempting
to declare that, unless we know it is a submodule that has the
current repository as its superproject, we should just treat it as a
normal subdirectory without *any* files tracked by the current
repository, which would mean that we pretend that the ".git/" in
that subdirectory is not any special---but that would obviously make
things quite messy (e.g. our "ls-files -o" would descend into the
other project's working tree and even in its .git/ directory), so we
need to special case a directory that has ".git/" in it, whether it
is a submodule for our current repository or not.

Thanks for working on this.  I agree that dir.c traversal has become
messier and messier, especially with its interaction with
submodules.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG?] ls-files -o now traverses nested repo when given multiple pathspecs
  2019-12-04 17:30 ` Elijah Newren
  2019-12-04 19:42   ` Junio C Hamano
@ 2019-12-04 20:04   ` Kyle Meyer
  1 sibling, 0 replies; 4+ messages in thread
From: Kyle Meyer @ 2019-12-04 20:04 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

Hi Elijah,

Thanks for the detailed and helpful reply.

Elijah Newren <newren@gmail.com> writes:

[...]

> As you mentioned, it won't traverse into it even when specified...
>
> $ git ls-files -o untracked_repo/
> untracked_repo/empty
>
> ...except that it does traverse into this directory if the user tab
> completes the name or otherwise manually adds a trailing slash.

Ah yes, I recall encountering what I think is the same underlying issue
when working on a previous series [0,1].  In the context of 'git add
untracked_repo/', there's been some discussion related to this trailing
slash discrepancy at

  https://lore.kernel.org/git/20180618111919.GA10085@book.hvoigt.net/T/#u

> Weird, let's try multiple pathspecs:
>
> $ git ls-files -o untracked_dir untracked_repo
> untracked_dir/empty
> untracked_repo/
>
> $ git ls-files -o untracked_dir untracked_repo/
> untracked_dir/empty
> untracked_repo/
>
> So it will traverse into the untracked_repo when specified as
> 'untracked_repo/' but not if there are more than one pathspec given?!?

Eh, indeed.

>  And it traverses into an untracked directory regardless of the
> trailing slash?  <sarcasm>What a paragon of consistency...</sarcasm>
>
>
> At least my changes in git-2.24.0 made the behavior consistent; it'll
> always traverse into a directory that matches a given pathspec.

I might be getting mixed up, but the changes in 2.24.0 did introduce
some inconsistent behavior (in the no trailing slash case) with respect
to giving a single pathspec and giving multiple pathspecs, no?  Using
your example:

    $ git --version
    git version 2.24.0
    $ git ls-files -o untracked_repo
    untracked_repo/
    $ git ls-files -o untracked_repo empty
    empty
    untracked_repo/
    untracked_repo/empty

> As for whether that's desirable or not when the pathspec is a submodule,
> I'm not certain. [...]
>
> But here are some possibilities that at least sound sane:
>
> A) ls-files -o should traverse into untracked submodules.  This case
> is easy; the code already does that.

Hmm, but as shown in the last example, ls-files -o doesn't traverse into
untracked submodules for the single pathspec case.

> B) ls-files -o should NOT traverse into untracked submodules AND
> should not even report them.
>
> C) ls-files -o should NOT traverse into untracked submodules, but
> should at least report their directory name.  If so, the fix is
> [...]

This behavior---which matches the no-slash behavior when no patchspec or
a single pathspec is given (on both v2.24.0 and previous version) as
well as when multiple pathspecs are given (before v2.24.0)---is the one
I prefer.  My biased reason for this preference is that in the DataLad
project we identify untracked nested repositories based on `ls-files -o
<untracked directory>...` reporting only the directory name for
repositories.  (Looking into one of our tests that fails with Git
v2.24.0 is how I ran into the reported change in behavior [2].)

That some external project relies on unintended ls-files output of
course doesn't mean that Git should keep reporting things that way, but
it does mean that I _hope_ that not traversing into untracked
repositories is the intended behavior and that traversing (either
because a slash is appended or as of 89a1f4aaf7 because multiple
pathspecs are given) is not intended :>


[0]: https://lore.kernel.org/git/20190409230737.26809-1-kyle@kyleam.com
[1]: https://lore.kernel.org/git/87bm1mbua4.fsf@kyleam.com/
[2]: https://github.com/datalad/datalad/issues/3890#issuecomment-561722194

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-03 22:08 [BUG?] ls-files -o now traverses nested repo when given multiple pathspecs Kyle Meyer
2019-12-04 17:30 ` Elijah Newren
2019-12-04 19:42   ` Junio C Hamano
2019-12-04 20:04   ` Kyle Meyer

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git