All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] extending pathspec support to submodules
@ 2016-09-14 23:57 Brandon Williams
  2016-09-15 11:57 ` Heiko Voigt
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2016-09-14 23:57 UTC (permalink / raw)
  To: git, pclouds; +Cc: Brandon Williams

---
I've been trying to think through how we could potentially add pathspec support
for --recurse-submodule options (for builtins like ls-files or grep down the
line).  This is something that could be useful if the user supply's a pathspec
that could match to a file in a submodule.  We could match the submodule to the
pathspec and then fork the process to recursively run the command on the
submodule which can be passed a modified pathspec.

For example with a pathspec 'sub/dir/a', where sub is a submodule in the root
directory of the supermodule's repo, we could match 'sub' to that spec and then
recursively call the git command with a pathspec of 'dir/a'.  The child process
would then have the responsibility of matching 'dir/a' to files in its repo.

Does this seem like a reasonable feature to add? And if so are how is my
initial approach at solving the problem?

One idea I had was to add a submodule match flag in order to perform special
matching just in the --recurse-submodules cases since we'll want somethings to
match here that wouldn't normally match.

@@ -283,6 +284,29 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
 			 item->nowildcard_len - prefix))
 		return MATCHED_FNMATCH;
 
+	/*
+	 * Preform some checks to see if "name" is a super set of the pathspec
+	 */
+	if (flags & DO_MATCH_SUBMODULE) {
+		struct strbuf buf = STRBUF_INIT;
+		strbuf_addstr(&buf, name);
+		strbuf_addch(&buf, '/');
+		/*
+		 * Check if the name is a prefix of the pathspec
+		 */
+		if ((item->match[namelen] == '/') &&
+		    !ps_strncmp(item, match, name, namelen))
+			return MATCHED_RECURSIVELY;
+		/*
+		 * Check if the name wildmatches to the pathspec
+		 */
+		if (!wildmatch(item->match, buf.buf,
+			       WM_PREFIX |
+			       (item->magic & PATHSPEC_ICASE ? WM_CASEFOLD : 0),
+			       NULL));
+		    return MATCHED_FNMATCH;
+	}
+
 	return 0;
 }
 
One of the main difficulties I was having is figuring out how wildmatching
should be applied in this case.  What I believe we want is the ability for the
whole name of the submodule to match a prefix of the pathspec pattern.  To do
this I was thinking of adding a flag to do prefix matching to the wildmatch
function like so: 


diff --git a/wildmatch.c b/wildmatch.c
index 57c8765..f1e1725 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -60,8 +60,12 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 	for ( ; (p_ch = *p) != '\0'; text++, p++) {
 		int matched, match_slash, negated;
 		uchar t_ch, prev_ch;
-		if ((t_ch = *text) == '\0' && p_ch != '*')
-			return WM_ABORT_ALL;
+		if ((t_ch = *text) == '\0' && p_ch != '*') {
+			if ((flags & WM_PREFIX) && (*(p-1) == '/'))
+				return WM_MATCH;
+			else
+				return WM_ABORT_ALL;
+		}
 		if ((flags & WM_CASEFOLD) && ISUPPER(t_ch))
 			t_ch = tolower(t_ch);
 		if ((flags & WM_CASEFOLD) && ISUPPER(p_ch))
diff --git a/wildmatch.h b/wildmatch.h
index 4090c8f..490db51 100644
--- a/wildmatch.h
+++ b/wildmatch.h
@@ -3,6 +3,7 @@
 
 #define WM_CASEFOLD 1
 #define WM_PATHNAME 2
+#define WM_PREFIX 4
 
 #define WM_ABORT_MALFORMED 2
 #define WM_NOMATCH 1
-- 

Any comments or thoughts on this would be appreciated.

Thanks,
Brandon

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC] extending pathspec support to submodules
  2016-09-14 23:57 [RFC] extending pathspec support to submodules Brandon Williams
@ 2016-09-15 11:57 ` Heiko Voigt
  2016-09-15 15:26   ` Brandon Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Heiko Voigt @ 2016-09-15 11:57 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, pclouds, Jens Lehmann, Stefan Beller

Hi,

On Wed, Sep 14, 2016 at 04:57:53PM -0700, Brandon Williams wrote:
> ---
> I've been trying to think through how we could potentially add pathspec support
> for --recurse-submodule options (for builtins like ls-files or grep down the
> line).  This is something that could be useful if the user supply's a pathspec
> that could match to a file in a submodule.  We could match the submodule to the
> pathspec and then fork the process to recursively run the command on the
> submodule which can be passed a modified pathspec.
> 
> For example with a pathspec 'sub/dir/a', where sub is a submodule in the root
> directory of the supermodule's repo, we could match 'sub' to that spec and then
> recursively call the git command with a pathspec of 'dir/a'.  The child process
> would then have the responsibility of matching 'dir/a' to files in its repo.
> 
> Does this seem like a reasonable feature to add? And if so are how is my
> initial approach at solving the problem?

The problem when you do that is that the child is not aware that it is
actually run as a submodule process. E.g.

   git grep --recurse-submodules foobar -- sub/dir/a

would report back matches in 'dir/a' instead of 'sub/dir/a'. From the
users perspective this will be confusing since she started the grep in
the superproject.

So what I would suggest is that we make the childs aware of the fact
that they are called from a superproject by passing a prefix when
calling it. E.g. like this

   git grep --submodule-prefix=sub foobar -- sub/dir/a

Whether we pass the full or stripped path is now a matter of taste or
ease of implementation, since we could either preprend or strip it in
the child. But the important difference is that we have information
about the original path.

Another approach could, of course, be to omit passing the prefix and
parse the output of the child and try to substitute, paths but that
seems quite error prone to me.

Cheers Heiko

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] extending pathspec support to submodules
  2016-09-15 11:57 ` Heiko Voigt
@ 2016-09-15 15:26   ` Brandon Williams
  2016-09-15 22:08     ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2016-09-15 15:26 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: git, pclouds, Jens Lehmann, Stefan Beller

On Thu, Sep 15, 2016 at 4:57 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
>
> The problem when you do that is that the child is not aware that it is
> actually run as a submodule process. E.g.
>
>    git grep --recurse-submodules foobar -- sub/dir/a
>
> would report back matches in 'dir/a' instead of 'sub/dir/a'. From the
> users perspective this will be confusing since she started the grep in
> the superproject.
>
> So what I would suggest is that we make the childs aware of the fact
> that they are called from a superproject by passing a prefix when
> calling it. E.g. like this
>
>    git grep --submodule-prefix=sub foobar -- sub/dir/a
>
> Whether we pass the full or stripped path is now a matter of taste or
> ease of implementation, since we could either preprend or strip it in
> the child. But the important difference is that we have information
> about the original path.
>
> Another approach could, of course, be to omit passing the prefix and
> parse the output of the child and try to substitute, paths but that
> seems quite error prone to me.
>
> Cheers Heiko

You're right that seems like the best course of action and it already falls
inline with what I did with a first patch to ls-files to support submodules.
In that patch I did exactly as you suggest and pass in the prefix to the
submodule and make the child responsible for prepending the prefix to all of
its output.  This way we can simply pass through the whole pathspec (as apposed
to my original idea of stripping the prefix off the pathspec prior to passing
it to the child...which can get complicated with wild characters) to the
childprocess and when checking if a file matches the pathspec we can check if
the prefix + file path matches.

-Brandon

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] extending pathspec support to submodules
  2016-09-15 15:26   ` Brandon Williams
@ 2016-09-15 22:08     ` Junio C Hamano
  2016-09-15 22:28       ` Stefan Beller
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2016-09-15 22:08 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Heiko Voigt, git, pclouds, Jens Lehmann, Stefan Beller

Brandon Williams <bmwill@google.com> writes:

> You're right that seems like the best course of action and it already falls
> inline with what I did with a first patch to ls-files to support submodules.
> In that patch I did exactly as you suggest and pass in the prefix to the
> submodule and make the child responsible for prepending the prefix to all of
> its output.  This way we can simply pass through the whole pathspec (as apposed
> to my original idea of stripping the prefix off the pathspec prior to passing
> it to the child...which can get complicated with wild characters) to the
> childprocess and when checking if a file matches the pathspec we can check if
> the prefix + file path matches.

That's brilliant.  A few observations.

 * With that change to tell the command that is spawned in a
   submodule directory where the submodule repository is in the
   context of the top-level superproject _and_ require it to take a
   pathspec as relative to the top-level superproject, you no longer
   worry about having to find where to cut the pathspec given at the
   top-level to adjust it for the submodule's context.  That may
   simplify things.

 * Your program that runs in the top-level superproject still needs
   to be able to say "this pathspec from the top cannot possibly
   match anything in the submodule, so let's not even bother
   descending into it".

 * Earlier while reviewing "ls-files" recursion, I suggested (and
   you took) --output-path-prefix as the option name, because it was
   meant to be "when you output any path, prefix this string".  But
   the suggested name is suboptimal, as it is no longer an option
   that is only about "output".  A command that runs in a submodule
   would:

   - enumerate paths in the context of the submodule repository,
   - prepend the "prefix" to these paths,
   - filter by applying the full-tree pathspec, and
   - work on the surviving paths after filtering.

   When the last step, "work on", involves just "printing", the
   whole path (with "prefix") is sent to the output.  If it involves
   some operation relative to the submodule repository (e.g. seeing
   if it is in the index), the "prefix" may have to be stripped
   while the operation is carried out.

   So we may have to rethink what this option name should be.  "You
   are running in a repository that is used as a submodule in a
   larger context, which has the submodule at this path" is what the
   option tells the command; if any existing command already has
   such an option, we should use it.  If we are inventing one,
   perhaps "--submodule-path" (I didn't check if there are existing
   options that sound similar to it and mean completely different
   things, in which case that name is not usable)?

Thanks.

   

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] extending pathspec support to submodules
  2016-09-15 22:08     ` Junio C Hamano
@ 2016-09-15 22:28       ` Stefan Beller
  2016-09-16  9:34         ` Heiko Voigt
  0 siblings, 1 reply; 25+ messages in thread
From: Stefan Beller @ 2016-09-15 22:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Brandon Williams, Heiko Voigt, git, Duy Nguyen, Jens Lehmann

On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Brandon Williams <bmwill@google.com> writes:
>
>> You're right that seems like the best course of action and it already falls
>> inline with what I did with a first patch to ls-files to support submodules.
>> In that patch I did exactly as you suggest and pass in the prefix to the
>> submodule and make the child responsible for prepending the prefix to all of
>> its output.  This way we can simply pass through the whole pathspec (as apposed
>> to my original idea of stripping the prefix off the pathspec prior to passing
>> it to the child...which can get complicated with wild characters) to the
>> childprocess and when checking if a file matches the pathspec we can check if
>> the prefix + file path matches.
>
> That's brilliant.  A few observations.
>
>  * With that change to tell the command that is spawned in a
>    submodule directory where the submodule repository is in the
>    context of the top-level superproject _and_ require it to take a
>    pathspec as relative to the top-level superproject, you no longer
>    worry about having to find where to cut the pathspec given at the
>    top-level to adjust it for the submodule's context.  That may
>    simplify things.

I wonder how this plays together with the prefix in the superproject, e.g.

    cd super/unrelated-path
    # when invoking a git command the internal prefix is "unrelated-path/"
    git ls-files ../submodule-*
    # a submodule in submodule-A would be run in  submodule-A
    # with a superproject prefix of super/ ? but additionally we nned
to know we're
    # not at the root of the superproject.

>    So we may have to rethink what this option name should be.  "You
>    are running in a repository that is used as a submodule in a
>    larger context, which has the submodule at this path" is what the
>    option tells the command; if any existing command already has
>    such an option, we should use it.  If we are inventing one,
>    perhaps "--submodule-path" (I didn't check if there are existing
>    options that sound similar to it and mean completely different
>    things, in which case that name is not usable)?

Would it make sense to add the '--submodule-path' to a more generic
part of the code? It's not just ls-files/grep that have to solve exactly this
problem. Up to now we just did not go for those commands, though.

Thanks

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] extending pathspec support to submodules
  2016-09-15 22:28       ` Stefan Beller
@ 2016-09-16  9:34         ` Heiko Voigt
  2016-09-16 18:40           ` Brandon Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Heiko Voigt @ 2016-09-16  9:34 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Brandon Williams, git, Duy Nguyen, Jens Lehmann

Hi,

On Thu, Sep 15, 2016 at 03:28:21PM -0700, Stefan Beller wrote:
> On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano <gitster@pobox.com> wrote:
> > Brandon Williams <bmwill@google.com> writes:
> >
> >> You're right that seems like the best course of action and it already falls
> >> inline with what I did with a first patch to ls-files to support submodules.
> >> In that patch I did exactly as you suggest and pass in the prefix to the
> >> submodule and make the child responsible for prepending the prefix to all of
> >> its output.  This way we can simply pass through the whole pathspec (as apposed
> >> to my original idea of stripping the prefix off the pathspec prior to passing
> >> it to the child...which can get complicated with wild characters) to the
> >> childprocess and when checking if a file matches the pathspec we can check if
> >> the prefix + file path matches.
> >
> > That's brilliant.  A few observations.
> >
> >  * With that change to tell the command that is spawned in a
> >    submodule directory where the submodule repository is in the
> >    context of the top-level superproject _and_ require it to take a
> >    pathspec as relative to the top-level superproject, you no longer
> >    worry about having to find where to cut the pathspec given at the
> >    top-level to adjust it for the submodule's context.  That may
> >    simplify things.
> 
> I wonder how this plays together with the prefix in the superproject, e.g.
> 
>     cd super/unrelated-path
>     # when invoking a git command the internal prefix is "unrelated-path/"
>     git ls-files ../submodule-*
>     # a submodule in submodule-A would be run in  submodule-A
>     # with a superproject prefix of super/ ? but additionally we nned
> to know we're
>     # not at the root of the superproject.

Do we need to know that? The internal prefix is internal to each
repository and can be treated as such. I would expect that the prefix is
only prefixed when needed. E.g. when we display output to the user,
match files, ...

How about "../submodule-A" as the submodule prefix in the situation you
describe? The wildcard would be resolved by the superproject since the
directory is still in its domain.

I would think of the submodule prefix as the path relative to the
command that started everything. E.g. if we have a tree like this:

        *--subA
       /
super *--subB--subsubB
       \
        *--dirC

where subA, subB and subsubB are submodules and dirC is just a
directory inside super.

We would get the following prefixes when issuing a command in dirC that
has a pathspec for subsubB:

  subB: ../subB
  subsubB: ../subB/subsubB

An interesting case is when we issue a command in subA:

  super:   ..
  subB:    ../subB
  subsubB: ../subB/subsubB

A rule for the prefix option could be: Always specified when crossing a
repository boundary with the pathspec (including upwards).

I have not completely thought this through though so just take this as
some food for thought. Since I am not sure what Junio's rationale behind
making the prefix relative to the toplevel superproject was, but I guess
finding it could be a challenge in some situations. I.e. is the
repository in home directory tracking all the dot-files really the
superproject or was it that other one I found before?

> >    So we may have to rethink what this option name should be.  "You
> >    are running in a repository that is used as a submodule in a
> >    larger context, which has the submodule at this path" is what the
> >    option tells the command; if any existing command already has
> >    such an option, we should use it.  If we are inventing one,
> >    perhaps "--submodule-path" (I didn't check if there are existing
> >    options that sound similar to it and mean completely different
> >    things, in which case that name is not usable)?
> 
> Would it make sense to add the '--submodule-path' to a more generic
> part of the code? It's not just ls-files/grep that have to solve exactly this
> problem. Up to now we just did not go for those commands, though.

Yes I think so, since it should also handle starting from a submodule
with a pathspec to the superproject or other submodule. In case we
go with my above suggestion I would suggest a more generic name since
the option could also be passed to processes handling the superproject.
E.g. something like --module-prefix or --repository-prefix comes to my
mind, not checked though.

Cheers Heiko

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] extending pathspec support to submodules
  2016-09-16  9:34         ` Heiko Voigt
@ 2016-09-16 18:40           ` Brandon Williams
  2016-09-17  0:59             ` [PATCH] ls-files: add pathspec matching for submodules Brandon Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2016-09-16 18:40 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: Stefan Beller, Junio C Hamano, git, Duy Nguyen, Jens Lehmann

On Thu, Sep 15, 2016 at 3:08 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
>  * Your program that runs in the top-level superproject still needs
>    to be able to say "this pathspec from the top cannot possibly
>    match anything in the submodule, so let's not even bother
>    descending into it".
>

Yes, we would need to first check if the submodule is a prefix match to the
pathspec.  ie a submodule 'sub' would need to match the pathspec 'sub/somedir'
or '*.txt' but not the pathspecs 'subdirectory' or 'otherdir'

> > >    So we may have to rethink what this option name should be.  "You
> > >    are running in a repository that is used as a submodule in a
> > >    larger context, which has the submodule at this path" is what the
> > >    option tells the command; if any existing command already has
> > >    such an option, we should use it.  If we are inventing one,
> > >    perhaps "--submodule-path" (I didn't check if there are existing
> > >    options that sound similar to it and mean completely different
> > >    things, in which case that name is not usable)?
> >
> > Would it make sense to add the '--submodule-path' to a more generic
> > part of the code? It's not just ls-files/grep that have to solve exactly this
> > problem. Up to now we just did not go for those commands, though.
>
> Yes I think so, since it should also handle starting from a submodule
> with a pathspec to the superproject or other submodule. In case we
> go with my above suggestion I would suggest a more generic name since
> the option could also be passed to processes handling the superproject.
> E.g. something like --module-prefix or --repository-prefix comes to my
> mind, not checked though.

Yeah we may want to come up with a more descriptive option name now which can
be generally applied, especially if we are going to continue adding submodule
support for other commands.

-Brandon

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] ls-files: add pathspec matching for submodules
  2016-09-16 18:40           ` Brandon Williams
@ 2016-09-17  0:59             ` Brandon Williams
  2016-09-17  3:46               ` Junio C Hamano
  2016-09-19 18:18               ` [PATCH] " Junio C Hamano
  0 siblings, 2 replies; 25+ messages in thread
From: Brandon Williams @ 2016-09-17  0:59 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Pathspecs can be a bit tricky when trying to apply them to submodules.
This change permits the pathspec logic to perform a prefix match against
submodules since a pathspec could refer to a file inside of a submodule.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Documentation/git-ls-files.txt         |   4 +-
 builtin/ls-files.c                     | 143 +++++++++++++++++++--------------
 dir.c                                  |  62 +++++++++++++-
 dir.h                                  |   4 +
 t/t3007-ls-files-recurse-submodules.sh |  66 +++++++++++++--
 5 files changed, 208 insertions(+), 71 deletions(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index a623ebf..09e4449 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -19,7 +19,7 @@ SYNOPSIS
 		[--exclude-standard]
 		[--error-unmatch] [--with-tree=<tree-ish>]
 		[--full-name] [--recurse-submodules]
-		[--output-path-prefix=<path>]
+		[--submodule-prefix=<path>]
 		[--abbrev] [--] [<file>...]
 
 DESCRIPTION
@@ -143,7 +143,7 @@ a space) at the start of each line:
 	Recursively calls ls-files on each submodule in the repository.
 	Currently there is only support for the --cached mode.
 
---output-path-prefix=<path>::
+--submodule-prefix=<path>::
 	Prepend the provided path to the output of each file
 
 --abbrev[=<n>]::
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 687e475..dc1e076 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -29,7 +29,7 @@ static int show_valid_bit;
 static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
-static const char *output_path_prefix;
+static const char *submodule_prefix;
 static int recurse_submodules;
 
 static const char *prefix;
@@ -78,9 +78,9 @@ static void write_name(const char *name)
 	 * churn.
 	 */
 	static struct strbuf full_name = STRBUF_INIT;
-	if (output_path_prefix && *output_path_prefix) {
+	if (submodule_prefix && *submodule_prefix) {
 		strbuf_reset(&full_name);
-		strbuf_addstr(&full_name, output_path_prefix);
+		strbuf_addstr(&full_name, submodule_prefix);
 		strbuf_addstr(&full_name, name);
 		name = full_name.buf;
 	}
@@ -177,12 +177,30 @@ static void show_gitlink(const struct cache_entry *ce)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status;
+	int i;
 
 	argv_array_push(&cp.args, "ls-files");
 	argv_array_push(&cp.args, "--recurse-submodules");
-	argv_array_pushf(&cp.args, "--output-path-prefix=%s%s/",
-			 output_path_prefix ? output_path_prefix : "",
+	argv_array_pushf(&cp.args, "--submodule-prefix=%s%s/",
+			 submodule_prefix ? submodule_prefix : "",
 			 ce->name);
+	/* add options */
+	if (show_eol)
+		argv_array_push(&cp.args, "--eol");
+	if (show_valid_bit)
+		argv_array_push(&cp.args, "-v");
+	if (show_stage)
+		argv_array_push(&cp.args, "--stage");
+	if (show_cached)
+		argv_array_push(&cp.args, "--cached");
+	if (debug_mode)
+		argv_array_push(&cp.args, "--debug");
+
+	/* Add pathspec args */
+	argv_array_push(&cp.args, "--");
+	for (i = 0; i < pathspec.nr; ++i)
+		argv_array_push(&cp.args, pathspec.items[i].original);
+
 	cp.git_cmd = 1;
 	cp.dir = ce->name;
 	status = run_command(&cp);
@@ -192,57 +210,63 @@ static void show_gitlink(const struct cache_entry *ce)
 
 static void show_ce_entry(const char *tag, const struct cache_entry *ce)
 {
+	struct strbuf name = STRBUF_INIT;
 	int len = max_prefix_len;
+	if (submodule_prefix)
+		strbuf_addstr(&name, submodule_prefix);
+	strbuf_addstr(&name, ce->name);
 
 	if (len >= ce_namelen(ce))
-		die("git ls-files: internal error - cache entry not superset of prefix");
+		die("git ls-files: internal error - cache entry not "
+		    "superset of prefix");
 
-	if (!match_pathspec(&pathspec, ce->name, ce_namelen(ce),
-			    len, ps_matched,
-			    S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
-		return;
-	if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
+	if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+	    submodule_path_match(&pathspec, name.buf, ps_matched)) {
 		show_gitlink(ce);
-		return;
-	}
+	} else if (match_pathspec(&pathspec, name.buf, name.len,
+				  len, ps_matched,
+				  S_ISDIR(ce->ce_mode) ||
+				  S_ISGITLINK(ce->ce_mode))) {
+		if (tag && *tag && show_valid_bit &&
+		    (ce->ce_flags & CE_VALID)) {
+			static char alttag[4];
+			memcpy(alttag, tag, 3);
+			if (isalpha(tag[0]))
+				alttag[0] = tolower(tag[0]);
+			else if (tag[0] == '?')
+				alttag[0] = '!';
+			else {
+				alttag[0] = 'v';
+				alttag[1] = tag[0];
+				alttag[2] = ' ';
+				alttag[3] = 0;
+			}
+			tag = alttag;
+		}
 
-	if (tag && *tag && show_valid_bit &&
-	    (ce->ce_flags & CE_VALID)) {
-		static char alttag[4];
-		memcpy(alttag, tag, 3);
-		if (isalpha(tag[0]))
-			alttag[0] = tolower(tag[0]);
-		else if (tag[0] == '?')
-			alttag[0] = '!';
-		else {
-			alttag[0] = 'v';
-			alttag[1] = tag[0];
-			alttag[2] = ' ';
-			alttag[3] = 0;
+		if (!show_stage) {
+			fputs(tag, stdout);
+		} else {
+			printf("%s%06o %s %d\t",
+			       tag,
+			       ce->ce_mode,
+			       find_unique_abbrev(ce->sha1,abbrev),
+			       ce_stage(ce));
+		}
+		write_eolinfo(ce, ce->name);
+		write_name(ce->name);
+		if (debug_mode) {
+			const struct stat_data *sd = &ce->ce_stat_data;
+
+			printf("  ctime: %d:%d\n", sd->sd_ctime.sec, sd->sd_ctime.nsec);
+			printf("  mtime: %d:%d\n", sd->sd_mtime.sec, sd->sd_mtime.nsec);
+			printf("  dev: %d\tino: %d\n", sd->sd_dev, sd->sd_ino);
+			printf("  uid: %d\tgid: %d\n", sd->sd_uid, sd->sd_gid);
+			printf("  size: %d\tflags: %x\n", sd->sd_size, ce->ce_flags);
 		}
-		tag = alttag;
 	}
 
-	if (!show_stage) {
-		fputs(tag, stdout);
-	} else {
-		printf("%s%06o %s %d\t",
-		       tag,
-		       ce->ce_mode,
-		       find_unique_abbrev(ce->sha1,abbrev),
-		       ce_stage(ce));
-	}
-	write_eolinfo(ce, ce->name);
-	write_name(ce->name);
-	if (debug_mode) {
-		const struct stat_data *sd = &ce->ce_stat_data;
-
-		printf("  ctime: %d:%d\n", sd->sd_ctime.sec, sd->sd_ctime.nsec);
-		printf("  mtime: %d:%d\n", sd->sd_mtime.sec, sd->sd_mtime.nsec);
-		printf("  dev: %d\tino: %d\n", sd->sd_dev, sd->sd_ino);
-		printf("  uid: %d\tgid: %d\n", sd->sd_uid, sd->sd_gid);
-		printf("  size: %d\tflags: %x\n", sd->sd_size, ce->ce_flags);
-	}
+	strbuf_release(&name);
 }
 
 static void show_ru_info(void)
@@ -510,7 +534,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		{ OPTION_SET_INT, 0, "full-name", &prefix_len, NULL,
 			N_("make the output relative to the project top directory"),
 			PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL },
-		OPT_STRING(0, "output-path-prefix", &output_path_prefix,
+		OPT_STRING(0, "submodule-prefix", &submodule_prefix,
 			N_("path"), N_("prepend <path> to each file")),
 		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
 			N_("recurse through submodules")),
@@ -566,27 +590,28 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		setup_work_tree();
 
 	if (recurse_submodules &&
-	    (show_stage || show_deleted || show_others || show_unmerged ||
-	     show_killed || show_modified || show_resolve_undo ||
-	     show_valid_bit || show_tag || show_eol))
-		die("ls-files --recurse-submodules can only be used in "
-		    "--cached mode");
+	    (show_deleted || show_others || show_unmerged ||
+	     show_killed || show_modified || show_resolve_undo))
+		die("ls-files --recurse-submodules unsupported mode");
 
 	if (recurse_submodules && error_unmatch)
 		die("ls-files --recurse-submodules does not support "
 		    "--error-unmatch");
 
-	if (recurse_submodules && argc)
-		die("ls-files --recurse-submodules does not support path "
-		    "arguments");
-
 	parse_pathspec(&pathspec, 0,
 		       PATHSPEC_PREFER_CWD |
 		       PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
 		       prefix, argv);
 
-	/* Find common prefix for all pathspec's */
-	max_prefix = common_prefix(&pathspec);
+	/*
+	 * Find common prefix for all pathspec's
+	 * This is used as a performance optimization which violates correctness
+	 * in the recurse_submodules mode
+	 */
+	if (recurse_submodules)
+		max_prefix = NULL;
+	else
+		max_prefix = common_prefix(&pathspec);
 	max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
 
 	/* Treat unmatching pathspec elements as errors */
diff --git a/dir.c b/dir.c
index 0ea235f..630dc7a 100644
--- a/dir.c
+++ b/dir.c
@@ -63,6 +63,30 @@ int fspathncmp(const char *a, const char *b, size_t count)
 	return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
 }
 
+static int prefix_fnmatch(const struct pathspec_item *item,
+		   const char *pattern, const char *string,
+		   int prefix)
+{
+	if (prefix > 0) {
+		if (ps_strncmp(item, pattern, string, prefix))
+			return WM_NOMATCH;
+		pattern += prefix;
+		string += prefix;
+	}
+
+	if (item->flags & PATHSPEC_ONESTAR) {
+		return WM_MATCH;
+	} else if (item->magic & PATHSPEC_GLOB) {
+		return wildmatch(pattern, string,
+				 WM_PATHNAME |
+				 (item->magic & PATHSPEC_ICASE ?
+				  WM_CASEFOLD : 0),
+				 NULL);
+	}
+
+	return WM_NOMATCH;
+}
+
 int git_fnmatch(const struct pathspec_item *item,
 		const char *pattern, const char *string,
 		int prefix)
@@ -207,8 +231,9 @@ int within_depth(const char *name, int namelen,
 	return 1;
 }
 
-#define DO_MATCH_EXCLUDE   1
-#define DO_MATCH_DIRECTORY 2
+#define DO_MATCH_EXCLUDE   (1<<0)
+#define DO_MATCH_DIRECTORY (1<<1)
+#define DO_MATCH_SUBMODULE (1<<2)
 
 /*
  * Does 'match' match the given name?
@@ -283,6 +308,24 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
 			 item->nowildcard_len - prefix))
 		return MATCHED_FNMATCH;
 
+	/* Perform checks to see if "name" is a super set of the pathspec */
+	if (flags & DO_MATCH_SUBMODULE) {
+		int matched = 0;
+
+		/* Check if the name is a literal prefix of the pathspec */
+		if ((item->match[namelen] == '/') &&
+		    !ps_strncmp(item, match, name, namelen)) {
+			matched = MATCHED_RECURSIVELY;
+		/* Check if the name wildmatches to the pathspec */
+		} else if (item->nowildcard_len < item->len &&
+			   !prefix_fnmatch(item, match, name,
+					   item->nowildcard_len - prefix)) {
+			matched = MATCHED_FNMATCH;
+		}
+
+		return matched;
+	}
+
 	return 0;
 }
 
@@ -386,6 +429,21 @@ int match_pathspec(const struct pathspec *ps,
 	return negative ? 0 : positive;
 }
 
+/**
+ * Check if a submodule is a superset of the pathspec
+ */
+int submodule_path_match(const struct pathspec *ps,
+			 const char *submodule_name,
+			 char *seen)
+{
+	int matched = do_match_pathspec(ps, submodule_name,
+					strlen(submodule_name),
+					0, seen,
+					DO_MATCH_DIRECTORY |
+					DO_MATCH_SUBMODULE);
+	return matched;
+}
+
 int report_path_error(const char *ps_matched,
 		      const struct pathspec *pathspec,
 		      const char *prefix)
diff --git a/dir.h b/dir.h
index da1a858..97c83bb 100644
--- a/dir.h
+++ b/dir.h
@@ -304,6 +304,10 @@ extern int git_fnmatch(const struct pathspec_item *item,
 		       const char *pattern, const char *string,
 		       int prefix);
 
+extern int submodule_path_match(const struct pathspec *ps,
+				const char *submodule_name,
+				char *seen);
+
 static inline int ce_path_match(const struct cache_entry *ce,
 				const struct pathspec *pathspec,
 				char *seen)
diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
index caf3815..977f85c 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -69,9 +69,63 @@ test_expect_success 'ls-files recurses more than 1 level' '
 	test_cmp expect actual
 '
 
-test_expect_success '--recurse-submodules does not support using path arguments' '
-	test_must_fail git ls-files --recurse-submodules b 2>actual &&
-	test_i18ngrep "does not support path arguments" actual
+test_expect_success '--recurse-submodules and pathspecs setup' '
+	echo e >submodule/subsub/e.txt &&
+	git -C submodule/subsub add e.txt &&
+	git -C submodule/subsub commit -m "adding e.txt" &&
+	echo f >submodule/f.TXT &&
+	echo g >submodule/g.txt &&
+	git -C submodule add f.TXT g.txt &&
+	git -C submodule commit -m "add f and g" &&
+	echo h >h.txt &&
+	git add h.txt &&
+	git commit -m "add h" &&
+
+	cat >expect <<-\EOF &&
+	.gitmodules
+	a
+	b/b
+	h.txt
+	submodule/.gitmodules
+	submodule/c
+	submodule/f.TXT
+	submodule/g.txt
+	submodule/subsub/d
+	submodule/subsub/e.txt
+	EOF
+
+	git ls-files --recurse-submodules "*" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--recurse-submodules and pathspecs' '
+	cat >expect <<-\EOF &&
+	h.txt
+	submodule/g.txt
+	submodule/subsub/e.txt
+	EOF
+
+	git ls-files --recurse-submodules "*.txt" >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	h.txt
+	submodule/f.TXT
+	submodule/g.txt
+	submodule/subsub/e.txt
+	EOF
+
+	git ls-files --recurse-submodules ":(icase)*.txt" >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	h.txt
+	submodule/f.TXT
+	submodule/g.txt
+	EOF
+
+	git ls-files --recurse-submodules ":(icase)*.txt" ":(exclude)submodule/subsub/*" >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success '--recurse-submodules does not support --error-unmatch' '
@@ -82,18 +136,14 @@ test_expect_success '--recurse-submodules does not support --error-unmatch' '
 test_incompatible_with_recurse_submodules () {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
 		test_must_fail git ls-files --recurse-submodules $1 2>actual &&
-		test_i18ngrep 'can only be used in --cached mode' actual
+		test_i18ngrep 'unsupported mode' actual
 	"
 }
 
-test_incompatible_with_recurse_submodules -v
-test_incompatible_with_recurse_submodules -t
 test_incompatible_with_recurse_submodules --deleted
 test_incompatible_with_recurse_submodules --modified
 test_incompatible_with_recurse_submodules --others
-test_incompatible_with_recurse_submodules --stage
 test_incompatible_with_recurse_submodules --killed
 test_incompatible_with_recurse_submodules --unmerged
-test_incompatible_with_recurse_submodules --eol
 
 test_done
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] ls-files: add pathspec matching for submodules
  2016-09-17  0:59             ` [PATCH] ls-files: add pathspec matching for submodules Brandon Williams
@ 2016-09-17  3:46               ` Junio C Hamano
  2016-09-18 18:40                 ` Brandon Williams
  2016-09-19 18:18               ` [PATCH] " Junio C Hamano
  1 sibling, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2016-09-17  3:46 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, Heiko Voigt, Nguyễn Thái Ngọc Duy, Stefan Beller

Brandon Williams <bmwill@google.com> writes:

> ...
>  		[--full-name] [--recurse-submodules]
> -		[--output-path-prefix=<path>]
> +		[--submodule-prefix=<path>]
>  		[--abbrev] [--] [<file>...]
>  
> ---output-path-prefix=<path>::
> +--submodule-prefix=<path>::
>  	Prepend the provided path to the output of each file
> ...
>  static int show_eol;
> -static const char *output_path_prefix;
> +static const char *submodule_prefix;
>  static int recurse_submodules;
> ...
>  	static struct strbuf full_name = STRBUF_INIT;
> -	if (output_path_prefix && *output_path_prefix) {
> +	if (submodule_prefix && *submodule_prefix) {
>  		strbuf_reset(&full_name);
> -		strbuf_addstr(&full_name, output_path_prefix);
> +		strbuf_addstr(&full_name, submodule_prefix);
>  		strbuf_addstr(&full_name, name);

As the previous one that used a wrong (sorry) argument is not even
in 'next' yet, let's pretend that it never happened.  It is OK to
still keep it and this patch as two separate steps, i.e. a topic
with two patches in it.

> +	/* Add pathspec args */
> +	argv_array_push(&cp.args, "--");
> +	for (i = 0; i < pathspec.nr; ++i)
> +		argv_array_push(&cp.args, pathspec.items[i].original);

OK, so as discussed previously with Heiko and Stefan, the idea is to

 - pass the original pathspec as-is,

 - when --submodule-prefix is given, a path discovered in a
   submodule repository is first prefixed with that string before
   getting checked to see if it matches the original pathspec.

And this loop is about relaying the original pathspec.

> @@ -192,57 +210,63 @@ static void show_gitlink(const struct cache_entry *ce)
>  
>  static void show_ce_entry(const char *tag, const struct cache_entry *ce)
>  {
> +	struct strbuf name = STRBUF_INIT;
>  	int len = max_prefix_len;
> +	if (submodule_prefix)
> +		strbuf_addstr(&name, submodule_prefix);
> +	strbuf_addstr(&name, ce->name);
>  
>  	if (len >= ce_namelen(ce))
> -		die("git ls-files: internal error - cache entry not superset of prefix");
> +		die("git ls-files: internal error - cache entry not "
> +		    "superset of prefix");

This is not such a great thing to do.  Upon a bug report, we can no
longer do

	git grep 'cache entry not superset'

to see where the error message is coming from.

> -	if (!match_pathspec(&pathspec, ce->name, ce_namelen(ce),
> -			    len, ps_matched,
> -			    S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
> -		return;
> -	if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
> +	if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> +	    submodule_path_match(&pathspec, name.buf, ps_matched)) {
>  		show_gitlink(ce);
> -		return;
> -	}
> +	} else if (match_pathspec(&pathspec, name.buf, name.len,
> +				  len, ps_matched,
> +				  S_ISDIR(ce->ce_mode) ||
> +				  S_ISGITLINK(ce->ce_mode))) {
> +		if (tag && *tag && show_valid_bit &&
> + ...

Argh.  If we had a preparatory clean-up step, would it have helped
to avoid this big re-indentation that makes the patch harder to read
than necessary, I wonder?

Another way would have been to "goto" from the end of this block

> +	if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> +	    submodule_path_match(&pathspec, name.buf, ps_matched)) {

where we used to "return" out to the central clean-up location, i.e.
here.

> +	strbuf_release(&name);
>  }


>  	parse_pathspec(&pathspec, 0,
>  		       PATHSPEC_PREFER_CWD |
>  		       PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>  		       prefix, argv);
>  
> -	/* Find common prefix for all pathspec's */
> -	max_prefix = common_prefix(&pathspec);
> +	/*
> +	 * Find common prefix for all pathspec's
> +	 * This is used as a performance optimization which violates correctness
> +	 * in the recurse_submodules mode
> +	 */

The two new lines phrase it overly negatively and also misleading.
I thought you were saying "We do this as optimization anyway; damn
the correctness in the submodule case!" in my first reading before
reading the statements the comment talks about.  "This optimization
unfortunately cannot be done when recursing into submodules" would
have been better.

> +	if (recurse_submodules)
> +		max_prefix = NULL;
> +	else
> +		max_prefix = common_prefix(&pathspec);
>  	max_prefix_len = max_prefix ? strlen(max_prefix) : 0;

> diff --git a/dir.c b/dir.c
> index 0ea235f..630dc7a 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -63,6 +63,30 @@ int fspathncmp(const char *a, const char *b, size_t count)
>  	return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
>  }
>  
> +static int prefix_fnmatch(const struct pathspec_item *item,
> +		   const char *pattern, const char *string,
> +		   int prefix)
> +{

Is this meant to be free of false positives, free of false
negatives, or exact?  I think you use it to decide, without knowing
what kind of paths the submodule contains, if it is worth descending
into it, so as long as you definitively say "The pathspec can never
match anything in the submodule" with WM_NOMATCH, it is OK if you
returned WM_MATCH when it actually couldn't match anything.  I.e. it
is OK to give false positive but it is a bug to give false negative.

The answer to the above question should be a good explanation to
prepend as /* comment */ before the function.

> +	if (prefix > 0) {
> +		if (ps_strncmp(item, pattern, string, prefix))
> +			return WM_NOMATCH;

This says: when we have a set prefix that must literally match, and
that part does not match what we have, it cannot possibly match.

Is that correct?  What do we have in "name" and "item" at this
point?  We disable the common-prefix optimization, so we do not have
to worry about a pathspec with two elements "sub/dir1/*" and "sub/dir2/*"
giving you "sub/dir" as the common prefix, when you are wondering if
it is worth descending into "sub/" without knowing what it contains.
Is that what guarantees why this part is correct?

> +		pattern += prefix;
> +		string += prefix;
> +	}
> +
> +	if (item->flags & PATHSPEC_ONESTAR) {
> +		return WM_MATCH;

We have a pathspec that has a segment without wildcard letters,
followed by a '*', and there is no wildcard letters after that
asterisk.  We punt and assume it might match, which is OK for the
purpose of not giving a false negative.

> +	} else if (item->magic & PATHSPEC_GLOB) {
> +		return wildmatch(pattern, string,
> +				 WM_PATHNAME |
> +				 (item->magic & PATHSPEC_ICASE ?
> +				  WM_CASEFOLD : 0),
> +				 NULL);

What does this say?  If we are using the :(glob) semantics, which is
the default, we'll ask wildmatch() to see the remainder of the
pattern (after stripping the fixed prefix part if necessary) matches
the string (which also may have lost the prefix that we already know
matches).

Is that correct?  I think it depends on what "string" is being fed,
but I am assuing that you are working in the top-level project here
to decide if it is worth descending into a submodule.  If the item
is sub/dir?/*.c and we are considering "sub/" submodule, wildmatch
would not say "It could match" if "string" is "sub/".  Perhaps I am
reading the patch incorrectly.  Let me read on to see what the caller
does later.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] ls-files: add pathspec matching for submodules
  2016-09-17  3:46               ` Junio C Hamano
@ 2016-09-18 18:40                 ` Brandon Williams
  2016-09-19 17:00                   ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2016-09-18 18:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Heiko Voigt, Nguyễn Thái Ngọc Duy, Stefan Beller

On Fri, Sep 16, 2016 at 8:46 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> As the previous one that used a wrong (sorry) argument is not even
> in 'next' yet, let's pretend that it never happened.  It is OK to
> still keep it and this patch as two separate steps, i.e. a topic
> with two patches in it.

Yeah since I've been using what I built in the last patch to test the pathspecs
I based this patch off of what you have in bw/ls-files... branch.

>
>> +     /* Add pathspec args */
>> +     argv_array_push(&cp.args, "--");
>> +     for (i = 0; i < pathspec.nr; ++i)
>> +             argv_array_push(&cp.args, pathspec.items[i].original);
>
> OK, so as discussed previously with Heiko and Stefan, the idea is to
>
>  - pass the original pathspec as-is,
>
>  - when --submodule-prefix is given, a path discovered in a
>    submodule repository is first prefixed with that string before
>    getting checked to see if it matches the original pathspec.
>
> And this loop is about relaying the original pathspec.

Exactly.  Perhaps I should have made this more clear either with a
detailed comment or
more information in the commit msg.

>> @@ -192,57 +210,63 @@ static void show_gitlink(const struct cache_entry *ce)
>>
>>  static void show_ce_entry(const char *tag, const struct cache_entry *ce)
>>  {
>> +     struct strbuf name = STRBUF_INIT;
>>       int len = max_prefix_len;
>> +     if (submodule_prefix)
>> +             strbuf_addstr(&name, submodule_prefix);
>> +     strbuf_addstr(&name, ce->name);
>>
>>       if (len >= ce_namelen(ce))
>> -             die("git ls-files: internal error - cache entry not superset of prefix");
>> +             die("git ls-files: internal error - cache entry not "
>> +                 "superset of prefix");
>
> This is not such a great thing to do.  Upon a bug report, we can no
> longer do
>
>         git grep 'cache entry not superset'
>
> to see where the error message is coming from.

Oh i wasn't really thinking about that.  I simply noticed that the
line was longer than
80 characters and figured I should break it to adhere to style guidelines.

>
>> -     if (!match_pathspec(&pathspec, ce->name, ce_namelen(ce),
>> -                         len, ps_matched,
>> -                         S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
>> -             return;
>> -     if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
>> +     if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
>> +         submodule_path_match(&pathspec, name.buf, ps_matched)) {
>>               show_gitlink(ce);
>> -             return;
>> -     }
>> +     } else if (match_pathspec(&pathspec, name.buf, name.len,
>> +                               len, ps_matched,
>> +                               S_ISDIR(ce->ce_mode) ||
>> +                               S_ISGITLINK(ce->ce_mode))) {
>> +             if (tag && *tag && show_valid_bit &&
>> + ...
>
> Argh.  If we had a preparatory clean-up step, would it have helped
> to avoid this big re-indentation that makes the patch harder to read
> than necessary, I wonder?
>
> Another way would have been to "goto" from the end of this block
>
>> +     if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
>> +         submodule_path_match(&pathspec, name.buf, ps_matched)) {
>
> where we used to "return" out to the central clean-up location, i.e.
> here.
>
>> +     strbuf_release(&name);
>>  }

Yeah, the lack of destructors in C makes this a bit of a challenge when we need
to free up memory.  I've also always been taught to avoid using goto's
as they can
be error prone and lead to making the code more difficult to read.
Hence why I did
 some adjustments to make it so the function had a single exit point
to make it easy
 for cleanup.


>
>
>>       parse_pathspec(&pathspec, 0,
>>                      PATHSPEC_PREFER_CWD |
>>                      PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
>>                      prefix, argv);
>>
>> -     /* Find common prefix for all pathspec's */
>> -     max_prefix = common_prefix(&pathspec);
>> +     /*
>> +      * Find common prefix for all pathspec's
>> +      * This is used as a performance optimization which violates correctness
>> +      * in the recurse_submodules mode
>> +      */
>
> The two new lines phrase it overly negatively and also misleading.
> I thought you were saying "We do this as optimization anyway; damn
> the correctness in the submodule case!" in my first reading before
> reading the statements the comment talks about.  "This optimization
> unfortunately cannot be done when recursing into submodules" would
> have been better.

haha yeah, I wasn't trying to be overly negative!  I agree that your
wording makes more sense.

>
>> +     if (recurse_submodules)
>> +             max_prefix = NULL;
>> +     else
>> +             max_prefix = common_prefix(&pathspec);
>>       max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
>
>> diff --git a/dir.c b/dir.c
>> index 0ea235f..630dc7a 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -63,6 +63,30 @@ int fspathncmp(const char *a, const char *b, size_t count)
>>       return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
>>  }
>>
>> +static int prefix_fnmatch(const struct pathspec_item *item,
>> +                const char *pattern, const char *string,
>> +                int prefix)
>> +{
>
> Is this meant to be free of false positives, free of false
> negatives, or exact?  I think you use it to decide, without knowing
> what kind of paths the submodule contains, if it is worth descending
> into it, so as long as you definitively say "The pathspec can never
> match anything in the submodule" with WM_NOMATCH, it is OK if you
> returned WM_MATCH when it actually couldn't match anything.  I.e. it
> is OK to give false positive but it is a bug to give false negative.
>
> The answer to the above question should be a good explanation to
> prepend as /* comment */ before the function.

Yeah the idea is the parent project has no knowledge of the file
structure of the
child so if the submodule name matches a prefix of the pathspec then it could
potentially apply to a file inside the submodule.  So it can
definitely give false
positives since we could descend into the submodule and discover that it doesn't
match anything there.  I can add a comment to that effect.

>
>> +     if (prefix > 0) {
>> +             if (ps_strncmp(item, pattern, string, prefix))
>> +                     return WM_NOMATCH;
>
> This says: when we have a set prefix that must literally match, and
> that part does not match what we have, it cannot possibly match.
>
> Is that correct?  What do we have in "name" and "item" at this
> point?  We disable the common-prefix optimization, so we do not have
> to worry about a pathspec with two elements "sub/dir1/*" and "sub/dir2/*"
> giving you "sub/dir" as the common prefix, when you are wondering if
> it is worth descending into "sub/" without knowing what it contains.
> Is that what guarantees why this part is correct?

I adopted this structure from another part of the code.  The caller
uses a field in
the pathspec item which indicates the location of the first wildcard character.
So the prefix (everything prior to the wildcard char) must match
literally before
we drop into a more expensive wildmatch function.


>
>> +             pattern += prefix;
>> +             string += prefix;
>> +     }
>> +
>> +     if (item->flags & PATHSPEC_ONESTAR) {
>> +             return WM_MATCH;
>
> We have a pathspec that has a segment without wildcard letters,
> followed by a '*', and there is no wildcard letters after that
> asterisk.  We punt and assume it might match, which is OK for the
> purpose of not giving a false negative.

Exactly.  Currently if you give *.txt as a pathspec it can match every
txt file no
matter which directory its in.  Because of this if we hit a * we can
punt and hand
it off to the submodule to perform more accurate matching.

>
>> +     } else if (item->magic & PATHSPEC_GLOB) {
>> +             return wildmatch(pattern, string,
>> +                              WM_PATHNAME |
>> +                              (item->magic & PATHSPEC_ICASE ?
>> +                               WM_CASEFOLD : 0),
>> +                              NULL);
>
> What does this say?  If we are using the :(glob) semantics, which is
> the default, we'll ask wildmatch() to see the remainder of the
> pattern (after stripping the fixed prefix part if necessary) matches
> the string (which also may have lost the prefix that we already know
> matches).
>
> Is that correct?  I think it depends on what "string" is being fed,
> but I am assuing that you are working in the top-level project here
> to decide if it is worth descending into a submodule.  If the item
> is sub/dir?/*.c and we are considering "sub/" submodule, wildmatch
> would not say "It could match" if "string" is "sub/".  Perhaps I am
> reading the patch incorrectly.  Let me read on to see what the caller
> does later.

Actually thinking on this we may need to add in a special case to the
wildmatch function.
Since right now I believe that a string must completely match against
the pattern passed
in to wildmatch as apposed to wanting the string to match a prefix of
the pattern.

-Brandon

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] ls-files: add pathspec matching for submodules
  2016-09-18 18:40                 ` Brandon Williams
@ 2016-09-19 17:00                   ` Junio C Hamano
  2016-09-19 17:26                     ` Brandon Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2016-09-19 17:00 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, Heiko Voigt, Nguyễn Thái Ngọc Duy, Stefan Beller

Brandon Williams <bmwill@google.com> writes:

>> OK, so as discussed previously with Heiko and Stefan, the idea is to
>>
>>  - pass the original pathspec as-is,
>>
>>  - when --submodule-prefix is given, a path discovered in a
>>    submodule repository is first prefixed with that string before
>>    getting checked to see if it matches the original pathspec.
>>
>> And this loop is about relaying the original pathspec.
>
> Exactly.  Perhaps I should have made this more clear either with a
> detailed comment or
> more information in the commit msg.

I think you were clear enough.

Don't read everything other people say in their reviews as pointing
out issues.  Often trying to rephrase what they read in the code in
their own words is a good way to make sure the reviewers and the
original author are on the same page.  The above was one of these
cases.

>>> +     if (prefix > 0) {
>>> +             if (ps_strncmp(item, pattern, string, prefix))
>>> +                     return WM_NOMATCH;
>>
>> This says: when we have a set prefix that must literally match, and
>> that part does not match what we have, it cannot possibly match.
>>
>> Is that correct?  What do we have in "name" and "item" at this
>> point?  We disable the common-prefix optimization, so we do not have
>> to worry about a pathspec with two elements "sub/dir1/*" and "sub/dir2/*"
>> giving you "sub/dir" as the common prefix, when you are wondering if
>> it is worth descending into "sub/" without knowing what it contains.
>> Is that what guarantees why this part is correct?
>
> I adopted this structure from another part of the code.  The caller
> uses a field in
> the pathspec item which indicates the location of the first wildcard character.
> So the prefix (everything prior to the wildcard char) must match
> literally before
> we drop into a more expensive wildmatch function.

"Another part of the code" is about tree walking, right?  Weren't
you saying that part of the code may be buggy or something earlier
(e.g. pathspec "su?/" vs entry "sub")?

Again, what do we have in "name" and "item" at this point?  If we
have a submodule at "sub/" and we are checking a pathspec element
"sub/dir1/*", what is the non-wildcard part of the pathspec and what
is the "string"?  Aren't then "sub/dir1/" and "sub/" respectively,
which would not pass ps_strncmp() and produce a (false) negative?

I am starting to have a feeling that the best we can do in this
function safely is to see if prefix (i.e. the constant part of the
pathspec before the first wildcard) is long enough to cover the
"name" and if "name" part does not match to the directory boundary,
e.g. for this combination

	pathspec = "a/b/sib/c/*"
        name = "a/b/sub/"

we can say with confidence that it is not worth descending into.

When prefix is long enough and "name" and leading part of the prefix
matches to the directory boundary, e.g.

	pathspec = "a/b/sub/c/*"
        name = "a/b/sub/"

we can say it is worth descending into.

If these two checks cannot decide, we may have to be pessimistic and
say "it may match; we don't know until we descend into it".  When
prefix is shorter than name, I am not sure if we can devise a set of
simple rules, e.g.

	pathspec = "a/**/c/*"
        name = "a/b/sub/"

may match with its ** "b/sub" part and worth descending into, so is

	pathspec = "a/b/*/c/*"
        name = "a/b/sub/"

but not this one:

	pathspec = "a/b/su[c-z]/c/*"
        name = "a/b/sub/"

but this is OK:

	pathspec = "a/b/su[a-z]/c/*"
        name = "a/b/sub/"

So I would think we'd be in the business of counting slashes in the
name (called "string" in this function) and the pathspec, while
noticing '*' and '**' in the latter, and we may be able to be more
precise, but I am not sure how complex the end result would become.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] ls-files: add pathspec matching for submodules
  2016-09-19 17:00                   ` Junio C Hamano
@ 2016-09-19 17:26                     ` Brandon Williams
  2016-09-19 18:04                       ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2016-09-19 17:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Heiko Voigt, Nguyễn Thái Ngọc Duy, Stefan Beller

On Mon, Sep 19, 2016 at 10:00 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
> I think you were clear enough.
>
> Don't read everything other people say in their reviews as pointing
> out issues.  Often trying to rephrase what they read in the code in
> their own words is a good way to make sure the reviewers and the
> original author are on the same page.  The above was one of these
> cases.

That makes sense, I'll be sure to remember that!


>
>>>> +     if (prefix > 0) {
>>>> +             if (ps_strncmp(item, pattern, string, prefix))
>>>> +                     return WM_NOMATCH;
>>>
>>> This says: when we have a set prefix that must literally match, and
>>> that part does not match what we have, it cannot possibly match.
>>>
>>> Is that correct?  What do we have in "name" and "item" at this
>>> point?  We disable the common-prefix optimization, so we do not have
>>> to worry about a pathspec with two elements "sub/dir1/*" and "sub/dir2/*"
>>> giving you "sub/dir" as the common prefix, when you are wondering if
>>> it is worth descending into "sub/" without knowing what it contains.
>>> Is that what guarantees why this part is correct?
>>
>> I adopted this structure from another part of the code.  The caller
>> uses a field in
>> the pathspec item which indicates the location of the first wildcard character.
>> So the prefix (everything prior to the wildcard char) must match
>> literally before
>> we drop into a more expensive wildmatch function.
>
> "Another part of the code" is about tree walking, right?  Weren't
> you saying that part of the code may be buggy or something earlier
> (e.g. pathspec "su?/" vs entry "sub")?

I was refering to the logic that is used to do normal pathspec checking:
'match_pathspec' and the functions it calls, in particular
git_fnmatch.  And the bug I
pointed out before, I believe is due to how the wildmatch function
works.  It requires
that the pattern and the string being compared must match exactly (in
length as well).
The "su?/" case would drop into wildmatch funciton and wouldn't match
against any file
in the directory "sub/file1" for example, because it doesn't exactly
match "su?/".  In the
case of "sub" there is logic prior to the wildmatch function call
which would classify it a
match and return before descending into wildmatch.

> Again, what do we have in "name" and "item" at this point?  If we
> have a submodule at "sub/" and we are checking a pathspec element
> "sub/dir1/*", what is the non-wildcard part of the pathspec and what
> is the "string"?  Aren't then "sub/dir1/" and "sub/" respectively,
> which would not pass ps_strncmp() and produce a (false) negative?

item will be the pathspec_item struct that we are trying to match against.
name will be the file we are trying to match, which should already have the
'prefix' cut off (this is the prefix that is used as an optimization
in the common
case, which isn't used in the submodule case).  The 'prefix' in this function's
context is the part of the pattern prior to the first wildcard character. which
we can do a literal comparison on before descending into the wildmatch function.

> I am starting to have a feeling that the best we can do in this
> function safely is to see if prefix (i.e. the constant part of the
> pathspec before the first wildcard) is long enough to cover the
> "name" and if "name" part does not match to the directory boundary,
> e.g. for this combination
>
>         pathspec = "a/b/sib/c/*"
>         name = "a/b/sub/"
>
> we can say with confidence that it is not worth descending into.
>
> When prefix is long enough and "name" and leading part of the prefix
> matches to the directory boundary, e.g.
>
>         pathspec = "a/b/sub/c/*"
>         name = "a/b/sub/"
>
> we can say it is worth descending into.
>
> If these two checks cannot decide, we may have to be pessimistic and
> say "it may match; we don't know until we descend into it".  When
> prefix is shorter than name, I am not sure if we can devise a set of
> simple rules, e.g.
>
>         pathspec = "a/**/c/*"
>         name = "a/b/sub/"
>
> may match with its ** "b/sub" part and worth descending into, so is
>
>         pathspec = "a/b/*/c/*"
>         name = "a/b/sub/"
>
> but not this one:
>
>         pathspec = "a/b/su[c-z]/c/*"
>         name = "a/b/sub/"
>
> but this is OK:
>
>         pathspec = "a/b/su[a-z]/c/*"
>         name = "a/b/sub/"
>
> So I would think we'd be in the business of counting slashes in the
> name (called "string" in this function) and the pathspec, while
> noticing '*' and '**' in the latter, and we may be able to be more
> precise, but I am not sure how complex the end result would become.
>

I agree, I'm not too sure how much more complex the logic would need
to be to handle
all matters of wildcard characters.  We could initially be more
lenient on what qualifies as
a match and then later (or in the near future) revisit the wildmatch
function (which is complex)
and see if we can add better matching capabilities more suited for
submodules while at the
same time fixing that bug discussed above.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] ls-files: add pathspec matching for submodules
  2016-09-19 17:26                     ` Brandon Williams
@ 2016-09-19 18:04                       ` Junio C Hamano
  2016-09-19 18:20                         ` Brandon Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2016-09-19 18:04 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, Heiko Voigt, Nguyễn Thái Ngọc Duy, Stefan Beller

Brandon Williams <bmwill@google.com> writes:

>> Again, what do we have in "name" and "item" at this point?  If we
>> have a submodule at "sub/" and we are checking a pathspec element
>> "sub/dir1/*", what is the non-wildcard part of the pathspec and what
>> is the "string"?  Aren't they "sub/dir1/" and "sub/" respectively,
>> which would not pass ps_strncmp() and produce a (false) negative?
>
> item will be the pathspec_item struct that we are trying to match against.

... which would mean "sub/dir1/" in the above example (which is
followed by '*' that is wildcard).

> name will be the file we are trying to match, which should already have the
> 'prefix' cut off (this is the prefix that is used as an optimization
> in the common
> case, which isn't used in the submodule case).  

... which would be "sub/" in the above example, because we disable
the common-prefix optimization.

So in short, the answer to the last questions in the first quoted
paragraph are yes, yes, and "no they do not pass ps_strncmp()"?

>> I am starting to have a feeling that the best we can do in this
>> function safely is to see if prefix (i.e. the constant part of the
>> pathspec before the first wildcard) is long enough to cover the
>> "name" and if "name" part [matches or does not match] ...
>> If these two checks cannot decide, we may have to be pessimistic and
>> say "it may match; we don't know until we descend into it".
>> ...
>> So I would think we'd be in the business of counting slashes in the
>> name (called "string" in this function) and the pathspec, while
>> noticing '*' and '**' in the latter, and we may be able to be more
>> precise, but I am not sure how complex the end result would become.
>
> I agree, I'm not too sure how much more complex the logic would need
> to be to handle
> all matters of wildcard characters.  We could initially be more
> lenient on what qualifies as
> a match and then later (or in the near future) revisit the wildmatch
> function (which is complex)
> and see if we can add better matching capabilities more suited for
> submodules while at the
> same time fixing that bug discussed above.

I think it is reasonable to start a function that is meant to never
have false negatives pessimistic and return "might match" from it
when in doubt.

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] ls-files: add pathspec matching for submodules
  2016-09-17  0:59             ` [PATCH] ls-files: add pathspec matching for submodules Brandon Williams
  2016-09-17  3:46               ` Junio C Hamano
@ 2016-09-19 18:18               ` Junio C Hamano
  1 sibling, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2016-09-19 18:18 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

Brandon Williams <bmwill@google.com> writes:

>  static void show_ce_entry(const char *tag, const struct cache_entry *ce)
>  {
> +	struct strbuf name = STRBUF_INIT;
>  	int len = max_prefix_len;
> +	if (submodule_prefix)
> +		strbuf_addstr(&name, submodule_prefix);
> +	strbuf_addstr(&name, ce->name);

Continuing with the previous review, which concentrated on what
happens in the superproject; let's see what happens in the recursive
invocation in a submodule.

So a recursively spawned "ls-files --submodule-prefix=sub/" finds
a path in the index PATH and forms "sub/PATH" in "name".  From here
on, where we used to match pathspec against ce->name, we would be
matching it against name->buf.

> +	if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> +	    submodule_path_match(&pathspec, name.buf, ps_matched)) {
>  		show_gitlink(ce);

This is primarily what happens in the superproject to decide if the
submodule is worth showing.  When we are in a submodule, we can
descend into subsubmodule (if our ls-files run in the superproject
passed --recurse-submodule down) from here.

> +	} else if (match_pathspec(&pathspec, name.buf, name.len,
> +				  len, ps_matched,
> +				  S_ISDIR(ce->ce_mode) ||
> +				  S_ISGITLINK(ce->ce_mode))) {

This is interesting bit to see what happens in the recursive
invocation.  It uses the usual match_pathspec(), as we want to be
precise and correct, iow, we do not want to use DO_MATCH_SUBMODULE,
aka "it might be worth descending into submodule".

> +		if (tag && *tag && show_valid_bit &&
> +		    (ce->ce_flags & CE_VALID)) {
> +...
> +		}
> +		write_eolinfo(ce, ce->name);
> +		write_name(ce->name);

The prefixing is taken care of by write_name(), so it is correct to
use ce->name here.

> ...  
> +	strbuf_release(&name);
>  }

OK, everything I saw so far for the recursive invocation here makes
sense.

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] ls-files: add pathspec matching for submodules
  2016-09-19 18:04                       ` Junio C Hamano
@ 2016-09-19 18:20                         ` Brandon Williams
  2016-09-19 18:22                           ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2016-09-19 18:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Heiko Voigt, Nguyễn Thái Ngọc Duy, Stefan Beller

On Mon, Sep 19, 2016 at 11:04 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Brandon Williams <bmwill@google.com> writes:
>
>>> Again, what do we have in "name" and "item" at this point?  If we
>>> have a submodule at "sub/" and we are checking a pathspec element
>>> "sub/dir1/*", what is the non-wildcard part of the pathspec and what
>>> is the "string"?  Aren't they "sub/dir1/" and "sub/" respectively,
>>> which would not pass ps_strncmp() and produce a (false) negative?
>>
>> item will be the pathspec_item struct that we are trying to match against.
>
> ... which would mean "sub/dir1/" in the above example (which is
> followed by '*' that is wildcard).
>
>> name will be the file we are trying to match, which should already have the
>> 'prefix' cut off (this is the prefix that is used as an optimization
>> in the common
>> case, which isn't used in the submodule case).
>
> ... which would be "sub/" in the above example, because we disable
> the common-prefix optimization.
>
> So in short, the answer to the last questions in the first quoted
> paragraph are yes, yes, and "no they do not pass ps_strncmp()"?

Yes in that case it wouldn't have passed ps_strncmp()...but we should have never
made it there in the first place due to a piece of logic in match_pathspec_item:

@@ -283,6 +308,24 @@ static int match_pathspec_item(const struct
pathspec_item *item, int prefix,
                         item->nowildcard_len - prefix))
                return MATCHED_FNMATCH;

+       /* Perform checks to see if "name" is a super set of the pathspec */
+       if (flags & DO_MATCH_SUBMODULE) {
+               int matched = 0;
+
+               /* Check if the name is a literal prefix of the pathspec */
+               if ((item->match[namelen] == '/') &&
+                   !ps_strncmp(item, match, name, namelen)) {
+                       matched = MATCHED_RECURSIVELY;
+               /* Check if the name wildmatches to the pathspec */
+               } else if (item->nowildcard_len < item->len &&
+                          !prefix_fnmatch(item, match, name,
+                                          item->nowildcard_len - prefix)) {
+                       matched = MATCHED_FNMATCH;
+               }
+
+               return matched;
+       }

Perhaps the call structure and code organization could be changes a bit to make
a little more sense.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] ls-files: add pathspec matching for submodules
  2016-09-19 18:20                         ` Brandon Williams
@ 2016-09-19 18:22                           ` Junio C Hamano
  2016-09-19 18:30                             ` Brandon Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2016-09-19 18:22 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, Heiko Voigt, Nguyễn Thái Ngọc Duy, Stefan Beller

Brandon Williams <bmwill@google.com> writes:

> Yes in that case it wouldn't have passed ps_strncmp()...but we should have never
> made it there in the first place due to a piece of logic in match_pathspec_item:

Ah, OK.

Thanks for clarifying.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] ls-files: add pathspec matching for submodules
  2016-09-19 18:22                           ` Junio C Hamano
@ 2016-09-19 18:30                             ` Brandon Williams
  2016-09-19 18:34                               ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2016-09-19 18:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Heiko Voigt, Nguyễn Thái Ngọc Duy, Stefan Beller

side question if the answer is short:  Any reason as to why all of the
pathspec matching code lives inside of dir.c and not pathspec.c?

On Mon, Sep 19, 2016 at 11:22 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Brandon Williams <bmwill@google.com> writes:
>
>> Yes in that case it wouldn't have passed ps_strncmp()...but we should have never
>> made it there in the first place due to a piece of logic in match_pathspec_item:
>
> Ah, OK.
>
> Thanks for clarifying.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] ls-files: add pathspec matching for submodules
  2016-09-19 18:30                             ` Brandon Williams
@ 2016-09-19 18:34                               ` Junio C Hamano
  2016-09-19 18:35                                 ` Brandon Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2016-09-19 18:34 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, Heiko Voigt, Nguyễn Thái Ngọc Duy, Stefan Beller

Brandon Williams <bmwill@google.com> writes:

> side question if the answer is short:  Any reason as to why all of the
> pathspec matching code lives inside of dir.c and not pathspec.c?

Hysterical Raisins.

A pathspec used to be just a "const char **" in simpler times, which
was no more elaborate than "argv + i" (i.e. after parsing options
and revisions out, the remainders are pathspec elements).  Major
part of the matching was done inside dir.c and we haven't had chance
to look at the larger picture to move the code around.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] ls-files: add pathspec matching for submodules
  2016-09-19 18:34                               ` Junio C Hamano
@ 2016-09-19 18:35                                 ` Brandon Williams
  2016-09-19 18:52                                   ` [PATCH v2] " Brandon Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2016-09-19 18:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Heiko Voigt, Nguyễn Thái Ngọc Duy, Stefan Beller

I thought as much.  Thanks for the quick explanation :)

On Mon, Sep 19, 2016 at 11:34 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Brandon Williams <bmwill@google.com> writes:
>
>> side question if the answer is short:  Any reason as to why all of the
>> pathspec matching code lives inside of dir.c and not pathspec.c?
>
> Hysterical Raisins.
>
> A pathspec used to be just a "const char **" in simpler times, which
> was no more elaborate than "argv + i" (i.e. after parsing options
> and revisions out, the remainders are pathspec elements).  Major
> part of the matching was done inside dir.c and we haven't had chance
> to look at the larger picture to move the code around.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2] ls-files: add pathspec matching for submodules
  2016-09-19 18:35                                 ` Brandon Williams
@ 2016-09-19 18:52                                   ` Brandon Williams
  2016-09-19 23:21                                     ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2016-09-19 18:52 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Pathspecs can be a bit tricky when trying to apply them to submodules.
The main challenge is that the pathspecs will be with respect to the
super module and not with respect to paths in the submodule.  The
approach this patch takes is to pass in the identical pathspec from the
super module to the submodule in addition to the submodule-prefix, which
is the path from the root of the super module to the submodule, and then
we can compare an entry in the submodule prepended with the
submodule-prefix to the pathspec in order to determine if there is a
match.

This patch also permits the pathspec logic to perform a prefix match against
submodules since a pathspec could refer to a file inside of a submodule.
This can can result in false positive matches since a super module
doesn't know what files could be in the submodule.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Documentation/git-ls-files.txt         |   4 +-
 builtin/ls-files.c                     | 144 ++++++++++++++++++++-------------
 dir.c                                  |  67 ++++++++++++++-
 dir.h                                  |   4 +
 t/t3007-ls-files-recurse-submodules.sh |  66 +++++++++++++--
 5 files changed, 215 insertions(+), 70 deletions(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index a623ebf..09e4449 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -19,7 +19,7 @@ SYNOPSIS
 		[--exclude-standard]
 		[--error-unmatch] [--with-tree=<tree-ish>]
 		[--full-name] [--recurse-submodules]
-		[--output-path-prefix=<path>]
+		[--submodule-prefix=<path>]
 		[--abbrev] [--] [<file>...]
 
 DESCRIPTION
@@ -143,7 +143,7 @@ a space) at the start of each line:
 	Recursively calls ls-files on each submodule in the repository.
 	Currently there is only support for the --cached mode.
 
---output-path-prefix=<path>::
+--submodule-prefix=<path>::
 	Prepend the provided path to the output of each file
 
 --abbrev[=<n>]::
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 687e475..1417f44 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -29,7 +29,7 @@ static int show_valid_bit;
 static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
-static const char *output_path_prefix;
+static const char *submodule_prefix;
 static int recurse_submodules;
 
 static const char *prefix;
@@ -78,9 +78,9 @@ static void write_name(const char *name)
 	 * churn.
 	 */
 	static struct strbuf full_name = STRBUF_INIT;
-	if (output_path_prefix && *output_path_prefix) {
+	if (submodule_prefix && *submodule_prefix) {
 		strbuf_reset(&full_name);
-		strbuf_addstr(&full_name, output_path_prefix);
+		strbuf_addstr(&full_name, submodule_prefix);
 		strbuf_addstr(&full_name, name);
 		name = full_name.buf;
 	}
@@ -177,12 +177,34 @@ static void show_gitlink(const struct cache_entry *ce)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status;
+	int i;
 
 	argv_array_push(&cp.args, "ls-files");
 	argv_array_push(&cp.args, "--recurse-submodules");
-	argv_array_pushf(&cp.args, "--output-path-prefix=%s%s/",
-			 output_path_prefix ? output_path_prefix : "",
+	argv_array_pushf(&cp.args, "--submodule-prefix=%s%s/",
+			 submodule_prefix ? submodule_prefix : "",
 			 ce->name);
+	/* add options */
+	if (show_eol)
+		argv_array_push(&cp.args, "--eol");
+	if (show_valid_bit)
+		argv_array_push(&cp.args, "-v");
+	if (show_stage)
+		argv_array_push(&cp.args, "--stage");
+	if (show_cached)
+		argv_array_push(&cp.args, "--cached");
+	if (debug_mode)
+		argv_array_push(&cp.args, "--debug");
+
+	/*
+	 * Pass in the original pathspec args.  The submodule will be
+	 * responsible for prepending the 'submodule_prefix' prior to comparing
+	 * against the pathspec for matches.
+	 */
+	argv_array_push(&cp.args, "--");
+	for (i = 0; i < pathspec.nr; ++i)
+		argv_array_push(&cp.args, pathspec.items[i].original);
+
 	cp.git_cmd = 1;
 	cp.dir = ce->name;
 	status = run_command(&cp);
@@ -192,57 +214,62 @@ static void show_gitlink(const struct cache_entry *ce)
 
 static void show_ce_entry(const char *tag, const struct cache_entry *ce)
 {
+	struct strbuf name = STRBUF_INIT;
 	int len = max_prefix_len;
+	if (submodule_prefix)
+		strbuf_addstr(&name, submodule_prefix);
+	strbuf_addstr(&name, ce->name);
 
 	if (len >= ce_namelen(ce))
 		die("git ls-files: internal error - cache entry not superset of prefix");
 
-	if (!match_pathspec(&pathspec, ce->name, ce_namelen(ce),
-			    len, ps_matched,
-			    S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)))
-		return;
-	if (recurse_submodules && S_ISGITLINK(ce->ce_mode)) {
+	if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+	    submodule_path_match(&pathspec, name.buf, ps_matched)) {
 		show_gitlink(ce);
-		return;
-	}
+	} else if (match_pathspec(&pathspec, name.buf, name.len,
+				  len, ps_matched,
+				  S_ISDIR(ce->ce_mode) ||
+				  S_ISGITLINK(ce->ce_mode))) {
+		if (tag && *tag && show_valid_bit &&
+		    (ce->ce_flags & CE_VALID)) {
+			static char alttag[4];
+			memcpy(alttag, tag, 3);
+			if (isalpha(tag[0]))
+				alttag[0] = tolower(tag[0]);
+			else if (tag[0] == '?')
+				alttag[0] = '!';
+			else {
+				alttag[0] = 'v';
+				alttag[1] = tag[0];
+				alttag[2] = ' ';
+				alttag[3] = 0;
+			}
+			tag = alttag;
+		}
 
-	if (tag && *tag && show_valid_bit &&
-	    (ce->ce_flags & CE_VALID)) {
-		static char alttag[4];
-		memcpy(alttag, tag, 3);
-		if (isalpha(tag[0]))
-			alttag[0] = tolower(tag[0]);
-		else if (tag[0] == '?')
-			alttag[0] = '!';
-		else {
-			alttag[0] = 'v';
-			alttag[1] = tag[0];
-			alttag[2] = ' ';
-			alttag[3] = 0;
+		if (!show_stage) {
+			fputs(tag, stdout);
+		} else {
+			printf("%s%06o %s %d\t",
+			       tag,
+			       ce->ce_mode,
+			       find_unique_abbrev(ce->sha1,abbrev),
+			       ce_stage(ce));
+		}
+		write_eolinfo(ce, ce->name);
+		write_name(ce->name);
+		if (debug_mode) {
+			const struct stat_data *sd = &ce->ce_stat_data;
+
+			printf("  ctime: %d:%d\n", sd->sd_ctime.sec, sd->sd_ctime.nsec);
+			printf("  mtime: %d:%d\n", sd->sd_mtime.sec, sd->sd_mtime.nsec);
+			printf("  dev: %d\tino: %d\n", sd->sd_dev, sd->sd_ino);
+			printf("  uid: %d\tgid: %d\n", sd->sd_uid, sd->sd_gid);
+			printf("  size: %d\tflags: %x\n", sd->sd_size, ce->ce_flags);
 		}
-		tag = alttag;
 	}
 
-	if (!show_stage) {
-		fputs(tag, stdout);
-	} else {
-		printf("%s%06o %s %d\t",
-		       tag,
-		       ce->ce_mode,
-		       find_unique_abbrev(ce->sha1,abbrev),
-		       ce_stage(ce));
-	}
-	write_eolinfo(ce, ce->name);
-	write_name(ce->name);
-	if (debug_mode) {
-		const struct stat_data *sd = &ce->ce_stat_data;
-
-		printf("  ctime: %d:%d\n", sd->sd_ctime.sec, sd->sd_ctime.nsec);
-		printf("  mtime: %d:%d\n", sd->sd_mtime.sec, sd->sd_mtime.nsec);
-		printf("  dev: %d\tino: %d\n", sd->sd_dev, sd->sd_ino);
-		printf("  uid: %d\tgid: %d\n", sd->sd_uid, sd->sd_gid);
-		printf("  size: %d\tflags: %x\n", sd->sd_size, ce->ce_flags);
-	}
+	strbuf_release(&name);
 }
 
 static void show_ru_info(void)
@@ -510,7 +537,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		{ OPTION_SET_INT, 0, "full-name", &prefix_len, NULL,
 			N_("make the output relative to the project top directory"),
 			PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL },
-		OPT_STRING(0, "output-path-prefix", &output_path_prefix,
+		OPT_STRING(0, "submodule-prefix", &submodule_prefix,
 			N_("path"), N_("prepend <path> to each file")),
 		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
 			N_("recurse through submodules")),
@@ -566,27 +593,28 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		setup_work_tree();
 
 	if (recurse_submodules &&
-	    (show_stage || show_deleted || show_others || show_unmerged ||
-	     show_killed || show_modified || show_resolve_undo ||
-	     show_valid_bit || show_tag || show_eol))
-		die("ls-files --recurse-submodules can only be used in "
-		    "--cached mode");
+	    (show_deleted || show_others || show_unmerged ||
+	     show_killed || show_modified || show_resolve_undo))
+		die("ls-files --recurse-submodules unsupported mode");
 
 	if (recurse_submodules && error_unmatch)
 		die("ls-files --recurse-submodules does not support "
 		    "--error-unmatch");
 
-	if (recurse_submodules && argc)
-		die("ls-files --recurse-submodules does not support path "
-		    "arguments");
-
 	parse_pathspec(&pathspec, 0,
 		       PATHSPEC_PREFER_CWD |
 		       PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
 		       prefix, argv);
 
-	/* Find common prefix for all pathspec's */
-	max_prefix = common_prefix(&pathspec);
+	/*
+	 * Find common prefix for all pathspec's
+	 * This is used as a performance optimization which unfortunately cannot
+	 * be done when recursing into submodules
+	 */
+	if (recurse_submodules)
+		max_prefix = NULL;
+	else
+		max_prefix = common_prefix(&pathspec);
 	max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
 
 	/* Treat unmatching pathspec elements as errors */
diff --git a/dir.c b/dir.c
index 0ea235f..1afc3ff 100644
--- a/dir.c
+++ b/dir.c
@@ -63,6 +63,35 @@ int fspathncmp(const char *a, const char *b, size_t count)
 	return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
 }
 
+/**
+ * Used to perform prefix matching against a pathspec_item for determining if we
+ * should decend into a submodule.  This function can result in false positives
+ * since we are only trying to match the 'string' to a prefix of the 'pattern'
+ */
+static int prefix_fnmatch(const struct pathspec_item *item,
+		   const char *pattern, const char *string,
+		   int prefix)
+{
+	if (prefix > 0) {
+		if (ps_strncmp(item, pattern, string, prefix))
+			return WM_NOMATCH;
+		pattern += prefix;
+		string += prefix;
+	}
+
+	if (item->flags & PATHSPEC_ONESTAR) {
+		return WM_MATCH;
+	} else if (item->magic & PATHSPEC_GLOB) {
+		return wildmatch(pattern, string,
+				 WM_PATHNAME |
+				 (item->magic & PATHSPEC_ICASE ?
+				  WM_CASEFOLD : 0),
+				 NULL);
+	}
+
+	return WM_NOMATCH;
+}
+
 int git_fnmatch(const struct pathspec_item *item,
 		const char *pattern, const char *string,
 		int prefix)
@@ -207,8 +236,9 @@ int within_depth(const char *name, int namelen,
 	return 1;
 }
 
-#define DO_MATCH_EXCLUDE   1
-#define DO_MATCH_DIRECTORY 2
+#define DO_MATCH_EXCLUDE   (1<<0)
+#define DO_MATCH_DIRECTORY (1<<1)
+#define DO_MATCH_SUBMODULE (1<<2)
 
 /*
  * Does 'match' match the given name?
@@ -283,6 +313,24 @@ static int match_pathspec_item(const struct pathspec_item *item, int prefix,
 			 item->nowildcard_len - prefix))
 		return MATCHED_FNMATCH;
 
+	/* Perform checks to see if "name" is a super set of the pathspec */
+	if (flags & DO_MATCH_SUBMODULE) {
+		int matched = 0;
+
+		/* Check if the name is a literal prefix of the pathspec */
+		if ((item->match[namelen] == '/') &&
+		    !ps_strncmp(item, match, name, namelen)) {
+			matched = MATCHED_RECURSIVELY;
+		/* Check if the name wildmatches to the pathspec */
+		} else if (item->nowildcard_len < item->len &&
+			   !prefix_fnmatch(item, match, name,
+					   item->nowildcard_len - prefix)) {
+			matched = MATCHED_FNMATCH;
+		}
+
+		return matched;
+	}
+
 	return 0;
 }
 
@@ -386,6 +434,21 @@ int match_pathspec(const struct pathspec *ps,
 	return negative ? 0 : positive;
 }
 
+/**
+ * Check if a submodule is a superset of the pathspec
+ */
+int submodule_path_match(const struct pathspec *ps,
+			 const char *submodule_name,
+			 char *seen)
+{
+	int matched = do_match_pathspec(ps, submodule_name,
+					strlen(submodule_name),
+					0, seen,
+					DO_MATCH_DIRECTORY |
+					DO_MATCH_SUBMODULE);
+	return matched;
+}
+
 int report_path_error(const char *ps_matched,
 		      const struct pathspec *pathspec,
 		      const char *prefix)
diff --git a/dir.h b/dir.h
index da1a858..97c83bb 100644
--- a/dir.h
+++ b/dir.h
@@ -304,6 +304,10 @@ extern int git_fnmatch(const struct pathspec_item *item,
 		       const char *pattern, const char *string,
 		       int prefix);
 
+extern int submodule_path_match(const struct pathspec *ps,
+				const char *submodule_name,
+				char *seen);
+
 static inline int ce_path_match(const struct cache_entry *ce,
 				const struct pathspec *pathspec,
 				char *seen)
diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
index caf3815..977f85c 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -69,9 +69,63 @@ test_expect_success 'ls-files recurses more than 1 level' '
 	test_cmp expect actual
 '
 
-test_expect_success '--recurse-submodules does not support using path arguments' '
-	test_must_fail git ls-files --recurse-submodules b 2>actual &&
-	test_i18ngrep "does not support path arguments" actual
+test_expect_success '--recurse-submodules and pathspecs setup' '
+	echo e >submodule/subsub/e.txt &&
+	git -C submodule/subsub add e.txt &&
+	git -C submodule/subsub commit -m "adding e.txt" &&
+	echo f >submodule/f.TXT &&
+	echo g >submodule/g.txt &&
+	git -C submodule add f.TXT g.txt &&
+	git -C submodule commit -m "add f and g" &&
+	echo h >h.txt &&
+	git add h.txt &&
+	git commit -m "add h" &&
+
+	cat >expect <<-\EOF &&
+	.gitmodules
+	a
+	b/b
+	h.txt
+	submodule/.gitmodules
+	submodule/c
+	submodule/f.TXT
+	submodule/g.txt
+	submodule/subsub/d
+	submodule/subsub/e.txt
+	EOF
+
+	git ls-files --recurse-submodules "*" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success '--recurse-submodules and pathspecs' '
+	cat >expect <<-\EOF &&
+	h.txt
+	submodule/g.txt
+	submodule/subsub/e.txt
+	EOF
+
+	git ls-files --recurse-submodules "*.txt" >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	h.txt
+	submodule/f.TXT
+	submodule/g.txt
+	submodule/subsub/e.txt
+	EOF
+
+	git ls-files --recurse-submodules ":(icase)*.txt" >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	h.txt
+	submodule/f.TXT
+	submodule/g.txt
+	EOF
+
+	git ls-files --recurse-submodules ":(icase)*.txt" ":(exclude)submodule/subsub/*" >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success '--recurse-submodules does not support --error-unmatch' '
@@ -82,18 +136,14 @@ test_expect_success '--recurse-submodules does not support --error-unmatch' '
 test_incompatible_with_recurse_submodules () {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
 		test_must_fail git ls-files --recurse-submodules $1 2>actual &&
-		test_i18ngrep 'can only be used in --cached mode' actual
+		test_i18ngrep 'unsupported mode' actual
 	"
 }
 
-test_incompatible_with_recurse_submodules -v
-test_incompatible_with_recurse_submodules -t
 test_incompatible_with_recurse_submodules --deleted
 test_incompatible_with_recurse_submodules --modified
 test_incompatible_with_recurse_submodules --others
-test_incompatible_with_recurse_submodules --stage
 test_incompatible_with_recurse_submodules --killed
 test_incompatible_with_recurse_submodules --unmerged
-test_incompatible_with_recurse_submodules --eol
 
 test_done
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] ls-files: add pathspec matching for submodules
  2016-09-19 18:52                                   ` [PATCH v2] " Brandon Williams
@ 2016-09-19 23:21                                     ` Junio C Hamano
  2016-09-20 16:30                                       ` Brandon Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2016-09-19 23:21 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, Nguyễn Thái Ngọc Duy

Brandon Williams <bmwill@google.com> writes:

> diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
> index a623ebf..09e4449 100644
> --- a/Documentation/git-ls-files.txt
> +++ b/Documentation/git-ls-files.txt
> @@ -19,7 +19,7 @@ SYNOPSIS
> -		[--output-path-prefix=<path>]
> +		[--submodule-prefix=<path>]
> ...
> ---output-path-prefix=<path>::
> +--submodule-prefix=<path>::
> ...
> -static const char *output_path_prefix;
> +static const char *submodule_prefix;
> ...
> -	if (output_path_prefix && *output_path_prefix) {
> +	if (submodule_prefix && *submodule_prefix) {
>  		strbuf_reset(&full_name);
> -		strbuf_addstr(&full_name, output_path_prefix);
> +		strbuf_addstr(&full_name, submodule_prefix);
> ...
> -	argv_array_pushf(&cp.args, "--output-path-prefix=%s%s/",
> -			 output_path_prefix ? output_path_prefix : "",
> +	argv_array_pushf(&cp.args, "--submodule-prefix=%s%s/",
> +			 submodule_prefix ? submodule_prefix : "",
>  			 ce->name);

As the previous one that used a wrong (sorry) argument is not even
in 'next' yet, let's pretend that it never happened.  It is OK to
still keep it and this patch as two separate steps, i.e. a topic
with two patches in it.

That means that this patch will become 2/2 of a series, and 1/2 is
rerolled to use submodule_prefix from the get-go, without ever
introducing output_path_prefix variable, so that many of the above
lines we won't have to review in 2/2.

> +	/*
> +	 * Pass in the original pathspec args.  The submodule will be
> +	 * responsible for prepending the 'submodule_prefix' prior to comparing
> +	 * against the pathspec for matches.
> +	 */

Good.

> +	argv_array_push(&cp.args, "--");
> +	for (i = 0; i < pathspec.nr; ++i)
> +		argv_array_push(&cp.args, pathspec.items[i].original);
> +

Please prefer post-increment i++ over pre-increment ++i when writing
a for(;;) loop, unless there is a strong reason not to (familiarlity
in C++ is not a good reason).

> diff --git a/dir.c b/dir.c
> index 0ea235f..1afc3ff 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -63,6 +63,35 @@ int fspathncmp(const char *a, const char *b, size_t count)
>  	return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
>  }
>  
> +/**
> + * Used to perform prefix matching against a pathspec_item for determining if we
> + * should decend into a submodule.  This function can result in false positives
> + * since we are only trying to match the 'string' to a prefix of the 'pattern'
> + */

Perhaps s/can/is allowed to/.  Implicit but equally important is
that it is not OK to produce false negative.

> +static int prefix_fnmatch(const struct pathspec_item *item,
> +		   const char *pattern, const char *string,
> +		   int prefix)
> +{
> +	if (prefix > 0) {
> +		if (ps_strncmp(item, pattern, string, prefix))
> +			return WM_NOMATCH;
> +		pattern += prefix;
> +		string += prefix;
> +	}
> +
> +	if (item->flags & PATHSPEC_ONESTAR) {
> +		return WM_MATCH;
> +	} else if (item->magic & PATHSPEC_GLOB) {
> +		return wildmatch(pattern, string,
> +				 WM_PATHNAME |
> +				 (item->magic & PATHSPEC_ICASE ?
> +				  WM_CASEFOLD : 0),
> +				 NULL);

Isn't this last one overly tight?  I am wondering about a scenario
where you have a submodule at "sub/" in the superproject, and "sub/"
has a "file" at the top of its working tree.  And you do:

	git ls-files --recurse-submodules ':(glob)??b/fi?e'

at the top of the superproject.  The "pattern" would be '??b/fi?e"
while string would be 'sub', and wildmatch() would not like it, but
there is no way for this caller to append anything to 'sub' before
making this call, as it hasn't looked into what paths appear in the
submodule repository (and it should not want to).  And I think we
would want it to recurse to find sub/file.  IOW, this looks like a
false negative we must avoid in this function.  As we cannot afford
to check if anything that matches 'fi?e' is in the index file of the
submodule repository, we shouldn't try to match 'fi?e' portion of
the given pathspec pattern.

Extending the scenario, if I also create a directory "sib/" in the
superproject next to "sub/" and add a "file" in it, the same
pathspec:

	git ls-files [--recurse-submodules] ':(glob)??b/fi?e'

does find "sib/file" (with or without the recursion option).

    Duy cc'ed because I am not quite sure why it is a good thing
    that I need to add an exlicit ':(glob)' in these examples to
    illustrate the breakage.  This comes from v1.8.3.2-849-gbd30c2e
    ("pathspec: support :(glob) syntax", 2013-07-14) and I do not
    think we want to change its behaviour. I just want to be
    reminded why we did this.  I am guessing that it was because of
    an ancient design decision that we would use fnmatch without
    FNM_PATHNAME by default, i.e. it would be too disruptive if we
    made :(glob), i.e. honor directory boundaries, the default.

With :(glob) that follows FNM_PATHNAME behaviour, ":(glob)s*" would
be rejected as a match with "sub/file" or "sib/file", and that is
OK, but I think our ':(glob)??b/fi?e' example should find both of
these paths as matches to the pattern.

> +
> +	return WM_NOMATCH;
> +}
> +

And of course, if it is not PATHSPEC_GLOB, returning NOMATCH does
not sound like erring on the side to prevent false negatives.  For
example, what does

	git ls-files --recurse-submodule '??b/fi?e'

do in the above scenario with this patch?  Doesn't the if/else
cascade fall through and hit this return that says "Nah, 'sub'
cannot possibly match, do not bother descending into it"?

One thing I was originally confused about was that there are at
least three kinds we need to worry about.

 - PATHSPEC_LITERAL, where we no-wildcard-len should cover
   everything in the pattern string, e.g. "sub/file" may be the
   pattern that would want to produce a match when trying to see if
   we want to descend into "sub".

 - PATHSPEC_GLOB, as discussed above.

 - Not having either LITERAL or GLOB.  's*' would want to say that
   it is worth descending into "sub".  If there were another
   submodule at "sob/dir", that would match the pattern 's*' as
   well, because an asterisk can match 'ob/dir' part unlike in
   PATHSPEC_GLOB case.

Thanks.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] ls-files: add pathspec matching for submodules
  2016-09-19 23:21                                     ` Junio C Hamano
@ 2016-09-20 16:30                                       ` Brandon Williams
  2016-09-20 21:03                                         ` Brandon Williams
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2016-09-20 16:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyễn Thái Ngọc Duy

On Mon, Sep 19, 2016 at 4:21 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> As the previous one that used a wrong (sorry) argument is not even
> in 'next' yet, let's pretend that it never happened.  It is OK to
> still keep it and this patch as two separate steps, i.e. a topic
> with two patches in it.
>
> That means that this patch will become 2/2 of a series, and 1/2 is
> rerolled to use submodule_prefix from the get-go, without ever
> introducing output_path_prefix variable, so that many of the above
> lines we won't have to review in 2/2.

Ah ok.  Would you like me to resend the first patch with the desired
change then?

>
>> +     /*
>> +      * Pass in the original pathspec args.  The submodule will be
>> +      * responsible for prepending the 'submodule_prefix' prior to comparing
>> +      * against the pathspec for matches.
>> +      */
>
> Good.
>
>> +     argv_array_push(&cp.args, "--");
>> +     for (i = 0; i < pathspec.nr; ++i)
>> +             argv_array_push(&cp.args, pathspec.items[i].original);
>> +
>
> Please prefer post-increment i++ over pre-increment ++i when writing
> a for(;;) loop, unless there is a strong reason not to (familiarlity
> in C++ is not a good reason).

I had a compiler instructor drill into my head that ++i in a for loop
was faster historically
since it wouldn't have to create a temporary value.  Of course now a days there
probably isn't much (or any) difference between the two.  If post-fix
operators are the
norm in git code then I can try to remember to use them :)

>> +
>> +     if (item->flags & PATHSPEC_ONESTAR) {
>> +             return WM_MATCH;
>> +     } else if (item->magic & PATHSPEC_GLOB) {
>> +             return wildmatch(pattern, string,
>> +                              WM_PATHNAME |
>> +                              (item->magic & PATHSPEC_ICASE ?
>> +                               WM_CASEFOLD : 0),
>> +                              NULL);
>
> Isn't this last one overly tight?  I am wondering about a scenario
> where you have a submodule at "sub/" in the superproject, and "sub/"
> has a "file" at the top of its working tree.  And you do:
>
>         git ls-files --recurse-submodules ':(glob)??b/fi?e'
>
> at the top of the superproject.  The "pattern" would be '??b/fi?e"
> while string would be 'sub', and wildmatch() would not like it, but
> there is no way for this caller to append anything to 'sub' before
> making this call, as it hasn't looked into what paths appear in the
> submodule repository (and it should not want to).  And I think we
> would want it to recurse to find sub/file.  IOW, this looks like a
> false negative we must avoid in this function.  As we cannot afford
> to check if anything that matches 'fi?e' is in the index file of the
> submodule repository, we shouldn't try to match 'fi?e' portion of
> the given pathspec pattern.

good point.  Let me think about this some more.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] ls-files: add pathspec matching for submodules
  2016-09-20 16:30                                       ` Brandon Williams
@ 2016-09-20 21:03                                         ` Brandon Williams
  2016-09-21 17:12                                           ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2016-09-20 21:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nguyễn Thái Ngọc Duy

>>> +
>>> +     if (item->flags & PATHSPEC_ONESTAR) {
>>> +             return WM_MATCH;
>>> +     } else if (item->magic & PATHSPEC_GLOB) {
>>> +             return wildmatch(pattern, string,
>>> +                              WM_PATHNAME |
>>> +                              (item->magic & PATHSPEC_ICASE ?
>>> +                               WM_CASEFOLD : 0),
>>> +                              NULL);
>>
>> Isn't this last one overly tight?  I am wondering about a scenario
>> where you have a submodule at "sub/" in the superproject, and "sub/"
>> has a "file" at the top of its working tree.  And you do:
>>
>>         git ls-files --recurse-submodules ':(glob)??b/fi?e'
>>
>> at the top of the superproject.  The "pattern" would be '??b/fi?e"
>> while string would be 'sub', and wildmatch() would not like it, but
>> there is no way for this caller to append anything to 'sub' before
>> making this call, as it hasn't looked into what paths appear in the
>> submodule repository (and it should not want to).  And I think we
>> would want it to recurse to find sub/file.  IOW, this looks like a
>> false negative we must avoid in this function.  As we cannot afford
>> to check if anything that matches 'fi?e' is in the index file of the
>> submodule repository, we shouldn't try to match 'fi?e' portion of
>> the given pathspec pattern.
>
> good point.  Let me think about this some more.

On a similar but slightly different note.  In general do we want the
pathspec '??b' to
match against the sib/ directory and subsequently have ls-files print
all entries inside
of the sib/ directory?  (this is in the non-recursive case)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] ls-files: add pathspec matching for submodules
  2016-09-20 21:03                                         ` Brandon Williams
@ 2016-09-21 17:12                                           ` Junio C Hamano
  2016-09-21 17:49                                             ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2016-09-21 17:12 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, Nguyễn Thái Ngọc Duy

Brandon Williams <bmwill@google.com> writes:

> On a similar but slightly different note.  In general do we want
> the pathspec '??b' to match against the sib/ directory and
> subsequently have ls-files print all entries inside of the sib/
> directory?  (this is in the non-recursive case)

I'd need to find time to dig a bit of history before I can give a
firm opinion on this, but here is a knee-jerk version of my reaction.

 * A pathspec element that matches literally to a directory causes
   itself and everything underneath the directory match that
   element, e.g. "sib" would be considered a match.

 * Otherwise, a pathspec that matches with the whole path as a
   pattern matches the path, e.g. "??b" would match "sib" itself,
   but not "sib/file".  Note that "??b*" would match "sib" and
   "sib/file" because the pattern match is without FNM_PATNAME
   unless ':(glob)' magic is in effect.

Historically, some commands treated a pathspec as purely a prefix
match (i.e. the former) and did not use _any_ pattern matching,
while other commands did both of the above two (e.g. compare ls-tree
and ls-files).  I thought we were slowly moving towards unifying
them, but apparently 'git log -- "D?cumentation"' does not show
anything close to what 'git log -- Documentation' gives us even in
today's Git.

Probably we want to change it at some point so that a pattern that
matches one leading directory would cause everything underneath to
match, e.g. "??b" would include "sib/file" just because "sib" would.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2] ls-files: add pathspec matching for submodules
  2016-09-21 17:12                                           ` Junio C Hamano
@ 2016-09-21 17:49                                             ` Junio C Hamano
  0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2016-09-21 17:49 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, Nguyễn Thái Ngọc Duy

Junio C Hamano <gitster@pobox.com> writes:

> Brandon Williams <bmwill@google.com> writes:
>
>> On a similar but slightly different note.  In general do we want
>> the pathspec '??b' to match against the sib/ directory and
>> subsequently have ls-files print all entries inside of the sib/
>> directory?  (this is in the non-recursive case)
>
> I'd need to find time to dig a bit of history before I can give a
> firm opinion on this, but here is a knee-jerk version of my reaction.

In the context of what you are doing, i.e. "ls-files that recurses
into submodules", my opinion is that "ls-files --recurse-submodules"
should behave wrt pathspecs AS IF all the submodule contents are
flattened into a single index of the superproject.

In the sample scenario under discussion, i.e.

    In the superproject we have these
    $ git ls-files -s
    100644 c489803d5bdec1755f650854fe7ef5ab7a3ee58d 0       .gitmodules
    100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0       sib/file
    160000 1f5a0695289c500f25e7fa55e3ad27e394d1206b 0       sub

    In 'sub' submodule we have this
    100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0       file

such a flattend index would look like this:

    100644 c489803d5bdec1755f650854fe7ef5ab7a3ee58d 0       .gitmodules
    100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0       sib/file
    100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0       sub/file

i.e. removing 'sub' submodule entry from the index of the
superproject and overlay everything in the submodule with sub/
prefixed to its path.

And with such an index, if and only if a path matches a pathspec,
"git ls-files --recurse-submodules" run at the toplevel with the
same pathspec should show the path.  That means

    $ git ls-files --recurse-submodules '??b'

would show nothing (not even 'sub'), while

    $ git ls-files --recurse-submodules '??b*'

should show sib/file and sub/file.  That is because that is how the
command without "--recurse-submodules" working on that flat index
would produce.

The "we have historically two kinds of pathspecs and they differ how
they work with wildcard" is a separate issue, I would think, even
though the result would affect what should happen in the above
example (i.e. if we said "either a pattern match or a literal match
to a leading directory path should make everything underneath
match", '??b' would make sib/<anything> and sub/<anything> to be
shown).

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-09-21 17:49 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-14 23:57 [RFC] extending pathspec support to submodules Brandon Williams
2016-09-15 11:57 ` Heiko Voigt
2016-09-15 15:26   ` Brandon Williams
2016-09-15 22:08     ` Junio C Hamano
2016-09-15 22:28       ` Stefan Beller
2016-09-16  9:34         ` Heiko Voigt
2016-09-16 18:40           ` Brandon Williams
2016-09-17  0:59             ` [PATCH] ls-files: add pathspec matching for submodules Brandon Williams
2016-09-17  3:46               ` Junio C Hamano
2016-09-18 18:40                 ` Brandon Williams
2016-09-19 17:00                   ` Junio C Hamano
2016-09-19 17:26                     ` Brandon Williams
2016-09-19 18:04                       ` Junio C Hamano
2016-09-19 18:20                         ` Brandon Williams
2016-09-19 18:22                           ` Junio C Hamano
2016-09-19 18:30                             ` Brandon Williams
2016-09-19 18:34                               ` Junio C Hamano
2016-09-19 18:35                                 ` Brandon Williams
2016-09-19 18:52                                   ` [PATCH v2] " Brandon Williams
2016-09-19 23:21                                     ` Junio C Hamano
2016-09-20 16:30                                       ` Brandon Williams
2016-09-20 21:03                                         ` Brandon Williams
2016-09-21 17:12                                           ` Junio C Hamano
2016-09-21 17:49                                             ` Junio C Hamano
2016-09-19 18:18               ` [PATCH] " Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.