All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG: wildmatches like foo/**/**/bar don't match properly due to internal optimizations
@ 2017-04-25 19:13 Ævar Arnfjörð Bjarmason
  2017-04-26  8:56 ` Duy Nguyen
  0 siblings, 1 reply; 3+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-25 19:13 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, Junio C Hamano; +Cc: Git Mailing List

Thought I'd just start another thread for this rather than tack it
onto the pathalogical case thread.

In commit 4c251e5cb5 ("wildmatch: make /**/ match zero or more
directories", 2012-10-15) Duy added support for ** in globs.

One test-case for this is:

    match 1 0 'foo/baz/bar' 'foo/**/**/bar'

I.e. foo/**/**/bar matches foo/baz/bar. However due to some
pre-pruning we do in pathspec/ls-tree we can't ever match it, because
the first thing we do is peel the first part of the path/pattern off,
i.e. foo/, and then match baz/bar against **/**/bar.

The monkeypatch at the end of this mail makes this case work, but
breaks some others, and more importantly now e.g. on git.git we have
to call wildmatch() ~3K times for all the files we track, instead of
just for stuff in my foo/ folder.

So I don't think this needs to be made to work, being able to prune
patterns like this is very useful, but:

a) the wildmatch tests are lacking because they just call the
low-level function without a helper, but that's not actually how it
gets invoked by anything that matters

b) If we're going to be using wildmatch but implicitly not supporting
some of its very expensive optimization-beating syntax we should
probably make that an error explicitly.

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index a6c70dbe9e..4285a09ccc 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -235,3 +235,3 @@ static void show_ce_entry(const char *tag, const
struct cache_entry *ce)
        struct strbuf name = STRBUF_INIT;
-       int len = max_prefix_len;
+       int len = 0;
        if (super_prefix)
@@ -640,2 +640,3 @@ int cmd_ls_files(int argc, const char **argv,
const char *cmd_prefix)
        max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
+       max_prefix_len = 0;

diff --git a/dir.c b/dir.c
index f451bfa48c..b4399f042c 100644
--- a/dir.c
+++ b/dir.c
@@ -86,3 +86,3 @@ int git_fnmatch(const struct pathspec_item *item,
                return wildmatch(pattern, string,
-                                item->magic & PATHSPEC_ICASE ? WM_CASEFOLD : 0,
+                                item->magic & PATHSPEC_ICASE ?
WM_CASEFOLD : WM_PATHNAME,
                                 NULL);
@@ -316,3 +316,3 @@ static int match_pathspec_item(const struct
pathspec_item *item, int prefix,
            !git_fnmatch(item, match, name,
-                        item->nowildcard_len - prefix))
+                        prefix))
                return MATCHED_FNMATCH;

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: BUG: wildmatches like foo/**/**/bar don't match properly due to internal optimizations
  2017-04-25 19:13 BUG: wildmatches like foo/**/**/bar don't match properly due to internal optimizations Ævar Arnfjörð Bjarmason
@ 2017-04-26  8:56 ` Duy Nguyen
  2017-04-26 17:38   ` Brandon Williams
  0 siblings, 1 reply; 3+ messages in thread
From: Duy Nguyen @ 2017-04-26  8:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, Git Mailing List

On Wed, Apr 26, 2017 at 2:13 AM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Thought I'd just start another thread for this rather than tack it
> onto the pathalogical case thread.
>
> In commit 4c251e5cb5 ("wildmatch: make /**/ match zero or more
> directories", 2012-10-15) Duy added support for ** in globs.
>
> One test-case for this is:
>
>     match 1 0 'foo/baz/bar' 'foo/**/**/bar'
>
> I.e. foo/**/**/bar matches foo/baz/bar. However due to some
> pre-pruning we do in pathspec/ls-tree we can't ever match it, because
> the first thing we do is peel the first part of the path/pattern off,
> i.e. foo/, and then match baz/bar against **/**/bar.

Yeah. I think the prefix compare trick predated wildmatch. When I
introduced positional wildcards "**/" I failed to spot this. Good
catch.

Ideally this sort of optimization should be contained within wildmatch
(or whatever matching engine we'll be using). It also opens up more
opportunity (like precompile pattern mentioned elsewhere in this
thread).

You need to be careful though, when we do case-insensitive matching,
sometimes we want to match the prefix case _sensitively_ instead. So
we need to pass the "prefix" info in some cases to the matching
engine.

I guess time is now ripe (i.e. somebody volunteers to work on this ;-)
to improve wildmatch. "improve" can also be "rewriting to pcre" if we
really want that route, which I have no opinion because I don't know
pcre availability on other (some obscure) platforms.
-- 
Duy

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: BUG: wildmatches like foo/**/**/bar don't match properly due to internal optimizations
  2017-04-26  8:56 ` Duy Nguyen
@ 2017-04-26 17:38   ` Brandon Williams
  0 siblings, 0 replies; 3+ messages in thread
From: Brandon Williams @ 2017-04-26 17:38 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Git Mailing List

On 04/26, Duy Nguyen wrote:
> On Wed, Apr 26, 2017 at 2:13 AM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> > Thought I'd just start another thread for this rather than tack it
> > onto the pathalogical case thread.
> >
> > In commit 4c251e5cb5 ("wildmatch: make /**/ match zero or more
> > directories", 2012-10-15) Duy added support for ** in globs.
> >
> > One test-case for this is:
> >
> >     match 1 0 'foo/baz/bar' 'foo/**/**/bar'
> >
> > I.e. foo/**/**/bar matches foo/baz/bar. However due to some
> > pre-pruning we do in pathspec/ls-tree we can't ever match it, because
> > the first thing we do is peel the first part of the path/pattern off,
> > i.e. foo/, and then match baz/bar against **/**/bar.
> 
> Yeah. I think the prefix compare trick predated wildmatch. When I
> introduced positional wildcards "**/" I failed to spot this. Good
> catch.
> 
> Ideally this sort of optimization should be contained within wildmatch
> (or whatever matching engine we'll be using). It also opens up more
> opportunity (like precompile pattern mentioned elsewhere in this
> thread).
> 
> You need to be careful though, when we do case-insensitive matching,
> sometimes we want to match the prefix case _sensitively_ instead. So
> we need to pass the "prefix" info in some cases to the matching
> engine.
> 
> I guess time is now ripe (i.e. somebody volunteers to work on this ;-)
> to improve wildmatch. "improve" can also be "rewriting to pcre" if we
> really want that route, which I have no opinion because I don't know
> pcre availability on other (some obscure) platforms.

If we do end up improving wildmatch, we may also want the functionality
to (with a flag) have a pattern match a leading directory.  This would
be useful in the submodule case where a user gives a pathspec which
could go into a submodule but we don't want to launch a child process to
operate on the submodule unless the first part of the pattern matches
the submodule.  Right now with recursive grep, if wildcards are used
then the code just punts and says "yeah this pattern may match something
in the submodule but we won't know for sure till we actually try".

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-04-26 17:38 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-25 19:13 BUG: wildmatches like foo/**/**/bar don't match properly due to internal optimizations Ævar Arnfjörð Bjarmason
2017-04-26  8:56 ` Duy Nguyen
2017-04-26 17:38   ` Brandon Williams

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.