git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: dana <dana@dana.is>
Cc: git@vger.kernel.org, Matthieu Moy <Matthieu.Moy@imag.fr>,
	Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: [BUG] gitignore documentation inconsistent with actual behaviour
Date: Thu, 11 Oct 2018 13:08:06 +0200	[thread overview]
Message-ID: <87bm80eo6h.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <C16A9F17-0375-42F9-90A9-A92C9F3D8BBA@dana.is>


On Thu, Oct 11 2018, dana wrote:

> Hello,
>
> I'm a contributor to ripgrep, which is a grep-like tool that supports using
> gitignore files to control which files are searched in a repo (or any other
> directory tree). ripgrep's support for the patterns in these files is based on
> git's official documentation, as seen here:
>
>   https://git-scm.com/docs/gitignore
>
> One of the most common reports on the ripgrep bug tracker is that it does not
> allow patterns like the following real-world examples, where a ** is used along
> with other text within the same path component:
>
>   **/**$$*.java
>   **.orig
>   **local.properties
>   !**.sha1
>
> The reason it doesn't allow them is that the gitignore documentation explicitly
> states that they're invalid:
>
>   A leading "**" followed by a slash means match in all directories...
>
>   A trailing "/**" matches everything inside...
>
>   A slash followed by two consecutive asterisks then a slash matches zero or
>   more directories...
>
>   Other consecutive asterisks are considered invalid.
>
> git itself happily accepts these patterns, however, apparently treating the **
> like a single * without fnmatch(3)'s FNM_PATHNAME flag set (in other words, it
> matches / as a regular character, thus crossing directory boundaries).
>
> ripgrep's developer is loathe to reverse-engineer this undocumented behaviour,
> and so the reports keep coming, both for ripgrep itself and for down-stream
> consumers of it and its ignore crate (including most notably Microsoft's VS Code
> editor).
>
> My request: Assuming that git's actual handling of these patterns is intended,
> would it be possible to make it 'official' and explicitly add it to the
> documentation?
>
> References (the first one is the main bug):
>
> https://github.com/BurntSushi/ripgrep/issues/373
> https://github.com/BurntSushi/ripgrep/issues/507
> https://github.com/BurntSushi/ripgrep/issues/859
> https://github.com/BurntSushi/ripgrep/issues/945
> https://github.com/BurntSushi/ripgrep/issues/1080
> https://github.com/BurntSushi/ripgrep/issues/1082
> https://github.com/Microsoft/vscode/issues/24050

Yeah those docs seem wrong. In general the docs for the matching
function are quite bad. I have on my TODO list to factor this out into
some gitwildmatch manpage, but right now the bit in gitignore is all we
have.

Our matching function comes from rsync originally, whose manpage says:

    use ’**’ to match anything, including slashes.

I believe this is accurate as far as the implementation goes. You can
also see the rather exhaustive tests here:
https://github.com/git/git/blob/master/t/t3070-wildmatch.sh

Note the different behavior with e.g. --glob-pathspecs v.s. the
default. There's also stuff like:

    $ grep diff=perl .gitattributes
    *.perl eol=lf diff=perl
    *.pl eof=lf diff=perl
    *.pm eol=lf diff=perl
    $ git ls-files ":(attr:diff=perl)" | wc -l
    65

And then the exclude syntax. This is not in .gitignore:

    $ git ls-files ":(exclude)*.pm" ":(attr:diff=perl)" | wc -l
    41
    $ git ls-files ":^*.pm" ":(attr:diff=perl)" | wc -l
    41

I.e. we have wildmatch() on one hand and then the pathspec matching.

For an unrelated thing I have been thinking of adding a new plumbing
command who'd get input like this on stdin:

    1 text t/t0202/test.pl\0\0
    2 text perl/Git.pm\0\0
    3 text *.pm\0\0
    4 text :^*.pm"\0:(attr:diff=perl)\0\0
    5 match glob,icase\04\03\0\0
    6 match icase\04\02\0\0
    7 match \04\01\0\0

Which would return (in any order):

    1 OK
    2 OK
    3 OK
    4 OK
    5 NO
    6 NO
    7 YES

Or whatever. I.e. something where you can as a batch feed various
strings into a program in a batch, and then ask how some of them would
match against each other.

The reason for this is to extend something like git-multimail[1] with a
config where users can subscribe to changes to paths as declared by git
pathspecs, and be informed about which of the glob rules they defined
matched their config.

Right now you'd need to e.g. for a "git push" run each match rule for
each user with such config against each changed path in each commit
that's pushed, but plumbing like this would allow for feeding arbitrary
combinations of those in and ask what matches against what.

The reason I'm on this tangent is to ask whether if such a thing
existed, if it's something you can see something like ripgrep
using. I.e. ask git given its .gitignore and .gitattributes what thing
matches one of its pathspecs instead of carrying your own bug-compatible
implementation.

1. https://github.com/git-multimail/git-multimail/

  parent reply	other threads:[~2018-10-11 11:08 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-11 10:19 [BUG] gitignore documentation inconsistent with actual behaviour dana
2018-10-11 10:37 ` dana
2018-10-11 11:08 ` Ævar Arnfjörð Bjarmason [this message]
2018-10-14  2:14   ` dana
2018-10-14 12:15   ` Duy Nguyen
2018-10-14 22:56     ` Junio C Hamano
2018-10-15 15:27       ` Duy Nguyen
2018-10-20  5:26 ` Duy Nguyen
2018-10-20  5:53   ` dana
2018-10-20  6:03     ` Duy Nguyen
2018-10-20  6:26       ` dana
2018-10-27  8:48 ` [PATCH] wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode Nguyễn Thái Ngọc Duy
2018-10-28  6:25   ` Torsten Bögershausen
2018-10-28  6:35     ` Duy Nguyen
2018-10-29  2:28       ` Junio C Hamano
2018-10-29 13:24   ` Ævar Arnfjörð Bjarmason
2018-10-29 15:53     ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bm80eo6h.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=Matthieu.Moy@imag.fr \
    --cc=dana@dana.is \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).