From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: git grep with leading inverted bracket expression
Date: Thu, 07 Jun 2018 21:29:48 +0200
Message-ID: <874liez977.fsf@evledraar.gmail.com>
In-Reply-To: <20180607192213.GB24370@bombadil.infradead.org>


On Thu, Jun 07 2018, Matthew Wilcox wrote:

> On Thu, Jun 07, 2018 at 09:09:25PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Jun 07 2018, Matthew Wilcox wrote:
>> > If the first atom of a regex is a bracket expression with an inverted range,
>> > git grep is very slow.
>>
>> I have some WIP patches to fix all of this, which I'll hopefully submit
>> before 2.19 is out the door.
>>
>> What you've discovered here is how shitty your libc regex engine is,
>> because unless you provide -P and compile with a reasonably up-to-date
>> libpcre (preferably v2) with JIT that's what you'll get.
>
> I'm using Debian's build, and it is linked against a recent libpcre2:
> $ ldd /usr/lib/git-core/git
> 	libpcre2-8.so.0 => /usr/lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f59ad5f2000)
> $ dpkg --status libpcre2-8-0
> Version: 10.31-3
>
> But I wasn't using -P.  If I do, then I see the performance numbers you do:
>
> $ time git grep -P '[^t]truct_size' >/dev/null
> real	0m0.354s
> user	0m0.340s
> sys	0m0.639s
> $ time git grep -P 'struct_size' >/dev/null
> real	0m0.336s
> user	0m0.552s
> sys	0m0.457s
> $ time git grep 'struct_size' >/dev/null
> real	0m0.335s
> user	0m0.535s
> sys	0m0.474s
>
>> So you need to just use an up-to-date libpcre2 & -P and performance
>> won't suck.

Yeah that's recent enough & will get you all the benefits.

> I don't tend to use terribly advanced regexps, so I'll just set
> grep.patternType to 'perl' and then it'll automatically be fast for me
> without your patches ;-)

Indeed, if you're happy with that, that'll do it.
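
For anyone following along, a minimal sketch of that setting. It uses a
scratch repository (the "repo" variable is only for illustration) so
nothing in your real configuration changes, and it assumes a git built
against libpcre; use --global instead of a per-repository setting to
apply it everywhere.

```shell
# Create a throwaway repository so the setting stays local to it.
repo=$(mktemp -d)
git init -q "$repo"

# Make plain "git grep" interpret patterns as Perl-compatible regexps,
# i.e. behave as if -P had been given.
git -C "$repo" config grep.patternType perl

# Read the value back to confirm it took effect.
pattern_type=$(git -C "$repo" config --get grep.patternType)
printf 'grep.patternType=%s\n' "$pattern_type"
```

A one-off alternative that leaves the configuration alone is
"git -c grep.patternType=perl grep <pattern>".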

>> My WIP patches will make us use PCRE for all grep modes, using an API it
>> has to convert basic & extended regexp syntax to its own syntax, so
>> we'll be able to do that transparently.
>
> That's clearly the right answer.  Thanks!

Yeah, unfortunately git-grep's default is "basic" regexp which has a
really atrocious syntax that's different enough from extended & Perl's
that we probably couldn't just switch it over.

That won't be needed with my patches, but maybe I'll follow up with
something to s/basic/extended/g by default, because one side effect of
having the pattern converter is that we could have a warning whenever
the user has a pattern that would be different under extended/perl, so
we can see how common that is.

