All of lore.kernel.org
 help / color / mirror / Atom feed
* Very promising results with libpcre2
@ 2017-03-31 21:23 Ævar Arnfjörð Bjarmason
  2017-03-31 22:48 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-03-31 21:23 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano, Jeff King, Jeffrey Walton

The recent libpcre2 got me interested in seeing what the difference in
v1 and v2 was.

So I hacked up a *very basic* patch for libpcre2 that passes all
tests, but obviously isn't ready for inclusion (I searched/replaced
all the v1 usage with v2). I'm not even bothering sending this to the
list since discussing the patch itself isn't the point:

https://github.com/avar/git/commit/414647d88dd9c5

Before that patch, running a test[1] on linux.git where I grep the
whole tree for a fixed string / simple regex with POSIX regexes & PCRE
(all the greps match the same few lines) gives:

      s/iter    rx   prx fixed
rx      2.20    --   -2%  -34%
prx     2.17    2%    --  -32%
fixed   1.46   51%   48%    --

I.e. fixed string is fastest, and both POSIX regcomp() & pcre v1 are
~30% slower than that, with no difference in performance between the
two. Now with my patch above with pcre v2 there's a notable
performance difference:

      s/iter    rx   prx fixed
rx      2.18    --  -16%  -33%
prx     1.84   19%    --  -20%
fixed   1.47   48%   25%    --

We've gone from ~30% slower to ~20% slower for PCRE with v2. But now
let's test that with this patch:

https://github.com/avar/git/commit/4b7e5da3606c0b9b12025437de8005f5fa07ff54

That enables the new JIT support in pcre v2:

      s/iter    rx fixed   prx
rx      2.19    --  -33%  -44%
fixed   1.47   49%    --  -17%
prx     1.22   79%   20%    --

Now it's PCRE that's 20% faster than our currently fastest grep
codepath that searches for a fixed string, and in absolute terms it's
around 50% faster than the current PCRE implementation.

This is on Debian testing with both PCRE libraries installed via
packages, 8.35 & 10.22 for v1 and v2, respectively. Both are the
second-latest[2] point releases for their respective versions.

As far as turning this into a patch goes there's a few open questions:

* PCRE itself supports linking to v1 and v2 in the same program just
fine. Should we provide the possibility to link to both, or just make
the user choose? If these performance numbers hold up preferring v2 is
definitely better.

* The JIT is supposedly a bit slower if you're not doing a lot of
matching, although I doubt this matters in practice, but whether to
use it & a few other options could be controlled by some config/CLI
option. I think it probably makes sense just to always use it if it's
there pending some cases where it makes performance worse in practice

As an aside I started looking into this because I'm interested in
eventually hacking up something that makes every user-facing
regcomp()/regexec() we have now (e.g. log -G) accept PCRE as well.

How do do this in all cases isn't very obvious, we could just have
some global config option, but there's lots of stuff like e.g.
"<rev>^{/<regex>} & "git show :/<regex>" that takes regexes, and e.g.
how to pass a /i flag to some of these isn't obvious at all.

The solution I'm leaning towards is to just make stuff like thath only
work under PCRE, via the native (?<flags>:<pattern>) facility. E.g.
this now works:

    git grep -P '(?xi: h e l l o)'

1. PF=~/g/git/ perl -MBenchmark=cmpthese -wE 'cmpthese(20, { fixed =>
sub { system "$ENV{PF}git grep -F avarasu >/dev/null" }, rx => sub {
system "$ENV{PF}git grep avara?su >/dev/null" }, prx => sub { system
"$ENV{PF}git grep -P avara?su >/dev/null" } })'
2. https://ftp.pcre.org/pub/pcre/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Very promising results with libpcre2
  2017-03-31 21:23 Very promising results with libpcre2 Ævar Arnfjörð Bjarmason
@ 2017-03-31 22:48 ` Junio C Hamano
  2017-04-01  8:55   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2017-03-31 22:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Jeff King, Jeffrey Walton

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> That enables the new JIT support in pcre v2:
>
>       s/iter    rx fixed   prx
> rx      2.19    --  -33%  -44%
> fixed   1.47   49%    --  -17%
> prx     1.22   79%   20%    --

The numbers with JIT does look "interesting".  

I couldn't quite tell if there are major incompatibilities in the
regex language itself between two versions from their documentation,
but assuming that there isn't (modulo bugfixes and enhancements) and
assuming that we are going to use their standard matcher, it may be
OK to just use the newer one without linking both.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Very promising results with libpcre2
  2017-03-31 22:48 ` Junio C Hamano
@ 2017-04-01  8:55   ` Ævar Arnfjörð Bjarmason
  2017-04-01 18:24     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-01  8:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King, Jeffrey Walton

On Sat, Apr 1, 2017 at 12:48 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> That enables the new JIT support in pcre v2:
>>
>>       s/iter    rx fixed   prx
>> rx      2.19    --  -33%  -44%
>> fixed   1.47   49%    --  -17%
>> prx     1.22   79%   20%    --
>
> The numbers with JIT does look "interesting".
>
> I couldn't quite tell if there are major incompatibilities in the
> regex language itself between two versions from their documentation,
> but assuming that there isn't (modulo bugfixes and enhancements) and
> assuming that we are going to use their standard matcher, it may be
> OK to just use the newer one without linking both.

There's no incompatibilities in the regex language itself (modulo bugs
etc). So yeah, I'll prepare some patch to use v2.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Very promising results with libpcre2
  2017-04-01  8:55   ` Ævar Arnfjörð Bjarmason
@ 2017-04-01 18:24     ` Junio C Hamano
  2017-04-01 19:11       ` Ævar Arnfjörð Bjarmason
  2017-04-01 23:33       ` Jeffrey Walton
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2017-04-01 18:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Jeff King, Jeffrey Walton

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Sat, Apr 1, 2017 at 12:48 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>>> That enables the new JIT support in pcre v2:
>>>
>>>       s/iter    rx fixed   prx
>>> rx      2.19    --  -33%  -44%
>>> fixed   1.47   49%    --  -17%
>>> prx     1.22   79%   20%    --
>>
>> The numbers with JIT does look "interesting".
>>
>> I couldn't quite tell if there are major incompatibilities in the
>> regex language itself between two versions from their documentation,
>> but assuming that there isn't (modulo bugfixes and enhancements) and
>> assuming that we are going to use their standard matcher, it may be
>> OK to just use the newer one without linking both.
>
> There's no incompatibilities in the regex language itself (modulo bugs
> etc). So yeah, I'll prepare some patch to use v2.

Just to make sure that we are on the same page.  While I do not see
the need to link with both variants and allow users to choose
between them at runtime, I do not know if the whole world is ready
to drop pcre1 and use pcre2 (the latter of which has only been
around for a bit over two years).

So we'd probably want to do 

 (1) keep USE_LIBPCRE and enable v1 when set;
 (2) add USE_LIBPCRE2 and enable v2 when set;
 (3) make sure to error out when both are set.

or something like that.  It is tempting to allow us to say

    make USE_LIBPCRE=2

but the existing builds are likely to be depending on "is it set to
anything? then use PCRE1" behaviour, so we unfortunately cannot take
that route.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Very promising results with libpcre2
  2017-04-01 18:24     ` Junio C Hamano
@ 2017-04-01 19:11       ` Ævar Arnfjörð Bjarmason
  2017-04-02  3:45         ` Junio C Hamano
  2017-04-01 23:33       ` Jeffrey Walton
  1 sibling, 1 reply; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-01 19:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King, Jeffrey Walton

On Sat, Apr 1, 2017 at 8:24 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Sat, Apr 1, 2017 at 12:48 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>>
>>>> That enables the new JIT support in pcre v2:
>>>>
>>>>       s/iter    rx fixed   prx
>>>> rx      2.19    --  -33%  -44%
>>>> fixed   1.47   49%    --  -17%
>>>> prx     1.22   79%   20%    --
>>>
>>> The numbers with JIT does look "interesting".
>>>
>>> I couldn't quite tell if there are major incompatibilities in the
>>> regex language itself between two versions from their documentation,
>>> but assuming that there isn't (modulo bugfixes and enhancements) and
>>> assuming that we are going to use their standard matcher, it may be
>>> OK to just use the newer one without linking both.
>>
>> There's no incompatibilities in the regex language itself (modulo bugs
>> etc). So yeah, I'll prepare some patch to use v2.
>
> Just to make sure that we are on the same page.  While I do not see
> the need to link with both variants and allow users to choose
> between them at runtime, I do not know if the whole world is ready
> to drop pcre1 and use pcre2 (the latter of which has only been
> around for a bit over two years).
>
> So we'd probably want to do
>
>  (1) keep USE_LIBPCRE and enable v1 when set;
>  (2) add USE_LIBPCRE2 and enable v2 when set;
>  (3) make sure to error out when both are set.
>
> or something like that.  It is tempting to allow us to say
>
>     make USE_LIBPCRE=2
>
> but the existing builds are likely to be depending on "is it set to
> anything? then use PCRE1" behaviour, so we unfortunately cannot take
> that route.

We're on the same page, all of this makes sense, there'll be some OS's
/ environments which for years won't be packaging pcre2 but will have
pcre1.

I am very tempted though to support them in parallel, if only for ease
of performance testing and to be able to roll out support for
grep.patternType=perl meaning pcre1 for now, but add a
grep.patternType=pcre2 for testing (and make grep.patternType=pcre1
work, with grep.patternType=pcre being synonymous with
grep.patternType=perl, i.e. whatever the default is).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Very promising results with libpcre2
  2017-04-01 18:24     ` Junio C Hamano
  2017-04-01 19:11       ` Ævar Arnfjörð Bjarmason
@ 2017-04-01 23:33       ` Jeffrey Walton
  2017-04-02  3:45         ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Jeffrey Walton @ 2017-04-01 23:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List, Jeff King

> Just to make sure that we are on the same page.  While I do not see
> the need to link with both variants and allow users to choose
> between them at runtime, I do not know if the whole world is ready
> to drop pcre1 and use pcre2 (the latter of which has only been
> around for a bit over two years).
>
> So we'd probably want to do
>
>  (1) keep USE_LIBPCRE and enable v1 when set;
>  (2) add USE_LIBPCRE2 and enable v2 when set;
>  (3) make sure to error out when both are set.
>
> or something like that.  It is tempting to allow us to say
>
>     make USE_LIBPCRE=2
>
> but the existing builds are likely to be depending on "is it set to
> anything? then use PCRE1" behaviour, so we unfortunately cannot take
> that route.

Yeah, that's the question I kinda had.

> make USE_LIBPCRE=2

I'd prefer a configure option for consistency. Maybe:

    --with-pcre  # Original PCRE
    --with-pcre1  # Alias
    --with-pcre2  # PCRE2

I prefer it because I usually do the following to see the interesting
things that's going on:

    ./configure --help

Often, I find a `--with-ssl` or similar. If `--with-ssl` fails, then I
go to the README and INSTALL to fine tune it.

By the way, if you are tweaking Configure, then consider adding a
--with-perl=X, too. Its consistent, it side steps the hard coded
/usr/bin/perl, and it signals to users its tunable.

Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Very promising results with libpcre2
  2017-04-01 23:33       ` Jeffrey Walton
@ 2017-04-02  3:45         ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2017-04-02  3:45 UTC (permalink / raw)
  To: Jeffrey Walton
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List, Jeff King

Jeffrey Walton <noloader@gmail.com> writes:

>> Just to make sure that we are on the same page.  While I do not see
>> the need to link with both variants and allow users to choose
>> between them at runtime, I do not know if the whole world is ready
>> to drop pcre1 and use pcre2 (the latter of which has only been
>> around for a bit over two years).
>>
>> So we'd probably want to do
>>
>>  (1) keep USE_LIBPCRE and enable v1 when set;
>>  (2) add USE_LIBPCRE2 and enable v2 when set;
>>  (3) make sure to error out when both are set.
>>
>> or something like that.  It is tempting to allow us to say
>>
>>     make USE_LIBPCRE=2
>>
>> but the existing builds are likely to be depending on "is it set to
>> anything? then use PCRE1" behaviour, so we unfortunately cannot take
>> that route.
>
> Yeah, that's the question I kinda had.
>
>> make USE_LIBPCRE=2
>
> I'd prefer a configure option for consistency. Maybe:
> ...

It is way too early to worry about how ./configure support for this
will look like to the end user.

Because our Makefile is designed to be usable without configure, the
order we do things will be to first decide how we are going to use
Makefile variables to configure the feature (i.e. what you saw that
I said to Ævar).

Once we know the decision, then we arrange autoconf to spit out the
chosen Makefile variables using --with-pcre or whatever input.  This
step cannot start before we know the decision of the former.  The
one who writes --with-pcre support in ./configure would not know if
USE_LIBPCRE=YesPlease or something else is the desired output until
then.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Very promising results with libpcre2
  2017-04-01 19:11       ` Ævar Arnfjörð Bjarmason
@ 2017-04-02  3:45         ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2017-04-02  3:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Jeff King, Jeffrey Walton

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I am very tempted though to support them in parallel, if only for ease
> of performance testing and to be able to roll out support for
> grep.patternType=perl meaning pcre1 for now, but add a
> grep.patternType=pcre2 for testing (and make grep.patternType=pcre1
> work, with grep.patternType=pcre being synonymous with
> grep.patternType=perl, i.e. whatever the default is).

Perhaps.  As long as the code doesn't get too ugly, I think we would
survive ;-)


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-04-02  3:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-31 21:23 Very promising results with libpcre2 Ævar Arnfjörð Bjarmason
2017-03-31 22:48 ` Junio C Hamano
2017-04-01  8:55   ` Ævar Arnfjörð Bjarmason
2017-04-01 18:24     ` Junio C Hamano
2017-04-01 19:11       ` Ævar Arnfjörð Bjarmason
2017-04-02  3:45         ` Junio C Hamano
2017-04-01 23:33       ` Jeffrey Walton
2017-04-02  3:45         ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.