From: Jeff King <peff@peff.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: git@vger.kernel.org
Subject: Re: git log filtering
Date: Thu, 8 Feb 2007 01:16:54 -0500
Message-ID: <20070208061654.GA8813@coredump.intra.peff.net>
In-Reply-To: <Pine.LNX.4.64.0702071334060.8424@woody.linux-foundation.org>

On Wed, Feb 07, 2007 at 01:53:18PM -0800, Linus Torvalds wrote:

> What's PCRE performance like? I'd hate to make "git grep" slower, and it 
> would be stupid and confusing to use two different regex libraries..
>
> Maybe somebody could test - afaik, PCRE has a regex-compatible (from a API 
> standpoint, not from a regex standpoint!) wrapper thing, and it might be 
> interesting to hear if doing "git grep" is slower or faster..

The patch is delightfully simple (though a real patch would probably be
conditional):

diff --git a/Makefile b/Makefile
index aca96c8..cf391dc 100644
--- a/Makefile
+++ b/Makefile
@@ -323,7 +323,7 @@ BUILTIN_OBJS = \
 	builtin-pack-refs.o
 
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
-EXTLIBS = -lz
+EXTLIBS = -lz -lpcreposix -lpcre
 
 #
 # Platform specific tweaks
diff --git a/git-compat-util.h b/git-compat-util.h
index c1bcb00..a6c77f9 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -40,7 +40,7 @@
 #include <sys/poll.h>
 #include <sys/socket.h>
 #include <assert.h>
-#include <regex.h>
+#include <pcreposix.h>
 #include <netinet/in.h>
 #include <netinet/tcp.h>
 #include <arpa/inet.h>
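
As a rough illustration of what "conditional" could mean here, the sketch below selects the header and link flags from a single switch. The function name and its "pcre" argument are hypothetical and are not existing git build options; a real patch would presumably express the same choice in the Makefile and in git-compat-util.h.

```shell
# Hypothetical helper (not part of git): print the regex header and the
# EXTLIBS value for the requested backend. Anything other than "pcre"
# falls back to the stock libc <regex.h>.
regex_build_settings() {
	if [ "${1:-}" = "pcre" ]; then
		# Same header and libraries as in the patch above.
		echo "header=pcreposix.h extlibs=-lz -lpcreposix -lpcre"
	else
		# Unpatched defaults.
		echo "header=regex.h extlibs=-lz"
	fi
}

regex_build_settings pcre
regex_build_settings glibc
```

The first call prints the patched settings and the second the unpatched ones, so the default build stays free of the extra library dependency.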


A few numbers, all from a fully packed kernel repository:

# glibc, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
10.07user 0.15system 0:10.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36617minor)pagefaults 0swaps

# glibc, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]'  >/dev/null
24.42user 0.15system 0:24.60elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36210minor)pagefaults 0swaps

# pcre, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
7.82user 0.12system 0:08.00elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36571minor)pagefaults 0swaps

# pcre, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]'  >/dev/null
36.51user 0.13system 0:36.65elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36583minor)pagefaults 0swaps
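
For a quick side-by-side, the user times above can be turned into pcre/glibc ratios. This is only a minimal sketch of the arithmetic; the `ratio` helper is a made-up name, and the inputs are the four user-time figures reported above.

```shell
# Hypothetical helper: print a/b rounded to two decimals.
ratio() {
	awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 7.82 10.07    # trivial regex, pcre vs glibc: prints 0.78
ratio 36.51 24.42   # complex regex, pcre vs glibc: prints 1.50
```

A ratio below 1 means pcre used less CPU time than glibc; above 1 means it used more. These are single runs, so the figures are indicative only.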


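So the winner seems to vary based on the complexity of the pattern: pcre
is faster on the trivial regex (7.82s vs. 10.07s user time) but slower on
the complex one (36.51s vs. 24.42s user time).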
There are some less rudimentary but non-git performance tests here:

  http://www.boost.org/libs/regex/doc/gcc-performance.html

In every case there, pcre has either comparable performance, or simply
blows away glibc.

One final note that caused some confusion during my testing: git-grep
still uses external grep for working tree greps (i.e., 'git grep foo').
This meant that 'git grep' and 'git grep --cached' produced wildly
different results once I was using pcre internally. Something to look
out for if we switch to pcre (or any other library which doesn't exactly
match external grep behavior!).
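
One plausible source of such mismatches is regex syntax: plain grep uses POSIX basic regular expressions, where '+' is an ordinary character, while PCRE treats it as a quantifier. The sketch below assumes that this is at least part of the difference, and uses `grep -E` as a stand-in for PCRE because extended regexes interpret this particular pattern the same way. The helper names are made up for the example.

```shell
# Sample input: one line with a repeated 'a', one with a literal '+'.
sample() { printf 'aab\na+b\n'; }

# Basic regex (plain grep): '+' is literal, so only "a+b" matches.
match_basic() { sample | grep 'a+b'; }

# Extended regex (grep -E): '+' means "one or more", as in PCRE,
# so only "aab" matches.
match_extended() { sample | grep -E 'a+b'; }

match_basic      # prints: a+b
match_extended   # prints: aab
```

The same pattern selects disjoint sets of lines under the two syntaxes, which is the kind of divergence to expect whenever the internal matcher and the external grep disagree about the pattern language.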

-Peff
