All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Jeffrey Walton" <noloader@gmail.com>,
	"Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>,
	"J Smith" <dark.panda@gmail.com>,
	"Victor Leschuk" <vleschuk@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Fredrik Kuivinen" <frekui@gmail.com>,
	"Zoltán Herczeg" <hzmester@freemail.hu>,
	"Brandon Williams" <bmwill@google.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v4 16/19] grep: add support for the PCRE v1 JIT API
Date: Tue, 25 Apr 2017 21:05:45 +0000	[thread overview]
Message-ID: <20170425210548.24612-17-avarab@gmail.com> (raw)
In-Reply-To: <20170425210548.24612-1-avarab@gmail.com>

Change the grep PCRE v1 code to use JIT when available. When PCRE
support was initially added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into
8.20 released on 2011-10-21.

When JIT support is enabled the PCRE performance usually improves by
more than 50%. The pattern compilation times are relatively slower,
but those relative numbers are tiny, and are easily made back in all
but the most trivial cases of grep. Detailed benchhmarks are available
at: http://sljit.sourceforge.net/pcre.html

With this change the difference in a t/perf/p7820-grep-engines.sh run
is, shown with git --word-diff:

    7820.1: extended with how.to                               [-0.28(1.23+0.44)-]{+0.28(1.18+0.39)+}
    7820.2: extended with ^how to                              [-0.26(1.15+0.38)-]{+0.27(1.13+0.40)+}
    7820.3: extended with \w+our\w*                            [-6.06(38.44+0.35)-]{+6.11(38.66+0.32)+}
    7820.4: extended with -?-?-?-?-?-?-?-?-?-?-?-----------$   [-0.37(1.57+0.38)-]{+0.37(1.56+0.42)+}
    7820.5: pcre1 with how.to                                  [-0.26(1.15+0.37)-]{+0.19(0.39+0.55)+}
    7820.6: pcre1 with ^how to                                 [-0.46(2.66+0.31)-]{+0.22(0.67+0.44)+}
    7820.7: pcre1 with \w+our\w*                               [-16.42(99.42+0.48)-]{+0.51(3.05+0.24)+}
    7820.8: pcre1 with -?-?-?-?-?-?-?-?-?-?-?-----------$      [-81.52(275.37+0.41)-]{+5.16(19.31+0.33)+}

The conditional support for JIT is implemented as suggested in the
pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's
not present.

There's no graceful fallback if pcre_jit_stack_alloc() fails under
PCRE_CONFIG_JIT, instead the program will abort. I don't think this is
worth handling, it'll only fail in cases where malloc() doesn't work,
in which case we're screwed anyway.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 27 ++++++++++++++++++++++++++-
 grep.h |  5 +++++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index c61e69deaa..8e3ba164b5 100644
--- a/grep.c
+++ b/grep.c
@@ -329,6 +329,9 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	const char *error;
 	int erroffset;
 	int options = PCRE_MULTILINE;
+#ifdef PCRE_CONFIG_JIT
+	int canjit;
+#endif
 
 	if (opt->ignore_case) {
 		if (has_non_ascii(p->pattern))
@@ -343,9 +346,19 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	if (!p->pcre1_regexp)
 		compile_regexp_failed(p, error);
 
-	p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
+	p->pcre1_extra_info = pcre_study(p->pcre1_regexp, PCRE_STUDY_JIT_COMPILE, &error);
 	if (!p->pcre1_extra_info && error)
 		die("%s", error);
+
+#ifdef PCRE_CONFIG_JIT
+	pcre_config(PCRE_CONFIG_JIT, &canjit);
+	if (canjit == 1) {
+		p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
+		if (!p->pcre1_jit_stack)
+			die("BUG: Couldn't allocate PCRE JIT stack");
+		pcre_assign_jit_stack(p->pcre1_extra_info, NULL, p->pcre1_jit_stack);
+	}
+#endif
 }
 
 static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
@@ -356,8 +369,15 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 	if (eflags & REG_NOTBOL)
 		flags |= PCRE_NOTBOL;
 
+#ifdef PCRE_CONFIG_JIT
+	ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
+			    0, flags, ovector, ARRAY_SIZE(ovector),
+			    p->pcre1_jit_stack);
+#else
 	ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
 			0, flags, ovector, ARRAY_SIZE(ovector));
+#endif
+
 	if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
 		die("pcre_exec failed with error code %d", ret);
 	if (ret > 0) {
@@ -372,7 +392,12 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 static void free_pcre1_regexp(struct grep_pat *p)
 {
 	pcre_free(p->pcre1_regexp);
+#ifdef PCRE_CONFIG_JIT
+	pcre_free_study(p->pcre1_extra_info);
+	pcre_jit_stack_free(p->pcre1_jit_stack);
+#else
 	pcre_free(p->pcre1_extra_info);
+#endif
 	pcre_free((void *)p->pcre1_tables);
 }
 #else /* !USE_LIBPCRE1 */
diff --git a/grep.h b/grep.h
index fa2ab9485f..29e20bf837 100644
--- a/grep.h
+++ b/grep.h
@@ -3,9 +3,13 @@
 #include "color.h"
 #ifdef USE_LIBPCRE1
 #include <pcre.h>
+#ifndef PCRE_STUDY_JIT_COMPILE
+#define PCRE_STUDY_JIT_COMPILE 0
+#endif
 #else
 typedef int pcre;
 typedef int pcre_extra;
+typedef int pcre_jit_stack;
 #endif
 #include "kwset.h"
 #include "thread-utils.h"
@@ -48,6 +52,7 @@ struct grep_pat {
 	regex_t regexp;
 	pcre *pcre1_regexp;
 	pcre_extra *pcre1_extra_info;
+	pcre_jit_stack *pcre1_jit_stack;
 	const unsigned char *pcre1_tables;
 	kwset_t kws;
 	unsigned fixed:1;
-- 
2.11.0


  parent reply	other threads:[~2017-04-25 21:07 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-25 21:05 [PATCH v4 00/19] PCRE v1 improvements & PCRE v2 support Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 01/19] grep: amend submodule recursion test in preparation for rx engine testing Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 02/19] grep: add tests for grep pattern types being passed to submodules Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 03/19] grep: submodule-related case statements should die if new fields are added Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 04/19] grep: remove redundant regflags assignment under PCRE Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 05/19] grep: remove redundant `regflags &= ~REG_EXTENDED` assignments Ævar Arnfjörð Bjarmason
2017-04-26  4:50   ` Junio C Hamano
2017-04-26  5:29   ` Junio C Hamano
2017-04-26  7:48     ` [PATCH v5 " Ævar Arnfjörð Bjarmason
2017-04-27  0:51       ` Junio C Hamano
2017-04-25 21:05 ` [PATCH v4 06/19] Makefile & configure: reword outdated comment about PCRE Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 07/19] grep: add a test for backreferences in PCRE patterns Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 08/19] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 09/19] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 10/19] grep & rev-list doc: stop promising libpcre " Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 11/19] grep: make grep.patternType=[pcre|pcre1] a synonym for "perl" Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 12/19] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 13/19] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 14/19] grep: change the internal PCRE code & header " Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 15/19] perf: add a performance comparison test of grep -E and -P Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` Ævar Arnfjörð Bjarmason [this message]
2017-04-25 21:05 ` [PATCH v4 17/19] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 18/19] grep: remove support for concurrent use of both PCRE v1 & v2 Ævar Arnfjörð Bjarmason
2017-04-25 21:05 ` [PATCH v4 19/19] Makefile & configure: make PCRE v2 the default PCRE implementation Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170425210548.24612-17-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=bmwill@google.com \
    --cc=dark.panda@gmail.com \
    --cc=frekui@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hzmester@freemail.hu \
    --cc=michal.kiedrowicz@gmail.com \
    --cc=noloader@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=vleschuk@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.