git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] grep: do not do external grep on skip-worktree entries
@ 2009-12-30 14:11 Nguyễn Thái Ngọc Duy
  2009-12-31  7:01 ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2009-12-30 14:11 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Skip-worktree entries are not on disk. There is no reason to call
external grep in such cases.

A trace message is also added to aid debugging external grep cases.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin-grep.c |   19 ++++++++++++++++++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/builtin-grep.c b/builtin-grep.c
index d582232..d49c637 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -488,17 +488,34 @@ static int grep_cache(struct grep_opt *opt, const char **paths, int cached,
 	read_cache();
 
 #if !NO_EXTERNAL_GREP
+	if (cached)
+		external_grep_allowed = 0;
+	if (external_grep_allowed) {
+		for (nr = 0; nr < active_nr; nr++) {
+			struct cache_entry *ce = active_cache[nr];
+			if (!S_ISREG(ce->ce_mode))
+				continue;
+			if (!pathspec_matches(paths, ce->name, opt->max_depth))
+				continue;
+			if (ce_skip_worktree(ce)) {
+				external_grep_allowed = 0;
+				break;
+			}
+		}
+	}
 	/*
 	 * Use the external "grep" command for the case where
 	 * we grep through the checked-out files. It tends to
 	 * be a lot more optimized
 	 */
-	if (!cached && external_grep_allowed) {
+	if (external_grep_allowed) {
 		hit = external_grep(opt, paths, cached);
 		if (hit >= 0)
 			return hit;
 		hit = 0;
 	}
+	else
+		trace_printf("grep: external grep not used\n");
 #endif
 
 	for (nr = 0; nr < active_nr; nr++) {
-- 
1.6.6.315.g1a406

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2009-12-30 14:11 [PATCH] grep: do not do external grep on skip-worktree entries Nguyễn Thái Ngọc Duy
@ 2009-12-31  7:01 ` Junio C Hamano
  2009-12-31  7:09   ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2009-12-31  7:01 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Skip-worktree entries are not on disk. There is no reason to call
> external grep in such cases.
>
> A trace message is also added to aid debugging external grep cases.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  builtin-grep.c |   19 ++++++++++++++++++-
>  1 files changed, 18 insertions(+), 1 deletions(-)
>
> diff --git a/builtin-grep.c b/builtin-grep.c
> index d582232..d49c637 100644
> --- a/builtin-grep.c
> +++ b/builtin-grep.c
> @@ -488,17 +488,34 @@ static int grep_cache(struct grep_opt *opt, const char **paths, int cached,
>  	read_cache();
>  
>  #if !NO_EXTERNAL_GREP
> +	if (cached)
> +		external_grep_allowed = 0;
> +	if (external_grep_allowed) {
> +		for (nr = 0; nr < active_nr; nr++) {
> +			struct cache_entry *ce = active_cache[nr];
> +			if (!S_ISREG(ce->ce_mode))
> +				continue;
> +			if (!pathspec_matches(paths, ce->name, opt->max_depth))
> +				continue;
> +			if (ce_skip_worktree(ce)) {
> +				external_grep_allowed = 0;
> +				break;
> +			}
> +		}
> +	}
>  	/*
>  	 * Use the external "grep" command for the case where
>  	 * we grep through the checked-out files. It tends to
>  	 * be a lot more optimized
>  	 */
> -	if (!cached && external_grep_allowed) {
> +	if (external_grep_allowed) {
>  		hit = external_grep(opt, paths, cached);
>  		if (hit >= 0)
>  			return hit;
>  		hit = 0;
>  	}

This looks a bit wrong for a couple of reasons:

 - external_grep() is designed to return negative without running external
   grep when it shouldn't be used (see the beginning of the function for
   how it refuses to run when opt->extended is set and other conditions).
   The new logic seems to belong there, i.e. "in addition to the existing
   case we decline, if ce_skip_worktree() entry exists in the cache, we
   decline";

 - It should be a separate helper function, has_skip_worktree_entry(paths);
   I wouldn't be surprised if there are some other codepaths that want to
   check for the same condition and do things differently, not just grep.

Originally I thought S_ISREG() check and pathspec_matches() check were
also questionable, but they actually are good things to have, as we skip
symbolic links (or submodules) and we do want to use external grep if the
subtree we are grepping in do not have skip_worktree entries even when
some other parts of the index are marked as skip_worktree.

> +	else
> +		trace_printf("grep: external grep not used\n");
>  #endif
>  
>  	for (nr = 0; nr < active_nr; nr++) {
> -- 
> 1.6.6.315.g1a406

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2009-12-31  7:01 ` Junio C Hamano
@ 2009-12-31  7:09   ` Junio C Hamano
  2010-01-02 11:50     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2009-12-31  7:09 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> This looks a bit wrong for a couple of reasons:
>
>  - external_grep() is designed to return negative without running external
>    grep when it shouldn't be used (see the beginning of the function for
>    how it refuses to run when opt->extended is set and other conditions).
>    The new logic seems to belong there, i.e. "in addition to the existing
>    case we decline, if ce_skip_worktree() entry exists in the cache, we
>    decline";

IOW, something like this instead of your patch.  You would want to tests
to demonstrate the original breakage, perhaps?

 builtin-grep.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/builtin-grep.c b/builtin-grep.c
index 813fe97..25ee75d 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -357,6 +357,21 @@ static void grep_add_color(struct strbuf *sb, const char *escape_seq)
 		strbuf_setlen(sb, sb->len - 1);
 }
 
+static int has_skip_worktree_entry(struct grep_opt *opt, const char **paths)
+{
+	int nr;
+	for (nr = 0; nr < active_nr; nr++) {
+		struct cache_entry *ce = active_cache[nr];
+		if (!S_ISREG(ce->ce_mode))
+			continue;
+		if (!pathspec_matches(paths, ce->name, opt->max_depth))
+			continue;
+		if (ce_skip_worktree(ce))
+			return 1;
+	}
+	return 0;
+}
+
 static int external_grep(struct grep_opt *opt, const char **paths, int cached)
 {
 	int i, nr, argc, hit, len, status;
@@ -365,7 +380,8 @@ static int external_grep(struct grep_opt *opt, const char **paths, int cached)
 	char *argptr = randarg;
 	struct grep_pat *p;
 
-	if (opt->extended || (opt->relative && opt->prefix_length))
+	if (opt->extended || (opt->relative && opt->prefix_length)
+	    || has_skip_worktree_entry(opt, paths))
 		return -1;
 	len = nr = 0;
 	push_arg("grep");

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2009-12-31  7:09   ` Junio C Hamano
@ 2010-01-02 11:50     ` Nguyen Thai Ngoc Duy
  2010-01-02 18:44       ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-01-02 11:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Dec 30, 2009 at 11:09:52PM -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > This looks a bit wrong for a couple of reasons:
> >
> >  - external_grep() is designed to return negative without running external
> >    grep when it shouldn't be used (see the beginning of the function for
> >    how it refuses to run when opt->extended is set and other conditions).
> >    The new logic seems to belong there, i.e. "in addition to the existing
> >    case we decline, if ce_skip_worktree() entry exists in the cache, we
> >    decline";
> 
> IOW, something like this instead of your patch.  You would want to tests
> to demonstrate the original breakage, perhaps?

Your patch works great. By the way I think we should move "cached"
check from grep_cache() into external_grep() too, for consistency.

I thought of tests when I wrote the patch, but it was hard to find a
reliable way to detect if a git build supports external grep. Perhaps
this on top of yours? The way to check for external grep support isn't
nice, but it works.


diff --git a/builtin-grep.c b/builtin-grep.c
index f093b60..bc4500a 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -222,6 +222,7 @@ static int exec_grep(int argc, const char **argv)
 	int status;
 
 	argv[argc] = NULL;
+	trace_argv_printf(argv, "trace: grep:");
 	pid = fork();
 	if (pid < 0)
 		return pid;
@@ -371,7 +372,7 @@ static int external_grep(struct grep_opt *opt, const char **paths, int cached)
 	struct grep_pat *p;
 
 	if (opt->extended || (opt->relative && opt->prefix_length)
-	    || has_skip_worktree_entry(opt, paths))
+	    || cached || has_skip_worktree_entry(opt, paths))
 		return -1;
 	len = nr = 0;
 	push_arg("grep");
@@ -509,7 +510,7 @@ static int grep_cache(struct grep_opt *opt, const char **paths, int cached,
 	 * we grep through the checked-out files. It tends to
 	 * be a lot more optimized
 	 */
-	if (!cached && external_grep_allowed) {
+	if (external_grep_allowed) {
 		hit = external_grep(opt, paths, cached);
 		if (hit >= 0)
 			return hit;
diff --git a/t/t7002-grep.sh b/t/t7002-grep.sh
index abd14bf..f77970c 100755
--- a/t/t7002-grep.sh
+++ b/t/t7002-grep.sh
@@ -8,6 +8,14 @@ test_description='git grep various.
 
 . ./test-lib.sh
 
+support_external_grep() {
+	case "$(git grep -h 2>&1 >/dev/null|grep -e --ext-grep)" in
+	*"(default)"*)  return 0;;
+	*"(ignored by this build)"*) return 1;;
+	*) test_expect_success 'External grep check is broken' 'false';;
+	esac
+}
+
 cat >hello.c <<EOF
 #include <stdio.h>
 int main(int argc, const char **argv)
@@ -426,4 +434,20 @@ test_expect_success 'grep -Fi' '
 	test_cmp expected actual
 '
 
+if support_external_grep; then
+
+test_expect_success 'external grep' '
+	GIT_TRACE=2 git grep foo >/dev/null 2>actual &&
+	grep "trace: grep:.*foo" actual >/dev/null
+'
+test_expect_success 'no external grep when skip-worktree entries exist' '
+	git update-index --skip-worktree file &&
+	GIT_TRACE=2 git grep foo >/dev/null 2>actual &&
+	! grep "trace: grep:" actual >/dev/null &&
+	git update-index --no-skip-worktree file
+'
+
+else
+	say "External grep tests skipped"
+fi
 test_done

-- 
Duy

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-02 11:50     ` Nguyen Thai Ngoc Duy
@ 2010-01-02 18:44       ` Junio C Hamano
  2010-01-02 19:15         ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2010-01-02 18:44 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On Wed, Dec 30, 2009 at 11:09:52PM -0800, Junio C Hamano wrote:
>> Junio C Hamano <gitster@pobox.com> writes:
>> 
>> > This looks a bit wrong for a couple of reasons:
>> >
>> >  - external_grep() is designed to return negative without running external
>> >    grep when it shouldn't be used (see the beginning of the function for
>> >    how it refuses to run when opt->extended is set and other conditions).
>> >    The new logic seems to belong there, i.e. "in addition to the existing
>> >    case we decline, if ce_skip_worktree() entry exists in the cache, we
>> >    decline";
>> 
>> IOW, something like this instead of your patch.  You would want to tests
>> to demonstrate the original breakage, perhaps?
>
> Your patch works great. By the way I think we should move "cached"
> check from grep_cache() into external_grep() too, for consistency.

I think what I gave you is more consistent.

"cached" is about "are we searching in the index or in the work tree?" and
"external_grep()" is about "it often is faster to run external grep when
we are searching in the work tree, so do that if the other constraints
allow us to".  An example of such a constraint is that we must be able to
express the operation on the command line of a traditional grep, and use
of git extended grep synatx makes it unfeasible, hence the function says
"I can't" in such a case.

You can change the definition of "external_grep()" to "try to use external
grep somewhere, if the other constraints allow us to", and have the caller
and the function do an "this time, please grep for this pattern in the
index---I can't" exchange, but I don't see much point.  We _know_ no
external grep will ever be able to read from the index.

We should simply drop "cached" argument from external_grep().

> I thought of tests when I wrote the patch, but it was hard to find a
> reliable way to detect if a git build supports external grep.

Ah, you are right.  I forgot about that issue.

> diff --git a/t/t7002-grep.sh b/t/t7002-grep.sh
> index abd14bf..f77970c 100755
> --- a/t/t7002-grep.sh
> +++ b/t/t7002-grep.sh
> @@ -8,6 +8,14 @@ test_description='git grep various.
>  
>  . ./test-lib.sh
>  
> +support_external_grep() {
> +	case "$(git grep -h 2>&1 >/dev/null|grep -e --ext-grep)" in
> +	*"(default)"*)  return 0;;
> +	*"(ignored by this build)"*) return 1;;
> +	*) test_expect_success 'External grep check is broken' 'false';;
> +	esac
> +}

Heh, clever.

	git grep -h 2>&1 | grep 'allow calling of grep.*default' >/dev/null

may be sufficient, though.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-02 18:44       ` Junio C Hamano
@ 2010-01-02 19:15         ` Nguyen Thai Ngoc Duy
  2010-01-02 19:45           ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-01-02 19:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 1/3/10, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>  > On Wed, Dec 30, 2009 at 11:09:52PM -0800, Junio C Hamano wrote:
>  >> Junio C Hamano <gitster@pobox.com> writes:
>  >>
>  >> > This looks a bit wrong for a couple of reasons:
>  >> >
>  >> >  - external_grep() is designed to return negative without running external
>  >> >    grep when it shouldn't be used (see the beginning of the function for
>  >> >    how it refuses to run when opt->extended is set and other conditions).
>  >> >    The new logic seems to belong there, i.e. "in addition to the existing
>  >> >    case we decline, if ce_skip_worktree() entry exists in the cache, we
>  >> >    decline";
>  >>
>  >> IOW, something like this instead of your patch.  You would want to tests
>  >> to demonstrate the original breakage, perhaps?
>  >
>  > Your patch works great. By the way I think we should move "cached"
>  > check from grep_cache() into external_grep() too, for consistency.
>
>
> I think what I gave you is more consistent.
>
>  "cached" is about "are we searching in the index or in the work tree?" and
>  "external_grep()" is about "it often is faster to run external grep when
>  we are searching in the work tree, so do that if the other constraints
>  allow us to".  An example of such a constraint is that we must be able to
>  express the operation on the command line of a traditional grep, and use
>  of git extended grep synatx makes it unfeasible, hence the function says
>  "I can't" in such a case.
>
>  You can change the definition of "external_grep()" to "try to use external
>  grep somewhere, if the other constraints allow us to", and have the caller
>  and the function do an "this time, please grep for this pattern in the
>  index---I can't" exchange, but I don't see much point.  We _know_ no
>  external grep will ever be able to read from the index.
>
>  We should simply drop "cached" argument from external_grep().

OK.

>  > diff --git a/t/t7002-grep.sh b/t/t7002-grep.sh
>  > index abd14bf..f77970c 100755
>  > --- a/t/t7002-grep.sh
>  > +++ b/t/t7002-grep.sh
>  > @@ -8,6 +8,14 @@ test_description='git grep various.
>  >
>  >  . ./test-lib.sh
>  >
>  > +support_external_grep() {
>  > +     case "$(git grep -h 2>&1 >/dev/null|grep -e --ext-grep)" in
>  > +     *"(default)"*)  return 0;;
>  > +     *"(ignored by this build)"*) return 1;;
>  > +     *) test_expect_success 'External grep check is broken' 'false';;
>  > +     esac
>  > +}
>
>
> Heh, clever.
>
>         git grep -h 2>&1 | grep 'allow calling of grep.*default' >/dev/null
>
>  may be sufficient, though.
>

Yes, until somebody changes help text in builtin-grep.c and all
external grep tests become disable. I wanted to catch that case too.

Or maybe just add another option --show-features to "git rev-parse"
(or extend git version string, is the version format fixed?) to list
all built-time features that a git build supports. Some comes to mind:
external-grep, iconv, ipv6, http (unlikely), threading... Much less
tricky to test.
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-02 19:15         ` Nguyen Thai Ngoc Duy
@ 2010-01-02 19:45           ` Junio C Hamano
  2010-01-03  2:35             ` Miles Bader
                               ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-02 19:45 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

>>  > +support_external_grep() {
>>  > +     case "$(git grep -h 2>&1 >/dev/null|grep -e --ext-grep)" in
>>  > +     *"(default)"*)  return 0;;
>>  > +     *"(ignored by this build)"*) return 1;;
>>  > +     *) test_expect_success 'External grep check is broken' 'false';;
>>  > +     esac
>>  > +}
>>
>>
>> Heh, clever.
>>
>>         git grep -h 2>&1 | grep 'allow calling of grep.*default' >/dev/null
>>
>>  may be sufficient, though.
>
> Yes, until somebody changes help text in builtin-grep.c and all
> external grep tests become disable. I wanted to catch that case too.

Ok, that is a worthwhile thing to do.

Then please at least make the second grep "grep ext-grep", droping "-e --"
from it.  We assume some implementation of external grep to lack "-e"
(e.g. Solaris).

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-02 19:45           ` Junio C Hamano
@ 2010-01-03  2:35             ` Miles Bader
  2010-01-03  2:47               ` Miles Bader
  2010-01-04 12:34             ` [PATCH 1/2] t7002: set test prerequisite "external-grep" if supported Nguyễn Thái Ngọc Duy
  2010-01-04 12:34             ` [PATCH 2/2] t7002: add tests for skip-worktree fixes in commit a67e281 Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 60+ messages in thread
From: Miles Bader @ 2010-01-03  2:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyen Thai Ngoc Duy, git

Junio C Hamano <gitster@pobox.com> writes:
> Then please at least make the second grep "grep ext-grep", droping "-e --"
> from it.  We assume some implementation of external grep to lack "-e"
> (e.g. Solaris).

Isn't "-e" a "classic" grep option tho?

-Miles

-- 
Consult, v.i. To seek another's disapproval of a course already decided on.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-03  2:35             ` Miles Bader
@ 2010-01-03  2:47               ` Miles Bader
  2010-01-03  3:08                 ` Miles Bader
  0 siblings, 1 reply; 60+ messages in thread
From: Miles Bader @ 2010-01-03  2:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyen Thai Ngoc Duy, git

On Sun, Jan 3, 2010 at 11:35 AM, Miles Bader <miles@gnu.org> wrote:
>> Then please at least make the second grep "grep ext-grep", droping "-e --"
>> from it.  We assume some implementation of external grep to lack "-e"
>> (e.g. Solaris).
>
> Isn't "-e" a "classic" grep option tho?

Hmm, a bit of googling, and it seems that while 7th Edition unix (as
classic as I get) had -e, solaris indeed doesn't....

-miles

-- 
Do not taunt Happy Fun Ball.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-03  2:47               ` Miles Bader
@ 2010-01-03  3:08                 ` Miles Bader
  2010-01-03 19:32                   ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Miles Bader @ 2010-01-03  3:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyen Thai Ngoc Duy, git

Miles Bader <miles@gnu.org> writes:
>> Isn't "-e" a "classic" grep option tho?
>
> Hmm, a bit of googling, and it seems that while 7th Edition unix (as
> classic as I get) had -e, solaris indeed doesn't....

Tho solaris does have it in /usr/xpg4/bin/grep...

Since it's a general attribute of solaris that the default (/usr/bin)
tools are horrible sysv things and the actual useful tools are in
e.g. /usr/xpg4/bin, maybe it would be better to just try and add that
directory to the path...?

-miles

-- 
Americans are broad-minded people.  They'll accept the fact that a person can
be an alcoholic, a dope fiend, a wife beater, and even a newspaperman, but if
a man doesn't drive, there is something wrong with him.  -- Art Buchwald

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-03  3:08                 ` Miles Bader
@ 2010-01-03 19:32                   ` Linus Torvalds
  2010-01-03 20:49                     ` Junio C Hamano
  2010-01-04  6:06                     ` Mike Hommey
  0 siblings, 2 replies; 60+ messages in thread
From: Linus Torvalds @ 2010-01-03 19:32 UTC (permalink / raw)
  To: Miles Bader; +Cc: Junio C Hamano, Nguyen Thai Ngoc Duy, git



On Sun, 3 Jan 2010, Miles Bader wrote:
> 
> Since it's a general attribute of solaris that the default (/usr/bin)
> tools are horrible sysv things and the actual useful tools are in
> e.g. /usr/xpg4/bin, maybe it would be better to just try and add that
> directory to the path...?

There is no generic way to make solaris sane, I'm afraid.

Everybody agrees that the "normal" Solaris tools are so horrible that they 
all set up some alternative. However, qutie often the preferred 
alternative is the GNU versions, mostly installed in /usr/local, and then 
they put that first in the path.

Which means that if you put /usr/xpg4/bin before other paths in your PATH, 
you'll totally break such systems, because now you get the (inferior) 
tools in xpg4 before the preferred tools in /usr/local. Or - this also 
happens - people end up installing their own versions in $HOME/bin, 
because the system admin is uncaring or incompetent.

In other words, there is no single normal or default Solaris installation 
that we can work around. The normal/default installation is so horrible 
that it's virtually never used in any environment where people actually 
have shell access and do development.

Don't ask me why. EVERYBODY knows that the default /usr/bin is total crap. 
Even Sun people know. But there's apparently some very deep-seated reason 
(probably not technical, but mental/historical) why they can't be fixed or 
replaced, probably relating to "make world" and the whole build system.

			Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-03 19:32                   ` Linus Torvalds
@ 2010-01-03 20:49                     ` Junio C Hamano
  2010-01-04  5:31                       ` Jeff King
  2010-01-04  6:06                     ` Mike Hommey
  1 sibling, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2010-01-03 20:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miles Bader, Nguyen Thai Ngoc Duy, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Which means that if you put /usr/xpg4/bin before other paths in your PATH, 
> you'll totally break such systems, because now you get the (inferior) 
> tools in xpg4 before the preferred tools in /usr/local. Or - this also 
> happens - people end up installing their own versions in $HOME/bin, 
> because the system admin is uncaring or incompetent.

The build allows you to define SANE_TOOL_PATH ("the tools found in here
are saner than the ones in /usr/bin or /bin" is what it means) and we
insert it just in front of /usr/bin or /bin in the original PATH (see
git_brokne_path_fix in git-sh-setup.sh).

I would call this the right thing (TM) or the best workaround we could do
under the constraints, depending on the mood.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-03 20:49                     ` Junio C Hamano
@ 2010-01-04  5:31                       ` Jeff King
  2010-01-04  5:52                         ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2010-01-04  5:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Miles Bader, Nguyen Thai Ngoc Duy, git

On Sun, Jan 03, 2010 at 12:49:15PM -0800, Junio C Hamano wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > Which means that if you put /usr/xpg4/bin before other paths in your PATH, 
> > you'll totally break such systems, because now you get the (inferior) 
> > tools in xpg4 before the preferred tools in /usr/local. Or - this also 
> > happens - people end up installing their own versions in $HOME/bin, 
> > because the system admin is uncaring or incompetent.
> 
> The build allows you to define SANE_TOOL_PATH ("the tools found in here
> are saner than the ones in /usr/bin or /bin" is what it means) and we
> insert it just in front of /usr/bin or /bin in the original PATH (see
> git_brokne_path_fix in git-sh-setup.sh).
> 
> I would call this the right thing (TM) or the best workaround we could do
> under the constraints, depending on the mood.

I agree that Solaris default tools are insane, but is there any reason
to munge the PATH for a single feature like external grep? Why not
EXTERNAL_GREP=/usr/xpg4/bin/grep (or /usr/local/bin/grep) in the
Makefile? Why not GIT_EXTERNAL_GREP=$HOME/bin/grep in the environment?

Obviously we still need SANE_TOOL_PATH for systems where the /usr/bin is
so crappy as to be unusable for our scripts. But surely we can do better
for individual tools where the user might have some more specific
preference about which tool he uses.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04  5:31                       ` Jeff King
@ 2010-01-04  5:52                         ` Junio C Hamano
  2010-01-04  6:44                           ` Jeff King
  2010-01-04 10:14                           ` Nguyen Thai Ngoc Duy
  0 siblings, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-04  5:52 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Linus Torvalds, Miles Bader, Nguyen Thai Ngoc Duy, git

Jeff King <peff@peff.net> writes:

> On Sun, Jan 03, 2010 at 12:49:15PM -0800, Junio C Hamano wrote:
>
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>> 
>> > Which means that if you put /usr/xpg4/bin before other paths in your PATH, 
>> > you'll totally break such systems, because now you get the (inferior) 
>> > tools in xpg4 before the preferred tools in /usr/local. Or - this also 
>> > happens - people end up installing their own versions in $HOME/bin, 
>> > because the system admin is uncaring or incompetent.
>> 
>> The build allows you to define SANE_TOOL_PATH ("the tools found in here
>> are saner than the ones in /usr/bin or /bin" is what it means) and we
>> insert it just in front of /usr/bin or /bin in the original PATH (see
>> git_brokne_path_fix in git-sh-setup.sh).
>> 
>> I would call this the right thing (TM) or the best workaround we could do
>> under the constraints, depending on the mood.
>
> I agree that Solaris default tools are insane, but is there any reason
> to munge the PATH for a single feature like external grep? Why not
> EXTERNAL_GREP=/usr/xpg4/bin/grep (or /usr/local/bin/grep) in the
> Makefile? Why not GIT_EXTERNAL_GREP=$HOME/bin/grep in the environment?

That git-sh-setup "fix" is not for running external grep.  It is for our
scripted Porcelains that rely on working basic tools (sed, tr, who knows
what else is broken).

In fact, our Makefile by default punts on external grep on Sun's.  Run
"git grep NO_EXTERNAL_GREP -- Makefile" to see for yourself --- it would
work even on Solaris ;-)

> Obviously we still need SANE_TOOL_PATH for systems where the /usr/bin is
> so crappy as to be unusable for our scripts. But surely we can do better
> for individual tools where the user might have some more specific
> preference about which tool he uses.
>
> -Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-03 19:32                   ` Linus Torvalds
  2010-01-03 20:49                     ` Junio C Hamano
@ 2010-01-04  6:06                     ` Mike Hommey
  2010-01-04  7:04                       ` Jeff King
  1 sibling, 1 reply; 60+ messages in thread
From: Mike Hommey @ 2010-01-04  6:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miles Bader, Junio C Hamano, Nguyen Thai Ngoc Duy, git

On Sun, Jan 03, 2010 at 11:32:24AM -0800, Linus Torvalds wrote:
> 
> 
> On Sun, 3 Jan 2010, Miles Bader wrote:
> > 
> > Since it's a general attribute of solaris that the default (/usr/bin)
> > tools are horrible sysv things and the actual useful tools are in
> > e.g. /usr/xpg4/bin, maybe it would be better to just try and add that
> > directory to the path...?
> 
> There is no generic way to make solaris sane, I'm afraid.
> 
> Everybody agrees that the "normal" Solaris tools are so horrible that they 
> all set up some alternative. However, qutie often the preferred 
> alternative is the GNU versions, mostly installed in /usr/local, and then 
> they put that first in the path.

Or /usr/sfw, where Sun themselves ship some GNU tools

> Which means that if you put /usr/xpg4/bin before other paths in your PATH, 
> you'll totally break such systems, because now you get the (inferior) 
> tools in xpg4 before the preferred tools in /usr/local. Or - this also 
> happens - people end up installing their own versions in $HOME/bin, 
> because the system admin is uncaring or incompetent.
> 
> In other words, there is no single normal or default Solaris installation 
> that we can work around. The normal/default installation is so horrible 
> that it's virtually never used in any environment where people actually 
> have shell access and do development.
> 
> Don't ask me why. EVERYBODY knows that the default /usr/bin is total crap. 
> Even Sun people know. But there's apparently some very deep-seated reason 
> (probably not technical, but mental/historical) why they can't be fixed or 
> replaced, probably relating to "make world" and the whole build system.

There is simply no explanation for these tools to be that crappy at all.
The most common reason that is given for these tools to stay as crappy as
they always have been is for backwards compatibility. But how does that
prevent from adding features is beyond me. And on the other hand, they
*do* add features to their /usr/bin tools, such as -h in (at least) df
and ls.

Anyways, why not generate the hash-bang lines of the shell scripts from a
Makefile variable that would be set to /usr/xpg4/bin/sh on Solaris and
/bin/sh on others ?

Mike

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04  5:52                         ` Junio C Hamano
@ 2010-01-04  6:44                           ` Jeff King
  2010-01-04  7:08                             ` Junio C Hamano
  2010-01-04 15:54                             ` Linus Torvalds
  2010-01-04 10:14                           ` Nguyen Thai Ngoc Duy
  1 sibling, 2 replies; 60+ messages in thread
From: Jeff King @ 2010-01-04  6:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Miles Bader, Nguyen Thai Ngoc Duy, git

On Sun, Jan 03, 2010 at 09:52:10PM -0800, Junio C Hamano wrote:

> > I agree that Solaris default tools are insane, but is there any reason
> > to munge the PATH for a single feature like external grep? Why not
> > EXTERNAL_GREP=/usr/xpg4/bin/grep (or /usr/local/bin/grep) in the
> > Makefile? Why not GIT_EXTERNAL_GREP=$HOME/bin/grep in the environment?
> 
> That git-sh-setup "fix" is not for running external grep.  It is for our
> scripted Porcelains that rely on working basic tools (sed, tr, who knows
> what else is broken).

Right, but I thought this thread was about external grep, and I thought
you were saying "if you want decent tools, you can use SANE_TOOL_PATH".
And I think we can do much better for that particular case than
recommending SANE_TOOL_PATH (but it seems that is not what you were
actually recommending).

But I admit, I have never really wanted to specify my own external grep.
Wanting your own grep for _features_ is probably insane, as some of your
greps (on worktree files) will use the external grep, and some (on
cached files) will not.  So it is really just an optimization, and I
have never felt it so slow that I cared about messing with an
alternative grep on Solaris.

I have to wonder, though...did anybody ever actually profile our
internal grep to find out _why_ it was so much slower than GNU grep?
Could we simply ship a better grep engine and obsolete external grep?

> In fact, our Makefile by default punts on external grep on Sun's.  Run
> "git grep NO_EXTERNAL_GREP -- Makefile" to see for yourself --- it would
> work even on Solaris ;-)

Yes, I am even mentioned in the commit log of 01ae841c. :)

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04  6:06                     ` Mike Hommey
@ 2010-01-04  7:04                       ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2010-01-04  7:04 UTC (permalink / raw)
  To: Mike Hommey
  Cc: Linus Torvalds, Miles Bader, Junio C Hamano, Nguyen Thai Ngoc Duy, git

On Mon, Jan 04, 2010 at 07:06:46AM +0100, Mike Hommey wrote:

> Anyways, why not generate the hash-bang lines of the shell scripts from a
> Makefile variable that would be set to /usr/xpg4/bin/sh on Solaris and
> /bin/sh on others ?

We do that already (though we default it to /bin/bash). SANE_TOOL_PATH
is about picking up all the _other_ tools for use inside our shell
scripts, like non-crappy sed, grep, etc.

Brandon did some testing with ksh and sent a patch to default to the
stock ksh on Solaris and IRIX, which is probably saner than bash for a
default, but it seems to have gotten dropped:

  http://article.gmane.org/gmane.comp.version-control.git/129830

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04  6:44                           ` Jeff King
@ 2010-01-04  7:08                             ` Junio C Hamano
  2010-01-04  7:14                               ` Junio C Hamano
  2010-01-04  7:26                               ` Jeff King
  2010-01-04 15:54                             ` Linus Torvalds
  1 sibling, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-04  7:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Miles Bader, Nguyen Thai Ngoc Duy, git

Jeff King <peff@peff.net> writes:

> I have to wonder, though...did anybody ever actually profile our
> internal grep to find out _why_ it was so much slower than GNU grep?

I vaguely recall that somebody fairly competent mentioned that modern grep
implementations are based on DFA engines, but I offhand don't remember if
the discussion had concrete numbers.

> Could we simply ship a better grep engine and obsolete external grep?

Yes, that is a very constructive longer term solution.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04  7:08                             ` Junio C Hamano
@ 2010-01-04  7:14                               ` Junio C Hamano
  2010-01-04  7:29                                 ` Jeff King
  2010-01-04  7:26                               ` Jeff King
  1 sibling, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2010-01-04  7:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Linus Torvalds, Miles Bader, Nguyen Thai Ngoc Duy, git

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> I have to wonder, though...did anybody ever actually profile our
>> internal grep to find out _why_ it was so much slower than GNU grep?
>
> I vaguely recall that somebody fairly competent mentioned that modern grep
> implementations are based on DFA engines, but I offhand don't remember if
> the discussion had concrete numbers.

I found only this one.  The author was indeed somebody fairly competent.

    http://thread.gmane.org/gmane.comp.version-control.git/23798

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04  7:08                             ` Junio C Hamano
  2010-01-04  7:14                               ` Junio C Hamano
@ 2010-01-04  7:26                               ` Jeff King
  2010-01-04  8:09                                 ` Jeff King
  1 sibling, 1 reply; 60+ messages in thread
From: Jeff King @ 2010-01-04  7:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Miles Bader, Nguyen Thai Ngoc Duy, git

On Sun, Jan 03, 2010 at 11:08:46PM -0800, Junio C Hamano wrote:

> > I have to wonder, though...did anybody ever actually profile our
> > internal grep to find out _why_ it was so much slower than GNU grep?
> 
> I vaguely recall that somebody fairly competent mentioned that modern grep
> implementations are based on DFA engines, but I offhand don't remember if
> the discussion had concrete numbers.

Probably this:

  http://article.gmane.org/gmane.comp.version-control.git/41685

Also of interest is:

  http://article.gmane.org/gmane.comp.version-control.git/50174

The pcre analysis there came from just using the "pcreposix" header, I
think. From my limited research, modern pcre may have some tuning
options (including a DFA engine!) that could do a lot better.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04  7:14                               ` Junio C Hamano
@ 2010-01-04  7:29                                 ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2010-01-04  7:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Miles Bader, Nguyen Thai Ngoc Duy, git

On Sun, Jan 03, 2010 at 11:14:31PM -0800, Junio C Hamano wrote:

> > I vaguely recall that somebody fairly competent mentioned that modern grep
> > implementations are based on DFA engines, but I offhand don't remember if
> > the discussion had concrete numbers.
> 
> I found only this one.  The author was indeed somebody fairly competent.
> 
>     http://thread.gmane.org/gmane.comp.version-control.git/23798

Er, that link seems to be about internal grep beating external on a cold
cache because it does less I/O. Did you mean something else (maybe the
thread I pointed to in another message)?

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04  7:26                               ` Jeff King
@ 2010-01-04  8:09                                 ` Jeff King
  2010-01-04 16:01                                   ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2010-01-04  8:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Miles Bader, Nguyen Thai Ngoc Duy, git

On Mon, Jan 04, 2010 at 02:26:59AM -0500, Jeff King wrote:

> The pcre analysis there came from just using the "pcreposix" header, I
> think. From my limited research, modern pcre may have some tuning
> options (including a DFA engine!) that could do a lot better.

Hmm. I was able to get some improvements by using pcre_dfa_exec, but
still not as good as external grep. For "git grep 'foo.*bar'" in the
linux-2.6 repo, I got:

  external grep: 0.76s
  pcre_dfa_exec: 1.85s
      pcre_exec: 3.21s
          glibc: 4.00s

However, gprof reports that for the pcre dfa case, we spend more time in
grep.c:end_of_line than we do actually running the regex. So clearly
there are some other micro-optimizations in GNU grep that are making a
difference, too.

By the way, you can see the abysmal performance of our internal code by
doing a "git grep foo". It uses the "fixed" internal engine and weighs
in at 3.24s on the same machine, _slower_ than pcre doing an actual
regex.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04  5:52                         ` Junio C Hamano
  2010-01-04  6:44                           ` Jeff King
@ 2010-01-04 10:14                           ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 60+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-01-04 10:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Linus Torvalds, Miles Bader, git

On Mon, Jan 4, 2010 at 12:52 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> I agree that Solaris default tools are insane, but is there any reason
>> to munge the PATH for a single feature like external grep? Why not
>> EXTERNAL_GREP=/usr/xpg4/bin/grep (or /usr/local/bin/grep) in the
>> Makefile? Why not GIT_EXTERNAL_GREP=$HOME/bin/grep in the environment?
>
> That git-sh-setup "fix" is not for running external grep.  It is for our
> scripted Porcelains that rely on working basic tools (sed, tr, who knows
> what else is broken).
>
> In fact, our Makefile by default punts on external grep on Sun's.  Run
> "git grep NO_EXTERNAL_GREP -- Makefile" to see for yourself --- it would
> work even on Solaris ;-)

A bit off-topic. But it seems to me on linux (main development
platform?) GNU grep may be tested more than the builtin grep because
NO_EXTERNAL_GREP would be undefined by default. Should we test both
greps in that case?
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 1/2] t7002: set test prerequisite "external-grep" if supported
  2010-01-02 19:45           ` Junio C Hamano
  2010-01-03  2:35             ` Miles Bader
@ 2010-01-04 12:34             ` Nguyễn Thái Ngọc Duy
  2010-01-07  2:37               ` Junio C Hamano
  2010-01-04 12:34             ` [PATCH 2/2] t7002: add tests for skip-worktree fixes in commit a67e281 Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 60+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-01-04 12:34 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Nguyễn Thái Ngọc Duy

Add another test to set prerequisite "external-grep" if the current
build supports external grep. This can be used to skip external grep
only tests on builds that do not support this optimization.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 t/t7002-grep.sh |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/t/t7002-grep.sh b/t/t7002-grep.sh
index abd14bf..ffda0df 100755
--- a/t/t7002-grep.sh
+++ b/t/t7002-grep.sh
@@ -8,6 +8,18 @@ test_description='git grep various.
 
 . ./test-lib.sh
 
+test_expect_success 'Check for external grep support' '
+	case "$(git grep -h 2>&1|grep ext-grep)" in
+	*"(default)"*)
+		test_set_prereq external-grep
+		true;;
+	*"(ignored by this build)"*)
+		true;;
+	*)
+		false;;
+	esac
+'
+
 cat >hello.c <<EOF
 #include <stdio.h>
 int main(int argc, const char **argv)
-- 
1.6.6.315.g1a406

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 2/2] t7002: add tests for skip-worktree fixes in commit a67e281
  2010-01-02 19:45           ` Junio C Hamano
  2010-01-03  2:35             ` Miles Bader
  2010-01-04 12:34             ` [PATCH 1/2] t7002: set test prerequisite "external-grep" if supported Nguyễn Thái Ngọc Duy
@ 2010-01-04 12:34             ` Nguyễn Thái Ngọc Duy
  2 siblings, 0 replies; 60+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-01-04 12:34 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin-grep.c  |    1 +
 t/t7002-grep.sh |   12 ++++++++++++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/builtin-grep.c b/builtin-grep.c
index f093b60..59c4b12 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -222,6 +222,7 @@ static int exec_grep(int argc, const char **argv)
 	int status;
 
 	argv[argc] = NULL;
+	trace_argv_printf(argv, "trace: grep:");
 	pid = fork();
 	if (pid < 0)
 		return pid;
diff --git a/t/t7002-grep.sh b/t/t7002-grep.sh
index ffda0df..ac0a658 100755
--- a/t/t7002-grep.sh
+++ b/t/t7002-grep.sh
@@ -438,4 +438,16 @@ test_expect_success 'grep -Fi' '
 	test_cmp expected actual
 '
 
+test_expect_success external-grep 'external grep is called' '
+	GIT_TRACE=2 git grep foo >/dev/null 2>actual &&
+	grep "trace: grep:.*foo" actual >/dev/null
+'
+
+test_expect_success external-grep 'no external grep when skip-worktree entries exist' '
+	git update-index --skip-worktree file &&
+	GIT_TRACE=2 git grep foo >/dev/null 2>actual &&
+	! grep "trace: grep:" actual >/dev/null &&
+	git update-index --no-skip-worktree file
+'
+
 test_done
-- 
1.6.6.315.g1a406

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04  6:44                           ` Jeff King
  2010-01-04  7:08                             ` Junio C Hamano
@ 2010-01-04 15:54                             ` Linus Torvalds
  2010-01-04 15:57                               ` Miles Bader
  2010-01-04 16:24                               ` Linus Torvalds
  1 sibling, 2 replies; 60+ messages in thread
From: Linus Torvalds @ 2010-01-04 15:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Miles Bader, Nguyen Thai Ngoc Duy, git



On Mon, 4 Jan 2010, Jeff King wrote:
> 
> I have to wonder, though...did anybody ever actually profile our
> internal grep to find out _why_ it was so much slower than GNU grep?
> Could we simply ship a better grep engine and obsolete external grep?

The internal grep is about 2.5 times slower than the external one for me. 
That's a big deal:

 - external grep:

	[torvalds@nehalem linux]$ time git grep qwerty
	...
	real	0m0.412s
	user	0m0.196s
	sys	0m0.132s

 - NO_EXTERNAL_GREP:

	[torvalds@nehalem linux]$ time ~/git/git grep qwerty
	...
	real	0m1.006s
	user	0m0.900s
	sys	0m0.096s

so that's not even close.

And "perf record" followed by "perf report" on the internal one shows 
that it's not even regexec() - we use strstr() for the trivial case:

    43.63%      git  /home/torvalds/git/git         [.] grep_buffer_1
    25.19%      git  /lib64/libc-2.11.so            [.] __strstr_sse42
     9.16%      git  /home/torvalds/git/git         [.] match_one_pattern
     4.79%      git  /lib64/libc-2.11.so            [.] __m128i_strloadu

bit it seems to be all that line-per-line crud. If we got rid of that one, 
and could do the match as a _single_ regexec() instead (at least for the 
trivial cases of just one grep expression), perhaps we'd be better off.

			Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04 15:54                             ` Linus Torvalds
@ 2010-01-04 15:57                               ` Miles Bader
  2010-01-04 16:03                                 ` Linus Torvalds
  2010-01-04 16:24                               ` Linus Torvalds
  1 sibling, 1 reply; 60+ messages in thread
From: Miles Bader @ 2010-01-04 15:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Junio C Hamano, Nguyen Thai Ngoc Duy, git

On Tue, Jan 5, 2010 at 12:54 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> And "perf record" followed by "perf report" on the internal one shows
> that it's not even regexec() - we use strstr() for the trivial case:

Does strstr use e.g. boyer-moore?  I imagine grep does...

-miles

-- 
Do not taunt Happy Fun Ball.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04  8:09                                 ` Jeff King
@ 2010-01-04 16:01                                   ` Linus Torvalds
  0 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2010-01-04 16:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Miles Bader, Nguyen Thai Ngoc Duy, git



On Mon, 4 Jan 2010, Jeff King wrote:
> 
> However, gprof reports that for the pcre dfa case, we spend more time in
> grep.c:end_of_line than we do actually running the regex. So clearly
> there are some other micro-optimizations in GNU grep that are making a
> difference, too.

Don't use gprof. You're _much_ better off using the newish Linux 'perf' 
tool. It's quite competent, and doesn't need the code to be compiled with 
-pg (which totally changes all performance characteristics).

Do something like this:

	perf record git grep qwerty

followed by

	perf report
	perf annotate grep_buffer_1

(that "perf report" gives a per-symbol overview, the "perf annotate" gives 
a disassembly with source annotations and per-instruction costs). It works 
with inlining too, so you get things like this:

	...
         :      static char *end_of_line(char *cp, unsigned long *left)
         :      {
         :              unsigned long l = *left;
         :              while (l && *cp != '\n') {
   24.76 :        476a50:       80 3b 0a                cmpb   $0xa,(%rbx)
   10.46 :        476a53:       0f 84 e7 00 00 00       je     476b40 <grep_buffer_1+0x1b0>
         :                      l--;
         :                      cp++;
   21.19 :        476a59:       48 83 c3 01             add    $0x1,%rbx
         :      }
         :
         :      static char *end_of_line(char *cp, unsigned long *left)
         :      {
         :              unsigned long l = *left;
         :              while (l && *cp != '\n') {
    0.94 :        476a5d:       49 83 ed 01             sub    $0x1,%r13
    4.85 :        476a61:       75 ed                   jne    476a50 <grep_buffer_1+0xc0>
         :
	...

and yes, it's all the per-line crap.

The perf tools are included with modern kernels in tools/perf (which also 
has a Documentation subdirectory). I can pretty much guarantee that once 
you start using it, you'll never use gprof or oprofile again.

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04 15:57                               ` Miles Bader
@ 2010-01-04 16:03                                 ` Linus Torvalds
  2010-01-11  6:39                                   ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2010-01-04 16:03 UTC (permalink / raw)
  To: Miles Bader; +Cc: Jeff King, Junio C Hamano, Nguyen Thai Ngoc Duy, git



On Tue, 5 Jan 2010, Miles Bader wrote:

> On Tue, Jan 5, 2010 at 12:54 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > And "perf record" followed by "perf report" on the internal one shows
> > that it's not even regexec() - we use strstr() for the trivial case:
> 
> Does strstr use e.g. boyer-moore?  I imagine grep does...

It doesn't matter. Since we do the line-by-line thing, the input is always 
so short that DFA vs NFA vs BM vs other-clever-search doesn't matter. 
There is no scaling - the grep buffer tends to be too small for the 
algorithm to matter.

And the reason we do things line-by-line is that we need to then output 
things line-per-line.

			Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04 15:54                             ` Linus Torvalds
  2010-01-04 15:57                               ` Miles Bader
@ 2010-01-04 16:24                               ` Linus Torvalds
  1 sibling, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2010-01-04 16:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Miles Bader, Nguyen Thai Ngoc Duy, git



On Mon, 4 Jan 2010, Linus Torvalds wrote:
> 
>  - external grep:
> 
> 	[torvalds@nehalem linux]$ time git grep qwerty
> 	...
> 	real	0m0.412s
> 	user	0m0.196s
> 	sys	0m0.132s
> 
>  - NO_EXTERNAL_GREP:
> 
> 	[torvalds@nehalem linux]$ time ~/git/git grep qwerty
> 	...
> 	real	0m1.006s
> 	user	0m0.900s
> 	sys	0m0.096s
> 
> so that's not even close.

Side note: at least for me, if we did some auto-parallelization, the 
internal grep would make up for all its other suckiness. Do four or eight 
greps in parallel, and buffer the results (you still need to show them in 
the right order).

That might be an acceptable way to "fix" it. Developers pretty much all 
have at least two cores these days, some of us have four+HT. We use 
threads in other places, maybe this could be one more of them.

(Start 'n' threads, do an initial per-thread regex and 'regcomp()' to make 
it thread-safer, and the only interesting issue would be serializing the 
output. Whenever you get a result, you'd need to make sure that all files 
before have been completed, but you could do that all under a specific 
lock that protects completion information).

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/2] t7002: set test prerequisite "external-grep" if supported
  2010-01-04 12:34             ` [PATCH 1/2] t7002: set test prerequisite "external-grep" if supported Nguyễn Thái Ngọc Duy
@ 2010-01-07  2:37               ` Junio C Hamano
  2010-01-07  4:29                 ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2010-01-07  2:37 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Add another test to set prerequisite "external-grep" if the current
> build supports external grep. This can be used to skip external grep
> only tests on builds that do not support this optimization.

Thanks.  We seem to spell our prerequistes in a single-word, all-caps, so
I'll change this new one to EXTGREP in both [1/2] and [2/2].

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/2] t7002: set test prerequisite "external-grep" if supported
  2010-01-07  2:37               ` Junio C Hamano
@ 2010-01-07  4:29                 ` Junio C Hamano
  2010-01-07 13:27                   ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2010-01-07  4:29 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> Add another test to set prerequisite "external-grep" if the current
>> build supports external grep. This can be used to skip external grep
>> only tests on builds that do not support this optimization.
>
> Thanks.  We seem to spell our prerequistes in a single-word, all-caps, so
> I'll change this new one to EXTGREP in both [1/2] and [2/2].

Sorry, but I had this "Sheesh, why didn't I think of that earlier before
wasting Nguyễn's time" moment.

Why don't we just test what we _want to_ test?  After all what a67e281
(grep: do not do external grep on skip-worktree entries, 2009-12-30)
wanted to make sure was this:

    "git grep" (without --cached) should grep from the index for paths
    that are marked as skip-worktree.

So how about writing some string that does not appear in the version in
the index in the work tree file, and run "git grep" to make sure it
doesn't find it?

Yes, some implementations/builds of "git grep" may not even try to cheat
and run external grep and for them the test _should_ succeed (but your
logic to check with ce_skip_worktree() in grep_cache() may be broken by
later patch while you are looking the other way), and some will try to
cheat and the fix was about not letting them.

So by writing the test to check the desired outcome, instead of writing it
for the particular implementation of using external grep optimization, you
will catch both kinds of breakages.

Perhaps something like this (untested, of course)?

test_expect_success 'strings in work tree files are not found for skip-wt paths' '
	no="no such string in the index" &&
	test_must_fail git grep -e "$no" --cached file &&
	git update-index --skip-worktree file &&
	echo "$no" >file &&
	test_must_fail git grep -e "$no" file &&
	git update-index --no-skip-worktree file &&
	git grep -e "$no" file
'

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/2] t7002: set test prerequisite "external-grep" if  supported
  2010-01-07  4:29                 ` Junio C Hamano
@ 2010-01-07 13:27                   ` Nguyen Thai Ngoc Duy
  2010-01-07 14:04                     ` Johannes Sixt
  0 siblings, 1 reply; 60+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-01-07 13:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 1/7/10, Junio C Hamano <gitster@pobox.com> wrote:
>  So by writing the test to check the desired outcome, instead of writing it
>  for the particular implementation of using external grep optimization, you
>  will catch both kinds of breakages.
>
>  Perhaps something like this (untested, of course)?
>
>  test_expect_success 'strings in work tree files are not found for skip-wt paths' '
>         no="no such string in the index" &&
>         test_must_fail git grep -e "$no" --cached file &&
>         git update-index --skip-worktree file &&
>         echo "$no" >file &&
>         test_must_fail git grep -e "$no" file &&
>         git update-index --no-skip-worktree file &&
>         git grep -e "$no" file
>  '
>

Very well reasoned. I'd say go for it!

Tested-by: me
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/2] t7002: set test prerequisite "external-grep" if  supported
  2010-01-07 13:27                   ` Nguyen Thai Ngoc Duy
@ 2010-01-07 14:04                     ` Johannes Sixt
  2010-01-07 14:26                       ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 60+ messages in thread
From: Johannes Sixt @ 2010-01-07 14:04 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, git

Nguyen Thai Ngoc Duy schrieb:
> On 1/7/10, Junio C Hamano <gitster@pobox.com> wrote:
>>  So by writing the test to check the desired outcome, instead of writing it
>>  for the particular implementation of using external grep optimization, you
>>  will catch both kinds of breakages.
>>
>>  Perhaps something like this (untested, of course)?
>>
>>  test_expect_success 'strings in work tree files are not found for skip-wt paths' '
>>         no="no such string in the index" &&
>>         test_must_fail git grep -e "$no" --cached file &&
>>         git update-index --skip-worktree file &&
>>         echo "$no" >file &&
>>         test_must_fail git grep -e "$no" file &&
>>         git update-index --no-skip-worktree file &&
>>         git grep -e "$no" file
>>  '
>>
> 
> Very well reasoned. I'd say go for it!
> 
> Tested-by: me

The test is not quite complete. Not only do you want to test that the
worktree file is not looked at, but that the index version is used:


test_expect_success 'for skip-wt paths, strings are found in index, not in
worktree' '
	yes="this string is in the index" &&
	no="no such string in the index" &&
	echo "$yes" >file &&
	git update-index file &&
	echo "$no" >file &&
	git grep -e "$yes" --cached file &&
	test_must_fail git grep -e "$no" --cached file &&
	git update-index --skip-worktree file &&
	git grep -e "$yes" file &&
	test_must_fail git grep -e "$no" file &&
	git update-index --no-skip-worktree file &&
	test_must_fail git grep -e "$yes" file &&
	git grep -e "$no" file
'

Just as untested... ;)

-- Hannes

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/2] t7002: set test prerequisite "external-grep" if  supported
  2010-01-07 14:04                     ` Johannes Sixt
@ 2010-01-07 14:26                       ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 60+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-01-07 14:26 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git

On 1/7/10, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Nguyen Thai Ngoc Duy schrieb:
>
> > On 1/7/10, Junio C Hamano <gitster@pobox.com> wrote:
>  >>  So by writing the test to check the desired outcome, instead of writing it
>  >>  for the particular implementation of using external grep optimization, you
>  >>  will catch both kinds of breakages.
>  >>
>  >>  Perhaps something like this (untested, of course)?
>  >>
>  >>  test_expect_success 'strings in work tree files are not found for skip-wt paths' '
>  >>         no="no such string in the index" &&
>  >>         test_must_fail git grep -e "$no" --cached file &&
>  >>         git update-index --skip-worktree file &&
>  >>         echo "$no" >file &&
>  >>         test_must_fail git grep -e "$no" file &&
>  >>         git update-index --no-skip-worktree file &&
>  >>         git grep -e "$no" file
>  >>  '
>  >>
>  >
>  > Very well reasoned. I'd say go for it!
>  >
>  > Tested-by: me
>
>
> The test is not quite complete. Not only do you want to test that the
>  worktree file is not looked at, but that the index version is used:
>
>
>  test_expect_success 'for skip-wt paths, strings are found in index, not in
>  worktree' '
>         yes="this string is in the index" &&
>
>         no="no such string in the index" &&
>
>         echo "$yes" >file &&
>         git update-index file &&
>         echo "$no" >file &&
>         git grep -e "$yes" --cached file &&
>
>         test_must_fail git grep -e "$no" --cached file &&
>         git update-index --skip-worktree file &&
>
>         git grep -e "$yes" file &&
>
>         test_must_fail git grep -e "$no" file &&
>         git update-index --no-skip-worktree file &&
>
>         test_must_fail git grep -e "$yes" file &&
>
>         git grep -e "$no" file
>  '

Can we get rid of preparing $yes and do "grep -e foo file" instead?
There are lots of foo from setup test. It's not as strict as your test
because foo is also in worktree. But we have $no for testing worktree
already.
-- 
Duy

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-04 16:03                                 ` Linus Torvalds
@ 2010-01-11  6:39                                   ` Junio C Hamano
  2010-01-11 15:43                                     ` Linus Torvalds
                                                       ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-11  6:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> It doesn't matter. Since we do the line-by-line thing, the input is always 
> so short that DFA vs NFA vs BM vs other-clever-search doesn't matter. 
> There is no scaling - the grep buffer tends to be too small for the 
> algorithm to matter.
>
> And the reason we do things line-by-line is that we need to then output 
> things line-per-line.

Here is an experimental patch; first, some numbers (hot cache best of 5 runs).

(baseline -- builtin grep)

    $ time git grep --no-ext-grep qwerty
    drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"...

    real    0m1.521s
    user    0m1.260s
    sys     0m0.256s

    (baseline -- external grep)

    $ time git grep --ext-grep qwerty
    drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"...

    real    0m0.773s
    user    0m0.416s
    sys     0m0.376s

    (with experimental patch -- builtin grep)

    $ time ../git.git/git grep --no-ext-grep qwerty
    drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"...

    real    0m0.692s
    user    0m0.440s
    sys     0m0.252s

The idea is to see the earliest location in the entire remainder of the
buffer the pattern(s) we are looking for first appears, and skip all the
lines before that point.  For this optimization to be valid, we should not
be looking for anything complex (e.g. "find lines that have both X and Y
but not Z" is out).  We cannot use the optimization when "-v" is given,
either, because then we need to find the first line that does _not_ match
given set of patterns.

---

 grep.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/grep.c b/grep.c
index bdadf2c..940e200 100644
--- a/grep.c
+++ b/grep.c
@@ -615,6 +615,65 @@ static void show_pre_context(struct grep_opt *opt, const char *name, char *buf,
 	}
 }
 
+static int should_lookahead(struct grep_opt *opt)
+{
+	struct grep_pat *p;
+
+	if (opt->extended)
+		return 0; /* punt for too complex stuff */
+	if (opt->invert || opt->unmatch_name_only)
+		return 0;
+	for (p = opt->pattern_list; p; p = p->next) {
+		if (p->token != GREP_PATTERN)
+			return 0; /* punt for "header only" and stuff */
+	}
+	return 1;
+}
+
+static int look_ahead(struct grep_opt *opt,
+		      unsigned long *left_p,
+		      unsigned *lno_p,
+		      char **bol_p)
+{
+	unsigned lno = *lno_p;
+	char *bol = *bol_p;
+	struct grep_pat *p;
+	char *sp, *last_bol;
+	regoff_t earliest = -1;
+
+	for (p = opt->pattern_list; p; p = p->next) {
+		int hit;
+		regmatch_t m;
+
+		if (p->fixed)
+			hit = !fixmatch(p->pattern, bol, p->ignore_case, &m);
+		else
+			hit = !regexec(&p->regexp, bol, 1, &m, 0);
+		if (!hit || m.rm_so < 0 || m.rm_eo < 0)
+			continue;
+		if (earliest < 0 || m.rm_so < earliest)
+			earliest = m.rm_so;
+	}
+
+	if (earliest < 0) {
+		*bol_p = bol + *left_p;
+		*left_p = 0;
+		return 1;
+	}
+	for (sp = bol + earliest; bol < sp && sp[-1] != '\n'; sp--)
+		; /* find the beginning of the line */
+	last_bol = sp;
+
+	for (sp = bol; sp < last_bol; sp++) {
+		if (*sp == '\n')
+			lno++;
+	}
+	*left_p -= last_bol - bol;
+	*bol_p = last_bol;
+	*lno_p = lno;
+	return 0;
+}
+
 static int grep_buffer_1(struct grep_opt *opt, const char *name,
 			 char *buf, unsigned long size, int collect_hits)
 {
@@ -624,6 +683,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 	unsigned last_hit = 0;
 	int binary_match_only = 0;
 	unsigned count = 0;
+	int try_lookahead = 0;
 	enum grep_context ctx = GREP_CONTEXT_HEAD;
 	xdemitconf_t xecfg;
 
@@ -652,11 +712,15 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 			opt->priv = &xecfg;
 		}
 	}
+	try_lookahead = should_lookahead(opt);
 
 	while (left) {
 		char *eol, ch;
 		int hit;
 
+		if (try_lookahead
+		    && look_ahead(opt, &left, &lno, &bol))
+			break;
 		eol = end_of_line(bol, &left);
 		ch = *eol;
 		*eol = 0;

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11  6:39                                   ` Junio C Hamano
@ 2010-01-11 15:43                                     ` Linus Torvalds
  2010-01-11 15:59                                       ` Linus Torvalds
  2010-01-11 19:26                                     ` Fredrik Kuivinen
       [not found]                                     ` <4c8ef71001111119p253170f8q37bcd3708d894a62@mail.gmail.com>
  2 siblings, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2010-01-11 15:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git



On Sun, 10 Jan 2010, Junio C Hamano wrote:
> 
> Here is an experimental patch; first, some numbers (hot cache best of 5 runs).

Without testing it, I can already ACK it. It looks like the 
ObviouslyRightThing(tm) to do. But I'll run some numbers too.

One thing that worries me - but that is independent of this patch - is 
that I don't think our 'grep' function works correctly (neither the 
'fixmatch()' one or the 'regexec()' one) when there are NUL characters in 
a file. Maybe I shouldn't care, but it worries me a bit.

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11 15:43                                     ` Linus Torvalds
@ 2010-01-11 15:59                                       ` Linus Torvalds
  2010-01-11 16:22                                         ` Junio C Hamano
  2010-01-12 16:21                                         ` [PATCH] grep: do not do external grep on skip-worktree entries Jeff King
  0 siblings, 2 replies; 60+ messages in thread
From: Linus Torvalds @ 2010-01-11 15:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git



On Mon, 11 Jan 2010, Linus Torvalds wrote:
>
> Without testing it, I can already ACK it. It looks like the 
> ObviouslyRightThing(tm) to do. But I'll run some numbers too.

Ok, some good news, some meh news, and some bad news.

The good news: the trivial numbers look good. It's noticeably faster than 
external grep for me when it does the 'fixmatch()' case, quite probably 
because fixmatch() on at least Linux/x86-64 (which is the only case I 
really care about) uses SSE to do the string ops.

So on my Nehalem:

	[torvalds@nehalem linux]$ time git grep qwerty > /dev/null 

	real	0m0.418s
	user	0m0.204s
	sys	0m0.136s

	[torvalds@nehalem linux]$ time git grep --no-ext-grep qwerty > /dev/null 

	real	0m0.309s
	user	0m0.168s
	sys	0m0.136s

and since that simple fixmatch case is the common one for me, I'm happy.

The meh news: this shows how grep is faster than regexec() due to being a 
smarter algorithm. For the non-fixed case (I used "qwerty.*as"), the 
numbers are

 - built-in:
	real	0m0.548s
	user	0m0.384s
	sys	0m0.152s

 - external:
	real	0m0.415s
	user	0m0.176s
	sys	0m0.160s

so it really is just 'strstr()' that is faster. But This is a 'meh', 
because I don't really care, and the new code is still way faster than the 
old one. And I'd be personally willing to just drop the external grep if 
this is the worst problem.

[ I worry a bit that some libc implementations of 'strstr' may suck, but I 
  wouldn't lose sleep over it. ]

The bad news is that you broke multi-line greps:

	git grep --no-ext-grep -2 qwerty.*as

results in:

	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */

when the _correct_ result is 

	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */
	drivers/char/keyboard.c-        "dfghjkl;'`\000\\zxcv"                          /* 0x20 - 0x2f */
	drivers/char/keyboard.c-        "bnm,./\000*\000 \000\201\202\203\204\205"      /* 0x30 - 0x3f */

ie it didn't do the "two lines after" thing.

That said, the external grep also gets this wrong (a different way), 
because it gets all the extra noise due to unnecessary separation lines, 
so for the external grep I actually get

	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */
	drivers/char/keyboard.c-        "dfghjkl;'`\000\\zxcv"                          /* 0x20 - 0x2f */
	drivers/char/keyboard.c-        "bnm,./\000*\000 \000\201\202\203\204\205"      /* 0x30 - 0x3f */
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--
	--

but that's a long-standing problem, and is more "ugly" than "wrong grep 
results".

			Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11 15:59                                       ` Linus Torvalds
@ 2010-01-11 16:22                                         ` Junio C Hamano
  2010-01-11 16:24                                           ` Junio C Hamano
  2010-01-11 16:33                                           ` Linus Torvalds
  2010-01-12 16:21                                         ` [PATCH] grep: do not do external grep on skip-worktree entries Jeff King
  1 sibling, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-11 16:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> The bad news is that you broke multi-line greps:
>
> 	git grep --no-ext-grep -2 qwerty.*as
>
> results in:
>
> 	drivers/char/keyboard.c-unsigned char kbd_sysrq_xlate[KEY_MAX + 1] =
> 	drivers/char/keyboard.c-        "\000\0331234567890-=\177\t"                    /* 0x00 - 0x0f */
> 	drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"                          /* 0x10 - 0x1f */

Meh.  I checked pre-context codepath before sending the patch and was very
satisfied that René did the right thing in 49de321 (grep: handle pre
context lines on demand, 2009-07-02), but somehow forgot about the post
context codepath.

An ObviouslyRightThing fix is this two-liner.  We shouldn't lookahead if
we want to do something more than just skipping when we see an unmatch for
the line we are currently looking at.

 grep.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/grep.c b/grep.c
index 940e200..ac0ce0b 100644
--- a/grep.c
+++ b/grep.c
@@ -719,6 +719,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name,
 		int hit;
 
 		if (try_lookahead
+		    && !(last_hit
+			 && lno <= last_hit + opt->post_context)
 		    && look_ahead(opt, &left, &lno, &bol))
 			break;
 		eol = end_of_line(bol, &left);

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11 16:22                                         ` Junio C Hamano
@ 2010-01-11 16:24                                           ` Junio C Hamano
  2010-01-11 16:33                                           ` Linus Torvalds
  1 sibling, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-11 16:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git

Junio C Hamano <gitster@pobox.com> writes:

> Meh.  I checked pre-context codepath before sending the patch and was very
> satisfied that René did the right thing in 49de321 (grep: handle pre
> context lines on demand, 2009-07-02), but somehow forgot about the post
> context codepath.

Just to clarify, it was *I* who forgot to check the post context codepath
while adding the lookahead; I didn't mean René forgot anything in 49de321.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11 16:22                                         ` Junio C Hamano
  2010-01-11 16:24                                           ` Junio C Hamano
@ 2010-01-11 16:33                                           ` Linus Torvalds
  2010-01-12  8:29                                             ` Junio C Hamano
  1 sibling, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2010-01-11 16:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git



On Mon, 11 Jan 2010, Junio C Hamano wrote:
> 
> An ObviouslyRightThing fix is this two-liner.  We shouldn't lookahead if
> we want to do something more than just skipping when we see an unmatch for
> the line we are currently looking at.

Ack. Works for me. And with that, I'd love for it to go in, and get rid of 
the external grep. Performance is now a non-issue (it goes both ways), and 
the internal grep doesn't have the bug with separators between multi-line 
greps.

And dropping the external one gets rid of all the issues with PATHs, crap 
'grep' implementations, and removes actual code. Goodie.

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11  6:39                                   ` Junio C Hamano
  2010-01-11 15:43                                     ` Linus Torvalds
@ 2010-01-11 19:26                                     ` Fredrik Kuivinen
       [not found]                                     ` <4c8ef71001111119p253170f8q37bcd3708d894a62@mail.gmail.com>
  2 siblings, 0 replies; 60+ messages in thread
From: Fredrik Kuivinen @ 2010-01-11 19:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git

[Resending as the first copy didn't reach the list.]

On Mon, Jan 11, 2010 at 07:39, Junio C Hamano <gitster@pobox.com> wrote:
>
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> > It doesn't matter. Since we do the line-by-line thing, the input is always
> > so short that DFA vs NFA vs BM vs other-clever-search doesn't matter.
> > There is no scaling - the grep buffer tends to be too small for the
> > algorithm to matter.
> >
> > And the reason we do things line-by-line is that we need to then output
> > things line-per-line.
>
> Here is an experimental patch; first, some numbers (hot cache best of 5 runs).

I get some very unexpected results with this patch. (best of 5 runs):

before:

time git grep --no-ext-grep qwerty
drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"
         /* 0x10 - 0x1f */

real 0m3.531s
user 0m3.056s
sys 0m0.468s

after:

time git grep --no-ext-grep qwerty
drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"
         /* 0x10 - 0x1f */

real    0m4.794s
user    0m4.380s
sys    0m0.404s

with external grep:
time git grep qwerty
drivers/char/keyboard.c:        "qwertyuiop[]\r\000as"
         /* 0x10 - 0x1f */

real    0m1.236s
user    0m0.668s
sys    0m0.544s


So, if I haven't messed up the benchmark, it seems that Junio's patch
makes things go _slower_. I don't understand at all why I get these
results...

This is on a laptop with an Intel T2080 processor running Ubuntu 9.10.

Any ideas on how this can be explained?

- Fredrik

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
       [not found]                                     ` <4c8ef71001111119p253170f8q37bcd3708d894a62@mail.gmail.com>
@ 2010-01-11 19:29                                       ` Linus Torvalds
  2010-01-11 19:40                                         ` Fredrik Kuivinen
  0 siblings, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2010-01-11 19:29 UTC (permalink / raw)
  To: Fredrik Kuivinen
  Cc: Junio C Hamano, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git



On Mon, 11 Jan 2010, Fredrik Kuivinen wrote:
> 
> Any ideas on how this can be explained?

Could it be a bad 'strstr()' implementation? 

Try a complex pattern ("qwerty.*as" finds the same line), and see if that 
too is slower than before. If that is faster than it used to be (with 
--no-ext-grep, of course), then it's strstr() that is badly implemented.

For me, on x86-64 (Fedora-12), strstr() seems to do pretty well. But it's 
easy to do a stupid implementation of strstr that does a 'strlen()' first, 
for example, and thus always traverses all data _twice_ etc. Depending on 
cache sizes etc, that can end up killing performance (or not mattering 
much at all..)

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11 19:29                                       ` Linus Torvalds
@ 2010-01-11 19:40                                         ` Fredrik Kuivinen
  2010-01-11 20:07                                           ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Fredrik Kuivinen @ 2010-01-11 19:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git

On Mon, Jan 11, 2010 at 20:29, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Mon, 11 Jan 2010, Fredrik Kuivinen wrote:
>>
>> Any ideas on how this can be explained?
>
> Could it be a bad 'strstr()' implementation?
>
> Try a complex pattern ("qwerty.*as" finds the same line), and see if that
> too is slower than before. If that is faster than it used to be (with
> --no-ext-grep, of course), then it's strstr() that is badly implemented.

Ah, yes, that's it. With the pattern "qwerty.*as" I get 2.5s with the
patch and 6s without.

Thanks.

- Fredrik

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11 19:40                                         ` Fredrik Kuivinen
@ 2010-01-11 20:07                                           ` Linus Torvalds
  2010-01-11 21:07                                             ` Fredrik Kuivinen
  0 siblings, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2010-01-11 20:07 UTC (permalink / raw)
  To: Fredrik Kuivinen
  Cc: Junio C Hamano, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git



On Mon, 11 Jan 2010, Fredrik Kuivinen wrote:
> >
> > Try a complex pattern ("qwerty.*as" finds the same line), and see if that
> > too is slower than before. If that is faster than it used to be (with
> > --no-ext-grep, of course), then it's strstr() that is badly implemented.
> 
> Ah, yes, that's it. With the pattern "qwerty.*as" I get 2.5s with the
> patch and 6s without.

Ok, so on your machine, regcomp() is basically twice as fast as strstr().

Which is not entirely unexpected: I was actually surprised by strstr() 
being apparently so good on my machine. I do not generally expect things 
like that to be at all optimized for bigger working sets. Most common uses 
of strstr() are in short strings - not "strings" that are many kilobytes 
in size (the whole file).

In fact, I suspect it works so well for me because in my version of glibc 
it's not just SSE-optimized: judging by the naming it's SSE4.2 optimized - 
so the case I see on my machine will _only_ happen on Nehalem-based cores 
(ie the new "Core i[357]" cpu's).

It is entirely possible that strstr in general is a disaster.

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11 20:07                                           ` Linus Torvalds
@ 2010-01-11 21:07                                             ` Fredrik Kuivinen
  2010-01-11 21:24                                               ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Fredrik Kuivinen @ 2010-01-11 21:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]

On Mon, Jan 11, 2010 at 21:07, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Mon, 11 Jan 2010, Fredrik Kuivinen wrote:
>> >
>> > Try a complex pattern ("qwerty.*as" finds the same line), and see if that
>> > too is slower than before. If that is faster than it used to be (with
>> > --no-ext-grep, of course), then it's strstr() that is badly implemented.
>>
>> Ah, yes, that's it. With the pattern "qwerty.*as" I get 2.5s with the
>> patch and 6s without.
>
> Ok, so on your machine, regcomp() is basically twice as fast as strstr().

Yes.

> Which is not entirely unexpected: I was actually surprised by strstr()
> being apparently so good on my machine. I do not generally expect things
> like that to be at all optimized for bigger working sets. Most common uses
> of strstr() are in short strings - not "strings" that are many kilobytes
> in size (the whole file).
>
> In fact, I suspect it works so well for me because in my version of glibc
> it's not just SSE-optimized: judging by the naming it's SSE4.2 optimized -
> so the case I see on my machine will _only_ happen on Nehalem-based cores
> (ie the new "Core i[357]" cpu's).
>
> It is entirely possible that strstr in general is a disaster.

Another option is to use memmem instead. As we know the length of the
buffer already it should be a slight improvement over strstr for
everyone. memmem may cause some portability problems though as it is a
GNU extension.

I get these results: (git-grep --no-ext-grep qwerty, best of five)

Junio's patch: 0:04.84
memmem (attached patch on top of Junio's): 0:02.91
regcomp/regexec (I changed is_fixed to always return 0, also on top of
Junio's): 0:02.02

- Fredrik

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 1236 bytes --]

diff --git a/grep.c b/grep.c
index 940e200..d34247f 100644
--- a/grep.c
+++ b/grep.c
@@ -264,13 +264,14 @@ static void show_name(struct grep_opt *opt, const char *name)
 }
 
 
-static int fixmatch(const char *pattern, char *line, int ignore_case, regmatch_t *match)
+static int fixmatch(const char *pattern, char *line, char *eol,
+		    int ignore_case, regmatch_t *match)
 {
 	char *hit;
 	if (ignore_case)
 		hit = strcasestr(line, pattern);
 	else
-		hit = strstr(line, pattern);
+		hit = memmem(line, eol - line, pattern, strlen(pattern));
 
 	if (!hit) {
 		match->rm_so = match->rm_eo = -1;
@@ -333,7 +334,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
 
  again:
 	if (p->fixed)
-		hit = !fixmatch(p->pattern, bol, p->ignore_case, pmatch);
+		hit = !fixmatch(p->pattern, bol, eol, p->ignore_case, pmatch);
 	else
 		hit = !regexec(&p->regexp, bol, 1, pmatch, eflags);
 
@@ -646,7 +647,7 @@ static int look_ahead(struct grep_opt *opt,
 		regmatch_t m;
 
 		if (p->fixed)
-			hit = !fixmatch(p->pattern, bol, p->ignore_case, &m);
+			hit = !fixmatch(p->pattern, bol, bol + *left_p, p->ignore_case, &m);
 		else
 			hit = !regexec(&p->regexp, bol, 1, &m, 0);
 		if (!hit || m.rm_so < 0 || m.rm_eo < 0)

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11 21:07                                             ` Fredrik Kuivinen
@ 2010-01-11 21:24                                               ` Linus Torvalds
  0 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2010-01-11 21:24 UTC (permalink / raw)
  To: Fredrik Kuivinen
  Cc: Junio C Hamano, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git



On Mon, 11 Jan 2010, Fredrik Kuivinen wrote:
> 
> Another option is to use memmem instead. As we know the length of the
> buffer already it should be a slight improvement over strstr for
> everyone. memmem may cause some portability problems though as it is a
> GNU extension.

I'd almost prefer to just drop the strstr entirely.

It's not actually all *that* big a win, even on my machine. I get

 - strstr:

        real    0m0.309s
        user    0m0.168s
        sys     0m0.136s

 - regexec:

	real	0m0.410s
	user	0m0.220s
	sys	0m0.116s

so yeah, it's slower, but not by a huge degree. With strstr, "git grep" 
actually beats the external grep for me, but I don't really care. It's 
already way better than it used to be - and clearly strstr has a lot of 
potential problems.

Sure memmem() might be better for you than strstr, but on the other hand, 
it might easily be worse than strstr for others - and not just from a 
portability standpoint. Is memmem() optimized to take advantage of SSE4.2? 
I suspect it is not, exactly _because_ it's a GNU extension, so Intel 
hasn't published optimized sample code for people to use.

So I would argue against even bothering to try memmem. Especially since	in 
your case, regexec() is apparently faster than memmem _anyway_. I expect 
that it is for me too, but I'm too lazy to check.

			Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11 16:33                                           ` Linus Torvalds
@ 2010-01-12  8:29                                             ` Junio C Hamano
  2010-01-12  8:31                                               ` [PATCH] grep: lookahead optimization can be used with -L option Junio C Hamano
                                                                 ` (3 more replies)
  0 siblings, 4 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-12  8:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Ack. Works for me. And with that, I'd love for it to go in, and get rid of 
> the external grep. Performance is now a non-issue (it goes both ways), and 
> the internal grep doesn't have the bug with separators between multi-line 
> greps.
>
> And dropping the external one gets rid of all the issues with PATHs, crap 
> 'grep' implementations, and removes actual code. Goodie.
>
> 		Linus

Before going forward, I found two small nits that should go to maint.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH] grep: lookahead optimization can be used with -L option
  2010-01-12  8:29                                             ` Junio C Hamano
@ 2010-01-12  8:31                                               ` Junio C Hamano
  2010-01-12  8:32                                               ` [PATCH] grep: -L should show empty files Junio C Hamano
                                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-12  8:31 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy

This is not an "inverted logic" option ("-v") that shows lines
that do not match given pattern as hits, and skipping lines that
cannot possibly match wouldn't change its outcome.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Should be squashed in to the "lookahead" patch.

 grep.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/grep.c b/grep.c
index 048b982..03ffcd4 100644
--- a/grep.c
+++ b/grep.c
@@ -614,7 +614,7 @@ static int should_lookahead(struct grep_opt *opt)
 
 	if (opt->extended)
 		return 0; /* punt for too complex stuff */
-	if (opt->invert || opt->unmatch_name_only)
+	if (opt->invert)
 		return 0;
 	for (p = opt->pattern_list; p; p = p->next) {
 		if (p->token != GREP_PATTERN)
-- 
1.6.6.280.ge295b7.dirty

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH] grep: -L should show empty files
  2010-01-12  8:29                                             ` Junio C Hamano
  2010-01-12  8:31                                               ` [PATCH] grep: lookahead optimization can be used with -L option Junio C Hamano
@ 2010-01-12  8:32                                               ` Junio C Hamano
  2010-01-12 21:27                                                 ` Sverre Rabbelier
  2010-01-13  6:48                                               ` [PATCH 1/2] grep: rip out support for external grep Junio C Hamano
  2010-01-13  6:51                                               ` [PATCH 2/2] grep: rip out pessimization to use fixmatch() Junio C Hamano
  3 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2010-01-12  8:32 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy

The -L (--files-without-match) option is supposed to show paths that
produced no matches.  When running the internal grep on work tree files,
however, we had an optimization to just return on zero-sized files,
without doing anything.

This optimization doesn't matter too much in practice (a tracked empty
file must be rare, or there is something wrong with your project); to
produce results consistent with GNU grep, we should stop the optimization
and show empty files as not having the given pattern.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Fix for a longstanding bug meant for maint.

 builtin-grep.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/builtin-grep.c b/builtin-grep.c
index fd450bc..84a5af3 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -159,8 +159,6 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 			error("'%s': %s", filename, strerror(errno));
 		return 0;
 	}
-	if (!st.st_size)
-		return 0; /* empty file -- no grep hit */
 	if (!S_ISREG(st.st_mode))
 		return 0;
 	sz = xsize_t(st.st_size);
-- 
1.6.6.280.ge295b7.dirty

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
  2010-01-11 15:59                                       ` Linus Torvalds
  2010-01-11 16:22                                         ` Junio C Hamano
@ 2010-01-12 16:21                                         ` Jeff King
  1 sibling, 0 replies; 60+ messages in thread
From: Jeff King @ 2010-01-12 16:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Miles Bader, Nguyen Thai Ngoc Duy, git

On Mon, Jan 11, 2010 at 07:59:18AM -0800, Linus Torvalds wrote:

> The meh news: this shows how grep is faster than regexec() due to being a 
> smarter algorithm. For the non-fixed case (I used "qwerty.*as"), the 
> numbers are
> 
>  - built-in:
> 	real	0m0.548s
> 	user	0m0.384s
> 	sys	0m0.152s
> 
>  - external:
> 	real	0m0.415s
> 	user	0m0.176s
> 	sys	0m0.160s
> 
> so it really is just 'strstr()' that is faster. But This is a 'meh', 
> because I don't really care, and the new code is still way faster than the 
> old one. And I'd be personally willing to just drop the external grep if 
> this is the worst problem.

Just for fun, I repeated my pcre tests on what's in pu (which has
Junio's lookahead patch now). Before they didn't show any improvement
because we wasted all of our time in non-regex code. There is some
improvement in just using pcre, but I didn't get any improvement by
tweaking it:

[pu]
$ time git grep 'qwerty.*as' >/dev/null
real    0m1.007s
user    0m0.752s
sys     0m0.252s

[pu + pcre]
$ time git grep --no-ext-grep 'qwerty.*as' >/dev/null
real    0m0.864s
user    0m0.648s
sys     0m0.212s

[pu + pcre_study]
$ time git grep --no-ext-grep 'qwerty.*as' >/dev/null
real    0m0.866s
user    0m0.628s
sys     0m0.200s

[pu + pcre_dfa_exec]
$ time git grep --no-ext-grep 'qwerty.*as' >/dev/null
real    0m0.868s
user    0m0.608s
sys     0m0.256s

So pcre seems to buy us about 15%, and tweaking it gets lost in the
noise (or I am tweaking it badly, which is entirely possible). I doubt
it's worth the trouble of supporting pcre for that much.

And let me add an additional vote against strstr:

$ time git grep --no-ext-grep qwerty >/dev/null
real    0m3.285s
user    0m3.032s
sys     0m0.252s

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: -L should show empty files
  2010-01-12  8:32                                               ` [PATCH] grep: -L should show empty files Junio C Hamano
@ 2010-01-12 21:27                                                 ` Sverre Rabbelier
  2010-01-13  6:56                                                   ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Sverre Rabbelier @ 2010-01-12 21:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Linus Torvalds, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy

Heya,

On Tue, Jan 12, 2010 at 09:32, Junio C Hamano <gitster@pobox.com> wrote:
> This optimization doesn't matter too much in practice (a tracked empty
> file must be rare, or there is something wrong with your project);

How about python projects, where there's an __init__.py file
everywhere you turn your head? ;)

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 1/2] grep: rip out support for external grep
  2010-01-12  8:29                                             ` Junio C Hamano
  2010-01-12  8:31                                               ` [PATCH] grep: lookahead optimization can be used with -L option Junio C Hamano
  2010-01-12  8:32                                               ` [PATCH] grep: -L should show empty files Junio C Hamano
@ 2010-01-13  6:48                                               ` Junio C Hamano
  2010-01-13  8:29                                                 ` Jay Soffian
  2010-01-13 15:20                                                 ` Linus Torvalds
  2010-01-13  6:51                                               ` [PATCH 2/2] grep: rip out pessimization to use fixmatch() Junio C Hamano
  3 siblings, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-13  6:48 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy

We still allow people to pass --[no-]ext-grep on the command line,
but the option is ignored.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Junio C Hamano <gitster@pobox.com> writes:

 > Linus Torvalds <torvalds@linux-foundation.org> writes:
 >
 >> Ack. Works for me. And with that, I'd love for it to go in, and get rid of 
 >> the external grep.
 > ...
 > Before going forward, I found two small nits that should go to maint.

 These nits out-of-way, we can now start doing this.

 Makefile        |   10 --
 builtin-grep.c  |  305 +------------------------------------------------------
 t/t7002-grep.sh |    6 +-
 3 files changed, 8 insertions(+), 313 deletions(-)

diff --git a/Makefile b/Makefile
index 4a1e5bc..a4b922e 100644
--- a/Makefile
+++ b/Makefile
@@ -185,10 +185,6 @@ all::
 # is a simplified version of the merge sort used in glibc. This is
 # recommended if Git triggers O(n^2) behavior in your platform's qsort().
 #
-# Define NO_EXTERNAL_GREP if you don't want "git grep" to ever call
-# your external grep (e.g., if your system lacks grep, if its grep is
-# broken, or spawning external process is slower than built-in grep git has).
-#
 # Define UNRELIABLE_FSTAT if your system's fstat does not return the same
 # information on a not yet closed file that lstat would return for the same
 # file after it was closed.
@@ -777,7 +773,6 @@ ifeq ($(uname_S),SunOS)
 	NO_MKDTEMP = YesPlease
 	NO_MKSTEMPS = YesPlease
 	NO_REGEX = YesPlease
-	NO_EXTERNAL_GREP = YesPlease
 	THREADED_DELTA_SEARCH = YesPlease
 	ifeq ($(uname_R),5.7)
 		NEEDS_RESOLV = YesPlease
@@ -895,7 +890,6 @@ ifeq ($(uname_S),IRIX)
 	# NO_MMAP.  If you suspect that your compiler is not affected by this
 	# issue, comment out the NO_MMAP statement.
 	NO_MMAP = YesPlease
-	NO_EXTERNAL_GREP = UnfortunatelyYes
 	SNPRINTF_RETURNS_BOGUS = YesPlease
 	SHELL_PATH = /usr/gnu/bin/bash
 	NEEDS_LIBGEN = YesPlease
@@ -915,7 +909,6 @@ ifeq ($(uname_S),IRIX64)
 	# NO_MMAP.  If you suspect that your compiler is not affected by this
 	# issue, comment out the NO_MMAP statement.
 	NO_MMAP = YesPlease
-	NO_EXTERNAL_GREP = UnfortunatelyYes
 	SNPRINTF_RETURNS_BOGUS = YesPlease
 	SHELL_PATH=/usr/gnu/bin/bash
 	NEEDS_LIBGEN = YesPlease
@@ -1322,9 +1315,6 @@ endif
 ifdef DIR_HAS_BSD_GROUP_SEMANTICS
 	COMPAT_CFLAGS += -DDIR_HAS_BSD_GROUP_SEMANTICS
 endif
-ifdef NO_EXTERNAL_GREP
-	BASIC_CFLAGS += -DNO_EXTERNAL_GREP
-endif
 ifdef UNRELIABLE_FSTAT
 	BASIC_CFLAGS += -DUNRELIABLE_FSTAT
 endif
diff --git a/builtin-grep.c b/builtin-grep.c
index a5b6719..4adb971 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -15,14 +15,6 @@
 #include "grep.h"
 #include "quote.h"
 
-#ifndef NO_EXTERNAL_GREP
-#ifdef __unix__
-#define NO_EXTERNAL_GREP 0
-#else
-#define NO_EXTERNAL_GREP 1
-#endif
-#endif
-
 static char const * const grep_usage[] = {
 	"git grep [options] [-e] <pattern> [<rev>...] [[--] path...]",
 	NULL
@@ -215,292 +207,12 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 	return i;
 }
 
-#if !NO_EXTERNAL_GREP
-static int exec_grep(int argc, const char **argv)
-{
-	pid_t pid;
-	int status;
-
-	argv[argc] = NULL;
-	pid = fork();
-	if (pid < 0)
-		return pid;
-	if (!pid) {
-		execvp("grep", (char **) argv);
-		exit(255);
-	}
-	while (waitpid(pid, &status, 0) < 0) {
-		if (errno == EINTR)
-			continue;
-		return -1;
-	}
-	if (WIFEXITED(status)) {
-		if (!WEXITSTATUS(status))
-			return 1;
-		return 0;
-	}
-	return -1;
-}
-
-#define MAXARGS 1000
-#define ARGBUF 4096
-#define push_arg(a) do { \
-	if (nr < MAXARGS) argv[nr++] = (a); \
-	else die("maximum number of args exceeded"); \
-	} while (0)
-
-/*
- * If you send a singleton filename to grep, it does not give
- * the name of the file.  GNU grep has "-H" but we would want
- * that behaviour in a portable way.
- *
- * So we keep two pathnames in argv buffer unsent to grep in
- * the main loop if we need to do more than one grep.
- */
-static int flush_grep(struct grep_opt *opt,
-		      int argc, int arg0, const char **argv, int *kept)
-{
-	int status;
-	int count = argc - arg0;
-	const char *kept_0 = NULL;
-
-	if (count <= 2) {
-		/*
-		 * Because we keep at least 2 paths in the call from
-		 * the main loop (i.e. kept != NULL), and MAXARGS is
-		 * far greater than 2, this usually is a call to
-		 * conclude the grep.  However, the user could attempt
-		 * to overflow the argv buffer by giving too many
-		 * options to leave very small number of real
-		 * arguments even for the call in the main loop.
-		 */
-		if (kept)
-			die("insanely many options to grep");
-
-		/*
-		 * If we have two or more paths, we do not have to do
-		 * anything special, but we need to push /dev/null to
-		 * get "-H" behaviour of GNU grep portably but when we
-		 * are not doing "-l" nor "-L" nor "-c".
-		 */
-		if (count == 1 &&
-		    !opt->name_only &&
-		    !opt->unmatch_name_only &&
-		    !opt->count) {
-			argv[argc++] = "/dev/null";
-			argv[argc] = NULL;
-		}
-	}
-
-	else if (kept) {
-		/*
-		 * Called because we found many paths and haven't finished
-		 * iterating over the cache yet.  We keep two paths
-		 * for the concluding call.  argv[argc-2] and argv[argc-1]
-		 * has the last two paths, so save the first one away,
-		 * replace it with NULL while sending the list to grep,
-		 * and recover them after we are done.
-		 */
-		*kept = 2;
-		kept_0 = argv[argc-2];
-		argv[argc-2] = NULL;
-		argc -= 2;
-	}
-
-	if (opt->pre_context || opt->post_context) {
-		/*
-		 * grep handles hunk marks between files, but we need to
-		 * do that ourselves between multiple calls.
-		 */
-		if (opt->show_hunk_mark)
-			write_or_die(1, "--\n", 3);
-		else
-			opt->show_hunk_mark = 1;
-	}
-
-	status = exec_grep(argc, argv);
-
-	if (kept_0) {
-		/*
-		 * Then recover them.  Now the last arg is beyond the
-		 * terminating NULL which is at argc, and the second
-		 * from the last is what we saved away in kept_0
-		 */
-		argv[arg0++] = kept_0;
-		argv[arg0] = argv[argc+1];
-	}
-	return status;
-}
-
-static void grep_add_color(struct strbuf *sb, const char *escape_seq)
-{
-	size_t orig_len = sb->len;
-
-	while (*escape_seq) {
-		if (*escape_seq == 'm')
-			strbuf_addch(sb, ';');
-		else if (*escape_seq != '\033' && *escape_seq  != '[')
-			strbuf_addch(sb, *escape_seq);
-		escape_seq++;
-	}
-	if (sb->len > orig_len && sb->buf[sb->len - 1] == ';')
-		strbuf_setlen(sb, sb->len - 1);
-}
-
-static int external_grep(struct grep_opt *opt, const char **paths, int cached)
-{
-	int i, nr, argc, hit, len, status;
-	const char *argv[MAXARGS+1];
-	char randarg[ARGBUF];
-	char *argptr = randarg;
-	struct grep_pat *p;
-
-	if (opt->extended || (opt->relative && opt->prefix_length))
-		return -1;
-	len = nr = 0;
-	push_arg("grep");
-	if (opt->fixed)
-		push_arg("-F");
-	if (opt->linenum)
-		push_arg("-n");
-	if (!opt->pathname)
-		push_arg("-h");
-	if (opt->regflags & REG_EXTENDED)
-		push_arg("-E");
-	if (opt->ignore_case)
-		push_arg("-i");
-	if (opt->binary == GREP_BINARY_NOMATCH)
-		push_arg("-I");
-	if (opt->word_regexp)
-		push_arg("-w");
-	if (opt->name_only)
-		push_arg("-l");
-	if (opt->unmatch_name_only)
-		push_arg("-L");
-	if (opt->null_following_name)
-		/* in GNU grep git's "-z" translates to "-Z" */
-		push_arg("-Z");
-	if (opt->count)
-		push_arg("-c");
-	if (opt->post_context || opt->pre_context) {
-		if (opt->post_context != opt->pre_context) {
-			if (opt->pre_context) {
-				push_arg("-B");
-				len += snprintf(argptr, sizeof(randarg)-len,
-						"%u", opt->pre_context) + 1;
-				if (sizeof(randarg) <= len)
-					die("maximum length of args exceeded");
-				push_arg(argptr);
-				argptr += len;
-			}
-			if (opt->post_context) {
-				push_arg("-A");
-				len += snprintf(argptr, sizeof(randarg)-len,
-						"%u", opt->post_context) + 1;
-				if (sizeof(randarg) <= len)
-					die("maximum length of args exceeded");
-				push_arg(argptr);
-				argptr += len;
-			}
-		}
-		else {
-			push_arg("-C");
-			len += snprintf(argptr, sizeof(randarg)-len,
-					"%u", opt->post_context) + 1;
-			if (sizeof(randarg) <= len)
-				die("maximum length of args exceeded");
-			push_arg(argptr);
-			argptr += len;
-		}
-	}
-	for (p = opt->pattern_list; p; p = p->next) {
-		push_arg("-e");
-		push_arg(p->pattern);
-	}
-	if (opt->color) {
-		struct strbuf sb = STRBUF_INIT;
-
-		grep_add_color(&sb, opt->color_match);
-		setenv("GREP_COLOR", sb.buf, 1);
-
-		strbuf_reset(&sb);
-		strbuf_addstr(&sb, "mt=");
-		grep_add_color(&sb, opt->color_match);
-		strbuf_addstr(&sb, ":sl=:cx=:fn=:ln=:bn=:se=");
-		setenv("GREP_COLORS", sb.buf, 1);
-
-		strbuf_release(&sb);
-
-		if (opt->color_external && strlen(opt->color_external) > 0)
-			push_arg(opt->color_external);
-	} else {
-		unsetenv("GREP_COLOR");
-		unsetenv("GREP_COLORS");
-	}
-	unsetenv("GREP_OPTIONS");
-
-	hit = 0;
-	argc = nr;
-	for (i = 0; i < active_nr; i++) {
-		struct cache_entry *ce = active_cache[i];
-		char *name;
-		int kept;
-		if (!S_ISREG(ce->ce_mode))
-			continue;
-		if (!pathspec_matches(paths, ce->name, opt->max_depth))
-			continue;
-		name = ce->name;
-		if (name[0] == '-') {
-			int len = ce_namelen(ce);
-			name = xmalloc(len + 3);
-			memcpy(name, "./", 2);
-			memcpy(name + 2, ce->name, len + 1);
-		}
-		argv[argc++] = name;
-		if (MAXARGS <= argc) {
-			status = flush_grep(opt, argc, nr, argv, &kept);
-			if (0 < status)
-				hit = 1;
-			argc = nr + kept;
-		}
-		if (ce_stage(ce)) {
-			do {
-				i++;
-			} while (i < active_nr &&
-				 !strcmp(ce->name, active_cache[i]->name));
-			i--; /* compensate for loop control */
-		}
-	}
-	if (argc > nr) {
-		status = flush_grep(opt, argc, nr, argv, NULL);
-		if (0 < status)
-			hit = 1;
-	}
-	return hit;
-}
-#endif
-
-static int grep_cache(struct grep_opt *opt, const char **paths, int cached,
-		      int external_grep_allowed)
+static int grep_cache(struct grep_opt *opt, const char **paths, int cached)
 {
 	int hit = 0;
 	int nr;
 	read_cache();
 
-#if !NO_EXTERNAL_GREP
-	/*
-	 * Use the external "grep" command for the case where
-	 * we grep through the checked-out files. It tends to
-	 * be a lot more optimized
-	 */
-	if (!cached && external_grep_allowed) {
-		hit = external_grep(opt, paths, cached);
-		if (hit >= 0)
-			return hit;
-		hit = 0;
-	}
-#endif
-
 	for (nr = 0; nr < active_nr; nr++) {
 		struct cache_entry *ce = active_cache[nr];
 		if (!S_ISREG(ce->ce_mode))
@@ -697,8 +409,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 {
 	int hit = 0;
 	int cached = 0;
-	int external_grep_allowed = 1;
 	int seen_dashdash = 0;
+	int external_grep_allowed__ignored;
 	struct grep_opt opt;
 	struct object_array list = { 0, 0, NULL };
 	const char **paths = NULL;
@@ -780,13 +492,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOLEAN(0, "all-match", &opt.all_match,
 			"show only matches from files that match all patterns"),
 		OPT_GROUP(""),
-#if NO_EXTERNAL_GREP
-		OPT_BOOLEAN(0, "ext-grep", &external_grep_allowed,
-			"allow calling of grep(1) (ignored by this build)"),
-#else
-		OPT_BOOLEAN(0, "ext-grep", &external_grep_allowed,
-			"allow calling of grep(1) (default)"),
-#endif
+		OPT_BOOLEAN(0, "ext-grep", &external_grep_allowed__ignored,
+			    "allow calling of grep(1) (ignored by this build)"),
 		{ OPTION_CALLBACK, 0, "help-all", &options, NULL, "show usage",
 		  PARSE_OPT_HIDDEN | PARSE_OPT_NOARG, help_callback },
 		OPT_END()
@@ -837,8 +544,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		argc--;
 	}
 
-	if ((opt.color && !opt.color_external) || opt.funcname)
-		external_grep_allowed = 0;
 	if (!opt.pattern_list)
 		die("no pattern given.");
 	if (!opt.fixed && opt.ignore_case)
@@ -884,7 +589,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (!list.nr) {
 		if (!cached)
 			setup_work_tree();
-		return !grep_cache(&opt, paths, cached, external_grep_allowed);
+		return !grep_cache(&opt, paths, cached);
 	}
 
 	if (cached)
diff --git a/t/t7002-grep.sh b/t/t7002-grep.sh
index abd14bf..c369cdb 100755
--- a/t/t7002-grep.sh
+++ b/t/t7002-grep.sh
@@ -302,8 +302,8 @@ test_expect_success 'grep -C1, hunk mark between files' '
 	test_cmp expected actual
 '
 
-test_expect_success 'grep -C1 --no-ext-grep, hunk mark between files' '
-	git grep -C1 --no-ext-grep "^[yz]" >actual &&
+test_expect_success 'grep -C1 hunk mark between files' '
+	git grep -C1 "^[yz]" >actual &&
 	test_cmp expected actual
 '
 
@@ -359,7 +359,7 @@ test_expect_success 'log grep (6)' '
 test_expect_success 'grep with CE_VALID file' '
 	git update-index --assume-unchanged t/t &&
 	rm t/t &&
-	test "$(git grep --no-ext-grep test)" = "t/t:test" &&
+	test "$(git grep test)" = "t/t:test" &&
 	git update-index --no-assume-unchanged t/t &&
 	git checkout t/t
 '
-- 
1.6.6.292.ge84ea.dirty

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 2/2] grep: rip out pessimization to use fixmatch()
  2010-01-12  8:29                                             ` Junio C Hamano
                                                                 ` (2 preceding siblings ...)
  2010-01-13  6:48                                               ` [PATCH 1/2] grep: rip out support for external grep Junio C Hamano
@ 2010-01-13  6:51                                               ` Junio C Hamano
  3 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-13  6:51 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy

Even when running without the -F (--fixed-strings) option, we checked the
pattern and used fixmatch() codepath when it does not contain any regex
magic.  Finding fixed strings with strstr() surely must be faster than
running the regular expression crud.

Not so.  It turns out that on some libc implementations, using the
regcomp()/regexec() pair is a lot faster than running strstr() and
strcasestr() the fixmatch() codepath uses.  Drop the optimization and use
the fixmatch() codepath only when the user explicitly asked for it with
the -F option.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 I first thought that this is somewhat questionable and we might want
 an option --[no-]fixed-string-optimization to control it, but there
 already is an option to do so with a shorter name, namely "-F".

 grep.c |    9 +--------
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/grep.c b/grep.c
index 62723da..8e1f7de 100644
--- a/grep.c
+++ b/grep.c
@@ -29,13 +29,6 @@ void append_grep_pattern(struct grep_opt *opt, const char *pat,
 	p->next = NULL;
 }
 
-static int is_fixed(const char *s)
-{
-	while (*s && !is_regex_special(*s))
-		s++;
-	return !*s;
-}
-
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	int err;
@@ -43,7 +36,7 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	p->word_regexp = opt->word_regexp;
 	p->ignore_case = opt->ignore_case;
 
-	if (opt->fixed || is_fixed(p->pattern))
+	if (opt->fixed)
 		p->fixed = 1;
 	if (opt->regflags & REG_ICASE)
 		p->fixed = 0;
-- 
1.6.6.292.ge84ea.dirty

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: -L should show empty files
  2010-01-12 21:27                                                 ` Sverre Rabbelier
@ 2010-01-13  6:56                                                   ` Junio C Hamano
  2010-01-13 16:04                                                     ` Sverre Rabbelier
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2010-01-13  6:56 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Junio C Hamano, git, Linus Torvalds, Miles Bader, Jeff King,
	Nguyen Thai Ngoc Duy

Sverre Rabbelier <srabbelier@gmail.com> writes:

> Heya,
>
> On Tue, Jan 12, 2010 at 09:32, Junio C Hamano <gitster@pobox.com> wrote:
>> This optimization doesn't matter too much in practice (a tracked empty
>> file must be rare, or there is something wrong with your project);
>
> How about python projects, where there's an __init__.py file
> everywhere you turn your head? ;)

It's Ok as the price we pay for producing correct result is to open those
empty files, read them, and look for matches which we will never find ;-)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/2] grep: rip out support for external grep
  2010-01-13  6:48                                               ` [PATCH 1/2] grep: rip out support for external grep Junio C Hamano
@ 2010-01-13  8:29                                                 ` Jay Soffian
  2010-01-13  8:59                                                   ` Junio C Hamano
  2010-01-13 15:20                                                 ` Linus Torvalds
  1 sibling, 1 reply; 60+ messages in thread
From: Jay Soffian @ 2010-01-13  8:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Linus Torvalds, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy

On Wed, Jan 13, 2010 at 1:48 AM, Junio C Hamano <gitster@pobox.com> wrote:
> We still allow people to pass --[no-]ext-grep on the command line,
> but the option is ignored.

Perhaps this squashed in on top? (If gmail's web-interface mangles
this I can resend.)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a1e36d7..6a96d9d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -645,14 +645,6 @@ color.grep::
        `never`), never.  When set to `true` or `auto`, use color only
        when the output is written to the terminal.  Defaults to `false`.

-color.grep.external::
-       The string value of this variable is passed to an external 'grep'
-       command as a command line option if match highlighting is turned
-       on.  If set to an empty string, no option is passed at all,
-       turning off coloring for external 'grep' calls; this is the default.
-       For GNU grep, set it to `--color=always` to highlight matches even
-       when a pager is used.
-
 color.grep.match::
        Use customized color for matches.  The value of this variable
        may be specified as in color.branch.<slot>.  It is passed using
diff --git a/builtin-grep.c b/builtin-grep.c
index 4adb971..3d6ebb5 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -34,8 +34,6 @@ static int grep_config(const char *var, const char
*value, void *cb)
                opt->color = git_config_colorbool(var, value, -1);
                return 0;
        }
-       if (!strcmp(var, "color.grep.external"))
-               return git_config_string(&(opt->color_external), var, value);
        if (!strcmp(var, "color.grep.match")) {
                if (!value)
                        return config_error_nonbool(var);
diff --git a/grep.h b/grep.h
index 75370f6..0c61b00 100644
--- a/grep.h
+++ b/grep.h
@@ -85,7 +85,6 @@ struct grep_opt {
        int max_depth;
        int funcname;
        char color_match[COLOR_MAXLEN];
-       const char *color_external;
        int regflags;
        unsigned pre_context;
        unsigned post_context;

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/2] grep: rip out support for external grep
  2010-01-13  8:29                                                 ` Jay Soffian
@ 2010-01-13  8:59                                                   ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-13  8:59 UTC (permalink / raw)
  To: Jay Soffian
  Cc: git, Linus Torvalds, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy

Thanks; I grepped only for EXTERNAL_GREP and ext-grep and forgot about
this one.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/2] grep: rip out support for external grep
  2010-01-13  6:48                                               ` [PATCH 1/2] grep: rip out support for external grep Junio C Hamano
  2010-01-13  8:29                                                 ` Jay Soffian
@ 2010-01-13 15:20                                                 ` Linus Torvalds
  1 sibling, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2010-01-13 15:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy



On Tue, 12 Jan 2010, Junio C Hamano wrote:
> 
>  These nits out-of-way, we can now start doing this.
> 
>  Makefile        |   10 --
>  builtin-grep.c  |  305 +------------------------------------------------------
>  t/t7002-grep.sh |    6 +-
>  3 files changed, 8 insertions(+), 313 deletions(-)

Yay!

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: -L should show empty files
  2010-01-13  6:56                                                   ` Junio C Hamano
@ 2010-01-13 16:04                                                     ` Sverre Rabbelier
  2010-01-13 19:48                                                       ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Sverre Rabbelier @ 2010-01-13 16:04 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Linus Torvalds, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy

Heya,

On Wed, Jan 13, 2010 at 07:56, Junio C Hamano <gitster@pobox.com> wrote:
> It's Ok as the price we pay for producing correct result is to open those
> empty files, read them, and look for matches which we will never find ;-)

I'm not that familiar with the code, but wouldn't it be possible to
keep the early abort, but make it dependent on not using the '-L'
flag?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH] grep: -L should show empty files
  2010-01-13 16:04                                                     ` Sverre Rabbelier
@ 2010-01-13 19:48                                                       ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-13 19:48 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: git, Linus Torvalds, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy

Sverre Rabbelier <srabbelier@gmail.com> writes:

> I'm not that familiar with the code, but wouldn't it be possible to
> keep the early abort, but make it dependent on not using the '-L'
> flag?

Anything codable is possible to code, but my point was I don't think such
an optimization to avoid reading empty files is worth the time to write
and maintain extra code necessary for it.

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2010-01-13 19:48 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-30 14:11 [PATCH] grep: do not do external grep on skip-worktree entries Nguyễn Thái Ngọc Duy
2009-12-31  7:01 ` Junio C Hamano
2009-12-31  7:09   ` Junio C Hamano
2010-01-02 11:50     ` Nguyen Thai Ngoc Duy
2010-01-02 18:44       ` Junio C Hamano
2010-01-02 19:15         ` Nguyen Thai Ngoc Duy
2010-01-02 19:45           ` Junio C Hamano
2010-01-03  2:35             ` Miles Bader
2010-01-03  2:47               ` Miles Bader
2010-01-03  3:08                 ` Miles Bader
2010-01-03 19:32                   ` Linus Torvalds
2010-01-03 20:49                     ` Junio C Hamano
2010-01-04  5:31                       ` Jeff King
2010-01-04  5:52                         ` Junio C Hamano
2010-01-04  6:44                           ` Jeff King
2010-01-04  7:08                             ` Junio C Hamano
2010-01-04  7:14                               ` Junio C Hamano
2010-01-04  7:29                                 ` Jeff King
2010-01-04  7:26                               ` Jeff King
2010-01-04  8:09                                 ` Jeff King
2010-01-04 16:01                                   ` Linus Torvalds
2010-01-04 15:54                             ` Linus Torvalds
2010-01-04 15:57                               ` Miles Bader
2010-01-04 16:03                                 ` Linus Torvalds
2010-01-11  6:39                                   ` Junio C Hamano
2010-01-11 15:43                                     ` Linus Torvalds
2010-01-11 15:59                                       ` Linus Torvalds
2010-01-11 16:22                                         ` Junio C Hamano
2010-01-11 16:24                                           ` Junio C Hamano
2010-01-11 16:33                                           ` Linus Torvalds
2010-01-12  8:29                                             ` Junio C Hamano
2010-01-12  8:31                                               ` [PATCH] grep: lookahead optimization can be used with -L option Junio C Hamano
2010-01-12  8:32                                               ` [PATCH] grep: -L should show empty files Junio C Hamano
2010-01-12 21:27                                                 ` Sverre Rabbelier
2010-01-13  6:56                                                   ` Junio C Hamano
2010-01-13 16:04                                                     ` Sverre Rabbelier
2010-01-13 19:48                                                       ` Junio C Hamano
2010-01-13  6:48                                               ` [PATCH 1/2] grep: rip out support for external grep Junio C Hamano
2010-01-13  8:29                                                 ` Jay Soffian
2010-01-13  8:59                                                   ` Junio C Hamano
2010-01-13 15:20                                                 ` Linus Torvalds
2010-01-13  6:51                                               ` [PATCH 2/2] grep: rip out pessimization to use fixmatch() Junio C Hamano
2010-01-12 16:21                                         ` [PATCH] grep: do not do external grep on skip-worktree entries Jeff King
2010-01-11 19:26                                     ` Fredrik Kuivinen
     [not found]                                     ` <4c8ef71001111119p253170f8q37bcd3708d894a62@mail.gmail.com>
2010-01-11 19:29                                       ` Linus Torvalds
2010-01-11 19:40                                         ` Fredrik Kuivinen
2010-01-11 20:07                                           ` Linus Torvalds
2010-01-11 21:07                                             ` Fredrik Kuivinen
2010-01-11 21:24                                               ` Linus Torvalds
2010-01-04 16:24                               ` Linus Torvalds
2010-01-04 10:14                           ` Nguyen Thai Ngoc Duy
2010-01-04  6:06                     ` Mike Hommey
2010-01-04  7:04                       ` Jeff King
2010-01-04 12:34             ` [PATCH 1/2] t7002: set test prerequisite "external-grep" if supported Nguyễn Thái Ngọc Duy
2010-01-07  2:37               ` Junio C Hamano
2010-01-07  4:29                 ` Junio C Hamano
2010-01-07 13:27                   ` Nguyen Thai Ngoc Duy
2010-01-07 14:04                     ` Johannes Sixt
2010-01-07 14:26                       ` Nguyen Thai Ngoc Duy
2010-01-04 12:34             ` [PATCH 2/2] t7002: add tests for skip-worktree fixes in commit a67e281 Nguyễn Thái Ngọc Duy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).