* git log filtering
@ 2007-02-07 16:41 Don Zickus
  2007-02-07 16:55 ` Jakub Narebski
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Don Zickus @ 2007-02-07 16:41 UTC (permalink / raw)
  To: git

I was curious to know what is the easiest way to filter info inside a
commit message.

For example say I wanted to find out what patches Joe User has
submitted to the git project.
I know I can do something like ' git log |grep -B2 "^Author: Joe User"
' and it will output the matches and the commit id.  However, if I
wanted to filter on something like "Signed-off-by: Joe User", then it
is a little harder to dig for the commit id.

Is there a better way of doing this?  Or should I accept the fact that
git wasn't designed to filter info like this very quickly?

I guess what I was looking to do was embed some metadata inside the
commit message and parse through it at a later time (ie like a
bugzilla number or something).

Any thoughts/tips/tricks would be helpful.

Cheers,
Don

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: git log filtering
  2007-02-07 16:41 git log filtering Don Zickus
@ 2007-02-07 16:55 ` Jakub Narebski
  2007-02-07 17:01 ` Uwe Kleine-König
  2007-02-07 17:12 ` Linus Torvalds
  2 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2007-02-07 16:55 UTC (permalink / raw)
  To: Don Zickus; +Cc: git

[Cc: git@vger.kernel.org]

Don Zickus wrote:

> I was curious to know what is the easiest way to filter info inside a
> commit message.
> 
> For example say I wanted to find out what patches Joe User has
> submitted to the git project.
>
> I know I can do something like ' git log |grep -B2 "^Author: Joe User"
> ' and it will output the matches and the commit id.  However, if I
> wanted to filter on something like "Signed-off-by: Joe User", then it
> is a little harder to dig for the commit id.
> 
> Is there a better way of doing this?  Or should I accept the fact that
> git wasn't designed to filter info like this very quickly?

You can use "git log --grep=<pattern>" for that instead. This greps the
raw commit message. You can use --author and --committer to grep those
headers.
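
As a rough illustration of these options, the sketch below builds a
throwaway repository and filters it by author and by sign-off line. The
names, addresses, commit messages and the commit_as helper are all made
up for the demo; it only assumes a reasonably recent git in PATH.

```shell
#!/bin/sh
# Throwaway repository; every name and message here is invented for the demo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git -c init.defaultBranch=main init -q .

# Hypothetical helper: commit with the given author name and message.
commit_as() {
	GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="demo@example.com" \
	GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="demo@example.com" \
		git commit -q --allow-empty -m "$2"
}

commit_as "Joe User" "first change

Signed-off-by: Joe User <joe@example.com>"
commit_as "Jane Hacker" "second change

Signed-off-by: Jane Hacker <jane@example.com>"
commit_as "Jane Hacker" "third change

Signed-off-by: Joe User <joe@example.com>"

# Commits authored by Joe User (matches the author header).
by_author=$(git log --author="Joe User" --format=%s)

# Commits whose message carries Joe's sign-off, whoever authored them.
by_signoff=$(git log --grep="Signed-off-by: Joe User" --format=%s)

printf 'author:\n%s\nsign-off:\n%s\n' "$by_author" "$by_signoff"
```

The point of the third commit is that --author and --grep answer
different questions: it is signed off by Joe but authored by someone
else, so only the --grep query reports it.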

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: git log filtering
  2007-02-07 16:41 git log filtering Don Zickus
  2007-02-07 16:55 ` Jakub Narebski
@ 2007-02-07 17:01 ` Uwe Kleine-König
  2007-02-07 17:12   ` Johannes Schindelin
  2007-02-07 17:12 ` Linus Torvalds
  2 siblings, 1 reply; 34+ messages in thread
From: Uwe Kleine-König @ 2007-02-07 17:01 UTC (permalink / raw)
  To: Don Zickus; +Cc: git

Don Zickus wrote:
> I was curious to know what is the easiest way to filter info inside a
> commit message.
> 
> For example say I wanted to find out what patches Joe User has
> submitted to the git project.
> I know I can do something like ' git log |grep -B2 "^Author: Joe User"
What about

	git log --author="Joe User"

> ' and it will output the matches and the commit id.  However, if I
> wanted to filter on something like "Signed-off-by: Joe User", then it
> is a little harder to dig for the commit id.
> 
> Is there a better way of doing this?  Or should I accept the fact that
> git wasn't designed to filter info like this very quickly?
> 
> I guess what I was looking to do was embed some metadata inside the
> commit message and parse through it at a later time (ie like a
> bugzilla number or something).
> 
> Any thoughts/tips/tricks would be helpful.

Maybe:

	git log | awk -v sob="Joe User" '$1 == "commit" {commit = $2} /Signed-off-by:/ {if (match($0, sob)) print commit}'
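
To see what a script like this does without needing a repository, here
is a sketch that runs a slight variation of it over canned text shaped
like default "git log" output. The commit IDs and names are made up, and
the variation (anchoring on column 0 and using index() instead of
match()) is my own choice, not part of the one-liner above.

```shell
#!/bin/sh
# Sample text shaped like default "git log" output; IDs and names are made up.
sample_log='commit 1111111111111111111111111111111111111111
Author: Jane Hacker <jane@example.com>
Date:   Wed Feb 7 17:00:00 2007 +0100

    first change

    Signed-off-by: Joe User <joe@example.com>

commit 2222222222222222222222222222222222222222
Author: Joe User <joe@example.com>
Date:   Wed Feb 7 16:00:00 2007 +0100

    second change

    Signed-off-by: Jane Hacker <jane@example.com>'

# Remember the most recent commit ID; print it when a sign-off line
# mentions the wanted name. index() is a plain substring test, so regex
# metacharacters in the name are not interpreted.
signed_by_joe=$(printf '%s\n' "$sample_log" |
	awk -v sob="Joe User" '
		/^commit / { commit = $2 }
		/^    Signed-off-by:/ && index($0, sob) { print commit }
	')

echo "$signed_by_joe"
```

Anchoring on "^commit " matters because message lines are indented in
the default format, so a message line that merely begins with the word
"commit" is not mistaken for a header.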

Best regards
Uwe

-- 
Uwe Kleine-König

http://www.google.com/search?q=2004+in+roman+numerals

* Re: git log filtering
  2007-02-07 17:01 ` Uwe Kleine-König
@ 2007-02-07 17:12   ` Johannes Schindelin
  0 siblings, 0 replies; 34+ messages in thread
From: Johannes Schindelin @ 2007-02-07 17:12 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Don Zickus, git

Hi,

On Wed, 7 Feb 2007, Uwe Kleine-König wrote:

> Don Zickus wrote:
> > I was curious to know what is the easiest way to filter info inside a
> > commit message.
> > 
> > For example say I wanted to find out what patches Joe User has
> > submitted to the git project.
> > I know I can do something like ' git log |grep -B2 "^Author: Joe User"
> What about
> 
> 	git log --author="Joe User"
> 
> > ' and it will output the matches and the commit id.  However, if I
> > wanted to filter on something like "Signed-off-by: Joe User", then it
> > is a little harder to dig for the commit id.
> > 
> > Is there a better way of doing this?  Or should I accept the fact that
> > git wasn't designed to filter info like this very quickly?
> > 
> > I guess what I was looking to do was embed some metadata inside the
> > commit message and parse through it at a later time (ie like a
> > bugzilla number or something).
> > 
> > Any thoughts/tips/tricks would be helpful.
> 
> Maybe:
> 
> 	git log | awk -v sob="Joe User" '$1 == "commit" {commit = $2} /Signed-off-by:/ {if (match($0, sob)) print commit}'

*grin* Why do you know --author, but not --grep?

git log --grep=Signed-off-by:\ Joe\ User

Ciao,
Dscho

* Re: git log filtering
  2007-02-07 16:41 git log filtering Don Zickus
  2007-02-07 16:55 ` Jakub Narebski
  2007-02-07 17:01 ` Uwe Kleine-König
@ 2007-02-07 17:12 ` Linus Torvalds
  2007-02-07 17:25   ` Johannes Schindelin
                     ` (2 more replies)
  2 siblings, 3 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-02-07 17:12 UTC (permalink / raw)
  To: Don Zickus; +Cc: git



On Wed, 7 Feb 2007, Don Zickus wrote:
>
> I was curious to know what is the easiest way to filter info inside a
> commit message.
> 
> For example say I wanted to find out what patches Joe User has
> submitted to the git project.
> I know I can do something like ' git log |grep -B2 "^Author: Joe User"
> ' and it will output the matches and the commit id.  However, if I
> wanted to filter on something like "Signed-off-by: Joe User", then it
> is a little harder to dig for the commit id.

There are two ways:

 - "git log" can itself do a lot of filtering. Both on date, on revisions, 
   on "modifies files/directories X, Y and Z" _and_ on strings.

   See "man git-rev-list" for more (it doesn't apply to just "git log", it 
   applies to just about any revision listing, including gitk etc)

   For example,

	git log [--author=pattern] [--committer=pattern] [--grep=pattern]

   will likely do exactly what you want. You can do

	git log --grep="Signed-off-by:.*akpm"

   on the kernel archive to see which ones were signed off by Andrew.

So the above works, and catches *most* uses. But it has problems if you 
want to do something fancier (and I think that includes something as 
simple as doing a case-insensitive grep). So the other approach is:

 - The hacky way: use "git log --pretty -z", and GNU grep -z:

	git log --pretty -z |
		grep -i -z Signed-off-by:.*junkio |
		tr '\0' '\n'

   which allows you to do anything you want with grep (or other unix tools 
   that take zero-terminated output).
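
The mechanics of the second approach can be tried without a repository.
The sketch below feeds two hand-written NUL-terminated records (made-up
data standing in for "git log -z" output) through the same kind of
pipeline; it assumes a grep that supports -z, such as GNU grep.

```shell
#!/bin/sh
# Two NUL-terminated records standing in for "git log -z" output (made-up data).
records() {
	printf 'commit aaa\n\n    fix parser\n\n    Signed-off-by: Junio C Hamano\0'
	printf 'commit bbb\n\n    add docs\n\n    Signed-off-by: Someone Else\0'
}

# -z makes grep treat NUL as the record separator, so a match selects the
# whole multi-line commit; -i makes the match case-insensitive. tr then
# turns the separators back into newlines for display.
matched=$(records | grep -i -z 'signed-off-by:.*junio' | tr '\0' '\n')

echo "$matched"
```

Because the record, not the line, is the unit of matching, the whole
first commit is printed even though only its last line matches.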

> Is there a better way of doing this?  Or should I accept the fact that
> git wasn't designed to filter info like this very quickly?

Git definitely was designed to do it. The "-z" option in particular is 
very much designed for any generic UNIX scripting, but the *easy* cases 
git does internally.

		Linus

* Re: git log filtering
  2007-02-07 17:12 ` Linus Torvalds
@ 2007-02-07 17:25   ` Johannes Schindelin
       [not found]     ` <7v64ad7l12.fsf@assigned-by-dhcp.cox.net>
  2007-02-07 18:16   ` Linus Torvalds
  2007-02-07 18:19   ` git log filtering Don Zickus
  2 siblings, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2007-02-07 17:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Don Zickus, git

Hi,

On Wed, 7 Feb 2007, Linus Torvalds wrote:

> You can do
> 
> 	git log --grep="Signed-off-by:.*akpm"
> 
>    on the kernel archive to see which ones were signed off by Andrew.
> 
> So the above works, and catches *most* uses. But it has problems if you 
> want to do something fancier (and I think that includes something as 
> simple as doing a case-insensitive grep).

[TIC PATCH] revision.c: accept "-i" to make --grep case insensitive

When calling

	git log --grep=blabla -i --grep=blublu

the expression "blabla" is grepped case _sensitively_, but "blublu"
case _insensitively_.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>

---

 revision.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/revision.c b/revision.c
index 42ba310..843aa8e 100644
--- a/revision.c
+++ b/revision.c
@@ -9,6 +9,8 @@
 #include "grep.h"
 #include "reflog-walk.h"
 
+static int case_insensitive_grep = 0;
+
 static char *path_name(struct name_path *path, const char *name)
 {
 	struct name_path *p;
@@ -742,6 +744,8 @@ static void add_grep(struct rev_info *revs, const char *ptn, enum grep_pat_token
 		opt->status_only = 1;
 		opt->pattern_tail = &(opt->pattern_list);
 		opt->regflags = REG_NEWLINE;
+		if (case_insensitive_grep)
+			opt->regflags |= REG_ICASE;
 		revs->grep_filter = opt;
 	}
 	append_grep_pattern(revs->grep_filter, ptn,
@@ -1042,6 +1046,11 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 				add_header_grep(revs, "committer", arg+12);
 				continue;
 			}
+			if (!strcmp(arg, "-i") ||
+					!strcmp(arg, "--case-insensitive")) {
+				case_insensitive_grep = 1;
+				continue;
+			}
 			if (!strncmp(arg, "--grep=", 7)) {
 				add_message_grep(revs, arg+7);
 				continue;

* Re: git log filtering
  2007-02-07 17:12 ` Linus Torvalds
  2007-02-07 17:25   ` Johannes Schindelin
@ 2007-02-07 18:16   ` Linus Torvalds
  2007-02-07 19:49     ` Fix "git log -z" behaviour Linus Torvalds
  2007-02-07 18:19   ` git log filtering Don Zickus
  2 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2007-02-07 18:16 UTC (permalink / raw)
  To: Don Zickus, Junio C Hamano; +Cc: Git Mailing List



On Wed, 7 Feb 2007, Linus Torvalds wrote:
> 
> 	git log --pretty -z |

Gaah. If all you want is normal logs, you don't need the "--pretty", 
of course, since that's the default. Just "git log -z" will give you 
zero-terminated logs. 

But if you want to grep on committer, you'd need to use "--pretty=full" or 
something, of course, so the "--pretty=xyz" thing is indeed often 
applicable for things like this.
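
A small sketch of the committer case, using a throwaway repository in
which author and committer differ. The names and the commit_with helper
are invented for the demo, and it assumes git plus a grep with -z
support (GNU grep, for instance).

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git -c init.defaultBranch=main init -q .

# Hypothetical helper: author and committer set separately through git's
# standard environment variables.
commit_with() {
	GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="author@example.com" \
	GIT_COMMITTER_NAME="$2" GIT_COMMITTER_EMAIL="committer@example.com" \
		git commit -q --allow-empty -m "$3"
}

commit_with "Joe User" "Big Maintainer" "applied by the maintainer"
commit_with "Joe User" "Joe User" "committed directly"

# --pretty=full adds a "Commit:" line naming the committer, which the
# default format omits, so grep has something to match on.
n=$(git log -z --pretty=full |
	grep -z 'Commit: Big Maintainer' |
	tr '\0' '\n' |
	grep -c '^commit ')

echo "commits committed by Big Maintainer: $n"
```

With the default format the same grep would find nothing, since the
committer is not printed at all.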

Also, I just checked, and we have a bug. Merges do not have the ending 
zero in "git log -z" output. It seems to be connected to the fact that we 
handle the "always_show_header" commits differently (the ones that we 
wouldn't normally show because they have no diffs associated with them).

The obvious fix for that failed. I'll look at it some more.

		Linus

* Re: git log filtering
  2007-02-07 17:12 ` Linus Torvalds
  2007-02-07 17:25   ` Johannes Schindelin
  2007-02-07 18:16   ` Linus Torvalds
@ 2007-02-07 18:19   ` Don Zickus
  2007-02-07 18:27     ` Linus Torvalds
  2 siblings, 1 reply; 34+ messages in thread
From: Don Zickus @ 2007-02-07 18:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>  - "git log" can itself do a lot of filtering. Both on date, on revisions,
>    on "modifies files/directories X, Y and Z" _and_ on strings.
>
>    See "man git-rev-list" for more (it doesn't apply to just "git log", it
>    applies to just about any revision listing, including gitk etc)
>
>    For example,
>
>         git log [--author=pattern] [--committer=pattern] [--grep=pattern]
>
>    will likely do exactly what you want. You can do
>
>         git log --grep="Signed-off-by:.*akpm"
>
>    on the kernel archive to see which ones were signed off by Andrew.

Cool.  The hidden little options.  :-)  This is exactly what I was
looking for.  Thanks.

I didn't see these options in the man pages.  Might be worth putting in there??

Cheers,
Don

* Re: git log filtering
  2007-02-07 18:19   ` git log filtering Don Zickus
@ 2007-02-07 18:27     ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-02-07 18:27 UTC (permalink / raw)
  To: Don Zickus; +Cc: git



On Wed, 7 Feb 2007, Don Zickus wrote:
>
> I didn't see these options in the man pages.  Might be worth putting in
> there??

Well, they really _are_ there, indirectly:

	The command takes options applicable to the git-rev-list(1) command 
	to control what is shown and how, and options applicable to the 
	git-diff-tree(1) commands to control how the change each commit 
	introduces are shown.

so you have to look at both git-rev-list and git-diff-tree to get all the 
options.

It then goes on to say:

	This manual page describes only the most frequently used options.
	                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

so technically it's complete and true.

But yeah, maybe we could include all the options there.

		Linus

* Fix "git log -z" behaviour
  2007-02-07 18:16   ` Linus Torvalds
@ 2007-02-07 19:49     ` Linus Torvalds
  2007-02-07 19:55       ` Junio C Hamano
                         ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-02-07 19:49 UTC (permalink / raw)
  To: Don Zickus, Junio C Hamano; +Cc: Git Mailing List



For commit messages, we should really put the "line_termination" when we 
output the character in between different commits, *not* between the 
commit and the diff. The diff goes hand-in-hand with the commit, it 
shouldn't be separated from it with the termination character.

So this:
 - uses the termination character for true inter-commit spacing
 - uses a regular newline between the commit log and the diff

We had it the other way around.

For the normal case where the termination character is '\n', this 
obviously doesn't change anything at all, since we just switched two 
identical characters around. So it's very safe - it doesn't change any 
normal usage, but it definitely fixes "git log -z".

By fixing "git log -z", you can now also do insane things like

	git log -p -z |
		grep -z "some patch expression" |
		tr '\0' '\n' |
		less -S

and you will see only those commits that have the "some patch expression" 
in their commit message _or_ their patches.

(This is slightly different from 'git log -S"some patch expression"', 
since the latter requires the expression to literally *change* in the 
patch, while the "git log -p -z | grep .." approach will see it if it's 
just an unchanged _part_ of the patch context)
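
The difference can be seen in a two-commit throwaway repository. This is
only a sketch: the file name, contents and identity are made up, and it
assumes git and a grep with -z support.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git -c init.defaultBranch=main init -q .
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# First commit introduces the string; the second only touches a
# neighbouring line, so the string shows up there as patch context.
printf 'alpha\nneedle\nomega\n' > file.txt
git add file.txt
git commit -q -m "add file"
printf 'alpha\nneedle\nOMEGA\n' > file.txt
git commit -q -am "tweak last line"

# -S reports commits where the number of occurrences of the string changes.
pickaxe=$(git log -S"needle" --format=%s | grep -c .)

# Grepping NUL-separated patch output also sees unchanged context lines.
grepped=$(git log -p -z | grep -z -c "needle")

echo "pickaxe: $pickaxe, grep over patches: $grepped"
```

Only the first commit counts for -S, while the grep pipeline also picks
up the second one because "needle" sits in its diff context.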

Of course, if you actually do something like the above, you're probably 
insane, but hey, it works!

Try the above command line for a demonstration (of course, you need to 
change the "some patch expression" to be something relevant). The old 
behaviour of "git log -p -z" was useless (and got things completely wrong 
for log entries without patches).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

On Wed, 7 Feb 2007, Linus Torvalds wrote:
> 
> Also, I just checked, and we have a bug. Merges do not have the ending 
> zero in "git log -z" output. It seems to be connected to the fact that we 
> handle the "always_show_header" commits differently (the ones that we 
> wouldn't normally show because they have no diffs associated with them).
> 
> The obvious fix for that failed. I'll look at it some more.

Actually, the obvious fix was right, I just did the *wrong* obvious fix at 
first ;)

 log-tree.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index d8ca36b..85acd66 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -143,7 +143,7 @@ void show_log(struct rev_info *opt, const char *sep)
 	if (*sep != '\n' && opt->commit_format == CMIT_FMT_ONELINE)
 		extra = "\n";
 	if (opt->shown_one && opt->commit_format != CMIT_FMT_ONELINE)
-		putchar('\n');
+		putchar(opt->diffopt.line_termination);
 	opt->shown_one = 1;
 
 	/*
@@ -270,9 +270,8 @@ int log_tree_diff_flush(struct rev_info *opt)
 		    opt->commit_format != CMIT_FMT_ONELINE) {
 			int pch = DIFF_FORMAT_DIFFSTAT | DIFF_FORMAT_PATCH;
 			if ((pch & opt->diffopt.output_format) == pch)
-				printf("---%c", opt->diffopt.line_termination);
-			else
-				putchar(opt->diffopt.line_termination);
+				printf("---");
+			putchar('\n');
 		}
 	}
 	diff_flush(&opt->diffopt);

* Re: Fix "git log -z" behaviour
  2007-02-07 19:49     ` Fix "git log -z" behaviour Linus Torvalds
@ 2007-02-07 19:55       ` Junio C Hamano
  2007-02-07 22:53       ` Don Zickus
  2007-02-08 22:34       ` Junio C Hamano
  2 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2007-02-07 19:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Ah, I was looking at other minor issues and then came up with
this one liner.  But obviously "termination should be the true
inter-commit spacing" is the right direction, so I'll chuck this
one.

diff --git a/log-tree.c b/log-tree.c
index d8ca36b..410f90f 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -354,6 +354,8 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 	if (!shown && opt->loginfo && opt->always_show_header) {
 		log.parent = NULL;
 		show_log(opt, "");
+		if (!opt->diffopt.line_termination)
+			putchar(0);
 		shown = 1;
 	}
 	opt->loginfo = NULL;

* Re: git log filtering
       [not found]     ` <7v64ad7l12.fsf@assigned-by-dhcp.cox.net>
@ 2007-02-07 21:03       ` Linus Torvalds
  2007-02-07 21:09         ` Junio C Hamano
  2007-02-08  1:59         ` Horst H. von Brand
  0 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-02-07 21:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Don Zickus, git



On Wed, 7 Feb 2007, Junio C Hamano wrote:
> 
> This is very tempting but, ... hmmmm...

I would actually prefer to have it be some marker on the expression 
itself.

We already do that '^' handling by hand for "author"/"committer" things. 
We could do other things like that.

Although I guess the downside of not doing standard regexps would be too 
big.

		Linus

* Re: git log filtering
  2007-02-07 21:03       ` Linus Torvalds
@ 2007-02-07 21:09         ` Junio C Hamano
  2007-02-07 21:53           ` Linus Torvalds
  2007-02-08  1:59         ` Horst H. von Brand
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2007-02-07 21:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Don Zickus, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 7 Feb 2007, Junio C Hamano wrote:
>> 
>> This is very tempting but, ... hmmmm...
>
> I would actually prefer to have it be some marker on the expression 
> itself.
>
> We already do that '^' handling by hand for "author"/"committer" things. 
> We could do other things like that.
>
> Although I guess the downside of not doing standard regexps would be too 
> big.
>
> 		Linus

We could go pcre and let you say "(?i)".  That would all be post
1.5.0, though.

* Re: git log filtering
  2007-02-07 21:09         ` Junio C Hamano
@ 2007-02-07 21:53           ` Linus Torvalds
  2007-02-08  6:16             ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2007-02-07 21:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Don Zickus, git



On Wed, 7 Feb 2007, Junio C Hamano wrote:
> 
> We could go pcre and let you say "(?i)".  That would all be post
> 1.5.0, though.

Hmm. PCRE is probably wide-spread enough that it could be an option. 

What's PCRE performance like? I'd hate to make "git grep" slower, and it 
would be stupid and confusing to use two different regex libraries..

Maybe somebody could test - afaik, PCRE has a regex-compatible (from an API 
standpoint, not from a regex standpoint!) wrapper thing, and it might be 
interesting to hear if doing "git grep" is slower or faster..

(I realize that the performance thing depends heavily on the patterns and 
the working set they are used on, but I guess _I_ personally only care 
about fairly simple patterns on the kernel ;)

		Linus

* Re: Fix "git log -z" behaviour
  2007-02-07 19:49     ` Fix "git log -z" behaviour Linus Torvalds
  2007-02-07 19:55       ` Junio C Hamano
@ 2007-02-07 22:53       ` Don Zickus
  2007-02-07 23:05         ` Linus Torvalds
  2007-02-08 22:34       ` Junio C Hamano
  2 siblings, 1 reply; 34+ messages in thread
From: Don Zickus @ 2007-02-07 22:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

>
> For commit messages, we should really put the "line_termination" when we
> output the character in between different commits, *not* between the
> commit and the diff. The diff goes hand-in-hand with the commit, it
> shouldn't be separated from it with the termination character.
>
> So this:
>  - uses the termination character for true inter-commit spacing
>  - uses a regular newline between the commit log and the diff
>
> We had it the other way around.
>
> For the normal case where the termination character is '\n', this
> obviously doesn't change anything at all, since we just switched two
> identical characters around. So it's very safe - it doesn't change any
> normal usage, but it definitely fixes "git log -z".
>
> By fixing "git log -z", you can now also do insane things like
>
>         git log -p -z |
>                 grep -z "some patch expression" |
>                 tr '\0' '\n' |
>                 less -S
>
> and you will see only those commits that have the "some patch expression"
> in their commit message _or_ their patches.
>
> (This is slightly different from 'git log -S"some patch expression"',
> since the latter requires the expression to literally *change* in the
> patch, while the "git log -p -z | grep .." approach will see it if it's
> just an unchanged _part_ of the patch context)
>
> Of course, if you actually do something like the above, you're probably
> insane, but hey, it works!
>
> Try the above command line for a demonstration (of course, you need to
> change the "some patch expression" to be something relevant). The old
> behaviour of "git log -p -z" was useless (and got things completely wrong
> for log entries without patches).
>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>
> On Wed, 7 Feb 2007, Linus Torvalds wrote:
> >
> > Also, I just checked, and we have a bug. Merges do not have the ending
> > zero in "git log -z" output. It seems to be connected to the fact that we
> > handle the "always_show_header" commits differently (the ones that we
> > wouldn't normally show because they have no diffs associated with them).
> >
> > The obvious fix for that failed. I'll look at it some more.
>
> Actually, the obvious fix was right, I just did the *wrong* obvious fix at
> first ;)

Works for me.  :)
And I thought I had a handle on a lot of the Unix commands.  That -z
stuff just threw me for a loop.  It's pretty neat to be able to grep
commits and have the output display the whole commit and diff.

Cheers,
Don

* Re: Fix "git log -z" behaviour
  2007-02-07 22:53       ` Don Zickus
@ 2007-02-07 23:05         ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-02-07 23:05 UTC (permalink / raw)
  To: Don Zickus; +Cc: Junio C Hamano, Git Mailing List



On Wed, 7 Feb 2007, Don Zickus wrote:
> 
> And I thought I had a handle on a lot of the Unix commands.  That -z
> stuff just threw me for a loop.  It's pretty neat to be able to grep
> commits and have the output display the whole commit and diff.

The whole "-z" flag to grep is a GNU extension, as far as I know. I don't 
think it's portable. 

Even for GNU grep, it's not mentioned in the man-page. Whether that is 
just due to the normal inane FSF rules ("man-pages are evil, you should 
use those idiotic info pages") or whether it is a conscious effort to not 
document nonstandard features, I don't know.

		Linus

* Re: git log filtering
  2007-02-07 21:03       ` Linus Torvalds
  2007-02-07 21:09         ` Junio C Hamano
@ 2007-02-08  1:59         ` Horst H. von Brand
  1 sibling, 0 replies; 34+ messages in thread
From: Horst H. von Brand @ 2007-02-08  1:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Johannes Schindelin, Don Zickus, git

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, 7 Feb 2007, Junio C Hamano wrote:
> > This is very tempting but, ... hmmmm...
> 
> I would actually prefer to have it be some marker on the expression 
> itself.
> 
> We already do that '^' handling by hand for "author"/"committer" things. 
> We could do other things like that.
> 
> Although I guess the downside of not doing standard regexps would be too 
> big.

Use Perl's regexps? the pcre library packs them, and they have all sorts of
goodies like markers in the expression itself. 
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                    Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria             +56 32 2654239
Casilla 110-V, Valparaiso, Chile               Fax:  +56 32 2797513

* Re: git log filtering
  2007-02-07 21:53           ` Linus Torvalds
@ 2007-02-08  6:16             ` Jeff King
  2007-02-08 18:06               ` Johannes Schindelin
  2007-03-07 17:37               ` pcre performance, was " Johannes Schindelin
  0 siblings, 2 replies; 34+ messages in thread
From: Jeff King @ 2007-02-08  6:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On Wed, Feb 07, 2007 at 01:53:18PM -0800, Linus Torvalds wrote:

> What's PCRE performance like? I'd hate to make "git grep" slower, and it 
> would be stupid and confusing to use two different regex libraries..
>
> Maybe somebody could test - afaik, PCRE has a regex-compatible (from a API 
> standpoint, not from a regex standpoint!) wrapper thing, and it might be 
> interesting to hear if doing "git grep" is slower or faster..

The patch is delightfully simple (though a real patch would probably be
conditional):

diff --git a/Makefile b/Makefile
index aca96c8..cf391dc 100644
--- a/Makefile
+++ b/Makefile
@@ -323,7 +323,7 @@ BUILTIN_OBJS = \
 	builtin-pack-refs.o
 
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
-EXTLIBS = -lz
+EXTLIBS = -lz -lpcreposix -lpcre
 
 #
 # Platform specific tweaks
diff --git a/git-compat-util.h b/git-compat-util.h
index c1bcb00..a6c77f9 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -40,7 +40,7 @@
 #include <sys/poll.h>
 #include <sys/socket.h>
 #include <assert.h>
-#include <regex.h>
+#include <pcreposix.h>
 #include <netinet/in.h>
 #include <netinet/tcp.h>
 #include <arpa/inet.h>


A few numbers, all from a fully packed kernel repository:

# glibc, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
10.07user 0.15system 0:10.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36617minor)pagefaults 0swaps

# glibc, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]'  >/dev/null
24.42user 0.15system 0:24.60elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36210minor)pagefaults 0swaps

# pcre, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
7.82user 0.12system 0:08.00elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36571minor)pagefaults 0swaps

# pcre, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]'  >/dev/null
36.51user 0.13system 0:36.65elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36583minor)pagefaults 0swaps


So the winner seems to vary based on the complexity of the pattern.
There are some less rudimentary but non-git performance tests here:

  http://www.boost.org/libs/regex/doc/gcc-performance.html

In every case there, pcre has either comparable performance, or simply
blows away glibc.

One final note that caused some confusion during my testing: git-grep
still uses external grep for working tree greps (i.e., 'git grep foo').
This meant that 'git grep' and 'git grep --cached' produced wildly
different results once I was using pcre internally. Something to look
out for if we switch to pcre (or any other library which doesn't exactly
match external grep behavior!).

-Peff

* Re: git log filtering
  2007-02-08  6:16             ` Jeff King
@ 2007-02-08 18:06               ` Johannes Schindelin
  2007-02-08 22:33                 ` Jeff King
  2007-03-07 17:37               ` pcre performance, was " Johannes Schindelin
  1 sibling, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2007-02-08 18:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, git

Hi,

On Thu, 8 Feb 2007, Jeff King wrote:

> On Wed, Feb 07, 2007 at 01:53:18PM -0800, Linus Torvalds wrote:
> 
> > What's PCRE performance like? I'd hate to make "git grep" slower, and it 
> > would be stupid and confusing to use two different regex libraries..
> >
> > Maybe somebody could test - afaik, PCRE has a regex-compatible (from a API 
> > standpoint, not from a regex standpoint!) wrapper thing, and it might be 
> > interesting to hear if doing "git grep" is slower or faster..
> 
> The patch is delightfully simple (though a real patch would probably be
> conditional):
>
> [...]

May I register a complaint? This is yet _another_ dependency.

Ciao,
Dscho

* Re: git log filtering
  2007-02-08 18:06               ` Johannes Schindelin
@ 2007-02-08 22:33                 ` Jeff King
  2007-02-09  0:18                   ` Johannes Schindelin
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2007-02-08 22:33 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Thu, Feb 08, 2007 at 07:06:25PM +0100, Johannes Schindelin wrote:

> May I register a complaint? This is yet _another_ dependency.

Unlike other dependencies, I think it's quite natural to make it a
conditional dependency. If you have pcre, you get more featureful
regular expressions. If you don't, you get posix regular expressions.
Do you object to a few extra lines in the Makefile?

-Peff

* Re: Fix "git log -z" behaviour
  2007-02-07 19:49     ` Fix "git log -z" behaviour Linus Torvalds
  2007-02-07 19:55       ` Junio C Hamano
  2007-02-07 22:53       ` Don Zickus
@ 2007-02-08 22:34       ` Junio C Hamano
  2007-02-10  7:32         ` Junio C Hamano
  2 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2007-02-08 22:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Don Zickus, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> For the normal case where the termination character is '\n', this 
> obviously doesn't change anything at all, since we just switched two 
> identical characters around. So it's very safe - it doesn't change any 
> normal usage, but it definitely fixes "git log -z".

Gaah.

I have already applied this but I think this has fallout for
existing users of "-z --raw".  Nothing in-tree uses "git log" as
the upstream of a pipe as far as I know because in-tree stuff
tends to stick to plumbing when it comes to scripting, but I
think your patch would affect the plumbing level as well.

Scripts that read from "-z --raw" have been treating a record
whose first 7 bytes are "commit " as a log message, which is
followed by an arbitrary number of records whose first byte is
":" (each of which then needs a variable number of further
records to complete one diff record).  This patch removes the
separator NUL between the log message and the first diff record.
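
To make that record layout concrete, here is a rough Python sketch of
how such a reader might classify NUL-separated records.  The function
name and the sample bytes are made up for illustration; the sample is a
simplified stand-in, not real git output.

```python
def classify_records(stream: bytes):
    """Split a NUL-separated byte stream and label each record.

    Assumed (pre-patch) layout: a record starting with b"commit " is a
    log message, a record starting with b":" opens a raw diff entry, and
    any other record is a pathname completing the preceding diff entry.
    """
    labelled = []
    for record in stream.split(b"\0"):
        if not record:
            continue  # skip the empty piece after a trailing NUL
        if record.startswith(b"commit "):
            labelled.append(("log", record))
        elif record.startswith(b":"):
            labelled.append(("diff", record))
        else:
            labelled.append(("path", record))
    return labelled


# Hypothetical, simplified sample: one log record, one diff header, one path.
sample = (
    b"commit 1234abcd\n\n    subject line\n\0"
    b":100644 100644 aaaa bbbb M\0"
    b"Makefile\0"
)
kinds = [kind for kind, _ in classify_records(sample)]

# Same data with the NUL between the log message and the first diff
# record removed: the diff header is swallowed by the log record.
merged = sample.replace(b"\n\0:", b"\n:", 1)
merged_kinds = [kind for kind, _ in classify_records(merged)]

print(kinds)
print(merged_kinds)
```

With the separator gone, a reader written this way no longer sees a
record beginning with ":" at all, which is the kind of fallout
described above.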

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: git log filtering
  2007-02-08 22:33                 ` Jeff King
@ 2007-02-09  0:18                   ` Johannes Schindelin
  2007-02-09  0:23                     ` Shawn O. Pearce
  2007-02-09  1:59                     ` Jeff King
  0 siblings, 2 replies; 34+ messages in thread
From: Johannes Schindelin @ 2007-02-09  0:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi,

On Thu, 8 Feb 2007, Jeff King wrote:

> On Thu, Feb 08, 2007 at 07:06:25PM +0100, Johannes Schindelin wrote:
> 
> > May I register a complaint? This is yet _another_ dependency.
> 
> Unlike other dependencies, I think it's quite natural to make it a
> conditional dependency. If you have pcre, you get more featureful
> regular expressions. If you don't, you get posix regular expressions.
> Do you object to a few extra lines in the Makefile?

Yes, I do. Not because of the extra lines, but because of the inconsistent 
interface.

We included libxdiff _exactly_ to ensure consistency between different git 
installations (remember, diff behaves quite differently on different 
platforms, and even GNU diff behaves differently depending on which 
version you use).

So no, I do not like the idea of using git on some random box, only to 
realize that what I have grown used to does not work.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: git log filtering
  2007-02-09  0:18                   ` Johannes Schindelin
@ 2007-02-09  0:23                     ` Shawn O. Pearce
  2007-02-09  0:45                       ` Johannes Schindelin
  2007-02-09 10:15                       ` Sergey Vlasov
  2007-02-09  1:59                     ` Jeff King
  1 sibling, 2 replies; 34+ messages in thread
From: Shawn O. Pearce @ 2007-02-09  0:23 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> We included libxdiff _exactly_ to ensure consistency between different git 
> installations (remember, diff behaves quite differently on different 
> platforms, and even GNU diff behaves differently depending on which 
> version you use).

pcre is covered by the BSD license.  Can we ship it with git, like
we ship libxdiff?  I want to say Apache ships with pcre, but they
use the Apache License so it might be easier for them to do so.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: git log filtering
  2007-02-09  0:23                     ` Shawn O. Pearce
@ 2007-02-09  0:45                       ` Johannes Schindelin
  2007-02-09 10:15                       ` Sergey Vlasov
  1 sibling, 0 replies; 34+ messages in thread
From: Johannes Schindelin @ 2007-02-09  0:45 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Jeff King, git

Hi,

On Thu, 8 Feb 2007, Shawn O. Pearce wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > We included libxdiff _exactly_ to ensure consistency between different 
> > git installations (remember, diff behaves quite differently on 
> > different platforms, and even GNU diff behaves differently depending 
> > on which version you use).
> 
> pcre is covered by the BSD license.  Can we ship it with git, like we 
> ship libxdiff?  I want to say Apache ships with pcre, but they use the 
> Apache License so it might be easier for them to do so.

If we bundle it like we do with libxdiff, I do not have any objections. It 
would also help MinGW.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: git log filtering
  2007-02-09  0:18                   ` Johannes Schindelin
  2007-02-09  0:23                     ` Shawn O. Pearce
@ 2007-02-09  1:59                     ` Jeff King
  2007-02-09 13:13                       ` Johannes Schindelin
  1 sibling, 1 reply; 34+ messages in thread
From: Jeff King @ 2007-02-09  1:59 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Fri, Feb 09, 2007 at 01:18:01AM +0100, Johannes Schindelin wrote:

> Yes, I do. Not because of the extra lines, but because of the inconsistent 
> interface.

OK, so we may either:
  1. always use the lowest common denominator (i.e., no pcre support)
  2. force a dependency for new features (i.e., require pcre)
  3. have inconsistency between builds (i.e., conditional dependency)
  4. include all dependencies, or re-write them natively

I agree that 4 can make some sense in limited situations, but I worry
that it will eventually cease to be scalable (we don't get improvements
or bugfixes automatically from other packages, we potentially re-invent
the wheel). We already have '3' for other things: openssl, curl, expat,
even perl.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: git log filtering
  2007-02-09  0:23                     ` Shawn O. Pearce
  2007-02-09  0:45                       ` Johannes Schindelin
@ 2007-02-09 10:15                       ` Sergey Vlasov
  1 sibling, 0 replies; 34+ messages in thread
From: Sergey Vlasov @ 2007-02-09 10:15 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Johannes Schindelin, Jeff King, git

On Thu, 8 Feb 2007 19:23:44 -0500 Shawn O. Pearce wrote:

> pcre is covered by the BSD license.  Can we ship it with git, like
> we ship libxdiff?  I want to say Apache ships with pcre, but they
> use the Apache License so it might be easier for them to do so.

If you do this, please do not forget to add a way to use the system
copy of libpcre instead of the bundled version.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: git log filtering
  2007-02-09  1:59                     ` Jeff King
@ 2007-02-09 13:13                       ` Johannes Schindelin
  2007-02-09 13:22                         ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2007-02-09 13:13 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi,

On Thu, 8 Feb 2007, Jeff King wrote:

> On Fri, Feb 09, 2007 at 01:18:01AM +0100, Johannes Schindelin wrote:
> 
> > Yes, I do. Not because of the extra lines, but because of the inconsistent 
> > interface.
> 
> OK, so we may either:
>   1. always use the lowest common denominator (i.e., no pcre support)
>   2. force a dependency for new features (i.e., require pcre)
>   3. have inconsistency between builds (i.e., conditional dependency)
>   4. include all dependencies, or re-write them natively
> 
> I agree that 4 can make some sense in limited situations, but I worry
> that it will eventually cease to be scalable (we don't get improvements
> or bugfixes automatically from other packages, we potentially re-invent
> the wheel). We already have '3' for other things: openssl, curl, expat,
> even perl.

The difference, of course, is that with the "other things", we either have 
no alternative (if you do not have curl, you cannot use HTTP transport), 
or we have workalikes (if you don't use openssl, the (possibly slower) 
SHA1 replacements take effect).

We _used_ to rely on external "diff" and "merge", but have them as inbuilt 
components, exactly to avoid "if you have a slightly differing setup, 
git behaves differently".

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: git log filtering
  2007-02-09 13:13                       ` Johannes Schindelin
@ 2007-02-09 13:22                         ` Jeff King
  2007-02-09 15:02                           ` Johannes Schindelin
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2007-02-09 13:22 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Fri, Feb 09, 2007 at 02:13:18PM +0100, Johannes Schindelin wrote:

> The difference, of course, is that with the "other things", we either have 
> no alternative (if you do not have curl, you cannot use HTTP transport), 
> or we have workalikes (if you don't use openssl, the (possibly slower) 
> SHA1 replacements take effect).

I'm not a pcre expert, but I thought most of the additions to posix
extended regular expressions were expressed through constructs that
would otherwise be invalid patterns. For example, '(?i)' doesn't make
any sense as a pattern. Thus you would only see different behavior when
inputting nonsense. Of course, we're not currently using extended
regexps, but that could be made the default without additional
dependencies.

> We _used_ to rely on external "diff" and "merge", but have them as inbuilt 
> components, exactly to avoid "if you have a slightly differing setup, 
> git behaves differently".

But you're OK with "if you didn't build against curl, http transport
just doesn't work." So what if there is a '--pcre' option and a
corresponding config option? Thus you get the same results always,
unless you use --pcre and it's not built, in which case git dies. That
seems to be the moral equivalent of the curl situation.
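
As a rough illustration of that behaviour (not git code), the selection
logic could look like the Python sketch below.  The function name, the
exception, and the "pcre_built" flag are all hypothetical stand-ins for
a command-line option and a compile-time setting.

```python
class EngineUnavailable(Exception):
    """Raised when an engine is requested that this build lacks."""


def select_engine(want_pcre: bool, pcre_built: bool) -> str:
    """Pick a regex engine by explicit request only.

    Without the request the answer never depends on how the program was
    built, so default behaviour is identical everywhere.  With the
    request and no support compiled in, fail loudly rather than fall
    back silently to a different syntax.
    """
    if not want_pcre:
        return "posix"
    if not pcre_built:
        raise EngineUnavailable("this build has no pcre support")
    return "pcre"


# The default is the same on both kinds of build.
print(select_engine(want_pcre=False, pcre_built=False))
print(select_engine(want_pcre=False, pcre_built=True))
```

The point of the sketch is that the only build-dependent outcome is a
hard error, never a silently different match result.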


At any rate, you didn't address my original point, which is that _all_ of
those options have drawbacks. I think the drawbacks of re-writing or
re-packaging a regular expression library outweigh those of adding the
dependency (or even having slightly irregular behavior).

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: git log filtering
  2007-02-09 13:22                         ` Jeff King
@ 2007-02-09 15:02                           ` Johannes Schindelin
  0 siblings, 0 replies; 34+ messages in thread
From: Johannes Schindelin @ 2007-02-09 15:02 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi,

On Fri, 9 Feb 2007, Jeff King wrote:

> On Fri, Feb 09, 2007 at 02:13:18PM +0100, Johannes Schindelin wrote:
> 
> > The difference, of course, is that with the "other things", we either have 
> > no alternative (if you do not have curl, you cannot use HTTP transport), 
> > or we have workalikes (if you don't use openssl, the (possibly slower) 
> > SHA1 replacements take effect).
> 
> I'm not a pcre expert, but I thought most of the additions to posix
> extended regular expressions were expressed through constructs that
> would otherwise be invalid patterns.

So, once pcre is used, you can use these constructs. Even in scripts. 
Which just so happen to break on platforms where git is not compiled with 
pcre support.

Or do you suggest checking (in git!) if the pattern is a pcre special or 
not? That would be insane.

> > We _used_ to rely on external "diff" and "merge", but have them as 
> > inbuilt components, exactly to avoid "if you have a slightly differing 
> > setup, git behaves differently".
> 
> But you're OK with "if you didn't build against curl, http transport 
> just doesn't work."

Yes, I am. Since HTTP is itself only a second-class citizen.

> So what if there is a '--pcre' option and a corresponding config option? 
> Thus you get the same results always, unless you use --pcre and it's not 
> built, in which case git dies. That seems to be the moral equivalent of 
> the curl situation.

I might be wrong, but most of git does not depend on HTTP.

> At any rate, you didn't address my original point, which is that _all_ of 
> those options have drawbacks. I think the drawbacks of re-writing or 
> re-packaging a regular expression library outweigh those of adding the 
> dependency (or even having slightly irregular behavior).

This is only because you do not really have problems with dependencies. 
You just install, or compile, the dependent thing, which happens to be no 
hassle, since you use Linux. And you can compile & install things.

Once everybody runs Linux, and is allowed to compile & install things, I 
will no longer complain about trillions of dependencies.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Fix "git log -z" behaviour
  2007-02-08 22:34       ` Junio C Hamano
@ 2007-02-10  7:32         ` Junio C Hamano
  2007-02-10  9:36           ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2007-02-10  7:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Don Zickus, Git Mailing List

Junio C Hamano <junkio@cox.net> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> For the normal case where the termination character is '\n', this 
>> obviously doesn't change anything at all, since we just switched two 
>> identical characters around. So it's very safe - it doesn't change any 
>> normal usage, but it definitely fixes "git log -z".
>
> Gaah.
>
> I have already applied this but I think this has fallout for
> existing users of "-z --raw".  Nothing in-tree uses "git log" as
> the upstream of a pipe as far as I know because in-tree stuff
> tends to stick to plumbing when it comes to scripting, but I
> think your patch would affect the plumbing level as well.

I think the new semantics for -z ("inter-record termination is
NUL") makes a lot more sense for "-p -z" format that shows
commit log message and the patch text.  It makes filtering the
output with "grep -z" feel much more natural.

The new semantics is however quite inconsistent with the other
formats: --raw, --name-only and --name-status.  These already
use NUL for separating pathnames and fields when -z is given, in
order to allow scripts to deal sensibly with pathnames that contain
funny characters (e.g. LF and HT).  Nobody is likely to feed
their output to "grep -z", but one problematic case I see is to
use this:

	git log -z --raw -r --pretty=raw $commit

or its equivalent:

	git rev-list $commit |
        git diff-tree --stdin --raw -r --pretty=raw

to prepare data to feed something like fast-import.

But such newly written scripts can read from non -z and unwrap
paths themselves just as easily (the pathname safety with NUL
was invented before we started using c-quote consistently), so
it might be Ok to leave them (slightly) broken.

So, I give up.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Fix "git log -z" behaviour
  2007-02-10  7:32         ` Junio C Hamano
@ 2007-02-10  9:36           ` Junio C Hamano
  2007-02-10 17:09             ` Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2007-02-10  9:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Don Zickus, Git Mailing List

Junio C Hamano <junkio@cox.net> writes:

> Junio C Hamano <junkio@cox.net> writes:
>
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>>
>>> For the normal case where the termination character is '\n', this 
>>> obviously doesn't change anything at all, since we just switched two 
>>> identical characters around. So it's very safe - it doesn't change any 
>>> normal usage, but it definitely fixes "git log -z".
>>
>> Gaah.
>>
>> I have already applied this but I think this has fallout for
>> existing users of "-z --raw".  Nothing in-tree uses "git log" as
>> the upstream of a pipe as far as I know because in-tree stuff
>> tends to stick to plumbing when it comes to scripting, but I
>> think your patch would affect the plumbing level as well.
>
> I think the new semantics for -z ("inter-record termination is
> NUL") makes a lot more sense for "-p -z" format that shows
> commit log message and the patch text.  It makes filtering the
> output with "grep -z" feel much more natural.
>
> The new semantics is however quite inconsistent with the other
> formats: --raw, --name-only and --name-status.  These already
> use NUL for separating pathnames and fields when -z is given, in
> order to allow scripts to deal sensibly with pathnames that contain
> funny characters (e.g. LF and HT).  Nobody is likely to feed
> their output to "grep -z", but one problematic case I see is to
> use this:
>
> 	git log -z --raw -r --pretty=raw $commit
>
> or its equivalent:
>
> 	git rev-list $commit |
>         git diff-tree --stdin --raw -r --pretty=raw
>
> to prepare data to feed something like fast-import.
>
> But such newly written scripts can read from non -z and unwrap
> paths themselves just as easily (the pathname safety with NUL
> was invented before we started using c-quote consistently), so
> it might be Ok to leave them (slightly) broken.
>
> So, I give up.

... well, it just occurred to me that it might make sense not to
let this new "use NUL as inter-commit separator for grep -z"
semantics hijack the existing -z option, but to introduce another
option, say, -Z.  Then you could even do something like:

	git log -Z -r --numstat |
        grep -z -e '^[1-9][0-9][0-9][0-9]*	'

to find commits that have 100 or more lines of additions to a
file.  (Or use --stat and grep for '| *[1-9][0-9][0-9][0-9]* ' to
look for the sum of additions and deletions.)
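
For readers without "grep -z" at hand, the same filter can be sketched
in Python.  This assumes the stream holds one commit per NUL-terminated
record, as the hypothetical -Z would produce; the sample bytes and the
function name are invented for illustration.

```python
import re

# Same pattern as the grep above: an added-line count of three or more
# digits at the start of a line, followed by a tab.
BIG_ADDITION = re.compile(rb"^[1-9][0-9][0-9][0-9]*\t", re.MULTILINE)


def commits_with_big_additions(stream: bytes):
    """Return the NUL-separated records that contain a matching line."""
    return [
        record
        for record in stream.split(b"\0")
        if record and BIG_ADDITION.search(record)
    ]


# Hypothetical sample: two commits with --numstat style lines
# (added <TAB> deleted <TAB> path).
sample = (
    b"commit aaaa\n\n    small change\n\n3\t1\tMakefile\n\0"
    b"commit bbbb\n\n    big change\n\n250\t4\tgrep.c\n\0"
)
for record in commits_with_big_additions(sample):
    print(record.split(b"\n", 1)[0].decode())
```

Because each commit is a single record, a match anywhere in the record
selects the whole commit, log message included.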

Hmmmm.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Fix "git log -z" behaviour
  2007-02-10  9:36           ` Junio C Hamano
@ 2007-02-10 17:09             ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-02-10 17:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Don Zickus, Git Mailing List



On Sat, 10 Feb 2007, Junio C Hamano wrote:
> 
> ... well, it just occurred to me that it might make sense not to
> let this new "use NUL as inter-commit separator for grep -z"
> semantics hijack existing -z option, but introduce another
> option, say, -Z.

I don't think I disagree, but I do suspect it's not worth it.

Yes, we really do have two "line_termination" characters: the one between 
commits, and the one we use within raw diffs. However, I don't think the 
*combination* ever makes sense any more (*), so using the same flag 
doesn't seem to really be a problem.

And the -z "line_termination" already got hijacked a long time ago for 
inter-commit messages too, so while adding a "-Z" would perhaps avoid a 
certain ambiguity, it would actually potentially break stuff that just did

	git-rev-list -z --pretty .. | ...

which is actually _more_ likely than the "multiple commit messages _and_ 
raw output _and_ '-z'" combination.

So I would suggest leaving it as-is, especially since I don't think 
anybody has actually even noticed (ie nobody probably used that 
combination), and the new semantics in many ways are both more useful and 
more logical.

		Linus

(*) It may well have made sense a year and a half ago, but I don't think it 
makes much sense any more.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* pcre performance, was Re: git log filtering
  2007-02-08  6:16             ` Jeff King
  2007-02-08 18:06               ` Johannes Schindelin
@ 2007-03-07 17:37               ` Johannes Schindelin
  2007-03-07 18:03                 ` Paolo Bonzini
  1 sibling, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2007-03-07 17:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, git

Hi,

On Thu, 8 Feb 2007, Jeff King wrote:

> In every case there, pcre has either comparable performance, or simply 
> blows away glibc.

So I tested this against external grep. For completeness' sake, I tested 
these against each other: GNU regex-0.12, Git _without_ external grep 
(relies on glibc's regex), Git _with_ external grep ("original"), pcre, 
and for good measure, pcre with NO_MMAP=1 (to test if disk access is the 
problem).

Here are the numbers:

grep-gnu-regex:

21.41user 1.08system 0:22.52elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps
21.40user 1.06system 0:22.47elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7209minor)pagefaults 0swaps
21.61user 1.06system 0:22.68elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7209minor)pagefaults 0swaps
21.30user 1.10system 0:22.48elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps
21.30user 1.08system 0:22.43elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7209minor)pagefaults 0swaps

grep-no-external-grep:

6.98user 1.17system 0:08.16elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7120minor)pagefaults 0swaps
7.07user 1.16system 0:08.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7121minor)pagefaults 0swaps
6.98user 1.12system 0:08.11elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7121minor)pagefaults 0swaps
7.00user 1.18system 0:08.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7121minor)pagefaults 0swaps

grep-original:

0.82user 1.15system 0:01.97elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7090minor)pagefaults 0swaps
0.94user 1.03system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7099minor)pagefaults 0swaps
0.89user 1.07system 0:01.96elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7092minor)pagefaults 0swaps
0.81user 1.15system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7092minor)pagefaults 0swaps

grep-pcre:

4.04user 1.18system 0:05.24elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7205minor)pagefaults 0swaps
4.16user 1.08system 0:05.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7206minor)pagefaults 0swaps
4.24user 0.98system 0:05.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7206minor)pagefaults 0swaps
4.08user 1.14system 0:05.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7206minor)pagefaults 0swaps

grep-pcre-no-mmap:

4.15user 1.07system 0:05.22elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps
4.01user 1.14system 0:05.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7209minor)pagefaults 0swaps
3.94user 1.18system 0:05.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps
4.11user 1.06system 0:05.18elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps

BTW this was "git grep Lin.*valds" on linux-2.6, just updated.

The first test was run 5 times instead of 4 to make sure the cache is hot. 
This is on a dual 1.2GHz 2GB machine.

I cannot really say anything about the pagefaults, so I'll leave that to 
the wizards.

Result: external grep wins hands-down. GNU regex loses hands-down. pcre 
seems to be better than glibc's regex engine, and gains ever so slightly 
when using NO_MMAP.
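
For anyone who wants the averages rather than eyeballing the runs, here
is a small Python sketch that pulls the elapsed times out of the "time"
output and averages them.  The elapsed values are copied from the
numbers above; the helper name and the trimmed-down input strings are
just for illustration.

```python
import re
from statistics import mean

# Matches the "M:SS.ss" field that precedes "elapsed" in the time output.
ELAPSED = re.compile(r"(\d+):(\d+\.\d+)elapsed")


def mean_elapsed(time_output: str) -> float:
    """Average wall-clock seconds over every run found in the text."""
    seconds = [
        int(minutes) * 60 + float(secs)
        for minutes, secs in ELAPSED.findall(time_output)
    ]
    return mean(seconds)


# Only the elapsed fields are reproduced here, one string per test.
runs = {
    "grep-gnu-regex": "0:22.52elapsed 0:22.47elapsed 0:22.68elapsed "
                      "0:22.48elapsed 0:22.43elapsed",
    "grep-no-external-grep": "0:08.16elapsed 0:08.27elapsed "
                             "0:08.11elapsed 0:08.20elapsed",
    "grep-original": "0:01.97elapsed 0:01.97elapsed "
                     "0:01.96elapsed 0:01.97elapsed",
    "grep-pcre": "0:05.24elapsed 0:05.25elapsed "
                 "0:05.23elapsed 0:05.23elapsed",
    "grep-pcre-no-mmap": "0:05.22elapsed 0:05.17elapsed "
                         "0:05.14elapsed 0:05.18elapsed",
}

averages = {name: mean_elapsed(text) for name, text in runs.items()}
for name, avg in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{name}: {avg:.2f}s")
```

The same function works on the full, unedited output of "time", since
it only looks for the elapsed field.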

I ran the same test on a 1GHz 256MB machine which is overloaded, and in 
that case, GNU regex is still worst (~55 sec), while glibc and pcre are 
roughly equal (glibc slightly slower with ~35 sec, pcre ~34 sec), and external 
grep wins (~29 sec). Of course, this is io-bound, but it shows that pcre 
uses more memory than glibc.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: pcre performance, was Re: git log filtering
  2007-03-07 17:37               ` pcre performance, was " Johannes Schindelin
@ 2007-03-07 18:03                 ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2007-03-07 18:03 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, Linus Torvalds, git


> Result: external grep wins hands-down. GNU regex loses hands-down. pcre 
> seems to be better than glibc's regex engine, and gains ever so slightly 
> when using NO_MMAP.

Indeed GNU regex 0.12 loses, and that's why it was rewritten for (IIRC)
glibc 2.3.  Older glibc's use code derived from GNU regex 0.12; but the
old GNU regex code is dead in general (maybe it survives in Emacs -- but
I don't remember), and the glibc regex code can be used by external
programs via gnulib.

glibc is slower than PCRE mostly because it is internationalized.  So
for example it supports things like stra[.ss.]e matching both strasse
and straße in a German locale, or [[=a=]] matching aàáäâ and possibly
more variations.  In theory.  In practice I couldn't make it work
while writing this message...

External grep wins hands-down because it's a DFA engine.  If the regex
uses backreferences (or the above esoteric constructs), however, external
grep will not be able to give a definite answer using the fast engine,
and will fall back to glibc regex.
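
A tiny illustration of why backreferences are the problem, using
Python's backtracking "re" module as a stand-in (it is neither grep's
DFA nor glibc regex): the pattern has to remember exactly what the
group captured, and that text can be arbitrarily long, which a fixed
set of DFA states cannot track.

```python
import re

# (a+)b\1 : some run of "a", a "b", then the *same* run of "a" again.
pattern = re.compile(r"(a+)b\1")

same_length = bool(pattern.fullmatch("aabaa"))    # both runs are "aa"
diff_length = bool(pattern.fullmatch("aabaaa"))   # "aa" versus "aaa"

print(same_length, diff_length)
```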

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2007-03-07 18:06 UTC | newest]

Thread overview: 34+ messages
2007-02-07 16:41 git log filtering Don Zickus
2007-02-07 16:55 ` Jakub Narebski
2007-02-07 17:01 ` Uwe Kleine-König
2007-02-07 17:12   ` Johannes Schindelin
2007-02-07 17:12 ` Linus Torvalds
2007-02-07 17:25   ` Johannes Schindelin
     [not found]     ` <7v64ad7l12.fsf@assigned-by-dhcp.cox.net>
2007-02-07 21:03       ` Linus Torvalds
2007-02-07 21:09         ` Junio C Hamano
2007-02-07 21:53           ` Linus Torvalds
2007-02-08  6:16             ` Jeff King
2007-02-08 18:06               ` Johannes Schindelin
2007-02-08 22:33                 ` Jeff King
2007-02-09  0:18                   ` Johannes Schindelin
2007-02-09  0:23                     ` Shawn O. Pearce
2007-02-09  0:45                       ` Johannes Schindelin
2007-02-09 10:15                       ` Sergey Vlasov
2007-02-09  1:59                     ` Jeff King
2007-02-09 13:13                       ` Johannes Schindelin
2007-02-09 13:22                         ` Jeff King
2007-02-09 15:02                           ` Johannes Schindelin
2007-03-07 17:37               ` pcre performance, was " Johannes Schindelin
2007-03-07 18:03                 ` Paolo Bonzini
2007-02-08  1:59         ` Horst H. von Brand
2007-02-07 18:16   ` Linus Torvalds
2007-02-07 19:49     ` Fix "git log -z" behaviour Linus Torvalds
2007-02-07 19:55       ` Junio C Hamano
2007-02-07 22:53       ` Don Zickus
2007-02-07 23:05         ` Linus Torvalds
2007-02-08 22:34       ` Junio C Hamano
2007-02-10  7:32         ` Junio C Hamano
2007-02-10  9:36           ` Junio C Hamano
2007-02-10 17:09             ` Linus Torvalds
2007-02-07 18:19   ` git log filtering Don Zickus
2007-02-07 18:27     ` Linus Torvalds
