* git log filtering
From: Don Zickus @ 2007-02-07 16:41 UTC
To: git

I was curious to know what is the easiest way to filter info inside a
commit message.

For example say I wanted to find out what patches Joe User has
submitted to the git project.

I know I can do something like ' git log |grep -B2 "^Author: Joe User"
' and it will output the matches and the commit id. However, if I
wanted to filter on something like "Signed-off-by: Joe User", then it
is a little harder to dig for the commit id.

Is there a better way of doing this? Or should I accept the fact that
git wasn't designed to filter info like this very quickly?

I guess what I was looking to do was embed some metadata inside the
commit message and parse through it at a later time (ie like a
bugzilla number or something).

Any thoughts/tips/tricks would be helpful.

Cheers,
Don
* Re: git log filtering
From: Jakub Narebski @ 2007-02-07 16:55 UTC
To: Don Zickus; +Cc: git

[Cc: git@vger.kernel.org]

Don Zickus wrote:

> I was curious to know what is the easiest way to filter info inside a
> commit message.
>
> For example say I wanted to find out what patches Joe User has
> submitted to the git project.
>
> I know I can do something like ' git log |grep -B2 "^Author: Joe User"
> ' and it will output the matches and the commit id. However, if I
> wanted to filter on something like "Signed-off-by: Joe User", then it
> is a little harder to dig for the commit id.
>
> Is there a better way of doing this? Or should I accept the fact that
> git wasn't designed to filter info like this very quickly?

You can use "git log --grep=<pattern>" for that, instead. This greps
the raw commit message. You can use --author and --committer to grep
those headers.

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
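To make the distinction in Jakub's reply concrete (--grep looks at the message text, --author and --committer look at a single header line), here is a minimal C sketch against the POSIX <regex.h> API. It is not git's implementation: the helper name commit_matches() and the sample commit text are made up for illustration, and it only assumes the raw commit layout of header lines, one blank line, then the message.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if PATTERN (POSIX basic regex) matches TEXT, 0 otherwise. */
static int regex_matches(const char *text, const char *pattern)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_NEWLINE | REG_NOSUB) != 0)
        return 0; /* this sketch treats an invalid pattern as "no match" */
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

/*
 * Hypothetical helper.  HEADER == NULL means "grep the message"
 * (everything after the first blank line, like --grep); otherwise only
 * the value of the named header line is searched (like --author).
 */
int commit_matches(const char *raw, const char *header, const char *pattern)
{
    const char *body = strstr(raw, "\n\n");   /* end of the header block */
    const char *line;
    size_t hlen;

    assert(raw != NULL && pattern != NULL);
    if (header == NULL)
        return body != NULL && regex_matches(body + 2, pattern);

    hlen = strlen(header);
    for (line = raw; line != NULL && (body == NULL || line <= body); ) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len > hlen && strncmp(line, header, hlen) == 0 && line[hlen] == ' ') {
            /* Copy the header value so regexec() cannot run past this line. */
            char *value = malloc(len - hlen);
            int ok;

            if (value == NULL)
                return 0;
            memcpy(value, line + hlen + 1, len - hlen - 1);
            value[len - hlen - 1] = '\0';
            ok = regex_matches(value, pattern);
            free(value);
            return ok;
        }
        line = eol ? eol + 1 : NULL;
    }
    return 0;
}
```

With a commit authored by Joe but committed by someone else, commit_matches(raw, "author", "Joe User") is true while the "committer" variant is false, which is the same split the two command-line options give you.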
* Re: git log filtering
From: Uwe Kleine-König @ 2007-02-07 17:01 UTC
To: Don Zickus; +Cc: git

Don Zickus wrote:
> I was curious to know what is the easiest way to filter info inside a
> commit message.
>
> For example say I wanted to find out what patches Joe User has
> submitted to the git project.
> I know I can do something like ' git log |grep -B2 "^Author: Joe User"
What about

	git log --author="Joe User"

> ' and it will output the matches and the commit id. However, if I
> wanted to filter on something like "Signed-off-by: Joe User", then it
> is a little harder to dig for the commit id.
>
> Is there a better way of doing this? Or should I accept the fact that
> git wasn't designed to filter info like this very quickly?
>
> I guess what I was looking to do was embed some metadata inside the
> commit message and parse through it at a later time (ie like a
> bugzilla number or something).
>
> Any thoughts/tips/tricks would be helpful.
Maybe:

	git log | awk -v sob="Joe User" '$1 == "commit" {commit = $2} /Signed-off-by:/ {if (match($0, sob)) print commit}'

Best regards
Uwe

--
Uwe Kleine-König

http://www.google.com/search?q=2004+in+roman+numerals
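For comparison with Uwe's awk one-liner, here is roughly the same scan written in C over the default `git log` text. It is a sketch with a hypothetical helper, signed_off_commits(), and it differs from the awk version in three deliberate ways: the name is matched as a plain substring (strstr) rather than as a regex, only an unindented "commit " line starts a new commit (the default format indents message lines by four spaces), and each commit id is recorded at most once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ID 64   /* room for a 40-hex SHA-1 id plus slack */

/*
 * Collect the id of every commit that has a line containing both
 * "Signed-off-by:" and NAME.  Returns the number of ids stored in IDS.
 * Lines longer than 255 bytes are truncated before searching.
 */
size_t signed_off_commits(const char *log_text, const char *name,
                          char ids[][MAX_ID], size_t max_ids)
{
    char current[MAX_ID] = "";
    int recorded = 0;            /* already stored the current commit? */
    size_t n = 0;
    const char *line = log_text;

    assert(log_text != NULL && name != NULL);
    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len > 7 && strncmp(line, "commit ", 7) == 0) {
            /* New commit header: remember the id (up to the next space). */
            size_t idlen = 0;
            while (7 + idlen < len && line[7 + idlen] != ' ' && idlen < MAX_ID - 1)
                idlen++;
            memcpy(current, line + 7, idlen);
            current[idlen] = '\0';
            recorded = 0;
        } else if (!recorded && n < max_ids) {
            /* Search within this line only. */
            char buf[256];
            size_t blen = len < sizeof buf - 1 ? len : sizeof buf - 1;

            memcpy(buf, line, blen);
            buf[blen] = '\0';
            if (strstr(buf, "Signed-off-by:") != NULL && strstr(buf, name) != NULL) {
                memcpy(ids[n], current, sizeof current);
                n++;
                recorded = 1;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return n;
}
```

In practice the --grep option discussed in this thread does the same job without any post-processing; the sketch is only meant to show what the awk script is doing.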
* Re: git log filtering
From: Johannes Schindelin @ 2007-02-07 17:12 UTC
To: Uwe Kleine-König; +Cc: Don Zickus, git

Hi,

On Wed, 7 Feb 2007, Uwe Kleine-König wrote:

> Don Zickus wrote:
> > I was curious to know what is the easiest way to filter info inside a
> > commit message.
> >
> > For example say I wanted to find out what patches Joe User has
> > submitted to the git project.
> > I know I can do something like ' git log |grep -B2 "^Author: Joe User"
> What about
>
> 	git log --author="Joe User"
>
> > ' and it will output the matches and the commit id. However, if I
> > wanted to filter on something like "Signed-off-by: Joe User", then it
> > is a little harder to dig for the commit id.
> >
> > Is there a better way of doing this? Or should I accept the fact that
> > git wasn't designed to filter info like this very quickly?
> >
> > I guess what I was looking to do was embed some metadata inside the
> > commit message and parse through it at a later time (ie like a
> > bugzilla number or something).
> >
> > Any thoughts/tips/tricks would be helpful.
>
> Maybe:
>
> 	git log | awk -v sob="Joe User" '$1 == "commit" {commit = $2} /Signed-off-by:/ {if (match($0, sob)) print commit}'

*grin* Why do you know --author, but not --grep?

	git log --grep=Signed-off-by:\ Joe\ User

Ciao,
Dscho
* Re: git log filtering
From: Linus Torvalds @ 2007-02-07 17:12 UTC
To: Don Zickus; +Cc: git

On Wed, 7 Feb 2007, Don Zickus wrote:
>
> I was curious to know what is the easiest way to filter info inside a
> commit message.
>
> For example say I wanted to find out what patches Joe User has
> submitted to the git project.
> I know I can do something like ' git log |grep -B2 "^Author: Joe User"
> ' and it will output the matches and the commit id. However, if I
> wanted to filter on something like "Signed-off-by: Joe User", then it
> is a little harder to dig for the commit id.

There are two ways:

 - "git log" can itself do a lot of filtering. Both on date, on
   revisions, on "modifies files/directories X, Y and Z" _and_ on
   strings.

   See "man git-rev-list" for more (it doesn't apply to just "git log",
   it applies to just about any revision listing, including gitk etc)

   For example,

	git log [--author=pattern] [--committer=pattern] [--grep=pattern]

   will likely do exactly what you want. You can do

	git log --grep="Signed-off-by:.*akpm"

   on the kernel archive to see which ones were signed off by Andrew.

   So the above works, and catches *most* uses. But it has problems if
   you want to do something fancier (and I think that includes
   something as simple as doing a case-insensitive grep). So the other
   approach is:

 - The hacky way: use "git log --pretty -z", and GNU grep -z:

	git log --pretty -z | grep -i -z Signed-off-by:.*junkio | tr '\0' '\n'

   which allows you to do anything you want with grep (or other unix
   tools that take zero-terminated output).

> Is there a better way of doing this? Or should I accept the fact that
> git wasn't designed to filter info like this very quickly?

Git definitely was designed to do it.
The "-z" option in particular is very much designed for any generic UNIX
scripting, but the *easy* cases git does internally.

		Linus
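The -z pipeline works because every commit becomes one NUL-separated record, so grep sees a whole commit at a time instead of single lines. As a rough C sketch of what `grep -i -z PATTERN` does with such a stream (not how GNU grep is actually implemented), assuming the complete output has already been read into a buffer with an extra NUL after the last byte; count_matching_records() is a made-up name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk SIZE bytes of NUL-separated records and count the records that
 * match PATTERN case-insensitively.  Requires buf[size] == '\0' so the
 * last record is terminated too.  Returns -1 if the pattern is invalid.
 */
int count_matching_records(const char *buf, size_t size, const char *pattern)
{
    regex_t re;
    size_t pos = 0;
    int count = 0;

    assert(buf != NULL && pattern != NULL && buf[size] == '\0');
    if (regcomp(&re, pattern, REG_ICASE | REG_NOSUB) != 0)
        return -1;
    while (pos < size) {
        const char *record = buf + pos;      /* runs up to the next NUL */
        if (regexec(&re, record, 0, NULL, 0) == 0)
            count++;
        pos += strlen(record) + 1;           /* step over the separator */
    }
    regfree(&re);
    return count;
}
```

Printing the matching records instead of counting them, with each NUL turned back into a newline, would correspond to the `tr '\0' '\n'` stage.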
* Re: git log filtering
From: Johannes Schindelin @ 2007-02-07 17:25 UTC
To: Linus Torvalds; +Cc: Don Zickus, git

Hi,

On Wed, 7 Feb 2007, Linus Torvalds wrote:

> You can do
>
> 	git log --grep="Signed-off-by:.*akpm"
>
> on the kernel archive to see which ones were signed off by Andrew.
>
> So the above works, and catches *most* uses. But it has problems if you
> want to do something fancier (and I think that includes something as
> simple as doing a case-insensitive grep).

[TIC PATCH] revision.c: accept "-i" to make --grep case insensitive

When calling

	git log --grep=blabla -i --grep=blublu

the expression "blabla" is grepped case _sensitively_, but "blublu"
case _insensitively_.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
 revision.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/revision.c b/revision.c
index 42ba310..843aa8e 100644
--- a/revision.c
+++ b/revision.c
@@ -9,6 +9,8 @@
 #include "grep.h"
 #include "reflog-walk.h"
 
+static int case_insensitive_grep = 0;
+
 static char *path_name(struct name_path *path, const char *name)
 {
 	struct name_path *p;
@@ -742,6 +744,8 @@ static void add_grep(struct rev_info *revs, const char *ptn, enum grep_pat_token
 		opt->status_only = 1;
 		opt->pattern_tail = &(opt->pattern_list);
 		opt->regflags = REG_NEWLINE;
+		if (case_insensitive_grep)
+			opt->regflags |= REG_ICASE;
 		revs->grep_filter = opt;
 	}
 	append_grep_pattern(revs->grep_filter, ptn,
@@ -1042,6 +1046,11 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 				add_header_grep(revs, "committer", arg+12);
 				continue;
 			}
+			if (!strcmp(arg, "-i") ||
+			    !strcmp(arg, "--case-insensitive")) {
+				case_insensitive_grep = 1;
+				continue;
+			}
 			if (!strncmp(arg, "--grep=", 7)) {
 				add_message_grep(revs, arg+7);
 				continue;
* Re: git log filtering
From: Linus Torvalds @ 2007-02-07 21:03 UTC
To: Junio C Hamano; +Cc: Johannes Schindelin, Don Zickus, git

On Wed, 7 Feb 2007, Junio C Hamano wrote:
>
> This is very tempting but, ... hmmmm...

I would actually prefer to have it be some marker on the expression
itself.

We already do that '^' handling by hand for "author"/"committer" things.
We could do other things like that.

Although I guess the downside of not doing standard regexps would be too
big.

		Linus
* Re: git log filtering
From: Junio C Hamano @ 2007-02-07 21:09 UTC
To: Linus Torvalds; +Cc: Johannes Schindelin, Don Zickus, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 7 Feb 2007, Junio C Hamano wrote:
>>
>> This is very tempting but, ... hmmmm...
>
> I would actually prefer to have it be some marker on the expression
> itself.
>
> We already do that '^' handling by hand for "author"/"committer" things.
> We could do other things like that.
>
> Although I guess the downside of not doing standard regexps would be too
> big.
>
> 		Linus

We could go pcre and let you say "(?i)". That would all be post
1.5.0, though.
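Linus's "marker on the expression itself" can be approximated without switching libraries. The following is only a sketch of that idea: a hypothetical compile_with_marker() helper strips a leading "(?i)" by hand and turns it into REG_ICASE before handing the rest of the pattern to POSIX regcomp(). No other PCRE syntax is understood.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Compile PATTERN into RE.  A leading "(?i)" is consumed here and mapped
 * to REG_ICASE; everything after it must be a plain POSIX basic regex.
 * Returns regcomp()'s result (0 on success).
 */
int compile_with_marker(regex_t *re, const char *pattern)
{
    int flags = REG_NEWLINE | REG_NOSUB;

    assert(re != NULL && pattern != NULL);
    if (strncmp(pattern, "(?i)", 4) == 0) {
        flags |= REG_ICASE;
        pattern += 4;            /* the marker is not part of the regex */
    }
    return regcomp(re, pattern, flags);
}
```

The catch, raised later in the thread, is that such markers only behave consistently if every build of the tool recognizes them.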
* Re: git log filtering
From: Linus Torvalds @ 2007-02-07 21:53 UTC
To: Junio C Hamano; +Cc: Johannes Schindelin, Don Zickus, git

On Wed, 7 Feb 2007, Junio C Hamano wrote:
>
> We could go pcre and let you say "(?i)". That would all be post
> 1.5.0, though.

Hmm. PCRE is probably wide-spread enough that it could be an option.

What's PCRE performance like? I'd hate to make "git grep" slower, and it
would be stupid and confusing to use two different regex libraries..

Maybe somebody could test - afaik, PCRE has a regex-compatible (from a API
standpoint, not from a regex standpoint!) wrapper thing, and it might be
interesting to hear if doing "git grep" is slower or faster..

(I realize that the performance thing depends heavily on the patterns and
the working set they are used on, but I guess _I_ personally only care
about fairly simple patterns on the kernel ;)

		Linus
* Re: git log filtering
From: Jeff King @ 2007-02-08 6:16 UTC
To: Linus Torvalds; +Cc: git

On Wed, Feb 07, 2007 at 01:53:18PM -0800, Linus Torvalds wrote:

> What's PCRE performance like? I'd hate to make "git grep" slower, and it
> would be stupid and confusing to use two different regex libraries..
>
> Maybe somebody could test - afaik, PCRE has a regex-compatible (from a API
> standpoint, not from a regex standpoint!) wrapper thing, and it might be
> interesting to hear if doing "git grep" is slower or faster..

The patch is delightfully simple (though a real patch would probably be
conditional):

diff --git a/Makefile b/Makefile
index aca96c8..cf391dc 100644
--- a/Makefile
+++ b/Makefile
@@ -323,7 +323,7 @@ BUILTIN_OBJS = \
 	builtin-pack-refs.o
 
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
-EXTLIBS = -lz
+EXTLIBS = -lz -lpcreposix -lpcre
 
 #
 # Platform specific tweaks
diff --git a/git-compat-util.h b/git-compat-util.h
index c1bcb00..a6c77f9 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -40,7 +40,7 @@
 #include <sys/poll.h>
 #include <sys/socket.h>
 #include <assert.h>
-#include <regex.h>
+#include <pcreposix.h>
 #include <netinet/in.h>
 #include <netinet/tcp.h>
 #include <arpa/inet.h>

A few numbers, all from a fully packed kernel repository:

# glibc, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
10.07user 0.15system 0:10.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36617minor)pagefaults 0swaps

# glibc, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]' >/dev/null
24.42user 0.15system 0:24.60elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36210minor)pagefaults 0swaps

# pcre, trivial regex
$ /usr/bin/time git grep --cached foo >/dev/null
7.82user 0.12system 0:08.00elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36571minor)pagefaults 0swaps

# pcre, complex regex
$ /usr/bin/time git grep --cached '[a-z][0-9][a-z][0-9][a-z]' >/dev/null
36.51user 0.13system 0:36.65elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36583minor)pagefaults 0swaps

So the winner seems to vary based on the complexity of the pattern.

There are some less rudimentary but non-git performance tests here:

  http://www.boost.org/libs/regex/doc/gcc-performance.html

In every case there, pcre has either comparable performance, or simply
blows away glibc.

One final note that caused some confusion during my testing: git-grep
still uses external grep for working tree greps (i.e., 'git grep foo').
This meant that 'git grep' and 'git grep --cached' produced wildly
different results once I was using pcre internally. Something to look
out for if we switch to pcre (or any other library which doesn't exactly
match external grep behavior!).

-Peff
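Whole `git grep --cached` runs also pay for reading objects out of the pack, so the timings above mix that cost with the regex cost. To compare engines on the matching alone, one could time regexec() over a fixed in-memory buffer. A minimal C sketch, assuming the text is already loaded and using clock() for CPU time; count_matching_lines() is a hypothetical helper and says nothing about how the figures in this thread were produced.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>
#include <time.h>

/*
 * Count the lines of TEXT that contain a match for PATTERN and, if
 * CPU_SECONDS is non-NULL, store the CPU time spent matching.
 * Returns -1 if the pattern does not compile.
 */
long count_matching_lines(const char *text, const char *pattern,
                          double *cpu_seconds)
{
    regex_t re;
    regmatch_t m;
    const char *p = text;
    long lines = 0;
    clock_t start;

    assert(text != NULL && pattern != NULL);
    if (regcomp(&re, pattern, REG_NEWLINE) != 0)
        return -1;

    start = clock();
    /* Find the next match, count its line once, then resume the search
     * at the start of the following line. */
    while (*p != '\0' && regexec(&re, p, 1, &m, 0) == 0) {
        const char *eol = strchr(p + m.rm_eo, '\n');
        lines++;
        if (eol == NULL)
            break;
        p = eol + 1;
    }
    if (cpu_seconds != NULL)
        *cpu_seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    regfree(&re);
    return lines;
}
```

Building the same file once against <regex.h> and once against another POSIX-compatible wrapper would then time only the engine, on identical input.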
* Re: git log filtering
From: Johannes Schindelin @ 2007-02-08 18:06 UTC
To: Jeff King; +Cc: Linus Torvalds, git

Hi,

On Thu, 8 Feb 2007, Jeff King wrote:

> On Wed, Feb 07, 2007 at 01:53:18PM -0800, Linus Torvalds wrote:
>
> > What's PCRE performance like? I'd hate to make "git grep" slower, and it
> > would be stupid and confusing to use two different regex libraries..
> >
> > Maybe somebody could test - afaik, PCRE has a regex-compatible (from a API
> > standpoint, not from a regex standpoint!) wrapper thing, and it might be
> > interesting to hear if doing "git grep" is slower or faster..
>
> The patch is delightfully simple (though a real patch would probably be
> conditional):
>
> [...]

May I register a complaint? This is yet _another_ dependency.

Ciao,
Dscho
* Re: git log filtering
From: Jeff King @ 2007-02-08 22:33 UTC
To: Johannes Schindelin; +Cc: git

On Thu, Feb 08, 2007 at 07:06:25PM +0100, Johannes Schindelin wrote:

> May I register a complaint? This is yet _another_ dependency.

Unlike other dependencies, I think it's quite natural to make it a
conditional dependency. If you have pcre, you get more featureful
regular expressions. If you don't, you get posix regular expressions.
Do you object to a few extra lines in the Makefile?

-Peff
* Re: git log filtering
From: Johannes Schindelin @ 2007-02-09 0:18 UTC
To: Jeff King; +Cc: git

Hi,

On Thu, 8 Feb 2007, Jeff King wrote:

> On Thu, Feb 08, 2007 at 07:06:25PM +0100, Johannes Schindelin wrote:
>
> > May I register a complaint? This is yet _another_ dependency.
>
> Unlike other dependencies, I think it's quite natural to make it a
> conditional dependency. If you have pcre, you get more featureful
> regular expressions. If you don't, you get posix regular expressions.
> Do you object to a few extra lines in the Makefile?

Yes, I do. Not because of the extra lines, but because of the inconsistent
interface.

We included libxdiff _exactly_ to ensure consistency between different git
installations (remember, diff behaves quite differently on different
platforms, and even GNU diff behaves differently depending on which
version you use).

So no, I do not like the idea of using git on some random box, only to
realize that what I have grown used to does not work.

Ciao,
Dscho
* Re: git log filtering
From: Shawn O. Pearce @ 2007-02-09 0:23 UTC
To: Johannes Schindelin; +Cc: Jeff King, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> We included libxdiff _exactly_ to ensure consistency between different git
> installations (remember, diff behaves quite differently on different
> platforms, and even GNU diff behaves differently depending on which
> version you use).

pcre is covered by the BSD license. Can we ship it with git, like
we ship libxdiff? I want to say Apache ships with pcre, but they
use the Apache License so it might be easier for them to do so.

--
Shawn.
* Re: git log filtering
From: Johannes Schindelin @ 2007-02-09 0:45 UTC
To: Shawn O. Pearce; +Cc: Jeff King, git

Hi,

On Thu, 8 Feb 2007, Shawn O. Pearce wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > We included libxdiff _exactly_ to ensure consistency between different
> > git installations (remember, diff behaves quite differently on
> > different platforms, and even GNU diff behaves differently depending
> > on which version you use).
>
> pcre is covered by the BSD license. Can we ship it with git, like we
> ship libxdiff? I want to say Apache ships with pcre, but they use the
> Apache License so it might be easier for them to do so.

If we bundle it like we do with libxdiff, I do not have any objections.
It would also help MinGW.

Ciao,
Dscho
* Re: git log filtering
From: Sergey Vlasov @ 2007-02-09 10:15 UTC
To: Shawn O. Pearce; +Cc: Johannes Schindelin, Jeff King, git

On Thu, 8 Feb 2007 19:23:44 -0500 Shawn O. Pearce wrote:

> pcre is covered by the BSD license. Can we ship it with git, like
> we ship libxdiff? I want to say Apache ships with pcre, but they
> use the Apache License so it might be easier for them to do so.

If you do this, please do not forget to add a way to use the system
copy of libpcre instead of the bundled version.
* Re: git log filtering
From: Jeff King @ 2007-02-09 1:59 UTC
To: Johannes Schindelin; +Cc: git

On Fri, Feb 09, 2007 at 01:18:01AM +0100, Johannes Schindelin wrote:

> Yes, I do. Not because of the extra lines, but because of the inconsistent
> interface.

OK, so we may either:

  1. always use the lowest common denominator (i.e., no pcre support)
  2. force a dependency for new features (i.e., require pcre)
  3. have inconsistency between builds (i.e., conditional dependency)
  4. include all dependencies, or re-write them natively

I agree that 4 can make some sense in limited situations, but I worry
that it will eventually cease to be scalable (we don't get improvements
or bugfixes automatically from other packages, we potentially re-invent
the wheel). We already have '3' for other things: openssl, curl, expat,
even perl.

-Peff
* Re: git log filtering
From: Johannes Schindelin @ 2007-02-09 13:13 UTC
To: Jeff King; +Cc: git

Hi,

On Thu, 8 Feb 2007, Jeff King wrote:

> On Fri, Feb 09, 2007 at 01:18:01AM +0100, Johannes Schindelin wrote:
>
> > Yes, I do. Not because of the extra lines, but because of the inconsistent
> > interface.
>
> OK, so we may either:
>   1. always use the lowest common denominator (i.e., no pcre support)
>   2. force a dependency for new features (i.e., require pcre)
>   3. have inconsistency between builds (i.e., conditional dependency)
>   4. include all dependencies, or re-write them natively
>
> I agree that 4 can make some sense in limited situations, but I worry
> that it will eventually cease to be scalable (we don't get improvements
> or bugfixes automatically from other packages, we potentially re-invent
> the wheel). We already have '3' for other things: openssl, curl, expat,
> even perl.

The difference, of course, is that with the "other things", we either have
no alternative (if you do not have curl, you cannot use HTTP transport),
or we have workalikes (if you don't use openssl, the (possibly slower)
SHA1 replacements take effect).

We _used_ to rely on external "diff" and "merge", but have them as inbuilt
components, exactly to avoid "if you have a slightly differing setup,
git behaves differently".

Ciao,
Dscho
* Re: git log filtering
From: Jeff King @ 2007-02-09 13:22 UTC
To: Johannes Schindelin; +Cc: git

On Fri, Feb 09, 2007 at 02:13:18PM +0100, Johannes Schindelin wrote:

> The difference, of course, is that with the "other things", we either have
> no alternative (if you do not have curl, you cannot use HTTP transport),
> or we have workalikes (if you don't use openssl, the (possibly slower)
> SHA1 replacements take effect).

I'm not a pcre expert, but I thought most of the additions to posix
extended regular expressions were expressed through constructs that
would otherwise be invalid patterns. For example, '(?i)' doesn't make
any sense as a pattern. Thus you would only see different behavior when
inputting nonsense.

Of course, we're not currently using extended regexps, but that could be
made the default without additional dependencies.

> We _used_ to rely on external "diff" and "merge", but have them as inbuilt
> components, exactly to avoid "if you have a slightly differing setup,
> git behaves differently".

But you're OK with "if you didn't build against curl, http transport
just doesn't work."

So what if there is a '--pcre' option and a corresponding config option?
Thus you get the same results always, unless you use --pcre and it's not
built, in which case git dies. That seems to be the moral equivalent of
the curl situation.

At any rate, you didn't address my original point, which is _all_ of
those options have drawbacks. I think the drawbacks of re-writing or
re-packaging a regular expression library outweigh those of adding the
dependency (or even having slightly irregular behavior).

-Peff
* Re: git log filtering
From: Johannes Schindelin @ 2007-02-09 15:02 UTC
To: Jeff King; +Cc: git

Hi,

On Fri, 9 Feb 2007, Jeff King wrote:

> On Fri, Feb 09, 2007 at 02:13:18PM +0100, Johannes Schindelin wrote:
>
> > The difference, of course, is that with the "other things", we either have
> > no alternative (if you do not have curl, you cannot use HTTP transport),
> > or we have workalikes (if you don't use openssl, the (possibly slower)
> > SHA1 replacements take effect).
>
> I'm not a pcre expert, but I thought most of the additions to posix
> extended regular expressions were expressed through constructs that
> would otherwise be invalid patterns.

So, once pcre is used, you can use these constructs. Even in scripts.
Which just so happen to break on platforms where git is not compiled with
pcre support.

Or do you suggest checking (in git!) if the pattern is a pcre special or
not? That would be insane.

> > We _used_ to rely on external "diff" and "merge", but have them as
> > inbuilt components, exactly to avoid "if you have a slightly differing
> > setup, git behaves differently".
>
> But you're OK with "if you didn't build against curl, http transport
> just doesn't work."

Yes, I am. Since HTTP is itself only a second-class citizen.

> So what if there is a '--pcre' option and a corresponding config option?
> Thus you get the same results always, unless you use --pcre and it's not
> built, in which case git dies. That seems to be the moral equivalent of
> the curl situation.

I might be wrong, but most of git does not depend on HTTP.

> At any rate, you didn't address my original point, which is _all_ of
> those options have drawbacks. I think the drawbacks of re-writing or
> re-packaging a regular expression library outweigh those of adding the
> dependency (or even having slightly irregular behavior).
This is only because you do not really have problems with dependencies.
You just install, or compile, the dependent thing, which happens to be no
hassle, since you use Linux. And you can compile & install things.

Once everybody runs Linux, and is allowed to compile & install things, I
will no longer complain about trillions of dependencies.

Ciao,
Dscho
* pcre performance, was Re: git log filtering
From: Johannes Schindelin @ 2007-03-07 17:37 UTC
To: Jeff King; +Cc: Linus Torvalds, git

Hi,

On Thu, 8 Feb 2007, Jeff King wrote:

> In every case there, pcre has either comparable performance, or simply
> blows away glibc.

So I tested this against external grep. For completeness' sake, I tested
these against each other: GNU regex-0.12, Git _without_ external grep
(relies on glibc's regex), Git _with_ external grep ("original"), pcre,
and for good measure, pcre with NO_MMAP=1 (to test if disk access is the
problem).

Here are the numbers:

grep-gnu-regex:
21.41user 1.08system 0:22.52elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps
21.40user 1.06system 0:22.47elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7209minor)pagefaults 0swaps
21.61user 1.06system 0:22.68elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7209minor)pagefaults 0swaps
21.30user 1.10system 0:22.48elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps
21.30user 1.08system 0:22.43elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7209minor)pagefaults 0swaps

grep-no-external-grep:
6.98user 1.17system 0:08.16elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7120minor)pagefaults 0swaps
7.07user 1.16system 0:08.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7121minor)pagefaults 0swaps
6.98user 1.12system 0:08.11elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7121minor)pagefaults 0swaps
7.00user 1.18system 0:08.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7121minor)pagefaults 0swaps
grep-original:
0.82user 1.15system 0:01.97elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7090minor)pagefaults 0swaps
0.94user 1.03system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7099minor)pagefaults 0swaps
0.89user 1.07system 0:01.96elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7092minor)pagefaults 0swaps
0.81user 1.15system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7092minor)pagefaults 0swaps

grep-pcre:
4.04user 1.18system 0:05.24elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7205minor)pagefaults 0swaps
4.16user 1.08system 0:05.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7206minor)pagefaults 0swaps
4.24user 0.98system 0:05.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7206minor)pagefaults 0swaps
4.08user 1.14system 0:05.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7206minor)pagefaults 0swaps

grep-pcre-no-mmap:
4.15user 1.07system 0:05.22elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps
4.01user 1.14system 0:05.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7209minor)pagefaults 0swaps
3.94user 1.18system 0:05.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps
4.11user 1.06system 0:05.18elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+7210minor)pagefaults 0swaps

BTW this was "git grep Lin.*valds" on linux-2.6, just updated. The first
test was run 5 times instead of 4 to make sure it is hot cache. This is on
a dual 1.2GHz 2GB machine.

I cannot really say anything about the pagefaults, so I'll leave that to
the wizards.

Result: external grep wins hands-down. GNU regex loses hands-down.
pcre seems to be better than glibc's regex engine, and gains ever so slightly when using NO_MMAP. I ran the same test on a 1GHz 256MB machine which is overloaded, and in that case, GNU regex is still worst (~55 sec), while glibc and pcre are equal (glibc slightly slower with ~35 sec, pcre ~34 sec), and external grep wins (~29 sec). Of course, this is io-bound, but it shows that pcre uses more memory than glibc. Ciao, Dscho ^ permalink raw reply [flat|nested] 34+ messages in thread
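The grep-no-external-grep run above is the in-process path, where the pattern is compiled once and then tested against each line with the C library's regex engine. A minimal sketch of that shape, assuming the POSIX `regcomp()`/`regexec()` interface (which glibc implements); `count_matching_lines` is a hypothetical helper, not code from git's grep.c:

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the lines of `text` that match the POSIX
 * basic regular expression `pattern`, using the C library's
 * regcomp()/regexec(). Returns -1 if the pattern does not compile or
 * memory runs out. */
int count_matching_lines(const char *pattern, const char *text)
{
    regex_t re;
    int count = 0;

    /* REG_NOSUB: we only need "match or not", not match offsets. */
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;

    while (*text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);

        /* regexec() wants a NUL-terminated string, so copy the line. */
        char *line = malloc(len + 1);
        if (!line) {
            regfree(&re);
            return -1;
        }
        memcpy(line, text, len);
        line[len] = '\0';

        if (regexec(&re, line, 0, NULL, 0) == 0)
            count++;
        free(line);

        /* Advance past the line and its newline, if any. */
        text += len;
        if (*text == '\n')
            text++;
    }
    regfree(&re);
    return count;
}
```

For example, `count_matching_lines("Lin.*valds", buf)` counts the lines of `buf` that the benchmark's pattern would report. The per-line copy is the simplest way to satisfy `regexec()`'s NUL-terminated input, not necessarily the fastest.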
* Re: pcre performance, was Re: git log filtering
  2007-03-07 17:37 ` pcre performance, was " Johannes Schindelin
@ 2007-03-07 18:03 ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2007-03-07 18:03 UTC
  To: Johannes Schindelin; +Cc: Jeff King, Linus Torvalds, git

> Result: external grep wins hands-down. GNU regex loses hands-down. pcre
> seems to be better than glibc's regex engine, and gains ever so slightly
> when using NO_MMAP.

Indeed GNU regex 0.12 loses, and that's why it was rewritten for (IIRC)
glibc 2.3. Older glibc versions use code derived from GNU regex 0.12; but
the old GNU regex code is dead in general (maybe it survives in Emacs --
but I don't remember), and the glibc regex code can be used by external
programs via gnulib.

glibc is slower than PCRE mostly because it is internationalized. So for
example it supports things like stra[.ss.]e matching both strasse and
straße in a German locale, or [[=a=]] matching aàáäâ and possibly more
variations. In theory. In practice I couldn't make it work while writing
this message...

External grep wins hands-down because it's a DFA engine. If the regex
uses backreferences (or the above esoteric constructs), however, external
grep will not be able to give a definite answer using the fast engine,
and will fall back to glibc regex.

Paolo
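To illustrate why a DFA engine wins: once the automaton is built, each input byte costs exactly one table lookup and nothing is ever re-scanned. The sketch below hand-builds such an automaton only for patterns of the form `literal.*literal` (like the benchmark's `Lin.*valds`) by chaining two Knuth-Morris-Pratt automata; GNU grep instead compiles general regular expressions into its DFA, and all names here are hypothetical. Backreferences cannot be expressed in a DFA at all, which is why grep has to fall back for them.

```c
#include <string.h>

#define MAX_STATES 64

/* A byte-level DFA: next[s][c] is the state after reading byte c in
 * state s. State 0 is the start state; `accept` is absorbing. */
struct dfa {
    int accept;
    int next[MAX_STATES][256];
};

/* Fill rows base .. base+len-1 with the Knuth-Morris-Pratt automaton
 * for the literal `lit`: reading all of `lit` leads to state base+len,
 * and a mismatch falls back to the longest prefix still matched. */
static void add_literal(struct dfa *d, int base, const char *lit, int len)
{
    int x = 0; /* fallback ("restart") state, relative to base */

    for (int c = 0; c < 256; c++)
        d->next[base][c] = base;
    d->next[base][(unsigned char)lit[0]] = base + 1;

    for (int j = 1; j < len; j++) {
        for (int c = 0; c < 256; c++)
            d->next[base + j][c] = d->next[base + x][c];
        d->next[base + j][(unsigned char)lit[j]] = base + j + 1;
        x = d->next[base + x][(unsigned char)lit[j]] - base;
    }
}

/* Build a DFA for the regular expression  a.*b  where a and b are
 * non-empty literal strings (no metacharacters). Returns 0 on success,
 * -1 if a literal is empty or the automaton would be too large. */
int build_dfa(struct dfa *d, const char *a, const char *b)
{
    int m = (int)strlen(a), n = (int)strlen(b);

    if (m == 0 || n == 0 || m + n + 1 > MAX_STATES)
        return -1;
    add_literal(d, 0, a, m);   /* states 0..m-1: still looking for a    */
    add_literal(d, m, b, n);   /* states m..m+n-1: a seen, looking for b */
    d->accept = m + n;
    for (int c = 0; c < 256; c++)
        d->next[d->accept][c] = d->accept;
    return 0;
}

/* Count matching lines: one table lookup per byte, no backtracking.
 * The state is reset at each newline because grep matches line by
 * line and '.' does not match a newline. */
int dfa_count_lines(const struct dfa *d, const char *text)
{
    int count = 0, state = 0;

    for (const unsigned char *p = (const unsigned char *)text;; p++) {
        if (*p == '\n' || *p == '\0') {
            if (state == d->accept)
                count++;
            state = 0;
            if (*p == '\0')
                break;
        } else {
            state = d->next[state][*p];
        }
    }
    return count;
}
```

After `build_dfa(&d, "Lin", "valds")`, `dfa_count_lines()` counts the lines containing `Lin` followed somewhere later by `valds`. The KMP fallback rows are what let inputs like `LLinvalds` or `Lin vvalds` match without ever stepping back in the input.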
* Re: git log filtering
  2007-02-07 21:03 ` Linus Torvalds
  2007-02-07 21:09 ` Junio C Hamano
@ 2007-02-08 1:59 ` Horst H. von Brand
  1 sibling, 0 replies; 34+ messages in thread
From: Horst H. von Brand @ 2007-02-08 1:59 UTC
  To: Linus Torvalds; +Cc: Junio C Hamano, Johannes Schindelin, Don Zickus, git

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, 7 Feb 2007, Junio C Hamano wrote:
> > This is very tempting but, ... hmmmm...
>
> I would actually prefer to have it be some marker on the expression
> itself.
>
> We already do that '^' handling by hand for "author"/"committer" things.
> We could do other things like that.
>
> Although I guess the downside of not doing standard regexps would be too
> big.

Use Perl's regexps? The pcre library packs them, and they have all sorts
of goodies like markers in the expression itself.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                   Phone: +56 32 2654431
Universidad Tecnica Federico Santa Maria             +56 32 2654239
Casilla 110-V, Valparaiso, Chile                Fax: +56 32 2797513
* Re: git log filtering
  2007-02-07 17:12 ` Linus Torvalds
  2007-02-07 17:25 ` Johannes Schindelin
@ 2007-02-07 18:16 ` Linus Torvalds
  2007-02-07 19:49 ` Fix "git log -z" behaviour Linus Torvalds
  2007-02-07 18:19 ` git log filtering Don Zickus
  2 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2007-02-07 18:16 UTC
  To: Don Zickus, Junio C Hamano; +Cc: Git Mailing List



On Wed, 7 Feb 2007, Linus Torvalds wrote:
>
> 	git log --pretty -z |

Gaah. If all you want is normal logs, you don't need the "--pretty", of
course, since that's the default. Just "git log -z" will give you
zero-terminated logs.

But if you want to grep on committer, you'd need to use "--pretty=full" or
something, of course, so the "--pretty=xyz" thing is indeed often
applicable for things like this.

Also, I just checked, and we have a bug. Merges do not have the ending
zero in "git log -z" output. It seems to be connected to the fact that we
handle the "always_show_header" commits differently (the ones that we
wouldn't normally show because they have no diffs associated with them).

The obvious fix for that failed. I'll look at it some more.

		Linus
* Fix "git log -z" behaviour
  2007-02-07 18:16 ` Linus Torvalds
@ 2007-02-07 19:49 ` Linus Torvalds
  2007-02-07 19:55 ` Junio C Hamano
  ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-02-07 19:49 UTC
  To: Don Zickus, Junio C Hamano; +Cc: Git Mailing List


For commit messages, we should really put the "line_termination" when we
output the character in between different commits, *not* between the
commit and the diff. The diff goes hand-in-hand with the commit, it
shouldn't be separated from it with the termination character.

So this:
 - uses the termination character for true inter-commit spacing
 - uses a regular newline between the commit log and the diff

We had it the other way around.

For the normal case where the termination character is '\n', this
obviously doesn't change anything at all, since we just switched two
identical characters around. So it's very safe - it doesn't change any
normal usage, but it definitely fixes "git log -z".

By fixing "git log -z", you can now also do insane things like

	git log -p -z |
		grep -z "some patch expression" |
		tr '\0' '\n' |
		less -S

and you will see only those commits that have the "some patch expression"
in their commit message _or_ their patches.

(This is slightly different from 'git log -S"some patch expression"',
since the latter requires the expression to literally *change* in the
patch, while the "git log -p -z | grep .." approach will see it if it's
just an unchanged _part_ of the patch context)

Of course, if you actually do something like the above, you're probably
insane, but hey, it works!

Try the above command line for a demonstration (of course, you need to
change the "some patch expression" to be something relevant). The old
behaviour of "git log -p -z" was useless (and got things completely wrong
for log entries without patches).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

On Wed, 7 Feb 2007, Linus Torvalds wrote:
>
> Also, I just checked, and we have a bug. Merges do not have the ending
> zero in "git log -z" output. It seems to be connected to the fact that we
> handle the "always_show_header" commits differently (the ones that we
> wouldn't normally show because they have no diffs associated with them).
>
> The obvious fix for that failed. I'll look at it some more.

Actually, the obvious fix was right, I just did the *wrong* obvious fix at
first ;)

 log-tree.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index d8ca36b..85acd66 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -143,7 +143,7 @@ void show_log(struct rev_info *opt, const char *sep)
 	if (*sep != '\n' && opt->commit_format == CMIT_FMT_ONELINE)
 		extra = "\n";
 	if (opt->shown_one && opt->commit_format != CMIT_FMT_ONELINE)
-		putchar('\n');
+		putchar(opt->diffopt.line_termination);
 	opt->shown_one = 1;
 
 	/*
@@ -270,9 +270,8 @@ int log_tree_diff_flush(struct rev_info *opt)
 		    opt->commit_format != CMIT_FMT_ONELINE) {
 			int pch = DIFF_FORMAT_DIFFSTAT | DIFF_FORMAT_PATCH;
 			if ((pch & opt->diffopt.output_format) == pch)
-				printf("---%c", opt->diffopt.line_termination);
-			else
-				putchar(opt->diffopt.line_termination);
+				printf("---");
+			putchar('\n');
 		}
 	}
 	diff_flush(&opt->diffopt);
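The pipeline above relies on each NUL-terminated record holding one whole commit together with its patch. A minimal C sketch of the `grep -z ... | tr '\0' '\n'` part follows, assuming a plain substring match (as with `grep -F`) instead of a regular expression; `filter_records` is a hypothetical name:

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the n bytes at rec contain `needle` (plain substring). */
static int record_contains(const char *rec, size_t n, const char *needle)
{
    size_t k = strlen(needle);

    if (k == 0)
        return 1;
    for (size_t i = 0; i + k <= n; i++)
        if (memcmp(rec + i, needle, k) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper mimicking  grep -z -F needle | tr '\0' '\n' :
 * walk the NUL-terminated records in buf[0..len), and write each record
 * containing `needle` to `out` followed by a newline (no output when
 * out is NULL). Returns the number of matching records. A final record
 * with no trailing NUL is still treated as a record. */
size_t filter_records(const char *buf, size_t len, const char *needle,
                      FILE *out)
{
    size_t start = 0, matched = 0;

    while (start < len) {
        const char *nul = memchr(buf + start, '\0', len - start);
        size_t n = nul ? (size_t)(nul - (buf + start)) : len - start;

        if (record_contains(buf + start, n, needle)) {
            if (out) {
                fwrite(buf + start, 1, n, out);
                fputc('\n', out);
            }
            matched++;
        }
        start += n + 1;   /* skip the record and its NUL terminator */
    }
    return matched;
}
```

Reading all of `git log -p -z` into a buffer and calling `filter_records(buf, len, "some patch expression", stdout)` would print only the matching commits, each newline-terminated, much as the pipeline does minus regular-expression support. Because the whole record is tested, a hit in either the log message or the patch selects the commit.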
* Re: Fix "git log -z" behaviour
  2007-02-07 19:49 ` Fix "git log -z" behaviour Linus Torvalds
@ 2007-02-07 19:55 ` Junio C Hamano
  2007-02-07 22:53 ` Don Zickus
  2007-02-08 22:34 ` Junio C Hamano
  2 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2007-02-07 19:55 UTC
  To: Linus Torvalds; +Cc: git

Ah, I was looking at other minor issues and then came up with this
one liner. But obviously "termination should be the true
inter-commit spacing" is the right direction, so I'll chuck this
one.

diff --git a/log-tree.c b/log-tree.c
index d8ca36b..410f90f 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -354,6 +354,8 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 	if (!shown && opt->loginfo && opt->always_show_header) {
 		log.parent = NULL;
 		show_log(opt, "");
+		if (!opt->diffopt.line_termination)
+			putchar(0);
 		shown = 1;
 	}
 	opt->loginfo = NULL;
* Re: Fix "git log -z" behaviour
  2007-02-07 19:49 ` Fix "git log -z" behaviour Linus Torvalds
  2007-02-07 19:55 ` Junio C Hamano
@ 2007-02-07 22:53 ` Don Zickus
  2007-02-07 23:05 ` Linus Torvalds
  2007-02-08 22:34 ` Junio C Hamano
  2 siblings, 1 reply; 34+ messages in thread
From: Don Zickus @ 2007-02-07 22:53 UTC
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

>
> For commit messages, we should really put the "line_termination" when we
> output the character in between different commits, *not* between the
> commit and the diff. The diff goes hand-in-hand with the commit, it
> shouldn't be separated from it with the termination character.
>
> So this:
>  - uses the termination character for true inter-commit spacing
>  - uses a regular newline between the commit log and the diff
>
> We had it the other way around.
>
> For the normal case where the termination character is '\n', this
> obviously doesn't change anything at all, since we just switched two
> identical characters around. So it's very safe - it doesn't change any
> normal usage, but it definitely fixes "git log -z".
>
> By fixing "git log -z", you can now also do insane things like
>
> 	git log -p -z |
> 		grep -z "some patch expression" |
> 		tr '\0' '\n' |
> 		less -S
>
> and you will see only those commits that have the "some patch expression"
> in their commit message _or_ their patches.
>
> (This is slightly different from 'git log -S"some patch expression"',
> since the latter requires the expression to literally *change* in the
> patch, while the "git log -p -z | grep .." approach will see it if it's
> just an unchanged _part_ of the patch context)
>
> Of course, if you actually do something like the above, you're probably
> insane, but hey, it works!
>
> Try the above command line for a demonstration (of course, you need to
> change the "some patch expression" to be something relevant). The old
> behaviour of "git log -p -z" was useless (and got things completely wrong
> for log entries without patches).
>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> ---
>
> On Wed, 7 Feb 2007, Linus Torvalds wrote:
> >
> > Also, I just checked, and we have a bug. Merges do not have the ending
> > zero in "git log -z" output. It seems to be connected to the fact that we
> > handle the "always_show_header" commits differently (the ones that we
> > wouldn't normally show because they have no diffs associated with them).
> >
> > The obvious fix for that failed. I'll look at it some more.
>
> Actually, the obvious fix was right, I just did the *wrong* obvious fix at
> first ;)

Works for me. :)

And I thought I had a handle on a lot of the Unix commands. That -z
stuff just threw me for a loop. It's pretty neat to be able to grep
commits and have the output display the whole commit and diff.

Cheers,
Don
* Re: Fix "git log -z" behaviour
  2007-02-07 22:53 ` Don Zickus
@ 2007-02-07 23:05 ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-02-07 23:05 UTC
  To: Don Zickus; +Cc: Junio C Hamano, Git Mailing List



On Wed, 7 Feb 2007, Don Zickus wrote:
>
> And I thought I had a handle on a lot of the Unix commands. That -z
> stuff just threw me for a loop. It's pretty neat to be able to grep
> commits and have the output display the whole commit and diff.

The whole "-z" flag to grep is a GNU extension, as far as I know. I don't
think it's portable.

Even for GNU grep, it's not mentioned in the man-page. Whether that is
just due to the normal inane FSF rules ("man-pages are evil, you should
use those idiotic info pages") or whether it is a conscious effort to not
document nonstandard features, I don't know.

		Linus
* Re: Fix "git log -z" behaviour
  2007-02-07 19:49 ` Fix "git log -z" behaviour Linus Torvalds
  2007-02-07 19:55 ` Junio C Hamano
  2007-02-07 22:53 ` Don Zickus
@ 2007-02-08 22:34 ` Junio C Hamano
  2007-02-10 7:32 ` Junio C Hamano
  2 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2007-02-08 22:34 UTC
  To: Linus Torvalds; +Cc: Don Zickus, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> For the normal case where the termination character is '\n', this
> obviously doesn't change anything at all, since we just switched two
> identical characters around. So it's very safe - it doesn't change any
> normal usage, but it definitely fixes "git log -z".

Gaah.

I have already applied this but I think this has fallout for
existing users of "-z --raw". Nothing in-tree uses "git log" as
the upstream of a pipe as far as I know because in-tree stuff
tend to stick to plumbing when it comes to scripting, but I
think your patch would affect the plumbing level as well.

Scripts that read from "-z --raw" have been expecting to get a
record whose first 7 bytes are "commit " to be a log, which is
followed by an arbitrary number of records whose first byte is
":" (and then it needs a variable number of records to complete
one diff record). This patch removes the separator NUL between
the log message and the first diff record.
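The record layout described above can be checked with a small tally routine. This sketch assumes exactly that layout: a record starting with `commit ` is a log message, and a record starting with `:` is a raw diff header followed by one pathname record, or two when the status is a rename or copy (`R`/`C`). `tally_raw_z` is a hypothetical helper, not a complete parser:

```c
#include <string.h>

/* Hypothetical helper: count log records and diff entries in a
 * "-z --raw" style stream held in buf[0..len). A record starting with
 * "commit " is a log message; a record starting with ':' is a diff
 * header, followed by one path record (two for renames/copies, whose
 * status field begins with 'R' or 'C'). Sketch only. */
void tally_raw_z(const char *buf, size_t len, int *commits, int *diffs)
{
    size_t pos = 0;
    int paths_left = 0;   /* path records still owed by the last header */

    *commits = *diffs = 0;
    while (pos < len) {
        const char *rec = buf + pos;
        const char *nul = memchr(rec, '\0', len - pos);
        size_t n = nul ? (size_t)(nul - rec) : len - pos;

        if (paths_left > 0) {
            paths_left--;                    /* a pathname: skip it */
        } else if (n >= 7 && memcmp(rec, "commit ", 7) == 0) {
            (*commits)++;
        } else if (n > 0 && rec[0] == ':') {
            /* The status is the last space-separated field. */
            const char *status = rec + n;
            while (status > rec && status[-1] != ' ')
                status--;
            (*diffs)++;
            paths_left = (status < rec + n &&
                          (*status == 'R' || *status == 'C')) ? 2 : 1;
        }
        pos += n + 1;    /* skip the record and its NUL terminator */
    }
}
```

Fed the old layout (log message, NUL, diff header), it counts every diff entry. Fed the new layout, where a newline separates the log message from the first diff header, that header is swallowed into the log record and goes uncounted, which is the kind of fallout described above.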
* Re: Fix "git log -z" behaviour
  2007-02-08 22:34 ` Junio C Hamano
@ 2007-02-10 7:32 ` Junio C Hamano
  2007-02-10 9:36 ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2007-02-10 7:32 UTC
  To: Linus Torvalds; +Cc: Don Zickus, Git Mailing List

Junio C Hamano <junkio@cox.net> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> For the normal case where the termination character is '\n', this
>> obviously doesn't change anything at all, since we just switched two
>> identical characters around. So it's very safe - it doesn't change any
>> normal usage, but it definitely fixes "git log -z".
>
> Gaah.
>
> I have already applied this but I think this has fallout for
> existing users of "-z --raw". Nothing in-tree uses "git log" as
> the upstream of a pipe as far as I know because in-tree stuff
> tend to stick to plumbing when it comes to scripting, but I
> think your patch would affect the plumbing level as well.

I think the new semantics for -z ("inter-record termination is
NUL") makes a lot more sense for "-p -z" format that shows
commit log message and the patch text. It makes filtering the
output with "grep -z" feel much more natural.

The new semantics is however quite inconsistent with the other
formats: --raw, --name-only and --name-status. These already
use NUL for separating pathnames and fields when -z is given, in
order to allow scripts to sensibly deal with pathnames that contain
funny characters (e.g. LF and HT). Nobody is likely to feed
their output to "grep -z", but one problematic case I see is to
use this:

	git log -z --raw -r --pretty=raw $commit

or its equivalent:

	git rev-list $commit |
	git diff-tree --stdin --raw -r --pretty=raw

to prepare data to feed something like fast-import.

But such newly written scripts can read from non -z and unwrap
paths themselves just as easily (the pathname safety with NUL
was invented before we started using c-quote consistently), so
it might be Ok to leave them (slightly) broken.

So, I give up.
* Re: Fix "git log -z" behaviour
  2007-02-10 7:32 ` Junio C Hamano
@ 2007-02-10 9:36 ` Junio C Hamano
  2007-02-10 17:09 ` Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2007-02-10 9:36 UTC
  To: Linus Torvalds; +Cc: Don Zickus, Git Mailing List

Junio C Hamano <junkio@cox.net> writes:

> Junio C Hamano <junkio@cox.net> writes:
>
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>>
>>> For the normal case where the termination character is '\n', this
>>> obviously doesn't change anything at all, since we just switched two
>>> identical characters around. So it's very safe - it doesn't change any
>>> normal usage, but it definitely fixes "git log -z".
>>
>> Gaah.
>>
>> I have already applied this but I think this has fallout for
>> existing users of "-z --raw". Nothing in-tree uses "git log" as
>> the upstream of a pipe as far as I know because in-tree stuff
>> tend to stick to plumbing when it comes to scripting, but I
>> think your patch would affect the plumbing level as well.
>
> I think the new semantics for -z ("inter-record termination is
> NUL") makes a lot more sense for "-p -z" format that shows
> commit log message and the patch text. It makes filtering the
> output with "grep -z" feel much more natural.
>
> The new semantics is however quite inconsistent with the other
> formats: --raw, --name-only and --name-status. These already
> use NUL for separating pathnames and fields when -z is given, in
> order to allow scripts to sensibly deal with pathnames that contain
> funny characters (e.g. LF and HT). Nobody is likely to feed
> their output to "grep -z", but one problematic case I see is to
> use this:
>
> 	git log -z --raw -r --pretty=raw $commit
>
> or its equivalent:
>
> 	git rev-list $commit |
> 	git diff-tree --stdin --raw -r --pretty=raw
>
> to prepare data to feed something like fast-import.
>
> But such newly written scripts can read from non -z and unwrap
> paths themselves just as easily (the pathname safety with NUL
> was invented before we started using c-quote consistently), so
> it might be Ok to leave them (slightly) broken.
>
> So, I give up.

... well, it just occurred to me that it might make sense not to
let this new "use NUL as inter-commit separator for grep -z"
semantics hijack existing -z option, but introduce another
option, say, -Z. Then you could even do something like:

	git log -Z -r --numstat | grep -z -e '^[1-9][0-9][0-9][0-9]* '

to find commits that have 100 or more lines of additions to a
file. (Or use --stat and grep for '| *[1-9][0-9][0-9][0-9]* ' to
look for the sum of addition+deletion.)

Hmmmm.
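Junio's pattern `^[1-9][0-9][0-9][0-9]* ` is aimed at an added-lines count of three or more digits, that is, 100 or more. The same per-line test, sketched in C under the assumption that each `--numstat` line has the form `<added><TAB><deleted><TAB><path>` (with `-` in place of the counts for binary files); `record_has_big_addition` is a hypothetical name and the threshold is a parameter:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if any line of `rec` (one commit's
 * text) looks like a --numstat line "<added>\t<deleted>\t<path>" whose
 * added count is at least `min_added`. Lines that do not start with a
 * digit (log text, binary "-\t-\tpath" entries) are skipped. */
int record_has_big_addition(const char *rec, long min_added)
{
    const char *line = rec;

    while (*line) {
        const char *eol = strchr(line, '\n');

        if (isdigit((unsigned char)*line)) {
            char *end;
            long added = strtol(line, &end, 10);
            /* Require the TAB separator so "2 fixes" is not a hit. */
            if (*end == '\t' && added >= min_added)
                return 1;
        }
        if (!eol)
            break;
        line = eol + 1;
    }
    return 0;
}
```

Combined with a NUL-separated record splitter, calling this with `min_added` set to 100 selects the same commits the `grep -z` pattern is after, without depending on how a particular grep anchors `^` inside a multi-line record.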
* Re: Fix "git log -z" behaviour
  2007-02-10 9:36 ` Junio C Hamano
@ 2007-02-10 17:09 ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-02-10 17:09 UTC
  To: Junio C Hamano; +Cc: Don Zickus, Git Mailing List



On Sat, 10 Feb 2007, Junio C Hamano wrote:
>
> ... well, it just occurred to me that it might make sense not to
> let this new "use NUL as inter-commit separator for grep -z"
> semantics hijack existing -z option, but introduce another
> option, say, -Z.

I don't think I disagree, but I do suspect it's not worth it.

Yes, we really do have two "line_termination" characters: the one between
commits, and the one we use within raw diffs. However, I don't think the
*combination* ever makes sense any more (*), so using the same flag
doesn't seem to really be a problem.

And the -z "line_termination" already got hijacked a long time ago for
inter-commit messages too, so while adding a "-Z" would perhaps avoid a
certain ambiguity, it would actually potentially break stuff that just did

	git-rev-list -z --pretty .. | ...

which is actually _more_ likely than the "multiple commit messages _and_
raw output _and_ '-z'" combination.

So I would suggest leaving it as-is, especially since I don't think
anybody has actually even noticed (ie nobody probably used that
combination), and the new semantics in many ways are both more useful and
more logical.

		Linus

(*) It may well have made sense a year and a half ago, I don't think it
makes much sense any more.
* Re: git log filtering
  2007-02-07 17:12 ` Linus Torvalds
  2007-02-07 17:25 ` Johannes Schindelin
  2007-02-07 18:16 ` Linus Torvalds
@ 2007-02-07 18:19 ` Don Zickus
  2007-02-07 18:27 ` Linus Torvalds
  2 siblings, 1 reply; 34+ messages in thread
From: Don Zickus @ 2007-02-07 18:19 UTC
  To: Linus Torvalds; +Cc: git

> - "git log" can itself do a lot of filtering. Both on date, on revisions,
>   on "modifies files/directories X, Y and Z" _and_ on strings.
>
>   See "man git-rev-list" for more (it doesn't apply to just "git log", it
>   applies to just about any revision listing, including gitk etc)
>
>   For example,
>
> 	git log [--author=pattern] [--committer=pattern] [--grep=pattern]
>
>   will likely do exactly what you want. You can do
>
> 	git log --grep="Signed-off-by:.*akpm"
>
>   on the kernel archive to see which ones were signed off by Andrew.

Cool. The hidden little options. :-) This is exactly what I was looking
for. Thanks.

I didn't see these options in the man pages. Might be worth putting in
there??

Cheers,
Don
* Re: git log filtering
  2007-02-07 18:19 ` git log filtering Don Zickus
@ 2007-02-07 18:27 ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-02-07 18:27 UTC
  To: Don Zickus; +Cc: git



On Wed, 7 Feb 2007, Don Zickus wrote:
>
> I didn't see these options in the man pages. Might be worth putting in
> there??

Well, they really _are_ there, indirectly:

	The command takes options applicable to the git-rev-list(1) command to
	control what is shown and how, and options applicable to the
	git-diff-tree(1) commands to control how the change each commit
	introduces are shown.

so you have to look at both git-rev-list and git-diff-tree to get all the
options.

It then goes on to say:

	This manual page describes only the most frequently used options.
	                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

so technically it's complete and true. But yeah, maybe we could include
all the options there.

		Linus
end of thread, other threads:[~2007-03-07 18:06 UTC | newest]

Thread overview: 34+ messages
2007-02-07 16:41 git log filtering Don Zickus
2007-02-07 16:55 ` Jakub Narebski
2007-02-07 17:01 ` Uwe Kleine-König
2007-02-07 17:12 ` Johannes Schindelin
2007-02-07 17:12 ` Linus Torvalds
2007-02-07 17:25 ` Johannes Schindelin
[not found] ` <7v64ad7l12.fsf@assigned-by-dhcp.cox.net>
2007-02-07 21:03 ` Linus Torvalds
2007-02-07 21:09 ` Junio C Hamano
2007-02-07 21:53 ` Linus Torvalds
2007-02-08 6:16 ` Jeff King
2007-02-08 18:06 ` Johannes Schindelin
2007-02-08 22:33 ` Jeff King
2007-02-09 0:18 ` Johannes Schindelin
2007-02-09 0:23 ` Shawn O. Pearce
2007-02-09 0:45 ` Johannes Schindelin
2007-02-09 10:15 ` Sergey Vlasov
2007-02-09 1:59 ` Jeff King
2007-02-09 13:13 ` Johannes Schindelin
2007-02-09 13:22 ` Jeff King
2007-02-09 15:02 ` Johannes Schindelin
2007-03-07 17:37 ` pcre performance, was " Johannes Schindelin
2007-03-07 18:03 ` Paolo Bonzini
2007-02-08 1:59 ` Horst H. von Brand
2007-02-07 18:16 ` Linus Torvalds
2007-02-07 19:49 ` Fix "git log -z" behaviour Linus Torvalds
2007-02-07 19:55 ` Junio C Hamano
2007-02-07 22:53 ` Don Zickus
2007-02-07 23:05 ` Linus Torvalds
2007-02-08 22:34 ` Junio C Hamano
2007-02-10 7:32 ` Junio C Hamano
2007-02-10 9:36 ` Junio C Hamano
2007-02-10 17:09 ` Linus Torvalds
2007-02-07 18:19 ` git log filtering Don Zickus
2007-02-07 18:27 ` Linus Torvalds