* [PATCH] Interpret :/<pattern> as a regular expression @ 2007-06-13 0:50 Johannes Schindelin 2007-06-13 1:21 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Johannes Schindelin @ 2007-06-13 0:50 UTC (permalink / raw) To: git Earlier, Git interpreted the pattern as a strict prefix, which made the operator unsuited in many cases. Now, the pattern is interpreted as a regular expression (which does not change the behaviour too much, since few onelines contain special regex characters), so that you can say git diff :/.*^Signed-off-by:.Zack.Brown to see the diff against the most recent reachable commit which was signed off by Zack, whose Kernel Cousin I miss very much. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- For fun, you can see which committer signed himself off: git show :/.*^Signed-off: *grin* sha1_name.c | 9 ++++++++- 1 files changed, 8 insertions(+), 1 deletions(-) diff --git a/sha1_name.c b/sha1_name.c index d9188ed..f1ba194 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -611,6 +611,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) { struct commit_list *list = NULL, *backup = NULL, *l; int retval = -1; + regex_t regexp; + regmatch_t regmatch[1]; if (prefix[0] == '!') { if (prefix[1] != '!') @@ -622,6 +624,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) for_each_ref(handle_one_ref, &list); for (l = list; l; l = l->next) commit_list_insert(l->item, &backup); + if (regcomp(®exp, prefix, REG_EXTENDED)) + return error("invalid regexp: %s", prefix); while (list) { char *p; struct commit *commit; @@ -630,7 +634,9 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) parse_object(commit->object.sha1); if (!commit->buffer || !(p = strstr(commit->buffer, "\n\n"))) continue; - if (!prefixcmp(p + 2, prefix)) { + if (!regexec(®exp, p + 2, 1, regmatch, 0) && + printf("match: %d\n", regmatch[0].rm_so) && + regmatch[0].rm_so == 0) { hashcpy(sha1, commit->object.sha1); retval = 0; break; @@ -639,6 +645,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) free_commit_list(list); for (l = backup; l; l = l->next) clear_commit_marks(l->item, ONELINE_SEEN); + regfree(®exp); return retval; } -- 1.5.2.1.2827.gba84a8-dirty ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] Interpret :/<pattern> as a regular expression 2007-06-13 0:50 [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin @ 2007-06-13 1:21 ` Junio C Hamano 2007-06-13 13:13 ` Johannes Schindelin 2007-06-13 4:52 ` Junio C Hamano 2007-06-13 18:41 ` Jeff King 2 siblings, 1 reply; 14+ messages in thread From: Junio C Hamano @ 2007-06-13 1:21 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Earlier, Git interpreted the pattern as a strict prefix, which made > the operator unsuited in many cases. > > Now, the pattern is interpreted as a regular expression (which does not > change the behaviour too much, since few onelines contain special regex > characters), so that you can say > > git diff :/.*^Signed-off-by:.Zack.Brown > > to see the diff against the most recent reachable commit which was > signed off by Zack, whose Kernel Cousin I miss very much. > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> While this perhaps is an improvement and people who are not interested in paying the price have a choice of not using this silly syntax, I am moderately annoyed that the syntax does not define "the most recent reachable" very well. It is more like "the first one we happened to pick by diffing from reachable refs". It would be more useful if it took "$commit:/$pattern" form to limit the search among reachable ones from named commit. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Interpret :/<pattern> as a regular expression 2007-06-13 1:21 ` Junio C Hamano @ 2007-06-13 13:13 ` Johannes Schindelin 0 siblings, 0 replies; 14+ messages in thread From: Johannes Schindelin @ 2007-06-13 13:13 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Hi, On Tue, 12 Jun 2007, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > Earlier, Git interpreted the pattern as a strict prefix, which made > > the operator unsuited in many cases. > > > > Now, the pattern is interpreted as a regular expression (which does not > > change the behaviour too much, since few onelines contain special regex > > characters), so that you can say > > > > git diff :/.*^Signed-off-by:.Zack.Brown > > > > to see the diff against the most recent reachable commit which was > > signed off by Zack, whose Kernel Cousin I miss very much. > > > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > > While this perhaps is an improvement and people who are not > interested in paying the price have a choice of not using this > silly syntax, I am moderately annoyed that the syntax does not > define "the most recent reachable" very well. It is more like > "the first one we happened to pick by diffing from reachable > refs". It would be more useful if it took "$commit:/$pattern" > form to limit the search among reachable ones from named commit. "Unfortunately", $commit:/$pattern is not a good syntax, since it suggests that you want to search _in_ $commit, not _from $commit. How about ':/!commit=$commit:$pattern'? Ciao, Dscho ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Interpret :/<pattern> as a regular expression 2007-06-13 0:50 [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin 2007-06-13 1:21 ` Junio C Hamano @ 2007-06-13 4:52 ` Junio C Hamano 2007-06-13 11:10 ` Johannes Schindelin 2007-06-13 11:17 ` (unknown) Johannes Schindelin 2007-06-13 18:41 ` Jeff King 2 siblings, 2 replies; 14+ messages in thread From: Junio C Hamano @ 2007-06-13 4:52 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > diff --git a/sha1_name.c b/sha1_name.c > index d9188ed..f1ba194 100644 > --- a/sha1_name.c > +++ b/sha1_name.c > @@ -611,6 +611,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) > { > struct commit_list *list = NULL, *backup = NULL, *l; > int retval = -1; > + regex_t regexp; > + regmatch_t regmatch[1]; > > if (prefix[0] == '!') { > if (prefix[1] != '!') Because you are not extracting any match substring, I do not think you would need regmatch[] there. > @@ -622,6 +624,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) > for_each_ref(handle_one_ref, &list); > for (l = list; l; l = l->next) > commit_list_insert(l->item, &backup); > + if (regcomp(®exp, prefix, REG_EXTENDED)) > + return error("invalid regexp: %s", prefix); > while (list) { > char *p; > struct commit *commit; Why EXTENDED? I think our code prefer traditional regexp by default, but in places where extended behaviour is truly useful (e.g. grepping for bulk of text) have command line option to enable extended. Of course, extended SHA-1 notation has no chance to do "command line option", but it somehow feels inconsistent. Also you probably want REG_NEWLINE. > @@ -630,7 +634,9 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) > parse_object(commit->object.sha1); > if (!commit->buffer || !(p = strstr(commit->buffer, "\n\n"))) > continue; > - if (!prefixcmp(p + 2, prefix)) { > + if (!regexec(®exp, p + 2, 1, regmatch, 0) && > + printf("match: %d\n", regmatch[0].rm_so) && > + regmatch[0].rm_so == 0) { > hashcpy(sha1, commit->object.sha1); > retval = 0; > break; Do we want to detect return value other than REG_NOMATCH from regexec() when it does not return zero? Please lose the debugging printf() before submitting. > @@ -639,6 +645,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) > free_commit_list(list); > for (l = backup; l; l = l->next) > clear_commit_marks(l->item, ONELINE_SEEN); > + regfree(®exp); > return retval; > } > Also I think you would want to fix get_sha1_oneline() so that it not to refuse to work without save_commit_buffer. These are to parse user supplied strings (it would be crazy for scripts to throw hundreds of ':/random string' to drive git -- it must come from the end user), and the user has every right to use this syntax, if he wants to, to specify the starting point for a command that deliberately turns off save_commit_buffer to save memory, because the command knows ithat t will traverse tons of commits without needing the contents of the commit buffer. After parse_object(), if commit->buffer is NULL, read the buffer with read_sha1_file() yourself to look for match (and if you did so you are also responsible for discarding it yourself). ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Interpret :/<pattern> as a regular expression 2007-06-13 4:52 ` Junio C Hamano @ 2007-06-13 11:10 ` Johannes Schindelin 2007-06-13 11:17 ` (unknown) Johannes Schindelin 1 sibling, 0 replies; 14+ messages in thread From: Johannes Schindelin @ 2007-06-13 11:10 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Hi, On Tue, 12 Jun 2007, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > diff --git a/sha1_name.c b/sha1_name.c > > index d9188ed..f1ba194 100644 > > --- a/sha1_name.c > > +++ b/sha1_name.c > > @@ -611,6 +611,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) > > { > > struct commit_list *list = NULL, *backup = NULL, *l; > > int retval = -1; > > + regex_t regexp; > > + regmatch_t regmatch[1]; > > > > if (prefix[0] == '!') { > > if (prefix[1] != '!') > > Because you are not extracting any match substring, I do not > think you would need regmatch[] there. I explicitely want to anchor the match at the beginning of the message (otherwise, most of the interesting patterns would match a merge pulling that commit in), and therefore I have to check where the match was. Alas, that's only possible with regmatch. > > @@ -622,6 +624,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) > > for_each_ref(handle_one_ref, &list); > > for (l = list; l; l = l->next) > > commit_list_insert(l->item, &backup); > > + if (regcomp(®exp, prefix, REG_EXTENDED)) > > + return error("invalid regexp: %s", prefix); > > while (list) { > > char *p; > > struct commit *commit; > > Why EXTENDED? Sorry. Old habit. Will fix. > Also you probably want REG_NEWLINE. No. If I look for something in the body of the message (like I showed in the two examples), I'd like to say ':/.*Blub'. Of course, you could always say git diff ':/[ -~ ]*Blub' with REG_NEWLINE, to match newlines also. But frankly, it is much more often that I want to match something in the whole message than just in the oneline. And if I _do_ want the match only in the oneline, I can still go and do git diff ':/[^ ]*Blub' when REG_NEWLINE is disabled. If you can teach me a better way to do both things, matching in the oneline _or_ matching the whole message, _with_ REG_NEWLINE, I'll gladly change it, and provide an example in the commit message as well as the documentation for equally clueless subjects as me. > > @@ -630,7 +634,9 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) > > parse_object(commit->object.sha1); > > if (!commit->buffer || !(p = strstr(commit->buffer, "\n\n"))) > > continue; > > - if (!prefixcmp(p + 2, prefix)) { > > + if (!regexec(®exp, p + 2, 1, regmatch, 0) && > > + printf("match: %d\n", regmatch[0].rm_so) && > > + regmatch[0].rm_so == 0) { > > hashcpy(sha1, commit->object.sha1); > > retval = 0; > > break; > > Do we want to detect return value other than REG_NOMATCH from > regexec() when it does not return zero? I am not well versed in the multitude of POSIX and other standards we have on this planet, therefore I just read my man page. And it says, quote: regexec() returns zero for a successful match or REG_NOMATCH for failure. Tertium non datur. At least according to my man page here. Am I mistaken in my assumption (which seems to be somewhat supported from my limited reading of the man page) that all errors should be caught at regcomp() time? > Please lose the debugging printf() before submitting. Ouch. Sorry. > > @@ -639,6 +645,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) > > free_commit_list(list); > > for (l = backup; l; l = l->next) > > clear_commit_marks(l->item, ONELINE_SEEN); > > + regfree(®exp); > > return retval; > > } > > > > Also I think you would want to fix get_sha1_oneline() so that it > not to refuse to work without save_commit_buffer. You're right, will fix. I'll resubmit shortly. However, feel free to enlighten me with insights into working _with_ REG_NEWLINE, and I'll gladly submit another version of the patch with REG_NEWLINE enabled, along with the additions to the documentation. Thanks, Dscho ^ permalink raw reply [flat|nested] 14+ messages in thread
* (unknown) 2007-06-13 4:52 ` Junio C Hamano 2007-06-13 11:10 ` Johannes Schindelin @ 2007-06-13 11:17 ` Johannes Schindelin 2007-06-13 12:11 ` [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin 1 sibling, 1 reply; 14+ messages in thread From: Johannes Schindelin @ 2007-06-13 11:17 UTC (permalink / raw) To: Junio C Hamano; +Cc: git [PATCH] Interpret :/<pattern> as a regular expression Earlier, Git interpreted the pattern as a strict prefix, which made the operator unsuited in many cases. Now, the pattern is interpreted as a regular expression, on the whole message, so that you can say git diff :/.*^Signed-off-by:.Zack.Brown to see the diff against the most recent reachable commit which was signed off by Zack, whose Kernel Cousin I miss very much. If you want to match just the oneline, but with a regular expression, say something like git diff ':/[^ ]*intelligent' Since it makes more sense to match the beginning of a message (otherwise, a pattern like ':/git-gui: Improve' would match the _merge_ commit, pulling in the commit you are likely to want), the implementation uses the regmatch parameter of regexec() to anchor the pattern there. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- Anyone who can teach me how to match / not-match a newline using a more elegant syntax, please do so, by all means. Documentation/git-rev-parse.txt | 6 +++--- sha1_name.c | 31 +++++++++++++++++++++++++------ 2 files changed, 28 insertions(+), 9 deletions(-) diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt index 6380676..56e1561 100644 --- a/Documentation/git-rev-parse.txt +++ b/Documentation/git-rev-parse.txt @@ -194,9 +194,9 @@ blobs contained in a commit. found. * A colon, followed by a slash, followed by a text: this names - a commit whose commit message starts with the specified text. - This name returns the youngest matching commit which is - reachable from any ref. If the commit message starts with a + a commit whose commit message starts with the specified regular + expression. This name returns the youngest matching commit which + is reachable from any ref. If the commit message starts with a '!', you have to repeat that; the special sequence ':/!', followed by something else than '!' is reserved for now. diff --git a/sha1_name.c b/sha1_name.c index d9188ed..988d599 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -599,8 +599,8 @@ static int handle_one_ref(const char *path, /* * This interprets names like ':/Initial revision of "git"' by searching - * through history and returning the first commit whose message starts - * with the given string. + * through history and returning the first commit whose message matches + * the given regular expression. * * For future extension, ':/!' is reserved. If you want to match a message * beginning with a '!', you have to repeat the exclamation mark. @@ -611,34 +611,53 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1) { struct commit_list *list = NULL, *backup = NULL, *l; int retval = -1; + regex_t regexp; + regmatch_t regmatch[1]; + char *temp_commit_buffer = NULL; if (prefix[0] == '!') { if (prefix[1] != '!') die ("Invalid search pattern: %s", prefix); prefix++; } - if (!save_commit_buffer) - return error("Could not expand oneline-name."); for_each_ref(handle_one_ref, &list); for (l = list; l; l = l->next) commit_list_insert(l->item, &backup); + if (regcomp(®exp, prefix, 0)) + return error("invalid regexp: %s", prefix); while (list) { char *p; struct commit *commit; + enum object_type type; + unsigned long size; commit = pop_most_recent_commit(&list, ONELINE_SEEN); parse_object(commit->object.sha1); - if (!commit->buffer || !(p = strstr(commit->buffer, "\n\n"))) + if (temp_commit_buffer) + free(temp_commit_buffer); + if (commit->buffer) + p = commit->buffer; + else { + p = read_sha1_file(commit->object.sha1, &type, &size); + if (!p) + continue; + temp_commit_buffer = p; + } + if (!(p = strstr(p, "\n\n"))) continue; - if (!prefixcmp(p + 2, prefix)) { + if (!regexec(®exp, p + 2, 1, regmatch, 0) && + regmatch[0].rm_so == 0) { hashcpy(sha1, commit->object.sha1); retval = 0; break; } } + if (temp_commit_buffer) + free(temp_commit_buffer); free_commit_list(list); for (l = backup; l; l = l->next) clear_commit_marks(l->item, ONELINE_SEEN); + regfree(®exp); return retval; } -- 1.5.2.1.2827.gba84a8-dirty ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH] Interpret :/<pattern> as a regular expression 2007-06-13 11:17 ` (unknown) Johannes Schindelin @ 2007-06-13 12:11 ` Johannes Schindelin 0 siblings, 0 replies; 14+ messages in thread From: Johannes Schindelin @ 2007-06-13 12:11 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Hi, I don't know what happened to my subject in the mail I'm replying to... Sorry. Ciao, Dscho ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Interpret :/<pattern> as a regular expression 2007-06-13 0:50 [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin 2007-06-13 1:21 ` Junio C Hamano 2007-06-13 4:52 ` Junio C Hamano @ 2007-06-13 18:41 ` Jeff King 2007-06-13 18:54 ` Johannes Schindelin 2 siblings, 1 reply; 14+ messages in thread From: Jeff King @ 2007-06-13 18:41 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git On Wed, Jun 13, 2007 at 01:50:22AM +0100, Johannes Schindelin wrote: > Earlier, Git interpreted the pattern as a strict prefix, which made > the operator unsuited in many cases. Thank you for working on this...I really like the :/ concept, but find myself wishing for a regex all the time. I have been meaning to do it since you introduced the original. :) -Peff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Interpret :/<pattern> as a regular expression 2007-06-13 18:41 ` Jeff King @ 2007-06-13 18:54 ` Johannes Schindelin 2007-06-13 20:00 ` Jeff King 0 siblings, 1 reply; 14+ messages in thread From: Johannes Schindelin @ 2007-06-13 18:54 UTC (permalink / raw) To: Jeff King; +Cc: git Hi, On Wed, 13 Jun 2007, Jeff King wrote: > On Wed, Jun 13, 2007 at 01:50:22AM +0100, Johannes Schindelin wrote: > > > Earlier, Git interpreted the pattern as a strict prefix, which made > > the operator unsuited in many cases. > > Thank you for working on this...I really like the :/ concept, but find > myself wishing for a regex all the time. I have been meaning to do it > since you introduced the original. :) :-) Since you seem comfortable with regular expressions, maybe you can help me: I am looking for a pattern which matches _any_ character, and one which matches only non-newlines, both with and without REG_NEWLINE. Hmm? Tia, Dscho ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Interpret :/<pattern> as a regular expression 2007-06-13 18:54 ` Johannes Schindelin @ 2007-06-13 20:00 ` Jeff King 2007-06-13 22:20 ` Johannes Schindelin 0 siblings, 1 reply; 14+ messages in thread From: Jeff King @ 2007-06-13 20:00 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git On Wed, Jun 13, 2007 at 07:54:59PM +0100, Johannes Schindelin wrote: > :-) Since you seem comfortable with regular expressions, maybe you can > help me: I am looking for a pattern which matches _any_ character, and one > which matches only non-newlines, both with and without REG_NEWLINE. Hmm? Without REG_NEWLINE, any character is just '.', but I think you are stuck with '[^ ]' for non-newlines, since POSIX makes no provisions for quoting the newline (I just skimmed through POSIX chapter 9, and I didn't see anything useful). With REG_NEWLINE, non-newlines is of course '.'. Matching both is tricky without using extended regular expressions (where you could just do '.| '). In fact, I have been playing with it for a few minutes and I can't seem to find a good way, since you really want to represent '.' _inside_ a bracketed alternation sequence. But I don't think there's a character class for "everything". I think this would be much easier with pcre, but ISTR some opposition to that a few months back. So that's probably not very helpful to you, but at least you have confirmation from one other person that the answer isn't totally obvious. :) -Peff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Interpret :/<pattern> as a regular expression 2007-06-13 20:00 ` Jeff King @ 2007-06-13 22:20 ` Johannes Schindelin 2007-06-14 7:48 ` Sam Vilain 0 siblings, 1 reply; 14+ messages in thread From: Johannes Schindelin @ 2007-06-13 22:20 UTC (permalink / raw) To: Jeff King; +Cc: git Hi, On Wed, 13 Jun 2007, Jeff King wrote: > On Wed, Jun 13, 2007 at 07:54:59PM +0100, Johannes Schindelin wrote: > > > :-) Since you seem comfortable with regular expressions, maybe you can > > help me: I am looking for a pattern which matches _any_ character, and one > > which matches only non-newlines, both with and without REG_NEWLINE. Hmm? > > Without REG_NEWLINE, any character is just '.', but I think you are > stuck with '[^ > ]' for non-newlines, since POSIX makes no provisions for quoting the > newline (I just skimmed through POSIX chapter 9, and I didn't see > anything useful). > > With REG_NEWLINE, non-newlines is of course '.'. Matching both is tricky > without using extended regular expressions (where you could just do '.| > '). In fact, I have been playing with it for a few minutes and I can't > seem to find a good way, since you really want to represent '.' _inside_ > a bracketed alternation sequence. But I don't think there's a character > class for "everything". > > I think this would be much easier with pcre, but ISTR some opposition to > that a few months back. Actually, that's funny. Yesterday, I repeated my claim that pcre is slow on IRC, and Sam Villain on IRC accused me of trolling. But as you can see from my postings on this list ($gmane/41682), you can see that _I_ had numbers to back up my claim. So no, I think pcre is just not worth it. > So that's probably not very helpful to you, but at least you have > confirmation from one other person that the answer isn't totally > obvious. :) That confirmation is at least some consolation to me :-) Ciao, Dscho "who is not here to teach, but to learn" ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Interpret :/<pattern> as a regular expression 2007-06-13 22:20 ` Johannes Schindelin @ 2007-06-14 7:48 ` Sam Vilain 2007-06-14 8:09 ` Johannes Schindelin 0 siblings, 1 reply; 14+ messages in thread From: Sam Vilain @ 2007-06-14 7:48 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Jeff King, git Johannes Schindelin wrote: > Actually, that's funny. Yesterday, I repeated my claim that pcre is > slow on IRC, and Sam Villain on IRC accused me of trolling. But as you can > see from my postings on this list ($gmane/41682), you can see that _I_ had > numbers to back up my claim. > > So no, I think pcre is just not worth it. > A strange thing to conclude from your figures, which show pcre as the fastest out of several libraries that you tested. Your figures show exactly what I was saying on IRC - that a DFA (external grep) vs NFA engine (most regex libraries) is inherently faster. The paper I linked to, specially selected as I had previously read a significant amount of the peer review the paper received, explained this in detail. The one piece of feedback your numbers got on-list also mentioned this. However there is a further flaw in your study. All but one of the performance tests use an external program, which on a given system may or may not be faster because of pipeline performance characteristics. You could improve the quality of the result by using the 'pcregrep' program as a data point. It might also be worth trying a few more complex patterns. I suggest reading the paper (http://swtch.com/~rsc/regexp/regexp1.html) for some background before repeating the experiment. Apologies for not reviewing your numbers at the time; it sure is hard to keep on top of this list. But very interesting that they seem to suggest pcre would be the best choice from a performance perspective, even though the figures are very preliminary. Perhaps it is worth pursuing after all. Sam. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Interpret :/<pattern> as a regular expression 2007-06-14 7:48 ` Sam Vilain @ 2007-06-14 8:09 ` Johannes Schindelin 2007-06-14 9:07 ` Sam Vilain 0 siblings, 1 reply; 14+ messages in thread From: Johannes Schindelin @ 2007-06-14 8:09 UTC (permalink / raw) To: Sam Vilain; +Cc: Jeff King, git Hi, On Thu, 14 Jun 2007, Sam Vilain wrote: > Johannes Schindelin wrote: > > > So no, I think pcre is just not worth it. > > A strange thing to conclude from your figures, which show pcre as the > fastest out of several libraries that you tested. The best of the worse. Yes. An external (!) program was 4x faster than pcre. I don't know how to make it more obvious that pcre sucks. Hth, Dscho ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Interpret :/<pattern> as a regular expression 2007-06-14 8:09 ` Johannes Schindelin @ 2007-06-14 9:07 ` Sam Vilain 0 siblings, 0 replies; 14+ messages in thread From: Sam Vilain @ 2007-06-14 9:07 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git Johannes Schindelin wrote: >> A strange thing to conclude from your figures, which show pcre as the >> fastest out of several libraries that you tested. >> > > The best of the worse. Yes. An external (!) program was 4x faster than > pcre. I don't know how to make it more obvious that pcre sucks. > Oh, your position is obvious enough, but it is correct and supportable by the available evidence? Why not read past the first paragraph of my post and reply to that. Sam. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2007-06-14 9:08 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-06-13 0:50 [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin 2007-06-13 1:21 ` Junio C Hamano 2007-06-13 13:13 ` Johannes Schindelin 2007-06-13 4:52 ` Junio C Hamano 2007-06-13 11:10 ` Johannes Schindelin 2007-06-13 11:17 ` (unknown) Johannes Schindelin 2007-06-13 12:11 ` [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin 2007-06-13 18:41 ` Jeff King 2007-06-13 18:54 ` Johannes Schindelin 2007-06-13 20:00 ` Jeff King 2007-06-13 22:20 ` Johannes Schindelin 2007-06-14 7:48 ` Sam Vilain 2007-06-14 8:09 ` Johannes Schindelin 2007-06-14 9:07 ` Sam Vilain
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).