* [PATCH] [Outreachy] git/userdiff.c fix regex pattern error @ 2018-10-04 11:30 Ananya Krishna Maram 2018-10-04 14:26 ` Johannes Schindelin 0 siblings, 1 reply; 9+ messages in thread From: Ananya Krishna Maram @ 2018-10-04 11:30 UTC (permalink / raw) To: christian.couder, git, Johannes.Schindelin the forward slash character should be escaped with backslash. Fix Unescaped forward slash error in Python regex statements. Signed-off-by: Ananya Krishna Maram<ananyakittu1997@gmail.com> --- userdiff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/userdiff.c b/userdiff.c index f565f6731..f4ff9b9e5 100644 --- a/userdiff.c +++ b/userdiff.c @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$", /* -- */ "[a-zA-Z_][a-zA-Z0-9_]*" "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?" - "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"), + "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"), /* -- */ PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$", /* -- */ -- 2.17.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error 2018-10-04 11:30 [PATCH] [Outreachy] git/userdiff.c fix regex pattern error Ananya Krishna Maram @ 2018-10-04 14:26 ` Johannes Schindelin 2018-10-04 9:35 ` Ananya Krishna Maram 0 siblings, 1 reply; 9+ messages in thread From: Johannes Schindelin @ 2018-10-04 14:26 UTC (permalink / raw) To: Ananya Krishna Maram; +Cc: christian.couder, git Hi Ananya, thank you for taking the time to write this patch! On Thu, 4 Oct 2018, Ananya Krishna Maram wrote: > the forward slash character should be escaped with backslash. Fix > Unescaped forward slash error in Python regex statements. > > Signed-off-by: Ananya Krishna Maram<ananyakittu1997@gmail.com> That explains pretty well what you did, but I wonder why the forward slash needs to be escaped? I would understand if we enclosed the pattern in `/<regex>/`, as it is done e.g. in Javascript, but we do not... Thanks, Johannes > --- > userdiff.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/userdiff.c b/userdiff.c > index f565f6731..f4ff9b9e5 100644 > --- a/userdiff.c > +++ b/userdiff.c > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$", > /* -- */ > "[a-zA-Z_][a-zA-Z0-9_]*" > "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?" > - "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"), > + "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"), > /* -- */ > PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$", > /* -- */ > -- > 2.17.1 > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error 2018-10-04 14:26 ` Johannes Schindelin @ 2018-10-04 9:35 ` Ananya Krishna Maram 2018-10-04 15:26 ` Johannes Schindelin 0 siblings, 1 reply; 9+ messages in thread From: Ananya Krishna Maram @ 2018-10-04 9:35 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Christian Couder, git On Thu, 4 Oct 2018 at 19:56, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Ananya, > > thank you for taking the time to write this patch! > > On Thu, 4 Oct 2018, Ananya Krishna Maram wrote: > > > the forward slash character should be escaped with backslash. Fix > > Unescaped forward slash error in Python regex statements. > > > > Signed-off-by: Ananya Krishna Maram<ananyakittu1997@gmail.com> > > That explains pretty well what you did, but I wonder why the forward slash > needs to be escaped? I would understand if we enclosed the pattern in > `/<regex>/`, as it is done e.g. in Javascript, but we do not... You are correct, the code would execute either ways. But when I came across this line, I didn't get it's meaning instantly because as per standards, forward slash has to be escaped. In fact when open source code is written according to standards, the code will be reachable to more people. Thanks, Ananya. > Thanks, > Johannes > > > --- > > userdiff.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/userdiff.c b/userdiff.c > > index f565f6731..f4ff9b9e5 100644 > > --- a/userdiff.c > > +++ b/userdiff.c > > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$", > > /* -- */ > > "[a-zA-Z_][a-zA-Z0-9_]*" > > "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?" > > - "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"), > > + "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"), > > /* -- */ > > PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$", > > /* -- */ > > -- > > 2.17.1 > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error 2018-10-04 9:35 ` Ananya Krishna Maram @ 2018-10-04 15:26 ` Johannes Schindelin 2018-10-04 10:39 ` Ananya Krishna Maram 0 siblings, 1 reply; 9+ messages in thread From: Johannes Schindelin @ 2018-10-04 15:26 UTC (permalink / raw) To: Ananya Krishna Maram; +Cc: Christian Couder, git Hi Ananya, On Thu, 4 Oct 2018, Ananya Krishna Maram wrote: > On Thu, 4 Oct 2018 at 19:56, Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > > Hi Ananya, > > > > thank you for taking the time to write this patch! > > > > On Thu, 4 Oct 2018, Ananya Krishna Maram wrote: > > > > > the forward slash character should be escaped with backslash. Fix > > > Unescaped forward slash error in Python regex statements. > > > > > > Signed-off-by: Ananya Krishna Maram<ananyakittu1997@gmail.com> > > > > That explains pretty well what you did, but I wonder why the forward slash > > needs to be escaped? I would understand if we enclosed the pattern in > > `/<regex>/`, as it is done e.g. in Javascript, but we do not... > > You are correct, the code would execute either ways. But when I came across > this line, I didn't get it's meaning instantly because as per standards, forward > slash has to be escaped. In fact when open source code is written according to > standards, the code will be reachable to more people. I am afraid that I do not follow... Regular expressions have quite a few special characters, but forward slashes are not among them. Meaning: if we had specified the regular expression thusly (in any language that supports them to be specified in this way): /|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?/ then I would agree that this is a bug, and needs to be fixed. But we specify it as a regular C string: "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?" In this context, the backslash has an additional, nested meaning: it escapes special characters in a C string, too. So writing "\\" will actually result in a string consisting of a single backslash. And "\n" would specify a string consisting of a single line feed character. Some C compilers ignore incorrectly-escaped characters, i.e. "\/" would simply expand to the forward slash. However, you wanted to escape the forward slash in the regular expression. To do that, you would have to write "\\/" i.e. specifying, in a C string, a backslash character, followed by a forward slash character. However, the regular expression would then be interpreting the backslash character as escape character in its own right, seeing the forward slash, and replacing this sequence by a forward slash. But it does not need to be escaped, when you specify the regular expression the way we do. And the way we specified it is really the standard when specifying regular expressions in C code, i.e. *without* the suggested backslash. Ciao, Johannes > > Thanks, > Ananya. > > > Thanks, > > Johannes > > > > > --- > > > userdiff.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/userdiff.c b/userdiff.c > > > index f565f6731..f4ff9b9e5 100644 > > > --- a/userdiff.c > > > +++ b/userdiff.c > > > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$", > > > /* -- */ > > > "[a-zA-Z_][a-zA-Z0-9_]*" > > > "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?" > > > - "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"), > > > + "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"), > > > /* -- */ > > > PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$", > > > /* -- */ > > > -- > > > 2.17.1 > > > > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error 2018-10-04 15:26 ` Johannes Schindelin @ 2018-10-04 10:39 ` Ananya Krishna Maram 2018-10-04 19:42 ` Johannes Schindelin 2018-10-06 23:41 ` Junio C Hamano 0 siblings, 2 replies; 9+ messages in thread From: Ananya Krishna Maram @ 2018-10-04 10:39 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Christian Couder, git On Thu, 4 Oct 2018 at 20:56, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Ananya, > > On Thu, 4 Oct 2018, Ananya Krishna Maram wrote: > > > On Thu, 4 Oct 2018 at 19:56, Johannes Schindelin > > <Johannes.Schindelin@gmx.de> wrote: > > > > > > Hi Ananya, > > > > > > thank you for taking the time to write this patch! > > > > > > On Thu, 4 Oct 2018, Ananya Krishna Maram wrote: > > > > > > > the forward slash character should be escaped with backslash. Fix > > > > Unescaped forward slash error in Python regex statements. > > > > > > > > Signed-off-by: Ananya Krishna Maram<ananyakittu1997@gmail.com> > > > > > > That explains pretty well what you did, but I wonder why the forward slash > > > needs to be escaped? I would understand if we enclosed the pattern in > > > `/<regex>/`, as it is done e.g. in Javascript, but we do not... > > > > You are correct, the code would execute either ways. But when I came across > > this line, I didn't get it's meaning instantly because as per standards, forward > > slash has to be escaped. In fact when open source code is written according to > > standards, the code will be reachable to more people. > > I am afraid that I do not follow... Regular expressions have quite a few > special characters, but forward slashes are not among them. > > Meaning: if we had specified the regular expression thusly (in any > language that supports them to be specified in this way): > > /|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?/ > > then I would agree that this is a bug, and needs to be fixed. But we > specify it as a regular C string: > > "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?" > > In this context, the backslash has an additional, nested meaning: it > escapes special characters in a C string, too. So writing > > "\\" > > will actually result in a string consisting of a single backslash. And > > "\n" > > would specify a string consisting of a single line feed character. Some C > compilers ignore incorrectly-escaped characters, i.e. "\/" would simply > expand to the forward slash. > > However, you wanted to escape the forward slash in the regular expression. > To do that, you would have to write > > "\\/" > > i.e. specifying, in a C string, a backslash character, followed by a > forward slash character. > > However, the regular expression would then be interpreting the backslash > character as escape character in its own right, seeing the forward slash, > and replacing this sequence by a forward slash. > > But it does not need to be escaped, when you specify the regular > expression the way we do. And the way we specified it is really the > standard when specifying regular expressions in C code, i.e. *without* the > suggested backslash. Aha!. this makes total sense. I was thinking from a general regular expression point of view. But I should be thinking from C point of view and how C might interpret this newly submitted string. This explanation is very clear. Thanks for taking time to reply to my patch. From next time on, I will try to think from git project's point of view. Thanks, Ananya. > Ciao, > Johannes > > > > > Thanks, > > Ananya. > > > > > Thanks, > > > Johannes > > > > > > > --- > > > > userdiff.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/userdiff.c b/userdiff.c > > > > index f565f6731..f4ff9b9e5 100644 > > > > --- a/userdiff.c > > > > +++ b/userdiff.c > > > > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$", > > > > /* -- */ > > > > "[a-zA-Z_][a-zA-Z0-9_]*" > > > > "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?" > > > > - "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"), > > > > + "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"), > > > > /* -- */ > > > > PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$", > > > > /* -- */ > > > > -- > > > > 2.17.1 > > > > > > > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error 2018-10-04 10:39 ` Ananya Krishna Maram @ 2018-10-04 19:42 ` Johannes Schindelin 2018-10-06 23:41 ` Junio C Hamano 1 sibling, 0 replies; 9+ messages in thread From: Johannes Schindelin @ 2018-10-04 19:42 UTC (permalink / raw) To: Ananya Krishna Maram; +Cc: Christian Couder, git Hi Ananya, On Thu, 4 Oct 2018, Ananya Krishna Maram wrote: > On Thu, 4 Oct 2018 at 20:56, Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > > [... talking about the reason why a slash does not need to be escaped > > in a C string specifying a regular expression...] > > > > But it does not need to be escaped, when you specify the regular > > expression the way we do. And the way we specified it is really the > > standard when specifying regular expressions in C code, i.e. *without* the > > suggested backslash. > > Aha!. this makes total sense. I was thinking from a general regular expression > point of view. But I should be thinking from C point of view and how C > might interpret this newly submitted string. > This explanation is very clear. Thanks for taking time to reply to my > patch. From next time on, I will try to think from > git project's point of view. Of course! Thank you for taking the time to contribute this patch. Maybe you have another idea for a micro-project? Maybe there is something in Git that you wish was more convenient? Or maybe https://public-inbox.org/git/?q=leftoverbits has something that you would like to implement? Ciao, Johannes > > Thanks, > Ananya. > > > Ciao, > > Johannes > > > > > > > > Thanks, > > > Ananya. > > > > > > > Thanks, > > > > Johannes > > > > > > > > > --- > > > > > userdiff.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/userdiff.c b/userdiff.c > > > > > index f565f6731..f4ff9b9e5 100644 > > > > > --- a/userdiff.c > > > > > +++ b/userdiff.c > > > > > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$", > > > > > /* -- */ > > > > > "[a-zA-Z_][a-zA-Z0-9_]*" > > > > > "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?" > > > > > - "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"), > > > > > + "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"), > > > > > /* -- */ > > > > > PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$", > > > > > /* -- */ > > > > > -- > > > > > 2.17.1 > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error 2018-10-04 10:39 ` Ananya Krishna Maram 2018-10-04 19:42 ` Johannes Schindelin @ 2018-10-06 23:41 ` Junio C Hamano 2018-10-12 7:57 ` Johannes Schindelin 1 sibling, 1 reply; 9+ messages in thread From: Junio C Hamano @ 2018-10-06 23:41 UTC (permalink / raw) To: Ananya Krishna Maram; +Cc: Johannes Schindelin, Christian Couder, git Ananya Krishna Maram <ananyakittu1997@gmail.com> writes: >> But it does not need to be escaped, when you specify the regular >> expression the way we do. And the way we specified it is really the >> standard when specifying regular expressions in C code, i.e. *without* the >> suggested backslash. > > Aha!. this makes total sense. I was thinking from a general regular expression > point of view. But I should be thinking from C point of view and how C > might interpret this newly submitted string. If you were thinking from a general regex point of view, you would never have treated '/' as anything special, though. Historically, some languages (like sed and perl) had an regexp match operator, which is denoted by enclosing a regexp inside a pair of slashes. In these languages, if you use /<regexp>/ operator, and if you want to match slash with the pattern, you somehow need a way to write an regexp that has slash in it. E.g. if you want a pattern that would match 'a' followed by '/' followed by 'b', your regexp would look like "a/b", but the regexp match operation you would write in these languages would be like /a\/b/, so that '/<regexp>/' parser can tell that the second slash is not a slash that signals the end of the match operator. And then there is an unnamed misdesigned language that has a regmatch() function, which takes a string that contains a regular expression, but somehow requires that string to begin and end with a slash for no justifiable reason ;-). If you were thinking about regexp from that brain-dead languages' point of view, yes, you should unlearn it and what Dscho gave you would make sense. C's regexp(3) library does not share such a misdesign and just takes a regular expression as a string. You would still need to follow the quoting rules of C string literals (e.g. write a literal double-quote or backslash after an escaping backslash), but of course slash is not special there. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error 2018-10-06 23:41 ` Junio C Hamano @ 2018-10-12 7:57 ` Johannes Schindelin 2018-10-12 8:41 ` Junio C Hamano 0 siblings, 1 reply; 9+ messages in thread From: Johannes Schindelin @ 2018-10-12 7:57 UTC (permalink / raw) To: Junio C Hamano; +Cc: Ananya Krishna Maram, Christian Couder, git Hi Junio, On Sun, 7 Oct 2018, Junio C Hamano wrote: > And then there is an unnamed misdesigned language that has a > regmatch() function, which takes a string that contains a regular > expression, but somehow requires that string to begin and end with a > slash for no justifiable reason ;-). It makes more sense once you realize that /<regexp>/ is a very nice syntactic construct to obtain an instance of a regular expression object, rather than a string describing one. In Perl it is not as obvious as in Javascript. But in C, we do not have objects, so the way to describe a regular expression is a string (which you have to compile into an opaque data structure before applying it). Ciao, Johannes ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error 2018-10-12 7:57 ` Johannes Schindelin @ 2018-10-12 8:41 ` Junio C Hamano 0 siblings, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2018-10-12 8:41 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Ananya Krishna Maram, Christian Couder, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > On Sun, 7 Oct 2018, Junio C Hamano wrote: > >> And then there is an unnamed misdesigned language that has a >> regmatch() function, which takes a string that contains a regular >> expression, but somehow requires that string to begin and end with a >> slash for no justifiable reason ;-). > > It makes more sense once you realize that /<regexp>/ is a very nice > syntactic construct to obtain an instance of a regular expression object, > rather than a string describing one. In Perl, qr/<regexp>/ is often how you use that syntactic construct. The unnamed misdesigned language literally uses a string whose content is /<regexp>/; iow, preg_match('/a.*b/', ...) is what you see, and there is no "regex object involved. As far as the function is concerned, it is receiving a string, that represents a regexp. There is no reason to require the pattern _string_ to begin and end with slash, but somehow they do. > But in C, we do not have objects, so the way to describe a regular > expression is a string (which you have to compile into an opaque data > structure before applying it). Absolutely. C does not have regexp object literal and the literal you write to represent a regexp is just a string that you would pass to functions like regcomp(). But at least C is not so brain-dead to require slashes on either end of that string ;-) ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-10-12 8:42 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-10-04 11:30 [PATCH] [Outreachy] git/userdiff.c fix regex pattern error Ananya Krishna Maram 2018-10-04 14:26 ` Johannes Schindelin 2018-10-04 9:35 ` Ananya Krishna Maram 2018-10-04 15:26 ` Johannes Schindelin 2018-10-04 10:39 ` Ananya Krishna Maram 2018-10-04 19:42 ` Johannes Schindelin 2018-10-06 23:41 ` Junio C Hamano 2018-10-12 7:57 ` Johannes Schindelin 2018-10-12 8:41 ` Junio C Hamano
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.