All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC/PATCH] pathspec: allow escaped query values
@ 2016-06-01 23:52 Stefan Beller
  2016-06-02  0:33 ` Junio C Hamano
  2016-06-02  0:38 ` Ramsay Jones
  0 siblings, 2 replies; 16+ messages in thread
From: Stefan Beller @ 2016-06-01 23:52 UTC (permalink / raw)
  To: gitster, pclouds; +Cc: git, Stefan Beller

In our own .gitattributes file we have attributes such as:

    *.[ch] whitespace=indent,trail,space

When querying for attributes we want to be able to ask for the exact
value, i.e.

    git ls-files :(attr:whitespace=indent,trail,space)

should work, but the commas are used in the attr magic to introduce
the next attr, such that this query currently fails with

fatal: Invalid pathspec magic 'trail' in ':(attr:whitespace=indent,trail,space)'

This change allows escaping characters by a backslash, such that the query

    git ls-files :(attr:whitespace=indent\,trail\,space)

will match all path that have the value "indent,trail,space" for the
whitespace attribute. To accomplish this, we need to modify two places.
First `eat_long_magic` needs to not stop early upon seeing a comma or
closing paren that is escaped. As a second step we need to remove any
escaping from the attr value. For now we just remove any backslashes.

Caveat: This doesn't allow for querying for values that have backslashes
in them, e.g.

    git ls-files :(attr:backslashes=\\)

that would ask for matches that have the `backslashes` value set to '\'.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

 * This applies on top of sb/pathspec-label
 * Junio does this come close to what you imagine for escaped commas?
 
 Thanks,
 Stefan
  
 pathspec.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/pathspec.c b/pathspec.c
index 0a02255..925f949 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -89,6 +89,22 @@ static void prefix_short_magic(struct strbuf *sb, int prefixlen,
 	strbuf_addf(sb, ",prefix:%d)", prefixlen);
 }
 
+static char *attr_value_unquote(const char *value)
+{
+	char *ret = xstrdup(value);
+
+	size_t value_len = strlen(ret);
+	size_t len = strcspn(ret, "\\");
+
+	while (len != value_len) {
+		memmove(ret + len, ret + len + 1, value_len - len);
+		value_len--;
+		len += strcspn(ret + len, "\\");
+	}
+
+	return ret;
+}
+
 static void parse_pathspec_attr_match(struct pathspec_item *item, const char *value)
 {
 	struct string_list_item *si;
@@ -132,7 +148,7 @@ static void parse_pathspec_attr_match(struct pathspec_item *item, const char *va
 				am->match_mode = MATCH_SET;
 			else {
 				am->match_mode = MATCH_VALUE;
-				am->value = xstrdup(&attr[attr_len + 1]);
+				am->value = attr_value_unquote(&attr[attr_len + 1]);
 				if (strchr(am->value, '\\'))
 					die(_("attr spec values must not contain backslashes"));
 			}
@@ -167,6 +183,9 @@ static void eat_long_magic(struct pathspec_item *item, const char *elt,
 	     *copyfrom && *copyfrom != ')';
 	     copyfrom = nextat) {
 		size_t len = strcspn(copyfrom, ",)");
+		while (len > 0 && copyfrom[len - 1] == '\\'
+		       && (copyfrom[len] == ',' || copyfrom[len] == ')'))
+			len += strcspn(copyfrom + len + 1, ",)") + 1;
 		if (copyfrom[len] == ',')
 			nextat = copyfrom + len + 1;
 		else
-- 
2.8.2.124.g24a9db3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-01 23:52 [RFC/PATCH] pathspec: allow escaped query values Stefan Beller
@ 2016-06-02  0:33 ` Junio C Hamano
  2016-06-02  2:23   ` Stefan Beller
  2016-06-02  0:38 ` Ramsay Jones
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-06-02  0:33 UTC (permalink / raw)
  To: Stefan Beller; +Cc: pclouds, git

Stefan Beller <sbeller@google.com> writes:

> This change allows escaping characters by a backslash, such that the query
>
>     git ls-files :(attr:whitespace=indent\,trail\,space)
>
> will match all path that have the value "indent,trail,space" for the
> whitespace attribute. To accomplish this, we need to modify two places.
> First `eat_long_magic` needs to not stop early upon seeing a comma or
> closing paren that is escaped. As a second step we need to remove any
> escaping from the attr value. For now we just remove any backslashes.
>
> Caveat: This doesn't allow for querying for values that have backslashes
> in them, e.g.
>
>     git ls-files :(attr:backslashes=\\)
>
> that would ask for matches that have the `backslashes` value set to '\'.

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>
>  * This applies on top of sb/pathspec-label
>  * Junio does this come close to what you imagine for escaped commas?

I, being lazy, would have used %44 for comma, which would have
avoided touching higher level of the callchain.  But "a backslash
always quotes the next byte, if you want to express a backslash,
double it" would probably be a more end-user friendly quoting
mechanism.

I do not offhand understand why the second example does not show the
paths with backslash attribute set to a single backslash, though.

> -				am->value = xstrdup(&attr[attr_len + 1]);
> +				am->value = attr_value_unquote(&attr[attr_len + 1]);
>  				if (strchr(am->value, '\\'))
>  					die(_("attr spec values must not contain backslashes"));

IOW, the "backslash is forbidden for now" IIUC was added there so
that we can introduce a quoting like this--once we decided that
quoting mechanism is via backslashes and have quoting support,
shouldn't the "values must not have backslash" just go?

>  	     *copyfrom && *copyfrom != ')';
>  	     copyfrom = nextat) {
>  		size_t len = strcspn(copyfrom, ",)");
> +		while (len > 0 && copyfrom[len - 1] == '\\'
> +		       && (copyfrom[len] == ',' || copyfrom[len] == ')'))
> +			len += strcspn(copyfrom + len + 1, ",)") + 1;

Good that we can use ')' in values, too, but I think this gets this
case wrong:

	:(attr:foo=\\,icase)

where the value for 'foo' wants to be a single backslash, and comma
is to introduce another magic, not part of the value for 'foo'.

If you want to do the "backslash unconditionally quotes the next
byte no matter what is quoted", you'd need to lose the strcspn()
and iterate over the string yourself, I would think.

>  		if (copyfrom[len] == ',')
>  			nextat = copyfrom + len + 1;
>  		else

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-01 23:52 [RFC/PATCH] pathspec: allow escaped query values Stefan Beller
  2016-06-02  0:33 ` Junio C Hamano
@ 2016-06-02  0:38 ` Ramsay Jones
  2016-06-02  2:20   ` Stefan Beller
  2016-06-02  5:46   ` Junio C Hamano
  1 sibling, 2 replies; 16+ messages in thread
From: Ramsay Jones @ 2016-06-02  0:38 UTC (permalink / raw)
  To: Stefan Beller, gitster, pclouds; +Cc: git



On 02/06/16 00:52, Stefan Beller wrote:
> In our own .gitattributes file we have attributes such as:
> 
>     *.[ch] whitespace=indent,trail,space
> 
> When querying for attributes we want to be able to ask for the exact
> value, i.e.
> 
>     git ls-files :(attr:whitespace=indent,trail,space)
> 
> should work, but the commas are used in the attr magic to introduce
> the next attr, such that this query currently fails with
> 
> fatal: Invalid pathspec magic 'trail' in ':(attr:whitespace=indent,trail,space)'
> 
> This change allows escaping characters by a backslash, such that the query
> 
>     git ls-files :(attr:whitespace=indent\,trail\,space)

Not having given this much thought at all, but the question which comes
to mind is: can you use some other separator for the <attr>-s rather than
a comma? That way you don't need to quote them in the <value> part of the
<attr>-spec.

(I dunno, maybe use ; or : instead?)

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02  0:38 ` Ramsay Jones
@ 2016-06-02  2:20   ` Stefan Beller
  2016-06-02  5:46   ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Stefan Beller @ 2016-06-02  2:20 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, Duy Nguyen, git

On Wed, Jun 1, 2016 at 5:38 PM, Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:
>
>
> On 02/06/16 00:52, Stefan Beller wrote:
>> In our own .gitattributes file we have attributes such as:
>>
>>     *.[ch] whitespace=indent,trail,space
>>
>> When querying for attributes we want to be able to ask for the exact
>> value, i.e.
>>
>>     git ls-files :(attr:whitespace=indent,trail,space)
>>
>> should work, but the commas are used in the attr magic to introduce
>> the next attr, such that this query currently fails with
>>
>> fatal: Invalid pathspec magic 'trail' in ':(attr:whitespace=indent,trail,space)'
>>
>> This change allows escaping characters by a backslash, such that the query
>>
>>     git ls-files :(attr:whitespace=indent\,trail\,space)
>
> Not having given this much thought at all, but the question which comes
> to mind is: can you use some other separator for the <attr>-s rather than
> a comma? That way you don't need to quote them in the <value> part of the
> <attr>-spec.
>
> (I dunno, maybe use ; or : instead?)

That was essentially my proposal as well (just that I said "white
spaces" instead if ':' or ';'),
but it is not a good idea according to Junio as we don't want to need
knowledge of the attr
syntax here in the pathspec matching IIUC. We rather want "I want to
transport this exact
string for the match, but to go through the pathspec magic, we need to
apply escaping of
the exact string. That way we can change the attr system later without
having to worry
about the syntax replacements done in the pathspecs.



>
> ATB,
> Ramsay Jones
>
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02  0:33 ` Junio C Hamano
@ 2016-06-02  2:23   ` Stefan Beller
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Beller @ 2016-06-02  2:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, git

On Wed, Jun 1, 2016 at 5:33 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> This change allows escaping characters by a backslash, such that the query
>>
>>     git ls-files :(attr:whitespace=indent\,trail\,space)
>>
>> will match all path that have the value "indent,trail,space" for the
>> whitespace attribute. To accomplish this, we need to modify two places.
>> First `eat_long_magic` needs to not stop early upon seeing a comma or
>> closing paren that is escaped. As a second step we need to remove any
>> escaping from the attr value. For now we just remove any backslashes.
>>
>> Caveat: This doesn't allow for querying for values that have backslashes
>> in them, e.g.
>>
>>     git ls-files :(attr:backslashes=\\)
>>
>> that would ask for matches that have the `backslashes` value set to '\'.
>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>
>>  * This applies on top of sb/pathspec-label
>>  * Junio does this come close to what you imagine for escaped commas?
>
> I, being lazy, would have used %44 for comma, which would have
> avoided touching higher level of the callchain.  But "a backslash
> always quotes the next byte, if you want to express a backslash,
> double it" would probably be a more end-user friendly quoting
> mechanism.

Heh, okay.

>
> I do not offhand understand why the second example does not show the
> paths with backslash attribute set to a single backslash, though.
>
>> -                             am->value = xstrdup(&attr[attr_len + 1]);
>> +                             am->value = attr_value_unquote(&attr[attr_len + 1]);
>>                               if (strchr(am->value, '\\'))
>>                                       die(_("attr spec values must not contain backslashes"));
>
> IOW, the "backslash is forbidden for now" IIUC was added there so
> that we can introduce a quoting like this--once we decided that
> quoting mechanism is via backslashes and have quoting support,
> shouldn't the "values must not have backslash" just go?

Right this can go now.

>
>>            *copyfrom && *copyfrom != ')';
>>            copyfrom = nextat) {
>>               size_t len = strcspn(copyfrom, ",)");
>> +             while (len > 0 && copyfrom[len - 1] == '\\'
>> +                    && (copyfrom[len] == ',' || copyfrom[len] == ')'))
>> +                     len += strcspn(copyfrom + len + 1, ",)") + 1;
>
> Good that we can use ')' in values, too, but I think this gets this
> case wrong:
>
>         :(attr:foo=\\,icase)
>
> where the value for 'foo' wants to be a single backslash, and comma
> is to introduce another magic, not part of the value for 'foo'.

Right, with this patch this would be interpreted as

    foo equal to ',icase'

>
> If you want to do the "backslash unconditionally quotes the next
> byte no matter what is quoted", you'd need to lose the strcspn()
> and iterate over the string yourself, I would think.

Okay, will do that instead.

>
>>               if (copyfrom[len] == ',')
>>                       nextat = copyfrom + len + 1;
>>               else

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02  0:38 ` Ramsay Jones
  2016-06-02  2:20   ` Stefan Beller
@ 2016-06-02  5:46   ` Junio C Hamano
  2016-06-02 15:30     ` Ramsay Jones
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-06-02  5:46 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Stefan Beller, pclouds, git

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

> Not having given this much thought at all, but the question which comes
> to mind is: can you use some other separator for the <attr>-s rather than
> a comma? That way you don't need to quote them in the <value> part of the
> <attr>-spec.
>
> (I dunno, maybe use ; or : instead?)

There are two kinds of comma involved in this discussion.

 * Multiple pathspec magic can be attached to augment the way
   <pattern> selects paths. ":(<magic1>,<magic2>,...)<pattern>" is
   the syntax, and <magicN> are things like "icase" (select the path
   that matches <pattern> case-insensitively), "top" (<pattern> must
   match from the top level of the working tree, even when you are
   running the command from a subdirectory).  We added a new kind of
   <magic> whose syntax is "attr:VAR=VAL" recently, which says "not
   only <pattern> must match the path, in order to be selected, the
   path must have the attribute VAR with value VAL".

   The comma that separates multiple magic is not something you can
   change now; it has been with us since v1.7.6 (Jun 2011)

 * My example wanted to use the attr:VAR=VAL form to select those
   paths that has one specific string as the value for whitespace
   attribute, i.e. VAR in this case is "whitespace".  The value for
   whitespace attribute determines what kind of whitespace anomalies
   are considered as errors by "git apply" and "git diff", and it is
   formed by concatenating things like "indent-with-non-tab" (starts
   a line with more than 8 consecutive SPs), "space-before-tab" (a
   SP appears immediately before HT in the indent), etc., with a
   comma in between.

   The comma that separates various kinds of whitespace errors is
   not something you can change now; it has been with us since
   v1.5.4 (Feb 2008).

So using different separator is not a viable solution.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02  5:46   ` Junio C Hamano
@ 2016-06-02 15:30     ` Ramsay Jones
  2016-06-02 16:10       ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Ramsay Jones @ 2016-06-02 15:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, pclouds, git



On 02/06/16 06:46, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> 
>> Not having given this much thought at all, but the question which comes
>> to mind is: can you use some other separator for the <attr>-s rather than
>> a comma? That way you don't need to quote them in the <value> part of the
>> <attr>-spec.
>>
>> (I dunno, maybe use ; or : instead?)
> 
> There are two kinds of comma involved in this discussion.
> 
>  * Multiple pathspec magic can be attached to augment the way
>    <pattern> selects paths. ":(<magic1>,<magic2>,...)<pattern>" is
>    the syntax, and <magicN> are things like "icase" (select the path
>    that matches <pattern> case-insensitively), "top" (<pattern> must
>    match from the top level of the working tree, even when you are
>    running the command from a subdirectory).  We added a new kind of
>    <magic> whose syntax is "attr:VAR=VAL" recently, which says "not
>    only <pattern> must match the path, in order to be selected, the
>    path must have the attribute VAR with value VAL".
> 
>    The comma that separates multiple magic is not something you can
>    change now; it has been with us since v1.7.6 (Jun 2011)
> 
>  * My example wanted to use the attr:VAR=VAL form to select those
>    paths that has one specific string as the value for whitespace
>    attribute, i.e. VAR in this case is "whitespace".  The value for
>    whitespace attribute determines what kind of whitespace anomalies
>    are considered as errors by "git apply" and "git diff", and it is
>    formed by concatenating things like "indent-with-non-tab" (starts
>    a line with more than 8 consecutive SPs), "space-before-tab" (a
>    SP appears immediately before HT in the indent), etc., with a
>    comma in between.
> 
>    The comma that separates various kinds of whitespace errors is
>    not something you can change now; it has been with us since
>    v1.5.4 (Feb 2008).
> 
> So using different separator is not a viable solution.

Ah, OK, makes sense. Note that I have not used 'pathspec magic' or
the attribute system in git (never felt/had the need)! ;-)

So, at risk of annoying you, let me continue in my ignorance a little
longer and ask: even if you have to protect all of this 'magic' from
the shell with '/" quoting, could you not use (nested) quotes to
protect the <value> part of an <attr>? For example:

    git ls-files ':(attr:whitespace="indent,trail,space",icase)'

[Hmm, I don't seem to be able to find documentation on the syntax
of these features (apart from the code, of course).]

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02 15:30     ` Ramsay Jones
@ 2016-06-02 16:10       ` Junio C Hamano
  2016-06-02 18:42         ` Ramsay Jones
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-06-02 16:10 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Stefan Beller, pclouds, git

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

> So, at risk of annoying you, let me continue in my ignorance a little
> longer and ask: even if you have to protect all of this 'magic' from
> the shell with '/" quoting, could you not use (nested) quotes to
> protect the <value> part of an <attr>? For example:
>
>     git ls-files ':(attr:whitespace="indent,trail,space",icase)'

That would be workable, I would think.  Before attr:VAR=VAL
extention, supported pathspec <magic> were only single lowercase-ascii
alphabet tokens, so nobody would have used " as a part of magic.  So
quting with double-quote pair would work.

You'd need to come up with a way to quote a double quote that
happens to be a part of VAL somehow, though.  I think attribute
value is limited to a string with non-whitespace letters; even
though the built-in attributes that have defined meaning to the Git
itself may not use values with letters beyond [-a-zA-Z0-9,], end
users and projects can add arbitrary values within the allowed
syntax, so it is not unconceivable that some project may have a
custom attribute that lists forbidden characters in a path with

	=== .gitattributes ===
        *.txt forbidden=`"

that tells their documentation cannot have these letters in it, or
something like that.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02 16:10       ` Junio C Hamano
@ 2016-06-02 18:42         ` Ramsay Jones
  2016-06-02 18:58           ` Junio C Hamano
  2016-06-02 19:04           ` Stefan Beller
  0 siblings, 2 replies; 16+ messages in thread
From: Ramsay Jones @ 2016-06-02 18:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, pclouds, git



On 02/06/16 17:10, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> 
>> So, at risk of annoying you, let me continue in my ignorance a little
>> longer and ask: even if you have to protect all of this 'magic' from
>> the shell with '/" quoting, could you not use (nested) quotes to
>> protect the <value> part of an <attr>? For example:
>>
>>     git ls-files ':(attr:whitespace="indent,trail,space",icase)'
> 
> That would be workable, I would think.  Before attr:VAR=VAL
> extention, supported pathspec <magic> were only single lowercase-ascii
> alphabet tokens, so nobody would have used " as a part of magic.  So
> quting with double-quote pair would work.

I was thinking about both ' and ", so that you could do:

   $ ./args ':(attr:whitespace="indent,trail,space",icase)'
    1::(attr:whitespace="indent,trail,space",icase)

   $ ./args ":(attr:whitespace='indent,trail,space',icase)"
    1::(attr:whitespace='indent,trail,space',icase)

   $ p=':(attr:whitespace="indent,trail,space",icase)'
   $ ./args "$p"
    1::(attr:whitespace="indent,trail,space",icase)

   $ p=":(attr:whitespace=\"indent,trail,space\",icase)"
   $ ./args "$p"
    1::(attr:whitespace="indent,trail,space",icase)

but limiting it to " would probably be OK too.

> You'd need to come up with a way to quote a double quote that
> happens to be a part of VAL somehow, though.

Yes I was assuming \ quoting as well - I just want to reduce the
need for such quoting (especially on windows).

>                                              I think attribute
> value is limited to a string with non-whitespace letters; even
> though the built-in attributes that have defined meaning to the Git
> itself may not use values with letters beyond [-a-zA-Z0-9,], end
> users and projects can add arbitrary values within the allowed
> syntax, so it is not unconceivable that some project may have a
> custom attribute that lists forbidden characters in a path with
> 
> 	=== .gitattributes ===
>         *.txt forbidden=`"
> 
> that tells their documentation cannot have these letters in it, or
> something like that.

Heh, yeah, that gets ugly:

   $ ./args ":(attr:*.txt forbidden=\'\\\",icase)"
    1::(attr:*.txt forbidden=\'\",icase)

   $ ./args ':(attr:*.txt forbidden=\'\''\",icase)'
    1::(attr:*.txt forbidden=\'\",icase)

[Note the initial ' 1:' above is output from args]

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02 18:42         ` Ramsay Jones
@ 2016-06-02 18:58           ` Junio C Hamano
  2016-06-02 19:29             ` Junio C Hamano
  2016-06-02 19:04           ` Stefan Beller
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-06-02 18:58 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Stefan Beller, Nguyen Thai Ngoc Duy, Git Mailing List

On Thu, Jun 2, 2016 at 11:42 AM, Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:
>>
>> That would be workable, I would think.  Before attr:VAR=VAL
>> extention, supported pathspec <magic> were only single lowercase-ascii
>> alphabet tokens, so nobody would have used " as a part of magic.  So
>> quting with double-quote pair would work.
>
> I was thinking about both ' and ", so that you could do:

Yes, I understood your suggestion as such. Quoting like shells would work
without breaking backward compatibility for the same reason quoting with
double-quote and backslash only without supporting single-quotes would
work.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02 18:42         ` Ramsay Jones
  2016-06-02 18:58           ` Junio C Hamano
@ 2016-06-02 19:04           ` Stefan Beller
  2016-06-02 19:44             ` Ramsay Jones
  1 sibling, 1 reply; 16+ messages in thread
From: Stefan Beller @ 2016-06-02 19:04 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, Duy Nguyen, git

On Thu, Jun 2, 2016 at 11:42 AM, Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:
>
>
> On 02/06/16 17:10, Junio C Hamano wrote:
>> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
>>
>>> So, at risk of annoying you, let me continue in my ignorance a little
>>> longer and ask: even if you have to protect all of this 'magic' from
>>> the shell with '/" quoting, could you not use (nested) quotes to
>>> protect the <value> part of an <attr>? For example:
>>>
>>>     git ls-files ':(attr:whitespace="indent,trail,space",icase)'
>>
>> That would be workable, I would think.  Before attr:VAR=VAL
>> extention, supported pathspec <magic> were only single lowercase-ascii
>> alphabet tokens, so nobody would have used " as a part of magic.  So
>> quting with double-quote pair would work.
>
> I was thinking about both ' and ", so that you could do:
>
>    $ ./args ':(attr:whitespace="indent,trail,space",icase)'
>     1::(attr:whitespace="indent,trail,space",icase)
>
>    $ ./args ":(attr:whitespace='indent,trail,space',icase)"
>     1::(attr:whitespace='indent,trail,space',icase)
>
>    $ p=':(attr:whitespace="indent,trail,space",icase)'
>    $ ./args "$p"
>     1::(attr:whitespace="indent,trail,space",icase)
>
>    $ p=":(attr:whitespace=\"indent,trail,space\",icase)"
>    $ ./args "$p"
>     1::(attr:whitespace="indent,trail,space",icase)
>
> but limiting it to " would probably be OK too.
>
>> You'd need to come up with a way to quote a double quote that
>> happens to be a part of VAL somehow, though.
>
> Yes I was assuming \ quoting as well - I just want to reduce the
> need for such quoting (especially on windows).
>
>>                                              I think attribute
>> value is limited to a string with non-whitespace letters; even
>> though the built-in attributes that have defined meaning to the Git
>> itself may not use values with letters beyond [-a-zA-Z0-9,], end
>> users and projects can add arbitrary values within the allowed
>> syntax, so it is not unconceivable that some project may have a
>> custom attribute that lists forbidden characters in a path with
>>
>>       === .gitattributes ===
>>         *.txt forbidden=`"

We restrict the 'forbidden' to follow [-a-zA-Z0-9,], so we could enforce
it for the values, too.


>
>    $ ./args ":(attr:*.txt forbidden=\'\\\",icase)"
>     1::(attr:*.txt forbidden=\'\",icase)

You should lose the *.txt in there, but put it at the back

>  $ ./args ":(attr:forbidden=\'\\\",icase)*.txt"

>
>    $ ./args ':(attr:*.txt forbidden=\'\''\",icase)'
>     1::(attr:*.txt forbidden=\'\",icase)

I see, so quoting by " or ' is preferred. What if the user
wants to do a
    forbidden=',"

so we have to escape those in there, such as

    ./args ':(attr:"forbidden=\',\"")'

We need to escape both ' and ", one for the outer
shell and one for Git parsing.

Does that seem ok?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02 18:58           ` Junio C Hamano
@ 2016-06-02 19:29             ` Junio C Hamano
  2016-06-02 19:52               ` Ramsay Jones
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-06-02 19:29 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Stefan Beller, Nguyen Thai Ngoc Duy, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> On Thu, Jun 2, 2016 at 11:42 AM, Ramsay Jones
> <ramsay@ramsayjones.plus.com> wrote:
>>>
>>> That would be workable, I would think.  Before attr:VAR=VAL
>>> extention, supported pathspec <magic> were only single lowercase-ascii
>>> alphabet tokens, so nobody would have used " as a part of magic.  So
>>> quting with double-quote pair would work.
>>
>> I was thinking about both ' and ", so that you could do:
>
> Yes, I understood your suggestion as such. Quoting like shells would work
> without breaking backward compatibility for the same reason quoting with
> double-quote and backslash only without supporting single-quotes would
> work.

Having said that, "It would work" does not have to mean "Hence we
must do it that way" at all.  Quoting character pairs make the
parsing and unquoting significantly more complex.

As you said, not many people used attributes and pathspec magic, and
I do not think those who want to use the new "further limits with
attributes" magic, envisioned primarily to be those who want to give
classes to submodules, have compelling reason to name their classes
with anything but lowercase-ascii-alphabet tokens.  So for a practical
purposes, I'd rather see Stefan

 * just implement backquote-blindly-passes-the-next-byte and nothing
   more elaborate; and

 * forbid [^-a-z0-9,_] from being used in the value part in the
   attr:VAR=VAL magic.

at least for now, and concentrate more on the other more important
parts of the submodule enhancement topic.

That way, those who will start using attr:VAR=VAL magic will stick
themselves to lowercase-ascii-alphabet tokens for now (simply
because an attribut that has the forbidden characters in its value
would not be usable with the magic), and we can later extend the
magic syntax parser in a backward compatible way to allow paired
quotes and other "more convenient" syntax.


[Footnote]

*1* The reason I prefer to keep the initially allowed value
characters narrow is because I envision that something like

	:(attr:VAR=(<some expression we will come up with later>))

may become necessary, and do not want to promise users that open or
close parentheses will forever be taken literally.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02 19:04           ` Stefan Beller
@ 2016-06-02 19:44             ` Ramsay Jones
  2016-06-02 19:46               ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Ramsay Jones @ 2016-06-02 19:44 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Duy Nguyen, git



On 02/06/16 20:04, Stefan Beller wrote:
> On Thu, Jun 2, 2016 at 11:42 AM, Ramsay Jones
> <ramsay@ramsayjones.plus.com> wrote:
>>
>>
>> On 02/06/16 17:10, Junio C Hamano wrote:
>>> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
>>>
>>>> So, at risk of annoying you, let me continue in my ignorance a little
>>>> longer and ask: even if you have to protect all of this 'magic' from
>>>> the shell with '/" quoting, could you not use (nested) quotes to
>>>> protect the <value> part of an <attr>? For example:
>>>>
>>>>     git ls-files ':(attr:whitespace="indent,trail,space",icase)'
>>>
>>> That would be workable, I would think.  Before attr:VAR=VAL
>>> extention, supported pathspec <magic> were only single lowercase-ascii
>>> alphabet tokens, so nobody would have used " as a part of magic.  So
>>> quting with double-quote pair would work.
>>
>> I was thinking about both ' and ", so that you could do:
>>
>>    $ ./args ':(attr:whitespace="indent,trail,space",icase)'
>>     1::(attr:whitespace="indent,trail,space",icase)
>>
>>    $ ./args ":(attr:whitespace='indent,trail,space',icase)"
>>     1::(attr:whitespace='indent,trail,space',icase)
>>
>>    $ p=':(attr:whitespace="indent,trail,space",icase)'
>>    $ ./args "$p"
>>     1::(attr:whitespace="indent,trail,space",icase)
>>
>>    $ p=":(attr:whitespace=\"indent,trail,space\",icase)"
>>    $ ./args "$p"
>>     1::(attr:whitespace="indent,trail,space",icase)
>>
>> but limiting it to " would probably be OK too.
>>
>>> You'd need to come up with a way to quote a double quote that
>>> happens to be a part of VAL somehow, though.
>>
>> Yes I was assuming \ quoting as well - I just want to reduce the
>> need for such quoting (especially on windows).
>>
>>>                                              I think attribute
>>> value is limited to a string with non-whitespace letters; even
>>> though the built-in attributes that have defined meaning to the Git
>>> itself may not use values with letters beyond [-a-zA-Z0-9,], end
>>> users and projects can add arbitrary values within the allowed
>>> syntax, so it is not unconceivable that some project may have a
>>> custom attribute that lists forbidden characters in a path with
>>>
>>>       === .gitattributes ===
>>>         *.txt forbidden=`"
> 
> We restrict the 'forbidden' to follow [-a-zA-Z0-9,], so we could enforce
> it for the values, too.
> 
> 
>>
>>    $ ./args ":(attr:*.txt forbidden=\'\\\",icase)"
>>     1::(attr:*.txt forbidden=\'\",icase)
> 
> You should lose the *.txt in there, but put it at the back

Ah, yes, just shows my ignorance of the attribute system!

> 
>>  $ ./args ":(attr:forbidden=\'\\\",icase)*.txt"
> 
>>
>>    $ ./args ':(attr:*.txt forbidden=\'\''\",icase)'
>>     1::(attr:*.txt forbidden=\'\",icase)
> 
> I see, so quoting by " or ' is preferred. What if the user
> wants to do a

I think Junio wants to go with just " quoting (see other thread).

>     forbidden=',"
> 
> so we have to escape those in there, such as
> 
>     ./args ':(attr:"forbidden=\',\"")'

No, that won't work (" is not terminated), try this:

   $ ./args ':(attr:"forbidden='\'',\"")'
    1::(attr:"forbidden=',\"")
   $ 

   $ ./args ":(attr:\"forbidden=',\\\"\")"
    1::(attr:"forbidden=',\"")
   $ 

[half of the problem is just getting past the shell]

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02 19:44             ` Ramsay Jones
@ 2016-06-02 19:46               ` Junio C Hamano
  2016-06-02 19:53                 ` Ramsay Jones
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2016-06-02 19:46 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Stefan Beller, Duy Nguyen, git

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

> I think Junio wants to go with just " quoting (see other thread).

No.  I meant just \ quoting.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02 19:29             ` Junio C Hamano
@ 2016-06-02 19:52               ` Ramsay Jones
  0 siblings, 0 replies; 16+ messages in thread
From: Ramsay Jones @ 2016-06-02 19:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Nguyen Thai Ngoc Duy, Git Mailing List



On 02/06/16 20:29, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> On Thu, Jun 2, 2016 at 11:42 AM, Ramsay Jones
>> <ramsay@ramsayjones.plus.com> wrote:
>>>>
>>>> That would be workable, I would think.  Before attr:VAR=VAL
>>>> extention, supported pathspec <magic> were only single lowercase-ascii
>>>> alphabet tokens, so nobody would have used " as a part of magic.  So
>>>> quting with double-quote pair would work.
>>>
>>> I was thinking about both ' and ", so that you could do:
>>
>> Yes, I understood your suggestion as such. Quoting like shells would work
>> without breaking backward compatibility for the same reason quoting with
>> double-quote and backslash only without supporting single-quotes would
>> work.
> 
> Having said that, "It would work" does not have to mean "Hence we
> must do it that way" at all.  Quoting character pairs make the
> parsing and unquoting significantly more complex.
> 
> As you said, not many people used attributes and pathspec magic, and
> I do not think those who want to use the new "further limits with
> attributes" magic, envisioned primarily to be those who want to give
> classes to submodules, have compelling reason to name their classes
> with anything but lowercase-ascii-alphabet tokens.  So for a practical
> purposes, I'd rather see Stefan
> 
>  * just implement backquote-blindly-passes-the-next-byte and nothing
>    more elaborate; and
> 
>  * forbid [^-a-z0-9,_] from being used in the value part in the
>    attr:VAR=VAL magic.
> 
> at least for now, and concentrate more on the other more important
> parts of the submodule enhancement topic.

OK, that reasonable. I didn't mean to derail Stefan's development!

ATB,
Ramsay Jones

> 
> That way, those who will start using attr:VAR=VAL magic will stick
> themselves to lowercase-ascii-alphabet tokens for now (simply
> because an attribut that has the forbidden characters in its value
> would not be usable with the magic), and we can later extend the
> magic syntax parser in a backward compatible way to allow paired
> quotes and other "more convenient" syntax.
> 
> 
> [Footnote]
> 
> *1* The reason I prefer to keep the initially allowed value
> characters narrow is because I envision that something like
> 
> 	:(attr:VAR=(<some expression we will come up with later>))
> 
> may become necessary, and do not want to promise users that open or
> close parentheses will forever be taken literally.
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC/PATCH] pathspec: allow escaped query values
  2016-06-02 19:46               ` Junio C Hamano
@ 2016-06-02 19:53                 ` Ramsay Jones
  0 siblings, 0 replies; 16+ messages in thread
From: Ramsay Jones @ 2016-06-02 19:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Duy Nguyen, git



On 02/06/16 20:46, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> 
>> I think Junio wants to go with just " quoting (see other thread).
> 
> No.  I meant just \ quoting.

Yes, sorry, I only just read your last email on the other thread.

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-06-02 19:54 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-01 23:52 [RFC/PATCH] pathspec: allow escaped query values Stefan Beller
2016-06-02  0:33 ` Junio C Hamano
2016-06-02  2:23   ` Stefan Beller
2016-06-02  0:38 ` Ramsay Jones
2016-06-02  2:20   ` Stefan Beller
2016-06-02  5:46   ` Junio C Hamano
2016-06-02 15:30     ` Ramsay Jones
2016-06-02 16:10       ` Junio C Hamano
2016-06-02 18:42         ` Ramsay Jones
2016-06-02 18:58           ` Junio C Hamano
2016-06-02 19:29             ` Junio C Hamano
2016-06-02 19:52               ` Ramsay Jones
2016-06-02 19:04           ` Stefan Beller
2016-06-02 19:44             ` Ramsay Jones
2016-06-02 19:46               ` Junio C Hamano
2016-06-02 19:53                 ` Ramsay Jones

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.