linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] sscanf: implement basic character sets
@ 2016-02-26 20:20 Jessica Yu
  2016-02-26 20:28 ` Jessica Yu
  2016-03-02 23:49 ` [PATCH v4] " Rasmus Villemoes
  0 siblings, 2 replies; 15+ messages in thread
From: Jessica Yu @ 2016-02-26 20:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel, Jessica Yu

Implement basic character sets for the '%[' conversion specifier.

The '%[' conversion specifier matches a nonempty sequence of characters
from the specified set of accepted (or with '^', rejected) characters
between the brackets. The substring matched is to be made up of characters
in (or not in) the set. This is useful for matching substrings that are
delimited by something other than spaces.

This implementation differs from its glibc counterpart in the following ways:
(1) No support for character ranges (e.g., 'a-z' or '0-9')
(2) The hyphen '-' is not a special character
(3) The closing bracket ']' cannot be matched
(4) No support (yet) for discarding matching input ('%*[')

Signed-off-by: Jessica Yu <jeyu@redhat.com>
---
Patch based on linux-next-20160226.

v4:
 - To avoid allocations (i.e. kstrndup), use a bitmap to represent
   the character set (suggested by Rasmus Villemoes)
 - Add a comment to document non-glibc behavior and provide example usage
 - Check for '[' in the '*' (discard) case. Since it is not supported
   yet, it is considered malformed input

v3:
 - Fix memory leak in error path (kfree() before returning)
 - Remove redundant condition in while loop
 - Style fix (*op)() -> op()

v2:
 - Use kstrndup() to copy the character set from fmt instead of using a
   statically allocated array

 lib/vsprintf.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 525c8e1..9a3b860 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2640,8 +2640,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
 		if (*fmt == '*') {
 			if (!*str)
 				break;
-			while (!isspace(*fmt) && *fmt != '%' && *fmt)
+			while (!isspace(*fmt) && *fmt != '%' && *fmt) {
+				/* '%*[' not yet supported, invalid format */
+				if (*fmt == '[')
+					return num;
 				fmt++;
+			}
 			while (!isspace(*str) && *str)
 				str++;
 			continue;
@@ -2714,6 +2718,57 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
 			num++;
 		}
 		continue;
+		/*
+		 * Warning: This implementation of the '[' conversion specifier
+		 * deviates from its glibc counterpart in the following ways:
+		 * (1) It does NOT support ranges i.e. '-' is NOT a special character
+		 * (2) It cannot match the closing bracket ']' itself
+		 * (3) A field width is required
+		 * (4) '%*[' (discard matching input) is currently not supported
+		 *
+		 * Example usage:
+		 * ret = sscanf("00:0a:95","%2[^:]:%2[^:]:%2[^:]", buf1, buf2, buf3);
+		 * if (ret < 3)
+		 *    // etc..
+		 */
+		case '[':
+		{
+			char *s = (char *)va_arg(args, char *);
+			DECLARE_BITMAP(set, 256) = {0};
+			unsigned int len = 0;
+			bool negate = (*fmt == '^');
+
+			/* field width is required */
+			if (field_width == -1)
+				return num;
+
+			if (negate)
+				++fmt;
+
+			for ( ; *fmt && *fmt != ']'; ++fmt, ++len)
+				set_bit((u8)*fmt, set);
+
+			/* no ']' or no character set found */
+			if (!*fmt || !len)
+				return num;
+			++fmt;
+
+			if (negate) {
+				bitmap_complement(set, set, 256);
+				/* exclude null '\0' byte */
+				clear_bit(0, set);
+			}
+
+			/* match must be non-empty */
+			if (!test_bit((u8)*str, set))
+				return num;
+
+			while (test_bit((u8)*str, set) && field_width--)
+				*s++ = *str++;
+			*s = '\0';
+			++num;
+		}
+		continue;
 		case 'o':
 			base = 8;
 			break;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-02-26 20:20 [PATCH v4] sscanf: implement basic character sets Jessica Yu
@ 2016-02-26 20:28 ` Jessica Yu
  2016-03-07 23:12   ` Jessica Yu
  2016-03-02 23:49 ` [PATCH v4] " Rasmus Villemoes
  1 sibling, 1 reply; 15+ messages in thread
From: Jessica Yu @ 2016-02-26 20:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

+++ Jessica Yu [26/02/16 15:20 -0500]:
>Implement basic character sets for the '%[' conversion specifier.
>
>The '%[' conversion specifier matches a nonempty sequence of characters
>from the specified set of accepted (or with '^', rejected) characters
>between the brackets. The substring matched is to be made up of characters
>in (or not in) the set. This is useful for matching substrings that are
>delimited by something other than spaces.
>
>This implementation differs from its glibc counterpart in the following ways:
>(1) No support for character ranges (e.g., 'a-z' or '0-9')
>(2) The hyphen '-' is not a special character
>(3) The closing bracket ']' cannot be matched
>(4) No support (yet) for discarding matching input ('%*[')
>
>Signed-off-by: Jessica Yu <jeyu@redhat.com>

Since this version is largely based on Rasmus' sample bitmap code
(with only very minor tweaks), what is the best way to provide
attribution in this case? A Suggested-by: tag or another
Signed-off-by: tag (since actual code is involved)?

Thanks,
Jessica

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] sscanf: implement basic character sets
  2016-02-26 20:20 [PATCH v4] sscanf: implement basic character sets Jessica Yu
  2016-02-26 20:28 ` Jessica Yu
@ 2016-03-02 23:49 ` Rasmus Villemoes
  2016-03-07 23:09   ` Jessica Yu
  1 sibling, 1 reply; 15+ messages in thread
From: Rasmus Villemoes @ 2016-03-02 23:49 UTC (permalink / raw)
  To: Jessica Yu; +Cc: Andrew Morton, Andy Shevchenko, Kees Cook, linux-kernel

On Fri, Feb 26 2016, Jessica Yu <jeyu@redhat.com> wrote:

> @@ -2714,6 +2718,57 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>  			num++;
>  		}
>  		continue;
> +		/*
> +		 * Warning: This implementation of the '[' conversion specifier
> +		 * deviates from its glibc counterpart in the following ways:
> +		 * (1) It does NOT support ranges i.e. '-' is NOT a special character
> +		 * (2) It cannot match the closing bracket ']' itself
> +		 * (3) A field width is required
> +		 * (4) '%*[' (discard matching input) is currently not supported
> +		 *
> +		 * Example usage:
> +		 * ret = sscanf("00:0a:95","%2[^:]:%2[^:]:%2[^:]", buf1, buf2, buf3);
> +		 * if (ret < 3)
> +		 *    // etc..
> +		 */
> +		case '[':
> +		{
> +			char *s = (char *)va_arg(args, char *);
> +			DECLARE_BITMAP(set, 256) = {0};
> +			unsigned int len = 0;
> +			bool negate = (*fmt == '^');
> +
> +			/* field width is required */
> +			if (field_width == -1)
> +				return num;
> +
> +			if (negate)
> +				++fmt;
> +
> +			for ( ; *fmt && *fmt != ']'; ++fmt, ++len)
> +				set_bit((u8)*fmt, set);
> +
> +			/* no ']' or no character set found */
> +			if (!*fmt || !len)
> +				return num;
> +			++fmt;
> +

I think it might be useful to be able to do [^] to match any sequence of
characters. If the user passed [] the code below won't match anything,
so we'll return num anyway. In other words, I'd just omit the test for
empty character set. Other than that, LGTM.

Rasmus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-03-02 23:49 ` [PATCH v4] " Rasmus Villemoes
@ 2016-03-07 23:09   ` Jessica Yu
  0 siblings, 0 replies; 15+ messages in thread
From: Jessica Yu @ 2016-03-07 23:09 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Andrew Morton, Andy Shevchenko, Kees Cook, linux-kernel

+++ Rasmus Villemoes [03/03/16 00:49 +0100]:
>On Fri, Feb 26 2016, Jessica Yu <jeyu@redhat.com> wrote:
>
>> @@ -2714,6 +2718,57 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>>  			num++;
>>  		}
>>  		continue;
>> +		/*
>> +		 * Warning: This implementation of the '[' conversion specifier
>> +		 * deviates from its glibc counterpart in the following ways:
>> +		 * (1) It does NOT support ranges i.e. '-' is NOT a special character
>> +		 * (2) It cannot match the closing bracket ']' itself
>> +		 * (3) A field width is required
>> +		 * (4) '%*[' (discard matching input) is currently not supported
>> +		 *
>> +		 * Example usage:
>> +		 * ret = sscanf("00:0a:95","%2[^:]:%2[^:]:%2[^:]", buf1, buf2, buf3);
>> +		 * if (ret < 3)
>> +		 *    // etc..
>> +		 */
>> +		case '[':
>> +		{
>> +			char *s = (char *)va_arg(args, char *);
>> +			DECLARE_BITMAP(set, 256) = {0};
>> +			unsigned int len = 0;
>> +			bool negate = (*fmt == '^');
>> +
>> +			/* field width is required */
>> +			if (field_width == -1)
>> +				return num;
>> +
>> +			if (negate)
>> +				++fmt;
>> +
>> +			for ( ; *fmt && *fmt != ']'; ++fmt, ++len)
>> +				set_bit((u8)*fmt, set);
>> +
>> +			/* no ']' or no character set found */
>> +			if (!*fmt || !len)
>> +				return num;
>> +			++fmt;
>> +
>
>I think it might be useful to be able to do [^] to match any sequence of
>characters. If the user passed [] the code below won't match anything,
>so we'll return num anyway. In other words, I'd just omit the test for
>empty character set. Other than that, LGTM.

Thanks for the review. My only concern would be that that behavior
(i.e., have [^] match any sequence of characters) would also deviate
from glibc sccanf behavior (which matches nothing), and would need to
be documented as well. Perhaps we should best keep these differences
to a minimum, so as to prevent unexpected surprises.

Jessica

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-02-26 20:28 ` Jessica Yu
@ 2016-03-07 23:12   ` Jessica Yu
  2016-03-07 23:24     ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Jessica Yu @ 2016-03-07 23:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

+++ Jessica Yu [26/02/16 15:28 -0500]:
>+++ Jessica Yu [26/02/16 15:20 -0500]:
>>Implement basic character sets for the '%[' conversion specifier.
>>
>>The '%[' conversion specifier matches a nonempty sequence of characters
>>from the specified set of accepted (or with '^', rejected) characters
>>between the brackets. The substring matched is to be made up of characters
>>in (or not in) the set. This is useful for matching substrings that are
>>delimited by something other than spaces.
>>
>>This implementation differs from its glibc counterpart in the following ways:
>>(1) No support for character ranges (e.g., 'a-z' or '0-9')
>>(2) The hyphen '-' is not a special character
>>(3) The closing bracket ']' cannot be matched
>>(4) No support (yet) for discarding matching input ('%*[')
>>
>>Signed-off-by: Jessica Yu <jeyu@redhat.com>
>
>Since this version is largely based on Rasmus' sample bitmap code
>(with only very minor tweaks), what is the best way to provide
>attribution in this case? A Suggested-by: tag or another
>Signed-off-by: tag (since actual code is involved)?

Andrew, friendly ping on this patch and question? :-)

Thanks,
Jessica

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-03-07 23:12   ` Jessica Yu
@ 2016-03-07 23:24     ` Andrew Morton
  2016-03-07 23:32       ` Rasmus Villemoes
  2016-03-08  1:07       ` Jessica Yu
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2016-03-07 23:24 UTC (permalink / raw)
  To: Jessica Yu; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

On Mon, 7 Mar 2016 18:12:20 -0500 Jessica Yu <jeyu@redhat.com> wrote:

> +++ Jessica Yu [26/02/16 15:28 -0500]:
> >+++ Jessica Yu [26/02/16 15:20 -0500]:
> >>Implement basic character sets for the '%[' conversion specifier.
> >>
> >>The '%[' conversion specifier matches a nonempty sequence of characters
> >>from the specified set of accepted (or with '^', rejected) characters
> >>between the brackets. The substring matched is to be made up of characters
> >>in (or not in) the set. This is useful for matching substrings that are
> >>delimited by something other than spaces.
> >>
> >>This implementation differs from its glibc counterpart in the following ways:
> >>(1) No support for character ranges (e.g., 'a-z' or '0-9')
> >>(2) The hyphen '-' is not a special character
> >>(3) The closing bracket ']' cannot be matched
> >>(4) No support (yet) for discarding matching input ('%*[')
> >>
> >>Signed-off-by: Jessica Yu <jeyu@redhat.com>
> >
> >Since this version is largely based on Rasmus' sample bitmap code
> >(with only very minor tweaks), what is the best way to provide
> >attribution in this case? A Suggested-by: tag or another
> >Signed-off-by: tag (since actual code is involved)?
> 
> Andrew, friendly ping on this patch and question? :-)

Rasmus's Signed-off-by: would be most appropriate, please.

I've queued the patch for some testing, however the changelog which
used to have IMO-inadequate justification now has no justification at
all!

So please send along a paragraph or two which we can put in there to
explain to people why we believe this change should be made to the
kernel.  Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-03-07 23:24     ` Andrew Morton
@ 2016-03-07 23:32       ` Rasmus Villemoes
  2016-03-08  1:07       ` Jessica Yu
  1 sibling, 0 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2016-03-07 23:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jessica Yu, Andy Shevchenko, Kees Cook, linux-kernel

On Tue, Mar 08 2016, Andrew Morton <akpm@linux-foundation.org> wrote:

>> >
>> >Since this version is largely based on Rasmus' sample bitmap code
>> >(with only very minor tweaks), what is the best way to provide
>> >attribution in this case? A Suggested-by: tag or another
>> >Signed-off-by: tag (since actual code is involved)?
>> 
>> Andrew, friendly ping on this patch and question? :-)
>
> Rasmus's Signed-off-by: would be most appropriate, please.
>

Sure,

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-03-07 23:24     ` Andrew Morton
  2016-03-07 23:32       ` Rasmus Villemoes
@ 2016-03-08  1:07       ` Jessica Yu
  1 sibling, 0 replies; 15+ messages in thread
From: Jessica Yu @ 2016-03-08  1:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

+++ Andrew Morton [07/03/16 15:24 -0800]:
>On Mon, 7 Mar 2016 18:12:20 -0500 Jessica Yu <jeyu@redhat.com> wrote:
>
>> +++ Jessica Yu [26/02/16 15:28 -0500]:
>> >+++ Jessica Yu [26/02/16 15:20 -0500]:
>> >>Implement basic character sets for the '%[' conversion specifier.
>> >>
>> >>The '%[' conversion specifier matches a nonempty sequence of characters
>> >>from the specified set of accepted (or with '^', rejected) characters
>> >>between the brackets. The substring matched is to be made up of characters
>> >>in (or not in) the set. This is useful for matching substrings that are
>> >>delimited by something other than spaces.
>> >>
>> >>This implementation differs from its glibc counterpart in the following ways:
>> >>(1) No support for character ranges (e.g., 'a-z' or '0-9')
>> >>(2) The hyphen '-' is not a special character
>> >>(3) The closing bracket ']' cannot be matched
>> >>(4) No support (yet) for discarding matching input ('%*[')
>> >>
>> >>Signed-off-by: Jessica Yu <jeyu@redhat.com>
>> >
>> >Since this version is largely based on Rasmus' sample bitmap code
>> >(with only very minor tweaks), what is the best way to provide
>> >attribution in this case? A Suggested-by: tag or another
>> >Signed-off-by: tag (since actual code is involved)?
>>
>> Andrew, friendly ping on this patch and question? :-)
>
>Rasmus's Signed-off-by: would be most appropriate, please.
>
>I've queued the patch for some testing, however the changelog which
>used to have IMO-inadequate justification now has no justification at
>all!
>
>So please send along a paragraph or two which we can put in there to
>explain to people why we believe this change should be made to the
>kernel.  Thanks.

Andrew, I've included a more detailed explanation of the motivation
behind the patch below. Could you please append it to the end of the
original changelog? Thanks!
---

The motivation for adding character set support to sscanf originally
stemmed from the kernel livepatching project. An ongoing patchset
utilizes new livepatch Elf symbol and section names to store important
metadata livepatch needs to properly apply its patches. Such metadata
is stored in these section and symbol names as substrings delimited by
periods '.' and commas ','. For example, a livepatch symbol name might
look like this:

.klp.sym.vmlinux.printk,0

However, sscanf currently can only extract "substrings" delimited by
whitespace using the "%s" specifier. Thus for the above symbol name,
one cannot not use sscanf() to extract substrings "vmlinux" or "printk",
for example. A number of discussions on the livepatch mailing list
dealing with string parsing code for extracting these '.' and ','
delimited substrings eventually led to the conclusion that such code would
be completely unnecessary if the kernel sscanf() supported character sets.
Thus only a single sscanf() call would be necessary to extract these
substrings. In addition, such an addition to sscanf() could benefit other
areas of the kernel that might have a similar need in the future.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-04-05 14:32 [PATCH v4] " Shahbaz Youssefi
@ 2016-04-05 17:25 ` Jessica Yu
  0 siblings, 0 replies; 15+ messages in thread
From: Jessica Yu @ 2016-04-05 17:25 UTC (permalink / raw)
  To: Shahbaz Youssefi; +Cc: LKML

+++ Shahbaz Youssefi [05/04/16 10:32 -0400]:
>Note: CC me
>
>I just read on lwn about the implementation of `%[` in sscanf. Just
>wanted to point out my implementation (which supports `]` ranges (`-`)
>and negation (`^`)) and let you know that you are free to take
>code/inspiration from it:
>
>https://github.com/ShabbyX/kio/blob/master/src/scanf.c#L398

Thanks Shahbaz, this is helpful. You are of course welcome to
submit a patch adding ranges and other missing features to the
kernel sscanf() as well, if that's something you're interested in
doing.

Jessica

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-02-23 22:47 ` [PATCH v3] " Rasmus Villemoes
@ 2016-02-24  5:39   ` Jessica Yu
  0 siblings, 0 replies; 15+ messages in thread
From: Jessica Yu @ 2016-02-24  5:39 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Andrew Morton, Andy Shevchenko, Kees Cook, linux-kernel

+++ Rasmus Villemoes [23/02/16 23:47 +0100]:
>On Tue, Feb 23 2016, Jessica Yu <jeyu@redhat.com> wrote:
>
>> Implement basic character sets for the '%[]' conversion specifier.
>>
>>
>>  lib/vsprintf.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 41 insertions(+)
>>
>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>> index 525c8e1..983358a 100644
>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>>  			num++;
>>  		}
>>  		continue;
>> +		case '[':
>> +		{
>> +			char *s = (char *)va_arg(args, char *);
>> +			char *set;
>> +			size_t (*op)(const char *str, const char *set);
>> +			size_t len = 0;
>> +			bool negate = (*(fmt) == '^');
>> +
>> +			if (field_width == -1)
>> +				field_width = SHRT_MAX;
>> +
>> +			op = negate ? &strcspn : &strspn;
>> +			if (negate)
>> +				fmt++;
>> +
>> +			len = strcspn(fmt, "]");
>> +			/* invalid format; stop here */
>> +			if (!len)
>> +				return num;
>> +
>> +			set = kstrndup(fmt, len, GFP_KERNEL);
>> +			if (!set)
>> +				return num;
>> +
>> +			/* advance fmt past ']' */
>> +			fmt += len + 1;
>> +
>> +			len = op(str, set);
>> +			/* no matches */
>> +			if (!len) {
>> +				kfree(set);
>> +				return num;
>> +			}
>> +
>> +			while (len-- && field_width--)
>> +				*s++ = *str++;
>> +			*s = '\0';
>> +			kfree(set);
>> +			num++;
>> +		}
>> +		continue;
>>  		case 'o':
>>  			base = 8;
>>  			break;
>
>(1) How do we know that doing a memory allocation would be ok, and then
>with GFP_KERNEL? vsnprintf can be called from just about any context, so
>I don't think that would fly there. Sooner or later someone is going to
>be calling sscanf with a spinlock held, methinks.
>
>(2) I think a field width should be mandatory (so %[ should simply be
>regarded as malformed - it should be %*[ or %n[ for some explicit
>decimal n). That will allow the compiler or other static analyzers to do
>sanity checking, and we'll probably be saved from a few buffer
>overflows down the line.
>
>It's a bit sad that the C standard doesn't include the terminating '\0'
>in the field width, so one would sometimes have to write
>'(int)sizeof(buf)-1', but there's not much to do about that. On that
>note, it seems that your field width handling is off-by-one.
>
>To get rid of the allocation, why not use a small bitmap? Something like
>
>{
>  char *s = (char *)va_arg(args, char *);
>  DECLARE_BITMAP(map, 256) = {0};
>  bool negate = false;
>
>  /* a field width is required, and must provide room for at least a '\0' */
>  if (field_width <= 0)
>    return num;
>
>  if (*fmt == '^') {
>    negate = true;
>    ++fmt;
>  }
>  for ( ; *fmt && *fmt != ']'; ++fmt)
>    set_bit((u8)*fmt, map);
>  if (!*fmt) // no ], so malformed input
>    return num;
>  ++fmt;
>  if (negate) {
>    bitmap_complement(map, map, 256);
>    clear_bit(0, map); // this avoids testing *str != '\0' below
>  }
>
>  if (!test_bit((u8)*str, map)) // match must be non-empty
>    return num;
>  while (test_bit((u8)*str, map) && --field_width) {
>    *s++ = *str++;
>  }
>  *s = '\0';
>  ++num;
>}

I quite like this idea, as it avoids allocations and doesn't need
strcspn/strspn. What do other people think?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-02-24  5:13   ` Jessica Yu
@ 2016-02-24  5:28     ` Andrew Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2016-02-24  5:28 UTC (permalink / raw)
  To: Jessica Yu; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

On Wed, 24 Feb 2016 00:13:47 -0500 Jessica Yu <jeyu@redhat.com> wrote:

> >> This patch adds support for the '%[' conversion specifier for sscanf().
> >> This is useful in cases where we'd like to match substrings delimited by
> >> something other than spaces. The original motivation for this patch
> >> actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
> >> where we were trying to come up with a clean way to parse symbol names with
> >> substrings delimited by periods and commas.
> >
> > It would be better to include the justification right here in the
> > changelog please. 
> > Not via some link-to-discussion and definitely not
> > below the ^--- marker!  It's very important.
> 
> Thanks for the corrections Andrew. I am however slightly confused, are
> you suggesting that I should provide a much more thorough explanation
> about the motivation here in the changelog (below the ^--- marker), or
> would this be better suited for a (separate) cover letter?

Just in the plain old changelog is good - if it was in [0/n] I'd only
move it into the changelog anyway.

And 99.9% of the stuff people put below ^--- is useful so I always end
up moving that into the changelog as well...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-02-23 22:05 ` Andrew Morton
@ 2016-02-24  5:13   ` Jessica Yu
  2016-02-24  5:28     ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Jessica Yu @ 2016-02-24  5:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

+++ Andrew Morton [23/02/16 14:05 -0800]:
>On Tue, 23 Feb 2016 15:38:22 -0500 Jessica Yu <jeyu@redhat.com> wrote:
>
>> Implement basic character sets for the '%[]' conversion specifier.
>>
>> The '%[]' conversion specifier matches a nonempty sequence of characters
>> from the specified set of accepted (or with '^', rejected) characters
>> between the brackets. The substring matched is to be made up of characters
>> in (or not in) the set. This implementation differs from its glibc
>> counterpart in that it does not support character ranges (e.g., 'a-z' or
>> '0-9'), the hyphen '-' is *not* a special character, and the brackets
>> themselves cannot be matched.
>>
>> Signed-off-by: Jessica Yu <jeyu@redhat.com>
>> ---
>>
>> This patch adds support for the '%[' conversion specifier for sscanf().
>> This is useful in cases where we'd like to match substrings delimited by
>> something other than spaces. The original motivation for this patch
>> actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
>> where we were trying to come up with a clean way to parse symbol names with
>> substrings delimited by periods and commas.
>
> It would be better to include the justification right here in the
> changelog please. 
> Not via some link-to-discussion and definitely not
> below the ^--- marker!  It's very important.

Thanks for the corrections Andrew. I am however slightly confused, are
you suggesting that I should provide a much more thorough explanation
about the motivation here in the changelog (below the ^--- marker), or
would this be better suited for a (separate) cover letter?

>The deviation from the glibc behaviour is a bit of a worry,
>particularly as it is done in a non-back-compat manner: code which
>assumes "-" is non-magic might break if someone later adds range
>support.
>
>Presumably we can live with that - there won't be many callsites and
>they can be grepped for.  But please, let's get a description of all
>these considerations into the code as a comment.  Probably it would be
>helpful to include a little usage example in that comment.

Hm, that is a very good point. At the moment we can be sure there
aren't any users of sscanf() using the %[ conversion specifier, as it
doesn't exist yet :-) But yes, this behavior should be documented
clearly in a comment, so future users will be aware..

>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>>  			num++;
>>  		}
>>  		continue;
>> +		case '[':
>> +		{
>> +			char *s = (char *)va_arg(args, char *);
>> +			char *set;
>> +			size_t (*op)(const char *str, const char *set);
>> +			size_t len = 0;
>> +			bool negate = (*(fmt) == '^');
>> +
>> +			if (field_width == -1)
>> +				field_width = SHRT_MAX;
>> +
>> +			op = negate ? &strcspn : &strspn;
>> +			if (negate)
>> +				fmt++;
>> +
>> +			len = strcspn(fmt, "]");
>> +			/* invalid format; stop here */
>> +			if (!len)
>> +				return num;
>> +
>> +			set = kstrndup(fmt, len, GFP_KERNEL);
>
>Embedding a GFP_KERNEL allocation into vsscanf is problematic - it
>limits the situations in which this functionality can be used.
>
>afaict the allocation is there merely so we can null-terminate the
>string so we can use existing library functions (strcspn, strspn).  Is
>that compromise really worth it?  We could pretty easily convert
>strcspn() into
>
>	strcnspn(const char *s, const char *reject, size_t len)
>
>and convert strcspn() to call that (ifndef __HAVE_ARCH_STRCSPN)
>
>In fact I think we could still use strspn() and strcspn() on `fmt'
>directly?  We just need to check for the return value exceeding `len'
>and if so, treat that as a no-match?
>

Perhaps we can use Rasmus' bitmap solution, as it avoids the
allocation altogether and it doesn't need to use strspn()/strcspn().

>> +			if (!set)
>> +				return num;
>> +
>> +			/* advance fmt past ']' */
>> +			fmt += len + 1;
>> +
>> +			len = op(str, set);
>> +			/* no matches */
>> +			if (!len) {
>> +				kfree(set);
>> +				return num;
>> +			}
>> +
>> +			while (len-- && field_width--)
>> +				*s++ = *str++;
>> +			*s = '\0';
>> +			kfree(set);
>> +			num++;
>> +		}
>> +		continue;
>>  		case 'o':
>>  			base = 8;
>>  			break;
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-02-23 19:00   ` Kees Cook
@ 2016-02-23 19:40     ` Jessica Yu
  0 siblings, 0 replies; 15+ messages in thread
From: Jessica Yu @ 2016-02-23 19:40 UTC (permalink / raw)
  To: Kees Cook; +Cc: Andy Shevchenko, Andrew Morton, Rasmus Villemoes, LKML

+++ Kees Cook [23/02/16 11:00 -0800]:
>On Tue, Feb 23, 2016 at 2:56 AM, Andy Shevchenko
><andriy.shevchenko@linux.intel.com> wrote:
>> On Mon, 2016-02-22 at 16:24 -0500, Jessica Yu wrote:
>>> Implement basic character sets for the '%[]' conversion specifier.
>
>What part of the kernel will be using this feature, by the way?
>

I explained the motivation a bit more in patch v1's cover letter:
https://lkml.kernel.org/g/1455931259-27117-1-git-send-email-jeyu@redhat.com

The original idea stemmed from a discussion from the kernel livepatch mailing list:
https://lkml.org/lkml/2016/2/8/790

We were looking for a way to parse out substrings delimited by
something other than spaces. Specifically, in livepatch we are parsing
symbol names that contain substrings (which contain livepatch-specific
information) delimited by '.' and ','. Instead of manually looking for
these delimiters and adding a lot of string code to livepatch, it
would be cleaner to have a single sscanf() call to do the parsing for us.

Jessica

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-02-23 10:56 ` Andy Shevchenko
  2016-02-23 19:00   ` Kees Cook
@ 2016-02-23 19:26   ` Jessica Yu
  1 sibling, 0 replies; 15+ messages in thread
From: Jessica Yu @ 2016-02-23 19:26 UTC (permalink / raw)
  To: Andy Shevchenko; +Cc: Andrew Morton, Rasmus Villemoes, Kees Cook, linux-kernel

+++ Andy Shevchenko [23/02/16 12:56 +0200]:
>On Mon, 2016-02-22 at 16:24 -0500, Jessica Yu wrote:
>> Implement basic character sets for the '%[]' conversion specifier.
>>
>> The '%[]' conversion specifier matches a nonempty sequence of
>> characters
>> from the specified set of accepted (or with '^', rejected) characters
>> between the brackets. The substring matched is to be made up of
>> characters
>> in (or not in) the set. This implementation differs from its glibc
>> counterpart in that it does not support character ranges (e.g., 'a-z'
>> or
>> '0-9'), the hyphen '-' is *not* a special character, and the brackets
>> themselves cannot be matched.
>>
>> Signed-off-by: Jessica Yu <jeyu@redhat.com>
>> ---
>> Patch based on linux-next-20160222.
>>
>> v2:
>>  - Use kstrndup() to copy the character set from fmt instead of using
>> a
>>    statically allocated array
>>  
>>  lib/vsprintf.c | 39 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 39 insertions(+)
>>
>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>> index 525c8e1..93a6f52 100644
>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -2714,6 +2714,45 @@ int vsscanf(const char *buf, const char *fmt,
>> va_list args)
>>  			num++;
>>  		}
>>  		continue;
>> +		case '[':
>> +		{
>> +			char *s = (char *)va_arg(args, char *);
>> +			char *set;
>> +			size_t (*op)(const char *str, const char
>> *set);
>> +			size_t len = 0;
>> +			bool negate = (*(fmt) == '^');
>> +
>> +			if (field_width == -1)
>> +				field_width = SHRT_MAX;
>
>I'm not sure if it's needed here. It will count down till 0 in any
>case.

I think it might be good to be consistent with the '%s' specifier code
and have some sort of upper bound set, even if it is much more likely
that len will get to 0 before field_width does.

>> +
>> +			op = negate ? &strcspn : &strspn;
>> +			if (negate)
>> +				fmt++;
>
>> +
>> +			len = strcspn(fmt, "]");
>> +			/* invalid format; stop here */
>> +			if (!len)
>> +				return num;
>> +
>> +			set = kstrndup(fmt, len, GFP_KERNEL);
>> +			if (!set)
>> +				return num;
>> +
>> +			/* advance fmt past ']' */
>> +			fmt += len + 1;
>> +
>> +			len = (*op)(str, set);
>
>Can we use just normal form:
> op();
>?
>
>> +			/* no matches */
>> +			if (!len)
>
>Memory leak here.
>
>> +				return num;
>> +
>> +			while (*str && len-- && field_width--)
>> +				*s++ = *str++;
>
>Looks like strcpy() variant. First of all, is it possible to have *str
>== '\0' when len != 0?

Good point. The *str check is redundant, since after the call to
strspn/strcspn we know there are at least len bytes in str, so that
check can be removed.

>> +			*s = '\0';
>> +			kfree(set);
>> +			num++;
>> +		}
>> +		continue;
>>  		case 'o':
>>  			base = 8;
>>  			break;
>
>-- 
>Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>Intel Finland Oy
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: sscanf: implement basic character sets
  2016-02-22 10:13 ` Andy Shevchenko
@ 2016-02-22 17:51   ` Jessica Yu
  0 siblings, 0 replies; 15+ messages in thread
From: Jessica Yu @ 2016-02-22 17:51 UTC (permalink / raw)
  To: Andy Shevchenko; +Cc: Andrew Morton, Rasmus Villemoes, Kees Cook, linux-kernel

+++ Andy Shevchenko [22/02/16 12:13 +0200]:
>On Fri, 2016-02-19 at 20:22 -0500, Jessica Yu wrote:
>> Implement basic character sets for the '%[]' conversion specifier.
>>
>> The '%[]' conversion specifier matches a nonempty sequence of
>> characters
>> from the specified set of accepted (or with '^', rejected) characters
>> between the brackets. The substring matched is to be made up of
>> characters
>> in (or not in) the set. This implementation differs from its glibc
>> counterpart in that it does not support character ranges (e.g., 'a-z'
>> or
>> '0-9'), the hyphen '-' is *not* a special character, and the brackets
>> themselves cannot be matched.
>>
>> Signed-off-by: Jessica Yu <jeyu@redhat.com>
>> ---
>>  lib/vsprintf.c | 35 +++++++++++++++++++++++++++++++++++
>>  1 file changed, 35 insertions(+)
>>
>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>> index 525c8e1..6ee3e7f 100644
>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -2714,6 +2714,41 @@ int vsscanf(const char *buf, const char *fmt,
>> va_list args)
>>  			num++;
>>  		}
>>  		continue;
>> +		case '[':
>> +		{
>> +			char *s = (char *)va_arg(args, char *);
>> +			char set[U8_MAX] = { 0 };
>
>Hmm... 255 on stack, not the best idea.
>
>> +			size_t (*op)(const char *str, const char
>> *set);
>> +			size_t len = 0;
>> +			bool negate = (*(fmt) == '^');
>> +
>> +			if (field_width == -1)
>> +				field_width = SHRT_MAX;
>> +
>> +			op = negate ? &strcspn : &strspn;
>> +			if (negate)
>> +				fmt++;
>> +
>> +			len = strcspn(fmt, "]");
>> +			/* invalid format; stop here */
>> +			if (!len)
>> +				return num;
>> +
>> +			strncpy(set, fmt, len);
>
>Perhaps here you may allocate memory on heap and copy the given set.
>IIRC kstrndup() does this.

Thanks for the comments Andy. I did in fact use kstrndup() originally,
but I was not sure about error handling. i.e., if kstrndup() fails we
normally return -ENOMEM, but in this case I suppose sscanf() could
just fail and return num?

Thanks,
Jessica

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-04-05 17:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-26 20:20 [PATCH v4] sscanf: implement basic character sets Jessica Yu
2016-02-26 20:28 ` Jessica Yu
2016-03-07 23:12   ` Jessica Yu
2016-03-07 23:24     ` Andrew Morton
2016-03-07 23:32       ` Rasmus Villemoes
2016-03-08  1:07       ` Jessica Yu
2016-03-02 23:49 ` [PATCH v4] " Rasmus Villemoes
2016-03-07 23:09   ` Jessica Yu
  -- strict thread matches above, loose matches on Subject: below --
2016-04-05 14:32 [PATCH v4] " Shahbaz Youssefi
2016-04-05 17:25 ` Jessica Yu
2016-02-23 20:38 [PATCH v3] " Jessica Yu
2016-02-23 22:05 ` Andrew Morton
2016-02-24  5:13   ` Jessica Yu
2016-02-24  5:28     ` Andrew Morton
2016-02-23 22:47 ` [PATCH v3] " Rasmus Villemoes
2016-02-24  5:39   ` Jessica Yu
2016-02-22 21:24 [PATCH v2] " Jessica Yu
2016-02-23 10:56 ` Andy Shevchenko
2016-02-23 19:00   ` Kees Cook
2016-02-23 19:40     ` Jessica Yu
2016-02-23 19:26   ` Jessica Yu
2016-02-20  1:22 [PATCH 1/1] " Jessica Yu
2016-02-22 10:13 ` Andy Shevchenko
2016-02-22 17:51   ` Jessica Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).