linux-kernel.vger.kernel.org archive mirror
* [PATCH v4] sscanf: implement basic character sets
@ 2016-02-26 20:20 Jessica Yu
  2016-02-26 20:28 ` Jessica Yu
  2016-03-02 23:49 ` [PATCH v4] " Rasmus Villemoes
  0 siblings, 2 replies; 9+ messages in thread
From: Jessica Yu @ 2016-02-26 20:20 UTC
  To: Andrew Morton
  Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel, Jessica Yu

Implement basic character sets for the '%[' conversion specifier.

The '%[' conversion specifier matches a nonempty sequence of characters
from the specified set of accepted (or with '^', rejected) characters
between the brackets. The substring matched is to be made up of characters
in (or not in) the set. This is useful for matching substrings that are
delimited by something other than spaces.
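
For readers less familiar with '%[', here is a minimal userspace sketch of
the semantics described above. It is built against a standard C library
sscanf() rather than the kernel one. The helper names split_pair() and
leading_ab() and the 16-byte buffers are illustrative assumptions. Explicit
field widths are used throughout, since the kernel version requires them.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Illustrative helpers (hypothetical names). Each output buffer is
 * assumed to hold at least 16 bytes: 15 characters plus the NUL.
 */

/* Split "key,value" at the first comma using a rejected ('^') set. */
static int split_pair(const char *in, char key[16], char val[16])
{
	assert(in && key && val);
	/* %15[^,] : 1..15 chars that are not ','; then a literal ','. */
	return sscanf(in, "%15[^,],%15s", key, val);
}

/* Grab a leading run of 'a'/'b' characters using an accepted set. */
static int leading_ab(const char *in, char run[16])
{
	assert(in && run);
	/* %15[ab] : a nonempty run of 'a' or 'b', at most 15 chars. */
	return sscanf(in, "%15[ab]", run);
}
```

Because the match must be nonempty, leading_ab("xab", ...) stores nothing
and sscanf() reports 0 conversions.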

This implementation differs from its glibc counterpart in the following ways:
(1) No support for character ranges (e.g., 'a-z' or '0-9')
(2) The hyphen '-' is not a special character
(3) The closing bracket ']' cannot be matched
(4) No support (yet) for discarding matching input ('%*[')
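
The following is a small userspace sketch of the behaviour that items
(1)-(4) leave out. It assumes a glibc (or similarly behaving) sscanf().
Ranges in particular are implementation-defined in ISO C. The function
names are hypothetical. These format strings would behave differently, or
be rejected, with the kernel sscanf() as patched here.

```c
#include <assert.h>
#include <stdio.h>

/* Output buffers are assumed to be at least 16 bytes long. */

/* (1)/(2): "a-z" is a range in glibc; the kernel version would treat
 * it as the three literal characters 'a', '-' and 'z'. */
static int lower_run(const char *in, char out[16])
{
	assert(in && out);
	return sscanf(in, "%15[a-z]", out);
}

/* (3): a ']' placed first in the set is a literal member (ISO C);
 * in the kernel version the first ']' always closes the set. */
static int bracket_run(const char *in, char out[16])
{
	assert(in && out);
	return sscanf(in, "%15[]x]", out);
}

/* (4): '%*[' consumes matching input without storing or counting it. */
static int skip_digits_then_word(const char *in, char out[16])
{
	assert(in && out);
	return sscanf(in, "%*[0-9]%15s", out);
}
```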

Signed-off-by: Jessica Yu <jeyu@redhat.com>
---
Patch based on linux-next-20160226.

v4:
 - To avoid allocations (i.e. kstrndup), use a bitmap to represent
   the character set (suggested by Rasmus Villemoes)
 - Add a comment to document non-glibc behavior and provide example usage
 - Check for '[' in the '*' (discard) case. Since it is not supported
   yet, it is considered malformed input

v3:
 - Fix memory leak in error path (kfree() before returning)
 - Remove redundant condition in while loop
 - Style fix (*op)() -> op()

v2:
 - Use kstrndup() to copy the character set from fmt instead of using a
   statically allocated array

 lib/vsprintf.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 525c8e1..9a3b860 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2640,8 +2640,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
 		if (*fmt == '*') {
 			if (!*str)
 				break;
-			while (!isspace(*fmt) && *fmt != '%' && *fmt)
+			while (!isspace(*fmt) && *fmt != '%' && *fmt) {
+				/* '%*[' not yet supported, invalid format */
+				if (*fmt == '[')
+					return num;
 				fmt++;
+			}
 			while (!isspace(*str) && *str)
 				str++;
 			continue;
@@ -2714,6 +2718,57 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
 			num++;
 		}
 		continue;
+		/*
+		 * Warning: This implementation of the '[' conversion specifier
+		 * deviates from its glibc counterpart in the following ways:
+		 * (1) It does NOT support ranges i.e. '-' is NOT a special character
+		 * (2) It cannot match the closing bracket ']' itself
+		 * (3) A field width is required
+		 * (4) '%*[' (discard matching input) is currently not supported
+		 *
+		 * Example usage:
+		 * ret = sscanf("00:0a:95","%2[^:]:%2[^:]:%2[^:]", buf1, buf2, buf3);
+		 * if (ret < 3)
+		 *    // etc..
+		 */
+		case '[':
+		{
+			char *s = (char *)va_arg(args, char *);
+			DECLARE_BITMAP(set, 256) = {0};
+			unsigned int len = 0;
+			bool negate = (*fmt == '^');
+
+			/* field width is required */
+			if (field_width == -1)
+				return num;
+
+			if (negate)
+				++fmt;
+
+			for ( ; *fmt && *fmt != ']'; ++fmt, ++len)
+				set_bit((u8)*fmt, set);
+
+			/* no ']' or no character set found */
+			if (!*fmt || !len)
+				return num;
+			++fmt;
+
+			if (negate) {
+				bitmap_complement(set, set, 256);
+				/* exclude null '\0' byte */
+				clear_bit(0, set);
+			}
+
+			/* match must be non-empty */
+			if (!test_bit((u8)*str, set))
+				return num;
+
+			while (test_bit((u8)*str, set) && field_width--)
+				*s++ = *str++;
+			*s = '\0';
+			++num;
+		}
+		continue;
 		case 'o':
 			base = 8;
 			break;
-- 
2.4.3
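
To make the bitmap approach from the v4 changelog easier to experiment with
outside the kernel, here is a rough userspace transliteration of the '['
case above. It is a sketch, not the patch itself. scan_charset(),
set_bit_u8() and test_bit_u8() are hypothetical stand-ins for the kernel's
DECLARE_BITMAP()/set_bit()/test_bit()/bitmap_complement(). The function
also works on already-split fmt/str pointers instead of living inside
vsscanf().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SET_BITS  256
#define LONG_BITS (CHAR_BIT * sizeof(unsigned long))
#define SET_LONGS (SET_BITS / LONG_BITS)

/* Userspace stand-ins for the kernel bit helpers (hypothetical). */
static void set_bit_u8(unsigned char c, unsigned long *set)
{
	set[c / LONG_BITS] |= 1UL << (c % LONG_BITS);
}

static bool test_bit_u8(unsigned char c, const unsigned long *set)
{
	return set[c / LONG_BITS] & (1UL << (c % LONG_BITS));
}

/*
 * *fmt points just past "%<width>[" and *str at the input. On success,
 * stores a NUL-terminated match in out (field_width + 1 bytes), advances
 * both pointers and returns true. Returns false where the patch would
 * "return num": missing width, no ']', empty set, or empty match.
 */
static bool scan_charset(const char **fmt, const char **str,
			 int field_width, char *out)
{
	unsigned long set[SET_LONGS] = {0};
	const char *f, *s;
	unsigned int len = 0;
	bool negate;

	assert(fmt && *fmt && str && *str && out);
	f = *fmt;
	s = *str;
	negate = (*f == '^');

	/* The patch rejects a missing width (-1); this sketch also rejects 0. */
	if (field_width < 1)
		return false;
	if (negate)
		++f;
	for (; *f && *f != ']'; ++f, ++len)
		set_bit_u8((unsigned char)*f, set);
	if (!*f || !len)		/* no ']' or no character set */
		return false;
	++f;

	if (negate) {
		for (size_t i = 0; i < SET_LONGS; i++)
			set[i] = ~set[i];
		set[0] &= ~1UL;		/* never match the '\0' terminator */
	}

	if (!test_bit_u8((unsigned char)*s, set))	/* must be non-empty */
		return false;
	while (test_bit_u8((unsigned char)*s, set) && field_width--)
		*out++ = *s++;
	*out = '\0';

	*fmt = f;
	*str = s;
	return true;
}
```

With fmt pointing at "^:]:..." and str at "00:0a", a width of 2 stores "00"
and leaves both pointers at the ':' that follows. That is the first step of
the "%2[^:]:%2[^:]:%2[^:]" example in the patch comment.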

* Re: sscanf: implement basic character sets
  2016-02-26 20:20 [PATCH v4] sscanf: implement basic character sets Jessica Yu
@ 2016-02-26 20:28 ` Jessica Yu
  2016-03-07 23:12   ` Jessica Yu
  2016-03-02 23:49 ` [PATCH v4] " Rasmus Villemoes
  1 sibling, 1 reply; 9+ messages in thread
From: Jessica Yu @ 2016-02-26 20:28 UTC
  To: Andrew Morton; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

+++ Jessica Yu [26/02/16 15:20 -0500]:
>Implement basic character sets for the '%[' conversion specifier.
>
>The '%[' conversion specifier matches a nonempty sequence of characters
>from the specified set of accepted (or with '^', rejected) characters
>between the brackets. The substring matched is to be made up of characters
>in (or not in) the set. This is useful for matching substrings that are
>delimited by something other than spaces.
>
>This implementation differs from its glibc counterpart in the following ways:
>(1) No support for character ranges (e.g., 'a-z' or '0-9')
>(2) The hyphen '-' is not a special character
>(3) The closing bracket ']' cannot be matched
>(4) No support (yet) for discarding matching input ('%*[')
>
>Signed-off-by: Jessica Yu <jeyu@redhat.com>

Since this version is largely based on Rasmus' sample bitmap code
(with only very minor tweaks), what is the best way to provide
attribution in this case? A Suggested-by: tag or another
Signed-off-by: tag (since actual code is involved)?

Thanks,
Jessica

* Re: [PATCH v4] sscanf: implement basic character sets
  2016-02-26 20:20 [PATCH v4] sscanf: implement basic character sets Jessica Yu
  2016-02-26 20:28 ` Jessica Yu
@ 2016-03-02 23:49 ` Rasmus Villemoes
  2016-03-07 23:09   ` Jessica Yu
  1 sibling, 1 reply; 9+ messages in thread
From: Rasmus Villemoes @ 2016-03-02 23:49 UTC
  To: Jessica Yu; +Cc: Andrew Morton, Andy Shevchenko, Kees Cook, linux-kernel

On Fri, Feb 26 2016, Jessica Yu <jeyu@redhat.com> wrote:

> @@ -2714,6 +2718,57 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>  			num++;
>  		}
>  		continue;
> +		/*
> +		 * Warning: This implementation of the '[' conversion specifier
> +		 * deviates from its glibc counterpart in the following ways:
> +		 * (1) It does NOT support ranges i.e. '-' is NOT a special character
> +		 * (2) It cannot match the closing bracket ']' itself
> +		 * (3) A field width is required
> +		 * (4) '%*[' (discard matching input) is currently not supported
> +		 *
> +		 * Example usage:
> +		 * ret = sscanf("00:0a:95","%2[^:]:%2[^:]:%2[^:]", buf1, buf2, buf3);
> +		 * if (ret < 3)
> +		 *    // etc..
> +		 */
> +		case '[':
> +		{
> +			char *s = (char *)va_arg(args, char *);
> +			DECLARE_BITMAP(set, 256) = {0};
> +			unsigned int len = 0;
> +			bool negate = (*fmt == '^');
> +
> +			/* field width is required */
> +			if (field_width == -1)
> +				return num;
> +
> +			if (negate)
> +				++fmt;
> +
> +			for ( ; *fmt && *fmt != ']'; ++fmt, ++len)
> +				set_bit((u8)*fmt, set);
> +
> +			/* no ']' or no character set found */
> +			if (!*fmt || !len)
> +				return num;
> +			++fmt;
> +

I think it might be useful to be able to do [^] to match any sequence of
characters. If the user passed [] the code below won't match anything,
so we'll return num anyway. In other words, I'd just omit the test for
empty character set. Other than that, LGTM.

Rasmus
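
Here is a quick sketch of what dropping the !len test would do, assuming the
rest of the logic stays as in the patch. A plain bool[256] replaces the
kernel bitmap for brevity. match_len() and its allow_empty_set switch are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * spec points just past '['. Returns the length of the match (at most
 * width), or -1 where the patch would "return num". allow_empty_set
 * models omitting the "!len" check.
 */
static int match_len(const char *spec, const char *in, int width,
		     bool allow_empty_set)
{
	bool set[256] = { false };
	bool negate;
	unsigned int len = 0;
	int n = 0;

	assert(spec && in && width > 0);
	negate = (*spec == '^');
	if (negate)
		++spec;
	for (; *spec && *spec != ']'; ++spec, ++len)
		set[(unsigned char)*spec] = true;
	if (!*spec || (!len && !allow_empty_set))
		return -1;
	if (negate) {
		/* Complement of the empty set: every byte... */
		for (int i = 0; i < 256; i++)
			set[i] = !set[i];
		set[0] = false;		/* ...except the terminator */
	}
	while (n < width && set[(unsigned char)in[n]])
		n++;
	return n ? n : -1;		/* match must be non-empty */
}
```

With allow_empty_set false, "[^]" is rejected as in v4. With it true, "[^]"
matches the whole input up to the field width, while "[]" still matches
nothing.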

* Re: sscanf: implement basic character sets
  2016-03-02 23:49 ` [PATCH v4] " Rasmus Villemoes
@ 2016-03-07 23:09   ` Jessica Yu
  0 siblings, 0 replies; 9+ messages in thread
From: Jessica Yu @ 2016-03-07 23:09 UTC
  To: Rasmus Villemoes; +Cc: Andrew Morton, Andy Shevchenko, Kees Cook, linux-kernel

+++ Rasmus Villemoes [03/03/16 00:49 +0100]:
>On Fri, Feb 26 2016, Jessica Yu <jeyu@redhat.com> wrote:
>
>> @@ -2714,6 +2718,57 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>>  			num++;
>>  		}
>>  		continue;
>> +		/*
>> +		 * Warning: This implementation of the '[' conversion specifier
>> +		 * deviates from its glibc counterpart in the following ways:
>> +		 * (1) It does NOT support ranges i.e. '-' is NOT a special character
>> +		 * (2) It cannot match the closing bracket ']' itself
>> +		 * (3) A field width is required
>> +		 * (4) '%*[' (discard matching input) is currently not supported
>> +		 *
>> +		 * Example usage:
>> +		 * ret = sscanf("00:0a:95","%2[^:]:%2[^:]:%2[^:]", buf1, buf2, buf3);
>> +		 * if (ret < 3)
>> +		 *    // etc..
>> +		 */
>> +		case '[':
>> +		{
>> +			char *s = (char *)va_arg(args, char *);
>> +			DECLARE_BITMAP(set, 256) = {0};
>> +			unsigned int len = 0;
>> +			bool negate = (*fmt == '^');
>> +
>> +			/* field width is required */
>> +			if (field_width == -1)
>> +				return num;
>> +
>> +			if (negate)
>> +				++fmt;
>> +
>> +			for ( ; *fmt && *fmt != ']'; ++fmt, ++len)
>> +				set_bit((u8)*fmt, set);
>> +
>> +			/* no ']' or no character set found */
>> +			if (!*fmt || !len)
>> +				return num;
>> +			++fmt;
>> +
>
>I think it might be useful to be able to do [^] to match any sequence of
>characters. If the user passed [] the code below won't match anything,
>so we'll return num anyway. In other words, I'd just omit the test for
>empty character set. Other than that, LGTM.

Thanks for the review. My only concern is that this behavior
(i.e., having [^] match any sequence of characters) would also deviate
from glibc sscanf behavior (which matches nothing) and would need to
be documented as well. Perhaps it is best to keep these differences
to a minimum, to avoid surprising users.

Jessica

* Re: sscanf: implement basic character sets
  2016-02-26 20:28 ` Jessica Yu
@ 2016-03-07 23:12   ` Jessica Yu
  2016-03-07 23:24     ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Jessica Yu @ 2016-03-07 23:12 UTC
  To: Andrew Morton; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

+++ Jessica Yu [26/02/16 15:28 -0500]:
>+++ Jessica Yu [26/02/16 15:20 -0500]:
>>Implement basic character sets for the '%[' conversion specifier.
>>
>>The '%[' conversion specifier matches a nonempty sequence of characters
>>from the specified set of accepted (or with '^', rejected) characters
>>between the brackets. The substring matched is to be made up of characters
>>in (or not in) the set. This is useful for matching substrings that are
>>delimited by something other than spaces.
>>
>>This implementation differs from its glibc counterpart in the following ways:
>>(1) No support for character ranges (e.g., 'a-z' or '0-9')
>>(2) The hyphen '-' is not a special character
>>(3) The closing bracket ']' cannot be matched
>>(4) No support (yet) for discarding matching input ('%*[')
>>
>>Signed-off-by: Jessica Yu <jeyu@redhat.com>
>
>Since this version is largely based on Rasmus' sample bitmap code
>(with only very minor tweaks), what is the best way to provide
>attribution in this case? A Suggested-by: tag or another
>Signed-off-by: tag (since actual code is involved)?

Andrew, friendly ping on this patch and question? :-)

Thanks,
Jessica

* Re: sscanf: implement basic character sets
  2016-03-07 23:12   ` Jessica Yu
@ 2016-03-07 23:24     ` Andrew Morton
  2016-03-07 23:32       ` Rasmus Villemoes
  2016-03-08  1:07       ` Jessica Yu
  0 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2016-03-07 23:24 UTC
  To: Jessica Yu; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

On Mon, 7 Mar 2016 18:12:20 -0500 Jessica Yu <jeyu@redhat.com> wrote:

> +++ Jessica Yu [26/02/16 15:28 -0500]:
> >+++ Jessica Yu [26/02/16 15:20 -0500]:
> >>Implement basic character sets for the '%[' conversion specifier.
> >>
> >>The '%[' conversion specifier matches a nonempty sequence of characters
> >>from the specified set of accepted (or with '^', rejected) characters
> >>between the brackets. The substring matched is to be made up of characters
> >>in (or not in) the set. This is useful for matching substrings that are
> >>delimited by something other than spaces.
> >>
> >>This implementation differs from its glibc counterpart in the following ways:
> >>(1) No support for character ranges (e.g., 'a-z' or '0-9')
> >>(2) The hyphen '-' is not a special character
> >>(3) The closing bracket ']' cannot be matched
> >>(4) No support (yet) for discarding matching input ('%*[')
> >>
> >>Signed-off-by: Jessica Yu <jeyu@redhat.com>
> >
> >Since this version is largely based on Rasmus' sample bitmap code
> >(with only very minor tweaks), what is the best way to provide
> >attribution in this case? A Suggested-by: tag or another
> >Signed-off-by: tag (since actual code is involved)?
> 
> Andrew, friendly ping on this patch and question? :-)

Rasmus's Signed-off-by: would be most appropriate, please.

I've queued the patch for some testing, however the changelog which
used to have IMO-inadequate justification now has no justification at
all!

So please send along a paragraph or two which we can put in there to
explain to people why we believe this change should be made to the
kernel.  Thanks.

* Re: sscanf: implement basic character sets
  2016-03-07 23:24     ` Andrew Morton
@ 2016-03-07 23:32       ` Rasmus Villemoes
  2016-03-08  1:07       ` Jessica Yu
  1 sibling, 0 replies; 9+ messages in thread
From: Rasmus Villemoes @ 2016-03-07 23:32 UTC
  To: Andrew Morton; +Cc: Jessica Yu, Andy Shevchenko, Kees Cook, linux-kernel

On Tue, Mar 08 2016, Andrew Morton <akpm@linux-foundation.org> wrote:

>> >
>> >Since this version is largely based on Rasmus' sample bitmap code
>> >(with only very minor tweaks), what is the best way to provide
>> >attribution in this case? A Suggested-by: tag or another
>> >Signed-off-by: tag (since actual code is involved)?
>> 
>> Andrew, friendly ping on this patch and question? :-)
>
> Rasmus's Signed-off-by: would be most appropriate, please.
>

Sure,

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>

* Re: sscanf: implement basic character sets
  2016-03-07 23:24     ` Andrew Morton
  2016-03-07 23:32       ` Rasmus Villemoes
@ 2016-03-08  1:07       ` Jessica Yu
  1 sibling, 0 replies; 9+ messages in thread
From: Jessica Yu @ 2016-03-08  1:07 UTC
  To: Andrew Morton; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

+++ Andrew Morton [07/03/16 15:24 -0800]:
>On Mon, 7 Mar 2016 18:12:20 -0500 Jessica Yu <jeyu@redhat.com> wrote:
>
>> +++ Jessica Yu [26/02/16 15:28 -0500]:
>> >+++ Jessica Yu [26/02/16 15:20 -0500]:
>> >>Implement basic character sets for the '%[' conversion specifier.
>> >>
>> >>The '%[' conversion specifier matches a nonempty sequence of characters
>> >>from the specified set of accepted (or with '^', rejected) characters
>> >>between the brackets. The substring matched is to be made up of characters
>> >>in (or not in) the set. This is useful for matching substrings that are
>> >>delimited by something other than spaces.
>> >>
>> >>This implementation differs from its glibc counterpart in the following ways:
>> >>(1) No support for character ranges (e.g., 'a-z' or '0-9')
>> >>(2) The hyphen '-' is not a special character
>> >>(3) The closing bracket ']' cannot be matched
>> >>(4) No support (yet) for discarding matching input ('%*[')
>> >>
>> >>Signed-off-by: Jessica Yu <jeyu@redhat.com>
>> >
>> >Since this version is largely based on Rasmus' sample bitmap code
>> >(with only very minor tweaks), what is the best way to provide
>> >attribution in this case? A Suggested-by: tag or another
>> >Signed-off-by: tag (since actual code is involved)?
>>
>> Andrew, friendly ping on this patch and question? :-)
>
>Rasmus's Signed-off-by: would be most appropriate, please.
>
>I've queued the patch for some testing, however the changelog which
>used to have IMO-inadequate justification now has no justification at
>all!
>
>So please send along a paragraph or two which we can put in there to
>explain to people why we believe this change should be made to the
>kernel.  Thanks.

Andrew, I've included a more detailed explanation of the motivation
behind the patch below. Could you please append it to the end of the
original changelog? Thanks!
---

The motivation for adding character set support to sscanf originally
stemmed from the kernel livepatching project. An ongoing patchset
uses new livepatch ELF symbol and section names to store important
metadata that livepatch needs to properly apply its patches. This
metadata is stored in the section and symbol names as substrings
delimited by periods '.' and commas ','. For example, a livepatch
symbol name might look like this:

.klp.sym.vmlinux.printk,0

However, sscanf() can currently extract only "substrings" delimited by
whitespace, using the "%s" specifier. Thus for the above symbol name,
one cannot use sscanf() to extract the substrings "vmlinux" or
"printk", for example. Several discussions on the livepatch mailing
list about string parsing code for extracting these '.' and ','
delimited substrings eventually led to the conclusion that such code
would be completely unnecessary if the kernel sscanf() supported
character sets: a single sscanf() call would be enough to extract these
substrings. Other areas of the kernel that develop a similar need in
the future could benefit as well.
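
As an illustration of the single-call parsing described above, here is a
sketch that assumes the symbol layout from the example name
(.klp.sym.<objname>.<symname>,<sympos>). The helper parse_klp_sym() and the
buffer sizes are made up for this example and are not taken from the
livepatch patchset. It is compiled against a userspace sscanf() so it can
be run as-is. The format string itself relies only on what this patch adds:
negated sets with explicit field widths.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: split a livepatch-style symbol name into its
 * object name, symbol name and position. Returns the number of
 * conversions (3 on success).
 */
static int parse_klp_sym(const char *name, char objname[64],
			 char symname[128], unsigned long *sympos)
{
	assert(name && objname && symname && sympos);
	/* %63[^.]  : object name, up to the next '.'
	 * %127[^,] : symbol name, up to the ','
	 * %lu      : symbol position */
	return sscanf(name, ".klp.sym.%63[^.].%127[^,],%lu",
		      objname, symname, sympos);
}
```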

* RE: [PATCH v4] sscanf: implement basic character sets
@ 2016-04-05 14:32 Shahbaz Youssefi
  0 siblings, 0 replies; 9+ messages in thread
From: Shahbaz Youssefi @ 2016-04-05 14:32 UTC
  To: LKML, jeyu

Note: CC me

I just read on LWN about the implementation of `%[` in sscanf. I
wanted to point out my implementation (which supports `]`, ranges (`-`),
and negation (`^`)) and let you know that you are free to take
code or inspiration from it:

https://github.com/ShabbyX/kio/blob/master/src/scanf.c#L398
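
For comparison with the kernel version, here is a hypothetical sketch of
how a set parser could additionally handle a leading ']' and '-' ranges. It
was written independently for illustration and is not derived from the kio
code linked above. parse_set() and its bool[256] representation are
assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * *fmt points just past '['. A leading ']' (after an optional '^') is
 * literal, "x-y" with x <= y is a range, and '-' first or last (or in a
 * reversed range) is literal. On success *fmt is left just past the
 * closing ']'. Returns false if no closing ']' exists.
 */
static bool parse_set(const char **fmt, bool set[256])
{
	const unsigned char *p;
	bool negate;

	assert(fmt && *fmt && set);
	p = (const unsigned char *)*fmt;
	negate = (*p == '^');

	for (int i = 0; i < 256; i++)
		set[i] = false;
	if (negate)
		p++;
	if (*p == ']')			/* literal ']' in first position */
		set[*p++] = true;

	while (*p && *p != ']') {
		if (p[1] == '-' && p[2] && p[2] != ']' && p[0] <= p[2]) {
			for (int c = p[0]; c <= p[2]; c++)
				set[c] = true;
			p += 3;
		} else {
			set[*p++] = true;
		}
	}
	if (!*p)
		return false;

	if (negate) {
		for (int i = 0; i < 256; i++)
			set[i] = !set[i];
		set[0] = false;		/* never match the terminator */
	}
	*fmt = (const char *)(p + 1);
	return true;
}
```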

Thread overview: 9+ messages
2016-02-26 20:20 [PATCH v4] sscanf: implement basic character sets Jessica Yu
2016-02-26 20:28 ` Jessica Yu
2016-03-07 23:12   ` Jessica Yu
2016-03-07 23:24     ` Andrew Morton
2016-03-07 23:32       ` Rasmus Villemoes
2016-03-08  1:07       ` Jessica Yu
2016-03-02 23:49 ` [PATCH v4] " Rasmus Villemoes
2016-03-07 23:09   ` Jessica Yu
2016-04-05 14:32 [PATCH v4] " Shahbaz Youssefi