[PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing
@ 2020-11-12 11:17 Richard Fitzgerald
  2020-11-12 15:35 ` Steven Rostedt
  2020-11-12 16:17 ` Petr Mladek
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Fitzgerald @ 2020-11-12 11:17 UTC (permalink / raw)
  To: pmladek, rostedt, sergey.senozhatsky
  Cc: linux-kernel, patches, Richard Fitzgerald

Number conversion in vsscanf converts a whole string of digits and then
extracts the field width part from the converted value. The maximum run
of digits is limited by overflow. Conversion was using either
simple_strto[u]l or simple_strto[u]ll based on the 'L' qualifier. This
created a difference in truncation between builds where long is 32-bit
and builds where it is 64-bit. This especially affects parsing a run of
contiguous digits into separate fields - the maximum length of the run
is 16 digits if long is 64-bit but only 8 digits if long is 32-bits.
For example a conversion "%6x%6x" would convert both fields correctly if
long is 64-bit but not if long is 32-bit.

It is undesirable for vsscanf to parse numbers differently depending on
the size of long on the target build.

As simple_strto[u]l just calls simple_strto[u]ll anyway the conversion
is always 64-bit, and the result is manipulated as a u64, so this is an
avoidable behaviour difference between 32-bit and 64-bit builds. The
conversion can call simple_strto[u]ll directly and preserve the full
64-bits that were parsed out of the string.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
---
 lib/vsprintf.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 14c9a6af1b23..63b6cddfa7f7 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -3444,13 +3444,9 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
 			break;

 		if (is_sign)
-			val.s = qualifier != 'L' ?
-				simple_strtol(str, &next, base) :
-				simple_strtoll(str, &next, base);
+			val.s = simple_strtoll(str, &next, base);
 		else
-			val.u = qualifier != 'L' ?
-				simple_strtoul(str, &next, base) :
-				simple_strtoull(str, &next, base);
+			val.u = simple_strtoull(str, &next, base);

 		if (field_width > 0 && next - str > field_width) {
 			if (base == 0)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing
  2020-11-12 11:17 [PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing Richard Fitzgerald
@ 2020-11-12 15:35 ` Steven Rostedt
  2020-11-12 15:46   ` Richard Fitzgerald
  2020-11-12 16:17 ` Petr Mladek
  1 sibling, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2020-11-12 15:35 UTC (permalink / raw)
  To: Richard Fitzgerald; +Cc: pmladek, sergey.senozhatsky, linux-kernel, patches

On Thu, 12 Nov 2020 11:17:59 +0000
Richard Fitzgerald <rf@opensource.cirrus.com> wrote:

> Number conversion in vsscanf converts a whole string of digits and then
> extracts the field width part from the converted value. The maximum run
> of digits is limited by overflow. Conversion was using either
> simple_strto[u]l or simple_strto[u]ll based on the 'L' qualifier. This
> created a difference in truncation between builds where long is 32-bit
> and builds where it is 64-bit. This especially affects parsing a run of
> contiguous digits into separate fields - the maximum length of the run
> is 16 digits if long is 64-bit but only 8 digits if long is 32-bits.
> For example a conversion "%6x%6x" would convert both fields correctly if
> long is 64-bit but not if long is 32-bit.
> 
> It is undesirable for vsscanf to parse numbers differently depending on
> the size of long on the target build.
> 
> As simple_strto[u]l just calls simple_strto[u]ll anyway the conversion
> is always 64-bit, and the result is manipulated as a u64, so this is an
> avoidable behaviour difference between 32-bit and 64-bit builds. The
> conversion can call simple_strto[u]ll directly and preserve the full
> 64-bits that were parsed out of the string.
> 
> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
> ---
>  lib/vsprintf.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 14c9a6af1b23..63b6cddfa7f7 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -3444,13 +3444,9 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>  			break;
>  
>  		if (is_sign)
> -			val.s = qualifier != 'L' ?
> -				simple_strtol(str, &next, base) :
> -				simple_strtoll(str, &next, base);
> +			val.s = simple_strtoll(str, &next, base);
>  		else
> -			val.u = qualifier != 'L' ?
> -				simple_strtoul(str, &next, base) :
> -				simple_strtoull(str, &next, base);
> +			val.u = simple_strtoull(str, &next, base);
>  
>  		if (field_width > 0 && next - str > field_width) {
>  			if (base == 0)

It looks like this is fixing the symptom and not the disease. The real
issue I see here is that vsscanf is not honoring the '6' of '%6x' here. It
should only read the 6 characters then do the conversion, not the other
way around! This looks to me that the design is of issue.

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing
  2020-11-12 15:35 ` Steven Rostedt
@ 2020-11-12 15:46   ` Richard Fitzgerald
  2020-11-12 17:04     ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Fitzgerald @ 2020-11-12 15:46 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: pmladek, sergey.senozhatsky, linux-kernel, patches

On 12/11/2020 15:35, Steven Rostedt wrote:
> On Thu, 12 Nov 2020 11:17:59 +0000
> Richard Fitzgerald <rf@opensource.cirrus.com> wrote:
> 
>> Number conversion in vsscanf converts a whole string of digits and then
>> extracts the field width part from the converted value. The maximum run
>> of digits is limited by overflow. Conversion was using either
>> simple_strto[u]l or simple_strto[u]ll based on the 'L' qualifier. This
>> created a difference in truncation between builds where long is 32-bit
>> and builds where it is 64-bit. This especially affects parsing a run of
>> contiguous digits into separate fields - the maximum length of the run
>> is 16 digits if long is 64-bit but only 8 digits if long is 32-bits.
>> For example a conversion "%6x%6x" would convert both fields correctly if
>> long is 64-bit but not if long is 32-bit.
>>
>> It is undesirable for vsscanf to parse numbers differently depending on
>> the size of long on the target build.
>>
>> As simple_strto[u]l just calls simple_strto[u]ll anyway the conversion
>> is always 64-bit, and the result is manipulated as a u64, so this is an
>> avoidable behaviour difference between 32-bit and 64-bit builds. The
>> conversion can call simple_strto[u]ll directly and preserve the full
>> 64-bits that were parsed out of the string.
>>
>> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
>> ---
>>   lib/vsprintf.c | 8 ++------
>>   1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>> index 14c9a6af1b23..63b6cddfa7f7 100644
>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -3444,13 +3444,9 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>>   			break;
>>   
>>   		if (is_sign)
>> -			val.s = qualifier != 'L' ?
>> -				simple_strtol(str, &next, base) :
>> -				simple_strtoll(str, &next, base);
>> +			val.s = simple_strtoll(str, &next, base);
>>   		else
>> -			val.u = qualifier != 'L' ?
>> -				simple_strtoul(str, &next, base) :
>> -				simple_strtoull(str, &next, base);
>> +			val.u = simple_strtoull(str, &next, base);
>>   
>>   		if (field_width > 0 && next - str > field_width) {
>>   			if (base == 0)
> 
> It looks like this is fixing the symptom and not the disease. The real
> issue I see here is that vsscanf is not honoring the '6' of '%6x' here. It
> should only read the 6 characters then do the conversion, not the other
> way around! This looks to me that the design is of issue.
> 
> -- Steve
> 

See this thread from 2014 where the field width problem was raised and
explained:
http://lkml.iu.edu/hypermail/linux/kernel/1401.1/03443.html

and the reply from Linus Torvalds that was against fixing field width
handling:
http://lkml.iu.edu/hypermail/linux/kernel/1401.1/03488.html

which I assume is why the field handling wasn't unoptimized to be
strictly correct.

Nevertheless, I see no reason not to remove avoidable inconsistencies
from the current design.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing
  2020-11-12 11:17 [PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing Richard Fitzgerald
  2020-11-12 15:35 ` Steven Rostedt
@ 2020-11-12 16:17 ` Petr Mladek
  2020-11-12 16:45   ` David Laight
  1 sibling, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2020-11-12 16:17 UTC (permalink / raw)
  To: Richard Fitzgerald
  Cc: rostedt, sergey.senozhatsky, linux-kernel, patches,
	Andy Shevchenko, Rasmus Villemoes

Adding other vsprintf maintainers and reviewes into CC.

On Thu 2020-11-12 11:17:59, Richard Fitzgerald wrote:
> Number conversion in vsscanf converts a whole string of digits and then
> extracts the field width part from the converted value. The maximum run
> of digits is limited by overflow. Conversion was using either
> simple_strto[u]l or simple_strto[u]ll based on the 'L' qualifier. This
> created a difference in truncation between builds where long is 32-bit
> and builds where it is 64-bit. This especially affects parsing a run of
> contiguous digits into separate fields - the maximum length of the run
> is 16 digits if long is 64-bit but only 8 digits if long is 32-bits.
> For example a conversion "%6x%6x" would convert both fields correctly if
> long is 64-bit but not if long is 32-bit.

I might be just slow today. But it took me really long time to
understand what exactly is the problem and how it is caused.
The description is nicely detailed but somehow cryptic.

My understanding is that there is a bug when parsing numbers
with a limited width like the above mentioned "%6x%6x".

The problem is how the width is handled:

 1. The width is ignored in the 1st step. The entire number
    is read using simple_strto[u]l[l] functions.

 2. The width limit is achieved by dividing the result from
    the first step until it fits the width.

It gives wrong result when there was an overflow in the 1st step.
The high bits were lost even when the limited number would
not overflow.

This patch improves the situation on 32-bit system because
it reduces the chance of overflow there. It also makes
it work the same on 32-bit and 64-bit systems.

But the problem is still there when the 64-bit number
overflows!

IMHO, the right solution is to copy only the limited number
of characters into a buffer and call simple_strto[u]l[l]
on this buffer. Then we do not need to divide the result
at all.

That said, I agree that simple_strto[u]ll() might always get called.
It is used internally by simple_strto[u]l() anyway. The value
is always stored into a local 64-bit variable. It is properly
casted later when the value is copied into the caller's variable.
But I do not consider this a proper fix of the width handling.

Best Regards,
Petr

> It is undesirable for vsscanf to parse numbers differently depending on
> the size of long on the target build.
> 
> As simple_strto[u]l just calls simple_strto[u]ll anyway the conversion
> is always 64-bit, and the result is manipulated as a u64, so this is an
> avoidable behaviour difference between 32-bit and 64-bit builds. The
> conversion can call simple_strto[u]ll directly and preserve the full
> 64-bits that were parsed out of the string.
> 
> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
> ---
>  lib/vsprintf.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 14c9a6af1b23..63b6cddfa7f7 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -3444,13 +3444,9 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>  			break;
>  
>  		if (is_sign)
> -			val.s = qualifier != 'L' ?
> -				simple_strtol(str, &next, base) :
> -				simple_strtoll(str, &next, base);
> +			val.s = simple_strtoll(str, &next, base);
>  		else
> -			val.u = qualifier != 'L' ?
> -				simple_strtoul(str, &next, base) :
> -				simple_strtoull(str, &next, base);
> +			val.u = simple_strtoull(str, &next, base);
>  
>  		if (field_width > 0 && next - str > field_width) {
>  			if (base == 0)
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing
  2020-11-12 16:17 ` Petr Mladek
@ 2020-11-12 16:45   ` David Laight
  0 siblings, 0 replies; 8+ messages in thread
From: David Laight @ 2020-11-12 16:45 UTC (permalink / raw)
  To: 'Petr Mladek', Richard Fitzgerald
  Cc: rostedt, sergey.senozhatsky, linux-kernel, patches,
	Andy Shevchenko, Rasmus Villemoes

From: Petr Mladek
> Sent: 12 November 2020 16:18
> 
> Adding other vsprintf maintainers and reviewes into CC.
> 
> On Thu 2020-11-12 11:17:59, Richard Fitzgerald wrote:
> > Number conversion in vsscanf converts a whole string of digits and then
> > extracts the field width part from the converted value. The maximum run
> > of digits is limited by overflow. Conversion was using either
> > simple_strto[u]l or simple_strto[u]ll based on the 'L' qualifier. This
> > created a difference in truncation between builds where long is 32-bit
> > and builds where it is 64-bit. This especially affects parsing a run of
> > contiguous digits into separate fields - the maximum length of the run
> > is 16 digits if long is 64-bit but only 8 digits if long is 32-bits.
> > For example a conversion "%6x%6x" would convert both fields correctly if
> > long is 64-bit but not if long is 32-bit.

If %6x%6x works one might expect %6x%6x%6x to also work.

> I might be just slow today. But it took me really long time to
> understand what exactly is the problem and how it is caused.
> The description is nicely detailed but somehow cryptic.
> 
> My understanding is that there is a bug when parsing numbers
> with a limited width like the above mentioned "%6x%6x".
> 
> The problem is how the width is handled:
> 
>  1. The width is ignored in the 1st step. The entire number
>     is read using simple_strto[u]l[l] functions.
> 
>  2. The width limit is achieved by dividing the result from
>     the first step until it fits the width.
> 
> It gives wrong result when there was an overflow in the 1st step.
> The high bits were lost even when the limited number would
> not overflow.

What happens if there are leading zeros on the hex input?
From the description I think 'something terrible happens'.
(Well, horribly unexpected anyway.)

I'd also expect strtoull() to 'eat' all the digits that exist,
not stop when the value got too large.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing
  2020-11-12 15:46   ` Richard Fitzgerald
@ 2020-11-12 17:04     ` Steven Rostedt
  2020-11-13 14:00       ` Petr Mladek
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2020-11-12 17:04 UTC (permalink / raw)
  To: Richard Fitzgerald; +Cc: pmladek, sergey.senozhatsky, linux-kernel, patches

On Thu, 12 Nov 2020 15:46:46 +0000
Richard Fitzgerald <rf@opensource.cirrus.com> wrote:

> See this thread from 2014 where the field width problem was raised and
> explained:
> http://lkml.iu.edu/hypermail/linux/kernel/1401.1/03443.html
> 
> and the reply from Linus Torvalds that was against fixing field width
> handling:
> http://lkml.iu.edu/hypermail/linux/kernel/1401.1/03488.html

Thanks for the pointers, but note, that references to older emails should
use https://lore.kernel.org/ as these links format the output really
horribly.

> 
> which I assume is why the field handling wasn't unoptimized to be
> strictly correct.
> 
> Nevertheless, I see no reason not to remove avoidable inconsistencies
> from the current design.

Yes, but perhaps its time to fix the real problem and not just add
band-aids. That thread is over 6 years old (the email was from Jan 14, 2014)

$ git diff `git rev-list --before 'Jan 14 2014' HEAD --max-count=1` |
  grep '^+' | grep sscanf | wc -l
622

There's been over 600 new additions of sscanf(). Now is the time to just
fix it correctly.

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing
  2020-11-12 17:04     ` Steven Rostedt
@ 2020-11-13 14:00       ` Petr Mladek
  2020-11-16 10:47         ` Richard Fitzgerald
  0 siblings, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2020-11-13 14:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Richard Fitzgerald, sergey.senozhatsky, linux-kernel, patches

On Thu 2020-11-12 12:04:27, Steven Rostedt wrote:
> On Thu, 12 Nov 2020 15:46:46 +0000
> Richard Fitzgerald <rf@opensource.cirrus.com> wrote:
> 
> > See this thread from 2014 where the field width problem was raised and
> > explained:
> > http://lkml.iu.edu/hypermail/linux/kernel/1401.1/03443.html
> > 
> > and the reply from Linus Torvalds that was against fixing field width
> > handling:
> > http://lkml.iu.edu/hypermail/linux/kernel/1401.1/03488.html
> 
> Thanks for the pointers, but note, that references to older emails should
> use https://lore.kernel.org/ as these links format the output really
> horribly.
> 
> > 
> > which I assume is why the field handling wasn't unoptimized to be
> > strictly correct.

Honestly, the handling of the number width by div does not look like
a real optimization to me. It avoids the need of the temporary buffer
by expensive and error-prone operation.

IMHO, it is safe to assume that the width will be limited so that
the value would never overflow.

The longest supported number would be (2^64 - 1) in octal. If I am
counting correctly, it is

     01777777777777777777777

and it fits into buf[24] including the trailing '\0'.

We could call WARN_ON_ONCE() when the width >= 24 is higher.
And we could add a compiler check when long long is bigger
than 64-bit.

> Yes, but perhaps its time to fix the real problem and not just add
> band-aids. That thread is over 6 years old (the email was from Jan 14, 2014)
>
> $ git diff `git rev-list --before 'Jan 14 2014' HEAD --max-count=1` |
>   grep '^+' | grep sscanf | wc -l
> 622
> 
> There's been over 600 new additions of sscanf(). Now is the time to just
> fix it correctly.

And the following one might suffer from this problem:

drivers/soundwire/slave.c:              ret = sscanf(compat, "sdw%01x%04hx%04hx%02hhx", &sdw_version,

I agree with Steven that it is time to fix it properly.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing
  2020-11-13 14:00       ` Petr Mladek
@ 2020-11-16 10:47         ` Richard Fitzgerald
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Fitzgerald @ 2020-11-16 10:47 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt; +Cc: sergey.senozhatsky, linux-kernel, patches

On 13/11/2020 14:00, Petr Mladek wrote:
> On Thu 2020-11-12 12:04:27, Steven Rostedt wrote:
>> On Thu, 12 Nov 2020 15:46:46 +0000
>> Richard Fitzgerald <rf@opensource.cirrus.com> wrote:
>>
>>> See this thread from 2014 where the field width problem was raised and
>>> explained:
>>> http://lkml.iu.edu/hypermail/linux/kernel/1401.1/03443.html
>>>
>>> and the reply from Linus Torvalds that was against fixing field width
>>> handling:
>>> http://lkml.iu.edu/hypermail/linux/kernel/1401.1/03488.html
>>
>> Thanks for the pointers, but note, that references to older emails should
>> use https://lore.kernel.org/ as these links format the output really
>> horribly.
>>
>>>
>>> which I assume is why the field handling wasn't unoptimized to be
>>> strictly correct.
> 
> Honestly, the handling of the number width by div does not look like
> a real optimization to me. It avoids the need of the temporary buffer
> by expensive and error-prone operation.
> 
> IMHO, it is safe to assume that the width will be limited so that
> the value would never overflow.
> 
> The longest supported number would be (2^64 - 1) in octal. If I am
> counting correctly, it is
> 
>       01777777777777777777777
> 
> and it fits into buf[24] including the trailing '\0'.
> 
> We could call WARN_ON_ONCE() when the width >= 24 is higher.
> And we could add a compiler check when long long is bigger
> than 64-bit.
> 
>> Yes, but perhaps its time to fix the real problem and not just add
>> band-aids. That thread is over 6 years old (the email was from Jan 14, 2014)
>>
>> $ git diff `git rev-list --before 'Jan 14 2014' HEAD --max-count=1` |
>>    grep '^+' | grep sscanf | wc -l
>> 622
>>
>> There's been over 600 new additions of sscanf(). Now is the time to just
>> fix it correctly.
> 
> And the following one might suffer from this problem:
> 
> drivers/soundwire/slave.c:              ret = sscanf(compat, "sdw%01x%04hx%04hx%02hhx", &sdw_version,
> 

That's exactly the bug I have.
I'll look at reworking the code to handle number field widths properly.

> I agree with Steven that it is time to fix it properly.
> 
> Best Regards,
> Petr
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-11-16 12:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-12 11:17 [PATCH] lib: vsprintf: Avoid 32-bit truncation in vsscanf number parsing Richard Fitzgerald
2020-11-12 15:35 ` Steven Rostedt
2020-11-12 15:46   ` Richard Fitzgerald
2020-11-12 17:04     ` Steven Rostedt
2020-11-13 14:00       ` Petr Mladek
2020-11-16 10:47         ` Richard Fitzgerald
2020-11-12 16:17 ` Petr Mladek
2020-11-12 16:45   ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).