* Re: [PATCH] vsprintf: protect kernel from panic due to non-canonical pointer dereference
From: Haakon Bugge @ 2022-10-19 10:43 UTC
To: Andy Shevchenko
Cc: Jane Chu, Petr Mladek, rostedt, senozhatsky, linux, linux-kernel,
linux-mm, John Haxby
> On 18 Oct 2022, at 22:49, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
>
> On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:
>> On 10/18/2022 1:07 PM, Andy Shevchenko wrote:
>>> On Tue, Oct 18, 2022 at 06:56:31PM +0000, Jane Chu wrote:
>>>> On 10/18/2022 5:45 AM, Petr Mladek wrote:
>>>>> On Mon 2022-10-17 19:31:53, Jane Chu wrote:
>>>>>> On 10/17/2022 12:25 PM, Andy Shevchenko wrote:
>>>>>>> On Mon, Oct 17, 2022 at 01:16:11PM -0600, Jane Chu wrote:
>>>>>>>> While debugging a separate issue, it was found that an invalid string
>>>>>>>> pointer could very well contain a non-canonical address, such as
>>>>>>>> 0x7665645f63616465. In that case, this line of defense isn't enough
>>>>>>>> to protect the kernel from crashing due to a general protection fault:
>>>>>>>>
>>>>>>>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
>>>>>>>> return "(efault)";
>>>>>>>>
>>>>>>>> So instead, use kern_addr_valid() to validate the string pointer.
>>>>>>>
>>>>>>> How did you check that value of the (invalid string) pointer?
>>>>>>>
>>>>>>
>>>>>> In the bug scenario, the invalid string pointer was an out-of-bounds
>>>>>> string pointer. While the OOB reference has been fixed,
>>>>>
>>>>> Could you please provide more details about the fixed OOB?
>>>>> What exact vsprintf()/printk() call was broken and eventually
>>>>> how it was fixed, please?
>>>>
>>>> For sensitivity reasons, I'd like to avoid mentioning the specific name of
>>>> the sysfs attribute in the bug; instead, I'll just call it "devX_attrY[]"
>>>> and describe the precise nature of the issue.
>>>>
>>>> devX_attrY[] is a string array, declared and filled at compile time,
>>>> like:
>>>> const char * const devX_attrY[] = {
>>>> [ATTRY_A] = "Dev X AttributeY A",
>>>> [ATTRY_B] = "Dev X AttributeY B",
>>>> ...
>>>> [ATTRY_G] = "Dev X AttributeY G",
>>>> };
>>>> such that when a user runs "cat /sys/devices/systems/.../attry_1",
>>>> "Dev X AttributeY B" shows up in the terminal.
>>>> That's it; there is no further reference to the pointer devX_attrY[ATTRY_B] after that.
>>>>
>>>> The bug was that the index into the array was computed incorrectly,
>>>> leading to an OOB access, e.g. devX_attrY[11]. The fix was to correct the
>>>> calculation, and that fix is not upstream.
>>>>
>>>>>
>>>>>> the lingering issue
>>>>>> is that the kernel ought to be able to protect itself, as the pointer
>>>>>> contains a non-canonical address.
>>>>>
>>>>> Was the pointer used only by vsprintf()?
>>>>> Or was it also accessed by other code, please?
>>>>
>>>> The OOB pointer was used only by vsprintf() for the "cat" sysfs case.
>>>> No other code uses the OOB pointer, verified both by code examination
>>>> and test.
>>>
>>> So then vsprintf() is _the_ place to crash, and why should we hide that?
>>> It is because of the crash that you found the culprit, right? The "(efault)"
>>> would hide very important details.
>>>
>>> So I like this change less and less...
>>
>> What about the existing check
>> if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
>> return "(efault)";
>> ?
>
> Because it's _special_. We know that the first page is equivalent to a NULL pointer
> and the last one is dedicated to so-called error pointers. There are no more
> special exceptions to the addresses in the Linux kernel (I'm not talking about
> alignment requirements on certain architectures).
>
>> In an experiment just to print the raw OOB pointer values, I saw the following
>> (the devX_attrY names are substitutes for the real attributes; other
>> values and strings are copied verbatim from "dmesg"):
>>
>> [ 3002.772329] devX_attrY[26]: (ffffffff84d60ad3) Dev X AttributeY E
>> [ 3002.772346] devX_attrY[27]: (ffffffff84d60ae4) Dev X AttributeY F
>> [ 3002.772347] devX_attrY[28]: (ffffffff84d60aee) Dev X AttributeY G
>> [ 3002.772349] devX_attrY[29]: (0) (null)
>> [ 3002.772350] devX_attrY[30]: (0) (null)
>> [ 3002.772351] devX_attrY[31]: (0) (null)
>> [ 3002.772352] devX_attrY[32]: (7665645f63616465) (einval)
>> [ 3002.772354] devX_attrY[33]: (646e61685f656369) (einval)
>> [ 3002.772355] devX_attrY[34]: (6f635f65755f656c) (einval)
>> [ 3002.772355] devX_attrY[35]: (746e75) (einval)
>>
>> where every entry from index 29 onward is an OOB pointer.
>>
>> As you can see, if the OOB pointers are NULL, "(null)" is printed thanks to the
>> existing check, but when the OOB pointers are non-canonical, which
>> is detectable, the fact that the pointer values deviate from
>> (ffffffff84d60aee + 4 * sizeof(void *))
>> clearly shows that the OOB pointers are detectable.
>>
>> The question then is why should the non-canonical OOBs be treated
>> differently from NULL and ERR_VALUE?
>
> Obviously, to see the crash. And let the kernel _crash_. Isn't that what we need
> to see a bug as early as possible?
If you follow that argument, why doesn't the kernel crash when the pointer is, e.g., a NULL pointer? According to you, shouldn't it crash as early as possible in that case also?
Thxs, Håkon
* Re: [PATCH] vsprintf: protect kernel from panic due to non-canonical pointer dereference
From: Andy Shevchenko @ 2022-10-19 11:25 UTC
To: Haakon Bugge
Cc: Jane Chu, Petr Mladek, rostedt, senozhatsky, linux, linux-kernel,
linux-mm, John Haxby
On Wed, Oct 19, 2022 at 10:43:07AM +0000, Haakon Bugge wrote:
> > On 18 Oct 2022, at 22:49, Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:
...
> > Obviously, to see the crash. And let the kernel _crash_. Isn't that what we need
> > to see a bug as early as possible?
>
> If you follow that argument, why doesn't the kernel crash when the pointer
> is, e.g., a NULL pointer? According to you, shouldn't it crash as early as
> possible in that case also?
Because it is _special_. It's not just an invalid pointer. There may very
well be good cases where we supply (valid!) NULL pointers to printf().
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH] vsprintf: protect kernel from panic due to non-canonical pointer dereference
From: Jane Chu @ 2022-10-19 18:36 UTC
To: Andy Shevchenko
Cc: Petr Mladek, rostedt, senozhatsky, linux, linux-kernel, linux-mm,
Haakon Bugge, John Haxby, Jane Chu
On 10/18/2022 1:49 PM, Andy Shevchenko wrote:
> On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:
...
>> The question then is why should the non-canonical OOBs be treated
>> differently from NULL and ERR_VALUE?
>
> Obviously, to see the crash. And let the kernel _crash_. Isn't that what we need
> to see a bug as early as possible?
>
If the purpose is to see the bug as early as possible, then getting
"(efault)" from reading the sysfs attribute would serve that purpose, right?
The fact that an OOB pointer has already turned into either NULL or a
non-canonical value implies that *if* kernel code other than
vsprintf() references the pointer, it'll crash elsewhere; but *if* no
other code references the pointer, why crash?
thanks,
-jane
* Re: [PATCH] vsprintf: protect kernel from panic due to non-canonical pointer dereference
From: Andy Shevchenko @ 2022-10-19 19:26 UTC
To: Jane Chu
Cc: Petr Mladek, rostedt, senozhatsky, linux, linux-kernel, linux-mm,
Haakon Bugge, John Haxby
On Wed, Oct 19, 2022 at 06:36:07PM +0000, Jane Chu wrote:
> On 10/18/2022 1:49 PM, Andy Shevchenko wrote:
> > On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:
...
>
> If the purpose is to see the bug as early as possible, then getting
> "(efault)" from reading the sysfs attribute would serve that purpose, right?
>
> The fact that an OOB pointer has already turned into either NULL or a
> non-canonical value implies that *if* kernel code other than
> vsprintf() references the pointer, it'll crash elsewhere;
No, that's not the case for error pointers and NULL.
> but *if* no
> other code references the pointer, why crash?
Because how else can you see the bug?! The trace gives you essential
information about registers, etc., which hints at the _cause_ of the
crash. And we need that cause. "(efault)" is nowhere near what a
crash gives us.
So, this is my last message in the discussion.
Here is a formal NAK. Up to maintainers to decide what to do with this.
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH] vsprintf: protect kernel from panic due to non-canonical pointer dereference
From: Jane Chu @ 2022-10-19 20:16 UTC
To: Andy Shevchenko
Cc: Petr Mladek, rostedt, senozhatsky, linux, linux-kernel, linux-mm,
Haakon Bugge, John Haxby, Jane Chu
On 10/19/2022 12:26 PM, Andy Shevchenko wrote:
> On Wed, Oct 19, 2022 at 06:36:07PM +0000, Jane Chu wrote:
...
>> If the purpose is to see the bug as early as possible, then getting
>> "(efault)" from reading the sysfs attribute would serve that purpose, right?
>>
>> The fact that an OOB pointer has already turned into either NULL or a
>> non-canonical value implies that *if* kernel code other than
>> vsprintf() references the pointer, it'll crash elsewhere;
>
> No, that's not the case for error pointers and NULL.
Sorry, I don't understand. What about an Oops from a NULL pointer dereference?
>
>> but *if* no
>> other code references the pointer, why crash?
>
> Because how else can you see the bug?! The trace gives you essential
> information about registers, etc., which hints at the _cause_ of the
> crash. And we need that cause. "(efault)" is nowhere near what a
> crash gives us.
>
> So, this is my last message in the discussion.
>
> Here is a formal NAK. Up to maintainers to decide what to do with this.
>
Sigh, but thanks for taking the time to articulate your point of view.
-jane
* Re: [PATCH] vsprintf: protect kernel from panic due to non-canonical pointer dereference
From: Petr Mladek @ 2022-10-20 7:44 UTC
To: Andy Shevchenko
Cc: Jane Chu, rostedt, senozhatsky, linux, linux-kernel, linux-mm,
Haakon Bugge, John Haxby
On Tue 2022-10-18 23:49:27, Andy Shevchenko wrote:
> On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:
...
> > >> The bug was that the index into the array was computed incorrectly,
> > >> leading to an OOB access, e.g. devX_attrY[11]. The fix was to correct the
> > >> calculation, and that fix is not upstream.
I see. printk()/vsprintf() is the only code that accesses this pointer.
If vsprintf() survives, then the system survives.
> > As you can see, if the OOB pointers are NULL, "(null)" is printed thanks to the
> > existing check, but when the OOB pointers are non-canonical, which
> > is detectable, the fact that the pointer values deviate from
> > (ffffffff84d60aee + 4 * sizeof(void *))
> > clearly shows that the OOB pointers are detectable.
> >
> > The question then is why should the non-canonical OOBs be treated
> > differently from NULL and ERR_VALUE?
>
> Obviously, to see the crash. And let the kernel _crash_. Isn't that what we need
> to see a bug as early as possible?
I do not agree here. The kernel tries to survive many situations where
things do not work as expected. It prints a warning so that
users/developers are aware of the problem and can fix it.
In our case, the crash happened when reading a sysfs file.
IMHO, it is much better to show (-EINVAL) than to crash. The bug
when accessing devX_attrY[] does not affect the stability of
the system at all.
And the broken string might be passed only in a very rare case,
e.g. in an error path, so it might be hard to catch
during testing.
Best Regards,
Petr
* Re: [PATCH] vsprintf: protect kernel from panic due to non-canonical pointer dereference
From: Petr Mladek @ 2022-10-20 9:18 UTC
To: Andy Shevchenko
Cc: Jane Chu, rostedt, senozhatsky, linux, linux-kernel, linux-mm,
Haakon Bugge, John Haxby
On Thu 2022-10-20 09:44:06, Petr Mladek wrote:
> On Tue 2022-10-18 23:49:27, Andy Shevchenko wrote:
> > On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:
...
> I see. printk()/vsprintf() is the only code that accesses this pointer.
> If vsprintf() survives than the system survives.
>
> > > As you can see, when the OOB entries are NULL, "(null)" is printed thanks
> > > to the existing check. When they are non-canonical, they are just as
> > > detectable: the fact that the pointer values deviate from
> > > (ffffffff84d60aee + 4 * sizeof(void *))
> > > clearly shows that the OOBs are detectable.
> > >
> > > The question then is: why should non-canonical OOBs be treated
> > > differently from NULL and ERR_VALUE?
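For context, a minimal sketch of what "non-canonical" means here, assuming x86_64 with 4-level paging (48-bit virtual addresses): bits 63..47 must all be copies of bit 47. The helper is_canonical_48() is hypothetical and is not a kernel API. It simply classifies the bad pointer from the report (0x7665645f63616465) and the array address quoted above (ffffffff84d60aee).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: with 48 implemented virtual-address bits, an
 * x86_64 address is canonical when bits 63..47 are all zero (user half)
 * or all one (kernel half). 5-level paging (57 bits) would shift by 56.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits 63..47 */

	return top == 0 || top == 0x1FFFF;
}
```

On x86_64, dereferencing a non-canonical address raises a general protection fault rather than a page fault, which matches the GPF in the original report.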
> >
> > Obviously, to see the crash. And to let the kernel _crash_. Isn't that what
> > we need to see a bug as early as possible?
>
> I do not agree here. The kernel tries to survive many situations when
> things do not work as expected. It prints a warning so that
> users/developers are aware of the problem and can fix it.
>
> In our case, the crash happened when reading a sysfs file.
> IMHO, it is much better to show (-EINVAL) than crash. The bug
> when accessing devX_attrY[] does not affect the stability of
> the system at all.
>
> And the broken string might be passed in very rare cases,
> e.g. in an error path, so it might be hard to catch
> during testing.
That said, there is definitely a difference between a NULL or error code
and a random pointer address.
Pointers in the ERR range are likely to stay in this range,
which means that such a pointer is hardly usable for a security
attack.
On the other hand, a "random" pointer has a bigger chance of being
used in a security attack. From this POV, it is more important
to catch and fix random pointer issues, and showing just -EINVAL
might not be enough to catch attention.
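To make the size of the "ERR range" concrete, here is a userspace re-creation of the existing vsprintf() check quoted at the top of the thread. It assumes a 64-bit (LP64) build, 4 KiB pages and MAX_ERRNO = 4095 (the value in include/linux/err.h). IS_ERR_VALUE() only matches the top 4095 addresses, so an error pointer stays in a tiny, well-known window. A random value such as 0x7665645f63616465 passes both tests and would be dereferenced.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define MAX_ERRNO 4095		/* the value used in include/linux/err.h */

/* Same comparison as the kernel's IS_ERR_VALUE(): the top MAX_ERRNO addresses. */
#define IS_ERR_VALUE(x) ((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

/*
 * Mirrors the existing checks: NULL prints "(null)", the first page and
 * the error range print "(efault)". NULL is returned when the pointer
 * passes both tests, i.e. vsprintf() would go on to dereference it.
 */
static const char *check_pointer_msg(const void *ptr)
{
	if (!ptr)
		return "(null)";
	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
		return "(efault)";
	return NULL;
}
```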
I guess that this was what Andy wanted to explain. A kernel
crash would definitely catch attention. Showing a warning
with KERN_WARNING or even WARN() might be an alternative.
Anyway, I think that this patch is not worth it:
+ kern_addr_valid() always succeeds on all architectures
except on x86_64. It means that the check would help
only on x86_64.
+ kern_addr_valid() always fails on x86 when built with SPARSEMEM.
This is not acceptable for vsprintf().
+ situations where only vsprintf() would access the wrong pointer
are rare. In most cases, the pointer is used later and the kernel
crashes anyway.
Best Regards,
Petr
* Re: [PATCH] vsprintf: protect kernel from panic due to non-canonical pointer dereference
2022-10-20 7:44 ` Petr Mladek
2022-10-20 9:18 ` Petr Mladek
@ 2022-10-20 13:57 ` Andy Shevchenko
1 sibling, 0 replies; 18+ messages in thread
From: Andy Shevchenko @ 2022-10-20 13:57 UTC (permalink / raw)
To: Petr Mladek
Cc: Jane Chu, rostedt, senozhatsky, linux, linux-kernel, linux-mm,
Haakon Bugge, John Haxby
On Thu, Oct 20, 2022 at 09:44:05AM +0200, Petr Mladek wrote:
> On Tue 2022-10-18 23:49:27, Andy Shevchenko wrote:
> > On Tue, Oct 18, 2022 at 08:30:01PM +0000, Jane Chu wrote:
...
> > Obviously, to see the crash. And to let the kernel _crash_. Isn't that what
> > we need to see a bug as early as possible?
>
> I do not agree here. The kernel tries to survive many situations when
> things do not work as expected. It prints a warning so that
> users/developers are aware of the problem and can fix it.
How will the user know what the root cause is and how to fix it? The crash
report will give all the needed information; the "(eXXXXXX)" will hide it all,
which I consider an inappropriate approach.
In other words, compare "(eXXXXXX)" with something like "your stuff crashed the kernel
because of a misaligned / etc. pointer which has the value 0xXXXXXXXX, and other
registers have these values", and so on...
> In our case, the crash happened when reading a sysfs file.
> IMHO, it is much better to show (-EINVAL) than crash. The bug
> when accessing devX_attrY[] does not affect the stability of
> the system at all.
When I get "eXXXXX" from cat /sys/... I think "OK, something went wrong,
I shouldn't really take it seriously". The feeling is completely different
when you get a crash, right?
> And the broken string might be passed in very rare cases,
> e.g. in an error path, so it might be hard to catch
> during testing.
--
With Best Regards,
Andy Shevchenko