All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] make 'user_access_begin()' do 'access_ok()'
       [not found] ` <CAHk-=whyNbpBtPyoS=wh4nVgBtUBpihcOT+LFdEw369kYjATaQ@mail.gmail.com>
@ 2019-01-06 19:18   ` Linus Torvalds
  2019-01-06 20:24     ` Guenter Roeck
  2019-01-07  2:39   ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2019-01-06 19:18 UTC (permalink / raw)
  To: Guenter Roeck, Matt Turner, Yoshinori Sato; +Cc: Linux List Kernel Mailing

[-- Attachment #1: Type: text/plain, Size: 2775 bytes --]

[ Re-sending the message because my first reply bounced - Guenther had
mis-typed the lkml address ]

On Sun, Jan 6, 2019 at 10:09 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> All alpha and sh4 (big and little endian) images fail to boot in qemu
> with this patch applied. Reverting it fixes the problem.

Funky. 99% of that patch is a complete no-op on non-x86.

The one exception is the strncpy_from_user() and strnlen_user() cases,
which didn't use to do access_ok() at all, and now essentially do.

But I think I see what may be the problem. I think the alpha version
of "access_ok()" is buggy.

Lookie here:

  #define __access_ok(addr, size) \
        ((get_fs().seg & (addr | size | (addr+size))) == 0)

and what it basically tests is of any of the high bits get set (the
USER_DS value is 0xfffffc0000000000).

And that's completely wrong for the "addr+size" check. It's off-by-one
for the case where we check to the very end of the user address space,
which is exactly what the strn*_user() functions do.

Why? Because "addr+size" will be exactly the size of the address
space, so trying to access the last byte of the user address space
will *fail* the __access_ok() check, even though it shouldn't.

So it's not really that that commit is buggy in itself, but it
triggers that off-by-one error in access_ok().

Side note: that alpha macro is buggy for another reason too: it
re-uses the arguments twice.

And SH has almost the exact same bug:

  #define __addr_ok(addr) \
        ((unsigned long __force)(addr) < current_thread_info()->addr_limit.seg)

so far so good: yes, a user address must be below the limit. But then:

  #define __access_ok(addr, size)         \
        (__addr_ok((addr) + (size)))

is wrong with the exact same off-by-one case: the case when
"addr+size" is exactly _equal_ to the limit is actually perfectly
fine.

The SH version is actually seriously buggy in another way: it doesn't
actually check for overflow, even though it did copy the _comment_
that talks about overflow.

So it turns out that both SH and alpha actually have completely
buggered implementations of access_ok(), but they happened to work
(although the SH overflow one is a serious serious security bug, not
that anybody likely cares about SH security)

Ho humm.

Maybe something like the attached patch? Entirely untested, I don't
have a cross-build environment, much less a boot setup.

It isn't trying to be clever, the end address is based on this logic:

        unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;    \

which basically says "subtract one unless the length was zero".

For a lot of access_ok() users the length is a constant, so this isn't
actually as expensive as it initially looks.

Does that fix things for you?

                   Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/x-patch, Size: 1669 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] make 'user_access_begin()' do 'access_ok()'
  2019-01-06 19:18   ` [PATCH] make 'user_access_begin()' do 'access_ok()' Linus Torvalds
@ 2019-01-06 20:24     ` Guenter Roeck
  0 siblings, 0 replies; 7+ messages in thread
From: Guenter Roeck @ 2019-01-06 20:24 UTC (permalink / raw)
  To: Linus Torvalds, Matt Turner, Yoshinori Sato; +Cc: Linux List Kernel Mailing

On 1/6/19 11:18 AM, Linus Torvalds wrote:
> [ Re-sending the message because my first reply bounced - Guenther had
> mis-typed the lkml address ]
> 

Sigh. That _always_ happens to me when typing fast. Sorry.

> On Sun, Jan 6, 2019 at 10:09 AM Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> All alpha and sh4 (big and little endian) images fail to boot in qemu
>> with this patch applied. Reverting it fixes the problem.
> 
> Funky. 99% of that patch is a complete no-op on non-x86.
> 
> The one exception is the strncpy_from_user() and strnlen_user() cases,
> which didn't use to do access_ok() at all, and now essentially do.
> 
> But I think I see what may be the problem. I think the alpha version
> of "access_ok()" is buggy.
> 
> Lookie here:
> 
>    #define __access_ok(addr, size) \
>          ((get_fs().seg & (addr | size | (addr+size))) == 0)
> 
> and what it basically tests is of any of the high bits get set (the
> USER_DS value is 0xfffffc0000000000).
> 
> And that's completely wrong for the "addr+size" check. It's off-by-one
> for the case where we check to the very end of the user address space,
> which is exactly what the strn*_user() functions do.
> 
> Why? Because "addr+size" will be exactly the size of the address
> space, so trying to access the last byte of the user address space
> will *fail* the __access_ok() check, even though it shouldn't.
> 
> So it's not really that that commit is buggy in itself, but it
> triggers that off-by-one error in access_ok().
> 
> Side note: that alpha macro is buggy for another reason too: it
> re-uses the arguments twice.
> 
> And SH has almost the exact same bug:
> 
>    #define __addr_ok(addr) \
>          ((unsigned long __force)(addr) < current_thread_info()->addr_limit.seg)
> 
> so far so good: yes, a user address must be below the limit. But then:
> 
>    #define __access_ok(addr, size)         \
>          (__addr_ok((addr) + (size)))
> 
> is wrong with the exact same off-by-one case: the case when
> "addr+size" is exactly _equal_ to the limit is actually perfectly
> fine.
> 
> The SH version is actually seriously buggy in another way: it doesn't
> actually check for overflow, even though it did copy the _comment_
> that talks about overflow.
> 
> So it turns out that both SH and alpha actually have completely
> buggered implementations of access_ok(), but they happened to work
> (although the SH overflow one is a serious serious security bug, not
> that anybody likely cares about SH security)
> 
> Ho humm.
> 
> Maybe something like the attached patch? Entirely untested, I don't
> have a cross-build environment, much less a boot setup.
> 
> It isn't trying to be clever, the end address is based on this logic:
> 
>          unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;    \
> 
> which basically says "subtract one unless the length was zero".
> 
> For a lot of access_ok() users the length is a constant, so this isn't
> actually as expensive as it initially looks.
> 
> Does that fix things for you?
> 

Yes, it does, for both alpha and sh (little and big endian).

Tested-by: Guenter Roeck <linux@roeck-us.net>

Guenter

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] make 'user_access_begin()' do 'access_ok()'
       [not found] ` <CAHk-=whyNbpBtPyoS=wh4nVgBtUBpihcOT+LFdEw369kYjATaQ@mail.gmail.com>
  2019-01-06 19:18   ` [PATCH] make 'user_access_begin()' do 'access_ok()' Linus Torvalds
@ 2019-01-07  2:39   ` Linus Torvalds
  2019-01-07  4:05     ` Guenter Roeck
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2019-01-07  2:39 UTC (permalink / raw)
  To: Guenter Roeck, Matt Turner, Yoshinori Sato, Paul Burton,
	Greentime Hu, Ley Foon Tan, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Chris Zankel, Max Filippov
  Cc: linux-kernel, linux-arch

On Sun, Jan 6, 2019 at 11:15 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But I think I see what may be the problem. I think the alpha version
> of "access_ok()" is buggy.

Ok, so the alpha and SH cases got fixed (hopefully correctly) by
commit 94bd8a05cd4d ("Fix 'acccess_ok()' on alpha and SH") but I'm
looking around, and finding some suspicious access_ok() cases
elsewhere too.

Adding a few more people just to ask them to check their situation..

MIPS does a very similar

      return (get_fs().seg & (addr | (addr + size) | __ua_size(size))) == 0;

thing to what alpha used to do, where "addr+size" may have the same
off-by-one error. At least MIPS uses an inline function, so it doesn't
have the "arguments used twice" issue.

nds32 seems to get the range check right, but has the "macro arguments
used twice" problem.

nios2 seems to have all the bugs alpha had.

openrisc has the "macro arguments used twice" problem. And also gets
the parenthesis *completely* wrong when casting, resulting in random
behavior. This code:

  #define access_ok(addr, size) \
      __range_ok((unsigned long)addr, (unsigned long)size)

does all kinds of odd things if "addr" or "size" is not a simple
expression, since the cast tends to have higher precedence than pretty
much anything else.

xtensa has the "macro arguments used twice" problem.

So it looks like a lot of architectures have problems in access_ok.
Can we have people look at it?

See that commit 94bd8a05cd4d for a longer explanation of what alpha
and SH got wrong.

NOTE! I only took a very quick look, the above may be incomplete
and/or actively wrong. Maybe I claimed something was buggy that
wasn't, but please take a look.

              Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] make 'user_access_begin()' do 'access_ok()'
  2019-01-07  2:39   ` Linus Torvalds
@ 2019-01-07  4:05     ` Guenter Roeck
  2019-01-07 18:02       ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Guenter Roeck @ 2019-01-07  4:05 UTC (permalink / raw)
  To: Linus Torvalds, Matt Turner, Yoshinori Sato, Paul Burton,
	Greentime Hu, Ley Foon Tan, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Chris Zankel, Max Filippov
  Cc: linux-kernel, linux-arch

On 1/6/19 6:39 PM, Linus Torvalds wrote:
> On Sun, Jan 6, 2019 at 11:15 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> But I think I see what may be the problem. I think the alpha version
>> of "access_ok()" is buggy.
> 
> Ok, so the alpha and SH cases got fixed (hopefully correctly) by
> commit 94bd8a05cd4d ("Fix 'acccess_ok()' on alpha and SH") but I'm
> looking around, and finding some suspicious access_ok() cases
> elsewhere too.
> 
> Adding a few more people just to ask them to check their situation..
> 
> MIPS does a very similar
> 
>        return (get_fs().seg & (addr | (addr + size) | __ua_size(size))) == 0;
> 
> thing to what alpha used to do, where "addr+size" may have the same
> off-by-one error. At least MIPS uses an inline function, so it doesn't
> have the "arguments used twice" issue.
> 
> nds32 seems to get the range check right, but has the "macro arguments
> used twice" problem.
> 
> nios2 seems to have all the bugs alpha had.
> 
> openrisc has the "macro arguments used twice" problem. And also gets
> the parenthesis *completely* wrong when casting, resulting in random
> behavior. This code:
> 
>    #define access_ok(addr, size) \
>        __range_ok((unsigned long)addr, (unsigned long)size)
> 
> does all kinds of odd things if "addr" or "size" is not a simple
> expression, since the cast tends to have higher precedence than pretty
> much anything else.
> 
> xtensa has the "macro arguments used twice" problem.
> 
> So it looks like a lot of architectures have problems in access_ok.
> Can we have people look at it?
> 

Of the above, my test system boots images for the following architectures
in qemu.

- mips (32/64 bit, big/little endian)
- nios2
- openrisc
- xtensa (mmu and nommu)

None of those experiences boot failures, so whatever is wrong there
doesn't cause an immediate crash. I'll be happy to run some manual
tests if someone tells me what to do.

Also, I'll be happy to provide configurations, root file systems,
and qemu command lines as needed.

Guenter

> See that commit 94bd8a05cd4d for a longer explanation of what alpha
> and SH got wrong.
> 
> NOTE! I only took a very quick look, the above may be incomplete
> and/or actively wrong. Maybe I claimed something was buggy that
> wasn't, but please take a look.
> 
>                Linus
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] make 'user_access_begin()' do 'access_ok()'
  2019-01-07  4:05     ` Guenter Roeck
@ 2019-01-07 18:02       ` Linus Torvalds
  2019-01-07 18:05         ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2019-01-07 18:02 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Matt Turner, Yoshinori Sato, Paul Burton, Greentime Hu,
	Ley Foon Tan, Jonas Bonn, Stefan Kristiansson, Stafford Horne,
	Chris Zankel, Max Filippov, linux-kernel, linux-arch

On Sun, Jan 6, 2019 at 8:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> Of the above, my test system boots images for the following architectures
> in qemu.
>
> - mips (32/64 bit, big/little endian)
> - nios2
> - openrisc
> - xtensa (mmu and nommu)

So most of those are "only" the "macro arguments used twice" problem
(although openrisc also does the "arguments not quoted right"). That
doesn't cause problems with the new commit, it's an independent issue
that could cause random problems elsewhere

The nios2 access_ok() case is the same bug as alpha has, but it turns
out to be hidden by the fact that the user/kernel limit is at
0x80000000, but nios2 does:

    # define TASK_SIZE           0x7FFF0000UL

so it doesn't actually allow anything close to the top of the user
address space anyway. So the access_ok() check uses a different limit
than the TASK_SIZE, which is odd, but does hide the "last byte of the
user address space" bug.

That may be intentional, and regardless, it's generally a good idea to
not allow mapping of the last page in the address space, exactly to
avoid the border conditions.

MIPS has some of the same "saved by mistake" behavior, but in that
case it really looks to be pure luck, not design. In particular,
MIPS32 has

  #ifdef CONFIG_32BIT
  #ifdef CONFIG_KVM_GUEST
  /* User space process size is limited to 1GB in KVM Guest Mode */
  #define TASK_SIZE       0x3fff8000UL
  #else
  /*
   * User space process size: 2GB. This is hardcoded into a few places,
   * so don't change it unless you know what you are doing.
   */
  #define TASK_SIZE       0x80000000UL
  #endif

and I suspect you tested MIPS32 with that KVM_GUEST config option.

Because MIPS32 with TASK_SIZE 0x80000000UL really looks like it has
the off-by-one error that I think makes access_ok() fail for the "last
byte of the user address space" case.

HOWEVER. MIPS32 is actually going to boot for that case with the
recent patches for a simple other accidental reason: despite the
access_ok() bug, it will never trigger it in strncpy_from_user(). Why?
Because MIPS doesn't use the generic version, but its own hardcoded
assembler one.

I suspect the MIPS assembler version is actually *worse* than the
generic one (it looks like it does things one byte at a time), but it
hides the bug in access_ok()...

The other architctures you tested only have the "technically wrong,
but works" bugs that don't matter for the new stricter access_ok()
tests.

                Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] make 'user_access_begin()' do 'access_ok()'
  2019-01-07 18:02       ` Linus Torvalds
@ 2019-01-07 18:05         ` Linus Torvalds
  2019-01-07 21:49           ` Stafford Horne
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2019-01-07 18:05 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Matt Turner, Yoshinori Sato, Paul Burton, Greentime Hu,
	Ley Foon Tan, Jonas Bonn, Stefan Kristiansson, Stafford Horne,
	Chris Zankel, Max Filippov, linux-arch,
	Linux List Kernel Mailing

Gaah. Re-sending this for the kernel mailing list just for posterity.
I keep replying to emails that had the mailing list address wrong, and
then my reply will have it wrong too.

I will learn to fix up the address just in time for this thread to die
out, I suspect.

                Linus

On Mon, Jan 7, 2019 at 10:02 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, Jan 6, 2019 at 8:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > Of the above, my test system boots images for the following architectures
> > in qemu.
> >
> > - mips (32/64 bit, big/little endian)
> > - nios2
> > - openrisc
> > - xtensa (mmu and nommu)
>
> So most of those are "only" the "macro arguments used twice" problem
> (although openrisc also does the "arguments not quoted right"). That
> doesn't cause problems with the new commit, it's an independent issue
> that could cause random problems elsewhere
>
> The nios2 access_ok() case is the same bug as alpha has, but it turns
> out to be hidden by the fact that the user/kernel limit is at
> 0x80000000, but nios2 does:
>
>     # define TASK_SIZE           0x7FFF0000UL
>
> so it doesn't actually allow anything close to the top of the user
> address space anyway. So the access_ok() check uses a different limit
> than the TASK_SIZE, which is odd, but does hide the "last byte of the
> user address space" bug.
>
> That may be intentional, and regardless, it's generally a good idea to
> not allow mapping of the last page in the address space, exactly to
> avoid the border conditions.
>
> MIPS has some of the same "saved by mistake" behavior, but in that
> case it really looks to be pure luck, not design. In particular,
> MIPS32 has
>
>   #ifdef CONFIG_32BIT
>   #ifdef CONFIG_KVM_GUEST
>   /* User space process size is limited to 1GB in KVM Guest Mode */
>   #define TASK_SIZE       0x3fff8000UL
>   #else
>   /*
>    * User space process size: 2GB. This is hardcoded into a few places,
>    * so don't change it unless you know what you are doing.
>    */
>   #define TASK_SIZE       0x80000000UL
>   #endif
>
> and I suspect you tested MIPS32 with that KVM_GUEST config option.
>
> Because MIPS32 with TASK_SIZE 0x80000000UL really looks like it has
> the off-by-one error that I think makes access_ok() fail for the "last
> byte of the user address space" case.
>
> HOWEVER. MIPS32 is actually going to boot for that case with the
> recent patches for a simple other accidental reason: despite the
> access_ok() bug, it will never trigger it in strncpy_from_user(). Why?
> Because MIPS doesn't use the generic version, but its own hardcoded
> assembler one.
>
> I suspect the MIPS assembler version is actually *worse* than the
> generic one (it looks like it does things one byte at a time), but it
> hides the bug in access_ok()...
>
> The other architctures you tested only have the "technically wrong,
> but works" bugs that don't matter for the new stricter access_ok()
> tests.
>
>                 Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] make 'user_access_begin()' do 'access_ok()'
  2019-01-07 18:05         ` Linus Torvalds
@ 2019-01-07 21:49           ` Stafford Horne
  0 siblings, 0 replies; 7+ messages in thread
From: Stafford Horne @ 2019-01-07 21:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Matt Turner, Yoshinori Sato, Paul Burton,
	Greentime Hu, Ley Foon Tan, Jonas Bonn, Stefan Kristiansson,
	Chris Zankel, Max Filippov, linux-arch,
	Linux List Kernel Mailing

On Mon, Jan 07, 2019 at 10:05:12AM -0800, Linus Torvalds wrote:
> Gaah. Re-sending this for the kernel mailing list just for posterity.
> I keep replying to emails that had the mailing list address wrong, and
> then my reply will have it wrong too.
> 
> I will learn to fix up the address just in time for this thread to die
> out, I suspect.
> 
>                 Linus
> 
> On Mon, Jan 7, 2019 at 10:02 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Sun, Jan 6, 2019 at 8:05 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > >
> > > Of the above, my test system boots images for the following architectures
> > > in qemu.
> > >
> > > - mips (32/64 bit, big/little endian)
> > > - nios2
> > > - openrisc
> > > - xtensa (mmu and nommu)
> >
> > So most of those are "only" the "macro arguments used twice" problem
> > (although openrisc also does the "arguments not quoted right"). That
> > doesn't cause problems with the new commit, it's an independent issue
> > that could cause random problems elsewhere

Linus,  Thanks for pointing this out and Guenter thanks for testing.

I ack the issue on OpenRISC and will send a patch for review in a day or so.
I'll also do an audit of openrisc to see if we have similar issues.

Next to access_ok() is __addr_ok() which seems to have the similar quoting
issue, but I don't see it being used anywhere, it's also defined and not
used on: arm, x86 and csky.   It is used on SH.

OK to remove __addr_ok()?

-Stafford

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-01-07 21:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190106180927.GA11993@roeck-us.net>
     [not found] ` <CAHk-=whyNbpBtPyoS=wh4nVgBtUBpihcOT+LFdEw369kYjATaQ@mail.gmail.com>
2019-01-06 19:18   ` [PATCH] make 'user_access_begin()' do 'access_ok()' Linus Torvalds
2019-01-06 20:24     ` Guenter Roeck
2019-01-07  2:39   ` Linus Torvalds
2019-01-07  4:05     ` Guenter Roeck
2019-01-07 18:02       ` Linus Torvalds
2019-01-07 18:05         ` Linus Torvalds
2019-01-07 21:49           ` Stafford Horne

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.