All of lore.kernel.org
 help / color / mirror / Atom feed
* vDSO maximum stack usage, stack probes, and -fstack-check
@ 2017-11-10 10:40 ` Hector Martin 'marcan'
  0 siblings, 0 replies; 18+ messages in thread
From: Hector Martin 'marcan' @ 2017-11-10 10:40 UTC (permalink / raw)
  To: luto; +Cc: LKML, kernel-hardening, x86

As far as I know, the vDSO specs (both Documentation/ABI/stable/vdso and
`man 7 vdso`) make no mention of how much stack the vDSO functions are
allowed to use. They just say "the usual C ABI", which makes no guarantees.

It turns out that Go has been assuming that those functions use less
than 104 bytes of stack space, because it calls them directly on its
tiny stack allocations with no guard pages or other hardware overflow
protection [1]. On most systems, this is fine.

However, on my system the stars aligned and turned it into a
nondeterministic crash. I use Gentoo Hardened, which builds its
toolchain with -fstack-check on by default. It turns out that with the
combination of GCC 6.4.0, -fstack-protect, linux-4.13.9-gentoo, and
CONFIG_OPTIMIZE_INLINING=n, gcc decides to *not* inline vread_tsc (it's
not marked inline, so it's perfectly within its right not to do that,
though for some reason it does inline when CONFIG_OPTIMIZE_INLINING=y
even though that nominally gives it greater freedom *not* to inline
things marked inline). That turns __vdso_clock_gettime and
__vdso_gettimeofday into non-leaf functions, and GCC then inserts a
stack probe (full objdump at [2]):

0000000000000030 <__vdso_clock_gettime>:
  30:	55                   	push   %rbp
  31:	48 89 e5             	mov    %rsp,%rbp
  34:	48 81 ec 20 10 00 00 	sub    $0x1020,%rsp
  3b:	48 83 0c 24 00       	orq    $0x0,(%rsp)
  40:	48 81 c4 20 10 00 00 	add    $0x1020,%rsp

That silently overflows the Go stack. "orq 0" does nothing as long as
the page is mapped, but it's not atomic. It turns out that sometimes
(pretty often on my box) that races another thread accessing the same
location and corrupts memory.

The stack probe sounds unnecessary, since it only calls vread_tsc and
that can't ever skip over more than a page of stack. In fact I don't
even know why it does the probe; I thought the point of stack probes was
to poke the stack on allocations >4K to ensure the guard page isn't
skipped, but none of these functions use more than a few bytes of stack
space. Nonetheless, none of this is wrong per se; the current vDSO spec
makes no guarantees about stack usage.

The question is, should it? Should the vDSO spec set a hard limit on
stack consumption that userspace can rely on, and perhaps inline
everything and/or disable -fstack-check to avoid the stack probes?

[1] https://github.com/golang/go/issues/20427#issuecomment-343255844
[2] https://marcan.st/paste/HCVuLG6T.txt

-- 
Hector Martin "marcan" (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [kernel-hardening] vDSO maximum stack usage, stack probes, and -fstack-check
@ 2017-11-10 10:40 ` Hector Martin 'marcan'
  0 siblings, 0 replies; 18+ messages in thread
From: Hector Martin 'marcan' @ 2017-11-10 10:40 UTC (permalink / raw)
  To: luto; +Cc: LKML, kernel-hardening, x86

As far as I know, the vDSO specs (both Documentation/ABI/stable/vdso and
`man 7 vdso`) make no mention of how much stack the vDSO functions are
allowed to use. They just say "the usual C ABI", which makes no guarantees.

It turns out that Go has been assuming that those functions use less
than 104 bytes of stack space, because it calls them directly on its
tiny stack allocations with no guard pages or other hardware overflow
protection [1]. On most systems, this is fine.

However, on my system the stars aligned and turned it into a
nondeterministic crash. I use Gentoo Hardened, which builds its
toolchain with -fstack-check on by default. It turns out that with the
combination of GCC 6.4.0, -fstack-protect, linux-4.13.9-gentoo, and
CONFIG_OPTIMIZE_INLINING=n, gcc decides to *not* inline vread_tsc (it's
not marked inline, so it's perfectly within its right not to do that,
though for some reason it does inline when CONFIG_OPTIMIZE_INLINING=y
even though that nominally gives it greater freedom *not* to inline
things marked inline). That turns __vdso_clock_gettime and
__vdso_gettimeofday into non-leaf functions, and GCC then inserts a
stack probe (full objdump at [2]):

0000000000000030 <__vdso_clock_gettime>:
  30:	55                   	push   %rbp
  31:	48 89 e5             	mov    %rsp,%rbp
  34:	48 81 ec 20 10 00 00 	sub    $0x1020,%rsp
  3b:	48 83 0c 24 00       	orq    $0x0,(%rsp)
  40:	48 81 c4 20 10 00 00 	add    $0x1020,%rsp

That silently overflows the Go stack. "orq 0" does nothing as long as
the page is mapped, but it's not atomic. It turns out that sometimes
(pretty often on my box) that races another thread accessing the same
location and corrupts memory.

The stack probe sounds unnecessary, since it only calls vread_tsc and
that can't ever skip over more than a page of stack. In fact I don't
even know why it does the probe; I thought the point of stack probes was
to poke the stack on allocations >4K to ensure the guard page isn't
skipped, but none of these functions use more than a few bytes of stack
space. Nonetheless, none of this is wrong per se; the current vDSO spec
makes no guarantees about stack usage.

The question is, should it? Should the vDSO spec set a hard limit on
stack consumption that userspace can rely on, and perhaps inline
everything and/or disable -fstack-check to avoid the stack probes?

[1] https://github.com/golang/go/issues/20427#issuecomment-343255844
[2] https://marcan.st/paste/HCVuLG6T.txt

-- 
Hector Martin "marcan" (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: vDSO maximum stack usage, stack probes, and -fstack-check
  2017-11-10 10:40 ` [kernel-hardening] " Hector Martin 'marcan'
@ 2017-11-10 14:57   ` Andy Lutomirski
  -1 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2017-11-10 14:57 UTC (permalink / raw)
  To: Hector Martin 'marcan'; +Cc: LKML, kernel-hardening, X86 ML

On Fri, Nov 10, 2017 at 2:40 AM, Hector Martin 'marcan'
<marcan@marcan.st> wrote:
> As far as I know, the vDSO specs (both Documentation/ABI/stable/vdso and
> `man 7 vdso`) make no mention of how much stack the vDSO functions are
> allowed to use. They just say "the usual C ABI", which makes no guarantees.
>
> It turns out that Go has been assuming that those functions use less
> than 104 bytes of stack space, because it calls them directly on its
> tiny stack allocations with no guard pages or other hardware overflow
> protection [1]. On most systems, this is fine.
>
> However, on my system the stars aligned and turned it into a
> nondeterministic crash. I use Gentoo Hardened, which builds its
> toolchain with -fstack-check on by default. It turns out that with the
> combination of GCC 6.4.0, -fstack-protect, linux-4.13.9-gentoo, and
> CONFIG_OPTIMIZE_INLINING=n, gcc decides to *not* inline vread_tsc (it's
> not marked inline, so it's perfectly within its right not to do that,
> though for some reason it does inline when CONFIG_OPTIMIZE_INLINING=y
> even though that nominally gives it greater freedom *not* to inline
> things marked inline). That turns __vdso_clock_gettime and
> __vdso_gettimeofday into non-leaf functions, and GCC then inserts a
> stack probe (full objdump at [2]):
>
> 0000000000000030 <__vdso_clock_gettime>:
>   30:   55                      push   %rbp
>   31:   48 89 e5                mov    %rsp,%rbp
>   34:   48 81 ec 20 10 00 00    sub    $0x1020,%rsp
>   3b:   48 83 0c 24 00          orq    $0x0,(%rsp)
>   40:   48 81 c4 20 10 00 00    add    $0x1020,%rsp

This code is so wrong I don't even no where to start.  Seriously, sub,
orq, add?  How about just orq with an offset?  How about a *load*
instead of a store?

But stepping back even further, an offset > 4096 is just bogus.
That's big enough to skip right over the guard page.

Anyway, my recollection is that GCC's stack check code is busted until
much newer gcc versions.  I suppose we could try to make the kernel
fail to build at all on a broken configuration like this.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
@ 2017-11-10 14:57   ` Andy Lutomirski
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2017-11-10 14:57 UTC (permalink / raw)
  To: Hector Martin 'marcan'; +Cc: LKML, kernel-hardening, X86 ML

On Fri, Nov 10, 2017 at 2:40 AM, Hector Martin 'marcan'
<marcan@marcan.st> wrote:
> As far as I know, the vDSO specs (both Documentation/ABI/stable/vdso and
> `man 7 vdso`) make no mention of how much stack the vDSO functions are
> allowed to use. They just say "the usual C ABI", which makes no guarantees.
>
> It turns out that Go has been assuming that those functions use less
> than 104 bytes of stack space, because it calls them directly on its
> tiny stack allocations with no guard pages or other hardware overflow
> protection [1]. On most systems, this is fine.
>
> However, on my system the stars aligned and turned it into a
> nondeterministic crash. I use Gentoo Hardened, which builds its
> toolchain with -fstack-check on by default. It turns out that with the
> combination of GCC 6.4.0, -fstack-protect, linux-4.13.9-gentoo, and
> CONFIG_OPTIMIZE_INLINING=n, gcc decides to *not* inline vread_tsc (it's
> not marked inline, so it's perfectly within its right not to do that,
> though for some reason it does inline when CONFIG_OPTIMIZE_INLINING=y
> even though that nominally gives it greater freedom *not* to inline
> things marked inline). That turns __vdso_clock_gettime and
> __vdso_gettimeofday into non-leaf functions, and GCC then inserts a
> stack probe (full objdump at [2]):
>
> 0000000000000030 <__vdso_clock_gettime>:
>   30:   55                      push   %rbp
>   31:   48 89 e5                mov    %rsp,%rbp
>   34:   48 81 ec 20 10 00 00    sub    $0x1020,%rsp
>   3b:   48 83 0c 24 00          orq    $0x0,(%rsp)
>   40:   48 81 c4 20 10 00 00    add    $0x1020,%rsp

This code is so wrong I don't even no where to start.  Seriously, sub,
orq, add?  How about just orq with an offset?  How about a *load*
instead of a store?

But stepping back even further, an offset > 4096 is just bogus.
That's big enough to skip right over the guard page.

Anyway, my recollection is that GCC's stack check code is busted until
much newer gcc versions.  I suppose we could try to make the kernel
fail to build at all on a broken configuration like this.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
  2017-11-10 14:57   ` [kernel-hardening] " Andy Lutomirski
@ 2017-11-10 16:02     ` Hector Martin 'marcan'
  -1 siblings, 0 replies; 18+ messages in thread
From: Hector Martin 'marcan' @ 2017-11-10 16:02 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, kernel-hardening, X86 ML

On 2017-11-10 23:57, Andy Lutomirski wrote:
> This code is so wrong I don't even no where to start.  Seriously, sub,
> orq, add?  How about just orq with an offset?  How about a *load*
> instead of a store?

Stores should be cheaper than loads (since they don't stall), but
apparently the rationale for using orq is:

gcc/config/i386/i386.md: ;; Use IOR for stack probes, this is shorter.

Saves bytes I guess? Though being read-modify-write it probably hurts
performance; I don't know what real CPUs would do with it.

I suspect the sub, add is there to guarantee that the stack pointer is
actually below the probed location. IIRC the x86-64 ABI specifies a
128-byte redzone that you can freely mess with; going beyond that would
require actually changing the stack pointer.

> But stepping back even further, an offset > 4096 is just bogus.
> That's big enough to skip right over the guard page.

The code (gcc/config/i386/i386.c) says:

  /* We skip the probe for the first interval + a small dope of 4 words
     and probe that many bytes past the specified size to maintain a
     protection area at the botton of the stack.  */

Not entirely sure what's going on here.

OTOH I'm not sure why it's probing at all, since AIUI it only needs to
probe for stack frames >4k to begin with.

> Anyway, my recollection is that GCC's stack check code is busted until
> much newer gcc versions.  I suppose we could try to make the kernel
> fail to build at all on a broken configuration like this.

Well, the original point still stands. Even if what GCC is doing is
stupid here, it's not illegal (it's just eating stack space), and the
kernel still currently makes no guarantees about that. So I think the
conversation regarding vDSO stack usage guarantees is still worth having.

-- 
Hector Martin "marcan" (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
@ 2017-11-10 16:02     ` Hector Martin 'marcan'
  0 siblings, 0 replies; 18+ messages in thread
From: Hector Martin 'marcan' @ 2017-11-10 16:02 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, kernel-hardening, X86 ML

On 2017-11-10 23:57, Andy Lutomirski wrote:
> This code is so wrong I don't even no where to start.  Seriously, sub,
> orq, add?  How about just orq with an offset?  How about a *load*
> instead of a store?

Stores should be cheaper than loads (since they don't stall), but
apparently the rationale for using orq is:

gcc/config/i386/i386.md: ;; Use IOR for stack probes, this is shorter.

Saves bytes I guess? Though being read-modify-write it probably hurts
performance; I don't know what real CPUs would do with it.

I suspect the sub, add is there to guarantee that the stack pointer is
actually below the probed location. IIRC the x86-64 ABI specifies a
128-byte redzone that you can freely mess with; going beyond that would
require actually changing the stack pointer.

> But stepping back even further, an offset > 4096 is just bogus.
> That's big enough to skip right over the guard page.

The code (gcc/config/i386/i386.c) says:

  /* We skip the probe for the first interval + a small dope of 4 words
     and probe that many bytes past the specified size to maintain a
     protection area at the botton of the stack.  */

Not entirely sure what's going on here.

OTOH I'm not sure why it's probing at all, since AIUI it only needs to
probe for stack frames >4k to begin with.

> Anyway, my recollection is that GCC's stack check code is busted until
> much newer gcc versions.  I suppose we could try to make the kernel
> fail to build at all on a broken configuration like this.

Well, the original point still stands. Even if what GCC is doing is
stupid here, it's not illegal (it's just eating stack space), and the
kernel still currently makes no guarantees about that. So I think the
conversation regarding vDSO stack usage guarantees is still worth having.

-- 
Hector Martin "marcan" (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
  2017-11-10 16:02     ` Hector Martin 'marcan'
@ 2017-11-10 16:36       ` Hector Martin 'marcan'
  -1 siblings, 0 replies; 18+ messages in thread
From: Hector Martin 'marcan' @ 2017-11-10 16:36 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, kernel-hardening, X86 ML

On 2017-11-11 01:02, Hector Martin 'marcan' wrote:
> Not entirely sure what's going on here.

Actually, if you think about it, it doesn't matter that it skips the
first page, since it's probing one page more. That just means the caller
will have probed the previous page. So ultimately you're just probing
ahead of where you need to, but that should be OK.

-- 
Hector Martin "marcan" (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
@ 2017-11-10 16:36       ` Hector Martin 'marcan'
  0 siblings, 0 replies; 18+ messages in thread
From: Hector Martin 'marcan' @ 2017-11-10 16:36 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, kernel-hardening, X86 ML

On 2017-11-11 01:02, Hector Martin 'marcan' wrote:
> Not entirely sure what's going on here.

Actually, if you think about it, it doesn't matter that it skips the
first page, since it's probing one page more. That just means the caller
will have probed the previous page. So ultimately you're just probing
ahead of where you need to, but that should be OK.

-- 
Hector Martin "marcan" (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
  2017-11-10 16:02     ` Hector Martin 'marcan'
@ 2017-11-10 22:04       ` Andy Lutomirski
  -1 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2017-11-10 22:04 UTC (permalink / raw)
  To: Hector Martin 'marcan'; +Cc: LKML, kernel-hardening, X86 ML

> On Nov 10, 2017, at 8:02 AM, Hector Martin 'marcan' <marcan@marcan.st> wrote:
>
>> On 2017-11-10 23:57, Andy Lutomirski wrote:
>> This code is so wrong I don't even no where to start.  Seriously, sub,
>> orq, add?  How about just orq with an offset?  How about a *load*
>> instead of a store?
>
> Stores should be cheaper than loads (since they don't stall), but
> apparently the rationale for using orq is:

I'm having trouble imagining a CPU that would stall on an unused load
but would not stall on an RMW.

>
> gcc/config/i386/i386.md: ;; Use IOR for stack probes, this is shorter.
>
> Saves bytes I guess? Though being read-modify-write it probably hurts
> performance; I don't know what real CPUs would do with it.
>
> I suspect the sub, add is there to guarantee that the stack pointer is
> actually below the probed location. IIRC the x86-64 ABI specifies a
> 128-byte redzone that you can freely mess with; going beyond that would
> require actually changing the stack pointer.
>

The redzone says that signals won't clobber the first 128 bytes.  For
a stack probe, no one cares about the value at the probed address, so
this seems moot.  Maybe there's some kernel that would object to the
sort-of-out-of-bounds probe, but that seems unlikely.

>> But stepping back even further, an offset > 4096 is just bogus.
>> That's big enough to skip right over the guard page.
>
> The code (gcc/config/i386/i386.c) says:
>
>  /* We skip the probe for the first interval + a small dope of 4 words
>     and probe that many bytes past the specified size to maintain a
>     protection area at the botton of the stack.  */
>
> Not entirely sure what's going on here.
>
> OTOH I'm not sure why it's probing at all, since AIUI it only needs to
> probe for stack frames >4k to begin with.
>
>> Anyway, my recollection is that GCC's stack check code is busted until
>> much newer gcc versions.  I suppose we could try to make the kernel
>> fail to build at all on a broken configuration like this.
>
> Well, the original point still stands. Even if what GCC is doing is
> stupid here, it's not illegal (it's just eating stack space), and the
> kernel still currently makes no guarantees about that. So I think the
> conversation regarding vDSO stack usage guarantees is still worth having.
>
> --
> Hector Martin "marcan" (marcan@marcan.st)
> Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
@ 2017-11-10 22:04       ` Andy Lutomirski
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2017-11-10 22:04 UTC (permalink / raw)
  To: Hector Martin 'marcan'; +Cc: LKML, kernel-hardening, X86 ML

> On Nov 10, 2017, at 8:02 AM, Hector Martin 'marcan' <marcan@marcan.st> wrote:
>
>> On 2017-11-10 23:57, Andy Lutomirski wrote:
>> This code is so wrong I don't even no where to start.  Seriously, sub,
>> orq, add?  How about just orq with an offset?  How about a *load*
>> instead of a store?
>
> Stores should be cheaper than loads (since they don't stall), but
> apparently the rationale for using orq is:

I'm having trouble imagining a CPU that would stall on an unused load
but would not stall on an RMW.

>
> gcc/config/i386/i386.md: ;; Use IOR for stack probes, this is shorter.
>
> Saves bytes I guess? Though being read-modify-write it probably hurts
> performance; I don't know what real CPUs would do with it.
>
> I suspect the sub, add is there to guarantee that the stack pointer is
> actually below the probed location. IIRC the x86-64 ABI specifies a
> 128-byte redzone that you can freely mess with; going beyond that would
> require actually changing the stack pointer.
>

The redzone says that signals won't clobber the first 128 bytes.  For
a stack probe, no one cares about the value at the probed address, so
this seems moot.  Maybe there's some kernel that would object to the
sort-of-out-of-bounds probe, but that seems unlikely.

>> But stepping back even further, an offset > 4096 is just bogus.
>> That's big enough to skip right over the guard page.
>
> The code (gcc/config/i386/i386.c) says:
>
>  /* We skip the probe for the first interval + a small dope of 4 words
>     and probe that many bytes past the specified size to maintain a
>     protection area at the botton of the stack.  */
>
> Not entirely sure what's going on here.
>
> OTOH I'm not sure why it's probing at all, since AIUI it only needs to
> probe for stack frames >4k to begin with.
>
>> Anyway, my recollection is that GCC's stack check code is busted until
>> much newer gcc versions.  I suppose we could try to make the kernel
>> fail to build at all on a broken configuration like this.
>
> Well, the original point still stands. Even if what GCC is doing is
> stupid here, it's not illegal (it's just eating stack space), and the
> kernel still currently makes no guarantees about that. So I think the
> conversation regarding vDSO stack usage guarantees is still worth having.
>
> --
> Hector Martin "marcan" (marcan@marcan.st)
> Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
  2017-11-10 16:36       ` Hector Martin 'marcan'
@ 2017-11-10 22:04         ` Andy Lutomirski
  -1 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2017-11-10 22:04 UTC (permalink / raw)
  To: Hector Martin 'marcan'; +Cc: LKML, kernel-hardening, X86 ML

> On Nov 10, 2017, at 8:36 AM, Hector Martin 'marcan' <marcan@marcan.st> wrote:
>
>> On 2017-11-11 01:02, Hector Martin 'marcan' wrote:
>> Not entirely sure what's going on here.
>
> Actually, if you think about it, it doesn't matter that it skips the
> first page, since it's probing one page more. That just means the caller
> will have probed the previous page. So ultimately you're just probing
> ahead of where you need to, but that should be OK.
>

The whole point is to touch the stack pages in order.  Also, I see no
guarantee that the function would touch the intermediate page before
clobbering the probed page.  You're seeing exactly that behavior, in
fact.

> --
> Hector Martin "marcan" (marcan@marcan.st)
> Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
@ 2017-11-10 22:04         ` Andy Lutomirski
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2017-11-10 22:04 UTC (permalink / raw)
  To: Hector Martin 'marcan'; +Cc: LKML, kernel-hardening, X86 ML

> On Nov 10, 2017, at 8:36 AM, Hector Martin 'marcan' <marcan@marcan.st> wrote:
>
>> On 2017-11-11 01:02, Hector Martin 'marcan' wrote:
>> Not entirely sure what's going on here.
>
> Actually, if you think about it, it doesn't matter that it skips the
> first page, since it's probing one page more. That just means the caller
> will have probed the previous page. So ultimately you're just probing
> ahead of where you need to, but that should be OK.
>

The whole point is to touch the stack pages in order.  Also, I see no
guarantee that the function would touch the intermediate page before
clobbering the probed page.  You're seeing exactly that behavior, in
fact.

> --
> Hector Martin "marcan" (marcan@marcan.st)
> Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
  2017-11-10 22:04         ` Andy Lutomirski
@ 2017-11-11  5:16           ` Hector Martin 'marcan'
  -1 siblings, 0 replies; 18+ messages in thread
From: Hector Martin 'marcan' @ 2017-11-11  5:16 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, kernel-hardening, X86 ML

On 2017-11-11 07:04, Andy Lutomirski wrote:
>> On Nov 10, 2017, at 8:36 AM, Hector Martin 'marcan' <marcan@marcan.st> wrote:
>>
>>> On 2017-11-11 01:02, Hector Martin 'marcan' wrote:
>>> Not entirely sure what's going on here.
>>
>> Actually, if you think about it, it doesn't matter that it skips the
>> first page, since it's probing one page more. That just means the caller
>> will have probed the previous page. So ultimately you're just probing
>> ahead of where you need to, but that should be OK.
>>
> 
> The whole point is to touch the stack pages in order.  Also, I see no
> guarantee that the function would touch the intermediate page before
> clobbering the probed page.  You're seeing exactly that behavior, in
> fact.

Only because Go is not C and is not compiled like this. If all the code
is GCC-compiled C code and built with -fstack-check, it should always
probe stack pages in order except for potentially the second page in the
stack, which may be touched after the third page (but hopefully your
stack is at least two pages long to begin with).

AIUI -fstack-check was not intended for stack clash protection (the
latter isn't even in a GCC release yet), but in most circumstances it
seems to me like it's an effective mitigation if all code is compiled
with it. Qualys mentioned it as such in their advisory. This is probably
why Gentoo Hardened enables it by default globally in their toolchain.

-- 
Hector Martin "marcan" (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
@ 2017-11-11  5:16           ` Hector Martin 'marcan'
  0 siblings, 0 replies; 18+ messages in thread
From: Hector Martin 'marcan' @ 2017-11-11  5:16 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, kernel-hardening, X86 ML

On 2017-11-11 07:04, Andy Lutomirski wrote:
>> On Nov 10, 2017, at 8:36 AM, Hector Martin 'marcan' <marcan@marcan.st> wrote:
>>
>>> On 2017-11-11 01:02, Hector Martin 'marcan' wrote:
>>> Not entirely sure what's going on here.
>>
>> Actually, if you think about it, it doesn't matter that it skips the
>> first page, since it's probing one page more. That just means the caller
>> will have probed the previous page. So ultimately you're just probing
>> ahead of where you need to, but that should be OK.
>>
> 
> The whole point is to touch the stack pages in order.  Also, I see no
> guarantee that the function would touch the intermediate page before
> clobbering the probed page.  You're seeing exactly that behavior, in
> fact.

Only because Go is not C and is not compiled like this. If all the code
is GCC-compiled C code and built with -fstack-check, it should always
probe stack pages in order except for potentially the second page in the
stack, which may be touched after the third page (but hopefully your
stack is at least two pages long to begin with).

AIUI -fstack-check was not intended for stack clash protection (the
latter isn't even in a GCC release yet), but in most circumstances it
seems to me like it's an effective mitigation if all code is compiled
with it. Qualys mentioned it as such in their advisory. This is probably
why Gentoo Hardened enables it by default globally in their toolchain.

-- 
Hector Martin "marcan" (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
  2017-11-11  5:16           ` Hector Martin 'marcan'
@ 2017-11-12  4:21             ` Andy Lutomirski
  -1 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2017-11-12  4:21 UTC (permalink / raw)
  To: Hector Martin 'marcan'
  Cc: Andy Lutomirski, LKML, kernel-hardening, X86 ML

On Fri, Nov 10, 2017 at 9:16 PM, Hector Martin 'marcan'
<marcan@marcan.st> wrote:
> On 2017-11-11 07:04, Andy Lutomirski wrote:
>>> On Nov 10, 2017, at 8:36 AM, Hector Martin 'marcan' <marcan@marcan.st> wrote:
>>>
>>>> On 2017-11-11 01:02, Hector Martin 'marcan' wrote:
>>>> Not entirely sure what's going on here.
>>>
>>> Actually, if you think about it, it doesn't matter that it skips the
>>> first page, since it's probing one page more. That just means the caller
>>> will have probed the previous page. So ultimately you're just probing
>>> ahead of where you need to, but that should be OK.
>>>
>>
>> The whole point is to touch the stack pages in order.  Also, I see no
>> guarantee that the function would touch the intermediate page before
>> clobbering the probed page.  You're seeing exactly that behavior, in
>> fact.
>
> Only because Go is not C and is not compiled like this. If all the code
> is GCC-compiled C code and built with -fstack-check, it should always
> probe stack pages in order except for potentially the second page in the
> stack, which may be touched after the third page (but hopefully your
> stack is at least two pages long to begin with).

If you're generating code to improve stack overflow, assuming that you
have at least two pages left seems like an *awful* assumption to make.

>
> AIUI -fstack-check was not intended for stack clash protection (the
> latter isn't even in a GCC release yet), but in most circumstances it
> seems to me like it's an effective mitigation if all code is compiled
> with it. Qualys mentioned it as such in their advisory. This is probably
> why Gentoo Hardened enables it by default globally in their toolchain.
>

Gentoo Hardened should seriously consider turning it back off.  Do you
happen to know what exactly Gentoo does to cause the vdso to get build
with -fstack-check?  I'll write a patch to either fail the build or to
force it off.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
@ 2017-11-12  4:21             ` Andy Lutomirski
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2017-11-12  4:21 UTC (permalink / raw)
  To: Hector Martin 'marcan'
  Cc: Andy Lutomirski, LKML, kernel-hardening, X86 ML

On Fri, Nov 10, 2017 at 9:16 PM, Hector Martin 'marcan'
<marcan@marcan.st> wrote:
> On 2017-11-11 07:04, Andy Lutomirski wrote:
>>> On Nov 10, 2017, at 8:36 AM, Hector Martin 'marcan' <marcan@marcan.st> wrote:
>>>
>>>> On 2017-11-11 01:02, Hector Martin 'marcan' wrote:
>>>> Not entirely sure what's going on here.
>>>
>>> Actually, if you think about it, it doesn't matter that it skips the
>>> first page, since it's probing one page more. That just means the caller
>>> will have probed the previous page. So ultimately you're just probing
>>> ahead of where you need to, but that should be OK.
>>>
>>
>> The whole point is to touch the stack pages in order.  Also, I see no
>> guarantee that the function would touch the intermediate page before
>> clobbering the probed page.  You're seeing exactly that behavior, in
>> fact.
>
> Only because Go is not C and is not compiled like this. If all the code
> is GCC-compiled C code and built with -fstack-check, it should always
> probe stack pages in order except for potentially the second page in the
> stack, which may be touched after the third page (but hopefully your
> stack is at least two pages long to begin with).

If you're generating code to improve stack overflow, assuming that you
have at least two pages left seems like an *awful* assumption to make.

>
> AIUI -fstack-check was not intended for stack clash protection (the
> latter isn't even in a GCC release yet), but in most circumstances it
> seems to me like it's an effective mitigation if all code is compiled
> with it. Qualys mentioned it as such in their advisory. This is probably
> why Gentoo Hardened enables it by default globally in their toolchain.
>

Gentoo Hardened should seriously consider turning it back off.  Do you
happen to know what exactly Gentoo does to cause the vdso to get build
with -fstack-check?  I'll write a patch to either fail the build or to
force it off.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
  2017-11-12  4:21             ` Andy Lutomirski
@ 2017-11-12  4:39               ` Hector Martin 'marcan'
  -1 siblings, 0 replies; 18+ messages in thread
From: Hector Martin 'marcan' @ 2017-11-12  4:39 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, kernel-hardening, X86 ML

On 2017-11-12 13:21, Andy Lutomirski wrote:
>> Only because Go is not C and is not compiled like this. If all the code
>> is GCC-compiled C code and built with -fstack-check, it should always
>> probe stack pages in order except for potentially the second page in the
>> stack, which may be touched after the third page (but hopefully your
>> stack is at least two pages long to begin with).
> 
> If you're generating code to improve stack overflow, assuming that you
> have at least two pages left seems like an *awful* assumption to make.

If all the code is compiled with the option, then it guarantees you have
at least two pages left, as long as you have at least two pages when the
program/thread starts.

> Gentoo Hardened should seriously consider turning it back off.  Do you
> happen to know what exactly Gentoo does to cause the vdso to get build
> with -fstack-check?  I'll write a patch to either fail the build or to
> force it off.

So you're saying Gentoo Hardened should turn off an exploit mitigation
that, though imperfect, works in the vast majority of cases and seems to
have caused a grand total of one [1] bug in a package so far (two if you
count the one I found)? That seems completely silly to me. I'm sure once
GCC 8 is out with dedicated stack clash protection they'll switch to
that, but in the meantime -fstack-check seems like a perfectly
reasonable idea.

Anyway, they just add it to the default specs for gcc. You can turn it
back off with -fstack-check=no.

You still haven't addressed the important question, though: should vDSO
be *documented* to consume a certain maximum amount of stack space? If
not, this whole thing is pointless since vDSO would be fine as it is. If
yes, then -fstack-check=no isn't the only thing you have to worry about;
more options to limit stack frame size, and perhaps code changes to
inline everything, might be appropriate.

[1] https://bugs.gentoo.org/637152

-- 
Hector Martin "marcan" (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check
@ 2017-11-12  4:39               ` Hector Martin 'marcan'
  0 siblings, 0 replies; 18+ messages in thread
From: Hector Martin 'marcan' @ 2017-11-12  4:39 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, kernel-hardening, X86 ML

On 2017-11-12 13:21, Andy Lutomirski wrote:
>> Only because Go is not C and is not compiled like this. If all the code
>> is GCC-compiled C code and built with -fstack-check, it should always
>> probe stack pages in order except for potentially the second page in the
>> stack, which may be touched after the third page (but hopefully your
>> stack is at least two pages long to begin with).
> 
> If you're generating code to improve stack overflow, assuming that you
> have at least two pages left seems like an *awful* assumption to make.

If all the code is compiled with the option, then it guarantees you have
at least two pages left, as long as you have at least two pages when the
program/thread starts.

> Gentoo Hardened should seriously consider turning it back off.  Do you
> happen to know what exactly Gentoo does to cause the vdso to get build
> with -fstack-check?  I'll write a patch to either fail the build or to
> force it off.

So you're saying Gentoo Hardened should turn off an exploit mitigation
that, though imperfect, works in the vast majority of cases and seems to
have caused a grand total of one [1] bug in a package so far (two if you
count the one I found)? That seems completely silly to me. I'm sure once
GCC 8 is out with dedicated stack clash protection they'll switch to
that, but in the meantime -fstack-check seems like a perfectly
reasonable idea.

Anyway, they just add it to the default specs for gcc. You can turn it
back off with -fstack-check=no.

You still haven't addressed the important question, though: should vDSO
be *documented* to consume a certain maximum amount of stack space? If
not, this whole thing is pointless since vDSO would be fine as it is. If
yes, then -fstack-check=no isn't the only thing you have to worry about;
more options to limit stack frame size, and perhaps code changes to
inline everything, might be appropriate.

[1] https://bugs.gentoo.org/637152

-- 
Hector Martin "marcan" (marcan@marcan.st)
Public Key: https://mrcn.st/pub

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-11-12  4:39 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-10 10:40 vDSO maximum stack usage, stack probes, and -fstack-check Hector Martin 'marcan'
2017-11-10 10:40 ` [kernel-hardening] " Hector Martin 'marcan'
2017-11-10 14:57 ` Andy Lutomirski
2017-11-10 14:57   ` [kernel-hardening] " Andy Lutomirski
2017-11-10 16:02   ` Hector Martin 'marcan'
2017-11-10 16:02     ` Hector Martin 'marcan'
2017-11-10 16:36     ` Hector Martin 'marcan'
2017-11-10 16:36       ` Hector Martin 'marcan'
2017-11-10 22:04       ` Andy Lutomirski
2017-11-10 22:04         ` Andy Lutomirski
2017-11-11  5:16         ` Hector Martin 'marcan'
2017-11-11  5:16           ` Hector Martin 'marcan'
2017-11-12  4:21           ` Andy Lutomirski
2017-11-12  4:21             ` Andy Lutomirski
2017-11-12  4:39             ` Hector Martin 'marcan'
2017-11-12  4:39               ` Hector Martin 'marcan'
2017-11-10 22:04     ` Andy Lutomirski
2017-11-10 22:04       ` Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.