LKML Archive on lore.kernel.org
* [PATCH v2] fork: Unconditionally clear stack on fork
@ 2018-02-21  2:16 Kees Cook
  2018-02-21 10:29 ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2018-02-21  2:16 UTC
  To: Andrew Morton
  Cc: Andy Lutomirski, Laura Abbott, Michal Hocko, Rasmus Villemoes,
	linux-kernel, kernel-hardening

One of the classes of kernel stack content leaks[1] is the exposure of
prior heap or stack contents when a new process stack is
allocated. Normally, those stacks are not zeroed, and the old contents
remain in place. In the face of stack content exposure flaws, those
contents can leak to userspace.

Fixing this will make the kernel no longer vulnerable to these flaws,
as the stack will be wiped each time a stack is assigned to a new
process. There's not a meaningful change in runtime performance; it
almost looks like it provides a benefit.

Performing back-to-back kernel builds before:
	Run times: 157.86 157.09 158.90 160.94 160.80
	Mean: 159.12
	Std Dev: 1.54

and after:
	Run times: 159.31 157.34 156.71 158.15 160.81
	Mean: 158.46
	Std Dev: 1.46

Instead of making this a build or runtime config, Andy Lutomirski
recommended this just be enabled by default.

[1] A noisy search for many kinds of stack content leaks can be seen here:
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/thread_info.h | 6 +-----
 kernel/fork.c               | 3 +--
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 34f053a150a9..cf2862bd134a 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -43,11 +43,7 @@ enum {
 #define THREAD_ALIGN	THREAD_SIZE
 #endif
 
-#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || IS_ENABLED(CONFIG_DEBUG_KMEMLEAK)
-# define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO)
-#else
-# define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT)
-#endif
+#define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO)
 
 /*
  * flag set/clear/test wrappers
diff --git a/kernel/fork.c b/kernel/fork.c
index be8aa5b98666..4f2ee527c7d2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -216,10 +216,9 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
 		if (!s)
 			continue;
 
-#ifdef CONFIG_DEBUG_KMEMLEAK
 		/* Clear stale pointers from reused stack. */
 		memset(s->addr, 0, THREAD_SIZE);
-#endif
+
 		tsk->stack_vm_area = s;
 		return s->addr;
 	}
-- 
2.7.4


-- 
Kees Cook
Pixel Security
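
As a userspace illustration of what the patch changes (a minimal sketch, not kernel code and not part of the submission): the diff makes both allocation paths hand out zeroed memory. Fresh stacks get __GFP_ZERO, and stacks taken from the cache are cleared with memset(). In the analogy below, calloc() stands in for the __GFP_ZERO allocation, and BUF_SIZE, NR_CACHED, buf_get(), buf_put() and buf_is_zero() are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE  16384	/* stand-in for THREAD_SIZE; the real value is arch-specific */
#define NR_CACHED 2	/* hypothetical cache depth for this sketch */

static unsigned char *cached[NR_CACHED];

/* Return a buffer to the cache without scrubbing it; free it if the cache is full. */
static void buf_put(unsigned char *buf)
{
	assert(buf != NULL);
	for (int i = 0; i < NR_CACHED; i++) {
		if (!cached[i]) {
			cached[i] = buf;
			return;
		}
	}
	free(buf);
}

/* Hand out a buffer that is always zeroed, whether reused or freshly allocated. */
static unsigned char *buf_get(void)
{
	for (int i = 0; i < NR_CACHED; i++) {
		unsigned char *buf = cached[i];

		if (!buf)
			continue;
		cached[i] = NULL;
		/* Clear stale contents from the reused buffer (the now-unconditional memset). */
		memset(buf, 0, BUF_SIZE);
		return buf;
	}
	/* Fresh allocation: calloc() plays the role of __GFP_ZERO here. */
	return calloc(1, BUF_SIZE);
}

/* Return 1 if every byte of the buffer is zero, 0 otherwise. */
static int buf_is_zero(const unsigned char *buf)
{
	for (size_t i = 0; i < BUF_SIZE; i++) {
		if (buf[i] != 0)
			return 0;
	}
	return 1;
}
```

Dropping the memset() in buf_get() would let whatever the previous owner wrote survive into the next user, which is the exposure the commit message describes.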


* Re: [PATCH v2] fork: Unconditionally clear stack on fork
  2018-02-21  2:16 [PATCH v2] fork: Unconditionally clear stack on fork Kees Cook
@ 2018-02-21 10:29 ` Michal Hocko
  2018-02-21 20:59   ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2018-02-21 10:29 UTC
  To: Kees Cook
  Cc: Andrew Morton, Andy Lutomirski, Laura Abbott, Rasmus Villemoes,
	linux-kernel, kernel-hardening

On Tue 20-02-18 18:16:59, Kees Cook wrote:
> One of the classes of kernel stack content leaks[1] is exposing the
> contents of prior heap or stack contents when a new process stack is
> allocated. Normally, those stacks are not zeroed, and the old contents
> remain in place. In the face of stack content exposure flaws, those
> contents can leak to userspace.
> 
> Fixing this will make the kernel no longer vulnerable to these flaws,
> as the stack will be wiped each time a stack is assigned to a new
> process. There's not a meaningful change in runtime performance; it
> almost looks like it provides a benefit.
> 
> Performing back-to-back kernel builds before:
> 	Run times: 157.86 157.09 158.90 160.94 160.80
> 	Mean: 159.12
> 	Std Dev: 1.54
> 
> and after:
> 	Run times: 159.31 157.34 156.71 158.15 160.81
> 	Mean: 158.46
> 	Std Dev: 1.46

/bin/true or similar would be more representative of the worst case,
but it is good to see that this doesn't have any visible effect on
a more realistic use case.

> Instead of making this a build or runtime config, Andy Lutomirski
> recommended this just be enabled by default.
> 
> [1] A noisy search for many kinds of stack content leaks can be seen here:
> https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/thread_info.h | 6 +-----
>  kernel/fork.c               | 3 +--
>  2 files changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index 34f053a150a9..cf2862bd134a 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -43,11 +43,7 @@ enum {
>  #define THREAD_ALIGN	THREAD_SIZE
>  #endif
>  
> -#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || IS_ENABLED(CONFIG_DEBUG_KMEMLEAK)
> -# define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO)
> -#else
> -# define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT)
> -#endif
> +#define THREADINFO_GFP		(GFP_KERNEL_ACCOUNT | __GFP_ZERO)
>  
>  /*
>   * flag set/clear/test wrappers
> diff --git a/kernel/fork.c b/kernel/fork.c
> index be8aa5b98666..4f2ee527c7d2 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -216,10 +216,9 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
>  		if (!s)
>  			continue;
>  
> -#ifdef CONFIG_DEBUG_KMEMLEAK
>  		/* Clear stale pointers from reused stack. */
>  		memset(s->addr, 0, THREAD_SIZE);
> -#endif
> +
>  		tsk->stack_vm_area = s;
>  		return s->addr;
>  	}
> -- 
> 2.7.4
> 
> 
> -- 
> Kees Cook
> Pixel Security

-- 
Michal Hocko
SUSE Labs
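
A worst-case microbenchmark of the kind suggested above could look roughly like the following. This is a sketch under assumptions, not the harness anyone in the thread used: spawn_once() and time_spawns() are hypothetical helper names, the path is supplied by the caller, and the result is wall-clock time from CLOCK_MONOTONIC, which is sensitive to CPU frequency scaling.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Fork one child, exec `path` with no arguments, and wait for it.
 * Returns 0 if the child exited with status 0, -1 otherwise.
 */
static int spawn_once(const char *path)
{
	int status;
	pid_t pid;

	assert(path != NULL);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execl(path, path, (char *)NULL);
		_exit(127);	/* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/*
 * Run `iterations` fork+exec cycles back to back.
 * Returns elapsed wall-clock seconds, or -1.0 on any failure.
 */
static double time_spawns(const char *path, int iterations)
{
	struct timespec start, end;

	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1.0;
	for (int i = 0; i < iterations; i++) {
		if (spawn_once(path) != 0)
			return -1.0;
	}
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1.0;
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_spawns("/bin/true", 100000) several times on each kernel gives one sample per call. Because the child does almost no work, the per-fork cost, including any stack zeroing, is a larger share of the total than in a kernel build.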


* Re: [PATCH v2] fork: Unconditionally clear stack on fork
  2018-02-21 10:29 ` Michal Hocko
@ 2018-02-21 20:59   ` Andrew Morton
  2018-02-22  2:15     ` Kees Cook
  2018-02-22  9:53     ` Mel Gorman
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2018-02-21 20:59 UTC
  To: Michal Hocko
  Cc: Kees Cook, Andy Lutomirski, Laura Abbott, Rasmus Villemoes,
	linux-kernel, kernel-hardening

On Wed, 21 Feb 2018 11:29:33 +0100 Michal Hocko <mhocko@kernel.org> wrote:

> On Tue 20-02-18 18:16:59, Kees Cook wrote:
> > One of the classes of kernel stack content leaks[1] is exposing the
> > contents of prior heap or stack contents when a new process stack is
> > allocated. Normally, those stacks are not zeroed, and the old contents
> > remain in place. In the face of stack content exposure flaws, those
> > contents can leak to userspace.
> > 
> > Fixing this will make the kernel no longer vulnerable to these flaws,
> > as the stack will be wiped each time a stack is assigned to a new
> > process. There's not a meaningful change in runtime performance; it
> > almost looks like it provides a benefit.
> > 
> > Performing back-to-back kernel builds before:
> > 	Run times: 157.86 157.09 158.90 160.94 160.80
> > 	Mean: 159.12
> > 	Std Dev: 1.54
> > 
> > and after:
> > 	Run times: 159.31 157.34 156.71 158.15 160.81
> > 	Mean: 158.46
> > 	Std Dev: 1.46
> 
> /bin/true or similar would be more representative for the worst case
> but it is good to see that this doesn't have any visible effect on
> a more real usecase.

Yes, that's a pretty large memset.  And while it will populate the CPU
cache with the stack contents, doing so will evict other things.

So some quite careful quantitative testing is needed here, methinks.


* Re: [PATCH v2] fork: Unconditionally clear stack on fork
  2018-02-21 20:59   ` Andrew Morton
@ 2018-02-22  2:15     ` Kees Cook
  2018-04-18 16:38       ` Kees Cook
  2018-02-22  9:53     ` Mel Gorman
  1 sibling, 1 reply; 7+ messages in thread
From: Kees Cook @ 2018-02-22  2:15 UTC
  To: Andrew Morton
  Cc: Michal Hocko, Andy Lutomirski, Laura Abbott, Rasmus Villemoes,
	LKML, Kernel Hardening

On Wed, Feb 21, 2018 at 12:59 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 21 Feb 2018 11:29:33 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>
>> On Tue 20-02-18 18:16:59, Kees Cook wrote:
>> > One of the classes of kernel stack content leaks[1] is exposing the
>> > contents of prior heap or stack contents when a new process stack is
>> > allocated. Normally, those stacks are not zeroed, and the old contents
>> > remain in place. In the face of stack content exposure flaws, those
>> > contents can leak to userspace.
>> >
>> > Fixing this will make the kernel no longer vulnerable to these flaws,
>> > as the stack will be wiped each time a stack is assigned to a new
>> > process. There's not a meaningful change in runtime performance; it
>> > almost looks like it provides a benefit.
>> >
>> > Performing back-to-back kernel builds before:
>> >     Run times: 157.86 157.09 158.90 160.94 160.80
>> >     Mean: 159.12
>> >     Std Dev: 1.54
>> >
>> > and after:
>> >     Run times: 159.31 157.34 156.71 158.15 160.81
>> >     Mean: 158.46
>> >     Std Dev: 1.46
>>
>> /bin/true or similar would be more representative for the worst case
>> but it is good to see that this doesn't have any visible effect on
>> a more real usecase.
>
> Yes, that's a pretty large memset.  And while it will populate the CPU
> cache with the stack contents, doing so will evict other things.
>
> So some quite careful quantitative testing is needed here, methinks.

Well, I did some more with perf and cycle counts on running 100,000
execs of /bin/true.

before:
Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
Mean:  221015379122.60
Std Dev: 4662486552.47

after:
Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
Mean:  217745009865.40
Std Dev: 5935559279.99

It continues to look like it's faster, though the deviation is rather
wide, but I'm not sure what I could do that would be less noisy. I'm
open to ideas!

-Kees

-- 
Kees Cook
Pixel Security
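
To put the "deviation is rather wide" remark in numbers, the summary statistics can be recomputed from the five cycle counts in each set. The sketch below is an illustration only. The quoted Std Dev values appear to be the population form (divide by n rather than n - 1), so that is what it computes. The helper names are hypothetical, and a small Newton iteration stands in for sqrt() purely so the example builds without linking libm.

```c
#include <assert.h>
#include <stddef.h>

/* Cycle counts for 100,000 execs of /bin/true, copied from the message above. */
static const double cycles_before[] = {
	218858861551.0, 218853036130.0, 214727610969.0,
	227656844122.0, 224980542841.0,
};
static const double cycles_after[] = {
	213868945060.0, 213119275204.0, 211820169456.0,
	224426673259.0, 225489986348.0,
};

#define NR_RUNS (sizeof(cycles_before) / sizeof(cycles_before[0]))

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += x[i];
	return sum / (double)n;
}

/* Square root by Newton's method; used only to avoid a libm dependency. */
static double sqrt_newton(double v)
{
	double r = v;

	if (v <= 0.0)
		return 0.0;
	for (int i = 0; i < 100; i++)
		r = 0.5 * (r + v / r);
	return r;
}

/* Population standard deviation: divides by n, not n - 1. */
static double stddev_population(const double *x, size_t n)
{
	double m = mean(x, n);
	double acc = 0.0;

	for (size_t i = 0; i < n; i++)
		acc += (x[i] - m) * (x[i] - m);
	return sqrt_newton(acc / (double)n);
}
```

With these inputs the means come out as quoted, and the gap between them (roughly 3.27e9 cycles, about 1.5%) is smaller than the standard deviation of either set, so five runs per kernel cannot distinguish "faster" from noise.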


* Re: [PATCH v2] fork: Unconditionally clear stack on fork
  2018-02-21 20:59   ` Andrew Morton
  2018-02-22  2:15     ` Kees Cook
@ 2018-02-22  9:53     ` Mel Gorman
  1 sibling, 0 replies; 7+ messages in thread
From: Mel Gorman @ 2018-02-22  9:53 UTC
  To: Andrew Morton
  Cc: Michal Hocko, Kees Cook, Andy Lutomirski, Laura Abbott,
	Rasmus Villemoes, linux-kernel, kernel-hardening

On Wed, Feb 21, 2018 at 12:59:14PM -0800, Andrew Morton wrote:
> On Wed, 21 Feb 2018 11:29:33 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> 
> > On Tue 20-02-18 18:16:59, Kees Cook wrote:
> > > One of the classes of kernel stack content leaks[1] is exposing the
> > > contents of prior heap or stack contents when a new process stack is
> > > allocated. Normally, those stacks are not zeroed, and the old contents
> > > remain in place. In the face of stack content exposure flaws, those
> > > contents can leak to userspace.
> > > 
> > > Fixing this will make the kernel no longer vulnerable to these flaws,
> > > as the stack will be wiped each time a stack is assigned to a new
> > > process. There's not a meaningful change in runtime performance; it
> > > almost looks like it provides a benefit.
> > > 
> > > Performing back-to-back kernel builds before:
> > > 	Run times: 157.86 157.09 158.90 160.94 160.80
> > > 	Mean: 159.12
> > > 	Std Dev: 1.54
> > > 
> > > and after:
> > > 	Run times: 159.31 157.34 156.71 158.15 160.81
> > > 	Mean: 158.46
> > > 	Std Dev: 1.46
> > 
> > /bin/true or similar would be more representative for the worst case
> > but it is good to see that this doesn't have any visible effect on
> > a more real usecase.
> 
> Yes, that's a pretty large memset.  And while it will populate the CPU
> cache with the stack contents, doing so will evict other things.
> 

The lines will also bounce on the child. I expect the zeroing will be a
relatively small percentage of the overall cost. The cost is all elsewhere
such as the full search that fork does for queueing a task on a CPU
for the first time. Using perf will mask the issue unless the performance
governor is used in this case. Otherwise you hit the weird corner case
whereby perf itself increases CPU utilisation and the cpufreq governor
(even if it's HWP or another hardware-based scheme) will increase the
p-state and it'll appear to run faster.

> So some quite careful quantitative testing is needed here, methinks.

With an emphasis on careful.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v2] fork: Unconditionally clear stack on fork
  2018-02-22  2:15     ` Kees Cook
@ 2018-04-18 16:38       ` Kees Cook
  2018-04-18 19:50         ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2018-04-18 16:38 UTC
  To: Andrew Morton
  Cc: Michal Hocko, Andy Lutomirski, Laura Abbott, Rasmus Villemoes,
	LKML, Kernel Hardening

On Wed, Feb 21, 2018 at 6:15 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Feb 21, 2018 at 12:59 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Wed, 21 Feb 2018 11:29:33 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>>
>>> On Tue 20-02-18 18:16:59, Kees Cook wrote:
>>> > One of the classes of kernel stack content leaks[1] is exposing the
>>> > contents of prior heap or stack contents when a new process stack is
>>> > allocated. Normally, those stacks are not zeroed, and the old contents
>>> > remain in place. In the face of stack content exposure flaws, those
>>> > contents can leak to userspace.
>>> >
>>> > Fixing this will make the kernel no longer vulnerable to these flaws,
>>> > as the stack will be wiped each time a stack is assigned to a new
>>> > process. There's not a meaningful change in runtime performance; it
>>> > almost looks like it provides a benefit.
>>> >
>>> > Performing back-to-back kernel builds before:
>>> >     Run times: 157.86 157.09 158.90 160.94 160.80
>>> >     Mean: 159.12
>>> >     Std Dev: 1.54
>>> >
>>> > and after:
>>> >     Run times: 159.31 157.34 156.71 158.15 160.81
>>> >     Mean: 158.46
>>> >     Std Dev: 1.46
>>>
>>> /bin/true or similar would be more representative for the worst case
>>> but it is good to see that this doesn't have any visible effect on
>>> a more real usecase.
>>
>> Yes, that's a pretty large memset.  And while it will populate the CPU
>> cache with the stack contents, doing so will evict other things.
>>
>> So some quite careful quantitative testing is needed here, methinks.
>
> Well, I did some more with perf and cycle counts on running 100,000
> execs of /bin/true.
>
> before:
> Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
> Mean:  221015379122.60
> Std Dev: 4662486552.47
>
> after:
> Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
> Mean:  217745009865.40
> Std Dev: 5935559279.99
>
> It continues to look like it's faster, though the deviation is rather
> wide, but I'm not sure what I could do that would be less noisy. I'm
> open to ideas!

Friendly ping. Andrew, can you add this to -mm?

-Kees

-- 
Kees Cook
Pixel Security


* Re: [PATCH v2] fork: Unconditionally clear stack on fork
  2018-04-18 16:38       ` Kees Cook
@ 2018-04-18 19:50         ` Andrew Morton
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2018-04-18 19:50 UTC
  To: Kees Cook
  Cc: Michal Hocko, Andy Lutomirski, Laura Abbott, Rasmus Villemoes,
	LKML, Kernel Hardening

On Wed, 18 Apr 2018 09:38:07 -0700 Kees Cook <keescook@chromium.org> wrote:

> >> So some quite careful quantitative testing is needed here, methinks.
> >
> > Well, I did some more with perf and cycle counts on running 100,000
> > execs of /bin/true.
> >
> > before:
> > Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
> > Mean:  221015379122.60
> > Std Dev: 4662486552.47
> >
> > after:
> > Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
> > Mean:  217745009865.40
> > Std Dev: 5935559279.99
> >
> > It continues to look like it's faster, though the deviation is rather
> > wide, but I'm not sure what I could do that would be less noisy. I'm
> > open to ideas!
> 
> Friendly ping. Andrew, can you add this to -mm?

I did so on Feb 21 but didn't merge it up because I'd told myself that
careful perf testing is needed.  I guess we've sufficiently ticked that
box.  Kind of.  Maybe.

Oh well, it's easy enough to revert.  I'll add it to the next
batch-for-Linus.
