linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] usercopy: optimize stack check flow when the
@ 2018-08-14 12:17 Xiaofeng Yuan
  2018-08-14 12:34 ` Matthew Wilcox
  0 siblings, 1 reply; 6+ messages in thread
From: Xiaofeng Yuan @ 2018-08-14 12:17 UTC (permalink / raw)
  To: keescook; +Cc: linux-mm, linux-kernel, Xiaofeng Yuan

The check_heap_object() checks the spanning multiple pages and slab.
When the page-spanning test is disabled, the check_heap_object() is
redundant for spanning multiple pages. However, the kernel stacks are
multiple pages under certain conditions: CONFIG_ARCH_THREAD_STACK_ALLOCATOR
is not defined and (THREAD_SIZE >= PAGE_SIZE). At this point, We can skip
the check_heap_object() for kernel stacks to improve performance.
Similarly, the virtually-mapped stack can skip check_heap_object() also,
beacause virt_addr_valid() will return.

I launched more than 100 apps on smartphone, and recorded total check time
and numbers of kernel stacks. The average time of checking kernel stacks
reduced by 48%.


Signed-off-by: Xiaofeng Yuan <yuanxiaofeng1@huawei.com>
---
 mm/usercopy.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325..af350f6 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -255,6 +255,29 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 	/* Check for invalid addresses. */
 	check_bogus_address((const unsigned long)ptr, n, to_user);
 
+#if !defined(CONFIG_HARDENED_USERCOPY_PAGESPAN) && \
+    !defined(CONFIG_ARCH_THREAD_STACK_ALLOCATOR) && \
+    (THREAD_SIZE >= PAGE_SIZE || defined(CONFIG_VMAP_STACK))
+	/* Check for bad stack object. */
+	switch (check_stack_object(ptr, n)) {
+	case NOT_STACK:
+		/* Object is not touching the current process stack. */
+		break;
+	case GOOD_FRAME:
+	case GOOD_STACK:
+		/*
+		 * Object is either in the correct frame (when it
+		 * is possible to check) or just generally on the
+		 * process stack (when frame checking not available).
+		 */
+		return;
+	default:
+		usercopy_abort("process stack", NULL, to_user, 0, n);
+	}
+
+	/* Check for bad heap object. */
+	check_heap_object(ptr, n, to_user);
+#else
 	/* Check for bad heap object. */
 	check_heap_object(ptr, n, to_user);
 
@@ -274,6 +297,7 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
 	default:
 		usercopy_abort("process stack", NULL, to_user, 0, n);
 	}
+#endif
 
 	/* Check for object in kernel to avoid text exposure. */
 	check_kernel_text_object((const unsigned long)ptr, n, to_user);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH RFC] usercopy: optimize stack check flow when the
  2018-08-14 12:17 [PATCH RFC] usercopy: optimize stack check flow when the Xiaofeng Yuan
@ 2018-08-14 12:34 ` Matthew Wilcox
  2018-08-14 13:02   ` [PATCH RFC] usercopy: optimize stack check flow when the page-spanning test is disabled Yuanxiaofeng (XiAn)
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2018-08-14 12:34 UTC (permalink / raw)
  To: Xiaofeng Yuan; +Cc: keescook, linux-mm, linux-kernel

On Tue, Aug 14, 2018 at 08:17:31PM +0800, Xiaofeng Yuan wrote:
> The check_heap_object() checks the spanning multiple pages and slab.
> When the page-spanning test is disabled, the check_heap_object() is
> redundant for spanning multiple pages. However, the kernel stacks are
> multiple pages under certain conditions: CONFIG_ARCH_THREAD_STACK_ALLOCATOR
> is not defined and (THREAD_SIZE >= PAGE_SIZE). At this point, We can skip
> the check_heap_object() for kernel stacks to improve performance.
> Similarly, the virtually-mapped stack can skip check_heap_object() also,
> beacause virt_addr_valid() will return.

Why not just check_stack_object() first, then check_heap_object() second?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH RFC] usercopy: optimize stack check flow when the page-spanning test is disabled
  2018-08-14 12:34 ` Matthew Wilcox
@ 2018-08-14 13:02   ` Yuanxiaofeng (XiAn)
  2018-08-14 13:09     ` Matthew Wilcox
  2018-08-14 18:54     ` Kees Cook
  0 siblings, 2 replies; 6+ messages in thread
From: Yuanxiaofeng (XiAn) @ 2018-08-14 13:02 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: keescook, linux-mm, linux-kernel

1, When the THREAD_SIZE is less than PAGE_SIZE, the stack will allocate memory by kmem_cache_alloc_node(), it's slab memory and will execute __check_heap_object().
2, When CONFIG_HARDENED_USERCOPY_PAGESPAN is enabled, the multiple-pages stacks will do some check in check_page_span().

So, I set some restrictions to make sure the useful check will not be skipped.

-----Original Message-----
From: Matthew Wilcox [mailto:willy@infradead.org] 
Sent: Tuesday, August 14, 2018 8:35 PM
To: Yuanxiaofeng (XiAn)
Cc: keescook@chromium.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] usercopy: optimize stack check flow when the

On Tue, Aug 14, 2018 at 08:17:31PM +0800, Xiaofeng Yuan wrote:
> The check_heap_object() checks the spanning multiple pages and slab.
> When the page-spanning test is disabled, the check_heap_object() is
> redundant for spanning multiple pages. However, the kernel stacks are
> multiple pages under certain conditions: CONFIG_ARCH_THREAD_STACK_ALLOCATOR
> is not defined and (THREAD_SIZE >= PAGE_SIZE). At this point, We can skip
> the check_heap_object() for kernel stacks to improve performance.
> Similarly, the virtually-mapped stack can skip check_heap_object() also,
> beacause virt_addr_valid() will return.

Why not just check_stack_object() first, then check_heap_object() second?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RFC] usercopy: optimize stack check flow when the page-spanning test is disabled
  2018-08-14 13:02   ` [PATCH RFC] usercopy: optimize stack check flow when the page-spanning test is disabled Yuanxiaofeng (XiAn)
@ 2018-08-14 13:09     ` Matthew Wilcox
  2018-08-14 18:54     ` Kees Cook
  1 sibling, 0 replies; 6+ messages in thread
From: Matthew Wilcox @ 2018-08-14 13:09 UTC (permalink / raw)
  To: Yuanxiaofeng (XiAn); +Cc: keescook, linux-mm, linux-kernel

On Tue, Aug 14, 2018 at 01:02:55PM +0000, Yuanxiaofeng (XiAn) wrote:
> 1, When the THREAD_SIZE is less than PAGE_SIZE, the stack will allocate memory by kmem_cache_alloc_node(), it's slab memory and will execute __check_heap_object().
> 2, When CONFIG_HARDENED_USERCOPY_PAGESPAN is enabled, the multiple-pages stacks will do some check in check_page_span().

I understand the checks will still do something useful, but I do not see the
scenario in which an object would satisfy the stack checks but fail the heap
checks.

> So, I set some restrictions to make sure the useful check will not be skipped.
> 
> -----Original Message-----
> From: Matthew Wilcox [mailto:willy@infradead.org] 
> Sent: Tuesday, August 14, 2018 8:35 PM
> To: Yuanxiaofeng (XiAn)
> Cc: keescook@chromium.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH RFC] usercopy: optimize stack check flow when the
> 
> On Tue, Aug 14, 2018 at 08:17:31PM +0800, Xiaofeng Yuan wrote:
> > The check_heap_object() checks the spanning multiple pages and slab.
> > When the page-spanning test is disabled, the check_heap_object() is
> > redundant for spanning multiple pages. However, the kernel stacks are
> > multiple pages under certain conditions: CONFIG_ARCH_THREAD_STACK_ALLOCATOR
> > is not defined and (THREAD_SIZE >= PAGE_SIZE). At this point, We can skip
> > the check_heap_object() for kernel stacks to improve performance.
> > Similarly, the virtually-mapped stack can skip check_heap_object() also,
> > beacause virt_addr_valid() will return.
> 
> Why not just check_stack_object() first, then check_heap_object() second?
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RFC] usercopy: optimize stack check flow when the page-spanning test is disabled
  2018-08-14 13:02   ` [PATCH RFC] usercopy: optimize stack check flow when the page-spanning test is disabled Yuanxiaofeng (XiAn)
  2018-08-14 13:09     ` Matthew Wilcox
@ 2018-08-14 18:54     ` Kees Cook
  2018-08-15 11:59       ` Yuanxiaofeng (XiAn)
  1 sibling, 1 reply; 6+ messages in thread
From: Kees Cook @ 2018-08-14 18:54 UTC (permalink / raw)
  To: Yuanxiaofeng (XiAn); +Cc: Matthew Wilcox, linux-mm, linux-kernel

(Please use contextual quoting in replies... mixing contextual with
top-posting becomes very hard to read...)

On Tue, Aug 14, 2018 at 6:02 AM, Yuanxiaofeng (XiAn)
<yuanxiaofeng1@huawei.com> wrote:
> On Tue, Aug 14, 2018 at 8:35PM Matthew Wilcox wrote:
>> On Tue, Aug 14, 2018 at 08:17:31PM +0800, Xiaofeng Yuan wrote:
>>> The check_heap_object() checks the spanning multiple pages and slab.
>>> When the page-spanning test is disabled, the check_heap_object() is
>>> redundant for spanning multiple pages. However, the kernel stacks are
>>> multiple pages under certain conditions: CONFIG_ARCH_THREAD_STACK_ALLOCATOR
>>> is not defined and (THREAD_SIZE >= PAGE_SIZE). At this point, We can skip
>>> the check_heap_object() for kernel stacks to improve performance.
>>> Similarly, the virtually-mapped stack can skip check_heap_object() also,
>>> beacause virt_addr_valid() will return.
>>
>> Why not just check_stack_object() first, then check_heap_object() second?

Most of the dynamically-sized copies (i.e. those that will trigger
__check_object_size being used at all) come out of heap. Stack copies
tend to be a fixed size. That said, the stack check is pretty cheap:
if it's not bounded by task_stack_page(current) ... +THREAD_SIZE, it
kicks out immediately. The frame-walking will only happen if it IS
actually stack (and once finished will short-circuit all remaining
tests).

> 1, When the THREAD_SIZE is less than PAGE_SIZE, the stack will allocate memory by kmem_cache_alloc_node(), it's slab memory and will execute __check_heap_object().

Correct, though if an architecture supports stack frame analysis, this
is a more narrow check than the bulk heap object check. (i.e. it may
have sub-object granularity to determine if a copy spans a stack
frame.) This supports the idea of just doing the stack check first,
though.

> 2, When CONFIG_HARDENED_USERCOPY_PAGESPAN is enabled, the multiple-pages stacks will do some check in check_page_span().

PAGESPAN checking is buggy for a lot of reasons, unfortunately. It
should generally stay disabled unless someone is working on getting
rid of allocations that _should_ have marked themselves as spanning
pages. It's unclear if this is even a solvable problem in the kernel
right now due to how networking code manages skbs.

> So, I set some restrictions to make sure the useful check will not be skipped.

It'd be nice to find some workloads that visibly change by making the
heap/stack order change. I think the known worst-case (small-packet
UDP flooding) wouldn't get worse since both checks will be performed
in either case.

(Maybe we should also short-circuit early in heap checks if it IS a
valid heap object: no reason to go do the kernel text check after
that...)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RFC] usercopy: optimize stack check flow when the page-spanning test is disabled
  2018-08-14 18:54     ` Kees Cook
@ 2018-08-15 11:59       ` Yuanxiaofeng (XiAn)
  0 siblings, 0 replies; 6+ messages in thread
From: Yuanxiaofeng (XiAn) @ 2018-08-15 11:59 UTC (permalink / raw)
  To: Kees Cook; +Cc: Matthew Wilcox, linux-mm, linux-kernel, yinyouzhan



On 8/15/2018 2:54 AM, Kees Cook wrote:
> (Please use contextual quoting in replies... mixing contextual with
> top-posting becomes very hard to read...)
> 
> On Tue, Aug 14, 2018 at 6:02 AM, Yuanxiaofeng (XiAn)
> <yuanxiaofeng1@huawei.com> wrote:
>> On Tue, Aug 14, 2018 at 8:35PM Matthew Wilcox wrote:
>>> On Tue, Aug 14, 2018 at 08:17:31PM +0800, Xiaofeng Yuan wrote:
>>>> The check_heap_object() checks the spanning multiple pages and slab.
>>>> When the page-spanning test is disabled, the check_heap_object() is
>>>> redundant for spanning multiple pages. However, the kernel stacks are
>>>> multiple pages under certain conditions: CONFIG_ARCH_THREAD_STACK_ALLOCATOR
>>>> is not defined and (THREAD_SIZE >= PAGE_SIZE). At this point, We can skip
>>>> the check_heap_object() for kernel stacks to improve performance.
>>>> Similarly, the virtually-mapped stack can skip check_heap_object() also,
>>>> beacause virt_addr_valid() will return.
>>>
>>> Why not just check_stack_object() first, then check_heap_object() second?
> 
> Most of the dynamically-sized copies (i.e. those that will trigger
> __check_object_size being used at all) come out of heap. Stack copies
> tend to be a fixed size. That said, the stack check is pretty cheap:
> if it's not bounded by task_stack_page(current) ... +THREAD_SIZE, it
> kicks out immediately. The frame-walking will only happen if it IS
> actually stack (and once finished will short-circuit all remaining
> tests).
>
>> 1, When the THREAD_SIZE is less than PAGE_SIZE, the stack will allocate memory by kmem_cache_alloc_node(), it's slab memory and will execute __check_heap_object().
> 
> Correct, though if an architecture supports stack frame analysis, this
> is a more narrow check than the bulk heap object check. (i.e. it may
> have sub-object granularity to determine if a copy spans a stack
> frame.) This supports the idea of just doing the stack check first,
> though.
> 

>> 2, When CONFIG_HARDENED_USERCOPY_PAGESPAN is enabled, the multiple-pages stacks will do some check in check_page_span().
> 
> PAGESPAN checking is buggy for a lot of reasons, unfortunately. It
> should generally stay disabled unless someone is working on getting
> rid of allocations that _should_ have marked themselves as spanning
> pages. It's unclear if this is even a solvable problem in the kernel
> right now due to how networking code manages skbs.
> 
I also found the PAGESPAN is disabled by default, it's a reason why I
change the heap/stack order. If PAGESPAN is enabled in the future,
this patch will restore the original check flow.

>> So, I set some restrictions to make sure the useful check will not be skipped.
> 
> It'd be nice to find some workloads that visibly change by making the
> heap/stack order change. I think the known worst-case (small-packet
> UDP flooding) wouldn't get worse since both checks will be performed
> in either case.
> 
Only the stack will skip the heap check. Other scenarios will return
NOT_STACK in check_stack_object(), and will not skip any checks.
This change just influences and benefits to the kernel stack check.

> (Maybe we should also short-circuit early in heap checks if it IS a
> valid heap object: no reason to go do the kernel text check after
> that...)
> 
I tested the average time of each check (there may be some bias on different
devices), the check_heap_object() spend most time.  And the kernel stack spend
many time in heap check. It's another reason why I want change the order.
If we can skip the valid heap's text check, we need do some validation.
However, it will be beneficial to performance as the usercopy check is executed
frequently.

> -Kees
> 
-Xiaofeng


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-08-15 11:57 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-14 12:17 [PATCH RFC] usercopy: optimize stack check flow when the Xiaofeng Yuan
2018-08-14 12:34 ` Matthew Wilcox
2018-08-14 13:02   ` [PATCH RFC] usercopy: optimize stack check flow when the page-spanning test is disabled Yuanxiaofeng (XiAn)
2018-08-14 13:09     ` Matthew Wilcox
2018-08-14 18:54     ` Kees Cook
2018-08-15 11:59       ` Yuanxiaofeng (XiAn)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).