linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] arch/x86: fix out-of-bounds in get_wchan()
@ 2015-09-28  9:00 Dmitry Vyukov
  2015-09-28  9:37 ` Borislav Petkov
  2015-09-28 15:40 ` Andrey Ryabinin
  0 siblings, 2 replies; 30+ messages in thread
From: Dmitry Vyukov @ 2015-09-28  9:00 UTC (permalink / raw)
  To: tglx, mingo, hpa, luto, bp, dvlasenk
  Cc: x86, linux-kernel, kcc, glider, andreyknvl, ryabinin.a.a,
	sasha.levin, ak, kasan-dev, Dmitry Vyukov

get_wchan() checks that fp is within stack bounds,
but then dereferences fp+8. This can crash kernel
or leak sensitive information. Also the function
operates on a potentially running stack, but does
not use READ_ONCE. As the result it can check that
one value is within stack bounds, but then deref
another value.

Fix the bounds check and use READ_ONCE for all
volatile data.

The bug was discovered with KASAN.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
---
FTR, here is the KASAN report:

[  124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on address ffff88002e280000
[  124.578633] Accessed by thread T10915:
[  124.581050]   #2 ffffffff810dd423 in __tsan_read8 ??:0
[  124.581893]   #3 ffffffff8107c093 in get_wchan ./arch/x86/kernel/process_64.c:444
[  124.582763]   #4 ffffffff81342108 in do_task_stat array.c:0
[  124.583634]   #5 ffffffff81342dcc in proc_tgid_stat ??:0
[  124.584548]   #6 ffffffff8133c984 in proc_single_show base.c:0
[  124.585461]   #7 ffffffff812d18cc in seq_read ./fs/seq_file.c:222
[  124.586313]   #8 ffffffff8129e503 in vfs_read ??:0
[  124.587137]   #9 ffffffff8129f800 in SyS_read ??:0
[  124.587827]   #10 ffffffff81929bf5 in sysenter_dispatch ./arch/x86/ia32/ia32entry.S:164
[  124.588738]
[  124.593434] Shadow bytes around the buggy address:
[  124.594270]   ffff88002e27fd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  124.595339]   ffff88002e27fe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  124.596453]   ffff88002e27fe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  124.597466]   ffff88002e27ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  124.598501]   ffff88002e27ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  124.599629] =>ffff88002e280000:[fa]fa fa fa fa fa fa fa fa fa 00 00 00 00 00 00
[  124.600873]   ffff88002e280080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  124.601892]   ffff88002e280100: 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[  124.603037]   ffff88002e280180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[  124.604047]   ffff88002e280200: fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd fd
[  124.605054]   ffff88002e280280: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
[  124.605993] Shadow byte legend (one shadow byte represents 8 application bytes):
[  124.606958]   Addressable:   00
[  124.607483]   Partially addressable: 01 02 03 04 05 06 07
[  124.608219]   Heap redzone:  fa
[  124.608724]   Heap kmalloc redzone:  fb
[  124.609249]   Freed heap region: fd
[  124.609753]   Shadow gap:fe
[  124.610292] =========================================================================
---
 arch/x86/kernel/process_64.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 71d7849..a1fce34 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -506,17 +506,19 @@ unsigned long get_wchan(struct task_struct *p)
 	if (!p || p == current || p->state == TASK_RUNNING)
 		return 0;
 	stack = (unsigned long)task_stack_page(p);
-	if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
+	/* The task can be already running at this point, so tread carefully. */
+	fp = READ_ONCE(p->thread.sp);
+	if (fp < stack || fp >= stack+THREAD_SIZE)
 		return 0;
-	fp = *(u64 *)(p->thread.sp);
+	fp = READ_ONCE(*(u64 *)fp);
 	do {
 		if (fp < (unsigned long)stack ||
-		    fp >= (unsigned long)stack+THREAD_SIZE)
+		    fp+8 >= (unsigned long)stack+THREAD_SIZE)
 			return 0;
-		ip = *(u64 *)(fp+8);
+		ip = READ_ONCE(*(u64 *)(fp+8));
 		if (!in_sched_functions(ip))
 			return ip;
-		fp = *(u64 *)fp;
+		fp = READ_ONCE(*(u64 *)fp);
 	} while (count++ < 16);
 	return 0;
 }
-- 
2.6.0.rc2.230.g3dd15c0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-28  9:00 [PATCH] arch/x86: fix out-of-bounds in get_wchan() Dmitry Vyukov
@ 2015-09-28  9:37 ` Borislav Petkov
  2015-09-28  9:49   ` Dmitry Vyukov
  2015-09-28  9:54   ` Dmitry Vyukov
  2015-09-28 15:40 ` Andrey Ryabinin
  1 sibling, 2 replies; 30+ messages in thread
From: Borislav Petkov @ 2015-09-28  9:37 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: tglx, mingo, hpa, luto, dvlasenk, x86, linux-kernel, kcc, glider,
	andreyknvl, ryabinin.a.a, sasha.levin, ak, kasan-dev

On Mon, Sep 28, 2015 at 11:00:39AM +0200, Dmitry Vyukov wrote:
> get_wchan() checks that fp is within stack bounds,
> but then dereferences fp+8. This can crash kernel
> or leak sensitive information. Also the function
> operates on a potentially running stack, but does
> not use READ_ONCE. As the result it can check that
> one value is within stack bounds, but then deref
> another value.
> 
> Fix the bounds check and use READ_ONCE for all
> volatile data.
> 
> The bug was discovered with KASAN.
> 
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> ---
> FTR, here is the KASAN report:
> 
> [  124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on address ffff88002e280000
> [  124.578633] Accessed by thread T10915:
> [  124.581050]   #2 ffffffff810dd423 in __tsan_read8 ??:0
> [  124.581893]   #3 ffffffff8107c093 in get_wchan ./arch/x86/kernel/process_64.c:444
> [  124.582763]   #4 ffffffff81342108 in do_task_stat array.c:0
> [  124.583634]   #5 ffffffff81342dcc in proc_tgid_stat ??:0
> [  124.584548]   #6 ffffffff8133c984 in proc_single_show base.c:0
> [  124.585461]   #7 ffffffff812d18cc in seq_read ./fs/seq_file.c:222
> [  124.586313]   #8 ffffffff8129e503 in vfs_read ??:0
> [  124.587137]   #9 ffffffff8129f800 in SyS_read ??:0
> [  124.587827]   #10 ffffffff81929bf5 in sysenter_dispatch ./arch/x86/ia32/ia32entry.S:164
> [  124.588738]
> [  124.593434] Shadow bytes around the buggy address:
> [  124.594270]   ffff88002e27fd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  124.595339]   ffff88002e27fe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  124.596453]   ffff88002e27fe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  124.597466]   ffff88002e27ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  124.598501]   ffff88002e27ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  124.599629] =>ffff88002e280000:[fa]fa fa fa fa fa fa fa fa fa 00 00 00 00 00 00
> [  124.600873]   ffff88002e280080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  124.601892]   ffff88002e280100: 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
> [  124.603037]   ffff88002e280180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
> [  124.604047]   ffff88002e280200: fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd fd
> [  124.605054]   ffff88002e280280: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
> [  124.605993] Shadow byte legend (one shadow byte represents 8 application bytes):
> [  124.606958]   Addressable:   00
> [  124.607483]   Partially addressable: 01 02 03 04 05 06 07
> [  124.608219]   Heap redzone:  fa
> [  124.608724]   Heap kmalloc redzone:  fb
> [  124.609249]   Freed heap region: fd
> [  124.609753]   Shadow gap:fe
> [  124.610292] =========================================================================
> ---
>  arch/x86/kernel/process_64.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 71d7849..a1fce34 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -506,17 +506,19 @@ unsigned long get_wchan(struct task_struct *p)
>  	if (!p || p == current || p->state == TASK_RUNNING)
>  		return 0;
>  	stack = (unsigned long)task_stack_page(p);
> -	if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
> +	/* The task can be already running at this point, so tread carefully. */
> +	fp = READ_ONCE(p->thread.sp);
> +	if (fp < stack || fp >= stack+THREAD_SIZE)
>  		return 0;
> -	fp = *(u64 *)(p->thread.sp);
> +	fp = READ_ONCE(*(u64 *)fp);

Why isn't this:

	fp = READ_ONCE(*(u64 *)p->thread.sp);

like the original code did?

Actually, the original code looks fishy to me too - it did access live
stack three times. And shouldn't we be accessing it only once?

I.e.,

	fp_st = READ_ONCE(p->thread.sp);
	if (fp_st < stack || fp_st >= stack + THREAD_SIZE)
		return 0;
	fp = *(u64 *)fp_st;

Hmm?

Maybe I'm not completely clear on how the whole locking happens here
because we do

        if (!p || p == current || p->state == TASK_RUNNING)
                return 0;

earlier but apparently we can become TASK_RUNNING after the check...

Also, shouldn't this one have a CVE number assigned or so due to the
leakage potential?

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-28  9:37 ` Borislav Petkov
@ 2015-09-28  9:49   ` Dmitry Vyukov
  2015-09-28 10:23     ` Borislav Petkov
  2015-09-28  9:54   ` Dmitry Vyukov
  1 sibling, 1 reply; 30+ messages in thread
From: Dmitry Vyukov @ 2015-09-28  9:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, luto, dvlasenk,
	x86, LKML, Kostya Serebryany, Alexander Potapenko,
	Andrey Konovalov, Andrey Ryabinin, Sasha Levin, Andi Kleen,
	kasan-dev

On Mon, Sep 28, 2015 at 11:37 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Sep 28, 2015 at 11:00:39AM +0200, Dmitry Vyukov wrote:
>> get_wchan() checks that fp is within stack bounds,
>> but then dereferences fp+8. This can crash kernel
>> or leak sensitive information. Also the function
>> operates on a potentially running stack, but does
>> not use READ_ONCE. As the result it can check that
>> one value is within stack bounds, but then deref
>> another value.
>>
>> Fix the bounds check and use READ_ONCE for all
>> volatile data.
>>
>> The bug was discovered with KASAN.
>>
>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>> ---
>> FTR, here is the KASAN report:
>>
>> [  124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on address ffff88002e280000
>> [  124.578633] Accessed by thread T10915:
>> [  124.581050]   #2 ffffffff810dd423 in __tsan_read8 ??:0
>> [  124.581893]   #3 ffffffff8107c093 in get_wchan ./arch/x86/kernel/process_64.c:444
>> [  124.582763]   #4 ffffffff81342108 in do_task_stat array.c:0
>> [  124.583634]   #5 ffffffff81342dcc in proc_tgid_stat ??:0
>> [  124.584548]   #6 ffffffff8133c984 in proc_single_show base.c:0
>> [  124.585461]   #7 ffffffff812d18cc in seq_read ./fs/seq_file.c:222
>> [  124.586313]   #8 ffffffff8129e503 in vfs_read ??:0
>> [  124.587137]   #9 ffffffff8129f800 in SyS_read ??:0
>> [  124.587827]   #10 ffffffff81929bf5 in sysenter_dispatch ./arch/x86/ia32/ia32entry.S:164
>> [  124.588738]
>> [  124.593434] Shadow bytes around the buggy address:
>> [  124.594270]   ffff88002e27fd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.595339]   ffff88002e27fe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.596453]   ffff88002e27fe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.597466]   ffff88002e27ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.598501]   ffff88002e27ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.599629] =>ffff88002e280000:[fa]fa fa fa fa fa fa fa fa fa 00 00 00 00 00 00
>> [  124.600873]   ffff88002e280080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.601892]   ffff88002e280100: 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>> [  124.603037]   ffff88002e280180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>> [  124.604047]   ffff88002e280200: fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd fd
>> [  124.605054]   ffff88002e280280: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
>> [  124.605993] Shadow byte legend (one shadow byte represents 8 application bytes):
>> [  124.606958]   Addressable:   00
>> [  124.607483]   Partially addressable: 01 02 03 04 05 06 07
>> [  124.608219]   Heap redzone:  fa
>> [  124.608724]   Heap kmalloc redzone:  fb
>> [  124.609249]   Freed heap region: fd
>> [  124.609753]   Shadow gap:fe
>> [  124.610292] =========================================================================
>> ---
>>  arch/x86/kernel/process_64.c | 12 +++++++-----
>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>> index 71d7849..a1fce34 100644
>> --- a/arch/x86/kernel/process_64.c
>> +++ b/arch/x86/kernel/process_64.c
>> @@ -506,17 +506,19 @@ unsigned long get_wchan(struct task_struct *p)
>>       if (!p || p == current || p->state == TASK_RUNNING)
>>               return 0;
>>       stack = (unsigned long)task_stack_page(p);
>> -     if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
>> +     /* The task can be already running at this point, so tread carefully. */
>> +     fp = READ_ONCE(p->thread.sp);
>> +     if (fp < stack || fp >= stack+THREAD_SIZE)
>>               return 0;
>> -     fp = *(u64 *)(p->thread.sp);
>> +     fp = READ_ONCE(*(u64 *)fp);
>
> Why isn't this:
>
>         fp = READ_ONCE(*(u64 *)p->thread.sp);
>
> like the original code did?


Original code did:

     if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
           return 0;
     fp = *(u64 *)(p->thread.sp);

p->thread.sp can change concurrently.
So we could check that p->thread.sp is within stack bounds, but then
dereference another value (which is already outside of bounds).




> Actually, the original code looks fishy to me too - it did access live
> stack three times. And shouldn't we be accessing it only once?
>
> I.e.,
>
>         fp_st = READ_ONCE(p->thread.sp);
>         if (fp_st < stack || fp_st >= stack + THREAD_SIZE)
>                 return 0;
>         fp = *(u64 *)fp_st;
>
> Hmm?

That's what my patch does.


> Maybe I'm not completely clear on how the whole locking happens here
> because we do
>
>         if (!p || p == current || p->state == TASK_RUNNING)
>                 return 0;
>
> earlier but apparently we can become TASK_RUNNING after the check...
>
> Also, shouldn't this one have a CVE number assigned or so due to the
> leakage potential?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-28  9:37 ` Borislav Petkov
  2015-09-28  9:49   ` Dmitry Vyukov
@ 2015-09-28  9:54   ` Dmitry Vyukov
  2015-09-28 10:32     ` Borislav Petkov
  1 sibling, 1 reply; 30+ messages in thread
From: Dmitry Vyukov @ 2015-09-28  9:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, luto, dvlasenk,
	x86, LKML, Kostya Serebryany, Alexander Potapenko,
	Andrey Konovalov, Andrey Ryabinin, Sasha Levin, Andi Kleen,
	kasan-dev

On Mon, Sep 28, 2015 at 11:37 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Sep 28, 2015 at 11:00:39AM +0200, Dmitry Vyukov wrote:
>> get_wchan() checks that fp is within stack bounds,
>> but then dereferences fp+8. This can crash kernel
>> or leak sensitive information. Also the function
>> operates on a potentially running stack, but does
>> not use READ_ONCE. As the result it can check that
>> one value is within stack bounds, but then deref
>> another value.
>>
>> Fix the bounds check and use READ_ONCE for all
>> volatile data.
>>
>> The bug was discovered with KASAN.
>>
>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>> ---
>> FTR, here is the KASAN report:
>>
>> [  124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on address ffff88002e280000
>> [  124.578633] Accessed by thread T10915:
>> [  124.581050]   #2 ffffffff810dd423 in __tsan_read8 ??:0
>> [  124.581893]   #3 ffffffff8107c093 in get_wchan ./arch/x86/kernel/process_64.c:444
>> [  124.582763]   #4 ffffffff81342108 in do_task_stat array.c:0
>> [  124.583634]   #5 ffffffff81342dcc in proc_tgid_stat ??:0
>> [  124.584548]   #6 ffffffff8133c984 in proc_single_show base.c:0
>> [  124.585461]   #7 ffffffff812d18cc in seq_read ./fs/seq_file.c:222
>> [  124.586313]   #8 ffffffff8129e503 in vfs_read ??:0
>> [  124.587137]   #9 ffffffff8129f800 in SyS_read ??:0
>> [  124.587827]   #10 ffffffff81929bf5 in sysenter_dispatch ./arch/x86/ia32/ia32entry.S:164
>> [  124.588738]
>> [  124.593434] Shadow bytes around the buggy address:
>> [  124.594270]   ffff88002e27fd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.595339]   ffff88002e27fe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.596453]   ffff88002e27fe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.597466]   ffff88002e27ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.598501]   ffff88002e27ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.599629] =>ffff88002e280000:[fa]fa fa fa fa fa fa fa fa fa 00 00 00 00 00 00
>> [  124.600873]   ffff88002e280080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> [  124.601892]   ffff88002e280100: 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>> [  124.603037]   ffff88002e280180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>> [  124.604047]   ffff88002e280200: fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd fd
>> [  124.605054]   ffff88002e280280: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
>> [  124.605993] Shadow byte legend (one shadow byte represents 8 application bytes):
>> [  124.606958]   Addressable:   00
>> [  124.607483]   Partially addressable: 01 02 03 04 05 06 07
>> [  124.608219]   Heap redzone:  fa
>> [  124.608724]   Heap kmalloc redzone:  fb
>> [  124.609249]   Freed heap region: fd
>> [  124.609753]   Shadow gap:fe
>> [  124.610292] =========================================================================
>> ---
>>  arch/x86/kernel/process_64.c | 12 +++++++-----
>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>> index 71d7849..a1fce34 100644
>> --- a/arch/x86/kernel/process_64.c
>> +++ b/arch/x86/kernel/process_64.c
>> @@ -506,17 +506,19 @@ unsigned long get_wchan(struct task_struct *p)
>>       if (!p || p == current || p->state == TASK_RUNNING)
>>               return 0;
>>       stack = (unsigned long)task_stack_page(p);
>> -     if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
>> +     /* The task can be already running at this point, so tread carefully. */
>> +     fp = READ_ONCE(p->thread.sp);
>> +     if (fp < stack || fp >= stack+THREAD_SIZE)
>>               return 0;
>> -     fp = *(u64 *)(p->thread.sp);
>> +     fp = READ_ONCE(*(u64 *)fp);
>
> Why isn't this:
>
>         fp = READ_ONCE(*(u64 *)p->thread.sp);
>
> like the original code did?
>
> Actually, the original code looks fishy to me too - it did access live
> stack three times. And shouldn't we be accessing it only once?
>
> I.e.,
>
>         fp_st = READ_ONCE(p->thread.sp);
>         if (fp_st < stack || fp_st >= stack + THREAD_SIZE)
>                 return 0;
>         fp = *(u64 *)fp_st;
>
> Hmm?
>
> Maybe I'm not completely clear on how the whole locking happens here
> because we do
>
>         if (!p || p == current || p->state == TASK_RUNNING)
>                 return 0;
>
> earlier but apparently we can become TASK_RUNNING after the check...
>
> Also, shouldn't this one have a CVE number assigned or so due to the
> leakage potential?

Dunno. Should it? Most likely the data will be leaked iff it is
in_sched_functions. Never requested a CVE number before. If you
insist, I guess I can contact MITRE.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-28  9:49   ` Dmitry Vyukov
@ 2015-09-28 10:23     ` Borislav Petkov
  2015-09-28 10:33       ` Dmitry Vyukov
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2015-09-28 10:23 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, luto, dvlasenk,
	x86, LKML, Kostya Serebryany, Alexander Potapenko,
	Andrey Konovalov, Andrey Ryabinin, Sasha Levin, Andi Kleen,
	kasan-dev

On Mon, Sep 28, 2015 at 11:49:18AM +0200, Dmitry Vyukov wrote:
> Original code did:
> 
>      if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
>            return 0;
>      fp = *(u64 *)(p->thread.sp);
> 
> p->thread.sp can change concurrently.
> So we could check that p->thread.sp is within stack bounds, but then
> dereference another value (which is already outside of bounds).

Right, we do deref it. I realized that after hitting "Send" :\

Which begs another, probably also stupid, question:

What guarantees the task won't disappear after we've checked p?

I.e., after this:

        if (!p || p == current || p->state == TASK_RUNNING)
                return 0;

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-28  9:54   ` Dmitry Vyukov
@ 2015-09-28 10:32     ` Borislav Petkov
  0 siblings, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2015-09-28 10:32 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, luto, dvlasenk,
	x86, LKML, Kostya Serebryany, Alexander Potapenko,
	Andrey Konovalov, Andrey Ryabinin, Sasha Levin, Andi Kleen,
	kasan-dev

On Mon, Sep 28, 2015 at 11:54:43AM +0200, Dmitry Vyukov wrote:
> Dunno. Should it? Most likely the data will be leaked iff it is
> in_sched_functions.

.. if not in_sched_functions...

> Never requested a CVE number before. If you insist, I guess I can
> contact MITRE.

Just wondering - generally the leaking to userspace things do get a CVE
number but let's see what the others think.

luto will know - he loves to open CVEs! :-P

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-28 10:23     ` Borislav Petkov
@ 2015-09-28 10:33       ` Dmitry Vyukov
  2015-09-28 10:51         ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Vyukov @ 2015-09-28 10:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, luto, dvlasenk,
	x86, LKML, Kostya Serebryany, Alexander Potapenko,
	Andrey Konovalov, Andrey Ryabinin, Sasha Levin, Andi Kleen,
	kasan-dev

On Mon, Sep 28, 2015 at 12:23 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Sep 28, 2015 at 11:49:18AM +0200, Dmitry Vyukov wrote:
>> Original code did:
>>
>>      if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
>>            return 0;
>>      fp = *(u64 *)(p->thread.sp);
>>
>> p->thread.sp can change concurrently.
>> So we could check that p->thread.sp is within stack bounds, but then
>> dereference another value (which is already outside of bounds).
>
> Right, we do deref it. I realized that after hitting "Send" :\
>
> Which begs another, probably also stupid, question:
>
> What guarantees the task won't disappear after we've checked p?
>
> I.e., after this:
>
>         if (!p || p == current || p->state == TASK_RUNNING)
>                 return 0;


I have not checked, but I would expect that it is caller's
responsibility. There is generally no way to magically resurrect a
pointer to a freed object passed in.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-28 10:33       ` Dmitry Vyukov
@ 2015-09-28 10:51         ` Borislav Petkov
  0 siblings, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2015-09-28 10:51 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, luto, dvlasenk,
	x86, LKML, Kostya Serebryany, Alexander Potapenko,
	Andrey Konovalov, Andrey Ryabinin, Sasha Levin, Andi Kleen,
	kasan-dev

On Mon, Sep 28, 2015 at 12:33:09PM +0200, Dmitry Vyukov wrote:
> I have not checked, but I would expect that it is caller's
> responsibility.

Looks like it: proc, for example, does get_pid_task()->get_task_struct().

> There is generally no way to magically resurrect a
> pointer to a freed object passed in.

Right.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-28  9:00 [PATCH] arch/x86: fix out-of-bounds in get_wchan() Dmitry Vyukov
  2015-09-28  9:37 ` Borislav Petkov
@ 2015-09-28 15:40 ` Andrey Ryabinin
  2015-09-28 16:08   ` Dmitry Vyukov
  1 sibling, 1 reply; 30+ messages in thread
From: Andrey Ryabinin @ 2015-09-28 15:40 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, ak,
	kasan-dev

2015-09-28 12:00 GMT+03:00 Dmitry Vyukov <dvyukov@google.com>:
>         stack = (unsigned long)task_stack_page(p);
> -       if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
> +       /* The task can be already running at this point, so tread carefully. */
> +       fp = READ_ONCE(p->thread.sp);
> +       if (fp < stack || fp >= stack+THREAD_SIZE)

Since we deference fp, it should be "|| fp + sizeof(u64) >= stack + THREAD_SIZE"

>                 return 0;
> -       fp = *(u64 *)(p->thread.sp);
> +       fp = READ_ONCE(*(u64 *)fp);
>         do {
>                 if (fp < (unsigned long)stack ||
> -                   fp >= (unsigned long)stack+THREAD_SIZE)
> +                   fp+8 >= (unsigned long)stack+THREAD_SIZE)

The same as above;
        'fp+8 +sizeof(u64) >= ...'

>                         return 0;
> -               ip = *(u64 *)(fp+8);
> +               ip = READ_ONCE(*(u64 *)(fp+8));
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-28 15:40 ` Andrey Ryabinin
@ 2015-09-28 16:08   ` Dmitry Vyukov
  2015-09-28 16:32     ` Thomas Gleixner
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Vyukov @ 2015-09-28 16:08 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev

On Mon, Sep 28, 2015 at 5:40 PM, Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote:
> 2015-09-28 12:00 GMT+03:00 Dmitry Vyukov <dvyukov@google.com>:
>>         stack = (unsigned long)task_stack_page(p);
>> -       if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
>> +       /* The task can be already running at this point, so tread carefully. */
>> +       fp = READ_ONCE(p->thread.sp);
>> +       if (fp < stack || fp >= stack+THREAD_SIZE)
>
> Since we deference fp, it should be "|| fp + sizeof(u64) >= stack + THREAD_SIZE"

Good point.
I guess it should be "|| fp + sizeof(u64) > stack + THREAD_SIZE",
because == is OK if we add 8.

>>                 return 0;
>> -       fp = *(u64 *)(p->thread.sp);
>> +       fp = READ_ONCE(*(u64 *)fp);
>>         do {
>>                 if (fp < (unsigned long)stack ||
>> -                   fp >= (unsigned long)stack+THREAD_SIZE)
>> +                   fp+8 >= (unsigned long)stack+THREAD_SIZE)
>
> The same as above;
>         'fp+8 +sizeof(u64) >= ...'
>
>>                         return 0;
>> -               ip = *(u64 *)(fp+8);
>> +               ip = READ_ONCE(*(u64 *)(fp+8));
>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-28 16:08   ` Dmitry Vyukov
@ 2015-09-28 16:32     ` Thomas Gleixner
  2015-09-29 18:15       ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2015-09-28 16:32 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrey Ryabinin, Ingo Molnar, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev

On Mon, 28 Sep 2015, Dmitry Vyukov wrote:

> On Mon, Sep 28, 2015 at 5:40 PM, Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote:
> > 2015-09-28 12:00 GMT+03:00 Dmitry Vyukov <dvyukov@google.com>:
> >>         stack = (unsigned long)task_stack_page(p);
> >> -       if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
> >> +       /* The task can be already running at this point, so tread carefully. */
> >> +       fp = READ_ONCE(p->thread.sp);
> >> +       if (fp < stack || fp >= stack+THREAD_SIZE)
> >
> > Since we deference fp, it should be "|| fp + sizeof(u64) >= stack + THREAD_SIZE"
> 
> Good point.
> I guess it should be "|| fp + sizeof(u64) > stack + THREAD_SIZE",
> because == is OK if we add 8.
> 

This whole mess with +8 and -16 and whatever is just crap. And all of
it completely undocumented. Proper version below.

Thanks,

	tglx

8<-------------------------------

Subject: x86/process: Add proper bound checks in 64bit get_wchan()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 28 Sep 2015 17:16:52 +0200

Dmitry Vyukov reported the following using trinity and the memory
error detector AddressSanitizer
(https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).

[ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
address ffff88002e280000
[ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
[ 124.578633] Accessed by thread T10915:
[ 124.579295] inlined in describe_heap_address
./arch/x86/mm/asan/report.c:164
[ 124.579295] #0 ffffffff810dd277 in asan_report_error
./arch/x86/mm/asan/report.c:278
[ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
./arch/x86/mm/asan/asan.c:37
[ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
[ 124.581893] #3 ffffffff8107c093 in get_wchan
./arch/x86/kernel/process_64.c:444

The address checks in the 64bit implementation of get_wchan() are
wrong in several ways:

 - The lower bound of the stack is not the start of the stack
   page. It's the start of the stack page plus sizeof (struct
   thread_info)

 - The upper bound must be top of stack minus 2 * sizeof(unsigned
   long). This is required because the stack pointer points at the
   frame pointer. The layout on the stack is: ... IP FP ... IP FP.

Fix the bound checks and get rid of the mix of numeric constants, u64
and unsigned long. Making all unsigned long allows us to use the same
function for 32bit as well.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: x86@kernel.org
---
 arch/x86/kernel/process_64.c |   41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

Index: tip/arch/x86/kernel/process_64.c
===================================================================
--- tip.orig/arch/x86/kernel/process_64.c
+++ tip/arch/x86/kernel/process_64.c
@@ -501,24 +501,47 @@ EXPORT_SYMBOL_GPL(set_personality_ia32);
 
 unsigned long get_wchan(struct task_struct *p)
 {
-	unsigned long stack;
-	u64 fp, ip;
+	unsigned long start, bottom, top, sp, fp, ip;
 	int count = 0;
 
 	if (!p || p == current || p->state == TASK_RUNNING)
 		return 0;
-	stack = (unsigned long)task_stack_page(p);
-	if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
+
+	start = (unsigned long)task_stack_page(p);
+	if (!start)
 		return 0;
-	fp = *(u64 *)(p->thread.sp);
+
+	/*
+	 * Layout of the stack page:
+	 *
+	 * ----------- top = start = THREAD_SIZE - sizeof(unsigned long)
+	 * stack
+	 * ----------- bottom = start + sizeof(thread_info)
+	 * thread_info
+	 * ----------- start
+	 *
+	 * The tasks stack pointer points at the location where the
+	 * framepointer is stored. The data on the stack is:
+	 * ... IP FP ... IP FP
+	 *
+	 * We need to read FP and IP, so we need to adjust the upper
+	 * bound by another unsigned long.
+	 */
+	top = start + THREAD_SIZE - 2 * sizeof(unsigned long);
+	bottom = start + sizeof(struct thread_info);
+
+	sp = p->thread.sp;
+	if (sp < bottom || sp > top)
+		return 0;
+
+	fp = *(unsigned long *)sp;
 	do {
-		if (fp < (unsigned long)stack ||
-		    fp >= (unsigned long)stack+THREAD_SIZE)
+		if (fp < bottom || fp > top)
 			return 0;
-		ip = *(u64 *)(fp+8);
+		ip = *(unsigned long *)(fp + sizeof(unsigned long));
 		if (!in_sched_functions(ip))
 			return ip;
-		fp = *(u64 *)fp;
+		fp = *(unsigned long *)fp;
 	} while (count++ < 16);
 	return 0;
 }

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-28 16:32     ` Thomas Gleixner
@ 2015-09-29 18:15       ` Andy Lutomirski
  2015-09-29 18:30         ` Andy Lutomirski
                           ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Andy Lutomirski @ 2015-09-29 18:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dmitry Vyukov, Andrey Ryabinin, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, x86, LKML,
	Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
	Sasha Levin, Andi Kleen, kasan-dev

On Mon, Sep 28, 2015 at 9:32 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 28 Sep 2015, Dmitry Vyukov wrote:
>
>> On Mon, Sep 28, 2015 at 5:40 PM, Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote:
>> > 2015-09-28 12:00 GMT+03:00 Dmitry Vyukov <dvyukov@google.com>:
>> >>         stack = (unsigned long)task_stack_page(p);
>> >> -       if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
>> >> +       /* The task can be already running at this point, so tread carefully. */
>> >> +       fp = READ_ONCE(p->thread.sp);
>> >> +       if (fp < stack || fp >= stack+THREAD_SIZE)
>> >
>> > Since we deference fp, it should be "|| fp + sizeof(u64) >= stack + THREAD_SIZE"
>>
>> Good point.
>> I guess it should be "|| fp + sizeof(u64) > stack + THREAD_SIZE",
>> because == is OK if we add 8.
>>
>
> This whole mess with +8 and -16 and whatever is just crap. And all of
> it completely undocumented. Proper version below.
>
> Thanks,
>
>         tglx
>
> 8<-------------------------------
>
> Subject: x86/process: Add proper bound checks in 64bit get_wchan()
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Mon, 28 Sep 2015 17:16:52 +0200
>
> Dmitry Vyukov reported the following using trinity and the memory
> error detector AddressSanitizer
> (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
>
> [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
> address ffff88002e280000
> [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
> the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
> [ 124.578633] Accessed by thread T10915:
> [ 124.579295] inlined in describe_heap_address
> ./arch/x86/mm/asan/report.c:164
> [ 124.579295] #0 ffffffff810dd277 in asan_report_error
> ./arch/x86/mm/asan/report.c:278
> [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
> ./arch/x86/mm/asan/asan.c:37
> [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
> [ 124.581893] #3 ffffffff8107c093 in get_wchan
> ./arch/x86/kernel/process_64.c:444
>
> The address checks in the 64bit implementation of get_wchan() are
> wrong in several ways:
>
>  - The lower bound of the stack is not the start of the stack
>    page. It's the start of the stack page plus sizeof (struct
>    thread_info)
>
>  - The upper bound must be top of stack minus 2 * sizeof(unsigned
>    long). This is required because the stack pointer points at the
>    frame pointer. The layout on the stack is: ... IP FP ... IP FP.
>
> Fix the bound checks and get rid of the mix of numeric constants, u64
> and unsigned long. Making all unsigned long allows us to use the same
> function for 32bit as well.
>
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Andrey Konovalov <andreyknvl@google.com>
> Cc: x86@kernel.org
> ---
>  arch/x86/kernel/process_64.c |   41 ++++++++++++++++++++++++++++++++---------
>  1 file changed, 32 insertions(+), 9 deletions(-)
>
> Index: tip/arch/x86/kernel/process_64.c
> ===================================================================
> --- tip.orig/arch/x86/kernel/process_64.c
> +++ tip/arch/x86/kernel/process_64.c
> @@ -501,24 +501,47 @@ EXPORT_SYMBOL_GPL(set_personality_ia32);
>
>  unsigned long get_wchan(struct task_struct *p)
>  {
> -       unsigned long stack;
> -       u64 fp, ip;
> +       unsigned long start, bottom, top, sp, fp, ip;
>         int count = 0;
>
>         if (!p || p == current || p->state == TASK_RUNNING)
>                 return 0;
> -       stack = (unsigned long)task_stack_page(p);
> -       if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
> +
> +       start = (unsigned long)task_stack_page(p);
> +       if (!start)
>                 return 0;
> -       fp = *(u64 *)(p->thread.sp);
> +
> +       /*
> +        * Layout of the stack page:
> +        *
> +        * ----------- top = start = THREAD_SIZE - sizeof(unsigned long)
> +        * stack

There's TOP_OF_KERNEL_STACK_PADDING in here, too.  Arguably the
padding is still in bounds, though.  Also, I think you mean "start +",
not "start =".

> +        * ----------- bottom = start + sizeof(thread_info)
> +        * thread_info
> +        * ----------- start
> +        *
> +        * The tasks stack pointer points at the location where the
> +        * framepointer is stored. The data on the stack is:
> +        * ... IP FP ... IP FP
> +        *
> +        * We need to read FP and IP, so we need to adjust the upper
> +        * bound by another unsigned long.
> +        */
> +       top = start + THREAD_SIZE - 2 * sizeof(unsigned long);
> +       bottom = start + sizeof(struct thread_info);
> +
> +       sp = p->thread.sp;
> +       if (sp < bottom || sp > top)
> +               return 0;
> +
> +       fp = *(unsigned long *)sp;
>         do {
> -               if (fp < (unsigned long)stack ||
> -                   fp >= (unsigned long)stack+THREAD_SIZE)
> +               if (fp < bottom || fp > top)
>                         return 0;
> -               ip = *(u64 *)(fp+8);
> +               ip = *(unsigned long *)(fp + sizeof(unsigned long));
>                 if (!in_sched_functions(ip))
>                         return ip;
> -               fp = *(u64 *)fp;
> +               fp = *(unsigned long *)fp;
>         } while (count++ < 16);

I'm be vaguely amazed if this isn't an exploitable info leak even
without the out of bounds thing.  Can we really not find a way to do
this without walking the stack?

The bounds checking looks okay, though.

--Andy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-29 18:15       ` Andy Lutomirski
@ 2015-09-29 18:30         ` Andy Lutomirski
  2015-09-29 18:41           ` Borislav Petkov
  2015-09-30  7:15         ` [PATCH] fs/proc: Don't expose absolute kernel addresses via wchan Ingo Molnar
  2015-09-30  8:07         ` [PATCH] arch/x86: fix out-of-bounds in get_wchan() Thomas Gleixner
  2 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2015-09-29 18:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dmitry Vyukov, Andrey Ryabinin, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, x86, LKML,
	Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
	Sasha Levin, Andi Kleen, kasan-dev

On Tue, Sep 29, 2015 at 11:15 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Sep 28, 2015 at 9:32 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Mon, 28 Sep 2015, Dmitry Vyukov wrote:
>>
>>> On Mon, Sep 28, 2015 at 5:40 PM, Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote:
>>> > 2015-09-28 12:00 GMT+03:00 Dmitry Vyukov <dvyukov@google.com>:
>>> >>         stack = (unsigned long)task_stack_page(p);
>>> >> -       if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
>>> >> +       /* The task can be already running at this point, so tread carefully. */
>>> >> +       fp = READ_ONCE(p->thread.sp);
>>> >> +       if (fp < stack || fp >= stack+THREAD_SIZE)
>>> >
>>> > Since we deference fp, it should be "|| fp + sizeof(u64) >= stack + THREAD_SIZE"
>>>
>>> Good point.
>>> I guess it should be "|| fp + sizeof(u64) > stack + THREAD_SIZE",
>>> because == is OK if we add 8.
>>>
>>
>> This whole mess with +8 and -16 and whatever is just crap. And all of
>> it completely undocumented. Proper version below.
>>
>> Thanks,
>>
>>         tglx
>>
>> 8<-------------------------------
>>
>> Subject: x86/process: Add proper bound checks in 64bit get_wchan()
>> From: Thomas Gleixner <tglx@linutronix.de>
>> Date: Mon, 28 Sep 2015 17:16:52 +0200
>>
>> Dmitry Vyukov reported the following using trinity and the memory
>> error detector AddressSanitizer
>> (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
>>
>> [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
>> address ffff88002e280000
>> [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
>> the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
>> [ 124.578633] Accessed by thread T10915:
>> [ 124.579295] inlined in describe_heap_address
>> ./arch/x86/mm/asan/report.c:164
>> [ 124.579295] #0 ffffffff810dd277 in asan_report_error
>> ./arch/x86/mm/asan/report.c:278
>> [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
>> ./arch/x86/mm/asan/asan.c:37
>> [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
>> [ 124.581893] #3 ffffffff8107c093 in get_wchan
>> ./arch/x86/kernel/process_64.c:444
>>
>> The address checks in the 64bit implementation of get_wchan() are
>> wrong in several ways:
>>
>>  - The lower bound of the stack is not the start of the stack
>>    page. It's the start of the stack page plus sizeof (struct
>>    thread_info)
>>
>>  - The upper bound must be top of stack minus 2 * sizeof(unsigned
>>    long). This is required because the stack pointer points at the
>>    frame pointer. The layout on the stack is: ... IP FP ... IP FP.
>>
>> Fix the bound checks and get rid of the mix of numeric constants, u64
>> and unsigned long. Making all unsigned long allows us to use the same
>> function for 32bit as well.
>>
>> Reported-by: Dmitry Vyukov <dvyukov@google.com>
>> Reported-by: Sasha Levin <sasha.levin@oracle.com>
>> Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Andrey Konovalov <andreyknvl@google.com>
>> Cc: x86@kernel.org
>> ---
>>  arch/x86/kernel/process_64.c |   41 ++++++++++++++++++++++++++++++++---------
>>  1 file changed, 32 insertions(+), 9 deletions(-)
>>
>> Index: tip/arch/x86/kernel/process_64.c
>> ===================================================================
>> --- tip.orig/arch/x86/kernel/process_64.c
>> +++ tip/arch/x86/kernel/process_64.c
>> @@ -501,24 +501,47 @@ EXPORT_SYMBOL_GPL(set_personality_ia32);
>>
>>  unsigned long get_wchan(struct task_struct *p)
>>  {
>> -       unsigned long stack;
>> -       u64 fp, ip;
>> +       unsigned long start, bottom, top, sp, fp, ip;
>>         int count = 0;
>>
>>         if (!p || p == current || p->state == TASK_RUNNING)
>>                 return 0;
>> -       stack = (unsigned long)task_stack_page(p);
>> -       if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
>> +
>> +       start = (unsigned long)task_stack_page(p);
>> +       if (!start)
>>                 return 0;
>> -       fp = *(u64 *)(p->thread.sp);
>> +
>> +       /*
>> +        * Layout of the stack page:
>> +        *
>> +        * ----------- top = start = THREAD_SIZE - sizeof(unsigned long)
>> +        * stack
>
> There's TOP_OF_KERNEL_STACK_PADDING in here, too.  Arguably the
> padding is still in bounds, though.  Also, I think you mean "start +",
> not "start =".
>
>> +        * ----------- bottom = start + sizeof(thread_info)
>> +        * thread_info
>> +        * ----------- start
>> +        *
>> +        * The tasks stack pointer points at the location where the
>> +        * framepointer is stored. The data on the stack is:
>> +        * ... IP FP ... IP FP
>> +        *
>> +        * We need to read FP and IP, so we need to adjust the upper
>> +        * bound by another unsigned long.
>> +        */
>> +       top = start + THREAD_SIZE - 2 * sizeof(unsigned long);
>> +       bottom = start + sizeof(struct thread_info);
>> +
>> +       sp = p->thread.sp;
>> +       if (sp < bottom || sp > top)
>> +               return 0;
>> +
>> +       fp = *(unsigned long *)sp;
>>         do {
>> -               if (fp < (unsigned long)stack ||
>> -                   fp >= (unsigned long)stack+THREAD_SIZE)
>> +               if (fp < bottom || fp > top)
>>                         return 0;
>> -               ip = *(u64 *)(fp+8);
>> +               ip = *(unsigned long *)(fp + sizeof(unsigned long));
>>                 if (!in_sched_functions(ip))
>>                         return ip;
>> -               fp = *(u64 *)fp;
>> +               fp = *(unsigned long *)fp;
>>         } while (count++ < 16);
>
> I'm be vaguely amazed if this isn't an exploitable info leak even
> without the out of bounds thing.  Can we really not find a way to do
> this without walking the stack?
>
> The bounds checking looks okay, though.

Also, I like Borislav's READ_ONCE suggestion.  Let's avoid TOCTOU due
to optimization.

Re: a CVE: if anyone wants a CVE, ask oss-security.  It's unclear to
me exactly how one might exploit this.

--Andy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-29 18:30         ` Andy Lutomirski
@ 2015-09-29 18:41           ` Borislav Petkov
  0 siblings, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2015-09-29 18:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Dmitry Vyukov, Andrey Ryabinin, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Denys Vlasenko, x86, LKML,
	Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
	Sasha Levin, Andi Kleen, kasan-dev

On Tue, Sep 29, 2015 at 11:30:33AM -0700, Andy Lutomirski wrote:
> Also, I like Borislav's READ_ONCE suggestion.  Let's avoid TOCTOU due
> to optimization.

Dmitry's original patch did READ_ONCE already.

> Re: a CVE: if anyone wants a CVE, ask oss-security. It's unclear to me
> exactly how one might exploit this.

I was just asking - it possibly leaking sensitive info and all.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] fs/proc: Don't expose absolute kernel addresses via wchan
  2015-09-29 18:15       ` Andy Lutomirski
  2015-09-29 18:30         ` Andy Lutomirski
@ 2015-09-30  7:15         ` Ingo Molnar
  2015-09-30  7:35           ` Thomas Gleixner
  2015-09-30  8:07         ` [PATCH] arch/x86: fix out-of-bounds in get_wchan() Thomas Gleixner
  2 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2015-09-30  7:15 UTC (permalink / raw)
  To: Andy Lutomirski, Kees Cook
  Cc: Thomas Gleixner, Dmitry Vyukov, Andrey Ryabinin, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, Denys Vlasenko,
	x86, LKML, Kostya Serebryany, Alexander Potapenko,
	Andrey Konovalov, Sasha Levin, Andi Kleen, kasan-dev,
	Linus Torvalds, Peter Zijlstra, Andrew Morton, Kees Cook,
	Al Viro


* Andy Lutomirski <luto@amacapital.net> wrote:

> > +        * ----------- bottom = start + sizeof(thread_info)
> > +        * thread_info
> > +        * ----------- start
> > +        *
> > +        * The tasks stack pointer points at the location where the
> > +        * framepointer is stored. The data on the stack is:
> > +        * ... IP FP ... IP FP
> > +        *
> > +        * We need to read FP and IP, so we need to adjust the upper
> > +        * bound by another unsigned long.
> > +        */
> > +       top = start + THREAD_SIZE - 2 * sizeof(unsigned long);
> > +       bottom = start + sizeof(struct thread_info);
> > +
> > +       sp = p->thread.sp;
> > +       if (sp < bottom || sp > top)
> > +               return 0;
> > +
> > +       fp = *(unsigned long *)sp;
> >         do {
> > -               if (fp < (unsigned long)stack ||
> > -                   fp >= (unsigned long)stack+THREAD_SIZE)
> > +               if (fp < bottom || fp > top)
> >                         return 0;
> > -               ip = *(u64 *)(fp+8);
> > +               ip = *(unsigned long *)(fp + sizeof(unsigned long));
> >                 if (!in_sched_functions(ip))
> >                         return ip;
> > -               fp = *(u64 *)fp;
> > +               fp = *(unsigned long *)fp;
> >         } while (count++ < 16);
> 
> I'm be vaguely amazed if this isn't an exploitable info leak even
> without the out of bounds thing.  Can we really not find a way to do
> this without walking the stack?

So wchan leaks absolute kernel addresses to unprivileged user-space, of kernel 
functions that sleep:

static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
                          struct pid *pid, struct task_struct *task)
{
        unsigned long wchan;
        char symname[KSYM_NAME_LEN];

        wchan = get_wchan(task);

        if (lookup_symbol_name(wchan, symname) < 0) {
                if (!ptrace_may_access(task, PTRACE_MODE_READ))
                        return 0;
                seq_printf(m, "%lu", wchan);
        } else {
                seq_printf(m, "%s", symname);
        }

        return 0;
}

So for example it trivially leaks the KASLR offset to any local attacker:

  fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35)
  ffffffff8123b380

Most real-life uses of wchan are symbolic:

  ps -eo pid:10,tid:10,wchan:30,comm

and procps uses /proc/PID/wchan, not the absolute address in /proc/PID/stat:

  triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1
  open("/proc/30833/wchan", O_RDONLY)     = 6

So shouldn't we try to set all numeric output to 0 and only allow symbolic output 
via /proc/PID/wchan?

These days there's very little legitimate reason user-space would be interested in 
the absolute address. The absolute address is mostly historic: from the days when 
we didn't have kallsyms and user-space procps had to do the decoding itself via 
the System.map.

( The absolute sleep address can generally still be profiled via perf, by tasks 
  with sufficient privileges. )

I.e. how about something like the patch below? (completely untested.)

Thanks,

	Ingo

======================>

 fs/proc/array.c | 2 +-
 fs/proc/base.c  | 7 +------
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index f60f0121e331..99082730b2ac 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -507,7 +507,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
-	seq_put_decimal_ull(m, ' ', wchan);
+	seq_puts(m, " 0"); /* Used to be numeric wchan - replaced by /proc/PID/wchan */
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ll(m, ' ', task->exit_signal);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b25eee4cead5..2fdbf303e3eb 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -430,13 +430,8 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 
 	wchan = get_wchan(task);
 
-	if (lookup_symbol_name(wchan, symname) < 0) {
-		if (!ptrace_may_access(task, PTRACE_MODE_READ))
-			return 0;
-		seq_printf(m, "%lu", wchan);
-	} else {
+	if (!lookup_symbol_name(wchan, symname))
 		seq_printf(m, "%s", symname);
-	}
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] fs/proc: Don't expose absolute kernel addresses via wchan
  2015-09-30  7:15         ` [PATCH] fs/proc: Don't expose absolute kernel addresses via wchan Ingo Molnar
@ 2015-09-30  7:35           ` Thomas Gleixner
  2015-09-30 13:59             ` [PATCH v2] " Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2015-09-30  7:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, Kees Cook, Dmitry Vyukov, Andrey Ryabinin,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro

On Wed, 30 Sep 2015, Ingo Molnar wrote:
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index f60f0121e331..99082730b2ac 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -507,7 +507,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
>  	seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
>  	seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
>  	seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
> -	seq_put_decimal_ull(m, ' ', wchan);
> +	seq_puts(m, " 0"); /* Used to be numeric wchan - replaced by /proc/PID/wchan */

That should get rid of all wchan usage in do_task_stat()

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] arch/x86: fix out-of-bounds in get_wchan()
  2015-09-29 18:15       ` Andy Lutomirski
  2015-09-29 18:30         ` Andy Lutomirski
  2015-09-30  7:15         ` [PATCH] fs/proc: Don't expose absolute kernel addresses via wchan Ingo Molnar
@ 2015-09-30  8:07         ` Thomas Gleixner
  2 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2015-09-30  8:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Vyukov, Andrey Ryabinin, Ingo Molnar, H. Peter Anvin,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, x86, LKML,
	Kostya Serebryany, Alexander Potapenko, Andrey Konovalov,
	Sasha Levin, Andi Kleen, kasan-dev

On Tue, 29 Sep 2015, Andy Lutomirski wrote:
> I'm be vaguely amazed if this isn't an exploitable info leak even
> without the out of bounds thing.

The info leak happens in fs/proc, where we happily print arbitrary
"IP" values, if we cant resolve a symbol.

> Can we really not find a way to do this without walking the stack?

We would have to add a 'store wait channel' mechanism to all functions
which are the primary entry points to scheduling. Not impossible, but
not pretty either.

If we want to prevent the stack changing under us, we'd need to take
p->pi_lock and do the task != RUNNING check and the walk under it. I
don't think we want to do that, unless there is a compelling reason to
do so.

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2] fs/proc: Don't expose absolute kernel addresses via wchan
  2015-09-30  7:35           ` Thomas Gleixner
@ 2015-09-30 13:59             ` Ingo Molnar
  2015-09-30 20:36               ` Thomas Gleixner
                                 ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-09-30 13:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Kees Cook, Dmitry Vyukov, Andrey Ryabinin,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro


* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Wed, 30 Sep 2015, Ingo Molnar wrote:
> > diff --git a/fs/proc/array.c b/fs/proc/array.c
> > index f60f0121e331..99082730b2ac 100644
> > --- a/fs/proc/array.c
> > +++ b/fs/proc/array.c
> > @@ -507,7 +507,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> >  	seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
> >  	seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
> >  	seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
> > -	seq_put_decimal_ull(m, ' ', wchan);
> > +	seq_puts(m, " 0"); /* Used to be numeric wchan - replaced by /proc/PID/wchan */
> 
> That should get rid of all wchan usage in do_task_stat()

Indeed - updated patch attached.

Thanks,

	Ingo

================================>
>From 985037cd05b379240dd381b29c2525758c665bb0 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@kernel.org>
Date: Wed, 30 Sep 2015 09:15:37 +0200
Subject: [PATCH] fs/proc: Don't expose absolute kernel addresses via wchan

So wchan leaks absolute kernel addresses to unprivileged
user-space, of kernel  functions that sleep:

static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
                          struct pid *pid, struct task_struct *task)
{
        unsigned long wchan;
        char symname[KSYM_NAME_LEN];

        wchan = get_wchan(task);

        if (lookup_symbol_name(wchan, symname) < 0) {
                if (!ptrace_may_access(task, PTRACE_MODE_READ))
                        return 0;
                seq_printf(m, "%lu", wchan);
        } else {
                seq_printf(m, "%s", symname);
        }

        return 0;
}

So for example it trivially leaks the KASLR offset to any local
attacker:

  fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35)
  ffffffff8123b380

Most real-life uses of wchan are symbolic:

  ps -eo pid:10,tid:10,wchan:30,comm

and procps uses /proc/PID/wchan, not the absolute address in
/proc/PID/stat:

  triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1
  open("/proc/30833/wchan", O_RDONLY)     = 6

These days there's very little legitimate reason user-space
would be interested in  the absolute address. The absolute
address is mostly historic: from the days when  we didn't have
kallsyms and user-space procps had to do the decoding itself via
the System.map.

So this patch sets all numeric output to 0 and keeps the
symbolic output in /proc/PID/wchan.

( The absolute sleep address can generally still be profiled via
  perf, by tasks with sufficient privileges. )

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150930071537.GA19048@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 fs/proc/array.c | 6 ++----
 fs/proc/base.c  | 7 +------
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index f60f0121e331..ad5ad1e376ad 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -375,7 +375,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task, int whole)
 {
-	unsigned long vsize, eip, esp, wchan = ~0UL;
+	unsigned long vsize, eip, esp;
 	int priority, nice;
 	int tty_pgrp = -1, tty_nr = 0;
 	sigset_t sigign, sigcatch;
@@ -454,8 +454,6 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 		unlock_task_sighand(task, &flags);
 	}
 
-	if (permitted && (!whole || num_threads < 2))
-		wchan = get_wchan(task);
 	if (!whole) {
 		min_flt = task->min_flt;
 		maj_flt = task->maj_flt;
@@ -507,7 +505,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
-	seq_put_decimal_ull(m, ' ', wchan);
+	seq_puts(m, " 0"); /* Used to be numeric wchan - replaced by /proc/PID/wchan */
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ll(m, ' ', task->exit_signal);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b25eee4cead5..2fdbf303e3eb 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -430,13 +430,8 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 
 	wchan = get_wchan(task);
 
-	if (lookup_symbol_name(wchan, symname) < 0) {
-		if (!ptrace_may_access(task, PTRACE_MODE_READ))
-			return 0;
-		seq_printf(m, "%lu", wchan);
-	} else {
+	if (!lookup_symbol_name(wchan, symname))
 		seq_printf(m, "%s", symname);
-	}
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] fs/proc: Don't expose absolute kernel addresses via wchan
  2015-09-30 13:59             ` [PATCH v2] " Ingo Molnar
@ 2015-09-30 20:36               ` Thomas Gleixner
  2015-09-30 21:21               ` Kees Cook
  2015-10-01 12:49               ` [tip:core/debug] fs/proc, core/debug: Don' t " tip-bot for Ingo Molnar
  2 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2015-09-30 20:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, Kees Cook, Dmitry Vyukov, Andrey Ryabinin,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro

On Wed, 30 Sep 2015, Ingo Molnar wrote:
> These days there's very little legitimate reason user-space
> would be interested in  the absolute address. The absolute
> address is mostly historic: from the days when  we didn't have
> kallsyms and user-space procps had to do the decoding itself via
> the System.map.
> 
> So this patch sets all numeric output to 0 and keeps the
> symbolic output in /proc/PID/wchan.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] fs/proc: Don't expose absolute kernel addresses via wchan
  2015-09-30 13:59             ` [PATCH v2] " Ingo Molnar
  2015-09-30 20:36               ` Thomas Gleixner
@ 2015-09-30 21:21               ` Kees Cook
  2015-09-30 21:38                 ` Thomas Gleixner
  2015-10-01  7:57                 ` [PATCH v3] fs/proc, core/debug: " Ingo Molnar
  2015-10-01 12:49               ` [tip:core/debug] fs/proc, core/debug: Don' t " tip-bot for Ingo Molnar
  2 siblings, 2 replies; 30+ messages in thread
From: Kees Cook @ 2015-09-30 21:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Andy Lutomirski, Dmitry Vyukov, Andrey Ryabinin,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro

On Wed, Sep 30, 2015 at 6:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> On Wed, 30 Sep 2015, Ingo Molnar wrote:
>> > diff --git a/fs/proc/array.c b/fs/proc/array.c
>> > index f60f0121e331..99082730b2ac 100644
>> > --- a/fs/proc/array.c
>> > +++ b/fs/proc/array.c
>> > @@ -507,7 +507,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
>> >     seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
>> >     seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
>> >     seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
>> > -   seq_put_decimal_ull(m, ' ', wchan);
>> > +   seq_puts(m, " 0"); /* Used to be numeric wchan - replaced by /proc/PID/wchan */
>>
>> That should get rid of all wchan usage in do_task_stat()
>
> Indeed - updated patch attached.
>
> Thanks,
>
>         Ingo
>
> ================================>
> From 985037cd05b379240dd381b29c2525758c665bb0 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@kernel.org>
> Date: Wed, 30 Sep 2015 09:15:37 +0200
> Subject: [PATCH] fs/proc: Don't expose absolute kernel addresses via wchan
>
> So wchan leaks absolute kernel addresses to unprivileged
> user-space, of kernel  functions that sleep:
>
> static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
>                           struct pid *pid, struct task_struct *task)
> {
>         unsigned long wchan;
>         char symname[KSYM_NAME_LEN];
>
>         wchan = get_wchan(task);
>
>         if (lookup_symbol_name(wchan, symname) < 0) {
>                 if (!ptrace_may_access(task, PTRACE_MODE_READ))
>                         return 0;
>                 seq_printf(m, "%lu", wchan);
>         } else {
>                 seq_printf(m, "%s", symname);
>         }
>
>         return 0;
> }
>
> So for example it trivially leaks the KASLR offset to any local
> attacker:
>
>   fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35)
>   ffffffff8123b380
>
> Most real-life uses of wchan are symbolic:
>
>   ps -eo pid:10,tid:10,wchan:30,comm
>
> and procps uses /proc/PID/wchan, not the absolute address in
> /proc/PID/stat:
>
>   triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1
>   open("/proc/30833/wchan", O_RDONLY)     = 6
>
> These days there's very little legitimate reason user-space
> would be interested in  the absolute address. The absolute
> address is mostly historic: from the days when  we didn't have
> kallsyms and user-space procps had to do the decoding itself via
> the System.map.
>
> So this patch sets all numeric output to 0 and keeps the
> symbolic output in /proc/PID/wchan.
>
> ( The absolute sleep address can generally still be profiled via
>   perf, by tasks with sufficient privileges. )
>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Alexander Potapenko <glider@google.com>
> Cc: Andrey Konovalov <andreyknvl@google.com>
> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Denys Vlasenko <dvlasenk@redhat.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Sasha Levin <sasha.levin@oracle.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: kasan-dev <kasan-dev@googlegroups.com>
> Cc: linux-kernel@vger.kernel.org
> Link: http://lkml.kernel.org/r/20150930071537.GA19048@gmail.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  fs/proc/array.c | 6 ++----
>  fs/proc/base.c  | 7 +------
>  2 files changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index f60f0121e331..ad5ad1e376ad 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -375,7 +375,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
>  static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
>                         struct pid *pid, struct task_struct *task, int whole)
>  {
> -       unsigned long vsize, eip, esp, wchan = ~0UL;
> +       unsigned long vsize, eip, esp;
>         int priority, nice;
>         int tty_pgrp = -1, tty_nr = 0;
>         sigset_t sigign, sigcatch;
> @@ -454,8 +454,6 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
>                 unlock_task_sighand(task, &flags);
>         }
>
> -       if (permitted && (!whole || num_threads < 2))
> -               wchan = get_wchan(task);
>         if (!whole) {
>                 min_flt = task->min_flt;
>                 maj_flt = task->maj_flt;
> @@ -507,7 +505,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
>         seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
>         seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
>         seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
> -       seq_put_decimal_ull(m, ' ', wchan);
> +       seq_puts(m, " 0"); /* Used to be numeric wchan - replaced by /proc/PID/wchan */

Probably should also update Documentation/filesystems/proc.txt with
something like:

--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -310,7 +310,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
   blocked       bitmap of blocked signals
   sigign        bitmap of ignored signals
   sigcatch      bitmap of caught signals
-  wchan         address where process went to sleep
+  0             (place holder, was wchan, see /proc/PID/wchan instead)
   0             (place holder)
   0             (place holder)
   exit_signal   signal to send to parent thread on exit

>         seq_put_decimal_ull(m, ' ', 0);
>         seq_put_decimal_ull(m, ' ', 0);
>         seq_put_decimal_ll(m, ' ', task->exit_signal);
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index b25eee4cead5..2fdbf303e3eb 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -430,13 +430,8 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
>
>         wchan = get_wchan(task);
>
> -       if (lookup_symbol_name(wchan, symname) < 0) {
> -               if (!ptrace_may_access(task, PTRACE_MODE_READ))
> -                       return 0;
> -               seq_printf(m, "%lu", wchan);
> -       } else {
> +       if (!lookup_symbol_name(wchan, symname))
>                 seq_printf(m, "%s", symname);
> -       }
>
>         return 0;
>  }

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] fs/proc: Don't expose absolute kernel addresses via wchan
  2015-09-30 21:21               ` Kees Cook
@ 2015-09-30 21:38                 ` Thomas Gleixner
  2015-10-01  7:57                 ` [PATCH v3] fs/proc, core/debug: " Ingo Molnar
  1 sibling, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2015-09-30 21:38 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Andy Lutomirski, Dmitry Vyukov, Andrey Ryabinin,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro

On Wed, 30 Sep 2015, Kees Cook wrote:
> Probably should also update Documentation/filesystems/proc.txt with
> something like:
> 
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -310,7 +310,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
>    blocked       bitmap of blocked signals
>    sigign        bitmap of ignored signals
>    sigcatch      bitmap of caught signals
> -  wchan         address where process went to sleep
> +  0             (place holder, was wchan, see /proc/PID/wchan instead)
>    0             (place holder)
>    0             (place holder)
>    exit_signal   signal to send to parent thread on exit

Good catch!

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan
  2015-09-30 21:21               ` Kees Cook
  2015-09-30 21:38                 ` Thomas Gleixner
@ 2015-10-01  7:57                 ` Ingo Molnar
  2015-10-01  8:57                   ` Andrey Ryabinin
  2015-10-01  9:37                   ` [PATCH v4] " Ingo Molnar
  1 sibling, 2 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-10-01  7:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Andy Lutomirski, Dmitry Vyukov, Andrey Ryabinin,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro


* Kees Cook <keescook@chromium.org> wrote:

> > @@ -507,7 +505,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> >         seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
> >         seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
> >         seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
> > -       seq_put_decimal_ull(m, ' ', wchan);
> > +       seq_puts(m, " 0"); /* Used to be numeric wchan - replaced by /proc/PID/wchan */
> 
> Probably should also update Documentation/filesystems/proc.txt with
> something like:
> 
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -310,7 +310,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
>    blocked       bitmap of blocked signals
>    sigign        bitmap of ignored signals
>    sigcatch      bitmap of caught signals
> -  wchan         address where process went to sleep
> +  0             (place holder, was wchan, see /proc/PID/wchan instead)
>    0             (place holder)
>    0             (place holder)
>    exit_signal   signal to send to parent thread on exit

Indeed - I ended up clarifying both wchan explanations, see the changes below.

I also made the 'no symbols' output "0" (instead of an empty string), to better 
match the /proc/PID/stat behavior and previous output.

I'll push it out after a bit more testing and if nothing goes wrong I'll send this 
patch to Linus in the v4.4 merge window.

Thanks,

	Ingo

============>
>From bc43bb95763e5b215e389f75860eca0952ca4704 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@kernel.org>
Date: Wed, 30 Sep 2015 15:59:17 +0200
Subject: [PATCH] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan

So the /proc/PID/stat 'wchan' field (the 30th field) leaks absolute kernel
addresses to unprivileged user-space, of kernel functions that sleep:

        seq_put_decimal_ull(m, ' ', wchan);

The absolute address might also leak via /proc/PID/wchan, if KALLSYMS is
turned off or if the symbol lookup fails for some reason:

static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
                          struct pid *pid, struct task_struct *task)
{
        unsigned long wchan;
        char symname[KSYM_NAME_LEN];

        wchan = get_wchan(task);

        if (lookup_symbol_name(wchan, symname) < 0) {
                if (!ptrace_may_access(task, PTRACE_MODE_READ))
                        return 0;
                seq_printf(m, "%lu", wchan);
        } else {
                seq_printf(m, "%s", symname);
        }

        return 0;
}

This isn't ideal, because for example it trivially leaks the KASLR offset
to any local attacker:

  fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35)
  ffffffff8123b380

Most real-life uses of wchan are symbolic:

  ps -eo pid:10,tid:10,wchan:30,comm

and procps uses /proc/PID/wchan, not the absolute address in
/proc/PID/stat:

  triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1
  open("/proc/30833/wchan", O_RDONLY)     = 6

These days there's very little legitimate reason user-space
would be interested in  the absolute address. The absolute
address is mostly historic: from the days when  we didn't have
kallsyms and user-space procps had to do the decoding itself via
the System.map.

So this patch sets all numeric output to "0" and keeps only symbolic
output, in /proc/PID/wchan.

( The absolute sleep address can generally still be profiled via
  perf, by tasks with sufficient privileges. )

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150930135917.GA3285@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/filesystems/proc.txt | 5 +++--
 fs/proc/array.c                    | 6 ++----
 fs/proc/base.c                     | 9 +++------
 3 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d411ca63c8b6..db64f7d6492d 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -140,7 +140,8 @@ Table 1-1: Process specific entries in /proc
  stat		Process status
  statm		Process memory status information
  status		Process status in human readable form
- wchan		If CONFIG_KALLSYMS is set, a pre-decoded wchan
+ wchan		If CONFIG_KALLSYMS=y, wchan (the kernel function the process is
+		blocked in) symbol string. "0" if not blocked or !KALLSYMS.
  pagemap	Page table
  stack		Report full stack trace, enable via CONFIG_STACKTRACE
  smaps		a extension based on maps, showing the memory consumption of
@@ -310,7 +311,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
   blocked       bitmap of blocked signals
   sigign        bitmap of ignored signals
   sigcatch      bitmap of caught signals
-  wchan         address where process went to sleep
+  0		(place holder, used to be the wchan address, use /proc/PID/wchan instead)
   0             (place holder)
   0             (place holder)
   exit_signal   signal to send to parent thread on exit
diff --git a/fs/proc/array.c b/fs/proc/array.c
index f60f0121e331..ad5ad1e376ad 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -375,7 +375,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task, int whole)
 {
-	unsigned long vsize, eip, esp, wchan = ~0UL;
+	unsigned long vsize, eip, esp;
 	int priority, nice;
 	int tty_pgrp = -1, tty_nr = 0;
 	sigset_t sigign, sigcatch;
@@ -454,8 +454,6 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 		unlock_task_sighand(task, &flags);
 	}
 
-	if (permitted && (!whole || num_threads < 2))
-		wchan = get_wchan(task);
 	if (!whole) {
 		min_flt = task->min_flt;
 		maj_flt = task->maj_flt;
@@ -507,7 +505,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
-	seq_put_decimal_ull(m, ' ', wchan);
+	seq_puts(m, " 0"); /* Used to be numeric wchan - replaced by /proc/PID/wchan */
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ll(m, ' ', task->exit_signal);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b25eee4cead5..6f05aabce3aa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -430,13 +430,10 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 
 	wchan = get_wchan(task);
 
-	if (lookup_symbol_name(wchan, symname) < 0) {
-		if (!ptrace_may_access(task, PTRACE_MODE_READ))
-			return 0;
-		seq_printf(m, "%lu", wchan);
-	} else {
+	if (!lookup_symbol_name(wchan, symname))
 		seq_printf(m, "%s", symname);
-	}
+	else
+		seq_putc(m, '0');
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan
  2015-10-01  7:57                 ` [PATCH v3] fs/proc, core/debug: " Ingo Molnar
@ 2015-10-01  8:57                   ` Andrey Ryabinin
  2015-10-01  9:29                     ` Ingo Molnar
  2015-10-01  9:37                   ` [PATCH v4] " Ingo Molnar
  1 sibling, 1 reply; 30+ messages in thread
From: Andrey Ryabinin @ 2015-10-01  8:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Thomas Gleixner, Andy Lutomirski, Dmitry Vyukov,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro

2015-10-01 10:57 GMT+03:00 Ingo Molnar <mingo@kernel.org>:
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index d411ca63c8b6..db64f7d6492d 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -140,7 +140,8 @@ Table 1-1: Process specific entries in /proc
>   stat          Process status
>   statm         Process memory status information
>   status                Process status in human readable form
> - wchan         If CONFIG_KALLSYMS is set, a pre-decoded wchan
> + wchan         If CONFIG_KALLSYMS=y, wchan (the kernel function the process is
> +               blocked in) symbol string. "0" if not blocked or !KALLSYMS.

/proc/PID/wchan is under #ifdef CONFIG_KALLSYMS.


> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index b25eee4cead5..6f05aabce3aa 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -430,13 +430,10 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
>
>         wchan = get_wchan(task);
>
> -       if (lookup_symbol_name(wchan, symname) < 0) {
> -               if (!ptrace_may_access(task, PTRACE_MODE_READ))
> -                       return 0;
> -               seq_printf(m, "%lu", wchan);
> -       } else {
> +       if (!lookup_symbol_name(wchan, symname))
>                 seq_printf(m, "%s", symname);
> -       }
> +       else
> +               seq_putc(m, '0');

Maybe we should respect 'kptr_restrict' sysctl when we use '%ps', '%pB' etc.
printk formats (AFAIK %ps just prints address if KALLSYMS=n, or lookup failed).
In that case you could just do 'seq_printf(m, "%ps", wchan)'.


OTOH, %ps, %pS are used mostly in debugging, so investigating some crash
in production kernel with no !KALLSYMS and with kptr_restrict != 0
will be a nightmare.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan
  2015-10-01  8:57                   ` Andrey Ryabinin
@ 2015-10-01  9:29                     ` Ingo Molnar
  2015-10-01 10:16                       ` Andrey Ryabinin
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2015-10-01  9:29 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Kees Cook, Thomas Gleixner, Andy Lutomirski, Dmitry Vyukov,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro


* Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote:

> 2015-10-01 10:57 GMT+03:00 Ingo Molnar <mingo@kernel.org>:
> > diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> > index d411ca63c8b6..db64f7d6492d 100644
> > --- a/Documentation/filesystems/proc.txt
> > +++ b/Documentation/filesystems/proc.txt
> > @@ -140,7 +140,8 @@ Table 1-1: Process specific entries in /proc
> >   stat          Process status
> >   statm         Process memory status information
> >   status                Process status in human readable form
> > - wchan         If CONFIG_KALLSYMS is set, a pre-decoded wchan
> > + wchan         If CONFIG_KALLSYMS=y, wchan (the kernel function the process is
> > +               blocked in) symbol string. "0" if not blocked or !KALLSYMS.
> 
> /proc/PID/wchan is under #ifdef CONFIG_KALLSYMS.

Yeah, indeed, so I clarified that text to now read:

+ wchan         Present with CONFIG_KALLSYMS=y: it shows the kernel function
+               symbol the task is blocked in - or "0" if not blocked.

> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index b25eee4cead5..6f05aabce3aa 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -430,13 +430,10 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
> >
> >         wchan = get_wchan(task);
> >
> > -       if (lookup_symbol_name(wchan, symname) < 0) {
> > -               if (!ptrace_may_access(task, PTRACE_MODE_READ))
> > -                       return 0;
> > -               seq_printf(m, "%lu", wchan);
> > -       } else {
> > +       if (!lookup_symbol_name(wchan, symname))
> >                 seq_printf(m, "%s", symname);
> > -       }
> > +       else
> > +               seq_putc(m, '0');
> 
> Maybe we should respect 'kptr_restrict' sysctl when we use '%ps', '%pB' etc. 
> printk formats (AFAIK %ps just prints address if KALLSYMS=n, or lookup failed). 
> In that case you could just do 'seq_printf(m, "%ps", wchan)'.
> 
> OTOH, %ps, %pS are used mostly in debugging, so investigating some crash in 
> production kernel with no !KALLSYMS and with kptr_restrict != 0 will be a 
> nightmare.

So this code does not use %pX, it prints the symbol. Yes, the symbol in itself is 
'information' about the execution of the task in itself - but /proc per se is all 
about providing information about tasks in the system (including to unprivileged 
users), so there's IMHO little point in restricting this output any further ...

I think ktrp_restrict is mostly about not exposing absolute addresses.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v4] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan
  2015-10-01  7:57                 ` [PATCH v3] fs/proc, core/debug: " Ingo Molnar
  2015-10-01  8:57                   ` Andrey Ryabinin
@ 2015-10-01  9:37                   ` Ingo Molnar
  1 sibling, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-10-01  9:37 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Andy Lutomirski, Dmitry Vyukov, Andrey Ryabinin,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Kees Cook <keescook@chromium.org> wrote:
> 
> > > @@ -507,7 +505,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> > >         seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
> > >         seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
> > >         seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
> > > -       seq_put_decimal_ull(m, ' ', wchan);
> > > +       seq_puts(m, " 0"); /* Used to be numeric wchan - replaced by /proc/PID/wchan */
> > 
> > Probably should also update Documentation/filesystems/proc.txt with
> > something like:
> > 
> > --- a/Documentation/filesystems/proc.txt
> > +++ b/Documentation/filesystems/proc.txt
> > @@ -310,7 +310,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
> >    blocked       bitmap of blocked signals
> >    sigign        bitmap of ignored signals
> >    sigcatch      bitmap of caught signals
> > -  wchan         address where process went to sleep
> > +  0             (place holder, was wchan, see /proc/PID/wchan instead)
> >    0             (place holder)
> >    0             (place holder)
> >    exit_signal   signal to send to parent thread on exit
> 
> Indeed - I ended up clarifying both wchan explanations, see the changes below.
> 
> I also made the 'no symbols' output "0" (instead of an empty string), to better 
> match the /proc/PID/stat behavior and previous output.
> 
> I'll push it out after a bit more testing and if nothing goes wrong I'll send this 
> patch to Linus in the v4.4 merge window.

Yeah, so testing uncovered the following additional ABI detail: procps relies on 
the wchan field in /proc/PID/stat, but only as a flag (in most cases), whether to 
look at /proc/PID/wchan.

To keep the ABI, the v4 patch below outputs not the absolute address, but a 0/1 
flag to indicate whether the task is blocked and whether there's anything worth 
looking at in /proc/PID/wchan.

I tested this approach with procps and it seems to fully work. In fact due to the 
ptrace check we properly restrict the information to our own tasks only. root 
still sees the wchan field of all tasks.

Btw., the very latest procps-ng grew this nice change:

  6b8dc5511fb9 ("library: refactor and rely on modern kernels for wchan")

which greatly simplified procps's handling of /proc/PID/wchan.

... but my testing was done with an older procps version.

Thanks,

	Ingo

==========================>
>From b26a16469b0b6f3f0604aafb95c50d1532b3fff2 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@kernel.org>
Date: Wed, 30 Sep 2015 15:59:17 +0200
Subject: [PATCH] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan

So the /proc/PID/stat 'wchan' field (the 30th field, which contains
the absolute kernel address of the kernel function a task is blocked in)
leaks absolute kernel addresses to unprivileged user-space:

        seq_put_decimal_ull(m, ' ', wchan);

The absolute address might also leak via /proc/PID/wchan as well, if
KALLSYMS is turned off or if the symbol lookup fails for some reason:

static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
                          struct pid *pid, struct task_struct *task)
{
        unsigned long wchan;
        char symname[KSYM_NAME_LEN];

        wchan = get_wchan(task);

        if (lookup_symbol_name(wchan, symname) < 0) {
                if (!ptrace_may_access(task, PTRACE_MODE_READ))
                        return 0;
                seq_printf(m, "%lu", wchan);
        } else {
                seq_printf(m, "%s", symname);
        }

        return 0;
}

This isn't ideal, because for example it trivially leaks the KASLR offset
to any local attacker:

  fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35)
  ffffffff8123b380

Most real-life uses of wchan are symbolic:

  ps -eo pid:10,tid:10,wchan:30,comm

and procps uses /proc/PID/wchan, not the absolute address in /proc/PID/stat:

  triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1
  open("/proc/30833/wchan", O_RDONLY)     = 6

There's one compatibility quirk here: procps relies on whether the
absolute value is non-zero - and we can provide that functionality
by outputing "0" or "1" depending on whether the task is blocked
(whether there's a wchan address).

These days there appears to be very little legitimate reason
user-space would be interested in  the absolute address. The
absolute address is mostly historic: from the days when we
didn't have kallsyms and user-space procps had to do the
decoding itself via the System.map.

So this patch sets all numeric output to "0" or "1" and keeps only
symbolic output, in /proc/PID/wchan.

( The absolute sleep address can generally still be profiled via
  perf, by tasks with sufficient privileges. )

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150930135917.GA3285@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/filesystems/proc.txt |  5 +++--
 fs/proc/array.c                    | 16 ++++++++++++++--
 fs/proc/base.c                     |  9 +++------
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d411ca63c8b6..3a9d65c912e7 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -140,7 +140,8 @@ Table 1-1: Process specific entries in /proc
  stat		Process status
  statm		Process memory status information
  status		Process status in human readable form
- wchan		If CONFIG_KALLSYMS is set, a pre-decoded wchan
+ wchan		Present with CONFIG_KALLSYMS=y: it shows the kernel function
+		symbol the task is blocked in - or "0" if not blocked.
  pagemap	Page table
  stack		Report full stack trace, enable via CONFIG_STACKTRACE
  smaps		a extension based on maps, showing the memory consumption of
@@ -310,7 +311,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
   blocked       bitmap of blocked signals
   sigign        bitmap of ignored signals
   sigcatch      bitmap of caught signals
-  wchan         address where process went to sleep
+  0		(place holder, used to be the wchan address, use /proc/PID/wchan instead)
   0             (place holder)
   0             (place holder)
   exit_signal   signal to send to parent thread on exit
diff --git a/fs/proc/array.c b/fs/proc/array.c
index f60f0121e331..eed2050db9be 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -375,7 +375,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task, int whole)
 {
-	unsigned long vsize, eip, esp, wchan = ~0UL;
+	unsigned long vsize, eip, esp, wchan = 0;
 	int priority, nice;
 	int tty_pgrp = -1, tty_nr = 0;
 	sigset_t sigign, sigcatch;
@@ -507,7 +507,19 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
-	seq_put_decimal_ull(m, ' ', wchan);
+
+	/*
+	 * We used to output the absolute kernel address, but that's an
+	 * information leak - so instead we show a 0/1 flag here, to signal
+	 * to user-space whether there's a wchan field in /proc/PID/wchan.
+	 *
+	 * This works with older implementations of procps as well.
+	 */
+	if (wchan)
+		seq_puts(m, " 1");
+	else
+		seq_puts(m, " 0");
+
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ll(m, ' ', task->exit_signal);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b25eee4cead5..6f05aabce3aa 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -430,13 +430,10 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 
 	wchan = get_wchan(task);
 
-	if (lookup_symbol_name(wchan, symname) < 0) {
-		if (!ptrace_may_access(task, PTRACE_MODE_READ))
-			return 0;
-		seq_printf(m, "%lu", wchan);
-	} else {
+	if (!lookup_symbol_name(wchan, symname))
 		seq_printf(m, "%s", symname);
-	}
+	else
+		seq_putc(m, '0');
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan
  2015-10-01  9:29                     ` Ingo Molnar
@ 2015-10-01 10:16                       ` Andrey Ryabinin
  2015-10-01 10:39                         ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Andrey Ryabinin @ 2015-10-01 10:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Thomas Gleixner, Andy Lutomirski, Dmitry Vyukov,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro

On 10/01/2015 12:29 PM, Ingo Molnar wrote:
> 
> * Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote:
> 
>> 2015-10-01 10:57 GMT+03:00 Ingo Molnar <mingo@kernel.org>:
>>> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>>> index d411ca63c8b6..db64f7d6492d 100644
>>> --- a/Documentation/filesystems/proc.txt
>>> +++ b/Documentation/filesystems/proc.txt
>>> @@ -140,7 +140,8 @@ Table 1-1: Process specific entries in /proc
>>>   stat          Process status
>>>   statm         Process memory status information
>>>   status                Process status in human readable form
>>> - wchan         If CONFIG_KALLSYMS is set, a pre-decoded wchan
>>> + wchan         If CONFIG_KALLSYMS=y, wchan (the kernel function the process is
>>> +               blocked in) symbol string. "0" if not blocked or !KALLSYMS.
>>
>> /proc/PID/wchan is under #ifdef CONFIG_KALLSYMS.
> 
> Yeah, indeed, so I clarified that text to now read:
> 
> + wchan         Present with CONFIG_KALLSYMS=y: it shows the kernel function
> +               symbol the task is blocked in - or "0" if not blocked.
> 
>>> diff --git a/fs/proc/base.c b/fs/proc/base.c
>>> index b25eee4cead5..6f05aabce3aa 100644
>>> --- a/fs/proc/base.c
>>> +++ b/fs/proc/base.c
>>> @@ -430,13 +430,10 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
>>>
>>>         wchan = get_wchan(task);
>>>
>>> -       if (lookup_symbol_name(wchan, symname) < 0) {
>>> -               if (!ptrace_may_access(task, PTRACE_MODE_READ))
>>> -                       return 0;
>>> -               seq_printf(m, "%lu", wchan);
>>> -       } else {
>>> +       if (!lookup_symbol_name(wchan, symname))
>>>                 seq_printf(m, "%s", symname);
>>> -       }
>>> +       else
>>> +               seq_putc(m, '0');
>>
>> Maybe we should respect 'kptr_restrict' sysctl when we use '%ps', '%pB' etc. 
>> printk formats (AFAIK %ps just prints address if KALLSYMS=n, or lookup failed). 
>> In that case you could just do 'seq_printf(m, "%ps", wchan)'.
>>
>> OTOH, %ps, %pS are used mostly in debugging, so investigating some crash in 
>> production kernel with no !KALLSYMS and with kptr_restrict != 0 will be a 
>> nightmare.
> 
> So this code does not use %pX, it prints the symbol. 

I think you misunderstood me.
Yes, this code currently doesn't use %pX, but it could:

diff --git a/fs/proc/base.c b/fs/proc/base.c
index b25eee4..f58f66e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -425,18 +425,7 @@ static int proc_pid_auxv(struct seq_file *m, struct pid_namespace *ns,
 static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
                          struct pid *pid, struct task_struct *task)
 {
-       unsigned long wchan;
-       char symname[KSYM_NAME_LEN];
-
-       wchan = get_wchan(task);
-
-       if (lookup_symbol_name(wchan, symname) < 0) {
-               if (!ptrace_may_access(task, PTRACE_MODE_READ))
-                       return 0;
-               seq_printf(m, "%lu", wchan);
-       } else {
-               seq_printf(m, "%s", symname);
-       }
+       seq_printf(m, "%ps", get_wchan(task));
 
        return 0;
 }


There is a problem here, though. %ps will print absolute kernel address instead of symbol name
if KALLSYMS=n or if resolution of address failed.
So I was wondering, may be should just fix %ps ?
i.e. print 0 instead of absolute address if KALLSYMS=n or lookup failure?


> Yes, the symbol in itself is 
> 'information' about the execution of the task in itself - but /proc per se is all 
> about providing information about tasks in the system (including to unprivileged 
> users), so there's IMHO little point in restricting this output any further ...
> 
> I think ktrp_restrict is mostly about not exposing absolute addresses.
> 

Right, and '%ps' may expose absolute address if KALLSYMS=n or address lookup failed for some reason.

> Thanks,
> 
> 	Ingo
> 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan
  2015-10-01 10:16                       ` Andrey Ryabinin
@ 2015-10-01 10:39                         ` Ingo Molnar
  2015-10-01 10:47                           ` Andrey Ryabinin
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2015-10-01 10:39 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Kees Cook, Thomas Gleixner, Andy Lutomirski, Dmitry Vyukov,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro


* Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote:

> I think you misunderstood me.
> Yes, this code currently doesn't use %pX, but it could:
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index b25eee4..f58f66e 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -425,18 +425,7 @@ static int proc_pid_auxv(struct seq_file *m, struct pid_namespace *ns,
>  static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
>                           struct pid *pid, struct task_struct *task)
>  {
> -       unsigned long wchan;
> -       char symname[KSYM_NAME_LEN];
> -
> -       wchan = get_wchan(task);
> -
> -       if (lookup_symbol_name(wchan, symname) < 0) {
> -               if (!ptrace_may_access(task, PTRACE_MODE_READ))
> -                       return 0;
> -               seq_printf(m, "%lu", wchan);
> -       } else {
> -               seq_printf(m, "%s", symname);
> -       }
> +       seq_printf(m, "%ps", get_wchan(task));
>  
>         return 0;
>  }
> 
> 
> There is a problem here, though. %ps will print absolute kernel address instead of symbol name
> if KALLSYMS=n or if resolution of address failed.
> So I was wondering, may be should just fix %ps ?
> i.e. print 0 instead of absolute address if KALLSYMS=n or lookup failure?

There's another problem as well: your change loses the PTRACE_MODE_READ permission 
check.

But ... I think I like it open coded, which is good precisely because it will stay 
invariant even if we change details in the %ps/etc. debug output.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan
  2015-10-01 10:39                         ` Ingo Molnar
@ 2015-10-01 10:47                           ` Andrey Ryabinin
  2015-10-01 10:57                             ` [PATCH v5] " Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Andrey Ryabinin @ 2015-10-01 10:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kees Cook, Thomas Gleixner, Andy Lutomirski, Dmitry Vyukov,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro

2015-10-01 13:39 GMT+03:00 Ingo Molnar <mingo@kernel.org>:
>
> There's another problem as well: your change loses the PTRACE_MODE_READ permission
> check.


I guess, you should fix your patch than, because it removed this check
likewise...


>
> But ... I think I like it open coded, which is good precisely because it will stay
> invariant even if we change details in the %ps/etc. debug output.
>

Fair enough.

> Thanks,
>
>         Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v5] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan
  2015-10-01 10:47                           ` Andrey Ryabinin
@ 2015-10-01 10:57                             ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2015-10-01 10:57 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Kees Cook, Thomas Gleixner, Andy Lutomirski, Dmitry Vyukov,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Borislav Petkov,
	Denys Vlasenko, x86, LKML, Kostya Serebryany,
	Alexander Potapenko, Andrey Konovalov, Sasha Levin, Andi Kleen,
	kasan-dev, Linus Torvalds, Peter Zijlstra, Andrew Morton,
	Al Viro


* Andrey Ryabinin <ryabinin.a.a@gmail.com> wrote:

> 2015-10-01 13:39 GMT+03:00 Ingo Molnar <mingo@kernel.org>:
> >
> > There's another problem as well: your change loses the PTRACE_MODE_READ permission
> > check.
> 
> I guess, you should fix your patch than, because it removed this check
> likewise...

Doh, indeed. I didn't notice that bug in testing, because the wchan field of 
/proc/PID/stat was 0/1 according to the ptrace check so 'ps' masked the symbol 
accordingly - but the /proc/PID/wchan value was always accessible.

Updated patch below.

Thanks,

	Ingo

=======================>
>From b2f73922d119686323f14fbbe46587f863852328 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@kernel.org>
Date: Wed, 30 Sep 2015 15:59:17 +0200
Subject: [PATCH] fs/proc, core/debug: Don't expose absolute kernel addresses via wchan

So the /proc/PID/stat 'wchan' field (the 30th field, which contains
the absolute kernel address of the kernel function a task is blocked in)
leaks absolute kernel addresses to unprivileged user-space:

        seq_put_decimal_ull(m, ' ', wchan);

The absolute address might also leak via /proc/PID/wchan as well, if
KALLSYMS is turned off or if the symbol lookup fails for some reason:

static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
                          struct pid *pid, struct task_struct *task)
{
        unsigned long wchan;
        char symname[KSYM_NAME_LEN];

        wchan = get_wchan(task);

        if (lookup_symbol_name(wchan, symname) < 0) {
                if (!ptrace_may_access(task, PTRACE_MODE_READ))
                        return 0;
                seq_printf(m, "%lu", wchan);
        } else {
                seq_printf(m, "%s", symname);
        }

        return 0;
}

This isn't ideal, because for example it trivially leaks the KASLR offset
to any local attacker:

  fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35)
  ffffffff8123b380

Most real-life uses of wchan are symbolic:

  ps -eo pid:10,tid:10,wchan:30,comm

and procps uses /proc/PID/wchan, not the absolute address in /proc/PID/stat:

  triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1
  open("/proc/30833/wchan", O_RDONLY)     = 6

There's one compatibility quirk here: procps relies on whether the
absolute value is non-zero - and we can provide that functionality
by outputing "0" or "1" depending on whether the task is blocked
(whether there's a wchan address).

These days there appears to be very little legitimate reason
user-space would be interested in  the absolute address. The
absolute address is mostly historic: from the days when we
didn't have kallsyms and user-space procps had to do the
decoding itself via the System.map.

So this patch sets all numeric output to "0" or "1" and keeps only
symbolic output, in /proc/PID/wchan.

( The absolute sleep address can generally still be profiled via
  perf, by tasks with sufficient privileges. )

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150930135917.GA3285@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/filesystems/proc.txt |  5 +++--
 fs/proc/array.c                    | 16 ++++++++++++++--
 fs/proc/base.c                     |  9 +++------
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d411ca63c8b6..3a9d65c912e7 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -140,7 +140,8 @@ Table 1-1: Process specific entries in /proc
  stat		Process status
  statm		Process memory status information
  status		Process status in human readable form
- wchan		If CONFIG_KALLSYMS is set, a pre-decoded wchan
+ wchan		Present with CONFIG_KALLSYMS=y: it shows the kernel function
+		symbol the task is blocked in - or "0" if not blocked.
  pagemap	Page table
  stack		Report full stack trace, enable via CONFIG_STACKTRACE
  smaps		a extension based on maps, showing the memory consumption of
@@ -310,7 +311,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
   blocked       bitmap of blocked signals
   sigign        bitmap of ignored signals
   sigcatch      bitmap of caught signals
-  wchan         address where process went to sleep
+  0		(place holder, used to be the wchan address, use /proc/PID/wchan instead)
   0             (place holder)
   0             (place holder)
   exit_signal   signal to send to parent thread on exit
diff --git a/fs/proc/array.c b/fs/proc/array.c
index f60f0121e331..eed2050db9be 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -375,7 +375,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task, int whole)
 {
-	unsigned long vsize, eip, esp, wchan = ~0UL;
+	unsigned long vsize, eip, esp, wchan = 0;
 	int priority, nice;
 	int tty_pgrp = -1, tty_nr = 0;
 	sigset_t sigign, sigcatch;
@@ -507,7 +507,19 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
-	seq_put_decimal_ull(m, ' ', wchan);
+
+	/*
+	 * We used to output the absolute kernel address, but that's an
+	 * information leak - so instead we show a 0/1 flag here, to signal
+	 * to user-space whether there's a wchan field in /proc/PID/wchan.
+	 *
+	 * This works with older implementations of procps as well.
+	 */
+	if (wchan)
+		seq_puts(m, " 1");
+	else
+		seq_puts(m, " 0");
+
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ll(m, ' ', task->exit_signal);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b25eee4cead5..29595af32866 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -430,13 +430,10 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 
 	wchan = get_wchan(task);
 
-	if (lookup_symbol_name(wchan, symname) < 0) {
-		if (!ptrace_may_access(task, PTRACE_MODE_READ))
-			return 0;
-		seq_printf(m, "%lu", wchan);
-	} else {
+	if (wchan && ptrace_may_access(task, PTRACE_MODE_READ) && !lookup_symbol_name(wchan, symname))
 		seq_printf(m, "%s", symname);
-	}
+	else
+		seq_putc(m, '0');
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:core/debug] fs/proc, core/debug: Don' t expose absolute kernel addresses via wchan
  2015-09-30 13:59             ` [PATCH v2] " Ingo Molnar
  2015-09-30 20:36               ` Thomas Gleixner
  2015-09-30 21:21               ` Kees Cook
@ 2015-10-01 12:49               ` tip-bot for Ingo Molnar
  2 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Ingo Molnar @ 2015-10-01 12:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: a.p.zijlstra, bp, hpa, kasan-dev, ryabinin.a.a, dvlasenk, efault,
	kcc, keescook, viro, linux-kernel, glider, tglx, luto, torvalds,
	sasha.levin, mingo, luto, peterz, stable, andreyknvl, dvyukov

Commit-ID:  b2f73922d119686323f14fbbe46587f863852328
Gitweb:     http://git.kernel.org/tip/b2f73922d119686323f14fbbe46587f863852328
Author:     Ingo Molnar <mingo@kernel.org>
AuthorDate: Wed, 30 Sep 2015 15:59:17 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 1 Oct 2015 12:55:34 +0200

fs/proc, core/debug: Don't expose absolute kernel addresses via wchan

So the /proc/PID/stat 'wchan' field (the 30th field, which contains
the absolute kernel address of the kernel function a task is blocked in)
leaks absolute kernel addresses to unprivileged user-space:

        seq_put_decimal_ull(m, ' ', wchan);

The absolute address might also leak via /proc/PID/wchan as well, if
KALLSYMS is turned off or if the symbol lookup fails for some reason:

static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
                          struct pid *pid, struct task_struct *task)
{
        unsigned long wchan;
        char symname[KSYM_NAME_LEN];

        wchan = get_wchan(task);

        if (lookup_symbol_name(wchan, symname) < 0) {
                if (!ptrace_may_access(task, PTRACE_MODE_READ))
                        return 0;
                seq_printf(m, "%lu", wchan);
        } else {
                seq_printf(m, "%s", symname);
        }

        return 0;
}

This isn't ideal, because for example it trivially leaks the KASLR offset
to any local attacker:

  fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35)
  ffffffff8123b380

Most real-life uses of wchan are symbolic:

  ps -eo pid:10,tid:10,wchan:30,comm

and procps uses /proc/PID/wchan, not the absolute address in /proc/PID/stat:

  triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1
  open("/proc/30833/wchan", O_RDONLY)     = 6

There's one compatibility quirk here: procps relies on whether the
absolute value is non-zero - and we can provide that functionality
by outputing "0" or "1" depending on whether the task is blocked
(whether there's a wchan address).

These days there appears to be very little legitimate reason
user-space would be interested in  the absolute address. The
absolute address is mostly historic: from the days when we
didn't have kallsyms and user-space procps had to do the
decoding itself via the System.map.

So this patch sets all numeric output to "0" or "1" and keeps only
symbolic output, in /proc/PID/wchan.

( The absolute sleep address can generally still be profiled via
  perf, by tasks with sufficient privileges. )

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150930135917.GA3285@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/filesystems/proc.txt |  5 +++--
 fs/proc/array.c                    | 16 ++++++++++++++--
 fs/proc/base.c                     |  9 +++------
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d411ca6..3a9d65c 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -140,7 +140,8 @@ Table 1-1: Process specific entries in /proc
  stat		Process status
  statm		Process memory status information
  status		Process status in human readable form
- wchan		If CONFIG_KALLSYMS is set, a pre-decoded wchan
+ wchan		Present with CONFIG_KALLSYMS=y: it shows the kernel function
+		symbol the task is blocked in - or "0" if not blocked.
  pagemap	Page table
  stack		Report full stack trace, enable via CONFIG_STACKTRACE
  smaps		a extension based on maps, showing the memory consumption of
@@ -310,7 +311,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
   blocked       bitmap of blocked signals
   sigign        bitmap of ignored signals
   sigcatch      bitmap of caught signals
-  wchan         address where process went to sleep
+  0		(place holder, used to be the wchan address, use /proc/PID/wchan instead)
   0             (place holder)
   0             (place holder)
   exit_signal   signal to send to parent thread on exit
diff --git a/fs/proc/array.c b/fs/proc/array.c
index f60f012..eed2050 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -375,7 +375,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task, int whole)
 {
-	unsigned long vsize, eip, esp, wchan = ~0UL;
+	unsigned long vsize, eip, esp, wchan = 0;
 	int priority, nice;
 	int tty_pgrp = -1, tty_nr = 0;
 	sigset_t sigign, sigcatch;
@@ -507,7 +507,19 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
 	seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
-	seq_put_decimal_ull(m, ' ', wchan);
+
+	/*
+	 * We used to output the absolute kernel address, but that's an
+	 * information leak - so instead we show a 0/1 flag here, to signal
+	 * to user-space whether there's a wchan field in /proc/PID/wchan.
+	 *
+	 * This works with older implementations of procps as well.
+	 */
+	if (wchan)
+		seq_puts(m, " 1");
+	else
+		seq_puts(m, " 0");
+
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ull(m, ' ', 0);
 	seq_put_decimal_ll(m, ' ', task->exit_signal);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b25eee4..29595af 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -430,13 +430,10 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 
 	wchan = get_wchan(task);
 
-	if (lookup_symbol_name(wchan, symname) < 0) {
-		if (!ptrace_may_access(task, PTRACE_MODE_READ))
-			return 0;
-		seq_printf(m, "%lu", wchan);
-	} else {
+	if (wchan && ptrace_may_access(task, PTRACE_MODE_READ) && !lookup_symbol_name(wchan, symname))
 		seq_printf(m, "%s", symname);
-	}
+	else
+		seq_putc(m, '0');
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2015-10-01 12:51 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-28  9:00 [PATCH] arch/x86: fix out-of-bounds in get_wchan() Dmitry Vyukov
2015-09-28  9:37 ` Borislav Petkov
2015-09-28  9:49   ` Dmitry Vyukov
2015-09-28 10:23     ` Borislav Petkov
2015-09-28 10:33       ` Dmitry Vyukov
2015-09-28 10:51         ` Borislav Petkov
2015-09-28  9:54   ` Dmitry Vyukov
2015-09-28 10:32     ` Borislav Petkov
2015-09-28 15:40 ` Andrey Ryabinin
2015-09-28 16:08   ` Dmitry Vyukov
2015-09-28 16:32     ` Thomas Gleixner
2015-09-29 18:15       ` Andy Lutomirski
2015-09-29 18:30         ` Andy Lutomirski
2015-09-29 18:41           ` Borislav Petkov
2015-09-30  7:15         ` [PATCH] fs/proc: Don't expose absolute kernel addresses via wchan Ingo Molnar
2015-09-30  7:35           ` Thomas Gleixner
2015-09-30 13:59             ` [PATCH v2] " Ingo Molnar
2015-09-30 20:36               ` Thomas Gleixner
2015-09-30 21:21               ` Kees Cook
2015-09-30 21:38                 ` Thomas Gleixner
2015-10-01  7:57                 ` [PATCH v3] fs/proc, core/debug: " Ingo Molnar
2015-10-01  8:57                   ` Andrey Ryabinin
2015-10-01  9:29                     ` Ingo Molnar
2015-10-01 10:16                       ` Andrey Ryabinin
2015-10-01 10:39                         ` Ingo Molnar
2015-10-01 10:47                           ` Andrey Ryabinin
2015-10-01 10:57                             ` [PATCH v5] " Ingo Molnar
2015-10-01  9:37                   ` [PATCH v4] " Ingo Molnar
2015-10-01 12:49               ` [tip:core/debug] fs/proc, core/debug: Don' t " tip-bot for Ingo Molnar
2015-09-30  8:07         ` [PATCH] arch/x86: fix out-of-bounds in get_wchan() Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).