linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG -tip] kmemleak and stacktrace cause page faul
@ 2019-10-19 11:44 Cyrill Gorcunov
  2019-10-22 14:23 ` Cyrill Gorcunov
  0 siblings, 1 reply; 16+ messages in thread
From: Cyrill Gorcunov @ 2019-10-19 11:44 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, linux-mm, Catalin Marinas

Hi! I'm not sure if I've CC'ed proper persons, so please sorry if I did.
Anyway, today's -tip (07b4dbf1d830) refused to boot

[    0.024793] No NUMA configuration found
[    0.025406] Faking a node at [mem 0x0000000000000000-0x000000007ffdefff]
[    0.026462] NODE_DATA(0) allocated [mem 0x7ffdb000-0x7ffdefff]
[    0.027246] BUG: unable to handle page fault for address: 0000000000001ff0
[    0.028160] #PF: supervisor read access in kernel mode
[    0.028992] #PF: error_code(0x0000) - not-present page
[    0.029820] PGD 0 P4D 0 
[    0.030226] Oops: 0000 [#1] PREEMPT SMP PTI
[    0.031069] CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-rc3-00258-g07b4dbf1d830 #93
[    0.032317] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[    0.034163] RIP: 0010:get_stack_info+0xb3/0x148
[    0.034903] Code: 04 d5 84 48 01 82 66 85 c0 74 25 8b 0c d5 80 48 01 82 0f b7 14 d5 86 48 01 82 48 01 f1 89 13 48 01 c8 48 89 4b 08 48 89 43 10 <48> 8b 40 f0 eb 2b 65 48 8b 05 1f f4 f9 7e 48 8d 90 00 c0 ff ff 48
[    0.037579] RSP: 0000:ffffffff82603be0 EFLAGS: 00010006

I nailed it down to the following kmemleak code

create_object
  ...
  object->trace_len = __save_stack_trace(object->trace);

if I drop this line out it boots fine. Just wanted to share the observation,
probably it is known issue already.

Sidenote: The last -tip kernel which I've been working with is dated Sep 18
so the changes which cause the problem should be introduced last month.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-19 11:44 [BUG -tip] kmemleak and stacktrace cause page faul Cyrill Gorcunov
@ 2019-10-22 14:23 ` Cyrill Gorcunov
  2019-10-22 14:56   ` Cyrill Gorcunov
  0 siblings, 1 reply; 16+ messages in thread
From: Cyrill Gorcunov @ 2019-10-22 14:23 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, linux-mm, Catalin Marinas

On Sat, Oct 19, 2019 at 02:44:21PM +0300, Cyrill Gorcunov wrote:
...
> 
> I nailed it down to the following kmemleak code
> 
> create_object
>   ...
>   object->trace_len = __save_stack_trace(object->trace);
> 
> if I drop this line out it boots fine. Just wanted to share the observation,
> probably it is known issue already.
> 
> Sidenote: The last -tip kernel which I've been working with is dated Sep 18
> so the changes which cause the problem should be introduced last month.

I've just tried to boot with fresh -tip

commit c2e50c28eeb90c0f3309d43c2ce0bd292a6751b3 (HEAD -> master, origin/master, origin/HEAD)
Merge: aec1ea9d4f48 27a0a90d6301
Author: Ingo Molnar <mingo@kernel.org>
Date:   Tue Oct 22 01:16:59 2019 +0200

    Merge branch 'perf/core'
    
    Conflicts:
            tools/perf/check-headers.sh

but got the same issue. So I tried to go deeper, and here is a result: we're failing
in arch/x86/kernel/dumpstack_64.c:in_exception_stack routine, precisely at line

	ep = &estack_pages[k];
	/* Guard page? */
	if (!ep->size)
		return false;

so I added a logline here

[    0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888
[    0.084951] BUG: unable to handle page fault for address: 0000000000001ff0
[    0.086724] #PF: supervisor read access in kernel mode
[    0.088506] #PF: error_code(0x0000) - not-present page
[    0.090265] PGD 0 P4D 0 
[    0.090846] Oops: 0000 [#2] PREEMPT SMP PTI
[    0.091734] CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-rc4-00258-gc2e50c28eeb9-dirty #114
[    0.093514] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[    0.096993] RIP: 0010:get_stack_info+0xdc/0x173
[    0.097994] Code: 84 48 01 82 66 85 c0 74 27 42 8b 14 f5 80 48 01 82 49 01 d5 42 0f b7 14 f5 86 48 01 82 4c 01 e8 4c 89 6b 08 89 13 48 89 43 10 <48> 8b 40 f0 eb 2b 65 48 8b 05 16 f4 f9 7e 48 8d 90 00 c0 ff ff 48

I presume the kmemleak tries to save stack trace too early when estack_pages are not
yet filled.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-22 14:23 ` Cyrill Gorcunov
@ 2019-10-22 14:56   ` Cyrill Gorcunov
  2019-10-23 13:21     ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Cyrill Gorcunov @ 2019-10-22 14:56 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, linux-mm, Catalin Marinas

On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote:
> 
> I presume the kmemleak tries to save stack trace too early when estack_pages are not
> yet filled.

Indeed, at this stage of boot the percpu_setup_exception_stacks has not been called
yet and estack_pages full of crap

[    0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888
[    0.159395] estack_pages[0] = 0x0
[    0.160046] estack_pages[1] = 0x5100000001000
[    0.160881] estack_pages[2] = 0x0
[    0.161530] estack_pages[3] = 0x6100000003000
[    0.162343] estack_pages[4] = 0x0
[    0.162962] estack_pages[5] = 0x0
[    0.163523] estack_pages[6] = 0x0
[    0.164065] estack_pages[7] = 0x8100000007000
[    0.164978] estack_pages[8] = 0x0
[    0.165624] estack_pages[9] = 0x9100000009000
[    0.166448] estack_pages[10] = 0x0
[    0.167064] estack_pages[11] = 0xa10000000b000
[    0.168055] estack_pages[12] = 0x0
[    0.168891] BUG: unable to handle page fault for address: 0000000000001ff0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-22 14:56   ` Cyrill Gorcunov
@ 2019-10-23 13:21     ` Thomas Gleixner
  2019-10-23 13:32       ` Cyrill Gorcunov
  2019-10-23 13:47       ` Thomas Gleixner
  0 siblings, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2019-10-23 13:21 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote:
> > 
> > I presume the kmemleak tries to save stack trace too early when estack_pages are not
> > yet filled.
> 
> Indeed, at this stage of boot the percpu_setup_exception_stacks has not been called
> yet and estack_pages full of crap
> 
> [    0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888
> [    0.159395] estack_pages[0] = 0x0
> [    0.160046] estack_pages[1] = 0x5100000001000
> [    0.160881] estack_pages[2] = 0x0
> [    0.161530] estack_pages[3] = 0x6100000003000
> [    0.162343] estack_pages[4] = 0x0
> [    0.162962] estack_pages[5] = 0x0
> [    0.163523] estack_pages[6] = 0x0
> [    0.164065] estack_pages[7] = 0x8100000007000
> [    0.164978] estack_pages[8] = 0x0
> [    0.165624] estack_pages[9] = 0x9100000009000
> [    0.166448] estack_pages[10] = 0x0
> [    0.167064] estack_pages[11] = 0xa10000000b000
> [    0.168055] estack_pages[12] = 0x0

Errm. estack_pages is statically initialized and it's an array of:.

struct estack_pages {
        u32     offs;
        u16     size;
        u16     type;
};

[0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all

The rest looks completely valid if you actually decode it proper.

e.g. 0x51000 00001000

     bit  0-31: 00001000		Offset 0x1000: 1 Page
     bit 32-47: 1000			Size 0x1000:   1 Page
     bit 48-63: 5			Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF

So, no. This is NOT the problem.

But yes, you are right that percpu_setup_exception_stacks() has not yet
been called simply because the percpu entry area has not been mapped yet.

So lets look at the full context:

        begin = (unsigned long)__this_cpu_read(cea_exception_stacks);

When percpu_setup_exception_stacks() has not been called yet, then begin
should be 0.

        end = begin + sizeof(struct cea_exception_stacks);

end should be 0 + sizeof(struct cea_exception_stacks);

        /* Bail if @stack is outside the exception stack area. */
        if (stk < begin || stk >= end)
                return false;

So 'begin <= stk < end' must be true to get to the below:

        /* Calc page offset from start of exception stacks */
        k = (stk - begin) >> PAGE_SHIFT;

which gives a valid 'k' no matter what 'begin' is. And obviously 'k' cannot
be outside of the array size of estack_pages.

        /* Lookup the page descriptor */
        ep = &estack_pages[k];

Ergo ep must be a valid pointer pointing to the statically allocated and
statically initialized estack_pages array.

        /* Guard page? */
        if (!ep->size)

How on earth can dereferencing ep crash the machine?

                return false;

That does not make any sense.

Surely, we should not even try to decode exception stack when
cea_exception_stacks is not yet initialized, but that does not explain
anything what you are observing.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:21     ` Thomas Gleixner
@ 2019-10-23 13:32       ` Cyrill Gorcunov
  2019-10-23 13:38         ` Thomas Gleixner
  2019-10-23 13:47       ` Thomas Gleixner
  1 sibling, 1 reply; 16+ messages in thread
From: Cyrill Gorcunov @ 2019-10-23 13:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, Oct 23, 2019 at 03:21:05PM +0200, Thomas Gleixner wrote:
> On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> > On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote:
> > > 
> > > I presume the kmemleak tries to save stack trace too early when estack_pages are not
> > > yet filled.
> > 
> > Indeed, at this stage of boot the percpu_setup_exception_stacks has not been called
> > yet and estack_pages full of crap
> > 
> > [    0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888
> > [    0.159395] estack_pages[0] = 0x0
> > [    0.160046] estack_pages[1] = 0x5100000001000
> > [    0.160881] estack_pages[2] = 0x0
> > [    0.161530] estack_pages[3] = 0x6100000003000
> > [    0.162343] estack_pages[4] = 0x0
> > [    0.162962] estack_pages[5] = 0x0
> > [    0.163523] estack_pages[6] = 0x0
> > [    0.164065] estack_pages[7] = 0x8100000007000
> > [    0.164978] estack_pages[8] = 0x0
> > [    0.165624] estack_pages[9] = 0x9100000009000
> > [    0.166448] estack_pages[10] = 0x0
> > [    0.167064] estack_pages[11] = 0xa10000000b000
> > [    0.168055] estack_pages[12] = 0x0
> 
> Errm. estack_pages is statically initialized and it's an array of:.
> 
> struct estack_pages {
>         u32     offs;
>         u16     size;
>         u16     type;
> };
> 
> [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all

Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer,
the memory for this estack_pages has not yet been allocated, no?

> The rest looks completely valid if you actually decode it proper.

The diff I made to fetch the values are

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 753b8cfe8b8a..bf0d755b6079 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -101,8 +101,18 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 
 	/* Calc page offset from start of exception stacks */
 	k = (stk - begin) >> PAGE_SHIFT;
+
 	/* Lookup the page descriptor */
 	ep = &estack_pages[k];
+
+	printk("stk 0x%lx k %u begin 0x%lx end 0x%lx estack_pages 0x%lx ep 0x%lx\n",
+	       stk, k, begin, end, (long)(void *)&estack_pages[0], (long)(void *)ep);
+
+	for (k = 0; k < CEA_ESTACK_PAGES; k++) {
+		long v = *(long *)(void *)&estack_pages[k];
+		printk("estack_pages[%d] = 0x%lx\n", k, v);
+	}
+
 	/* Guard page? */
 	if (!ep->size)
 		return false;


> 
> e.g. 0x51000 00001000
> 
>      bit  0-31: 00001000		Offset 0x1000: 1 Page
>      bit 32-47: 1000			Size 0x1000:   1 Page
>      bit 48-63: 5			Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF
> 
> So, no. This is NOT the problem.

I drop the left of your reply. True, I agreed with anything you said.
You know I didn't manage to dive more into this problem yesterday
but if time permits I'll continue today. It is easily triggering
under kvm (the kernel I'm building is almost without modules so
I simply upload bzImage into the guest). FWIW, the config I'm
using is https://gist.github.com/cyrillos/7cd5d2510a99af8ea872f07ac6f9095b

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:32       ` Cyrill Gorcunov
@ 2019-10-23 13:38         ` Thomas Gleixner
  2019-10-23 13:44           ` Cyrill Gorcunov
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2019-10-23 13:38 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, 23 Oct 2019, Cyrill Gorcunov wrote:
> On Wed, Oct 23, 2019 at 03:21:05PM +0200, Thomas Gleixner wrote:
> > Errm. estack_pages is statically initialized and it's an array of:.
> > 
> > struct estack_pages {
> >         u32     offs;
> >         u16     size;
> >         u16     type;
> > };
> > 
> > [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all
> 
> Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer,
> the memory for this estack_pages has not yet been allocated, no?

static const
struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
        EPAGERANGE(DF),
	EPAGERANGE(NMI),
	EPAGERANGE(DB1),
	EPAGERANGE(DB),
        EPAGERANGE(MCE),
};

It's statically allocated. So it's available from the very beginning.

> The diff I made to fetch the values are
> 
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 753b8cfe8b8a..bf0d755b6079 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -101,8 +101,18 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
>  
>  	/* Calc page offset from start of exception stacks */
>  	k = (stk - begin) >> PAGE_SHIFT;
> +
>  	/* Lookup the page descriptor */
>  	ep = &estack_pages[k];
> +
> +	printk("stk 0x%lx k %u begin 0x%lx end 0x%lx estack_pages 0x%lx ep 0x%lx\n",
> +	       stk, k, begin, end, (long)(void *)&estack_pages[0], (long)(void *)ep);
> +
> +	for (k = 0; k < CEA_ESTACK_PAGES; k++) {
> +		long v = *(long *)(void *)&estack_pages[k];
> +		printk("estack_pages[%d] = 0x%lx\n", k, v);

And as I explained to you properly decoded the values _ARE_ correct and
make sense.

> +	}
> +
>  	/* Guard page? */
>  	if (!ep->size)
>  		return false;
> 
> 
> > 
> > e.g. 0x51000 00001000
> > 
> >      bit  0-31: 00001000		Offset 0x1000: 1 Page
> >      bit 32-47: 1000			Size 0x1000:   1 Page
> >      bit 48-63: 5			Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF
> > 
> > So, no. This is NOT the problem.
> 
> I drop the left of your reply. True, I agreed with anything you said.
>
> You know I didn't manage to dive more into this problem yesterday
> but if time permits I'll continue today. It is easily triggering
> under kvm (the kernel I'm building is almost without modules so
> I simply upload bzImage into the guest). FWIW, the config I'm
> using is https://gist.github.com/cyrillos/7cd5d2510a99af8ea872f07ac6f9095b

That's helpful because I enabled kmemleak and the kernel comes up just fine.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:38         ` Thomas Gleixner
@ 2019-10-23 13:44           ` Cyrill Gorcunov
  0 siblings, 0 replies; 16+ messages in thread
From: Cyrill Gorcunov @ 2019-10-23 13:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, Oct 23, 2019 at 03:38:40PM +0200, Thomas Gleixner wrote:
> > > 
> > > [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all
> > 
> > Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer,
> > the memory for this estack_pages has not yet been allocated, no?
> 
> static const
> struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
>         EPAGERANGE(DF),
> 	EPAGERANGE(NMI),
> 	EPAGERANGE(DB1),
> 	EPAGERANGE(DB),
>         EPAGERANGE(MCE),
> };
> 
> It's statically allocated. So it's available from the very beginning.

Indeed, thanks! I happened to overlooked this moment.
...
> And as I explained to you properly decoded the values _ARE_ correct and
> make sense.

Yes, just posted the diff itself to be sure. Thanks a huge for
explanation, Thomas!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:21     ` Thomas Gleixner
  2019-10-23 13:32       ` Cyrill Gorcunov
@ 2019-10-23 13:47       ` Thomas Gleixner
  2019-10-23 13:53         ` Cyrill Gorcunov
  2019-10-23 13:59         ` Cyrill Gorcunov
  1 sibling, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2019-10-23 13:47 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, 23 Oct 2019, Thomas Gleixner wrote:
> On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> Ergo ep must be a valid pointer pointing to the statically allocated and
> statically initialized estack_pages array.
> 
>         /* Guard page? */
>         if (!ep->size)
> 
> How on earth can dereferencing ep crash the machine?
> 
>                 return false;
> 
> That does not make any sense.
> 
> Surely, we should not even try to decode exception stack when
> cea_exception_stacks is not yet initialized, but that does not explain
> anything what you are observing.

So looking at your actual crash:

[    0.027246] BUG: unable to handle page fault for address: 0000000000001ff0

So this derefences the stack pointer address.

[    0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep
0xffffffff82014888

ep is pointing correctly to estack_pages[1] which is bogus because 0x1010
is not a valid stack value, but dereferencing ep does not make it crash.

The crash farther down:

    	end = begin + (unsigned long)ep->size;

==> end = 0x2000

        regs = (struct pt_regs *)end - 1;

==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

        info->type      = ep->type;
        info->begin     = (unsigned long *)begin;
        info->end       = (unsigned long *)end;

---->	info->next_sp   = (unsigned long *)regs->sp;

	This is the crashing instruction trying to access 0x1ff0

And you are right this happens because cea_exception_stacks is not yet
initialized which makes begin = 0 and therefore point into nirvana.

So the fix is trivial.

Thanks,

	tglx

8<------------
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	/*
+	 * Handle the case where stack trace is collected _before_
+	 * cea_exception_stacks had been initialized.
+	 */
+	if (!begin)
+		return false;
+
 	end = begin + sizeof(struct cea_exception_stacks);
 	/* Bail if @stack is outside the exception stack area. */
 	if (stk < begin || stk >= end)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:47       ` Thomas Gleixner
@ 2019-10-23 13:53         ` Cyrill Gorcunov
  2019-10-23 13:59         ` Cyrill Gorcunov
  1 sibling, 0 replies; 16+ messages in thread
From: Cyrill Gorcunov @ 2019-10-23 13:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, Oct 23, 2019 at 03:47:57PM +0200, Thomas Gleixner wrote:
> 
> And you are right this happens because cea_exception_stacks is not yet
> initialized which makes begin = 0 and therefore point into nirvana.
> 
> So the fix is trivial.

Great! Thanks a lot for sych detailed analysis! I'll test the patch.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:47       ` Thomas Gleixner
  2019-10-23 13:53         ` Cyrill Gorcunov
@ 2019-10-23 13:59         ` Cyrill Gorcunov
  2019-10-23 18:05           ` [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup Thomas Gleixner
  1 sibling, 1 reply; 16+ messages in thread
From: Cyrill Gorcunov @ 2019-10-23 13:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, Oct 23, 2019 at 03:47:57PM +0200, Thomas Gleixner wrote:
> 
> So the fix is trivial.

Works like a charm.

Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 13:59         ` Cyrill Gorcunov
@ 2019-10-23 18:05           ` Thomas Gleixner
  2019-10-23 18:31             ` Matthew Wilcox
                               ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Thomas Gleixner @ 2019-10-23 18:05 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas,
	x86, Josh Poimboeuf

Cyrill reported the following crash:

  BUG: unable to handle page fault for address: 0000000000001ff0
  #PF: supervisor read access in kernel mode
  RIP: 0010:get_stack_info+0xb3/0x148

It turns out that if the stack tracer is invoked before the exception stack
mappings are initialized in_exception_stack() can erroneously classify an
invalid address as an address inside of an exception stack:

    begin = this_cpu_read(cea_exception_stacks);  <- 0
    end = begin + sizeof(exception stacks);

i.e. any address between 0 and end will be considered as exception stack
address and the subsequent code will then try to derefence the resulting
stack frame at a non mapped address.

 end = begin + (unsigned long)ep->size;
     ==> end = 0x2000

 regs = (struct pt_regs *)end - 1;
     ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

 info->next_sp   = (unsigned long *)regs->sp;
     ==> Crashes due to accessing 0x1ff0

Prevent this by checking the validity of the cea_exception_stack base
address and bailing out if it is zero.

Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/dumpstack_64.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	/*
+	 * Handle the case where stack trace is collected _before_
+	 * cea_exception_stacks had been initialized.
+	 */
+	if (!begin)
+		return false;
+
 	end = begin + sizeof(struct cea_exception_stacks);
 	/* Bail if @stack is outside the exception stack area. */
 	if (stk < begin || stk >= end)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 18:05           ` [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup Thomas Gleixner
@ 2019-10-23 18:31             ` Matthew Wilcox
  2019-10-23 18:43               ` Cyrill Gorcunov
  2019-10-23 21:27               ` Thomas Gleixner
  2019-10-23 19:17             ` Josh Poimboeuf
  2019-11-04 23:56             ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
  2 siblings, 2 replies; 16+ messages in thread
From: Matthew Wilcox @ 2019-10-23 18:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Cyrill Gorcunov, LKML, Ingo Molnar, Peter Zijlstra, linux-mm,
	Catalin Marinas, x86, Josh Poimboeuf

On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote:
> Prevent this by checking the validity of the cea_exception_stack base
> address and bailing out if it is zero.

Could also initialise cea_exception_stack to -1?  That would lead to it
being caught by ...

>  	end = begin + sizeof(struct cea_exception_stacks);
>  	/* Bail if @stack is outside the exception stack area. */
>  	if (stk < begin || stk >= end)

this existing check.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 18:31             ` Matthew Wilcox
@ 2019-10-23 18:43               ` Cyrill Gorcunov
  2019-10-23 21:27               ` Thomas Gleixner
  1 sibling, 0 replies; 16+ messages in thread
From: Cyrill Gorcunov @ 2019-10-23 18:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Thomas Gleixner, LKML, Ingo Molnar, Peter Zijlstra, linux-mm,
	Catalin Marinas, x86, Josh Poimboeuf

On Wed, Oct 23, 2019 at 11:31:40AM -0700, Matthew Wilcox wrote:
> On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote:
> > Prevent this by checking the validity of the cea_exception_stack base
> > address and bailing out if it is zero.
> 
> Could also initialise cea_exception_stack to -1?  That would lead to it
> being caught by ...
> 
> >  	end = begin + sizeof(struct cea_exception_stacks);
> >  	/* Bail if @stack is outside the exception stack area. */
> >  	if (stk < begin || stk >= end)
> 
> this existing check.

As to me this would be a hack and fragile :/ In turn the current explicit
test Thomas made is a way more readable.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 18:05           ` [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup Thomas Gleixner
  2019-10-23 18:31             ` Matthew Wilcox
@ 2019-10-23 19:17             ` Josh Poimboeuf
  2019-11-04 23:56             ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 16+ messages in thread
From: Josh Poimboeuf @ 2019-10-23 19:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Cyrill Gorcunov, LKML, Ingo Molnar, Peter Zijlstra, linux-mm,
	Catalin Marinas, x86

On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote:
> Cyrill reported the following crash:
> 
>   BUG: unable to handle page fault for address: 0000000000001ff0
>   #PF: supervisor read access in kernel mode
>   RIP: 0010:get_stack_info+0xb3/0x148
> 
> It turns out that if the stack tracer is invoked before the exception stack
> mappings are initialized in_exception_stack() can erroneously classify an
> invalid address as an address inside of an exception stack:
> 
>     begin = this_cpu_read(cea_exception_stacks);  <- 0
>     end = begin + sizeof(exception stacks);
> 
> i.e. any address between 0 and end will be considered as exception stack
> address and the subsequent code will then try to derefence the resulting
> stack frame at a non mapped address.
> 
>  end = begin + (unsigned long)ep->size;
>      ==> end = 0x2000
> 
>  regs = (struct pt_regs *)end - 1;
>      ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0
> 
>  info->next_sp   = (unsigned long *)regs->sp;
>      ==> Crashes due to accessing 0x1ff0
> 
> Prevent this by checking the validity of the cea_exception_stack base
> address and bailing out if it is zero.
> 
> Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
> Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: stable@vger.kernel.org

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 18:31             ` Matthew Wilcox
  2019-10-23 18:43               ` Cyrill Gorcunov
@ 2019-10-23 21:27               ` Thomas Gleixner
  1 sibling, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2019-10-23 21:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Cyrill Gorcunov, LKML, Ingo Molnar, Peter Zijlstra, linux-mm,
	Catalin Marinas, x86, Josh Poimboeuf

On Wed, 23 Oct 2019, Matthew Wilcox wrote:

> On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote:
> > Prevent this by checking the validity of the cea_exception_stack base
> > address and bailing out if it is zero.
> 
> Could also initialise cea_exception_stack to -1?  That would lead to it
> being caught by ...
> 
> >  	end = begin + sizeof(struct cea_exception_stacks);
> >  	/* Bail if @stack is outside the exception stack area. */
> >  	if (stk < begin || stk >= end)
> 
> this existing check.

Yes thought about that, but then decided to do it in a readable way :)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tip: x86/urgent] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 18:05           ` [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup Thomas Gleixner
  2019-10-23 18:31             ` Matthew Wilcox
  2019-10-23 19:17             ` Josh Poimboeuf
@ 2019-11-04 23:56             ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 16+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2019-11-04 23:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Cyrill Gorcunov, Thomas Gleixner, Josh Poimboeuf, stable,
	Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     e361362b08cab1098b64b0e5fd8c879f086b3f46
Gitweb:        https://git.kernel.org/tip/e361362b08cab1098b64b0e5fd8c879f086b3f46
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Wed, 23 Oct 2019 20:05:49 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 05 Nov 2019 00:51:35 +01:00

x86/dumpstack/64: Don't evaluate exception stacks before setup

Cyrill reported the following crash:

  BUG: unable to handle page fault for address: 0000000000001ff0
  #PF: supervisor read access in kernel mode
  RIP: 0010:get_stack_info+0xb3/0x148

It turns out that if the stack tracer is invoked before the exception stack
mappings are initialized in_exception_stack() can erroneously classify an
invalid address as an address inside of an exception stack:

    begin = this_cpu_read(cea_exception_stacks);  <- 0
    end = begin + sizeof(exception stacks);

i.e. any address between 0 and end will be considered as exception stack
address and the subsequent code will then try to derefence the resulting
stack frame at a non mapped address.

 end = begin + (unsigned long)ep->size;
     ==> end = 0x2000

 regs = (struct pt_regs *)end - 1;
     ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

 info->next_sp   = (unsigned long *)regs->sp;
     ==> Crashes due to accessing 0x1ff0

Prevent this by checking the validity of the cea_exception_stack base
address and bailing out if it is zero.

Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910231950590.1852@nanos.tec.linutronix.de

---
 arch/x86/kernel/dumpstack_64.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 753b8cf..87b9789 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	/*
+	 * Handle the case where stack trace is collected _before_
+	 * cea_exception_stacks had been initialized.
+	 */
+	if (!begin)
+		return false;
+
 	end = begin + sizeof(struct cea_exception_stacks);
 	/* Bail if @stack is outside the exception stack area. */
 	if (stk < begin || stk >= end)

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-11-04 23:56 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-19 11:44 [BUG -tip] kmemleak and stacktrace cause page faul Cyrill Gorcunov
2019-10-22 14:23 ` Cyrill Gorcunov
2019-10-22 14:56   ` Cyrill Gorcunov
2019-10-23 13:21     ` Thomas Gleixner
2019-10-23 13:32       ` Cyrill Gorcunov
2019-10-23 13:38         ` Thomas Gleixner
2019-10-23 13:44           ` Cyrill Gorcunov
2019-10-23 13:47       ` Thomas Gleixner
2019-10-23 13:53         ` Cyrill Gorcunov
2019-10-23 13:59         ` Cyrill Gorcunov
2019-10-23 18:05           ` [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup Thomas Gleixner
2019-10-23 18:31             ` Matthew Wilcox
2019-10-23 18:43               ` Cyrill Gorcunov
2019-10-23 21:27               ` Thomas Gleixner
2019-10-23 19:17             ` Josh Poimboeuf
2019-11-04 23:56             ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).