All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG -tip] kmemleak and stacktrace cause page faul
@ 2019-10-19 11:44 Cyrill Gorcunov
  2019-10-22 14:23 ` Cyrill Gorcunov
  0 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2019-10-19 11:44 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, linux-mm, Catalin Marinas

Hi! I'm not sure if I've CC'ed proper persons, so please sorry if I did.
Anyway, today's -tip (07b4dbf1d830) refused to boot

[    0.024793] No NUMA configuration found
[    0.025406] Faking a node at [mem 0x0000000000000000-0x000000007ffdefff]
[    0.026462] NODE_DATA(0) allocated [mem 0x7ffdb000-0x7ffdefff]
[    0.027246] BUG: unable to handle page fault for address: 0000000000001ff0
[    0.028160] #PF: supervisor read access in kernel mode
[    0.028992] #PF: error_code(0x0000) - not-present page
[    0.029820] PGD 0 P4D 0 
[    0.030226] Oops: 0000 [#1] PREEMPT SMP PTI
[    0.031069] CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-rc3-00258-g07b4dbf1d830 #93
[    0.032317] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[    0.034163] RIP: 0010:get_stack_info+0xb3/0x148
[    0.034903] Code: 04 d5 84 48 01 82 66 85 c0 74 25 8b 0c d5 80 48 01 82 0f b7 14 d5 86 48 01 82 48 01 f1 89 13 48 01 c8 48 89 4b 08 48 89 43 10 <48> 8b 40 f0 eb 2b 65 48 8b 05 1f f4 f9 7e 48 8d 90 00 c0 ff ff 48
[    0.037579] RSP: 0000:ffffffff82603be0 EFLAGS: 00010006

I nailed it down to the following kmemleak code

create_object
  ...
  object->trace_len = __save_stack_trace(object->trace);

if I drop this line out it boots fine. Just wanted to share the observation,
probably it is known issue already.

Sidenote: The last -tip kernel which I've been working with is dated Sep 18
so the changes which cause the problem should be introduced last month.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-19 11:44 [BUG -tip] kmemleak and stacktrace cause page faul Cyrill Gorcunov
@ 2019-10-22 14:23 ` Cyrill Gorcunov
  2019-10-22 14:56   ` Cyrill Gorcunov
  0 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2019-10-22 14:23 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, linux-mm, Catalin Marinas

On Sat, Oct 19, 2019 at 02:44:21PM +0300, Cyrill Gorcunov wrote:
...
> 
> I nailed it down to the following kmemleak code
> 
> create_object
>   ...
>   object->trace_len = __save_stack_trace(object->trace);
> 
> if I drop this line out it boots fine. Just wanted to share the observation,
> probably it is known issue already.
> 
> Sidenote: The last -tip kernel which I've been working with is dated Sep 18
> so the changes which cause the problem should be introduced last month.

I've just tried to boot with fresh -tip

commit c2e50c28eeb90c0f3309d43c2ce0bd292a6751b3 (HEAD -> master, origin/master, origin/HEAD)
Merge: aec1ea9d4f48 27a0a90d6301
Author: Ingo Molnar <mingo@kernel.org>
Date:   Tue Oct 22 01:16:59 2019 +0200

    Merge branch 'perf/core'
    
    Conflicts:
            tools/perf/check-headers.sh

but got the same issue. So I tried to go deeper, and here is a result: we're failing
in arch/x86/kernel/dumpstack_64.c:in_exception_stack routine, precisely at line

	ep = &estack_pages[k];
	/* Guard page? */
	if (!ep->size)
		return false;

so I added a logline here

[    0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888
[    0.084951] BUG: unable to handle page fault for address: 0000000000001ff0
[    0.086724] #PF: supervisor read access in kernel mode
[    0.088506] #PF: error_code(0x0000) - not-present page
[    0.090265] PGD 0 P4D 0 
[    0.090846] Oops: 0000 [#2] PREEMPT SMP PTI
[    0.091734] CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-rc4-00258-gc2e50c28eeb9-dirty #114
[    0.093514] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[    0.096993] RIP: 0010:get_stack_info+0xdc/0x173
[    0.097994] Code: 84 48 01 82 66 85 c0 74 27 42 8b 14 f5 80 48 01 82 49 01 d5 42 0f b7 14 f5 86 48 01 82 4c 01 e8 4c 89 6b 08 89 13 48 89 43 10 <48> 8b 40 f0 eb 2b 65 48 8b 05 16 f4 f9 7e 48 8d 90 00 c0 ff ff 48

I presume the kmemleak tries to save stack trace too early when estack_pages are not
yet filled.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-22 14:23 ` Cyrill Gorcunov
@ 2019-10-22 14:56   ` Cyrill Gorcunov
  2019-10-23 13:21       ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2019-10-22 14:56 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, linux-mm, Catalin Marinas

On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote:
> 
> I presume the kmemleak tries to save stack trace too early when estack_pages are not
> yet filled.

Indeed, at this stage of boot the percpu_setup_exception_stacks has not been called
yet and estack_pages full of crap

[    0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888
[    0.159395] estack_pages[0] = 0x0
[    0.160046] estack_pages[1] = 0x5100000001000
[    0.160881] estack_pages[2] = 0x0
[    0.161530] estack_pages[3] = 0x6100000003000
[    0.162343] estack_pages[4] = 0x0
[    0.162962] estack_pages[5] = 0x0
[    0.163523] estack_pages[6] = 0x0
[    0.164065] estack_pages[7] = 0x8100000007000
[    0.164978] estack_pages[8] = 0x0
[    0.165624] estack_pages[9] = 0x9100000009000
[    0.166448] estack_pages[10] = 0x0
[    0.167064] estack_pages[11] = 0xa10000000b000
[    0.168055] estack_pages[12] = 0x0
[    0.168891] BUG: unable to handle page fault for address: 0000000000001ff0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-22 14:56   ` Cyrill Gorcunov
@ 2019-10-23 13:21       ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-23 13:21 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote:
> > 
> > I presume the kmemleak tries to save stack trace too early when estack_pages are not
> > yet filled.
> 
> Indeed, at this stage of boot the percpu_setup_exception_stacks has not been called
> yet and estack_pages full of crap
> 
> [    0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888
> [    0.159395] estack_pages[0] = 0x0
> [    0.160046] estack_pages[1] = 0x5100000001000
> [    0.160881] estack_pages[2] = 0x0
> [    0.161530] estack_pages[3] = 0x6100000003000
> [    0.162343] estack_pages[4] = 0x0
> [    0.162962] estack_pages[5] = 0x0
> [    0.163523] estack_pages[6] = 0x0
> [    0.164065] estack_pages[7] = 0x8100000007000
> [    0.164978] estack_pages[8] = 0x0
> [    0.165624] estack_pages[9] = 0x9100000009000
> [    0.166448] estack_pages[10] = 0x0
> [    0.167064] estack_pages[11] = 0xa10000000b000
> [    0.168055] estack_pages[12] = 0x0

Errm. estack_pages is statically initialized and it's an array of:.

struct estack_pages {
        u32     offs;
        u16     size;
        u16     type;
};

[0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all

The rest looks completely valid if you actually decode it proper.

e.g. 0x51000 00001000

     bit  0-31: 00001000		Offset 0x1000: 1 Page
     bit 32-47: 1000			Size 0x1000:   1 Page
     bit 48-63: 5			Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF

So, no. This is NOT the problem.

But yes, you are right that percpu_setup_exception_stacks() has not yet
been called simply because the percpu entry area has not been mapped yet.

So lets look at the full context:

        begin = (unsigned long)__this_cpu_read(cea_exception_stacks);

When percpu_setup_exception_stacks() has not been called yet, then begin
should be 0.

        end = begin + sizeof(struct cea_exception_stacks);

end should be 0 + sizeof(struct cea_exception_stacks);

        /* Bail if @stack is outside the exception stack area. */
        if (stk < begin || stk >= end)
                return false;

So 'begin <= stk < end' must be true to get to the below:

        /* Calc page offset from start of exception stacks */
        k = (stk - begin) >> PAGE_SHIFT;

which gives a valid 'k' no matter what 'begin' is. And obviously 'k' cannot
be outside of the array size of estack_pages.

        /* Lookup the page descriptor */
        ep = &estack_pages[k];

Ergo ep must be a valid pointer pointing to the statically allocated and
statically initialized estack_pages array.

        /* Guard page? */
        if (!ep->size)

How on earth can dereferencing ep crash the machine?

                return false;

That does not make any sense.

Surely, we should not even try to decode exception stack when
cea_exception_stacks is not yet initialized, but that does not explain
anything what you are observing.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
@ 2019-10-23 13:21       ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-23 13:21 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote:
> > 
> > I presume the kmemleak tries to save stack trace too early when estack_pages are not
> > yet filled.
> 
> Indeed, at this stage of boot the percpu_setup_exception_stacks has not been called
> yet and estack_pages full of crap
> 
> [    0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888
> [    0.159395] estack_pages[0] = 0x0
> [    0.160046] estack_pages[1] = 0x5100000001000
> [    0.160881] estack_pages[2] = 0x0
> [    0.161530] estack_pages[3] = 0x6100000003000
> [    0.162343] estack_pages[4] = 0x0
> [    0.162962] estack_pages[5] = 0x0
> [    0.163523] estack_pages[6] = 0x0
> [    0.164065] estack_pages[7] = 0x8100000007000
> [    0.164978] estack_pages[8] = 0x0
> [    0.165624] estack_pages[9] = 0x9100000009000
> [    0.166448] estack_pages[10] = 0x0
> [    0.167064] estack_pages[11] = 0xa10000000b000
> [    0.168055] estack_pages[12] = 0x0

Errm. estack_pages is statically initialized and it's an array of:.

struct estack_pages {
        u32     offs;
        u16     size;
        u16     type;
};

[0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all

The rest looks completely valid if you actually decode it proper.

e.g. 0x51000 00001000

     bit  0-31: 00001000		Offset 0x1000: 1 Page
     bit 32-47: 1000			Size 0x1000:   1 Page
     bit 48-63: 5			Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF

So, no. This is NOT the problem.

But yes, you are right that percpu_setup_exception_stacks() has not yet
been called simply because the percpu entry area has not been mapped yet.

So lets look at the full context:

        begin = (unsigned long)__this_cpu_read(cea_exception_stacks);

When percpu_setup_exception_stacks() has not been called yet, then begin
should be 0.

        end = begin + sizeof(struct cea_exception_stacks);

end should be 0 + sizeof(struct cea_exception_stacks);

        /* Bail if @stack is outside the exception stack area. */
        if (stk < begin || stk >= end)
                return false;

So 'begin <= stk < end' must be true to get to the below:

        /* Calc page offset from start of exception stacks */
        k = (stk - begin) >> PAGE_SHIFT;

which gives a valid 'k' no matter what 'begin' is. And obviously 'k' cannot
be outside of the array size of estack_pages.

        /* Lookup the page descriptor */
        ep = &estack_pages[k];

Ergo ep must be a valid pointer pointing to the statically allocated and
statically initialized estack_pages array.

        /* Guard page? */
        if (!ep->size)

How on earth can dereferencing ep crash the machine?

                return false;

That does not make any sense.

Surely, we should not even try to decode exception stack when
cea_exception_stacks is not yet initialized, but that does not explain
anything what you are observing.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:21       ` Thomas Gleixner
  (?)
@ 2019-10-23 13:32       ` Cyrill Gorcunov
  2019-10-23 13:38           ` Thomas Gleixner
  -1 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2019-10-23 13:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, Oct 23, 2019 at 03:21:05PM +0200, Thomas Gleixner wrote:
> On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> > On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote:
> > > 
> > > I presume the kmemleak tries to save stack trace too early when estack_pages are not
> > > yet filled.
> > 
> > Indeed, at this stage of boot the percpu_setup_exception_stacks has not been called
> > yet and estack_pages full of crap
> > 
> > [    0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888
> > [    0.159395] estack_pages[0] = 0x0
> > [    0.160046] estack_pages[1] = 0x5100000001000
> > [    0.160881] estack_pages[2] = 0x0
> > [    0.161530] estack_pages[3] = 0x6100000003000
> > [    0.162343] estack_pages[4] = 0x0
> > [    0.162962] estack_pages[5] = 0x0
> > [    0.163523] estack_pages[6] = 0x0
> > [    0.164065] estack_pages[7] = 0x8100000007000
> > [    0.164978] estack_pages[8] = 0x0
> > [    0.165624] estack_pages[9] = 0x9100000009000
> > [    0.166448] estack_pages[10] = 0x0
> > [    0.167064] estack_pages[11] = 0xa10000000b000
> > [    0.168055] estack_pages[12] = 0x0
> 
> Errm. estack_pages is statically initialized and it's an array of:.
> 
> struct estack_pages {
>         u32     offs;
>         u16     size;
>         u16     type;
> };
> 
> [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all

Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer,
the memory for this estack_pages has not yet been allocated, no?

> The rest looks completely valid if you actually decode it proper.

The diff I made to fetch the values are

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 753b8cfe8b8a..bf0d755b6079 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -101,8 +101,18 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 
 	/* Calc page offset from start of exception stacks */
 	k = (stk - begin) >> PAGE_SHIFT;
+
 	/* Lookup the page descriptor */
 	ep = &estack_pages[k];
+
+	printk("stk 0x%lx k %u begin 0x%lx end 0x%lx estack_pages 0x%lx ep 0x%lx\n",
+	       stk, k, begin, end, (long)(void *)&estack_pages[0], (long)(void *)ep);
+
+	for (k = 0; k < CEA_ESTACK_PAGES; k++) {
+		long v = *(long *)(void *)&estack_pages[k];
+		printk("estack_pages[%d] = 0x%lx\n", k, v);
+	}
+
 	/* Guard page? */
 	if (!ep->size)
 		return false;


> 
> e.g. 0x51000 00001000
> 
>      bit  0-31: 00001000		Offset 0x1000: 1 Page
>      bit 32-47: 1000			Size 0x1000:   1 Page
>      bit 48-63: 5			Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF
> 
> So, no. This is NOT the problem.

I drop the left of your reply. True, I agreed with anything you said.
You know I didn't manage to dive more into this problem yesterday
but if time permits I'll continue today. It is easily triggering
under kvm (the kernel I'm building is almost without modules so
I simply upload bzImage into the guest). FWIW, the config I'm
using is https://gist.github.com/cyrillos/7cd5d2510a99af8ea872f07ac6f9095b

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:32       ` Cyrill Gorcunov
@ 2019-10-23 13:38           ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-23 13:38 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, 23 Oct 2019, Cyrill Gorcunov wrote:
> On Wed, Oct 23, 2019 at 03:21:05PM +0200, Thomas Gleixner wrote:
> > Errm. estack_pages is statically initialized and it's an array of:.
> > 
> > struct estack_pages {
> >         u32     offs;
> >         u16     size;
> >         u16     type;
> > };
> > 
> > [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all
> 
> Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer,
> the memory for this estack_pages has not yet been allocated, no?

static const
struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
        EPAGERANGE(DF),
	EPAGERANGE(NMI),
	EPAGERANGE(DB1),
	EPAGERANGE(DB),
        EPAGERANGE(MCE),
};

It's statically allocated. So it's available from the very beginning.

> The diff I made to fetch the values are
> 
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 753b8cfe8b8a..bf0d755b6079 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -101,8 +101,18 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
>  
>  	/* Calc page offset from start of exception stacks */
>  	k = (stk - begin) >> PAGE_SHIFT;
> +
>  	/* Lookup the page descriptor */
>  	ep = &estack_pages[k];
> +
> +	printk("stk 0x%lx k %u begin 0x%lx end 0x%lx estack_pages 0x%lx ep 0x%lx\n",
> +	       stk, k, begin, end, (long)(void *)&estack_pages[0], (long)(void *)ep);
> +
> +	for (k = 0; k < CEA_ESTACK_PAGES; k++) {
> +		long v = *(long *)(void *)&estack_pages[k];
> +		printk("estack_pages[%d] = 0x%lx\n", k, v);

And as I explained to you properly decoded the values _ARE_ correct and
make sense.

> +	}
> +
>  	/* Guard page? */
>  	if (!ep->size)
>  		return false;
> 
> 
> > 
> > e.g. 0x51000 00001000
> > 
> >      bit  0-31: 00001000		Offset 0x1000: 1 Page
> >      bit 32-47: 1000			Size 0x1000:   1 Page
> >      bit 48-63: 5			Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF
> > 
> > So, no. This is NOT the problem.
> 
> I drop the left of your reply. True, I agreed with anything you said.
>
> You know I didn't manage to dive more into this problem yesterday
> but if time permits I'll continue today. It is easily triggering
> under kvm (the kernel I'm building is almost without modules so
> I simply upload bzImage into the guest). FWIW, the config I'm
> using is https://gist.github.com/cyrillos/7cd5d2510a99af8ea872f07ac6f9095b

That's helpful because I enabled kmemleak and the kernel comes up just fine.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
@ 2019-10-23 13:38           ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-23 13:38 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, 23 Oct 2019, Cyrill Gorcunov wrote:
> On Wed, Oct 23, 2019 at 03:21:05PM +0200, Thomas Gleixner wrote:
> > Errm. estack_pages is statically initialized and it's an array of:.
> > 
> > struct estack_pages {
> >         u32     offs;
> >         u16     size;
> >         u16     type;
> > };
> > 
> > [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all
> 
> Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer,
> the memory for this estack_pages has not yet been allocated, no?

static const
struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
        EPAGERANGE(DF),
	EPAGERANGE(NMI),
	EPAGERANGE(DB1),
	EPAGERANGE(DB),
        EPAGERANGE(MCE),
};

It's statically allocated. So it's available from the very beginning.

> The diff I made to fetch the values are
> 
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 753b8cfe8b8a..bf0d755b6079 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -101,8 +101,18 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
>  
>  	/* Calc page offset from start of exception stacks */
>  	k = (stk - begin) >> PAGE_SHIFT;
> +
>  	/* Lookup the page descriptor */
>  	ep = &estack_pages[k];
> +
> +	printk("stk 0x%lx k %u begin 0x%lx end 0x%lx estack_pages 0x%lx ep 0x%lx\n",
> +	       stk, k, begin, end, (long)(void *)&estack_pages[0], (long)(void *)ep);
> +
> +	for (k = 0; k < CEA_ESTACK_PAGES; k++) {
> +		long v = *(long *)(void *)&estack_pages[k];
> +		printk("estack_pages[%d] = 0x%lx\n", k, v);

And as I explained to you properly decoded the values _ARE_ correct and
make sense.

> +	}
> +
>  	/* Guard page? */
>  	if (!ep->size)
>  		return false;
> 
> 
> > 
> > e.g. 0x51000 00001000
> > 
> >      bit  0-31: 00001000		Offset 0x1000: 1 Page
> >      bit 32-47: 1000			Size 0x1000:   1 Page
> >      bit 48-63: 5			Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF
> > 
> > So, no. This is NOT the problem.
> 
> I drop the left of your reply. True, I agreed with anything you said.
>
> You know I didn't manage to dive more into this problem yesterday
> but if time permits I'll continue today. It is easily triggering
> under kvm (the kernel I'm building is almost without modules so
> I simply upload bzImage into the guest). FWIW, the config I'm
> using is https://gist.github.com/cyrillos/7cd5d2510a99af8ea872f07ac6f9095b

That's helpful because I enabled kmemleak and the kernel comes up just fine.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:38           ` Thomas Gleixner
  (?)
@ 2019-10-23 13:44           ` Cyrill Gorcunov
  -1 siblings, 0 replies; 21+ messages in thread
From: Cyrill Gorcunov @ 2019-10-23 13:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, Oct 23, 2019 at 03:38:40PM +0200, Thomas Gleixner wrote:
> > > 
> > > [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all
> > 
> > Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer,
> > the memory for this estack_pages has not yet been allocated, no?
> 
> static const
> struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
>         EPAGERANGE(DF),
> 	EPAGERANGE(NMI),
> 	EPAGERANGE(DB1),
> 	EPAGERANGE(DB),
>         EPAGERANGE(MCE),
> };
> 
> It's statically allocated. So it's available from the very beginning.

Indeed, thanks! I happened to overlooked this moment.
...
> And as I explained to you properly decoded the values _ARE_ correct and
> make sense.

Yes, just posted the diff itself to be sure. Thanks a huge for
explanation, Thomas!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:21       ` Thomas Gleixner
@ 2019-10-23 13:47         ` Thomas Gleixner
  -1 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-23 13:47 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, 23 Oct 2019, Thomas Gleixner wrote:
> On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> Ergo ep must be a valid pointer pointing to the statically allocated and
> statically initialized estack_pages array.
> 
>         /* Guard page? */
>         if (!ep->size)
> 
> How on earth can dereferencing ep crash the machine?
> 
>                 return false;
> 
> That does not make any sense.
> 
> Surely, we should not even try to decode exception stack when
> cea_exception_stacks is not yet initialized, but that does not explain
> anything what you are observing.

So looking at your actual crash:

[    0.027246] BUG: unable to handle page fault for address: 0000000000001ff0

So this derefences the stack pointer address.

[    0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep
0xffffffff82014888

ep is pointing correctly to estack_pages[1] which is bogus because 0x1010
is not a valid stack value, but dereferencing ep does not make it crash.

The crash farther down:

    	end = begin + (unsigned long)ep->size;

==> end = 0x2000

        regs = (struct pt_regs *)end - 1;

==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

        info->type      = ep->type;
        info->begin     = (unsigned long *)begin;
        info->end       = (unsigned long *)end;

---->	info->next_sp   = (unsigned long *)regs->sp;

	This is the crashing instruction trying to access 0x1ff0

And you are right this happens because cea_exception_stacks is not yet
initialized which makes begin = 0 and therefore point into nirvana.

So the fix is trivial.

Thanks,

	tglx

8<------------
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	/*
+	 * Handle the case where stack trace is collected _before_
+	 * cea_exception_stacks had been initialized.
+	 */
+	if (!begin)
+		return false;
+
 	end = begin + sizeof(struct cea_exception_stacks);
 	/* Bail if @stack is outside the exception stack area. */
 	if (stk < begin || stk >= end)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
@ 2019-10-23 13:47         ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-23 13:47 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, 23 Oct 2019, Thomas Gleixner wrote:
> On Tue, 22 Oct 2019, Cyrill Gorcunov wrote:
> Ergo ep must be a valid pointer pointing to the statically allocated and
> statically initialized estack_pages array.
> 
>         /* Guard page? */
>         if (!ep->size)
> 
> How on earth can dereferencing ep crash the machine?
> 
>                 return false;
> 
> That does not make any sense.
> 
> Surely, we should not even try to decode exception stack when
> cea_exception_stacks is not yet initialized, but that does not explain
> anything what you are observing.

So looking at your actual crash:

[    0.027246] BUG: unable to handle page fault for address: 0000000000001ff0

So this derefences the stack pointer address.

[    0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep
0xffffffff82014888

ep is pointing correctly to estack_pages[1] which is bogus because 0x1010
is not a valid stack value, but dereferencing ep does not make it crash.

The crash farther down:

    	end = begin + (unsigned long)ep->size;

==> end = 0x2000

        regs = (struct pt_regs *)end - 1;

==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

        info->type      = ep->type;
        info->begin     = (unsigned long *)begin;
        info->end       = (unsigned long *)end;

---->	info->next_sp   = (unsigned long *)regs->sp;

	This is the crashing instruction trying to access 0x1ff0

And you are right this happens because cea_exception_stacks is not yet
initialized which makes begin = 0 and therefore point into nirvana.

So the fix is trivial.

Thanks,

	tglx

8<------------
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	/*
+	 * Handle the case where stack trace is collected _before_
+	 * cea_exception_stacks had been initialized.
+	 */
+	if (!begin)
+		return false;
+
 	end = begin + sizeof(struct cea_exception_stacks);
 	/* Bail if @stack is outside the exception stack area. */
 	if (stk < begin || stk >= end)



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:47         ` Thomas Gleixner
  (?)
@ 2019-10-23 13:53         ` Cyrill Gorcunov
  -1 siblings, 0 replies; 21+ messages in thread
From: Cyrill Gorcunov @ 2019-10-23 13:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, Oct 23, 2019 at 03:47:57PM +0200, Thomas Gleixner wrote:
> 
> And you are right this happens because cea_exception_stacks is not yet
> initialized which makes begin = 0 and therefore point into nirvana.
> 
> So the fix is trivial.

Great! Thanks a lot for sych detailed analysis! I'll test the patch.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [BUG -tip] kmemleak and stacktrace cause page faul
  2019-10-23 13:47         ` Thomas Gleixner
  (?)
  (?)
@ 2019-10-23 13:59         ` Cyrill Gorcunov
  2019-10-23 18:05             ` Thomas Gleixner
  -1 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2019-10-23 13:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas

On Wed, Oct 23, 2019 at 03:47:57PM +0200, Thomas Gleixner wrote:
> 
> So the fix is trivial.

Works like a charm.

Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 13:59         ` Cyrill Gorcunov
@ 2019-10-23 18:05             ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-23 18:05 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas,
	x86, Josh Poimboeuf

Cyrill reported the following crash:

  BUG: unable to handle page fault for address: 0000000000001ff0
  #PF: supervisor read access in kernel mode
  RIP: 0010:get_stack_info+0xb3/0x148

It turns out that if the stack tracer is invoked before the exception stack
mappings are initialized in_exception_stack() can erroneously classify an
invalid address as an address inside of an exception stack:

    begin = this_cpu_read(cea_exception_stacks);  <- 0
    end = begin + sizeof(exception stacks);

i.e. any address between 0 and end will be considered as exception stack
address and the subsequent code will then try to derefence the resulting
stack frame at a non mapped address.

 end = begin + (unsigned long)ep->size;
     ==> end = 0x2000

 regs = (struct pt_regs *)end - 1;
     ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

 info->next_sp   = (unsigned long *)regs->sp;
     ==> Crashes due to accessing 0x1ff0

Prevent this by checking the validity of the cea_exception_stack base
address and bailing out if it is zero.

Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/dumpstack_64.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	/*
+	 * Handle the case where stack trace is collected _before_
+	 * cea_exception_stacks had been initialized.
+	 */
+	if (!begin)
+		return false;
+
 	end = begin + sizeof(struct cea_exception_stacks);
 	/* Bail if @stack is outside the exception stack area. */
 	if (stk < begin || stk >= end)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
@ 2019-10-23 18:05             ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-23 18:05 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas,
	x86, Josh Poimboeuf

Cyrill reported the following crash:

  BUG: unable to handle page fault for address: 0000000000001ff0
  #PF: supervisor read access in kernel mode
  RIP: 0010:get_stack_info+0xb3/0x148

It turns out that if the stack tracer is invoked before the exception stack
mappings are initialized in_exception_stack() can erroneously classify an
invalid address as an address inside of an exception stack:

    begin = this_cpu_read(cea_exception_stacks);  <- 0
    end = begin + sizeof(exception stacks);

i.e. any address between 0 and end will be considered as exception stack
address and the subsequent code will then try to derefence the resulting
stack frame at a non mapped address.

 end = begin + (unsigned long)ep->size;
     ==> end = 0x2000

 regs = (struct pt_regs *)end - 1;
     ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

 info->next_sp   = (unsigned long *)regs->sp;
     ==> Crashes due to accessing 0x1ff0

Prevent this by checking the validity of the cea_exception_stack base
address and bailing out if it is zero.

Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/dumpstack_64.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	/*
+	 * Handle the case where stack trace is collected _before_
+	 * cea_exception_stacks had been initialized.
+	 */
+	if (!begin)
+		return false;
+
 	end = begin + sizeof(struct cea_exception_stacks);
 	/* Bail if @stack is outside the exception stack area. */
 	if (stk < begin || stk >= end)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 18:05             ` Thomas Gleixner
  (?)
@ 2019-10-23 18:31             ` Matthew Wilcox
  2019-10-23 18:43               ` Cyrill Gorcunov
  2019-10-23 21:27                 ` Thomas Gleixner
  -1 siblings, 2 replies; 21+ messages in thread
From: Matthew Wilcox @ 2019-10-23 18:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Cyrill Gorcunov, LKML, Ingo Molnar, Peter Zijlstra, linux-mm,
	Catalin Marinas, x86, Josh Poimboeuf

On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote:
> Prevent this by checking the validity of the cea_exception_stack base
> address and bailing out if it is zero.

Could also initialise cea_exception_stack to -1?  That would lead to it
being caught by ...

>  	end = begin + sizeof(struct cea_exception_stacks);
>  	/* Bail if @stack is outside the exception stack area. */
>  	if (stk < begin || stk >= end)

this existing check.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 18:31             ` Matthew Wilcox
@ 2019-10-23 18:43               ` Cyrill Gorcunov
  2019-10-23 21:27                 ` Thomas Gleixner
  1 sibling, 0 replies; 21+ messages in thread
From: Cyrill Gorcunov @ 2019-10-23 18:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Thomas Gleixner, LKML, Ingo Molnar, Peter Zijlstra, linux-mm,
	Catalin Marinas, x86, Josh Poimboeuf

On Wed, Oct 23, 2019 at 11:31:40AM -0700, Matthew Wilcox wrote:
> On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote:
> > Prevent this by checking the validity of the cea_exception_stack base
> > address and bailing out if it is zero.
> 
> Could also initialise cea_exception_stack to -1?  That would lead to it
> being caught by ...
> 
> >  	end = begin + sizeof(struct cea_exception_stacks);
> >  	/* Bail if @stack is outside the exception stack area. */
> >  	if (stk < begin || stk >= end)
> 
> this existing check.

As to me this would be a hack and fragile :/ In turn the current explicit
test Thomas made is a way more readable.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 18:05             ` Thomas Gleixner
  (?)
  (?)
@ 2019-10-23 19:17             ` Josh Poimboeuf
  -1 siblings, 0 replies; 21+ messages in thread
From: Josh Poimboeuf @ 2019-10-23 19:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Cyrill Gorcunov, LKML, Ingo Molnar, Peter Zijlstra, linux-mm,
	Catalin Marinas, x86

On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote:
> Cyrill reported the following crash:
> 
>   BUG: unable to handle page fault for address: 0000000000001ff0
>   #PF: supervisor read access in kernel mode
>   RIP: 0010:get_stack_info+0xb3/0x148
> 
> It turns out that if the stack tracer is invoked before the exception stack
> mappings are initialized in_exception_stack() can erroneously classify an
> invalid address as an address inside of an exception stack:
> 
>     begin = this_cpu_read(cea_exception_stacks);  <- 0
>     end = begin + sizeof(exception stacks);
> 
> i.e. any address between 0 and end will be considered as exception stack
> address and the subsequent code will then try to derefence the resulting
> stack frame at a non mapped address.
> 
>  end = begin + (unsigned long)ep->size;
>      ==> end = 0x2000
> 
>  regs = (struct pt_regs *)end - 1;
>      ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0
> 
>  info->next_sp   = (unsigned long *)regs->sp;
>      ==> Crashes due to accessing 0x1ff0
> 
> Prevent this by checking the validity of the cea_exception_stack base
> address and bailing out if it is zero.
> 
> Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
> Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: stable@vger.kernel.org

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 18:31             ` Matthew Wilcox
@ 2019-10-23 21:27                 ` Thomas Gleixner
  2019-10-23 21:27                 ` Thomas Gleixner
  1 sibling, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-23 21:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Cyrill Gorcunov, LKML, Ingo Molnar, Peter Zijlstra, linux-mm,
	Catalin Marinas, x86, Josh Poimboeuf

On Wed, 23 Oct 2019, Matthew Wilcox wrote:

> On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote:
> > Prevent this by checking the validity of the cea_exception_stack base
> > address and bailing out if it is zero.
> 
> Could also initialise cea_exception_stack to -1?  That would lead to it
> being caught by ...
> 
> >  	end = begin + sizeof(struct cea_exception_stacks);
> >  	/* Bail if @stack is outside the exception stack area. */
> >  	if (stk < begin || stk >= end)
> 
> this existing check.

Yes thought about that, but then decided to do it in a readable way :)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup
@ 2019-10-23 21:27                 ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-23 21:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Cyrill Gorcunov, LKML, Ingo Molnar, Peter Zijlstra, linux-mm,
	Catalin Marinas, x86, Josh Poimboeuf

On Wed, 23 Oct 2019, Matthew Wilcox wrote:

> On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote:
> > Prevent this by checking the validity of the cea_exception_stack base
> > address and bailing out if it is zero.
> 
> Could also initialise cea_exception_stack to -1?  That would lead to it
> being caught by ...
> 
> >  	end = begin + sizeof(struct cea_exception_stacks);
> >  	/* Bail if @stack is outside the exception stack area. */
> >  	if (stk < begin || stk >= end)
> 
> this existing check.

Yes thought about that, but then decided to do it in a readable way :)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip: x86/urgent] x86/dumpstack/64: Don't evaluate exception stacks before setup
  2019-10-23 18:05             ` Thomas Gleixner
                               ` (2 preceding siblings ...)
  (?)
@ 2019-11-04 23:56             ` tip-bot2 for Thomas Gleixner
  -1 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2019-11-04 23:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Cyrill Gorcunov, Thomas Gleixner, Josh Poimboeuf, stable,
	Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     e361362b08cab1098b64b0e5fd8c879f086b3f46
Gitweb:        https://git.kernel.org/tip/e361362b08cab1098b64b0e5fd8c879f086b3f46
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Wed, 23 Oct 2019 20:05:49 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 05 Nov 2019 00:51:35 +01:00

x86/dumpstack/64: Don't evaluate exception stacks before setup

Cyrill reported the following crash:

  BUG: unable to handle page fault for address: 0000000000001ff0
  #PF: supervisor read access in kernel mode
  RIP: 0010:get_stack_info+0xb3/0x148

It turns out that if the stack tracer is invoked before the exception stack
mappings are initialized in_exception_stack() can erroneously classify an
invalid address as an address inside of an exception stack:

    begin = this_cpu_read(cea_exception_stacks);  <- 0
    end = begin + sizeof(exception stacks);

i.e. any address between 0 and end will be considered as exception stack
address and the subsequent code will then try to derefence the resulting
stack frame at a non mapped address.

 end = begin + (unsigned long)ep->size;
     ==> end = 0x2000

 regs = (struct pt_regs *)end - 1;
     ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

 info->next_sp   = (unsigned long *)regs->sp;
     ==> Crashes due to accessing 0x1ff0

Prevent this by checking the validity of the cea_exception_stack base
address and bailing out if it is zero.

Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910231950590.1852@nanos.tec.linutronix.de

---
 arch/x86/kernel/dumpstack_64.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 753b8cf..87b9789 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
+	/*
+	 * Handle the case where stack trace is collected _before_
+	 * cea_exception_stacks had been initialized.
+	 */
+	if (!begin)
+		return false;
+
 	end = begin + sizeof(struct cea_exception_stacks);
 	/* Bail if @stack is outside the exception stack area. */
 	if (stk < begin || stk >= end)

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-11-04 23:56 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-19 11:44 [BUG -tip] kmemleak and stacktrace cause page faul Cyrill Gorcunov
2019-10-22 14:23 ` Cyrill Gorcunov
2019-10-22 14:56   ` Cyrill Gorcunov
2019-10-23 13:21     ` Thomas Gleixner
2019-10-23 13:21       ` Thomas Gleixner
2019-10-23 13:32       ` Cyrill Gorcunov
2019-10-23 13:38         ` Thomas Gleixner
2019-10-23 13:38           ` Thomas Gleixner
2019-10-23 13:44           ` Cyrill Gorcunov
2019-10-23 13:47       ` Thomas Gleixner
2019-10-23 13:47         ` Thomas Gleixner
2019-10-23 13:53         ` Cyrill Gorcunov
2019-10-23 13:59         ` Cyrill Gorcunov
2019-10-23 18:05           ` [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup Thomas Gleixner
2019-10-23 18:05             ` Thomas Gleixner
2019-10-23 18:31             ` Matthew Wilcox
2019-10-23 18:43               ` Cyrill Gorcunov
2019-10-23 21:27               ` Thomas Gleixner
2019-10-23 21:27                 ` Thomas Gleixner
2019-10-23 19:17             ` Josh Poimboeuf
2019-11-04 23:56             ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.