* [BUG -tip] kmemleak and stacktrace cause page faul @ 2019-10-19 11:44 Cyrill Gorcunov 2019-10-22 14:23 ` Cyrill Gorcunov 0 siblings, 1 reply; 15+ messages in thread From: Cyrill Gorcunov @ 2019-10-19 11:44 UTC (permalink / raw) To: LKML Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, linux-mm, Catalin Marinas Hi! I'm not sure if I've CC'ed proper persons, so please sorry if I did. Anyway, today's -tip (07b4dbf1d830) refused to boot [ 0.024793] No NUMA configuration found [ 0.025406] Faking a node at [mem 0x0000000000000000-0x000000007ffdefff] [ 0.026462] NODE_DATA(0) allocated [mem 0x7ffdb000-0x7ffdefff] [ 0.027246] BUG: unable to handle page fault for address: 0000000000001ff0 [ 0.028160] #PF: supervisor read access in kernel mode [ 0.028992] #PF: error_code(0x0000) - not-present page [ 0.029820] PGD 0 P4D 0 [ 0.030226] Oops: 0000 [#1] PREEMPT SMP PTI [ 0.031069] CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-rc3-00258-g07b4dbf1d830 #93 [ 0.032317] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [ 0.034163] RIP: 0010:get_stack_info+0xb3/0x148 [ 0.034903] Code: 04 d5 84 48 01 82 66 85 c0 74 25 8b 0c d5 80 48 01 82 0f b7 14 d5 86 48 01 82 48 01 f1 89 13 48 01 c8 48 89 4b 08 48 89 43 10 <48> 8b 40 f0 eb 2b 65 48 8b 05 1f f4 f9 7e 48 8d 90 00 c0 ff ff 48 [ 0.037579] RSP: 0000:ffffffff82603be0 EFLAGS: 00010006 I nailed it down to the following kmemleak code create_object ... object->trace_len = __save_stack_trace(object->trace); if I drop this line out it boots fine. Just wanted to share the observation, probably it is known issue already. Sidenote: The last -tip kernel which I've been working with is dated Sep 18 so the changes which cause the problem should be introduced last month. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG -tip] kmemleak and stacktrace cause page faul 2019-10-19 11:44 [BUG -tip] kmemleak and stacktrace cause page faul Cyrill Gorcunov @ 2019-10-22 14:23 ` Cyrill Gorcunov 2019-10-22 14:56 ` Cyrill Gorcunov 0 siblings, 1 reply; 15+ messages in thread From: Cyrill Gorcunov @ 2019-10-22 14:23 UTC (permalink / raw) To: LKML Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, linux-mm, Catalin Marinas On Sat, Oct 19, 2019 at 02:44:21PM +0300, Cyrill Gorcunov wrote: ... > > I nailed it down to the following kmemleak code > > create_object > ... > object->trace_len = __save_stack_trace(object->trace); > > if I drop this line out it boots fine. Just wanted to share the observation, > probably it is known issue already. > > Sidenote: The last -tip kernel which I've been working with is dated Sep 18 > so the changes which cause the problem should be introduced last month. I've just tried to boot with fresh -tip commit c2e50c28eeb90c0f3309d43c2ce0bd292a6751b3 (HEAD -> master, origin/master, origin/HEAD) Merge: aec1ea9d4f48 27a0a90d6301 Author: Ingo Molnar <mingo@kernel.org> Date: Tue Oct 22 01:16:59 2019 +0200 Merge branch 'perf/core' Conflicts: tools/perf/check-headers.sh but got the same issue. So I tried to go deeper, and here is a result: we're failing in arch/x86/kernel/dumpstack_64.c:in_exception_stack routine, precisely at line ep = &estack_pages[k]; /* Guard page? */ if (!ep->size) return false; so I added a logline here [ 0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888 [ 0.084951] BUG: unable to handle page fault for address: 0000000000001ff0 [ 0.086724] #PF: supervisor read access in kernel mode [ 0.088506] #PF: error_code(0x0000) - not-present page [ 0.090265] PGD 0 P4D 0 [ 0.090846] Oops: 0000 [#2] PREEMPT SMP PTI [ 0.091734] CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-rc4-00258-gc2e50c28eeb9-dirty #114 [ 0.093514] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [ 0.096993] RIP: 0010:get_stack_info+0xdc/0x173 [ 0.097994] Code: 84 48 01 82 66 85 c0 74 27 42 8b 14 f5 80 48 01 82 49 01 d5 42 0f b7 14 f5 86 48 01 82 4c 01 e8 4c 89 6b 08 89 13 48 89 43 10 <48> 8b 40 f0 eb 2b 65 48 8b 05 16 f4 f9 7e 48 8d 90 00 c0 ff ff 48 I presume the kmemleak tries to save stack trace too early when estack_pages are not yet filled. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG -tip] kmemleak and stacktrace cause page faul 2019-10-22 14:23 ` Cyrill Gorcunov @ 2019-10-22 14:56 ` Cyrill Gorcunov 2019-10-23 13:21 ` Thomas Gleixner 0 siblings, 1 reply; 15+ messages in thread From: Cyrill Gorcunov @ 2019-10-22 14:56 UTC (permalink / raw) To: LKML Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, linux-mm, Catalin Marinas On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote: > > I presume the kmemleak tries to save stack trace too early when estack_pages are not > yet filled. Indeed, at this stage of boot the percpu_setup_exception_stacks has not been called yet and estack_pages full of crap [ 0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888 [ 0.159395] estack_pages[0] = 0x0 [ 0.160046] estack_pages[1] = 0x5100000001000 [ 0.160881] estack_pages[2] = 0x0 [ 0.161530] estack_pages[3] = 0x6100000003000 [ 0.162343] estack_pages[4] = 0x0 [ 0.162962] estack_pages[5] = 0x0 [ 0.163523] estack_pages[6] = 0x0 [ 0.164065] estack_pages[7] = 0x8100000007000 [ 0.164978] estack_pages[8] = 0x0 [ 0.165624] estack_pages[9] = 0x9100000009000 [ 0.166448] estack_pages[10] = 0x0 [ 0.167064] estack_pages[11] = 0xa10000000b000 [ 0.168055] estack_pages[12] = 0x0 [ 0.168891] BUG: unable to handle page fault for address: 0000000000001ff0 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG -tip] kmemleak and stacktrace cause page faul 2019-10-22 14:56 ` Cyrill Gorcunov @ 2019-10-23 13:21 ` Thomas Gleixner 2019-10-23 13:32 ` Cyrill Gorcunov 2019-10-23 13:47 ` Thomas Gleixner 0 siblings, 2 replies; 15+ messages in thread From: Thomas Gleixner @ 2019-10-23 13:21 UTC (permalink / raw) To: Cyrill Gorcunov Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas On Tue, 22 Oct 2019, Cyrill Gorcunov wrote: > On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote: > > > > I presume the kmemleak tries to save stack trace too early when estack_pages are not > > yet filled. > > Indeed, at this stage of boot the percpu_setup_exception_stacks has not been called > yet and estack_pages full of crap > > [ 0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888 > [ 0.159395] estack_pages[0] = 0x0 > [ 0.160046] estack_pages[1] = 0x5100000001000 > [ 0.160881] estack_pages[2] = 0x0 > [ 0.161530] estack_pages[3] = 0x6100000003000 > [ 0.162343] estack_pages[4] = 0x0 > [ 0.162962] estack_pages[5] = 0x0 > [ 0.163523] estack_pages[6] = 0x0 > [ 0.164065] estack_pages[7] = 0x8100000007000 > [ 0.164978] estack_pages[8] = 0x0 > [ 0.165624] estack_pages[9] = 0x9100000009000 > [ 0.166448] estack_pages[10] = 0x0 > [ 0.167064] estack_pages[11] = 0xa10000000b000 > [ 0.168055] estack_pages[12] = 0x0 Errm. estack_pages is statically initialized and it's an array of:. struct estack_pages { u32 offs; u16 size; u16 type; }; [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all The rest looks completely valid if you actually decode it proper. e.g. 0x51000 00001000 bit 0-31: 00001000 Offset 0x1000: 1 Page bit 32-47: 1000 Size 0x1000: 1 Page bit 48-63: 5 Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF So, no. This is NOT the problem. But yes, you are right that percpu_setup_exception_stacks() has not yet been called simply because the percpu entry area has not been mapped yet. So lets look at the full context: begin = (unsigned long)__this_cpu_read(cea_exception_stacks); When percpu_setup_exception_stacks() has not been called yet, then begin should be 0. end = begin + sizeof(struct cea_exception_stacks); end should be 0 + sizeof(struct cea_exception_stacks); /* Bail if @stack is outside the exception stack area. */ if (stk < begin || stk >= end) return false; So 'begin <= stk < end' must be true to get to the below: /* Calc page offset from start of exception stacks */ k = (stk - begin) >> PAGE_SHIFT; which gives a valid 'k' no matter what 'begin' is. And obviously 'k' cannot be outside of the array size of estack_pages. /* Lookup the page descriptor */ ep = &estack_pages[k]; Ergo ep must be a valid pointer pointing to the statically allocated and statically initialized estack_pages array. /* Guard page? */ if (!ep->size) How on earth can dereferencing ep crash the machine? return false; That does not make any sense. Surely, we should not even try to decode exception stack when cea_exception_stacks is not yet initialized, but that does not explain anything what you are observing. Thanks, tglx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG -tip] kmemleak and stacktrace cause page faul 2019-10-23 13:21 ` Thomas Gleixner @ 2019-10-23 13:32 ` Cyrill Gorcunov 2019-10-23 13:38 ` Thomas Gleixner 2019-10-23 13:47 ` Thomas Gleixner 1 sibling, 1 reply; 15+ messages in thread From: Cyrill Gorcunov @ 2019-10-23 13:32 UTC (permalink / raw) To: Thomas Gleixner Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas On Wed, Oct 23, 2019 at 03:21:05PM +0200, Thomas Gleixner wrote: > On Tue, 22 Oct 2019, Cyrill Gorcunov wrote: > > On Tue, Oct 22, 2019 at 05:23:25PM +0300, Cyrill Gorcunov wrote: > > > > > > I presume the kmemleak tries to save stack trace too early when estack_pages are not > > > yet filled. > > > > Indeed, at this stage of boot the percpu_setup_exception_stacks has not been called > > yet and estack_pages full of crap > > > > [ 0.157502] stk 0x1008 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888 > > [ 0.159395] estack_pages[0] = 0x0 > > [ 0.160046] estack_pages[1] = 0x5100000001000 > > [ 0.160881] estack_pages[2] = 0x0 > > [ 0.161530] estack_pages[3] = 0x6100000003000 > > [ 0.162343] estack_pages[4] = 0x0 > > [ 0.162962] estack_pages[5] = 0x0 > > [ 0.163523] estack_pages[6] = 0x0 > > [ 0.164065] estack_pages[7] = 0x8100000007000 > > [ 0.164978] estack_pages[8] = 0x0 > > [ 0.165624] estack_pages[9] = 0x9100000009000 > > [ 0.166448] estack_pages[10] = 0x0 > > [ 0.167064] estack_pages[11] = 0xa10000000b000 > > [ 0.168055] estack_pages[12] = 0x0 > > Errm. estack_pages is statically initialized and it's an array of:. > > struct estack_pages { > u32 offs; > u16 size; > u16 type; > }; > > [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer, the memory for this estack_pages has not yet been allocated, no? > The rest looks completely valid if you actually decode it proper. The diff I made to fetch the values are diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c index 753b8cfe8b8a..bf0d755b6079 100644 --- a/arch/x86/kernel/dumpstack_64.c +++ b/arch/x86/kernel/dumpstack_64.c @@ -101,8 +101,18 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info) /* Calc page offset from start of exception stacks */ k = (stk - begin) >> PAGE_SHIFT; + /* Lookup the page descriptor */ ep = &estack_pages[k]; + + printk("stk 0x%lx k %u begin 0x%lx end 0x%lx estack_pages 0x%lx ep 0x%lx\n", + stk, k, begin, end, (long)(void *)&estack_pages[0], (long)(void *)ep); + + for (k = 0; k < CEA_ESTACK_PAGES; k++) { + long v = *(long *)(void *)&estack_pages[k]; + printk("estack_pages[%d] = 0x%lx\n", k, v); + } + /* Guard page? */ if (!ep->size) return false; > > e.g. 0x51000 00001000 > > bit 0-31: 00001000 Offset 0x1000: 1 Page > bit 32-47: 1000 Size 0x1000: 1 Page > bit 48-63: 5 Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF > > So, no. This is NOT the problem. I drop the left of your reply. True, I agreed with anything you said. You know I didn't manage to dive more into this problem yesterday but if time permits I'll continue today. It is easily triggering under kvm (the kernel I'm building is almost without modules so I simply upload bzImage into the guest). FWIW, the config I'm using is https://gist.github.com/cyrillos/7cd5d2510a99af8ea872f07ac6f9095b ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [BUG -tip] kmemleak and stacktrace cause page faul 2019-10-23 13:32 ` Cyrill Gorcunov @ 2019-10-23 13:38 ` Thomas Gleixner 2019-10-23 13:44 ` Cyrill Gorcunov 0 siblings, 1 reply; 15+ messages in thread From: Thomas Gleixner @ 2019-10-23 13:38 UTC (permalink / raw) To: Cyrill Gorcunov Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas On Wed, 23 Oct 2019, Cyrill Gorcunov wrote: > On Wed, Oct 23, 2019 at 03:21:05PM +0200, Thomas Gleixner wrote: > > Errm. estack_pages is statically initialized and it's an array of:. > > > > struct estack_pages { > > u32 offs; > > u16 size; > > u16 type; > > }; > > > > [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all > > Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer, > the memory for this estack_pages has not yet been allocated, no? static const struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = { EPAGERANGE(DF), EPAGERANGE(NMI), EPAGERANGE(DB1), EPAGERANGE(DB), EPAGERANGE(MCE), }; It's statically allocated. So it's available from the very beginning. > The diff I made to fetch the values are > > diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c > index 753b8cfe8b8a..bf0d755b6079 100644 > --- a/arch/x86/kernel/dumpstack_64.c > +++ b/arch/x86/kernel/dumpstack_64.c > @@ -101,8 +101,18 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info) > > /* Calc page offset from start of exception stacks */ > k = (stk - begin) >> PAGE_SHIFT; > + > /* Lookup the page descriptor */ > ep = &estack_pages[k]; > + > + printk("stk 0x%lx k %u begin 0x%lx end 0x%lx estack_pages 0x%lx ep 0x%lx\n", > + stk, k, begin, end, (long)(void *)&estack_pages[0], (long)(void *)ep); > + > + for (k = 0; k < CEA_ESTACK_PAGES; k++) { > + long v = *(long *)(void *)&estack_pages[k]; > + printk("estack_pages[%d] = 0x%lx\n", k, v); And as I explained to you properly decoded the values _ARE_ correct and make sense. > + } > + > /* Guard page? */ > if (!ep->size) > return false; > > > > > > e.g. 0x51000 00001000 > > > > bit 0-31: 00001000 Offset 0x1000: 1 Page > > bit 32-47: 1000 Size 0x1000: 1 Page > > bit 48-63: 5 Type 5: STACK_TYPE_EXCEPTION + ESTACK_DF > > > > So, no. This is NOT the problem. > > I drop the left of your reply. True, I agreed with anything you said. > > You know I didn't manage to dive more into this problem yesterday > but if time permits I'll continue today. It is easily triggering > under kvm (the kernel I'm building is almost without modules so > I simply upload bzImage into the guest). FWIW, the config I'm > using is https://gist.github.com/cyrillos/7cd5d2510a99af8ea872f07ac6f9095b That's helpful because I enabled kmemleak and the kernel comes up just fine. Thanks, tglx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG -tip] kmemleak and stacktrace cause page faul 2019-10-23 13:38 ` Thomas Gleixner @ 2019-10-23 13:44 ` Cyrill Gorcunov 0 siblings, 0 replies; 15+ messages in thread From: Cyrill Gorcunov @ 2019-10-23 13:44 UTC (permalink / raw) To: Thomas Gleixner Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas On Wed, Oct 23, 2019 at 03:38:40PM +0200, Thomas Gleixner wrote: > > > > > > [0,2,4,5,6,8,10,12] are guard pages so 0 is not that crappy at all > > > > Wait, Thomas, I might be wrong, but per-cpu is initialized to the pointer, > > the memory for this estack_pages has not yet been allocated, no? > > static const > struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = { > EPAGERANGE(DF), > EPAGERANGE(NMI), > EPAGERANGE(DB1), > EPAGERANGE(DB), > EPAGERANGE(MCE), > }; > > It's statically allocated. So it's available from the very beginning. Indeed, thanks! I happened to overlooked this moment. ... > And as I explained to you properly decoded the values _ARE_ correct and > make sense. Yes, just posted the diff itself to be sure. Thanks a huge for explanation, Thomas! ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG -tip] kmemleak and stacktrace cause page faul 2019-10-23 13:21 ` Thomas Gleixner 2019-10-23 13:32 ` Cyrill Gorcunov @ 2019-10-23 13:47 ` Thomas Gleixner 2019-10-23 13:53 ` Cyrill Gorcunov 2019-10-23 13:59 ` Cyrill Gorcunov 1 sibling, 2 replies; 15+ messages in thread From: Thomas Gleixner @ 2019-10-23 13:47 UTC (permalink / raw) To: Cyrill Gorcunov Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas On Wed, 23 Oct 2019, Thomas Gleixner wrote: > On Tue, 22 Oct 2019, Cyrill Gorcunov wrote: > Ergo ep must be a valid pointer pointing to the statically allocated and > statically initialized estack_pages array. > > /* Guard page? */ > if (!ep->size) > > How on earth can dereferencing ep crash the machine? > > return false; > > That does not make any sense. > > Surely, we should not even try to decode exception stack when > cea_exception_stacks is not yet initialized, but that does not explain > anything what you are observing. So looking at your actual crash: [ 0.027246] BUG: unable to handle page fault for address: 0000000000001ff0 So this derefences the stack pointer address. [ 0.082275] stk 0x1010 k 1 begin 0x0 end 0xd000 estack_pages 0xffffffff82014880 ep 0xffffffff82014888 ep is pointing correctly to estack_pages[1] which is bogus because 0x1010 is not a valid stack value, but dereferencing ep does not make it crash. The crash farther down: end = begin + (unsigned long)ep->size; ==> end = 0x2000 regs = (struct pt_regs *)end - 1; ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0 info->type = ep->type; info->begin = (unsigned long *)begin; info->end = (unsigned long *)end; ----> info->next_sp = (unsigned long *)regs->sp; This is the crashing instruction trying to access 0x1ff0 And you are right this happens because cea_exception_stacks is not yet initialized which makes begin = 0 and therefore point into nirvana. So the fix is trivial. Thanks, tglx 8<------------ --- a/arch/x86/kernel/dumpstack_64.c +++ b/arch/x86/kernel/dumpstack_64.c @@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned BUILD_BUG_ON(N_EXCEPTION_STACKS != 6); begin = (unsigned long)__this_cpu_read(cea_exception_stacks); + /* + * Handle the case where stack trace is collected _before_ + * cea_exception_stacks had been initialized. + */ + if (!begin) + return false; + end = begin + sizeof(struct cea_exception_stacks); /* Bail if @stack is outside the exception stack area. */ if (stk < begin || stk >= end) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG -tip] kmemleak and stacktrace cause page faul 2019-10-23 13:47 ` Thomas Gleixner @ 2019-10-23 13:53 ` Cyrill Gorcunov 2019-10-23 13:59 ` Cyrill Gorcunov 1 sibling, 0 replies; 15+ messages in thread From: Cyrill Gorcunov @ 2019-10-23 13:53 UTC (permalink / raw) To: Thomas Gleixner Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas On Wed, Oct 23, 2019 at 03:47:57PM +0200, Thomas Gleixner wrote: > > And you are right this happens because cea_exception_stacks is not yet > initialized which makes begin = 0 and therefore point into nirvana. > > So the fix is trivial. Great! Thanks a lot for sych detailed analysis! I'll test the patch. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [BUG -tip] kmemleak and stacktrace cause page faul 2019-10-23 13:47 ` Thomas Gleixner 2019-10-23 13:53 ` Cyrill Gorcunov @ 2019-10-23 13:59 ` Cyrill Gorcunov 2019-10-23 18:05 ` [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup Thomas Gleixner 1 sibling, 1 reply; 15+ messages in thread From: Cyrill Gorcunov @ 2019-10-23 13:59 UTC (permalink / raw) To: Thomas Gleixner Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas On Wed, Oct 23, 2019 at 03:47:57PM +0200, Thomas Gleixner wrote: > > So the fix is trivial. Works like a charm. Tested-by: Cyrill Gorcunov <gorcunov@gmail.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup 2019-10-23 13:59 ` Cyrill Gorcunov @ 2019-10-23 18:05 ` Thomas Gleixner 2019-10-23 18:31 ` Matthew Wilcox 2019-10-23 19:17 ` Josh Poimboeuf 0 siblings, 2 replies; 15+ messages in thread From: Thomas Gleixner @ 2019-10-23 18:05 UTC (permalink / raw) To: Cyrill Gorcunov Cc: LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas, x86, Josh Poimboeuf Cyrill reported the following crash: BUG: unable to handle page fault for address: 0000000000001ff0 #PF: supervisor read access in kernel mode RIP: 0010:get_stack_info+0xb3/0x148 It turns out that if the stack tracer is invoked before the exception stack mappings are initialized in_exception_stack() can erroneously classify an invalid address as an address inside of an exception stack: begin = this_cpu_read(cea_exception_stacks); <- 0 end = begin + sizeof(exception stacks); i.e. any address between 0 and end will be considered as exception stack address and the subsequent code will then try to derefence the resulting stack frame at a non mapped address. end = begin + (unsigned long)ep->size; ==> end = 0x2000 regs = (struct pt_regs *)end - 1; ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0 info->next_sp = (unsigned long *)regs->sp; ==> Crashes due to accessing 0x1ff0 Prevent this by checking the validity of the cea_exception_stack base address and bailing out if it is zero. Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist") Reported-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: stable@vger.kernel.org --- arch/x86/kernel/dumpstack_64.c | 7 +++++++ 1 file changed, 7 insertions(+) --- a/arch/x86/kernel/dumpstack_64.c +++ b/arch/x86/kernel/dumpstack_64.c @@ -94,6 +94,13 @@ static bool in_exception_stack(unsigned BUILD_BUG_ON(N_EXCEPTION_STACKS != 6); begin = (unsigned long)__this_cpu_read(cea_exception_stacks); + /* + * Handle the case where stack trace is collected _before_ + * cea_exception_stacks had been initialized. + */ + if (!begin) + return false; + end = begin + sizeof(struct cea_exception_stacks); /* Bail if @stack is outside the exception stack area. */ if (stk < begin || stk >= end) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup 2019-10-23 18:05 ` [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup Thomas Gleixner @ 2019-10-23 18:31 ` Matthew Wilcox 2019-10-23 18:43 ` Cyrill Gorcunov 2019-10-23 21:27 ` Thomas Gleixner 2019-10-23 19:17 ` Josh Poimboeuf 1 sibling, 2 replies; 15+ messages in thread From: Matthew Wilcox @ 2019-10-23 18:31 UTC (permalink / raw) To: Thomas Gleixner Cc: Cyrill Gorcunov, LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas, x86, Josh Poimboeuf On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote: > Prevent this by checking the validity of the cea_exception_stack base > address and bailing out if it is zero. Could also initialise cea_exception_stack to -1? That would lead to it being caught by ... > end = begin + sizeof(struct cea_exception_stacks); > /* Bail if @stack is outside the exception stack area. */ > if (stk < begin || stk >= end) this existing check. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup 2019-10-23 18:31 ` Matthew Wilcox @ 2019-10-23 18:43 ` Cyrill Gorcunov 2019-10-23 21:27 ` Thomas Gleixner 1 sibling, 0 replies; 15+ messages in thread From: Cyrill Gorcunov @ 2019-10-23 18:43 UTC (permalink / raw) To: Matthew Wilcox Cc: Thomas Gleixner, LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas, x86, Josh Poimboeuf On Wed, Oct 23, 2019 at 11:31:40AM -0700, Matthew Wilcox wrote: > On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote: > > Prevent this by checking the validity of the cea_exception_stack base > > address and bailing out if it is zero. > > Could also initialise cea_exception_stack to -1? That would lead to it > being caught by ... > > > end = begin + sizeof(struct cea_exception_stacks); > > /* Bail if @stack is outside the exception stack area. */ > > if (stk < begin || stk >= end) > > this existing check. As to me this would be a hack and fragile :/ In turn the current explicit test Thomas made is a way more readable. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup 2019-10-23 18:31 ` Matthew Wilcox 2019-10-23 18:43 ` Cyrill Gorcunov @ 2019-10-23 21:27 ` Thomas Gleixner 1 sibling, 0 replies; 15+ messages in thread From: Thomas Gleixner @ 2019-10-23 21:27 UTC (permalink / raw) To: Matthew Wilcox Cc: Cyrill Gorcunov, LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas, x86, Josh Poimboeuf On Wed, 23 Oct 2019, Matthew Wilcox wrote: > On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote: > > Prevent this by checking the validity of the cea_exception_stack base > > address and bailing out if it is zero. > > Could also initialise cea_exception_stack to -1? That would lead to it > being caught by ... > > > end = begin + sizeof(struct cea_exception_stacks); > > /* Bail if @stack is outside the exception stack area. */ > > if (stk < begin || stk >= end) > > this existing check. Yes thought about that, but then decided to do it in a readable way :) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup 2019-10-23 18:05 ` [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup Thomas Gleixner 2019-10-23 18:31 ` Matthew Wilcox @ 2019-10-23 19:17 ` Josh Poimboeuf 1 sibling, 0 replies; 15+ messages in thread From: Josh Poimboeuf @ 2019-10-23 19:17 UTC (permalink / raw) To: Thomas Gleixner Cc: Cyrill Gorcunov, LKML, Ingo Molnar, Peter Zijlstra, linux-mm, Catalin Marinas, x86 On Wed, Oct 23, 2019 at 08:05:49PM +0200, Thomas Gleixner wrote: > Cyrill reported the following crash: > > BUG: unable to handle page fault for address: 0000000000001ff0 > #PF: supervisor read access in kernel mode > RIP: 0010:get_stack_info+0xb3/0x148 > > It turns out that if the stack tracer is invoked before the exception stack > mappings are initialized in_exception_stack() can erroneously classify an > invalid address as an address inside of an exception stack: > > begin = this_cpu_read(cea_exception_stacks); <- 0 > end = begin + sizeof(exception stacks); > > i.e. any address between 0 and end will be considered as exception stack > address and the subsequent code will then try to derefence the resulting > stack frame at a non mapped address. > > end = begin + (unsigned long)ep->size; > ==> end = 0x2000 > > regs = (struct pt_regs *)end - 1; > ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0 > > info->next_sp = (unsigned long *)regs->sp; > ==> Crashes due to accessing 0x1ff0 > > Prevent this by checking the validity of the cea_exception_stack base > address and bailing out if it is zero. > > Fixes: afcd21dad88b ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist") > Reported-by: Cyrill Gorcunov <gorcunov@gmail.com> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > Tested-by: Cyrill Gorcunov <gorcunov@gmail.com> > Cc: stable@vger.kernel.org Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> -- Josh ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2019-10-23 21:27 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-10-19 11:44 [BUG -tip] kmemleak and stacktrace cause page faul Cyrill Gorcunov 2019-10-22 14:23 ` Cyrill Gorcunov 2019-10-22 14:56 ` Cyrill Gorcunov 2019-10-23 13:21 ` Thomas Gleixner 2019-10-23 13:32 ` Cyrill Gorcunov 2019-10-23 13:38 ` Thomas Gleixner 2019-10-23 13:44 ` Cyrill Gorcunov 2019-10-23 13:47 ` Thomas Gleixner 2019-10-23 13:53 ` Cyrill Gorcunov 2019-10-23 13:59 ` Cyrill Gorcunov 2019-10-23 18:05 ` [PATCH] x86/dumpstack/64: Don't evaluate exception stacks before setup Thomas Gleixner 2019-10-23 18:31 ` Matthew Wilcox 2019-10-23 18:43 ` Cyrill Gorcunov 2019-10-23 21:27 ` Thomas Gleixner 2019-10-23 19:17 ` Josh Poimboeuf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).