* [PATCH] perf: Fix oops when kthread execs user process @ 2019-05-28 12:31 Young Xiao 2019-05-28 12:41 ` Russell King - ARM Linux admin 2019-05-28 14:01 ` Peter Zijlstra 0 siblings, 2 replies; 30+ messages in thread From: Young Xiao @ 2019-05-28 12:31 UTC (permalink / raw) To: will.deacon, linux, mark.rutland, mingo, bp, hpa, x86, peterz, kan.liang, linux-arm-kernel, linux-kernel Cc: Young Xiao When a kthread calls call_usermodehelper() the steps are: 1. allocate current->mm 2. load_elf_binary() 3. populate current->thread.regs While doing this, interrupts are not disabled. If there is a perf interrupt in the middle of this process (i.e. step 1 has completed but not yet reached to step 3) and if perf tries to read userspace regs, kernel oops. Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace pt_regs are not set. See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process") for details. Signed-off-by: Young Xiao <92siuyang@gmail.com> --- arch/arm/kernel/perf_regs.c | 3 ++- arch/arm64/kernel/perf_regs.c | 3 ++- arch/x86/kernel/perf_regs.c | 3 ++- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/arm/kernel/perf_regs.c b/arch/arm/kernel/perf_regs.c index 05fe92a..78ee29a 100644 --- a/arch/arm/kernel/perf_regs.c +++ b/arch/arm/kernel/perf_regs.c @@ -36,5 +36,6 @@ void perf_get_regs_user(struct perf_regs *regs_user, struct pt_regs *regs_user_copy) { regs_user->regs = task_pt_regs(current); - regs_user->abi = perf_reg_abi(current); + regs_user->abi = (regs_user->regs) ? perf_reg_abi(current) : + PERF_SAMPLE_REGS_ABI_NONE; } diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c index 0bbac61..ac19d19 100644 --- a/arch/arm64/kernel/perf_regs.c +++ b/arch/arm64/kernel/perf_regs.c @@ -58,5 +58,6 @@ void perf_get_regs_user(struct perf_regs *regs_user, struct pt_regs *regs_user_copy) { regs_user->regs = task_pt_regs(current); - regs_user->abi = perf_reg_abi(current); + regs_user->abi = (regs_user->regs) ? perf_reg_abi(current) : + PERF_SAMPLE_REGS_ABI_NONE; } diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c index 07c30ee..fa79d6d 100644 --- a/arch/x86/kernel/perf_regs.c +++ b/arch/x86/kernel/perf_regs.c @@ -102,7 +102,8 @@ void perf_get_regs_user(struct perf_regs *regs_user, struct pt_regs *regs_user_copy) { regs_user->regs = task_pt_regs(current); - regs_user->abi = perf_reg_abi(current); + regs_user->abi = (regs_user->regs) ? perf_reg_abi(current) : + PERF_SAMPLE_REGS_ABI_NONE; } #else /* CONFIG_X86_64 */ #define REG_NOSUPPORT ((1ULL << PERF_REG_X86_DS) | \ -- 2.7.4 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-28 12:31 [PATCH] perf: Fix oops when kthread execs user process Young Xiao @ 2019-05-28 12:41 ` Russell King - ARM Linux admin 2019-05-28 14:01 ` Peter Zijlstra 1 sibling, 0 replies; 30+ messages in thread From: Russell King - ARM Linux admin @ 2019-05-28 12:41 UTC (permalink / raw) To: Young Xiao Cc: mark.rutland, peterz, x86, will.deacon, linux-kernel, mingo, bp, hpa, linux-arm-kernel, kan.liang On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote: > When a kthread calls call_usermodehelper() the steps are: > 1. allocate current->mm > 2. load_elf_binary() > 3. populate current->thread.regs > > While doing this, interrupts are not disabled. If there is a perf > interrupt in the middle of this process (i.e. step 1 has completed > but not yet reached to step 3) and if perf tries to read userspace > regs, kernel oops. > > Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace > pt_regs are not set. > > See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs > user process") for details. > > Signed-off-by: Young Xiao <92siuyang@gmail.com> > --- > arch/arm/kernel/perf_regs.c | 3 ++- > arch/arm64/kernel/perf_regs.c | 3 ++- > arch/x86/kernel/perf_regs.c | 3 ++- > 3 files changed, 6 insertions(+), 3 deletions(-) > > diff --git a/arch/arm/kernel/perf_regs.c b/arch/arm/kernel/perf_regs.c > index 05fe92a..78ee29a 100644 > --- a/arch/arm/kernel/perf_regs.c > +++ b/arch/arm/kernel/perf_regs.c > @@ -36,5 +36,6 @@ void perf_get_regs_user(struct perf_regs *regs_user, > struct pt_regs *regs_user_copy) > { > regs_user->regs = task_pt_regs(current); > - regs_user->abi = perf_reg_abi(current); > + regs_user->abi = (regs_user->regs) ? perf_reg_abi(current) : > + PERF_SAMPLE_REGS_ABI_NONE; I'd prefer it if we didn't introduce unnecessary parens - what function do the parens around "regs_user->regs" serve? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-28 12:31 [PATCH] perf: Fix oops when kthread execs user process Young Xiao 2019-05-28 12:41 ` Russell King - ARM Linux admin @ 2019-05-28 14:01 ` Peter Zijlstra 2019-05-28 15:32 ` Will Deacon 2019-05-29 1:44 ` Michael Ellerman 1 sibling, 2 replies; 30+ messages in thread From: Peter Zijlstra @ 2019-05-28 14:01 UTC (permalink / raw) To: Young Xiao Cc: mark.rutland, mpe, x86, will.deacon, linux, linux-kernel, mingo, bp, hpa, ravi.bangoria, linux-arm-kernel, kan.liang On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote: > When a kthread calls call_usermodehelper() the steps are: > 1. allocate current->mm > 2. load_elf_binary() > 3. populate current->thread.regs > > While doing this, interrupts are not disabled. If there is a perf > interrupt in the middle of this process (i.e. step 1 has completed > but not yet reached to step 3) and if perf tries to read userspace > regs, kernel oops. > > Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace > pt_regs are not set. > > See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs > user process") for details. Why the hell do we set current->mm before it is complete? Note that normally exec() builds the new mm before attaching it, see exec_mmap() in flush_old_exec(). Also, why did those PPC folks 'fix' this in isolation? And why didn't you Cc them? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-28 14:01 ` Peter Zijlstra @ 2019-05-28 15:32 ` Will Deacon 2019-05-28 16:12 ` Mark Rutland ` (2 more replies) 2019-05-29 1:44 ` Michael Ellerman 1 sibling, 3 replies; 30+ messages in thread From: Will Deacon @ 2019-05-28 15:32 UTC (permalink / raw) To: Peter Zijlstra Cc: mark.rutland, Young Xiao, mpe, x86, linux, linux-kernel, mingo, bp, hpa, ravi.bangoria, linux-arm-kernel, kan.liang On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote: > On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote: > > When a kthread calls call_usermodehelper() the steps are: > > 1. allocate current->mm > > 2. load_elf_binary() > > 3. populate current->thread.regs > > > > While doing this, interrupts are not disabled. If there is a perf > > interrupt in the middle of this process (i.e. step 1 has completed > > but not yet reached to step 3) and if perf tries to read userspace > > regs, kernel oops. This seems to be because pt_regs(current) gives NULL for kthreads on Power. > > Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace > > pt_regs are not set. > > > > See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs > > user process") for details. > > Why the hell do we set current->mm before it is complete? Note that > normally exec() builds the new mm before attaching it, see exec_mmap() > in flush_old_exec(). From the initial report [1], it doesn't look like the mm isn't initialised, but rather than we're dereferencing a NULL pt_regs pointer somehow for the current task (see previous comment). I don't see how that can happen on arm64, given that we put the pt_regs on the kernel stack which is allocated during fork. Will [1] https://git.kernel.org/linus/bf05fc25f268 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-28 15:32 ` Will Deacon @ 2019-05-28 16:12 ` Mark Rutland 2019-05-28 17:32 ` Peter Zijlstra 2019-05-29 4:21 ` Michael Ellerman 2 siblings, 0 replies; 30+ messages in thread From: Mark Rutland @ 2019-05-28 16:12 UTC (permalink / raw) To: Will Deacon Cc: Young Xiao, Peter Zijlstra, mpe, x86, linux, linux-kernel, mingo, bp, hpa, ravi.bangoria, linux-arm-kernel, kan.liang On Tue, May 28, 2019 at 04:32:24PM +0100, Will Deacon wrote: > On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote: > > On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote: > > > When a kthread calls call_usermodehelper() the steps are: > > > 1. allocate current->mm > > > 2. load_elf_binary() > > > 3. populate current->thread.regs > > > > > > While doing this, interrupts are not disabled. If there is a perf > > > interrupt in the middle of this process (i.e. step 1 has completed > > > but not yet reached to step 3) and if perf tries to read userspace > > > regs, kernel oops. > > This seems to be because pt_regs(current) gives NULL for kthreads on Power. I think you mean task_pt_regs(current) here. > > > Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace > > > pt_regs are not set. > > > > > > See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs > > > user process") for details. > > > > Why the hell do we set current->mm before it is complete? Note that > > normally exec() builds the new mm before attaching it, see exec_mmap() > > in flush_old_exec(). > > From the initial report [1], it doesn't look like the mm isn't initialised, > but rather than we're dereferencing a NULL pt_regs pointer somehow for the > current task (see previous comment). I don't see how that can happen on > arm64, given that we put the pt_regs on the kernel stack which is allocated > during fork. > > Will > > [1] https://git.kernel.org/linus/bf05fc25f268 One caveat is that for the idle threads, the initial SP overlaps the task_pt_regs() area: * __primary_switched starts SP at init_thread_union + THREAD_SIZE. * __cpu_up() starts SP at task_stack_page(idle) + THREAD_SIZE. ... and in either case, sampling that would be bad. For both arm, I believe similar holds true. AFAICT x86 seems to reserve the regs area in its head_{64,32}.S, but I can't see what it does for other threads. Regardless, for arm, arm64, and x86, task_pt_regs(current) cannot be NULL. Thanks, Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-28 15:32 ` Will Deacon 2019-05-28 16:12 ` Mark Rutland @ 2019-05-28 17:32 ` Peter Zijlstra 2019-05-29 9:17 ` Will Deacon 2019-05-29 10:11 ` Mark Rutland 2019-05-29 4:21 ` Michael Ellerman 2 siblings, 2 replies; 30+ messages in thread From: Peter Zijlstra @ 2019-05-28 17:32 UTC (permalink / raw) To: Will Deacon Cc: mark.rutland, Young Xiao, mpe, x86, linux, linux-kernel, mingo, bp, hpa, ravi.bangoria, linux-arm-kernel, kan.liang On Tue, May 28, 2019 at 04:32:24PM +0100, Will Deacon wrote: > On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote: > > On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote: > > > When a kthread calls call_usermodehelper() the steps are: > > > 1. allocate current->mm > > > 2. load_elf_binary() > > > 3. populate current->thread.regs > > > > > > While doing this, interrupts are not disabled. If there is a perf > > > interrupt in the middle of this process (i.e. step 1 has completed > > > but not yet reached to step 3) and if perf tries to read userspace > > > regs, kernel oops. > > This seems to be because pt_regs(current) gives NULL for kthreads on Power. 'funny' thing that, perf_sample_regs_user() seems to assume that anything with current->mm is in fact a user task, and that assumption is just plain wrong, consider use_mm(). So I'm thinking the right thing to do here is something like the below; umh should get PF_KTHREAD cleared when it passes exec(). And this should also fix the power splat I'm thinking. --- diff --git a/kernel/events/core.c b/kernel/events/core.c index abbd4b3b96c2..9929404b6eb9 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct perf_regs *regs_user, if (user_mode(regs)) { regs_user->abi = perf_reg_abi(current); regs_user->regs = regs; - } else if (current->mm) { + } else if (!(current->flags & PF_KTHREAD) && current->mm) { perf_get_regs_user(regs_user, regs, regs_user_copy); } else { regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE; _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-28 17:32 ` Peter Zijlstra @ 2019-05-29 9:17 ` Will Deacon 2019-05-29 10:10 ` Peter Zijlstra 2019-05-29 10:11 ` Mark Rutland 1 sibling, 1 reply; 30+ messages in thread From: Will Deacon @ 2019-05-29 9:17 UTC (permalink / raw) To: Peter Zijlstra Cc: mark.rutland, Young Xiao, mpe, x86, linux, linux-kernel, mingo, bp, hpa, ravi.bangoria, linux-arm-kernel, kan.liang On Tue, May 28, 2019 at 07:32:28PM +0200, Peter Zijlstra wrote: > On Tue, May 28, 2019 at 04:32:24PM +0100, Will Deacon wrote: > > On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote: > > > On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote: > > > > When a kthread calls call_usermodehelper() the steps are: > > > > 1. allocate current->mm > > > > 2. load_elf_binary() > > > > 3. populate current->thread.regs > > > > > > > > While doing this, interrupts are not disabled. If there is a perf > > > > interrupt in the middle of this process (i.e. step 1 has completed > > > > but not yet reached to step 3) and if perf tries to read userspace > > > > regs, kernel oops. > > > > This seems to be because pt_regs(current) gives NULL for kthreads on Power. > > 'funny' thing that, perf_sample_regs_user() seems to assume that > anything with current->mm is in fact a user task, and that assumption is > just plain wrong, consider use_mm(). Right, I suppose that was attempting to handle interrupt skid from the PMU overflow? > So I'm thinking the right thing to do here is something like the below; > umh should get PF_KTHREAD cleared when it passes exec(). And this should > also fix the power splat I'm thinking. > > --- > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index abbd4b3b96c2..9929404b6eb9 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct perf_regs *regs_user, > if (user_mode(regs)) { > regs_user->abi = perf_reg_abi(current); > regs_user->regs = regs; > - } else if (current->mm) { > + } else if (!(current->flags & PF_KTHREAD) && current->mm) { > perf_get_regs_user(regs_user, regs, regs_user_copy); Makes sense, but under which circumstances would we have a NULL mm here? Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 9:17 ` Will Deacon @ 2019-05-29 10:10 ` Peter Zijlstra 2019-05-29 10:20 ` Will Deacon 0 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2019-05-29 10:10 UTC (permalink / raw) To: Will Deacon Cc: mark.rutland, Young Xiao, mpe, x86, linux, linux-kernel, mingo, bp, hpa, ravi.bangoria, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 10:17:33AM +0100, Will Deacon wrote: > On Tue, May 28, 2019 at 07:32:28PM +0200, Peter Zijlstra wrote: > > 'funny' thing that, perf_sample_regs_user() seems to assume that > > anything with current->mm is in fact a user task, and that assumption is > > just plain wrong, consider use_mm(). > > Right, I suppose that was attempting to handle interrupt skid from the PMU > overflow? Nah, just a broken test to determine if there is userspace at all. It is mostly right, just not completely :-) > > So I'm thinking the right thing to do here is something like the below; > > umh should get PF_KTHREAD cleared when it passes exec(). And this should > > also fix the power splat I'm thinking. > > > > --- > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > index abbd4b3b96c2..9929404b6eb9 100644 > > --- a/kernel/events/core.c > > +++ b/kernel/events/core.c > > @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct perf_regs *regs_user, > > if (user_mode(regs)) { > > regs_user->abi = perf_reg_abi(current); > > regs_user->regs = regs; > > - } else if (current->mm) { > > + } else if (!(current->flags & PF_KTHREAD) && current->mm) { > > perf_get_regs_user(regs_user, regs, regs_user_copy); > > Makes sense, but under which circumstances would we have a NULL mm here? Dunno; I'm paranoid, and also: mm/memcontrol.c: if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD)) mm/vmacache.c: return current->mm == mm && !(current->flags & PF_KTHREAD); _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 10:10 ` Peter Zijlstra @ 2019-05-29 10:20 ` Will Deacon 2019-05-29 12:55 ` Peter Zijlstra 0 siblings, 1 reply; 30+ messages in thread From: Will Deacon @ 2019-05-29 10:20 UTC (permalink / raw) To: Peter Zijlstra Cc: mark.rutland, Young Xiao, mpe, x86, linux, linux-kernel, mingo, bp, hpa, ravi.bangoria, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 12:10:42PM +0200, Peter Zijlstra wrote: > On Wed, May 29, 2019 at 10:17:33AM +0100, Will Deacon wrote: > > On Tue, May 28, 2019 at 07:32:28PM +0200, Peter Zijlstra wrote: > > > > 'funny' thing that, perf_sample_regs_user() seems to assume that > > > anything with current->mm is in fact a user task, and that assumption is > > > just plain wrong, consider use_mm(). > > > > Right, I suppose that was attempting to handle interrupt skid from the PMU > > overflow? > > Nah, just a broken test to determine if there is userspace at all. It is > mostly right, just not completely :-) > > > > So I'm thinking the right thing to do here is something like the below; > > > umh should get PF_KTHREAD cleared when it passes exec(). And this should > > > also fix the power splat I'm thinking. > > > > > > --- > > > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > > index abbd4b3b96c2..9929404b6eb9 100644 > > > --- a/kernel/events/core.c > > > +++ b/kernel/events/core.c > > > @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct perf_regs *regs_user, > > > if (user_mode(regs)) { > > > regs_user->abi = perf_reg_abi(current); > > > regs_user->regs = regs; > > > - } else if (current->mm) { > > > + } else if (!(current->flags & PF_KTHREAD) && current->mm) { > > > perf_get_regs_user(regs_user, regs, regs_user_copy); > > > > Makes sense, but under which circumstances would we have a NULL mm here? > > Dunno; I'm paranoid, and also: > > mm/memcontrol.c: if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD)) So this one I also don't understand... > mm/vmacache.c: return current->mm == mm && !(current->flags & PF_KTHREAD); ... but this one is just about an mm mismatch, rather than a NULL mm. Anyway, you can add my ack to your patch, but I bet we can remove that mm check :D Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 10:20 ` Will Deacon @ 2019-05-29 12:55 ` Peter Zijlstra 2019-05-29 13:05 ` Will Deacon 2019-05-30 8:38 ` Ravi Bangoria 0 siblings, 2 replies; 30+ messages in thread From: Peter Zijlstra @ 2019-05-29 12:55 UTC (permalink / raw) To: Will Deacon Cc: mark.rutland, Young Xiao, mpe, jolsa, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 11:20:22AM +0100, Will Deacon wrote: > Anyway, you can add my ack to your patch, but I bet we can remove that mm > check :D I've ended up with the below. Ravi, can you test if that does indeed obsolete your PPC patch? --- Subject: perf: Fix perf_sample_regs_user() From: Peter Zijlstra <peterz@infradead.org> Date: Wed May 29 14:37:24 CEST 2019 perf_sample_regs_user() uses 'current->mm' to test for the presence of userspace, but this is insufficient, consider use_mm(). A better test is: '!(current->flags & PF_KTHREAD)', exec() clears PF_KTHREAD after it sets the new ->mm but before it drops to userspace for the first time. Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process") Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reported-by: Young Xiao <92siuyang@gmail.com> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Will Deacon <will.deacon@arm.com> Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct if (user_mode(regs)) { regs_user->abi = perf_reg_abi(current); regs_user->regs = regs; - } else if (current->mm) { + } else if (!(current->flags & PF_KTHREAD)) { perf_get_regs_user(regs_user, regs, regs_user_copy); } else { regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE; _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 12:55 ` Peter Zijlstra @ 2019-05-29 13:05 ` Will Deacon 2019-05-29 13:25 ` Peter Zijlstra 2019-05-30 8:38 ` Ravi Bangoria 1 sibling, 1 reply; 30+ messages in thread From: Will Deacon @ 2019-05-29 13:05 UTC (permalink / raw) To: Peter Zijlstra Cc: mark.rutland, Young Xiao, mpe, jolsa, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote: > On Wed, May 29, 2019 at 11:20:22AM +0100, Will Deacon wrote: > > Anyway, you can add my ack to your patch, but I bet we can remove that mm > > check :D > > I've ended up with the below. Ravi, can you test if that does indeed > obsolete your PPC patch? > > --- > Subject: perf: Fix perf_sample_regs_user() > From: Peter Zijlstra <peterz@infradead.org> > Date: Wed May 29 14:37:24 CEST 2019 > > perf_sample_regs_user() uses 'current->mm' to test for the presence of > userspace, but this is insufficient, consider use_mm(). > > A better test is: '!(current->flags & PF_KTHREAD)', exec() clears > PF_KTHREAD after it sets the new ->mm but before it drops to userspace > for the first time. > > Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process") > > Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> > Reported-by: Young Xiao <92siuyang@gmail.com> > Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> > Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> > Cc: Michael Ellerman <mpe@ellerman.id.au> > Cc: Jiri Olsa <jolsa@redhat.com> > Cc: Frederic Weisbecker <fweisbec@gmail.com> > Cc: Stephane Eranian <eranian@google.com> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com> > Acked-by: Will Deacon <will.deacon@arm.com> > Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample") > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > kernel/events/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct > if (user_mode(regs)) { Hmm, so it just occurred to me that Mark's observation is that the regs can be junk in some cases. In which case, should we be checking for kthreads first? Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 13:05 ` Will Deacon @ 2019-05-29 13:25 ` Peter Zijlstra 2019-05-29 14:35 ` Will Deacon 0 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2019-05-29 13:25 UTC (permalink / raw) To: Will Deacon Cc: mark.rutland, Young Xiao, mpe, jolsa, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote: > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote: > > if (user_mode(regs)) { > > Hmm, so it just occurred to me that Mark's observation is that the regs > can be junk in some cases. In which case, should we be checking for > kthreads first? task_pt_regs() can return garbage, but @regs is the exception (or perf_arch_fetch_caller_regs()) regs, and for those user_mode() had better be correct. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 13:25 ` Peter Zijlstra @ 2019-05-29 14:35 ` Will Deacon 2019-05-29 16:19 ` Peter Zijlstra 0 siblings, 1 reply; 30+ messages in thread From: Will Deacon @ 2019-05-29 14:35 UTC (permalink / raw) To: Peter Zijlstra Cc: mark.rutland, Young Xiao, mpe, jolsa, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 03:25:15PM +0200, Peter Zijlstra wrote: > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote: > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote: > > > > if (user_mode(regs)) { > > > > Hmm, so it just occurred to me that Mark's observation is that the regs > > can be junk in some cases. In which case, should we be checking for > > kthreads first? > > task_pt_regs() can return garbage, but @regs is the exception (or > perf_arch_fetch_caller_regs()) regs, and for those user_mode() had > better be correct. So what should we report for the idle task? Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 14:35 ` Will Deacon @ 2019-05-29 16:19 ` Peter Zijlstra 2019-05-29 16:24 ` Mark Rutland 2019-05-29 16:25 ` Will Deacon 0 siblings, 2 replies; 30+ messages in thread From: Peter Zijlstra @ 2019-05-29 16:19 UTC (permalink / raw) To: Will Deacon Cc: mark.rutland, Young Xiao, mpe, jolsa, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 03:35:10PM +0100, Will Deacon wrote: > On Wed, May 29, 2019 at 03:25:15PM +0200, Peter Zijlstra wrote: > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote: > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote: > > > > > > if (user_mode(regs)) { > > > > > > Hmm, so it just occurred to me that Mark's observation is that the regs > > > can be junk in some cases. In which case, should we be checking for > > > kthreads first? > > > > task_pt_regs() can return garbage, but @regs is the exception (or > > perf_arch_fetch_caller_regs()) regs, and for those user_mode() had > > better be correct. > > So what should we report for the idle task? If an interrupt hits the idle task, @regs would be !user_mode(regs), we'll find current->flags & PF_KTHREAD (idle not having passed through exec()) and therefore we'll take ABI_NONE for the user regs. Or am I not getting it? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 16:19 ` Peter Zijlstra @ 2019-05-29 16:24 ` Mark Rutland 2019-05-29 16:38 ` Mark Rutland 2019-05-29 16:25 ` Will Deacon 1 sibling, 1 reply; 30+ messages in thread From: Mark Rutland @ 2019-05-29 16:24 UTC (permalink / raw) To: Peter Zijlstra Cc: Young Xiao, mpe, jolsa, x86, Will Deacon, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 06:19:55PM +0200, Peter Zijlstra wrote: > On Wed, May 29, 2019 at 03:35:10PM +0100, Will Deacon wrote: > > On Wed, May 29, 2019 at 03:25:15PM +0200, Peter Zijlstra wrote: > > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote: > > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote: > > > > > > > > if (user_mode(regs)) { > > > > > > > > Hmm, so it just occurred to me that Mark's observation is that the regs > > > > can be junk in some cases. In which case, should we be checking for > > > > kthreads first? > > > > > > task_pt_regs() can return garbage, but @regs is the exception (or > > > perf_arch_fetch_caller_regs()) regs, and for those user_mode() had > > > better be correct. > > > > So what should we report for the idle task? > > If an interrupt hits the idle task, @regs would be !user_mode(regs), > we'll find current->flags & PF_KTHREAD (idle not having passed through > exec()) and therefore we'll take ABI_NONE for the user regs. > > Or am I not getting it? If the contents of task_pt_regs(current) is garbage, then the result of user_mode(task_pt_regs(current)) is also garbage, no? Thanks, Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 16:24 ` Mark Rutland @ 2019-05-29 16:38 ` Mark Rutland 2019-05-29 17:03 ` Peter Zijlstra 0 siblings, 1 reply; 30+ messages in thread From: Mark Rutland @ 2019-05-29 16:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Young Xiao, mpe, jolsa, x86, Will Deacon, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 05:24:36PM +0100, Mark Rutland wrote: > On Wed, May 29, 2019 at 06:19:55PM +0200, Peter Zijlstra wrote: > > On Wed, May 29, 2019 at 03:35:10PM +0100, Will Deacon wrote: > > > On Wed, May 29, 2019 at 03:25:15PM +0200, Peter Zijlstra wrote: > > > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote: > > > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote: > > > > > > > > > > if (user_mode(regs)) { > > > > > > > > > > Hmm, so it just occurred to me that Mark's observation is that the regs > > > > > can be junk in some cases. In which case, should we be checking for > > > > > kthreads first? > > > > > > > > task_pt_regs() can return garbage, but @regs is the exception (or > > > > perf_arch_fetch_caller_regs()) regs, and for those user_mode() had > > > > better be correct. > > > > > > So what should we report for the idle task? > > > > If an interrupt hits the idle task, @regs would be !user_mode(regs), > > we'll find current->flags & PF_KTHREAD (idle not having passed through > > exec()) and therefore we'll take ABI_NONE for the user regs. > > > > Or am I not getting it? > > If the contents of task_pt_regs(current) is garbage, then the result of > user_mode(task_pt_regs(current)) is also garbage, no? Ugh; I was being thick here and assuming regs was the result of task_pt_regs() when it's actually the interrupted regs. Sorry for the noise. Generally speaking though, if we ever task task_pt_regs() of an idle task we'll get junk, and user_mode() could be true. Thanks, Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 16:38 ` Mark Rutland @ 2019-05-29 17:03 ` Peter Zijlstra 2019-05-30 10:35 ` Mark Rutland 0 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2019-05-29 17:03 UTC (permalink / raw) To: Mark Rutland Cc: Young Xiao, mpe, jolsa, x86, Will Deacon, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 05:38:54PM +0100, Mark Rutland wrote: > Sorry for the noise. n/p, confusion happens :-) > Generally speaking though, if we ever task task_pt_regs() of an idle > task we'll get junk, and user_mode() could be true. Agreed, but we're not doing that. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 17:03 ` Peter Zijlstra @ 2019-05-30 10:35 ` Mark Rutland 0 siblings, 0 replies; 30+ messages in thread From: Mark Rutland @ 2019-05-30 10:35 UTC (permalink / raw) To: Peter Zijlstra Cc: Young Xiao, mpe, jolsa, x86, Will Deacon, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 07:03:13PM +0200, Peter Zijlstra wrote: > On Wed, May 29, 2019 at 05:38:54PM +0100, Mark Rutland wrote: > > Generally speaking though, if we ever task task_pt_regs() of an idle > > task we'll get junk, and user_mode() could be true. > > Agreed, but we're not doing that. Sure. I just think that might be an argument for having task_pt_regs() return NULL for kthreads, or having a WARN_ON_ONCE(t->flags & PF_KTHREAD) to catch missing checks elsewhere. Thanks, Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 16:19 ` Peter Zijlstra 2019-05-29 16:24 ` Mark Rutland @ 2019-05-29 16:25 ` Will Deacon 2019-05-29 16:44 ` Peter Zijlstra 1 sibling, 1 reply; 30+ messages in thread From: Will Deacon @ 2019-05-29 16:25 UTC (permalink / raw) To: Peter Zijlstra Cc: mark.rutland, Young Xiao, mpe, jolsa, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 06:19:55PM +0200, Peter Zijlstra wrote: > On Wed, May 29, 2019 at 03:35:10PM +0100, Will Deacon wrote: > > On Wed, May 29, 2019 at 03:25:15PM +0200, Peter Zijlstra wrote: > > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote: > > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote: > > > > > > > > if (user_mode(regs)) { > > > > > > > > Hmm, so it just occurred to me that Mark's observation is that the regs > > > > can be junk in some cases. In which case, should we be checking for > > > > kthreads first? > > > > > > task_pt_regs() can return garbage, but @regs is the exception (or > > > perf_arch_fetch_caller_regs()) regs, and for those user_mode() had > > > better be correct. > > > > So what should we report for the idle task? > > If an interrupt hits the idle task, @regs would be !user_mode(regs), > we'll find current->flags & PF_KTHREAD (idle not having passed through > exec()) and therefore we'll take ABI_NONE for the user regs. > > Or am I not getting it? Sorry, I'm not trying to catch you out! Just trying to understand what the semantics are supposed to be. I do find the concept of user_mode(regs) bizarre for the idle task. By the above, we definitely have a bug on arm64 (user_mode(regs) tends to be true for the idle task), and I couldn't figure out how you avoided it on x86. I guess it happens to work because the stack is zero-initialised or something? Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 16:25 ` Will Deacon @ 2019-05-29 16:44 ` Peter Zijlstra 2019-05-30 7:28 ` Will Deacon 0 siblings, 1 reply; 30+ messages in thread From: Peter Zijlstra @ 2019-05-29 16:44 UTC (permalink / raw) To: Will Deacon Cc: mark.rutland, Young Xiao, mpe, jolsa, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 05:25:28PM +0100, Will Deacon wrote: > > > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote: > > > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote: > > > > > > > > > > if (user_mode(regs)) { > > > > > > > > > > Hmm, so it just occurred to me that Mark's observation is that the regs > > > > > can be junk in some cases. In which case, should we be checking for > > > > > kthreads first? > Sorry, I'm not trying to catch you out! Just trying to understand what the > semantics are supposed to be. > > I do find the concept of user_mode(regs) bizarre for the idle task. By the > above, we definitely have a bug on arm64 (user_mode(regs) tends to be > true for the idle task), and I couldn't figure out how you avoided it on > x86. I guess it happens to work because the stack is zero-initialised or > something? So lets take the whole thing: static void perf_sample_regs_user(struct perf_regs *regs_user, struct pt_regs *regs, struct pt_regs *regs_user_copy) { if (user_mode(regs)) { regs_user->abi = perf_reg_abi(current); regs_user->regs = regs; } else if (!(current->flags & PF_KTHREAD)) { perf_get_regs_user(regs_user, regs, regs_user_copy); } else { regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE; regs_user->regs = NULL; } } This is called from the perf-generate-a-sample path, which is typically an exception (IRQ/NMI/whatever) or a software/tracepoint thing. In the exception case, the @regs argument are the exception register, as provided by your entry.S to your exception handlers. In the software/tracepoint thing, it is the result of perf_arch_fetch_caller_regs(). So @regs is always 'sane' and user_mode(regs) tells us if the exception came from userspace (and software/tracepoints always fail this, they 'obviously' don't come from userspace). If we're idle, we're not from userspace, so this branch doesn't matter. Next, we test if there is a userspace part _at_all_, this is the newly minted: '!(current->flags & PF_KTHREAD)', if that passes, we use architecture magic -- task_pt_regs() -- to get the user-regs. This can be crap. But since the idle task will always fail our test (as would the old one, idle->mm is always NULL), we'll never get here for idle. Then failing the above two, as we must for idle, we'll default to ABI_NONE/NULL. Does that help? _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 16:44 ` Peter Zijlstra @ 2019-05-30 7:28 ` Will Deacon 0 siblings, 0 replies; 30+ messages in thread From: Will Deacon @ 2019-05-30 7:28 UTC (permalink / raw) To: Peter Zijlstra Cc: mark.rutland, Young Xiao, mpe, jolsa, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Wed, May 29, 2019 at 06:44:07PM +0200, Peter Zijlstra wrote: > On Wed, May 29, 2019 at 05:25:28PM +0100, Will Deacon wrote: > > > > > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote: > > > > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote: > > > > > > > > > > > > if (user_mode(regs)) { > > > > > > > > > > > > Hmm, so it just occurred to me that Mark's observation is that the regs > > > > > > can be junk in some cases. In which case, should we be checking for > > > > > > kthreads first? > > > Sorry, I'm not trying to catch you out! Just trying to understand what the > > semantics are supposed to be. > > > > I do find the concept of user_mode(regs) bizarre for the idle task. By the > > above, we definitely have a bug on arm64 (user_mode(regs) tends to be > > true for the idle task), and I couldn't figure out how you avoided it on > > x86. I guess it happens to work because the stack is zero-initialised or > > something? > > So lets take the whole thing: > > static void perf_sample_regs_user(struct perf_regs *regs_user, > struct pt_regs *regs, > struct pt_regs *regs_user_copy) > { > if (user_mode(regs)) { > regs_user->abi = perf_reg_abi(current); > regs_user->regs = regs; > } else if (!(current->flags & PF_KTHREAD)) { > perf_get_regs_user(regs_user, regs, regs_user_copy); > } else { > regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE; > regs_user->regs = NULL; > } > } > > This is called from the perf-generate-a-sample path, which is typically > an exception (IRQ/NMI/whatever) or a software/tracepoint thing. Yes, sorry, fell into the same trap as Mark here and misunderstood your assertion about user_mode(regs) always needing to be valid. Then I went down a stupid rabbit hole and dragged you with me. I can't ack a patch twice, so I'll just go do something else for a bit... Thanks for your patience! Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-29 12:55 ` Peter Zijlstra 2019-05-29 13:05 ` Will Deacon @ 2019-05-30 8:38 ` Ravi Bangoria 2019-05-30 10:27 ` Ravi Bangoria 1 sibling, 1 reply; 30+ messages in thread From: Ravi Bangoria @ 2019-05-30 8:38 UTC (permalink / raw) To: Peter Zijlstra Cc: mark.rutland, ravi.bangoria, Young Xiao, mpe, jolsa, x86, Will Deacon, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On 5/29/19 6:25 PM, Peter Zijlstra wrote: > On Wed, May 29, 2019 at 11:20:22AM +0100, Will Deacon wrote: >> Anyway, you can add my ack to your patch, but I bet we can remove that mm >> check :D > > I've ended up with the below. Ravi, can you test if that does indeed > obsolete your PPC patch? Checking. > > --- > Subject: perf: Fix perf_sample_regs_user() > From: Peter Zijlstra <peterz@infradead.org> > Date: Wed May 29 14:37:24 CEST 2019 > > perf_sample_regs_user() uses 'current->mm' to test for the presence of > userspace, but this is insufficient, consider use_mm(). > > A better test is: '!(current->flags & PF_KTHREAD)', exec() clears > PF_KTHREAD after it sets the new ->mm but before it drops to userspace > for the first time. This looks correct. I'll give it a try. > > Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process") > > Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> > Reported-by: Young Xiao <92siuyang@gmail.com> > Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> > Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> > Cc: Michael Ellerman <mpe@ellerman.id.au> > Cc: Jiri Olsa <jolsa@redhat.com> > Cc: Frederic Weisbecker <fweisbec@gmail.com> > Cc: Stephane Eranian <eranian@google.com> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com> > Acked-by: Will Deacon <will.deacon@arm.com> > Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample") > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > kernel/events/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct > if (user_mode(regs)) { > regs_user->abi = perf_reg_abi(current); > regs_user->regs = regs; > - } else if (current->mm) { > + } else if (!(current->flags & PF_KTHREAD)) { > perf_get_regs_user(regs_user, regs, regs_user_copy); > } else { > regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE; > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-30 8:38 ` Ravi Bangoria @ 2019-05-30 10:27 ` Ravi Bangoria 2019-05-31 15:37 ` Will Deacon 0 siblings, 1 reply; 30+ messages in thread From: Ravi Bangoria @ 2019-05-30 10:27 UTC (permalink / raw) To: Peter Zijlstra Cc: mark.rutland, Ravi Bangoria, Young Xiao, mpe, jolsa, x86, Will Deacon, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On 5/30/19 2:08 PM, Ravi Bangoria wrote: >> --- >> Subject: perf: Fix perf_sample_regs_user() >> From: Peter Zijlstra <peterz@infradead.org> >> Date: Wed May 29 14:37:24 CEST 2019 >> >> perf_sample_regs_user() uses 'current->mm' to test for the presence of >> userspace, but this is insufficient, consider use_mm(). >> >> A better test is: '!(current->flags & PF_KTHREAD)', exec() clears >> PF_KTHREAD after it sets the new ->mm but before it drops to userspace >> for the first time. > > This looks correct. I'll give it a try. > >> >> Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process") >> >> Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> >> Reported-by: Young Xiao <92siuyang@gmail.com> >> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> >> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> >> Cc: Michael Ellerman <mpe@ellerman.id.au> >> Cc: Jiri Olsa <jolsa@redhat.com> >> Cc: Frederic Weisbecker <fweisbec@gmail.com> >> Cc: Stephane Eranian <eranian@google.com> >> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> >> Acked-by: Will Deacon <will.deacon@arm.com> >> Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample") >> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> >> --- >> kernel/events/core.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> --- a/kernel/events/core.c >> +++ b/kernel/events/core.c >> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct >> if (user_mode(regs)) { >> regs_user->abi = perf_reg_abi(current); >> regs_user->regs = regs; >> - } else if (current->mm) { >> + } else if (!(current->flags & PF_KTHREAD)) { With this patch applied and my patch reverted, I still see it crashing because current->flags does not have PF_KTHREAD set. Sample trace with v5.0 kernel: BUG: Kernel NULL pointer dereference at 0x00000000 Faulting instruction address: 0xc0000000000f1a6c Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 NUMA pSeries CPU: 17 PID: 3241 Comm: systemd-cgroups Kdump: loaded Not tainted 5.0.0+ #7 NIP: c0000000000f1a6c LR: c0000000002acc7c CTR: c0000000002a8f90 REGS: c0000001e80469a0 TRAP: 0300 Not tainted (5.0.0+) MSR: 8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 48022448 XER: 20000000 CFAR: c00000000000deb4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 GPR00: c0000000002acc7c c0000001e8046c30 c000000001607500 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000128618 GPR08: 000007ffffffffff 0000000000000000 ffffffffffffffff c00000000001cd40 GPR12: c000000000446fd8 c00000003ffdf080 00000000ffff0000 0000000000000007 GPR16: c0000001edd74988 c0000001edd60400 00007fff89801130 000000000005e1b0 GPR20: c0000001edb77a08 c0000001e8047208 c0000001f03d9800 c0000001e8046e00 GPR24: 000000000000b1af c000000001126938 c0000001f03d9b28 0000000000010000 GPR28: c0000001e8046d30 0000000000000000 0000000000000000 0000000000000000 NIP [c0000000000f1a6c] perf_reg_value+0x5c/0xc0 LR [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 Call Trace: [c0000001e8046c30] [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 (unreliable) [c0000001e8046c80] [c0000000002b9cd0] perf_output_sample+0x620/0x8c0 [c0000001e8046d10] [c0000000002ba6b4] perf_event_output_forward+0x64/0x90 [c0000001e8046d80] [c0000000002b2908] __perf_event_overflow+0x88/0x1e0 [c0000001e8046dd0] [c0000000000f3d18] record_and_restart+0x288/0x670 [c0000001e8047180] [c0000000000f4c18] perf_event_interrupt+0x2b8/0x4b0 [c0000001e8047280] [c00000000002b380] performance_monitor_exception+0x50/0x70 [c0000001e80472a0] [c000000000009ca0] performance_monitor_common+0x110/0x120 --- interrupt: f01 at slice_scan_available+0x20/0xc0 LR = slice_find_area+0x174/0x210 [c0000001e8047630] [c000000000083ea0] slice_get_unmapped_area+0x3d0/0x7f0 [c0000001e8047ae0] [c00000000032d5b0] get_unmapped_area+0xa0/0x170 [c0000001e8047b10] [c00000000001cd40] arch_setup_additional_pages+0xc0/0x1c0 [c0000001e8047b60] [c000000000446fd8] load_elf_binary+0xb48/0x1580 [c0000001e8047c80] [c0000000003c3938] search_binary_handler+0xe8/0x2a0 [c0000001e8047d10] [c0000000003c42f4] __do_execve_file.isra.13+0x694/0x980 [c0000001e8047de0] [c000000000128618] call_usermodehelper_exec_async+0x248/0x290 [c0000001e8047e20] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80 Instruction dump: 7c9e2378 7c7f1b78 f8010010 f821ffd1 419e0044 3d22ff6b 7bc41764 3929ae10 7d29202e 2b890150 419d003c 38210030 <7c7f482a> e8010010 ebc1fff0 ebe1fff8 ---[ end trace 54f3492ad1d403d8 ]--- _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-30 10:27 ` Ravi Bangoria @ 2019-05-31 15:37 ` Will Deacon 2019-06-03 11:23 ` Will Deacon ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: Will Deacon @ 2019-05-31 15:37 UTC (permalink / raw) To: Ravi Bangoria Cc: mark.rutland, Young Xiao, Peter Zijlstra, mpe, jolsa, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Thu, May 30, 2019 at 03:57:36PM +0530, Ravi Bangoria wrote: > > > On 5/30/19 2:08 PM, Ravi Bangoria wrote: > >> --- > >> Subject: perf: Fix perf_sample_regs_user() > >> From: Peter Zijlstra <peterz@infradead.org> > >> Date: Wed May 29 14:37:24 CEST 2019 > >> > >> perf_sample_regs_user() uses 'current->mm' to test for the presence of > >> userspace, but this is insufficient, consider use_mm(). > >> > >> A better test is: '!(current->flags & PF_KTHREAD)', exec() clears > >> PF_KTHREAD after it sets the new ->mm but before it drops to userspace > >> for the first time. > > > > This looks correct. I'll give it a try. > > > >> > >> Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process") > >> > >> Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> > >> Reported-by: Young Xiao <92siuyang@gmail.com> > >> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> > >> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> > >> Cc: Michael Ellerman <mpe@ellerman.id.au> > >> Cc: Jiri Olsa <jolsa@redhat.com> > >> Cc: Frederic Weisbecker <fweisbec@gmail.com> > >> Cc: Stephane Eranian <eranian@google.com> > >> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> > >> Acked-by: Will Deacon <will.deacon@arm.com> > >> Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample") > >> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > >> --- > >> kernel/events/core.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> --- a/kernel/events/core.c > >> +++ b/kernel/events/core.c > >> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct > >> if (user_mode(regs)) { > >> regs_user->abi = perf_reg_abi(current); > >> regs_user->regs = regs; > >> - } else if (current->mm) { > >> + } else if (!(current->flags & PF_KTHREAD)) { > > With this patch applied and my patch reverted, I still see it crashing > because current->flags does not have PF_KTHREAD set. Sample trace with > v5.0 kernel: > > > BUG: Kernel NULL pointer dereference at 0x00000000 > Faulting instruction address: 0xc0000000000f1a6c > Oops: Kernel access of bad area, sig: 11 [#1] > LE SMP NR_CPUS=2048 NUMA pSeries > CPU: 17 PID: 3241 Comm: systemd-cgroups Kdump: loaded Not tainted 5.0.0+ #7 > NIP: c0000000000f1a6c LR: c0000000002acc7c CTR: c0000000002a8f90 > REGS: c0000001e80469a0 TRAP: 0300 Not tainted (5.0.0+) > MSR: 8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 48022448 XER: 20000000 > CFAR: c00000000000deb4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 > GPR00: c0000000002acc7c c0000001e8046c30 c000000001607500 0000000000000000 > GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000128618 > GPR08: 000007ffffffffff 0000000000000000 ffffffffffffffff c00000000001cd40 > GPR12: c000000000446fd8 c00000003ffdf080 00000000ffff0000 0000000000000007 > GPR16: c0000001edd74988 c0000001edd60400 00007fff89801130 000000000005e1b0 > GPR20: c0000001edb77a08 c0000001e8047208 c0000001f03d9800 c0000001e8046e00 > GPR24: 000000000000b1af c000000001126938 c0000001f03d9b28 0000000000010000 > GPR28: c0000001e8046d30 0000000000000000 0000000000000000 0000000000000000 > NIP [c0000000000f1a6c] perf_reg_value+0x5c/0xc0 > LR [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 > Call Trace: > [c0000001e8046c30] [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 (unreliable) > [c0000001e8046c80] [c0000000002b9cd0] perf_output_sample+0x620/0x8c0 > [c0000001e8046d10] [c0000000002ba6b4] perf_event_output_forward+0x64/0x90 > [c0000001e8046d80] [c0000000002b2908] __perf_event_overflow+0x88/0x1e0 > [c0000001e8046dd0] [c0000000000f3d18] record_and_restart+0x288/0x670 > [c0000001e8047180] [c0000000000f4c18] perf_event_interrupt+0x2b8/0x4b0 > [c0000001e8047280] [c00000000002b380] performance_monitor_exception+0x50/0x70 > [c0000001e80472a0] [c000000000009ca0] performance_monitor_common+0x110/0x120 > --- interrupt: f01 at slice_scan_available+0x20/0xc0 > LR = slice_find_area+0x174/0x210 > [c0000001e8047630] [c000000000083ea0] slice_get_unmapped_area+0x3d0/0x7f0 > [c0000001e8047ae0] [c00000000032d5b0] get_unmapped_area+0xa0/0x170 > [c0000001e8047b10] [c00000000001cd40] arch_setup_additional_pages+0xc0/0x1c0 > [c0000001e8047b60] [c000000000446fd8] load_elf_binary+0xb48/0x1580 > [c0000001e8047c80] [c0000000003c3938] search_binary_handler+0xe8/0x2a0 > [c0000001e8047d10] [c0000000003c42f4] __do_execve_file.isra.13+0x694/0x980 > [c0000001e8047de0] [c000000000128618] call_usermodehelper_exec_async+0x248/0x290 > [c0000001e8047e20] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80 > Instruction dump: > 7c9e2378 7c7f1b78 f8010010 f821ffd1 419e0044 3d22ff6b 7bc41764 3929ae10 > 7d29202e 2b890150 419d003c 38210030 <7c7f482a> e8010010 ebc1fff0 ebe1fff8 > ---[ end trace 54f3492ad1d403d8 ]--- Oh, nice! I think this happens because Power doesn't actually initialise the regs after a kthread execs() until late in start_thread(). But the plot thickens somewhat, since current_pt_regs() is different to task_pt_regs(current) on Power (the former cannot return NULL). So a really hideous hack on top of Peter's patch might be: diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c index 0bbac612146e..5bde866024b6 100644 --- a/arch/arm64/kernel/perf_regs.c +++ b/arch/arm64/kernel/perf_regs.c @@ -57,6 +57,6 @@ void perf_get_regs_user(struct perf_regs *regs_user, struct pt_regs *regs, struct pt_regs *regs_user_copy) { - regs_user->regs = task_pt_regs(current); + regs_user->regs = current_pt_regs(); regs_user->abi = perf_reg_abi(current); } Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-31 15:37 ` Will Deacon @ 2019-06-03 11:23 ` Will Deacon 2019-06-03 11:48 ` Peter Zijlstra 2019-06-03 13:30 ` Michael Ellerman 2 siblings, 0 replies; 30+ messages in thread From: Will Deacon @ 2019-06-03 11:23 UTC (permalink / raw) To: Ravi Bangoria Cc: mark.rutland, Young Xiao, Peter Zijlstra, mpe, jolsa, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Fri, May 31, 2019 at 04:37:15PM +0100, Will Deacon wrote: > Oh, nice! I think this happens because Power doesn't actually initialise > the regs after a kthread execs() until late in start_thread(). But the plot > thickens somewhat, since current_pt_regs() is different to > task_pt_regs(current) on Power (the former cannot return NULL). > > So a really hideous hack on top of Peter's patch might be: > > diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c > index 0bbac612146e..5bde866024b6 100644 > --- a/arch/arm64/kernel/perf_regs.c > +++ b/arch/arm64/kernel/perf_regs.c > @@ -57,6 +57,6 @@ void perf_get_regs_user(struct perf_regs *regs_user, > struct pt_regs *regs, > struct pt_regs *regs_user_copy) > { > - regs_user->regs = task_pt_regs(current); > + regs_user->regs = current_pt_regs(); > regs_user->abi = perf_reg_abi(current); ^^^ Bah, this was clearly supposed to be a change in the powerpc code, but you get the idea. Will _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-31 15:37 ` Will Deacon 2019-06-03 11:23 ` Will Deacon @ 2019-06-03 11:48 ` Peter Zijlstra 2019-06-03 13:30 ` Michael Ellerman 2 siblings, 0 replies; 30+ messages in thread From: Peter Zijlstra @ 2019-06-03 11:48 UTC (permalink / raw) To: Will Deacon Cc: mark.rutland, Ravi Bangoria, Young Xiao, mpe, jolsa, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, fweisbec, linux-arm-kernel, kan.liang On Fri, May 31, 2019 at 04:37:03PM +0100, Will Deacon wrote: > On Thu, May 30, 2019 at 03:57:36PM +0530, Ravi Bangoria wrote: > > On 5/30/19 2:08 PM, Ravi Bangoria wrote: > > >> --- a/kernel/events/core.c > > >> +++ b/kernel/events/core.c > > >> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct > > >> if (user_mode(regs)) { > > >> regs_user->abi = perf_reg_abi(current); > > >> regs_user->regs = regs; > > >> - } else if (current->mm) { > > >> + } else if (!(current->flags & PF_KTHREAD)) { > > > > With this patch applied and my patch reverted, I still see it crashing > > because current->flags does not have PF_KTHREAD set. Sample trace with > > v5.0 kernel: > > > > > > BUG: Kernel NULL pointer dereference at 0x00000000 > > Faulting instruction address: 0xc0000000000f1a6c > > Oops: Kernel access of bad area, sig: 11 [#1] > > LE SMP NR_CPUS=2048 NUMA pSeries > > CPU: 17 PID: 3241 Comm: systemd-cgroups Kdump: loaded Not tainted 5.0.0+ #7 > > NIP: c0000000000f1a6c LR: c0000000002acc7c CTR: c0000000002a8f90 > > REGS: c0000001e80469a0 TRAP: 0300 Not tainted (5.0.0+) > > MSR: 8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 48022448 XER: 20000000 > > CFAR: c00000000000deb4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 > > GPR00: c0000000002acc7c c0000001e8046c30 c000000001607500 0000000000000000 > > GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000128618 > > GPR08: 000007ffffffffff 0000000000000000 ffffffffffffffff c00000000001cd40 > > GPR12: c000000000446fd8 c00000003ffdf080 00000000ffff0000 0000000000000007 > > GPR16: c0000001edd74988 c0000001edd60400 00007fff89801130 000000000005e1b0 > > GPR20: c0000001edb77a08 c0000001e8047208 c0000001f03d9800 c0000001e8046e00 > > GPR24: 000000000000b1af c000000001126938 c0000001f03d9b28 0000000000010000 > > GPR28: c0000001e8046d30 0000000000000000 0000000000000000 0000000000000000 > > NIP [c0000000000f1a6c] perf_reg_value+0x5c/0xc0 > > LR [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 > > Call Trace: > > [c0000001e8046c30] [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 (unreliable) > > [c0000001e8046c80] [c0000000002b9cd0] perf_output_sample+0x620/0x8c0 > > [c0000001e8046d10] [c0000000002ba6b4] perf_event_output_forward+0x64/0x90 > > [c0000001e8046d80] [c0000000002b2908] __perf_event_overflow+0x88/0x1e0 > > [c0000001e8046dd0] [c0000000000f3d18] record_and_restart+0x288/0x670 > > [c0000001e8047180] [c0000000000f4c18] perf_event_interrupt+0x2b8/0x4b0 > > [c0000001e8047280] [c00000000002b380] performance_monitor_exception+0x50/0x70 > > [c0000001e80472a0] [c000000000009ca0] performance_monitor_common+0x110/0x120 > > --- interrupt: f01 at slice_scan_available+0x20/0xc0 > > LR = slice_find_area+0x174/0x210 > > [c0000001e8047630] [c000000000083ea0] slice_get_unmapped_area+0x3d0/0x7f0 > > [c0000001e8047ae0] [c00000000032d5b0] get_unmapped_area+0xa0/0x170 > > [c0000001e8047b10] [c00000000001cd40] arch_setup_additional_pages+0xc0/0x1c0 > > [c0000001e8047b60] [c000000000446fd8] load_elf_binary+0xb48/0x1580 > > [c0000001e8047c80] [c0000000003c3938] search_binary_handler+0xe8/0x2a0 > > [c0000001e8047d10] [c0000000003c42f4] __do_execve_file.isra.13+0x694/0x980 > > [c0000001e8047de0] [c000000000128618] call_usermodehelper_exec_async+0x248/0x290 > > [c0000001e8047e20] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80 > > Instruction dump: > > 7c9e2378 7c7f1b78 f8010010 f821ffd1 419e0044 3d22ff6b 7bc41764 3929ae10 > > 7d29202e 2b890150 419d003c 38210030 <7c7f482a> e8010010 ebc1fff0 ebe1fff8 > > ---[ end trace 54f3492ad1d403d8 ]--- > > Oh, nice! I think this happens because Power doesn't actually initialise > the regs after a kthread execs() until late in start_thread(). But the plot > thickens somewhat, since current_pt_regs() is different to > task_pt_regs(current) on Power (the former cannot return NULL). So one possibility would be to have activate_mm() initialize the user regs set. Doing it there ensure that the moment PF_KTHREAD gets cleared, task_pt_regs() is valid. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-31 15:37 ` Will Deacon 2019-06-03 11:23 ` Will Deacon 2019-06-03 11:48 ` Peter Zijlstra @ 2019-06-03 13:30 ` Michael Ellerman 2 siblings, 0 replies; 30+ messages in thread From: Michael Ellerman @ 2019-06-03 13:30 UTC (permalink / raw) To: Will Deacon, Ravi Bangoria Cc: mark.rutland, Young Xiao, Peter Zijlstra, fweisbec, x86, linux, eranian, linux-kernel, acme, mingo, bp, hpa, ravi.bangoria, jolsa, linux-arm-kernel, kan.liang Will Deacon <will.deacon@arm.com> writes: > On Thu, May 30, 2019 at 03:57:36PM +0530, Ravi Bangoria wrote: >> On 5/30/19 2:08 PM, Ravi Bangoria wrote: >> >> --- >> >> Subject: perf: Fix perf_sample_regs_user() >> >> From: Peter Zijlstra <peterz@infradead.org> >> >> Date: Wed May 29 14:37:24 CEST 2019 >> >> >> >> perf_sample_regs_user() uses 'current->mm' to test for the presence of >> >> userspace, but this is insufficient, consider use_mm(). >> >> >> >> A better test is: '!(current->flags & PF_KTHREAD)', exec() clears >> >> PF_KTHREAD after it sets the new ->mm but before it drops to userspace >> >> for the first time. >> > >> > This looks correct. I'll give it a try. >> > >> >> >> >> Possibly obsoletes: bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs user process") >> >> >> >> Reported-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> >> >> Reported-by: Young Xiao <92siuyang@gmail.com> >> >> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> >> >> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> >> >> Cc: Michael Ellerman <mpe@ellerman.id.au> >> >> Cc: Jiri Olsa <jolsa@redhat.com> >> >> Cc: Frederic Weisbecker <fweisbec@gmail.com> >> >> Cc: Stephane Eranian <eranian@google.com> >> >> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> >> >> Acked-by: Will Deacon <will.deacon@arm.com> >> >> Fixes: 4018994f3d87 ("perf: Add ability to attach user level registers dump to sample") >> >> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> >> >> --- >> >> kernel/events/core.c | 2 +- >> >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> >> >> --- a/kernel/events/core.c >> >> +++ b/kernel/events/core.c >> >> @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct >> >> if (user_mode(regs)) { >> >> regs_user->abi = perf_reg_abi(current); >> >> regs_user->regs = regs; >> >> - } else if (current->mm) { >> >> + } else if (!(current->flags & PF_KTHREAD)) { >> >> With this patch applied and my patch reverted, I still see it crashing >> because current->flags does not have PF_KTHREAD set. Sample trace with >> v5.0 kernel: >> >> >> BUG: Kernel NULL pointer dereference at 0x00000000 >> Faulting instruction address: 0xc0000000000f1a6c >> Oops: Kernel access of bad area, sig: 11 [#1] >> LE SMP NR_CPUS=2048 NUMA pSeries >> CPU: 17 PID: 3241 Comm: systemd-cgroups Kdump: loaded Not tainted 5.0.0+ #7 >> NIP: c0000000000f1a6c LR: c0000000002acc7c CTR: c0000000002a8f90 >> REGS: c0000001e80469a0 TRAP: 0300 Not tainted (5.0.0+) >> MSR: 8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 48022448 XER: 20000000 >> CFAR: c00000000000deb4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 >> GPR00: c0000000002acc7c c0000001e8046c30 c000000001607500 0000000000000000 >> GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000128618 >> GPR08: 000007ffffffffff 0000000000000000 ffffffffffffffff c00000000001cd40 >> GPR12: c000000000446fd8 c00000003ffdf080 00000000ffff0000 0000000000000007 >> GPR16: c0000001edd74988 c0000001edd60400 00007fff89801130 000000000005e1b0 >> GPR20: c0000001edb77a08 c0000001e8047208 c0000001f03d9800 c0000001e8046e00 >> GPR24: 000000000000b1af c000000001126938 c0000001f03d9b28 0000000000010000 >> GPR28: c0000001e8046d30 0000000000000000 0000000000000000 0000000000000000 >> NIP [c0000000000f1a6c] perf_reg_value+0x5c/0xc0 >> LR [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 >> Call Trace: >> [c0000001e8046c30] [c0000000002acc7c] perf_output_sample_regs+0x6c/0xd0 (unreliable) >> [c0000001e8046c80] [c0000000002b9cd0] perf_output_sample+0x620/0x8c0 >> [c0000001e8046d10] [c0000000002ba6b4] perf_event_output_forward+0x64/0x90 >> [c0000001e8046d80] [c0000000002b2908] __perf_event_overflow+0x88/0x1e0 >> [c0000001e8046dd0] [c0000000000f3d18] record_and_restart+0x288/0x670 >> [c0000001e8047180] [c0000000000f4c18] perf_event_interrupt+0x2b8/0x4b0 >> [c0000001e8047280] [c00000000002b380] performance_monitor_exception+0x50/0x70 >> [c0000001e80472a0] [c000000000009ca0] performance_monitor_common+0x110/0x120 >> --- interrupt: f01 at slice_scan_available+0x20/0xc0 >> LR = slice_find_area+0x174/0x210 >> [c0000001e8047630] [c000000000083ea0] slice_get_unmapped_area+0x3d0/0x7f0 >> [c0000001e8047ae0] [c00000000032d5b0] get_unmapped_area+0xa0/0x170 >> [c0000001e8047b10] [c00000000001cd40] arch_setup_additional_pages+0xc0/0x1c0 >> [c0000001e8047b60] [c000000000446fd8] load_elf_binary+0xb48/0x1580 >> [c0000001e8047c80] [c0000000003c3938] search_binary_handler+0xe8/0x2a0 >> [c0000001e8047d10] [c0000000003c42f4] __do_execve_file.isra.13+0x694/0x980 >> [c0000001e8047de0] [c000000000128618] call_usermodehelper_exec_async+0x248/0x290 >> [c0000001e8047e20] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80 >> Instruction dump: >> 7c9e2378 7c7f1b78 f8010010 f821ffd1 419e0044 3d22ff6b 7bc41764 3929ae10 >> 7d29202e 2b890150 419d003c 38210030 <7c7f482a> e8010010 ebc1fff0 ebe1fff8 >> ---[ end trace 54f3492ad1d403d8 ]--- > > Oh, nice! I think this happens because Power doesn't actually initialise > the regs after a kthread execs() until late in start_thread(). Hmm, it's more or less at the top of start_thread(), but that's late vs flush_old_exec(), so there's definitely a window there. > But the plot thickens somewhat, since current_pt_regs() is different to > task_pt_regs(current) on Power (the former cannot return NULL). Ugh. Mark had convinced me in the other part of the thread that returning NULL for kthreads made sense, but having different results depending on which similarly named accessor you use is gross. We used to implement current_pt_regs() without actually looking at current via: ((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) - 1) Where current_thread_info() just masked the stack pointer, so that was a nice optimisation. But now that we have THREAD_INFO_IN_TASK we're going via current anyway, so we may as well just get rid of current_pt_regs() and make it a synonym for task_pt_regs(current). Though that will probably cause something else to break :D > So a really hideous hack on top of Peter's patch might be: > > diff --git a/arch/arm64/kernel/perf_regs.c b/arch/arm64/kernel/perf_regs.c > index 0bbac612146e..5bde866024b6 100644 > --- a/arch/arm64/kernel/perf_regs.c > +++ b/arch/arm64/kernel/perf_regs.c > @@ -57,6 +57,6 @@ void perf_get_regs_user(struct perf_regs *regs_user, > struct pt_regs *regs, > struct pt_regs *regs_user_copy) > { > - regs_user->regs = task_pt_regs(current); > + regs_user->regs = current_pt_regs(); > regs_user->abi = perf_reg_abi(current); I'd be inclined to stick with what we've got, at least so long as we're the only arch that lets task_pt_regs() return NULL. void perf_get_regs_user(struct perf_regs *regs_user, struct pt_regs *regs, struct pt_regs *regs_user_copy) { regs_user->regs = task_pt_regs(current); regs_user->abi = (regs_user->regs) ? perf_reg_abi(current) : PERF_SAMPLE_REGS_ABI_NONE; } cheers _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-28 17:32 ` Peter Zijlstra 2019-05-29 9:17 ` Will Deacon @ 2019-05-29 10:11 ` Mark Rutland 1 sibling, 0 replies; 30+ messages in thread From: Mark Rutland @ 2019-05-29 10:11 UTC (permalink / raw) To: Peter Zijlstra Cc: Young Xiao, mpe, x86, Will Deacon, linux, linux-kernel, mingo, bp, hpa, ravi.bangoria, linux-arm-kernel, kan.liang On Tue, May 28, 2019 at 07:32:28PM +0200, Peter Zijlstra wrote: > On Tue, May 28, 2019 at 04:32:24PM +0100, Will Deacon wrote: > > On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote: > > > On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote: > > > > When a kthread calls call_usermodehelper() the steps are: > > > > 1. allocate current->mm > > > > 2. load_elf_binary() > > > > 3. populate current->thread.regs > > > > > > > > While doing this, interrupts are not disabled. If there is a perf > > > > interrupt in the middle of this process (i.e. step 1 has completed > > > > but not yet reached to step 3) and if perf tries to read userspace > > > > regs, kernel oops. > > > > This seems to be because pt_regs(current) gives NULL for kthreads on Power. > > 'funny' thing that, perf_sample_regs_user() seems to assume that > anything with current->mm is in fact a user task, and that assumption is > just plain wrong, consider use_mm(). Tagnentially, it looks like that assumption is made elsewhere, and could do with a more general cleanup. IIUC, the following are suspect: * kmemleak's scan_should_stop() * x86's __kernel_fpu_begin() * arm64's arch_dup_task_struct() It's probably worth an is_thread(task) helper so that those can be written in an obviously correct way. > So I'm thinking the right thing to do here is something like the below; > umh should get PF_KTHREAD cleared when it passes exec(). And this should > also fix the power splat I'm thinking. > > --- > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index abbd4b3b96c2..9929404b6eb9 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -5923,7 +5923,7 @@ static void perf_sample_regs_user(struct perf_regs *regs_user, > if (user_mode(regs)) { > regs_user->abi = perf_reg_abi(current); > regs_user->regs = regs; > - } else if (current->mm) { > + } else if (!(current->flags & PF_KTHREAD) && current->mm) { Wouldn't !PF_KTHREAD imply current->mm anyhow? Thanks, Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-28 15:32 ` Will Deacon 2019-05-28 16:12 ` Mark Rutland 2019-05-28 17:32 ` Peter Zijlstra @ 2019-05-29 4:21 ` Michael Ellerman 2 siblings, 0 replies; 30+ messages in thread From: Michael Ellerman @ 2019-05-29 4:21 UTC (permalink / raw) To: Will Deacon, Peter Zijlstra Cc: mark.rutland, Young Xiao, x86, linux, linux-kernel, mingo, bp, hpa, ravi.bangoria, linux-arm-kernel, kan.liang Will Deacon <will.deacon@arm.com> writes: > On Tue, May 28, 2019 at 04:01:03PM +0200, Peter Zijlstra wrote: >> On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote: >> > When a kthread calls call_usermodehelper() the steps are: >> > 1. allocate current->mm >> > 2. load_elf_binary() >> > 3. populate current->thread.regs >> > >> > While doing this, interrupts are not disabled. If there is a perf >> > interrupt in the middle of this process (i.e. step 1 has completed >> > but not yet reached to step 3) and if perf tries to read userspace >> > regs, kernel oops. > > This seems to be because pt_regs(current) gives NULL for kthreads on Power. Right, we've done that since roughly forever in copy_thread(): int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long kthread_arg, struct task_struct *p) { ... /* Copy registers */ sp -= sizeof(struct pt_regs); childregs = (struct pt_regs *) sp; if (unlikely(p->flags & PF_KTHREAD)) { /* kernel thread */ memset(childregs, 0, sizeof(struct pt_regs)); childregs->gpr[1] = sp + sizeof(struct pt_regs); ... p->thread.regs = NULL; /* no user register state */ See commit from 2002: https://github.com/mpe/linux-fullhistory/commit/c0a96c0918d21d8a99270e94d9c4a4a322d04581#diff-edb76bfcc84905163f34d24d2aad3f3aR187 > From the initial report [1], it doesn't look like the mm isn't initialised, > but rather than we're dereferencing a NULL pt_regs pointer somehow for the > current task (see previous comment). I don't see how that can happen on > arm64, given that we put the pt_regs on the kernel stack which is allocated > during fork. We have the regs on the stack too (see above), but we're explicitly NULL'ing the link from task->thread. Looks like on arm64 and x86 there is no link from task->thread, instead you get from task to pt_regs via task_stack_page(). That actually seems potentially fishy given the comment on task_stack_page() about the stack going away for exiting tasks. We should probably be NULL'ing the regs pointer in free_thread_stack() or similar. Though that race mustn't be happening because other arches would see it. Or are we just wrong and kthreads should have non-NULL regs? I can't find another arch that does the same as us. cheers _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH] perf: Fix oops when kthread execs user process 2019-05-28 14:01 ` Peter Zijlstra 2019-05-28 15:32 ` Will Deacon @ 2019-05-29 1:44 ` Michael Ellerman 1 sibling, 0 replies; 30+ messages in thread From: Michael Ellerman @ 2019-05-29 1:44 UTC (permalink / raw) To: Peter Zijlstra, Young Xiao Cc: mark.rutland, x86, will.deacon, linux, linux-kernel, mingo, bp, hpa, ravi.bangoria, linux-arm-kernel, kan.liang Peter Zijlstra <peterz@infradead.org> writes: > On Tue, May 28, 2019 at 08:31:29PM +0800, Young Xiao wrote: >> When a kthread calls call_usermodehelper() the steps are: >> 1. allocate current->mm >> 2. load_elf_binary() >> 3. populate current->thread.regs >> >> While doing this, interrupts are not disabled. If there is a perf >> interrupt in the middle of this process (i.e. step 1 has completed >> but not yet reached to step 3) and if perf tries to read userspace >> regs, kernel oops. >> >> Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace >> pt_regs are not set. >> >> See commit bf05fc25f268 ("powerpc/perf: Fix oops when kthread execs >> user process") for details. > > Why the hell do we set current->mm before it is complete? Note that > normally exec() builds the new mm before attaching it, see exec_mmap() > in flush_old_exec(). > > Also, why did those PPC folks 'fix' this in isolation? And why didn't > you Cc them? We just assumed it was our bug, 'cause we have plenty of those :) cheers _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2019-06-03 13:30 UTC | newest] Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-05-28 12:31 [PATCH] perf: Fix oops when kthread execs user process Young Xiao 2019-05-28 12:41 ` Russell King - ARM Linux admin 2019-05-28 14:01 ` Peter Zijlstra 2019-05-28 15:32 ` Will Deacon 2019-05-28 16:12 ` Mark Rutland 2019-05-28 17:32 ` Peter Zijlstra 2019-05-29 9:17 ` Will Deacon 2019-05-29 10:10 ` Peter Zijlstra 2019-05-29 10:20 ` Will Deacon 2019-05-29 12:55 ` Peter Zijlstra 2019-05-29 13:05 ` Will Deacon 2019-05-29 13:25 ` Peter Zijlstra 2019-05-29 14:35 ` Will Deacon 2019-05-29 16:19 ` Peter Zijlstra 2019-05-29 16:24 ` Mark Rutland 2019-05-29 16:38 ` Mark Rutland 2019-05-29 17:03 ` Peter Zijlstra 2019-05-30 10:35 ` Mark Rutland 2019-05-29 16:25 ` Will Deacon 2019-05-29 16:44 ` Peter Zijlstra 2019-05-30 7:28 ` Will Deacon 2019-05-30 8:38 ` Ravi Bangoria 2019-05-30 10:27 ` Ravi Bangoria 2019-05-31 15:37 ` Will Deacon 2019-06-03 11:23 ` Will Deacon 2019-06-03 11:48 ` Peter Zijlstra 2019-06-03 13:30 ` Michael Ellerman 2019-05-29 10:11 ` Mark Rutland 2019-05-29 4:21 ` Michael Ellerman 2019-05-29 1:44 ` Michael Ellerman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).