linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qian Cai <cai@lca.pw>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com,
	"Steven Rostedt (VMware)" <rostedt@goodmis.org>,
	bsegall@google.com, mgorman@suse.de,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linux Memory Management List <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] sched/core: fix illegal RCU from offline CPUs
Date: Wed, 1 Apr 2020 17:05:50 -0400	[thread overview]
Message-ID: <C6D3C929-C2FF-4F0B-8095-801995B8F525@lca.pw> (raw)
In-Reply-To: <5FC8821F-833D-4EE8-B6BD-F8ADB5F29A77@lca.pw>



> On Mar 29, 2020, at 10:42 PM, Qian Cai <cai@lca.pw> wrote:
> 
> 
> 
>> On Jan 21, 2020, at 5:35 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>> On Mon, Jan 20, 2020 at 03:35:09PM -0500, Qian Cai wrote:
>>>> On Jan 20, 2020, at 5:17 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>> 
>>>> Bah.. that's horrible. Surely we can find a better place to do this in
>>>> the whole hotplug machinery.
>>>> 
>>>> Perhaps you can have takedown_cpu() do the mmdrop()?
>>> 
>>> The problem is that no all arch_cpu_idle_dead() will call
>>> idle_task_exit(). For example, alpha and parisc are not, so it needs
>> 
>> How is that not broken? If the idle thread runs on an active_mm, we need
>> to drop that reference. This needs fixing regardless.
> 
> It turns out alpha does not support cpu hotplug (no CONFIG_HOTPLUG_CPU),
> and parisc will use &init_mm for idle tasks of secondary CPUs as in,
> 
> smp_callin()
>  smp_cpu_init()
> 
> /* Initialise the idle task for this CPU */	
> mmgrab(&init_mm);
> current->active_mm = &init_mm;
> BUG_ON(current->mm);
> enter_lazy_tlb(&init_mm, current);
> 
>> 
>>> to deal with some kind of ifdef dance in takedown_cpu() to
>>> conditionally call mmdrop() which sounds even more horrible?
>>> 
>>> If you really prefer it anyway, maybe something like touching every arch’s __cpu_die() to also call mmdrop() depends on arches?
>>> 
>>> BTW, how to obtain the other CPU’s current task mm? Is that cpu_rq(cpu)->curr->active_mm?
>> 
>> Something like this; except you'll need to go audit archs to make sure
>> they all call idle_task_exit() and/or put in comments on why they don't
>> have to (perhaps their bringup switches them to &init_mm unconditionally
>> and the switch_mm() is not required).
> 
> I have to modify your code slightly (below) to pass compilation, but then it will
> panic during cpu offline for some reasons,

OK, I found this little thing on powerpc needs to flip. I’ll post an v2 with Peter
as an author. Thus, need your signed-off.

> 
> [  258.972625][ T9006] Faulting instruction address: 0xc00000000010afc4
> [  258.972640][ T9006] Oops: Kernel access of bad area, sig: 11 [#1]
> [  258.972651][ T9006] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
> [  258.972675][ T9006] Modules linked in: kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci mdio libahci tg3 libata libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
> [  258.972716][ T9006] CPU: 84 PID: 9006 Comm: cpuhotplug04.sh Not tainted 5.6.0-rc7-next-20200327+ #7
> [  258.972741][ T9006] NIP:  c00000000010afc4 LR: c00000000010afa0 CTR: fffffffffffffffd
> [  258.972775][ T9006] REGS: c000201a54def7f0 TRAP: 0300   Not tainted  (5.6.0-rc7-next-20200327+)
> [  258.972809][ T9006] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28224824  XER: 20040000
> [  258.972839][ T9006] CFAR: c00000000015951c DAR: 0000000000000050 DSISR: 40000000 IRQMASK: 0 
> [  258.972839][ T9006] GPR00: c00000000010afa0 c000201a54defa80 c00000000165bd00 c000000001604180 
> [  258.972839][ T9006] GPR04: c00000008160496f 0000000000000000 ffff0a01ffffff10 0000000000000020 
> [  258.972839][ T9006] GPR08: 0000000000000050 0000000000000000 c00000000162c0e0 0000000000000002 
> [  258.972839][ T9006] GPR12: 0000000088224428 c000201fff69a900 0000000040000000 000000011a029798 
> [  258.972839][ T9006] GPR16: 000000011a029724 0000000119fc6968 0000000119f5f230 000000011a02d568 
> [  258.972839][ T9006] GPR20: 00000001554c66f0 0000000000000000 c000001ffc957820 c00000000010af80 
> [  258.972839][ T9006] GPR24: 0000001ffb8a0000 c0000000014f34a8 0000000000000056 c0000000010b7820 
> [  258.972839][ T9006] GPR28: c00000000168c654 0000000000000000 c00000000168c3a8 0000000000000000 
> [  258.973070][ T9006] NIP [c00000000010afc4] finish_cpu+0x44/0x90
> atomic_dec_return_relaxed at arch/powerpc/include/asm/atomic.h:179
> (inlined by) atomic_dec_return at include/linux/atomic-fallback.h:518
> (inlined by) atomic_dec_and_test at include/linux/atomic-fallback.h:1035
> (inlined by) mmdrop at include/linux/sched/mm.h:48
> (inlined by) finish_cpu at kernel/cpu.c:579
> [  258.973092][ T9006] LR [c00000000010afa0] finish_cpu+0x20/0x90
> [  258.973113][ T9006] Call Trace:
> [  258.973132][ T9006] [c000201a54defa80] [c00000000010afa0] finish_cpu+0x20/0x90 (unreliable)
> [  258.973166][ T9006] [c000201a54defaa0] [c00000000010cd34] cpuhp_invoke_callback+0x194/0x15f0
> cpuhp_invoke_callback at kernel/cpu.c:173
> [  258.973202][ T9006] [c000201a54defb50] [c000000000111acc] _cpu_down.constprop.15+0x17c/0x410
> [  258.973237][ T9006] [c000201a54defbc0] [c00000000010ff84] cpu_down+0x64/0xb0
> [  258.973263][ T9006] [c000201a54defc00] [c000000000754760] cpu_subsys_offline+0x20/0x40
> [  258.973299][ T9006] [c000201a54defc20] [c00000000074ab10] device_offline+0x100/0x140
> [  258.973344][ T9006] [c000201a54defc60] [c00000000074ace0] online_store+0xa0/0x110
> [  258.973378][ T9006] [c000201a54defcb0] [c000000000744508] dev_attr_store+0x38/0x60
> [  258.973414][ T9006] [c000201a54defcd0] [c0000000005df770] sysfs_kf_write+0x70/0xb0
> [  258.973438][ T9006] [c000201a54defd10] [c0000000005de92c] kernfs_fop_write+0x11c/0x270
> [  258.973463][ T9006] [c000201a54defd60] [c0000000004c09bc] __vfs_write+0x3c/0x70
> [  258.973487][ T9006] [c000201a54defd80] [c0000000004c3dbc] vfs_write+0xcc/0x200
> [  258.973522][ T9006] [c000201a54defdd0] [c0000000004c415c] ksys_write+0x7c/0x140
> [  258.973557][ T9006] [c000201a54defe20] [c00000000000b3f8] system_call+0x5c/0x68
> [  258.973579][ T9006] Instruction dump:
> [  258.973598][ T9006] f8010010 f821ffe1 4804e4ed 60000000 3d42fffd 394a03e0 e9230560 7fa95000 
> [  258.973636][ T9006] 419e0030 f9430560 39090050 7c0004ac <7d404028> 314affff 7d40412d 40c2fff4 
> [  258.973675][ T9006] ---[ end trace 4f72c711066ac397 ]---
> [  259.076271][ T9006] 
> [  260.076362][ T9006] Kernel panic - not syncing: Fatal exception
> 
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index c49257a3b510..bc782562ae93 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -49,6 +49,8 @@ static inline void mmdrop(struct mm_struct *mm)
> 		__mmdrop(mm);
> }
> 
> +extern void mmdrop(struct mm_struct *mm);
> +
> /*
>  * This has to be called after a get_task_mm()/mmget_not_zero()
>  * followed by taking the mmap_sem for writing before modifying the
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 0cca1a2b4930..302007a602a5 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -3,6 +3,7 @@
>  *
>  * This code is licenced under the GPL.
>  */
> +#include <linux/sched/mm.h>
> #include <linux/proc_fs.h>
> #include <linux/smp.h>
> #include <linux/init.h>
> @@ -564,6 +565,22 @@ static int bringup_cpu(unsigned int cpu)
> 	return bringup_wait_for_ap(cpu);
> }
> 
> +static int finish_cpu(unsigned int cpu)
> +{
> +	struct task_struct *idle = idle_thread_get(cpu);
> +	struct mm_struct *mm = idle->active_mm;
> +
> +	/*
> +	 * idle_task_exit() will have switched to &init_mm, now
> +	 * clean up any remaining active_mm state.
> +	 */
> +	if (mm != &init_mm) {
> +		idle->active_mm = &init_mm;
> +		mmdrop(mm);
> +	}
> +	return 0;
> +}
> +
> /*
>  * Hotplug state machine related functions
>  */
> @@ -1549,7 +1566,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
> 	[CPUHP_BRINGUP_CPU] = {
> 		.name			= "cpu:bringup",
> 		.startup.single		= bringup_cpu,
> -		.teardown.single	= NULL,
> +		.teardown.single	= finish_cpu,
> 		.cant_stop		= true,
> 	},
> 	/* Final state before CPU kills itself */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ac8b65298774..16aad39216a2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6248,13 +6248,14 @@ void idle_task_exit(void)
> 	struct mm_struct *mm = current->active_mm;
> 
> 	BUG_ON(cpu_online(smp_processor_id()));
> +	BUG_ON(current != this_rq()->idle);
> 
> 	if (mm != &init_mm) {
> 		switch_mm(mm, &init_mm, current);
> -		current->active_mm = &init_mm;
> 		finish_arch_post_lock_switch();
> 	}
> -	mmdrop(mm);
> +
> +	/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
> }
> 
> /*
> 
> 
>> 
>> ---
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index 9c706af713fb..2b4d8a69e8d9 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -564,6 +564,23 @@ static int bringup_cpu(unsigned int cpu)
>> 	return bringup_wait_for_ap(cpu);
>> }
>> 
>> +static int finish_cpu(unsigned int cpu)
>> +{
>> +	struct task_struct *idle = idle_thread_get(cpu);
>> +	struct mm_struct *mm = idle->active_mm;
>> +
>> +	/*
>> +	 * idle_task_exit() will have switched to &init_mm, now
>> +	 * clean up any remaining active_mm state.
>> +	 */
>> +
>> +	if (mm == &init_mm)
>> +		return;
>> +
>> +	idle->active_mm = &init_mm;
>> +	mmdrop(mm);
>> +}
>> +
>> /*
>> * Hotplug state machine related functions
>> */
>> @@ -1434,7 +1451,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
>> 	[CPUHP_BRINGUP_CPU] = {
>> 		.name			= "cpu:bringup",
>> 		.startup.single		= bringup_cpu,
>> -		.teardown.single	= NULL,
>> +		.teardown.single	= finish_cpu,
>> 		.cant_stop		= true,
>> 	},
>> 	/* Final state before CPU kills itself */
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index fc1dfc007604..8f049fb77a3d 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -6188,13 +6188,14 @@ void idle_task_exit(void)
>> 	struct mm_struct *mm = current->active_mm;
>> 
>> 	BUG_ON(cpu_online(smp_processor_id()));
>> +	BUG_ON(current != this_rq()->idle);
>> 
>> 	if (mm != &init_mm) {
>> 		switch_mm(mm, &init_mm, current);
>> -		current->active_mm = &init_mm;
>> 		finish_arch_post_lock_switch();
>> 	}
>> -	mmdrop(mm);
>> +
>> +	/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
>> }
>> 
>> /*


  reply	other threads:[~2020-04-01 21:05 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-13 19:03 [PATCH v2] sched/core: fix illegal RCU from offline CPUs Qian Cai
2020-01-20 10:16 ` Peter Zijlstra
2020-01-20 20:35   ` Qian Cai
2020-01-21 10:35     ` Peter Zijlstra
2020-01-24  4:21       ` Qian Cai
2020-01-24  5:02         ` Matthew Wilcox
2020-03-30  2:42       ` Qian Cai
2020-04-01 21:05         ` Qian Cai [this message]
2020-04-01 21:40 Qian Cai
2020-04-02 11:24 ` Michael Ellerman
2020-04-02 14:00   ` Qian Cai
2020-04-02 15:54     ` Paul E. McKenney
2020-04-02 16:19       ` Qian Cai
2020-04-02 16:57         ` Paul E. McKenney
2020-04-17 13:26   ` Qian Cai
2020-04-21 13:56     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=C6D3C929-C2FF-4F0B-8095-801995B8F525@lca.pw \
    --to=cai@lca.pw \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).