* [PATCH] sched/core: fix illegal RCU from offline CPUs
@ 2020-01-12 16:17 Qian Cai
2020-01-13 0:33 ` Tetsuo Handa
0 siblings, 1 reply; 5+ messages in thread
From: Qian Cai @ 2020-01-12 16:17 UTC (permalink / raw)
To: peterz, mingo
Cc: juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
mgorman, paulmck, tglx, linux-mm, linux-kernel, Qian Cai
During CPU offline, idle_task_exit() calls mmdrop() after idle entry and
the subsequent call to cpuhp_report_idle_dead(). Once execution passes the
call to rcu_report_dead(), RCU ignores the CPU, which results in lockdep
complaints when mmdrop() uses RCU from either memcg or debugobjects.
Fix it by running mmdrop() on another online CPU.
=============================
WARNING: suspicious RCU usage
-----------------------------
kernel/workqueue.c:710 RCU or wq_pool_mutex should be held!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by swapper/37/0:
#0: c0000000010af608 (rcu_read_lock){....}, at:
percpu_ref_put_many+0x8/0x230
#1: c0000000010af608 (rcu_read_lock){....}, at:
__queue_work+0x7c/0xca0
stack backtrace:
Call Trace:
dump_stack+0xf4/0x164 (unreliable)
lockdep_rcu_suspicious+0x140/0x164
get_work_pool+0x110/0x150
__queue_work+0x1bc/0xca0
queue_work_on+0x114/0x120
css_release+0x9c/0xc0
percpu_ref_put_many+0x204/0x230
free_pcp_prepare+0x264/0x570
free_unref_page+0x38/0xf0
__mmdrop+0x21c/0x2c0
idle_task_exit+0x170/0x1b0
pnv_smp_cpu_kill_self+0x38/0x2e0
cpu_die+0x48/0x64
arch_cpu_idle_dead+0x30/0x50
do_idle+0x2f4/0x470
cpu_startup_entry+0x38/0x40
start_secondary+0x7a8/0xa80
start_secondary_resume+0x10/0x14
=============================
WARNING: suspicious RCU usage
-----------------------------
kernel/sched/core.c:562 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by swapper/94/0:
#0: c000201cc77dc118 (&base->lock){-.-.}, at:
lock_timer_base+0x114/0x1f0
#1: c0000000010af608 (rcu_read_lock){....}, at:
get_nohz_timer_target+0x3c/0x2d0
stack backtrace:
Call Trace:
dump_stack+0xf4/0x164 (unreliable)
lockdep_rcu_suspicious+0x140/0x164
get_nohz_timer_target+0x248/0x2d0
add_timer+0x24c/0x470
__queue_delayed_work+0x8c/0x110
queue_delayed_work_on+0x128/0x130
__debug_check_no_obj_freed+0x2ec/0x320
free_pcp_prepare+0x1b4/0x570
free_unref_page+0x38/0xf0
__mmdrop+0x21c/0x2c0
idle_task_exit+0x170/0x1b0
pnv_smp_cpu_kill_self+0x38/0x2e0
cpu_die+0x48/0x64
arch_cpu_idle_dead+0x30/0x50
do_idle+0x2f4/0x470
cpu_startup_entry+0x38/0x40
start_secondary+0x7a8/0xa80
start_secondary_prolog+0x10/0x14
Signed-off-by: Qian Cai <cai@lca.pw>
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 90e4b00ace89..41fb49f3dfce 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6194,7 +6194,8 @@ void idle_task_exit(void)
 		current->active_mm = &init_mm;
 		finish_arch_post_lock_switch();
 	}
-	mmdrop(mm);
+	smp_call_function_single(cpumask_first(cpu_online_mask),
+				 (void (*)(void *))mmdrop, mm, 0);
 }
 
 /*
--
2.21.0 (Apple Git-122.2)
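
The patch passes mmdrop() as the callback through a cast to
void (*)(void *). In ISO C, calling a function through a pointer to an
incompatible function type is undefined behaviour, so a small wrapper with
the exact callback signature is the more conservative form. The following
is a minimal user-space sketch of that pattern only. The struct, the
mmdrop() body, mmdrop_cb(), call_on_cpu() and drop_via_wrapper() are
hypothetical stand-ins, not the kernel definitions, and the sketch does not
address whether mmdrop() is suitable as a cross-CPU callback at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in; not the kernel's struct mm_struct. */
struct mm_struct {
	int mm_count;	/* reference count */
};

/* Callback type taking an opaque pointer, as a cross-CPU call API would. */
typedef void (*smp_call_func_t)(void *info);

/* Stand-in for mmdrop(): drops one reference and nothing else. */
static void mmdrop(struct mm_struct *mm)
{
	mm->mm_count--;
}

/*
 * Wrapper with the exact callback signature. The conversion happens on
 * the data pointer, which is well defined, instead of on the function
 * pointer.
 */
static void mmdrop_cb(void *info)
{
	assert(info != NULL);
	mmdrop((struct mm_struct *)info);
}

/*
 * Stand-in for a cross-CPU call (assumption: it simply runs the callback
 * locally and ignores the CPU number). Returns 0 on success.
 */
static int call_on_cpu(int cpu, smp_call_func_t func, void *info)
{
	(void)cpu;
	func(info);
	return 0;
}

/* Drop one reference via the typed wrapper; returns the remaining count. */
static int drop_via_wrapper(struct mm_struct *mm)
{
	call_on_cpu(0, mmdrop_cb, mm);
	return mm->mm_count;
}
```

With a struct mm_struct whose mm_count starts at 2, two calls to
drop_via_wrapper() bring the count to 1 and then to 0, with no
function-pointer cast anywhere in the call path.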
* Re: [PATCH] sched/core: fix illegal RCU from offline CPUs
2020-01-12 16:17 [PATCH] sched/core: fix illegal RCU from offline CPUs Qian Cai
@ 2020-01-13 0:33 ` Tetsuo Handa
2020-01-13 6:30 ` Qian Cai
0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2020-01-13 0:33 UTC (permalink / raw)
To: Qian Cai, peterz, mingo
Cc: juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
mgorman, paulmck, tglx, linux-mm, linux-kernel
On 2020/01/13 1:17, Qian Cai wrote:
> During CPU offline, idle_task_exit() calls mmdrop() after idle entry and
> the subsequent call to cpuhp_report_idle_dead(). Once execution passes the
> call to rcu_report_dead(), RCU ignores the CPU, which results in lockdep
> complaints when mmdrop() uses RCU from either memcg or debugobjects.
> Fix it by running mmdrop() on another online CPU.
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 90e4b00ace89..41fb49f3dfce 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6194,7 +6194,8 @@ void idle_task_exit(void)
> current->active_mm = &init_mm;
> finish_arch_post_lock_switch();
> }
> - mmdrop(mm);
> + smp_call_function_single(cpumask_first(cpu_online_mask),
> + (void (*)(void *))mmdrop, mm, 0);
mmdrop() might sleep, but the kernel-doc for smp_call_function_single()
says the function must be fast and non-blocking:

/*
 * smp_call_function_single - Run a function on a specific CPU
 * @func: The function to run. This must be fast and non-blocking.
 * @info: An arbitrary pointer to pass to the function.
 * @wait: If true, wait until function has completed on other CPUs.
 *
 * Returns 0 on success, else a negative status code.
 */

Maybe use mmdrop_async() instead?
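
The concern above is that a callback which must not block should not run
the final teardown itself. One general way to satisfy that is to let the
restricted context only drop the reference and queue the object, and to run
the teardown later from a context that is allowed to block. The following
is a minimal, single-threaded user-space sketch of that idea. It is not the
kernel's mmdrop_async(). The struct layout, mm_teardown(),
mmdrop_deferred(), drain_deferred() and the deferred_head list are all
hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in; not the kernel's struct mm_struct. */
struct mm_struct {
	int mm_count;			/* reference count */
	int freed;			/* set once the teardown has run */
	struct mm_struct *next;		/* link in the deferred list */
};

/* Objects whose last reference is gone and that await teardown. */
static struct mm_struct *deferred_head;

/* Heavy teardown; assumed to be legal only where blocking is allowed. */
static void mm_teardown(struct mm_struct *mm)
{
	mm->freed = 1;
}

/*
 * Callable from the restricted context: drops one reference and, if it
 * was the last one, only queues the object. No teardown happens here.
 * (Not thread-safe; a real version would need atomics or locking.)
 */
static void mmdrop_deferred(struct mm_struct *mm)
{
	assert(mm->mm_count > 0);
	if (--mm->mm_count == 0) {
		mm->next = deferred_head;
		deferred_head = mm;
	}
}

/*
 * Run later from an unrestricted context: tears down every queued
 * object and returns how many were processed.
 */
static int drain_deferred(void)
{
	int n = 0;

	while (deferred_head) {
		struct mm_struct *mm = deferred_head;

		deferred_head = mm->next;
		mm->next = NULL;
		mm_teardown(mm);
		n++;
	}
	return n;
}
```

In this model, dropping the last reference with mmdrop_deferred() leaves
the freed flag clear, and the flag is set only after drain_deferred() runs.
Whether an equivalent deferral is itself safe from an offline CPU is a
separate question that the sketch does not answer.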
* Re: [PATCH] sched/core: fix illegal RCU from offline CPUs
2020-01-13 0:33 ` Tetsuo Handa
@ 2020-01-13 6:30 ` Qian Cai
2020-01-13 8:20 ` Tetsuo Handa
0 siblings, 1 reply; 5+ messages in thread
From: Qian Cai @ 2020-01-13 6:30 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Peter Zijlstra, Ingo Molnar, juri.lelli, vincent.guittot,
dietmar.eggemann, Steven Rostedt (VMware),
bsegall, mgorman, paulmck, tglx, linux-mm, linux-kernel
> On Jan 12, 2020, at 7:33 PM, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>
> On 2020/01/13 1:17, Qian Cai wrote:
>> During CPU offline, idle_task_exit() calls mmdrop() after idle entry and
>> the subsequent call to cpuhp_report_idle_dead(). Once execution passes the
>> call to rcu_report_dead(), RCU ignores the CPU, which results in lockdep
>> complaints when mmdrop() uses RCU from either memcg or debugobjects.
>> Fix it by running mmdrop() on another online CPU.
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 90e4b00ace89..41fb49f3dfce 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -6194,7 +6194,8 @@ void idle_task_exit(void)
>> current->active_mm = &init_mm;
>> finish_arch_post_lock_switch();
>> }
>> - mmdrop(mm);
>> + smp_call_function_single(cpumask_first(cpu_online_mask),
>> + (void (*)(void *))mmdrop, mm, 0);
>
> mmdrop() might sleep, but
If that is the case, then commit e78a7614f387 (“idle: Prevent
late-arriving interrupts from disrupting offline”) is incorrect, because it
disables local IRQs before calling mmdrop(), which would trigger
the might_sleep() warning. Can you prove it?
>
> /*
> * smp_call_function_single - Run a function on a specific CPU
> * @func: The function to run. This must be fast and non-blocking.
> * @info: An arbitrary pointer to pass to the function.
> * @wait: If true, wait until function has completed on other CPUs.
> *
> * Returns 0 on success, else a negative status code.
> */
>
> . Maybe mmdrop_async() instead?
* Re: [PATCH] sched/core: fix illegal RCU from offline CPUs
2020-01-13 6:30 ` Qian Cai
@ 2020-01-13 8:20 ` Tetsuo Handa
2020-01-13 13:42 ` Qian Cai
0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2020-01-13 8:20 UTC (permalink / raw)
To: Qian Cai
Cc: Peter Zijlstra, Ingo Molnar, juri.lelli, vincent.guittot,
dietmar.eggemann, Steven Rostedt (VMware),
bsegall, mgorman, paulmck, tglx, linux-mm, linux-kernel
On 2020/01/13 15:30, Qian Cai wrote:
>>> - mmdrop(mm);
>>> + smp_call_function_single(cpumask_first(cpu_online_mask),
>>> + (void (*)(void *))mmdrop, mm, 0);
>>
>> mmdrop() might sleep, but
>
> If that is the case, then commit e78a7614f387 (“idle: Prevent
> late-arriving interrupts from disrupting offline”) is incorrect, because it
> disables local IRQs before calling mmdrop(), which would trigger
> the might_sleep() warning. Can you prove it?
Is commit 7283094ec3db318e ("kernel, oom: fix potential pgd_lock deadlock from
__mmdrop") only about softirq? Is it guaranteed that smp_call_function_single()
does not hit such a race? If so, I am just being overcareful...
* Re: [PATCH] sched/core: fix illegal RCU from offline CPUs
2020-01-13 8:20 ` Tetsuo Handa
@ 2020-01-13 13:42 ` Qian Cai
0 siblings, 0 replies; 5+ messages in thread
From: Qian Cai @ 2020-01-13 13:42 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Peter Zijlstra, Ingo Molnar, juri.lelli, vincent.guittot,
dietmar.eggemann, Steven Rostedt (VMware),
bsegall, mgorman, paulmck, tglx, linux-mm, linux-kernel
> On Jan 13, 2020, at 3:22 AM, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Is commit 7283094ec3db318e ("kernel, oom: fix potential pgd_lock deadlock from
> __mmdrop") only about softirq? Is it guaranteed that smp_call_function_single()
> does not hit such a race? If so, I am just being overcareful...
That commit looks like a different issue. We are not calling mmdrop() from softirq here.