linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched/core: fix illegal RCU from offline CPUs
@ 2020-01-12 16:17 Qian Cai
  2020-01-13  0:33 ` Tetsuo Handa
  0 siblings, 1 reply; 5+ messages in thread
From: Qian Cai @ 2020-01-12 16:17 UTC (permalink / raw)
  To: peterz, mingo
  Cc: juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, paulmck, tglx, linux-mm, linux-kernel, Qian Cai

During CPU offline, idle_task_exit() calls mmdrop() after idle entry and
after the subsequent call to cpuhp_report_idle_dead(). Once execution
passes the call to rcu_report_dead(), RCU ignores the CPU, which results
in lockdep complaints when mmdrop() uses RCU from either memcg or
debugobjects. Fix it by running mmdrop() on the first online CPU via
smp_call_function_single().

=============================
 WARNING: suspicious RCU usage
 -----------------------------
 kernel/workqueue.c:710 RCU or wq_pool_mutex should be held!

 other info that might help us debug this:

 RCU used illegally from offline CPU!
 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by swapper/37/0:
  #0: c0000000010af608 (rcu_read_lock){....}, at:
      percpu_ref_put_many+0x8/0x230
  #1: c0000000010af608 (rcu_read_lock){....}, at:
      __queue_work+0x7c/0xca0

 stack backtrace:
 Call Trace:
  dump_stack+0xf4/0x164 (unreliable)
  lockdep_rcu_suspicious+0x140/0x164
  get_work_pool+0x110/0x150
  __queue_work+0x1bc/0xca0
  queue_work_on+0x114/0x120
  css_release+0x9c/0xc0
  percpu_ref_put_many+0x204/0x230
  free_pcp_prepare+0x264/0x570
  free_unref_page+0x38/0xf0
  __mmdrop+0x21c/0x2c0
  idle_task_exit+0x170/0x1b0
  pnv_smp_cpu_kill_self+0x38/0x2e0
  cpu_die+0x48/0x64
  arch_cpu_idle_dead+0x30/0x50
  do_idle+0x2f4/0x470
  cpu_startup_entry+0x38/0x40
  start_secondary+0x7a8/0xa80
  start_secondary_resume+0x10/0x14

 =============================
 WARNING: suspicious RCU usage
 -----------------------------
 kernel/sched/core.c:562 suspicious rcu_dereference_check() usage!

 other info that might help us debug this:

 RCU used illegally from offline CPU!
 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by swapper/94/0:
  #0: c000201cc77dc118 (&base->lock){-.-.}, at:
      lock_timer_base+0x114/0x1f0
  #1: c0000000010af608 (rcu_read_lock){....}, at:
      get_nohz_timer_target+0x3c/0x2d0

 stack backtrace:
 Call Trace:
  dump_stack+0xf4/0x164 (unreliable)
  lockdep_rcu_suspicious+0x140/0x164
  get_nohz_timer_target+0x248/0x2d0
  add_timer+0x24c/0x470
  __queue_delayed_work+0x8c/0x110
  queue_delayed_work_on+0x128/0x130
  __debug_check_no_obj_freed+0x2ec/0x320
  free_pcp_prepare+0x1b4/0x570
  free_unref_page+0x38/0xf0
  __mmdrop+0x21c/0x2c0
  idle_task_exit+0x170/0x1b0
  pnv_smp_cpu_kill_self+0x38/0x2e0
  cpu_die+0x48/0x64
  arch_cpu_idle_dead+0x30/0x50
  do_idle+0x2f4/0x470
  cpu_startup_entry+0x38/0x40
  start_secondary+0x7a8/0xa80
  start_secondary_prolog+0x10/0x14

Signed-off-by: Qian Cai <cai@lca.pw>
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 90e4b00ace89..41fb49f3dfce 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6194,7 +6194,8 @@ void idle_task_exit(void)
 		current->active_mm = &init_mm;
 		finish_arch_post_lock_switch();
 	}
-	mmdrop(mm);
+	smp_call_function_single(cpumask_first(cpu_online_mask),
+				(void (*)(void *))mmdrop, mm, 0);
 }
 
 /*
-- 
2.21.0 (Apple Git-122.2)
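
The ordering problem and the proposed fix can be modelled in userspace. The sketch below is only an illustration under stated assumptions: every name in it (mm_model, report_dead_model(), call_function_model() and so on) is hypothetical and is not a kernel API, the cross-CPU call is modelled as synchronous although the patch passes wait=0, and "online" is conflated with "watched by RCU". It also uses a small wrapper with the exact callback signature rather than casting the function pointer, since calling a function through an incompatible pointer type is undefined behaviour in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct mm_model {
	int mm_count;	/* reference count, in the spirit of mm->mm_count */
	bool freed;	/* set when the last reference is dropped */
};

static bool rcu_watching[NR_CPUS_MODEL] = { true, true, true, true };
static int current_cpu;		/* CPU the model is "running" on */
static int rcu_violations;	/* RCU uses on a CPU that RCU ignores */

/* Same shape as the kernel's smp_call_func_t: void (*)(void *info). */
typedef void (*call_func_model_t)(void *info);

/* Models rcu_report_dead(): RCU stops watching this CPU. */
static void report_dead_model(int cpu)
{
	rcu_watching[cpu] = false;
}

/* Models mmdrop(): the final put tears the mm down and uses RCU. */
static void mm_model_drop(struct mm_model *mm)
{
	assert(mm->mm_count > 0);
	if (--mm->mm_count == 0) {
		if (!rcu_watching[current_cpu])
			rcu_violations++;	/* what lockdep would flag */
		mm->freed = true;
	}
}

/* Wrapper with the exact callback signature, so no cast is needed. */
static void mm_model_drop_cb(void *info)
{
	mm_model_drop(info);
}

/* Models cpumask_first(cpu_online_mask); returns -1 if none is left. */
static int first_watched_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (rcu_watching[cpu])
			return cpu;
	return -1;
}

/* Models a cross-CPU call; synchronous here, unlike wait=0 in the patch. */
static int call_function_model(int cpu, call_func_model_t func, void *info)
{
	int saved = current_cpu;

	if (cpu < 0 || cpu >= NR_CPUS_MODEL || func == NULL)
		return -1;
	current_cpu = cpu;
	func(info);
	current_cpu = saved;
	return 0;
}
```

In this model, dropping the last reference directly on a CPU that has already been reported dead increments rcu_violations, while routing the same drop through call_function_model(first_watched_cpu(), mm_model_drop_cb, mm) does not.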



* Re: [PATCH] sched/core: fix illegal RCU from offline CPUs
  2020-01-12 16:17 [PATCH] sched/core: fix illegal RCU from offline CPUs Qian Cai
@ 2020-01-13  0:33 ` Tetsuo Handa
  2020-01-13  6:30   ` Qian Cai
  0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2020-01-13  0:33 UTC (permalink / raw)
  To: Qian Cai, peterz, mingo
  Cc: juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, paulmck, tglx, linux-mm, linux-kernel

On 2020/01/13 1:17, Qian Cai wrote:
> During CPU offline, idle_task_exit() calls mmdrop() after idle entry and
> after the subsequent call to cpuhp_report_idle_dead(). Once execution
> passes the call to rcu_report_dead(), RCU ignores the CPU, which results
> in lockdep complaints when mmdrop() uses RCU from either memcg or
> debugobjects. Fix it by running mmdrop() on the first online CPU via
> smp_call_function_single().
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 90e4b00ace89..41fb49f3dfce 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6194,7 +6194,8 @@ void idle_task_exit(void)
>  		current->active_mm = &init_mm;
>  		finish_arch_post_lock_switch();
>  	}
> -	mmdrop(mm);
> +	smp_call_function_single(cpumask_first(cpu_online_mask),
> +				(void (*)(void *))mmdrop, mm, 0);

mmdrop() might sleep, but the comment above smp_call_function_single() says
that the callback must be fast and non-blocking:

/*
 * smp_call_function_single - Run a function on a specific CPU
 * @func: The function to run. This must be fast and non-blocking.
 * @info: An arbitrary pointer to pass to the function.
 * @wait: If true, wait until function has completed on other CPUs.
 *
 * Returns 0 on success, else a negative status code.
 */

Maybe use mmdrop_async() instead?
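
The idea behind an asynchronous drop can be sketched in userspace as follows. This is a model, not the kernel implementation: as far as I know, mmdrop_async() in kernel/fork.c hands the final teardown to a workqueue, whereas the hypothetical names below (mm_model, mm_model_drop_async(), mm_model_run_deferred()) use a plain singly linked list as the deferral queue. The point shown is only that the caller drops its reference without doing the possibly blocking teardown itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an mm with a deferred-free hook. */
struct mm_model {
	int mm_count;			/* reference count */
	int freed;			/* 1 once teardown has run */
	struct mm_model *next_deferred;	/* link in the deferral queue */
};

/* Queue of objects whose last reference is gone but are not torn down. */
static struct mm_model *deferred_head;

/*
 * Drop one reference. If it was the last one, queue the teardown rather
 * than running it in the caller's (possibly restricted) context.
 */
static void mm_model_drop_async(struct mm_model *mm)
{
	assert(mm->mm_count > 0);
	if (--mm->mm_count == 0) {
		mm->next_deferred = deferred_head;
		deferred_head = mm;
	}
}

/*
 * Run the queued teardowns from a context where blocking is acceptable.
 * Returns the number of objects torn down.
 */
static int mm_model_run_deferred(void)
{
	int n = 0;

	while (deferred_head != NULL) {
		struct mm_model *mm = deferred_head;

		deferred_head = mm->next_deferred;
		mm->next_deferred = NULL;
		mm->freed = 1;
		n++;
	}
	return n;
}
```

A caller would invoke mm_model_drop_async() where it cannot block and leave mm_model_run_deferred() to a worker. Whether queueing work from a CPU that RCU already ignores is itself safe is a separate question that this model does not answer.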


* Re: [PATCH] sched/core: fix illegal RCU from offline CPUs
  2020-01-13  0:33 ` Tetsuo Handa
@ 2020-01-13  6:30   ` Qian Cai
  2020-01-13  8:20     ` Tetsuo Handa
  0 siblings, 1 reply; 5+ messages in thread
From: Qian Cai @ 2020-01-13  6:30 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Peter Zijlstra, Ingo Molnar, juri.lelli, vincent.guittot,
	dietmar.eggemann, Steven Rostedt (VMware),
	bsegall, mgorman, paulmck, tglx, linux-mm, linux-kernel



> On Jan 12, 2020, at 7:33 PM, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> 
> On 2020/01/13 1:17, Qian Cai wrote:
>> During CPU offline, idle_task_exit() calls mmdrop() after idle entry and
>> after the subsequent call to cpuhp_report_idle_dead(). Once execution
>> passes the call to rcu_report_dead(), RCU ignores the CPU, which results
>> in lockdep complaints when mmdrop() uses RCU from either memcg or
>> debugobjects. Fix it by running mmdrop() on the first online CPU via
>> smp_call_function_single().
>> 
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 90e4b00ace89..41fb49f3dfce 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -6194,7 +6194,8 @@ void idle_task_exit(void)
>> 		current->active_mm = &init_mm;
>> 		finish_arch_post_lock_switch();
>> 	}
>> -	mmdrop(mm);
>> +	smp_call_function_single(cpumask_first(cpu_online_mask),
>> +				(void (*)(void *))mmdrop, mm, 0);
> 
> mmdrop() might sleep, but

If that is the case, then commit e78a7614f387 (“idle: Prevent
late-arriving interrupts from disrupting offline”) is incorrect, because it
disables local IRQs before calling mmdrop(), which would trigger
the might_sleep() warning. Can you prove it?

> 
> /*
> * smp_call_function_single - Run a function on a specific CPU
> * @func: The function to run. This must be fast and non-blocking.
> * @info: An arbitrary pointer to pass to the function.
> * @wait: If true, wait until function has completed on other CPUs.
> *
> * Returns 0 on success, else a negative status code.
> */
> 
> . Maybe mmdrop_async() instead?



* Re: [PATCH] sched/core: fix illegal RCU from offline CPUs
  2020-01-13  6:30   ` Qian Cai
@ 2020-01-13  8:20     ` Tetsuo Handa
  2020-01-13 13:42       ` Qian Cai
  0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2020-01-13  8:20 UTC (permalink / raw)
  To: Qian Cai
  Cc: Peter Zijlstra, Ingo Molnar, juri.lelli, vincent.guittot,
	dietmar.eggemann, Steven Rostedt (VMware),
	bsegall, mgorman, paulmck, tglx, linux-mm, linux-kernel

On 2020/01/13 15:30, Qian Cai wrote:
>>> -	mmdrop(mm);
>>> +	smp_call_function_single(cpumask_first(cpu_online_mask),
>>> +				(void (*)(void *))mmdrop, mm, 0);
>>
>> mmdrop() might sleep, but
> 
> If that is the case, then commit e78a7614f387 (“idle: Prevent
> late-arriving interrupts from disrupting offline”) is incorrect, because it
> disables local IRQs before calling mmdrop(), which would trigger
> the might_sleep() warning. Can you prove it?

Is commit 7283094ec3db318e ("kernel, oom: fix potential pgd_lock deadlock from
__mmdrop") only about softirq? Is it guaranteed that smp_call_function_single()
cannot hit such a race? If so, I am just being overly careful...


* Re: [PATCH] sched/core: fix illegal RCU from offline CPUs
  2020-01-13  8:20     ` Tetsuo Handa
@ 2020-01-13 13:42       ` Qian Cai
  0 siblings, 0 replies; 5+ messages in thread
From: Qian Cai @ 2020-01-13 13:42 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Peter Zijlstra, Ingo Molnar, juri.lelli, vincent.guittot,
	dietmar.eggemann, Steven Rostedt (VMware),
	bsegall, mgorman, paulmck, tglx, linux-mm, linux-kernel



> On Jan 13, 2020, at 3:20 AM, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
> 
> Is commit 7283094ec3db318e ("kernel, oom: fix potential pgd_lock deadlock from
> __mmdrop") only about softirq? Is it guaranteed that smp_call_function_single()
> cannot hit such a race? If so, I am just being overly careful...

That commit looks like a different issue. We are not calling mmdrop() from softirq here.
