* [PATCH] sched/core: Print out straggler tasks in sched_cpu_dying()
From: Valentin Schneider @ 2021-01-13 18:31 UTC
  To: linux-kernel
  Cc: paulmck, peterz, mingo, tglx, jiangshanlai, cai,
	vincent.donnefort, decui, vincent.guittot, rostedt, tj

Since commit

  1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")

tasks are expected to move themselves out of an outgoing CPU. For most
tasks this will be done automagically via BALANCE_PUSH, but percpu kthreads
will have to cooperate and move themselves away one way or another.

Currently, some percpu kthreads (workqueues being a notable example) do not
cooperate nicely and can end up on an outgoing CPU at the time
sched_cpu_dying() is invoked.

Print the dying rq's tasks to shed some light on the stragglers.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
As Peter pointed out, this should really be caught much earlier than
sched_cpu_dying().

If we go down the route of preventing kthreads from being affined to
!active CPUs in __set_cpus_allowed_ptr() (genuine percpu kthreads sidestep
it via kthread_bind_mask()), then I *think* we could catch this in wakeups,
i.e. select_task_rq(). I've been playing around there, but it's not as
straightforward as I'd have hoped.
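
To illustrate the idea, here is a purely hypothetical sketch, *not* part of
this patch: the helper name and where it would be called from are made up,
and it assumes kernel context; PF_KTHREAD, cpu_active_mask and
cpumask_intersects() are the only real symbols used.

static bool kthread_mask_has_active_cpu(struct task_struct *p,
					const struct cpumask *new_mask)
{
	/* Regular tasks are handled by BALANCE_PUSH; only gate kthreads */
	if (!(p->flags & PF_KTHREAD))
		return true;

	/* The kthread's new mask must keep at least one active CPU */
	return cpumask_intersects(new_mask, cpu_active_mask);
}
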
---
 kernel/sched/core.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9a08a39d7cdb..d784dd1ae436 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7564,6 +7564,25 @@ static void calc_load_migrate(struct rq *rq)
 		atomic_long_add(delta, &calc_load_tasks);
 }
 
+static void dump_rq_tasks(struct rq *rq, const char *loglvl)
+{
+	struct task_struct *g, *p;
+	int cpu = cpu_of(rq);
+
+	lockdep_assert_held(&rq->lock);
+
+	printk("%sCPU%d enqueued tasks (%u total):\n", loglvl, cpu, rq->nr_running);
+	for_each_process_thread(g, p) {
+		if (task_cpu(p) != cpu)
+			continue;
+
+		if (!task_on_rq_queued(p))
+			continue;
+
+		printk("%s\tpid: %d, name: %s\n", loglvl, p->pid, p->comm);
+	}
+}
+
 int sched_cpu_dying(unsigned int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -7573,7 +7592,10 @@ int sched_cpu_dying(unsigned int cpu)
 	sched_tick_stop(cpu);
 
 	rq_lock_irqsave(rq, &rf);
-	BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq));
+	if (rq->nr_running != 1 || rq_has_pinned_tasks(rq)) {
+		WARN(true, "Dying CPU not properly vacated!");
+		dump_rq_tasks(rq, KERN_WARNING);
+	}
 	rq_unlock_irqrestore(rq, &rf);
 
 	calc_load_migrate(rq);
-- 
2.27.0


* Re: [PATCH] sched/core: Print out straggler tasks in sched_cpu_dying()
From: Paul E. McKenney @ 2021-01-13 22:02 UTC
  To: Valentin Schneider
  Cc: linux-kernel, peterz, mingo, tglx, jiangshanlai, cai,
	vincent.donnefort, decui, vincent.guittot, rostedt, tj

On Wed, Jan 13, 2021 at 06:31:41PM +0000, Valentin Schneider wrote:
> Since commit
> 
>   1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")
> 
> tasks are expected to move themselves out of an outgoing CPU. For most
> tasks this will be done automagically via BALANCE_PUSH, but percpu kthreads
> will have to cooperate and move themselves away one way or another.
> 
> Currently, some percpu kthreads (workqueues being a notable example) do not
> cooperate nicely and can end up on an outgoing CPU at the time
> sched_cpu_dying() is invoked.
> 
> Print the dying rq's tasks to shed some light on the stragglers.
> 
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
> As Peter pointed out, this should really be caught much earlier than
> sched_cpu_dying().
> 
> If we go down the route of preventing kthreads from being affined to
> !active CPUs in __set_cpus_allowed_ptr() (genuine percpu kthreads sidestep
> it via kthread_bind_mask()), then I *think* we could catch this in wakeups,
> i.e. select_task_rq(). I've been playing around there, but it's not as
> straightforward as I'd have hoped.
> ---

Given that I am not seeing much sched_cpu_dying(), this patch didn't
produce any output.  (I will try other configurations.)

However, it did produce the following new-to-me splat, which will
hopefully be of some help.

							Thanx, Paul

------------------------------------------------------------------------

WARNING: CPU: 2 PID: 23 at kernel/kthread.c:508 kthread_set_per_cpu+0x3b/0x50
Modules linked in:
CPU: 2 PID: 23 Comm: cpuhp/2 Not tainted 5.11.0-rc3+ #1180
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:kthread_set_per_cpu+0x3b/0x50
Code: 00 48 85 c0 74 1f 40 84 f6 74 16 81 e2 00 00 00 04 74 1b 83 bf a0 03 00 00 01 75 0e f0 80 08 01 c3 f0 80 20 fe c3 0f 0b eb d0 <0f> 0b eb ee 0f 0b eb e1 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f6
RSP: 0000:ffffb25c80103df8 EFLAGS: 00010202
RAX: ffff94ac8188ec00 RBX: ffff94ac81390240 RCX: 0000000000000000
RDX: 0000000004000000 RSI: 0000000000000001 RDI: ffff94ac818fde00
RBP: ffff94ac9f4aadc0 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000004 R11: 0000000000000000 R12: ffff94ac9f4ab0e8
R13: 0000000000000002 R14: ffffffffb9868c40 R15: 00000000ffffffff
FS:  0000000000000000(0000) GS:ffff94ac9f480000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000001b022000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 workqueue_online_cpu+0x19c/0x360
 ? workqueue_prepare_cpu+0x70/0x70
 cpuhp_invoke_callback+0x9e/0x890
 cpuhp_thread_fun+0x199/0x230
 ? _raw_spin_unlock_irqrestore+0x2f/0x50
 ? sort_range+0x20/0x20
 smpboot_thread_fn+0x193/0x230
 kthread+0x13b/0x160
 ? kthread_insert_work_sanity_check+0x50/0x50
 ret_from_fork+0x22/0x30
irq event stamp: 38113
hardirqs last  enabled at (38121): [<ffffffffb80dafaa>] console_unlock+0x46a/0x550
hardirqs last disabled at (38130): [<ffffffffb80daf16>] console_unlock+0x3d6/0x550
softirqs last  enabled at (37574): [<ffffffffb9000342>] __do_softirq+0x342/0x48e
softirqs last disabled at (37567): [<ffffffffb8e00f92>] asm_call_irq_on_stack+0x12/0x20
---[ end trace 0b77ae0f211adc14 ]---

* Re: [PATCH] sched/core: Print out straggler tasks in sched_cpu_dying()
From: Valentin Schneider @ 2021-01-14  0:15 UTC
  To: paulmck
  Cc: linux-kernel, peterz, mingo, tglx, jiangshanlai, cai,
	vincent.donnefort, decui, vincent.guittot, rostedt, tj

On 13/01/21 14:02, Paul E. McKenney wrote:
>
> Given that I am not seeing much sched_cpu_dying(), this patch didn't
> produce any output.  (I will try other configurations.)
>

Thanks for giving it a spin! I think with the current series (either
Lai's or Peter's) sched_cpu_dying() should go smoothly, but you never
know.

> However, it did produce the following new-to-me splat, which will
> hopefully be of some help.
>
>                                                       Thanx, Paul
>
> ------------------------------------------------------------------------
>
> WARNING: CPU: 2 PID: 23 at kernel/kthread.c:508 kthread_set_per_cpu+0x3b/0x50

Aha, so that's that warning I was expecting to see [1].
Did you also get the process_one_work() one?

FWIW I think Peter's suggested approach of killing+respawning the pcpu
kworkers should prevent at least this one from showing up - all of the
affinity / flag changes would happen before the task gets enqueued
anywhere.

[1]: http://lore.kernel.org/r/jhjturkzzv9.mognet@arm.com
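
As a rough illustration of that ordering — a hypothetical sketch, not
Peter's actual series; worker_fn, data and the thread name are made up
for the example:

	struct task_struct *t;

	/*
	 * Do all the affinity / flag surgery before the kthread has ever
	 * been woken, so it can't be enqueued anywhere with stale state.
	 */
	t = kthread_create(worker_fn, data, "example-worker/%d", cpu);
	if (!IS_ERR(t)) {
		/* kthread_bind() also sets PF_NO_SETAFFINITY */
		kthread_bind(t, cpu);
		/* mark it per-cpu before its first wakeup */
		kthread_set_per_cpu(t, cpu);
		wake_up_process(t);
	}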

> Modules linked in:
> CPU: 2 PID: 23 Comm: cpuhp/2 Not tainted 5.11.0-rc3+ #1180
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
> RIP: 0010:kthread_set_per_cpu+0x3b/0x50
> Code: 00 48 85 c0 74 1f 40 84 f6 74 16 81 e2 00 00 00 04 74 1b 83 bf a0 03 00 00 01 75 0e f0 80 08 01 c3 f0 80 20 fe c3 0f 0b eb d0 <0f> 0b eb ee 0f 0b eb e1 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f6
> RSP: 0000:ffffb25c80103df8 EFLAGS: 00010202
> RAX: ffff94ac8188ec00 RBX: ffff94ac81390240 RCX: 0000000000000000
> RDX: 0000000004000000 RSI: 0000000000000001 RDI: ffff94ac818fde00
> RBP: ffff94ac9f4aadc0 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000004 R11: 0000000000000000 R12: ffff94ac9f4ab0e8
> R13: 0000000000000002 R14: ffffffffb9868c40 R15: 00000000ffffffff
> FS:  0000000000000000(0000) GS:ffff94ac9f480000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000001b022000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  workqueue_online_cpu+0x19c/0x360
>  ? workqueue_prepare_cpu+0x70/0x70
>  cpuhp_invoke_callback+0x9e/0x890
>  cpuhp_thread_fun+0x199/0x230
>  ? _raw_spin_unlock_irqrestore+0x2f/0x50
>  ? sort_range+0x20/0x20
>  smpboot_thread_fn+0x193/0x230
>  kthread+0x13b/0x160
>  ? kthread_insert_work_sanity_check+0x50/0x50
>  ret_from_fork+0x22/0x30
> irq event stamp: 38113
> hardirqs last  enabled at (38121): [<ffffffffb80dafaa>] console_unlock+0x46a/0x550
> hardirqs last disabled at (38130): [<ffffffffb80daf16>] console_unlock+0x3d6/0x550
> softirqs last  enabled at (37574): [<ffffffffb9000342>] __do_softirq+0x342/0x48e
> softirqs last disabled at (37567): [<ffffffffb8e00f92>] asm_call_irq_on_stack+0x12/0x20
> ---[ end trace 0b77ae0f211adc14 ]---

* Re: [PATCH] sched/core: Print out straggler tasks in sched_cpu_dying()
From: Paul E. McKenney @ 2021-01-14  0:36 UTC
  To: Valentin Schneider
  Cc: linux-kernel, peterz, mingo, tglx, jiangshanlai, cai,
	vincent.donnefort, decui, vincent.guittot, rostedt, tj

On Thu, Jan 14, 2021 at 12:15:24AM +0000, Valentin Schneider wrote:
> On 13/01/21 14:02, Paul E. McKenney wrote:
> >
> > Given that I am not seeing much sched_cpu_dying(), this patch didn't
> > produce any output.  (I will try other configurations.)
> 
> Thanks for giving it a spin! I think with the current series (either
> Lai's or Peter's) sched_cpu_dying() should go smoothly, but you never
> know.

I was running the patch set having one of Lai's and three of Peter's,
which sounds like Peter's.  If I understand which series is which,
Peter's has the advantage of not requiring rcutorture changes.  ;-)

> > However, it did produce the following new-to-me splat, which will
> > hopefully be of some help.
> >
> >                                                       Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > WARNING: CPU: 2 PID: 23 at kernel/kthread.c:508 kthread_set_per_cpu+0x3b/0x50
> 
> Aha, so that's that warning I was expecting to see [1].
> Did you also get the process_one_work() one?

Yes.  Of 112 one-hour runs, there were five process_one_work() splats
and two kthread_set_per_cpu() splats.  Each splat-ridden run had exactly
one splat.

> FWIW I think Peter's suggested approach of killing+respawning the pcpu
> kworkers should prevent at least this one from showing up - all of the
> affinity / flag changes would happen before the task gets enqueued
> anywhere.

Here is hoping!  ;-)

							Thanx, Paul

> [1]: http://lore.kernel.org/r/jhjturkzzv9.mognet@arm.com
> 
> > Modules linked in:
> > CPU: 2 PID: 23 Comm: cpuhp/2 Not tainted 5.11.0-rc3+ #1180
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
> > RIP: 0010:kthread_set_per_cpu+0x3b/0x50
> > Code: 00 48 85 c0 74 1f 40 84 f6 74 16 81 e2 00 00 00 04 74 1b 83 bf a0 03 00 00 01 75 0e f0 80 08 01 c3 f0 80 20 fe c3 0f 0b eb d0 <0f> 0b eb ee 0f 0b eb e1 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f6
> > RSP: 0000:ffffb25c80103df8 EFLAGS: 00010202
> > RAX: ffff94ac8188ec00 RBX: ffff94ac81390240 RCX: 0000000000000000
> > RDX: 0000000004000000 RSI: 0000000000000001 RDI: ffff94ac818fde00
> > RBP: ffff94ac9f4aadc0 R08: 0000000000000001 R09: 0000000000000001
> > R10: 0000000000000004 R11: 0000000000000000 R12: ffff94ac9f4ab0e8
> > R13: 0000000000000002 R14: ffffffffb9868c40 R15: 00000000ffffffff
> > FS:  0000000000000000(0000) GS:ffff94ac9f480000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000000 CR3: 000000001b022000 CR4: 00000000000006e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  workqueue_online_cpu+0x19c/0x360
> >  ? workqueue_prepare_cpu+0x70/0x70
> >  cpuhp_invoke_callback+0x9e/0x890
> >  cpuhp_thread_fun+0x199/0x230
> >  ? _raw_spin_unlock_irqrestore+0x2f/0x50
> >  ? sort_range+0x20/0x20
> >  smpboot_thread_fn+0x193/0x230
> >  kthread+0x13b/0x160
> >  ? kthread_insert_work_sanity_check+0x50/0x50
> >  ret_from_fork+0x22/0x30
> > irq event stamp: 38113
> > hardirqs last  enabled at (38121): [<ffffffffb80dafaa>] console_unlock+0x46a/0x550
> > hardirqs last disabled at (38130): [<ffffffffb80daf16>] console_unlock+0x3d6/0x550
> > softirqs last  enabled at (37574): [<ffffffffb9000342>] __do_softirq+0x342/0x48e
> > softirqs last disabled at (37567): [<ffffffffb8e00f92>] asm_call_irq_on_stack+0x12/0x20
> > ---[ end trace 0b77ae0f211adc14 ]---

* Re: [PATCH] sched/core: Print out straggler tasks in sched_cpu_dying()
From: Valentin Schneider @ 2021-01-14 10:37 UTC
  To: paulmck
  Cc: linux-kernel, peterz, mingo, tglx, jiangshanlai, cai,
	vincent.donnefort, decui, vincent.guittot, rostedt, tj

On 13/01/21 16:36, Paul E. McKenney wrote:
> On Thu, Jan 14, 2021 at 12:15:24AM +0000, Valentin Schneider wrote:
>> On 13/01/21 14:02, Paul E. McKenney wrote:
>> Thanks for giving it a spin! I think with the current series (either
>> Lai's or Peter's) sched_cpu_dying() should go smoothly, but you never
>> know.
>
> I was running the patch set having one of Lai's and three of Peter's,
> which sounds like Peter's.

That's how I was seeing it :)

> If I understand which series is which,
> Peter's has the advantage of not requiring rcutorture changes.  ;-)
>
>> > However, it did produce the following new-to-me splat, which will
>> > hopefully be of some help.
>> >
>> >                                                       Thanx, Paul
>> >
>> > ------------------------------------------------------------------------
>> >
>> > WARNING: CPU: 2 PID: 23 at kernel/kthread.c:508 kthread_set_per_cpu+0x3b/0x50
>>
>> Aha, so that's that warning I was expecting to see [1].
>> Did you also get the process_one_work() one?
>
> Yes.  Of 112 one-hour runs, there were five process_one_work() splats
> and two kthread_set_per_cpu() splats.  Each splat-ridden run had exactly
> one splat.
>

I was expecting to see both in one run, so am still somewhat confused.

* Re: [PATCH] sched/core: Print out straggler tasks in sched_cpu_dying()
From: Paul E. McKenney @ 2021-01-14 15:22 UTC
  To: Valentin Schneider
  Cc: linux-kernel, peterz, mingo, tglx, jiangshanlai, cai,
	vincent.donnefort, decui, vincent.guittot, rostedt, tj

On Thu, Jan 14, 2021 at 10:37:35AM +0000, Valentin Schneider wrote:
> On 13/01/21 16:36, Paul E. McKenney wrote:
> > On Thu, Jan 14, 2021 at 12:15:24AM +0000, Valentin Schneider wrote:
> >> On 13/01/21 14:02, Paul E. McKenney wrote:
> >> Thanks for giving it a spin! I think with the current series (either
> >> Lai's or Peter's) sched_cpu_dying() should go smoothly, but you never
> >> know.
> >
> > I was running the patch set having one of Lai's and three of Peter's,
> > which sounds like Peter's.
> 
> That's how I was seeing it :)

If someone can identify Lai's series to me, I would be happy to give it
a try as well.  All I see are workqueue-specific patches and patches
contributing to the discussion of possible fixes to the splats from
Peter's series.  (I figured that I would wait for the discussion to
converge a bit.)

> > If I understand which series is which,
> > Peter's has the advantage of not requiring rcutorture changes.  ;-)
> >
> >> > However, it did produce the following new-to-me splat, which will
> >> > hopefully be of some help.
> >> >
> >> >                                                       Thanx, Paul
> >> >
> >> > ------------------------------------------------------------------------
> >> >
> >> > WARNING: CPU: 2 PID: 23 at kernel/kthread.c:508 kthread_set_per_cpu+0x3b/0x50
> >>
> >> Aha, so that's that warning I was expecting to see [1].
> >> Did you also get the process_one_work() one?
> >
> > Yes.  Of 112 one-hour runs, there were five process_one_work() splats
> > and two kthread_set_per_cpu() splats.  Each splat-ridden run had exactly
> > one splat.
> 
> I was expecting to see both in one run, so am still somewhat confused.

Well, if we weren't confused in some way or another, the bug would not
exist, so I will count becoming aware of confusion as a step forward.  ;-)

							Thanx, Paul

* Re: [PATCH] sched/core: Print out straggler tasks in sched_cpu_dying()
From: Valentin Schneider @ 2021-01-14 15:52 UTC
  To: paulmck
  Cc: linux-kernel, peterz, mingo, tglx, jiangshanlai, cai,
	vincent.donnefort, decui, vincent.guittot, rostedt, tj

On 14/01/21 07:22, Paul E. McKenney wrote:
> If someone can identify Lai's series to me, I would be happy to give it
> a try as well.  All I see are workqueue-specific patches and patches
> contributing to the discussion of possible fixes to the splats from
> Peter's series.  (I figured that I would wait for the discussion to
> converge a bit.)
>

I was referring to

http://lore.kernel.org/r/20210111152638.2417-1-jiangshanlai@gmail.com

which I believe you already tested an earlier version of.

* Re: [PATCH] sched/core: Print out straggler tasks in sched_cpu_dying()
From: Paul E. McKenney @ 2021-01-14 17:13 UTC
  To: Valentin Schneider
  Cc: linux-kernel, peterz, mingo, tglx, jiangshanlai, cai,
	vincent.donnefort, decui, vincent.guittot, rostedt, tj

On Thu, Jan 14, 2021 at 03:52:56PM +0000, Valentin Schneider wrote:
> On 14/01/21 07:22, Paul E. McKenney wrote:
> > If someone can identify Lai's series to me, I would be happy to give it
> > a try as well.  All I see are workqueue-specific patches and patches
> > contributing to the discussion of possible fixes to the splats from
> > Peter's series.  (I figured that I would wait for the discussion to
> > converge a bit.)
> >
> 
> I was referring to
> 
> http://lore.kernel.org/r/20210111152638.2417-1-jiangshanlai@gmail.com
> 
> which I believe you already tested an earlier version of.

Indeed I did, thank you for the clarification.

							Thanx, Paul

* Re: [PATCH] sched/core: Print out straggler tasks in sched_cpu_dying()
From: Paul E. McKenney @ 2021-01-15 16:54 UTC
  To: Valentin Schneider
  Cc: linux-kernel, peterz, mingo, tglx, jiangshanlai, cai,
	vincent.donnefort, decui, vincent.guittot, rostedt, tj

On Thu, Jan 14, 2021 at 09:13:17AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 14, 2021 at 03:52:56PM +0000, Valentin Schneider wrote:
> > On 14/01/21 07:22, Paul E. McKenney wrote:
> > > If someone can identify Lai's series to me, I would be happy to give it
> > > a try as well.  All I see are workqueue-specific patches and patches
> > > contributing to the discussion of possible fixes to the splats from
> > > Peter's series.  (I figured that I would wait for the discussion to
> > > converge a bit.)
> > 
> > I was referring to
> > 
> > http://lore.kernel.org/r/20210111152638.2417-1-jiangshanlai@gmail.com
> > 
> > which I believe you already tested an earlier version of.
> 
> Indeed I did, thank you for the clarification.

And there was only one run that failed yesterday evening, but it managed
to get both warnings.  Please see below, timestamps included this time.
My guess is that the BUG...preemptible warnings are a consequence of
the prior warning, but I am including them anyway.

							Thanx, Paul

------------------------------------------------------------------------

[ 4166.692555] WARNING: CPU: 1 PID: 17 at kernel/kthread.c:508 kthread_set_per_cpu+0x3b/0x50
[ 4166.693913] Modules linked in:
[ 4166.694367] CPU: 1 PID: 17 Comm: cpuhp/1 Not tainted 5.11.0-rc3+ #1222
[ 4166.695351] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[ 4166.696746] RIP: 0010:kthread_set_per_cpu+0x3b/0x50
[ 4166.697680] Code: 00 48 85 c0 74 1f 40 84 f6 74 16 81 e2 00 00 00 04 74 1b 83 bf a0 03 00 00 01 75 0e f0 80 08 01 c3 f0 80 20 fe c3 0f 0b eb d0 <0f> 0b eb ee 0f 0b eb e1 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f6
[ 4166.700957] RSP: 0000:ffff98ea000cfdf8 EFLAGS: 00010202
[ 4166.702152] RAX: ffff966e41869b00 RBX: ffff966e42a9f480 RCX: 0000000000000000
[ 4166.703319] RDX: 0000000004000000 RSI: 0000000000000001 RDI: ffff966e43049780
[ 4166.704475] RBP: ffff966e5f46adc0 R08: 0000000000000001 R09: 0000000000000001
[ 4166.705557] R10: 0000000000000004 R11: 0000000000000000 R12: ffff966e5f46b0e8
[ 4166.706604] R13: 0000000000000001 R14: ffffffffb3868c40 R15: 00000000ffffffff
[ 4166.707749] FS:  0000000000000000(0000) GS:ffff966e5f440000(0000) knlGS:0000000000000000
[ 4166.709119] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4166.710337] CR2: 0000000000000000 CR3: 0000000010a22000 CR4: 00000000000006e0
[ 4166.711586] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4166.712639] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4166.713769] Call Trace:
[ 4166.714143]  workqueue_online_cpu+0x19c/0x360
[ 4166.714789]  ? workqueue_prepare_cpu+0x70/0x70
[ 4166.715434]  cpuhp_invoke_callback+0x9e/0x890
[ 4166.716098]  cpuhp_thread_fun+0x199/0x230
[ 4166.716683]  ? sort_range+0x20/0x20
[ 4166.717192]  smpboot_thread_fn+0x193/0x230
[ 4166.717832]  kthread+0x13b/0x160
[ 4166.718296]  ? kthread_insert_work_sanity_check+0x50/0x50
[ 4166.719356]  ret_from_fork+0x22/0x30
[ 4166.720108] irq event stamp: 162011
[ 4166.720614] hardirqs last  enabled at (162021): [<ffffffffb20dafaa>] console_unlock+0x46a/0x550
[ 4166.721900] hardirqs last disabled at (162030): [<ffffffffb20daf16>] console_unlock+0x3d6/0x550
[ 4166.723157] softirqs last  enabled at (161064): [<ffffffffb3000342>] __do_softirq+0x342/0x48e
[ 4166.724450] softirqs last disabled at (161057): [<ffffffffb2e00f92>] asm_call_irq_on_stack+0x12/0x20
[ 4166.725776] ---[ end trace dc72cd4b00d9fafc ]---

...

[14094.371624] ------------[ cut here ]------------
[14094.373762] WARNING: CPU: 0 PID: 22 at kernel/workqueue.c:2194 process_one_work+0x8c/0x5f0
[14094.375067] Modules linked in:
[14094.375572] CPU: 0 PID: 22 Comm: kworker/1:1 Tainted: G        W         5.11.0-rc3+ #1222
[14094.376855] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[14094.378240] Workqueue:  0x0 (events)
[14094.378807] RIP: 0010:process_one_work+0x8c/0x5f0
[14094.379553] Code: 48 8b 46 38 41 83 e6 20 48 89 45 c0 48 8b 46 40 48 89 45 c8 41 f6 44 24 4c 04 75 10 65 8b 05 db 5d f8 4d 41 39 44 24 40 74 02 <0f> 0b 48 ba eb 83 b5 80 46 86 c8 61 48 0f af d3 48 c1 ea 3a 49 8b
[14094.382431] RSP: 0018:ffff98ea000fbe70 EFLAGS: 00010002
[14094.383234] RAX: 0000000000000000 RBX: ffff966e5f4727e0 RCX: 2b970af959bb2a7d
[14094.384445] RDX: ffff966e5f4727e8 RSI: ffff966e5f4727e0 RDI: ffff966e41202000
[14094.385743] RBP: ffff98ea000fbed0 R08: 0000000000000001 R09: ffff966e411e4680
[14094.387526] R10: 0000000000000000 R11: 0000000000000000 R12: ffff966e5f46adc0
[14094.388927] R13: ffff966e5f46f700 R14: 0000000000000000 R15: ffff966e41202000
[14094.390043] FS:  0000000000000000(0000) GS:ffff966e5f400000(0000) knlGS:0000000000000000
[14094.391276] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14094.392151] CR2: 0000000000000000 CR3: 000000000253c000 CR4: 00000000000006f0
[14094.393250] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[14094.394354] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[14094.395447] Call Trace:
[14094.395837]  ? process_one_work+0x5f0/0x5f0
[14094.396521]  worker_thread+0x28/0x3c0
[14094.397106]  ? process_one_work+0x5f0/0x5f0
[14094.397782]  kthread+0x13b/0x160
[14094.398302]  ? kthread_insert_work_sanity_check+0x50/0x50
[14094.399268]  ret_from_fork+0x22/0x30
[14094.399824] irq event stamp: 10439696
[14094.400409] hardirqs last  enabled at (10439695): [<ffffffffb2ca91df>] _raw_spin_unlock_irq+0x1f/0x40
[14094.402463] hardirqs last disabled at (10439696): [<ffffffffb2ca8fc1>] _raw_spin_lock_irq+0x41/0x50
[14094.403837] softirqs last  enabled at (10438912): [<ffffffffb20ece97>] srcu_invoke_callbacks+0xe7/0x1a0
[14094.405255] softirqs last disabled at (10438910): [<ffffffffb20ece97>] srcu_invoke_callbacks+0xe7/0x1a0
[14094.406683] ---[ end trace dc72cd4b00d9fafd ]---
[14094.407896] kvm-guest: stealtime: cpu 1, msr 1f457b00
[14094.587644] BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/1:1/22
[14094.589432] caller is refresh_cpu_vm_stats+0x1a6/0x320
[14094.591054] CPU: 0 PID: 22 Comm: kworker/1:1 Tainted: G        W         5.11.0-rc3+ #1222
[14094.592319] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[14094.593597] Workqueue: mm_percpu_wq vmstat_update
[14094.594318] Call Trace:
[14094.594709]  dump_stack+0x77/0x97
[14094.595234]  check_preemption_disabled+0xb6/0xd0
[14094.595946]  refresh_cpu_vm_stats+0x1a6/0x320
[14094.596645]  vmstat_update+0xe/0x60
[14094.597198]  process_one_work+0x2a0/0x5f0
[14094.597818]  ? process_one_work+0x5f0/0x5f0
[14094.598466]  worker_thread+0x28/0x3c0
[14094.599032]  ? process_one_work+0x5f0/0x5f0
[14094.599697]  kthread+0x13b/0x160
[14094.600281]  ? kthread_insert_work_sanity_check+0x50/0x50
[14094.601244]  ret_from_fork+0x22/0x30
[14094.601839] BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/1:1/22
[14094.603102] caller is refresh_cpu_vm_stats+0x1a6/0x320
[14094.603917] CPU: 0 PID: 22 Comm: kworker/1:1 Tainted: G        W         5.11.0-rc3+ #1222
[14094.605190] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[14094.606499] Workqueue: mm_percpu_wq vmstat_update
[14094.607226] Call Trace:
[14094.607744]  dump_stack+0x77/0x97
[14094.608433]  check_preemption_disabled+0xb6/0xd0
[14094.609390]  refresh_cpu_vm_stats+0x1a6/0x320
[14094.610090]  vmstat_update+0xe/0x60
[14094.610637]  process_one_work+0x2a0/0x5f0
[14094.611270]  ? process_one_work+0x5f0/0x5f0
[14094.611933]  worker_thread+0x28/0x3c0
[14094.612509]  ? process_one_work+0x5f0/0x5f0
[14094.613169]  kthread+0x13b/0x160
[14094.613653]  ? kthread_insert_work_sanity_check+0x50/0x50
[14094.614486]  ret_from_fork+0x22/0x30
[14095.424619] smpboot: Booting Node 0 Processor 7 APIC 0x7
[14095.445528] kvm-clock: cpu 7, msr 112b31c1, secondary cpu clock
[14095.485902] kvm-guest: stealtime: cpu 7, msr 1f5d7b00
[14095.610598] BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/1:1/22
[14095.612427] caller is refresh_cpu_vm_stats+0x1a6/0x320
[14095.613568] CPU: 6 PID: 22 Comm: kworker/1:1 Tainted: G        W         5.11.0-rc3+ #1222
[14095.615315] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[14095.617108] Workqueue: mm_percpu_wq vmstat_update
[14095.618094] Call Trace:
[14095.618615]  dump_stack+0x77/0x97
[14095.619587]  check_preemption_disabled+0xb6/0xd0
[14095.620667]  refresh_cpu_vm_stats+0x1a6/0x320
[14095.621690]  vmstat_update+0xe/0x60
[14095.622452]  process_one_work+0x2a0/0x5f0
[14095.623340]  ? process_one_work+0x5f0/0x5f0
[14095.624485]  worker_thread+0x28/0x3c0
[14095.625354]  ? process_one_work+0x5f0/0x5f0
[14095.626305]  kthread+0x13b/0x160
[14095.627059]  ? kthread_insert_work_sanity_check+0x50/0x50
[14095.628228]  ret_from_fork+0x22/0x30
[14095.629066] BUG: using __this_cpu_read() in preemptible [00000000] code: kworker/1:1/22
[14095.630865] caller is refresh_cpu_vm_stats+0x1a6/0x320
[14095.631799] CPU: 6 PID: 22 Comm: kworker/1:1 Tainted: G        W         5.11.0-rc3+ #1222
[14095.633549] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-2.el7 04/01/2014
[14095.635359] Workqueue: mm_percpu_wq vmstat_update
[14095.636370] Call Trace:
[14095.636885]  dump_stack+0x77/0x97
[14095.637613]  check_preemption_disabled+0xb6/0xd0
[14095.638595]  refresh_cpu_vm_stats+0x1a6/0x320
[14095.639565]  vmstat_update+0xe/0x60
[14095.640310]  process_one_work+0x2a0/0x5f0
[14095.641178]  ? process_one_work+0x5f0/0x5f0
[14095.642091]  worker_thread+0x28/0x3c0
[14095.643025]  ? process_one_work+0x5f0/0x5f0
[14095.643929]  kthread+0x13b/0x160
[14095.644627]  ? kthread_insert_work_sanity_check+0x50/0x50
[14095.645790]  ret_from_fork+0x22/0x30

* [tip: sched/urgent] sched/core: Print out straggler tasks in sched_cpu_dying()
From: tip-bot2 for Valentin Schneider @ 2021-01-22 17:41 UTC
  To: linux-tip-commits
  Cc: Valentin Schneider, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     36c6e17bf16922935a5a0dd073d5b032d34aa73d
Gitweb:        https://git.kernel.org/tip/36c6e17bf16922935a5a0dd073d5b032d34aa73d
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Wed, 13 Jan 2021 18:31:41 
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 22 Jan 2021 15:09:41 +01:00

sched/core: Print out straggler tasks in sched_cpu_dying()

Since commit

  1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")

tasks are expected to move themselves out of an outgoing CPU. For most
tasks this will be done automagically via BALANCE_PUSH, but percpu kthreads
will have to cooperate and move themselves away one way or another.

Currently, some percpu kthreads (workqueues being a notable example) do not
cooperate nicely and can end up on an outgoing CPU at the time
sched_cpu_dying() is invoked.

Print the dying rq's tasks to shed some light on the stragglers.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210113183141.11974-1-valentin.schneider@arm.com
---
 kernel/sched/core.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 15d2562..627534f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7574,6 +7574,25 @@ static void calc_load_migrate(struct rq *rq)
 		atomic_long_add(delta, &calc_load_tasks);
 }
 
+static void dump_rq_tasks(struct rq *rq, const char *loglvl)
+{
+	struct task_struct *g, *p;
+	int cpu = cpu_of(rq);
+
+	lockdep_assert_held(&rq->lock);
+
+	printk("%sCPU%d enqueued tasks (%u total):\n", loglvl, cpu, rq->nr_running);
+	for_each_process_thread(g, p) {
+		if (task_cpu(p) != cpu)
+			continue;
+
+		if (!task_on_rq_queued(p))
+			continue;
+
+		printk("%s\tpid: %d, name: %s\n", loglvl, p->pid, p->comm);
+	}
+}
+
 int sched_cpu_dying(unsigned int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -7583,7 +7602,10 @@ int sched_cpu_dying(unsigned int cpu)
 	sched_tick_stop(cpu);
 
 	rq_lock_irqsave(rq, &rf);
-	BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq));
+	if (rq->nr_running != 1 || rq_has_pinned_tasks(rq)) {
+		WARN(true, "Dying CPU not properly vacated!");
+		dump_rq_tasks(rq, KERN_WARNING);
+	}
 	rq_unlock_irqrestore(rq, &rf);
 
 	calc_load_migrate(rq);
