* Runqueue spinlock recursion on arm64 v4.15
@ 2018-02-02 19:27 ` Mark Rutland
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Rutland @ 2018-02-02 19:27 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: peterz, efault, tglx, mingo, alexander.levin

Hi all,

While fuzzing arm64 v4.15 with Syzkaller, I intermittently hit spinlock
recursion splats, around one or two a day. This looks like the same
issue Sasha hit on x86 a number of months back [1].

I've been seeing this for a few releases, but haven't had any luck
tracking it down, or generating a minimized reproducer.

I've dumped my kernel config, Syzkaller logs and reports to my
kernel.org webspace [2].

The most common case looks like:

--------
BUG: spinlock recursion on CPU#1, syz-executor1/1521
 lock: 0xffff80001a764080, .magic: dead4ead, .owner: syz-executor1/1521, .owner_cpu: 0
CPU: 1 PID: 1521 Comm: syz-executor1 Not tainted 4.15.0 #3
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x330 arch/arm64/kernel/time.c:52
 show_stack+0x20/0x30 arch/arm64/kernel/traps.c:151
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0xd0/0x120 lib/dump_stack.c:53
 spin_dump+0x150/0x1f0 kernel/locking/spinlock_debug.c:67
 spin_bug kernel/locking/spinlock_debug.c:75 [inline]
 debug_spin_lock_before kernel/locking/spinlock_debug.c:84 [inline]
 do_raw_spin_lock+0x1e4/0x250 kernel/locking/spinlock_debug.c:112
 __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
 _raw_spin_lock+0x44/0x50 kernel/locking/spinlock.c:144
 __task_rq_lock+0xc0/0x288 kernel/sched/core.c:103
 wake_up_new_task+0x384/0x798 kernel/sched/core.c:2465
 _do_fork+0x1b4/0xa78 kernel/fork.c:2069
 SYSC_clone kernel/fork.c:2154 [inline]
 SyS_clone+0x48/0x60 kernel/fork.c:2132
 el0_svc_naked+0x20/0x24
--------

... though I see this in other paths, e.g. one identical to Sasha's
report:

--------
BUG: spinlock recursion on CPU#0, kworker/u9:4/7741
 lock: 0xffff80001a783080, .magic: dead4ead, .owner: kworker/u9:4/7741, .owner_cpu: 1
CPU: 0 PID: 7741 Comm: kworker/u9:4 Not tainted 4.15.0 #3
Hardware name: linux,dummy-virt (DT)
Workqueue: events_unbound call_usermodehelper_exec_work
Call trace:
 dump_backtrace+0x0/0x330 arch/arm64/kernel/time.c:52
 show_stack+0x20/0x30 arch/arm64/kernel/traps.c:151
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0xd0/0x120 lib/dump_stack.c:53
 spin_dump+0x150/0x1f0 kernel/locking/spinlock_debug.c:67
 spin_bug kernel/locking/spinlock_debug.c:75 [inline]
 debug_spin_lock_before kernel/locking/spinlock_debug.c:84 [inline]
 do_raw_spin_lock+0x1e4/0x250 kernel/locking/spinlock_debug.c:112
 __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
 _raw_spin_lock+0x44/0x50 kernel/locking/spinlock.c:144
 rq_lock kernel/sched/sched.h:1766 [inline]
 ttwu_queue kernel/sched/core.c:1863 [inline]
 try_to_wake_up+0x6c0/0xa58 kernel/sched/core.c:2078
 default_wake_function+0x30/0x50 kernel/sched/core.c:3628
 __wake_up_common+0x128/0x470 kernel/sched/wait.c:97
 __wake_up_locked+0x18/0x20 kernel/sched/wait.c:158
 complete+0x68/0x90 kernel/sched/completion.c:39
 umh_complete+0x40/0xa8 kernel/umh.c:55
 call_usermodehelper_exec_sync kernel/umh.c:152 [inline]
 call_usermodehelper_exec_work+0x160/0x240 kernel/umh.c:175
 process_one_work+0x590/0xe90 kernel/workqueue.c:2113
 worker_thread+0x3b0/0xd30 kernel/workqueue.c:2247
 kthread+0x2a4/0x378 kernel/kthread.c:238
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:994
--------

... in some cases, owner_cpu is -1, so I guess we're racing with an
unlock. I only ever see this on the runqueue locks in wake up functions.

I'm at a loss as to how we can see a stale owner value that matches
current. If current had held the lock in the past, it would have stored
-1 to owner when it unlocked, and at the very least that store should be
visible.
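
For reference, the checks that fire are roughly the following, from
kernel/locking/spinlock_debug.c (excerpt only):

	static inline void debug_spin_lock_before(raw_spinlock_t *lock)
	{
		SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
		SPIN_BUG_ON(lock->owner == current, lock, "recursion");
		SPIN_BUG_ON(lock->owner_cpu == raw_smp_processor_id(),
							lock, "cpu recursion");
	}

	static inline void debug_spin_unlock(raw_spinlock_t *lock)
	{
		/* ... magic/owner sanity checks ... */
		lock->owner = SPINLOCK_OWNER_INIT;	/* i.e. (void *)-1L */
		lock->owner_cpu = -1;
	}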

Any ideas?

My best guess so far is that this is happening when a task_struct gets
freed and reallocated, and we're somehow missing a release-acquire
relationship; i.e. task A releases the lock on cpu X and dies, then A's
task_struct gets reallocated for task B on cpu Y, without all of A's
stores on cpu X having been made visible to B on cpu Y.

From a scan of the kmem_cache code, that seems unlikely, though. AFAICT
free and allocation have to occur on the same CPU, so that ordering
should be given by the combination of local CPU order and whatever
barriers switch_to() has...

Maybe it's possible to detect that by hacking a task generation number
into each task struct, and embedding this into the lock's owner field. I
can try hacking that up next week.
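
Something like the below (entirely untested; 'generation', 'owner_gen',
and 'task_alloc_gen' are made-up names, purely to illustrate the idea):

	/* global, monotonically increasing allocation stamp */
	static atomic64_t task_alloc_gen;

	/* in dup_task_struct(), after the parent's fields have been copied: */
	tsk->generation = atomic64_inc_return(&task_alloc_gen);

	/* where the lock debug code records the owner: */
	lock->owner = current;
	lock->owner_gen = current->generation;

	/*
	 * When diagnosing an apparent recursion: if the generations differ,
	 * the recorded owner was an earlier task whose task_struct has since
	 * been reallocated for current.
	 */
	if (lock->owner == current && lock->owner_gen != current->generation)
		pr_err("spinlock owner is a stale, reallocated task_struct\n");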

Thanks,
Mark.

[1] https://lkml.kernel.org/r/20170604070911.5sx66m5xnoh2amyd@sasha-lappy
[2] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20180202-rq-spinlock-recursion/

* Re: Runqueue spinlock recursion on arm64 v4.15
  2018-02-02 19:27 ` Mark Rutland
@ 2018-02-02 19:55   ` Peter Zijlstra
  -1 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2018-02-02 19:55 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, linux-arm-kernel, efault, tglx, mingo, alexander.levin

On Fri, Feb 02, 2018 at 07:27:04PM +0000, Mark Rutland wrote:
> ... in some cases, owner_cpu is -1, so I guess we're racing with an
> unlock. I only ever see this on the runqueue locks in wake up functions.

So runqueue locks are special in that the owner changes over a context
switch, maybe something goes funny there?

* Re: Runqueue spinlock recursion on arm64 v4.15
  2018-02-02 19:55   ` Peter Zijlstra
@ 2018-02-02 22:07     ` Mark Rutland
  -1 siblings, 0 replies; 12+ messages in thread
From: Mark Rutland @ 2018-02-02 22:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arm-kernel, efault, tglx, mingo, alexander.levin

On Fri, Feb 02, 2018 at 08:55:06PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 02, 2018 at 07:27:04PM +0000, Mark Rutland wrote:
> > ... in some cases, owner_cpu is -1, so I guess we're racing with an
> > unlock. I only ever see this on the runqueue locks in wake up functions.
> 
> So runqueue locks are special in that the owner changes over a context
> switch, maybe something goes funny there?

Aha! I think that's it!

In finish_lock_switch() we do:

	smp_store_release(&prev->on_cpu, 0);
	...
	rq->lock.owner = current;

As soon as we update prev->on_cpu, prev can be scheduled on another CPU, and
can thus see a stale value for rq->lock.owner (e.g. if it tries to wake up
another task on that rq).
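
i.e. with CONFIG_DEBUG_SPINLOCK, something like the following
interleaving (sketch):

	/*
	 * CPU X (switching from prev to next,     CPU Y
	 * still holding X's rq->lock)
	 *
	 * finish_lock_switch():
	 *   smp_store_release(&prev->on_cpu, 0);
	 *                                          prev is picked up and runs
	 *                                          try_to_wake_up() on a task
	 *                                          targeting X's rq:
	 *                                            do_raw_spin_lock(&rq->lock)
	 *                                            sees lock.owner == prev,
	 *                                            i.e. == current on CPU Y
	 *                                            => "spinlock recursion"
	 *   rq->lock.owner = current;
	 */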

I guess the below (completely untested) would fix that. I'll try to give it a
go next week.

Thanks,
Mark.

---->8----
From 3ef7e7466d09d2270d2a353d032ff0f9bc0488a7 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Fri, 2 Feb 2018 21:56:17 +0000
Subject: [PATCH] sched/core: avoid spurious spinlock recursion splats

The runqueue locks are special in that the owner changes over a context switch.
To ensure that this is accounted for in CONFIG_DEBUG_SPINLOCK builds,
finish_lock_switch() updates rq->lock.owner while the lock is held.

However, this happens *after* prev->on_cpu is cleared, which allows prev to be
scheduled on another CPU. If prev then attempts to acquire the same rq lock, it
will see itself as the owner.

This can be seen in virtual environments, where the vCPU can be preempted for
an arbitrarily long period between updating prev->on_cpu and rq->lock.owner.

We can avoid this issue by updating rq->lock.owner first. The release of
prev->on_cpu will ensure that the new owner is visible to prev if it is
scheduled on another CPU.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/sched.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b19552a212de..4f0d2e3701c3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1342,6 +1342,10 @@ static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
 
 static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 {
+#ifdef CONFIG_DEBUG_SPINLOCK
+	/* this is a valid case when another task releases the spinlock */
+	rq->lock.owner = current;
+#endif
 #ifdef CONFIG_SMP
 	/*
 	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
@@ -1355,10 +1359,6 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 	 */
 	smp_store_release(&prev->on_cpu, 0);
 #endif
-#ifdef CONFIG_DEBUG_SPINLOCK
-	/* this is a valid case when another task releases the spinlock */
-	rq->lock.owner = current;
-#endif
 	/*
 	 * If we are tracking spinlock dependencies then we have to
 	 * fix up the runqueue lock - which gets 'carried over' from
-- 
2.11.0


* Re: Runqueue spinlock recursion on arm64 v4.15
  2018-02-02 22:07     ` Mark Rutland
@ 2018-02-05 13:36       ` Mark Rutland
  -1 siblings, 0 replies; 12+ messages in thread
From: Mark Rutland @ 2018-02-05 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: efault, linux-kernel, alexander.levin, tglx, mingo, linux-arm-kernel

On Fri, Feb 02, 2018 at 10:07:26PM +0000, Mark Rutland wrote:
> On Fri, Feb 02, 2018 at 08:55:06PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 02, 2018 at 07:27:04PM +0000, Mark Rutland wrote:
> > > ... in some cases, owner_cpu is -1, so I guess we're racing with an
> > > unlock. I only ever see this on the runqueue locks in wake up functions.
> > 
> > So runqueue locks are special in that the owner changes over a context
> > switch, maybe something goes funny there?
> 
> Aha! I think that's it!
> 
> In finish_lock_switch() we do:
> 
> 	smp_store_release(&prev->on_cpu, 0);
> 	...
> 	rq->lock.owner = current;
> 
> As soon as we update prev->on_cpu, prev can be scheduled on another CPU, and
> can thus see a stale value for rq->lock.owner (e.g. if it tries to wake up
> another task on that rq).

I hacked in a forced vCPU preemption between the two using a sled of WFE
instructions, and now I can trigger the problem in seconds rather than
hours.
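
For reference, the hack was roughly the below, between the two stores in
finish_lock_switch() (the loop and iteration count are just illustrative;
each WFE gives the hypervisor an opportunity to preempt the vCPU):

	smp_store_release(&prev->on_cpu, 0);

	/* HACK: widen the race window so the vCPU can be preempted here */
	{
		int i;

		for (i = 0; i < 100000; i++)
			asm volatile("wfe" ::: "memory");
	}

	rq->lock.owner = current;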

With the patch below applied, things seem to be fine so far.

So I'm pretty sure this is it. I'll clean up the patch text and resend
that in a bit.

Thanks,
Mark.

> I guess the below (completely untested) would fix that. I'll try to give it a
> go next week.
> 
> Thanks,
> Mark.
> 
> ---->8----
> From 3ef7e7466d09d2270d2a353d032ff0f9bc0488a7 Mon Sep 17 00:00:00 2001
> From: Mark Rutland <mark.rutland@arm.com>
> Date: Fri, 2 Feb 2018 21:56:17 +0000
> Subject: [PATCH] sched/core: avoid spurious spinlock recursion splats
> 
> The runqueue locks are special in that the owner changes over a context switch.
> To ensure that this is accounted for in CONFIG_DEBUG_SPINLOCK builds,
> finish_lock_switch() updates rq->lock.owner while the lock is held.
> 
> However, this happens *after* prev->on_cpu is cleared, which allows prev to be
> scheduled on another CPU. If prev then attempts to acquire the same rq lock, it
> will see itself as the owner.
> 
> This can be seen in virtual environments, where the vCPU can be preempted for
> an arbitrarily long period between updating prev->on_cpu and rq->lock.owner.
> 
> We can avoid this issue by updating rq->lock.owner first. The release of
> prev->on_cpu will ensure that the new owner is visible to prev if it is
> scheduled on another CPU.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/sched/sched.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index b19552a212de..4f0d2e3701c3 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1342,6 +1342,10 @@ static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
>  
>  static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
>  {
> +#ifdef CONFIG_DEBUG_SPINLOCK
> +	/* this is a valid case when another task releases the spinlock */
> +	rq->lock.owner = current;
> +#endif
>  #ifdef CONFIG_SMP
>  	/*
>  	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
> @@ -1355,10 +1359,6 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
>  	 */
>  	smp_store_release(&prev->on_cpu, 0);
>  #endif
> -#ifdef CONFIG_DEBUG_SPINLOCK
> -	/* this is a valid case when another task releases the spinlock */
> -	rq->lock.owner = current;
> -#endif
>  	/*
>  	 * If we are tracking spinlock dependencies then we have to
>  	 * fix up the runqueue lock - which gets 'carried over' from
> -- 
> 2.11.0
> 

* Re: Runqueue spinlock recursion on arm64 v4.15
  2018-02-05 13:36       ` Mark Rutland
@ 2018-02-05 14:02         ` Peter Zijlstra
  -1 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2018-02-05 14:02 UTC (permalink / raw)
  To: Mark Rutland
  Cc: efault, linux-kernel, alexander.levin, tglx, mingo, linux-arm-kernel

On Mon, Feb 05, 2018 at 01:36:00PM +0000, Mark Rutland wrote:
> On Fri, Feb 02, 2018 at 10:07:26PM +0000, Mark Rutland wrote:
> > On Fri, Feb 02, 2018 at 08:55:06PM +0100, Peter Zijlstra wrote:
> > > On Fri, Feb 02, 2018 at 07:27:04PM +0000, Mark Rutland wrote:
> > > > ... in some cases, owner_cpu is -1, so I guess we're racing with an
> > > > unlock. I only ever see this on the runqueue locks in wake up functions.
> > > 
> > > So runqueue locks are special in that the owner changes over a context
> > > switch, maybe something goes funny there?
> > 
> > Aha! I think that's it!
> > 
> > In finish_lock_switch() we do:
> > 
> > 	smp_store_release(&prev->on_cpu, 0);
> > 	...
> > 	rq->lock.owner = current;
> > 
> > As soon as we update prev->on_cpu, prev can be scheduled on another CPU, and
> > can thus see a stale value for rq->lock.owner (e.g. if it tries to wake up
> > another task on that rq).
> 
> I hacked in a forced vCPU preemption between the two using a sled of WFE
> instructions, and now I can trigger the problem in seconds rather than
> hours.
> 
> With the patch below applied, things seem to be fine so far.
> 
> So I'm pretty sure this is it. I'll clean up the patch text and resend
> that in a bit.

Also, try to send it against an up-to-date scheduler tree; we just
moved some stuff around right about there.

* Re: Runqueue spinlock recursion on arm64 v4.15
  2018-02-05 14:02         ` Peter Zijlstra
@ 2018-02-05 14:15           ` Mark Rutland
  -1 siblings, 0 replies; 12+ messages in thread
From: Mark Rutland @ 2018-02-05 14:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: efault, linux-kernel, alexander.levin, tglx, mingo, linux-arm-kernel

On Mon, Feb 05, 2018 at 03:02:01PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 05, 2018 at 01:36:00PM +0000, Mark Rutland wrote:
> > On Fri, Feb 02, 2018 at 10:07:26PM +0000, Mark Rutland wrote:
> > > On Fri, Feb 02, 2018 at 08:55:06PM +0100, Peter Zijlstra wrote:
> > > > On Fri, Feb 02, 2018 at 07:27:04PM +0000, Mark Rutland wrote:
> > > > > ... in some cases, owner_cpu is -1, so I guess we're racing with an
> > > > > unlock. I only ever see this on the runqueue locks in wake up functions.
> > > > 
> > > > So runqueue locks are special in that the owner changes over a context
> > > > switch, maybe something goes funny there?
> > > 
> > > Aha! I think that's it!
> > > 
> > > In finish_lock_switch() we do:
> > > 
> > > 	smp_store_release(&prev->on_cpu, 0);
> > > 	...
> > > 	rq->lock.owner = current;
> > > 
> > > As soon as we update prev->on_cpu, prev can be scheduled on another CPU, and
> > > can thus see a stale value for rq->lock.owner (e.g. if it tries to wake up
> > > another task on that rq).
> > 
> > I hacked in a forced vCPU preemption between the two using a sled of WFE
> > instructions, and now I can trigger the problem in seconds rather than
> > hours.
> > 
> > With the patch below applied, things seem to be fine so far.
> > 
> > So I'm pretty sure this is it. I'll clean up the patch text and resend
> > that in a bit.
> 
> Also try and send it against an up-to-date scheduler tree, we just
> moved some stuff around just about there.

Ah, will do. I guess I should base it on tip sched/urgent?

Thanks,
Mark.
