linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137!
@ 2015-07-10 11:32 Masami Hiramatsu
  2015-07-10 13:00 ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Masami Hiramatsu @ 2015-07-10 11:32 UTC (permalink / raw)
  To: Ingo Molnar, linux-kernel, Waiman Long, Peter Zijlstra (Intel)

Hi,

I've hit a kernel panic on Locking API testsuite on kvm-qemu.

The top commit is abf9b5f800eb13e53543ff284177efb538dc68fd.
It seems there is a bug in paravirt qspinlock implementation.

To make the configuration for reproducing this bug is very simple,
make allmodconfig && make localmodconfig.

Here is the kernel message which I've recovered from kernel logbuffer by using gdb.

-----
Locking API testsuite:
----------------------------------------------------------------------------
                                 | spin |wlock |rlock |mutex | wsem | rsem |
  --------------------------------------------------------------------------
                     A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                 A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
             A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
             A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
         A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
         A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
         A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                    double unlock:
------------[ cut here ]------------
kernel BUG at /home/mhiramat/ksrc/linux-3/kernel/locking/qspinlock_paravirt.h:137!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc1-01639-gabf9b5f #2
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
task: ffffffff81e2b700 ti: ffffffff81e00000 task.ti: ffffffff81e00000
RIP: 0010:[<ffffffff811152bc>]  [<ffffffff811152bc>] __pv_queued_spin_unlock+0xd0/0x110
RSP: 0000:ffffffff81e07e60  EFLAGS: 00010202
RAX: 0000000000000010 RBX: ffffffff828e8c50 RCX: 0000000000000100
RDX: ffff88007fe8efc0 RSI: 00000000000000ff RDI: ffffffff828e8c50
RBP: ffffffff81e07e60 R08: ffff88007fe8eec0 R09: 0000000000000100
R10: 0000000000000000 R11: 2020202020202020 R12: 0000000000000000
R13: 0000000000000001 R14: ffffffff814603b1 R15: 0000ffffffff82c8
FS:  0000000000000000(0000) GS:ffff88006ce00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000ffffffff CR3: 0000000001e22000 CR4: 00000000000006b0
Stack:
 ffffffff81e07ec8 ffffffff81114a59 2020202020202020 0000000000000000
 0000000000000000 0000000000000000 ffffffff828e8c50 ffffffff81c8cf83
 00000000000000b8 00000000000000b8 ffffffff81117133 00000000000000b8

Call Trace:
 [<ffffffff81114a59>] __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
 [<ffffffff81117133>] ? do_raw_spin_unlock+0xfa/0x10c
 [<ffffffff817cd3f7>] _raw_spin_unlock+0x44/0x64
 [<ffffffff814603ee>] double_unlock_spin+0x3d/0x46
 [<ffffffff817c1e42>] dotest+0x85/0x16e
 [<ffffffff81471431>] locking_selftest+0x67d/0x2a80
 [<ffffffff82c9062a>] start_kernel+0x5bc/0x874
 [<ffffffff82c8fc1d>] ? set_init_arg+0xb6/0xb6
 [<ffffffff82c8f120>] ? early_idt_handler_array+0x120/0x120
 [<ffffffff82c8f625>] x86_64_start_reservations+0x46/0x4f
 [<ffffffff82c8f84c>] x86_64_start_kernel+0x21e/0x234
Code: 44 fe ca 75 62 eb 2d 48 ff c1 48 ff 05 de 73 c7 02 48 8d 14 01 48 21 f2 48 c1 e2 04 4c 01 c2 4c 39 c9 72 af 48 ff 05 d4 73 c7 02 <0f> 0b 48 ff 05 d3 73 c7 02 48 83 3d ab e8 d7 00 00 48 63 78 40
RIP  [<ffffffff811152bc>] __pv_queued_spin_unlock+0xd0/0x110
RSP <ffffffff81e07e60>
---[ end trace ffa8b6c1f29ba3a3 ]---
Kernel panic - not syncing: Fatal exception
---[ end Kernel panic - not syncing: Fatal exception"

And here is the result of backtrace from gdb.

(gdb) bt
#0  __delay (loops=1) at /home/mhiramat/ksrc/linux-3/arch/x86/lib/delay.c:108
#1  0xffffffff8144f02a in __const_udelay (xloops=<optimized out>)
    at /home/mhiramat/ksrc/linux-3/arch/x86/lib/delay.c:122
#2  0xffffffff817b6090 in panic (fmt=<optimized out>)
    at /home/mhiramat/ksrc/linux-3/kernel/panic.c:201
#3  0xffffffff8101fecd in oops_end (flags=514,
    regs=0xffffffff81e07db8 <init_thread_union+32184>, signr=11)
    at /home/mhiramat/ksrc/linux-3/arch/x86/kernel/dumpstack.c:249
#4  0xffffffff8102039f in die (str=0xffffffff81c6b82e "invalid opcode",
    regs=0xffffffff81e07db8 <init_thread_union+32184>, err=0)
    at /home/mhiramat/ksrc/linux-3/arch/x86/kernel/dumpstack.c:316
#5  0xffffffff8101b33f in do_trap_no_signal (error_code=<optimized out>,
    regs=<optimized out>, str=<optimized out>, trapnr=<optimized out>,
    tsk=<optimized out>)
    at /home/mhiramat/ksrc/linux-3/arch/x86/kernel/traps.c:204
#6  do_trap (trapnr=<optimized out>, signr=<optimized out>,
    str=0x0 <irq_stack_union>, regs=0x0 <irq_stack_union>, error_code=1,
    info=<optimized out>)
    at /home/mhiramat/ksrc/linux-3/arch/x86/kernel/traps.c:250
#7  0xffffffff8101b8fe in do_error_trap (
    regs=0xffffffff81e07db8 <init_thread_union+32184>, error_code=0,
    str=0xffffffff81c6b82e "invalid opcode", trapnr=6, signr=4)
    at /home/mhiramat/ksrc/linux-3/arch/x86/kernel/traps.c:289
#8  0xffffffff8101bf7d in do_invalid_op (regs=<optimized out>,
    error_code=<optimized out>)
    at /home/mhiramat/ksrc/linux-3/arch/x86/kernel/traps.c:302
#9  0xffffffff817d004e in invalid_op ()
    at /home/mhiramat/ksrc/linux-3/arch/x86/entry/entry_64.S:826
#10 0x0000ffffffff82c8 in ?? ()
#11 0xffffffff814603b1 in bad_unlock_order_spin ()
    at /home/mhiramat/ksrc/linux-3/lib/locking-selftest.c:532
#12 0x0000000000000001 in irq_stack_union ()
#13 0x0000000000000000 in ?? ()

-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137!
  2015-07-10 11:32 [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137! Masami Hiramatsu
@ 2015-07-10 13:00 ` Peter Zijlstra
  2015-07-10 13:57   ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2015-07-10 13:00 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Ingo Molnar, linux-kernel, Waiman Long

On Fri, Jul 10, 2015 at 08:32:46PM +0900, Masami Hiramatsu wrote:

>                     double unlock:
> ------------[ cut here ]------------
> kernel BUG at /home/mhiramat/ksrc/linux-3/kernel/locking/qspinlock_paravirt.h:137!

> Call Trace:
>  [<ffffffff81114a59>] __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
>  [<ffffffff81117133>] ? do_raw_spin_unlock+0xfa/0x10c
>  [<ffffffff817cd3f7>] _raw_spin_unlock+0x44/0x64
>  [<ffffffff814603ee>] double_unlock_spin+0x3d/0x46

Cute, but somewhat expected. A double unlock really is a BUG and the PV
spinlock code cannot deal with it.

Do we want to make double unlock non-fatal unconditionally?

---
 kernel/locking/qspinlock_paravirt.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 04ab18151cc8..172deeaf1311 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -286,15 +286,22 @@ __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
 {
 	struct __qspinlock *l = (void *)lock;
 	struct pv_node *node;
+	u8 locked;
 
 	/*
 	 * We must not unlock if SLOW, because in that case we must first
 	 * unhash. Otherwise it would be possible to have multiple @lock
 	 * entries, which would be BAD.
 	 */
-	if (likely(cmpxchg(&l->locked, _Q_LOCKED_VAL, 0) == _Q_LOCKED_VAL))
+	locked = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
+	if (likely(locked == _Q_LOCKED_VAL))
 		return;
 
+#ifdef CONFIG_DEBUG_LOCKING_API_SELFTESTS
+	if (unlikely(!locked))
+		return;
+#endif
+
 	/*
 	 * Since the above failed to release, this must be the SLOW path.
 	 * Therefore start by looking up the blocked node and unhashing it.

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137!
  2015-07-10 13:00 ` Peter Zijlstra
@ 2015-07-10 13:57   ` Ingo Molnar
  2015-07-10 14:28     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2015-07-10 13:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Masami Hiramatsu, Ingo Molnar, linux-kernel, Waiman Long


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Jul 10, 2015 at 08:32:46PM +0900, Masami Hiramatsu wrote:
> 
> >                     double unlock:
> > ------------[ cut here ]------------
> > kernel BUG at /home/mhiramat/ksrc/linux-3/kernel/locking/qspinlock_paravirt.h:137!
> 
> > Call Trace:
> >  [<ffffffff81114a59>] __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
> >  [<ffffffff81117133>] ? do_raw_spin_unlock+0xfa/0x10c
> >  [<ffffffff817cd3f7>] _raw_spin_unlock+0x44/0x64
> >  [<ffffffff814603ee>] double_unlock_spin+0x3d/0x46
> 
> Cute, but somewhat expected. A double unlock really is a BUG and the PV
> spinlock code cannot deal with it.
> 
> Do we want to make double unlock non-fatal unconditionally?

No, just don't BUG() out, don't crash the system - generate a warning?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137!
  2015-07-10 13:57   ` Ingo Molnar
@ 2015-07-10 14:28     ` Peter Zijlstra
  2015-07-11  0:32       ` Masami Hiramatsu
  2015-07-11 10:27       ` Ingo Molnar
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2015-07-10 14:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Masami Hiramatsu, Ingo Molnar, linux-kernel, Waiman Long

On Fri, Jul 10, 2015 at 03:57:46PM +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:

> > Do we want to make double unlock non-fatal unconditionally?
> 
> No, just don't BUG() out, don't crash the system - generate a warning?

So that would be a yes..

Something like so then? Won't this generate a splat on that locking self
test then? And upset people?

---
 kernel/locking/qspinlock_paravirt.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 04ab18151cc8..286e8978a562 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -133,8 +133,14 @@ static struct pv_node *pv_unhash(struct qspinlock *lock)
 	 * This guarantees a limited lookup time and is itself guaranteed by
 	 * having the lock owner do the unhash -- IFF the unlock sees the
 	 * SLOW flag, there MUST be a hash entry.
+	 *
+	 * This can trigger due to double-unlock. In which case, return a
+	 * random pointer so that __pv_queued_spin_unlock() can dereference it
+	 * without crashing.
 	 */
-	BUG();
+	WARN_ON_ONCE(true);
+
+	return (struct pv_node *)this_cpu_ptr(&mcs_nodes[0]);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137!
  2015-07-10 14:28     ` Peter Zijlstra
@ 2015-07-11  0:32       ` Masami Hiramatsu
  2015-07-11  1:27         ` Waiman Long
  2015-07-11 10:27       ` Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: Masami Hiramatsu @ 2015-07-11  0:32 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: Ingo Molnar, linux-kernel, Waiman Long

On 2015/07/10 23:28, Peter Zijlstra wrote:
> On Fri, Jul 10, 2015 at 03:57:46PM +0200, Ingo Molnar wrote:
>> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
>>> Do we want to make double unlock non-fatal unconditionally?
>>
>> No, just don't BUG() out, don't crash the system - generate a warning?
> 
> So that would be a yes..
> 
> Something like so then? Won't this generate a splat on that locking self
> test then? And upset people?

Hmm, yes, this still noisy...
Can't we avoid double-unlock completely? it seems that this warning can
happen randomly, which means pv-spinlock randomly broken, doesn't it?

Thank you,

> ---
>  kernel/locking/qspinlock_paravirt.h | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> index 04ab18151cc8..286e8978a562 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -133,8 +133,14 @@ static struct pv_node *pv_unhash(struct qspinlock *lock)
>  	 * This guarantees a limited lookup time and is itself guaranteed by
>  	 * having the lock owner do the unhash -- IFF the unlock sees the
>  	 * SLOW flag, there MUST be a hash entry.
> +	 *
> +	 * This can trigger due to double-unlock. In which case, return a
> +	 * random pointer so that __pv_queued_spin_unlock() can dereference it
> +	 * without crashing.
>  	 */
> -	BUG();
> +	WARN_ON_ONCE(true);
> +
> +	return (struct pv_node *)this_cpu_ptr(&mcs_nodes[0]);
>  }
>  
>  /*
> 


-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137!
  2015-07-11  0:32       ` Masami Hiramatsu
@ 2015-07-11  1:27         ` Waiman Long
  2015-07-11  5:05           ` Masami Hiramatsu
  2015-07-11 10:22           ` Peter Zijlstra
  0 siblings, 2 replies; 10+ messages in thread
From: Waiman Long @ 2015-07-11  1:27 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Peter Zijlstra, Ingo Molnar, Ingo Molnar, linux-kernel

On 07/10/2015 08:32 PM, Masami Hiramatsu wrote:
> On 2015/07/10 23:28, Peter Zijlstra wrote:
>> On Fri, Jul 10, 2015 at 03:57:46PM +0200, Ingo Molnar wrote:
>>> * Peter Zijlstra<peterz@infradead.org>  wrote:
>>>> Do we want to make double unlock non-fatal unconditionally?
>>> No, just don't BUG() out, don't crash the system - generate a warning?
>> So that would be a yes..
>>
>> Something like so then? Won't this generate a splat on that locking self
>> test then? And upset people?
> Hmm, yes, this still noisy...
> Can't we avoid double-unlock completely? it seems that this warning can
> happen randomly, which means pv-spinlock randomly broken, doesn't it?

It shouldn't randomly happen. The message should be printed at the first 
instance of double-unlock. If that is not case, there may be some 
problem in the code.

Anyway, I have an alternative fix that should better capture the problem:

-------------------------------
diff --git a/kernel/locking/qspinlock_paravirt.h 
b/kernel/locking/qspinlock_paravirt.h
index 04ab181..92fc54f 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -286,15 +286,24 @@ __visible void __pv_queued_spin_unlock(struct 
qspinlock *lock)
  {
      struct __qspinlock *l = (void *)lock;
      struct pv_node *node;
+    u8 lockval = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);

      /*
       * We must not unlock if SLOW, because in that case we must first
       * unhash. Otherwise it would be possible to have multiple @lock
       * entries, which would be BAD.
       */
-    if (likely(cmpxchg(&l->locked, _Q_LOCKED_VAL, 0) == _Q_LOCKED_VAL))
+    if (likely(lockval == _Q_LOCKED_VAL))
          return;

+    if (unlikely(lockval != _Q_SLOW_VAL)) {
+        printk(KERN_WARNING
+               "pvqspinlock: lock 0x%lx has corrupted value 0x%x!\n",
+               (unsigned long)lock, atomic_read(&lock->val));
+        WARN_ON_ONCE(1);
+        return;
+    }
+
      /*
       * Since the above failed to release, this must be the SLOW path.
       * Therefore start by looking up the blocked node and unhashing it.


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137!
  2015-07-11  1:27         ` Waiman Long
@ 2015-07-11  5:05           ` Masami Hiramatsu
  2015-07-12  3:09             ` Waiman Long
  2015-07-11 10:22           ` Peter Zijlstra
  1 sibling, 1 reply; 10+ messages in thread
From: Masami Hiramatsu @ 2015-07-11  5:05 UTC (permalink / raw)
  To: Waiman Long; +Cc: Peter Zijlstra, Ingo Molnar, Ingo Molnar, linux-kernel

On 2015/07/11 10:27, Waiman Long wrote:
> On 07/10/2015 08:32 PM, Masami Hiramatsu wrote:
>> On 2015/07/10 23:28, Peter Zijlstra wrote:
>>> On Fri, Jul 10, 2015 at 03:57:46PM +0200, Ingo Molnar wrote:
>>>> * Peter Zijlstra<peterz@infradead.org>  wrote:
>>>>> Do we want to make double unlock non-fatal unconditionally?
>>>> No, just don't BUG() out, don't crash the system - generate a warning?
>>> So that would be a yes..
>>>
>>> Something like so then? Won't this generate a splat on that locking self
>>> test then? And upset people?
>> Hmm, yes, this still noisy...
>> Can't we avoid double-unlock completely? it seems that this warning can
>> happen randomly, which means pv-spinlock randomly broken, doesn't it?
> 
> It shouldn't randomly happen. The message should be printed at the first 
> instance of double-unlock. If that is not case, there may be some 
> problem in the code.

Ah, OK. That comes from locking selftest. In that case, do we really
need the warning while selftest, since we know it always fails ?

> Anyway, I have an alternative fix that should better capture the problem:

Do we need both Peter's BUG() removing patch and this?

Thank you,

> -------------------------------
> diff --git a/kernel/locking/qspinlock_paravirt.h 
> b/kernel/locking/qspinlock_paravirt.h
> index 04ab181..92fc54f 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -286,15 +286,24 @@ __visible void __pv_queued_spin_unlock(struct 
> qspinlock *lock)
>   {
>       struct __qspinlock *l = (void *)lock;
>       struct pv_node *node;
> +    u8 lockval = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
> 
>       /*
>        * We must not unlock if SLOW, because in that case we must first
>        * unhash. Otherwise it would be possible to have multiple @lock
>        * entries, which would be BAD.
>        */
> -    if (likely(cmpxchg(&l->locked, _Q_LOCKED_VAL, 0) == _Q_LOCKED_VAL))
> +    if (likely(lockval == _Q_LOCKED_VAL))
>           return;
> 
> +    if (unlikely(lockval != _Q_SLOW_VAL)) {
> +        printk(KERN_WARNING
> +               "pvqspinlock: lock 0x%lx has corrupted value 0x%x!\n",
> +               (unsigned long)lock, atomic_read(&lock->val));
> +        WARN_ON_ONCE(1);
> +        return;
> +    }
> +
>       /*
>        * Since the above failed to release, this must be the SLOW path.
>        * Therefore start by looking up the blocked node and unhashing it.
> 
> 


-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137!
  2015-07-11  1:27         ` Waiman Long
  2015-07-11  5:05           ` Masami Hiramatsu
@ 2015-07-11 10:22           ` Peter Zijlstra
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2015-07-11 10:22 UTC (permalink / raw)
  To: Waiman Long; +Cc: Masami Hiramatsu, Ingo Molnar, Ingo Molnar, linux-kernel

On Fri, Jul 10, 2015 at 09:27:45PM -0400, Waiman Long wrote:
> Anyway, I have an alternative fix that should better capture the problem:
> 
> -------------------------------
> diff --git a/kernel/locking/qspinlock_paravirt.h
> b/kernel/locking/qspinlock_paravirt.h
> index 04ab181..92fc54f 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -286,15 +286,24 @@ __visible void __pv_queued_spin_unlock(struct
> qspinlock *lock)
>  {
>      struct __qspinlock *l = (void *)lock;
>      struct pv_node *node;
> +    u8 lockval = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
> 
>      /*
>       * We must not unlock if SLOW, because in that case we must first
>       * unhash. Otherwise it would be possible to have multiple @lock
>       * entries, which would be BAD.
>       */
> -    if (likely(cmpxchg(&l->locked, _Q_LOCKED_VAL, 0) == _Q_LOCKED_VAL))
> +    if (likely(lockval == _Q_LOCKED_VAL))
>          return;
> 
> +    if (unlikely(lockval != _Q_SLOW_VAL)) {
> +        printk(KERN_WARNING
> +               "pvqspinlock: lock 0x%lx has corrupted value 0x%x!\n",
> +               (unsigned long)lock, atomic_read(&lock->val));
> +        WARN_ON_ONCE(1);

	WARN_ONCE(1, "foo");

> +        return;
> +    }

Right, so since this should not ever happen in 'sane' code, its a shame
to have to put in this condition. But yes, this works too.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137!
  2015-07-10 14:28     ` Peter Zijlstra
  2015-07-11  0:32       ` Masami Hiramatsu
@ 2015-07-11 10:27       ` Ingo Molnar
  1 sibling, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2015-07-11 10:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Masami Hiramatsu, Ingo Molnar, linux-kernel, Waiman Long


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Jul 10, 2015 at 03:57:46PM +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > Do we want to make double unlock non-fatal unconditionally?
> > 
> > No, just don't BUG() out, don't crash the system - generate a warning?
> 
> So that would be a yes..
> 
> Something like so then? Won't this generate a splat on that locking self
> test then? And upset people?
> 
> ---
>  kernel/locking/qspinlock_paravirt.h | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> index 04ab18151cc8..286e8978a562 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -133,8 +133,14 @@ static struct pv_node *pv_unhash(struct qspinlock *lock)
>  	 * This guarantees a limited lookup time and is itself guaranteed by
>  	 * having the lock owner do the unhash -- IFF the unlock sees the
>  	 * SLOW flag, there MUST be a hash entry.
> +	 *
> +	 * This can trigger due to double-unlock. In which case, return a
> +	 * random pointer so that __pv_queued_spin_unlock() can dereference it
> +	 * without crashing.
>  	 */
> -	BUG();
> +	WARN_ON_ONCE(true);
> +
> +	return (struct pv_node *)this_cpu_ptr(&mcs_nodes[0]);

Yeah, just please also use debug_locks_silent to make the self-test execute 
properly or so.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137!
  2015-07-11  5:05           ` Masami Hiramatsu
@ 2015-07-12  3:09             ` Waiman Long
  0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2015-07-12  3:09 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Peter Zijlstra, Ingo Molnar, Ingo Molnar, linux-kernel

On 07/11/2015 01:05 AM, Masami Hiramatsu wrote:
> On 2015/07/11 10:27, Waiman Long wrote:
>> On 07/10/2015 08:32 PM, Masami Hiramatsu wrote:
>>> On 2015/07/10 23:28, Peter Zijlstra wrote:
>>>> On Fri, Jul 10, 2015 at 03:57:46PM +0200, Ingo Molnar wrote:
>>>>> * Peter Zijlstra<peterz@infradead.org>   wrote:
>>>>>> Do we want to make double unlock non-fatal unconditionally?
>>>>> No, just don't BUG() out, don't crash the system - generate a warning?
>>>> So that would be a yes..
>>>>
>>>> Something like so then? Won't this generate a splat on that locking self
>>>> test then? And upset people?
>>> Hmm, yes, this still noisy...
>>> Can't we avoid double-unlock completely? it seems that this warning can
>>> happen randomly, which means pv-spinlock randomly broken, doesn't it?
>> It shouldn't randomly happen. The message should be printed at the first
>> instance of double-unlock. If that is not case, there may be some
>> problem in the code.
> Ah, OK. That comes from locking selftest. In that case, do we really
> need the warning while selftest, since we know it always fails ?
>
>> Anyway, I have an alternative fix that should better capture the problem:
> Do we need both Peter's BUG() removing patch and this?
>

No, you can choose either one. They are just different ways to solve the 
same BUG() problem.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-07-12  3:10 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-10 11:32 [BUG][tip/master] kernel panic while locking selftest at qspinlock_paravirt.h:137! Masami Hiramatsu
2015-07-10 13:00 ` Peter Zijlstra
2015-07-10 13:57   ` Ingo Molnar
2015-07-10 14:28     ` Peter Zijlstra
2015-07-11  0:32       ` Masami Hiramatsu
2015-07-11  1:27         ` Waiman Long
2015-07-11  5:05           ` Masami Hiramatsu
2015-07-12  3:09             ` Waiman Long
2015-07-11 10:22           ` Peter Zijlstra
2015-07-11 10:27       ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).