* inconsistent lock state on v4.14.20-rt17
@ 2018-03-06 15:27 Roosen Henri
  2018-03-06 18:21 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 10+ messages in thread
From: Roosen Henri @ 2018-03-06 15:27 UTC (permalink / raw)
  To: linux-rt-users

Hi,

Ever since 4.9 we've been chasing random kernel crashes which are
reproducible on RT in SMP on iMX6Q. It happens when the system is
stressed using hackbench, but only when hackbench is run with
sockets, not with pipes.

Recently we upgraded to v4.14.20-rt17, which doesn't solve the issue
but instead locks up the kernel. After switching on some lock
debugging we were able to catch a trace (see below). It would be great
if someone could have a look at it, or guide me in tracking down the
root cause.

Thanks,
Henri

[18586.277233] ================================
[18586.277236] WARNING: inconsistent lock state
[18586.277245] 4.14.20-rt17-henri-1 #15 Tainted: G        W
[18586.277248] --------------------------------
[18586.277253] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[18586.277263] hackbench/18985 [HC0[0]:SC0[0]:HE1:SE1] takes:
[18586.277267]  (&rq->lock){?...}, at: [<c0992134>] __schedule+0x128/0x6ac
[18586.277300] {IN-HARDIRQ-W} state was registered at:
[18586.277314]   lock_acquire+0x288/0x32c
[18586.277324]   _raw_spin_lock+0x48/0x58
[18586.277338]   scheduler_tick+0x40/0xb4
[18586.277349]   update_process_times+0x38/0x6c
[18586.277359]   tick_periodic+0x120/0x148
[18586.277366]   tick_handle_periodic+0x2c/0xa0
[18586.277378]   twd_handler+0x3c/0x48
[18586.277389]   handle_percpu_devid_irq+0x290/0x608
[18586.277395]   generic_handle_irq+0x28/0x38
[18586.277402]   __handle_domain_irq+0xd4/0xf0
[18586.277409]   gic_handle_irq+0x64/0xa8
[18586.277414]   __irq_svc+0x70/0xc4
[18586.277420]   lock_acquire+0x2a4/0x32c
[18586.277425]   lock_acquire+0x2a4/0x32c
[18586.277440]   down_write_nested+0x54/0x68
[18586.277453]   sget_userns+0x310/0x4f4
[18586.277465]   mount_pseudo_xattr+0x68/0x170
[18586.277477]   nsfs_mount+0x3c/0x50
[18586.277484]   mount_fs+0x24/0xa8
[18586.277490]   vfs_kern_mount+0x58/0x118
[18586.277496]   kern_mount_data+0x24/0x34
[18586.277507]   nsfs_init+0x20/0x58
[18586.277522]   start_kernel+0x2f8/0x360
[18586.277528]   0x1000807c
[18586.277532] irq event stamp: 19441
[18586.277542] hardirqs last  enabled at (19441): [<c099665c>] _raw_spin_unlock_irqrestore+0x88/0x90
[18586.277550] hardirqs last disabled at (19440): [<c09962f8>] _raw_spin_lock_irqsave+0x2c/0x68
[18586.277562] softirqs last  enabled at (0): [<c0120c18>] copy_process.part.5+0x370/0x1a54
[18586.277568] softirqs last disabled at (0): [<  (null)>]   (null)
[18586.277571]
               other info that might help us debug this:
[18586.277574]  Possible unsafe locking scenario:

[18586.277576]        CPU0
[18586.277578]        ----
[18586.277580]   lock(&rq->lock);
[18586.277587]   <Interrupt>
[18586.277588]     lock(&rq->lock);
[18586.277594]
                *** DEADLOCK ***

[18586.277599] 2 locks held by hackbench/18985:
[18586.277601]  #0:  (&u->iolock){+.+.}, at: [<c081de30>] unix_stream_read_generic+0xb0/0x7e4
[18586.277624]  #1:  (rcu_read_lock){....}, at: [<c081b73c>] unix_write_space+0x0/0x2b0
[18586.277640]
               stack backtrace:
[18586.277651] CPU: 1 PID: 18985 Comm: hackbench Tainted: G        W       4.14.20-rt17-henri-1 #15
[18586.277654] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[18586.277683] [<c0111600>] (unwind_backtrace) from [<c010bfe8>] (show_stack+0x20/0x24)
[18586.277701] [<c010bfe8>] (show_stack) from [<c097d79c>] (dump_stack+0x9c/0xd0)
[18586.277714] [<c097d79c>] (dump_stack) from [<c0175424>] (print_usage_bug+0x1c8/0x2d0)
[18586.277725] [<c0175424>] (print_usage_bug) from [<c0175970>] (mark_lock+0x444/0x69c)
[18586.277736] [<c0175970>] (mark_lock) from [<c0177114>] (__lock_acquire+0x23c/0x172c)
[18586.277748] [<c0177114>] (__lock_acquire) from [<c017935c>] (lock_acquire+0x288/0x32c)
[18586.277759] [<c017935c>] (lock_acquire) from [<c0996150>] (_raw_spin_lock+0x48/0x58)
[18586.277774] [<c0996150>] (_raw_spin_lock) from [<c0992134>] (__schedule+0x128/0x6ac)
[18586.277789] [<c0992134>] (__schedule) from [<c09929c0>] (preempt_schedule_irq+0x5c/0x8c)
[18586.277801] [<c09929c0>] (preempt_schedule_irq) from [<c010cc8c>] (svc_preempt+0x8/0x2c)
[18586.277815] [<c010cc8c>] (svc_preempt) from [<c0190b60>] (__rcu_read_unlock+0x40/0x98)
[18586.277829] [<c0190b60>] (__rcu_read_unlock) from [<c081b9a4>] (unix_write_space+0x268/0x2b0)
[18586.277847] [<c081b9a4>] (unix_write_space) from [<c07643d8>] (sock_wfree+0x70/0xac)
[18586.277860] [<c07643d8>] (sock_wfree) from [<c081aff0>] (unix_destruct_scm+0x74/0x7c)
[18586.277876] [<c081aff0>] (unix_destruct_scm) from [<c076a8dc>] (skb_release_head_state+0x78/0x80)
[18586.277891] [<c076a8dc>] (skb_release_head_state) from [<c076ac28>] (skb_release_all+0x1c/0x34)
[18586.277905] [<c076ac28>] (skb_release_all) from [<c076ac5c>] (__kfree_skb+0x1c/0x28)
[18586.277919] [<c076ac5c>] (__kfree_skb) from [<c076b470>] (consume_skb+0x228/0x2b4)
[18586.277933] [<c076b470>] (consume_skb) from [<c081e3d4>] (unix_stream_read_generic+0x654/0x7e4)
[18586.277947] [<c081e3d4>] (unix_stream_read_generic) from [<c081e65c>] (unix_stream_recvmsg+0x5c/0x68)
[18586.277969] [<c081e65c>] (unix_stream_recvmsg) from [<c075f0e0>] (sock_recvmsg+0x28/0x2c)
[18586.277983] [<c075f0e0>] (sock_recvmsg) from [<c075f174>] (sock_read_iter+0x90/0xb8)
[18586.277998] [<c075f174>] (sock_read_iter) from [<c02559ec>] (__vfs_read+0x108/0x12c)
[18586.278010] [<c02559ec>] (__vfs_read) from [<c0255ab0>] (vfs_read+0xa0/0x10c)
[18586.278021] [<c0255ab0>] (vfs_read) from [<c0255f4c>] (SyS_read+0x50/0x88)
[18586.278035] [<c0255f4c>] (SyS_read) from [<c01074e0>] (ret_fast_syscall+0x0/0x28)


* Re: inconsistent lock state on v4.14.20-rt17
  2018-03-06 15:27 inconsistent lock state on v4.14.20-rt17 Roosen Henri
@ 2018-03-06 18:21 ` Sebastian Andrzej Siewior
       [not found]   ` <1520411200.1744.11.camel@ginzinger.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-06 18:21 UTC (permalink / raw)
  To: Roosen Henri; +Cc: linux-rt-users

On 2018-03-06 15:27:33 [+0000], Roosen Henri wrote:
> Hi,
> 
> Ever since 4.9 we've been chasing random kernel crashes which are
> reproducible on RT in SMP on iMX6Q. It happens when the system is
> stressed using hackbench, but only when hackbench is run with
> sockets, not with pipes.
> 
> Recently we upgraded to v4.14.20-rt17, which doesn't solve the issue
> but instead locks up the kernel. After switching on some lock
> debugging we were able to catch a trace (see below). It would be
> great if someone could have a look at it, or guide me in tracking
> down the root cause.

The backtrace suggests that the rq lock is taken with interrupts
disabled and then with interrupts enabled. But based on the call-trace
it should be taken with interrupts disabled in both cases.
I do have an imx6q running hackbench on a regular basis and I haven't
seen this. Do you see this backtrace on every hackbench invocation or
just after some time? The uptime suggests after ~5 hours.
Do you have the .config somewhere?
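
For illustration, the pattern lockdep is flagging boils down to the
following (a minimal sketch with made-up function names, not the
actual scheduler code):

  static DEFINE_RAW_SPINLOCK(rq_lock);

  void tick_handler(void)         /* hard-irq context, irqs off */
  {
          raw_spin_lock(&rq_lock);        /* registers {IN-HARDIRQ-W} */
          /* ... */
          raw_spin_unlock(&rq_lock);
  }

  void task_path(void)            /* process context */
  {
          /* {HARDIRQ-ON-W}: taking the same lock with irqs still
           * enabled means the tick can fire right here and spin on
           * rq_lock on this CPU -> deadlock. */
          raw_spin_lock(&rq_lock);
          /* ... */
          raw_spin_unlock(&rq_lock);
  }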

> [… lockdep splat snipped …]

Sebastian


* Re: inconsistent lock state on v4.14.20-rt17
       [not found]   ` <1520411200.1744.11.camel@ginzinger.com>
@ 2018-03-08 15:57     ` bigeasy
  2018-03-08 17:38       ` Roosen Henri
  0 siblings, 1 reply; 10+ messages in thread
From: bigeasy @ 2018-03-08 15:57 UTC (permalink / raw)
  To: Roosen Henri; +Cc: linux-rt-users

On 2018-03-07 08:26:40 [+0000], Roosen Henri wrote:
> > The backtrace suggests that the rq lock is taken with interrupts
> > disabled and then with interrupts enabled. But based on the
> > call-trace it should be taken with interrupts disabled in both cases.
> > I do have an imx6q running hackbench on a regular basis and I haven't
> > seen this. Do you see this backtrace on every hackbench invocation or
> > just after some time? The uptime suggests after ~5 hours.
> 
> The crashes are not seen at every invocation of hackbench, but
> unfortunately only after somewhere between a few hours and a few
> weeks. The test is based on a "dohell.sh" script from the Xenomai
> project.

I kicked off a hackbench run on an imx6 with your config.
Is the backtrace that you receive from lockdep always the same, or is
it different sometimes?

Sebastian


* Re: inconsistent lock state on v4.14.20-rt17
  2018-03-08 15:57     ` bigeasy
@ 2018-03-08 17:38       ` Roosen Henri
  2018-03-08 18:00         ` bigeasy
  0 siblings, 1 reply; 10+ messages in thread
From: Roosen Henri @ 2018-03-08 17:38 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users

On Thu, 2018-03-08 at 16:57 +0100, bigeasy@linutronix.de wrote:
> On 2018-03-07 08:26:40 [+0000], Roosen Henri wrote:
> > > The backtrace suggests that the rq lock is taken with interrupts
> > > disabled and then with interrupts enabled. But based on the
> > > call-trace it should be taken with interrupts disabled in both
> > > cases.
> > > I do have an imx6q running hackbench on a regular basis and I
> > > haven't seen this. Do you see this backtrace on every hackbench
> > > invocation or just after some time? The uptime suggests after
> > > ~5 hours.
> > 
> > The crashes are not seen at every invocation of hackbench, but
> > unfortunately only after somewhere between a few hours and a few
> > weeks. The test is based on a "dohell.sh" script from the Xenomai
> > project.
> 
> I kicked off a hackbench run on an imx6 with your config.

Thanks Sebastian for looking into this!

> Is the backtrace that you receive from lockdep always the same, or
> is it different sometimes?

It is different each time, so my gut feeling tells me it might be
memory corruption of some kind, maybe caused by a use-after-free.

I restarted the target yesterday evening and this morning it was
frozen without any trace on the terminal. Attaching a JTAG debugger
showed different call-stacks than yesterday: core #2 (trying to print
the info to the terminal) and core #3 were spinning on a spin-lock; I
don't understand what cores #0 and #1 were doing.

Core#0:
	__vectors_start() at entry-armv.S:1,234 0xffff000c	
	__pabt_svc() at entry-armv.S:319 0xc010cd84	
Core#1:
	__vectors_start() at entry-armv.S:1,234 0xffff000c	
	__pabt_svc() at entry-armv.S:319 0xc010cd84	
Core#2:
	arch_spin_lock() at spinlock.h:77 0xc017bf38	
	do_raw_spin_lock() at spinlock_debug.c:115 0xc017bf38	
	__raw_spin_lock() at spinlock_api_smp.h:143 0xc0996158	
	_raw_spin_lock() at spinlock.c:155 0xc0996158	
	vprintk_emit() at printk.c:1,804 0xc0183a8c	
	vprintk_default() at printk.c:1,906 0xc0183e8c	
	vprintk_func() at printk_safe.c:382 0xc0185764	
	printk() at printk.c:1,939 0xc0184fc4	
	__warn() at panic.c:523 0xc0123754	
	warn_slowpath_fmt() at panic.c:567 0xc0123870	
Core#3:
	arch_spin_lock() at spinlock.h:77 0xc017bf38	
	do_raw_spin_lock() at spinlock_debug.c:115 0xc017bf38	
	__raw_spin_lock() at spinlock_api_smp.h:143 0xc0996158	
	_raw_spin_lock() at spinlock.c:155 0xc0996158	
	__task_rq_lock() at core.c:105 0xc015347c	
	rt_mutex_setprio() at core.c:3,792 0xc0157148	
	rt_mutex_adjust_prio() at rtmutex.c:380 0xc017a100	
	task_blocks_on_rt_mutex() at rtmutex.c:1,374 0xc017a780	
	rt_spin_lock_slowlock_locked() at rtmutex.c:1,061 0xc09941ec
	rt_spin_lock_slowlock() at rtmutex.c:1,116 0xc099445c	
	rt_spin_lock_fastlock() at rtmutex.c:979 0xc09966c4	
	rt_spin_lock() at rtmutex.c:1,146 0xc09966c4	
	skb_queue_tail() at skbuff.c:2,915 0xc076eb08	
	unix_stream_sendmsg() at af_unix.c:1,908 0xc081dcf8	
	sock_sendmsg_nosec() at socket.c:639 0xc0760214	
	sock_sendmsg() at socket.c:643 0xc0760214	
	sock_write_iter() at socket.c:912 0xc07602b8	
	call_write_iter() at fs.h:1,771 0xc0255c68	
	new_sync_write() at read_write.c:469 0xc0255c68	
	__vfs_write() at read_write.c:482 0xc0255c68	
	vfs_write() at read_write.c:544 0xc0255e2c	
	SYSC_write() at read_write.c:589 0xc0255fd4	
	SyS_write() at read_write.c:581 0xc0255fd4	
	0xc01074e0	

Most of the time the call-stacks start at SyS_write() or SyS_read()
from hackbench.

Some things I found out by testing on v4.9:
- minimal test to reproduce the problem: "while true; do hackbench -g 100 -l 1000; done &"
- reproducible with "hackbench -T" (threads)
- reproducible only on iMX6Q, not (yet) on iMX6S, iMX6D
- NOT reproducible with "hackbench -p" (pipes)

As that might be pointing towards the streaming unix socketpair
hackbench is using from multiple forked processes, I had a look at
net/unix/af_unix.c and wondered why unix_stream_sendmsg() doesn't
increase the reference count on the "other" socket the same way
unix_dgram_sendmsg() does. I don't see a reason why "other" is handled
differently in the two functions, so it smells fishy to me. But I'm
not familiar with the net code, so maybe you could review whether the
diff below makes sense:

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 7f46bab4ce5c..5f2ca91bc54d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1854,7 +1854,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		goto out_err;
 	} else {
 		err = -ENOTCONN;
-		other = unix_peer(sk);
+		other = unix_peer_get(sk);
 		if (!other)
 			goto out_err;
 	}
@@ -1910,7 +1910,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		other->sk_data_ready(other);
 		sent += size;
 	}
-
+	sock_put(other);
 	scm_destroy(&scm);
 
 	return sent;
@@ -1923,6 +1923,8 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		send_sig(SIGPIPE, current, 0);
 	err = -EPIPE;
 out_err:
+	if (other)
+		sock_put(other);
 	scm_destroy(&scm);
 	return sent ? : err;
 }
@@ -1934,13 +1936,13 @@ static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
 	bool send_sigpipe = false;
 	bool init_scm = true;
 	struct scm_cookie scm;
-	struct sock *other, *sk = socket->sk;
+	struct sock *other = NULL, *sk = socket->sk;
 	struct sk_buff *skb, *newskb = NULL, *tail = NULL;
 
 	if (flags & MSG_OOB)
 		return -EOPNOTSUPP;
 
-	other = unix_peer(sk);
+	other = unix_peer_get(sk);
 	if (!other || sk->sk_state != TCP_ESTABLISHED)
 		return -ENOTCONN;
 
@@ -2027,6 +2029,7 @@ static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
 	mutex_unlock(&unix_sk(other)->iolock);
 
 	other->sk_data_ready(other);
+	sock_put(other);
 	scm_destroy(&scm);
 	return size;
 
@@ -2038,6 +2041,8 @@ static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
 	kfree_skb(newskb);
 	if (send_sigpipe && !(flags & MSG_NOSIGNAL))
 		send_sig(SIGPIPE, current, 0);
+	if (other)
+		sock_put(other);
 	if (!init_scm)
 		scm_destroy(&scm);
 	return err;
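
For reference, the difference between the two accessors (copied
roughly from net/unix/af_unix.c from memory; treat it as a sketch, I
didn't re-check it against v4.14 exactly) is that unix_peer_get()
takes a counted reference under the peer state lock, while unix_peer()
is a bare pointer read:

  static struct sock *unix_peer_get(struct sock *s)
  {
          struct sock *peer;

          unix_state_lock(s);
          peer = unix_peer(s);            /* bare pointer read */
          if (peer)
                  sock_hold(peer);        /* pin the peer for the caller */
          unix_state_unlock(s);
          return peer;
  }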

> Sebastian

Thanks,
Henri



* Re: inconsistent lock state on v4.14.20-rt17
  2018-03-08 17:38       ` Roosen Henri
@ 2018-03-08 18:00         ` bigeasy
  2018-03-09  9:47           ` Roosen Henri
  0 siblings, 1 reply; 10+ messages in thread
From: bigeasy @ 2018-03-08 18:00 UTC (permalink / raw)
  To: Roosen Henri; +Cc: linux-rt-users

On 2018-03-08 17:38:59 [+0000], Roosen Henri wrote:
> > Is the backtrace that you receive from lockdep always the same, or
> > is it different sometimes?
> 
> It is different each time, so my gut feeling tells me it might be
> memory corruption of some kind, maybe caused by a use-after-free.

CONFIG_SLUB_DEBUG_ON should (or could) catch this.
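
(From memory, so double-check the exact names: it can be enabled at
build time, or without a rebuild via the kernel command line.)

  # build time:
  CONFIG_SLUB_DEBUG=y
  CONFIG_SLUB_DEBUG_ON=y

  # or at boot: sanity checks, red zoning, poisoning, user tracking
  slub_debug=FZPU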

> I restarted the target yesterday evening and this morning it was
> frozen without any trace on the terminal. Attaching a JTAG debugger
> showed different call-stacks than yesterday: core #2 (trying to print
> the info to the terminal) and core #3 were spinning on a spin-lock; I
> don't understand what cores #0 and #1 were doing.

Maybe #0 and #1 are idle, but #2 and #3 should make progress. #2 looks
like a warning; do you know where it is from, or is this everything
you get? Unless the warning comes from an atomic context, you should
see something on the UART.

> Most of the time the call-stacks start at SyS_write() or SyS_read()
> from hackbench.

But what you posted was lockdep complaining about the rq lock.

> Some things I found out by testing on v4.9:
> - minimal test to reproduce the problem: "while true; do hackbench -g 100 -l 1000; done &"
> - reproducible with "hackbench -T" (threads)
> - reproducible only on iMX6Q, not (yet) on iMX6S, iMX6D
> - NOT reproducible with "hackbench -p" (pipes)

Interesting.

> As that might be pointing towards the streaming unix socketpair
> hackbench is using from multiple forked processes, I had a look at
> net/unix/af_unix.c and wondered why unix_stream_sendmsg() doesn't
> increase the reference count on the "other" socket the same way
> unix_dgram_sendmsg() does. I don't see a reason why "other" is
> handled differently in the two functions, so it smells fishy to me.
> But I'm not familiar with the net code, so maybe you could review
> whether the diff below makes sense:

Commit 830a1e5c212f ("[AF_UNIX]: Remove superfluous reference counting
in unix_stream_sendmsg") claims that this is not required. But if your
patch makes a difference then…

> Thanks,
> Henri

Sebastian


* Re: inconsistent lock state on v4.14.20-rt17
  2018-03-08 18:00         ` bigeasy
@ 2018-03-09  9:47           ` Roosen Henri
  2018-03-14 19:55             ` bigeasy
  0 siblings, 1 reply; 10+ messages in thread
From: Roosen Henri @ 2018-03-09  9:47 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users

On Thu, 2018-03-08 at 19:00 +0100, bigeasy@linutronix.de wrote:
> On 2018-03-08 17:38:59 [+0000], Roosen Henri wrote:
> > > Is the backtrace that you receive from lockdep always the same,
> > > or is it different sometimes?
> > 
> > It is different each time, so my gut feeling tells me it might be
> > memory corruption of some kind, maybe caused by a use-after-free.
> 
> CONFIG_SLUB_DEBUG_ON should (or could) catch this.

Thanks for pointing that out! I'll enable it for the next test run.
If there are more debug options worth switching on, please let me
know.

> 
> > I restarted the target yesterday evening and this morning it was
> > frozen without any trace on the terminal. Attaching a JTAG debugger
> > showed different call-stacks than yesterday: core #2 (trying to
> > print the info to the terminal) and core #3 were spinning on a
> > spin-lock; I don't understand what cores #0 and #1 were doing.
> 
> Maybe #0 and #1 are idle, but #2 and #3 should make progress. #2
> looks like a warning; do you know where it is from, or is this
> everything you get? Unless the warning comes from an atomic context,
> you should see something on the UART.

#2 and #3 were not making progress; they kept spinning in
arch_spin_lock().

> 
> > Most of the time the call-stacks start at SyS_write() or
> > SyS_read() from hackbench.
> 
> But what you posted was lockdep complaining about the rq lock.

Well, actually I reported that "since 4.9 we've been chasing random
kernel crashes", and v4.14 has now caught an inconsistent lock state.
The hope was that the trace for the inconsistent lock state would
point to the root cause of the random kernel crashes.

> 
> > Some things I found out by testing on v4.9:
> > - minimal test to reproduce the problem: "while true; do hackbench -g 100 -l 1000; done &"
> > - reproducible with "hackbench -T" (threads)
> > - reproducible only on iMX6Q, not (yet) on iMX6S, iMX6D
> > - NOT reproducible with "hackbench -p" (pipes)
> 
> Interesting.
> 
> > As that might be pointing towards the streaming unix socketpair
> > hackbench is using from multiple forked processes, I had a look at
> > net/unix/af_unix.c and wondered why unix_stream_sendmsg() doesn't
> > increase the reference count on the "other" socket the same way
> > unix_dgram_sendmsg() does. I don't see a reason why "other" is
> > handled differently in the two functions, so it smells fishy to me.
> > But I'm not familiar with the net code, so maybe you could review
> > whether the diff below makes sense:
> 
> Commit 830a1e5c212f ("[AF_UNIX]: Remove superfluous reference
> counting in unix_stream_sendmsg") claims that this is not required.
> But if your patch makes a difference then…

Okay, I didn't know the refcounting could be safely removed. The
overnight test with the change reproduced the inconsistent lock state
again, which indeed proves it makes no difference.

> 
> Sebastian

Thanks,
Henri


* Re: inconsistent lock state on v4.14.20-rt17
  2018-03-09  9:47           ` Roosen Henri
@ 2018-03-14 19:55             ` bigeasy
  2018-03-16 10:30               ` bigeasy
  0 siblings, 1 reply; 10+ messages in thread
From: bigeasy @ 2018-03-14 19:55 UTC (permalink / raw)
  To: Roosen Henri; +Cc: linux-rt-users

On 2018-03-09 09:47:16 [+0000], Roosen Henri wrote:
> > Maybe #0 and #1 are idle, but #2 and #3 should make progress. #2
> > looks like a warning; do you know where it is from, or is this
> > everything you get? Unless the warning comes from an atomic
> > context, you should see something on the UART.
> 
> #2 and #3 were not making progress; they kept spinning in
> arch_spin_lock().

It looks like those two are different locks. So someone should own
each of them (and be running) and unlock once done.

> > > Some things I found out by testing on v4.9:
> > > - minimal test to reproduce the problem: "while true; do hackbench -g 100 -l 1000; done &"
> > > - reproducible with "hackbench -T" (threads)
> > > - reproducible only on iMX6Q, not (yet) on iMX6S, iMX6D
> > > - NOT reproducible with "hackbench -p" (pipes)

|Running in process mode with 100 groups using 40 file descriptors each (== 4000 tasks)
|Each sender will pass 1000 messages of 100 bytes
|Time: 552.224
|Running in process mode with 100 groups using 40 file descriptors each (== 4000 tasks)
|Each sender will pass 1000 messages of 100 bytes
|Signal 2 caught, longjmp'ing out!
|longjmp'ed out, reaping children
|sending SIGTERM to all child processes
|signaling 4000 worker threads to terminate
|^C^CTime: 476.063
|~# uptime
| 19:50:19 up 6 days,  4:11,  1 user,  load average: 0.01, 0.01, 0.10

This survived 6 days. It was challenging to cancel hackbench because
it was running via ssh in a screen session and I was losing my network
connection due to inactivity…

> Thanks,
> Henri

Sebastian


* Re: inconsistent lock state on v4.14.20-rt17
  2018-03-14 19:55             ` bigeasy
@ 2018-03-16 10:30               ` bigeasy
  2018-03-17 21:03                 ` bigeasy
  0 siblings, 1 reply; 10+ messages in thread
From: bigeasy @ 2018-03-16 10:30 UTC (permalink / raw)
  To: Roosen Henri; +Cc: linux-rt-users

On 2018-03-14 20:55:24 [+0100], To Roosen Henri wrote:
> This survived 6 days. It was challenging to cancel hackbench because
> it was running via ssh in a screen session and I was losing my
> network connection due to inactivity…

Okay. It may have survived, but it managed to trigger this. Trying to
make sense of this…

> > Thanks,
> > Henri

Sebastian


* Re: inconsistent lock state on v4.14.20-rt17
  2018-03-16 10:30               ` bigeasy
@ 2018-03-17 21:03                 ` bigeasy
  2018-03-21  8:31                   ` Roosen Henri
  0 siblings, 1 reply; 10+ messages in thread
From: bigeasy @ 2018-03-17 21:03 UTC (permalink / raw)
  To: Roosen Henri; +Cc: linux-rt-users

On 2018-03-16 11:30:09 [+0100], To Roosen Henri wrote:
> On 2018-03-14 20:55:24 [+0100], To Roosen Henri wrote:
> > This survived 6 days. It was challenging to cancel hackbench
> > because it was running via ssh in a screen session and I was losing
> > my network connection due to inactivity…
> 
> Okay. It may have survived, but it managed to trigger this. Trying to
> make sense of this…

So after I traced it down I figured out that you need
  6b0ef92fee2a ("rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites")
  https://git.kernel.org/tip/6b0ef92fee2a
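
If I read the commit right, it is the usual irq-state bug class: an
unconditional _irq unlock re-enables interrupts even for callers that
had them disabled. A sketch of the pattern (not the literal rtmutex
code):

  /* buggy for irq-off callsites: */
  raw_spin_lock_irq(&wait_lock);
  /* ... */
  raw_spin_unlock_irq(&wait_lock);      /* unconditionally enables irqs */

  /* safe variant -- restores the caller's irq state: */
  unsigned long flags;

  raw_spin_lock_irqsave(&wait_lock, flags);
  /* ... */
  raw_spin_unlock_irqrestore(&wait_lock, flags);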

> > > Thanks,
> > > Henri

Sebastian


* Re: inconsistent lock state on v4.14.20-rt17
  2018-03-17 21:03                 ` bigeasy
@ 2018-03-21  8:31                   ` Roosen Henri
  0 siblings, 0 replies; 10+ messages in thread
From: Roosen Henri @ 2018-03-21  8:31 UTC (permalink / raw)
  To: bigeasy; +Cc: linux-rt-users




On Sat, 2018-03-17 at 22:03 +0100, bigeasy@linutronix.de wrote:
> On 2018-03-16 11:30:09 [+0100], To Roosen Henri wrote:
> > On 2018-03-14 20:55:24 [+0100], To Roosen Henri wrote:
> > > This survived 6 days. It was challenging to cancel hackbench
> > > because it was running via ssh in a screen session and I was
> > > losing my network connection due to inactivity…
> >
> > Okay. It may have survived, but it managed to trigger this. Trying
> > to make sense of this…
>
> So after I traced it down I figured out that you need
>   6b0ef92fee2a ("rtmutex: Make rt_mutex_futex_unlock() safe for irq-
> off callsites")
>   https://git.kernel.org/tip/6b0ef92fee2a
>

Sorry for not being able to reply earlier; I was on a short holiday.
Thanks for figuring out the fix! I noticed you released v4.14.28-rt23,
which includes the patch. I will give that version a test on our
systems.

As the inconsistent lock state issue was introduced only in v4.14, it
cannot be the issue we've seen on v4.9. However, that one might have
been fixed along the way to v4.14... I will keep you posted on the
outcome.

> > > > Thanks,
> > > > Henri
>
> Sebastian

Thanks,
Henri





