* 3.6-rc1 IB complaint
@ 2012-08-07 16:48 Bart Van Assche
From: Bart Van Assche @ 2012-08-07 16:48 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hello,

Has anyone else seen the ugly kernel message below? It is generated
during boot and prevents my IB HCA from coming up properly with
3.6-rc1. This did not happen with kernel 3.5.

=================================
[ INFO: inconsistent lock state ]
3.6.0-rc1-debug+ #1 Not tainted
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/1/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
 (&(&ibdev->sm_lock)->rlock){?.+...}, at: [<ffffffffa0328df4>] update_sm_ah+0x94/0xd0 [mlx4_ib]
{HARDIRQ-ON-W} state was registered at:
  [<ffffffff81095e8a>] __lock_acquire+0x66a/0x1ca0
  [<ffffffff81097ac5>] lock_acquire+0x95/0x130
  [<ffffffff8140e815>] _raw_spin_lock+0x45/0x80
  [<ffffffffa0329b6b>] mlx4_ib_process_mad+0x58b/0x7a0 [mlx4_ib]
  [<ffffffffa03178be>] ib_post_send_mad+0x34e/0x6d0 [ib_mad]
  [<ffffffffa033afc5>] ib_umad_write+0x515/0x630 [ib_umad]
  [<ffffffff8114e41e>] vfs_write+0xce/0x170
  [<ffffffff8114e724>] sys_write+0x54/0xa0
  [<ffffffff81417692>] system_call_fastpath+0x16/0x1b
irq event stamp: 306104
hardirqs last  enabled at (306101): [<ffffffff8100ae75>] mwait_idle+0x95/0x180
hardirqs last disabled at (306102): [<ffffffff8140f5e7>] common_interrupt+0x67/0x6c
softirqs last  enabled at (306104): [<ffffffff81045793>] _local_bh_enable+0x13/0x20
softirqs last disabled at (306103): [<ffffffff81046045>] irq_enter+0x75/0x90

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&ibdev->sm_lock)->rlock);
  <Interrupt>
    lock(&(&ibdev->sm_lock)->rlock);

 *** DEADLOCK ***

1 lock held by swapper/1/0:
 #0:  (&(&priv->ctx_lock)->rlock){-.....}, at: [<ffffffffa02472c9>] mlx4_dispatch_event+0x39/0x90 [mlx4_core]

stack backtrace:
Pid: 0, comm: swapper/1 Not tainted 3.6.0-rc1-debug+ #1
Call Trace:
 <IRQ>  [<ffffffff81095429>] print_usage_bug+0x219/0x220
 [<ffffffff8109579f>] mark_lock+0x36f/0x3f0
 [<ffffffff8109602a>] __lock_acquire+0x80a/0x1ca0
 [<ffffffff81097ac5>] lock_acquire+0x95/0x130
 [<ffffffffa0328df4>] ? update_sm_ah+0x94/0xd0 [mlx4_ib]
 [<ffffffffa02f492b>] ? rdma_port_get_link_layer+0x1b/0x40 [ib_core]
 [<ffffffff8140e815>] _raw_spin_lock+0x45/0x80
 [<ffffffffa0328df4>] ? update_sm_ah+0x94/0xd0 [mlx4_ib]
 [<ffffffffa02f42aa>] ? ib_create_ah+0x1a/0x40 [ib_core]
 [<ffffffffa0328df4>] update_sm_ah+0x94/0xd0 [mlx4_ib]
 [<ffffffffa032957b>] handle_port_mgmt_change_event+0xeb/0x150 [mlx4_ib]
 [<ffffffffa0329ed0>] mlx4_ib_event+0x120/0x170 [mlx4_ib]
 [<ffffffff8140e9f3>] ? _raw_spin_lock_irqsave+0x83/0xa0
 [<ffffffffa02472c9>] ? mlx4_dispatch_event+0x39/0x90 [mlx4_core]
 [<ffffffffa02472fc>] mlx4_dispatch_event+0x6c/0x90 [mlx4_core]
 [<ffffffffa0241a80>] mlx4_eq_int+0x4d0/0x920 [mlx4_core]
 [<ffffffff8107673f>] ? local_clock+0x4f/0x60
 [<ffffffffa0241ee4>] mlx4_msi_x_interrupt+0x14/0x20 [mlx4_core]
 [<ffffffff810bd215>] handle_irq_event_percpu+0x75/0x230
 [<ffffffff810bd41e>] handle_irq_event+0x4e/0x80
 [<ffffffff810bfd55>] handle_edge_irq+0x85/0x130
 [<ffffffff81004375>] handle_irq+0x25/0x40
 [<ffffffff81418ddd>] do_IRQ+0x5d/0xe0
 [<ffffffff8140f5ec>] common_interrupt+0x6c/0x6c
 <EOI>  [<ffffffff8100ae7e>] ? mwait_idle+0x9e/0x180
 [<ffffffff8100ae75>] ? mwait_idle+0x95/0x180
 [<ffffffff8100b7a6>] cpu_idle+0xa6/0xe0
 [<ffffffff8140777d>] start_secondary+0x204/0x206
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: 3.6-rc1 IB complaint
@ 2012-08-08 12:08   ` Jack Morgenstein
From: Jack Morgenstein @ 2012-08-08 12:08 UTC
  To: Bart Van Assche; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier

Hi Bart,

I submitted a patch to Roland on August 3 (along with SRIOV-IB V2) to fix this:
 
[PATCH] IB/mlx4: fix possible deadlock with sm_lock spinlock

I notice that you tested out the fix and it worked.

Roland, please take the patch and submit it to Linus. It fixes a bug in
the upstream 3.6-rc1 code.

Thanks!

-Jack
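
The patch itself is not quoted in this thread. As a rough illustration of
why lockdep objects to the pattern in the report, here is a toy userspace
C model of the two usage states it names, {HARDIRQ-ON-W} and
{IN-HARDIRQ-W}. Everything in it (toy_lock, toy_acquire and the two
scenario functions) is a hypothetical name made up for this sketch; none
of it is kernel API or the actual mlx4_ib change. It assumes the usual
remedy for this class of report, which is to take the lock with
interrupts disabled in process context, as spin_lock_irqsave() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model (not kernel code) of the two lock usage states
 * named in the report. */
struct toy_lock {
    bool hardirq_on_w;  /* taken at least once with hardirqs enabled */
    bool in_hardirq_w;  /* taken at least once from hardirq context */
};

/* Record one acquisition. Returns true while usage stays consistent,
 * false once the lock has been seen in both states. */
static bool toy_acquire(struct toy_lock *l, bool in_hardirq,
                        bool hardirqs_enabled)
{
    assert(l != NULL);
    if (in_hardirq)
        l->in_hardirq_w = true;
    else if (hardirqs_enabled)
        l->hardirq_on_w = true;
    return !(l->hardirq_on_w && l->in_hardirq_w);
}

/* Pattern in the trace: the lock is taken in process context with
 * hardirqs still enabled, then again from an interrupt handler. */
static bool toy_plain_lock_scenario(void)
{
    struct toy_lock sm_lock = { false, false };
    bool ok = toy_acquire(&sm_lock, false, true);   /* write() path */
    ok = toy_acquire(&sm_lock, true, false) && ok;  /* hardirq path */
    return ok;
}

/* Same two paths, but process context disables hardirqs before
 * taking the lock, so an interrupt cannot arrive while it is held. */
static bool toy_irqsave_scenario(void)
{
    struct toy_lock sm_lock = { false, false };
    bool ok = toy_acquire(&sm_lock, false, false);  /* irqs off first */
    ok = toy_acquire(&sm_lock, true, false) && ok;  /* hardirq path */
    return ok;
}
```

In this model toy_plain_lock_scenario() reports an inconsistency and
toy_irqsave_scenario() does not, which mirrors the "Possible unsafe
locking scenario" above: an interrupt arriving on the CPU that already
holds the lock would spin on it forever.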

