* crash unloading mlx4 in 3.2-rc3
@ 2011-12-06 17:38 Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A82373256536F5-P5GAC/sN6hlcIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Hefty, Sean @ 2011-12-06 17:38 UTC (permalink / raw)
To: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
I'm still debugging this, but I see the following crash in 3.2-rc3 when unloading the IB stack. I unload the drivers immediately after the system finishes booting.
- Sean
Dec 5 18:23:34 cst-lin0 kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
Dec 5 18:23:34 cst-lin0 kernel: IP: [<ffffffff812455e4>] bitmap_clear+0xa4/0xd0
Dec 5 18:23:34 cst-lin0 kernel: PGD 333fae067 PUD 3341b9067 PMD 0
Dec 5 18:23:34 cst-lin0 kernel: Oops: 0002 [#1] SMP
Dec 5 18:23:34 cst-lin0 kernel: CPU 0
Dec 5 18:23:34 cst-lin0 kernel: Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT bridge stp llc autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 xt_physdev iptable_filter ip_tables dm_mirror dm_region_hash dm_log kvm_intel kvm uinput serio_raw pcspkr sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e ioatdma dca i7core_edac edac_core mlx4_ib(-) ib_mad ib_core mlx4_en mlx4_core ext4 mbcache jbd2 sd_mod crc_t10dif ahci libahci dm_mod [last unloaded: microcode]
Dec 5 18:23:34 cst-lin0 kernel:
Dec 5 18:23:34 cst-lin0 kernel: Pid: 6883, comm: rmmod Not tainted 3.2.0-rc3 #1 Intel Corporation S5500HV/S5500HV
Dec 5 18:23:34 cst-lin0 kernel: RIP: 0010:[<ffffffff812455e4>] [<ffffffff812455e4>] bitmap_clear+0xa4/0xd0
Dec 5 18:23:34 cst-lin0 kernel: RSP: 0018:ffff880335089de0 EFLAGS: 00010286
Dec 5 18:23:34 cst-lin0 kernel: RAX: 7fffffffffffffff RBX: ffff880334430940 RCX: 0000000000000001
Dec 5 18:23:34 cst-lin0 kernel: RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
Dec 5 18:23:34 cst-lin0 kernel: RBP: ffff880335089de8 R08: ffffffffffffffff R09: 0000000000000000
Dec 5 18:23:34 cst-lin0 kernel: R10: 00000000ffffffc0 R11: ffffffff81deb300 R12: 00000000ffffffff
Dec 5 18:23:34 cst-lin0 kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
Dec 5 18:23:34 cst-lin0 kernel: FS: 00007f65b2443700(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
Dec 5 18:23:34 cst-lin0 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 5 18:23:34 cst-lin0 kernel: CR2: 0000000000000000 CR3: 0000000333a7b000 CR4: 00000000000006f0
Dec 5 18:23:34 cst-lin0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 5 18:23:34 cst-lin0 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 5 18:23:34 cst-lin0 kernel: Process rmmod (pid: 6883, threadinfo ffff880335088000, task ffff880333e0d520)
Dec 5 18:23:34 cst-lin0 kernel: Stack:
Dec 5 18:23:34 cst-lin0 kernel: ffff880334430940 ffff880335089e18 ffffffffa00ed176 ffffc9001260c000
Dec 5 18:23:34 cst-lin0 kernel: 0000000000000001 ffff880334430000 ffff8801b4a68000 ffff880335089e28
Dec 5 18:23:34 cst-lin0 kernel: ffffffffa00ed1c3 ffff880335089e38 ffffffffa00f5ba5 ffff880335089e68
Dec 5 18:23:34 cst-lin0 kernel: Call Trace:
Dec 5 18:23:34 cst-lin0 kernel: [<ffffffffa00ed176>] mlx4_bitmap_free_range+0x46/0x80 [mlx4_core]
Dec 5 18:23:34 cst-lin0 kernel: [<ffffffffa00ed1c3>] mlx4_bitmap_free+0x13/0x20 [mlx4_core]
Dec 5 18:23:34 cst-lin0 kernel: [<ffffffffa00f5ba5>] mlx4_counter_free+0x15/0x20 [mlx4_core]
Dec 5 18:23:34 cst-lin0 kernel: [<ffffffffa0127b82>] mlx4_ib_remove+0x82/0x120 [mlx4_ib]
Dec 5 18:23:34 cst-lin0 kernel: [<ffffffffa00f376d>] mlx4_remove_device+0x6d/0x80 [mlx4_core]
Dec 5 18:23:34 cst-lin0 kernel: [<ffffffffa00f3843>] mlx4_unregister_interface+0x43/0x80 [mlx4_core]
Dec 5 18:23:34 cst-lin0 kernel: [<ffffffffa012f448>] mlx4_ib_cleanup+0x10/0x1e [mlx4_ib]
Dec 5 18:23:34 cst-lin0 kernel: [<ffffffff8109d370>] sys_delete_module+0x1a0/0x270
Dec 5 18:23:34 cst-lin0 kernel: [<ffffffff814e1382>] system_call_fastpath+0x16/0x1b
Dec 5 18:23:34 cst-lin0 kernel: Code: 7c df 08 48 c7 c0 ff ff ff ff c1 e1 06 41 29 ca 44 89 d1 85 c9 74 17 01 f2 49 c7 c0 ff ff ff ff f6 c2 3f 75 12 4c 21 c0 48 f7 d0
Dec 5 18:23:34 cst-lin0 kernel: RIP [<ffffffff812455e4>] bitmap_clear+0xa4/0xd0
Dec 5 18:23:34 cst-lin0 kernel: RSP <ffff880335089de0>
Dec 5 18:23:34 cst-lin0 kernel: CR2: 0000000000000000
Dec 5 18:23:34 cst-lin0 kernel: ---[ end trace 9f46d5b6f4fe8152 ]---
Dec 5 18:25:01 cst-lin0 abrt: Kerneloops: Reported 1 kernel oopses to Abrt
Dec 5 18:25:01 cst-lin0 abrtd: Directory 'kerneloops-1323138301-5798-1' creation detected
Dec 5 18:25:01 cst-lin0 abrtd: New crash /var/spool/abrt/kerneloops-1323138301-5798-1, processing
Dec 5 18:25:02 cst-lin0 kernel: The scan_unevictable_pages sysctl/node-interface has been disabled for lack of a legitimate use case. If you have one, please send an email to linux-mm-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org
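One reading of the trace: the fault is in bitmap_clear, reached from mlx4_counter_free through mlx4_bitmap_free, and both RSI and R12 hold 0xffffffff, which is how -1 appears as a 32-bit value. The user-space sketch below is not the mlx4 code; `toy_bitmap`, `toy_bitmap_init`, and `toy_bitmap_free_checked` are hypothetical names. It only illustrates, assuming an allocator that tracks in-use indices in a bitmap, why a free routine handed such an index (or a table that was never set up) has nothing valid to clear.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Toy allocator state: one bit per index, set while the index is in use. */
struct toy_bitmap {
	unsigned long *table; /* NULL if the bitmap was never initialized */
	uint32_t max;         /* number of valid indices: 0 .. max - 1 */
};

/* Allocate a zeroed table covering `max` indices. Returns 0 or -1. */
int toy_bitmap_init(struct toy_bitmap *bm, uint32_t max)
{
	size_t words = (max + BITS_PER_WORD - 1) / BITS_PER_WORD;

	assert(bm != NULL);
	bm->table = calloc(words ? words : 1, sizeof(unsigned long));
	if (!bm->table)
		return -1;
	bm->max = max;
	return 0;
}

/*
 * Checked free: refuse a missing table or an out-of-range index
 * instead of writing through a bad address. (uint32_t)-1 is always
 * out of range here because it is >= any possible `max`.
 */
int toy_bitmap_free_checked(struct toy_bitmap *bm, uint32_t obj)
{
	if (!bm->table || obj >= bm->max)
		return -1;
	bm->table[obj / BITS_PER_WORD] &= ~(1UL << (obj % BITS_PER_WORD));
	return 0;
}
```

An unchecked version of the free routine would compute a word address from whatever index it is given, which is the kind of write the oops reports.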
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: crash unloading mlx4 in 3.2-rc3
[not found] ` <1828884A29C6694DAF28B7E6B8A82373256536F5-P5GAC/sN6hlcIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2011-12-06 17:50 ` Roland Dreier
2011-12-06 17:57 ` Roland Dreier
1 sibling, 0 replies; 6+ messages in thread
From: Roland Dreier @ 2011-12-06 17:50 UTC (permalink / raw)
To: Hefty, Sean
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
On Tue, Dec 6, 2011 at 9:38 AM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> Dec 5 18:23:34 cst-lin0 kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
3.1 was OK?
- R.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: crash unloading mlx4 in 3.2-rc3
[not found] ` <1828884A29C6694DAF28B7E6B8A82373256536F5-P5GAC/sN6hlcIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-12-06 17:50 ` Roland Dreier
@ 2011-12-06 17:57 ` Roland Dreier
[not found] ` <CAL1RGDUkZHZqbLgOBE0PFK7xCV84f1Q-1gYH01p7UKiTYEPRBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 1 reply; 6+ messages in thread
From: Roland Dreier @ 2011-12-06 17:57 UTC (permalink / raw)
To: Hefty, Sean
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
[-- Attachment #1: Type: text/plain, Size: 150 bytes --]
Does something like the attached (not even compile tested) help?
(sorry for the email format, just going quick and dirty with webmail
at the moment)
[-- Attachment #2: mlx4-counter.txt --]
[-- Type: text/plain, Size: 929 bytes --]
drivers/infiniband/hw/mlx4/main.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 77f3dbc..18836cd 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1244,7 +1244,8 @@ err_reg:
 err_counter:
 	for (; i; --i)
-		mlx4_counter_free(ibdev->dev, ibdev->counters[i - 1]);
+		if (ibdev->counters[i - 1] != -1)
+			mlx4_counter_free(ibdev->dev, ibdev->counters[i - 1]);
 err_map:
 	iounmap(ibdev->uar_map);
@@ -1275,7 +1276,8 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr)
 	}
 	iounmap(ibdev->uar_map);
 	for (p = 0; p < ibdev->num_ports; ++p)
-		mlx4_counter_free(ibdev->dev, ibdev->counters[p]);
+		if (ibdev->counters[p] != -1)
+			mlx4_counter_free(ibdev->dev, ibdev->counters[p]);
 	mlx4_foreach_port(p, dev, MLX4_PORT_TYPE_IB)
 		mlx4_CLOSE_PORT(dev, p);
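The guard added in the second hunk can be tried outside the kernel. The following is a minimal sketch, assuming (as the `!= -1` test suggests) that -1 marks a port that holds no counter. `NO_COUNTER`, `record_counter_free`, and `release_port_counters` are hypothetical names; `record_counter_free` merely stands in for mlx4_counter_free and records what it was asked to free.

```c
#include <assert.h>
#include <stddef.h>

#define NO_COUNTER (-1) /* assumed sentinel, mirroring the patch's "!= -1" */
#define MAX_LOG 8

int freed_log[MAX_LOG]; /* indices passed to the stand-in free routine */
size_t freed_count;

/* Stand-in for the real free routine: it must never see the sentinel. */
void record_counter_free(int idx)
{
	assert(idx >= 0 && freed_count < MAX_LOG);
	freed_log[freed_count++] = idx;
}

/* Teardown loop with the guard: skip ports that never got a counter. */
size_t release_port_counters(const int *counters, size_t num_ports)
{
	size_t released = 0;

	for (size_t p = 0; p < num_ports; ++p) {
		if (counters[p] != NO_COUNTER) {
			record_counter_free(counters[p]);
			++released;
		}
	}
	return released;
}
```

Without the `counters[p] != NO_COUNTER` test, the sentinel would reach the free routine and trip its assertion, which is the user-space analogue of handing -1 to the bitmap code.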
^ permalink raw reply related [flat|nested] 6+ messages in thread
* RE: crash unloading mlx4 in 3.2-rc3
[not found] ` <CAL1RGDUkZHZqbLgOBE0PFK7xCV84f1Q-1gYH01p7UKiTYEPRBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-06 18:16 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A823732565377F-P5GAC/sN6hlcIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Hefty, Sean @ 2011-12-06 18:16 UTC (permalink / raw)
To: Roland Dreier
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
> Does something like the attached (not even compile tested) help?
I no longer see the crash using that patch (as-is). Thanks.
I don't know if this occurs on 3.1 without re-installing that kernel.
- Sean
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: crash unloading mlx4 in 3.2-rc3
[not found] ` <1828884A29C6694DAF28B7E6B8A823732565377F-P5GAC/sN6hlcIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2011-12-06 18:19 ` Roland Dreier
[not found] ` <CAL1RGDWE1HkzWnrU0+Eke-Q2nmxc6jRaoB85s3hnEmEQs3xR6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Roland Dreier @ 2011-12-06 18:19 UTC (permalink / raw)
To: Hefty, Sean
Cc: linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
On Tue, Dec 6, 2011 at 10:16 AM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> I don't know if this occurs on 3.1 without re-installing that kernel.
Yeah, no worries, since that patch works I think I understand the
problem. And it was in 3.1 AFAICT too... came in with cfcde11c3d7a
("IB/mlx4: Use flow counters on IBoE ports") which was in 3.1.
I'll queue up my fix, thanks for testing.
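The commit named above is not quoted in this thread, so the following is only an illustration under assumptions: that counters are allocated for IBoE (Ethernet) ports only, and that every other port, or a port whose allocation failed, is recorded as -1. `fake_pool`, `pool_counter_alloc`, `pool_counter_free`, and `setup_ports` are hypothetical names. The unwind loop has the same shape as the first hunk of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_COUNTER (-1) /* assumed sentinel for "no counter on this port" */

/* Hypothetical pool: hands out 0, 1, 2, ... until `limit` is reached. */
struct fake_pool {
	int next;
	int limit;
	int live; /* counters currently allocated */
};

int pool_counter_alloc(struct fake_pool *pool, int *idx)
{
	if (pool->next >= pool->limit)
		return -1;
	*idx = pool->next++;
	pool->live++;
	return 0;
}

void pool_counter_free(struct fake_pool *pool, int idx)
{
	assert(idx != NO_COUNTER); /* the sentinel must never be freed */
	pool->live--;
}

/*
 * Give each Ethernet port a counter; mark all other ports (and failed
 * allocations) with the sentinel. If a later setup step fails
 * (simulated by `fail_late`), unwind and skip the sentinel entries.
 */
int setup_ports(struct fake_pool *pool, const bool *is_eth,
		int *counters, size_t n, bool fail_late)
{
	size_t i;

	for (i = 0; i < n; ++i) {
		if (!is_eth[i] || pool_counter_alloc(pool, &counters[i]))
			counters[i] = NO_COUNTER;
	}
	if (!fail_late)
		return 0;

	for (; i; --i)
		if (counters[i - 1] != NO_COUNTER)
			pool_counter_free(pool, counters[i - 1]);
	return -1;
}
```

On a device with no Ethernet ports every entry is the sentinel, so an unguarded teardown would try to free -1 once per port.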
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: crash unloading mlx4 in 3.2-rc3
[not found] ` <CAL1RGDWE1HkzWnrU0+Eke-Q2nmxc6jRaoB85s3hnEmEQs3xR6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-12-06 19:11 ` Or Gerlitz
0 siblings, 0 replies; 6+ messages in thread
From: Or Gerlitz @ 2011-12-06 19:11 UTC (permalink / raw)
To: Roland Dreier
Cc: Hefty, Sean,
linux-rdma (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org> wrote:
> Yeah, no worries, since that patch works I think I understand the
> problem. And it was in 3.1 AFAICT too... came in with cfcde11c3d7a
> ("IB/mlx4: Use flow counters on IBoE ports") which was in 3.1.
> I'll queue up my fix, thanks for testing.
mmm, so it's an A0 thing... should be testing a bit harder next time,
thanks for the quick debugging
Or.
drivers/infiniband/hw/mlx4/main.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 77f3dbc..18836cd 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1244,7 +1244,8 @@ err_reg:
 err_counter:
 	for (; i; --i)
-		mlx4_counter_free(ibdev->dev, ibdev->counters[i - 1]);
+		if (ibdev->counters[i - 1] != -1)
+			mlx4_counter_free(ibdev->dev, ibdev->counters[i - 1]);
 err_map:
 	iounmap(ibdev->uar_map);
@@ -1275,7 +1276,8 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr)
 	}
 	iounmap(ibdev->uar_map);
 	for (p = 0; p < ibdev->num_ports; ++p)
-		mlx4_counter_free(ibdev->dev, ibdev->counters[p]);
+		if (ibdev->counters[p] != -1)
+			mlx4_counter_free(ibdev->dev, ibdev->counters[p]);
 	mlx4_foreach_port(p, dev, MLX4_PORT_TYPE_IB)
 		mlx4_CLOSE_PORT(dev, p);
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2011-12-06 19:11 UTC | newest]
Thread overview: 6+ messages
2011-12-06 17:38 crash unloading mlx4 in 3.2-rc3 Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A82373256536F5-P5GAC/sN6hlcIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-12-06 17:50 ` Roland Dreier
2011-12-06 17:57 ` Roland Dreier
[not found] ` <CAL1RGDUkZHZqbLgOBE0PFK7xCV84f1Q-1gYH01p7UKiTYEPRBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-06 18:16 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A823732565377F-P5GAC/sN6hlcIJlls4ac1rfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2011-12-06 18:19 ` Roland Dreier
[not found] ` <CAL1RGDWE1HkzWnrU0+Eke-Q2nmxc6jRaoB85s3hnEmEQs3xR6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-12-06 19:11 ` Or Gerlitz