* mlx4 module loading fail
@ 2013-03-07 11:18 Hudzia, Benoit
       [not found] ` <96353B6F8A3DAE4BBC51047BD0E6BAC20913A5-v0w1aZ/WxVLTw0Kyn31wWKuC/IaeJB0jHWlK3eZauXw@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Hudzia, Benoit @ 2013-03-07 11:18 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi,

I am currently experiencing some trouble with my ConnectX-2 cards.

I have been testing with a smallish server without any problem, and this week I upgraded to a beefier option. However, I am unable to set up the IB card with our current kernel.

The server specs are as follows:
	* 4x 10-core Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz stepping 02
	* 1TB of RAM
	* 1 ConnectX-2 IB card

Kernel version: 3.5.0

Note that if I downgrade to a 3.2 kernel, I do not experience this issue. However, I am forced to work with 3.5 or higher. Can somebody help me with that?
Thanks,
Benoit

Kernel log trace: 

Mar  7 03:12:27 bi-heca-02 kernel: [    7.423038] ------------[ cut here ]------------
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423049] WARNING: at mm/page_alloc.c:2298 __alloc_pages_nodemask+0x2b9/0x810()
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423050] Hardware name: QSSC-S4R
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423051] Modules linked in: joydev coretemp kvm_intel kvm microcode pcspkr ixgbe mlx4_core(+) igb mdio ioatdma i2c_i801 hid_generic lpc_ich i2c_core mfd_core dca tpm_tis tpm tpm_bios acpi_memhotplug evbug crc32c_intel megaraid_sas usbhid hid
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423078] Pid: 949, comm: modprobe Not tainted 3.5.0-heca-dev-34dd48a+ #29
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423079] Call Trace:
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423088]  [<ffffffff8104baef>] warn_slowpath_common+0x7f/0xc0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423091]  [<ffffffff8104bb4a>] warn_slowpath_null+0x1a/0x20
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423093]  [<ffffffff811028b9>] __alloc_pages_nodemask+0x2b9/0x810
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423096]  [<ffffffff81102785>] ? __alloc_pages_nodemask+0x185/0x810
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423101]  [<ffffffff81137086>] alloc_pages_current+0xb6/0x120
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423105]  [<ffffffff810fe02e>] __get_free_pages+0xe/0x40
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423108]  [<ffffffff8113fcff>] kmalloc_order_trace+0x3f/0xd0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423110]  [<ffffffff810fe02e>] ? __get_free_pages+0xe/0x40
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423113]  [<ffffffff811405e0>] __kmalloc+0x100/0x160
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423131]  [<ffffffffa01ba35d>] mlx4_buddy_init+0xed/0x1a0 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423140]  [<ffffffffa01bb8aa>] mlx4_init_mr_table+0xca/0x150 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423148]  [<ffffffffa01b6fa7>] mlx4_setup_hca.part.12+0xf7/0x4e0 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423156]  [<ffffffffa01aaeef>] ? mlx4_bitmap_init+0x8f/0xb0 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423164]  [<ffffffffa01b73bb>] mlx4_setup_hca+0x2b/0x70 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423172]  [<ffffffffa01b7ba4>] __mlx4_init_one+0x744/0x960 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423179]  [<ffffffffa01c55b6>] mlx4_init_one+0x3d/0x42 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423186]  [<ffffffff812e6e56>] pci_call_probe+0x96/0xb0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423189]  [<ffffffff812e8019>] pci_device_probe+0x79/0xa0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423194]  [<ffffffff813894fa>] ? driver_sysfs_add+0x7a/0xb0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423196]  [<ffffffff813896b8>] really_probe+0x68/0x200
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423198]  [<ffffffff81389982>] driver_probe_device+0x22/0x30
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423200]  [<ffffffff81389a3b>] __driver_attach+0xab/0xb0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423202]  [<ffffffff81389990>] ? driver_probe_device+0x30/0x30
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423205]  [<ffffffff81387c46>] bus_for_each_dev+0x56/0x90
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423207]  [<ffffffff813892fe>] driver_attach+0x1e/0x20
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423210]  [<ffffffff81388ed0>] bus_add_driver+0x1a0/0x270
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423216]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423218]  [<ffffffff81389f86>] driver_register+0x76/0x130
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423223]  [<ffffffff8157aa9d>] ? notifier_call_chain+0x4d/0x70
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423227]  [<ffffffff8109f0b0>] ? add_kallsyms+0x1e0/0x1e0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423233]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423235]  [<ffffffff812e7d85>] __pci_register_driver+0x55/0xd0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423241]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423246]  [<ffffffffa01d20dd>] mlx4_init+0xac/0xec [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423250]  [<ffffffff8100203f>] do_one_initcall+0x3f/0x170
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423253]  [<ffffffff810a18bf>] sys_init_module+0x8f/0x200
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423257]  [<ffffffff8157f0a9>] system_call_fastpath+0x16/0x1b
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423259] ---[ end trace 8886e8f0c535939d ]---
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423263] mlx4_core 0000:86:00.0: Failed to initialize memory region table, aborting.
Mar  7 03:12:27 bi-heca-02 kernel: [    8.431444] mlx4_core: probe of 0000:86:00.0 failed with error -12
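
For reference, the "error -12" in the last log line is a negated errno value. A minimal Python sketch to decode it, assuming a Linux host (the helper name decode_kernel_error is made up for illustration and is not part of any kernel tooling):

```python
import errno
import os


def decode_kernel_error(code):
    """Map a negative kernel return code such as -12 to (name, message)."""
    number = abs(code)
    # errno.errorcode maps errno numbers to their symbolic names.
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


name, message = decode_kernel_error(-12)
print(name, "-", message)
```

On Linux, errno 12 is ENOMEM, which fits the failed allocation shown in the trace above.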



Dr. Benoit Hudzia
Senior Researcher

SAP Next Business and Technology 
SAP (UK) Limited
The Concourse Building 
Queen's Road , Queen's Island, Titanic Quarter
BT3 9TD Belfast
T +44 (0)28 9078 5742
F +44 (0)28 9078  5777
M +44 (0)79 834 46729
mailto:benoit.hudzia-y6kNeMnOB+c@public.gmane.org
www.sap.com/research

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: mlx4 module loading fail
       [not found] ` <96353B6F8A3DAE4BBC51047BD0E6BAC20913A5-v0w1aZ/WxVLTw0Kyn31wWKuC/IaeJB0jHWlK3eZauXw@public.gmane.org>
@ 2013-03-07 12:38   ` Dongsu Park
       [not found]     ` <20130307123854.GB15491-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2013-03-07 15:34   ` Or Gerlitz
  1 sibling, 1 reply; 11+ messages in thread
From: Dongsu Park @ 2013-03-07 12:38 UTC (permalink / raw)
  To: Hudzia, Benoit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi,

On 07.03.2013 11:18, Hudzia, Benoit wrote:
> The server specs are as follows:
> 	* 4x 10-core Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz stepping 02
> 	* 1TB of RAM
> 	* 1 ConnectX-2 IB card
> 
> Kernel version: 3.5.0
> 
> Note that if I downgrade to a 3.2 kernel, I do not experience this issue. However, I am forced to work with 3.5 or higher. Can somebody help me with that?

Probably the commit 89dd86db (mlx4_core: Allow large mlx4_buddy bitmaps),
which is already included in 3.6 or higher, has already fixed the problem.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit?h=linux-3.6.y&id=89dd86db
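
A quick way to check whether a given kernel release string is at or above 3.6 is sketched below. The helper name has_release_at_least is hypothetical, and a release number alone says nothing about patches that were backported to or left out of a custom build, so treat the result as a hint only.

```python
import re


def has_release_at_least(release, minimum=(3, 6)):
    """Return True if a `uname -r` style string is at or above `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised release string: %r" % release)
    # Compare (major, minor) as a tuple so that 3.10 sorts after 3.6.
    return (int(match.group(1)), int(match.group(2))) >= minimum


print(has_release_at_least("3.5.0-heca-dev-34dd48a+"))
print(has_release_at_least("3.8.2"))
```

The first string is the release shown in the reported trace; comparing numeric tuples avoids the string-comparison trap where "3.10" would sort before "3.6".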

Regards,
Dongsu


* RE: mlx4 module loading fail
       [not found]     ` <20130307123854.GB15491-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-03-07 12:56       ` Hudzia, Benoit
  0 siblings, 0 replies; 11+ messages in thread
From: Hudzia, Benoit @ 2013-03-07 12:56 UTC (permalink / raw)
  To: Dongsu Park; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


I think I tried with the 3.8 stable kernel, but I will check again to make sure.


> -----Original Message-----
> From: Dongsu Park [mailto:dongsu.park@profitbricks.com]
> Sent: 07 March 2013 12:39
> To: Hudzia, Benoit
> Cc: linux-rdma@vger.kernel.org
> Subject: Re: mlx4 module loading fail
> 
> Hi,
> 
> On 07.03.2013 11:18, Hudzia, Benoit wrote:
> > The server specs are as follows:
> > 	* 4x 10-core Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz stepping 02
> > 	* 1TB of RAM
> > 	* 1 ConnectX-2 IB card
> >
> > Kernel version: 3.5.0
> >
> > Note that if I downgrade to a 3.2 kernel, I do not experience this issue. However, I am forced to work with 3.5 or higher. Can somebody help me with that?
> 
> Probably the commit 89dd86db (mlx4_core: Allow large mlx4_buddy bitmaps),
> which is already included in 3.6 or higher, has already fixed the problem.
> 
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit?h=linux-3.6.y&id=89dd86db
> 
> Regards,
> Dongsu
> 

* Re: mlx4 module loading fail
       [not found] ` <96353B6F8A3DAE4BBC51047BD0E6BAC20913A5-v0w1aZ/WxVLTw0Kyn31wWKuC/IaeJB0jHWlK3eZauXw@public.gmane.org>
  2013-03-07 12:38   ` Dongsu Park
@ 2013-03-07 15:34   ` Or Gerlitz
       [not found]     ` <5138B372.4020201-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Or Gerlitz @ 2013-03-07 15:34 UTC (permalink / raw)
  To: Hudzia, Benoit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jack Morgenstein

On 07/03/2013 13:18, Hudzia, Benoit wrote:
> I am currently experiencing some trouble with my ConnectX-2 cards. I have been testing with a smallish server without any problem, and this week I upgraded to a beefier option. However, I am unable to set up the IB card with our current kernel.
> The server specs are as follows:
> 	* 4x 10-core Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz stepping 02
> 	* 1TB of RAM
> 	* 1 ConnectX-2 IB card
>
> Kernel version: 3.5.0. Note that if I downgrade to a 3.2 kernel, I do not experience this issue. However, I am forced to work with 3.5 or higher. Can somebody help me with that?

Hi Benoit,

As was suggested here, can you try 3.8 or 3.9-rc1? This will help a lot
to isolate the problem. But even before that: the warning you are
getting is caused by an allocation with order >= MAX_ORDER. What is
MAX_ORDER under your configuration, and what value do you provide to
mlx4_buddy_init from mlx4_init_mr_table (did you modify that code)?

Or.
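
To make the question concrete, here is a rough sketch of how a single bitmap allocation can reach MAX_ORDER. It assumes 4 KiB pages, 64-bit longs, MAX_ORDER = 11, a page allocator that rejects requests whose order is at or above MAX_ORDER, and a buddy bitmap with one bit per entry at its lowest level. None of these values are confirmed for the reporter's configuration, and the function names are made up for illustration.

```python
PAGE_SIZE = 4096     # assumption: 4 KiB pages
MAX_ORDER = 11       # assumption: check the actual kernel configuration
BITS_PER_LONG = 64   # assumption: 64-bit kernel


def bitmap_bytes(max_order, level=0):
    """Bytes needed for a bitmap of 2**(max_order - level) bits, rounded up to longs."""
    nbits = 1 << (max_order - level)
    longs = (nbits + BITS_PER_LONG - 1) // BITS_PER_LONG
    return longs * (BITS_PER_LONG // 8)


def page_order(size):
    """Smallest order such that 2**order pages hold `size` bytes."""
    pages = max(1, (size + PAGE_SIZE - 1) // PAGE_SIZE)
    return (pages - 1).bit_length()


def would_warn(max_order):
    """True if the lowest-level bitmap alone needs an order at or above MAX_ORDER."""
    return page_order(bitmap_bytes(max_order)) >= MAX_ORDER


for log_entries in (20, 25, 26):
    size = bitmap_bytes(log_entries)
    print(log_entries, size, page_order(size), would_warn(log_entries))
```

Under these assumptions, a bitmap for 2^25 entries still fits in an order-10 allocation, while 2^26 entries need order 11 and would be rejected.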


* RE: mlx4 module loading fail
       [not found]     ` <5138B372.4020201-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-03-07 16:06       ` Hudzia, Benoit
       [not found]         ` <96353B6F8A3DAE4BBC51047BD0E6BAC20914D9-v0w1aZ/WxVLTw0Kyn31wWKuC/IaeJB0jHWlK3eZauXw@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Hudzia, Benoit @ 2013-03-07 16:06 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jack Morgenstein

Hi Or,


We didn't change that code, as our code sits above the rdma_ucm bit (we do not touch any of the core RDMA functions or drivers, we just use them).
We are using the default OFED setup (drivers are loaded with the default config) and there is nothing special.
I will investigate the MAX_ORDER aspect ASAP and also test with 3.9-rc1.

However, I did a quick test: after physically removing half the RAM of the server (basically going from 1TB to 512GB), everything works fine.
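
One hypothesis consistent with this result, though not confirmed in this thread, is that the driver sizes a table in proportion to system RAM, so that halving the RAM halves the largest bitmap and lowers the required page order by one. The sketch below illustrates only that arithmetic. The page size, the MAX_ORDER value, and the two ratios (entries per page of RAM, entries per bitmap bit) are assumptions chosen for illustration.

```python
PAGE_BYTES = 4096        # assumption: 4 KiB pages
ASSUMED_MAX_ORDER = 11   # assumption: check the actual kernel configuration


def bitmap_page_order(ram_bytes, entries_per_page=2, entries_per_bit=8):
    """Page order needed for a bitmap whose bit count scales with RAM.

    entries_per_page and entries_per_bit are hypothetical ratios.
    """
    ram_pages = ram_bytes // PAGE_BYTES
    bits = ram_pages * entries_per_page // entries_per_bit
    size = (bits + 7) // 8                                # bitmap size in bytes
    pages = max(1, (size + PAGE_BYTES - 1) // PAGE_BYTES)
    return (pages - 1).bit_length()                       # ceil(log2(pages))


for label, ram in (("1TB", 1 << 40), ("512GB", 1 << 39)):
    order = bitmap_page_order(ram)
    verdict = "rejected" if order >= ASSUMED_MAX_ORDER else "fits"
    print(label, "needs order", order, "->", verdict)
```

With these ratios the 1TB case lands exactly on the assumed limit and the 512GB case just below it, which would match the behaviour described above; different ratios would move the threshold.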


Regards
Benoit



> -----Original Message-----
> From: Or Gerlitz [mailto:ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org]
> Sent: 07 March 2013 15:34
> To: Hudzia, Benoit
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Jack Morgenstein
> Subject: Re: mlx4 module loading fail
> 
> On 07/03/2013 13:18, Hudzia, Benoit wrote:
> > I am currently experiencing some trouble with my ConnectX-2 cards. I have
> been testing with a smallish server without any problem, and this week I
> upgraded to a beefier option. However, I am unable to set up the IB
> card with our current kernel.
> > The server specs are as follows:
> > 	* 4x 10-core Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz stepping 02
> > 	* 1TB of RAM
> > 	* 1 ConnectX-2 IB card
> >
> > Kernel version: 3.5.0. Note that if I downgrade to a 3.2 kernel, I do not
> experience this issue. However, I am forced to work with 3.5 or higher. Can
> somebody help me with that?
> 
> Hi Benoit,
> 
> As was suggested here can you try 3.8 or 3.9-rc1, this will help a lot
> to isolate the problem, but even before that, the warning you are
> getting is as of
> allocation with order > MAX_ORDER, what's MAX_ORDER under your
> configuration and what value do you provide to mlx4_buddy_init from
> mlx4_init_mr_table (did you modify that code?)
> 
> Or.
> 
> >
> > Kernel log trace:
> >
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423038] ------------[ cut here ]------------
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423049] WARNING: at mm/page_alloc.c:2298 __alloc_pages_nodemask+0x2b9/0x810()
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423050] Hardware name: QSSC-S4R
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423051] Modules linked in: joydev coretemp kvm_intel kvm microcode pcspkr ixgbe mlx4_core(+) igb mdio ioatdma i2c_i801 hid_generic lpc_ich i2c_core mfd_core dca tpm_tis tpm tpm_bios acpi_memhotplug evbug crc32c_intel megaraid_sas usbhid hid
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423078] Pid: 949, comm: modprobe Not tainted 3.5.0-heca-dev-34dd48a+ #29
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423079] Call Trace:
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423088]  [<ffffffff8104baef>] warn_slowpath_common+0x7f/0xc0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423091]  [<ffffffff8104bb4a>] warn_slowpath_null+0x1a/0x20
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423093]  [<ffffffff811028b9>] __alloc_pages_nodemask+0x2b9/0x810
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423096]  [<ffffffff81102785>] ? __alloc_pages_nodemask+0x185/0x810
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423101]  [<ffffffff81137086>] alloc_pages_current+0xb6/0x120
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423105]  [<ffffffff810fe02e>] __get_free_pages+0xe/0x40
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423108]  [<ffffffff8113fcff>] kmalloc_order_trace+0x3f/0xd0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423110]  [<ffffffff810fe02e>] ? __get_free_pages+0xe/0x40
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423113]  [<ffffffff811405e0>] __kmalloc+0x100/0x160
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423131]  [<ffffffffa01ba35d>] mlx4_buddy_init+0xed/0x1a0 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423140]  [<ffffffffa01bb8aa>] mlx4_init_mr_table+0xca/0x150 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423148]  [<ffffffffa01b6fa7>] mlx4_setup_hca.part.12+0xf7/0x4e0 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423156]  [<ffffffffa01aaeef>] ? mlx4_bitmap_init+0x8f/0xb0 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423164]  [<ffffffffa01b73bb>] mlx4_setup_hca+0x2b/0x70 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423172]  [<ffffffffa01b7ba4>] __mlx4_init_one+0x744/0x960 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423179]  [<ffffffffa01c55b6>] mlx4_init_one+0x3d/0x42 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423186]  [<ffffffff812e6e56>] pci_call_probe+0x96/0xb0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423189]  [<ffffffff812e8019>] pci_device_probe+0x79/0xa0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423194]  [<ffffffff813894fa>] ? driver_sysfs_add+0x7a/0xb0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423196]  [<ffffffff813896b8>] really_probe+0x68/0x200
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423198]  [<ffffffff81389982>] driver_probe_device+0x22/0x30
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423200]  [<ffffffff81389a3b>] __driver_attach+0xab/0xb0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423202]  [<ffffffff81389990>] ? driver_probe_device+0x30/0x30
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423205]  [<ffffffff81387c46>] bus_for_each_dev+0x56/0x90
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423207]  [<ffffffff813892fe>] driver_attach+0x1e/0x20
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423210]  [<ffffffff81388ed0>] bus_add_driver+0x1a0/0x270
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423216]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423218]  [<ffffffff81389f86>] driver_register+0x76/0x130
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423223]  [<ffffffff8157aa9d>] ? notifier_call_chain+0x4d/0x70
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423227]  [<ffffffff8109f0b0>] ? add_kallsyms+0x1e0/0x1e0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423233]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423235]  [<ffffffff812e7d85>] __pci_register_driver+0x55/0xd0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423241]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423246]  [<ffffffffa01d20dd>] mlx4_init+0xac/0xec [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423250]  [<ffffffff8100203f>] do_one_initcall+0x3f/0x170
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423253]  [<ffffffff810a18bf>] sys_init_module+0x8f/0x200
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423257]  [<ffffffff8157f0a9>] system_call_fastpath+0x16/0x1b
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423259] ---[ end trace 8886e8f0c535939d ]---
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423263] mlx4_core 0000:86:00.0: Failed to initialize memory region table, aborting.
> > Mar  7 03:12:27 bi-heca-02 kernel: [    8.431444] mlx4_core: probe of 0000:86:00.0 failed with error -12
> >

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


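[Editorial note, not part of the original messages.] The trace quoted above ends with "probe of 0000:86:00.0 failed with error -12". On Linux, errno 12 is ENOMEM, which matches the failed `__kmalloc` call under `mlx4_buddy_init`. A minimal sketch for decoding such a return code follows; `describe_kernel_errno` is a hypothetical helper name, and the lookup uses the errno table of whatever host runs the script, so it is assumed to be a Linux machine.

```python
import errno
import os


def describe_kernel_errno(ret: int) -> str:
    """Map a negative kernel-style return code (e.g. -12) to its errno name.

    Hypothetical helper for illustration; it relies on the host's errno table.
    """
    code = abs(ret)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


# The probe in the log failed with error -12.
print(describe_kernel_errno(-12))
```
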
* Re: mlx4 module loading fail
       [not found]         ` <96353B6F8A3DAE4BBC51047BD0E6BAC20914D9-v0w1aZ/WxVLTw0Kyn31wWKuC/IaeJB0jHWlK3eZauXw@public.gmane.org>
@ 2013-03-07 16:22           ` Or Gerlitz
       [not found]             ` <5138BED3.30506-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-03-08 13:32           ` Or Gerlitz
  1 sibling, 1 reply; 11+ messages in thread
From: Or Gerlitz @ 2013-03-07 16:22 UTC (permalink / raw)
  To: Hudzia, Benoit; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jack Morgenstein

On 07/03/2013 18:06, Hudzia, Benoit wrote:
> We didn't change that code  as our code is  sitting above the rdma_ucm  bit.  ( we do not touch any of the core RDMA function or drivers, just using them). We are using the default OFED setup ( driver are loaded with the default config ) and there is nothing special. I will investigate the MAX_ORDER aspect asap and test with 3.9rc1 also.

Do you use plain upstream bits, or did you install the driver from an external source?


>
> However I did a quick test and by removing physically HALF the ram of the server ( basically moving from 1TB to 512GB) everything works fine..
>


* RE: mlx4 module loading fail
       [not found]             ` <5138BED3.30506-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-03-07 16:54               ` Hudzia, Benoit
  0 siblings, 0 replies; 11+ messages in thread
From: Hudzia, Benoit @ 2013-03-07 16:54 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jack Morgenstein

Plain upstream.

Debian testing with a compiled upstream kernel.

I also reproduced it with CentOS.


> -----Original Message-----
> From: Or Gerlitz [mailto:ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org]
> Sent: 07 March 2013 16:23
> To: Hudzia, Benoit
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Jack Morgenstein
> Subject: Re: mlx4 module loading fail
> 
> On 07/03/2013 18:06, Hudzia, Benoit wrote:
> > We didn't change that code  as our code is  sitting above the rdma_ucm  bit.  ( we do not touch any of the core RDMA function or drivers, just using them). We are using the default OFED setup ( driver are loaded with the default config ) and there is nothing special. I will investigate the MAX_ORDER aspect asap and test with 3.9rc1 also.
> 
> Do you use plain upstream bits or install driver from external source?
> 
> 
> >
> > However I did a quick test and by removing physically HALF the ram of the server ( basically moving from 1TB to 512GB) everything works fine..
> >


* Re: mlx4 module loading fail
       [not found]         ` <96353B6F8A3DAE4BBC51047BD0E6BAC20914D9-v0w1aZ/WxVLTw0Kyn31wWKuC/IaeJB0jHWlK3eZauXw@public.gmane.org>
  2013-03-07 16:22           ` Or Gerlitz
@ 2013-03-08 13:32           ` Or Gerlitz
       [not found]             ` <CAJZOPZKyZgpf3dqfif3c6WHWhriWic06xsWCkdo2TCars3Aehw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Or Gerlitz @ 2013-03-08 13:32 UTC (permalink / raw)
  To: Hudzia, Benoit
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jack Morgenstein

On Thu, Mar 7, 2013 at 6:06 PM, Hudzia, Benoit <benoit.hudzia-y6kNeMnOB+c@public.gmane.org> wrote:
>
> However I did a quick test and by removing physically HALF the ram of the server ( basically moving from 1TB to 512GB) everything works fine..



Yep, you probably hit the problem fixed by commit "mlx4_core: Allow
large mlx4_buddy bitmaps" (89dd86db78e08b51bab29e168fd41b2fd943e6b6);
updating your kernel should get that out of your way.

Or.

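[Editorial note, not part of the original messages.] The trace shows `mlx4_buddy_init` calling `__kmalloc` and tripping a warning in the page allocator, and the commit title mentions large buddy bitmaps, which suggests the bitmap was too big for one physically contiguous allocation. The sketch below only illustrates that size limit. The constants are assumptions (4 KiB pages, `MAX_ORDER` of 11, 64-bit longs), and the thread does not state the bitmap order actually requested on the 1TB machine, so the orders printed here are purely illustrative.

```python
PAGE_SIZE = 4096      # assumption: 4 KiB pages (typical x86_64)
MAX_ORDER = 11        # assumption: default page-allocator limit for this kernel
BITS_PER_LONG = 64    # assumption: 64-bit kernel

# Largest physically contiguous allocation: 2**(MAX_ORDER - 1) pages.
MAX_CONTIGUOUS = PAGE_SIZE << (MAX_ORDER - 1)


def bitmap_bytes(order: int) -> int:
    """Bytes needed for a bitmap of 2**order bits, rounded up to whole longs."""
    longs = ((1 << order) + BITS_PER_LONG - 1) // BITS_PER_LONG
    return longs * (BITS_PER_LONG // 8)


def fits_in_contiguous_alloc(order: int) -> bool:
    """True if the bitmap fits in one contiguous allocation under the assumptions."""
    return bitmap_bytes(order) <= MAX_CONTIGUOUS


for order in (24, 25, 26):
    print(order, bitmap_bytes(order), fits_in_contiguous_alloc(order))
```

Under these assumptions a bitmap of 2**25 bits is exactly 4 MiB and still fits, while 2**26 bits needs 8 MiB and does not. A halving of that kind would be consistent with the report that the driver loads once half the RAM is removed, though the thread does not confirm the exact figures.
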
* RE: mlx4 module loading fail
       [not found]             ` <CAJZOPZKyZgpf3dqfif3c6WHWhriWic06xsWCkdo2TCars3Aehw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-03-14 22:53               ` Hudzia, Benoit
       [not found]                 ` <96353B6F8A3DAE4BBC51047BD0E6BAC2094AD2-v0w1aZ/WxVLTw0Kyn31wWKuC/IaeJB0jHWlK3eZauXw@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Hudzia, Benoit @ 2013-03-14 22:53 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jack Morgenstein

Hi, upgrading to 3.9-rc2 fixes the issue.

Also, kernel 3.2 and earlier do not cause any error.

> -----Original Message-----
> From: Or Gerlitz [mailto:or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org]
> Sent: 08 March 2013 13:33
> To: Hudzia, Benoit
> Cc: Or Gerlitz; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Jack Morgenstein
> Subject: Re: mlx4 module loading fail
> 
> On Thu, Mar 7, 2013 at 6:06 PM, Hudzia, Benoit <benoit.hudzia-y6kNeMnOB+c@public.gmane.org>
> wrote:
> >
> > However I did a quick test and by removing physically HALF the ram of the server ( basically moving from 1TB to 512GB) everything works fine..
> 
> 
> 
> Yep, you probably hit the problem fixed by commit "mlx4_core: Allow
> large mlx4_buddy bitmaps" 89dd86db78e08b51bab29e168fd41b2fd943e6b6,
> updating your kernel should get that from your way.
> 
> Or.

* Re: mlx4 module loading fail
       [not found]                 ` <96353B6F8A3DAE4BBC51047BD0E6BAC2094AD2-v0w1aZ/WxVLTw0Kyn31wWKuC/IaeJB0jHWlK3eZauXw@public.gmane.org>
@ 2013-03-17  7:45                   ` Or Gerlitz
       [not found]                     ` <514574AE.9080002-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Or Gerlitz @ 2013-03-17  7:45 UTC (permalink / raw)
  To: Hudzia, Benoit
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jack Morgenstein

On 15/03/2013 00:53, Hudzia, Benoit wrote:
> Hi upgrading to 3.9 rc2 fix the issue.

Good! Did you check 3.8?

> Also  3.2 kernel and under  doesn't cause any error

Do you do any memory registration from user space? If yes, of how much memory?

* RE: mlx4 module loading fail
       [not found]                     ` <514574AE.9080002-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-03-17  8:30                       ` Hudzia, Benoit
  0 siblings, 0 replies; 11+ messages in thread
From: Hudzia, Benoit @ 2013-03-17  8:30 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jack Morgenstein

The error occurs only at boot. The amount of memory registered at run time rarely goes above 5 GB.



> -----Original Message-----
> From: Or Gerlitz [mailto:ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org]
> Sent: 17 March 2013 07:46
> To: Hudzia, Benoit
> Cc: Or Gerlitz; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Jack Morgenstein
> Subject: Re: mlx4 module loading fail
> 
> On 15/03/2013 00:53, Hudzia, Benoit wrote:
> > Hi upgrading to 3.9 rc2 fix the issue.
> 
> good! did you check 3.8?
> 
> > Also  3.2 kernel and under  doesn't cause any error
> 
> do you do any registration from user space? if yes, of how much memory?


Thread overview: 11+ messages
2013-03-07 11:18 mlx4 module loading fail Hudzia, Benoit
     [not found] ` <96353B6F8A3DAE4BBC51047BD0E6BAC20913A5-v0w1aZ/WxVLTw0Kyn31wWKuC/IaeJB0jHWlK3eZauXw@public.gmane.org>
2013-03-07 12:38   ` Dongsu Park
     [not found]     ` <20130307123854.GB15491-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-03-07 12:56       ` Hudzia, Benoit
2013-03-07 15:34   ` Or Gerlitz
     [not found]     ` <5138B372.4020201-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-03-07 16:06       ` Hudzia, Benoit
     [not found]         ` <96353B6F8A3DAE4BBC51047BD0E6BAC20914D9-v0w1aZ/WxVLTw0Kyn31wWKuC/IaeJB0jHWlK3eZauXw@public.gmane.org>
2013-03-07 16:22           ` Or Gerlitz
     [not found]             ` <5138BED3.30506-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-03-07 16:54               ` Hudzia, Benoit
2013-03-08 13:32           ` Or Gerlitz
     [not found]             ` <CAJZOPZKyZgpf3dqfif3c6WHWhriWic06xsWCkdo2TCars3Aehw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-14 22:53               ` Hudzia, Benoit
     [not found]                 ` <96353B6F8A3DAE4BBC51047BD0E6BAC2094AD2-v0w1aZ/WxVLTw0Kyn31wWKuC/IaeJB0jHWlK3eZauXw@public.gmane.org>
2013-03-17  7:45                   ` Or Gerlitz
     [not found]                     ` <514574AE.9080002-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-03-17  8:30                       ` Hudzia, Benoit
