linuxppc-dev.lists.ozlabs.org archive mirror
* Problem with Infiniband adapter on IBM p550
@ 2010-10-08  3:24 Patrick Finnegan
  2010-10-08  5:41 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Patrick Finnegan @ 2010-10-08  3:24 UTC (permalink / raw)
  To: linuxppc-dev

I'm having trouble getting a Mellanox InfiniHost InfiniBand adapter
working on my IBM p550 (a 9113-550).  I'm running Debian squeeze, and
upgrading to the 2.6.35.7 kernel did not help.

I get the following messages in dmesg:
[    4.972548] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
[    4.972564] ib_mthca: Initializing 0000:c1:00.0
[    4.972674] ib_mthca 0000:c1:00.0: Missing DCS, aborting.


The problem looks the same as one I ran into with Open Firmware on a
Sun V880, which was fixed by this patch from Dave Miller:
http://ns3.spinics.net/lists/linux-rdma/msg01779.html

I spent some time looking at the equivalent function on powerpc, but 
didn't find a block of code that looked similar.

Any suggestions?

I have dmesg, the device's .properties output from Open Firmware, and
lspci -v from the machine:

http://ned.rcac.purdue.edu/p550-ib/dmesg
http://ned.rcac.purdue.edu/p550-ib/ib-of-device
http://ned.rcac.purdue.edu/p550-ib/lspci-v

Pat
-- 
Purdue University Research Computing ---  http://www.rcac.purdue.edu/
The Computer Refuge                  ---  http://computer-refuge.org


* Re: Problem with Infiniband adapter on IBM p550
  2010-10-08  3:24 Problem with Infiniband adapter on IBM p550 Patrick Finnegan
@ 2010-10-08  5:41 ` Benjamin Herrenschmidt
  2010-10-08  5:45   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-08  5:41 UTC (permalink / raw)
  To: Patrick Finnegan; +Cc: linuxppc-dev

On Thu, 2010-10-07 at 23:24 -0400, Patrick Finnegan wrote:
> I seem to be running into a problem getting a Mellanox Infinihost  
> Infiniband adapter working on my IBM p550 (a 9113-550).  I'm using 
> Debian squeeze, and tried upgrading to the 2.6.35.7 kernel without any 
> help.
> 
> I get the following messages in dmesg:
> [    4.972548] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
> [    4.972564] ib_mthca: Initializing 0000:c1:00.0
> [    4.972674] ib_mthca 0000:c1:00.0: Missing DCS, aborting.

Ok, so from what I can tell, the driver is unhappy because either BAR 0
hasn't been assigned a memory resource or the size doesn't match what
the driver expects.
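
For anyone who wants to check this on a running system, here is a
minimal Python sketch of that test. It assumes the sysfs "resource"
file layout (one "start end flags" line per resource, in hex) and
assumes the driver wants BAR 0 to be a 1 MiB memory region; the sample
strings are made up for illustration and the function names are
hypothetical.

```python
# IORESOURCE_MEM as defined in the kernel's include/linux/ioport.h.
IORESOURCE_MEM = 0x00000200
# Assumption: the driver expects BAR 0 to be exactly 1 MiB.
EXPECTED_BAR0_LEN = 1 << 20


def parse_resource_text(text):
    """Parse the text of a sysfs 'resource' file into (start, end, flags)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        start, end, flags = (int(field, 16) for field in fields)
        entries.append((start, end, flags))
    return entries


def bar0_problem(entries):
    """Return a short description of what is wrong with BAR 0, or None."""
    if not entries:
        return "no resources listed"
    start, end, flags = entries[0]
    if not flags & IORESOURCE_MEM:
        return "BAR 0 is not a memory resource"
    length = end - start + 1
    if length != EXPECTED_BAR0_LEN:
        return "BAR 0 is %#x bytes, expected %#x" % (length, EXPECTED_BAR0_LEN)
    return None


# Hypothetical sample lines, not taken from the p550.
ASSIGNED = "0x00000000e0000000 0x00000000e00fffff 0x0000000000140204\n"
UNASSIGNED = "0x0000000000000000 0x0000000000000000 0x0000000000000000\n"

if __name__ == "__main__":
    print(bar0_problem(parse_resource_text(ASSIGNED)))
    print(bar0_problem(parse_resource_text(UNASSIGNED)))
```

On a live machine the input would come from something like
/sys/bus/pci/devices/0000:c1:00.0/resource instead of the sample
strings.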

Let's see...

> The problem looks the same as a problem I ran into with OpenFirmware on 
> a Sun V880, which was fixed with this patch by Dave Miller:
> http://ns3.spinics.net/lists/linux-rdma/msg01779.html
> 
> I spent some time looking at the equivalent function on powerpc, but 
> didn't find a block of code that looked similar.

I don't think we are hitting the same problem. I believe our code in
that area differs enough.

In your lspci, however, I see:

	Memory at <unassigned> (64-bit, non-prefetchable)
	Memory at <unassigned> (64-bit, prefetchable)

Which doesn't look good...

From your OF log

> Any suggestions?
> 
> I have dmesg, the dev .properties from openfirmware, and lspci -v from 
> the machine:
> 
> http://ned.rcac.purdue.edu/p550-ib/dmesg
> http://ned.rcac.purdue.edu/p550-ib/ib-of-device
> http://ned.rcac.purdue.edu/p550-ib/lspci-v
> 
> Pat


* Re: Problem with Infiniband adapter on IBM p550
  2010-10-08  5:41 ` Benjamin Herrenschmidt
@ 2010-10-08  5:45   ` Benjamin Herrenschmidt
  2010-11-03  3:15     ` Patrick Finnegan
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-08  5:45 UTC (permalink / raw)
  To: Patrick Finnegan; +Cc: paulus, linuxppc-dev


> Ok, so from what I can tell, the driver is unhappy because either BAR 0
> hasn't been assigned a memory resource or the size doesn't match what
> the driver expects.
> 
Oops, I accidentally sent that too quickly...

From your OF log I see:

reg                     00c10000 00000000 00000000  00000000 00000000 
                        03c10010 00000000 00000000  00000000 00100000 
                        43c10018 00000000 00000000  00000000 00800000 
                        43c10020 00000000 00000000  00000000 08000000 
assigned-addresses      83c10020 00000000 e8000000  00000000 08000000 

Now, I think this is the problem.

The "assigned-addresses" property seems to indicate that the firmware only
assigned BAR 4 and didn't assign anything to the other ones.

I don't know why, but it definitely looks like a firmware bug to me. On those
machines, PCI resource assignment is under hypervisor control and so Linux
cannot re-assign missing resources itself.
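
To make the decoding concrete, here is a minimal Python sketch. It
assumes the standard Open Firmware PCI binding layout, where each entry
is five 32-bit cells (phys.hi, phys.mid, phys.lo, size.hi, size.lo) and
the low byte of phys.hi is the config-space register of the BAR. The
cell values are the ones from the log above; the function names are
hypothetical.

```python
# Cells copied from the "reg" and "assigned-addresses" properties.
REG = [
    0x00c10000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x03c10010, 0x00000000, 0x00000000, 0x00000000, 0x00100000,
    0x43c10018, 0x00000000, 0x00000000, 0x00000000, 0x00800000,
    0x43c10020, 0x00000000, 0x00000000, 0x00000000, 0x08000000,
]
ASSIGNED = [0x83c10020, 0x00000000, 0xe8000000, 0x00000000, 0x08000000]

# The two "ss" bits of phys.hi select the address space.
SPACE_NAMES = {0: "config", 1: "I/O", 2: "32-bit memory", 3: "64-bit memory"}


def decode(cells):
    """Split a flat cell list into decoded five-cell entries."""
    entries = []
    for i in range(0, len(cells), 5):
        phys_hi, phys_mid, phys_lo, size_hi, size_lo = cells[i:i + 5]
        entries.append({
            "space": SPACE_NAMES[(phys_hi >> 24) & 0x3],
            "prefetchable": bool(phys_hi & 0x40000000),
            "register": phys_hi & 0xff,
            "address": (phys_mid << 32) | phys_lo,
            "size": (size_hi << 32) | size_lo,
        })
    return entries


def unassigned_bars(reg, assigned):
    """Return BAR numbers listed in 'reg' but absent from 'assigned'."""
    assigned_registers = {entry["register"] for entry in decode(assigned)}
    return [
        (entry["register"] - 0x10) // 4   # BAR 0 lives at config offset 0x10
        for entry in decode(reg)
        if entry["space"] != "config"
        and entry["register"] not in assigned_registers
    ]


if __name__ == "__main__":
    print(unassigned_bars(REG, ASSIGNED))  # -> [0, 2]
```

With these cells the only assigned entry is register 0x20 (BAR 4, 128 MB
at 0xe8000000), while BAR 0 (1 MB) and BAR 2 (8 MB) have no address.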

I'll see if I can find a FW person to shed some light on this.

Can you send me (privately, if you prefer) the FW version on the machine?

Cheers,
Ben.


* Re: Problem with Infiniband adapter on IBM p550
  2010-10-08  5:45   ` Benjamin Herrenschmidt
@ 2010-11-03  3:15     ` Patrick Finnegan
  2010-11-03 13:34       ` Anton Blanchard
  2010-11-03 13:46       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 8+ messages in thread
From: Patrick Finnegan @ 2010-11-03  3:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: paulus, linuxppc-dev

On Friday, October 08, 2010, Benjamin Herrenschmidt wrote:
> > Ok, so from what I can tell, the driver is unhappy because either
> > BAR 0 hasn't been assigned a memory resource or the size doesn't
> > match what the driver expects.
> 
> Ooops, accidentally sent too quickly...
> 
> From your OF log I see:
> reg                     00c10000 00000000 00000000  00000000 00000000
>                         03c10010 00000000 00000000  00000000 00100000
>                         43c10018 00000000 00000000  00000000 00800000
>                         43c10020 00000000 00000000  00000000 08000000
> assigned-addresses      83c10020 00000000 e8000000  00000000 08000000
> 
> Now, I think this is the problem.
> 
> The "assigned-addresses" property seems to indicate that the firmware
> only assigned BAR 4 and didn't assign anything to the other ones.
> 
> I don't know why, but it definitely looks like a firmware bug to me.
> On those machines, PCI resource assignment is under hypervisor
> control and so Linux cannot re-assign missing resources itself.
> 
> I'll see if I can find a FW person to shed some light on this.
> 
> Can you provide me (privately maybe) with the FW version on the
> machine ?

Ben,

Have you found out anything more on this (firmware) bug?

Pat
-- 
Purdue University Research Computing ---  http://www.rcac.purdue.edu/
The Computer Refuge                  ---  http://computer-refuge.org


* Re: Problem with Infiniband adapter on IBM p550
  2010-11-03  3:15     ` Patrick Finnegan
@ 2010-11-03 13:34       ` Anton Blanchard
  2010-11-03 14:43         ` Patrick Finnegan
  2010-11-03 13:46       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 8+ messages in thread
From: Anton Blanchard @ 2010-11-03 13:34 UTC (permalink / raw)
  To: Patrick Finnegan; +Cc: paulus, linuxppc-dev


Hi,

> > Now, I think this is the problem.
> > 
> > The "assigned-addresses" property seems to indicate that the
> > firmware only assigned BAR 4 and didn't assign anything to the
> > other ones.
> > 
> > I don't know why, but it definitely looks like a firmware bug to me.
> > On those machines, PCI resource assignment is under hypervisor
> > control and so Linux cannot re-assign missing resources itself.
> > 
> > I'll see if I can find a FW person to shed some light on this.
> > 
> > Can you provide me (privately maybe) with the FW version on the
> > machine ?
> 
> Ben,
> 
> Have you found out anything more on this (firmware) bug?

Firmware has the concept of "super slots" which allow larger memory
windows and TCE tables. Section 3.4.3 explains it:

http://www.redbooks.ibm.com/redpapers/pdfs/redp4095.pdf

Anton


* Re: Problem with Infiniband adapter on IBM p550
  2010-11-03  3:15     ` Patrick Finnegan
  2010-11-03 13:34       ` Anton Blanchard
@ 2010-11-03 13:46       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2010-11-03 13:46 UTC (permalink / raw)
  To: Patrick Finnegan; +Cc: linuxppc-dev, paulus

On Tue, 2010-11-02 at 23:15 -0400, Patrick Finnegan wrote:
> > I don't know why, but it definitely looks like a firmware bug to me.
> > On those machines, PCI resource assignment is under hypervisor
> > control and so Linux cannot re-assign missing resources itself.
> > 
> > I'll see if I can find a FW person to shed some light on this.
> > 
> > Can you provide me (privately maybe) with the FW version on the
> > machine ?
> 
> Ben,
> 
> Have you found out anything more on this (firmware) bug?

No, not yet. Let me ping some folks again.

Cheers,
Ben.


* Re: Problem with Infiniband adapter on IBM p550
  2010-11-03 13:34       ` Anton Blanchard
@ 2010-11-03 14:43         ` Patrick Finnegan
  0 siblings, 0 replies; 8+ messages in thread
From: Patrick Finnegan @ 2010-11-03 14:43 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: paulus, linuxppc-dev

On Wednesday, November 03, 2010, Anton Blanchard wrote:
> Firmware has the concept of "super slots" which allow larger memory
> windows and TCE tables. Section 3.4.3 explains it:
> 
> http://www.redbooks.ibm.com/redpapers/pdfs/redp4095.pdf

Aha!  I tried moving the adapter from slot C3 to C5, which is listed in 
that guide, and now it's working.

Thanks for the pointer!

Pat
-- 
Purdue University Research Computing ---  http://www.rcac.purdue.edu/
The Computer Refuge                  ---  http://computer-refuge.org


