* Errors from ibchecknet
@ 2010-09-02 12:34 Chuck Hartley
       [not found] ` <AANLkTi=EuVGxLyjMFMw=YZm38CPvCTbcxt104R-B_o_i-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Chuck Hartley @ 2010-09-02 12:34 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hello,

We installed 1.5.1 and are having problems getting the IB fabric
working. ibv_devinfo shows the HCA ports are OK and ibdiagnet reports
no errors. However, ibchecknet shows that the switch ports are not
being configured. We have never seen this before and are at a loss as
to where the problem might be. Would someone please point us in the
right direction? Could it be a problem with the switch
itself? Output from ibchecknet is below.


# ibchecknet
Error check on lid 3 (Infiniscale-IV Mellanox Technologies) port all:  FAILED
ibwarn: [26732] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 3 port 7
#warn: SM Lid is not configured
Port check lid 3 port 7:  FAILED
# Checked Switch: nodeguid 0x0002c90200405368 with failure
ibwarn: [26751] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 3 port 10
#warn: SM Lid is not configured
Port check lid 3 port 10:  FAILED
ibwarn: [26770] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 3 port 11
#warn: SM Lid is not configured
Port check lid 3 port 11:  FAILED
ibwarn: [26789] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 3 port 34
#warn: SM Lid is not configured
Port check lid 3 port 34:  FAILED
ibwarn: [26808] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 3 port 35
#warn: SM Lid is not configured
Port check lid 3 port 35:  FAILED

# Checking Ca: nodeguid 0x0030487f30760000
ibwarn: [26832] dump_perfcounters: PortXmitWait not indicated so ignore this counter

# Checking Ca: nodeguid 0x0030487f32b20000
ibwarn: [26856] dump_perfcounters: PortXmitWait not indicated so ignore this counter

# Checking Ca: nodeguid 0x0002c9030003360c

# Checking Ca: nodeguid 0x0002c90300084162
ibwarn: [26904] dump_perfcounters: PortXmitWait not indicated so ignore this counter

# Checking Ca: nodeguid 0x0002c90300032de0

## Summary: 6 nodes checked, 0 bad nodes found
##          10 ports checked, 5 bad ports found
##          0 ports have errors beyond threshold
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Errors from ibchecknet
       [not found] ` <AANLkTi=EuVGxLyjMFMw=YZm38CPvCTbcxt104R-B_o_i-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-09-02 12:56   ` Hal Rosenstock
       [not found]     ` <AANLkTikoYH+1tBzHX29uUnFGepFhfXdOkRHSX3OLW9g6-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Hal Rosenstock @ 2010-09-02 12:56 UTC (permalink / raw)
  To: Chuck Hartley; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Sep 2, 2010 at 8:34 AM, Chuck Hartley <hartlch14-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hello,
>
> We installed 1.5.1 and are having problems getting the IB fabric
> working. ibv_devinfo shows the HCAs ports are ok and ibdiagnet reports
> no errors. However, ibchecknet shows that the switch ports are not
> being configured.  We have never seen this before and are at a loss as
> to where the problem might be - would someone please point us in the
> right direction to look?  Could it be a problem with the switch
> itself? Output from ibchecknet below.
>
>
> # ibchecknet
> Error check on lid 3 (Infiniscale-IV Mellanox Technologies) port all:  FAILED
> ibwarn: [26732] dump_perfcounters: PortXmitWait not indicated so
> ignore this counter
> #warn: Lid is not configured lid 3 port 7
> #warn: SM Lid is not configured

Is there an SM running on your subnet? If not, I think that the lack
of an SM could account for all of the issues mentioned here.

-- Hal


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Errors from ibchecknet
       [not found]     ` <AANLkTikoYH+1tBzHX29uUnFGepFhfXdOkRHSX3OLW9g6-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-09-02 13:56       ` Chuck Hartley
       [not found]         ` <AANLkTikQ3jeNwtbR6zqpSvt7=YWatRhWfr60Q5of_FJR-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Chuck Hartley @ 2010-09-02 13:56 UTC (permalink / raw)
  To: Hal Rosenstock; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

We swapped in a different switch and see the same errors. The opensm
logfile does not show any errors:

-------------------------------------------------
OpenSM 3.3.5
Command Line Arguments:
 Daemon mode
 Log File: /var/log/opensm.log
-------------------------------------------------
OpenSM 3.3.5

Sep 02 05:56:29 933684 [B53B8700] 0x80 -> OpenSM 3.3.5
Entering DISCOVERING state

Sep 02 05:56:29 934931 [B53B8700] 0x02 -> osm_vendor_init: 1000 pending umads specified
Sep 02 05:56:29 935079 [B53B8700] 0x80 -> Entering DISCOVERING state
Using default GUID 0x2c90300032de1
Entering MASTER state

Sep 02 05:56:29 953763 [B53B8700] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300032de1
Sep 02 05:56:29 990146 [B53B8700] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300032de1
Sep 02 05:56:29 990240 [B53B8700] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x0002c90300032de1
Sep 02 05:56:30 009040 [AF1DB710] 0x80 -> Entering MASTER state
SUBNET UP

Sep 02 05:56:30 009885 [AF1DB710] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
Sep 02 05:56:30 014593 [AF1DB710] 0x80 -> SUBNET UP



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Errors from ibchecknet
       [not found]         ` <AANLkTikQ3jeNwtbR6zqpSvt7=YWatRhWfr60Q5of_FJR-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-09-02 16:03           ` Ira Weiny
       [not found]             ` <20100902090330.093d8ba6.weiny2-i2BcT+NCU+M@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Ira Weiny @ 2010-09-02 16:03 UTC (permalink / raw)
  To: Chuck Hartley; +Cc: Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, 2 Sep 2010 06:56:50 -0700
Chuck Hartley <hartlch14-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> We swapped in a different switch and see the same errors. The opensm
> logfile does not show any errors:

Could you run "ibstat" on the node with OpenSM running?

And "iblinkinfo" on the same node?

Send that output.

Ira



-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Errors from ibchecknet
       [not found]             ` <20100902090330.093d8ba6.weiny2-i2BcT+NCU+M@public.gmane.org>
@ 2010-09-02 18:11               ` Chuck Hartley
       [not found]                 ` <AANLkTin=p2Mt223QS1oXfy6vi1RBNiF4HaQQcScfg5FP-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2010-09-02 18:16               ` Chuck Hartley
  1 sibling, 1 reply; 11+ messages in thread
From: Chuck Hartley @ 2010-09-02 18:11 UTC (permalink / raw)
  To: Ira Weiny; +Cc: Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Sure, here is the output:
Note this is with the switch we swapped in, so the port numbers don't
match the ibchecknet output in the original message.

# ibstat
CA 'mlx4_0'
	CA type: MT26428
	Number of ports: 2
	Firmware version: 2.6.0
	Hardware version: a0
	Node GUID: 0x0002c90300032de0
	System image GUID: 0x0002c90300032de3
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40
		Base lid: 6
		LMC: 0
		SM lid: 6
		Capability mask: 0x0251086a
		Port GUID: 0x0002c90300032de1
	Port 2:
		State: Down
		Physical state: Polling
		Rate: 10
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x02510868
		Port GUID: 0x0002c90300032de2
CA 'mthca0'
	CA type: MT25204
	Number of ports: 1
	Firmware version: 1.2.0
	Hardware version: a0
	Node GUID: 0x003048c64c0c0000
	System image GUID: 0x003048c64c0c0003
	Port 1:
		State: Down
		Physical state: Polling
		Rate: 10
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x02510a68
		Port GUID: 0x003048c64c0c0001

# iblinkinfo
Switch 0x0002c9020041a7a0 Infiniscale-IV Mellanox Technologies:
           1    1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       5    1[  ] " HCA-1" ( )
           1    2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       6    1[  ] "linux70 HCA-1" ( )
           1    3[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       7    1[  ] "linux71 HCA-1" ( )
           1    4[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1    5[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1    6[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1    7[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1    8[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1    9[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   10[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   11[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   12[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   13[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   14[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   15[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   16[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   17[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   18[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   19[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   20[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   21[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   22[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   23[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   24[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>       9    1[  ] " HCA-1" ( )
           1   25[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>       8    1[  ] " HCA-1" ( )
           1   26[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   27[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   28[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   29[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   30[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   31[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   32[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   33[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   34[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   35[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           1   36[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Errors from ibchecknet
       [not found]             ` <20100902090330.093d8ba6.weiny2-i2BcT+NCU+M@public.gmane.org>
  2010-09-02 18:11               ` Chuck Hartley
@ 2010-09-02 18:16               ` Chuck Hartley
  1 sibling, 0 replies; 11+ messages in thread
From: Chuck Hartley @ 2010-09-02 18:16 UTC (permalink / raw)
  To: Ira Weiny; +Cc: Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA

BTW, I am able to communicate between nodes via 'ibping'.  That is the
only test program I found that will work without needing a host IP.




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Errors from ibchecknet
       [not found]                 ` <AANLkTin=p2Mt223QS1oXfy6vi1RBNiF4HaQQcScfg5FP-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-09-02 20:16                   ` Ira Weiny
       [not found]                     ` <20100902131614.440c3111.weiny2-i2BcT+NCU+M@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Ira Weiny @ 2010-09-02 20:16 UTC (permalink / raw)
  To: Chuck Hartley; +Cc: Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, 2 Sep 2010 11:11:13 -0700
Chuck Hartley <hartlch14-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Sure, here is the output:
> Note this is with the switch we swapped in, so the port numbers don't
> match the ibchecknet output in the original message.
> 
> # ibstat
> CA 'mlx4_0'
> 	CA type: MT26428
> 	Number of ports: 2
> 	Firmware version: 2.6.0
> 	Hardware version: a0
> 	Node GUID: 0x0002c90300032de0
> 	System image GUID: 0x0002c90300032de3
> 	Port 1:
> 		State: Active
> 		Physical state: LinkUp
> 		Rate: 40
> 		Base lid: 6
> 		LMC: 0
> 		SM lid: 6

Well the SM lid is set here.  Is it set on the other nodes?
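
[Editorial sketch, not part of the original message.] One way to answer
the question above on each node is to read the "State" and "SM lid"
fields out of ibstat output. The Python below is a minimal sketch under
the assumption that the output has the same layout as the ibstat listing
quoted in this thread; the function names sm_lids and
active_ports_without_sm are made up for illustration and are not part of
any InfiniBand tool.

```python
import re


def sm_lids(ibstat_text):
    """Return {(ca_name, port_number): (state, sm_lid)} from ibstat-style text."""
    result = {}
    ca = port = state = None
    for raw in ibstat_text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:
            # A new CA section starts; forget the previous port context.
            ca, port, state = m.group(1), None, None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            port, state = int(m.group(1)), None
            continue
        if port is None:
            continue  # CA-level fields such as "Number of ports" are skipped
        if line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("SM lid:"):
            result[(ca, port)] = (state, int(line.split(":", 1)[1]))
    return result


def active_ports_without_sm(ibstat_text):
    """List (ca, port) pairs that are Active but report SM lid 0."""
    return [key for key, (state, lid) in sm_lids(ibstat_text).items()
            if state == "Active" and lid == 0]


# Sample taken from the ibstat output posted in this thread (abridged).
SAMPLE = """\
CA 'mlx4_0'
    CA type: MT26428
    Number of ports: 2
    Port 1:
        State: Active
        Physical state: LinkUp
        Base lid: 6
        SM lid: 6
    Port 2:
        State: Down
        Physical state: Polling
        Base lid: 0
        SM lid: 0
"""

if __name__ == "__main__":
    print(sm_lids(SAMPLE))
    print(active_ports_without_sm(SAMPLE))
```

A Down port reporting SM lid 0 is expected, so the sketch only flags
ports that are Active and still have no SM lid.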

I don't run ibchecknet usually but I am getting the same errors here on a
working fabric...

ibwarn: [13629] dump_perfcounters: PortXmitWait not indicated so ignore this counter
#warn: Lid is not configured lid 37 port 2
#warn: SM Lid is not configured
Port check lid 37 port 2:  FAILED 

Looking at this output I don't think this is an error.

13:17:14 > smpquery nodeinfo 37
# Node info: Lid 37
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Switch
NumPorts:........................24
...

On switch external Ports the Lid and SMLid are not used.

Hal, would you concur?

Chuck,
Is it just that IPoIB is not working for you?

Ira




-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Errors from ibchecknet
       [not found]                     ` <20100902131614.440c3111.weiny2-i2BcT+NCU+M@public.gmane.org>
@ 2010-09-02 20:34                       ` Hal Rosenstock
  2010-09-03 21:04                       ` Chuck Hartley
  1 sibling, 0 replies; 11+ messages in thread
From: Hal Rosenstock @ 2010-09-02 20:34 UTC (permalink / raw)
  To: Ira Weiny; +Cc: Chuck Hartley, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Sep 2, 2010 at 4:16 PM, Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org> wrote:
> On Thu, 2 Sep 2010 11:11:13 -0700
> Chuck Hartley <hartlch14-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>> Sure, here is the output:
>> Note this is with the switch we swapped in, so the port numbers don't
>> match the ibchecknet output in the original message.
>>
>> # ibstat
>> CA 'mlx4_0'
>>       CA type: MT26428
>>       Number of ports: 2
>>       Firmware version: 2.6.0
>>       Hardware version: a0
>>       Node GUID: 0x0002c90300032de0
>>       System image GUID: 0x0002c90300032de3
>>       Port 1:
>>               State: Active
>>               Physical state: LinkUp
>>               Rate: 40
>>               Base lid: 6
>>               LMC: 0
>>               SM lid: 6
>
> Well the SM lid is set here.  Is it set on the other nodes?
>
> I don't run ibchecknet usually but I am getting the same errors here on a
> working fabric...
>
> ibwarn: [13629] dump_perfcounters: PortXmitWait not indicated so ignore this counter
> #warn: Lid is not configured lid 37 port 2
> #warn: SM Lid is not configured
> Port check lid 37 port 2:  FAILED
>
> Looking at this output I don't think this is an error.
>
> 13:17:14 > smpquery nodeinfo 37
> # Node info: Lid 37
> BaseVers:........................1
> ClassVers:.......................1
> NodeType:........................Switch
> NumPorts:........................24
> ...
>
> On switch external Ports the Lid and SMLid are not used.
>
> Hal, would you concur?

Yes, on switch external ports, both LID and SMLID are not valid.

-- Hal
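
As a rough illustration of the point above, here is a minimal sketch (not part of ibchecknet or any OFED tool) that reads ibchecknet text output and separates port failures whose only warnings are the two LID warnings from failures that carry other warnings. The function name split_failures and the decision to treat those two warnings as benign are assumptions drawn from this thread; the rule only applies to switch external ports, so the result should not be trusted for CA ports or switch port 0.

```python
import re

# Warnings that, per this thread, are expected on switch external ports.
# Treating them as benign is an assumption that does NOT hold for CA ports.
BENIGN = ("Lid is not configured", "SM Lid is not configured")

PORT_CHECK = re.compile(r"^Port check lid (\d+) port (\d+):\s+FAILED")


def split_failures(text):
    """Return (benign, suspect) lists of (lid, port) from ibchecknet output."""
    benign, suspect = [], []
    warnings = []  # "#warn:" lines seen since the previous "Port check" line
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#warn:"):
            warnings.append(line[len("#warn:"):].strip())
            continue
        match = PORT_CHECK.match(line)
        if not match:
            continue
        key = (int(match.group(1)), int(match.group(2)))
        # Benign only if there was at least one warning and all were LID ones.
        only_lid = bool(warnings) and all(w.startswith(BENIGN) for w in warnings)
        (benign if only_lid else suspect).append(key)
        warnings = []
    return benign, suspect


# Excerpt of the ibchecknet output posted at the start of this thread.
sample = """\
#warn: Lid is not configured lid 3 port 7
#warn: SM Lid is not configured
Port check lid 3 port 7:  FAILED
#warn: Lid is not configured lid 3 port 10
#warn: SM Lid is not configured
Port check lid 3 port 10:  FAILED
"""
print(split_failures(sample))
```

Anything that lands in the second list would still deserve a closer look with the usual tools.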

>
> Chuck,
> Is it just that IPoIB is not working for you?
>
> Ira
>
>
>>               Capability mask: 0x0251086a
>>               Port GUID: 0x0002c90300032de1
>>       Port 2:
>>               State: Down
>>               Physical state: Polling
>>               Rate: 10
>>               Base lid: 0
>>               LMC: 0
>>               SM lid: 0
>>               Capability mask: 0x02510868
>>               Port GUID: 0x0002c90300032de2
>> CA 'mthca0'
>>       CA type: MT25204
>>       Number of ports: 1
>>       Firmware version: 1.2.0
>>       Hardware version: a0
>>       Node GUID: 0x003048c64c0c0000
>>       System image GUID: 0x003048c64c0c0003
>>       Port 1:
>>               State: Down
>>               Physical state: Polling
>>               Rate: 10
>>               Base lid: 0
>>               LMC: 0
>>               SM lid: 0
>>               Capability mask: 0x02510a68
>>               Port GUID: 0x003048c64c0c0001
>>
>> # iblinkinfo
>> Switch 0x0002c9020041a7a0 Infiniscale-IV Mellanox Technologies:
>>            1    1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       5    1[  ] " HCA-1" ( )
>>            1    2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       6    1[  ] "linux70 HCA-1" ( )
>>            1    3[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       7    1[  ] "linux71 HCA-1" ( )
>>            1    4[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1    5[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1    6[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1    7[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1    8[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1    9[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   10[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   11[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   12[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   13[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   14[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   15[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   16[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   17[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   18[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   19[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   20[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   21[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   22[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   23[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   24[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>       9    1[  ] " HCA-1" ( )
>>            1   25[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>       8    1[  ] " HCA-1" ( )
>>            1   26[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   27[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   28[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   29[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   30[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   31[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   32[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   33[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   34[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   35[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>            1   36[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
>>
>> On Thu, Sep 2, 2010 at 12:03 PM, Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org> wrote:
>> > On Thu, 2 Sep 2010 06:56:50 -0700
>> > Chuck Hartley <hartlch14-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> >
>> >> We swapped in a different switch and see the same errors. The opensm
>> >> logfile does not show any errors:
>> >
>> > Could you run "ibstat" on the node with OpenSM running?
>> >
>> > And "iblinkinfo" on the same node?
>> >
>> > Send that output.
>> >
>> > Ira
>> >
>> >>
>> >> -------------------------------------------------
>> >> OpenSM 3.3.5
>> >> Command Line Arguments:
>> >>  Daemon mode
>> >>  Log File: /var/log/opensm.log
>> >> -------------------------------------------------
>> >> OpenSM 3.3.5
>> >>
>> >> Sep 02 05:56:29 933684 [B53B8700] 0x80 -> OpenSM 3.3.5
>> >> Entering DISCOVERING state
>> >>
>> >> Sep 02 05:56:29 934931 [B53B8700] 0x02 -> osm_vendor_init: 1000 pending umads specified
>> >> Sep 02 05:56:29 935079 [B53B8700] 0x80 -> Entering DISCOVERING state
>> >> Using default GUID 0x2c90300032de1
>> >> Entering MASTER state
>> >>
>> >> Sep 02 05:56:29 953763 [B53B8700] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300032de1
>> >> Sep 02 05:56:29 990146 [B53B8700] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300032de1
>> >> Sep 02 05:56:29 990240 [B53B8700] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x0002c90300032de1
>> >> Sep 02 05:56:30 009040 [AF1DB710] 0x80 -> Entering MASTER state
>> >> SUBNET UP
>> >>
>> >> Sep 02 05:56:30 009885 [AF1DB710] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
>> >> Sep 02 05:56:30 014593 [AF1DB710] 0x80 -> SUBNET UP
>> >>
>> >>
>> >> On Thu, Sep 2, 2010 at 8:56 AM, Hal Rosenstock <hal.rosenstock@gmail.com> wrote:
>> >> > On Thu, Sep 2, 2010 at 8:34 AM, Chuck Hartley <hartlch14@gmail.com> wrote:
>> >> >> Hello,
>> >> >>
>> >> >> We installed 1.5.1 and are having problems getting the IB fabric
>> >> >> working. ibv_devinfo shows the HCAs ports are ok and ibdiagnet reports
>> >> >> no errors. However, ibchecknet shows that the switch ports are not
>> >> >> being configured.  We have never seen this before and are at a loss as
>> >> >> to where the problem might be - would someone please point us in the
>> >> >> right direction to look?  Could it be a problem with the switch
>> >> >> itself? Output from ibchecknet below.
>> >> >>
>> >> >>
>> >> >> # ibchecknet
>> >> >> Error check on lid 3 (Infiniscale-IV Mellanox Technologies) port all:  FAILED
>> >> >> ibwarn: [26732] dump_perfcounters: PortXmitWait not indicated so ignore this counter
>> >> >> #warn: Lid is not configured lid 3 port 7
>> >> >> #warn: SM Lid is not configured
>> >> >
>> >> > Is there an SM running on your subnet? If not, I think that the lack
>> >> > of an SM could account for all of the issues mentioned here.
>> >> >
>> >> > -- Hal
>> >> >
>> >>
>> >
>> >
>> > --
>> > Ira Weiny
>> > Math Programmer/Computer Scientist
>> > Lawrence Livermore National Lab
>> > 925-423-8008
>> > weiny2-i2BcT+NCU+M@public.gmane.org
>> >
>>
>
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> 925-423-8008
> weiny2-i2BcT+NCU+M@public.gmane.org
>

* Re: Errors from ibchecknet
       [not found]                     ` <20100902131614.440c3111.weiny2-i2BcT+NCU+M@public.gmane.org>
  2010-09-02 20:34                       ` Hal Rosenstock
@ 2010-09-03 21:04                       ` Chuck Hartley
       [not found]                         ` <AANLkTi=zJWVk3KCiiQpuQd+Etuxc3JVTg48EeMQ_xV9C-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Chuck Hartley @ 2010-09-03 21:04 UTC (permalink / raw)
  To: Ira Weiny; +Cc: Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA

I checked another working fabric here and also see the same warnings,
so it looks like the warnings are not really a problem.

Well, I assume that it is just IPoIB that isn't working. Since ibping
works, I believe that says the IB part is ok. Of course, I can't run
any of the perftools since they all need IPoIB to resolve the host IP.

Do you have any suggestions of what to check to diagnose the IPoIB
problem?  Specifically, can you think of any interaction with the
"normal" networking stuff in the kernel that might be misconfigured?
The reason I mention that is because I rebuilt/installed OFED (no
errors/warnings) and it is in its default configuration, which is
running well on other similar fabrics here.  Therefore I assume the
problem must be with the non-OFED stuff. Previously, whenever this
kind of problem cropped up it has always been because opensm was not
running. I did check that iptables was off, so it isn't a firewall
issue.

- Chuck



* Re: Errors from ibchecknet
       [not found]                         ` <AANLkTi=zJWVk3KCiiQpuQd+Etuxc3JVTg48EeMQ_xV9C-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-09-08  2:07                           ` Ira Weiny
       [not found]                             ` <20100907190756.c7710d9a.weiny2-i2BcT+NCU+M@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Ira Weiny @ 2010-09-08  2:07 UTC (permalink / raw)
  To: Chuck Hartley; +Cc: Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Fri, 3 Sep 2010 14:04:37 -0700
Chuck Hartley <hartlch14-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> I checked another working fabric here and also see the same warnings,
> so it looks like the warnings are not really a problem.

Yes, I think you should treat those as warnings, not errors.

> 
> Well, I assume that it is just IPoIB that isn't working. Since ibping
> works, I believe that says the IB part is ok. Of course, I can't run
> any of the perftools since they all need IPoIB to resolve the host IP.
> 
> Do you have any suggestions of what to check to diagnose the IPoIB
> problem?

Can you log into the nodes or do you have console output?  Is ib0 up?

Ira

> Specifically, can you think of any interaction with the
> "normal" networking stuff in the kernel that might be misconfigured?
> The reason I mention that is because I rebuilt/installed OFED (no
> errors/warnings) and it is in its default configuration, which is
> running well on other similar fabrics here.  Therefore I assume the
> problem must be with the non-OFED stuff. Previously, whenever this
> kind of problem cropped up it has always been because opensm was not
> running. I did check that iptables was off, so it isn't a firewall
> issue.
> 
> - Chuck


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org

* Re: Errors from ibchecknet
       [not found]                             ` <20100907190756.c7710d9a.weiny2-i2BcT+NCU+M@public.gmane.org>
@ 2010-09-08 17:51                               ` Chuck Hartley
  0 siblings, 0 replies; 11+ messages in thread
From: Chuck Hartley @ 2010-09-08 17:51 UTC (permalink / raw)
  To: Ira Weiny; +Cc: Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA

We found the problem.

The system has an onboard mthca DDR HCA and an mlx4 QDR add-in card.
When the system came up, it found the onboard HCA first and made its
port ib0. The ifconfig output shows interface ib0 as up even though
the actual IB port state is down/polling. We stopped the mthca driver
from loading, and now mlx4 port 1 becomes ib0, as we wanted.
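
In case it helps anyone who hits the same mismatch, here is a rough
Python sketch of how one might cross-check each IPoIB interface
against the HCA port state in sysfs. It assumes the usual
/sys/class/infiniband/<hca>/ports/<n>/state and phys_state files
(contents such as "1: DOWN" and "2: Polling"), and it assumes that
/sys/class/net/<ibN>/device/infiniband/ names the owning HCA, which
may not hold on older kernels. The function names and the sysfs_root
parameter are made up for illustration.

```python
import os


def read_first_line(path):
    """Return the first line of a sysfs file, stripped, or None if unreadable."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return None


def hca_port_states(sysfs_root="/sys"):
    """Map (hca, port) -> (state, phys_state), e.g. ('1: DOWN', '2: Polling')."""
    result = {}
    ib_dir = os.path.join(sysfs_root, "class", "infiniband")
    if not os.path.isdir(ib_dir):
        return result
    for hca in sorted(os.listdir(ib_dir)):
        ports_dir = os.path.join(ib_dir, hca, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            port_dir = os.path.join(ports_dir, port)
            result[(hca, port)] = (
                read_first_line(os.path.join(port_dir, "state")),
                read_first_line(os.path.join(port_dir, "phys_state")),
            )
    return result


def ipoib_owner(ifname, sysfs_root="/sys"):
    """Best effort: name of the HCA behind an IPoIB netdev, or None.

    Assumption: the netdev's PCI device directory has an 'infiniband'
    subdirectory containing the HCA name (for example mthca0 or mlx4_0).
    """
    ib_subdir = os.path.join(
        sysfs_root, "class", "net", ifname, "device", "infiniband"
    )
    try:
        names = sorted(os.listdir(ib_subdir))
    except OSError:
        return None
    return names[0] if names else None


if __name__ == "__main__":
    for (hca, port), (state, phys) in sorted(hca_port_states().items()):
        print(f"{hca} port {port}: state={state} phys_state={phys}")
    for ifname in ("ib0", "ib1", "ib2"):
        print(f"{ifname} -> {ipoib_owner(ifname)}")
```

An ibN interface whose owning HCA port reports a phys_state of
Polling is the case described above: the netdev exists and can be
marked up, but there is no IB link behind it.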

I'm not sure what the best way to handle this configuration would be
(assuming we actually needed to use all 3 ports / 2 HCAs).  We could
specify HWADDR in the ifcfg-ib[0,1,2] files, assuming that it works
with the 20-byte IB addresses. Or, I seem to remember adding alias
lines to /etc/modprobe.conf years ago to sort out a similar situation
with Ethernet ports. Do you guys have other/better suggestions?
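
On the 20-byte addresses: per RFC 4391 the IPoIB link-layer address
is 1 octet of flags, a 3-octet QPN, and a 16-octet GID (8-octet
subnet prefix followed by the 8-octet port GUID). A minimal Python
sketch of splitting one apart is below. The function name and the
sample address are made up for illustration, and I have not verified
how the ifcfg scripts compare HWADDR, so treat this only as a way to
see which part of the address identifies the port.

```python
def parse_ipoib_hwaddr(hwaddr):
    """Split a colon-separated 20-octet IPoIB hardware address into fields."""
    octets = bytes(int(part, 16) for part in hwaddr.split(":"))
    if len(octets) != 20:
        raise ValueError(f"expected 20 octets, got {len(octets)}")
    return {
        "flags": octets[0],                          # first octet
        "qpn": int.from_bytes(octets[1:4], "big"),   # 24-bit queue pair number
        "subnet_prefix": octets[4:12].hex(),         # upper half of the GID
        "port_guid": octets[12:20].hex(),            # lower half of the GID
    }


if __name__ == "__main__":
    # Hypothetical address, not taken from any real system.
    sample = "80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:00:12:35"
    for key, value in parse_ipoib_hwaddr(sample).items():
        print(f"{key}: {value}")
```

The last 8 octets are the port GUID, which is the part tied to a
specific HCA port.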

What is the purpose of the "alias ib0 ib_ipoib" line in
/etc/modprobe.conf? Is it required by something in OFED? Are you
supposed to have one for each additional interface (ib1, ib2)? Since
the OFED install did not create entries for the other interfaces, I'm
guessing it is not strictly required?

-Chuck

