* problems about opensm 3.3.2
From: Larry @ 2010-03-09  2:10 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi all,
We have an InfiniBand network with about 1,000 nodes and two OpenSM
nodes, one active and one standby. The OpenSM log contains a lot of
errors; an excerpt follows.
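
Because the excerpt below is long and repetitive, it may help to tally
it first. The following is only a rough sketch in Python, not part of
OpenSM: the names split_entries and summarize are made up for
illustration, and the entry format (timestamp, thread id, log level,
"->") is assumed from the excerpt itself. It rejoins lines that the
mailer wrapped, then counts each "ERR xxxx" code per reporting function
and each offending port GUID.

```python
import re
from collections import Counter

# Assumed entry header, taken from the excerpt:
#   "Mar 08 09:41:06 035406 [42804940] 0x01 -> "
ENTRY_START = re.compile(
    r"^[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{6} "
    r"\[[0-9A-Fa-f]+\] 0x[0-9A-Fa-f]{2} -> "
)
# "function_name: ERR 1B12:" once wrapped lines are rejoined
ERR_CODE = re.compile(r"\b(\w+): ERR ([0-9A-F]{4}):")
# "failed from port 0x0008f100010b02bf"
PORT_GUID = re.compile(r"from port (0x[0-9a-fA-F]{16})")


def split_entries(text):
    """Group wrapped and continuation lines under the entry they belong to."""
    entries, current = [], []
    for line in text.splitlines():
        if ENTRY_START.match(line):
            if current:
                entries.append(" ".join(current))
            current = [line.strip()]
        elif current and line.strip():
            # Continuation of the previous entry (wrapped text or dump body)
            current.append(line.strip())
    if current:
        entries.append(" ".join(current))
    return entries


def summarize(text):
    """Return (error-code counts, port-GUID counts) for a log excerpt."""
    codes, ports = Counter(), Counter()
    for entry in split_entries(text):
        err = ERR_CODE.search(entry)
        if err:
            codes[(err.group(1), err.group(2))] += 1
        port = PORT_GUID.search(entry)
        if port:
            ports[port.group(1)] += 1
    return codes, ports
```

Calling summarize() on the text of the log file and printing
codes.most_common() would show which error codes dominate and which
port GUIDs keep sending the rejected join requests.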

Mar 08 09:41:05 003518 [4580A940] 0x02 -> SUBNET UP
Mar 08 09:41:06 035406 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 036331 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 037045 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 037728 [44808940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 038929 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 040478 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 040642 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 044892 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 046116 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 046564 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 048440 [44808940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 049224 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 050253 [41802940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 050455 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 084310 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x20
trans_id=0x8e614b257505) -- dropping
Mar 08 09:41:09 084346 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:09 084366 [4600B940] 0x01 -> Received SMP on a 4 hop path:
                Initial path = 0,0,0,0,0
                Return path  = 0,0,0,0,0
Mar 08 09:41:09 084379 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:09 084424 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x4
                trans_id................0x4b257505
                attr_id.................0x20 (SMInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,22,11
                Return path:  0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:09 364332 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258973) -- dropping
Mar 08 09:41:09 364370 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:09 364393 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:09 364409 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:09 364465 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258973
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,6,19,24
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:09 412323 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258af8) -- dropping
Mar 08 09:41:09 412358 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:09 412381 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:09 412396 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:09 412451 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258af8
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,10,17,13
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:09 440326 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258b7d) -- dropping
Mar 08 09:41:09 440361 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:09 440384 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:09 440399 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:09 440453 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258b7d
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,10,21,13
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:09 908345 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258bda) -- dropping
Mar 08 09:41:09 908381 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:09 908404 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:09 908420 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:09 908473 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258bda
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,8,17,11
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:10 168348 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258bde) -- dropping
Mar 08 09:41:10 168386 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:10 168409 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:10 168424 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:10 168479 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258bde
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,8,17,15
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:10 224344 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c04) -- dropping
Mar 08 09:41:10 224379 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:10 224403 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:10 224417 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:10 224471 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258c04
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,8,22,15
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:10 244310 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c07) -- dropping
Mar 08 09:41:10 244344 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:10 244380 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:10 244395 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:10 244449 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258c07
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,8,22,19
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:10 712361 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c09) -- dropping
Mar 08 09:41:10 712397 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:10 712421 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:10 712436 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:10 712490 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258c09
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,8,22,24
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:10 972363 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c15) -- dropping
Mar 08 09:41:10 972399 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:10 972421 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:10 972436 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:10 972490 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258c15
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,8,20,13
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:11 048363 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c79) -- dropping
Mar 08 09:41:11 056589 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 056620 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:11 056635 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 056691 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258c79
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,7,13,20
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:11 056736 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c8c) -- dropping
Mar 08 09:41:11 056768 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 056789 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:11 056803 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 056856 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258c8c
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,7,14,24
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:11 516381 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c95) -- dropping
Mar 08 09:41:11 516418 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 516441 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:11 516456 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 516510 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258c95
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,7,15,11
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:11 776387 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258ca8) -- dropping
Mar 08 09:41:11 776424 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 776447 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:11 776462 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 776517 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258ca8
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,7,16,12
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:11 860388 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258cad) -- dropping
Mar 08 09:41:11 860423 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 860447 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:11 860461 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 860516 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258cad
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,7,16,18
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:11 860580 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258cb0) -- dropping
Mar 08 09:41:11 860616 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 860639 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:11 860653 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 860707 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258cb0
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,7,16,24
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:12 015232 [41001940] 0x01 -> trap_rcv_process_request:
Received Generic Notice type:1 num:128 (Link state change) Producer:2
(Switch) from LID:1212 TID:0x000000000000001f
Mar 08 09:41:12 015332 [41001940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:1 num:128 (Link state change) from LID:1212
GID:fe80::d200:0:0:68
Mar 08 09:41:12 054465 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 055979 [43005940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 056121 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 057983 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 058803 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 060284 [43005940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 061670 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 320396 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258cbc) -- dropping
Mar 08 09:41:12 320435 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:12 320459 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:12 320476 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:12 320532 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258cbc
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,7,18,14
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:12 596401 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258d02) -- dropping
Mar 08 09:41:12 596436 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:12 596459 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:12 596473 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:12 596528 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258d02
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,7,21,12
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:12 724423 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258f10) -- dropping
Mar 08 09:41:12 724460 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:12 724484 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:12 724499 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:12 724554 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258f10
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,4,19,17
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:12 756407 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258f91) -- dropping
Mar 08 09:41:12 756431 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:12 756453 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:12 756468 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:12 756522 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258f91
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,11,14,13
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:13 124374 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258f97) -- dropping
Mar 08 09:41:13 124389 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:13 124398 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:13 124404 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:13 124425 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b258f97
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,11,14,20
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:13 516395 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b259178) -- dropping
Mar 08 09:41:13 516410 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:13 516419 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:13 516426 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:13 516446 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b259178
                attr_id.................0x11 (NodeInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,9,24,11
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:13 596432 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x15
trans_id=0x8e614b25933a) -- dropping
Mar 08 09:41:13 596470 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:13 596494 [4600B940] 0x01 -> Received SMP on a 6 hop path:
                Initial path = 0,0,0,0,0,0,0
                Return path  = 0,0,0,0,0,0,0
Mar 08 09:41:13 596509 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:13 596565 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x6
                trans_id................0x4b25933a
                attr_id.................0x15 (PortInfo)
                resv....................0x0
                attr_mod................0x1
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,1,6,19,14
                Return path:  0,0,0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:14 784437 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x20
trans_id=0x8e614b25bb1b) -- dropping
Mar 08 09:41:14 784468 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:14 784478 [4600B940] 0x01 -> Received SMP on a 4 hop path:
                Initial path = 0,0,0,0,0
                Return path  = 0,0,0,0,0
Mar 08 09:41:14 784486 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:14 784508 [4600B940] 0x01 -> SMP dump:
                base_ver................0x1
                mgmt_class..............0x81
                class_ver...............0x1
                method..................0x1 (SubnGet)
                D bit...................0x0
                status..................0x0
                hop_ptr.................0x0
                hop_count...............0x4
                trans_id................0x4b25bb1b
                attr_id.................0x20 (SMInfo)
                resv....................0x0
                attr_mod................0x0
                m_key...................0x0000000000000000
                dr_slid.................65535
                dr_dlid.................65535

                Initial path: 0,1,1,22,11
                Return path:  0,0,0,0,0
                Reserved:     [0][0][0][0][0][0][0]

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Mar 08 09:41:15 064054 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:15 065782 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
...
...

After opensm has run for about 20 days, it dies and the standby one
(sm2) takes over. About 20 days later, the process on sm2 also dies.
No node can communicate with opensm until the process is restarted
manually. I think this has something to do with the errors in the
log. So what do ERR 1B12, 5409, and 5413 mean? How should opensm be
configured correctly? Thanks a lot
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: problems about opensm 3.3.2
       [not found]     ` <78022df1003081810x1a2633f3sa4895f7bbc1fe5e4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-03-11 13:34       ` Hal Rosenstock
  0 siblings, 0 replies; 2+ messages in thread
From: Hal Rosenstock @ 2010-03-11 13:34 UTC (permalink / raw)
  To: Larry; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi,

On Mon, Mar 8, 2010 at 9:10 PM, Larry <tsrjzq@gmail.com> wrote:
> Hi, all
> We have an infiniband network with about 1,000 nodes, and have two
> opensm nodes, one active and one standby. But there are a lot of error
> in the opensm's log, like that
>
> Mar 08 09:41:05 003518 [4580A940] 0x02 -> SUBNET UP
> Mar 08 09:41:06 035406 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 036331 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 037045 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 037728 [44808940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 038929 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 040478 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 040642 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 044892 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 046116 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 046564 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 048440 [44808940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 049224 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 050253 [41802940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 050455 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID

These errors are most commonly due to a port that cannot support the
rate required by the multicast group. It looks like these ports are
the switch port 0s on the Voltaire sFB-2012. If so, perhaps these
ports are running IPoIB (for management purposes), which would cause
the multicast registrations, so either shut IPoIB off on those ports
or reconfigure the IPoIB broadcast group.
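
To confirm that the rejected joins all come from the same small set of
switch ports, it can help to tally the ERR 1B12 records per port GUID.
What follows is only a rough sketch: the function name and the sample
text are illustrative (the sample lines are copied from the log above),
and the pattern assumes the "failed from port <GUID> (<description>)"
wording shown in this thread, which may differ in other OpenSM versions.

```python
import re
from collections import Counter

# Strips a leading "> " so the same code works on quoted mail text.
QUOTE_PREFIX = re.compile(r"^>\s?", re.MULTILINE)

# One ERR 1B12 record, tolerant of the mailer wrapping it across lines.
ERR_1B12 = re.compile(
    r"ERR\s+1B12:.*?from\s+port\s+(0x[0-9a-fA-F]{16})\s+\(([^)]*)\)",
    re.DOTALL,
)


def count_join_failures(log_text):
    """Count rejected multicast joins per (port GUID, node description)."""
    text = QUOTE_PREFIX.sub("", log_text)
    counts = Counter()
    for guid, desc in ERR_1B12.findall(text):
        counts[(guid.lower(), " ".join(desc.split()))] += 1
    return counts


# Sample records copied from the log in this thread.
SAMPLE = """\
Mar 08 09:41:12 054465 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 055979 [43005940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:15 064054 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
"""

for (guid, desc), n in count_join_failures(SAMPLE).most_common():
    print(f"{n:6d}  {guid}  {desc}")
```

If the full log shows only a handful of GUIDs, all described as
sFB-2012, that would be consistent with the switch management ports
being the only requesters that fail the join.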

> Mar 08 09:41:09 084310 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x20
> trans_id=0x8e614b257505) -- dropping
> Mar 08 09:41:09 084346 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:09 084366 [4600B940] 0x01 -> Received SMP on a 4 hop path:
>                 Initial path = 0,0,0,0,0
>                 Return path  = 0,0,0,0,0
> Mar 08 09:41:09 084379 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:09 084424 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x4
>                 trans_id................0x4b257505
>                 attr_id.................0x20 (SMInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,22,11

Is this node running another OpenSM, or an SM on the Voltaire switch?
Also, which routing algorithm is being used with OpenSM?
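
To see which nodes are behind the timeouts, one option is to pull the
"Initial path" and attribute name out of each SMP dump and group them
by direct-route path. The sketch below is a rough illustration, not
part of OpenSM: the function name and sample text are made up for the
example (the field values are taken from the dumps in this thread), and
the smpquery command it prints is an assumption that infiniband-diags
is installed on the SM node and accepts a direct-route path with -D.

```python
import re
from collections import defaultdict

# Strips a leading "> " so quoted mail text can be parsed as well.
QUOTE_PREFIX = re.compile(r"^>\s?", re.MULTILINE)
# Attribute name as printed in the dump, e.g. "attr_id....0x20 (SMInfo)".
ATTR = re.compile(r"attr_id\.+\s*0x[0-9a-fA-F]+\s+\((\w+)\)")
# The dump uses "Initial path:"; the "Received SMP" line uses "=" instead,
# so the zeroed path there is not picked up.
PATH = re.compile(r"Initial path:\s*([0-9,]+)")


def timed_out_paths(log_text):
    """Map each direct-route path to the attributes that timed out on it."""
    text = QUOTE_PREFIX.sub("", log_text)
    result = defaultdict(list)
    for block in text.split("SMP dump:")[1:]:
        attr = ATTR.search(block)
        path = PATH.search(block)
        if attr and path:
            result[path.group(1)].append(attr.group(1))
    return dict(result)


# Abbreviated dumps; values copied from the log in this thread.
SAMPLE = """\
Mar 08 09:41:14 784508 [4600B940] 0x01 -> SMP dump:
                hop_count...............0x4
                attr_id.................0x20 (SMInfo)
                Initial path: 0,1,1,22,11
Mar 08 09:41:13 596565 [4600B940] 0x01 -> SMP dump:
                hop_count...............0x6
                attr_id.................0x15 (PortInfo)
                Initial path: 0,1,1,1,6,19,14
"""

for path, attrs in timed_out_paths(SAMPLE).items():
    parent, exit_port = path.rsplit(",", 1)
    print(f"{path}: timed out on {', '.join(attrs)}")
    # Assumed follow-up: identify the last switch that does answer.
    print(f"  smpquery -D nodedesc {parent}   # then check port {exit_port}")
```

The last number in a path is the exit port on the previous hop, so
querying the parent path should name the switch, and that port is the
one leading to the node that is not answering.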

>                 Return path:  0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:09 364332 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258973) -- dropping
> Mar 08 09:41:09 364370 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:09 364393 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:09 364409 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:09 364465 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258973
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,6,19,24
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:09 412323 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258af8) -- dropping
> Mar 08 09:41:09 412358 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:09 412381 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:09 412396 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:09 412451 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258af8
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,10,17,13
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:09 440326 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258b7d) -- dropping
> Mar 08 09:41:09 440361 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:09 440384 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:09 440399 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:09 440453 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258b7d
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,10,21,13
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:09 908345 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258bda) -- dropping
> Mar 08 09:41:09 908381 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:09 908404 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:09 908420 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:09 908473 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258bda
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,8,17,11
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:10 168348 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258bde) -- dropping
> Mar 08 09:41:10 168386 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:10 168409 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:10 168424 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:10 168479 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258bde
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,8,17,15
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:10 224344 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c04) -- dropping
> Mar 08 09:41:10 224379 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:10 224403 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:10 224417 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:10 224471 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258c04
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,8,22,15
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:10 244310 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c07) -- dropping
> Mar 08 09:41:10 244344 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:10 244380 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:10 244395 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:10 244449 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258c07
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,8,22,19
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:10 712361 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c09) -- dropping
> Mar 08 09:41:10 712397 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:10 712421 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:10 712436 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:10 712490 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258c09
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,8,22,24
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:10 972363 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c15) -- dropping
> Mar 08 09:41:10 972399 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:10 972421 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:10 972436 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:10 972490 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258c15
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,8,20,13
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 048363 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c79) -- dropping
> Mar 08 09:41:11 056589 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 056620 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:11 056635 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 056691 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258c79
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,7,13,20
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 056736 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c8c) -- dropping
> Mar 08 09:41:11 056768 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 056789 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:11 056803 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 056856 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258c8c
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,7,14,24
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 516381 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c95) -- dropping
> Mar 08 09:41:11 516418 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 516441 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:11 516456 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 516510 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258c95
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,7,15,11
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 776387 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258ca8) -- dropping
> Mar 08 09:41:11 776424 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 776447 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:11 776462 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 776517 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258ca8
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,7,16,12
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 860388 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258cad) -- dropping
> Mar 08 09:41:11 860423 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 860447 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:11 860461 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 860516 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258cad
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,7,16,18
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 860580 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258cb0) -- dropping
> Mar 08 09:41:11 860616 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 860639 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:11 860653 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 860707 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258cb0
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,7,16,24
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

These timeouts come from unresponsive nodes. The DR (direct routed) paths
from the SM node should be checked, as well as the peer ports at the far end
of each path. What are these nodes doing? Is booting over IB being used?
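
To see which switches the timeouts cluster on, the "Initial path:" lines of the
SMP dumps can be tallied. The following is only a minimal sketch, assuming the
log format quoted above; the function names are hypothetical and not part of
OpenSM. Each number after the leading 0 in a DR path is the exit port taken at
that hop, so dropping the last number gives the path to the last switch
reached.

```python
import re
from collections import Counter

# Matches "Initial path: 0,1,1,1,8,17,15" inside an "SMP dump:" block.
# The "Initial path = ..." line of the "Received SMP" message uses "=" and
# is all zeros in this log, so it is deliberately not matched.
PATH_RE = re.compile(r"Initial path:\s*(\d+(?:,\d+)*)")


def timed_out_paths(log_text):
    """Return the DR paths found in SMP dumps, in order of appearance."""
    return [tuple(int(p) for p in m.group(1).split(","))
            for m in PATH_RE.finditer(log_text)]


def group_by_last_switch(paths):
    """Count paths by everything except the final hop (last exit port)."""
    return Counter(path[:-1] for path in paths)


# A few lines in the shape of the log above (mail quoting left in place).
sample = """\
> Mar 08 09:41:10 224471 [4600B940] 0x01 -> SMP dump:
>                 Initial path: 0,1,1,1,8,22,15
>                 Return path:  0,0,0,0,0,0,0
> Mar 08 09:41:10 244449 [4600B940] 0x01 -> SMP dump:
>                 Initial path: 0,1,1,1,8,22,19
> Mar 08 09:41:10 712421 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
> Mar 08 09:41:11 056691 [4600B940] 0x01 -> SMP dump:
>                 Initial path: 0,1,1,1,7,13,20
"""

paths = timed_out_paths(sample)
for prefix, count in group_by_last_switch(paths).most_common():
    print(",".join(map(str, prefix)), count)
```

A prefix with many entries points at one switch with several unresponsive
peers. If infiniband-diags is installed, a single path can then be probed by
hand with something like "smpquery -D nodeinfo 0,1,1,1,8,22,15" (the exact
invocation may differ between versions).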

> Mar 08 09:41:12 015232 [41001940] 0x01 -> trap_rcv_process_request:
> Received Generic Notice type:1 num:128 (Link state change) Producer:2
> (Switch) from LID:1212 TID:0x000000000000001f
> Mar 08 09:41:12 015332 [41001940] 0x02 -> osm_report_notice: Reporting
> Generic Notice type:1 num:128 (Link state change) from LID:1212
> GID:fe80::d200:0:0:68
> Mar 08 09:41:12 054465 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 055979 [43005940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 056121 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 057983 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 058803 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 060284 [43005940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 061670 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 320396 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258cbc) -- dropping
> Mar 08 09:41:12 320435 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:12 320459 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:12 320476 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:12 320532 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258cbc
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,7,18,14
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:12 596401 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258d02) -- dropping
> Mar 08 09:41:12 596436 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:12 596459 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:12 596473 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:12 596528 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258d02
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,7,21,12
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:12 724423 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258f10) -- dropping
> Mar 08 09:41:12 724460 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:12 724484 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:12 724499 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:12 724554 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258f10
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,4,19,17
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:12 756407 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258f91) -- dropping
> Mar 08 09:41:12 756431 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:12 756453 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:12 756468 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:12 756522 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258f91
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,11,14,13
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:13 124374 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258f97) -- dropping
> Mar 08 09:41:13 124389 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:13 124398 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:13 124404 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:13 124425 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b258f97
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,11,14,20
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:13 516395 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b259178) -- dropping
> Mar 08 09:41:13 516410 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:13 516419 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:13 516426 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:13 516446 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b259178
>                 attr_id.................0x11 (NodeInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,9,24,11
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:13 596432 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x15
> trans_id=0x8e614b25933a) -- dropping
> Mar 08 09:41:13 596470 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:13 596494 [4600B940] 0x01 -> Received SMP on a 6 hop path:
>                 Initial path = 0,0,0,0,0,0,0
>                 Return path  = 0,0,0,0,0,0,0
> Mar 08 09:41:13 596509 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:13 596565 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x6
>                 trans_id................0x4b25933a
>                 attr_id.................0x15 (PortInfo)
>                 resv....................0x0
>                 attr_mod................0x1
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,1,6,19,14
>                 Return path:  0,0,0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:14 784437 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x20
> trans_id=0x8e614b25bb1b) -- dropping
> Mar 08 09:41:14 784468 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:14 784478 [4600B940] 0x01 -> Received SMP on a 4 hop path:
>                 Initial path = 0,0,0,0,0
>                 Return path  = 0,0,0,0,0
> Mar 08 09:41:14 784486 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:14 784508 [4600B940] 0x01 -> SMP dump:
>                 base_ver................0x1
>                 mgmt_class..............0x81
>                 class_ver...............0x1
>                 method..................0x1 (SubnGet)
>                 D bit...................0x0
>                 status..................0x0
>                 hop_ptr.................0x0
>                 hop_count...............0x4
>                 trans_id................0x4b25bb1b
>                 attr_id.................0x20 (SMInfo)
>                 resv....................0x0
>                 attr_mod................0x0
>                 m_key...................0x0000000000000000
>                 dr_slid.................65535
>                 dr_dlid.................65535
>
>                 Initial path: 0,1,1,22,11
>                 Return path:  0,0,0,0,0
>                 Reserved:     [0][0][0][0][0][0][0]
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
>                 00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
>
> Mar 08 09:41:15 064054 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:15 065782 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> ...
> ...
>
> After the opensm runs for about 20 days, it dies;

Do you have, or can you get, a core dump to determine where OpenSM dies?

> and control fails over to the
> standby one (sm2). About 20 days later, the process on sm2 also
> dies. No node can communicate with opensm until the process is
> restarted manually.

Are there no SMs in the subnet at this point?

-- Hal

> I think this has something to do with the errors in the
> log. So what do ERR 1B12, 5409, and 5413 mean? How do I configure
> opensm correctly? Thanks a lot


end of thread, other threads:[~2010-03-11 13:34 UTC | newest]

Thread overview: 2+ messages
     [not found] <78022df1003081806v542ec57dy491fc27e32520d7e@mail.gmail.com>
     [not found] ` <78022df1003081806v542ec57dy491fc27e32520d7e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-03-09  2:10   ` problems about opensm 3.3.2 Larry
     [not found]     ` <78022df1003081810x1a2633f3sa4895f7bbc1fe5e4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-03-11 13:34       ` Hal Rosenstock
