* problems about opensm 3.3.2
From: Larry @ 2010-03-09 2:10 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi all,
We have an InfiniBand network with about 1,000 nodes and two opensm
nodes, one active and one standby. The opensm log contains a lot of
errors like the following:
Mar 08 09:41:05 003518 [4580A940] 0x02 -> SUBNET UP
Mar 08 09:41:06 035406 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 036331 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 037045 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 037728 [44808940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 038929 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 040478 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:06 040642 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 044892 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 046116 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 046564 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 048440 [44808940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 049224 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 050253 [41802940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 050455 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:09 084310 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x20
trans_id=0x8e614b257505) -- dropping
Mar 08 09:41:09 084346 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:09 084366 [4600B940] 0x01 -> Received SMP on a 4 hop path:
Initial path = 0,0,0,0,0
Return path = 0,0,0,0,0
Mar 08 09:41:09 084379 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:09 084424 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x4
trans_id................0x4b257505
attr_id.................0x20 (SMInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,22,11
Return path: 0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:09 364332 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258973) -- dropping
Mar 08 09:41:09 364370 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:09 364393 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:09 364409 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:09 364465 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258973
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,6,19,24
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:09 412323 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258af8) -- dropping
Mar 08 09:41:09 412358 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:09 412381 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:09 412396 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:09 412451 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258af8
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,10,17,13
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:09 440326 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258b7d) -- dropping
Mar 08 09:41:09 440361 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:09 440384 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:09 440399 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:09 440453 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258b7d
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,10,21,13
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:09 908345 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258bda) -- dropping
Mar 08 09:41:09 908381 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:09 908404 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:09 908420 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:09 908473 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258bda
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,8,17,11
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:10 168348 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258bde) -- dropping
Mar 08 09:41:10 168386 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:10 168409 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:10 168424 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:10 168479 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258bde
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,8,17,15
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:10 224344 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c04) -- dropping
Mar 08 09:41:10 224379 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:10 224403 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:10 224417 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:10 224471 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258c04
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,8,22,15
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:10 244310 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c07) -- dropping
Mar 08 09:41:10 244344 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:10 244380 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:10 244395 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:10 244449 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258c07
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,8,22,19
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:10 712361 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c09) -- dropping
Mar 08 09:41:10 712397 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:10 712421 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:10 712436 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:10 712490 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258c09
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,8,22,24
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:10 972363 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c15) -- dropping
Mar 08 09:41:10 972399 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:10 972421 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:10 972436 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:10 972490 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258c15
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,8,20,13
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:11 048363 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c79) -- dropping
Mar 08 09:41:11 056589 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 056620 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:11 056635 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 056691 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258c79
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,7,13,20
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:11 056736 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c8c) -- dropping
Mar 08 09:41:11 056768 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 056789 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:11 056803 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 056856 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258c8c
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,7,14,24
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:11 516381 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258c95) -- dropping
Mar 08 09:41:11 516418 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 516441 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:11 516456 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 516510 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258c95
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,7,15,11
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:11 776387 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258ca8) -- dropping
Mar 08 09:41:11 776424 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 776447 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:11 776462 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 776517 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258ca8
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,7,16,12
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:11 860388 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258cad) -- dropping
Mar 08 09:41:11 860423 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 860447 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:11 860461 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 860516 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258cad
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,7,16,18
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:11 860580 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258cb0) -- dropping
Mar 08 09:41:11 860616 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:11 860639 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:11 860653 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:11 860707 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258cb0
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,7,16,24
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:12 015232 [41001940] 0x01 -> trap_rcv_process_request:
Received Generic Notice type:1 num:128 (Link state change) Producer:2
(Switch) from LID:1212 TID:0x000000000000001f
Mar 08 09:41:12 015332 [41001940] 0x02 -> osm_report_notice: Reporting
Generic Notice type:1 num:128 (Link state change) from LID:1212
GID:fe80::d200:0:0:68
Mar 08 09:41:12 054465 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 055979 [43005940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 056121 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 057983 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 058803 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 060284 [43005940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 061670 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:12 320396 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258cbc) -- dropping
Mar 08 09:41:12 320435 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:12 320459 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:12 320476 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:12 320532 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258cbc
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,7,18,14
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:12 596401 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258d02) -- dropping
Mar 08 09:41:12 596436 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:12 596459 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:12 596473 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:12 596528 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258d02
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,7,21,12
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:12 724423 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258f10) -- dropping
Mar 08 09:41:12 724460 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:12 724484 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:12 724499 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:12 724554 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258f10
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,4,19,17
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:12 756407 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258f91) -- dropping
Mar 08 09:41:12 756431 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:12 756453 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:12 756468 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:12 756522 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258f91
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,11,14,13
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:13 124374 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b258f97) -- dropping
Mar 08 09:41:13 124389 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:13 124398 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:13 124404 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:13 124425 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b258f97
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,11,14,20
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:13 516395 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x11
trans_id=0x8e614b259178) -- dropping
Mar 08 09:41:13 516410 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:13 516419 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:13 516426 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:13 516446 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b259178
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,9,24,11
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:13 596432 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x15
trans_id=0x8e614b25933a) -- dropping
Mar 08 09:41:13 596470 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:13 596494 [4600B940] 0x01 -> Received SMP on a 6 hop path:
Initial path = 0,0,0,0,0,0,0
Return path = 0,0,0,0,0,0,0
Mar 08 09:41:13 596509 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:13 596565 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x6
trans_id................0x4b25933a
attr_id.................0x15 (PortInfo)
resv....................0x0
attr_mod................0x1
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,1,6,19,14
Return path: 0,0,0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:14 784437 [4600B940] 0x01 -> umad_receiver: ERR 5409:
send completed with error (method=0x1 attr=0x20
trans_id=0x8e614b25bb1b) -- dropping
Mar 08 09:41:14 784468 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
SMP Hop Ptr: 0x0
Mar 08 09:41:14 784478 [4600B940] 0x01 -> Received SMP on a 4 hop path:
Initial path = 0,0,0,0,0
Return path = 0,0,0,0,0
Mar 08 09:41:14 784486 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
3113: MAD completed in error (IB_TIMEOUT)
Mar 08 09:41:14 784508 [4600B940] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x4
trans_id................0x4b25bb1b
attr_id.................0x20 (SMInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0,1,1,22,11
Return path: 0,0,0,0,0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Mar 08 09:41:15 064054 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
Mar 08 09:41:15 065782 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
sending IB_SA_MAD_STATUS_REQ_INVALID
...
...
After opensm runs for about 20 days, it dies and the standby one (sm2)
takes over. About 20 days later, the process on sm2 also dies. No node
can communicate with opensm until the process is restarted manually. I
think this is related to the errors in the log. So what do ERR 1B12,
5409, and 5413 mean? How should opensm be configured correctly?
Thanks a lot.
* Re: problems about opensm 3.3.2
[not found] ` <78022df1003081810x1a2633f3sa4895f7bbc1fe5e4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-03-11 13:34 ` Hal Rosenstock
0 siblings, 0 replies; 2+ messages in thread
From: Hal Rosenstock @ 2010-03-11 13:34 UTC (permalink / raw)
To: Larry; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi,
On Mon, Mar 8, 2010 at 9:10 PM, Larry <tsrjzq@gmail.com> wrote:
> Hi, all
> We have an infiniband network with about 1,000 nodes, and have two
> opensm nodes, one active and one standby. But there are a lot of error
> in the opensm's log, like that
>
> Mar 08 09:41:05 003518 [4580A940] 0x02 -> SUBNET UP
> Mar 08 09:41:06 035406 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 036331 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 037045 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 037728 [44808940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 038929 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 040478 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:06 040642 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 044892 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 046116 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 046564 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 048440 [44808940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 049224 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 050253 [41802940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:09 050455 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
These errors are most commonly due to a port that is unable to support
the required rate of the multicast group. It looks like these ports are
the switch port 0s on the Voltaire sFB-2012. If so, perhaps these ports
are running IPoIB (for management purposes), which would cause the
multicast registrations. Either shut IPoIB off on these ports or
reconfigure the IPoIB broadcast group.
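To confirm that the rejected joins all come from the same few switch
ports, the ERR 1B12 records can be tallied by port GUID. The following
is only a minimal Python sketch: count_join_failures is a hypothetical
helper name (not part of OpenSM), and the log path in the usage comment
is an assumption, so substitute whatever log file your opensm is
configured to write.

```python
import re
from collections import Counter

# Matches one ERR 1B12 record and captures the port GUID and the node
# description. \s+ is used between tokens so that records wrapped across
# lines (as in a mail body) still match.
ERR_1B12 = re.compile(
    r"ERR\s+1B12:.*?failed\s+from\s+port\s+(0x[0-9a-fA-F]{16})\s+\(([^)]*)\)",
    re.DOTALL,
)


def count_join_failures(log_text):
    """Return a Counter keyed by (port GUID, node description)."""
    return Counter(ERR_1B12.findall(log_text))


# Usage (the path is an assumption, adjust to your setup):
#   with open("/var/log/opensm.log") as f:
#       for (guid, desc), n in count_join_failures(f.read()).most_common():
#           print(n, guid, desc)
```

If the tally shows only a handful of GUIDs that all carry the same
switch node description, that is consistent with the switch port 0
explanation above rather than with a problem on the compute nodes.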
> Mar 08 09:41:09 084310 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x20
> trans_id=0x8e614b257505) -- dropping
> Mar 08 09:41:09 084346 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:09 084366 [4600B940] 0x01 -> Received SMP on a 4 hop path:
> Initial path = 0,0,0,0,0
> Return path = 0,0,0,0,0
> Mar 08 09:41:09 084379 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:09 084424 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x4
> trans_id................0x4b257505
> attr_id.................0x20 (SMInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,22,11
Is this node running another OpenSM or an SM on the Voltaire switch?
Also, what routing protocol is being used with OpenSM?
> Return path: 0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:09 364332 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258973) -- dropping
> Mar 08 09:41:09 364370 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:09 364393 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:09 364409 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:09 364465 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258973
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,6,19,24
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:09 412323 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258af8) -- dropping
> Mar 08 09:41:09 412358 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:09 412381 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:09 412396 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:09 412451 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258af8
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,10,17,13
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:09 440326 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258b7d) -- dropping
> Mar 08 09:41:09 440361 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:09 440384 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:09 440399 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:09 440453 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258b7d
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,10,21,13
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:09 908345 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258bda) -- dropping
> Mar 08 09:41:09 908381 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:09 908404 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:09 908420 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:09 908473 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258bda
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,8,17,11
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:10 168348 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258bde) -- dropping
> Mar 08 09:41:10 168386 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:10 168409 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:10 168424 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:10 168479 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258bde
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,8,17,15
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:10 224344 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c04) -- dropping
> Mar 08 09:41:10 224379 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:10 224403 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:10 224417 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:10 224471 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258c04
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,8,22,15
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:10 244310 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c07) -- dropping
> Mar 08 09:41:10 244344 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:10 244380 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:10 244395 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:10 244449 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258c07
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,8,22,19
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:10 712361 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c09) -- dropping
> Mar 08 09:41:10 712397 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:10 712421 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:10 712436 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:10 712490 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258c09
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,8,22,24
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:10 972363 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c15) -- dropping
> Mar 08 09:41:10 972399 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:10 972421 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:10 972436 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:10 972490 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258c15
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,8,20,13
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 048363 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c79) -- dropping
> Mar 08 09:41:11 056589 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 056620 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:11 056635 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 056691 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258c79
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,7,13,20
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 056736 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c8c) -- dropping
> Mar 08 09:41:11 056768 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 056789 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:11 056803 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 056856 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258c8c
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,7,14,24
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 516381 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258c95) -- dropping
> Mar 08 09:41:11 516418 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 516441 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:11 516456 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 516510 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258c95
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,7,15,11
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 776387 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258ca8) -- dropping
> Mar 08 09:41:11 776424 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 776447 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:11 776462 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 776517 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258ca8
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,7,16,12
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 860388 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258cad) -- dropping
> Mar 08 09:41:11 860423 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 860447 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:11 860461 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 860516 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258cad
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,7,16,18
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:11 860580 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258cb0) -- dropping
> Mar 08 09:41:11 860616 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:11 860639 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:11 860653 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:11 860707 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258cb0
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,7,16,24
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
These are unresponsive nodes: the SM's NodeInfo/PortInfo queries to them
time out. The DR (direct routed) paths from the SM node should be checked,
as well as the peer ports. What are these nodes doing? Is booting over IB
being used?
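
One way to get the list of DR paths to check is to pull them out of the
log itself. The following is only a sketch: it assumes the "SMP dump"
layout shown in the quoted log (an attr_id line followed by an
"Initial path:" line), and the function names are made up for
illustration. The smpquery commands it builds assume the infiniband-diags
tool with its -D (direct route) option; treat them as a starting point
and check them against the installed version.

```python
import re
from collections import Counter

# "Initial path: 0,1,1,1,7,16,24" from an SMP dump. The "Initial path = ..."
# lines (with "=") are all zeros in this log and are deliberately not matched.
PATH_RE = re.compile(r"^\s*(?:>\s*)*Initial path:\s*([\d,]+)\s*$")
# "attr_id.................0x11 (NodeInfo)"
ATTR_RE = re.compile(r"attr_id\.+0x[0-9a-fA-F]+\s+\((\w+)\)")


def timed_out_paths(lines):
    """Count (attribute name, direct-route path) pairs found in SMP dumps."""
    counts = Counter()
    attr = None
    for line in lines:
        m = ATTR_RE.search(line)
        if m:
            attr = m.group(1)
            continue
        m = PATH_RE.match(line)
        if m:
            counts[(attr, m.group(1))] += 1
            attr = None
    return counts


def recheck_commands(path):
    """Build (assumed) smpquery commands for the end node and its peer port."""
    hops = path.split(",")
    parent, port = ",".join(hops[:-1]), hops[-1]
    return [
        f"smpquery -D nodeinfo {path}",             # the unresponsive node
        f"smpquery -D portinfo {parent} {port}",    # peer port on the last switch
    ]


sample = """\
> attr_id.................0x11 (NodeInfo)
> Initial path: 0,1,1,1,7,16,24
> Return path: 0,0,0,0,0,0,0
> attr_id.................0x15 (PortInfo)
> Initial path: 0,1,1,1,6,19,14
""".splitlines()

for (attr, path), n in sorted(timed_out_paths(sample).items()):
    print(f"{n} x {attr} timeout on path {path}")
    for cmd in recheck_commands(path):
        print("   ", cmd)
```

The last hop of each path is the exit port on the last switch that did
answer, so the second command looks at that switch port, i.e. the peer
of the node that is not responding.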
> Mar 08 09:41:12 015232 [41001940] 0x01 -> trap_rcv_process_request:
> Received Generic Notice type:1 num:128 (Link state change) Producer:2
> (Switch) from LID:1212 TID:0x000000000000001f
> Mar 08 09:41:12 015332 [41001940] 0x02 -> osm_report_notice: Reporting
> Generic Notice type:1 num:128 (Link state change) from LID:1212
> GID:fe80::d200:0:0:68
> Mar 08 09:41:12 054465 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 055979 [43005940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 056121 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b026b (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 057983 [41001940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02d7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 058803 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02f3 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 060284 [43005940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02e7 (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 061670 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b029f (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 320396 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258cbc) -- dropping
> Mar 08 09:41:12 320435 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:12 320459 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:12 320476 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:12 320532 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258cbc
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,7,18,14
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:12 596401 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258d02) -- dropping
> Mar 08 09:41:12 596436 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:12 596459 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:12 596473 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:12 596528 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258d02
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,7,21,12
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:12 724423 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258f10) -- dropping
> Mar 08 09:41:12 724460 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:12 724484 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:12 724499 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:12 724554 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258f10
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,4,19,17
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:12 756407 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258f91) -- dropping
> Mar 08 09:41:12 756431 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:12 756453 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:12 756468 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:12 756522 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258f91
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,11,14,13
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:13 124374 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258f97) -- dropping
> Mar 08 09:41:13 124389 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:13 124398 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:13 124404 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:13 124425 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b258f97
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,11,14,20
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:13 516395 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b259178) -- dropping
> Mar 08 09:41:13 516410 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:13 516419 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:13 516426 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:13 516446 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b259178
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,9,24,11
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:13 596432 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x15
> trans_id=0x8e614b25933a) -- dropping
> Mar 08 09:41:13 596470 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:13 596494 [4600B940] 0x01 -> Received SMP on a 6 hop path:
> Initial path = 0,0,0,0,0,0,0
> Return path = 0,0,0,0,0,0,0
> Mar 08 09:41:13 596509 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:13 596565 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x6
> trans_id................0x4b25933a
> attr_id.................0x15 (PortInfo)
> resv....................0x0
> attr_mod................0x1
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,1,6,19,14
> Return path: 0,0,0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:14 784437 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x20
> trans_id=0x8e614b25bb1b) -- dropping
> Mar 08 09:41:14 784468 [4600B940] 0x01 -> umad_receiver: ERR 5411: DR
> SMP Hop Ptr: 0x0
> Mar 08 09:41:14 784478 [4600B940] 0x01 -> Received SMP on a 4 hop path:
> Initial path = 0,0,0,0,0
> Return path = 0,0,0,0,0
> Mar 08 09:41:14 784486 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
> Mar 08 09:41:14 784508 [4600B940] 0x01 -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x4
> trans_id................0x4b25bb1b
> attr_id.................0x20 (SMInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................65535
> dr_dlid.................65535
>
> Initial path: 0,1,1,22,11
> Return path: 0,0,0,0,0
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Mar 08 09:41:15 064054 [42003940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:15 065782 [42804940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02ab (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> ...
> ...
>
> After opensm runs for about 20 days, it dies;
Do you have a core dump, or can you get one, to determine where OpenSM dies?
> and the standby one (sm2) takes over. About 20 days later, the
> process on sm2 also dies. No node can communicate with opensm until
> the process is restarted manually.
Are there no SMs in the subnet at this point?
-- Hal
> I think this has something to do with the errors in the log. So what
> do ERR 1B12, 5409, and 5413 mean? How should opensm be configured
> correctly? Thanks a lot.
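
Before looking at individual codes it may help to see how often each one
occurs and which ports send the rejected joins. Going only by the message
text in the quoted log: ERR 1B12 is a multicast join request that the SA
rejects as invalid, ERR 5409 is a send that completed with error, and
ERR 3113 is a MAD that timed out. A rough sketch of such a tally follows;
it assumes entries are wrapped across lines as in the quoted log, and the
helper names are hypothetical.

```python
import re
from collections import Counter

ERR_RE = re.compile(r"ERR\s+([0-9A-F]{4}):")
# Port GUID reported in a 1B12 entry (non-greedy, so it takes the nearest one).
JOIN_RE = re.compile(r"ERR\s+1B12:.*?failed from port (0x[0-9a-f]{16})")


def unquote(line):
    """Strip leading '>' quote markers left by the mail client."""
    return re.sub(r"^\s*(?:>\s?)+", "", line.rstrip("\n"))


def summarize(lines):
    """Return (count per ERR code, count of 1B12 rejections per port GUID)."""
    # Entries wrap across lines ("... ERR" / "1B12: ..."), so join them first.
    text = " ".join(unquote(line) for line in lines)
    return Counter(ERR_RE.findall(text)), Counter(JOIN_RE.findall(text))


sample = """\
> Mar 08 09:41:12 054465 [44007940] 0x01 -> mcmr_rcv_join_mgrp: ERR
> 1B12: validate_more_comp_fields, validate_port_caps, or JoinState = 0
> failed from port 0x0008f100010b02bf (ISR2012 Voltaire sFB-2012),
> sending IB_SA_MAD_STATUS_REQ_INVALID
> Mar 08 09:41:12 320396 [4600B940] 0x01 -> umad_receiver: ERR 5409:
> send completed with error (method=0x1 attr=0x11
> trans_id=0x8e614b258cbc) -- dropping
> Mar 08 09:41:12 320476 [4600B940] 0x01 -> sm_mad_ctrl_send_err_cb: ERR
> 3113: MAD completed in error (IB_TIMEOUT)
""".splitlines()

errors, join_ports = summarize(sample)
for code, n in errors.most_common():
    print(f"ERR {code}: {n}")
for guid, n in join_ports.most_common():
    print(f"1B12 from port {guid}: {n}")
```

In the quoted log, the 1B12 entries all come from a small set of
"ISR2012 Voltaire sFB-2012" ports, so a per-port count like this should
make it clear whether the rejected joins are limited to those switch
ports.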
end of thread, other threads:[~2010-03-11 13:34 UTC | newest]
Thread overview: 2+ messages
[not found] <78022df1003081806v542ec57dy491fc27e32520d7e@mail.gmail.com>
[not found] ` <78022df1003081806v542ec57dy491fc27e32520d7e-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-03-09 2:10 ` problems about opensm 3.3.2 Larry
[not found] ` <78022df1003081810x1a2633f3sa4895f7bbc1fe5e4-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-03-11 13:34 ` Hal Rosenstock