All of lore.kernel.org
* [PATCH 0/8] opensm: Improved mkey support
@ 2012-06-26  0:54 Jim Foraker
  0 siblings, 1 reply; 33+ messages in thread
From: Jim Foraker @ 2012-06-26  0:54 UTC (permalink / raw)
  To: linux-rdma, Alex Netes; +Cc: Ira Weiny

     I'm about to post a set of patches intended to improve mkey support
in OpenSM.  These patches have been fairly rigorously tested on a small
fabric, and I believe they are sufficiently stable for inclusion.  The
primary intent here is threefold:

1) Fix a multitude of edge case issues with the existing
single-mkey-per-subnet support in OpenSM.  For instance, the current
implementation provides no way to change an established non-zero mkey
without rebooting or manually re-keying each CA on the entire subnet.

2) Enable mkey protection across the fabric.  This involves not only
setting a non-zero protection level, but also providing the SM with a
sufficient information cache to initialize the subnet on restart without
having to wait for mkey lease timeouts (provided one is set).

3) Provide a basis on which to build multiple-mkey systems for OpenSM
(be they per-host, KDF, or random) in the future.

     The patches add two new cache files: a port guid-to-mkey cache, and
a neighboring link (port guid to port guid) cache.
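As a rough illustration of what one guid2mkey record might look like: the patch packs each mkey with "0x%016" PRIx64 and parses it back with strtoull(..., 0). The sketch below assumes the guid key is packed the same way and that the database stores one space-separated "key value" pair per line; both are assumptions, and format_g2m_record/parse_g2m_record are hypothetical helpers, not OpenSM functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format one "guid mkey" record using the same
 * "0x%016" PRIx64 encoding the patch uses for the mkey value. */
static void format_g2m_record(uint64_t guid, uint64_t mkey, char *out,
			      size_t len)
{
	assert(out != NULL && len >= 38); /* 2 x 18 chars + space + NUL */
	snprintf(out, len, "0x%016" PRIx64 " 0x%016" PRIx64, guid, mkey);
}

/* Hypothetical helper: parse a record back.  strtoull() with base 0
 * accepts the "0x" prefix, which unpack_mkey() also relies on.
 * Returns 0 on success and 1 on malformed input. */
static int parse_g2m_record(const char *line, uint64_t *guid, uint64_t *mkey)
{
	char *end;

	*guid = strtoull(line, &end, 0);
	if (end == line || *end != ' ')
		return 1;
	line = end + 1;
	*mkey = strtoull(line, &end, 0);
	if (end == line)
		return 1;
	return 0;
}
```

Fixed-width hex keeps every record the same length, which makes the cache file easy to inspect and diff by hand.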
     The guid2mkey cache is used to provide a hint as to the initial mkey
for a CA during initialization.  It is a hint only; the SM is capable of
dealing with cases where the guid2mkey cache is incorrect, although it
may require waiting for (potentially multiple) mkey lease timeouts at
non-zero mkey protection levels.  The guid2mkey cache is presented first
in the patch set, as it ends up ameliorating several corner cases more
cleanly than attacking them directly did.
     The neighbors cache file provides an initial hint to the SM of what
port guid we may expect at the opposite end of a link that is being
initialized.  This is necessary at mkey protection level 2, where we
cannot do the SubnGet necessary to determine the port guid to use in
looking up an mkey hint.
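A sketch of how the neighbors cache could be chained with the guid2mkey cache when the remote port guid cannot be read directly. The record layout, the assumption that each link is stored once and may be matched from either end, and all names here are hypothetical; the actual neighbors file format is not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical neighbor-cache record: the port guids at both ends of a link. */
struct link_entry {
	uint64_t port_guid;
	uint64_t peer_guid;
};

/* Hypothetical guid2mkey record. */
struct g2m_entry {
	uint64_t guid;
	uint64_t mkey;
};

/* Return the guid expected at the far end of local_guid's link, or 0 if
 * the link is not in the cache.  A link may be matched from either end. */
static uint64_t neighbor_hint(const struct link_entry *links, size_t n,
			      uint64_t local_guid)
{
	size_t i;

	assert(links != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (links[i].port_guid == local_guid)
			return links[i].peer_guid;
		if (links[i].peer_guid == local_guid)
			return links[i].port_guid;
	}
	return 0;
}

/* Pick an mkey for the port behind out_guid without querying that port:
 * neighbors cache first, then guid2mkey, else the configured default. */
static uint64_t mkey_via_neighbor(const struct link_entry *links, size_t nl,
				  const struct g2m_entry *cache, size_t nc,
				  uint64_t out_guid, uint64_t default_mkey)
{
	uint64_t peer = neighbor_hint(links, nl, out_guid);
	size_t i;

	if (peer == 0)
		return default_mkey;
	for (i = 0; i < nc; i++)
		if (cache[i].guid == peer)
			return cache[i].mkey;
	return default_mkey;
}
```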
     The changes to the osm_req functions in patch 2 that add mkey
support mean that plock must now be held when they are called.  This was
generally already the case, but there were a few spots where it was not.
In most of these cases the plock is still not technically necessary, as
they occur during hops 0/1, when DR path traversal is trivial.  I wrapped
all of these calls in locks in a separate patch (#3), in order to make
the changes more obvious and invite comment.
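The locking contract can be modeled with a toy reader-count lock standing in for complib's cl_plock_t (the real callers use CL_PLOCK_ACQUIRE/CL_PLOCK_RELEASE on sm->p_lock). The plock, req_get and sweep_hop_0 names below are hypothetical stand-ins: the callee asserts that the lock is held, and the caller wraps each request the way patch 3 does.

```c
#include <assert.h>

/* Hypothetical stand-in for cl_plock_t that only counts shared holders. */
struct plock {
	int readers;
};

static void plock_acquire(struct plock *l)
{
	l->readers++;
}

static void plock_release(struct plock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/* Stand-in for osm_req_get(): since it now walks the discovered topology
 * to choose an mkey, it insists that the caller holds the lock. */
static int req_get(struct plock *l)
{
	assert(l->readers > 0);
	return 0; /* stands in for IB_SUCCESS */
}

/* Caller pattern from patch 3: take the lock only around the request. */
static int sweep_hop_0(struct plock *l)
{
	int status;

	plock_acquire(l);
	status = req_get(l);
	plock_release(l);
	return status;
}
```

Holding the lock only around the request keeps the critical section short, which matters because these sweeps run while other threads may want exclusive access to the subnet tables.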

     Jim

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH 1/8] opensm: Add guid2mkey cache file support
@ 2012-06-26  0:54   ` Jim Foraker
  2012-07-04  0:25   ` [PATCH 0/8] opensm: Improved mkey support Jim Foraker
  2012-08-01 14:48   ` Jim Foraker
  2 siblings, 1 reply; 33+ messages in thread
From: Jim Foraker @ 2012-06-26  0:54 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, alexne-VPRAkNaXOzVWk0Htik3J/w
  Cc: weiny2-i2BcT+NCU+M, Jim Foraker

Adds support for a guid2mkey file, and uses the database
to select which mkey to use in outgoing SMPs.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 include/opensm/osm_db_pack.h |  144 ++++++++++++++++++++++++++++++++++++++++++
 include/opensm/osm_port.h    |   23 ++-----
 include/opensm/osm_subnet.h  |    2 +
 opensm/osm_db_pack.c         |   73 +++++++++++++++++++++
 opensm/osm_lid_mgr.c         |   12 +++-
 opensm/osm_link_mgr.c        |   14 +++-
 opensm/osm_opensm.c          |   10 +--
 opensm/osm_port.c            |   32 ++++++++++
 opensm/osm_port_info_rcv.c   |    6 +-
 opensm/osm_req.c             |    1 +
 opensm/osm_state_mgr.c       |    4 ++
 opensm/osm_subnet.c          |   77 ++++++++++++++++++++++
 12 files changed, 372 insertions(+), 26 deletions(-)

diff --git a/include/opensm/osm_db_pack.h b/include/opensm/osm_db_pack.h
index 3d24926..25644df 100644
--- a/include/opensm/osm_db_pack.h
+++ b/include/opensm/osm_db_pack.h
@@ -235,5 +235,149 @@ int osm_db_guid2lid_delete(IN osm_db_domain_t * p_g2l, IN uint64_t guid);
 * osm_db_guid2lid_get, osm_db_guid2lid_set
 *********/
 
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_init
+* NAME
+*	osm_db_guid2mkey_init
+*
+* DESCRIPTION
+*	Initialize a domain for the guid2mkey table
+*
+* SYNOPSIS
+*/
+static inline osm_db_domain_t *osm_db_guid2mkey_init(IN osm_db_t * p_db)
+{
+	return (osm_db_domain_init(p_db, "guid2mkey"));
+}
+
+/*
+* PARAMETERS
+*	p_db
+*		[in] Pointer to the database object to construct
+*
+* RETURN VALUES
+*	A pointer to the newly allocated domain object, or NULL.
+*
+* NOTE: DB domains are destroyed by the osm_db_destroy
+*
+* SEE ALSO
+*	Database, osm_db_init, osm_db_destroy
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_guids
+* NAME
+*	osm_db_guid2mkey_guids
+*
+* DESCRIPTION
+*	Provides back a list of guid elements.
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_guids(IN osm_db_domain_t * p_g2m,
+			  OUT cl_qlist_t * p_guid_list);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  p_guid_list
+*     [out] A quick list of guid elements of type osm_db_guid_elem_t
+*
+* RETURN VALUES
+*	0 if successful
+*
+* NOTE: the output qlist should be initialized and each item freed
+*       by the caller, then destroyed.
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids, osm_db_guid2mkey_get
+* osm_db_guid2mkey_set, osm_db_guid2mkey_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_get
+* NAME
+*	osm_db_guid2mkey_get
+*
+* DESCRIPTION
+*	Get the mkey for the given guid.
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_get(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 OUT uint64_t * p_mkey);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  guid
+*     [in] The guid to look for
+*
+*  p_mkey
+*     [out] Pointer to the resulting mkey in host order.
+*
+* RETURN VALUES
+*	0 if successful, 1 if the guid is not found (*p_mkey is left unchanged).
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
+* osm_db_guid2mkey_set, osm_db_guid2mkey_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_set
+* NAME
+*	osm_db_guid2mkey_set
+*
+* DESCRIPTION
+*	Set the mkey for the given guid.
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_set(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 IN uint64_t mkey);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  guid
+*     [in] The guid whose mkey is to be set
+*
+*  mkey
+*     [in] The mkey value to set, in host order
+*
+* RETURN VALUES
+*	0 if successful
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
+* osm_db_guid2mkey_get, osm_db_guid2mkey_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_delete
+* NAME
+*	osm_db_guid2mkey_delete
+*
+* DESCRIPTION
+*	Delete the entry by the given guid
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  guid
+*     [in] The guid to look for
+*
+* RETURN VALUES
+*	0 if successful, otherwise 1
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
+* osm_db_guid2mkey_get, osm_db_guid2mkey_set
+*********/
+
 END_C_DECLS
 #endif				/* _OSM_DB_PACK_H_ */
diff --git a/include/opensm/osm_port.h b/include/opensm/osm_port.h
index 473b269..3575039 100644
--- a/include/opensm/osm_port.h
+++ b/include/opensm/osm_port.h
@@ -66,6 +66,7 @@ BEGIN_C_DECLS
 struct osm_port;
 struct osm_node;
 struct osm_mgrp;
+struct osm_sm;
 
 /****h* OpenSM/Physical Port
 * NAME
@@ -431,22 +432,9 @@ static inline void osm_physp_set_health(IN osm_physp_t * p_physp,
 *
 * SYNOPSIS
 */
-static inline void osm_physp_set_port_info(IN osm_physp_t * p_physp,
-					   IN const ib_port_info_t * p_pi)
-{
-	CL_ASSERT(p_pi);
-	CL_ASSERT(osm_physp_is_valid(p_physp));
-
-	if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) {
-		/* If PortState is down, only copy PortState */
-		/* and PortPhysicalState per C14-24-2.1 */
-		ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN);
-		ib_port_info_set_port_phys_state
-		    (ib_port_info_get_port_phys_state(p_pi),
-		     &p_physp->port_info);
-	} else
-		p_physp->port_info = *p_pi;
-}
+void osm_physp_set_port_info(IN osm_physp_t * p_physp,
+			     IN const ib_port_info_t * p_pi,
+			     IN const struct osm_sm * p_sm);
 
 /*
 * PARAMETERS
@@ -456,6 +444,9 @@ static inline void osm_physp_set_port_info(IN osm_physp_t * p_physp,
 *	p_pi
 *		[in] Pointer to the IBA defined PortInfo at this port number.
 *
+*	p_sm
+*		[in] Pointer to an osm_sm_t object.
+*
 * RETURN VALUES
 *	This function does not return a value.
 *
diff --git a/include/opensm/osm_subnet.h b/include/opensm/osm_subnet.h
index d88f9c7..f8a1f3f 100644
--- a/include/opensm/osm_subnet.h
+++ b/include/opensm/osm_subnet.h
@@ -54,6 +54,7 @@
 #include <complib/cl_list.h>
 #include <opensm/osm_base.h>
 #include <opensm/osm_prefix_route.h>
+#include <opensm/osm_db.h>
 #include <stdio.h>
 
 #ifdef __cplusplus
@@ -586,6 +587,7 @@ typedef struct osm_subn {
 	boolean_t sweeping_enabled;
 	unsigned need_update;
 	cl_fmap_t mgrp_mgid_tbl;
+	osm_db_domain_t *p_g2m;
 	void *mboxes[IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO + 1];
 } osm_subn_t;
 /*
diff --git a/opensm/osm_db_pack.c b/opensm/osm_db_pack.c
index 9122ac4..3fea4a2 100644
--- a/opensm/osm_db_pack.c
+++ b/opensm/osm_db_pack.c
@@ -83,6 +83,17 @@ static inline int unpack_lids(IN char *p_lid_str, OUT uint16_t * p_min_lid,
 	return 0;
 }
 
+static inline void pack_mkey(uint64_t mkey, char *p_mkey_str)
+{
+	sprintf(p_mkey_str, "0x%016" PRIx64, mkey);
+}
+
+static inline uint64_t unpack_mkey(char *p_mkey_str)
+{
+	return strtoull(p_mkey_str, NULL, 0);
+}
+
+
 int osm_db_guid2lid_guids(IN osm_db_domain_t * p_g2l,
 			  OUT cl_qlist_t * p_guid_list)
 {
@@ -149,3 +160,65 @@ int osm_db_guid2lid_delete(IN osm_db_domain_t * p_g2l, IN uint64_t guid)
 	pack_guid(guid, guid_str);
 	return osm_db_delete(p_g2l, guid_str);
 }
+
+int osm_db_guid2mkey_guids(IN osm_db_domain_t * p_g2m,
+			   OUT cl_qlist_t * p_guid_list)
+{
+	char *p_key;
+	cl_list_t keys;
+	osm_db_guid_elem_t *p_guid_elem;
+
+	cl_list_construct(&keys);
+	cl_list_init(&keys, 10);
+
+	if (osm_db_keys(p_g2m, &keys))
+		return 1;
+
+	while ((p_key = cl_list_remove_head(&keys)) != NULL) {
+		p_guid_elem =
+		    (osm_db_guid_elem_t *) malloc(sizeof(osm_db_guid_elem_t));
+		CL_ASSERT(p_guid_elem != NULL);
+
+		p_guid_elem->guid = unpack_guid(p_key);
+		cl_qlist_insert_head(p_guid_list, &p_guid_elem->item);
+	}
+
+	cl_list_destroy(&keys);
+	return 0;
+}
+
+int osm_db_guid2mkey_get(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 OUT uint64_t * p_mkey)
+{
+	char guid_str[20];
+	char *p_mkey_str;
+
+	pack_guid(guid, guid_str);
+	p_mkey_str = osm_db_lookup(p_g2m, guid_str);
+	if (!p_mkey_str)
+		return 1;
+
+	if (p_mkey)
+		*p_mkey = unpack_mkey(p_mkey_str);
+
+	return 0;
+}
+
+int osm_db_guid2mkey_set(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 IN uint64_t mkey)
+{
+	char guid_str[20];
+	char mkey_str[20];
+
+	pack_guid(guid, guid_str);
+	pack_mkey(mkey, mkey_str);
+
+	return osm_db_update(p_g2m, guid_str, mkey_str);
+}
+
+int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid)
+{
+	char guid_str[20];
+	pack_guid(guid, guid_str);
+	return osm_db_delete(p_g2m, guid_str);
+}
diff --git a/opensm/osm_lid_mgr.c b/opensm/osm_lid_mgr.c
index a7613e2..b169f6b 100644
--- a/opensm/osm_lid_mgr.c
+++ b/opensm/osm_lid_mgr.c
@@ -798,6 +798,7 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 	uint8_t op_vls;
 	uint8_t port_num;
 	boolean_t send_set = FALSE;
+	boolean_t update_mkey = FALSE;
 	int ret = 0;
 
 	OSM_LOG_ENTER(p_mgr->p_log);
@@ -860,8 +861,10 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 		send_set = TRUE;
 
 	p_pi->m_key = p_mgr->p_subn->opt.m_key;
-	if (memcmp(&p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key)))
+	if (memcmp(&p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key))) {
+		update_mkey = TRUE;
 		send_set = TRUE;
+	}
 
 	p_pi->subnet_prefix = p_mgr->p_subn->opt.subnet_prefix;
 	if (memcmp(&p_pi->subnet_prefix, &p_old_pi->subnet_prefix,
@@ -1051,6 +1054,13 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 			     CL_DISP_MSGID_NONE, &context);
 	if (status != IB_SUCCESS)
 		ret = -1;
+	/* If we sent a new mkey above, update our guid2mkey map
+	   now, on the assumption that the SubnSet succeeds
+	*/
+	if (update_mkey)
+		osm_db_guid2mkey_set(p_mgr->p_subn->p_g2m,
+				     cl_ntoh64(p_physp->port_guid),
+				     cl_ntoh64(p_pi->m_key));
 
 Exit:
 	OSM_LOG_EXIT(p_mgr->p_log);
diff --git a/opensm/osm_link_mgr.c b/opensm/osm_link_mgr.c
index 8fcd0da..98eceaf 100644
--- a/opensm/osm_link_mgr.c
+++ b/opensm/osm_link_mgr.c
@@ -54,6 +54,7 @@
 #include <opensm/osm_helper.h>
 #include <opensm/osm_msgdef.h>
 #include <opensm/osm_opensm.h>
+#include <opensm/osm_db_pack.h>
 
 static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 {
@@ -102,6 +103,7 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 	int qdr_change = 0, fdr10_change = 0;
 	int ret = 0;
 	ib_net32_t attr_mod, cap_mask;
+	boolean_t update_mkey = FALSE;
 
 	OSM_LOG_ENTER(sm->p_log);
 
@@ -192,8 +194,10 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 		    port_num == 0) {
 			p_pi->m_key = sm->p_subn->opt.m_key;
 			if (memcmp(&p_pi->m_key, &p_old_pi->m_key,
-				   sizeof(p_pi->m_key)))
+				   sizeof(p_pi->m_key))) {
+				update_mkey = TRUE;
 				send_set = TRUE;
+			}
 
 			p_pi->subnet_prefix = sm->p_subn->opt.subnet_prefix;
 			if (memcmp(&p_pi->subnet_prefix,
@@ -464,6 +468,14 @@ Send:
 	if (status)
 		ret = -1;
 
+	/* If we sent a new mkey above, update our guid2mkey map
+	   now, on the assumption that the SubnSet succeeds
+	 */
+	if (update_mkey)
+		osm_db_guid2mkey_set(sm->p_subn->p_g2m,
+				     cl_ntoh64(p_physp->port_guid),
+				     cl_ntoh64(p_pi->m_key));
+
 	if (send_set2) {
 		status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p_physp),
 				     payload2, sizeof(payload2),
diff --git a/opensm/osm_opensm.c b/opensm/osm_opensm.c
index 379e269..f518591 100644
--- a/opensm/osm_opensm.c
+++ b/opensm/osm_opensm.c
@@ -410,6 +410,11 @@ ib_api_status_t osm_opensm_init(IN osm_opensm_t * p_osm,
 	if (status != IB_SUCCESS)
 		goto Exit;
 
+	/* the DB is in use by subn so init before */
+	status = osm_db_init(&p_osm->db, &p_osm->log);
+	if (status != IB_SUCCESS)
+		goto Exit;
+
 	status = osm_subn_init(&p_osm->subn, p_osm, p_opt);
 	if (status != IB_SUCCESS)
 		goto Exit;
@@ -432,11 +437,6 @@ ib_api_status_t osm_opensm_init(IN osm_opensm_t * p_osm,
 	if (status != IB_SUCCESS)
 		goto Exit;
 
-	/* the DB is in use by the SM and SA so init before */
-	status = osm_db_init(&p_osm->db, &p_osm->log);
-	if (status != IB_SUCCESS)
-		goto Exit;
-
 	status = osm_sm_init(&p_osm->sm, &p_osm->subn, &p_osm->db,
 			     p_osm->p_vendor, &p_osm->mad_pool, &p_osm->vl15,
 			     &p_osm->log, &p_osm->stats, &p_osm->disp,
diff --git a/opensm/osm_port.c b/opensm/osm_port.c
index 5438c2c..91ed3a3 100644
--- a/opensm/osm_port.c
+++ b/opensm/osm_port.c
@@ -52,6 +52,8 @@
 #include <opensm/osm_node.h>
 #include <opensm/osm_madw.h>
 #include <opensm/osm_switch.h>
+#include <opensm/osm_db_pack.h>
+#include <opensm/osm_sm.h>
 
 void osm_physp_construct(IN osm_physp_t * p_physp)
 {
@@ -657,3 +659,33 @@ void osm_alias_guid_delete(IN OUT osm_alias_guid_t ** pp_alias_guid)
 	free(*pp_alias_guid);
 	*pp_alias_guid = NULL;
 }
+
+void osm_physp_set_port_info(IN osm_physp_t * p_physp,
+			     IN const ib_port_info_t * p_pi,
+			     IN const struct osm_sm * p_sm)
+{
+	CL_ASSERT(p_pi);
+	CL_ASSERT(osm_physp_is_valid(p_physp));
+
+	if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) {
+		/* If PortState is down, only copy PortState */
+		/* and PortPhysicalState per C14-24-2.1 */
+		ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN);
+		ib_port_info_set_port_phys_state
+		    (ib_port_info_get_port_phys_state(p_pi),
+		     &p_physp->port_info);
+	} else {
+		p_physp->port_info = *p_pi;
+
+		/* The MKey in p_pi can only be considered valid if it's
+		 * for a HCA/router or switch port 0, and it's either
+		 * non-zero or the MKeyProtect bits are also zero.
+		 */
+		if ((osm_node_get_type(p_physp->p_node) != IB_NODE_TYPE_SWITCH ||
+		     p_physp->port_num == 0) &&
+		    (p_pi->m_key != 0 || ib_port_info_get_mpb(p_pi) == 0))
+			osm_db_guid2mkey_set(p_sm->p_subn->p_g2m,
+					     cl_ntoh64(p_physp->port_guid),
+					     cl_ntoh64(p_pi->m_key));
+	}
+}
diff --git a/opensm/osm_port_info_rcv.c b/opensm/osm_port_info_rcv.c
index a4f81a8..7afa26b 100644
--- a/opensm/osm_port_info_rcv.c
+++ b/opensm/osm_port_info_rcv.c
@@ -310,7 +310,7 @@ static void pi_rcv_process_switch_port(IN osm_sm_t * sm, IN osm_node_t * p_node,
 	/*
 	   Update the PortInfo attribute.
 	 */
-	osm_physp_set_port_info(p_physp, p_pi);
+	osm_physp_set_port_info(p_physp, p_pi, sm);
 
 	if (port_num == 0) {
 		/* Determine if base switch port 0 */
@@ -335,7 +335,7 @@ static void pi_rcv_process_ca_or_router_port(IN osm_sm_t * sm,
 
 	pi_rcv_check_and_fix_lid(sm->p_log, p_pi, p_physp);
 
-	osm_physp_set_port_info(p_physp, p_pi);
+	osm_physp_set_port_info(p_physp, p_pi, sm);
 
 	pi_rcv_process_endport(sm, p_physp, p_pi);
 
@@ -473,7 +473,7 @@ static void pi_rcv_process_set(IN osm_sm_t * sm, IN osm_node_t * p_node,
 		cl_ntoh64(osm_node_get_node_guid(p_node)),
 		cl_ntoh64(p_smp->trans_id));
 
-	osm_physp_set_port_info(p_physp, p_pi);
+	osm_physp_set_port_info(p_physp, p_pi, sm);
 
 	OSM_LOG_EXIT(sm->p_log);
 }
diff --git a/opensm/osm_req.c b/opensm/osm_req.c
index 7e9e664..640334a 100644
--- a/opensm/osm_req.c
+++ b/opensm/osm_req.c
@@ -56,6 +56,7 @@
 #include <opensm/osm_vl15intf.h>
 #include <opensm/osm_msgdef.h>
 #include <opensm/osm_opensm.h>
+#include <opensm/osm_db_pack.h>
 
 /**********************************************************************
   The plock MAY or MAY NOT be held before calling this function.
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index 2629fc5..39fb15a 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -64,6 +64,7 @@
 #include <vendor/osm_vendor_api.h>
 #include <opensm/osm_inform.h>
 #include <opensm/osm_opensm.h>
+#include <opensm/osm_db.h>
 
 extern void osm_drop_mgr_process(IN osm_sm_t * sm);
 extern int osm_qos_setup(IN osm_opensm_t * p_osm);
@@ -1444,6 +1445,9 @@ repeat_discovery:
 	if (sm->p_subn->force_heavy_sweep
 	    || sm->p_subn->subnet_initialization_error)
 		osm_sm_signal(sm, OSM_SIGNAL_SWEEP);
+
+	/* Write a new copy of our persistent guid2mkey database */
+	osm_db_store(sm->p_subn->p_g2m);
 }
 
 static void do_process_mgrp_queue(osm_sm_t * sm)
diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 04c7b18..200287a 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -73,6 +73,8 @@
 #include <opensm/osm_event_plugin.h>
 #include <opensm/osm_qos_policy.h>
 #include <opensm/osm_service.h>
+#include <opensm/osm_db.h>
+#include <opensm/osm_db_pack.h>
 
 static const char null_str[] = "(null)";
 
@@ -422,6 +424,52 @@ static int compar_mgids(const void *m1, const void *m2)
 	return memcmp(m1, m2, sizeof(ib_gid_t));
 }
 
+static void subn_validate_g2m(osm_subn_t *p_subn)
+{
+	cl_qlist_t guids;
+	osm_db_guid_elem_t *p_item;
+	uint64_t mkey;
+	boolean_t valid_entry;
+
+	OSM_LOG_ENTER(&(p_subn->p_osm->log));
+	cl_qlist_init(&guids);
+
+	if (osm_db_guid2mkey_guids(p_subn->p_g2m, &guids)) {
+		OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR, "ERR 7506: "
+			"could not get mkey guid list\n");
+		goto Exit;
+	}
+
+	while ((p_item = (osm_db_guid_elem_t *) cl_qlist_remove_head(&guids))
+	       != (osm_db_guid_elem_t *) cl_qlist_end(&guids)) {
+		valid_entry = TRUE;
+
+		if (p_item->guid == 0) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7507: found invalid zero guid\n");
+			valid_entry = FALSE;
+		} else if (osm_db_guid2mkey_get(p_subn->p_g2m, p_item->guid,
+						&mkey)) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7508: could not get mkey for guid:0x%016"
+				PRIx64 "\n", p_item->guid);
+			valid_entry = FALSE;
+		}
+
+		if (valid_entry == FALSE) {
+			if (osm_db_guid2mkey_delete(p_subn->p_g2m,
+						    p_item->guid))
+				OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+					"ERR 7509: failed to delete entry for "
+					"guid:0x%016" PRIx64 "\n",
+					p_item->guid);
+		}
+	}
+
+Exit:
+	OSM_LOG_EXIT(&(p_subn->p_osm->log));
+}
+
 void osm_subn_construct(IN osm_subn_t * p_subn)
 {
 	memset(p_subn, 0, sizeof(*p_subn));
@@ -628,6 +676,35 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN osm_opensm_t * p_osm,
 	p_subn->sweeping_enabled = TRUE;
 	p_subn->last_sm_port_state = 1;
 
+	/* Initialize the guid2mkey database */
+	p_subn->p_g2m = osm_db_domain_init(&(p_osm->db), "guid2mkey");
+	if (!p_subn->p_g2m) {
+		OSM_LOG(&(p_osm->log), OSM_LOG_ERROR, "ERR 7510: "
+			"Error initializing Guid-to-MKey persistent database\n");
+		return IB_ERROR;
+	}
+
+	if (osm_db_restore(p_subn->p_g2m)) {
+#ifndef __WIN__
+		/*
+		 * When Windows is BSODing, it might corrupt files that
+		 * were previously opened for writing, even if the files
+		 * are closed, so we might see corrupted guid2mkey file.
+		 */
+		if (p_subn->opt.exit_on_fatal) {
+			osm_log(&(p_osm->log), OSM_LOG_SYS,
+				"FATAL: Error restoring Guid-to-Mkey "
+				"persistent database\n");
+			return IB_ERROR;
+		} else
+#endif
+			OSM_LOG(&(p_osm->log), OSM_LOG_ERROR,
+				"ERR 7511: Error restoring Guid-to-Mkey "
+				"persistent database\n");
+	}
+
+	subn_validate_g2m(p_subn);
+
 	return IB_SUCCESS;
 }
 
-- 
1.7.9.2

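The validity rule in osm_physp_set_port_info() above can be restated as a standalone predicate. This is a sketch with hypothetical names; the node type values follow the IBA NodeInfo encoding (1 = CA, 2 = switch, 3 = router). Presumably a zero M_Key is not trusted at a non-zero protection level because a protected port may report a zeroed M_Key to a requester whose key did not match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Node types as encoded in IBA NodeInfo (hypothetical enum names). */
enum node_type { NODE_CA = 1, NODE_SWITCH = 2, NODE_ROUTER = 3 };

/* True if the M_Key reported in a PortInfo response may be recorded in
 * the guid2mkey cache: the port must be a CA/router port or switch port 0,
 * and the key must be non-zero unless MKeyProtectBits (mpb) is 0. */
static bool pi_mkey_is_trustworthy(enum node_type type, uint8_t port_num,
				   uint64_t m_key, uint8_t mpb)
{
	assert(mpb <= 3); /* MKeyProtectBits is a 2-bit field */

	if (type == NODE_SWITCH && port_num != 0)
		return false;
	return m_key != 0 || mpb == 0;
}
```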

* [PATCH 2/8] opensm: Allow recovery of subnets with misset mkeys
@ 2012-06-26  0:54       ` Jim Foraker
  2012-06-26  0:54       ` [PATCH 3/8] Add locking where necessary around osm_req_* Jim Foraker
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-06-26  0:54 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, alexne-VPRAkNaXOzVWk0Htik3J/w
  Cc: weiny2-i2BcT+NCU+M, Jim Foraker

Allow initialization of endpoints that already have an mkey
configured which differs from the one listed in the configuration
file.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 opensm/osm_req.c |  103 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 95 insertions(+), 8 deletions(-)

diff --git a/opensm/osm_req.c b/opensm/osm_req.c
index 640334a..ee13517 100644
--- a/opensm/osm_req.c
+++ b/opensm/osm_req.c
@@ -59,7 +59,88 @@
 #include <opensm/osm_db_pack.h>
 
 /**********************************************************************
-  The plock MAY or MAY NOT be held before calling this function.
+  The plock must be held before calling this function.
+**********************************************************************/
+static ib_net64_t req_determine_mkey(IN osm_sm_t * sm,
+				     IN const osm_dr_path_t * p_path)
+{
+	osm_node_t *p_node;
+	osm_port_t *p_sm_port;
+	osm_physp_t *p_physp;
+	ib_net64_t dest_port_guid = 0, m_key;
+	uint8_t hop;
+
+	OSM_LOG_ENTER(sm->p_log);
+
+	p_physp = NULL;
+
+	p_sm_port = osm_get_port_by_guid(sm->p_subn, sm->p_subn->sm_port_guid);
+
+	/* hop_count == 0: destination port guid is SM */
+	if (p_path->hop_count == 0) {
+		if (p_sm_port != NULL)
+			dest_port_guid = sm->p_subn->sm_port_guid;
+		else
+			dest_port_guid = sm->p_subn->opt.guid;
+		goto Remote_Guid;
+	}
+
+	if (p_sm_port) {
+		p_physp = p_sm_port->p_physp;
+	}
+
+	/* hop_count == 1: outgoing physp is SM physp */
+	for (hop = 2; p_physp && hop <= p_path->hop_count; hop++) {
+		p_physp = p_physp->p_remote_physp;
+		if (!p_physp) {
+			break;
+		}
+		p_node = p_physp->p_node;
+		p_physp = osm_node_get_physp_ptr(p_node, p_path->path[hop]);
+	}
+
+	/* At this point, p_physp points at the outgoing physp on the
+	   last hop, or NULL if we don't know it.
+	*/
+	if (!p_physp) {
+		OSM_LOG(sm->p_log, OSM_LOG_ERROR,
+			"req_determine_mkey: ERR 1107: "
+			"Outgoing physp is null on non-hop_0!\n");
+		dest_port_guid = 0;
+		goto Remote_Guid;
+	}
+
+	if (p_physp->p_remote_physp) {
+		dest_port_guid = p_physp->p_remote_physp->port_guid;
+		goto Remote_Guid;
+	}
+
+Remote_Guid:
+	if (dest_port_guid) {
+		if (!osm_db_guid2mkey_get(sm->p_subn->p_g2m,
+					  cl_ntoh64(dest_port_guid), &m_key)) {
+			m_key = cl_hton64(m_key);
+			OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
+				"Found mkey for guid 0x%"
+				PRIx64 "\n", cl_ntoh64(dest_port_guid));
+		} else {
+			OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
+				"Target port mkey unknown, using default\n");
+			m_key = sm->p_subn->opt.m_key;
+		}
+	} else {
+		OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
+			"Target port guid unknown, using default\n");
+		m_key = sm->p_subn->opt.m_key;
+	}
+
+	OSM_LOG_EXIT(sm->p_log);
+
+	return m_key;
+}
+
+/**********************************************************************
+  The plock must be held before calling this function.
 **********************************************************************/
 ib_api_status_t osm_req_get(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 			    IN ib_net16_t attr_id, IN ib_net32_t attr_mod,
@@ -69,6 +150,7 @@ ib_api_status_t osm_req_get(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 	osm_madw_t *p_madw;
 	ib_api_status_t status = IB_SUCCESS;
 	ib_net64_t tid;
+	ib_net64_t m_key;
 
 	CL_ASSERT(sm);
 
@@ -93,15 +175,17 @@ ib_api_status_t osm_req_get(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 	}
 
 	tid = cl_hton64((uint64_t) cl_atomic_inc(&sm->sm_trans_id));
+	m_key = req_determine_mkey(sm, p_path);
 
 	OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
-		"Getting %s (0x%X), modifier 0x%X, TID 0x%" PRIx64 "\n",
+		"Getting %s (0x%X), modifier 0x%X, TID 0x%" PRIx64
+		", MKey 0x%016" PRIx64 "\n",
 		ib_get_sm_attr_str(attr_id), cl_ntoh16(attr_id),
-		cl_ntoh32(attr_mod), cl_ntoh64(tid));
+		cl_ntoh32(attr_mod), cl_ntoh64(tid), cl_ntoh64(m_key));
 
 	ib_smp_init_new(osm_madw_get_smp_ptr(p_madw), IB_MAD_METHOD_GET,
 			tid, attr_id, attr_mod, p_path->hop_count,
-			sm->p_subn->opt.m_key, p_path->path,
+			m_key, p_path->path,
 			IB_LID_PERMISSIVE, IB_LID_PERMISSIVE);
 
 	p_madw->mad_addr.dest_lid = IB_LID_PERMISSIVE;
@@ -126,7 +210,7 @@ Exit:
 }
 
 /**********************************************************************
-  The plock MAY or MAY NOT be held before calling this function.
+  The plock must be held before calling this function.
 **********************************************************************/
 ib_api_status_t osm_req_set(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 			    IN const uint8_t * p_payload,
@@ -138,6 +222,7 @@ ib_api_status_t osm_req_set(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 	osm_madw_t *p_madw;
 	ib_api_status_t status = IB_SUCCESS;
 	ib_net64_t tid;
+	ib_net64_t m_key;
 
 	CL_ASSERT(sm);
 
@@ -163,15 +248,17 @@ ib_api_status_t osm_req_set(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 	}
 
 	tid = cl_hton64((uint64_t) cl_atomic_inc(&sm->sm_trans_id));
+	m_key = req_determine_mkey(sm, p_path);
 
 	OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
-		"Setting %s (0x%X), modifier 0x%X, TID 0x%" PRIx64 "\n",
+		"Setting %s (0x%X), modifier 0x%X, TID 0x%" PRIx64
+		", MKey 0x%016" PRIx64 "\n",
 		ib_get_sm_attr_str(attr_id), cl_ntoh16(attr_id),
-		cl_ntoh32(attr_mod), cl_ntoh64(tid));
+		cl_ntoh32(attr_mod), cl_ntoh64(tid), cl_ntoh64(m_key));
 
 	ib_smp_init_new(osm_madw_get_smp_ptr(p_madw), IB_MAD_METHOD_SET,
 			tid, attr_id, attr_mod, p_path->hop_count,
-			sm->p_subn->opt.m_key, p_path->path,
+			m_key, p_path->path,
 			IB_LID_PERMISSIVE, IB_LID_PERMISSIVE);
 
 	p_madw->mad_addr.dest_lid = IB_LID_PERMISSIVE;
-- 
1.7.9.2

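The path walk in req_determine_mkey() can be modeled on a toy topology. This sketch uses hypothetical port/node structs and MAX_PORTS in place of osm_physp_t/osm_node_t, and, unlike the patch, does not fall back to opt.guid when the SM port is not yet known. It returns 0 on every path where the destination is unknown, including when the last outgoing port has no known peer, so the caller can fall back to the default mkey.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 4 /* hypothetical; real nodes can have many more ports */

struct node;

/* Hypothetical stand-in for osm_physp_t. */
struct port {
	uint64_t guid;
	struct node *node;
	struct port *remote; /* peer across the link, or NULL if unknown */
};

/* Hypothetical stand-in for osm_node_t: ports indexed by port number. */
struct node {
	struct port *ports[MAX_PORTS];
};

/* Return the guid of the port a DR SMP along path[] would reach, or 0 if
 * the discovered topology does not extend that far.  As in the patch,
 * the SM's own port is the outgoing port for hop 1. */
static uint64_t dr_dest_guid(struct port *sm_port, const uint8_t *path,
			     uint8_t hop_count)
{
	struct port *p = sm_port;
	unsigned int hop;

	if (hop_count == 0)
		return sm_port ? sm_port->guid : 0;

	for (hop = 2; p && hop <= hop_count; hop++) {
		p = p->remote; /* cross the link to the next node */
		if (!p)
			break;
		assert(path[hop] < MAX_PORTS);
		p = p->node->ports[path[hop]]; /* outgoing port on that node */
	}
	return (p && p->remote) ? p->remote->guid : 0;
}
```

Using an unsigned int loop counter avoids wrap-around if hop_count were ever 255.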

* [PATCH 3/8] Add locking where necessary around osm_req_*
  2012-06-26  0:54       ` [PATCH 2/8] opensm: Allow recovery of subnets with misset mkeys Jim Foraker
@ 2012-06-26  0:54       ` Jim Foraker
  2012-06-26  0:55       ` [PATCH 4/8] Add support for setting mkey protection levels Jim Foraker
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-06-26  0:54 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, alexne-VPRAkNaXOzVWk0Htik3J/w
  Cc: weiny2-i2BcT+NCU+M, Jim Foraker

Acquire plock for reading around the osm_req_get/osm_req_set calls
that were not already made with it held.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 opensm/osm_perfmgr.c      |    6 ++++++
 opensm/osm_sm_state_mgr.c |    2 ++
 opensm/osm_state_mgr.c    |    8 ++++++++
 opensm/osm_trap_rcv.c     |    6 +++++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/opensm/osm_perfmgr.c b/opensm/osm_perfmgr.c
index 41ef08b..1b4b4cf 100644
--- a/opensm/osm_perfmgr.c
+++ b/opensm/osm_perfmgr.c
@@ -620,8 +620,10 @@ static int sweep_hop_1(osm_sm_t * sm)
 		path_array[1] = port_num;
 
 		osm_dr_path_init(&hop_1_path, h_bind, 1, path_array);
+		CL_PLOCK_ACQUIRE(sm->p_lock);
 		status = osm_req_get(sm, &hop_1_path, IB_MAD_ATTR_NODE_INFO, 0,
 				     CL_DISP_MSGID_NONE, &context);
+		CL_PLOCK_RELEASE(sm->p_lock);
 
 		if (status != IB_SUCCESS)
 			OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 4C82: "
@@ -652,9 +654,11 @@ static int sweep_hop_1(osm_sm_t * sm)
 			path_array[1] = port_num;
 
 			osm_dr_path_init(&hop_1_path, h_bind, 1, path_array);
+			CL_PLOCK_ACQUIRE(sm->p_lock);
 			status = osm_req_get(sm, &hop_1_path,
 					     IB_MAD_ATTR_NODE_INFO, 0,
 					     CL_DISP_MSGID_NONE, &context);
+			CL_PLOCK_RELEASE(sm->p_lock);
 
 			if (status != IB_SUCCESS)
 				OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 4C84: "
@@ -714,8 +718,10 @@ static int sweep_hop_0(osm_sm_t * sm)
 	}
 
 	osm_dr_path_init(&dr_path, h_bind, 0, path_array);
+	CL_PLOCK_ACQUIRE(sm->p_lock);
 	status = osm_req_get(sm, &dr_path, IB_MAD_ATTR_NODE_INFO, 0,
 			     CL_DISP_MSGID_NONE, NULL);
+	CL_PLOCK_RELEASE(sm->p_lock);
 
 	if (status != IB_SUCCESS)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR,
diff --git a/opensm/osm_sm_state_mgr.c b/opensm/osm_sm_state_mgr.c
index ac895fa..602a393 100644
--- a/opensm/osm_sm_state_mgr.c
+++ b/opensm/osm_sm_state_mgr.c
@@ -107,9 +107,11 @@ static void sm_state_mgr_send_master_sm_info_req(osm_sm_t * sm)
 	context.smi_context.port_guid = p_port->guid;
 	context.smi_context.set_method = FALSE;
 
+	CL_PLOCK_ACQUIRE(sm->p_lock);
 	status = osm_req_get(sm, osm_physp_get_dr_path_ptr(p_port->p_physp),
 			     IB_MAD_ATTR_SM_INFO, 0, CL_DISP_MSGID_NONE,
 			     &context);
+	CL_PLOCK_RELEASE(sm->p_lock);
 
 	if (status != IB_SUCCESS)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3204: "
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index 39fb15a..4d5da46 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -237,8 +237,10 @@ static ib_api_status_t state_mgr_sweep_hop_0(IN osm_sm_t * sm)
 		CL_PLOCK_RELEASE(sm->p_lock);
 
 		osm_dr_path_init(&dr_path, h_bind, 0, path_array);
+		CL_PLOCK_ACQUIRE(sm->p_lock);
 		status = osm_req_get(sm, &dr_path, IB_MAD_ATTR_NODE_INFO, 0,
 				     CL_DISP_MSGID_NONE, NULL);
+		CL_PLOCK_RELEASE(sm->p_lock);
 		if (status != IB_SUCCESS)
 			OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3305: "
 				"Request for NodeInfo failed (%s)\n",
@@ -442,8 +444,10 @@ static ib_api_status_t state_mgr_sweep_hop_1(IN osm_sm_t * sm)
 		path_array[1] = port_num;
 
 		osm_dr_path_init(&hop_1_path, h_bind, 1, path_array);
+		CL_PLOCK_ACQUIRE(sm->p_lock);
 		status = osm_req_get(sm, &hop_1_path, IB_MAD_ATTR_NODE_INFO, 0,
 				     CL_DISP_MSGID_NONE, &context);
+		CL_PLOCK_RELEASE(sm->p_lock);
 		if (status != IB_SUCCESS)
 			OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3311: "
 				"Request for NodeInfo failed (%s)\n",
@@ -472,10 +476,12 @@ static ib_api_status_t state_mgr_sweep_hop_1(IN osm_sm_t * sm)
 				path_array[1] = port_num;
 				osm_dr_path_init(&hop_1_path, h_bind, 1,
 						 path_array);
+				CL_PLOCK_ACQUIRE(sm->p_lock);
 				status = osm_req_get(sm, &hop_1_path,
 						     IB_MAD_ATTR_NODE_INFO, 0,
 						     CL_DISP_MSGID_NONE,
 						     &context);
+				CL_PLOCK_RELEASE(sm->p_lock);
 				if (status != IB_SUCCESS)
 					OSM_LOG(sm->p_log, OSM_LOG_ERROR,
 						"ERR 3312: "
@@ -822,10 +828,12 @@ static void state_mgr_send_handover(IN osm_sm_t * sm, IN osm_remote_sm_t * p_sm)
 		p_smi->sm_key = 0;
 	}
 
+	CL_PLOCK_ACQUIRE(sm->p_lock);
 	status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p_port->p_physp),
 			     payload, sizeof(payload), IB_MAD_ATTR_SM_INFO,
 			     IB_SMINFO_ATTR_MOD_HANDOVER, CL_DISP_MSGID_NONE,
 			     &context);
+	CL_PLOCK_RELEASE(sm->p_lock);
 
 	if (status != IB_SUCCESS)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3317: "
diff --git a/opensm/osm_trap_rcv.c b/opensm/osm_trap_rcv.c
index 2aaaba4..5419c09 100644
--- a/opensm/osm_trap_rcv.c
+++ b/opensm/osm_trap_rcv.c
@@ -211,6 +211,7 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p)
 	uint8_t payload[IB_SMP_DATA_SIZE];
 	osm_madw_context_t context;
 	ib_port_info_t *pi = (ib_port_info_t *)payload;
+	ib_api_status_t status;
 
 	/* select the nearest port to master opensm */
 	if (p->p_remote_physp &&
@@ -233,10 +234,13 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p)
 	context.pi_context.light_sweep = FALSE;
 	context.pi_context.active_transition = FALSE;
 
-	return osm_req_set(sm, osm_physp_get_dr_path_ptr(p),
+	CL_PLOCK_ACQUIRE(sm->p_lock);
+	status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p),
 			   payload, sizeof(payload), IB_MAD_ATTR_PORT_INFO,
 			   cl_hton32(osm_physp_get_port_num(p)),
 			   CL_DISP_MSGID_NONE, &context);
+	CL_PLOCK_RELEASE(sm->p_lock);
+	return status;
 }
 
 static void log_trap_info(osm_log_t *p_log, ib_mad_notice_attr_t *p_ntci,
-- 
1.7.9.2
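Patch 3 wraps each osm_req_get()/osm_req_set() call in CL_PLOCK_ACQUIRE()/CL_PLOCK_RELEASE(). This is presumably because osm_req_* now looks up subnet state (physp pointers and the guid2mkey cache) to choose an M_Key. The one structural change is in disable_port(). There, `return osm_req_set(...)` has to become a store into a local `status` so the lock can be released before returning. Below is a minimal sketch of that shape. It assumes a hypothetical counting lock (`plock_t`, `plock_acquire`, `plock_release`) and a hypothetical `send_request()` in place of the real complib and OpenSM calls:

```c
#include <assert.h>

/* Hypothetical stand-in for the complib passive lock: it only counts
 * holders, so the example can check the lock is held during the call. */
typedef struct {
	int holders;
} plock_t;

static void plock_acquire(plock_t *lock)
{
	lock->holders++;
}

static void plock_release(plock_t *lock)
{
	assert(lock->holders > 0);
	lock->holders--;
}

typedef enum { REQ_SUCCESS = 0, REQ_ERROR = 1 } req_status_t;

/* Hypothetical request function that, like osm_req_set() after this
 * series, expects the caller to hold the lock. */
static req_status_t send_request(plock_t *lock, int port_num)
{
	assert(lock->holders > 0);
	return port_num > 0 ? REQ_SUCCESS : REQ_ERROR;
}

/* Shape of disable_port() after the patch: capture the status while
 * the lock is held, release it, then return the captured value. */
static req_status_t locked_send(plock_t *lock, int port_num)
{
	req_status_t status;

	plock_acquire(lock);
	status = send_request(lock, port_num);
	plock_release(lock);
	return status;
}
```

Writing `return send_request(...)` between the acquire and release would return with the lock still held. The temporary exists only to avoid that.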

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/8] Add support for setting mkey protection levels
       [not found]     ` <1340672104-18039-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
  2012-06-26  0:54       ` [PATCH 2/8] opensm: Allow recovery of subnets with misset mkeys Jim Foraker
  2012-06-26  0:54       ` [PATCH 3/8] Add locking where necessary around osm_req_* Jim Foraker
@ 2012-06-26  0:55       ` Jim Foraker
  2012-06-26  0:55       ` [PATCH 5/8] opensm: Signal subnet init errors on SubnGet timeouts Jim Foraker
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-06-26  0:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, alexne-VPRAkNaXOzVWk0Htik3J/w
  Cc: weiny2-i2BcT+NCU+M, Jim Foraker

The M_Key Protection Level for the subnet may now be set
in the config file by specifying a numeric value for
m_key_protection_level (defaults to 0).
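In PortInfo, M_KeyProtectBits and LMC share a single byte (`mkey_lmc`). The protection bits occupy bits 7:6 and the LMC bits 2:0. That is why this patch replaces `p_pi->mkey_lmc = ...lmc`, which silently cleared the protection bits, with one reset of the byte followed by separate setter calls. The self-contained sketch below shows the packing. The helper names and masks are written here for illustration and are not copied from OpenSM's ib_types.h:

```c
#include <assert.h>
#include <stdint.h>

#define MPB_SHIFT 6
#define MPB_MASK  0xC0 /* bits 7:6 - M_KeyProtectBits */
#define LMC_MASK  0x07 /* bits 2:0 - LMC */

static uint8_t get_mpb(uint8_t mkey_lmc)
{
	return (uint8_t)((mkey_lmc & MPB_MASK) >> MPB_SHIFT);
}

static uint8_t get_lmc(uint8_t mkey_lmc)
{
	return (uint8_t)(mkey_lmc & LMC_MASK);
}

/* Replace only the protection bits, leaving the LMC untouched. */
static uint8_t set_mpb(uint8_t mkey_lmc, uint8_t mpb)
{
	assert(mpb <= 3);
	return (uint8_t)((mkey_lmc & ~MPB_MASK) |
			 ((mpb << MPB_SHIFT) & MPB_MASK));
}

/* Replace only the LMC, leaving the protection bits untouched. */
static uint8_t set_lmc(uint8_t mkey_lmc, uint8_t lmc)
{
	assert(lmc <= 7);
	return (uint8_t)((mkey_lmc & ~LMC_MASK) | (lmc & LMC_MASK));
}
```

With protection level 2 and LMC 3 the byte is 0x83. The removed lines wrote the LMC straight into the byte, which produced a value whose protection bits are always 0.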

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 include/opensm/osm_subnet.h |    1 +
 opensm/osm_lid_mgr.c        |   21 +++++++++++----------
 opensm/osm_link_mgr.c       |    2 +-
 opensm/osm_subnet.c         |    5 +++++
 4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/opensm/osm_subnet.h b/include/opensm/osm_subnet.h
index f8a1f3f..8e96314 100644
--- a/include/opensm/osm_subnet.h
+++ b/include/opensm/osm_subnet.h
@@ -161,6 +161,7 @@ typedef struct osm_subn_opt {
 	ib_net64_t sa_key;
 	ib_net64_t subnet_prefix;
 	ib_net16_t m_key_lease_period;
+	uint8_t m_key_protect_bits;
 	uint32_t sweep_interval;
 	uint32_t max_wire_smps;
 	uint32_t max_wire_smps2;
diff --git a/opensm/osm_lid_mgr.c b/opensm/osm_lid_mgr.c
index b169f6b..dc32c66 100644
--- a/opensm/osm_lid_mgr.c
+++ b/opensm/osm_lid_mgr.c
@@ -887,6 +887,11 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 		   sizeof(p_pi->m_key_lease_period)))
 		send_set = TRUE;
 
+	p_pi->mkey_lmc = 0;
+	ib_port_info_set_mpb(p_pi, p_mgr->p_subn->opt.m_key_protect_bits);
+	if (ib_port_info_get_mpb(p_pi) != ib_port_info_get_mpb(p_old_pi))
+		send_set = TRUE;
+
 	/*
 	   we want to set the timeout for both the switch port 0
 	   and the CA ports
@@ -908,12 +913,10 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 			   sizeof(p_pi->link_width_enabled)))
 			send_set = TRUE;
 
-		/* M_KeyProtectBits are currently always zero */
-		p_pi->mkey_lmc = p_mgr->p_subn->opt.lmc;
+		/* p_pi->mkey_lmc is initialized earlier */
+		ib_port_info_set_lmc(p_pi, p_mgr->p_subn->opt.lmc);
 		if (ib_port_info_get_lmc(p_pi) !=
-		    ib_port_info_get_lmc(p_old_pi) ||
-		    ib_port_info_get_mpb(p_pi) !=
-		    ib_port_info_get_mpb(p_old_pi))
+		    ib_port_info_get_lmc(p_old_pi))
 			send_set = TRUE;
 
 		/* calc new op_vls and mtu */
@@ -994,12 +997,10 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 
 		/* Determine if enhanced switch port 0 and if so set LMC */
 		if (osm_switch_sp0_is_lmc_capable(p_node->sw, p_mgr->p_subn)) {
-			/* M_KeyProtectBits are currently always zero */
-			p_pi->mkey_lmc = p_mgr->p_subn->opt.lmc;
+			/* p_pi->mkey_lmc is initialized earlier */
+			ib_port_info_set_lmc(p_pi, p_mgr->p_subn->opt.lmc);
 			if (ib_port_info_get_lmc(p_pi) !=
-			    ib_port_info_get_lmc(p_old_pi) ||
-			    ib_port_info_get_mpb(p_pi) !=
-			    ib_port_info_get_mpb(p_old_pi))
+			    ib_port_info_get_lmc(p_old_pi))
 				send_set = TRUE;
 		}
 	}
diff --git a/opensm/osm_link_mgr.c b/opensm/osm_link_mgr.c
index 98eceaf..02fde95 100644
--- a/opensm/osm_link_mgr.c
+++ b/opensm/osm_link_mgr.c
@@ -238,8 +238,8 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 				   sizeof(p_pi->m_key_lease_period)))
 				send_set = TRUE;
 
-			/* M_KeyProtectBits are currently always zero */
 			p_pi->mkey_lmc = 0;
+			ib_port_info_set_mpb(p_pi, sm->p_subn->opt.m_key_protect_bits);
 			if (esp0 == FALSE || sm->p_subn->opt.lmc_esp0)
 				ib_port_info_set_lmc(p_pi, sm->p_subn->opt.lmc);
 			if (ib_port_info_get_lmc(p_old_pi) !=
diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 200287a..4f5272f 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -300,6 +300,7 @@ static const opt_rec_t opt_tbl[] = {
 	{ "sa_key", OPT_OFFSET(sa_key), opts_parse_net64, NULL, 1 },
 	{ "subnet_prefix", OPT_OFFSET(subnet_prefix), opts_parse_net64, NULL, 1 },
 	{ "m_key_lease_period", OPT_OFFSET(m_key_lease_period), opts_parse_net16, NULL, 1 },
+	{ "m_key_protection_level", OPT_OFFSET(m_key_protect_bits), opts_parse_uint8, NULL, 1 },
 	{ "sweep_interval", OPT_OFFSET(sweep_interval), opts_parse_uint32, NULL, 1 },
 	{ "max_wire_smps", OPT_OFFSET(max_wire_smps), opts_parse_uint32, NULL, 1 },
 	{ "max_wire_smps2", OPT_OFFSET(max_wire_smps2), opts_parse_uint32, NULL, 1 },
@@ -894,6 +895,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
 	p_opt->sa_key = OSM_DEFAULT_SA_KEY;
 	p_opt->subnet_prefix = IB_DEFAULT_SUBNET_PREFIX;
 	p_opt->m_key_lease_period = 0;
+	p_opt->m_key_protect_bits = 0;
 	p_opt->sweep_interval = OSM_DEFAULT_SWEEP_INTERVAL_SECS;
 	p_opt->max_wire_smps = OSM_DEFAULT_SMP_MAX_ON_WIRE;
 	p_opt->max_wire_smps2 = p_opt->max_wire_smps;
@@ -1558,6 +1560,8 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t * p_opts)
 		"m_key 0x%016" PRIx64 "\n\n"
 		"# The lease period used for the M_Key on this subnet in [sec]\n"
 		"m_key_lease_period %u\n\n"
+		"# The protection level used for the M_Key on this subnet\n"
+		"m_key_protection_level %u\n\n"
 		"# SM_Key value of the SM used for SM authentication\n"
 		"sm_key 0x%016" PRIx64 "\n\n"
 		"# SM_Key value to qualify rcv SA queries as 'trusted'\n"
@@ -1639,6 +1643,7 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t * p_opts)
 		cl_ntoh64(p_opts->guid),
 		cl_ntoh64(p_opts->m_key),
 		cl_ntoh16(p_opts->m_key_lease_period),
+		p_opts->m_key_protect_bits,
 		cl_ntoh64(p_opts->sm_key),
 		cl_ntoh64(p_opts->sa_key),
 		cl_ntoh64(p_opts->subnet_prefix),
-- 
1.7.9.2

* [PATCH 5/8] opensm: Signal subnet init errors on SubnGet timeouts
       [not found]     ` <1340672104-18039-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
                         ` (2 preceding siblings ...)
  2012-06-26  0:55       ` [PATCH 4/8] Add support for setting mkey protection levels Jim Foraker
@ 2012-06-26  0:55       ` Jim Foraker
       [not found]         ` <1340672104-18039-5-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
  2012-06-26  0:55       ` [PATCH 6/8] opensm: Add neighboring link cache file Jim Foraker
                         ` (2 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Jim Foraker @ 2012-06-26  0:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, alexne-VPRAkNaXOzVWk0Htik3J/w
  Cc: weiny2-i2BcT+NCU+M, Jim Foraker

A subnet should not be listed as cleanly initialized if CAs
fail to respond to SubnGet requests.
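The branch this patch adds can be read as a small predicate: a send error is treated as an initialization failure when the SMP was a SubnGet and the error was a timeout. The sketch below uses hypothetical constants (`METHOD_GET`, `ST_TIMEOUT`, and so on) in place of OpenSM's real ib_types.h and complib names, and it models only the new branch, not the existing one above it. The second helper restates the assumption behind the hunk's comment: at protection levels 2 and 3, a SubnGet carrying the wrong M_Key goes unanswered, so the SM only sees a timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for illustration only. */
enum { METHOD_GET = 0x01, METHOD_SET = 0x02 };
typedef enum { ST_SUCCESS, ST_TIMEOUT, ST_ERROR } send_status_t;

/* True when a failed send should mark the subnet as not cleanly
 * initialized, per the branch added here: a SubnGet that timed out. */
static bool get_timeout_is_init_error(uint8_t method, send_status_t status)
{
	return status == ST_TIMEOUT && method == METHOD_GET;
}

/* Assumption taken from the hunk's comment: below level 2 a port still
 * answers a SubnGet whose M_Key does not match; at 2/3 it does not. */
static bool get_answered_with_wrong_mkey(uint8_t protect_bits)
{
	return protect_bits < 2;
}
```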

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 opensm/osm_sm_mad_ctrl.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/opensm/osm_sm_mad_ctrl.c b/opensm/osm_sm_mad_ctrl.c
index f0bcff2..464b6b0 100644
--- a/opensm/osm_sm_mad_ctrl.c
+++ b/opensm/osm_sm_mad_ctrl.c
@@ -741,6 +741,15 @@ static void sm_mad_ctrl_send_err_cb(IN void *context, IN osm_madw_t * p_madw)
 			cl_ntoh16(p_smp->attr_id),
 			ib_get_sm_attr_str(p_smp->attr_id));
 		p_ctrl->p_subn->subnet_initialization_error = TRUE;
+	} else if (p_madw->status == IB_TIMEOUT &&
+		   p_smp->method == IB_MAD_METHOD_GET) {
+		/* Timeouts on SubnGet may be an indication of an mkey
+		   error at protection levels 2/3 */
+		OSM_LOG(p_ctrl->p_log, OSM_LOG_ERROR, "ERR 3120: "
+			"Timeout while getting attribute 0x%X (%s)\n",
+			cl_ntoh16(p_smp->attr_id),
+			ib_get_sm_attr_str(p_smp->attr_id));
+		p_ctrl->p_subn->subnet_initialization_error = TRUE;
 	}
 
 	osm_dump_dr_smp(p_ctrl->p_log, p_smp, OSM_LOG_VERBOSE);
-- 
1.7.9.2

* [PATCH 6/8] opensm: Add neighboring link cache file
       [not found]     ` <1340672104-18039-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
                         ` (3 preceding siblings ...)
  2012-06-26  0:55       ` [PATCH 5/8] opensm: Signal subnet init errors on SubnGet timeouts Jim Foraker
@ 2012-06-26  0:55       ` Jim Foraker
  2012-06-26  0:55       ` [PATCH 7/8] opensm: Check for valid mkey protection level in config file Jim Foraker
  2012-06-26  0:55       ` [PATCH 8/8] opensm: Ensure sweep interval/mkey lease are sensibly set Jim Foraker
  6 siblings, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-06-26  0:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, alexne-VPRAkNaXOzVWk0Htik3J/w
  Cc: weiny2-i2BcT+NCU+M, Jim Foraker

At high mkey protection levels (i.e., 2 and above), an initializing OpenSM
may run into a chicken-and-egg problem, where it needs the guid
of a previously-configured HCA in order to determine what mkey to
use when requesting its guid in the NodeInfo SMP.  By caching
the guids/port numbers at either end of each link between restarts,
this problem is avoided.
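The resolution order this patch adds to req_determine_mkey() can be sketched as follows. If the port at the far end of the last hop is unknown, the SM looks up the outgoing port's (guid, port) in the neighbor cache. It then uses the neighbor's guid as the key into the guid2mkey cache. This is a simplified model: small in-memory arrays (`neighbor_rec_t`, `mkey_rec_t`) stand in for the persistent osm_db domains, and all values are in host byte order. The fallback to a subnet-wide default M_Key when both lookups miss is an assumption, since that part of osm_req.c is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-ins for the "neighbors" and "guid2mkey"
 * persistent domains. */
typedef struct {
	uint64_t guid;
	uint8_t port;
	uint64_t peer_guid;
	uint8_t peer_port;
} neighbor_rec_t;

typedef struct {
	uint64_t guid;
	uint64_t mkey;
} mkey_rec_t;

/* 0 on success, 1 if not found (same convention as osm_db_neighbor_get). */
static int neighbor_get(const neighbor_rec_t *tbl, size_t n, uint64_t guid,
			uint8_t port, uint64_t *peer_guid)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].guid == guid && tbl[i].port == port) {
			*peer_guid = tbl[i].peer_guid;
			return 0;
		}
	}
	return 1;
}

static int mkey_get(const mkey_rec_t *tbl, size_t n, uint64_t guid,
		    uint64_t *mkey)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].guid == guid) {
			*mkey = tbl[i].mkey;
			return 0;
		}
	}
	return 1;
}

/* Pick the M_Key for an SMP whose last hop leaves (out_guid, out_port);
 * target_guid is 0 when the port on the far side is not yet known. */
static uint64_t choose_mkey(const neighbor_rec_t *nbrs, size_t n_nbrs,
			    const mkey_rec_t *keys, size_t n_keys,
			    uint64_t out_guid, uint8_t out_port,
			    uint64_t target_guid, uint64_t default_mkey)
{
	uint64_t mkey;

	/* Target unknown: ask the neighbor cache; a miss leaves it at 0. */
	if (!target_guid)
		(void)neighbor_get(nbrs, n_nbrs, out_guid, out_port,
				   &target_guid);
	/* Known or cached target: look up its M_Key. */
	if (target_guid && !mkey_get(keys, n_keys, target_guid, &mkey))
		return mkey;
	/* Assumption: fall back to the subnet-wide M_Key. */
	return default_mkey;
}
```

This avoids the chicken-and-egg problem because the key for the NodeInfo SubnGet is derived from cached data about the link, not from the target's own (not yet known) guid.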

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 include/opensm/osm_db_pack.h |  185 ++++++++++++++++++++++++++++++++++++++++++
 include/opensm/osm_subnet.h  |    1 +
 opensm/osm_db_pack.c         |  103 +++++++++++++++++++++++
 opensm/osm_node_info_rcv.c   |   17 +++-
 opensm/osm_req.c             |   11 ++-
 opensm/osm_state_mgr.c       |    1 +
 opensm/osm_subnet.c          |  102 +++++++++++++++++++++++
 7 files changed, 418 insertions(+), 2 deletions(-)

diff --git a/include/opensm/osm_db_pack.h b/include/opensm/osm_db_pack.h
index 25644df..1ff2ff9 100644
--- a/include/opensm/osm_db_pack.h
+++ b/include/opensm/osm_db_pack.h
@@ -379,5 +379,190 @@ int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid);
 * osm_db_guid2mkey_get, osm_db_guid2mkey_set
 *********/
 
+/****f* OpenSM: DB-Pack/osm_db_neighbor_init
+* NAME
+*	osm_db_neighbor_init
+*
+* DESCRIPTION
+*	Initialize a domain for the neighbors table
+*
+* SYNOPSIS
+*/
+static inline osm_db_domain_t *osm_db_neighbor_init(IN osm_db_t * p_db)
+{
+	return (osm_db_domain_init(p_db, "neighbors"));
+}
+
+/*
+* PARAMETERS
+*	p_db
+*		[in] Pointer to the database object to construct
+*
+* RETURN VALUES
+*	The pointer to the new allocated domain object or NULL.
+*
+* NOTE: DB domains are destroyed by the osm_db_destroy
+*
+* SEE ALSO
+*	Database, osm_db_init, osm_db_destroy
+*********/
+
+/****s* OpenSM: DB-Pack/osm_db_neighbor_elem
+* NAME
+*	osm_db_neighbor_elem
+*
+* DESCRIPTION
+*	An element of the list returned by osm_db_neighbor_guids
+*
+* SYNOPSIS
+*/
+typedef struct osm_db_neighbor_elem {
+	cl_list_item_t item;
+	uint64_t guid;
+	uint8_t portnum;
+} osm_db_neighbor_elem_t;
+/*
+* FIELDS
+*	item
+*		required for list manipulations
+*
+*	guid, portnum
+*		port guid and port number of one end of a cached link
+*
+************/
+
+/****f* OpenSM: DB-Pack/osm_db_neighbor_guids
+* NAME
+*	osm_db_neighbor_guids
+*
+* DESCRIPTION
+*	Provides back a list of neighbor elements.
+*
+* SYNOPSIS
+*/
+int osm_db_neighbor_guids(IN osm_db_domain_t * p_neighbor,
+			  OUT cl_qlist_t * p_guid_list);
+/*
+* PARAMETERS
+*	p_neighbor
+*		[in] Pointer to the neighbor domain
+*
+*  p_guid_list
+*     [out] A quick list of neighbor elements of type osm_db_neighbor_elem_t
+*
+* RETURN VALUES
+*	0 if successful
+*
+* NOTE: the output qlist should be initialized and each item freed
+*       by the caller, then destroyed.
+*
+* SEE ALSO
+* osm_db_neighbor_init, osm_db_neighbor_guids, osm_db_neighbor_get
+* osm_db_neighbor_set, osm_db_neighbor_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_neighbor_get
+* NAME
+*	osm_db_neighbor_get
+*
+* DESCRIPTION
+*	Get a neighbor's guid by given guid/port.
+*
+* SYNOPSIS
+*/
+int osm_db_neighbor_get(IN osm_db_domain_t * p_neighbor, IN uint64_t guid1,
+			IN uint8_t port1, OUT uint64_t * p_guid2,
+			OUT uint8_t * p_port2);
+/*
+* PARAMETERS
+*	p_neighbor
+*		[in] Pointer to the neighbor domain
+*
+*  guid1
+*     [in] The guid to look for
+*
+*  port1
+*     [in] The port to look for
+*
+*  p_guid2
+*     [out] Pointer to the resulting guid of the neighboring port.
+*
+*  p_port2
+*     [out] Pointer to the resulting port of the neighboring port.
+*
+* RETURN VALUES
+*	0 if successful, otherwise 1; the outputs are left unchanged on failure.
+*
+* SEE ALSO
+* osm_db_neighbor_init, osm_db_neighbor_guids
+* osm_db_neighbor_set, osm_db_neighbor_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_neighbor_set
+* NAME
+*	osm_db_neighbor_set
+*
+* DESCRIPTION
+*	Set up a relationship between two ports
+*
+* SYNOPSIS
+*/
+int osm_db_neighbor_set(IN osm_db_domain_t * p_neighbor, IN uint64_t guid1,
+			IN uint8_t port1, IN uint64_t guid2, IN uint8_t port2);
+/*
+* PARAMETERS
+*	p_neighbor
+*		[in] Pointer to the neighbor domain
+*
+*  guid1
+*     [in] The first guid in the relationship
+*
+*  port1
+*     [in] The first port in the relationship
+*
+*  guid2
+*     [in] The second guid in the relationship
+*
+*  port2
+*     [in] The second port in the relationship
+*
+* RETURN VALUES
+*	0 if successful
+*
+* SEE ALSO
+* osm_db_neighbor_init, osm_db_neighbor_guids
+* osm_db_neighbor_get, osm_db_neighbor_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_neighbor_delete
+* NAME
+*	osm_db_neighbor_delete
+*
+* DESCRIPTION
+*	Delete the relationship between two ports
+*
+* SYNOPSIS
+*/
+int osm_db_neighbor_delete(IN osm_db_domain_t * p_neighbor,
+			   IN uint64_t guid, IN uint8_t port);
+/*
+* PARAMETERS
+*	p_neighbor
+*		[in] Pointer to the neighbor domain
+*
+*  guid
+*     [in] The guid to look for
+*
+*  port
+*     [in] The port to look for
+*
+* RETURN VALUES
+*	0 if successful otherwise 1
+*
+* SEE ALSO
+* osm_db_neighbor_init, osm_db_neighbor_guids
+* osm_db_neighbor_get, osm_db_neighbor_set
+*********/
+
 END_C_DECLS
 #endif				/* _OSM_DB_PACK_H_ */
diff --git a/include/opensm/osm_subnet.h b/include/opensm/osm_subnet.h
index 8e96314..b4deb15 100644
--- a/include/opensm/osm_subnet.h
+++ b/include/opensm/osm_subnet.h
@@ -589,6 +589,7 @@ typedef struct osm_subn {
 	unsigned need_update;
 	cl_fmap_t mgrp_mgid_tbl;
 	osm_db_domain_t *p_g2m;
+	osm_db_domain_t *p_neighbor;
 	void *mboxes[IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO + 1];
 } osm_subn_t;
 /*
diff --git a/opensm/osm_db_pack.c b/opensm/osm_db_pack.c
index 3fea4a2..b4907c1 100644
--- a/opensm/osm_db_pack.c
+++ b/opensm/osm_db_pack.c
@@ -93,6 +93,37 @@ static inline uint64_t unpack_mkey(char *p_mkey_str)
 	return strtoull(p_mkey_str, NULL, 0);
 }
 
+static inline void pack_neighbor(uint64_t guid, uint8_t portnum, char *p_str)
+{
+	sprintf(p_str, "0x%016" PRIx64 ":%u", guid, portnum);
+}
+
+static inline int unpack_neighbor(char *p_str, uint64_t *guid,
+				  uint8_t *portnum)
+{
+	char tmp_str[24];
+	char *p_num, *p_next;
+	unsigned long tmp_port;
+
+	strncpy(tmp_str, p_str, 23);
+	tmp_str[23] = '\0';
+	p_num = strtok_r(tmp_str, ":", &p_next);
+	if (!p_num)
+		return 1;
+	if (guid)
+		*guid = strtoull(p_num, NULL, 0);
+
+	p_num = strtok_r(NULL, ":", &p_next);
+	if (!p_num)
+		return 1;
+	if (portnum) {
+		tmp_port = strtoul(p_num, NULL, 0);
+		CL_ASSERT(tmp_port < 0x100);
+		*portnum = (uint8_t) tmp_port;
+	}
+
+	return 0;
+}
 
 int osm_db_guid2lid_guids(IN osm_db_domain_t * p_g2l,
 			  OUT cl_qlist_t * p_guid_list)
@@ -222,3 +253,75 @@ int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid)
 	pack_guid(guid, guid_str);
 	return osm_db_delete(p_g2m, guid_str);
 }
+
+int osm_db_neighbor_guids(IN osm_db_domain_t * p_neighbor,
+			  OUT cl_qlist_t * p_neighbor_list)
+{
+	char *p_key;
+	cl_list_t keys;
+	osm_db_neighbor_elem_t *p_neighbor_elem;
+
+	cl_list_construct(&keys);
+	cl_list_init(&keys, 10);
+
+	if (osm_db_keys(p_neighbor, &keys))
+		return 1;
+
+	while ((p_key = cl_list_remove_head(&keys)) != NULL) {
+		p_neighbor_elem =
+		    (osm_db_neighbor_elem_t *) malloc(sizeof(osm_db_neighbor_elem_t));
+		CL_ASSERT(p_neighbor_elem != NULL);
+
+		unpack_neighbor(p_key, &p_neighbor_elem->guid,
+				&p_neighbor_elem->portnum);
+		cl_qlist_insert_head(p_neighbor_list, &p_neighbor_elem->item);
+	}
+
+	cl_list_destroy(&keys);
+	return 0;
+}
+
+int osm_db_neighbor_get(IN osm_db_domain_t * p_neighbor, IN uint64_t guid1,
+			IN uint8_t portnum1, OUT uint64_t * p_guid2,
+			OUT uint8_t * p_portnum2)
+{
+	char neighbor_str[24];
+	char *p_other_str;
+	uint64_t temp_guid;
+	uint8_t temp_portnum;
+
+	pack_neighbor(guid1, portnum1, neighbor_str);
+	p_other_str = osm_db_lookup(p_neighbor, neighbor_str);
+	if (!p_other_str)
+		return 1;
+	if (unpack_neighbor(p_other_str, &temp_guid, &temp_portnum))
+		return 1;
+
+	if (p_guid2)
+		*p_guid2 = temp_guid;
+	if (p_portnum2)
+		*p_portnum2 = temp_portnum;
+
+	return 0;
+}
+
+int osm_db_neighbor_set(IN osm_db_domain_t * p_neighbor, IN uint64_t guid1,
+			IN uint8_t portnum1, IN uint64_t guid2,
+			IN uint8_t portnum2)
+{
+	char n1_str[24], n2_str[24];
+
+	pack_neighbor(guid1, portnum1, n1_str);
+	pack_neighbor(guid2, portnum2, n2_str);
+
+	return osm_db_update(p_neighbor, n1_str, n2_str);
+}
+
+int osm_db_neighbor_delete(IN osm_db_domain_t * p_neighbor, IN uint64_t guid,
+			   IN uint8_t portnum)
+{
+	char n_str[24];
+
+	pack_neighbor(guid, portnum, n_str);
+	return osm_db_delete(p_neighbor, n_str);
+}
diff --git a/opensm/osm_node_info_rcv.c b/opensm/osm_node_info_rcv.c
index 7d2675f..50df519 100644
--- a/opensm/osm_node_info_rcv.c
+++ b/opensm/osm_node_info_rcv.c
@@ -61,6 +61,7 @@
 #include <opensm/osm_msgdef.h>
 #include <opensm/osm_opensm.h>
 #include <opensm/osm_ucast_mgr.h>
+#include <opensm/osm_db_pack.h>
 
 static void report_duplicated_guid(IN osm_sm_t * sm, osm_physp_t * p_physp,
 				   osm_node_t * p_neighbor_node,
@@ -132,7 +133,7 @@ static void ni_rcv_set_links(IN osm_sm_t * sm, osm_node_t * p_node,
 			     const osm_ni_context_t * p_ni_context)
 {
 	osm_node_t *p_neighbor_node;
-	osm_physp_t *p_physp;
+	osm_physp_t *p_physp, *p_remote_physp;
 
 	OSM_LOG_ENTER(sm->p_log);
 
@@ -243,6 +244,20 @@ static void ni_rcv_set_links(IN osm_sm_t * sm, osm_node_t * p_node,
 	osm_node_link(p_node, port_num, p_neighbor_node,
 		      p_ni_context->port_num);
 
+	p_physp = osm_node_get_physp_ptr(p_node, port_num);
+	p_remote_physp = osm_node_get_physp_ptr(p_neighbor_node,
+						p_ni_context->port_num);
+	osm_db_neighbor_set(sm->p_subn->p_neighbor,
+			    cl_ntoh64(osm_physp_get_port_guid(p_physp)),
+			    port_num,
+			    cl_ntoh64(osm_physp_get_port_guid(p_remote_physp)),
+			    p_ni_context->port_num);
+	osm_db_neighbor_set(sm->p_subn->p_neighbor,
+			    cl_ntoh64(osm_physp_get_port_guid(p_remote_physp)),
+			    p_ni_context->port_num,
+			    cl_ntoh64(osm_physp_get_port_guid(p_physp)),
+			    port_num);
+
 _exit:
 	OSM_LOG_EXIT(sm->p_log);
 }
diff --git a/opensm/osm_req.c b/opensm/osm_req.c
index ee13517..5fe378e 100644
--- a/opensm/osm_req.c
+++ b/opensm/osm_req.c
@@ -100,7 +100,7 @@ static ib_net64_t req_determine_mkey(IN osm_sm_t * sm,
 	}
 
 	/* At this point, p_physp points at the outgoing physp on the
-	   last hop, or NULL if we don't know it.
+   	   last hop, or NULL if we don't know it.
 	*/
 	if (!p_physp) {
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR,
@@ -115,6 +115,15 @@ static ib_net64_t req_determine_mkey(IN osm_sm_t * sm,
 		goto Remote_Guid;
 	}
 
+	OSM_LOG(sm->p_log, OSM_LOG_DEBUG, "Target port guid unknown, "
+		"using persistent DB\n");
+	if (!osm_db_neighbor_get(sm->p_subn->p_neighbor,
+				 cl_ntoh64(p_physp->port_guid),
+				 p_physp->port_num,
+				 &dest_port_guid, NULL)) {
+		dest_port_guid = cl_hton64(dest_port_guid);
+	}
+
 Remote_Guid:
 	if (dest_port_guid) {
 		if (!osm_db_guid2mkey_get(sm->p_subn->p_g2m,
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index 4d5da46..e44715e 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -1456,6 +1456,7 @@ repeat_discovery:
 
 	/* Write a new copy of our persistent guid2mkey database */
 	osm_db_store(sm->p_subn->p_g2m);
+	osm_db_store(sm->p_subn->p_neighbor);
 }
 
 static void do_process_mgrp_queue(osm_sm_t * sm)
diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 4f5272f..bb69a66 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -471,6 +471,79 @@ Exit:
 	OSM_LOG_EXIT(&(p_subn->p_osm->log));
 }
 
+static void subn_validate_neighbor(osm_subn_t *p_subn)
+{
+	cl_qlist_t entries;
+	osm_db_neighbor_elem_t *p_item;
+	boolean_t valid_entry;
+	uint64_t guid;
+	uint8_t port;
+
+	OSM_LOG_ENTER(&(p_subn->p_osm->log));
+	cl_qlist_init(&entries);
+
+	if (osm_db_neighbor_guids(p_subn->p_neighbor, &entries)) {
+		OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR, "ERR 7512: "
+			"could not get neighbor entry list\n");
+		goto Exit;
+	}
+
+	while ((p_item = (osm_db_neighbor_elem_t *) cl_qlist_remove_head(&entries))
+	       != (osm_db_neighbor_elem_t *) cl_qlist_end(&entries)) {
+		valid_entry = TRUE;
+
+		OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_DEBUG,
+			"Validating neighbor for 0x%016" PRIx64 ", port %d\n",
+			p_item->guid, p_item->portnum);
+		if (p_item->guid == 0) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7513: found invalid zero guid\n");
+			valid_entry = FALSE;
+		} else if (p_item->portnum == 0) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7514: found invalid zero port\n");
+			valid_entry = FALSE;
+		} else if (osm_db_neighbor_get(p_subn->p_neighbor,
+					       p_item->guid, p_item->portnum,
+					       &guid, &port)) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7515: could not find neighbor for "
+				"guid: 0x%016" PRIx64 "\n", p_item->guid);
+			valid_entry = FALSE;
+		} else if (guid == 0) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7516: found invalid neighbor "
+				"zero guid\n");
+			valid_entry = FALSE;
+		} else if (port == 0) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7517: found invalid neighbor "
+				"zero port\n");
+			valid_entry = FALSE;
+		} else if (osm_db_neighbor_get(p_subn->p_neighbor,
+					       guid, port, &guid, &port) ||
+			   guid != p_item->guid || port != p_item->portnum) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7518: neighbor does not point "
+				"back at us\n");
+			valid_entry = FALSE;
+		}
+
+		if (valid_entry == FALSE) {
+			if (osm_db_neighbor_delete(p_subn->p_neighbor,
+						   p_item->guid,
+						   p_item->portnum))
+				OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+					"ERR 7519: failed to delete entry for "
+					"guid:0x%016" PRIx64 " port:%u\n",
+					p_item->guid, p_item->portnum);
+		}
+	}
+
+Exit:
+	OSM_LOG_EXIT(&(p_subn->p_osm->log));
+}
+
 void osm_subn_construct(IN osm_subn_t * p_subn)
 {
 	memset(p_subn, 0, sizeof(*p_subn));
@@ -706,6 +779,35 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN osm_opensm_t * p_osm,
 
 	subn_validate_g2m(p_subn);
 
+	/* Initialize the neighbor database */
+	p_subn->p_neighbor = osm_db_domain_init(&(p_osm->db), "neighbors");
+	if (!p_subn->p_neighbor) {
+		OSM_LOG(&(p_osm->log), OSM_LOG_ERROR, "ERR 7520: Error "
+			"initializing neighbor link persistent database\n");
+		return IB_ERROR;
+	}
+
+	if (osm_db_restore(p_subn->p_neighbor)) {
+#ifndef __WIN__
+		/*
+		 * When Windows is BSODing, it might corrupt files that
+		 * were previously opened for writing, even if the files
+		 * are closed, so we might see corrupted neighbors file.
+		 */
+		if (p_subn->opt.exit_on_fatal) {
+			osm_log(&(p_osm->log), OSM_LOG_SYS,
+				"FATAL: Error restoring neighbor link "
+				"persistent database\n");
+			return IB_ERROR;
+		} else
+#endif
+			OSM_LOG(&(p_osm->log), OSM_LOG_ERROR,
+				"ERR 7521: Error restoring neighbor link "
+				"persistent database\n");
+	}
+
+	subn_validate_neighbor(p_subn);
+
 	return IB_SUCCESS;
 }
 
-- 
1.7.9.2
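subn_validate_neighbor() keeps a cached link only if both ends are non-zero and the entry for the neighbor points back at the original (guid, port). The sketch below models the same checks as ERR 7513 through 7518 on a hypothetical flat table (`link_rec_t`) instead of the osm_db "neighbors" domain. It also frees each list element after removing it. The osm_db_neighbor_guids() documentation asks the caller to do this, and the loop in subn_validate_neighbor() as posted does not appear to. The list type and `push_key()` helper are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flat copy of the "neighbors" domain: each row maps
 * (guid, port) to the (guid, port) at the other end of the link. */
typedef struct {
	uint64_t guid;
	uint8_t port;
	uint64_t peer_guid;
	uint8_t peer_port;
} link_rec_t;

static const link_rec_t *lookup(const link_rec_t *tbl, size_t n,
				uint64_t guid, uint8_t port)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].guid == guid && tbl[i].port == port)
			return &tbl[i];
	return NULL;
}

/* Zero guid/port, missing entry, zero neighbor, and "does not point
 * back at us" all make the entry invalid. */
static bool link_is_valid(const link_rec_t *tbl, size_t n,
			  uint64_t guid, uint8_t port)
{
	const link_rec_t *fwd, *back;

	if (guid == 0 || port == 0)
		return false;
	fwd = lookup(tbl, n, guid, port);
	if (!fwd || fwd->peer_guid == 0 || fwd->peer_port == 0)
		return false;
	back = lookup(tbl, n, fwd->peer_guid, fwd->peer_port);
	return back && back->peer_guid == guid && back->peer_port == port;
}

/* Hypothetical caller-owned key list, like the qlist filled by
 * osm_db_neighbor_guids(). */
typedef struct key_elem {
	struct key_elem *next;
	uint64_t guid;
	uint8_t port;
} key_elem_t;

static key_elem_t *push_key(key_elem_t *head, uint64_t guid, uint8_t port)
{
	key_elem_t *e = malloc(sizeof(*e));

	if (!e)
		abort();
	e->next = head;
	e->guid = guid;
	e->port = port;
	return e;
}

/* Remove each element, validate it, and free it; returns how many
 * entries would be deleted from the cache. */
static size_t count_invalid_and_free(key_elem_t *head, const link_rec_t *tbl,
				     size_t n)
{
	size_t invalid = 0;

	while (head) {
		key_elem_t *item = head;

		head = item->next;
		if (!link_is_valid(tbl, n, item->guid, item->port))
			invalid++;
		free(item);
	}
	return invalid;
}
```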

* [PATCH 7/8] opensm: Check for valid mkey protection level in config file
       [not found]     ` <1340672104-18039-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
                         ` (4 preceding siblings ...)
  2012-06-26  0:55       ` [PATCH 6/8] opensm: Add neighboring link cache file Jim Foraker
@ 2012-06-26  0:55       ` Jim Foraker
  2012-06-26  0:55       ` [PATCH 8/8] opensm: Ensure sweep interval/mkey lease are sensibly set Jim Foraker
  6 siblings, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-06-26  0:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, alexne-VPRAkNaXOzVWk0Htik3J/w
  Cc: weiny2-i2BcT+NCU+M, Jim Foraker

Reject m_key_protection_level values greater than 3, falling back
to protection level 2.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 opensm/osm_subnet.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index bb69a66..ddee955 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -1496,6 +1496,13 @@ int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
 	}
 #endif
 
+	if (p_opts->m_key_protect_bits > 3) {
+		log_report(" Invalid Cached Option Value:"
+			   "m_key_protection_level = %u Setting to %u "
+			   "instead\n", p_opts->m_key_protect_bits, 2);
+		p_opts->m_key_protect_bits = 2;
+	}
+
 	if (p_opts->root_guid_file != NULL) {
 		FILE *root_file = fopen(p_opts->root_guid_file, "r");
 		if (!root_file) {
-- 
1.7.9.2

* [PATCH 8/8] opensm: Ensure sweep interval/mkey lease are sensibly set
       [not found]     ` <1340672104-18039-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
                         ` (5 preceding siblings ...)
  2012-06-26  0:55       ` [PATCH 7/8] opensm: Check for valid mkey protection level in config file Jim Foraker
@ 2012-06-26  0:55       ` Jim Foraker
       [not found]         ` <1340672104-18039-8-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
  6 siblings, 1 reply; 33+ messages in thread
From: Jim Foraker @ 2012-06-26  0:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, alexne-VPRAkNaXOzVWk0Htik3J/w
  Cc: weiny2-i2BcT+NCU+M, Jim Foraker

If mkeys are protected, sweeping should always be enabled, with an
interval shorter than the mkey lease period, so that a missed trap
cannot lead to mkey exposure.
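Together with the range check from patch 7, this patch's adjustments to osm_subn_verify_config() can be expressed as a pure function over the relevant options. The sketch below uses a hypothetical host-order struct (`mkey_opts_t`). In OpenSM, m_key_lease_period is an ib_net16_t in network byte order, which is why the patch wraps it in cl_ntoh16()/cl_hton16(). The log_report() calls are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-order copy of the relevant options. */
typedef struct {
	uint8_t m_key_protect_bits;
	uint16_t m_key_lease_period; /* seconds, 0 = no lease */
	uint32_t sweep_interval;     /* seconds, 0 = sweeping disabled */
} mkey_opts_t;

/* Mirrors the checks patches 7 and 8 add, in the same order. */
static void verify_mkey_opts(mkey_opts_t *o)
{
	/* Patch 7: out-of-range protection level falls back to 2. */
	if (o->m_key_protect_bits > 3)
		o->m_key_protect_bits = 2;

	/* Patch 8: only relevant with protection and a lease in effect. */
	if (o->m_key_protect_bits && o->m_key_lease_period) {
		/* Re-enable sweeping just inside the lease period. */
		if (!o->sweep_interval)
			o->sweep_interval = o->m_key_lease_period - 1u;
		/* Otherwise stretch the lease past the sweep interval. */
		if (o->sweep_interval >= o->m_key_lease_period)
			o->m_key_lease_period =
				(uint16_t)(o->sweep_interval + 1);
	}
}
```

One corner case is visible in this form. With m_key_lease_period set to 1, the "re-enabled" interval works out to 0, which still means sweeping is disabled. That case may need a separate check.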

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 opensm/osm_subnet.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index ddee955..3f336d8 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -1502,6 +1502,26 @@ int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
 			   "instead\n", p_opts->m_key_protect_bits, 2);
 		p_opts->m_key_protect_bits = 2;
 	}
+	if (p_opts->m_key_protect_bits && p_opts->m_key_lease_period) {
+		if (!p_opts->sweep_interval) {
+			log_report(" Sweep disabled with protected mkey "
+				   "leases in effect; re-enabling sweeping "
+				   "with interval %u\n",
+				   cl_ntoh16(p_opts->m_key_lease_period) - 1);
+			p_opts->sweep_interval =
+				cl_ntoh16(p_opts->m_key_lease_period) - 1;
+		}
+		if (p_opts->sweep_interval >=
+			cl_ntoh16(p_opts->m_key_lease_period)) {
+			log_report(" Sweep interval %u >= mkey lease period "
+				   "%u. Setting lease period to %u\n",
+				   p_opts->sweep_interval,
+				   cl_ntoh16(p_opts->m_key_lease_period),
+				   p_opts->sweep_interval + 1);
+			p_opts->m_key_lease_period =
+				cl_hton16(p_opts->sweep_interval + 1);
+		}
+	}
 
 	if (p_opts->root_guid_file != NULL) {
 		FILE *root_file = fopen(p_opts->root_guid_file, "r");
-- 
1.7.9.2

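The invariant this patch enforces (sweeping enabled, sweep interval strictly below the lease period) can be sketched outside OpenSM as follows. The struct and function names are hypothetical, and the fields are kept in host order for simplicity (OpenSM stores m_key_lease_period in network order). The two extra guards are assumptions about desirable behavior rather than part of the patch: as posted, a 1-second lease yields a sweep interval of 0, which leaves sweeping disabled, and a sweep interval above 65534 appears to be truncated when cl_hton16() stores interval + 1 in the 16-bit lease field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant osm_subn_opt_t fields,
 * all in host byte order for simplicity. */
struct sweep_opts {
	uint8_t m_key_protect_bits;	/* 0..3 */
	uint16_t m_key_lease_period;	/* seconds, 0 = no lease */
	uint32_t sweep_interval;	/* seconds, 0 = sweeping disabled */
};

/* When a protected mkey lease is in effect, ensure that sweeping is
 * enabled and that 0 < sweep_interval < m_key_lease_period. */
static void reconcile_sweep_and_lease(struct sweep_opts *o)
{
	if (!o->m_key_protect_bits || !o->m_key_lease_period)
		return;		/* no protected lease: nothing to enforce */

	if (!o->sweep_interval) {
		/* Re-enable sweeping just inside the lease; clamp to 1 so
		 * that a 1-second lease does not leave sweeping disabled. */
		o->sweep_interval = o->m_key_lease_period > 1 ?
		    o->m_key_lease_period - 1u : 1u;
	}

	if (o->sweep_interval >= o->m_key_lease_period) {
		/* Lengthen the lease instead; cap the interval first so
		 * that interval + 1 still fits the 16-bit lease field. */
		if (o->sweep_interval > UINT16_MAX - 1u)
			o->sweep_interval = UINT16_MAX - 1u;
		o->m_key_lease_period = (uint16_t)(o->sweep_interval + 1);
	}

	/* Postcondition: the SM revisits ports before a lease can lapse. */
	assert(o->sweep_interval > 0 &&
	       o->sweep_interval < o->m_key_lease_period);
}
```

For example, a 60-second lease with sweeping disabled becomes a 59-second sweep, and a 30-second sweep with a 10-second lease raises the lease to 31 seconds, matching the adjustments the patch itself makes.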
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH 0/8] opensm: Improved mkey support
       [not found] ` <1340672058.5218.97.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
  2012-06-26  0:54   ` [PATCH 1/8] opensm: Add guid2mkey cache file support Jim Foraker
@ 2012-07-04  0:25   ` Jim Foraker
       [not found]     ` <1341361508.5218.148.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
  2012-08-01 14:48   ` Jim Foraker
  2 siblings, 1 reply; 33+ messages in thread
From: Jim Foraker @ 2012-07-04  0:25 UTC (permalink / raw)
  To: linux-rdma; +Cc: Alex Netes, Weiny, Ira K.

     I'm sending new versions of patches 1 and 3 to the list, which
correct merge/build issues introduced by other recently accepted
patches.  They will be marked as "V1.1".

     Jim

On Mon, 2012-06-25 at 17:54 -0700, Jim Foraker wrote:
> I'm about to post a set of patches intended to improve mkey support
> in OpenSM.  These patches have been fairly rigorously tested on a small
> fabric, and I believe are sufficiently stable for inclusion.  The
> primary intent here is threefold:
> 
> 1) Fix a multitude of edge case issues with the existing
> single-mkey-per-subnet support in OpenSM.  For instance, the current
> implementation provides no way to change an established non-zero mkey
> without rebooting or manually re-keying each CA on the entire subnet.
> 
> 2) Enable mkey protection across the fabric.  This involves not only
> setting a non-zero protection level, but also providing the SM with a
> sufficient information cache to initialize the subnet on restart without
> having to wait for mkey lease timeouts (provided one is set).
> 
> 3) Provide a basis on which to build multiple-mkey systems for OpenSM
> (be they per-host, KDF, or random) in the future.
> 
>      The patches add two new cache files: a port guid-to-mkey cache, and
> a neighboring link (port guid to port guid) cache.
>      The guid2mkey cache is used to provide a hint at the initial mkey
> for a CA during initialization.  It is a hint only; the SM is capable of
> dealing with cases where the guid2mkey cache is incorrect, although it
> may require waiting for (potentially multiple) mkey lease timeouts at
> non-zero mkey protection levels.  The guid2mkey cache is presented first
> in the patch set, as it ends up ameliorating several corner cases in a
> cleaner way than attacking them directly did.
>      The neighbors cache file provides an initial hint to the SM of what
> port guid we may expect at the opposite end of a link that is being
> initialized.  This is necessary at mkey protection level 2, where we
> cannot do the SubnGet necessary to determine the port guid to use in
> looking up an mkey hint.
>      The changes to the osm_req functions to support mkeys in patch 2
> now require plock to be held when called.  This was generally already
> the case, but there were a few spots where it was not.  In most of these
> cases, the plock is still not technically necessary, as they occur
> during hops 0/1 when DR path traversal is trivial.  I wrapped all of
> these occurrences in locks in a separate patch (#3), in order to make
> the changes more obvious and invite comment.
> 
>      Jim

* [PATCH V1.1 1/8] opensm: Add guid2mkey cache file support
       [not found]     ` <1341361508.5218.148.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
@ 2012-07-04  0:25       ` Jim Foraker
       [not found]         ` <1341361548-30229-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
  2012-07-23 15:59       ` [PATCH 0/8] opensm: Improved mkey support Alex Netes
  1 sibling, 1 reply; 33+ messages in thread
From: Jim Foraker @ 2012-07-04  0:25 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: alexne-VPRAkNaXOzVWk0Htik3J/w, weiny2-i2BcT+NCU+M, Jim Foraker

Adds support for a guid2mkey file, and uses the database
to select which mkey to use in outgoing SMPs.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 include/opensm/osm_db.h      |    7 +-
 include/opensm/osm_db_pack.h |  144 ++++++++++++++++++++++++++++++++++++++++++
 include/opensm/osm_port.h    |   23 ++-----
 include/opensm/osm_subnet.h  |    2 +
 opensm/osm_db_files.c        |    1 +
 opensm/osm_db_pack.c         |   73 +++++++++++++++++++++
 opensm/osm_lid_mgr.c         |   12 +++-
 opensm/osm_link_mgr.c        |   14 +++-
 opensm/osm_opensm.c          |   10 +--
 opensm/osm_port.c            |   32 ++++++++++
 opensm/osm_port_info_rcv.c   |    6 +-
 opensm/osm_req.c             |    1 +
 opensm/osm_state_mgr.c       |    4 ++
 opensm/osm_subnet.c          |   77 ++++++++++++++++++++++
 14 files changed, 377 insertions(+), 29 deletions(-)

diff --git a/include/opensm/osm_db.h b/include/opensm/osm_db.h
index 7077347..d05bfa0 100644
--- a/include/opensm/osm_db.h
+++ b/include/opensm/osm_db.h
@@ -43,7 +43,8 @@
 
 #include <complib/cl_list.h>
 #include <complib/cl_spinlock.h>
-#include <opensm/osm_log.h>
+
+struct osm_log;
 
 #ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
@@ -118,7 +119,7 @@ typedef struct osm_db_domain {
 */
 typedef struct osm_db {
 	void *p_db_imp;
-	osm_log_t *p_log;
+	struct osm_log *p_log;
 	cl_list_t domains;
 } osm_db_t;
 /*
@@ -185,7 +186,7 @@ void osm_db_destroy(IN osm_db_t * p_db);
 *
 * SYNOPSIS
 */
-int osm_db_init(IN osm_db_t * p_db, IN osm_log_t * p_log);
+int osm_db_init(IN osm_db_t * p_db, IN struct osm_log * p_log);
 /*
 * PARAMETERS
 *
diff --git a/include/opensm/osm_db_pack.h b/include/opensm/osm_db_pack.h
index 3d24926..25644df 100644
--- a/include/opensm/osm_db_pack.h
+++ b/include/opensm/osm_db_pack.h
@@ -235,5 +235,149 @@ int osm_db_guid2lid_delete(IN osm_db_domain_t * p_g2l, IN uint64_t guid);
 * osm_db_guid2lid_get, osm_db_guid2lid_set
 *********/
 
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_init
+* NAME
+*	osm_db_guid2mkey_init
+*
+* DESCRIPTION
+*	Initialize a domain for the guid2mkey table
+*
+* SYNOPSIS
+*/
+static inline osm_db_domain_t *osm_db_guid2mkey_init(IN osm_db_t * p_db)
+{
+	return (osm_db_domain_init(p_db, "guid2mkey"));
+}
+
+/*
+* PARAMETERS
+*	p_db
+*		[in] Pointer to the database object to construct
+*
+* RETURN VALUES
+*	The pointer to the new allocated domain object or NULL.
+*
+* NOTE: DB domains are destroyed by the osm_db_destroy
+*
+* SEE ALSO
+*	Database, osm_db_init, osm_db_destroy
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_guids
+* NAME
+*	osm_db_guid2mkey_guids
+*
+* DESCRIPTION
+*	Provides back a list of guid elements.
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_guids(IN osm_db_domain_t * p_g2m,
+			  OUT cl_qlist_t * p_guid_list);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  p_guid_list
+*     [out] A quick list of guid elements of type osm_db_guid_elem_t
+*
+* RETURN VALUES
+*	0 if successful
+*
+* NOTE: the output qlist should be initialized and each item freed
+*       by the caller, then destroyed.
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids, osm_db_guid2mkey_get
+* osm_db_guid2mkey_set, osm_db_guid2mkey_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_get
+* NAME
+*	osm_db_guid2mkey_get
+*
+* DESCRIPTION
+*	Get the mkey for the given guid.
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_get(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 OUT uint64_t * p_mkey);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  guid
+*     [in] The guid to look for
+*
+*  p_mkey
+*     [out] Pointer to the resulting mkey in host order.
+*
+* RETURN VALUES
+*	0 if successful, 1 if the guid is not found (p_mkey is left unchanged).
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
+* osm_db_guid2mkey_set, osm_db_guid2mkey_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_set
+* NAME
+*	osm_db_guid2mkey_set
+*
+* DESCRIPTION
+*	Set the mkey for the given guid.
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_set(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 IN uint64_t mkey);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  guid
+*     [in] The guid to look for
+*
+*  mkey
+*     [in] The mkey value to set, in host order
+*
+* RETURN VALUES
+*	0 if successful
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
+* osm_db_guid2mkey_get, osm_db_guid2mkey_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_delete
+* NAME
+*	osm_db_guid2mkey_delete
+*
+* DESCRIPTION
+*	Delete the entry by the given guid
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  guid
+*     [in] The guid to look for
+*
+* RETURN VALUES
+*	0 if successful otherwise 1
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
+* osm_db_guid2mkey_get, osm_db_guid2mkey_set
+*********/
+
 END_C_DECLS
 #endif				/* _OSM_DB_PACK_H_ */
diff --git a/include/opensm/osm_port.h b/include/opensm/osm_port.h
index 56e9c37..6b73cc7 100644
--- a/include/opensm/osm_port.h
+++ b/include/opensm/osm_port.h
@@ -66,6 +66,7 @@ BEGIN_C_DECLS
 struct osm_port;
 struct osm_node;
 struct osm_mgrp;
+struct osm_sm;
 
 /****h* OpenSM/Physical Port
 * NAME
@@ -431,22 +432,9 @@ static inline void osm_physp_set_health(IN osm_physp_t * p_physp,
 *
 * SYNOPSIS
 */
-static inline void osm_physp_set_port_info(IN osm_physp_t * p_physp,
-					   IN const ib_port_info_t * p_pi)
-{
-	CL_ASSERT(p_pi);
-	CL_ASSERT(osm_physp_is_valid(p_physp));
-
-	if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) {
-		/* If PortState is down, only copy PortState */
-		/* and PortPhysicalState per C14-24-2.1 */
-		ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN);
-		ib_port_info_set_port_phys_state
-		    (ib_port_info_get_port_phys_state(p_pi),
-		     &p_physp->port_info);
-	} else
-		p_physp->port_info = *p_pi;
-}
+void osm_physp_set_port_info(IN osm_physp_t * p_physp,
+					   IN const ib_port_info_t * p_pi,
+					   IN const struct osm_sm * p_sm);
 
 /*
 * PARAMETERS
@@ -456,6 +444,9 @@ static inline void osm_physp_set_port_info(IN osm_physp_t * p_physp,
 *	p_pi
 *		[in] Pointer to the IBA defined PortInfo at this port number.
 *
+* 	p_sm
+* 		[in] Pointer to an osm_sm_t object.
+*
 * RETURN VALUES
 *	This function does not return a value.
 *
diff --git a/include/opensm/osm_subnet.h b/include/opensm/osm_subnet.h
index 838ca82..c13d0c8 100644
--- a/include/opensm/osm_subnet.h
+++ b/include/opensm/osm_subnet.h
@@ -54,6 +54,7 @@
 #include <complib/cl_list.h>
 #include <opensm/osm_base.h>
 #include <opensm/osm_prefix_route.h>
+#include <opensm/osm_db.h>
 #include <stdio.h>
 
 #ifdef __cplusplus
@@ -600,6 +601,7 @@ typedef struct osm_subn {
 	boolean_t sweeping_enabled;
 	unsigned need_update;
 	cl_fmap_t mgrp_mgid_tbl;
+	osm_db_domain_t *p_g2m;
 	void *mboxes[IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO + 1];
 	osm_log_level_t per_mod_log_tbl[256];
 } osm_subn_t;
diff --git a/opensm/osm_db_files.c b/opensm/osm_db_files.c
index 7ab6b56..9f338f3 100644
--- a/opensm/osm_db_files.c
+++ b/opensm/osm_db_files.c
@@ -50,6 +50,7 @@
 #define FILE_ID OSM_FILE_DB_FILES_C
 #include <opensm/st.h>
 #include <opensm/osm_db.h>
+#include <opensm/osm_log.h>
 
 /****d* Database/OSM_DB_MAX_LINE_LEN
  * NAME
diff --git a/opensm/osm_db_pack.c b/opensm/osm_db_pack.c
index c1ec4ab..57c3a66 100644
--- a/opensm/osm_db_pack.c
+++ b/opensm/osm_db_pack.c
@@ -85,6 +85,17 @@ static inline int unpack_lids(IN char *p_lid_str, OUT uint16_t * p_min_lid,
 	return 0;
 }
 
+static inline void pack_mkey(uint64_t mkey, char *p_mkey_str)
+{
+	sprintf(p_mkey_str, "0x%016" PRIx64, mkey);
+}
+
+static inline uint64_t unpack_mkey(char *p_mkey_str)
+{
+	return strtoull(p_mkey_str, NULL, 0);
+}
+
+
 int osm_db_guid2lid_guids(IN osm_db_domain_t * p_g2l,
 			  OUT cl_qlist_t * p_guid_list)
 {
@@ -151,3 +162,65 @@ int osm_db_guid2lid_delete(IN osm_db_domain_t * p_g2l, IN uint64_t guid)
 	pack_guid(guid, guid_str);
 	return osm_db_delete(p_g2l, guid_str);
 }
+
+int osm_db_guid2mkey_guids(IN osm_db_domain_t * p_g2m,
+			   OUT cl_qlist_t * p_guid_list)
+{
+	char *p_key;
+	cl_list_t keys;
+	osm_db_guid_elem_t *p_guid_elem;
+
+	cl_list_construct(&keys);
+	cl_list_init(&keys, 10);
+
+	if (osm_db_keys(p_g2m, &keys))
+		return 1;
+
+	while ((p_key = cl_list_remove_head(&keys)) != NULL) {
+		p_guid_elem =
+		    (osm_db_guid_elem_t *) malloc(sizeof(osm_db_guid_elem_t));
+		CL_ASSERT(p_guid_elem != NULL);
+
+		p_guid_elem->guid = unpack_guid(p_key);
+		cl_qlist_insert_head(p_guid_list, &p_guid_elem->item);
+	}
+
+	cl_list_destroy(&keys);
+	return 0;
+}
+
+int osm_db_guid2mkey_get(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 OUT uint64_t * p_mkey)
+{
+	char guid_str[20];
+	char *p_mkey_str;
+
+	pack_guid(guid, guid_str);
+	p_mkey_str = osm_db_lookup(p_g2m, guid_str);
+	if (!p_mkey_str)
+		return 1;
+
+	if (p_mkey)
+		*p_mkey = unpack_mkey(p_mkey_str);
+
+	return 0;
+}
+
+int osm_db_guid2mkey_set(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 IN uint64_t mkey)
+{
+	char guid_str[20];
+	char mkey_str[20];
+
+	pack_guid(guid, guid_str);
+	pack_mkey(mkey, mkey_str);
+
+	return osm_db_update(p_g2m, guid_str, mkey_str);
+}
+
+int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid)
+{
+	char guid_str[20];
+	pack_guid(guid, guid_str);
+	return osm_db_delete(p_g2m, guid_str);
+}
diff --git a/opensm/osm_lid_mgr.c b/opensm/osm_lid_mgr.c
index cb7ff0b..7799ee3 100644
--- a/opensm/osm_lid_mgr.c
+++ b/opensm/osm_lid_mgr.c
@@ -800,6 +800,7 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 	uint8_t op_vls;
 	uint8_t port_num;
 	boolean_t send_set = FALSE;
+	boolean_t update_mkey = FALSE;
 	int ret = 0;
 
 	OSM_LOG_ENTER(p_mgr->p_log);
@@ -862,8 +863,10 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 		send_set = TRUE;
 
 	p_pi->m_key = p_mgr->p_subn->opt.m_key;
-	if (memcmp(&p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key)))
+	if (memcmp(&p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key))) {
+		update_mkey = TRUE;
 		send_set = TRUE;
+	}
 
 	p_pi->subnet_prefix = p_mgr->p_subn->opt.subnet_prefix;
 	if (memcmp(&p_pi->subnet_prefix, &p_old_pi->subnet_prefix,
@@ -1053,6 +1056,13 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 			     CL_DISP_MSGID_NONE, &context);
 	if (status != IB_SUCCESS)
 		ret = -1;
+	/* If we sent a new mkey above, update our guid2mkey map
+	   now, on the assumption that the SubnSet succeeds
+	*/
+	if (update_mkey)
+		osm_db_guid2mkey_set(p_mgr->p_subn->p_g2m,
+				     cl_ntoh64(p_physp->port_guid),
+				     cl_ntoh64(p_pi->m_key));
 
 Exit:
 	OSM_LOG_EXIT(p_mgr->p_log);
diff --git a/opensm/osm_link_mgr.c b/opensm/osm_link_mgr.c
index 8301643..50393c5 100644
--- a/opensm/osm_link_mgr.c
+++ b/opensm/osm_link_mgr.c
@@ -56,6 +56,7 @@
 #include <opensm/osm_helper.h>
 #include <opensm/osm_msgdef.h>
 #include <opensm/osm_opensm.h>
+#include <opensm/osm_db_pack.h>
 
 static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 {
@@ -104,6 +105,7 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 	int qdr_change = 0, fdr10_change = 0;
 	int ret = 0;
 	ib_net32_t attr_mod, cap_mask;
+	boolean_t update_mkey = FALSE;
 
 	OSM_LOG_ENTER(sm->p_log);
 
@@ -194,8 +196,10 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 		    port_num == 0) {
 			p_pi->m_key = sm->p_subn->opt.m_key;
 			if (memcmp(&p_pi->m_key, &p_old_pi->m_key,
-				   sizeof(p_pi->m_key)))
+				   sizeof(p_pi->m_key))) {
+				update_mkey = TRUE;
 				send_set = TRUE;
+			}
 
 			p_pi->subnet_prefix = sm->p_subn->opt.subnet_prefix;
 			if (memcmp(&p_pi->subnet_prefix,
@@ -466,6 +470,14 @@ Send:
 	if (status)
 		ret = -1;
 
+	/* If we sent a new mkey above, update our guid2mkey map
+	   now, on the assumption that the SubnSet succeeds
+	 */
+	if (update_mkey)
+		osm_db_guid2mkey_set(sm->p_subn->p_g2m,
+				     cl_ntoh64(p_physp->port_guid),
+				     cl_ntoh64(p_pi->m_key));
+
 	if (send_set2) {
 		status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p_physp),
 				     payload2, sizeof(payload2),
diff --git a/opensm/osm_opensm.c b/opensm/osm_opensm.c
index 429108a..42cbb36 100644
--- a/opensm/osm_opensm.c
+++ b/opensm/osm_opensm.c
@@ -413,6 +413,11 @@ ib_api_status_t osm_opensm_init(IN osm_opensm_t * p_osm,
 	if (status != IB_SUCCESS)
 		goto Exit;
 
+	/* the DB is in use by subn so init before */
+	status = osm_db_init(&p_osm->db, &p_osm->log);
+	if (status != IB_SUCCESS)
+		goto Exit;
+
 	status = osm_subn_init(&p_osm->subn, p_osm, p_opt);
 	if (status != IB_SUCCESS)
 		goto Exit;
@@ -435,11 +440,6 @@ ib_api_status_t osm_opensm_init(IN osm_opensm_t * p_osm,
 	if (status != IB_SUCCESS)
 		goto Exit;
 
-	/* the DB is in use by the SM and SA so init before */
-	status = osm_db_init(&p_osm->db, &p_osm->log);
-	if (status != IB_SUCCESS)
-		goto Exit;
-
 	status = osm_sm_init(&p_osm->sm, &p_osm->subn, &p_osm->db,
 			     p_osm->p_vendor, &p_osm->mad_pool, &p_osm->vl15,
 			     &p_osm->log, &p_osm->stats, &p_osm->disp,
diff --git a/opensm/osm_port.c b/opensm/osm_port.c
index 88b9fd8..0730c14 100644
--- a/opensm/osm_port.c
+++ b/opensm/osm_port.c
@@ -54,6 +54,8 @@
 #include <opensm/osm_node.h>
 #include <opensm/osm_madw.h>
 #include <opensm/osm_switch.h>
+#include <opensm/osm_db_pack.h>
+#include <opensm/osm_sm.h>
 
 void osm_physp_construct(IN osm_physp_t * p_physp)
 {
@@ -659,3 +661,33 @@ void osm_alias_guid_delete(IN OUT osm_alias_guid_t ** pp_alias_guid)
 	free(*pp_alias_guid);
 	*pp_alias_guid = NULL;
 }
+
+void osm_physp_set_port_info(IN osm_physp_t * p_physp,
+					   IN const ib_port_info_t * p_pi,
+					   IN const struct osm_sm * p_sm)
+{
+	CL_ASSERT(p_pi);
+	CL_ASSERT(osm_physp_is_valid(p_physp));
+
+	if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) {
+		/* If PortState is down, only copy PortState */
+		/* and PortPhysicalState per C14-24-2.1 */
+		ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN);
+		ib_port_info_set_port_phys_state
+		    (ib_port_info_get_port_phys_state(p_pi),
+		     &p_physp->port_info);
+	} else {
+		p_physp->port_info = *p_pi;
+
+		/* The MKey in p_pi can only be considered valid if it's
+		 * for an HCA/router or switch port 0, and it's either
+		 * non-zero or the MKeyProtect bits are also zero.
+		 */
+		if ((osm_node_get_type(p_physp->p_node) != IB_NODE_TYPE_SWITCH ||
+		     p_physp->port_num == 0) &&
+		    (p_pi->m_key != 0 || ib_port_info_get_mpb(p_pi) == 0))
+			osm_db_guid2mkey_set(p_sm->p_subn->p_g2m,
+					     cl_ntoh64(p_physp->port_guid),
+					     cl_ntoh64(p_pi->m_key));
+	}
+}
diff --git a/opensm/osm_port_info_rcv.c b/opensm/osm_port_info_rcv.c
index ab7418b..00cbfc7 100644
--- a/opensm/osm_port_info_rcv.c
+++ b/opensm/osm_port_info_rcv.c
@@ -312,7 +312,7 @@ static void pi_rcv_process_switch_port(IN osm_sm_t * sm, IN osm_node_t * p_node,
 	/*
 	   Update the PortInfo attribute.
 	 */
-	osm_physp_set_port_info(p_physp, p_pi);
+	osm_physp_set_port_info(p_physp, p_pi, sm);
 
 	if (port_num == 0) {
 		/* Determine if base switch port 0 */
@@ -337,7 +337,7 @@ static void pi_rcv_process_ca_or_router_port(IN osm_sm_t * sm,
 
 	pi_rcv_check_and_fix_lid(sm->p_log, p_pi, p_physp);
 
-	osm_physp_set_port_info(p_physp, p_pi);
+	osm_physp_set_port_info(p_physp, p_pi, sm);
 
 	pi_rcv_process_endport(sm, p_physp, p_pi);
 
@@ -475,7 +475,7 @@ static void pi_rcv_process_set(IN osm_sm_t * sm, IN osm_node_t * p_node,
 		cl_ntoh64(osm_node_get_node_guid(p_node)),
 		cl_ntoh64(p_smp->trans_id));
 
-	osm_physp_set_port_info(p_physp, p_pi);
+	osm_physp_set_port_info(p_physp, p_pi, sm);
 
 	OSM_LOG_EXIT(sm->p_log);
 }
diff --git a/opensm/osm_req.c b/opensm/osm_req.c
index 2532f9c..51220f3 100644
--- a/opensm/osm_req.c
+++ b/opensm/osm_req.c
@@ -58,6 +58,7 @@
 #include <opensm/osm_vl15intf.h>
 #include <opensm/osm_msgdef.h>
 #include <opensm/osm_opensm.h>
+#include <opensm/osm_db_pack.h>
 
 /**********************************************************************
   The plock MAY or MAY NOT be held before calling this function.
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index 143b744..74114af 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -66,6 +66,7 @@
 #include <vendor/osm_vendor_api.h>
 #include <opensm/osm_inform.h>
 #include <opensm/osm_opensm.h>
+#include <opensm/osm_db.h>
 
 extern void osm_drop_mgr_process(IN osm_sm_t * sm);
 extern int osm_qos_setup(IN osm_opensm_t * p_osm);
@@ -1440,6 +1441,9 @@ repeat_discovery:
 	if (sm->p_subn->force_heavy_sweep
 	    || sm->p_subn->subnet_initialization_error)
 		osm_sm_signal(sm, OSM_SIGNAL_SWEEP);
+
+	/* Write a new copy of our persistent guid2mkey database */
+	osm_db_store(sm->p_subn->p_g2m);
 }
 
 static void do_process_mgrp_queue(osm_sm_t * sm)
diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 7fb5c8f..47a5606 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -75,6 +75,8 @@
 #include <opensm/osm_event_plugin.h>
 #include <opensm/osm_qos_policy.h>
 #include <opensm/osm_service.h>
+#include <opensm/osm_db.h>
+#include <opensm/osm_db_pack.h>
 
 static const char null_str[] = "(null)";
 
@@ -538,6 +540,52 @@ static int compar_mgids(const void *m1, const void *m2)
 	return memcmp(m1, m2, sizeof(ib_gid_t));
 }
 
+static void subn_validate_g2m(osm_subn_t *p_subn)
+{
+	cl_qlist_t guids;
+	osm_db_guid_elem_t *p_item;
+	uint64_t mkey;
+	boolean_t valid_entry;
+
+	OSM_LOG_ENTER(&(p_subn->p_osm->log));
+	cl_qlist_init(&guids);
+
+	if (osm_db_guid2mkey_guids(p_subn->p_g2m, &guids)) {
+		OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR, "ERR 7506: "
+			"could not get mkey guid list\n");
+		goto Exit;
+	}
+
+	while ((p_item = (osm_db_guid_elem_t *) cl_qlist_remove_head(&guids))
+	       != (osm_db_guid_elem_t *) cl_qlist_end(&guids)) {
+		valid_entry = TRUE;
+
+		if (p_item->guid == 0) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7507: found invalid zero guid\n");
+			valid_entry = FALSE;
+		} else if (osm_db_guid2mkey_get(p_subn->p_g2m, p_item->guid,
+						&mkey)) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7508: could not get mkey for guid:0x%016"
+				PRIx64 "\n", p_item->guid);
+			valid_entry = FALSE;
+		}
+
+		if (valid_entry == FALSE) {
+			if (osm_db_guid2mkey_delete(p_subn->p_g2m,
+						    p_item->guid))
+				OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+					"ERR 7509: failed to delete entry for "
+					"guid:0x%016" PRIx64 "\n",
+					p_item->guid);
+		}
+	}
+
+Exit:
+	OSM_LOG_EXIT(&(p_subn->p_osm->log));
+}
+
 void osm_subn_construct(IN osm_subn_t * p_subn)
 {
 	memset(p_subn, 0, sizeof(*p_subn));
@@ -744,6 +792,35 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN osm_opensm_t * p_osm,
 	p_subn->sweeping_enabled = TRUE;
 	p_subn->last_sm_port_state = 1;
 
+	/* Initialize the guid2mkey database */
+	p_subn->p_g2m = osm_db_domain_init(&(p_osm->db), "guid2mkey");
+	if (!p_subn->p_g2m) {
+		OSM_LOG(&(p_osm->log), OSM_LOG_ERROR, "ERR 7510: "
+			"Error initializing Guid-to-MKey persistent database\n");
+		return IB_ERROR;
+	}
+
+	if (osm_db_restore(p_subn->p_g2m)) {
+#ifndef __WIN__
+		/*
+		 * When Windows is BSODing, it might corrupt files that
+		 * were previously opened for writing, even if the files
+		 * are closed, so we might see corrupted guid2mkey file.
+		 */
+		if (p_subn->opt.exit_on_fatal) {
+			osm_log(&(p_osm->log), OSM_LOG_SYS,
+				"FATAL: Error restoring Guid-to-Mkey "
+				"persistent database\n");
+			return IB_ERROR;
+		} else
+#endif
+			OSM_LOG(&(p_osm->log), OSM_LOG_ERROR,
+				"ERR 7511: Error restoring Guid-to-Mkey "
+				"persistent database\n");
+	}
+
+	subn_validate_g2m(p_subn);
+
 	return IB_SUCCESS;
 }
 
-- 
1.7.9.2

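The guid2mkey domain stores each mkey as a string, and osm_physp_set_port_info() only records a value read back from PortInfo when it looks trustworthy. A minimal standalone sketch of both pieces follows: pack_mkey() and unpack_mkey() mirror the helpers added to osm_db_pack.c, while mkey_is_cacheable() is a hypothetical restatement of the condition in osm_physp_set_port_info(), using plain types instead of OpenSM's osm_physp_t and ib_port_info_t.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Same encoding as pack_mkey() in osm_db_pack.c: "0x" plus 16 hex
 * digits, i.e. 18 characters and a NUL, so a 20-byte buffer is enough. */
static void pack_mkey(uint64_t mkey, char *p_mkey_str)
{
	int n = sprintf(p_mkey_str, "0x%016" PRIx64, mkey);

	assert(n == 18);
	(void)n;
}

/* Same decoding as unpack_mkey(): base 0 accepts the "0x" prefix. */
static uint64_t unpack_mkey(const char *p_mkey_str)
{
	return strtoull(p_mkey_str, NULL, 0);
}

/* Hypothetical restatement of the test in osm_physp_set_port_info():
 * only end ports (CA/router ports or switch port 0) carry an MKey, and
 * a zero MKey reported with non-zero protect bits is likely masked in
 * the reply rather than genuinely zero, so it is not cached. */
static bool mkey_is_cacheable(bool is_switch, uint8_t port_num,
			      uint64_t mkey, uint8_t mkey_protect_bits)
{
	return (!is_switch || port_num == 0) &&
	       (mkey != 0 || mkey_protect_bits == 0);
}
```

Because only values that pass the check are written to the cache, a port read with the wrong key at protection level 1 should not overwrite a good cached mkey with zero.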
* [PATCH V1.1 3/8] Add locking where necessary around osm_req_*
       [not found]         ` <1341361548-30229-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
@ 2012-07-04  0:25           ` Jim Foraker
  2012-07-23 15:55           ` [PATCH V1.1 1/8] opensm: Add guid2mkey cache file support Alex Netes
  1 sibling, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-07-04  0:25 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: alexne-VPRAkNaXOzVWk0Htik3J/w, weiny2-i2BcT+NCU+M, Jim Foraker

Acquire plock for reading around the osm_req_get()/osm_req_set()
calls that were not already made with the lock held.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 opensm/osm_perfmgr.c      |    6 ++++++
 opensm/osm_sm_state_mgr.c |    2 ++
 opensm/osm_state_mgr.c    |    8 ++++++++
 opensm/osm_trap_rcv.c     |    6 +++++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/opensm/osm_perfmgr.c b/opensm/osm_perfmgr.c
index a7ff017..f8e56d2 100644
--- a/opensm/osm_perfmgr.c
+++ b/opensm/osm_perfmgr.c
@@ -610,8 +610,10 @@ static int sweep_hop_1(osm_sm_t * sm)
 		path_array[1] = port_num;
 
 		osm_dr_path_init(&hop_1_path, 1, path_array);
+		CL_PLOCK_ACQUIRE(sm->p_lock);
 		status = osm_req_get(sm, &hop_1_path, IB_MAD_ATTR_NODE_INFO, 0,
 				     CL_DISP_MSGID_NONE, &context);
+		CL_PLOCK_RELEASE(sm->p_lock);
 
 		if (status != IB_SUCCESS)
 			OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 4C82: "
@@ -642,9 +644,11 @@ static int sweep_hop_1(osm_sm_t * sm)
 			path_array[1] = port_num;
 
 			osm_dr_path_init(&hop_1_path, 1, path_array);
+			CL_PLOCK_ACQUIRE(sm->p_lock);
 			status = osm_req_get(sm, &hop_1_path,
 					     IB_MAD_ATTR_NODE_INFO, 0,
 					     CL_DISP_MSGID_NONE, &context);
+			CL_PLOCK_RELEASE(sm->p_lock);
 
 			if (status != IB_SUCCESS)
 				OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 4C84: "
@@ -704,8 +708,10 @@ static int sweep_hop_0(osm_sm_t * sm)
 	}
 
 	osm_dr_path_init(&dr_path, 0, path_array);
+	CL_PLOCK_ACQUIRE(sm->p_lock);
 	status = osm_req_get(sm, &dr_path, IB_MAD_ATTR_NODE_INFO, 0,
 			     CL_DISP_MSGID_NONE, NULL);
+	CL_PLOCK_RELEASE(sm->p_lock);
 
 	if (status != IB_SUCCESS)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR,
diff --git a/opensm/osm_sm_state_mgr.c b/opensm/osm_sm_state_mgr.c
index e826f1f..061a0f2 100644
--- a/opensm/osm_sm_state_mgr.c
+++ b/opensm/osm_sm_state_mgr.c
@@ -109,9 +109,11 @@ static void sm_state_mgr_send_master_sm_info_req(osm_sm_t * sm)
 	context.smi_context.port_guid = p_port->guid;
 	context.smi_context.set_method = FALSE;
 
+	CL_PLOCK_ACQUIRE(sm->p_lock);
 	status = osm_req_get(sm, osm_physp_get_dr_path_ptr(p_port->p_physp),
 			     IB_MAD_ATTR_SM_INFO, 0, CL_DISP_MSGID_NONE,
 			     &context);
+	CL_PLOCK_RELEASE(sm->p_lock);
 
 	if (status != IB_SUCCESS)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3204: "
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index 74114af..e276a64 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -239,8 +239,10 @@ static ib_api_status_t state_mgr_sweep_hop_0(IN osm_sm_t * sm)
 		CL_PLOCK_RELEASE(sm->p_lock);
 
 		osm_dr_path_init(&dr_path, 0, path_array);
+		CL_PLOCK_ACQUIRE(sm->p_lock);
 		status = osm_req_get(sm, &dr_path, IB_MAD_ATTR_NODE_INFO, 0,
 				     CL_DISP_MSGID_NONE, NULL);
+		CL_PLOCK_RELEASE(sm->p_lock);
 		if (status != IB_SUCCESS)
 			OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3305: "
 				"Request for NodeInfo failed (%s)\n",
@@ -432,8 +434,10 @@ static ib_api_status_t state_mgr_sweep_hop_1(IN osm_sm_t * sm)
 		path_array[1] = port_num;
 
 		osm_dr_path_init(&hop_1_path, 1, path_array);
+		CL_PLOCK_ACQUIRE(sm->p_lock);
 		status = osm_req_get(sm, &hop_1_path, IB_MAD_ATTR_NODE_INFO, 0,
 				     CL_DISP_MSGID_NONE, &context);
+		CL_PLOCK_RELEASE(sm->p_lock);
 		if (status != IB_SUCCESS)
 			OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3311: "
 				"Request for NodeInfo failed (%s)\n",
@@ -461,10 +465,12 @@ static ib_api_status_t state_mgr_sweep_hop_1(IN osm_sm_t * sm)
 
 				path_array[1] = port_num;
 				osm_dr_path_init(&hop_1_path, 1, path_array);
+				CL_PLOCK_ACQUIRE(sm->p_lock);
 				status = osm_req_get(sm, &hop_1_path,
 						     IB_MAD_ATTR_NODE_INFO, 0,
 						     CL_DISP_MSGID_NONE,
 						     &context);
+				CL_PLOCK_RELEASE(sm->p_lock);
 				if (status != IB_SUCCESS)
 					OSM_LOG(sm->p_log, OSM_LOG_ERROR,
 						"ERR 3312: "
@@ -811,10 +817,12 @@ static void state_mgr_send_handover(IN osm_sm_t * sm, IN osm_remote_sm_t * p_sm)
 		p_smi->sm_key = 0;
 	}
 
+	CL_PLOCK_ACQUIRE(sm->p_lock);
 	status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p_port->p_physp),
 			     payload, sizeof(payload), IB_MAD_ATTR_SM_INFO,
 			     IB_SMINFO_ATTR_MOD_HANDOVER, CL_DISP_MSGID_NONE,
 			     &context);
+	CL_PLOCK_RELEASE(sm->p_lock);
 
 	if (status != IB_SUCCESS)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3317: "
diff --git a/opensm/osm_trap_rcv.c b/opensm/osm_trap_rcv.c
index e90ad5e..90cb673 100644
--- a/opensm/osm_trap_rcv.c
+++ b/opensm/osm_trap_rcv.c
@@ -213,6 +213,7 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p)
 	uint8_t payload[IB_SMP_DATA_SIZE];
 	osm_madw_context_t context;
 	ib_port_info_t *pi = (ib_port_info_t *)payload;
+	ib_api_status_t status;
 
 	/* select the nearest port to master opensm */
 	if (p->p_remote_physp &&
@@ -235,10 +236,13 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p)
 	context.pi_context.light_sweep = FALSE;
 	context.pi_context.active_transition = FALSE;
 
-	return osm_req_set(sm, osm_physp_get_dr_path_ptr(p),
+	CL_PLOCK_ACQUIRE(sm->p_lock);
+	status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p),
 			   payload, sizeof(payload), IB_MAD_ATTR_PORT_INFO,
 			   cl_hton32(osm_physp_get_port_num(p)),
 			   CL_DISP_MSGID_NONE, &context);
+	CL_PLOCK_RELEASE(sm->p_lock);
+	return status;
 }
 
 static void log_trap_info(osm_log_t *p_log, ib_mad_notice_attr_t *p_ntci,
-- 
1.7.9.2

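The pattern these hunks add (take plock shared around each osm_req_get()/osm_req_set() call, and in disable_port() capture the status before releasing) can be illustrated with a POSIX read-write lock. This is an analogy, not OpenSM code: complib's cl_plock is replaced by pthread_rwlock_t, and send_fn, fake_req and fake_send are hypothetical stand-ins for the request functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for osm_req_get()/osm_req_set(): returns a
 * status code, 0 meaning success. */
typedef int (*send_fn)(void *ctx);

/* Hold the lock shared for the duration of the request, then release
 * it before returning the captured status, mirroring the change to
 * disable_port(). */
static int send_with_read_lock(pthread_rwlock_t *lock, send_fn send,
			       void *ctx)
{
	int status;
	int rc = pthread_rwlock_rdlock(lock);

	assert(rc == 0);
	(void)rc;
	status = send(ctx);
	pthread_rwlock_unlock(lock);
	return status;
}

/* Example request for illustration: counts calls and returns a
 * preset status. */
struct fake_req {
	int calls;
	int status;
};

static int fake_send(void *ctx)
{
	struct fake_req *r = ctx;

	r->calls++;
	return r->status;
}
```

Storing the result in a local, as disable_port() now does, is what lets the lock be dropped on every path before the function returns.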
* Re: [PATCH 5/8] opensm: Signal subnet init errors on SubnGet timeouts
       [not found]         ` <1340672104-18039-5-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
@ 2012-07-23 15:43           ` Alex Netes
  2012-07-23 22:19             ` Jim Foraker
  0 siblings, 1 reply; 33+ messages in thread
From: Alex Netes @ 2012-07-23 15:43 UTC (permalink / raw)
  To: Jim Foraker; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, weiny2-i2BcT+NCU+M

Hi Jim,

On 17:55 Mon 25 Jun     , Jim Foraker wrote:
> A subnet should not be listed as cleanly initialized if CAs
> fail to respond to SubnGet requests.
> 
> Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
> ---
>  opensm/osm_sm_mad_ctrl.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/opensm/osm_sm_mad_ctrl.c b/opensm/osm_sm_mad_ctrl.c
> index f0bcff2..464b6b0 100644
> --- a/opensm/osm_sm_mad_ctrl.c
> +++ b/opensm/osm_sm_mad_ctrl.c
> @@ -741,6 +741,15 @@ static void sm_mad_ctrl_send_err_cb(IN void *context, IN osm_madw_t * p_madw)
>  			cl_ntoh16(p_smp->attr_id),
>  			ib_get_sm_attr_str(p_smp->attr_id));
>  		p_ctrl->p_subn->subnet_initialization_error = TRUE;
> +	} else if (p_madw->status == IB_TIMEOUT &&
> +		   p_smp->method == IB_MAD_METHOD_GET) {

It's pretty common to see timeouts in fabrics without m_key support (e.g.
switch reboots) and it's not desirable to start another heavy sweep because
of that. So I guess it would be better if we could initiate a heavy sweep only
when an m_key is set and the protection level is 2 or 3.
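
Alex's suggestion amounts to gating the new branch on the M_Key configuration. Below is a minimal sketch of that condition. It assumes the configured key and protection level are available where the send-error callback runs. The enums and the function name are hypothetical placeholders, not OpenSM's IB_TIMEOUT / IB_MAD_METHOD_GET definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholders for the MAD status and method values. */
enum sketch_status { SKETCH_OK, SKETCH_TIMEOUT };
enum sketch_method { SKETCH_GET, SKETCH_SET };

/* Return true when a timed-out SMP should mark the subnet as not cleanly
 * initialized. This applies only to a SubnGet, and only when a non-zero
 * M_Key is in use at protection level 2 or 3, where (per the patch
 * comment) an mkey error can make SubnGet time out. */
static bool timeout_flags_init_error(enum sketch_status status,
				     enum sketch_method method,
				     uint64_t m_key, uint8_t protect_bits)
{
	assert(protect_bits <= 3);  /* the protection field is 2 bits wide */
	if (status != SKETCH_TIMEOUT || method != SKETCH_GET)
		return false;
	return m_key != 0 && protect_bits >= 2;
}
```

With this gate, a timeout on a fabric with no M_Key in use (for example, a rebooting switch) would only be logged by the caller and would not trigger another heavy sweep.
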

> +		/* Timeouts on SubnGet may be an indication of an mkey
> +		   error at protection levels 2/3 */
> +		OSM_LOG(p_ctrl->p_log, OSM_LOG_ERROR, "ERR 3120 "
> +			"Timeout while getting attribute 0x%X (%s)\n",
> +			cl_ntoh16(p_smp->attr_id),
> +			ib_get_sm_attr_str(p_smp->attr_id));
> +		p_ctrl->p_subn->subnet_initialization_error = TRUE;
>  	}
>  
>  	osm_dump_dr_smp(p_ctrl->p_log, p_smp, OSM_LOG_VERBOSE);
> -- 
> 1.7.9.2
> 

* Re: [PATCH V1.1 1/8] opensm: Add guid2mkey cache file support
       [not found]         ` <1341361548-30229-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
  2012-07-04  0:25           ` [PATCH V1.1 3/8] Add locking where necessary around osm_req_* Jim Foraker
@ 2012-07-23 15:55           ` Alex Netes
  2012-07-23 22:37             ` Jim Foraker
  1 sibling, 1 reply; 33+ messages in thread
From: Alex Netes @ 2012-07-23 15:55 UTC (permalink / raw)
  To: Jim Foraker; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, weiny2-i2BcT+NCU+M

Hi Jim,

On 17:25 Tue 03 Jul     , Jim Foraker wrote:
> Adds support for a guid2mkey file, and uses the database
> to select which mkey to use in outgoing SMPs.

One thing that I noticed is missing (I'm not sure how commonly it's used, but
I do see it as part of the SUSE distro) is the sldd.sh script found under
scripts. It distributes guid2lid between STANDBY SMs, so I think the guid2mkey
file should also be redistributed between the STANDBY SMs.

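If guid2mkey is copied between SMs the way sldd.sh copies guid2lid, it helps to know what the values look like. The osm_db_pack.c hunk below stores each value as the string produced by pack_mkey(): a "0x" prefix followed by 16 zero-padded hex digits. It reads the string back with strtoull(..., 0). That is 18 characters plus the terminating NUL, so the 20-byte mkey_str buffer in osm_db_guid2mkey_set() is large enough. The minimal round-trip sketch below reproduces that pair using snprintf() and an explicit length. The *_sketch names and signatures are editorial choices, not the patch's.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MKEY_STR_LEN 20  /* same size as the patch's mkey_str[] buffer */

/* Format as "0x" + 16 zero-padded hex digits, like pack_mkey(). */
static void pack_mkey_sketch(uint64_t mkey, char *buf, size_t len)
{
	assert(len >= 19);  /* 18 characters plus the terminating NUL */
	snprintf(buf, len, "0x%016" PRIx64, mkey);
	assert(strlen(buf) == 18);
}

/* Base 0 lets strtoull() accept the "0x" prefix, like unpack_mkey(). */
static uint64_t unpack_mkey_sketch(const char *str)
{
	return strtoull(str, NULL, 0);
}
```
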
> 
> Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
> ---
>  include/opensm/osm_db.h      |    7 +-
>  include/opensm/osm_db_pack.h |  144 ++++++++++++++++++++++++++++++++++++++++++
>  include/opensm/osm_port.h    |   23 ++-----
>  include/opensm/osm_subnet.h  |    2 +
>  opensm/osm_db_files.c        |    1 +
>  opensm/osm_db_pack.c         |   73 +++++++++++++++++++++
>  opensm/osm_lid_mgr.c         |   12 +++-
>  opensm/osm_link_mgr.c        |   14 +++-
>  opensm/osm_opensm.c          |   10 +--
>  opensm/osm_port.c            |   32 ++++++++++
>  opensm/osm_port_info_rcv.c   |    6 +-
>  opensm/osm_req.c             |    1 +
>  opensm/osm_state_mgr.c       |    4 ++
>  opensm/osm_subnet.c          |   77 ++++++++++++++++++++++
>  14 files changed, 377 insertions(+), 29 deletions(-)
> 
> diff --git a/include/opensm/osm_db.h b/include/opensm/osm_db.h
> index 7077347..d05bfa0 100644
> --- a/include/opensm/osm_db.h
> +++ b/include/opensm/osm_db.h
> @@ -43,7 +43,8 @@
>  
>  #include <complib/cl_list.h>
>  #include <complib/cl_spinlock.h>
> -#include <opensm/osm_log.h>
> +
> +struct osm_log;
>  
>  #ifdef __cplusplus
>  #  define BEGIN_C_DECLS extern "C" {
> @@ -118,7 +119,7 @@ typedef struct osm_db_domain {
>  */
>  typedef struct osm_db {
>  	void *p_db_imp;
> -	osm_log_t *p_log;
> +	struct osm_log *p_log;
>  	cl_list_t domains;
>  } osm_db_t;
>  /*
> @@ -185,7 +186,7 @@ void osm_db_destroy(IN osm_db_t * p_db);
>  *
>  * SYNOPSIS
>  */
> -int osm_db_init(IN osm_db_t * p_db, IN osm_log_t * p_log);
> +int osm_db_init(IN osm_db_t * p_db, IN struct osm_log * p_log);
>  /*
>  * PARAMETERS
>  *
> diff --git a/include/opensm/osm_db_pack.h b/include/opensm/osm_db_pack.h
> index 3d24926..25644df 100644
> --- a/include/opensm/osm_db_pack.h
> +++ b/include/opensm/osm_db_pack.h
> @@ -235,5 +235,149 @@ int osm_db_guid2lid_delete(IN osm_db_domain_t * p_g2l, IN uint64_t guid);
>  * osm_db_guid2lid_get, osm_db_guid2lid_set
>  *********/
>  
> +/****f* OpenSM: DB-Pack/osm_db_guid2mkey_init
> +* NAME
> +*	osm_db_guid2mkey_init
> +*
> +* DESCRIPTION
> +*	Initialize a domain for the guid2mkey table
> +*
> +* SYNOPSIS
> +*/
> +static inline osm_db_domain_t *osm_db_guid2mkey_init(IN osm_db_t * p_db)
> +{
> +	return (osm_db_domain_init(p_db, "guid2mkey"));
> +}
> +
> +/*
> +* PARAMETERS
> +*	p_db
> +*		[in] Pointer to the database object to construct
> +*
> +* RETURN VALUES
> +*	The pointer to the new allocated domain object or NULL.
> +*
> +* NOTE: DB domains are destroyed by the osm_db_destroy
> +*
> +* SEE ALSO
> +*	Database, osm_db_init, osm_db_destroy
> +*********/
> +
> +/****f* OpenSM: DB-Pack/osm_db_guid2mkey_guids
> +* NAME
> +*	osm_db_guid2mkey_guids
> +*
> +* DESCRIPTION
> +*	Provides back a list of guid elements.
> +*
> +* SYNOPSIS
> +*/
> +int osm_db_guid2mkey_guids(IN osm_db_domain_t * p_g2m,
> +			  OUT cl_qlist_t * p_guid_list);
> +/*
> +* PARAMETERS
> +*	p_g2m
> +*		[in] Pointer to the guid2mkey domain
> +*
> +*  p_guid_list
> +*     [out] A quick list of guid elements of type osm_db_guid_elem_t
> +*
> +* RETURN VALUES
> +*	0 if successful
> +*
> +* NOTE: the output qlist should be initialized and each item freed
> +*       by the caller, then destroyed.
> +*
> +* SEE ALSO
> +* osm_db_guid2mkey_init, osm_db_guid2mkey_guids, osm_db_guid2mkey_get
> +* osm_db_guid2mkey_set, osm_db_guid2mkey_delete
> +*********/
> +
> +/****f* OpenSM: DB-Pack/osm_db_guid2mkey_get
> +* NAME
> +*	osm_db_guid2mkey_get
> +*
> +* DESCRIPTION
> +*	Get the mkey for the given guid.
> +*
> +* SYNOPSIS
> +*/
> +int osm_db_guid2mkey_get(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
> +			 OUT uint64_t * p_mkey);
> +/*
> +* PARAMETERS
> +*	p_g2m
> +*		[in] Pointer to the guid2mkey domain
> +*
> +*  guid
> +*     [in] The guid to look for
> +*
> +*  p_mkey
> +*     [out] Pointer to the resulting mkey in host order.
> +*
> +* RETURN VALUES
> +*	0 if successful, 1 if the guid is not found (p_mkey is left unchanged).
> +*
> +* SEE ALSO
> +* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
> +* osm_db_guid2mkey_set, osm_db_guid2mkey_delete
> +*********/
> +
> +/****f* OpenSM: DB-Pack/osm_db_guid2mkey_set
> +* NAME
> +*	osm_db_guid2mkey_set
> +*
> +* DESCRIPTION
> +*	Set the mkey for the given guid.
> +*
> +* SYNOPSIS
> +*/
> +int osm_db_guid2mkey_set(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
> +			 IN uint64_t mkey);
> +/*
> +* PARAMETERS
> +*	p_g2m
> +*		[in] Pointer to the guid2mkey domain
> +*
> +*  guid
> +*     [in] The guid to look for
> +*
> +*  mkey
> +*     [in] The mkey value to set, in host order
> +*
> +* RETURN VALUES
> +*	0 if successful
> +*
> +* SEE ALSO
> +* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
> +* osm_db_guid2mkey_get, osm_db_guid2mkey_delete
> +*********/
> +
> +/****f* OpenSM: DB-Pack/osm_db_guid2mkey_delete
> +* NAME
> +*	osm_db_guid2mkey_delete
> +*
> +* DESCRIPTION
> +*	Delete the entry by the given guid
> +*
> +* SYNOPSIS
> +*/
> +int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid);
> +/*
> +* PARAMETERS
> +*	p_g2m
> +*		[in] Pointer to the guid2mkey domain
> +*
> +*  guid
> +*     [in] The guid to look for
> +*
> +* RETURN VALUES
> +*	0 if successful otherwise 1
> +*
> +* SEE ALSO
> +* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
> +* osm_db_guid2mkey_get, osm_db_guid2mkey_set
> +*********/
> +
>  END_C_DECLS
>  #endif				/* _OSM_DB_PACK_H_ */
> diff --git a/include/opensm/osm_port.h b/include/opensm/osm_port.h
> index 56e9c37..6b73cc7 100644
> --- a/include/opensm/osm_port.h
> +++ b/include/opensm/osm_port.h
> @@ -66,6 +66,7 @@ BEGIN_C_DECLS
>  struct osm_port;
>  struct osm_node;
>  struct osm_mgrp;
> +struct osm_sm;
>  
>  /****h* OpenSM/Physical Port
>  * NAME
> @@ -431,22 +432,9 @@ static inline void osm_physp_set_health(IN osm_physp_t * p_physp,
>  *
>  * SYNOPSIS
>  */
> -static inline void osm_physp_set_port_info(IN osm_physp_t * p_physp,
> -					   IN const ib_port_info_t * p_pi)
> -{
> -	CL_ASSERT(p_pi);
> -	CL_ASSERT(osm_physp_is_valid(p_physp));
> -
> -	if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) {
> -		/* If PortState is down, only copy PortState */
> -		/* and PortPhysicalState per C14-24-2.1 */
> -		ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN);
> -		ib_port_info_set_port_phys_state
> -		    (ib_port_info_get_port_phys_state(p_pi),
> -		     &p_physp->port_info);
> -	} else
> -		p_physp->port_info = *p_pi;
> -}
> +void osm_physp_set_port_info(IN osm_physp_t * p_physp,
> +					   IN const ib_port_info_t * p_pi,
> +					   IN const struct osm_sm * p_sm);
>  
>  /*
>  * PARAMETERS
> @@ -456,6 +444,9 @@ static inline void osm_physp_set_port_info(IN osm_physp_t * p_physp,
>  *	p_pi
>  *		[in] Pointer to the IBA defined PortInfo at this port number.
>  *
> +*	p_sm
> +*		[in] Pointer to an osm_sm_t object.
> +*
>  * RETURN VALUES
>  *	This function does not return a value.
>  *
> diff --git a/include/opensm/osm_subnet.h b/include/opensm/osm_subnet.h
> index 838ca82..c13d0c8 100644
> --- a/include/opensm/osm_subnet.h
> +++ b/include/opensm/osm_subnet.h
> @@ -54,6 +54,7 @@
>  #include <complib/cl_list.h>
>  #include <opensm/osm_base.h>
>  #include <opensm/osm_prefix_route.h>
> +#include <opensm/osm_db.h>
>  #include <stdio.h>
>  
>  #ifdef __cplusplus
> @@ -600,6 +601,7 @@ typedef struct osm_subn {
>  	boolean_t sweeping_enabled;
>  	unsigned need_update;
>  	cl_fmap_t mgrp_mgid_tbl;
> +	osm_db_domain_t *p_g2m;
>  	void *mboxes[IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO + 1];
>  	osm_log_level_t per_mod_log_tbl[256];
>  } osm_subn_t;
> diff --git a/opensm/osm_db_files.c b/opensm/osm_db_files.c
> index 7ab6b56..9f338f3 100644
> --- a/opensm/osm_db_files.c
> +++ b/opensm/osm_db_files.c
> @@ -50,6 +50,7 @@
>  #define FILE_ID OSM_FILE_DB_FILES_C
>  #include <opensm/st.h>
>  #include <opensm/osm_db.h>
> +#include <opensm/osm_log.h>
>  
>  /****d* Database/OSM_DB_MAX_LINE_LEN
>   * NAME
> diff --git a/opensm/osm_db_pack.c b/opensm/osm_db_pack.c
> index c1ec4ab..57c3a66 100644
> --- a/opensm/osm_db_pack.c
> +++ b/opensm/osm_db_pack.c
> @@ -85,6 +85,17 @@ static inline int unpack_lids(IN char *p_lid_str, OUT uint16_t * p_min_lid,
>  	return 0;
>  }
>  
> +static inline void pack_mkey(uint64_t mkey, char *p_mkey_str)
> +{
> +	sprintf(p_mkey_str, "0x%016" PRIx64, mkey);
> +}
> +
> +static inline uint64_t unpack_mkey(char *p_mkey_str)
> +{
> +	return strtoull(p_mkey_str, NULL, 0);
> +}
> +
> +
>  int osm_db_guid2lid_guids(IN osm_db_domain_t * p_g2l,
>  			  OUT cl_qlist_t * p_guid_list)
>  {
> @@ -151,3 +162,65 @@ int osm_db_guid2lid_delete(IN osm_db_domain_t * p_g2l, IN uint64_t guid)
>  	pack_guid(guid, guid_str);
>  	return osm_db_delete(p_g2l, guid_str);
>  }
> +
> +int osm_db_guid2mkey_guids(IN osm_db_domain_t * p_g2m,
> +			   OUT cl_qlist_t * p_guid_list)
> +{
> +	char *p_key;
> +	cl_list_t keys;
> +	osm_db_guid_elem_t *p_guid_elem;
> +
> +	cl_list_construct(&keys);
> +	cl_list_init(&keys, 10);
> +
> +	if (osm_db_keys(p_g2m, &keys))
> +		return 1;
> +
> +	while ((p_key = cl_list_remove_head(&keys)) != NULL) {
> +		p_guid_elem =
> +		    (osm_db_guid_elem_t *) malloc(sizeof(osm_db_guid_elem_t));
> +		CL_ASSERT(p_guid_elem != NULL);
> +
> +		p_guid_elem->guid = unpack_guid(p_key);
> +		cl_qlist_insert_head(p_guid_list, &p_guid_elem->item);
> +	}
> +
> +	cl_list_destroy(&keys);
> +	return 0;
> +}
> +
> +int osm_db_guid2mkey_get(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
> +			 OUT uint64_t * p_mkey)
> +{
> +	char guid_str[20];
> +	char *p_mkey_str;
> +
> +	pack_guid(guid, guid_str);
> +	p_mkey_str = osm_db_lookup(p_g2m, guid_str);
> +	if (!p_mkey_str)
> +		return 1;
> +
> +	if (p_mkey)
> +		*p_mkey = unpack_mkey(p_mkey_str);
> +
> +	return 0;
> +}
> +
> +int osm_db_guid2mkey_set(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
> +			 IN uint64_t mkey)
> +{
> +	char guid_str[20];
> +	char mkey_str[20];
> +
> +	pack_guid(guid, guid_str);
> +	pack_mkey(mkey, mkey_str);
> +
> +	return osm_db_update(p_g2m, guid_str, mkey_str);
> +}
> +
> +int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid)
> +{
> +	char guid_str[20];
> +	pack_guid(guid, guid_str);
> +	return osm_db_delete(p_g2m, guid_str);
> +}
> diff --git a/opensm/osm_lid_mgr.c b/opensm/osm_lid_mgr.c
> index cb7ff0b..7799ee3 100644
> --- a/opensm/osm_lid_mgr.c
> +++ b/opensm/osm_lid_mgr.c
> @@ -800,6 +800,7 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
>  	uint8_t op_vls;
>  	uint8_t port_num;
>  	boolean_t send_set = FALSE;
> +	boolean_t update_mkey = FALSE;
>  	int ret = 0;
>  
>  	OSM_LOG_ENTER(p_mgr->p_log);
> @@ -862,8 +863,10 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
>  		send_set = TRUE;
>  
>  	p_pi->m_key = p_mgr->p_subn->opt.m_key;
> -	if (memcmp(&p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key)))
> +	if (memcmp(&p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key))) {
> +		update_mkey = TRUE;
>  		send_set = TRUE;
> +	}
>  
>  	p_pi->subnet_prefix = p_mgr->p_subn->opt.subnet_prefix;
>  	if (memcmp(&p_pi->subnet_prefix, &p_old_pi->subnet_prefix,
> @@ -1053,6 +1056,13 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
>  			     CL_DISP_MSGID_NONE, &context);
>  	if (status != IB_SUCCESS)
>  		ret = -1;
> +	/* If we sent a new mkey above, update our guid2mkey map
> +	   now, on the assumption that the SubnSet succeeds
> +	*/
> +	if (update_mkey)
> +		osm_db_guid2mkey_set(p_mgr->p_subn->p_g2m,
> +				     cl_ntoh64(p_physp->port_guid),
> +				     cl_ntoh64(p_pi->m_key));
>  
>  Exit:
>  	OSM_LOG_EXIT(p_mgr->p_log);
> diff --git a/opensm/osm_link_mgr.c b/opensm/osm_link_mgr.c
> index 8301643..50393c5 100644
> --- a/opensm/osm_link_mgr.c
> +++ b/opensm/osm_link_mgr.c
> @@ -56,6 +56,7 @@
>  #include <opensm/osm_helper.h>
>  #include <opensm/osm_msgdef.h>
>  #include <opensm/osm_opensm.h>
> +#include <opensm/osm_db_pack.h>
>  
>  static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
>  {
> @@ -104,6 +105,7 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
>  	int qdr_change = 0, fdr10_change = 0;
>  	int ret = 0;
>  	ib_net32_t attr_mod, cap_mask;
> +	boolean_t update_mkey = FALSE;
>  
>  	OSM_LOG_ENTER(sm->p_log);
>  
> @@ -194,8 +196,10 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
>  		    port_num == 0) {
>  			p_pi->m_key = sm->p_subn->opt.m_key;
>  			if (memcmp(&p_pi->m_key, &p_old_pi->m_key,
> -				   sizeof(p_pi->m_key)))
> +				   sizeof(p_pi->m_key))) {
> +				update_mkey = TRUE;
>  				send_set = TRUE;
> +			}
>  
>  			p_pi->subnet_prefix = sm->p_subn->opt.subnet_prefix;
>  			if (memcmp(&p_pi->subnet_prefix,
> @@ -466,6 +470,14 @@ Send:
>  	if (status)
>  		ret = -1;
>  
> +	/* If we sent a new mkey above, update our guid2mkey map
> +	   now, on the assumption that the SubnSet succeeds
> +	 */
> +	if (update_mkey)
> +		osm_db_guid2mkey_set(sm->p_subn->p_g2m,
> +				     cl_ntoh64(p_physp->port_guid),
> +				     cl_ntoh64(p_pi->m_key));
> +
>  	if (send_set2) {
>  		status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p_physp),
>  				     payload2, sizeof(payload2),
> diff --git a/opensm/osm_opensm.c b/opensm/osm_opensm.c
> index 429108a..42cbb36 100644
> --- a/opensm/osm_opensm.c
> +++ b/opensm/osm_opensm.c
> @@ -413,6 +413,11 @@ ib_api_status_t osm_opensm_init(IN osm_opensm_t * p_osm,
>  	if (status != IB_SUCCESS)
>  		goto Exit;
>  
> +	/* the DB is used by subn, so init it first */
> +	status = osm_db_init(&p_osm->db, &p_osm->log);
> +	if (status != IB_SUCCESS)
> +		goto Exit;
> +
>  	status = osm_subn_init(&p_osm->subn, p_osm, p_opt);
>  	if (status != IB_SUCCESS)
>  		goto Exit;
> @@ -435,11 +440,6 @@ ib_api_status_t osm_opensm_init(IN osm_opensm_t * p_osm,
>  	if (status != IB_SUCCESS)
>  		goto Exit;
>  
> -	/* the DB is in use by the SM and SA so init before */
> -	status = osm_db_init(&p_osm->db, &p_osm->log);
> -	if (status != IB_SUCCESS)
> -		goto Exit;
> -
>  	status = osm_sm_init(&p_osm->sm, &p_osm->subn, &p_osm->db,
>  			     p_osm->p_vendor, &p_osm->mad_pool, &p_osm->vl15,
>  			     &p_osm->log, &p_osm->stats, &p_osm->disp,
> diff --git a/opensm/osm_port.c b/opensm/osm_port.c
> index 88b9fd8..0730c14 100644
> --- a/opensm/osm_port.c
> +++ b/opensm/osm_port.c
> @@ -54,6 +54,8 @@
>  #include <opensm/osm_node.h>
>  #include <opensm/osm_madw.h>
>  #include <opensm/osm_switch.h>
> +#include <opensm/osm_db_pack.h>
> +#include <opensm/osm_sm.h>
>  
>  void osm_physp_construct(IN osm_physp_t * p_physp)
>  {
> @@ -659,3 +661,33 @@ void osm_alias_guid_delete(IN OUT osm_alias_guid_t ** pp_alias_guid)
>  	free(*pp_alias_guid);
>  	*pp_alias_guid = NULL;
>  }
> +
> +void osm_physp_set_port_info(IN osm_physp_t * p_physp,
> +					   IN const ib_port_info_t * p_pi,
> +					   IN const struct osm_sm * p_sm)
> +{
> +	CL_ASSERT(p_pi);
> +	CL_ASSERT(osm_physp_is_valid(p_physp));
> +
> +	if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) {
> +		/* If PortState is down, only copy PortState */
> +		/* and PortPhysicalState per C14-24-2.1 */
> +		ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN);
> +		ib_port_info_set_port_phys_state
> +		    (ib_port_info_get_port_phys_state(p_pi),
> +		     &p_physp->port_info);
> +	} else {
> +		p_physp->port_info = *p_pi;
> +
> +		/* The MKey in p_pi can only be considered valid if it's
> +		 * for a HCA/router or switch port 0, and it's either
> +		 * non-zero or the MKeyProtect bits are also zero.
> +		 */
> +		if ((osm_node_get_type(p_physp->p_node) != IB_NODE_TYPE_SWITCH ||
> +		     p_physp->port_num == 0) &&
> +		    (p_pi->m_key != 0 || ib_port_info_get_mpb(p_pi) == 0))
> +			osm_db_guid2mkey_set(p_sm->p_subn->p_g2m,
> +					     cl_ntoh64(p_physp->port_guid),
> +					     cl_ntoh64(p_pi->m_key));
> +	}
> +}
> diff --git a/opensm/osm_port_info_rcv.c b/opensm/osm_port_info_rcv.c
> index ab7418b..00cbfc7 100644
> --- a/opensm/osm_port_info_rcv.c
> +++ b/opensm/osm_port_info_rcv.c
> @@ -312,7 +312,7 @@ static void pi_rcv_process_switch_port(IN osm_sm_t * sm, IN osm_node_t * p_node,
>  	/*
>  	   Update the PortInfo attribute.
>  	 */
> -	osm_physp_set_port_info(p_physp, p_pi);
> +	osm_physp_set_port_info(p_physp, p_pi, sm);
>  
>  	if (port_num == 0) {
>  		/* Determine if base switch port 0 */
> @@ -337,7 +337,7 @@ static void pi_rcv_process_ca_or_router_port(IN osm_sm_t * sm,
>  
>  	pi_rcv_check_and_fix_lid(sm->p_log, p_pi, p_physp);
>  
> -	osm_physp_set_port_info(p_physp, p_pi);
> +	osm_physp_set_port_info(p_physp, p_pi, sm);
>  
>  	pi_rcv_process_endport(sm, p_physp, p_pi);
>  
> @@ -475,7 +475,7 @@ static void pi_rcv_process_set(IN osm_sm_t * sm, IN osm_node_t * p_node,
>  		cl_ntoh64(osm_node_get_node_guid(p_node)),
>  		cl_ntoh64(p_smp->trans_id));
>  
> -	osm_physp_set_port_info(p_physp, p_pi);
> +	osm_physp_set_port_info(p_physp, p_pi, sm);
>  
>  	OSM_LOG_EXIT(sm->p_log);
>  }
> diff --git a/opensm/osm_req.c b/opensm/osm_req.c
> index 2532f9c..51220f3 100644
> --- a/opensm/osm_req.c
> +++ b/opensm/osm_req.c
> @@ -58,6 +58,7 @@
>  #include <opensm/osm_vl15intf.h>
>  #include <opensm/osm_msgdef.h>
>  #include <opensm/osm_opensm.h>
> +#include <opensm/osm_db_pack.h>
>  
>  /**********************************************************************
>    The plock MAY or MAY NOT be held before calling this function.
> diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
> index 143b744..74114af 100644
> --- a/opensm/osm_state_mgr.c
> +++ b/opensm/osm_state_mgr.c
> @@ -66,6 +66,7 @@
>  #include <vendor/osm_vendor_api.h>
>  #include <opensm/osm_inform.h>
>  #include <opensm/osm_opensm.h>
> +#include <opensm/osm_db.h>
>  
>  extern void osm_drop_mgr_process(IN osm_sm_t * sm);
>  extern int osm_qos_setup(IN osm_opensm_t * p_osm);
> @@ -1440,6 +1441,9 @@ repeat_discovery:
>  	if (sm->p_subn->force_heavy_sweep
>  	    || sm->p_subn->subnet_initialization_error)
>  		osm_sm_signal(sm, OSM_SIGNAL_SWEEP);
> +
> +	/* Write a new copy of our persistent guid2mkey database */
> +	osm_db_store(sm->p_subn->p_g2m);
>  }
>  
>  static void do_process_mgrp_queue(osm_sm_t * sm)
> diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
> index 7fb5c8f..47a5606 100644
> --- a/opensm/osm_subnet.c
> +++ b/opensm/osm_subnet.c
> @@ -75,6 +75,8 @@
>  #include <opensm/osm_event_plugin.h>
>  #include <opensm/osm_qos_policy.h>
>  #include <opensm/osm_service.h>
> +#include <opensm/osm_db.h>
> +#include <opensm/osm_db_pack.h>
>  
>  static const char null_str[] = "(null)";
>  
> @@ -538,6 +540,52 @@ static int compar_mgids(const void *m1, const void *m2)
>  	return memcmp(m1, m2, sizeof(ib_gid_t));
>  }
>  
> +static void subn_validate_g2m(osm_subn_t *p_subn)
> +{
> +	cl_qlist_t guids;
> +	osm_db_guid_elem_t *p_item;
> +	uint64_t mkey;
> +	boolean_t valid_entry;
> +
> +	OSM_LOG_ENTER(&(p_subn->p_osm->log));
> +	cl_qlist_init(&guids);
> +
> +	if (osm_db_guid2mkey_guids(p_subn->p_g2m, &guids)) {
> +		OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR, "ERR 7506: "
> +			"could not get mkey guid list\n");
> +		goto Exit;
> +	}
> +
> +	while ((p_item = (osm_db_guid_elem_t *) cl_qlist_remove_head(&guids))
> +	       != (osm_db_guid_elem_t *) cl_qlist_end(&guids)) {
> +		valid_entry = TRUE;
> +
> +		if (p_item->guid == 0) {
> +			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
> +				"ERR 7507: found invalid zero guid\n");
> +			valid_entry = FALSE;
> +		} else if (osm_db_guid2mkey_get(p_subn->p_g2m, p_item->guid,
> +						&mkey)) {
> +			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
> +				"ERR 7508: could not get mkey for guid:0x%016"
> +				PRIx64 "\n", p_item->guid);
> +			valid_entry = FALSE;
> +		}
> +
> +		if (valid_entry == FALSE) {
> +			if (osm_db_guid2mkey_delete(p_subn->p_g2m,
> +						    p_item->guid))
> +				OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
> +					"ERR 7509: failed to delete entry for "
> +					"guid:0x%016" PRIx64 "\n",
> +					p_item->guid);
> +		}
> +	}
> +
> +Exit:
> +	OSM_LOG_EXIT(&(p_subn->p_osm->log));
> +}
> +
>  void osm_subn_construct(IN osm_subn_t * p_subn)
>  {
>  	memset(p_subn, 0, sizeof(*p_subn));
> @@ -744,6 +792,35 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN osm_opensm_t * p_osm,
>  	p_subn->sweeping_enabled = TRUE;
>  	p_subn->last_sm_port_state = 1;
>  
> +	/* Initialize the guid2mkey database */
> +	p_subn->p_g2m = osm_db_domain_init(&(p_osm->db), "guid2mkey");
> +	if (!p_subn->p_g2m) {
> +		OSM_LOG(&(p_osm->log), OSM_LOG_ERROR, "ERR 7510: "
> +			"Error initializing Guid-to-MKey persistent database\n");
> +		return IB_ERROR;
> +	}
> +
> +	if (osm_db_restore(p_subn->p_g2m)) {
> +#ifndef __WIN__
> +		/*
> +		 * When Windows is BSODing, it might corrupt files that
> +		 * were previously opened for writing, even if the files
> +		 * are closed, so we might see corrupted guid2mkey file.
> +		 */
> +		if (p_subn->opt.exit_on_fatal) {
> +			osm_log(&(p_osm->log), OSM_LOG_SYS,
> +				"FATAL: Error restoring Guid-to-Mkey "
> +				"persistent database\n");
> +			return IB_ERROR;
> +		} else
> +#endif
> +			OSM_LOG(&(p_osm->log), OSM_LOG_ERROR,
> +				"ERR 7511: Error restoring Guid-to-Mkey "
> +				"persistent database\n");
> +	}
> +
> +	subn_validate_g2m(p_subn);
> +
>  	return IB_SUCCESS;
>  }
>  
> -- 
> 1.7.9.2
> 
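subn_validate_g2m() above drops guid2mkey entries whose guid is zero or whose mkey cannot be read back. One point worth checking in review: each osm_db_guid_elem_t allocated by osm_db_guid2mkey_guids() is removed from the list but never freed. Below is a minimal model of the same loop that frees each element after use. The fixed table, struct entry, struct guid_elem and the helper names are hypothetical stand-ins for the guid2mkey domain and cl_qlist_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the guid2mkey domain and its guid list. */
struct entry { uint64_t guid; uint64_t mkey; int present; };
struct guid_elem { uint64_t guid; struct guid_elem *next; };

/* 0 on success, 1 if not found, like osm_db_guid2mkey_get(). */
static int table_get(const struct entry *t, size_t n, uint64_t guid,
		     uint64_t *p_mkey)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].present && t[i].guid == guid) {
			*p_mkey = t[i].mkey;
			return 0;
		}
	return 1;
}

/* 0 on success, 1 if not found, like osm_db_guid2mkey_delete(). */
static int table_delete(struct entry *t, size_t n, uint64_t guid)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].present && t[i].guid == guid) {
			t[i].present = 0;
			return 0;
		}
	return 1;
}

/* Walk the guid list, delete invalid entries, and free every element. */
static size_t validate_g2m_sketch(struct entry *t, size_t n,
				  struct guid_elem *list)
{
	size_t deleted = 0;
	uint64_t mkey;

	while (list != NULL) {
		struct guid_elem *next = list->next;

		if (list->guid == 0 || table_get(t, n, list->guid, &mkey)) {
			if (table_delete(t, n, list->guid) == 0)
				deleted++;
		}
		free(list);  /* the step the quoted loop omits */
		list = next;
	}
	return deleted;
}
```
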

* Re: [PATCH 0/8] opensm: Improved mkey support
       [not found]     ` <1341361508.5218.148.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
  2012-07-04  0:25       ` [PATCH V1.1 1/8] opensm: Add guid2mkey cache file support Jim Foraker
@ 2012-07-23 15:59       ` Alex Netes
  2012-07-23 22:28         ` Jim Foraker
  1 sibling, 1 reply; 33+ messages in thread
From: Alex Netes @ 2012-07-23 15:59 UTC (permalink / raw)
  To: Jim Foraker; +Cc: linux-rdma, Weiny, Ira K.

Hi Jim,

Can you please add a short description of the improved M_Key mechanism to the
man page?

On 17:25 Tue 03 Jul     , Jim Foraker wrote:
>      I'm sending new versions of patches 1 and 3 to the list, which
> correct merge/build issues introduced by other recently accepted
> patches.  They will be marked as "V1.1".
> 
>      Jim
> 
> On Mon, 2012-06-25 at 17:54 -0700, Jim Foraker wrote:
> > I'm about to post a set of patches intended to improve mkey support
> > in OpenSM.  These patches have been fairly rigorously tested on a small
> > fabric, and I believe are sufficiently stable for inclusion.  The
> > primary intent here is threefold:
> > 
> > 1) Fix a multitude of edge case issues with the existing
> > single-mkey-per-subnet support in OpenSM.  For instance, the current
> > implementation provides no way to change an established non-zero mkey
> > without rebooting or manually re-keying each CA on the entire subnet.
> > 
> > 2) Enable mkey protection across the fabric.  This involves not only
> > setting a non-zero protection level, but also providing the SM with a
> > sufficient information cache to initialize the subnet on restart without
> > having to wait for mkey lease timeouts (provided one is set).
> > 
> > 3) Provide a basis on which to build multiple-mkey systems for OpenSM
> > (be they per-host, KDF, or random) in the future.
> > 
> >      The patches add two new cache files: a port guid-to-mkey cache, and
> > a neighboring link (port guid to port guid) cache.
> >      The guid2mkey cache is used to provide a hint at the initial mkey
> > for a CA during initialization.  It is a hint only; the SM is capable of
> > dealing with cases where the guid2mkey cache is incorrect, although it
> > may require waiting for (potentially multiple) mkey lease timeouts at
> > non-zero mkey protection levels.  The guid2mkey cache is presented first
> > in the patch set, as it ends up ameliorating several corner cases in a
> > cleaner way than attacking them directly did.
> >      The neighbors cache file provides an initial hint to the SM of what
> > port guid we may expect at the opposite end of a link that is being
> > initialized.  This is necessary at mkey protection level 2, where we
> > cannot do the SubnGet necessary to determine the port guid to use in
> > looking up an mkey hint.
> >      The changes to the osm_req functions to support mkeys in patch 2
> > now require plock to be held when called.  This was generally already
> > the case, but there were a few spots where it was not.  In most of these
> > cases, the plock is still not technically necessary, as they occur
> > during hops 0/1 when DR path traversal is trivial.  I wrapped all of
> > these occurrences in locks in a separate patch (#3), in order to make
> > the changes more obvious and invite comment.
> > 
> >      Jim
> 
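At protection level 2, the two caches described in the cover letter chain together. The neighbors cache supplies the port guid expected at the far end of a link, and guid2mkey turns that guid into an mkey hint. Below is a minimal model of that lookup. The flat pair tables and the function names are assumptions for illustration. So is falling back to the subnet-wide configured mkey on a miss; this does not describe the patch set's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat tables standing in for the two cache files. */
struct pair { uint64_t key; uint64_t value; };

static int pair_lookup(const struct pair *t, size_t n, uint64_t key,
		       uint64_t *out)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].key == key) {
			*out = t[i].value;
			return 0;
		}
	return 1;
}

/* Guess the mkey of the port across a link from a known local port.
 * On any cache miss, fall back to the configured subnet mkey (assumed). */
static uint64_t mkey_hint_for_link(const struct pair *neighbors, size_t n_nb,
				   const struct pair *g2m, size_t n_g2m,
				   uint64_t local_guid, uint64_t subnet_mkey)
{
	uint64_t remote_guid, mkey;

	assert(neighbors != NULL || n_nb == 0);
	if (pair_lookup(neighbors, n_nb, local_guid, &remote_guid) == 0 &&
	    pair_lookup(g2m, n_g2m, remote_guid, &mkey) == 0)
		return mkey;
	return subnet_mkey;
}
```

Because the result is only a hint, the SM still has to cope with a wrong key, as the cover letter notes, possibly by waiting out a lease timeout.
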

* Re: [PATCH 5/8] opensm: Signal subnet init errors on SubnGet timeouts
  2012-07-23 15:43           ` Alex Netes
@ 2012-07-23 22:19             ` Jim Foraker
       [not found]               ` <1343081989.29792.12.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Jim Foraker @ 2012-07-23 22:19 UTC (permalink / raw)
  To: Alex Netes; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Weiny, Ira K.


On Mon, 2012-07-23 at 08:43 -0700, Alex Netes wrote:
> Hi Jim,
> 
> On 17:55 Mon 25 Jun     , Jim Foraker wrote:
> > A subnet should not be listed as cleanly initialized if CAs
> > fail to respond to SubnGet requests.
> > 
> > Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
> > ---
> >  opensm/osm_sm_mad_ctrl.c |    9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/opensm/osm_sm_mad_ctrl.c b/opensm/osm_sm_mad_ctrl.c
> > index f0bcff2..464b6b0 100644
> > --- a/opensm/osm_sm_mad_ctrl.c
> > +++ b/opensm/osm_sm_mad_ctrl.c
> > @@ -741,6 +741,15 @@ static void sm_mad_ctrl_send_err_cb(IN void *context, IN osm_madw_t * p_madw)
> >  			cl_ntoh16(p_smp->attr_id),
> >  			ib_get_sm_attr_str(p_smp->attr_id));
> >  		p_ctrl->p_subn->subnet_initialization_error = TRUE;
> > +	} else if (p_madw->status == IB_TIMEOUT &&
> > +		   p_smp->method == IB_MAD_METHOD_GET) {
> 
> It's pretty common to see timeouts in fabrics without m_key support (e.g.
> switch reboots) and it's not desirable to start another heavy sweep because
> of that. So I guess it would be better if we could initiate a heavy sweep only
> when an m_key is set and the protection level is 2 or 3.
     This was done primarily to ensure that "SUBNET UP" doesn't get
displayed/logged while there are HCAs left unconfigured because of
incorrectly set mkeys.  I'm reasonably sure (I will re-test to verify)
that future light sweeps will catch HCAs whose mkeys time out, provided
the lease timeout is set.  So we could also just log the error and not
worry about setting subnet_initialization_error.

     Jim

> > +		/* Timeouts on SubnGet may be an indication of an mkey
> > +		   error at protection levels 2/3 */
> > +		OSM_LOG(p_ctrl->p_log, OSM_LOG_ERROR, "ERR 3120 "
> > +			"Timeout while getting attribute 0x%X (%s)\n",
> > +			cl_ntoh16(p_smp->attr_id),
> > +			ib_get_sm_attr_str(p_smp->attr_id));
> > +		p_ctrl->p_subn->subnet_initialization_error = TRUE;
> >  	}
> >  
> >  	osm_dump_dr_smp(p_ctrl->p_log, p_smp, OSM_LOG_VERBOSE);
> > -- 
> > 1.7.9.2
> > 

* Re: [PATCH 0/8] opensm: Improved mkey support
  2012-07-23 15:59       ` [PATCH 0/8] opensm: Improved mkey support Alex Netes
@ 2012-07-23 22:28         ` Jim Foraker
  0 siblings, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-07-23 22:28 UTC (permalink / raw)
  To: Alex Netes; +Cc: linux-rdma, Weiny, Ira K.


On Mon, 2012-07-23 at 08:59 -0700, Alex Netes wrote:
> Hi Jim,
> 
> Can you please add a short description of the improved M_Key mechanism to the
> man page?
     Sure.  I don't see anything about mkeys in the page now, so I'll
write up a section for them.
     v2, BTW, will also fix some whitespace errors that I noticed had
crept into the first set.

     Jim

> On 17:25 Tue 03 Jul     , Jim Foraker wrote:
> >      I'm sending new versions of patches 1 and 3 to the list, which
> > correct merge/build issues introduced by other recently accepted
> > patches.  They will be marked as "V1.1".
> > 
> >      Jim
> > 
> > On Mon, 2012-06-25 at 17:54 -0700, Jim Foraker wrote:
> > > I'm about to post a set of patches intended to improve mkey support
> > > in OpenSM.  These patches have been fairly rigorously tested on a small
> > > fabric, and I believe are sufficiently stable for inclusion.  The
> > > primary intent here is threefold:
> > > 
> > > 1) Fix a multitude of edge case issues with the existing
> > > single-mkey-per-subnet support in OpenSM.  For instance, the current
> > > implementation provides no way to change an established non-zero mkey
> > > without rebooting or manually re-keying each CA on the entire subnet.
> > > 
> > > 2) Enable mkey protection across the fabric.  This involves not only
> > > setting a non-zero protection level, but also providing the SM with a
> > > sufficient information cache to initialize the subnet on restart without
> > > having to wait for mkey lease timeouts (provided one is set).
> > > 
> > > 3) Provide a basis on which to build multiple-mkey systems for OpenSM
> > > (be they per-host, KDF, or random) in the future.
> > > 
> > >      The patches add two new cache files: a port guid-to-mkey cache, and
> > > a neighboring link (port guid to port guid) cache.
> > >      The guid2mkey cache is used to provide a hint at the initial mkey
> > > for a CA during initialization.  It is a hint only; the SM is capable of
> > > dealing with cases where the guid2mkey cache is incorrect, although it
> > > may require waiting for (potentially multiple) mkey lease timeouts at
> > > non-zero mkey protection levels.  The guid2mkey cache is presented first
> > > in the patch set, as it ends up ameliorating several corner cases in a
> > > cleaner way than attacking them directly did.
> > >      The neighbors cache file provides an initial hint to the SM of what
> > > port guid we may expect at the opposite end of a link that is being
> > > initialized.  This is necessary at mkey protection level 2, where we
> > > cannot do the SubnGet necessary to determine the port guid to use in
> > > looking up an mkey hint.
> > >      The changes to the osm_req functions to support mkeys in patch 2
> > > now require plock to be held when called.  This was generally already
> > > the case, but there were a few spots where it was not.  In most of these
> > > cases, the plock is still not technically necessary, as they occur
> > > during hops 0/1 when DR path traversal is trivial.  I wrapped all of
> > > these occurrences in locks in a separate patch (#3), in order to make
> > > the changes more obvious and invite comment.
> > > 
> > >      Jim
> > 
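
The hint chain described in the cover letter (use the neighbors cache to find the remote port guid when it cannot be read, then use guid2mkey to find that port's mkey) can be sketched as below. This is a minimal illustration, not the patch's code: the fixed arrays, struct guid_pair, table_get() and choose_mkey_hint() are hypothetical stand-ins for the osm_db domains, and falling back to the subnet-wide configured mkey on a cache miss is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for a guid-keyed cache
 * (the real caches are osm_db domains backed by files). */
struct guid_pair {
	uint64_t key;
	uint64_t val;
};

/* Return 0 and set *out if key is present, 1 otherwise; this mirrors
 * the 0/1 convention of osm_db_guid2mkey_get(). */
static int table_get(const struct guid_pair *tbl, size_t n,
		     uint64_t key, uint64_t *out)
{
	assert(out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].key == key) {
			*out = tbl[i].val;
			return 0;
		}
	}
	return 1;
}

/* Choose the mkey for an SMP to the port at the far end of the link
 * leaving local_guid. remote_guid is 0 when the SM could not read it
 * (e.g. the SubnGet is refused at mkey protection level 2). */
static uint64_t choose_mkey_hint(const struct guid_pair *neighbors,
				 size_t n_nb,
				 const struct guid_pair *g2m, size_t n_g2m,
				 uint64_t local_guid, uint64_t remote_guid,
				 uint64_t subnet_mkey)
{
	uint64_t mkey;

	/* Unknown remote guid: ask the neighbors cache for it */
	if (remote_guid == 0 &&
	    table_get(neighbors, n_nb, local_guid, &remote_guid))
		return subnet_mkey;	/* assumed fallback */

	/* Known remote guid: look up its cached mkey */
	if (table_get(g2m, n_g2m, remote_guid, &mkey))
		return subnet_mkey;	/* assumed fallback */

	return mkey;
}
```

Because the result is only a hint, a stale entry costs time rather than correctness: as the cover letter notes, the SM may have to wait out one or more mkey lease timeouts before it regains access to the port.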

* Re: [PATCH V1.1 1/8] opensm: Add guid2mkey cache file support
  2012-07-23 15:55           ` [PATCH V1.1 1/8] opensm: Add guid2mkey cache file support Alex Netes
@ 2012-07-23 22:37             ` Jim Foraker
  0 siblings, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-07-23 22:37 UTC
  To: Alex Netes; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Weiny, Ira K.


On Mon, 2012-07-23 at 08:55 -0700, Alex Netes wrote:
> Hi Jim,
> 
> On 17:25 Tue 03 Jul     , Jim Foraker wrote:
> > Adds support for a guid2mkey file, and uses the database
> > to select which mkey to use in outgoing SMPs.
> 
> One thing I notice is missing (I'm not sure how commonly it's used,
> but I do see it as part of the SUSE distro) is the sldd.sh script found under
> scripts. It distributes guid2lid between STANDBY SMs, so I think the guid2mkey file
> should also be redistributed between the STANDBY SMs.
     That probably makes sense.  The neighbors cache file would fall
into the same category as well.
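
If the guid2mkey file is copied between SMs like guid2lid, its text encoding has to round-trip cleanly. The patch quoted below writes each mkey with pack_mkey() using "0x%016" PRIx64 and reads it back with strtoull(..., 0). A standalone sketch of that round trip follows; the *_sketch names are hypothetical, and it uses snprintf() with a truncation check instead of the patch's sprintf().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>	/* strcmp() in the round-trip checks */

/* Format an mkey the way pack_mkey() does: "0x" + 16 hex digits */
static void pack_mkey_sketch(uint64_t mkey, char *buf, size_t len)
{
	int n = snprintf(buf, len, "0x%016" PRIx64, mkey);

	/* snprintf() returns the untruncated length; make sure it fit */
	assert(n > 0 && (size_t)n < len);
	(void)n;
}

/* Parse it back the way unpack_mkey() does; base 0 accepts "0x" */
static uint64_t unpack_mkey_sketch(const char *buf)
{
	return (uint64_t)strtoull(buf, NULL, 0);
}
```

The formatted string is always 18 characters plus the terminating NUL, so the 20-byte mkey_str buffer in osm_db_guid2mkey_set() has room to spare.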

     Jim

> 
> >
> > Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
> > ---
> >  include/opensm/osm_db.h      |    7 +-
> >  include/opensm/osm_db_pack.h |  144 ++++++++++++++++++++++++++++++++++++++++++
> >  include/opensm/osm_port.h    |   23 ++-----
> >  include/opensm/osm_subnet.h  |    2 +
> >  opensm/osm_db_files.c        |    1 +
> >  opensm/osm_db_pack.c         |   73 +++++++++++++++++++++
> >  opensm/osm_lid_mgr.c         |   12 +++-
> >  opensm/osm_link_mgr.c        |   14 +++-
> >  opensm/osm_opensm.c          |   10 +--
> >  opensm/osm_port.c            |   32 ++++++++++
> >  opensm/osm_port_info_rcv.c   |    6 +-
> >  opensm/osm_req.c             |    1 +
> >  opensm/osm_state_mgr.c       |    4 ++
> >  opensm/osm_subnet.c          |   77 ++++++++++++++++++++++
> >  14 files changed, 377 insertions(+), 29 deletions(-)
> >
> > diff --git a/include/opensm/osm_db.h b/include/opensm/osm_db.h
> > index 7077347..d05bfa0 100644
> > --- a/include/opensm/osm_db.h
> > +++ b/include/opensm/osm_db.h
> > @@ -43,7 +43,8 @@
> >
> >  #include <complib/cl_list.h>
> >  #include <complib/cl_spinlock.h>
> > -#include <opensm/osm_log.h>
> > +
> > +struct osm_log;
> >
> >  #ifdef __cplusplus
> >  #  define BEGIN_C_DECLS extern "C" {
> > @@ -118,7 +119,7 @@ typedef struct osm_db_domain {
> >  */
> >  typedef struct osm_db {
> >       void *p_db_imp;
> > -     osm_log_t *p_log;
> > +     struct osm_log *p_log;
> >       cl_list_t domains;
> >  } osm_db_t;
> >  /*
> > @@ -185,7 +186,7 @@ void osm_db_destroy(IN osm_db_t * p_db);
> >  *
> >  * SYNOPSIS
> >  */
> > -int osm_db_init(IN osm_db_t * p_db, IN osm_log_t * p_log);
> > +int osm_db_init(IN osm_db_t * p_db, IN struct osm_log * p_log);
> >  /*
> >  * PARAMETERS
> >  *
> > diff --git a/include/opensm/osm_db_pack.h b/include/opensm/osm_db_pack.h
> > index 3d24926..25644df 100644
> > --- a/include/opensm/osm_db_pack.h
> > +++ b/include/opensm/osm_db_pack.h
> > @@ -235,5 +235,149 @@ int osm_db_guid2lid_delete(IN osm_db_domain_t * p_g2l, IN uint64_t guid);
> >  * osm_db_guid2lid_get, osm_db_guid2lid_set
> >  *********/
> >
> > +/****f* OpenSM: DB-Pack/osm_db_guid2mkey_init
> > +* NAME
> > +*    osm_db_guid2mkey_init
> > +*
> > +* DESCRIPTION
> > +*    Initialize a domain for the guid2mkey table
> > +*
> > +* SYNOPSIS
> > +*/
> > +static inline osm_db_domain_t *osm_db_guid2mkey_init(IN osm_db_t * p_db)
> > +{
> > +     return (osm_db_domain_init(p_db, "guid2mkey"));
> > +}
> > +
> > +/*
> > +* PARAMETERS
> > +*    p_db
> > +*            [in] Pointer to the database object to construct
> > +*
> > +* RETURN VALUES
> > +*    The pointer to the newly allocated domain object or NULL.
> > +*
> > +* NOTE: DB domains are destroyed by the osm_db_destroy
> > +*
> > +* SEE ALSO
> > +*    Database, osm_db_init, osm_db_destroy
> > +*********/
> > +
> > +/****f* OpenSM: DB-Pack/osm_db_guid2mkey_guids
> > +* NAME
> > +*    osm_db_guid2mkey_guids
> > +*
> > +* DESCRIPTION
> > +*    Provides back a list of guid elements.
> > +*
> > +* SYNOPSIS
> > +*/
> > +int osm_db_guid2mkey_guids(IN osm_db_domain_t * p_g2m,
> > +                       OUT cl_qlist_t * p_guid_list);
> > +/*
> > +* PARAMETERS
> > +*    p_g2m
> > +*            [in] Pointer to the guid2mkey domain
> > +*
> > +*  p_guid_list
> > +*     [out] A quick list of guid elements of type osm_db_guid_elem_t
> > +*
> > +* RETURN VALUES
> > +*    0 if successful
> > +*
> > +* NOTE: the output qlist should be initialized and each item freed
> > +*       by the caller, then destroyed.
> > +*
> > +* SEE ALSO
> > +* osm_db_guid2mkey_init, osm_db_guid2mkey_guids, osm_db_guid2mkey_get
> > +* osm_db_guid2mkey_set, osm_db_guid2mkey_delete
> > +*********/
> > +
> > +/****f* OpenSM: DB-Pack/osm_db_guid2mkey_get
> > +* NAME
> > +*    osm_db_guid2mkey_get
> > +*
> > +* DESCRIPTION
> > +*    Get the mkey for the given guid.
> > +*
> > +* SYNOPSIS
> > +*/
> > +int osm_db_guid2mkey_get(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
> > +                      OUT uint64_t * p_mkey);
> > +/*
> > +* PARAMETERS
> > +*    p_g2m
> > +*            [in] Pointer to the guid2mkey domain
> > +*
> > +*  guid
> > +*     [in] The guid to look for
> > +*
> > +*  p_mkey
> > +*     [out] Pointer to the resulting mkey in host order.
> > +*
> > +* RETURN VALUES
> > +*    0 if successful, 1 if no entry exists for the guid (p_mkey is left unchanged).
> > +*
> > +* SEE ALSO
> > +* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
> > +* osm_db_guid2mkey_set, osm_db_guid2mkey_delete
> > +*********/
> > +
> > +/****f* OpenSM: DB-Pack/osm_db_guid2mkey_set
> > +* NAME
> > +*    osm_db_guid2mkey_set
> > +*
> > +* DESCRIPTION
> > +*    Set the mkey for the given guid.
> > +*
> > +* SYNOPSIS
> > +*/
> > +int osm_db_guid2mkey_set(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
> > +                      IN uint64_t mkey);
> > +/*
> > +* PARAMETERS
> > +*    p_g2m
> > +*            [in] Pointer to the guid2mkey domain
> > +*
> > +*  guid
> > +*     [in] The guid to look for
> > +*
> > +*  mkey
> > +*     [in] The mkey value to set, in host order
> > +*
> > +* RETURN VALUES
> > +*    0 if successful
> > +*
> > +* SEE ALSO
> > +* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
> > +* osm_db_guid2mkey_get, osm_db_guid2mkey_delete
> > +*********/
> > +
> > +/****f* OpenSM: DB-Pack/osm_db_guid2mkey_delete
> > +* NAME
> > +*    osm_db_guid2mkey_delete
> > +*
> > +* DESCRIPTION
> > +*    Delete the entry by the given guid
> > +*
> > +* SYNOPSIS
> > +*/
> > +int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid);
> > +/*
> > +* PARAMETERS
> > +*    p_g2m
> > +*            [in] Pointer to the guid2mkey domain
> > +*
> > +*  guid
> > +*     [in] The guid to look for
> > +*
> > +* RETURN VALUES
> > +*    0 if successful otherwise 1
> > +*
> > +* SEE ALSO
> > +* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
> > +* osm_db_guid2mkey_get, osm_db_guid2mkey_set
> > +*********/
> > +
> >  END_C_DECLS
> >  #endif                               /* _OSM_DB_PACK_H_ */
> > diff --git a/include/opensm/osm_port.h b/include/opensm/osm_port.h
> > index 56e9c37..6b73cc7 100644
> > --- a/include/opensm/osm_port.h
> > +++ b/include/opensm/osm_port.h
> > @@ -66,6 +66,7 @@ BEGIN_C_DECLS
> >  struct osm_port;
> >  struct osm_node;
> >  struct osm_mgrp;
> > +struct osm_sm;
> >
> >  /****h* OpenSM/Physical Port
> >  * NAME
> > @@ -431,22 +432,9 @@ static inline void osm_physp_set_health(IN osm_physp_t * p_physp,
> >  *
> >  * SYNOPSIS
> >  */
> > -static inline void osm_physp_set_port_info(IN osm_physp_t * p_physp,
> > -                                        IN const ib_port_info_t * p_pi)
> > -{
> > -     CL_ASSERT(p_pi);
> > -     CL_ASSERT(osm_physp_is_valid(p_physp));
> > -
> > -     if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) {
> > -             /* If PortState is down, only copy PortState */
> > -             /* and PortPhysicalState per C14-24-2.1 */
> > -             ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN);
> > -             ib_port_info_set_port_phys_state
> > -                 (ib_port_info_get_port_phys_state(p_pi),
> > -                  &p_physp->port_info);
> > -     } else
> > -             p_physp->port_info = *p_pi;
> > -}
> > +void osm_physp_set_port_info(IN osm_physp_t * p_physp,
> > +                                        IN const ib_port_info_t * p_pi,
> > +                                        IN const struct osm_sm * p_sm);
> >
> >  /*
> >  * PARAMETERS
> > @@ -456,6 +444,9 @@ static inline void osm_physp_set_port_info(IN osm_physp_t * p_physp,
> >  *    p_pi
> >  *            [in] Pointer to the IBA defined PortInfo at this port number.
> >  *
> > +*    p_sm
> > +*            [in] Pointer to an osm_sm_t object.
> > +*
> >  * RETURN VALUES
> >  *    This function does not return a value.
> >  *
> > diff --git a/include/opensm/osm_subnet.h b/include/opensm/osm_subnet.h
> > index 838ca82..c13d0c8 100644
> > --- a/include/opensm/osm_subnet.h
> > +++ b/include/opensm/osm_subnet.h
> > @@ -54,6 +54,7 @@
> >  #include <complib/cl_list.h>
> >  #include <opensm/osm_base.h>
> >  #include <opensm/osm_prefix_route.h>
> > +#include <opensm/osm_db.h>
> >  #include <stdio.h>
> >
> >  #ifdef __cplusplus
> > @@ -600,6 +601,7 @@ typedef struct osm_subn {
> >       boolean_t sweeping_enabled;
> >       unsigned need_update;
> >       cl_fmap_t mgrp_mgid_tbl;
> > +     osm_db_domain_t *p_g2m;
> >       void *mboxes[IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO + 1];
> >       osm_log_level_t per_mod_log_tbl[256];
> >  } osm_subn_t;
> > diff --git a/opensm/osm_db_files.c b/opensm/osm_db_files.c
> > index 7ab6b56..9f338f3 100644
> > --- a/opensm/osm_db_files.c
> > +++ b/opensm/osm_db_files.c
> > @@ -50,6 +50,7 @@
> >  #define FILE_ID OSM_FILE_DB_FILES_C
> >  #include <opensm/st.h>
> >  #include <opensm/osm_db.h>
> > +#include <opensm/osm_log.h>
> >
> >  /****d* Database/OSM_DB_MAX_LINE_LEN
> >   * NAME
> > diff --git a/opensm/osm_db_pack.c b/opensm/osm_db_pack.c
> > index c1ec4ab..57c3a66 100644
> > --- a/opensm/osm_db_pack.c
> > +++ b/opensm/osm_db_pack.c
> > @@ -85,6 +85,17 @@ static inline int unpack_lids(IN char *p_lid_str, OUT uint16_t * p_min_lid,
> >       return 0;
> >  }
> >
> > +static inline void pack_mkey(uint64_t mkey, char *p_mkey_str)
> > +{
> > +     sprintf(p_mkey_str, "0x%016" PRIx64, mkey);
> > +}
> > +
> > +static inline uint64_t unpack_mkey(char *p_mkey_str)
> > +{
> > +     return strtoull(p_mkey_str, NULL, 0);
> > +}
> > +
> > +
> >  int osm_db_guid2lid_guids(IN osm_db_domain_t * p_g2l,
> >                         OUT cl_qlist_t * p_guid_list)
> >  {
> > @@ -151,3 +162,65 @@ int osm_db_guid2lid_delete(IN osm_db_domain_t * p_g2l, IN uint64_t guid)
> >       pack_guid(guid, guid_str);
> >       return osm_db_delete(p_g2l, guid_str);
> >  }
> > +
> > +int osm_db_guid2mkey_guids(IN osm_db_domain_t * p_g2m,
> > +                        OUT cl_qlist_t * p_guid_list)
> > +{
> > +     char *p_key;
> > +     cl_list_t keys;
> > +     osm_db_guid_elem_t *p_guid_elem;
> > +
> > +     cl_list_construct(&keys);
> > +     cl_list_init(&keys, 10);
> > +
> > +     if (osm_db_keys(p_g2m, &keys))
> > +             return 1;
> > +
> > +     while ((p_key = cl_list_remove_head(&keys)) != NULL) {
> > +             p_guid_elem =
> > +                 (osm_db_guid_elem_t *) malloc(sizeof(osm_db_guid_elem_t));
> > +             CL_ASSERT(p_guid_elem != NULL);
> > +
> > +             p_guid_elem->guid = unpack_guid(p_key);
> > +             cl_qlist_insert_head(p_guid_list, &p_guid_elem->item);
> > +     }
> > +
> > +     cl_list_destroy(&keys);
> > +     return 0;
> > +}
> > +
> > +int osm_db_guid2mkey_get(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
> > +                      OUT uint64_t * p_mkey)
> > +{
> > +     char guid_str[20];
> > +     char *p_mkey_str;
> > +
> > +     pack_guid(guid, guid_str);
> > +     p_mkey_str = osm_db_lookup(p_g2m, guid_str);
> > +     if (!p_mkey_str)
> > +             return 1;
> > +
> > +     if (p_mkey)
> > +             *p_mkey = unpack_mkey(p_mkey_str);
> > +
> > +     return 0;
> > +}
> > +
> > +int osm_db_guid2mkey_set(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
> > +                      IN uint64_t mkey)
> > +{
> > +     char guid_str[20];
> > +     char mkey_str[20];
> > +
> > +     pack_guid(guid, guid_str);
> > +     pack_mkey(mkey, mkey_str);
> > +
> > +     return osm_db_update(p_g2m, guid_str, mkey_str);
> > +}
> > +
> > +int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid)
> > +{
> > +     char guid_str[20];
> > +     pack_guid(guid, guid_str);
> > +     return osm_db_delete(p_g2m, guid_str);
> > +}
> > diff --git a/opensm/osm_lid_mgr.c b/opensm/osm_lid_mgr.c
> > index cb7ff0b..7799ee3 100644
> > --- a/opensm/osm_lid_mgr.c
> > +++ b/opensm/osm_lid_mgr.c
> > @@ -800,6 +800,7 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
> >       uint8_t op_vls;
> >       uint8_t port_num;
> >       boolean_t send_set = FALSE;
> > +     boolean_t update_mkey = FALSE;
> >       int ret = 0;
> >
> >       OSM_LOG_ENTER(p_mgr->p_log);
> > @@ -862,8 +863,10 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
> >               send_set = TRUE;
> >
> >       p_pi->m_key = p_mgr->p_subn->opt.m_key;
> > -     if (memcmp(&p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key)))
> > +     if (memcmp(&p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key))) {
> > +             update_mkey = TRUE;
> >               send_set = TRUE;
> > +     }
> >
> >       p_pi->subnet_prefix = p_mgr->p_subn->opt.subnet_prefix;
> >       if (memcmp(&p_pi->subnet_prefix, &p_old_pi->subnet_prefix,
> > @@ -1053,6 +1056,13 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
> >                            CL_DISP_MSGID_NONE, &context);
> >       if (status != IB_SUCCESS)
> >               ret = -1;
> > +     /* If we sent a new mkey above, update our guid2mkey map
> > +        now, on the assumption that the SubnSet succeeds
> > +     */
> > +     if (update_mkey)
> > +             osm_db_guid2mkey_set(p_mgr->p_subn->p_g2m,
> > +                                  cl_ntoh64(p_physp->port_guid),
> > +                                  cl_ntoh64(p_pi->m_key));
> >
> >  Exit:
> >       OSM_LOG_EXIT(p_mgr->p_log);
> > diff --git a/opensm/osm_link_mgr.c b/opensm/osm_link_mgr.c
> > index 8301643..50393c5 100644
> > --- a/opensm/osm_link_mgr.c
> > +++ b/opensm/osm_link_mgr.c
> > @@ -56,6 +56,7 @@
> >  #include <opensm/osm_helper.h>
> >  #include <opensm/osm_msgdef.h>
> >  #include <opensm/osm_opensm.h>
> > +#include <opensm/osm_db_pack.h>
> >
> >  static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
> >  {
> > @@ -104,6 +105,7 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
> >       int qdr_change = 0, fdr10_change = 0;
> >       int ret = 0;
> >       ib_net32_t attr_mod, cap_mask;
> > +     boolean_t update_mkey = FALSE;
> >
> >       OSM_LOG_ENTER(sm->p_log);
> >
> > @@ -194,8 +196,10 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
> >                   port_num == 0) {
> >                       p_pi->m_key = sm->p_subn->opt.m_key;
> >                       if (memcmp(&p_pi->m_key, &p_old_pi->m_key,
> > -                                sizeof(p_pi->m_key)))
> > +                                sizeof(p_pi->m_key))) {
> > +                             update_mkey = TRUE;
> >                               send_set = TRUE;
> > +                     }
> >
> >                       p_pi->subnet_prefix = sm->p_subn->opt.subnet_prefix;
> >                       if (memcmp(&p_pi->subnet_prefix,
> > @@ -466,6 +470,14 @@ Send:
> >       if (status)
> >               ret = -1;
> >
> > +     /* If we sent a new mkey above, update our guid2mkey map
> > +        now, on the assumption that the SubnSet succeeds
> > +      */
> > +     if (update_mkey)
> > +             osm_db_guid2mkey_set(sm->p_subn->p_g2m,
> > +                                  cl_ntoh64(p_physp->port_guid),
> > +                                  cl_ntoh64(p_pi->m_key));
> > +
> >       if (send_set2) {
> >               status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p_physp),
> >                                    payload2, sizeof(payload2),
> > diff --git a/opensm/osm_opensm.c b/opensm/osm_opensm.c
> > index 429108a..42cbb36 100644
> > --- a/opensm/osm_opensm.c
> > +++ b/opensm/osm_opensm.c
> > @@ -413,6 +413,11 @@ ib_api_status_t osm_opensm_init(IN osm_opensm_t * p_osm,
> >       if (status != IB_SUCCESS)
> >               goto Exit;
> >
> > +     /* the DB is in use by subn so init before */
> > +     status = osm_db_init(&p_osm->db, &p_osm->log);
> > +     if (status != IB_SUCCESS)
> > +             goto Exit;
> > +
> >       status = osm_subn_init(&p_osm->subn, p_osm, p_opt);
> >       if (status != IB_SUCCESS)
> >               goto Exit;
> > @@ -435,11 +440,6 @@ ib_api_status_t osm_opensm_init(IN osm_opensm_t * p_osm,
> >       if (status != IB_SUCCESS)
> >               goto Exit;
> >
> > -     /* the DB is in use by the SM and SA so init before */
> > -     status = osm_db_init(&p_osm->db, &p_osm->log);
> > -     if (status != IB_SUCCESS)
> > -             goto Exit;
> > -
> >       status = osm_sm_init(&p_osm->sm, &p_osm->subn, &p_osm->db,
> >                            p_osm->p_vendor, &p_osm->mad_pool, &p_osm->vl15,
> >                            &p_osm->log, &p_osm->stats, &p_osm->disp,
> > diff --git a/opensm/osm_port.c b/opensm/osm_port.c
> > index 88b9fd8..0730c14 100644
> > --- a/opensm/osm_port.c
> > +++ b/opensm/osm_port.c
> > @@ -54,6 +54,8 @@
> >  #include <opensm/osm_node.h>
> >  #include <opensm/osm_madw.h>
> >  #include <opensm/osm_switch.h>
> > +#include <opensm/osm_db_pack.h>
> > +#include <opensm/osm_sm.h>
> >
> >  void osm_physp_construct(IN osm_physp_t * p_physp)
> >  {
> > @@ -659,3 +661,33 @@ void osm_alias_guid_delete(IN OUT osm_alias_guid_t ** pp_alias_guid)
> >       free(*pp_alias_guid);
> >       *pp_alias_guid = NULL;
> >  }
> > +
> > +void osm_physp_set_port_info(IN osm_physp_t * p_physp,
> > +                                        IN const ib_port_info_t * p_pi,
> > +                                        IN const struct osm_sm * p_sm)
> > +{
> > +     CL_ASSERT(p_pi);
> > +     CL_ASSERT(osm_physp_is_valid(p_physp));
> > +
> > +     if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) {
> > +             /* If PortState is down, only copy PortState */
> > +             /* and PortPhysicalState per C14-24-2.1 */
> > +             ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN);
> > +             ib_port_info_set_port_phys_state
> > +                 (ib_port_info_get_port_phys_state(p_pi),
> > +                  &p_physp->port_info);
> > +     } else {
> > +             p_physp->port_info = *p_pi;
> > +
> > +             /* The MKey in p_pi can only be considered valid if it's
> > +              * for a HCA/router or switch port 0, and it's either
> > +              * non-zero or the MKeyProtect bits are also zero.
> > +              */
> > +             if ((osm_node_get_type(p_physp->p_node) != IB_NODE_TYPE_SWITCH ||
> > +                  p_physp->port_num == 0) &&
> > +                 (p_pi->m_key != 0 || ib_port_info_get_mpb(p_pi) == 0))
> > +                     osm_db_guid2mkey_set(p_sm->p_subn->p_g2m,
> > +                                          cl_ntoh64(p_physp->port_guid),
> > +                                          cl_ntoh64(p_pi->m_key));
> > +     }
> > +}
> > diff --git a/opensm/osm_port_info_rcv.c b/opensm/osm_port_info_rcv.c
> > index ab7418b..00cbfc7 100644
> > --- a/opensm/osm_port_info_rcv.c
> > +++ b/opensm/osm_port_info_rcv.c
> > @@ -312,7 +312,7 @@ static void pi_rcv_process_switch_port(IN osm_sm_t * sm, IN osm_node_t * p_node,
> >       /*
> >          Update the PortInfo attribute.
> >        */
> > -     osm_physp_set_port_info(p_physp, p_pi);
> > +     osm_physp_set_port_info(p_physp, p_pi, sm);
> >
> >       if (port_num == 0) {
> >               /* Determine if base switch port 0 */
> > @@ -337,7 +337,7 @@ static void pi_rcv_process_ca_or_router_port(IN osm_sm_t * sm,
> >
> >       pi_rcv_check_and_fix_lid(sm->p_log, p_pi, p_physp);
> >
> > -     osm_physp_set_port_info(p_physp, p_pi);
> > +     osm_physp_set_port_info(p_physp, p_pi, sm);
> >
> >       pi_rcv_process_endport(sm, p_physp, p_pi);
> >
> > @@ -475,7 +475,7 @@ static void pi_rcv_process_set(IN osm_sm_t * sm, IN osm_node_t * p_node,
> >               cl_ntoh64(osm_node_get_node_guid(p_node)),
> >               cl_ntoh64(p_smp->trans_id));
> >
> > -     osm_physp_set_port_info(p_physp, p_pi);
> > +     osm_physp_set_port_info(p_physp, p_pi, sm);
> >
> >       OSM_LOG_EXIT(sm->p_log);
> >  }
> > diff --git a/opensm/osm_req.c b/opensm/osm_req.c
> > index 2532f9c..51220f3 100644
> > --- a/opensm/osm_req.c
> > +++ b/opensm/osm_req.c
> > @@ -58,6 +58,7 @@
> >  #include <opensm/osm_vl15intf.h>
> >  #include <opensm/osm_msgdef.h>
> >  #include <opensm/osm_opensm.h>
> > +#include <opensm/osm_db_pack.h>
> >
> >  /**********************************************************************
> >    The plock MAY or MAY NOT be held before calling this function.
> > diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
> > index 143b744..74114af 100644
> > --- a/opensm/osm_state_mgr.c
> > +++ b/opensm/osm_state_mgr.c
> > @@ -66,6 +66,7 @@
> >  #include <vendor/osm_vendor_api.h>
> >  #include <opensm/osm_inform.h>
> >  #include <opensm/osm_opensm.h>
> > +#include <opensm/osm_db.h>
> >
> >  extern void osm_drop_mgr_process(IN osm_sm_t * sm);
> >  extern int osm_qos_setup(IN osm_opensm_t * p_osm);
> > @@ -1440,6 +1441,9 @@ repeat_discovery:
> >       if (sm->p_subn->force_heavy_sweep
> >           || sm->p_subn->subnet_initialization_error)
> >               osm_sm_signal(sm, OSM_SIGNAL_SWEEP);
> > +
> > +     /* Write a new copy of our persistent guid2mkey database */
> > +     osm_db_store(sm->p_subn->p_g2m);
> >  }
> >
> >  static void do_process_mgrp_queue(osm_sm_t * sm)
> > diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
> > index 7fb5c8f..47a5606 100644
> > --- a/opensm/osm_subnet.c
> > +++ b/opensm/osm_subnet.c
> > @@ -75,6 +75,8 @@
> >  #include <opensm/osm_event_plugin.h>
> >  #include <opensm/osm_qos_policy.h>
> >  #include <opensm/osm_service.h>
> > +#include <opensm/osm_db.h>
> > +#include <opensm/osm_db_pack.h>
> >
> >  static const char null_str[] = "(null)";
> >
> > @@ -538,6 +540,52 @@ static int compar_mgids(const void *m1, const void *m2)
> >       return memcmp(m1, m2, sizeof(ib_gid_t));
> >  }
> >
> > +static void subn_validate_g2m(osm_subn_t *p_subn)
> > +{
> > +     cl_qlist_t guids;
> > +     osm_db_guid_elem_t *p_item;
> > +     uint64_t mkey;
> > +     boolean_t valid_entry;
> > +
> > +     OSM_LOG_ENTER(&(p_subn->p_osm->log));
> > +     cl_qlist_init(&guids);
> > +
> > +     if (osm_db_guid2mkey_guids(p_subn->p_g2m, &guids)) {
> > +             OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR, "ERR 7506: "
> > +                     "could not get mkey guid list\n");
> > +             goto Exit;
> > +     }
> > +
> > +     while ((p_item = (osm_db_guid_elem_t *) cl_qlist_remove_head(&guids))
> > +            != (osm_db_guid_elem_t *) cl_qlist_end(&guids)) {
> > +             valid_entry = TRUE;
> > +
> > +             if (p_item->guid == 0) {
> > +                     OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
> > +                             "ERR 7507: found invalid zero guid\n");
> > +                     valid_entry = FALSE;
> > +             } else if (osm_db_guid2mkey_get(p_subn->p_g2m, p_item->guid,
> > +                                             &mkey)) {
> > +                     OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
> > +                             "ERR 7508: could not get mkey for guid:0x%016"
> > +                             PRIx64 "\n", p_item->guid);
> > +                     valid_entry = FALSE;
> > +             }
> > +
> > +             if (valid_entry == FALSE) {
> > +                     if (osm_db_guid2mkey_delete(p_subn->p_g2m,
> > +                                                        p_item->guid))
> > +                                OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
> > +                                        "ERR 7509: failed to delete entry for "
> > +                                        "guid:0x%016" PRIx64 "\n",
> > +                                        p_item->guid);
> > +             }
> > +     }
> > +
> > +Exit:
> > +     OSM_LOG_EXIT(&(p_subn->p_osm->log));
> > +}
> > +
> >  void osm_subn_construct(IN osm_subn_t * p_subn)
> >  {
> >       memset(p_subn, 0, sizeof(*p_subn));
> > @@ -744,6 +792,35 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN osm_opensm_t * p_osm,
> >       p_subn->sweeping_enabled = TRUE;
> >       p_subn->last_sm_port_state = 1;
> >
> > +     /* Initialize the guid2mkey database */
> > +     p_subn->p_g2m = osm_db_domain_init(&(p_osm->db), "guid2mkey");
> > +     if (!p_subn->p_g2m) {
> > +             OSM_LOG(&(p_osm->log), OSM_LOG_ERROR, "ERR 7510: "
> > +                     "Error initializing Guid-to-MKey persistent database\n");
> > +             return IB_ERROR;
> > +     }
> > +
> > +     if (osm_db_restore(p_subn->p_g2m)) {
> > +#ifndef __WIN__
> > +             /*
> > +              * When Windows is BSODing, it might corrupt files that
> > +              * were previously opened for writing, even if the files
> > +              * are closed, so we might see corrupted guid2mkey file.
> > +              */
> > +             if (p_subn->opt.exit_on_fatal) {
> > +                     osm_log(&(p_osm->log), OSM_LOG_SYS,
> > +                             "FATAL: Error restoring Guid-to-Mkey "
> > +                             "persistent database\n");
> > +                     return IB_ERROR;
> > +             } else
> > +#endif
> > +                     OSM_LOG(&(p_osm->log), OSM_LOG_ERROR,
> > +                             "ERR 7511: Error restoring Guid-to-Mkey "
> > +                             "persistent database\n");
> > +     }
> > +
> > +     subn_validate_g2m(p_subn);
> > +
> >       return IB_SUCCESS;
> >  }
> >
> > --
> > 1.7.9.2
> >
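
The validity rule in the new osm_physp_set_port_info() can be isolated as a small predicate. The sketch below restates that condition with hypothetical names (enum node_type, port_info_mkey_is_valid()) in place of osm_node_get_type() and ib_port_info_get_mpb(). The likely reason a zero mkey is trusted only when the protect bits are also zero is that, at protection level 1, a port answers a SubnGet(PortInfo) that fails the mkey check with the M_Key field zeroed, so a zero there may mean "hidden" rather than "unset".

```c
#include <assert.h>	/* assert() for the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the IB_NODE_TYPE_* values */
enum node_type { NODE_CA = 1, NODE_SWITCH = 2, NODE_ROUTER = 3 };

/* Same condition as the patch: only end ports (HCA/router ports or
 * switch port 0) carry an mkey, and a zero mkey is only believed
 * when MKeyProtectBits are zero as well. */
static bool port_info_mkey_is_valid(enum node_type type, uint8_t port_num,
				    uint64_t m_key, uint8_t mkey_protect_bits)
{
	bool end_port = (type != NODE_SWITCH) || (port_num == 0);

	return end_port && (m_key != 0 || mkey_protect_bits == 0);
}
```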

* Re: [PATCH 8/8] opensm: Ensure sweep interval/mkey lease are sensibly set
       [not found]         ` <1340672104-18039-8-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
@ 2012-07-24  9:01           ` Alex Netes
  2012-07-24 17:40             ` Jim Foraker
  0 siblings, 1 reply; 33+ messages in thread
From: Alex Netes @ 2012-07-24  9:01 UTC
  To: Jim Foraker; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, weiny2-i2BcT+NCU+M

Hi Jim,

On 17:55 Mon 25 Jun     , Jim Foraker wrote:
> If mkeys are protected, sweep should always be enabled and
> set to an interval < the lease timeout, to ensure a missed trap
> doesn't lead to mkey exposure.

This is a minimal requirement, but it might not be enough to avoid mkey
exposure completely. In noisy fabrics, a sweep may take longer than the
sweep interval (and the lease timeout), so theoretically we can still
get mkey exposure. However, I don't know how we can completely avoid this.

> 
> Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
> ---
>  opensm/osm_subnet.c |   20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
> index ddee955..3f336d8 100644
> --- a/opensm/osm_subnet.c
> +++ b/opensm/osm_subnet.c
> @@ -1502,6 +1502,26 @@ int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
>  			   "instead\n", p_opts->m_key_protect_bits, 2);
>  		p_opts->m_key_protect_bits = 2;
>  	}
> +	if (p_opts->m_key_protect_bits && p_opts->m_key_lease_period) {
> +		if (!p_opts->sweep_interval) {
> +			log_report(" Sweep disabled with protected mkey "
> +				   "leases in effect; re-enabling sweeping "
> +				   "with interval %u\n",
> +				   cl_ntoh16(p_opts->m_key_lease_period) - 1);
> +			p_opts->sweep_interval =
> +				cl_ntoh16(p_opts->m_key_lease_period) - 1;
> +		}
> +		if (p_opts->sweep_interval >=
> +			cl_ntoh16(p_opts->m_key_lease_period)) {
> +			log_report(" Sweep interval %u >= mkey lease period "
> +				   "%u. Setting lease period to %u\n",
> +				   p_opts->sweep_interval,
> +				   cl_ntoh16(p_opts->m_key_lease_period),
> +				   p_opts->sweep_interval + 1);
> +			p_opts->m_key_lease_period =
> +				cl_hton16(p_opts->sweep_interval + 1);
> +		}
> +	}
>  
>  	if (p_opts->root_guid_file != NULL) {
>  		FILE *root_file = fopen(p_opts->root_guid_file, "r");
> -- 
> 1.7.9.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 8/8] opensm: Ensure sweep interval/mkey lease are sensibly set
  2012-07-24  9:01           ` Alex Netes
@ 2012-07-24 17:40             ` Jim Foraker
  0 siblings, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-07-24 17:40 UTC (permalink / raw)
  To: Alex Netes; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Weiny, Ira K.


On Tue, 2012-07-24 at 02:01 -0700, Alex Netes wrote:
> Hi Jim,
> 
> On 17:55 Mon 25 Jun     , Jim Foraker wrote:
> > If mkeys are protected, sweep should always be enabled and
> > set to an interval < the lease timeout, to ensure a missed trap
> > doesn't lead to mkey exposure.
> 
> This is a minimal requirement, but it might not be enough to avoid mkey
> exposure completely. In noisy fabrics, sweep duration may take more
> time than sweep interval (and lease timeout), so theoretically we can still
> get mkey exposure. However, I don't know how we can completely avoid this.
     I thought about this a bit while writing the patch, and
fundamentally we can't.  Particularly if we assume that an attacker can
wield some control over the amount of time a sweep takes, which I think
is very reasonable.
     The primary defense against mkey exposure is the TrapRepress that
the SM sends in response to mkey violation traps, which will reset the
lease timer.  Setting sweep interval < mkey timeout just provides some
complementary backup in case a Trap/TrapRepress somehow gets missed.
     Fundamentally however, if an mkey timeout is set, an attacker who
knows how to confuse/DOS/crash the SM will always be able to recover
keys on the subnet.

     Jim

> 
> > 
> > Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
> > ---
> >  opensm/osm_subnet.c |   20 ++++++++++++++++++++
> >  1 file changed, 20 insertions(+)
> > 
> > diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
> > index ddee955..3f336d8 100644
> > --- a/opensm/osm_subnet.c
> > +++ b/opensm/osm_subnet.c
> > @@ -1502,6 +1502,26 @@ int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
> >  			   "instead\n", p_opts->m_key_protect_bits, 2);
> >  		p_opts->m_key_protect_bits = 2;
> >  	}
> > +	if (p_opts->m_key_protect_bits && p_opts->m_key_lease_period) {
> > +		if (!p_opts->sweep_interval) {
> > +			log_report(" Sweep disabled with protected mkey "
> > +				   "leases in effect; re-enabling sweeping "
> > +				   "with interval %u\n",
> > +				   cl_ntoh16(p_opts->m_key_lease_period) - 1);
> > +			p_opts->sweep_interval =
> > +				cl_ntoh16(p_opts->m_key_lease_period) - 1;
> > +		}
> > +		if (p_opts->sweep_interval >=
> > +			cl_ntoh16(p_opts->m_key_lease_period)) {
> > +			log_report(" Sweep interval %u >= mkey lease period "
> > +				   "%u. Setting lease period to %u\n",
> > +				   p_opts->sweep_interval,
> > +				   cl_ntoh16(p_opts->m_key_lease_period),
> > +				   p_opts->sweep_interval + 1);
> > +			p_opts->m_key_lease_period =
> > +				cl_hton16(p_opts->sweep_interval + 1);
> > +		}
> > +	}
> >  
> >  	if (p_opts->root_guid_file != NULL) {
> >  		FILE *root_file = fopen(p_opts->root_guid_file, "r");
> > -- 
> > 1.7.9.2
> > 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/8] opensm: Signal subnet init errors on SubnGet timeouts
       [not found]               ` <1343081989.29792.12.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
@ 2012-07-29 16:29                 ` Alex Netes
  2012-07-30 17:19                   ` Foraker, Jim
  0 siblings, 1 reply; 33+ messages in thread
From: Alex Netes @ 2012-07-29 16:29 UTC (permalink / raw)
  To: Jim Foraker; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Weiny, Ira K.

Hi Jim,

On 15:19 Mon 23 Jul     , Jim Foraker wrote:
> 
> On Mon, 2012-07-23 at 08:43 -0700, Alex Netes wrote:
> > Hi Jim,
> > 
> > On 17:55 Mon 25 Jun     , Jim Foraker wrote:
> > > A subnet should not be listed as cleanly initialized if CAs
> > > fail to respond to SubnGet requests.
> > > 
> > > Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
> > > ---
> > >  opensm/osm_sm_mad_ctrl.c |    9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/opensm/osm_sm_mad_ctrl.c b/opensm/osm_sm_mad_ctrl.c
> > > index f0bcff2..464b6b0 100644
> > > --- a/opensm/osm_sm_mad_ctrl.c
> > > +++ b/opensm/osm_sm_mad_ctrl.c
> > > @@ -741,6 +741,15 @@ static void sm_mad_ctrl_send_err_cb(IN void *context, IN osm_madw_t * p_madw)
> > >  			cl_ntoh16(p_smp->attr_id),
> > >  			ib_get_sm_attr_str(p_smp->attr_id));
> > >  		p_ctrl->p_subn->subnet_initialization_error = TRUE;
> > > +	} else if (p_madw->status == IB_TIMEOUT &&
> > > +		   p_smp->method == IB_MAD_METHOD_GET) {
> > 
> > It's pretty common to see timeouts in fabrics without m_key support (e.g.
> > switch reboots) and it's not desirable to start another heavy sweep because
> > of that. So I guess it would be better if we could initiate heavy sweep only
> > when m_key is set and protection level is 2 or 3.
>      This was done primarily to ensure that "SUBNET UP" doesn't get
> displayed/logged while there are unconfigured HCAs due to mis-set mkeys.
> I'm reasonably sure (I will re-test to verify) that future light sweeps
> will catch HCAs whose mkeys time out, presuming the timeout is set.  So we
> could also just log the error and not worry about setting
> subnet_initialization_error.

It's fine to have TIMEOUTs on Get() in case we are dealing with M_Key set, but
in the general case we don't want to run into heavy sweep loops because of
TIMEOUTs on Get(), so I suggest the following:

+       } else if (p_ctrl->p_subn->opt.m_key &&
+                  p_ctrl->p_subn->opt.m_key_protect_bits > 1 &&
+                  p_madw->status == IB_TIMEOUT &&
+                  p_smp->method == IB_MAD_METHOD_GET) {
+               /* Timeouts on SubnGet may be an indication of an mkey
+                  error at protection levels 2/3 */
+               OSM_LOG(p_ctrl->p_log, OSM_LOG_ERROR, "ERR 3120 "
+                       "Timeout while getting attribute 0x%X (%s)\n",
+                       cl_ntoh16(p_smp->attr_id),
+                       ib_get_sm_attr_str(p_smp->attr_id));
+               p_ctrl->p_subn->subnet_initialization_error = TRUE;


-- Alex

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/8] opensm: Signal subnet init errors on SubnGet timeouts
  2012-07-29 16:29                 ` Alex Netes
@ 2012-07-30 17:19                   ` Foraker, Jim
  0 siblings, 0 replies; 33+ messages in thread
From: Foraker, Jim @ 2012-07-30 17:19 UTC (permalink / raw)
  To: Alex Netes; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Weiny, Ira K.

On 7/29/12 9:29 AM, "Alex Netes" <alexne-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:

>Hi Jim,
>
>On 15:19 Mon 23 Jul     , Jim Foraker wrote:
>> 
>> On Mon, 2012-07-23 at 08:43 -0700, Alex Netes wrote:
>> > Hi Jim,
>> > 
>> > On 17:55 Mon 25 Jun     , Jim Foraker wrote:
>> > > A subnet should not be listed as cleanly initialized if CAs
>> > > fail to respond to SubnGet requests.
>> > > 
>> > > Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
>> > > ---
>> > >  opensm/osm_sm_mad_ctrl.c |    9 +++++++++
>> > >  1 file changed, 9 insertions(+)
>> > > 
>> > > diff --git a/opensm/osm_sm_mad_ctrl.c b/opensm/osm_sm_mad_ctrl.c
>> > > index f0bcff2..464b6b0 100644
>> > > --- a/opensm/osm_sm_mad_ctrl.c
>> > > +++ b/opensm/osm_sm_mad_ctrl.c
>> > > @@ -741,6 +741,15 @@ static void sm_mad_ctrl_send_err_cb(IN void *context, IN osm_madw_t * p_madw)
>> > >  			cl_ntoh16(p_smp->attr_id),
>> > >  			ib_get_sm_attr_str(p_smp->attr_id));
>> > >  		p_ctrl->p_subn->subnet_initialization_error = TRUE;
>> > > +	} else if (p_madw->status == IB_TIMEOUT &&
>> > > +		   p_smp->method == IB_MAD_METHOD_GET) {
>> > 
>> > It's pretty common to see timeouts in fabrics without m_key support (e.g.
>> > switch reboots) and it's not desirable to start another heavy sweep because
>> > of that. So I guess it would be better if we could initiate heavy sweep only
>> > when m_key is set and protection level is 2 or 3.
>>      This was done primarily to ensure that "SUBNET UP" doesn't get
>> displayed/logged while there are unconfigured HCAs due to mis-set mkeys.
>> I'm reasonably sure (I will re-test to verify) that future light sweeps
>> will catch HCAs whose mkeys time out, presuming the timeout is set.  So we
>> could also just log the error and not worry about setting
>> subnet_initialization_error.
>
>It's fine to have TIMEOUTs on Get() in case we are dealing with M_Key set, but
>in the general case we don't want to run into heavy sweep loops because of
>TIMEOUTs on Get(), so I suggest the following:
>
>+       } else if (p_ctrl->p_subn->opt.m_key &&
>+                  p_ctrl->p_subn->opt.m_key_protect_bits > 1 &&
>+                  p_madw->status == IB_TIMEOUT &&
>+                  p_smp->method == IB_MAD_METHOD_GET) {
>+               /* Timeouts on SubnGet may be an indication of an mkey
>+                  error at protection levels 2/3 */
>+               OSM_LOG(p_ctrl->p_log, OSM_LOG_ERROR, "ERR 3120 "
>+                       "Timeout while getting attribute 0x%X (%s)\n",
>+                       cl_ntoh16(p_smp->attr_id),
>+                       ib_get_sm_attr_str(p_smp->attr_id));
>+               p_ctrl->p_subn->subnet_initialization_error = TRUE;
>
>
>-- Alex
>

     At the moment, what I have instead is:

+	} else if (p_madw->status == IB_TIMEOUT &&
+		   p_smp->method == IB_MAD_METHOD_GET) {
+		OSM_LOG(p_ctrl->p_log, OSM_LOG_ERROR, "ERR 3120 "
+			"Timeout while getting attribute 0x%X (%s); "
+			"Possible mis-set mkey?\n",
+			cl_ntoh16(p_smp->attr_id),
+			ib_get_sm_attr_str(p_smp->attr_id));

I.e., we do not set the initialization error flag, but we always log the
error.  I like always reporting the error, because it better catches the
case where the CA's mkey/protect bits don't match what the SM expects.


     Jim


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/8] opensm: Improved mkey support
       [not found] ` <1340672058.5218.97.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
  2012-06-26  0:54   ` [PATCH 1/8] opensm: Add guid2mkey cache file support Jim Foraker
  2012-07-04  0:25   ` [PATCH 0/8] opensm: Improved mkey support Jim Foraker
@ 2012-08-01 14:48   ` Jim Foraker
       [not found]     ` <1343832537.26423.8.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
  2 siblings, 1 reply; 33+ messages in thread
From: Jim Foraker @ 2012-08-01 14:48 UTC (permalink / raw)
  To: linux-rdma; +Cc: Alex Netes, Ira Weiny

v2 is about to be posted.  It is now a 9-patch set.  Changes from v1:

. Subnet initialization behavior changed to log errors on SubnGet
timeouts, but not flag the init as failed, so that we don't trigger
heavy sweeps more than necessary
. sldd.sh modified to sync multiple files, and neighbors/guid2mkey
caches added to its default list
. Rebased against current HEAD
. Several whitespace/code format/git log cleanup fixes

     There will be a man page patch coming shortly, but I want to get
these out the door for review now.

     Jim

On Mon, 2012-06-25 at 17:54 -0700, Jim Foraker wrote:
> I'm about to post a set of patches intended to improve mkey support
> in OpenSM.  These patches have been fairly rigorously tested on a small
> fabric, and I believe are sufficiently stable for inclusion.  The
> primary intent here is threefold:
> 
> 1) Fix a multitude of edge case issues with the existing
> single-mkey-per-subnet support in OpenSM.  For instance, the current
> implementation provides no way to change an established non-zero mkey
> without rebooting or manually re-keying each CA on the entire subnet.
> 
> 2) Enable mkey protection across the fabric.  This involves not only
> setting a non-zero protection level, but also providing the SM with a
> sufficient information cache to initialize the subnet on restart without
> having to wait for mkey lease timeouts (provided one is set).
> 
> 3) Provide a basis on which to build multiple-mkey systems for OpenSM
> (be they per-host, KDF, or random) in the future.
> 
>      The patches add two new cache files: a port guid-to-mkey cache, and
> a neighboring link (port guid to port guid) cache.
>      The guid2mkey cache is used to provide a hint at the initial mkey
> for a CA during initialization.  It is a hint only; the SM is capable of
> dealing with cases where the guid2mkey cache is incorrect, although it
> may require waiting for (potentially multiple) mkey lease timeouts at
> non-zero mkey protection levels.  The guid2mkey cache is presented first
> in the patch set, as it ends up ameliorating several corner cases in a
> cleaner way than attacking them directly did.
>      The neighbors cache file provides an initial hint to the SM of what
> port guid we may expect at the opposite end of a link that is being
> initialized.  This is necessary at mkey protection level 2, where we
> cannot do the SubnGet necessary to determine the port guid to use in
> looking up an mkey hint.
>      The changes to the osm_req functions to support mkeys in patch 2
> now require plock to be held when called.  This was generally already
> the case, but there were a few spots where it was not.  In most of these
> cases, the plock is still not technically necessary, as they occur
> during hops 0/1 when DR path traversal is trivial.  I wrapped all of
> these occurrences in locks in a separate patch (#3), in order to make
> the changes more obvious and invite comment.
> 
>      Jim


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/9 v2] opensm: Add guid2mkey cache file support
       [not found]     ` <1343832537.26423.8.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
@ 2012-08-01 14:52       ` Jim Foraker
       [not found]         ` <1343832755-26753-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
  2012-08-01 20:19       ` [PATCH 0/8] opensm: Improved mkey support Alex Netes
  1 sibling, 1 reply; 33+ messages in thread
From: Jim Foraker @ 2012-08-01 14:52 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: weiny2-i2BcT+NCU+M, alexne-LTKPlh9/A5tDPfheJLI6IQ, Jim Foraker

Adds support for a guid2mkey file, and uses the database
to select which mkey to use in outgoing SMPs.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 include/opensm/osm_db.h      |    7 +-
 include/opensm/osm_db_pack.h |  144 ++++++++++++++++++++++++++++++++++++++++++
 include/opensm/osm_port.h    |   23 ++-----
 include/opensm/osm_subnet.h  |    2 +
 opensm/osm_db_files.c        |    1 +
 opensm/osm_db_pack.c         |   73 +++++++++++++++++++++
 opensm/osm_lid_mgr.c         |   12 +++-
 opensm/osm_link_mgr.c        |   14 +++-
 opensm/osm_opensm.c          |   10 +--
 opensm/osm_port.c            |   32 ++++++++++
 opensm/osm_port_info_rcv.c   |    6 +-
 opensm/osm_req.c             |    1 +
 opensm/osm_state_mgr.c       |    4 ++
 opensm/osm_subnet.c          |   77 ++++++++++++++++++++++
 14 files changed, 377 insertions(+), 29 deletions(-)

diff --git a/include/opensm/osm_db.h b/include/opensm/osm_db.h
index 7077347..d05bfa0 100644
--- a/include/opensm/osm_db.h
+++ b/include/opensm/osm_db.h
@@ -43,7 +43,8 @@
 
 #include <complib/cl_list.h>
 #include <complib/cl_spinlock.h>
-#include <opensm/osm_log.h>
+
+struct osm_log;
 
 #ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
@@ -118,7 +119,7 @@ typedef struct osm_db_domain {
 */
 typedef struct osm_db {
 	void *p_db_imp;
-	osm_log_t *p_log;
+	struct osm_log *p_log;
 	cl_list_t domains;
 } osm_db_t;
 /*
@@ -185,7 +186,7 @@ void osm_db_destroy(IN osm_db_t * p_db);
 *
 * SYNOPSIS
 */
-int osm_db_init(IN osm_db_t * p_db, IN osm_log_t * p_log);
+int osm_db_init(IN osm_db_t * p_db, IN struct osm_log * p_log);
 /*
 * PARAMETERS
 *
diff --git a/include/opensm/osm_db_pack.h b/include/opensm/osm_db_pack.h
index 3d24926..af43ba1 100644
--- a/include/opensm/osm_db_pack.h
+++ b/include/opensm/osm_db_pack.h
@@ -235,5 +235,149 @@ int osm_db_guid2lid_delete(IN osm_db_domain_t * p_g2l, IN uint64_t guid);
 * osm_db_guid2lid_get, osm_db_guid2lid_set
 *********/
 
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_init
+* NAME
+*	osm_db_guid2mkey_init
+*
+* DESCRIPTION
+*	Initialize a domain for the guid2mkey table
+*
+* SYNOPSIS
+*/
+static inline osm_db_domain_t *osm_db_guid2mkey_init(IN osm_db_t * p_db)
+{
+	return osm_db_domain_init(p_db, "guid2mkey");
+}
+
+/*
+* PARAMETERS
+*	p_db
+*		[in] Pointer to the database object to construct
+*
+* RETURN VALUES
+*	The pointer to the new allocated domain object or NULL.
+*
+* NOTE: DB domains are destroyed by the osm_db_destroy
+*
+* SEE ALSO
+*	Database, osm_db_init, osm_db_destroy
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_guids
+* NAME
+*	osm_db_guid2mkey_guids
+*
+* DESCRIPTION
+*	Provides back a list of guid elements.
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_guids(IN osm_db_domain_t * p_g2m,
+			  OUT cl_qlist_t * p_guid_list);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  p_guid_list
+*     [out] A quick list of guid elements of type osm_db_guid_elem_t
+*
+* RETURN VALUES
+*	0 if successful
+*
+* NOTE: the output qlist should be initialized and each item freed
+*       by the caller, then destroyed.
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids, osm_db_guid2mkey_get
+* osm_db_guid2mkey_set, osm_db_guid2mkey_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_get
+* NAME
+*	osm_db_guid2mkey_get
+*
+* DESCRIPTION
+*	Get the mkey for the given guid.
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_get(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 OUT uint64_t * p_mkey);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  guid
+*     [in] The guid to look for
+*
+*  p_mkey
+*     [out] Pointer to the resulting mkey in host order.
+*
+* RETURN VALUES
+*	0 if successful; 1 if the guid is not found (*p_mkey is then left unchanged).
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
+* osm_db_guid2mkey_set, osm_db_guid2mkey_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_set
+* NAME
+*	osm_db_guid2mkey_set
+*
+* DESCRIPTION
+*	Set the mkey for the given guid.
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_set(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 IN uint64_t mkey);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  guid
+*     [in] The guid to look for
+*
+*  mkey
+*     [in] The mkey value to set, in host order
+*
+* RETURN VALUES
+*	0 if successful
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
+* osm_db_guid2mkey_get, osm_db_guid2mkey_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_guid2mkey_delete
+* NAME
+*	osm_db_guid2mkey_delete
+*
+* DESCRIPTION
+*	Delete the entry by the given guid
+*
+* SYNOPSIS
+*/
+int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid);
+/*
+* PARAMETERS
+*	p_g2m
+*		[in] Pointer to the guid2mkey domain
+*
+*  guid
+*     [in] The guid to look for
+*
+* RETURN VALUES
+*	0 if successful otherwise 1
+*
+* SEE ALSO
+* osm_db_guid2mkey_init, osm_db_guid2mkey_guids
+* osm_db_guid2mkey_get, osm_db_guid2mkey_set
+*********/
+
 END_C_DECLS
 #endif				/* _OSM_DB_PACK_H_ */
diff --git a/include/opensm/osm_port.h b/include/opensm/osm_port.h
index e06483a..5fc186c 100644
--- a/include/opensm/osm_port.h
+++ b/include/opensm/osm_port.h
@@ -66,6 +66,7 @@ BEGIN_C_DECLS
 struct osm_port;
 struct osm_node;
 struct osm_mgrp;
+struct osm_sm;
 
 /****h* OpenSM/Physical Port
 * NAME
@@ -449,22 +450,9 @@ static inline void osm_physp_set_health(IN osm_physp_t * p_physp,
 *
 * SYNOPSIS
 */
-static inline void osm_physp_set_port_info(IN osm_physp_t * p_physp,
-					   IN const ib_port_info_t * p_pi)
-{
-	CL_ASSERT(p_pi);
-	CL_ASSERT(osm_physp_is_valid(p_physp));
-
-	if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) {
-		/* If PortState is down, only copy PortState */
-		/* and PortPhysicalState per C14-24-2.1 */
-		ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN);
-		ib_port_info_set_port_phys_state
-		    (ib_port_info_get_port_phys_state(p_pi),
-		     &p_physp->port_info);
-	} else
-		p_physp->port_info = *p_pi;
-}
+void osm_physp_set_port_info(IN osm_physp_t * p_physp,
+					   IN const ib_port_info_t * p_pi,
+					   IN const struct osm_sm * p_sm);
 
 /*
 * PARAMETERS
@@ -474,6 +462,9 @@ static inline void osm_physp_set_port_info(IN osm_physp_t * p_physp,
 *	p_pi
 *		[in] Pointer to the IBA defined PortInfo at this port number.
 *
+*	p_sm
+*		[in] Pointer to an osm_sm_t object.
+*
 * RETURN VALUES
 *	This function does not return a value.
 *
diff --git a/include/opensm/osm_subnet.h b/include/opensm/osm_subnet.h
index 724497d..0061f3a 100644
--- a/include/opensm/osm_subnet.h
+++ b/include/opensm/osm_subnet.h
@@ -54,6 +54,7 @@
 #include <complib/cl_list.h>
 #include <opensm/osm_base.h>
 #include <opensm/osm_prefix_route.h>
+#include <opensm/osm_db.h>
 #include <stdio.h>
 
 #ifdef __cplusplus
@@ -756,6 +757,7 @@ typedef struct osm_subn {
 	boolean_t sweeping_enabled;
 	unsigned need_update;
 	cl_fmap_t mgrp_mgid_tbl;
+	osm_db_domain_t *p_g2m;
 	void *mboxes[IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO + 1];
 } osm_subn_t;
 /*
diff --git a/opensm/osm_db_files.c b/opensm/osm_db_files.c
index 7ab6b56..9f338f3 100644
--- a/opensm/osm_db_files.c
+++ b/opensm/osm_db_files.c
@@ -50,6 +50,7 @@
 #define FILE_ID OSM_FILE_DB_FILES_C
 #include <opensm/st.h>
 #include <opensm/osm_db.h>
+#include <opensm/osm_log.h>
 
 /****d* Database/OSM_DB_MAX_LINE_LEN
  * NAME
diff --git a/opensm/osm_db_pack.c b/opensm/osm_db_pack.c
index c1ec4ab..57c3a66 100644
--- a/opensm/osm_db_pack.c
+++ b/opensm/osm_db_pack.c
@@ -85,6 +85,17 @@ static inline int unpack_lids(IN char *p_lid_str, OUT uint16_t * p_min_lid,
 	return 0;
 }
 
+static inline void pack_mkey(uint64_t mkey, char *p_mkey_str)
+{
+	sprintf(p_mkey_str, "0x%016" PRIx64, mkey);
+}
+
+static inline uint64_t unpack_mkey(char *p_mkey_str)
+{
+	return strtoull(p_mkey_str, NULL, 0);
+}
+
+
 int osm_db_guid2lid_guids(IN osm_db_domain_t * p_g2l,
 			  OUT cl_qlist_t * p_guid_list)
 {
@@ -151,3 +162,65 @@ int osm_db_guid2lid_delete(IN osm_db_domain_t * p_g2l, IN uint64_t guid)
 	pack_guid(guid, guid_str);
 	return osm_db_delete(p_g2l, guid_str);
 }
+
+int osm_db_guid2mkey_guids(IN osm_db_domain_t * p_g2m,
+			   OUT cl_qlist_t * p_guid_list)
+{
+	char *p_key;
+	cl_list_t keys;
+	osm_db_guid_elem_t *p_guid_elem;
+
+	cl_list_construct(&keys);
+	cl_list_init(&keys, 10);
+
+	if (osm_db_keys(p_g2m, &keys))
+		return 1;
+
+	while ((p_key = cl_list_remove_head(&keys)) != NULL) {
+		p_guid_elem =
+		    (osm_db_guid_elem_t *) malloc(sizeof(osm_db_guid_elem_t));
+		CL_ASSERT(p_guid_elem != NULL);
+
+		p_guid_elem->guid = unpack_guid(p_key);
+		cl_qlist_insert_head(p_guid_list, &p_guid_elem->item);
+	}
+
+	cl_list_destroy(&keys);
+	return 0;
+}
+
+int osm_db_guid2mkey_get(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 OUT uint64_t * p_mkey)
+{
+	char guid_str[20];
+	char *p_mkey_str;
+
+	pack_guid(guid, guid_str);
+	p_mkey_str = osm_db_lookup(p_g2m, guid_str);
+	if (!p_mkey_str)
+		return 1;
+
+	if (p_mkey)
+		*p_mkey = unpack_mkey(p_mkey_str);
+
+	return 0;
+}
+
+int osm_db_guid2mkey_set(IN osm_db_domain_t * p_g2m, IN uint64_t guid,
+			 IN uint64_t mkey)
+{
+	char guid_str[20];
+	char mkey_str[20];
+
+	pack_guid(guid, guid_str);
+	pack_mkey(mkey, mkey_str);
+
+	return osm_db_update(p_g2m, guid_str, mkey_str);
+}
+
+int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid)
+{
+	char guid_str[20];
+	pack_guid(guid, guid_str);
+	return osm_db_delete(p_g2m, guid_str);
+}
diff --git a/opensm/osm_lid_mgr.c b/opensm/osm_lid_mgr.c
index cb7ff0b..7799ee3 100644
--- a/opensm/osm_lid_mgr.c
+++ b/opensm/osm_lid_mgr.c
@@ -800,6 +800,7 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 	uint8_t op_vls;
 	uint8_t port_num;
 	boolean_t send_set = FALSE;
+	boolean_t update_mkey = FALSE;
 	int ret = 0;
 
 	OSM_LOG_ENTER(p_mgr->p_log);
@@ -862,8 +863,10 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 		send_set = TRUE;
 
 	p_pi->m_key = p_mgr->p_subn->opt.m_key;
-	if (memcmp(&p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key)))
+	if (memcmp(&p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key))) {
+		update_mkey = TRUE;
 		send_set = TRUE;
+	}
 
 	p_pi->subnet_prefix = p_mgr->p_subn->opt.subnet_prefix;
 	if (memcmp(&p_pi->subnet_prefix, &p_old_pi->subnet_prefix,
@@ -1053,6 +1056,13 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 			     CL_DISP_MSGID_NONE, &context);
 	if (status != IB_SUCCESS)
 		ret = -1;
+	/* If we sent a new mkey above, update our guid2mkey map
+	   now, on the assumption that the SubnSet succeeds
+	*/
+	if (update_mkey)
+		osm_db_guid2mkey_set(p_mgr->p_subn->p_g2m,
+				     cl_ntoh64(p_physp->port_guid),
+				     cl_ntoh64(p_pi->m_key));
 
 Exit:
 	OSM_LOG_EXIT(p_mgr->p_log);
diff --git a/opensm/osm_link_mgr.c b/opensm/osm_link_mgr.c
index 8301643..b700977 100644
--- a/opensm/osm_link_mgr.c
+++ b/opensm/osm_link_mgr.c
@@ -56,6 +56,7 @@
 #include <opensm/osm_helper.h>
 #include <opensm/osm_msgdef.h>
 #include <opensm/osm_opensm.h>
+#include <opensm/osm_db_pack.h>
 
 static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 {
@@ -104,6 +105,7 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 	int qdr_change = 0, fdr10_change = 0;
 	int ret = 0;
 	ib_net32_t attr_mod, cap_mask;
+	boolean_t update_mkey = FALSE;
 
 	OSM_LOG_ENTER(sm->p_log);
 
@@ -194,8 +196,10 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 		    port_num == 0) {
 			p_pi->m_key = sm->p_subn->opt.m_key;
 			if (memcmp(&p_pi->m_key, &p_old_pi->m_key,
-				   sizeof(p_pi->m_key)))
+				   sizeof(p_pi->m_key))) {
+				update_mkey = TRUE;
 				send_set = TRUE;
+			}
 
 			p_pi->subnet_prefix = sm->p_subn->opt.subnet_prefix;
 			if (memcmp(&p_pi->subnet_prefix,
@@ -466,6 +470,14 @@ Send:
 	if (status)
 		ret = -1;
 
+	/* If we sent a new mkey above, update our guid2mkey map
+	   now, on the assumption that the SubnSet succeeds
+	 */
+	if (update_mkey)
+		osm_db_guid2mkey_set(sm->p_subn->p_g2m,
+				     cl_ntoh64(p_physp->port_guid),
+				     cl_ntoh64(p_pi->m_key));
+
 	if (send_set2) {
 		status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p_physp),
 				     payload2, sizeof(payload2),
diff --git a/opensm/osm_opensm.c b/opensm/osm_opensm.c
index d648c6c..0909a36 100644
--- a/opensm/osm_opensm.c
+++ b/opensm/osm_opensm.c
@@ -416,6 +416,11 @@ ib_api_status_t osm_opensm_init(IN osm_opensm_t * p_osm,
 	if (status != IB_SUCCESS)
 		goto Exit;
 
+	/* the DB is in use by subn so init before */
+	status = osm_db_init(&p_osm->db, &p_osm->log);
+	if (status != IB_SUCCESS)
+		goto Exit;
+
 	status = osm_subn_init(&p_osm->subn, p_osm, p_opt);
 	if (status != IB_SUCCESS)
 		goto Exit;
@@ -438,11 +443,6 @@ ib_api_status_t osm_opensm_init(IN osm_opensm_t * p_osm,
 	if (status != IB_SUCCESS)
 		goto Exit;
 
-	/* the DB is in use by the SM and SA so init before */
-	status = osm_db_init(&p_osm->db, &p_osm->log);
-	if (status != IB_SUCCESS)
-		goto Exit;
-
 	status = osm_sm_init(&p_osm->sm, &p_osm->subn, &p_osm->db,
 			     p_osm->p_vendor, &p_osm->mad_pool, &p_osm->vl15,
 			     &p_osm->log, &p_osm->stats, &p_osm->disp,
diff --git a/opensm/osm_port.c b/opensm/osm_port.c
index 88b9fd8..6e73e66 100644
--- a/opensm/osm_port.c
+++ b/opensm/osm_port.c
@@ -54,6 +54,8 @@
 #include <opensm/osm_node.h>
 #include <opensm/osm_madw.h>
 #include <opensm/osm_switch.h>
+#include <opensm/osm_db_pack.h>
+#include <opensm/osm_sm.h>
 
 void osm_physp_construct(IN osm_physp_t * p_physp)
 {
@@ -659,3 +661,33 @@ void osm_alias_guid_delete(IN OUT osm_alias_guid_t ** pp_alias_guid)
 	free(*pp_alias_guid);
 	*pp_alias_guid = NULL;
 }
+
+void osm_physp_set_port_info(IN osm_physp_t * p_physp,
+					   IN const ib_port_info_t * p_pi,
+					   IN const struct osm_sm * p_sm)
+{
+	CL_ASSERT(p_pi);
+	CL_ASSERT(osm_physp_is_valid(p_physp));
+
+	if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) {
+		/* If PortState is down, only copy PortState */
+		/* and PortPhysicalState per C14-24-2.1 */
+		ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN);
+		ib_port_info_set_port_phys_state
+		    (ib_port_info_get_port_phys_state(p_pi),
+		     &p_physp->port_info);
+	} else {
+		p_physp->port_info = *p_pi;
+
+		/* The MKey in p_pi can only be considered valid if it's
+		 * for a HCA/router or switch port 0, and it's either
+		 * non-zero or the MKeyProtect bits are also zero.
+		 */
+		if ((osm_node_get_type(p_physp->p_node) !=
+		     IB_NODE_TYPE_SWITCH || p_physp->port_num == 0) &&
+		    (p_pi->m_key != 0 || ib_port_info_get_mpb(p_pi) == 0))
+			osm_db_guid2mkey_set(p_sm->p_subn->p_g2m,
+					     cl_ntoh64(p_physp->port_guid),
+					     cl_ntoh64(p_pi->m_key));
+	}
+}
diff --git a/opensm/osm_port_info_rcv.c b/opensm/osm_port_info_rcv.c
index ab7418b..00cbfc7 100644
--- a/opensm/osm_port_info_rcv.c
+++ b/opensm/osm_port_info_rcv.c
@@ -312,7 +312,7 @@ static void pi_rcv_process_switch_port(IN osm_sm_t * sm, IN osm_node_t * p_node,
 	/*
 	   Update the PortInfo attribute.
 	 */
-	osm_physp_set_port_info(p_physp, p_pi);
+	osm_physp_set_port_info(p_physp, p_pi, sm);
 
 	if (port_num == 0) {
 		/* Determine if base switch port 0 */
@@ -337,7 +337,7 @@ static void pi_rcv_process_ca_or_router_port(IN osm_sm_t * sm,
 
 	pi_rcv_check_and_fix_lid(sm->p_log, p_pi, p_physp);
 
-	osm_physp_set_port_info(p_physp, p_pi);
+	osm_physp_set_port_info(p_physp, p_pi, sm);
 
 	pi_rcv_process_endport(sm, p_physp, p_pi);
 
@@ -475,7 +475,7 @@ static void pi_rcv_process_set(IN osm_sm_t * sm, IN osm_node_t * p_node,
 		cl_ntoh64(osm_node_get_node_guid(p_node)),
 		cl_ntoh64(p_smp->trans_id));
 
-	osm_physp_set_port_info(p_physp, p_pi);
+	osm_physp_set_port_info(p_physp, p_pi, sm);
 
 	OSM_LOG_EXIT(sm->p_log);
 }
diff --git a/opensm/osm_req.c b/opensm/osm_req.c
index 2532f9c..51220f3 100644
--- a/opensm/osm_req.c
+++ b/opensm/osm_req.c
@@ -58,6 +58,7 @@
 #include <opensm/osm_vl15intf.h>
 #include <opensm/osm_msgdef.h>
 #include <opensm/osm_opensm.h>
+#include <opensm/osm_db_pack.h>
 
 /**********************************************************************
   The plock MAY or MAY NOT be held before calling this function.
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index 4d762a3..d9563ee 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -67,6 +67,7 @@
 #include <opensm/osm_inform.h>
 #include <opensm/osm_opensm.h>
 #include <opensm/osm_congestion_control.h>
+#include <opensm/osm_db.h>
 
 extern void osm_drop_mgr_process(IN osm_sm_t * sm);
 extern int osm_qos_setup(IN osm_opensm_t * p_osm);
@@ -1453,6 +1454,9 @@ repeat_discovery:
 	if (sm->p_subn->force_heavy_sweep
 	    || sm->p_subn->subnet_initialization_error)
 		osm_sm_signal(sm, OSM_SIGNAL_SWEEP);
+
+	/* Write a new copy of our persistent guid2mkey database */
+	osm_db_store(sm->p_subn->p_g2m);
 }
 
 static void do_process_mgrp_queue(osm_sm_t * sm)
diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 2358140..3e461f5 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -76,6 +76,8 @@
 #include <opensm/osm_event_plugin.h>
 #include <opensm/osm_qos_policy.h>
 #include <opensm/osm_service.h>
+#include <opensm/osm_db.h>
+#include <opensm/osm_db_pack.h>
 
 static const char null_str[] = "(null)";
 
@@ -844,6 +846,52 @@ static int compar_mgids(const void *m1, const void *m2)
 	return memcmp(m1, m2, sizeof(ib_gid_t));
 }
 
+static void subn_validate_g2m(osm_subn_t *p_subn)
+{
+	cl_qlist_t guids;
+	osm_db_guid_elem_t *p_item;
+	uint64_t mkey;
+	boolean_t valid_entry;
+
+	OSM_LOG_ENTER(&(p_subn->p_osm->log));
+	cl_qlist_init(&guids);
+
+	if (osm_db_guid2mkey_guids(p_subn->p_g2m, &guids)) {
+		OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR, "ERR 7506: "
+			"could not get mkey guid list\n");
+		goto Exit;
+	}
+
+	while ((p_item = (osm_db_guid_elem_t *) cl_qlist_remove_head(&guids))
+	       != (osm_db_guid_elem_t *) cl_qlist_end(&guids)) {
+		valid_entry = TRUE;
+
+		if (p_item->guid == 0) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7507: found invalid zero guid\n");
+			valid_entry = FALSE;
+		} else if (osm_db_guid2mkey_get(p_subn->p_g2m, p_item->guid,
+						&mkey)) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7508: could not get mkey for guid:0x%016"
+				PRIx64 "\n", p_item->guid);
+			valid_entry = FALSE;
+		}
+
+		if (valid_entry == FALSE) {
+			if (osm_db_guid2mkey_delete(p_subn->p_g2m,
+						    p_item->guid))
+				OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+					"ERR 7509: failed to delete entry for "
+					"guid:0x%016" PRIx64 "\n",
+					p_item->guid);
+		}
+	}
+
+Exit:
+	OSM_LOG_EXIT(&(p_subn->p_osm->log));
+}
+
 void osm_subn_construct(IN osm_subn_t * p_subn)
 {
 	memset(p_subn, 0, sizeof(*p_subn));
@@ -1052,6 +1100,35 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN osm_opensm_t * p_osm,
 	p_subn->sweeping_enabled = TRUE;
 	p_subn->last_sm_port_state = 1;
 
+	/* Initialize the guid2mkey database */
+	p_subn->p_g2m = osm_db_domain_init(&(p_osm->db), "guid2mkey");
+	if (!p_subn->p_g2m) {
+		OSM_LOG(&(p_osm->log), OSM_LOG_ERROR, "ERR 7510: "
+			"Error initializing Guid-to-MKey persistent database\n");
+		return IB_ERROR;
+	}
+
+	if (osm_db_restore(p_subn->p_g2m)) {
+#ifndef __WIN__
+		/*
+		 * When Windows is BSODing, it might corrupt files that
+		 * were previously opened for writing, even if the files
+		 * are closed, so we might see corrupted guid2mkey file.
+		 */
+		if (p_subn->opt.exit_on_fatal) {
+			osm_log(&(p_osm->log), OSM_LOG_SYS,
+				"FATAL: Error restoring Guid-to-Mkey "
+				"persistent database\n");
+			return IB_ERROR;
+		} else
+#endif
+			OSM_LOG(&(p_osm->log), OSM_LOG_ERROR,
+				"ERR 7511: Error restoring Guid-to-Mkey "
+				"persistent database\n");
+	}
+
+	subn_validate_g2m(p_subn);
+
 	return IB_SUCCESS;
 }
 
-- 
1.7.9.2

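The caching rule that the patch adds to osm_physp_set_port_info() above reduces to a small predicate. The sketch below restates it outside OpenSM. The enum and the function name are hypothetical stand-ins for IB_NODE_TYPE_* and the inline condition. The reason given for distrusting a zero M_Key (a protected port may report 0 in place of its real key) is an assumption about the intent; the patch itself does not say why.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for OpenSM's IB_NODE_TYPE_* values. */
enum node_type { NODE_CA = 1, NODE_SWITCH = 2, NODE_ROUTER = 3 };

/*
 * Mirrors the guard in osm_physp_set_port_info(): should the M_Key seen
 * in a PortInfo response be written to the guid2mkey cache?
 */
static bool mkey_is_cacheable(enum node_type type, uint8_t port_num,
			      uint64_t m_key, uint8_t protect_bits)
{
	/* Only CA/router ports and switch port 0 are recorded. */
	bool is_endport = (type != NODE_SWITCH) || (port_num == 0);

	/*
	 * A zero M_Key is trusted only when MKeyProtectBits are also zero.
	 * Assumption: a protected port may hide its real M_Key as 0.
	 */
	return is_endport && (m_key != 0 || protect_bits == 0);
}
```

For example, a CA port reporting M_Key 0 with protect bits 1 is not cached, while a switch's external port is never cached regardless of what it reports.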

* [PATCH 2/9 v2] opensm: Allow recovery of subnets with misset mkeys
From: Jim Foraker @ 2012-08-01 14:52 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: weiny2-i2BcT+NCU+M, alexne-LTKPlh9/A5tDPfheJLI6IQ, Jim Foraker

Allow the initialization of endpoints that already have an mkey
configured that differs from the one listed in the configuration
file.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 opensm/osm_req.c |  101 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 93 insertions(+), 8 deletions(-)

diff --git a/opensm/osm_req.c b/opensm/osm_req.c
index 51220f3..d397b14 100644
--- a/opensm/osm_req.c
+++ b/opensm/osm_req.c
@@ -61,7 +61,86 @@
 #include <opensm/osm_db_pack.h>
 
 /**********************************************************************
-  The plock MAY or MAY NOT be held before calling this function.
+  The plock must be held before calling this function.
+**********************************************************************/
+static ib_net64_t req_determine_mkey(IN osm_sm_t * sm,
+				     IN const osm_dr_path_t * p_path)
+{
+	osm_node_t *p_node;
+	osm_port_t *p_sm_port;
+	osm_physp_t *p_physp;
+	ib_net64_t dest_port_guid = 0, m_key;
+	uint8_t hop;
+
+	OSM_LOG_ENTER(sm->p_log);
+
+	p_physp = NULL;
+
+	p_sm_port = osm_get_port_by_guid(sm->p_subn, sm->p_subn->sm_port_guid);
+
+	/* hop_count == 0: destination port guid is SM */
+	if (p_path->hop_count == 0) {
+		if (p_sm_port != NULL)
+			dest_port_guid = sm->p_subn->sm_port_guid;
+		else
+			dest_port_guid = sm->p_subn->opt.guid;
+		goto Remote_Guid;
+	}
+
+	if (p_sm_port)
+		p_physp = p_sm_port->p_physp;
+
+	/* hop_count == 1: outgoing physp is SM physp */
+	for (hop = 2; p_physp && hop <= p_path->hop_count; hop++) {
+		p_physp = p_physp->p_remote_physp;
+		if (!p_physp)
+			break;
+		p_node = p_physp->p_node;
+		p_physp = osm_node_get_physp_ptr(p_node, p_path->path[hop]);
+	}
+
+	/* At this point, p_physp points at the outgoing physp on the
+	   last hop, or NULL if we don't know it.
+	*/
+	if (!p_physp) {
+		OSM_LOG(sm->p_log, OSM_LOG_ERROR,
+			"ERR 1107: "
+			"Outgoing physp is null on non-hop_0!\n");
+		dest_port_guid = 0;
+		goto Remote_Guid;
+	}
+
+	if (p_physp->p_remote_physp) {
+		dest_port_guid = p_physp->p_remote_physp->port_guid;
+		goto Remote_Guid;
+	}
+
+Remote_Guid:
+	if (dest_port_guid) {
+		if (!osm_db_guid2mkey_get(sm->p_subn->p_g2m,
+					  cl_ntoh64(dest_port_guid), &m_key)) {
+			m_key = cl_hton64(m_key);
+			OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
+				"Found mkey for guid 0x%"
+				PRIx64 "\n", cl_ntoh64(dest_port_guid));
+		} else {
+			OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
+				"Target port mkey unknown, using default\n");
+			m_key = sm->p_subn->opt.m_key;
+		}
+	} else {
+		OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
+			"Target port guid unknown, using default\n");
+		m_key = sm->p_subn->opt.m_key;
+	}
+
+	OSM_LOG_EXIT(sm->p_log);
+
+	return m_key;
+}
+
+/**********************************************************************
+  The plock must be held before calling this function.
 **********************************************************************/
 ib_api_status_t osm_req_get(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 			    IN ib_net16_t attr_id, IN ib_net32_t attr_mod,
@@ -71,6 +150,7 @@ ib_api_status_t osm_req_get(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 	osm_madw_t *p_madw;
 	ib_api_status_t status = IB_SUCCESS;
 	ib_net64_t tid;
+	ib_net64_t m_key;
 
 	CL_ASSERT(sm);
 
@@ -95,15 +175,17 @@ ib_api_status_t osm_req_get(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 	}
 
 	tid = cl_hton64((uint64_t) cl_atomic_inc(&sm->sm_trans_id));
+	m_key = req_determine_mkey(sm, p_path);
 
 	OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
-		"Getting %s (0x%X), modifier 0x%X, TID 0x%" PRIx64 "\n",
+		"Getting %s (0x%X), modifier 0x%X, TID 0x%" PRIx64
+		", MKey 0x%016" PRIx64 "\n",
 		ib_get_sm_attr_str(attr_id), cl_ntoh16(attr_id),
-		cl_ntoh32(attr_mod), cl_ntoh64(tid));
+		cl_ntoh32(attr_mod), cl_ntoh64(tid), cl_ntoh64(m_key));
 
 	ib_smp_init_new(osm_madw_get_smp_ptr(p_madw), IB_MAD_METHOD_GET,
 			tid, attr_id, attr_mod, p_path->hop_count,
-			sm->p_subn->opt.m_key, p_path->path,
+			m_key, p_path->path,
 			IB_LID_PERMISSIVE, IB_LID_PERMISSIVE);
 
 	p_madw->mad_addr.dest_lid = IB_LID_PERMISSIVE;
@@ -128,7 +210,7 @@ Exit:
 }
 
 /**********************************************************************
-  The plock MAY or MAY NOT be held before calling this function.
+  The plock must be held before calling this function.
 **********************************************************************/
 ib_api_status_t osm_req_set(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 			    IN const uint8_t * p_payload,
@@ -140,6 +222,7 @@ ib_api_status_t osm_req_set(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 	osm_madw_t *p_madw;
 	ib_api_status_t status = IB_SUCCESS;
 	ib_net64_t tid;
+	ib_net64_t m_key;
 
 	CL_ASSERT(sm);
 
@@ -165,15 +248,17 @@ ib_api_status_t osm_req_set(IN osm_sm_t * sm, IN const osm_dr_path_t * p_path,
 	}
 
 	tid = cl_hton64((uint64_t) cl_atomic_inc(&sm->sm_trans_id));
+	m_key = req_determine_mkey(sm, p_path);
 
 	OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
-		"Setting %s (0x%X), modifier 0x%X, TID 0x%" PRIx64 "\n",
+		"Setting %s (0x%X), modifier 0x%X, TID 0x%" PRIx64
+		", MKey 0x%016" PRIx64 "\n",
 		ib_get_sm_attr_str(attr_id), cl_ntoh16(attr_id),
-		cl_ntoh32(attr_mod), cl_ntoh64(tid));
+		cl_ntoh32(attr_mod), cl_ntoh64(tid), cl_ntoh64(m_key));
 
 	ib_smp_init_new(osm_madw_get_smp_ptr(p_madw), IB_MAD_METHOD_SET,
 			tid, attr_id, attr_mod, p_path->hop_count,
-			sm->p_subn->opt.m_key, p_path->path,
+			m_key, p_path->path,
 			IB_LID_PERMISSIVE, IB_LID_PERMISSIVE);
 
 	p_madw->mad_addr.dest_lid = IB_LID_PERMISSIVE;
-- 
1.7.9.2

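Before it can pick an mkey, req_determine_mkey() in the patch above has to work out which port a directed-route SMP will land on. Below is a standalone sketch of that walk over a hypothetical toy topology (toy_port and toy_node are not OpenSM types). It simplifies the hop_count == 0 case by returning the SM port's own GUID, whereas the patch falls back to opt.guid when the SM port is not yet known.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PORTS 8

/* Hypothetical toy topology, not OpenSM's osm_physp_t/osm_node_t. */
struct toy_node;

struct toy_port {
	uint64_t guid;
	struct toy_node *node;	 /* node this port belongs to */
	struct toy_port *remote; /* port at the other end of the link, or NULL */
};

struct toy_node {
	struct toy_port *ports[TOY_MAX_PORTS]; /* indexed by port number */
};

/*
 * Resolve the GUID at the far end of a directed-route path, following the
 * same walk as req_determine_mkey(): hop 1 leaves through the SM's own
 * port, and each later hop crosses a link and then selects the outgoing
 * port path[hop] on the node reached. Returns 0 when the GUID is unknown.
 */
static uint64_t dr_dest_guid(struct toy_port *sm_port, const uint8_t *path,
			     uint8_t hop_count)
{
	struct toy_port *p = sm_port;
	unsigned int hop;

	if (hop_count == 0)
		return sm_port ? sm_port->guid : 0;

	for (hop = 2; p && hop <= hop_count; hop++) {
		p = p->remote;
		if (!p || path[hop] >= TOY_MAX_PORTS)
			return 0;
		p = p->node->ports[path[hop]];
	}

	/* p is now the outgoing port on the last hop, or NULL. */
	if (!p || !p->remote)
		return 0;
	return p->remote->guid;
}
```

Returning 0 for "unknown" matches how the patch treats a zero dest_port_guid: it skips the cache and uses the configured default mkey.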
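After req_determine_mkey() in the patch above has worked out the destination port GUID, it consults the guid2mkey cache and falls back to the configured subnet m_key on a miss or when the GUID is unknown. The following is a minimal sketch of that decision. It assumes a hypothetical array-backed table in place of the osm_db domain and works in host byte order throughout, whereas the patch converts with cl_ntoh64/cl_hton64 around the lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the guid2mkey domain. */
struct g2m_entry {
	uint64_t guid;
	uint64_t mkey;
};

/* 0 and *p_mkey filled on a hit, 1 on a miss (like osm_db_guid2mkey_get). */
static int g2m_lookup(const struct g2m_entry *tbl, size_t n, uint64_t guid,
		      uint64_t *p_mkey)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].guid == guid) {
			*p_mkey = tbl[i].mkey;
			return 0;
		}
	}
	return 1;
}

/*
 * Pick the M_Key for an outgoing SMP. A zero dest_guid means the
 * destination is unknown; otherwise try the cache, then fall back to
 * the configured default.
 */
static uint64_t determine_mkey(const struct g2m_entry *tbl, size_t n,
			       uint64_t dest_guid, uint64_t default_mkey)
{
	uint64_t mkey;

	if (dest_guid == 0)
		return default_mkey;
	if (g2m_lookup(tbl, n, dest_guid, &mkey) == 0)
		return mkey;
	return default_mkey;
}
```

Because the cache is only a hint, a stale entry simply produces a wrong key; the SM then has to recover through the normal timeout and lease handling described in the cover letter.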

* [PATCH 3/9 v2] opensm: Add locking where necessary around osm_req_*
From: Jim Foraker @ 2012-08-01 14:52 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: weiny2-i2BcT+NCU+M, alexne-LTKPlh9/A5tDPfheJLI6IQ, Jim Foraker

Grab plock for reading around the osm_req_get/osm_req_set calls
that were not already made with it held, as those functions now
require it.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 opensm/osm_perfmgr.c      |    6 ++++++
 opensm/osm_sm_state_mgr.c |    2 ++
 opensm/osm_state_mgr.c    |    8 ++++++++
 opensm/osm_trap_rcv.c     |    6 +++++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/opensm/osm_perfmgr.c b/opensm/osm_perfmgr.c
index 9083ea9..3b36ef6 100644
--- a/opensm/osm_perfmgr.c
+++ b/opensm/osm_perfmgr.c
@@ -622,8 +622,10 @@ static int sweep_hop_1(osm_sm_t * sm)
 		path_array[1] = port_num;
 
 		osm_dr_path_init(&hop_1_path, 1, path_array);
+		CL_PLOCK_ACQUIRE(sm->p_lock);
 		status = osm_req_get(sm, &hop_1_path, IB_MAD_ATTR_NODE_INFO, 0,
 				     CL_DISP_MSGID_NONE, &context);
+		CL_PLOCK_RELEASE(sm->p_lock);
 
 		if (status != IB_SUCCESS)
 			OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 4C82: "
@@ -654,9 +656,11 @@ static int sweep_hop_1(osm_sm_t * sm)
 			path_array[1] = port_num;
 
 			osm_dr_path_init(&hop_1_path, 1, path_array);
+			CL_PLOCK_ACQUIRE(sm->p_lock);
 			status = osm_req_get(sm, &hop_1_path,
 					     IB_MAD_ATTR_NODE_INFO, 0,
 					     CL_DISP_MSGID_NONE, &context);
+			CL_PLOCK_RELEASE(sm->p_lock);
 
 			if (status != IB_SUCCESS)
 				OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 4C84: "
@@ -716,8 +720,10 @@ static int sweep_hop_0(osm_sm_t * sm)
 	}
 
 	osm_dr_path_init(&dr_path, 0, path_array);
+	CL_PLOCK_ACQUIRE(sm->p_lock);
 	status = osm_req_get(sm, &dr_path, IB_MAD_ATTR_NODE_INFO, 0,
 			     CL_DISP_MSGID_NONE, NULL);
+	CL_PLOCK_RELEASE(sm->p_lock);
 
 	if (status != IB_SUCCESS)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR,
diff --git a/opensm/osm_sm_state_mgr.c b/opensm/osm_sm_state_mgr.c
index e826f1f..061a0f2 100644
--- a/opensm/osm_sm_state_mgr.c
+++ b/opensm/osm_sm_state_mgr.c
@@ -109,9 +109,11 @@ static void sm_state_mgr_send_master_sm_info_req(osm_sm_t * sm)
 	context.smi_context.port_guid = p_port->guid;
 	context.smi_context.set_method = FALSE;
 
+	CL_PLOCK_ACQUIRE(sm->p_lock);
 	status = osm_req_get(sm, osm_physp_get_dr_path_ptr(p_port->p_physp),
 			     IB_MAD_ATTR_SM_INFO, 0, CL_DISP_MSGID_NONE,
 			     &context);
+	CL_PLOCK_RELEASE(sm->p_lock);
 
 	if (status != IB_SUCCESS)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3204: "
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index d9563ee..63f7347 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -240,8 +240,10 @@ static ib_api_status_t state_mgr_sweep_hop_0(IN osm_sm_t * sm)
 		CL_PLOCK_RELEASE(sm->p_lock);
 
 		osm_dr_path_init(&dr_path, 0, path_array);
+		CL_PLOCK_ACQUIRE(sm->p_lock);
 		status = osm_req_get(sm, &dr_path, IB_MAD_ATTR_NODE_INFO, 0,
 				     CL_DISP_MSGID_NONE, NULL);
+		CL_PLOCK_RELEASE(sm->p_lock);
 		if (status != IB_SUCCESS)
 			OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3305: "
 				"Request for NodeInfo failed (%s)\n",
@@ -433,8 +435,10 @@ static ib_api_status_t state_mgr_sweep_hop_1(IN osm_sm_t * sm)
 		path_array[1] = port_num;
 
 		osm_dr_path_init(&hop_1_path, 1, path_array);
+		CL_PLOCK_ACQUIRE(sm->p_lock);
 		status = osm_req_get(sm, &hop_1_path, IB_MAD_ATTR_NODE_INFO, 0,
 				     CL_DISP_MSGID_NONE, &context);
+		CL_PLOCK_RELEASE(sm->p_lock);
 		if (status != IB_SUCCESS)
 			OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3311: "
 				"Request for NodeInfo failed (%s)\n",
@@ -462,10 +466,12 @@ static ib_api_status_t state_mgr_sweep_hop_1(IN osm_sm_t * sm)
 
 				path_array[1] = port_num;
 				osm_dr_path_init(&hop_1_path, 1, path_array);
+				CL_PLOCK_ACQUIRE(sm->p_lock);
 				status = osm_req_get(sm, &hop_1_path,
 						     IB_MAD_ATTR_NODE_INFO, 0,
 						     CL_DISP_MSGID_NONE,
 						     &context);
+				CL_PLOCK_RELEASE(sm->p_lock);
 				if (status != IB_SUCCESS)
 					OSM_LOG(sm->p_log, OSM_LOG_ERROR,
 						"ERR 3312: "
@@ -812,10 +818,12 @@ static void state_mgr_send_handover(IN osm_sm_t * sm, IN osm_remote_sm_t * p_sm)
 		p_smi->sm_key = 0;
 	}
 
+	CL_PLOCK_ACQUIRE(sm->p_lock);
 	status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p_port->p_physp),
 			     payload, sizeof(payload), IB_MAD_ATTR_SM_INFO,
 			     IB_SMINFO_ATTR_MOD_HANDOVER, CL_DISP_MSGID_NONE,
 			     &context);
+	CL_PLOCK_RELEASE(sm->p_lock);
 
 	if (status != IB_SUCCESS)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3317: "
diff --git a/opensm/osm_trap_rcv.c b/opensm/osm_trap_rcv.c
index 89caa39..4a59cfe 100644
--- a/opensm/osm_trap_rcv.c
+++ b/opensm/osm_trap_rcv.c
@@ -213,6 +213,7 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p)
 	uint8_t payload[IB_SMP_DATA_SIZE];
 	osm_madw_context_t context;
 	ib_port_info_t *pi = (ib_port_info_t *)payload;
+	ib_api_status_t status;
 
 	/* select the nearest port to master opensm */
 	if (p->p_remote_physp &&
@@ -235,10 +236,13 @@ static int disable_port(osm_sm_t *sm, osm_physp_t *p)
 	context.pi_context.light_sweep = FALSE;
 	context.pi_context.active_transition = FALSE;
 
-	return osm_req_set(sm, osm_physp_get_dr_path_ptr(p),
+	CL_PLOCK_ACQUIRE(sm->p_lock);
+	status = osm_req_set(sm, osm_physp_get_dr_path_ptr(p),
 			   payload, sizeof(payload), IB_MAD_ATTR_PORT_INFO,
 			   cl_hton32(osm_physp_get_port_num(p)),
 			   CL_DISP_MSGID_NONE, &context);
+	CL_PLOCK_RELEASE(sm->p_lock);
+	return status;
 }
 
 static void log_trap_info(osm_log_t *p_log, ib_mad_notice_attr_t *p_ntci,
-- 
1.7.9.2


* [PATCH 4/9 v2] opensm: Add support for setting mkey protection levels
From: Jim Foraker @ 2012-08-01 14:52 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: weiny2-i2BcT+NCU+M, alexne-LTKPlh9/A5tDPfheJLI6IQ, Jim Foraker

The M_Key Protection Level for the subnet may now be set
in the config file by specifying a numeric value for
m_key_protection_level (defaults to 0).

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 include/opensm/osm_subnet.h |    1 +
 opensm/osm_lid_mgr.c        |   21 +++++++++++----------
 opensm/osm_link_mgr.c       |    2 +-
 opensm/osm_subnet.c         |    5 +++++
 4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/opensm/osm_subnet.h b/include/opensm/osm_subnet.h
index 0061f3a..ee97f14 100644
--- a/include/opensm/osm_subnet.h
+++ b/include/opensm/osm_subnet.h
@@ -250,6 +250,7 @@ typedef struct osm_subn_opt {
 	ib_net64_t sa_key;
 	ib_net64_t subnet_prefix;
 	ib_net16_t m_key_lease_period;
+	uint8_t m_key_protect_bits;
 	uint32_t sweep_interval;
 	uint32_t max_wire_smps;
 	uint32_t max_wire_smps2;
diff --git a/opensm/osm_lid_mgr.c b/opensm/osm_lid_mgr.c
index 7799ee3..aa48eab 100644
--- a/opensm/osm_lid_mgr.c
+++ b/opensm/osm_lid_mgr.c
@@ -889,6 +889,11 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 		   sizeof(p_pi->m_key_lease_period)))
 		send_set = TRUE;
 
+	p_pi->mkey_lmc = 0;
+	ib_port_info_set_mpb(p_pi, p_mgr->p_subn->opt.m_key_protect_bits);
+	if (ib_port_info_get_mpb(p_pi) != ib_port_info_get_mpb(p_old_pi))
+		send_set = TRUE;
+
 	/*
 	   we want to set the timeout for both the switch port 0
 	   and the CA ports
@@ -910,12 +915,10 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 			   sizeof(p_pi->link_width_enabled)))
 			send_set = TRUE;
 
-		/* M_KeyProtectBits are currently always zero */
-		p_pi->mkey_lmc = p_mgr->p_subn->opt.lmc;
+		/* p_pi->mkey_lmc is initialized earlier */
+		ib_port_info_set_lmc(p_pi, p_mgr->p_subn->opt.lmc);
 		if (ib_port_info_get_lmc(p_pi) !=
-		    ib_port_info_get_lmc(p_old_pi) ||
-		    ib_port_info_get_mpb(p_pi) !=
-		    ib_port_info_get_mpb(p_old_pi))
+		    ib_port_info_get_lmc(p_old_pi))
 			send_set = TRUE;
 
 		/* calc new op_vls and mtu */
@@ -996,12 +999,10 @@ static int lid_mgr_set_physp_pi(IN osm_lid_mgr_t * p_mgr,
 
 		/* Determine if enhanced switch port 0 and if so set LMC */
 		if (osm_switch_sp0_is_lmc_capable(p_node->sw, p_mgr->p_subn)) {
-			/* M_KeyProtectBits are currently always zero */
-			p_pi->mkey_lmc = p_mgr->p_subn->opt.lmc;
+			/* p_pi->mkey_lmc is initialized earlier */
+			ib_port_info_set_lmc(p_pi, p_mgr->p_subn->opt.lmc);
 			if (ib_port_info_get_lmc(p_pi) !=
-			    ib_port_info_get_lmc(p_old_pi) ||
-			    ib_port_info_get_mpb(p_pi) !=
-			    ib_port_info_get_mpb(p_old_pi))
+			    ib_port_info_get_lmc(p_old_pi))
 				send_set = TRUE;
 		}
 	}
diff --git a/opensm/osm_link_mgr.c b/opensm/osm_link_mgr.c
index b700977..7799c46 100644
--- a/opensm/osm_link_mgr.c
+++ b/opensm/osm_link_mgr.c
@@ -240,8 +240,8 @@ static int link_mgr_set_physp_pi(osm_sm_t * sm, IN osm_physp_t * p_physp,
 				   sizeof(p_pi->m_key_lease_period)))
 				send_set = TRUE;
 
-			/* M_KeyProtectBits are currently always zero */
 			p_pi->mkey_lmc = 0;
+			ib_port_info_set_mpb(p_pi, sm->p_subn->opt.m_key_protect_bits);
 			if (esp0 == FALSE || sm->p_subn->opt.lmc_esp0)
 				ib_port_info_set_lmc(p_pi, sm->p_subn->opt.lmc);
 			if (ib_port_info_get_lmc(p_old_pi) !=
diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 3e461f5..a4c5150 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -699,6 +699,7 @@ static const opt_rec_t opt_tbl[] = {
 	{ "sa_key", OPT_OFFSET(sa_key), opts_parse_net64, NULL, 1 },
 	{ "subnet_prefix", OPT_OFFSET(subnet_prefix), opts_parse_net64, NULL, 1 },
 	{ "m_key_lease_period", OPT_OFFSET(m_key_lease_period), opts_parse_net16, NULL, 1 },
+	{ "m_key_protection_level", OPT_OFFSET(m_key_protect_bits), opts_parse_uint8, NULL, 1 },
 	{ "sweep_interval", OPT_OFFSET(sweep_interval), opts_parse_uint32, NULL, 1 },
 	{ "max_wire_smps", OPT_OFFSET(max_wire_smps), opts_parse_uint32, NULL, 1 },
 	{ "max_wire_smps2", OPT_OFFSET(max_wire_smps2), opts_parse_uint32, NULL, 1 },
@@ -1318,6 +1319,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
 	p_opt->sa_key = OSM_DEFAULT_SA_KEY;
 	p_opt->subnet_prefix = IB_DEFAULT_SUBNET_PREFIX;
 	p_opt->m_key_lease_period = 0;
+	p_opt->m_key_protect_bits = 0;
 	p_opt->sweep_interval = OSM_DEFAULT_SWEEP_INTERVAL_SECS;
 	p_opt->max_wire_smps = OSM_DEFAULT_SMP_MAX_ON_WIRE;
 	p_opt->max_wire_smps2 = p_opt->max_wire_smps;
@@ -2071,6 +2073,8 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t * p_opts)
 		"m_key 0x%016" PRIx64 "\n\n"
 		"# The lease period used for the M_Key on this subnet in [sec]\n"
 		"m_key_lease_period %u\n\n"
+		"# The protection level used for the M_Key on this subnet\n"
+		"m_key_protection_level %u\n\n"
 		"# SM_Key value of the SM used for SM authentication\n"
 		"sm_key 0x%016" PRIx64 "\n\n"
 		"# SM_Key value to qualify rcv SA queries as 'trusted'\n"
@@ -2152,6 +2156,7 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t * p_opts)
 		cl_ntoh64(p_opts->guid),
 		cl_ntoh64(p_opts->m_key),
 		cl_ntoh16(p_opts->m_key_lease_period),
+		p_opts->m_key_protect_bits,
 		cl_ntoh64(p_opts->sm_key),
 		cl_ntoh64(p_opts->sa_key),
 		cl_ntoh64(p_opts->subnet_prefix),
-- 
1.7.9.2

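The patch above replaces `p_pi->mkey_lmc = ...opt.lmc` with separate setters because MKeyProtectBits and LMC share one PortInfo byte. Assigning the LMC to the whole byte would therefore clear the protect bits. The sketch below assumes the layout used by ib_port_info_set_mpb()/ib_port_info_set_lmc(), with the protect bits in bits 7:6 and the LMC in bits 2:0. The macros and helper names are hypothetical.

```c
#include <stdint.h>

#define MPB_SHIFT 6
#define MPB_MASK  0xC0 /* MKeyProtectBits, bits 7:6 (assumed layout) */
#define LMC_MASK  0x07 /* LMC, bits 2:0 (assumed layout) */

/* Replace only the protect bits, leaving the rest of the byte intact. */
static void set_mpb(uint8_t *mkey_lmc, uint8_t mpb)
{
	*mkey_lmc = (uint8_t)((*mkey_lmc & ~MPB_MASK) |
			      ((mpb << MPB_SHIFT) & MPB_MASK));
}

/* Replace only the LMC, leaving the protect bits intact. */
static void set_lmc(uint8_t *mkey_lmc, uint8_t lmc)
{
	*mkey_lmc = (uint8_t)((*mkey_lmc & ~LMC_MASK) | (lmc & LMC_MASK));
}

static uint8_t get_mpb(uint8_t mkey_lmc)
{
	return (uint8_t)((mkey_lmc & MPB_MASK) >> MPB_SHIFT);
}

static uint8_t get_lmc(uint8_t mkey_lmc)
{
	return (uint8_t)(mkey_lmc & LMC_MASK);
}
```

This is also why lid_mgr_set_physp_pi() now zeroes mkey_lmc once, before either field is set. Each setter then touches only its own bits.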

* [PATCH 5/9 v2] opensm: Log errors on SubnGet timeouts
From: Jim Foraker @ 2012-08-01 14:52 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: weiny2-i2BcT+NCU+M, alexne-LTKPlh9/A5tDPfheJLI6IQ, Jim Foraker

At protection levels >= 2, CAs do not respond to SubnGets
that lack a valid mkey.  Log an error when such a request
times out.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 opensm/osm_sm_mad_ctrl.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/opensm/osm_sm_mad_ctrl.c b/opensm/osm_sm_mad_ctrl.c
index e4c8d94..c384eca 100644
--- a/opensm/osm_sm_mad_ctrl.c
+++ b/opensm/osm_sm_mad_ctrl.c
@@ -743,6 +743,13 @@ static void sm_mad_ctrl_send_err_cb(IN void *context, IN osm_madw_t * p_madw)
 			cl_ntoh16(p_smp->attr_id),
 			ib_get_sm_attr_str(p_smp->attr_id));
 		p_ctrl->p_subn->subnet_initialization_error = TRUE;
+	} else if (p_madw->status == IB_TIMEOUT &&
+		   p_smp->method == IB_MAD_METHOD_GET) {
+		OSM_LOG(p_ctrl->p_log, OSM_LOG_ERROR, "ERR 3120: "
+			"Timeout while getting attribute 0x%X (%s); "
+			"Possible mis-set mkey?\n",
+			cl_ntoh16(p_smp->attr_id),
+			ib_get_sm_attr_str(p_smp->attr_id));
 	}
 
 	osm_dump_dr_smp_v2(p_ctrl->p_log, p_smp, FILE_ID, OSM_LOG_VERBOSE);
-- 
1.7.9.2

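The patch above treats a SubnGet timeout as a hint that the mkey may be wrong. The toy model below shows why, for a port whose M_Key is non-zero and an SMP carrying a mismatched key. The model is an assumption. It is based on the commit message (levels >= 2 ignore such Gets) and on the usual IBA rule that a Set with a bad M_Key is not answered. It ignores finer points such as a level-1 port returning its M_Key as zero. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model (assumption): does a port whose M_Key is non-zero answer an
 * SMP? A matching key is always answered; with a mismatched key, only a
 * Get is answered, and only at protection levels 0 and 1.
 */
static bool port_answers(uint8_t protect_bits, bool is_get, bool mkey_matches)
{
	if (mkey_matches)
		return true;
	return is_get && protect_bits < 2;
}

/* Mirrors the new branch in sm_mad_ctrl_send_err_cb(): flag timed-out Gets. */
static bool suspect_mkey(bool timed_out, bool is_get)
{
	return timed_out && is_get;
}
```

Putting the two together: at level 2, a Get with a wrong key gets no answer, the MAD times out, and suspect_mkey() is true. That is the situation ERR 3120 reports.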
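The neighbors domain in the patch below keys each entry with a string produced by pack_neighbor(), of the form `0x<16 hex digits>:<port>`, and stores the peer in the same format. The round-trip sketch here uses that format, but it is a variant, not the patch's code. It parses with the strtoull()/strtoul() end pointers instead of strtok_r() on a copy, and it rejects a port above 255 instead of relying on CL_ASSERT. The NEIGHBOR_STR_LEN name is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* "0x" + 16 hex digits + ':' + up to 3 digits + NUL = 23 bytes. */
#define NEIGHBOR_STR_LEN 24

static void pack_neighbor(uint64_t guid, uint8_t portnum, char *p_str,
			  size_t len)
{
	snprintf(p_str, len, "0x%016" PRIx64 ":%u", guid, (unsigned)portnum);
}

/* Returns 0 on success, 1 if p_str is not "<guid>:<port>". */
static int unpack_neighbor(const char *p_str, uint64_t *p_guid,
			   uint8_t *p_portnum)
{
	const char *p_port;
	char *end;
	unsigned long long guid;
	unsigned long port;

	guid = strtoull(p_str, &end, 0); /* base 0 accepts the 0x prefix */
	if (end == p_str || *end != ':')
		return 1;

	p_port = end + 1;
	port = strtoul(p_port, &end, 10); /* the port is written in decimal */
	if (end == p_port || *end != '\0' || port > 0xFF)
		return 1;

	*p_guid = (uint64_t)guid;
	*p_portnum = (uint8_t)port;
	return 0;
}
```

Parsing in place avoids the fixed-size temporary copy. Rejecting an oversized port also keeps a corrupted cache file from tripping an assertion in debug builds.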

* [PATCH 6/9 v2] opensm: Add neighboring link cache file
From: Jim Foraker @ 2012-08-01 14:52 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: weiny2-i2BcT+NCU+M, alexne-LTKPlh9/A5tDPfheJLI6IQ, Jim Foraker

At high mkey protection levels (i.e., 2), an initializing OpenSM
may run into a chicken-and-egg problem: it needs the guid of a
previously configured HCA to determine which mkey to use when
requesting that guid in the NodeInfo SMP.  Caching the guids/port
numbers at either end of each link between restarts avoids this
problem.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 include/opensm/osm_db_pack.h |  185 ++++++++++++++++++++++++++++++++++++++++++
 include/opensm/osm_subnet.h  |    1 +
 opensm/osm_db_pack.c         |  103 +++++++++++++++++++++++
 opensm/osm_node_info_rcv.c   |   17 +++-
 opensm/osm_req.c             |    9 ++
 opensm/osm_state_mgr.c       |    1 +
 opensm/osm_subnet.c          |  103 +++++++++++++++++++++++
 7 files changed, 418 insertions(+), 1 deletion(-)

diff --git a/include/opensm/osm_db_pack.h b/include/opensm/osm_db_pack.h
index af43ba1..f2d7af2 100644
--- a/include/opensm/osm_db_pack.h
+++ b/include/opensm/osm_db_pack.h
@@ -379,5 +379,190 @@ int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid);
 * osm_db_guid2mkey_get, osm_db_guid2mkey_set
 *********/
 
+/****f* OpenSM: DB-Pack/osm_db_neighbor_init
+* NAME
+*	osm_db_neighbor_init
+*
+* DESCRIPTION
+*	Initialize a domain for the neighbors table
+*
+* SYNOPSIS
+*/
+static inline osm_db_domain_t *osm_db_neighbor_init(IN osm_db_t * p_db)
+{
+	return osm_db_domain_init(p_db, "neighbors");
+}
+
+/*
+* PARAMETERS
+*	p_db
+*		[in] Pointer to the database object to construct
+*
+* RETURN VALUES
+*	The pointer to the newly allocated domain object or NULL.
+*
+* NOTE: DB domains are destroyed by the osm_db_destroy
+*
+* SEE ALSO
+*	Database, osm_db_init, osm_db_destroy
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_neighbor_elem
+* NAME
+*	osm_db_neighbor_elem
+*
+* DESCRIPTION
+*	List element holding one port (guid and port number) from the neighbor table
+*
+* SYNOPSIS
+*/
+typedef struct osm_db_neighbor_elem {
+	cl_list_item_t item;
+	uint64_t guid;
+	uint8_t portnum;
+} osm_db_neighbor_elem_t;
+/*
+* FIELDS
+*	item
+*		required for list manipulations
+*
+*  guid
+*  portnum
+*
+************/
+
+/****f* OpenSM: DB-Pack/osm_db_neighbor_guids
+* NAME
+*	osm_db_neighbor_guids
+*
+* DESCRIPTION
+*	Provides back a list of neighbor elements.
+*
+* SYNOPSIS
+*/
+int osm_db_neighbor_guids(IN osm_db_domain_t * p_neighbor,
+			  OUT cl_qlist_t * p_guid_list);
+/*
+* PARAMETERS
+*	p_neighbor
+*		[in] Pointer to the neighbor domain
+*
+*  p_guid_list
+*     [out] A quick list of neighbor elements of type osm_db_neighbor_elem_t
+*
+* RETURN VALUES
+*	0 if successful
+*
+* NOTE: the output qlist should be initialized and each item freed
+*       by the caller, then destroyed.
+*
+* SEE ALSO
+* osm_db_neighbor_init, osm_db_neighbor_guids, osm_db_neighbor_get
+* osm_db_neighbor_set, osm_db_neighbor_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_neighbor_get
+* NAME
+*	osm_db_neighbor_get
+*
+* DESCRIPTION
+*	Get a neighbor's guid by given guid/port.
+*
+* SYNOPSIS
+*/
+int osm_db_neighbor_get(IN osm_db_domain_t * p_neighbor, IN uint64_t guid1,
+			IN uint8_t port1, OUT uint64_t * p_guid2,
+			OUT uint8_t * p_port2);
+/*
+* PARAMETERS
+*	p_neighbor
+*		[in] Pointer to the neighbor domain
+*
+*  guid1
+*     [in] The guid to look for
+*
+*  port1
+*     [in] The port to look for
+*
+*  p_guid2
+*     [out] Pointer to the resulting guid of the neighboring port.
+*
+*  p_port2
+*     [out] Pointer to the resulting port of the neighboring port.
+*
+* RETURN VALUES
+*	0 if successful, 1 if no neighbor is recorded for guid1/port1.
+*
+* SEE ALSO
+* osm_db_neighbor_init, osm_db_neighbor_guids
+* osm_db_neighbor_set, osm_db_neighbor_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_neighbor_set
+* NAME
+*	osm_db_neighbor_set
+*
+* DESCRIPTION
+*	Set up a relationship between two ports
+*
+* SYNOPSIS
+*/
+int osm_db_neighbor_set(IN osm_db_domain_t * p_neighbor, IN uint64_t guid1,
+			IN uint8_t port1, IN uint64_t guid2, IN uint8_t port2);
+/*
+* PARAMETERS
+*	p_neighbor
+*		[in] Pointer to the neighbor domain
+*
+*  guid1
+*     [in] The first guid in the relationship
+*
+*  port1
+*     [in] The first port in the relationship
+*
+*  guid2
+*     [in] The second guid in the relationship
+*
+*  port2
+*     [in] The second port in the relationship
+*
+* RETURN VALUES
+*	0 if successful
+*
+* SEE ALSO
+* osm_db_neighbor_init, osm_db_neighbor_guids
+* osm_db_neighbor_get, osm_db_neighbor_delete
+*********/
+
+/****f* OpenSM: DB-Pack/osm_db_neighbor_delete
+* NAME
+*	osm_db_neighbor_delete
+*
+* DESCRIPTION
+*	Delete the relationship between two ports
+*
+* SYNOPSIS
+*/
+int osm_db_neighbor_delete(IN osm_db_domain_t * p_neighbor,
+			   IN uint64_t guid, IN uint8_t port);
+/*
+* PARAMETERS
+*	p_neighbor
+*		[in] Pointer to the neighbor domain
+*
+*  guid
+*     [in] The guid to look for
+*
+*  port
+*     [in] The port to look for
+*
+* RETURN VALUES
+*	0 if successful otherwise 1
+*
+* SEE ALSO
+* osm_db_neighbor_init, osm_db_neighbor_guids
+* osm_db_neighbor_get, osm_db_neighbor_set
+*********/
+
 END_C_DECLS
 #endif				/* _OSM_DB_PACK_H_ */
diff --git a/include/opensm/osm_subnet.h b/include/opensm/osm_subnet.h
index ee97f14..f0d24cb 100644
--- a/include/opensm/osm_subnet.h
+++ b/include/opensm/osm_subnet.h
@@ -759,6 +759,7 @@ typedef struct osm_subn {
 	unsigned need_update;
 	cl_fmap_t mgrp_mgid_tbl;
 	osm_db_domain_t *p_g2m;
+	osm_db_domain_t *p_neighbor;
 	void *mboxes[IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO + 1];
 } osm_subn_t;
 /*
diff --git a/opensm/osm_db_pack.c b/opensm/osm_db_pack.c
index 57c3a66..ea00c31 100644
--- a/opensm/osm_db_pack.c
+++ b/opensm/osm_db_pack.c
@@ -95,6 +95,37 @@ static inline uint64_t unpack_mkey(char *p_mkey_str)
 	return strtoull(p_mkey_str, NULL, 0);
 }
 
+static inline void pack_neighbor(uint64_t guid, uint8_t portnum, char *p_str)
+{
+	sprintf(p_str, "0x%016" PRIx64 ":%u", guid, portnum);
+}
+
+static inline int unpack_neighbor(char *p_str, uint64_t *guid,
+				  uint8_t *portnum)
+{
+	char tmp_str[24];
+	char *p_num, *p_next;
+	unsigned long tmp_port;
+
+	strncpy(tmp_str, p_str, 23);
+	tmp_str[23] = '\0';
+	p_num = strtok_r(tmp_str, ":", &p_next);
+	if (!p_num)
+		return 1;
+	if (guid)
+		*guid = strtoull(p_num, NULL, 0);
+
+	p_num = strtok_r(NULL, ":", &p_next);
+	if (!p_num)
+		return 1;
+	if (portnum) {
+		tmp_port = strtoul(p_num, NULL, 0);
+		CL_ASSERT(tmp_port < 0x100);
+		*portnum = (uint8_t) tmp_port;
+	}
+
+	return 0;
+}
 
 int osm_db_guid2lid_guids(IN osm_db_domain_t * p_g2l,
 			  OUT cl_qlist_t * p_guid_list)
@@ -224,3 +255,75 @@ int osm_db_guid2mkey_delete(IN osm_db_domain_t * p_g2m, IN uint64_t guid)
 	pack_guid(guid, guid_str);
 	return osm_db_delete(p_g2m, guid_str);
 }
+
+int osm_db_neighbor_guids(IN osm_db_domain_t * p_neighbor,
+			  OUT cl_qlist_t * p_neighbor_list)
+{
+	char *p_key;
+	cl_list_t keys;
+	osm_db_neighbor_elem_t *p_neighbor_elem;
+
+	cl_list_construct(&keys);
+	cl_list_init(&keys, 10);
+
+	if (osm_db_keys(p_neighbor, &keys))
+		return 1;
+
+	while ((p_key = cl_list_remove_head(&keys)) != NULL) {
+		p_neighbor_elem =
+		    (osm_db_neighbor_elem_t *) malloc(sizeof(osm_db_neighbor_elem_t));
+		CL_ASSERT(p_neighbor_elem != NULL);
+
+		unpack_neighbor(p_key, &p_neighbor_elem->guid,
+				&p_neighbor_elem->portnum);
+		cl_qlist_insert_head(p_neighbor_list, &p_neighbor_elem->item);
+	}
+
+	cl_list_destroy(&keys);
+	return 0;
+}
+
+int osm_db_neighbor_get(IN osm_db_domain_t * p_neighbor, IN uint64_t guid1,
+			IN uint8_t portnum1, OUT uint64_t * p_guid2,
+			OUT uint8_t * p_portnum2)
+{
+	char neighbor_str[24];
+	char *p_other_str;
+	uint64_t temp_guid;
+	uint8_t temp_portnum;
+
+	pack_neighbor(guid1, portnum1, neighbor_str);
+	p_other_str = osm_db_lookup(p_neighbor, neighbor_str);
+	if (!p_other_str)
+		return 1;
+	if (unpack_neighbor(p_other_str, &temp_guid, &temp_portnum))
+		return 1;
+
+	if (p_guid2)
+		*p_guid2 = temp_guid;
+	if (p_portnum2)
+		*p_portnum2 = temp_portnum;
+
+	return 0;
+}
+
+int osm_db_neighbor_set(IN osm_db_domain_t * p_neighbor, IN uint64_t guid1,
+			IN uint8_t portnum1, IN uint64_t guid2,
+			IN uint8_t portnum2)
+{
+	char n1_str[24], n2_str[24];
+
+	pack_neighbor(guid1, portnum1, n1_str);
+	pack_neighbor(guid2, portnum2, n2_str);
+
+	return osm_db_update(p_neighbor, n1_str, n2_str);
+}
+
+int osm_db_neighbor_delete(IN osm_db_domain_t * p_neighbor, IN uint64_t guid,
+			   IN uint8_t portnum)
+{
+	char n_str[24];
+
+	pack_neighbor(guid, portnum, n_str);
+	return osm_db_delete(p_neighbor, n_str);
+}
diff --git a/opensm/osm_node_info_rcv.c b/opensm/osm_node_info_rcv.c
index c35aea4..25546d9 100644
--- a/opensm/osm_node_info_rcv.c
+++ b/opensm/osm_node_info_rcv.c
@@ -63,6 +63,7 @@
 #include <opensm/osm_msgdef.h>
 #include <opensm/osm_opensm.h>
 #include <opensm/osm_ucast_mgr.h>
+#include <opensm/osm_db_pack.h>
 
 static void report_duplicated_guid(IN osm_sm_t * sm, osm_physp_t * p_physp,
 				   osm_node_t * p_neighbor_node,
@@ -134,7 +135,7 @@ static void ni_rcv_set_links(IN osm_sm_t * sm, osm_node_t * p_node,
 			     const osm_ni_context_t * p_ni_context)
 {
 	osm_node_t *p_neighbor_node;
-	osm_physp_t *p_physp;
+	osm_physp_t *p_physp, *p_remote_physp;
 
 	OSM_LOG_ENTER(sm->p_log);
 
@@ -245,6 +246,20 @@ static void ni_rcv_set_links(IN osm_sm_t * sm, osm_node_t * p_node,
 	osm_node_link(p_node, port_num, p_neighbor_node,
 		      p_ni_context->port_num);
 
+	p_physp = osm_node_get_physp_ptr(p_node, port_num);
+	p_remote_physp = osm_node_get_physp_ptr(p_neighbor_node,
+						p_ni_context->port_num);
+	osm_db_neighbor_set(sm->p_subn->p_neighbor,
+			    cl_ntoh64(osm_physp_get_port_guid(p_physp)),
+			    port_num,
+			    cl_ntoh64(osm_physp_get_port_guid(p_remote_physp)),
+			    p_ni_context->port_num);
+	osm_db_neighbor_set(sm->p_subn->p_neighbor,
+			    cl_ntoh64(osm_physp_get_port_guid(p_remote_physp)),
+			    p_ni_context->port_num,
+			    cl_ntoh64(osm_physp_get_port_guid(p_physp)),
+			    port_num);
+
 _exit:
 	OSM_LOG_EXIT(sm->p_log);
 }
diff --git a/opensm/osm_req.c b/opensm/osm_req.c
index d397b14..5f46cd3 100644
--- a/opensm/osm_req.c
+++ b/opensm/osm_req.c
@@ -115,6 +115,15 @@ static ib_net64_t req_determine_mkey(IN osm_sm_t * sm,
 		goto Remote_Guid;
 	}
 
+	OSM_LOG(sm->p_log, OSM_LOG_DEBUG, "Target port guid unknown, "
+		"using persistent DB\n");
+	if (!osm_db_neighbor_get(sm->p_subn->p_neighbor,
+				 cl_ntoh64(p_physp->port_guid),
+				 p_physp->port_num,
+				 &dest_port_guid, NULL)) {
+		dest_port_guid = cl_hton64(dest_port_guid);
+	}
+
 Remote_Guid:
 	if (dest_port_guid) {
 		if (!osm_db_guid2mkey_get(sm->p_subn->p_g2m,
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index 63f7347..175741f 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -1465,6 +1465,7 @@ repeat_discovery:
 
 	/* Write a new copy of our persistent guid2mkey database */
 	osm_db_store(sm->p_subn->p_g2m);
+	osm_db_store(sm->p_subn->p_neighbor);
 }
 
 static void do_process_mgrp_queue(osm_sm_t * sm)
diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index a4c5150..3e923f2 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -893,6 +893,80 @@ Exit:
 	OSM_LOG_EXIT(&(p_subn->p_osm->log));
 }
 
+static void subn_validate_neighbor(osm_subn_t *p_subn)
+{
+	cl_qlist_t entries;
+	osm_db_neighbor_elem_t *p_item;
+	boolean_t valid_entry;
+	uint64_t guid;
+	uint8_t port;
+
+	OSM_LOG_ENTER(&(p_subn->p_osm->log));
+	cl_qlist_init(&entries);
+
+	if (osm_db_neighbor_guids(p_subn->p_neighbor, &entries)) {
+		OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR, "ERR 7512: "
+			"could not get neighbor entry list\n");
+		goto Exit;
+	}
+
+	while ((p_item =
+		(osm_db_neighbor_elem_t *) cl_qlist_remove_head(&entries))
+	       != (osm_db_neighbor_elem_t *) cl_qlist_end(&entries)) {
+		valid_entry = TRUE;
+
+		OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_DEBUG,
+			"Validating neighbor for 0x%016" PRIx64 ", port %d\n",
+			p_item->guid, p_item->portnum);
+		if (p_item->guid == 0) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7513: found invalid zero guid\n");
+			valid_entry = FALSE;
+		} else if (p_item->portnum == 0) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7514: found invalid zero port\n");
+			valid_entry = FALSE;
+		} else if (osm_db_neighbor_get(p_subn->p_neighbor,
+					       p_item->guid, p_item->portnum,
+					       &guid, &port)) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7515: could not find neighbor for "
+				"guid: 0x%016" PRIx64 "\n", p_item->guid);
+			valid_entry = FALSE;
+		} else if (guid == 0) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7516: found invalid neighbor "
+				"zero guid\n");
+			valid_entry = FALSE;
+		} else if (port == 0) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7517: found invalid neighbor "
+				"zero port\n");
+			valid_entry = FALSE;
+		} else if (osm_db_neighbor_get(p_subn->p_neighbor,
+					       guid, port, &guid, &port) ||
+			guid != p_item->guid || port != p_item->portnum) {
+			OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+				"ERR 7518: neighbor does not point "
+				"back at us\n");
+			valid_entry = FALSE;
+		}
+
+		if (valid_entry == FALSE) {
+			if (osm_db_neighbor_delete(p_subn->p_neighbor,
+						   p_item->guid,
+						   p_item->portnum))
+				OSM_LOG(&(p_subn->p_osm->log), OSM_LOG_ERROR,
+					"ERR 7519: failed to delete entry for "
+					"guid:0x%016" PRIx64 " port:%u\n",
+					p_item->guid, p_item->portnum);
+		}
+	}
+
+Exit:
+	OSM_LOG_EXIT(&(p_subn->p_osm->log));
+}
+
 void osm_subn_construct(IN osm_subn_t * p_subn)
 {
 	memset(p_subn, 0, sizeof(*p_subn));
@@ -1130,6 +1204,35 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN osm_opensm_t * p_osm,
 
 	subn_validate_g2m(p_subn);
 
+	/* Initialize the neighbor database */
+	p_subn->p_neighbor = osm_db_domain_init(&(p_osm->db), "neighbors");
+	if (!p_subn->p_neighbor) {
+		OSM_LOG(&(p_osm->log), OSM_LOG_ERROR, "ERR 7520: Error "
+			"initializing neighbor link persistent database\n");
+		return IB_ERROR;
+	}
+
+	if (osm_db_restore(p_subn->p_neighbor)) {
+#ifndef __WIN__
+		/*
+		 * When Windows is BSODing, it might corrupt files that
+		 * were previously opened for writing, even if the files
+		 * are closed, so we might see corrupted neighbors file.
+		 */
+		if (p_subn->opt.exit_on_fatal) {
+			osm_log(&(p_osm->log), OSM_LOG_SYS,
+				"FATAL: Error restoring neighbor link "
+				"persistent database\n");
+			return IB_ERROR;
+		} else
+#endif
+			OSM_LOG(&(p_osm->log), OSM_LOG_ERROR,
+				"ERR 7521: Error restoring neighbor link "
+				"persistent database\n");
+	}
+
+	subn_validate_neighbor(p_subn);
+
 	return IB_SUCCESS;
 }
 
-- 
1.7.9.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* [PATCH 7/9 v2] opensm: Check for valid mkey protection level in config file
       [not found]         ` <1343832755-26753-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
                             ` (4 preceding siblings ...)
  2012-08-01 14:52           ` [PATCH 6/9 v2] opensm: Add neighboring link cache file Jim Foraker
@ 2012-08-01 14:52           ` Jim Foraker
  2012-08-01 14:52           ` [PATCH 8/9 v2] opensm: Ensure sweep interval/mkey lease are sensibly set Jim Foraker
  2012-08-01 14:52           ` [PATCH 9/9 v2] opensm/scripts/sldd.sh: Update to support guid2mkey/neighbors Jim Foraker
  7 siblings, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-08-01 14:52 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: weiny2-i2BcT+NCU+M, alexne-LTKPlh9/A5tDPfheJLI6IQ, Jim Foraker

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 opensm/osm_subnet.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 3e923f2..5ae49bb 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -2005,6 +2005,13 @@ int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
 	}
 #endif
 
+	if (p_opts->m_key_protect_bits > 3) {
+		log_report(" Invalid Cached Option Value:"
+			   "m_key_protection_level = %u Setting to %u "
+			   "instead\n", p_opts->m_key_protect_bits, 2);
+		p_opts->m_key_protect_bits = 2;
+	}
+
 	if (p_opts->root_guid_file != NULL) {
 		FILE *root_file = fopen(p_opts->root_guid_file, "r");
 		if (!root_file) {
-- 
1.7.9.2

* [PATCH 8/9 v2] opensm: Ensure sweep interval/mkey lease are sensibly set
       [not found]         ` <1343832755-26753-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
                             ` (5 preceding siblings ...)
  2012-08-01 14:52           ` [PATCH 7/9 v2] opensm: Check for valid mkey protection level in config file Jim Foraker
@ 2012-08-01 14:52           ` Jim Foraker
  2012-08-01 14:52           ` [PATCH 9/9 v2] opensm/scripts/sldd.sh: Update to support guid2mkey/neighbors Jim Foraker
  7 siblings, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-08-01 14:52 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: weiny2-i2BcT+NCU+M, alexne-LTKPlh9/A5tDPfheJLI6IQ, Jim Foraker

If mkeys are protected, sweeping should always be enabled, with an
interval shorter than the mkey lease period, to ensure a missed trap
doesn't lead to mkey exposure.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 opensm/osm_subnet.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/opensm/osm_subnet.c b/opensm/osm_subnet.c
index 5ae49bb..460582a 100644
--- a/opensm/osm_subnet.c
+++ b/opensm/osm_subnet.c
@@ -2011,6 +2011,26 @@ int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
 			   "instead\n", p_opts->m_key_protect_bits, 2);
 		p_opts->m_key_protect_bits = 2;
 	}
+	if (p_opts->m_key_protect_bits && p_opts->m_key_lease_period) {
+		if (!p_opts->sweep_interval) {
+			log_report(" Sweep disabled with protected mkey "
+				   "leases in effect; re-enabling sweeping "
+				   "with interval %u\n",
+				   cl_ntoh16(p_opts->m_key_lease_period) - 1);
+			p_opts->sweep_interval =
+				cl_ntoh16(p_opts->m_key_lease_period) - 1;
+		}
+		if (p_opts->sweep_interval >=
+			cl_ntoh16(p_opts->m_key_lease_period)) {
+			log_report(" Sweep interval %u >= mkey lease period "
+				   "%u. Setting lease period to %u\n",
+				   p_opts->sweep_interval,
+				   cl_ntoh16(p_opts->m_key_lease_period),
+				   p_opts->sweep_interval + 1);
+			p_opts->m_key_lease_period =
+				cl_hton16(p_opts->sweep_interval + 1);
+		}
+	}
 
 	if (p_opts->root_guid_file != NULL) {
 		FILE *root_file = fopen(p_opts->root_guid_file, "r");
-- 
1.7.9.2

* [PATCH 9/9 v2] opensm/scripts/sldd.sh: Update to support guid2mkey/neighbors
       [not found]         ` <1343832755-26753-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
                             ` (6 preceding siblings ...)
  2012-08-01 14:52           ` [PATCH 8/9 v2] opensm: Ensure sweep interval/mkey lease are sensibly set Jim Foraker
@ 2012-08-01 14:52           ` Jim Foraker
  7 siblings, 0 replies; 33+ messages in thread
From: Jim Foraker @ 2012-08-01 14:52 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: weiny2-i2BcT+NCU+M, alexne-LTKPlh9/A5tDPfheJLI6IQ, Jim Foraker

Essentially a straightforward parameterization of all of the
functions, with a for loop wrapped around the main body, so
that the script can synchronize multiple files.

This allows us to add the new guid2mkey and neighbors cache
files to the default file list.

Semantics for the CACHE_FILE environment variable have changed
slightly; instead of specifying a path to the guid2lid file,
it specifies a colon-separated list of paths to be synchronized,
which replaces the default list in its entirety.

Signed-off-by: Jim Foraker <foraker1-i2BcT+NCU+M@public.gmane.org>
---
 scripts/sldd.sh.in |  143 ++++++++++++++++++++++++++++------------------------
 1 file changed, 78 insertions(+), 65 deletions(-)

diff --git a/scripts/sldd.sh.in b/scripts/sldd.sh.in
index f7635fe..9b0e282 100755
--- a/scripts/sldd.sh.in
+++ b/scripts/sldd.sh.in
@@ -49,9 +49,9 @@ fi
 
 SLDD_DEBUG=${SLDD_DEBUG:-0}
 
-CACHE_FILE=${CACHE_FILE:-/var/cache/opensm/guid2lid}
-CACHE_DIR=$(dirname ${CACHE_FILE})
-tmp_cache=${CACHE_FILE}.tmp
+CACHE_FILE=${CACHE_FILE:-/var/cache/opensm/guid2lid:/var/cache/opensm/guid2mkey:/var/cache/opensm/neighbors}
+declare -a arr_CACHE_FILES
+arr_CACHE_FILES=(`echo $CACHE_FILE| sed 's/:/\n/g' | sort | uniq`)
 
 PING='ping -w 1 -c 1'
 
@@ -104,8 +104,8 @@ is_local()
 
 update_remote_cache()
 {
-	/bin/rm -f ${CACHE_FILE}.upd
-	/bin/cp -a ${CACHE_FILE} ${CACHE_FILE}.upd
+	/bin/rm -f "$1.upd"
+	/bin/cp -a "$1" "$1.upd"
 
 	[ $SLDD_DEBUG -eq 1 ] &&
 	echo "Updating remote cache file"
@@ -118,17 +118,18 @@ update_remote_cache()
 		fi
 
 		if is_alive $host; then
-			stat=$($RSH $host "/bin/mkdir -p ${CACHE_DIR} > /dev/null 2>&1; /bin/rm -f ${CACHE_FILE}.${local_host} > /dev/null 2>&1; echo \$?" | tr -d '[:space:]')
+			cache_dir=$(dirname "$1")
+			stat=$($RSH $host "/bin/mkdir -p ${cache_dir} > /dev/null 2>&1; /bin/rm -f "$1.${local_host}" > /dev/null 2>&1; echo \$?" | tr -d '[:space:]')
 			if [ "X${stat}" == "X0" ]; then
 				[ $SLDD_DEBUG -eq 1 ] &&
 				echo "Updating $host"
-				logger -i "SLDD: updating $host with ${CACHE_FILE}"
-				$RCP ${CACHE_FILE}.upd ${host}:${CACHE_FILE}.${local_host}
-				/bin/cp ${CACHE_FILE}.upd ${CACHE_FILE}.${host}
+				logger -i "SLDD: updating $host with $1"
+				$RCP "$1.upd" "${host}:$1.${local_host}"
+				/bin/cp "$1.upd" "$1.${host}"
 			else
 				[ $SLDD_DEBUG -eq 1 ] &&
 				echo "$RSH to $host failed."
-				logger -i "SLDD: Failed to update $host with ${CACHE_FILE}. $RSH without password should be enabled"
+				logger -i "SLDD: Failed to update $host with $1. $RSH without password should be enabled"
 				exit 5
 			fi
 		else
@@ -142,21 +143,21 @@ update_remote_cache()
 get_latest_remote_cache()
 {
 	# Find most updated remote cache file (the suffix should be like ip address: *.*.*.*)
-	echo -n "$(/bin/ls -1t ${CACHE_FILE}.*.* 2> /dev/null | head -1)"
+	echo -n "$(/bin/ls -1t $1.*.* 2> /dev/null | head -1)"
 }
 
 get_largest_remote_cache()
 {
 	# Find largest (size) remote cache file (the suffix should be like ip address: *.*.*.*)
-	echo -n "$(/bin/ls -1S ${CACHE_FILE}.*.* 2> /dev/null | head -1)"
+	echo -n "$(/bin/ls -1S $1.*.* 2> /dev/null | head -1)"
 }
 
 swap_cache_files()
 {
-	/bin/rm -f ${CACHE_FILE}.old
-	/bin/mv ${CACHE_FILE} ${CACHE_FILE}.old
-	/bin/cp ${largest_remote_cache} ${CACHE_FILE}
-	touch ${CACHE_FILE}.tmp
+	/bin/rm -f "$1.old"
+	/bin/mv "$1" "$1.old"
+	/bin/cp "$2" "$1"
+	touch "$1.tmp"
 }
 
 # Find local host in the osm hosts list
@@ -170,74 +171,86 @@ done
 
 # Get cache file info
 declare -i new_size=0
-declare -i last_size=0
+declare -ai arr_last_size
+for i in  ${!arr_CACHE_FILES[@]}
+do
+	arr_last_size[$i]=0
+done
 declare -i largest_remote_cache_size=0
 
-if [ -e ${CACHE_FILE} ]; then
-	last_size=$(du -b ${CACHE_FILE} | awk '{print$1}' | tr -d '[:space:]')
-else
-	touch ${CACHE_FILE} ${CACHE_FILE}.tmp
-fi
+for i in ${!arr_CACHE_FILES[@]}
+do
+	cache_file=${arr_CACHE_FILES[$i]}
+	if [ -e ${cache_file} ]; then
+		arr_last_size[$i]=$(du -b ${cache_file} | awk '{print$1}' | tr -d '[:space:]')
+	else
+		touch ${cache_file} ${cache_file}.tmp
+	fi
 
-# if [ ${last_size} -gt 0 ]; then
-# 	# First time update
-# 	update_remote_cache
-# fi
+#	if [ ${arr_last_size[$i]} -gt 0 ]; then
+#		# First time update
+#		update_remote_cache ${cache_file}
+#	fi
+done
 
 while true
 do
-	if [ -s "${CACHE_FILE}" ]; then
-		new_size=$(du -b ${CACHE_FILE} | awk '{print$1}' | tr -d '[:space:]')
-		# Check if local cache file grew from its last version or the time stamp changed
-		if [ ${new_size} -gt ${last_size} ]
-		   [ "$(/bin/ls -1t ${CACHE_FILE} ${CACHE_FILE}.tmp 2> /dev/null | head -1)"  != "${CACHE_FILE}.tmp" ]; then
-			largest_remote_cache=$(get_largest_remote_cache)
+	for i in ${!arr_CACHE_FILES[@]}
+	do
+		cache_file=${arr_CACHE_FILES[$i]}
+		if [ -s "${cache_file}" ]; then
+			new_size=$(du -b ${cache_file} | awk '{print$1}' | tr -d '[:space:]')
+			# Check if local cache file grew from its last version or the time stamp changed
+			if [ ${new_size} -gt ${arr_last_size[$i]} ] ||
+			   [ "$(/bin/ls -1t ${cache_file} ${cache_file}.tmp 2> /dev/null | head -1)"  != "${cache_file}.tmp" ]; then
+				largest_remote_cache=$(get_largest_remote_cache ${cache_file})
+				if [[ -n "${largest_remote_cache}" && -s "${largest_remote_cache}" ]]; then
+					largest_remote_cache_size=$(du -b ${largest_remote_cache} 2> /dev/null | awk '{print$1}' | tr -d '[:space:]')
+				else
+					largest_remote_cache_size=0
+				fi
+
+				# Check if local cache file is larger than remote cache file
+				if [ ${new_size} -gt ${largest_remote_cache_size} ]; then
+					[ $SLDD_DEBUG -eq 1 ] &&
+					echo "Local cache file larger than remote. Update remote cache files"
+					arr_last_size[$i]=${new_size}
+					update_remote_cache ${cache_file}
+					continue
+				fi
+			fi
+
+			largest_remote_cache=$(get_largest_remote_cache ${cache_file})
 			if [[ -n "${largest_remote_cache}" && -s "${largest_remote_cache}" ]]; then
 				largest_remote_cache_size=$(du -b ${largest_remote_cache} 2> /dev/null | awk '{print$1}' | tr -d '[:space:]')
 			else
 				largest_remote_cache_size=0
 			fi
 
-			# Check if local cache file larger than remote chache file
-			if [ ${new_size} -gt ${largest_remote_cache_size} ]; then
+			# Update local cache file from remote
+			if [ ${largest_remote_cache_size} -gt ${new_size} ]; then
 				[ $SLDD_DEBUG -eq 1 ] &&
-				echo "Local cache file larger then remote. Update remote cache files"
-				last_size=${new_size}
-				update_remote_cache
-				continue
+				echo "Local cache file shorter than remote. Use ${largest_remote_cache}"
+				logger -i "SLDD: updating local cache file with ${largest_remote_cache}"
+				swap_cache_files ${cache_file} ${largest_remote_cache}
+				arr_last_size[$i]=${largest_remote_cache_size}
 			fi
-		fi
 
-		largest_remote_cache=$(get_largest_remote_cache)
-		if [[ -n "${largest_remote_cache}" && -s "${largest_remote_cache}" ]]; then
-			largest_remote_cache_size=$(du -b ${largest_remote_cache} 2> /dev/null | awk '{print$1}' | tr -d '[:space:]')
-		else
-			largest_remote_cache_size=0
-		fi
-
-		# Update local cache file from remote
-		if [ ${largest_remote_cache_size} -gt ${new_size} ]; then
+		else # The local cache file is empty
 			[ $SLDD_DEBUG -eq 1 ] &&
-			echo "Local cache file shorter then remote. Use ${largest_remote_cache}"
-			logger -i "SLDD: updating local cache file with ${largest_remote_cache}"
-			swap_cache_files
-			last_size=${largest_remote_cache_size}
-		fi
+			echo "${cache_file} is empty"
 
-	else # The local cache file is empty
-		[ $SLDD_DEBUG -eq 1 ] &&
-		echo "${CACHE_FILE} is empty"
+			largest_remote_cache=$(get_largest_remote_cache ${cache_file})
+			if [[ -n "${largest_remote_cache}" && -s "${largest_remote_cache}" ]]; then
+				# Copy it to the current cache
+				[ $SLDD_DEBUG -eq 1 ] &&
+				echo "Local cache file is empty. Use ${largest_remote_cache}"
+				logger -i "SLDD: updating local cache file with ${largest_remote_cache}"
+				swap_cache_files ${cache_file} ${largest_remote_cache}
+			fi
 
-		largest_remote_cache=$(get_largest_remote_cache)
-		if [[ -n "${largest_remote_cache}" && -s "${largest_remote_cache}" ]]; then
-			# Copy it to the current cache
-			[ $SLDD_DEBUG -eq 1 ] &&
-			echo "Local cache file is empty. Use ${largest_remote_cache}"
-			logger -i "SLDD: updating local cache file with ${largest_remote_cache}"
-			swap_cache_files
 		fi
-
-	fi
+	done
 
 	[ $SLDD_DEBUG -eq 1 ] &&
 	echo "Sleeping ${RESCAN_TIME} seconds."
-- 
1.7.9.2

* Re: [PATCH 0/8] opensm: Improved mkey support
       [not found]     ` <1343832537.26423.8.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
  2012-08-01 14:52       ` [PATCH 1/9 v2] opensm: Add guid2mkey cache file support Jim Foraker
@ 2012-08-01 20:19       ` Alex Netes
  1 sibling, 0 replies; 33+ messages in thread
From: Alex Netes @ 2012-08-01 20:19 UTC (permalink / raw)
  To: Jim Foraker; +Cc: linux-rdma, Ira Weiny

hi Jim

On 07:48 Wed 01 Aug     , Jim Foraker wrote:
> v2 is about to be posted.  It is now a 9-patch set.  Changes from v1:
> 
> . Subnet initialization behavior changed to log errors on SubnGet
> timeouts, but not flag the init as failed, so that we don't heavy sweep
> more than necessary
> . sldd.sh modified to sync multiple files, and neighbors/guid2mkey
> caches added to its default list
> . Rebased against current HEAD
> . Several whitespace/code format/git log cleanup fixes
> 
>      There will be a man page patch coming shortly, but I want to get
> these out the door for review now.
> 
>      Jim
> 

Applied the series, thanks.

end of thread

Thread overview: 33+ messages
2012-06-26  0:54 [PATCH 0/8] opensm: Improved mkey support Jim Foraker
     [not found] ` <1340672058.5218.97.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
2012-06-26  0:54   ` [PATCH 1/8] opensm: Add guid2mkey cache file support Jim Foraker
     [not found]     ` <1340672104-18039-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
2012-06-26  0:54       ` [PATCH 2/8] opensm: Allow recovery of subnets with misset mkeys Jim Foraker
2012-06-26  0:54       ` [PATCH 3/8] Add locking where necessary around osm_req_* Jim Foraker
2012-06-26  0:55       ` [PATCH 4/8] Add support for setting mkey protection levels Jim Foraker
2012-06-26  0:55       ` [PATCH 5/8] opensm: Signal subnet init errors on SubnGet timeouts Jim Foraker
     [not found]         ` <1340672104-18039-5-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
2012-07-23 15:43           ` Alex Netes
2012-07-23 22:19             ` Jim Foraker
     [not found]               ` <1343081989.29792.12.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
2012-07-29 16:29                 ` Alex Netes
2012-07-30 17:19                   ` Foraker, Jim
2012-06-26  0:55       ` [PATCH 6/8] opensm: Add neighboring link cache file Jim Foraker
2012-06-26  0:55       ` [PATCH 7/8] opensm: Check for valid mkey protection level in config file Jim Foraker
2012-06-26  0:55       ` [PATCH 8/8] opensm: Ensure sweep interval/mkey lease are sensibly set Jim Foraker
     [not found]         ` <1340672104-18039-8-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
2012-07-24  9:01           ` Alex Netes
2012-07-24 17:40             ` Jim Foraker
2012-07-04  0:25   ` [PATCH 0/8] opensm: Improved mkey support Jim Foraker
     [not found]     ` <1341361508.5218.148.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
2012-07-04  0:25       ` [PATCH V1.1 1/8] opensm: Add guid2mkey cache file support Jim Foraker
     [not found]         ` <1341361548-30229-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
2012-07-04  0:25           ` [PATCH V1.1 3/8] Add locking where necessary around osm_req_* Jim Foraker
2012-07-23 15:55           ` [PATCH V1.1 1/8] opensm: Add guid2mkey cache file support Alex Netes
2012-07-23 22:37             ` Jim Foraker
2012-07-23 15:59       ` [PATCH 0/8] opensm: Improved mkey support Alex Netes
2012-07-23 22:28         ` Jim Foraker
2012-08-01 14:48   ` Jim Foraker
     [not found]     ` <1343832537.26423.8.camel-mxTxeWJot8FliZ7u+bvwcg@public.gmane.org>
2012-08-01 14:52       ` [PATCH 1/9 v2] opensm: Add guid2mkey cache file support Jim Foraker
     [not found]         ` <1343832755-26753-1-git-send-email-foraker1-i2BcT+NCU+M@public.gmane.org>
2012-08-01 14:52           ` [PATCH 2/9 v2] opensm: Allow recovery of subnets with misset mkeys Jim Foraker
2012-08-01 14:52           ` [PATCH 3/9 v2] opensm: Add locking where necessary around osm_req_* Jim Foraker
2012-08-01 14:52           ` [PATCH 4/9 v2] opensm: Add support for setting mkey protection levels Jim Foraker
2012-08-01 14:52           ` [PATCH 5/9 v2] opensm: Log errors on SubnGet timeouts Jim Foraker
2012-08-01 14:52           ` [PATCH 6/9 v2] opensm: Add neighboring link cache file Jim Foraker
2012-08-01 14:52           ` [PATCH 7/9 v2] opensm: Check for valid mkey protection level in config file Jim Foraker
2012-08-01 14:52           ` [PATCH 8/9 v2] opensm: Ensure sweep interval/mkey lease are sensibly set Jim Foraker
2012-08-01 14:52           ` [PATCH 9/9 v2] opensm/scripts/sldd.sh: Update to support guid2mkey/neighbors Jim Foraker
2012-08-01 20:19       ` [PATCH 0/8] opensm: Improved mkey support Alex Netes
