* [PATCH v3 00/17] opensm: Add new torus routing engine: torus-2QoS
@ 2010-06-15 19:53 Jim Schutt
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

This is v3 of a patchset to add to opensm a new routing engine designed
to handle large fabrics connected with a 2D/3D torus topology.

Changes since v2:

- Rebased to a3dec3a87a.
- Divide "Add torus-2QoS routing engine" patch into three parts
   to avoid rejection by mailing list.
- Bug fix: reduce number of required seed links for a torus
   with one or more dimensions of radix four.
- Bug fix: don't let torus-2QoS be fooled into thinking it can route
   a torus with two or more blocks of switches adjacent in z missing.
- Bug fix: if osm_ucast_mgr_process() fails, no configured routing engine
   could route the fabric, so wait for a trap or sweep interval before
   next heavy sweep.
- Bug fix: cut-n-paste error in handle_case_0x731().

Changes since initial version:

- Merged my patchsets from 11/20/2009, 12/18/2009, 2/16/2010.
- Moved information contained in the earlier patch series introduction
    emails into the appropriate commit messages.
- Rebased to c183eb8c4c.
- Addressed issues found by Yevgeny Kliteynik in original patchsets.
    Yevgeny's --no_default_routing option patch is not included
    in the merging, but would be a good addition.
- Renamed osm_ucast_torus.c to osm_torus.c.
    Since osm_torus.c contains code to implement both unicast and
    multicast routing, the new name seems more appropriate.  The
    multicast support depends heavily on the unicast routing code,
    so it is more convenient to keep everything in one file.
- Removed redundant check for changed sl2vl map.
    This functionality already exists in sl2vl_update_table().
- Set sl2vl maps on CA ports for torus-2QoS.
    This was missing in the original patches.
- Do not force torus-2QoS to use SLs 8-15 when not using "opensm -Q".
    This was an interim measure introduced before multicast support was
    working, that allowed multicast to use SL/VL 0 and thus not deadlock
    against unicast.  I forgot to take it out in the multicast patchset,
    so I took it out when I merged.
- Renamed torus variables referencing "origin" to "seed".
    These things refer to switches used to seed the torus topology
    appropriately, so the new name should reduce confusion going forward.
    This also contains a keyword change in the torus configuration file,
    so I'll repost an updated example.


Jim Schutt (17):
  opensm: Prepare for routing engine input to path record SL lookup and
    SL2VL map setup.
  opensm: Allow the routing engine to influence SL2VL calculations.
  opensm: Allow the routing engine to participate in path SL
    calculations.
  opensm: Track the minimum value in the fabric of data VLs supported.
  opensm: Add struct osm_routing_engine callback to build spanning
    trees for multicast.
  opensm: Make mcast_mgr_purge_tree() available outside
    osm_mcast_mgr.c.
  opensm: Add torus-2QoS routing engine, part 1.
  opensm: Add torus-2QoS routing engine, part 2.
  opensm: Add torus-2QoS routing engine, part 3.
  opensm: Update documentation to describe torus-2QoS.
  opensm: Enable torus-2QoS routing engine.
  opensm: Add opensm option to specify file name for extra torus-2QoS
    configuration information.
  opensm: Do not require -Q option for torus-2QoS routing engine.
  opensm: Make it possible to configure no fallback routing engine.
  opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of
    osm_port_t:priv.
  opensm: Avoid havoc in dump_ucast_routes() caused by torus-2QoS
    persistent use of osm_port_t:priv.
  opensm: Cause status of unicast routing attempt to propogate to
    callers of osm_ucast_mgr_process().

 opensm/doc/current-routing.txt         |  269 +-
 opensm/include/opensm/osm_base.h       |   18 +
 opensm/include/opensm/osm_multicast.h  |   33 +
 opensm/include/opensm/osm_opensm.h     |   29 +-
 opensm/include/opensm/osm_subnet.h     |    7 +
 opensm/include/opensm/osm_switch.h     |   12 +
 opensm/include/opensm/osm_ucast_lash.h |    3 -
 opensm/man/opensm.8.in                 |    9 +-
 opensm/opensm/Makefile.am              |    2 +-
 opensm/opensm/main.c                   |   11 +-
 opensm/opensm/osm_console.c            |   10 +-
 opensm/opensm/osm_dump.c               |    5 +-
 opensm/opensm/osm_link_mgr.c           |   16 +-
 opensm/opensm/osm_mcast_mgr.c          |   11 +-
 opensm/opensm/osm_opensm.c             |   54 +-
 opensm/opensm/osm_port_info_rcv.c      |   13 +-
 opensm/opensm/osm_qos.c                |   40 +-
 opensm/opensm/osm_sa_path_record.c     |   33 +-
 opensm/opensm/osm_state_mgr.c          |   23 +-
 opensm/opensm/osm_subnet.c             |   20 +-
 opensm/opensm/osm_switch.c             |    7 +-
 opensm/opensm/osm_torus.c              | 9120 ++++++++++++++++++++++++++++++++
 opensm/opensm/osm_ucast_lash.c         |   11 +-
 opensm/opensm/osm_ucast_mgr.c          |   55 +-
 24 files changed, 9702 insertions(+), 109 deletions(-)
 create mode 100644 opensm/opensm/osm_torus.c


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v3 01/17] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
@ 2010-06-15 19:53   ` Jim Schutt
       [not found]     ` <1276631604-29230-2-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2010-06-15 19:53   ` [PATCH v3 02/17] opensm: Allow the routing engine to influence SL2VL calculations Jim Schutt
                     ` (15 subsequent siblings)
  16 siblings, 1 reply; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

In the event a routing engine needs to participate in SL assignment and
SL2VL map setup in order to avoid credit loops in a fabric, it will be
useful to make the routing engine context more widely available.

To this end, have osm_opensm_t save a pointer to the routing engine used,
rather than its type.  This will make the routing engine context easily
available in, e.g., sl2vl_update() and pr_rcv_get_path_parms().

Make the necessary adjustments to the code that used the old
routing_engine_used as an enum _osm_routing_engine_type.  In order to
keep the behavior where minhop was used if the configured routing engines
failed, the easiest solution was to add a pointer to osm_opensm_t which
pointed to the minhop struct osm_routing_engine.
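
As a rough illustration of the idea above (not code from this patch), the
sketch below keeps a pointer to the engine that routed the fabric, treats
NULL as "no engine", and falls back to a default engine when every
configured engine fails.  All type and function names here are
hypothetical, pared-down stand-ins for the OpenSM ones, and the single
route() callback is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the OpenSM types. */
enum re_type { RE_TYPE_NONE, RE_TYPE_MINHOP, RE_TYPE_LASH };

struct routing_engine {
	enum re_type type;
	const char *name;
	int (*route)(void *context);	/* assumed: returns 0 on success */
	void *context;
	struct routing_engine *next;
};

struct opensm {
	struct routing_engine *routing_engine_list;
	struct routing_engine *routing_engine_used;	/* NULL: none */
	struct routing_engine *default_routing_engine;	/* e.g. minhop */
};

/* NULL-safe type query, in the spirit of the checks at each call site. */
static enum re_type engine_type_used(const struct opensm *osm)
{
	return osm->routing_engine_used ?
		osm->routing_engine_used->type : RE_TYPE_NONE;
}

/* Try each configured engine in order, then fall back to the default. */
static int route_fabric(struct opensm *osm)
{
	struct routing_engine *r;

	osm->routing_engine_used = NULL;
	for (r = osm->routing_engine_list; r; r = r->next) {
		if (!r->route(r->context)) {
			osm->routing_engine_used = r;
			return 0;
		}
	}

	r = osm->default_routing_engine;
	assert(r != NULL);	/* this sketch requires a default engine */
	if (r->route(r->context))
		return -1;
	osm->routing_engine_used = r;
	return 0;
}

/* Toy engines, for illustration only. */
static int always_fail(void *context) { (void)context; return -1; }
static int always_ok(void *context) { (void)context; return 0; }
```

With a failing "lash" engine in the list and a working "minhop" default,
route_fabric() returns 0 and engine_type_used() reports RE_TYPE_MINHOP.
Because callers hold the engine pointer, they can also reach its context,
which an enum alone could not provide.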

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_opensm.h |    4 ++-
 opensm/opensm/osm_console.c        |   10 ++++++--
 opensm/opensm/osm_dump.c           |    3 +-
 opensm/opensm/osm_link_mgr.c       |    5 ++-
 opensm/opensm/osm_opensm.c         |   43 +++++++++++++++++++++---------------
 opensm/opensm/osm_sa_path_record.c |    3 +-
 opensm/opensm/osm_ucast_lash.c     |    3 +-
 opensm/opensm/osm_ucast_mgr.c      |   17 ++++++++------
 8 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index c6c9bdb..e97142e 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -120,6 +120,7 @@ typedef enum _osm_routing_engine_type {
 *	added later.
 */
 struct osm_routing_engine {
+	osm_routing_engine_type_t type;
 	const char *name;
 	void *context;
 	int (*build_lid_matrices) (void *context);
@@ -183,7 +184,8 @@ typedef struct osm_opensm {
 	cl_dispatcher_t disp;
 	cl_plock_t lock;
 	struct osm_routing_engine *routing_engine_list;
-	osm_routing_engine_type_t routing_engine_used;
+	struct osm_routing_engine *routing_engine_used;
+	struct osm_routing_engine *default_routing_engine;
 	osm_stats_t stats;
 	osm_console_t console;
 	nn_map_t *node_name_map;
diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c
index bc7bea3..b99bb84 100644
--- a/opensm/opensm/osm_console.c
+++ b/opensm/opensm/osm_console.c
@@ -382,6 +382,8 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
 	cl_list_item_t *item;
 
 	if (out) {
+		const char *re_str;
+
 		cl_plock_acquire(&p_osm->lock);
 		fprintf(out, "   OpenSM Version       : %s\n", p_osm->osm_version);
 		fprintf(out, "   SM State             : %s\n",
@@ -390,9 +392,11 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
 			p_osm->subn.opt.sm_priority);
 		fprintf(out, "   SA State             : %s\n",
 			sa_state_str(p_osm->sa.state));
-		fprintf(out, "   Routing Engine       : %s\n",
-			osm_routing_engine_type_str(p_osm->
-						    routing_engine_used));
+
+		re_str = p_osm->routing_engine_used ?
+			osm_routing_engine_type_str(p_osm->routing_engine_used->type) :
+			osm_routing_engine_type_str(OSM_ROUTING_ENGINE_TYPE_NONE);
+		fprintf(out, "   Routing Engine       : %s\n", re_str);
 
 		fprintf(out, "   Loaded event plugins :");
 		if (cl_qlist_head(&p_osm->plugin_list) ==
diff --git a/opensm/opensm/osm_dump.c b/opensm/opensm/osm_dump.c
index fe2c3bc..bfff1a0 100644
--- a/opensm/opensm/osm_dump.c
+++ b/opensm/opensm/osm_dump.c
@@ -135,7 +135,8 @@ static void dump_ucast_routes(cl_map_item_t * item, FILE * file, void *cxt)
 		"Switch 0x%016" PRIx64 "\nLID    : Port : Hops : Optimal\n",
 		cl_ntoh64(osm_node_get_node_guid(p_node)));
 
-	dor = (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_DOR);
+	dor = (p_osm->routing_engine_used &&
+	       p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_DOR);
 
 	for (lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++) {
 		fprintf(file, "0x%04X : ", lid_ho);
diff --git a/opensm/opensm/osm_link_mgr.c b/opensm/opensm/osm_link_mgr.c
index e6c9b3b..c309916 100644
--- a/opensm/opensm/osm_link_mgr.c
+++ b/opensm/opensm/osm_link_mgr.c
@@ -64,8 +64,9 @@ static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 
 	OSM_LOG_ENTER(sm->p_log);
 
-	if (p_osm->routing_engine_used != OSM_ROUTING_ENGINE_TYPE_LASH
-	    || !(slid = osm_physp_get_base_lid(p_physp))) {
+	if (!(p_osm->routing_engine_used &&
+	      p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH &&
+	      (slid = osm_physp_get_base_lid(p_physp)))) {
 		/* Use default SL if lash routing is not used */
 		OSM_LOG_EXIT(sm->p_log);
 		return sm->p_subn->opt.sm_sl;
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index d3dc02e..5614240 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -147,7 +147,8 @@ static void append_routing_engine(osm_opensm_t *osm,
 	r->next = routing_engine;
 }
 
-static void setup_routing_engine(osm_opensm_t *osm, const char *name)
+static struct osm_routing_engine *setup_routing_engine(osm_opensm_t *osm,
+						       const char *name)
 {
 	struct osm_routing_engine *re;
 	const struct routing_engine_module *m;
@@ -158,47 +159,53 @@ static void setup_routing_engine(osm_opensm_t *osm, const char *name)
 			if (!re) {
 				OSM_LOG(&osm->log, OSM_LOG_VERBOSE,
 					"memory allocation failed\n");
-				return;
+				return NULL;
 			}
 			memset(re, 0, sizeof(struct osm_routing_engine));
 
 			re->name = m->name;
+			re->type = osm_routing_engine_type(m->name);
 			if (m->setup(re, osm)) {
 				OSM_LOG(&osm->log, OSM_LOG_VERBOSE,
 					"setup of routing"
 					" engine \'%s\' failed\n", name);
-				return;
+				free(re);
+				return NULL;
 			}
 			OSM_LOG(&osm->log, OSM_LOG_DEBUG,
 				"\'%s\' routing engine set up\n", re->name);
-			append_routing_engine(osm, re);
-			return;
+			if (re->type == OSM_ROUTING_ENGINE_TYPE_MINHOP)
+				osm->default_routing_engine = re;
+			return re;
 		}
 	}
 
 	OSM_LOG(&osm->log, OSM_LOG_ERROR,
 		"cannot find or setup routing engine \'%s\'\n", name);
+	return NULL;
 }
 
 static void setup_routing_engines(osm_opensm_t *osm, const char *engine_names)
 {
 	char *name, *str, *p;
+	struct osm_routing_engine *re;
 
-	if (!engine_names || !*engine_names) {
-		setup_routing_engine(osm, "minhop");
-		return;
+	if (engine_names && *engine_names) {
+		str = strdup(engine_names);
+		name = strtok_r(str, ", \t\n", &p);
+		while (name && *name) {
+			re = setup_routing_engine(osm, name);
+			if (re)
+				append_routing_engine(osm, re);
+			name = strtok_r(NULL, ", \t\n", &p);
+		}
+		free(str);
 	}
-
-	str = strdup(engine_names);
-	name = strtok_r(str, ", \t\n", &p);
-	while (name && *name) {
-		setup_routing_engine(osm, name);
-		name = strtok_r(NULL, ", \t\n", &p);
+	if (!osm->default_routing_engine) {
+		re = setup_routing_engine(osm, "minhop");
+		if (!osm->routing_engine_list && re)
+			append_routing_engine(osm, re);
 	}
-	free(str);
-
-	if (!osm->routing_engine_list)
-		setup_routing_engine(osm, "minhop");
 }
 
 void osm_opensm_construct(IN osm_opensm_t * p_osm)
diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
index f0d7ca2..093c70d 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -667,7 +667,8 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 	 * Set PathRecord SL
 	 */
 
-	is_lash = (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_LASH);
+	is_lash = (p_osm->routing_engine_used &&
+		   p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH);
 
 	if (comp_mask & IB_PR_COMPMASK_SL) {
 		/*
diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c
index 4669946..72c4c3d 100644
--- a/opensm/opensm/osm_ucast_lash.c
+++ b/opensm/opensm/osm_ucast_lash.c
@@ -1284,7 +1284,8 @@ uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, const osm_port_t * p_src_port,
 	unsigned src_id;
 	osm_switch_t *p_sw;
 
-	if (p_osm->routing_engine_used != OSM_ROUTING_ENGINE_TYPE_LASH)
+	if (!(p_osm->routing_engine_used &&
+	      p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH))
 		return OSM_DEFAULT_SL;
 
 	p_sw = get_osm_switch_from_port(p_dst_port);
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index 37b8741..10629cb 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -1056,7 +1056,7 @@ static int ucast_mgr_route(struct osm_routing_engine *r, osm_opensm_t * osm)
 		return ret;
 	}
 
-	osm->routing_engine_used = osm_routing_engine_type(r->name);
+	osm->routing_engine_used = r;
 
 	osm_ucast_mgr_set_fwd_tables(&osm->sm.ucast_mgr);
 
@@ -1084,24 +1084,27 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 	    ucast_mgr_setup_all_switches(p_mgr->p_subn) < 0)
 		goto Exit;
 
-	p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE;
+	p_osm->routing_engine_used = NULL;
 	while (p_routing_eng) {
 		if (!ucast_mgr_route(p_routing_eng, p_osm))
 			break;
 		p_routing_eng = p_routing_eng->next;
 	}
 
-	if (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_NONE) {
+	if (!p_osm->routing_engine_used) {
 		/* If configured routing algorithm failed, use default MinHop */
-		osm_ucast_mgr_build_lid_matrices(p_mgr);
-		ucast_mgr_build_lfts(p_mgr);
+		struct osm_routing_engine *r = p_osm->default_routing_engine;
+
+		r->build_lid_matrices(r->context);
+		r->ucast_build_fwd_tables(r->context);
+		p_osm->routing_engine_used = r;
 		osm_ucast_mgr_set_fwd_tables(p_mgr);
-		p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_MINHOP;
 	}
 
 	OSM_LOG(p_mgr->p_log, OSM_LOG_INFO,
 		"%s tables configured on all switches\n",
-		osm_routing_engine_type_str(p_osm->routing_engine_used));
+		osm_routing_engine_type_str(p_osm->
+					    routing_engine_used->type));
 
 	if (p_mgr->p_subn->opt.use_ucast_cache)
 		p_mgr->cache_valid = TRUE;
-- 
1.6.2.2



* [PATCH v3 02/17] opensm: Allow the routing engine to influence SL2VL calculations.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2010-06-15 19:53   ` [PATCH v3 01/17] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 03/17] opensm: Allow the routing engine to participate in path SL calculations Jim Schutt
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

Note that the original code assumes that QoS setup is mostly static and
based only on user configuration.  As a result, there is no provision for
routing engines that want to compute contributions to the SL2VL maps.

Fix this up by adding a callback to struct osm_routing_engine that computes
a per-port SL2VL map, and call it from the appropriate place in the QoS
setup path.  Assume that if a routing engine provides an update_sl2vl()
callback, there will be input-port dependence in the SL2VL maps, and
so do not attempt to use optimized SL2VL map programming even if the
switch supports it.

Also need to move the call to osm_qos_setup() in do_sweep() to after the
call to the routing engine, so that any SL2VL map contributions from the
routing engine are based on the latest information.  Need to call
osm_qos_setup() for requested reroute for the same reason.
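
The calling pattern described above might be sketched as follows; this is
an illustration under simplifying assumptions, not the patch code.  The
table type here is a hypothetical flat array with one VL per SL (the real
ib_slvl_table_t is packed differently), select_sl2vl() is an invented
helper name, and the toy callback's rule is arbitrary and is not the
torus-2QoS rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SL 16

/* Hypothetical simplified SL2VL table: one VL value per SL. */
struct slvl_table {
	uint8_t vl[NUM_SL];
};

/* Hypothetical, reduced routing engine with only the SL2VL hook. */
struct routing_engine {
	void *context;
	void (*update_sl2vl)(void *context, uint8_t in_port_num,
			     uint8_t out_port_num, struct slvl_table *t);
};

/*
 * Return the table to program for one (in, out) port pair.  Without a
 * callback the user-configured table is used unchanged.  With one, the
 * configured table is copied into scratch space so the engine can
 * adjust the copy without touching the configuration.
 */
static const struct slvl_table *
select_sl2vl(const struct routing_engine *re,
	     const struct slvl_table *configured,
	     struct slvl_table *scratch,
	     uint8_t in_port_num, uint8_t out_port_num)
{
	if (!re || !re->update_sl2vl)
		return configured;

	*scratch = *configured;
	re->update_sl2vl(re->context, in_port_num, out_port_num, scratch);
	return scratch;
}

/* Toy callback: an arbitrary rule that depends on the input port. */
static void toy_update_sl2vl(void *context, uint8_t in_port_num,
			     uint8_t out_port_num, struct slvl_table *t)
{
	unsigned i;

	(void)context;
	if (in_port_num > out_port_num)
		for (i = 0; i < NUM_SL; i++)
			t->vl[i] |= 1;
}
```

Because the result can differ for every (in, out) port pair, a single
table programmed once for the whole switch is no longer sufficient, which
is the reason for skipping the optimized programming path.  The NULL
check on the engine pointer is an extra precaution of this sketch.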

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_opensm.h |   12 ++++++++++++
 opensm/opensm/osm_qos.c            |   27 +++++++++++++++++++++++----
 opensm/opensm/osm_state_mgr.c      |    5 +++--
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index e97142e..25a6f90 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -126,6 +126,9 @@ struct osm_routing_engine {
 	int (*build_lid_matrices) (void *context);
 	int (*ucast_build_fwd_tables) (void *context);
 	void (*ucast_dump_tables) (void *context);
+	void (*update_sl2vl)(void *context, IN osm_physp_t *port,
+			     IN uint8_t in_port_num, IN uint8_t out_port_num,
+			     IN OUT ib_slvl_table_t *t);
 	void (*delete) (void *context);
 	struct osm_routing_engine *next;
 };
@@ -147,6 +150,15 @@ struct osm_routing_engine {
 *	ucast_dump_tables
 *		The callback for dumping unicast routing tables.
 *
+*	update_sl2vl(void *context, IN osm_physp_t *port,
+*		     IN uint8_t in_port_num, IN uint8_t out_port_num,
+*		     IN OUT ib_slvl_table_t *t)
+*		The callback to allow routing engine input for SL2VL maps.
+*		*port is the physical port for which the SL2VL map is to be
+*		updated. For switches, in_port_num/out_port_num identify
+*		which part of the SL2VL map to update.  For router/HCA ports,
+*		in_port_num/out_port_num should be ignored.
+*
 *	delete
 *		The delete method, may be used for routing engine
 *		internals cleanup.
diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index cce59ee..dadef29 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -207,6 +207,7 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node,
 	osm_physp_t *p0, *p;
 	unsigned force_update;
 	unsigned num_ports = osm_node_get_num_physp(node);
+	struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
 	int ret = 0;
 	unsigned i, j;
 
@@ -223,7 +224,7 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node,
 		return ret;
 
 	if (ib_switch_info_get_opt_sl2vlmapping(&node->sw->switch_info) &&
-	    sm->p_subn->opt.use_optimized_slvl) {
+	    sm->p_subn->opt.use_optimized_slvl && !re->update_sl2vl) {
 		p = osm_node_get_physp_ptr(node, 1);
 		force_update = p->need_update || sm->p_subn->need_update;
 		return sl2vl_update_table(sm, p, 1, 0x30000, force_update,
@@ -234,10 +235,20 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node,
 		p = osm_node_get_physp_ptr(node, i);
 		force_update = p->need_update || sm->p_subn->need_update;
 		j = ib_switch_info_is_enhanced_port0(&node->sw->switch_info) ? 0 : 1;
-		for (; j < num_ports; j++)
+		for (; j < num_ports; j++) {
+			const ib_slvl_table_t *port_sl2vl = &qcfg->sl2vl;
+			ib_slvl_table_t routing_sl2vl;
+
+			if (re->update_sl2vl) {
+				routing_sl2vl = *port_sl2vl;
+				re->update_sl2vl(re->context,
+						 p, i, j, &routing_sl2vl);
+				port_sl2vl = &routing_sl2vl;
+			}
 			if (sl2vl_update_table(sm, p, i, i << 8 | j,
-					       force_update, &qcfg->sl2vl))
+					       force_update, port_sl2vl))
 				ret = -1;
+		}
 	}
 
 	return ret;
@@ -247,6 +258,9 @@ static int qos_endport_setup(osm_sm_t * sm, osm_physp_t * p,
 			     const struct qos_config *qcfg)
 {
 	unsigned force_update = p->need_update || sm->p_subn->need_update;
+	struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
+	const ib_slvl_table_t *port_sl2vl = &qcfg->sl2vl;
+	ib_slvl_table_t routing_sl2vl;
 
 	p->vl_high_limit = qcfg->vl_high_limit;
 	if (vlarb_update(sm, p, 0, force_update, qcfg))
@@ -255,7 +269,12 @@ static int qos_endport_setup(osm_sm_t * sm, osm_physp_t * p,
 	if (!(p->port_info.capability_mask & IB_PORT_CAP_HAS_SL_MAP))
 		return 0;
 
-	if (sl2vl_update_table(sm, p, 0, 0, force_update, &qcfg->sl2vl))
+	if (re->update_sl2vl) {
+		routing_sl2vl = *port_sl2vl;
+		re->update_sl2vl(re->context, p, 0, 0, &routing_sl2vl);
+		port_sl2vl = &routing_sl2vl;
+	}
+	if (sl2vl_update_table(sm, p, 0, 0, force_update, port_sl2vl))
 		return -1;
 
 	return 0;
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 81c8f54..cdd72c1 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1141,6 +1141,7 @@ static void do_sweep(osm_sm_t * sm)
 		sm->p_subn->ignore_existing_lfts = TRUE;
 
 		osm_ucast_mgr_process(&sm->ucast_mgr);
+		osm_qos_setup(sm->p_subn->p_osm);
 
 		/* Reset flag */
 		sm->p_subn->ignore_existing_lfts = FALSE;
@@ -1259,8 +1260,6 @@ repeat_discovery:
 
 	osm_pkey_mgr_process(sm->p_subn->p_osm);
 
-	osm_qos_setup(sm->p_subn->p_osm);
-
 	/* try to restore SA DB (this should be before lid_mgr
 	   because we may want to disable clients reregistration
 	   when SA DB is restored) */
@@ -1301,6 +1300,8 @@ repeat_discovery:
 	    osm_ucast_cache_process(&sm->ucast_mgr))
 		osm_ucast_mgr_process(&sm->ucast_mgr);
 
+	osm_qos_setup(sm->p_subn->p_osm);
+
 	if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats))
 		return;
 
-- 
1.6.2.2



* [PATCH v3 03/17] opensm: Allow the routing engine to participate in path SL calculations.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2010-06-15 19:53   ` [PATCH v3 01/17] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 02/17] opensm: Allow the routing engine to influence SL2VL calculations Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
       [not found]     ` <1276631604-29230-4-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2010-06-15 19:53   ` [PATCH v3 04/17] opensm: Track the minimum value in the fabric of data VLs supported Jim Schutt
                     ` (13 subsequent siblings)
  16 siblings, 1 reply; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

LASH already does this, in a hard-coded fashion.

Generalize this by adding a callback to struct osm_routing_engine that
computes a path SL value, and fix up LASH to use it.

This patchset causes the requested or QoS-computed SL value to be passed
to the routing engine path SL computation as a hint.  In the event the
routing engine's use of SLs allows it to support more than one QoS level,
it may be able to make use of the SL hint to do so.

For now, LASH just ignores the hint.

Note that before this change, if LASH was configured and a specific path
SL value was requested that differed from what LASH needed to route the
fabric without credit loops, the path SL lookup would fail.  Now LASH's
SL value is always used.

Possibly the choice between failing a path SL request when it conflicts
with routing, vs. always providing an SL value that gives a credit-loop-
free routing, should be user-configurable?
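
To make the hint mechanism concrete, here is a small sketch under stated
assumptions.  struct port and resolve_path_sl() are hypothetical names,
and both toy callbacks use arbitrary SL rules chosen only for the
example: one ignores the hint (as LASH does for now), the other keeps one
bit of the hint as a QoS level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for osm_port_t. */
struct port {
	uint16_t lid;
};

/* Hypothetical, reduced routing engine with only the path SL hook. */
struct routing_engine {
	void *context;
	uint8_t (*path_sl)(void *context, uint8_t path_sl_hint,
			   const struct port *src_port,
			   const struct port *dst_port);
};

/*
 * Pass the requested or QoS-computed SL to the engine as a hint and
 * let the engine override it.  Without a callback the SL is unchanged.
 */
static uint8_t resolve_path_sl(const struct routing_engine *re, uint8_t sl,
			       const struct port *src_port,
			       const struct port *dst_port)
{
	if (re && re->path_sl)
		sl = re->path_sl(re->context, sl, src_port, dst_port);
	return sl;
}

/* Toy engine that ignores the hint entirely. */
static uint8_t ignore_hint_sl(void *context, uint8_t path_sl_hint,
			      const struct port *src_port,
			      const struct port *dst_port)
{
	(void)context;
	(void)path_sl_hint;
	return (uint8_t)((src_port->lid ^ dst_port->lid) & 0x7);
}

/* Toy engine that keeps bit 3 of the hint as a QoS level. */
static uint8_t qos_aware_sl(void *context, uint8_t path_sl_hint,
			    const struct port *src_port,
			    const struct port *dst_port)
{
	(void)context;
	return (uint8_t)((path_sl_hint & 0x8) |
			 ((src_port->lid ^ dst_port->lid) & 0x7));
}
```

An engine in the second style can honor part of the request while still
choosing the bits it needs for routing, which is the case the hint is
meant to support.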

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_opensm.h     |    6 +++++
 opensm/include/opensm/osm_ucast_lash.h |    3 --
 opensm/opensm/osm_link_mgr.c           |   15 ++++++++-----
 opensm/opensm/osm_sa_path_record.c     |   34 +++++++++++--------------------
 opensm/opensm/osm_ucast_lash.c         |    8 +++++-
 5 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index 25a6f90..734a6db 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -129,6 +129,9 @@ struct osm_routing_engine {
 	void (*update_sl2vl)(void *context, IN osm_physp_t *port,
 			     IN uint8_t in_port_num, IN uint8_t out_port_num,
 			     IN OUT ib_slvl_table_t *t);
+	uint8_t (*path_sl)(void *context, IN uint8_t path_sl_hint,
+			   IN const osm_port_t *src_port,
+			   IN const osm_port_t *dst_port);
 	void (*delete) (void *context);
 	struct osm_routing_engine *next;
 };
@@ -159,6 +162,9 @@ struct osm_routing_engine {
 *		which part of the SL2VL map to update.  For router/HCA ports,
 *		in_port_num/out_port_num should be ignored.
 *
+*	path_sl
+*		The callback for computing path SL.
+*
 *	delete
 *		The delete method, may be used for routing engine
 *		internals cleanup.
diff --git a/opensm/include/opensm/osm_ucast_lash.h b/opensm/include/opensm/osm_ucast_lash.h
index 9e15d38..dd90d5d 100644
--- a/opensm/include/opensm/osm_ucast_lash.h
+++ b/opensm/include/opensm/osm_ucast_lash.h
@@ -94,7 +94,4 @@ typedef struct _lash {
 	int ***virtual_location;
 } lash_t;
 
-uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, const osm_port_t * p_src_port,
-			const osm_port_t * p_dst_port);
-
 #endif
diff --git a/opensm/opensm/osm_link_mgr.c b/opensm/opensm/osm_link_mgr.c
index c309916..e446e16 100644
--- a/opensm/opensm/osm_link_mgr.c
+++ b/opensm/opensm/osm_link_mgr.c
@@ -53,21 +53,23 @@
 #include <opensm/osm_helper.h>
 #include <opensm/osm_msgdef.h>
 #include <opensm/osm_opensm.h>
-#include <opensm/osm_ucast_lash.h>
 
 static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 {
 	osm_opensm_t *p_osm = sm->p_subn->p_osm;
+	struct osm_routing_engine *re = p_osm->routing_engine_used;
 	const osm_port_t *p_sm_port, *p_src_port;
 	ib_net16_t slid;
 	uint8_t sl;
 
 	OSM_LOG_ENTER(sm->p_log);
 
-	if (!(p_osm->routing_engine_used &&
-	      p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH &&
+	if (!(re && re->path_sl &&
 	      (slid = osm_physp_get_base_lid(p_physp)))) {
-		/* Use default SL if lash routing is not used */
+		/*
+		 * Use default SL if routing engine does not provide a
+		 * path SL lookup callback.
+		 */
 		OSM_LOG_EXIT(sm->p_log);
 		return sm->p_subn->opt.sm_sl;
 	}
@@ -78,8 +80,9 @@ static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 	/* Find osm_port of the source = p_physp */
 	p_src_port = osm_get_port_by_lid(sm->p_subn, slid);
 
-	/* Call lash to find proper SL */
-	sl = osm_get_lash_sl(p_osm, p_src_port, p_sm_port);
+	/* Call into routing engine to find proper SL */
+	sl = re->path_sl(re->context, sm->p_subn->opt.sm_sl,
+			 p_src_port, p_sm_port);
 
 	OSM_LOG_EXIT(sm->p_log);
 	return sl;
diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
index 093c70d..a323671 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -164,6 +164,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 	const osm_physp_t *p_dest_physp;
 	const osm_prtn_t *p_prtn = NULL;
 	osm_opensm_t *p_osm;
+	struct osm_routing_engine *p_re;
 	const ib_port_info_t *p_pi;
 	ib_api_status_t status = IB_SUCCESS;
 	ib_net16_t pkey;
@@ -180,7 +181,6 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 	ib_slvl_table_t *p_slvl_tbl = NULL;
 	osm_qos_level_t *p_qos_level = NULL;
 	uint16_t valid_sl_mask = 0xffff;
-	int is_lash;
 	int hops = 0;
 
 	OSM_LOG_ENTER(sa->p_log);
@@ -192,6 +192,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 	p_src_physp = p_physp;
 	p_pi = &p_physp->port_info;
 	p_osm = sa->p_subn->p_osm;
+	p_re = p_osm->routing_engine_used;
 
 	mtu = ib_port_info_get_mtu_cap(p_pi);
 	rate = ib_port_info_compute_rate(p_pi);
@@ -667,9 +668,6 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 	 * Set PathRecord SL
 	 */
 
-	is_lash = (p_osm->routing_engine_used &&
-		   p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH);
-
 	if (comp_mask & IB_PR_COMPMASK_SL) {
 		/*
 		 * Specific SL was requested
@@ -686,26 +684,10 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 			goto Exit;
 		}
 
-		if (is_lash
-		    && osm_get_lash_sl(p_osm, p_src_port, p_dest_port) != sl) {
-			OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F23: "
-				"Required PathRecord SL (%u) doesn't "
-				"match LASH SL\n", sl);
-			status = IB_NOT_FOUND;
-			goto Exit;
-		}
-
-	} else if (is_lash) {
-		/*
-		 * No specific SL in PathRecord request.
-		 * If it's LASH routing - use its SL.
-		 * slid and dest_lid are stored in network in lash.
-		 */
-		sl = osm_get_lash_sl(p_osm, p_src_port, p_dest_port);
 	} else if (p_qos_level && p_qos_level->sl_set) {
 		/*
-		 * No specific SL was requested, and we're not in
-		 * LASH routing, but there is an SL in QoS level.
+		 * No specific SL was requested, but there is an SL in
+		 * QoS level.
 		 */
 		sl = p_qos_level->sl;
 
@@ -746,6 +728,14 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 		goto Exit;
 	}
 
+	/*
+	 * If the routing engine wants to have a say in path SL selection,
+	 * send the currently computed SL value as a hint and let the routing
+	 * engine override it.
+	 */
+	if (p_re && p_re->path_sl)
+		sl = p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port);
+
 	/* reset pkey when raw traffic */
 	if (comp_mask & IB_PR_COMPMASK_RAWTRAFFIC &&
 	    cl_ntoh32(p_pr->hop_flow_raw) & (1 << 31))
diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c
index 72c4c3d..8746c37 100644
--- a/opensm/opensm/osm_ucast_lash.c
+++ b/opensm/opensm/osm_ucast_lash.c
@@ -1277,12 +1277,15 @@ static void lash_delete(void *context)
 	free(p_lash);
 }
 
-uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, const osm_port_t * p_src_port,
-			const osm_port_t * p_dst_port)
+static uint8_t get_lash_sl(void *context, uint8_t path_sl_hint,
+			   const osm_port_t *p_src_port,
+			   const osm_port_t *p_dst_port)
 {
 	unsigned dst_id;
 	unsigned src_id;
 	osm_switch_t *p_sw;
+	lash_t *p_lash = context;
+	osm_opensm_t *p_osm = p_lash->p_osm;
 
 	if (!(p_osm->routing_engine_used &&
 	      p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH))
@@ -1312,6 +1315,7 @@ int osm_ucast_lash_setup(struct osm_routing_engine *r, osm_opensm_t *p_osm)
 
 	r->context = p_lash;
 	r->ucast_build_fwd_tables = lash_process;
+	r->path_sl = get_lash_sl;
 	r->delete = lash_delete;
 
 	return 0;
-- 
1.6.2.2


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v3 04/17] opensm: Track the minimum value in the fabric of data VLs supported.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (2 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 03/17] opensm: Allow the routing engine to participate in path SL calculations Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 05/17] opensm: Add struct osm_routing_engine callback to build spanning trees for multicast Jim Schutt
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

A routing engine that wants to make contributions to SL2VL maps in support
of routing free from credit loops may need to know the minimum number
of supported data VLs in the fabric.

This code tracks that value.
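
As a rough illustration of the tracking, here is a minimal standalone
sketch of folding one port's OperationalVLs encoding into a running
minimum.  The names sketch_track_min_data_vls and SKETCH_MAX_NUM_VLS are
made up for this example (the constant is assumed to mirror
IB_MAX_NUM_VLS), and the range check on the encoding is an addition that
the patch itself does not make.

```c
#include <stdint.h>

/* Assumed to mirror IB_MAX_NUM_VLS; VL15 is reserved for management. */
#define SKETCH_MAX_NUM_VLS 16

/*
 * Fold one port's OperationalVLs encoding into a running minimum of
 * data VL counts.  Encodings 1..5 mean 1, 2, 4, 8, 15 data VLs.
 */
static uint8_t sketch_track_min_data_vls(uint8_t cur_min, unsigned op_vls)
{
	unsigned data_vls;

	/* Hypothetical guard: ignore encodings outside the defined range. */
	if (op_vls == 0 || op_vls > 5)
		return cur_min;

	data_vls = 1U << (op_vls - 1);
	if (data_vls >= SKETCH_MAX_NUM_VLS)
		data_vls = SKETCH_MAX_NUM_VLS - 1;	/* 16 clamps to 15 */

	return data_vls < cur_min ? (uint8_t)data_vls : cur_min;
}
```

Starting each sweep from SKETCH_MAX_NUM_VLS - 1 and calling this once per
endport leaves the smallest data VL count seen in that sweep.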

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_subnet.h |    1 +
 opensm/opensm/osm_port_info_rcv.c  |   13 ++++++++++++-
 opensm/opensm/osm_state_mgr.c      |    6 ++++++
 opensm/opensm/osm_subnet.c         |    1 +
 4 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index 95a635c..4fa0161 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -536,6 +536,7 @@ typedef struct osm_subn {
 	uint16_t max_mcast_lid_ho;
 	uint8_t min_ca_mtu;
 	uint8_t min_ca_rate;
+	uint8_t min_data_vls;
 	boolean_t ignore_existing_lfts;
 	boolean_t subnet_initialization_error;
 	boolean_t force_heavy_sweep;
diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c
index 9260047..c05301e 100644
--- a/opensm/opensm/osm_port_info_rcv.c
+++ b/opensm/opensm/osm_port_info_rcv.c
@@ -83,6 +83,7 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
 	ib_api_status_t status;
 	ib_net64_t port_guid;
 	uint8_t rate, mtu;
+	unsigned data_vls;
 	cl_qmap_t *p_sm_tbl;
 	osm_remote_sm_t *p_sm;
 
@@ -92,7 +93,7 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
 
 	/* HACK extended port 0 should be handled too! */
 	if (osm_physp_get_port_num(p_physp) != 0) {
-		/* track the minimal endport MTU and rate */
+		/* track the minimal endport MTU, rate, and operational VLs */
 		mtu = ib_port_info_get_mtu_cap(p_pi);
 		if (mtu < sm->p_subn->min_ca_mtu) {
 			OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
@@ -108,6 +109,16 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
 				PRIx64 "\n", rate, cl_ntoh64(port_guid));
 			sm->p_subn->min_ca_rate = rate;
 		}
+
+		data_vls = 1U << (ib_port_info_get_op_vls(p_pi) - 1);
+		if (data_vls >= IB_MAX_NUM_VLS)
+			data_vls = IB_MAX_NUM_VLS - 1;
+		if ((uint8_t)data_vls < sm->p_subn->min_data_vls) {
+			OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
+				"Setting endport minimal data VLs to:%u defined by port:0x%"
+				PRIx64 "\n", data_vls, cl_ntoh64(port_guid));
+			sm->p_subn->min_data_vls = data_vls;
+		}
 	}
 
 	if (port_guid != sm->p_subn->sm_port_guid) {
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index cdd72c1..762bb27 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1164,6 +1164,12 @@ repeat_discovery:
 	sm->p_subn->force_reroute = FALSE;
 	sm->p_subn->subnet_initialization_error = FALSE;
 
+	/* Reset tracking values in case limiting component got removed
+	 * from fabric. */
+	sm->p_subn->min_ca_mtu = IB_MAX_MTU;
+	sm->p_subn->min_ca_rate = IB_MAX_RATE;
+	sm->p_subn->min_data_vls = IB_MAX_NUM_VLS - 1;
+
 	/* rescan configuration updates */
 	if (!config_parsed && osm_subn_rescan_conf_files(sm->p_subn) < 0)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 331A: "
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index d5c5ab2..8224b5f 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -529,6 +529,7 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN osm_opensm_t * p_osm,
 	p_subn->max_mcast_lid_ho = IB_LID_MCAST_END_HO;
 	p_subn->min_ca_mtu = IB_MAX_MTU;
 	p_subn->min_ca_rate = IB_MAX_RATE;
+	p_subn->min_data_vls = IB_MAX_NUM_VLS - 1;
 	p_subn->ignore_existing_lfts = TRUE;
 
 	/* we assume master by default - so we only need to set it true if STANDBY */
-- 
1.6.2.2



* [PATCH v3 05/17] opensm: Add struct osm_routing_engine callback to build spanning trees for multicast.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (3 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 04/17] opensm: Track the minimum value in the fabric of data VLs supported Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 06/17] opensm: Make mcast_mgr_purge_tree() available outside osm_mcast_mgr.c Jim Schutt
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

If a routing engine needs to compute spanning trees with special
properties, it needs a way to override the default implementation.
A routing engine callback provides that mechanism.  Routing engines
that can use the default implementation can leave the callback
pointer set to NULL.
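
The dispatch pattern can be sketched in isolation as follows.  This is
only an illustration: struct sketch_engine, sketch_default_build and
sketch_custom_build are hypothetical stand-ins, not the opensm types or
functions, and an int stands in for the real status and MLID types.

```c
#include <stddef.h>

/* Hypothetical stand-in for a routing engine with an optional callback. */
struct sketch_engine {
	void *context;
	int (*build_stree)(void *context, int mlid);	/* may be NULL */
};

/* Stand-in for the default spanning tree builder; 0 means success. */
static int sketch_default_build(int mlid)
{
	return mlid >= 0 ? 0 : -1;
}

/* Example override: counts its invocations through the context pointer. */
static int sketch_custom_build(void *context, int mlid)
{
	(void)mlid;
	*(int *)context += 1;
	return 0;
}

/* Use the engine's callback when one is set, else fall back. */
static int sketch_build_stree(const struct sketch_engine *re, int mlid)
{
	if (re && re->build_stree)
		return re->build_stree(re->context, mlid);
	return sketch_default_build(mlid);
}
```

An engine that leaves build_stree NULL, or the absence of any engine,
both end up in the default builder, so existing engines need no changes.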

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_opensm.h |    6 ++++++
 opensm/opensm/osm_mcast_mgr.c      |    7 ++++++-
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index 734a6db..fddcf53 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -132,6 +132,8 @@ struct osm_routing_engine {
 	uint8_t (*path_sl)(void *context, IN uint8_t path_sl_hint,
 			   IN const osm_port_t *src_port,
 			   IN const osm_port_t *dst_port);
+	ib_api_status_t (*mcast_build_stree)(void *context,
+					     IN OUT osm_mgrp_box_t *mgb);
 	void (*delete) (void *context);
 	struct osm_routing_engine *next;
 };
@@ -165,6 +167,10 @@ struct osm_routing_engine {
 *	path_sl
 *		The callback for computing path SL.
 *
+*	mcast_build_stree
+*		The callback for building the spanning tree for multicast
+*		forwarding, called per MLID.
+*
 *	delete
 *		The delete method, may be used for routing engine
 *		internals cleanup.
diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c
index 322635d..bd67d4e 100644
--- a/opensm/opensm/osm_mcast_mgr.c
+++ b/opensm/opensm/osm_mcast_mgr.c
@@ -986,6 +986,7 @@ Exit:
 static ib_api_status_t mcast_mgr_process_mlid(osm_sm_t * sm, uint16_t mlid)
 {
 	ib_api_status_t status = IB_SUCCESS;
+	struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
 	osm_mgrp_box_t *mbox;
 
 	OSM_LOG_ENTER(sm->p_log);
@@ -1000,7 +1001,11 @@ static ib_api_status_t mcast_mgr_process_mlid(osm_sm_t * sm, uint16_t mlid)
 
 	mbox = osm_get_mbox_by_mlid(sm->p_subn, cl_hton16(mlid));
 	if (mbox) {
-		status = mcast_mgr_build_spanning_tree(sm, mbox);
+		if (re && re->mcast_build_stree)
+			status = re->mcast_build_stree(re->context, mbox);
+		else
+			status = mcast_mgr_build_spanning_tree(sm, mbox);
+
 		if (status != IB_SUCCESS)
 			OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0A17: "
 				"Unable to create spanning tree (%s) for mlid "
-- 
1.6.2.2



* [PATCH v3 06/17] opensm: Make mcast_mgr_purge_tree() available outside osm_mcast_mgr.c.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (4 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 05/17] opensm: Add struct osm_routing_engine callback to build spanning trees for multicast Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 07/17] opensm: Add torus-2QoS routing engine, part 1 Jim Schutt
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

A routing engine that needs to compute multicast spanning trees with
special properties will need to delete old trees.  There's already
a function that does this: mcast_mgr_purge_tree().

Make it available outside osm_mcast_mgr.c, and change the name
to follow the naming convention (osm_ prefix) for global functions.

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_multicast.h |   33 +++++++++++++++++++++++++++++++++
 opensm/opensm/osm_mcast_mgr.c         |    4 ++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/opensm/include/opensm/osm_multicast.h b/opensm/include/opensm/osm_multicast.h
index 1da575d..df6ac6c 100644
--- a/opensm/include/opensm/osm_multicast.h
+++ b/opensm/include/opensm/osm_multicast.h
@@ -53,6 +53,7 @@
 #include <opensm/osm_mcm_port.h>
 #include <opensm/osm_subnet.h>
 #include <opensm/osm_log.h>
+#include <opensm/osm_sm.h>
 
 #ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
@@ -193,6 +194,38 @@ osm_mgrp_t *osm_mgrp_new(IN osm_subn_t * subn, IN ib_net16_t mlid,
 *	Multicast Group, osm_mgrp_delete
 *********/
 
+/*
+ * Need a forward declaration to work around include loop:
+ * osm_sm.h <- osm_multicast.h
+ */
+struct osm_sm;
+
+/****f* OpenSM: Multicast Tree/osm_purge_mtree
+* NAME
+*	osm_purge_mtree
+*
+* DESCRIPTION
+*	Frees all the nodes in a multicast spanning tree
+*
+* SYNOPSIS
+*/
+void osm_purge_mtree(IN struct osm_sm * sm, IN osm_mgrp_box_t * mgb);
+/*
+* PARAMETERS
+*	sm
+*		[in] Pointer to osm_sm_t object.
+*	mgb
+*		[in] Pointer to an osm_mgrp_box_t object.
+*
+* RETURN VALUES
+*	None.
+*
+*
+* NOTES
+*
+* SEE ALSO
+*********/
+
 /****f* OpenSM: Multicast Group/osm_mgrp_is_guid
 * NAME
 *	osm_mgrp_is_guid
diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c
index bd67d4e..e6db6db 100644
--- a/opensm/opensm/osm_mcast_mgr.c
+++ b/opensm/opensm/osm_mcast_mgr.c
@@ -146,7 +146,7 @@ static void mcast_mgr_purge_tree_node(IN osm_mtree_node_t * p_mtn)
 	free(p_mtn);
 }
 
-static void mcast_mgr_purge_tree(osm_sm_t * sm, IN osm_mgrp_box_t * mbox)
+void osm_purge_mtree(osm_sm_t * sm, IN osm_mgrp_box_t * mbox)
 {
 	OSM_LOG_ENTER(sm->p_log);
 
@@ -735,7 +735,7 @@ static ib_api_status_t mcast_mgr_build_spanning_tree(osm_sm_t * sm,
 	   on multicast forwarding table information if the user wants to
 	   preserve existing multicast routes.
 	 */
-	mcast_mgr_purge_tree(sm, mbox);
+	osm_purge_mtree(sm, mbox);
 
 	/* build the first "subset" containing all member ports */
 	if (make_port_list(&port_list, mbox)) {
-- 
1.6.2.2



* [PATCH v3 07/17] opensm: Add torus-2QoS routing engine, part 1.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (5 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 06/17] opensm: Make mcast_mgr_purge_tree() available outside osm_mcast_mgr.c Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 09/17] opensm: Add torus-2QoS routing engine, part 3 Jim Schutt
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

Generating routes for a torus that are free of credit loops requires
the use of multiple virtual lanes, and thus SLs on IB.  For IB fabrics
it also requires that _every_ application use path record queries -
any application that uses an SL that was not obtained via a path record
query may cause credit loops.

In addition, if a fabric topology change (e.g. failed switch/link)
causes a change in the path SL values needed to prevent credit loops,
then _every_ application needs to repath for every path whose SL has
changed.  AFAIK there is not yet a good general way to do this.

Also, the requirement for path SL queries on every connection places a
heavy load on subnet administration, and the possibility that path SL
values can change makes caching as a performance enhancement more
difficult.

Since multiple VL/SL values are required to prevent credit loops on a
torus, supporting QoS means that QoS and routing need to share the small
pool of available SL values, and the even smaller pool of available VL
values.

The torus-2QoS engine addresses these issues for a 2D/3D torus fabric
by providing the following functionality:
- routing that is free of credit loops
- two levels of QoS, assuming switches support 8 data VLs
- ability to route around a single failed switch, and/or multiple failed
    links, without
    - introducing credit loops
    - changing path SL values
- very short run times, with good scaling properties as fabric size
    increases

The routing engine currently in opensm that is most functional for a
torus-connected fabric is LASH.  In comparison with torus-2QoS, LASH
has the following issues:
- LASH does not support QoS.
- changing inter-switch topology (adding/removing a switch, or
    removing all the links between a pair of switches) can change
    many path SL values, potentially leading to credit loops if
    running applications do not repath.
- running time to calculate routes scales poorly with increasing
    fabric size.

The basic algorithm used by torus-2QoS is DOR.  It also uses SL bits 0-2,
one SL bit per torus dimension, to encode whether a path crosses a dateline
(where the coordinate value wraps to zero) for each of the three dimensions,
in order to avoid the credit loops that otherwise result on a torus.  It
uses SL bit 3 to distinguish between two QoS levels.

It uses the SL2VL tables to map those eight SL values per QoS level into
two VL values per QoS level, based on which coordinate direction a link
points.  For two QoS levels, this consumes four data VLs, where VL bit
0 encodes whether the path crosses the dateline for the coordinate
direction in which the link points, and VL bit 2 encodes QoS level.
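
To make the bit layout concrete, here is a small sketch of the encoding
described above.  It is not the code in osm_torus.c: the function names
are hypothetical, and the assignment of SL bit 0 to x, bit 1 to y and
bit 2 to z is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Build a path SL: bits 0-2 record dateline crossings (assumed order
 * x, y, z), bit 3 carries the QoS level (0 or 1).
 */
static uint8_t sketch_path_sl(bool cross_x, bool cross_y, bool cross_z,
			      unsigned qos)
{
	return (uint8_t)((cross_x ? 1U : 0U) |
			 (cross_y ? 2U : 0U) |
			 (cross_z ? 4U : 0U) |
			 ((qos & 1U) << 3));
}

/*
 * Map an SL to a VL for a link pointing along dimension dim (0, 1, 2):
 * VL bit 0 is the dateline bit for that dimension, VL bit 2 is QoS.
 */
static uint8_t sketch_link_vl(uint8_t sl, unsigned dim)
{
	unsigned crosses = (sl >> dim) & 1U;
	unsigned qos = (sl >> 3) & 1U;

	return (uint8_t)(crosses | (qos << 2));
}
```

For example, a QoS level 1 path that crosses the x and z datelines gets
SL 13 under these assumptions, which maps to VL 5 on links pointing
along x and to VL 4 on links pointing along y.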

In the event of link failure, it routes the long way around the 1-D ring
containing the failed link.  I.e. no turns are introduced into a path in
order to route around a failed link.  Note that due to this implementation,
torus-2QoS cannot route a torus with link failures that break a 1-D ring
into two disjoint segments.
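
The choice of direction within a single 1-D ring can be sketched like
this.  It is a simplified illustration, not the algorithm as implemented:
the helper names are hypothetical, at most one failed link per ring is
modeled, and breaking ties in favor of the positive direction is an
assumption.

```c
#include <stdbool.h>

/*
 * Does walking in the + direction from src to dst traverse the link
 * joining coordinate f to coordinate (f + 1) % radix?
 */
static bool sketch_fwd_uses_link(unsigned src, unsigned dst, unsigned f,
				 unsigned radix)
{
	unsigned hops = (dst + radix - src) % radix;	/* + direction hops */
	unsigned off = (f + radix - src) % radix;	/* link offset from src */

	return off < hops;
}

/*
 * Pick +1 or -1 as the direction of travel in a ring, or 0 if src == dst.
 * failed is the lower coordinate of the single failed link, or -1 if the
 * ring is intact.
 */
static int sketch_ring_direction(unsigned src, unsigned dst, unsigned radix,
				 int failed)
{
	unsigned fwd = (dst + radix - src) % radix;
	unsigned bwd = (radix - fwd) % radix;

	if (fwd == 0)
		return 0;
	if (failed < 0)
		return fwd <= bwd ? 1 : -1;	/* intact ring: shortest way */

	/* One failed link lies on exactly one of the two ways around. */
	return sketch_fwd_uses_link(src, dst, (unsigned)failed, radix) ? -1 : 1;
}
```

With a second failed link on the other side of the ring, neither
direction would connect src to dst, which is the unroutable case noted
above.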

Under DOR routing in a torus with a failed switch, paths that would
otherwise turn at the failed switch cannot be routed without introducing
an "illegal" turn into the path.  Such turns are "illegal" in the
sense that allowing them will allow credit loops unless some additional
measure is taken.

The routes produced by torus-2QoS will introduce such "illegal" turns when
a switch fails.  It makes use of the input/output port dependence in the
SL2VL maps to set the otherwise unused VL bit 1 for the path hop following
such an illegal turn.  This is enough to avoid credit loops in the
presence of a single failed switch.

As an example, consider the following 2D torus, and consider routes
from S to D, both when the switch at F is operational, and when it
has failed.  torus-2QoS will generate routes such that the path
S-F-D is followed if F is operational, and the path S-E-I-L-D
if F has failed:

    |    |    |    |    |    |    |
  --+----+----+----+----+----+----+--
    |    |    |    |    |    |    |
  --+----+----+----+----+----D----+--
    |    |    |    |    |    |    |
  --+----+----+----+----I----L----+--
    |    |    |    |    |    |    |
  --+----+----S----+----E----F----+--
    |    |    |    |    |    |    |
  --+----+----+----+----+----+----+--

The turn in S-E-I-L-D at switch I is the illegal turn introduced
into the path.  The turns at E and L are extra turns introduced
into the path that are legal in the sense that no credit loops
can be constructed using them.

The path hop after the turn at switch I has VL bit 1 set, which marks
it as a hop after an illegal turn.
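
A sketch of how such hops could be marked, given the dimension each hop
travels in, follows.  It assumes DOR visits dimensions in the order
x (0), y (1), z (2), so that a turn from a higher-numbered dimension
back to a lower one is the illegal kind; the function name and the
per-hop array representation are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * hop_dim[i] is the dimension (0 = x, 1 = y, 2 = z) that hop i travels
 * in.  For each hop that follows a turn back to a lower dimension, set
 * VL bit 1 in vl[i].  Other bits of vl[] are left untouched.
 */
static void sketch_mark_illegal_turn_hops(const unsigned *hop_dim, size_t n,
					  uint8_t *vl)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (hop_dim[i] < hop_dim[i - 1])
			vl[i] |= (uint8_t)(1U << 1);
}
```

For the path S-E-I-L-D above the hop dimensions are x, y, x, y, so only
the hop from I to L is marked; the turns at E and L go from x to y and
are left alone.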

To check for credit loops in cases like the above routed with
torus-2QoS, I've used the latest development version of ibdmchk,
because it can use path SL values and SL2VL tables; it finds none.

I've also looked for credit loops in a torus with multiple failed
switches routed with torus-2QoS, and learned that if and only if
the failed switches are adjacent in the last DOR dimension, there
will be no credit loops.

Since torus-2QoS uses all available SL values for unicast traffic,
multicast traffic must share SL values with unicast traffic.  This
in turn means that multicast routing must be compatible with unicast
routing to prevent credit loops.

Since torus-2QoS unicast routing is based on DOR, it turns out to
be possible to construct spanning trees so that when multicast
and unicast traffic are overlaid, credit loops are not possible.

Here is a 2D example of such a spanning tree, where "x" is the
root switch, and each "+" is a non-root switch:

   +  +  +  +  +
   |  |  |  |  |
   +  +  +  +  +
   |  |  |  |  |
   +--+--x--+--+
   |  |  |  |  |
   +  +  +  +  +

For multicast traffic routed from root to tip, every turn in the
above spanning tree is a legal DOR turn.

For traffic routed from tip to root, and traffic routed through
the root, turns are not legal DOR turns.  However, to construct
a credit loop, the union of multicast routing on this spanning
tree with DOR unicast routing can only provide 3 of the 4 turns
needed for the loop.

In addition, if none of the above spanning tree branches crosses
a dateline used for unicast credit loop avoidance on a torus,
and multicast traffic is confined to SL 0 or SL 8 (recall that
torus-2QoS uses SL bit 3 to differentiate QoS level), then
multicast traffic also cannot contribute to the "ring" credit
loops that are otherwise possible in a torus.

Torus-2QoS uses these ideas to create a master spanning tree.
Every multicast group spanning tree will be constructed as a
subset of the master tree, with the same root as the master
tree.

Such multicast group spanning trees will in general not be
optimal for groups which are a subset of the full fabric.
However, this compromise must be made to enable support for
two QoS levels on a torus while preventing credit loops.

To build a spanning tree for a particular MLID, torus-2QoS just
needs to mark all the ports that participate in that multicast
group, then walk the master spanning tree and add switches
hosting the marked ports to the multicast group spanning tree.
A depth-first search of the master spanning tree is used for this.
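
The walk can be sketched on a toy tree as follows.  This is only an
illustration of the idea: struct sketch_stree_node and its fixed-size
child array are hypothetical, whereas the real code links port_grp
objects through to_stree_root and to_stree_tip.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_CHILDREN 4

/* Hypothetical node of the master spanning tree. */
struct sketch_stree_node {
	bool has_member;	/* switch hosts a marked port */
	bool in_group_tree;	/* result: part of this group's tree */
	struct sketch_stree_node *child[SKETCH_MAX_CHILDREN];
};

/*
 * Depth-first walk: a node belongs to the group tree if it hosts a
 * member port or if any subtree below it does.
 */
static bool sketch_prune_to_group(struct sketch_stree_node *n)
{
	bool keep;
	size_t i;

	if (!n)
		return false;

	keep = n->has_member;
	for (i = 0; i < SKETCH_MAX_CHILDREN; i++)
		if (sketch_prune_to_group(n->child[i]))
			keep = true;

	n->in_group_tree = keep;
	return keep;
}

/* The group tree always keeps the master tree's root. */
static void sketch_build_group_tree(struct sketch_stree_node *root)
{
	if (!root)
		return;
	sketch_prune_to_group(root);
	root->in_group_tree = true;
}
```

Branches of the master tree that contain no member ports are left out,
while every switch between a member and the root is kept, so the result
is a connected subtree sharing the master tree's root.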

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_torus.c | 2936 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 2936 insertions(+), 0 deletions(-)
 create mode 100644 opensm/opensm/osm_torus.c

diff --git a/opensm/opensm/osm_torus.c b/opensm/opensm/osm_torus.c
new file mode 100644
index 0000000..9bf7aa7
--- /dev/null
+++ b/opensm/opensm/osm_torus.c
@@ -0,0 +1,2936 @@
+/*
+ * Copyright 2009 Sandia Corporation.  Under the terms of Contract
+ * DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
+ * certain rights in this software.
+ *
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+/* for getline() in stdio.h */
+#define _GNU_SOURCE
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <errno.h>
+#include <string.h>
+
+#if HAVE_CONFIG_H
+#  include <config.h>
+#endif				/* HAVE_CONFIG_H */
+
+#include <opensm/osm_log.h>
+#include <opensm/osm_port.h>
+#include <opensm/osm_switch.h>
+#include <opensm/osm_node.h>
+#include <opensm/osm_opensm.h>
+
+#define TORUS_MAX_DIM        3
+#define PORTGRP_MAX_PORTS    16
+#define SWITCH_MAX_PORTGRPS  (1 + 2 * TORUS_MAX_DIM)
+
+typedef ib_net64_t guid_t;
+#define ntohllu(v_64bit) ((unsigned long long)cl_ntoh64(v_64bit))
+
+
+/*
+ * An endpoint terminates a link, and is one of three types:
+ *   UNKNOWN  - Uninitialized endpoint.
+ *   SRCSINK  - generates or consumes traffic, and thus has an associated LID;
+ *		  i.e. a CA or router port.
+ *   PASSTHRU - Has no associated LID; i.e. a switch port.
+ *
+ * If it is possible to communicate in-band with a switch, it will require
+ * a port with a GUID in the switch to source/sink that traffic, but there
+ * will be no attached link.  This code assumes there is only one such port.
+ *
+ * Here is an endpoint taxonomy:
+ *
+ *   type == SRCSINK
+ *   link == pointer to a valid struct link
+ *     ==> This endpoint is a CA or router port connected via a link to
+ *	     either a switch or another CA/router.  Thus:
+ *	   n_id ==> identifies the CA/router node GUID
+ *	   sw   ==> NULL
+ *	   port ==> identifies the port on the CA/router this endpoint uses
+ *	   pgrp ==> NULL
+ *
+ *   type == SRCSINK
+ *   link == NULL pointer
+ *     ==> This endpoint is the switch port used for in-band communication
+ *	     with the switch itself.  Thus:
+ *	   n_id ==> identifies the node GUID used to talk to the switch
+ *		      containing this endpoint
+ *	   sw   ==> pointer to valid struct switch containing this endpoint
+ *	   port ==> identifies the port on the switch this endpoint uses
+ *	   pgrp ==> NULL, or pointer to the valid struct port_grp holding
+ *		      the port in a t_switch.
+ *
+ *   type == PASSTHRU
+ *   link == pointer to valid struct link
+ *     ==> This endpoint is a switch port connected via a link to either
+ *	     another switch or a CA/router.  Thus:
+ *	   n_id ==> identifies the node GUID used to talk to the switch
+ *		      containing this endpoint - since each switch is assumed
+ *		      to have only one in-band communication port, this is a
+ *		      convenient unique name for the switch itself.
+ *	   sw   ==> pointer to valid struct switch containing this endpoint,
+ *		      or NULL, in the case of a fabric link that has been
+ *		      disconnected after being transferred to a torus link.
+ *	   port ==> identifies the port on the switch this endpoint uses.
+ *		      Note that in the special case of the coordinate direction
+ *		      links, the port value is -1, as those links aren't
+ *		      really connected to anything.
+ *	   pgrp ==> NULL, or pointer to the valid struct port_grp holding
+ *		      the port in a t_switch.
+ */
+enum endpt_type { UNKNOWN = 0, SRCSINK, PASSTHRU };
+struct torus;
+struct t_switch;
+struct port_grp;
+
+struct endpoint {
+	enum endpt_type type;
+	int port;
+	guid_t n_id;		/* IBA node GUID */
+	void *sw;		/* void* can point to either switch type */
+	struct link *link;
+	struct port_grp *pgrp;
+	void *tmp;
+	/*
+	 * Note: osm_port is only guaranteed to contain a valid pointer
+	 * when the call stack contains torus_build_lfts() or
+	 * osm_port_relink_endpoint().
+	 *
+	 * Otherwise, the opensm core could have deleted an osm_port object
+	 * without notifying us, invalidating the pointer we hold.
+	 *
+	 * When presented with a pointer to an osm_port_t, it is generally
+	 * safe and required to cast osm_port_t:priv to struct endpoint, and
+	 * check that the endpoint's osm_port is the same as the original
+	 * osm_port_t pointer.  Failure to do so means that invalidated
+	 * pointers will go undetected.
+	 */
+	struct osm_port *osm_port;
+};
+
+struct link {
+	struct endpoint end[2];
+};
+
+/*
+ * A port group is a collection of endpoints on a switch that share certain
+ * characteristics.  All the endpoints in a port group must have the same
+ * type.  Furthermore, if that type is PASSTHRU, then the connected links:
+ *   1) are parallel to a given coordinate direction
+ *   2) share the same two switches as endpoints.
+ *
+ * Torus-2QoS uses one master spanning tree for multicast, of which every
+ * multicast group spanning tree is a subtree.  to_stree_root is a pointer
+ * to the next port_grp on the path to the master spanning tree root.
+ * to_stree_tip is a pointer to the next port_grp on the path to a master
+ * spanning tree branch tip.
+ *
+ * Each t_switch can have at most one port_grp with a non-NULL to_stree_root.
+ * Exactly one t_switch in the fabric will have all port_grp objects with
+ * to_stree_root NULL; it is the master spanning tree root.
+ *
+ * A t_switch with all port_grp objects where to_stree_tip is NULL is at a
+ * master spanning tree branch tip.
+ */
+struct port_grp {
+	enum endpt_type type;
+	size_t port_cnt;	/* number of attached ports in group
+				 */
+	size_t port_grp;	/* what switch port_grp we're in */
+	unsigned sw_dlid_cnt;	/* switch dlids routed through this group */
+	unsigned ca_dlid_cnt;	/* CA dlids routed through this group */
+	struct t_switch *sw;	/* what switch we're attached to */
+	struct port_grp *to_stree_root;
+	struct port_grp *to_stree_tip;
+	struct endpoint **port;
+};
+
+/*
+ * A struct t_switch is used to represent a switch as placed in a torus.
+ *
+ * A t_switch used to build an N-dimensional torus will have 2N+1 port groups,
+ * used as follows, assuming 0 <= d < N:
+ *   port_grp[2d]   => links leaving in negative direction for coordinate d
+ *   port_grp[2d+1] => links leaving in positive direction for coordinate d
+ *   port_grp[2N]   => endpoints local to switch; i.e., hosts on switch
+ *
+ * struct link objects referenced by a t_switch are assumed to be oriented:
+ * traversing a link from link.end[0] to link.end[1] is always in the positive
+ * coordinate direction.
+ */
+struct t_switch {
+	guid_t n_id;		/* IBA node GUID */
+	int i, j, k;
+	unsigned port_cnt;	/* including management port */
+	struct torus *torus;
+	void *tmp;
+	/*
+	 * Note: osm_switch is only guaranteed to contain a valid pointer
+	 * when the call stack contains torus_build_lfts().
+	 *
+	 * Otherwise, the opensm core could have deleted an osm_switch object
+	 * without notifying us, invalidating the pointer we hold.
+	 *
+	 * When presented with a pointer to an osm_switch_t, it is generally
+	 * safe and required to cast osm_switch_t:priv to struct t_switch, and
+	 * check that the switch's osm_switch is the same as the original
+	 * osm_switch_t pointer.  Failure to do so means that invalidated
+	 * pointers will go undetected.
+	 */
+	struct osm_switch *osm_switch;
+
+	struct port_grp ptgrp[SWITCH_MAX_PORTGRPS];
+	struct endpoint **port;
+};
+
+/*
+ * We'd like to be able to discover the torus topology in a pile of switch
+ * links if we can.  We'll use a struct f_switch to store raw topology for a
+ * fabric description, then construct the torus topology from struct t_switch
+ * objects as we process the fabric and recover it.
+ */
+struct f_switch {
+	guid_t n_id;		/* IBA node GUID */
+	unsigned port_cnt;	/* including management port */
+	void *tmp;
+	/*
+	 * Same rules apply here as for a struct t_switch member osm_switch.
+	 */
+	struct osm_switch *osm_switch;
+	struct endpoint **port;
+};
+
+struct fabric {
+	osm_opensm_t *osm;
+	unsigned ca_cnt;
+	unsigned link_cnt;
+	unsigned switch_cnt;
+
+	unsigned link_cnt_max;
+	unsigned switch_cnt_max;
+
+	struct link **link;
+	struct f_switch **sw;
+};
+
+struct coord_dirs {
+	/*
+	 * These links define the coordinate directions for the torus.
+	 * They are duplicates of links connected to switches.  Each of
+	 * these links must connect to a common switch.
+	 *
+	 * In the event that a failed switch was specified as one of these
+	 * link endpoints, our algorithm would not be able to find the
+	 * torus in the fabric.  So, we'll allow multiple instances of
+	 * this in the config file to allow improved resiliency.
+	 */
+	struct link xm_link, ym_link, zm_link;
+	struct link xp_link, yp_link, zp_link;
+	/*
+	 * A torus dimension has coordinate values 0, 1, ..., radix - 1.
+	 * The dateline, where we need to change VLs to avoid credit loops,
+	 * for a torus dimension is always between coordinate values
+	 * radix - 1 and 0.  The following specify the dateline location
+	 * relative to the coordinate links shared switch location.
+	 *
+	 * E.g. if the shared switch is at 0,0,0, the following are all
+	 * zero; if the shared switch is at 1,1,1, the following are all
+	 * -1, etc.
+	 *
+	 * Since our SL/VL assignment for a path depends on the position
+	 * of the path endpoints relative to the torus datelines, we need
+	 * this information to keep SL/VL assignment constant in the event
+	 * one of the switches used to specify coordinate directions fails.
+	 */
+	int x_dateline, y_dateline, z_dateline;
+};
+
+struct torus {
+	osm_opensm_t *osm;
+	unsigned ca_cnt;
+	unsigned link_cnt;
+	unsigned switch_cnt;
+	unsigned seed_cnt, seed_idx;
+	unsigned x_sz, y_sz, z_sz;
+
+	unsigned sw_pool_sz;
+	unsigned link_pool_sz;
+	unsigned seed_sz;
+	unsigned portgrp_sz;	/* max ports for port groups in this torus */
+
+	struct fabric *fabric;
+	struct t_switch **sw_pool;
+	struct link *link_pool;
+
+	struct coord_dirs *seed;
+	struct t_switch ****sw;
+	struct t_switch *master_stree_root;
+
+	unsigned flags;
+	int debug;
+};
+
+/*
+ * Bits to use in torus.flags
+ */
+#define X_MESH (1U << 0)
+#define Y_MESH (1U << 1)
+#define Z_MESH (1U << 2)
+#define MSG_DEADLOCK (1U << 29)
+#define NOTIFY_CHANGES (1U << 30)
+
+#define ALL_MESH(flags) \
+	((flags & (X_MESH | Y_MESH | Z_MESH)) == (X_MESH | Y_MESH | Z_MESH))
+
+
+struct torus_context {
+	osm_opensm_t *osm;
+	struct torus *torus;
+	struct fabric fabric;
+};
+
+static
+void teardown_fabric(struct fabric *f)
+{
+	unsigned l, p, s;
+	struct endpoint *port;
+	struct f_switch *sw;
+
+	if (!f)
+		return;
+
+	if (f->sw) {
+		/*
+		 * Need to free switches, and also find/free the endpoints
+		 * we allocated for switch management ports.
+		 */
+		for (s = 0; s < f->switch_cnt; s++) {
+			sw = f->sw[s];
+			if (!sw)
+				continue;
+
+			for (p = 0; p < sw->port_cnt; p++) {
+				port = sw->port[p];
+				if (port && !port->link)
+					free(port);	/* management port */
+			}
+			free(sw);
+		}
+		free(f->sw);
+	}
+	if (f->link) {
+		for (l = 0; l < f->link_cnt; l++)
+			if (f->link[l])
+				free(f->link[l]);
+
+		free(f->link);
+	}
+	memset(f, 0, sizeof(*f));
+}
+
+void teardown_torus(struct torus *t)
+{
+	unsigned p, s;
+	struct endpoint *port;
+	struct t_switch *sw;
+
+	if (!t)
+		return;
+
+	if (t->sw_pool) {
+		/*
+		 * Need to free switches, and also find/free the endpoints
+		 * we allocated for switch management ports.
+		 */
+		for (s = 0; s < t->switch_cnt; s++) {
+			sw = t->sw_pool[s];
+			if (!sw)
+				continue;
+
+			for (p = 0; p < sw->port_cnt; p++) {
+				port = sw->port[p];
+				if (port && !port->link)
+					free(port);	/* management port */
+			}
+			free(sw);
+		}
+		free(t->sw_pool);
+	}
+	if (t->link_pool)
+		free(t->link_pool);
+
+	if (t->sw)
+		free(t->sw);
+
+	if (t->seed)
+		free(t->seed);
+
+	free(t);
+}
+
+static
+struct torus_context *torus_context_create(osm_opensm_t *osm)
+{
+	struct torus_context *ctx;
+
+	ctx = calloc(1, sizeof(*ctx));
+	if (ctx)
+		ctx->osm = osm;
+	return ctx;
+}
+
+static
+void torus_context_delete(void *context)
+{
+	struct torus_context *ctx = context;
+
+	teardown_fabric(&ctx->fabric);
+	if (ctx->torus)
+		teardown_torus(ctx->torus);
+	free(ctx);
+}
+
+static
+bool grow_seed_array(struct torus *t, int new_seeds)
+{
+	unsigned cnt;
+	void *ptr;
+
+	cnt = t->seed_cnt + new_seeds;
+	if (cnt > t->seed_sz) {
+		cnt += 2 + cnt / 2;
+		ptr = realloc(t->seed, cnt * sizeof(*t->seed));
+		if (!ptr)
+			return false;
+		t->seed = ptr;
+		t->seed_sz = cnt;
+		memset(&t->seed[t->seed_cnt], 0,
+		       (cnt - t->seed_cnt) * sizeof(*t->seed));
+	}
+	return true;
+}
+
+static
+struct f_switch *find_f_sw(struct fabric *f, guid_t sw_guid)
+{
+	unsigned s;
+	struct f_switch *sw;
+
+	if (f->sw) {
+		for (s = 0; s < f->switch_cnt; s++) {
+			sw = f->sw[s];
+			if (sw->n_id == sw_guid)
+				return sw;
+		}
+	}
+	return NULL;
+}
+
+static
+struct link *find_f_link(struct fabric *f,
+			 guid_t guid0, int port0, guid_t guid1, int port1)
+{
+	unsigned l;
+	struct link *link;
+
+	if (f->link) {
+		for (l = 0; l < f->link_cnt; l++) {
+			link = f->link[l];
+			if ((link->end[0].n_id == guid0 &&
+			     link->end[0].port == port0 &&
+			     link->end[1].n_id == guid1 &&
+			     link->end[1].port == port1) ||
+			    (link->end[0].n_id == guid1 &&
+			     link->end[0].port == port1 &&
+			     link->end[1].n_id == guid0 &&
+			     link->end[1].port == port0))
+				return link;
+		}
+	}
+	return NULL;
+}
+
+static
+struct f_switch *alloc_fswitch(struct fabric *f,
+			       guid_t sw_id, unsigned port_cnt)
+{
+	size_t new_sw_sz;
+	unsigned cnt_max;
+	struct f_switch *sw = NULL;
+	void *ptr;
+
+	if (f->switch_cnt >= f->switch_cnt_max) {
+
+		cnt_max = 16 + 5 * f->switch_cnt_max / 4;
+		ptr = realloc(f->sw, cnt_max * sizeof(*f->sw));
+		if (!ptr) {
+			OSM_LOG(&f->osm->log, OSM_LOG_ERROR,
+				"Error: realloc: %s\n", strerror(errno));
+			goto out;
+		}
+		f->sw = ptr;
+		f->switch_cnt_max = cnt_max;
+		memset(&f->sw[f->switch_cnt], 0,
+		       (f->switch_cnt_max - f->switch_cnt)*sizeof(*f->sw));
+	}
+	new_sw_sz = sizeof(*sw) + port_cnt * sizeof(*sw->port);
+	sw = calloc(1, new_sw_sz);
+	if (!sw) {
+		OSM_LOG(&f->osm->log, OSM_LOG_ERROR,
+			"Error: calloc: %s\n", strerror(errno));
+		goto out;
+	}
+	sw->port = (void *)(sw + 1);
+	sw->n_id = sw_id;
+	sw->port_cnt = port_cnt;
+	f->sw[f->switch_cnt++] = sw;
+out:
+	return sw;
+}
+
+static
+struct link *alloc_flink(struct fabric *f)
+{
+	unsigned cnt_max;
+	struct link *l = NULL;
+	void *ptr;
+
+	if (f->link_cnt >= f->link_cnt_max) {
+
+		cnt_max = 16 + 5 * f->link_cnt_max / 4;
+		ptr = realloc(f->link, cnt_max * sizeof(*f->link));
+		if (!ptr) {
+			OSM_LOG(&f->osm->log, OSM_LOG_ERROR,
+				"Error: realloc: %s\n", strerror(errno));
+			goto out;
+		}
+		f->link = ptr;
+		f->link_cnt_max = cnt_max;
+		memset(&f->link[f->link_cnt], 0,
+		       (f->link_cnt_max - f->link_cnt) * sizeof(*f->link));
+	}
+	l = calloc(1, sizeof(*l));
+	if (!l) {
+		OSM_LOG(&f->osm->log, OSM_LOG_ERROR,
+			"Error: calloc: %s\n", strerror(errno));
+		goto out;
+	}
+	f->link[f->link_cnt++] = l;
+out:
+	return l;
+}
+
+/*
+ * Caller must ensure osm_port points to a valid port which contains
+ * a valid osm_physp_t pointer for port 0, the switch management port.
+ */
+static
+bool build_sw_endpoint(struct fabric *f, osm_port_t *osm_port)
+{
+	int sw_port;
+	guid_t sw_guid;
+	struct osm_switch *osm_sw;
+	struct f_switch *sw;
+	struct endpoint *ep;
+	bool success = false;
+
+	sw_port = osm_physp_get_port_num(osm_port->p_physp);
+	sw_guid = osm_node_get_node_guid(osm_port->p_node);
+	osm_sw = osm_port->p_node->sw;
+
+	/*
+	 * The switch must already exist.
+	 */
+	sw = find_f_sw(f, sw_guid);
+	if (!sw) {
+		OSM_LOG(&f->osm->log, OSM_LOG_ERROR,
+			"Error: missing switch w/ GUID 0x%04llx\n",
+			ntohllu(sw_guid));
+		goto out;
+	}
+	/*
+	 * The endpoint may already exist.
+	 */
+	if (sw->port[sw_port]) {
+		if (sw->port[sw_port]->n_id == sw_guid) {
+			ep = sw->port[sw_port];
+			goto success;
+		} else
+			OSM_LOG(&f->osm->log, OSM_LOG_ERROR,
+				"Error: switch port %d has id "
+				"0x%04llx, expected 0x%04llx\n",
+				sw_port, ntohllu(sw->port[sw_port]->n_id),
+				ntohllu(sw_guid));
+		goto out;
+	}
+	ep = calloc(1, sizeof(*ep));
+	if (!ep) {
+		OSM_LOG(&f->osm->log, OSM_LOG_ERROR,
+			"Error: allocating endpoint: %s\n", strerror(errno));
+		goto out;
+	}
+	ep->type = SRCSINK;
+	ep->port = sw_port;
+	ep->n_id = sw_guid;
+	ep->link = NULL;
+	ep->sw = sw;
+
+	sw->port[sw_port] = ep;
+
+success:
+	/*
+	 * Fabric objects are temporary, so don't set osm_sw/osm_port priv
+	 * pointers using them.  Wait until torus objects get constructed.
+	 */
+	sw->osm_switch = osm_sw;
+	ep->osm_port = osm_port;
+
+	success = true;
+out:
+	return success;
+}
+
+static
+bool build_ca_link(struct fabric *f,
+		   osm_port_t *osm_port_ca, guid_t sw_guid, int sw_port)
+{
+	int ca_port;
+	guid_t ca_guid;
+	struct link *l;
+	struct f_switch *sw;
+	bool success = false;
+
+	ca_port = osm_physp_get_port_num(osm_port_ca->p_physp);
+	ca_guid = osm_node_get_node_guid(osm_port_ca->p_node);
+
+	/*
+	 * The link may already exist.
+	 */
+	l = find_f_link(f, sw_guid, sw_port, ca_guid, ca_port);
+	if (l) {
+		success = true;
+		goto out;
+	}
+	/*
+	 * The switch must already exist.
+	 */
+	sw = find_f_sw(f, sw_guid);
+	if (!sw) {
+		OSM_LOG(&f->osm->log, OSM_LOG_ERROR,
+			"Error: missing switch w/ GUID 0x%04llx\n",
+			ntohllu(sw_guid));
+		goto out;
+	}
+	l = alloc_flink(f);
+	if (!l)
+		goto out;
+
+	l->end[0].type = PASSTHRU;
+	l->end[0].port = sw_port;
+	l->end[0].n_id = sw_guid;
+	l->end[0].sw = sw;
+	l->end[0].link = l;
+
+	sw->port[sw_port] = &l->end[0];
+
+	l->end[1].type = SRCSINK;
+	l->end[1].port = ca_port;
+	l->end[1].n_id = ca_guid;
+	l->end[1].sw = NULL;		/* Correct for a CA */
+	l->end[1].link = l;
+
+	/*
+	 * Fabric objects are temporary, so don't set osm_sw/osm_port priv
+	 * pointers using them.  Wait until torus objects get constructed.
+	 */
+	l->end[1].osm_port = osm_port_ca;
+
+	++f->ca_cnt;
+	success = true;
+out:
+	return success;
+}
+
+static
+bool build_link(struct fabric *f,
+		guid_t sw_guid0, int sw_port0, guid_t sw_guid1, int sw_port1)
+{
+	struct link *l;
+	struct f_switch *sw0, *sw1;
+	bool success = false;
+
+	/*
+	 * The link may already exist.
+	 */
+	l = find_f_link(f, sw_guid0, sw_port0, sw_guid1, sw_port1);
+	if (l) {
+		success = true;
+		goto out;
+	}
+	/*
+	 * The switches must already exist.
+	 */
+	sw0 = find_f_sw(f, sw_guid0);
+	if (!sw0) {
+		OSM_LOG(&f->osm->log, OSM_LOG_ERROR,
+			"Error: missing switch w/ GUID 0x%04llx\n",
+			ntohllu(sw_guid0));
+		goto out;
+	}
+	sw1 = find_f_sw(f, sw_guid1);
+	if (!sw1) {
+		OSM_LOG(&f->osm->log, OSM_LOG_ERROR,
+			"Error: missing switch w/ GUID 0x%04llx\n",
+			ntohllu(sw_guid1));
+		goto out;
+	}
+	l = alloc_flink(f);
+	if (!l)
+		goto out;
+
+	l->end[0].type = PASSTHRU;
+	l->end[0].port = sw_port0;
+	l->end[0].n_id = sw_guid0;
+	l->end[0].sw = sw0;
+	l->end[0].link = l;
+
+	sw0->port[sw_port0] = &l->end[0];
+
+	l->end[1].type = PASSTHRU;
+	l->end[1].port = sw_port1;
+	l->end[1].n_id = sw_guid1;
+	l->end[1].sw = sw1;
+	l->end[1].link = l;
+
+	sw1->port[sw_port1] = &l->end[1];
+
+	success = true;
+out:
+	return success;
+}
+
+static
+bool parse_size(unsigned *tsz, unsigned *tflags, unsigned mask,
+		const char *parse_sep)
+{
+	char *val, *nextchar;
+
+	val = strtok(NULL, parse_sep);
+	if (!val)
+		return false;
+	*tsz = strtoul(val, &nextchar, 0);
+	if (*tsz) {
+		if (*nextchar == 't' || *nextchar == 'T')
+			*tflags &= ~mask;
+		else if (*nextchar == 'm' || *nextchar == 'M')
+			*tflags |= mask;
+		/*
+		 * A torus of radix two is also a mesh of radix two
+		 * with multiple links between switches in that direction.
+		 *
+		 * Make it so always, otherwise the failure case routing
+		 * logic gets confused.
+		 */
+		if (*tsz == 2)
+			*tflags |= mask;
+	}
+	return true;
+}
+
+static
+bool parse_torus(struct torus *t, const char *parse_sep)
+{
+	unsigned i, j, k, cnt;
+	char *ptr;
+	bool success = false;
+
+	if (!parse_size(&t->x_sz, &t->flags, X_MESH, parse_sep))
+		goto out;
+
+	if (!parse_size(&t->y_sz, &t->flags, Y_MESH, parse_sep))
+		goto out;
+
+	if (!parse_size(&t->z_sz, &t->flags, Z_MESH, parse_sep))
+		goto out;
+
+	/*
+	 * Set up a linear array of switch pointers big enough to hold
+	 * all expected switches.
+	 */
+	t->sw_pool_sz = t->x_sz * t->y_sz * t->z_sz;
+	t->sw_pool = calloc(t->sw_pool_sz, sizeof(*t->sw_pool));
+	if (!t->sw_pool) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: Torus switch array calloc: %s\n",
+			strerror(errno));
+		goto out;
+	}
+	/*
+	 * Set things up so that t->sw[i][j][k] can point to the i,j,k switch.
+	 */
+	cnt = t->x_sz * (1 + t->y_sz * (1 + t->z_sz));
+	t->sw = malloc(cnt * sizeof(void *));
+	if (!t->sw) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: Torus switch array malloc: %s\n",
+			strerror(errno));
+		goto out;
+	}
+	ptr = (void *)(t->sw);
+
+	ptr += t->x_sz * sizeof(void *);
+	for (i = 0; i < t->x_sz; i++) {
+		t->sw[i] = (void *)ptr;
+		ptr += t->y_sz * sizeof(void *);
+	}
+	for (i = 0; i < t->x_sz; i++)
+		for (j = 0; j < t->y_sz; j++) {
+			t->sw[i][j] = (void *)ptr;
+			ptr += t->z_sz * sizeof(void *);
+		}
+
+	for (i = 0; i < t->x_sz; i++)
+		for (j = 0; j < t->y_sz; j++)
+			for (k = 0; k < t->z_sz; k++)
+				t->sw[i][j][k] = NULL;
+
+	success = true;
+out:
+	return success;
+}
+
+static
+bool parse_pg_max_ports(struct torus *t, const char *parse_sep)
+{
+	char *val, *nextchar;
+
+	val = strtok(NULL, parse_sep);
+	if (!val)
+		return false;
+	t->portgrp_sz = strtoul(val, &nextchar, 0);
+	return true;
+}
+
+static
+bool parse_guid(struct torus *t, guid_t *guid, const char *parse_sep)
+{
+	char *val;
+	bool success = false;
+
+	val = strtok(NULL, parse_sep);
+	if (!val)
+		goto out;
+	*guid = strtoull(val, NULL, 0);
+	*guid = cl_hton64(*guid);
+
+	success = true;
+out:
+	return success;
+}
+
+static
+bool parse_dir_link(int c_dir, struct torus *t, const char *parse_sep)
+{
+	guid_t sw_guid0, sw_guid1;
+	struct link *l;
+	bool success = false;
+
+	if (!parse_guid(t, &sw_guid0, parse_sep))
+		goto out;
+
+	if (!parse_guid(t, &sw_guid1, parse_sep))
+		goto out;
+
+	if (!t) {
+		success = true;
+		goto out;
+	}
+
+	switch (c_dir) {
+	case -1:
+		l = &t->seed[t->seed_cnt - 1].xm_link;
+		break;
+	case  1:
+		l = &t->seed[t->seed_cnt - 1].xp_link;
+		break;
+	case -2:
+		l = &t->seed[t->seed_cnt - 1].ym_link;
+		break;
+	case  2:
+		l = &t->seed[t->seed_cnt - 1].yp_link;
+		break;
+	case -3:
+		l = &t->seed[t->seed_cnt - 1].zm_link;
+		break;
+	case  3:
+		l = &t->seed[t->seed_cnt - 1].zp_link;
+		break;
+	default:
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: unknown link direction %d\n", c_dir);
+		goto out;
+	}
+	l->end[0].type = PASSTHRU;
+	l->end[0].port = -1;		/* We don't really connect. */
+	l->end[0].n_id = sw_guid0;
+	l->end[0].sw = NULL;		/* Fix this up later. */
+	l->end[0].link = NULL;		/* Fix this up later. */
+
+	l->end[1].type = PASSTHRU;
+	l->end[1].port = -1;		/* We don't really connect. */
+	l->end[1].n_id = sw_guid1;
+	l->end[1].sw = NULL;		/* Fix this up later. */
+	l->end[1].link = NULL;		/* Fix this up later. */
+
+	success = true;
+out:
+	return success;
+}
+
+static
+bool parse_dir_dateline(int c_dir, struct torus *t, const char *parse_sep)
+{
+	char *val;
+	int *dl, max_dl;
+	bool success = false;
+
+	val = strtok(NULL, parse_sep);
+	if (!val)
+		goto out;
+
+	if (!t) {
+		success = true;
+		goto out;
+	}
+
+	switch (c_dir) {
+	case  1:
+		dl = &t->seed[t->seed_cnt - 1].x_dateline;
+		max_dl = t->x_sz;
+		break;
+	case  2:
+		dl = &t->seed[t->seed_cnt - 1].y_dateline;
+		max_dl = t->y_sz;
+		break;
+	case  3:
+		dl = &t->seed[t->seed_cnt - 1].z_dateline;
+		max_dl = t->z_sz;
+		break;
+	default:
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: unknown dateline direction %d\n", c_dir);
+		goto out;
+	}
+	*dl = strtol(val, NULL, 0);
+
+	if ((*dl < 0 && *dl <= -max_dl) || *dl >= max_dl)
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: dateline value for coordinate direction %d "
+			"must be %d < dl < %d\n",
+			c_dir, -max_dl, max_dl);
+	else
+		success = true;
+out:
+	return success;
+}
+
+static
+bool parse_config(const char *fn, struct fabric *f, struct torus *t)
+{
+	FILE *fp;
+	char *keyword;
+	char *line_buf = NULL;
+	const char *parse_sep = " \n\t";
+	size_t line_buf_sz = 0;
+	size_t line_cntr = 0;
+	ssize_t llen;
+	bool kw_success, success = true;
+
+	if (!grow_seed_array(t, 2))
+		return false;
+
+	fp = fopen(fn, "r");
+	if (!fp) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Opening %s: %s\n", fn, strerror(errno));
+		return false;
+	}
+	t->flags |= NOTIFY_CHANGES;
+	t->portgrp_sz = PORTGRP_MAX_PORTS;
+
+next_line:
+	llen = getline(&line_buf, &line_buf_sz, fp);
+	if (llen < 0)
+		goto out;
+
+	++line_cntr;
+
+	keyword = strtok(line_buf, parse_sep);
+	if (!keyword)
+		goto next_line;
+
+	if (strcmp("torus", keyword) == 0) {
+		kw_success = parse_torus(t, parse_sep);
+	} else if (strcmp("mesh", keyword) == 0) {
+		t->flags |= X_MESH | Y_MESH | Z_MESH;
+		kw_success = parse_torus(t, parse_sep);
+	} else if (strcmp("next_seed", keyword) == 0) {
+		kw_success = grow_seed_array(t, 1);
+		t->seed_cnt += kw_success;	/* advance only on success */
+	} else if (strcmp("portgroup_max_ports", keyword) == 0) {
+		kw_success = parse_pg_max_ports(t, parse_sep);
+	} else if (strcmp("xp_link", keyword) == 0) {
+		if (!t->seed_cnt)
+			t->seed_cnt++;
+		kw_success = parse_dir_link(1, t, parse_sep);
+	} else if (strcmp("xm_link", keyword) == 0) {
+		if (!t->seed_cnt)
+			t->seed_cnt++;
+		kw_success = parse_dir_link(-1, t, parse_sep);
+	} else if (strcmp("x_dateline", keyword) == 0) {
+		if (!t->seed_cnt)
+			t->seed_cnt++;
+		kw_success = parse_dir_dateline(1, t, parse_sep);
+	} else if (strcmp("yp_link", keyword) == 0) {
+		if (!t->seed_cnt)
+			t->seed_cnt++;
+		kw_success = parse_dir_link(2, t, parse_sep);
+	} else if (strcmp("ym_link", keyword) == 0) {
+		if (!t->seed_cnt)
+			t->seed_cnt++;
+		kw_success = parse_dir_link(-2, t, parse_sep);
+	} else if (strcmp("y_dateline", keyword) == 0) {
+		if (!t->seed_cnt)
+			t->seed_cnt++;
+		kw_success = parse_dir_dateline(2, t, parse_sep);
+	} else if (strcmp("zp_link", keyword) == 0) {
+		if (!t->seed_cnt)
+			t->seed_cnt++;
+		kw_success = parse_dir_link(3, t, parse_sep);
+	} else if (strcmp("zm_link", keyword) == 0) {
+		if (!t->seed_cnt)
+			t->seed_cnt++;
+		kw_success = parse_dir_link(-3, t, parse_sep);
+	} else if (strcmp("z_dateline", keyword) == 0) {
+		if (!t->seed_cnt)
+			t->seed_cnt++;
+		kw_success = parse_dir_dateline(3, t, parse_sep);
+	} else if (keyword[0] == '#')
+		goto next_line;
+	else {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: no keyword found: line %u\n",
+			(unsigned)line_cntr);
+		kw_success = false;
+	}
+	if (!kw_success) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: parsing '%s': line %u\n",
+			keyword, (unsigned)line_cntr);
+	}
+	success = success && kw_success;
+	goto next_line;
+
+out:
+	if (line_buf)
+		free(line_buf);
+	fclose(fp);
+	return success;
+}
+
+static
+bool capture_fabric(struct fabric *fabric)
+{
+	osm_subn_t *subnet = &fabric->osm->subn;
+	osm_switch_t *osm_sw;
+	osm_physp_t *lphysp, *rphysp;
+	osm_port_t *lport;
+	osm_node_t *osm_node;
+	cl_map_item_t *item;
+	uint8_t ltype, rtype;
+	int p, port_cnt;
+	guid_t sw_guid;
+	bool success = true;
+
+	OSM_LOG_ENTER(&fabric->osm->log);
+
+	/*
+	 * On OpenSM data structures:
+	 *
+	 * Apparently, every port in a fabric has an associated osm_physp_t,
+	 * but not every port has an associated osm_port_t.  Apparently every
+	 * osm_port_t has an associated osm_physp_t.
+	 *
+	 * So, in order to find the inter-switch links we need to walk the
+	 * switch list and examine each port, via its osm_physp_t object.
+	 *
+	 * But, we need to associate our CA and switch management port
+	 * endpoints with the corresponding osm_port_t objects, in order
+	 * to simplify computation of LFT entries and perform SL lookup for
+	 * path records. Since it is apparently difficult to locate the
+	 * osm_port_t that corresponds to a given osm_physp_t, we also
+	 * need to walk the list of ports indexed by GUID to get access
+	 * to the appropriate osm_port_t objects.
+	 *
+	 * Need to allocate our switches before we do anything else.
+	 */
+	item = cl_qmap_head(&subnet->sw_guid_tbl);
+	while (item != cl_qmap_end(&subnet->sw_guid_tbl)) {
+
+		osm_sw = (osm_switch_t *)item;
+		item = cl_qmap_next(item);
+		osm_node = osm_sw->p_node;
+
+		if (osm_node_get_type(osm_node) != IB_NODE_TYPE_SWITCH)
+			continue;
+
+		port_cnt = osm_node_get_num_physp(osm_node);
+		sw_guid = osm_node_get_node_guid(osm_node);
+
+		success = alloc_fswitch(fabric, sw_guid, port_cnt);
+		if (!success)
+			goto out;
+	}
+	/*
+	 * Now build all our endpoints.
+	 */
+	item = cl_qmap_head(&subnet->port_guid_tbl);
+	while (item != cl_qmap_end(&subnet->port_guid_tbl)) {
+
+		lport = (osm_port_t *)item;
+		item = cl_qmap_next(item);
+
+		lphysp = lport->p_physp;
+		if (!(lphysp && osm_physp_is_valid(lphysp)))
+			continue;
+
+		ltype = osm_node_get_type(lphysp->p_node);
+		/*
+		 * Switch management port is always port 0.
+		 */
+		if (lphysp->port_num == 0 && ltype == IB_NODE_TYPE_SWITCH) {
+			success = build_sw_endpoint(fabric, lport);
+			if (!success)
+				goto out;
+			continue;
+		}
+		rphysp = lphysp->p_remote_physp;
+		if (!(rphysp && osm_physp_is_valid(rphysp)))
+			continue;
+
+		rtype = osm_node_get_type(rphysp->p_node);
+
+		if ((ltype != IB_NODE_TYPE_CA &&
+		     ltype != IB_NODE_TYPE_ROUTER) ||
+		    rtype != IB_NODE_TYPE_SWITCH)
+			continue;
+
+		success =
+			build_ca_link(fabric, lport,
+				      osm_node_get_node_guid(rphysp->p_node),
+				      osm_physp_get_port_num(rphysp));
+		if (!success)
+			goto out;
+	}
+	/*
+	 * Lastly, build all our interswitch links.
+	 */
+	item = cl_qmap_head(&subnet->sw_guid_tbl);
+	while (item != cl_qmap_end(&subnet->sw_guid_tbl)) {
+
+		osm_sw = (osm_switch_t *)item;
+		item = cl_qmap_next(item);
+
+		port_cnt = osm_node_get_num_physp(osm_sw->p_node);
+		for (p = 0; p < port_cnt; p++) {
+
+			lphysp = osm_node_get_physp_ptr(osm_sw->p_node, p);
+			if (!(lphysp && osm_physp_is_valid(lphysp)))
+				continue;
+
+			rphysp = lphysp->p_remote_physp;
+			if (!(rphysp && osm_physp_is_valid(rphysp)))
+				continue;
+
+			if (lphysp == rphysp)
+				continue;	/* ignore loopbacks */
+
+			ltype = osm_node_get_type(lphysp->p_node);
+			rtype = osm_node_get_type(rphysp->p_node);
+
+			if (ltype != IB_NODE_TYPE_SWITCH ||
+			    rtype != IB_NODE_TYPE_SWITCH)
+				continue;
+
+			success =
+				build_link(fabric,
+					   osm_node_get_node_guid(lphysp->p_node),
+					   osm_physp_get_port_num(lphysp),
+					   osm_node_get_node_guid(rphysp->p_node),
+					   osm_physp_get_port_num(rphysp));
+			if (!success)
+				goto out;
+		}
+	}
+out:
+	OSM_LOG_EXIT(&fabric->osm->log);
+	return success;
+}
+
+/*
+ * diagnose_fabric() is just intended to report on fabric elements that
+ * could not be placed into the torus.  We want to warn that there were
+ * non-torus fabric elements, but they will be ignored for routing purposes.
+ * Having them is not an error, and diagnose_fabric() thus has no return
+ * value.
+ */
+static
+void diagnose_fabric(struct fabric *f)
+{
+	struct link *l;
+	struct endpoint *ep;
+	unsigned k, p;
+
+	/*
+	 * Report on any links that didn't get transferred to the torus.
+	 */
+	for (k = 0; k < f->link_cnt; k++) {
+		l = f->link[k];
+
+		if (!(l->end[0].sw && l->end[1].sw))
+			continue;
+
+		OSM_LOG(&f->osm->log, OSM_LOG_INFO,
+			"Found non-torus fabric link:"
+			" sw GUID 0x%04llx port %d <->"
+			" sw GUID 0x%04llx port %d\n",
+			ntohllu(l->end[0].n_id), l->end[0].port,
+			ntohllu(l->end[1].n_id), l->end[1].port);
+	}
+	/*
+	 * Report on any switches with ports using endpoints that didn't
+	 * get transferred to the torus.
+	 */
+	for (k = 0; k < f->switch_cnt; k++)
+		for (p = 0; p < f->sw[k]->port_cnt; p++) {
+
+			if (!f->sw[k]->port[p])
+				continue;
+
+			ep = f->sw[k]->port[p];
+
+			/*
+			 * We already reported on inter-switch links above.
+			 */
+			if (ep->type == PASSTHRU)
+				continue;
+
+			OSM_LOG(&f->osm->log, OSM_LOG_INFO,
+				"Found non-torus fabric port:"
+				" sw GUID 0x%04llx port %d\n",
+				ntohllu(f->sw[k]->n_id), p);
+		}
+}
+
+static
+struct t_switch *alloc_tswitch(struct torus *t, struct f_switch *fsw)
+{
+	unsigned g;
+	size_t new_sw_sz;
+	struct t_switch *sw = NULL;
+	void *ptr;
+
+	if (!fsw)
+		goto out;
+
+	if (t->switch_cnt >= t->sw_pool_sz) {
+		/*
+		 * This should never happen, but occasionally a particularly
+		 * pathological fabric can induce it.  So log an error.
+		 */
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: unexpectedly requested too many switch "
+			"structures!\n");
+		goto out;
+	}
+	new_sw_sz = sizeof(*sw)
+		+ fsw->port_cnt * sizeof(*sw->port)
+		+ SWITCH_MAX_PORTGRPS * t->portgrp_sz * sizeof(*sw->ptgrp[0].port);
+	sw = calloc(1, new_sw_sz);
+	if (!sw) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: calloc: %s\n", strerror(errno));
+		goto out;
+	}
+	sw->port = (void *)(sw + 1);
+	sw->n_id = fsw->n_id;
+	sw->port_cnt = fsw->port_cnt;
+	sw->torus = t;
+	sw->tmp = fsw;
+
+	ptr = &sw->port[sw->port_cnt];
+
+	for (g = 0; g < SWITCH_MAX_PORTGRPS; g++) {
+		sw->ptgrp[g].port_grp = g;
+		sw->ptgrp[g].sw = sw;
+		sw->ptgrp[g].port = ptr;
+		ptr = &sw->ptgrp[g].port[t->portgrp_sz];
+	}
+	t->sw_pool[t->switch_cnt++] = sw;
+out:
+	return sw;
+}
+
+/*
+ * install_tswitch() expects the switch coordinates i,j,k to be canonicalized
+ * by caller.
+ */
+static
+bool install_tswitch(struct torus *t,
+		     int i, int j, int k, struct f_switch *fsw)
+{
+	struct t_switch **sw = &t->sw[i][j][k];
+
+	if (!*sw)
+		*sw = alloc_tswitch(t, fsw);
+
+	if (*sw) {
+		(*sw)->i = i;
+		(*sw)->j = j;
+		(*sw)->k = k;
+	}
+	return !!*sw;
+}
+
+static
+struct link *alloc_tlink(struct torus *t)
+{
+	if (t->link_cnt >= t->link_pool_sz) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: unexpectedly out of pre-allocated link "
+			"structures!\n");
+		return NULL;
+	}
+	return &t->link_pool[t->link_cnt++];
+}
+
+static
+int canonicalize(int v, int vmax)
+{
+	if (v >= 0 && v < vmax)
+		return v;
+
+	if (v < 0)
+		v += vmax * (1 - v/vmax);
+
+	return v % vmax;
+}
+
+static
+unsigned set_fp_bit(bool present, int i, int j, int k)
+{
+	return (unsigned)(!present) << (i + 2 * j + 4 * k);
+}
+
+/*
+ * Returns an 11-bit fingerprint of what switches are absent in a cube of
+ * neighboring switches.  Each bit 0-7 corresponds to a corner of the cube;
+ * if a bit is set the corresponding switch is absent.
+ *
+ * Bits 8-10 distinguish between 2D and 3D cases.  If bit 8+d is set,
+ * for 0 <= d < 3, the d dimension of the desired torus has radix greater
+ * than 1. Thus, if all bits 8-10 are set, the desired torus is 3D.
+ */
+static
+unsigned fingerprint(struct torus *t, int i, int j, int k)
+{
+	unsigned fp;
+	int ip1, jp1, kp1;
+	int x_sz_gt1, y_sz_gt1, z_sz_gt1;
+
+	x_sz_gt1 = t->x_sz > 1;
+	y_sz_gt1 = t->y_sz > 1;
+	z_sz_gt1 = t->z_sz > 1;
+
+	ip1 = canonicalize(i + 1, t->x_sz);
+	jp1 = canonicalize(j + 1, t->y_sz);
+	kp1 = canonicalize(k + 1, t->z_sz);
+
+	fp  = set_fp_bit(t->sw[i][j][k], 0, 0, 0);
+	fp |= set_fp_bit(t->sw[ip1][j][k], x_sz_gt1, 0, 0);
+	fp |= set_fp_bit(t->sw[i][jp1][k], 0, y_sz_gt1, 0);
+	fp |= set_fp_bit(t->sw[ip1][jp1][k], x_sz_gt1, y_sz_gt1, 0);
+	fp |= set_fp_bit(t->sw[i][j][kp1], 0, 0, z_sz_gt1);
+	fp |= set_fp_bit(t->sw[ip1][j][kp1], x_sz_gt1, 0, z_sz_gt1);
+	fp |= set_fp_bit(t->sw[i][jp1][kp1], 0, y_sz_gt1, z_sz_gt1);
+	fp |= set_fp_bit(t->sw[ip1][jp1][kp1], x_sz_gt1, y_sz_gt1, z_sz_gt1);
+
+	fp |= x_sz_gt1 << 8;
+	fp |= y_sz_gt1 << 9;
+	fp |= z_sz_gt1 << 10;
+
+	return fp;
+}
+
+static
+bool connect_tlink(struct port_grp *pg0, struct endpoint *f_ep0,
+		   struct port_grp *pg1, struct endpoint *f_ep1,
+		   struct torus *t)
+{
+	struct link *l;
+	bool success = false;
+
+	if (pg0->port_cnt == t->portgrp_sz) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: exceeded port group max "
+			"port count (%u): switch GUID 0x%04llx\n",
+			t->portgrp_sz, ntohllu(pg0->sw->n_id));
+		goto out;
+	}
+	if (pg1->port_cnt == t->portgrp_sz) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: exceeded port group max "
+			"port count (%u): switch GUID 0x%04llx\n",
+			t->portgrp_sz, ntohllu(pg1->sw->n_id));
+		goto out;
+	}
+	l = alloc_tlink(t);
+	if (!l)
+		goto out;
+
+	l->end[0].type = f_ep0->type;
+	l->end[0].port = f_ep0->port;
+	l->end[0].n_id = f_ep0->n_id;
+	l->end[0].sw = pg0->sw;
+	l->end[0].link = l;
+	l->end[0].pgrp = pg0;
+	pg0->port[pg0->port_cnt++] = &l->end[0];
+	pg0->sw->port[f_ep0->port] = &l->end[0];
+
+	if (f_ep0->osm_port) {
+		l->end[0].osm_port = f_ep0->osm_port;
+		l->end[0].osm_port->priv = &l->end[0];
+		f_ep0->osm_port = NULL;
+	}
+
+	l->end[1].type = f_ep1->type;
+	l->end[1].port = f_ep1->port;
+	l->end[1].n_id = f_ep1->n_id;
+	l->end[1].sw = pg1->sw;
+	l->end[1].link = l;
+	l->end[1].pgrp = pg1;
+	pg1->port[pg1->port_cnt++] = &l->end[1];
+	pg1->sw->port[f_ep1->port] = &l->end[1];
+
+	if (f_ep1->osm_port) {
+		l->end[1].osm_port = f_ep1->osm_port;
+		l->end[1].osm_port->priv = &l->end[1];
+		f_ep1->osm_port = NULL;
+	}
+	/*
+	 * Disconnect fabric link, so that later we can see if any were
+	 * left unconnected in the torus.
+	 */
+	((struct f_switch *)f_ep0->sw)->port[f_ep0->port] = NULL;
+	f_ep0->sw = NULL;
+	f_ep0->port = -1;
+
+	((struct f_switch *)f_ep1->sw)->port[f_ep1->port] = NULL;
+	f_ep1->sw = NULL;
+	f_ep1->port = -1;
+
+	success = true;
+out:
+	return success;
+}
+
+static
+bool link_tswitches(struct torus *t, int cdir,
+		    struct t_switch *t_sw0, struct t_switch *t_sw1)
+{
+	unsigned p;
+	struct port_grp *pg0, *pg1;
+	struct f_switch *f_sw0, *f_sw1;
+	char *cdir_name = "unknown";
+	unsigned port_cnt;
+	bool success = false;
+
+	/*
+	 * If this is a 2D torus, it is possible for this function to be
+	 * called with its two switch arguments being the same switch, in
+	 * which case there are no links to install.
+	 */
+	if (t_sw0 == t_sw1 &&
+	    ((cdir == 0 && t->x_sz == 1) ||
+	     (cdir == 1 && t->y_sz == 1) ||
+	     (cdir == 2 && t->z_sz == 1))) {
+		success = true;
+		goto out;
+	}
+	/*
+	 * Ensure that t_sw1 is in the positive cdir direction relative to t_sw0.
+	 * ring_next_sw() relies on it.
+	 */
+	switch (cdir) {
+	case 0:
+		if (t->x_sz > 1 &&
+		    canonicalize(t_sw0->i + 1, t->x_sz) != t_sw1->i) {
+			cdir_name = "x";
+			goto cdir_error;
+		}
+		break;
+	case 1:
+		if (t->y_sz > 1 &&
+		    canonicalize(t_sw0->j + 1, t->y_sz) != t_sw1->j) {
+			cdir_name = "y";
+			goto cdir_error;
+		}
+		break;
+	case 2:
+		if (t->z_sz > 1 &&
+		    canonicalize(t_sw0->k + 1, t->z_sz) != t_sw1->k) {
+			cdir_name = "z";
+			goto cdir_error;
+		}
+		break;
+	default:
+	cdir_error:
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR, "Error: "
+			"sw 0x%04llx (%d,%d,%d) <--> sw 0x%04llx (%d,%d,%d) "
+			"invalid torus %s link orientation\n",
+			ntohllu(t_sw0->n_id), t_sw0->i, t_sw0->j, t_sw0->k,
+			ntohllu(t_sw1->n_id), t_sw1->i, t_sw1->j, t_sw1->k,
+			cdir_name);
+		goto out;
+	}
+
+	f_sw0 = t_sw0->tmp;
+	f_sw1 = t_sw1->tmp;
+
+	if (!f_sw0 || !f_sw1) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: missing fabric switches!\n"
+			"  switch GUIDs: 0x%04llx 0x%04llx\n",
+			ntohllu(t_sw0->n_id), ntohllu(t_sw1->n_id));
+		goto out;
+	}
+	pg0 = &t_sw0->ptgrp[2*cdir + 1];
+	pg0->type = PASSTHRU;
+
+	pg1 = &t_sw1->ptgrp[2*cdir];
+	pg1->type = PASSTHRU;
+
+	port_cnt = f_sw0->port_cnt;
+	/*
+	 * Find all the links between these two switches.
+	 */
+	for (p = 0; p < port_cnt; p++) {
+		struct endpoint *f_ep0 = NULL, *f_ep1 = NULL;
+
+		if (!f_sw0->port[p] || !f_sw0->port[p]->link)
+			continue;
+
+		if (f_sw0->port[p]->link->end[0].n_id == t_sw0->n_id &&
+		    f_sw0->port[p]->link->end[1].n_id == t_sw1->n_id) {
+
+			f_ep0 = &f_sw0->port[p]->link->end[0];
+			f_ep1 = &f_sw0->port[p]->link->end[1];
+		} else if (f_sw0->port[p]->link->end[1].n_id == t_sw0->n_id &&
+			   f_sw0->port[p]->link->end[0].n_id == t_sw1->n_id) {
+
+			f_ep0 = &f_sw0->port[p]->link->end[1];
+			f_ep1 = &f_sw0->port[p]->link->end[0];
+		} else
+			continue;
+
+		if (!(f_ep0->type == PASSTHRU && f_ep1->type == PASSTHRU)) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: not interswitch "
+				"link:\n  0x%04llx/%d <-> 0x%04llx/%d\n",
+				ntohllu(f_ep0->n_id), f_ep0->port,
+				ntohllu(f_ep1->n_id), f_ep1->port);
+			goto out;
+		}
+		/*
+		 * Skip over links that already have been established in the
+		 * torus.
+		 */
+		if (!(f_ep0->sw && f_ep1->sw))
+			continue;
+
+		if (!connect_tlink(pg0, f_ep0, pg1, f_ep1, t))
+			goto out;
+	}
+	success = true;
+out:
+	return success;
+}
+
+static
+bool link_srcsink(struct torus *t, int i, int j, int k)
+{
+	struct endpoint *f_ep0;
+	struct endpoint *f_ep1;
+	struct t_switch *tsw;
+	struct f_switch *fsw;
+	struct port_grp *pg;
+	struct link *fl, *tl;
+	unsigned p, port_cnt;
+	bool success = false;
+
+	i = canonicalize(i, t->x_sz);
+	j = canonicalize(j, t->y_sz);
+	k = canonicalize(k, t->z_sz);
+
+	tsw = t->sw[i][j][k];
+	if (!tsw)
+		return true;
+
+	fsw = tsw->tmp;
+	pg = &tsw->ptgrp[2 * TORUS_MAX_DIM];
+	pg->type = SRCSINK;
+	tsw->osm_switch = fsw->osm_switch;
+	tsw->osm_switch->priv = tsw;
+	fsw->osm_switch = NULL;
+
+	port_cnt = fsw->port_cnt;
+	for (p = 0; p < port_cnt; p++) {
+
+		if (!fsw->port[p])
+			continue;
+
+		if (fsw->port[p]->type == SRCSINK) {
+			/*
+			 * If the endpoint is the switch port used for in-band
+			 * communication with the switch itself, move it to
+			 * the torus.
+			 */
+			if (pg->port_cnt == t->portgrp_sz) {
+				OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+					"Error: exceeded port group max port "
+					"count (%u): switch GUID 0x%04llx\n",
+					t->portgrp_sz, ntohllu(tsw->n_id));
+				goto out;
+			}
+			fsw->port[p]->sw = tsw;
+			fsw->port[p]->pgrp = pg;
+			tsw->port[p] = fsw->port[p];
+			tsw->port[p]->osm_port->priv = tsw->port[p];
+			pg->port[pg->port_cnt++] = fsw->port[p];
+			fsw->port[p] = NULL;
+
+		} else if (fsw->port[p]->link &&
+			   fsw->port[p]->type == PASSTHRU) {
+			/*
+			 * If the endpoint is a link to a CA, create a new link
+			 * in the torus.  Disconnect the fabric link.
+			 */
+
+			fl = fsw->port[p]->link;
+
+			if (fl->end[0].sw == fsw) {
+				f_ep0 = &fl->end[0];
+				f_ep1 = &fl->end[1];
+			} else if (fl->end[1].sw == fsw) {
+				f_ep1 = &fl->end[0];
+				f_ep0 = &fl->end[1];
+			} else
+				continue;
+
+			if (f_ep1->type != SRCSINK)
+				continue;
+
+			if (pg->port_cnt == t->portgrp_sz) {
+				OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+					"Error: exceeded port group max port "
+					"count (%d): switch GUID 0x%04llx\n",
+					t->portgrp_sz, ntohllu(tsw->n_id));
+				goto out;
+			}
+			/*
+			 * Switch ports connected to links don't get
+			 * associated with osm_port_t objects; see
+			 * capture_fabric().  So just check CA end.
+			 */
+			if (!f_ep1->osm_port) {
+				OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+					"Error: NULL osm_port for port "
+					"GUID 0x%04llx\n",
+					ntohllu(f_ep1->n_id));
+				goto out;
+			}
+			tl = alloc_tlink(t);
+			if (!tl)
+				continue;
+
+			tl->end[0].type = f_ep0->type;
+			tl->end[0].port = f_ep0->port;
+			tl->end[0].n_id = f_ep0->n_id;
+			tl->end[0].sw = tsw;
+			tl->end[0].link = tl;
+			tl->end[0].pgrp = pg;
+			pg->port[pg->port_cnt++] = &tl->end[0];
+			pg->sw->port[f_ep0->port] = &tl->end[0];
+
+			tl->end[1].type = f_ep1->type;
+			tl->end[1].port = f_ep1->port;
+			tl->end[1].n_id = f_ep1->n_id;
+			tl->end[1].sw = NULL;	/* Correct for a CA */
+			tl->end[1].link = tl;
+			tl->end[1].pgrp = NULL;	/* Correct for a CA */
+
+			tl->end[1].osm_port = f_ep1->osm_port;
+			tl->end[1].osm_port->priv = &tl->end[1];
+			f_ep1->osm_port = NULL;
+
+			t->ca_cnt++;
+			f_ep0->sw = NULL;
+			f_ep0->port = -1;
+			fsw->port[p] = NULL;
+		}
+	}
+	success = true;
+out:
+	return success;
+}
+
+static
+struct f_switch *ffind_face_corner(struct f_switch *fsw0,
+				   struct f_switch *fsw1,
+				   struct f_switch *fsw2)
+{
+	int p0, p3;
+	struct link *l;
+	struct endpoint *far_end;
+	struct f_switch *fsw, *fsw3 = NULL;
+
+	if (!(fsw0 && fsw1 && fsw2))
+		goto out;
+
+	for (p0 = 0; p0 < fsw0->port_cnt; p0++) {
+		/*
+		 * Ignore everything except switch links that haven't
+		 * been installed into the torus.
+		 */
+		if (!(fsw0->port[p0] && fsw0->port[p0]->sw &&
+		      fsw0->port[p0]->type == PASSTHRU))
+			continue;
+
+		l = fsw0->port[p0]->link;
+
+		if (l->end[0].n_id == fsw0->n_id)
+			far_end = &l->end[1];
+		else
+			far_end = &l->end[0];
+
+		/*
+		 * Ignore CAs
+		 */
+		if (!(far_end->type == PASSTHRU && far_end->sw))
+			continue;
+
+		fsw3 = far_end->sw;
+		if (fsw3->n_id == fsw1->n_id)	/* existing corner */
+			continue;
+
+		for (p3 = 0; p3 < fsw3->port_cnt; p3++) {
+			/*
+			 * Ignore everything except switch links that haven't
+			 * been installed into the torus.
+			 */
+			if (!(fsw3->port[p3] && fsw3->port[p3]->sw &&
+			      fsw3->port[p3]->type == PASSTHRU))
+				continue;
+
+			l = fsw3->port[p3]->link;
+
+			if (l->end[0].n_id == fsw3->n_id)
+				far_end = &l->end[1];
+			else
+				far_end = &l->end[0];
+
+			/*
+			 * Ignore CAs
+			 */
+			if (!(far_end->type == PASSTHRU && far_end->sw))
+				continue;
+
+			fsw = far_end->sw;
+			if (fsw->n_id == fsw2->n_id)
+				goto out;
+		}
+	}
+	fsw3 = NULL;
+out:
+	return fsw3;
+}
+
+static
+struct f_switch *tfind_face_corner(struct t_switch *tsw0,
+				   struct t_switch *tsw1,
+				   struct t_switch *tsw2)
+{
+	if (!(tsw0 && tsw1 && tsw2))
+		return NULL;
+
+	return ffind_face_corner(tsw0->tmp, tsw1->tmp, tsw2->tmp);
+}
+
+/*
+ * This code can break on any torus with a dimension that has radix four.
+ *
+ * What is supposed to happen is that this code will find the
+ * two faces whose shared edge is the desired perpendicular.
+ *
+ * What actually happens is that, while searching, we send two connected
+ * edges that are collinear in a torus dimension with radix four to
+ * ffind_face_corner(), which tries to complete a face by finding a
+ * 4-loop of edges.
+ *
+ * In the radix four torus case, it can find a 4-loop which is a ring in a
+ * dimension with radix four, rather than the desired face.  It thus returns
+ * a corner when it shouldn't, so the wrong edge is returned as the
+ * perpendicular.
+ *
+ * The appropriate instance of safe_N_perpendicular() (where N == x, y, z)
+ * should be used to determine if it is safe to call
+ * ffind_3d_perpendicular() or ffind_2d_perpendicular(); the
+ * safe_N_perpendicular() functions return false if there is a
+ * possibility of finding a wrong perpendicular.
+ */
+struct f_switch *ffind_3d_perpendicular(struct f_switch *fsw0,
+					struct f_switch *fsw1,
+					struct f_switch *fsw2,
+					struct f_switch *fsw3)
+{
+	int p1;
+	struct link *l;
+	struct endpoint *far_end;
+	struct f_switch *fsw4 = NULL;
+
+	if (!(fsw0 && fsw1 && fsw2 && fsw3))
+		goto out;
+
+	/*
+	 * Look at all the ports on the switch, fsw1,  that is the base of
+	 * the perpendicular.
+	 */
+	for (p1 = 0; p1 < fsw1->port_cnt; p1++) {
+		/*
+		 * Ignore everything except switch links that haven't
+		 * been installed into the torus.
+		 */
+		if (!(fsw1->port[p1] && fsw1->port[p1]->sw &&
+		      fsw1->port[p1]->type == PASSTHRU))
+			continue;
+
+		l = fsw1->port[p1]->link;
+
+		if (l->end[0].n_id == fsw1->n_id)
+			far_end = &l->end[1];
+		else
+			far_end = &l->end[0];
+		/*
+		 * Ignore CAs
+		 */
+		if (!(far_end->type == PASSTHRU && far_end->sw))
+			continue;
+
+		fsw4 = far_end->sw;
+		if (fsw4->n_id == fsw3->n_id)	/* wrong perpendicular */
+			continue;
+
+		if (ffind_face_corner(fsw0, fsw1, fsw4) &&
+		    ffind_face_corner(fsw2, fsw1, fsw4))
+			goto out;
+	}
+	fsw4 = NULL;
+out:
+	return fsw4;
+}
+
+struct f_switch *ffind_2d_perpendicular(struct f_switch *fsw0,
+					struct f_switch *fsw1,
+					struct f_switch *fsw2)
+{
+	int p1;
+	struct link *l;
+	struct endpoint *far_end;
+	struct f_switch *fsw3 = NULL;
+
+	if (!(fsw0 && fsw1 && fsw2))
+		goto out;
+
+	/*
+	 * Look at all the ports on the switch, fsw1,  that is the base of
+	 * the perpendicular.
+	 */
+	for (p1 = 0; p1 < fsw1->port_cnt; p1++) {
+		/*
+		 * Ignore everything except switch links that haven't
+		 * been installed into the torus.
+		 */
+		if (!(fsw1->port[p1] && fsw1->port[p1]->sw &&
+		      fsw1->port[p1]->type == PASSTHRU))
+			continue;
+
+		l = fsw1->port[p1]->link;
+
+		if (l->end[0].n_id == fsw1->n_id)
+			far_end = &l->end[1];
+		else
+			far_end = &l->end[0];
+		/*
+		 * Ignore CAs
+		 */
+		if (!(far_end->type == PASSTHRU && far_end->sw))
+			continue;
+
+		fsw3 = far_end->sw;
+		if (fsw3->n_id == fsw2->n_id)	/* wrong perpendicular */
+			continue;
+
+		if (ffind_face_corner(fsw0, fsw1, fsw3))
+			goto out;
+	}
+	fsw3 = NULL;
+out:
+	return fsw3;
+}
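
The radix-four caveat documented above ffind_3d_perpendicular() can be
reproduced on a toy ring.  What follows is a minimal sketch, not code from
this patch: struct ring, make_ring(), find_corner() and ring_face_corner()
are hypothetical names, and the adjacency matrix stands in for the
f_switch/link structures.  It shows a 4-loop search in the spirit of
ffind_face_corner() closing a bogus "face" on a radix-4 ring but not on a
radix-5 ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_RADIX 8

/* Hypothetical toy fabric: adj[a][b] is true when switches a and b are linked. */
struct ring {
	int n;
	bool adj[MAX_RADIX][MAX_RADIX];
};

/* Build a ring of n switches: 0 - 1 - ... - (n-1) - 0. */
static void make_ring(struct ring *r, int n)
{
	int a;

	assert(n >= 3 && n <= MAX_RADIX);
	memset(r, 0, sizeof(*r));
	r->n = n;
	for (a = 0; a < n; a++) {
		int b = (a + 1) % n;

		r->adj[a][b] = true;
		r->adj[b][a] = true;
	}
}

/*
 * Given edges s0-s1 and s1-s2, look for a switch s3 != s1 that is linked
 * to both s0 and s2, i.e. one that closes a 4-loop.  Returns -1 if there
 * is none.
 */
static int find_corner(const struct ring *r, int s0, int s1, int s2)
{
	int s3;

	for (s3 = 0; s3 < r->n; s3++)
		if (s3 != s1 && r->adj[s0][s3] && r->adj[s3][s2])
			return s3;
	return -1;
}

/* Feed two collinear edges (0-1 and 1-2) of a ring to the search. */
int ring_face_corner(int radix)
{
	struct ring r;

	make_ring(&r, radix);
	return find_corner(&r, 0, 1, 2);
}
```

Under these assumptions ring_face_corner(4) finds switch 3, because the
ring itself is a 4-loop, while ring_face_corner(5) finds nothing.  That is
why only radix-4 torus dimensions need the safe_N_perpendicular() guard.
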
+
+static
+struct f_switch *tfind_3d_perpendicular(struct t_switch *tsw0,
+					struct t_switch *tsw1,
+					struct t_switch *tsw2,
+					struct t_switch *tsw3)
+{
+	if (!(tsw0 && tsw1 && tsw2 && tsw3))
+		return NULL;
+
+	return ffind_3d_perpendicular(tsw0->tmp, tsw1->tmp,
+				      tsw2->tmp, tsw3->tmp);
+}
+
+static
+struct f_switch *tfind_2d_perpendicular(struct t_switch *tsw0,
+					struct t_switch *tsw1,
+					struct t_switch *tsw2)
+{
+	if (!(tsw0 && tsw1 && tsw2))
+		return NULL;
+
+	return ffind_2d_perpendicular(tsw0->tmp, tsw1->tmp, tsw2->tmp);
+}
+
+static
+bool safe_x_ring(struct torus *t, int i, int j, int k)
+{
+	int im1, ip1, ip2;
+	bool success = true;
+
+	/*
+	 * If this x-direction radix-4 ring has at least two links
+	 * already installed into the torus,  then this ring does not
+	 * prevent us from looking for y or z direction perpendiculars.
+	 *
+	 * It is easier to check for the appropriate switches being installed
+	 * into the torus than it is to check for the links, so force the
+	 * link installation if the appropriate switches are installed.
+	 *
+	 * Recall that canonicalize(n - 2, 4) == canonicalize(n + 2, 4).
+	 */
+	if (t->x_sz != 4 || (t->flags & X_MESH))
+		goto out;
+
+	im1 = canonicalize(i - 1, t->x_sz);
+	ip1 = canonicalize(i + 1, t->x_sz);
+	ip2 = canonicalize(i + 2, t->x_sz);
+
+	if (!!t->sw[im1][j][k] +
+	    !!t->sw[ip1][j][k] + !!t->sw[ip2][j][k] < 2) {
+		success = false;
+		goto out;
+	}
+	if (t->sw[ip2][j][k] && t->sw[im1][j][k])
+		success = link_tswitches(t, 0,
+					 t->sw[ip2][j][k],
+					 t->sw[im1][j][k])
+			&& success;
+
+	if (t->sw[im1][j][k] && t->sw[i][j][k])
+		success = link_tswitches(t, 0,
+					 t->sw[im1][j][k],
+					 t->sw[i][j][k])
+			&& success;
+
+	if (t->sw[i][j][k] && t->sw[ip1][j][k])
+		success = link_tswitches(t, 0,
+					 t->sw[i][j][k],
+					 t->sw[ip1][j][k])
+			&& success;
+
+	if (t->sw[ip1][j][k] && t->sw[ip2][j][k])
+		success = link_tswitches(t, 0,
+					 t->sw[ip1][j][k],
+					 t->sw[ip2][j][k])
+			&& success;
+out:
+	return success;
+}
+
+static
+bool safe_y_ring(struct torus *t, int i, int j, int k)
+{
+	int jm1, jp1, jp2;
+	bool success = true;
+
+	/*
+	 * If this y-direction radix-4 ring has at least two links
+	 * already installed into the torus,  then this ring does not
+	 * prevent us from looking for x or z direction perpendiculars.
+	 *
+	 * It is easier to check for the appropriate switches being installed
+	 * into the torus than it is to check for the links, so force the
+	 * link installation if the appropriate switches are installed.
+	 *
+	 * Recall that canonicalize(n - 2, 4) == canonicalize(n + 2, 4).
+	 */
+	if (t->y_sz != 4 || (t->flags & Y_MESH))
+		goto out;
+
+	jm1 = canonicalize(j - 1, t->y_sz);
+	jp1 = canonicalize(j + 1, t->y_sz);
+	jp2 = canonicalize(j + 2, t->y_sz);
+
+	if (!!t->sw[i][jm1][k] +
+	    !!t->sw[i][jp1][k] + !!t->sw[i][jp2][k] < 2) {
+		success = false;
+		goto out;
+	}
+	if (t->sw[i][jp2][k] && t->sw[i][jm1][k])
+		success = link_tswitches(t, 1,
+					 t->sw[i][jp2][k],
+					 t->sw[i][jm1][k])
+			&& success;
+
+	if (t->sw[i][jm1][k] && t->sw[i][j][k])
+		success = link_tswitches(t, 1,
+					 t->sw[i][jm1][k],
+					 t->sw[i][j][k])
+			&& success;
+
+	if (t->sw[i][j][k] && t->sw[i][jp1][k])
+		success = link_tswitches(t, 1,
+					 t->sw[i][j][k],
+					 t->sw[i][jp1][k])
+			&& success;
+
+	if (t->sw[i][jp1][k] && t->sw[i][jp2][k])
+		success = link_tswitches(t, 1,
+					 t->sw[i][jp1][k],
+					 t->sw[i][jp2][k])
+			&& success;
+out:
+	return success;
+}
+
+static
+bool safe_z_ring(struct torus *t, int i, int j, int k)
+{
+	int km1, kp1, kp2;
+	bool success = true;
+
+	/*
+	 * If this z-direction radix-4 ring has at least two links
+	 * already installed into the torus,  then this ring does not
+	 * prevent us from looking for x or y direction perpendiculars.
+	 *
+	 * It is easier to check for the appropriate switches being installed
+	 * into the torus than it is to check for the links, so force the
+	 * link installation if the appropriate switches are installed.
+	 *
+	 * Recall that canonicalize(n - 2, 4) == canonicalize(n + 2, 4).
+	 */
+	if (t->z_sz != 4 || (t->flags & Z_MESH))
+		goto out;
+
+	km1 = canonicalize(k - 1, t->z_sz);
+	kp1 = canonicalize(k + 1, t->z_sz);
+	kp2 = canonicalize(k + 2, t->z_sz);
+
+	if (!!t->sw[i][j][km1] +
+	    !!t->sw[i][j][kp1] + !!t->sw[i][j][kp2] < 2) {
+		success = false;
+		goto out;
+	}
+	if (t->sw[i][j][kp2] && t->sw[i][j][km1])
+		success = link_tswitches(t, 2,
+					 t->sw[i][j][kp2],
+					 t->sw[i][j][km1])
+			&& success;
+
+	if (t->sw[i][j][km1] && t->sw[i][j][k])
+		success = link_tswitches(t, 2,
+					 t->sw[i][j][km1],
+					 t->sw[i][j][k])
+			&& success;
+
+	if (t->sw[i][j][k] && t->sw[i][j][kp1])
+		success = link_tswitches(t, 2,
+					 t->sw[i][j][k],
+					 t->sw[i][j][kp1])
+			&& success;
+
+	if (t->sw[i][j][kp1] && t->sw[i][j][kp2])
+		success = link_tswitches(t, 2,
+					 t->sw[i][j][kp1],
+					 t->sw[i][j][kp2])
+			&& success;
+out:
+	return success;
+}
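
The counting test shared by safe_x_ring(), safe_y_ring() and safe_z_ring()
can be sketched in isolation.  This is only an illustration under stated
assumptions: wrap_index() is a hypothetical stand-in for canonicalize()
(whose definition is not part of this hunk) and is assumed to wrap an index
into [0, sz); ring_is_safe() is likewise a made-up name, and a plain bool
array replaces t->sw[][][].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for canonicalize(): wrap v into [0, sz). */
static int wrap_index(int v, int sz)
{
	assert(sz > 0);
	v %= sz;
	return v < 0 ? v + sz : v;
}

/*
 * present[n] says whether the switch at ring position n is installed.
 * Of the three other positions in a radix-4 ring, at least two must be
 * installed before the ring is considered safe.
 */
bool ring_is_safe(const bool present[4], int i)
{
	int im1 = wrap_index(i - 1, 4);
	int ip1 = wrap_index(i + 1, 4);
	int ip2 = wrap_index(i + 2, 4);	/* same position as i - 2 */

	return present[im1] + present[ip1] + present[ip2] >= 2;
}
```

With only positions 0 and 1 installed, ring_is_safe(present, 0) is false;
installing position 2 as well makes it true.  In a ring of four, i - 2 and
i + 2 name the same position, so three lookups cover every other switch in
the ring.
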
+
+/*
+ * These functions return true when it is safe to call
+ * tfind_3d_perpendicular()/ffind_3d_perpendicular() or their 2D
+ * counterparts.
+ */
+static
+bool safe_x_perpendicular(struct torus *t, int i, int j, int k)
+{
+	/*
+	 * If the dimensions perpendicular to the search direction are
+	 * not radix 4 torus dimensions, it is always safe to search for
+	 * a perpendicular.
+	 *
+	 * Here we are checking for enough appropriate links having been
+	 * installed into the torus to prevent an incorrect link from being
+	 * considered as a perpendicular candidate.
+	 */
+	return safe_y_ring(t, i, j, k) && safe_z_ring(t, i, j, k);
+}
+
+static
+bool safe_y_perpendicular(struct torus *t, int i, int j, int k)
+{
+	/*
+	 * If the dimensions perpendicular to the search direction are
+	 * not radix 4 torus dimensions, it is always safe to search for
+	 * a perpendicular.
+	 *
+	 * Here we are checking for enough appropriate links having been
+	 * installed into the torus to prevent an incorrect link from being
+	 * considered as a perpendicular candidate.
+	 */
+	return safe_x_ring(t, i, j, k) && safe_z_ring(t, i, j, k);
+}
+
+static
+bool safe_z_perpendicular(struct torus *t, int i, int j, int k)
+{
+	/*
+	 * If the dimensions perpendicular to the search direction are
+	 * not radix 4 torus dimensions, it is always safe to search for
+	 * a perpendicular.
+	 *
+	 * Implement this by checking for enough appropriate links having
+	 * been installed into the torus to prevent an incorrect link from
+	 * being considered as a perpendicular candidate.
+	 */
+	return safe_x_ring(t, i, j, k) && safe_y_ring(t, i, j, k);
+}
+
+/*
+ * Templates for determining 2D/3D case fingerprints. Recall that if
+ * a fingerprint bit is set the corresponding switch is absent from
+ * the all-switches-present template.
+ *
+ * E.g., for the 2D case where the x,y dimensions have a radix greater
+ * than one, and the z dimension has radix 1, fingerprint bits 4-7 are
+ * always zero.
+ *
+ * For the 2D case where the x,z dimensions have a radix greater than
+ * one, and the y dimension has radix 1, fingerprint bits 2,3,6,7 are
+ * always zero.
+ *
+ * For the 2D case where the y,z dimensions have a radix greater than
+ * one, and the x dimension has radix 1, fingerprint bits 1,3,5,7 are
+ * always zero.
+ *
+ * Recall also that bits 8-10 distinguish between 2D and 3D cases.
+ * For 0 <= d < 3, if bit 8+d is set, dimension d of the desired
+ * torus has radix greater than 1.
+ */
+
+/*
+ * 2D case 0x300
+ *  b0: t->sw[i  ][j  ][0  ]
+ *  b1: t->sw[i+1][j  ][0  ]
+ *  b2: t->sw[i  ][j+1][0  ]
+ *  b3: t->sw[i+1][j+1][0  ]
+ *                                    O . . . . . O
+ * 2D case 0x500                      .           .
+ *  b0: t->sw[i  ][0  ][k  ]          .           .
+ *  b1: t->sw[i+1][0  ][k  ]          .           .
+ *  b4: t->sw[i  ][0  ][k+1]          .           .
+ *  b5: t->sw[i+1][0  ][k+1]          .           .
+ *                                    @ . . . . . O
+ * 2D case 0x600
+ *  b0: t->sw[0  ][j  ][k  ]
+ *  b2: t->sw[0  ][j+1][k  ]
+ *  b4: t->sw[0  ][j  ][k+1]
+ *  b6: t->sw[0  ][j+1][k+1]
+ */
+
+/*
+ * 3D case 0x700:                           O
+ *                                        . . .
+ *  b0: t->sw[i  ][j  ][k  ]            .   .   .
+ *  b1: t->sw[i+1][j  ][k  ]          .     .     .
+ *  b2: t->sw[i  ][j+1][k  ]        .       .       .
+ *  b3: t->sw[i+1][j+1][k  ]      O         .         O
+ *  b4: t->sw[i  ][j  ][k+1]      . .       O       . .
+ *  b5: t->sw[i+1][j  ][k+1]      .   .   .   .   .   .
+ *  b6: t->sw[i  ][j+1][k+1]      .     .       .     .
+ *  b7: t->sw[i+1][j+1][k+1]      .   .   .   .   .   .
+ *                                . .       O       . .
+ *                                O         .         O
+ *                                  .       .       .
+ *                                    .     .     .
+ *                                      .   .   .
+ *                                        . . .
+ *                                          @
+ */
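
The fingerprint layout described in the template comments above can be
sketched as a small helper.  This is an illustration only and is not taken
from the patch: case_fingerprint() is a hypothetical name, and the way
corners along a radix-1 dimension are masked out is an assumption made so
that the "always zero" bits in the comments hold.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * present[b] is true when the switch for fingerprint bit b is installed,
 * using the b0..b7 layout of the 3D template: bit 0 of b selects i+1,
 * bit 1 selects j+1, bit 2 selects k+1.  radix[d] is the size of
 * dimension d (0 = x, 1 = y, 2 = z).
 */
unsigned case_fingerprint(const bool present[8], const int radix[3])
{
	unsigned fp = 0;
	unsigned dim_mask = 0;
	int b, d;

	/* Bits 8-10: which dimensions have radix greater than 1. */
	for (d = 0; d < 3; d++) {
		assert(radix[d] >= 1);
		if (radix[d] > 1)
			dim_mask |= 1u << d;
	}
	/* Bits 0-7: set when the corresponding switch is absent. */
	for (b = 0; b < 8; b++) {
		/* Corners offset along a radix-1 dimension never set a bit. */
		if (b & ~dim_mask)
			continue;
		if (!present[b])
			fp |= 1u << b;
	}
	return fp | (dim_mask << 8);
}
```

For example, with only b0 and b1 present, this yields 0x30c for an x,y
torus and 0x530 for an x,z torus.  Those are the single-edge cases handled
by handle_case_0x30c() and handle_case_0x530().
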
+
+static
+void log_no_crnr(struct torus *t, unsigned n,
+		 int case_i, int case_j, int case_k,
+		 int crnr_i, int crnr_j, int crnr_k)
+{
+	if (t->debug)
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO, "Case 0x%03x "
+			"@ %d %d %d: no corner @ %d %d %d\n",
+			n, case_i, case_j, case_k, crnr_i, crnr_j, crnr_k);
+}
+
+static
+void log_no_perp(struct torus *t, unsigned n,
+		 int case_i, int case_j, int case_k,
+		 int perp_i, int perp_j, int perp_k)
+{
+	if (t->debug)
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO, "Case 0x%03x "
+			"@ %d %d %d: no perpendicular @ %d %d %d\n",
+			n, case_i, case_j, case_k, perp_i, perp_j, perp_k);
+}
+
+/*
+ * Handle the 2D cases with a single existing edge.
+ *
+ */
+
+/*
+ * 2D case 0x30c
+ *  b0: t->sw[i  ][j  ][0  ]
+ *  b1: t->sw[i+1][j  ][0  ]
+ *  b2:
+ *  b3:
+ *                                    O           O
+ * 2D case 0x530
+ *  b0: t->sw[i  ][0  ][k  ]
+ *  b1: t->sw[i+1][0  ][k  ]
+ *  b4:
+ *  b5:
+ *                                    @ . . . . . O
+ * 2D case 0x650
+ *  b0: t->sw[0  ][j  ][k  ]
+ *  b2: t->sw[0  ][j+1][k  ]
+ *  b4:
+ *  b6:
+ */
+static
+bool handle_case_0x30c(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int jm1 = canonicalize(j - 1, t->y_sz);
+	int jp1 = canonicalize(j + 1, t->y_sz);
+
+	if (safe_y_perpendicular(t, i, j, k) &&
+	    install_tswitch(t, i, jp1, k,
+			    tfind_2d_perpendicular(t->sw[ip1][j][k],
+						   t->sw[i][j][k],
+						   t->sw[i][jm1][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x30c, i, j, k, i, j, k);
+
+	if (safe_y_perpendicular(t, ip1, j, k) &&
+	    install_tswitch(t, ip1, jp1, k,
+			    tfind_2d_perpendicular(t->sw[i][j][k],
+						   t->sw[ip1][j][k],
+						   t->sw[ip1][jm1][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x30c, i, j, k, ip1, j, k);
+	return false;
+}
+
+static
+bool handle_case_0x530(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int km1 = canonicalize(k - 1, t->z_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (safe_z_perpendicular(t, i, j, k) &&
+	    install_tswitch(t, i, j, kp1,
+			    tfind_2d_perpendicular(t->sw[ip1][j][k],
+						   t->sw[i][j][k],
+						   t->sw[i][j][km1]))) {
+		return true;
+	}
+	log_no_perp(t, 0x530, i, j, k, i, j, k);
+
+	if (safe_z_perpendicular(t, ip1, j, k) &&
+	    install_tswitch(t, ip1, j, kp1,
+			    tfind_2d_perpendicular(t->sw[i][j][k],
+						   t->sw[ip1][j][k],
+						   t->sw[ip1][j][km1]))) {
+		return true;
+	}
+	log_no_perp(t, 0x530, i, j, k, ip1, j, k);
+	return false;
+}
+
+static
+bool handle_case_0x650(struct torus *t, int i, int j, int k)
+{
+	int jp1 = canonicalize(j + 1, t->y_sz);
+	int km1 = canonicalize(k - 1, t->z_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (safe_z_perpendicular(t, i, j, k) &&
+	    install_tswitch(t, i, j, kp1,
+			    tfind_2d_perpendicular(t->sw[i][jp1][k],
+						   t->sw[i][j][k],
+						   t->sw[i][j][km1]))) {
+		return true;
+	}
+	log_no_perp(t, 0x650, i, j, k, i, j, k);
+
+	if (safe_z_perpendicular(t, i, jp1, k) &&
+	    install_tswitch(t, i, jp1, kp1,
+			    tfind_2d_perpendicular(t->sw[i][j][k],
+						   t->sw[i][jp1][k],
+						   t->sw[i][jp1][km1]))) {
+		return true;
+	}
+	log_no_perp(t, 0x650, i, j, k, i, jp1, k);
+	return false;
+}
+
+/*
+ * 2D case 0x305
+ *  b0:
+ *  b1: t->sw[i+1][j  ][0  ]
+ *  b2:
+ *  b3: t->sw[i+1][j+1][0  ]
+ *                                    O           O
+ * 2D case 0x511                                  .
+ *  b0:                                           .
+ *  b1: t->sw[i+1][0  ][k  ]                      .
+ *  b4:                                           .
+ *  b5: t->sw[i+1][0  ][k+1]                      .
+ *                                    @           O
+ * 2D case 0x611
+ *  b0:
+ *  b2: t->sw[0  ][j+1][k  ]
+ *  b4:
+ *  b6: t->sw[0  ][j+1][k+1]
+ */
+static
+bool handle_case_0x305(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int ip2 = canonicalize(i + 2, t->x_sz);
+	int jp1 = canonicalize(j + 1, t->y_sz);
+
+	if (safe_x_perpendicular(t, ip1, j, k) &&
+	    install_tswitch(t, i, j, k,
+			    tfind_2d_perpendicular(t->sw[ip1][jp1][k],
+						   t->sw[ip1][j][k],
+						   t->sw[ip2][j][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x305, i, j, k, ip1, j, k);
+
+	if (safe_x_perpendicular(t, ip1, jp1, k) &&
+	    install_tswitch(t, i, jp1, k,
+			    tfind_2d_perpendicular(t->sw[ip1][j][k],
+						   t->sw[ip1][jp1][k],
+						   t->sw[ip2][jp1][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x305, i, j, k, ip1, jp1, k);
+	return false;
+}
+
+static
+bool handle_case_0x511(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int ip2 = canonicalize(i + 2, t->x_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (safe_x_perpendicular(t, ip1, j, k) &&
+	    install_tswitch(t, i, j, k,
+			    tfind_2d_perpendicular(t->sw[ip1][j][kp1],
+						   t->sw[ip1][j][k],
+						   t->sw[ip2][j][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x511, i, j, k, ip1, j, k);
+
+	if (safe_x_perpendicular(t, ip1, j, kp1) &&
+	    install_tswitch(t, i, j, kp1,
+			    tfind_2d_perpendicular(t->sw[ip1][j][k],
+						   t->sw[ip1][j][kp1],
+						   t->sw[ip2][j][kp1]))) {
+		return true;
+	}
+	log_no_perp(t, 0x511, i, j, k, ip1, j, kp1);
+	return false;
+}
+
+static
+bool handle_case_0x611(struct torus *t, int i, int j, int k)
+{
+	int jp1 = canonicalize(j + 1, t->y_sz);
+	int jp2 = canonicalize(j + 2, t->y_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (safe_y_perpendicular(t, i, jp1, k) &&
+	    install_tswitch(t, i, j, k,
+			    tfind_2d_perpendicular(t->sw[i][jp1][kp1],
+						   t->sw[i][jp1][k],
+						   t->sw[i][jp2][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x611, i, j, k, i, jp1, k);
+
+	if (safe_y_perpendicular(t, i, jp1, kp1) &&
+	    install_tswitch(t, i, j, kp1,
+			    tfind_2d_perpendicular(t->sw[i][jp1][k],
+						   t->sw[i][jp1][kp1],
+						   t->sw[i][jp2][kp1]))) {
+		return true;
+	}
+	log_no_perp(t, 0x611, i, j, k, i, jp1, kp1);
+	return false;
+}
+
+/*
+ * 2D case 0x303
+ *  b0:
+ *  b1:
+ *  b2: t->sw[i  ][j+1][0  ]
+ *  b3: t->sw[i+1][j+1][0  ]
+ *                                    O . . . . . O
+ * 2D case 0x503
+ *  b0:
+ *  b1:
+ *  b4: t->sw[i  ][0  ][k+1]
+ *  b5: t->sw[i+1][0  ][k+1]
+ *                                    @           O
+ * 2D case 0x605
+ *  b0:
+ *  b2:
+ *  b4: t->sw[0  ][j  ][k+1]
+ *  b6: t->sw[0  ][j+1][k+1]
+ */
+static
+bool handle_case_0x303(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int jp1 = canonicalize(j + 1, t->y_sz);
+	int jp2 = canonicalize(j + 2, t->y_sz);
+
+	if (safe_y_perpendicular(t, i, jp1, k) &&
+	    install_tswitch(t, i, j, k,
+			    tfind_2d_perpendicular(t->sw[ip1][jp1][k],
+						   t->sw[i][jp1][k],
+						   t->sw[i][jp2][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x303, i, j, k, i, jp1, k);
+
+	if (safe_y_perpendicular(t, ip1, jp1, k) &&
+	    install_tswitch(t, ip1, j, k,
+			    tfind_2d_perpendicular(t->sw[i][jp1][k],
+						   t->sw[ip1][jp1][k],
+						   t->sw[ip1][jp2][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x303, i, j, k, ip1, jp1, k);
+	return false;
+}
+
+static
+bool handle_case_0x503(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+	int kp2 = canonicalize(k + 2, t->z_sz);
+
+	if (safe_z_perpendicular(t, i, j, kp1) &&
+	    install_tswitch(t, i, j, k,
+			    tfind_2d_perpendicular(t->sw[ip1][j][kp1],
+						   t->sw[i][j][kp1],
+						   t->sw[i][j][kp2]))) {
+		return true;
+	}
+	log_no_perp(t, 0x503, i, j, k, i, j, kp1);
+
+	if (safe_z_perpendicular(t, ip1, j, kp1) &&
+	    install_tswitch(t, ip1, j, k,
+			    tfind_2d_perpendicular(t->sw[i][j][kp1],
+						   t->sw[ip1][j][kp1],
+						   t->sw[ip1][j][kp2]))) {
+		return true;
+	}
+	log_no_perp(t, 0x503, i, j, k, ip1, j, kp1);
+	return false;
+}
+
+static
+bool handle_case_0x605(struct torus *t, int i, int j, int k)
+{
+	int jp1 = canonicalize(j + 1, t->y_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+	int kp2 = canonicalize(k + 2, t->z_sz);
+
+	if (safe_z_perpendicular(t, i, j, kp1) &&
+	    install_tswitch(t, i, j, k,
+			    tfind_2d_perpendicular(t->sw[i][jp1][kp1],
+						   t->sw[i][j][kp1],
+						   t->sw[i][j][kp2]))) {
+		return true;
+	}
+	log_no_perp(t, 0x605, i, j, k, i, j, kp1);
+
+	if (safe_z_perpendicular(t, i, jp1, kp1) &&
+	    install_tswitch(t, i, jp1, k,
+			    tfind_2d_perpendicular(t->sw[i][j][kp1],
+						   t->sw[i][jp1][kp1],
+						   t->sw[i][jp1][kp2]))) {
+		return true;
+	}
+	log_no_perp(t, 0x605, i, j, k, i, jp1, kp1);
+	return false;
+}
+
+/*
+ * 2D case 0x30a
+ *  b0: t->sw[i  ][j  ][0  ]
+ *  b1:
+ *  b2: t->sw[i  ][j+1][0  ]
+ *  b3:
+ *                                    O           O
+ * 2D case 0x522                      .
+ *  b0: t->sw[i  ][0  ][k  ]          .
+ *  b1:                               .
+ *  b4: t->sw[i  ][0  ][k+1]          .
+ *  b5:                               .
+ *                                    @           O
+ * 2D case 0x644
+ *  b0: t->sw[0  ][j  ][k  ]
+ *  b2:
+ *  b4: t->sw[0  ][j  ][k+1]
+ *  b6:
+ */
+static
+bool handle_case_0x30a(struct torus *t, int i, int j, int k)
+{
+	int im1 = canonicalize(i - 1, t->x_sz);
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int jp1 = canonicalize(j + 1, t->y_sz);
+
+	if (safe_x_perpendicular(t, i, j, k) &&
+	    install_tswitch(t, ip1, j, k,
+			    tfind_2d_perpendicular(t->sw[i][jp1][k],
+						   t->sw[i][j][k],
+						   t->sw[im1][j][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x30a, i, j, k, i, j, k);
+
+	if (safe_x_perpendicular(t, i, jp1, k) &&
+	    install_tswitch(t, ip1, jp1, k,
+			    tfind_2d_perpendicular(t->sw[i][j][k],
+						   t->sw[i][jp1][k],
+						   t->sw[im1][jp1][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x30a, i, j, k, i, jp1, k);
+	return false;
+}
+
+static
+bool handle_case_0x522(struct torus *t, int i, int j, int k)
+{
+	int im1 = canonicalize(i - 1, t->x_sz);
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (safe_x_perpendicular(t, i, j, k) &&
+	    install_tswitch(t, ip1, j, k,
+			    tfind_2d_perpendicular(t->sw[i][j][kp1],
+						   t->sw[i][j][k],
+						   t->sw[im1][j][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x522, i, j, k, i, j, k);
+
+	if (safe_x_perpendicular(t, i, j, kp1) &&
+	    install_tswitch(t, ip1, j, kp1,
+			    tfind_2d_perpendicular(t->sw[i][j][k],
+						   t->sw[i][j][kp1],
+						   t->sw[im1][j][kp1]))) {
+		return true;
+	}
+	log_no_perp(t, 0x522, i, j, k, i, j, kp1);
+	return false;
+}
+
+static
+bool handle_case_0x644(struct torus *t, int i, int j, int k)
+{
+	int jm1 = canonicalize(j - 1, t->y_sz);
+	int jp1 = canonicalize(j + 1, t->y_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (safe_y_perpendicular(t, i, j, k) &&
+	    install_tswitch(t, i, jp1, k,
+			    tfind_2d_perpendicular(t->sw[i][j][kp1],
+						   t->sw[i][j][k],
+						   t->sw[i][jm1][k]))) {
+		return true;
+	}
+	log_no_perp(t, 0x644, i, j, k, i, j, k);
+
+	if (safe_y_perpendicular(t, i, j, kp1) &&
+	    install_tswitch(t, i, jp1, kp1,
+			    tfind_2d_perpendicular(t->sw[i][j][k],
+						   t->sw[i][j][kp1],
+						   t->sw[i][jm1][kp1]))) {
+		return true;
+	}
+	log_no_perp(t, 0x644, i, j, k, i, j, kp1);
+	return false;
+}
+
+/*
+ * Handle the 2D cases where two existing edges meet at a corner.
+ *
+ */
+
+/*
+ * 2D case 0x301
+ *  b0:
+ *  b1: t->sw[i+1][j  ][0  ]
+ *  b2: t->sw[i  ][j+1][0  ]
+ *  b3: t->sw[i+1][j+1][0  ]
+ *                                    O . . . . . O
+ * 2D case 0x501                                  .
+ *  b0:                                           .
+ *  b1: t->sw[i+1][0  ][k  ]                      .
+ *  b4: t->sw[i  ][0  ][k+1]                      .
+ *  b5: t->sw[i+1][0  ][k+1]                      .
+ *                                    @           O
+ * 2D case 0x601
+ *  b0:
+ *  b2: t->sw[0  ][j+1][k  ]
+ *  b4: t->sw[0  ][j  ][k+1]
+ *  b6: t->sw[0  ][j+1][k+1]
+ */
+static
+bool handle_case_0x301(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int jp1 = canonicalize(j + 1, t->y_sz);
+
+	if (install_tswitch(t, i, j, k,
+			    tfind_face_corner(t->sw[ip1][j][k],
+					      t->sw[ip1][jp1][k],
+					      t->sw[i][jp1][k]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x301, i, j, k, i, j, k);
+	return false;
+}
+
+static
+bool handle_case_0x501(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (install_tswitch(t, i, j, k,
+			    tfind_face_corner(t->sw[ip1][j][k],
+					      t->sw[ip1][j][kp1],
+					      t->sw[i][j][kp1]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x501, i, j, k, i, j, k);
+	return false;
+}
+
+static
+bool handle_case_0x601(struct torus *t, int i, int j, int k)
+{
+	int jp1 = canonicalize(j + 1, t->y_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (install_tswitch(t, i, j, k,
+			    tfind_face_corner(t->sw[i][jp1][k],
+					      t->sw[i][jp1][kp1],
+					      t->sw[i][j][kp1]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x601, i, j, k, i, j, k);
+	return false;
+}
+
+/*
+ * 2D case 0x302
+ *  b0: t->sw[i  ][j  ][0  ]
+ *  b1:
+ *  b2: t->sw[i  ][j+1][0  ]
+ *  b3: t->sw[i+1][j+1][0  ]
+ *                                    O . . . . . O
+ * 2D case 0x502                      .
+ *  b0: t->sw[i  ][0  ][k  ]          .
+ *  b1:                               .
+ *  b4: t->sw[i  ][0  ][k+1]          .
+ *  b5: t->sw[i+1][0  ][k+1]          .
+ *                                    @           O
+ * 2D case 0x604
+ *  b0: t->sw[0  ][j  ][k  ]
+ *  b2:
+ *  b4: t->sw[0  ][j  ][k+1]
+ *  b6: t->sw[0  ][j+1][k+1]
+ */
+static
+bool handle_case_0x302(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int jp1 = canonicalize(j + 1, t->y_sz);
+
+	if (install_tswitch(t, ip1, j, k,
+			    tfind_face_corner(t->sw[i][j][k],
+					      t->sw[i][jp1][k],
+					      t->sw[ip1][jp1][k]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x302, i, j, k, ip1, j, k);
+	return false;
+}
+
+static
+bool handle_case_0x502(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (install_tswitch(t, ip1, j, k,
+			    tfind_face_corner(t->sw[i][j][k],
+					      t->sw[i][j][kp1],
+					      t->sw[ip1][j][kp1]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x502, i, j, k, ip1, j, k);
+	return false;
+}
+
+static
+bool handle_case_0x604(struct torus *t, int i, int j, int k)
+{
+	int jp1 = canonicalize(j + 1, t->y_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (install_tswitch(t, i, jp1, k,
+			    tfind_face_corner(t->sw[i][j][k],
+					      t->sw[i][j][kp1],
+					      t->sw[i][jp1][kp1]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x604, i, j, k, i, jp1, k);
+	return false;
+}
+
+
+/*
+ * 2D case 0x308
+ *  b0: t->sw[i  ][j  ][0  ]
+ *  b1: t->sw[i+1][j  ][0  ]
+ *  b2: t->sw[i  ][j+1][0  ]
+ *  b3:
+ *                                    O           O
+ * 2D case 0x520                      .
+ *  b0: t->sw[i  ][0  ][k  ]          .
+ *  b1: t->sw[i+1][0  ][k  ]          .
+ *  b4: t->sw[i  ][0  ][k+1]          .
+ *  b5:                               .
+ *                                    @ . . . . . O
+ * 2D case 0x640
+ *  b0: t->sw[0  ][j  ][k  ]
+ *  b2: t->sw[0  ][j+1][k  ]
+ *  b4: t->sw[0  ][j  ][k+1]
+ *  b6:
+ */
+static
+bool handle_case_0x308(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int jp1 = canonicalize(j + 1, t->y_sz);
+
+	if (install_tswitch(t, ip1, jp1, k,
+			    tfind_face_corner(t->sw[ip1][j][k],
+					      t->sw[i][j][k],
+					      t->sw[i][jp1][k]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x308, i, j, k, ip1, jp1, k);
+	return false;
+}
+
+static
+bool handle_case_0x520(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (install_tswitch(t, ip1, j, kp1,
+			    tfind_face_corner(t->sw[ip1][j][k],
+					      t->sw[i][j][k],
+					      t->sw[i][j][kp1]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x520, i, j, k, ip1, j, kp1);
+	return false;
+}
+
+static
+bool handle_case_0x640(struct torus *t, int i, int j, int k)
+{
+	int jp1 = canonicalize(j + 1, t->y_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (install_tswitch(t, i, jp1, kp1,
+			    tfind_face_corner(t->sw[i][jp1][k],
+					      t->sw[i][j][k],
+					      t->sw[i][j][kp1]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x640, i, j, k, i, jp1, kp1);
+	return false;
+}
+
+/*
+ * 2D case 0x304
+ *  b0: t->sw[i  ][j  ][0  ]
+ *  b1: t->sw[i+1][j  ][0  ]
+ *  b2:
+ *  b3: t->sw[i+1][j+1][0  ]
+ *                                    O           O
+ * 2D case 0x510                                  .
+ *  b0: t->sw[i  ][0  ][k  ]                      .
+ *  b1: t->sw[i+1][0  ][k  ]                      .
+ *  b4:                                           .
+ *  b5: t->sw[i+1][0  ][k+1]                      .
+ *                                    @ . . . . . O
+ * 2D case 0x610
+ *  b0: t->sw[0  ][j  ][k  ]
+ *  b2: t->sw[0  ][j+1][k  ]
+ *  b4:
+ *  b6: t->sw[0  ][j+1][k+1]
+ */
+static
+bool handle_case_0x304(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int jp1 = canonicalize(j + 1, t->y_sz);
+
+	if (install_tswitch(t, i, jp1, k,
+			    tfind_face_corner(t->sw[i][j][k],
+					      t->sw[ip1][j][k],
+					      t->sw[ip1][jp1][k]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x304, i, j, k, i, jp1, k);
+	return false;
+}
+
+static
+bool handle_case_0x510(struct torus *t, int i, int j, int k)
+{
+	int ip1 = canonicalize(i + 1, t->x_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (install_tswitch(t, i, j, kp1,
+			    tfind_face_corner(t->sw[i][j][k],
+					      t->sw[ip1][j][k],
+					      t->sw[ip1][j][kp1]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x510, i, j, k, i, j, kp1);
+	return false;
+}
+
+static
+bool handle_case_0x610(struct torus *t, int i, int j, int k)
+{
+	int jp1 = canonicalize(j + 1, t->y_sz);
+	int kp1 = canonicalize(k + 1, t->z_sz);
+
+	if (install_tswitch(t, i, j, kp1,
+			    tfind_face_corner(t->sw[i][j][k],
+					      t->sw[i][jp1][k],
+					      t->sw[i][jp1][kp1]))) {
+		return true;
+	}
+	log_no_crnr(t, 0x610, i, j, k, i, j, kp1);
+	return false;
+}
-- 
1.6.2.2



* [PATCH v3 09/17] opensm: Add torus-2QoS routing engine, part 3.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (6 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 07/17] opensm: Add torus-2QoS routing engine, part 1 Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 10/17] opensm: Update documentation to describe torus-2QoS Jim Schutt
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
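The SL and VL bit encodings documented in this patch (above
sl_set_use_loop_vl() and vl_set_loop_vl()) can be tried outside of opensm.
The following is a minimal standalone sketch, not code from the patch.
sketch_sl2vl() is a hypothetical helper that mirrors the logic of
sl2vl_entry() but takes the input and output coordinate directions directly,
rather than looking them up from switch port groups. TORUS_MAX_DIM is
assumed to be 3, as the patch comments state.

```c
#include <assert.h>
#include <stdbool.h>

#define TORUS_MAX_DIM 3	/* value assumed by the comments in the patch */

/* SL bits 0-2: "looped" flag per coordinate direction (x, y, z). */
unsigned sl_set_use_loop_vl(bool use_loop_vl, unsigned coord_dir)
{
	return (coord_dir < TORUS_MAX_DIM)
		? ((unsigned)use_loop_vl << coord_dir) : 0;
}

/* SL bit 3: QoS level (two levels only). */
unsigned sl_set_qos(unsigned qos)
{
	return (unsigned)(!!qos) << TORUS_MAX_DIM;
}

/*
 * Hypothetical helper, not part of the patch: VL for a hop that enters
 * along direction id and leaves along direction od, where 0 = x, 1 = y,
 * 2 = z, and TORUS_MAX_DIM marks a switch-local (source/sink) port.
 */
unsigned sketch_sl2vl(unsigned id, unsigned od, unsigned sl,
		      unsigned data_vls)
{
	unsigned vl = 0;

	assert(id <= TORUS_MAX_DIM && od <= TORUS_MAX_DIM);

	/* VL bit 0: leave on the "loop" VL for the output direction. */
	if (data_vls >= 2 && od < TORUS_MAX_DIM)
		vl |= (sl >> od) & 0x1;

	/* VL bit 1: set for ZYX DOR turns (y-x, z-y, z-x). */
	if (data_vls >= 4 && id < TORUS_MAX_DIM && od < TORUS_MAX_DIM)
		vl |= (id > od) ? 0x1 << 1 : 0;

	/* VL bit 2: QoS level taken from SL bit 3. */
	if (data_vls >= 8)
		vl |= ((sl >> TORUS_MAX_DIM) & 0x1) << 2;

	return vl;
}
```

With these definitions, an SL of 0x9 (looped in x, QoS level 1) maps to
VL 0x5 for a straight x-x hop and to VL 0x7 for a y-to-x turn when eight
data VLs are available. With only two data VLs the same turn maps to VL 0x1,
which matches the data VL thresholds that verify_setup() warns about.
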
 opensm/opensm/Makefile.am |    2 +-
 opensm/opensm/osm_torus.c | 2191 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 2192 insertions(+), 1 deletions(-)
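
The per-ring failure check in routable_torus() can also be illustrated in
isolation. The sketch below is an assumption-laden model, not code from the
patch: struct sketch_ring_sw and sketch_ring_routable() are hypothetical
names, the ring is assumed to wrap (no mesh flag), and each switch is
reduced to its port counts toward the lower and higher coordinate. The
counter names b2g_cnt and g2b_cnt are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one switch position on a ring. */
struct sketch_ring_sw {
	bool present;		/* switch was placed in the torus */
	unsigned minus_ports;	/* ports toward the lower coordinate */
	unsigned plus_ports;	/* ports toward the higher coordinate */
};

/*
 * Returns true if the ring has at most one contiguous gap, i.e. the
 * failures do not split it into disjoint pieces.
 */
bool sketch_ring_routable(const struct sketch_ring_sw *ring, size_t n)
{
	unsigned b2g_cnt = 0, g2b_cnt = 0;
	size_t i;

	assert(ring != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!ring[i].present)
			continue;
		if (!ring[i].minus_ports)	/* nothing on the minus side */
			b2g_cnt++;
		if (!ring[i].plus_ports)	/* nothing on the plus side */
			g2b_cnt++;
	}
	/* Unequal counts are "strange"; more than one gap is "disjoint". */
	return b2g_cnt == g2b_cnt && b2g_cnt <= 1;
}
```

A ring with one missing switch has exactly one switch without minus-side
ports and one without plus-side ports, so it passes. Two non-adjacent
missing switches give counts of two each, which this model rejects, in the
same way the patch logs "disjoint failures".
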

diff --git a/opensm/opensm/Makefile.am b/opensm/opensm/Makefile.am
index db7d790..9f21296 100644
--- a/opensm/opensm/Makefile.am
+++ b/opensm/opensm/Makefile.am
@@ -53,7 +53,7 @@ opensm_SOURCES = main.c osm_console_io.c osm_console.c osm_db_files.c \
 		 osm_prtn.c osm_prtn_config.c osm_qos.c osm_router.c \
 		 osm_trap_rcv.c osm_ucast_mgr.c osm_ucast_updn.c \
 		 osm_ucast_lash.c osm_ucast_file.c osm_ucast_ftree.c \
-		 osm_vl15intf.c osm_vl_arb_rcv.c \
+		 osm_torus.c osm_vl15intf.c osm_vl_arb_rcv.c \
 		 st.c osm_perfmgr.c osm_perfmgr_db.c \
 		 osm_event_plugin.c osm_dump.c osm_ucast_cache.c \
 		 osm_qos_parser_y.y osm_qos_parser_l.l osm_qos_policy.c
diff --git a/opensm/opensm/osm_torus.c b/opensm/opensm/osm_torus.c
index 3257ec4..fe643f2 100644
--- a/opensm/opensm/osm_torus.c
+++ b/opensm/opensm/osm_torus.c
@@ -6927,3 +6927,2194 @@ again:
 out:
 	return;
 }
+
+#define LINK_ERR_STR " direction link required!\n"
+#define SEED_ERR_STR " direction links with different seed switches!\n"
+
+static
+bool verify_setup(struct torus *t, struct fabric *f)
+{
+	struct coord_dirs *o;
+	unsigned n = 0;
+	bool success = false;
+	bool all_sw_present, need_seed = true;
+
+	if (!(t->x_sz && t->y_sz && t->z_sz)) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: missing required torus size specification!\n");
+		goto out;
+	}
+	if (t->osm->subn.min_data_vls < 2)
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Warning: Too few data VLs to support torus routing "
+			"without credit loops (have %d need 2)\n",
+			(int)t->osm->subn.min_data_vls);
+	if (t->osm->subn.min_data_vls < 4)
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Warning: Too few data VLs to support torus routing "
+			"with a failed switch without credit loops "
+			"(have %d need 4)\n",
+			(int)t->osm->subn.min_data_vls);
+	if (t->osm->subn.min_data_vls < 8)
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Warning: Too few data VLs to support torus routing "
+			"with two QoS levels (have %d need 8)\n",
+			(int)t->osm->subn.min_data_vls);
+	/*
+	 * Unfortunately, there is a problem with non-unique topology for any
+	 * torus dimension which has radix four.  This problem requires extra
+	 * input, in the form of specifying both the positive and negative
+	 * coordinate directions from a common switch, for any torus dimension
+	 * with radix four (see also build_torus()).
+	 *
+	 * Check that the required information is present, without demanding
+	 * more information than is actually needed.
+	 *
+	 * So, verify that we learned the coordinate directions correctly for
+	 * the fabric.  The coordinate direction links get an invalid port
+	 * set on their ends when parsed.
+	 */
+again:
+	all_sw_present = true;
+	o = &t->seed[n];
+
+	if (t->x_sz == 4 && !(t->flags & X_MESH)) {
+		if (o->xp_link.end[0].port >= 0) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Positive x" LINK_ERR_STR);
+			goto out;
+		}
+		if (o->xm_link.end[0].port >= 0) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Negative x" LINK_ERR_STR);
+			goto out;
+		}
+		if (o->xp_link.end[0].n_id != o->xm_link.end[0].n_id) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Positive/negative x" SEED_ERR_STR);
+			goto out;
+		}
+	}
+	if (t->y_sz == 4 && !(t->flags & Y_MESH)) {
+		if (o->yp_link.end[0].port >= 0) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Positive y" LINK_ERR_STR);
+			goto out;
+		}
+		if (o->ym_link.end[0].port >= 0) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Negative y" LINK_ERR_STR);
+			goto out;
+		}
+		if (o->yp_link.end[0].n_id != o->ym_link.end[0].n_id) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Positive/negative y" SEED_ERR_STR);
+			goto out;
+		}
+	}
+	if (t->z_sz == 4 && !(t->flags & Z_MESH)) {
+		if (o->zp_link.end[0].port >= 0) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Positive z" LINK_ERR_STR);
+			goto out;
+		}
+		if (o->zm_link.end[0].port >= 0) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Negative z" LINK_ERR_STR);
+			goto out;
+		}
+		if (o->zp_link.end[0].n_id != o->zm_link.end[0].n_id) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Positive/negative z" SEED_ERR_STR);
+			goto out;
+		}
+	}
+	if (t->x_sz > 1) {
+		if (o->xp_link.end[0].port >= 0 &&
+		    o->xm_link.end[0].port >= 0) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Positive or negative x" LINK_ERR_STR);
+			goto out;
+		}
+		if (o->xp_link.end[0].port < 0 &&
+		    !find_f_sw(f, o->xp_link.end[0].n_id))
+			all_sw_present = false;
+
+		if (o->xp_link.end[1].port < 0 &&
+		    !find_f_sw(f, o->xp_link.end[1].n_id))
+			all_sw_present = false;
+
+		if (o->xm_link.end[0].port < 0 &&
+		    !find_f_sw(f, o->xm_link.end[0].n_id))
+			all_sw_present = false;
+
+		if (o->xm_link.end[1].port < 0 &&
+		    !find_f_sw(f, o->xm_link.end[1].n_id))
+			all_sw_present = false;
+	}
+	if (t->z_sz > 1) {
+		if (o->zp_link.end[0].port >= 0 &&
+		    o->zm_link.end[0].port >= 0) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Positive or negative z" LINK_ERR_STR);
+			goto out;
+		}
+		if ((o->xp_link.end[0].port < 0 &&
+		     o->zp_link.end[0].port < 0 &&
+		     o->zp_link.end[0].n_id != o->xp_link.end[0].n_id) ||
+
+		    (o->xp_link.end[0].port < 0 &&
+		     o->zm_link.end[0].port < 0 &&
+		     o->zm_link.end[0].n_id != o->xp_link.end[0].n_id) ||
+
+		    (o->xm_link.end[0].port < 0 &&
+		     o->zp_link.end[0].port < 0 &&
+		     o->zp_link.end[0].n_id != o->xm_link.end[0].n_id) ||
+
+		    (o->xm_link.end[0].port < 0 &&
+		     o->zm_link.end[0].port < 0 &&
+		     o->zm_link.end[0].n_id != o->xm_link.end[0].n_id)) {
+
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: x and z" SEED_ERR_STR);
+			goto out;
+		}
+		if (o->zp_link.end[0].port < 0 &&
+		    !find_f_sw(f, o->zp_link.end[0].n_id))
+			all_sw_present = false;
+
+		if (o->zp_link.end[1].port < 0 &&
+		    !find_f_sw(f, o->zp_link.end[1].n_id))
+			all_sw_present = false;
+
+		if (o->zm_link.end[0].port < 0 &&
+		    !find_f_sw(f, o->zm_link.end[0].n_id))
+			all_sw_present = false;
+
+		if (o->zm_link.end[1].port < 0 &&
+		    !find_f_sw(f, o->zm_link.end[1].n_id))
+			all_sw_present = false;
+	}
+	if (t->y_sz > 1) {
+		if (o->yp_link.end[0].port >= 0 &&
+		    o->ym_link.end[0].port >= 0) {
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: Positive or negative y" LINK_ERR_STR);
+			goto out;
+		}
+		if ((o->xp_link.end[0].port < 0 &&
+		     o->yp_link.end[0].port < 0 &&
+		     o->yp_link.end[0].n_id != o->xp_link.end[0].n_id) ||
+
+		    (o->xp_link.end[0].port < 0 &&
+		     o->ym_link.end[0].port < 0 &&
+		     o->ym_link.end[0].n_id != o->xp_link.end[0].n_id) ||
+
+		    (o->xm_link.end[0].port < 0 &&
+		     o->yp_link.end[0].port < 0 &&
+		     o->yp_link.end[0].n_id != o->xm_link.end[0].n_id) ||
+
+		    (o->xm_link.end[0].port < 0 &&
+		     o->ym_link.end[0].port < 0 &&
+		     o->ym_link.end[0].n_id != o->xm_link.end[0].n_id)) {
+
+			OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+				"Error: x and y" SEED_ERR_STR);
+			goto out;
+		}
+		if (o->yp_link.end[0].port < 0 &&
+		    !find_f_sw(f, o->yp_link.end[0].n_id))
+			all_sw_present = false;
+
+		if (o->yp_link.end[1].port < 0 &&
+		    !find_f_sw(f, o->yp_link.end[1].n_id))
+			all_sw_present = false;
+
+		if (o->ym_link.end[0].port < 0 &&
+		    !find_f_sw(f, o->ym_link.end[0].n_id))
+			all_sw_present = false;
+
+		if (o->ym_link.end[1].port < 0 &&
+		    !find_f_sw(f, o->ym_link.end[1].n_id))
+			all_sw_present = false;
+	}
+	if (all_sw_present && need_seed) {
+		t->seed_idx = n;
+		need_seed = false;
+	}
+	if (++n < t->seed_cnt)
+		goto again;
+
+	if (need_seed)
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: Every configured torus seed has at "
+			"least one switch missing in fabric!\n");
+	else
+		success = true;
+out:
+	return success;
+}
+
+static
+void build_torus(struct fabric *f, struct torus *t)
+{
+	int i, j, k;
+	int im1, jm1, km1;
+	int ip1, jp1, kp1;
+	unsigned nlink;
+	struct coord_dirs *o;
+	struct f_switch *fsw0, *fsw1;
+	struct t_switch ****sw = t->sw;
+	bool success = true;
+
+	t->link_pool_sz = f->link_cnt;
+	t->link_pool = calloc(1, t->link_pool_sz * sizeof(*t->link_pool));
+	if (!t->link_pool) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: Allocating torus link pool: %s\n",
+			strerror(errno));
+		goto out;
+	}
+	t->fabric = f;
+
+	/*
+	 * Get things started by locating the up to seven switches that
+	 * define the torus "seed", coordinate directions, and datelines.
+	 */
+	o = &t->seed[t->seed_idx];
+
+	i = canonicalize(-o->x_dateline, t->x_sz);
+	j = canonicalize(-o->y_dateline, t->y_sz);
+	k = canonicalize(-o->z_dateline, t->z_sz);
+
+	if (o->xp_link.end[0].port < 0) {
+		ip1 = canonicalize(1 - o->x_dateline, t->x_sz);
+		fsw0 = find_f_sw(f, o->xp_link.end[0].n_id);
+		fsw1 = find_f_sw(f, o->xp_link.end[1].n_id);
+		success =
+			install_tswitch(t, i, j, k, fsw0) &&
+			install_tswitch(t, ip1, j, k, fsw1) &&
+			link_tswitches(t, 0, sw[i][j][k], sw[ip1][j][k]) &&
+			success;
+	}
+	if (o->xm_link.end[0].port < 0) {
+		im1 = canonicalize(-1 - o->x_dateline, t->x_sz);
+		fsw0 = find_f_sw(f, o->xm_link.end[0].n_id);
+		fsw1 = find_f_sw(f, o->xm_link.end[1].n_id);
+		success =
+			install_tswitch(t, i, j, k, fsw0) &&
+			install_tswitch(t, im1, j, k, fsw1) &&
+			link_tswitches(t, 0, sw[im1][j][k], sw[i][j][k]) &&
+			success;
+	}
+	if (o->yp_link.end[0].port < 0) {
+		jp1 = canonicalize(1 - o->y_dateline, t->y_sz);
+		fsw0 = find_f_sw(f, o->yp_link.end[0].n_id);
+		fsw1 = find_f_sw(f, o->yp_link.end[1].n_id);
+		success =
+			install_tswitch(t, i, j, k, fsw0) &&
+			install_tswitch(t, i, jp1, k, fsw1) &&
+			link_tswitches(t, 1, sw[i][j][k], sw[i][jp1][k]) &&
+			success;
+	}
+	if (o->ym_link.end[0].port < 0) {
+		jm1 = canonicalize(-1 - o->y_dateline, t->y_sz);
+		fsw0 = find_f_sw(f, o->ym_link.end[0].n_id);
+		fsw1 = find_f_sw(f, o->ym_link.end[1].n_id);
+		success =
+			install_tswitch(t, i, j, k, fsw0) &&
+			install_tswitch(t, i, jm1, k, fsw1) &&
+			link_tswitches(t, 1, sw[i][jm1][k], sw[i][j][k]) &&
+			success;
+	}
+	if (o->zp_link.end[0].port < 0) {
+		kp1 = canonicalize(1 - o->z_dateline, t->z_sz);
+		fsw0 = find_f_sw(f, o->zp_link.end[0].n_id);
+		fsw1 = find_f_sw(f, o->zp_link.end[1].n_id);
+		success =
+			install_tswitch(t, i, j, k, fsw0) &&
+			install_tswitch(t, i, j, kp1, fsw1) &&
+			link_tswitches(t, 2, sw[i][j][k], sw[i][j][kp1]) &&
+			success;
+	}
+	if (o->zm_link.end[0].port < 0) {
+		km1 = canonicalize(-1 - o->z_dateline, t->z_sz);
+		fsw0 = find_f_sw(f, o->zm_link.end[0].n_id);
+		fsw1 = find_f_sw(f, o->zm_link.end[1].n_id);
+		success =
+			install_tswitch(t, i, j, k, fsw0) &&
+			install_tswitch(t, i, j, km1, fsw1) &&
+			link_tswitches(t, 2, sw[i][j][km1], sw[i][j][k]) &&
+			success;
+	}
+	if (!success)
+		goto out;
+
+	if (!t->seed_idx)
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Using torus seed configured as default "
+			"(seed sw %d,%d,%d GUID 0x%04llx).\n",
+			i, j, k, ntohllu(sw[i][j][k]->n_id));
+	else
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Using torus seed configured as backup #%u "
+			"(seed sw %d,%d,%d GUID 0x%04llx).\n",
+			t->seed_idx, i, j, k, ntohllu(sw[i][j][k]->n_id));
+
+	/*
+	 * Search the fabric and construct the expected torus topology.
+	 *
+	 * The algorithm is to consider the "cube" formed by eight switch
+	 * locations bounded by the corners i, j, k and i+1, j+1, k+1.
+	 * For each such cube look at the topology of the switches already
+	 * placed in the torus, and deduce which new switches can be placed
+	 * into their proper locations in the torus.  Examine each cube
+	 * multiple times, until the number of links moved into the torus
+	 * topology does not change.
+	 */
+again:
+	nlink = t->link_cnt;
+
+	for (k = 0; k < (int)t->z_sz; k++)
+		for (j = 0; j < (int)t->y_sz; j++)
+			for (i = 0; i < (int)t->x_sz; i++)
+				locate_sw(t, i, j, k);
+
+	if (t->link_cnt != nlink)
+		goto again;
+
+	/*
+	 * Move all other endpoints into torus/mesh.
+	 */
+	for (k = 0; k < (int)t->z_sz; k++)
+		for (j = 0; j < (int)t->y_sz; j++)
+			for (i = 0; i < (int)t->x_sz; i++)
+				link_srcsink(t, i, j, k);
+out:
+	return;
+}
+
+/*
+ * Returns a count of differences between old and new switches.
+ */
+static
+unsigned tsw_changes(struct t_switch *nsw, struct t_switch *osw)
+{
+	unsigned p, cnt = 0, port_cnt;
+	struct endpoint *npt, *opt;
+	struct endpoint *rnpt, *ropt;
+
+	if (nsw && !osw) {
+		cnt++;
+		OSM_LOG(&nsw->torus->osm->log, OSM_LOG_INFO,
+			"New torus switch %d,%d,%d GUID 0x%04llx\n",
+			nsw->i, nsw->j, nsw->k, ntohllu(nsw->n_id));
+		goto out;
+	}
+	if (osw && !nsw) {
+		cnt++;
+		OSM_LOG(&osw->torus->osm->log, OSM_LOG_INFO,
+			"Lost torus switch %d,%d,%d GUID 0x%04llx\n",
+			osw->i, osw->j, osw->k, ntohllu(osw->n_id));
+		goto out;
+	}
+	if (!(nsw && osw))
+		goto out;
+
+	if (nsw->n_id != osw->n_id) {
+		cnt++;
+		OSM_LOG(&nsw->torus->osm->log, OSM_LOG_INFO,
+			"Torus switch %d,%d,%d GUID "
+			"was 0x%04llx, now 0x%04llx\n",
+			nsw->i, nsw->j, nsw->k,
+			ntohllu(osw->n_id), ntohllu(nsw->n_id));
+	}
+
+	if (nsw->port_cnt != osw->port_cnt) {
+		cnt++;
+		OSM_LOG(&nsw->torus->osm->log, OSM_LOG_INFO,
+			"Torus switch %d,%d,%d GUID 0x%04llx "
+			"had %d ports, now has %d\n",
+			nsw->i, nsw->j, nsw->k, ntohllu(nsw->n_id),
+			osw->port_cnt, nsw->port_cnt);
+	}
+	port_cnt = nsw->port_cnt;
+	if (port_cnt > osw->port_cnt)
+		port_cnt = osw->port_cnt;
+
+	for (p = 0; p < port_cnt; p++) {
+		npt = nsw->port[p];
+		opt = osw->port[p];
+
+		if (npt && npt->link) {
+			if (&npt->link->end[0] == npt)
+				rnpt = &npt->link->end[1];
+			else
+				rnpt = &npt->link->end[0];
+		} else
+			rnpt = NULL;
+
+		if (opt && opt->link) {
+			if (&opt->link->end[0] == opt)
+				ropt = &opt->link->end[1];
+			else
+				ropt = &opt->link->end[0];
+		} else
+			ropt = NULL;
+
+		if (rnpt && !ropt) {
+			++cnt;
+			OSM_LOG(&nsw->torus->osm->log, OSM_LOG_INFO,
+				"Torus switch %d,%d,%d GUID 0x%04llx[%d] "
+				"remote now %s GUID 0x%04llx[%d], "
+				"was missing\n",
+				nsw->i, nsw->j, nsw->k, ntohllu(nsw->n_id), p,
+				rnpt->type == PASSTHRU ? "sw" : "node",
+				ntohllu(rnpt->n_id), rnpt->port);
+			continue;
+		}
+		if (ropt && !rnpt) {
+			++cnt;
+			OSM_LOG(&nsw->torus->osm->log, OSM_LOG_INFO,
+				"Torus switch %d,%d,%d GUID 0x%04llx[%d] "
+				"remote now missing, "
+				"was %s GUID 0x%04llx[%d]\n",
+				osw->i, osw->j, osw->k, ntohllu(nsw->n_id), p,
+				ropt->type == PASSTHRU ? "sw" : "node",
+				ntohllu(ropt->n_id), ropt->port);
+			continue;
+		}
+		if (!(rnpt && ropt))
+			continue;
+
+		if (rnpt->n_id != ropt->n_id) {
+			++cnt;
+			OSM_LOG(&nsw->torus->osm->log, OSM_LOG_INFO,
+				"Torus switch %d,%d,%d GUID 0x%04llx[%d] "
+				"remote now %s GUID 0x%04llx[%d], "
+				"was %s GUID 0x%04llx[%d]\n",
+				nsw->i, nsw->j, nsw->k, ntohllu(nsw->n_id), p,
+				rnpt->type == PASSTHRU ? "sw" : "node",
+				ntohllu(rnpt->n_id), rnpt->port,
+				ropt->type == PASSTHRU ? "sw" : "node",
+				ntohllu(ropt->n_id), ropt->port);
+			continue;
+		}
+	}
+out:
+	return cnt;
+}
+
+static
+void report_torus_changes(struct torus *nt, struct torus *ot)
+{
+	unsigned cnt = 0, i, j, k;
+	unsigned x_sz, y_sz, z_sz;
+
+	if (!(nt && ot))
+		return;
+	x_sz = nt->x_sz;
+	y_sz = nt->y_sz;
+	z_sz = nt->z_sz;
+
+	if (x_sz != ot->x_sz) {
+		cnt++;
+		OSM_LOG(&nt->osm->log, OSM_LOG_INFO,
+			"Torus x radix was %d now %d\n",
+			ot->x_sz, nt->x_sz);
+		if (x_sz > ot->x_sz)
+			x_sz = ot->x_sz;
+	}
+	if (y_sz != ot->y_sz) {
+		cnt++;
+		OSM_LOG(&nt->osm->log, OSM_LOG_INFO,
+			"Torus y radix was %d now %d\n",
+			ot->y_sz, nt->y_sz);
+		if (y_sz > ot->y_sz)
+			y_sz = ot->y_sz;
+	}
+	if (z_sz != ot->z_sz) {
+		cnt++;
+		OSM_LOG(&nt->osm->log, OSM_LOG_INFO,
+			"Torus z radix was %d now %d\n",
+			ot->z_sz, nt->z_sz);
+		if (z_sz > ot->z_sz)
+			z_sz = ot->z_sz;
+	}
+
+	for (k = 0; k < z_sz; k++)
+		for (j = 0; j < y_sz; j++)
+			for (i = 0; i < x_sz; i++) {
+				cnt += tsw_changes(nt->sw[i][j][k],
+						   ot->sw[i][j][k]);
+				/*
+				 * Booting a big fabric will cause lots of
+				 * changes as hosts come up, so don't spew.
+				 * We want to log changes to learn more about
+				 * bouncing links, etc, so they can be fixed.
+				 */
+				if (cnt > 32) {
+					OSM_LOG(&nt->osm->log, OSM_LOG_INFO,
+						"Too many torus changes; "
+						"stopping reporting early\n");
+					return;
+				}
+			}
+}
+
+static
+void rpt_torus_missing(struct torus *t, int i, int j, int k,
+		       struct t_switch *sw, int *missing_z)
+{
+	unsigned long long guid_ho;
+
+	if (!sw) {
+		/*
+		 * We can have multiple missing switches without deadlock
+		 * if and only if they are adjacent in the Z direction.
+		 */
+		if ((t->switch_cnt + 1) < t->sw_pool_sz) {
+			if (t->sw[i][j][canonicalize(k - 1, t->z_sz)] &&
+			    t->sw[i][j][canonicalize(k + 1, t->z_sz)])
+				t->flags |= MSG_DEADLOCK;
+		}
+		/*
+		 * There can be only one such Z-column of missing switches.
+		 */
+		if (*missing_z < 0)
+			*missing_z = i + j * t->x_sz;
+		else if (*missing_z != i + j * t->x_sz)
+			t->flags |= MSG_DEADLOCK;
+
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Missing torus switch at %d,%d,%d\n", i, j, k);
+		return;
+	}
+	guid_ho = ntohllu(sw->n_id);
+
+	if (!(sw->ptgrp[0].port_cnt || (t->x_sz == 1) ||
+	      ((t->flags & X_MESH) && i == 0)))
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Missing torus -x link on "
+			"switch %d,%d,%d GUID 0x%04llx\n",
+			i, j, k, guid_ho);
+	if (!(sw->ptgrp[1].port_cnt || (t->x_sz == 1) ||
+	      ((t->flags & X_MESH) && (i + 1) == t->x_sz)))
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Missing torus +x link on "
+			"switch %d,%d,%d GUID 0x%04llx\n",
+			i, j, k, guid_ho);
+	if (!(sw->ptgrp[2].port_cnt || (t->y_sz == 1) ||
+	      ((t->flags & Y_MESH) && j == 0)))
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Missing torus -y link on "
+			"switch %d,%d,%d GUID 0x%04llx\n",
+			i, j, k, guid_ho);
+	if (!(sw->ptgrp[3].port_cnt || (t->y_sz == 1) ||
+	      ((t->flags & Y_MESH) && (j + 1) == t->y_sz)))
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Missing torus +y link on "
+			"switch %d,%d,%d GUID 0x%04llx\n",
+			i, j, k, guid_ho);
+	if (!(sw->ptgrp[4].port_cnt || (t->z_sz == 1) ||
+	      ((t->flags & Z_MESH) && k == 0)))
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Missing torus -z link on "
+			"switch %d,%d,%d GUID 0x%04llx\n",
+			i, j, k, guid_ho);
+	if (!(sw->ptgrp[5].port_cnt || (t->z_sz == 1) ||
+	      ((t->flags & Z_MESH) && (k + 1) == t->z_sz)))
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Missing torus +z link on "
+			"switch %d,%d,%d GUID 0x%04llx\n",
+			i, j, k, guid_ho);
+}
+
+/*
+ * Returns true if the torus can be successfully routed, false otherwise.
+ */
+static
+bool routable_torus(struct torus *t, struct fabric *f)
+{
+	int i, j, k, tmp = -1;
+	unsigned b2g_cnt, g2b_cnt;
+	bool success = true;
+
+	t->flags &= ~MSG_DEADLOCK;
+
+	if (t->link_cnt != f->link_cnt || t->switch_cnt != f->switch_cnt)
+		OSM_LOG(&t->osm->log, OSM_LOG_INFO,
+			"Warning: Could not construct torus using all "
+			"known fabric switches and/or links.\n");
+
+	for (k = 0; k < (int)t->z_sz; k++)
+		for (j = 0; j < (int)t->y_sz; j++)
+			for (i = 0; i < (int)t->x_sz; i++)
+				rpt_torus_missing(t, i, j, k,
+						  t->sw[i][j][k], &tmp);
+	/*
+	 * Check for multiple failures that create disjoint regions on a ring.
+	 */
+	for (k = 0; k < (int)t->z_sz; k++)
+		for (j = 0; j < (int)t->y_sz; j++) {
+			b2g_cnt = 0;
+			g2b_cnt = 0;
+			for (i = 0; i < (int)t->x_sz; i++) {
+
+				if (!t->sw[i][j][k])
+					continue;
+
+				if (!t->sw[i][j][k]->ptgrp[0].port_cnt)
+					b2g_cnt++;
+				if (!t->sw[i][j][k]->ptgrp[1].port_cnt)
+					g2b_cnt++;
+			}
+			if (b2g_cnt != g2b_cnt) {
+				OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+					"Error: strange failures in "
+					"x ring at y=%d  z=%d"
+					" b2g_cnt %u g2b_cnt %u\n",
+					j, k, b2g_cnt, g2b_cnt);
+				success = false;
+			}
+			if (b2g_cnt > 1) {
+				OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+					"Error: disjoint failures in "
+					"x ring at y=%d  z=%d\n", j, k);
+				success = false;
+			}
+		}
+
+	for (i = 0; i < (int)t->x_sz; i++)
+		for (k = 0; k < (int)t->z_sz; k++) {
+			b2g_cnt = 0;
+			g2b_cnt = 0;
+			for (j = 0; j < (int)t->y_sz; j++) {
+
+				if (!t->sw[i][j][k])
+					continue;
+
+				if (!t->sw[i][j][k]->ptgrp[2].port_cnt)
+					b2g_cnt++;
+				if (!t->sw[i][j][k]->ptgrp[3].port_cnt)
+					g2b_cnt++;
+			}
+			if (b2g_cnt != g2b_cnt) {
+				OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+					"Error: strange failures in "
+					"y ring at x=%d  z=%d"
+					" b2g_cnt %u g2b_cnt %u\n",
+					i, k, b2g_cnt, g2b_cnt);
+				success = false;
+			}
+			if (b2g_cnt > 1) {
+				OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+					"Error: disjoint failures in "
+					"y ring at x=%d  z=%d\n", i, k);
+				success = false;
+			}
+		}
+
+	for (j = 0; j < (int)t->y_sz; j++)
+		for (i = 0; i < (int)t->x_sz; i++) {
+			b2g_cnt = 0;
+			g2b_cnt = 0;
+			for (k = 0; k < (int)t->z_sz; k++) {
+
+				if (!t->sw[i][j][k])
+					continue;
+
+				if (!t->sw[i][j][k]->ptgrp[4].port_cnt)
+					b2g_cnt++;
+				if (!t->sw[i][j][k]->ptgrp[5].port_cnt)
+					g2b_cnt++;
+			}
+			if (b2g_cnt != g2b_cnt) {
+				OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+					"Error: strange failures in "
+					"z ring at x=%d  y=%d"
+					" b2g_cnt %u g2b_cnt %u\n",
+					i, j, b2g_cnt, g2b_cnt);
+				success = false;
+			}
+			if (b2g_cnt > 1) {
+				OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+					"Error: disjoint failures in "
+					"z ring at x=%d  y=%d\n", i, j);
+				success = false;
+			}
+		}
+
+	if (t->flags & MSG_DEADLOCK) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: missing switch topology "
+			"==> message deadlock!\n");
+		success = false;
+	}
+	return success;
+}
+
+/*
+ * Use this function to re-establish the pointers between a torus endpoint
+ * and an opensm osm_port_t.
+ *
+ * Typically this is only needed when "opensm --ucast-cache" is used, and
+ * a CA link bounces.  When the CA port goes away, the osm_port_t object
+ * is destroyed, invalidating the endpoint osm_port_t pointer.  When the
+ * link comes back, a new osm_port_t object is created with a NULL priv
+ * member.  Thus, when osm_get_torus_sl() is called it is missing the data
+ * needed to do its work.  Use this function to fix things up.
+ */
+static
+struct endpoint *osm_port_relink_endpoint(const osm_port_t *osm_port)
+{
+	guid_t node_guid;
+	uint8_t port_num, r_port_num;
+	struct t_switch *sw;
+	struct endpoint *ep = NULL;
+	osm_switch_t *osm_sw;
+	osm_physp_t *osm_physp;
+	osm_node_t *osm_node, *r_osm_node;
+
+	/*
+	 * We need to find the torus endpoint that has the same GUID as
+	 * the osm_port.  Rather than search the entire set of endpoints,
+	 * we'll try to follow pointers.
+	 */
+	osm_physp = osm_port->p_physp;
+	osm_node = osm_port->p_node;
+	port_num = osm_physp_get_port_num(osm_physp);
+	node_guid = osm_node_get_node_guid(osm_node);
+	/*
+	 * Switch management port?
+	 */
+	if (port_num == 0 &&
+	    osm_node_get_type(osm_node) == IB_NODE_TYPE_SWITCH) {
+
+		osm_sw = osm_node->sw;
+		if (osm_sw && osm_sw->priv) {
+			sw = osm_sw->priv;
+			if (sw->osm_switch == osm_sw &&
+			    sw->port[0]->n_id == node_guid) {
+
+				ep = sw->port[0];
+				goto relink_priv;
+			}
+		}
+	}
+	/*
+	 * CA port?  Try other end of link.  This should also catch a
+	 * router port if it is connected to a switch.
+	 */
+	r_osm_node = osm_node_get_remote_node(osm_node, port_num, &r_port_num);
+	if (!r_osm_node)
+		goto out;
+
+	osm_sw = r_osm_node->sw;
+	if (!osm_sw)
+		goto out;
+
+	sw = osm_sw->priv;
+	if (!(sw && sw->osm_switch == osm_sw))
+		goto out;
+
+	ep = sw->port[r_port_num];
+	if (!(ep && ep->link))
+		goto out;
+
+	if (ep->link->end[0].n_id == node_guid) {
+		ep = &ep->link->end[0];
+		goto relink_priv;
+	}
+	if (ep->link->end[1].n_id == node_guid) {
+		ep = &ep->link->end[1];
+		goto relink_priv;
+	}
+	ep = NULL;
+	goto out;
+
+relink_priv:
+	/* FIXME:
+	 * Unfortunately, we need to cast away const to rebuild the links
+	 * between the torus endpoint and the osm_port_t.
+	 *
+	 * What is really needed is to check whether pr_rcv_get_path_parms()
+	 * needs its port objects to be const.  If so, why, and whether
+	 * anything can be done about it.
+	 */
+	((osm_port_t *)osm_port)->priv = ep;
+	ep->osm_port = (osm_port_t *)osm_port;
+out:
+	return ep;
+}
+
+/*
+ * Computing LFT entries and path SL values:
+ *
+ * For a pristine torus, we compute LFT entries using XYZ DOR, and select
+ * which direction to route on a ring (i.e., the 1-D torus for the coordinate
+ * in question) based on shortest path.  We compute the SL to use for the
+ * path based on whether we crossed a dateline (where a ring coordinate
+ * wraps to zero) for each coordinate.
+ *
+ * When there is a link/switch failure, we want to compute LFT entries
+ * to route around the failure, without changing the path SL.  I.e., we
+ * want the SL to reach a given destination from a given source to be
+ * independent of the presence or number of failed components in the fabric.
+ *
+ * In order to make this feasible, we will assume that no ring is broken
+ * into disjoint pieces by multiple failures.
+ *
+ * We handle failure by attempting to take the long way around any ring
+ * with connectivity interrupted by failed components, unless the path
+ * requires a turn on a failed switch.
+ *
+ * For paths that require a turn on a failed switch, we head towards the
+ * failed switch, then turn when progress is blocked by a failure, using a
+ * turn allowed under XYZ DOR.  However, such a path will also require a turn
+ * that is not a legal XYZ DOR turn, so we construct the SL2VL mapping tables
+ * such that XYZ DOR turns use one set of VLs and ZYX DOR turns use a
+ * separate set of VLs.
+ *
+ * Under these rules the algorithm guarantees credit-loop-free routing for a
+ * single failed switch, without any change in path SL values.  We can also
+ * guarantee credit-loop-free routing for failures of multiple switches, if
+ * they are adjacent in the last DOR direction.  Since we use XYZ DOR,
+ * that means failed switches at i,j,k and i,j,k+1 will not cause credit
+ * loops.
+ *
+ * These failure routing rules are intended to prevent paths that cross any
+ * coordinate dateline twice (over and back), so we don't need to worry about
+ * any ambiguity over which SL to use for such a case.  Also, we cannot have
+ * a ring deadlock when a ring is broken by failure and we route the long
+ * way around, so we don't need to worry about the impact of such routing
+ * on SL choice.
+ */
+
+/*
+ * Functions to set our SL bit encoding for routing/QoS info.  Combine the
+ * results of these functions with bitwise or to get final SL.
+ *
+ * SL bits 0-2 encode whether we "looped" in a given direction
+ * on the torus on the path from source to destination.
+ *
+ * SL bit 3 encodes the QoS level.  We only support two QoS levels.
+ *
+ * Below we assume TORUS_MAX_DIM == 3 and 0 <= coord_dir < TORUS_MAX_DIM.
+ */
+static inline
+unsigned sl_set_use_loop_vl(bool use_loop_vl, unsigned coord_dir)
+{
+	return (coord_dir < TORUS_MAX_DIM)
+		? ((unsigned)use_loop_vl << coord_dir) : 0;
+}
+
+static inline
+unsigned sl_set_qos(unsigned qos)
+{
+	return (unsigned)(!!qos) << TORUS_MAX_DIM;
+}
+
+/*
+ * Functions to crack our SL bit encoding for routing/QoS info.
+ */
+static inline
+bool sl_get_use_loop_vl(unsigned sl, unsigned coord_dir)
+{
+	return (coord_dir < TORUS_MAX_DIM)
+		? (sl >> coord_dir) & 0x1 : false;
+}
+
+static inline
+unsigned sl_get_qos(unsigned sl)
+{
+	return (sl >> TORUS_MAX_DIM) & 0x1;
+}
+
+/*
+ * Functions to encode routing/QoS info into VL bits.  Combine the results of
+ * these functions with bitwise or to get final VL.
+ *
+ * VL bit 0 encodes whether we need to leave on the "loop" VL.
+ *
+ * VL bit 1 encodes whether turn is XYZ DOR or ZYX DOR. A 3d mesh/torus
+ * has 6 turn types: x-y, y-z, x-z, y-x, z-y, z-x.  The first three are
+ * legal XYZ DOR turns, and the second three are legal ZYX DOR turns.
+ * Straight-through (x-x, y-y, z-z) paths are legal in both DOR variants,
+ * so we'll assign them to XYZ DOR VLs.
+ *
+ * Note that delivery to switch-local ports (i.e. those that source/sink
+ * traffic, rather than forwarding it) cannot cause a deadlock, so that
+ * can also use either XYZ or ZYX DOR.
+ *
+ * VL bit 2 encodes QoS level.
+ *
+ * Note that if VL bit encodings are changed here, the available fabric VL
+ * verification in verify_setup() needs to be updated as well.
+ */
+static inline
+unsigned vl_set_loop_vl(bool use_loop_vl)
+{
+	return use_loop_vl;
+}
+
+static inline
+unsigned vl_set_qos_vl(unsigned qos)
+{
+	return (qos & 0x1) << 2;
+}
+
+static inline
+unsigned vl_set_turn_vl(unsigned in_coord_dir, unsigned out_coord_dir)
+{
+	unsigned vl = 0;
+
+	if (in_coord_dir != TORUS_MAX_DIM &&
+	    out_coord_dir != TORUS_MAX_DIM)
+		vl = (in_coord_dir > out_coord_dir)
+			? 0x1 << 1 : 0;
+
+	return vl;
+}
+
+static
+unsigned sl2vl_entry(struct torus *t, struct t_switch *sw,
+		     int input_pt, int output_pt, unsigned sl)
+{
+	unsigned id, od, vl, data_vls;
+
+	if (sw && sw->port[input_pt])
+		id = sw->port[input_pt]->pgrp->port_grp / 2;
+	else
+		id = TORUS_MAX_DIM;
+
+	if (sw && sw->port[output_pt])
+		od = sw->port[output_pt]->pgrp->port_grp / 2;
+	else
+		od = TORUS_MAX_DIM;
+
+	data_vls = t->osm->subn.min_data_vls;
+	vl = 0;
+
+	if (data_vls >= 2)
+		vl |= vl_set_loop_vl(sl_get_use_loop_vl(sl, od));
+	if (data_vls >= 4)
+		vl |= vl_set_turn_vl(id, od);
+	if (data_vls >= 8)
+		vl |= vl_set_qos_vl(sl_get_qos(sl));
+
+	return vl;
+}
+
+static
+void torus_update_osm_sl2vl(void *context, osm_physp_t *osm_phys_port,
+			    uint8_t iport_num, uint8_t oport_num,
+			    ib_slvl_table_t *osm_oport_sl2vl)
+{
+	osm_node_t *node = osm_physp_get_node_ptr(osm_phys_port);
+	struct torus_context *ctx = context;
+	struct t_switch *sw = NULL;
+	int sl, vl;
+
+	if (node->sw) {
+		sw = node->sw->priv;
+		if (sw && sw->osm_switch != node->sw) {
+			osm_log_t *log = &ctx->osm->log;
+			guid_t guid;
+
+			guid = osm_node_get_node_guid(node);
+			OSM_LOG(log, OSM_LOG_INFO,
+				"Error: osm_switch (GUID 0x%04llx) "
+				"not in our fabric description\n",
+				ntohllu(guid));
+			return;
+		}
+	}
+	for (sl = 0; sl < 16; sl++) {
+		vl = sl2vl_entry(ctx->torus, sw, iport_num, oport_num, sl);
+		ib_slvl_table_set(osm_oport_sl2vl, sl, vl);
+	}
+}
+
+/*
+ * Computes the path lengths *vl0_len and *vl1_len to get from src
+ * to dst on a ring with count switches.
+ *
+ * *vl0_len is the path length for a direct path; it corresponds to a path
+ * that should be assigned to use VL0 in a switch.  *vl1_len is the path
+ * length for a path that wraps around the ring, i.e. where the ring index
+ * goes from count - 1 to zero or from zero to count - 1.  It corresponds to
+ * the path that should be assigned to use VL1 in a switch.
+ */
+static
+void get_pathlen(unsigned src, unsigned dst, unsigned count,
+		 unsigned *vl0_len, unsigned *vl1_len)
+{
+	unsigned s, l;		/* assume s < l */
+
+	if (dst > src) {
+		s = src;
+		l = dst;
+	} else {
+		s = dst;
+		l = src;
+	}
+	*vl0_len = l - s;
+	*vl1_len = s + count - l;
+}
+
+/*
+ * Returns a positive number if we should take the "positive" ring direction
+ * to reach dst from src, a negative number if we should take the "negative"
+ * ring direction, and 0 if src and dst are the same.  The choice is strictly
+ * based on which path is shorter.
+ */
+static
+int ring_dir_idx(unsigned src, unsigned dst, unsigned count)
+{
+	int r;
+	unsigned vl0_len, vl1_len;
+
+	if (dst == src)
+		return 0;
+
+	get_pathlen(src, dst, count, &vl0_len, &vl1_len);
+
+	if (dst > src)
+		r = vl0_len <= vl1_len ? 1 : -1;
+	else
+		r = vl0_len <= vl1_len ? -1 : 1;
+
+	return r;
+}
+
+/*
+ * Returns true if the VL1 path should be used to reach dst from src on a
+ * ring, based on which path is shorter.
+ */
+static
+bool use_vl1(unsigned src, unsigned dst, unsigned count)
+{
+	unsigned vl0_len, vl1_len;
+
+	get_pathlen(src, dst, count, &vl0_len, &vl1_len);
+
+	return vl0_len <= vl1_len ? false : true;
+}
+
+/*
+ * Returns the next switch in the ring of switches along coordinate direction
+ * cdir, in the positive ring direction if rdir is positive, and in the
+ * negative ring direction if rdir is negative.
+ *
+ * Returns NULL if rdir is zero, or there is no next switch.
+ */
+static
+struct t_switch *ring_next_sw(struct t_switch *sw, unsigned cdir, int rdir)
+{
+	unsigned pt_grp, far_end = 0;
+
+	if (!rdir)
+		return NULL;
+	/*
+	 * Recall that links are installed into the torus so that their 1 end
+	 * is in the "positive" coordinate direction relative to their 0 end
+	 * (see link_tswitches() and connect_tlink()).  Recall also that for
+	 * interswitch links, all links in a given switch port group have the
+	 * same endpoints, so we just need to look at the first link.
+	 */
+	pt_grp = 2 * cdir;
+	if (rdir > 0) {
+		pt_grp++;
+		far_end = 1;
+	}
+
+	if (!sw->ptgrp[pt_grp].port_cnt)
+		return NULL;
+
+	return sw->ptgrp[pt_grp].port[0]->link->end[far_end].sw;
+}
+
+/*
+ * Returns a positive number if we should take the "positive" ring direction
+ * to reach dsw from ssw, a negative number if we should take the "negative"
+ * ring direction, and 0 if ssw and dsw are the same, or if dsw is not
+ * reachable from ssw because the path is interrupted by failure.
+ */
+static
+int ring_dir_path(struct torus *t, unsigned cdir,
+		  struct t_switch *ssw, struct t_switch *dsw)
+{
+	int d = 0;
+	struct t_switch *sw;
+
+	switch (cdir) {
+	case 0:
+		d = ring_dir_idx(ssw->i, dsw->i, t->x_sz);
+		break;
+	case 1:
+		d = ring_dir_idx(ssw->j, dsw->j, t->y_sz);
+		break;
+	case 2:
+		d = ring_dir_idx(ssw->k, dsw->k, t->z_sz);
+		break;
+	default:
+		break;
+	}
+	if (!d)
+		goto out;
+
+	sw = ssw;
+	while (sw) {
+		sw = ring_next_sw(sw, cdir, d);
+		if (sw == dsw)
+			goto out;
+	}
+	d *= -1;
+	sw = ssw;
+	while (sw) {
+		sw = ring_next_sw(sw, cdir, d);
+		if (sw == dsw)
+			goto out;
+	}
+	d = 0;
+out:
+	return d;
+}
+
+/*
+ * Returns true, and sets *pt_grp to the port group index to use for the
+ * next hop, if it is possible to make progress from ssw to dsw along the
+ * coordinate direction cdir, taking into account whether there are
+ * interruptions in the path.
+ *
+ * This next hop result can be used without worrying about ring deadlocks -
+ * if we don't choose the shortest path it is because there is a failure in
+ * the ring, which removes the possibility of a ring deadlock on that ring.
+ */
+static
+bool next_hop_path(struct torus *t, unsigned cdir,
+		   struct t_switch *ssw, struct t_switch *dsw,
+		   unsigned *pt_grp)
+{
+	struct t_switch *tsw = NULL;
+	bool success = false;
+	int d;
+
+	/*
+	 * If the path from ssw to dsw turns, this is the switch where the
+	 * turn happens.
+	 */
+	switch (cdir) {
+	case 0:
+		tsw = t->sw[dsw->i][ssw->j][ssw->k];
+		break;
+	case 1:
+		tsw = t->sw[ssw->i][dsw->j][ssw->k];
+		break;
+	case 2:
+		tsw = t->sw[ssw->i][ssw->j][dsw->k];
+		break;
+	default:
+		goto out;
+	}
+	if (tsw) {
+		d = ring_dir_path(t, cdir, ssw, tsw);
+		cdir *= 2;
+		if (d > 0)
+			*pt_grp = cdir + 1;
+		else if (d < 0)
+			*pt_grp = cdir;
+		else
+			goto out;
+		success = true;
+	}
+out:
+	return success;
+}
+
+/*
+ * Returns true, and sets *pt_grp to the port group index to use for the
+ * next hop, if it is possible to make progress from ssw to dsw along the
+ * coordinate direction cdir.  This decision is made strictly on a
+ * shortest-path basis without regard for path availability.
+ */
+static
+bool next_hop_idx(struct torus *t, unsigned cdir,
+		  struct t_switch *ssw, struct t_switch *dsw,
+		  unsigned *pt_grp)
+{
+	int d;
+	unsigned g;
+	bool success = false;
+
+	switch (cdir) {
+	case 0:
+		d = ring_dir_idx(ssw->i, dsw->i, t->x_sz);
+		break;
+	case 1:
+		d = ring_dir_idx(ssw->j, dsw->j, t->y_sz);
+		break;
+	case 2:
+		d = ring_dir_idx(ssw->k, dsw->k, t->z_sz);
+		break;
+	default:
+		goto out;
+	}
+
+	cdir *= 2;
+	if (d > 0)
+		g = cdir + 1;
+	else if (d < 0)
+		g = cdir;
+	else
+		goto out;
+
+	if (!ssw->ptgrp[g].port_cnt)
+		goto out;
+
+	*pt_grp = g;
+	success = true;
+out:
+	return success;
+}
+
+static
+void warn_on_routing(const char *msg,
+		     struct t_switch *sw, struct t_switch *dsw)
+{
+	OSM_LOG(&sw->torus->osm->log, OSM_LOG_ERROR,
+		"%s from sw 0x%04llx (%d,%d,%d) to sw 0x%04llx (%d,%d,%d)\n",
+		msg, ntohllu(sw->n_id), sw->i, sw->j, sw->k,
+		ntohllu(dsw->n_id), dsw->i, dsw->j, dsw->k);
+}
+
+static
+bool next_hop_x(struct torus *t,
+		struct t_switch *ssw, struct t_switch *dsw, unsigned *pt_grp)
+{
+	if (t->sw[dsw->i][ssw->j][ssw->k])
+		/*
+		 * The next turning switch on this path is available,
+		 * so head towards it by the shortest available path.
+		 */
+		return next_hop_path(t, 0, ssw, dsw, pt_grp);
+	else
+		/*
+		 * The next turning switch on this path is not
+		 * available, so head towards it in the shortest
+		 * path direction.
+		 */
+		return next_hop_idx(t, 0, ssw, dsw, pt_grp);
+}
+
+static
+bool next_hop_y(struct torus *t,
+		struct t_switch *ssw, struct t_switch *dsw, unsigned *pt_grp)
+{
+	if (t->sw[ssw->i][dsw->j][ssw->k])
+		/*
+		 * The next turning switch on this path is available,
+		 * so head towards it by the shortest available path.
+		 */
+		return next_hop_path(t, 1, ssw, dsw, pt_grp);
+	else
+		/*
+		 * The next turning switch on this path is not
+		 * available, so head towards it in the shortest
+		 * path direction.
+		 */
+		return next_hop_idx(t, 1, ssw, dsw, pt_grp);
+}
+
+static
+bool next_hop_z(struct torus *t,
+		struct t_switch *ssw, struct t_switch *dsw, unsigned *pt_grp)
+{
+	return next_hop_path(t, 2, ssw, dsw, pt_grp);
+}
+
+/*
+ * Returns the port number on *sw to use to reach *dsw, or -1 if unable to
+ * route.
+ */
+static
+int lft_port(struct torus *t,
+	     struct t_switch *sw, struct t_switch *dsw,
+	     bool update_port_cnt, bool ca)
+{
+	unsigned g, p;
+	struct port_grp *pg;
+
+	/*
+	 * The IBA does not provide a way to preserve path history for
+	 * routing decisions and VL assignment, and the only mechanism to
+	 * provide global fabric knowledge to the routing engine is via
+	 * the four SL bits.  This severely constrains the ability to deal
+	 * with missing/dead switches.
+	 *
+	 * Also, if routing a torus with XYZ-DOR, the only way to route
+	 * around a missing/dead switch is to introduce a turn that is
+	 * illegal under XYZ-DOR.
+	 *
+	 * But here's what we can do:
+	 *
+	 * We have a VL bit we use to flag illegal turns, thus putting the
+	 * hop directly after an illegal turn on a separate set of VLs.
+	 * Unfortunately, since there is no path history, the _second_
+	 * and subsequent hops after an illegal turn use the standard
+	 * XYZ-DOR VL set.  This is enough to introduce credit loops in
+	 * many cases.
+	 *
+	 * To minimize the number of cases such illegal turns can introduce
+	 * credit loops, we try to introduce the illegal turn as late in a
+	 * path as possible.
+	 *
+	 * Define a turning switch as a switch where a path turns from one
+	 * coordinate direction onto another.  If a turning switch in a path
+	 * is missing, construct the LFT entries so that the path progresses
+	 * as far as possible on the shortest path to the turning switch.
+	 * When progress is not possible, turn onto the next coordinate
+	 * direction.
+	 *
+	 * The next turn after that will be an illegal turn, after which
+	 * point the path will continue to use a standard XYZ-DOR path.
+	 */
+	if (dsw->i != sw->i) {
+
+		if (next_hop_x(t, sw, dsw, &g))
+			goto done;
+		/*
+		 * This path has made as much progress in this direction as
+		 * is possible, so turn it now.
+		 */
+		if (dsw->j != sw->j && next_hop_y(t, sw, dsw, &g))
+			goto done;
+
+		if (dsw->k != sw->k && next_hop_z(t, sw, dsw, &g))
+			goto done;
+
+		warn_on_routing("Error: unable to route", sw, dsw);
+		goto no_route;
+	} else if (dsw->j != sw->j) {
+
+		if (next_hop_y(t, sw, dsw, &g))
+			goto done;
+
+		if (dsw->k != sw->k && next_hop_z(t, sw, dsw, &g))
+			goto done;
+
+		warn_on_routing("Error: unable to route", sw, dsw);
+		goto no_route;
+	} else {
+		if (dsw->k == sw->k)
+			warn_on_routing("Warning: bad routing", sw, dsw);
+
+		if (next_hop_z(t, sw, dsw, &g))
+			goto done;
+
+		warn_on_routing("Error: unable to route", sw, dsw);
+		goto no_route;
+	}
+done:
+	pg = &sw->ptgrp[g];
+	if (!pg->port_cnt)
+		goto no_route;
+
+	if (update_port_cnt) {
+		if (ca)
+			p = pg->ca_dlid_cnt++ % pg->port_cnt;
+		else
+			p = pg->sw_dlid_cnt++ % pg->port_cnt;
+	} else {
+		/*
+		 * If we're not updating port counts, then we're just running
+		 * routes for SL path checking, and it doesn't matter which
+		 * of several parallel links we use.  Use the first one.
+		 */
+		p = 0;
+	}
+	p = pg->port[p]->port;
+
+	return p;
+
+no_route:
+	/*
+	 * We can't get there from here.
+	 */
+	OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+		"Error: routing on sw 0x%04llx: sending "
+		"traffic for dest sw 0x%04llx to port %u\n",
+		ntohllu(sw->n_id), ntohllu(dsw->n_id), OSM_NO_PATH);
+	return -1;
+}
+
+static
+bool get_lid(struct port_grp *pg, unsigned p,
+	     uint16_t *dlid_base, uint8_t *dlid_lmc, bool *ca)
+{
+	struct endpoint *ep;
+	osm_port_t *osm_port;
+
+	if (p >= pg->port_cnt) {
+		OSM_LOG(&pg->sw->torus->osm->log, OSM_LOG_ERROR,
+			"Error: Port group index %u too large: sw "
+			"0x%04llx pt_grp %u pt_grp_cnt %u\n",
+			p, ntohllu(pg->sw->n_id),
+			(unsigned)pg->port_grp, (unsigned)pg->port_cnt);
+		return false;
+	}
+	if (pg->port[p]->type == SRCSINK) {
+		ep = pg->port[p];
+		if (ca)
+			*ca = false;
+	} else if (pg->port[p]->type == PASSTHRU &&
+		   pg->port[p]->link->end[1].type == SRCSINK) {
+		/*
+		 * If this port is connected via a link to a CA, then we
+		 * know link->end[0] is the switch end and link->end[1] is
+		 * the CA end; see build_ca_link() and link_srcsink().
+		 */
+		ep = &pg->port[p]->link->end[1];
+		if (ca)
+			*ca = true;
+	} else {
+		OSM_LOG(&pg->sw->torus->osm->log, OSM_LOG_ERROR,
+			"Error: Switch 0x%04llx port %d improperly connected\n",
+			ntohllu(pg->sw->n_id), pg->port[p]->port);
+		return false;
+	}
+	osm_port = ep->osm_port;
+	if (!(osm_port && osm_port->priv == ep)) {
+		OSM_LOG(&pg->sw->torus->osm->log, OSM_LOG_ERROR,
+			"Error: ep->osm_port->priv != ep "
+			"for sw 0x%04llx port %d\n",
+			ntohllu(((struct t_switch *)(ep->sw))->n_id), ep->port);
+		return false;
+	}
+	*dlid_base = cl_ntoh16(osm_physp_get_base_lid(osm_port->p_physp));
+	*dlid_lmc = osm_physp_get_lmc(osm_port->p_physp);
+
+	return true;
+}
+
+static
+bool torus_lft(struct torus *t, struct t_switch *sw)
+{
+	bool success = true;
+	int dp;
+	unsigned p, s;
+	uint16_t l, dlid_base;
+	uint8_t dlid_lmc;
+	bool ca;
+	struct port_grp *pgrp;
+	struct t_switch *dsw;
+	osm_switch_t *osm_sw;
+
+	if (!(sw->osm_switch && sw->osm_switch->priv == sw)) {
+		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+			"Error: sw->osm_switch->priv != sw "
+			"for sw 0x%04llx\n", ntohllu(sw->n_id));
+		return false;
+	}
+	osm_sw = sw->osm_switch;
+	memset(osm_sw->new_lft, OSM_NO_PATH, osm_sw->lft_size);
+
+	for (s = 0; s < t->switch_cnt; s++) {
+
+		dsw = t->sw_pool[s];
+		pgrp = &dsw->ptgrp[2 * TORUS_MAX_DIM];
+
+		for (p = 0; p < pgrp->port_cnt; p++) {
+
+			if (!get_lid(pgrp, p, &dlid_base, &dlid_lmc, &ca))
+				return false;
+
+			if (sw->n_id == dsw->n_id)
+				dp = pgrp->port[p]->port;
+			else
+				dp = lft_port(t, sw, dsw, true, ca);
+			/*
+			 * LMC > 0 doesn't really make sense for torus-2QoS.
+			 * So, just make sure traffic gets delivered if
+			 * non-zero LMC is used.
+			 */
+			if (dp >= 0)
+				for (l = 0; l < (1U << dlid_lmc); l++)
+					osm_sw->new_lft[dlid_base + l] = dp;
+			else
+				success = false;
+		}
+	}
+	return success;
+}
+
+static
+osm_mtree_node_t *mcast_stree_branch(struct t_switch *sw, osm_switch_t *osm_sw,
+				     osm_mgrp_box_t *mgb, unsigned depth,
+				     unsigned *port_cnt, unsigned *max_depth)
+{
+	osm_mtree_node_t *mtn = NULL;
+	osm_mcast_tbl_t *mcast_tbl, *ds_mcast_tbl;
+	osm_node_t *ds_node;
+	struct t_switch *ds_sw;
+	struct port_grp *ptgrp;
+	struct link *link;
+	struct endpoint *port;
+	unsigned g, p;
+	unsigned mcast_fwd_ports = 0, mcast_end_ports = 0;
+
+	depth++;
+
+	if (osm_sw->priv != sw) {
+		OSM_LOG(&sw->torus->osm->log, OSM_LOG_INFO,
+			"Error: osm_sw (GUID 0x%04llx) "
+			"not in our fabric description\n",
+			ntohllu(osm_node_get_node_guid(osm_sw->p_node)));
+		goto out;
+	}
+	if (!osm_switch_supports_mcast(osm_sw)) {
+		OSM_LOG(&sw->torus->osm->log, OSM_LOG_ERROR,
+			"Error: osm_sw (GUID 0x%04llx) "
+			"does not support multicast\n",
+			ntohllu(osm_node_get_node_guid(osm_sw->p_node)));
+		goto out;
+	}
+	mtn = osm_mtree_node_new(osm_sw);
+	if (!mtn) {
+		OSM_LOG(&sw->torus->osm->log, OSM_LOG_ERROR,
+			"Insufficient memory to build multicast tree\n");
+		goto out;
+	}
+	mcast_tbl = osm_switch_get_mcast_tbl_ptr(osm_sw);
+	/*
+	 * Recurse to downstream switches, i.e. those closer to master
+	 * spanning tree branch tips.
+	 *
+	 * Note that if there are multiple ports in this port group, i.e.,
+	 * multiple parallel links, we can pick any one of them to use for
+	 * any individual MLID without causing loops.  Pick one based on MLID
+	 * for now, until someone turns up evidence we need to be smarter.
+	 *
+	 * Also, it might be we got called in a window between a switch getting
+	 * removed from the fabric, and torus-2QoS getting to rebuild its
+	 * fabric representation.  If that were to happen, our next hop
+	 * osm_switch pointer might be stale.  Look it up via opensm's fabric
+	 * description to be sure it's not.
+	 */
+	for (g = 0; g < 2 * TORUS_MAX_DIM; g++) {
+		ptgrp = &sw->ptgrp[g];
+		if (!ptgrp->to_stree_tip)
+			continue;
+
+		p = mgb->mlid % ptgrp->port_cnt;/* port # in port group */
+		p = ptgrp->port[p]->port;	/* now port # in switch */
+
+		ds_node = osm_node_get_remote_node(osm_sw->p_node, p, NULL);
+		ds_sw = ptgrp->to_stree_tip->sw;
+
+		if (!(ds_node && ds_node->sw &&
+		      ds_sw->osm_switch == ds_node->sw)) {
+			OSM_LOG(&sw->torus->osm->log, OSM_LOG_ERROR,
+				"Error: stale pointer to osm_sw "
+				"(GUID 0x%04llx)\n", ntohllu(ds_sw->n_id));
+			continue;
+		}
+		mtn->child_array[p] =
+			mcast_stree_branch(ds_sw, ds_node->sw, mgb,
+					   depth, port_cnt, max_depth);
+		if (!mtn->child_array[p])
+			continue;
+
+		osm_mcast_tbl_set(mcast_tbl, mgb->mlid, p);
+		mcast_fwd_ports++;
+		/*
+		 * Since we forward traffic for this multicast group on this
+		 * port, cause the switch on the other end of the link
+		 * to forward traffic back to us.  Do it now since we have at
+		 * hand the link used; otherwise it'll be hard to figure out
+		 * later, and if we get it wrong we get a MC routing loop.
+		 */
+		link = sw->port[p]->link;
+		ds_mcast_tbl = osm_switch_get_mcast_tbl_ptr(ds_node->sw);
+
+		if (&link->end[0] == sw->port[p])
+			osm_mcast_tbl_set(ds_mcast_tbl, mgb->mlid,
+					  link->end[1].port);
+		else
+			osm_mcast_tbl_set(ds_mcast_tbl, mgb->mlid,
+					  link->end[0].port);
+	}
+	/*
+	 * Add any host ports marked as in mcast group into spanning tree.
+	 */
+	ptgrp = &sw->ptgrp[2 * TORUS_MAX_DIM];
+	for (p = 0; p < ptgrp->port_cnt; p++) {
+		port = ptgrp->port[p];
+		if (port->tmp) {
+			port->tmp = NULL;
+			mtn->child_array[port->port] = OSM_MTREE_LEAF;
+			osm_mcast_tbl_set(mcast_tbl, mgb->mlid, port->port);
+			mcast_end_ports++;
+		}
+	}
+	if (!(mcast_end_ports || mcast_fwd_ports)) {
+		free(mtn);
+		mtn = NULL;
+	} else if (depth > *max_depth)
+		*max_depth = depth;
+
+	*port_cnt += mcast_end_ports;
+out:
+	return mtn;
+}
+
+static
+osm_port_t *next_mgrp_box_port(osm_mgrp_box_t *mgb,
+			       cl_list_item_t **list_iterator,
+			       cl_map_item_t **map_iterator)
+{
+	osm_mgrp_t *mgrp;
+	osm_mcm_port_t *mcm_port;
+	osm_port_t *osm_port = NULL;
+	cl_map_item_t *m_item = *map_iterator;
+	cl_list_item_t *l_item = *list_iterator;
+
+next_mgrp:
+	if (!l_item)
+		l_item = cl_qlist_head(&mgb->mgrp_list);
+	if (l_item == cl_qlist_end(&mgb->mgrp_list)) {
+		l_item = NULL;
+		goto out;
+	}
+	mgrp = cl_item_obj(l_item, mgrp, list_item);
+
+	if (!m_item)
+		m_item = cl_qmap_head(&mgrp->mcm_port_tbl);
+	if (m_item == cl_qmap_end(&mgrp->mcm_port_tbl)) {
+		m_item = NULL;
+		l_item = cl_qlist_next(l_item);
+		goto next_mgrp;
+	}
+	mcm_port = cl_item_obj(m_item, mcm_port, map_item);
+	m_item = cl_qmap_next(m_item);
+	osm_port = mcm_port->port;
+out:
+	*list_iterator = l_item;
+	*map_iterator = m_item;
+	return osm_port;
+}
+
+static
+ib_api_status_t torus_mcast_stree(void *context, osm_mgrp_box_t *mgb)
+{
+	struct torus_context *ctx = context;
+	struct torus *t = ctx->torus;
+	cl_map_item_t *m_item = NULL;
+	cl_list_item_t *l_item = NULL;
+	osm_port_t *osm_port;
+	osm_switch_t *osm_sw;
+	struct endpoint *port;
+	unsigned port_cnt = 0, max_depth = 0;
+
+	osm_purge_mtree(&ctx->osm->sm, mgb);
+
+	/*
+	 * Build a spanning tree for a multicast group by first marking
+	 * the torus endpoints that are participating in the group.
+	 * Then do a depth-first search of the torus master spanning
+	 * tree to build up the spanning tree specific to this group.
+	 *
+	 * Since the torus master spanning tree is constructed specifically
+	 * to guarantee that multicast will not deadlock against unicast
+	 * when they share VLs, we can be sure that any multicast group
+	 * spanning tree constructed this way has the same property.
+	 */
+	while ((osm_port = next_mgrp_box_port(mgb, &l_item, &m_item))) {
+		port = osm_port->priv;
+		if (!(port && port->osm_port == osm_port)) {
+			port = osm_port_relink_endpoint(osm_port);
+			if (!port) {
+				guid_t id;
+				id = osm_node_get_node_guid(osm_port->p_node);
+				OSM_LOG(&ctx->osm->log, OSM_LOG_ERROR,
+					"Error: osm_port (GUID 0x%04llx) "
+					"not in our fabric description\n",
+					ntohllu(id));
+				continue;
+			}
+		}
+		/*
+		 * If this is a CA port, mark the switch port at the
+		 * other end of this port's link.
+		 *
+		 * By definition, a CA port is connected to end[1] of a link,
+		 * and the switch port is end[0].  See build_ca_link() and
+		 * link_srcsink().
+		 */
+		if (port->link)
+			port = &port->link->end[0];
+		port->tmp = osm_port;
+	}
+	/*
+	 * It might be we got called in a window between a switch getting
+	 * removed from the fabric, and torus-2QoS getting to rebuild its
+	 * fabric representation.  If that were to happen, our
+	 * master_stree_root->osm_switch pointer might be stale.  Look up
+	 * the osm_switch by GUID to be sure it's not.
+	 *
+	 * Also, call into mcast_stree_branch with depth = -1, because
+	 * depth at root switch needs to be 0.
+	 */
+	osm_sw = (osm_switch_t *)cl_qmap_get(&ctx->osm->subn.sw_guid_tbl,
+					     t->master_stree_root->n_id);
+	if (!(osm_sw && t->master_stree_root->osm_switch == osm_sw)) {
+		OSM_LOG(&ctx->osm->log, OSM_LOG_ERROR,
+			"Error: stale pointer to osm_sw (GUID 0x%04llx)\n",
+			ntohllu(t->master_stree_root->n_id));
+		return IB_ERROR;
+	}
+	mgb->root = mcast_stree_branch(t->master_stree_root, osm_sw,
+				       mgb, -1, &port_cnt, &max_depth);
+
+	OSM_LOG(&ctx->osm->log, OSM_LOG_VERBOSE,
+		"Configured MLID 0x%X for %u ports, max tree depth = %u\n",
+		mgb->mlid, port_cnt, max_depth);
+
+	return IB_SUCCESS;
+}
+
+static
+bool good_xy_ring(struct torus *t, const int x, const int y, const int z)
+{
+	struct t_switch ****sw = t->sw;
+	bool good_ring = true;
+	int x_tst, y_tst;
+
+	for (x_tst = 0; x_tst < t->x_sz && good_ring; x_tst++)
+		good_ring = sw[x_tst][y][z];
+
+	for (y_tst = 0; y_tst < t->y_sz && good_ring; y_tst++)
+		good_ring = sw[x][y_tst][z];
+
+	return good_ring;
+}
+
+static
+struct t_switch *find_plane_mid(struct torus *t, const int z)
+{
+	int x, dx, xm = t->x_sz / 2;
+	int y, dy, ym = t->y_sz / 2;
+	struct t_switch ****sw = t->sw;
+
+	if (good_xy_ring(t, xm, ym, z))
+		return sw[xm][ym][z];
+
+	for (dx = 1, dy = 1; dx <= xm && dy <= ym; dx++, dy++) {
+
+		x = canonicalize(xm - dx, t->x_sz);
+		y = canonicalize(ym - dy, t->y_sz);
+		if (good_xy_ring(t, x, y, z))
+			return sw[x][y][z];
+
+		x = canonicalize(xm + dx, t->x_sz);
+		y = canonicalize(ym + dy, t->y_sz);
+		if (good_xy_ring(t, x, y, z))
+			return sw[x][y][z];
+	}
+	return NULL;
+}
+
+static
+struct t_switch *find_stree_root(struct torus *t)
+{
+	int x, y, z, dz, zm = t->z_sz / 2;
+	struct t_switch ****sw = t->sw;
+	struct t_switch *root;
+	bool good_plane;
+
+	/*
+	 * Look for a switch near the "center" (wrt. the datelines) of the
+	 * torus, as that will be the optimal spanning tree root.  Use
+	 * a search that is not exhaustive, on the theory that this routing
+	 * engine isn't useful anyway if too many switches are missing.
+	 *
+	 * Also, want to pick an x-y plane with no missing switches, so that
+	 * the master spanning tree construction algorithm doesn't have to
+	 * deal with needing a turn on a missing switch.
+	 */
+	for (dz = 0; dz <= zm; dz++) {
+
+		z = canonicalize(zm - dz, t->z_sz);
+		good_plane = true;
+		for (y = 0; y < t->y_sz && good_plane; y++)
+			for (x = 0; x < t->x_sz && good_plane; x++)
+				good_plane = sw[x][y][z];
+
+		if (good_plane) {
+			root = find_plane_mid(t, z);
+			if (root)
+				goto out;
+		}
+		if (!dz)
+			continue;
+
+		z = canonicalize(zm + dz, t->z_sz);
+		good_plane = true;
+		for (y = 0; y < t->y_sz && good_plane; y++)
+			for (x = 0; x < t->x_sz && good_plane; x++)
+				good_plane = sw[x][y][z];
+
+		if (good_plane) {
+			root = find_plane_mid(t, z);
+			if (root)
+				goto out;
+		}
+	}
+	/*
+	 * Note that torus-2QoS can route a torus that is missing an entire
+	 * column (switches with x,y constant, for all z values) without
+	 * deadlocks.
+	 *
+	 * If we've reached this point, we must have a column of missing
+	 * switches, as routable_torus() would have returned false for
+	 * any other configuration of missing switches that made it through
+	 * the above.
+	 *
+	 * So any switch in the mid-z plane will do as the root.
+	 */
+	root = find_plane_mid(t, zm);
+out:
+	return root;
+}
+
+static
+bool sw_in_master_stree(struct t_switch *sw)
+{
+	int g;
+	bool connected;
+
+	connected = sw == sw->torus->master_stree_root;
+	for (g = 0; g < 2 * TORUS_MAX_DIM; g++)
+		connected = connected || sw->ptgrp[g].to_stree_root;
+
+	return connected;
+}
+
+static
+void grow_master_stree_branch(struct t_switch *root, struct t_switch *tip,
+			      unsigned to_root_pg, unsigned to_tip_pg)
+{
+	root->ptgrp[to_tip_pg].to_stree_tip = &tip->ptgrp[to_root_pg];
+	tip->ptgrp[to_root_pg].to_stree_root = &root->ptgrp[to_tip_pg];
+}
+
+static
+void build_master_stree_branch(struct t_switch *branch_root, int cdir)
+{
+	struct t_switch *sw, *n_sw, *p_sw;
+	unsigned l, idx, cnt, pg, ng;
+
+	switch (cdir) {
+	case 0:
+		idx = branch_root->i;
+		cnt = branch_root->torus->x_sz;
+		break;
+	case 1:
+		idx = branch_root->j;
+		cnt = branch_root->torus->y_sz;
+		break;
+	case 2:
+		idx = branch_root->k;
+		cnt = branch_root->torus->z_sz;
+		break;
+	default:
+		goto out;
+	}
+	/*
+	 * This algorithm intends that a spanning tree branch never crosses
+	 * a dateline unless the 1-D ring for which we're building the branch
+	 * is interrupted by failure.  We need that guarantee to prevent
+	 * multicast/unicast credit loops.
+	 */
+	n_sw = branch_root;		/* tip of negative cdir branch */
+	ng = 2 * cdir;			/* negative cdir port group index */
+	p_sw = branch_root;		/* tip of positive cdir branch */
+	pg = 2 * cdir + 1;		/* positive cdir port group index */
+
+	for (l = idx; n_sw && l >= 1; l--) {
+		sw = ring_next_sw(n_sw, cdir, -1);
+		if (sw && !sw_in_master_stree(sw)) {
+			grow_master_stree_branch(n_sw, sw, pg, ng);
+			n_sw = sw;
+		} else
+			n_sw = NULL;
+	}
+	for (l = idx; p_sw && l < (cnt - 1); l++) {
+		sw = ring_next_sw(p_sw, cdir, 1);
+		if (sw && !sw_in_master_stree(sw)) {
+			grow_master_stree_branch(p_sw, sw, ng, pg);
+			p_sw = sw;
+		} else
+			p_sw = NULL;
+	}
+	if (n_sw && p_sw)
+		goto out;
+	/*
+	 * At least one branch couldn't grow to the dateline for this ring.
+	 * That means it is acceptable to grow the branch by crossing the
+	 * dateline.
+	 */
+	for (l = 0; l < cnt; l++) {
+		if (n_sw) {
+			sw = ring_next_sw(n_sw, cdir, -1);
+			if (sw && !sw_in_master_stree(sw)) {
+				grow_master_stree_branch(n_sw, sw, pg, ng);
+				n_sw = sw;
+			} else
+				n_sw = NULL;
+		}
+		if (p_sw) {
+			sw = ring_next_sw(p_sw, cdir, 1);
+			if (sw && !sw_in_master_stree(sw)) {
+				grow_master_stree_branch(p_sw, sw, ng, pg);
+				p_sw = sw;
+			} else
+				p_sw = NULL;
+		}
+		if (!(n_sw || p_sw))
+			break;
+	}
+out:
+	return;
+}
+
+static
+bool torus_master_stree(struct torus *t)
+{
+	int i, j, k;
+	bool success = false;
+	struct t_switch *stree_root = find_stree_root(t);
+
+	if (stree_root)
+		build_master_stree_branch(stree_root, 0);
+	else
+		goto out;
+
+	k = stree_root->k;
+	for (i = 0; i < t->x_sz; i++) {
+		j = stree_root->j;
+		if (t->sw[i][j][k])
+			build_master_stree_branch(t->sw[i][j][k], 1);
+
+		for (j = 0; j < t->y_sz; j++)
+			if (t->sw[i][j][k])
+				build_master_stree_branch(t->sw[i][j][k], 2);
+	}
+	t->master_stree_root = stree_root;
+	/*
+	 * At this point we should have a master spanning tree that contains
+	 * every present switch, for all fabrics that torus-2QoS can route
+	 * without deadlocks.  Make sure this is the case; otherwise warn
+	 * and return failure so we get bug reports.
+	 */
+	success = true;
+	for (i = 0; i < t->x_sz; i++)
+		for (j = 0; j < t->y_sz; j++)
+			for (k = 0; k < t->z_sz; k++) {
+				struct t_switch *sw = t->sw[i][j][k];
+				if (!sw || sw_in_master_stree(sw))
+					continue;
+
+				success = false;
+				OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+					"Error: sw 0x%04llx (%d,%d,%d) not in "
+					"torus multicast master spanning tree\n",
+					ntohllu(sw->n_id), i, j, k);
+			}
+out:
+	return success;
+}
+
+int route_torus(struct torus *t)
+{
+	int s;
+	bool success = true;
+
+	for (s = 0; s < (int)t->switch_cnt; s++)
+		success = torus_lft(t, t->sw_pool[s]) && success;
+
+	success = success && torus_master_stree(t);
+
+	return success ? 0 : -1;
+}
+
+uint8_t torus_path_sl(void *context, uint8_t path_sl_hint,
+		      const osm_port_t *osm_sport,
+		      const osm_port_t *osm_dport)
+{
+	struct torus_context *ctx = context;
+	osm_log_t *log = &ctx->osm->log;
+	struct endpoint *sport, *dport;
+	struct t_switch *ssw, *dsw;
+	struct torus *t;
+	guid_t guid;
+	unsigned sl = 0, sp;
+
+	sport = osm_sport->priv;
+	if (!(sport && sport->osm_port == osm_sport)) {
+		sport = osm_port_relink_endpoint(osm_sport);
+		if (!sport) {
+			guid = osm_node_get_node_guid(osm_sport->p_node);
+			OSM_LOG(log, OSM_LOG_INFO,
+				"Error: osm_sport (GUID 0x%04llx) "
+				"not in our fabric description\n",
+				ntohllu(guid));
+			goto out;
+		}
+	}
+	dport = osm_dport->priv;
+	if (!(dport && dport->osm_port == osm_dport)) {
+		dport = osm_port_relink_endpoint(osm_dport);
+		if (!dport) {
+			guid = osm_node_get_node_guid(osm_dport->p_node);
+			OSM_LOG(log, OSM_LOG_INFO,
+				"Error: osm_dport (GUID 0x%04llx) "
+				"not in our fabric description\n",
+				ntohllu(guid));
+			goto out;
+		}
+	}
+	/*
+	 * We're only supposed to be called for CA ports, and maybe
+	 * switch management ports.
+	 */
+	if (sport->type != SRCSINK) {
+		guid = osm_node_get_node_guid(osm_sport->p_node);
+		OSM_LOG(log, OSM_LOG_INFO,
+			"Error: osm_sport (GUID 0x%04llx) "
+			"not a data src/sink port\n", ntohllu(guid));
+		goto out;
+	}
+	if (dport->type != SRCSINK) {
+		guid = osm_node_get_node_guid(osm_dport->p_node);
+		OSM_LOG(log, OSM_LOG_INFO,
+			"Error: osm_dport (GUID 0x%04llx) "
+			"not a data src/sink port\n", ntohllu(guid));
+		goto out;
+	}
+	/*
+	 * By definition, a CA port is connected to end[1] of a link, and
+	 * the switch port is end[0].  See build_ca_link() and link_srcsink().
+	 */
+	if (sport->link) {
+		ssw = sport->link->end[0].sw;
+		sp = sport->link->end[0].port;
+	} else {
+		ssw = sport->sw;
+		sp = sport->port;
+	}
+	if (dport->link)
+		dsw = dport->link->end[0].sw;
+	else
+		dsw = dport->sw;
+
+	t = ssw->torus;
+
+	sl  = sl_set_use_loop_vl(use_vl1(ssw->i, dsw->i, t->x_sz), 0);
+	sl |= sl_set_use_loop_vl(use_vl1(ssw->j, dsw->j, t->y_sz), 1);
+	sl |= sl_set_use_loop_vl(use_vl1(ssw->k, dsw->k, t->z_sz), 2);
+	sl |= sl_set_qos(sl_get_qos(path_sl_hint));
+out:
+	return sl;
+}
+
+static
+int torus_build_lfts(void *context)
+{
+	int status = -1;
+	struct torus_context *ctx = context;
+	struct fabric *fabric;
+	struct torus *torus;
+
+	fabric = &ctx->fabric;
+	teardown_fabric(fabric);
+
+	torus = calloc(1, sizeof(*torus));
+	if (!torus) {
+		OSM_LOG(&ctx->osm->log, OSM_LOG_ERROR,
+			"Error: allocating torus: %s\n", strerror(errno));
+		goto out;
+	}
+	torus->osm = ctx->osm;
+	fabric->osm = ctx->osm;
+
+	if (!parse_config(OPENSM_CONFIG_DIR "/opensm-torus.conf",
+			  fabric, torus))
+		goto out;
+
+	if (!capture_fabric(fabric))
+		goto out;
+
+	OSM_LOG(&torus->osm->log, OSM_LOG_INFO,
+		"Found fabric w/ %d links, %d switches, %d CA ports, "
+		"minimum %d data VLs\n",
+		(int)fabric->link_cnt, (int)fabric->switch_cnt,
+		(int)fabric->ca_cnt, (int)ctx->osm->subn.min_data_vls);
+
+	if (!verify_setup(torus, fabric))
+		goto out;
+
+	OSM_LOG(&torus->osm->log, OSM_LOG_INFO,
+		"Looking for %d x %d x %d %s\n",
+		(int)torus->x_sz, (int)torus->y_sz, (int)torus->z_sz,
+		(ALL_MESH(torus->flags) ? "mesh" : "torus"));
+
+	build_torus(fabric, torus);
+
+	OSM_LOG(&torus->osm->log, OSM_LOG_INFO,
+		"Built %d x %d x %d %s w/ %d links, %d switches, %d CA ports\n",
+		(int)torus->x_sz, (int)torus->y_sz, (int)torus->z_sz,
+		(ALL_MESH(torus->flags) ? "mesh" : "torus"),
+		(int)torus->link_cnt, (int)torus->switch_cnt,
+		(int)torus->ca_cnt);
+
+	diagnose_fabric(fabric);
+	/*
+	 * Since we found some sort of torus fabric, report on any topology
+	 * changes vs. the last torus we found.
+	 */
+	if (torus->flags & NOTIFY_CHANGES)
+		report_torus_changes(torus, ctx->torus);
+
+	if (routable_torus(torus, fabric))
+		status = route_torus(torus);
+
+out:
+	if (status) {		/* bad torus!! */
+		if (torus)
+			teardown_torus(torus);
+	} else {
+		if (ctx->torus)
+			teardown_torus(ctx->torus);
+		ctx->torus = torus;
+	}
+	teardown_fabric(fabric);
+	return status;
+}
+
+int osm_ucast_torus2QoS_setup(struct osm_routing_engine *r,
+			      osm_opensm_t *osm)
+{
+	struct torus_context *ctx;
+
+	ctx = torus_context_create(osm);
+
+	r->context = ctx;
+	r->ucast_build_fwd_tables = torus_build_lfts;
+	r->update_sl2vl = torus_update_osm_sl2vl;
+	r->path_sl = torus_path_sl;
+	r->mcast_build_stree = torus_mcast_stree;
+	r->delete = torus_context_delete;
+	return 0;
+}
-- 
1.6.2.2


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v3 10/17] opensm: Update documentation to describe torus-2QoS.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (7 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 09/17] opensm: Add torus-2QoS routing engine, part 3 Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 11/17] opensm: Enable torus-2QoS routing engine Jim Schutt
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/doc/current-routing.txt |  269 +++++++++++++++++++++++++++++++++++++++-
 opensm/man/opensm.8.in         |    9 ++-
 2 files changed, 275 insertions(+), 3 deletions(-)
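
Not part of the patch: a minimal sketch, in C, of the SL and SL2VL
encodings that the documentation added below describes.  The names
sketch_path_sl() and sketch_sl2vl() are made up for illustration and do
not appear in osm_torus.c.  Putting the QoS level in VL bit 2 is an
assumption; the text only states that VL bit 0 tracks dateline crossings
and that VL bit 1 marks the first hop after a turn that is illegal under
DOR.

```c
#include <assert.h>

/*
 * Illustrative helper, not from osm_torus.c.
 * SL bits 0-2 record whether the path crosses the x, y, z datelines;
 * SL bit 3 carries the QoS level (0 or 1).
 */
unsigned sketch_path_sl(const int crosses_dateline[3], unsigned qos)
{
	unsigned sl = 0;
	int d;

	assert(qos <= 1);
	for (d = 0; d < 3; d++)
		sl |= (crosses_dateline[d] ? 1u : 0u) << d;
	return sl | (qos << 3);
}

/*
 * Illustrative helper, not from osm_torus.c.
 * out_dir is the coordinate direction (0, 1 or 2) the output port
 * "points" in; illegal_turn is nonzero for y-x, z-x and z-y turns.
 */
unsigned sketch_sl2vl(unsigned sl, int out_dir, int illegal_turn)
{
	unsigned vl;

	assert(sl < 16 && out_dir >= 0 && out_dir <= 2);
	vl = 0x1 & (sl >> out_dir);	/* VL bit 0: dateline crossed */
	if (illegal_turn)
		vl |= 0x2;		/* VL bit 1: hop after illegal turn */
	vl |= ((sl >> 3) & 0x1) << 2;	/* VL bit 2: QoS level (assumed) */
	return vl;
}
```

For example, a path that crosses the x and z datelines at QoS level 1
gets SL 0xd; on an x-pointing output port that SL would map to VL 5
under the assumed QoS bit placement.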

diff --git a/opensm/doc/current-routing.txt b/opensm/doc/current-routing.txt
index 1302860..78a2e01 100644
--- a/opensm/doc/current-routing.txt
+++ b/opensm/doc/current-routing.txt
@@ -1,7 +1,7 @@
 Current OpenSM Routing
-7/9/07
+10/9/09
 
-OpenSM offers five routing engines:
+OpenSM offers six routing engines:
 
 1.  Min Hop Algorithm - based on the minimum hops to each node where the
 path length is optimized.
@@ -28,6 +28,13 @@ two switches.  This provides deadlock free routes for hypercubes when
 the fabric is cabled as a hypercube and for meshes when cabled as a
 mesh (see details below).
 
+6. Torus-2QoS unicast routing algorithm - a DOR-based routing algorithm
+specialized for 2D/3D torus topologies.  Torus-2QoS provides deadlock-free
+routing while supporting two quality of service (QoS) levels.  In addition
+it is able to route around multiple failed fabric links or a single failed
+fabric switch without introducing deadlocks, and without changing path SL
+values granted before the failure.
+
 OpenSM provides an optional unicast routing cache (enabled by -A or
 --ucast_cache options). When enabled, unicast routing cache prevents
 routing recalculation (which is a heavy task in a large cluster) when
@@ -388,3 +395,261 @@ ports, one port on one end of the cable, and the other port on the
 other end, continuing along the mesh dimension.
 
 Use '-R dor' option to activate the DOR algorithm.
+
+Torus-2QoS Routing Algorithm
+----------------------------
+
+Torus-2QoS is a routing algorithm designed for large-scale 2D/3D torus fabrics.
+The torus-2QoS routing engine can provide the following functionality on
+a 2D/3D torus:
+- routing that is free of credit loops
+- two levels of QoS, assuming switches support 8 data VLs
+- ability to route around a single failed switch, and/or multiple failed
+    links, without
+    - introducing credit loops
+    - changing path SL values
+- very short run times, with good scaling properties as fabric size
+    increases
+
+Torus-2QoS is a DOR-based algorithm that uses a dateline for each torus
+dimension to avoid the deadlocks that would otherwise occur in a torus.
+It encodes into a path SL which datelines the path crosses as follows:
+
+  sl = 0;
+  for (d = 0; d < torus_dimensions; d++)
+    /* path_crosses_dateline(d) returns 0 or 1 */
+    sl |= path_crosses_dateline(d) << d;
+
+For a 3D torus, that leaves one SL bit free, which torus-2QoS uses to
+implement two QoS levels.
+
+This is possible because torus-2QoS also makes use of the output port
+dependence of the switch SL2VL maps.  It computes in which torus coordinate
+direction each interswitch link "points", and writes SL2VL maps for such
+ports as follows:
+
+  for (sl = 0; sl < 16; sl ++)
+    /* cdir(port) reports which torus coordinate direction a switch port
+     * "points" in, and returns 0, 1, or 2 */
+    sl2vl(iport,oport,sl) = 0x1 & (sl >> cdir(oport));
+
+Thus torus-2QoS consumes 8 SL values (SL bits 0-2) and 2 VL values (VL bit 0)
+per QoS level to provide deadlock-free routing on a 3D torus.
+
+Torus-2QoS routes around link failure by "taking the long way around" any
+1D ring interrupted by a link failure.  For example, consider the 2D 6x5
+torus below, where switches are denoted by [+a-zA-Z]:
+
+        |    |    |    |    |    |
+   4  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+   3  --+----+----+----D----+----+--
+        |    |    |    |    |    |
+   2  --+----+----I----r----+----+--
+        |    |    |    |    |    |
+   1  --m----S----n----T----o----p--
+        |    |    |    |    |    |
+ y=0  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+
+      x=0    1    2    3    4    5
+
+For a pristine fabric the path from S to D would be S-n-T-r-D.  In the
+event that either link S-n or n-T has failed, torus-2QoS would use the path
+S-m-p-o-T-r-D.  Note that it can do this without changing the path SL
+value; once the 1D ring m-S-n-T-o-p-m has been broken by failure, path
+segments using it cannot contribute to deadlock, and the x-direction
+dateline (between, say, x=5 and x=0) can be ignored for path segments on
+that ring.
+
+One result of this is that torus-2QoS can route around many simultaneous
+link failures, as long as no 1D ring is broken into disjoint regions.  For
+example, if links n-T and T-o have both failed, that ring has been broken
+into two disjoint regions, T and o-p-m-S-n.  Torus-2QoS checks for such
+issues, reports if they are found, and refuses to route such fabrics.
+
+Handling a failed switch under DOR requires introducing into a path at
+least one turn that would be otherwise "illegal", i.e. not allowed by DOR
+rules.  Torus-2QoS will introduce such a turn as close as possible to the
+failed switch in order to route around it.
+
+In the above example, suppose switch T has failed, and consider the path
+from S to D.  Torus-2QoS will produce the path S-n-I-r-D, rather than the
+S-n-T-r-D path for a pristine torus, by introducing an early turn at n.
+For traffic arriving at switch I from n, normal DOR rules will generate an
+illegal turn in the path from S to D at I, and a legal turn at r.
+
+Torus-2QoS will also use the input port dependence of SL2VL maps to set VL
+bit 1 (which would be otherwise unused) for y-x, z-x, and z-y turns, i.e.,
+those turns that are illegal under DOR.  This causes the first hop after
+any such turn to use a separate set of VL values, and prevents deadlock in
+the presence of a single failed switch.
+
+For any given path, only the hops after a turn that is illegal under DOR
+can contribute to a credit loop that leads to deadlock.  So in the example
+above with failed switch T, the location of the illegal turn at I in the
+path from S to D requires that any credit loop caused by that turn must
+encircle the failed switch at T.  Thus the second and later hops after the
+illegal turn at I (i.e., hop r-D) cannot contribute to a credit loop
+because they cannot be used to construct a loop encircling T.  The hop I-r
+uses a separate VL, so it cannot contribute to a credit loop encircling T.
+
+Extending this argument shows that in addition to being capable of routing
+around a single switch failure without introducing deadlock, torus-2QoS can
+also route around multiple failed switches on the condition they are
+adjacent in the last dimension routed by DOR.  For example, consider the
+following case on a 6x6 2D torus:
+
+
+        |    |    |    |    |    |
+   5  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+   4  --+----+----+----D----+----+--
+        |    |    |    |    |    |
+   3  --+----+----I----u----+----+--
+        |    |    |    |    |    |
+   2  --+----+----q----R----+----+--
+        |    |    |    |    |    |
+   1  --m----S----n----T----o----p--
+        |    |    |    |    |    |
+ y=0  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+
+      x=0    1    2    3    4    5
+
+
+Suppose switches T and R have failed, and consider the path from S to D.
+Torus-2QoS will generate the path S-n-q-I-u-D, with an illegal turn at
+switch I, and with hop I-u using a VL with bit 1 set.
+
+As a further example, consider a case that torus-2QoS cannot route without
+deadlock: two failed switches adjacent in a dimension that is not the last
+dimension routed by DOR; here the failed switches are O and T:
+
+        |    |    |    |    |    |
+   5  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+   4  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+   3  --+----+----+----+----D----+--
+        |    |    |    |    |    |
+   2  --+----+----I----q----r----+--
+        |    |    |    |    |    |
+   1  --m----S----n----O----T----p--
+        |    |    |    |    |    |
+ y=0  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+
+      x=0    1    2    3    4    5
+
+In a pristine fabric, torus-2QoS would generate the path from S to D as
+S-n-O-T-r-D.  With failed switches O and T, torus-2QoS will generate the
+path S-n-I-q-r-D, with illegal turn at switch I, and with hop I-q using a
+VL with bit 1 set.  In contrast to the earlier examples, the second hop
+after the illegal turn, q-r, can be used to construct a credit loop
+encircling the failed switches.
+
+Since torus-2QoS uses all four available SL bits, and the three data VL
+bits that are typically available in current switches, there is no way
+to use SL/VL values to separate multicast traffic from unicast traffic.
+Thus, torus-2QoS must generate multicast routing such that credit loops
+cannot arise from a combination of multicast and unicast path segments.
+
+It turns out that it is possible to construct spanning trees for multicast
+routing that have that property.  For the 2D 6x5 torus example above, here
+is the full-fabric spanning tree that torus-2QoS will construct, where "x"
+is the root switch and each "+" is a non-root switch:
+
+   4    +    +    +    +    +    +
+        |    |    |    |    |    |
+   3    +    +    +    +    +    +
+        |    |    |    |    |    |
+   2    +----+----+----x----+----+
+        |    |    |    |    |    |
+   1    +    +    +    +    +    +
+        |    |    |    |    |    |
+ y=0    +    +    +    +    +    +
+
+      x=0    1    2    3    4    5
+
+For multicast traffic routed from root to tip, every turn in the above
+spanning tree is a legal DOR turn.
+
+For traffic routed from tip to root, and some traffic routed through the
+root, turns are not legal DOR turns.  However, to construct a credit loop,
+the union of multicast routing on this spanning tree with DOR unicast
+routing can only provide 3 of the 4 turns needed for the loop.
+
+In addition, if none of the above spanning tree branches crosses a dateline
+used for unicast credit loop avoidance on a torus, and if multicast traffic
+is confined to SL 0 or SL 8 (recall that torus-2QoS uses SL bit 3 to
+differentiate QoS level), then multicast traffic also cannot contribute to
+the "ring" credit loops that are otherwise possible in a torus.
+
+Torus-2QoS uses these ideas to create a master spanning tree.  Every
+multicast group spanning tree will be constructed as a subset of the master
+tree, with the same root as the master tree.
+
+Such multicast group spanning trees will in general not be optimal for
+groups which are a subset of the full fabric. However, this compromise must
+be made to enable support for two QoS levels on a torus while preventing
+credit loops.
+
+In the presence of link or switch failures that result in a fabric for
+which torus-2QoS can generate credit-loop-free unicast routes, it is also
+possible to generate a master spanning tree for multicast that retains the
+required properties.  For example, consider that same 2D 6x5 torus, with
+the link from (2,2) to (3,2) failed.  Torus-2QoS will generate the following
+master spanning tree:
+
+   4    +    +    +    +    +    +
+        |    |    |    |    |    |
+   3    +    +    +    +    +    +
+        |    |    |    |    |    |
+   2  --+----+----+    x----+----+--
+        |    |    |    |    |    |
+   1    +    +    +    +    +    +
+        |    |    |    |    |    |
+ y=0    +    +    +    +    +    +
+
+      x=0    1    2    3    4    5
+
+Two things are notable about this master spanning tree.  First, assuming
+the x dateline was between x=5 and x=0, this spanning tree has a branch
+that crosses the dateline.  However, just as for unicast, crossing a
+dateline on a 1D ring (here, the ring for y=2) that is broken by a failure
+cannot contribute to a torus credit loop.
+
+Second, this spanning tree is no longer optimal even for multicast groups
+that encompass the entire fabric.  That, unfortunately, is a compromise that
+must be made to retain the other desirable properties of torus-2QoS routing.
+
+In the event that a single switch fails, torus-2QoS will generate a master
+spanning tree that has no "extra" turns by appropriately selecting a root
+switch.  In the 2D 6x5 torus example, assume now that the switch at (3,2),
+i.e. the root for a pristine fabric, fails.  Torus-2QoS will generate the
+following master spanning tree for that case:
+
+                       |
+   4    +    +    +    +    +    +
+        |    |    |    |    |    |
+   3    +    +    +    +    +    +
+        |    |    |         |    |
+   2    +    +    +         +    +
+        |    |    |         |    |
+   1    +----+----x----+----+----+
+        |    |    |    |    |    |
+ y=0    +    +    +    +    +    +
+                       |
+
+      x=0    1    2    3    4    5
+
+Assuming the y dateline was between y=4 and y=0, this spanning tree has
+a branch that crosses a dateline.  However, again this cannot contribute
+to credit loops as it occurs on a 1D ring (the ring for x=3) that is
+broken by a failure, as in the above example.
+
+Due to the use made by torus-2QoS of SLs and VLs, QoS configuration should
+only employ SL values 0 and 8, for both multicast and unicast.  Also,
+SL to VL map configuration must be under the complete control of torus-2QoS,
+so any user-supplied configuration must and will be ignored.
diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in
index 9053611..47dff99 100644
--- a/opensm/man/opensm.8.in
+++ b/opensm/man/opensm.8.in
@@ -649,7 +649,7 @@ compiling opensm with -DROUTER_EXP which has been obsoleted.
 
 .SH ROUTING
 .PP
-OpenSM now offers five routing engines:
+OpenSM now offers six routing engines:
 
 1.  Min Hop Algorithm - based on the minimum hops to each node where the
 path length is optimized.
@@ -678,6 +678,13 @@ two switches.  This provides deadlock free routes for hypercubes when
 the fabric is cabled as a hypercube and for meshes when cabled as a
 mesh (see details below).
 
+6. Torus-2QoS unicast routing algorithm - a DOR-based routing algorithm
+specialized for 2D/3D torus topologies.  Torus-2QoS provides deadlock-free
+routing while supporting two quality of service (QoS) levels.  In addition
+it is able to route around multiple failed fabric links or a single failed
+fabric switch without introducing deadlocks, and without changing path SL
+values granted before the failure.
+
 OpenSM also supports a file method which
 can load routes from a table. See \'Modular Routing Engine\' for more
 information on this.
-- 
1.6.2.2



* [PATCH v3 11/17] opensm: Enable torus-2QoS routing engine.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (8 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 10/17] opensm: Update documentation to describe torus-2QoS Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 12/17] opensm: Add opensm option to specify file name for extra torus-2QoS configuration information Jim Schutt
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_opensm.h |    1 +
 opensm/opensm/main.c               |    2 +-
 opensm/opensm/osm_opensm.c         |    6 ++++++
 3 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index fddcf53..8d63111 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -105,6 +105,7 @@ typedef enum _osm_routing_engine_type {
 	OSM_ROUTING_ENGINE_TYPE_FTREE,
 	OSM_ROUTING_ENGINE_TYPE_LASH,
 	OSM_ROUTING_ENGINE_TYPE_DOR,
+	OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS,
 	OSM_ROUTING_ENGINE_TYPE_UNKNOWN
 } osm_routing_engine_type_t;
 /***********/
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 0093aa7..abc3282 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -174,7 +174,7 @@ static void show_usage(void)
 	       "          Min Hop algorithm.  Multiple routing engines can be specified\n"
 	       "          separated by commas so that specific ordering of routing\n"
 	       "          algorithms will be tried if earlier routing engines fail.\n"
-	       "          Supported engines: updn, file, ftree, lash, dor\n\n");
+	       "          Supported engines: updn, file, ftree, lash, dor, torus-2QoS\n\n");
 	printf("--do_mesh_analysis\n"
 	       "          This option enables additional analysis for the lash\n"
 	       "          routing engine to precondition switch port assignments\n"
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index 5614240..8b03947 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -70,6 +70,7 @@ extern int osm_ucast_file_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_ftree_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_lash_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_dor_setup(struct osm_routing_engine *, osm_opensm_t *);
+extern int osm_ucast_torus2QoS_setup(struct osm_routing_engine *, osm_opensm_t *);
 
 const static struct routing_engine_module routing_modules[] = {
 	{"minhop", osm_ucast_minhop_setup},
@@ -78,6 +79,7 @@ const static struct routing_engine_module routing_modules[] = {
 	{"ftree", osm_ucast_ftree_setup},
 	{"lash", osm_ucast_lash_setup},
 	{"dor", osm_ucast_dor_setup},
+	{"torus-2QoS", osm_ucast_torus2QoS_setup},
 	{NULL, NULL}
 };
 
@@ -98,6 +100,8 @@ const char *osm_routing_engine_type_str(IN osm_routing_engine_type_t type)
 		return "lash";
 	case OSM_ROUTING_ENGINE_TYPE_DOR:
 		return "dor";
+	case OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS:
+		return "torus-2QoS";
 	default:
 		break;
 	}
@@ -124,6 +128,8 @@ osm_routing_engine_type_t osm_routing_engine_type(IN const char *str)
 		return OSM_ROUTING_ENGINE_TYPE_LASH;
 	else if (!strcasecmp(str, "dor"))
 		return OSM_ROUTING_ENGINE_TYPE_DOR;
+	else if (!strcasecmp(str, "torus-2QoS"))
+		return OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS;
 	else
 		return OSM_ROUTING_ENGINE_TYPE_UNKNOWN;
 }
-- 
1.6.2.2



* [PATCH v3 12/17] opensm: Add opensm option to specify file name for extra torus-2QoS configuration information.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (9 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 11/17] opensm: Enable torus-2QoS routing engine Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 13/17] opensm: Do not require -Q option for torus-2QoS routing engine Jim Schutt
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_base.h   |   18 ++++++++++++++++++
 opensm/include/opensm/osm_subnet.h |    5 +++++
 opensm/opensm/main.c               |    9 +++++++++
 opensm/opensm/osm_subnet.c         |    1 +
 opensm/opensm/osm_torus.c          |    2 +-
 5 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h
index e0d6c66..fa4c78d 100644
--- a/opensm/include/opensm/osm_base.h
+++ b/opensm/include/opensm/osm_base.h
@@ -271,6 +271,24 @@ BEGIN_C_DECLS
 #endif
 /***********/
 
+/****d* OpenSM: Base/OSM_DEFAULT_TORUS_CONF_FILE
+* NAME
+*	OSM_DEFAULT_TORUS_CONF_FILE
+*
+* DESCRIPTION
+*	Specifies the default file name for extra torus-2QoS configuration
+*
+* SYNOPSIS
+*/
+#ifdef __WIN__
+#define OSM_DEFAULT_TORUS_CONF_FILE strcat(GetOsmCachePath(), "osm-torus-2QoS.conf")
+#elif defined(OPENSM_CONFIG_DIR)
+#define OSM_DEFAULT_TORUS_CONF_FILE OPENSM_CONFIG_DIR "/torus-2QoS.conf"
+#else
+#define OSM_DEFAULT_TORUS_CONF_FILE "/etc/opensm/torus-2QoS.conf"
+#endif /* __WIN__ */
+/***********/
+
 /****d* OpenSM: Base/OSM_DEFAULT_PREFIX_ROUTES_FILE
 * NAME
 *	OSM_DEFAULT_PREFIX_ROUTES_FILE
diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index 4fa0161..fa3e46e 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -204,6 +204,7 @@ typedef struct osm_subn_opt {
 	char *guid_routing_order_file;
 	char *sa_db_file;
 	boolean_t sa_db_dump;
+	char *torus_conf_file;
 	boolean_t do_mesh_analysis;
 	boolean_t exit_on_fatal;
 	boolean_t honor_guid2lid_file;
@@ -431,6 +432,10 @@ typedef struct osm_subn_opt {
 *		When TRUE causes OpenSM to dump SA DB at the end of every
 *		light sweep regardless the current verbosity level.
 *
+*	torus_conf_file
+*		Name of the file with extra configuration info for torus-2QoS
+*		routing engine.
+*
 *	exit_on_fatal
 *		If TRUE (default) - SM will exit on fatal subnet initialization
 *		issues.
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index abc3282..b0bc372 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -231,6 +231,10 @@ static void show_usage(void)
 	       "          Set the order port guids will be routed for the MinHop\n"
 	       "          and Up/Down routing algorithms to the guids provided in the\n"
 	       "          given file (one to a line)\n\n");
+	printf("--torus_config <path to file>\n"
+	       "          This option defines the file name for the extra configuration\n"
+	       "          info needed for the torus-2QoS routing engine.   The default\n"
+	       "          name is \'"OSM_DEFAULT_TORUS_CONF_FILE"\'\n\n");
 	printf("--once, -o\n"
 	       "          This option causes OpenSM to configure the subnet\n"
 	       "          once, then exit.  Ports remain in the ACTIVE state.\n\n");
@@ -615,6 +619,7 @@ int main(int argc, char *argv[])
 		{"sm_sl", 1, NULL, 7},
 		{"retries", 1, NULL, 8},
 		{"log_prefix", 1, NULL, 9},
+		{"torus_config", 1, NULL, 10},
 		{NULL, 0, NULL, 0}	/* Required at the end of the array */
 	};
 
@@ -1003,6 +1008,10 @@ int main(int argc, char *argv[])
 			SET_STR_OPT(opt.log_prefix, optarg);
 			printf("Log prefix = %s\n", opt.log_prefix);
 			break;
+		case 10:
+			SET_STR_OPT(opt.torus_conf_file, optarg);
+			printf("Torus-2QoS config file = %s\n", opt.torus_conf_file);
+			break;
 		case 'h':
 		case '?':
 		case ':':
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 8224b5f..bc34a0f 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -753,6 +753,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
 	p_opt->guid_routing_order_file = NULL;
 	p_opt->sa_db_file = NULL;
 	p_opt->sa_db_dump = FALSE;
+	p_opt->torus_conf_file = strdup(OSM_DEFAULT_TORUS_CONF_FILE);
 	p_opt->do_mesh_analysis = FALSE;
 	p_opt->exit_on_fatal = TRUE;
 	p_opt->enable_quirks = FALSE;
diff --git a/opensm/opensm/osm_torus.c b/opensm/opensm/osm_torus.c
index fe643f2..871a3f5 100644
--- a/opensm/opensm/osm_torus.c
+++ b/opensm/opensm/osm_torus.c
@@ -9049,7 +9049,7 @@ int torus_build_lfts(void *context)
 	torus->osm = ctx->osm;
 	fabric->osm = ctx->osm;
 
-	if (!parse_config(OPENSM_CONFIG_DIR "/opensm-torus.conf",
+	if (!parse_config(ctx->osm->subn.opt.torus_conf_file,
 			  fabric, torus))
 		goto out;
 
-- 
1.6.2.2



* [PATCH v3 13/17] opensm: Do not require -Q option for torus-2QoS routing engine.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (10 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 12/17] opensm: Add opensm option to specify file name for extra torus-2QoS configuration information Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 14/17] opensm: Make it possible to configure no fallback " Jim Schutt
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

The torus-2QoS engine provides deadlock-free routing for a 2D/3D torus,
but requires that switch SL2VL maps be programmed.  Before this change,
"opensm -Q" was required for that to happen.

When a routing engine sets the struct osm_routing_engine:update_sl2vl
pointer, it is signalling its intent to participate in SL2VL map programming.
So, don't return early from osm_qos_setup() in that case; instead do everything
except attempt to read QoS configuration information.

For that to work properly, the default QoS config information must now always
be set up, instead of only when QoS is requested via -Q.

With that in place, the -Q option now means the same thing to torus-2QoS that
it means to other routing engines: QoS configuration is requested.

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_qos.c    |    7 +++++--
 opensm/opensm/osm_subnet.c |   18 +++++++++---------
 2 files changed, 14 insertions(+), 11 deletions(-)
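
Not part of the patch: a small sketch of the gating logic described in
the commit message, assuming simplified stand-in types.  The sketch_*
names are hypothetical; only the update_sl2vl hook, the qos option and
routing_engine_used correspond to names in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OpenSM types; only the fields used here. */
struct sketch_engine {
	void (*update_sl2vl)(void *ctx);	/* NULL if engine has no hook */
};

struct sketch_osm {
	bool qos;				/* true when -Q was given */
	const struct sketch_engine *routing_engine_used;
};

/* No-op hook standing in for an engine's real SL2VL update function. */
void sketch_noop_sl2vl(void *ctx)
{
	(void)ctx;
}

/* SL2VL setup runs if QoS was requested or the engine provides a hook. */
bool sketch_need_qos_setup(const struct sketch_osm *osm)
{
	assert(osm != NULL);
	return osm->qos ||
	       (osm->routing_engine_used &&
		osm->routing_engine_used->update_sl2vl);
}

/* The QoS policy file is read only when QoS was explicitly requested. */
bool sketch_need_policy_file(const struct sketch_osm *osm)
{
	assert(osm != NULL);
	return osm->qos;
}
```

With an engine that sets the hook and no -Q, setup proceeds but the
policy file is not read, which is the behavior the commit message asks
for.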

diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index dadef29..6d2af55 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -290,7 +290,9 @@ int osm_qos_setup(osm_opensm_t * p_osm)
 	osm_node_t *p_node;
 	int ret = 0;
 
-	if (!p_osm->subn.opt.qos)
+	if (!(p_osm->subn.opt.qos ||
+	      (p_osm->routing_engine_used &&
+	       p_osm->routing_engine_used->update_sl2vl)))
 		return 0;
 
 	OSM_LOG_ENTER(&p_osm->log);
@@ -307,7 +309,8 @@ int osm_qos_setup(osm_opensm_t * p_osm)
 	cl_plock_excl_acquire(&p_osm->lock);
 
 	/* read QoS policy config file */
-	osm_qos_parse_policy_file(&p_osm->subn);
+	if (p_osm->subn.opt.qos)
+		osm_qos_parse_policy_file(&p_osm->subn);
 
 	p_tbl = &p_osm->subn.port_guid_tbl;
 	p_next = cl_qmap_head(p_tbl);
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index bc34a0f..f714af7 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -1051,6 +1051,8 @@ static void subn_verify_qos_set(osm_qos_options_t *set, const char *prefix,
 
 int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
 {
+	osm_qos_options_t dflt;
+
 	if (p_opts->lmc > 7) {
 		log_report(" Invalid Cached Option Value:lmc = %u:"
 			   "Using Default:%u\n", p_opts->lmc, OSM_DEFAULT_LMC);
@@ -1101,17 +1103,15 @@ int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
 		p_opts->console = OSM_DEFAULT_CONSOLE;
 	}
 
-	if (p_opts->qos) {
-		osm_qos_options_t dflt;
-
-		/* the default options in qos_options must be correct.
-		 * every other one need not be, b/c those will default
-		 * back to whatever is in qos_options.
-		 */
 
-		subn_set_default_qos_options(&dflt);
+	/* the default options in qos_options must be correct.
+	 * every other one need not be, b/c those will default
+	 * back to whatever is in qos_options.
+	 */
+	subn_set_default_qos_options(&dflt);
+	subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt);
 
-		subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt);
+	if (p_opts->qos) {
 		subn_verify_qos_set(&p_opts->qos_ca_options, "qos_ca",
 				    &p_opts->qos_options);
 		subn_verify_qos_set(&p_opts->qos_sw0_options, "qos_sw0",
-- 
1.6.2.2



* [PATCH v3 14/17] opensm: Make it possible to configure no fallback routing engine.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (11 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 13/17] opensm: Do not require -Q option for torus-2QoS routing engine Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 15/17] opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of osm_port_t:priv Jim Schutt
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

For a fabric that requires a routing engine with special properties,
say one that avoids credit loops by making use of SLs in routing, it might
be preferable not to fall back to minhop if the configured routing engine
fails.

E.g. the torus-2QoS routing engine uses both SL2VL maps and path SL values
to provide routing free of credit loops, but cannot route fabrics for
some patterns of failed switches.  Should a switch failure create such
a pattern, it may be preferable to keep the previous routing information
loaded in the switches until a replacement switch restores
torus-2QoS's ability to route the fabric.

The alternative, having some other engine route the fabric, will immediately
introduce credit loops.
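
As a rough illustration of the behavior described above (a sketch only, not
the opensm code), the choice between falling back and keeping the old tables
might look like this.  The names struct engine, choose_engine(), route_ok()
and route_fail() are hypothetical stand-ins; only the no-fallback idea comes
from this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a routing engine; route() returns 0 on success. */
struct engine {
	const char *name;
	int (*route)(void);
	struct engine *next;
};

/* Demo engines: one that always succeeds, one that always fails. */
static int route_ok(void) { return 0; }
static int route_fail(void) { return -1; }

/*
 * Try each configured engine in order.  If all of them fail, use the
 * default engine only when fallback is allowed.  A NULL result means
 * nothing routed the fabric, so the caller leaves the previously
 * programmed tables alone.
 */
static const struct engine *choose_engine(const struct engine *configured,
					  const struct engine *dflt,
					  bool no_fallback)
{
	const struct engine *e;

	for (e = configured; e; e = e->next)
		if (e->route() == 0)
			return e;

	if (!no_fallback && dflt && dflt->route() == 0)
		return dflt;

	return NULL;
}
```

With a failing engine configured and no_fallback set, choose_engine()
returns NULL rather than the default engine, which is the case where the
old routing should stay loaded in the switches.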

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_subnet.h |    1 +
 opensm/opensm/osm_opensm.c         |    5 +++++
 opensm/opensm/osm_qos.c            |    6 ++++++
 opensm/opensm/osm_ucast_mgr.c      |   23 +++++++++++++++--------
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index fa3e46e..42ae416 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -219,6 +219,7 @@ typedef struct osm_subn_opt {
 	osm_qos_options_t qos_rtr_options;
 	boolean_t enable_quirks;
 	boolean_t no_clients_rereg;
+	boolean_t no_fallback_routing_engine;
 #ifdef ENABLE_OSM_PERF_MGR
 	boolean_t perfmgr;
 	boolean_t perfmgr_redir;
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index 8b03947..e296812 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -159,6 +159,11 @@ static struct osm_routing_engine *setup_routing_engine(osm_opensm_t *osm,
 	struct osm_routing_engine *re;
 	const struct routing_engine_module *m;
 
+	if (!strcmp(name, "no_fallback")) {
+		osm->subn.opt.no_fallback_routing_engine = TRUE;
+		return NULL;
+	}
+
 	for (m = routing_modules; m->name && *m->name; m++) {
 		if (!strcmp(m->name, name)) {
 			re = malloc(sizeof(struct osm_routing_engine));
diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index 6d2af55..dc6a8ff 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -211,6 +211,12 @@ static int qos_extports_setup(osm_sm_t * sm, osm_node_t *node,
 	int ret = 0;
 	unsigned i, j;
 
+	/*
+	 * Do nothing unless the most recent routing attempt was successful.
+	 */
+	if (!re)
+		return ret;
+
 	for (i = 1; i < num_ports; i++) {
 		p = osm_node_get_physp_ptr(node, i);
 		force_update = p->need_update || sm->p_subn->need_update;
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index 10629cb..d1c485f 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -1091,7 +1091,8 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 		p_routing_eng = p_routing_eng->next;
 	}
 
-	if (!p_osm->routing_engine_used) {
+	if (!p_osm->routing_engine_used &&
+	    p_osm->subn.opt.no_fallback_routing_engine != TRUE) {
 		/* If configured routing algorithm failed, use default MinHop */
 		struct osm_routing_engine *r = p_osm->default_routing_engine;
 
@@ -1101,14 +1102,20 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 		osm_ucast_mgr_set_fwd_tables(p_mgr);
 	}
 
-	OSM_LOG(p_mgr->p_log, OSM_LOG_INFO,
-		"%s tables configured on all switches\n",
-		osm_routing_engine_type_str(p_osm->
-					    routing_engine_used->type));
-
-	if (p_mgr->p_subn->opt.use_ucast_cache)
-		p_mgr->cache_valid = TRUE;
+	if (p_osm->routing_engine_used) {
+		OSM_LOG(p_mgr->p_log, OSM_LOG_INFO,
+			"%s tables configured on all switches\n",
+			osm_routing_engine_type_str(p_osm->
+						    routing_engine_used->type));
 
+		if (p_mgr->p_subn->opt.use_ucast_cache)
+			p_mgr->cache_valid = TRUE;
+	} else {
+		p_mgr->p_subn->subnet_initialization_error = TRUE;
+		OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR,
+			"No routing engine able to successfully configure "
+			" switch tables on current fabric\n");
+	}
 Exit:
 	CL_PLOCK_RELEASE(p_mgr->p_lock);
 	OSM_LOG_EXIT(p_mgr->p_log);
-- 
1.6.2.2



* [PATCH v3 15/17] opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of osm_port_t:priv.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (12 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 14/17] opensm: Make it possible to configure no fallback " Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 16/17] opensm: Avoid havoc in dump_ucast_routes() " Jim Schutt
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

Torus-2QoS makes persistent use of osm_port_t:priv to speed calculation
of path SL values.

It cannot clear osm_port_t:priv members when it tears down its persistent
data for the following reason: If a port is removed from the fabric, the
opensm core will delete the corresponding osm_port_t object, leaving
torus-2QoS holding a dangling reference.  Torus-2QoS then has a use-after-free
error when tearing down its persistent data if it tries to use its dangling
osm_port_t reference to clear the priv member.

When torus-2QoS is unable to route a fabric due to missing switches and
opensm is configured to fall back to minhop, havoc will ensue because
minhop uses a non-NULL osm_port_t:priv as a proxy for LMC > 0: it
assumes if osm_port_t:priv is non-NULL it can only be because
alloc_ports_priv() has been called.

Fix this up by always calling alloc_ports_priv(), and have it set
priv = NULL if LMC == 0.
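
A minimal sketch of the idea, assuming simplified stand-in types (struct port,
struct lmc_track, alloc_port_priv() and free_port_priv() are hypothetical and
only mirror the shape of the real code): when LMC is zero the allocator
overwrites whatever another engine left in priv, so a later non-NULL test can
be trusted again.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for osm_port_t: just the LMC and the priv pointer. */
struct port {
	unsigned lmc;
	void *priv;
};

/* Hypothetical per-port tracking data used only when LMC > 0. */
struct lmc_track {
	unsigned num_lids;
};

/* Returns 0 on success, -1 if allocation fails. */
static int alloc_port_priv(struct port *p)
{
	struct lmc_track *t;

	if (p->lmc == 0) {
		/* Drop any stale pointer left behind by another engine. */
		p->priv = NULL;
		return 0;
	}

	t = malloc(sizeof(*t));
	if (!t)
		return -1;
	t->num_lids = 1u << p->lmc;
	p->priv = t;
	return 0;
}

/* Only valid after alloc_port_priv(): priv is then NULL or ours to free. */
static void free_port_priv(struct port *p)
{
	free(p->priv);
	p->priv = NULL;
}
```

The sketch does not free the stale pointer, since the data it refers to
belongs to whoever stored it there.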

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_mgr.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index d1c485f..e6e40f0 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -315,8 +315,10 @@ static void alloc_ports_priv(osm_ucast_mgr_t * mgr)
 	     item = cl_qmap_next(item)) {
 		port = (osm_port_t *) item;
 		lmc = ib_port_info_get_lmc(&port->p_physp->port_info);
-		if (!lmc)
+		if (!lmc) {
+			port->priv = NULL;
 			continue;
+		}
 		r = malloc(sizeof(*r) + sizeof(r->guids[0]) * (1 << lmc));
 		if (!r) {
 			OSM_LOG(mgr->p_log, OSM_LOG_ERROR, "ERR 3A09: "
@@ -363,8 +365,7 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
 	/* Initialize LIDs in buffer to invalid port number. */
 	memset(p_sw->new_lft, OSM_NO_PATH, p_sw->max_lid_ho + 1);
 
-	if (p_mgr->p_subn->opt.lmc)
-		alloc_ports_priv(p_mgr);
+	alloc_ports_priv(p_mgr);
 
 	/*
 	   Iterate through every port setting LID routes for each
@@ -381,8 +382,7 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
 		}
 	}
 
-	if (p_mgr->p_subn->opt.lmc)
-		free_ports_priv(p_mgr);
+	free_ports_priv(p_mgr);
 
 	OSM_LOG_EXIT(p_mgr->p_log);
 }
-- 
1.6.2.2



* [PATCH v3 16/17] opensm: Avoid havoc in dump_ucast_routes() caused by torus-2QoS persistent use of osm_port_t:priv.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (13 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 15/17] opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of osm_port_t:priv Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-15 19:53   ` [PATCH v3 17/17] opensm: Cause status of unicast routing attempt to propagate to callers of osm_ucast_mgr_process() Jim Schutt
  2010-06-16 14:11   ` [PATCH v3 08/17] opensm: Add new torus routing engine: torus-2QoS, part 2 Jim Schutt
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

Torus-2QoS makes persistent use of osm_port_t:priv to speed calculation
of path SL values.

However, osm_switch_recommend_path() uses a non-NULL osm_port_t:priv
as a flag that osm_port_t:priv holds a tracking array used when
LMC > 0.  It turns out that 1) dump_ucast_routes() does not need
osm_switch_recommend_path() to consider alternate routes, and 2)
before the addition of torus-2QoS, osm_port_t:priv use never
persisted past the unicast routing function, so it was always
NULL on entry to dump_ucast_routes().

Fix this up by making the routing_for_lmc flag explicitly set by
the caller of osm_switch_recommend_path(), rather than inferring
it from osm_port_t:priv.  This retains existing behavior for
existing routing engines, and allows torus-2QoS to make persistent
use of osm_port_t:priv.

The alternative would be to add another member to osm_port_t,
say osm_port_t:priv2.
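
To make the difference concrete, here is a small sketch with hypothetical
names (struct port, lmc_tracking_inferred(), lmc_tracking_explicit()); it is
not the opensm implementation.  The first helper infers the mode from the
pointer, the second takes it from the caller.  The extra NULL check in the
second helper is a guard added for this sketch and is not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for osm_port_t: only the priv pointer matters here. */
struct port {
	void *priv;
};

/* Old behavior: any non-NULL priv is taken to mean LMC tracking data. */
static bool lmc_tracking_inferred(const struct port *p)
{
	return p->priv != NULL;
}

/*
 * New behavior: the caller says whether it is routing for LMC > 0.
 * The NULL check is an extra guard in this sketch, not in the patch.
 */
static bool lmc_tracking_explicit(const struct port *p, bool routing_for_lmc)
{
	return routing_for_lmc && p->priv != NULL;
}
```

With priv holding some other engine's data, the inferred variant reports
LMC tracking while the explicit variant, called with FALSE as
dump_ucast_routes() would, does not.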

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_switch.h |   12 ++++++++++++
 opensm/opensm/osm_dump.c           |    2 +-
 opensm/opensm/osm_switch.c         |    7 ++++---
 opensm/opensm/osm_ucast_mgr.c      |    1 +
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h
index 51a8427..f407dd9 100644
--- a/opensm/include/opensm/osm_switch.h
+++ b/opensm/include/opensm/osm_switch.h
@@ -918,6 +918,7 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
 				  IN osm_port_t * p_port, IN uint16_t lid_ho,
 				  IN unsigned start_from,
 				  IN boolean_t ignore_existing,
+				  IN boolean_t routing_for_lmc,
 				  IN boolean_t dor);
 /*
 * PARAMETERS
@@ -940,6 +941,17 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
 *		If false, the switch will choose an existing route if one
 *		exists, otherwise will choose the optimal route.
 *
+*	routing_for_lmc
+*		[in] We support an enhanced LMC aware routing mode:
+*		In the case of LMC > 0, we can track the remote side
+*		system and node for all of the lids of the target
+*		and try and avoid routing again through the same
+*		system / node.
+*
+*		Assume if routing_for_lmc is TRUE that this procedure
+*		was provided with the tracking array and counter via
+*		p_port->priv, and we can conduct this algorithm.
+*
 *	dor
 *		[in] If TRUE, Dimension Order Routing will be done.
 *
diff --git a/opensm/opensm/osm_dump.c b/opensm/opensm/osm_dump.c
index bfff1a0..535a03f 100644
--- a/opensm/opensm/osm_dump.c
+++ b/opensm/opensm/osm_dump.c
@@ -221,7 +221,7 @@ static void dump_ucast_routes(cl_map_item_t * item, FILE * file, void *cxt)
 			/* No LMC Optimization */
 			best_port = osm_switch_recommend_path(p_sw, p_port,
 							      lid_ho, 1, TRUE,
-							      dor);
+							      FALSE, dor);
 			fprintf(file, "No %u hop path possible via port %u!",
 				best_hops, best_port);
 		}
diff --git a/opensm/opensm/osm_switch.c b/opensm/opensm/osm_switch.c
index b621852..9785a9d 100644
--- a/opensm/opensm/osm_switch.c
+++ b/opensm/opensm/osm_switch.c
@@ -216,6 +216,7 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
 				  IN osm_port_t * p_port, IN uint16_t lid_ho,
 				  IN unsigned start_from,
 				  IN boolean_t ignore_existing,
+				  IN boolean_t routing_for_lmc,
 				  IN boolean_t dor)
 {
 	/*
@@ -225,10 +226,10 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
 	   and try and avoid routing again through the same
 	   system / node.
 
-	   If this procedure is provided with the tracking array
-	   and counter we can conduct this algorithm.
+	   Assume if routing_for_lmc is true that this procedure was
+	   provided the tracking array and counter via p_port->priv,
+	   and we can conduct this algorithm.
 	 */
-	boolean_t routing_for_lmc = (p_port->priv != NULL);
 	uint16_t base_lid;
 	uint8_t hops;
 	uint8_t least_hops;
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index e6e40f0..f5a715f 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -252,6 +252,7 @@ static void ucast_mgr_process_port(IN osm_ucast_mgr_t * p_mgr,
 	 */
 	port = osm_switch_recommend_path(p_sw, p_port, lid_ho, start_from,
 					 p_mgr->p_subn->ignore_existing_lfts,
+					 p_mgr->p_subn->opt.lmc,
 					 p_mgr->is_dor);
 
 	if (port == OSM_NO_PATH) {
-- 
1.6.2.2



* [PATCH v3 17/17] opensm: Cause status of unicast routing attempt to propagate to callers of osm_ucast_mgr_process().
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (14 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 16/17] opensm: Avoid havoc in dump_ucast_routes() " Jim Schutt
@ 2010-06-15 19:53   ` Jim Schutt
  2010-06-16 14:11   ` [PATCH v3 08/17] opensm: Add new torus routing engine: torus-2QoS, part 2 Jim Schutt
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-15 19:53 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, Jim Schutt

If unicast routing fails, there is no point in continuing with fabric bring-up.
Just restart a new heavy sweep instead.
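
The control flow can be sketched roughly as below, under the assumption of
much simplified stand-ins (struct sweep_state, route_unicast() and
heavy_sweep() are hypothetical names, not opensm functions): the routing
step reports its status, and the sweep stops before the later bring-up
stages when that status is nonzero.

```c
#include <stdbool.h>

/* Hypothetical record of which bring-up stages ran during a sweep. */
struct sweep_state {
	bool routed;
	bool qos_configured;
};

/* Hypothetical routing step: returns 0 on success, nonzero on failure. */
static int route_unicast(bool fabric_routable)
{
	return fabric_routable ? 0 : -1;
}

/* Returns 0 if the sweep completed, nonzero if it stopped early. */
static int heavy_sweep(struct sweep_state *s, bool fabric_routable)
{
	s->routed = false;
	s->qos_configured = false;

	if (route_unicast(fabric_routable))
		return -1;	/* give up; wait for a trap or the next sweep */

	s->routed = true;
	s->qos_configured = true;	/* later stages run only after routing */
	return 0;
}
```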

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_state_mgr.c |   12 +++++++++---
 opensm/opensm/osm_ucast_mgr.c |   14 +++++++++-----
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 762bb27..422f3a2 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1140,7 +1140,11 @@ static void do_sweep(osm_sm_t * sm)
 		/* Re-program the switches fully */
 		sm->p_subn->ignore_existing_lfts = TRUE;
 
-		osm_ucast_mgr_process(&sm->ucast_mgr);
+		if (osm_ucast_mgr_process(&sm->ucast_mgr)) {
+			OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
+					"REROUTE FAILED");
+			return;
+		}
 		osm_qos_setup(sm->p_subn->p_osm);
 
 		/* Reset flag */
@@ -1299,12 +1303,14 @@ repeat_discovery:
 			"LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG");
 
 	/*
-	 * Proceed with unicast forwarding table configuration.
+	 * Proceed with unicast forwarding table configuration; if it fails
+	 * return early to wait for a trap or the next sweep interval.
 	 */
 
 	if (!sm->ucast_mgr.cache_valid ||
 	    osm_ucast_cache_process(&sm->ucast_mgr))
-		osm_ucast_mgr_process(&sm->ucast_mgr);
+		if (osm_ucast_mgr_process(&sm->ucast_mgr))
+			return;
 
 	osm_qos_setup(sm->p_subn->p_osm);
 
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index f5a715f..85495eb 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -1069,6 +1069,7 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 	osm_opensm_t *p_osm;
 	struct osm_routing_engine *p_routing_eng;
 	cl_qmap_t *p_sw_guid_tbl;
+	int failed = 0;
 
 	OSM_LOG_ENTER(p_mgr->p_log);
 
@@ -1087,7 +1088,8 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 
 	p_osm->routing_engine_used = NULL;
 	while (p_routing_eng) {
-		if (!ucast_mgr_route(p_routing_eng, p_osm))
+		failed = ucast_mgr_route(p_routing_eng, p_osm);
+		if (!failed)
 			break;
 		p_routing_eng = p_routing_eng->next;
 	}
@@ -1098,9 +1100,11 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 		struct osm_routing_engine *r = p_osm->default_routing_engine;
 
 		r->build_lid_matrices(r->context);
-		r->ucast_build_fwd_tables(r->context);
-		p_osm->routing_engine_used = r;
-		osm_ucast_mgr_set_fwd_tables(p_mgr);
+		failed = r->ucast_build_fwd_tables(r->context);
+		if (!failed) {
+			p_osm->routing_engine_used = r;
+			osm_ucast_mgr_set_fwd_tables(p_mgr);
+		}
 	}
 
 	if (p_osm->routing_engine_used) {
@@ -1120,7 +1124,7 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 Exit:
 	CL_PLOCK_RELEASE(p_mgr->p_lock);
 	OSM_LOG_EXIT(p_mgr->p_log);
-	return 0;
+	return failed;
 }
 
 static int ucast_build_lid_matrices(void *context)
-- 
1.6.2.2



* [PATCH v3 08/17] opensm: Add new torus routing engine: torus-2QoS, part 2.
       [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (15 preceding siblings ...)
  2010-06-15 19:53   ` [PATCH v3 17/17] opensm: Cause status of unicast routing attempt to propagate to callers of osm_ucast_mgr_process() Jim Schutt
@ 2010-06-16 14:11   ` Jim Schutt
  16 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-06-16 14:11 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: sashak-smomgflXvOZWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 443 bytes --]


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---

Hmmm, I tried to break up the addition of osm_torus.c into
mailing-list-size hunks, but evidently failed on this one;
it doesn't seem to have made it to the list.

I've attached the patch as a compressed file.

Sorry.

-- Jim

 opensm/opensm/osm_torus.c | 3993 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 3993 insertions(+), 0 deletions(-)


[-- Attachment #2: 0008-opensm-Add-torus-2QoS-routing-engine-part-2.patch.bz2 --]
[-- Type: application/x-bzip, Size: 5571 bytes --]


* Re: [PATCH v3 01/17] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup.
       [not found]     ` <1276631604-29230-2-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
@ 2010-07-07 17:06       ` Sasha Khapyorsky
  2010-07-07 17:57         ` Jim Schutt
  0 siblings, 1 reply; 23+ messages in thread
From: Sasha Khapyorsky @ 2010-07-07 17:06 UTC (permalink / raw)
  To: Jim Schutt; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Jim,

On 13:53 Tue 15 Jun     , Jim Schutt wrote:
> diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
> index d3dc02e..5614240 100644
> --- a/opensm/opensm/osm_opensm.c
> +++ b/opensm/opensm/osm_opensm.c
> @@ -147,7 +147,8 @@ static void append_routing_engine(osm_opensm_t *osm,
>  	r->next = routing_engine;
>  }
>  
> -static void setup_routing_engine(osm_opensm_t *osm, const char *name)
> +static struct osm_routing_engine *setup_routing_engine(osm_opensm_t *osm,
> +						       const char *name)
>  {
>  	struct osm_routing_engine *re;
>  	const struct routing_engine_module *m;
> @@ -158,47 +159,53 @@ static void setup_routing_engine(osm_opensm_t *osm, const char *name)
>  			if (!re) {
>  				OSM_LOG(&osm->log, OSM_LOG_VERBOSE,
>  					"memory allocation failed\n");
> -				return;
> +				return NULL;
>  			}
>  			memset(re, 0, sizeof(struct osm_routing_engine));
>  
>  			re->name = m->name;
> +			re->type = osm_routing_engine_type(m->name);
>  			if (m->setup(re, osm)) {
>  				OSM_LOG(&osm->log, OSM_LOG_VERBOSE,
>  					"setup of routing"
>  					" engine \'%s\' failed\n", name);
> -				return;
> +				free(re);
> +				return NULL;
>  			}
>  			OSM_LOG(&osm->log, OSM_LOG_DEBUG,
>  				"\'%s\' routing engine set up\n", re->name);
> -			append_routing_engine(osm, re);
> -			return;
> +			if (re->type == OSM_ROUTING_ENGINE_TYPE_MINHOP)
> +				osm->default_routing_engine = re;
> +			return re;
>  		}
>  	}
>  
>  	OSM_LOG(&osm->log, OSM_LOG_ERROR,
>  		"cannot find or setup routing engine \'%s\'\n", name);
> +	return NULL;
>  }
>  
>  static void setup_routing_engines(osm_opensm_t *osm, const char *engine_names)
>  {
>  	char *name, *str, *p;
> +	struct osm_routing_engine *re;
>  
> -	if (!engine_names || !*engine_names) {
> -		setup_routing_engine(osm, "minhop");
> -		return;
> +	if (engine_names && *engine_names) {
> +		str = strdup(engine_names);
> +		name = strtok_r(str, ", \t\n", &p);
> +		while (name && *name) {
> +			re = setup_routing_engine(osm, name);
> +			if (re)
> +				append_routing_engine(osm, re);
> +			name = strtok_r(NULL, ", \t\n", &p);
> +		}
> +		free(str);
>  	}
> -
> -	str = strdup(engine_names);
> -	name = strtok_r(str, ", \t\n", &p);
> -	while (name && *name) {
> -		setup_routing_engine(osm, name);
> -		name = strtok_r(NULL, ", \t\n", &p);
> +	if (!osm->default_routing_engine) {
> +		re = setup_routing_engine(osm, "minhop");
> +		if (!osm->routing_engine_list && re)
> +			append_routing_engine(osm, re);

Shouldn't here be:

		osm->default_routing_engine = re;

too?


>  	}
> -	free(str);
> -
> -	if (!osm->routing_engine_list)
> -		setup_routing_engine(osm, "minhop");
>  }
>  
>  void osm_opensm_construct(IN osm_opensm_t * p_osm)


So that this chunk in osm_ucast_mgr_process() (below) will not break
over NULL pointer?

> -	if (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_NONE) {
> +	if (!p_osm->routing_engine_used) {
>  		/* If configured routing algorithm failed, use default MinHop */
> -		osm_ucast_mgr_build_lid_matrices(p_mgr);
> -		ucast_mgr_build_lfts(p_mgr);
> +		struct osm_routing_engine *r = p_osm->default_routing_engine;
> +
> +		r->build_lid_matrices(r->context);
> +		r->ucast_build_fwd_tables(r->context);
> +		p_osm->routing_engine_used = r;
>  		osm_ucast_mgr_set_fwd_tables(p_mgr);
> -		p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_MINHOP;
>  	}

Sasha

* Re: [PATCH v3 01/17] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup.
  2010-07-07 17:06       ` Sasha Khapyorsky
@ 2010-07-07 17:57         ` Jim Schutt
       [not found]           ` <1278525460.4812.22.camel-mgfCWIlwujvg4c9jKm7R2O1ftBKYq+Ku@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Jim Schutt @ 2010-07-07 17:57 UTC (permalink / raw)
  To: Sasha Khapyorsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Sasha:

On Wed, 2010-07-07 at 11:06 -0600, Sasha Khapyorsky wrote:
> Hi Jim,
> 
> On 13:53 Tue 15 Jun     , Jim Schutt wrote:
> > diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
> > index d3dc02e..5614240 100644
> > --- a/opensm/opensm/osm_opensm.c
> > +++ b/opensm/opensm/osm_opensm.c
> > @@ -147,7 +147,8 @@ static void append_routing_engine(osm_opensm_t *osm,
> >  	r->next = routing_engine;
> >  }
> >  
> > -static void setup_routing_engine(osm_opensm_t *osm, const char *name)
> > +static struct osm_routing_engine *setup_routing_engine(osm_opensm_t *osm,
> > +						       const char *name)
> >  {
> >  	struct osm_routing_engine *re;
> >  	const struct routing_engine_module *m;
> > @@ -158,47 +159,53 @@ static void setup_routing_engine(osm_opensm_t *osm, const char *name)
> >  			if (!re) {
> >  				OSM_LOG(&osm->log, OSM_LOG_VERBOSE,
> >  					"memory allocation failed\n");
> > -				return;
> > +				return NULL;
> >  			}
> >  			memset(re, 0, sizeof(struct osm_routing_engine));
> >  
> >  			re->name = m->name;
> > +			re->type = osm_routing_engine_type(m->name);
> >  			if (m->setup(re, osm)) {
> >  				OSM_LOG(&osm->log, OSM_LOG_VERBOSE,
> >  					"setup of routing"
> >  					" engine \'%s\' failed\n", name);
> > -				return;
> > +				free(re);
> > +				return NULL;
> >  			}
> >  			OSM_LOG(&osm->log, OSM_LOG_DEBUG,
> >  				"\'%s\' routing engine set up\n", re->name);
> > -			append_routing_engine(osm, re);
> > -			return;
> > +			if (re->type == OSM_ROUTING_ENGINE_TYPE_MINHOP)
> > +				osm->default_routing_engine = re;
> > +			return re;
> >  		}
> >  	}
> >  
> >  	OSM_LOG(&osm->log, OSM_LOG_ERROR,
> >  		"cannot find or setup routing engine \'%s\'\n", name);
> > +	return NULL;
> >  }
> >  
> >  static void setup_routing_engines(osm_opensm_t *osm, const char *engine_names)
> >  {
> >  	char *name, *str, *p;
> > +	struct osm_routing_engine *re;
> >  
> > -	if (!engine_names || !*engine_names) {
> > -		setup_routing_engine(osm, "minhop");
> > -		return;
> > +	if (engine_names && *engine_names) {
> > +		str = strdup(engine_names);
> > +		name = strtok_r(str, ", \t\n", &p);
> > +		while (name && *name) {
> > +			re = setup_routing_engine(osm, name);
> > +			if (re)
> > +				append_routing_engine(osm, re);
> > +			name = strtok_r(NULL, ", \t\n", &p);
> > +		}
> > +		free(str);
> >  	}
> > -
> > -	str = strdup(engine_names);
> > -	name = strtok_r(str, ", \t\n", &p);
> > -	while (name && *name) {
> > -		setup_routing_engine(osm, name);
> > -		name = strtok_r(NULL, ", \t\n", &p);
> > +	if (!osm->default_routing_engine) {
> > +		re = setup_routing_engine(osm, "minhop");
> > +		if (!osm->routing_engine_list && re)
> > +			append_routing_engine(osm, re);
> 
> Shouldn't here be:
> 
> 		osm->default_routing_engine = re;
> 
> too?

I think the above call to setup_routing_engine(osm, "minhop")
does that, because we're explicitly calling it for minhop?

But now that I look at this again, I'm confused why I
thought I needed to append a minhop routing engine to
the routing engine list when the list was empty and there 
was no default routing engine.

I was trying to exactly duplicate old functionality, where
minhop is only in the routing engine list if explicitly
configured, but always called if no routing engines are
configured or all configured engines fail.
   
So I think the end of the above chunk only needs to be

-
-	str = strdup(engine_names);
-	name = strtok_r(str, ", \t\n", &p);
-	while (name && *name) {
-		setup_routing_engine(osm, name);
-		name = strtok_r(NULL, ", \t\n", &p);
- 	}
+	if (!osm->default_routing_engine)
+		setup_routing_engine(osm, "minhop");

-- Jim

> 
> 
> >  	}
> > -	free(str);
> > -
> > -	if (!osm->routing_engine_list)
> > -		setup_routing_engine(osm, "minhop");
> >  }
> >  
> >  void osm_opensm_construct(IN osm_opensm_t * p_osm)
> 
> 
> So that this chunk in osm_ucast_mgr_process() (below) will not break
> over NULL pointer?
> 
> > -	if (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_NONE) {
> > +	if (!p_osm->routing_engine_used) {
> >  		/* If configured routing algorithm failed, use default MinHop */
> > -		osm_ucast_mgr_build_lid_matrices(p_mgr);
> > -		ucast_mgr_build_lfts(p_mgr);
> > +		struct osm_routing_engine *r = p_osm->default_routing_engine;
> > +
> > +		r->build_lid_matrices(r->context);
> > +		r->ucast_build_fwd_tables(r->context);
> > +		p_osm->routing_engine_used = r;
> >  		osm_ucast_mgr_set_fwd_tables(p_mgr);
> > -		p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_MINHOP;
> >  	}
> 
> Sasha

* Re: [PATCH v3 01/17] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup.
       [not found]           ` <1278525460.4812.22.camel-mgfCWIlwujvg4c9jKm7R2O1ftBKYq+Ku@public.gmane.org>
@ 2010-07-07 21:03             ` Sasha Khapyorsky
  0 siblings, 0 replies; 23+ messages in thread
From: Sasha Khapyorsky @ 2010-07-07 21:03 UTC (permalink / raw)
  To: Jim Schutt; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 11:57 Wed 07 Jul     , Jim Schutt wrote:
>    
> So I think the end of the above chunk only needs to be
> 
> -
> -	str = strdup(engine_names);
> -	name = strtok_r(str, ", \t\n", &p);
> -	while (name && *name) {
> -		setup_routing_engine(osm, name);
> -		name = strtok_r(NULL, ", \t\n", &p);
> - 	}
> +	if (!osm->default_routing_engine)
> +		setup_routing_engine(osm, "minhop");

This makes sense. Don't need to resubmit the patch, I will fix.

Sasha

* Re: [PATCH v3 03/17] opensm: Allow the routing engine to participate in path SL calculations.
       [not found]     ` <1276631604-29230-4-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
@ 2010-07-13 18:36       ` Sasha Khapyorsky
  2010-07-13 20:12         ` Jim Schutt
  0 siblings, 1 reply; 23+ messages in thread
From: Sasha Khapyorsky @ 2010-07-13 18:36 UTC (permalink / raw)
  To: Jim Schutt; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Jim,

On 13:53 Tue 15 Jun     , Jim Schutt wrote:
> diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
> index 093c70d..a323671 100644
> --- a/opensm/opensm/osm_sa_path_record.c
> +++ b/opensm/opensm/osm_sa_path_record.c
> @@ -164,6 +164,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>  	const osm_physp_t *p_dest_physp;
>  	const osm_prtn_t *p_prtn = NULL;
>  	osm_opensm_t *p_osm;
> +	struct osm_routing_engine *p_re;
>  	const ib_port_info_t *p_pi;
>  	ib_api_status_t status = IB_SUCCESS;
>  	ib_net16_t pkey;
> @@ -180,7 +181,6 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>  	ib_slvl_table_t *p_slvl_tbl = NULL;
>  	osm_qos_level_t *p_qos_level = NULL;
>  	uint16_t valid_sl_mask = 0xffff;
> -	int is_lash;
>  	int hops = 0;
>  
>  	OSM_LOG_ENTER(sa->p_log);
> @@ -192,6 +192,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>  	p_src_physp = p_physp;
>  	p_pi = &p_physp->port_info;
>  	p_osm = sa->p_subn->p_osm;
> +	p_re = p_osm->routing_engine_used;
>  
>  	mtu = ib_port_info_get_mtu_cap(p_pi);
>  	rate = ib_port_info_compute_rate(p_pi);
> @@ -667,9 +668,6 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>  	 * Set PathRecord SL
>  	 */
>  
> -	is_lash = (p_osm->routing_engine_used &&
> -		   p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH);
> -
>  	if (comp_mask & IB_PR_COMPMASK_SL) {
>  		/*
>  		 * Specific SL was requested
> @@ -686,26 +684,10 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>  			goto Exit;
>  		}
>  
> -		if (is_lash
> -		    && osm_get_lash_sl(p_osm, p_src_port, p_dest_port) != sl) {
> -			OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F23: "
> -				"Required PathRecord SL (%u) doesn't "
> -				"match LASH SL\n", sl);
> -			status = IB_NOT_FOUND;
> -			goto Exit;
> -		}

When a specific SL is requested in a PR, shouldn't this value be
verified against the one the routing engine provides (similar to how
it was done with LASH), so something like:

	if (p_re && p_re->path_sl &&
	    sl != p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port)) {
		OSM_LOG("blah-blah\n");
		status = IB_NOT_FOUND;
		goto Exit;
	}

?

Sasha

> -
> -	} else if (is_lash) {
> -		/*
> -		 * No specific SL in PathRecord request.
> -		 * If it's LASH routing - use its SL.
> -		 * slid and dest_lid are stored in network in lash.
> -		 */
> -		sl = osm_get_lash_sl(p_osm, p_src_port, p_dest_port);
>  	} else if (p_qos_level && p_qos_level->sl_set) {
>  		/*
> -		 * No specific SL was requested, and we're not in
> -		 * LASH routing, but there is an SL in QoS level.
> +		 * No specific SL was requested, but there is an SL in
> +		 * QoS level.
>  		 */
>  		sl = p_qos_level->sl;
>  
> @@ -746,6 +728,14 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>  		goto Exit;
>  	}
>  
> +	/*
> +	 * If the routing engine wants to have a say in path SL selection,
> +	 * send the currently computed SL value as a hint and let the routing
> +	 * engine override it.
> +	 */
> +	if (p_re && p_re->path_sl)
> +		sl = p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port);
> +
>  	/* reset pkey when raw traffic */
>  	if (comp_mask & IB_PR_COMPMASK_RAWTRAFFIC &&
>  	    cl_ntoh32(p_pr->hop_flow_raw) & (1 << 31))

* Re: [PATCH v3 03/17] opensm: Allow the routing engine to participate in path SL calculations.
  2010-07-13 18:36       ` Sasha Khapyorsky
@ 2010-07-13 20:12         ` Jim Schutt
  0 siblings, 0 replies; 23+ messages in thread
From: Jim Schutt @ 2010-07-13 20:12 UTC (permalink / raw)
  To: Sasha Khapyorsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


On Tue, 2010-07-13 at 12:36 -0600, Sasha Khapyorsky wrote:
> Hi Jim,
> 
> On 13:53 Tue 15 Jun     , Jim Schutt wrote:
> > diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
> > index 093c70d..a323671 100644
> > --- a/opensm/opensm/osm_sa_path_record.c
> > +++ b/opensm/opensm/osm_sa_path_record.c
> > @@ -164,6 +164,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
> >  	const osm_physp_t *p_dest_physp;
> >  	const osm_prtn_t *p_prtn = NULL;
> >  	osm_opensm_t *p_osm;
> > +	struct osm_routing_engine *p_re;
> >  	const ib_port_info_t *p_pi;
> >  	ib_api_status_t status = IB_SUCCESS;
> >  	ib_net16_t pkey;
> > @@ -180,7 +181,6 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
> >  	ib_slvl_table_t *p_slvl_tbl = NULL;
> >  	osm_qos_level_t *p_qos_level = NULL;
> >  	uint16_t valid_sl_mask = 0xffff;
> > -	int is_lash;
> >  	int hops = 0;
> >  
> >  	OSM_LOG_ENTER(sa->p_log);
> > @@ -192,6 +192,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
> >  	p_src_physp = p_physp;
> >  	p_pi = &p_physp->port_info;
> >  	p_osm = sa->p_subn->p_osm;
> > +	p_re = p_osm->routing_engine_used;
> >  
> >  	mtu = ib_port_info_get_mtu_cap(p_pi);
> >  	rate = ib_port_info_compute_rate(p_pi);
> > @@ -667,9 +668,6 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
> >  	 * Set PathRecord SL
> >  	 */
> >  
> > -	is_lash = (p_osm->routing_engine_used &&
> > -		   p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH);
> > -
> >  	if (comp_mask & IB_PR_COMPMASK_SL) {
> >  		/*
> >  		 * Specific SL was requested
> > @@ -686,26 +684,10 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
> >  			goto Exit;
> >  		}
> >  
> > -		if (is_lash
> > -		    && osm_get_lash_sl(p_osm, p_src_port, p_dest_port) != sl) {
> > -			OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F23: "
> > -				"Required PathRecord SL (%u) doesn't "
> > -				"match LASH SL\n", sl);
> > -			status = IB_NOT_FOUND;
> > -			goto Exit;
> > -		}
> 
> When a specific SL is requested in a PR, shouldn't this value be
> verified against the one the routing engine provides (similar to how
> it was done with LASH), so something like:
> 
> 	if (p_re && p_re->path_sl &&
> 	    sl != p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port)) {
> 		OSM_LOG("blah-blah\n");
> 		status = IB_NOT_FOUND;
> 		goto Exit;
> 	}
> 
> ?

The problem I'm trying to deal with is that torus-2QoS needs to use
3 SL bits for deadlock avoidance, leaving one SL bit for the
"traffic differentiation" sense of QoS.  torus-2QoS uses bit 3 for
this, so SLs 0-7 all reference one traffic class, and SLs 8-15 all
reference the other traffic class.

Thus, to do as you suggest would require that for an application
to be robustly successful in requesting a particular SL in the
face of torus-2QoS routing, it needs to know 1) that torus-2QoS
is being used; and 2) which of SL bits 0-2 will be set by
torus-2QoS for deadlock avoidance.

Instead, my proposal has the result that for torus-2QoS, apps
ask for a particular SL value as before, and torus-2QoS maps it
onto one of the two possible traffic classes, in a fashion that
prevents message deadlock.
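
To make the mapping concrete, here is a minimal sketch of the bit
layout described above. It is not the torus-2QoS code from this
patchset: sketch_path_sl() and its deadlock_bits argument are
hypothetical stand-ins, and how the engine actually picks bits 0-2
for a given path is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 3 of the 4-bit SL selects the traffic class. */
#define SL_QOS_BIT       0x8u
/* Bits 0-2 are reserved for deadlock avoidance. */
#define SL_DEADLOCK_MASK 0x7u

/*
 * Hypothetical stand-in for a routing engine's path_sl callback.
 * hint_sl is the SL computed so far (requested by the application or
 * taken from the QoS level).  deadlock_bits is whatever the engine
 * chose for this source/destination pair; that choice is assumed to
 * have been made elsewhere.
 */
static uint8_t sketch_path_sl(uint8_t hint_sl, uint8_t deadlock_bits)
{
	assert(hint_sl < 16);
	assert(deadlock_bits < 8);

	/* Keep only the traffic class from the hint, then add engine bits. */
	return (uint8_t)((hint_sl & SL_QOS_BIT) |
			 (deadlock_bits & SL_DEADLOCK_MASK));
}
```

With this layout, a request for any of SLs 0-7 lands in one traffic
class and a request for any of SLs 8-15 lands in the other, whatever
value the engine puts in the low three bits.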

The behavior of routing engines other than LASH is unchanged,
because they all use SL only for traffic differentiation.

I believe that the current behavior of LASH is both not
useful, and also wrong.  But, maybe I am missing some
subtlety about LASH's SL use, so please let me know if so.

I believe that it is not useful because currently LASH uses SL only
for message deadlock avoidance, and has no way to differentiate
classes of traffic.  So anyone who makes a PR request with a 
specific SL in an attempt to differentiate traffic is not 
getting what they think they are getting when LASH is routing.

I think it is wrong because for two instances of the same 
PR request (with SL specified), the first can succeed and 
the second can fail if the fabric topology changed (say due
to a component failure) between requests, and LASH computed
a different SL value for the path.

Under my proposal when LASH is used, PR queries that 
specify SL will now succeed when they previously would have
failed.  I believe this is acceptable because 1) currently
most such queries would fail, since LASH uses SL for
deadlock avoidance, not traffic differentiation; and 2)
since LASH can really only support one class of traffic,
any SL returned in a PR query is OK as long as it provides
deadlock free messaging.
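
The difference between the two behaviors can be sketched as below.
This is only an illustration with hypothetical helper names, not
code from osm_sa_path_record.c; engine_sl stands for whatever SL a
LASH-like engine currently computes for the path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Old behavior (hypothetical model): a PR query that specifies an SL
 * succeeds only if that SL equals the one the engine computed.
 */
static bool strict_check(uint8_t requested_sl, uint8_t engine_sl,
			 uint8_t *out_sl)
{
	assert(out_sl != NULL);
	if (requested_sl != engine_sl)
		return false;	/* corresponds to IB_NOT_FOUND */
	*out_sl = requested_sl;
	return true;
}

/*
 * Proposed behavior (hypothetical model): the requested SL is only a
 * hint.  An engine with a single traffic class ignores it and returns
 * its own deadlock-free SL, so the query always succeeds.
 */
static bool hint_override(uint8_t requested_sl, uint8_t engine_sl,
			  uint8_t *out_sl)
{
	assert(out_sl != NULL);
	(void)requested_sl;
	*out_sl = engine_sl;
	return true;
}
```

If the engine's SL for a path moves from 2 to 4 after a topology
change, a repeated query for SL 2 passes strict_check() the first
time and fails the second, while hint_override() succeeds both times
and returns the engine's current value.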

So I believe that my proposal 1) leaves existing routing
engines other than LASH unchanged; 2) changes LASH's behavior
to be more useful; 3) allows torus-2QoS to provide two classes
of traffic without deadlock; and 4) doesn't prescribe how some
yet-to-be-developed routing engine might want to make use
of SL values.

What do you think?

-- Jim

> Sasha
> 
> > -
> > -	} else if (is_lash) {
> > -		/*
> > -		 * No specific SL in PathRecord request.
> > -		 * If it's LASH routing - use its SL.
> > -		 * slid and dest_lid are stored in network in lash.
> > -		 */
> > -		sl = osm_get_lash_sl(p_osm, p_src_port, p_dest_port);
> >  	} else if (p_qos_level && p_qos_level->sl_set) {
> >  		/*
> > -		 * No specific SL was requested, and we're not in
> > -		 * LASH routing, but there is an SL in QoS level.
> > +		 * No specific SL was requested, but there is an SL in
> > +		 * QoS level.
> >  		 */
> >  		sl = p_qos_level->sl;
> >  
> > @@ -746,6 +728,14 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
> >  		goto Exit;
> >  	}
> >  
> > +	/*
> > +	 * If the routing engine wants to have a say in path SL selection,
> > +	 * send the currently computed SL value as a hint and let the routing
> > +	 * engine override it.
> > +	 */
> > +	if (p_re && p_re->path_sl)
> > +		sl = p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port);
> > +
> >  	/* reset pkey when raw traffic */
> >  	if (comp_mask & IB_PR_COMPMASK_RAWTRAFFIC &&
> >  	    cl_ntoh32(p_pr->hop_flow_raw) & (1 << 31))



end of thread, other threads:[~2010-07-13 20:12 UTC | newest]

Thread overview: 23+ messages
2010-06-15 19:53 [PATCH v3 00/17] opensm: Add new torus routing engine: torus-2QoS Jim Schutt
     [not found] ` <1276631604-29230-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
2010-06-15 19:53   ` [PATCH v3 01/17] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup Jim Schutt
     [not found]     ` <1276631604-29230-2-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
2010-07-07 17:06       ` Sasha Khapyorsky
2010-07-07 17:57         ` Jim Schutt
     [not found]           ` <1278525460.4812.22.camel-mgfCWIlwujvg4c9jKm7R2O1ftBKYq+Ku@public.gmane.org>
2010-07-07 21:03             ` Sasha Khapyorsky
2010-06-15 19:53   ` [PATCH v3 02/17] opensm: Allow the routing engine to influence SL2VL calculations Jim Schutt
2010-06-15 19:53   ` [PATCH v3 03/17] opensm: Allow the routing engine to participate in path SL calculations Jim Schutt
     [not found]     ` <1276631604-29230-4-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
2010-07-13 18:36       ` Sasha Khapyorsky
2010-07-13 20:12         ` Jim Schutt
2010-06-15 19:53   ` [PATCH v3 04/17] opensm: Track the minimum value in the fabric of data VLs supported Jim Schutt
2010-06-15 19:53   ` [PATCH v3 05/17] opensm: Add struct osm_routing_engine callback to build spanning trees for multicast Jim Schutt
2010-06-15 19:53   ` [PATCH v3 06/17] opensm: Make mcast_mgr_purge_tree() available outside osm_mcast_mgr.c Jim Schutt
2010-06-15 19:53   ` [PATCH v3 07/17] opensm: Add torus-2QoS routing engine, part 1 Jim Schutt
2010-06-15 19:53   ` [PATCH v3 09/17] opensm: Add torus-2QoS routing engine, part 3 Jim Schutt
2010-06-15 19:53   ` [PATCH v3 10/17] opensm: Update documentation to describe torus-2QoS Jim Schutt
2010-06-15 19:53   ` [PATCH v3 11/17] opensm: Enable torus-2QoS routing engine Jim Schutt
2010-06-15 19:53   ` [PATCH v3 12/17] opensm: Add opensm option to specify file name for extra torus-2QoS configuration information Jim Schutt
2010-06-15 19:53   ` [PATCH v3 13/17] opensm: Do not require -Q option for torus-2QoS routing engine Jim Schutt
2010-06-15 19:53   ` [PATCH v3 14/17] opensm: Make it possible to configure no fallback " Jim Schutt
2010-06-15 19:53   ` [PATCH v3 15/17] opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of osm_port_t:priv Jim Schutt
2010-06-15 19:53   ` [PATCH v3 16/17] opensm: Avoid havoc in dump_ucast_routes() " Jim Schutt
2010-06-15 19:53   ` [PATCH v3 17/17] opensm: Cause status of unicast routing attempt to propogate to callers of osm_ucast_mgr_process() Jim Schutt
2010-06-16 14:11   ` [PATCH v3 08/17] opensm: Add new torus routing engine: torus-2QoS, part 2 Jim Schutt
