* [PATCH 00/11] Add new torus routing engine: torus-2QoS
@ 2009-11-20 19:14 Jim Schutt
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:14 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

This patch series adds a new routing engine designed to handle large 
fabrics connected with a 2D/3D torus topology.

Patches 1-4 do some preparation to handle new SL-related features of
the routing engine, patches 5/6 add and enable the engine, and patches
7-11 have some fixups that only make sense in the presence of the new
engine.

So why a new torus routing engine?

Because I believe none of the existing routing engines can provide a
satisfactory operational experience on a large-scale torus, i.e. one
with hundreds of switches.

Generating routes for a torus that are free of credit loops requires
the use of multiple virtual lanes, and thus SLs on IB.  For IB fabrics
it also requires that _every_ application use path record queries - 
any application that uses an SL that was not obtained via a path record
query may cause credit loops.

In addition, if a fabric topology change (e.g. failed switch/link)
causes a change in the path SL values needed to prevent credit loops,
then _every_ application needs to repath for every path whose SL has
changed.  AFAIK there is no good way to do this as yet in general.

Also, the requirement for path SL queries on every connection places a
heavy load on subnet administration, and the possibility that path SL
values can change makes caching as a performance enhancement more 
difficult.

Since multiple VL/SL values are required to prevent credit loops on a
torus, supporting QoS means that QoS and routing need to share the small
pool of available SL values, and the even smaller pool of available VL
values.

This patch series, and the routing engine it introduces, addresses these
issues for a 2D/3D torus fabric.  The torus-2QoS engine can provide the
following functionality on a 2D/3D torus:
- routing that is free of credit loops
- two levels of QoS, assuming switches support 8 data VLs
- ability to route around a single failed switch, and/or multiple failed
    links, without
    - introducing credit loops
    - changing path SL values
- very short run times, with good scaling properties as fabric size
    increases

The routing engine currently in opensm that is most functional for a
torus-connected fabric is LASH.  In comparison with torus-2QoS, LASH
has the following issues:
- LASH does not support QoS.
- changing inter-switch topology (adding/removing a switch, or
    removing all the links between a pair of switches) can change
    many path SL values, potentially leading to credit loops if
    running applications do not repath.
- running time to calculate routes scales poorly with increasing 
    fabric size.

The basic algorithm used by torus-2QoS is DOR (dimension-order routing).  It also uses SL bits 0-2,
one SL bit per torus dimension, to encode whether a path crosses a dateline
(where the coordinate value wraps to zero) for each of the three dimensions,
in order to avoid the credit loops that otherwise result on a torus.  It
uses SL bit 3 to distinguish between two QoS levels.
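
The SL bit layout just described can be sketched in C as follows.  This is
only an illustration, not code from osm_ucast_torus.c: the helper names, the
assignment of SL bit 0/1/2 to dimension 0/1/2, the tie-break for equal
distances, and treating a set bit 3 as the second QoS level are all
assumptions made here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: does the shorter way around a ring of n switches,
 * from coordinate src to coordinate dst, wrap past the dateline (where the
 * coordinate value wraps to zero)?  Ties go to the increasing direction;
 * that tie-break is an assumption of this sketch.
 */
static int ring_crosses_dateline(int src, int dst, int n)
{
	int fwd;

	assert(n > 0 && src >= 0 && src < n && dst >= 0 && dst < n);

	fwd = (dst - src + n) % n;	/* hops in the increasing direction */
	if (2 * fwd <= n)
		return dst < src;	/* increasing: wraps n-1 -> 0 */
	return dst > src;		/* decreasing: wraps 0 -> n-1 */
}

/*
 * Hypothetical helper: pack the per-dimension dateline flags into SL bits
 * 0-2 and the QoS level into SL bit 3.
 */
static uint8_t torus_sl_sketch(const int crosses[3], int qos_level)
{
	uint8_t sl = 0;
	int dim;

	assert(qos_level == 0 || qos_level == 1);

	for (dim = 0; dim < 3; dim++)
		if (crosses[dim])
			sl |= (uint8_t)(1u << dim);
	if (qos_level)
		sl |= (uint8_t)(1u << 3);
	return sl;
}
```

Under these assumptions, a path that wraps only in dimension 0 at the first
QoS level gets SL 1, and a path that wraps in dimensions 1 and 2 at the
second QoS level gets SL 14.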

It uses the SL2VL tables to map those eight SL values per QoS level into
two VL values per QoS level, based on which coordinate direction a link
points.  For two QoS levels, this consumes four data VLs, where VL bit
0 encodes whether the path crosses the dateline for the coordinate
direction in which the link points, and VL bit 2 encodes QoS level.
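
As a rough sketch of the SL-to-VL mapping described above, the following C
fragment derives the VL for one link from the path SL and the coordinate
direction of that link.  The function name and the numbering of dimensions
as 0/1/2 are assumptions for illustration; the real engine fills SL2VL
tables per port rather than calling a function per hop.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a 4-bit path SL to a VL for a link that points
 * in coordinate direction dim (0, 1 or 2).
 *   VL bit 0: the path crosses the dateline in this link's dimension
 *   VL bit 2: QoS level, copied from SL bit 3
 * VL bit 1 is left clear here.
 */
static uint8_t torus_sl2vl_sketch(uint8_t sl, unsigned dim)
{
	uint8_t vl = 0;

	assert(sl < 16 && dim < 3);

	if (sl & (1u << dim))
		vl |= (uint8_t)(1u << 0);
	if (sl & (1u << 3))
		vl |= (uint8_t)(1u << 2);
	return vl;
}
```

With this layout the eight SL values of each QoS level collapse onto two
VLs per link: VLs 0 and 1 for one level, VLs 4 and 5 for the other.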

In the event of link failure, it routes the long way around the 1-D ring
containing the failed link.  I.e. no turns are introduced into a path in
order to route around a failed link.  Note that due to this implementation, 
torus-2QoS cannot route a torus with link failures that break a 1-D ring
into two disjoint segments.
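
The choice of direction on a single 1-D ring can be illustrated with the
sketch below.  It is not taken from the patch; the function names, the
link_ok[] representation (link i joins ring positions i and (i+1) % n) and
the preference for the increasing direction on ties are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Walk from src to dst in direction dir (+1 or -1); fail on a dead link. */
static bool ring_path_clear(const bool *link_ok, int n,
			    int src, int dst, int dir)
{
	int pos = src;

	while (pos != dst) {
		/* link i joins positions i and (i+1) % n */
		int link = (dir > 0) ? pos : (pos - 1 + n) % n;

		if (!link_ok[link])
			return false;
		pos = (pos + dir + n) % n;
	}
	return true;
}

/*
 * Hypothetical helper: return +1 or -1 for the direction to travel around
 * the ring, or 0 if failed links separate src from dst within this ring.
 * The short way is preferred; the long way is used only if the short way
 * contains a failed link.
 */
static int ring_direction_sketch(const bool *link_ok, int n, int src, int dst)
{
	int fwd, first;

	assert(link_ok && n > 0);
	assert(src >= 0 && src < n && dst >= 0 && dst < n);

	fwd = (dst - src + n) % n;
	first = (fwd <= n - fwd) ? 1 : -1;

	if (ring_path_clear(link_ok, n, src, dst, first))
		return first;
	if (ring_path_clear(link_ok, n, src, dst, -first))
		return -first;
	return 0;
}
```

A return value of 0 corresponds to the case noted above, where link
failures split the ring into two disjoint segments and the path cannot be
routed without leaving the ring.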

Under DOR routing in a torus with a failed switch, paths that would
otherwise turn at the failed switch cannot be routed without introducing
an "illegal" turn into the path.  Such turns are "illegal" in the
sense that allowing them will allow credit loops, unless additional
measures are taken.

The routes produced by torus-2QoS will introduce such "illegal" turns when
a switch fails.  It makes use of the input/output port dependence in the
SL2VL maps to set the otherwise unused VL bit 1 for the path hop following 
such an illegal turn.  This is enough to avoid credit loops in the 
presence of a single failed switch.

As an example, consider the following 2D torus, and consider routes
from S to D, both when the switch at F is operational, and when it
has failed.  torus-2QoS will generate routes such that the path
S-F-D is followed if F is operational, and the path S-E-I-L-D
if F has failed:

    |    |    |    |    |    |    |
  --+----+----+----+----+----+----+--
    |    |    |    |    |    |    |
  --+----+----+----+----+----D----+--
    |    |    |    |    |    |    |
  --+----+----+----+----I----L----+--
    |    |    |    |    |    |    |
  --+----+----S----+----E----F----+--
    |    |    |    |    |    |    |
  --+----+----+----+----+----+----+--

The turn in S-E-I-L-D at switch I is the illegal turn introduced
into the path.  The turns at E and L are extra turns introduced
into the path that are legal in the sense that no credit loops
can be constructed using them.

The path hop after the turn at switch I has VL bit 1 set, which marks
it as a hop after an illegal turn.
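
To make the role of VL bit 1 concrete, here is a small C sketch of a
per-hop VL computation that depends on both the input and the output link
direction.  It assumes that a turn is "illegal" when it goes from a later
DOR dimension back into an earlier one, which matches the turn at I in the
example but is an inference rather than something the patch states; the
function name and dimension numbering are also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: VL for the hop leaving a switch on a link in
 * dimension out_dim, having arrived on a link in dimension in_dim.
 *   VL bit 0: the path crosses the dateline in out_dim (from SL bits 0-2)
 *   VL bit 1: this hop follows an illegal turn (assumed: in_dim > out_dim)
 *   VL bit 2: QoS level (from SL bit 3)
 */
static uint8_t torus_hop_vl_sketch(uint8_t sl, unsigned in_dim,
				   unsigned out_dim)
{
	uint8_t vl = 0;

	assert(sl < 16 && in_dim < 3 && out_dim < 3);

	if (sl & (1u << out_dim))
		vl |= (uint8_t)(1u << 0);
	if (in_dim > out_dim)
		vl |= (uint8_t)(1u << 1);
	if (sl & (1u << 3))
		vl |= (uint8_t)(1u << 2);
	return vl;
}
```

Taking the horizontal axis of the diagram as dimension 0 and the vertical
axis as dimension 1, the hop I-L follows a turn from dimension 1 into
dimension 0, so for SL 0 this sketch yields VL 2, while the hops after the
turns at E and L stay on VL 0.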

I've used the latest development version of ibdmchk, because it can use
path SL values and SL2VL tables, to check for credit loops in cases like 
the above routed with torus-2QoS, and it finds none.

I've also looked for credit loops in a torus with multiple failed switches
routed with torus-2QoS, and found that there are no credit loops if and
only if the failed switches are adjacent in the last DOR dimension.

Since torus-2QoS makes use of all available SL values when supporting
2 QoS levels, there are none left over on which to confine multicast.
It turns out there is a way to construct a spanning tree which can 
overlay a DOR-routed mesh, so that multicast and unicast can coexist
on the same SL/VL without causing credit loops.  I'm working on that but
don't have it implemented yet.

In the meantime, if you do not request QoS using opensm -Q, then
torus-2QoS will only use SLs 8-15, and thus VLs 4-7, leaving SL0/VL0
free for multicast.


Jim Schutt (11):
  opensm: Prepare for routing engine input to path record SL lookup and
    SL2VL map setup.
  opensm: Allow the routing engine to influence SL2VL calculations.
  opensm: Allow the routing engine to participate in path SL
    calculations.
  opensm: Track the minimum value in the fabric of data VLs supported.
  opensm: Add torus-2QoS routing engine.
  opensm: Enable torus-2QoS routing engine.
  opensm: Add opensm option to specify file name for extra torus-2QoS
    configuration information.
  opensm: Do not require -Q option for torus-2QoS routing engine.
  opensm: Make it possible to configure no fallback routing engine.
  opensm:  Avoid havoc in minhop caused by torus-2QoS persistent use of
    osm_port_t:priv.
  opensm: Update documentation to describe torus-2QoS.

 opensm/doc/current-routing.txt         |  154 +-
 opensm/include/opensm/osm_base.h       |   18 +
 opensm/include/opensm/osm_opensm.h     |   24 +-
 opensm/include/opensm/osm_subnet.h     |    7 +
 opensm/include/opensm/osm_ucast_lash.h |    3 -
 opensm/man/opensm.8.in                 |    9 +-
 opensm/opensm/Makefile.am              |    2 +-
 opensm/opensm/main.c                   |    8 +
 opensm/opensm/osm_console.c            |   10 +-
 opensm/opensm/osm_dump.c               |    3 +-
 opensm/opensm/osm_link_mgr.c           |   16 +-
 opensm/opensm/osm_opensm.c             |   54 +-
 opensm/opensm/osm_port_info_rcv.c      |   13 +-
 opensm/opensm/osm_qos.c                |   26 +-
 opensm/opensm/osm_sa_path_record.c     |   33 +-
 opensm/opensm/osm_state_mgr.c          |   10 +-
 opensm/opensm/osm_subnet.c             |   20 +-
 opensm/opensm/osm_ucast_lash.c         |   11 +-
 opensm/opensm/osm_ucast_mgr.c          |   44 +-
 opensm/opensm/osm_ucast_torus.c        | 8665 ++++++++++++++++++++++++++++++++
 20 files changed, 9038 insertions(+), 92 deletions(-)
 create mode 100644 opensm/opensm/osm_ucast_torus.c


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH 01/11] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
@ 2009-11-20 19:15   ` Jim Schutt
  2009-11-20 19:15   ` [PATCH 02/11] opensm: Allow the routing engine to influence SL2VL calculations Jim Schutt
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:15 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

In the event a routing engine needs to participate in SL assignment and
SL2VL map setup in order to avoid credit loops in a fabric, it will be
useful to make the routing engine context more widely available.

To this end, have osm_opensm_t save a pointer to the routing engine used,
rather than its type.  This will make the routing engine context easily
available in, e.g., sl2vl_update() and pr_rcv_get_path_parms().

Make the necessary adjustments to the code that used the old
routing_engine_used as an enum _osm_routing_engine_type.  In order to
keep the behavior where minhop was used if the configured routing engines
failed, the easiest solution was to add a pointer to osm_opensm_t which
pointed to the minhop struct osm_routing_engine.
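
The pattern this change introduces, testing a possibly-NULL engine pointer
instead of comparing an enum, can be sketched in isolation as follows.  The
struct and function names below are simplified stand-ins invented for this
illustration, not the opensm definitions.

```c
#include <assert.h>
#include <stddef.h>

enum re_type_sketch { RE_TYPE_NONE, RE_TYPE_MINHOP, RE_TYPE_LASH };

/* Simplified stand-in for struct osm_routing_engine. */
struct re_sketch {
	enum re_type_sketch type;
	const char *name;
	void *context;
};

/* Simplified stand-in for the two pointers added to osm_opensm_t. */
struct osm_sketch {
	struct re_sketch *routing_engine_used;	  /* NULL until routing succeeds */
	struct re_sketch *default_routing_engine; /* minhop fallback */
};

/* NULL-safe replacement for reading the old enum-valued field. */
static enum re_type_sketch engine_used_type(const struct osm_sketch *osm)
{
	assert(osm != NULL);
	return osm->routing_engine_used ?
		osm->routing_engine_used->type : RE_TYPE_NONE;
}

/*
 * If no configured engine succeeded, fall back to the default engine.
 * Returns 0 on success, -1 if there is no default engine to fall back to.
 */
static int fall_back_to_default(struct osm_sketch *osm)
{
	assert(osm != NULL);
	if (osm->routing_engine_used)
		return 0;
	if (!osm->default_routing_engine)
		return -1;
	osm->routing_engine_used = osm->default_routing_engine;
	return 0;
}
```

Holding a pointer rather than a type means callers such as sl2vl_update()
can reach the engine's context and callbacks directly, at the cost of a
NULL check wherever the old code compared against
OSM_ROUTING_ENGINE_TYPE_NONE.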

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_opensm.h |    4 ++-
 opensm/opensm/osm_console.c        |   10 ++++++--
 opensm/opensm/osm_dump.c           |    3 +-
 opensm/opensm/osm_link_mgr.c       |    5 ++-
 opensm/opensm/osm_opensm.c         |   43 +++++++++++++++++++++---------------
 opensm/opensm/osm_sa_path_record.c |    3 +-
 opensm/opensm/osm_ucast_lash.c     |    3 +-
 opensm/opensm/osm_ucast_mgr.c      |   17 ++++++++------
 8 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index c6c9bdb..e97142e 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -120,6 +120,7 @@ typedef enum _osm_routing_engine_type {
 *	added later.
 */
 struct osm_routing_engine {
+	osm_routing_engine_type_t type;
 	const char *name;
 	void *context;
 	int (*build_lid_matrices) (void *context);
@@ -183,7 +184,8 @@ typedef struct osm_opensm {
 	cl_dispatcher_t disp;
 	cl_plock_t lock;
 	struct osm_routing_engine *routing_engine_list;
-	osm_routing_engine_type_t routing_engine_used;
+	struct osm_routing_engine *routing_engine_used;
+	struct osm_routing_engine *default_routing_engine;
 	osm_stats_t stats;
 	osm_console_t console;
 	nn_map_t *node_name_map;
diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c
index 206e7f7..f0c7aa0 100644
--- a/opensm/opensm/osm_console.c
+++ b/opensm/opensm/osm_console.c
@@ -362,6 +362,8 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
 	cl_list_item_t *item;
 
 	if (out) {
+		const char *re_str;
+
 		cl_plock_acquire(&p_osm->lock);
 		fprintf(out, "   OpenSM Version       : %s\n", p_osm->osm_version);
 		fprintf(out, "   SM State             : %s\n",
@@ -370,9 +372,11 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
 			p_osm->subn.opt.sm_priority);
 		fprintf(out, "   SA State             : %s\n",
 			sa_state_str(p_osm->sa.state));
-		fprintf(out, "   Routing Engine       : %s\n",
-			osm_routing_engine_type_str(p_osm->
-						    routing_engine_used));
+
+		re_str = p_osm->routing_engine_used ?
+			osm_routing_engine_type_str(p_osm->routing_engine_used->type) :
+			osm_routing_engine_type_str(OSM_ROUTING_ENGINE_TYPE_NONE);
+		fprintf(out, "   Routing Engine       : %s\n", re_str);
 
 		fprintf(out, "   Loaded event plugins :");
 		if (cl_qlist_head(&p_osm->plugin_list) ==
diff --git a/opensm/opensm/osm_dump.c b/opensm/opensm/osm_dump.c
index 86e9c00..f3f4623 100644
--- a/opensm/opensm/osm_dump.c
+++ b/opensm/opensm/osm_dump.c
@@ -135,7 +135,8 @@ static void dump_ucast_routes(cl_map_item_t * item, FILE * file, void *cxt)
 		"Switch 0x%016" PRIx64 "\nLID    : Port : Hops : Optimal\n",
 		cl_ntoh64(osm_node_get_node_guid(p_node)));
 
-	dor = (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_DOR);
+	dor = (p_osm->routing_engine_used &&
+	       p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_DOR);
 
 	for (lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++) {
 		fprintf(file, "0x%04X : ", lid_ho);
diff --git a/opensm/opensm/osm_link_mgr.c b/opensm/opensm/osm_link_mgr.c
index 03a585b..aaeebc7 100644
--- a/opensm/opensm/osm_link_mgr.c
+++ b/opensm/opensm/osm_link_mgr.c
@@ -64,8 +64,9 @@ static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 
 	OSM_LOG_ENTER(sm->p_log);
 
-	if (p_osm->routing_engine_used != OSM_ROUTING_ENGINE_TYPE_LASH
-	    || !(slid = osm_physp_get_base_lid(p_physp))) {
+	if (!(p_osm->routing_engine_used &&
+	      p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH &&
+	      (slid = osm_physp_get_base_lid(p_physp)))) {
 		/* Use default SL if lash routing is not used */
 		OSM_LOG_EXIT(sm->p_log);
 		return sm->p_subn->opt.sm_sl;
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index 5b3b364..9cd254e 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -147,7 +147,8 @@ static void append_routing_engine(osm_opensm_t *osm,
 	r->next = routing_engine;
 }
 
-static void setup_routing_engine(osm_opensm_t *osm, const char *name)
+static struct osm_routing_engine *setup_routing_engine(osm_opensm_t *osm,
+						       const char *name)
 {
 	struct osm_routing_engine *re;
 	const struct routing_engine_module *m;
@@ -158,47 +159,53 @@ static void setup_routing_engine(osm_opensm_t *osm, const char *name)
 			if (!re) {
 				OSM_LOG(&osm->log, OSM_LOG_VERBOSE,
 					"memory allocation failed\n");
-				return;
+				return NULL;
 			}
 			memset(re, 0, sizeof(struct osm_routing_engine));
 
 			re->name = m->name;
+			re->type = osm_routing_engine_type(m->name);
 			if (m->setup(re, osm)) {
 				OSM_LOG(&osm->log, OSM_LOG_VERBOSE,
 					"setup of routing"
 					" engine \'%s\' failed\n", name);
-				return;
+				free(re);
+				return NULL;
 			}
 			OSM_LOG(&osm->log, OSM_LOG_DEBUG,
 				"\'%s\' routing engine set up\n", re->name);
-			append_routing_engine(osm, re);
-			return;
+			if (re->type == OSM_ROUTING_ENGINE_TYPE_MINHOP)
+				osm->default_routing_engine = re;
+			return re;
 		}
 	}
 
 	OSM_LOG(&osm->log, OSM_LOG_ERROR,
 		"cannot find or setup routing engine \'%s\'\n", name);
+	return NULL;
 }
 
 static void setup_routing_engines(osm_opensm_t *osm, const char *engine_names)
 {
 	char *name, *str, *p;
+	struct osm_routing_engine *re;
 
-	if (!engine_names || !*engine_names) {
-		setup_routing_engine(osm, "minhop");
-		return;
+	if (engine_names && *engine_names) {
+		str = strdup(engine_names);
+		name = strtok_r(str, ", \t\n", &p);
+		while (name && *name) {
+			re = setup_routing_engine(osm, name);
+			if (re)
+				append_routing_engine(osm, re);
+			name = strtok_r(NULL, ", \t\n", &p);
+		}
+		free(str);
 	}
-
-	str = strdup(engine_names);
-	name = strtok_r(str, ", \t\n", &p);
-	while (name && *name) {
-		setup_routing_engine(osm, name);
-		name = strtok_r(NULL, ", \t\n", &p);
+	if (!osm->default_routing_engine) {
+		re = setup_routing_engine(osm, "minhop");
+		if (!osm->routing_engine_list && re)
+			append_routing_engine(osm, re);
 	}
-	free(str);
-
-	if (!osm->routing_engine_list)
-		setup_routing_engine(osm, "minhop");
 }
 
 void osm_opensm_construct(IN osm_opensm_t * p_osm)
diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
index dc9d508..484cb5b 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -646,7 +646,8 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 	 * Set PathRecord SL
 	 */
 
-	is_lash = (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_LASH);
+	is_lash = (p_osm->routing_engine_used &&
+		   p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH);
 
 	if (comp_mask & IB_PR_COMPMASK_SL) {
 		/*
diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c
index 3054a56..626887f 100644
--- a/opensm/opensm/osm_ucast_lash.c
+++ b/opensm/opensm/osm_ucast_lash.c
@@ -1284,7 +1284,8 @@ uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, const osm_port_t * p_src_port,
 	unsigned src_id;
 	osm_switch_t *p_sw;
 
-	if (p_osm->routing_engine_used != OSM_ROUTING_ENGINE_TYPE_LASH)
+	if (!(p_osm->routing_engine_used &&
+	      p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH))
 		return OSM_DEFAULT_SL;
 
 	p_sw = get_osm_switch_from_port(p_dst_port);
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index 244563d..c29eb8f 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -941,7 +941,7 @@ static int ucast_mgr_route(struct osm_routing_engine *r, osm_opensm_t * osm)
 		return ret;
 	}
 
-	osm->routing_engine_used = osm_routing_engine_type(r->name);
+	osm->routing_engine_used = r;
 
 	osm_ucast_mgr_set_fwd_tables(&osm->sm.ucast_mgr);
 
@@ -969,24 +969,27 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 	    ucast_mgr_setup_all_switches(p_mgr->p_subn) < 0)
 		goto Exit;
 
-	p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_NONE;
+	p_osm->routing_engine_used = NULL;
 	while (p_routing_eng) {
 		if (!ucast_mgr_route(p_routing_eng, p_osm))
 			break;
 		p_routing_eng = p_routing_eng->next;
 	}
 
-	if (p_osm->routing_engine_used == OSM_ROUTING_ENGINE_TYPE_NONE) {
+	if (!p_osm->routing_engine_used) {
 		/* If configured routing algorithm failed, use default MinHop */
-		osm_ucast_mgr_build_lid_matrices(p_mgr);
-		ucast_mgr_build_lfts(p_mgr);
+		struct osm_routing_engine *r = p_osm->default_routing_engine;
+
+		r->build_lid_matrices(r->context);
+		r->ucast_build_fwd_tables(r->context);
+		p_osm->routing_engine_used = r;
 		osm_ucast_mgr_set_fwd_tables(p_mgr);
-		p_osm->routing_engine_used = OSM_ROUTING_ENGINE_TYPE_MINHOP;
 	}
 
 	OSM_LOG(p_mgr->p_log, OSM_LOG_INFO,
 		"%s tables configured on all switches\n",
-		osm_routing_engine_type_str(p_osm->routing_engine_used));
+		osm_routing_engine_type_str(p_osm->
+					    routing_engine_used->type));
 
 	if (p_mgr->p_subn->opt.use_ucast_cache)
 		p_mgr->cache_valid = TRUE;
-- 
1.5.6.GIT



* [PATCH 02/11] opensm: Allow the routing engine to influence SL2VL calculations.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2009-11-20 19:15   ` [PATCH 01/11] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup Jim Schutt
@ 2009-11-20 19:15   ` Jim Schutt
       [not found]     ` <1258744509-11148-3-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2009-11-20 19:15   ` [PATCH 03/11] opensm: Allow the routing engine to participate in path SL calculations Jim Schutt
                     ` (21 subsequent siblings)
  23 siblings, 1 reply; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:15 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

Note that the original code assumes that QoS setup is mostly static and
based only on user configuration.  As a result, there is no provision for
routing engines that want to compute contributions to the SL2VL maps.

Fix this up by adding a callback to struct osm_routing_engine that computes
a per-port SL2VL map, and call it from the appropriate place in the QoS
setup path.

The call to osm_qos_setup() in do_sweep() also needs to move to after the
call to the routing engine, so that any SL2VL map contributions from the
routing engine are based on the latest information.
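
A stripped-down sketch of the callback flow described above: start from the
configured SL2VL table, let the routing engine adjust a copy, and report
whether the result differs from what the port currently holds.  All types
and names here are simplified stand-ins invented for illustration (the real
callback also takes an osm_port_t), and the NULL check on the engine pointer
is a defensive assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for ib_slvl_table_t: one VL per SL. */
struct sl2vl_sketch {
	uint8_t vl[16];
};

/* Simplified stand-in for the engine struct with the new callback. */
struct re_sketch {
	void *context;
	void (*update_sl2vl)(void *context, uint8_t in_port_num,
			     uint8_t out_port_num, struct sl2vl_sketch *t);
};

/*
 * Compute the table to program for one (in_port, out_port) pair.
 * Returns 1 if it differs from the table currently on the port.
 */
static int effective_sl2vl(const struct re_sketch *re,
			   const struct sl2vl_sketch *configured,
			   const struct sl2vl_sketch *current,
			   uint8_t in_port_num, uint8_t out_port_num,
			   struct sl2vl_sketch *result)
{
	assert(configured && current && result);

	*result = *configured;		/* start from user configuration */
	if (re && re->update_sl2vl)	/* engine may override entries */
		re->update_sl2vl(re->context, in_port_num, out_port_num,
				 result);
	return memcmp(result, current, sizeof(*result)) != 0;
}

/* Made-up example callback: on output port 2, move every SL to VL 1. */
static void example_update_sl2vl(void *context, uint8_t in_port_num,
				 uint8_t out_port_num, struct sl2vl_sketch *t)
{
	int sl;

	(void)context;
	(void)in_port_num;
	if (out_port_num != 2)
		return;
	for (sl = 0; sl < 16; sl++)
		t->vl[sl] = 1;
}
```

An engine that does not set the callback leaves the configured table
untouched, which preserves the existing behavior for the other routing
engines.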

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_opensm.h |   13 +++++++++++++
 opensm/opensm/osm_qos.c            |   19 ++++++++++++++++++-
 opensm/opensm/osm_state_mgr.c      |    4 ++--
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index e97142e..616113b 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -126,6 +126,9 @@ struct osm_routing_engine {
 	int (*build_lid_matrices) (void *context);
 	int (*ucast_build_fwd_tables) (void *context);
 	void (*ucast_dump_tables) (void *context);
+	void (*update_sl2vl)(void *context, IN osm_port_t *port,
+			     IN uint8_t in_port_num, IN uint8_t out_port_num,
+			     IN OUT ib_slvl_table_t *t);
 	void (*delete) (void *context);
 	struct osm_routing_engine *next;
 };
@@ -147,6 +150,16 @@ struct osm_routing_engine {
 *	ucast_dump_tables
 *		The callback for dumping unicast routing tables.
 *
+*	update_sl2vl(void *context, IN osm_port_t *port,
+*		     IN uint8_t in_port_num, IN uint8_t out_port_num,
+*		     OUT ib_slvl_table_t *t)
+*		The callback to allow routing engine input for SL2VL maps.
+*		For switches, *port is the switch management port, and
+*		in_port_num/out_port_num identify which part of the SL2VL
+*		map to update.  For router/HCA ports, *port is the port
+*		for which the SL2VL map should be updated, and in_port_num/
+*		out_port_num should be ignored.
+*
 *	delete
 *		The delete method, may be used for routing engine
 *		internals cleanup.
diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index 08f9a60..f42c334 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -194,6 +194,7 @@ static ib_api_status_t sl2vl_update(osm_sm_t * sm, osm_port_t * p_port,
 {
 	ib_api_status_t status;
 	uint8_t i, num_ports;
+	struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
 	osm_physp_t *p_physp;
 
 	if (osm_node_get_type(osm_physp_get_node_ptr(p)) == IB_NODE_TYPE_SWITCH) {
@@ -213,8 +214,24 @@ static ib_api_status_t sl2vl_update(osm_sm_t * sm, osm_port_t * p_port,
 	}
 
 	for (i = 0; i < num_ports; i++) {
+		ib_slvl_table_t routing_sl2vl;
+		const ib_slvl_table_t *port_sl2vl;
+		const ib_slvl_table_t *port_sl2vl_old;
+
+		if (re->update_sl2vl) {
+			routing_sl2vl = qcfg->sl2vl;
+			re->update_sl2vl(re->context,
+					 p_port, i, port_num, &routing_sl2vl);
+			port_sl2vl = &routing_sl2vl;
+			port_sl2vl_old = osm_physp_get_slvl_tbl(p, i);
+			if (memcmp(port_sl2vl, port_sl2vl_old,
+				   sizeof(*port_sl2vl)) != 0)
+				force_update = 1;
+		} else
+			port_sl2vl = &qcfg->sl2vl;
+
 		status = sl2vl_update_table(sm, p, i, port_num, force_update,
-					    &qcfg->sl2vl);
+					    port_sl2vl);
 		if (status != IB_SUCCESS)
 			return status;
 	}
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 7540adc..c3f49dc 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1228,8 +1228,6 @@ repeat_discovery:
 
 	osm_pkey_mgr_process(sm->p_subn->p_osm);
 
-	osm_qos_setup(sm->p_subn->p_osm);
-
 	/* try to restore SA DB (this should be before lid_mgr
 	   because we may want to disable clients reregistration
 	   when SA DB is restored) */
@@ -1270,6 +1268,8 @@ repeat_discovery:
 	    osm_ucast_cache_process(&sm->ucast_mgr))
 		osm_ucast_mgr_process(&sm->ucast_mgr);
 
+	osm_qos_setup(sm->p_subn->p_osm);
+
 	if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats))
 		return;
 
-- 
1.5.6.GIT



* [PATCH 03/11] opensm: Allow the routing engine to participate in path SL calculations.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2009-11-20 19:15   ` [PATCH 01/11] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup Jim Schutt
  2009-11-20 19:15   ` [PATCH 02/11] opensm: Allow the routing engine to influence SL2VL calculations Jim Schutt
@ 2009-11-20 19:15   ` Jim Schutt
       [not found]     ` <1258744509-11148-4-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2009-11-20 19:15   ` [PATCH 04/11] opensm: Track the minimum value in the fabric of data VLs supported Jim Schutt
                     ` (20 subsequent siblings)
  23 siblings, 1 reply; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:15 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

LASH already does this, in a hard-coded fashion.

Generalize this by adding a callback to struct osm_routing_engine that
computes a path SL value, and fix up LASH to use it.

This patch causes the requested or QoS-computed SL value to be passed
to the routing engine path SL computation as a hint.  In the event the
routing engine's use of SLs allows it to support more than one QoS level,
it may be able to make use of the SL hint to do so.

For now, LASH just ignores the hint.
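
The hint mechanism can be sketched as below.  The types and the two example
callbacks are hypothetical stand-ins written for this illustration: one
ignores the hint, as LASH does, and one keeps SL bit 3 from the hint as a
QoS level while taking the low three bits from routing, which is only one
way an engine might use the hint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for osm_port_t. */
struct port_sketch {
	int id;
};

/* Simplified stand-in for the engine struct with the new callback. */
struct re_sketch {
	void *context;
	uint8_t (*path_sl)(void *context, uint8_t path_sl_hint,
			   const struct port_sketch *src_port,
			   const struct port_sketch *dst_port);
};

/* Pass the already-computed SL as a hint; the engine may override it. */
static uint8_t select_path_sl(const struct re_sketch *re, uint8_t sl,
			      const struct port_sketch *src_port,
			      const struct port_sketch *dst_port)
{
	assert(src_port && dst_port);
	if (re && re->path_sl)
		sl = re->path_sl(re->context, sl, src_port, dst_port);
	return sl;
}

/* Example: ignore the hint; context holds the SL that routing requires. */
static uint8_t ignore_hint_path_sl(void *context, uint8_t path_sl_hint,
				   const struct port_sketch *src_port,
				   const struct port_sketch *dst_port)
{
	(void)path_sl_hint;
	(void)src_port;
	(void)dst_port;
	return *(const uint8_t *)context;
}

/* Example: keep the hint's bit 3, take bits 0-2 from routing. */
static uint8_t keep_qos_path_sl(void *context, uint8_t path_sl_hint,
				const struct port_sketch *src_port,
				const struct port_sketch *dst_port)
{
	(void)src_port;
	(void)dst_port;
	return (uint8_t)((path_sl_hint & 0x8) |
			 (*(const uint8_t *)context & 0x7));
}
```

With a routing-required SL of 5 and a hint of 9, the first callback returns
5 and the second returns 13; with no callback installed the hint itself is
used.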

Note that before this change, if LASH was configured and a specific path
SL value was requested that differed from what LASH needed to route the
fabric without credit loops, the path SL lookup would fail.  Now LASH's
SL value is always used.

Possibly the choice between failing a path SL request when it conflicts
with routing, vs. always providing an SL value that gives a credit-loop-
free routing, should be user-configurable?

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_opensm.h     |    6 +++++
 opensm/include/opensm/osm_ucast_lash.h |    3 --
 opensm/opensm/osm_link_mgr.c           |   15 ++++++++-----
 opensm/opensm/osm_sa_path_record.c     |   34 +++++++++++--------------------
 opensm/opensm/osm_ucast_lash.c         |    8 +++++-
 5 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index 616113b..ef9d4e1 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -129,6 +129,9 @@ struct osm_routing_engine {
 	void (*update_sl2vl)(void *context, IN osm_port_t *port,
 			     IN uint8_t in_port_num, IN uint8_t out_port_num,
 			     IN OUT ib_slvl_table_t *t);
+	uint8_t (*path_sl)(void *context, IN uint8_t path_sl_hint,
+			   IN const osm_port_t *src_port,
+			   IN const osm_port_t *dst_port);
 	void (*delete) (void *context);
 	struct osm_routing_engine *next;
 };
@@ -160,6 +163,9 @@ struct osm_routing_engine {
 *		for which the SL2VL map should be updated, and in_port_num/
 *		out_port_num should be ignored.
 *
+*	path_sl
+*		The callback for computing path SL.
+*
 *	delete
 *		The delete method, may be used for routing engine
 *		internals cleanup.
diff --git a/opensm/include/opensm/osm_ucast_lash.h b/opensm/include/opensm/osm_ucast_lash.h
index 9e15d38..dd90d5d 100644
--- a/opensm/include/opensm/osm_ucast_lash.h
+++ b/opensm/include/opensm/osm_ucast_lash.h
@@ -94,7 +94,4 @@ typedef struct _lash {
 	int ***virtual_location;
 } lash_t;
 
-uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, const osm_port_t * p_src_port,
-			const osm_port_t * p_dst_port);
-
 #endif
diff --git a/opensm/opensm/osm_link_mgr.c b/opensm/opensm/osm_link_mgr.c
index aaeebc7..02d6ec8 100644
--- a/opensm/opensm/osm_link_mgr.c
+++ b/opensm/opensm/osm_link_mgr.c
@@ -53,21 +53,23 @@
 #include <opensm/osm_helper.h>
 #include <opensm/osm_msgdef.h>
 #include <opensm/osm_opensm.h>
-#include <opensm/osm_ucast_lash.h>
 
 static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 {
 	osm_opensm_t *p_osm = sm->p_subn->p_osm;
+	struct osm_routing_engine *re = p_osm->routing_engine_used;
 	const osm_port_t *p_sm_port, *p_src_port;
 	ib_net16_t slid, smlid;
 	uint8_t sl;
 
 	OSM_LOG_ENTER(sm->p_log);
 
-	if (!(p_osm->routing_engine_used &&
-	      p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH &&
+	if (!(re && re->path_sl &&
 	      (slid = osm_physp_get_base_lid(p_physp)))) {
-		/* Use default SL if lash routing is not used */
+		/*
+		 * Use default SL if routing engine does not provide a
+		 * path SL lookup callback.
+		 */
 		OSM_LOG_EXIT(sm->p_log);
 		return sm->p_subn->opt.sm_sl;
 	}
@@ -81,8 +83,9 @@ static uint8_t link_mgr_get_smsl(IN osm_sm_t * sm, IN osm_physp_t * p_physp)
 	p_src_port =
 	    cl_ptr_vector_get(&sm->p_subn->port_lid_tbl, cl_ntoh16(slid));
 
-	/* Call lash to find proper SL */
-	sl = osm_get_lash_sl(p_osm, p_src_port, p_sm_port);
+	/* Call into routing engine to find proper SL */
+	sl = re->path_sl(re->context, sm->p_subn->opt.sm_sl,
+			 p_src_port, p_sm_port);
 
 	OSM_LOG_EXIT(sm->p_log);
 	return sl;
diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
index 484cb5b..dcb2d4e 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -161,6 +161,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 	const osm_physp_t *p_dest_physp;
 	const osm_prtn_t *p_prtn = NULL;
 	osm_opensm_t *p_osm;
+	struct osm_routing_engine *p_re;
 	const ib_port_info_t *p_pi;
 	ib_api_status_t status = IB_SUCCESS;
 	ib_net16_t pkey;
@@ -177,7 +178,6 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 	ib_slvl_table_t *p_slvl_tbl = NULL;
 	osm_qos_level_t *p_qos_level = NULL;
 	uint16_t valid_sl_mask = 0xffff;
-	int is_lash;
 
 	OSM_LOG_ENTER(sa->p_log);
 
@@ -188,6 +188,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 	p_src_physp = p_physp;
 	p_pi = &p_physp->port_info;
 	p_osm = sa->p_subn->p_osm;
+	p_re = p_osm->routing_engine_used;
 
 	mtu = ib_port_info_get_mtu_cap(p_pi);
 	rate = ib_port_info_compute_rate(p_pi);
@@ -646,9 +647,6 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 	 * Set PathRecord SL
 	 */
 
-	is_lash = (p_osm->routing_engine_used &&
-		   p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH);
-
 	if (comp_mask & IB_PR_COMPMASK_SL) {
 		/*
 		 * Specific SL was requested
@@ -665,26 +663,10 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 			goto Exit;
 		}
 
-		if (is_lash
-		    && osm_get_lash_sl(p_osm, p_src_port, p_dest_port) != sl) {
-			OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F23: "
-				"Required PathRecord SL (%u) doesn't "
-				"match LASH SL\n", sl);
-			status = IB_NOT_FOUND;
-			goto Exit;
-		}
-
-	} else if (is_lash) {
-		/*
-		 * No specific SL in PathRecord request.
-		 * If it's LASH routing - use its SL.
-		 * slid and dest_lid are stored in network in lash.
-		 */
-		sl = osm_get_lash_sl(p_osm, p_src_port, p_dest_port);
 	} else if (p_qos_level && p_qos_level->sl_set) {
 		/*
-		 * No specific SL was requested, and we're not in
-		 * LASH routing, but there is an SL in QoS level.
+		 * No specific SL was requested, but there is an SL in
+		 * QoS level.
 		 */
 		sl = p_qos_level->sl;
 
@@ -725,6 +707,14 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
 		goto Exit;
 	}
 
+	/*
+	 * If the routing engine wants to have a say in path SL selection,
+	 * send the currently computed SL value as a hint and let the routing
+	 * engine override it.
+	 */
+	if (p_re && p_re->path_sl)
+		sl = p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port);
+
 	/* reset pkey when raw traffic */
 	if (comp_mask & IB_PR_COMPMASK_RAWTRAFFIC &&
 	    cl_ntoh32(p_pr->hop_flow_raw) & (1 << 31))
diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c
index 626887f..bbba6ee 100644
--- a/opensm/opensm/osm_ucast_lash.c
+++ b/opensm/opensm/osm_ucast_lash.c
@@ -1277,12 +1277,15 @@ static void lash_delete(void *context)
 	free(p_lash);
 }
 
-uint8_t osm_get_lash_sl(osm_opensm_t * p_osm, const osm_port_t * p_src_port,
-			const osm_port_t * p_dst_port)
+static uint8_t get_lash_sl(void *context, uint8_t path_sl_hint,
+			   const osm_port_t *p_src_port,
+			   const osm_port_t *p_dst_port)
 {
 	unsigned dst_id;
 	unsigned src_id;
 	osm_switch_t *p_sw;
+	lash_t *p_lash = context;
+	osm_opensm_t *p_osm = p_lash->p_osm;
 
 	if (!(p_osm->routing_engine_used &&
 	      p_osm->routing_engine_used->type == OSM_ROUTING_ENGINE_TYPE_LASH))
@@ -1312,6 +1315,7 @@ int osm_ucast_lash_setup(struct osm_routing_engine *r, osm_opensm_t *p_osm)
 
 	r->context = p_lash;
 	r->ucast_build_fwd_tables = lash_process;
+	r->path_sl = get_lash_sl;
 	r->delete = lash_delete;
 
 	return 0;
-- 
1.5.6.GIT


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH 04/11] opensm: Track the minimum value in the fabric of data VLs supported.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (2 preceding siblings ...)
  2009-11-20 19:15   ` [PATCH 03/11] opensm: Allow the routing engine to participate in path SL calculations Jim Schutt
@ 2009-11-20 19:15   ` Jim Schutt
  2009-11-20 19:15   ` [PATCH 06/11] opensm: Enable torus-2QoS routing engine Jim Schutt
                     ` (19 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:15 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

A routing engine that wants to make contributions to SL2VL maps in support
of routing free from credit loops may need to know the minimum number
of supported data VLs in the fabric.

This code tracks that value.
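
As a rough illustration of the tracking described above, here is a
standalone sketch that decodes a PortInfo OperationalVLs code into a data
VL count and keeps a running minimum.  The names op_vls_to_data_vls(),
track_min_data_vls() and MAX_NUM_VLS are hypothetical stand-ins, not OpenSM
identifiers.  The value 16 for MAX_NUM_VLS and the handling of a zero code
are assumptions made for the sketch; the shift and clamp mirror what the
patch does in pi_rcv_process_endport().

```c
#include <stdint.h>

/* Assumed value, standing in for OpenSM's IB_MAX_NUM_VLS. */
#define MAX_NUM_VLS 16

/*
 * Hypothetical helper: decode an OperationalVLs code (1..5) into a count
 * of data VLs, i.e. 1, 2, 4, 8, or 15.
 */
static unsigned op_vls_to_data_vls(unsigned op_vls_code)
{
	unsigned data_vls;

	/* Assumption: treat a zero code as "VL0 only" rather than
	 * shifting by a negative amount. */
	if (op_vls_code == 0)
		return 1;

	data_vls = 1U << (op_vls_code - 1);
	if (data_vls >= MAX_NUM_VLS)
		data_vls = MAX_NUM_VLS - 1;	/* code 5 means VL0-VL14 */
	return data_vls;
}

/* Hypothetical helper: fold one port's value into the running minimum. */
static uint8_t track_min_data_vls(uint8_t current_min, unsigned op_vls_code)
{
	unsigned data_vls = op_vls_to_data_vls(op_vls_code);

	return data_vls < current_min ? (uint8_t)data_vls : current_min;
}
```

Starting the running minimum at MAX_NUM_VLS - 1 on every sweep, as the
patch does for min_data_vls, lets the value recover once a limiting port
leaves the fabric.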

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_subnet.h |    1 +
 opensm/opensm/osm_port_info_rcv.c  |   13 ++++++++++++-
 opensm/opensm/osm_state_mgr.c      |    6 ++++++
 opensm/opensm/osm_subnet.c         |    1 +
 4 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index 0302f91..c303e86 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -509,6 +509,7 @@ typedef struct osm_subn {
 	uint16_t max_mcast_lid_ho;
 	uint8_t min_ca_mtu;
 	uint8_t min_ca_rate;
+	uint8_t min_data_vls;
 	boolean_t ignore_existing_lfts;
 	boolean_t subnet_initialization_error;
 	boolean_t force_heavy_sweep;
diff --git a/opensm/opensm/osm_port_info_rcv.c b/opensm/opensm/osm_port_info_rcv.c
index 8a99064..b0d54c8 100644
--- a/opensm/opensm/osm_port_info_rcv.c
+++ b/opensm/opensm/osm_port_info_rcv.c
@@ -82,6 +82,7 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
 	ib_api_status_t status;
 	ib_net64_t port_guid;
 	uint8_t rate, mtu;
+	unsigned data_vls;
 	cl_qmap_t *p_sm_tbl;
 	osm_remote_sm_t *p_sm;
 
@@ -91,7 +92,7 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
 
 	/* HACK extended port 0 should be handled too! */
 	if (osm_physp_get_port_num(p_physp) != 0) {
-		/* track the minimal endport MTU and rate */
+		/* track the minimal endport MTU, rate, and operational VLs */
 		mtu = ib_port_info_get_mtu_cap(p_pi);
 		if (mtu < sm->p_subn->min_ca_mtu) {
 			OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
@@ -107,6 +108,16 @@ static void pi_rcv_process_endport(IN osm_sm_t * sm, IN osm_physp_t * p_physp,
 				PRIx64 "\n", rate, cl_ntoh64(port_guid));
 			sm->p_subn->min_ca_rate = rate;
 		}
+
+		data_vls = 1U << (ib_port_info_get_op_vls(p_pi) - 1);
+		if (data_vls >= IB_MAX_NUM_VLS)
+			data_vls = IB_MAX_NUM_VLS - 1;
+		if ((uint8_t)data_vls < sm->p_subn->min_data_vls) {
+			OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
+				"Setting endport minimal data VLs to:%u defined by port:0x%"
+				PRIx64 "\n", data_vls, cl_ntoh64(port_guid));
+			sm->p_subn->min_data_vls = data_vls;
+		}
 	}
 
 	if (port_guid != sm->p_subn->sm_port_guid) {
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index c3f49dc..b6c41a6 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1132,6 +1132,12 @@ repeat_discovery:
 	sm->p_subn->force_reroute = FALSE;
 	sm->p_subn->subnet_initialization_error = FALSE;
 
+	/* Reset tracking values in case limiting component got removed
+	 * from fabric. */
+	sm->p_subn->min_ca_mtu = IB_MAX_MTU;
+	sm->p_subn->min_ca_rate = IB_MAX_RATE;
+	sm->p_subn->min_data_vls = IB_MAX_NUM_VLS - 1;
+
 	/* rescan configuration updates */
 	if (!config_parsed && osm_subn_rescan_conf_files(sm->p_subn) < 0)
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 331A: "
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 2cfcbe6..19ba730 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -526,6 +526,7 @@ ib_api_status_t osm_subn_init(IN osm_subn_t * p_subn, IN osm_opensm_t * p_osm,
 	p_subn->max_mcast_lid_ho = IB_LID_MCAST_END_HO;
 	p_subn->min_ca_mtu = IB_MAX_MTU;
 	p_subn->min_ca_rate = IB_MAX_RATE;
+	p_subn->min_data_vls = IB_MAX_NUM_VLS - 1;
 	p_subn->ignore_existing_lfts = TRUE;
 
 	/* we assume master by default - so we only need to set it true if STANDBY */
-- 
1.5.6.GIT



* [PATCH 06/11] opensm: Enable torus-2QoS routing engine.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (3 preceding siblings ...)
  2009-11-20 19:15   ` [PATCH 04/11] opensm: Track the minimum value in the fabric of data VLs supported Jim Schutt
@ 2009-11-20 19:15   ` Jim Schutt
  2009-11-20 19:15   ` [PATCH 07/11] opensm: Add opensm option to specify file name for extra torus-2QoS configuration information Jim Schutt
                     ` (18 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:15 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_opensm.h |    1 +
 opensm/opensm/osm_opensm.c         |    6 ++++++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index ef9d4e1..90c6c0f 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -105,6 +105,7 @@ typedef enum _osm_routing_engine_type {
 	OSM_ROUTING_ENGINE_TYPE_FTREE,
 	OSM_ROUTING_ENGINE_TYPE_LASH,
 	OSM_ROUTING_ENGINE_TYPE_DOR,
+	OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS,
 	OSM_ROUTING_ENGINE_TYPE_UNKNOWN
 } osm_routing_engine_type_t;
 /***********/
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index 9cd254e..7052d49 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -70,6 +70,7 @@ extern int osm_ucast_file_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_ftree_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_lash_setup(struct osm_routing_engine *, osm_opensm_t *);
 extern int osm_ucast_dor_setup(struct osm_routing_engine *, osm_opensm_t *);
+extern int osm_ucast_torus2QoS_setup(struct osm_routing_engine *, osm_opensm_t *);
 
 const static struct routing_engine_module routing_modules[] = {
 	{"minhop", osm_ucast_minhop_setup},
@@ -78,6 +79,7 @@ const static struct routing_engine_module routing_modules[] = {
 	{"ftree", osm_ucast_ftree_setup},
 	{"lash", osm_ucast_lash_setup},
 	{"dor", osm_ucast_dor_setup},
+	{"torus-2QoS", osm_ucast_torus2QoS_setup},
 	{NULL, NULL}
 };
 
@@ -98,6 +100,8 @@ const char *osm_routing_engine_type_str(IN osm_routing_engine_type_t type)
 		return "lash";
 	case OSM_ROUTING_ENGINE_TYPE_DOR:
 		return "dor";
+	case OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS:
+		return "torus-2QoS";
 	default:
 		break;
 	}
@@ -124,6 +128,8 @@ osm_routing_engine_type_t osm_routing_engine_type(IN const char *str)
 		return OSM_ROUTING_ENGINE_TYPE_LASH;
 	else if (!strcasecmp(str, "dor"))
 		return OSM_ROUTING_ENGINE_TYPE_DOR;
+	else if (!strcasecmp(str, "torus-2QoS"))
+		return OSM_ROUTING_ENGINE_TYPE_TORUS_2QOS;
 	else
 		return OSM_ROUTING_ENGINE_TYPE_UNKNOWN;
 }
-- 
1.5.6.GIT



* [PATCH 07/11] opensm: Add opensm option to specify file name for extra torus-2QoS configuration information.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (4 preceding siblings ...)
  2009-11-20 19:15   ` [PATCH 06/11] opensm: Enable torus-2QoS routing engine Jim Schutt
@ 2009-11-20 19:15   ` Jim Schutt
  2009-11-20 19:15   ` [PATCH 08/11] opensm: Do not require -Q option for torus-2QoS routing engine Jim Schutt
                     ` (17 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:15 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_base.h   |   18 ++++++++++++++++++
 opensm/include/opensm/osm_subnet.h |    5 +++++
 opensm/opensm/main.c               |    8 ++++++++
 opensm/opensm/osm_subnet.c         |    1 +
 opensm/opensm/osm_ucast_torus.c    |    2 +-
 5 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h
index 9d8bf98..0a90ba8 100644
--- a/opensm/include/opensm/osm_base.h
+++ b/opensm/include/opensm/osm_base.h
@@ -278,6 +278,24 @@ BEGIN_C_DECLS
 #endif /* __WIN__ */
 /***********/
 
+/****d* OpenSM: Base/OSM_DEFAULT_TORUS_CONF_FILE
+* NAME
+*	OSM_DEFAULT_TORUS_CONF_FILE
+*
+* DESCRIPTION
+*	Specifies the default file name for extra torus-2QoS configuration
+*
+* SYNOPSIS
+*/
+#ifdef __WIN__
+#define OSM_DEFAULT_TORUS_CONF_FILE strcat(GetOsmCachePath(), "osm-torus-2QoS.conf")
+#elif defined(OPENSM_CONFIG_DIR)
+#define OSM_DEFAULT_TORUS_CONF_FILE OPENSM_CONFIG_DIR "/torus-2QoS.conf"
+#else
+#define OSM_DEFAULT_TORUS_CONF_FILE "/etc/opensm/torus-2QoS.conf"
+#endif /* __WIN__ */
+/***********/
+
 /****d* OpenSM: Base/OSM_DEFAULT_PREFIX_ROUTES_FILE
 * NAME
 *	OSM_DEFAULT_PREFIX_ROUTES_FILE
diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index c303e86..6350dfb 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -200,6 +200,7 @@ typedef struct osm_subn_opt {
 	char *ids_guid_file;
 	char *guid_routing_order_file;
 	char *sa_db_file;
+	char *torus_conf_file;
 	boolean_t do_mesh_analysis;
 	boolean_t exit_on_fatal;
 	boolean_t honor_guid2lid_file;
@@ -411,6 +412,10 @@ typedef struct osm_subn_opt {
 *	sa_db_file
 *		Name of the SA database file.
 *
+*	torus_conf_file
+*		Name of the file with extra configuration info for torus-2QoS
+*		routing engine.
+*
 *	exit_on_fatal
 *		If TRUE (default) - SM will exit on fatal subnet initialization
 *		issues.
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 18efde1..488327c 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -231,6 +231,10 @@ static void show_usage(void)
 	       "          Set the order port guids will be routed for the MinHop\n"
 	       "          and Up/Down routing algorithms to the guids provided in the\n"
 	       "          given file (one to a line)\n\n");
+	printf("--torus_config <path to file>\n"
+	       "          This option defines the file name for the extra configuration\n"
+	       "          info needed for the torus-2QoS routing engine.   The default\n"
+	       "          name is \'"OSM_DEFAULT_TORUS_CONF_FILE"\'\n\n");
 	printf("--once, -o\n"
 	       "          This option causes OpenSM to configure the subnet\n"
 	       "          once, then exit.  Ports remain in the ACTIVE state.\n\n");
@@ -607,6 +611,7 @@ int main(int argc, char *argv[])
 		{"lash_start_vl", 1, NULL, 6},
 		{"sm_sl", 1, NULL, 7},
 		{"retries", 1, NULL, 8},
+		{"torus_config", 1, NULL, 9},
 		{NULL, 0, NULL, 0}	/* Required at the end of the array */
 	};
 
@@ -985,6 +990,9 @@ int main(int argc, char *argv[])
 			printf(" Transaction retries = %u\n",
 			       opt.transaction_retries);
 			break;
+		case 9:
+			SET_STR_OPT(opt.torus_conf_file, optarg);
+			break;
 		case 'h':
 		case '?':
 		case ':':
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 19ba730..c9bb20c 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -747,6 +747,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
 	p_opt->ids_guid_file = NULL;
 	p_opt->guid_routing_order_file = NULL;
 	p_opt->sa_db_file = NULL;
+	p_opt->torus_conf_file = strdup(OSM_DEFAULT_TORUS_CONF_FILE);
 	p_opt->do_mesh_analysis = FALSE;
 	p_opt->exit_on_fatal = TRUE;
 	p_opt->enable_quirks = FALSE;
diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index 149189f..6fff73e 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -8573,7 +8573,7 @@ int torus_build_lfts(void *context)
 	torus->osm = ctx->osm;
 	fabric->osm = ctx->osm;
 
-	if (!parse_config(OPENSM_CONFIG_DIR "/opensm-torus.conf",
+	if (!parse_config(ctx->osm->subn.opt.torus_conf_file,
 			  fabric, torus))
 		goto out;
 
-- 
1.5.6.GIT



* [PATCH 08/11] opensm: Do not require -Q option for torus-2QoS routing engine.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (5 preceding siblings ...)
  2009-11-20 19:15   ` [PATCH 07/11] opensm: Add opensm option to specify file name for extra torus-2QoS configuration information Jim Schutt
@ 2009-11-20 19:15   ` Jim Schutt
  2009-11-20 19:15   ` [PATCH 09/11] opensm: Make it possible to configure no fallback " Jim Schutt
                     ` (16 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:15 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

The torus-2QoS engine provides deadlock-free routing for a 2D/3D torus,
but requires that switch SL2VL maps be programmed.  Before this change,
"opensm -Q" was required for that to happen.

When a routing engine sets the struct osm_routing_engine:update_sl2vl
pointer, it is signalling its intent to participate in SL2VL map programming.
So, don't return early from osm_qos_setup() in that case; instead do everything
except attempt to read QoS configuration information.

For that to work properly, the default QoS configuration information must
always be set up, not just when QoS is requested via -Q.

With that in place, the -Q option now means the same thing to torus-2QoS that
it means to other routing engines: QoS configuration is requested.

Without -Q, torus-2QoS confines its unicast traffic to SLs 8-15, leaving
SL 0 free, e.g. for multicast.  This is useful until such time as
torus-2QoS can be extended to implement a spanning tree for multicast that
will not deadlock against the routing used for unicast.
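
To make the SL layout concrete, here is a minimal sketch of how a path SL
could be assembled.  It assumes that SL bits 0-2 carry the per-dimension
dateline crossings and that bit 3 carries the QoS level, which is consistent
with the 8-15 range mentioned above.  compose_path_sl() and SL_QOS_BIT are
hypothetical names, and reading the QoS level from bit 3 of the hint is an
assumption about what sl_get_qos() does, not something this patch shows.

```c
#include <stdint.h>

/* Assumed: the one SL bit left free on a 3D torus. */
#define SL_QOS_BIT 3

/*
 * Hypothetical helper: build a path SL from three dateline-crossing
 * flags, the SL hint computed by the SA, and whether QoS was requested.
 */
static uint8_t compose_path_sl(const int crosses_dateline[3],
			       uint8_t path_sl_hint, int qos_enabled)
{
	uint8_t sl = 0;
	int d;

	/* One bit per torus dimension: set if the path crosses the
	 * dateline in that dimension. */
	for (d = 0; d < 3; d++)
		sl |= (uint8_t)((crosses_dateline[d] ? 1U : 0U) << d);

	if (qos_enabled)
		/* Keep the QoS level requested through the hint. */
		sl |= (uint8_t)(path_sl_hint & (1U << SL_QOS_BIT));
	else
		/* No QoS requested: force the SL into the 8-15 range. */
		sl |= (uint8_t)(1U << SL_QOS_BIT);

	return sl;
}
```

With QoS disabled, every SL this sketch returns is between 8 and 15, so
SL 0 is never handed out for unicast paths.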

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_qos.c         |    7 +++++--
 opensm/opensm/osm_subnet.c      |   18 +++++++++---------
 opensm/opensm/osm_ucast_torus.c |   24 +++++++++++++++++++++++-
 3 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index f42c334..0f0b24f 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -288,7 +288,9 @@ int osm_qos_setup(osm_opensm_t * p_osm)
 	int ret = 0;
 	uint8_t i;
 
-	if (!p_osm->subn.opt.qos)
+	if (!(p_osm->subn.opt.qos ||
+	      (p_osm->routing_engine_used &&
+	       p_osm->routing_engine_used->update_sl2vl)))
 		return 0;
 
 	OSM_LOG_ENTER(&p_osm->log);
@@ -305,7 +307,8 @@ int osm_qos_setup(osm_opensm_t * p_osm)
 	cl_plock_excl_acquire(&p_osm->lock);
 
 	/* read QoS policy config file */
-	osm_qos_parse_policy_file(&p_osm->subn);
+	if (p_osm->subn.opt.qos)
+		osm_qos_parse_policy_file(&p_osm->subn);
 
 	p_tbl = &p_osm->subn.port_guid_tbl;
 	p_next = cl_qmap_head(p_tbl);
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index c9bb20c..cc81545 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -1044,6 +1044,8 @@ static void subn_verify_qos_set(osm_qos_options_t *set, const char *prefix,
 
 int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
 {
+	osm_qos_options_t dflt;
+
 	if (p_opts->lmc > 7) {
 		log_report(" Invalid Cached Option Value:lmc = %u:"
 			   "Using Default:%u\n", p_opts->lmc, OSM_DEFAULT_LMC);
@@ -1087,17 +1089,15 @@ int osm_subn_verify_config(IN osm_subn_opt_t * p_opts)
 		p_opts->console = OSM_DEFAULT_CONSOLE;
 	}
 
-	if (p_opts->qos) {
-		osm_qos_options_t dflt;
-
-		/* the default options in qos_options must be correct.
-		 * every other one need not be, b/c those will default
-		 * back to whatever is in qos_options.
-		 */
 
-		subn_set_default_qos_options(&dflt);
+	/* the default options in qos_options must be correct.
+	 * every other one need not be, b/c those will default
+	 * back to whatever is in qos_options.
+	 */
+	subn_set_default_qos_options(&dflt);
+	subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt);
 
-		subn_verify_qos_set(&p_opts->qos_options, "qos", &dflt);
+	if (p_opts->qos) {
 		subn_verify_qos_set(&p_opts->qos_ca_options, "qos_ca",
 				    &p_opts->qos_options);
 		subn_verify_qos_set(&p_opts->qos_sw0_options, "qos_sw0",
diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index 6fff73e..8eb2880 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -298,6 +298,7 @@ struct torus {
 #define Z_MESH (1U << 2)
 #define MSG_DEADLOCK (1U << 29)
 #define NOTIFY_CHANGES (1U << 30)
+#define QOS_ENABLED (1U << 31)
 
 #define ALL_MESH(flags) \
 	((flags & (X_MESH | Y_MESH | Z_MESH)) == (X_MESH | Y_MESH | Z_MESH))
@@ -8548,7 +8549,25 @@ uint8_t torus_path_sl(void *context, uint8_t path_sl_hint,
 	sl  = sl_set_use_loop_vl(use_vl1(ssw->i, dsw->i, t->x_sz), 0);
 	sl |= sl_set_use_loop_vl(use_vl1(ssw->j, dsw->j, t->y_sz), 1);
 	sl |= sl_set_use_loop_vl(use_vl1(ssw->k, dsw->k, t->z_sz), 2);
-	sl |= sl_set_qos(sl_get_qos(path_sl_hint));
+
+	/*
+	 * If QoS was not requested by user, force path SLs into 8-15 range.
+	 * This leaves SL 0 available for multicast, and SL2VL mappings
+	 * will keep multicast traffic from deadlocking with unicast traffic.
+	 *
+	 * However, multicast might still deadlock against itself if multiple
+	 * multicast groups each use their own spanning tree.
+	 *
+	 * FIXME: it is possible to construct a spanning tree that can
+	 * overlay the DOR routing used for unicast in a way that multicast
+	 * and unicast can share VLs but cannot deadlock against each other.
+	 * Need to implement that and cause it to be used whenever the
+	 * torus-2QoS routing engine is used.
+	 */
+	if (t->flags & QOS_ENABLED)
+		sl |= sl_set_qos(sl_get_qos(path_sl_hint));
+	else
+		sl |= sl_set_qos(1);
 out:
 	return sl;
 }
@@ -8570,6 +8589,9 @@ int torus_build_lfts(void *context)
 			"Error: allocating torus: %s\n", strerror(errno));
 		goto out;
 	}
+	if (ctx->osm->subn.opt.qos)
+		torus->flags |= QOS_ENABLED;
+
 	torus->osm = ctx->osm;
 	fabric->osm = ctx->osm;
 
-- 
1.5.6.GIT



* [PATCH 09/11] opensm: Make it possible to configure no fallback routing engine.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (6 preceding siblings ...)
  2009-11-20 19:15   ` [PATCH 08/11] opensm: Do not require -Q option for torus-2QoS routing engine Jim Schutt
@ 2009-11-20 19:15   ` Jim Schutt
       [not found]     ` <1258744509-11148-9-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2009-11-20 19:15   ` [PATCH 10/11] opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of osm_port_t:priv Jim Schutt
                     ` (15 subsequent siblings)
  23 siblings, 1 reply; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:15 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

For a fabric that requires a routing engine with special properties,
e.g. one that avoids credit loops by making use of SLs in routing, it might
be preferable not to fall back to minhop if the configured routing engine
fails.

E.g. the torus-2QoS routing engine uses both SL2VL maps and path SL values
to provide routing free of credit loops, but cannot route fabrics for
some patterns of failed switches.  Should a switch failure create such
a pattern, it may be preferable to keep the previous routing information
loaded in the switches until the failed switch is replaced and torus-2QoS
is again able to route the fabric.

The alternative, having some other engine route the fabric, will immediately
introduce credit loops.
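
The resulting decision can be summarized with the following sketch.  The
enum and pick_routing_outcome() are hypothetical names used only for
illustration; the real logic lives in osm_ucast_mgr_process() and works on
p_osm->routing_engine_used and the no_fallback_routing_engine option.

```c
#include <stdbool.h>

/* Hypothetical summary of what happens after the configured engines ran. */
enum routing_outcome {
	ROUTED_BY_CONFIGURED_ENGINE,	/* a configured engine succeeded */
	ROUTED_BY_MINHOP_FALLBACK,	/* all failed, minhop took over */
	ROUTING_FAILED_KEEP_PREVIOUS	/* all failed, tables left alone */
};

static enum routing_outcome pick_routing_outcome(bool configured_engine_ok,
						 bool no_fallback)
{
	if (configured_engine_ok)
		return ROUTED_BY_CONFIGURED_ENGINE;

	/* Default behavior: fall back to minhop. */
	if (!no_fallback)
		return ROUTED_BY_MINHOP_FALLBACK;

	/* With no_fallback set, nothing new is programmed and the sweep
	 * is flagged as a subnet initialization error. */
	return ROUTING_FAILED_KEEP_PREVIOUS;
}
```

In the patch, the third case is reached when "no_fallback" appears among
the configured routing engine names and no listed engine can route the
fabric.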

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_subnet.h |    1 +
 opensm/opensm/osm_opensm.c         |    5 +++++
 opensm/opensm/osm_ucast_mgr.c      |   23 +++++++++++++++--------
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index 6350dfb..3022143 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -214,6 +214,7 @@ typedef struct osm_subn_opt {
 	osm_qos_options_t qos_rtr_options;
 	boolean_t enable_quirks;
 	boolean_t no_clients_rereg;
+	boolean_t no_fallback_routing_engine;
 #ifdef ENABLE_OSM_PERF_MGR
 	boolean_t perfmgr;
 	boolean_t perfmgr_redir;
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index 7052d49..e7ef55c 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -159,6 +159,11 @@ static struct osm_routing_engine *setup_routing_engine(osm_opensm_t *osm,
 	struct osm_routing_engine *re;
 	const struct routing_engine_module *m;
 
+	if (!strcmp(name, "no_fallback")) {
+		osm->subn.opt.no_fallback_routing_engine = TRUE;
+		return NULL;
+	}
+
 	for (m = routing_modules; m->name && *m->name; m++) {
 		if (!strcmp(m->name, name)) {
 			re = malloc(sizeof(struct osm_routing_engine));
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index c29eb8f..f3cd379 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -976,7 +976,8 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 		p_routing_eng = p_routing_eng->next;
 	}
 
-	if (!p_osm->routing_engine_used) {
+	if (!p_osm->routing_engine_used &&
+	    p_osm->subn.opt.no_fallback_routing_engine != TRUE) {
 		/* If configured routing algorithm failed, use default MinHop */
 		struct osm_routing_engine *r = p_osm->default_routing_engine;
 
@@ -986,14 +987,20 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 		osm_ucast_mgr_set_fwd_tables(p_mgr);
 	}
 
-	OSM_LOG(p_mgr->p_log, OSM_LOG_INFO,
-		"%s tables configured on all switches\n",
-		osm_routing_engine_type_str(p_osm->
-					    routing_engine_used->type));
-
-	if (p_mgr->p_subn->opt.use_ucast_cache)
-		p_mgr->cache_valid = TRUE;
+	if (p_osm->routing_engine_used) {
+		OSM_LOG(p_mgr->p_log, OSM_LOG_INFO,
+			"%s tables configured on all switches\n",
+			osm_routing_engine_type_str(p_osm->
+						    routing_engine_used->type));
 
+		if (p_mgr->p_subn->opt.use_ucast_cache)
+			p_mgr->cache_valid = TRUE;
+	} else {
+		p_mgr->p_subn->subnet_initialization_error = TRUE;
+		OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR,
+			"No routing engine able to successfully configure "
+			"switch tables on current fabric\n");
+	}
 Exit:
 	CL_PLOCK_RELEASE(p_mgr->p_lock);
 	OSM_LOG_EXIT(p_mgr->p_log);
-- 
1.5.6.GIT



* [PATCH 10/11] opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of osm_port_t:priv.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (7 preceding siblings ...)
  2009-11-20 19:15   ` [PATCH 09/11] opensm: Make it possible to configure no fallback " Jim Schutt
@ 2009-11-20 19:15   ` Jim Schutt
  2009-11-20 19:15   ` [PATCH 11/11] opensm: Update documentation to describe torus-2QoS Jim Schutt
                     ` (14 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:15 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

Torus-2QoS makes persistent use of osm_port_t:priv to speed calculation
of path SL values.

It cannot clear osm_port_t:priv members when it tears down its persistent
data for the following reason: If a port is removed from the fabric, the
opensm core will delete the corresponding osm_port_t object, leaving
torus-2QoS holding a dangling reference.  Torus-2QoS then has a use-after-free
error when tearing down its persistent data if it tries to use its dangling
osm_port_t reference to clear the priv member.

When torus-2QoS is unable to route a fabric due to missing switches and
opensm is configured to fall back to minhop, havoc will ensue because
minhop uses a non-NULL osm_port_t:priv as a proxy for LMC > 0: it
assumes if osm_port_t:priv is non-NULL it can only be because
alloc_ports_priv() has been called.

Fix this up by always calling alloc_ports_priv(), and have it set
priv = NULL if LMC == 0.
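
The idea behind the fix can be sketched in isolation as follows.  struct
port and both functions are simplified, hypothetical stand-ins for
osm_port_t, alloc_ports_priv() and free_ports_priv(); the point is only
that a stale priv left behind by another engine is overwritten with NULL
for LMC == 0 ports instead of being mistaken for minhop's own allocation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parts of osm_port_t that matter here. */
struct port {
	unsigned lmc;
	void *priv;
};

/* Always visit every port; ports with LMC == 0 get priv reset to NULL. */
static int alloc_ports_priv_sketch(struct port *ports, size_t n)
{
	size_t i;
	int rc = 0;

	for (i = 0; i < n; i++) {
		if (!ports[i].lmc) {
			/* Drop whatever another engine left behind;
			 * it is not ours to free. */
			ports[i].priv = NULL;
			continue;
		}
		ports[i].priv = calloc((size_t)1 << ports[i].lmc,
				       sizeof(uint64_t));
		if (!ports[i].priv)
			rc = -1;	/* priv stays NULL for this port */
	}
	return rc;
}

/* Safe for every port, since priv is either NULL or our allocation. */
static void free_ports_priv_sketch(struct port *ports, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		free(ports[i].priv);
		ports[i].priv = NULL;
	}
}
```

Because the allocation pass now normalizes priv for every port, the
matching free pass can also run unconditionally, which is what the patch
does in ucast_mgr_process_tbl().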

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_mgr.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index f3cd379..1bb7a13 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -314,8 +314,10 @@ static void alloc_ports_priv(osm_ucast_mgr_t * mgr)
 	     item = cl_qmap_next(item)) {
 		port = (osm_port_t *) item;
 		lmc = ib_port_info_get_lmc(&port->p_physp->port_info);
-		if (!lmc)
+		if (!lmc) {
+			port->priv = NULL;
 			continue;
+		}
 		r = malloc(sizeof(*r) + sizeof(r->guids[0]) * (1 << lmc));
 		if (!r) {
 			OSM_LOG(mgr->p_log, OSM_LOG_ERROR, "ERR 3A09: "
@@ -362,8 +364,7 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
 	/* Initialize LIDs in buffer to invalid port number. */
 	memset(p_sw->new_lft, OSM_NO_PATH, p_sw->max_lid_ho + 1);
 
-	if (p_mgr->p_subn->opt.lmc)
-		alloc_ports_priv(p_mgr);
+	alloc_ports_priv(p_mgr);
 
 	/*
 	   Iterate through every port setting LID routes for each
@@ -380,8 +381,7 @@ static void ucast_mgr_process_tbl(IN cl_map_item_t * p_map_item,
 		}
 	}
 
-	if (p_mgr->p_subn->opt.lmc)
-		free_ports_priv(p_mgr);
+	free_ports_priv(p_mgr);
 
 	OSM_LOG_EXIT(p_mgr->p_log);
 }
-- 
1.5.6.GIT



* [PATCH 11/11] opensm: Update documentation to describe torus-2QoS.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (8 preceding siblings ...)
  2009-11-20 19:15   ` [PATCH 10/11] opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of osm_port_t:priv Jim Schutt
@ 2009-11-20 19:15   ` Jim Schutt
  2009-11-20 19:24   ` [PATCH 05/11] opensm: Add torus-2QoS routing engine Jim Schutt
                     ` (13 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:15 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/doc/current-routing.txt |  154 +++++++++++++++++++++++++++++++++++++++-
 opensm/man/opensm.8.in         |    9 ++-
 2 files changed, 160 insertions(+), 3 deletions(-)

diff --git a/opensm/doc/current-routing.txt b/opensm/doc/current-routing.txt
index 1302860..141d793 100644
--- a/opensm/doc/current-routing.txt
+++ b/opensm/doc/current-routing.txt
@@ -1,7 +1,7 @@
 Current OpenSM Routing
-7/9/07
+10/9/09
 
-OpenSM offers five routing engines:
+OpenSM offers six routing engines:
 
 1.  Min Hop Algorithm - based on the minimum hops to each node where the
 path length is optimized.
@@ -28,6 +28,13 @@ two switches.  This provides deadlock free routes for hypercubes when
 the fabric is cabled as a hypercube and for meshes when cabled as a
 mesh (see details below).
 
+6. Torus-2QoS unicast routing algorithm - a DOR-based routing algorithm
+specialized for 2D/3D torus topologies.  Torus-2QoS provides deadlock-free
+routing while supporting two quality of service (QoS) levels.  In addition
+it is able to route around multiple failed fabric links or a single failed
+fabric switch without introducing deadlocks, and without changing path SL
+values granted before the failure.
+
 OpenSM provides an optional unicast routing cache (enabled by -A or
 --ucast_cache options). When enabled, unicast routing cache prevents
 routing recalculation (which is a heavy task in a large cluster) when
@@ -388,3 +395,146 @@ ports, one port on one end of the cable, and the other port on the
 other end, continuing along the mesh dimension.
 
 Use '-R dor' option to activate the DOR algorithm.
+
+Torus-2QoS Routing Algorithm
+----------------------------
+
+Torus-2QoS is a routing algorithm designed for large-scale 2D/3D torus fabrics.
+
+It is a DOR-based algorithm that avoids deadlocks that would otherwise
+occur in a torus using the concept of a dateline for each torus dimension.
+It encodes into a path SL which datelines the path crosses as follows:
+
+  sl = 0;
+  for (d = 0; d < torus_dimensions; d++)
+    /* path_crosses_dateline(d) returns 0 or 1 */
+    sl |= path_crosses_dateline(d) << d;
+
+For a 3D torus, that leaves one SL bit free, which torus-2QoS uses to
+implement two QoS levels.
+
+This is possible because torus-2QoS also makes use of the output port
+dependence of the switch SL2VL maps.  It computes in which torus coordinate
+direction each interswitch link "points", and writes SL2VL maps for such
+ports as follows:
+
+  for (sl = 0; sl < 16; sl ++)
+    /* cdir(port) reports which torus coordinate direction a switch port
+     * "points" in, and returns 0, 1, or 2 */
+    sl2vl(iport,oport,sl) = 0x1 & (sl >> cdir(oport));
+
+Thus torus-2QoS consumes 8 SL values (SL bits 0-2) and 2 VL values (VL bit 0)
+per QoS level to provide deadlock-free routing on a 3D torus.
+
+Torus-2QoS routes around link failure by "taking the long way around" any
+1D ring interrupted by a link failure.  For example, consider the 2D 6x5
+torus below, where switches are denoted by [+a-zA-Z]:
+
+        |    |    |    |    |    |
+   4  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+   3  --+----+----+----D----+----+--
+        |    |    |    |    |    |
+   2  --+----+----I----r----+----+--
+        |    |    |    |    |    |
+   1  --m----S----n----T----o----p--
+        |    |    |    |    |    |
+ y=0  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+
+      x=0    1    2    3    4    5
+
+For a pristine fabric the path from S to D would be S-n-T-r-D.  In the
+event that either link S-n or n-T has failed, torus-2QoS would use the path
+S-m-p-o-T-r-D.  Note that it can do this without changing the path SL
+value; once the 1D ring m-S-n-T-o-p-m has been broken by failure, path
+segments using it cannot contribute to deadlock, and the x-direction
+dateline (between, say, x=5 and x=0) can be ignored for path segments on
+that ring.
+
+One result of this is that torus-2QoS can route around many simultaneous
+link failures, as long as no 1D ring is broken into disjoint regions.  For
+example, if links n-T and T-o have both failed, that ring has been broken
+into two disjoint regions, T and o-p-m-S-n.  Torus-2QoS checks for such
+issues, reports if they are found, and refuses to route such fabrics.
+
+Handling a failed switch under DOR requires introducing into a path at
+least one turn that would be otherwise "illegal", i.e. not allowed by DOR
+rules.  Torus-2QoS will introduce such a turn as close as possible to the
+failed switch in order to route around it.
+
+In the above example, suppose switch T has failed, and consider the path
+from S to D.  Torus-2QoS will produce the path S-n-I-r-D, rather than the
+S-n-T-r-D path for a pristine torus, by introducing an early turn at n.
+For traffic arriving at switch I from n, normal DOR rules will generate an
+illegal turn in the path from S to D at I, and a legal turn at r.
+
+Torus-2QoS will also use the input port dependence of SL2VL maps to set VL
+bit 1 (which would be otherwise unused) for y-x, z-x, and z-y turns, i.e.,
+those turns that are illegal under DOR.  This causes the first hop after
+any such turn to use a separate set of VL values, and prevents deadlock in
+the presence of a single failed switch.
+
+For any given path, only the hops after a turn that is illegal under DOR
+can contribute to a credit loop that leads to deadlock.  So in the example
+above with failed switch T, the location of the illegal turn at I in the
+path from S to D requires that any credit loop caused by that turn must
+encircle the failed switch at T.  Thus the second and later hops after the
+illegal turn at I (i.e., hop r-D) cannot contribute to a credit loop
+because they cannot be used to construct a loop encircling T.  The hop I-r
+uses a separate VL, so it cannot contribute to a credit loop encircling T.
+
+Extending this argument shows that in addition to being capable of routing
+around a single switch failure without introducing deadlock, torus-2QoS can
+also route around multiple failed switches on the condition they are
+adjacent in the last dimension routed by DOR.  For example, consider the
+following case on a 6x6 2D torus:
+
+
+        |    |    |    |    |    |
+   5  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+   4  --+----+----+----D----+----+--
+        |    |    |    |    |    |
+   3  --+----+----I----u----+----+--
+        |    |    |    |    |    |
+   2  --+----+----q----R----+----+--
+        |    |    |    |    |    |
+   1  --m----S----n----T----o----p--
+        |    |    |    |    |    |
+ y=0  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+
+      x=0    1    2    3    4    5
+
+
+Suppose switches T and R have failed, and consider the path from S to D.
+Torus-2QoS will generate the path S-n-q-I-u-D, with an illegal turn at
+switch I, and with hop I-u using a VL with bit 1 set.
+
+As a further example, consider a case that torus-2QoS cannot route without
+deadlock: two failed switches adjacent in a dimension that is not the last
+dimension routed by DOR; here the failed switches are O and T:
+
+        |    |    |    |    |    |
+   5  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+   4  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+   3  --+----+----+----+----D----+--
+        |    |    |    |    |    |
+   2  --+----+----I----q----r----+--
+        |    |    |    |    |    |
+   1  --m----S----n----O----T----p--
+        |    |    |    |    |    |
+ y=0  --+----+----+----+----+----+--
+        |    |    |    |    |    |
+
+      x=0    1    2    3    4    5
+
+In a pristine fabric, torus-2QoS would generate the path from S to D as
+S-n-O-T-r-D.  With failed switches O and T, torus-2QoS will generate the
+path S-n-I-q-r-D, with illegal turn at switch I, and with hop I-q using a
+VL with bit 1 set.  In contrast to the earlier examples, the second hop
+after the illegal turn, q-r, can be used to construct a credit loop
+encircling the failed switches.
diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in
index bd8ab4e..da016f1 100644
--- a/opensm/man/opensm.8.in
+++ b/opensm/man/opensm.8.in
@@ -630,7 +630,7 @@ compiling opensm with -DROUTER_EXP which has been obsoleted.
 
 .SH ROUTING
 .PP
-OpenSM now offers five routing engines:
+OpenSM now offers six routing engines:
 
 1.  Min Hop Algorithm - based on the minimum hops to each node where the
 path length is optimized.
@@ -659,6 +659,13 @@ two switches.  This provides deadlock free routes for hypercubes when
 the fabric is cabled as a hypercube and for meshes when cabled as a
 mesh (see details below).
 
+6. Torus-2QoS unicast routing algorithm - a DOR-based routing algorithm
+specialized for 2D/3D torus topologies.  Torus-2QoS provides deadlock-free
+routing while supporting two quality of service (QoS) levels.  In addition
+it is able to route around multiple failed fabric links or a single failed
+fabric switch without introducing deadlocks, and without changing path SL
+values granted before the failure.
+
 OpenSM also supports a file method which
 can load routes from a table. See \'Modular Routing Engine\' for more
 information on this.
-- 
1.5.6.GIT


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 05/11] opensm: Add torus-2QoS routing engine.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (9 preceding siblings ...)
  2009-11-20 19:15   ` [PATCH 11/11] opensm: Update documentation to describe torus-2QoS Jim Schutt
@ 2009-11-20 19:24   ` Jim Schutt
  2009-11-20 19:27   ` torus-2QoS example input files (was Re: [PATCH 00/11] Add new torus routing engine: torus-2QoS) Jim Schutt
                     ` (12 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:24 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w

[-- Attachment #1: Type: text/plain, Size: 526 bytes --]


This engine routes a 2D/3D torus without credit loops while providing two
quality-of-service levels.

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---

I've attached the patch as a compressed file, as otherwise
it is too large to make it through the list.

-- Jim

 opensm/opensm/Makefile.am       |    2 +-
 opensm/opensm/osm_ucast_torus.c | 8643 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 8644 insertions(+), 1 deletions(-)
 create mode 100644 opensm/opensm/osm_ucast_torus.c


[-- Attachment #2: 0005-opensm-Add-torus-2QoS-routing-engine.patch.bz2 --]
[-- Type: application/x-bzip, Size: 26863 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* torus-2QoS example input files (was Re: [PATCH 00/11] Add new torus routing engine: torus-2QoS)
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (10 preceding siblings ...)
  2009-11-20 19:24   ` [PATCH 05/11] opensm: Add torus-2QoS routing engine Jim Schutt
@ 2009-11-20 19:27   ` Jim Schutt
  2009-12-18 20:50   ` [PATCH 00/12] Add specialized multicast support to new torus routing engine: torus-2QoS Jim Schutt
                     ` (11 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-11-20 19:27 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w

[-- Attachment #1: Type: text/plain, Size: 403 bytes --]


The attached files can be used to test the torus-2QoS routing
engine using ibsim.

fabric-torus-5x5x5 contains a fabric description that ibsim can read.
Once ibsim is running, run opensm like this:

  opensm --config opensm.conf --torus_config torus-2QoS-5x5x5.conf
or 
  opensm --config opensm.conf --torus_config torus-2QoS-5x5x5.conf \
     -Q --qos_policy_file qos-policy-torus-5x5x5.conf

-- Jim


[-- Attachment #2: fabric-torus-5x5x5.bz2 --]
[-- Type: application/x-bzip, Size: 1648 bytes --]

[-- Attachment #3: opensm.conf --]
[-- Type: text/plain, Size: 1093 bytes --]


# Limit the maximal operational VLs
max_op_vls 8

# The number of seconds between subnet sweeps (0 disables it)
sweep_interval 10

# Routing engine
# Multiple routing engines can be specified separated by
# commas so that specific ordering of routing algorithms will
# be tried if earlier routing engines fail.
# Supported engines: minhop, updn, file, ftree, lash, dor, torus-2QoS
routing_engine torus-2QoS,no_fallback

# Use unicast routing cache (use FALSE if unsure)
use_ucast_cache TRUE

# Force flush of the log file after each log message
force_log_flush TRUE

# Log file to be used
log_file /dev/tty

# console [off|local|loopback|socket]
console loopback

# Telnet port for console (default 10000)
console_port 10000

# QoS default options
# Note that for OFED > 1.3, this information can also be in qos-policy.conf.
# However, it may be good to have it here also for torus-2QoS, as this will
# change the defaults even if not using QoS.
qos_max_vls 8
qos_high_limit 0
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0
qos_vlarb_low 0:64,1:64,2:64,3:64,4:64,5:64,6:64,7:64,8:64
qos_sl2vl (null)

[-- Attachment #4: qos-policy-torus-5x5x5.conf --]
[-- Type: text/plain, Size: 6879 bytes --]


# This is a QoS configuration for the torus-2QoS routing engine.
# As it supports only 2 levels of QoS, via SL bit 3, we should configure
# only SLs 0 and 8.  Based on that torus-2QoS will pick the appropriate
# SL value to provide deadlock-free routing for both QoS levels.

port-groups
    port-group
	name: Service_nodes
	port-name: "H_0_0_0_0/P1"	# E.g. admin
	port-name: "H_0_0_1_0/P1"	# E.g. NFS server
	port-name: "H_0_0_2_0/P1"	# E.g. boot server
	port-name: "H_0_0_3_0/P1"	# E.g. login node
    end-port-group

    port-group
	name: Lustre_nodes

	port-name: "H_0_0_4_0/P1"	# E.g. MDS

	port-name: "H_0_1_0_0/P1"	# E.g. OSS
	port-name: "H_0_1_1_0/P1"	# E.g. OSS
	port-name: "H_0_1_2_0/P1"	# E.g. OSS
	port-name: "H_0_1_3_0/P1"	# E.g. OSS
	port-name: "H_0_1_4_0/P1"	# E.g. OSS
    end-port-group

    port-group
	name: Compute_nodes

	port-name: "H_0_2_0_0/P1"
	port-name: "H_0_2_1_0/P1"
	port-name: "H_0_2_2_0/P1"
	port-name: "H_0_2_3_0/P1"
	port-name: "H_0_2_4_0/P1"

	port-name: "H_0_3_0_0/P1"
	port-name: "H_0_3_1_0/P1"
	port-name: "H_0_3_2_0/P1"
	port-name: "H_0_3_3_0/P1"
	port-name: "H_0_3_4_0/P1"

	port-name: "H_0_4_0_0/P1"
	port-name: "H_0_4_1_0/P1"
	port-name: "H_0_4_2_0/P1"
	port-name: "H_0_4_3_0/P1"
	port-name: "H_0_4_4_0/P1"

	port-name: "H_1_0_0_0/P1"
	port-name: "H_1_0_1_0/P1"
	port-name: "H_1_0_2_0/P1"
	port-name: "H_1_0_3_0/P1"
	port-name: "H_1_0_4_0/P1"

	port-name: "H_1_1_0_0/P1"
	port-name: "H_1_1_1_0/P1"
	port-name: "H_1_1_2_0/P1"
	port-name: "H_1_1_3_0/P1"
	port-name: "H_1_1_4_0/P1"

	port-name: "H_1_2_0_0/P1"
	port-name: "H_1_2_1_0/P1"
	port-name: "H_1_2_2_0/P1"
	port-name: "H_1_2_3_0/P1"
	port-name: "H_1_2_4_0/P1"

	port-name: "H_1_3_0_0/P1"
	port-name: "H_1_3_1_0/P1"
	port-name: "H_1_3_2_0/P1"
	port-name: "H_1_3_3_0/P1"
	port-name: "H_1_3_4_0/P1"

	port-name: "H_1_4_0_0/P1"
	port-name: "H_1_4_1_0/P1"
	port-name: "H_1_4_2_0/P1"
	port-name: "H_1_4_3_0/P1"
	port-name: "H_1_4_4_0/P1"

	port-name: "H_2_0_0_0/P1"
	port-name: "H_2_0_1_0/P1"
	port-name: "H_2_0_2_0/P1"
	port-name: "H_2_0_3_0/P1"
	port-name: "H_2_0_4_0/P1"

	port-name: "H_2_1_0_0/P1"
	port-name: "H_2_1_1_0/P1"
	port-name: "H_2_1_2_0/P1"
	port-name: "H_2_1_3_0/P1"
	port-name: "H_2_1_4_0/P1"

	port-name: "H_2_2_0_0/P1"
	port-name: "H_2_2_1_0/P1"
	port-name: "H_2_2_2_0/P1"
	port-name: "H_2_2_3_0/P1"
	port-name: "H_2_2_4_0/P1"

	port-name: "H_2_3_0_0/P1"
	port-name: "H_2_3_1_0/P1"
	port-name: "H_2_3_2_0/P1"
	port-name: "H_2_3_3_0/P1"
	port-name: "H_2_3_4_0/P1"

	port-name: "H_2_4_0_0/P1"
	port-name: "H_2_4_1_0/P1"
	port-name: "H_2_4_2_0/P1"
	port-name: "H_2_4_3_0/P1"
	port-name: "H_2_4_4_0/P1"

	port-name: "H_3_0_0_0/P1"
	port-name: "H_3_0_1_0/P1"
	port-name: "H_3_0_2_0/P1"
	port-name: "H_3_0_3_0/P1"
	port-name: "H_3_0_4_0/P1"

	port-name: "H_3_1_0_0/P1"
	port-name: "H_3_1_1_0/P1"
	port-name: "H_3_1_2_0/P1"
	port-name: "H_3_1_3_0/P1"
	port-name: "H_3_1_4_0/P1"

	port-name: "H_3_2_0_0/P1"
	port-name: "H_3_2_1_0/P1"
	port-name: "H_3_2_2_0/P1"
	port-name: "H_3_2_3_0/P1"
	port-name: "H_3_2_4_0/P1"

	port-name: "H_3_3_0_0/P1"
	port-name: "H_3_3_1_0/P1"
	port-name: "H_3_3_2_0/P1"
	port-name: "H_3_3_3_0/P1"
	port-name: "H_3_3_4_0/P1"

	port-name: "H_3_4_0_0/P1"
	port-name: "H_3_4_1_0/P1"
	port-name: "H_3_4_2_0/P1"
	port-name: "H_3_4_3_0/P1"
	port-name: "H_3_4_4_0/P1"

	port-name: "H_4_0_0_0/P1"
	port-name: "H_4_0_1_0/P1"
	port-name: "H_4_0_2_0/P1"
	port-name: "H_4_0_3_0/P1"
	port-name: "H_4_0_4_0/P1"

	port-name: "H_4_1_0_0/P1"
	port-name: "H_4_1_1_0/P1"
	port-name: "H_4_1_2_0/P1"
	port-name: "H_4_1_3_0/P1"
	port-name: "H_4_1_4_0/P1"

	port-name: "H_4_2_0_0/P1"
	port-name: "H_4_2_1_0/P1"
	port-name: "H_4_2_2_0/P1"
	port-name: "H_4_2_3_0/P1"
	port-name: "H_4_2_4_0/P1"

	port-name: "H_4_3_0_0/P1"
	port-name: "H_4_3_1_0/P1"
	port-name: "H_4_3_2_0/P1"
	port-name: "H_4_3_3_0/P1"
	port-name: "H_4_3_4_0/P1"

	port-name: "H_4_4_0_0/P1"
	port-name: "H_4_4_1_0/P1"
	port-name: "H_4_4_2_0/P1"
	port-name: "H_4_4_3_0/P1"
	port-name: "H_4_4_4_0/P1"
    end-port-group

    port-group
	name: All_ports
	node-type: ALL
    end-port-group
end-port-groups

#
# The default VL arbitration setup will not be quite right for
# torus-2QoS, so set up something more appropriate.
#
# All the SLs for a given QoS level need to have equal traffic priority.
# Since SLs 0-7 map to VLs 0-3, and SLs 8-15 map to VLs 4-7, we need 
# equal VL arbitration weightings in each of those VL ranges.
#
# OFED 1.3 doesn't use this information, just parses and drops it on the floor,
# so it needs to be repeated in opensm.conf.  Putting it in opensm.conf has
# the added benefit that the defaults can be set and used even if QoS isn't
# configured.
#
qos-setup
    vlarb-tables
	vlarb-scope
	    group: All_ports
	    across: All_ports

	    vl-high-limit: 0

	    vlarb-high: 0:0
	    vlarb-high: 1:0
	    vlarb-high: 2:0
	    vlarb-high: 3:0
	    vlarb-high: 4:0
	    vlarb-high: 5:0
	    vlarb-high: 6:0
	    vlarb-high: 7:0
	    vlarb-high: 8:0
	    vlarb-high: 9:0
	    vlarb-high: 10:0
	    vlarb-high: 11:0
	    vlarb-high: 12:0
	    vlarb-high: 13:0
	    vlarb-high: 14:0

	    vlarb-low: 0:64
	    vlarb-low: 1:64
	    vlarb-low: 2:64
	    vlarb-low: 3:64
	    vlarb-low: 4:64
	    vlarb-low: 5:64
	    vlarb-low: 6:64
	    vlarb-low: 7:64
	    vlarb-low: 8:64
	    vlarb-low: 9:64
	    vlarb-low: 10:64
	    vlarb-low: 11:64
	    vlarb-low: 12:64
	    vlarb-low: 13:64
	    vlarb-low: 14:64
	end-vlarb-scope
    end-vlarb-tables
end-qos-setup

#
# We don't explicitly use the qos-class keyword in qos-match-rule, because
# we don't have any control over how apps will specify qos-class in path
# queries, and we don't want rule matching failures due to wrong qos-class
# values in queries.
#
qos-levels
    qos-level
	name: DEFAULT
	sl: 0
    end-qos-level

    # By assigning Lustre and MPI traffic to different SLs (and thus 
    # different VLs) we keep MPI and Lustre from starving each other.
    qos-level
	name: Lustre
	sl: 0
    end-qos-level

    qos-level
	name: MPI
	sl: 8
    end-qos-level
end-qos-levels

#
# For the purposes of QoS configuration, MPI is not a supported ULP.
# Need to use port group match rules to get MPI to request SL 8.
#
qos-ulps
    ipoib : 0
    default : 0
end-qos-ulps

#
# Note that the first matching rule is used to assign the qos-level-name
# used to choose the SL to send on, and that anything that doesn't match
# one of the rules below will be assigned to the DEFAULT qos-level.
#
qos-match-rules
    qos-match-rule
	source: Compute_nodes
	destination: Compute_nodes
	qos-level-name: MPI
    end-qos-match-rule

    qos-match-rule
	source: Lustre_nodes
	qos-level-name: Lustre
    end-qos-match-rule

    qos-match-rule
	destination: Lustre_nodes
	qos-level-name: Lustre
    end-qos-match-rule

    # Note that anything that doesn't match one of the above rules
    # will be assigned to the DEFAULT qos-level.
end-qos-match-rules

[-- Attachment #5: torus-2QoS-5x5x5.conf --]
[-- Type: text/plain, Size: 1680 bytes --]

# We want the torus routing engine to attempt to find a
# 5x5x5 torus in the fabric:
torus 5 5 5

# We need to tell the routing engine what directions we
# want the torus coordinate directions to be, by specifying
# the endpoints (switch GUID + port) of a link in each
# direction.  Here we specify positive coordinate directions:
xp_link 0x200000 1 0x200019 2  # S_0_0_0/P1 -> S_1_0_0/P2
yp_link 0x200000 3 0x200005 4  # S_0_0_0/P3 -> S_0_1_0/P4
zp_link 0x200000 5 0x200001 6  # S_0_0_0/P5 -> S_0_0_1/P6

# If one of the above switches were to fail, the routing
# engine would not have sufficient information to locate the
# torus in the fabric.  Specify a backup origin here:

next_origin
xp_link 0x20001f 1 0x200038 2  # S_1_1_1/P1 -> S_2_1_1/P2
yp_link 0x20001f 3 0x200024 4  # S_1_1_1/P3 -> S_1_2_1/P4
zp_link 0x20001f 5 0x200020 6  # S_1_1_1/P5 -> S_1_1_2/P6

# The torus routing engine uses the concept of a dateline,
# where a coordinate wraps from its maximum back to zero,
# in order to compute path SL values that provide routing
# that is free from credit loops.
#
# If it is forced by a failed switch to use the backup
# origin specification, that would cause the datelines
# to move, which would change many path SL values, which
# defeats one of the main benefits of this routing engine.
# So, describe the position of the original datelines
# relative to the backup origin as follows:
x_dateline -1
y_dateline -1
z_dateline -1

# You can specify as many backup origins as you like, but
# in practice, the torus routing engine is only guaranteed
# to be able to route around a single failed switch without
# introducing credit loops, so one backup origin is enough.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 00/12] Add specialized multicast support to new torus routing engine: torus-2QoS
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (11 preceding siblings ...)
  2009-11-20 19:27   ` torus-2QoS example input files (was Re: [PATCH 00/11] Add new torus routing engine: torus-2QoS) Jim Schutt
@ 2009-12-18 20:50   ` Jim Schutt
       [not found]     ` <1261169461-2516-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2009-12-18 20:50   ` [PATCH 01/12] opensm: Make error message for torus-2QoS dateline specification match code check Jim Schutt
                     ` (10 subsequent siblings)
  23 siblings, 1 reply; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:50 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

This patch series adds specialized multicast support to torus-2QoS,
a new routing engine designed to handle large fabrics connected with a
2D/3D torus topology.  The original patch series adding torus-2QoS
can be found here: http://www.spinics.net/lists/linux-rdma/msg01438.html

This patch series is intended to be applied on top of the previous series.
Patches 1-7 are cleanup patches that fix issues in the previous series
that were discovered during development of this series.  Patches 8-9 are
preparation patches that make it possible for a routing engine to 
generate spanning trees for multicast groups.  Patches 10-12 implement
the specialized multicast spanning tree support required by torus-2QoS.

As described for the previous patch series, the torus-2QoS engine can
provide the following functionality on a 2D/3D torus:
- routing that is free of credit loops
- two levels of QoS, assuming switches support 8 data VLs
- ability to route around a single failed switch, and/or multiple failed
    links, without
    - introducing credit loops
    - changing path SL values
- very short run times, with good scaling properties as fabric size
    increases

However, in order to provide this functionality, torus-2QoS must employ
all 4 available SL bits, and 3 data VL bits.  Thus, there are no available
resources on which to confine multicast routing, and multicast spanning
trees must be constructed to overlay unicast routes in such a way that
no credit loops are possible.  This patch set implements that, and provides
the above functionality for all fabrics for which torus-2QoS can generate
unicast routes which are free of credit loops.

The last patch in the series updates opensm/doc/current-routing.txt with
a description of how torus-2QoS generates spanning trees with the 
desired properties.


Jim Schutt (12):
  opensm: Make error message for torus-2QoS dateline specification
    match code check.
  opensm: torus-2QoS should fail to route if message deadlock is
    possible.
  opensm: Remove unused port specification from torus-2QoS config file
    parsing.
  opensm: Fix up some torus-2QoS comments to match code.
  opensm: Enforce torus-2QoS link ordering convention.
  opensm: Remove redundant function names in torus-2QoS logging.
  opensm: Make torus-2QoS always use OSM_LOG_INFO, never LOG_INFO.
  opensm: Add struct osm_routing_engine callback to build spanning
    trees for multicast.
  opensm: Make mcast_mgr_purge_tree() available outside
    osm_mcast_mgr.c.
  opensm: Implement master spanning tree for torus-2QoS multicast
    support.
  opensm: Implement multicast support for torus-2QoS.
  opensm: Update documentation to describe torus-2QoS multicast
    support.

 opensm/doc/current-routing.txt        |  121 ++++-
 opensm/include/opensm/osm_multicast.h |   33 ++
 opensm/include/opensm/osm_opensm.h    |    6 +
 opensm/opensm/osm_mcast_mgr.c         |   11 +-
 opensm/opensm/osm_ucast_torus.c       | 1001 +++++++++++++++++++++++++--------
 5 files changed, 931 insertions(+), 241 deletions(-)


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 01/12] opensm: Make error message for torus-2QoS dateline specification match code check.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (12 preceding siblings ...)
  2009-12-18 20:50   ` [PATCH 00/12] Add specialized multicast support to new torus routing engine: torus-2QoS Jim Schutt
@ 2009-12-18 20:50   ` Jim Schutt
  2009-12-18 20:50   ` [PATCH 02/12] opensm: torus-2QoS should fail to route if message deadlock is possible Jim Schutt
                     ` (9 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:50 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_torus.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index 8eb2880..7108394 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -954,7 +954,7 @@ bool parse_dir_dateline(int c_dir, struct torus *t, const char *parse_sep)
 	if ((*dl < 0 && *dl <= -max_dl) || *dl >= max_dl)
 		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
 			"Error: dateline value for coordinate direction %d "
-			"must be %d <= dl <= %d\n",
+			"must be %d < dl < %d\n",
 			c_dir, -max_dl, max_dl);
 	else
 		success = true;
-- 
1.5.6.GIT


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 02/12] opensm: torus-2QoS should fail to route if message deadlock is possible.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (13 preceding siblings ...)
  2009-12-18 20:50   ` [PATCH 01/12] opensm: Make error message for torus-2QoS dateline specification match code check Jim Schutt
@ 2009-12-18 20:50   ` Jim Schutt
  2009-12-18 20:50   ` [PATCH 03/12] opensm: Remove unused port specification from torus-2QoS config file parsing Jim Schutt
                     ` (8 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:50 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

The whole point of torus-2QoS is to provide deadlock-free routing
for a torus while enabling two quality of service levels.  The
ability to route around a failed switch provides a window to
repair the fabric with minimal impact to running applications.

So if the possibility of message deadlock is detected due to
the topology of missing switches, torus-2QoS should fail
to route.

Users of torus-2QoS can either configure multiple routing
algorithms, so another algorithm with different properties can
attempt to route the fabric, or configure no fallback algorithm
so that the last good torus-2QoS tables are left in the switches.

None of the alternatives are great:
- Having torus-2QoS route the fabric even though the missing
    switch topology allows message deadlock means applications
    may encounter poor performance due to message deadlock.
- Having another engine route the fabric means that any
    application that doesn't repath may trigger message deadlock
    due to inconsistencies between path SL values in use and
    path SL values required by the new engine for deadlock-free
    routing.
- Leaving the last good torus-2QoS tables in the switches means
    that traffic through the newly failed switch cannot be
    delivered.

It isn't clear which of these options has the least impact on
running applications, but the operational imperative is clear:
failures in a torus fabric routed with torus-2QoS need to be
repaired ASAP.

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_torus.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index 7108394..bc87757 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -7659,10 +7659,12 @@ bool routable_torus(struct torus *t, struct fabric *f)
 			}
 		}
 
-	if (t->flags & MSG_DEADLOCK)
+	if (t->flags & MSG_DEADLOCK) {
 		OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
-			"Warning: missing switch topology "
-			"==> message deadlock possible!\n");
+			"Error: missing switch topology "
+			"==> message deadlock!\n");
+		success = false;
+	}
 	return success;
 }
 
-- 
1.5.6.GIT


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 03/12] opensm: Remove unused port specification from torus-2QoS config file parsing.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (14 preceding siblings ...)
  2009-12-18 20:50   ` [PATCH 02/12] opensm: torus-2QoS should fail to route if message deadlock is possible Jim Schutt
@ 2009-12-18 20:50   ` Jim Schutt
       [not found]     ` <1261169461-2516-4-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2009-12-18 20:50   ` [PATCH 04/12] opensm: Fix up some torus-2QoS comments to match code Jim Schutt
                     ` (7 subsequent siblings)
  23 siblings, 1 reply; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:50 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

The torus-2QoS.conf file is used to provide persistent information
describing the torus topology that torus-2QoS should generate.
Only switch GUIDs are needed to specify torus coordinate direction
and dateline information - switch ports are not needed, and in
fact were discarded immediately after parsing, before this patch.

Just get rid of it.  While we're in the area, get rid of another
unused argument to the same parsing function.
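
As a rough illustration of GUID-only parsing, here is a minimal
stand-alone sketch.  It is not the code in osm_ucast_torus.c: the
helper names next_guid() and parse_link_line() are hypothetical, the
"keyword guid guid" line layout is assumed for the example, and the
sketch skips the torus context and the cl_hton64() byte swap that the
real parse_guid() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: take the next token of a line already being
 * tokenized with strtok() and convert it to a 64-bit GUID.
 */
static bool next_guid(uint64_t *guid, const char *sep)
{
	char *val = strtok(NULL, sep);

	if (!val)
		return false;
	*guid = strtoull(val, NULL, 0);	/* base 0: accepts 0x... and decimal */
	return true;
}

/*
 * Hypothetical helper: parse "<keyword> <guid0> <guid1>".  No port
 * numbers are expected or consumed.  Modifies 'line' via strtok().
 */
static bool parse_link_line(char *line, uint64_t *g0, uint64_t *g1)
{
	const char *sep = " \t\n";

	if (!strtok(line, sep))		/* skip the keyword */
		return false;
	return next_guid(g0, sep) && next_guid(g1, sep);
}
```

Only two switch GUIDs are read per link line in this sketch; a line
with a missing GUID is rejected rather than half-parsed.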

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_torus.c |   13 +++----------
 1 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index bc87757..42582ce 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -833,8 +833,7 @@ bool parse_pg_max_ports(struct torus *t, const char *parse_sep)
 }
 
 static
-bool parse_guid_port(struct torus *t, guid_t *guid, int *port,
-		     const char *typestr, const char *parse_sep)
+bool parse_guid(struct torus *t, guid_t *guid, const char *parse_sep)
 {
 	char *val;
 	bool success = false;
@@ -845,11 +844,6 @@ bool parse_guid_port(struct torus *t, guid_t *guid, int *port,
 	*guid = strtoull(val, NULL, 0);
 	*guid = cl_hton64(*guid);
 
-	val = strtok(NULL, parse_sep);
-	if (!val)
-		goto out;
-	*port = strtol(val, NULL, 0);
-
 	success = true;
 out:
 	return success;
@@ -858,15 +852,14 @@ out:
 static
 bool parse_dir_link(int c_dir, struct torus *t, const char *parse_sep)
 {
-	int sw_port0, sw_port1;
 	guid_t sw_guid0, sw_guid1;
 	struct link *l;
 	bool success = false;
 
-	if (!parse_guid_port(t, &sw_guid0, &sw_port0, "switch", parse_sep))
+	if (!parse_guid(t, &sw_guid0, parse_sep))
 		goto out;
 
-	if (!parse_guid_port(t, &sw_guid1, &sw_port1, "switch", parse_sep))
+	if (!parse_guid(t, &sw_guid1, parse_sep))
 		goto out;
 
 	if (!t) {
-- 
1.5.6.GIT



* [PATCH 04/12] opensm: Fix up some torus-2QoS comments to match code.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (15 preceding siblings ...)
  2009-12-18 20:50   ` [PATCH 03/12] opensm: Remove unused port specification from torus-2QoS config file parsing Jim Schutt
@ 2009-12-18 20:50   ` Jim Schutt
  2009-12-18 20:50   ` [PATCH 06/12] opensm: Remove redundant function names in torus-2QoS logging Jim Schutt
                     ` (6 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:50 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_torus.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index 42582ce..9e4a9eb 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -169,11 +169,11 @@ struct port_grp {
 /*
  * A struct t_switch is used to represent a switch as placed in a torus.
  *
- * A t_switch used to build an N-dimensional torus will have N+1 port groups,
+ * A t_switch used to build an N-dimensional torus will have 2N+1 port groups,
  * used as follows, assuming 0 <= d < N:
  *   port_grp[2d]   => links leaving in negative direction for coordinate d
  *   port_grp[2d+1] => links leaving in positive direction for coordinate d
- *   port_grp[N]    => endpoints local to switch; i.e., hosts on switch
+ *   port_grp[2N]   => endpoints local to switch; i.e., hosts on switch
  *
  * struct link objects referenced by a t_switch are assumed to be oriented:
  * traversing a link from link.end[0] to link.end[1] is always in the positive
-- 
1.5.6.GIT



* [PATCH 06/12] opensm: Remove redundant function names in torus-2QoS logging.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (16 preceding siblings ...)
  2009-12-18 20:50   ` [PATCH 04/12] opensm: Fix up some torus-2QoS comments to match code Jim Schutt
@ 2009-12-18 20:50   ` Jim Schutt
  2009-12-18 20:50   ` [PATCH 07/12] opensm: Make torus-2QoS always use OSM_LOG_INFO, never LOG_INFO Jim Schutt
                     ` (5 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:50 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_torus.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index b740f93..0306af9 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -8437,7 +8437,7 @@ bool get_lid(struct port_grp *pg, unsigned p,
 	osm_port = ep->osm_port;
 	if (!(osm_port && osm_port->priv == ep)) {
 		OSM_LOG(&pg->sw->torus->osm->log, OSM_LOG_ERROR,
-			"Error: get_lid: ep->osm_port->priv != ep "
+			"Error: ep->osm_port->priv != ep "
 			"for sw 0x%04llu port %d\n",
 			ntohllu(((struct t_switch *)(ep->sw))->n_id), ep->port);
 		return false;
@@ -8528,8 +8528,8 @@ uint8_t torus_path_sl(void *context, uint8_t path_sl_hint,
 		if (!sport) {
 			guid = osm_node_get_node_guid(osm_sport->p_node);
 			OSM_LOG(log, LOG_INFO,
-				"Error: get_torus_sl: osm_sport (GUID "
-				"0x%04llx) not in our fabric description\n",
+				"Error: osm_sport (GUID 0x%04llx) "
+				"not in our fabric description\n",
 				ntohllu(guid));
 			goto out;
 		}
@@ -8540,8 +8540,8 @@ uint8_t torus_path_sl(void *context, uint8_t path_sl_hint,
 		if (!dport) {
 			guid = osm_node_get_node_guid(osm_dport->p_node);
 			OSM_LOG(log, LOG_INFO,
-				"Error: get_torus_sl: osm_dport (GUID "
-				"0x%04llx) not in our fabric description\n",
+				"Error: osm_dport (GUID 0x%04llx) "
+				"not in our fabric description\n",
 				ntohllu(guid));
 			goto out;
 		}
@@ -8553,14 +8553,14 @@ uint8_t torus_path_sl(void *context, uint8_t path_sl_hint,
 	if (sport->type != SRCSINK) {
 		guid = osm_node_get_node_guid(osm_sport->p_node);
 		OSM_LOG(log, LOG_INFO,
-			"Error: get_torus_sl: osm_sport (GUID 0x%04llx) "
+			"Error: osm_sport (GUID 0x%04llx) "
 			"not a data src/sink port\n", ntohllu(guid));
 		goto out;
 	}
 	if (dport->type != SRCSINK) {
 		guid = osm_node_get_node_guid(osm_dport->p_node);
 		OSM_LOG(log, LOG_INFO,
-			"Error: get_torus_sl: osm_dport (GUID 0x%04llx) "
+			"Error: osm_dport (GUID 0x%04llx) "
 			"not a data src/sink port\n", ntohllu(guid));
 		goto out;
 	}
-- 
1.5.6.GIT



* [PATCH 07/12] opensm: Make torus-2QoS always use OSM_LOG_INFO, never LOG_INFO.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (17 preceding siblings ...)
  2009-12-18 20:50   ` [PATCH 06/12] opensm: Remove redundant function names in torus-2QoS logging Jim Schutt
@ 2009-12-18 20:50   ` Jim Schutt
  2009-12-18 20:50   ` [PATCH 08/12] opensm: Add struct osm_routing_engine callback to build spanning trees for multicast Jim Schutt
                     ` (4 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:50 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_torus.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index 0306af9..61e0bf3 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -7971,7 +7971,7 @@ void torus_update_osm_sl2vl(void *context, osm_port_t *sw_mgmt_port,
 		guid_t guid;
 
 		guid = osm_node_get_node_guid(sw_mgmt_port->p_node);
-		OSM_LOG(log, LOG_INFO,
+		OSM_LOG(log, OSM_LOG_INFO,
 			"Error: osm_port (GUID 0x%04llx) "
 			"not in our fabric description\n", ntohllu(guid));
 		return;
@@ -8527,7 +8527,7 @@ uint8_t torus_path_sl(void *context, uint8_t path_sl_hint,
 		sport = osm_port_relink_endpoint(osm_sport);
 		if (!sport) {
 			guid = osm_node_get_node_guid(osm_sport->p_node);
-			OSM_LOG(log, LOG_INFO,
+			OSM_LOG(log, OSM_LOG_INFO,
 				"Error: osm_sport (GUID 0x%04llx) "
 				"not in our fabric description\n",
 				ntohllu(guid));
@@ -8539,7 +8539,7 @@ uint8_t torus_path_sl(void *context, uint8_t path_sl_hint,
 		dport = osm_port_relink_endpoint(osm_dport);
 		if (!dport) {
 			guid = osm_node_get_node_guid(osm_dport->p_node);
-			OSM_LOG(log, LOG_INFO,
+			OSM_LOG(log, OSM_LOG_INFO,
 				"Error: osm_dport (GUID 0x%04llx) "
 				"not in our fabric description\n",
 				ntohllu(guid));
@@ -8552,14 +8552,14 @@ uint8_t torus_path_sl(void *context, uint8_t path_sl_hint,
 	 */
 	if (sport->type != SRCSINK) {
 		guid = osm_node_get_node_guid(osm_sport->p_node);
-		OSM_LOG(log, LOG_INFO,
+		OSM_LOG(log, OSM_LOG_INFO,
 			"Error: osm_sport (GUID 0x%04llx) "
 			"not a data src/sink port\n", ntohllu(guid));
 		goto out;
 	}
 	if (dport->type != SRCSINK) {
 		guid = osm_node_get_node_guid(osm_dport->p_node);
-		OSM_LOG(log, LOG_INFO,
+		OSM_LOG(log, OSM_LOG_INFO,
 			"Error: osm_dport (GUID 0x%04llx) "
 			"not a data src/sink port\n", ntohllu(guid));
 		goto out;
-- 
1.5.6.GIT



* [PATCH 08/12] opensm: Add struct osm_routing_engine callback to build spanning trees for multicast.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (18 preceding siblings ...)
  2009-12-18 20:50   ` [PATCH 07/12] opensm: Make torus-2QoS always use OSM_LOG_INFO, never LOG_INFO Jim Schutt
@ 2009-12-18 20:50   ` Jim Schutt
  2009-12-18 20:50   ` [PATCH 09/12] opensm: Make mcast_mgr_purge_tree() available outside osm_mcast_mgr.c Jim Schutt
                     ` (3 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:50 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

If a routing engine needs to compute spanning trees with special
properties, it needs a way to override the default implementation.
A routing engine callback provides that mechanism.  Routing engines
that can use the default implementation can leave the callback
pointer set to NULL.
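
The override-or-default pattern can be sketched in isolation as
follows.  The types and function names here (struct engine,
process_mlid(), and so on) are simplified, hypothetical stand-ins for
illustration, not the OpenSM definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a routing engine. */
struct engine {
	void *context;
	int (*build_stree)(void *context, int mlid);	/* NULL => default */
};

/* Stand-in for the default spanning tree builder. */
static int default_build_stree(int mlid)
{
	(void)mlid;
	return 0;	/* pretend the default builder succeeded */
}

/* Stand-in for an engine-specific builder. */
static int custom_build_stree(void *context, int mlid)
{
	*(int *)context = mlid;	/* record the MLID we were asked to handle */
	return 0;
}

/* Use the engine's override when present, otherwise the default. */
static int process_mlid(const struct engine *re, int mlid)
{
	if (re && re->build_stree)
		return re->build_stree(re->context, mlid);
	return default_build_stree(mlid);
}
```

An engine that leaves build_stree NULL, or the absence of any engine,
falls through to the default path, so existing engines need no change.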

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_opensm.h |    6 ++++++
 opensm/opensm/osm_mcast_mgr.c      |    7 ++++++-
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/opensm/include/opensm/osm_opensm.h b/opensm/include/opensm/osm_opensm.h
index 90c6c0f..59df9ea 100644
--- a/opensm/include/opensm/osm_opensm.h
+++ b/opensm/include/opensm/osm_opensm.h
@@ -133,6 +133,8 @@ struct osm_routing_engine {
 	uint8_t (*path_sl)(void *context, IN uint8_t path_sl_hint,
 			   IN const osm_port_t *src_port,
 			   IN const osm_port_t *dst_port);
+	ib_api_status_t (*mcast_build_stree)(void *context,
+					     IN OUT osm_mgrp_box_t *mgb);
 	void (*delete) (void *context);
 	struct osm_routing_engine *next;
 };
@@ -167,6 +169,10 @@ struct osm_routing_engine {
 *	path_sl
 *		The callback for computing path SL.
 *
+*	mcast_build_stree
+*		The callback for building the spanning tree for multicast
+*		forwarding, called per MLID.
+*
 *	delete
 *		The delete method, may be used for routing engine
 *		internals cleanup.
diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c
index 697fb58..e65e459 100644
--- a/opensm/opensm/osm_mcast_mgr.c
+++ b/opensm/opensm/osm_mcast_mgr.c
@@ -946,6 +946,7 @@ Exit:
 static ib_api_status_t mcast_mgr_process_mlid(osm_sm_t * sm, uint16_t mlid)
 {
 	ib_api_status_t status = IB_SUCCESS;
+	struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
 	osm_mgrp_box_t *mbox;
 
 	OSM_LOG_ENTER(sm->p_log);
@@ -960,7 +961,11 @@ static ib_api_status_t mcast_mgr_process_mlid(osm_sm_t * sm, uint16_t mlid)
 
 	mbox = osm_get_mbox_by_mlid(sm->p_subn, cl_hton16(mlid));
 	if (mbox) {
-		status = mcast_mgr_build_spanning_tree(sm, mbox);
+		if (re && re->mcast_build_stree)
+			status = re->mcast_build_stree(re->context, mbox);
+		else
+			status = mcast_mgr_build_spanning_tree(sm, mbox);
+
 		if (status != IB_SUCCESS)
 			OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0A17: "
 				"Unable to create spanning tree (%s) for mlid "
-- 
1.5.6.GIT



* [PATCH 09/12] opensm: Make mcast_mgr_purge_tree() available outside osm_mcast_mgr.c.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (19 preceding siblings ...)
  2009-12-18 20:50   ` [PATCH 08/12] opensm: Add struct osm_routing_engine callback to build spanning trees for multicast Jim Schutt
@ 2009-12-18 20:50   ` Jim Schutt
  2009-12-18 20:50   ` [PATCH 10/12] opensm: Implement master spanning tree for torus-2QoS multicast support Jim Schutt
                     ` (2 subsequent siblings)
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:50 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

A routing engine that needs to compute multicast spanning trees with
special properties will need to delete old trees.  There's already
a function that does this: mcast_mgr_purge_tree().

Make it available outside osm_mcast_mgr.c, and change the name
to follow the naming convention (osm_ prefix) for global functions.

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_multicast.h |   33 +++++++++++++++++++++++++++++++++
 opensm/opensm/osm_mcast_mgr.c         |    4 ++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/opensm/include/opensm/osm_multicast.h b/opensm/include/opensm/osm_multicast.h
index 1da575d..df6ac6c 100644
--- a/opensm/include/opensm/osm_multicast.h
+++ b/opensm/include/opensm/osm_multicast.h
@@ -53,6 +53,7 @@
 #include <opensm/osm_mcm_port.h>
 #include <opensm/osm_subnet.h>
 #include <opensm/osm_log.h>
+#include <opensm/osm_sm.h>
 
 #ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
@@ -193,6 +194,38 @@ osm_mgrp_t *osm_mgrp_new(IN osm_subn_t * subn, IN ib_net16_t mlid,
 *	Multicast Group, osm_mgrp_delete
 *********/
 
+/*
+ * Need a forward declaration to work around include loop:
+ * osm_sm.h <- osm_multicast.h
+ */
+struct osm_sm;
+
+/****f* OpenSM: Multicast Tree/osm_purge_mtree
+* NAME
+*	osm_purge_mtree
+*
+* DESCRIPTION
+*	Frees all the nodes in a multicast spanning tree
+*
+* SYNOPSIS
+*/
+void osm_purge_mtree(IN struct osm_sm * sm, IN osm_mgrp_box_t * mgb);
+/*
+* PARAMETERS
+*	sm
+*		[in] Pointer to osm_sm_t object.
+*	mgb
+*		[in] Pointer to an osm_mgrp_box_t object.
+*
+* RETURN VALUES
+*	None.
+*
+*
+* NOTES
+*
+* SEE ALSO
+*********/
+
 /****f* OpenSM: Multicast Group/osm_mgrp_is_guid
 * NAME
 *	osm_mgrp_is_guid
diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c
index e65e459..11a10ce 100644
--- a/opensm/opensm/osm_mcast_mgr.c
+++ b/opensm/opensm/osm_mcast_mgr.c
@@ -146,7 +146,7 @@ static void mcast_mgr_purge_tree_node(IN osm_mtree_node_t * p_mtn)
 	free(p_mtn);
 }
 
-static void mcast_mgr_purge_tree(osm_sm_t * sm, IN osm_mgrp_box_t * mbox)
+void osm_purge_mtree(osm_sm_t * sm, IN osm_mgrp_box_t * mbox)
 {
 	OSM_LOG_ENTER(sm->p_log);
 
@@ -695,7 +695,7 @@ static ib_api_status_t mcast_mgr_build_spanning_tree(osm_sm_t * sm,
 	   on multicast forwarding table information if the user wants to
 	   preserve existing multicast routes.
 	 */
-	mcast_mgr_purge_tree(sm, mbox);
+	osm_purge_mtree(sm, mbox);
 
 	/* build the first "subset" containing all member ports */
 	if (make_port_list(&port_list, mbox)) {
-- 
1.5.6.GIT



* [PATCH 10/12] opensm: Implement master spanning tree for torus-2QoS multicast support.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (20 preceding siblings ...)
  2009-12-18 20:50   ` [PATCH 09/12] opensm: Make mcast_mgr_purge_tree() available outside osm_mcast_mgr.c Jim Schutt
@ 2009-12-18 20:50   ` Jim Schutt
  2009-12-18 20:51   ` [PATCH 11/12] opensm: Implement multicast support for torus-2QoS Jim Schutt
  2009-12-18 20:51   ` [PATCH 12/12] opensm: Update documentation to describe torus-2QoS multicast support Jim Schutt
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:50 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

In order to route a 2D/3D torus without credit loops while providing
support for two QoS levels, torus-2QoS needs to use 3 VL bits and
all 4 available SL bits.  This means that multicast traffic must
share SL values with unicast traffic, which in turn means that
multicast routing must be compatible with unicast routing to prevent
credit loops.

Torus-2QoS unicast routing is based on DOR, and it turns out to
be possible to construct spanning trees so that when multicast
and unicast traffic are overlaid, credit loops are not possible.

Here is a 2D example of such a spanning tree, where "x" is the
root switch, and each "+" is a non-root switch:

   +  +  +  +  +
   |  |  |  |  |
   +  +  +  +  +
   |  |  |  |  |
   +--+--x--+--+
   |  |  |  |  |
   +  +  +  +  +

For multicast traffic routed from root to tip, every turn in the
above spanning tree is a legal DOR turn.

For traffic routed from tip to root, and traffic routed through
the root, turns are not legal DOR turns.  However, the union of
multicast routing on this spanning tree with DOR unicast routing
can provide only 3 of the 4 turns needed to construct a credit
loop.

In addition, if none of the above spanning tree branches crosses
a dateline used for unicast credit loop avoidance on a torus,
and multicast traffic is confined to SL 0 or SL 8 (recall that
torus-2QoS uses SL bit 3 to differentiate QoS level), then
multicast traffic also cannot contribute to the "ring" credit
loops that are otherwise possible in a torus.
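
The SL restriction for multicast can be written down as a tiny
sketch.  The macro and helper names below are hypothetical and are not
taken from the patch; the only facts assumed are the ones stated
above, namely that SL bit 3 selects the QoS level and that multicast
uses SL 0 or SL 8.

```c
#include <assert.h>

#define QOS_SL_BIT 0x8u	/* SL bit 3 selects the QoS level */

/* Hypothetical helper: SL for multicast at a given QoS level (0 or 1). */
static unsigned mcast_sl(unsigned qos_level)
{
	return qos_level ? QOS_SL_BIT : 0;
}

/* Hypothetical helper: recover the QoS level from any SL value. */
static unsigned sl_qos_level(unsigned sl)
{
	return (sl & QOS_SL_BIT) ? 1 : 0;
}
```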

Torus-2QoS uses these ideas to create a master spanning tree.
Every multicast group spanning tree will be constructed as a
subset of the master tree, with the same root as the master
tree.
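
To make the shape of such a tree concrete, here is a minimal sketch
for a 2D mesh with no wraparound links (so no branch can cross a
dateline) and no missing switches.  It is an illustration under those
assumptions, not the algorithm in the patch: the names to_root() and
tree_is_dor_from_root() and the W x H dimensions are hypothetical.
Each switch's next hop toward the root moves in y until it reaches the
root's row, then in x, which reproduces the picture above.

```c
#include <assert.h>
#include <stdbool.h>

#define W 5	/* switches in x (assumed for the example) */
#define H 4	/* switches in y (assumed for the example) */

struct sw { int x, y; };

/* Next hop toward the root: move in y to the root's row, then in x. */
static struct sw to_root(struct sw s, struct sw root)
{
	if (s.y != root.y)
		s.y += (s.y < root.y) ? 1 : -1;
	else if (s.x != root.x)
		s.x += (s.x < root.x) ? 1 : -1;
	return s;
}

/*
 * Check that every switch reaches the root, and that each tip-to-root
 * path is all y hops followed by all x hops.  Read from root to tip,
 * that is all x hops followed by all y hops, a legal DOR ordering.
 */
static bool tree_is_dor_from_root(struct sw root)
{
	int x, y;

	for (x = 0; x < W; x++)
		for (y = 0; y < H; y++) {
			struct sw s = { x, y };
			bool seen_x_hop = false;

			while (s.x != root.x || s.y != root.y) {
				struct sw n = to_root(s, root);

				if (n.x != s.x)
					seen_x_hop = true;
				else if (seen_x_hop)	/* y hop after x hop */
					return false;
				s = n;
			}
		}
	return true;
}
```

Because each switch has exactly one next hop toward the root, the
structure is a tree, and any multicast group tree taken as a subset of
it inherits the same turn restrictions.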

Such multicast group spanning trees will in general not be
optimal for groups whose members span only part of the fabric.
However, this compromise must be made to enable support for
two QoS levels on a torus while preventing credit loops.

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_torus.c |  267 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 267 insertions(+), 0 deletions(-)

diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index 61e0bf3..082fcf5 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -154,6 +154,19 @@ struct link {
  * type.  Furthermore, if that type is PASSTHRU, then the connected links:
  *   1) are parallel to a given coordinate direction
  *   2) share the same two switches as endpoints.
+ *
+ * Torus-2QoS uses one master spanning tree for multicast, of which every
+ * multicast group spanning tree is a subtree.  to_stree_root is a pointer
+ * to the next port_grp on the path to the master spanning tree root.
+ * to_stree_tip is a pointer to the next port_grp on the path to a master
+ * spanning tree branch tip.
+ *
+ * Each t_switch can have at most one port_grp with a non-NULL to_stree_root.
+ * Exactly one t_switch in the fabric will have all port_grp objects with
+ * to_stree_root NULL; it is the master spanning tree root.
+ *
+ * A t_switch with all port_grp objects where to_stree_tip is NULL is at a
+ * master spanning tree branch tip.
  */
 struct port_grp {
 	enum endpt_type type;
@@ -163,6 +176,8 @@ struct port_grp {
 	unsigned sw_dlid_cnt;	/* switch dlids routed through this group */
 	unsigned ca_dlid_cnt;	/* CA dlids routed through this group */
 	struct t_switch *sw;	/* what switch we're attached to */
+	struct port_grp *to_stree_root;
+	struct port_grp *to_stree_tip;
 	struct endpoint **port;
 };
 
@@ -8499,6 +8514,256 @@ bool torus_lft(struct torus *t, struct t_switch *sw)
 	return success;
 }
 
+static
+bool good_xy_ring(struct torus *t, int x, int y, int z)
+{
+	struct t_switch ****sw = t->sw;
+	bool good_ring = true;
+	int i;
+
+	for (i = 0; i < t->x_sz && good_ring; i++)
+		good_ring = sw[i][y][z];
+	for (i = 0; i < t->y_sz && good_ring; i++)
+		good_ring = sw[x][i][z];
+
+	return good_ring;
+}
+
+static
+struct t_switch *find_plane_mid(struct torus *t, int z)
+{
+	int x, dx, xm = t->x_sz / 2;
+	int y, dy, ym = t->y_sz / 2;
+	struct t_switch ****sw = t->sw;
+
+	if (good_xy_ring(t, xm, ym, z))
+		return sw[xm][ym][z];
+
+	for (dx = 1, dy = 1; dx <= xm && dy <= ym; dx++, dy++) {
+
+		x = canonicalize(xm - dx, t->x_sz);
+		y = canonicalize(ym - dy, t->y_sz);
+		if (good_xy_ring(t, x, y, z))
+			return sw[x][y][z];
+
+		x = canonicalize(xm + dx, t->x_sz);
+		y = canonicalize(ym + dy, t->y_sz);
+		if (good_xy_ring(t, x, y, z))
+			return sw[x][y][z];
+	}
+	return NULL;
+}
+
+static
+struct t_switch *find_stree_root(struct torus *t)
+{
+	int x, y, z, dz, zm = t->z_sz / 2;
+	struct t_switch ****sw = t->sw;
+	struct t_switch *root;
+	bool good_plane;
+
+	/*
+	 * Look for a switch near the "center" (wrt. the datelines) of the
+	 * torus, as that will be the best spanning tree root.  Use
+	 * a search that is not exhaustive, on the theory that this routing
+	 * engine isn't useful anyway if too many switches are missing.
+	 *
+	 * Also, want to pick an x-y plane with no missing switches, so that
+	 * the master spanning tree construction algorithm doesn't have to
+	 * deal with needing a turn on a missing switch.
+	 */
+	for (dz = 0; dz <= zm; dz++) {
+
+		z = canonicalize(zm - dz, t->z_sz);
+		good_plane = true;
+		for (y = 0; y < t->y_sz && good_plane; y++)
+			for (x = 0; x < t->x_sz && good_plane; x++)
+				good_plane = sw[x][y][z];
+
+		if (good_plane) {
+			root = find_plane_mid(t, z);
+			if (root)
+				goto out;
+		}
+		if (!dz)
+			continue;
+
+		z = canonicalize(zm + dz, t->z_sz);
+		good_plane = true;
+		for (y = 0; y < t->y_sz && good_plane; y++)
+			for (x = 0; x < t->x_sz && good_plane; x++)
+				good_plane = sw[x][y][z];
+
+		if (good_plane) {
+			root = find_plane_mid(t, z);
+			if (root)
+				goto out;
+		}
+	}
+	/*
+	 * Note that torus-2QoS can route a torus that is missing an entire
+	 * column (switches with x,y constant, for all z values) without
+	 * deadlocks.
+	 *
+	 * If we've reached this point, we must have a column of missing
+	 * switches, as routable_torus() would have returned false for
+	 * any other configuration of missing switches that made it through
+	 * the above.
+	 *
+	 * So any switch in the mid-z plane will do as the root.
+	 */
+	root = find_plane_mid(t, zm);
+out:
+	return root;
+}
+
+static
+bool sw_in_master_stree(struct t_switch *sw)
+{
+	int g;
+	bool connected;
+
+	connected = sw == sw->torus->master_stree_root;
+	for (g = 0; g < 2 * TORUS_MAX_DIM; g++)
+		connected = connected || sw->ptgrp[g].to_stree_root;
+
+	return connected;
+}
+
+static
+void grow_master_stree_branch(struct t_switch *root, struct t_switch *tip,
+			      unsigned to_root_pg, unsigned to_tip_pg)
+{
+	root->ptgrp[to_tip_pg].to_stree_tip = &tip->ptgrp[to_root_pg];
+	tip->ptgrp[to_root_pg].to_stree_root = &root->ptgrp[to_tip_pg];
+}
+
+static
+void build_master_stree_branch(struct t_switch *branch_root, int cdir)
+{
+	struct t_switch *sw, *n_sw, *p_sw;
+	unsigned l, idx, cnt, pg, ng;
+
+	switch (cdir) {
+	case 0:
+		idx = branch_root->i;
+		cnt = branch_root->torus->x_sz;
+		break;
+	case 1:
+		idx = branch_root->j;
+		cnt = branch_root->torus->y_sz;
+		break;
+	case 2:
+		idx = branch_root->k;
+		cnt = branch_root->torus->z_sz;
+		break;
+	default:
+		goto out;
+	}
+	/*
+	 * This algorithm intends that a spanning tree branch never crosses
+	 * a dateline unless the 1-D ring for which we're building the branch
+	 * is interrupted by failure.  We need that guarantee to prevent
+	 * multicast/unicast credit loops.
+	 */
+	n_sw = branch_root;		/* tip of negative cdir branch */
+	ng = 2 * cdir;			/* negative cdir port group index */
+	p_sw = branch_root;		/* tip of positive cdir branch */
+	pg = 2 * cdir + 1;		/* positive cdir port group index */
+
+	for (l = idx; n_sw && l >= 1; l--) {
+		sw = ring_next_sw(n_sw, cdir, -1);
+		if (sw && !sw_in_master_stree(sw)) {
+			grow_master_stree_branch(n_sw, sw, pg, ng);
+			n_sw = sw;
+		} else
+			n_sw = NULL;
+	}
+	for (l = idx; p_sw && l < (cnt - 1); l++) {
+		sw = ring_next_sw(p_sw, cdir, 1);
+		if (sw && !sw_in_master_stree(sw)) {
+			grow_master_stree_branch(p_sw, sw, ng, pg);
+			p_sw = sw;
+		} else
+			p_sw = NULL;
+	}
+	if (n_sw && p_sw)
+		goto out;
+	/*
+	 * At least one branch couldn't grow to the dateline for this ring.
+	 * That means it is acceptable to grow the branch by crossing the
+	 * dateline.
+	 */
+	for (l = 0; l < cnt; l++) {
+		if (n_sw) {
+			sw = ring_next_sw(n_sw, cdir, -1);
+			if (sw && !sw_in_master_stree(sw)) {
+				grow_master_stree_branch(n_sw, sw, pg, ng);
+				n_sw = sw;
+			} else
+				n_sw = NULL;
+		}
+		if (p_sw) {
+			sw = ring_next_sw(p_sw, cdir, 1);
+			if (sw && !sw_in_master_stree(sw)) {
+				grow_master_stree_branch(p_sw, sw, ng, pg);
+				p_sw = sw;
+			} else
+				p_sw = NULL;
+		}
+		if (!(n_sw || p_sw))
+			break;
+	}
+out:
+	return;
+}
+
+static
+bool torus_master_stree(struct torus *t)
+{
+	int i, j, k;
+	bool success = false;
+	struct t_switch *stree_root = find_stree_root(t);
+
+	if (stree_root)
+		build_master_stree_branch(stree_root, 0);
+	else
+		goto out;
+
+	k = stree_root->k;
+	for (i = 0; i < t->x_sz; i++) {
+		j = stree_root->j;
+		if (t->sw[i][j][k])
+			build_master_stree_branch(t->sw[i][j][k], 1);
+
+		for (j = 0; j < t->y_sz; j++)
+			if (t->sw[i][j][k])
+				build_master_stree_branch(t->sw[i][j][k], 2);
+	}
+	/*
+	 * At this point we should have a master spanning tree that contains
+	 * every present switch, for all fabrics that torus-2QoS can route
+	 * without deadlocks.  Make sure this is the case; otherwise warn
+	 * and return failure so we get bug reports.
+	 */
+	success = true;
+	for (i = 0; i < t->x_sz; i++)
+		for (j = 0; j < t->y_sz; j++)
+			for (k = 0; k < t->z_sz; k++) {
+				struct t_switch *sw = t->sw[i][j][k];
+				if (!sw || sw_in_master_stree(sw))
+					continue;
+
+				success = false;
+				OSM_LOG(&t->osm->log, OSM_LOG_ERROR,
+					"Error: sw 0x%04llx (%d,%d,%d) not in "
+					"torus multicast master spanning tree\n",
+					ntohllu(sw->n_id), i, j, k);
+			}
+out:
+	return success;
+}
+
 int route_torus(struct torus *t)
 {
 	int s;
@@ -8507,6 +8772,8 @@ int route_torus(struct torus *t)
 	for (s = 0; s < (int)t->switch_cnt; s++)
 		success = torus_lft(t, t->sw_pool[s]) && success;
 
+	success = success && torus_master_stree(t);
+
 	return success ? 0 : -1;
 }
 
-- 
1.5.6.GIT



* [PATCH 11/12] opensm: Implement multicast support for torus-2QoS.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (21 preceding siblings ...)
  2009-12-18 20:50   ` [PATCH 10/12] opensm: Implement master spanning tree for torus-2QoS multicast support Jim Schutt
@ 2009-12-18 20:51   ` Jim Schutt
  2009-12-18 20:51   ` [PATCH 12/12] opensm: Update documentation to describe torus-2QoS multicast support Jim Schutt
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:51 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg

Every multicast spanning tree used by torus-2QoS is a subset
of the master spanning tree built when unicast routing is
computed.  This is required because when QoS is enabled,
torus-2QoS needs to use the same SLs for unicast and multicast.
Thus, the multicast spanning trees must have special properties
to avoid credit loops between unicast and multicast traffic.

To build a spanning tree for a particular MLID, torus-2QoS just
needs to mark all the ports that participate in that multicast
group, then walk the master spanning tree and add switches
hosting the marked ports to the multicast group spanning tree.
Use a depth-first search of the master spanning tree for this.
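
The marking-and-walking idea can be sketched on a toy tree as
follows.  This is a simplified illustration, not the patch's
mcast_stree_branch(): struct node, MAX_CHILD and prune_to_group() are
hypothetical, and unlike the real code the sketch does not touch
multicast forwarding tables or OpenSM tree nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILD 4

/* Hypothetical, simplified master-tree node. */
struct node {
	bool has_member;		/* hosts a port marked for this group */
	bool in_group_tree;		/* result: part of the group's tree */
	struct node *child[MAX_CHILD];	/* master-tree links toward the tips */
};

/*
 * Depth-first walk from the master root.  A node is kept when it hosts
 * a marked port or when any subtree below it was kept.  Returns the
 * number of nodes kept in this subtree.
 */
static unsigned prune_to_group(struct node *n)
{
	unsigned kept = 0;
	int c;

	for (c = 0; c < MAX_CHILD; c++)
		if (n->child[c])
			kept += prune_to_group(n->child[c]);

	n->in_group_tree = n->has_member || kept > 0;
	return kept + (n->in_group_tree ? 1 : 0);
}
```

Switches that neither host a member nor lie on the path to one are
left out, so the group tree is always a subset of the master tree.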

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_torus.c |  250 +++++++++++++++++++++++++++++++++++++--
 1 files changed, 239 insertions(+), 11 deletions(-)

diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index 082fcf5..e2eb324 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -300,6 +300,7 @@ struct torus {
 
 	struct coord_dirs *origin;
 	struct t_switch ****sw;
+	struct t_switch *master_stree_root;
 
 	unsigned flags;
 	int debug;
@@ -8515,6 +8516,241 @@ bool torus_lft(struct torus *t, struct t_switch *sw)
 }
 
 static
+osm_mtree_node_t *mcast_stree_branch(struct t_switch *sw, osm_switch_t *osm_sw,
+				     osm_mgrp_box_t *mgb, unsigned depth,
+				     unsigned *port_cnt, unsigned *max_depth)
+{
+	osm_mtree_node_t *mtn = NULL;
+	osm_mcast_tbl_t *mcast_tbl, *ds_mcast_tbl;
+	osm_node_t *ds_node;
+	struct t_switch *ds_sw;
+	struct port_grp *ptgrp;
+	struct link *link;
+	struct endpoint *port;
+	unsigned g, p;
+	unsigned mcast_fwd_ports = 0, mcast_end_ports = 0;
+
+	depth++;
+
+	if (osm_sw->priv != sw) {
+		OSM_LOG(&sw->torus->osm->log, OSM_LOG_INFO,
+			"Error: osm_sw (GUID 0x%04llx) "
+			"not in our fabric description\n",
+			ntohllu(osm_node_get_node_guid(osm_sw->p_node)));
+		goto out;
+	}
+	if (!osm_switch_supports_mcast(osm_sw)) {
+		OSM_LOG(&sw->torus->osm->log, OSM_LOG_ERROR,
+			"Error: osm_sw (GUID 0x%04llx) "
+			"does not support multicast\n",
+			ntohllu(osm_node_get_node_guid(osm_sw->p_node)));
+		goto out;
+	}
+	mtn = osm_mtree_node_new(osm_sw);
+	if (!mtn) {
+		OSM_LOG(&sw->torus->osm->log, OSM_LOG_ERROR,
+			"Insufficient memory to build multicast tree\n");
+		goto out;
+	}
+	mcast_tbl = osm_switch_get_mcast_tbl_ptr(osm_sw);
+	/*
+	 * Recurse to downstream switches, i.e. those closer to master
+	 * spanning tree branch tips.
+	 *
+	 * Note that if there are multiple ports in this port group, i.e.,
+	 * multiple parallel links, we can pick any one of them to use for
+	 * any individual MLID without causing loops.  Pick one based on MLID
+	 * for now, until someone turns up evidence we need to be smarter.
+	 *
+	 * Also, it might be we got called in a window between a switch getting
+	 * removed from the fabric, and torus-2QoS getting to rebuild its
+	 * fabric representation.  If that were to happen, our next hop
+	 * osm_switch pointer might be stale.  Look it up via opensm's fabric
+	 * description to be sure it's not.
+	 */
+	for (g = 0; g < 2 * TORUS_MAX_DIM; g++) {
+		ptgrp = &sw->ptgrp[g];
+		if (!ptgrp->to_stree_tip)
+			continue;
+
+		p = mgb->mlid % ptgrp->port_cnt;/* port # in port group */
+		p = ptgrp->port[p]->port;	/* now port # in switch */
+
+		ds_node = osm_node_get_remote_node(osm_sw->p_node, p, NULL);
+		ds_sw = ptgrp->to_stree_tip->sw;
+
+		if (!(ds_node && ds_node->sw &&
+		      ds_sw->osm_switch == ds_node->sw)) {
+			OSM_LOG(&sw->torus->osm->log, OSM_LOG_ERROR,
+				"Error: stale pointer to osm_sw "
+				"(GUID 0x%04llx)\n", ntohllu(ds_sw->n_id));
+			continue;
+		}
+		mtn->child_array[p] =
+			mcast_stree_branch(ds_sw, ds_node->sw, mgb,
+					   depth, port_cnt, max_depth);
+		if (!mtn->child_array[p])
+			continue;
+
+		osm_mcast_tbl_set(mcast_tbl, mgb->mlid, p);
+		mcast_fwd_ports++;
+		/*
+		 * Since we forward traffic for this multicast group on this
+		 * port, cause the switch on the other end of the link
+		 * to forward traffic back to us.  Do it now since have at
+		 * hand the link used; otherwise it'll be hard to figure out
+		 * later, and if we get it wrong we get a MC routing loop.
+		 */
+		link = sw->port[p]->link;
+		ds_mcast_tbl = osm_switch_get_mcast_tbl_ptr(ds_node->sw);
+
+		if (&link->end[0] == sw->port[p])
+			osm_mcast_tbl_set(ds_mcast_tbl, mgb->mlid,
+					  link->end[1].port);
+		else
+			osm_mcast_tbl_set(ds_mcast_tbl, mgb->mlid,
+					  link->end[0].port);
+	}
+	/*
+	 * Add any host ports marked as in mcast group into spanning tree.
+	 */
+	ptgrp = &sw->ptgrp[2 * TORUS_MAX_DIM];
+	for (p = 0; p < ptgrp->port_cnt; p++) {
+		port = ptgrp->port[p];
+		if (port->tmp) {
+			port->tmp = NULL;
+			mtn->child_array[port->port] = OSM_MTREE_LEAF;
+			osm_mcast_tbl_set(mcast_tbl, mgb->mlid, port->port);
+			mcast_end_ports++;
+		}
+	}
+	if (!(mcast_end_ports || mcast_fwd_ports)) {
+		free(mtn);
+		mtn = NULL;
+	} else if (depth > *max_depth)
+		*max_depth = depth;
+
+	*port_cnt += mcast_end_ports;
+out:
+	return mtn;
+}
+
+static
+osm_port_t *next_mgrp_box_port(osm_mgrp_box_t *mgb,
+			       cl_list_item_t **list_iterator,
+			       cl_map_item_t **map_iterator)
+{
+	osm_mgrp_t *mgrp;
+	osm_mcm_port_t *mcm_port;
+	osm_port_t *osm_port = NULL;
+	cl_map_item_t *m_item = *map_iterator;
+	cl_list_item_t *l_item = *list_iterator;
+
+next_mgrp:
+	if (!l_item)
+		l_item = cl_qlist_head(&mgb->mgrp_list);
+	if (l_item == cl_qlist_end(&mgb->mgrp_list)) {
+		l_item = NULL;
+		goto out;
+	}
+	mgrp = cl_item_obj(l_item, mgrp, list_item);
+
+	if (!m_item)
+		m_item = cl_qmap_head(&mgrp->mcm_port_tbl);
+	if (m_item == cl_qmap_end(&mgrp->mcm_port_tbl)) {
+		m_item = NULL;
+		l_item = cl_qlist_next(l_item);
+		goto next_mgrp;
+	}
+	mcm_port = cl_item_obj(m_item, mcm_port, map_item);
+	m_item = cl_qmap_next(m_item);
+	osm_port = mcm_port->port;
+out:
+	*list_iterator = l_item;
+	*map_iterator = m_item;
+	return osm_port;
+}
+
+static
+ib_api_status_t torus_mcast_stree(void *context, osm_mgrp_box_t *mgb)
+{
+	struct torus_context *ctx = context;
+	struct torus *t = ctx->torus;
+	cl_map_item_t *m_item = NULL;
+	cl_list_item_t *l_item = NULL;
+	osm_port_t *osm_port;
+	osm_switch_t *osm_sw;
+	struct endpoint *port;
+	unsigned port_cnt = 0, max_depth = 0;
+
+	osm_purge_mtree(&ctx->osm->sm, mgb);
+
+	/*
+	 * Build a spanning tree for a multicast group by first marking
+	 * the torus endpoints that are participating in the group.
+	 * Then do a depth-first search of the torus master spanning
+	 * tree to build up the spanning tree specific to this group.
+	 *
+	 * Since the torus master spanning tree is constructed specifically
+	 * to guarantee that multicast will not deadlock against unicast
+	 * when they share VLs, we can be sure that any multicast group
+	 * spanning tree constructed this way has the same property.
+	 */
+	while ((osm_port = next_mgrp_box_port(mgb, &l_item, &m_item))) {
+		port = osm_port->priv;
+		if (!(port && port->osm_port == osm_port)) {
+			port = osm_port_relink_endpoint(osm_port);
+			if (!port) {
+				guid_t id;
+				id = osm_node_get_node_guid(osm_port->p_node);
+				OSM_LOG(&ctx->osm->log, OSM_LOG_ERROR,
+					"Error: osm_port (GUID 0x%04llx) "
+					"not in our fabric description\n",
+					ntohllu(id));
+				continue;
+			}
+		}
+		/*
+		 * If this is a CA port, mark the switch port at the
+		 * other end of this port's link.
+		 *
+		 * By definition, a CA port is connected to end[1] of a link,
+		 * and the switch port is end[0].  See build_ca_link() and
+		 * link_srcsink().
+		 */
+		if (port->link)
+			port = &port->link->end[0];
+		port->tmp = osm_port;
+	}
+	/*
+	 * It might be we got called in a window between a switch getting
+	 * removed from the fabric, and torus-2QoS getting to rebuild its
+	 * fabric representation.  If that were to happen, our
+	 * master_stree_root->osm_switch pointer might be stale.  Look up
+	 * the osm_switch by GUID to be sure it's not.
+	 *
+	 * Also, call into mcast_stree_branch with depth = -1, because
+	 * depth at root switch needs to be 0.
+	 */
+	osm_sw = (osm_switch_t *)cl_qmap_get(&ctx->osm->subn.sw_guid_tbl,
+					     t->master_stree_root->n_id);
+	if (!(osm_sw && t->master_stree_root->osm_switch == osm_sw)) {
+		OSM_LOG(&ctx->osm->log, OSM_LOG_ERROR,
+			"Error: stale pointer to osm_sw (GUID 0x%04llx)\n",
+			ntohllu(t->master_stree_root->n_id));
+		return IB_ERROR;
+	}
+	mgb->root = mcast_stree_branch(t->master_stree_root, osm_sw,
+				       mgb, -1, &port_cnt, &max_depth);
+
+	OSM_LOG(&ctx->osm->log, OSM_LOG_VERBOSE,
+		"Configured MLID 0x%X for %u ports, max tree depth = %u\n",
+		mgb->mlid, port_cnt, max_depth);
+
+	return IB_SUCCESS;
+}
+
+static
 bool good_xy_ring(struct torus *t, int x, int y, int z)
 {
 	struct t_switch ****sw = t->sw;
@@ -8740,6 +8976,7 @@ bool torus_master_stree(struct torus *t)
 			if (t->sw[i][j][k])
 				build_master_stree_branch(t->sw[i][j][k], 2);
 	}
+	t->master_stree_root = stree_root;
 	/*
 	 * At this point we should have a master spanning tree that contains
 	 * every present switch, for all fabrics that torus-2QoS can route
@@ -8855,17 +9092,7 @@ uint8_t torus_path_sl(void *context, uint8_t path_sl_hint,
 
 	/*
 	 * If QoS was not requested by user, force path SLs into 8-15 range.
-	 * This leaves SL 0 available for multicast, and SL2VL mappings
-	 * will keep multicast traffic from deadlocking with unicast traffic.
-	 *
-	 * However, multicast might still deadlock against itself if multiple
-	 * multicast groups each use their own spanning tree.
-	 *
-	 * FIXME: it is possible to construct a spanning tree that can
-	 * overlay the DOR routing used for unicast in a way that multicast
-	 * and unicast can share VLs but cannot deadlock against each other.
-	 * Need to implement that and cause it to be used whenever the
-	 * torus-2QoS routing engine is used.
+	 * This leaves SL 0 available for multicast.
 	 */
 	if (t->flags & QOS_ENABLED)
 		sl |= sl_set_qos(sl_get_qos(path_sl_hint));
@@ -8963,6 +9190,7 @@ int osm_ucast_torus2QoS_setup(struct osm_routing_engine *r,
 	r->ucast_build_fwd_tables = torus_build_lfts;
 	r->update_sl2vl = torus_update_osm_sl2vl;
 	r->path_sl = torus_path_sl;
+	r->mcast_build_stree = torus_mcast_stree;
 	r->delete = torus_context_delete;
 	return 0;
 }
-- 
1.5.6.GIT
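
The construction described in the commit message above (mark the
participating ports, then walk the master spanning tree depth-first and
keep only the switches that lead to a marked port) can be sketched
outside of opensm as follows.  This is a minimal sketch and not the code
in the patch: struct node, MAX_CHILDREN and prune_tree() are hypothetical
names, and the tree is a plain array of child pointers rather than
torus-2QoS port groups.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4	/* hypothetical fan-out limit for this sketch */

struct node {
	struct node *child[MAX_CHILDREN];	/* master tree, toward the tips */
	bool member;		/* marked: hosts a port in the group */
	bool in_group_tree;	/* result: node belongs to the group's tree */
};

/*
 * Depth-first walk of the master tree.  Returns the number of marked
 * nodes found at or below n, and sets in_group_tree on every node that
 * is marked or that leads to a marked node.
 */
static unsigned prune_tree(struct node *n)
{
	unsigned i, found = n->member ? 1 : 0;

	for (i = 0; i < MAX_CHILDREN; i++)
		if (n->child[i])
			found += prune_tree(n->child[i]);

	n->in_group_tree = found > 0;
	return found;
}
```

Because every group tree produced this way is a subset of the same
master tree, it inherits whatever loop-freedom properties the master
tree has, which is the point the commit message makes.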


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 12/12] opensm: Update documentation to describe torus-2QoS multicast support.
       [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                     ` (22 preceding siblings ...)
  2009-12-18 20:51   ` [PATCH 11/12] opensm: Implement multicast support for torus-2QoS Jim Schutt
@ 2009-12-18 20:51   ` Jim Schutt
  23 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:51 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	jaschut-4OHPYypu0djtX7QSmKvirg


Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/doc/current-routing.txt |  121 +++++++++++++++++++++++++++++++++++++++-
 1 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/opensm/doc/current-routing.txt b/opensm/doc/current-routing.txt
index 141d793..78a2e01 100644
--- a/opensm/doc/current-routing.txt
+++ b/opensm/doc/current-routing.txt
@@ -400,8 +400,18 @@ Torus-2QoS Routing Algorithm
 ----------------------------
 
 Torus-2QoS is routing algorithm designed for large-scale 2D/3D torus fabrics.
-
-It is a DOR-based algorithm that avoids deadlocks that would otherwise
+The torus-2QoS routing engine can provide the following functionality on
+a 2D/3D torus:
+- routing that is free of credit loops
+- two levels of QoS, assuming switches support 8 data VLs
+- ability to route around a single failed switch, and/or multiple failed
+    links, without
+    - introducing credit loops
+    - changing path SL values
+- very short run times, with good scaling properties as fabric size
+    increases
+
+Torus-2QoS is a DOR-based algorithm that avoids deadlocks that would otherwise
 occur in a torus using the concept of a dateline for each torus dimension.
 It encodes into a path SL which datelines the path crosses as follows:
 
@@ -424,7 +434,7 @@ ports as follows:
     sl2vl(iport,oport,sl) = 0x1 & (sl >> cdir(oport));
 
 Thus torus-2QoS consumes 8 SL values (SL bits 0-2) and 2 VL values (VL bit 0)
- per QoS level to provide deadlock-free routing on a 3D torus.
+per QoS level to provide deadlock-free routing on a 3D torus.
 
 Torus-2QoS routes around link failure by "taking the long way around" any
 1D ring interrupted by a link failure.  For example, consider the 2D 6x5
@@ -538,3 +548,108 @@ path S-n-I-q-r-D, with illegal turn at switch I, and with hop I-q using a
 VL with bit 1 set.  In contrast to the earlier examples, the second hop
 after the illegal turn, q-r, can be used to construct a credit loop
 encircling the failed switches.
+
+Since torus-2QoS uses all four available SL bits, and the three data VL
+bits that are typically available in current switches, there is no way
+to use SL/VL values to separate multicast traffic from unicast traffic.
+Thus, torus-2QoS must generate multicast routing such that credit loops
+cannot arise from a combination of multicast and unicast path segments.
+
+It turns out that it is possible to construct spanning trees for multicast
+routing that have that property.  For the 2D 6x5 torus example above, here
+is the full-fabric spanning tree that torus-2QoS will construct, where "x"
+is the root switch and each "+" is a non-root switch:
+
+   4    +    +    +    +    +    +
+        |    |    |    |    |    |
+   3    +    +    +    +    +    +
+        |    |    |    |    |    |
+   2    +----+----+----x----+----+
+        |    |    |    |    |    |
+   1    +    +    +    +    +    +
+        |    |    |    |    |    |
+ y=0    +    +    +    +    +    +
+
+      x=0    1    2    3    4    5
+
+For multicast traffic routed from root to tip, every turn in the above
+spanning tree is a legal DOR turn.
+
+For traffic routed from tip to root, and some traffic routed through the
+root, turns are not legal DOR turns.  However, to construct a credit loop,
+the union of multicast routing on this spanning tree with DOR unicast
+routing can only provide 3 of the 4 turns needed for the loop.
+
+In addition, if none of the above spanning tree branches crosses a dateline
+used for unicast credit loop avoidance on a torus, and if multicast traffic
+is confined to SL 0 or SL 8 (recall that torus-2QoS uses SL bit 3 to
+differentiate QoS level), then multicast traffic also cannot contribute to
+the "ring" credit loops that are otherwise possible in a torus.
+
+Torus-2QoS uses these ideas to create a master spanning tree.  Every
+multicast group spanning tree will be constructed as a subset of the master
+tree, with the same root as the master tree.
+
+Such multicast group spanning trees will in general not be optimal for
+groups which are a subset of the full fabric. However, this compromise must
+be made to enable support for two QoS levels on a torus while preventing
+credit loops.
+
+In the presence of link or switch failures that result in a fabric for
+which torus-2QoS can generate credit-loop-free unicast routes, it is also
+possible to generate a master spanning tree for multicast that retains the
+required properties.  For example, consider that same 2D 6x5 torus, with
+the link from (2,2) to (3,2) failed.  Torus-2QoS will generate the following
+master spanning tree:
+
+   4    +    +    +    +    +    +
+        |    |    |    |    |    |
+   3    +    +    +    +    +    +
+        |    |    |    |    |    |
+   2  --+----+----+    x----+----+--
+        |    |    |    |    |    |
+   1    +    +    +    +    +    +
+        |    |    |    |    |    |
+ y=0    +    +    +    +    +    +
+
+      x=0    1    2    3    4    5
+
+Two things are notable about this master spanning tree.  First, assuming
+the x dateline was between x=5 and x=0, this spanning tree has a branch
+that crosses the dateline.  However, just as for unicast, crossing a
+dateline on a 1D ring (here, the ring for y=2) that is broken by a failure
+cannot contribute to a torus credit loop.
+
+Second, this spanning tree is no longer optimal even for multicast groups
+that encompass the entire fabric.  That, unfortunately, is a compromise that
+must be made to retain the other desirable properties of torus-2QoS routing.
+
+In the event that a single switch fails, torus-2QoS will generate a master
+spanning tree that has no "extra" turns by appropriately selecting a root
+switch.  In the 2D 6x5 torus example, assume now that the switch at (3,2),
+i.e. the root for a pristine fabric, fails.  Torus-2QoS will generate the
+following master spanning tree for that case:
+
+                       |
+   4    +    +    +    +    +    +
+        |    |    |    |    |    |
+   3    +    +    +    +    +    +
+        |    |    |         |    |
+   2    +    +    +         +    +
+        |    |    |         |    |
+   1    +----+----x----+----+----+
+        |    |    |    |    |    |
+ y=0    +    +    +    +    +    +
+                       |
+
+      x=0    1    2    3    4    5
+
+Assuming the y dateline was between y=4 and y=0, this spanning tree has
+a branch that crosses a dateline.  However, again this cannot contribute
+to credit loops as it occurs on a 1D ring (the ring for x=3) that is
+broken by a failure, as in the above example.
+
+Due to the use made by torus-2QoS of SLs and VLs, QoS configuration should
+only employ SL values 0 and 8, for both multicast and unicast.  Also,
+SL to VL map configuration must be under the complete control of torus-2QoS,
+so any user-supplied configuration must and will be ignored.
-- 
1.5.6.GIT
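
The SL and VL bit usage described in the documentation above can be
sketched as a few helpers.  This is a sketch under stated assumptions,
not code from osm_ucast_torus.c: the function names and the cdir enum
are hypothetical, and only the documented rules are encoded (VL bit 0
is 0x1 & (sl >> cdir(oport)), SL bit 3 selects the QoS level, and
multicast is confined to SL 0 or SL 8).

```c
#include <stdbool.h>
#include <stdint.h>

/* Coordinate direction of an output port: 0 = x, 1 = y, 2 = z. */
enum cdir { CDIR_X = 0, CDIR_Y = 1, CDIR_Z = 2 };

/* VL bit 0 per the documented mapping: 0x1 & (sl >> cdir(oport)). */
static uint8_t dateline_vl_bit(uint8_t sl, enum cdir dir)
{
	return 0x1 & (sl >> dir);
}

/* SL bit 3 differentiates the two QoS levels. */
static uint8_t sl_qos_level(uint8_t sl)
{
	return (sl >> 3) & 0x1;
}

/* Multicast must use SL 0 or SL 8, i.e. no dateline bits set. */
static bool sl_ok_for_mcast(uint8_t sl)
{
	return sl == 0 || sl == 8;
}
```

For example, a path SL of 0x5 records crossings of the x and z
datelines, so hops in x and z use the VL with bit 0 set while hops in y
do not.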



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 05/12] opensm: Enforce torus-2QoS link ordering convention.
       [not found]     ` <1261169461-2516-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
@ 2009-12-18 20:54       ` Jim Schutt
  2010-02-16 16:16       ` [PATCH 0/3] opensm: Bug fixes for torus-2QoS patchset Jim Schutt
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:54 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w

[-- Attachment #1: Type: text/plain, Size: 774 bytes --]


The function ring_next_sw() used by torus-2QoS to build LFTs relies
on the ordering convention that the 1 end of a link is in the
positive coordinate direction WRT the 0 end.  Previously the links
were always built this way, but nothing enforced the convention.

This commit adds code to enforce the convention, including code
needed to label switches as they are installed into the torus,
rather than after all the torus switches are found.

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---

I've attached the patch as a compressed file, as otherwise
it is too large to make it through the list.

-- Jim

 opensm/opensm/osm_ucast_torus.c |  433 +++++++++++++++++++++------------------
 1 files changed, 237 insertions(+), 196 deletions(-)



[-- Attachment #2: 0005-opensm-Enforce-torus-2QoS-link-ordering-convention.patch.bz2 --]
[-- Type: application/x-bzip, Size: 4697 bytes --]
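
Since the patch itself is only attached in compressed form, here is a
minimal sketch of what enforcing the ordering convention described above
could look like for a single 1D ring.  It is an illustration only:
struct ring_link, is_positive_step() and canonicalize_link() are
hypothetical names, coordinates are plain ring indices, and the ring is
assumed to have more than two switches so that "positive direction" is
unambiguous.

```c
#include <stdbool.h>

struct ring_link {
	unsigned end[2];	/* ring coordinate of each link end */
};

/* True if b is the +1 neighbor of a on a ring of n switches (n > 2). */
static bool is_positive_step(unsigned a, unsigned b, unsigned n)
{
	return (a + 1) % n == b;
}

/*
 * Put the link into canonical order, so that end[1] is in the positive
 * coordinate direction with respect to end[0].  Returns false if the
 * two ends are not ring neighbors; the link is then left untouched.
 */
static bool canonicalize_link(struct ring_link *l, unsigned n)
{
	unsigned tmp;

	if (is_positive_step(l->end[0], l->end[1], n))
		return true;
	if (!is_positive_step(l->end[1], l->end[0], n))
		return false;

	tmp = l->end[0];
	l->end[0] = l->end[1];
	l->end[1] = tmp;
	return true;
}
```

Note the wraparound case: on a 5-switch ring the link between
coordinates 4 and 0 is ordered with 4 as end 0, because 0 is the
positive neighbor of 4.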

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 03/12] opensm: Remove unused port specification from torus-2QoS config file parsing.
       [not found]     ` <1261169461-2516-4-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
@ 2009-12-18 20:56       ` Jim Schutt
  0 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2009-12-18 20:56 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w

[-- Attachment #1: Type: text/plain, Size: 221 bytes --]


The patch this email replies to changes the format of the torus-2QoS.conf
file.  The attached file works with the new format, and replaces the
example file sent in reply to the original torus-2QoS patch series.

-- Jim


[-- Attachment #2: torus-2QoS-5x5x5.conf --]
[-- Type: text/plain, Size: 1630 bytes --]

# We want the torus routing engine to attempt to find a
# 5x5x5 torus in the fabric:
torus 5 5 5

# We need to tell the routing engine what directions we
# want the torus coordinate directions to be, by specifying
# the endpoints (switch GUID only) of a link in each
# direction.  Here we specify positive coordinate directions:
xp_link 0x200000  0x200019   # S_0_0_0 -> S_1_0_0
yp_link 0x200000  0x200005   # S_0_0_0 -> S_0_1_0
zp_link 0x200000  0x200001   # S_0_0_0 -> S_0_0_1

# If one of the above switches were to fail, the routing
# engine would not have sufficient information to locate the
# torus in the fabric.  Specify a backup origin here:

next_origin
xp_link 0x20001f  0x200038   # S_1_1_1 -> S_2_1_1
yp_link 0x20001f  0x200024   # S_1_1_1 -> S_1_2_1
zp_link 0x20001f  0x200020   # S_1_1_1 -> S_1_1_2

# The torus routing engine uses the concept of a dateline,
# where a coordinate wraps from its maximum back to zero,
# in order to compute path SL values that provide routing
# that is free from credit loops.
#
# If it is forced by a failed switch to use the backup
# origin specification, that would cause the datelines
# to move, which would change many path SL values, which
# defeats one of the main benefits of this routing engine.
# So, describe the position of the original datelines
# relative to the backup origin as follows:
x_dateline -1
y_dateline -1
z_dateline -1

# You can specify as many backup origins as you like, but
# in practice, the torus routing engine is only guaranteed
# to be able to route around a single failed switch without
# introducing credit loops, so one backup origin is enough.
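
The numbers in the example file above can be cross-checked with a small
sketch.  Two assumptions are made here, and both are inferred only from
the values in this file: the switch GUIDs of this test fabric follow
0x200000 + 25*x + 5*y + z for switch S_x_y_z, and the original datelines
sit at coordinate 0 of the primary origin, so that their position
relative to a backup origin is the negated backup coordinate.  The names
example_guid() and dateline_for_backup() are hypothetical.

```c
#include <stdint.h>

struct coord {
	int x, y, z;
};

/* GUID of S_x_y_z in this particular 5x5x5 example fabric (assumed). */
static uint64_t example_guid(unsigned x, unsigned y, unsigned z)
{
	return 0x200000u + 25 * x + 5 * y + z;
}

/* Position of the primary origin's datelines seen from a backup origin. */
static struct coord dateline_for_backup(struct coord backup)
{
	struct coord d = { -backup.x, -backup.y, -backup.z };

	return d;
}
```

With the backup origin at S_1_1_1 this gives GUID 0x20001f and dateline
offsets of -1 in each dimension, matching the next_origin section of the
file.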

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 02/11] opensm: Allow the routing engine to influence SL2VL calculations.
       [not found]     ` <1258744509-11148-3-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
@ 2010-01-14 12:36       ` Yevgeny Kliteynik
       [not found]         ` <4B4F0FBD.3040308-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
  2010-02-10 16:15       ` Yevgeny Kliteynik
  1 sibling, 1 reply; 40+ messages in thread
From: Yevgeny Kliteynik @ 2010-01-14 12:36 UTC (permalink / raw)
  To: Jim Schutt
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sashak-smomgflXvOZWk0Htik3J/w,
	eitan-VPRAkNaXOzVS1MOuV/RT9w

Hi Jim,

Just started reading this stuff, so it's going to take a while :-)
Meanwhile, first question:

On 20/Nov/09 21:15, Jim Schutt wrote:
> Note that the original code assumes that QoS setup is mostly static and
> based only on user configuration.  As a result, there is no provision for
> routing engines that want to compute contributions to the SL2VL maps.
>
> Fix this up by adding a callback to struct osm_routing_engine that computes
> a per-port SL2VL map, and call it from the appropriate place in the QoS
> setup path.
>
> Also need to move the call to osm_qos_setup() in do_sweep() to after the
> call to the routing engine, so that any SL2VL map contributions from the
> routing engine are based on the latest information.

[snip...]

> diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
> index 7540adc..c3f49dc 100644
> --- a/opensm/opensm/osm_state_mgr.c
> +++ b/opensm/opensm/osm_state_mgr.c
> @@ -1228,8 +1228,6 @@ repeat_discovery:
>
>   	osm_pkey_mgr_process(sm->p_subn->p_osm);
>
> -	osm_qos_setup(sm->p_subn->p_osm);
> -
>   	/* try to restore SA DB (this should be before lid_mgr
>   	   because we may want to disable clients reregistration
>   	when SA DB is restored) */
> @@ -1270,6 +1268,8 @@ repeat_discovery:
>   	    osm_ucast_cache_process(&sm->ucast_mgr))
>   		osm_ucast_mgr_process(&sm->ucast_mgr);
>
> +	osm_qos_setup(sm->p_subn->p_osm);
> +
>   	if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats))
>   		return;
>    

So I understand that QoS setup has to be re-applied every time routing
engine is executed. There's also another place where routing engine is
executed - when re-route is specifically required:

         /*
          * If we don't need to do a heavy sweep and we want to do a reroute,
          * just reroute only.
          */
         if (cl_qmap_count(&sm->p_subn->sw_guid_tbl)
             && sm->p_subn->sm_state != IB_SMINFO_STATE_DISCOVERING
             && sm->p_subn->opt.force_heavy_sweep == FALSE
             && sm->p_subn->force_heavy_sweep == FALSE
             && sm->p_subn->force_reroute == TRUE
             && sm->p_subn->subnet_initialization_error == FALSE) {
         ....

                 osm_ucast_mgr_process(&sm->ucast_mgr);

         ....

Guess you need to call osm_qos_setup() here as well, right?

-- Yevgeny





^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 02/11] opensm: Allow the routing engine to influence SL2VL calculations.
       [not found]         ` <4B4F0FBD.3040308-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
@ 2010-01-14 16:01           ` Jim Schutt
  0 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2010-01-14 16:01 UTC (permalink / raw)
  To: kliteyn-VPRAkNaXOzVS1MOuV/RT9w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sashak-smomgflXvOZWk0Htik3J/w,
	eitan-VPRAkNaXOzVS1MOuV/RT9w


On Thu, 2010-01-14 at 05:36 -0700, Yevgeny Kliteynik wrote:
> Hi Jim,
> 
> Just started reading this stuff, so it's going to take a while :-)

Thanks for taking a look.

> Meanwhile, first question:
> 
> On 20/Nov/09 21:15, Jim Schutt wrote:
> > Note that the original code assumes that QoS setup is mostly static and
> > based only on user configuration.  As a result, there is no provision for
> > routing engines that want to compute contributions to the SL2VL maps.
> >
> > Fix this up by adding a callback to struct osm_routing_engine that computes
> > a per-port SL2VL map, and call it from the appropriate place in the QoS
> > setup path.
> >
> > Also need to move the call to osm_qos_setup() in do_sweep() to after the
> > call to the routing engine, so that any SL2VL map contributions from the
> > routing engine are based on the latest information.
> 
> [snip...]
> 
> > diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
> > index 7540adc..c3f49dc 100644
> > --- a/opensm/opensm/osm_state_mgr.c
> > +++ b/opensm/opensm/osm_state_mgr.c
> > @@ -1228,8 +1228,6 @@ repeat_discovery:
> >
> >   	osm_pkey_mgr_process(sm->p_subn->p_osm);
> >
> > -	osm_qos_setup(sm->p_subn->p_osm);
> > -
> >   	/* try to restore SA DB (this should be before lid_mgr
> >   	   because we may want to disable clients reregistration
> >   	when SA DB is restored) */
> > @@ -1270,6 +1268,8 @@ repeat_discovery:
> >   	    osm_ucast_cache_process(&sm->ucast_mgr))
> >   		osm_ucast_mgr_process(&sm->ucast_mgr);
> >
> > +	osm_qos_setup(sm->p_subn->p_osm);
> > +
> >   	if (wait_for_pending_transactions(&sm->p_subn->p_osm->stats))
> >   		return;
> >    
> 
> So I understand that QoS setup has to be re-applied every time routing
> engine is executed. There's also another place where routing engine is
> executed - when re-route is specifically required:
> 
>          /*
>           * If we don't need to do a heavy sweep and we want to do a reroute,
>           * just reroute only.
>           */
>          if (cl_qmap_count(&sm->p_subn->sw_guid_tbl)
>              && sm->p_subn->sm_state != IB_SMINFO_STATE_DISCOVERING
>              && sm->p_subn->opt.force_heavy_sweep == FALSE
>              && sm->p_subn->force_heavy_sweep == FALSE
>              && sm->p_subn->force_reroute == TRUE
>              && sm->p_subn->subnet_initialization_error == FALSE) {
>          ....
>
>                  osm_ucast_mgr_process(&sm->ucast_mgr);
>
>          ....
> 
> Guess you need to call osm_qos_setup() here as well, right?

Yep.  I missed that one.

Good catch, thanks.
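
One way to keep the two call sites from drifting apart again would be to
route and apply QoS through a single helper.  The following is only a
sketch of that idea with hypothetical stand-ins (struct sm_state,
route_fabric(), setup_qos(), route_then_qos()); it does not use the real
opensm types or the real osm_ucast_mgr_process()/osm_qos_setup()
signatures, and simply records the order in which the two steps run.

```c
#include <stdbool.h>

/* Hypothetical stand-in state that records call order. */
struct sm_state {
	unsigned step;		/* incremented on every call */
	unsigned routed_at;	/* step at which routing last ran */
	unsigned qos_at;	/* step at which QoS setup last ran */
};

static void route_fabric(struct sm_state *sm)
{
	sm->routed_at = ++sm->step;
}

static void setup_qos(struct sm_state *sm)
{
	sm->qos_at = ++sm->step;
}

/* Used by both the heavy sweep and the reroute-only path. */
static void route_then_qos(struct sm_state *sm)
{
	route_fabric(sm);
	setup_qos(sm);	/* SL2VL contributions now see the fresh routes */
}

static void do_sweep_sketch(struct sm_state *sm, bool reroute_only)
{
	if (reroute_only) {
		route_then_qos(sm);
		return;
	}
	/* ... discovery, pkey and LID handling elided ... */
	route_then_qos(sm);
}
```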

-- Jim

> 
> -- Yevgeny
> 
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 03/11] opensm: Allow the routing engine to participate in path SL calculations.
       [not found]     ` <1258744509-11148-4-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
@ 2010-01-14 16:24       ` Yevgeny Kliteynik
       [not found]         ` <4B4F452B.7040007-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Yevgeny Kliteynik @ 2010-01-14 16:24 UTC (permalink / raw)
  To: Jim Schutt
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sashak-smomgflXvOZWk0Htik3J/w,
	eitan-VPRAkNaXOzVS1MOuV/RT9w

Jim,

On 20/Nov/09 21:15, Jim Schutt wrote:
> LASH already does this, in a hard-coded fashion.
>
> Generalize this by adding a callback to struct osm_routing_engine that
> computes a path SL value, and fix up LASH to use it.
>
> This patchset causes the requested or QoS-computed SL value to be passed
> to the routing engine path SL computation as a hint.  In the event the
> routing engine's use of SLs allows it to support more than one QoS level,
> it may be able to make use of the SL hint to do so.
>
> For now, LASH just ignores the hint.
>
> Note that before this change, if LASH was configured and a specific path
> SL value was requested that differed from what LASH needed to route the
> fabric without credit loops, the path SL lookup would fail.  Now LASH's
> SL value is always used.
>
> Possibly the choice between failing a path SL request when it conflicts
> with routing, vs. always providing an SL value that gives a credit-loop-
> free routing, should be user-configurable?

SL can come from the following places:
  - user requested specific SL in PathRecord query
  - QoS policy configuration
  - SL specified in partition parameters
  - basic QoS (no policies, only SL2VL table)
  - routing engine

QoS policy is able to override the SL specified in the partition
parameters (with an error message in the log).  Apart from that case,
IMHO if there's a conflict between SLs coming from different sources,
the PathRecord query should fail to find a satisfiable path, or at
least we should see an error message in the log saying that the
selected SL conflicts with other OSM configuration but will be used anyway.

[snip...]

>
> @@ -725,6 +707,14 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>   		goto Exit;
>   	}
>
> +	/*
> +	 * If the routing engine wants to have a say in path SL selection,
> +	 * send the currently computed SL value as a hint and let the routing
> +	 * engine override it.
> +	 */
> +	if (p_re && p_re->path_sl)
> +		sl = p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port);
>    

In addition to logging an error message when the routing engine
overrides the provided hint, we need to check whether the returned SL
is valid, i.e. check the corresponding bit in valid_sl_mask.  It might
be irrelevant for torus-2QoS routing (not sure yet, need to read more
patches :-) ), but it's probably needed in the general case.
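
The validity check mentioned above amounts to a bit test against the
mask of SLs that map to a data VL.  As a minimal stand-alone sketch,
assuming a 16-bit mask with one bit per SL (sl_is_valid() and
first_valid_sl() are hypothetical helper names, not existing opensm
functions):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLS 16	/* InfiniBand defines SLs 0-15 */

/* True if bit `sl` is set in the mask of usable SLs. */
static bool sl_is_valid(uint16_t valid_sl_mask, uint8_t sl)
{
	return sl < NUM_SLS && (valid_sl_mask & (1u << sl)) != 0;
}

/* Lowest usable SL, or -1 if the mask is empty. */
static int first_valid_sl(uint16_t valid_sl_mask)
{
	int i;

	for (i = 0; i < NUM_SLS; i++)
		if (valid_sl_mask & (1u << i))
			return i;
	return -1;
}
```

Returning -1 for an empty mask makes the "no usable SL at all" case
explicit, so the caller can fail the query instead of silently using an
out-of-range value.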

Also, perhaps it would be better to provide the bitmask of available
SLs as a hint if there is more than one suitable SL?

I mean something like this (didn't try it, didn't even compile it,
need corresponding change in the p_re->path_sl callback, it's just
to illustrate what I mean):

---
  opensm/opensm/osm_sa_path_record.c |   47 ++++++++++++++++++++++-------------
  1 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
index 7120d65..6de8979 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -171,7 +171,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
      uint8_t required_mtu;
      uint8_t required_rate;
      uint8_t required_pkt_life;
-    uint8_t sl;
+    uint8_t sl = OSM_DEFAULT_SL;
      uint8_t in_port_num;
      ib_net16_t dest_lid;
      uint8_t i;
@@ -688,33 +688,44 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
                  cl_ntoh16(pkey), sl);
          } else
              sl = p_prtn->sl;
-    } else if (sa->p_subn->opt.qos) {
+    }
+
+    /*
+     * If the routing engine wants to have a say in path SL selection,
+     * send the currently computed SL value as a hint and let the routing
+     * engine override it.
+     */
+    if (p_re && p_re->path_sl)
+        sl = p_re->path_sl(p_re->context, valid_sl_mask, p_src_port, p_dest_port);
+
+    if (sa->p_subn->opt.qos && !(valid_sl_mask & (1 << sl))) {
+        OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F24: "
+            "Selected SL (%u) leads to VL15\n", sl);
+        status = IB_NOT_FOUND;
+        goto Exit;
+    }
+
+    if (!(p_re && p_re->path_sl) &&
+        !(comp_mask & IB_PR_COMPMASK_SL) &&
+        !(p_qos_level && p_qos_level->sl_set) &&
+        !pkey &&
+        (sa->p_subn->opt.qos)) {
          if (valid_sl_mask & (1 << OSM_DEFAULT_SL))
              sl = OSM_DEFAULT_SL;
          else {
              for (i = 0; i < IB_MAX_NUM_VLS; i++)
                  if (valid_sl_mask & (1 << i))
                      break;
+            if (i == IB_MAX_NUM_VLS) {
+                OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR ABCD: "
+                    "bla bla bla\n");
+                status = IB_NOT_FOUND;
+                goto Exit;
+            }
              sl = i;
          }
-    } else
-        sl = OSM_DEFAULT_SL;
-
-    if (sa->p_subn->opt.qos && !(valid_sl_mask & (1 << sl))) {
-        OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F24: "
-            "Selected SL (%u) leads to VL15\n", sl);
-        status = IB_NOT_FOUND;
-        goto Exit;
      }

-    /*
-     * If the routing engine wants to have a say in path SL selection,
-     * send the currently computed SL value as a hint and let the routing
-     * engine override it.
-     */
-    if (p_re && p_re->path_sl)
-        sl = p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port);
-
      /* reset pkey when raw traffic */
      if (comp_mask & IB_PR_COMPMASK_RAWTRAFFIC &&
          cl_ntoh32(p_pr->hop_flow_raw) & (1 << 31))
-- 
1.5.1.4


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 03/11] opensm: Allow the routing engine to participate in path SL calculations.
       [not found]         ` <4B4F452B.7040007-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
@ 2010-01-18 19:24           ` Jim Schutt
       [not found]             ` <1263842661.5550.43.camel-mgfCWIlwujvg4c9jKm7R2O1ftBKYq+Ku@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Jim Schutt @ 2010-01-18 19:24 UTC (permalink / raw)
  To: kliteyn-VPRAkNaXOzVS1MOuV/RT9w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sashak-smomgflXvOZWk0Htik3J/w,
	eitan-VPRAkNaXOzVS1MOuV/RT9w


Hi Yevgeny,

On Thu, 2010-01-14 at 09:24 -0700, Yevgeny Kliteynik wrote:
> Jim,
> 
> On 20/Nov/09 21:15, Jim Schutt wrote:
> > LASH already does this, in a hard-coded fashion.
> >
> > Generalize this by adding a callback to struct osm_routing_engine that
> > computes a path SL value, and fix up LASH to use it.
> >
> > This patchset causes the requested or QoS-computed SL value to be passed
> > to the routing engine path SL computation as a hint.  In the event the
> > routing engine's use of SLs allows it to support more than one QoS level,
> > it may be able to make use of the SL hint to do so.
> >
> > For now, LASH just ignores the hint.
> >
> > Note that before this change, if LASH was configured and a specific path
> > SL value was requested that differed from what LASH needed to route the
> > fabric without credit loops, the path SL lookup would fail.  Now LASH's
> > SL value is always used.
> >
> > Possibly the choice between failing a path SL request when it conflicts
> > with routing, vs. always providing an SL value that gives a credit-loop-
> > free routing, should be user-configurable?
> 
> SL can come from the following places:
>   - user requested specific SL in PathRecord query
>   - QoS policy configuration
>   - SL specified in partition parameters
>   - basic QoS (no policies, only SL2VL table)
>   - routing engine
> 
> Except for QoS policy being able to override SL that is specified in
> the partition parameters (with an error message in the log), IMHO if
> there's a conflict between SLs coming from different constraints
> PathRecord should fail to find a satisfiable path, or at least we
> should see some error message in the log that the selected SL
> conflicts with other OSM configurations, but will be used anyway.
> 
> [snip...]
> 
> >
> > @@ -725,6 +707,14 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
> >   		goto Exit;
> >   	}
> >
> > +	/*
> > +	 * If the routing engine wants to have a say in path SL selection,
> > +	 * send the currently computed SL value as a hint and let the routing
> > +	 * engine override it.
> > +	 */
> > +	if (p_re&&  p_re->path_sl)
> > +		sl = p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port);
> >    
> 
> In addition to error message if routing engine overrides the provided
> hint, need to check whether the returned SL is valid - check the
> corresponding bit in valid_sl_mask. It might be irrelevant for torus-2QoS
> routing (not sure yet, need to read more patches :-) ), but it's
> probably needed in general case.
> 
> Also, perhaps it would be better to provide the bitmask of available
> SLs as a hint if there are more than one suitable SL?
> 
> I mean something like this (didn't try it, didn't even compile it,
> need corresponding change in the p_re->path_sl callback, it's just
> to illustrate what I mean):

Your suggestion below won't accomplish what I was trying to 
accomplish.

Torus-2QoS needs to encode global path information into the
SL value in order to provide routing free of credit loops.

But it only needs 3 bits of SL to do this, leaving one free.
So, it uses that bit to provide two "levels" of quality of
service.

This usage of SL clashes with the QoS policy engine, which
uses each SL value to provide up to 16 "levels" of quality
of service.  So to the QoS policy engine, every SL value
is distinct, but to torus-2QoS, SL values 0-7 are all the
same wrt. QoS "level", and SL values 8-15 are also all the
same wrt. a second QoS "level".

I wanted to use the QoS policy engine to configure QoS
"level" in torus-2QoS, so I used this "hint" idea.
What torus-2QoS' path_sl() does is append the high-order
bit from the SL hint, as computed by the QoS policy engine,
onto the 3 low-order bits that it computes are needed 
to avoid deadlock.
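
To illustrate, here is a minimal sketch of that bit combination. The
function and macro names are made up for this example and are not the
ones used in the patch; it assumes the routing engine has already
computed its three low-order bits for the path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks: bit 3 carries the QoS level, bits 0-2 carry routing info. */
#define QOS_LEVEL_MASK  0x8u
#define ROUTING_SL_MASK 0x7u

/*
 * Combine the SL hint from the QoS policy engine with the routing bits.
 * Only the high-order bit of the hint survives; the low three bits are
 * always the ones routing needs to stay free of credit loops.
 */
static uint8_t combine_path_sl(uint8_t sl_hint, uint8_t routing_bits)
{
	assert(routing_bits <= ROUTING_SL_MASK);
	return (uint8_t)((sl_hint & QOS_LEVEL_MASK) |
			 (routing_bits & ROUTING_SL_MASK));
}
```

With this sketch, any hint in 0-7 leaves the routing bits unchanged, and
any hint in 8-15 just sets bit 3 on top of them, which is why the 16 SL
values collapse into two QoS "levels".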

Does that help explain what I'm after?

-- Jim

> 
> ---
>   opensm/opensm/osm_sa_path_record.c |   47 
> ++++++++++++++++++++++-------------
>   1 files changed, 29 insertions(+), 18 deletions(-)
> 
> diff --git a/opensm/opensm/osm_sa_path_record.c 
> b/opensm/opensm/osm_sa_path_record.c
> index 7120d65..6de8979 100644
> --- a/opensm/opensm/osm_sa_path_record.c
> +++ b/opensm/opensm/osm_sa_path_record.c
> @@ -171,7 +171,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN 
> osm_sa_t * sa,
>       uint8_t required_mtu;
>       uint8_t required_rate;
>       uint8_t required_pkt_life;
> -    uint8_t sl;
> +    uint8_t sl = OSM_DEFAULT_SL;
>       uint8_t in_port_num;
>       ib_net16_t dest_lid;
>       uint8_t i;
> @@ -688,33 +688,44 @@ static ib_api_status_t pr_rcv_get_path_parms(IN 
> osm_sa_t * sa,
>                   cl_ntoh16(pkey), sl);
>           } else
>               sl = p_prtn->sl;
> -    } else if (sa->p_subn->opt.qos) {
> +    }
> +
> +    /*
> +     * If the routing engine wants to have a say in path SL selection,
> +     * send the currently computed SL value as a hint and let the routing
> +     * engine override it.
> +     */
> +    if (p_re && p_re->path_sl)
> +        sl = p_re->path_sl(p_re->context, valid_sl_mask, p_src_port, 
> p_dest_port);
> +
> +    if (sa->p_subn->opt.qos && !(valid_sl_mask & (1 << sl))) {
> +        OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F24: "
> +            "Selected SL (%u) leads to VL15\n", sl);
> +        status = IB_NOT_FOUND;
> +        goto Exit;
> +    }
> +
> +    if (!(p_re && p_re->path_sl) &&
> +        !(comp_mask & IB_PR_COMPMASK_SL) &&
> +        !(p_qos_level && p_qos_level->sl_set) &&
> +        !pkey &&
> +        (sa->p_subn->opt.qos)) {
>           if (valid_sl_mask & (1 << OSM_DEFAULT_SL))
>               sl = OSM_DEFAULT_SL;
>           else {
>               for (i = 0; i < IB_MAX_NUM_VLS; i++)
>                   if (valid_sl_mask & (1 << i))
>                       break;
> +            if (i == IB_MAX_NUM_VLS) {
> +                OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR ABCD: "
> +                    "bla bla bla\n");
> +                status = IB_NOT_FOUND;
> +                goto Exit;
> +            }
>               sl = i;
>           }
> -    } else
> -        sl = OSM_DEFAULT_SL;
> -
> -    if (sa->p_subn->opt.qos && !(valid_sl_mask & (1 << sl))) {
> -        OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F24: "
> -            "Selected SL (%u) leads to VL15\n", sl);
> -        status = IB_NOT_FOUND;
> -        goto Exit;
>       }
> 
> -    /*
> -     * If the routing engine wants to have a say in path SL selection,
> -     * send the currently computed SL value as a hint and let the routing
> -     * engine override it.
> -     */
> -    if (p_re && p_re->path_sl)
> -        sl = p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port);
> -
>       /* reset pkey when raw traffic */
>       if (comp_mask & IB_PR_COMPMASK_RAWTRAFFIC &&
>           cl_ntoh32(p_pr->hop_flow_raw) & (1 << 31))



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 03/11] opensm: Allow the routing engine to participate in path SL calculations.
       [not found]             ` <1263842661.5550.43.camel-mgfCWIlwujvg4c9jKm7R2O1ftBKYq+Ku@public.gmane.org>
@ 2010-01-18 20:19               ` Yevgeny Kliteynik
  0 siblings, 0 replies; 40+ messages in thread
From: Yevgeny Kliteynik @ 2010-01-18 20:19 UTC (permalink / raw)
  To: Jim Schutt
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sashak-smomgflXvOZWk0Htik3J/w,
	eitan-VPRAkNaXOzVS1MOuV/RT9w

Hi Jim,

On 18/Jan/10 21:24, Jim Schutt wrote:
> Hi Yevgeny,
>
> On Thu, 2010-01-14 at 09:24 -0700, Yevgeny Kliteynik wrote:
>    
>> Jim,
>>
>> On 20/Nov/09 21:15, Jim Schutt wrote:
>>      
>>> LASH already does this, in a hard-coded fashion.
>>>
>>> Generalize this by adding a callback to struct osm_routing_engine that
>>> computes a path SL value, and fix up LASH to use it.
>>>
>>> This patchset causes the requested or QoS-computed SL value to be passed
>>> to the routing engine path SL computation as a hint.  In the event the
>>> routing engine's use of SLs allows it to support more than one QoS level,
>>> it may be able to make use of the SL hint to do so.
>>>
>>> For now, LASH just ignores the hint.
>>>
>>> Note that before this change, if LASH was configured and a specific path
>>> SL value was requested that differed from what LASH needed to route the
>>> fabric without credit loops, the path SL lookup would fail.  Now LASH's
>>> SL value is always used.
>>>
>>> Possibly the choice between failing a path SL request when it conflicts
>>> with routing, vs. always providing an SL value that gives a credit-loop-
>>> free routing, should be user-configurable?
>>>        
>> SL can come from the following places:
>>    - user requested specific SL in PathRecord query
>>    - QoS policy configuration
>>    - SL specified in partition parameters
>>    - basic QoS (no policies, only SL2VL table)
>>    - routing engine
>>
>> Except for QoS policy being able to override SL that is specified in
>> the partition parameters (with an error message in the log), IMHO if
>> there's a conflict between SLs coming from different constraints
>> PathRecord should fail to find a satisfiable path, or at least we
>> should see some error message in the log that the selected SL
>> conflicts with other OSM configurations, but will be used anyway.
>>
>> [snip...]
>>
>>      
>>> @@ -725,6 +707,14 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>>>    		goto Exit;
>>>    	}
>>>
>>> +	/*
>>> +	 * If the routing engine wants to have a say in path SL selection,
>>> +	 * send the currently computed SL value as a hint and let the routing
>>> +	 * engine override it.
>>> +	 */
>>> +	if (p_re&&   p_re->path_sl)
>>> +		sl = p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port);
>>>
>>>        
>> In addition to error message if routing engine overrides the provided
>> hint, need to check whether the returned SL is valid - check the
>> corresponding bit in valid_sl_mask. It might be irrelevant for torus-2QoS
>> routing (not sure yet, need to read more patches :-) ), but it's
>> probably needed in general case.
>>
>> Also, perhaps it would be better to provide the bitmask of available
>> SLs as a hint if there are more than one suitable SL?
>>
>> I mean something like this (didn't try it, didn't even compile it,
>> need corresponding change in the p_re->path_sl callback, it's just
>> to illustrate what I mean):
>>      
> Your suggestion below won't accomplish what I was trying to
> accomplish.
>
> Torus-2QoS needs to encode global path information into the
> SL value in order to provide routing free of credit loops.
>
> But it only needs 3 bits of SL to do this, leaving one free.
> So, it uses that bit to provide two "levels" of quality of
> service.
>
> This usage of SL clashes with the QoS policy engine, which
> uses each SL value to provide up to 16 "levels" of quality
> of service.  So to the QoS policy engine, every SL value
> is distinct, but to torus-2QoS, SL values 0-7 are all the
> same wrt. QoS "level", and SL values 8-15 are also all the
> same wrt. a second QoS "level".
>    

Understood. Please ignore my suggestion.

-- Yevgeny

> I wanted to use the QoS policy engine to configure QoS
> "level" in torus-2QoS, so I used this "hint" idea.
> What torus-2QoS' path_sl() does is append the high-order
> bit from the SL hint, as computed by the QoS policy engine,
> onto the 3 low-order bits that it computes are needed
> to avoid deadlock.
>
> Does that help explain what I'm after?
>
> -- Jim
>
>    
>> ---
>>    opensm/opensm/osm_sa_path_record.c |   47
>> ++++++++++++++++++++++-------------
>>    1 files changed, 29 insertions(+), 18 deletions(-)
>>
>> diff --git a/opensm/opensm/osm_sa_path_record.c
>> b/opensm/opensm/osm_sa_path_record.c
>> index 7120d65..6de8979 100644
>> --- a/opensm/opensm/osm_sa_path_record.c
>> +++ b/opensm/opensm/osm_sa_path_record.c
>> @@ -171,7 +171,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN
>> osm_sa_t * sa,
>>        uint8_t required_mtu;
>>        uint8_t required_rate;
>>        uint8_t required_pkt_life;
>> -    uint8_t sl;
>> +    uint8_t sl = OSM_DEFAULT_SL;
>>        uint8_t in_port_num;
>>        ib_net16_t dest_lid;
>>        uint8_t i;
>> @@ -688,33 +688,44 @@ static ib_api_status_t pr_rcv_get_path_parms(IN
>> osm_sa_t * sa,
>>                    cl_ntoh16(pkey), sl);
>>            } else
>>                sl = p_prtn->sl;
>> -    } else if (sa->p_subn->opt.qos) {
>> +    }
>> +
>> +    /*
>> +     * If the routing engine wants to have a say in path SL selection,
>> +     * send the currently computed SL value as a hint and let the routing
>> +     * engine override it.
>> +     */
>> +    if (p_re&&  p_re->path_sl)
>> +        sl = p_re->path_sl(p_re->context, valid_sl_mask, p_src_port,
>> p_dest_port);
>> +
>> +    if (sa->p_subn->opt.qos&&  !(valid_sl_mask&  (1<<  sl))) {
>> +        OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F24: "
>> +            "Selected SL (%u) leads to VL15\n", sl);
>> +        status = IB_NOT_FOUND;
>> +        goto Exit;
>> +    }
>> +
>> +    if (!(p_re&&  p_re->path_sl)&&
>> +        !(comp_mask&  IB_PR_COMPMASK_SL)&&
>> +        !(p_qos_level&&  p_qos_level->sl_set)&&
>> +        !pkey&&
>> +        (sa->p_subn->opt.qos)) {
>>            if (valid_sl_mask&  (1<<  OSM_DEFAULT_SL))
>>                sl = OSM_DEFAULT_SL;
>>            else {
>>                for (i = 0; i<  IB_MAX_NUM_VLS; i++)
>>                    if (valid_sl_mask&  (1<<  i))
>>                        break;
>> +            if (i == IB_MAX_NUM_VLS) {
>> +                OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR ABCD: "
>> +                    "bla bla bla\n");
>> +                status = IB_NOT_FOUND;
>> +                goto Exit;
>> +            }
>>                sl = i;
>>            }
>> -    } else
>> -        sl = OSM_DEFAULT_SL;
>> -
>> -    if (sa->p_subn->opt.qos&&  !(valid_sl_mask&  (1<<  sl))) {
>> -        OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1F24: "
>> -            "Selected SL (%u) leads to VL15\n", sl);
>> -        status = IB_NOT_FOUND;
>> -        goto Exit;
>>        }
>>
>> -    /*
>> -     * If the routing engine wants to have a say in path SL selection,
>> -     * send the currently computed SL value as a hint and let the routing
>> -     * engine override it.
>> -     */
>> -    if (p_re&&  p_re->path_sl)
>> -        sl = p_re->path_sl(p_re->context, sl, p_src_port, p_dest_port);
>> -
>>        /* reset pkey when raw traffic */
>>        if (comp_mask&  IB_PR_COMPMASK_RAWTRAFFIC&&
>>            cl_ntoh32(p_pr->hop_flow_raw)&  (1<<  31))
>>      
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 02/11] opensm: Allow the routing engine to influence SL2VL calculations.
       [not found]     ` <1258744509-11148-3-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2010-01-14 12:36       ` Yevgeny Kliteynik
@ 2010-02-10 16:15       ` Yevgeny Kliteynik
       [not found]         ` <4B72DBBD.9020709-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
  1 sibling, 1 reply; 40+ messages in thread
From: Yevgeny Kliteynik @ 2010-02-10 16:15 UTC (permalink / raw)
  To: Jim Schutt
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sashak-smomgflXvOZWk0Htik3J/w,
	eitan-VPRAkNaXOzVS1MOuV/RT9w

Hi Jim,

[snip...]

On 20/Nov/09 21:15, Jim Schutt wrote:
> diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
> index 08f9a60..f42c334 100644
> --- a/opensm/opensm/osm_qos.c
> +++ b/opensm/opensm/osm_qos.c
> @@ -194,6 +194,7 @@ static ib_api_status_t sl2vl_update(osm_sm_t * sm, osm_port_t * p_port,
>   {
>   	ib_api_status_t status;
>   	uint8_t i, num_ports;
> +	struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
>   	osm_physp_t *p_physp;
>
>   	if (osm_node_get_type(osm_physp_get_node_ptr(p)) == IB_NODE_TYPE_SWITCH) {
> @@ -213,8 +214,24 @@ static ib_api_status_t sl2vl_update(osm_sm_t * sm, osm_port_t * p_port,
>   	}
>
>   	for (i = 0; i<  num_ports; i++) {
> +		ib_slvl_table_t routing_sl2vl;
> +		const ib_slvl_table_t *port_sl2vl;
> +		const ib_slvl_table_t *port_sl2vl_old;
> +
> +		if (re->update_sl2vl) {
>    

If routing failed, and no_fallback specified, OSM crashes here.
The simple fix is, of course, just fixing the condition to
"(re && re->update_sl2vl)", but I think that it would be better
not to apply QoS configuration if unicast manager failed - just
restart the sweep.
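
For reference, a minimal sketch of the guarded call. The struct below is
a simplified, hypothetical stand-in for struct osm_routing_engine with
only the members needed here, and the callback signature is an assumption
rather than the one in the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct osm_routing_engine. */
struct routing_engine_sketch {
	void (*update_sl2vl)(void *context);	/* optional callback */
	void *context;
};

/*
 * Invoke the optional callback only when an engine is in use and it
 * provides one.  Returns 1 if the callback ran, 0 if it was skipped.
 */
static int maybe_update_sl2vl(const struct routing_engine_sketch *re)
{
	if (re && re->update_sl2vl) {
		re->update_sl2vl(re->context);
		return 1;
	}
	return 0;
}

/* Trivial callback used to observe whether the call happened. */
static void count_call(void *context)
{
	++*(int *)context;
}
```

Checking the engine pointer first covers the case where routing failed
and no engine is recorded as used; the callback check covers engines
that do not care about SL2VL at all.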

-- Yevgeny




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 02/11] opensm: Allow the routing engine to influence SL2VL calculations.
       [not found]         ` <4B72DBBD.9020709-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
@ 2010-02-15 21:45           ` Jim Schutt
  0 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2010-02-15 21:45 UTC (permalink / raw)
  To: kliteyn-VPRAkNaXOzVS1MOuV/RT9w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sashak-smomgflXvOZWk0Htik3J/w,
	eitan-VPRAkNaXOzVS1MOuV/RT9w


On Wed, 2010-02-10 at 09:15 -0700, Yevgeny Kliteynik wrote:
> Hi Jim,
> 
> [snip...]
> 
> On 20/Nov/09 21:15, Jim Schutt wrote:
> > diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
> > index 08f9a60..f42c334 100644
> > --- a/opensm/opensm/osm_qos.c
> > +++ b/opensm/opensm/osm_qos.c
> > @@ -194,6 +194,7 @@ static ib_api_status_t sl2vl_update(osm_sm_t * sm, osm_port_t * p_port,
> >   {
> >   	ib_api_status_t status;
> >   	uint8_t i, num_ports;
> > +	struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
> >   	osm_physp_t *p_physp;
> >
> >   	if (osm_node_get_type(osm_physp_get_node_ptr(p)) == IB_NODE_TYPE_SWITCH) {
> > @@ -213,8 +214,24 @@ static ib_api_status_t sl2vl_update(osm_sm_t * sm, osm_port_t * p_port,
> >   	}
> >
> >   	for (i = 0; i<  num_ports; i++) {
> > +		ib_slvl_table_t routing_sl2vl;
> > +		const ib_slvl_table_t *port_sl2vl;
> > +		const ib_slvl_table_t *port_sl2vl_old;
> > +
> > +		if (re->update_sl2vl) {
> >    
> 
> If routing failed, and no_fallback specified, OSM crashes here.
> The simple fix is, of course, just fixing the condition to
> "(re && re->update_sl2vl)", but 

This could cause message deadlock for applications still running
on parts of the fabric that are still operational, if the last successful
routing was via a routing engine that wants to set SL2VL map values,
because we would overwrite them with inappropriate values.

But the following equivalent change would be OK:

diff --git a/opensm/opensm/osm_qos.c b/opensm/opensm/osm_qos.c
index 0f0b24f..07f4836 100644
--- a/opensm/opensm/osm_qos.c
+++ b/opensm/opensm/osm_qos.c
@@ -197,6 +197,12 @@ static ib_api_status_t sl2vl_update(osm_sm_t * sm, osm_port_t * p_port,
 	struct osm_routing_engine *re = sm->p_subn->p_osm->routing_engine_used;
 	osm_physp_t *p_physp;
 
+	/*
+	 * Do nothing unless the most recent routing attempt was successful.
+	 */
+	if (!re)
+		return IB_SUCCESS;
+
 	if (osm_node_get_type(osm_physp_get_node_ptr(p)) == IB_NODE_TYPE_SWITCH) {
 		if (ib_port_info_get_vl_cap(&p->port_info) == 1) {
 			/* Check port 0's capability mask */


> I think that it would be better
> not to apply QoS configuration if unicast manager failed - just
> restart the sweep.

I think you are right.  Something like this?


diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index 10d5e09..d8d4c9e 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1113,7 +1113,11 @@ static void do_sweep(osm_sm_t * sm)
 		/* Re-program the switches fully */
 		sm->p_subn->ignore_existing_lfts = TRUE;
 
-		osm_ucast_mgr_process(&sm->ucast_mgr);
+		if (osm_ucast_mgr_process(&sm->ucast_mgr)) {
+			OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE,
+					"REROUTE FAILED");
+			return;
+		}
 		osm_qos_setup(sm->p_subn->p_osm);
 
 		/* Reset flag */
@@ -1272,12 +1276,14 @@ repeat_discovery:
 			"LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG");
 
 	/*
-	 * Proceed with unicast forwarding table configuration.
+	 * Proceed with unicast forwarding table configuration; repeat
+	 * if unicast routing fails.
 	 */
 
 	if (!sm->ucast_mgr.cache_valid ||
 	    osm_ucast_cache_process(&sm->ucast_mgr))
-		osm_ucast_mgr_process(&sm->ucast_mgr);
+		if (osm_ucast_mgr_process(&sm->ucast_mgr))
+			goto repeat_discovery;
 
 	osm_qos_setup(sm->p_subn->p_osm);
 
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index fbc9244..8ea2e52 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -955,6 +955,7 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 	osm_opensm_t *p_osm;
 	struct osm_routing_engine *p_routing_eng;
 	cl_qmap_t *p_sw_guid_tbl;
+	int failed = 0;
 
 	OSM_LOG_ENTER(p_mgr->p_log);
 
@@ -973,7 +974,8 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 
 	p_osm->routing_engine_used = NULL;
 	while (p_routing_eng) {
-		if (!ucast_mgr_route(p_routing_eng, p_osm))
+		failed = ucast_mgr_route(p_routing_eng, p_osm);
+		if (!failed)
 			break;
 		p_routing_eng = p_routing_eng->next;
 	}
@@ -984,9 +986,11 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 		struct osm_routing_engine *r = p_osm->default_routing_engine;
 
 		r->build_lid_matrices(r->context);
-		r->ucast_build_fwd_tables(r->context);
-		p_osm->routing_engine_used = r;
-		osm_ucast_mgr_set_fwd_tables(p_mgr);
+		failed = r->ucast_build_fwd_tables(r->context);
+		if (!failed) {
+			p_osm->routing_engine_used = r;
+			osm_ucast_mgr_set_fwd_tables(p_mgr);
+		}
 	}
 
 	if (p_osm->routing_engine_used) {
@@ -1006,7 +1010,7 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
 Exit:
 	CL_PLOCK_RELEASE(p_mgr->p_lock);
 	OSM_LOG_EXIT(p_mgr->p_log);
-	return 0;
+	return failed;
 }
 
 static int ucast_build_lid_matrices(void *context)


Thanks -- Jim


> 
> -- Yevgeny
> 
> 
> 

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 0/3] opensm: Bug fixes for torus-2QoS patchset
       [not found]     ` <1261169461-2516-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2009-12-18 20:54       ` [PATCH 05/12] opensm: Enforce torus-2QoS link ordering convention Jim Schutt
@ 2010-02-16 16:16       ` Jim Schutt
  2010-02-16 16:16       ` [PATCH 1/3] opensm: Use local variables when searching for torus-2QoS master spanning tree root Jim Schutt
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2010-02-16 16:16 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	kliteyn-VPRAkNaXOzVS1MOuV/RT9w, jaschut-4OHPYypu0djtX7QSmKvirg

These patches fix bugs discovered during further testing of the
torus-2QoS routing module for OpenSM.

(See http://www.spinics.net/lists/linux-rdma/msg01438.html
and http://www.spinics.net/lists/linux-rdma/msg01938.html)


Jim Schutt (3):
  opensm: Use local variables when searching for torus-2QoS master
    spanning tree root.
  opensm: Fix handling of torus-2QoS topology discovery for radix 4
    torus dimensions.
  opensm: Avoid havoc in dump_ucast_routes() caused by torus-2QoS
    persistent use of osm_port_t:priv.

 opensm/include/opensm/osm_switch.h |   12 +
 opensm/opensm/osm_dump.c           |    2 +-
 opensm/opensm/osm_switch.c         |    7 +-
 opensm/opensm/osm_ucast_mgr.c      |    1 +
 opensm/opensm/osm_ucast_torus.c    |  418 +++++++++++++++---------------------
 5 files changed, 193 insertions(+), 247 deletions(-)



^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 1/3] opensm: Use local variables when searching for torus-2QoS master spanning tree root.
       [not found]     ` <1261169461-2516-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
  2009-12-18 20:54       ` [PATCH 05/12] opensm: Enforce torus-2QoS link ordering convention Jim Schutt
  2010-02-16 16:16       ` [PATCH 0/3] opensm: Bug fixes for torus-2QoS patchset Jim Schutt
@ 2010-02-16 16:16       ` Jim Schutt
  2010-02-16 16:16       ` [PATCH 2/3] opensm: Fix handling of torus-2QoS topology discovery for radix 4 torus dimensions Jim Schutt
  2010-02-16 16:16       ` [PATCH 3/3] opensm: Avoid havoc in dump_ucast_routes() caused by torus-2QoS persistent use of osm_port_t:priv Jim Schutt
  4 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2010-02-16 16:16 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	kliteyn-VPRAkNaXOzVS1MOuV/RT9w, jaschut-4OHPYypu0djtX7QSmKvirg

Otherwise 1) presence of the wrong switches is checked; and 2) the y-loop
in good_xy_ring() can segfault on an out-of-bounds switch array x index.
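
To illustrate both failure modes, here is a small sketch on a fixed-size
2-D grid. The grid type, sizes, and function names are invented for this
example; only the loop structure mirrors good_xy_ring().

```c
#include <assert.h>
#include <stdbool.h>

#define X_SZ 3
#define Y_SZ 3

/*
 * Buggy pattern: the parameter x doubles as the loop index.  This helper
 * only reports the value x holds once the first loop is done, which is
 * the index the second loop would then use.
 */
static int x_after_buggy_loop(bool sw[X_SZ][Y_SZ], int x, int y)
{
	bool good_ring = true;

	for (x = 0; x < X_SZ && good_ring; x++)
		good_ring = sw[x][y];
	return x;
}

/* Fixed pattern: separate loop indices leave the caller's x and y intact. */
static bool good_xy_ring_sketch(bool sw[X_SZ][Y_SZ], const int x, const int y)
{
	bool good_ring = true;
	int x_tst, y_tst;

	for (x_tst = 0; x_tst < X_SZ && good_ring; x_tst++)
		good_ring = sw[x_tst][y];

	for (y_tst = 0; y_tst < Y_SZ && good_ring; y_tst++)
		good_ring = sw[x][y_tst];

	return good_ring;
}
```

When every switch in the x ring is present, the buggy loop leaves x equal
to X_SZ, one past the last valid index, so the y loop that follows would
read outside the array instead of checking the ring through the
requested x.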

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_torus.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index e2eb324..728e56c 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -8751,22 +8751,23 @@ ib_api_status_t torus_mcast_stree(void *context, osm_mgrp_box_t *mgb)
 }
 
 static
-bool good_xy_ring(struct torus *t, int x, int y, int z)
+bool good_xy_ring(struct torus *t, const int x, const int y, const int z)
 {
 	struct t_switch ****sw = t->sw;
 	bool good_ring = true;
+	int x_tst, y_tst;
 
-	for (x = 0; x < t->x_sz && good_ring; x++)
-		good_ring = sw[x][y][z];
+	for (x_tst = 0; x_tst < t->x_sz && good_ring; x_tst++)
+		good_ring = sw[x_tst][y][z];
 
-	for (y = 0; y < t->y_sz && good_ring; y++)
-		good_ring = sw[x][y][z];
+	for (y_tst = 0; y_tst < t->y_sz && good_ring; y_tst++)
+		good_ring = sw[x][y_tst][z];
 
 	return good_ring;
 }
 
 static
-struct t_switch *find_plane_mid(struct torus *t, int z)
+struct t_switch *find_plane_mid(struct torus *t, const int z)
 {
 	int x, dx, xm = t->x_sz / 2;
 	int y, dy, ym = t->y_sz / 2;
-- 
1.5.6.GIT



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 2/3] opensm: Fix handling of torus-2QoS topology discovery for radix 4 torus dimensions.
       [not found]     ` <1261169461-2516-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                         ` (2 preceding siblings ...)
  2010-02-16 16:16       ` [PATCH 1/3] opensm: Use local variables when searching for torus-2QoS master spanning tree root Jim Schutt
@ 2010-02-16 16:16       ` Jim Schutt
  2010-02-16 16:16       ` [PATCH 3/3] opensm: Avoid havoc in dump_ucast_routes() caused by torus-2QoS persistent use of osm_port_t:priv Jim Schutt
  4 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2010-02-16 16:16 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	kliteyn-VPRAkNaXOzVS1MOuV/RT9w, jaschut-4OHPYypu0djtX7QSmKvirg

Torus-2QoS finds the torus topology in a fabric using an algorithm that
looks for 8 adjacent switches which form the corners of a cube, by looking
for 4 adjacent switches which form the corners of a face on that cube.

When a torus dimension has radix 4 (e.g. the y dimension in a 5x4x8 torus),
1-D rings which span that dimension cannot be distinguished topologically
from the faces the algorithm is trying to construct.
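
The ambiguity can be seen with a little index arithmetic.  The sketch
below uses wrap_index() as a hypothetical stand-in for the code's
canonicalize(), whose implementation is not shown here; it is assumed to
wrap an index into the range [0, radix).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for canonicalize(): wrap n into [0, radix). */
static int wrap_index(int n, int radix)
{
	int r = n % radix;

	return r < 0 ? r + radix : r;
}

/*
 * True when the switches two hops away in the + and - directions are the
 * same switch.  In that case the ring closes after four hops, i.e. it is
 * a 4-cycle just like the four corners of a face.
 */
static bool two_hop_neighbors_coincide(int n, int radix)
{
	return wrap_index(n - 2, radix) == wrap_index(n + 2, radix);
}
```

For radix 4 the two-hop neighbors coincide at every position, whereas
for a larger radix such as 5 or 8 they are distinct switches, so there a
ring cannot be mistaken for a face.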

Code that prevents that situation from arising should only be applied in
cases where a torus dimension has radix 4, but due to a missing test, it
could be applied inappropriately.

This commit fixes the bug by adding the missing test.  It also restructures
the code in question to remove code duplication by adding helper functions.

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/opensm/osm_ucast_torus.c |  405 ++++++++++++++++-----------------------
 1 files changed, 168 insertions(+), 237 deletions(-)

diff --git a/opensm/opensm/osm_ucast_torus.c b/opensm/opensm/osm_ucast_torus.c
index 728e56c..ab0e6a6 100644
--- a/opensm/opensm/osm_ucast_torus.c
+++ b/opensm/opensm/osm_ucast_torus.c
@@ -1956,38 +1956,16 @@ struct f_switch *tfind_2d_perpendicular(struct t_switch *tsw0,
 	return ffind_2d_perpendicular(tsw0->tmp, tsw1->tmp, tsw2->tmp);
 }
 
-/*
- * These functions return true when it safe to call
- * tfind_3d_perpendicular()/ffind_3d_perpendicular().
- */
 static
-bool safe_x_perpendicular(struct torus *t, int i, int j, int k)
+bool safe_x_ring(struct torus *t, int i, int j, int k)
 {
-	int jm1, jp1, jp2, km1, kp1, kp2;
-
-	/*
-	 * If the dimensions perpendicular to the search direction are
-	 * not radix 4 torus dimensions, it is always safe to search for
-	 * a perpendicular.
-	 */
-	if ((t->y_sz != 4 && t->z_sz != 4) ||
-	    (t->flags & Y_MESH && t->flags & Z_MESH) ||
-	    (t->y_sz != 4 && (t->flags & Z_MESH)) ||
-	    (t->z_sz != 4 && (t->flags & Y_MESH)))
-		return true;
-
-	jm1 = canonicalize(j - 1, t->y_sz);
-	jp1 = canonicalize(j + 1, t->y_sz);
-	jp2 = canonicalize(j + 2, t->y_sz);
-
-	km1 = canonicalize(k - 1, t->z_sz);
-	kp1 = canonicalize(k + 1, t->z_sz);
-	kp2 = canonicalize(k + 2, t->z_sz);
+	int im1, ip1, ip2;
+	bool success = true;
 
 	/*
-	 * Here we are checking for enough appropriate links having been
-	 * installed into the torus to prevent an incorrect link from being
-	 * considered as a perpendicular candidate.
+	 * If this x-direction radix-4 ring has at least two links
+	 * already installed into the torus,  then this ring does not
+	 * prevent us from looking for y or z direction perpendiculars.
 	 *
 	 * It is easier to check for the appropriate switches being installed
 	 * into the torus than it is to check for the links, so force the
@@ -1995,93 +1973,111 @@ bool safe_x_perpendicular(struct torus *t, int i, int j, int k)
 	 *
 	 * Recall that canonicalize(n - 2, 4) == canonicalize(n + 2, 4).
 	 */
-	if (((!!t->sw[i][jm1][k] +
-	      !!t->sw[i][jp1][k] + !!t->sw[i][jp2][k] >= 2) &&
-	     (!!t->sw[i][j][km1] +
-	      !!t->sw[i][j][kp1] + !!t->sw[i][j][kp2] >= 2))) {
-
-		bool success = true;
-
-		if (t->sw[i][jp2][k] && t->sw[i][jm1][k])
-			success = link_tswitches(t, 1,
-						 t->sw[i][jp2][k],
-						 t->sw[i][jm1][k])
-				&& success;
-
-		if (t->sw[i][jm1][k] && t->sw[i][j][k])
-			success = link_tswitches(t, 1,
-						 t->sw[i][jm1][k],
-						 t->sw[i][j][k])
-				&& success;
-
-		if (t->sw[i][j][k] && t->sw[i][jp1][k])
-			success = link_tswitches(t, 1,
-						 t->sw[i][j][k],
-						 t->sw[i][jp1][k])
-				&& success;
-
-		if (t->sw[i][jp1][k] && t->sw[i][jp2][k])
-			success = link_tswitches(t, 1,
-						 t->sw[i][jp1][k],
-						 t->sw[i][jp2][k])
-				&& success;
-
-		if (t->sw[i][j][kp2] && t->sw[i][j][km1])
-			success = link_tswitches(t, 2,
-						 t->sw[i][j][kp2],
-						 t->sw[i][j][km1])
-				&& success;
-
-		if (t->sw[i][j][km1] && t->sw[i][j][k])
-			success = link_tswitches(t, 2,
-						 t->sw[i][j][km1],
-						 t->sw[i][j][k])
-				&& success;
-
-		if (t->sw[i][j][k] && t->sw[i][j][kp1])
-			success = link_tswitches(t, 2,
-						 t->sw[i][j][k],
-						 t->sw[i][j][kp1])
-				&& success;
-
-		if (t->sw[i][j][kp1] && t->sw[i][j][kp2])
-			success = link_tswitches(t, 2,
-						 t->sw[i][j][kp1],
-						 t->sw[i][j][kp2])
-				&& success;
-		return success;
+	if (t->x_sz != 4 || t->flags & X_MESH)
+		goto out;
+
+	im1 = canonicalize(i - 1, t->x_sz);
+	ip1 = canonicalize(i + 1, t->x_sz);
+	ip2 = canonicalize(i + 2, t->x_sz);
+
+	if (!!t->sw[im1][j][k] +
+	    !!t->sw[ip1][j][k] + !!t->sw[ip2][j][k] < 2) {
+		success = false;
+		goto out;
 	}
-	return false;
+	if (t->sw[ip2][j][k] && t->sw[im1][j][k])
+		success = link_tswitches(t, 0,
+					 t->sw[ip2][j][k],
+					 t->sw[im1][j][k])
+			&& success;
+
+	if (t->sw[im1][j][k] && t->sw[i][j][k])
+		success = link_tswitches(t, 0,
+					 t->sw[im1][j][k],
+					 t->sw[i][j][k])
+			&& success;
+
+	if (t->sw[i][j][k] && t->sw[ip1][j][k])
+		success = link_tswitches(t, 0,
+					 t->sw[i][j][k],
+					 t->sw[ip1][j][k])
+			&& success;
+
+	if (t->sw[ip1][j][k] && t->sw[ip2][j][k])
+		success = link_tswitches(t, 0,
+					 t->sw[ip1][j][k],
+					 t->sw[ip2][j][k])
+			&& success;
+out:
+	return success;
 }
 
 static
-bool safe_y_perpendicular(struct torus *t, int i, int j, int k)
+bool safe_y_ring(struct torus *t, int i, int j, int k)
 {
-	int im1, ip1, ip2, km1, kp1, kp2;
+	int jm1, jp1, jp2;
+	bool success = true;
 
 	/*
-	 * If the dimensions perpendicular to the search direction are
-	 * not radix 4 torus dimensions, it is always safe to search for
-	 * a perpendicular.
+	 * If this y-direction radix-4 ring has at least two links
+	 * already installed into the torus,  then this ring does not
+	 * prevent us from looking for x or z direction perpendiculars.
+	 *
+	 * It is easier to check for the appropriate switches being installed
+	 * into the torus than it is to check for the links, so force the
+	 * link installation if the appropriate switches are installed.
+	 *
+	 * Recall that canonicalize(n - 2, 4) == canonicalize(n + 2, 4).
 	 */
-	if ((t->x_sz != 4 && t->z_sz != 4) ||
-	    (t->flags & X_MESH && t->flags & Z_MESH) ||
-	    (t->x_sz != 4 && (t->flags & Z_MESH)) ||
-	    (t->z_sz != 4 && (t->flags & X_MESH)))
-		return true;
+	if (t->y_sz != 4 || (t->flags & Y_MESH))
+		goto out;
 
-	im1 = canonicalize(i - 1, t->x_sz);
-	ip1 = canonicalize(i + 1, t->x_sz);
-	ip2 = canonicalize(i + 2, t->x_sz);
+	jm1 = canonicalize(j - 1, t->y_sz);
+	jp1 = canonicalize(j + 1, t->y_sz);
+	jp2 = canonicalize(j + 2, t->y_sz);
 
-	km1 = canonicalize(k - 1, t->z_sz);
-	kp1 = canonicalize(k + 1, t->z_sz);
-	kp2 = canonicalize(k + 2, t->z_sz);
+	if (!!t->sw[i][jm1][k] +
+	    !!t->sw[i][jp1][k] + !!t->sw[i][jp2][k] < 2) {
+		success = false;
+		goto out;
+	}
+	if (t->sw[i][jp2][k] && t->sw[i][jm1][k])
+		success = link_tswitches(t, 1,
+					 t->sw[i][jp2][k],
+					 t->sw[i][jm1][k])
+			&& success;
+
+	if (t->sw[i][jm1][k] && t->sw[i][j][k])
+		success = link_tswitches(t, 1,
+					 t->sw[i][jm1][k],
+					 t->sw[i][j][k])
+			&& success;
+
+	if (t->sw[i][j][k] && t->sw[i][jp1][k])
+		success = link_tswitches(t, 1,
+					 t->sw[i][j][k],
+					 t->sw[i][jp1][k])
+			&& success;
+
+	if (t->sw[i][jp1][k] && t->sw[i][jp2][k])
+		success = link_tswitches(t, 1,
+					 t->sw[i][jp1][k],
+					 t->sw[i][jp2][k])
+			&& success;
+out:
+	return success;
+}
+
+static
+bool safe_z_ring(struct torus *t, int i, int j, int k)
+{
+	int km1, kp1, kp2;
+	bool success = true;
 
 	/*
-	 * Here we are checking for enough appropriate links having been
-	 * installed into the torus to prevent an incorrect link from being
-	 * considered as a perpendicular candidate.
+	 * If this z-direction radix-4 ring has at least two links
+	 * already installed into the torus,  then this ring does not
+	 * prevent us from looking for x or y direction perpendiculars.
 	 *
 	 * It is easier to check for the appropriate switches being installed
 	 * into the torus than it is to check for the links, so force the
@@ -2089,157 +2085,92 @@ bool safe_y_perpendicular(struct torus *t, int i, int j, int k)
 	 *
 	 * Recall that canonicalize(n - 2, 4) == canonicalize(n + 2, 4).
 	 */
-	if (((!!t->sw[im1][j][k] +
-	      !!t->sw[ip1][j][k] + !!t->sw[ip2][j][k] >= 2) &&
-	     (!!t->sw[i][j][km1] +
-	      !!t->sw[i][j][kp1] + !!t->sw[i][j][kp2] >= 2))) {
-
-		bool success = true;
-
-		if (t->sw[ip2][j][k] && t->sw[im1][j][k])
-			success = link_tswitches(t, 0,
-						 t->sw[ip2][j][k],
-						 t->sw[im1][j][k])
-				&& success;
-
-		if (t->sw[im1][j][k] && t->sw[i][j][k])
-			success = link_tswitches(t, 0,
-						 t->sw[im1][j][k],
-						 t->sw[i][j][k])
-				&& success;
-
-		if (t->sw[i][j][k] && t->sw[ip1][j][k])
-			success = link_tswitches(t, 0,
-						 t->sw[i][j][k],
-						 t->sw[ip1][j][k])
-				&& success;
-
-		if (t->sw[ip1][j][k] && t->sw[ip2][j][k])
-			success = link_tswitches(t, 0,
-						 t->sw[ip1][j][k],
-						 t->sw[ip2][j][k])
-				&& success;
-
-		if (t->sw[i][j][kp2] && t->sw[i][j][km1])
-			success = link_tswitches(t, 2,
-						 t->sw[i][j][kp2],
-						 t->sw[i][j][km1])
-				&& success;
-
-		if (t->sw[i][j][km1] && t->sw[i][j][k])
-			success = link_tswitches(t, 2,
-						 t->sw[i][j][km1],
-						 t->sw[i][j][k])
-				&& success;
-
-		if (t->sw[i][j][k] && t->sw[i][j][kp1])
-			success = link_tswitches(t, 2,
-						 t->sw[i][j][k],
-						 t->sw[i][j][kp1])
-				&& success;
-
-		if (t->sw[i][j][kp1] && t->sw[i][j][kp2])
-			success = link_tswitches(t, 2,
-						 t->sw[i][j][kp1],
-						 t->sw[i][j][kp2])
-				&& success;
-		return success;
+	if (t->z_sz != 4 || t->flags & Z_MESH)
+		goto out;
+
+	km1 = canonicalize(k - 1, t->z_sz);
+	kp1 = canonicalize(k + 1, t->z_sz);
+	kp2 = canonicalize(k + 2, t->z_sz);
+
+	if (!!t->sw[i][j][km1] +
+	    !!t->sw[i][j][kp1] + !!t->sw[i][j][kp2] < 2) {
+		success = false;
+		goto out;
 	}
-	return false;
+	if (t->sw[i][j][kp2] && t->sw[i][j][km1])
+		success = link_tswitches(t, 2,
+					 t->sw[i][j][kp2],
+					 t->sw[i][j][km1])
+			&& success;
+
+	if (t->sw[i][j][km1] && t->sw[i][j][k])
+		success = link_tswitches(t, 2,
+					 t->sw[i][j][km1],
+					 t->sw[i][j][k])
+			&& success;
+
+	if (t->sw[i][j][k] && t->sw[i][j][kp1])
+		success = link_tswitches(t, 2,
+					 t->sw[i][j][k],
+					 t->sw[i][j][kp1])
+			&& success;
+
+	if (t->sw[i][j][kp1] && t->sw[i][j][kp2])
+		success = link_tswitches(t, 2,
+					 t->sw[i][j][kp1],
+					 t->sw[i][j][kp2])
+			&& success;
+out:
+	return success;
 }
 
+/*
+ * These functions return true when it safe to call
+ * tfind_3d_perpendicular()/ffind_3d_perpendicular().
+ */
 static
-bool safe_z_perpendicular(struct torus *t, int i, int j, int k)
+bool safe_x_perpendicular(struct torus *t, int i, int j, int k)
 {
-	int im1, ip1, ip2, jm1, jp1, jp2;
-
 	/*
 	 * If the dimensions perpendicular to the search direction are
 	 * not radix 4 torus dimensions, it is always safe to search for
 	 * a perpendicular.
+	 *
+	 * Here we are checking for enough appropriate links having been
+	 * installed into the torus to prevent an incorrect link from being
+	 * considered as a perpendicular candidate.
 	 */
-	if ((t->x_sz != 4 && t->y_sz != 4) ||
-	    (t->flags & X_MESH && t->flags & Y_MESH) ||
-	    (t->x_sz != 4 && (t->flags & Y_MESH)) ||
-	    (t->y_sz != 4 && (t->flags & X_MESH)))
-		return true;
-
-	im1 = canonicalize(i - 1, t->x_sz);
-	ip1 = canonicalize(i + 1, t->x_sz);
-	ip2 = canonicalize(i + 2, t->x_sz);
-
-	jm1 = canonicalize(j - 1, t->y_sz);
-	jp1 = canonicalize(j + 1, t->y_sz);
-	jp2 = canonicalize(j + 2, t->y_sz);
+	return safe_y_ring(t, i, j, k) && safe_z_ring(t, i, j, k);
+}
 
+static
+bool safe_y_perpendicular(struct torus *t, int i, int j, int k)
+{
 	/*
+	 * If the dimensions perpendicular to the search direction are
+	 * not radix 4 torus dimensions, it is always safe to search for
+	 * a perpendicular.
+	 *
 	 * Here we are checking for enough appropriate links having been
 	 * installed into the torus to prevent an incorrect link from being
 	 * considered as a perpendicular candidate.
+	 */
+	return safe_x_ring(t, i, j, k) && safe_z_ring(t, i, j, k);
+}
+
+static
+bool safe_z_perpendicular(struct torus *t, int i, int j, int k)
+{
+	/*
+	 * If the dimensions perpendicular to the search direction are
+	 * not radix 4 torus dimensions, it is always safe to search for
+	 * a perpendicular.
 	 *
-	 * It is easier to check for the appropriate switches being installed
-	 * into the torus than it is to check for the links, so force the
-	 * link installation if the appropriate switches are installed.
-	 *
-	 * Recall that canonicalize(n - 2, 4) == canonicalize(n + 2, 4).
+	 * Implement this by checking for enough appropriate links having
+	 * been installed into the torus to prevent an incorrect link from
+	 * being considered as a perpendicular candidate.
 	 */
-	if (((!!t->sw[im1][j][k] +
-	      !!t->sw[ip1][j][k] + !!t->sw[ip2][j][k] >= 2) &&
-	     (!!t->sw[i][jm1][k] +
-	      !!t->sw[i][jp1][k] + !!t->sw[i][jp2][k] >= 2))) {
-
-		bool success = true;
-
-		if (t->sw[ip2][j][k] && t->sw[im1][j][k])
-			success = link_tswitches(t, 0,
-						 t->sw[ip2][j][k],
-						 t->sw[im1][j][k])
-				&& success;
-
-		if (t->sw[im1][j][k] && t->sw[i][j][k])
-			success = link_tswitches(t, 0,
-						 t->sw[im1][j][k],
-						 t->sw[i][j][k])
-				&& success;
-
-		if (t->sw[i][j][k] && t->sw[ip1][j][k])
-			success = link_tswitches(t, 0,
-						 t->sw[i][j][k],
-						 t->sw[ip1][j][k])
-				&& success;
-
-		if (t->sw[ip1][j][k] && t->sw[ip2][j][k])
-			success = link_tswitches(t, 0,
-						 t->sw[ip1][j][k],
-						 t->sw[ip2][j][k])
-				&& success;
-
-		if (t->sw[i][jp2][k] && t->sw[i][jm1][k])
-			success = link_tswitches(t, 1,
-						 t->sw[i][jp2][k],
-						 t->sw[i][jm1][k])
-				&& success;
-
-		if (t->sw[i][jm1][k] && t->sw[i][j][k])
-			success = link_tswitches(t, 1,
-						 t->sw[i][jm1][k],
-						 t->sw[i][j][k])
-				&& success;
-
-		if (t->sw[i][j][k] && t->sw[i][jp1][k])
-			success = link_tswitches(t, 1,
-						 t->sw[i][j][k],
-						 t->sw[i][jp1][k])
-				&& success;
-
-		if (t->sw[i][jp1][k] && t->sw[i][jp2][k])
-			success = link_tswitches(t, 1,
-						 t->sw[i][jp1][k],
-						 t->sw[i][jp2][k])
-				&& success;
-		return true;
-	}
-	return false;
+	return safe_x_ring(t, i, j, k) && safe_y_ring(t, i, j, k);
 }
 
 /*
-- 
1.5.6.GIT



* [PATCH 3/3] opensm: Avoid havoc in dump_ucast_routes() caused by torus-2QoS persistent use of osm_port_t:priv.
       [not found]     ` <1261169461-2516-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
                         ` (3 preceding siblings ...)
  2010-02-16 16:16       ` [PATCH 2/3] opensm: Fix handling of torus-2QoS topology discovery for radix 4 torus dimensions Jim Schutt
@ 2010-02-16 16:16       ` Jim Schutt
  4 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2010-02-16 16:16 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: sashak-smomgflXvOZWk0Htik3J/w, eitan-VPRAkNaXOzVS1MOuV/RT9w,
	kliteyn-VPRAkNaXOzVS1MOuV/RT9w, jaschut-4OHPYypu0djtX7QSmKvirg

Torus-2QoS makes persistent use of osm_port_t:priv to speed calculation
of path SL values.

However, osm_switch_recommend_path() uses a non-NULL osm_port_t:priv
as a flag that osm_port_t:priv holds a tracking array used when
LMC > 0.  It turns out that 1) dump_ucast_routes() does not need
osm_switch_recommend_path() to consider alternate routes, and 2)
before the addition of torus-2QoS, osm_port_t:priv use never
persisted past the unicast routing function, so it was always
NULL on entry to dump_ucast_routes().  With torus-2QoS it is no
longer NULL there, so its contents are misread as that tracking
array.

Fix this up by making the routing_for_lmc flag explicitly set by
the caller of osm_switch_recommend_path(), rather than inferring
it from osm_port_t:priv.  This retains existing behavior for
existing routing engines, and allows torus-2QoS to make persistent
use of osm_port_t:priv.

The alternative would be to add another member to osm_port_t,
say osm_port_t:priv2.
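
To illustrate the difference between the two approaches, here is a small
sketch using a pared-down, hypothetical struct port in place of osm_port_t.
The function names are made up for illustration and are not part of OpenSM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-in for osm_port_t. */
struct port {
	void *priv;	/* private data; its meaning depends on who set it */
};

/* Old behavior: infer LMC tracking from a non-NULL priv pointer. */
static bool lmc_tracking_inferred(const struct port *p)
{
	return p->priv != NULL;
}

/* New behavior: the caller states whether priv holds the tracking array. */
static bool lmc_tracking_explicit(const struct port *p, bool routing_for_lmc)
{
	/* A caller that sets the flag must have supplied the tracking data. */
	assert(!routing_for_lmc || p->priv != NULL);
	return routing_for_lmc;
}
```

With a port whose priv points at unrelated per-port data, the inferred
version reports LMC tracking while the explicit version, called with false
as dump_ucast_routes() now does, reports none.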

Signed-off-by: Jim Schutt <jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
---
 opensm/include/opensm/osm_switch.h |   12 ++++++++++++
 opensm/opensm/osm_dump.c           |    2 +-
 opensm/opensm/osm_switch.c         |    7 ++++---
 opensm/opensm/osm_ucast_mgr.c      |    1 +
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h
index 205896d..0b4fc78 100644
--- a/opensm/include/opensm/osm_switch.h
+++ b/opensm/include/opensm/osm_switch.h
@@ -876,6 +876,7 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
 				  IN osm_port_t * p_port, IN uint16_t lid_ho,
 				  IN unsigned start_from,
 				  IN boolean_t ignore_existing,
+				  IN boolean_t routing_for_lmc,
 				  IN boolean_t dor);
 /*
 * PARAMETERS
@@ -898,6 +899,17 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
 *		If false, the switch will choose an existing route if one
 *		exists, otherwise will choose the optimal route.
 *
+*	routing_for_lmc
+*		[in] We support an enhanced LMC aware routing mode:
+*		In the case of LMC > 0, we can track the remote side
+*		system and node for all of the lids of the target
+*		and try and avoid routing again through the same
+*		system / node.
+*
+*		Assume if routing_for_lmc is TRUE that this procedure
+*		was provided with the tracking array and counter via
+*		p_port->priv, and we can conduct this algorithm.
+*
 *	dor
 *		[in] If TRUE, Dimension Order Routing will be done.
 *
diff --git a/opensm/opensm/osm_dump.c b/opensm/opensm/osm_dump.c
index f3f4623..030de74 100644
--- a/opensm/opensm/osm_dump.c
+++ b/opensm/opensm/osm_dump.c
@@ -221,7 +221,7 @@ static void dump_ucast_routes(cl_map_item_t * item, FILE * file, void *cxt)
 			/* No LMC Optimization */
 			best_port = osm_switch_recommend_path(p_sw, p_port,
 							      lid_ho, 1, TRUE,
-							      dor);
+							      FALSE, dor);
 			fprintf(file, "No %u hop path possible via port %u!",
 				best_hops, best_port);
 		}
diff --git a/opensm/opensm/osm_switch.c b/opensm/opensm/osm_switch.c
index 1cd8bfc..14b0021 100644
--- a/opensm/opensm/osm_switch.c
+++ b/opensm/opensm/osm_switch.c
@@ -214,6 +214,7 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
 				  IN osm_port_t * p_port, IN uint16_t lid_ho,
 				  IN unsigned start_from,
 				  IN boolean_t ignore_existing,
+				  IN boolean_t routing_for_lmc,
 				  IN boolean_t dor)
 {
 	/*
@@ -223,10 +224,10 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
 	   and try and avoid routing again through the same
 	   system / node.
 
-	   If this procedure is provided with the tracking array
-	   and counter we can conduct this algorithm.
+	   Assume if routing_for_lmc is true that this procedure was
+	   provided the tracking array and counter via p_port->priv,
+	   and we can conduct this algorithm.
 	 */
-	boolean_t routing_for_lmc = (p_port->priv != NULL);
 	uint16_t base_lid;
 	uint8_t hops;
 	uint8_t least_hops;
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index 9a3ea25..fbc9244 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -251,6 +251,7 @@ static void ucast_mgr_process_port(IN osm_ucast_mgr_t * p_mgr,
 	 */
 	port = osm_switch_recommend_path(p_sw, p_port, lid_ho, start_from,
 					 p_mgr->p_subn->ignore_existing_lfts,
+					 p_mgr->p_subn->opt.lmc,
 					 p_mgr->is_dor);
 
 	if (port == OSM_NO_PATH) {
-- 
1.5.6.GIT



* Re: [PATCH 09/11] opensm: Make it possible to configure no fallback routing engine.
       [not found]     ` <1258744509-11148-9-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
@ 2010-03-04 14:35       ` Yevgeny Kliteynik
       [not found]         ` <4B8FC53C.9060605-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Yevgeny Kliteynik @ 2010-03-04 14:35 UTC (permalink / raw)
  To: Jim Schutt
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sashak-smomgflXvOZWk0Htik3J/w,
	eitan-VPRAkNaXOzVS1MOuV/RT9w

Hi Jim,

On 20/Nov/09 21:15, Jim Schutt wrote:
> For a fabric that requires routing with an engine with special properties,
> say avoiding credit loops via making use of SLs in routing, it might
> be preferable to not fall back to minhop if the configured routing engine
> fails.
>
> E.g. the torus-2QoS routing engine uses both SL2VL maps and path SL values
> to provide routing free of credit loops, but cannot route fabrics for
> some patterns of failed switches.  Should a switch fail that creates such
> a pattern, it may be preferable to keep the previous routing information
> loaded in the switches until a switch can be replaced that restores
> torus-2QoS's ability to route the fabric.
>
> The alternative, having some other engine route the fabric, will immediately
> introduce credit loops.

This is a great idea.
Regarding the implementation: I would prefer seeing this
as a purely OpenSM option and not as a new routing engine
keyword.
I think it would be cleaner to leave the list of routing
engines w/o special keys, and have a general option
that would prevent SM from falling back. Actually, the
fall-back itself is not bad, as it is defined by the list
of routing engines, and SM should try them one by one.
The problem is with using default routing that is not
specified in the routing engines list.
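
As a rough sketch of that rule, the decision of whether to set up minhop
as the default engine could be written as below.  It mirrors the condition
used in setup_routing_engines() in the patch that follows; struct
routing_opts and want_minhop_default() are hypothetical names used only
for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the subnet options relevant to this decision. */
struct routing_opts {
	const char *engine_names;	/* configured engine list, may be NULL */
	bool use_default_routing;	/* TRUE unless --no_default_routing */
};

/*
 * Returns true when minhop should be set up as the default engine.
 * have_default says whether a default engine was already set up.
 */
static bool want_minhop_default(const struct routing_opts *opt, bool have_default)
{
	assert(opt != NULL);

	/* No engines configured at all: minhop is the only choice. */
	if (!opt->engine_names || !*opt->engine_names)
		return true;

	/* Engines were configured: add minhop only if fallback is allowed. */
	return !have_default && opt->use_default_routing;
}
```

So a list such as "torus-2QoS" with use_default_routing set to FALSE
leaves no default engine, and a failure of every listed engine no longer
leads to minhop routing.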

Here's the patch that implements OSM option
"use_default_routing", and a command line parameter
"no_default_routing" to control this option.

I'll write the patch that adds this option to the
OSM trunk and send it to Sasha shortly.

Signed-off-by: Yevgeny Kliteynik <kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
---
  opensm/include/opensm/osm_subnet.h |    2 +-
  opensm/opensm/main.c               |    9 +++++++++
  opensm/opensm/osm_opensm.c         |   10 ++++------
  opensm/opensm/osm_subnet.c         |    8 ++++++++
  opensm/opensm/osm_ucast_mgr.c      |    7 +++++--
  5 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index a4133a0..905f64d 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -190,6 +190,7 @@ typedef struct osm_subn_opt {
  	boolean_t sweep_on_trap;
  	char *routing_engine_names;
  	boolean_t use_ucast_cache;
+	boolean_t use_default_routing;
  	boolean_t connect_roots;
  	char *lid_matrix_dump_file;
  	char *lfts_file;
@@ -215,7 +216,6 @@ typedef struct osm_subn_opt {
  	osm_qos_options_t qos_rtr_options;
  	boolean_t enable_quirks;
  	boolean_t no_clients_rereg;
-	boolean_t no_fallback_routing_engine;
  #ifdef ENABLE_OSM_PERF_MGR
  	boolean_t perfmgr;
  	boolean_t perfmgr_redir;
diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 096bf5f..47075a2 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -175,6 +175,10 @@ static void show_usage(void)
  	       "          separated by commas so that specific ordering of routing\n"
  	       "          algorithms will be tried if earlier routing engines fail.\n"
  	       "          Supported engines: updn, file, ftree, lash, dor, torus-2QoS\n\n");
+	printf("--no_default_routing\n"
+	       "          This option prevents OpenSM from falling back to default\n"
+	       "          routing if none of the provided engines was able to\n"
+	       "          configure the subnet.\n\n");
  	printf("--do_mesh_analysis\n"
  	       "          This option enables additional analysis for the lash\n"
  	       "          routing engine to precondition switch port assignments\n"
@@ -612,6 +616,7 @@ int main(int argc, char *argv[])
  		{"sm_sl", 1, NULL, 7},
  		{"retries", 1, NULL, 8},
  		{"torus_config", 1, NULL, 9},
+		{"no_default_routing", 0, NULL, 10},
  		{NULL, 0, NULL, 0}	/* Required at the end of the array */
  	};
  
@@ -993,6 +998,10 @@ int main(int argc, char *argv[])
  		case 9:
  			SET_STR_OPT(opt.torus_conf_file, optarg);
  			break;
+		case 10:
+			opt.use_default_routing = FALSE;
+			printf(" No fall back to default routing\n");
+			break;
  		case 'h':
  		case '?':
  		case ':':
diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
index e7ef55c..d153be5 100644
--- a/opensm/opensm/osm_opensm.c
+++ b/opensm/opensm/osm_opensm.c
@@ -159,11 +159,6 @@ static struct osm_routing_engine *setup_routing_engine(osm_opensm_t *osm,
  	struct osm_routing_engine *re;
  	const struct routing_engine_module *m;
  
-	if (!strcmp(name, "no_fallback")) {
-		osm->subn.opt.no_fallback_routing_engine = TRUE;
-		return NULL;
-	}
-
  	for (m = routing_modules; m->name && *m->name; m++) {
  		if (!strcmp(m->name, name)) {
  			re = malloc(sizeof(struct osm_routing_engine));
@@ -212,7 +207,10 @@ static void setup_routing_engines(osm_opensm_t *osm, const char *engine_names)
  		}
  		free(str);
  	}
-	if (!osm->default_routing_engine) {
+
+	if (!engine_names || !*engine_names ||
+	    (!osm->default_routing_engine &&
+	     osm->subn.opt.use_default_routing)) {
  		re = setup_routing_engine(osm, "minhop");
  		if (!osm->routing_engine_list && re)
  			append_routing_engine(osm, re);
diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 03d9538..274e807 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -327,6 +327,7 @@ static const opt_rec_t opt_tbl[] = {
  	{ "port_profile_switch_nodes", OPT_OFFSET(port_profile_switch_nodes), opts_parse_boolean, NULL, 1 },
  	{ "sweep_on_trap", OPT_OFFSET(sweep_on_trap), opts_parse_boolean, NULL, 1 },
  	{ "routing_engine", OPT_OFFSET(routing_engine_names), opts_parse_charp, NULL, 0 },
+	{ "use_default_routing", OPT_OFFSET(use_default_routing), opts_parse_boolean, NULL, 1 },
  	{ "connect_roots", OPT_OFFSET(connect_roots), opts_parse_boolean, NULL, 1 },
  	{ "use_ucast_cache", OPT_OFFSET(use_ucast_cache), opts_parse_boolean, NULL, 1 },
  	{ "log_file", OPT_OFFSET(log_file), opts_parse_charp, NULL, 0 },
@@ -743,6 +744,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
  	p_opt->port_profile_switch_nodes = FALSE;
  	p_opt->sweep_on_trap = TRUE;
  	p_opt->use_ucast_cache = FALSE;
+	p_opt->use_default_routing = TRUE;
  	p_opt->routing_engine_names = NULL;
  	p_opt->connect_roots = FALSE;
  	p_opt->lid_matrix_dump_file = NULL;
@@ -1392,6 +1394,12 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t * p_opts)
  		p_opts->routing_engine_names : null_str);
  
  	fprintf(out,
+		"# Fall back to default routing engine if the provided\n"
+		"# routing engine(s) failed to configure the subnet\n"
+		"use_default_routing %s\n\n",
+		p_opts->use_default_routing ? "TRUE" : "FALSE");
+
+	fprintf(out,
  		"# Connect roots (use FALSE if unsure)\n"
  		"connect_roots %s\n\n",
  		p_opts->connect_roots ? "TRUE" : "FALSE");
diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
index fbc9244..9264753 100644
--- a/opensm/opensm/osm_ucast_mgr.c
+++ b/opensm/opensm/osm_ucast_mgr.c
@@ -979,8 +979,11 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
  	}
  
  	if (!p_osm->routing_engine_used &&
-	    p_osm->subn.opt.no_fallback_routing_engine != TRUE) {
-		/* If configured routing algorithm failed, use default MinHop */
+	    p_osm->default_routing_engine) {
+		/*
+		 * If configured routing algorithms failed,
+		 * and default routing has been set, use it.
+		 */
  		struct osm_routing_engine *r = p_osm->default_routing_engine;
  
  		r->build_lid_matrices(r->context);
-- 
1.5.1.4


* Re: [PATCH 09/11] opensm: Make it possible to configure no fallback routing engine.
       [not found]         ` <4B8FC53C.9060605-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
@ 2010-03-04 21:38           ` Jim Schutt
  0 siblings, 0 replies; 40+ messages in thread
From: Jim Schutt @ 2010-03-04 21:38 UTC (permalink / raw)
  To: kliteyn-VPRAkNaXOzVS1MOuV/RT9w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sashak-smomgflXvOZWk0Htik3J/w,
	eitan-VPRAkNaXOzVS1MOuV/RT9w


On Thu, 2010-03-04 at 07:35 -0700, Yevgeny Kliteynik wrote:
> Hi Jim,
> 
> On 20/Nov/09 21:15, Jim Schutt wrote:
> > For a fabric that requires routing with an engine with special properties,
> > say avoiding credit loops via making use of SLs in routing, it might
> > be preferable to not fall back to minhop if the configured routing engine
> > fails.
> >
> > E.g. the torus-2QoS routing engine uses both SL2VL maps and path SL values
> > to provide routing free of credit loops, but cannot route fabrics for
> > some patterns of failed switches.  Should a switch fail that creates such
> > a pattern, it may be preferable to keep the previous routing information
> > loaded in the switches until a switch can be replaced that restores
> > torus-2QoS's ability to route the fabric.
> >
> > The alternative, having some other engine route the fabric, will immediately
> > introduce credit loops.
> 
> This is a great idea.
> Regarding the implementation: I would prefer seeing this
> as a purely OpenSM option and not as a new routing engine
> keyword.
> I think it would be cleaner to leave the list of routing
> engines w/o special keys, and have a general option
> that would prevent SM from falling back. 

That seems right to me, now.

> Actually, the
> fall-back itself is not bad, as it is defined by the list
> of routing engines, and SM should try them one by one.
> The problem is with using default routing that is not
> specified in the routing engines list.

I agree.  If a user explicitly configures which
routing engines to try, only those should be used,
and a notification logged if they all fail.

> 
> Here's the patch that implements OSM option
> "use_default_routing", and a command line parameter
> "no_default_routing" to control this option.

This looks good to me.

> 
> I'll write the patch that adds this option to the
> OSM trunk and send it to Sasha shortly.

OK, thanks.

-- Jim

> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
> ---
>   opensm/include/opensm/osm_subnet.h |    2 +-
>   opensm/opensm/main.c               |    9 +++++++++
>   opensm/opensm/osm_opensm.c         |   10 ++++------
>   opensm/opensm/osm_subnet.c         |    8 ++++++++
>   opensm/opensm/osm_ucast_mgr.c      |    7 +++++--
>   5 files changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
> index a4133a0..905f64d 100644
> --- a/opensm/include/opensm/osm_subnet.h
> +++ b/opensm/include/opensm/osm_subnet.h
> @@ -190,6 +190,7 @@ typedef struct osm_subn_opt {
>   	boolean_t sweep_on_trap;
>   	char *routing_engine_names;
>   	boolean_t use_ucast_cache;
> +	boolean_t use_default_routing;
>   	boolean_t connect_roots;
>   	char *lid_matrix_dump_file;
>   	char *lfts_file;
> @@ -215,7 +216,6 @@ typedef struct osm_subn_opt {
>   	osm_qos_options_t qos_rtr_options;
>   	boolean_t enable_quirks;
>   	boolean_t no_clients_rereg;
> -	boolean_t no_fallback_routing_engine;
>   #ifdef ENABLE_OSM_PERF_MGR
>   	boolean_t perfmgr;
>   	boolean_t perfmgr_redir;
> diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
> index 096bf5f..47075a2 100644
> --- a/opensm/opensm/main.c
> +++ b/opensm/opensm/main.c
> @@ -175,6 +175,10 @@ static void show_usage(void)
>   	       "          separated by commas so that specific ordering of routing\n"
>   	       "          algorithms will be tried if earlier routing engines fail.\n"
>   	       "          Supported engines: updn, file, ftree, lash, dor, torus-2QoS\n\n");
> +	printf("--no_default_routing\n"
> +	       "          This option prevents OpenSM from falling back to default\n"
> +	       "          routing if none of the provided engines was able to\n"
> +	       "          configure the subnet.\n\n");
>   	printf("--do_mesh_analysis\n"
>   	       "          This option enables additional analysis for the lash\n"
>   	       "          routing engine to precondition switch port assignments\n"
> @@ -612,6 +616,7 @@ int main(int argc, char *argv[])
>   		{"sm_sl", 1, NULL, 7},
>   		{"retries", 1, NULL, 8},
>   		{"torus_config", 1, NULL, 9},
> +		{"no_default_routing", 0, NULL, 10},
>   		{NULL, 0, NULL, 0}	/* Required at the end of the array */
>   	};
>   
> @@ -993,6 +998,10 @@ int main(int argc, char *argv[])
>   		case 9:
>   			SET_STR_OPT(opt.torus_conf_file, optarg);
>   			break;
> +		case 10:
> +			opt.use_default_routing = FALSE;
> +			printf(" No fall back to default routing\n");
> +			break;
>   		case 'h':
>   		case '?':
>   		case ':':
> diff --git a/opensm/opensm/osm_opensm.c b/opensm/opensm/osm_opensm.c
> index e7ef55c..d153be5 100644
> --- a/opensm/opensm/osm_opensm.c
> +++ b/opensm/opensm/osm_opensm.c
> @@ -159,11 +159,6 @@ static struct osm_routing_engine *setup_routing_engine(osm_opensm_t *osm,
>   	struct osm_routing_engine *re;
>   	const struct routing_engine_module *m;
>   
> -	if (!strcmp(name, "no_fallback")) {
> -		osm->subn.opt.no_fallback_routing_engine = TRUE;
> -		return NULL;
> -	}
> -
>   	for (m = routing_modules; m->name && *m->name; m++) {
>   		if (!strcmp(m->name, name)) {
>   			re = malloc(sizeof(struct osm_routing_engine));
> @@ -212,7 +207,10 @@ static void setup_routing_engines(osm_opensm_t *osm, const char *engine_names)
>   		}
>   		free(str);
>   	}
> -	if (!osm->default_routing_engine) {
> +
> +	if (!engine_names || !*engine_names ||
> +	    (!osm->default_routing_engine &&
> +	     osm->subn.opt.use_default_routing)) {
>   		re = setup_routing_engine(osm, "minhop");
>   		if (!osm->routing_engine_list && re)
>   			append_routing_engine(osm, re);
> diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
> index 03d9538..274e807 100644
> --- a/opensm/opensm/osm_subnet.c
> +++ b/opensm/opensm/osm_subnet.c
> @@ -327,6 +327,7 @@ static const opt_rec_t opt_tbl[] = {
>   	{ "port_profile_switch_nodes", OPT_OFFSET(port_profile_switch_nodes), opts_parse_boolean, NULL, 1 },
>   	{ "sweep_on_trap", OPT_OFFSET(sweep_on_trap), opts_parse_boolean, NULL, 1 },
>   	{ "routing_engine", OPT_OFFSET(routing_engine_names), opts_parse_charp, NULL, 0 },
> +	{ "use_default_routing", OPT_OFFSET(use_default_routing), opts_parse_boolean, NULL, 1 },
>   	{ "connect_roots", OPT_OFFSET(connect_roots), opts_parse_boolean, NULL, 1 },
>   	{ "use_ucast_cache", OPT_OFFSET(use_ucast_cache), opts_parse_boolean, NULL, 1 },
>   	{ "log_file", OPT_OFFSET(log_file), opts_parse_charp, NULL, 0 },
> @@ -743,6 +744,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
>   	p_opt->port_profile_switch_nodes = FALSE;
>   	p_opt->sweep_on_trap = TRUE;
>   	p_opt->use_ucast_cache = FALSE;
> +	p_opt->use_default_routing = TRUE;
>   	p_opt->routing_engine_names = NULL;
>   	p_opt->connect_roots = FALSE;
>   	p_opt->lid_matrix_dump_file = NULL;
> @@ -1392,6 +1394,12 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t * p_opts)
>   		p_opts->routing_engine_names : null_str);
>   
>   	fprintf(out,
> +		"# Fall back to default routing engine if the provided\n"
> +		"# routing engine(s) failed to configure the subnet\n"
> +		"use_default_routing %s\n\n",
> +		p_opts->use_default_routing ? "TRUE" : "FALSE");
> +
> +	fprintf(out,
>   		"# Connect roots (use FALSE if unsure)\n"
>   		"connect_roots %s\n\n",
>   		p_opts->connect_roots ? "TRUE" : "FALSE");
> diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
> index fbc9244..9264753 100644
> --- a/opensm/opensm/osm_ucast_mgr.c
> +++ b/opensm/opensm/osm_ucast_mgr.c
> @@ -979,8 +979,11 @@ int osm_ucast_mgr_process(IN osm_ucast_mgr_t * p_mgr)
>   	}
>   
>   	if (!p_osm->routing_engine_used &&
> -	    p_osm->subn.opt.no_fallback_routing_engine != TRUE) {
> -		/* If configured routing algorithm failed, use default MinHop */
> +	    p_osm->default_routing_engine) {
> +		/*
> +		 * If configured routing algorithms failed,
> +		 * and default routing has been set, use it.
> +		 */
>   		struct osm_routing_engine *r = p_osm->default_routing_engine;
>   
>   		r->build_lid_matrices(r->context);
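
For reference, the fallback behaviour that results from the two hunks above (setup_routing_engines() and osm_ucast_mgr_process()) can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the OpenSM code: the function names and the flattened boolean parameters are hypothetical stand-ins for osm->default_routing_engine, osm->subn.opt.use_default_routing and p_osm->routing_engine_used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the condition in setup_routing_engines():
 * should a "minhop" default engine be set up?
 *   engine_names         - the routing_engine option string (may be NULL)
 *   have_default_engine  - a default engine was already registered
 *   use_default_routing  - FALSE when --no_default_routing was given
 */
static bool should_setup_minhop(const char *engine_names,
				bool have_default_engine,
				bool use_default_routing)
{
	/* No engines requested at all: minhop is always set up. */
	if (!engine_names || !*engine_names)
		return true;

	/* Otherwise only as a fallback, and only if fallback is allowed. */
	return !have_default_engine && use_default_routing;
}

/*
 * Hypothetical model of the check in osm_ucast_mgr_process():
 * fall back only if no configured engine succeeded and a default
 * engine exists.
 */
static bool should_fall_back(bool routing_engine_used,
			     bool have_default_engine)
{
	return !routing_engine_used && have_default_engine;
}

/* Small self-check exercising the cases discussed in the patch. */
static void fallback_self_check(void)
{
	/* torus-2QoS requested, fallback allowed (the default). */
	assert(should_setup_minhop("torus-2QoS", false, true));
	/* torus-2QoS requested with --no_default_routing. */
	assert(!should_setup_minhop("torus-2QoS", false, false));
	/* Nothing requested: minhop is set up even without fallback. */
	assert(should_setup_minhop(NULL, false, false));
	assert(should_setup_minhop("", false, false));
	/* All engines failed and no default engine exists: no fallback. */
	assert(!should_fall_back(false, false));
	assert(should_fall_back(false, true));
}
```

Note that in this model an empty or missing engine list still gets minhop regardless of use_default_routing, which matches the `!engine_names || !*engine_names` part of the condition in the hunk.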




end of thread (newest message: 2010-03-04 21:38 UTC)

Thread overview: 40+ messages
2009-11-20 19:14 [PATCH 00/11] Add new torus routing engine: torus-2QoS Jim Schutt
     [not found] ` <1258744509-11148-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
2009-11-20 19:15   ` [PATCH 01/11] opensm: Prepare for routing engine input to path record SL lookup and SL2VL map setup Jim Schutt
2009-11-20 19:15   ` [PATCH 02/11] opensm: Allow the routing engine to influence SL2VL calculations Jim Schutt
     [not found]     ` <1258744509-11148-3-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
2010-01-14 12:36       ` Yevgeny Kliteynik
     [not found]         ` <4B4F0FBD.3040308-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
2010-01-14 16:01           ` Jim Schutt
2010-02-10 16:15       ` Yevgeny Kliteynik
     [not found]         ` <4B72DBBD.9020709-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
2010-02-15 21:45           ` Jim Schutt
2009-11-20 19:15   ` [PATCH 03/11] opensm: Allow the routing engine to participate in path SL calculations Jim Schutt
     [not found]     ` <1258744509-11148-4-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
2010-01-14 16:24       ` Yevgeny Kliteynik
     [not found]         ` <4B4F452B.7040007-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
2010-01-18 19:24           ` Jim Schutt
     [not found]             ` <1263842661.5550.43.camel-mgfCWIlwujvg4c9jKm7R2O1ftBKYq+Ku@public.gmane.org>
2010-01-18 20:19               ` Yevgeny Kliteynik
2009-11-20 19:15   ` [PATCH 04/11] opensm: Track the minimum value in the fabric of data VLs supported Jim Schutt
2009-11-20 19:15   ` [PATCH 06/11] opensm: Enable torus-2QoS routing engine Jim Schutt
2009-11-20 19:15   ` [PATCH 07/11] opensm: Add opensm option to specify file name for extra torus-2QoS configuration information Jim Schutt
2009-11-20 19:15   ` [PATCH 08/11] opensm: Do not require -Q option for torus-2QoS routing engine Jim Schutt
2009-11-20 19:15   ` [PATCH 09/11] opensm: Make it possible to configure no fallback " Jim Schutt
     [not found]     ` <1258744509-11148-9-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
2010-03-04 14:35       ` Yevgeny Kliteynik
     [not found]         ` <4B8FC53C.9060605-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
2010-03-04 21:38           ` Jim Schutt
2009-11-20 19:15   ` [PATCH 10/11] opensm: Avoid havoc in minhop caused by torus-2QoS persistent use of osm_port_t:priv Jim Schutt
2009-11-20 19:15   ` [PATCH 11/11] opensm: Update documentation to describe torus-2QoS Jim Schutt
2009-11-20 19:24   ` [PATCH 05/11] opensm: Add torus-2QoS routing engine Jim Schutt
2009-11-20 19:27   ` torus-2QoS example input files (was Re: [PATCH 00/11] Add new torus routing engine: torus-2QoS) Jim Schutt
2009-12-18 20:50   ` [PATCH 00/12] Add specialized multicast support to new torus routing engine: torus-2QoS Jim Schutt
     [not found]     ` <1261169461-2516-1-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
2009-12-18 20:54       ` [PATCH 05/12] opensm: Enforce torus-2QoS link ordering convention Jim Schutt
2010-02-16 16:16       ` [PATCH 0/3] opensm: Bug fixes for torus-2QoS patchset Jim Schutt
2010-02-16 16:16       ` [PATCH 1/3] opensm: Use local variables when searching for torus-2QoS master spanning tree root Jim Schutt
2010-02-16 16:16       ` [PATCH 2/3] opensm: Fix handling of torus-2QoS topology discovery for radix 4 torus dimensions Jim Schutt
2010-02-16 16:16       ` [PATCH 3/3] opensm: Avoid havoc in dump_ucast_routes() caused by torus-2QoS persistent use of osm_port_t:priv Jim Schutt
2009-12-18 20:50   ` [PATCH 01/12] opensm: Make error message for torus-2QoS dateline specification match code check Jim Schutt
2009-12-18 20:50   ` [PATCH 02/12] opensm: torus-2QoS should fail to route if message deadlock is possible Jim Schutt
2009-12-18 20:50   ` [PATCH 03/12] opensm: Remove unused port specification from torus-2QoS config file parsing Jim Schutt
     [not found]     ` <1261169461-2516-4-git-send-email-jaschut-4OHPYypu0djtX7QSmKvirg@public.gmane.org>
2009-12-18 20:56       ` Jim Schutt
2009-12-18 20:50   ` [PATCH 04/12] opensm: Fix up some torus-2QoS comments to match code Jim Schutt
2009-12-18 20:50   ` [PATCH 06/12] opensm: Remove redundant function names in torus-2QoS logging Jim Schutt
2009-12-18 20:50   ` [PATCH 07/12] opensm: Make torus-2QoS always use OSM_LOG_INFO, never LOG_INFO Jim Schutt
2009-12-18 20:50   ` [PATCH 08/12] opensm: Add struct osm_routing_engine callback to build spanning trees for multicast Jim Schutt
2009-12-18 20:50   ` [PATCH 09/12] opensm: Make mcast_mgr_purge_tree() available outside osm_mcast_mgr.c Jim Schutt
2009-12-18 20:50   ` [PATCH 10/12] opensm: Implement master spanning tree for torus-2QoS multicast support Jim Schutt
2009-12-18 20:51   ` [PATCH 11/12] opensm: Implement multicast support for torus-2QoS Jim Schutt
2009-12-18 20:51   ` [PATCH 12/12] opensm: Update documentation to describe torus-2QoS multicast support Jim Schutt
