From: Albert Chu <chu11-i2BcT+NCU+M@public.gmane.org>
To: Alex Netes <alexne-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Jared Carr <jared.carr-Y2zl/4KMd60@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [opensm] RFC: new routing options (repost)
Date: Tue, 05 Jul 2011 09:53:34 -0700
Message-ID: <1309884814.11479.29.camel@auk59.llnl.gov>
In-Reply-To: <20110704105259.GA6084-iQai9MGU/dyyaiaB+Ve85laTQe2KTcn/@public.gmane.org>

Hi Alex,

Thanks.  Are you still reviewing the remote_guid_sorting patch (the 2/4
patch)?  Or do you feel there is work there that needs to be done?

Al

On Mon, 2011-07-04 at 03:52 -0700, Alex Netes wrote:
> Hi Al, Jared,
> 
> Applied:
>   [PATCH 1/4] Support port shifting.
>   [PATCH 3/4] Support scatter ports.
>   [PATCH 4/4] Cleanup scatter ports patch. 
> 
> Thanks.
> 
> On 17:56 Wed 06 Apr     , Albert Chu wrote:
> > Hey Alex, Jared,
> > 
> > On Wed, 2011-04-06 at 11:14 -0700, Albert Chu wrote:
> > > Hey Alex,
> > > 
> > > On Wed, 2011-04-06 at 07:09 -0700, Alex Netes wrote:
> > > > Hi Al, Jared,
> > > > 
> > > > On 14:31 Wed 23 Mar     , Albert Chu wrote:
> > > > > > 
> > > > > > 1) Port Shifting
> > > > > > 
> > > > > > This is similar to what was done with some of the LMC > 0 code.
> > > > > > Congestion would occur due to "alignment" of routes w/ common traffic
> > > > > > patterns.  However, we found that it was also necessary for LMC=0 and
> > > > > > > only for used-ports.  For example, let's say there are 4 ports (called A,
> > > > > > B, C, D) and we are routing lids 1-9 through them.  Suppose only routing
> > > > > > through A, B, and C will reach lids 1-9.
> > > > > > 
> > > > > > The LFT would normally be:
> > > > > > 
> > > > > > A: 1 4 7
> > > > > > B: 2 5 8
> > > > > > C: 3 6 9
> > > > > > D:
> > > > > > 
> > > > > > The Port Shifting option would make this:
> > > > > > 
> > > > > > A: 1 6 8
> > > > > > B: 2 4 9
> > > > > > C: 3 5 7
> > > > > > D:
> > > > > > 
> > > > > > This option by itself improved the mpiGraph average send/recv bandwidth
> > > > > > > from 420 MB/s and 508 MB/s to 991 MB/s and 1172 MB/s.
> > > > > > 
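The two tables above can be reproduced with a small sketch. This is not the OpenSM implementation; it only assumes that destinations are handed out round-robin over the used ports and that each full pass over those ports starts one port later than the previous pass. The name build_lft and its shift_rows flag are made up for illustration.

```python
from collections import defaultdict

def build_lft(lids, used_ports, shift_rows=False):
    """Assign each LID to a port round-robin; optionally rotate the
    starting port by one for every completed pass over the used ports."""
    n = len(used_ports)
    table = defaultdict(list)
    for i, lid in enumerate(lids):
        row = i // n                      # number of completed passes
        offset = row if shift_rows else 0
        port = used_ports[(i + offset) % n]
        table[port].append(lid)
    return dict(table)

lids = list(range(1, 10))
used = ["A", "B", "C"]   # port D reaches none of the LIDs, so it is left out

print(build_lft(lids, used))                   # the "normal" table
print(build_lft(lids, used, shift_rows=True))  # the port-shifted table
```

With shift_rows=True the LIDs 1/4/7, 2/5/8 and 3/6/9 no longer share a port, which is the "alignment" the option is meant to break up.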
> > > > 
> > > > After thinking about this a little more and reviewing Jared Carr's Scatter
> > > > ports patch, I think we should combine these efforts into one framework, as
> > > > Al suggested.
> > 
> > > As I was beginning to integrate Jared's patch with mine, it turned out that
> > algorithmically/architecturally, it isn't as easy (or similar) as I had
> > originally thought.  In particular, it has issues with LMC > 0.
> > Normally you want to route through a port that is least forwarded
> > through or goes through systems it hasn't seen yet.  This sort of
> > conflicts with the idea of selecting a port randomly.
> > 
> > I'm going to throw out the following patch series as a starting point
> > for discussion on scatter ports.  My original two patches have been
> > updated with new log messages and some minor tweaks.
> > 
> > > My attempt at integrating Jared's scatter patch is included.  It has
> > a variety of cleanup (b/c of conflicts w/ my patches), 1 or 2 gotchas I
> > caught, and various tweaks for code consistency with my patches/other
> > OpenSM code.  Jared's original code algorithm is largely unchanged, but
> > I did modify it to deal with LMC > 0 better (by basically ignoring LMC).
> > 
> > Jared, LMK what you think and if it'll work for you.
> > 
> > Al
> > 
> > P.S.  Jared, I made you author on the 3rd patch naturally.
> > 
> > > > Moreover, isn't "port_shifting" too fabric-oriented? Will general
> > > > OpenSM users find this useful?
> > > > Moreover, how can a user identify that port_shifting may improve
> > > > performance for them?
> > > 
> > > I will admit, I'm unsure of how much non-HPC users would benefit from
> > > this option, be hurt by it, or if they would even care.  I can't speak
> > > for all users, but here at LLNL and at most of the lab HPC sites, people
> > > play with the options and experiment to find the best routing algorithm
> > > + settings that support their environment.  I would imagine the
> > > port_shifting option would just be another option for people to
> > > experiment with.
> > > 
> > > I think adding Jared's Scatter Ports would be easy to merge into my line
> > > of patches.  Let me see if I can integrate his patch into my line
> > > easily.
> > > 
> > > > > Would providing a shift factor (more than the suggested 1) help to make it
> > > > > suitable for a general case?
> > > 
> > > That seems like a good idea, we certainly could support an arbitrary
> > > shift, allowing users to experiment if there is a better one for their
> > > particular environment.
> > > 
> > > > > > 2) Remote Guid Sorting
> > > > > > 
> > > > > > Most core/spine switches we've seen thus far have had line boards
> > > > > > connected to spine boards in a consistent pattern.  However, we recently
> > > > > > got some Qlogic switches that connect from line/leaf boards to spine
> > > > > > boards in a (to the casual observer) random pattern.  I'm sure there was
> > > > > > a good electrical/board reason for this design, but it does hurt routing
> > > > > > b/c updn doesn't account for this.  Here's an output from iblinkinfo as
> > > > > > an example.
> > > > > > 
> > > > 
> > > > > Why can't this problem be addressed by the guid_routing_order_file option?
> > > 
> > > The problem we encountered in our fabric is predominantly a
> > > switch-to-switch routing issue with a spine switch.  The
> > > guid_routing_order_file wouldn't be able to solve this, since its input
> > > is just end ports.
> > > 
> > > > Or, to put it another way, this option directly affects the routing
> > > > decisions made.  The guid_routing_order_file does not; it only affects
> > > > the order in which routes are chosen (which can have consequences, but
> > > > the routing algorithm itself is unchanged).
> > > 
> > > Al
> > > 
> > > > 
> > > > --Alex
> > -- 
> > Albert Chu
> > chu11-i2BcT+NCU+M@public.gmane.org
> > Computer Scientist
> > High Performance Systems Division
> > Lawrence Livermore National Laboratory
> 
> 
-- 
Albert Chu
chu11-i2BcT+NCU+M@public.gmane.org
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

