linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Mansfield <patmans@us.ibm.com>
To: James Bottomley <James.Bottomley@SteelEye.com>,
	Lars Marowsky-Bree <lmb@suse.de>
Cc: linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: [RFC] Multi-path IO in 2.5/2.6 ?
Date: Mon, 9 Sep 2002 09:56:52 -0700	[thread overview]
Message-ID: <20020909095652.A21245@eng2.beaverton.ibm.com> (raw)
In-Reply-To: <200209091458.g89Evv806056@localhost.localdomain>; from James.Bottomley@SteelEye.com on Mon, Sep 09, 2002 at 09:57:56AM -0500

On Mon, Sep 09, 2002 at 09:57:56AM -0500, James Bottomley wrote:
> Lars Marowsky-Bree <lmb@suse.de> said:
> > So, what is the take on "multi-path IO" (in particular, storage) in
> > 2.5/2.6?
> 
> I've already made my views on this fairly clear (at least from the SCSI stack 
> standpoint):
> 
> - multi-path inside the low level drivers (like qla2x00) is wrong

Agreed.

> - multi-path inside the SCSI mid-layer is probably wrong

Disagree

> - from the generic block layer on up, I hold no specific preferences

Using md or volume manager is wrong for non-failover usage, and somewhat
bad for failover models; generic block layer is OK but it is wasted code
for any lower layers that do not or cannot have multi-path IO (such as
IDE).

> 
> James

I have a newer version of SCSI multi-path in the mid layer that I hope to
post this week, the last version patched against 2.5.14 is here:

http://www-124.ibm.com/storageio/multipath/scsi-multipath

Some documentation is located here:

http://www-124.ibm.com/storageio/multipath/scsi-multipath/docs/index.html

I have a current patch against 2.5.33 that includes NUMA support (it
requires discontigmem support that I believe is in the current linus
bk tree, plus NUMA topology patches).

A major problem with multi-path in md or other volume manager is that we
use multiple (block layer) queues for a single device, when we should be
using a single queue. If we want to use all paths to a device (i.e. round
robin across paths or such, not a failover model) this means the elevator
code becomes inefficient, mabye even counterproductive. For disk arrays,
this might not be bad, but for actual drives or even plugging single
ported drives into a switch or bus with multiple initiators, this could
lead to slower disk performance.

If the volume manager implements only a failover model (use only a single
path until that path fails), besides performance issues in balancing IO
loads, we waste space allocating an extra Scsi_Device for each path.

In the current code, each path is allocated a Scsi_Device, including a
request_queue_t, and a set of Scsi_Cmnd structures. Not only do we
end up with a Scsi_Device for each path, we also have an upper level
(sd, sg, st, or sr) driver attached to each Scsi_Device.

For sd, this means if you have n paths to each SCSI device, you are
limited to whatever limit sd has divided by n, right now 128 / n.  Having
four paths to a device is very reasonable, limiting us to 32 devices, but
with the overhead of 128 devices.

Using a volume manager to implement multiple paths (again non-failover
model) means that the queue_depth might be too large if the queue_depth
(i.e.  number of outstanding commands sent to the drive) is set as a
per-device value - we can end sending n * queue_depth commands to a device.

multi-path in the scsi layer enables multi-path use for all upper level scsi
drivers, not just disks.

We could implement multi-path IO in the block layer, but if the only user
is SCSI, this gains nothing compared to putting multi-path in the scsi
layers. Creating block level interfaces that will work for future devices
and/or future code is hard without already having the devices or code in
place. Any block level interface still requires support in the the
underlying layers.

I'm not against a block level interface, but I don't have ideas or code
for such an implementation.

Generic device naming consistency is a problem if multiple devices show up
with the same id.

With the scsi layer multi-path, ide-scsi or usb-scsi could also do
multi-path IO.

-- Patrick Mansfield

  reply	other threads:[~2002-09-09 16:52 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-09-09 14:57 [RFC] Multi-path IO in 2.5/2.6 ? James Bottomley
2002-09-09 16:56 ` Patrick Mansfield [this message]
2002-09-09 17:34   ` James Bottomley
2002-09-09 18:40     ` Mike Anderson
2002-09-10 13:02       ` Lars Marowsky-Bree
2002-09-10 16:03         ` Patrick Mansfield
2002-09-10 16:27         ` Mike Anderson
2002-09-10  0:08     ` Patrick Mansfield
2002-09-10  7:55       ` Jeremy Higdon
2002-09-10 13:04         ` Lars Marowsky-Bree
2002-09-10 16:20           ` Patrick Mansfield
2002-09-10 13:16       ` Lars Marowsky-Bree
2002-09-10 19:26         ` Patrick Mansfield
2002-09-11 14:20           ` James Bottomley
2002-09-11 19:17             ` Lars Marowsky-Bree
2002-09-11 19:37               ` James Bottomley
2002-09-11 19:52                 ` Lars Marowsky-Bree
2002-09-12  1:15                   ` Bernd Eckenfels
2002-09-11 21:38                 ` Oliver Xymoron
2002-09-11 20:30             ` Doug Ledford
2002-09-11 21:17               ` Mike Anderson
2002-09-10 17:21       ` Patrick Mochel
2002-09-10 18:42         ` Patrick Mansfield
2002-09-10 19:00           ` Patrick Mochel
2002-09-10 19:37             ` Patrick Mansfield
2002-09-11  0:21 ` Neil Brown
     [not found] <patmans@us.ibm.com>
2002-10-30 16:58 ` [PATCH] 2.5 current bk fix setting scsi queue depths Patrick Mansfield
2002-10-30 17:17   ` James Bottomley
2002-10-30 18:05     ` Patrick Mansfield
2002-10-31  0:44       ` James Bottomley
  -- strict thread matches above, loose matches on Subject: below --
2002-09-10 16:34 [RFC] Multi-path IO in 2.5/2.6 ? Cameron, Steve
2002-09-10 18:48 ` Alan Cox
2002-09-10 14:43 Cameron, Steve
2002-09-10 15:05 ` Alan Cox
2002-09-10 14:06 Cameron, Steve
2002-09-10 14:25 ` Alan Cox
2002-09-09 17:58 Ulrich Weigand
2002-09-09 10:49 Lars Marowsky-Bree
2002-09-09 12:23 ` Alan Cox
2002-09-10 10:30 ` Heinz J . Mauelshagen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20020909095652.A21245@eng2.beaverton.ibm.com \
    --to=patmans@us.ibm.com \
    --cc=James.Bottomley@SteelEye.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=lmb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).