From: Guan Junxiong <guanjunxiong@huawei.com>
To: Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>
Cc: <linux-block@vger.kernel.org>, Sagi Grimberg <sagi@grimberg.me>,
	<linux-nvme@lists.infradead.org>,
	Keith Busch <keith.busch@intel.com>,
	"Hannes Reinecke" <hare@suse.de>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	"Shenhong (C)" <shenhong09@huawei.com>,
	niuhaoxin <niuhaoxin@huawei.com>
Subject: Re: nvme multipath support V4
Date: Mon, 23 Oct 2017 10:08:52 +0800
Message-ID: <fbccc0db-77cd-fabc-0225-4b515729f214@huawei.com>
In-Reply-To: <20171018165258.23212-1-hch@lst.de>

Hi Christoph,


On 2017/10/19 0:52, Christoph Hellwig wrote:
> Hi all,
> 
> this series adds support for multipathing, that is accessing nvme
> namespaces through multiple controllers to the nvme core driver.
> 
> It is a very thin and efficient implementation that relies on
> close cooperation with other bits of the nvme driver, and a few small
> and simple block helpers.
> 
> Compared to dm-multipath the important differences are how management
> of the paths is done, and how the I/O path works.
> 
> Management of the paths is fully integrated into the nvme driver,
> for each newly found nvme controller we check if there are other
> controllers that refer to the same subsystem, and if so we link them
> up in the nvme driver.  Then for each namespace found we check if
> the namespace id and identifiers match to check if we have multiple
> controllers that refer to the same namespaces.  For now path
> availability is based entirely on the controller status, which at
> least for fabrics will be continuously updated based on the mandatory
> keep alive timer.  Once the Asymmetric Namespace Access (ANA)
> proposal passes in NVMe we will also get per-namespace states in
> addition to that, but for now any details of that remain confidential
> to NVMe members.
> 
> The I/O path is very different from the existing multipath drivers,
> which is enabled by the fact that NVMe (unlike SCSI) does not support
> partial completions - a controller will either complete a whole
> command or not, but never only complete parts of it.  Because of that
> there is no need to clone bios or requests - the I/O path simply
> redirects the I/O to a suitable path.  For successful commands
> multipath is not in the completion stack at all.  For failed commands
> we decide if the error could be a path failure, and if yes remove
> the bios from the request structure and requeue them before completing
> the request.  All together this means there is no performance
> degradation compared to normal nvme operation when using the multipath
> device node (at least not until I find a dual ported DRAM backed
> device :))
> 
> A git tree is available at:
> 
>    git://git.infradead.org/users/hch/block.git nvme-mpath
> 
> gitweb:
> 
>    http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/nvme-mpath
> 
> Changes since V3:
>   - new block layer support for hidden gendisks
>   - a couple new patches to refactor device handling before the
>     actual multipath support
>   - don't expose per-controller block device nodes
>   - use /dev/nvmeXnZ as the device nodes for the whole subsystem.

If the per-controller block device nodes are hidden, how can user-space tools
such as multipath-tools and nvme-cli (if it supports this) learn the status of
each path of the multipath device?
In some cases the admin wants to know which path is down, or which path is
degraded, for example one suffering intermittent I/O errors because of a shaky
link, so that they can fix the link or isolate it from the healthy paths.
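
To make the question concrete, here is a minimal sketch of the kind of per-path
query a user-space tool might want to run. It is only an illustration, not
something taken from this series: it assumes that every controller appears as a
directory under /sys/class/nvme/ with a `state` attribute, and that a healthy
controller reports "live". The helper names read_path_states and
paths_needing_attention are made up for this example.

```python
import os


def read_path_states(sysfs_root="/sys/class/nvme"):
    """Return {controller_name: state} for each controller directory.

    Assumption: each controller directory holds a 'state' attribute file.
    A controller whose state file cannot be read is reported as 'unknown'.
    """
    states = {}
    if not os.path.isdir(sysfs_root):
        return states
    for name in sorted(os.listdir(sysfs_root)):
        state_file = os.path.join(sysfs_root, name, "state")
        try:
            with open(state_file) as f:
                states[name] = f.read().strip()
        except OSError:
            states[name] = "unknown"
    return states


def paths_needing_attention(states, healthy="live"):
    """List the controllers whose state differs from the assumed healthy value."""
    return sorted(name for name, state in states.items() if state != healthy)


if __name__ == "__main__":
    states = read_path_states()
    for name, state in states.items():
        print(f"{name}: {state}")
    print("needs attention:", paths_needing_attention(states))
```

Controller state alone would not reveal a path that is up but degraded by
intermittent errors, which is the case I am most interested in.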

Regards
Guan


>   - expose subsystems in sysfs (Hannes Reinecke)
>   - fix a subsystem leak when duplicate NQNs are found
>   - fix up some names
>   - don't clear current_path if freeing a different namespace
> 
> Changes since V2:
>   - don't create duplicate subsystems on reset (Keith Busch)
>   - free requests properly when failing over in I/O completion (Keith Busch)
>   - new device names: /dev/nvm-sub%dn%d
>   - expose the namespace identification sysfs files for the mpath nodes
> 
> Changes since V1:
>   - introduce new nvme_ns_ids structure to clean up identifier handling
>   - generic_make_request_fast is now named direct_make_request and calls
>     generic_make_request_checks
>   - reset bi_disk on resubmission
>   - create sysfs links between the existing nvme namespace block devices and
>     the new shared mpath device
>   - temporarily added the timeout patches from James, this should go into
>     nvme-4.14, though
