Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>,
	Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>,
	Stephen Bates <sbates@raithlin.com>, Jens Axboe <axboe@fb.com>,
	Keith Busch <kbusch@kernel.org>, Max Gurtovoy <maxg@mellanox.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Christoph Hellwig <hch@lst.de>
Subject: [PATCH v9 00/12] nvmet: add target passthru commands support
Date: Wed,  9 Oct 2019 13:25:17 -0600
Message-ID: <20191009192530.13079-1-logang@deltatee.com> (raw)

Hi,

This is v9 of the passthru patchset. This addresses some of Sagi's
feedback from v8 and rebases onto v5.4-rc2.

--

Chaitainya has asked us to take on these patches as we have an
interest in getting them into upstream. To that end, we've done
a large amount of testing, bug fixes and cleanup.

Passthru support for nvmet allows users to export an entire
NVMe controller through NVMe-oF. When exported in this way (as opposed
to exporting each namespace as a block device), all the NVMe commands
are passed to the given controller unmodified, including most admin
commands and Vendor Unique Commands (VUCs). A passthru target will
expose all namespaces for a given device to the remote host.

There are three major non-bugfix changes that we've done to the series:

1) Instead of using a seperate special passthru subsystem in
   configfs simply add a passthru directory that's analogous to
   the existing namespace directories. The directories have
   very similar attributes to namespaces but are mutually exclusive.
   If a user enables a namespaces, they can't then enable
   passthru controller and vice versa. This simplifies the code
   required to implement passthru configfs and IMO creates a much
   clearer and uniform interface.

2) Instead of taking a bare controller name (ie. "nvme1"), take a
   full device path to the controller's char device. This is more
   consistent with the regular namespaces which take a path and
   also allows users to make use of udev rules and symlinks to
   manage their controllers instead of the potentially unstable
   device names.

3) Implement block accounting for the passthru devices. This is so
   the target OS can still track device usage using /proc/diskstats.

Besides these three changes, we've also found a large number of bugs
and crashes and did a bunch of testing with KASAN, lockdep and kmemleak.
A more complete list of changes is given below.

Additionally, we've written some new blktests to test the passthru
code. A branch is available here[1] and can be submitted once these
patches are upstream.

These patches are based off of v5.4-rc2 and a git branch is available
at [2].

Thanks,

Logan

[1] https://github.com/Eideticom/blktests nvmet_passthru
[2] https://github.com/sbates130272/linux-p2pmem/ nvmet_passthru_v9

--

v9 Changes:
  1. Rebased onto v5.4-rc2 (required adjusting nvme_identify_ns() usage)
  2. Collected Sagi's Reviewed-By Tags
  3. Squashed seperate Kconfig patch into passthru patch (Per Sagi)
  4. Set REQ_FUA for flush requests and remove special casing
     on RQF_IO_STAT (Per Sagi)

v8 Changes:
  1. Rebased onto v5.3-rc6
  2. Collected Max's Reviewed-By tags
  3. Converted admin command black-list to a white-list, but
     allow all vendor specific commands. With this, we feel
     it's safe to allow multiple connections from hosts.
     (As per Sagi's feedback)

v7 Changes:
  1. Rebased onto v5.3-rc2
  2. Rework nvme_ctrl_get_by_path() to use filp_open() instead of
     the cdev changes that were in v6. (Per Al)
  3. Override the cmic bit to allow multipath and allow
     multiple connections from the same hostnqn. (At the same
     time I cleaned up the method of rejecting multiple connections.)
     See Patch 8)
  4. Found a bug when used with the tcp transport (See Patch 10)

v6 Changes:
  1. Rebased onto v5.3-rc1
  2. Rework configfs interface to simply be a passthru directory
     within the existing subsystem. The directory is similar to
     and consistent with a namespace directory.
  3. Have the configfs take a path instead of a bare controller name
  4. Renamed the main passthru file to io-cmd-passthru.c for consistency
     with the file and block-dev methods.
  5. Cleaned up all the CONFIG_NVME_TARGET_PASSTHRU usage to remove
     all the inline #ifdefs
  6. Restructured nvmet_passthru_make_request() a bit for clearer code
  7. Moved nvme_find_get_ns() call into nvmet_passthru_execute_cmd()
     seeing calling it in nvmet_req_init() causes a lockdep warning
     due to nvme_find_get_ns() being able to sleep.
  8. Added a check in nvmet_passthru_execute_cmd() to ensure we don't
     violate queue_max_segments or queue_max_hw_sectors and overrode
     mdts to ensure hosts don't intentionally submit commands
     that will exceed these limits.
  9. Reworked the code which ensures there's only one subsystem per
     passthru controller to use an xarray instead of a list as this is
     simpler and more easily fixed some bugs triggered by disabling
     subsystems that weren't enabled.
 10. Removed the overide of the target cntlid with the passthru cntlid;
     this seemed like a really bad idea especially in the presence of
     mixed systems as you could end up with two ctrlrs with the same
     cntlid. For now, commands that depend on cntlid are black listed.
 11. Implement block accounting for passthru so the target can track
     usage using /proc/diskstats
 12. A number of other minor bug fixes and cleanups

v5 Changes (not sent to list, from Chaitanya):
  1. Added workqueue for admin commands.
  2. Added kconfig option for the pass-thru feature.
  3. Restructure the parsing code according to your suggestion,
     call nvmet_xxx_parse_cmd() from nvmet_passthru_parse_cmd().
  4. Use pass-thru instead of pt.
  5. Several cleanups and add comments at the appropriate locations.
  6. Minimize the code for checking pass-thru ns across all the subsystems.
  7. Removed the delays in the ns related admin commands since I was
     not able to reproduce the previous bug.

v4 Changes:
  1. Add request polling interface to the block layer.
  2. Use request polling interface in the NVMEoF target passthru code
     path.
  3. Add checks suggested by Sagi for creating one target ctrl per
     passthru ctrl.
  4. Don't enable the namespace if it belongs to the configured passthru
     ctrl.
  5. Adjust the code latest kernel.

v3 Changes:
  1. Split the addition of passthru command handlers and integration
     into two different patches since we add guards to create one target
     controller per passthru controller. This way it will be easier to
     review the code.
  2. Adjust the code for 4.18.

v2 Changes:
  1. Update the new nvme core controller find API naming and
     changed the string comparison of the ctrl.
  2. Get rid of the newly added #defines for target ctrl values.
  3. Use the newly added structure members in the same patch where
     they are used. Aggregate the passthru command handling support
     and integration with nvmet-core into one patch.
  4. Introduce global NVMe Target subsystem list for connected and
     not connected subsystems on the target side.
  5. Add check when configuring the target ns and target
     passthru ctrl to allow only one target controller to be created
     for one passthru subsystem.
  6. Use the passthru ctrl cntlid when creating the
     target controller.

Chaitanya Kulkarni (4):
  nvme-core: export existing ctrl and ns interfaces
  nvmet: add return value to  nvmet_add_async_event()
  nvmet-passthru: add passthru code to process commands
  nvmet-core: don't check the data len for pt-ctrl

Logan Gunthorpe (8):
  nvme-core: introduce nvme_ctrl_get_by_path()
  nvmet: make nvmet_copy_ns_identifier() non-static
  nvmet-passthru: add enable/disable helpers
  nvmet-tcp: don't check data_len in nvmet_tcp_map_data()
  nvmet-configfs: introduce passthru configfs interface
  block: don't check blk_rq_is_passthrough() in blk_do_io_stat()
  block: call blk_account_io_start() in blk_execute_rq_nowait()
  nvmet-passthru: support block accounting

 block/blk-exec.c                      |   2 +
 block/blk-mq.c                        |   2 +-
 block/blk.h                           |   5 +-
 drivers/nvme/host/core.c              |  43 +-
 drivers/nvme/host/nvme.h              |   9 +
 drivers/nvme/target/Kconfig           |  10 +
 drivers/nvme/target/Makefile          |   1 +
 drivers/nvme/target/admin-cmd.c       |   4 +-
 drivers/nvme/target/configfs.c        |  99 ++++
 drivers/nvme/target/core.c            |  36 +-
 drivers/nvme/target/io-cmd-passthru.c | 647 ++++++++++++++++++++++++++
 drivers/nvme/target/nvmet.h           |  61 ++-
 drivers/nvme/target/tcp.c             |   2 +-
 include/linux/nvme.h                  |   1 +
 14 files changed, 903 insertions(+), 19 deletions(-)
 create mode 100644 drivers/nvme/target/io-cmd-passthru.c

--
2.20.1

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

             reply index

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-09 19:25 Logan Gunthorpe [this message]
2019-10-09 19:25 ` [PATCH v9 01/12] nvme-core: introduce nvme_ctrl_get_by_path() Logan Gunthorpe
2019-10-09 22:13   ` Keith Busch
2019-10-10 11:37   ` Christoph Hellwig
2019-10-09 19:25 ` [PATCH v9 02/12] nvme-core: export existing ctrl and ns interfaces Logan Gunthorpe
2019-10-09 22:14   ` Keith Busch
2019-10-09 19:25 ` [PATCH v9 03/12] nvmet: add return value to nvmet_add_async_event() Logan Gunthorpe
2019-10-09 22:17   ` Keith Busch
2019-10-09 19:25 ` [PATCH v9 04/12] nvmet: make nvmet_copy_ns_identifier() non-static Logan Gunthorpe
2019-10-09 22:18   ` Keith Busch
2019-10-10 11:50   ` Christoph Hellwig
2019-10-09 19:25 ` [PATCH v9 05/12] Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [logang@deltatee.com: fixed some of the wording in the help message] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Logan Gunthorpe
2019-10-10 12:34   ` Christoph Hellwig
2019-10-25 23:09     ` [PATCH v9 05/12] Logan Gunthorpe
2019-10-09 19:25 ` [PATCH v9 05/12] nvmet-passthru: add passthru code to process commands Logan Gunthorpe
2019-10-09 19:25 ` [PATCH v9 06/12] nvmet-passthru: add enable/disable helpers Logan Gunthorpe
2019-10-09 19:25 ` [PATCH v9 07/12] nvmet-core: don't check the data len for pt-ctrl Logan Gunthorpe
2019-10-10 11:04   ` Christoph Hellwig
2019-10-09 19:25 ` [PATCH v9 08/12] nvmet-tcp: don't check data_len in nvmet_tcp_map_data() Logan Gunthorpe
2019-10-10 11:07   ` Christoph Hellwig
2019-10-09 19:25 ` [PATCH v9 09/12] nvmet-configfs: introduce passthru configfs interface Logan Gunthorpe
2019-10-09 19:25 ` [PATCH v9 10/12] block: don't check blk_rq_is_passthrough() in blk_do_io_stat() Logan Gunthorpe
2019-10-10 10:05   ` Christoph Hellwig
2019-10-10 17:56     ` Logan Gunthorpe
2019-10-09 19:25 ` [PATCH v9 11/12] block: call blk_account_io_start() in blk_execute_rq_nowait() Logan Gunthorpe
2019-10-10 10:06   ` Christoph Hellwig
2019-10-09 19:25 ` [PATCH v9 12/12] nvmet-passthru: support block accounting Logan Gunthorpe

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191009192530.13079-1-logang@deltatee.com \
    --to=logang@deltatee.com \
    --cc=Chaitanya.Kulkarni@wdc.com \
    --cc=axboe@fb.com \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=maxg@mellanox.com \
    --cc=sagi@grimberg.me \
    --cc=sbates@raithlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org
	public-inbox-index linux-nvme

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git