All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/35] RFC: add "nvme monitor" subcommand
@ 2021-01-26 20:32 mwilck
  2021-01-26 20:32 ` [PATCH 01/35] nvme-monitor: add new stub mwilck
                   ` (36 more replies)
  0 siblings, 37 replies; 89+ messages in thread
From: mwilck @ 2021-01-26 20:32 UTC (permalink / raw)
  To: Keith Busch, linux-nvme; +Cc: Hannes Reinecke, Chaitanya Kulkarni, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

(Cover letter copied from https://github.com/linux-nvme/nvme-cli/pull/877)

This patch set adds a new subcommand **nvme monitor**. In this mode,
**nvme-cli** runs continuously, monitors events (currently, uevents) relevant
for discovery, and optionally autoconnects to newly discovered subsystems.

The monitor mode is suitable to be run in a systemd service. An appropriate
unit file is provided. As such, **nvme monitor** can be used as an alternative
to the current auto-connection mechanism based on udev rules and systemd
template units. If `--autoconnect` is active, **nvme monitor** masks the
respective udev rules in order to prevent simultaneous connection attempts
from udev and itself.

This method for discovery and autodetection has some advantages over the
current udev-rule based approach:

 * By using the `--persistent` option, users can easily control whether
   persistent discovery controllers for discovered transport addresses should
   be created and monitored for AEN events. **nvme monitor** watches known
   transport addresses, creates discovery controllers as required, and re-uses
   existing ones if possible. It keeps track of persistent discovery
   controllers it created, and tears them down on exit. When run in
   `--persistent --autoconnect` mode *in the initial ramfs*, it will keep
   discovery controllers alive, so that a new instance started after switching
   root can simply re-use them.
 * In certain situations, the systemd-based approach may miss events due to
   race conditions. This can happen e.g. if an FC remote port is detected, but
   shortly after it's detection an FC relogin procedure is necessary e.g. due to
   an RSCN. In this case, an `fc_udev_device` uevent will be received on the
   first detection and handled by an `nvme connect-all` command run from
   `nvmf-connect@.service`. The connection attempt to the rport in question will
   fail with "no such device" because of the simultaneous FC
   relogin. `nvmf-connect@.service` may not terminate immediately, because it
   attempts to establish other connections listed in the Discovery Log page it
   retrieved. When the FC relogin eventually finishes, a new uevent will be
   received, and `nvmf-connect@` will be started again, but *this has no effect*
   if the previous `nvmf-connect@` service hasn't finished yet. This is the
   general semantics of systemd services, no easy workaround exists.  **nvme
   monitor** doesn't suffer from this problem. If it sees an uevent for a
   transport address for which a discovery is already running, it will queue the
   handling of this event up and restart the discovery after it's finished.
 * Resource consumption for handling uevents is lower. Instead of running an
   udev worker, executing the rules, executing `systemctl start` from the
   worker, starting a systemd service, and starting a separate **nvme-cli**
   instance, only a single `fork()` operation is necessary. Of course, on the
   back side, the monitor itself consumes resources while it's running and
   waiting for events. On my system with 8 persistent discovery controllers,
   its RSS is ~3MB. CPU consumption is zero as long as no events occur.
 * **nvme monitor** could be easily extended to handle events for non-FC
   transports.

I've tested `fc_udev_device` handling for NVMeoFC with an Ontap target, and
AEN handling for RDMA using a Linux **nvmet** target.

### Implementation notes

I've tried to change the exisiting **nvme-cli** code as little as possible
while reusing the code from `fabrics.c`. The majority of changes in the
existing code exports formerly static functions and variables, so that they
are usable from the monitor code.

The main process just waits for events using `epoll()`. When an event is
received that necessitates a new discovery, a child is forked. This makes it
possible to fill in the configuration parameters for `do_discover()` without
interfering with the main process or other discovery tasks running in
parallel. The program tracks *transport addresses* (called "connections" in
the code) rather than NVMe controllers. In `--persistent` mode, it tries to
maintain exactly one persistent discovery connection per transport address.

Using `epoll()` may look over-engineered at this stage. I hope the better
flexibility over `poll()` (in particular, the ability to add new event sources
while waiting) will simplify future extensions and improvements.

### Todo

 * Referrals are not handled perfectly yet. They will be handled by
   `do_discover()` just as it would when called from **nvme connect-all**, but
   it would be better to pass referrals back to the main process to make it
   aware of the additional discovery controller rather than using
   recursion. The main process would e.g. know if a discovery is already
   running for the transport address in the referrral.
 *  When "add" uevents for nvme controller devices are received, the
    controller is consistently not in `live` state yet, and attempting to read
    the `subsysnqn` sysfs attribute returns `(efault)`. While this should
    arguably be fixed in the kernel, it could be worked around in user space
    by using timers or polling the `state` sysfs attribute for changes.
 * Parse and handle `discovery.conf` on startup.
 * Implement support for RDMA and TCP protocols.


Martin Wilck (35):
  nvme-monitor: add new stub
  monitor: create udev socket
  monitor: initialize signal handling
  monitor: add main loop for uevent monitoring
  monitor: add uevent filters
  monitor: Create a log() macro.
  fabrics: use log() macro
  monitor: add command line options to control logging
  nvme_get_ctrl_attr(): constify "path" argument
  fabrics: export do_discover(), build_options() and config
  monitor: add option -A / --autoconnect
  monitor: add helpers for __attribute__((cleanup))
  monitor: disable nvmf-autoconnect udev rules in autoconnect mode
  monitor: implement handling of fc_udev_device
  monitor: implement handling of nvme AEN events
  monitor: reset children's signal disposition
  monitor: handle SIGCHLD for terminated child processes
  monitor: add "--persistent/-p" flag
  fabrics: use "const char *" in struct config
  fabrics: export arg_str(), parse_conn_arg(), and remove_ctrl()
  nvme-cli: add "list.h"
  conn-db: add simple connection registry
  monitor: handle restart of pending discoveries
  monitor: monitor_discovery(): try to reuse existing controllers
  monitor: read existing connections on startup
  monitor: implement starting discovery controllers on startup
  monitor: implement cleanup of created discovery controllers
  monitor: basic handling of add/remove uevents for nvme controllers
  monitor: kill running discovery tasks on exit
  monitor: add connection property options from connect-all
  completions: add completions for nvme monitor
  nvmf-autoconnect: add unit file for nvme-monitor.service
  nvme-connect-all(1): fix documentation for --quiet/-S
  nvme-monitor(1): add man page for nvme-monitor
  monitor: add option --keep/-K

 Documentation/cmds-main.txt                   |    4 +
 Documentation/nvme-connect-all.1              |    8 +-
 Documentation/nvme-connect-all.html           |   10 +-
 Documentation/nvme-connect-all.txt            |    4 +-
 Documentation/nvme-monitor.1                  |  218 ++++
 Documentation/nvme-monitor.html               | 1067 +++++++++++++++++
 Documentation/nvme-monitor.txt                |  170 +++
 Makefile                                      |   10 +
 common.h                                      |   12 +
 completions/bash-nvme-completion.sh           |    6 +-
 conn-db.c                                     |  341 ++++++
 conn-db.h                                     |  141 +++
 fabrics.c                                     |  145 +--
 fabrics.h                                     |   39 +
 list.h                                        |  365 ++++++
 log.h                                         |   44 +
 monitor.c                                     |  764 ++++++++++++
 monitor.h                                     |    6 +
 nvme-builtin.h                                |    1 +
 nvme-topology.c                               |    2 +-
 nvme.c                                        |   13 +
 nvme.h                                        |    2 +-
 nvmf-autoconnect/systemd/nvme-monitor.service |   17 +
 23 files changed, 3296 insertions(+), 93 deletions(-)
 create mode 100644 Documentation/nvme-monitor.1
 create mode 100644 Documentation/nvme-monitor.html
 create mode 100644 Documentation/nvme-monitor.txt
 create mode 100644 conn-db.c
 create mode 100644 conn-db.h
 create mode 100644 list.h
 create mode 100644 log.h
 create mode 100644 monitor.c
 create mode 100644 monitor.h
 create mode 100644 nvmf-autoconnect/systemd/nvme-monitor.service

-- 
2.29.2


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 89+ messages in thread

end of thread, other threads:[~2021-02-22 21:05 UTC | newest]

Thread overview: 89+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-26 20:32 [PATCH 00/35] RFC: add "nvme monitor" subcommand mwilck
2021-01-26 20:32 ` [PATCH 01/35] nvme-monitor: add new stub mwilck
2021-01-26 20:32 ` [PATCH 02/35] monitor: create udev socket mwilck
2021-01-26 20:32 ` [PATCH 03/35] monitor: initialize signal handling mwilck
2021-01-26 20:32 ` [PATCH 04/35] monitor: add main loop for uevent monitoring mwilck
2021-01-26 20:32 ` [PATCH 05/35] monitor: add uevent filters mwilck
2021-02-04  6:58   ` Hannes Reinecke
2021-01-26 20:32 ` [PATCH 06/35] monitor: Create a log() macro mwilck
2021-02-04  7:01   ` Hannes Reinecke
2021-02-04  9:14     ` Martin Wilck
2021-01-26 20:32 ` [PATCH 07/35] fabrics: use " mwilck
2021-02-04  7:02   ` Hannes Reinecke
2021-01-26 20:32 ` [PATCH 08/35] monitor: add command line options to control logging mwilck
2021-02-04  7:04   ` Hannes Reinecke
2021-02-04  9:18     ` Martin Wilck
2021-01-26 20:32 ` [PATCH 09/35] nvme_get_ctrl_attr(): constify "path" argument mwilck
2021-02-04  7:05   ` Hannes Reinecke
2021-01-26 20:32 ` [PATCH 10/35] fabrics: export do_discover(), build_options() and config mwilck
2021-02-04  7:09   ` Hannes Reinecke
2021-02-04  9:21     ` Martin Wilck
2021-01-26 20:33 ` [PATCH 11/35] monitor: add option -A / --autoconnect mwilck
2021-01-29 18:59   ` Sagi Grimberg
2021-01-29 19:33     ` Martin Wilck
2021-01-29 20:09       ` Sagi Grimberg
2021-02-04  7:13   ` Hannes Reinecke
2021-01-26 20:33 ` [PATCH 12/35] monitor: add helpers for __attribute__((cleanup)) mwilck
2021-02-04  7:14   ` Hannes Reinecke
2021-01-26 20:33 ` [PATCH 13/35] monitor: disable nvmf-autoconnect udev rules in autoconnect mode mwilck
2021-01-29  1:52   ` Sagi Grimberg
2021-01-29 14:16     ` Martin Wilck
2021-01-29 18:54       ` Sagi Grimberg
2021-02-04  7:16   ` Hannes Reinecke
2021-02-04  9:37     ` Martin Wilck
2021-01-26 20:33 ` [PATCH 14/35] monitor: implement handling of fc_udev_device mwilck
2021-01-26 20:33 ` [PATCH 15/35] monitor: implement handling of nvme AEN events mwilck
2021-01-26 20:33 ` [PATCH 16/35] monitor: reset children's signal disposition mwilck
2021-01-29  1:54   ` Sagi Grimberg
2021-01-29 14:18     ` Martin Wilck
2021-01-26 20:33 ` [PATCH 17/35] monitor: handle SIGCHLD for terminated child processes mwilck
2021-01-29  1:54   ` Sagi Grimberg
2021-01-26 20:33 ` [PATCH 18/35] monitor: add "--persistent/-p" flag mwilck
2021-01-29 19:02   ` Sagi Grimberg
2021-01-29 19:45     ` Martin Wilck
2021-01-26 20:33 ` [PATCH 19/35] fabrics: use "const char *" in struct config mwilck
2021-02-04  7:20   ` Hannes Reinecke
2021-01-26 20:33 ` [PATCH 20/35] fabrics: export arg_str(), parse_conn_arg(), and remove_ctrl() mwilck
2021-01-26 20:33 ` [PATCH 21/35] nvme-cli: add "list.h" mwilck
2021-01-26 20:33 ` [PATCH 22/35] conn-db: add simple connection registry mwilck
2021-01-29  1:59   ` Sagi Grimberg
2021-01-29 14:18     ` Martin Wilck
2021-01-26 20:33 ` [PATCH 23/35] monitor: handle restart of pending discoveries mwilck
2021-01-26 20:33 ` [PATCH 24/35] monitor: monitor_discovery(): try to reuse existing controllers mwilck
2021-01-26 20:33 ` [PATCH 25/35] monitor: read existing connections on startup mwilck
2021-01-26 20:33 ` [PATCH 26/35] monitor: implement starting discovery controllers " mwilck
2021-01-29 21:06   ` Sagi Grimberg
2021-01-29 21:13     ` Martin Wilck
2021-01-29 21:18       ` Sagi Grimberg
2021-01-26 20:33 ` [PATCH 27/35] monitor: implement cleanup of created discovery controllers mwilck
2021-01-26 20:33 ` [PATCH 28/35] monitor: basic handling of add/remove uevents for nvme controllers mwilck
2021-01-26 20:33 ` [PATCH 29/35] monitor: kill running discovery tasks on exit mwilck
2021-01-26 20:33 ` [PATCH 30/35] monitor: add connection property options from connect-all mwilck
2021-01-26 20:33 ` [PATCH 31/35] completions: add completions for nvme monitor mwilck
2021-01-26 20:33 ` [PATCH 32/35] nvmf-autoconnect: add unit file for nvme-monitor.service mwilck
2021-01-29 19:08   ` Sagi Grimberg
2021-01-29 19:50     ` Martin Wilck
2021-01-26 20:33 ` [PATCH 33/35] nvme-connect-all(1): fix documentation for --quiet/-S mwilck
2021-01-29 19:09   ` Sagi Grimberg
2021-01-26 20:33 ` [PATCH 34/35] nvme-monitor(1): add man page for nvme-monitor mwilck
2021-01-26 20:33 ` [PATCH 35/35] monitor: add option --keep/-K mwilck
2021-01-29 19:10   ` Sagi Grimberg
2021-01-29 19:53     ` Martin Wilck
2021-01-29 20:16       ` Sagi Grimberg
2021-01-29 20:30         ` Martin Wilck
2021-01-29 20:45           ` Sagi Grimberg
2021-01-29 20:51             ` Martin Wilck
2021-01-29 20:57               ` Sagi Grimberg
2021-01-29 21:05                 ` Martin Wilck
2021-01-29 21:11                   ` Sagi Grimberg
2021-01-29 21:15                     ` Martin Wilck
2021-01-29 21:21                       ` Sagi Grimberg
2021-02-04  7:34                       ` Hannes Reinecke
2021-02-04  9:41                         ` Martin Wilck
2021-01-29  1:14 ` [PATCH 00/35] RFC: add "nvme monitor" subcommand Sagi Grimberg
2021-01-29 11:18   ` Martin Wilck
2021-01-29 20:08     ` Sagi Grimberg
2021-01-29 20:27       ` Martin Wilck
2021-02-04  7:52         ` Hannes Reinecke
2021-02-22 19:02 ` Enzo Matsumiya
2021-02-22 21:05   ` Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.