From: Martin Wilck <mwilck@suse.com>
To: Sagi Grimberg <sagi@grimberg.me>, Keith Busch <kbusch@kernel.org>,
	 linux-nvme@lists.infradead.org
Cc: Hannes Reinecke <hare@suse.de>,
	Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Subject: Re: [PATCH 00/35] RFC: add "nvme monitor" subcommand
Date: Fri, 29 Jan 2021 12:18:05 +0100	[thread overview]
Message-ID: <d8956cf973867ad7cc6c4d968682ec3db3054aa3.camel@suse.com> (raw)
In-Reply-To: <60846bb5-c0df-ac23-260b-b53afd48f661@grimberg.me>

On Thu, 2021-01-28 at 17:14 -0800, Sagi Grimberg wrote:
> 
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > (Cover letter copied from 
> > https://github.com/linux-nvme/nvme-cli/pull/877)
> > 
> > This patch set adds a new subcommand **nvme monitor**. In this
> > mode,
> > **nvme-cli** runs continuously, monitors events (currently,
> > uevents) relevant
> > for discovery, and optionally autoconnects to newly discovered
> > subsystems.
> > 
> > The monitor mode is suitable to be run in a systemd service. An
> > appropriate
> > unit file is provided. As such, **nvme monitor** can be used as an
> > alternative
> > to the current auto-connection mechanism based on udev rules and
> > systemd
> > template units. If `--autoconnect` is active, **nvme monitor**
> > masks the
> > respective udev rules in order to prevent simultaneous connection
> > attempts
> > from udev and itself.
> 
> I think that a monitor daemon is a good path forward.
> 
> > 
> > This method for discovery and autodetection has some advantages
> > over the
> > current udev-rule based approach:
> > 
> >   * By using the `--persistent` option, users can easily control
> > whether
> >     persistent discovery controllers for discovered transport
> > addresses should
> >     be created and monitored for AEN events. **nvme monitor**
> > watches known
> >     transport addresses, creates discovery controllers as required,
> > and re-uses
> >     existing ones if possible.
> 
> What does that mean?

In general, if the monitor detects a new host_traddr/traddr/trsvcid
tuple, it runs a discovery on it, and keeps the discovery controller
open if --persistent was given. On startup, it scans existing
controllers, and if it finds already existing discovery controllers,
re-uses them. These will not be shut down when the monitor exits.

This allows users fine-grained control over which discovery controllers
operate persistently. Users who want all discovery controllers to be
persistent just use --persistent. Others can set up the ones they want
(manually or with a script), and not use --persistent.

The background is that hosts may not need every detected discovery
controller to be persistent. In multipath scenarios, you may see more
discovery subsystems than anything else, and not everyone likes that.
That's a generic issue and unrelated to the monitor, but running the
monitor with --persistent creates discovery controllers that would
otherwise not be visible.

Hope this clarifies it.
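
To make the re-use idea concrete, here's a toy C sketch of a registry
keyed by the (host_traddr, traddr, trsvcid) tuple. All names and the
fixed-size table are made up for illustration; the actual conn-db code
in this patch set is organized differently:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CONN 32

struct conn {
	char host_traddr[64];
	char traddr[64];
	char trsvcid[16];
	int  persistent;	/* keep the discovery controller open */
	int  in_use;
};

static struct conn conn_tab[MAX_CONN];

/* Return the existing entry for this tuple, or NULL. */
static struct conn *conn_find(const char *host_traddr,
			      const char *traddr, const char *trsvcid)
{
	for (int i = 0; i < MAX_CONN; i++) {
		struct conn *c = &conn_tab[i];

		if (c->in_use && !strcmp(c->host_traddr, host_traddr) &&
		    !strcmp(c->traddr, traddr) && !strcmp(c->trsvcid, trsvcid))
			return c;
	}
	return NULL;
}

/*
 * Look the tuple up; create an entry only if none exists.  An entry
 * found on lookup keeps its persistence setting, mirroring how
 * discovery controllers found on startup are left alone on exit.
 */
static struct conn *conn_get(const char *host_traddr,
			     const char *traddr, const char *trsvcid,
			     int persistent)
{
	struct conn *c = conn_find(host_traddr, traddr, trsvcid);

	if (c)
		return c;
	for (int i = 0; i < MAX_CONN; i++) {
		c = &conn_tab[i];
		if (!c->in_use) {
			snprintf(c->host_traddr, sizeof(c->host_traddr),
				 "%s", host_traddr);
			snprintf(c->traddr, sizeof(c->traddr), "%s", traddr);
			snprintf(c->trsvcid, sizeof(c->trsvcid), "%s", trsvcid);
			c->persistent = persistent;
			c->in_use = 1;
			return c;
		}
	}
	return NULL;	/* table full */
}
```

The point is only that lookup happens before creation, so a second
event for a known transport address never creates a second controller.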

> >   * In certain situations, the systemd-based approach may miss
> > events due to
> >     race conditions. This can happen e.g. if an FC remote port is
> > detected, but
> >     shortly after its detection an FC relogin procedure is
> > necessary e.g. due to
> >     an RSCN. In this case, an `fc_udev_device` uevent will be
> > received on the
> >     first detection and handled by an `nvme connect-all` command
> > run from
> >     `nvmf-connect@.service`. The connection attempt to the rport in
> > question will
> >     fail with "no such device" because of the simultaneous FC
> >     relogin. `nvmf-connect@.service` may not terminate immediately,
> > because it
> >     attempts to establish other connections listed in the Discovery
> > Log page it
> >     retrieved. When the FC relogin eventually finishes, a new
> > uevent will be
> >     received, and `nvmf-connect@` will be started again, but *this
> > has no effect*
> >     if the previous `nvmf-connect@` service hasn't finished yet.
> > This is the
> >     general semantics of systemd services, no easy workaround
> > exists.  **nvme
> >     monitor** doesn't suffer from this problem. If it sees an
> > uevent for a
> >     transport address for which a discovery is already running, it
> > will queue the
> >     handling of this event up and restart the discovery after it's
> > finished.
> 
> While I understand the issue, this reason alone is an overkill for
> doing 
> this.

I agree. But it's not easy to fix the issue otherwise. In the customer
case where we observed it, I worked around it by adding the udev
seqnum to the "instance name" of the systemd service, thus allowing
several "nvme connect-all" processes to run simultaneously for the
same transport address. But I don't think that would scale well;
the monitor can handle it more cleanly.
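
Roughly like this (a heavily simplified sketch, not the shipped udev
rule; the real rule escapes the full option string into the instance
name, and the match keys may differ):

```
# Sketch of the workaround: appending the uevent's kernel sequence
# number ($env{SEQNUM}) to the template instance name makes every
# event start its own nvmf-connect@ instance, so a still-running
# instance for the same transport address can no longer swallow a
# new event.
ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
  ENV{SYSTEMD_WANTS}+="nvmf-connect@...-$env{SEQNUM}.service"
```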

> >   * Resource consumption for handling uevents is lower. Instead of
> > running an
> >     udev worker, executing the rules, executing `systemctl start`
> > from the
> >     worker, starting a systemd service, and starting a separate
> > **nvme-cli**
> >     instance, only a single `fork()` operation is necessary. Of
> > course, on the
> >     back side, the monitor itself consumes resources while it's
> > running and
> >     waiting for events. On my system with 8 persistent discovery
> > controllers,
> >     its RSS is ~3MB. CPU consumption is zero as long as no events
> > occur.
> 
> What is the baseline with what we have today?

A meaningful comparison is difficult and should be done once the
monitor functionality is finalized. I made this statement only to
give a rough idea of the resource usage, nothing more.

> >   * **nvme monitor** could be easily extended to handle events for
> > non-FC
> >     transports.
> 
> Which events?

Network discovery, mDNS or the like. I haven't dug into the details
yet.

> > I've tested `fc_udev_device` handling for NVMeoFC with an Ontap
> > target, and
> > AEN handling for RDMA using a Linux **nvmet** target.
> > 
> > ### Implementation notes
> > 
> > I've tried to change the existing **nvme-cli** code as little as
> > possible
> > while reusing the code from `fabrics.c`. The majority of changes in
> > the
> > existing code exports formerly static functions and variables, so
> > that they
> > are usable from the monitor code.
> 
> General comment, can you please separate out fixes/cleanups that are
> not
> related to the goal of this patchset?

Which ones are you referring to? 09 and 19? While these are minor
improvements to the existing code, I wouldn't say they qualify as
fixes or cleanups; they wouldn't be needed without the monitor code.

But yes, I can post all changes to existing code separately.

> >   *  When "add" uevents for nvme controller devices are received,
> > the
> >      controller is consistently not in `live` state yet, and
> > attempting to read
> >      the `subsysnqn` sysfs attribute returns `(efault)`. While this
> > should
> >      arguably be fixed in the kernel, it could be worked around in
> > user space
> >      by using timers or polling the `state` sysfs attribute for
> > changes.
> 
> This is a bug, what in the code causes this? nothing in controller
> state
> should prevent from this sysfs read from executing correctly...

I think it can be fixed by making nvme_sysfs_show_subsysnqn() fall back
to ctrl->opts->subsysnqn if ctrl->subsys is NULL.

I'll send a patch. Anyway, it'll take time until this is fixed
everywhere.
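
As a user-space model of that fallback (structs reduced to what the
logic needs, hypothetical helper name; the real change would go into
the kernel's nvme core, not this code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-ins for the kernel types. */
struct nvmf_ctrl_options { const char *subsysnqn; };
struct nvme_subsystem    { const char *subnqn; };

struct nvme_ctrl {
	struct nvme_subsystem	 *subsys; /* NULL until ctrl is set up */
	struct nvmf_ctrl_options *opts;
};

/*
 * If the controller is not fully initialized yet (ctrl->subsys is
 * still NULL), fall back to the subsysnqn the connection was created
 * with instead of dereferencing the missing subsystem.
 */
static const char *ctrl_subsysnqn(const struct nvme_ctrl *ctrl)
{
	if (ctrl->subsys && ctrl->subsys->subnqn)
		return ctrl->subsys->subnqn;
	if (ctrl->opts && ctrl->opts->subsysnqn)
		return ctrl->opts->subsysnqn;
	return NULL;	/* nothing sane to report */
}
```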

> 
> >   * Parse and handle `discovery.conf` on startup.
> 
> This is a must I think, where do you get the known transport
> addresses
> on startup today?

There's a systemd service that runs "nvme connect-all" once during
boot. That exists today. I'm not sure whether it should be integrated
into the monitor; perhaps it's better to keep them separate. People
who don't need the monitor can still run just the existing service,
whereas for others, the two would work together just fine.
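
For reference, /etc/nvme/discovery.conf holds one set of "nvme
connect-all" options per line, and "nvme connect-all" run without
arguments reads it. A made-up example:

```
# /etc/nvme/discovery.conf
# One line per discovery controller, each holding "nvme connect-all"
# command-line options (addresses here are examples only).
--transport=tcp --traddr=192.168.10.20 --trsvcid=8009
--transport=rdma --traddr=10.0.0.5 --trsvcid=4420
```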

> 
> >   * Implement support for RDMA and TCP protocols.
> 
> What is needed for supporting them? Not sure I follow (I thought
> you mentioned that you tested against linux nvmet-rdma?)
> 

AENs over existing discovery controllers are supported for all
transports. But there's no support for detecting new transport
addresses except for NVMeoFC's "fc_udev_device" mechanism.

Regards
Martin




