linux-nvme.lists.infradead.org archive mirror
* [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand
@ 2021-03-06  0:36 mwilck
  2021-03-06  0:36 ` [PATCH v2 01/16] fabrics: export symbols required for monitor functionality mwilck
                   ` (15 more replies)
  0 siblings, 16 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

This patch set adds a new subcommand "nvme monitor". In this mode,
nvme-cli runs continuously, monitors various events relevant for discovery,
and autoconnects to newly discovered subsystems.

This series is based on, and requires, my previously submitted patch series
"Some minor fixes/additions for nvme-cli".

The monitor mode is suitable for running as a systemd service, and an
appropriate unit file is provided. As such, "nvme monitor" can be used as an
alternative to the current auto-connection mechanism based on udev rules and
systemd template units.

This method for discovery and autodetection has some advantages over the
current udev-rule based approach:

 * The monitor creates persistent discovery controllers if possible,
   and monitors them for AENs.

 * The monitor tracks "/etc/nvme/discovery.conf" changes using inotify.

 * The monitor keeps a record of existing NVMe transport connections and
   associated discovery controllers (if any). Thus it can avoid recreating
   discovery controllers if a persistent discovery controller is already
   present on a given transport address, without having to search sysfs
   for a matching controller.

 * The monitor is aware of ongoing discoveries (insofar as it has started
   them itself) and can queue up additional processes without risking
   missed events. Events can be missed with the current systemd-based
   activation of NVMe discovery.

 * I expect slightly less resource usage compared to the current udev-rule
   based discovery, as fewer fork()/exec() operations are required. The effect
   will probably be small though, and I have no numbers.

 * The monitor will be able to support network discovery too, and react
   to mDNS records being published in the network. This functionality
   will be implemented using libavahi; Enzo Matsumiya is working on it.
   Once finished, "nvme monitor" will be able to track discovery events
   for every NVMeoF transport.
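
To illustrate the inotify item in the list above, here is a minimal sketch of
watching a configuration file for changes. It is not the code from this
series: watch_dir() and conf_changed() are hypothetical helper names, and
watching the containing directory (rather than the file itself) is an
assumption made here so that editors which replace the file via rename() are
also noticed.

```c
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an inotify instance that watches "dir" for
 * files being written and closed, moved into place, or deleted.
 * Returns the inotify file descriptor, or -1 on error.
 */
int watch_dir(const char *dir)
{
	int fd = inotify_init1(IN_CLOEXEC);

	if (fd < 0)
		return -1;
	if (inotify_add_watch(fd, dir,
			      IN_CLOSE_WRITE | IN_MOVED_TO | IN_DELETE) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Hypothetical helper: block until at least one event is available, then
 * scan the batch that read() returned. Returns 1 if any event refers to
 * the file "name", 0 if only other files were affected, -1 on error.
 */
int conf_changed(int fd, const char *name)
{
	char buf[4096]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	ssize_t len = read(fd, buf, sizeof(buf));
	int hit = 0;

	if (len <= 0)
		return -1;

	for (char *p = buf; p < buf + len; ) {
		const struct inotify_event *ev =
			(const struct inotify_event *)p;

		/* ev->name is only valid when ev->len is non-zero */
		if (ev->len > 0 && !strcmp(ev->name, name))
			hit = 1;
		p += sizeof(*ev) + ev->len;
	}
	return hit;
}
```

A caller would do something like fd = watch_dir("/etc/nvme") once, add fd to
its poll set, and call conf_changed(fd, "discovery.conf") whenever the
descriptor becomes readable, re-reading the file if the result is 1.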

I've tested `fc_udev_device` handling for NVMeoFC with an Ontap target, and
AEN handling for RDMA using a Linux nvmet target.

# Changes wrt "RFC: add "nvme monitor" subcommand" patch series

A lot.

 * Separated out those changes that are not directly related to the monitor
   into a separate series, as requested by Sagi (see above). The part
   that changes some symbols in fabrics.c from static to global is still
   part of the "monitor" series though, as it doesn't make sense to do
   this without the monitor.

 * Reorganized the patches into fewer, larger chunks, as requested by Hannes.

 * Changed the behavior of the monitor:

   - Autoconnect by default; "-n/--no-connect" can be used to opt out.

   - Always create persistent discovery connections (Sagi): it makes no sense
     to use temporary discovery controllers if the monitor is running.

   - Don't try to create discovery controllers on every transport connection
     found. Sagi had pointed out that this behavior in the RFC was wrong.
     Instead, run discoveries from /etc/nvme/discovery.conf on startup.

   - Don't automatically disable 70-nvmf-autoconnect.rules (Hannes).
     I have put this in the systemd service file for now, because I think
     it makes no sense to run the monitor as a systemd service and run the
     discovery via udev rules at the same time. If this is also unwanted,
     I can remove it entirely of course.

 * Moved the event handling into a separate "library". This was motivated
   by the additional events monitored in the v2 series, and by the prospect
   of adding more (and network-related ones, where timeout handling will
   become important) when the mDNS support is merged. I've actually spent
   most work on this part, stabilizing the API, creating tests and fixing
   issues. I have published this separately on https://github.com/mwilck/minivent,
   together with the unit tests that I didn't want to add to the nvme-cli
   patch set at this time.
   
 * Added new features:

   - /etc/nvme/discovery.conf: Parse it on startup, and monitor changes with
     inotify.

   - parent/child messaging: allow children running discovery to communicate
     with the parent monitor process via a Unix socket. Without this, the
     discovery of newly created discovery controllers by the parent is
     fragile, because the monitor has no way to figure out whether a given
     controller was created by its own child or by another process. Also,
     it wasn't possible to pass existing discovery controller devices to
     children running discovery from the conf file, or for referrals. This
     had the effect that children would create a temporary discovery controller
     even though a persistent controller for the same connection existed
     already.

 * Use the "udev" udev monitor socket by default rather than "kernel".
   When I made the first submission, I was unaware that filtering on "kernel"
   netlink sockets is much less efficient than on "udev" sockets. Thus
   "kernel" is only used if udevd is not available.

 * Lots of bugs and minor issues fixed.
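
The parent/child messaging item above can be pictured with a small sketch.
This is not the protocol implemented in this series: it assumes a
socketpair(2) of type SOCK_SEQPACKET instead of whatever socket setup the
monitor actually uses, the message text "new-ctrl nvme<N>" is made up, and
run_child_and_wait() is a hypothetical name. It only shows how a forked child
can tell its parent which controller it created, so the parent does not have
to guess the origin of a new device.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that stands in for a discovery task and reports the
 * controller instance it "created" to the parent over a Unix socket.
 * On success the message is stored NUL-terminated in buf and its length
 * is returned; -1 is returned on error.
 */
ssize_t run_child_and_wait(int instance, char *buf, size_t buflen)
{
	int sv[2];
	pid_t pid;
	ssize_t n;

	if (buflen == 0 || socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: keep only its own end of the socket pair. */
		char msg[64];
		int len;

		close(sv[0]);
		len = snprintf(msg, sizeof(msg), "new-ctrl nvme%d", instance);
		if (send(sv[1], msg, (size_t)len, 0) != len)
			_exit(1);
		_exit(0);
	}

	/* Parent: read one message, then reap the child. */
	close(sv[1]);
	n = recv(sv[0], buf, buflen - 1, 0);
	close(sv[0]);
	waitpid(pid, NULL, 0);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

SOCK_SEQPACKET is used in this sketch because it preserves message boundaries
and lets the parent see end-of-file (a return value of 0) if the child exits
without sending anything.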

# Todo

 * Implement support for RDMA and TCP protocols. As noted above, Enzo
   Matsumiya has been working on this, and we are cooperating to merge
   our efforts.

Reviews and comments welcome.
Thanks,

PS: I've pushed both this series and the "minor fixes" series to
    https://github.com/linux-nvme/nvme-cli/pull/877. The CI fails
    because I don't know how to resolve the dependency on libudev
    in the Ubuntu / powerpc cross-compilation environment used there.
    Help would be appreciated.


Martin Wilck (16):
  fabrics: export symbols required for monitor functionality
  nvme-cli: add code for event and timeout handling
  monitor: add basic "nvme monitor" functionality
  monitor: implement uevent handling
  conn-db: add simple connection registry
  monitor: monitor_discovery(): try to reuse existing controllers
  monitor: kill running discovery tasks on exit
  monitor: add option --cleanup / -C
  monitor: handling of add/remove uevents for nvme controllers
  monitor: discover from conf file on startup
  monitor: watch discovery.conf with inotify
  monitor: add parent/child messaging and "notify" message exchange
  monitor: add "query device" message exchange
  completions: add completions for nvme monitor
  nvmf-autoconnect: add unit file for nvme-monitor.service
  nvme-monitor(1): add man page for nvme-monitor

 .github/workflows/c-cpp.yml                   |    4 +
 Documentation/cmds-main.txt                   |    4 +
 Documentation/nvme-monitor.1                  |  180 +++
 Documentation/nvme-monitor.html               | 1018 ++++++++++++
 Documentation/nvme-monitor.txt                |  144 ++
 Makefile                                      |   21 +-
 common.h                                      |   17 +
 completions/bash-nvme-completion.sh           |    6 +-
 conn-db.c                                     |  425 +++++
 conn-db.h                                     |  171 ++
 event/event.c                                 |  481 ++++++
 event/event.h                                 |  460 ++++++
 event/timeout.c                               |  373 +++++
 event/timeout.h                               |  110 ++
 event/ts-util.c                               |  107 ++
 event/ts-util.h                               |  129 ++
 fabrics.c                                     |  436 +++---
 fabrics.h                                     |   52 +
 list.h                                        |  349 +++++
 monitor.c                                     | 1370 +++++++++++++++++
 monitor.h                                     |   14 +
 nvme-builtin.h                                |    1 +
 nvme.c                                        |   13 +
 nvmf-autoconnect/systemd/nvme-monitor.service |   18 +
 util/cleanup.c                                |    2 +
 util/cleanup.h                                |    1 +
 26 files changed, 5676 insertions(+), 230 deletions(-)
 create mode 100644 Documentation/nvme-monitor.1
 create mode 100644 Documentation/nvme-monitor.html
 create mode 100644 Documentation/nvme-monitor.txt
 create mode 100644 conn-db.c
 create mode 100644 conn-db.h
 create mode 100644 event/event.c
 create mode 100644 event/event.h
 create mode 100644 event/timeout.c
 create mode 100644 event/timeout.h
 create mode 100644 event/ts-util.c
 create mode 100644 event/ts-util.h
 create mode 100644 list.h
 create mode 100644 monitor.c
 create mode 100644 monitor.h
 create mode 100644 nvmf-autoconnect/systemd/nvme-monitor.service

-- 
2.29.2



* [PATCH v2 01/16] fabrics: export symbols required for monitor functionality
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 02/16] nvme-cli: add code for event and timeout handling mwilck
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

These macros, functions and variables will be used by the "nvme monitor"
functionality. Convert them to globally visible symbols.

Being able to access struct config and the "cfg" variable
from fabrics.c is essential for the monitor to leverage the existing,
well tested code as much as possible. Rename "cfg" to "fabrics_cfg"
to make the global variable name less generic.
---
 fabrics.c | 384 +++++++++++++++++++++++++-----------------------------
 fabrics.h |  42 ++++++
 2 files changed, 222 insertions(+), 204 deletions(-)

diff --git a/fabrics.c b/fabrics.c
index 1fcc6cf..5012519 100644
--- a/fabrics.c
+++ b/fabrics.c
@@ -67,33 +67,8 @@ const char *conarg_traddr = "traddr";
 const char *conarg_trsvcid = "trsvcid";
 const char *conarg_host_traddr = "host_traddr";
 
-static struct config {
-	const char *nqn;
-	const char *transport;
-	const char *traddr;
-	const char *trsvcid;
-	const char *host_traddr;
-	const char *hostnqn;
-	const char *hostid;
-	int  nr_io_queues;
-	int  nr_write_queues;
-	int  nr_poll_queues;
-	int  queue_size;
-	int  keep_alive_tmo;
-	int  reconnect_delay;
-	int  ctrl_loss_tmo;
-	int  tos;
-	const char *raw;
-	char *device;
-	int  duplicate_connect;
-	int  disable_sqflow;
-	int  hdr_digest;
-	int  data_digest;
-	bool persistent;
-	bool matching_only;
-	const char *output_format;
-} cfg = {
-	.ctrl_loss_tmo = NVMF_DEF_CTRL_LOSS_TMO,
+struct fabrics_config fabrics_cfg = {
+	.ctrl_loss_tmo = -1,
 	.output_format = "normal",
 };
 
@@ -109,7 +84,6 @@ struct connect_args {
 
 struct connect_args *tracked_ctrls;
 
-#define BUF_SIZE		4096
 #define PATH_NVME_FABRICS	"/dev/nvme-fabrics"
 #define PATH_NVMF_DISC		"/etc/nvme/discovery.conf"
 #define PATH_NVMF_HOSTNQN	"/etc/nvme/hostnqn"
@@ -129,7 +103,7 @@ static const match_table_t opt_tokens = {
 	{ OPT_ERR,		NULL		},
 };
 
-static const char *arg_str(const char * const *strings,
+const char *arg_str(const char * const *strings,
 		size_t array_size, size_t idx)
 {
 	if (idx < array_size && strings[idx])
@@ -137,7 +111,7 @@ static const char *arg_str(const char * const *strings,
 	return "unrecognized";
 }
 
-static const char * const trtypes[] = {
+const char * const trtypes[] = {
 	[NVMF_TRTYPE_RDMA]	= "rdma",
 	[NVMF_TRTYPE_FC]	= "fc",
 	[NVMF_TRTYPE_TCP]	= "tcp",
@@ -228,14 +202,12 @@ static const char *cms_str(__u8 cm)
 	return arg_str(cms, ARRAY_SIZE(cms), cm);
 }
 
-static int do_discover(char *argstr, bool connect, enum nvme_print_flags flags);
-
 /*
  * parse strings with connect arguments to find a particular field.
  * If field found, return string containing field value. If field
  * not found, return an empty string.
  */
-static char *parse_conn_arg(const char *conargs, const char delim, const char *field)
+char *parse_conn_arg(const char *conargs, const char delim, const char *field)
 {
 	char *s, *e;
 	size_t cnt;
@@ -279,7 +251,7 @@ empty_field:
 	return strdup("\0");
 }
 
-static int ctrl_instance(const char *device)
+int ctrl_instance(const char *device)
 {
 	char d[64];
 	const char *p;
@@ -542,7 +514,7 @@ out:
 	return ret;
 }
 
-static int remove_ctrl(int instance)
+int remove_ctrl(int instance)
 {
 	char *sysfs_path;
 	int ret;
@@ -802,10 +774,10 @@ static void save_discovery_log(struct nvmf_disc_rsp_page_hdr *log, int numrec)
 	int fd;
 	int len, ret;
 
-	fd = open(cfg.raw, O_CREAT|O_RDWR|O_TRUNC, S_IRUSR|S_IWUSR);
+	fd = open(fabrics_cfg.raw, O_CREAT|O_RDWR|O_TRUNC, S_IRUSR|S_IWUSR);
 	if (fd < 0) {
 		msg(LOG_ERR, "failed to open %s: %s\n",
-			cfg.raw, strerror(errno));
+			fabrics_cfg.raw, strerror(errno));
 		return;
 	}
 
@@ -814,9 +786,9 @@ static void save_discovery_log(struct nvmf_disc_rsp_page_hdr *log, int numrec)
 	ret = write(fd, log, len);
 	if (ret < 0)
 		msg(LOG_ERR, "failed to write to %s: %s\n",
-			cfg.raw, strerror(errno));
+			fabrics_cfg.raw, strerror(errno));
 	else
-		printf("Discovery log is saved to %s\n", cfg.raw);
+		printf("Discovery log is saved to %s\n", fabrics_cfg.raw);
 
 	close(fd);
 }
@@ -878,9 +850,9 @@ char *hostnqn_read(void)
 
 static int nvmf_hostnqn_file(void)
 {
-	cfg.hostnqn = hostnqn_read();
+	fabrics_cfg.hostnqn = hostnqn_read();
 
-	return cfg.hostnqn != NULL;
+	return fabrics_cfg.hostnqn != NULL;
 }
 
 static int nvmf_hostid_file(void)
@@ -896,8 +868,8 @@ static int nvmf_hostid_file(void)
 	if (fgets(hostid, sizeof(hostid), f) == NULL)
 		goto out;
 
-	cfg.hostid = strdup(hostid);
-	if (!cfg.hostid)
+	fabrics_cfg.hostid = strdup(hostid);
+	if (!fabrics_cfg.hostid)
 		goto out;
 
 	ret = true;
@@ -955,68 +927,71 @@ add_argument(char **argstr, int *max_len, char *arg_str, const char *arg)
 	return 0;
 }
 
-static int build_options(char *argstr, int max_len, bool discover)
+int build_options(char *argstr, int max_len, bool discover)
 {
 	int len;
 
-	if (!cfg.transport) {
+	if (!fabrics_cfg.transport) {
 		msg(LOG_ERR, "need a transport (-t) argument\n");
 		return -EINVAL;
 	}
 
-	if (strncmp(cfg.transport, "loop", 4)) {
-		if (!cfg.traddr) {
+	if (strncmp(fabrics_cfg.transport, "loop", 4)) {
+		if (!fabrics_cfg.traddr) {
 			msg(LOG_ERR, "need a address (-a) argument\n");
 			return -EINVAL;
 		}
+		/* Use the default ctrl loss timeout if unset */
+		if (fabrics_cfg.ctrl_loss_tmo == -1)
+			fabrics_cfg.ctrl_loss_tmo = NVMF_DEF_CTRL_LOSS_TMO;
 	}
 
 	/* always specify nqn as first arg - this will init the string */
-	len = snprintf(argstr, max_len, "nqn=%s", cfg.nqn);
+	len = snprintf(argstr, max_len, "nqn=%s", fabrics_cfg.nqn);
 	if (len < 0)
 		return -EINVAL;
 	argstr += len;
 	max_len -= len;
 
-	if (add_argument(&argstr, &max_len, "transport", cfg.transport) ||
-	    add_argument(&argstr, &max_len, "traddr", cfg.traddr) ||
-	    add_argument(&argstr, &max_len, "host_traddr", cfg.host_traddr) ||
-	    add_argument(&argstr, &max_len, "trsvcid", cfg.trsvcid) ||
-	    ((cfg.hostnqn || nvmf_hostnqn_file()) &&
-		    add_argument(&argstr, &max_len, "hostnqn", cfg.hostnqn)) ||
-	    ((cfg.hostid || nvmf_hostid_file()) &&
-		    add_argument(&argstr, &max_len, "hostid", cfg.hostid)) ||
+	if (add_argument(&argstr, &max_len, "transport", fabrics_cfg.transport) ||
+	    add_argument(&argstr, &max_len, "traddr", fabrics_cfg.traddr) ||
+	    add_argument(&argstr, &max_len, "host_traddr", fabrics_cfg.host_traddr) ||
+	    add_argument(&argstr, &max_len, "trsvcid", fabrics_cfg.trsvcid) ||
+	    ((fabrics_cfg.hostnqn || nvmf_hostnqn_file()) &&
+		    add_argument(&argstr, &max_len, "hostnqn", fabrics_cfg.hostnqn)) ||
+	    ((fabrics_cfg.hostid || nvmf_hostid_file()) &&
+		    add_argument(&argstr, &max_len, "hostid", fabrics_cfg.hostid)) ||
 	    (!discover &&
 	      add_int_argument(&argstr, &max_len, "nr_io_queues",
-				cfg.nr_io_queues, false)) ||
+				fabrics_cfg.nr_io_queues, false)) ||
 	    add_int_argument(&argstr, &max_len, "nr_write_queues",
-				cfg.nr_write_queues, false) ||
+				fabrics_cfg.nr_write_queues, false) ||
 	    add_int_argument(&argstr, &max_len, "nr_poll_queues",
-				cfg.nr_poll_queues, false) ||
+				fabrics_cfg.nr_poll_queues, false) ||
 	    (!discover &&
 	      add_int_argument(&argstr, &max_len, "queue_size",
-				cfg.queue_size, false)) ||
+				fabrics_cfg.queue_size, false)) ||
 	    add_int_argument(&argstr, &max_len, "keep_alive_tmo",
-				cfg.keep_alive_tmo, false) ||
+			     fabrics_cfg.keep_alive_tmo, false) ||
 	    add_int_argument(&argstr, &max_len, "reconnect_delay",
-				cfg.reconnect_delay, false) ||
-	    (strncmp(cfg.transport, "loop", 4) &&
+				fabrics_cfg.reconnect_delay, false) ||
+	    (strncmp(fabrics_cfg.transport, "loop", 4) &&
 	     add_int_argument(&argstr, &max_len, "ctrl_loss_tmo",
-				cfg.ctrl_loss_tmo, true)) ||
+			      fabrics_cfg.ctrl_loss_tmo, true)) ||
 	    add_int_argument(&argstr, &max_len, "tos",
-				cfg.tos, true) ||
+				fabrics_cfg.tos, true) ||
 	    add_bool_argument(&argstr, &max_len, "duplicate_connect",
-				cfg.duplicate_connect) ||
+				fabrics_cfg.duplicate_connect) ||
 	    add_bool_argument(&argstr, &max_len, "disable_sqflow",
-				cfg.disable_sqflow) ||
-	    add_bool_argument(&argstr, &max_len, "hdr_digest", cfg.hdr_digest) ||
-	    add_bool_argument(&argstr, &max_len, "data_digest", cfg.data_digest))
+				fabrics_cfg.disable_sqflow) ||
+	    add_bool_argument(&argstr, &max_len, "hdr_digest", fabrics_cfg.hdr_digest) ||
+	    add_bool_argument(&argstr, &max_len, "data_digest", fabrics_cfg.data_digest))
 		return -EINVAL;
 
 	return 0;
 }
 
-static void set_discovery_kato(struct config *cfg)
+static void set_discovery_kato(struct fabrics_config *cfg)
 {
 	/* Set kato to NVMF_DEF_DISC_TMO for persistent controllers */
 	if (cfg->persistent && !cfg->keep_alive_tmo)
@@ -1026,41 +1001,41 @@ static void set_discovery_kato(struct config *cfg)
 		cfg->keep_alive_tmo = 0;
 }
 
-static void discovery_trsvcid(struct config *cfg)
+static void discovery_trsvcid(struct fabrics_config *fabrics_cfg)
 {
-	if (!strcmp(cfg->transport, "tcp")) {
+	if (!strcmp(fabrics_cfg->transport, "tcp")) {
 		/* Default port for NVMe/TCP discovery controllers */
-		cfg->trsvcid = __stringify(NVME_DISC_IP_PORT);
-	} else if (!strcmp(cfg->transport, "rdma")) {
+		fabrics_cfg->trsvcid = __stringify(NVME_DISC_IP_PORT);
+	} else if (!strcmp(fabrics_cfg->transport, "rdma")) {
 		/* Default port for NVMe/RDMA controllers */
-		cfg->trsvcid = __stringify(NVME_RDMA_IP_PORT);
+		fabrics_cfg->trsvcid = __stringify(NVME_RDMA_IP_PORT);
 	}
 }
 
-static bool traddr_is_hostname(struct config *cfg)
+static bool traddr_is_hostname(struct fabrics_config *fabrics_cfg)
 {
 	char addrstr[NVMF_TRADDR_SIZE];
 
-	if (!cfg->traddr || !cfg->transport)
+	if (!fabrics_cfg->traddr || !fabrics_cfg->transport)
 		return false;
-	if (strcmp(cfg->transport, "tcp") && strcmp(cfg->transport, "rdma"))
+	if (strcmp(fabrics_cfg->transport, "tcp") && strcmp(fabrics_cfg->transport, "rdma"))
 		return false;
-	if (inet_pton(AF_INET, cfg->traddr, addrstr) > 0 ||
-	    inet_pton(AF_INET6, cfg->traddr, addrstr) > 0)
+	if (inet_pton(AF_INET, fabrics_cfg->traddr, addrstr) > 0 ||
+	    inet_pton(AF_INET6, fabrics_cfg->traddr, addrstr) > 0)
 		return false;
 	return true;
 }
 
-static int hostname2traddr(struct config *cfg)
+static int hostname2traddr(struct fabrics_config *fabrics_cfg)
 {
 	struct addrinfo *host_info, hints = {.ai_family = AF_UNSPEC};
 	char addrstr[NVMF_TRADDR_SIZE];
 	const char *p;
 	int ret;
 
-	ret = getaddrinfo(cfg->traddr, NULL, &hints, &host_info);
+	ret = getaddrinfo(fabrics_cfg->traddr, NULL, &hints, &host_info);
 	if (ret) {
-		msg(LOG_ERR, "failed to resolve host %s info\n", cfg->traddr);
+		msg(LOG_ERR, "failed to resolve host %s info\n", fabrics_cfg->traddr);
 		return ret;
 	}
 
@@ -1077,17 +1052,17 @@ static int hostname2traddr(struct config *cfg)
 		break;
 	default:
 		msg(LOG_ERR, "unrecognized address family (%d) %s\n",
-			host_info->ai_family, cfg->traddr);
+			host_info->ai_family, fabrics_cfg->traddr);
 		ret = -EINVAL;
 		goto free_addrinfo;
 	}
 
 	if (!p) {
-		msg(LOG_ERR, "failed to get traddr for %s\n", cfg->traddr);
+		msg(LOG_ERR, "failed to get traddr for %s\n", fabrics_cfg->traddr);
 		ret = -errno;
 		goto free_addrinfo;
 	}
-	cfg->traddr = strdup(addrstr);
+	fabrics_cfg->traddr = strdup(addrstr);
 
 free_addrinfo:
 	freeaddrinfo(host_info);
@@ -1121,78 +1096,78 @@ retry:
 		return -EINVAL;
 	p += len;
 
-	if (cfg.hostnqn && strcmp(cfg.hostnqn, "none")) {
-		len = sprintf(p, ",hostnqn=%s", cfg.hostnqn);
+	if (fabrics_cfg.hostnqn && strcmp(fabrics_cfg.hostnqn, "none")) {
+		len = sprintf(p, ",hostnqn=%s", fabrics_cfg.hostnqn);
 		if (len < 0)
 			return -EINVAL;
 		p += len;
 	}
 
-	if (cfg.hostid && strcmp(cfg.hostid, "none")) {
-		len = sprintf(p, ",hostid=%s", cfg.hostid);
+	if (fabrics_cfg.hostid && strcmp(fabrics_cfg.hostid, "none")) {
+		len = sprintf(p, ",hostid=%s", fabrics_cfg.hostid);
 		if (len < 0)
 			return -EINVAL;
 		p += len;
 	}
 
-	if (cfg.queue_size && !discover) {
-		len = sprintf(p, ",queue_size=%d", cfg.queue_size);
+	if (fabrics_cfg.queue_size && !discover) {
+		len = sprintf(p, ",queue_size=%d", fabrics_cfg.queue_size);
 		if (len < 0)
 			return -EINVAL;
 		p += len;
 	}
 
-	if (cfg.nr_io_queues && !discover) {
-		len = sprintf(p, ",nr_io_queues=%d", cfg.nr_io_queues);
+	if (fabrics_cfg.nr_io_queues && !discover) {
+		len = sprintf(p, ",nr_io_queues=%d", fabrics_cfg.nr_io_queues);
 		if (len < 0)
 			return -EINVAL;
 		p += len;
 	}
 
-	if (cfg.nr_write_queues) {
-		len = sprintf(p, ",nr_write_queues=%d", cfg.nr_write_queues);
+	if (fabrics_cfg.nr_write_queues) {
+		len = sprintf(p, ",nr_write_queues=%d", fabrics_cfg.nr_write_queues);
 		if (len < 0)
 			return -EINVAL;
 		p += len;
 	}
 
-	if (cfg.nr_poll_queues) {
-		len = sprintf(p, ",nr_poll_queues=%d", cfg.nr_poll_queues);
+	if (fabrics_cfg.nr_poll_queues) {
+		len = sprintf(p, ",nr_poll_queues=%d", fabrics_cfg.nr_poll_queues);
 		if (len < 0)
 			return -EINVAL;
 		p += len;
 	}
 
-	if (cfg.host_traddr && strcmp(cfg.host_traddr, "none")) {
-		len = sprintf(p, ",host_traddr=%s", cfg.host_traddr);
+	if (fabrics_cfg.host_traddr && strcmp(fabrics_cfg.host_traddr, "none")) {
+		len = sprintf(p, ",host_traddr=%s", fabrics_cfg.host_traddr);
 		if (len < 0)
 			return -EINVAL;
 		p+= len;
 	}
 
-	if (cfg.reconnect_delay) {
-		len = sprintf(p, ",reconnect_delay=%d", cfg.reconnect_delay);
+	if (fabrics_cfg.reconnect_delay) {
+		len = sprintf(p, ",reconnect_delay=%d", fabrics_cfg.reconnect_delay);
 		if (len < 0)
 			return -EINVAL;
 		p += len;
 	}
 
-	if ((e->trtype != NVMF_TRTYPE_LOOP) && (cfg.ctrl_loss_tmo >= -1)) {
-		len = sprintf(p, ",ctrl_loss_tmo=%d", cfg.ctrl_loss_tmo);
+	if ((e->trtype != NVMF_TRTYPE_LOOP) && (fabrics_cfg.ctrl_loss_tmo >= -1)) {
+		len = sprintf(p, ",ctrl_loss_tmo=%d", fabrics_cfg.ctrl_loss_tmo);
 		if (len < 0)
 			return -EINVAL;
 		p += len;
 	}
 
-	if (cfg.tos != -1) {
-		len = sprintf(p, ",tos=%d", cfg.tos);
+	if (fabrics_cfg.tos != -1) {
+		len = sprintf(p, ",tos=%d", fabrics_cfg.tos);
 		if (len < 0)
 			return -EINVAL;
 		p += len;
 	}
 
-	if (cfg.keep_alive_tmo) {
-		len = sprintf(p, ",keep_alive_tmo=%d", cfg.keep_alive_tmo);
+	if (fabrics_cfg.keep_alive_tmo) {
+		len = sprintf(p, ",keep_alive_tmo=%d", fabrics_cfg.keep_alive_tmo);
 		if (len < 0)
 			return -EINVAL;
 		p += len;
@@ -1210,14 +1185,14 @@ retry:
 		return -EINVAL;
 	p += len;
 
-	if (cfg.hdr_digest) {
+	if (fabrics_cfg.hdr_digest) {
 		len = sprintf(p, ",hdr_digest");
 		if (len < 0)
 			return -EINVAL;
 		p += len;
 	}
 
-	if (cfg.data_digest) {
+	if (fabrics_cfg.data_digest) {
 		len = sprintf(p, ",data_digest");
 		if (len < 0)
 			return -EINVAL;
@@ -1277,7 +1252,7 @@ retry:
 	if (discover) {
 		enum nvme_print_flags flags;
 
-		flags = validate_output_format(cfg.output_format);
+		flags = validate_output_format(fabrics_cfg.output_format);
 		if (flags < 0)
 			flags = NORMAL;
 		ret = do_discover(argstr, true, flags);
@@ -1300,7 +1275,7 @@ static bool cargs_match_found(struct nvmf_disc_rsp_page_entry *entry)
 	cargs.transport = strdup(trtype_str(entry->trtype));
 	cargs.subsysnqn = strdup(entry->subnqn);
 	cargs.trsvcid = strdup(entry->trsvcid);
-	cargs.host_traddr = strdup(cfg.host_traddr ?: "\0");
+	cargs.host_traddr = strdup(fabrics_cfg.host_traddr ?: "\0");
 
 	/* check if we have a match in the discovery recursion */
 	while (c) {
@@ -1324,11 +1299,11 @@ static bool should_connect(struct nvmf_disc_rsp_page_entry *entry)
 	if (cargs_match_found(entry))
 		return false;
 
-	if (!cfg.matching_only || !cfg.traddr)
+	if (!fabrics_cfg.matching_only || !fabrics_cfg.traddr)
 		return true;
 
 	len = space_strip_len(NVMF_TRADDR_SIZE, entry->traddr);
-	return !strncmp(cfg.traddr, entry->traddr, len);
+	return !strncmp(fabrics_cfg.traddr, entry->traddr, len);
 }
 
 static int connect_ctrls(struct nvmf_disc_rsp_page_hdr *log, int numrec)
@@ -1375,13 +1350,13 @@ static void nvmf_get_host_identifiers(int ctrl_instance)
 
 	if (asprintf(&path, "%s/nvme%d", SYS_NVME, ctrl_instance) < 0)
 		return;
-	cfg.hostnqn = nvme_get_ctrl_attr(path, "hostnqn");
-	cfg.hostid = nvme_get_ctrl_attr(path, "hostid");
+	fabrics_cfg.hostnqn = nvme_get_ctrl_attr(path, "hostnqn");
+	fabrics_cfg.hostid = nvme_get_ctrl_attr(path, "hostid");
 }
 
 static DEFINE_CLEANUP_FUNC(cleanup_log, struct nvmf_disc_rsp_page_hdr *, free);
 
-static int do_discover(char *argstr, bool connect, enum nvme_print_flags flags)
+int do_discover(char *argstr, bool connect, enum nvme_print_flags flags)
 {
 	struct nvmf_disc_rsp_page_hdr *log __cleanup__(cleanup_log) = NULL;
 	char *dev_name;
@@ -1393,18 +1368,19 @@ static int do_discover(char *argstr, bool connect, enum nvme_print_flags flags)
 	if (!cargs)
 		return -ENOMEM;
 
-	if (cfg.device && !ctrl_matches_connectargs(cfg.device, cargs, true)) {
-		free(cfg.device);
-		cfg.device = NULL;
+	if (fabrics_cfg.device &&
+	    !ctrl_matches_connectargs(fabrics_cfg.device, cargs, true)) {
+		free(fabrics_cfg.device);
+		fabrics_cfg.device = NULL;
 	}
-	if (!cfg.device)
-		cfg.device = find_ctrl_with_connectargs(cargs);
+	if (!fabrics_cfg.device)
+		fabrics_cfg.device = find_ctrl_with_connectargs(cargs);
 	free_connect_args(cargs);
 
-	if (!cfg.device) {
+	if (!fabrics_cfg.device) {
 		instance = add_ctrl(argstr);
 	} else {
-		instance = ctrl_instance(cfg.device);
+		instance = ctrl_instance(fabrics_cfg.device);
 		nvmf_get_host_identifiers(instance);
 	}
 	if (instance < 0)
@@ -1414,9 +1390,9 @@ static int do_discover(char *argstr, bool connect, enum nvme_print_flags flags)
 		return -errno;
 	ret = nvmf_get_log_page_discovery(dev_name, &log, &numrec, &status);
 	free(dev_name);
-	if (cfg.persistent)
+	if (fabrics_cfg.persistent)
 		msg(LOG_NOTICE, "Persistent device: nvme%d\n", instance);
-	if (!cfg.device && !cfg.persistent) {
+	if (!fabrics_cfg.device && !fabrics_cfg.persistent) {
 		err = remove_ctrl(instance);
 		if (err)
 			return err;
@@ -1426,7 +1402,7 @@ static int do_discover(char *argstr, bool connect, enum nvme_print_flags flags)
 	case DISC_OK:
 		if (connect)
 			ret = connect_ctrls(log, numrec);
-		else if (cfg.raw || flags == BINARY)
+		else if (fabrics_cfg.raw || flags == BINARY)
 			save_discovery_log(log, numrec);
 		else if (flags == JSON)
 			json_discovery_log(log, numrec);
@@ -1508,22 +1484,22 @@ static int discover_from_conf_file(const char *desc, char *argstr,
 		if (err)
 			goto free_and_continue;
 
-		if (!cfg.transport || !cfg.traddr)
+		if (!fabrics_cfg.transport || !fabrics_cfg.traddr)
 			goto free_and_continue;
 
-		err = flags = validate_output_format(cfg.output_format);
+		err = flags = validate_output_format(fabrics_cfg.output_format);
 		if (err < 0)
 			goto free_and_continue;
-		set_discovery_kato(&cfg);
+		set_discovery_kato(&fabrics_cfg);
 
-		if (traddr_is_hostname(&cfg)) {
-			ret = hostname2traddr(&cfg);
+		if (traddr_is_hostname(&fabrics_cfg)) {
+			ret = hostname2traddr(&fabrics_cfg);
 			if (ret)
 				goto out;
 		}
 
-		if (!cfg.trsvcid)
-			discovery_trsvcid(&cfg);
+		if (!fabrics_cfg.trsvcid)
+			discovery_trsvcid(&fabrics_cfg);
 
 		err = build_options(argstr, BUF_SIZE, true);
 		if (err) {
@@ -1538,8 +1514,8 @@ static int discover_from_conf_file(const char *desc, char *argstr,
 free_and_continue:
 		free(all_args);
 		free(argv);
-		cfg.transport = cfg.traddr = cfg.trsvcid =
-			cfg.host_traddr = NULL;
+		fabrics_cfg.transport = fabrics_cfg.traddr =
+			fabrics_cfg.trsvcid = fabrics_cfg.host_traddr = NULL;
 	}
 
 out:
@@ -1555,42 +1531,42 @@ int fabrics_discover(const char *desc, int argc, char **argv, bool connect)
 	bool quiet = false;
 
 	OPT_ARGS(opts) = {
-		OPT_LIST("transport",      't', &cfg.transport,       "transport type"),
-		OPT_LIST("traddr",         'a', &cfg.traddr,          "transport address"),
-		OPT_LIST("trsvcid",        's', &cfg.trsvcid,         "transport service id (e.g. IP port)"),
-		OPT_LIST("host-traddr",    'w', &cfg.host_traddr,     "host traddr (e.g. FC WWN's)"),
-		OPT_LIST("hostnqn",        'q', &cfg.hostnqn,         "user-defined hostnqn (if default not used)"),
-		OPT_LIST("hostid",         'I', &cfg.hostid,          "user-defined hostid (if default not used)"),
-		OPT_LIST("raw",            'r', &cfg.raw,             "raw output file"),
-		OPT_LIST("device",         'd', &cfg.device,          "use existing discovery controller device"),
-		OPT_INT("keep-alive-tmo",  'k', &cfg.keep_alive_tmo,  "keep alive timeout period in seconds"),
-		OPT_INT("reconnect-delay", 'c', &cfg.reconnect_delay, "reconnect timeout period in seconds"),
-		OPT_INT("ctrl-loss-tmo",   'l', &cfg.ctrl_loss_tmo,   "controller loss timeout period in seconds"),
-		OPT_INT("tos",             'T', &cfg.tos,             "type of service"),
-		OPT_FLAG("hdr_digest",     'g', &cfg.hdr_digest,      "enable transport protocol header digest (TCP transport)"),
-		OPT_FLAG("data_digest",    'G', &cfg.data_digest,     "enable transport protocol data digest (TCP transport)"),
-		OPT_INT("nr-io-queues",    'i', &cfg.nr_io_queues,    "number of io queues to use (default is core count)"),
-		OPT_INT("nr-write-queues", 'W', &cfg.nr_write_queues, "number of write queues to use (default 0)"),
-		OPT_INT("nr-poll-queues",  'P', &cfg.nr_poll_queues,  "number of poll queues to use (default 0)"),
-		OPT_INT("queue-size",      'Q', &cfg.queue_size,      "number of io queue elements to use (default 128)"),
-		OPT_FLAG("persistent",     'p', &cfg.persistent,      "persistent discovery connection"),
+		OPT_LIST("transport",      't', &fabrics_cfg.transport,       "transport type"),
+		OPT_LIST("traddr",         'a', &fabrics_cfg.traddr,          "transport address"),
+		OPT_LIST("trsvcid",        's', &fabrics_cfg.trsvcid,         "transport service id (e.g. IP port)"),
+		OPT_LIST("host-traddr",    'w', &fabrics_cfg.host_traddr,     "host traddr (e.g. FC WWN's)"),
+		OPT_LIST("hostnqn",        'q', &fabrics_cfg.hostnqn,         "user-defined hostnqn (if default not used)"),
+		OPT_LIST("hostid",         'I', &fabrics_cfg.hostid,          "user-defined hostid (if default not used)"),
+		OPT_LIST("raw",            'r', &fabrics_cfg.raw,             "raw output file"),
+		OPT_LIST("device",         'd', &fabrics_cfg.device,          "existing discovery controller device"),
+		OPT_INT("keep-alive-tmo",  'k', &fabrics_cfg.keep_alive_tmo,  "keep alive timeout period in seconds"),
+		OPT_INT("reconnect-delay", 'c', &fabrics_cfg.reconnect_delay, "reconnect timeout period in seconds"),
+		OPT_INT("ctrl-loss-tmo",   'l', &fabrics_cfg.ctrl_loss_tmo,   "controller loss timeout period in seconds"),
+		OPT_INT("tos",             'T', &fabrics_cfg.tos,             "type of service"),
+		OPT_FLAG("hdr_digest",     'g', &fabrics_cfg.hdr_digest,      "enable transport protocol header digest (TCP transport)"),
+		OPT_FLAG("data_digest",    'G', &fabrics_cfg.data_digest,     "enable transport protocol data digest (TCP transport)"),
+		OPT_INT("nr-io-queues",    'i', &fabrics_cfg.nr_io_queues,    "number of io queues to use (default is core count)"),
+		OPT_INT("nr-write-queues", 'W', &fabrics_cfg.nr_write_queues, "number of write queues to use (default 0)"),
+		OPT_INT("nr-poll-queues",  'P', &fabrics_cfg.nr_poll_queues,  "number of poll queues to use (default 0)"),
+		OPT_INT("queue-size",      'Q', &fabrics_cfg.queue_size,      "number of io queue elements to use (default 128)"),
+		OPT_FLAG("persistent",     'p', &fabrics_cfg.persistent,      "persistent discovery connection"),
 		OPT_FLAG("quiet",          'S', &quiet,               "suppress already connected errors"),
-		OPT_FLAG("matching",       'm', &cfg.matching_only,   "connect only records matching the traddr"),
-		OPT_FMT("output-format",   'o', &cfg.output_format,   output_format),
+		OPT_FLAG("matching",       'm', &fabrics_cfg.matching_only,   "connect only records matching the traddr"),
+		OPT_FMT("output-format",   'o', &fabrics_cfg.output_format,   output_format),
 		OPT_END()
 	};
 
-	cfg.tos = -1;
+	fabrics_cfg.tos = -1;
 	ret = argconfig_parse(argc, argv, desc, opts);
 	if (ret)
 		goto out;
 
-	ret = flags = validate_output_format(cfg.output_format);
+	ret = flags = validate_output_format(fabrics_cfg.output_format);
 	if (ret < 0)
 		goto out;
-	if (cfg.device && strcmp(cfg.device, "none")) {
-		cfg.device = strdup(cfg.device);
-		if (!cfg.device) {
+	if (fabrics_cfg.device && strcmp(fabrics_cfg.device, "none")) {
+		fabrics_cfg.device = strdup(fabrics_cfg.device);
+		if (!fabrics_cfg.device) {
 			ret = -ENOMEM;
 			goto out;
 		}
@@ -1599,24 +1575,24 @@ int fabrics_discover(const char *desc, int argc, char **argv, bool connect)
 	if (quiet)
 		log_level = LOG_WARNING;
 
-	if (cfg.device && !strcmp(cfg.device, "none"))
-		cfg.device = NULL;
+	if (fabrics_cfg.device && !strcmp(fabrics_cfg.device, "none"))
+		fabrics_cfg.device = NULL;
 
-	cfg.nqn = NVME_DISC_SUBSYS_NAME;
+	fabrics_cfg.nqn = NVME_DISC_SUBSYS_NAME;
 
-	if (!cfg.transport && !cfg.traddr) {
+	if (!fabrics_cfg.transport && !fabrics_cfg.traddr) {
 		ret = discover_from_conf_file(desc, argstr, opts, connect);
 	} else {
-		set_discovery_kato(&cfg);
+		set_discovery_kato(&fabrics_cfg);
 
-		if (traddr_is_hostname(&cfg)) {
-			ret = hostname2traddr(&cfg);
+		if (traddr_is_hostname(&fabrics_cfg)) {
+			ret = hostname2traddr(&fabrics_cfg);
 			if (ret)
 				goto out;
 		}
 
-		if (!cfg.trsvcid)
-			discovery_trsvcid(&cfg);
+		if (!fabrics_cfg.trsvcid)
+			discovery_trsvcid(&fabrics_cfg);
 
 		ret = build_options(argstr, BUF_SIZE, true);
 		if (ret)
@@ -1635,35 +1611,35 @@ int fabrics_connect(const char *desc, int argc, char **argv)
 	int instance, ret;
 
 	OPT_ARGS(opts) = {
-		OPT_LIST("transport",         't', &cfg.transport,         "transport type"),
-		OPT_LIST("nqn",               'n', &cfg.nqn,               "nqn name"),
-		OPT_LIST("traddr",            'a', &cfg.traddr,            "transport address"),
-		OPT_LIST("trsvcid",           's', &cfg.trsvcid,           "transport service id (e.g. IP port)"),
-		OPT_LIST("host-traddr",       'w', &cfg.host_traddr,       "host traddr (e.g. FC WWN's)"),
-		OPT_LIST("hostnqn",           'q', &cfg.hostnqn,           "user-defined hostnqn"),
-		OPT_LIST("hostid",            'I', &cfg.hostid,            "user-defined hostid (if default not used)"),
-		OPT_INT("nr-io-queues",       'i', &cfg.nr_io_queues,      "number of io queues to use (default is core count)"),
-		OPT_INT("nr-write-queues",    'W', &cfg.nr_write_queues,   "number of write queues to use (default 0)"),
-		OPT_INT("nr-poll-queues",     'P', &cfg.nr_poll_queues,    "number of poll queues to use (default 0)"),
-		OPT_INT("queue-size",         'Q', &cfg.queue_size,        "number of io queue elements to use (default 128)"),
-		OPT_INT("keep-alive-tmo",     'k', &cfg.keep_alive_tmo,    "keep alive timeout period in seconds"),
-		OPT_INT("reconnect-delay",    'c', &cfg.reconnect_delay,   "reconnect timeout period in seconds"),
-		OPT_INT("ctrl-loss-tmo",      'l', &cfg.ctrl_loss_tmo,     "controller loss timeout period in seconds"),
-		OPT_INT("tos",                'T', &cfg.tos,               "type of service"),
-		OPT_FLAG("duplicate-connect", 'D', &cfg.duplicate_connect, "allow duplicate connections between same transport host and subsystem port"),
-		OPT_FLAG("disable-sqflow",    'd', &cfg.disable_sqflow,    "disable controller sq flow control (default false)"),
-		OPT_FLAG("hdr-digest",        'g', &cfg.hdr_digest,        "enable transport protocol header digest (TCP transport)"),
-		OPT_FLAG("data-digest",       'G', &cfg.data_digest,       "enable transport protocol data digest (TCP transport)"),
+		OPT_LIST("transport",         't', &fabrics_cfg.transport,         "transport type"),
+		OPT_LIST("nqn",               'n', &fabrics_cfg.nqn,               "nqn name"),
+		OPT_LIST("traddr",            'a', &fabrics_cfg.traddr,            "transport address"),
+		OPT_LIST("trsvcid",           's', &fabrics_cfg.trsvcid,           "transport service id (e.g. IP port)"),
+		OPT_LIST("host-traddr",       'w', &fabrics_cfg.host_traddr,       "host traddr (e.g. FC WWN's)"),
+		OPT_LIST("hostnqn",           'q', &fabrics_cfg.hostnqn,           "user-defined hostnqn"),
+		OPT_LIST("hostid",            'I', &fabrics_cfg.hostid,            "user-defined hostid (if default not used)"),
+		OPT_INT("nr-io-queues",       'i', &fabrics_cfg.nr_io_queues,      "number of io queues to use (default is core count)"),
+		OPT_INT("nr-write-queues",    'W', &fabrics_cfg.nr_write_queues,   "number of write queues to use (default 0)"),
+		OPT_INT("nr-poll-queues",     'P', &fabrics_cfg.nr_poll_queues,    "number of poll queues to use (default 0)"),
+		OPT_INT("queue-size",         'Q', &fabrics_cfg.queue_size,        "number of io queue elements to use (default 128)"),
+		OPT_INT("keep-alive-tmo",     'k', &fabrics_cfg.keep_alive_tmo,    "keep alive timeout period in seconds"),
+		OPT_INT("reconnect-delay",    'c', &fabrics_cfg.reconnect_delay,   "reconnect timeout period in seconds"),
+		OPT_INT("ctrl-loss-tmo",      'l', &fabrics_cfg.ctrl_loss_tmo,     "controller loss timeout period in seconds"),
+		OPT_INT("tos",                'T', &fabrics_cfg.tos,               "type of service"),
+		OPT_FLAG("duplicate-connect", 'D', &fabrics_cfg.duplicate_connect, "allow duplicate connections between same transport host and subsystem port"),
+		OPT_FLAG("disable-sqflow",    'd', &fabrics_cfg.disable_sqflow,    "disable controller sq flow control (default false)"),
+		OPT_FLAG("hdr-digest",        'g', &fabrics_cfg.hdr_digest,        "enable transport protocol header digest (TCP transport)"),
+		OPT_FLAG("data-digest",       'G', &fabrics_cfg.data_digest,       "enable transport protocol data digest (TCP transport)"),
 		OPT_END()
 	};
 
-	cfg.tos = -1;
+	fabrics_cfg.tos = -1;
 	ret = argconfig_parse(argc, argv, desc, opts);
 	if (ret)
 		goto out;
 
-	if (traddr_is_hostname(&cfg)) {
-		ret = hostname2traddr(&cfg);
+	if (traddr_is_hostname(&fabrics_cfg)) {
+		ret = hostname2traddr(&fabrics_cfg);
 		if (ret)
 			goto out;
 	}
@@ -1672,7 +1648,7 @@ int fabrics_connect(const char *desc, int argc, char **argv)
 	if (ret)
 		goto out;
 
-	if (!cfg.nqn) {
+	if (!fabrics_cfg.nqn) {
 		msg(LOG_ERR, "need a -n argument\n");
 		ret = -EINVAL;
 		goto out;
@@ -1775,8 +1751,8 @@ int fabrics_disconnect(const char *desc, int argc, char **argv)
 	int ret;
 
 	OPT_ARGS(opts) = {
-		OPT_LIST("nqn",    'n', &cfg.nqn,    nqn),
-		OPT_LIST("device", 'd', &cfg.device, device),
+		OPT_LIST("nqn",    'n', &fabrics_cfg.nqn,    nqn),
+		OPT_LIST("device", 'd', &fabrics_cfg.device, device),
 		OPT_END()
 	};
 
@@ -1784,29 +1760,29 @@ int fabrics_disconnect(const char *desc, int argc, char **argv)
 	if (ret)
 		goto out;
 
-	if (!cfg.nqn && !cfg.device) {
+	if (!fabrics_cfg.nqn && !fabrics_cfg.device) {
 		msg(LOG_ERR, "need a -n or -d argument\n");
 		ret = -EINVAL;
 		goto out;
 	}
 
-	if (cfg.nqn) {
-		ret = disconnect_by_nqn(cfg.nqn);
+	if (fabrics_cfg.nqn) {
+		ret = disconnect_by_nqn(fabrics_cfg.nqn);
 		if (ret < 0)
 			msg(LOG_ERR, "Failed to disconnect by NQN: %s\n",
-				cfg.nqn);
+				fabrics_cfg.nqn);
 		else {
-			printf("NQN:%s disconnected %d controller(s)\n", cfg.nqn, ret);
+			printf("NQN:%s disconnected %d controller(s)\n", fabrics_cfg.nqn, ret);
 			ret = 0;
 		}
 	}
 
-	if (cfg.device) {
-		ret = disconnect_by_device(cfg.device);
+	if (fabrics_cfg.device) {
+		ret = disconnect_by_device(fabrics_cfg.device);
 		if (ret)
 			msg(LOG_ERR,
 				"Failed to disconnect by device name: %s\n",
-				cfg.device);
+				fabrics_cfg.device);
 	}
 
 out:
diff --git a/fabrics.h b/fabrics.h
index f5b8eaf..41e6a2d 100644
--- a/fabrics.h
+++ b/fabrics.h
@@ -10,4 +10,46 @@ extern int fabrics_connect(const char *desc, int argc, char **argv);
 extern int fabrics_disconnect(const char *desc, int argc, char **argv);
 extern int fabrics_disconnect_all(const char *desc, int argc, char **argv);
 
+/* Symbols used by monitor.c */
+
+const char *arg_str(const char * const *strings, size_t array_size, size_t idx);
+
+struct fabrics_config {
+	const char *nqn;
+	const char *transport;
+	const char *traddr;
+	const char *trsvcid;
+	const char *host_traddr;
+	const char *hostnqn;
+	const char *hostid;
+	int  nr_io_queues;
+	int  nr_write_queues;
+	int  nr_poll_queues;
+	int  queue_size;
+	int  keep_alive_tmo;
+	int  reconnect_delay;
+	int  ctrl_loss_tmo;
+	int  tos;
+	const char *raw;
+	char *device;
+	int  duplicate_connect;
+	int  disable_sqflow;
+	int  hdr_digest;
+	int  data_digest;
+	bool persistent;
+	bool matching_only;
+	const char *output_format;
+};
+extern struct fabrics_config fabrics_cfg;
+
+extern const char *const trtypes[];
+
+#define BUF_SIZE 4096
+
+int build_options(char *argstr, int max_len, bool discover);
+int do_discover(char *argstr, bool connect, enum nvme_print_flags flags);
+int ctrl_instance(const char *device);
+char *parse_conn_arg(const char *conargs, const char delim, const char *field);
+int remove_ctrl(int instance);
+
 #endif
-- 
2.29.2



* [PATCH v2 02/16] nvme-cli: add code for event and timeout handling
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
  2021-03-06  0:36 ` [PATCH v2 01/16] fabrics: export symbols required for monitor functionality mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-17  0:32   ` Martin Wilck
  2021-03-06  0:36 ` [PATCH v2 03/16] monitor: add basic "nvme monitor" functionality mwilck
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

For the nvme monitor functionality, an event handling mechanism
will be necessary which deals with event timeouts. While there are standard
solutions for this (e.g. libevent), these add unnecessary complexity
and dependencies to nvme-cli.

Add small, straightforward event and timeout handling code based
on epoll and timerfd.

This code is identical to what I've pushed recently to
https://github.com/mwilck/minivent, where I added a couple of unit
tests to make sure the code is as robust as it needs to be.
---
 Makefile        |  11 +-
 common.h        |  17 ++
 event/event.c   | 481 ++++++++++++++++++++++++++++++++++++++++++++++++
 event/event.h   | 460 +++++++++++++++++++++++++++++++++++++++++++++
 event/timeout.c | 373 +++++++++++++++++++++++++++++++++++++
 event/timeout.h | 110 +++++++++++
 event/ts-util.c | 107 +++++++++++
 event/ts-util.h | 129 +++++++++++++
 8 files changed, 1683 insertions(+), 5 deletions(-)
 create mode 100644 event/event.c
 create mode 100644 event/event.h
 create mode 100644 event/timeout.c
 create mode 100644 event/timeout.h
 create mode 100644 event/ts-util.c
 create mode 100644 event/ts-util.h

diff --git a/Makefile b/Makefile
index 1fe693c..ad18d47 100644
--- a/Makefile
+++ b/Makefile
@@ -63,6 +63,7 @@ OBJS := nvme-print.o nvme-ioctl.o nvme-rpmb.o \
 	nvme-status.o nvme-filters.o nvme-topology.o
 
 UTIL_OBJS := util/argconfig.o util/suffix.o util/json.o util/parser.o util/cleanup.o util/log.o
+EVENT_OBJS := event/event.o event/timeout.o event/ts-util.o
 
 PLUGIN_OBJS :=					\
 	plugins/intel/intel-nvme.o		\
@@ -83,11 +84,11 @@ PLUGIN_OBJS :=					\
 	plugins/transcend/transcend-nvme.o	\
 	plugins/zns/zns.o
 
-nvme: nvme.c nvme.h $(OBJS) $(PLUGIN_OBJS) $(UTIL_OBJS) NVME-VERSION-FILE
-	$(QUIET_CC)$(CC) $(CPPFLAGS) $(CFLAGS) $(INC) $< -o $(NVME) $(OBJS) $(PLUGIN_OBJS) $(UTIL_OBJS) $(LDFLAGS)
+nvme: nvme.c nvme.h $(OBJS) $(PLUGIN_OBJS) $(UTIL_OBJS) $(EVENT_OBJS) NVME-VERSION-FILE
+	$(QUIET_CC)$(CC) $(CPPFLAGS) $(CFLAGS) $(INC) $< -o $(NVME) $(OBJS) $(PLUGIN_OBJS) $(UTIL_OBJS) $(EVENT_OBJS) $(LDFLAGS)
 
-verify-no-dep: nvme.c nvme.h $(OBJS) $(UTIL_OBJS) NVME-VERSION-FILE
-	$(QUIET_CC)$(CC) $(CPPFLAGS) $(CFLAGS) $(INC) $< -o $@ $(OBJS) $(UTIL_OBJS) $(LDFLAGS)
+verify-no-dep: nvme.c nvme.h $(OBJS) $(UTIL_OBJS) $(EVENT_OBJS) NVME-VERSION-FILE
+	$(QUIET_CC)$(CC) $(CPPFLAGS) $(CFLAGS) $(INC) $< -o $@ $(OBJS) $(UTIL_OBJS) $(EVENT_OBJS) $(LDFLAGS)
 
 nvme.o: nvme.c nvme.h nvme-print.h nvme-ioctl.h util/argconfig.h util/suffix.h nvme-lightnvm.h fabrics.h
 	$(QUIET_CC)$(CC) $(CPPFLAGS) $(CFLAGS) $(INC) -c $<
@@ -107,7 +108,7 @@ test:
 all: doc
 
 clean:
-	$(RM) $(NVME) $(OBJS) $(PLUGIN_OBJS) $(UTIL_OBJS) *~ a.out NVME-VERSION-FILE *.tar* nvme.spec version control nvme-*.deb 70-nvmf-autoconnect.conf
+	$(RM) $(NVME) $(OBJS) $(PLUGIN_OBJS) $(UTIL_OBJS) $(EVENT_OBJS) *~ a.out NVME-VERSION-FILE *.tar* nvme.spec version control nvme-*.deb 70-nvmf-autoconnect.conf
 	$(MAKE) -C Documentation clean
 	$(RM) tests/*.pyc
 	$(RM) verify-no-dep
diff --git a/common.h b/common.h
index 1c214a4..4a5a8da 100644
--- a/common.h
+++ b/common.h
@@ -12,4 +12,21 @@
 #define __stringify_1(x...) #x
 #define __stringify(x...)  __stringify_1(x)
 
+/**
+ * container_of - cast a member of a structure out to the containing structure
+ *
+ * @ptr:	the pointer to the member.
+ * @type:	the type of the container struct this is embedded in.
+ * @member:	the name of the member within the struct.
+ *
+ */
+#define container_of_const(ptr, type, member) ({	\
+	typeof( ((const type *)0)->member ) *__mptr = (ptr);	\
+	(const type *)( (const char *)__mptr - offsetof(type,member) );})
+#define container_of(ptr, type, member) ({		\
+	typeof( ((type *)0)->member ) *__mptr = (ptr);	\
+	(type *)( (char *)__mptr - offsetof(type,member) );})
+
+#define STEAL_PTR(p) ({ typeof(p) __tmp = (p); (p) = NULL; __tmp; })
+
 #endif
diff --git a/event/event.c b/event/event.c
new file mode 100644
index 0000000..b4a4101
--- /dev/null
+++ b/event/event.c
@@ -0,0 +1,481 @@
+/*
+ * Copyright (c) 2021 Martin Wilck, SUSE LLC
+ * SPDX-License-Identifier: LGPL-2.1-or-later
+ */
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/epoll.h>
+#include <stdbool.h>
+#include <syslog.h>
+#include <string.h>
+#include <limits.h>
+#define LOG_FUNCNAME 1
+#include "log.h"
+#include "common.h"
+#include "cleanup.h"
+#include "event.h"
+#include "timeout.h"
+
+/* size of events array in call to epoll_pwait() */
+#define MAX_EVENTS 8
+#define LEN_CHUNK 8
+
+struct dispatcher {
+	int epoll_fd;
+	bool exiting;
+	struct event *timeout_event;
+	unsigned int len, n, free;
+	struct event **events;
+};
+
+const char * const reason_str[__MAX_CALLBACK_REASON] = {
+	[REASON_EVENT_OCCURED] = "event occurred",
+	[REASON_TIMEOUT] = "timeout",
+};
+
+static int _dispatcher_increase(struct dispatcher *dsp)
+{
+	struct event **new;
+
+	if (dsp->len >= UINT_MAX - LEN_CHUNK)
+		return -EOVERFLOW;
+	new = realloc(dsp->events, (dsp->len + LEN_CHUNK) * sizeof(*new));
+	if (!new)
+		return -ENOMEM;
+	dsp->len += LEN_CHUNK;
+	dsp->events = new;
+	msg(LOG_DEBUG, "new size: %u\n", dsp->len);
+	return 0;
+}
+
+static unsigned int _dispatcher_find(const struct dispatcher *dsp,
+				     const struct event *evt)
+{
+	unsigned int i;
+
+	for (i = 0; i < dsp->n; i++)
+		if (dsp->events[i] == evt)
+			return i;
+	return UINT_MAX;
+}
+
+static int _dispatcher_add(struct dispatcher *dsp, struct event *evt)
+{
+	unsigned int i;
+	int rc;
+
+	if (_dispatcher_find(dsp, evt) != UINT_MAX)
+		return -EEXIST;
+
+	if (dsp->free > 0) {
+		for (i = 0; i < dsp->n; i++) {
+			if (dsp->events[i] == NULL)
+				break;
+		}
+		if (i == dsp->n) {
+			msg(LOG_WARNING, "free=%u, but no empty slot found\n",
+			    dsp->free);
+			dsp->free = 0;
+		} else {
+			dsp->events[i] = evt;
+			dsp->free--;
+			msg(LOG_DEBUG, "new event @%u, %u/%u/%u free\n",
+			    i, dsp->free, dsp->n, dsp->len);
+			return 0;
+		}
+	}
+
+	if (dsp->len == dsp->n)
+		if ((rc = _dispatcher_increase(dsp)) < 0)
+			return rc;
+
+	dsp->events[dsp->n] = evt;
+	dsp->n++;
+	msg(LOG_DEBUG, "new event @%u, %u/%u/%u free\n",
+	    dsp->n - 1, dsp->free, dsp->n, dsp->len);
+	return 0;
+}
+
+static int _dispatcher_gc(struct dispatcher *dsp) {
+	unsigned int i, n;
+	struct event **new;
+
+	if (dsp->free <= dsp->len / 4)
+		return 0;
+
+	n = dsp->n;
+	for (i = n; i > 0; i--) {
+		unsigned int j;
+
+		if (dsp->events[i - 1] != NULL)
+			continue;
+
+		for (j = i - 1; j > 0; j--)
+			if (dsp->events[j - 1] != NULL)
+				break;
+
+		memmove(&dsp->events[j], &dsp->events[i],
+			(dsp->n - i) * sizeof(*dsp->events));
+
+		n -= (i - j);
+		if (j == 0)
+			break;
+		else
+			i = j;
+	}
+
+	if (dsp->n - n  != dsp->free)
+		msg(LOG_ERR, "error: %u != %u\n", dsp->free, dsp->n - n);
+	else {
+		msg(LOG_DEBUG, "collected %u slots\n", dsp->free);
+		dsp->n = n;
+		dsp->free = 0;
+	}
+
+	for (i = 0; i < dsp->n; i++) {
+		if (dsp->events[i] == NULL)
+			msg(LOG_ERR, "error at %u\n", i);
+	}
+
+	if (dsp->len <= 2 * LEN_CHUNK || dsp->n >= dsp->len / 2)
+		return 0;
+
+	new = realloc(dsp->events, (dsp->len / 2) * sizeof(*new));
+	if (!new)
+		return -ENOMEM;
+	dsp->events = new;
+	dsp->len = dsp->len / 2;
+
+	msg(LOG_NOTICE, "new size: %u/%u\n", dsp->n, dsp->len);
+	return 0;
+}
+
+static int _dispatcher_remove(struct dispatcher *dsp, struct event *ev)
+{
+	unsigned int i;
+
+	if ((i = _dispatcher_find(dsp, ev)) == UINT_MAX) {
+		msg(LOG_NOTICE, "event not found\n");
+		return -ENOENT;
+	}
+
+	dsp->events[i] = NULL;
+	if (i == dsp->n - 1)
+		dsp->n--;
+	else
+		dsp->free++;
+
+	msg(LOG_DEBUG, "removed event @%u, %u/%u/%u free\n",
+	    i, dsp->free, dsp->n, dsp->len);
+
+	return _dispatcher_gc(dsp);
+}
+
+int _event_remove(struct event *evt)
+{
+	if (evt->fd != -1) {
+		int rc = epoll_ctl(evt->dsp->epoll_fd, EPOLL_CTL_DEL, evt->fd, NULL);
+
+		if (rc == -1)
+			msg(LOG_ERR, "EPOLL_CTL_DEL: %m\n");
+		return rc;
+	} else
+		return 0;
+}
+
+static void _run_cleanup_handlers(struct dispatcher *dsp, bool do_epoll)
+{
+	unsigned int i;
+
+	for (i = 0; i < dsp->n; i++) {
+		struct event *evt = dsp->events[i];
+
+		if (!evt)
+			continue;
+
+		if (do_epoll)
+			_event_remove(evt);
+		if (evt->cleanup)
+			evt->cleanup(evt);
+	}
+}
+
+int cleanup_dispatcher(struct dispatcher *dsp)
+{
+
+	if (!dsp)
+		return -EINVAL;
+	if (dsp->exiting)
+		return 0;
+
+	dsp->exiting = true;
+
+	_run_cleanup_handlers(dsp, true);
+	timeout_reset(dsp->timeout_event);
+
+	dsp->len = dsp->n = dsp->free = 0;
+	free(dsp->events);
+	dsp->events = NULL;
+	dsp->exiting = false;
+	return 0;
+}
+
+void free_dispatcher(struct dispatcher *dsp)
+{
+	if (!dsp)
+		return;
+
+	/*
+	 * If this function is called e.g. after fork(), we must not
+	 * call epoll_ctl() or reset the timerfd (thus not call timeout_reset()).
+	 * Just close the dup'd timerfd and epoll_fd, and free memory.
+	 */
+	_run_cleanup_handlers(dsp, false);
+	if (dsp->timeout_event)
+		free_timeout_event(dsp->timeout_event);
+	if (dsp->epoll_fd != -1)
+		close(dsp->epoll_fd);
+	free(dsp->events);
+	free(dsp);
+}
+
+static DEFINE_CLEANUP_FUNC(free_dsp_p, struct dispatcher *, free_dispatcher);
+static int _event_add(struct dispatcher *dsp, struct event *evt);
+
+struct dispatcher *new_dispatcher(int clocksrc)
+{
+	struct dispatcher *dsp __cleanup__(free_dsp_p) = NULL;
+
+	dsp = calloc(1, sizeof(*dsp));
+	if (!dsp)
+		return NULL;
+
+	if ((dsp->epoll_fd = epoll_create1(EPOLL_CLOEXEC)) == -1) {
+		msg(LOG_ERR, "epoll_create1: %m\n");
+		return NULL;
+	}
+
+	if (!(dsp->timeout_event = new_timeout_event(clocksrc))) {
+		msg(LOG_ERR, "failed to create timeout event: %m\n");
+		return NULL;
+	}
+
+	/* Don't use event_add() here, timeout is tracked separately */
+	if (_event_add(dsp, dsp->timeout_event) != 0) {
+		msg(LOG_ERR, "failed to dispatch timeout event: %m\n");
+		return NULL;
+	} else
+		return STEAL_PTR(dsp);
+}
+
+int dispatcher_get_efd(const struct dispatcher *dsp)
+{
+	if (!dsp)
+		return -EINVAL;
+	return dsp->epoll_fd;
+}
+
+static int _event_add(struct dispatcher *dsp, struct event *evt)
+{
+	evt->ep.data.ptr = evt;
+	if (evt->fd != -1 &&
+	    epoll_ctl(dsp->epoll_fd, EPOLL_CTL_ADD, evt->fd, &evt->ep) == -1) {
+		msg(LOG_ERR, "failed to add event: %m\n");
+		_dispatcher_remove(dsp, evt);
+		return -errno;
+	}
+	evt->dsp = dsp;
+	evt->reason = 0;
+	return timeout_add(dsp->timeout_event, evt);
+}
+
+int event_add(struct dispatcher *dsp, struct event *evt)
+{
+	int rc;
+
+	if (!dsp || !evt || !evt->callback)
+		return -EINVAL;
+	if (dsp->exiting)
+		return -EBUSY;
+	if ((rc = _dispatcher_add(dsp, evt)) < 0)
+		return rc;
+	return _event_add(dsp, evt);
+}
+
+int event_remove(struct event *evt)
+{
+	int rc;
+
+	if (!evt || !evt->dsp)
+		return -EINVAL;
+
+	rc = _event_remove(evt);
+	if (rc == -1)
+		rc = -errno;
+
+	_dispatcher_remove(evt->dsp, evt);
+	timeout_cancel(evt->dsp->timeout_event, evt);
+	evt->dsp = NULL;
+
+	return rc;
+}
+
+int event_mod_timeout(struct event *evt, const struct timespec *tmo)
+{
+	unsigned int i;
+	struct timespec ts;
+
+	if (!evt || !evt->dsp || !tmo)
+		return -EINVAL;
+	if (evt->dsp->exiting)
+		return -EBUSY;
+	if ((i = _dispatcher_find(evt->dsp, evt)) == UINT_MAX) {
+		msg(LOG_WARNING, "attempt to modify non-existing event\n");
+		return -EEXIST;
+	}
+
+	ts = *tmo;
+	return timeout_modify(evt->dsp->timeout_event, evt, &ts);
+}
+
+int event_modify(struct event *evt)
+{
+	int rc;
+	unsigned int i;
+
+	if (!evt || !evt->dsp)
+		return -EINVAL;
+	if (evt->dsp->exiting)
+		return -EBUSY;
+	if ((i = _dispatcher_find(evt->dsp, evt)) == UINT_MAX) {
+		msg(LOG_WARNING, "attempt to modify non-existing event\n");
+		return -EEXIST;
+	}
+	rc = epoll_ctl(evt->dsp->epoll_fd, EPOLL_CTL_MOD,
+		       evt->fd, &evt->ep);
+	return rc == -1 ? -errno : 0;
+}
+
+void _event_invoke_callback(struct event *ev, unsigned short reason,
+			   unsigned int events, bool reset_reason)
+{
+	int rc;
+
+	if (ev->reason) {
+		msg(LOG_INFO, "skipping callback for %s because of %s\n",
+		    reason_str[reason], reason_str[ev->reason]);
+		return;
+	}
+
+	ev->reason = reason;
+	rc = ev->callback(ev, events);
+
+	if (rc == EVENTCB_CLEANUP) {
+		msg(LOG_DEBUG, "cleaning out event\n");
+		event_remove(ev);
+		if (ev->cleanup)
+			ev->cleanup(ev);
+	} else if (rc == EVENTCB_REMOVE) {
+		msg(LOG_DEBUG, "removing event\n");
+		event_remove(ev);
+		ev->reason = 0;
+	} else if (reset_reason)
+		ev->reason = 0;
+}
+
+
+int event_wait(const struct dispatcher *dsp, const sigset_t *sigmask)
+{
+	int ep_fd = dispatcher_get_efd(dsp);
+	int rc, i;
+	struct epoll_event events[MAX_EVENTS];
+	struct epoll_event *tmo_event = NULL;
+
+	if (!dsp)
+		return -EINVAL;
+	if (dsp->exiting)
+		return -EBUSY;
+	if (ep_fd < 0)
+		return -EINVAL;
+
+	rc = epoll_pwait(ep_fd, events, MAX_EVENTS, -1, sigmask);
+	if (rc == -1) {
+		msg(errno == EINTR ? LOG_DEBUG : LOG_WARNING,
+		    "epoll_pwait: %m\n");
+		return -errno;
+	}
+
+	msg(LOG_DEBUG, "received %d events\n", rc);
+	for (i = 0; i < rc; i++) {
+		struct event *ev = events[i].data.ptr;
+
+		if (ev == dsp->timeout_event)
+			tmo_event = &events[i];
+		else
+			_event_invoke_callback(ev, REASON_EVENT_OCCURED,
+					       events[i].events, false);
+	}
+
+	if (tmo_event) {
+		struct event *ev = tmo_event->data.ptr;
+
+		_event_invoke_callback(ev, REASON_EVENT_OCCURED,
+					    tmo_event->events, false);
+	}
+
+	for (i = 0; i < rc; i++) {
+		struct event *ev = events[i].data.ptr;
+		ev->reason = 0;
+	}
+
+	return ELOOP_CONTINUE;
+}
+
+int event_loop(const struct dispatcher *dsp, const sigset_t *sigmask,
+	       int (*err_handler)(int err))
+{
+	int rc;
+
+	do {
+		rc = event_wait(dsp, sigmask);
+		if (rc < 0 && err_handler)
+			rc = err_handler(-errno);
+	} while (rc == ELOOP_CONTINUE);
+
+	return rc;
+}
+
+void cleanup_event_on_stack(struct event *evt)
+{
+	if (!evt)
+		return;
+	if (evt->fd != -1)
+		close(evt->fd);
+}
+
+void cleanup_event_on_heap(struct event *evt)
+{
+	if (!evt)
+		return;
+	cleanup_event_on_stack(evt);
+	free(evt);
+}
+
+int _call_timer_cb(struct event *evt, uint32_t events __attribute__((unused)))
+{
+	struct timer_event *tim = container_of(evt, struct timer_event, e);
+
+	if (!evt)
+		return -EINVAL;
+
+	tim->timer_fn(tim->timer_arg);
+	return EVENTCB_CLEANUP;
+}
+
+int dispatcher_get_clocksource(const struct dispatcher *dsp)
+{
+	if (!dsp)
+		return -EINVAL;
+	return timeout_get_clocksource(dsp->timeout_event);
+}
diff --git a/event/event.h b/event/event.h
new file mode 100644
index 0000000..ff892f1
--- /dev/null
+++ b/event/event.h
@@ -0,0 +1,460 @@
+/*
+ * Copyright (c) 2021 Martin Wilck, SUSE LLC
+ * SPDX-License-Identifier: LGPL-2.1-or-later
+ */
+#ifndef _EVENT_H
+#define _EVENT_H
+#include <stddef.h>
+#include <sys/epoll.h>
+
+struct event;
+struct dispatcher;
+
+/**
+ * reason codes for event callback function
+ * @REASON_EVENT_OCCURED: the event has occurred
+ * @REASON_TIMEOUT: the timeout has expired
+ * @REASON_CLEANUP: dispatcher is about to exit
+ */
+enum {
+	REASON_EVENT_OCCURED,
+	REASON_TIMEOUT,
+	__MAX_CALLBACK_REASON,
+};
+
+/*
+ * reason_str: string representation for the reason a callback is called.
+ */
+extern const char * const reason_str[__MAX_CALLBACK_REASON];
+
+/**
+ * Return codes for callback function
+ * @EVENTCB_CONTINUE:  continue processing
+ * @EVENTCB_REMOVE:    remove this event
+ * @EVENTCB_CLEANUP:   call the cleanup callback (implies EVENTCB_REMOVE)
+ */
+enum {
+	EVENTCB_CONTINUE = 0,
+	EVENTCB_REMOVE =   1,
+	EVENTCB_CLEANUP =  2,
+};
+
+/*
+ * Flags for struct event
+ */
+enum {
+	/*
+	 * timeout is absolute time, not relative.
+	 * Used in event_add() and event_mod_timeout()
+	 */
+	TMO_ABS = 1,
+};
+
+/**
+ * Prototype for event callback.
+ *
+ * @evt: the event object which registered the callback
+ * @events: bit mask of epoll events (see epoll_ctl(2))
+ *
+ * In the callback, check event->reason to obtain the reason the
+ * callback was called for.
+ *
+ * NOTE: race conditions between timeout and event completion can't
+ * be fully avoided. Even if called with @REASON_TIMEOUT, the callback
+ * should check whether an event might have arrived in the meantime,
+ * and in this case, handle the event as if it had arrived before
+ * the timeout.
+ *
+ * CAUTION: do not free() @evt in this callback.
+ *
+ * Return: an EVENTCB_xxx value (see above).
+ */
+typedef int (*cb_fn)(struct event *evt, uint32_t events);
+
+/**
+ * Prototype for cleanup callback.
+ * @evt: the event object which registered the callback
+ *
+ * Called for this event if the event callback returned EVENTCB_CLEANUP, and
+ * from cleanup_dispatcher() / free_dispatcher(), for all registered events.
+ *
+ * If this callback is called, the event has already been
+ * removed from the dispatcher's internal lists. Use this callback
+ * to free the event (if necessary), close file descriptors, and
+ * release other resources as appropriate.
+ *
+ * @evt: the event object which registered the callback
+ */
+typedef void (*cleanup_fn)(struct event *evt);
+
+
+/**
+ * cleanup_event_on_stack() - convenience cleanup callback
+ * @evt: the event object which registered the callback
+ *
+ * This cleanup function simply closes @evt->fd.
+ */
+void cleanup_event_on_stack(struct event *evt);
+
+/**
+ * cleanup_event_on_heap() - convenience cleanup callback
+ * @evt: the event object which registered the callback
+ *
+ * This cleanup function closes @evt->fd and frees @evt.
+ */
+void cleanup_event_on_heap(struct event *evt);
+
+/**
+ * struct event - data structure for a generic event with timeout
+ *
+ * For best results, embed this data structure in the data you need.
+ *
+ * @ep: struct epoll_event. Fill in the @ep.events field with the epoll event
+ *      types you're interested in (see epoll_ctl(2)). If @ep.events is
+ *      0, the event is "disabled"; the timeout will still be active if set.
+ *      CAUTION: don't touch ep.data, it's used by the dispatcher internally.
+ * @fd: the file descriptor to monitor. Use -1 (and fill in the tmo field)
+ *      to create a timer.
+ *      Note: don't change the fd field after creating the event. In particular,
+ *      setting a positive fd after calling event_add with fd == -1 is not allowed.
+ * @callback: the callback function to be called if the event occurs or
+ *      times out. This field *must* be set.
+ * @dsp: the dispatcher object to which this event belongs
+ * @tmo: The timeout for the event.
+ *      Setting @tmo.tv_sec = @tmo.tv_nsec = 0 on calls to event_add()
+ *      creates an event with no (=infinite) timeout.
+ *      CAUTION: USED INTERNALLY. Do not change this after calling
+ *      event_add(); after event_finish(), it may be set again. The field
+ *      may be modified by the dispatcher code. To change the timeout,
+ *      call event_mod_timeout().
+ * @flags: See above. Currently only @TMO_ABS is supported. This field may
+ *      be used internally by the dispatcher, be sure to set or clear only
+ *      public bits.
+ */
+
+struct event {
+	struct epoll_event ep;
+	int fd;
+	unsigned short reason;
+	unsigned short flags;
+	struct dispatcher *dsp;
+	struct timespec tmo;
+	cb_fn callback;
+	cleanup_fn cleanup;
+};
+
+/**
+ * event_add() - add an event to be monitored.
+ *
+ * @dsp: a dispatcher object
+ * @event: an event structure. See the description of struct event above.
+ *
+ * Return: 0 on success, negative error code (-errno) on failure.
+ */
+int event_add(struct dispatcher *dsp, struct event *event);
+
+/**
+ * event_remove() - remove the event from epoll.
+ *
+ * @event: a previously added event structure
+ *
+ * Removes the event from the dispatcher, and cancels the associated
+ * timeout (if any).
+ *
+ * CAUTION: don't call this from callbacks. Use EVENTCB_xxx return codes
+ * instead.
+ *
+ * Return: 0 on success, negative error code (-errno) on failure.
+ */
+int event_remove(struct event *event);
+
+/**
+ * event_modify() - modify epoll events to wait for
+ *
+ * @event: a previously added event structure
+ *
+ * Call this function to change the epoll events (event->ep.events).
+ * By setting @ep.events = 0, the event is temporarily disabled and
+ * can be re-enabled later. NOTE: this function doesn't disable an
+ * active timeout; use event_mod_timeout() for that.
+ *
+ * Return: 0 on success, negative error code (-errno) on failure.
+ */
+int event_modify(struct event *event);
+
+/**
+ * event_mod_timeout() - modify or re-arm timeout for an event
+ *
+ * @event: a previously added event structure
+ * @tmo: the new timeout value
+ *
+ * Call this function to modify or re-enable a timeout for an event.
+ * It can (and must!) be used from the callback to change the timeout
+ * if the event occurred, or to wait longer if it has timed out.
+ * If @tmo->tv_sec and @tmo->tv_nsec are both 0, an existing timeout is
+ * cleared (an infinite timeout is used for this event), as if the tmo field
+ * had been set to { 0, 0 } in the call to event_add(). Set or clear @TMO_ABS
+ * in @event->flags to indicate whether @tmo is an absolute or relative
+ * timeout. Note that the flags field is "remembered", so if you want to use
+ * a relative timeout after having used an absolute timeout before, you must
+ * clear the @TMO_ABS flag in event->flags before calling this function.
+ *
+ * NOTE: if the callback is called with reason REASON_TIMEOUT, the timeout
+ * has expired and *must* be rearmed if the event is monitored further.
+ * Otherwise, the timeout will implicitly be changed to "infinite", because
+ * there is no timeout for this event any more.
+ *
+ * Return: 0 on success, negative error code (-errno) on failure.
+ */
+int event_mod_timeout(struct event *event, const struct timespec *tmo);
+
+/**
+ * _event_invoke_callback() - handle callback invocation
+ * @reason: one of the reason codes above
+ *
+ * Internal use only.
+ */
+void _event_invoke_callback(struct event *, unsigned short, unsigned int, bool);
+
+/**
+ * event_wait(): wait for events or timeouts, once
+ *
+ * @dsp: a dispatcher object
+ * @sigmask: set of signals to be blocked while waiting
+ *
+ * This function waits for events or timeouts to occur, and calls
+ * callbacks as appropriate. A single epoll_wait() call is made.
+ * Depending on how the code was compiled, 0, 1, or more events may
+ * occur in a single call. While waiting, the signal mask will be
+ * set to @sigmask atomically. It is recommended to block all signals
+ * except those that the application wants to receive (e.g. SIGTERM),
+ * and install a signal handler for these signals to avoid the default
+ * action (usually program termination, see signal(7)).
+ *
+ * NOTE: if no events have been added to the dispatcher before calling this
+ * function, it will block waiting until a signal is caught.
+ *
+ * Return: 0 on success, a negative error code (-errno) on failure,
+ * which might be -EINTR.
+ */
+int event_wait(const struct dispatcher *dsp, const sigset_t *sigmask);
+
+/**
+ * Return codes for err_handler in event_loop()
+ */
+enum {
+	ELOOP_CONTINUE = 0,
+	ELOOP_QUIT,
+};
+
+/**
+ * event_loop(): wait for events or timeouts, repeatedly
+ *
+ * @dsp: a dispatcher object
+ * @sigmask: set of signals to be blocked while waiting
+ * @err_handler: callback for event_wait
+ *
+ * This function calls event_wait() in a loop, and calls err_handler() if
+ * event_wait() returns an error code, passing it the negative error code
+ * (e.g. -EINTR) in the @err parameter. err_handler() should return ELOOP_QUIT
+ * or a negative error code to make event_loop() return, and ELOOP_CONTINUE
+ * if event_loop() should continue execution.
+ * @err_handler may be NULL, in which case event_loop() will simply return
+ * the error code from event_wait().
+ *
+ * Return: 0 on success, or negative error code (-errno) on failure.
+ * In particular, it returns -EINTR if interrupted by caught signal.
+ */
+int event_loop(const struct dispatcher *dsp, const sigset_t *sigmask,
+	       int (*err_handler)(int err));
+
+/**
+ * cleanup_dispatcher() - clean out all events and timeouts
+ * @dsp: a pointer returned by new_dispatcher().
+ *
+ * Remove all events and timeouts, and call every event's @cleanup
+ * callback. The dispatcher object itself remains intact, and can
+ * be re-used by adding new events.
+ *
+ * NOTE: unlike free_dispatcher(), this function disables the timer
+ * event (as it cancels all timeouts), and removes all fds from the
+ * dispatcher's epoll instance. Thus calling this e.g. after fork()
+ * affects the parent process's operation.
+ *
+ * Return: 0 on success, negative error code (-errno) on failure.
+ */
+int cleanup_dispatcher(struct dispatcher *dsp);
+
+/**
+ * free_dispatcher() - free a dispatcher object.
+ * @dsp: a pointer returned by new_dispatcher().
+ *
+ * Calls the @cleanup callback of every registered event, and frees
+ * the dispatcher's data structures.
+ *
+ * NOTE: Unlike cleanup_dispatcher(), this function doesn't touch the
+ * kernel-owned epoll and itimerfd data structures. It's safe to call after
+ * fork() without disturbing the parent.
+ */
+void free_dispatcher(struct dispatcher *dsp);
+
+/**
+ * new_dispatcher() - allocate and return a new dispatcher object.
+ *
+ * @clocksrc: one of the supported clock sources of the system,
+ *            see clock_gettime(2). It will be used for timeout handling.
+ *
+ * Return: NULL on failure, a valid pointer otherwise.
+ */
+struct dispatcher *new_dispatcher(int clocksrc);
+
+/**
+ * dispatcher_get_efd() - obtain the epoll file descriptor
+ *
+ * @dsp: a dispatcher object
+ *
+ * Use this function if you want to implement a custom wait loop, to
+ * obtain the file descriptor to be passed to epoll_wait().
+ */
+int dispatcher_get_efd(const struct dispatcher *dsp);
+
+/**
+ * dispatcher_get_clocksource() - obtain the clock source used for timeouts
+ *
+ * @dsp: a dispatcher object
+ *
+ * Return: the clocksrc passed to new_dispatcher when the object was
+ * created.
+ */
+int dispatcher_get_clocksource(const struct dispatcher *dsp);
+
+/**
+ * Convenience macros for event initialization
+ *
+ * IMPORTANT: The cleanup functionality of the ON_HEAP variants requires
+ * that "struct event" is embedded in the application's data structures
+ * at offset 0.
+ */
+
+/**
+ * __EVENT_INIT() - generic event initializer
+ */
+#define __EVENT_INIT(cb, cln, f, ev, s, ns)		\
+	((struct event){				\
+		.fd = (f),				\
+		.ep.events = (ev),			\
+		.callback = (cb),			\
+		.cleanup = (cln),			\
+		.tmo.tv_sec  = (s),			\
+		.tmo.tv_nsec = (ns),			\
+	})
+
+/**
+ * EVENT_W_TMO_ON_STACK() - initializer for struct event
+ * @cb: callback of type @cb_fn
+ * @f:  file descriptor
+ * @ev: epoll event mask
+ * @us: timeout in microseconds, must be non-negative
+ */
+#define EVENT_W_TMO_ON_STACK(cb, f, ev, us)			\
+	__EVENT_INIT(cb, cleanup_event_on_stack, f, ev,		\
+		     (us) / 1000000L, (us) % 1000000L * 1000)
+
+/**
+ * EVENT_ON_STACK() - initializer for struct event
+ * @cb: callback of type @cb_fn
+ * @f:  file descriptor
+ * @ev: epoll event mask
+ *
+ * The initialized event has no timeout.
+ */
+#define EVENT_ON_STACK(cb, f, ev) \
+	EVENT_W_TMO_ON_STACK(cb, f, ev, 0)
+
+/**
+ * TIMER_EVENT_ON_STACK() - initializer for struct event
+ * @cb: callback of type @cb_fn
+ * @us: timeout in microseconds, must be non-negative
+ * NOTE: it's pointless to set a timeout of 0 us (timer inactive),
+ *       thus the code sets it to 1ns at least.
+ * Thus, by passing us = 0, an event is created that will fire
+ * immediately after calling event_wait() or event_loop().
+ */
+#define TIMER_EVENT_ON_STACK(cb, us)				\
+	__EVENT_INIT(cb, cleanup_event_on_stack, -1, 0,		\
+		     (us) / 1000000L, (us) % 1000000L * 1000 + 1)
+
+/**
+ * EVENT_W_TMO_ON_HEAP() - initializer for struct event
+ * Like EVENT_W_TMO_ON_STACK(), but the cleanup callback
+ * will free the struct event.
+ */
+#define EVENT_W_TMO_ON_HEAP(cb, f, ev, us)			\
+	__EVENT_INIT(cb, cleanup_event_on_heap, f, ev,		\
+		     (us) / 1000000L, (us) % 1000000L * 1000)
+
+/**
+ * EVENT_ON_HEAP() - initializer for struct event
+ * Like EVENT_ON_STACK(), but the cleanup callback
+ * will free the struct event.
+ */
+#define EVENT_ON_HEAP(cb, f, ev)			\
+	EVENT_W_TMO_ON_HEAP(cb, f, ev, 0)
+
+/**
+ * TIMER_EVENT_ON_HEAP() - initializer for struct event
+ * Like TIMER_EVENT_ON_STACK(), but the cleanup callback
+ * will free the struct event.
+ */
+#define TIMER_EVENT_ON_HEAP(cb, us)				\
+	__EVENT_INIT(cb, cleanup_event_on_heap, -1, 0,		\
+		     (us) / 1000000L, (us) % 1000000L * 1000 + 1)
+
+/**
+ * timer_cb - prototype for a generic single-shot timer callback
+ * Use the TIMER macros below.
+ */
+typedef void (*timer_cb)(void *arg);
+
+/**
+ * _call_timer_cb() - helper for invoking timer callbacks
+ *
+ * Internal use.
+ */
+int _call_timer_cb(struct event *, uint32_t events);
+
+/**
+ * struct timer_event - helper struct for invoking timer callbacks
+ */
+struct timer_event {
+	struct event e;
+	timer_cb timer_fn;
+	void *timer_arg;
+};
+
+/**
+ * TIMER_ON_STACK() - initializer for a single-shot timer
+ * @fn: callback of type @timer_cb
+ * @arg: argument to pass to @fn
+ * @us: timeout in microseconds
+ */
+#define TIMER_ON_STACK(fn, arg, us)					\
+	((struct timer_event){						\
+		.e = TIMER_EVENT_ON_STACK(_call_timer_cb, us),		\
+		.timer_fn = fn,						\
+		.timer_arg = arg,					\
+	})
+
+/**
+ * TIMER_ON_HEAP() - initializer for a single-shot timer
+ * Like TIMER_ON_STACK(), but the cleanup callback
+ * will free the struct event.
+ */
+#define TIMER_ON_HEAP(fn, arg, us)					\
+	((struct timer_event){						\
+		.e = TIMER_EVENT_ON_HEAP(_call_timer_cb, us),		\
+		.timer_fn = fn,						\
+		.timer_arg = arg,					\
+	})
+
+#endif
diff --git a/event/timeout.c b/event/timeout.c
new file mode 100644
index 0000000..ee37b30
--- /dev/null
+++ b/event/timeout.c
@@ -0,0 +1,373 @@
+/*
+ * Copyright (c) 2021 Martin Wilck, SUSE LLC
+ * SPDX-License-Identifier: LGPL-2.1-or-newer
+ */
+#include <inttypes.h>
+#include <limits.h>
+#include <stddef.h>
+#include <stdbool.h>
+#include <string.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <time.h>
+#include <errno.h>
+#include <sys/time.h>
+#include <sys/timerfd.h>
+#include <syslog.h>
+#include "common.h"
+#include "ts-util.h"
+#define LOG_FUNCNAME 1
+#include "log.h"
+#include "timeout.h"
+#include "event.h"
+
+struct timeout_handler {
+        int source;
+        size_t len;
+        struct timespec **timeouts;
+	struct timespec expiry;
+	struct event ev;
+};
+
+int timeout_get_clocksource(const struct event *evt)
+{
+	return container_of_const(evt, struct timeout_handler, ev)->source;
+}
+
+static void free_timeout_handler(struct timeout_handler *th)
+{
+        if (th->ev.fd != -1)
+                close(th->ev.fd);
+
+        if (th->timeouts)
+                free(th->timeouts);
+
+        free(th);
+}
+
+void free_timeout_event(struct event *ev)
+{
+	free_timeout_handler(container_of(ev, struct timeout_handler, ev));
+}
+
+struct event *new_timeout_event(int source)
+{
+        struct timeout_handler *th = calloc(1, sizeof(*th));
+
+        if (!th)
+                return NULL;
+        th->ev.fd = timerfd_create(source, TFD_NONBLOCK|TFD_CLOEXEC);
+        if (th->ev.fd == -1) {
+                msg(LOG_ERR, "timerfd_create: %m\n");
+                free(th);
+                return NULL;
+        }
+        th->source = source;
+	th->ev.ep.events = EPOLLIN;
+	th->ev.ep.data.ptr = &th->ev;
+	th->ev.callback = timeout_event;
+
+	msg(LOG_DEBUG, "done\n");
+        return &th->ev;
+}
+
+static long _timeout_rearm(struct timeout_handler *th, long pos)
+{
+        struct itimerspec it = { .it_interval = { 0, 0 }, };
+        int rc;
+
+        if (pos < (long)th->len)
+                it.it_value = *th->timeouts[pos];
+
+	if (ts_compare(&it.it_value, &th->expiry) == 0)
+		return pos;
+
+        msg(LOG_DEBUG, "current: %ld/%zd, expire: %ld.%06ld\n",
+            pos, th->len, (long)it.it_value.tv_sec, it.it_value.tv_nsec / 1000L);
+
+        rc = timerfd_settime(th->ev.fd, TFD_TIMER_ABSTIME, &it, NULL);
+        if (rc == -1) {
+                msg(LOG_ERR, "timerfd_settime: %m\n");
+                return -errno;
+        } else {
+		th->expiry = it.it_value;
+                return pos;
+	}
+}
+
+static const struct timespec null_ts;
+
+static long timeout_resize(struct timeout_handler *th, size_t size)
+{
+	struct timespec **tmp;
+
+	if (size > LONG_MAX)
+		return -EOVERFLOW;
+
+	if (size == 0) {
+		free(th->timeouts);
+		th->timeouts = NULL;
+		th->len = 0;
+		return 0;
+	}
+
+	msg(LOG_DEBUG, "size old %zu new %zu\n", th->len, size);
+	tmp = realloc(th->timeouts, size * sizeof(*th->timeouts));
+	if (tmp == NULL)
+		return -errno;
+
+	th->timeouts = tmp;
+	return size;
+}
+
+int timeout_reset(struct event  *tmo_event)
+{
+	struct timeout_handler *th =
+		container_of(tmo_event, struct timeout_handler, ev);
+
+	timeout_resize(th, 0);
+	return _timeout_rearm(th, 0);
+}
+
+static int absolute_timespec(int source, struct timespec *ts)
+{
+	struct timespec now;
+
+	if (clock_gettime(source, &now) == -1)
+		return -errno;
+	ts->tv_sec += now.tv_sec;
+	ts->tv_nsec += now.tv_nsec;
+	return 0;
+}
+
+static int timeout_add_ev(struct timeout_handler *th, struct event *event)
+{
+        long pos;
+	int rc;
+
+        if (!th || !event)
+                return -EINVAL;
+
+	if (ts_compare(&event->tmo, &null_ts) == 0)
+		return 0;
+
+	for (pos = 0; pos < (long)th->len; pos++)
+		if (th->timeouts[pos] == &event->tmo) {
+			msg(LOG_DEBUG, "event %p exists already at pos %ld/%zu\n",
+			    event, pos, th->len);
+			return -EEXIST;
+		};
+
+	if ((rc = timeout_resize(th, th->len + 1)) < 0) {
+		msg(LOG_ERR, "failed to increase array size: %m\n");
+		return rc;
+	}
+
+        if (~event->flags & TMO_ABS &&
+	    absolute_timespec(th->source, &event->tmo) < 0)
+			return -errno;
+
+        pos = ts_insert(th->timeouts, &th->len, th->len + 1, &event->tmo);
+        if (pos < 0) {
+                msg(LOG_ERR, "ts_insert failed: %m\n");
+                return errno ? -errno : -EIO;
+        }
+
+        msg(LOG_DEBUG, "new timeout at pos %ld/%zd: %ld.%06ld\n",
+            pos, th->len, (long)event->tmo.tv_sec, event->tmo.tv_nsec / 1000L);
+
+        if (pos == 0)
+                _timeout_rearm(th, pos);
+
+        return 0;
+}
+
+int timeout_add(struct event *tmo_event, struct event *ev)
+{
+	return timeout_add_ev(container_of(tmo_event, struct timeout_handler, ev), ev);
+}
+
+static int timeout_cancel_ev(struct timeout_handler *th, struct event *evt)
+{
+        struct timespec *ts = &evt->tmo;
+        long pos;
+
+	if (ts_compare(&evt->tmo, &null_ts) == 0)
+		return 0;
+
+        for (pos = 0; pos < (long)th->len && ts != th->timeouts[pos]; pos++);
+
+        if (pos == (long)th->len) {
+                msg(LOG_DEBUG, "%p: not found\n", evt);
+		/*
+		 * This is normal if called from a timeout handler.
+		 * Mark the event as having no timeout.
+		 */
+		*ts = null_ts;
+                return -ENOENT;
+        }
+
+	msg(LOG_DEBUG, "timeout %ld cancelled, %ld.%06ld\n",
+            pos, (long)ts->tv_sec, ts->tv_nsec / 1000L);
+
+	*ts = null_ts;
+        memmove(&th->timeouts[pos], &th->timeouts[pos + 1],
+                (th->len - pos - 1) * sizeof(*th->timeouts));
+
+        th->len--;
+        if (pos == 0)
+                _timeout_rearm(th, 0);
+        return 0;
+}
+
+int timeout_cancel(struct event *tmo_event, struct event *ev)
+{
+	return timeout_cancel_ev(container_of(tmo_event, struct timeout_handler, ev), ev);
+}
+
+int timeout_modify(struct event *tmo_event, struct event *evt, struct timespec *new)
+{
+	struct timeout_handler *th =
+		container_of(tmo_event, struct timeout_handler, ev);
+        struct timespec *ts = &evt->tmo;
+        long pos, pnew, pmin;
+
+	if (ts_compare(&evt->tmo, &null_ts) == 0 || th->len == 0) {
+		evt->tmo = *new;
+		return timeout_add_ev(th, evt);
+	}
+
+	if (ts_compare(new, &null_ts) == 0)
+		return timeout_cancel_ev(th, evt);
+
+	if (ts_compare(new, &evt->tmo) == 0)
+		/* Nothing changed */
+		return 0;
+
+	/* There could be several timeouts with the same expiry, find the right one */
+	pmin = ts_search(th->timeouts, th->len, ts);
+        for (pos = pmin;
+             pos < (long)th->len &&
+                     ts_compare(th->timeouts[pos], ts) == 0;
+             pos++) {
+                if (ts == th->timeouts[pos])
+                        break;
+        }
+
+        if (pos == (long)th->len || ts != th->timeouts[pos]) {
+		/* This is normal if timeout_modify called from timeout handler */
+                msg(LOG_DEBUG, "%p: not found\n", evt);
+                evt->tmo = *new;
+		return timeout_add_ev(th, evt);
+        }
+
+        if (~evt->flags & TMO_ABS && absolute_timespec(th->source, new) < 0)
+		return -errno;
+
+	ts_normalize(new);
+	pnew = ts_search(th->timeouts, th->len, new);
+	if (pnew < 0)
+		return pnew;
+
+	if (pnew > pos + 1) {
+		/*
+		 * ts_search returns the position (pnew) at which the new tmo would be
+		 * inserted. All members at pnew or higher are >= new.
+		 * So if pnew = pos + 1, nothing needs to be done.
+		 * Subtract 1, because pnew is after pos but pos will be moved away.
+		 */
+		pnew--;
+		memmove(&th->timeouts[pos], &th->timeouts[pos + 1],
+			(pnew - pos)  * sizeof(*th->timeouts));
+		th->timeouts[pnew] = &evt->tmo;
+	} else if (pnew < pos) {
+		memmove(&th->timeouts[pnew + 1], &th->timeouts[pnew],
+			(pos - pnew)  * sizeof(*th->timeouts));
+		th->timeouts[pnew] = &evt->tmo;
+	}
+	msg(LOG_DEBUG, "timeout %ld now at pos %ld, %ld.%06ld -> %ld.%06ld\n",
+            pos, pnew, (long)ts->tv_sec, ts->tv_nsec / 1000L,
+            (long)new->tv_sec, new->tv_nsec / 1000L);
+	evt->tmo = *new;
+
+
+        if (pnew == 0)
+                _timeout_rearm(th, 0);
+        return 0;
+}
+
+static void _timeout_run_callbacks(struct timespec **tss, long n)
+{
+        long i;
+
+        for (i = 0; i < n; i++) {
+                struct event *evt;
+
+                evt = container_of(tss[i], struct event, tmo);
+
+                msg(LOG_DEBUG, "calling callback %ld (%ld.%06ld)\n", i,
+                    (long)tss[i]->tv_sec, tss[i]->tv_nsec / 1000);
+
+		_event_invoke_callback(evt, REASON_TIMEOUT, 0, true);
+        }
+
+}
+
+int timeout_event(struct event *tmo_ev, uint32_t events)
+{
+	struct timeout_handler *th = container_of(tmo_ev, struct timeout_handler, ev);
+        struct timespec now;
+        struct timespec **expired;
+        long pos = th->len;
+	uint64_t val;
+
+	if (tmo_ev->reason != REASON_EVENT_OCCURED || events & ~EPOLLIN) {
+		msg(LOG_WARNING, "unexpected reason %s, events 0x%08x\n",
+		    reason_str[tmo_ev->reason], events);
+		return EVENTCB_CONTINUE;
+	}
+
+	if (read(tmo_ev->fd, &val, sizeof(val)) == -1)
+		/*
+		 * EAGAIN happens if the most recent timer was cancelled
+		 * and the timer rearmed before we get here.
+		 */
+		msg(errno == EAGAIN ? LOG_DEBUG : LOG_ERR,
+		    "failed to read timerfd: %m\n");
+
+	clock_gettime(th->source, &now);
+
+        /*
+         * callbacks may add new timers, therefore we must iterate here.
+         */
+        while (th->len > 0) {
+
+		/* Expired timeouts are at the beginning, don't ts_search() here */
+		for (pos = 0;
+		     pos < (long)th->len && ts_compare(th->timeouts[pos], &now) <= 0;
+		     pos++);
+
+                if (pos == (long)th->len) {
+                        expired = th->timeouts;
+                        th->len = 0;
+                        th->timeouts = NULL;
+                        _timeout_run_callbacks(expired, pos);
+                        free(expired);
+                } else if (pos > 0) {
+                        expired = malloc(pos * sizeof(*expired));
+                        if (expired)
+                                memcpy(expired, th->timeouts, pos * sizeof(*expired));
+                        th->len -= pos;
+                        memmove(th->timeouts, &th->timeouts[pos],
+                                th->len * sizeof(*th->timeouts));
+                        if (expired) {
+                                _timeout_run_callbacks(expired, pos);
+                                free(expired);
+                        }
+                } else
+                        break;
+        }
+
+        _timeout_rearm(th, 0);
+	return EVENTCB_CONTINUE;
+}
diff --git a/event/timeout.h b/event/timeout.h
new file mode 100644
index 0000000..fd87eae
--- /dev/null
+++ b/event/timeout.h
@@ -0,0 +1,110 @@
+/*
+ * Copyright (c) 2021 Martin Wilck, SUSE LLC
+ * SPDX-License-Identifier: LGPL-2.1-or-newer
+ */
+#ifndef _TIMEOUT_H
+#define _TIMEOUT_H
+
+struct event;
+
+/**
+ * free_timeout_event() - free resources associated with a timeout event
+ * @tmo_event: a struct event returned from new_timeout_event().
+ */
+void free_timeout_event(struct event *tmo_event);
+
+/**
+ * new_timeout_event() - create a new timeout event object
+ * @source: One of the supported clock sources of the system, see clock_gettime(2).
+ *
+ * Return: a new timeout event object on success, NULL on failure.
+ */
+struct event *new_timeout_event(int source);
+
+/**
+ * timeout_add() - add an event to the timeout list.
+ * @tmo_event: struct event returned from new_timeout_event().
+ * @event: a struct event
+ *
+ * This function adds @event to the list of timeouts handled, using
+ * the @event->tmo and @event->flags to determine the expiry of the timeout.
+ * When the timeout expires, timeout_event() will call @event->callback()
+ * with @event->reason set to @REASON_TIMEOUT.
+ * If @event->tmo is {0, 0}, nothing is done.
+ *
+ * Return: 0 on success. On error, a negative error code.
+ *  -EEXIST: the event is already in the list of timeouts handled.
+ *  -ENOMEM: failed to allocate memory to insert the new element.
+ *  -EINVAL: invalid input parameters.
+ */
+int timeout_add(struct event *tmo_event, struct event *event);
+
+/**
+ * timeout_modify() - modify the timeout value of a previously added event
+ * @tmo_event: struct event returned from new_timeout_event().
+ * @event: the event to modify
+ * @new: the new timeout (doesn't need to be normalized)
+ *
+ * Moves the event in the timeout list to a new position according to
+ * the new timeout value in @new. If the event isn't currently in the list,
+ * timeout_add() will be called. If @new is {0, 0} (no timeout), timeout_cancel()
+ * is called. On successful return, @event->tmo will be set
+ * to @new, and normalized.
+ * IMPORTANT: don't set @event->tmo to @new before calling this function.
+ *
+ * Return: 0 on success, negative error code on failure. Error codes can be
+ * from timeout_add() or timeout_cancel().
+ */
+int timeout_modify(struct event *tmo_event, struct event *event, struct timespec *new);
+
+/**
+ * timeout_cancel() - remove an event from the timeout list
+ * @tmo_event: struct event returned from new_timeout_event().
+ * @event: the event to modify
+ *
+ * Subsequent calls to timeout_event() will not call this event's callback any more.
+ * But if the timeout event has already happened and been delivered to the event dispatcher,
+ * this function will return -ENOENT, and the callback will be called later on.
+ *
+ * Return: 0 on success, negative error code on failure.
+ *  -ENOENT: the event wasn't found in the timeout list. Either the timeout event
+ *  had happened already, or the event had never been added / already cancelled.
+ */
+int timeout_cancel(struct event *tmo_event, struct event *);
+
+/**
+ * timeout_reset() - clear all timeouts
+ *
+ * Cancel all timeouts (without calling any callbacks), and disarm the timer.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int timeout_reset(struct event *tmo_event);
+
+/**
+ * timeout_event() - handle timeout events
+ * @tmo_event: struct event returned from new_timeout_event().
+ * @events: epoll event bitmask, see epoll_wait(2); expected to be EPOLLIN.
+ *
+ * This function is invoked by the event dispatcher if the @tmo_event has occurred,
+ * meaning that one or more timeouts in the list handled by @tmo_event have expired.
+ * timeout_event() removes the expired events from the list and calls the respective
+ * callbacks for the timed-out events with the @reason field set to @REASON_TIMEOUT.
+ *
+ * If the callback wants to extend or otherwise re-arm the timeout, it must call
+ * timeout_add() or (preferably) timeout_modify().
+ *
+ * Return: EVENTCB_CONTINUE
+ */
+int timeout_event(struct event *tmo_event, uint32_t events);
+
+/**
+ * timeout_get_clocksource() - obtain clock source used
+ * @tmo_event: struct event returned from new_timeout_event().
+ *
+ * Return: the clock source passed to new_timeout_event() when @tmo_event
+ * was created.
+ */
+int timeout_get_clocksource(const struct event *tmo_event);
+
+#endif
diff --git a/event/ts-util.c b/event/ts-util.c
new file mode 100644
index 0000000..2cb1466
--- /dev/null
+++ b/event/ts-util.c
@@ -0,0 +1,107 @@
+/*
+ * Copyright (c) 2021 Martin Wilck, SUSE LLC
+ * SPDX-License-Identifier: LGPL-2.1-or-newer
+ */
+#include <time.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <limits.h>
+#include <errno.h>
+#include "ts-util.h"
+void ts_normalize(struct timespec *tv)
+{
+	long quot, rem;
+	if (tv->tv_nsec >= 0 && tv->tv_nsec < 1000000000L)
+		return;
+	quot = tv->tv_nsec / 1000000000L;
+	rem = tv->tv_nsec % 1000000000L;
+	if (rem < 0) {
+		rem += 1000000000L;
+		quot--;
+	}
+	tv->tv_sec += quot;
+	tv->tv_nsec = rem;
+}
+
+void ts_add(struct timespec *t1, const struct timespec *t2)
+{
+	t1->tv_sec += t2->tv_sec;
+	t1->tv_nsec += t2->tv_nsec;
+	ts_normalize(t1);
+	return;
+}
+
+void ts_subtract(struct timespec *t1, const struct timespec *t2)
+{
+	t1->tv_sec -= t2->tv_sec;
+	t1->tv_nsec -= t2->tv_nsec;
+	ts_normalize(t1);
+	return;
+}
+
+int ts_compare(const struct timespec *t1, const struct timespec *t2)
+{
+	if (t1->tv_sec < t2->tv_sec)
+		return -1;
+	if (t1->tv_sec > t2->tv_sec)
+		return 1;
+	if (t1->tv_nsec < t2->tv_nsec)
+		return -1;
+	if (t1->tv_nsec > t2->tv_nsec)
+		return 1;
+	return 0;
+}
+
+static int ts_compare_q(const struct timespec **pt1,
+			const struct timespec **pt2)
+{
+	return ts_compare(*pt1, *pt2);
+}
+
+long ts_search(struct timespec *const *tvs, size_t size, struct timespec *new)
+{
+	long low, high, mid;
+	if (!new || !tvs || size > LONG_MAX)
+		return -EINVAL;
+	ts_normalize(new);
+	if (size == 0)
+		return 0;
+	high = size - 1;
+	if (ts_compare(new, tvs[high]) > 0)
+		return size;
+	low = 0;
+	while (high - low > 1) {
+		mid = low + (high - low) / 2;
+		if (ts_compare(new, tvs[mid]) <= 0)
+			high = mid;
+		else
+			low = mid;
+	}
+	if (high > low && ts_compare(new, tvs[low]) > 0)
+		return high;
+	else
+		return low;
+}
+
+long ts_insert(struct timespec **tvs, size_t *len, size_t size,
+	       struct timespec *new)
+{
+	long pos;
+	if (!len || size <= *len)
+		return -EOVERFLOW;
+	pos = ts_search(tvs, *len, new);
+	if (pos < 0)
+		return pos;
+	memmove(&tvs[pos + 1], &tvs[pos], (*len - pos) * sizeof(*tvs));
+	(*len)++;
+	tvs[pos] = new;
+	return pos;
+}
+
+void ts_sort(struct timespec **tvs, size_t size)
+{
+	qsort(tvs, size, sizeof(struct timespec *),
+	      (int (*)(const void *, const void *))ts_compare_q);
+	return;
+}
diff --git a/event/ts-util.h b/event/ts-util.h
new file mode 100644
index 0000000..1b469f0
--- /dev/null
+++ b/event/ts-util.h
@@ -0,0 +1,129 @@
+/*
+ * Copyright (c) 2021 Martin Wilck, SUSE LLC
+ * SPDX-License-Identifier: LGPL-2.1-or-newer
+ */
+#ifndef _TS_UTIL_H
+#define _TS_UTIL_H
+
+/*
+ * Utility functions for dealing with "struct timespec".
+ * See also tv_util.h, which has the same set of functions
+ * for "struct timeval".
+ */
+
+/**
+ * ts_to_us - convert struct timespec to microseconds
+ * @ts: timespec object
+ *
+ * Return: value of @ts in microseconds.
+ */
+static inline uint64_t ts_to_us(const struct timespec *ts)
+{
+	return ts->tv_sec * 1000000ULL + ts->tv_nsec / 1000;
+}
+
+/**
+ * us_to_ts - convert microseconds to struct timespec
+ * @us: microseconds
+ * @ts: conversion result
+ */
+static inline void us_to_ts(uint64_t us, struct timespec *ts)
+{
+	ts->tv_sec = us / 1000000L;
+	ts->tv_nsec = (us % 1000000L) * 1000;
+}
+
+/**
+ * ts_normalize() - convert a struct timespec to normal form
+ * @ts: timespec to normalize
+ *
+ * "Normalized" means 0 <= ts->tv_nsec < 1000000000.
+ */
+void ts_normalize(struct timespec *ts);
+
+/**
+ * ts_add(): add a struct timespec to another
+ * @t1: 1st summand, this one will be modified
+ * @t2: 2nd summand, will be added to @t1.
+ *
+ * @t1 is normalized on return.
+ */
+void ts_add(struct timespec *t1, const struct timespec *t2);
+
+/**
+ * ts_subtract(): subtract a struct timespec from another
+ *
+ * @t1: minuend, this one will be modified
+ * @t2: subtrahend
+ *
+ * @t1 is normalized on return.
+ */
+void ts_subtract(struct timespec *t1, const struct timespec *t2);
+
+/**
+ * ts_compare - compare two struct timespec objects
+ *
+ * @t1: 1st timespec object
+ * @t2: 2nd timespec object
+ *
+ * IMPORTANT: this function assumes that both @t1 and  @t2 are normalized.
+ * If that's not the case, results will be wrong.
+ *
+ * Return: 0 if @t1 == @t2, -1 if @t1 < @t2 and 1 if @t1 > @t2.
+ */
+int ts_compare(const struct timespec *t1, const struct timespec *t2);
+
+/**
+ * ts_sort() - sort an array of normalized struct timespec objects
+ *
+ * @tss: array of "struct timespec *"
+ * @len: number of elements in @tss
+ *
+ * IMPORTANT: all elements of the array should be normalized before calling
+ * this function.
+ * The array is sorted in ascending order, using ts_compare() for comparing
+ * elements.
+ */
+void ts_sort(struct timespec **tss, size_t len);
+
+/**
+ * ts_search - find insertion point for a timespec object in a sorted array
+ *
+ * @tvs: sorted array of normalized "struct timespec *"
+ * @len: number of elements in @tvs
+ * @new: new struct timespec object
+ *
+ * On entry, @tvs must be a sorted array of normalized "struct timespec" pointers.
+ * (sorted in the sense of ts_sort()). The function searches the index in the
+ * array where @new would need to be inserted, using a bisection algorithm.
+ * @new need not be normalized on entry; it will be when the function returns
+ * successfully.
+ *
+ * Return: On success, the non-negative index at which this element would need to
+ * be inserted in the array in order to keep it sorted. If the return value is n,
+ * then the timespec @tvs[n-1] is smaller than @new, and @tvs[n] is greater than
+ * or equal to @new.
+ *  -EINVAL if one of the input parameters is invalid.
+ */
+long ts_search(struct timespec *const *tvs, size_t len, struct timespec *new);
+
+/**
+ * ts_insert - insert a new struct timespec into a sorted array
+ *
+ * @tvs: sorted array of normalized "struct timespec *"
+ * @len: number of elements in @tvs
+ * @size: allocated size (in elements) of @tvs, must be larger than @len on entry
+ * @new: new struct timespec object
+ *
+ * Inserts the element @new into @tvs at the point returned by ts_search(), keeping
+ * the array sorted. @new doesn't need to be normalized on entry, it will be on
+ * successful return.
+ * This function doesn't reallocate @tvs and doesn't take a copy of @new.
+ *
+ * Return: On success, the non-negative index at which the element was inserted.
+ *  -EINVAL if input parameters were invalid (see ts_search()).
+ *  -EOVERFLOW if @size is not large enough to add the new element.
+ */
+long ts_insert(struct timespec **tvs, size_t *len, size_t size, struct timespec *new);
+
+#endif
-- 
2.29.2
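
Illustration only, not part of the patch: the contract documented for
ts_normalize() and ts_search() above (normalized means
0 <= tv_nsec < 1000000000; the search returns the first index whose
element is greater than or equal to the key) could be sketched as
follows. The demo_* names are hypothetical stand-ins, and the search is
a plain lower-bound loop rather than the bisection code in ts-util.c.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define DEMO_NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for ts_normalize(): 0 <= tv_nsec < 1e9 on return. */
static void demo_normalize(struct timespec *ts)
{
	long quot = ts->tv_nsec / DEMO_NSEC_PER_SEC;
	long rem = ts->tv_nsec % DEMO_NSEC_PER_SEC;

	/* C division truncates toward zero, so fix up negative remainders */
	if (rem < 0) {
		rem += DEMO_NSEC_PER_SEC;
		quot--;
	}
	ts->tv_sec += quot;
	ts->tv_nsec = rem;
}

/* Hypothetical stand-in for ts_compare(); both inputs must be normalized. */
static int demo_compare(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/*
 * Lower-bound search over a sorted array of timespec pointers:
 * returns the first index n with *arr[n] >= *key, or len if there is none.
 */
static size_t demo_lower_bound(struct timespec *const *arr, size_t len,
			       const struct timespec *key)
{
	size_t lo = 0, hi = len;

	assert(arr != NULL || len == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (demo_compare(arr[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

With several equal expiry times in the array, the returned index points
at the first of them, which is why timeout_modify() still has to walk
forward to find the matching pointer.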


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 03/16] monitor: add basic "nvme monitor" functionality
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
  2021-03-06  0:36 ` [PATCH v2 01/16] fabrics: export symbols required for monitor functionality mwilck
  2021-03-06  0:36 ` [PATCH v2 02/16] nvme-cli: add code for event and timeout handling mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 04/16] monitor: implement uevent handling mwilck
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

"nvme monitor" listens for uevents related to NVMe discovery
and attempts to auto-connect newly discovered controllers.
Currently NVMeFC events via fc_udev_device and AEN events from
NVMe discovery controllers are supported.

This patch adds the main event listening functionality. Actual
event handling will be added in the forthcoming patches.
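
The signal handling used by the event loop can be sketched as follows.
This is a minimal, standalone illustration and not the code from this
patch: all signals are blocked during normal operation, and SIGTERM is
only unblocked atomically while waiting in epoll_pwait(). The names
on_term(), init_signals() and wait_once() are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

static volatile sig_atomic_t must_exit;

/* Handler only sets a flag; the real work happens in the main loop. */
static void on_term(int sig)
{
	(void)sig;
	must_exit = 1;
}

/*
 * Block all signals, install the SIGTERM handler, and build the mask
 * that is active only while epoll_pwait() is waiting.
 */
static int init_signals(sigset_t *wait_mask)
{
	sigset_t all;
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_term;
	sigemptyset(&sa.sa_mask);

	sigfillset(&all);
	if (sigprocmask(SIG_BLOCK, &all, NULL) == -1)
		return -errno;
	if (sigaction(SIGTERM, &sa, NULL) == -1)
		return -errno;

	sigfillset(wait_mask);
	sigdelset(wait_mask, SIGTERM);
	return 0;
}

/*
 * Wait once for events.
 * Returns 1 if interrupted by SIGTERM, 0 on timeout or events,
 * and a negative errno value on any other error.
 */
static int wait_once(int epfd, const sigset_t *wait_mask, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_pwait(epfd, &ev, 1, timeout_ms, wait_mask);

	if (n == -1 && errno == EINTR && must_exit)
		return 1;
	return n == -1 ? -errno : 0;
}
```

Because the signal can only be delivered inside epoll_pwait(), there is
no window in which a termination request arrives between checking the
flag and going to sleep.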

Options:

 -N/--no-connect for "dry run" mode
 -S/--silent   to suppress LOG_NOTICE
 -v/--verbose  to enable LOG_INFO (overrides -S)
 -D/--debug    to enable LOG_DEBUG (overrides -S, -v)
 -t/--timestamps to enable time stamps on log messages.
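
The precedence between the log level options can be illustrated with
the following sketch. It mirrors the order of the checks in
monitor_parse_opts(); the function name pick_log_level() is
hypothetical, and the default of LOG_NOTICE is an assumption (the
actual default is defined in util/log.h, which is not part of this
patch).

```c
#include <stdbool.h>
#include <syslog.h>

/*
 * Later checks override earlier ones, so -D wins over -v,
 * and -v wins over -S.
 */
static int pick_log_level(bool silent, bool verbose, bool debug)
{
	int level = LOG_NOTICE;	/* assumed default, see lead-in */

	if (silent)
		level = LOG_WARNING;
	if (verbose)
		level = LOG_INFO;
	if (debug)
		level = LOG_DEBUG;
	return level;
}
```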

I tried to use short options that don't conflict with options
from nvme connect-all, because many of those options will be added
to the monitor later, too.
---
 .github/workflows/c-cpp.yml |   4 +
 Makefile                    |  10 ++
 monitor.c                   | 246 ++++++++++++++++++++++++++++++++++++
 monitor.h                   |   6 +
 nvme-builtin.h              |   1 +
 nvme.c                      |  13 ++
 6 files changed, 280 insertions(+)
 create mode 100644 monitor.c
 create mode 100644 monitor.h

diff --git a/.github/workflows/c-cpp.yml b/.github/workflows/c-cpp.yml
index d2f94e9..555edca 100644
--- a/.github/workflows/c-cpp.yml
+++ b/.github/workflows/c-cpp.yml
@@ -13,6 +13,10 @@ jobs:
 
     steps:
     - uses: actions/checkout@v2
+    - name: update
+      run: sudo apt-get update
+    - name: dependencies
+      run: sudo apt-get install --yes libudev-dev
     - name: make
       run: sudo apt-get install gcc-10-powerpc* && make clean && make && make clean && make LD=powerpc64le-linux-gnu-ld CC=powerpc64le-linux-gnu-gcc-10 CFLAGS='-O2 -g -Wall -Wformat-security -Werror -m64 -mcpu=power8 -mtune=power8 -I -I/usr/powerpc64-linux-gnu/include/'
 
diff --git a/Makefile b/Makefile
index ad18d47..33441b1 100644
--- a/Makefile
+++ b/Makefile
@@ -4,6 +4,7 @@ override CPPFLAGS += -D_GNU_SOURCE -D__CHECK_ENDIAN__
 LIBUUID = $(shell $(LD) -o /dev/null -luuid >/dev/null 2>&1; echo $$?)
 LIBHUGETLBFS = $(shell $(LD) -o /dev/null -lhugetlbfs >/dev/null 2>&1; echo $$?)
 HAVE_SYSTEMD = $(shell pkg-config --exists libsystemd  --atleast-version=242; echo $$?)
+HAVE_LIBUDEV = $(shell pkg-config --exists libudev; echo $$?)
 NVME = nvme
 INSTALL ?= install
 DESTDIR =
@@ -32,6 +33,11 @@ endif
 
 INC=-Iutil
 
+ifeq ($(HAVE_LIBUDEV),0)
+	override LDFLAGS += -ludev
+	override CFLAGS += -DHAVE_LIBUDEV
+endif
+
 ifeq ($(HAVE_SYSTEMD),0)
 	override LDFLAGS += -lsystemd
 	override CFLAGS += -DHAVE_SYSTEMD
@@ -62,6 +68,10 @@ OBJS := nvme-print.o nvme-ioctl.o nvme-rpmb.o \
 	nvme-lightnvm.o fabrics.o nvme-models.o plugin.o \
 	nvme-status.o nvme-filters.o nvme-topology.o
 
+ifeq ($(HAVE_LIBUDEV),0)
+        OBJS += monitor.o
+endif
+
 UTIL_OBJS := util/argconfig.o util/suffix.o util/json.o util/parser.o util/cleanup.o util/log.o
 EVENT_OBJS := event/event.o event/timeout.o event/ts-util.o
 
diff --git a/monitor.c b/monitor.c
new file mode 100644
index 0000000..32f53a3
--- /dev/null
+++ b/monitor.c
@@ -0,0 +1,246 @@
+/*
+ * Copyright (C) 2021 SUSE LLC
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This file implements a simple monitor for NVMe-related uevents.
+ */
+
+#include <stddef.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <errno.h>
+#include <libudev.h>
+#include <signal.h>
+#include <time.h>
+#include <syslog.h>
+#include <sys/epoll.h>
+
+#include "nvme-status.h"
+#include "util/argconfig.h"
+#include "util/cleanup.h"
+#include "common.h"
+#include "monitor.h"
+#define LOG_FUNCNAME 1
+#include "util/log.h"
+#include "event/event.h"
+
+static struct monitor_config {
+	bool autoconnect;
+} mon_cfg = {
+	.autoconnect = true,
+};
+
+static struct dispatcher *mon_dsp;
+
+static DEFINE_CLEANUP_FUNC(cleanup_monitorp, struct udev_monitor *, udev_monitor_unref);
+
+static int create_udev_monitor(struct udev *udev, struct udev_monitor **pmon)
+{
+	struct udev_monitor *mon __cleanup__(cleanup_monitorp) = NULL;
+	int ret;
+	bool use_udev;
+	static const char *const monitor_name[] = {
+		[false] = "kernel",
+		[true]  = "udev",
+	};
+
+	/* Check if udevd is running, same test that libudev uses */
+	use_udev = access("/run/udev/control", F_OK) >= 0;
+	msg(LOG_DEBUG, "using %s monitor for uevents\n", monitor_name[use_udev]);
+
+	mon = udev_monitor_new_from_netlink(udev, monitor_name[use_udev]);
+	if (!mon)
+		return errno ? -errno : -ENOMEM;
+
+	/* Add match for NVMe controller devices */
+	ret = udev_monitor_filter_add_match_subsystem_devtype(mon, "nvme", NULL);
+	/* Add match for fc_udev_device */
+	ret = udev_monitor_filter_add_match_subsystem_devtype(mon, "fc", NULL);
+
+	/*
+	 * If we use the "udev" monitor, the kernel filters out the interesting
+	 * uevents for us using BPF. A single event is normally well below 1kB,
+	 * so 1MiB is sufficient for queueing more than 1000 uevents, which
+	 * should be plenty for just nvme.
+	 *
+	 * For "kernel" monitors, the filtering is done by libudev in user space,
+	 * thus every device is received in the first place, and a larger
+	 * receive buffer is needed. Use the same value as udevd.
+	 */
+	udev_monitor_set_receive_buffer_size(mon, (use_udev ? 1 : 128) * 1024 * 1024);
+	ret = udev_monitor_enable_receiving(mon);
+	if (ret < 0)
+		return ret;
+	*pmon = mon;
+	mon = NULL;
+	return 0;
+}
+
+static sig_atomic_t must_exit;
+
+static void monitor_int_handler(int sig)
+{
+	must_exit = 1;
+}
+
+static int monitor_init_signals(sigset_t *wait_mask)
+{
+	sigset_t mask;
+	struct sigaction sa = { .sa_handler = monitor_int_handler, };
+
+	/*
+	 * Block all signals. They will be unblocked when we wait
+	 * for events.
+	 */
+	sigfillset(&mask);
+	if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
+		return -errno;
+	if (sigaction(SIGTERM, &sa, NULL) == -1)
+		return -errno;
+	if (sigaction(SIGINT, &sa, NULL) == -1)
+		return -errno;
+
+	/* signal mask to be used in epoll_pwait() */
+	sigfillset(wait_mask);
+	sigdelset(wait_mask, SIGTERM);
+	sigdelset(wait_mask, SIGINT);
+
+	return 0;
+}
+
+static void monitor_handle_udevice(struct udev_device *ud)
+{
+	msg(LOG_INFO, "uevent: %s %s\n",
+		udev_device_get_action(ud),
+		udev_device_get_sysname(ud));
+}
+
+struct udev_monitor_event {
+	struct event e;
+	struct udev_monitor *monitor;
+};
+
+static int monitor_handle_uevents(struct event *ev,
+				  uint32_t __attribute__((unused)) ep_events)
+{
+	struct udev_monitor_event *udev_event =
+		container_of(ev, struct udev_monitor_event, e);
+	struct udev_monitor *monitor = udev_event->monitor;
+	struct udev_device *ud;
+
+	for (ud = udev_monitor_receive_device(monitor);
+	     ud;
+	     ud = udev_monitor_receive_device(monitor)) {
+		monitor_handle_udevice(ud);
+		udev_device_unref(ud);
+	}
+	return EVENTCB_CONTINUE;
+}
+
+static int monitor_parse_opts(const char *desc, int argc, char **argv)
+{
+	bool quiet = false;
+	bool verbose = false;
+	bool debug = false;
+	bool noauto = false;
+	int ret;
+	OPT_ARGS(opts) = {
+		OPT_FLAG("no-connect",     'N', &noauto,              "dry run, do not autoconnect to discovered controllers"),
+		OPT_FLAG("silent",         'S', &quiet,               "log level: silent"),
+		OPT_FLAG("verbose",        'v', &verbose,             "log level: verbose"),
+		OPT_FLAG("debug",          'D', &debug,               "log level: debug"),
+		OPT_FLAG("timestamps",     't', &log_timestamp,       "print log timestamps"),
+		OPT_END()
+	};
+
+	ret = argconfig_parse(argc, argv, desc, opts);
+	if (ret)
+		return ret;
+	if (quiet)
+		log_level = LOG_WARNING;
+	if (verbose)
+		log_level = LOG_INFO;
+	if (debug)
+		log_level = LOG_DEBUG;
+	if (noauto)
+		mon_cfg.autoconnect = false;
+
+	return ret;
+}
+
+static DEFINE_CLEANUP_FUNC(cleanup_udevp, struct udev *, udev_unref);
+
+static void cleanup_udev_event(struct event *evt)
+{
+	struct udev_monitor_event *ue;
+
+	ue = container_of(evt, struct udev_monitor_event, e);
+	if (ue->monitor)
+		ue->monitor = udev_monitor_unref(ue->monitor);
+}
+
+int aen_monitor(const char *desc, int argc, char **argv)
+{
+	int ret;
+	struct udev *udev __cleanup__(cleanup_udevp) = NULL;
+	struct udev_monitor *monitor __cleanup__(cleanup_monitorp) = NULL;
+	struct udev_monitor_event udev_event = { .e.fd = -1, };
+	sigset_t wait_mask;
+
+	ret = monitor_parse_opts(desc, argc, argv);
+	if (ret)
+		goto out;
+
+	ret = monitor_init_signals(&wait_mask);
+	if (ret != 0) {
+		msg(LOG_ERR, "monitor: failed to initialize signals: %m\n");
+		goto out;
+	}
+
+	mon_dsp = new_dispatcher(CLOCK_REALTIME);
+	if (!mon_dsp) {
+		ret = errno ? -errno : -EIO;
+		goto out;
+	}
+
+	udev = udev_new();
+	if (!udev) {
+		msg(LOG_ERR, "failed to create udev object: %m\n");
+		ret = errno ? -errno : -ENOMEM;
+		goto out;
+	}
+
+	ret = create_udev_monitor(udev, &monitor);
+	if (ret != 0)
+		goto out;
+
+	udev_event.e = EVENT_ON_STACK(monitor_handle_uevents,
+				      udev_monitor_get_fd(monitor), EPOLLIN);
+	if (udev_event.e.fd == -1)
+		goto out;
+	udev_event.e.cleanup = cleanup_udev_event;
+	udev_event.monitor = monitor;
+	monitor = NULL;
+
+	if ((ret = event_add(mon_dsp, &udev_event.e)) != 0) {
+		msg(LOG_ERR, "failed to register udev monitor event: %s\n",
+		    strerror(-ret));
+		goto out;
+	}
+
+	ret = event_loop(mon_dsp, &wait_mask, NULL);
+
+out:
+	free_dispatcher(mon_dsp);
+	return nvme_status_to_errno(ret, true);
+}
diff --git a/monitor.h b/monitor.h
new file mode 100644
index 0000000..e79d3a6
--- /dev/null
+++ b/monitor.h
@@ -0,0 +1,6 @@
+#ifndef _MONITOR_H
+#define _MONITOR_H
+
+extern int aen_monitor(const char *desc, int argc, char **argv);
+
+#endif
diff --git a/nvme-builtin.h b/nvme-builtin.h
index 296afd6..5be7827 100644
--- a/nvme-builtin.h
+++ b/nvme-builtin.h
@@ -83,6 +83,7 @@ COMMAND_LIST(
 	ENTRY("dir-send", "Submit a Directive Send command, return results", dir_send)
 	ENTRY("virt-mgmt", "Manage Flexible Resources between Primary and Secondary Controller ", virtual_mgmt)
 	ENTRY("rpmb", "Replay Protection Memory Block commands", rpmb_cmd)
+	ENTRY("monitor", "Monitor NVMeoF AEN events", monitor_cmd)
 );
 
 #endif
diff --git a/nvme.c b/nvme.c
index 9064e83..7beaeb8 100644
--- a/nvme.c
+++ b/nvme.c
@@ -57,6 +57,7 @@
 
 #include "argconfig.h"
 #include "fabrics.h"
+#include "monitor.h"
 
 #define CREATE_CMD
 #include "nvme-builtin.h"
@@ -5561,6 +5562,18 @@ static int disconnect_all_cmd(int argc, char **argv, struct command *command, st
 	return fabrics_disconnect_all(desc, argc, argv);
 }
 
+static int monitor_cmd(int argc, char **argv, struct command *command, struct plugin *plugin)
+{
+#ifdef HAVE_LIBUDEV
+	const char *desc = "Monitor NVMeoF AEN events";
+
+	return aen_monitor(desc, argc, argv);
+#else
+	fprintf(stderr, "nvme-cli built without libudev doesn't support the \"monitor\" subcommand\n");
+	return EOPNOTSUPP;
+#endif
+}
+
 void register_extension(struct plugin *plugin)
 {
 	plugin->parent = &nvme;
-- 
2.29.2



* [PATCH v2 04/16] monitor: implement uevent handling
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (2 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 03/16] monitor: add basic "nvme monitor" functionality mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 05/16] conn-db: add simple connection registry mwilck
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

This patch implements handling of events received via NVMe-FC
(fc_udev_device, detection of FC remote ports with NVMe support)
and AEN events from persistent discovery controller connections.

For actual discovery, we fork and basically run a "nvme connect-all"
process for the newly detected discovery controller.
The reason for forking discovery tasks is twofold: Firstly, we'd
otherwise be forced to make all discovery connections sequentially,
which would be slow, as connecting controllers can block on
the order of seconds even in successful cases.

Secondly, this allows us to use some global variables like
fabrics_cfg and tracked_ctrls in the discovery code path. Without forking,
we'd have to re-write much more code in fabrics.c. In general,
the alternative to forking would be creating threads, but
that would require a large rewrite of our code base.

A single-threaded server that forks off the actual discovery makes
most sense at this point.
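
The fork-and-reap pattern described above can be sketched as follows.
This is a simplified, standalone illustration under stated assumptions,
not the code from this patch: spawn_task(), reap_children() and
fake_discovery() are hypothetical names, and fake_discovery() merely
stands in for the real discovery code by returning 0 or a negative
errno value.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the discovery task: returns 0 or a negative errno. */
static int fake_discovery(void *arg)
{
	return *(const int *)arg;
}

/*
 * Run task() in a child process. The parent returns immediately with
 * the child's pid (or -1 if fork() failed). The child never returns;
 * it exits with the negated return code, truncated to 8 bits.
 */
static pid_t spawn_task(int (*task)(void *), void *arg)
{
	pid_t pid = fork();

	if (pid != 0)
		return pid;
	_exit(-task(arg) & 0xff);
}

/*
 * Reap all children that have already exited, without blocking.
 * Returns the number of children reaped.
 */
static int reap_children(void)
{
	int n = 0, wstatus;

	while (waitpid(-1, &wstatus, WNOHANG) > 0)
		n++;
	return n;
}
```

In a single-threaded server, reap_children() would typically be called
from the main loop after a SIGCHLD has been noticed, so that several
discovery tasks can run concurrently without leaving zombies behind.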

All options known from "nvme connect-all" can be passed to the discovery
processes as usual, with the exception of the "persistent" option, which
is always enabled: The monitor must try to create persistent discovery
connections in order to monitor them.
---
 monitor.c | 282 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 277 insertions(+), 5 deletions(-)

diff --git a/monitor.c b/monitor.c
index 32f53a3..f544319 100644
--- a/monitor.c
+++ b/monitor.c
@@ -17,18 +17,24 @@
 
 #include <stddef.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <unistd.h>
 #include <errno.h>
 #include <libudev.h>
 #include <signal.h>
 #include <time.h>
 #include <syslog.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
 #include <sys/epoll.h>
 
 #include "nvme-status.h"
+#include "nvme.h"
 #include "util/argconfig.h"
 #include "util/cleanup.h"
 #include "common.h"
+#include "fabrics.h"
 #include "monitor.h"
 #define LOG_FUNCNAME 1
 #include "util/log.h"
@@ -87,12 +93,19 @@ static int create_udev_monitor(struct udev *udev, struct udev_monitor **pmon)
 }
 
 static sig_atomic_t must_exit;
+static sig_atomic_t got_sigchld;
+static sigset_t orig_sigmask;
 
 static void monitor_int_handler(int sig)
 {
 	must_exit = 1;
 }
 
+static void monitor_chld_handler(int sig)
+{
+	got_sigchld = 1;
+}
+
 static int monitor_init_signals(sigset_t *wait_mask)
 {
 	sigset_t mask;
@@ -103,26 +116,228 @@ static int monitor_init_signals(sigset_t *wait_mask)
 	 * for events.
 	 */
 	sigfillset(&mask);
-	if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
+	if (sigprocmask(SIG_BLOCK, &mask, &orig_sigmask) == -1)
 		return -errno;
 	if (sigaction(SIGTERM, &sa, NULL) == -1)
 		return -errno;
 	if (sigaction(SIGINT, &sa, NULL) == -1)
 		return -errno;
 
+	sa.sa_handler = monitor_chld_handler;
+	if (sigaction(SIGCHLD, &sa, NULL) == -1)
+		return -errno;
+
 	/* signal mask to be used in epoll_pwait() */
 	sigfillset(wait_mask);
 	sigdelset(wait_mask, SIGTERM);
 	sigdelset(wait_mask, SIGINT);
+	sigdelset(wait_mask, SIGCHLD);
 
 	return 0;
 }
 
+static int child_reset_signals(void)
+{
+	int err = 0;
+	struct sigaction sa = { .sa_handler = SIG_DFL, };
+
+	if (sigaction(SIGTERM, &sa, NULL) == -1)
+		err = errno;
+	if (sigaction(SIGINT, &sa, NULL) == -1 && !err)
+		err = errno;
+	if (sigaction(SIGCHLD, &sa, NULL) == -1 && !err)
+		err = errno;
+
+	if (sigprocmask(SIG_SETMASK, &orig_sigmask, NULL) == -1 && !err)
+		err = errno;
+
+	if (err)
+		msg(LOG_ERR, "error resetting signal handlers and mask\n");
+	return -err;
+}
+
+static int monitor_get_fc_uev_props(struct udev_device *ud,
+				    char *traddr, size_t tra_sz,
+				    char *host_traddr, size_t htra_sz)
+{
+	const char *sysname = udev_device_get_sysname(ud);
+	const char *tra = NULL, *host_tra = NULL;
+	bool fc_event_seen = false;
+	struct udev_list_entry *entry;
+
+	entry = udev_device_get_properties_list_entry(ud);
+	if (!entry) {
+		msg(LOG_NOTICE, "%s: empty properties list\n", sysname);
+		return -ENOENT;
+	}
+
+	for (; entry; entry = udev_list_entry_get_next(entry)) {
+		const char *name = udev_list_entry_get_name(entry);
+
+		if (!strcmp(name, "FC_EVENT") &&
+		    !strcmp(udev_list_entry_get_value(entry), "nvmediscovery"))
+				fc_event_seen = true;
+		else if (!strcmp(name, "NVMEFC_HOST_TRADDR"))
+			host_tra = udev_list_entry_get_value(entry);
+		else if (!strcmp(name, "NVMEFC_TRADDR"))
+			tra = udev_list_entry_get_value(entry);
+	}
+	if (!fc_event_seen) {
+		msg(LOG_DEBUG, "%s: FC_EVENT property missing or unsupported\n",
+		    sysname);
+		return -EINVAL;
+	}
+	if (!tra || !host_tra) {
+		msg(LOG_WARNING, "%s: transport properties missing\n", sysname);
+		return -EINVAL;
+	}
+
+	if (!memccpy(traddr, tra, '\0', tra_sz) ||
+	    !memccpy(host_traddr, host_tra, '\0', htra_sz)) {
+		msg(LOG_ERR, "traddr (%zu) or host_traddr (%zu) overflow\n",
+		    strlen(tra), strlen(host_tra));
+		return -ENAMETOOLONG;
+	}
+
+	return 0;
+}
+
+static int monitor_discovery(char *transport, char *traddr, char *trsvcid,
+			     char *host_traddr)
+{
+	char argstr[BUF_SIZE];
+	pid_t pid;
+	int rc;
+
+	pid = fork();
+	if (pid == -1) {
+		msg(LOG_ERR, "failed to fork discovery task: %m\n");
+		return -errno;
+	} else if (pid > 0) {
+		msg(LOG_DEBUG, "started discovery task %ld\n", (long)pid);
+		return 0;
+	}
+
+	child_reset_signals();
+	free_dispatcher(mon_dsp);
+
+	msg(LOG_NOTICE, "starting discovery\n");
+	fabrics_cfg.nqn = NVME_DISC_SUBSYS_NAME;
+	fabrics_cfg.transport = transport;
+	fabrics_cfg.traddr = traddr;
+	fabrics_cfg.trsvcid = trsvcid;
+	fabrics_cfg.host_traddr = host_traddr;
+	/* Without the following, the kernel returns EINVAL */
+	fabrics_cfg.tos = -1;
+	fabrics_cfg.persistent = true;
+
+	rc = build_options(argstr, sizeof(argstr), true);
+	msg(LOG_DEBUG, "%s\n", argstr);
+	rc = do_discover(argstr, mon_cfg.autoconnect, NORMAL);
+
+	exit(-rc);
+	/* not reached */
+	return rc;
+}
+
+static void monitor_handle_fc_uev(struct udev_device *ud)
+{
+	const char *action = udev_device_get_action(ud);
+	const char *sysname = udev_device_get_sysname(ud);
+	char traddr[NVMF_TRADDR_SIZE], host_traddr[NVMF_TRADDR_SIZE];
+
+	if (strcmp(action, "change") || strcmp(sysname, "fc_udev_device"))
+		return;
+
+	if (monitor_get_fc_uev_props(ud, traddr, sizeof(traddr),
+				     host_traddr, sizeof(host_traddr)))
+		return;
+
+	monitor_discovery("fc", traddr, NULL, host_traddr);
+}
+
+static int monitor_get_nvme_uev_props(struct udev_device *ud,
+				      char *transport, size_t tr_sz,
+				      char *traddr, size_t tra_sz,
+				      char *trsvcid, size_t trs_sz,
+				      char *host_traddr, size_t htra_sz)
+{
+	const char *sysname = udev_device_get_sysname(ud);
+	bool aen_disc = false;
+	struct udev_list_entry *entry;
+
+	entry = udev_device_get_properties_list_entry(ud);
+	if (!entry) {
+		msg(LOG_NOTICE, "%s: empty properties list\n", sysname);
+		return -ENOENT;
+	}
+
+	*transport = *traddr = *trsvcid = *host_traddr = '\0';
+	for (; entry; entry = udev_list_entry_get_next(entry)) {
+		const char *name = udev_list_entry_get_name(entry);
+
+		if (!strcmp(name, "NVME_AEN") &&
+		    !strcmp(udev_list_entry_get_value(entry), "0x70f002"))
+				aen_disc = true;
+		else if (!strcmp(name, "NVME_TRTYPE"))
+			memccpy(transport, udev_list_entry_get_value(entry),
+				'\0', tr_sz);
+		else if (!strcmp(name, "NVME_TRADDR"))
+			memccpy(traddr, udev_list_entry_get_value(entry),
+				'\0', tra_sz);
+		else if (!strcmp(name, "NVME_TRSVCID"))
+			memccpy(trsvcid, udev_list_entry_get_value(entry),
+				'\0', trs_sz);
+		else if (!strcmp(name, "NVME_HOST_TRADDR"))
+			memccpy(host_traddr, udev_list_entry_get_value(entry),
+				'\0', htra_sz);
+	}
+	if (!aen_disc) {
+		msg(LOG_DEBUG, "%s: not a \"discovery log changed\" AEN, ignoring event\n",
+		    sysname);
+		return -EINVAL;
+	}
+
+	if (!*traddr || !*transport) {
+		msg(LOG_WARNING, "%s: transport properties missing\n", sysname);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void monitor_handle_nvme_uev(struct udev_device *ud)
+{
+	char traddr[NVMF_TRADDR_SIZE], host_traddr[NVMF_TRADDR_SIZE];
+	char trsvcid[NVMF_TRSVCID_SIZE], transport[5];
+
+	if (strcmp(udev_device_get_action(ud), "change"))
+		return;
+
+	if (monitor_get_nvme_uev_props(ud, transport, sizeof(transport),
+				       traddr, sizeof(traddr),
+				       trsvcid, sizeof(trsvcid),
+				       host_traddr, sizeof(host_traddr)))
+		return;
+
+	monitor_discovery(transport, traddr,
+			  strcmp(trsvcid, "none") ? trsvcid : NULL, host_traddr);
+}
+
 static void monitor_handle_udevice(struct udev_device *ud)
 {
-	msg(LOG_INFO, "uevent: %s %s\n",
-		udev_device_get_action(ud),
-		udev_device_get_sysname(ud));
+	const char *subsys  = udev_device_get_subsystem(ud);
+
+	if (log_level >= LOG_INFO) {
+		const char *action = udev_device_get_action(ud);
+		const char *syspath = udev_device_get_syspath(ud);
+
+		msg(LOG_INFO, "%s %s\n", action, syspath);
+	}
+	if (!strcmp(subsys, "fc"))
+		monitor_handle_fc_uev(ud);
+	else if (!strcmp(subsys, "nvme"))
+		monitor_handle_nvme_uev(ud);
 }
 
 struct udev_monitor_event {
@@ -147,6 +362,49 @@ static int monitor_handle_uevents(struct event *ev,
 	return EVENTCB_CONTINUE;
 }
 
+static int handle_epoll_err(int errcode)
+{
+	if (errcode != -EINTR)
+		return errcode;
+	else if (must_exit) {
+		msg(LOG_NOTICE, "monitor: exit signal received\n");
+		return ELOOP_QUIT;
+	} else if (!got_sigchld) {
+		msg(LOG_WARNING, "monitor: unexpected interruption, ignoring\n");
+		return ELOOP_CONTINUE;
+	}
+
+	got_sigchld = 0;
+	while (true) {
+		int wstatus;
+		pid_t pid;
+
+		pid = waitpid(-1, &wstatus, WNOHANG);
+		switch(pid) {
+		case -1:
+			if (errno != ECHILD)
+				msg(LOG_ERR, "error in waitpid: %m\n");
+			goto out;
+		case 0:
+			goto out;
+		default:
+			break;
+		}
+		if (!WIFEXITED(wstatus))
+			msg(LOG_WARNING, "child %ld didn't exit normally\n",
+			    (long)pid);
+		else if (WEXITSTATUS(wstatus) != 0)
+			msg(LOG_NOTICE, "child %ld exited with status \"%s\"\n",
+			    (long)pid, strerror(WEXITSTATUS(wstatus)));
+		else
+			msg(LOG_DEBUG, "child %ld exited normally\n", (long)pid);
+	}
+
+out:
+	/* tell event_loop() to continue */
+	return ELOOP_CONTINUE;
+}
+
 static int monitor_parse_opts(const char *desc, int argc, char **argv)
 {
 	bool quiet = false;
@@ -156,6 +414,19 @@ static int monitor_parse_opts(const char *desc, int argc, char **argv)
 	int ret;
 	OPT_ARGS(opts) = {
 		OPT_FLAG("no-connect",     'N', &noauto,              "dry run, do not autoconnect to discovered controllers"),
+		OPT_LIST("hostnqn",        'q', &fabrics_cfg.hostnqn,         "user-defined hostnqn (if default not used)"),
+		OPT_LIST("hostid",         'I', &fabrics_cfg.hostid,          "user-defined hostid (if default not used)"),
+		OPT_INT("keep-alive-tmo",  'k', &fabrics_cfg.keep_alive_tmo,  "keep alive timeout period in seconds"),
+		OPT_INT("reconnect-delay", 'c', &fabrics_cfg.reconnect_delay, "reconnect timeout period in seconds"),
+		OPT_INT("ctrl-loss-tmo",   'l', &fabrics_cfg.ctrl_loss_tmo,   "controller loss timeout period in seconds"),
+		OPT_INT("tos",             'T', &fabrics_cfg.tos,             "type of service"),
+		OPT_FLAG("hdr_digest",     'g', &fabrics_cfg.hdr_digest,      "enable transport protocol header digest (TCP transport)"),
+		OPT_FLAG("data_digest",    'G', &fabrics_cfg.data_digest,     "enable transport protocol data digest (TCP transport)"),
+		OPT_INT("nr-io-queues",    'i', &fabrics_cfg.nr_io_queues,    "number of io queues to use (default is core count)"),
+		OPT_INT("nr-write-queues", 'W', &fabrics_cfg.nr_write_queues, "number of write queues to use (default 0)"),
+		OPT_INT("nr-poll-queues",  'P', &fabrics_cfg.nr_poll_queues,  "number of poll queues to use (default 0)"),
+		OPT_INT("queue-size",      'Q', &fabrics_cfg.queue_size,      "number of io queue elements to use (default 128)"),
+		OPT_FLAG("matching",       'm', &fabrics_cfg.matching_only,   "connect only records matching the traddr"),
 		OPT_FLAG("silent",         'S', &quiet,               "log level: silent"),
 		OPT_FLAG("verbose",        'v', &verbose,             "log level: verbose"),
 		OPT_FLAG("debug",          'D', &debug,               "log level: debug"),
@@ -163,6 +434,7 @@ static int monitor_parse_opts(const char *desc, int argc, char **argv)
 		OPT_END()
 	};
 
+	log_pid = true;
 	ret = argconfig_parse(argc, argv, desc, opts);
 	if (ret)
 		return ret;
@@ -238,7 +510,7 @@ int aen_monitor(const char *desc, int argc, char **argv)
 		goto out;
 	}
 
-	ret = event_loop(mon_dsp, &wait_mask, NULL);
+	ret = event_loop(mon_dsp, &wait_mask, handle_epoll_err);
 
 out:
 	free_dispatcher(mon_dsp);
-- 
2.29.2



* [PATCH v2 05/16] conn-db: add simple connection registry
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (3 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 04/16] monitor: implement uevent handling mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 06/16] monitor: monitor_discovery(): try to reuse existing controllers mwilck
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

The monitor works best if it maintains a discovery controller connection
to every transport address that provides a discovery subsystem.

While controllers are easily tracked in sysfs, addresses ("connections"),
i.e. (transport, traddr, trsvcid, host_traddr) tuples, are not. Create
a simple registry that tracks the state of "connections" and their
associated discovery controllers.
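
How two such tuples can be compared is sketched below. This is a
minimal illustration, not the conn-db implementation: struct conn_key,
field_eq() and conn_key_eq() are hypothetical names. As in the
registry, a NULL field and an empty string are treated as equivalent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key type holding the four tuple members. */
struct conn_key {
	const char *transport;
	const char *traddr;
	const char *trsvcid;
	const char *host_traddr;
};

/* Compare two fields, treating NULL like the empty string. */
static bool field_eq(const char *a, const char *b)
{
	if (!a)
		a = "";
	if (!b)
		b = "";
	return strcmp(a, b) == 0;
}

/* Two connections are the same if all four fields match. */
static bool conn_key_eq(const struct conn_key *x, const struct conn_key *y)
{
	return field_eq(x->transport, y->transport) &&
	       field_eq(x->traddr, y->traddr) &&
	       field_eq(x->trsvcid, y->trsvcid) &&
	       field_eq(x->host_traddr, y->host_traddr);
}
```

With this comparison, ("tcp", "192.168.1.10", "4420", NULL) and
("tcp", "192.168.1.10", "4420", "") refer to the same connection,
while a different trsvcid yields a distinct one.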

A detailed description of the API is provided in the header file conn-db.h.

This patch also adds well-known "list.h" macros from the kernel. The file
was taken from multipath-tools, which got it from the kernel long ago.
---
 Makefile  |   2 +-
 conn-db.c | 424 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 conn-db.h | 170 ++++++++++++++++++++++
 list.h    | 349 ++++++++++++++++++++++++++++++++++++++++++++
 monitor.c |  49 ++++++-
 5 files changed, 987 insertions(+), 7 deletions(-)
 create mode 100644 conn-db.c
 create mode 100644 conn-db.h
 create mode 100644 list.h

diff --git a/Makefile b/Makefile
index 33441b1..7c7b3b9 100644
--- a/Makefile
+++ b/Makefile
@@ -69,7 +69,7 @@ OBJS := nvme-print.o nvme-ioctl.o nvme-rpmb.o \
 	nvme-status.o nvme-filters.o nvme-topology.o
 
 ifeq ($(HAVE_LIBUDEV),0)
-        OBJS += monitor.o
+	OBJS += monitor.o conn-db.o
 endif
 
 UTIL_OBJS := util/argconfig.o util/suffix.o util/json.o util/parser.o util/cleanup.o util/log.o
diff --git a/conn-db.c b/conn-db.c
new file mode 100644
index 0000000..cfdc208
--- /dev/null
+++ b/conn-db.c
@@ -0,0 +1,424 @@
+/*
+ * Copyright (C) 2021 SUSE LLC
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This file implements a simple registry for NVMe connections, i.e.
+ * (transport type, host_traddr, traddr, trsvcid) tuples.
+ */
+
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <syslog.h>
+#include <time.h>
+
+#include "common.h"
+#include "util/cleanup.h"
+#include "list.h"
+#include "nvme.h"
+#include "fabrics.h"
+#include "conn-db.h"
+
+#define LOG_FUNCNAME 1
+#include "util/log.h"
+
+struct conn_int {
+	struct nvme_connection c;
+	struct list_head lst;
+};
+
+#define conn2internal(co) container_of(co, struct conn_int, c)
+
+static LIST_HEAD(connections);
+
+static const char * const _status_str[] = {
+	[CS_NEW] = "new",
+	[CS_DISC_RUNNING] = "discovery-running",
+	[CS_ONLINE] = "online",
+	[CS_FAILED] = "failed",
+};
+
+const char *conn_status_str(int status)
+{
+	return arg_str(_status_str, ARRAY_SIZE(_status_str), status);
+}
+
+void __attribute__((format(printf, 4, 5)))
+_conn_msg(int lvl, const char *func, const struct nvme_connection *c,
+	  const char *fmt, ...)
+{
+	char *fbuf __cleanup__(cleanup_charp) = NULL;
+	char *cbuf __cleanup__(cleanup_charp) = NULL;
+	char *mbuf __cleanup__(cleanup_charp) = NULL;
+	va_list ap;
+
+	if (asprintf(&cbuf, "[%s]%s->%s(%s): ",
+		     c->transport,
+		     c->host_traddr ? c->host_traddr : "localhost",
+		     c->traddr ? c->traddr : "<no traddr>",
+		     c->trsvcid ? c->trsvcid : "") == -1) {
+		cbuf = NULL;
+		return;
+	}
+
+	va_start(ap, fmt);
+	if (vasprintf(&mbuf, fmt, ap) == -1)
+		mbuf = NULL;
+	va_end(ap);
+	__msg(lvl, func, "%s%s\n", cbuf, mbuf);
+}
+
+static void conn_free(struct conn_int *ci)
+{
+	if (!ci)
+		return;
+	if (ci->c.traddr)
+		free(ci->c.traddr);
+	if (ci->c.trsvcid)
+		free(ci->c.trsvcid);
+	if (ci->c.host_traddr)
+		free(ci->c.host_traddr);
+	free(ci);
+}
+
+static int conn_del(struct conn_int *ci)
+{
+	if (!ci)
+		return -ENOENT;
+	if (list_empty(&ci->lst))
+		return -EINVAL;
+	conn_msg(LOG_DEBUG, &ci->c, "forgetting connection\n");
+	list_del(&ci->lst);
+	conn_free(ci);
+	return 0;
+}
+
+static int get_trtype(const char *transport)
+{
+	if (!transport)
+		return -EINVAL;
+	if (!strcmp(transport, trtypes[NVMF_TRTYPE_RDMA]))
+		return NVMF_TRTYPE_RDMA;
+	else if (!strcmp(transport, trtypes[NVMF_TRTYPE_FC]))
+		return NVMF_TRTYPE_FC;
+	else if (!strcmp(transport, trtypes[NVMF_TRTYPE_TCP]))
+		return NVMF_TRTYPE_TCP;
+	else if (!strcmp(transport, trtypes[NVMF_TRTYPE_LOOP]))
+		return NVMF_TRTYPE_LOOP;
+	else
+		return -ENOENT;
+}
+
+static bool transport_params_ok(const char *transport, const char *traddr,
+				const char *host_traddr)
+{
+	int trtype = get_trtype(transport);
+
+	/* same as "required_opts" in the kernel code */
+	switch(trtype) {
+	case NVMF_TRTYPE_FC:
+		return traddr && *traddr && host_traddr && *host_traddr;
+	case NVMF_TRTYPE_RDMA:
+	case NVMF_TRTYPE_TCP:
+		return traddr && *traddr;
+	case NVMF_TRTYPE_LOOP:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static bool prop_matches(const char *p1, const char *p2, size_t len)
+{
+	/* treat NULL and empty string as equivalent */
+	if ((!p1 && !p2) || (!p1 && !*p2) || (!p2 && !*p1))
+		return true;
+	if (p1 && p2 && !strncmp(p1, p2, len))
+		return true;
+	return false;
+}
+
+bool conndb_matches(const char *transport, const char *traddr,
+		    const char *trsvcid, const char *host_traddr,
+		    const struct nvme_connection *co)
+{
+	if (!co)
+		return false;
+	if (!transport_params_ok(transport, traddr, host_traddr))
+		return false;
+	if (strcmp(transport, co->transport))
+		return false;
+	if (!prop_matches(traddr, co->traddr, NVMF_TRADDR_SIZE))
+		return false;
+	if (!prop_matches(trsvcid, co->trsvcid, NVMF_TRSVCID_SIZE))
+		return false;
+	if (!prop_matches(host_traddr, co->host_traddr, NVMF_TRADDR_SIZE))
+		return false;
+	return true;
+}
+
+static struct conn_int *conn_find(const char *transport, const char *traddr,
+				  const char *trsvcid, const char *host_traddr)
+{
+	struct conn_int *ci;
+
+	if (!transport_params_ok(transport, traddr, host_traddr))
+		return NULL;
+	list_for_each_entry(ci, &connections, lst) {
+		if (conndb_matches(transport, traddr, trsvcid, host_traddr, &ci->c))
+			return ci;
+	}
+	return NULL;
+}
+
+static DEFINE_CLEANUP_FUNC(conn_free_p, struct conn_int *, conn_free);
+
+static int _conn_add(const char *transport, const char *traddr,
+		     const char *trsvcid, const char *host_traddr,
+		     struct conn_int **new_ci)
+{
+	struct conn_int *ci __cleanup__(conn_free_p) = NULL;
+
+	if (!transport_params_ok(transport, traddr, host_traddr)) {
+		msg(LOG_ERR, "invalid %s transport parameters: traddr=%s host_traddr=%s\n",
+		    transport, traddr, host_traddr);
+		return -EINVAL;
+	}
+
+	if (!(ci = calloc(1, sizeof(*ci))) ||
+	    (traddr && *traddr &&
+	     !(ci->c.traddr = strndup(traddr, NVMF_TRADDR_SIZE))) ||
+	    (host_traddr && *host_traddr &&
+	     !(ci->c.host_traddr = strndup(host_traddr, NVMF_TRADDR_SIZE))) ||
+	    (trsvcid && *trsvcid &&
+	     !(ci->c.trsvcid = strndup(trsvcid, NVMF_TRSVCID_SIZE))))
+		return -ENOMEM;
+	memccpy(ci->c.transport, transport, '\0', sizeof(ci->c.transport));
+	ci->c.status = CS_NEW;
+	ci->c.discovery_instance = -1;
+	list_add(&ci->lst, &connections);
+	*new_ci = ci;
+	ci = NULL;
+	return 0;
+}
+
+static int conn_add(const char *transport, const char *traddr,
+		    const char *trsvcid, const char *host_traddr,
+		    struct conn_int **new_ci)
+{
+	struct conn_int *ci = conn_find(transport, traddr, trsvcid, host_traddr);
+	int rc;
+
+	if (ci) {
+		*new_ci = ci;
+		return -EEXIST;
+	}
+	rc = _conn_add(transport, traddr, trsvcid, host_traddr, new_ci);
+	if (!rc)
+		conn_msg(LOG_DEBUG, &(*new_ci)->c, "added connection\n");
+	else
+		msg(LOG_ERR, "failed to add %s connection\n", transport);
+	return rc;
+}
+
+int conndb_add(const char *transport, const char *traddr,
+	       const char *trsvcid, const char *host_traddr,
+	       struct nvme_connection **new_conn)
+{
+	struct conn_int *ci = NULL;
+	int rc = conn_add(transport, traddr, trsvcid, host_traddr, &ci);
+
+	if (rc != 0 && rc != -EEXIST)
+		return rc;
+	if (new_conn)
+		*new_conn = &ci->c;
+	return rc;
+}
+
+int conndb_add_disc_ctrl(const char *addrstr, struct nvme_connection **new_conn)
+{
+	char *subsysnqn __cleanup__(cleanup_charp) = NULL;
+	char *transport __cleanup__(cleanup_charp) = NULL;
+	char *traddr __cleanup__(cleanup_charp) = NULL;
+	char *trsvcid __cleanup__(cleanup_charp) = NULL;
+	char *host_traddr __cleanup__(cleanup_charp) = NULL;
+
+	subsysnqn = parse_conn_arg(addrstr, ',', "nqn");
+	if (strcmp(subsysnqn, NVME_DISC_SUBSYS_NAME)) {
+		msg(LOG_WARNING, "%s is not a discovery subsystem\n", subsysnqn);
+		return -EINVAL;
+	}
+	transport = parse_conn_arg(addrstr, ',', "transport");
+	traddr = parse_conn_arg(addrstr, ',', "traddr");
+	trsvcid = parse_conn_arg(addrstr, ',', "trsvcid");
+	host_traddr = parse_conn_arg(addrstr, ',', "host_traddr");
+	return conndb_add(transport, traddr, trsvcid, host_traddr, new_conn);
+}
+
+struct nvme_connection *conndb_find(const char *transport, const char *traddr,
+				    const char *trsvcid, const char *host_traddr)
+{
+	struct conn_int *ci;
+
+	ci = conn_find(transport, traddr, trsvcid, host_traddr);
+	if (ci)
+		return &ci->c;
+	else
+		return NULL;
+}
+
+struct nvme_connection *conndb_find_by_pid(pid_t pid)
+{
+	struct conn_int *ci;
+
+	list_for_each_entry(ci, &connections, lst) {
+		if (ci->c.status == CS_DISC_RUNNING &&
+		    ci->c.discovery_task == pid)
+			return &ci->c;
+	}
+	return NULL;
+}
+
+struct nvme_connection *conndb_find_by_ctrl(const char *devname)
+{
+	struct conn_int *ci;
+	int instance;
+
+	instance = ctrl_instance(devname);
+	if (instance < 0)
+		return NULL;
+
+	list_for_each_entry(ci, &connections, lst) {
+		if (ci->c.discovery_instance == instance)
+			return &ci->c;
+	}
+	return NULL;
+}
+
+int conndb_delete(struct nvme_connection *co)
+{
+	if (!co)
+		return -ENOENT;
+	return conn_del(conn2internal(co));
+}
+
+void conndb_free(void)
+{
+	struct conn_int *ci, *next;
+
+	list_for_each_entry_safe(ci, next, &connections, lst)
+		conn_del(ci);
+}
+
+int conndb_init_from_sysfs(void)
+{
+	struct dirent **devices;
+	int i, n, ret = 0;
+	char syspath[PATH_MAX];
+
+	n = scandir(SYS_NVME, &devices, scan_ctrls_filter, alphasort);
+	if (n <= 0)
+		return n;
+
+	for (i = 0; i < n; i++) {
+		int len, rc;
+		struct conn_int *ci;
+		char *transport __cleanup__(cleanup_charp) = NULL;
+		char *address __cleanup__(cleanup_charp) = NULL;
+		char *traddr __cleanup__(cleanup_charp) = NULL;
+		char *trsvcid __cleanup__(cleanup_charp) = NULL;
+		char *host_traddr __cleanup__(cleanup_charp) = NULL;
+		char *subsysnqn __cleanup__(cleanup_charp) = NULL;
+
+		len = snprintf(syspath, sizeof(syspath), SYS_NVME "/%s",
+			       devices[i]->d_name);
+		if (len < 0 || len >= sizeof(syspath))
+			continue;
+
+		transport = nvme_get_ctrl_attr(syspath, "transport");
+		address = nvme_get_ctrl_attr(syspath, "address");
+		if (!transport || !address)
+			continue;
+		traddr = parse_conn_arg(address, ' ', "traddr");
+		trsvcid = parse_conn_arg(address, ' ', "trsvcid");
+		host_traddr = parse_conn_arg(address, ' ', "host_traddr");
+
+		rc = conn_add(transport, traddr, trsvcid, host_traddr, &ci);
+		if (rc != 0 && rc != -EEXIST)
+			continue;
+
+		if (rc == 0)
+			ret++;
+
+		subsysnqn = nvme_get_ctrl_attr(syspath, "subsysnqn");
+		if (subsysnqn && !strcmp(subsysnqn, NVME_DISC_SUBSYS_NAME)) {
+			int instance;
+			char *kato_attr __cleanup__(cleanup_charp) = NULL;
+
+			kato_attr = nvme_get_ctrl_attr(syspath, "kato");
+			if (kato_attr) {
+				char dummy;
+				unsigned int kato;
+				/*
+				 * The kernel supports the "kato" attribute, and
+				 * this controller isn't persistent. Skip it.
+				 */
+				if (sscanf(kato_attr, "%u%c", &kato, &dummy) == 1
+				    && kato == 0)
+					continue;
+			}
+
+			instance = ctrl_instance(devices[i]->d_name);
+			if (instance >= 0) {
+				ci->c.discovery_instance = instance;
+				msg(LOG_DEBUG, "found discovery controller %s\n",
+				    devices[i]->d_name);
+			}
+		}
+	}
+
+	for (i = 0; i < n; i++)
+		free(devices[i]);
+	free(devices);
+
+	return ret;
+}
+
+int conndb_for_each(int (*callback)(struct nvme_connection *co, void *arg),
+		    void *arg)
+{
+	struct conn_int *ci, *next;
+	int ret = 0;
+
+	list_for_each_entry_safe(ci, next, &connections, lst) {
+		int rc = callback(&ci->c, arg);
+
+		if (rc & ~(CD_CB_ERR|CD_CB_DEL|CD_CB_BREAK)) {
+			msg(LOG_ERR,
+			    "invalid return value 0x%x from callback\n", rc);
+			ret = -EINVAL;
+			continue;
+		}
+		if (rc & CD_CB_ERR) {
+			msg(LOG_WARNING, "callback returned error\n");
+			if (!ret)
+				ret = errno ? -errno : -EIO;
+		}
+		if (rc & CD_CB_DEL)
+			conn_del(ci);
+		if (rc & CD_CB_BREAK)
+			break;
+	}
+	return ret;
+}
diff --git a/conn-db.h b/conn-db.h
new file mode 100644
index 0000000..e2a5827
--- /dev/null
+++ b/conn-db.h
@@ -0,0 +1,170 @@
+#ifndef _CONN_DB_H
+#define _CONN_DB_H
+#include "log.h"
+
+struct nvme_connection {
+	char transport[5];
+	char *traddr;
+	char *trsvcid;
+	char *host_traddr;
+
+	int status;
+	int discovery_pending:1;
+	int did_discovery:1;
+	int successful_discovery:1;
+	union {
+		pid_t discovery_task;
+		int discovery_result;
+	};
+	int discovery_instance;
+};
+
+/* connection status */
+enum {
+	CS_NEW = 0,
+	CS_DISC_RUNNING,
+	CS_ONLINE,
+	CS_FAILED,
+	__CS_LAST,
+};
+
+/**
+ * conn_status_str() - return string representation of connection status
+ */
+const char *conn_status_str(int status);
+
+/**
+ * conndb_add() - add a connection with given parameters
+ *
+ * @new_conn: if non-NULL and the function succeeds, will receive a pointer
+ *            to the existing or newly created connection object.
+ *
+ * Looks up the given connection parameters in the db and adds a new connection
+ * unless found. All input parameters except trsvcid must be non-NULL.
+ *
+ * Return: 0 if connection was added, -EEXIST if connection existed in the db
+ *         (this is considered success), or other negative error code in
+ *         the error case.
+ *
+ */
+int conndb_add(const char *transport, const char *traddr,
+	       const char *trsvcid, const char *host_traddr,
+	       struct nvme_connection **new_conn);
+
+/**
+ * conndb_add_disc_ctrl - add connection from kernel parameters
+ *
+ * @addrstr: kernel connect parameters as passed to /dev/nvme-fabrics
+ * @new_conn: see conndb_add()
+ *
+ * Extracts connection parameters from @addrstr and calls conndb_add().
+ *
+ * Return: see conndb_add().
+ */
+int conndb_add_disc_ctrl(const char *addrstr, struct nvme_connection **new_conn);
+
+/**
+ * conndb_find() - lookup a connection with given parameters
+ *
+ * Return: NULL if not found, valid connection object otherwise.
+ */
+struct nvme_connection *conndb_find(const char *transport, const char *traddr,
+				    const char *trsvcid, const char *host_traddr);
+
+
+/**
+ * conndb_find_by_pid() - lookup connection by discovery task pid
+ *
+ * Return: valid connection object if successful, NULL otherwise.
+ */
+struct nvme_connection *conndb_find_by_pid(pid_t pid);
+
+
+/**
+ * conndb_find_by_ctrl() - lookup connection from controller instance
+ *
+ * Return: valid connection object if a connection was found that has
+ * the given device as discovery controller. NULL otherwise.
+ */
+struct nvme_connection *conndb_find_by_ctrl(const char *devname);
+
+enum {
+	CD_CB_OK    = 0,
+	CD_CB_ERR   = (1 << 0),
+	CD_CB_DEL   = (1 << 1),
+	CD_CB_BREAK = (1 << 2),
+};
+
+/**
+ *  conndb_for_each() - run a callback for each connection
+ *
+ * @callback: function to be called
+ * @arg:      user argument passed to callback
+ *
+ * The callback must return a bitmask created from the CD_CB_* enum
+ * values above. CD_CB_ERR signals an error condition in the callback.
+ * CD_CB_DEL causes the connection to be deleted after the callback
+ * returns. CD_CB_BREAK stops the iteration. Returning a value that
+ * is not an OR-ed combination of these values is an error.
+ *
+ * Return: 0 if all callbacks completed successfully.
+ *         A negative error code if some callback failed.
+ */
+int conndb_for_each(int (*callback)(struct nvme_connection *co, void *arg),
+		    void *arg);
+
+/**
+ * conndb_matches - check if connection matches given parameters
+ *
+ * The arguments @transport and @traddr must be non-null and non-empty.
+ * @trsvcid and @host_traddr may be NULL, in which case they match
+ * connections that don't have these attributes set, either.
+ *
+ * Return: true iff the given connection matches the given attributes.
+ */
+bool conndb_matches(const char *transport, const char *traddr,
+		    const char *trsvcid, const char *host_traddr,
+		    const struct nvme_connection *co);
+
+/**
+ * conndb_delete() - remove a given nvme connection object
+ *
+ * Removes the object from the data base and frees it.
+ *
+ * Return: 0 if successful, negative error code otherwise
+ */
+int conndb_delete(struct nvme_connection *co);
+
+/**
+ * conndb_free() - free internal data structures
+ */
+void conndb_free(void);
+
+/**
+ * conndb_init_from_sysfs() - check existing NVMe connections
+ *
+ * Populates the connection db from existing controller devices in sysfs.
+ *
+ * Return: (positive or zero) number of found connections on success.
+ *         Negative error code on failure.
+ */
+int conndb_init_from_sysfs(void);
+
+/**
+ * conn_msg() - print a log message prefixed with the connection parameters
+ * @lvl: standard syslog log level
+ * @c: nvme connection to print information
+ * @fmt: format string
+ * ...: parameters for format
+ */
+void __attribute__((format(printf, 4, 5)))
+_conn_msg(int lvl, const char *func, const struct nvme_connection *c,
+	  const char *fmt, ...);
+
+#define conn_msg(lvl, c, fmt, ...) \
+do {									\
+	if ((lvl) <= MAX_LOGLEVEL)					\
+		_conn_msg(lvl, _log_func, c, fmt, ##__VA_ARGS__);	\
+} while (0)
+
+#endif
diff --git a/list.h b/list.h
new file mode 100644
index 0000000..f87c84f
--- /dev/null
+++ b/list.h
@@ -0,0 +1,349 @@
+/*
+ * Copied from the Linux kernel source tree, version 2.6.0-test1.
+ *
+ * Licensed under the GPL v2 as per the whole kernel source tree.
+ *
+ */
+
+#ifndef _LIST_H
+#define _LIST_H
+
+#include <stddef.h>
+
+/*
+ * These are non-NULL pointers that will result in page faults
+ * under normal circumstances, used to verify that nobody uses
+ * non-initialized list entries.
+ */
+#define LIST_POISON1  ((void *) 0x00100100)
+#define LIST_POISON2  ((void *) 0x00200200)
+
+/*
+ * Simple doubly linked list implementation.
+ *
+ * Some of the internal functions ("__xxx") are useful when
+ * manipulating whole lists rather than single entries, as
+ * sometimes we already know the next/prev entries and we can
+ * generate better code by using them directly rather than
+ * using the generic single-entry routines.
+ */
+
+struct list_head {
+	struct list_head *next, *prev;
+};
+
+#define LIST_HEAD_INIT(name) { &(name), &(name) }
+
+#define LIST_HEAD(name) \
+	struct list_head name = LIST_HEAD_INIT(name)
+
+#define INIT_LIST_HEAD(ptr) do { \
+	(ptr)->next = (ptr); (ptr)->prev = (ptr); \
+} while (0)
+
+/*
+ * Insert a new entry between two known consecutive entries.
+ *
+ * This is only for internal list manipulation where we know
+ * the prev/next entries already!
+ */
+static inline void __list_add(struct list_head *new,
+			      struct list_head *prev,
+			      struct list_head *next)
+{
+	next->prev = new;
+	new->next = next;
+	new->prev = prev;
+	prev->next = new;
+}
+
+/**
+ * list_add - add a new entry
+ * @new: new entry to be added
+ * @head: list head to add it after
+ *
+ * Insert a new entry after the specified head.
+ * This is good for implementing stacks.
+ */
+static inline void list_add(struct list_head *new, struct list_head *head)
+{
+	__list_add(new, head, head->next);
+}
+
+/**
+ * list_add_tail - add a new entry
+ * @new: new entry to be added
+ * @head: list head to add it before
+ *
+ * Insert a new entry before the specified head.
+ * This is useful for implementing queues.
+ */
+static inline void list_add_tail(struct list_head *new, struct list_head *head)
+{
+	__list_add(new, head->prev, head);
+}
+
+/*
+ * Delete a list entry by making the prev/next entries
+ * point to each other.
+ *
+ * This is only for internal list manipulation where we know
+ * the prev/next entries already!
+ */
+static inline void __list_del(struct list_head * prev, struct list_head * next)
+{
+	next->prev = prev;
+	prev->next = next;
+}
+
+/**
+ * list_del - deletes entry from list.
+ * @entry: the element to delete from the list.
+ * Note: list_empty on entry does not return true after this, the entry is
+ * in an undefined state.
+ */
+static inline void list_del(struct list_head *entry)
+{
+	__list_del(entry->prev, entry->next);
+	entry->next = LIST_POISON1;
+	entry->prev = LIST_POISON2;
+}
+
+/**
+ * list_del_init - deletes entry from list and reinitialize it.
+ * @entry: the element to delete from the list.
+ */
+static inline void list_del_init(struct list_head *entry)
+{
+	__list_del(entry->prev, entry->next);
+	INIT_LIST_HEAD(entry);
+}
+
+/**
+ * list_move - delete from one list and add as another's head
+ * @list: the entry to move
+ * @head: the head that will precede our entry
+ */
+static inline void list_move(struct list_head *list, struct list_head *head)
+{
+	__list_del(list->prev, list->next);
+	list_add(list, head);
+}
+
+/**
+ * list_move_tail - delete from one list and add as another's tail
+ * @list: the entry to move
+ * @head: the head that will follow our entry
+ */
+static inline void list_move_tail(struct list_head *list,
+				  struct list_head *head)
+{
+	__list_del(list->prev, list->next);
+	list_add_tail(list, head);
+}
+
+/**
+ * list_empty - tests whether a list is empty
+ * @head: the list to test.
+ */
+static inline int list_empty(struct list_head *head)
+{
+	return head->next == head;
+}
+
+static inline void __list_splice(const struct list_head *list,
+				 struct list_head *prev,
+				 struct list_head *next)
+{
+	struct list_head *first = list->next;
+	struct list_head *last = list->prev;
+
+	first->prev = prev;
+	prev->next = first;
+
+	last->next = next;
+	next->prev = last;
+}
+
+/**
+ * list_splice - join two lists
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ */
+static inline void list_splice(struct list_head *list, struct list_head *head)
+{
+	if (!list_empty(list))
+		__list_splice(list, head, head->next);
+}
+
+/**
+ * list_splice_tail - join two lists, each list being a queue
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ */
+static inline void list_splice_tail(struct list_head *list,
+				    struct list_head *head)
+{
+	if (!list_empty(list))
+		__list_splice(list, head->prev, head);
+}
+
+/**
+ * list_splice_init - join two lists and reinitialise the emptied list.
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ *
+ * The list at @list is reinitialised
+ */
+static inline void list_splice_init(struct list_head *list,
+				    struct list_head *head)
+{
+	if (!list_empty(list)) {
+		__list_splice(list, head, head->next);
+		INIT_LIST_HEAD(list);
+	}
+}
+
+/**
+ * list_splice_tail_init - join two lists and reinitialise the emptied list
+ * @list: the new list to add.
+ * @head: the place to add it in the first list.
+ *
+ * Each of the lists is a queue.
+ * The list at @list is reinitialised
+ */
+static inline void list_splice_tail_init(struct list_head *list,
+					 struct list_head *head)
+{
+	if (!list_empty(list)) {
+		__list_splice(list, head->prev, head);
+		INIT_LIST_HEAD(list);
+	}
+}
+
+/**
+ * list_entry - get the struct for this entry
+ * @ptr:	the &struct list_head pointer.
+ * @type:	the type of the struct this is embedded in.
+ * @member:	the name of the list_struct within the struct.
+ */
+#define list_entry(ptr, type, member) \
+	container_of(ptr, type, member)
+
+/**
+ * list_for_each	-	iterate over a list
+ * @pos:	the &struct list_head to use as a loop counter.
+ * @head:	the head for your list.
+ */
+#define list_for_each(pos, head) \
+	for (pos = (head)->next; pos != (head); \
+		pos = pos->next)
+
+/**
+ * __list_for_each	-	iterate over a list
+ * @pos:	the &struct list_head to use as a loop counter.
+ * @head:	the head for your list.
+ *
+ * This variant differs from list_for_each() in that it's the
+ * simplest possible list iteration code.
+ * Use this for code that knows the list to be very short (empty
+ * or 1 entry) most of the time.
+ */
+#define __list_for_each(pos, head) \
+	for (pos = (head)->next; pos != (head); pos = pos->next)
+
+/**
+ * list_for_each_prev	-	iterate over a list backwards
+ * @pos:	the &struct list_head to use as a loop counter.
+ * @head:	the head for your list.
+ */
+#define list_for_each_prev(pos, head) \
+	for (pos = (head)->prev; pos != (head); pos = pos->prev)
+
+/**
+ * list_for_each_safe	-	iterate over a list safe against removal of list entry
+ * @pos:	the &struct list_head to use as a loop counter.
+ * @n:		another &struct list_head to use as temporary storage
+ * @head:	the head for your list.
+ */
+#define list_for_each_safe(pos, n, head) \
+	for (pos = (head)->next, n = pos->next; pos != (head); \
+		pos = n, n = pos->next)
+
+/**
+ * list_for_each_entry	-	iterate over list of given type
+ * @pos:	the type * to use as a loop counter.
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ */
+#define list_for_each_entry(pos, head, member)				\
+	for (pos = list_entry((head)->next, typeof(*pos), member);	\
+	     &pos->member != (head);					\
+	     pos = list_entry(pos->member.next, typeof(*pos), member))
+
+/**
+ * list_for_each_entry_reverse - iterate backwards over list of given type.
+ * @pos:	the type * to use as a loop counter.
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ */
+#define list_for_each_entry_reverse(pos, head, member)			\
+	for (pos = list_entry((head)->prev, typeof(*pos), member);	\
+	     &pos->member != (head);					\
+	     pos = list_entry(pos->member.prev, typeof(*pos), member))
+
+/**
+ * list_for_each_entry_safe - iterate over list of given type safe against removal of list entry
+ * @pos:	the type * to use as a loop counter.
+ * @n:		another type * to use as temporary storage
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ */
+#define list_for_each_entry_safe(pos, n, head, member)			\
+	for (pos = list_entry((head)->next, typeof(*pos), member),	\
+		n = list_entry(pos->member.next, typeof(*pos), member);	\
+	     &pos->member != (head);					\
+	     pos = n, n = list_entry(n->member.next, typeof(*n), member))
+
+/**
+ * list_for_each_entry_reverse_safe - iterate backwards over list of given type safe against removal of list entry
+ * @pos:	the type * to use as a loop counter.
+ * @n:		another type * to use as temporary storage
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ */
+#define list_for_each_entry_reverse_safe(pos, n, head, member)          \
+	for (pos = list_entry((head)->prev, typeof(*pos), member),      \
+		 n = list_entry(pos->member.prev, typeof(*pos), member);\
+	     &pos->member != (head);                                    \
+	     pos = n, n = list_entry(n->member.prev, typeof(*n), member))
+
+/**
+ * list_for_some_entry_safe - iterate list from the given begin node to the given end node safe against removal of list entry
+ * @pos:	the type * to use as a loop counter.
+ * @n:		another type * to use as temporary storage
+ * @from:	the begin node of the iteration.
+ * @to:		the end node of the iteration.
+ * @member:	the name of the list_struct within the struct.
+ */
+#define list_for_some_entry_safe(pos, n, from, to, member)              \
+	for (pos = list_entry((from)->next, typeof(*pos), member),      \
+	     n = list_entry(pos->member.next, typeof(*pos), member);    \
+	     &pos->member != (to);                                      \
+	     pos = n, n = list_entry(n->member.next, typeof(*n), member))
+
+/**
+ * list_for_some_entry_reverse_safe - iterate backwards list from the given begin node to the given end node safe against removal of list entry
+ * @pos:	the type * to use as a loop counter.
+ * @n:		another type * to use as temporary storage
+ * @from:	the begin node of the iteration.
+ * @to:		the end node of the iteration.
+ * @member:	the name of the list_struct within the struct.
+ */
+#define list_for_some_entry_reverse_safe(pos, n, from, to, member)      \
+	for (pos = list_entry((from)->prev, typeof(*pos), member),      \
+	     n = list_entry(pos->member.prev, typeof(*pos), member);    \
+	     &pos->member != (to);                                      \
+	     pos = n, n = list_entry(n->member.prev, typeof(*n), member))
+
+#endif /* _LIST_H */
diff --git a/monitor.c b/monitor.c
index f544319..d724f6f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -36,6 +36,7 @@
 #include "common.h"
 #include "fabrics.h"
 #include "monitor.h"
+#include "conn-db.h"
 #define LOG_FUNCNAME 1
 #include "util/log.h"
 #include "event/event.h"
@@ -202,12 +203,22 @@ static int monitor_get_fc_uev_props(struct udev_device *ud,
 	return 0;
 }
 
-static int monitor_discovery(char *transport, char *traddr, char *trsvcid,
-			     char *host_traddr)
+static int monitor_discovery(const char *transport, const char *traddr,
+			     const char *trsvcid, const char *host_traddr)
 {
 	char argstr[BUF_SIZE];
 	pid_t pid;
-	int rc;
+	int rc, db_rc;
+	struct nvme_connection *co = NULL;
+
+	db_rc = conndb_add(transport, traddr, trsvcid, host_traddr, &co);
+	if (db_rc != 0 && db_rc != -EEXIST)
+		return db_rc;
+
+	if (co->status == CS_DISC_RUNNING) {
+		co->discovery_pending = 1;
+		return -EAGAIN;
+	}
 
 	pid = fork();
 	if (pid == -1) {
@@ -215,11 +226,17 @@ static int monitor_discovery(char *transport, char *traddr, char *trsvcid,
 		return -errno;
 	} else if (pid > 0) {
 		msg(LOG_DEBUG, "started discovery task %ld\n", (long)pid);
+
+		co->discovery_pending = 0;
+		co->status = CS_DISC_RUNNING;
+		co->discovery_task = pid;
+
 		return 0;
 	}
 
 	child_reset_signals();
 	free_dispatcher(mon_dsp);
+	conndb_free();
 
 	msg(LOG_NOTICE, "starting discovery\n");
 	fabrics_cfg.nqn = NVME_DISC_SUBSYS_NAME;
@@ -376,6 +393,7 @@ static int handle_epoll_err(int errcode)
 
 	got_sigchld = 0;
 	while (true) {
+		struct nvme_connection *co;
 		int wstatus;
 		pid_t pid;
 
@@ -390,14 +408,33 @@ static int handle_epoll_err(int errcode)
 		default:
 			break;
 		}
-		if (!WIFEXITED(wstatus))
+		co = conndb_find_by_pid(pid);
+		if (!co) {
+			msg(LOG_ERR, "no connection found for discovery task %ld\n",
+			    (long)pid);
+			continue;
+		}
+		if (!WIFEXITED(wstatus)) {
 			msg(LOG_WARNING, "child %ld didn't exit normally\n",
 			    (long)pid);
-		else if (WEXITSTATUS(wstatus) != 0)
+			co->status = CS_FAILED;
+		} else if (WEXITSTATUS(wstatus) != 0) {
 			msg(LOG_NOTICE, "child %ld exited with status \"%s\"\n",
 			    (long)pid, strerror(WEXITSTATUS(wstatus)));
-		else
+			co->status = CS_FAILED;
+			co->did_discovery = 1;
+			co->discovery_result = WEXITSTATUS(wstatus);
+		} else {
 			msg(LOG_DEBUG, "child %ld exited normally\n", (long)pid);
+			co->status = CS_ONLINE;
+			co->successful_discovery = co->did_discovery = 1;
+			co->discovery_result = 0;
+		}
+		if (co->discovery_pending) {
+			msg(LOG_NOTICE, "new discovery pending - restarting\n");
+			monitor_discovery(co->transport, co->traddr,
+					  co->trsvcid, co->host_traddr);
+		}
 	};
 
 out:
-- 
2.29.2


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* [PATCH v2 06/16] monitor: monitor_discovery(): try to reuse existing controllers
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (4 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 05/16] conn-db: add simple connection registry mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 07/16] monitor: kill running discovery tasks on exit mwilck
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

We need to pass fabrics_cfg.device to do_discover() if we want to reuse the
discovery controller device. Remember the instance number when
handling an AEN, and try to reuse this instance if the discovery
needs to be repeated.
---
 monitor.c | 43 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/monitor.c b/monitor.c
index d724f6f..f7517cb 100644
--- a/monitor.c
+++ b/monitor.c
@@ -204,12 +204,14 @@ static int monitor_get_fc_uev_props(struct udev_device *ud,
 }
 
 static int monitor_discovery(const char *transport, const char *traddr,
-			     const char *trsvcid, const char *host_traddr)
+			     const char *trsvcid, const char *host_traddr,
+			     const char *devname)
 {
 	char argstr[BUF_SIZE];
 	pid_t pid;
 	int rc, db_rc;
 	struct nvme_connection *co = NULL;
+	char *device = NULL;
 
 	db_rc = conndb_add(transport, traddr, trsvcid, host_traddr, &co);
 	if (db_rc != 0 && db_rc != -EEXIST)
@@ -230,7 +232,15 @@ static int monitor_discovery(const char *transport, const char *traddr,
 		co->discovery_pending = 0;
 		co->status = CS_DISC_RUNNING;
 		co->discovery_task = pid;
+		if (devname) {
+			int instance = ctrl_instance(devname);
 
+			if (instance < 0) {
+				msg(LOG_ERR, "unexpected devname: %s\n",
+				    devname);
+			} else
+				co->discovery_instance = instance;
+		}
 		return 0;
 	}
 
@@ -238,12 +248,29 @@ static int monitor_discovery(const char *transport, const char *traddr,
 	free_dispatcher(mon_dsp);
 	conndb_free();
 
-	msg(LOG_NOTICE, "starting discovery\n");
+	conn_msg(LOG_NOTICE, co, "starting discovery in state %s\n",
+		 conn_status_str(co->status));
+
+	/*
+	 * Try to re-use existing controller. do_discover() will check
+	 * if it matches the connection parameters.
+	 * fabrics_cfg.device must be allocated on the heap!
+	 */
+	if (devname)
+		device = strdup(devname);
+	else if (co->discovery_instance >= 0 &&
+		 asprintf(&device, "nvme%d", co->discovery_instance) == -1)
+		device = NULL;
+
+	if (device)
+		msg(LOG_INFO, "using discovery controller %s\n", device);
+
 	fabrics_cfg.nqn = NVME_DISC_SUBSYS_NAME;
 	fabrics_cfg.transport = transport;
 	fabrics_cfg.traddr = traddr;
-	fabrics_cfg.trsvcid = trsvcid;
-	fabrics_cfg.host_traddr = host_traddr;
+	fabrics_cfg.trsvcid = trsvcid && *trsvcid ? trsvcid : NULL;
+	fabrics_cfg.host_traddr = host_traddr && *host_traddr ? host_traddr : NULL;
+	fabrics_cfg.device = device;
 	/* Without the following, the kernel returns EINVAL */
 	fabrics_cfg.tos = -1;
 	fabrics_cfg.persistent = true;
@@ -252,6 +279,7 @@ static int monitor_discovery(const char *transport, const char *traddr,
 	msg(LOG_DEBUG, "%s\n", argstr);
 	rc = do_discover(argstr, mon_cfg.autoconnect, NORMAL);
 
+	free(device);
 	exit(-rc);
 	/* not reached */
 	return rc;
@@ -270,7 +298,7 @@ static void monitor_handle_fc_uev(struct udev_device *ud)
 				     host_traddr, sizeof(host_traddr)))
 		return;
 
-	monitor_discovery("fc", traddr, NULL, host_traddr);
+	monitor_discovery("fc", traddr, NULL, host_traddr, NULL);
 }
 
 static int monitor_get_nvme_uev_props(struct udev_device *ud,
@@ -338,7 +366,8 @@ static void monitor_handle_nvme_uev(struct udev_device *ud)
 		return;
 
 	monitor_discovery(transport, traddr,
-			  strcmp(trsvcid, "none") ? trsvcid : NULL, host_traddr);
+			  strcmp(trsvcid, "none") ? trsvcid : NULL, host_traddr,
+			  udev_device_get_sysname(ud));
 }
 
 static void monitor_handle_udevice(struct udev_device *ud)
@@ -433,7 +462,7 @@ static int handle_epoll_err(int errcode)
 		if (co->discovery_pending) {
 			msg(LOG_NOTICE, "new discovery pending - restarting\n");
 			monitor_discovery(co->transport, co->traddr,
-					  co->trsvcid, co->host_traddr);
+					  co->trsvcid, co->host_traddr, NULL);
 		}
 	};
 
-- 
2.29.2



* [PATCH v2 07/16] monitor: kill running discovery tasks on exit
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (5 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 06/16] monitor: monitor_discovery(): try to reuse existing controllers mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 08/16] monitor: add option --cleanup / -C mwilck
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Make sure running discovery tasks terminate when the main process terminates.
---
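
Note for readers: the terminate-and-reap step can be sketched stand-alone as
below. This is a minimal illustration, not the patch itself: terminate_child()
and spawn_sleeper() are hypothetical names, and the sleeping child merely
stands in for a discovery task. If SIGTERM was sent, the parent blocks in
waitpid(); if sending failed, it polls once with WNOHANG so it cannot hang.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask a child to terminate and reap it.
 * Returns 0 if the child was reaped, -1 otherwise.
 */
static int terminate_child(pid_t pid)
{
	int wstatus;
	pid_t wpid;

	if (kill(pid, SIGTERM) == -1)
		/* Signal could not be sent: don't block, poll once. */
		wpid = waitpid(pid, &wstatus, WNOHANG);
	else
		/* Signal sent: block until the child has exited. */
		wpid = waitpid(pid, &wstatus, 0);

	return wpid == pid ? 0 : -1;
}

/* Hypothetical stand-in for a discovery task: a child that only sleeps. */
static pid_t spawn_sleeper(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		/* Make sure SIGTERM has its default (terminating) action. */
		signal(SIGTERM, SIG_DFL);
		pause();
		_exit(0);
	}
	return pid;
}
```

A caller would use terminate_child(spawn_sleeper()) and treat a non-zero
return as "child could not be reaped".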
 monitor.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
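
Note for readers: the callback added here returns values from the CD_CB_*
bitmask that conndb_for_each() decodes. The sketch below models that contract
on a plain int array. The enum values are copied from conn-db.h in this
series; walk(), struct walk_stats and drop_negative() are hypothetical names
used only for illustration, and the model counts deletions instead of
unlinking list entries.

```c
#include <assert.h>

/* Values mirror the CD_CB_* enum from conn-db.h in this series. */
enum {
	CD_CB_OK    = 0,
	CD_CB_ERR   = (1 << 0),
	CD_CB_DEL   = (1 << 1),
	CD_CB_BREAK = (1 << 2),
};

/* Hypothetical bookkeeping, standing in for the real list operations. */
struct walk_stats {
	int visited;
	int deleted;
	int errors;
};

/*
 * Simplified model of the iteration: call cb for each item and decode
 * the returned bitmask. Returns -1 if any callback returned bits
 * outside the known set, 0 otherwise.
 */
static int walk(const int *items, int n, int (*cb)(int item),
		struct walk_stats *st)
{
	int i, ret = 0;

	for (i = 0; i < n; i++) {
		int rc = cb(items[i]);

		if (rc & ~(CD_CB_ERR | CD_CB_DEL | CD_CB_BREAK)) {
			/* Unknown bits: flag it and skip this item. */
			ret = -1;
			continue;
		}
		st->visited++;
		if (rc & CD_CB_ERR)
			st->errors++;
		if (rc & CD_CB_DEL)
			st->deleted++;
		if (rc & CD_CB_BREAK)
			break;
	}
	return ret;
}

/* Example callback: drop negative items, drop and stop at the first zero. */
static int drop_negative(int item)
{
	if (item < 0)
		return CD_CB_DEL;
	if (item == 0)
		return CD_CB_DEL | CD_CB_BREAK;
	return CD_CB_OK;
}
```

Because the flags are independent bits, a callback can request deletion and
stop the iteration in a single return value, as drop_negative() does for a
zero item.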

diff --git a/monitor.c b/monitor.c
index f7517cb..b17fca0 100644
--- a/monitor.c
+++ b/monitor.c
@@ -471,6 +471,34 @@ out:
 	return ELOOP_CONTINUE;
 }
 
+static int monitor_kill_discovery_task(struct nvme_connection *co,
+				       void *arg __attribute__((unused)))
+{
+	int wstatus;
+	pid_t pid, wpid = -1;
+
+	if (co->status != CS_DISC_RUNNING)
+		return CD_CB_OK;
+
+	pid = co->discovery_task;
+	co->status = CS_FAILED;
+	if (kill(co->discovery_task, SIGTERM) == -1) {
+		msg(LOG_ERR, "failed to send SIGTERM to pid %ld: %m\n",
+		    (long)pid);
+		wpid = waitpid(pid, &wstatus, WNOHANG);
+	} else {
+		msg(LOG_DEBUG, "sent SIGTERM to pid %ld, waiting\n", (long)pid);
+		wpid = waitpid(pid, &wstatus, 0);
+	}
+	if (wpid != pid) {
+		msg(LOG_ERR, "failed to wait for %ld: %m\n", (long)pid);
+		return CD_CB_ERR;
+	} else {
+		msg(LOG_DEBUG, "child %ld terminated\n", (long)pid);
+		return CD_CB_OK;
+	}
+}
+
 static int monitor_parse_opts(const char *desc, int argc, char **argv)
 {
 	bool quiet = false;
@@ -578,6 +606,8 @@ int aen_monitor(const char *desc, int argc, char **argv)
 
 	ret = event_loop(mon_dsp, &wait_mask, handle_epoll_err);
 
+	conndb_for_each(monitor_kill_discovery_task, NULL);
+	conndb_free();
 out:
 	free_dispatcher(mon_dsp);
 	return nvme_status_to_errno(ret, true);
-- 
2.29.2



* [PATCH v2 08/16] monitor: add option --cleanup / -C
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (6 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 07/16] monitor: kill running discovery tasks on exit mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 09/16] monitor: handling of add/remove uevents for nvme controllers mwilck
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

With this option, "nvme monitor" removes the discovery controllers it
created when it exits. To avoid shutting down connections that were created
independently, check existing connections on startup and track them with
the discovery_ctrl_existed flag.
---
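Note: the ownership rule described above can be looked at in isolation. The
sketch below is a simplified model, not the patch itself: struct conn is a
hypothetical, trimmed-down stand-in for struct nvme_connection, and the
sysfs subsysnqn check as well as the actual controller removal are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct nvme_connection. */
struct conn {
	int discovery_instance;			/* -1: no discovery controller */
	unsigned int discovery_ctrl_existed:1;	/* found in sysfs at startup */
};

/* True if the monitor created this controller and may remove it on exit. */
static bool should_remove_ctrl(const struct conn *co)
{
	return co->discovery_instance != -1 && !co->discovery_ctrl_existed;
}

/* Count the controllers that a run with --cleanup would remove. */
static size_t count_removable(const struct conn *conns, size_t n, bool cleanup)
{
	size_t i, count = 0;

	if (!cleanup)		/* default: keep all controllers */
		return 0;
	for (i = 0; i < n; i++)
		if (should_remove_ctrl(&conns[i]))
			count++;
	return count;
}
```

The sketch uses an unsigned bit-field for the flag. Whether a plain
"int x:1" is signed is implementation-defined, so unsigned avoids surprises
when the flag is compared against 1 rather than just tested for non-zero.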
 conn-db.c |  1 +
 conn-db.h |  1 +
 monitor.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/conn-db.c b/conn-db.c
index cfdc208..93eae4a 100644
--- a/conn-db.c
+++ b/conn-db.c
@@ -382,6 +382,7 @@ int conndb_init_from_sysfs(void)
 			instance =ctrl_instance(devices[i]->d_name);
 			if (instance >= 0) {
 				ci->c.discovery_instance = instance;
+				ci->c.discovery_ctrl_existed = 1;
 				msg(LOG_DEBUG, "found discovery controller %s\n",
 				    devices[i]->d_name);
 			}
diff --git a/conn-db.h b/conn-db.h
index e2a5827..8f5b400 100644
--- a/conn-db.h
+++ b/conn-db.h
@@ -12,6 +12,7 @@ struct nvme_connection {
 	int discovery_pending:1;
 	int did_discovery:1;
 	int successful_discovery:1;
+	int discovery_ctrl_existed:1;
 	union {
 		pid_t discovery_task;
 		int discovery_result;
diff --git a/monitor.c b/monitor.c
index b17fca0..e7e91f9 100644
--- a/monitor.c
+++ b/monitor.c
@@ -43,8 +43,10 @@
 
 static struct monitor_config {
 	bool autoconnect;
+	bool keep_ctrls;
 } mon_cfg = {
 	.autoconnect = true,
+	.keep_ctrls = true,
 };
 
 static struct dispatcher *mon_dsp;
@@ -499,15 +501,50 @@ static int monitor_kill_discovery_task(struct nvme_connection *co,
 	}
 }
 
+static int monitor_remove_discovery_ctrl(struct nvme_connection *co,
+					void *arg __attribute__((unused)))
+{
+	char syspath[PATH_MAX];
+	int len;
+	char *subsysnqn __cleanup__(cleanup_charp) = NULL;
+
+	if (co->discovery_instance == -1 || co->discovery_ctrl_existed)
+		return CD_CB_OK;
+
+	len = snprintf(syspath, sizeof(syspath), SYS_NVME "/nvme%d",
+		       co->discovery_instance);
+	if (len < 0 || len >= sizeof(syspath))
+		return CD_CB_ERR;
+
+	subsysnqn = nvme_get_ctrl_attr(syspath, "subsysnqn");
+	if (subsysnqn && !strcmp(subsysnqn, NVME_DISC_SUBSYS_NAME)) {
+		if (remove_ctrl(co->discovery_instance)) {
+			msg(LOG_ERR,
+			    "failed to remove discovery controller /dev/nvme%d: %m\n",
+			    co->discovery_instance);
+			return CD_CB_ERR;
+		} else
+			msg(LOG_INFO,
+			    "removed discovery controller /dev/nvme%d\n",
+			    co->discovery_instance);
+	} else
+		msg(LOG_WARNING,
+		    "unexpected NQN %s on /dev/nvme%d, not removing controller\n",
+		    subsysnqn ? subsysnqn : "(NULL)", co->discovery_instance);
+	return CD_CB_OK;
+}
+
 static int monitor_parse_opts(const char *desc, int argc, char **argv)
 {
 	bool quiet = false;
 	bool verbose = false;
 	bool debug = false;
 	bool noauto = false;
+	bool cleanup = false;
 	int ret;
 	OPT_ARGS(opts) = {
 		OPT_FLAG("no-connect",     'N', &noauto,              "dry run, do not autoconnect to discovered controllers"),
+		OPT_FLAG("cleanup",        'C', &cleanup,                     "remove created discovery controllers on exit"),
 		OPT_LIST("hostnqn",        'q', &fabrics_cfg.hostnqn,         "user-defined hostnqn (if default not used)"),
 		OPT_LIST("hostid",         'I', &fabrics_cfg.hostid,          "user-defined hostid (if default not used)"),
 		OPT_INT("keep-alive-tmo",  'k', &fabrics_cfg.keep_alive_tmo,  "keep alive timeout period in seconds"),
@@ -540,6 +577,8 @@ static int monitor_parse_opts(const char *desc, int argc, char **argv)
 		log_level = LOG_DEBUG;
 	if (noauto)
 		mon_cfg.autoconnect = false;
+	if (cleanup)
+		mon_cfg.keep_ctrls = false;
 
 	return ret;
 }
@@ -604,9 +643,12 @@ int aen_monitor(const char *desc, int argc, char **argv)
 		goto out;
 	}
 
+	conndb_init_from_sysfs();
 	ret = event_loop(mon_dsp, &wait_mask, handle_epoll_err);
 
 	conndb_for_each(monitor_kill_discovery_task, NULL);
+	if (mon_cfg.autoconnect && !mon_cfg.keep_ctrls)
+		conndb_for_each(monitor_remove_discovery_ctrl, NULL);
 	conndb_free();
 out:
 	free_dispatcher(mon_dsp);
-- 
2.29.2



* [PATCH v2 09/16] monitor: handling of add/remove uevents for nvme controllers
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (7 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 08/16] monitor: add option --cleanup / -C mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 10/16] monitor: discover from conf file on startup mwilck
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

On add, we'd like to detect the creation of a discovery controller for a given
connection. More often than not, that doesn't work, because the controller
is still in "connecting" state when the uevent arrives, and reading subsysnqn
yields "(efault)".

This would need to be fixed by waiting for the connection to complete,
but doing this properly is currently out of scope. It's not really a big
problem, because without --persistent, the discovery controller will
be removed anyway, and with --persistent, we'll notice the discovery
controller as soon as it actually receives an AEN and generates a uevent.
---
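Note: one possible way to deal with the "connecting" problem described above
is to trust subsysnqn only once the controller state reads "live". The
sketch below illustrates that check on plain strings. It is an assumption
about how such a test could look, not something this patch implements;
is_live_discovery_ctrl() is a hypothetical helper, and both arguments are
assumed to be sysfs attribute values with the trailing newline stripped.

```c
#include <stdbool.h>
#include <string.h>

/* Well-known discovery subsystem NQN (NVME_DISC_SUBSYS_NAME in nvme-cli). */
#define DISC_NQN "nqn.2014-08.org.nvmexpress.discovery"

/*
 * Decide whether an "add" uevent can be attributed to a discovery
 * controller. Either argument may be NULL if reading the attribute failed.
 */
static bool is_live_discovery_ctrl(const char *state, const char *subsysnqn)
{
	if (!state || !subsysnqn)
		return false;
	/* Before the controller is "live", subsysnqn may not be valid yet. */
	if (strcmp(state, "live"))
		return false;
	return !strcmp(subsysnqn, DISC_NQN);
}
```

A caller would still have to retry or wait for a later uevent when the
function returns false for a controller that is merely not ready yet.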
 monitor.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/monitor.c b/monitor.c
index e7e91f9..43d3084 100644
--- a/monitor.c
+++ b/monitor.c
@@ -159,6 +159,36 @@ static int child_reset_signals(void)
 	return -err;
 }
 
+static void monitor_handle_nvme_add(struct udev_device *ud)
+{
+	const char *syspath = udev_device_get_syspath(ud);
+	char *subsysnqn __cleanup__(cleanup_charp) = NULL;
+	char *state __cleanup__(cleanup_charp) = NULL;
+
+	if (!syspath)
+		return;
+	subsysnqn = nvme_get_ctrl_attr(syspath, "subsysnqn");
+	state = nvme_get_ctrl_attr(syspath, "state");
+	msg(LOG_DEBUG, "add %s => %s [%s]\n", syspath, subsysnqn, state);
+}
+
+static void monitor_handle_nvme_remove(struct udev_device *ud)
+{
+	const char *sysname = udev_device_get_sysname(ud);
+	struct nvme_connection *co;
+
+	if (!sysname)
+		return;
+
+	co = conndb_find_by_ctrl(sysname);
+	if (co) {
+		msg(LOG_DEBUG, "%s: connection discovery controller removed\n",
+		    sysname);
+		co->discovery_instance = -1;
+	}
+	return;
+}
+
 static int monitor_get_fc_uev_props(struct udev_device *ud,
 				    char *traddr, size_t tra_sz,
 				    char *host_traddr, size_t htra_sz)
@@ -358,6 +388,14 @@ static void monitor_handle_nvme_uev(struct udev_device *ud)
 	char traddr[NVMF_TRADDR_SIZE], host_traddr[NVMF_TRADDR_SIZE];
 	char trsvcid[NVMF_TRSVCID_SIZE], transport[5];
 
+	if (!strcmp(udev_device_get_action(ud), "remove")) {
+		monitor_handle_nvme_remove(ud);
+		return;
+	}
+	if (!strcmp(udev_device_get_action(ud), "add")) {
+		monitor_handle_nvme_add(ud);
+		return;
+	}
 	if (strcmp(udev_device_get_action(ud), "change"))
 		return;
 
-- 
2.29.2



* [PATCH v2 10/16] monitor: discover from conf file on startup
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (8 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 09/16] monitor: handling of add/remove uevents for nvme controllers mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 11/16] monitor: watch discovery.conf with inotify mwilck
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Implement discovery from /etc/nvme/discovery.conf on startup of
the monitor.

The monitor needs to call discover_from_conf_file() in order to
be able to do discovery at startup. discover_from_conf_file() takes
the argconfig_commandline_options argument from fabrics_discover().
By moving these options into a static variable, we can avoid passing
them as a function argument. This makes it possible to use
discover_from_conf_file() from the monitor without making the
options for fabrics_discover() globally visible.
---
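Note: the refactoring described above follows a common C pattern: a table
with internal linkage that several functions in one file share, plus one
exported function that uses it. The miniature below shows the idea only.
struct opt, cfg, parse_opt() and discover_from_conf_line() are hypothetical
names and are much simpler than argconfig's real option handling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of an option table entry. */
struct opt {
	const char *name;
	int *value;
};

/* Hypothetical miniature of fabrics_cfg. */
static struct {
	int keep_alive_tmo;
	int queue_size;
} cfg;

/*
 * File-scope table with internal linkage: every function in this file
 * can use it, but it is not visible to other translation units.
 */
static const struct opt discover_opts[] = {
	{ "keep-alive-tmo", &cfg.keep_alive_tmo },
	{ "queue-size",     &cfg.queue_size },
	{ NULL, NULL }	/* terminator, playing the role of OPT_END() */
};

/* Shared parser: the option table is no longer a function argument. */
static int parse_opt(const char *name, int value)
{
	const struct opt *o;

	for (o = discover_opts; o->name; o++) {
		if (!strcmp(o->name, name)) {
			*o->value = value;
			return 0;
		}
	}
	return -1;	/* unknown option */
}

/* Exported entry point, callable from another file such as a monitor. */
int discover_from_conf_line(const char *name, int value)
{
	return parse_opt(name, value);
}
```

Defining the table before its first user, as done here, also avoids the
need for a forward declaration of an array with incomplete type.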
 fabrics.c | 67 +++++++++++++++++++++++++++----------------------------
 fabrics.h |  5 +++++
 monitor.c | 58 +++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 94 insertions(+), 36 deletions(-)

diff --git a/fabrics.c b/fabrics.c
index 5012519..b195d0e 100644
--- a/fabrics.c
+++ b/fabrics.c
@@ -85,7 +85,6 @@ struct connect_args {
 struct connect_args *tracked_ctrls;
 
 #define PATH_NVME_FABRICS	"/dev/nvme-fabrics"
-#define PATH_NVMF_DISC		"/etc/nvme/discovery.conf"
 #define PATH_NVMF_HOSTNQN	"/etc/nvme/hostnqn"
 #define PATH_NVMF_HOSTID	"/etc/nvme/hostid"
 #define MAX_DISC_ARGS		10
@@ -1439,8 +1438,9 @@ int do_discover(char *argstr, bool connect, enum nvme_print_flags flags)
 	return ret;
 }
 
-static int discover_from_conf_file(const char *desc, char *argstr,
-		const struct argconfig_commandline_options *opts, bool connect)
+static OPT_ARGS(discover_opts);
+
+int discover_from_conf_file(const char *desc, char *argstr, bool connect)
 {
 	FILE *f;
 	char line[256], *ptr, *all_args, *args, **argv;
@@ -1480,7 +1480,7 @@ static int discover_from_conf_file(const char *desc, char *argstr,
 		while ((ptr = strsep(&args, " =\n")) != NULL)
 			argv[argc++] = ptr;
 
-		err = argconfig_parse(argc, argv, desc, opts);
+		err = argconfig_parse(argc, argv, desc, discover_opts);
 		if (err)
 			goto free_and_continue;
 
@@ -1523,41 +1523,40 @@ out:
 	return ret;
 }
 
+static OPT_ARGS(discover_opts) = {
+	OPT_LIST("transport",      't', &fabrics_cfg.transport,       "transport type"),
+	OPT_LIST("traddr",         'a', &fabrics_cfg.traddr,          "transport address"),
+	OPT_LIST("trsvcid",        's', &fabrics_cfg.trsvcid,         "transport service id (e.g. IP port)"),
+	OPT_LIST("host-traddr",    'w', &fabrics_cfg.host_traddr,     "host traddr (e.g. FC WWN's)"),
+	OPT_LIST("hostnqn",        'q', &fabrics_cfg.hostnqn,         "user-defined hostnqn (if default not used)"),
+	OPT_LIST("hostid",         'I', &fabrics_cfg.hostid,          "user-defined hostid (if default not used)"),
+	OPT_LIST("raw",            'r', &fabrics_cfg.raw,             "raw output file"),
+	OPT_LIST("device",         'd', &fabrics_cfg.device,          "use existing discovery controller device"),
+	OPT_INT("keep-alive-tmo",  'k', &fabrics_cfg.keep_alive_tmo,  "keep alive timeout period in seconds"),
+	OPT_INT("reconnect-delay", 'c', &fabrics_cfg.reconnect_delay, "reconnect timeout period in seconds"),
+	OPT_INT("ctrl-loss-tmo",   'l', &fabrics_cfg.ctrl_loss_tmo,   "controller loss timeout period in seconds"),
+	OPT_INT("tos",             'T', &fabrics_cfg.tos,             "type of service"),
+	OPT_FLAG("hdr_digest",     'g', &fabrics_cfg.hdr_digest,      "enable transport protocol header digest (TCP transport)"),
+	OPT_FLAG("data_digest",    'G', &fabrics_cfg.data_digest,     "enable transport protocol data digest (TCP transport)"),
+	OPT_INT("nr-io-queues",    'i', &fabrics_cfg.nr_io_queues,    "number of io queues to use (default is core count)"),
+	OPT_INT("nr-write-queues", 'W', &fabrics_cfg.nr_write_queues, "number of write queues to use (default 0)"),
+	OPT_INT("nr-poll-queues",  'P', &fabrics_cfg.nr_poll_queues,  "number of poll queues to use (default 0)"),
+	OPT_INT("queue-size",      'Q', &fabrics_cfg.queue_size,      "number of io queue elements to use (default 128)"),
+	OPT_FLAG("persistent",     'p', &fabrics_cfg.persistent,      "persistent discovery connection"),
+	OPT_FLAG("quiet",          'S', &fabrics_cfg.quiet,           "suppress already connected errors"),
+	OPT_FLAG("matching",       'm', &fabrics_cfg.matching_only,   "connect only records matching the traddr"),
+	OPT_FMT("output-format",   'o', &fabrics_cfg.output_format,   "Output format: normal|json|binary"),
+	OPT_END()
+};
+
 int fabrics_discover(const char *desc, int argc, char **argv, bool connect)
 {
 	char argstr[BUF_SIZE];
 	int ret;
 	enum nvme_print_flags flags;
-	bool quiet = false;
-
-	OPT_ARGS(opts) = {
-		OPT_LIST("transport",      't', &fabrics_cfg.transport,       "transport type"),
-		OPT_LIST("traddr",         'a', &fabrics_cfg.traddr,          "transport address"),
-		OPT_LIST("trsvcid",        's', &fabrics_cfg.trsvcid,         "transport service id (e.g. IP port)"),
-		OPT_LIST("host-traddr",    'w', &fabrics_cfg.host_traddr,     "host traddr (e.g. FC WWN's)"),
-		OPT_LIST("hostnqn",        'q', &fabrics_cfg.hostnqn,         "user-defined hostnqn (if default not used)"),
-		OPT_LIST("hostid",         'I', &fabrics_cfg.hostid,          "user-defined hostid (if default not used)"),
-		OPT_LIST("raw",            'r', &fabrics_cfg.raw,             "raw output file"),
-		OPT_LIST("device",         'd', &fabrics_cfg.device,          "existing discovery controller device"),
-		OPT_INT("keep-alive-tmo",  'k', &fabrics_cfg.keep_alive_tmo,  "keep alive timeout period in seconds"),
-		OPT_INT("reconnect-delay", 'c', &fabrics_cfg.reconnect_delay, "reconnect timeout period in seconds"),
-		OPT_INT("ctrl-loss-tmo",   'l', &fabrics_cfg.ctrl_loss_tmo,   "controller loss timeout period in seconds"),
-		OPT_INT("tos",             'T', &fabrics_cfg.tos,             "type of service"),
-		OPT_FLAG("hdr_digest",     'g', &fabrics_cfg.hdr_digest,      "enable transport protocol header digest (TCP transport)"),
-		OPT_FLAG("data_digest",    'G', &fabrics_cfg.data_digest,     "enable transport protocol data digest (TCP transport)"),
-		OPT_INT("nr-io-queues",    'i', &fabrics_cfg.nr_io_queues,    "number of io queues to use (default is core count)"),
-		OPT_INT("nr-write-queues", 'W', &fabrics_cfg.nr_write_queues, "number of write queues to use (default 0)"),
-		OPT_INT("nr-poll-queues",  'P', &fabrics_cfg.nr_poll_queues,  "number of poll queues to use (default 0)"),
-		OPT_INT("queue-size",      'Q', &fabrics_cfg.queue_size,      "number of io queue elements to use (default 128)"),
-		OPT_FLAG("persistent",     'p', &fabrics_cfg.persistent,      "persistent discovery connection"),
-		OPT_FLAG("quiet",          'S', &quiet,               "suppress already connected errors"),
-		OPT_FLAG("matching",       'm', &fabrics_cfg.matching_only,   "connect only records matching the traddr"),
-		OPT_FMT("output-format",   'o', &fabrics_cfg.output_format,   output_format),
-		OPT_END()
-	};
 
 	fabrics_cfg.tos = -1;
-	ret = argconfig_parse(argc, argv, desc, opts);
+	ret = argconfig_parse(argc, argv, desc, discover_opts);
 	if (ret)
 		goto out;
 
@@ -1572,7 +1571,7 @@ int fabrics_discover(const char *desc, int argc, char **argv, bool connect)
 		}
 	}
 
-	if (quiet)
+	if (fabrics_cfg.quiet)
 		log_level = LOG_WARNING;
 
 	if (fabrics_cfg.device && !strcmp(fabrics_cfg.device, "none"))
@@ -1581,7 +1580,7 @@ int fabrics_discover(const char *desc, int argc, char **argv, bool connect)
 	fabrics_cfg.nqn = NVME_DISC_SUBSYS_NAME;
 
 	if (!fabrics_cfg.transport && !fabrics_cfg.traddr) {
-		ret = discover_from_conf_file(desc, argstr, opts, connect);
+		ret = discover_from_conf_file(desc, argstr, connect);
 	} else {
 		set_discovery_kato(&fabrics_cfg);
 
diff --git a/fabrics.h b/fabrics.h
index 41e6a2d..128f251 100644
--- a/fabrics.h
+++ b/fabrics.h
@@ -38,6 +38,7 @@ struct fabrics_config {
 	int  data_digest;
 	bool persistent;
 	bool matching_only;
+	bool quiet;
 	const char *output_format;
 };
 extern struct fabrics_config fabrics_cfg;
@@ -45,11 +46,15 @@ extern struct fabrics_config fabrics_cfg;
 extern const char *const trtypes[];
 
 #define BUF_SIZE 4096
+#define PATH_NVMF_CFG_DIR	"/etc/nvme"
+#define FILE_NVMF_DISC		"discovery.conf"
+#define PATH_NVMF_DISC		PATH_NVMF_CFG_DIR "/" FILE_NVMF_DISC
 
 int build_options(char *argstr, int max_len, bool discover);
 int do_discover(char *argstr, bool connect, enum nvme_print_flags flags);
 int ctrl_instance(const char *device);
 char *parse_conn_arg(const char *conargs, const char delim, const char *field);
 int remove_ctrl(int instance);
+int discover_from_conf_file(const char *desc, char *argstr, bool connect);
 
 #endif
diff --git a/monitor.c b/monitor.c
index 43d3084..95dea19 100644
--- a/monitor.c
+++ b/monitor.c
@@ -477,12 +477,20 @@ static int handle_epoll_err(int errcode)
 		default:
 			break;
 		}
+
 		co = conndb_find_by_pid(pid);
 		if (!co) {
-			msg(LOG_ERR, "no connection found for discovery task %ld\n",
-			    (long)pid);
+			if (!WIFEXITED(wstatus))
+				msg(LOG_WARNING, "child %ld didn't exit normally\n",
+				    (long)pid);
+			else if (WEXITSTATUS(wstatus) != 0)
+				msg(LOG_NOTICE, "child %ld exited with status \"%s\"\n",
+				    (long)pid, strerror(WEXITSTATUS(wstatus)));
+			else
+				msg(LOG_DEBUG, "child %ld exited normally\n", (long)pid);
 			continue;
 		}
+
 		if (!WIFEXITED(wstatus)) {
 			msg(LOG_WARNING, "child %ld didn't exit normally\n",
 			    (long)pid);
@@ -572,6 +580,45 @@ static int monitor_remove_discovery_ctrl(struct nvme_connection *co,
 	return CD_CB_OK;
 }
 
+static int monitor_discover_from_conf_file(void)
+{
+	char argstr[BUF_SIZE];
+	pid_t pid;
+	int rc;
+
+	pid = fork();
+	if (pid == -1) {
+		msg(LOG_ERR, "failed to fork discovery task: %m");
+		return -errno;
+	} else if (pid > 0) {
+		msg(LOG_DEBUG, "started discovery task %ld from conf file\n",
+		    (long)pid);
+		return 0;
+	}
+
+	child_reset_signals();
+
+	msg(LOG_NOTICE, "starting discovery from conf file\n");
+
+	fabrics_cfg.nqn = NVME_DISC_SUBSYS_NAME;
+	fabrics_cfg.tos = -1;
+	fabrics_cfg.persistent = true;
+
+	rc = discover_from_conf_file("Discover NVMeoF subsystems from " PATH_NVMF_DISC,
+				     argstr, mon_cfg.autoconnect);
+
+	exit(-rc);
+	/* not reached */
+	return rc;
+}
+
+static int discovery_from_conf_file_cb(struct event *ev __attribute__((unused)),
+					unsigned int __attribute__((unused)) ep_events)
+{
+	monitor_discover_from_conf_file();
+	return EVENTCB_CLEANUP;
+}
+
 static int monitor_parse_opts(const char *desc, int argc, char **argv)
 {
 	bool quiet = false;
@@ -638,6 +685,7 @@ int aen_monitor(const char *desc, int argc, char **argv)
 	struct udev *udev __cleanup__(cleanup_udevp) = NULL;
 	struct udev_monitor *monitor __cleanup__(cleanup_monitorp) = NULL;
 	struct udev_monitor_event udev_event = { .e.fd = -1, };
+	struct event startup_discovery_event = { .fd = -1, };
 	sigset_t wait_mask;
 
 	ret = monitor_parse_opts(desc, argc, argv);
@@ -663,6 +711,12 @@ int aen_monitor(const char *desc, int argc, char **argv)
 		goto out;
 	}
 
+	startup_discovery_event =
+		TIMER_EVENT_ON_STACK(discovery_from_conf_file_cb, 0);
+	if ((ret = event_add(mon_dsp, &startup_discovery_event)) != 0)
+		msg(LOG_ERR, "failed to register initial discovery timer: %s\n",
+		    strerror(-ret));
+
 	ret = create_udev_monitor(udev, &monitor);
 	if (ret != 0)
 		goto out;
-- 
2.29.2



* [PATCH v2 11/16] monitor: watch discovery.conf with inotify
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (9 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 10/16] monitor: discover from conf file on startup mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 12/16] monitor: add parent/child messaging and "notify" message exchange mwilck
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Watch the /etc/nvme directory with inotify, and restart discovery from
the conf file whenever discovery.conf is created or changed.
---
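Note: watching the directory rather than the file itself means that a
rename into place (IN_MOVED_TO) is seen as well as an in-place rewrite
(IN_CLOSE_WRITE). The standalone sketch below shows that idea with two
hypothetical helpers, watch_conf_dir() and conf_file_changed(). Unlike the
handler in this patch, it walks every event in the buffer, since a single
read() on an inotify fd may return more than one event. The buffer passed
in is assumed to be suitably aligned for struct inotify_event.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Events that indicate a file in the watched directory has new content. */
#define CONF_EVENTS (IN_CLOSE_WRITE | IN_MOVED_TO)

/* Watch a directory; returns a non-blocking inotify fd, or -1 on error. */
static int watch_conf_dir(const char *dir)
{
	int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);

	if (fd == -1)
		return -1;
	if (inotify_add_watch(fd, dir, CONF_EVENTS) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Scan a buffer that was filled by read() on an inotify fd.
 * Returns true if any event reports new content for the file "name".
 */
static bool conf_file_changed(const char *buf, size_t len, const char *name)
{
	size_t off = 0;

	while (off + sizeof(struct inotify_event) <= len) {
		const struct inotify_event *iev =
			(const struct inotify_event *)(buf + off);

		/* Stop at an event that would extend past the data read. */
		if (off + sizeof(*iev) + iev->len > len)
			break;
		/* iev->name is only meaningful if iev->len is non-zero. */
		if ((iev->mask & CONF_EVENTS) && iev->len > 0 &&
		    !strcmp(iev->name, name))
			return true;
		off += sizeof(*iev) + iev->len;
	}
	return false;
}
```

A caller would read() from the fd returned by watch_conf_dir() until EAGAIN
and pass each filled buffer to conf_file_changed().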
 monitor.c      | 114 +++++++++++++++++++++++++++++++++++++++++++++++++
 util/cleanup.c |   2 +
 util/cleanup.h |   1 +
 3 files changed, 117 insertions(+)

diff --git a/monitor.c b/monitor.c
index 95dea19..a1229a7 100644
--- a/monitor.c
+++ b/monitor.c
@@ -20,6 +20,7 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <errno.h>
+#include <inttypes.h>
 #include <libudev.h>
 #include <signal.h>
 #include <time.h>
@@ -28,6 +29,7 @@
 #include <sys/types.h>
 #include <sys/wait.h>
 #include <sys/epoll.h>
+#include <sys/inotify.h>
 
 #include "nvme-status.h"
 #include "nvme.h"
@@ -448,6 +450,14 @@ static int monitor_handle_uevents(struct event *ev,
 	return EVENTCB_CONTINUE;
 }
 
+static struct {
+	bool running;
+	bool pending;
+	pid_t pid;
+} discovery_conf_task;
+
+static int monitor_discover_from_conf_file(void);
+
 static int handle_epoll_err(int errcode)
 {
 	if (errcode != -EINTR)
@@ -488,6 +498,15 @@ static int handle_epoll_err(int errcode)
 				    (long)pid, strerror(WEXITSTATUS(wstatus)));
 			else
 				msg(LOG_DEBUG, "child %ld exited normally\n", (long)pid);
+			if (discovery_conf_task.running &&
+			    discovery_conf_task.pid == pid) {
+				discovery_conf_task.running = false;
+				if (discovery_conf_task.pending) {
+					msg(LOG_NOTICE,
+					    "discovery from conf file pending - restarting\n");
+					monitor_discover_from_conf_file();
+				}
+			}
 			continue;
 		}
 
@@ -586,6 +605,13 @@ static int monitor_discover_from_conf_file(void)
 	pid_t pid;
 	int rc;
 
+	if (discovery_conf_task.running) {
+		msg(LOG_NOTICE, "discovery from conf file already running (%ld)\n",
+		    (long)discovery_conf_task.pid);
+		discovery_conf_task.pending = true;
+		return 0;
+	}
+
 	pid = fork();
 	if (pid == -1) {
 		msg(LOG_ERR, "failed to fork discovery task: %m");
@@ -593,6 +619,9 @@ static int monitor_discover_from_conf_file(void)
 	} else if (pid > 0) {
 		msg(LOG_DEBUG, "started discovery task %ld from conf file\n",
 		    (long)pid);
+		discovery_conf_task.pending = false;
+		discovery_conf_task.running = true;
+		discovery_conf_task.pid = pid;
 		return 0;
 	}
 
@@ -619,6 +648,89 @@ static int discovery_from_conf_file_cb(struct event *ev __attribute__((unused)),
 	return EVENTCB_CLEANUP;
 }
 
+static void handle_inotify_event(struct inotify_event *iev)
+{
+	if ((iev->mask & (IN_CLOSE_WRITE|IN_MOVED_TO)) == 0) {
+		msg(LOG_DEBUG, "ignoring event mask 0x%"PRIx32"\n", iev->mask);
+		return;
+	}
+
+	if (!iev->name || strcmp(iev->name, FILE_NVMF_DISC)) {
+		msg(LOG_DEBUG, "ignoring event mask 0x%"PRIx32" for %s\n",
+		    iev->mask, iev->name ? iev->name : "(null)");
+		return;
+	}
+
+	msg(LOG_INFO, "discovery.conf changed, re-reading\n");
+	monitor_discover_from_conf_file();
+}
+
+static int inotify_cb(struct event *ev, unsigned int ep_events)
+{
+	char buf[sizeof(struct inotify_event) + NAME_MAX + 1];
+	int rc;
+
+	if (ev->reason != REASON_EVENT_OCCURED || (ep_events & EPOLLIN) == 0)
+		return EVENTCB_CONTINUE;
+
+	while (true) {
+		struct inotify_event *iev;
+
+		rc = read(ev->fd, buf, sizeof(buf));
+		if (rc == -1) {
+			if (errno != EAGAIN)
+				msg(LOG_ERR, "error reading from inotify fd: %m\n");
+			return EVENTCB_CONTINUE;
+		}
+
+		iev = (struct inotify_event *)buf;
+		if (iev->mask & (IN_DELETE_SELF|IN_MOVE_SELF)) {
+			if (inotify_rm_watch(ev->fd, iev->wd) == -1)
+				msg(LOG_ERR, "failed to remove watch %d: %m\n",
+				    iev->wd);
+			msg(LOG_WARNING, "inotify watch %d removed\n", iev->wd);
+			return EVENTCB_CLEANUP;
+		}
+		handle_inotify_event(iev);
+	}
+	return EVENTCB_CONTINUE;
+}
+
+static DEFINE_CLEANUP_FUNC(cleanup_event, struct event *, free);
+
+static void add_inotify_event(struct dispatcher *dsp)
+{
+	struct event *inotify_event __cleanup__(cleanup_event) = NULL;
+	int fd __cleanup__(cleanup_fd) = -1;
+	int rc;
+
+	inotify_event = calloc(1, sizeof *inotify_event);
+	if (!inotify_event)
+		return;
+
+	fd = inotify_init1(IN_NONBLOCK|IN_CLOEXEC);
+	if (fd == -1) {
+		msg(LOG_ERR, "failed to init inotify: %m\n");
+		return;
+	}
+
+	*inotify_event = EVENT_ON_HEAP(inotify_cb, fd, EPOLLIN);
+	rc = inotify_add_watch(inotify_event->fd, PATH_NVMF_CFG_DIR,
+			       IN_CLOSE_WRITE|IN_MOVED_TO|
+			       IN_DELETE_SELF|IN_MOVE_SELF);
+	if (rc == -1)
+		msg(LOG_ERR, "failed to add inotify watch for %s: %m\n",
+		    PATH_NVMF_CFG_DIR);
+
+	if ((rc = event_add(dsp, inotify_event)) < 0) {
+		msg(LOG_ERR, "failed to add inotify event: %s\n",
+		    strerror(-rc));
+		return;
+	}
+	fd = -1;
+	inotify_event = NULL;
+}
+
 static int monitor_parse_opts(const char *desc, int argc, char **argv)
 {
 	bool quiet = false;
@@ -735,7 +847,9 @@ int aen_monitor(const char *desc, int argc, char **argv)
 		goto out;
 	}
 
+	add_inotify_event(mon_dsp);
 	conndb_init_from_sysfs();
+
 	ret = event_loop(mon_dsp, &wait_mask, handle_epoll_err);
 
 	conndb_for_each(monitor_kill_discovery_task, NULL);
diff --git a/util/cleanup.c b/util/cleanup.c
index 0d5d910..3101e1a 100644
--- a/util/cleanup.c
+++ b/util/cleanup.c
@@ -1,4 +1,6 @@
 #include <stdlib.h>
+#include <unistd.h>
 #include "cleanup.h"
 
 DEFINE_CLEANUP_FUNC(cleanup_charp, char *, free);
+DEFINE_CLEANUP_FUNC(cleanup_fd, int, close);
diff --git a/util/cleanup.h b/util/cleanup.h
index 89a4984..b039488 100644
--- a/util/cleanup.h
+++ b/util/cleanup.h
@@ -14,5 +14,6 @@ DECLARE_CLEANUP_FUNC(name, type)		\
 }
 
 DECLARE_CLEANUP_FUNC(cleanup_charp, char *);
+DECLARE_CLEANUP_FUNC(cleanup_fd, int);
 
 #endif
-- 
2.29.2



* [PATCH v2 12/16] monitor: add parent/child messaging and "notify" message exchange
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (10 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 11/16] monitor: watch discovery.conf with inotify mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 13/16] monitor: add "query device" " mwilck
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Persistent discovery controllers are set up in forked children,
possibly using recursion via referrals in do_discover(). The simplest
way to keep the parent's connection registry up to date is to communicate
freshly created controllers directly from child to parent.

To make this work, a callback function is passed to do_discover(),
which (if non-null) will be called after setting up a discovery
controller to initiate the message exchange. The callback is then
passed down to connect_ctrl() for recursive discoveries (referrals).
---
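Note: the way the callback travels down through recursive discoveries can be
modelled without any NVMe code. In the sketch below, discover() is a
hypothetical stand-in for do_discover(), and "referrals" simply counts the
nested discovery controllers still to follow. Only the notify member of
struct monitor_callbacks is visible in this part of the series, and its
return type is assumed to be void here. The actual child-to-parent message
exchange, and the check for a persistent or user-supplied device, are
outside the scope of this sketch.

```c
#include <stddef.h>

/* Assumed shape of the callback table; only notify is modelled. */
struct monitor_callbacks {
	void (*notify)(const char *argstr, int instance);
};

/*
 * Hypothetical stand-in for do_discover().
 * Returns the number of discovery controllers that were set up.
 */
static int discover(const char *argstr, int instance, int referrals,
		    const struct monitor_callbacks *monitor)
{
	int n = 1;

	/* Report the freshly created controller, if anybody is listening. */
	if (monitor && monitor->notify)
		monitor->notify(argstr, instance);

	/* Pass the same callbacks down for recursive discoveries. */
	if (referrals > 0)
		n += discover(argstr, instance + 1, referrals - 1, monitor);
	return n;
}

/* Example callback: remember how often it ran and the last instance seen. */
static int notify_count, last_instance = -1;

static void count_notify(const char *argstr, int instance)
{
	(void)argstr;
	notify_count++;
	last_instance = instance;
}
```

Callers that are not the monitor pass NULL, as fabrics_discover() does in
this patch, and then no notification takes place.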
 fabrics.c |  29 ++--
 fabrics.h |   9 +-
 monitor.c | 405 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 monitor.h |   6 +
 4 files changed, 435 insertions(+), 14 deletions(-)

diff --git a/fabrics.c b/fabrics.c
index b195d0e..a9e28d8 100644
--- a/fabrics.c
+++ b/fabrics.c
@@ -50,6 +50,7 @@
 #include "common.h"
 #include "util/log.h"
 #include "util/cleanup.h"
+#include "monitor.h"
 
 #ifdef HAVE_SYSTEMD
 #include <systemd/sd-id128.h>
@@ -1068,7 +1069,8 @@ free_addrinfo:
 	return ret;
 }
 
-static int connect_ctrl(struct nvmf_disc_rsp_page_entry *e)
+static int connect_ctrl(struct nvmf_disc_rsp_page_entry *e,
+			const struct monitor_callbacks *monitor)
 {
 	char argstr[BUF_SIZE], *p;
 	const char *transport;
@@ -1254,7 +1256,7 @@ retry:
 		flags = validate_output_format(fabrics_cfg.output_format);
 		if (flags < 0)
 			flags = NORMAL;
-		ret = do_discover(argstr, true, flags);
+		ret = do_discover(argstr, true, flags, monitor);
 	} else
 		ret = add_ctrl(argstr);
 	if (ret == -EINVAL && e->treq & NVMF_TREQ_DISABLE_SQFLOW) {
@@ -1305,7 +1307,8 @@ static bool should_connect(struct nvmf_disc_rsp_page_entry *entry)
 	return !strncmp(fabrics_cfg.traddr, entry->traddr, len);
 }
 
-static int connect_ctrls(struct nvmf_disc_rsp_page_hdr *log, int numrec)
+static int connect_ctrls(struct nvmf_disc_rsp_page_hdr *log, int numrec,
+			 const struct monitor_callbacks *monitor)
 {
 	int i;
 	int instance;
@@ -1315,7 +1318,7 @@ static int connect_ctrls(struct nvmf_disc_rsp_page_hdr *log, int numrec)
 		if (!should_connect(&log->entries[i]))
 			continue;
 
-		instance = connect_ctrl(&log->entries[i]);
+		instance = connect_ctrl(&log->entries[i], monitor);
 
 		/* clean success */
 		if (instance >= 0)
@@ -1355,7 +1358,8 @@ static void nvmf_get_host_identifiers(int ctrl_instance)
 
 static DEFINE_CLEANUP_FUNC(cleanup_log, struct nvmf_disc_rsp_page_hdr *, free);
 
-int do_discover(char *argstr, bool connect, enum nvme_print_flags flags)
+int do_discover(char *argstr, bool connect, enum nvme_print_flags flags,
+		const struct monitor_callbacks *monitor)
 {
 	struct nvmf_disc_rsp_page_hdr *log __cleanup__(cleanup_log) = NULL;
 	char *dev_name;
@@ -1384,6 +1388,9 @@ int do_discover(char *argstr, bool connect, enum nvme_print_flags flags)
 	}
 	if (instance < 0)
 		return instance;
+	else if (monitor && monitor->notify &&
+		 (fabrics_cfg.device || fabrics_cfg.persistent))
+		monitor->notify(argstr, instance);
 
 	if (asprintf(&dev_name, "/dev/nvme%d", instance) < 0)
 		return -errno;
@@ -1391,6 +1398,7 @@ int do_discover(char *argstr, bool connect, enum nvme_print_flags flags)
 	free(dev_name);
 	if (fabrics_cfg.persistent)
 		msg(LOG_NOTICE, "Persistent device: nvme%d\n", instance);
+
 	if (!fabrics_cfg.device && !fabrics_cfg.persistent) {
 		err = remove_ctrl(instance);
 		if (err)
@@ -1400,7 +1408,7 @@ int do_discover(char *argstr, bool connect, enum nvme_print_flags flags)
 	switch (ret) {
 	case DISC_OK:
 		if (connect)
-			ret = connect_ctrls(log, numrec);
+			ret = connect_ctrls(log, numrec, monitor);
 		else if (fabrics_cfg.raw || flags == BINARY)
 			save_discovery_log(log, numrec);
 		else if (flags == JSON)
@@ -1440,7 +1448,8 @@ int do_discover(char *argstr, bool connect, enum nvme_print_flags flags)
 
 static OPT_ARGS(discover_opts);
 
-int discover_from_conf_file(const char *desc, char *argstr, bool connect)
+int discover_from_conf_file(const char *desc, char *argstr, bool connect,
+			    const struct monitor_callbacks *monitor)
 {
 	FILE *f;
 	char line[256], *ptr, *all_args, *args, **argv;
@@ -1507,7 +1516,7 @@ int discover_from_conf_file(const char *desc, char *argstr, bool connect)
 			goto free_and_continue;
 		}
 
-		err = do_discover(argstr, connect, flags);
+		err = do_discover(argstr, connect, flags, monitor);
 		if (err)
 			ret = err;
 
@@ -1580,7 +1589,7 @@ int fabrics_discover(const char *desc, int argc, char **argv, bool connect)
 	fabrics_cfg.nqn = NVME_DISC_SUBSYS_NAME;
 
 	if (!fabrics_cfg.transport && !fabrics_cfg.traddr) {
-		ret = discover_from_conf_file(desc, argstr, connect);
+		ret = discover_from_conf_file(desc, argstr, connect, NULL);
 	} else {
 		set_discovery_kato(&fabrics_cfg);
 
@@ -1597,7 +1606,7 @@ int fabrics_discover(const char *desc, int argc, char **argv, bool connect)
 		if (ret)
 			goto out;
 
-		ret = do_discover(argstr, connect, flags);
+		ret = do_discover(argstr, connect, flags, NULL);
 	}
 
 out:
diff --git a/fabrics.h b/fabrics.h
index 128f251..f2f19d1 100644
--- a/fabrics.h
+++ b/fabrics.h
@@ -50,11 +50,16 @@ extern const char *const trtypes[];
 #define FILE_NVMF_DISC		"discovery.conf"
 #define PATH_NVMF_DISC		PATH_NVMF_CFG_DIR "/" FILE_NVMF_DISC
 
+struct monitor_callbacks;
+
 int build_options(char *argstr, int max_len, bool discover);
-int do_discover(char *argstr, bool connect, enum nvme_print_flags flags);
+int do_discover(char *argstr, bool connect, enum nvme_print_flags flags,
+		const struct monitor_callbacks *);
 int ctrl_instance(const char *device);
 char *parse_conn_arg(const char *conargs, const char delim, const char *field);
 int remove_ctrl(int instance);
-int discover_from_conf_file(const char *desc, char *argstr, bool connect);
+int discover_from_conf_file(const char *desc, char *argstr, bool connect,
+			    const struct monitor_callbacks *);
+
 
 #endif
diff --git a/monitor.c b/monitor.c
index a1229a7..7f08772 100644
--- a/monitor.c
+++ b/monitor.c
@@ -20,6 +20,7 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <errno.h>
+#include <fcntl.h>
 #include <inttypes.h>
 #include <libudev.h>
 #include <signal.h>
@@ -30,6 +31,8 @@
 #include <sys/wait.h>
 #include <sys/epoll.h>
 #include <sys/inotify.h>
+#include <sys/socket.h>
+#include <sys/un.h>
 
 #include "nvme-status.h"
 #include "nvme.h"
@@ -43,6 +46,13 @@
 #include "util/log.h"
 #include "event/event.h"
 
+#define MSG_SIZE 1024
+#define SOCK_PATH "nvme-monitor"
+static const struct sockaddr_un monitor_sa = {
+	.sun_family = AF_UNIX,
+	.sun_path = "\0" SOCK_PATH
+};
+
 static struct monitor_config {
 	bool autoconnect;
 	bool keep_ctrls;
@@ -161,6 +171,395 @@ static int child_reset_signals(void)
 	return -err;
 }
 
+
+static ssize_t monitor_child_message(char *buf, size_t size, size_t len)
+{
+	int fd __cleanup__(cleanup_fd) = -1;
+	struct sockaddr_un clt_addr = { .sun_family = AF_UNIX, };
+	ssize_t rc;
+
+	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	if (fd == -1) {
+		msg(LOG_ERR, "failed to create socket: %m\n");
+		return -errno;
+	}
+
+	snprintf(&clt_addr.sun_path[1], sizeof(clt_addr.sun_path) - 1,
+		 SOCK_PATH ".%ld", (long)getpid());
+
+	if ((rc = bind(fd, (struct sockaddr *)&clt_addr, sizeof(clt_addr))) == -1) {
+		msg(LOG_ERR, "failed in bind(): %m\n");
+		return -errno;
+	}
+
+	if ((rc = sendto(fd, buf, len, 0,
+			 (struct sockaddr *)&monitor_sa, sizeof(monitor_sa))) == -1) {
+		msg(LOG_ERR, "failed to send client message: %m\n");
+		return -errno;
+	}
+	msg(LOG_DEBUG, "sent %zd bytes to server\n", rc);
+
+	memset(buf, 0, size);
+	if ((rc = recv(fd, buf, size, MSG_TRUNC)) == -1) {
+		msg(LOG_ERR, "failed to receive response: %m\n");
+		return -errno;
+	} else if (rc >= size) {
+		msg(LOG_ERR, "response truncated: %zu bytes missing\n",
+		    rc - (size - 1));
+		return -EOVERFLOW;
+	}
+
+	return rc;
+}
+
+#define safe_snprintf(var, size, format, args...)			\
+({									\
+	size_t __size = size;						\
+	int __ret;							\
+									\
+	__ret = snprintf(var, __size, format, ##args);			\
+	__ret < 0 || (size_t)__ret < __size ? __ret : -EOVERFLOW;	\
+})
+
+/*
+ * Monitor parent <-> child message exchange protocol
+ *
+ * Every exchange consists of a single message sent from child (discovery
+ * process) to parent (monitor main program) and a single response from
+ * the parent to the child.
+ *
+ * "New discovery controller" exchange:
+ *    - The child sends a MON_MSG_NEW message to the parent after establishing
+ *      the connection to a new persistent discovery controller.
+ *      Payload: the instance number and the connection parameter string
+ *      as sent to /dev/nvme-fabrics.
+ *      This exchange is initiated in notify_new_discovery(), which is passed
+ *      as "notify" callback for do_discover().
+ *    - parent responds with MON_MSG_ACK (or MON_MSG_ERR if an error occurs).
+ */
+
+static const char monitor_magic[] = "NVMM";
+enum {
+	MON_MSG_ACK = 0,
+	MON_MSG_ERR,
+	MON_MSG_NEW,
+	__MAX_MON_MSG__,
+};
+
+enum {
+	MAGIC_LEN = 4,
+	OPCODE_LEN = 4,
+	HEADER_LEN = MAGIC_LEN + OPCODE_LEN,
+};
+
+static const char *const monitor_opcode[] = {
+	[MON_MSG_ACK] = "ACK ",
+	[MON_MSG_ERR] = "ERR ",
+	[MON_MSG_NEW] = "NEW ",
+};
+
+static int monitor_msg_hdr(char *buf, size_t len, int opcode)
+{
+	memset(buf, 0, len);
+	return safe_snprintf(buf, len, "%s%s",
+			     monitor_magic, monitor_opcode[opcode]);
+}
+
+static int monitor_check_hdr(const char *buf, size_t len, int *opcode)
+{
+	int i;
+
+	if (len < HEADER_LEN) {
+		msg(LOG_ERR, "short packet\n");
+		return -EINVAL;
+	}
+
+	if (memcmp(buf, monitor_magic, MAGIC_LEN) != 0) {
+		msg(LOG_ERR, "bad magic\n");
+		return -EINVAL;
+	}
+
+	buf += MAGIC_LEN;
+	for (i = 0; i < ARRAY_SIZE(monitor_opcode); i ++) {
+		if (memcmp(buf, monitor_opcode[i], OPCODE_LEN) == 0)
+			break;
+	}
+
+	if (i == ARRAY_SIZE(monitor_opcode)) {
+		msg(LOG_ERR, "invalid opcode\n");
+		return -EINVAL;
+	}
+
+	*opcode = i;
+	return HEADER_LEN;
+}
+
+static int monitor_ack_msg(char *buf, size_t len)
+{
+	return monitor_msg_hdr(buf, len, MON_MSG_ACK);
+}
+
+static __attribute__((unused))
+int monitor_err_msg(char *buf, size_t len)
+{
+	return monitor_msg_hdr(buf, len, MON_MSG_ERR);
+}
+
+static int monitor_check_resp(const char *buf, size_t len, int req_opcode)
+{
+	int resp_opcode, rc, done;
+
+	if ((done = monitor_check_hdr(buf, len, &resp_opcode)) < 0)
+		return done;
+
+	buf += done;
+	len -= done;
+	rc = -EINVAL;
+
+	switch (req_opcode) {
+	case MON_MSG_NEW:
+		if (resp_opcode == MON_MSG_ACK && len == 0)
+			rc = 0;
+		break;
+	default:
+		break;
+	}
+
+	msg(rc == 0 ? LOG_DEBUG : LOG_ERR,
+	    "%s response: %s => %s, len=%zu\n",
+	    rc == 0 ? "good" : "bad",
+	    monitor_opcode[req_opcode], monitor_opcode[resp_opcode], len);
+
+	return rc == 0 ? done : rc;
+}
+
+static void notify_new_discovery(const char *argstr, int instance)
+{
+	char buf[MSG_SIZE];
+	size_t len = 0;
+	ssize_t rc;
+
+	if ((rc = monitor_msg_hdr(buf, sizeof(buf), MON_MSG_NEW)) < 0) {
+		msg(LOG_ERR, "failed to create msghdr: %s\n", strerror(-rc));
+		return;
+	}
+	len += rc;
+
+	if ((rc = safe_snprintf(buf + len, sizeof(buf) - len, "%d %s",
+				instance, argstr)) < 0) {
+		msg(LOG_ERR, "failed to create msg: %s\n", strerror(-rc));
+		return;
+	}
+	len += rc;
+
+	if ((rc = monitor_child_message(buf, sizeof(buf), len)) < 0)
+		return;
+
+	monitor_check_resp(buf, rc, MON_MSG_NEW);
+}
+
+static const struct monitor_callbacks discover_callbacks = {
+	.notify = notify_new_discovery,
+};
+
+struct comm_event {
+	struct event e;
+	struct sockaddr_un addr;
+	char message[MSG_SIZE];
+	int msglen;
+};
+
+static int handle_child_msg_new(char *buf, size_t size, ssize_t *len, ssize_t ofs)
+{
+	int rc, instance, n;
+	struct nvme_connection *co = NULL;
+
+	if (*len - ofs < 2) {
+		msg(LOG_ERR, "short packet (len=%zd)\n", *len);
+		return MON_MSG_ERR;
+	}
+	buf += ofs;
+	if (sscanf(buf, "%d %n", &instance, &n) != 1) {
+		msg(LOG_ERR, "no instance number found\n");
+		return MON_MSG_ERR;
+	}
+	buf += n;
+
+	rc = conndb_add_disc_ctrl(buf, &co);
+	if (rc == 0 || rc == -EEXIST) {
+		if (co->discovery_instance != instance) {
+			co->discovery_instance = instance;
+			conn_msg(LOG_INFO, co,
+				 "discovery instance set to %d\n", instance);
+		} else
+			conn_msg(LOG_DEBUG, co, "discovery instance unchanged\n");
+	} else
+		msg(LOG_ERR, "failed to add connection: %s\n", strerror(-rc));
+
+	return MON_MSG_ACK;
+}
+
+static int handle_child_msg(struct comm_event *comm, ssize_t len)
+{
+	ssize_t rc, ofs;
+	int opcode = MON_MSG_ERR;
+	char *buf =  comm->message;
+
+	msg(LOG_DEBUG, "got message from %s: %s\n",
+	    &comm->addr.sun_path[1], buf);
+
+	if ((ofs = monitor_check_hdr(comm->message, len,
+				     &opcode)) < 0)
+		rc = MON_MSG_ERR;
+	else {
+		switch (opcode) {
+		case MON_MSG_NEW:
+			rc = handle_child_msg_new(comm->message,
+						  sizeof(comm->message),
+						  &len, ofs);
+			break;
+		case MON_MSG_ACK:
+		case MON_MSG_ERR:
+			msg(LOG_ERR, "unexpected message: %s\n", monitor_opcode[opcode]);
+			rc = MON_MSG_ERR;
+			break;
+		default:
+			msg(LOG_ERR, "bogus message\n");
+			rc = MON_MSG_ERR;
+			break;
+		};
+	}
+
+	switch (rc) {
+	case MON_MSG_ACK:
+		if ((rc = monitor_ack_msg(comm->message, sizeof(comm->message))) > 0)
+			len = rc;
+		break;
+	case MON_MSG_ERR:
+		if ((rc = monitor_err_msg(comm->message, sizeof(comm->message))) > 0)
+			len = rc;
+		break;
+	default:
+		/* other messages must be filled in by handlers above */
+		break;
+	}
+	if (rc < 0)
+		msg(LOG_ERR, "failed to create response\n");
+	else {
+		comm->msglen = len;
+		msg(LOG_DEBUG, "response (%zd): %s\n", len, comm->message);
+	}
+	return rc;
+}
+
+static int parent_comm_cb(struct event *evt, uint32_t events)
+{
+	struct comm_event *comm = container_of(evt, struct comm_event, e);
+	ssize_t rc;
+
+	if (events & EPOLLHUP) {
+		msg(LOG_WARNING, "socket disconnect\n");
+		return EVENTCB_CLEANUP;
+
+	} else if (events & EPOLLOUT) {
+		rc = sendto(evt->fd, comm->message, comm->msglen, 0,
+			    (struct sockaddr *)&comm->addr, sizeof(comm->addr));
+		if (rc == -1) {
+			msg(LOG_ERR, "sendto: %m\n");
+			return EVENTCB_CLEANUP;
+		}
+		evt->ep.events = EPOLLIN|EPOLLHUP;
+
+	} else if (events & EPOLLIN) {
+		socklen_t len;
+
+		memset(&comm->addr, 0, sizeof(comm->addr));
+		len = sizeof(comm->addr);
+		rc = recvfrom(evt->fd, comm->message, sizeof(comm->message),
+			      MSG_TRUNC, (struct sockaddr*)&comm->addr, &len);
+		if (rc <= 0) {
+			msg(LOG_ERR, "error receiving child message: %m\n");
+			return EVENTCB_CONTINUE;
+		} else if (rc >= sizeof(comm->message)) {
+			msg(LOG_ERR, "child message truncated: %zd bytes missing\n",
+			    rc - (sizeof(comm->message) - 1));
+			return EVENTCB_CONTINUE;
+		}
+		if (handle_child_msg(comm, rc) < 0)
+			return EVENTCB_CONTINUE;
+
+		evt->ep.events = EPOLLOUT|EPOLLHUP;
+	}
+
+	if ((rc = event_modify(evt)) < 0) {
+		msg(LOG_ERR, "event_modify: %s\n", strerror(-rc));
+		return EVENTCB_CLEANUP;
+	}
+
+	return EVENTCB_CONTINUE;
+}
+
+static int set_socketflags(int fd)
+{
+	int flags;
+
+	if ((flags = fcntl(fd, F_GETFL, 0)) == -1) {
+		msg(LOG_ERR, "F_GETFL failed: %m\n");
+		return -errno;
+	}
+	if (fcntl(fd, F_SETFL, flags|O_NONBLOCK) == -1) {
+		msg(LOG_ERR, "F_SETFL failed: %m\n");
+		return -errno;
+	}
+	if ((flags = fcntl(fd, F_GETFD, 0)) == -1) {
+		msg(LOG_ERR, "F_GETFD failed: %m\n");
+		return -errno;
+	}
+	if (fcntl(fd, F_SETFD, flags|FD_CLOEXEC) == -1) {
+		msg(LOG_ERR, "F_SETFD failed: %m\n");
+		return -errno;
+	}
+	return 0;
+}
+
+static DEFINE_CLEANUP_FUNC(cleanup_comm, struct comm_event *, free);
+
+static void add_parent_comm_event(struct dispatcher *dsp)
+{
+	struct comm_event *comm __cleanup__(cleanup_comm) = NULL;
+	int fd __cleanup__(cleanup_fd) = -1;
+	int rc;
+
+	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
+	if (fd == -1) {
+		msg(LOG_ERR, "failed to create socket: %m\n");
+		return;
+	}
+
+	if ((rc = set_socketflags(fd)) < 0)
+		return;
+
+	if (bind(fd, (struct sockaddr *)&monitor_sa,
+		 sizeof(monitor_sa)) == -1) {
+		msg(LOG_ERR, "bind() failed: %m\n");
+		return;
+	}
+
+	comm = calloc(1, sizeof(*comm));
+	if (!comm)
+		return;
+
+	comm->e = EVENT_ON_HEAP(parent_comm_cb, fd, EPOLLIN);
+
+	if ((rc = event_add(dsp, &comm->e)) < 0) {
+		msg(LOG_ERR, "failed to add child communication event: %s\n",
+		    strerror(-rc));
+		return;
+	}
+	fd = -1;
+	comm = NULL;
+}
+
 static void monitor_handle_nvme_add(struct udev_device *ud)
 {
 	const char *syspath = udev_device_get_syspath(ud);
@@ -311,7 +710,7 @@ static int monitor_discovery(const char *transport, const char *traddr,
 
 	rc = build_options(argstr, sizeof(argstr), true);
 	msg(LOG_DEBUG, "%s\n", argstr);
-	rc = do_discover(argstr, mon_cfg.autoconnect, NORMAL);
+	rc = do_discover(argstr, mon_cfg.autoconnect, NORMAL, &discover_callbacks);
 
 	free(device);
 	exit(-rc);
@@ -634,7 +1033,8 @@ static int monitor_discover_from_conf_file(void)
 	fabrics_cfg.persistent = true;
 
 	rc = discover_from_conf_file("Discover NVMeoF subsystems from " PATH_NVMF_DISC,
-				     argstr, mon_cfg.autoconnect);
+				     argstr, mon_cfg.autoconnect,
+				     &discover_callbacks);
 
 	exit(-rc);
 	/* not reached */
@@ -848,6 +1248,7 @@ int aen_monitor(const char *desc, int argc, char **argv)
 	}
 
 	add_inotify_event(mon_dsp);
+	add_parent_comm_event(mon_dsp);
 	conndb_init_from_sysfs();
 
 	ret = event_loop(mon_dsp, &wait_mask, handle_epoll_err);
diff --git a/monitor.h b/monitor.h
index e79d3a6..01ae4de 100644
--- a/monitor.h
+++ b/monitor.h
@@ -1,6 +1,12 @@
 #ifndef _MONITOR_H
 #define _MONITOR_H
 
+typedef void (*disc_notify_cb)(const char *argstr, int instance);
+
+struct monitor_callbacks {
+	disc_notify_cb notify;
+};
+
 extern int aen_monitor(const char *desc, int argc, char **argv);
 
 #endif
-- 
2.29.2
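
For readers following the framing used in this patch: every datagram starts
with the 4-byte magic "NVMM" followed by a 4-byte, space-padded opcode, and
the payload (if any) follows as text. Below is a minimal standalone sketch of
that framing under those assumptions. It is not the patch's code; the names
mon_build_hdr() and mon_parse_hdr() are hypothetical and only illustrate the
build/check pair.

```c
#include <stdio.h>
#include <string.h>

enum { MON_ACK, MON_ERR, MON_NEW, MON_NUM_OPS };
enum { MAGIC_LEN = 4, OPCODE_LEN = 4, HEADER_LEN = MAGIC_LEN + OPCODE_LEN };

static const char mon_magic[] = "NVMM";
static const char *const mon_opcode[MON_NUM_OPS] = {
	[MON_ACK] = "ACK ",
	[MON_ERR] = "ERR ",
	[MON_NEW] = "NEW ",
};

/*
 * Write magic + opcode into buf. Returns the header length, or -1 if
 * the opcode is unknown or buf cannot hold the header plus a NUL byte.
 */
static int mon_build_hdr(char *buf, size_t size, int opcode)
{
	int n;

	if (opcode < 0 || opcode >= MON_NUM_OPS)
		return -1;
	n = snprintf(buf, size, "%s%s", mon_magic, mon_opcode[opcode]);
	return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/*
 * Validate the header of a received message of len bytes (the number of
 * bytes actually received, not the buffer capacity). On success, stores
 * the opcode and returns HEADER_LEN; returns -1 otherwise.
 */
static int mon_parse_hdr(const char *buf, size_t len, int *opcode)
{
	int i;

	if (len < HEADER_LEN || memcmp(buf, mon_magic, MAGIC_LEN) != 0)
		return -1;
	for (i = 0; i < MON_NUM_OPS; i++) {
		if (memcmp(buf + MAGIC_LEN, mon_opcode[i], OPCODE_LEN) == 0) {
			*opcode = i;
			return HEADER_LEN;
		}
	}
	return -1;
}
```

A "NEW" message for instance 3 would then be the header followed by
"3 <connection parameters>", and the receiver skips HEADER_LEN bytes before
parsing the payload.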


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* [PATCH v2 13/16] monitor: add "query device" message exchange
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (11 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 12/16] monitor: add parent/child messaging and "notify" message exchange mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 14/16] completions: add completions for nvme monitor mwilck
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

This message exchange sends a message with the current connection
parameters to the parent monitor process and attempts to obtain
the instance number of an existing discovery controller for the
given connection.

To make this work, another callback is added to struct monitor_callbacks,
to be called before attempting a discovery connection.
---
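
To illustrate the child side of this exchange: the reply to a "QDEV" request
is either a bare "ACK " (no existing controller) or "SDEV" followed by the
instance number as decimal text. The sketch below shows one way such a reply
could be turned into a device name. It is a simplified, self-contained
illustration; parse_qdev_response() is a hypothetical helper, not a function
from this patch.

```c
#include <stdio.h>
#include <string.h>

#define HEADER_LEN 8	/* 4-byte magic "NVMM" + 4-byte opcode */

/*
 * Returns 1 and fills dev (e.g. "nvme3") for an SDEV reply,
 * 0 for a plain ACK (no existing controller), -1 for anything else.
 * resp must hold len message bytes followed by a NUL terminator.
 */
static int parse_qdev_response(const char *resp, size_t len,
			       char *dev, size_t devsize)
{
	int instance, n;
	char trailing;

	if (len < HEADER_LEN || memcmp(resp, "NVMM", 4) != 0)
		return -1;
	if (memcmp(resp + 4, "ACK ", 4) == 0)
		return len == HEADER_LEN ? 0 : -1;
	if (memcmp(resp + 4, "SDEV", 4) != 0 || len == HEADER_LEN)
		return -1;

	/* Accept exactly one integer with nothing after it */
	if (sscanf(resp + HEADER_LEN, "%d%c", &instance, &trailing) != 1 ||
	    instance < 0)
		return -1;

	n = snprintf(dev, devsize, "nvme%d", instance);
	return (n < 0 || (size_t)n >= devsize) ? -1 : 1;
}
```

The "%d%c" conversion with a check for a return value of exactly 1 rejects
trailing garbage after the number, which is the same idea query_device() uses
in the diff below.
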
 fabrics.c |   4 ++
 monitor.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 monitor.h |   2 +
 3 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/fabrics.c b/fabrics.c
index a9e28d8..64468db 100644
--- a/fabrics.c
+++ b/fabrics.c
@@ -1376,6 +1376,10 @@ int do_discover(char *argstr, bool connect, enum nvme_print_flags flags,
 		free(fabrics_cfg.device);
 		fabrics_cfg.device = NULL;
 	}
+
+	if (!fabrics_cfg.device && monitor && monitor->query_dev)
+		monitor->query_dev(argstr, &fabrics_cfg.device);
+
 	if (!fabrics_cfg.device)
 		fabrics_cfg.device = find_ctrl_with_connectargs(cargs);
 	free_connect_args(cargs);
diff --git a/monitor.c b/monitor.c
index 7f08772..f2d9512 100644
--- a/monitor.c
+++ b/monitor.c
@@ -236,6 +236,19 @@ static ssize_t monitor_child_message(char *buf, size_t size, size_t len)
  *      This exchange is initiated in notify_new_discovery(), which is passed
  *      as "notify" callback for do_discover().
  *    - parent responds with MON_MSG_ACK (or MON_MSG_ERR if an error occurs).
+ *
+ * "Query existing device" exchange:
+ *    - The child sends a MON_MSG_QDEV message to the parent after determining
+ *      transport connection parameters, but before attempting to create a
+ *      discovery controller.
+ *      Payload: the transport parameter string to be sent to /dev/nvme-fabrics.
+ *      This exchange is initiated in query_device(), which is passed as
+ *      "query_dev" callback for discover_from_conf_file().
+ *    - The parent responds:
+ *       * MON_MSG_SDEV ("send device") if an existing controller device was found
+ *         Payload: the instance number of the controller ("0" for "/dev/nvme0").
+ *       * MON_MSG_ACK: if no existing controller device was found.
+ *       * MON_MSG_ERR: in case of an error.
  */
 
 static const char monitor_magic[] = "NVMM";
@@ -243,6 +256,8 @@ enum {
 	MON_MSG_ACK = 0,
 	MON_MSG_ERR,
 	MON_MSG_NEW,
+	MON_MSG_QDEV,
+	MON_MSG_SDEV,
 	__MAX_MON_MSG__,
 };
 
@@ -256,6 +271,8 @@ static const char *const monitor_opcode[] = {
 	[MON_MSG_ACK] = "ACK ",
 	[MON_MSG_ERR] = "ERR ",
 	[MON_MSG_NEW] = "NEW ",
+	[MON_MSG_QDEV] = "QDEV",
+	[MON_MSG_SDEV] = "SDEV",
 };
 
 static int monitor_msg_hdr(char *buf, size_t len, int opcode)
@@ -321,6 +338,11 @@ static int monitor_check_resp(const char *buf, size_t len, int req_opcode)
 		if (resp_opcode == MON_MSG_ACK && len == 0)
 			rc = 0;
 		break;
+	case MON_MSG_QDEV:
+		if ((resp_opcode == MON_MSG_ACK && len == 0) ||
+		    (resp_opcode == MON_MSG_SDEV && len > 0))
+			rc = 0;
+		break;
 	default:
 		break;
 	}
@@ -358,10 +380,54 @@ static void notify_new_discovery(const char *argstr, int instance)
 	monitor_check_resp(buf, rc, MON_MSG_NEW);
 }
 
+static void query_device(const char *argstr, char **device)
+{
+	char buf[MSG_SIZE];
+	size_t len = 0;
+	ssize_t rc;
+	int instance;
+	char dummy;
+	char *pbuf, *dev;
+
+	if ((rc = monitor_msg_hdr(buf, sizeof(buf), MON_MSG_QDEV)) < 0) {
+		msg(LOG_ERR, "failed to create msghdr: %s\n", strerror(-rc));
+		return;
+	}
+	len += rc;
+	if ((rc = safe_snprintf(buf + len, sizeof(buf) - len, "%s", argstr)) < 0) {
+		msg(LOG_ERR, "failed to create msg: %s\n", strerror(-rc));
+		return;
+	}
+	len += rc;
+	if ((rc = monitor_child_message(buf, sizeof(buf), len)) < 0)
+		return;
+
+	len = rc;
+	pbuf = buf;
+	if ((rc = monitor_check_resp(pbuf, len, MON_MSG_QDEV)) < 0)
+		return;
+
+	pbuf += rc;
+	len -= rc;
+	if (len == 0) {
+		msg(LOG_INFO, "monitor didn't report existing device\n");
+		return;
+	} else if (sscanf(pbuf, "%d%c", &instance, &dummy) != 1) {
+		msg(LOG_WARNING, "got bad device info: %s\n", pbuf);
+		return;
+	}
+
+	if (asprintf(&dev, "nvme%d", instance) < 0)
+		return;
+
+	msg(LOG_INFO, "monitor reported existing device %s\n", dev);
+	*device = dev;
+}
+
 static const struct monitor_callbacks discover_callbacks = {
 	.notify = notify_new_discovery,
+	.query_dev = query_device,
 };
-
 struct comm_event {
 	struct event e;
 	struct sockaddr_un addr;
@@ -399,6 +465,41 @@ static int handle_child_msg_new(char *buf, size_t size, ssize_t *len, ssize_t of
 	return MON_MSG_ACK;
 }
 
+static int handle_child_msg_qdev(char *buf, size_t size, ssize_t *len, ssize_t ofs)
+{
+	ssize_t rc = MON_MSG_ERR;
+	struct nvme_connection *co;
+	char *pbuf = buf;
+
+	if (*len <= ofs) {
+		msg(LOG_ERR, "short packet (len=%zd)\n", *len);
+		return MON_MSG_ERR;
+	}
+
+	pbuf += ofs;
+	rc = conndb_add_disc_ctrl(pbuf, &co);
+	if (rc != 0 && rc != -EEXIST) {
+		msg(LOG_WARNING, "invalid address: \"%s\"\n", buf);
+		return MON_MSG_ERR;
+	}
+
+	if (co->discovery_instance != -1) {
+		rc = monitor_msg_hdr(buf, size, MON_MSG_SDEV);
+		if (rc >= 0) {
+			buf += rc;
+			if ((rc = snprintf(buf, size - rc, "%d",
+					   co->discovery_instance)) >= 0) {
+				*len = ofs + rc;
+				return MON_MSG_SDEV;
+			}
+		}
+		msg(LOG_ERR, "failed to create SDEV message: %s\n",
+		    strerror(-rc));
+	}
+
+	return MON_MSG_ACK;
+}
+
 static int handle_child_msg(struct comm_event *comm, ssize_t len)
 {
 	ssize_t rc, ofs;
@@ -418,8 +519,14 @@ static int handle_child_msg(struct comm_event *comm, ssize_t len)
 						  sizeof(comm->message),
 						  &len, ofs);
 			break;
+		case MON_MSG_QDEV:
+			rc = handle_child_msg_qdev(comm->message,
+						   sizeof(comm->message),
+						   &len, ofs);
+			break;
 		case MON_MSG_ACK:
 		case MON_MSG_ERR:
+		case MON_MSG_SDEV:
 			msg(LOG_ERR, "unexpected message: %s\n", monitor_opcode[opcode]);
 			rc = MON_MSG_ERR;
 			break;
diff --git a/monitor.h b/monitor.h
index 01ae4de..3c2f6da 100644
--- a/monitor.h
+++ b/monitor.h
@@ -2,9 +2,11 @@
 #define _MONITOR_H
 
 typedef void (*disc_notify_cb)(const char *argstr, int instance);
+typedef void (*disc_query_dev_cb)(const char *argstr, char **device);
 
 struct monitor_callbacks {
 	disc_notify_cb notify;
+	disc_query_dev_cb query_dev;
 };
 
 extern int aen_monitor(const char *desc, int argc, char **argv);
-- 
2.29.2
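
As a side note on the callback design: the callbacks struct is optional, and
so is each member, so plain "nvme discover" can pass NULL and keep its old
behaviour. The following is a rough sketch of that lookup order, assuming a
simplified struct and a placeholder string where the real code would search
sysfs. All names here (struct callbacks, pick_device(), example_query()) are
hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct callbacks {
	void (*notify)(const char *argstr, int instance);
	void (*query_dev)(const char *argstr, char **device);
};

/* Small helper: heap copy of a string (caller frees) */
static char *dup_str(const char *s)
{
	char *p = malloc(strlen(s) + 1);

	if (p)
		strcpy(p, s);
	return p;
}

/*
 * Ask the monitor first (if a callback was supplied), and fall back to
 * the slower lookup only if it did not report a device.
 */
static char *pick_device(const char *argstr, const struct callbacks *cb)
{
	char *device = NULL;

	if (cb && cb->query_dev)
		cb->query_dev(argstr, &device);
	if (!device)
		device = dup_str("from-sysfs");	/* placeholder for a sysfs search */
	return device;
}

/* Example callback: pretend the monitor knows exactly one controller */
static void example_query(const char *argstr, char **device)
{
	if (strstr(argstr, "traddr=10.0.0.1"))
		*device = dup_str("nvme7");
}

static const struct callbacks example_cb = {
	.query_dev = example_query,
};
```

Leaving .notify unset in example_cb is deliberate: callers are expected to
check each member before calling it, as do_discover() does in this series.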



* [PATCH v2 14/16] completions: add completions for nvme monitor
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (12 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 13/16] monitor: add "query device" " mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 15/16] nvmf-autoconnect: add unit file for nvme-monitor.service mwilck
  2021-03-06  0:36 ` [PATCH v2 16/16] nvme-monitor(1): add man page for nvme-monitor mwilck
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Add some completions for the monitor command.
---
 completions/bash-nvme-completion.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/completions/bash-nvme-completion.sh b/completions/bash-nvme-completion.sh
index 7ab4226..f142981 100644
--- a/completions/bash-nvme-completion.sh
+++ b/completions/bash-nvme-completion.sh
@@ -12,7 +12,7 @@ _cmds="list id-ctrl id-ns list-ns id-iocs nvm-id-ctrl create-ns delete-ns \
 	write-uncor copy reset subsystem-reset show-regs discover \
 	connect-all connect disconnect version help \
 	intel lnvm memblaze list-subsys endurance-event-agg-log \
-	lba-status-log"
+	lba-status-log monitor"
 
 nvme_list_opts () {
         local opts=""
@@ -242,6 +242,10 @@ nvme_list_opts () {
 		"disconnect")
 		opts+=" --nqn -n --device -d"
 			;;
+		"monitor")
+		opts+=" --no-connect -N --cleanup -C \
+			--hostnqn= -q --verbose -v --timestamps -t"
+			;;
 		"version")
 		opts+=""
 			;;
-- 
2.29.2



* [PATCH v2 15/16] nvmf-autoconnect: add unit file for nvme-monitor.service
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (13 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 14/16] completions: add completions for nvme monitor mwilck
@ 2021-03-06  0:36 ` mwilck
  2021-03-06  0:36 ` [PATCH v2 16/16] nvme-monitor(1): add man page for nvme-monitor mwilck
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Running both this service and the udev-based nvmf autoconnect service
is discouraged. A "Conflicts" directive against nvmf-connect@.service
wouldn't work, because it would cause the monitor to be stopped as soon
as an event arrives. Therefore, mask the udev rules that would start
the parallel discovery.
---
 nvmf-autoconnect/systemd/nvme-monitor.service | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
 create mode 100644 nvmf-autoconnect/systemd/nvme-monitor.service

diff --git a/nvmf-autoconnect/systemd/nvme-monitor.service b/nvmf-autoconnect/systemd/nvme-monitor.service
new file mode 100644
index 0000000..5666761
--- /dev/null
+++ b/nvmf-autoconnect/systemd/nvme-monitor.service
@@ -0,0 +1,18 @@
+[Unit]
+Description=NVMe Event Monitor for Automatic Subsystem Connection
+Documentation=man:nvme-monitor(1)
+DefaultDependencies=false
+RequiresMountsFor=/sys /run /dev
+Conflicts=shutdown.target
+After=systemd-udevd-kernel.socket
+Before=sysinit.target systemd-udev-trigger.service nvmefc-boot-connections.service
+
+[Service]
+Type=simple
+ExecStartPre=-/usr/bin/ln -sf /dev/null /run/udev/rules.d/70-nvmf-autoconnect.rules
+ExecStart=/usr/sbin/nvme monitor
+ExecStartPost=-/usr/bin/rm -f /run/udev/rules.d/70-nvmf-autoconnect.rules
+KillMode=mixed
+
+[Install]
+WantedBy=sysinit.target
-- 
2.29.2



* [PATCH v2 16/16] nvme-monitor(1): add man page for nvme-monitor
  2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
                   ` (14 preceding siblings ...)
  2021-03-06  0:36 ` [PATCH v2 15/16] nvmf-autoconnect: add unit file for nvme-monitor.service mwilck
@ 2021-03-06  0:36 ` mwilck
  15 siblings, 0 replies; 20+ messages in thread
From: mwilck @ 2021-03-06  0:36 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Add documentation for the nvme-monitor command.
---
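
The man page in this patch states that --verbose overrides --silent and that
--debug overrides both. A small sketch of that precedence, mapped to syslog
levels, is shown here. The default of LOG_NOTICE is an assumption on my part,
and monitor_log_level() is a hypothetical helper rather than the actual
option parsing code.

```c
#include <stdbool.h>
#include <syslog.h>

/*
 * Later checks win, so the most talkative option given takes effect:
 * --debug beats --verbose, which beats --silent.
 */
static int monitor_log_level(bool silent, bool verbose, bool debug)
{
	int level = LOG_NOTICE;	/* assumed default */

	if (silent)
		level = LOG_WARNING;
	if (verbose)
		level = LOG_INFO;
	if (debug)
		level = LOG_DEBUG;
	return level;
}
```
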
 Documentation/cmds-main.txt     |    4 +
 Documentation/nvme-monitor.1    |  180 ++++++
 Documentation/nvme-monitor.html | 1018 +++++++++++++++++++++++++++++++
 Documentation/nvme-monitor.txt  |  144 +++++
 4 files changed, 1346 insertions(+)
 create mode 100644 Documentation/nvme-monitor.1
 create mode 100644 Documentation/nvme-monitor.html
 create mode 100644 Documentation/nvme-monitor.txt

diff --git a/Documentation/cmds-main.txt b/Documentation/cmds-main.txt
index 46df03d..d058a54 100644
--- a/Documentation/cmds-main.txt
+++ b/Documentation/cmds-main.txt
@@ -168,3 +168,7 @@ linknvme:nvme-disconnect-all[1]::
 
 linknvme:nvme-get-property[1]::
 	Reads and shows NVMe-over-Fabrics controller property
+
+linknvme:nvme-monitor[1]::
+	Monitor Discovery events and Discover and Connect automatically
+
diff --git a/Documentation/nvme-monitor.1 b/Documentation/nvme-monitor.1
new file mode 100644
index 0000000..c87988e
--- /dev/null
+++ b/Documentation/nvme-monitor.1
@@ -0,0 +1,180 @@
+'\" t
+.\"     Title: nvme-monitor
+.\"    Author: [FIXME: author] [see http://www.docbook.org/tdg5/en/html/author]
+.\" Generator: DocBook XSL Stylesheets vsnapshot <http://docbook.sf.net/>
+.\"      Date: 02/25/2021
+.\"    Manual: NVMe Manual
+.\"    Source: NVMe
+.\"  Language: English
+.\"
+.TH "NVME\-MONITOR" "1" "02/25/2021" "NVMe" "NVMe Manual"
+.\" -----------------------------------------------------------------
+.\" * Define some portability stuff
+.\" -----------------------------------------------------------------
+.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.\" http://bugs.debian.org/507673
+.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
+.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\" -----------------------------------------------------------------
+.\" * set default formatting
+.\" -----------------------------------------------------------------
+.\" disable hyphenation
+.nh
+.\" disable justification (adjust text to left margin only)
+.ad l
+.\" -----------------------------------------------------------------
+.\" * MAIN CONTENT STARTS HERE *
+.\" -----------------------------------------------------------------
+.SH "NAME"
+nvme-monitor \- Monitor Discovery events and Discover and Connect automatically
+.SH "SYNOPSIS"
+.sp
+.nf
+\fInvme monitor\fR
+                [\-\-no\-connect             | \-N]
+                [\-\-hostnqn=<hostnqn>      | \-q <hostnqn>]
+                [\-\-hostid=<hostid>        | \-I <hostid>]
+                [\-\-keep\-alive\-tmo=<#>     | \-k <#>]
+                [\-\-reconnect\-delay=<#>    | \-c <#>]
+                [\-\-ctrl\-loss\-tmo=<#>      | \-l <#>]
+                [\-\-hdr\-digest             | \-g]
+                [\-\-data\-digest            | \-G]
+                [\-\-nr\-io\-queues=<#>       | \-i <#>]
+                [\-\-nr\-write\-queues=<#>    | \-W <#>]
+                [\-\-nr\-poll\-queues=<#>     | \-P <#>]
+                [\-\-queue\-size=<#>         | \-Q <#>]
+                [\-\-matching               | \-m]
+                [\-\-persistent             | \-p]
+                [\-\-silent                 | \-S]
+                [\-\-verbose                | \-v]
+                [\-\-debug                  | \-D]
+                [\-\-timestamps             | \-t]
+.fi
+.SH "DESCRIPTION"
+.sp
+Listen to Discovery events (Asynchronous Event Notifications, AENs) on NVMe\-over\-Fabrics (NVMeoF) Discovery Controllers and for other events related to NVMeoF Discovery, and optionally connect to newly discovered controllers\&.
+.sp
+If no parameters are given, then \fInvme monitor\fR listens to Discovery\-related udev events (uevents)\&. If an event is received, it connects to the Discovery Controller and performs the equivalent of an \fInvme connect\-all\fR on the associated transport address\&. When run through a systemd service in this mode, the monitor can be used as an alternative to the udev\-rule based auto\-activation of NVMeoF connections\&. If this is done, it is recommended to deactivate the udev rule\-based autoconnection mechanism, e\&.g\&. by creating a symlink /run/udev/rules\&.d/70\-nvmf\-autoconnect\&.rules to /dev/null\&. Otherwise both mechanisms will run discovery in parallel, which causes unnecessary system activity and spurious error messages\&.
+.sp
+Currently, the following event types are supported:
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+"Discovery Log Page Change" Asynchronous Event Notifications (AENs) delivered via persistent connections to NVMeoF discovery controllers connected to the discovery service (nqn\&.2014\-08\&.org\&.nvmexpress\&.discovery)\&.
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+FC\-NVMe auto\-connect uevents sent when the FibreChannel transport discovers N_Ports offering NVMe services\&.
+.RE
+.sp
+See the documentation for the nvme\-connect\-all(1) and nvme\-discover(1) commands for further background\&.
+.SH "OPTIONS"
+.PP
+\-N, \-\-no\-connect
+.RS 4
+If this option is given,
+\fInvme monitor\fR
+will not attempt to connect to newly discovered controllers\&. Instead, information about found discovery log entries will be printed to stdout (in other words, instead of
+\fInvme connect\-all\fR, the monitor only executes
+\fInvme discover\fR
+for detected discovery controllers)\&.
+.RE
+.PP
+\-C, \-\-cleanup
+.RS 4
+Disconnect discovery controllers when the program exits\&. This affects only discovery controller connections created while the program was running\&.
+.RE
+.PP
+\-q <hostnqn>, \-\-hostnqn=<hostnqn>, \-I <hostid>, \-\-hostid=<hostid>, \-k <#>, \-\-keep\-alive\-tmo=<#>, \-c <#>, \-\-reconnect\-delay=<#>, \-l <#>, \-\-ctrl\-loss\-tmo=<#>, \-g, \-\-hdr\-digest, \-G, \-\-data\-digest, \-i <#>, \-\-nr\-io\-queues=<#>, \-W <#>, \-\-nr\-write\-queues=<#>, \-P <#>, \-\-nr\-poll\-queues=<#>, \-Q <#>, \-\-queue\-size=<#>, \-m, \-\-matching
+.RS 4
+These options have the same meaning as for
+\fInvme connect\-all\fR\&. See the man page nvme\-connect\-all(1) for details\&.
+.RE
+.PP
+\-S, \-\-silent
+.RS 4
+Only print warnings and severe error messages\&. Do not log discoveries and newly created controllers\&.
+.RE
+.PP
+\-v, \-\-verbose
+.RS 4
+Log informational messages\&. This option overrides
+\fI\-\-silent\fR\&.
+.RE
+.PP
+\-D, \-\-debug
+.RS 4
+Log informational and debug messages\&. This option overrides
+\fI\-\-silent\fR
+and
+\fI\-\-verbose\fR\&.
+.RE
+.PP
+\-t, \-\-timestamps
+.RS 4
+Add timestamps to log messages\&.
+.RE
+.SH "EXAMPLES"
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+Listen to FC\-NVMe events and AENs, creating persistent Discovery Controllers on the way, and automatically connect to all discovered controllers:
+.sp
+.if n \{\
+.RS 4
+.\}
+.nf
+# nvme monitor
+.fi
+.if n \{\
+.RE
+.\}
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+Like the above, but print more log messages, remove created discovery controller connections on exit, and use a non\-standard host NQN:
+.sp
+.if n \{\
+.RS 4
+.\}
+.nf
+# nvme monitor \-\-verbose \-\-cleanup \-\-hostnqn=host1\-rogue\-nqn
+.fi
+.if n \{\
+.RE
+.\}
+.RE
+.SH "SEE ALSO"
+.sp
+nvme\-discover(1) nvme\-connect\-all(1)
+.SH "NVME"
+.sp
+Part of the nvme\-user suite
diff --git a/Documentation/nvme-monitor.html b/Documentation/nvme-monitor.html
new file mode 100644
index 0000000..bcc8f93
--- /dev/null
+++ b/Documentation/nvme-monitor.html
@@ -0,0 +1,1018 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
+    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
+<head>
+<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
+<meta name="generator" content="AsciiDoc" />
+<title>nvme-monitor(1)</title>
+<style type="text/css">
+/* Shared CSS for AsciiDoc xhtml11 and html5 backends */
+
+/* Default font. */
+body {
+  font-family: Georgia,serif;
+}
+
+/* Title font. */
+h1, h2, h3, h4, h5, h6,
+div.title, caption.title,
+thead, p.table.header,
+#toctitle,
+#author, #revnumber, #revdate, #revremark,
+#footer {
+  font-family: Arial,Helvetica,sans-serif;
+}
+
+body {
+  margin: 1em 5% 1em 5%;
+}
+
+a {
+  color: blue;
+  text-decoration: underline;
+}
+a:visited {
+  color: fuchsia;
+}
+
+em {
+  font-style: italic;
+  color: navy;
+}
+
+strong {
+  font-weight: bold;
+  color: #083194;
+}
+
+h1, h2, h3, h4, h5, h6 {
+  color: #527bbd;
+  margin-top: 1.2em;
+  margin-bottom: 0.5em;
+  line-height: 1.3;
+}
+
+h1, h2, h3 {
+  border-bottom: 2px solid silver;
+}
+h2 {
+  padding-top: 0.5em;
+}
+h3 {
+  float: left;
+}
+h3 + * {
+  clear: left;
+}
+h5 {
+  font-size: 1.0em;
+}
+
+div.sectionbody {
+  margin-left: 0;
+}
+
+hr {
+  border: 1px solid silver;
+}
+
+p {
+  margin-top: 0.5em;
+  margin-bottom: 0.5em;
+}
+
+ul, ol, li > p {
+  margin-top: 0;
+}
+ul > li     { color: #aaa; }
+ul > li > * { color: black; }
+
+.monospaced, code, pre {
+  font-family: "Courier New", Courier, monospace;
+  font-size: inherit;
+  color: navy;
+  padding: 0;
+  margin: 0;
+}
+pre {
+  white-space: pre-wrap;
+}
+
+#author {
+  color: #527bbd;
+  font-weight: bold;
+  font-size: 1.1em;
+}
+#email {
+}
+#revnumber, #revdate, #revremark {
+}
+
+#footer {
+  font-size: small;
+  border-top: 2px solid silver;
+  padding-top: 0.5em;
+  margin-top: 4.0em;
+}
+#footer-text {
+  float: left;
+  padding-bottom: 0.5em;
+}
+#footer-badges {
+  float: right;
+  padding-bottom: 0.5em;
+}
+
+#preamble {
+  margin-top: 1.5em;
+  margin-bottom: 1.5em;
+}
+div.imageblock, div.exampleblock, div.verseblock,
+div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock,
+div.admonitionblock {
+  margin-top: 1.0em;
+  margin-bottom: 1.5em;
+}
+div.admonitionblock {
+  margin-top: 2.0em;
+  margin-bottom: 2.0em;
+  margin-right: 10%;
+  color: #606060;
+}
+
+div.content { /* Block element content. */
+  padding: 0;
+}
+
+/* Block element titles. */
+div.title, caption.title {
+  color: #527bbd;
+  font-weight: bold;
+  text-align: left;
+  margin-top: 1.0em;
+  margin-bottom: 0.5em;
+}
+div.title + * {
+  margin-top: 0;
+}
+
+td div.title:first-child {
+  margin-top: 0.0em;
+}
+div.content div.title:first-child {
+  margin-top: 0.0em;
+}
+div.content + div.title {
+  margin-top: 0.0em;
+}
+
+div.sidebarblock > div.content {
+  background: #ffffee;
+  border: 1px solid #dddddd;
+  border-left: 4px solid #f0f0f0;
+  padding: 0.5em;
+}
+
+div.listingblock > div.content {
+  border: 1px solid #dddddd;
+  border-left: 5px solid #f0f0f0;
+  background: #f8f8f8;
+  padding: 0.5em;
+}
+
+div.quoteblock, div.verseblock {
+  padding-left: 1.0em;
+  margin-left: 1.0em;
+  margin-right: 10%;
+  border-left: 5px solid #f0f0f0;
+  color: #888;
+}
+
+div.quoteblock > div.attribution {
+  padding-top: 0.5em;
+  text-align: right;
+}
+
+div.verseblock > pre.content {
+  font-family: inherit;
+  font-size: inherit;
+}
+div.verseblock > div.attribution {
+  padding-top: 0.75em;
+  text-align: left;
+}
+/* DEPRECATED: Pre version 8.2.7 verse style literal block. */
+div.verseblock + div.attribution {
+  text-align: left;
+}
+
+div.admonitionblock .icon {
+  vertical-align: top;
+  font-size: 1.1em;
+  font-weight: bold;
+  text-decoration: underline;
+  color: #527bbd;
+  padding-right: 0.5em;
+}
+div.admonitionblock td.content {
+  padding-left: 0.5em;
+  border-left: 3px solid #dddddd;
+}
+
+div.exampleblock > div.content {
+  border-left: 3px solid #dddddd;
+  padding-left: 0.5em;
+}
+
+div.imageblock div.content { padding-left: 0; }
+span.image img { border-style: none; vertical-align: text-bottom; }
+a.image:visited { color: white; }
+
+dl {
+  margin-top: 0.8em;
+  margin-bottom: 0.8em;
+}
+dt {
+  margin-top: 0.5em;
+  margin-bottom: 0;
+  font-style: normal;
+  color: navy;
+}
+dd > *:first-child {
+  margin-top: 0.1em;
+}
+
+ul, ol {
+    list-style-position: outside;
+}
+ol.arabic {
+  list-style-type: decimal;
+}
+ol.loweralpha {
+  list-style-type: lower-alpha;
+}
+ol.upperalpha {
+  list-style-type: upper-alpha;
+}
+ol.lowerroman {
+  list-style-type: lower-roman;
+}
+ol.upperroman {
+  list-style-type: upper-roman;
+}
+
+div.compact ul, div.compact ol,
+div.compact p, div.compact p,
+div.compact div, div.compact div {
+  margin-top: 0.1em;
+  margin-bottom: 0.1em;
+}
+
+tfoot {
+  font-weight: bold;
+}
+td > div.verse {
+  white-space: pre;
+}
+
+div.hdlist {
+  margin-top: 0.8em;
+  margin-bottom: 0.8em;
+}
+div.hdlist tr {
+  padding-bottom: 15px;
+}
+dt.hdlist1.strong, td.hdlist1.strong {
+  font-weight: bold;
+}
+td.hdlist1 {
+  vertical-align: top;
+  font-style: normal;
+  padding-right: 0.8em;
+  color: navy;
+}
+td.hdlist2 {
+  vertical-align: top;
+}
+div.hdlist.compact tr {
+  margin: 0;
+  padding-bottom: 0;
+}
+
+.comment {
+  background: yellow;
+}
+
+.footnote, .footnoteref {
+  font-size: 0.8em;
+}
+
+span.footnote, span.footnoteref {
+  vertical-align: super;
+}
+
+#footnotes {
+  margin: 20px 0 20px 0;
+  padding: 7px 0 0 0;
+}
+
+#footnotes div.footnote {
+  margin: 0 0 5px 0;
+}
+
+#footnotes hr {
+  border: none;
+  border-top: 1px solid silver;
+  height: 1px;
+  text-align: left;
+  margin-left: 0;
+  width: 20%;
+  min-width: 100px;
+}
+
+div.colist td {
+  padding-right: 0.5em;
+  padding-bottom: 0.3em;
+  vertical-align: top;
+}
+div.colist td img {
+  margin-top: 0.3em;
+}
+
+@media print {
+  #footer-badges { display: none; }
+}
+
+#toc {
+  margin-bottom: 2.5em;
+}
+
+#toctitle {
+  color: #527bbd;
+  font-size: 1.1em;
+  font-weight: bold;
+  margin-top: 1.0em;
+  margin-bottom: 0.1em;
+}
+
+div.toclevel0, div.toclevel1, div.toclevel2, div.toclevel3, div.toclevel4 {
+  margin-top: 0;
+  margin-bottom: 0;
+}
+div.toclevel2 {
+  margin-left: 2em;
+  font-size: 0.9em;
+}
+div.toclevel3 {
+  margin-left: 4em;
+  font-size: 0.9em;
+}
+div.toclevel4 {
+  margin-left: 6em;
+  font-size: 0.9em;
+}
+
+span.aqua { color: aqua; }
+span.black { color: black; }
+span.blue { color: blue; }
+span.fuchsia { color: fuchsia; }
+span.gray { color: gray; }
+span.green { color: green; }
+span.lime { color: lime; }
+span.maroon { color: maroon; }
+span.navy { color: navy; }
+span.olive { color: olive; }
+span.purple { color: purple; }
+span.red { color: red; }
+span.silver { color: silver; }
+span.teal { color: teal; }
+span.white { color: white; }
+span.yellow { color: yellow; }
+
+span.aqua-background { background: aqua; }
+span.black-background { background: black; }
+span.blue-background { background: blue; }
+span.fuchsia-background { background: fuchsia; }
+span.gray-background { background: gray; }
+span.green-background { background: green; }
+span.lime-background { background: lime; }
+span.maroon-background { background: maroon; }
+span.navy-background { background: navy; }
+span.olive-background { background: olive; }
+span.purple-background { background: purple; }
+span.red-background { background: red; }
+span.silver-background { background: silver; }
+span.teal-background { background: teal; }
+span.white-background { background: white; }
+span.yellow-background { background: yellow; }
+
+span.big { font-size: 2em; }
+span.small { font-size: 0.6em; }
+
+span.underline { text-decoration: underline; }
+span.overline { text-decoration: overline; }
+span.line-through { text-decoration: line-through; }
+
+div.unbreakable { page-break-inside: avoid; }
+
+
+/*
+ * xhtml11 specific
+ *
+ * */
+
+div.tableblock {
+  margin-top: 1.0em;
+  margin-bottom: 1.5em;
+}
+div.tableblock > table {
+  border: 3px solid #527bbd;
+}
+thead, p.table.header {
+  font-weight: bold;
+  color: #527bbd;
+}
+p.table {
+  margin-top: 0;
+}
+/* Because the table frame attribute is overridden by CSS in most browsers. */
+div.tableblock > table[frame="void"] {
+  border-style: none;
+}
+div.tableblock > table[frame="hsides"] {
+  border-left-style: none;
+  border-right-style: none;
+}
+div.tableblock > table[frame="vsides"] {
+  border-top-style: none;
+  border-bottom-style: none;
+}
+
+
+/*
+ * html5 specific
+ *
+ * */
+
+table.tableblock {
+  margin-top: 1.0em;
+  margin-bottom: 1.5em;
+}
+thead, p.tableblock.header {
+  font-weight: bold;
+  color: #527bbd;
+}
+p.tableblock {
+  margin-top: 0;
+}
+table.tableblock {
+  border-width: 3px;
+  border-spacing: 0px;
+  border-style: solid;
+  border-color: #527bbd;
+  border-collapse: collapse;
+}
+th.tableblock, td.tableblock {
+  border-width: 1px;
+  padding: 4px;
+  border-style: solid;
+  border-color: #527bbd;
+}
+
+table.tableblock.frame-topbot {
+  border-left-style: hidden;
+  border-right-style: hidden;
+}
+table.tableblock.frame-sides {
+  border-top-style: hidden;
+  border-bottom-style: hidden;
+}
+table.tableblock.frame-none {
+  border-style: hidden;
+}
+
+th.tableblock.halign-left, td.tableblock.halign-left {
+  text-align: left;
+}
+th.tableblock.halign-center, td.tableblock.halign-center {
+  text-align: center;
+}
+th.tableblock.halign-right, td.tableblock.halign-right {
+  text-align: right;
+}
+
+th.tableblock.valign-top, td.tableblock.valign-top {
+  vertical-align: top;
+}
+th.tableblock.valign-middle, td.tableblock.valign-middle {
+  vertical-align: middle;
+}
+th.tableblock.valign-bottom, td.tableblock.valign-bottom {
+  vertical-align: bottom;
+}
+
+
+/*
+ * manpage specific
+ *
+ * */
+
+body.manpage h1 {
+  padding-top: 0.5em;
+  padding-bottom: 0.5em;
+  border-top: 2px solid silver;
+  border-bottom: 2px solid silver;
+}
+body.manpage h2 {
+  border-style: none;
+}
+body.manpage div.sectionbody {
+  margin-left: 3em;
+}
+
+@media print {
+  body.manpage div#toc { display: none; }
+}
+
+
+</style>
+<script type="text/javascript">
+/*<![CDATA[*/
+var asciidoc = {  // Namespace.
+
+/////////////////////////////////////////////////////////////////////
+// Table Of Contents generator
+/////////////////////////////////////////////////////////////////////
+
+/* Author: Mihai Bazon, September 2002
+ * http://students.infoiasi.ro/~mishoo
+ *
+ * Table Of Content generator
+ * Version: 0.4
+ *
+ * Feel free to use this script under the terms of the GNU General Public
+ * License, as long as you do not remove or alter this notice.
+ */
+
+ /* modified by Troy D. Hanson, September 2006. License: GPL */
+ /* modified by Stuart Rackham, 2006, 2009. License: GPL */
+
+// toclevels = 1..4.
+toc: function (toclevels) {
+
+  function getText(el) {
+    var text = "";
+    for (var i = el.firstChild; i != null; i = i.nextSibling) {
+      if (i.nodeType == 3 /* Node.TEXT_NODE */) // IE doesn't speak constants.
+        text += i.data;
+      else if (i.firstChild != null)
+        text += getText(i);
+    }
+    return text;
+  }
+
+  function TocEntry(el, text, toclevel) {
+    this.element = el;
+    this.text = text;
+    this.toclevel = toclevel;
+  }
+
+  function tocEntries(el, toclevels) {
+    var result = new Array;
+    var re = new RegExp('[hH]([1-'+(toclevels+1)+'])');
+    // Function that scans the DOM tree for header elements (the DOM2
+    // nodeIterator API would be a better technique but not supported by all
+    // browsers).
+    var iterate = function (el) {
+      for (var i = el.firstChild; i != null; i = i.nextSibling) {
+        if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
+          var mo = re.exec(i.tagName);
+          if (mo && (i.getAttribute("class") || i.getAttribute("className")) != "float") {
+            result[result.length] = new TocEntry(i, getText(i), mo[1]-1);
+          }
+          iterate(i);
+        }
+      }
+    }
+    iterate(el);
+    return result;
+  }
+
+  var toc = document.getElementById("toc");
+  if (!toc) {
+    return;
+  }
+
+  // Delete existing TOC entries in case we're reloading the TOC.
+  var tocEntriesToRemove = [];
+  var i;
+  for (i = 0; i < toc.childNodes.length; i++) {
+    var entry = toc.childNodes[i];
+    if (entry.nodeName.toLowerCase() == 'div'
+     && entry.getAttribute("class")
+     && entry.getAttribute("class").match(/^toclevel/))
+      tocEntriesToRemove.push(entry);
+  }
+  for (i = 0; i < tocEntriesToRemove.length; i++) {
+    toc.removeChild(tocEntriesToRemove[i]);
+  }
+
+  // Rebuild TOC entries.
+  var entries = tocEntries(document.getElementById("content"), toclevels);
+  for (var i = 0; i < entries.length; ++i) {
+    var entry = entries[i];
+    if (entry.element.id == "")
+      entry.element.id = "_toc_" + i;
+    var a = document.createElement("a");
+    a.href = "#" + entry.element.id;
+    a.appendChild(document.createTextNode(entry.text));
+    var div = document.createElement("div");
+    div.appendChild(a);
+    div.className = "toclevel" + entry.toclevel;
+    toc.appendChild(div);
+  }
+  if (entries.length == 0)
+    toc.parentNode.removeChild(toc);
+},
+
+
+/////////////////////////////////////////////////////////////////////
+// Footnotes generator
+/////////////////////////////////////////////////////////////////////
+
+/* Based on footnote generation code from:
+ * http://www.brandspankingnew.net/archive/2005/07/format_footnote.html
+ */
+
+footnotes: function () {
+  // Delete existing footnote entries in case we're reloading the footnodes.
+  var i;
+  var noteholder = document.getElementById("footnotes");
+  if (!noteholder) {
+    return;
+  }
+  var entriesToRemove = [];
+  for (i = 0; i < noteholder.childNodes.length; i++) {
+    var entry = noteholder.childNodes[i];
+    if (entry.nodeName.toLowerCase() == 'div' && entry.getAttribute("class") == "footnote")
+      entriesToRemove.push(entry);
+  }
+  for (i = 0; i < entriesToRemove.length; i++) {
+    noteholder.removeChild(entriesToRemove[i]);
+  }
+
+  // Rebuild footnote entries.
+  var cont = document.getElementById("content");
+  var spans = cont.getElementsByTagName("span");
+  var refs = {};
+  var n = 0;
+  for (i=0; i<spans.length; i++) {
+    if (spans[i].className == "footnote") {
+      n++;
+      var note = spans[i].getAttribute("data-note");
+      if (!note) {
+        // Use [\s\S] in place of . so multi-line matches work.
+        // Because JavaScript has no s (dotall) regex flag.
+        note = spans[i].innerHTML.match(/\s*\[([\s\S]*)]\s*/)[1];
+        spans[i].innerHTML =
+          "[<a id='_footnoteref_" + n + "' href='#_footnote_" + n +
+          "' title='View footnote' class='footnote'>" + n + "</a>]";
+        spans[i].setAttribute("data-note", note);
+      }
+      noteholder.innerHTML +=
+        "<div class='footnote' id='_footnote_" + n + "'>" +
+        "<a href='#_footnoteref_" + n + "' title='Return to text'>" +
+        n + "</a>. " + note + "</div>";
+      var id =spans[i].getAttribute("id");
+      if (id != null) refs["#"+id] = n;
+    }
+  }
+  if (n == 0)
+    noteholder.parentNode.removeChild(noteholder);
+  else {
+    // Process footnoterefs.
+    for (i=0; i<spans.length; i++) {
+      if (spans[i].className == "footnoteref") {
+        var href = spans[i].getElementsByTagName("a")[0].getAttribute("href");
+        href = href.match(/#.*/)[0];  // Because IE return full URL.
+        n = refs[href];
+        spans[i].innerHTML =
+          "[<a href='#_footnote_" + n +
+          "' title='View footnote' class='footnote'>" + n + "</a>]";
+      }
+    }
+  }
+},
+
+install: function(toclevels) {
+  var timerId;
+
+  function reinstall() {
+    asciidoc.footnotes();
+    if (toclevels) {
+      asciidoc.toc(toclevels);
+    }
+  }
+
+  function reinstallAndRemoveTimer() {
+    clearInterval(timerId);
+    reinstall();
+  }
+
+  timerId = setInterval(reinstall, 500);
+  if (document.addEventListener)
+    document.addEventListener("DOMContentLoaded", reinstallAndRemoveTimer, false);
+  else
+    window.onload = reinstallAndRemoveTimer;
+}
+
+}
+asciidoc.install();
+/*]]>*/
+</script>
+</head>
+<body class="manpage">
+<div id="header">
+<h1>
+nvme-monitor(1) Manual Page
+</h1>
+<h2>NAME</h2>
+<div class="sectionbody">
+<p>nvme-monitor -
+   Monitor Discovery events and Discover and Connect automatically
+</p>
+</div>
+</div>
+<div id="content">
+<div class="sect1">
+<h2 id="_synopsis">SYNOPSIS</h2>
+<div class="sectionbody">
+<div class="verseblock">
+<pre class="content"><em>nvme monitor</em>
+                [--no-connect             | -N]
+                [--hostnqn=&lt;hostnqn&gt;      | -q &lt;hostnqn&gt;]
+                [--hostid=&lt;hostid&gt;        | -I &lt;hostid&gt;]
+                [--keep-alive-tmo=&lt;#&gt;     | -k &lt;#&gt;]
+                [--reconnect-delay=&lt;#&gt;    | -c &lt;#&gt;]
+                [--ctrl-loss-tmo=&lt;#&gt;      | -l &lt;#&gt;]
+                [--hdr-digest             | -g]
+                [--data-digest            | -G]
+                [--nr-io-queues=&lt;#&gt;       | -i &lt;#&gt;]
+                [--nr-write-queues=&lt;#&gt;    | -W &lt;#&gt;]
+                [--nr-poll-queues=&lt;#&gt;     | -P &lt;#&gt;]
+                [--queue-size=&lt;#&gt;         | -Q &lt;#&gt;]
+                [--matching               | -m]
+                [--persistent             | -p]
+                [--silent                 | -S]
+                [--verbose                | -v]
+                [--debug                  | -D]
+                [--timestamps             | -t]</pre>
+<div class="attribution">
+</div></div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_description">DESCRIPTION</h2>
+<div class="sectionbody">
+<div class="paragraph"><p>Listen to Discovery events (Asynchronous Event Notifications, AENs) on
+NVMe-over-Fabrics (NVMeoF) Discovery Controllers and for other events related
+to NVMeoF Discovery, and optionally connect to newly discovered controllers.</p></div>
+<div class="paragraph"><p>If no parameters are given, then <em>nvme monitor</em> listens to Discovery-related
+udev events (uevents). If an event is received, it connects to the Discovery
+Controller and performs the equivalent of an <em>nvme connect-all</em> on the
+associated transport address. When run through a systemd service in this
+mode, the monitor can be used as an alternative to the udev-rule based
+auto-activation of NVMeoF connections. If this is done, it is recommended
+to deactivate the udev rule-based autoconnection mechanism, e.g. by creating
+a symlink <code>/run/udev/rules.d/70-nvmf-autoconnect.rules</code> to <code>/dev/null</code>.
+Otherwise both mechanisms will run discovery in parallel, which causes
+unnecessary system activity and spurious error messages.</p></div>
+<div class="paragraph"><p>Currently, the following event types are supported:</p></div>
+<div class="ulist"><ul>
+<li>
+<p>
+"Discovery Log Page Change" Asynchronous Event Notifications (AENs)
+  delivered via persistent connections to NVMeoF discovery controllers
+  connected to the discovery service (<code>nqn.2014-08.org.nvmexpress.discovery</code>).
+</p>
+</li>
+<li>
+<p>
+FC-NVMe auto-connect uevents sent when the FibreChannel transport discovers
+  N_Ports offering NVMe services.
+</p>
+</li>
+</ul></div>
+<div class="paragraph"><p>See the documentation for the nvme-connect-all(1) and nvme-discover(1)
+commands for further background.</p></div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_options">OPTIONS</h2>
+<div class="sectionbody">
+<div class="dlist"><dl>
+<dt class="hdlist1">
+-N
+</dt>
+<dt class="hdlist1">
+--no-connect
+</dt>
+<dd>
+<p>
+        If this option is given, <em>nvme monitor</em> will not attempt to connect to
+        newly discovered controllers. Instead, information about found
+        discovery log entries will be printed to stdout (in other words, instead of
+        <em>nvme connect-all</em>, the monitor only executes <em>nvme discover</em> for
+        detected discovery controllers).
+</p>
+</dd>
+<dt class="hdlist1">
+-C
+</dt>
+<dt class="hdlist1">
+--cleanup
+</dt>
+<dd>
+<p>
+        Disconnect discovery controllers when the program exits. This affects
+        only discovery controller connections created while the program was running.
+</p>
+</dd>
+<dt class="hdlist1">
+-q &lt;hostnqn&gt;
+</dt>
+<dt class="hdlist1">
+--hostnqn=&lt;hostnqn&gt;
+</dt>
+<dt class="hdlist1">
+-I &lt;hostid&gt;
+</dt>
+<dt class="hdlist1">
+--hostid=&lt;hostid&gt;
+</dt>
+<dt class="hdlist1">
+-k &lt;#&gt;
+</dt>
+<dt class="hdlist1">
+--keep-alive-tmo=&lt;#&gt;
+</dt>
+<dt class="hdlist1">
+-c &lt;#&gt;
+</dt>
+<dt class="hdlist1">
+--reconnect-delay=&lt;#&gt;
+</dt>
+<dt class="hdlist1">
+-l &lt;#&gt;
+</dt>
+<dt class="hdlist1">
+--ctrl-loss-tmo=&lt;#&gt;
+</dt>
+<dt class="hdlist1">
+-g
+</dt>
+<dt class="hdlist1">
+--hdr-digest
+</dt>
+<dt class="hdlist1">
+-G
+</dt>
+<dt class="hdlist1">
+--data-digest
+</dt>
+<dt class="hdlist1">
+-i &lt;#&gt;
+</dt>
+<dt class="hdlist1">
+--nr-io-queues=&lt;#&gt;
+</dt>
+<dt class="hdlist1">
+-W &lt;#&gt;
+</dt>
+<dt class="hdlist1">
+--nr-write-queues=&lt;#&gt;
+</dt>
+<dt class="hdlist1">
+-P &lt;#&gt;
+</dt>
+<dt class="hdlist1">
+--nr-poll-queues=&lt;#&gt;
+</dt>
+<dt class="hdlist1">
+-Q &lt;#&gt;
+</dt>
+<dt class="hdlist1">
+--queue-size=&lt;#&gt;
+</dt>
+<dt class="hdlist1">
+-m
+</dt>
+<dt class="hdlist1">
+--matching
+</dt>
+<dd>
+<p>
+        These options have the same meaning as for <em>nvme connect-all</em>. See the
+        man page nvme-connect-all(1) for details.
+</p>
+</dd>
+<dt class="hdlist1">
+-S
+</dt>
+<dt class="hdlist1">
+--silent
+</dt>
+<dd>
+<p>
+        Only print warnings and severe error messages. Do not log discoveries
+        and newly created controllers.
+</p>
+</dd>
+<dt class="hdlist1">
+-v
+</dt>
+<dt class="hdlist1">
+--verbose
+</dt>
+<dd>
+<p>
+        Log informational messages. This option overrides <em>--silent</em>.
+</p>
+</dd>
+<dt class="hdlist1">
+-D
+</dt>
+<dt class="hdlist1">
+--debug
+</dt>
+<dd>
+<p>
+        Log informational and debug messages. This option overrides <em>--silent</em>
+        and <em>--verbose</em>.
+</p>
+</dd>
+<dt class="hdlist1">
+-t
+</dt>
+<dt class="hdlist1">
+--timestamps
+</dt>
+<dd>
+<p>
+        Add timestamps to log messages.
+</p>
+</dd>
+</dl></div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_examples">EXAMPLES</h2>
+<div class="sectionbody">
+<div class="ulist"><ul>
+<li>
+<p>
+Listen to FC-NVMe events and AENs, creating persistent Discovery Controllers
+on the way, and automatically connect to all discovered controllers:
+</p>
+<div class="listingblock">
+<div class="content">
+<pre><code># nvme monitor</code></pre>
+</div></div>
+</li>
+<li>
+<p>
+Like the above, but print more log messages, remove created discovery controller
+connections on exit, and use a non-standard host NQN:
+</p>
+<div class="listingblock">
+<div class="content">
+<pre><code># nvme monitor --verbose --cleanup --hostnqn=host1-rogue-nqn</code></pre>
+</div></div>
+</li>
+</ul></div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_see_also">SEE ALSO</h2>
+<div class="sectionbody">
+<div class="paragraph"><p>nvme-discover(1)
+nvme-connect-all(1)</p></div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_nvme">NVME</h2>
+<div class="sectionbody">
+<div class="paragraph"><p>Part of the nvme-user suite</p></div>
+</div>
+</div>
+</div>
+<div id="footnotes"><hr /></div>
+<div id="footer">
+<div id="footer-text">
+Last updated
+ 2021-02-25 00:42:13 CET
+</div>
+</div>
+</body>
+</html>
diff --git a/Documentation/nvme-monitor.txt b/Documentation/nvme-monitor.txt
new file mode 100644
index 0000000..79f9c03
--- /dev/null
+++ b/Documentation/nvme-monitor.txt
@@ -0,0 +1,144 @@
+nvme-monitor(1)
+===============
+
+NAME
+----
+nvme-monitor - Monitor Discovery events and Discover and Connect automatically
+
+SYNOPSIS
+--------
+[verse]
+'nvme monitor'
+		[--no-connect             | -N]
+		[--hostnqn=<hostnqn>      | -q <hostnqn>]
+		[--hostid=<hostid>        | -I <hostid>]
+		[--keep-alive-tmo=<#>     | -k <#>]
+		[--reconnect-delay=<#>    | -c <#>]
+		[--ctrl-loss-tmo=<#>      | -l <#>]
+		[--hdr-digest             | -g]
+		[--data-digest            | -G]
+		[--nr-io-queues=<#>       | -i <#>]
+		[--nr-write-queues=<#>    | -W <#>]
+		[--nr-poll-queues=<#>     | -P <#>]
+		[--queue-size=<#>         | -Q <#>]
+		[--matching               | -m]
+		[--persistent             | -p]
+		[--silent                 | -S]
+		[--verbose                | -v]
+		[--debug                  | -D]
+		[--timestamps             | -t]
+
+DESCRIPTION
+-----------
+Listen to Discovery events (Asynchronous Event Notifications, AENs) on
+NVMe-over-Fabrics (NVMeoF) Discovery Controllers and for other events related
+to NVMeoF Discovery, and optionally connect to newly discovered controllers.
+
+If no parameters are given, then 'nvme monitor' listens to Discovery-related
+udev events (uevents). If an event is received, it connects to the Discovery
+Controller and performs the equivalent of an 'nvme connect-all' on the
+associated transport address. When run through a systemd service in this
+mode, the monitor can be used as an alternative to the udev-rule based
+auto-activation of NVMeoF connections. If this is done, it is recommended
+to deactivate the udev rule-based autoconnection mechanism, e.g. by creating
+a symlink `/run/udev/rules.d/70-nvmf-autoconnect.rules` to `/dev/null`.
+Otherwise both mechanisms will run discovery in parallel, which causes
+unnecessary system activity and spurious error messages.
+
+Currently, the following event types are supported:
+
+- "Discovery Log Page Change" Asynchronous Event Notifications (AENs)
+  delivered via persistent connections to NVMeoF discovery controllers
+  connected to the discovery service (`nqn.2014-08.org.nvmexpress.discovery`).
+
+- FC-NVMe auto-connect uevents sent when the FibreChannel transport discovers
+  N_Ports offering NVMe services.
+
+See the documentation for the nvme-connect-all(1) and nvme-discover(1)
+commands for further background.
+
+OPTIONS
+-------
+
+-N::
+--no-connect::
+	If this option is given, 'nvme monitor' will not attempt to connect to
+	newly discovered controllers. Instead, information about found
+	discovery log entries will be printed to stdout (in other words, instead of
+	'nvme connect-all', the monitor only executes 'nvme discover' for
+	detected discovery controllers).
+
+-C::
+--cleanup::
+	Disconnect discovery controllers when the program exits. This affects
+	only discovery controller connections created while the program was running.
+
+-q <hostnqn>::
+--hostnqn=<hostnqn>::
+-I <hostid>::
+--hostid=<hostid>::
+-k <#>::
+--keep-alive-tmo=<#>::
+-c <#>::
+--reconnect-delay=<#>::
+-l <#>::
+--ctrl-loss-tmo=<#>::
+-g::
+--hdr-digest::
+-G::
+--data-digest::
+-i <#>::
+--nr-io-queues=<#>::
+-W <#>::
+--nr-write-queues=<#>::
+-P <#>::
+--nr-poll-queues=<#>::
+-Q <#>::
+--queue-size=<#>::
+-m::
+--matching::
+	These options have the same meaning as for 'nvme connect-all'. See the
+	man page nvme-connect-all(1) for details.
+
+-S::
+--silent::
+	Only print warnings and severe error messages. Do not log discoveries
+	and newly created controllers.
+
+-v::
+--verbose::
+	Log informational messages. This option overrides '--silent'.
+
+-D::
+--debug::
+	Log informational and debug messages. This option overrides '--silent'
+	and '--verbose'.
+
+-t::
+--timestamps::
+	Add timestamps to log messages.
+
+EXAMPLES
+--------
+* Listen to FC-NVMe events and AENs, creating persistent Discovery Controllers
+on the way, and automatically connect to all discovered controllers:
++
+-------------
+# nvme monitor
+-------------
++
+* Like the above, but print more log messages, remove created discovery controller
+connections on exit, and use a non-standard host NQN:
++
+------------
+# nvme monitor --verbose --cleanup --hostnqn=host1-rogue-nqn
+------------
+
+SEE ALSO
+--------
+nvme-discover(1)
+nvme-connect-all(1)
+
+NVME
+----
+Part of the nvme-user suite
-- 
2.29.2


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 02/16] nvme-cli: add code for event and timeout handling
  2021-03-06  0:36 ` [PATCH v2 02/16] nvme-cli: add code for event and timeout handling mwilck
@ 2021-03-17  0:32   ` Martin Wilck
  2021-03-19 16:42     ` Martin Wilck
  0 siblings, 1 reply; 20+ messages in thread
From: Martin Wilck @ 2021-03-17  0:32 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya

On Sat, 2021-03-06 at 01:36 +0100, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> For the nvme monitor functionality, an event handling mechanism
> will be necessary which deals with event timeouts. While there are
> standard
> solutions for this (e.g. libevent), these add unnecessary complexity
> and dependencies to nvme-cli.
> 
> Add a small, straightforward event and timeout handling code based
> on epoll and timerfd.
> 
> This code is identical to what I've pushed recently to
> https://github.com/mwilck/minivent, where I added a couple of unit
> tests to make sure the code is as robust as it needs to be.

FTR, I found a use-after-free condition in this code. It will be
fixed in the next iteration of this patch set.

Martin




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 02/16] nvme-cli: add code for event and timeout handling
  2021-03-17  0:32   ` Martin Wilck
@ 2021-03-19 16:42     ` Martin Wilck
  2021-03-30 22:06       ` Martin Wilck
  0 siblings, 1 reply; 20+ messages in thread
From: Martin Wilck @ 2021-03-19 16:42 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya

On Wed, 2021-03-17 at 01:32 +0100, Martin Wilck wrote:
> On Sat, 2021-03-06 at 01:36 +0100, mwilck@suse.com wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > For the nvme monitor functionality, an event handling mechanism
> > will be necessary which deals with event timeouts. While there are
> > standard
> > solutions for this (e.g. libevent), these add unnecessary
> > complexity
> > and dependencies to nvme-cli.
> > 
> > Add a small, straightforward event and timeout handling code based
> > on epoll and timerfd.
> > 
> > This code is identical to what I've pushed recently to
> > https://github.com/mwilck/minivent, where I added a couple of unit
> > tests to make sure the code is as robust as it needs to be.
> 
> FTR, I found a use-after-free condition in this code. It will be
> fixed in the next iteration of this patch set.

The fixed code is in the git repo mentioned above. I suppose that, in
case that the general approach gets positive review, it would make
sense to pull this in via git subtree or git submodule. Tell me your
preferences.

Regards
Martin




* Re: [PATCH v2 02/16] nvme-cli: add code for event and timeout handling
  2021-03-19 16:42     ` Martin Wilck
@ 2021-03-30 22:06       ` Martin Wilck
  0 siblings, 0 replies; 20+ messages in thread
From: Martin Wilck @ 2021-03-30 22:06 UTC (permalink / raw)
  To: Sagi Grimberg, Hannes Reinecke, Keith Busch
  Cc: Chaitanya Kulkarni, linux-nvme, Enzo Matsumiya

On Fri, 2021-03-19 at 17:42 +0100, Martin Wilck wrote:
> On Wed, 2021-03-17 at 01:32 +0100, Martin Wilck wrote:
> > On Sat, 2021-03-06 at 01:36 +0100, mwilck@suse.com wrote:
> > > From: Martin Wilck <mwilck@suse.com>
> > > 
> > > For the nvme monitor functionality, an event handling mechanism
> > > will be necessary which deals with event timeouts. While there are
> > > standard
> > > solutions for this (e.g. libevent), these add unnecessary
> > > complexity
> > > and dependencies to nvme-cli.
> > > 
> > > Add a small, straightforward event and timeout handling code based
> > > on epoll and timerfd.
> > > 
> > > This code is identical to what I've pushed recently to
> > > https://github.com/mwilck/minivent, where I added a couple of unit
> > > tests to make sure the code is as robust as it needs to be.
> > 
> > FTR, I found a use-after-free condition in this code. It will be
> > fixed in the next iteration of this patch set.
> 
> The fixed code is in the git repo mentioned above. I suppose that, in
> case that the general approach gets positive review, it would make
> sense to pull this in via git subtree or git submodule. Tell me your
> preferences.

I've resubmitted to https://github.com/linux-nvme/nvme-cli/pull/877 now
with the event code incorporated via "git subtree"
from https://github.com/mwilck/minivent. I need to flatten the subtree
merge in order to submit this here in patch form, but I'll
postpone that until I get some feedback on the v2 series.

Best Regards,
Martin




Thread overview: 20+ messages
2021-03-06  0:36 [PATCH v2 00/16] nvme-cli: add "nvme monitor" subcommand mwilck
2021-03-06  0:36 ` [PATCH v2 01/16] fabrics: export symbols required for monitor functionality mwilck
2021-03-06  0:36 ` [PATCH v2 02/16] nvme-cli: add code for event and timeout handling mwilck
2021-03-17  0:32   ` Martin Wilck
2021-03-19 16:42     ` Martin Wilck
2021-03-30 22:06       ` Martin Wilck
2021-03-06  0:36 ` [PATCH v2 03/16] monitor: add basic "nvme monitor" functionality mwilck
2021-03-06  0:36 ` [PATCH v2 04/16] monitor: implement uevent handling mwilck
2021-03-06  0:36 ` [PATCH v2 05/16] conn-db: add simple connection registry mwilck
2021-03-06  0:36 ` [PATCH v2 06/16] monitor: monitor_discovery(): try to reuse existing controllers mwilck
2021-03-06  0:36 ` [PATCH v2 07/16] monitor: kill running discovery tasks on exit mwilck
2021-03-06  0:36 ` [PATCH v2 08/16] monitor: add option --cleanup / -C mwilck
2021-03-06  0:36 ` [PATCH v2 09/16] monitor: handling of add/remove uevents for nvme controllers mwilck
2021-03-06  0:36 ` [PATCH v2 10/16] monitor: discover from conf file on startup mwilck
2021-03-06  0:36 ` [PATCH v2 11/16] monitor: watch discovery.conf with inotify mwilck
2021-03-06  0:36 ` [PATCH v2 12/16] monitor: add parent/child messaging and "notify" message exchange mwilck
2021-03-06  0:36 ` [PATCH v2 13/16] monitor: add "query device" " mwilck
2021-03-06  0:36 ` [PATCH v2 14/16] completions: add completions for nvme monitor mwilck
2021-03-06  0:36 ` [PATCH v2 15/16] nvmf-autoconnect: add unit file for nvme-monitor.service mwilck
2021-03-06  0:36 ` [PATCH v2 16/16] nvme-monitor(1): add man page for nvme-monitor mwilck
