All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/12] nbd: Netlink interface and path failure enhancements
@ 2017-04-06 21:01 Josef Bacik
  2017-04-06 21:01 ` [PATCH 01/12] nbd: put socket in error cases Josef Bacik
                   ` (13 more replies)
  0 siblings, 14 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:01 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

This patchset adds a new netlink configuration interface to NBD as well as a
bunch of enhancments around path failures.  The patches provide the following
enhancemnts to NBD

 - Netlink configuration interface that doesn't leave a userspace application
   waiting in kernel space for the device to disconnect.
 - Netlink reconfigure interface for adding re-connected sockets to replace dead
   sockets.
 - A flag to destroy the NBD device on disconnect, much like how mount -o loop
   works.
 - A status interface that currently will only report whether a device is
   connected or not, but can be extended to include whatever in the future.
 - A netlink multicast notification scheme to notify user space when there are
   connection issues to allow for seamless reconnects.
 - Dead link handling.  You can specify a dead link timeout and the NBD device
   will pause IO for that timeout waiting to see if the connection can be
   re-established.  This is helpful to allow for things like nbd server upgrades
   where the whole server disappears for a short period of time.

These patches have been thorougly and continuously tested for about a month.
I've been finding bugs in various places, but this batch has been solid for the
last few days of testing, which include a constant disconnect/reconnect torture
test.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 01/12] nbd: put socket in error cases
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
@ 2017-04-06 21:01 ` Josef Bacik
  2017-04-06 21:01 ` [PATCH 02/12] nbd: handle single path failures gracefully Josef Bacik
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:01 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

When adding a new socket we look it up and then try to add it to our
configuration.  If any of those steps fail we need to make sure we put
the socket so we don't leak them.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 03ae729..a4fd82e 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -628,16 +628,21 @@ static int nbd_add_socket(struct nbd_device *nbd, struct block_device *bdev,
 	if (nbd->task_setup != current) {
 		dev_err(disk_to_dev(nbd->disk),
 			"Device being setup by another task");
+		sockfd_put(sock);
 		return -EINVAL;
 	}
 
 	socks = krealloc(nbd->socks, (nbd->num_connections + 1) *
 			 sizeof(struct nbd_sock *), GFP_KERNEL);
-	if (!socks)
+	if (!socks) {
+		sockfd_put(sock);
 		return -ENOMEM;
+	}
 	nsock = kzalloc(sizeof(struct nbd_sock), GFP_KERNEL);
-	if (!nsock)
+	if (!nsock) {
+		sockfd_put(sock);
 		return -ENOMEM;
+	}
 
 	nbd->socks = socks;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 02/12] nbd: handle single path failures gracefully
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
  2017-04-06 21:01 ` [PATCH 01/12] nbd: put socket in error cases Josef Bacik
@ 2017-04-06 21:01 ` Josef Bacik
  2017-04-06 21:01 ` [PATCH 03/12] nbd: separate out the config information Josef Bacik
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:01 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

Currently if we have multiple connections and one of them goes down we will tear
down the whole device.  However there's no reason we need to do this as we
could have other connections that are working fine.  Deal with this by keeping
track of the state of the different connections, and if we lose one we mark it
as dead and send all IO destined for that socket to one of the other healthy
sockets.  Any outstanding requests that were on the dead socket will timeout and
be re-submitted properly.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c | 151 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 125 insertions(+), 26 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index a4fd82e..4e64d60 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -49,6 +49,8 @@ struct nbd_sock {
 	struct mutex tx_lock;
 	struct request *pending;
 	int sent;
+	bool dead;
+	int fallback_index;
 };
 
 #define NBD_TIMEDOUT			0
@@ -82,6 +84,7 @@ struct nbd_device {
 
 struct nbd_cmd {
 	struct nbd_device *nbd;
+	int index;
 	struct completion send_complete;
 };
 
@@ -124,6 +127,15 @@ static const char *nbdcmd_to_ascii(int cmd)
 	return "invalid";
 }
 
+static void nbd_mark_nsock_dead(struct nbd_sock *nsock)
+{
+	if (!nsock->dead)
+		kernel_sock_shutdown(nsock->sock, SHUT_RDWR);
+	nsock->dead = true;
+	nsock->pending = NULL;
+	nsock->sent = 0;
+}
+
 static int nbd_size_clear(struct nbd_device *nbd, struct block_device *bdev)
 {
 	if (bdev->bd_openers <= 1)
@@ -191,7 +203,31 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 	struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
 	struct nbd_device *nbd = cmd->nbd;
 
-	dev_err(nbd_to_dev(nbd), "Connection timed out, shutting down connection\n");
+	if (nbd->num_connections > 1) {
+		dev_err_ratelimited(nbd_to_dev(nbd),
+				    "Connection timed out, retrying\n");
+		mutex_lock(&nbd->config_lock);
+		/*
+		 * Hooray we have more connections, requeue this IO, the submit
+		 * path will put it on a real connection.
+		 */
+		if (nbd->socks && nbd->num_connections > 1) {
+			if (cmd->index < nbd->num_connections) {
+				struct nbd_sock *nsock =
+					nbd->socks[cmd->index];
+				mutex_lock(&nsock->tx_lock);
+				nbd_mark_nsock_dead(nsock);
+				mutex_unlock(&nsock->tx_lock);
+			}
+			mutex_unlock(&nbd->config_lock);
+			blk_mq_requeue_request(req, true);
+			return BLK_EH_NOT_HANDLED;
+		}
+		mutex_unlock(&nbd->config_lock);
+	} else {
+		dev_err_ratelimited(nbd_to_dev(nbd),
+				    "Connection timed out\n");
+	}
 	set_bit(NBD_TIMEDOUT, &nbd->runtime_flags);
 	req->errors = -EIO;
 
@@ -301,6 +337,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 		}
 		iov_iter_advance(&from, sent);
 	}
+	cmd->index = index;
 	request.type = htonl(type);
 	if (type != NBD_CMD_FLUSH) {
 		request.from = cpu_to_be64((u64)blk_rq_pos(req) << 9);
@@ -328,7 +365,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 		}
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 			"Send control failed (result %d)\n", result);
-		return -EIO;
+		return -EAGAIN;
 	}
 send_pages:
 	if (type != NBD_CMD_WRITE)
@@ -370,7 +407,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 				dev_err(disk_to_dev(nbd->disk),
 					"Send data failed (result %d)\n",
 					result);
-				return -EIO;
+				return -EAGAIN;
 			}
 			/*
 			 * The completion might already have come in,
@@ -389,6 +426,12 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	return 0;
 }
 
+static int nbd_disconnected(struct nbd_device *nbd)
+{
+	return test_bit(NBD_DISCONNECTED, &nbd->runtime_flags) ||
+		test_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags);
+}
+
 /* NULL returned = something went wrong, inform userspace */
 static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 {
@@ -405,8 +448,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 	iov_iter_kvec(&to, READ | ITER_KVEC, &iov, 1, sizeof(reply));
 	result = sock_xmit(nbd, index, 0, &to, MSG_WAITALL, NULL);
 	if (result <= 0) {
-		if (!test_bit(NBD_DISCONNECTED, &nbd->runtime_flags) &&
-		    !test_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags))
+		if (!nbd_disconnected(nbd))
 			dev_err(disk_to_dev(nbd->disk),
 				"Receive control failed (result %d)\n", result);
 		return ERR_PTR(result);
@@ -449,8 +491,19 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 			if (result <= 0) {
 				dev_err(disk_to_dev(nbd->disk), "Receive data failed (result %d)\n",
 					result);
-				req->errors = -EIO;
-				return cmd;
+				/*
+				 * If we've disconnected or we only have 1
+				 * connection then we need to make sure we
+				 * complete this request, otherwise error out
+				 * and let the timeout stuff handle resubmitting
+				 * this request onto another connection.
+				 */
+				if (nbd_disconnected(nbd) ||
+				    nbd->num_connections <= 1) {
+					req->errors = -EIO;
+					return cmd;
+				}
+				return ERR_PTR(-EIO);
 			}
 			dev_dbg(nbd_to_dev(nbd), "request %p: got %d bytes data\n",
 				cmd, bvec.bv_len);
@@ -495,19 +548,17 @@ static void recv_work(struct work_struct *work)
 	while (1) {
 		cmd = nbd_read_stat(nbd, args->index);
 		if (IS_ERR(cmd)) {
+			struct nbd_sock *nsock = nbd->socks[args->index];
+
+			mutex_lock(&nsock->tx_lock);
+			nbd_mark_nsock_dead(nsock);
+			mutex_unlock(&nsock->tx_lock);
 			ret = PTR_ERR(cmd);
 			break;
 		}
 
 		nbd_end_request(cmd);
 	}
-
-	/*
-	 * We got an error, shut everybody down if this wasn't the result of a
-	 * disconnect request.
-	 */
-	if (ret && !test_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags))
-		sock_shutdown(nbd);
 	atomic_dec(&nbd->recv_threads);
 	wake_up(&nbd->recv_wq);
 }
@@ -531,6 +582,47 @@ static void nbd_clear_que(struct nbd_device *nbd)
 	dev_dbg(disk_to_dev(nbd->disk), "queue cleared\n");
 }
 
+static int find_fallback(struct nbd_device *nbd, int index)
+{
+	int new_index = -1;
+	struct nbd_sock *nsock = nbd->socks[index];
+	int fallback = nsock->fallback_index;
+
+	if (test_bit(NBD_DISCONNECTED, &nbd->runtime_flags))
+		return new_index;
+
+	if (nbd->num_connections <= 1) {
+		dev_err_ratelimited(disk_to_dev(nbd->disk),
+				    "Attempted send on invalid socket\n");
+		return new_index;
+	}
+
+	if (fallback >= 0 && fallback < nbd->num_connections &&
+	    !nbd->socks[fallback]->dead)
+		return fallback;
+
+	if (nsock->fallback_index < 0 ||
+	    nsock->fallback_index >= nbd->num_connections ||
+	    nbd->socks[nsock->fallback_index]->dead) {
+		int i;
+		for (i = 0; i < nbd->num_connections; i++) {
+			if (i == index)
+				continue;
+			if (!nbd->socks[i]->dead) {
+				new_index = i;
+				break;
+			}
+		}
+		nsock->fallback_index = new_index;
+		if (new_index < 0) {
+			dev_err_ratelimited(disk_to_dev(nbd->disk),
+					    "Dead connection, failed to find a fallback\n");
+			return new_index;
+		}
+	}
+	new_index = nsock->fallback_index;
+	return new_index;
+}
 
 static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 {
@@ -544,22 +636,16 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 				    "Attempted send on invalid socket\n");
 		return -EINVAL;
 	}
-
-	if (test_bit(NBD_DISCONNECTED, &nbd->runtime_flags)) {
-		dev_err_ratelimited(disk_to_dev(nbd->disk),
-				    "Attempted send on closed socket\n");
-		return -EINVAL;
-	}
-
 	req->errors = 0;
-
+again:
 	nsock = nbd->socks[index];
 	mutex_lock(&nsock->tx_lock);
-	if (unlikely(!nsock->sock)) {
+	if (nsock->dead) {
+		index = find_fallback(nbd, index);
 		mutex_unlock(&nsock->tx_lock);
-		dev_err_ratelimited(disk_to_dev(nbd->disk),
-				    "Attempted send on closed socket\n");
-		return -EINVAL;
+		if (index < 0)
+			return -EIO;
+		goto again;
 	}
 
 	/* Handle the case that we have a pending request that was partially
@@ -572,7 +658,18 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 		ret = 0;
 		goto out;
 	}
+	/*
+	 * Some failures are related to the link going down, so anything that
+	 * returns EAGAIN can be retried on a different socket.
+	 */
 	ret = nbd_send_cmd(nbd, cmd, index);
+	if (ret == -EAGAIN) {
+		dev_err_ratelimited(disk_to_dev(nbd->disk),
+				    "Request send failed trying another connection\n");
+		nbd_mark_nsock_dead(nsock);
+		mutex_unlock(&nsock->tx_lock);
+		goto again;
+	}
 out:
 	mutex_unlock(&nsock->tx_lock);
 	return ret;
@@ -646,6 +743,8 @@ static int nbd_add_socket(struct nbd_device *nbd, struct block_device *bdev,
 
 	nbd->socks = socks;
 
+	nsock->fallback_index = -1;
+	nsock->dead = false;
 	mutex_init(&nsock->tx_lock);
 	nsock->sock = sock;
 	nsock->pending = NULL;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 03/12] nbd: separate out the config information
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
  2017-04-06 21:01 ` [PATCH 01/12] nbd: put socket in error cases Josef Bacik
  2017-04-06 21:01 ` [PATCH 02/12] nbd: handle single path failures gracefully Josef Bacik
@ 2017-04-06 21:01 ` Josef Bacik
  2017-04-06 21:01 ` [PATCH 04/12] nbd: stop using the bdev everywhere Josef Bacik
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:01 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

In order to properly refcount the various aspects of a NBD device we
need to separate out the configuration elements of the nbd device.  The
configuration of a NBD device has a different lifetime from the actual
device, so it doesn't make sense to bundle these two concepts.  Add a
config_refs to keep track of the configuration structure, that way we
can be sure that we never access it when we've torn down the device.
Add a new nbd_config structure to hold all of the transient
configuration information.  Finally create this when we open the device
so that it is in place when we start to configure the device.  This has
a nice side-effect of fixing a long standing problem where you could end
up with a half-configured nbd device that needed to be "disconnected" in
order to be usable again.  Now once we close our device the
configuration will be discarded.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c | 437 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 263 insertions(+), 174 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 4e64d60..c2d9923 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -53,35 +53,45 @@ struct nbd_sock {
 	int fallback_index;
 };
 
+struct recv_thread_args {
+	struct work_struct work;
+	struct nbd_device *nbd;
+	int index;
+};
+
 #define NBD_TIMEDOUT			0
 #define NBD_DISCONNECT_REQUESTED	1
 #define NBD_DISCONNECTED		2
-#define NBD_RUNNING			3
+#define NBD_HAS_PID_FILE		3
 
-struct nbd_device {
+struct nbd_config {
 	u32 flags;
 	unsigned long runtime_flags;
-	struct nbd_sock **socks;
-	int magic;
-
-	struct blk_mq_tag_set tag_set;
 
-	struct mutex config_lock;
-	struct gendisk *disk;
+	struct nbd_sock **socks;
 	int num_connections;
+
 	atomic_t recv_threads;
 	wait_queue_head_t recv_wq;
 	loff_t blksize;
 	loff_t bytesize;
-
-	struct task_struct *task_recv;
-	struct task_struct *task_setup;
-
 #if IS_ENABLED(CONFIG_DEBUG_FS)
 	struct dentry *dbg_dir;
 #endif
 };
 
+struct nbd_device {
+	struct blk_mq_tag_set tag_set;
+
+	refcount_t config_refs;
+	struct nbd_config *config;
+	struct mutex config_lock;
+	struct gendisk *disk;
+
+	struct task_struct *task_recv;
+	struct task_struct *task_setup;
+};
+
 struct nbd_cmd {
 	struct nbd_device *nbd;
 	int index;
@@ -103,7 +113,7 @@ static int part_shift;
 
 static int nbd_dev_dbg_init(struct nbd_device *nbd);
 static void nbd_dev_dbg_close(struct nbd_device *nbd);
-
+static void nbd_config_put(struct nbd_device *nbd);
 
 static inline struct device *nbd_to_dev(struct nbd_device *nbd)
 {
@@ -127,6 +137,20 @@ static const char *nbdcmd_to_ascii(int cmd)
 	return "invalid";
 }
 
+static ssize_t pid_show(struct device *dev,
+			struct device_attribute *attr, char *buf)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+	struct nbd_device *nbd = (struct nbd_device *)disk->private_data;
+
+	return sprintf(buf, "%d\n", task_pid_nr(nbd->task_recv));
+}
+
+static struct device_attribute pid_attr = {
+	.attr = { .name = "pid", .mode = S_IRUGO},
+	.show = pid_show,
+};
+
 static void nbd_mark_nsock_dead(struct nbd_sock *nsock)
 {
 	if (!nsock->dead)
@@ -138,28 +162,32 @@ static void nbd_mark_nsock_dead(struct nbd_sock *nsock)
 
 static int nbd_size_clear(struct nbd_device *nbd, struct block_device *bdev)
 {
-	if (bdev->bd_openers <= 1)
-		bd_set_size(bdev, 0);
-	set_capacity(nbd->disk, 0);
-	kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
+	if (nbd->config->bytesize) {
+		if (bdev->bd_openers <= 1)
+			bd_set_size(bdev, 0);
+		set_capacity(nbd->disk, 0);
+		kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
+	}
 
 	return 0;
 }
 
 static void nbd_size_update(struct nbd_device *nbd, struct block_device *bdev)
 {
-	blk_queue_logical_block_size(nbd->disk->queue, nbd->blksize);
-	blk_queue_physical_block_size(nbd->disk->queue, nbd->blksize);
-	bd_set_size(bdev, nbd->bytesize);
-	set_capacity(nbd->disk, nbd->bytesize >> 9);
+	struct nbd_config *config = nbd->config;
+	blk_queue_logical_block_size(nbd->disk->queue, config->blksize);
+	blk_queue_physical_block_size(nbd->disk->queue, config->blksize);
+	bd_set_size(bdev, config->bytesize);
+	set_capacity(nbd->disk, config->bytesize >> 9);
 	kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
 }
 
 static void nbd_size_set(struct nbd_device *nbd, struct block_device *bdev,
 			loff_t blocksize, loff_t nr_blocks)
 {
-	nbd->blksize = blocksize;
-	nbd->bytesize = blocksize * nr_blocks;
+	struct nbd_config *config = nbd->config;
+	config->blksize = blocksize;
+	config->bytesize = blocksize * nr_blocks;
 	if (nbd_is_connected(nbd))
 		nbd_size_update(nbd, bdev);
 }
@@ -181,17 +209,19 @@ static void nbd_end_request(struct nbd_cmd *cmd)
  */
 static void sock_shutdown(struct nbd_device *nbd)
 {
+	struct nbd_config *config = nbd->config;
 	int i;
 
-	if (nbd->num_connections == 0)
+	if (config->num_connections == 0)
 		return;
-	if (test_and_set_bit(NBD_DISCONNECTED, &nbd->runtime_flags))
+	if (test_and_set_bit(NBD_DISCONNECTED, &config->runtime_flags))
 		return;
 
-	for (i = 0; i < nbd->num_connections; i++) {
-		struct nbd_sock *nsock = nbd->socks[i];
+	for (i = 0; i < config->num_connections; i++) {
+		struct nbd_sock *nsock = config->socks[i];
 		mutex_lock(&nsock->tx_lock);
 		kernel_sock_shutdown(nsock->sock, SHUT_RDWR);
+		nbd_mark_nsock_dead(nsock);
 		mutex_unlock(&nsock->tx_lock);
 	}
 	dev_warn(disk_to_dev(nbd->disk), "shutting down sockets\n");
@@ -202,38 +232,43 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 {
 	struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
 	struct nbd_device *nbd = cmd->nbd;
+	struct nbd_config *config;
+
+	if (!refcount_inc_not_zero(&nbd->config_refs)) {
+		req->errors = -EIO;
+		return BLK_EH_HANDLED;
+	}
 
-	if (nbd->num_connections > 1) {
+	config = nbd->config;
+
+	if (config->num_connections > 1) {
 		dev_err_ratelimited(nbd_to_dev(nbd),
 				    "Connection timed out, retrying\n");
-		mutex_lock(&nbd->config_lock);
 		/*
 		 * Hooray we have more connections, requeue this IO, the submit
 		 * path will put it on a real connection.
 		 */
-		if (nbd->socks && nbd->num_connections > 1) {
-			if (cmd->index < nbd->num_connections) {
+		if (config->socks && config->num_connections > 1) {
+			if (cmd->index < config->num_connections) {
 				struct nbd_sock *nsock =
-					nbd->socks[cmd->index];
+					config->socks[cmd->index];
 				mutex_lock(&nsock->tx_lock);
 				nbd_mark_nsock_dead(nsock);
 				mutex_unlock(&nsock->tx_lock);
 			}
-			mutex_unlock(&nbd->config_lock);
 			blk_mq_requeue_request(req, true);
+			nbd_config_put(nbd);
 			return BLK_EH_NOT_HANDLED;
 		}
-		mutex_unlock(&nbd->config_lock);
 	} else {
 		dev_err_ratelimited(nbd_to_dev(nbd),
 				    "Connection timed out\n");
 	}
-	set_bit(NBD_TIMEDOUT, &nbd->runtime_flags);
+	set_bit(NBD_TIMEDOUT, &config->runtime_flags);
 	req->errors = -EIO;
-
-	mutex_lock(&nbd->config_lock);
 	sock_shutdown(nbd);
-	mutex_unlock(&nbd->config_lock);
+	nbd_config_put(nbd);
+
 	return BLK_EH_HANDLED;
 }
 
@@ -243,7 +278,8 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 static int sock_xmit(struct nbd_device *nbd, int index, int send,
 		     struct iov_iter *iter, int msg_flags, int *sent)
 {
-	struct socket *sock = nbd->socks[index]->sock;
+	struct nbd_config *config = nbd->config;
+	struct socket *sock = config->socks[index]->sock;
 	int result;
 	struct msghdr msg;
 	unsigned long pflags = current->flags;
@@ -289,7 +325,8 @@ static int sock_xmit(struct nbd_device *nbd, int index, int send,
 static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 {
 	struct request *req = blk_mq_rq_from_pdu(cmd);
-	struct nbd_sock *nsock = nbd->socks[index];
+	struct nbd_config *config = nbd->config;
+	struct nbd_sock *nsock = config->socks[index];
 	int result;
 	struct nbd_request request = {.magic = htonl(NBD_REQUEST_MAGIC)};
 	struct kvec iov = {.iov_base = &request, .iov_len = sizeof(request)};
@@ -320,7 +357,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	}
 
 	if (rq_data_dir(req) == WRITE &&
-	    (nbd->flags & NBD_FLAG_READ_ONLY)) {
+	    (config->flags & NBD_FLAG_READ_ONLY)) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Write on read-only\n");
 		return -EIO;
@@ -426,15 +463,16 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	return 0;
 }
 
-static int nbd_disconnected(struct nbd_device *nbd)
+static int nbd_disconnected(struct nbd_config *config)
 {
-	return test_bit(NBD_DISCONNECTED, &nbd->runtime_flags) ||
-		test_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags);
+	return test_bit(NBD_DISCONNECTED, &config->runtime_flags) ||
+		test_bit(NBD_DISCONNECT_REQUESTED, &config->runtime_flags);
 }
 
 /* NULL returned = something went wrong, inform userspace */
 static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 {
+	struct nbd_config *config = nbd->config;
 	int result;
 	struct nbd_reply reply;
 	struct nbd_cmd *cmd;
@@ -448,7 +486,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 	iov_iter_kvec(&to, READ | ITER_KVEC, &iov, 1, sizeof(reply));
 	result = sock_xmit(nbd, index, 0, &to, MSG_WAITALL, NULL);
 	if (result <= 0) {
-		if (!nbd_disconnected(nbd))
+		if (!nbd_disconnected(config))
 			dev_err(disk_to_dev(nbd->disk),
 				"Receive control failed (result %d)\n", result);
 		return ERR_PTR(result);
@@ -498,8 +536,8 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 				 * and let the timeout stuff handle resubmitting
 				 * this request onto another connection.
 				 */
-				if (nbd_disconnected(nbd) ||
-				    nbd->num_connections <= 1) {
+				if (nbd_disconnected(config) ||
+				    config->num_connections <= 1) {
 					req->errors = -EIO;
 					return cmd;
 				}
@@ -515,40 +553,20 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 	return cmd;
 }
 
-static ssize_t pid_show(struct device *dev,
-			struct device_attribute *attr, char *buf)
-{
-	struct gendisk *disk = dev_to_disk(dev);
-	struct nbd_device *nbd = (struct nbd_device *)disk->private_data;
-
-	return sprintf(buf, "%d\n", task_pid_nr(nbd->task_recv));
-}
-
-static struct device_attribute pid_attr = {
-	.attr = { .name = "pid", .mode = S_IRUGO},
-	.show = pid_show,
-};
-
-struct recv_thread_args {
-	struct work_struct work;
-	struct nbd_device *nbd;
-	int index;
-};
-
 static void recv_work(struct work_struct *work)
 {
 	struct recv_thread_args *args = container_of(work,
 						     struct recv_thread_args,
 						     work);
 	struct nbd_device *nbd = args->nbd;
+	struct nbd_config *config = nbd->config;
 	struct nbd_cmd *cmd;
 	int ret = 0;
 
-	BUG_ON(nbd->magic != NBD_MAGIC);
 	while (1) {
 		cmd = nbd_read_stat(nbd, args->index);
 		if (IS_ERR(cmd)) {
-			struct nbd_sock *nsock = nbd->socks[args->index];
+			struct nbd_sock *nsock = config->socks[args->index];
 
 			mutex_lock(&nsock->tx_lock);
 			nbd_mark_nsock_dead(nsock);
@@ -559,8 +577,10 @@ static void recv_work(struct work_struct *work)
 
 		nbd_end_request(cmd);
 	}
-	atomic_dec(&nbd->recv_threads);
-	wake_up(&nbd->recv_wq);
+	atomic_dec(&config->recv_threads);
+	wake_up(&config->recv_wq);
+	nbd_config_put(nbd);
+	kfree(args);
 }
 
 static void nbd_clear_req(struct request *req, void *data, bool reserved)
@@ -576,39 +596,38 @@ static void nbd_clear_req(struct request *req, void *data, bool reserved)
 
 static void nbd_clear_que(struct nbd_device *nbd)
 {
-	BUG_ON(nbd->magic != NBD_MAGIC);
-
 	blk_mq_tagset_busy_iter(&nbd->tag_set, nbd_clear_req, NULL);
 	dev_dbg(disk_to_dev(nbd->disk), "queue cleared\n");
 }
 
 static int find_fallback(struct nbd_device *nbd, int index)
 {
+	struct nbd_config *config = nbd->config;
 	int new_index = -1;
-	struct nbd_sock *nsock = nbd->socks[index];
+	struct nbd_sock *nsock = config->socks[index];
 	int fallback = nsock->fallback_index;
 
-	if (test_bit(NBD_DISCONNECTED, &nbd->runtime_flags))
+	if (test_bit(NBD_DISCONNECTED, &config->runtime_flags))
 		return new_index;
 
-	if (nbd->num_connections <= 1) {
+	if (config->num_connections <= 1) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Attempted send on invalid socket\n");
 		return new_index;
 	}
 
-	if (fallback >= 0 && fallback < nbd->num_connections &&
-	    !nbd->socks[fallback]->dead)
+	if (fallback >= 0 && fallback < config->num_connections &&
+	    !config->socks[fallback]->dead)
 		return fallback;
 
 	if (nsock->fallback_index < 0 ||
-	    nsock->fallback_index >= nbd->num_connections ||
-	    nbd->socks[nsock->fallback_index]->dead) {
+	    nsock->fallback_index >= config->num_connections ||
+	    config->socks[nsock->fallback_index]->dead) {
 		int i;
-		for (i = 0; i < nbd->num_connections; i++) {
+		for (i = 0; i < config->num_connections; i++) {
 			if (i == index)
 				continue;
-			if (!nbd->socks[i]->dead) {
+			if (!config->socks[i]->dead) {
 				new_index = i;
 				break;
 			}
@@ -628,23 +647,34 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 {
 	struct request *req = blk_mq_rq_from_pdu(cmd);
 	struct nbd_device *nbd = cmd->nbd;
+	struct nbd_config *config;
 	struct nbd_sock *nsock;
 	int ret;
 
-	if (index >= nbd->num_connections) {
+	if (!refcount_inc_not_zero(&nbd->config_refs)) {
+		dev_err_ratelimited(disk_to_dev(nbd->disk),
+				    "Socks array is empty\n");
+		return -EINVAL;
+	}
+	config = nbd->config;
+
+	if (index >= config->num_connections) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Attempted send on invalid socket\n");
+		nbd_config_put(nbd);
 		return -EINVAL;
 	}
 	req->errors = 0;
 again:
-	nsock = nbd->socks[index];
+	nsock = config->socks[index];
 	mutex_lock(&nsock->tx_lock);
 	if (nsock->dead) {
 		index = find_fallback(nbd, index);
+		if (index < 0) {
+			ret = -EIO;
+			goto out;
+		}
 		mutex_unlock(&nsock->tx_lock);
-		if (index < 0)
-			return -EIO;
 		goto again;
 	}
 
@@ -672,6 +702,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	}
 out:
 	mutex_unlock(&nsock->tx_lock);
+	nbd_config_put(nbd);
 	return ret;
 }
 
@@ -711,6 +742,7 @@ static int nbd_queue_rq(struct blk_mq_hw_ctx *hctx,
 static int nbd_add_socket(struct nbd_device *nbd, struct block_device *bdev,
 			  unsigned long arg)
 {
+	struct nbd_config *config = nbd->config;
 	struct socket *sock;
 	struct nbd_sock **socks;
 	struct nbd_sock *nsock;
@@ -729,7 +761,7 @@ static int nbd_add_socket(struct nbd_device *nbd, struct block_device *bdev,
 		return -EINVAL;
 	}
 
-	socks = krealloc(nbd->socks, (nbd->num_connections + 1) *
+	socks = krealloc(config->socks, (config->num_connections + 1) *
 			 sizeof(struct nbd_sock *), GFP_KERNEL);
 	if (!socks) {
 		sockfd_put(sock);
@@ -741,7 +773,7 @@ static int nbd_add_socket(struct nbd_device *nbd, struct block_device *bdev,
 		return -ENOMEM;
 	}
 
-	nbd->socks = socks;
+	config->socks = socks;
 
 	nsock->fallback_index = -1;
 	nsock->dead = false;
@@ -749,7 +781,7 @@ static int nbd_add_socket(struct nbd_device *nbd, struct block_device *bdev,
 	nsock->sock = sock;
 	nsock->pending = NULL;
 	nsock->sent = 0;
-	socks[nbd->num_connections++] = nsock;
+	socks[config->num_connections++] = nsock;
 
 	if (max_part)
 		bdev->bd_invalidated = 1;
@@ -759,11 +791,7 @@ static int nbd_add_socket(struct nbd_device *nbd, struct block_device *bdev,
 /* Reset all properties of an NBD device */
 static void nbd_reset(struct nbd_device *nbd)
 {
-	nbd->runtime_flags = 0;
-	nbd->blksize = 1024;
-	nbd->bytesize = 0;
-	set_capacity(nbd->disk, 0);
-	nbd->flags = 0;
+	nbd->config = NULL;
 	nbd->tag_set.timeout = 0;
 	queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
 }
@@ -782,11 +810,12 @@ static void nbd_bdev_reset(struct block_device *bdev)
 
 static void nbd_parse_flags(struct nbd_device *nbd, struct block_device *bdev)
 {
-	if (nbd->flags & NBD_FLAG_READ_ONLY)
+	struct nbd_config *config = nbd->config;
+	if (config->flags & NBD_FLAG_READ_ONLY)
 		set_device_ro(bdev, true);
-	if (nbd->flags & NBD_FLAG_SEND_TRIM)
+	if (config->flags & NBD_FLAG_SEND_TRIM)
 		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
-	if (nbd->flags & NBD_FLAG_SEND_FLUSH)
+	if (config->flags & NBD_FLAG_SEND_FLUSH)
 		blk_queue_write_cache(nbd->disk->queue, true, false);
 	else
 		blk_queue_write_cache(nbd->disk->queue, false, false);
@@ -794,6 +823,7 @@ static void nbd_parse_flags(struct nbd_device *nbd, struct block_device *bdev)
 
 static void send_disconnects(struct nbd_device *nbd)
 {
+	struct nbd_config *config = nbd->config;
 	struct nbd_request request = {
 		.magic = htonl(NBD_REQUEST_MAGIC),
 		.type = htonl(NBD_CMD_DISC),
@@ -802,7 +832,7 @@ static void send_disconnects(struct nbd_device *nbd)
 	struct iov_iter from;
 	int i, ret;
 
-	for (i = 0; i < nbd->num_connections; i++) {
+	for (i = 0; i < config->num_connections; i++) {
 		iov_iter_kvec(&from, WRITE | ITER_KVEC, &iov, 1, sizeof(request));
 		ret = sock_xmit(nbd, i, 1, &from, 0, NULL);
 		if (ret <= 0)
@@ -813,20 +843,15 @@ static void send_disconnects(struct nbd_device *nbd)
 
 static int nbd_disconnect(struct nbd_device *nbd, struct block_device *bdev)
 {
-	dev_info(disk_to_dev(nbd->disk), "NBD_DISCONNECT\n");
-	if (!nbd->socks)
-		return -EINVAL;
+	struct nbd_config *config = nbd->config;
 
+	dev_info(disk_to_dev(nbd->disk), "NBD_DISCONNECT\n");
 	mutex_unlock(&nbd->config_lock);
 	fsync_bdev(bdev);
 	mutex_lock(&nbd->config_lock);
 
-	/* Check again after getting mutex back.  */
-	if (!nbd->socks)
-		return -EINVAL;
-
 	if (!test_and_set_bit(NBD_DISCONNECT_REQUESTED,
-			      &nbd->runtime_flags))
+			      &config->runtime_flags))
 		send_disconnects(nbd);
 	return 0;
 }
@@ -838,51 +863,63 @@ static int nbd_clear_sock(struct nbd_device *nbd, struct block_device *bdev)
 
 	__invalidate_device(bdev, true);
 	nbd_bdev_reset(bdev);
-	/*
-	 * We want to give the run thread a chance to wait for everybody
-	 * to clean up and then do it's own cleanup.
-	 */
-	if (!test_bit(NBD_RUNNING, &nbd->runtime_flags) &&
-	    nbd->num_connections) {
-		int i;
+	nbd->task_setup = NULL;
+	return 0;
+}
+
+static void nbd_config_put(struct nbd_device *nbd)
+{
+	if (refcount_dec_and_mutex_lock(&nbd->config_refs,
+					&nbd->config_lock)) {
+		struct block_device *bdev;
+		struct nbd_config *config = nbd->config;
 
-		for (i = 0; i < nbd->num_connections; i++) {
-			sockfd_put(nbd->socks[i]->sock);
-			kfree(nbd->socks[i]);
+		bdev = bdget_disk(nbd->disk, 0);
+		if (!bdev) {
+			mutex_unlock(&nbd->config_lock);
+			return;
 		}
-		kfree(nbd->socks);
-		nbd->socks = NULL;
-		nbd->num_connections = 0;
-	}
-	nbd->task_setup = NULL;
 
-	return 0;
+		nbd_dev_dbg_close(nbd);
+		nbd_size_clear(nbd, bdev);
+		if (test_and_clear_bit(NBD_HAS_PID_FILE,
+				       &config->runtime_flags))
+			device_remove_file(disk_to_dev(nbd->disk), &pid_attr);
+		nbd->task_recv = NULL;
+		nbd_clear_sock(nbd, bdev);
+		if (config->num_connections) {
+			int i;
+			for (i = 0; i < config->num_connections; i++) {
+				sockfd_put(config->socks[i]->sock);
+				kfree(config->socks[i]);
+			}
+			kfree(config->socks);
+		}
+		nbd_reset(nbd);
+		mutex_unlock(&nbd->config_lock);
+		bdput(bdev);
+		module_put(THIS_MODULE);
+	}
 }
 
 static int nbd_start_device(struct nbd_device *nbd, struct block_device *bdev)
 {
-	struct recv_thread_args *args;
-	int num_connections = nbd->num_connections;
+	struct nbd_config *config = nbd->config;
+	int num_connections = config->num_connections;
 	int error = 0, i;
 
 	if (nbd->task_recv)
 		return -EBUSY;
-	if (!nbd->socks)
+	if (!config->socks)
 		return -EINVAL;
+
 	if (num_connections > 1 &&
-	    !(nbd->flags & NBD_FLAG_CAN_MULTI_CONN)) {
+	    !(config->flags & NBD_FLAG_CAN_MULTI_CONN)) {
 		dev_err(disk_to_dev(nbd->disk), "server does not support multiple connections per device.\n");
-		error = -EINVAL;
-		goto out_err;
+		return -EINVAL;
 	}
 
-	set_bit(NBD_RUNNING, &nbd->runtime_flags);
-	blk_mq_update_nr_hw_queues(&nbd->tag_set, nbd->num_connections);
-	args = kcalloc(num_connections, sizeof(*args), GFP_KERNEL);
-	if (!args) {
-		error = -ENOMEM;
-		goto out_err;
-	}
+	blk_mq_update_nr_hw_queues(&nbd->tag_set, config->num_connections);
 	nbd->task_recv = current;
 	mutex_unlock(&nbd->config_lock);
 
@@ -891,41 +928,40 @@ static int nbd_start_device(struct nbd_device *nbd, struct block_device *bdev)
 	error = device_create_file(disk_to_dev(nbd->disk), &pid_attr);
 	if (error) {
 		dev_err(disk_to_dev(nbd->disk), "device_create_file failed!\n");
-		goto out_recv;
+		return error;
 	}
+	set_bit(NBD_HAS_PID_FILE, &config->runtime_flags);
 
 	nbd_size_update(nbd, bdev);
 
 	nbd_dev_dbg_init(nbd);
 	for (i = 0; i < num_connections; i++) {
-		sk_set_memalloc(nbd->socks[i]->sock->sk);
-		atomic_inc(&nbd->recv_threads);
-		INIT_WORK(&args[i].work, recv_work);
-		args[i].nbd = nbd;
-		args[i].index = i;
-		queue_work(recv_workqueue, &args[i].work);
+		struct recv_thread_args *args;
+
+		args = kzalloc(sizeof(*args), GFP_KERNEL);
+		if (!args) {
+			sock_shutdown(nbd);
+			return -ENOMEM;
+		}
+		sk_set_memalloc(config->socks[i]->sock->sk);
+		atomic_inc(&config->recv_threads);
+		refcount_inc(&nbd->config_refs);
+		INIT_WORK(&args->work, recv_work);
+		args->nbd = nbd;
+		args->index = i;
+		queue_work(recv_workqueue, &args->work);
 	}
-	wait_event_interruptible(nbd->recv_wq,
-				 atomic_read(&nbd->recv_threads) == 0);
-	for (i = 0; i < num_connections; i++)
-		flush_work(&args[i].work);
-	nbd_dev_dbg_close(nbd);
-	nbd_size_clear(nbd, bdev);
-	device_remove_file(disk_to_dev(nbd->disk), &pid_attr);
-out_recv:
+	error = wait_event_interruptible(config->recv_wq,
+					 atomic_read(&config->recv_threads) == 0);
+	if (error)
+		sock_shutdown(nbd);
 	mutex_lock(&nbd->config_lock);
-	nbd->task_recv = NULL;
-out_err:
-	clear_bit(NBD_RUNNING, &nbd->runtime_flags);
-	nbd_clear_sock(nbd, bdev);
 
 	/* user requested, ignore socket errors */
-	if (test_bit(NBD_DISCONNECT_REQUESTED, &nbd->runtime_flags))
+	if (test_bit(NBD_DISCONNECT_REQUESTED, &config->runtime_flags))
 		error = 0;
-	if (test_bit(NBD_TIMEDOUT, &nbd->runtime_flags))
+	if (test_bit(NBD_TIMEDOUT, &config->runtime_flags))
 		error = -ETIMEDOUT;
-
-	nbd_reset(nbd);
 	return error;
 }
 
@@ -933,6 +969,8 @@ static int nbd_start_device(struct nbd_device *nbd, struct block_device *bdev)
 static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		       unsigned int cmd, unsigned long arg)
 {
+	struct nbd_config *config = nbd->config;
+
 	switch (cmd) {
 	case NBD_DISCONNECT:
 		return nbd_disconnect(nbd, bdev);
@@ -942,14 +980,14 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		return nbd_add_socket(nbd, bdev, arg);
 	case NBD_SET_BLKSIZE:
 		nbd_size_set(nbd, bdev, arg,
-			     div_s64(nbd->bytesize, arg));
+			     div_s64(config->bytesize, arg));
 		return 0;
 	case NBD_SET_SIZE:
-		nbd_size_set(nbd, bdev, nbd->blksize,
-			     div_s64(arg, nbd->blksize));
+		nbd_size_set(nbd, bdev, config->blksize,
+			     div_s64(arg, config->blksize));
 		return 0;
 	case NBD_SET_SIZE_BLOCKS:
-		nbd_size_set(nbd, bdev, nbd->blksize, arg);
+		nbd_size_set(nbd, bdev, config->blksize, arg);
 		return 0;
 	case NBD_SET_TIMEOUT:
 		if (arg) {
@@ -959,7 +997,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		return 0;
 
 	case NBD_SET_FLAGS:
-		nbd->flags = arg;
+		config->flags = arg;
 		return 0;
 	case NBD_DO_IT:
 		return nbd_start_device(nbd, bdev);
@@ -988,18 +1026,70 @@ static int nbd_ioctl(struct block_device *bdev, fmode_t mode,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	BUG_ON(nbd->magic != NBD_MAGIC);
-
 	mutex_lock(&nbd->config_lock);
 	error = __nbd_ioctl(bdev, nbd, cmd, arg);
 	mutex_unlock(&nbd->config_lock);
-
 	return error;
 }
 
+static struct nbd_config *nbd_alloc_config(void)
+{
+	struct nbd_config *config;
+
+	config = kzalloc(sizeof(struct nbd_config), GFP_NOFS);
+	if (!config)
+		return NULL;
+	atomic_set(&config->recv_threads, 0);
+	init_waitqueue_head(&config->recv_wq);
+	config->blksize = 1024;
+	try_module_get(THIS_MODULE);
+	return config;
+}
+
+static int nbd_open(struct block_device *bdev, fmode_t mode)
+{
+	struct nbd_device *nbd;
+	int ret = 0;
+
+	mutex_lock(&nbd_index_mutex);
+	nbd = bdev->bd_disk->private_data;
+	if (!nbd) {
+		ret = -ENXIO;
+		goto out;
+	}
+	if (!refcount_inc_not_zero(&nbd->config_refs)) {
+		struct nbd_config *config;
+
+		mutex_lock(&nbd->config_lock);
+		if (refcount_inc_not_zero(&nbd->config_refs)) {
+			mutex_unlock(&nbd->config_lock);
+			goto out;
+		}
+		config = nbd->config = nbd_alloc_config();
+		if (!config) {
+			ret = -ENOMEM;
+			mutex_unlock(&nbd->config_lock);
+			goto out;
+		}
+		refcount_set(&nbd->config_refs, 1);
+		mutex_unlock(&nbd->config_lock);
+	}
+out:
+	mutex_unlock(&nbd_index_mutex);
+	return ret;
+}
+
+static void nbd_release(struct gendisk *disk, fmode_t mode)
+{
+	struct nbd_device *nbd = disk->private_data;
+	nbd_config_put(nbd);
+}
+
 static const struct block_device_operations nbd_fops =
 {
 	.owner =	THIS_MODULE,
+	.open =		nbd_open,
+	.release =	nbd_release,
 	.ioctl =	nbd_ioctl,
 	.compat_ioctl =	nbd_ioctl,
 };
@@ -1031,7 +1121,7 @@ static const struct file_operations nbd_dbg_tasks_ops = {
 static int nbd_dbg_flags_show(struct seq_file *s, void *unused)
 {
 	struct nbd_device *nbd = s->private;
-	u32 flags = nbd->flags;
+	u32 flags = nbd->config->flags;
 
 	seq_printf(s, "Hex: 0x%08x\n\n", flags);
 
@@ -1064,6 +1154,7 @@ static const struct file_operations nbd_dbg_flags_ops = {
 static int nbd_dev_dbg_init(struct nbd_device *nbd)
 {
 	struct dentry *dir;
+	struct nbd_config *config = nbd->config;
 
 	if (!nbd_dbg_dir)
 		return -EIO;
@@ -1074,12 +1165,12 @@ static int nbd_dev_dbg_init(struct nbd_device *nbd)
 			nbd_name(nbd));
 		return -EIO;
 	}
-	nbd->dbg_dir = dir;
+	config->dbg_dir = dir;
 
 	debugfs_create_file("tasks", 0444, dir, nbd, &nbd_dbg_tasks_ops);
-	debugfs_create_u64("size_bytes", 0444, dir, &nbd->bytesize);
+	debugfs_create_u64("size_bytes", 0444, dir, &config->bytesize);
 	debugfs_create_u32("timeout", 0444, dir, &nbd->tag_set.timeout);
-	debugfs_create_u64("blocksize", 0444, dir, &nbd->blksize);
+	debugfs_create_u64("blocksize", 0444, dir, &config->blksize);
 	debugfs_create_file("flags", 0444, dir, nbd, &nbd_dbg_flags_ops);
 
 	return 0;
@@ -1087,7 +1178,7 @@ static int nbd_dev_dbg_init(struct nbd_device *nbd)
 
 static void nbd_dev_dbg_close(struct nbd_device *nbd)
 {
-	debugfs_remove_recursive(nbd->dbg_dir);
+	debugfs_remove_recursive(nbd->config->dbg_dir);
 }
 
 static int nbd_dbg_init(void)
@@ -1148,7 +1239,6 @@ static const struct blk_mq_ops nbd_mq_ops = {
 static void nbd_dev_remove(struct nbd_device *nbd)
 {
 	struct gendisk *disk = nbd->disk;
-	nbd->magic = 0;
 	if (disk) {
 		del_gendisk(disk);
 		blk_cleanup_queue(disk->queue);
@@ -1218,14 +1308,13 @@ static int nbd_dev_add(int index)
 	blk_queue_max_hw_sectors(disk->queue, 65536);
 	disk->queue->limits.max_sectors = 256;
 
-	nbd->magic = NBD_MAGIC;
 	mutex_init(&nbd->config_lock);
+	refcount_set(&nbd->config_refs, 0);
 	disk->major = NBD_MAJOR;
 	disk->first_minor = index << part_shift;
 	disk->fops = &nbd_fops;
 	disk->private_data = nbd;
 	sprintf(disk->disk_name, "nbd%d", index);
-	init_waitqueue_head(&nbd->recv_wq);
 	nbd_reset(nbd);
 	add_disk(disk);
 	return index;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 04/12] nbd: stop using the bdev everywhere
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
                   ` (2 preceding siblings ...)
  2017-04-06 21:01 ` [PATCH 03/12] nbd: separate out the config information Josef Bacik
@ 2017-04-06 21:01 ` Josef Bacik
  2017-04-06 21:02 ` [PATCH 05/12] nbd: add a basic netlink interface Josef Bacik
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:01 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

In preparation for the upcoming netlink interface we need to not rely on
already having the bdev for the NBD device we are doing operations on.
Instead of passing the bdev around, just use it in places where we know
we already have the bdev.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c | 92 +++++++++++++++++++++--------------------------------
 1 file changed, 37 insertions(+), 55 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index c2d9923..8891889 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -120,11 +120,6 @@ static inline struct device *nbd_to_dev(struct nbd_device *nbd)
 	return disk_to_dev(nbd->disk);
 }
 
-static bool nbd_is_connected(struct nbd_device *nbd)
-{
-	return !!nbd->task_recv;
-}
-
 static const char *nbdcmd_to_ascii(int cmd)
 {
 	switch (cmd) {
@@ -160,36 +155,30 @@ static void nbd_mark_nsock_dead(struct nbd_sock *nsock)
 	nsock->sent = 0;
 }
 
-static int nbd_size_clear(struct nbd_device *nbd, struct block_device *bdev)
+static void nbd_size_clear(struct nbd_device *nbd)
 {
 	if (nbd->config->bytesize) {
-		if (bdev->bd_openers <= 1)
-			bd_set_size(bdev, 0);
 		set_capacity(nbd->disk, 0);
 		kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
 	}
-
-	return 0;
 }
 
-static void nbd_size_update(struct nbd_device *nbd, struct block_device *bdev)
+static void nbd_size_update(struct nbd_device *nbd)
 {
 	struct nbd_config *config = nbd->config;
 	blk_queue_logical_block_size(nbd->disk->queue, config->blksize);
 	blk_queue_physical_block_size(nbd->disk->queue, config->blksize);
-	bd_set_size(bdev, config->bytesize);
 	set_capacity(nbd->disk, config->bytesize >> 9);
 	kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
 }
 
-static void nbd_size_set(struct nbd_device *nbd, struct block_device *bdev,
-			loff_t blocksize, loff_t nr_blocks)
+static void nbd_size_set(struct nbd_device *nbd, loff_t blocksize,
+			 loff_t nr_blocks)
 {
 	struct nbd_config *config = nbd->config;
 	config->blksize = blocksize;
 	config->bytesize = blocksize * nr_blocks;
-	if (nbd_is_connected(nbd))
-		nbd_size_update(nbd, bdev);
+	nbd_size_update(nbd);
 }
 
 static void nbd_end_request(struct nbd_cmd *cmd)
@@ -739,8 +728,7 @@ static int nbd_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
-static int nbd_add_socket(struct nbd_device *nbd, struct block_device *bdev,
-			  unsigned long arg)
+static int nbd_add_socket(struct nbd_device *nbd, unsigned long arg)
 {
 	struct nbd_config *config = nbd->config;
 	struct socket *sock;
@@ -783,8 +771,6 @@ static int nbd_add_socket(struct nbd_device *nbd, struct block_device *bdev,
 	nsock->sent = 0;
 	socks[config->num_connections++] = nsock;
 
-	if (max_part)
-		bdev->bd_invalidated = 1;
 	return 0;
 }
 
@@ -800,19 +786,20 @@ static void nbd_bdev_reset(struct block_device *bdev)
 {
 	if (bdev->bd_openers > 1)
 		return;
-	set_device_ro(bdev, false);
-	bdev->bd_inode->i_size = 0;
+	bd_set_size(bdev, 0);
 	if (max_part > 0) {
 		blkdev_reread_part(bdev);
 		bdev->bd_invalidated = 1;
 	}
 }
 
-static void nbd_parse_flags(struct nbd_device *nbd, struct block_device *bdev)
+static void nbd_parse_flags(struct nbd_device *nbd)
 {
 	struct nbd_config *config = nbd->config;
 	if (config->flags & NBD_FLAG_READ_ONLY)
-		set_device_ro(bdev, true);
+		set_disk_ro(nbd->disk, true);
+	else
+		set_disk_ro(nbd->disk, false);
 	if (config->flags & NBD_FLAG_SEND_TRIM)
 		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
 	if (config->flags & NBD_FLAG_SEND_FLUSH)
@@ -841,52 +828,36 @@ static void send_disconnects(struct nbd_device *nbd)
 	}
 }
 
-static int nbd_disconnect(struct nbd_device *nbd, struct block_device *bdev)
+static int nbd_disconnect(struct nbd_device *nbd)
 {
 	struct nbd_config *config = nbd->config;
 
 	dev_info(disk_to_dev(nbd->disk), "NBD_DISCONNECT\n");
-	mutex_unlock(&nbd->config_lock);
-	fsync_bdev(bdev);
-	mutex_lock(&nbd->config_lock);
-
 	if (!test_and_set_bit(NBD_DISCONNECT_REQUESTED,
 			      &config->runtime_flags))
 		send_disconnects(nbd);
 	return 0;
 }
 
-static int nbd_clear_sock(struct nbd_device *nbd, struct block_device *bdev)
+static void nbd_clear_sock(struct nbd_device *nbd)
 {
 	sock_shutdown(nbd);
 	nbd_clear_que(nbd);
-
-	__invalidate_device(bdev, true);
-	nbd_bdev_reset(bdev);
 	nbd->task_setup = NULL;
-	return 0;
 }
 
 static void nbd_config_put(struct nbd_device *nbd)
 {
 	if (refcount_dec_and_mutex_lock(&nbd->config_refs,
 					&nbd->config_lock)) {
-		struct block_device *bdev;
 		struct nbd_config *config = nbd->config;
-
-		bdev = bdget_disk(nbd->disk, 0);
-		if (!bdev) {
-			mutex_unlock(&nbd->config_lock);
-			return;
-		}
-
 		nbd_dev_dbg_close(nbd);
-		nbd_size_clear(nbd, bdev);
+		nbd_size_clear(nbd);
 		if (test_and_clear_bit(NBD_HAS_PID_FILE,
 				       &config->runtime_flags))
 			device_remove_file(disk_to_dev(nbd->disk), &pid_attr);
 		nbd->task_recv = NULL;
-		nbd_clear_sock(nbd, bdev);
+		nbd_clear_sock(nbd);
 		if (config->num_connections) {
 			int i;
 			for (i = 0; i < config->num_connections; i++) {
@@ -897,7 +868,6 @@ static void nbd_config_put(struct nbd_device *nbd)
 		}
 		nbd_reset(nbd);
 		mutex_unlock(&nbd->config_lock);
-		bdput(bdev);
 		module_put(THIS_MODULE);
 	}
 }
@@ -912,27 +882,30 @@ static int nbd_start_device(struct nbd_device *nbd, struct block_device *bdev)
 		return -EBUSY;
 	if (!config->socks)
 		return -EINVAL;
-
 	if (num_connections > 1 &&
 	    !(config->flags & NBD_FLAG_CAN_MULTI_CONN)) {
 		dev_err(disk_to_dev(nbd->disk), "server does not support multiple connections per device.\n");
 		return -EINVAL;
 	}
 
+	if (max_part)
+		bdev->bd_invalidated = 1;
 	blk_mq_update_nr_hw_queues(&nbd->tag_set, config->num_connections);
 	nbd->task_recv = current;
 	mutex_unlock(&nbd->config_lock);
 
-	nbd_parse_flags(nbd, bdev);
+	nbd_parse_flags(nbd);
 
 	error = device_create_file(disk_to_dev(nbd->disk), &pid_attr);
 	if (error) {
 		dev_err(disk_to_dev(nbd->disk), "device_create_file failed!\n");
 		return error;
 	}
-	set_bit(NBD_HAS_PID_FILE, &config->runtime_flags);
 
-	nbd_size_update(nbd, bdev);
+	set_bit(NBD_HAS_PID_FILE, &config->runtime_flags);
+	if (max_part)
+		bdev->bd_invalidated = 1;
+	bd_set_size(bdev, config->bytesize);
 
 	nbd_dev_dbg_init(nbd);
 	for (i = 0; i < num_connections; i++) {
@@ -965,6 +938,14 @@ static int nbd_start_device(struct nbd_device *nbd, struct block_device *bdev)
 	return error;
 }
 
+static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
+				 struct block_device *bdev)
+{
+	nbd_clear_sock(nbd);
+	kill_bdev(bdev);
+	nbd_bdev_reset(bdev);
+}
+
 /* Must be called with config_lock held */
 static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		       unsigned int cmd, unsigned long arg)
@@ -973,21 +954,22 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 
 	switch (cmd) {
 	case NBD_DISCONNECT:
-		return nbd_disconnect(nbd, bdev);
+		return nbd_disconnect(nbd);
 	case NBD_CLEAR_SOCK:
-		return nbd_clear_sock(nbd, bdev);
+		nbd_clear_sock_ioctl(nbd, bdev);
+		return 0;
 	case NBD_SET_SOCK:
-		return nbd_add_socket(nbd, bdev, arg);
+		return nbd_add_socket(nbd, arg);
 	case NBD_SET_BLKSIZE:
-		nbd_size_set(nbd, bdev, arg,
+		nbd_size_set(nbd, arg,
 			     div_s64(config->bytesize, arg));
 		return 0;
 	case NBD_SET_SIZE:
-		nbd_size_set(nbd, bdev, config->blksize,
+		nbd_size_set(nbd, config->blksize,
 			     div_s64(arg, config->blksize));
 		return 0;
 	case NBD_SET_SIZE_BLOCKS:
-		nbd_size_set(nbd, bdev, config->blksize, arg);
+		nbd_size_set(nbd, config->blksize, arg);
 		return 0;
 	case NBD_SET_TIMEOUT:
 		if (arg) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 05/12] nbd: add a basic netlink interface
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
                   ` (3 preceding siblings ...)
  2017-04-06 21:01 ` [PATCH 04/12] nbd: stop using the bdev everywhere Josef Bacik
@ 2017-04-06 21:02 ` Josef Bacik
  2017-04-06 21:02 ` [PATCH 06/12] nbd: add a reconfigure netlink command Josef Bacik
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:02 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

The existing ioctl interface for configuring NBD devices is a bit
cumbersome and hard to extend.  The other problem is we leave a
userspace app sitting in it's syscall until the device disconnects,
which is less than ideal.

This patch introduces a netlink interface for adding and disconnecting
nbd devices.  This has the benefits of being easily extendable without
breaking older userspace applications, and allows us to configure a nbd
device without leaving a userspace app sitting waiting for the device to
disconnect.

With this interface we also gain the ability to configure more devices
than are preallocated at insmod time.  We also have gained the ability
to not specify a particular device and be provided one for us so that
userspace doesn't need to find a free device to configure.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c              | 316 +++++++++++++++++++++++++++++++++++----
 include/uapi/linux/nbd-netlink.h |  69 +++++++++
 2 files changed, 359 insertions(+), 26 deletions(-)
 create mode 100644 include/uapi/linux/nbd-netlink.h

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 8891889..b8764e1 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -40,6 +40,8 @@
 #include <asm/types.h>
 
 #include <linux/nbd.h>
+#include <linux/nbd-netlink.h>
+#include <net/genetlink.h>
 
 static DEFINE_IDR(nbd_index_idr);
 static DEFINE_MUTEX(nbd_index_mutex);
@@ -63,6 +65,8 @@ struct recv_thread_args {
 #define NBD_DISCONNECT_REQUESTED	1
 #define NBD_DISCONNECTED		2
 #define NBD_HAS_PID_FILE		3
+#define NBD_HAS_CONFIG_REF		4
+#define NBD_BOUND			5
 
 struct nbd_config {
 	u32 flags;
@@ -83,6 +87,7 @@ struct nbd_config {
 struct nbd_device {
 	struct blk_mq_tag_set tag_set;
 
+	int index;
 	refcount_t config_refs;
 	struct nbd_config *config;
 	struct mutex config_lock;
@@ -114,6 +119,7 @@ static int part_shift;
 static int nbd_dev_dbg_init(struct nbd_device *nbd);
 static void nbd_dev_dbg_close(struct nbd_device *nbd);
 static void nbd_config_put(struct nbd_device *nbd);
+static void nbd_connect_reply(struct genl_info *info, int index);
 
 static inline struct device *nbd_to_dev(struct nbd_device *nbd)
 {
@@ -728,7 +734,8 @@ static int nbd_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
-static int nbd_add_socket(struct nbd_device *nbd, unsigned long arg)
+static int nbd_add_socket(struct nbd_device *nbd, unsigned long arg,
+			  bool netlink)
 {
 	struct nbd_config *config = nbd->config;
 	struct socket *sock;
@@ -740,13 +747,17 @@ static int nbd_add_socket(struct nbd_device *nbd, unsigned long arg)
 	if (!sock)
 		return err;
 
-	if (!nbd->task_setup)
+	if (!netlink && !nbd->task_setup &&
+	    !test_bit(NBD_BOUND, &config->runtime_flags))
 		nbd->task_setup = current;
-	if (nbd->task_setup != current) {
+
+	if (!netlink &&
+	    (nbd->task_setup != current ||
+	     test_bit(NBD_BOUND, &config->runtime_flags))) {
 		dev_err(disk_to_dev(nbd->disk),
 			"Device being setup by another task");
 		sockfd_put(sock);
-		return -EINVAL;
+		return -EBUSY;
 	}
 
 	socks = krealloc(config->socks, (config->num_connections + 1) *
@@ -872,7 +883,7 @@ static void nbd_config_put(struct nbd_device *nbd)
 	}
 }
 
-static int nbd_start_device(struct nbd_device *nbd, struct block_device *bdev)
+static int nbd_start_device(struct nbd_device *nbd)
 {
 	struct nbd_config *config = nbd->config;
 	int num_connections = config->num_connections;
@@ -888,11 +899,8 @@ static int nbd_start_device(struct nbd_device *nbd, struct block_device *bdev)
 		return -EINVAL;
 	}
 
-	if (max_part)
-		bdev->bd_invalidated = 1;
 	blk_mq_update_nr_hw_queues(&nbd->tag_set, config->num_connections);
 	nbd->task_recv = current;
-	mutex_unlock(&nbd->config_lock);
 
 	nbd_parse_flags(nbd);
 
@@ -901,11 +909,7 @@ static int nbd_start_device(struct nbd_device *nbd, struct block_device *bdev)
 		dev_err(disk_to_dev(nbd->disk), "device_create_file failed!\n");
 		return error;
 	}
-
 	set_bit(NBD_HAS_PID_FILE, &config->runtime_flags);
-	if (max_part)
-		bdev->bd_invalidated = 1;
-	bd_set_size(bdev, config->bytesize);
 
 	nbd_dev_dbg_init(nbd);
 	for (i = 0; i < num_connections; i++) {
@@ -924,18 +928,34 @@ static int nbd_start_device(struct nbd_device *nbd, struct block_device *bdev)
 		args->index = i;
 		queue_work(recv_workqueue, &args->work);
 	}
-	error = wait_event_interruptible(config->recv_wq,
+	return error;
+}
+
+static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *bdev)
+{
+	struct nbd_config *config = nbd->config;
+	int ret;
+
+	ret = nbd_start_device(nbd);
+	if (ret)
+		return ret;
+
+	bd_set_size(bdev, config->bytesize);
+	if (max_part)
+		bdev->bd_invalidated = 1;
+	mutex_unlock(&nbd->config_lock);
+	ret = wait_event_interruptible(config->recv_wq,
 					 atomic_read(&config->recv_threads) == 0);
-	if (error)
+	if (ret)
 		sock_shutdown(nbd);
 	mutex_lock(&nbd->config_lock);
-
+	bd_set_size(bdev, 0);
 	/* user requested, ignore socket errors */
 	if (test_bit(NBD_DISCONNECT_REQUESTED, &config->runtime_flags))
-		error = 0;
+		ret = 0;
 	if (test_bit(NBD_TIMEDOUT, &config->runtime_flags))
-		error = -ETIMEDOUT;
-	return error;
+		ret = -ETIMEDOUT;
+	return ret;
 }
 
 static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
@@ -944,6 +964,9 @@ static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
 	nbd_clear_sock(nbd);
 	kill_bdev(bdev);
 	nbd_bdev_reset(bdev);
+	if (test_and_clear_bit(NBD_HAS_CONFIG_REF,
+			       &nbd->config->runtime_flags))
+		nbd_config_put(nbd);
 }
 
 /* Must be called with config_lock held */
@@ -959,7 +982,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		nbd_clear_sock_ioctl(nbd, bdev);
 		return 0;
 	case NBD_SET_SOCK:
-		return nbd_add_socket(nbd, arg);
+		return nbd_add_socket(nbd, arg, false);
 	case NBD_SET_BLKSIZE:
 		nbd_size_set(nbd, arg,
 			     div_s64(config->bytesize, arg));
@@ -982,7 +1005,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		config->flags = arg;
 		return 0;
 	case NBD_DO_IT:
-		return nbd_start_device(nbd, bdev);
+		return nbd_start_device_ioctl(nbd, bdev);
 	case NBD_CLEAR_QUE:
 		/*
 		 * This is for compatibility only.  The queue is always cleared
@@ -1003,13 +1026,22 @@ static int nbd_ioctl(struct block_device *bdev, fmode_t mode,
 		     unsigned int cmd, unsigned long arg)
 {
 	struct nbd_device *nbd = bdev->bd_disk->private_data;
-	int error;
+	struct nbd_config *config = nbd->config;
+	int error = -EINVAL;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
 	mutex_lock(&nbd->config_lock);
-	error = __nbd_ioctl(bdev, nbd, cmd, arg);
+
+	/* Don't allow ioctl operations on a nbd device that was created with
+	 * netlink, unless it's DISCONNECT or CLEAR_SOCK, which are fine.
+	 */
+	if (!test_bit(NBD_BOUND, &config->runtime_flags) ||
+	    (cmd == NBD_DISCONNECT || cmd == NBD_CLEAR_SOCK))
+		error = __nbd_ioctl(bdev, nbd, cmd, arg);
+	else
+		dev_err(nbd_to_dev(nbd), "Cannot use ioctl interface on a netlink controlled device.\n");
 	mutex_unlock(&nbd->config_lock);
 	return error;
 }
@@ -1258,6 +1290,7 @@ static int nbd_dev_add(int index)
 	if (err < 0)
 		goto out_free_disk;
 
+	nbd->index = index;
 	nbd->disk = disk;
 	nbd->tag_set.ops = &nbd_mq_ops;
 	nbd->tag_set.nr_hw_queues = 1;
@@ -1313,10 +1346,235 @@ static int nbd_dev_add(int index)
 	return err;
 }
 
-/*
- * And here should be modules and kernel interface 
- *  (Just smiley confuses emacs :-)
- */
+static int find_free_cb(int id, void *ptr, void *data)
+{
+	struct nbd_device *nbd = ptr;
+	struct nbd_device **found = data;
+
+	if (!refcount_read(&nbd->config_refs)) {
+		*found = nbd;
+		return 1;
+	}
+	return 0;
+}
+
+/* Netlink interface. */
+static struct nla_policy nbd_attr_policy[NBD_ATTR_MAX + 1] = {
+	[NBD_ATTR_INDEX]		=	{ .type = NLA_U32 },
+	[NBD_ATTR_SIZE_BYTES]		=	{ .type = NLA_U64 },
+	[NBD_ATTR_BLOCK_SIZE_BYTES]	=	{ .type = NLA_U64 },
+	[NBD_ATTR_TIMEOUT]		=	{ .type = NLA_U64 },
+	[NBD_ATTR_SERVER_FLAGS]		=	{ .type = NLA_U64 },
+	[NBD_ATTR_CLIENT_FLAGS]		=	{ .type = NLA_U64 },
+	[NBD_ATTR_SOCKETS]		=	{ .type = NLA_NESTED},
+};
+
+static struct nla_policy nbd_sock_policy[NBD_SOCK_MAX + 1] = {
+	[NBD_SOCK_FD]			=	{ .type = NLA_U32 },
+};
+
+static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nbd_device *nbd = NULL;
+	struct nbd_config *config;
+	int index = -1;
+	int ret;
+
+	if (!netlink_capable(skb, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (info->attrs[NBD_ATTR_INDEX])
+		index = nla_get_u32(info->attrs[NBD_ATTR_INDEX]);
+	if (!info->attrs[NBD_ATTR_SOCKETS]) {
+		printk(KERN_ERR "nbd: must specify at least one socket\n");
+		return -EINVAL;
+	}
+	if (!info->attrs[NBD_ATTR_SIZE_BYTES]) {
+		printk(KERN_ERR "nbd: must specify a size in bytes for the device\n");
+		return -EINVAL;
+	}
+again:
+	mutex_lock(&nbd_index_mutex);
+	if (index == -1) {
+		ret = idr_for_each(&nbd_index_idr, &find_free_cb, &nbd);
+		if (ret == 0) {
+			int new_index;
+			new_index = nbd_dev_add(-1);
+			if (new_index < 0) {
+				mutex_unlock(&nbd_index_mutex);
+				printk(KERN_ERR "nbd: failed to add new device\n");
+				return ret;
+			}
+			nbd = idr_find(&nbd_index_idr, new_index);
+		}
+	} else {
+		nbd = idr_find(&nbd_index_idr, index);
+	}
+	mutex_unlock(&nbd_index_mutex);
+	if (!nbd) {
+		printk(KERN_ERR "nbd: couldn't find device at index %d\n",
+		       index);
+		return -EINVAL;
+	}
+
+	mutex_lock(&nbd->config_lock);
+	if (refcount_read(&nbd->config_refs)) {
+		mutex_unlock(&nbd->config_lock);
+		if (index == -1)
+			goto again;
+		printk(KERN_ERR "nbd: nbd%d already in use\n", index);
+		return -EBUSY;
+	}
+	if (WARN_ON(nbd->config)) {
+		mutex_unlock(&nbd->config_lock);
+		return -EINVAL;
+	}
+	config = nbd->config = nbd_alloc_config();
+	if (!nbd->config) {
+		mutex_unlock(&nbd->config_lock);
+		printk(KERN_ERR "nbd: couldn't allocate config\n");
+		return -ENOMEM;
+	}
+	refcount_set(&nbd->config_refs, 1);
+	set_bit(NBD_BOUND, &config->runtime_flags);
+
+	if (info->attrs[NBD_ATTR_SIZE_BYTES]) {
+		u64 bytes = nla_get_u64(info->attrs[NBD_ATTR_SIZE_BYTES]);
+		nbd_size_set(nbd, config->blksize,
+			     div64_u64(bytes, config->blksize));
+	}
+	if (info->attrs[NBD_ATTR_BLOCK_SIZE_BYTES]) {
+		u64 bsize =
+			nla_get_u64(info->attrs[NBD_ATTR_BLOCK_SIZE_BYTES]);
+		nbd_size_set(nbd, bsize, div64_u64(config->bytesize, bsize));
+	}
+	if (info->attrs[NBD_ATTR_TIMEOUT]) {
+		u64 timeout = nla_get_u64(info->attrs[NBD_ATTR_TIMEOUT]);
+		nbd->tag_set.timeout = timeout * HZ;
+		blk_queue_rq_timeout(nbd->disk->queue, timeout * HZ);
+	}
+	if (info->attrs[NBD_ATTR_SERVER_FLAGS])
+		config->flags =
+			nla_get_u64(info->attrs[NBD_ATTR_SERVER_FLAGS]);
+	if (info->attrs[NBD_ATTR_SOCKETS]) {
+		struct nlattr *attr;
+		int rem, fd;
+
+		nla_for_each_nested(attr, info->attrs[NBD_ATTR_SOCKETS],
+				    rem) {
+			struct nlattr *socks[NBD_SOCK_MAX+1];
+
+			if (nla_type(attr) != NBD_SOCK_ITEM) {
+				printk(KERN_ERR "nbd: socks must be embedded in a SOCK_ITEM attr\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			ret = nla_parse_nested(socks, NBD_SOCK_MAX, attr,
+					       nbd_sock_policy);
+			if (ret != 0) {
+				printk(KERN_ERR "nbd: error processing sock list\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			if (!socks[NBD_SOCK_FD])
+				continue;
+			fd = (int)nla_get_u32(socks[NBD_SOCK_FD]);
+			ret = nbd_add_socket(nbd, fd, true);
+			if (ret)
+				goto out;
+		}
+	}
+	ret = nbd_start_device(nbd);
+out:
+	mutex_unlock(&nbd->config_lock);
+	if (!ret) {
+		set_bit(NBD_HAS_CONFIG_REF, &config->runtime_flags);
+		refcount_inc(&nbd->config_refs);
+		nbd_connect_reply(info, nbd->index);
+	}
+	nbd_config_put(nbd);
+	return ret;
+}
+
+static int nbd_genl_disconnect(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nbd_device *nbd;
+	int index;
+
+	if (!netlink_capable(skb, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (!info->attrs[NBD_ATTR_INDEX]) {
+		printk(KERN_ERR "nbd: must specify an index to disconnect\n");
+		return -EINVAL;
+	}
+	index = nla_get_u32(info->attrs[NBD_ATTR_INDEX]);
+	mutex_lock(&nbd_index_mutex);
+	nbd = idr_find(&nbd_index_idr, index);
+	mutex_unlock(&nbd_index_mutex);
+	if (!nbd) {
+		printk(KERN_ERR "nbd: couldn't find device at index %d\n",
+		       index);
+		return -EINVAL;
+	}
+	if (!refcount_inc_not_zero(&nbd->config_refs))
+		return 0;
+	mutex_lock(&nbd->config_lock);
+	nbd_disconnect(nbd);
+	mutex_unlock(&nbd->config_lock);
+	if (test_and_clear_bit(NBD_HAS_CONFIG_REF,
+			       &nbd->config->runtime_flags))
+		nbd_config_put(nbd);
+	nbd_config_put(nbd);
+	return 0;
+}
+
+static const struct genl_ops nbd_connect_genl_ops[] = {
+	{
+		.cmd	= NBD_CMD_CONNECT,
+		.policy	= nbd_attr_policy,
+		.doit	= nbd_genl_connect,
+	},
+	{
+		.cmd	= NBD_CMD_DISCONNECT,
+		.policy	= nbd_attr_policy,
+		.doit	= nbd_genl_disconnect,
+	},
+};
+
+static struct genl_family nbd_genl_family __ro_after_init = {
+	.hdrsize	= 0,
+	.name		= NBD_GENL_FAMILY_NAME,
+	.version	= NBD_GENL_VERSION,
+	.module		= THIS_MODULE,
+	.ops		= nbd_connect_genl_ops,
+	.n_ops		= ARRAY_SIZE(nbd_connect_genl_ops),
+	.maxattr	= NBD_ATTR_MAX,
+};
+
+static void nbd_connect_reply(struct genl_info *info, int index)
+{
+	struct sk_buff *skb;
+	void *msg_head;
+	int ret;
+
+	skb = genlmsg_new(nla_total_size(sizeof(u32)), GFP_KERNEL);
+	if (!skb)
+		return;
+	msg_head = genlmsg_put_reply(skb, info, &nbd_genl_family, 0,
+				     NBD_CMD_CONNECT);
+	if (!msg_head) {
+		nlmsg_free(skb);
+		return;
+	}
+	ret = nla_put_u32(skb, NBD_ATTR_INDEX, index);
+	if (ret) {
+		nlmsg_free(skb);
+		return;
+	}
+	genlmsg_end(skb, msg_head);
+	genlmsg_reply(skb, info);
+}
 
 static int __init nbd_init(void)
 {
@@ -1359,6 +1617,11 @@ static int __init nbd_init(void)
 		return -EIO;
 	}
 
+	if (genl_register_family(&nbd_genl_family)) {
+		unregister_blkdev(NBD_MAJOR, "nbd");
+		destroy_workqueue(recv_workqueue);
+		return -EINVAL;
+	}
 	nbd_dbg_init();
 
 	mutex_lock(&nbd_index_mutex);
@@ -1381,6 +1644,7 @@ static void __exit nbd_cleanup(void)
 
 	idr_for_each(&nbd_index_idr, &nbd_exit_cb, NULL);
 	idr_destroy(&nbd_index_idr);
+	genl_unregister_family(&nbd_genl_family);
 	destroy_workqueue(recv_workqueue);
 	unregister_blkdev(NBD_MAJOR, "nbd");
 }
diff --git a/include/uapi/linux/nbd-netlink.h b/include/uapi/linux/nbd-netlink.h
new file mode 100644
index 0000000..fd0f4e4
--- /dev/null
+++ b/include/uapi/linux/nbd-netlink.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright (C) 2017 Facebook.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+#ifndef _UAPILINUX_NBD_NETLINK_H
+#define _UAPILINUX_NBD_NETLINK_H
+
+#define NBD_GENL_FAMILY_NAME	"nbd"
+#define NBD_GENL_VERSION	0x1
+
+/* Configuration policy attributes, used for CONNECT */
+enum {
+	NBD_ATTR_UNSPEC,
+	NBD_ATTR_INDEX,
+	NBD_ATTR_SIZE_BYTES,
+	NBD_ATTR_BLOCK_SIZE_BYTES,
+	NBD_ATTR_TIMEOUT,
+	NBD_ATTR_SERVER_FLAGS,
+	NBD_ATTR_CLIENT_FLAGS,
+	NBD_ATTR_SOCKETS,
+	__NBD_ATTR_MAX,
+};
+#define NBD_ATTR_MAX (__NBD_ATTR_MAX - 1)
+
+/*
+ * This is the format for multiple sockets with NBD_ATTR_SOCKETS
+ *
+ * [NBD_ATTR_SOCKETS]
+ *   [NBD_SOCK_ITEM]
+ *     [NBD_SOCK_FD]
+ *   [NBD_SOCK_ITEM]
+ *     [NBD_SOCK_FD]
+ */
+enum {
+	NBD_SOCK_ITEM_UNSPEC,
+	NBD_SOCK_ITEM,
+	__NBD_SOCK_ITEM_MAX,
+};
+#define NBD_SOCK_ITEM_MAX (__NBD_SOCK_ITEM_MAX - 1)
+
+enum {
+	NBD_SOCK_UNSPEC,
+	NBD_SOCK_FD,
+	__NBD_SOCK_MAX,
+};
+#define NBD_SOCK_MAX (__NBD_SOCK_MAX - 1)
+
+enum {
+	NBD_CMD_UNSPEC,
+	NBD_CMD_CONNECT,
+	NBD_CMD_DISCONNECT,
+	__NBD_CMD_MAX,
+};
+#define NBD_CMD_MAX	(__NBD_CMD_MAX - 1)
+
+#endif /* _UAPILINUX_NBD_NETLINK_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 06/12] nbd: add a reconfigure netlink command
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
                   ` (4 preceding siblings ...)
  2017-04-06 21:02 ` [PATCH 05/12] nbd: add a basic netlink interface Josef Bacik
@ 2017-04-06 21:02 ` Josef Bacik
  2017-04-06 21:02 ` [PATCH 07/12] nbd: multicast dead link notifications Josef Bacik
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:02 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

We want to be able to reconnect dead connections to existing block
devices, so add a reconfigure netlink command.  We will also allow users
to change their timeout on the fly, but everything else will require a
disconnect and reconnect.  You won't be able to add more connections
either, simply replace dead connections with new more lively
connections.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c              | 141 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/nbd-netlink.h |   1 +
 2 files changed, 142 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b8764e1..27958c3 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -785,6 +785,59 @@ static int nbd_add_socket(struct nbd_device *nbd, unsigned long arg,
 	return 0;
 }
 
+static int nbd_reconnect_socket(struct nbd_device *nbd, unsigned long arg)
+{
+	struct nbd_config *config = nbd->config;
+	struct socket *sock, *old;
+	struct recv_thread_args *args;
+	int i;
+	int err;
+
+	sock = sockfd_lookup(arg, &err);
+	if (!sock)
+		return err;
+
+	args = kzalloc(sizeof(*args), GFP_KERNEL);
+	if (!args) {
+		sockfd_put(sock);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < config->num_connections; i++) {
+		struct nbd_sock *nsock = config->socks[i];
+
+		if (!nsock->dead)
+			continue;
+
+		mutex_lock(&nsock->tx_lock);
+		if (!nsock->dead) {
+			mutex_unlock(&nsock->tx_lock);
+			continue;
+		}
+		sk_set_memalloc(sock->sk);
+		atomic_inc(&config->recv_threads);
+		refcount_inc(&nbd->config_refs);
+		old = nsock->sock;
+		nsock->fallback_index = -1;
+		nsock->sock = sock;
+		nsock->dead = false;
+		INIT_WORK(&args->work, recv_work);
+		args->index = i;
+		args->nbd = nbd;
+		mutex_unlock(&nsock->tx_lock);
+		sockfd_put(old);
+
+		/* We take the tx_mutex in an error path in the recv_work, so we
+		 * need to queue_work outside of the tx_mutex.
+		 */
+		queue_work(recv_workqueue, &args->work);
+		return 0;
+	}
+	sockfd_put(sock);
+	kfree(args);
+	return -ENOSPC;
+}
+
 /* Reset all properties of an NBD device */
 static void nbd_reset(struct nbd_device *nbd)
 {
@@ -1529,6 +1582,89 @@ static int nbd_genl_disconnect(struct sk_buff *skb, struct genl_info *info)
 	return 0;
 }
 
+static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nbd_device *nbd = NULL;
+	struct nbd_config *config;
+	int index;
+	int ret = -EINVAL;
+
+	if (!netlink_capable(skb, CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (!info->attrs[NBD_ATTR_INDEX]) {
+		printk(KERN_ERR "nbd: must specify a device to reconfigure\n");
+		return -EINVAL;
+	}
+	index = nla_get_u32(info->attrs[NBD_ATTR_INDEX]);
+	mutex_lock(&nbd_index_mutex);
+	nbd = idr_find(&nbd_index_idr, index);
+	mutex_unlock(&nbd_index_mutex);
+	if (!nbd) {
+		printk(KERN_ERR "nbd: couldn't find a device at index %d\n",
+		       index);
+		return -EINVAL;
+	}
+
+	if (!refcount_inc_not_zero(&nbd->config_refs)) {
+		dev_err(nbd_to_dev(nbd),
+			"not configured, cannot reconfigure\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&nbd->config_lock);
+	config = nbd->config;
+	if (!test_bit(NBD_BOUND, &config->runtime_flags) ||
+	    !nbd->task_recv) {
+		dev_err(nbd_to_dev(nbd),
+			"not configured, cannot reconfigure\n");
+		goto out;
+	}
+
+	if (info->attrs[NBD_ATTR_TIMEOUT]) {
+		u64 timeout = nla_get_u64(info->attrs[NBD_ATTR_TIMEOUT]);
+		nbd->tag_set.timeout = timeout * HZ;
+		blk_queue_rq_timeout(nbd->disk->queue, timeout * HZ);
+	}
+
+	if (info->attrs[NBD_ATTR_SOCKETS]) {
+		struct nlattr *attr;
+		int rem, fd;
+
+		nla_for_each_nested(attr, info->attrs[NBD_ATTR_SOCKETS],
+				    rem) {
+			struct nlattr *socks[NBD_SOCK_MAX+1];
+
+			if (nla_type(attr) != NBD_SOCK_ITEM) {
+				printk(KERN_ERR "nbd: socks must be embedded in a SOCK_ITEM attr\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			ret = nla_parse_nested(socks, NBD_SOCK_MAX, attr,
+					       nbd_sock_policy);
+			if (ret != 0) {
+				printk(KERN_ERR "nbd: error processing sock list\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			if (!socks[NBD_SOCK_FD])
+				continue;
+			fd = (int)nla_get_u32(socks[NBD_SOCK_FD]);
+			ret = nbd_reconnect_socket(nbd, fd);
+			if (ret) {
+				if (ret == -ENOSPC)
+					ret = 0;
+				goto out;
+			}
+			dev_info(nbd_to_dev(nbd), "reconnected socket\n");
+		}
+	}
+out:
+	mutex_unlock(&nbd->config_lock);
+	nbd_config_put(nbd);
+	return ret;
+}
+
 static const struct genl_ops nbd_connect_genl_ops[] = {
 	{
 		.cmd	= NBD_CMD_CONNECT,
@@ -1540,6 +1676,11 @@ static const struct genl_ops nbd_connect_genl_ops[] = {
 		.policy	= nbd_attr_policy,
 		.doit	= nbd_genl_disconnect,
 	},
+	{
+		.cmd	= NBD_CMD_RECONFIGURE,
+		.policy	= nbd_attr_policy,
+		.doit	= nbd_genl_reconfigure,
+	},
 };
 
 static struct genl_family nbd_genl_family __ro_after_init = {
diff --git a/include/uapi/linux/nbd-netlink.h b/include/uapi/linux/nbd-netlink.h
index fd0f4e4..f932f96 100644
--- a/include/uapi/linux/nbd-netlink.h
+++ b/include/uapi/linux/nbd-netlink.h
@@ -62,6 +62,7 @@ enum {
 	NBD_CMD_UNSPEC,
 	NBD_CMD_CONNECT,
 	NBD_CMD_DISCONNECT,
+	NBD_CMD_RECONFIGURE,
 	__NBD_CMD_MAX,
 };
 #define NBD_CMD_MAX	(__NBD_CMD_MAX - 1)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 07/12] nbd: multicast dead link notifications
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
                   ` (5 preceding siblings ...)
  2017-04-06 21:02 ` [PATCH 06/12] nbd: add a reconfigure netlink command Josef Bacik
@ 2017-04-06 21:02 ` Josef Bacik
  2017-04-06 21:02 ` [PATCH 08/12] nbd: only clear the queue on device teardown Josef Bacik
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:02 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

Provide a mechanism to notify userspace that there's been a link problem
on a NBD device.  This will allow userspace to re-establish a connection
and provide the new socket to the device without disrupting the device.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c              | 89 ++++++++++++++++++++++++++++++++++------
 include/uapi/linux/nbd-netlink.h |  6 ++-
 2 files changed, 81 insertions(+), 14 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 27958c3..911e36c 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -53,6 +53,7 @@ struct nbd_sock {
 	int sent;
 	bool dead;
 	int fallback_index;
+	int cookie;
 };
 
 struct recv_thread_args {
@@ -61,6 +62,11 @@ struct recv_thread_args {
 	int index;
 };
 
+struct link_dead_args {
+	struct work_struct work;
+	int index;
+};
+
 #define NBD_TIMEDOUT			0
 #define NBD_DISCONNECT_REQUESTED	1
 #define NBD_DISCONNECTED		2
@@ -100,6 +106,7 @@ struct nbd_device {
 struct nbd_cmd {
 	struct nbd_device *nbd;
 	int index;
+	int cookie;
 	struct completion send_complete;
 };
 
@@ -120,6 +127,7 @@ static int nbd_dev_dbg_init(struct nbd_device *nbd);
 static void nbd_dev_dbg_close(struct nbd_device *nbd);
 static void nbd_config_put(struct nbd_device *nbd);
 static void nbd_connect_reply(struct genl_info *info, int index);
+static void nbd_dead_link_work(struct work_struct *work);
 
 static inline struct device *nbd_to_dev(struct nbd_device *nbd)
 {
@@ -152,8 +160,24 @@ static struct device_attribute pid_attr = {
 	.show = pid_show,
 };
 
-static void nbd_mark_nsock_dead(struct nbd_sock *nsock)
+static int nbd_disconnected(struct nbd_config *config)
+{
+	return test_bit(NBD_DISCONNECTED, &config->runtime_flags) ||
+		test_bit(NBD_DISCONNECT_REQUESTED, &config->runtime_flags);
+}
+
+static void nbd_mark_nsock_dead(struct nbd_device *nbd, struct nbd_sock *nsock,
+				int notify)
 {
+	if (!nsock->dead && notify && !nbd_disconnected(nbd->config)) {
+		struct link_dead_args *args;
+		args = kmalloc(sizeof(struct link_dead_args), GFP_NOIO);
+		if (args) {
+			INIT_WORK(&args->work, nbd_dead_link_work);
+			args->index = nbd->index;
+			queue_work(system_wq, &args->work);
+		}
+	}
 	if (!nsock->dead)
 		kernel_sock_shutdown(nsock->sock, SHUT_RDWR);
 	nsock->dead = true;
@@ -215,8 +239,7 @@ static void sock_shutdown(struct nbd_device *nbd)
 	for (i = 0; i < config->num_connections; i++) {
 		struct nbd_sock *nsock = config->socks[i];
 		mutex_lock(&nsock->tx_lock);
-		kernel_sock_shutdown(nsock->sock, SHUT_RDWR);
-		nbd_mark_nsock_dead(nsock);
+		nbd_mark_nsock_dead(nbd, nsock, 0);
 		mutex_unlock(&nsock->tx_lock);
 	}
 	dev_warn(disk_to_dev(nbd->disk), "shutting down sockets\n");
@@ -248,7 +271,14 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 				struct nbd_sock *nsock =
 					config->socks[cmd->index];
 				mutex_lock(&nsock->tx_lock);
-				nbd_mark_nsock_dead(nsock);
+				/* We can have multiple outstanding requests, so
+				 * we don't want to mark the nsock dead if we've
+				 * already reconnected with a new socket, so
+				 * only mark it dead if its the same socket we
+				 * were sent out on.
+				 */
+				if (cmd->cookie == nsock->cookie)
+					nbd_mark_nsock_dead(nbd, nsock, 1);
 				mutex_unlock(&nsock->tx_lock);
 			}
 			blk_mq_requeue_request(req, true);
@@ -370,6 +400,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 		iov_iter_advance(&from, sent);
 	}
 	cmd->index = index;
+	cmd->cookie = nsock->cookie;
 	request.type = htonl(type);
 	if (type != NBD_CMD_FLUSH) {
 		request.from = cpu_to_be64((u64)blk_rq_pos(req) << 9);
@@ -458,12 +489,6 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	return 0;
 }
 
-static int nbd_disconnected(struct nbd_config *config)
-{
-	return test_bit(NBD_DISCONNECTED, &config->runtime_flags) ||
-		test_bit(NBD_DISCONNECT_REQUESTED, &config->runtime_flags);
-}
-
 /* NULL returned = something went wrong, inform userspace */
 static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 {
@@ -564,7 +589,7 @@ static void recv_work(struct work_struct *work)
 			struct nbd_sock *nsock = config->socks[args->index];
 
 			mutex_lock(&nsock->tx_lock);
-			nbd_mark_nsock_dead(nsock);
+			nbd_mark_nsock_dead(nbd, nsock, 1);
 			mutex_unlock(&nsock->tx_lock);
 			ret = PTR_ERR(cmd);
 			break;
@@ -691,7 +716,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	if (ret == -EAGAIN) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Request send failed trying another connection\n");
-		nbd_mark_nsock_dead(nsock);
+		nbd_mark_nsock_dead(nbd, nsock, 1);
 		mutex_unlock(&nsock->tx_lock);
 		goto again;
 	}
@@ -780,6 +805,7 @@ static int nbd_add_socket(struct nbd_device *nbd, unsigned long arg,
 	nsock->sock = sock;
 	nsock->pending = NULL;
 	nsock->sent = 0;
+	nsock->cookie = 0;
 	socks[config->num_connections++] = nsock;
 
 	return 0;
@@ -824,6 +850,7 @@ static int nbd_reconnect_socket(struct nbd_device *nbd, unsigned long arg)
 		INIT_WORK(&args->work, recv_work);
 		args->index = i;
 		args->nbd = nbd;
+		nsock->cookie++;
 		mutex_unlock(&nsock->tx_lock);
 		sockfd_put(old);
 
@@ -1683,6 +1710,10 @@ static const struct genl_ops nbd_connect_genl_ops[] = {
 	},
 };
 
+static const struct genl_multicast_group nbd_mcast_grps[] = {
+	{ .name = NBD_GENL_MCAST_GROUP_NAME, },
+};
+
 static struct genl_family nbd_genl_family __ro_after_init = {
 	.hdrsize	= 0,
 	.name		= NBD_GENL_FAMILY_NAME,
@@ -1691,6 +1722,8 @@ static struct genl_family nbd_genl_family __ro_after_init = {
 	.ops		= nbd_connect_genl_ops,
 	.n_ops		= ARRAY_SIZE(nbd_connect_genl_ops),
 	.maxattr	= NBD_ATTR_MAX,
+	.mcgrps		= nbd_mcast_grps,
+	.n_mcgrps	= ARRAY_SIZE(nbd_mcast_grps),
 };
 
 static void nbd_connect_reply(struct genl_info *info, int index)
@@ -1717,6 +1750,38 @@ static void nbd_connect_reply(struct genl_info *info, int index)
 	genlmsg_reply(skb, info);
 }
 
+static void nbd_mcast_index(int index)
+{
+	struct sk_buff *skb;
+	void *msg_head;
+	int ret;
+
+	skb = genlmsg_new(nla_total_size(sizeof(u32)), GFP_KERNEL);
+	if (!skb)
+		return;
+	msg_head = genlmsg_put(skb, 0, 0, &nbd_genl_family, 0,
+				     NBD_CMD_LINK_DEAD);
+	if (!msg_head) {
+		nlmsg_free(skb);
+		return;
+	}
+	ret = nla_put_u32(skb, NBD_ATTR_INDEX, index);
+	if (ret) {
+		nlmsg_free(skb);
+		return;
+	}
+	genlmsg_end(skb, msg_head);
+	genlmsg_multicast(&nbd_genl_family, skb, 0, 0, GFP_KERNEL);
+}
+
+static void nbd_dead_link_work(struct work_struct *work)
+{
+	struct link_dead_args *args = container_of(work, struct link_dead_args,
+						   work);
+	nbd_mcast_index(args->index);
+	kfree(args);
+}
+
 static int __init nbd_init(void)
 {
 	int i;
diff --git a/include/uapi/linux/nbd-netlink.h b/include/uapi/linux/nbd-netlink.h
index f932f96..b69105cc 100644
--- a/include/uapi/linux/nbd-netlink.h
+++ b/include/uapi/linux/nbd-netlink.h
@@ -18,8 +18,9 @@
 #ifndef _UAPILINUX_NBD_NETLINK_H
 #define _UAPILINUX_NBD_NETLINK_H
 
-#define NBD_GENL_FAMILY_NAME	"nbd"
-#define NBD_GENL_VERSION	0x1
+#define NBD_GENL_FAMILY_NAME		"nbd"
+#define NBD_GENL_VERSION		0x1
+#define NBD_GENL_MCAST_GROUP_NAME	"nbd_mc_group"
 
 /* Configuration policy attributes, used for CONNECT */
 enum {
@@ -63,6 +64,7 @@ enum {
 	NBD_CMD_CONNECT,
 	NBD_CMD_DISCONNECT,
 	NBD_CMD_RECONFIGURE,
+	NBD_CMD_LINK_DEAD,
 	__NBD_CMD_MAX,
 };
 #define NBD_CMD_MAX	(__NBD_CMD_MAX - 1)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 08/12] nbd: only clear the queue on device teardown
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
                   ` (6 preceding siblings ...)
  2017-04-06 21:02 ` [PATCH 07/12] nbd: multicast dead link notifications Josef Bacik
@ 2017-04-06 21:02 ` Josef Bacik
  2017-04-06 21:02 ` [PATCH 09/12] nbd: handle dead connections Josef Bacik
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:02 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

When running a disconnect torture test I noticed that sometimes we would
crash with a negative ref count on our queue.  This was because we were
ending the same request twice.  Turns out we were racing with
NBD_CLEAR_SOCK clearing the requests as well as the teardown of the
device clearing the requests.  So instead make the ioctl only shutdown
the sockets and make it so that we only ever run nbd_clear_que from the
device teardown.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 911e36c..70c5e75 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -616,7 +616,9 @@ static void nbd_clear_req(struct request *req, void *data, bool reserved)
 
 static void nbd_clear_que(struct nbd_device *nbd)
 {
+	blk_mq_stop_hw_queues(nbd->disk->queue);
 	blk_mq_tagset_busy_iter(&nbd->tag_set, nbd_clear_req, NULL);
+	blk_mq_start_hw_queues(nbd->disk->queue);
 	dev_dbg(disk_to_dev(nbd->disk), "queue cleared\n");
 }
 
@@ -1041,7 +1043,7 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
 static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
 				 struct block_device *bdev)
 {
-	nbd_clear_sock(nbd);
+	sock_shutdown(nbd);
 	kill_bdev(bdev);
 	nbd_bdev_reset(bdev);
 	if (test_and_clear_bit(NBD_HAS_CONFIG_REF,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 09/12] nbd: handle dead connections
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
                   ` (7 preceding siblings ...)
  2017-04-06 21:02 ` [PATCH 08/12] nbd: only clear the queue on device teardown Josef Bacik
@ 2017-04-06 21:02 ` Josef Bacik
  2017-04-06 21:02 ` [PATCH 10/12] nbd: add a status netlink command Josef Bacik
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:02 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

Sometimes we like to upgrade our server without making all of our
clients freak out and reconnect.  This patch provides a way to specify a
dead connection timeout to allow us to pause all requests and wait for
new connections to be opened.  With this in place I can take down the
nbd server for less than the dead connection timeout time and bring it
back up and everything resumes gracefully.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c              | 63 +++++++++++++++++++++++++++++++++++++---
 include/uapi/linux/nbd-netlink.h |  1 +
 2 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 70c5e75..fd3d535 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -77,9 +77,12 @@ struct link_dead_args {
 struct nbd_config {
 	u32 flags;
 	unsigned long runtime_flags;
+	u64 dead_conn_timeout;
 
 	struct nbd_sock **socks;
 	int num_connections;
+	atomic_t live_connections;
+	wait_queue_head_t conn_wait;
 
 	atomic_t recv_threads;
 	wait_queue_head_t recv_wq;
@@ -178,8 +181,10 @@ static void nbd_mark_nsock_dead(struct nbd_device *nbd, struct nbd_sock *nsock,
 			queue_work(system_wq, &args->work);
 		}
 	}
-	if (!nsock->dead)
+	if (!nsock->dead) {
 		kernel_sock_shutdown(nsock->sock, SHUT_RDWR);
+		atomic_dec(&nbd->config->live_connections);
+	}
 	nsock->dead = true;
 	nsock->pending = NULL;
 	nsock->sent = 0;
@@ -257,6 +262,14 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 		return BLK_EH_HANDLED;
 	}
 
+	/* If we are waiting on our dead timer then we could get timeout
+	 * callbacks for our request.  For this we just want to reset the timer
+	 * and let the queue side take care of everything.
+	 */
+	if (!completion_done(&cmd->send_complete)) {
+		nbd_config_put(nbd);
+		return BLK_EH_RESET_TIMER;
+	}
 	config = nbd->config;
 
 	if (config->num_connections > 1) {
@@ -665,6 +678,19 @@ static int find_fallback(struct nbd_device *nbd, int index)
 	return new_index;
 }
 
+static int wait_for_reconnect(struct nbd_device *nbd)
+{
+	struct nbd_config *config = nbd->config;
+	if (!config->dead_conn_timeout)
+		return 0;
+	if (test_bit(NBD_DISCONNECTED, &config->runtime_flags))
+		return 0;
+	wait_event_interruptible_timeout(config->conn_wait,
+					 atomic_read(&config->live_connections),
+					 config->dead_conn_timeout);
+	return atomic_read(&config->live_connections);
+}
+
 static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 {
 	struct request *req = blk_mq_rq_from_pdu(cmd);
@@ -691,12 +717,24 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	nsock = config->socks[index];
 	mutex_lock(&nsock->tx_lock);
 	if (nsock->dead) {
+		int old_index = index;
 		index = find_fallback(nbd, index);
+		mutex_unlock(&nsock->tx_lock);
 		if (index < 0) {
-			ret = -EIO;
-			goto out;
+			if (wait_for_reconnect(nbd)) {
+				index = old_index;
+				goto again;
+			}
+			/* All the sockets should already be down at this point,
+			 * we just want to make sure that DISCONNECTED is set so
+			 * any requests that come in that were queue'ed waiting
+			 * for the reconnect timer don't trigger the timer again
+			 * and instead just error out.
+			 */
+			sock_shutdown(nbd);
+			nbd_config_put(nbd);
+			return -EIO;
 		}
-		mutex_unlock(&nsock->tx_lock);
 		goto again;
 	}
 
@@ -809,6 +847,7 @@ static int nbd_add_socket(struct nbd_device *nbd, unsigned long arg,
 	nsock->sent = 0;
 	nsock->cookie = 0;
 	socks[config->num_connections++] = nsock;
+	atomic_inc(&config->live_connections);
 
 	return 0;
 }
@@ -860,6 +899,9 @@ static int nbd_reconnect_socket(struct nbd_device *nbd, unsigned long arg)
 		 * need to queue_work outside of the tx_mutex.
 		 */
 		queue_work(recv_workqueue, &args->work);
+
+		atomic_inc(&config->live_connections);
+		wake_up(&config->conn_wait);
 		return 0;
 	}
 	sockfd_put(sock);
@@ -1137,7 +1179,9 @@ static struct nbd_config *nbd_alloc_config(void)
 		return NULL;
 	atomic_set(&config->recv_threads, 0);
 	init_waitqueue_head(&config->recv_wq);
+	init_waitqueue_head(&config->conn_wait);
 	config->blksize = 1024;
+	atomic_set(&config->live_connections, 0);
 	try_module_get(THIS_MODULE);
 	return config;
 }
@@ -1449,6 +1493,7 @@ static struct nla_policy nbd_attr_policy[NBD_ATTR_MAX + 1] = {
 	[NBD_ATTR_SERVER_FLAGS]		=	{ .type = NLA_U64 },
 	[NBD_ATTR_CLIENT_FLAGS]		=	{ .type = NLA_U64 },
 	[NBD_ATTR_SOCKETS]		=	{ .type = NLA_NESTED},
+	[NBD_ATTR_DEAD_CONN_TIMEOUT]	=	{ .type = NLA_U64 },
 };
 
 static struct nla_policy nbd_sock_policy[NBD_SOCK_MAX + 1] = {
@@ -1535,6 +1580,11 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
 		nbd->tag_set.timeout = timeout * HZ;
 		blk_queue_rq_timeout(nbd->disk->queue, timeout * HZ);
 	}
+	if (info->attrs[NBD_ATTR_DEAD_CONN_TIMEOUT]) {
+		config->dead_conn_timeout =
+			nla_get_u64(info->attrs[NBD_ATTR_DEAD_CONN_TIMEOUT]);
+		config->dead_conn_timeout *= HZ;
+	}
 	if (info->attrs[NBD_ATTR_SERVER_FLAGS])
 		config->flags =
 			nla_get_u64(info->attrs[NBD_ATTR_SERVER_FLAGS]);
@@ -1655,6 +1705,11 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
 		nbd->tag_set.timeout = timeout * HZ;
 		blk_queue_rq_timeout(nbd->disk->queue, timeout * HZ);
 	}
+	if (info->attrs[NBD_ATTR_DEAD_CONN_TIMEOUT]) {
+		config->dead_conn_timeout =
+			nla_get_u64(info->attrs[NBD_ATTR_DEAD_CONN_TIMEOUT]);
+		config->dead_conn_timeout *= HZ;
+	}
 
 	if (info->attrs[NBD_ATTR_SOCKETS]) {
 		struct nlattr *attr;
diff --git a/include/uapi/linux/nbd-netlink.h b/include/uapi/linux/nbd-netlink.h
index b69105cc..c2209c75 100644
--- a/include/uapi/linux/nbd-netlink.h
+++ b/include/uapi/linux/nbd-netlink.h
@@ -32,6 +32,7 @@ enum {
 	NBD_ATTR_SERVER_FLAGS,
 	NBD_ATTR_CLIENT_FLAGS,
 	NBD_ATTR_SOCKETS,
+	NBD_ATTR_DEAD_CONN_TIMEOUT,
 	__NBD_ATTR_MAX,
 };
 #define NBD_ATTR_MAX (__NBD_ATTR_MAX - 1)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 10/12] nbd: add a status netlink command
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
                   ` (8 preceding siblings ...)
  2017-04-06 21:02 ` [PATCH 09/12] nbd: handle dead connections Josef Bacik
@ 2017-04-06 21:02 ` Josef Bacik
  2017-04-06 21:02 ` [PATCH 11/12] nbd: add device refcounting Josef Bacik
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:02 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

Allow users to query the status of existing nbd devices.  Right now this
only returns whether or not the device is connected, but could be
extended in the future to include more information.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c              | 108 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/nbd-netlink.h |  25 +++++++++
 2 files changed, 133 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index fd3d535..e1289d0 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -45,6 +45,7 @@
 
 static DEFINE_IDR(nbd_index_idr);
 static DEFINE_MUTEX(nbd_index_mutex);
+static int nbd_total_devices = 0;
 
 struct nbd_sock {
 	struct socket *sock;
@@ -130,6 +131,7 @@ static int nbd_dev_dbg_init(struct nbd_device *nbd);
 static void nbd_dev_dbg_close(struct nbd_device *nbd);
 static void nbd_config_put(struct nbd_device *nbd);
 static void nbd_connect_reply(struct genl_info *info, int index);
+static int nbd_genl_status(struct sk_buff *skb, struct genl_info *info);
 static void nbd_dead_link_work(struct work_struct *work);
 
 static inline struct device *nbd_to_dev(struct nbd_device *nbd)
@@ -1458,6 +1460,7 @@ static int nbd_dev_add(int index)
 	sprintf(disk->disk_name, "nbd%d", index);
 	nbd_reset(nbd);
 	add_disk(disk);
+	nbd_total_devices++;
 	return index;
 
 out_free_tags:
@@ -1494,12 +1497,22 @@ static struct nla_policy nbd_attr_policy[NBD_ATTR_MAX + 1] = {
 	[NBD_ATTR_CLIENT_FLAGS]		=	{ .type = NLA_U64 },
 	[NBD_ATTR_SOCKETS]		=	{ .type = NLA_NESTED},
 	[NBD_ATTR_DEAD_CONN_TIMEOUT]	=	{ .type = NLA_U64 },
+	[NBD_ATTR_DEVICE_LIST]		=	{ .type = NLA_NESTED},
 };
 
 static struct nla_policy nbd_sock_policy[NBD_SOCK_MAX + 1] = {
 	[NBD_SOCK_FD]			=	{ .type = NLA_U32 },
 };
 
+/* We don't use this right now since we don't parse the incoming list, but we
+ * still want it here so userspace knows what to expect.
+ */
+static struct nla_policy __attribute__((unused))
+nbd_device_policy[NBD_DEVICE_ATTR_MAX + 1] = {
+	[NBD_DEVICE_INDEX]		=	{ .type = NLA_U32 },
+	[NBD_DEVICE_CONNECTED]		=	{ .type = NLA_U8 },
+};
+
 static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
 {
 	struct nbd_device *nbd = NULL;
@@ -1765,6 +1778,11 @@ static const struct genl_ops nbd_connect_genl_ops[] = {
 		.policy	= nbd_attr_policy,
 		.doit	= nbd_genl_reconfigure,
 	},
+	{
+		.cmd	= NBD_CMD_STATUS,
+		.policy	= nbd_attr_policy,
+		.doit	= nbd_genl_status,
+	},
 };
 
 static const struct genl_multicast_group nbd_mcast_grps[] = {
@@ -1783,6 +1801,96 @@ static struct genl_family nbd_genl_family __ro_after_init = {
 	.n_mcgrps	= ARRAY_SIZE(nbd_mcast_grps),
 };
 
+static int populate_nbd_status(struct nbd_device *nbd, struct sk_buff *reply)
+{
+	struct nlattr *dev_opt;
+	u8 connected = 0;
+	int ret;
+
+	/* This is a little racey, but for status it's ok.  The
+	 * reason we don't take a ref here is because we can't
+	 * take a ref in the index == -1 case as we would need
+	 * to put under the nbd_index_mutex, which could
+	 * deadlock if we are configured to remove ourselves
+	 * once we're disconnected.
+	 */
+	if (refcount_read(&nbd->config_refs))
+		connected = 1;
+	dev_opt = nla_nest_start(reply, NBD_DEVICE_ITEM);
+	if (!dev_opt)
+		return -EMSGSIZE;
+	ret = nla_put_u32(reply, NBD_DEVICE_INDEX, nbd->index);
+	if (ret)
+		return -EMSGSIZE;
+	ret = nla_put_u8(reply, NBD_DEVICE_CONNECTED,
+			 connected);
+	if (ret)
+		return -EMSGSIZE;
+	nla_nest_end(reply, dev_opt);
+	return 0;
+}
+
+static int status_cb(int id, void *ptr, void *data)
+{
+	struct nbd_device *nbd = ptr;
+	return populate_nbd_status(nbd, (struct sk_buff *)data);
+}
+
+static int nbd_genl_status(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr *dev_list;
+	struct sk_buff *reply;
+	void *reply_head;
+	size_t msg_size;
+	int index = -1;
+	int ret = -ENOMEM;
+
+	if (info->attrs[NBD_ATTR_INDEX])
+		index = nla_get_u32(info->attrs[NBD_ATTR_INDEX]);
+
+	mutex_lock(&nbd_index_mutex);
+
+	msg_size = nla_total_size(nla_attr_size(sizeof(u32)) +
+				  nla_attr_size(sizeof(u8)));
+	msg_size *= (index == -1) ? nbd_total_devices : 1;
+
+	reply = genlmsg_new(msg_size, GFP_KERNEL);
+	if (!reply)
+		goto out;
+	reply_head = genlmsg_put_reply(reply, info, &nbd_genl_family, 0,
+				       NBD_CMD_STATUS);
+	if (!reply_head) {
+		nlmsg_free(reply);
+		goto out;
+	}
+
+	dev_list = nla_nest_start(reply, NBD_ATTR_DEVICE_LIST);
+	if (index == -1) {
+		ret = idr_for_each(&nbd_index_idr, &status_cb, reply);
+		if (ret) {
+			nlmsg_free(reply);
+			goto out;
+		}
+	} else {
+		struct nbd_device *nbd;
+		nbd = idr_find(&nbd_index_idr, index);
+		if (nbd) {
+			ret = populate_nbd_status(nbd, reply);
+			if (ret) {
+				nlmsg_free(reply);
+				goto out;
+			}
+		}
+	}
+	nla_nest_end(reply, dev_list);
+	genlmsg_end(reply, reply_head);
+	genlmsg_reply(reply, info);
+	ret = 0;
+out:
+	mutex_unlock(&nbd_index_mutex);
+	return ret;
+}
+
 static void nbd_connect_reply(struct genl_info *info, int index)
 {
 	struct sk_buff *skb;
diff --git a/include/uapi/linux/nbd-netlink.h b/include/uapi/linux/nbd-netlink.h
index c2209c75..6f7ca3d 100644
--- a/include/uapi/linux/nbd-netlink.h
+++ b/include/uapi/linux/nbd-netlink.h
@@ -33,11 +33,35 @@ enum {
 	NBD_ATTR_CLIENT_FLAGS,
 	NBD_ATTR_SOCKETS,
 	NBD_ATTR_DEAD_CONN_TIMEOUT,
+	NBD_ATTR_DEVICE_LIST,
 	__NBD_ATTR_MAX,
 };
 #define NBD_ATTR_MAX (__NBD_ATTR_MAX - 1)
 
 /*
+ * This is the format for multiple devices with NBD_ATTR_DEVICE_LIST
+ *
+ * [NBD_ATTR_DEVICE_LIST]
+ *   [NBD_DEVICE_ITEM]
+ *     [NBD_DEVICE_INDEX]
+ *     [NBD_DEVICE_CONNECTED]
+ */
+enum {
+	NBD_DEVICE_ITEM_UNSPEC,
+	NBD_DEVICE_ITEM,
+	__NBD_DEVICE_ITEM_MAX,
+};
+#define NBD_DEVICE_ITEM_MAX (__NBD_DEVICE_ITEM_MAX - 1)
+
+enum {
+	NBD_DEVICE_UNSPEC,
+	NBD_DEVICE_INDEX,
+	NBD_DEVICE_CONNECTED,
+	__NBD_DEVICE_MAX,
+};
+#define NBD_DEVICE_ATTR_MAX (__NBD_DEVICE_MAX - 1)
+
+/*
  * This is the format for multiple sockets with NBD_ATTR_SOCKETS
  *
  * [NBD_ATTR_SOCKETS]
@@ -66,6 +90,7 @@ enum {
 	NBD_CMD_DISCONNECT,
 	NBD_CMD_RECONFIGURE,
 	NBD_CMD_LINK_DEAD,
+	NBD_CMD_STATUS,
 	__NBD_CMD_MAX,
 };
 #define NBD_CMD_MAX	(__NBD_CMD_MAX - 1)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 11/12] nbd: add device refcounting
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
                   ` (9 preceding siblings ...)
  2017-04-06 21:02 ` [PATCH 10/12] nbd: add a status netlink command Josef Bacik
@ 2017-04-06 21:02 ` Josef Bacik
  2017-04-06 21:02 ` [PATCH 12/12] nbd: add a flag to destroy an nbd device on disconnect Josef Bacik
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:02 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

In order to support deleting the device on disconnect we need to
refcount the actual nbd_device struct.  So add the refcounting framework
and change how we free the normal devices at rmmod time so we can catch
reference leaks.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c | 104 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 86 insertions(+), 18 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index e1289d0..b33014a 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -99,10 +99,12 @@ struct nbd_device {
 
 	int index;
 	refcount_t config_refs;
+	refcount_t refs;
 	struct nbd_config *config;
 	struct mutex config_lock;
 	struct gendisk *disk;
 
+	struct list_head list;
 	struct task_struct *task_recv;
 	struct task_struct *task_setup;
 };
@@ -165,6 +167,28 @@ static struct device_attribute pid_attr = {
 	.show = pid_show,
 };
 
+static void nbd_dev_remove(struct nbd_device *nbd)
+{
+	struct gendisk *disk = nbd->disk;
+	if (disk) {
+		del_gendisk(disk);
+		blk_cleanup_queue(disk->queue);
+		blk_mq_free_tag_set(&nbd->tag_set);
+		put_disk(disk);
+	}
+	kfree(nbd);
+}
+
+static void nbd_put(struct nbd_device *nbd)
+{
+	if (refcount_dec_and_mutex_lock(&nbd->refs,
+					&nbd_index_mutex)) {
+		idr_remove(&nbd_index_idr, nbd->index);
+		mutex_unlock(&nbd_index_mutex);
+		nbd_dev_remove(nbd);
+	}
+}
+
 static int nbd_disconnected(struct nbd_config *config)
 {
 	return test_bit(NBD_DISCONNECTED, &config->runtime_flags) ||
@@ -1005,6 +1029,7 @@ static void nbd_config_put(struct nbd_device *nbd)
 		}
 		nbd_reset(nbd);
 		mutex_unlock(&nbd->config_lock);
+		nbd_put(nbd);
 		module_put(THIS_MODULE);
 	}
 }
@@ -1199,6 +1224,10 @@ static int nbd_open(struct block_device *bdev, fmode_t mode)
 		ret = -ENXIO;
 		goto out;
 	}
+	if (!refcount_inc_not_zero(&nbd->refs)) {
+		ret = -ENXIO;
+		goto out;
+	}
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		struct nbd_config *config;
 
@@ -1214,6 +1243,7 @@ static int nbd_open(struct block_device *bdev, fmode_t mode)
 			goto out;
 		}
 		refcount_set(&nbd->config_refs, 1);
+		refcount_inc(&nbd->refs);
 		mutex_unlock(&nbd->config_lock);
 	}
 out:
@@ -1225,6 +1255,7 @@ static void nbd_release(struct gendisk *disk, fmode_t mode)
 {
 	struct nbd_device *nbd = disk->private_data;
 	nbd_config_put(nbd);
+	nbd_put(nbd);
 }
 
 static const struct block_device_operations nbd_fops =
@@ -1378,18 +1409,6 @@ static const struct blk_mq_ops nbd_mq_ops = {
 	.timeout	= nbd_xmit_timeout,
 };
 
-static void nbd_dev_remove(struct nbd_device *nbd)
-{
-	struct gendisk *disk = nbd->disk;
-	if (disk) {
-		del_gendisk(disk);
-		blk_cleanup_queue(disk->queue);
-		blk_mq_free_tag_set(&nbd->tag_set);
-		put_disk(disk);
-	}
-	kfree(nbd);
-}
-
 static int nbd_dev_add(int index)
 {
 	struct nbd_device *nbd;
@@ -1453,6 +1472,8 @@ static int nbd_dev_add(int index)
 
 	mutex_init(&nbd->config_lock);
 	refcount_set(&nbd->config_refs, 0);
+	refcount_set(&nbd->refs, 1);
+	INIT_LIST_HEAD(&nbd->list);
 	disk->major = NBD_MAJOR;
 	disk->first_minor = index << part_shift;
 	disk->fops = &nbd_fops;
@@ -1550,16 +1571,26 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
 	} else {
 		nbd = idr_find(&nbd_index_idr, index);
 	}
-	mutex_unlock(&nbd_index_mutex);
 	if (!nbd) {
 		printk(KERN_ERR "nbd: couldn't find device at index %d\n",
 		       index);
+		mutex_unlock(&nbd_index_mutex);
+		return -EINVAL;
+	}
+	if (!refcount_inc_not_zero(&nbd->refs)) {
+		mutex_unlock(&nbd_index_mutex);
+		if (index == -1)
+			goto again;
+		printk(KERN_ERR "nbd: device at index %d is going down\n",
+		       index);
 		return -EINVAL;
 	}
+	mutex_unlock(&nbd_index_mutex);
 
 	mutex_lock(&nbd->config_lock);
 	if (refcount_read(&nbd->config_refs)) {
 		mutex_unlock(&nbd->config_lock);
+		nbd_put(nbd);
 		if (index == -1)
 			goto again;
 		printk(KERN_ERR "nbd: nbd%d already in use\n", index);
@@ -1567,11 +1598,13 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
 	}
 	if (WARN_ON(nbd->config)) {
 		mutex_unlock(&nbd->config_lock);
+		nbd_put(nbd);
 		return -EINVAL;
 	}
 	config = nbd->config = nbd_alloc_config();
 	if (!nbd->config) {
 		mutex_unlock(&nbd->config_lock);
+		nbd_put(nbd);
 		printk(KERN_ERR "nbd: couldn't allocate config\n");
 		return -ENOMEM;
 	}
@@ -1656,14 +1689,23 @@ static int nbd_genl_disconnect(struct sk_buff *skb, struct genl_info *info)
 	index = nla_get_u32(info->attrs[NBD_ATTR_INDEX]);
 	mutex_lock(&nbd_index_mutex);
 	nbd = idr_find(&nbd_index_idr, index);
-	mutex_unlock(&nbd_index_mutex);
 	if (!nbd) {
+		mutex_unlock(&nbd_index_mutex);
 		printk(KERN_ERR "nbd: couldn't find device at index %d\n",
 		       index);
 		return -EINVAL;
 	}
-	if (!refcount_inc_not_zero(&nbd->config_refs))
+	if (!refcount_inc_not_zero(&nbd->refs)) {
+		mutex_unlock(&nbd_index_mutex);
+		printk(KERN_ERR "nbd: device at index %d is going down\n",
+		       index);
+		return -EINVAL;
+	}
+	mutex_unlock(&nbd_index_mutex);
+	if (!refcount_inc_not_zero(&nbd->config_refs)) {
+		nbd_put(nbd);
 		return 0;
+	}
 	mutex_lock(&nbd->config_lock);
 	nbd_disconnect(nbd);
 	mutex_unlock(&nbd->config_lock);
@@ -1671,6 +1713,7 @@ static int nbd_genl_disconnect(struct sk_buff *skb, struct genl_info *info)
 			       &nbd->config->runtime_flags))
 		nbd_config_put(nbd);
 	nbd_config_put(nbd);
+	nbd_put(nbd);
 	return 0;
 }
 
@@ -1691,16 +1734,24 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
 	index = nla_get_u32(info->attrs[NBD_ATTR_INDEX]);
 	mutex_lock(&nbd_index_mutex);
 	nbd = idr_find(&nbd_index_idr, index);
-	mutex_unlock(&nbd_index_mutex);
 	if (!nbd) {
+		mutex_unlock(&nbd_index_mutex);
 		printk(KERN_ERR "nbd: couldn't find a device at index %d\n",
 		       index);
 		return -EINVAL;
 	}
+	if (!refcount_inc_not_zero(&nbd->refs)) {
+		mutex_unlock(&nbd_index_mutex);
+		printk(KERN_ERR "nbd: device at index %d is going down\n",
+		       index);
+		return -EINVAL;
+	}
+	mutex_unlock(&nbd_index_mutex);
 
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		dev_err(nbd_to_dev(nbd),
 			"not configured, cannot reconfigure\n");
+		nbd_put(nbd);
 		return -EINVAL;
 	}
 
@@ -1759,6 +1810,7 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
 out:
 	mutex_unlock(&nbd->config_lock);
 	nbd_config_put(nbd);
+	nbd_put(nbd);
 	return ret;
 }
 
@@ -2004,16 +2056,32 @@ static int __init nbd_init(void)
 
 static int nbd_exit_cb(int id, void *ptr, void *data)
 {
+	struct list_head *list = (struct list_head *)data;
 	struct nbd_device *nbd = ptr;
-	nbd_dev_remove(nbd);
+
+	refcount_inc(&nbd->refs);
+	list_add_tail(&nbd->list, list);
 	return 0;
 }
 
 static void __exit nbd_cleanup(void)
 {
+	struct nbd_device *nbd;
+	LIST_HEAD(del_list);
+
 	nbd_dbg_close();
 
-	idr_for_each(&nbd_index_idr, &nbd_exit_cb, NULL);
+	mutex_lock(&nbd_index_mutex);
+	idr_for_each(&nbd_index_idr, &nbd_exit_cb, &del_list);
+	mutex_unlock(&nbd_index_mutex);
+
+	list_for_each_entry(nbd, &del_list, list) {
+		if (refcount_read(&nbd->refs) != 2)
+			printk(KERN_ERR "nbd: possibly leaking a device\n");
+		nbd_put(nbd);
+		nbd_put(nbd);
+	}
+
 	idr_destroy(&nbd_index_idr);
 	genl_unregister_family(&nbd_genl_family);
 	destroy_workqueue(recv_workqueue);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 12/12] nbd: add a flag to destroy an nbd device on disconnect
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
                   ` (10 preceding siblings ...)
  2017-04-06 21:02 ` [PATCH 11/12] nbd: add device refcounting Josef Bacik
@ 2017-04-06 21:02 ` Josef Bacik
  2017-04-06 21:05 ` [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
  2017-04-17 15:59 ` Jens Axboe
  13 siblings, 0 replies; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:02 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team

For ease of management it would be nice for users to specify that the
device node for a nbd device is destroyed once it is disconnected and
there are no more users.  Add a client flag and enable this operation to
happen.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c      | 30 ++++++++++++++++++++++++++++++
 include/uapi/linux/nbd.h |  6 +++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b33014a..d220045 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -74,6 +74,7 @@ struct link_dead_args {
 #define NBD_HAS_PID_FILE		3
 #define NBD_HAS_CONFIG_REF		4
 #define NBD_BOUND			5
+#define NBD_DESTROY_ON_DISCONNECT	6
 
 struct nbd_config {
 	u32 flags;
@@ -174,6 +175,7 @@ static void nbd_dev_remove(struct nbd_device *nbd)
 		del_gendisk(disk);
 		blk_cleanup_queue(disk->queue);
 		blk_mq_free_tag_set(&nbd->tag_set);
+		disk->private_data = NULL;
 		put_disk(disk);
 	}
 	kfree(nbd);
@@ -1028,6 +1030,7 @@ static void nbd_config_put(struct nbd_device *nbd)
 			kfree(config->socks);
 		}
 		nbd_reset(nbd);
+
 		mutex_unlock(&nbd->config_lock);
 		nbd_put(nbd);
 		module_put(THIS_MODULE);
@@ -1540,6 +1543,7 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
 	struct nbd_config *config;
 	int index = -1;
 	int ret;
+	bool put_dev = false;
 
 	if (!netlink_capable(skb, CAP_SYS_ADMIN))
 		return -EPERM;
@@ -1634,6 +1638,15 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
 	if (info->attrs[NBD_ATTR_SERVER_FLAGS])
 		config->flags =
 			nla_get_u64(info->attrs[NBD_ATTR_SERVER_FLAGS]);
+	if (info->attrs[NBD_ATTR_CLIENT_FLAGS]) {
+		u64 flags = nla_get_u64(info->attrs[NBD_ATTR_CLIENT_FLAGS]);
+		if (flags & NBD_CFLAG_DESTROY_ON_DISCONNECT) {
+			set_bit(NBD_DESTROY_ON_DISCONNECT,
+				&config->runtime_flags);
+			put_dev = true;
+		}
+	}
+
 	if (info->attrs[NBD_ATTR_SOCKETS]) {
 		struct nlattr *attr;
 		int rem, fd;
@@ -1671,6 +1684,8 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info)
 		nbd_connect_reply(info, nbd->index);
 	}
 	nbd_config_put(nbd);
+	if (put_dev)
+		nbd_put(nbd);
 	return ret;
 }
 
@@ -1723,6 +1738,7 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
 	struct nbd_config *config;
 	int index;
 	int ret = -EINVAL;
+	bool put_dev = false;
 
 	if (!netlink_capable(skb, CAP_SYS_ADMIN))
 		return -EPERM;
@@ -1774,6 +1790,18 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
 			nla_get_u64(info->attrs[NBD_ATTR_DEAD_CONN_TIMEOUT]);
 		config->dead_conn_timeout *= HZ;
 	}
+	if (info->attrs[NBD_ATTR_CLIENT_FLAGS]) {
+		u64 flags = nla_get_u64(info->attrs[NBD_ATTR_CLIENT_FLAGS]);
+		if (flags & NBD_CFLAG_DESTROY_ON_DISCONNECT) {
+			if (!test_and_set_bit(NBD_DESTROY_ON_DISCONNECT,
+					      &config->runtime_flags))
+				put_dev = true;
+		} else {
+			if (test_and_clear_bit(NBD_DESTROY_ON_DISCONNECT,
+					       &config->runtime_flags))
+				refcount_inc(&nbd->refs);
+		}
+	}
 
 	if (info->attrs[NBD_ATTR_SOCKETS]) {
 		struct nlattr *attr;
@@ -1811,6 +1839,8 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, struct genl_info *info)
 	mutex_unlock(&nbd->config_lock);
 	nbd_config_put(nbd);
 	nbd_put(nbd);
+	if (put_dev)
+		nbd_put(nbd);
 	return ret;
 }
 
diff --git a/include/uapi/linux/nbd.h b/include/uapi/linux/nbd.h
index c91c642..155e33f 100644
--- a/include/uapi/linux/nbd.h
+++ b/include/uapi/linux/nbd.h
@@ -37,7 +37,7 @@ enum {
 	NBD_CMD_TRIM = 4
 };
 
-/* values for flags field */
+/* values for flags field, these are server interaction specific. */
 #define NBD_FLAG_HAS_FLAGS	(1 << 0) /* nbd-server supports flags */
 #define NBD_FLAG_READ_ONLY	(1 << 1) /* device is read-only */
 #define NBD_FLAG_SEND_FLUSH	(1 << 2) /* can flush writeback cache */
@@ -45,6 +45,10 @@ enum {
 #define NBD_FLAG_SEND_TRIM	(1 << 5) /* send trim/discard */
 #define NBD_FLAG_CAN_MULTI_CONN	(1 << 8)	/* Server supports multiple connections per export. */
 
+/* These are client behavior specific flags. */
+#define NBD_CFLAG_DESTROY_ON_DISCONNECT	(1 << 0) /* delete the nbd device on
+						    disconnect. */
+
 /* userspace doesn't need the nbd_device structure */
 
 /* These are sent over the network in the request/reply magic fields */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 00/12] nbd: Netlink interface and path failure enhancements
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
                   ` (11 preceding siblings ...)
  2017-04-06 21:02 ` [PATCH 12/12] nbd: add a flag to destroy an nbd device on disconnect Josef Bacik
@ 2017-04-06 21:05 ` Josef Bacik
  2017-04-07 13:04   ` [Nbd] " Wouter Verhelst
  2017-04-17 15:59 ` Jens Axboe
  13 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2017-04-06 21:05 UTC (permalink / raw)
  To: Jens Axboe, nbd-general, linux-block, kernel-team


> On Apr 6, 2017, at 5:01 PM, Josef Bacik <josef@toxicpanda.com> wrote:
>=20
> This patchset adds a new netlink configuration interface to NBD as =
well as a
> bunch of enhancments around path failures.  The patches provide the =
following
> enhancemnts to NBD
>=20
> - Netlink configuration interface that doesn't leave a userspace =
application
>   waiting in kernel space for the device to disconnect.
> - Netlink reconfigure interface for adding re-connected sockets to =
replace dead
>   sockets.
> - A flag to destroy the NBD device on disconnect, much like how mount =
-o loop
>   works.
> - A status interface that currently will only report whether a device =
is
>   connected or not, but can be extended to include whatever in the =
future.
> - A netlink multicast notification scheme to notify user space when =
there are
>   connection issues to allow for seamless reconnects.
> - Dead link handling.  You can specify a dead link timeout and the NBD =
device
>   will pause IO for that timeout waiting to see if the connection can =
be
>   re-established.  This is helpful to allow for things like nbd server =
upgrades
>   where the whole server disappears for a short period of time.
>=20
> These patches have been thorougly and continuously tested for about a =
month.
> I've been finding bugs in various places, but this batch has been =
solid for the
> last few days of testing, which include a constant =
disconnect/reconnect torture
> test.  Thanks,

Oops forgot to mention that I have patches for the nbd user space side =
that utilizes all of these new interfaces, you can find the tree here

https://github.com/josefbacik/nbd

The user space patches are a bit rough, but utilize everything so good =
for testing.  They will be cleaned up and submitted once the kernel part =
is upstream.  Thanks,

Josef=

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Nbd] [PATCH 00/12] nbd: Netlink interface and path failure enhancements
  2017-04-06 21:05 ` [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
@ 2017-04-07 13:04   ` Wouter Verhelst
  0 siblings, 0 replies; 16+ messages in thread
From: Wouter Verhelst @ 2017-04-07 13:04 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Jens Axboe, nbd-general, linux-block, kernel-team

Hi Josef,

On Thu, Apr 06, 2017 at 05:05:25PM -0400, Josef Bacik wrote:
> 
> > On Apr 6, 2017, at 5:01 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> > 
> > This patchset adds a new netlink configuration interface to NBD as well as a
> > bunch of enhancments around path failures.  The patches provide the following
> > enhancemnts to NBD
> > 
> > - Netlink configuration interface that doesn't leave a userspace application
> >   waiting in kernel space for the device to disconnect.
> > - Netlink reconfigure interface for adding re-connected sockets to replace dead
> >   sockets.
> > - A flag to destroy the NBD device on disconnect, much like how mount -o loop
> >   works.
> > - A status interface that currently will only report whether a device is
> >   connected or not, but can be extended to include whatever in the future.
> > - A netlink multicast notification scheme to notify user space when there are
> >   connection issues to allow for seamless reconnects.
> > - Dead link handling.  You can specify a dead link timeout and the NBD device
> >   will pause IO for that timeout waiting to see if the connection can be
> >   re-established.  This is helpful to allow for things like nbd server upgrades
> >   where the whole server disappears for a short period of time.
> > 
> > These patches have been thorougly and continuously tested for about a month.
> > I've been finding bugs in various places, but this batch has been solid for the
> > last few days of testing, which include a constant disconnect/reconnect torture
> > test.  Thanks,
> 
> Oops forgot to mention that I have patches for the nbd user space side
> that utilizes all of these new interfaces, you can find the tree here
> 
> https://github.com/josefbacik/nbd

Yeah, found that already :-)

> The user space patches are a bit rough, but utilize everything so good
> for testing. They will be cleaned up and submitted once the kernel
> part is upstream. Thanks,

No worries, I'll probably clean it up myself. TBH, cleaning up
nbd-client so it becomes somewhat less of a beast has been on my TODO
list for a while, and this will only add to it... maybe that'll finally
convince me that it's time ;-)

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 00/12] nbd: Netlink interface and path failure enhancements
  2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
                   ` (12 preceding siblings ...)
  2017-04-06 21:05 ` [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
@ 2017-04-17 15:59 ` Jens Axboe
  13 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2017-04-17 15:59 UTC (permalink / raw)
  To: Josef Bacik, nbd-general, linux-block, kernel-team

On 04/06/2017 03:01 PM, Josef Bacik wrote:
> This patchset adds a new netlink configuration interface to NBD as well as a
> bunch of enhancments around path failures.  The patches provide the following
> enhancemnts to NBD
> 
>  - Netlink configuration interface that doesn't leave a userspace application
>    waiting in kernel space for the device to disconnect.
>  - Netlink reconfigure interface for adding re-connected sockets to replace dead
>    sockets.
>  - A flag to destroy the NBD device on disconnect, much like how mount -o loop
>    works.
>  - A status interface that currently will only report whether a device is
>    connected or not, but can be extended to include whatever in the future.
>  - A netlink multicast notification scheme to notify user space when there are
>    connection issues to allow for seamless reconnects.
>  - Dead link handling.  You can specify a dead link timeout and the NBD device
>    will pause IO for that timeout waiting to see if the connection can be
>    re-established.  This is helpful to allow for things like nbd server upgrades
>    where the whole server disappears for a short period of time.
> 
> These patches have been thorougly and continuously tested for about a month.
> I've been finding bugs in various places, but this batch has been solid for the
> last few days of testing, which include a constant disconnect/reconnect torture
> test.  Thanks,

Added for 4.12, thanks.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-04-17 15:59 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-06 21:01 [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
2017-04-06 21:01 ` [PATCH 01/12] nbd: put socket in error cases Josef Bacik
2017-04-06 21:01 ` [PATCH 02/12] nbd: handle single path failures gracefully Josef Bacik
2017-04-06 21:01 ` [PATCH 03/12] nbd: separate out the config information Josef Bacik
2017-04-06 21:01 ` [PATCH 04/12] nbd: stop using the bdev everywhere Josef Bacik
2017-04-06 21:02 ` [PATCH 05/12] nbd: add a basic netlink interface Josef Bacik
2017-04-06 21:02 ` [PATCH 06/12] nbd: add a reconfigure netlink command Josef Bacik
2017-04-06 21:02 ` [PATCH 07/12] nbd: multicast dead link notifications Josef Bacik
2017-04-06 21:02 ` [PATCH 08/12] nbd: only clear the queue on device teardown Josef Bacik
2017-04-06 21:02 ` [PATCH 09/12] nbd: handle dead connections Josef Bacik
2017-04-06 21:02 ` [PATCH 10/12] nbd: add a status netlink command Josef Bacik
2017-04-06 21:02 ` [PATCH 11/12] nbd: add device refcounting Josef Bacik
2017-04-06 21:02 ` [PATCH 12/12] nbd: add a flag to destroy an nbd device on disconnect Josef Bacik
2017-04-06 21:05 ` [PATCH 00/12] nbd: Netlink interface and path failure enhancements Josef Bacik
2017-04-07 13:04   ` [Nbd] " Wouter Verhelst
2017-04-17 15:59 ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.