linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] NBD fixes for caching and block device flags
@ 2013-02-12 16:06 Paolo Bonzini
  2013-02-12 16:06 ` [PATCH 1/3] nbd: support FLUSH requests Paolo Bonzini
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Paolo Bonzini @ 2013-02-12 16:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: nbd-general, Paul Clements, Andrew Morton

The first two patches improve the behavior of NBD with respect to the host
cache---respectively the server's cache and the client's.  The first adds
support for flushing the backing storage, so that NBD is safe against
power losses.  The second properly syncs and cleans the client's page
cache when an NBD device is disconnected from the server.

The third reports read-only devices properly in sysfs.

Ok for 3.9?  I saw the last NBD patches were applied by Andrew Morton,
so I'm CCing him; Paul, please ack them if they are okay.

Paolo

Alex Bligh (1):
  nbd: support FLUSH requests

Paolo Bonzini (2):
  nbd: fsync and kill block device on shutdown
  nbd: show read-only state in sysfs

 drivers/block/nbd.c      |   35 +++++++++++++++++++++++++++++++++--
 include/uapi/linux/nbd.h |    3 ++-
 2 files changed, 35 insertions(+), 3 deletions(-)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/3] nbd: support FLUSH requests
  2013-02-12 16:06 [PATCH 0/3] NBD fixes for caching and block device flags Paolo Bonzini
@ 2013-02-12 16:06 ` Paolo Bonzini
  2013-02-12 17:37   ` Alex Bligh
  2013-02-12 22:07   ` Paul Clements
  2013-02-12 16:06 ` [PATCH 2/3] nbd: fsync and kill block device on shutdown Paolo Bonzini
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2013-02-12 16:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alex Bligh, nbd-general, Paul Clements, Andrew Morton

From: Alex Bligh <alex@alex.org.uk>

The NBD device does not support writeback caching, so it is not safe
against power losses unless the server opens the target with O_DSYNC or
O_SYNC.

Add support for a new flag that the server can pass.  If the flag is
enabled, we translate REQ_FLUSH requests into the NBD_CMD_FLUSH
command.
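For reference, the on-the-wire request this translation produces can be sketched in user space as follows. This is a minimal sketch: the struct layout and magic value mirror the classic NBD request header from include/uapi/linux/nbd.h (verify against your tree), and `build_flush` is an illustrative helper, not kernel code:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define NBD_REQUEST_MAGIC 0x25609513  /* see include/uapi/linux/nbd.h */
#define NBD_CMD_FLUSH     3

/* On-the-wire NBD request header (packed, big-endian fields). */
struct nbd_request_hdr {
	uint32_t magic;
	uint32_t type;
	char     handle[8];
	uint64_t from;
	uint32_t len;
} __attribute__((packed));

/* Build a FLUSH request: offset and length are reserved and must be
 * zero, exactly as the patch does for NBD_CMD_FLUSH. */
static void build_flush(struct nbd_request_hdr *req, const char handle[8])
{
	memset(req, 0, sizeof(*req));
	req->magic = htonl(NBD_REQUEST_MAGIC);
	req->type  = htonl(NBD_CMD_FLUSH);
	memcpy(req->handle, handle, 8);
	/* req->from and req->len stay 0: reserved for FLUSH. */
}
```

The handle is opaque to the server; the kernel patch stores the request pointer there so the reply can be matched back to the request.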

Cc: <nbd-general@lists.sf.net>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alex Bligh <alex@alex.org.uk>
[ Removed FUA support for reasons similar to those outlined in
  https://lkml.org/lkml/2010/8/17/234 for virtio - Paolo ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/block/nbd.c      |   22 ++++++++++++++++++++--
 include/uapi/linux/nbd.h |    3 ++-
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 043ddcc..5603765 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -98,6 +98,7 @@ static const char *nbdcmd_to_ascii(int cmd)
 	case  NBD_CMD_READ: return "read";
 	case NBD_CMD_WRITE: return "write";
 	case  NBD_CMD_DISC: return "disconnect";
+	case NBD_CMD_FLUSH: return "flush";
 	case  NBD_CMD_TRIM: return "trim/discard";
 	}
 	return "invalid";
@@ -244,8 +245,15 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req)
 
 	request.magic = htonl(NBD_REQUEST_MAGIC);
 	request.type = htonl(nbd_cmd(req));
-	request.from = cpu_to_be64((u64)blk_rq_pos(req) << 9);
-	request.len = htonl(size);
+
+	if (nbd_cmd(req) == NBD_CMD_FLUSH) {
+		/* Other values are reserved for FLUSH requests.  */
+		request.from = 0;
+		request.len = 0;
+	} else {
+		request.from = cpu_to_be64((u64)blk_rq_pos(req) << 9);
+		request.len = htonl(size);
+	}
 	memcpy(request.handle, &req, sizeof(req));
 
 	dprintk(DBG_TX, "%s: request %p: sending control (%s@%llu,%uB)\n",
@@ -482,6 +490,11 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req)
 		}
 	}
 
+	if (req->cmd_flags & REQ_FLUSH) {
+		BUG_ON(unlikely(blk_rq_sectors(req)));
+		nbd_cmd(req) = NBD_CMD_FLUSH;
+	}
+
 	req->errors = 0;
 
 	mutex_lock(&nbd->tx_lock);
@@ -684,6 +697,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		if (nbd->flags & NBD_FLAG_SEND_TRIM)
 			queue_flag_set_unlocked(QUEUE_FLAG_DISCARD,
 				nbd->disk->queue);
+		if (nbd->flags & NBD_FLAG_SEND_FLUSH)
+			blk_queue_flush(nbd->disk->queue, REQ_FLUSH);
+		else
+			blk_queue_flush(nbd->disk->queue, 0);
 
 		thread = kthread_create(nbd_thread, nbd, nbd->disk->disk_name);
 		if (IS_ERR(thread)) {
@@ -705,6 +722,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
 		if (file)
 			fput(file);
+		nbd->flags = 0;
 		nbd->bytesize = 0;
 		bdev->bd_inode->i_size = 0;
 		set_capacity(nbd->disk, 0);
diff --git a/include/uapi/linux/nbd.h b/include/uapi/linux/nbd.h
index dfb5144..4f52549 100644
--- a/include/uapi/linux/nbd.h
+++ b/include/uapi/linux/nbd.h
@@ -33,13 +33,14 @@ enum {
 	NBD_CMD_READ = 0,
 	NBD_CMD_WRITE = 1,
 	NBD_CMD_DISC = 2,
-	/* there is a gap here to match userspace */
+	NBD_CMD_FLUSH = 3,
 	NBD_CMD_TRIM = 4
 };
 
 /* values for flags field */
 #define NBD_FLAG_HAS_FLAGS    (1 << 0) /* nbd-server supports flags */
 #define NBD_FLAG_READ_ONLY    (1 << 1) /* device is read-only */
+#define NBD_FLAG_SEND_FLUSH   (1 << 2) /* can flush writeback cache */
 /* there is a gap here to match userspace */
 #define NBD_FLAG_SEND_TRIM    (1 << 5) /* send trim/discard */
 
-- 
1.7.1




* [PATCH 2/3] nbd: fsync and kill block device on shutdown
  2013-02-12 16:06 [PATCH 0/3] NBD fixes for caching and block device flags Paolo Bonzini
  2013-02-12 16:06 ` [PATCH 1/3] nbd: support FLUSH requests Paolo Bonzini
@ 2013-02-12 16:06 ` Paolo Bonzini
  2013-02-12 21:41   ` Andrew Morton
  2013-02-12 22:15   ` Paul Clements
  2013-02-12 16:06 ` [PATCH 3/3] nbd: show read-only state in sysfs Paolo Bonzini
  2013-02-12 21:43 ` [PATCH 0/3] NBD fixes for caching and block device flags Andrew Morton
  3 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2013-02-12 16:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: stable, nbd-general, Paul Clements, Andrew Morton

There are two problems with shutdown in the NBD driver.  The first is
that receiving the NBD_DISCONNECT ioctl does not sync the filesystem;
this is useful because BLKFLSBUF is restricted to processes that have
CAP_SYS_ADMIN, and the NBD client may not possess it (fsync of the
block device does not sync the filesystem, either).

The second is that once we clear the socket we have no guarantee that
later reads will come from the same backing storage.  Thus the page cache
must be cleaned, lest reads that hit on the page cache will return stale
data from the previously-accessible disk.

Example:

    # qemu-nbd -r -c/dev/nbd0 /dev/sr0
    # file -s /dev/nbd0
    /dev/stdin: # UDF filesystem data (version 1.5) etc.
    # qemu-nbd -d /dev/nbd0
    # qemu-nbd -r -c/dev/nbd0 /dev/sda
    # file -s /dev/nbd0
    /dev/stdin: # UDF filesystem data (version 1.5) etc.

While /dev/sda has:

    # file -s /dev/sda
    /dev/sda: x86 boot sector; etc.

Cc: <stable@vger.kernel.org>
Cc: <nbd-general@lists.sf.net>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/block/nbd.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5603765..a9c5c7a 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -608,12 +608,20 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		struct request sreq;
 
 		dev_info(disk_to_dev(nbd->disk), "NBD_DISCONNECT\n");
+		if (!nbd->sock)
+			return -EINVAL;
 
+		mutex_unlock(&nbd->tx_lock);
+		fsync_bdev(bdev);
+		mutex_lock(&nbd->tx_lock);
 		blk_rq_init(NULL, &sreq);
 		sreq.cmd_type = REQ_TYPE_SPECIAL;
 		nbd_cmd(&sreq) = NBD_CMD_DISC;
+
+		/* Check again after getting mutex back.  */
 		if (!nbd->sock)
 			return -EINVAL;
+
 		nbd_send_req(nbd, &sreq);
                 return 0;
 	}
@@ -627,6 +635,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		nbd_clear_que(nbd);
 		BUG_ON(!list_empty(&nbd->queue_head));
 		BUG_ON(!list_empty(&nbd->waiting_queue));
+		kill_bdev(bdev);
 		if (file)
 			fput(file);
 		return 0;
@@ -719,6 +728,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		nbd->file = NULL;
 		nbd_clear_que(nbd);
 		dev_warn(disk_to_dev(nbd->disk), "queue cleared\n");
+		kill_bdev(bdev);
 		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
 		if (file)
 			fput(file);
-- 
1.7.1




* [PATCH 3/3] nbd: show read-only state in sysfs
  2013-02-12 16:06 [PATCH 0/3] NBD fixes for caching and block device flags Paolo Bonzini
  2013-02-12 16:06 ` [PATCH 1/3] nbd: support FLUSH requests Paolo Bonzini
  2013-02-12 16:06 ` [PATCH 2/3] nbd: fsync and kill block device on shutdown Paolo Bonzini
@ 2013-02-12 16:06 ` Paolo Bonzini
  2013-02-12 22:16   ` Paul Clements
  2013-02-12 21:43 ` [PATCH 0/3] NBD fixes for caching and block device flags Andrew Morton
  3 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2013-02-12 16:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: nbd-general, Paul Clements, Andrew Morton

Pass the read-only flag to set_device_ro, so that it will be
visible to the block layer and in sysfs.

Cc: <nbd-general@lists.sf.net>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/block/nbd.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index a9c5c7a..ef17c2e 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -703,6 +703,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 
 		mutex_unlock(&nbd->tx_lock);
 
+		if (nbd->flags & NBD_FLAG_READ_ONLY)
+			set_device_ro(bdev, true);
 		if (nbd->flags & NBD_FLAG_SEND_TRIM)
 			queue_flag_set_unlocked(QUEUE_FLAG_DISCARD,
 				nbd->disk->queue);
@@ -730,6 +732,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		dev_warn(disk_to_dev(nbd->disk), "queue cleared\n");
 		kill_bdev(bdev);
 		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
+		set_device_ro(bdev, false);
 		if (file)
 			fput(file);
 		nbd->flags = 0;
-- 
1.7.1



* Re: [PATCH 1/3] nbd: support FLUSH requests
  2013-02-12 16:06 ` [PATCH 1/3] nbd: support FLUSH requests Paolo Bonzini
@ 2013-02-12 17:37   ` Alex Bligh
  2013-02-12 18:06     ` Paolo Bonzini
  2013-02-12 22:07   ` Paul Clements
  1 sibling, 1 reply; 20+ messages in thread
From: Alex Bligh @ 2013-02-12 17:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alex Bligh, linux-kernel, nbd-general, Paul Clements, Andrew Morton

Paolo,

> Add support for a new flag that the server can pass.  If the flag is
> enabled, we translate REQ_FLUSH requests into the NBD_CMD_FLUSH
> command.
> 
> Cc: <nbd-general@lists.sf.net>
> Cc: Paul Clements <Paul.Clements@steeleye.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Alex Bligh <alex@alex.org.uk>
> [ Removed FUA support for reasons similar to those outlined in
>  https://lkml.org/lkml/2010/8/17/234 for virtio - Paolo ]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


For my education, why remove the FUA stuff? The link you posted
was for virtio drivers, and says:

> > Hmmm... the underlying storage could be md/dm RAIDs in which case FUA
> > should be cheaper than FLUSH.
> 
> If someone ever wrote a virtio-blk backend that sits directly ontop
> of the Linux block layer that would be true.

In this case we don't know what the backend is sitting on top of
a-priori. It might be the current nbd server code, but it might
not be.

-- 
Alex Bligh






* Re: [PATCH 1/3] nbd: support FLUSH requests
  2013-02-12 17:37   ` Alex Bligh
@ 2013-02-12 18:06     ` Paolo Bonzini
  2013-02-12 21:32       ` Andrew Morton
  2013-02-13  0:00       ` Alex Bligh
  0 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2013-02-12 18:06 UTC (permalink / raw)
  To: Alex Bligh; +Cc: linux-kernel, nbd-general, Paul Clements, Andrew Morton

On 12/02/2013 18:37, Alex Bligh wrote:
> For my education, why remove the FUA stuff?

Because I had no way to test it.

>>> > > Hmmm... the underlying storage could be md/dm RAIDs in which case FUA
>>> > > should be cheaper than FLUSH.
>> > 
>> > If someone ever wrote a virtio-blk backend that sits directly ontop
>> > of the Linux block layer that would be true.
> In this case we don't know what the backend is sitting on top of
> a-priori. It might be the current nbd server code, but it might
> not be.

Do you know of any other NBD server than the "official" one and qemu-nbd?

Paolo


* Re: [PATCH 1/3] nbd: support FLUSH requests
  2013-02-12 18:06     ` Paolo Bonzini
@ 2013-02-12 21:32       ` Andrew Morton
  2013-02-13  0:03         ` Alex Bligh
  2013-02-13  0:00       ` Alex Bligh
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2013-02-12 21:32 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Alex Bligh, linux-kernel, nbd-general, Paul Clements

On Tue, 12 Feb 2013 19:06:09 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 12/02/2013 18:37, Alex Bligh wrote:
> > For my education, why remove the FUA stuff?
> 
> Because I had no way to test it.
> 
> >>> > > Hmmm... the underlying storage could be md/dm RAIDs in which case FUA
> >>> > > should be cheaper than FLUSH.
> >> > 
> >> > If someone ever wrote a virtio-blk backend that sits directly ontop
> >> > of the Linux block layer that would be true.
> > In this case we don't know what the backend is sitting on top of
> > a-priori. It might be the current nbd server code, but it might
> > not be.
> 
> Do you know of any other NBD server than the "official" one and qemu-nbd?
> 

Obviously the changelog was inadequate.  Please send along a new one
which fully describes the reasons for this change.



* Re: [PATCH 2/3] nbd: fsync and kill block device on shutdown
  2013-02-12 16:06 ` [PATCH 2/3] nbd: fsync and kill block device on shutdown Paolo Bonzini
@ 2013-02-12 21:41   ` Andrew Morton
  2013-02-13 13:05     ` Paolo Bonzini
  2013-02-12 22:15   ` Paul Clements
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2013-02-12 21:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, stable, nbd-general, Paul Clements

On Tue, 12 Feb 2013 17:06:10 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> There are two problems with shutdown in the NBD driver.  The first is
> that receiving the NBD_DISCONNECT ioctl does not sync the filesystem;
> this is useful because BLKFLSBUF is restricted to processes that have
> CAP_SYS_ADMIN, and the NBD client may not possess it (fsync of the
> block device does not sync the filesystem, either).

hm, this says that the lack of a sync is "useful".  I think you mean
that the patch-which-adds-the-sync is the thing which is useful, yes?

> The second is that once we clear the socket we have no guarantee that
> later reads will come from the same backing storage.  Thus the page cache
> must be cleaned, lest reads that hit on the page cache will return stale
> data from the previously-accessible disk.

That sounds like a problem.

> Example:
> 
>     # qemu-nbd -r -c/dev/nbd0 /dev/sr0
>     # file -s /dev/nbd0
>     /dev/stdin: # UDF filesystem data (version 1.5) etc.
>     # qemu-nbd -d /dev/nbd0
>     # qemu-nbd -r -c/dev/nbd0 /dev/sda
>     # file -s /dev/nbd0
>     /dev/stdin: # UDF filesystem data (version 1.5) etc.
> 
> While /dev/sda has:
> 
>     # file -s /dev/sda
>     /dev/sda: x86 boot sector; etc.

OK, we've described the problems but there's no description here of how
the patch addresses those problems.


How does this look?


: There are two problems with shutdown in the NBD driver.
: 
: 1: Receiving the NBD_DISCONNECT ioctl does not sync the filesystem. 
: 
:    This patch adds the sync operation into __nbd_ioctl()'s
:    NBD_DISCONNECT handler.  This is useful because BLKFLSBUF is restricted
:    to processes that have CAP_SYS_ADMIN, and the NBD client may not
:    possess it (fsync of the block device does not sync the filesystem,
:    either).
: 
: 2: Once we clear the socket we have no guarantee that later reads will
:    come from the same backing storage.
: 
:    The patch adds calls to kill_bdev() in __nbd_ioctl()'s socket
:    clearing code so the page cache is cleaned, lest reads that hit on the
:    page cache will return stale data from the previously-accessible disk.
: 
: Example:
: 
:     # qemu-nbd -r -c/dev/nbd0 /dev/sr0
:     # file -s /dev/nbd0
:     /dev/stdin: # UDF filesystem data (version 1.5) etc.
:     # qemu-nbd -d /dev/nbd0
:     # qemu-nbd -r -c/dev/nbd0 /dev/sda
:     # file -s /dev/nbd0
:     /dev/stdin: # UDF filesystem data (version 1.5) etc.
: 
: While /dev/sda has:
: 
:     # file -s /dev/sda
:     /dev/sda: x86 boot sector; etc.



* Re: [PATCH 0/3] NBD fixes for caching and block device flags
  2013-02-12 16:06 [PATCH 0/3] NBD fixes for caching and block device flags Paolo Bonzini
                   ` (2 preceding siblings ...)
  2013-02-12 16:06 ` [PATCH 3/3] nbd: show read-only state in sysfs Paolo Bonzini
@ 2013-02-12 21:43 ` Andrew Morton
  2013-02-13 17:14   ` [Nbd] " Wouter Verhelst
  3 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2013-02-12 21:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, nbd-general, Paul Clements

On Tue, 12 Feb 2013 17:06:08 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> The first two patches improve the behavior of NBD with respect to the host
> cache---respectively the server's cache and the client's.  The first adds
> support for flushing the backing storage, so that NBD is safe against
> power losses.  The second properly syncs and cleans the client's page
> cache when an NBD device is disconnected from the server.
> 
> The third reports read-only devices properly in sysfs.
> 
> Ok for 3.9?  I saw the last NBD patches were applied by Andrew Morton,
> so I'm CCing him; Paul, please ack them if they are okay.

I grabbed these and will await Paul's feedback, and a bit more
changelog detail for [1/3], please.



* Re: [PATCH 1/3] nbd: support FLUSH requests
  2013-02-12 16:06 ` [PATCH 1/3] nbd: support FLUSH requests Paolo Bonzini
  2013-02-12 17:37   ` Alex Bligh
@ 2013-02-12 22:07   ` Paul Clements
  1 sibling, 0 replies; 20+ messages in thread
From: Paul Clements @ 2013-02-12 22:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, Alex Bligh, nbd-general, Andrew Morton

On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> From: Alex Bligh <alex@alex.org.uk>
>
> The NBD device does not support writeback caching, thus it is not safe
> against power losses unless the client opens the target with O_DSYNC or
> O_SYNC.
>
> Add support for a new flag that the server can pass.  If the flag is
> enabled, we translate REQ_FLUSH requests into the NBD_CMD_FLUSH
> command.
>
> Cc: <nbd-general@lists.sf.net>
> Cc: Paul Clements <Paul.Clements@steeleye.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Alex Bligh <alex@alex.org.uk>

Looks good.

Acked-by: Paul.Clements@steeleye.com


* Re: [PATCH 2/3] nbd: fsync and kill block device on shutdown
  2013-02-12 16:06 ` [PATCH 2/3] nbd: fsync and kill block device on shutdown Paolo Bonzini
  2013-02-12 21:41   ` Andrew Morton
@ 2013-02-12 22:15   ` Paul Clements
  1 sibling, 0 replies; 20+ messages in thread
From: Paul Clements @ 2013-02-12 22:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, stable, nbd-general, Andrew Morton

On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> There are two problems with shutdown in the NBD driver.  The first is
> that receiving the NBD_DISCONNECT ioctl does not sync the filesystem;
> this is useful because BLKFLSBUF is restricted to processes that have
> CAP_SYS_ADMIN, and the NBD client may not possess it (fsync of the
> block device does not sync the filesystem, either).

> The second is that once we clear the socket we have no guarantee that
> later reads will come from the same backing storage.  Thus the page cache
> must be cleaned, lest reads that hit on the page cache will return stale
> data from the previously-accessible disk.

Paolo,

Thanks for this. A problem indeed...

Acked-by: Paul.Clements@steeleye.com


> Cc: <stable@vger.kernel.org>
> Cc: <nbd-general@lists.sf.net>
> Cc: Paul Clements <Paul.Clements@steeleye.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  drivers/block/nbd.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 5603765..a9c5c7a 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -608,12 +608,20 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
>                 struct request sreq;
>
>                 dev_info(disk_to_dev(nbd->disk), "NBD_DISCONNECT\n");
> +               if (!nbd->sock)
> +                       return -EINVAL;
>
> +               mutex_unlock(&nbd->tx_lock);
> +               fsync_bdev(bdev);
> +               mutex_lock(&nbd->tx_lock);
>                 blk_rq_init(NULL, &sreq);
>                 sreq.cmd_type = REQ_TYPE_SPECIAL;
>                 nbd_cmd(&sreq) = NBD_CMD_DISC;
> +
> +               /* Check again after getting mutex back.  */
>                 if (!nbd->sock)
>                         return -EINVAL;
> +
>                 nbd_send_req(nbd, &sreq);
>                  return 0;
>         }
> @@ -627,6 +635,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
>                 nbd_clear_que(nbd);
>                 BUG_ON(!list_empty(&nbd->queue_head));
>                 BUG_ON(!list_empty(&nbd->waiting_queue));
> +               kill_bdev(bdev);
>                 if (file)
>                         fput(file);
>                 return 0;
> @@ -719,6 +728,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
>                 nbd->file = NULL;
>                 nbd_clear_que(nbd);
>                 dev_warn(disk_to_dev(nbd->disk), "queue cleared\n");
> +               kill_bdev(bdev);
>                 queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
>                 if (file)
>                         fput(file);
> --
> 1.7.1
>
>


* Re: [PATCH 3/3] nbd: show read-only state in sysfs
  2013-02-12 16:06 ` [PATCH 3/3] nbd: show read-only state in sysfs Paolo Bonzini
@ 2013-02-12 22:16   ` Paul Clements
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Clements @ 2013-02-12 22:16 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, nbd-general, Andrew Morton

On Tue, Feb 12, 2013 at 11:06 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Pass the read-only flag to set_device_ro, so that it will be
> visible to the block layer and in sysfs.

Looks good


> Cc: <nbd-general@lists.sf.net>
> Cc: Paul Clements <Paul.Clements@steeleye.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  drivers/block/nbd.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index a9c5c7a..ef17c2e 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -703,6 +703,8 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
>
>                 mutex_unlock(&nbd->tx_lock);
>
> +               if (nbd->flags & NBD_FLAG_READ_ONLY)
> +                       set_device_ro(bdev, true);
>                 if (nbd->flags & NBD_FLAG_SEND_TRIM)
>                         queue_flag_set_unlocked(QUEUE_FLAG_DISCARD,
>                                 nbd->disk->queue);
> @@ -730,6 +732,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
>                 dev_warn(disk_to_dev(nbd->disk), "queue cleared\n");
>                 kill_bdev(bdev);
>                 queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
> +               set_device_ro(bdev, false);
>                 if (file)
>                         fput(file);
>                 nbd->flags = 0;
> --
> 1.7.1
>


* Re: [PATCH 1/3] nbd: support FLUSH requests
  2013-02-12 18:06     ` Paolo Bonzini
  2013-02-12 21:32       ` Andrew Morton
@ 2013-02-13  0:00       ` Alex Bligh
  1 sibling, 0 replies; 20+ messages in thread
From: Alex Bligh @ 2013-02-13  0:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alex Bligh, linux-kernel, nbd-general, Paul Clements, Andrew Morton

Paolo,

On 12 Feb 2013, at 18:06, Paolo Bonzini wrote:

> Il 12/02/2013 18:37, Alex Bligh ha scritto:
>> For my education, why remove the FUA stuff?
> 
> Because I had no way to test it.

I think my mods to the official NBD code support FUA (albeit not very efficiently).

>>>>>> Hmmm... the underlying storage could be md/dm RAIDs in which case FUA
>>>>>> should be cheaper than FLUSH.
>>>> 
>>>> If someone ever wrote a virtio-blk backend that sits directly ontop
>>>> of the Linux block layer that would be true.
>> In this case we don't know what the backend is sitting on top of
>> a-priori. It might be the current nbd server code, but it might
>> not be.
> 
> Do you know of any other NBD server than the "official" one and qemu-nbd?


Yes. I know one well (but it's not open source). NBD seems to be the 'go-to protocol' for writing distributed block store drivers in user space.

-- 
Alex Bligh






* Re: [PATCH 1/3] nbd: support FLUSH requests
  2013-02-12 21:32       ` Andrew Morton
@ 2013-02-13  0:03         ` Alex Bligh
  2013-02-13 13:00           ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Bligh @ 2013-02-13  0:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alex Bligh, Paolo Bonzini, linux-kernel, nbd-general, Paul Clements


On 12 Feb 2013, at 21:32, Andrew Morton wrote:

> 
> Obviously the changelog was inadequate.  Please send along a new one
> which fully describes the reasons for this change.

To be clear I have no complaints about the rest of the patch being
merged. Supporting FLUSH but not FUA is far better than supporting
neither. I just didn't understand dropping FUA given the semantics
of nbd is in essence 'linux bios over tcp'.

-- 
Alex Bligh






* Re: [PATCH 1/3] nbd: support FLUSH requests
  2013-02-13  0:03         ` Alex Bligh
@ 2013-02-13 13:00           ` Paolo Bonzini
  2013-02-13 15:55             ` Alex Bligh
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2013-02-13 13:00 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Andrew Morton, linux-kernel, nbd-general, Paul Clements

On 13/02/2013 01:03, Alex Bligh wrote:
>> > Obviously the changelog was inadequate.  Please send along a new one
>> > which fully describes the reasons for this change.
> To be clear I have no complaints about the rest of the patch being
> merged. Supporting FLUSH but not FUA is far better than supporting
> neither. I just didn't understand dropping FUA given the semantics
> of nbd is in essence 'linux bios over tcp'.

Not really bios, since it expects FLUSH requests to include no I/O, but
yes the semantics of NBD (and virtio-blk) are quite close to those of
the Linux block layer.

But as far as I can test with free servers, the FUA bits have no
advantage over flush.  Also, I wasn't sure if SEND_FUA without
SEND_FLUSH is valid, and if so how to handle this combination (treat it
as writethrough and add FUA to all requests? warn and do nothing?).
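One conservative answer to that open question would be to honor FUA only when flush is also advertised. A stand-alone sketch of that policy follows; note that NBD_FLAG_SEND_FUA as bit 3 is an assumption (the merged patch defines only NBD_FLAG_SEND_FLUSH), and the SK_REQ_* constants merely stand in for the kernel's REQ_FLUSH/REQ_FUA bits:

```c
/* NBD_FLAG_SEND_FLUSH matches include/uapi/linux/nbd.h;
 * NBD_FLAG_SEND_FUA is hypothetical (assumed bit 3). */
#define NBD_FLAG_SEND_FLUSH (1 << 2)
#define NBD_FLAG_SEND_FUA   (1 << 3)  /* assumption, not in the patch */

/* Stand-in bit values; the real REQ_FLUSH/REQ_FUA live in the kernel's
 * block headers and their numeric values do not matter for this logic. */
#define SK_REQ_FLUSH (1u << 0)
#define SK_REQ_FUA   (1u << 1)

/* Map server flags to blk_queue_flush()-style capabilities, treating
 * FUA without FLUSH as invalid and ignoring it. */
static unsigned int nbd_flush_caps(unsigned int server_flags)
{
	unsigned int caps = 0;

	if (server_flags & NBD_FLAG_SEND_FLUSH) {
		caps |= SK_REQ_FLUSH;
		if (server_flags & NBD_FLAG_SEND_FUA)
			caps |= SK_REQ_FUA;
	}
	return caps;
}
```

With this policy a server advertising SEND_FUA alone is simply treated as flush-incapable, avoiding the writethrough-emulation question entirely.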

Andrew, here is a better commit message:

------------ 8< ---------------
From: Alex Bligh <alex@alex.org.uk>
Subject: nbd: support FLUSH requests

Currently, the NBD device does not accept flush requests from the Linux
block layer.  If the NBD server opened the target with neither O_SYNC
nor O_DSYNC, however, the device will be effectively backed by a
writeback cache.  Without issuing flushes properly, operation of the NBD
device will not be safe against power losses.

The NBD protocol has support for both a cache flush command and a FUA
command flag; the server will also pass a flag to note its support for
these features.  This patch adds support for the cache flush command and
flag.  In the kernel, we receive the flags via the NBD_SET_FLAGS ioctl,
and map NBD_FLAG_SEND_FLUSH to the argument of blk_queue_flush.  When
the flag is active the block layer will send REQ_FLUSH requests, which
we translate to NBD_CMD_FLUSH commands.

FUA support is not included in this patch because all free software
servers implement it with a full fdatasync; thus it has no advantage
over supporting flush only.  Because I [Paolo] cannot really benchmark
it in a realistic scenario, I cannot tell if it is a good idea or not.
It is also not clear if it is valid for an NBD server to support FUA but
not flush.  The Linux block layer gives a warning for this combination;
the NBD protocol documentation says nothing about it.

The patch also fixes a small problem in the handling of flags:
nbd->flags must be cleared at the end of NBD_DO_IT, but the driver was
not doing that.  The bug manifests itself as follows.  Suppose you use two
different client/server pairs to start the NBD device.  Suppose also
that the first client supports NBD_SET_FLAGS, and the first server sends
NBD_FLAG_SEND_FLUSH; the second pair instead does neither of these two
things.  Before this patch, the second invocation of NBD_DO_IT will use
a stale value of nbd->flags, and the second server will issue an error
every time it receives an NBD_CMD_FLUSH command.

This bug is pre-existing, but it becomes much more important after this
patch; flush failures make the device pretty much unusable, unlike
discard failures.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

Paolo


* Re: [PATCH 2/3] nbd: fsync and kill block device on shutdown
  2013-02-12 21:41   ` Andrew Morton
@ 2013-02-13 13:05     ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2013-02-13 13:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, stable, nbd-general, Paul Clements

On 12/02/2013 22:41, Andrew Morton wrote:
>> > There are two problems with shutdown in the NBD driver.  The first is
>> > that receiving the NBD_DISCONNECT ioctl does not sync the filesystem;
>> > this is useful because BLKFLSBUF is restricted to processes that have
>> > CAP_SYS_ADMIN, and the NBD client may not possess it (fsync of the
>> > block device does not sync the filesystem, either).
> hm, this says that the lack of a sync is "useful".  I think you mean
> that the patch-which-adds-the-sync is the thing which is useful, yes?

Yes.

>> > The second is that once we clear the socket we have no guarantee that
>> > later reads will come from the same backing storage.  Thus the page cache
>> > must be cleaned, lest reads that hit on the page cache will return stale
>> > data from the previously-accessible disk.
> That sounds like a problem.
> 
>> > Example:
>> > 
>> >     # qemu-nbd -r -c/dev/nbd0 /dev/sr0
>> >     # file -s /dev/nbd0
>> >     /dev/stdin: # UDF filesystem data (version 1.5) etc.
>> >     # qemu-nbd -d /dev/nbd0
>> >     # qemu-nbd -r -c/dev/nbd0 /dev/sda
>> >     # file -s /dev/nbd0
>> >     /dev/stdin: # UDF filesystem data (version 1.5) etc.
>> > 
>> > While /dev/sda has:
>> > 
>> >     # file -s /dev/sda
>> >     /dev/sda: x86 boot sector; etc.
> OK, we've described the problems but there's no description here of how
> the patch addresses those problems.
> 
> How does this look?

Perfect, thanks very much.  I tried to similarly balance the "why" and
"how" in the new commit message for patch 1.

Paolo


* Re: [PATCH 1/3] nbd: support FLUSH requests
  2013-02-13 13:00           ` Paolo Bonzini
@ 2013-02-13 15:55             ` Alex Bligh
  2013-02-13 16:02               ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Bligh @ 2013-02-13 15:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alex Bligh, Andrew Morton, linux-kernel, nbd-general, Paul Clements

Paolo,

On 13 Feb 2013, at 13:00, Paolo Bonzini wrote:

> But as far as I can test with free servers, the FUA bits have no
> advantage over flush.  Also, I wasn't sure if SEND_FUA without
> SEND_FLUSH is valid, and if so how to handle this combination (treat it
> as writethrough and add FUA to all requests? warn and do nothing?).

On the main opensource nbd client, the following applies:

What REQ_FUA does is an fdatasync() after the write. Code extract and
comments below from Christoph Hellwig.

What REQ_FLUSH does is to do an fsync().

The way I read Christoph's comment, provided the Linux block layer always
issues a REQ_FLUSH before a REQ_FUA, there is no performance problem.

However, a REQ_FUA is going to do a f(data)?sync AFTER the write, whereas
the preceding REQ_FLUSH is going to do an fsync() BEFORE the write. It seems
to me that either the FUA and FLUSH semantics are therefore different
(and we need FUA), or that Christoph's comment is wrong and that you
are guaranteed a REQ_FLUSH *after* the write with REQ_FUA.

-- 
Alex Bligh

        } else if (fua) {

          /* This is where we would do the following
           *   #ifdef USE_SYNC_FILE_RANGE
           * However, we don't, for the reasons set out below
           * by Christoph Hellwig <hch@infradead.org>
           *
           * [BEGINS] 
           * fdatasync is equivalent to fsync except that it does not flush
           * non-essential metadata (basically just timestamps in practice), but it
           * does flush metadata required to find the data again, e.g. allocation
           * information and extent maps.  sync_file_range does nothing but flush
           * out pagecache content - it means you basically won't get your data
           * back in case of a crash if you either:
           * 
           *  a) have a volatile write cache in your disk (e.g. any normal SATA disk)
           *  b) are using a sparse file on a filesystem
           *  c) are using a fallocate-preallocated file on a filesystem
           *  d) use any file on a COW filesystem like btrfs
           * 
           * e.g. it only does anything useful for you if you do not have a volatile
           * write cache, and either use a raw block device node, or just overwrite
           * an already fully allocated (and not preallocated) file on a non-COW
           * filesystem.
           * [ENDS]
           *
           * What we should do is open a second FD with O_DSYNC set, then write to
           * that when appropriate. However, with a Linux client, every REQ_FUA
           * immediately follows a REQ_FLUSH, so fdatasync does not cause performance
           * problems.
           *
           */
#if 0
                sync_file_range(fhandle, foffset, len,
                                SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
                                SYNC_FILE_RANGE_WAIT_AFTER);
#else
                fdatasync(fhandle);
#endif
        }




* Re: [PATCH 1/3] nbd: support FLUSH requests
  2013-02-13 15:55             ` Alex Bligh
@ 2013-02-13 16:02               ` Paolo Bonzini
  2013-02-13 17:35                 ` Alex Bligh
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2013-02-13 16:02 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Andrew Morton, linux-kernel, nbd-general, Paul Clements

On 13/02/2013 16:55, Alex Bligh wrote:
>> > But as far as I can test with free servers, the FUA bits have no
>> > advantage over flush.  Also, I wasn't sure if SEND_FUA without
>> > SEND_FLUSH is valid, and if so how to handle this combination (treat it
>> > as writethrough and add FUA to all requests? warn and do nothing?).
> On the main opensource nbd client, the following applies:
> 
> What REQ_FUA does is an fdatasync() after the write. Code extract and
> comments below from Christoph Hellwig.
> 
> What REQ_FLUSH does is to do an fsync().
> 
> The way I read Christoph's comment, provided the Linux block layer always
> issues a REQ_FLUSH before a REQ_FUA, there is no performance problem.
> 
> However, a REQ_FUA is going to do a f(data)?sync AFTER the write, whereas
> the preceding REQ_FLUSH is going to do an fsync() BEFORE the write. It seems
> to me that either the FUA and FLUSH semantics are therefore different
> (and we need FUA), or that Christoph's comment is wrong and that you
> are guaranteed a REQ_FLUSH *after* the write with REQ_FUA.

REQ_FLUSH is indeed a flush before the write.  fdatasync is fine there too.

If you do not have REQ_FUA, as is the case with this patch, the block
layer converts it to a REQ_FLUSH *after* the write.

See block/blk-flush.c:

 * REQ_{FLUSH|FUA} requests are decomposed to sequences consisted of three
 * optional steps - PREFLUSH, DATA and POSTFLUSH - according to the request
 * properties and hardware capability.
 *
 * If the device doesn't have writeback cache, FLUSH and FUA don't make any
 * difference.  The requests are either completed immediately if there's no
 * data or executed as normal requests otherwise.
 *
 * If the device has writeback cache and supports FUA, REQ_FLUSH is
 * translated to PREFLUSH but REQ_FUA is passed down directly with DATA.
 *
 * If the device has writeback cache and doesn't support FUA, REQ_FLUSH is
 * translated to PREFLUSH and REQ_FUA to POSTFLUSH.

Paolo


* Re: [Nbd] [PATCH 0/3] NBD fixes for caching and block device flags
  2013-02-12 21:43 ` [PATCH 0/3] NBD fixes for caching and block device flags Andrew Morton
@ 2013-02-13 17:14   ` Wouter Verhelst
  0 siblings, 0 replies; 20+ messages in thread
From: Wouter Verhelst @ 2013-02-13 17:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Paolo Bonzini, nbd-general, Paul Clements, linux-kernel

On Tue, Feb 12, 2013 at 01:43:49PM -0800, Andrew Morton wrote:
> On Tue, 12 Feb 2013 17:06:08 +0100
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> > The first two patches improve the behavior of NBD with respect to the host
> > cache---respectively the server's cache and the client's.  The first adds
> > support for flushing the backing storage, so that NBD is safe against
> > power losses.  The second properly syncs and cleans the client's page
> > cache when an NBD device is disconnected from the server.
> > 
> > The third reports read-only devices properly in sysfs.
> > 
> > Ok for 3.9?  I saw the last NBD patches were applied by Andrew Morton,
> > so I'm CCing him; Paul, please ack them if they are okay.
> 
> I grabbed these and will await Paul's feedback, and a bit more
> changelog detail for [1/3], please.

While we're at it, I'd like to see some updates to the horribly outdated
Documentation/blockdev/nbd.txt, and to see this list (nbd-general)
linked to from the MAINTAINERS file. I'll send a proposed patch shortly.

-- 
Copyshops should do vouchers. So that next time some bureaucracy requires you
to mail a form in triplicate, you can mail it just once, add a voucher, and
save on postage.


* Re: [PATCH 1/3] nbd: support FLUSH requests
  2013-02-13 16:02               ` Paolo Bonzini
@ 2013-02-13 17:35                 ` Alex Bligh
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Bligh @ 2013-02-13 17:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alex Bligh, Andrew Morton, linux-kernel, nbd-general, Paul Clements


On 13 Feb 2013, at 16:02, Paolo Bonzini wrote:

> If you do not have REQ_FUA, as is the case with this patch, the block
> layer converts it to a REQ_FLUSH *after* the write.

OK. Well, +/- it converting an fdatasync() into a full fsync(), that
should be ok.

-- 
Alex Bligh

end of thread, other threads:[~2013-02-13 17:43 UTC | newest]

Thread overview: 20+ messages
2013-02-12 16:06 [PATCH 0/3] NBD fixes for caching and block device flags Paolo Bonzini
2013-02-12 16:06 ` [PATCH 1/3] nbd: support FLUSH requests Paolo Bonzini
2013-02-12 17:37   ` Alex Bligh
2013-02-12 18:06     ` Paolo Bonzini
2013-02-12 21:32       ` Andrew Morton
2013-02-13  0:03         ` Alex Bligh
2013-02-13 13:00           ` Paolo Bonzini
2013-02-13 15:55             ` Alex Bligh
2013-02-13 16:02               ` Paolo Bonzini
2013-02-13 17:35                 ` Alex Bligh
2013-02-13  0:00       ` Alex Bligh
2013-02-12 22:07   ` Paul Clements
2013-02-12 16:06 ` [PATCH 2/3] nbd: fsync and kill block device on shutdown Paolo Bonzini
2013-02-12 21:41   ` Andrew Morton
2013-02-13 13:05     ` Paolo Bonzini
2013-02-12 22:15   ` Paul Clements
2013-02-12 16:06 ` [PATCH 3/3] nbd: show read-only state in sysfs Paolo Bonzini
2013-02-12 22:16   ` Paul Clements
2013-02-12 21:43 ` [PATCH 0/3] NBD fixes for caching and block device flags Andrew Morton
2013-02-13 17:14   ` [Nbd] " Wouter Verhelst
