All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Bligh <alex@alex.org.uk>
To: Christoph Hellwig <hch@infradead.org>
Cc: Alex Bligh <alex@alex.org.uk>, Wouter Verhelst <w@uter.be>,
	Josef Bacik <jbacik@fb.com>,
	"nbd-general@lists.sourceforge.net"
	<nbd-general@lists.sourceforge.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-block@vger.kernel.org, Markus Pargmann <mpa@pengutronix.de>,
	kernel-team@fb.com
Subject: Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
Date: Thu, 15 Sep 2016 13:04:27 +0100	[thread overview]
Message-ID: <E19F1FFC-3112-40EE-8935-306CC5B039C2@alex.org.uk> (raw)
In-Reply-To: <20160915115217.GB6411@infradead.org>


> On 15 Sep 2016, at 12:52, Christoph Hellwig <hch@infradead.org> wrote:
>=20
> On Thu, Sep 15, 2016 at 12:46:07PM +0100, Alex Bligh wrote:
>> Essentially NBD does supports FLUSH/FUA like this:
>>=20
>> =
https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt=

>>=20
>> IE supports the same FLUSH/FUA primitives as other block drivers =
(AIUI).
>>=20
>> Link to protocol (per last email) here:
>>=20
>> =
https://github.com/yoe/nbd/blob/master/doc/proto.md#ordering-of-messages-a=
nd-writes
>=20
> Flush as defined by the Linux block layer (and supported that way in
> SCSI, ATA, NVMe) only requires to flush all already completed writes
> to non-volatile media.  It does not impose any ordering unlike the
> nbd spec.

As maintainer of the NBD spec, I'm confused as to why you think it
imposes any ordering - if you think this, clearly I need to clean up
the wording.

Here's what it says:

> The server MAY process commands out of order, and MAY reply out of =
order,
> except that:
>=20
> 	=E2=80=A2 All write commands (that includes NBD_CMD_WRITE, and =
NBD_CMD_TRIM)
> that the server completes (i.e. replies to) prior to processing to a
> NBD_CMD_FLUSH MUST be written to non-volatile storage prior to =
replying to that
> NBD_CMD_FLUSH. This paragraph only applies if NBD_FLAG_SEND_FLUSH is =
set within
> the transmission flags, as otherwise NBD_CMD_FLUSH will never be sent =
by the
> client to the server.

(and another bit re FUA that isn't relevant here).

Here's the Linux Kernel documentation:

> The REQ_PREFLUSH flag can be OR ed into the r/w flags of a bio =
submitted from
> the filesystem and will make sure the volatile cache of the storage =
device
> has been flushed before the actual I/O operation is started.  This =
explicitly
> guarantees that previously completed write requests are on =
non-volatile
> storage before the flagged bio starts. In addition the REQ_PREFLUSH =
flag can be
> set on an otherwise empty bio structure, which causes only an explicit =
cache
> flush without any dependent I/O.  It is recommend to use
> the blkdev_issue_flush() helper for a pure cache flush.


I believe that NBD treats NBD_CMD_FLUSH the same as a REQ_PREFLUSH and =
empty
bio.

If you don't read those two as compatible, I'd like to understand why =
not
(i.e. what additional constraints one is applying that the other is not)
as they are meant to be the same (save that NBD only has FLUSH as a =
command,
i.e. the 'empty bio' version). I am happy to improve the docs to make it
clearer.

(sidenote: I am interested in the change from REQ_FLUSH to REQ_PREFLUSH,
but in an empty bio it's not really relevant I think).

> FUA as defined by the Linux block layer (and supported that way in =
SCSI,
> ATA, NVMe) only requires the write operation the FUA bit is set on to =
be
> on non-volatile media before completing the write operation.  It does
> not impose any ordering, which seems to match the nbd spec.  Unlike =
the
> NBD spec Linux does not allow FUA to be set on anything by WRITE
> commands.  Some other storage protocols allow a FUA bit on READ
> commands or other commands that write data to the device, though.

I think you mean "anything *but* WRITE commands". In NBD setting
FUA on a command that does not write will do nothing, but FUA can
be set on NBD_CMD_TRIM and has the expected effect.

Interestingly the kernel docs are silent on which commands REQ_FUA
can be set on.

--=20
Alex Bligh





WARNING: multiple messages have this Message-ID (diff)
From: Alex Bligh <alex@alex.org.uk>
To: Christoph Hellwig <hch@infradead.org>
Cc: Alex Bligh <alex@alex.org.uk>, Wouter Verhelst <w@uter.be>,
	Josef Bacik <jbacik@fb.com>,
	"nbd-general@lists.sourceforge.net" 
	<nbd-general@lists.sourceforge.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-block@vger.kernel.org, Markus Pargmann <mpa@pengutronix.de>,
	kernel-team@fb.com
Subject: Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
Date: Thu, 15 Sep 2016 13:04:27 +0100	[thread overview]
Message-ID: <E19F1FFC-3112-40EE-8935-306CC5B039C2@alex.org.uk> (raw)
In-Reply-To: <20160915115217.GB6411@infradead.org>


> On 15 Sep 2016, at 12:52, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Thu, Sep 15, 2016 at 12:46:07PM +0100, Alex Bligh wrote:
>> Essentially NBD does supports FLUSH/FUA like this:
>> 
>> https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt
>> 
>> IE supports the same FLUSH/FUA primitives as other block drivers (AIUI).
>> 
>> Link to protocol (per last email) here:
>> 
>> https://github.com/yoe/nbd/blob/master/doc/proto.md#ordering-of-messages-and-writes
> 
> Flush as defined by the Linux block layer (and supported that way in
> SCSI, ATA, NVMe) only requires to flush all already completed writes
> to non-volatile media.  It does not impose any ordering unlike the
> nbd spec.

As maintainer of the NBD spec, I'm confused as to why you think it
imposes any ordering - if you think this, clearly I need to clean up
the wording.

Here's what it says:

> The server MAY process commands out of order, and MAY reply out of order,
> except that:
> 
> 	• All write commands (that includes NBD_CMD_WRITE, and NBD_CMD_TRIM)
> that the server completes (i.e. replies to) prior to processing to a
> NBD_CMD_FLUSH MUST be written to non-volatile storage prior to replying to that
> NBD_CMD_FLUSH. This paragraph only applies if NBD_FLAG_SEND_FLUSH is set within
> the transmission flags, as otherwise NBD_CMD_FLUSH will never be sent by the
> client to the server.

(and another bit re FUA that isn't relevant here).

Here's the Linux Kernel documentation:

> The REQ_PREFLUSH flag can be OR ed into the r/w flags of a bio submitted from
> the filesystem and will make sure the volatile cache of the storage device
> has been flushed before the actual I/O operation is started.  This explicitly
> guarantees that previously completed write requests are on non-volatile
> storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be
> set on an otherwise empty bio structure, which causes only an explicit cache
> flush without any dependent I/O.  It is recommend to use
> the blkdev_issue_flush() helper for a pure cache flush.


I believe that NBD treats NBD_CMD_FLUSH the same as a REQ_PREFLUSH and empty
bio.

If you don't read those two as compatible, I'd like to understand why not
(i.e. what additional constraints one is applying that the other is not)
as they are meant to be the same (save that NBD only has FLUSH as a command,
i.e. the 'empty bio' version). I am happy to improve the docs to make it
clearer.

(sidenote: I am interested in the change from REQ_FLUSH to REQ_PREFLUSH,
but in an empty bio it's not really relevant I think).

> FUA as defined by the Linux block layer (and supported that way in SCSI,
> ATA, NVMe) only requires the write operation the FUA bit is set on to be
> on non-volatile media before completing the write operation.  It does
> not impose any ordering, which seems to match the nbd spec.  Unlike the
> NBD spec Linux does not allow FUA to be set on anything by WRITE
> commands.  Some other storage protocols allow a FUA bit on READ
> commands or other commands that write data to the device, though.

I think you mean "anything *but* WRITE commands". In NBD setting
FUA on a command that does not write will do nothing, but FUA can
be set on NBD_CMD_TRIM and has the expected effect.

Interestingly the kernel docs are silent on which commands REQ_FUA
can be set on.

-- 
Alex Bligh

  parent reply	other threads:[~2016-09-15 12:04 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-08 21:12 [RESEND][PATCH 0/5] nbd improvements Josef Bacik
2016-09-08 21:12 ` [PATCH 1/5] nbd: convert to blkmq Josef Bacik
2016-09-08 21:12 ` [PATCH 2/5] nbd: don't shutdown sock with irq's disabled Josef Bacik
2016-09-08 21:12 ` [PATCH 3/5] nbd: use flags instead of bool Josef Bacik
2016-09-09  1:20   ` Joe Perches
2016-09-09 13:55     ` Jens Axboe
2016-09-09 16:04       ` Joe Perches
2016-09-09 16:04         ` Joe Perches
2016-09-09 16:11         ` Jens Axboe
2016-09-09 16:15           ` Joe Perches
2016-09-09 16:20             ` Jens Axboe
2016-09-09 16:20               ` Jens Axboe
2016-09-08 21:12 ` [PATCH 4/5] nbd: allow block mq to deal with timeouts Josef Bacik
2016-09-08 21:12 ` [PATCH 5/5] nbd: add multi-connection support Josef Bacik
2016-09-10  7:43   ` Christoph Hellwig
2016-09-12 13:11     ` Josef Bacik
2016-09-09 20:02 ` [Nbd] [RESEND][PATCH 0/5] nbd improvements Wouter Verhelst
2016-09-09 20:36   ` Josef Bacik
2016-09-09 20:55     ` Wouter Verhelst
2016-09-09 23:00       ` Josef Bacik
2016-09-09 23:37         ` Jens Axboe
2016-09-15 10:49   ` Wouter Verhelst
2016-09-15 11:09     ` Alex Bligh
2016-09-15 11:09       ` Alex Bligh
2016-09-15 11:29       ` Wouter Verhelst
2016-09-15 11:29         ` Wouter Verhelst
2016-09-15 11:40         ` Christoph Hellwig
2016-09-15 11:40           ` Christoph Hellwig
2016-09-15 11:46           ` Alex Bligh
2016-09-15 11:46             ` Alex Bligh
2016-09-15 11:52             ` Christoph Hellwig
2016-09-15 11:52               ` Christoph Hellwig
2016-09-15 12:01               ` Wouter Verhelst
2016-09-15 12:01                 ` Wouter Verhelst
2016-09-15 12:20                 ` Christoph Hellwig
2016-09-15 12:20                   ` Christoph Hellwig
2016-09-15 12:26                   ` Wouter Verhelst
2016-09-15 12:26                     ` Wouter Verhelst
2016-09-15 12:27                     ` Christoph Hellwig
2016-09-15 12:27                       ` Christoph Hellwig
2016-09-15 12:04               ` Alex Bligh [this message]
2016-09-15 12:04                 ` Alex Bligh
2016-09-15 11:39       ` Christoph Hellwig
2016-09-15 11:39         ` Christoph Hellwig
2016-09-15 13:34       ` Eric Blake
2016-09-15 13:34         ` Eric Blake
2016-09-15 14:07         ` Paolo Bonzini
2016-09-15 14:07           ` Paolo Bonzini
2016-09-15 15:23           ` Alex Bligh
2016-09-15 15:23             ` Alex Bligh
2016-09-15 21:10             ` Paolo Bonzini
2016-09-15 21:10               ` Paolo Bonzini
2016-09-15 15:25         ` Alex Bligh
2016-09-15 15:25           ` Alex Bligh
2016-09-15 11:38     ` Christoph Hellwig
2016-09-15 11:43       ` Alex Bligh
2016-09-15 11:43         ` Alex Bligh
2016-09-15 11:46         ` Christoph Hellwig
2016-09-15 11:46           ` Christoph Hellwig
2016-09-15 11:56           ` Alex Bligh
2016-09-15 11:56             ` Alex Bligh
2016-09-15 11:55       ` Wouter Verhelst
2016-09-15 12:01         ` Christoph Hellwig
2016-09-15 12:11           ` Alex Bligh
2016-09-15 12:11             ` Alex Bligh
2016-09-15 12:18             ` Christoph Hellwig
2016-09-15 12:18               ` Christoph Hellwig
2016-09-15 12:28               ` Alex Bligh
2016-09-15 12:28                 ` Alex Bligh
2016-09-15 12:21           ` Wouter Verhelst
2016-09-15 12:23             ` Christoph Hellwig
2016-09-15 12:33               ` Alex Bligh
2016-09-15 12:33                 ` Alex Bligh
2016-09-15 12:36                 ` Christoph Hellwig
2016-09-15 12:36                   ` Christoph Hellwig
2016-09-15 12:39                   ` Alex Bligh
2016-09-15 12:39                     ` Alex Bligh
2016-09-15 12:41                     ` Christoph Hellwig
2016-09-15 12:41                       ` Christoph Hellwig
2016-09-15 12:44                       ` Alex Bligh
2016-09-15 12:44                         ` Alex Bligh
2016-09-15 13:17                         ` Wouter Verhelst
2016-09-15 13:17                           ` Wouter Verhelst
2016-09-15 13:57                           ` Josef Bacik
2016-09-15 13:57                             ` Josef Bacik
2016-09-15 15:17                             ` Alex Bligh
2016-09-15 15:17                               ` Alex Bligh
2016-09-15 16:08                           ` Alex Bligh
2016-09-15 16:08                             ` Alex Bligh
2016-09-15 16:27                             ` Wouter Verhelst
2016-09-15 16:27                               ` Wouter Verhelst
2016-09-15 16:42                               ` Alex Bligh
2016-09-15 16:42                                 ` Alex Bligh
2016-09-15 19:02                               ` Eric Blake
2016-09-15 19:02                                 ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=E19F1FFC-3112-40EE-8935-306CC5B039C2@alex.org.uk \
    --to=alex@alex.org.uk \
    --cc=hch@infradead.org \
    --cc=jbacik@fb.com \
    --cc=kernel-team@fb.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpa@pengutronix.de \
    --cc=nbd-general@lists.sourceforge.net \
    --cc=w@uter.be \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.