All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
@ 2016-03-31 19:33 Eric Blake
  2016-03-31 19:41 ` [Qemu-devel] [Nbd] " Alex Bligh
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2016-03-31 19:33 UTC (permalink / raw)
  To: qemu-devel, nbd-general

[-- Attachment #1: Type: text/plain, Size: 649 bytes --]

Qemu's nbd-client is setting NBD_CMD_FLAG_FUA during a flush command,
but the official NBD protocol documentation doesn't describe this as
valid (it merely states that flush must not have a reply until all
acknowledged writes have hit permanent storage).  Does this flag make
sense (what semantics would the flag add, and we need to fix the NBD
docs as well as relax the reference implementation to allow the flag),
or is it a bug in qemu (and the recent tightening of NBD to throw EINVAL
on unsupported flags will trip up qemu)?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-03-31 19:33 [Qemu-devel] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH? Eric Blake
@ 2016-03-31 19:41 ` Alex Bligh
  2016-03-31 19:54   ` Eric Blake
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Bligh @ 2016-03-31 19:41 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd-general, qemu-devel, Alex Bligh

[-- Attachment #1: Type: text/plain, Size: 1883 bytes --]


On 31 Mar 2016, at 20:33, Eric Blake <eblake@redhat.com> wrote:

> Qemu's nbd-client is setting NBD_CMD_FLAG_FUA during a flush command,
> but the official NBD protocol documentation doesn't describe this as
> valid (it merely states that flush must not have a reply until all
> acknowledged writes have hit permanent storage).  Does this flag make
> sense (what semantics would the flag add, and we need to fix the NBD
> docs as well as relax the reference implementation to allow the flag),
> or is it a bug in qemu (and the recent tightening of NBD to throw EINVAL
> on unsupported flags will trip up qemu)?

As the original author of that particular mess, the intent was that
they should reflect exactly the Linux kernel's semantics for FLUSH
and FUA, not only in terms of whether they can be used together,
but also exactly what they mean.

This turned out to be an easier way of describing the operations
than describing them semantically (in particular FLUSH, where I
couldn't get an entirely consistent answer of what it required
of inflight requests, specifically whether it required all
requests inflight at the time of making the request to be written
to disk prior to answering, or all requests inflight prior to the
time of replying to be written to disk prior to answering, though
I believe the former).

FUA just requires that particular request to be persisted to
disk, and does not require other requests to be persisted to disk

So in answer to your question, my understanding is that FLUSH requires
(some subset) of otherwise potentially non-persisted requests to
be persisted to disk. In that sense it implies FUA. It is permitted
to set FUA (as it is permitted, I believe, in the linux block layer)
but it will make no difference.

I once thought FUA on read should bypass any local read cache, though
that is not part of the spec currently.

--
Alex Bligh





[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-03-31 19:41 ` [Qemu-devel] [Nbd] " Alex Bligh
@ 2016-03-31 19:54   ` Eric Blake
  2016-03-31 20:17     ` Alex Bligh
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2016-03-31 19:54 UTC (permalink / raw)
  To: Alex Bligh; +Cc: nbd-general, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3722 bytes --]

On 03/31/2016 01:41 PM, Alex Bligh wrote:
> 
> On 31 Mar 2016, at 20:33, Eric Blake <eblake@redhat.com> wrote:
> 
>> Qemu's nbd-client is setting NBD_CMD_FLAG_FUA during a flush command,
>> but the official NBD protocol documentation doesn't describe this as
>> valid (it merely states that flush must not have a reply until all
>> acknowledged writes have hit permanent storage).  Does this flag make
>> sense (what semantics would the flag add, and we need to fix the NBD
>> docs as well as relax the reference implementation to allow the flag),
>> or is it a bug in qemu (and the recent tightening of NBD to throw EINVAL
>> on unsupported flags will trip up qemu)?
> 
> As the original author of that particular mess, the intent was that
> they should reflect exactly the Linux kernel's semantics for FLUSH
> and FUA, not only in terms of whether they can be used together,
> but also exactly what they mean.

Oh, and I also just found that qemu's nbd-server tries to honor FUA on
read, even though the protocol doesn't document that as valid either.

> 
> This turned out to be an easier way of describing the operations
> than describing them semantically (in particular FLUSH, where I
> couldn't get an entirely consistent answer of what it required
> of inflight requests, specifically whether it required all
> requests inflight at the time of making the request to be written
> to disk prior to answering, or all requests inflight prior to the
> time of replying to be written to disk prior to answering, though
> I believe the former).
> 
> FUA just requires that particular request to be persisted to
> disk, and does not require other requests to be persisted to disk

As written, NBD says that FUA requires the current write operation to
land on disk (but says nothing about any other writes, whether those
writes had an early reply).  And for flush, NBD only requires that all
writes that have _sent_ their reply to the client must land on disk, but
this can certainly be a smaller set of write requests than _all_ writes
issued prior to that point in time.  So maybe flush+FUA is a valid thing
to support, and means that ALL in-flight writes must land, whether or
not a reply has been sent to the client, for an even stronger barrier?

> So in answer to your question, my understanding is that FLUSH requires
> (some subset) of otherwise potentially non-persisted requests to
> be persisted to disk. In that sense it implies FUA. It is permitted
> to set FUA (as it is permitted, I believe, in the linux block layer)
> but it will make no difference.
> 
> I once thought FUA on read should bypass any local read cache, though
> that is not part of the spec currently.

In qemu, read+FUA just triggers blk_co_flush() prior to reading; but
that's the same function it calls for write+FUA.  And for flush (whether
or not FUA was specified), qemu still calls blk_co_flush().  So from
qemu's perspective, FUA is synonymous with "finish ALL pending
transactions", which is stronger than what the NBD protocol requires.
(Nothing wrong with an implementation doing more work than required,
although it may be less efficient).  Alas, that means I can't use qemu's
behavior as a good reference for how to improve the NBD spec.

Meanwhile, it sounds like FUA is valid on read, write, AND flush
(because the kernel supports all three), even if we aren't quite sure
what to document of those flags.  And that means qemu is correct, and
the NBD protocol has a bug.  Since you contributed the FUA flag, is that
something you can try to improve?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-03-31 19:54   ` Eric Blake
@ 2016-03-31 20:17     ` Alex Bligh
  2016-03-31 20:34       ` Eric Blake
  2016-04-01  7:43       ` Paolo Bonzini
  0 siblings, 2 replies; 24+ messages in thread
From: Alex Bligh @ 2016-03-31 20:17 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd-general, qemu-devel, Alex Bligh

[-- Attachment #1: Type: text/plain, Size: 4846 bytes --]


On 31 Mar 2016, at 20:54, Eric Blake <eblake@redhat.com> wrote:
> 
> Oh, and I also just found that qemu's nbd-server tries to honor FUA on
> read, even though the protocol doesn't document that as valid either.

Potentially useful, but I believe not required (I don't believe the
kernel does that, and I *believe* qemu's block layer does the
same as the kernel).

>> This turned out to be an easier way of describing the operations
>> than describing them semantically (in particular FLUSH, where I
>> couldn't get an entirely consistent answer of what it required
>> of inflight requests, specifically whether it required all
>> requests inflight at the time of making the request to be written
>> to disk prior to answering, or all requests inflight prior to the
>> time of replying to be written to disk prior to answering, though
>> I believe the former).
>> 
>> FUA just requires that particular request to be persisted to
>> disk, and does not require other requests to be persisted to disk
> 
> As written, NBD says that FUA requires the current write operation to
> land on disk (but says nothing about any other writes, whether those
> writes had an early reply).

That is my understanding.

>  And for flush, NBD only requires that all
> writes that have _sent_ their reply to the client must land on disk, but
> this can certainly be a smaller set of write requests than _all_ writes
> issued prior to that point in time.  So maybe flush+FUA is a valid thing
> to support, and means that ALL in-flight writes must land, whether or
> not a reply has been sent to the client, for an even stronger barrier?

OK so I actually went and researched what my answer was last time I
was asked ( :-) ):

Here was my conclusion last time after trawling through lkml
on the subject:

From https://sourceforge.net/p/nbd/mailman/message/27569820/

> You may process commands out of order, and reply out of order,
> save that
> a) all write commands *completed* before you process a REQ_FLUSH
>  must be written to non-volatile storage prior to completing
>  that REQ_FLUSH (though apparently you should, if possible, make
>  this true for all write commands *received*, which is a stronger
>  condition) [Ignore this if you don't set SEND_REQ_FLUSH]
> b) a REQ_FUA flagged write must not complete until its payload
>  is written to non-volatile storage [ignore this if you don't
>  set SEND_REQ_FUA]
> 


Perhaps it would be good for that to actually go in the docs!

I don't think we need a 'stronger barrier' as the client can
implement that itself merely by waiting for all commands to
complete prior to sending FLUSH.

Incidentally, last time I looked, the linux kernel always sent
a FLUSH immediately after any bio marked FUA. Does qemu use
more interesting behavioural modes?

>> So in answer to your question, my understanding is that FLUSH requires
>> (some subset) of otherwise potentially non-persisted requests to
>> be persisted to disk. In that sense it implies FUA. It is permitted
>> to set FUA (as it is permitted, I believe, in the linux block layer)
>> but it will make no difference.
>> 
>> I once thought FUA on read should bypass any local read cache, though
>> that is not part of the spec currently.
> 
> In qemu, read+FUA just triggers blk_co_flush() prior to reading; but
> that's the same function it calls for write+FUA.

That's harmless, but unnecessary in the sense that current documented
behaviour doesn't require it. Perhaps it should?

I suppose TRIM etc. should support FUA too?

>  And for flush (whether
> or not FUA was specified), qemu still calls blk_co_flush().  So from
> qemu's perspective, FUA is synonymous with "finish ALL pending
> transactions", which is stronger than what the NBD protocol requires.
> (Nothing wrong with an implementation doing more work than required,
> although it may be less efficient).  Alas, that means I can't use qemu's
> behavior as a good reference for how to improve the NBD spec.
> 
> Meanwhile, it sounds like FUA is valid on read, write, AND flush
> (because the kernel supports all three),

Do you have a pointer to what FUA means on kernel reads? Does it
mean "force unit access for the read" or does it mean "flush any
write for that block first"? The first is subtly different if the
file is remote and being accessed by multiple people (e.g. NFS, Ceph etc.)

> even if we aren't quite sure
> what to document of those flags.  And that means qemu is correct, and
> the NBD protocol has a bug.  Since you contributed the FUA flag, is that
> something you can try to improve?

Yeah. My mess so I should clean it up. I think FUA should be valid
on essentially everything.

I think I might wait until structured replies is in though!

--
Alex Bligh





[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-03-31 20:17     ` Alex Bligh
@ 2016-03-31 20:34       ` Eric Blake
  2016-04-01  7:49         ` Paolo Bonzini
                           ` (2 more replies)
  2016-04-01  7:43       ` Paolo Bonzini
  1 sibling, 3 replies; 24+ messages in thread
From: Eric Blake @ 2016-03-31 20:34 UTC (permalink / raw)
  To: Alex Bligh; +Cc: nbd-general, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4829 bytes --]

On 03/31/2016 02:17 PM, Alex Bligh wrote:
> OK so I actually went and researched what my answer was last time I
> was asked ( :-) ):
> 
> Here was my conclusion last time after trawling through lkml
> on the subject:
> 
> From https://sourceforge.net/p/nbd/mailman/message/27569820/
> 
>> You may process commands out of order, and reply out of order,
>> save that
>> a) all write commands *completed* before you process a REQ_FLUSH
>>  must be written to non-volatile storage prior to completing
>>  that REQ_FLUSH (though apparently you should, if possible, make
>>  this true for all write commands *received*, which is a stronger
>>  condition) [Ignore this if you don't set SEND_REQ_FLUSH]
>> b) a REQ_FUA flagged write must not complete until its payload
>>  is written to non-volatile storage [ignore this if you don't
>>  set SEND_REQ_FUA]
>>
> 
> 
> Perhaps it would be good for that to actually go in the docs!

Indeed.

> 
> I don't think we need a 'stronger barrier' as the client can
> implement that itself merely by waiting for all commands to
> complete prior to sending FLUSH.
> 
> Incidentally, last time I looked, the linux kernel always sent
> a FLUSH immediately after any bio marked FUA. Does qemu use
> more interesting behavioural modes?

I'm just learning the qemu nbd code myself, so I don't have a good
answer, other than what I wrote before:

>>
>> In qemu, read+FUA just triggers blk_co_flush() prior to reading; but
>> that's the same function it calls for write+FUA.
> 
> That's harmless, but unnecessary in the sense that current documented
> behaviour doesn't require it. Perhaps it should?

It _is_ a reasonable semantic - it means you are guaranteed that what
YOU read will match what anyone ELSE reads (without waiting for writes
to land, what YOU read SHOULD favor what is sitting in pending writes,
while what others read may be stale on-disk data about to be overwritten
by pending writes).  And while I'm a bit fuzzy on the POSIX semantics of
O_SYNC and O_DSYNC with open(), and on sync() vs. fsync() vs.
fdatasync() vs. syncfs() (they are all subtly different, and I never
remember which is stronger in what scenarios, nor how Linux subtly
differs from what POSIX says), POSIX does have some wording syncs being
useful even on reads to force the read to not complete until you can
guarantee that everyone else will read the same content (that is, the
sync flushes the pending writes, even though you are doing a read
operation).

> 
> I suppose TRIM etc. should support FUA too?

Probably, and with similar semantics to WRITE (only affects this
transaction rather than all pending ones, but guarantees that the trim
lands on disk before returning).

>> Meanwhile, it sounds like FUA is valid on read, write, AND flush
>> (because the kernel supports all three),
> 
> Do you have a pointer to what FUA means on kernel reads? Does it

No clue. I'm not a kernel expert, and was assuming that you knew more
about it than me.

> mean "force unit access for the read" or does it mean "flush any
> write for that block first"? The first is subtly different if the
> file is remote and being accessed by multiple people (e.g. NFS, Ceph etc.)

I would lean to the latter - FUA on a read seems like it is most useful
if it means "guarantee that no one else can read something older than
what I read", and NOT "give me possibly stale data because I accessed
the underlying storage rather than paying attention to in-flight writes
that would change what I read".  In other words, I think you should
ALWAYS prefer data from in-flight writes over going to backing storage,
but USUALLY don't need the overhead of waiting for those writes to
complete; FUA slows down your read, but gives you better data assurance.

> 
>> even if we aren't quite sure
>> what to document of those flags.  And that means qemu is correct, and
>> the NBD protocol has a bug.  Since you contributed the FUA flag, is that
>> something you can try to improve?
> 
> Yeah. My mess so I should clean it up. I think FUA should be valid
> on essentially everything.
> 
> I think I might wait until structured replies is in though!

It's also tricky because we just barely documented that servers SHOULD
reject invalid flags with EINVAL; and that clients MUST NOT send FUA on
commands where it is not documented; I don't know if we have an adequate
discovery system in place to learn _which_ commands support FUA,
especially if you are proposing that we expand the scope of FUA to be
valid alongside a TRIM request.

It doesn't have to be solved today, though, so I'm fine if you wait for
structured replies first.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-03-31 20:17     ` Alex Bligh
  2016-03-31 20:34       ` Eric Blake
@ 2016-04-01  7:43       ` Paolo Bonzini
  2016-04-01  9:19         ` Alex Bligh
  2016-04-05  5:09         ` Kevin Wolf
  1 sibling, 2 replies; 24+ messages in thread
From: Paolo Bonzini @ 2016-04-01  7:43 UTC (permalink / raw)
  To: Alex Bligh, Eric Blake; +Cc: nbd-general, qemu-devel



On 31/03/2016 22:17, Alex Bligh wrote:
>> > In qemu, read+FUA just triggers blk_co_flush() prior to reading; but
>> > that's the same function it calls for write+FUA.
> That's harmless, but unnecessary in the sense that current documented
> behaviour doesn't require it. Perhaps it should?
> 
> I suppose TRIM etc. should support FUA too?

TRIM is an advisory operation, so it doesn't make sense to force access
to the medium.  The closest you could get would be to add FUA to
WRITE_ZEROES.  But since WRITE_ZEROES is not a particularly common
operation there isn't much to gain compared to FLUSHing after the write
has completed; in fact SCSI doesn't have a FUA bit on its WRITE SAME
command.

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-03-31 20:34       ` Eric Blake
@ 2016-04-01  7:49         ` Paolo Bonzini
  2016-04-01  9:25           ` Alex Bligh
  2016-04-01  8:27         ` Wouter Verhelst
  2016-05-02 17:08         ` Eric Blake
  2 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2016-04-01  7:49 UTC (permalink / raw)
  To: Eric Blake, Alex Bligh; +Cc: nbd-general, qemu-devel



On 31/03/2016 22:34, Eric Blake wrote:
> give me possibly stale data because I accessed
> the underlying storage rather than paying attention to in-flight writes
> that would change what I read".

Overlapping I/O is always unspecified, so you should expect stale data
if a read overlaps an in-flight write.  Not only there's no guarantee of
atomicity; the disk is a "safe register" rather than a "regular
register"
(http://stackoverflow.com/questions/8871633/whats-the-difference-between-safe-regular-and-atomic-registers)

The way QEMU does "flush first" for FUA on reads provides the guarantee
that "no one else can read something older than what I read", but that
assumes that no one else is doing an overlapping write.

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-03-31 20:34       ` Eric Blake
  2016-04-01  7:49         ` Paolo Bonzini
@ 2016-04-01  8:27         ` Wouter Verhelst
  2016-04-01  9:40           ` Alex Bligh
  2016-04-01 14:16           ` Eric Blake
  2016-05-02 17:08         ` Eric Blake
  2 siblings, 2 replies; 24+ messages in thread
From: Wouter Verhelst @ 2016-04-01  8:27 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd-general, qemu-devel, Alex Bligh

On Thu, Mar 31, 2016 at 02:34:24PM -0600, Eric Blake wrote:
> On 03/31/2016 02:17 PM, Alex Bligh wrote:
> >> even if we aren't quite sure
> >> what to document of those flags.  And that means qemu is correct, and
> >> the NBD protocol has a bug.  Since you contributed the FUA flag, is that
> >> something you can try to improve?
> > 
> > Yeah. My mess so I should clean it up. I think FUA should be valid
> > on essentially everything.
> > 
> > I think I might wait until structured replies is in though!
> 
> It's also tricky because we just barely documented that servers SHOULD
> reject invalid flags with EINVAL; and that clients MUST NOT send FUA on
> commands where it is not documented; I don't know if we have an adequate
> discovery system in place to learn _which_ commands support FUA,

That's what I'm mostly worried about. Yes, we have FUA, and yes, some
clients may send it on commands that aren't WRITE, but it is not very
well defined what happens then:

- Currently-released versions of nbd-server will accept the flag on
  non-WRITE commands, but ignore it. This is generally not desirable, I
  would say.
- The change that I made a few days ago means future versions will *not*
  accept it. I feel this is better than accepting-and-ignoring, but not
  ideal either in case a client sends out a request that includes FUA.

We can perhaps ignore the former of the above as "buggy behaviour" and
not negotiate anything for a server that does the "proper" thing of
accepting-and-acting-on for non-WRITE FUA, but then it is probably a
good idea to document in the spec that "there are older, buggy, servers
which accept but ignore FUA on anything but WRITE", at least for a few
years until such older servers aren't available anymore.

Alternatively, we could add yet another flag bit to signal "FUA valid
everywhere". Not sure if that is worth it.

[...]

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-01  7:43       ` Paolo Bonzini
@ 2016-04-01  9:19         ` Alex Bligh
  2016-04-05  5:09         ` Kevin Wolf
  1 sibling, 0 replies; 24+ messages in thread
From: Alex Bligh @ 2016-04-01  9:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: nbd-general, qemu-devel, Alex Bligh


On 1 Apr 2016, at 08:43, Paolo Bonzini <pbonzini@redhat.com> wrote:

> 
> On 31/03/2016 22:17, Alex Bligh wrote:
>>>> In qemu, read+FUA just triggers blk_co_flush() prior to reading; but
>>>> that's the same function it calls for write+FUA.
>> That's harmless, but unnecessary in the sense that current documented
>> behaviour doesn't require it. Perhaps it should?
>> 
>> I suppose TRIM etc. should support FUA too?
> 
> TRIM is an advisory operation, so it doesn't make sense to force access
> to the medium.

Reflecting, that's correct. FUA can't mean anything sensible for
TRIM.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-01  7:49         ` Paolo Bonzini
@ 2016-04-01  9:25           ` Alex Bligh
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bligh @ 2016-04-01  9:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: nbd-general, qemu-devel, Alex Bligh


On 1 Apr 2016, at 08:49, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 31/03/2016 22:34, Eric Blake wrote:
>> give me possibly stale data because I accessed
>> the underlying storage rather than paying attention to in-flight writes
>> that would change what I read".
> 
> Overlapping I/O is always unspecified, so you should expect stale data
> if a read overlaps an in-flight write.

Sure, that's a consideration for overlapping I/O in general rather
than FUA. You need to wait until a write operation has completed to be
able to guarantee that a read operation will read back the written
data (whether or not it has been persisted to disk rather than
e.g. sitting in a cache somewhere). A read following a FUA write
that has not completed could be processed prior to the write. Similarly
a read following a non-FUA write that has completed must read the written
data even if it's not persisted.

I'm interested with either Qemu or the kernel have a sense of FUA
on reads. Do you have a view on that?

This (which was the inspiration for nbd FUA and now appears slightly
clearer than I remember at the time):
https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt

says

> The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the
> filesystem and will make sure that I/O completion for this request is only
> signaled after the data has been committed to non-volatile storage.

and

> For devices that also support the FUA bit the block layer needs
> to be told to pass through the REQ_FUA bit using:
> 
> 	blk_queue_flush(sdkp->disk->queue, REQ_FLUSH | REQ_FUA);
> 
> and the driver must handle write requests that have the REQ_FUA bit set
> in prep_fn/request_fn.  If the FUA bit is not natively supported the block
> layer turns it into an empty REQ_FLUSH request after the actual write.

That's pretty ambiguous as to whether you need to handle it for reads
as well. If it does read+FUA means that the read needs an fdatasync()
type operation to happen.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-01  8:27         ` Wouter Verhelst
@ 2016-04-01  9:40           ` Alex Bligh
  2016-04-01 14:16           ` Eric Blake
  1 sibling, 0 replies; 24+ messages in thread
From: Alex Bligh @ 2016-04-01  9:40 UTC (permalink / raw)
  To: Wouter Verhelst; +Cc: nbd-general, qemu-devel, Alex Bligh


On 1 Apr 2016, at 09:27, Wouter Verhelst <w@uter.be> wrote:

>> It's also tricky because we just barely documented that servers SHOULD
>> reject invalid flags with EINVAL; and that clients MUST NOT send FUA on
>> commands where it is not documented; I don't know if we have an adequate
>> discovery system in place to learn _which_ commands support FUA,
> 
> That's what I'm mostly worried about. Yes, we have FUA, and yes, some
> clients may send it on commands that aren't WRITE, but it is not very
> well defined what happens then:

Actually the text says:

> Clients MUST NOT set a command flag bit that is not documented for the particular command; and whether a flag is valid may depend on negotiation during the handshake phase.


So this gives us an easy out. We document NBD_CMD_FLAG_FUA for every command, but say it is meaningless for commands outside some subset (see below), and therefore 'SHOULD NOT' be used by the client. For commands outside that subset, the server SHOULD ignore it. That means that we don't need to change the above text, as it is documented.

However, I'm not clear what the subset of commands NBD_CMD_FLAG_FUA should work on is. I originally thought it was anything that writes. Paolo has pointed out it's pointless for TRIM given it's advisory. It makes sense for anything that writes holes. I am not clear what it should do for reads - the original semantic was meant to be the kernel bio layer semantic, and (as per message to Paolo) the kernel documentation is less than helpful in whether it can be set on a read bio.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-01  8:27         ` Wouter Verhelst
  2016-04-01  9:40           ` Alex Bligh
@ 2016-04-01 14:16           ` Eric Blake
  2016-04-01 15:00             ` Alex Bligh
  1 sibling, 1 reply; 24+ messages in thread
From: Eric Blake @ 2016-04-01 14:16 UTC (permalink / raw)
  To: Wouter Verhelst; +Cc: nbd-general, qemu-devel, Alex Bligh

[-- Attachment #1: Type: text/plain, Size: 4574 bytes --]

On 04/01/2016 02:27 AM, Wouter Verhelst wrote:

> That's what I'm mostly worried about. Yes, we have FUA, and yes, some
> clients may send it on commands that aren't WRITE, but it is not very
> well defined what happens then:
> 
> - Currently-released versions of nbd-server will accept the flag on
>   non-WRITE commands, but ignore it. This is generally not desirable, I
>   would say.
> - The change that I made a few days ago means future versions will *not*
>   accept it. I feel this is better than accepting-and-ignoring, but not
>   ideal either in case a client sends out a request that includes FUA.
> 

We have not yet released any version of the server that rejects FUA on a
command, so that may work in our favor with my strawman below (but does
mean we need to relax your recent commit before releasing).

> We can perhaps ignore the former of the above as "buggy behaviour" and
> not negotiate anything for a server that does the "proper" thing of
> accepting-and-acting-on for non-WRITE FUA, but then it is probably a
> good idea to document in the spec that "there are older, buggy, servers
> which accept but ignore FUA on anything but WRITE", at least for a few
> years until such older servers aren't available anymore.
> 
> Alternatively, we could add yet another flag bit to signal "FUA valid
> everywhere". Not sure if that is worth it.

Or rather than a flag bit, what about this strawman:

NBD_FLAG_SEND_FUA: If set, the server understands the NBD_CMD_FLAG_FUA
bit.  Except where more specific mandatory or optional behavior is
documented on a given request, the server MUST ignore NBD_CMD_FLAG_FUA
if it advertised NBD_FLAG_SEND_FUA.  Clients SHOULD NOT assume that the
flag will have an effect on anything other than NBD_CMD_WRITE, unless
the client has first checked NBD_OPT_QUERY_FUA.  If clear, the client
MUST NOT use NBD_CMD_FLAG_FUA.


NBD_OPT_QUERY_FUA
  Data: 16 bits, the request type to check
  Responses:
    - NBD_REP_ACK - If NBD_CMD_FLAG_FUA is set for the given request
type, the server obeys FUA semantics as documented for that command. If
the server will be advertising NBD_FLAG_SEND_FUA, then it MUST use this
reply for NBD_CMD_WRITE, and SHOULD use this reply for all other
supported request types that have documented mandatory behavior for
NBD_CMD_FLAG_FUA.
    - NBD_REP_ERR_POLICY - The server knows the request type, but either
the server will not be advertising NBD_FLAG_SEND_FUA, or the server will
ignore the NBD_CMD_FLAG_FUA flag; either way, the client SHOULD NOT use
NBD_CMD_FLAG_FUA with a request of this type.  The server SHOULD use
this reply for request types that have documented optional behavior for
NBD_CMD_FLAG_FUA where the server did not implement that optional
behavior (such as NBD_CMD_TRIM), and for commands that do not specify
behavior (such as NBD_CMD_DISC).
    - NBD_REP_ERR_INVALID - The server does not recognize the request
type in question. The server MUST use this reply for request types that
it does not know (including for NBD_CMD_WRITE_ZEROES if the server will
not be advertising NBD_FLAG_SEND_WRITE_ZEROES).
    - For backward compatibility, the client should also be prepared for
NBD_REP_ERR_UNSUP, when the server does not understand the option
request, and the client SHOULD assume that NBD_CMD_FLAG_FUA will be
honored for NBD_CMD_WRITE if NBD_FLAG_SEND_FUA is advertised, and SHOULD
NOT use NBD_CMD_FLAG_FUA on any other request.

NBD_CMD_WRITE: If NBD_FLAG_SEND_FUA is set, the server MUST obey FUA
semantics.

NBD_CMD_WRITE_ZEROES: If NBD_FLAG_SEND_FUA is set and the write zeroes
extension is advertised, the server MUST obey FUA semantics.

NBD_CMD_READ: If NBD_FLAG_SEND_FUA is set, the client SHOULD use
NBD_OPT_QUERY_FUA to determine whether using NBD_CMD_FLAG_FUA will have
any effect; the server SHOULD obey FUA semantics (if we figure out what
they are!), but buggy servers MAY ignore the flag instead.

NBD_CMD_FLUSH: If NBD_FLAG_SEND_FUA is set, the client SHOULD use
NBD_OPT_QUERY_FUA to determine whether using NBD_CMD_FLAG_FUA will have
any effect; the server SHOULD obey FUA semantics, but buggy servers MAY
ignore the flag instead.

NBD_CMD_TRIM: If NBD_FLAG_SEND_FUA is set, the client SHOULD use
NBD_OPT_QUERY_FUA to determine whether using NBD_CMD_FLAG_FUA will have
any effect.  Since the command is advisory, the server is free to ignore
the flag.

Thoughts?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-01 14:16           ` Eric Blake
@ 2016-04-01 15:00             ` Alex Bligh
  2016-04-01 15:08               ` Eric Blake
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Bligh @ 2016-04-01 15:00 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd-general, Wouter Verhelst, qemu-devel, Alex Bligh

[-- Attachment #1: Type: text/plain, Size: 1601 bytes --]

> 
> Or rather than a flag bit, what about this strawman:
> 
> NBD_FLAG_SEND_FUA: If set, the server understands the NBD_CMD_FLAG_FUA
> bit.  Except where more specific mandatory or optional behavior is
> documented on a given request, the server MUST ignore NBD_CMD_FLAG_FUA
> if it advertised NBD_FLAG_SEND_FUA.  Clients SHOULD NOT assume that the
> flag will have an effect on anything other than NBD_CMD_WRITE, unless

and NBD_CMD_WRITE_ZEROES presumably

> the client has first checked NBD_OPT_QUERY_FUA.  If clear, the client
> MUST NOT use NBD_CMD_FLAG_FUA.
> 
> 
> NBD_OPT_QUERY_FUA
>  Data: 16 bits, the request type to check
>  Responses:
...
> Thoughts?

A bit overengineered I think. I can't realistically see clients writing
code that would negotiate all that.

Personally I think we should stick to:

* Don't send FUA unless NBD_FLAG_SEND_FUA is set

* FUA works on NBD_CMD_WRITE and NBD_CMD_WRITE_ZEROES

* On all the other commands:

  - If it does anything in the linux kernel, we should make
    it do the same thing here. (We know it doesn't for Qemu
    or rather Qemu doesn't rely on it)

  - If it does not do anything in the linux kernel, then we
    should decide whether we want to: do
    a) client SHOULD NOT send, server MUST ignore it (my preference)
    or
    b) client MUST NOT send, server MUST close connection.

I suppose I am going have to try another lkml message to get
to the bottom of the first one. On the other hand if we
take route (a) above, we can relatively easily add a
'meaning' later as people can't have been relying on it
working.

--
Alex Bligh





[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-01 15:00             ` Alex Bligh
@ 2016-04-01 15:08               ` Eric Blake
  2016-04-01 15:12                 ` Alex Bligh
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2016-04-01 15:08 UTC (permalink / raw)
  To: Alex Bligh; +Cc: nbd-general, Wouter Verhelst, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2384 bytes --]

On 04/01/2016 09:00 AM, Alex Bligh wrote:
>>
>> Or rather than a flag bit, what about this strawman:
>>
>> NBD_FLAG_SEND_FUA: If set, the server understands the NBD_CMD_FLAG_FUA
>> bit.  Except where more specific mandatory or optional behavior is
>> documented on a given request, the server MUST ignore NBD_CMD_FLAG_FUA
>> if it advertised NBD_FLAG_SEND_FUA.  Clients SHOULD NOT assume that the
>> flag will have an effect on anything other than NBD_CMD_WRITE, unless
> 
> and NBD_CMD_WRITE_ZEROES presumably
> 
>> the client has first checked NBD_OPT_QUERY_FUA.  If clear, the client
>> MUST NOT use NBD_CMD_FLAG_FUA.
>>
>>
>> NBD_OPT_QUERY_FUA
>>  Data: 16 bits, the request type to check
>>  Responses:
> ...
>> Thoughts?
> 
> A bit overengineered I think. I can't realistically see clients writing
> code that would negotiate all that.

Fair enough. I was just throwing it out there so that at least we
considered our tradeoffs.

> 
> Personally I think we should stick to:
> 
> * Don't send FUA unless NBD_FLAG_SEND_FUA is set
> 
> * FUA works on NBD_CMD_WRITE and NBD_CMD_WRITE_ZEROES
> 
> * On all the other commands:
> 
>   - If it does anything in the linux kernel, we should make
>     it do the same thing here. (We know it doesn't for Qemu
>     or rather Qemu doesn't rely on it)
> 
>   - If it does not do anything in the linux kernel, then we
>     should decide whether we want to: do
>     a) client SHOULD NOT send, server MUST ignore it (my preference)
>     or
>     b) client MUST NOT send, server MUST close connection.

closing the connection is not strictly necessary (except maybe for
NBD_CMD_READ without structured replies); for all other commands, the
server may reply with EINVAL error instead of closing the connection.

But yes, I'm favoring a) as well, for the simplicity factor.  There's
still the issue that if we document a behavior, a new client talking to
an older server can't reliably tell if the behavior will be guaranteed.

> I suppose I am going have to try another lkml message to get
> to the bottom of the first one. On the other hand if we
> take route (a) above, we can relatively easily add a
> 'meaning' later as people can't have been relying on it
> working.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-01 15:08               ` Eric Blake
@ 2016-04-01 15:12                 ` Alex Bligh
  2016-04-01 15:13                   ` Alex Bligh
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Bligh @ 2016-04-01 15:12 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd-general, Wouter Verhelst, qemu-devel, Alex Bligh

[-- Attachment #1: Type: text/plain, Size: 679 bytes --]


On 1 Apr 2016, at 16:08, Eric Blake <eblake@redhat.com> wrote:

> But yes, I'm favoring a) as well, for the simplicity factor.  There's
> still the issue that if we document a behavior, a new client talking to
> an older server can't reliably tell if the behavior will be guaranteed.

Existing clients should not be sending FUA on anything other than
NBD_CMD_WRITE *and* relying on the behaviour, as the behaviour is
not documented (hence this discussion). Therefore it shouldn't
break anything. I also think it won't break anything in practice
as qemu doesn't use FUA on write and the kernel doesn't use FUA
at all; I realise that is not an exhaustive list.

--
Alex Bligh





[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-01 15:12                 ` Alex Bligh
@ 2016-04-01 15:13                   ` Alex Bligh
  2016-04-01 15:31                     ` Eric Blake
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Bligh @ 2016-04-01 15:13 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd-general, Wouter Verhelst, qemu-devel, Alex Bligh

[-- Attachment #1: Type: text/plain, Size: 257 bytes --]


On 1 Apr 2016, at 16:12, Alex Bligh <alex@alex.org.uk> wrote:

> as qemu doesn't use FUA on write and the kernel doesn't use FUA

"as qemu doesn't use FUA other than on write" - sorry

> at all; I realise that is not an exhaustive list.

--
Alex Bligh





[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-01 15:13                   ` Alex Bligh
@ 2016-04-01 15:31                     ` Eric Blake
  2016-04-01 15:46                       ` Alex Bligh
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2016-04-01 15:31 UTC (permalink / raw)
  To: Alex Bligh; +Cc: nbd-general, Wouter Verhelst, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1990 bytes --]

On 04/01/2016 09:13 AM, Alex Bligh wrote:
> 
> On 1 Apr 2016, at 16:12, Alex Bligh <alex@alex.org.uk> wrote:
> 
>> as qemu doesn't use FUA on write and the kernel doesn't use FUA
> 
> "as qemu doesn't use FUA other than on write" - sorry

qemu's block/nbd-client.c currently sends NBD_CMD_FLAG_FUA on
NBD_CMD_FLUSH (unconditionally, if NBD_FLAG_SEND_FUA was advertised),
and on NBD_CMD_WRITE (conditionally: if NBD_FLAG_SEND_FUA was advertised
AND the upper layer requested FUA).  I don't see it being sent on reads,
so on that front, you appear to be correct that no common clients are
expecting any particular behavior on reads.

qemu's nbd/server.c silently ignores NBD_CMD_FLAG_FUA on all commands
except NBD_CMD_READ (conditionally calls blk_co_flush()), NBD_CMD_WRITE
(conditionally calls blk_co_flush()).  Ignoring the flag on
NBD_CMD_FLUSH seems to make sense, since that code unconditionally calls
blk_co_flush(), which is all the more the flag was enabling on
read/write.  And if you are correct that no real clients are sending FUA
on READ, then deleting (or changing) the server support for FUA on read
should have no negative impact.

qemu doesn't yet implement WRITE_ZEROES.

When qemu client is talking to a qemu server, there is no
incompatibility between the two - all client commands that set FUA are
sanely handled by the recipient server code.  But the same is not true
if you pair a current qemu client with current nbd.git server, as the
qemu will unconditionally set FUA on flush, and the server now rejects
it as an invalid flag (thus making it impossible for qemu to flush).  So
we probably ought to fix qemu client to quit unconditionally sending FUA
(no change when qemu client talks to qemu server); and at the same time
out to fix NBD server to silently ignore FUA on flush (fix the breakage
between the pairing).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-01 15:31                     ` Eric Blake
@ 2016-04-01 15:46                       ` Alex Bligh
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Bligh @ 2016-04-01 15:46 UTC (permalink / raw)
  To: Eric Blake; +Cc: nbd-general, Wouter Verhelst, qemu-devel, Alex Bligh

[-- Attachment #1: Type: text/plain, Size: 762 bytes --]


On 1 Apr 2016, at 16:31, Eric Blake <eblake@redhat.com> wrote:

> When qemu client is talking to a qemu server, there is no
> incompatibility between the two - all client commands that set FUA are
> sanely handled by the recipient server code.  But the same is not true
> if you pair a current qemu client with current nbd.git server, as the
> qemu will unconditionally set FUA on flush, and the server now rejects
> it as an invalid flag (thus making it impossible for qemu to flush).  So
> we probably ought to fix qemu client to quit unconditionally sending FUA
> (no change when qemu client talks to qemu server); and at the same time
> out to fix NBD server to silently ignore FUA on flush (fix the breakage
> between the pairing).

+1.

--
Alex Bligh





[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-01  7:43       ` Paolo Bonzini
  2016-04-01  9:19         ` Alex Bligh
@ 2016-04-05  5:09         ` Kevin Wolf
  2016-04-05 13:28           ` Paolo Bonzini
  1 sibling, 1 reply; 24+ messages in thread
From: Kevin Wolf @ 2016-04-05  5:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: nbd-general, qemu-devel, Alex Bligh

Am 01.04.2016 um 09:43 hat Paolo Bonzini geschrieben:
> 
> 
> On 31/03/2016 22:17, Alex Bligh wrote:
> >> > In qemu, read+FUA just triggers blk_co_flush() prior to reading; but
> >> > that's the same function it calls for write+FUA.
> > That's harmless, but unnecessary in the sense that current documented
> > behaviour doesn't require it. Perhaps it should?
> > 
> > I suppose TRIM etc. should support FUA too?
> 
> TRIM is an advisory operation, so it doesn't make sense to force access
> to the medium.

I think it does make sense. It means that on completion there is no
pending discard operation (i.e. either there wasn't a discard or if
there was, it has completed) and other readers will see the final state
of the blocks.

> The closest you could get would be to add FUA to WRITE_ZEROES.  But
> since WRITE_ZEROES is not a particularly common operation there isn't
> much to gain compared to FLUSHing after the write has completed; in
> fact SCSI doesn't have a FUA bit on its WRITE SAME command.

Right, I guess this would mostly be for consistency when FUA is
supported more or less everywhere else.

Kevin

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-05  5:09         ` Kevin Wolf
@ 2016-04-05 13:28           ` Paolo Bonzini
  2016-04-06 13:14             ` Kevin Wolf
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2016-04-05 13:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: nbd-general, qemu-devel, Alex Bligh



On 05/04/2016 07:09, Kevin Wolf wrote:
> > TRIM is an advisory operation, so it doesn't make sense to force access
> > to the medium.
> 
> I think it does make sense. It means that on completion there is no
> pending discard operation (i.e. either there wasn't a discard or if
> there was, it has completed) and other readers will see the final state
> of the blocks.

This is what already happens though, isn't it?

Paolo

>> > The closest you could get would be to add FUA to WRITE_ZEROES.  But
>> > since WRITE_ZEROES is not a particularly common operation there isn't
>> > much to gain compared to FLUSHing after the write has completed; in
>> > fact SCSI doesn't have a FUA bit on its WRITE SAME command.
> Right, I guess this would mostly be for consistency when FUA is
> supported more or less everywhere else.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-05 13:28           ` Paolo Bonzini
@ 2016-04-06 13:14             ` Kevin Wolf
  2016-04-06 13:28               ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Kevin Wolf @ 2016-04-06 13:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: nbd-general, qemu-devel, Alex Bligh

Am 05.04.2016 um 15:28 hat Paolo Bonzini geschrieben:
> On 05/04/2016 07:09, Kevin Wolf wrote:
> > > TRIM is an advisory operation, so it doesn't make sense to force access
> > > to the medium.
> > 
> > I think it does make sense. It means that on completion there is no
> > pending discard operation (i.e. either there wasn't a discard or if
> > there was, it has completed) and other readers will see the final state
> > of the blocks.
> 
> This is what already happens though, isn't it?

You mean because in practice discard requests aren't even cached, so we
always behave as if FUA were specified? That's probably right, but is
there a fundamental reason why some storage backend couldn't have a
writeback cache for discards?

It probably wouldn't make sense to introduce FUA for this if it didn't
already exist elsewhere, but now that we do have it, I'd allow it for
TRIM, too, for the sake of consistency and symmetry.

Kevin

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-06 13:14             ` Kevin Wolf
@ 2016-04-06 13:28               ` Paolo Bonzini
  2016-04-06 13:50                 ` Kevin Wolf
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2016-04-06 13:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: nbd-general, qemu-devel, Alex Bligh



On 06/04/2016 15:14, Kevin Wolf wrote:
>>> > > 
>>> > > I think it does make sense. It means that on completion there is no
>>> > > pending discard operation (i.e. either there wasn't a discard or if
>>> > > there was, it has completed) and other readers will see the final state
>>> > > of the blocks.
>> > 
>> > This is what already happens though, isn't it?
> You mean because in practice discard requests aren't even cached, so we
> always behave as if FUA were specified? That's probably right, but is
> there a fundamental reason why some storage backend couldn't have a
> writeback cache for discards?

No, there isn't.  Does qcow2's discard get cached?  I wouldn't be
surprised (and SCSI actually says nowhere that WRITE SAME is durable
without a subsequent SYNCHRONIZE CACHE!).

> It probably wouldn't make sense to introduce FUA for this if it didn't
> already exist elsewhere, but now that we do have it, I'd allow it for
> TRIM, too, for the sake of consistency and symmetry.

Yes, that's fine.

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-04-06 13:28               ` Paolo Bonzini
@ 2016-04-06 13:50                 ` Kevin Wolf
  0 siblings, 0 replies; 24+ messages in thread
From: Kevin Wolf @ 2016-04-06 13:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: nbd-general, qemu-devel, Alex Bligh

Am 06.04.2016 um 15:28 hat Paolo Bonzini geschrieben:
> 
> 
> On 06/04/2016 15:14, Kevin Wolf wrote:
> >>> > > 
> >>> > > I think it does make sense. It means that on completion there is no
> >>> > > pending discard operation (i.e. either there wasn't a discard or if
> >>> > > there was, it has completed) and other readers will see the final state
> >>> > > of the blocks.
> >> > 
> >> > This is what already happens though, isn't it?
> > You mean because in practice discard requests aren't even cached, so we
> > always behave as if FUA were specified? That's probably right, but is
> > there a fundamental reason why some storage backend couldn't have a
> > writeback cache for discards?
> 
> No, there isn't.  Does qcow2's discard get cached?  I wouldn't be
> surprised (and SCSI actually says nowhere that WRITE SAME is durable
> without a subsequent SYNCHRONIZE CACHE!).

For qcow2, yes and no. It has a cache for coalescing discards within a
single request, but that cache is flushed before completing the
operation. So it's not visible for the caller.

> > It probably wouldn't make sense to introduce FUA for this if it didn't
> > already exist elsewhere, but now that we do have it, I'd allow it for
> > TRIM, too, for the sake of consistency and symmetry.
> 
> Yes, that's fine.

Kevin

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
  2016-03-31 20:34       ` Eric Blake
  2016-04-01  7:49         ` Paolo Bonzini
  2016-04-01  8:27         ` Wouter Verhelst
@ 2016-05-02 17:08         ` Eric Blake
  2 siblings, 0 replies; 24+ messages in thread
From: Eric Blake @ 2016-05-02 17:08 UTC (permalink / raw)
  To: Alex Bligh; +Cc: nbd-general, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3512 bytes --]

On 03/31/2016 02:34 PM, Eric Blake wrote:
> On 03/31/2016 02:17 PM, Alex Bligh wrote:
>> OK so I actually went and researched what my answer was last time I
>> was asked ( :-) ):
>>
>> Here was my conclusion last time after trawling through lkml
>> on the subject:
>>
>> From https://sourceforge.net/p/nbd/mailman/message/27569820/
>>

>>> Meanwhile, it sounds like FUA is valid on read, write, AND flush
>>> (because the kernel supports all three),
>>
>> Do you have a pointer to what FUA means on kernel reads? Does it
> 
> No clue. I'm not a kernel expert, and was assuming that you knew more
> about it than me.
> 
>> mean "force unit access for the read" or does it mean "flush any
>> write for that block first"? The first is subtly different if the
>> file is remote and being accessed by multiple people (e.g. NFS, Ceph etc.)
> 
> I would lean to the latter - FUA on a read seems like it is most useful
> if it means "guarantee that no one else can read something older than
> what I read", and NOT "give me possibly stale data because I accessed
> the underlying storage rather than paying attention to in-flight writes
> that would change what I read".  In other words, I think you should
> ALWAYS prefer data from in-flight writes over going to backing storage,
> but USUALLY don't need the overhead of waiting for those writes to
> complete; FUA slows down your read, but gives you better data assurance.

SCSI defines FUA on both reads and writes.  Reading
http://www.seagate.com/staticfiles/support/disc/manuals/scsi/100293068a.pdf
under READ(10), I see:

"The force unit access (FUA) and force unit access non-volatile cache
(FUA_NV) bits are defined in table 87.
...
The device server shall read the logical blocks from the medium. If a
cache contains a more
recent version of a logical block, the device server shall write the
logical block to the medium
before reading it.
"

Whether the kernel exposes this aspect of SCSI FUA bit through its bio
interface (and if not, whether it should) are a different matter, but it
sounds like since at least SCSI has a definition for FUA on read, maybe
NBD should (someday) likewise add a definition for it.  At least for
now, we left the door open for it, by at least requiring that servers
not reject the flag, although there may be some discovery issues to
figure out before a client can reasonably know that its FUA-on-read
requests will actually be honored.

> 
>>
>>> even if we aren't quite sure
>>> what to document of those flags.  And that means qemu is correct, and
>>> the NBD protocol has a bug.  Since you contributed the FUA flag, is that
>>> something you can try to improve?
>>
>> Yeah. My mess so I should clean it up. I think FUA should be valid
>> on essentially everything.
>>
>> I think I might wait until structured replies is in though!
> 
> It's also tricky because we just barely documented that servers SHOULD
> reject invalid flags with EINVAL; and that clients MUST NOT send FUA on
> commands where it is not documented; I don't know if we have an adequate
> discovery system in place to learn _which_ commands support FUA,
> especially if you are proposing that we expand the scope of FUA to be
> valid alongside a TRIM request.
> 
> It doesn't have to be solved today, though, so I'm fine if you wait for
> structured replies first.
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-05-02 17:09 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-31 19:33 [Qemu-devel] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH? Eric Blake
2016-03-31 19:41 ` [Qemu-devel] [Nbd] " Alex Bligh
2016-03-31 19:54   ` Eric Blake
2016-03-31 20:17     ` Alex Bligh
2016-03-31 20:34       ` Eric Blake
2016-04-01  7:49         ` Paolo Bonzini
2016-04-01  9:25           ` Alex Bligh
2016-04-01  8:27         ` Wouter Verhelst
2016-04-01  9:40           ` Alex Bligh
2016-04-01 14:16           ` Eric Blake
2016-04-01 15:00             ` Alex Bligh
2016-04-01 15:08               ` Eric Blake
2016-04-01 15:12                 ` Alex Bligh
2016-04-01 15:13                   ` Alex Bligh
2016-04-01 15:31                     ` Eric Blake
2016-04-01 15:46                       ` Alex Bligh
2016-05-02 17:08         ` Eric Blake
2016-04-01  7:43       ` Paolo Bonzini
2016-04-01  9:19         ` Alex Bligh
2016-04-05  5:09         ` Kevin Wolf
2016-04-05 13:28           ` Paolo Bonzini
2016-04-06 13:14             ` Kevin Wolf
2016-04-06 13:28               ` Paolo Bonzini
2016-04-06 13:50                 ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.