On 03/30/2016 12:50 AM, Alex Bligh wrote:
> Eric,
> 
> There's a lot in common between our two proposals now (unsurprisingly).
> You've highlighted the differences in the other mail and I'll
> comment on them there. You may want to steal some of my wording as
> I think there are bits I've got that you haven't (as well as vice versa).
> But I'm inclined to use yours as a base unless you particularly
> like mine.
> 
> Comments inline below.
> 
> Alex
> 
> On 30 Mar 2016, at 00:01, Eric Blake <eblake@redhat.com> wrote:
> ...
> 
>> +    While the server is permitted to send at most one normal reply (or
>> +    else close the connection), a command that uses structured replies
>> +    may document that the server is permitted to send mutiple replies,

And just now noticing my typo on 'multiple' :)

>> +    all sharing the same handle,
> 
> The thought is fine, but the language is confusing. I think this is
> a single reply, made up of multiple parts (I called them chunks). You've
> called them multiple replies, which I think makes things less clear.
> Also below you've started using my 'chunk' language anyway!

All right, for v3, I will try to go with the wording that every request
has a single reply; but the reply will either be a 'simple reply'
(single message), or a 'structured reply' (which may occupy multiple
messages, where each message is called a 'chunk').

> 
>> by using the `NBD_REPLY_FLAG_DONE`
>> +    (bit 0) to delineate the final reply. The server MAY interleave
>> +    intermediate replies to one structured command with replies
>> +    relating to a different handle.
> 
> Neat.
> 
> The argument against this route is that now there are essentially
> two ways to end a chain of chunks (with and without a NONE chunk)
> which is necessary for the reasons you set out. On balance I like it though.

Yeah; I didn't see any way around that.  Always requiring a NONE chunk
is more network overhead if the server can guarantee that the last data
chunk is error-free; but at the same time, we shouldn't force servers to
guarantee the last data chunk will be error-free.

I don't think it is too much of a burden for a client to receive chunks
in a loop until the FLAG_DONE bit is set, without regards to what chunk
type came last.  And for CMD_READ, we still have a nice delineation: if
the last chunk is NONE, OFFSET_DATA, or OFFSET_HOLE, the command
succeeded; if the last chunk is ERROR or ERROR_OFFSET, the command failed.

> 
>>  However, for ease of implementation, a
>> +    server MAY close the connection rather than entering transmission
>> +    phase if, at the end of option haggling, the client has negotiated
>> +    another command that requires a structured reply but did not also
>> +    negotiate Structured Reads.
> 
> That's pretty yucky given a reconnect will achieve the same result
> and you'll end up in an infinite retry loop.
> 
> Wouldn't a better route be simply to say that implementing certain
> commands (server or client sides) requires support of structured
> replies?

I think we're in agreement that the only command that (for back-compat
reasons) can ever send data on a simple reply is CMD_READ.  Therefore,
if you negotiate any other command that can send data, that command will
use a structured reply; the mere fact that you negotiated the command
means that both client and server know about structured replies.

I guess what I was trying to get at is that if you are using any
structured replies, then it is a disservice to wire-sniffers if you did
not also enable structured replies for CMD_READ.  Technically, it would
be feasible to use simple replies for reads while using structured
replies for the other negotiated commands, but practically, a client
that does that seems undesirable, which is why I said that a server
could disconnect rather than talking to such a client.

So taking your sentence at face value, yes there is another solution
possible: require that any NBD_OPT_* haggling used to negotiate the use
of any other command with a structured reply MUST fail if the client has
not first negotiated NBD_OPT_STRUCTURED_READ, but leaves the connection
open so the client can continue option haggling.  That way, the only way
to end option haggling with the new command enabled is to also enable
structured reads.  The burden is then on the client to haggle in the
correct order, and on the server to track haggling state when deciding
how to answer the option requests for the new commands.

I'm a bit reluctant to put ordering requirements on option haggling
(that option A must be turned on before option B), but then again, the
SELECT extension requires NBD_OPT_SELECT to be haggled prior to
NBD_OPT_GO, so there's precedent.  I also am thinking of proposing an
extension for option haggling to communicate minimum/preferred
alignments and maximum don't-fragment sizing, and that option would have
to be enabled before OPT_LIST/OPT_SELECT/OPT_EXPORT_NAME (because it
would change the NBD_REP_SERVER layout in response to those option
requests), which would be another case where option A affects the
behavior of option B.

> 
>> +    - `NBD_REPLY_TYPE_NONE` (0)
>> +
>> +      *length* MUST be 0 (and the payload field omitted).  This type
>> +      SHOULD be used only as the final reply (that is, when
>> +      `NBD_REPLY_FLAG_DONE` is set), and implies that the overall
>> +      client request was successfully completed.
> 
> I think this would be clearer as 'SHOULD NOT be used other than as the
> final reply'. Because you are also saying (I think) that you need not
> have it as the final reply - it's just as good in a non-errored
> reply to have NBD_REPLY_FLAG_DONE set on the last data packet (provided
> you know it's not going to error before starting to send it).

Yes, that sounds slightly better.  (Technically, there's no reason that
it can't be used as an intermediate chunk, but there it is only a no-op
filler that wastes bandwidth - hence the SHOULD and not MUST).

> 
> ...
> 
>> +    The server MAY split the reply into any number of data chunks,
>> +    using reply types of `NBD_REPLY_TYPE_OFFSET_DATA` or
>> +    `NBD_REPLY_TYPE_OFFSET_HOLE`; each chunk MUST describe at least
>> +    one byte, although to minimize overhead, the server SHOULD use
>> +    chunks no smaller than 512 bytes where possible (the first and
> 
> This is a good idea, but rather than 'no smaller than 512 bytes', as
> it's a 'SHOULD', could we have 'the server SHOULD use chunks each
> an integer multiple of 512 bytes where possible' (you already have
> a carve out for the first and last).
> 
> ...
> 
>> +    If no error is detected, then the server MUST send enough chunks
>> +    to cover the bytes requested.  The server MAY set the
>> +    `NBD_REPLY_FLAG_DONE` on the final data chunk,
> 
> In which case it MUST NOT send any further non-data chunks
> (e.g. an error chunk or a NONE chunk)

Well, yeah - once the FLAG_DONE is sent, no further chunks of any type
are allowed; the reply is complete.

> 
>> to minimize
>> +    traffic, but MUST NOT do so if it would still be possible to
>> +    detect an error while transmitting the chunk.  If the last data
>> +    chunk is not the final reply, the server MUST use
>> +    `NBD_REPLY_TYPE_NONE` as the final reply to indicate success.
> 
> or an error chunk to indicate an error, and these final chunk MUST have
> NBD_REPLY_FLAG_DONE set on it.
> 
>> +    If an error is detected, the server MUST send padding bytes to
>> +    complete the current chunk (if any), MUST report the error with a
>> +    reply type of either `NBD_REPLY_TYPE_ERROR` or
>> +    `NBD_REPLY_TYPE_ERROR_OFFSET`, and MAY end the sequence of replies
>> +    without sending the total number of bytes requested.  If one or
>> +    more offset errors are reported, the client MAY assume that all
>> +    data in chunks not including the offset,
> 
> "the offset(s)"
> 
>> and all data within the
>> +    affected chunk
> 
> "within each affected chunk"
> 
>> but prior to the offset,
> 
> "prior to the relevant offset"
> 
>> is valid; the client MAY
>> +    NOT assume anything about data validity if no offset is provided.
> 

Yes, your tweaks help the flow.

> These multiple error chunks are neat. However, I suspect lazy implementors
> may just send an error without an offset.

Any ideas are appreciated on how to word it to suggest that servers
SHOULD try to give full details; but I think we want the fallback to the
no-offset case because some situations will not have an offset.

> 
>> +    The server MAY send additional chunks or offset error replies, if
>> +    `NBD_REPLY_FLAG_DONE` was not set, but MUST ensure the final reply
>> +    also reports an error (that is, the final reply MUST NOT use
>> +    `NBD_REPLY_TYPE_NONE`), and MAY reuse an offset reported earlier
>> +    in constructing the final reply.
> 
> I'm not sure I get that bit. Why don't you make an errorred reply simply
> one that contains one or more error chunks. An errorred reply need not contain
> all the data requested (though each chunk must be complete). A reply that
> isn't errorred needs not contain all the data requested. Why do you
> need anything stronger than that? So if you have a parallelised server which
> is simply sending several reads in parallel (think Ceph) it sends the
> result from each thread, possibly followed by an error packet, and some
> other thread notices when all of these have completed and sends a
> NBD_REPLY_TYPE_NONE packet (always, error or not) to close of use of the
> handle. This seems perfectly natural and no harder for the client to deal
> with, but you are prohibiting it.

I was thinking that it's easier on the client if the final chunk always
serves as a reliable indicator of success (OFFSET_DATA, OFFSET_HOLE,
NONE) or error (ERROR, ERROR_OFFSET).  But if you think that always
allowing a concluding NONE, even on an errored reply due to an earlier
chunk reporting the error, is reasonable, I can relax things along those
lines.  Or maybe we can state that the concluding chunk on an errored
reply may be ERROR_OFFSET with 'error' and 'offset' fields set to 0,
rather than forcing the server to replay one of the earlier offsets.


> 
>>  A server SHOULD NOT mix
>> +    `NBD_REPLY_TYPE_ERROR` and `NBD_REPLY_TYPE_ERROR_OFFSET` replies
>> +    to the same request.
>> +
>> +    A client MAY close the connection if it detects that the server
>> +    has sent invalid chunks (such as overlapping data, or not enough
>> +    data before claiming success).
>> +
>> +    In order to avoid the burden of reassembly, the client MAY set the
>> +    `NBD_CMD_FLAG_DF` flag (bit 1), which instructs the server to not
>> +    fragment the reply.  If this flag is set, the server MUST send at
>> +    most one `NBD_REPLY_TYPE_OFFSET_DATA` or
>> +    `NBD_REPLY_TYPE_OFFSET_HOLE`, although it MAY still send more than
>> +    reply (for error reporting, or a final `NBD_REPLY_TYPE_NONE`).  If
> 
> "the flag is set and"
> 
>> +    the client's length request is larger than 65,536 bytes (or if a
>> +    later extension adds a way to negotiate a larger maximum fragment
>> +    size), the server MAY reject the command with `EOVERFLOW`.  The
>> +    `EOVERFLOW` error MUST NOT be used if the `NBD_CMD_FLAG_DF` flag
>> +    was not set, or if the requested length is no larger than 65,536.

I'm also wondering if the wording should be switched along the lines of:

If the flag is set and the server deems the request to be too large, the
server MAY reject the command with `EOVERFLOW`; however, the server MUST
NOT reject a request that is no larger than 65,536 bytes in this manner.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org