From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58342)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1alN8X-0000Xf-VJ
	for qemu-devel@nongnu.org; Wed, 30 Mar 2016 16:54:47 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1alN8V-0000V7-4D
	for qemu-devel@nongnu.org; Wed, 30 Mar 2016 16:54:45 -0400
Received: from mx1.redhat.com ([209.132.183.28]:52323)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eblake@redhat.com>) id 1alN8U-0000V3-QG
	for qemu-devel@nongnu.org; Wed, 30 Mar 2016 16:54:43 -0400
References: <1459173555-4890-1-git-send-email-eblake@redhat.com>
	<1459292460-6875-1-git-send-email-eblake@redhat.com>
	<1459292460-6875-4-git-send-email-eblake@redhat.com>
	<4690D2CF-B919-4AAF-B772-1B901922275F@alex.org.uk>
	<56FC10A0.5090902@redhat.com> <20160330195112.GA3057@grep.be>
From: Eric Blake <eblake@redhat.com>
Message-ID: <56FC3D11.3030505@redhat.com>
Date: Wed, 30 Mar 2016 14:54:41 -0600
MIME-Version: 1.0
In-Reply-To: <20160330195112.GA3057@grep.be>
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature";
	boundary="Vq7XVU1BFptQAJGKTvxKMmp2tl0Jqlt15"
Subject: Re: [Qemu-devel] [Nbd] [PATCH v2 3/3] doc: Propose Structured Read
 extension
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Wouter Verhelst <w@uter.be>
Cc: "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, Alex Bligh <alex@alex.org.uk>

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--Vq7XVU1BFptQAJGKTvxKMmp2tl0Jqlt15
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 03/30/2016 01:51 PM, Wouter Verhelst wrote:

> (side note: this seems to be mostly an NBD discussion at this point.
> Does qemu-devel need to be in the loop before we've finished that? I
> don't care either way, but then I don't want to bore or annoy people...=
)

Well, it stemmed out of qemu's desires to implement more efficient ways
to both push and pull sparse files across NBD; and qemu will ultimately
want to implement what we discuss.  But I'm okay doing most of the churn
off of the qemu list, and only adding qemu back in the loop when it is
actually time to implement the final design, unless someone else speaks
up asking to remain in on the conversation.

>> I'm a bit reluctant to put ordering requirements on option haggling
>> (that option A must be turned on before option B),
>=20
> Yes, me too.
>=20
>> but then again, the
>> SELECT extension requires NBD_OPT_SELECT to be haggled prior to
>> NBD_OPT_GO, so there's precedent.
>=20
> Yeah, but then, that's only because GO is meant to transition from
> negotiation to transmission, so it needs to be after *any* other
> negotiation; anything else would defeat its purpose.
>=20
> Requiring structured read after negotiating other structured commands
> could easily be done by saying that it's an error to leave the
> negotiation phase with "some other" structured command negotiated, but
> not structured read.
>=20
> On the other hand, I feel compelled to point out that we only find
> ourselves in this hole because we've coupled the structured reply with
> the read command. That may have looked like a good idea at the time, bu=
t
> obviously it isn't. If we just have it be "negotiate the structured
> reply format" rather than "the structured read reply", and state that
> once negotiated, the structured reply format is required for any comman=
d
> with a payload, we're done.

Well, I'm worried about the opposite - if the client does not negotiate
structured replies, but does negotiate NBD_CMD_GET_LBA_STATUS (which has
a variable-length payload to reply), then the server has three choices:
0) refuse the client (we already said this is a bit undesirable, as it
may lead to the client retrying in an infloop - having a way to return
an error message is better than dropping the connection); 1) the server
must find a way to shoehorn the same data that would be sent in a
structured reply to fit in the old-style unstructured reply (one nice
thing about the structured reply is that we get a length for free; that
length would have to be shoehorned into the old-style reply, different
from CMD_READ where a length was implied from the request), 2) the
server must fail the message

Actually, having typed that, maybe choice 2 is not all that bad.  It's
fairly easy for a server to ALWAYS fail NBD_CMD_GET_LBA_STATUS with
EINVAL if structured replies were not negotiated, and to document that a
client that negotiates GET_LBA_STATUS MUST also negotiate structured
replies for the command to be useful.  For that matter, we just
documented that servers SHOULD use EINVAL for an unrecognized (or
out-of-context) command.  The client can enable the two options in
either order.  And we'd have the following table:

enabled       enabled
structured    GET_LBA            result of:
replies                     read         GET_LBA       write
----------------------------------------------------------------
no            no            old reply    EINVAL        old reply
yes           no            new reply    EINVAL        [*]
no            yes           old reply    EINVAL        old reply
yes           yes           new reply    new reply     [*]

[*] Here, we're still debating whether it makes sense to allow/require a
server to send new replies everywhere (and clients must handle both
styles if they negotiate structured replies), or to require a server to
send old replies (so that read is the only command where clients have to
handle two styles, but where the results of negotiating pinpoint which
style).

>=20
> Since only the read reply has a payload at this point in time, you don'=
t
> really need to special-case it, anyway.

Okay, so in v1 I tried to negotiate STRUCTURED_REPLY; in v2 I was
specific to STRUCTURED_READ; it sounds like we are leaning back towards
STRUCTURED_REPLY and just a caveat that any new command that sends
payload SHOULD/MUST fail if STRUCTURED_REPLY was not also negotiated.  I
guess that also makes it easier to argue for a server that uses a
structured reply for ALL messages (REPLY_TYPE_NONE for success changes
16 bytes into 20, and REPLY_TYPE_ERROR for errors changes 16 bytes into 2=
4).

>=20
>> I also am thinking of proposing an extension for option haggling to
>> communicate minimum/preferred alignments and maximum don't-fragment
>> sizing, and that option would have to be enabled before
>> OPT_LIST/OPT_SELECT/OPT_EXPORT_NAME (because it would change the
>> NBD_REP_SERVER layout in response to those option requests), which
>> would be another case where option A affects the behavior of option B.=

>=20
> I reused the NBD_REP_SERVER command in reply to the NBD_OPT_SELECT
> command since its purpose seems fairly similar ("send metadata about an=

> export to the client"), but that may have been a mistake. It certainly
> wasn't meant that if you say NBD_OPT_SELECT first and then go
> NBD_OPT_LIST, that the NBD_REP_SERVER reply to NBD_OPT_LIST should be
> the one as specified in the SELECT extension.

Ah, well, then it's evidence that improved wording will help.

>> I was thinking that it's easier on the client if the final chunk alway=
s
>> serves as a reliable indicator of success (OFFSET_DATA, OFFSET_HOLE,
>> NONE) or error (ERROR, ERROR_OFFSET).
>=20
> The client will already need to keep state to reassemble a fragmented
> and out-of-order read reply, anyway. If that's already the case, adding=

> in the requirement to *also* keep track of error state (which could in
> the simplest case be a single bit for a client which doesn't care about=

> offsets or error count) isn't that much more of an issue.

For a client that is copying NBD_CMD_READ into a local file, dumping
directly via pwrite() as each chunk comes in doesn't require the client
track any state (the client can just assume that by the end of the
command, all the bytes will have been covered); while a client using
pwritev() will have to construct an iovec that visits the chunks in the
correct order (not necessarily in the order received).  Clients that
really don't want to do much work have the DF flag to forbid
fragmentation.  But I think you've swayed me - I will make sure v3
allows an error at any point in the chain of chunks, and that the
wording on the final TYPE_NONE chunk does NOT imply success unless no
earlier error chunks were sent.

--=20
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


--Vq7XVU1BFptQAJGKTvxKMmp2tl0Jqlt15
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCAAGBQJW/D0RAAoJEKeha0olJ0NqLjUH/0r50Xmp4wrKWTKBB80fmZJO
SJiVQ92Sk9Cd7BEpdwwx56TN/CaMwYVHenB9767qmNY2NnnetjpNaLZ1hKFluJQX
iUv34faMtYriti6ZqDmRI/IUlkLmUn/q8BvTrdWY0ZTbZkNp/2LCQ5FHDE20Aes6
FZR1EI2QiQQhsd+7W939pKVUWamtsf1DU2P8ow6Q+QVWGf3RpHqA5uONNAwVGFeH
r2kCUeYwpIDJtTAaUmBRrDamBZBspqjx+8TLt/WXTlg4NUeUvnLS1iljwaBVAIA9
C90XxU88ifVlWfnsM38jRZIGxSJ0d1L0fVTVPGgemDBDClFgbgtBU01dpjQEZzM=
=d7Vr
-----END PGP SIGNATURE-----

--Vq7XVU1BFptQAJGKTvxKMmp2tl0Jqlt15--