From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58342) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1alN8X-0000Xf-VJ for qemu-devel@nongnu.org; Wed, 30 Mar 2016 16:54:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1alN8V-0000V7-4D for qemu-devel@nongnu.org; Wed, 30 Mar 2016 16:54:45 -0400 Received: from mx1.redhat.com ([209.132.183.28]:52323) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1alN8U-0000V3-QG for qemu-devel@nongnu.org; Wed, 30 Mar 2016 16:54:43 -0400 References: <1459173555-4890-1-git-send-email-eblake@redhat.com> <1459292460-6875-1-git-send-email-eblake@redhat.com> <1459292460-6875-4-git-send-email-eblake@redhat.com> <4690D2CF-B919-4AAF-B772-1B901922275F@alex.org.uk> <56FC10A0.5090902@redhat.com> <20160330195112.GA3057@grep.be> From: Eric Blake Message-ID: <56FC3D11.3030505@redhat.com> Date: Wed, 30 Mar 2016 14:54:41 -0600 MIME-Version: 1.0 In-Reply-To: <20160330195112.GA3057@grep.be> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="Vq7XVU1BFptQAJGKTvxKMmp2tl0Jqlt15" Subject: Re: [Qemu-devel] [Nbd] [PATCH v2 3/3] doc: Propose Structured Read extension List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Wouter Verhelst Cc: "nbd-general@lists.sourceforge.net" , "qemu-devel@nongnu.org" , Alex Bligh This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --Vq7XVU1BFptQAJGKTvxKMmp2tl0Jqlt15 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 03/30/2016 01:51 PM, Wouter Verhelst wrote: > (side note: this seems to be mostly an NBD discussion at this point. > Does qemu-devel need to be in the loop before we've finished that? I > don't care either way, but then I don't want to bore or annoy people...= ) Well, it stemmed out of qemu's desires to implement more efficient ways to both push and pull sparse files across NBD; and qemu will ultimately want to implement what we discuss. But I'm okay doing most of the churn off of the qemu list, and only adding qemu back in the loop when it is actually time to implement the final design, unless someone else speaks up asking to remain in on the conversation. >> I'm a bit reluctant to put ordering requirements on option haggling >> (that option A must be turned on before option B), >=20 > Yes, me too. >=20 >> but then again, the >> SELECT extension requires NBD_OPT_SELECT to be haggled prior to >> NBD_OPT_GO, so there's precedent. >=20 > Yeah, but then, that's only because GO is meant to transition from > negotiation to transmission, so it needs to be after *any* other > negotiation; anything else would defeat its purpose. >=20 > Requiring structured read after negotiating other structured commands > could easily be done by saying that it's an error to leave the > negotiation phase with "some other" structured command negotiated, but > not structured read. >=20 > On the other hand, I feel compelled to point out that we only find > ourselves in this hole because we've coupled the structured reply with > the read command. That may have looked like a good idea at the time, bu= t > obviously it isn't. If we just have it be "negotiate the structured > reply format" rather than "the structured read reply", and state that > once negotiated, the structured reply format is required for any comman= d > with a payload, we're done. Well, I'm worried about the opposite - if the client does not negotiate structured replies, but does negotiate NBD_CMD_GET_LBA_STATUS (which has a variable-length payload to reply), then the server has three choices: 0) refuse the client (we already said this is a bit undesirable, as it may lead to the client retrying in an infloop - having a way to return an error message is better than dropping the connection); 1) the server must find a way to shoehorn the same data that would be sent in a structured reply to fit in the old-style unstructured reply (one nice thing about the structured reply is that we get a length for free; that length would have to be shoehorned into the old-style reply, different from CMD_READ where a length was implied from the request), 2) the server must fail the message Actually, having typed that, maybe choice 2 is not all that bad. It's fairly easy for a server to ALWAYS fail NBD_CMD_GET_LBA_STATUS with EINVAL if structured replies were not negotiated, and to document that a client that negotiates GET_LBA_STATUS MUST also negotiate structured replies for the command to be useful. For that matter, we just documented that servers SHOULD use EINVAL for an unrecognized (or out-of-context) command. The client can enable the two options in either order. And we'd have the following table: enabled enabled structured GET_LBA result of: replies read GET_LBA write ---------------------------------------------------------------- no no old reply EINVAL old reply yes no new reply EINVAL [*] no yes old reply EINVAL old reply yes yes new reply new reply [*] [*] Here, we're still debating whether it makes sense to allow/require a server to send new replies everywhere (and clients must handle both styles if they negotiate structured replies), or to require a server to send old replies (so that read is the only command where clients have to handle two styles, but where the results of negotiating pinpoint which style). >=20 > Since only the read reply has a payload at this point in time, you don'= t > really need to special-case it, anyway. Okay, so in v1 I tried to negotiate STRUCTURED_REPLY; in v2 I was specific to STRUCTURED_READ; it sounds like we are leaning back towards STRUCTURED_REPLY and just a caveat that any new command that sends payload SHOULD/MUST fail if STRUCTURED_REPLY was not also negotiated. I guess that also makes it easier to argue for a server that uses a structured reply for ALL messages (REPLY_TYPE_NONE for success changes 16 bytes into 20, and REPLY_TYPE_ERROR for errors changes 16 bytes into 2= 4). >=20 >> I also am thinking of proposing an extension for option haggling to >> communicate minimum/preferred alignments and maximum don't-fragment >> sizing, and that option would have to be enabled before >> OPT_LIST/OPT_SELECT/OPT_EXPORT_NAME (because it would change the >> NBD_REP_SERVER layout in response to those option requests), which >> would be another case where option A affects the behavior of option B.= >=20 > I reused the NBD_REP_SERVER command in reply to the NBD_OPT_SELECT > command since its purpose seems fairly similar ("send metadata about an= > export to the client"), but that may have been a mistake. It certainly > wasn't meant that if you say NBD_OPT_SELECT first and then go > NBD_OPT_LIST, that the NBD_REP_SERVER reply to NBD_OPT_LIST should be > the one as specified in the SELECT extension. Ah, well, then it's evidence that improved wording will help. >> I was thinking that it's easier on the client if the final chunk alway= s >> serves as a reliable indicator of success (OFFSET_DATA, OFFSET_HOLE, >> NONE) or error (ERROR, ERROR_OFFSET). >=20 > The client will already need to keep state to reassemble a fragmented > and out-of-order read reply, anyway. If that's already the case, adding= > in the requirement to *also* keep track of error state (which could in > the simplest case be a single bit for a client which doesn't care about= > offsets or error count) isn't that much more of an issue. For a client that is copying NBD_CMD_READ into a local file, dumping directly via pwrite() as each chunk comes in doesn't require the client track any state (the client can just assume that by the end of the command, all the bytes will have been covered); while a client using pwritev() will have to construct an iovec that visits the chunks in the correct order (not necessarily in the order received). Clients that really don't want to do much work have the DF flag to forbid fragmentation. But I think you've swayed me - I will make sure v3 allows an error at any point in the chain of chunks, and that the wording on the final TYPE_NONE chunk does NOT imply success unless no earlier error chunks were sent. --=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --Vq7XVU1BFptQAJGKTvxKMmp2tl0Jqlt15 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJW/D0RAAoJEKeha0olJ0NqLjUH/0r50Xmp4wrKWTKBB80fmZJO SJiVQ92Sk9Cd7BEpdwwx56TN/CaMwYVHenB9767qmNY2NnnetjpNaLZ1hKFluJQX iUv34faMtYriti6ZqDmRI/IUlkLmUn/q8BvTrdWY0ZTbZkNp/2LCQ5FHDE20Aes6 FZR1EI2QiQQhsd+7W939pKVUWamtsf1DU2P8ow6Q+QVWGf3RpHqA5uONNAwVGFeH r2kCUeYwpIDJtTAaUmBRrDamBZBspqjx+8TLt/WXTlg4NUeUvnLS1iljwaBVAIA9 C90XxU88ifVlWfnsM38jRZIGxSJ0d1L0fVTVPGgemDBDClFgbgtBU01dpjQEZzM= =d7Vr -----END PGP SIGNATURE----- --Vq7XVU1BFptQAJGKTvxKMmp2tl0Jqlt15--