From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47019) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ajDQR-0002CV-Ao for qemu-devel@nongnu.org; Thu, 24 Mar 2016 18:08:20 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ajDQO-0001v9-2n for qemu-devel@nongnu.org; Thu, 24 Mar 2016 18:08:19 -0400 Received: from mx1.redhat.com ([209.132.183.28]:60666) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ajDQN-0001v3-RB for qemu-devel@nongnu.org; Thu, 24 Mar 2016 18:08:16 -0400 References: <1458742562-30624-1-git-send-email-den@openvz.org> <1458742562-30624-3-git-send-email-den@openvz.org> From: Eric Blake Message-ID: <56F4654D.2020700@redhat.com> Date: Thu, 24 Mar 2016 16:08:13 -0600 MIME-Version: 1.0 In-Reply-To: <1458742562-30624-3-git-send-email-den@openvz.org> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="a32eluTSvqNorem37OURfHSB89U1AqdkW" Subject: Re: [Qemu-devel] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Denis V. Lunev" , nbd-general@lists.sourceforge.net, qemu-devel@nongnu.org Cc: Kevin Wolf , Paolo Bonzini , Pavel Borzenkov , Stefan Hajnoczi , Wouter Verhelst This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --a32eluTSvqNorem37OURfHSB89U1AqdkW Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 03/23/2016 08:16 AM, Denis V. Lunev wrote: > From: Pavel Borzenkov >=20 > With the availability of sparse storage formats, it is often needed to > query status of a particular LBA range and read only those blocks of > data that are actually present on the block device. >=20 > To provide such information, the patch adds GET_LBA_STATUS extension > with one new NBD_CMD_GET_LBA_STATUS command. >=20 > +* `NBD_CMD_GET_LBA_STATUS` (7) > + > + An LBA range status query request. Length and offset define the ra= nge > + of interest. The server MUST reply with a reply header, followed > + immediately by the following data: > + > + - 32 bits, length of parameter data that follow (unsigned) > + - zero or more LBA status descriptors, each having the following= > + structure: > + > + * 64 bits, offset (unsigned) > + * 32 bits, length (unsigned) > + * 16 bits, status (unsigned) > + > + unless an error condition has occurred. To date, only the NBD_CMD_READ command caused the server to send data after the handle in its reply. This would be the second command with a data field in the response, but it is returning a variable amount of data, not directly tied to the client's length - but at least you did make it structured so that the client knows how much to read. However, your patch is incomplete; you'll need to edit the "Transmission" section of the document to mention the rules on the server sending data, as the existing text would now be wrong: > The server replies with: >=20 > S: 32 bits, 0x67446698, magic (NBD_REPLY_MAGIC) > S: 32 bits, error > S: 64 bits, handle > S: (length bytes of data if the request is of type NBD_CMD_READ) You may also want to add a rule that for all future extensions, any command that requires data in the server response (other than the server response to NBD_CMD_READ) must include a 32-bit length as the first field of its data payload, so that the server reply is always structured.= Hmm, I still think it would be hard to write a wireshark dissector for server replies, since there is no indication whether a given reply will include data or not. The _client_ knows (well, any well-written client that uses a different value for 'handle' for every command sent to the server), because it can read the returned 'handle' field to know what command it matches to and therefore what data to expect; but a third-party observer seeing _just_ the server messages has no idea which server responses should have payload. Scanning the stream and assuming that a new reply starts each time NBD_REPLY_MAGIC is encountered is a close approximation, but hits false positives if the data being returned for NBD_CMD_READ contains the same magic number as part of its contents. Obviously, back-compat says we can't change the response to NBD_CMD_READ, but that means that a wireshark dissector has to either maintain context, or hunt through returned data looking for potential magic numbers and possibly hitting false positives. That said, maybe we want to allow global flag negotiation prior to transmission to add a new flag on both server and client side - the server would advertise that it is capable of a new reply mode, and the client then has to reply that it wants to use the new reply mode; in that mode, all server replies starting with magic 0x67446698 (NBD_REPLY_MAGIC) will never have a data payload, and all commands that cause a reply with payload (both NBD_CMD_READ and the new NBD_CMD_GET_LBA_STATUS of this message, or whatever name we give it) will reply with a NEW magic number: S: 32 bits, XXXX, magic (NBD_REPLY_MAGIC2) S: 32 bits, error S: 64 bits, handle S: 32 bits, length S: length bytes of data so that the server's data stream is fully structured without having to maintain context of the client's requests. That is, a dissector can now merely scan for both magic numbers; and on a stream between a client and server that have negotiated the new mode, the old magic number will never have a payload, and the new magic number will always be accompanied with a payload that describes how much data to read to the boundary of the next reply. For that matter, right now, NBD_CMD_READ is required to either end the session or return length bytes of data even when error is non-zero (but where those bytes MAY be invalid); but by adding a negotiated flag for structured length replies, it would be possible to allow for short reads and/or returning an error with 0 bytes of payload but still keeping the connection to the client open, without having to send wasted bytes over the wire. I could write up a negotiation of global flags for structured reply lengths as an extension proposal, if you think it is worth it. --=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --a32eluTSvqNorem37OURfHSB89U1AqdkW Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJW9GVNAAoJEKeha0olJ0Nqz5UH+QEntVzxJIYQqDEEIgLsLI++ TgvVzQywn7HlsZ7CGSDGsdltZmALkcHuBUNQ0OpyBEJfQ1SwATWvjH+oyBNzx3j/ FWNu6qwR73Er+UHvbSkK3/NxctkYTijLO0mN7r/DsDPQxm3/a34JNHRShuGcRjEh f19uIBC6FhHQ1utZB2i01woeq5823llo7kkj5yLEPHcamEuMj9BiHbG5GoeOVg1F 22ThodabCKZn9WipTnG1t83OqZson5BxFFvCGfUqW/9CH3eJOVSmjD09k/wHpO2L WtdINDHwwujfHZ+scLmoS2Hew9uvJqVlIPGzGTY38oDnc8dortB/BdXxAJPsFmo= =COFQ -----END PGP SIGNATURE----- --a32eluTSvqNorem37OURfHSB89U1AqdkW--