From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52910) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1an1nn-0003Wm-8p for qemu-devel@nongnu.org; Mon, 04 Apr 2016 06:32:12 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1an1ni-0003HM-I6 for qemu-devel@nongnu.org; Mon, 04 Apr 2016 06:32:11 -0400 Received: from metis.ext.4.pengutronix.de ([92.198.50.35]:50547) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1an1ni-0003H2-8Y for qemu-devel@nongnu.org; Mon, 04 Apr 2016 06:32:06 -0400 From: Markus Pargmann Date: Mon, 04 Apr 2016 12:32 +0200 Message-ID: <6085102.BIbixMc99f@adelgunde> In-Reply-To: <56F954BC.5010606@redhat.com> References: <1458742562-30624-1-git-send-email-den@openvz.org> <20160325084929.GA2671@grep.be> <56F954BC.5010606@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1575767.6UFzRbWvKL"; micalg="pgp-sha256"; protocol="application/pgp-signature" Subject: Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake Cc: nbd-general@lists.sourceforge.net, Kevin Wolf , "Denis V. Lunev" , qemu-devel@nongnu.org, Stefan Hajnoczi , Paolo Bonzini , Wouter Verhelst --nextPart1575767.6UFzRbWvKL Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="us-ascii" Hi, On Monday 28 March 2016 09:58:52 Eric Blake wrote: > On 03/25/2016 02:49 AM, Wouter Verhelst wrote: >=20 > >> You may also want to add a rule that for all future extensions, an= y > >> command that requires data in the server response (other than the = server > >> response to NBD_CMD_READ) must include a 32-bit length as the firs= t > >> field of its data payload, so that the server reply is always stru= ctured. > >=20 > > Right. > >=20 > >> Hmm, I still think it would be hard to write a wireshark dissector= for > >> server replies, since there is no indication whether a given reply= will > >> include data or not. > >=20 > > Well, a wireshark dissector already exists. However, it is very old= ; it > > doesn't support the NBD_CMD_TRIM or NBD_CMD_FLUSH commands, it does= n't > > support the NBD_CMD_FLAG_FUA option to NBD_CMD_WRITE, and it doesn'= t > > deal with negotiation at all. It was written when newstyle negotiat= ion > > didn't exist yet, and scanning for INIT_PASSWD at the time was too = hard, > > according to the guy who wrote it (don't remember who that was). >=20 > And I've never written a wireshark dissector myself, to know how easy= or > hard it would be to extend that work. >=20 > >=20 > >> The _client_ knows (well, any well-written client > >> that uses a different value for 'handle' for every command sent to= the > >> server), because it can read the returned 'handle' field to know w= hat > >> command it matches to and therefore what data to expect; but a > >> third-party observer seeing _just_ the server messages has no idea= which > >> server responses should have payload. > >=20 > > It can if it knows enough about the protocol, but granted, that mak= es it > > harder for us to extend the protocol without breaking the dissector= . > >=20 > >> Scanning the stream and assuming that a new reply starts each time= > >> NBD_REPLY_MAGIC is encountered is a close approximation, but hits > >> false positives if the data being returned for NBD_CMD_READ contai= ns > >> the same magic number as part of its contents. > >=20 > > Indeed. >=20 > One benefit of TCP: we can rely on packet boundaries (whether or not > fragmentation is also happening); I'm not sure if UDP shares the same= > benefits (then again, I'm not even sure if UDP is usable for the NBD > protocol, since we definitely rely on in-order delivery of packets: a= > read command can definitely return more bytes than even a jumbo frame= > can contain, even though we do allow out-of-order delivery of replies= ). > So if the server always sends each reply as the start of its own pac= ket > (rather than coalescing multiple replies into a single network packet= ), > and the dissector only looks for magic numbers at the start of a pack= et, > then any server packet not starting with the magic number must be dat= a > payload, and you could potentially even avoid the false positives by > choosing to break data replies into packets by adjusting packet lengt= hs > by one byte any time the next data chunk would otherwise happen to st= art > with the same two bytes as the magic number. But I haven't actually > tested any of this, to know if packet coalescing goes on, and I > certainly don't think it is worth complicating the server to avoid fa= lse > positive magic number detection by splitting read payloads across pac= ket > boundaries, just for the benefit of wire-sniffing. >=20 > > I like this. However, before committing to it, I would like to see > > Markus' feedback on that (explicit Cc added, even though he's on th= e > > list, too). > >=20 > > We'd also need a way to communicate the ability to speak this proto= col > > from the kernel to userspace before telling the server that the cli= ent > > understands something which maybe its kernel doesn't. >=20 > proto.md already documents that ioctl()s exist for the user space to > inform the kernel about options sent by the server prior to kicking o= ff > transmission phase, and that the NBD protocol itself does not go into= > full detail about available ioctl()s (the kernel/userspace coordinati= on > does not affect the protocol). It seems fairly straightforward for t= he > kernel to supply another ioctl() that userspace can use to learn whet= her > it is allowed or forbidden from advertising structured reply status > during the handshake phase (including the case where the ioctl() is n= ot > present being treated as the client must not enable structured replie= s). Depending on the implementation we may not even need to communicate the= used NBD protocol from userspace to the kernel. We have a different magic and the magic stays at the beginning of the message so we can simply use the magic to decide what message we have. Of course that would need another receive which may be too slow. For the other direction, giving the userspace information about the capabilities of the kernel implementation, it may be better to use a sysfs entry? All current ioctls are used for the exact nbd device it is= used on. But implementation capabilities are a common property over all= nbd devices. Best Regards, Markus =2D-=20 Pengutronix e.K. | = | Industrial Linux Solutions | http://www.pengutronix.de/= | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 = | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-555= 5 | --nextPart1575767.6UFzRbWvKL Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part. Content-Transfer-Encoding: 7Bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXAkKgAAoJEEpcgKtcEGQQpukQAK5PPitFGrKcEoJ6QjMPhRtB ggCGUYlsjH6rBfITVpFDdh5g81Y1df00+9qxpZ8ekmjfEkBWPBtC2nF1arPf0Vwp WdX2l6RXj8+4/41vgSgyc8NOlnOLb+5GE8GbpyyLYOCZrn3h0udAJuJkNs99imLK ePlCGiMg5wupeBx7q79Cw/0PFWlIb2eFhc/QLb5Uz6QDhoHXw630cbBcNWf6gem+ 6ns0PfyMA4FPhiSmKCjN1VBoKcgBqMRI6mCro0dzs65DQqaSnwz+N94kn9Kyg7aA eg+ZDQqx1maykeaa7yhzxlXLke0GR5gBKj8axgD3reOv6NXH7JIZVst4hCibfbbO /OEYzlAdjh56NVpiDfCKer67MY8INi2rpydvrAyCfZknGAWkaI29iDH4SV2ElVWZ KI5TJAhoDWKNxQfL5APV7ZBQ+O8yt4DjobX+NBiZoy2XR2nCFXfUmNa4pPohUVQm kAXRylUwshTsLJQ5nRahbx38zAEu5HfH3+ps0+Rl6BKesPWFgaItgBcAzlyx9TIP v+FqNFZ3YnoYM4q5+dT34UF20pbKAbD1u1/fNiM2EdoUG+hLzCy3GutSYmUZAb5c O767kfMkEWNti01kdDQJRq5GScsnEwdtnBOPC0tJKJWE42AFoqMsQCaGELphGMUC tPS2oJCDmttPYgYjh78Z =saht -----END PGP SIGNATURE----- --nextPart1575767.6UFzRbWvKL--