From: Eric Blake
Date: Thu, 24 Mar 2016 09:25:27 -0600
Subject: Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension
To: Paolo Bonzini, Wouter Verhelst, "Denis V. Lunev"
Cc: nbd-general@lists.sourceforge.net, Kevin Wolf, qemu-devel@nongnu.org, Stefan Hajnoczi
Message-ID: <56F406E7.4010207@redhat.com>
In-Reply-To: <56F3D5C7.9070007@redhat.com>
References: <1458742562-30624-1-git-send-email-den@openvz.org> <1458742562-30624-3-git-send-email-den@openvz.org> <20160323175834.GC2467@grep.be> <56F3D5C7.9070007@redhat.com>

On 03/24/2016 05:55 AM, Paolo Bonzini wrote:
>> As Eric noted, please expand LBA at least once.
>
> Let's just use "block" (e.g. NBD_CMD_GET_BLOCK_STATUS).

Yes, avoiding the term LBA and using BLOCK everywhere also nicely
solves the problem of introducing yet more terminology.

>
>>> +    - 32 bits, length of parameter data that follow (unsigned)
>>> +    - zero or more LBA status descriptors, each having the following
>>> +      structure:
>>> +
>>> +        * 64 bits, offset (unsigned)
>>> +        * 32 bits, length (unsigned)
>>> +        * 16 bits, status (unsigned)
>>> +
>>> +    unless an error condition has occurred.
>>> +
>
> Can we just return one descriptor?  That would simplify the protocol a bit.

As in, the return is exactly one descriptor, consisting of:

* 32 bits, length (unsigned): must be > 0, <= the client's length
* 16 bits, status (unsigned): status of that block

Of course, it means more traffic.  The nice part about returning an
array of descriptors is that I can learn the status of 1G of the file,
even if the file alternates extent status every 512 bytes, in just one
client call.  But returning only a single descriptor at a time means
I'd have to make 2M client calls to learn the same pattern of
allocation.  Fortunately, in the common case, allocation patterns tend
to not be that disjoint.

On the other hand, returning only one descriptor at a time (for
possibly less length than the client requested) may be easier when
using lseek(SEEK_DATA/HOLE) as the mechanism for determining the bounds
of each extent, since the server only has to search once per command,
instead of dynamically constructing the entire reply.

I don't have any strong opinions on which would be better, but it is
definitely food for thought.

>
> However, let's make these bits, so that
>
> NBD_STATE_ALLOCATED (0x1), LBA extent is present on the block device
> NBD_STATE_ZERO (0x2), LBA extent will read as zeroes

Should we flip the sense and call this NBD_STATE_UNALLOCATED (0 means
allocated, 1 means not present), so that an overall status of 0 is a
safe default?
(That is, it should always be safe to state a sector is allocated when
it is not, and always safe to state a sector is not known to read as
zeroes even if that happens to be its contents - all that we lose by
reporting this safe default state is that the client will be unable to
optimize for skipping holes).

>> Either the spec should define what it means for a block to be in a dirty
>> state, or it should not talk about it.
>
> Here is my attempt:
>
>     This command is meant to operate in tandem with other (non-NBD)
>     channels to the server.  Generally, a "dirty" block is a block that
>     has been written to by someone, but the exact meaning of "has been
>     written" is left to the implementation.  For example, a virtual
>     machine monitor could provide a (non-NBD) command to start tracking
>     blocks written by the virtual machine.  A backup client then can
>     connect to an NBD server provided by the virtual machine monitor
>     and use NBD_CMD_GET_BLOCK_STATUS only read blocks that the virtual

s/only/to only/

>     machine has changed.

s/changed/changed since it started tracking/

>
>     An implementation that doesn't track the "dirtiness" state of blocks
>     MUST either fail this command with EINVAL, or mark all blocks as
>     dirty in the descriptor that it returns.

Is it feasible to return zero/allocated/dirty status all at the same
time, or do we want to strictly require two different modes of
operation?  That is, if we are returning zero and allocated as two
bits, can we also return a third bit for dirty/clean?  Should we flip
the sense of the bit, where 0 means dirty and 1 means clean, again so
that a server can always return a status of 0 as the safe default?

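To make the single-descriptor idea and the "0 is the safe default" encoding concrete, here is a rough Python sketch. It is an illustration under stated assumptions, not proposed spec text: the numeric values of the flipped-sense bits, the name NBD_STATE_CLEAN, and the helpers pack_descriptor(), unpack_descriptor(), next_extent() and calls_needed() are all hypothetical. The descriptor is packed in network byte order, as elsewhere in NBD, and next_extent() assumes a platform where os.SEEK_DATA and os.SEEK_HOLE are available and a caller that has already checked the offset against the export size.

```python
import errno
import os
import struct

# Flag names follow the discussion above. The bit values for the
# flipped-sense flags and the NBD_STATE_CLEAN name are assumptions.
NBD_STATE_UNALLOCATED = 0x1  # 1 = not present, 0 = allocated
NBD_STATE_ZERO = 0x2         # 1 = known to read as zeroes
NBD_STATE_CLEAN = 0x4        # hypothetical: 1 = clean, 0 = dirty

# With every bit flipped this way, 0 means "allocated, not known to be
# zero, dirty", which a server can always report safely.
SAFE_DEFAULT = 0

# One descriptor: 32-bit length, 16-bit status, big-endian, no padding.
DESCRIPTOR = struct.Struct(">IH")


def pack_descriptor(length, status, requested):
    """Encode the single reply descriptor for one request."""
    if not 0 < length <= requested:
        raise ValueError("length must be > 0 and <= the client's length")
    return DESCRIPTOR.pack(length, status)


def unpack_descriptor(data):
    """Decode a descriptor into a (length, status) tuple."""
    return DESCRIPTOR.unpack(data)


def next_extent(fd, offset, requested):
    """Return (length, status) for the one extent starting at offset.

    Only one SEEK_DATA/SEEK_HOLE search is needed per command, and the
    result may be shorter than the client asked for.
    """
    end = offset + requested
    try:
        data_start = os.lseek(fd, offset, os.SEEK_DATA)
    except OSError as exc:
        if exc.errno != errno.ENXIO:
            raise
        # No data at or after offset: the whole request is a hole.
        return requested, NBD_STATE_UNALLOCATED | NBD_STATE_ZERO
    if data_start > offset:
        # offset sits in a hole that ends where the next data begins.
        hole_len = min(data_start, end) - offset
        return hole_len, NBD_STATE_UNALLOCATED | NBD_STATE_ZERO
    # offset sits in data; the extent ends at the next hole.
    hole_start = os.lseek(fd, offset, os.SEEK_HOLE)
    return min(hole_start, end) - offset, SAFE_DEFAULT


def calls_needed(region, extent):
    """Client calls needed if every reply covers only `extent` bytes."""
    return -(-region // extent)  # ceiling division
```

For the worst case described above, calls_needed(1 << 30, 512) comes to 2097152, which is the "2M client calls" figure. A server that cannot (or does not want to) probe the backing file could skip next_extent() entirely and reply with pack_descriptor(requested, SAFE_DEFAULT, requested).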
-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org