From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:54952) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cBhyr-0007Fs-V4 for qemu-devel@nongnu.org; Tue, 29 Nov 2016 07:57:55 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cBhyo-0004sW-I5 for qemu-devel@nongnu.org; Tue, 29 Nov 2016 07:57:54 -0500 Received: from mail.avalus.com ([89.16.176.221]:43824) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cBhyo-0004qu-7m for qemu-devel@nongnu.org; Tue, 29 Nov 2016 07:57:50 -0500 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) From: Alex Bligh In-Reply-To: <1480073296-6931-1-git-send-email-vsementsov@virtuozzo.com> Date: Tue, 29 Nov 2016 12:57:46 +0000 Content-Transfer-Encoding: quoted-printable Message-Id: References: <1480073296-6931-1-git-send-email-vsementsov@virtuozzo.com> Subject: Re: [Qemu-devel] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladimir Sementsov-Ogievskiy Cc: Alex Bligh , "nbd-general@lists.sourceforge.net" , "qemu-devel@nongnu.org" , Kevin Wolf , Paolo Bonzini , Pavel Borzenkov , "Stefan stefanha@redhat. com" , "Denis V. Lunev" , Wouter Verhelst , Eric Blake , mpa@pengutronix.de Vladimir, I went back to April to reread the previous train of conversation then found you had helpfully summarised some if it. Comments below. Rather than comment on many of the points individual, the root of my confusion and to some extent uncomfortableness about this proposal is 'who owns the meaning the the bitmaps'. Some of this is my own confusion (sorry) about the use to which this is being put, which is I think at root a documentation issue. To illustrate this, you write in the FAQ section that this is for read only disks, but the text talks about: +Some storage formats and operations over such formats express a +concept of data dirtiness. Whether the operation is block device +mirroring, incremental block device backup or any other operation with +a concept of data dirtiness, they all share a need to provide a list +of ranges that this particular operation treats as dirty. How can data be 'dirty' if it is static and unchangeable? (I thought) I now think what you are talking about backing up a *snapshot* of a disk that's running, where the disk itself was not connected using NBD? IE = it's not being 'made dirty' by NBD_CMD_WRITE etc. Rather 'dirtiness' is = effectively an opaque state represented in a bitmap, which is binary metadata at some particular level of granularity. It might as well be 'happiness' or 'is coloured blue'. The NBD server would (normally) have no way of manipulating this bitmap. In previous comments, I said 'how come we can set the dirty bit through writes but can't clear it?'. This (my statement) is now I think wrong, as NBD_CMD_WRITE etc. is not defined to set the dirty bit. The state of the bitmap comes from whatever sets the bitmap which is outside the scope of this protocol to transmit it. However, we have the uncomfortable (to me) situation where the protocol describes a flag 'dirty', with implications as to what it does, but no actual strict definition of how it's set. So any 'other' user has no real idea how to use the information, or how to implement a server that provides a 'dirty' bit, because the semantics of that aren't within the protocol. This does not sit happily with me. So I'm wondering whether we should simplify and generalise this spec. = You say that for the dirty flag, there's no specification of when it is set and cleared - that's implementation defined. Would it not be better then to say 'that whole thing is private to Qemu - even the name'. Rather you could read the list of bitmaps a server has, with a textual name, each having an index (and perhaps a granularity). You could then ask on NBD_CMD_BLOCK_STATUS for the appropriate index, and get back that bitmap value. Some names (e.g. 'ALLOCATED') could be defined in the = spec, and some (e.g. ones beginning with 'X-') could be reserved for user usage. So you could use 'X-QEMU-DIRTY'). If you then change what your dirty bit means, you could use 'X-QEMU-DIRTY2' or similar, and not need a protocol change to support it. IE rather than looking at 'a way of reading the dirty bit', we could have this as a generic way of reading opaque bitmaps. Only one = (allocation) might be given meaning to start off with, and it wouldn't be necessary for all servers to support that - i.e. you could support bitmap reading without having an ALLOCATION bitmap available. This spec would then be limited to the transmission of the bitmaps (remove the word 'dirty' entirely, except perhaps as an illustrative use case), and include only the definition of the allocation bitmap. Some more nits: > Also, bit of NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as 8 is now > NBD_FLAG_CAN_MULTI_CONN in master branch. >=20 > And, finally, I've rebased this onto current state of > extension-structured-reply branch (which itself should be rebased on > master IMHO). Each documentation branch should normally be branched off master unless it depends on another extension (in which case it will be branched from = that). I haven't been rebasing them frequently as it can disrupt those working on the branches. There's only really an issue around rebasing where you depend on another branch. > 2. Q: different granularities of dirty/allocated bitmaps. Any = problems? > A: 1: server replies with status descriptors of any size, = granularity > is hidden from the client > 2: dirty/allocated requests are separate and unrelated to each > other, so their granularities are not intersecting I'm OK with this, but note that you do actually mention a granularity of sorts in the spec (512 byes) - I think you should replace that with the minimum block size. > 3. Q: selecting of dirty bitmap to export > A: several variants: > 1: id of bitmap is in flags field of request > pros: - simple > cons: - it's a hack. flags field is for other uses. > - we'll have to map bitmap names to these "ids" > 2: introduce extended nbd requests with variable length and = exploit this > feature for BLOCK_STATUS command, specifying bitmap = identifier. > pros: - looks like a true way > cons: - we have to create additional extension > - possible we have to create a map, > { <=3D> } > 3: exteranl tool should select which bitmap to export. So, in = case of Qemu > it should be something like qmp command = block-export-dirty-bitmap. > pros: - simple > - we can extend it to behave like (2) later > cons: - additional qmp command to implement (possibly, the = lesser evil) > note: Hmm, external tool can make chose between = allocated/dirty data too, > so, we can remove 'NBD_FLAG_STATUS_DIRTY' flag at all. Yes, this is all pretty horrible. I suspect we want to do something like = (2), and permit extra data across (in my proposal, it would only be one byte = to select the index). I suppose one could ask for a list of bitmaps. > 4. Q: Should not get_{allocated,dirty} be separate commands? > cons: Two commands with almost same semantic and similar means? > pros: However here is a good point of separating clearly defined and = native > for block devices GET_BLOCK_STATUS from user-driven and = actually > undefined data, called 'dirtyness'. I'm suggesting one generic 'read bitmap' command like you. > 5. Number of status descriptors, sent by server, should be restricted > variants: > 1: just allow server to restrict this as it wants (which was done in = v3) > 2: (not excluding 1). Client specifies somehow the maximum for = number > of descriptors. > 2.1: add command flag, which will request only one descriptor > (otherwise, no restrictions from the client) > 2.2: again, introduce extended nbd requests, and add field to > specify this maximum I think some form of extended request is the way to go, but out of interest, what's the issue with as many descriptors being sent as it takes to encode the reply? The client can just consume the remainder (without buffering) and reissue the request at a later point for the areas it discarded. >=20 > 6. A: What to do with unspecified flags (in request/reply)? > I think the normal variant is to make them reserved. (Server should > return EINVAL if found unknown bits, client should consider replay > with unknown bits as an error) Yeah. >=20 > + > +* `NBD_CMD_BLOCK_STATUS` > + > + A block status query request. Length and offset define the range > + of interest. Clients SHOULD NOT use this request unless the = server MUST NOT is what we say elsewhere I believe. > + set `NBD_CMD_SEND_BLOCK_STATUS` in the transmission flags, which > + in turn requires the client to first negotiate structured = replies. > + For a successful return, the server MUST use a structured reply, > + containing at most one chunk of type = `NBD_REPLY_TYPE_BLOCK_STATUS`. Nit: are you saying that non-structured error replies are permissible? You're always/often going to get a non-structured (simple) error reply if the server doesn't support the command, but I think it would be fair = to say the server MUST use a structured reply to NBD_CMD_SEND_BLOCK_STATUS if it supports the command. This is effectively what we say re = NBD_CMD_READ. > + > + The list of block status descriptors within the > + `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive = portions > + of the file starting from specified *offset*, and the sum of the > + *length* fields of each descriptor MUST not be greater than the > + overall *length* of the request. This means that the server MAY > + return less data than required. However the server MUST return at > + least one status descriptor I'm not sure I understand why that's useful. What should the client infer from the server refusing to provide information? We don't permit short reads etc. > . The server SHOULD use different > + *status* values between consecutive descriptors, and SHOULD use > + descriptor lengths that are an integer multiple of 512 bytes = where > + possible (the first and last descriptor of an unaligned query = being > + the most obvious places for an exception). Surely better would be an an integer multiple of the minimum block size. Being able to offer bitmap support at finer granularity than the absolute minimum block size helps no one, and if it were possible to support a 256 byte block size (I think some floppy disks had that) I see no reason not to support that as a granularity. --=20 Alex Bligh