From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39957)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Sergey.Talantov@acronis.com>) id 1cC2go-00060i-K5
	for qemu-devel@nongnu.org; Wed, 30 Nov 2016 06:04:41 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <Sergey.Talantov@acronis.com>) id 1cC2gk-0003UB-9w
	for qemu-devel@nongnu.org; Wed, 30 Nov 2016 06:04:38 -0500
Received: from us-edge-1.acronis.com ([69.20.59.99]:48180)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <Sergey.Talantov@acronis.com>)
	id 1cC2gk-0003TK-4S
	for qemu-devel@nongnu.org; Wed, 30 Nov 2016 06:04:34 -0500
From: Sergey Talantov <Sergey.Talantov@acronis.com>
Date: Wed, 30 Nov 2016 10:41:54 +0000
Message-ID: <2f42edce999e4d15b9d5343cc9bbb948@ru-exch-1.corp.acronis.com>
References: <1480073296-6931-1-git-send-email-vsementsov@virtuozzo.com>
	<20161127191714.7wrvzbx7zpotad22@grep.be>
In-Reply-To: <20161127191714.7wrvzbx7zpotad22@grep.be>
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [Nbd] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS
 extension
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Wouter Verhelst <w@uter.be>, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>, "kwolf@redhat.com" <kwolf@redhat.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "pborzenkov@virtuozzo.com" <pborzenkov@virtuozzo.com>, "stefanha@redhat.com" <stefanha@redhat.com>, "pbonzini@redhat.com" <pbonzini@redhat.com>, "mpa@pengutronix.de" <mpa@pengutronix.de>, "den@openvz.org" <den@openvz.org>

Hi, Wouter!

> Actually, come to think of that. What is the exact use case for this thin=
g? I understand you're trying to create incremental backups of things, whic=
h would imply you don't write from the client that is getting the ?
> block status thingies, right?

Overall, the most desired use case for this NBD extension is to allow 3-rd =
party software to make incremental backups.
Acronis (vendor of backup solutions) would support qemu backup if block sta=
tus is provided.


-----Original Message-----
From: Wouter Verhelst [mailto:w@uter.be]=20
Sent: Sunday, November 27, 2016 22:17
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: nbd-general@lists.sourceforge.net; kwolf@redhat.com; qemu-devel@nongnu.=
org; pborzenkov@virtuozzo.com; stefanha@redhat.com; pbonzini@redhat.com; mp=
a@pengutronix.de; den@openvz.org
Subject: Re: [Nbd] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension

Hi Vladimir,

Quickly: the reason I haven't merged this yes is twofold:
- I wasn't thrilled with the proposal at the time. It felt a bit
  hackish, and bolted onto NBD so you could use it, but without defining
  everything in the NBD protocol. "We're reading some data, but it's not
  about you". That didn't feel right
- There were a number of questions still unanswered (you're answering a
  few below, so that's good).

For clarity, I have no objection whatsoever to adding more commands if they=
're useful, but I would prefer that they're also useful with NBD on its own=
, i.e., without requiring an initiation or correlation of some state throug=
h another protocol or network connection or whatever. If that's needed, tha=
t feels like I didn't do my job properly, if you get my point.

On Fri, Nov 25, 2016 at 02:28:16PM +0300, Vladimir Sementsov-Ogievskiy wrot=
e:
> With the availability of sparse storage formats, it is often needed to=20
> query status of a particular range and read only those blocks of data=20
> that are actually present on the block device.
>=20
> To provide such information, the patch adds the BLOCK_STATUS extension=20
> with one new NBD_CMD_BLOCK_STATUS command, a new structured reply=20
> chunk format, and a new transmission flag.
>=20
> There exists a concept of data dirtiness, which is required during,=20
> for example, incremental block device backup. To express this concept=20
> via NBD protocol, this patch also adds a flag to NBD_CMD_BLOCK_STATUS=20
> to request dirtiness information rather than provisioning information;=20
> however, with the current proposal, data dirtiness is only useful with=20
> additional coordination outside of the NBD protocol (such as a way to=20
> start and stop the server from tracking dirty sectors).  Future NBD=20
> extensions may add commands to control dirtiness through NBD.
>=20
> Since NBD protocol has no notion of block size, and to mimic SCSI "GET=20
> LBA STATUS" command more closely, it has been chosen to return a list=20
> of extents in the response of NBD_CMD_BLOCK_STATUS command, instead of=20
> a bitmap.
>=20
> CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
> CC: Denis V. Lunev <den@openvz.org>
> CC: Wouter Verhelst <w@uter.be>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>=20
> v3:
>=20
> Hi all. This is almost a resend of v2 (by Eric Blake), The only change=20
> is removing the restriction, that sum of status descriptor lengths=20
> must be equal to requested length. I.e., let's permit the server to=20
> replay with less data than required if it wants.

Reasonable, yes. The length that the client requests should be a maximum (i=
.e.
"I'm interested in this range"), not an exact request.

> Also, bit of NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as 8 is now =20
> NBD_FLAG_CAN_MULTI_CONN in master branch.

Right.

> And, finally, I've rebased this onto current state of=20
> extension-structured-reply branch (which itself should be rebased on=20
> master IMHO).

Probably a good idea, given the above.

> By this resend I just want to continue the diqussion, started about=20
> half a year ago. Here is a summary of some questions and ideas from v2
> diqussion:
>=20
> 1. Q: Synchronisation. Is such data (dirty/allocated) reliable?=20
>    A: This all is for read-only disks, so the data is static and unchange=
able.

I think we should declare that it's up to the client to ensure no other wri=
tes happen without its knowledge. This may be because the client and server=
 communicate out of band about state changes, or because the client somehow=
 knows that it's the only writer, or whatever.

We can easily do that by declaring that the result of that command only tal=
ks about *current* state, and that concurrent writes by different clients m=
ay invalidate the state. This is true for NBD in general (i.e., concurrent =
read or write commands from other clients may confuse file systems on top o=
f NBD), so it doesn't change expectations in any way.

> 2. Q: different granularities of dirty/allocated bitmaps. Any problems?
>    A: 1: server replies with status descriptors of any size, granularity
>          is hidden from the client
>       2: dirty/allocated requests are separate and unrelated to each
>          other, so their granularities are not intersecting

Not entirely sure anymore what this is about?

> 3. Q: selecting of dirty bitmap to export
>    A: several variants:
>       1: id of bitmap is in flags field of request
>           pros: - simple
>           cons: - it's a hack. flags field is for other uses.
>                 - we'll have to map bitmap names to these "ids"
>       2: introduce extended nbd requests with variable length and exploit=
 this
>          feature for BLOCK_STATUS command, specifying bitmap identifier.
>          pros: - looks like a true way
>          cons: - we have to create additional extension
>                - possible we have to create a map,
>                  {<QEMU bitmap name> <=3D> <NBD bitmap id>}
>       3: exteranl tool should select which bitmap to export. So, in case =
of Qemu
>          it should be something like qmp command block-export-dirty-bitma=
p.
>          pros: - simple
>                - we can extend it to behave like (2) later
>          cons: - additional qmp command to implement (possibly, the lesse=
r evil)
>          note: Hmm, external tool can make chose between allocated/dirty =
data too,
>                so, we can remove 'NBD_FLAG_STATUS_DIRTY' flag at all.

Downside of 3, though, is that it moves the definition of what the differen=
t states mean outside of the NBD protocol (i.e., the protocol messages are =
not entirely defined anymore, and their meaning depends on the clients and =
servers in use).

To avoid this, we should have a clear definition of what the reply means *b=
y default*, but then we can add a note that clients and servers can possibl=
y define other meanings out of band if they want to.

> 4. Q: Should not get_{allocated,dirty} be separate commands?
>    cons: Two commands with almost same semantic and similar means?
>    pros: However here is a good point of separating clearly defined and n=
ative
>          for block devices GET_BLOCK_STATUS from user-driven and actually
>          undefined data, called 'dirtyness'.

Yeah, having them separate commands might be a bad idea indeed.

> 5. Number of status descriptors, sent by server, should be restricted
>    variants:
>    1: just allow server to restrict this as it wants (which was done in v=
3)
>    2: (not excluding 1). Client specifies somehow the maximum for number
>       of descriptors.
>       2.1: add command flag, which will request only one descriptor
>            (otherwise, no restrictions from the client)
>       2.2: again, introduce extended nbd requests, and add field to
>            specify this maximum

I think having a flag which requests just one descriptor can be useful, but=
 I'm hesitant to add it unless it's actually going to be used; so in other =
words, I'll leave the decision on that bit to you.

> 6. A: What to do with unspecified flags (in request/reply)?
>    I think the normal variant is to make them reserved. (Server should
>    return EINVAL if found unknown bits, client should consider replay
>    with unknown bits as an error)

Right, probably best to do that, yes.

> =3D=3D=3D=3D=3D=3D
>=20
> Also, an idea on 2-4:
>=20
>     As we say, that dirtiness is unknown for NBD, and external tool
>     should specify, manage and understand, which data is actually
>     transmitted, why not just call it user_data and leave status field
>     of reply chunk unspecified in this case?
>=20
>     So, I propose one flag for NBD_CMD_BLOCK_STATUS:
>     NBD_FLAG_STATUS_USER. If it is clear, than behaviour is defined by
>     Eric's 'Block provisioning status' paragraph.  If it is set, we just
>     leave status field for some external... protocol? Who knows, what is
>     this user data.

Yes, this sounds like a reasonable approach.

>     Note: I'm not sure, that I like this (my) proposal. It's just an
>     idea, may be someone like it.  And, I think, it represents what we
>     are trying to do more honestly.

Indeed.

>     Note2: the next step of generalization will be NBD_CMD_USER, with
>     variable request size, structured reply and no definition :)

Well, er, no please, if we can avoid it :-)

> Another idea, about backups themselves:
>=20
>     Why do we need allocated/zero status for backup? IMHO we don't.

Well, I've been thinking so all along, but then I don't really know what it=
 is, in detail, that you want to do :-)

I can understand a "has this changed since time X" request, which the "dirt=
y" thing seems to want to be. Whether something is allocated is just a spec=
ial case of that.

Actually, come to think of that. What is the exact use case for this thing?=
 I understand you're trying to create incremental backups of things, which =
would imply you don't write from the client that is getting the block statu=
s thingies, right? If so, how about:

- NBD_OPT_GET_SNAPSHOTS (during negotiation): returns a list of
  snapshots. Not required, optional, includes a machine-readable form,
  not defined by NBD, which explains what the snapshot is about (e.g., a
  qemu json file). The "base" version of that is just "allocation
  status", and is implied (i.e., you don't need to run
  NBD_OPT_GET_SNAPSHOTS if you're not interested in anything but the
  allocation status).
- NBD_CMD_BLOCK_STATUS (during transmission), returns block descriptors
  which tell you what the status of a block of data is for each of the
  relevant snapshots that we know about.

Perhaps this is somewhat overengineered, but it does bring most of the defi=
nition of what a snapshot is back into the NBD protocol, without having to =
say "this could be anything", and without requiring connectivity over two p=
orts for this to be useful (e.g., you could store the machine-readable form=
 of the snapshot description into your backup program and match what they m=
ean with what you're interested in at restore time, etc).

This wouldn't work if you're interested in new snapshots that get created o=
nce we've already moved into transmission, but hey.

Thoughts?

>     Full backup: just do structured read - it will show us, which chunks
>     may be treaded as zeroes.

Right.

>     Incremental backup: get dirty bitmap (somehow, for example through
>     user-defined part of proposed command), than, for dirty blocks, read
>     them through structured read, so information about zero/unallocated
>     areas are here.
>=20
> For me all the variants above are OK. Let's finally choose something.
>=20
> v2:
> v1 was:=20
> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg05574.html
>=20
> Since then, we've added the STRUCTURED_REPLY extension, which=20
> necessitates a rather larger rebase; I've also changed things to=20
> rename the command 'NBD_CMD_BLOCK_STATUS', changed the request modes=20
> to be determined by boolean flags (rather than by fixed values of the=20
> 16-bit flags field), changed the reply status fields to be bitwise-or=20
> values (with a default of 0 always being sane), and changed the=20
> descriptor layout to drop an offset but to include a 32-bit status so=20
> that the descriptor is nicely 8-byte aligned without padding.
>=20
>  doc/proto.md | 155=20
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 154 insertions(+), 1 deletion(-)

[...]

I'll commit this in a minute into a separate branch called "extension-block=
status", under the understanding that changes are still required, as per ab=
ove (i.e., don't assume that just because there's a branch I'm happy with t=
he current result ;-)

Regards

--
< ron> I mean, the main *practical* problem with C++, is there's like a doz=
en
       people in the world who think they really understand all of its rule=
s,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

---------------------------------------------------------------------------=
---
_______________________________________________
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general