From: Stefan Hajnoczi <stefanha@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: nbd-general@lists.sourceforge.net, qemu-devel@nongnu.org,
	kwolf@redhat.com, pbonzini@redhat.com, pborzenkov@virtuozzo.com,
	den@openvz.org, w@uter.be, eblake@redhat.com, alex@alex.org.uk,
	mpa@pengutronix.de
Subject: Re: [Qemu-devel] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension
Date: Fri, 25 Nov 2016 14:02:48 +0000
Message-ID: <20161125140248.GA7493@stefanha-x1.localdomain>
In-Reply-To: <1480073296-6931-1-git-send-email-vsementsov@virtuozzo.com>

On Fri, Nov 25, 2016 at 02:28:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> With the availability of sparse storage formats, it is often necessary
> to query the status of a particular range and read only those blocks of
> data that are actually present on the block device.
> 
> To provide such information, the patch adds the BLOCK_STATUS
> extension with one new NBD_CMD_BLOCK_STATUS command, a new
> structured reply chunk format, and a new transmission flag.
> 
> There exists a concept of data dirtiness, which is required
> during, for example, incremental block device backup. To express
> this concept via the NBD protocol, this patch also adds a flag to
> NBD_CMD_BLOCK_STATUS to request dirtiness information rather than
> provisioning information; however, with the current proposal, data
> dirtiness is only useful with additional coordination outside of
> the NBD protocol (such as a way to start and stop the server from
> tracking dirty sectors).  Future NBD extensions may add commands
> to control dirtiness through NBD.
> 
> Since the NBD protocol has no notion of block size, and to mimic the SCSI
> "GET LBA STATUS" command more closely, we chose to return
> a list of extents in the response to the NBD_CMD_BLOCK_STATUS command,
> instead of a bitmap.
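For illustration, a minimal Python sketch of why extents hide the block size. The helper name is hypothetical and not part of the proposal: a server that tracks status at some internal granularity can run-length encode its per-block bits into (length, status) extents, and the client only ever sees byte lengths.

```python
from typing import List, Tuple


def bitmap_to_extents(bits: List[int], granularity: int) -> List[Tuple[int, int]]:
    """Run-length encode a per-block status bitmap into (length, status) extents.

    `granularity` is the server's internal block size in bytes. It only
    affects the byte lengths of the extents; the client never sees it.
    """
    extents: List[Tuple[int, int]] = []
    for bit in bits:
        if extents and extents[-1][1] == bit:
            # Same status as the previous block: grow the current extent.
            length, status = extents[-1]
            extents[-1] = (length + granularity, status)
        else:
            # Status changed: start a new extent.
            extents.append((granularity, bit))
    return extents


# A 64 KiB-granularity bitmap covering 256 KiB of disk.
print(bitmap_to_extents([1, 1, 0, 1], 65536))
# [(131072, 1), (65536, 0), (65536, 1)]
```

Four bitmap entries collapse into three descriptors, and a run of identical blocks costs one descriptor regardless of its length.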
> 
> CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
> CC: Denis V. Lunev <den@openvz.org>
> CC: Wouter Verhelst <w@uter.be>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> v3:
> 
> Hi all. This is almost a resend of v2 (by Eric Blake). The only change is
> removing the restriction that the sum of status descriptor lengths must equal
> the requested length, i.e., the server is permitted to reply with less data
> than requested if it wants.
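A minimal sketch of how a client might cope with the relaxed rule, assuming the descriptor lengths may sum to less than (but should not exceed) the requested length: keep re-issuing the query from where the previous reply stopped. `block_status` and `one_block_server` are hypothetical stand-ins for NBD_CMD_BLOCK_STATUS round trips, not a real library API.

```python
from typing import Callable, List, Tuple

Extent = Tuple[int, int]  # (length, status) as carried by one descriptor


def query_range(offset: int, length: int,
                block_status: Callable[[int, int], List[Extent]]
                ) -> List[Tuple[int, int, int]]:
    """Cover [offset, offset + length) even if the server replies short.

    Returns (offset, length, status) triples. `block_status` is a stand-in
    for one NBD_CMD_BLOCK_STATUS round trip.
    """
    result: List[Tuple[int, int, int]] = []
    pos, end = offset, offset + length
    while pos < end:
        extents = block_status(pos, end - pos)
        if not extents:
            raise RuntimeError("server made no progress")
        for ext_len, status in extents:
            if ext_len == 0:
                raise RuntimeError("zero-length descriptor")
            # Defensive: never step past what was asked for.
            ext_len = min(ext_len, end - pos)
            result.append((pos, ext_len, status))
            pos += ext_len
            if pos == end:
                break
    return result


def one_block_server(off: int, length: int) -> List[Extent]:
    """Hypothetical server: one descriptor, never past the next 4 KiB boundary.

    Status alternates per 4 KiB block purely for illustration.
    """
    return [(min(length, 4096 - off % 4096), (off // 4096) % 2)]
```

With this server, `query_range(1000, 8000, one_block_server)` needs three round trips and yields `[(1000, 3096, 0), (4096, 4096, 1), (8192, 808, 0)]`. The cost of allowing short replies is extra round trips on the client side, not extra complexity in the wire format.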
> 
> Also, the bit for NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as bit 8 is now
> NBD_FLAG_CAN_MULTI_CONN in the master branch.
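A client-side check of the transmission flags, sketched in Python. Bit 0 (NBD_FLAG_HAS_FLAGS) and bit 8 (NBD_FLAG_CAN_MULTI_CONN) are existing flags; bit 9 for NBD_FLAG_SEND_BLOCK_STATUS is only what this draft proposes, so treat that constant as provisional.

```python
NBD_FLAG_HAS_FLAGS = 1 << 0          # existing: transmission flags are valid
NBD_FLAG_CAN_MULTI_CONN = 1 << 8     # existing, now in master
NBD_FLAG_SEND_BLOCK_STATUS = 1 << 9  # proposed by this v3 draft


def supports_block_status(transmission_flags: int) -> bool:
    """True if the server advertises NBD_CMD_BLOCK_STATUS (per this draft)."""
    if not transmission_flags & NBD_FLAG_HAS_FLAGS:
        return False  # other bits are meaningless without HAS_FLAGS
    return bool(transmission_flags & NBD_FLAG_SEND_BLOCK_STATUS)
```

A server advertising both multi-conn and block status would send 0x301; one with only multi-conn (0x101) must not receive the new command.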
> 
> And, finally, I've rebased this onto the current state of the
> extension-structured-reply branch (which itself should be rebased on
> master, IMHO).
> 
> With this resend I just want to continue the discussion started about half
> a year ago. Here is a summary of some questions and ideas from the v2
> discussion:
> 
> 1. Q: Synchronisation. Is such data (dirty/allocated) reliable?
>    A: This is all for read-only disks, so the data is static and unchangeable.
> 
> 2. Q: Different granularities of dirty/allocated bitmaps. Any problems?
>    A: 1: the server replies with status descriptors of any size; the
>          granularity is hidden from the client
>       2: dirty/allocated requests are separate and unrelated to each
>          other, so their granularities do not interact
> 
> 3. Q: Selecting which dirty bitmap to export
>    A: several variants:
>       1: the bitmap id is in the flags field of the request
>           pros: - simple
>           cons: - it's a hack; the flags field is for other uses
>                 - we'll have to map bitmap names to these "ids"
>       2: introduce extended NBD requests with variable length and use this
>          feature for the BLOCK_STATUS command, specifying a bitmap identifier.
>          pros: - looks like the proper way
>          cons: - we have to create an additional extension
>                - possibly we have to create a map,
>                  {<QEMU bitmap name> <=> <NBD bitmap id>}
>       3: an external tool should select which bitmap to export. So, in the case
>          of QEMU, it would be something like a QMP command block-export-dirty-bitmap.
>          pros: - simple
>                - we can extend it to behave like (2) later
>          cons: - an additional QMP command to implement (possibly the lesser evil)
>          note: Hmm, the external tool can choose between allocated/dirty data too,
>                so we can remove the 'NBD_FLAG_STATUS_DIRTY' flag altogether.
> 
> 4. Q: Shouldn't get_{allocated,dirty} be separate commands?
>    cons: Two commands with almost the same semantics and similar means?
>    pros: However, there is a good case for separating the clearly defined,
>          block-device-native GET_BLOCK_STATUS from the user-driven and
>          actually undefined data called 'dirtiness'.
> 
> 5. The number of status descriptors sent by the server should be restricted.
>    variants:
>    1: just allow the server to restrict this as it wants (which was done in v3)
>    2: (not excluding 1). The client somehow specifies the maximum number
>       of descriptors.
>       2.1: add a command flag which requests only one descriptor
>            (otherwise, no restrictions from the client)
>       2.2: again, introduce extended NBD requests, and add a field to
>            specify this maximum
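A minimal sketch of variants 1 and 2.1 on the server side, assuming the 8-byte descriptor layout from v2. `server_max` and `client_wants_one` are hypothetical parameters; the single-descriptor command flag of 2.1 has no name or bit assigned in this draft.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (length, status)

DESCRIPTOR_SIZE = 8  # 32-bit length + 32-bit status, per the v2/v3 layout


def limit_descriptors(extents: List[Extent], server_max: int,
                      client_wants_one: bool = False) -> List[Extent]:
    """Truncate a reply to the number of descriptors that will be sent.

    `server_max` models variant 1 (a server-chosen cap); `client_wants_one`
    models variant 2.1 (a hypothetical flag asking for a single descriptor).
    A truncated reply stays valid because v3 lets the descriptor lengths
    sum to less than the requested length.
    """
    if server_max < 1:
        raise ValueError("server must send at least one descriptor")
    limit = 1 if client_wants_one else server_max
    return extents[:limit]


def reply_payload_size(extents: List[Extent]) -> int:
    """Bytes of descriptor payload in the reply chunk."""
    return DESCRIPTOR_SIZE * len(extents)
```

Because every descriptor has a fixed size, any cap on the count also bounds the reply payload (here at 8 bytes per descriptor), which is the practical reason for restricting it.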
> 
> 6. Q: What to do with unspecified flags (in request/reply)?
>    I think the normal variant is to make them reserved. (The server should
>    return EINVAL if it finds unknown bits; the client should treat a reply
>    with unknown bits as an error.)
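A minimal sketch of the "reserved means rejected" rule from point 6. NBD_EINVAL (22) is NBD's error value for EINVAL and NBD_REPLY_FLAG_DONE (bit 0) comes from the structured-reply extension; the bit position used for NBD_CMD_FLAG_STATUS_DIRTY is a hypothetical placeholder, since this message does not fix it.

```python
NBD_EINVAL = 22  # NBD error value for EINVAL

NBD_CMD_FLAG_STATUS_DIRTY = 1 << 3  # hypothetical bit position, for illustration only
BLOCK_STATUS_KNOWN_FLAGS = NBD_CMD_FLAG_STATUS_DIRTY

NBD_REPLY_FLAG_DONE = 1 << 0  # structured-reply chunk flag
KNOWN_REPLY_FLAGS = NBD_REPLY_FLAG_DONE


def check_request_flags(flags: int) -> int:
    """Server side: 0 if acceptable, NBD_EINVAL if any reserved bit is set."""
    return NBD_EINVAL if flags & ~BLOCK_STATUS_KNOWN_FLAGS else 0


def check_reply_flags(flags: int) -> None:
    """Client side: treat any unknown reply-chunk flag as a protocol error."""
    unknown = flags & ~KNOWN_REPLY_FLAGS
    if unknown:
        raise ValueError(f"unknown reply flags 0x{unknown:x}")
```

Rejecting unknown bits now is what allows later extensions to assign them without old peers silently misinterpreting a request or reply.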
> 
> ======
> 
> Also, an idea on 2-4:
> 
>     Since we say that dirtiness is unknown to NBD, and an external tool
>     should specify, manage and understand which data is actually
>     transmitted, why not just call it user_data and leave the status field
>     of the reply chunk unspecified in this case?
> 
>     So, I propose one flag for NBD_CMD_BLOCK_STATUS:
>     NBD_FLAG_STATUS_USER. If it is clear, then the behaviour is defined by
>     Eric's 'Block provisioning status' paragraph. If it is set, we just
>     leave the status field for some external... protocol? Who knows what
>     this user data is.
> 
>     Note: I'm not sure that I like this (my) proposal. It's just an
>     idea; maybe someone will like it. And I think it represents what we
>     are trying to do more honestly.
> 
>     Note2: the next step of generalization would be NBD_CMD_USER, with
>     variable request size, structured reply and no definition :)
> 
> 
> Another idea, about backups themselves:
> 
>     Why do we need allocated/zero status for backup? IMHO we don't.
> 
>     Full backup: just do a structured read; it will show us which chunks
>     may be treated as zeroes.
> 
>     Incremental backup: get the dirty bitmap (somehow, for example through
>     the user-defined part of the proposed command), then read the dirty
>     blocks through structured read, so information about zero/unallocated
>     areas comes with the data.
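A minimal sketch of the incremental-backup flow described above, under stated assumptions: the dirty query returns (length, status) descriptors with bit 0 meaning "dirty" (an assumption, not defined by this draft), and `read` is a hypothetical stand-in for a structured NBD_CMD_READ. The sketch does not model the zero/hole chunks a structured read would also report.

```python
from typing import Callable, Dict, List, Tuple

DIRTY = 1 << 0  # assumed meaning of status bit 0 for a dirty-bitmap query


def incremental_backup(offset: int,
                       dirty_extents: List[Tuple[int, int]],
                       read: Callable[[int, int], bytes]) -> Dict[int, bytes]:
    """Copy only the dirty extents, returning {offset: data}.

    Offsets are implicit: each descriptor starts where the previous one ended,
    beginning at the offset of the original query.
    """
    copied: Dict[int, bytes] = {}
    pos = offset
    for length, status in dirty_extents:
        if status & DIRTY:
            copied[pos] = read(pos, length)
        pos += length
    return copied


disk = b"AAAA" + b"BBBB" + b"CCCC"
backup = incremental_backup(0, [(4, 0), (4, DIRTY), (4, 0)],
                            lambda off, length: disk[off:off + length])
```

Here only the middle 4 bytes are copied, so `backup` is `{4: b"BBBB"}`; clean regions never cross the wire.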
> 
> For me all the variants above are OK. Let's finally choose something.
> 
> v2:
> v1 was: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg05574.html
> 
> Since then, we've added the STRUCTURED_REPLY extension, which
> necessitates a rather larger rebase; I've also changed things
> to rename the command 'NBD_CMD_BLOCK_STATUS', changed the request
> modes to be determined by boolean flags (rather than by fixed
> values of the 16-bit flags field), changed the reply status fields
> to be bitwise-or values (with a default of 0 always being sane),
> and changed the descriptor layout to drop an offset but to include
> a 32-bit status so that the descriptor is nicely 8-byte aligned
> without padding.
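A sketch of the descriptor layout described above, under assumptions: each descriptor is a 32-bit length followed by a 32-bit status in network byte order, and the status bits NBD_STATE_HOLE (bit 0) and NBD_STATE_ZERO (bit 1) are illustrative names for the bitwise-or provisioning flags, where 0 ("allocated, contains data") is always a safe answer. Since the offset was dropped, the client reconstructs it as a running sum from the request offset.

```python
import struct
from typing import List, Tuple

NBD_STATE_HOLE = 1 << 0  # assumed: extent is unallocated
NBD_STATE_ZERO = 1 << 1  # assumed: extent reads as zeroes

# 32-bit length + 32-bit status, big-endian: exactly 8 bytes, no padding.
DESC = struct.Struct(">II")


def encode_descriptors(extents: List[Tuple[int, int]]) -> bytes:
    """Serialize (length, status) pairs into a reply payload."""
    return b"".join(DESC.pack(length, status) for length, status in extents)


def decode_descriptors(payload: bytes,
                       request_offset: int) -> List[Tuple[int, int, int]]:
    """Recover (offset, length, status); offsets are a running sum of lengths."""
    if len(payload) % DESC.size:
        raise ValueError("payload is not a whole number of 8-byte descriptors")
    out: List[Tuple[int, int, int]] = []
    pos = request_offset
    for length, status in DESC.iter_unpack(payload):
        out.append((pos, length, status))
        pos += length
    return out
```

For example, a 4 KiB hole that reads as zeroes followed by 8 KiB of data encodes to 16 bytes, and decoding it against a request offset of 65536 gives `[(65536, 4096, 3), (69632, 8192, 0)]`.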
> 
>  doc/proto.md | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 154 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
