All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Bligh <alex@alex.org.uk>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: Alex Bligh <alex@alex.org.uk>,
	"nbd-general@lists.sourceforge.net"
	<nbd-general@lists.sourceforge.net>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Kevin Wolf <kwolf@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Pavel Borzenkov <pborzenkov@virtuozzo.com>,
	"Stefan stefanha@redhat. com" <stefanha@redhat.com>,
	"Denis V. Lunev" <den@openvz.org>, Wouter Verhelst <w@uter.be>,
	Eric Blake <eblake@redhat.com>,
	mpa@pengutronix.de
Subject: Re: [Qemu-devel] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension
Date: Tue, 29 Nov 2016 14:52:35 +0000	[thread overview]
Message-ID: <11F2E6BC-D538-466B-9D80-541D146EF2A0@alex.org.uk> (raw)
In-Reply-To: <b55c59cc-626c-ba45-1d89-d7c5583616af@virtuozzo.com>

Vladimir,

>>> 4. Q: Should not get_{allocated,dirty} be separate commands?
>>>   cons: Two commands with almost same semantic and similar means?
>>>   pros: However here is a good point of separating clearly defined and native
>>>         for block devices GET_BLOCK_STATUS from user-driven and actually
>>>         undefined data, called 'dirtyness'.
>> I'm suggesting one generic 'read bitmap' command like you.
> 
> To support get_block_status in this general read_bitmap, we will need to define something like 'multibitmap', which allows several bits per chunk, as allocation data has two: zero and allocated.

I think you are saying that for arbitrary 'bitmap' there might be more than one state. For instance, one might (in an allocation 'bitmap') have a hole, a non-hole-zero, or a non-hole-non-zero.

In the spec I'd suggest, for one 'bitmap', we represent the output as extents. Each extent has a status. For the bitmap to be useful, at least two status need to be possible, but the above would have three. This could be internally implemented by the server as (a) a bitmap (with two bits per entry), (b) two bitmaps (possibly with different granularity), (c) something else (e.g. reading file extents, then if the data is allocated manually comparing it against zero).

I should have put 'bitmap' in quotes in what I wrote because returning extents (as you suggested) is a good idea, and there need not be an actual bitmap.

>>> 5. Number of status descriptors, sent by server, should be restricted
>>>   variants:
>>>   1: just allow server to restrict this as it wants (which was done in v3)
>>>   2: (not excluding 1). Client specifies somehow the maximum for number
>>>      of descriptors.
>>>      2.1: add command flag, which will request only one descriptor
>>>           (otherwise, no restrictions from the client)
>>>      2.2: again, introduce extended nbd requests, and add field to
>>>           specify this maximum
>> I think some form of extended request is the way to go, but out of
>> interest, what's the issue with as many descriptors being sent as it
>> takes to encode the reply? The client can just consume the remainder
>> (without buffering) and reissue the request at a later point for
>> the areas it discarded.
> 
> the issue is: too many descriptors possible. So, (1) solves it. (2) is optional, just to simplify/optimize client side.

I think I'd prefer the server to return what it was asked for, and the client to deal with it. So either the client should be able to specify a maximum number of extents (and if we are extending the command structure, that's possible) or we deal with the client consuming and retrying unwanted extents. The reason for this is that it's unlikely the server can know a priori the number of extents which is the appropriate maximum for the client.

>>> +    The list of block status descriptors within the
>>> +    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
>>> +    of the file starting from specified *offset*, and the sum of the
>>> +    *length* fields of each descriptor MUST not be greater than the
>>> +    overall *length* of the request. This means that the server MAY
>>> +    return less data than required. However the server MUST return at
>>> +    least one status descriptor
>> I'm not sure I understand why that's useful. What should the client
>> infer from the server refusing to provide information? We don't
>> permit short reads etc.
> 
> if the bitmap is 010101010101 we will have too many descriptors. For example, 16tb disk, 64k granularity -> 2G of descriptors payload.

Yep. And the cost of consuming and retrying is quite high. One option would be for the client to realise this is a possibility, and not request the entire extent map for a 16TB disk, as it might be very large! Even if the client worked at e.g. a 64MB level (where they'd get a maximum of 1024 extents per reply), this isn't going to noticeably increase the round trip timing. One issue here is that to determine a 'reasonable' size, the client needs to know the minimum length of any extent.

I think the answer is probably a 'maximum number of extents' in the request packet.

Of course with statuses in extent, the final extent could be represented as 'I don't know, break this bit into a separate request' status.

-- 
Alex Bligh

  reply	other threads:[~2016-11-29 14:52 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-25 11:28 [Qemu-devel] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension Vladimir Sementsov-Ogievskiy
2016-11-25 14:02 ` Stefan Hajnoczi
2016-11-27 19:17 ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-11-28 11:19   ` Stefan Hajnoczi
2016-11-28 17:33     ` Wouter Verhelst
2016-11-29  9:17       ` Stefan Hajnoczi
2016-11-29 10:50       ` Wouter Verhelst
2016-11-29 12:41         ` Vladimir Sementsov-Ogievskiy
2016-11-29 13:08           ` Wouter Verhelst
2016-11-29 13:07         ` Alex Bligh
2016-12-01 10:14         ` Wouter Verhelst
2016-12-01 11:26           ` Vladimir Sementsov-Ogievskiy
2016-12-02  9:25             ` Wouter Verhelst
2016-11-28 23:15   ` John Snow
2016-11-29 10:18   ` Kevin Wolf
2016-11-29 11:34     ` Vladimir Sementsov-Ogievskiy
2016-11-30 10:41   ` Sergey Talantov
2016-11-29 12:57 ` [Qemu-devel] " Alex Bligh
2016-11-29 14:36   ` Vladimir Sementsov-Ogievskiy
2016-11-29 14:52     ` Alex Bligh [this message]
2016-11-29 15:07       ` Vladimir Sementsov-Ogievskiy
2016-11-29 15:17         ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-12-01 23:42   ` [Qemu-devel] " John Snow
2016-12-02  9:16     ` Vladimir Sementsov-Ogievskiy
2016-12-02 18:45     ` Alex Bligh
2016-12-02 20:39       ` John Snow
2016-12-03 11:08         ` Alex Bligh
2016-12-05  8:36         ` Vladimir Sementsov-Ogievskiy
2016-12-06 13:32         ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-12-06 16:39           ` John Snow
2016-12-08  3:39       ` [Qemu-devel] " Alex Bligh
2016-12-08  6:58         ` Vladimir Sementsov-Ogievskiy
2016-12-08 14:13           ` Alex Bligh
2016-12-08  9:44         ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-12-08 14:40           ` Alex Bligh
2016-12-08 15:59             ` Eric Blake
2016-12-08 16:03               ` Alex Bligh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=11F2E6BC-D538-466B-9D80-541D146EF2A0@alex.org.uk \
    --to=alex@alex.org.uk \
    --cc=den@openvz.org \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mpa@pengutronix.de \
    --cc=nbd-general@lists.sourceforge.net \
    --cc=pbonzini@redhat.com \
    --cc=pborzenkov@virtuozzo.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    --cc=w@uter.be \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.