From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:43079)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1QgfUq-0003sk-U0 for qemu-devel@nongnu.org;
    Tue, 12 Jul 2011 12:07:42 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1QgfUo-0005n5-4X for qemu-devel@nongnu.org;
    Tue, 12 Jul 2011 12:07:40 -0400
Received: from mx1.redhat.com ([209.132.183.28]:25636)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1QgfUn-0005mx-HC for qemu-devel@nongnu.org;
    Tue, 12 Jul 2011 12:07:37 -0400
Message-ID: <4E1C71F2.4030507@redhat.com>
Date: Tue, 12 Jul 2011 18:10:26 +0200
From: Kevin Wolf
MIME-Version: 1.0
References: <4E131D0D.307@redhat.com>
    <20110711125432.GA19686@stefanha-thinkpad.localdomain>
    <20110711163226.GA10924@amt.cnet>
    <4E1C009C.1010408@redhat.com>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] live block copy/stream/snapshot discussion
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Stefan Hajnoczi
Cc: Anthony Liguori, Dor Laor, Stefan Hajnoczi, Marcelo Tosatti,
    qemu-devel, Avi Kivity, Adam Litke

On 12.07.2011 17:45, Stefan Hajnoczi wrote:
>>>> Image streaming API
>>>> ===================
>>>>
>>>> For leaf images with copy-on-read semantics, the stream commands allow the user
>>>> to populate local blocks by manually streaming them from the backing image.
>>>> Once all blocks have been streamed, the dependency on the original backing
>>>> image can be removed. Therefore, stream commands can be used to implement
>>>> post-copy live block migration and rapid deployment.
>>>>
>>>> The block_stream command can be used to stream a single cluster, to
>>>> start streaming the entire device, and to cancel an active stream.
>>>> It is easiest to allow the block_stream command to manage streaming for the
>>>> entire device but a management tool could use single cluster mode to
>>>> throttle the I/O rate.
>>
>> As discussed earlier, having the management send requests for each
>> single cluster doesn't make any sense at all. It wouldn't only throttle
>> the I/O rate but bring it down to a level that makes it unusable. What
>> you really want is to allow the management to give us a range (offset +
>> length) that qemu should stream.
>
> I feel that an iteration interface is problematic whether the
> management tool or QEMU decides what to stream. Let's have just the
> background streaming operation.
>
> The problem with byte ranges is two-fold. The management tool doesn't
> know which regions of the image are allocated so it may do a lot of
> nop calls to already-allocated regions with no intelligence as to
> where the next sensible offset for streaming is. Secondly, because
> the progress and performance of image streaming depend largely on
> whether or not clusters are allocated (it is very fast when a cluster
> is already allocated and we have no work to do), offsets are bad
> indicators of progress to the user. I think it's best not to expose
> these details to the management tool at all.
>
> The only reason for the iteration interface was to punt I/O throttling
> to the management tool. I think it would be easier to just throttle
> inside the streaming function.
>
> Kevin: Are you happy with dropping the iteration interface?
> Adam: Is there a libvirt requirement for iteration or could we support
> background copy only?

Okay, works for me.

>>>> The command synopses are as follows:
>>>>
>>>> block_stream
>>>> ------------
>>>>
>>>> Copy data from a backing file into a block device.
>>>>
>>>> If the optional 'all' argument is true, this operation is performed in the
>>>> background until the entire backing file has been copied.
>>>> The status of ongoing block_stream operations can be checked with
>>>> query-block-stream.
>>
>> Not sure if it's a good idea to use a bool argument to turn a command
>> into its opposite. I think having a separate command for stopping would
>> be cleaner. Something for the QMP folks to decide, though.
>
> git branch new_branch
> git branch -D new_branch
>
> Makes sense to me :)

I don't think you should compare a command line option to a programming
interface. Having a git_create_branch(const char *name, bool delete)
would really look strange. Anyway, probably a matter of taste.

A hint that separate commands would make sense is that the stop command
won't need the other arguments that the start command gets ('all' and
'base').

>>>> Arguments:
>>>>
>>>> - all: copy entire device (json-bool, optional)
>>>> - stop: stop copying to device (json-bool, optional)
>>>> - device: device name (json-string)
>>>
>>> It must be possible to specify the backing file that will be
>>> active after streaming finishes (data from that file will not
>>> be streamed into the active file, of course).
>>
>> Yes, I think the common base image belongs here.
>
> Right. We need to specify it by filename:
>
> - base: filename of base file (json-string, optional)
>
> Sectors are not copied from the base file and its backing file
> chain. The following describes this feature:
> Before: base <- sn1 <- sn2 <- sn3 <- vm.img
> After: base <- vm.img

Does this imply that a rebase -u always happens after completion?

>> With all = false, where does the streaming begin?
>
> Streaming begins at the start of the image.
>
>> Do you have something like the "current streaming offset" in the state of each BlockDriverState?
>
> Yes, there is a StreamState for each block device that has an
> in-progress operation. The progress is saved between block_stream
> (without -a) invocations so the caller does not need to specify the
> streaming offset as an argument.
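
Going back to the start/stop question above, here is a rough C sketch of
what I mean by separate commands. Everything in it is made up for
illustration (StreamSketch, stream_start, stream_stop and the return
codes are not QEMU code), and refusing a second start on a busy device
is just an assumption, not something we have agreed on. The only point
is that the stop side needs nothing but the device name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-device streaming state (illustrative only, not
 * QEMU's actual StreamState). */
typedef struct {
    char device[32];  /* device name, e.g. "virtio0" */
    char base[64];    /* optional base image; "" means none was given */
    bool active;      /* true while a background stream is running */
} StreamSketch;

/* Made-up return codes for this sketch. */
enum {
    STREAM_OK = 0,
    STREAM_ERR_BUSY = -1,       /* assumption: a second start is refused */
    STREAM_ERR_NOT_ACTIVE = -2  /* nothing to stop on this device */
};

/* Start takes everything that describes the copy: the device and the
 * optional base image. */
int stream_start(StreamSketch *s, const char *device, const char *base)
{
    assert(s != NULL && device != NULL);
    if (s->active) {
        return STREAM_ERR_BUSY;
    }
    snprintf(s->device, sizeof(s->device), "%s", device);
    snprintf(s->base, sizeof(s->base), "%s", base ? base : "");
    s->active = true;
    return STREAM_OK;
}

/* Stop only has to identify the device; 'all' and 'base' would have no
 * meaning here. */
int stream_stop(StreamSketch *s, const char *device)
{
    assert(s != NULL && device != NULL);
    if (!s->active || strcmp(s->device, device) != 0) {
        return STREAM_ERR_NOT_ACTIVE;
    }
    s->active = false;
    return STREAM_OK;
}
```

A caller would zero-initialise a StreamSketch, call
stream_start(&s, "virtio0", "base.img") and later
stream_stop(&s, "virtio0"). With a single function and a bool flag, the
stop case would have to carry and ignore the base argument.
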
>
> Thanks for pointing out these weaknesses in the documentation. It
> should really be explained fully.

I think we also need to describe error cases. For example, what happens
if you try to start streaming while it's already in progress?

>>>> Return:
>>>>
>>>> - device: device name (json-string)
>>>> - len: size of the device, in bytes (json-int)
>>>> - offset: ending offset of the completed I/O, in bytes (json-int)
>>
>> So you only get the reply when the request has completed? With the
>> current monitor, this means that QMP is blocked while we stream, doesn't
>> it? How are you supposed to send the stop command then?
>
> Incomplete documentation again, sorry. The block_stream command
> behaves as follows:
>
> 1. block_stream all returns immediately and the BLOCK_STREAM_COMPLETED
> event is raised when streaming completes either successfully or with
> an error.
>
> 2. block_stream stop returns when the in-progress streaming operation
> has been safely stopped.
>
> 3. block_stream returns when one iteration of streaming has completed.
>
>> Two of three examples below have an empty return value instead, so they
>> are not compliant with this specification.
>
> I will update the documentation; the non-all invocations do not return anything.

Okay, then I don't understand what the 'offset' return value means. The
text says "offset of the completed I/O". If all=true returns
immediately, shouldn't it always be 0?

>> I find it rather disturbing that a command like 'change' has made it
>> into QMP... Anyway, I don't think this is really what we need.
>>
>> We have two switches to do. The first one happens before starting the
>> copy: Creating the copy, with the source as its backing file, and
>> switching to that. The monitor command to achieve this is snapshot_blkdev.
>
> I don't think that creating image files in QEMU is going to work when
> running KVM with libvirt (SELinux). The QEMU process does not have
> the ability to create new image files.
> It needs at least a file
> descriptor to an empty file or maybe a file that has been created
> using qemu-img like I showed above.

Independent problem. We're really creating an external snapshot here, so
we should use the function for external snapshots. libvirt can
pre-create an empty image file, so that qemu will write the image format
data into it, but we have discussed this before.

Kevin