From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52730) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z9NST-0005vT-Pg for qemu-devel@nongnu.org; Sun, 28 Jun 2015 21:02:03 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z9NSQ-0003D5-Hc for qemu-devel@nongnu.org; Sun, 28 Jun 2015 21:02:01 -0400 Message-ID: <559099E5.6080205@cn.fujitsu.com> Date: Mon, 29 Jun 2015 09:05:41 +0800 From: Wen Congyang MIME-Version: 1.0 References: <1434617361-17778-1-git-send-email-wency@cn.fujitsu.com> <1434617361-17778-8-git-send-email-wency@cn.fujitsu.com> <20150618125533.GE25387@stefanha-thinkpad.redhat.com> <5582D777.8060202@gmail.com> <20150618160618.GF822@stefanha-thinkpad.redhat.com> <55836860.2010401@cn.fujitsu.com> <20150619104953.GB9807@stefanha-thinkpad.redhat.com> <5584DEA8.80005@gmail.com> <20150624140755.GC2206@work-vm> <558B52EE.3000800@cn.fujitsu.com> <20150626190335.GN2186@work-vm> In-Reply-To: <20150626190335.GN2186@work-vm> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert" Cc: Kevin Wolf , Fam Zheng , Wen Congyang , qemu block , Stefan Hajnoczi , Jiang Yunhong , Dong Eddie , qemu devel , Max Reitz , zhanghailiang , Gonglei , Stefan Hajnoczi , Paolo Bonzini , Yang Hongyang , Lai Jiangshan On 06/27/2015 03:03 AM, Dr. David Alan Gilbert wrote: > * Wen Congyang (wency@cn.fujitsu.com) wrote: >> On 06/24/2015 10:07 PM, Dr. David Alan Gilbert wrote: >>> * Wen Congyang (ghostwcy@gmail.com) wrote: >>>> At 2015/6/19 18:49, Stefan Hajnoczi Wrote: >>>>> On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote: >>>>>> On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote: >>>>>>> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote: >>>>>>>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote: >>>>>>>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote: >>>>>>>>>> +void bdrv_connect(BlockDriverState *bs, Error **errp) >>>>>>>>>> +{ >>>>>>>>>> + BlockDriver *drv = bs->drv; >>>>>>>>>> + >>>>>>>>>> + if (drv && drv->bdrv_connect) { >>>>>>>>>> + drv->bdrv_connect(bs, errp); >>>>>>>>>> + } else if (bs->file) { >>>>>>>>>> + bdrv_connect(bs->file, errp); >>>>>>>>>> + } else { >>>>>>>>>> + error_setg(errp, "this feature or command is not currently supported"); >>>>>>>>>> + } >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +void bdrv_disconnect(BlockDriverState *bs) >>>>>>>>>> +{ >>>>>>>>>> + BlockDriver *drv = bs->drv; >>>>>>>>>> + >>>>>>>>>> + if (drv && drv->bdrv_disconnect) { >>>>>>>>>> + drv->bdrv_disconnect(bs); >>>>>>>>>> + } else if (bs->file) { >>>>>>>>>> + bdrv_disconnect(bs->file); >>>>>>>>>> + } >>>>>>>>>> +} >>>>>>>>> >>>>>>>>> Please add doc comments describing the semantics of these commands. >>>>>>>> >>>>>>>> Where should it be documented? In the header file? >>>>>>> >>>>>>> block.h doesn't document prototypes in the header file, please document >>>>>>> the function definition in block.c. (QEMU is not consistent here, some >>>>>>> places do it the other way around.) >>>>>>> >>>>>>>>> Why are these operations needed when there is already a bs->drv == NULL >>>>>>>>> case which means the BDS is not ready for read/write? >>>>>>>>> >>>>>>>> >>>>>>>> The purpos is that: don't connect to nbd server when opening a nbd client. >>>>>>>> connect/disconnect >>>>>>>> to nbd server when we need to do it. >>>>>>>> >>>>>>>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here, >>>>>>>> connect/disconnect >>>>>>>> means that connect/disconnect to remote target(The target may be in another >>>>>>>> host). >>>>>>> >>>>>>> Connect/disconnect puts something on the QEMU command-line that isn't >>>>>>> ready at startup time. >>>>>>> >>>>>>> How about using monitor commands to add objects when needed instead? >>>>>>> >>>>>>> That is cleaner because it doesn't introduce a new state (which is only >>>>>>> implemented for nbd). >>>>>>> >>>>>> >>>>>> The problem is that, nbd client is one child of quorum, and quorum must have more >>>>>> than one child. The nbd server is not ready until colo is running. >>>>> >>>>> A monitor command to hot add/remove quorum children solves this problem >>>>> and could also be used in other scenarios (e.g. user decides to take a >>>>> quorum child offline). >>>>> >>>> >>>> For replication case, we always do checkpoint again and again after >>>> migration. If the disk is not synced before migration, we will use disk mirgation or mirror >>>> job to sync it. >>> >>> Can you document the way that you use disk migration or mirror, together >>> with your COLO setup? I think it would make it easier to understand this >>> restriction. >>> At the moment I don't understand how you can switch from doing a disk migration >>> into COLO mode - there seems to be a gap between the end of disk migration and the start >>> of COLO. >> >> In my test, I use disk migration. >> >> After migration and before starting COLO, we will disable disk migration: >> + /* Disable block migration */ >> + s->params.blk = 0; >> + s->params.shared = 0; >> + qemu_savevm_state_begin(trans, &s->params); > > Ah, I hadn't realised you could do that; so do you just do: > > migrate_set_parameter colo on > migrate -d -b tcp:otherhhost:port > > How does the secondary know to feed that data straight into the disk without > recording all the old data into the hidden-disk ? hidden disk and active disk will be made empty when starting block replication. > >> If the user uses mirror job, we don't cancel the mirror job now. > > It would be good to get it to work with mirror, that seems preferred these > days to the old block migration. In normal migration, is mirror job created and cancelled by libvirt? > > With block migration or mirror, then that pretty much allows you to replace > a failed secondary doesn't it without restarting? The only thing stopping > you is the NBD client needs to point to the address of the new secondary. > Stefan's suggestion of being able to modify the quorum on the fly would seem > to fix that problem. Yes, I am implementing adding/removing quorum child now. > (The other thing I can see is that the secondary wouldn't have the conntrack > data for open connections; but that's a separate problem). > > A related topic was that a colleague was asking about starting the qemu > up and only then setting the NBD target (he's trying to wire it into OpenStack); > I suggested that instead of specifying the drive on the command line, it > might work to use a drive_add to add the whole quorum/drive stack before starting > the guest; however again something to modify the quorum would be easier. > > Finally the last good reason I can see for being able to modify the quorum on the fly > is that you'll need it for continuous ft to be able to transmute a secondary into > a new primary. Yes, I think adding a filter on the top is also needed. Thanks Wen Congyang > > Dave > >> >>> >>>> So we cannot start block replication when migration is running. We need >>>> that nbd >>>> client is not ready when migration is running, and it is ready between >>>> migration ends >>>> and checkpoint begins. Using a monotir command add the nbd client will cause >>>> larger >>>> downtime. So if the nbd client has been added(only not connect to the nbd >>>> server), >>>> we can connect to nbd server automatically. >>> >>> Without the disk migration or mirror, I can't see the need for the delayed bdrv_connect. >>> I can see that you want to do a disconnect at failover, however you need >>> to take care because if the network is broken at the point of failover >>> you need to make sure nothing blocks. >> >> disk migration is needed if the disk is not synced before migration. >> >> Thanks >> Wen Congyang >> >>> >>> Dave >>> >>>> >>>> Thanks >>>> Wen Congyang >>> -- >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >>> . >>> >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > . >