From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52730)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <wency@cn.fujitsu.com>) id 1Z9NST-0005vT-Pg
	for qemu-devel@nongnu.org; Sun, 28 Jun 2015 21:02:03 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <wency@cn.fujitsu.com>) id 1Z9NSQ-0003D5-Hc
	for qemu-devel@nongnu.org; Sun, 28 Jun 2015 21:02:01 -0400
Message-ID: <559099E5.6080205@cn.fujitsu.com>
Date: Mon, 29 Jun 2015 09:05:41 +0800
From: Wen Congyang <wency@cn.fujitsu.com>
MIME-Version: 1.0
References: <1434617361-17778-1-git-send-email-wency@cn.fujitsu.com>
	<1434617361-17778-8-git-send-email-wency@cn.fujitsu.com>
	<20150618125533.GE25387@stefanha-thinkpad.redhat.com>
	<5582D777.8060202@gmail.com>
	<20150618160618.GF822@stefanha-thinkpad.redhat.com>
	<55836860.2010401@cn.fujitsu.com>
	<20150619104953.GB9807@stefanha-thinkpad.redhat.com>
	<5584DEA8.80005@gmail.com> <20150624140755.GC2206@work-vm>
	<558B52EE.3000800@cn.fujitsu.com> <20150626190335.GN2186@work-vm>
In-Reply-To: <20150626190335.GN2186@work-vm>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new
 block driver interface to connect/disconnect the remote target
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <famz@redhat.com>, Wen Congyang <ghostwcy@gmail.com>, qemu block <qemu-block@nongnu.org>, Stefan Hajnoczi <stefanha@gmail.com>, Jiang Yunhong <yunhong.jiang@intel.com>, Dong Eddie <eddie.dong@intel.com>, qemu devel <qemu-devel@nongnu.org>, Max Reitz <mreitz@redhat.com>, zhanghailiang <zhang.zhanghailiang@huawei.com>, Gonglei <arei.gonglei@huawei.com>, Stefan Hajnoczi <stefanha@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Yang Hongyang <yanghy@cn.fujitsu.com>, Lai Jiangshan <laijs@cn.fujitsu.com>

On 06/27/2015 03:03 AM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 06/24/2015 10:07 PM, Dr. David Alan Gilbert wrote:
>>> * Wen Congyang (ghostwcy@gmail.com) wrote:
>>>> At 2015/6/19 18:49, Stefan Hajnoczi Wrote:
>>>>> On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote:
>>>>>> On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
>>>>>>> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
>>>>>>>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
>>>>>>>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
>>>>>>>>>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>>>>>> +
>>>>>>>>>> +    if (drv && drv->bdrv_connect) {
>>>>>>>>>> +        drv->bdrv_connect(bs, errp);
>>>>>>>>>> +    } else if (bs->file) {
>>>>>>>>>> +        bdrv_connect(bs->file, errp);
>>>>>>>>>> +    } else {
>>>>>>>>>> +        error_setg(errp, "this feature or command is not currently supported");
>>>>>>>>>> +    }
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +void bdrv_disconnect(BlockDriverState *bs)
>>>>>>>>>> +{
>>>>>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>>>>>> +
>>>>>>>>>> +    if (drv && drv->bdrv_disconnect) {
>>>>>>>>>> +        drv->bdrv_disconnect(bs);
>>>>>>>>>> +    } else if (bs->file) {
>>>>>>>>>> +        bdrv_disconnect(bs->file);
>>>>>>>>>> +    }
>>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>> Please add doc comments describing the semantics of these commands.
>>>>>>>>
>>>>>>>> Where should it be documented? In the header file?
>>>>>>>
>>>>>>> block.h doesn't document prototypes in the header file, please document
>>>>>>> the function definition in block.c.  (QEMU is not consistent here, some
>>>>>>> places do it the other way around.)
>>>>>>>
>>>>>>>>> Why are these operations needed when there is already a bs->drv == NULL
>>>>>>>>> case which means the BDS is not ready for read/write?
>>>>>>>>>
>>>>>>>>
>>>>>>>> The purpos is that: don't connect to nbd server when opening a nbd client.
>>>>>>>> connect/disconnect
>>>>>>>> to nbd server when we need to do it.
>>>>>>>>
>>>>>>>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
>>>>>>>> connect/disconnect
>>>>>>>> means that connect/disconnect to remote target(The target may be in another
>>>>>>>> host).
>>>>>>>
>>>>>>> Connect/disconnect puts something on the QEMU command-line that isn't
>>>>>>> ready at startup time.
>>>>>>>
>>>>>>> How about using monitor commands to add objects when needed instead?
>>>>>>>
>>>>>>> That is cleaner because it doesn't introduce a new state (which is only
>>>>>>> implemented for nbd).
>>>>>>>
>>>>>>
>>>>>> The problem is that, nbd client is one child of quorum, and quorum must have more
>>>>>> than one child. The nbd server is not ready until colo is running.
>>>>>
>>>>> A monitor command to hot add/remove quorum children solves this problem
>>>>> and could also be used in other scenarios (e.g. user decides to take a
>>>>> quorum child offline).
>>>>>
>>>>
>>>> For replication case, we always do checkpoint again and again after
>>>> migration. If the disk is not synced before migration, we will use disk mirgation or mirror
>>>> job to sync it.
>>>
>>> Can you document the way that you use disk migration or mirror, together
>>> with your COLO setup?   I think it would make it easier to understand this
>>> restriction.
>>> At the moment I don't understand how you can switch from doing a disk migration
>>> into COLO mode - there seems to be a gap between the end of disk migration and the start
>>> of COLO.
>>
>> In my test, I use disk migration.
>>
>> After migration and before starting COLO, we will disable disk migration:
>> +    /* Disable block migration */
>> +    s->params.blk = 0;
>> +    s->params.shared = 0;
>> +    qemu_savevm_state_begin(trans, &s->params);
> 
> Ah, I hadn't realised you could do that; so do you just do:
> 
> migrate_set_parameter colo on
> migrate -d -b tcp:otherhhost:port
> 
> How does the secondary know to feed that data straight into the disk without
> recording all the old data into the hidden-disk ?

hidden disk and active disk will be made empty when starting block replication.

> 
>> If the user uses mirror job, we don't cancel the mirror job now.
> 
> It would be good to get it to work with mirror, that seems preferred these
> days to the old block migration.

In normal migration, is mirror job created and cancelled by libvirt?

> 
> With block migration or mirror, then that pretty much allows you to replace
> a failed secondary doesn't it without restarting?  The only thing stopping
> you is the NBD client needs to point to the address of the new secondary.
> Stefan's suggestion of being able to modify the quorum on the fly would seem
> to fix that problem.

Yes, I am implementing adding/removing quorum child now.

> (The other thing I can see is that the secondary wouldn't have the conntrack
> data for open connections; but that's a separate problem).
> 
> A related topic was that a colleague was asking about starting the qemu
> up and only then setting the NBD target (he's trying to wire it into OpenStack);
> I suggested that instead of specifying the drive on the command line, it
> might work to use a drive_add to add the whole quorum/drive stack before starting
> the guest; however again something to modify the quorum would be easier.
> 
> Finally the last good reason I can see for being able to modify the quorum on the fly
> is that you'll need it for continuous ft to be able to transmute a secondary into
> a new primary.

Yes, I think adding a filter on the top is also needed.

Thanks
Wen Congyang

> 
> Dave
> 
>>
>>>
>>>> So we cannot start block replication when migration is running. We need
>>>> that nbd
>>>> client is not ready when migration is running, and it is ready between
>>>> migration ends
>>>> and checkpoint begins. Using a monotir command add the nbd client will cause
>>>> larger
>>>> downtime. So if the nbd client has been added(only not connect to the nbd
>>>> server),
>>>> we can connect to nbd server automatically.
>>>
>>> Without the disk migration or mirror, I can't see the need for the delayed bdrv_connect.
>>> I can see that you want to do a disconnect at failover, however you need
>>> to take care because if the network is broken at the point of failover
>>> you need to make sure nothing blocks.
>>
>> disk migration is needed if the disk is not synced before migration.
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>> Dave
>>>
>>>>
>>>> Thanks
>>>> Wen Congyang
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> .
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
>