From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <5539EFDD.8060703@cn.fujitsu.com>
Date: Fri, 24 Apr 2015 15:25:17 +0800
From: Wen Congyang <wency@cn.fujitsu.com>
MIME-Version: 1.0
References: <1427347774-8960-1-git-send-email-zhang.zhanghailiang@huawei.com>
	<5524E3E9.1070102@huawei.com> <20150422111833.GD2386@work-vm>
In-Reply-To: <20150422111833.GD2386@work-vm>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain
 LOck-stepping(COLO) Virtual Machines for Non-stop Service
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, qemu-devel@nongnu.org, peter huangpeng <peter.huangpeng@huawei.com>, arei.gonglei@huawei.com, amit.shah@redhat.com, david@gibson.dropbear.id.au

On 04/22/2015 07:18 PM, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Hi,
>>
>> ping ...
> 
> I will get to look at this again; but not until after next week.
> 
>> The main blocking bugs for COLO have been solved,
> 
> I've got the v3 set running, but the biggest problems I've hit are
> with the packet comparison module; I've seen a panic which I think is
> in colo_send_checkpoint_req, and which I think is due to the use of
> GFP_KERNEL to allocate the netlink message, since that can schedule
> there.  I tried making that a GFP_ATOMIC, but I'm hitting other
> problems with:

Thanks for your test.
I guess the backtrace looks like this:
1. colo_send_checkpoint_req()
2. colo_setup_checkpoint_by_id()

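Because we hold the RCU read lock, we cannot use GFP_KERNEL to allocate memory:
a GFP_KERNEL allocation may sleep, and sleeping is not allowed inside an RCU
read-side critical section.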
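
To illustrate the rule, here is a toy userspace model. It is only a sketch:
every name in it (toy_gfp, toy_rcu_read_lock, toy_alloc, ...) is hypothetical
and is not kernel API. It tracks a simulated RCU read-side nesting depth and
refuses a "may sleep" allocation while inside that section, loosely in the
spirit of the kernel's might_sleep() debug check.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flags, modelled loosely on the kernel's gfp_t (not kernel API). */
enum toy_gfp {
    TOY_GFP_KERNEL, /* may sleep, e.g. to reclaim memory */
    TOY_GFP_ATOMIC  /* never sleeps; may fail instead */
};

/* Nesting depth of the simulated rcu_read_lock(); > 0 means "inside". */
static int toy_rcu_depth;

static void toy_rcu_read_lock(void)   { toy_rcu_depth++; }
static void toy_rcu_read_unlock(void) { toy_rcu_depth--; }

/* True if the current (simulated) context is allowed to sleep. */
static bool toy_may_sleep_here(void)
{
    return toy_rcu_depth == 0;
}

/*
 * Allocate memory, refusing a sleeping allocation inside the simulated RCU
 * read-side section. This toy reports the bug and returns NULL; in the real
 * kernel the allocation could instead sleep in an illegal context.
 */
static void *toy_alloc(size_t size, enum toy_gfp flags)
{
    if (flags == TOY_GFP_KERNEL && !toy_may_sleep_here()) {
        fprintf(stderr, "BUG: sleeping allocation inside RCU read-side section\n");
        return NULL;
    }
    return malloc(size);
}
```

In the real module, the options would presumably be the one you tried
(GFP_ATOMIC, which never sleeps but can fail and must be checked), or doing
the allocation before taking the RCU read lock.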

> 
> kcolo_thread, no conn, schedule out

Hmm, how can I reproduce it? My tests focus only on block replication, and
I don't use the network.

> 
> that I've not had time to look into yet.
> 
> So I only get about a 50% success rate when starting COLO.
> I see there is stuff in the TODO of the colo-proxy that
> seems to say the netlink stuff should change; maybe you're already fixing
> that?

Do you mean you get about a 50% success rate if you use the network?


Thanks
Wen Congyang

> 
>> we have also finished some new features and optimizations for COLO. (If you are interested in this,
>> we can send them to you privately ;))
> 
>> For ease of review, it is better to keep it simple for now, so we will not add too much new code to this framework
>> patch set before it has been fully reviewed.
> 
> I'd like to see those; but I don't want to take code privately.
> It's OK to post extra stuff as a separate set.
> 
>> COLO is a totally new feature which is still at an early stage; we hope to speed up the development,
>> so your comments and feedback are warmly welcomed. :)
> 
> Yes, it's getting there though; I don't think anyone else has
> got this close to getting a full FT set working with disk and networking.
> 
> Dave
> 
>>