Message-ID: <5539EFDD.8060703@cn.fujitsu.com>
In-Reply-To: <20150422111833.GD2386@work-vm>
Date: Fri, 24 Apr 2015 15:25:17 +0800
From: Wen Congyang
To: "Dr. David Alan Gilbert", zhanghailiang
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, qemu-devel@nongnu.org, peter huangpeng, arei.gonglei@huawei.com, amit.shah@redhat.com, david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service

On 04/22/2015 07:18 PM, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Hi,
>>
>> ping ...
>
> I will get to look at this again; but not until after next week.
>
>> The main blocking bugs for COLO have been solved,
>
> I've got the v3 set running, but the biggest problems I hit are with
> the packet comparison module; I've seen a panic which I think is
> in colo_send_checkpoint_req that I think is due to the use of
> GFP_KERNEL to allocate the netlink message and I think it can schedule
> there. I tried making that a GFP_ATOMIC but I'm hitting other
> problems with:

Thanks for testing. I guess the backtrace looks like:
1. colo_send_checkpoint_req()
2. colo_setup_checkpoint_by_id()

Because we hold the RCU read lock there, we cannot use GFP_KERNEL to allocate memory.

>
> kcolo_thread, no conn, schedule out

Hmm, how can I reproduce it? In my tests I only focused on block replication,
and I didn't use the network.

>
> that I've not had time to look into yet.
>
> So I only get about a 50% success rate of starting COLO.
> I see there is stuff in the TODO of the colo-proxy that
> seems to say the netlink stuff should change; maybe you're already fixing
> that?

Do you mean you get about a 50% success rate if you use the network?

Thanks
Wen Congyang

>
>> We have also finished some new features and optimizations for COLO. (If you are interested in these,
>> we can send them to you in private ;))
>
>> For ease of review, it is better to keep it simple now, so we will not add too much new code into this framework
>> patch set before it has been fully reviewed.
>
> I'd like to see those; but I don't want to take code privately.
> It's OK to post extra stuff as a separate set.
>
>> COLO is a totally new feature which is still at an early stage; we hope to speed up the development,
>> so your comments and feedback are warmly welcome. :)
>
> Yes, it's getting there though; I don't think anyone else has
> got this close to getting a full FT set working with disk and networking.
>
> Dave
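To make the GFP_KERNEL point concrete: a GFP_KERNEL allocation may sleep (for example, to reclaim memory), and sleeping is not allowed inside an RCU read-side critical section, whereas a GFP_ATOMIC allocation never sleeps but is more likely to fail under memory pressure. Below is a toy userspace model in C of that rule and of two common ways around it. It is not kernel code and not the actual colo-proxy code; every name in it (model_rcu_read_lock, model_alloc_is_bug, send_req_buggy, and so on) is hypothetical and invented only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy userspace model (hypothetical names, NOT kernel APIs): it only encodes
 * the rule "no allocation that may sleep while an RCU read-side section is open".
 */
enum model_gfp {
    MODEL_GFP_KERNEL, /* like GFP_KERNEL: may sleep, e.g. for direct reclaim */
    MODEL_GFP_ATOMIC  /* like GFP_ATOMIC: never sleeps, may fail instead */
};

static int rcu_nesting; /* depth of currently open model RCU read-side sections */

static void model_rcu_read_lock(void) { rcu_nesting++; }

static void model_rcu_read_unlock(void)
{
    assert(rcu_nesting > 0); /* catch unbalanced unlocks in the model */
    rcu_nesting--;
}

/* True if allocating with these flags at this point would sleep in atomic context. */
static bool model_alloc_is_bug(enum model_gfp flags)
{
    return rcu_nesting > 0 && flags == MODEL_GFP_KERNEL;
}

/* Pattern suspected in the report: sleeping allocation inside the RCU section. */
static bool send_req_buggy(void)
{
    model_rcu_read_lock();
    bool bug = model_alloc_is_bug(MODEL_GFP_KERNEL);
    model_rcu_read_unlock();
    return bug;
}

/* Workaround 1: use a non-sleeping allocation; the caller must handle failure. */
static bool send_req_atomic(void)
{
    model_rcu_read_lock();
    bool bug = model_alloc_is_bug(MODEL_GFP_ATOMIC);
    model_rcu_read_unlock();
    return bug;
}

/* Workaround 2: do the sleeping allocation before entering the RCU section. */
static bool send_req_prealloc(void)
{
    bool bug = model_alloc_is_bug(MODEL_GFP_KERNEL);
    model_rcu_read_lock();
    /* ... look up the connection and send the preallocated message here ... */
    model_rcu_read_unlock();
    return bug;
}
```

In this model, send_req_buggy() returns true, while send_req_atomic() and send_req_prealloc() both return false. Preallocating keeps the more reliable GFP_KERNEL allocation; switching to GFP_ATOMIC is the smaller change but needs a failure path.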