From: zhanghailiang <zhang.zhanghailiang@huawei.com>
Date: Tue, 28 Apr 2015 18:51:28 +0800
Subject: Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping (COLO) Virtual Machines for Non-stop Service
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Wen Congyang <wency@cn.fujitsu.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, arei.gonglei@huawei.com, amit.shah@redhat.com, david@gibson.dropbear.id.au
Message-ID: <553F6630.2040704@huawei.com>
In-Reply-To: <20150424083518.GA2139@work-vm>

On 2015/4/24 16:35, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 04/22/2015 07:18 PM, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> Hi,
>>>>
>>>> ping ...
>>>
>>> I will get to look at this again; but not until after next week.
>>>
>>>> The main blocking bugs for COLO have been solved,
>>>
>>> I've got the v3 set running, but the biggest problems I hit are with
>>> the packet comparison module; I've seen a panic which I think is
>>> in colo_send_checkpoint_req, and which I think is due to the use of
>>> GFP_KERNEL to allocate the netlink message, since that can schedule
>>> there. I tried making it GFP_ATOMIC, but then I'm hitting other
>>> problems with:
>>
>> Thanks for your test.
>> I guess the backtrace looks like:
>> 1. colo_send_checkpoint_req()
>> 2. colo_setup_checkpoint_by_id()
>>
>> Because we hold the RCU read lock there, we cannot use GFP_KERNEL to allocate memory.
>
> See the backtrace below.
>
>>> kcolo_thread, no conn, schedule out
>>
>> Hmm, how do you reproduce it? In my tests I only focus on block replication,
>> and I don't use the network.
>>
>>>
>>> which I've not had time to look into yet.
>>>
>>> So I only get about a 50% success rate of starting COLO.
>>> I see there is stuff in the TODO of the colo-proxy that
>>> seems to say the netlink code should change; maybe you're already fixing
>>> that?
>>
>> Do you mean you get about a 50% success rate if you use the network?
>
> I always run with the network configured, but the 'kcolo_thread, no conn' bug
> hits very early, so I don't see any output on the primary or secondary
> after the migrate -d is issued on the primary. On the primary, in dmesg,
> I see:
> [ 736.607043] ip_tables: (C) 2000-2006 Netfilter Core Team
> [ 736.615268] kcolo_thread, no conn, schedule out, chk 0
> [ 736.619442] ip6_tables: (C) 2000-2006 Netfilter Core Team
> [ 736.718273] arp_tables: (C) 2002 David S. Miller
>
> I've not had a chance to look further at that yet.
>
> Here is the backtrace from the 1st bug.
>
> Dave (I'm on holiday next week; I probably won't respond to many mails)
>
> [ 9087.833228] BUG: scheduling while atomic: swapper/1/0/0x10000100
> [ 9087.833271] Modules linked in: ip6table_mangle ip6_tables xt_physdev iptable_mangle xt_PMYCOLO(OF) nf_conntrack_ipv4 nf_defrag_ipv4 xt_mark nf_conntrack_colo(OF) nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack iptable_filter ip_tables arptable_filter arp_tables act_mirred cls_u32 sch_prio tun bridge stp llc sg kvm_intel kvm snd_hda_codec_generic cirrus snd_hda_intel crct10dif_pclmul snd_hda_codec crct10dif_common snd_hwdep syscopyarea snd_seq crc32_pclmul crc32c_intel sysfillrect ghash_clmulni_intel snd_seq_device aesni_intel lrw sysimgblt gf128mul ttm drm_kms_helper snd_pcm snd_page_alloc snd_timer snd soundcore glue_helper i2c_piix4 ablk_helper drm cryptd virtio_console i2c_core virtio_balloon serio_raw mperf pcspkr nfsd auth_rpcgss nfs_acl lockd uinput sunrpc xfs libcrc32c sr_mod cdrom ata_generic
> [ 9087.833572] pata_acpi virtio_net virtio_blk ata_piix e1000 virtio_pci libata virtio_ring floppy virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
> [ 9087.833616] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GF O-------------- 3.10.0-123.20.1.el7.dgilbertcolo.x86_64 #1
> [ 9087.833623] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 9087.833630] ffff880813de8000 7b4d45d276068aee ffff88083fc23980 ffffffff815e2b0c
> [ 9087.833640] ffff88083fc23990 ffffffff815dca9f ffff88083fc239f0 ffffffff815e827b
> [ 9087.833648] ffff880813de9fd8 00000000000135c0 ffff880813de9fd8 00000000000135c0
> [ 9087.833657] Call Trace:
> [ 9087.833664] [] dump_stack+0x19/0x1b
> [ 9087.833680] [] __schedule_bug+0x4d/0x5b
> [ 9087.833688] [] __schedule+0x78b/0x790
> [ 9087.833699] [] __cond_resched+0x26/0x30
> [ 9087.833707] [] _cond_resched+0x3a/0x50
> [ 9087.833716] [] kmem_cache_alloc_node+0x38/0x200
> [ 9087.833752] [] ? nf_conntrack_find_get+0x30/0x40 [nf_conntrack]
> [ 9087.833761] [] ? __alloc_skb+0x5d/0x2d0
> [ 9087.833768] [] __alloc_skb+0x5d/0x2d0
> [ 9087.833777] [] ? netlink_lookup+0x32/0xf0
> [ 9087.833786] [] ? arp_req_set+0x270/0x270
> [ 9087.833794] [] netlink_alloc_skb+0x6b/0x1e0
> [ 9087.833801] [] ? arp_req_set+0x270/0x270
> [ 9087.833816] [] colo_send_checkpoint_req+0x2b/0x80 [xt_PMYCOLO]
> [ 9087.833823] [] ? arp_req_set+0x270/0x270
> [ 9087.833832] [] colo_slaver_arp_hook+0x79/0xa0 [xt_PMYCOLO]
> [ 9087.833850] [] ? arptable_filter_hook+0x2f/0x40 [arptable_filter]
> [ 9087.833858] [] nf_iterate+0xaa/0xc0
> [ 9087.833866] [] ? arp_req_set+0x270/0x270
> [ 9087.833874] [] nf_hook_slow+0x84/0x140
> [ 9087.833882] [] ? arp_req_set+0x270/0x270
> [ 9087.833890] [] arp_rcv+0x120/0x160
> [ 9087.833906] [] __netif_receive_skb_core+0x676/0x870
> [ 9087.833914] [] __netif_receive_skb+0x18/0x60
> [ 9087.833922] [] netif_receive_skb+0x40/0xd0
> [ 9087.833930] [] napi_gro_receive+0x80/0xb0
> [ 9087.833959] [] e1000_clean_rx_irq+0x2b0/0x580 [e1000]
> [ 9087.833970] [] e1000_clean+0x265/0x8e0 [e1000]
> [ 9087.833979] [] ? ttwu_do_activate.constprop.85+0x5d/0x70
> [ 9087.833988] [] net_rx_action+0x15a/0x250
> [ 9087.833997] [] __do_softirq+0xf7/0x290
> [ 9087.834006] [] call_softirq+0x1c/0x30
> [ 9087.834011] [] do_softirq+0x55/0x90
> [ 9087.834011] [] irq_exit+0x115/0x120
> [ 9087.834011] [] do_IRQ+0x58/0xf0
> [ 9087.834011] [] common_interrupt+0x6d/0x6d
> [ 9087.834011] [] ? native_safe_halt+0x6/0x10
> [ 9087.834011] [] default_idle+0x1f/0xc0
> [ 9087.834011] [] arch_cpu_idle+0x26/0x30
> [ 9087.834011] [] cpu_startup_entry+0xf5/0x290
> [ 9087.834011] [] start_secondary+0x1c4/0x1da
> [ 9087.837189] ------------[ cut here ]------------
> [ 9087.837189] kernel BUG at net/core/dev.c:4130!
>

Hi Dave,

This seems to be a deadlock bug: we call some functions that can schedule between rcu_read_lock() and rcu_read_unlock(). There are two such places: one is netlink_alloc_skb() with the GFP_KERNEL flag, and the other is netlink_unicast(), which can also schedule in some special cases.

Please test with the following modification. ;)

diff --git a/xt_PMYCOLO.c b/xt_PMYCOLO.c
index a8cf1a1..d8a6eab 100644
--- a/xt_PMYCOLO.c
+++ b/xt_PMYCOLO.c
@@ -1360,6 +1360,7 @@ static void colo_setup_checkpoint_by_id(u32 id)
 {
     if (node) {
         pr_dbg("mark %d, find colo_primary %p, setup checkpoint\n", id, node);
+        rcu_read_unlock();
         colo_send_checkpoint_req(&node->u.p);
     }
     rcu_read_unlock();
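For clarity, here is a minimal, self-contained sketch of the rule the patch applies. The names (colo_node, notify_checkpoint, setup_checkpoint, node_list) are made up for illustration; this is not the real xt_PMYCOLO code. The idea is to copy out whatever you need while the RCU read lock is held, drop the lock, and only then do anything that may sleep, such as a GFP_KERNEL allocation:

#include <linux/types.h>
#include <linux/slab.h>
#include <linux/rcupdate.h>

struct colo_node {
    u32 id;
    struct colo_node __rcu *next;
};

static struct colo_node __rcu *node_list;

/* May sleep (GFP_KERNEL allocation, netlink send, ...), so it must not
 * be called inside an RCU read-side critical section. */
static void notify_checkpoint(u32 id)
{
    void *msg = kmalloc(64, GFP_KERNEL);    /* can schedule */

    if (!msg)
        return;
    /* ... build and send the checkpoint request ... */
    kfree(msg);
}

static void setup_checkpoint(u32 id)
{
    struct colo_node *node;
    bool found = false;

    rcu_read_lock();
    for (node = rcu_dereference(node_list); node;
         node = rcu_dereference(node->next)) {
        if (node->id == id) {
            found = true;    /* copy out what we need under the lock */
            break;
        }
    }
    /*
     * Calling notify_checkpoint() before this unlock is what produces
     * "BUG: scheduling while atomic" in softirq context.  If the work
     * really must stay under the lock, everything in it has to be
     * non-sleeping (e.g. GFP_ATOMIC allocations only).
     */
    rcu_read_unlock();

    if (found)
        notify_checkpoint(id);    /* safe: the lock is already dropped */
}

Note that switching the allocation to GFP_ATOMIC, as you tried, only covers the first call site; netlink_unicast() can still schedule, which is why the patch above drops the lock before calling colo_send_checkpoint_req() instead.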
Thanks,
zhanghailiang

>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>> we have also finished some new features and optimizations on COLO. (If you are interested in these,
>>>> we can send them to you in private ;))
>>>
>>>> For ease of review, it is better to keep it simple for now, so we will not add too much new code to this framework
>>>> patch set before it has been fully reviewed.
>>>
>>> I'd like to see those; but I don't want to take code privately.
>>> It's OK to post extra stuff as a separate set.
>>>
>>>> COLO is a totally new feature which is still at an early stage; we hope to speed up its development,
>>>> so your comments and feedback are warmly welcomed. :)
>>>
>>> Yes, it's getting there though; I don't think anyone else has
>>> got this close to getting a full FT set working with disk and networking.
>>>
>>> Dave
>>>
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK