From: zhanghailiang <zhang.zhanghailiang@huawei.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Wen Congyang <wency@cn.fujitsu.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com,
	yunhong.jiang@intel.com, eddie.dong@intel.com,
	peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
	arei.gonglei@huawei.com, amit.shah@redhat.com,
	david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
Date: Tue, 28 Apr 2015 18:51:28 +0800	[thread overview]
Message-ID: <553F6630.2040704@huawei.com> (raw)
In-Reply-To: <20150424083518.GA2139@work-vm>

On 2015/4/24 16:35, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 04/22/2015 07:18 PM, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> Hi,
>>>>
>>>> ping ...
>>>
>>> I will get to look at this again; but not until after next week.
>>>
>>>> The main blocked bugs for COLO have been solved,
>>>
>>> I've got the v3 set running, but the biggest problems I hit are with
>>> the packet comparison module; I've seen a panic which I think is
>>> in colo_send_checkpoint_req, due to the use of
>>> GFP_KERNEL to allocate the netlink message; I think it can schedule
>>> there.  I tried making that GFP_ATOMIC, but I'm hitting other
>>> problems with:
>>
>> Thanks for your test.
>> I guess the backtrace looks like this:
>> 1. colo_send_checkpoint_req()
>> 2. colo_setup_checkpoint_by_id()
>>
>> Because we hold the rcu read lock, we cannot use GFP_KERNEL to allocate memory.
>
> See the backtrace below.
>
>>> kcolo_thread, no conn, schedule out
>>
>> Hmm, how do you reproduce it? In my test I only focus on block replication, and
>> I don't use the network.
>>
>>>
>>> that I've not had time to look into yet.
>>>
>>> So I only get about a 50% success rate of starting COLO.
>>> I see there is stuff in the TODO of the colo-proxy that
>>> seems to say the netlink stuff should change; maybe you're already fixing
>>> that?
>>
>> Do you mean you get about a 50% success rate if you use the network?
>
> I always run with the network configured, but the 'kcolo_thread, no conn' bug
> hits very early, so I don't see any output on the primary or secondary
> after the migrate -d is issued on the primary.  On the primary in the dmesg
> I see:
> [  736.607043] ip_tables: (C) 2000-2006 Netfilter Core Team
> [  736.615268] kcolo_thread, no conn, schedule out, chk 0
> [  736.619442] ip6_tables: (C) 2000-2006 Netfilter Core Team
> [  736.718273] arp_tables: (C) 2002 David S. Miller
>
> I've not had a chance to look further at that yet.
>
> Here is the backtrace from the 1st bug.
>
> Dave (I'm on holiday next week; I probably won't respond to many mails)
>
> [ 9087.833228] BUG: scheduling while atomic: swapper/1/0/0x10000100
> [ 9087.833271] Modules linked in: ip6table_mangle ip6_tables xt_physdev iptable_mangle xt_PMYCOLO(OF) nf_conntrack_i
> pv4 nf_defrag_ipv4 xt_mark nf_conntrack_colo(OF) nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack iptable_filter ip_tab
> les arptable_filter arp_tables act_mirred cls_u32 sch_prio tun bridge stp llc sg kvm_intel kvm snd_hda_codec_generic
>   cirrus snd_hda_intel crct10dif_pclmul snd_hda_codec crct10dif_common snd_hwdep syscopyarea snd_seq crc32_pclmul crc
> 32c_intel sysfillrect ghash_clmulni_intel snd_seq_device aesni_intel lrw sysimgblt gf128mul ttm drm_kms_helper snd_p
> cm snd_page_alloc snd_timer snd soundcore glue_helper i2c_piix4 ablk_helper drm cryptd virtio_console i2c_core virti
> o_balloon serio_raw mperf pcspkr nfsd auth_rpcgss nfs_acl lockd uinput sunrpc xfs libcrc32c sr_mod cdrom ata_generic
> [ 9087.833572]  pata_acpi virtio_net virtio_blk ata_piix e1000 virtio_pci libata virtio_ring floppy virtio dm_mirror
>   dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
> [ 9087.833616] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GF          O--------------   3.10.0-123.20.1.el7.dgilbertcolo
> .x86_64 #1
> [ 9087.833623] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 9087.833630]  ffff880813de8000 7b4d45d276068aee ffff88083fc23980 ffffffff815e2b0c
> [ 9087.833640]  ffff88083fc23990 ffffffff815dca9f ffff88083fc239f0 ffffffff815e827b
> [ 9087.833648]  ffff880813de9fd8 00000000000135c0 ffff880813de9fd8 00000000000135c0
> [ 9087.833657] Call Trace:
> [ 9087.833664]  <IRQ>  [<ffffffff815e2b0c>] dump_stack+0x19/0x1b
> [ 9087.833680]  [<ffffffff815dca9f>] __schedule_bug+0x4d/0x5b
> [ 9087.833688]  [<ffffffff815e827b>] __schedule+0x78b/0x790
> [ 9087.833699]  [<ffffffff81094fb6>] __cond_resched+0x26/0x30
> [ 9087.833707]  [<ffffffff815e86aa>] _cond_resched+0x3a/0x50
> [ 9087.833716]  [<ffffffff81193908>] kmem_cache_alloc_node+0x38/0x200
> [ 9087.833752]  [<ffffffffa046b770>] ? nf_conntrack_find_get+0x30/0x40 [nf_conntrack]
> [ 9087.833761]  [<ffffffff814c115d>] ? __alloc_skb+0x5d/0x2d0
> [ 9087.833768]  [<ffffffff814c115d>] __alloc_skb+0x5d/0x2d0
> [ 9087.833777]  [<ffffffff814fb972>] ? netlink_lookup+0x32/0xf0
> [ 9087.833786]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> [ 9087.833794]  [<ffffffff814fbc3b>] netlink_alloc_skb+0x6b/0x1e0
> [ 9087.833801]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> [ 9087.833816]  [<ffffffffa04a462b>] colo_send_checkpoint_req+0x2b/0x80 [xt_PMYCOLO]
> [ 9087.833823]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> [ 9087.833832]  [<ffffffffa04a4dd9>] colo_slaver_arp_hook+0x79/0xa0 [xt_PMYCOLO]
> [ 9087.833850]  [<ffffffffa05fc02f>] ? arptable_filter_hook+0x2f/0x40 [arptable_filter]
> [ 9087.833858]  [<ffffffff81500c5a>] nf_iterate+0xaa/0xc0
> [ 9087.833866]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> [ 9087.833874]  [<ffffffff81500cf4>] nf_hook_slow+0x84/0x140
> [ 9087.833882]  [<ffffffff8153b7d0>] ? arp_req_set+0x270/0x270
> [ 9087.833890]  [<ffffffff8153bf60>] arp_rcv+0x120/0x160
> [ 9087.833906]  [<ffffffff814d0596>] __netif_receive_skb_core+0x676/0x870
> [ 9087.833914]  [<ffffffff814d07a8>] __netif_receive_skb+0x18/0x60
> [ 9087.833922]  [<ffffffff814d0830>] netif_receive_skb+0x40/0xd0
> [ 9087.833930]  [<ffffffff814d1290>] napi_gro_receive+0x80/0xb0
> [ 9087.833959]  [<ffffffffa00e34a0>] e1000_clean_rx_irq+0x2b0/0x580 [e1000]
> [ 9087.833970]  [<ffffffffa00e5985>] e1000_clean+0x265/0x8e0 [e1000]
> [ 9087.833979]  [<ffffffff8109506d>] ? ttwu_do_activate.constprop.85+0x5d/0x70
> [ 9087.833988]  [<ffffffff814d0bfa>] net_rx_action+0x15a/0x250
> [ 9087.833997]  [<ffffffff81067047>] __do_softirq+0xf7/0x290
> [ 9087.834006]  [<ffffffff815f4b5c>] call_softirq+0x1c/0x30
> [ 9087.834011]  [<ffffffff81014cf5>] do_softirq+0x55/0x90
> [ 9087.834011]  [<ffffffff810673e5>] irq_exit+0x115/0x120
> [ 9087.834011]  [<ffffffff815f5458>] do_IRQ+0x58/0xf0
> [ 9087.834011]  [<ffffffff815ea5ad>] common_interrupt+0x6d/0x6d
> [ 9087.834011]  <EOI>  [<ffffffff81046346>] ? native_safe_halt+0x6/0x10
> [ 9087.834011]  [<ffffffff8101b39f>] default_idle+0x1f/0xc0
> [ 9087.834011]  [<ffffffff8101bc96>] arch_cpu_idle+0x26/0x30
> [ 9087.834011]  [<ffffffff810b47e5>] cpu_startup_entry+0xf5/0x290
> [ 9087.834011]  [<ffffffff815d0a6e>] start_secondary+0x1c4/0x1da
> [ 9087.837189] ------------[ cut here ]------------
> [ 9087.837189] kernel BUG at net/core/dev.c:4130!
>

Hi Dave,

This seems to be a scheduling-while-atomic bug: we call some functions that can schedule
between rcu_read_lock() and rcu_read_unlock(). There are two such places: one is netlink_alloc_skb() with the GFP_KERNEL flag,
and the other is netlink_unicast() (it can also lead to a schedule in some special cases).

Please test with the following modification. ;)

diff --git a/xt_PMYCOLO.c b/xt_PMYCOLO.c
index a8cf1a1..d8a6eab 100644
--- a/xt_PMYCOLO.c
+++ b/xt_PMYCOLO.c
@@ -1360,6 +1360,8 @@ static void colo_setup_checkpoint_by_id(u32 id) {
        if (node) {
                pr_dbg("mark %d, find colo_primary %p, setup checkpoint\n",
                        id, node);
+               rcu_read_unlock();
                colo_send_checkpoint_req(&node->u.p);
+               return;
        }
        rcu_read_unlock();


Thanks,
zhanghailiang

>>
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>> we have also finished some new features and optimizations for COLO. (If you are interested in these,
>>>> we can send them to you in private ;))
>>>
>>>> For ease of review, it is better to keep it simple for now, so we will not add too much new code into this framework
>>>> patch set before it has been fully reviewed.
>>>
>>> I'd like to see those; but I don't want to take code privately.
>>> It's OK to post extra stuff as a separate set.
>>>
>>>> COLO is a totally new feature which is still at an early stage; we hope to speed up development,
>>>> so your comments and feedback are warmly welcome. :)
>>>
>>> Yes, it's getting there though; I don't think anyone else has
>>> got this close to getting a full FT set working with disk and networking.
>>>
>>> Dave
>>>
>>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

Thread overview: 51+ messages
2015-03-26  5:29 [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 01/28] configure: Add parameter for configure to enable/disable COLO support zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 02/28] migration: Introduce capability 'colo' to migration zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 03/28] COLO: migrate colo related info to slave zhanghailiang
2015-05-15 11:38   ` Dr. David Alan Gilbert
2015-05-18  5:04     ` zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 04/28] migration: Integrate COLO checkpoint process into migration zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 05/28] migration: Integrate COLO checkpoint process into loadvm zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 06/28] COLO: Implement colo checkpoint protocol zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 07/28] COLO: Add a new RunState RUN_STATE_COLO zhanghailiang
2015-05-15 11:28   ` Dr. David Alan Gilbert
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 08/28] QEMUSizedBuffer: Introduce two help functions for qsb zhanghailiang
2015-05-15 11:56   ` Dr. David Alan Gilbert
2015-05-18  5:10     ` zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 09/28] COLO: Save VM state to slave when do checkpoint zhanghailiang
2015-05-15 12:09   ` Dr. David Alan Gilbert
2015-05-18  9:11     ` zhanghailiang
2015-05-18 12:10       ` Dr. David Alan Gilbert
2015-05-18 12:22         ` zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 10/28] COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 11/28] COLO VMstate: Load VM state into qsb before restore it zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 12/28] arch_init: Start to trace dirty pages of SVM zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 13/28] COLO RAM: Flush cached RAM into SVM's memory zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 14/28] COLO failover: Introduce a new command to trigger a failover zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 15/28] COLO failover: Implement COLO master/slave failover work zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 16/28] COLO failover: Don't do failover during loading VM's state zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 17/28] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 18/28] COLO NIC: Init/remove colo nic devices when add/cleanup tap devices zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 19/28] COLO NIC: Implement colo nic device interface configure() zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 20/28] COLO NIC : Implement colo nic init/destroy function zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 21/28] COLO NIC: Some init work related with proxy module zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 22/28] COLO: Do checkpoint according to the result of net packets comparing zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 23/28] COLO: Improve checkpoint efficiency by do additional periodic checkpoint zhanghailiang
2015-05-18 16:48   ` Dr. David Alan Gilbert
2015-05-19  6:08     ` zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 24/28] COLO: Add colo-set-checkpoint-period command zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 25/28] COLO NIC: Implement NIC checkpoint and failover zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 26/28] COLO: Disable qdev hotplug when VM is in COLO mode zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 27/28] COLO: Implement shutdown checkpoint zhanghailiang
2015-03-26  5:29 ` [Qemu-devel] [RFC PATCH v4 28/28] COLO: Add block replication into colo process zhanghailiang
2015-04-08  8:16 ` [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service zhanghailiang
2015-04-22 11:18   ` Dr. David Alan Gilbert
2015-04-24  7:25     ` Wen Congyang
2015-04-24  8:35       ` Dr. David Alan Gilbert
2015-04-28 10:51         ` zhanghailiang [this message]
2015-05-06 17:11           ` Dr. David Alan Gilbert
2015-04-24  8:52     ` zhanghailiang
2015-04-24  8:56       ` Dr. David Alan Gilbert
2015-05-14 12:14 ` Dr. David Alan Gilbert
2015-05-14 12:58   ` zhanghailiang
2015-05-14 16:09     ` Dr. David Alan Gilbert
