From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759133AbcINGac (ORCPT ); Wed, 14 Sep 2016 02:30:32 -0400 Received: from mail-wm0-f51.google.com ([74.125.82.51]:36835 "EHLO mail-wm0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758443AbcINGaa (ORCPT ); Wed, 14 Sep 2016 02:30:30 -0400 Subject: Re: BUG_ON in rcu_sync_func triggered To: Oleg Nesterov References: <57D69CEC.5010103@kyup.com> <20160912130124.GA7984@redhat.com> <57D7B6F5.4040106@kyup.com> <20160913131852.GA4112@redhat.com> <20160913134304.GA26160@redhat.com> <57D80EB8.9080405@kyup.com> <57D80F52.6090804@kyup.com> <20160913152042.GA30160@redhat.com> Cc: "Paul E. McKenney" , linux-kernel@vger.kernel.org From: Nikolay Borisov Message-ID: <57D8EE82.3090502@kyup.com> Date: Wed, 14 Sep 2016 09:30:26 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <20160913152042.GA30160@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/13/2016 06:20 PM, Oleg Nesterov wrote: > On 09/13, Nikolay Borisov wrote: >> >> On 09/13/2016 05:35 PM, Nikolay Borisov wrote: >>> >>> On 09/13/2016 04:43 PM, Oleg Nesterov wrote: >>>> On 09/13, Oleg Nesterov wrote: >>>>> >>>>> OK... perhaps the unbalanced up_write... I'll try to look at freeze/thaw code, >>>> >>>> Heh, yes, it looks racy or I am totally confused. >>>> >>>>> could test the debugging patch below meanwhile? >>>> >>>> Yes please. I'll send you another patch (hopefully fix) later, but it >>>> would be nice if you can test this patch to get more info. >>> >>> I've already started testing with this patch on 4.4.20 this time > > I think it would be better to stay with the same kernel version to > debug the problem... > >> Actually forget that, here is a warning that this triggered: >> >> [ 844.290454] WARNING: CPU: 2 PID: 1900 at kernel/rcu/sync.c:160 rcu_sync_func+0xc8/0x150() >> ... >> [ 844.754708] XXX: ffff88047527da78 gp=2 cnt=0 cb=1 > > Hmm. Thanks. Please show us all the warnings you get. What I showed you yesterday was the only warning that I got. On the same kernel another run produced the following warnings: [ 556.869624] ------------[ cut here ]------------ [ 556.875128] WARNING: CPU: 6 PID: 20210 at kernel/rcu/sync.c:111 rcu_sync_enter+0x148/0x1a0() [ 556.885142] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio [ 556.960028] CPU: 6 PID: 20210 Comm: fsfreeze Not tainted 4.4.20-clouder1 #9 [ 556.968143] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012 [ 556.975963] 0000000000000000 ffff880256b43cf8 ffffffff81307a9b 0000000000004ef2 [ 556.985182] 0000000000000000 0000000000000000 000000000000006f ffff880256b43d38 [ 556.994383] ffffffff81054a85 ffff880473326b10 ffff880473326b08 ffff880473326b10 [ 557.003589] Call Trace: [ 557.006656] [] dump_stack+0x6b/0xa0 [ 557.012737] [] warn_slowpath_common+0x95/0xe0 [ 557.019781] [] warn_slowpath_null+0x1a/0x20 [ 557.026645] [] rcu_sync_enter+0x148/0x1a0 [ 557.033309] [] percpu_down_write+0x1e/0xf0 [ 557.040074] [] ? call_rwsem_down_write_failed+0x13/0x20 [ 557.048092] [] freeze_super+0xab/0x1b0 [ 557.054456] [] do_vfs_ioctl+0x29d/0x560 [ 557.060920] [] ? SYSC_newfstat+0x2e/0x40 [ 557.067480] [] SyS_ioctl+0x92/0xa0 [ 557.073465] [] entry_SYSCALL_64_fastpath+0x12/0x6a [ 557.081015] ---[ end trace fc087420ac1d8f16 ]--- [ 557.086507] XXX: ffff880473326b08 gp=2 cnt=-1 cb=1 [ 557.092326] rbd: rbd19: added with size 0x500000000 This is: if (WARN_ON(rsp->gp_count < 0)) xxx(rsp); [ 557.094835] ------------[ cut here ]------------ [ 557.094841] WARNING: CPU: 1 PID: 0 at kernel/rcu/sync.c:159 rcu_sync_func+0xf9/0x150() [ 557.094873] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio [ 557.094876] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.4.20-clouder1 #9 [ 557.094877] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012 [ 557.094879] 0000000000000000 ffff880277c83d78 ffffffff81307a9b 0000000000000000 [ 557.094881] 0000000000000000 0000000000000000 000000000000009f ffff880277c83db8 [ 557.094882] ffffffff81054a85 0000000000000000 ffff880473326bc0 ffff880473326b98 [ 557.094882] Call Trace: [ 557.094889] [] dump_stack+0x6b/0xa0 [ 557.094892] [] warn_slowpath_common+0x95/0xe0 [ 557.094893] [] warn_slowpath_null+0x1a/0x20 [ 557.094894] [] rcu_sync_func+0xf9/0x150 [ 557.094897] [] rcu_process_callbacks+0x290/0x740 [ 557.094900] [] __do_softirq+0x113/0x330 [ 557.094902] [] irq_exit+0x75/0x80 [ 557.094905] [] smp_apic_timer_interrupt+0x46/0x55 [ 557.094907] [] apic_timer_interrupt+0x89/0x90 [ 557.094911] [] ? cpuidle_enter_state+0x14c/0x2e0 [ 557.094912] [] ? cpuidle_enter_state+0x141/0x2e0 [ 557.094914] [] cpuidle_enter+0x17/0x20 [ 557.094916] [] cpu_startup_entry+0x310/0x370 [ 557.094919] [] ? clockevents_config_and_register+0x2c/0x40 [ 557.094923] [] start_secondary+0xf9/0x100 [ 557.094924] ---[ end trace fc087420ac1d8f17 ]--- [ 557.094925] XXX: ffff880473326b98 gp=2 cnt=-1 cb=1 This is: if (WARN_ON(rsp->gp_count < 0)) xxx(rsp); I'd say the warnings are pretty consistent between 4.4.14 and 4.4.20. Another run on the same kernel (no reboots between tries) produced the following: [ 700.981151] ------------[ cut here ]------------ [ 700.986643] WARNING: CPU: 3 PID: 2428 at kernel/rcu/sync.c:111 rcu_sync_enter+0x148/0x1a0() [ 700.996541] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio [ 701.071212] CPU: 3 PID: 2428 Comm: fsfreeze Tainted: G W 4.4.20-clouder1 #9 [ 701.080816] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012 [ 701.088530] 0000000000000000 ffff88006b5efcf8 ffffffff81307a9b 000000000000097c [ 701.097711] 0000000000000000 0000000000000000 000000000000006f ffff88006b5efd38 [ 701.106870] ffffffff81054a85 ffff880079ab9310 ffff880079ab9308 ffff880079ab9310 [ 701.116039] Call Trace: [ 701.119084] [] dump_stack+0x6b/0xa0 [ 701.125149] [] warn_slowpath_common+0x95/0xe0 [ 701.132184] [] warn_slowpath_null+0x1a/0x20 [ 701.139027] [] rcu_sync_enter+0x148/0x1a0 [ 701.145674] [] ? __might_sleep+0x52/0x90 [ 701.152221] [] percpu_down_write+0x1e/0xf0 [ 701.158962] [] ? __might_sleep+0x52/0x90 [ 701.165511] [] freeze_super+0xab/0x1b0 [ 701.171867] [] do_vfs_ioctl+0x29d/0x560 [ 701.178319] [] ? SYSC_newfstat+0x2e/0x40 [ 701.184959] [] SyS_ioctl+0x92/0xa0 [ 701.190926] [] entry_SYSCALL_64_fastpath+0x12/0x6a [ 701.198446] ---[ end trace fc087420ac1d8f18 ]--- [ 701.203915] XXX: ffff880079ab9308 gp=0 cnt=-1 cb=0 [ 701.209624] ------------[ cut here ]------------ [ 701.215193] WARNING: CPU: 3 PID: 2428 at kernel/rcu/sync.c:118 rcu_sync_enter+0x7e/0x1a0() [ 701.224996] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio [ 701.299721] CPU: 3 PID: 2428 Comm: fsfreeze Tainted: G W 4.4.20-clouder1 #9 [ 701.309325] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012 [ 701.317038] 0000000000000000 ffff88006b5efcf8 ffffffff81307a9b 000000000000097c [ 701.326203] 0000000000000000 0000000000000000 0000000000000076 ffff88006b5efd38 [ 701.335382] ffffffff81054a85 0000000000000007 ffff880079ab9308 ffff880079ab9310 [ 701.344556] Call Trace: [ 701.347610] [] dump_stack+0x6b/0xa0 [ 701.353674] [] warn_slowpath_common+0x95/0xe0 [ 701.360705] [] warn_slowpath_null+0x1a/0x20 [ 701.367543] [] rcu_sync_enter+0x7e/0x1a0 [ 701.374090] [] ? __might_sleep+0x52/0x90 [ 701.380637] [] percpu_down_write+0x1e/0xf0 [ 701.387379] [] ? __might_sleep+0x52/0x90 [ 701.393928] [] freeze_super+0xab/0x1b0 [ 701.400281] [] do_vfs_ioctl+0x29d/0x560 [ 701.406737] [] ? SYSC_newfstat+0x2e/0x40 [ 701.413283] [] SyS_ioctl+0x92/0xa0 [ 701.419249] [] entry_SYSCALL_64_fastpath+0x12/0x6a [ 701.426773] ---[ end trace fc087420ac1d8f19 ]--- [ 701.432273] XXX: ffff880079ab9308 gp=1 cnt=0 cb=0 Regards, Nikolay > > Oleg. >