From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756270AbcIMOiQ (ORCPT ); Tue, 13 Sep 2016 10:38:16 -0400
Received: from mail-wm0-f45.google.com ([74.125.82.45]:35976 "EHLO
        mail-wm0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751488AbcIMOiO (ORCPT );
        Tue, 13 Sep 2016 10:38:14 -0400
Subject: Re: BUG_ON in rcu_sync_func triggered
To: Oleg Nesterov
References: <57D69CEC.5010103@kyup.com> <20160912130124.GA7984@redhat.com>
        <57D7B6F5.4040106@kyup.com> <20160913131852.GA4112@redhat.com>
        <20160913134304.GA26160@redhat.com> <57D80EB8.9080405@kyup.com>
Cc: "Paul E. McKenney" , linux-kernel@vger.kernel.org
From: Nikolay Borisov
Message-ID: <57D80F52.6090804@kyup.com>
Date: Tue, 13 Sep 2016 17:38:10 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <57D80EB8.9080405@kyup.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/13/2016 05:35 PM, Nikolay Borisov wrote:
>
>
> On 09/13/2016 04:43 PM, Oleg Nesterov wrote:
>> On 09/13, Oleg Nesterov wrote:
>>>
>>> OK... perhaps the unbalanced up_write... I'll try to look at freeze/thaw code,
>>
>> Heh, yes, it looks racy or I am totally confused.
>>
>>> could test the debugging patch below meanwhile?
>>
>> Yes please. I'll send you another patch (hopefully fix) later, but it
>> would be nice if you can test this patch to get more info.
>
> I've already started testing with this patch on 4.4.20 this time to see
> what happens, but I'll likely get results tomorrow. For now I wasn't
> able to crash it.

Actually forget that, here is a warning that this triggered:

[ 844.284959] ------------[ cut here ]------------
[ 844.290454] WARNING: CPU: 2 PID: 1900 at kernel/rcu/sync.c:160 rcu_sync_func+0xc8/0x150()
[ 844.300154] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
[ 844.375006] CPU: 2 PID: 1900 Comm: fio Not tainted 4.4.20-clouder1 #9
[ 844.382524] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
[ 844.390241] 0000000000000000 ffff880277d03d78 ffffffff81307a9b 000000000000076c
[ 844.399416] 0000000000000000 0000000000000000 00000000000000a0 ffff880277d03db8
[ 844.408598] ffffffff81054a85 ffff880277d03dc8 ffff88047527daa0 ffff88047527da78
[ 844.417771] Call Trace:
[ 844.420822] [] dump_stack+0x6b/0xa0
[ 844.427659] [] warn_slowpath_common+0x95/0xe0
[ 844.434695] [] warn_slowpath_null+0x1a/0x20
[ 844.441532] [] rcu_sync_func+0xc8/0x150
[ 844.447983] [] rcu_process_callbacks+0x290/0x740
[ 844.455310] [] ? ktime_get+0x52/0xc0
[ 844.461459] [] __do_softirq+0x113/0x330
[ 844.467909] [] irq_exit+0x75/0x80
[ 844.473775] [] smp_apic_timer_interrupt+0x46/0x55
[ 844.481200] [] apic_timer_interrupt+0x89/0x90
[ 844.488234] [] ? shrink_inactive_list+0x1e0/0x5c0
[ 844.496426] [] ? shrink_inactive_list+0x1d8/0x5c0
[ 844.503848] [] ? global_dirty_limits+0x98/0xc0
[ 844.510984] [] ? throttle_vm_writeout+0x39/0xc0
[ 844.518214] [] shrink_lruvec+0x289/0x390
[ 844.524754] [] ? mem_cgroup_iter+0x2a9/0x3e0
[ 844.531687] [] ? wb_queue_work+0x8c/0x100
[ 844.538333] [] shrink_zone+0x12a/0x360
[ 844.544686] [] ? vmpressure+0x88/0x90
[ 844.550943] [] do_try_to_free_pages+0x17d/0x450
[ 844.558174] [] ? mem_cgroup_select_victim_node+0x1d1/0x1f0
[ 844.566468] [] try_to_free_mem_cgroup_pages+0xb5/0x190
[ 844.574375] [] try_charge+0x22d/0x720
[ 844.580631] [] ? find_get_entry+0x3e/0xd0
[ 844.587281] [] ? __might_sleep+0x52/0x90
[ 844.593827] [] ? radix_tree_lookup_slot+0x13/0x30
[ 844.601251] [] mem_cgroup_try_charge+0x57/0x150
[ 844.608478] [] __add_to_page_cache_locked+0x4c/0x270
[ 844.616194] [] ? __block_commit_write+0x80/0xb0
[ 844.623419] [] add_to_page_cache_lru+0x28/0x80
[ 844.630548] [] pagecache_get_page+0x97/0x1e0
[ 844.637484] [] grab_cache_page_write_begin+0x2b/0x50
[ 844.645202] [] ext4_da_write_begin+0x17d/0x330
[ 844.652334] [] ? ext4_dirty_inode+0x66/0x80
[ 844.659167] [] generic_perform_write+0xd0/0x1f0
[ 844.666385] [] __generic_file_write_iter+0x196/0x1f0
[ 844.674102] [] ? __might_sleep+0x52/0x90
[ 844.680648] [] ext4_file_write_iter+0x11f/0x3a0
[ 844.687874] [] ? __might_sleep+0x52/0x90
[ 844.694418] [] ? ext4_unwritten_wait+0xc0/0xc0
[ 844.701547] [] aio_run_iocb+0x1ee/0x290
[ 844.707999] [] ? __might_sleep+0x52/0x90
[ 844.714537] [] do_io_submit+0x321/0x530
[ 844.720989] [] ? SyS_io_getevents+0x58/0xc0
[ 844.727828] [] ? trace_hardirqs_on_thunk+0x17/0x19
[ 844.735345] [] SyS_io_submit+0x10/0x20
[ 844.741701] [] entry_SYSCALL_64_fastpath+0x12/0x6a
[ 844.749230] ---[ end trace 5f72aeec215954f4 ]---
[ 844.754708] XXX: ffff88047527da78 gp=2 cnt=0 cb=1