From: Yang Shi <shy828301@gmail.com>
To: Wang Yugui <wangyugui@e16-tech.com>,
	 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Linux MM <linux-mm@kvack.org>
Subject: Re: kernel BUG at mm/huge_memory.c:2736(linux 5.10.29)
Date: Wed, 21 Apr 2021 17:11:45 -0700	[thread overview]
Message-ID: <CAHbLzkpiVjDs9qPL=sX7PRMTweyi9TForGB3B4yGhqR575p_Xg@mail.gmail.com> (raw)
In-Reply-To: <20210417163337.AA58.409509F4@e16-tech.com>

On Sat, Apr 17, 2021 at 1:33 AM Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> Hi,
>
> > On Mon, Apr 12, 2021 at 3:07 AM Wang Yugui <wangyugui@e16-tech.com> wrote:
> > >
> > > Hi,
> > >
> > > kernel BUG at mm/huge_memory.c:2736 (linux 5.10.29) is triggered
> > > by some file write tests.
> > >
> > > mm/huge_memory.c:
> > >         if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
> > >             pr_alert("total_mapcount: %u, page_count(): %u\n",
> > >                     mapcount, count);
> > >             if (PageTail(page))
> > >                 dump_page(head, NULL);
> > >             dump_page(page, "total_mapcount(head) > 0");
> > > L2736:           BUG();
> > >         }
> >
> > From the current log we can only tell that the mapcount of the page
> > is not zero, which might mean the unmap_page() call failed. It seems
> > you have CONFIG_DEBUG_VM enabled, so could you please paste more of
> > the log? There is "VM_BUG_ON_PAGE(!unmap_success, page)" in
> > unmap_page(). It should be able to tell us whether unmap_page()
> > failed or something else happened.
>
> This is the full dmesg output
>
> [63080.331513] huge_memory: total_mapcount: 511, page_count(): 512
> [63080.332167] page:00000000d2e1a982 refcount:512 mapcount:0 mapping:0000000000000000 index:0x7fe260582 pfn:0x676a00
> [63080.332167] head:00000000d2e1a982 order:9 compound_mapcount:0 compound_pincount:0
> [63080.332167] anon flags: 0x17ffffc009001d(locked|uptodate|dirty|lru|head|swapbacked)
> [63080.332167] raw: 0017ffffc009001d ffffc93cda0d0008 ffffc93cd9ab0008 ffff8f21be9f0cb9
> [63080.332167] raw: 00000007fe260582 0000000000000000 00000200ffffffff ffff8f1021810000
> [63080.332167] page->mem_cgroup:ffff8f1021810000
> [63080.332167] page:00000000bc78ac24 refcount:512 mapcount:1 mapping:0000000000000000 index:0x7fe260584 pfn:0x676a02
> [63080.332167] head:00000000d2e1a982 order:9 compound_mapcount:0 compound_pincount:0
> [63080.332167] anon flags: 0x17ffffc009001d(locked|uptodate|dirty|lru|head|swapbacked)
> [63080.332167] raw: 0017ffffc0000000 ffffc93cd9da8001 dead000000000000 ffffc93d428d0098
> [63080.332167] raw: ffffa002cd183bf0 0000000000000000 0000000000000000 0000000000000000
> [63080.332167] head: 0017ffffc009001d ffffc93cda0d0008 ffffc93cd9ab0008 ffff8f21be9f0cb9
> [63080.332167] head: 00000007fe260582 0000000000000000 00000200ffffffff ffff8f1021810000
> [63080.332167] page dumped because: total_mapcount(head) > 0

Added Kirill to the loop too; he may have some insights.

Thanks a lot for pasting the full log. It seems the BUG_ON in
unmap_page() and VM_BUG_ON_PAGE(compound_mapcount(head), head) were not
triggered, but the dumped page shows its total_mapcount is 511. That
means 511 subpages of the huge page are PTE mapped, i.e. it seems all
the tail pages are PTE mapped. This may be because unmap_page() failed,
or because the subpages were mapped again after unmap_page().

But the VM_BUG_ON_PAGE just checks compound_mapcount, and the
page_mapcount() call in unmap_page() also only checks compound_mapcount
plus the mapcount of the head page. If the mapcount of the head page is
0 and compound_mapcount is also 0, try_to_unmap() considers the unmap
successful.
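
As a simplified sketch (not the exact 5.10 code; the function names are
made up and the PageDoubleMap and file-THP adjustments are ignored),
the two counters differ roughly like this for an anonymous THP:

static int sketch_head_mapcount(struct page *head)
{
        /* only the head page's own PTE mappings plus the PMD mapping */
        return atomic_read(&head->_mapcount) + 1 + compound_mapcount(head);
}

static int sketch_total_mapcount(struct page *head)
{
        unsigned long i;
        int ret = compound_mapcount(head);

        /* walk every subpage, so PTE-mapped tail pages are counted too */
        for (i = 0; i < compound_nr(head); i++)
                ret += atomic_read(&head[i]._mapcount) + 1;
        return ret;
}

With compound_mapcount == 0 and the head's own mapcount 0, the first
sketch returns 0 even though the tail pages are still PTE mapped, while
the second one returns 511, which matches the dump above.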

So we can't tell which case it is, although I can't think of how
unmap_page() could fail in this case. I think we should check the total
mapcount in try_to_unmap() instead.

Can you please try the below debug patch (untested) to help narrow
down the problem?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ae907a9c2050..c10e89be1c99 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2726,7 +2726,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
        }

        unmap_page(head);
-       VM_BUG_ON_PAGE(compound_mapcount(head), head);
+       VM_BUG_ON_PAGE(total_mapcount(head), head);

        /* block interrupt reentry in xa_lock and spinlock */
        local_irq_disable();
diff --git a/mm/rmap.c b/mm/rmap.c
index b0fc27e77d6d..537dfc557744 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1777,7 +1777,7 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
        else
                rmap_walk(page, &rwc);

-       return !page_mapcount(page) ? true : false;
+       return !total_mapcount(page) ? true : false;
 }

 /**


> [63080.332167] ------------[ cut here ]------------
> [63080.332167] kernel BUG at mm/huge_memory.c:2736!
> [63080.332167] invalid opcode: 0000 [#1] SMP NOPTI
> [63080.332167] CPU: 8 PID: 376 Comm: kswapd0 Tainted: G S                5.10.31-1.el7.x86_64 #1
> [63080.332167] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
> [63080.332167] RIP: 0010:split_huge_page_to_list.cold.86+0x19/0x1b
> [63080.332167] Code: 3a bc e8 8f 86 ff ff b8 f4 ff ff ff e9 43 7f 83 ff 31 f6 4c 89 e7 e8 bd dc 7d ff 48 c7 c6 4f f1 3a bc 48 89 ef e8 ae dc 7d ff <0f> 0b 48 8b 34 24 4c 89 e2 48 c7 c7 28 f5 3a bc e8 57 86 ff ff 31
> [63080.332167] RSP: 0018:ffffa002cd183b10 EFLAGS: 00010086
> [63080.332167] RAX: 0000000000000000 RBX: ffff8f1021810ae0 RCX: 0000000000000027
> [63080.332167] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8f2eefa18a88
> [63080.332167] RBP: ffffc93cd9da8080 R08: 0000000000000000 R09: c0000000ffffbfff
> [63080.332167] R10: 0000000000000001 R11: ffffa002cd1837e8 R12: ffffc93cd9da8000
> [63080.332167] R13: 0000000000000000 R14: ffff8f21be9f0cb8 R15: 00000000000001ff
> [63080.332167] FS:  0000000000000000(0000) GS:ffff8f2eefa00000(0000) knlGS:0000000000000000
> [63080.332167] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [63080.332167] CR2: 00007f8e24fabd20 CR3: 00000007eaa10005 CR4: 00000000001706e0
> [63080.332167] Call Trace:
> [63080.332167]  ? irq_exit_rcu+0x4f/0xe0
> [63080.332167]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
> [63080.332167]  deferred_split_scan+0x1ca/0x320
> [63080.332167]  do_shrink_slab+0x11f/0x250
> [63080.332167]  shrink_slab+0x20f/0x2c0
> [63080.332167]  shrink_node+0x24b/0x6d0
> [63080.332167]  balance_pgdat+0x2db/0x550
> [63080.332167]  kswapd+0x201/0x390
> [63080.332167]  ? finish_wait+0x80/0x80
> [63080.332167]  ? balance_pgdat+0x550/0x550
> [63080.332167]  kthread+0x116/0x130
> [63080.332167]  ? kthread_park+0x80/0x80
> [63080.332167]  ret_from_fork+0x1f/0x30
> [63080.332167] Modules linked in: binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rfkill rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation intel_rapl_msr intel_rapl_common snd_soc_core snd_compress snd_pcm_dmaengine soundwire_cadence snd_hda_codec sb_edac x86_pkg_temp_thermal snd_hda_core intel_powerclamp coretemp ac97_bus kvm_intel snd_hwdep snd_seq iTCO_wdt snd_seq_device dcdbas intel_pmc_bxt mei_wdt mei_hdcp iTCO_vendor_support snd_pcm dell_smm_hwmon kvm irqbypass snd_timer rapl mei_me intel_cstate snd i2c_i801 intel_uncore i2c_smbus lpc_ich mei soundcore nvme_rdma nvme_fabrics rdma_cm iw_cm ib_cm rdmavt rdma_rxe nfsd ib_uverbs ip6_udp_tunnel udp_tunnel ib_core auth_rpcgss nfs_acl lockd grace nfs_ssc ip_tables xfs radeon
> [63080.332167]  i2c_algo_bit ttm drm_kms_helper cec bnx2x crct10dif_pclmul nvme crc32_pclmul drm crc32c_intel mpt3sas ghash_clmulni_intel e1000e pcspkr mdio nvme_core raid_class scsi_transport_sas wmi dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua btrfs xor raid6_pq sunrpc i2c_dev
> [63080.332167] ---[ end trace 35ee9d9fcf3c4757 ]---
> [63080.332167] RIP: 0010:split_huge_page_to_list.cold.86+0x19/0x1b
> [63080.332167] Code: 3a bc e8 8f 86 ff ff b8 f4 ff ff ff e9 43 7f 83 ff 31 f6 4c 89 e7 e8 bd dc 7d ff 48 c7 c6 4f f1 3a bc 48 89 ef e8 ae dc 7d ff <0f> 0b 48 8b 34 24 4c 89 e2 48 c7 c7 28 f5 3a bc e8 57 86 ff ff 31
> [63080.332167] RSP: 0018:ffffa002cd183b10 EFLAGS: 00010086
> [63080.332167] RAX: 0000000000000000 RBX: ffff8f1021810ae0 RCX: 0000000000000027
> [63080.332167] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8f2eefa18a88
> [63080.332167] RBP: ffffc93cd9da8080 R08: 0000000000000000 R09: c0000000ffffbfff
> [63080.332167] R10: 0000000000000001 R11: ffffa002cd1837e8 R12: ffffc93cd9da8000
> [63080.332167] R13: 0000000000000000 R14: ffff8f21be9f0cb8 R15: 00000000000001ff
> [63080.332167] FS:  0000000000000000(0000) GS:ffff8f2eefa00000(0000) knlGS:0000000000000000
> [63080.332167] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [63080.332167] CR2: 00007f8e24fabd20 CR3: 00000007eaa10005 CR4: 00000000001706e0
> [63080.332167] Kernel panic - not syncing: Fatal exception
> [63080.332167] Shutting down cpus with NMI
> [63080.332167] Kernel Offset: 0x3a000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [63080.332167] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2021/04/17
>


Thread overview: 13+ messages
2021-04-12 10:07 kernel BUG at mm/huge_memory.c:2736(linux 5.10.29) Wang Yugui
2021-04-12 20:18 ` Yang Shi
2021-04-13 11:30   ` Wang Yugui
2021-04-15 11:18     ` Wang Yugui
2021-04-15 16:26       ` Yang Shi
2021-04-17  8:33   ` Wang Yugui
2021-04-22  0:11     ` Yang Shi [this message]
2021-04-23  2:16       ` Wang Yugui
2021-04-23  8:07         ` Wang Yugui
2021-04-23 21:05           ` Yang Shi
2021-04-24  5:28             ` Wang Yugui
2021-04-26 22:56               ` Yang Shi
2021-04-28 21:55                 ` Wang Yugui
