Subject: Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
From: Kefeng Wang
To: Mike Rapoport, David Hildenbrand
CC: Andrew Morton, Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland, Mike Rapoport, Will Deacon
Date: Fri, 7 May 2021 15:17:08 +0800
In-Reply-To: <82cfbb7f-dd4f-12d8-dc76-847f06172200@huawei.com>

On 2021/5/6 20:47, Kefeng Wang wrote:
>
>
>>>>> no, the CONFIG_ARM_LPAE is not set, and yes with same panic at
>>>>> move_freepages at
>>>>>
>>>>> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] :  pfn =de600, page =ef3cc000, page-flags = ffffffff,  pfn2phy = de600000
>>>>>
>>>>>>> __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
>>>>>>> __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
>>>>>>> __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200
>>>>
>>>> Hmm, [de600, de7ff] is not added to the free lists which is correct. But
>>>> then it's unclear how the page for de600 gets to move_freepages()...
>>>>
>>>> Can't say I have any bright ideas to try here...
>>>
>>> Are we missing some checks (e.g., PageReserved()) that pfn_valid_within()
>>> would have "caught" before?
>>
>> Unless I'm missing something the crash happens in __rmqueue_fallback():
>>
>> do_steal:
>>     page = get_page_from_free_area(area, fallback_mt);
>>
>>     steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
>>                                 can_steal);
>>         -> move_freepages()
>>             -> BUG()
>>
>> So a page from free area should be sane as the freed range was never
>> added it to the free lists.
>
> Sorry for the late response due to the vacation.
>
> The pfn in range [de600, de7ff] won't be added into the free lists via
> __free_memory_core(), but the pfn could be added into freelists via
> free_highmem_page()
>
> I add some debug[1] in add_to_free_list(), we could see the calltrace
>
> free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
> free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
> free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
> add_to_free_list, ===> pfn = de700
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec
> pfn = de700
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48
> Hardware name: Hisilicon A9
> [] (show_stack) from [] (dump_stack+0x9c/0xc0)
> [] (dump_stack) from [] (__warn+0xc0/0xec)
> [] (__warn) from [] (warn_slowpath_fmt+0x74/0xa4)
> [] (warn_slowpath_fmt) from [] (add_to_free_list+0x8c/0xec)
> [] (add_to_free_list) from [] (free_pcppages_bulk+0x200/0x278)
> [] (free_pcppages_bulk) from [] (free_unref_page+0x58/0x68)
> [] (free_unref_page) from [] (free_highmem_page+0xc/0x50)
> [] (free_highmem_page) from [] (mem_init+0x21c/0x254)
> [] (mem_init) from [] (start_kernel+0x258/0x5c0)
> [] (start_kernel) from [<00000000>] (0x0)
>
> so any idea?

Take pfn = 0xde700 with pageblock_nr_pages = 0x200. The start_pfn/end_pfn
passed to move_freepages() is then [de600, de7ff]. However, the pfns in
[de600, de6ff] have no 'struct page'. Without HOLES_IN_ZONE,
pfn_valid_within() is not enabled, so nothing skips those pfns and the walk
panics. The same issue can occur in isolate_freepages_block(), and there
may be other paths like it.

In our 5.10 kernel I selected HOLES_IN_ZONE for ARCH_HISI (ARM) to solve
this. Should we select HOLES_IN_ZONE for all of ARM, or only for ARCH_HISI?
Or is there a better solution?

Thanks.