From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
To: Mike Rapoport
CC: David Hildenbrand, Andrew Morton, Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier, Mark Rutland, Mike Rapoport, Will Deacon
References: <0cb013e4-1157-f2fa-96ec-e69e60833f72@huawei.com> <24b37c01-fc75-d459-6e61-d67e8f0cf043@redhat.com> <82cfbb7f-dd4f-12d8-dc76-847f06172200@huawei.com> <33c67e13-dc48-9a2f-46d8-a532e17380fb@huawei.com>
From: Kefeng Wang
Date: Mon, 10 May 2021 11:10:20 +0800

On 2021/5/9 13:59, Mike Rapoport wrote:
> On Fri,
> May 07, 2021 at 08:34:52PM +0800, Kefeng Wang wrote:
>>
>>
>> On 2021/5/7 18:30, Mike Rapoport wrote:
>>> On Fri, May 07, 2021 at 03:17:08PM +0800, Kefeng Wang wrote:
>>>>
>>>> On 2021/5/6 20:47, Kefeng Wang wrote:
>>>>>
>>>>>>>>> no, the CONFIG_ARM_LPAE is not set, and yes with same panic at
>>>>>>>>> move_freepages at
>>>>>>>>>
>>>>>>>>> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]
>>>>>>>>> :  pfn =de600, page =ef3cc000, page-flags = ffffffff,  pfn2phy = de600000
>>>>>>>>>
>>>>>>>>>>> __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
>>>>>>>>>>> __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
>>>>>>>>>>> __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200
>>>>>>>>
>>>>>>>> Hmm, [de600, de7ff] is not added to the free lists which is correct. But
>>>>>>>> then it's unclear how the page for de600 gets to move_freepages()...
>>>>>>>>
>>>>>>>> Can't say I have any bright ideas to try here...
>>>>>>>
>>>>>>> Are we missing some checks (e.g., PageReserved()) that pfn_valid_within()
>>>>>>> would have "caught" before?
>>>>>>
>>>>>> Unless I'm missing something the crash happens in __rmqueue_fallback():
>>>>>>
>>>>>> do_steal:
>>>>>>     page = get_page_from_free_area(area, fallback_mt);
>>>>>>
>>>>>>     steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
>>>>>>                             can_steal);
>>>>>>         -> move_freepages()
>>>>>>             -> BUG()
>>>>>>
>>>>>> So a page from free area should be sane as the freed range was never
>>>>>> added it to the free lists.
>>>>>
>>>>> Sorry for the late response due to the vacation.
>>>>>
>>>>> The pfn in range [de600, de7ff] won't be added into the free lists via
>>>>> __free_memory_core(), but the pfn could be added into freelists via
>>>>> free_highmem_page()
>>>>>
>>>>> I add some debug[1] in add_to_free_list(), we could see the calltrace
>>>>>
>>>>> free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
>>>>> free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
>>>>> free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
>>>>> add_to_free_list, ===> pfn = de700
>>>>> ------------[ cut here ]------------
>>>>> WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec
>>>>> pfn = de700
>>>>> Modules linked in:
>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48
>>>>> Hardware name: Hisilicon A9
>>>>> [] (show_stack) from [] (dump_stack+0x9c/0xc0)
>>>>> [] (dump_stack) from [] (__warn+0xc0/0xec)
>>>>> [] (__warn) from [] (warn_slowpath_fmt+0x74/0xa4)
>>>>> [] (warn_slowpath_fmt) from [] (add_to_free_list+0x8c/0xec)
>>>>> [] (add_to_free_list) from [] (free_pcppages_bulk+0x200/0x278)
>>>>> [] (free_pcppages_bulk) from [] (free_unref_page+0x58/0x68)
>>>>> [] (free_unref_page) from [] (free_highmem_page+0xc/0x50)
>>>>> [] (free_highmem_page) from [] (mem_init+0x21c/0x254)
>>>>> [] (mem_init) from [] (start_kernel+0x258/0x5c0)
>>>>> [] (start_kernel) from [<00000000>] (0x0)
>>>>>
>>>>> so any idea?
>>>>
>>>> If pfn = 0xde700, due to the pageblock_nr_pages = 0x200, then the
>>>> start_pfn,end_pfn passed to move_freepages() will be [de600, de7ff],
>>>> but the range of [de600,de700] without 'struct page' will lead to
>>>> this panic when pfn_valid_within not enabled if no HOLES_IN_ZONE,
>>>> and the same issue will occurred in isolate_freepages_block(), maybe
>>>
>>> I think your analysis is correct except one minor detail.
>>> With the #ifdef fix I've proposed earlier [1] the memmap for
>>> [0xde600, 0xde700] should not be freed so there should be a struct page.
>>> Did you check what parts of the memmap are actually freed with this
>>> patch applied?
>>> Would you get a panic if you add
>>>
>>> dump_page(pfn_to_page(0xde600), "");
>>>
>>> say, in the end of memblock_free_all()?
>>
>> The memory is not continuous, see MEMBLOCK:
>> memory size = 0x4c0fffff reserved size = 0x027ef058
>> memory.cnt = 0xa
>> memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
>> memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
>> memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
>> memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
>> memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
>> memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
>> memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
>> ...
>>
>> The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000]
>> is not available memory, and we won't create memmap, so with or without
>> your patch, we can't see the range in free_memmap(), right?
>
> This is not available memory and we won't see the range in free_memmap(),
> but we still should create memmap for it and that's what my patch tried to
> do.
>
> There are a lot of places in core mm that operate on pageblocks and
> free_unused_memmap() should make sure that any pageblock has a valid memory
> map.
>
> Currently, that's not the case when SPARSEMEM=y and my patch tried to fix
> it.
>
> Can you please send log with my patch applied and with the printing of
> ranges that are freed in free_unused_memmap() you've used in previous
> mails?

With your patch[1] and debug print in free_memmap:

----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86800, 86800000
----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e000, 8e000000
----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000
----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de400, de400000
----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000
----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000
----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000
__free_memory_core, range: 0x80a03000 - 0x80a04000, pfn: 80a03 - 80a04
__free_memory_core, range: 0x80a08000 - 0x80b00000, pfn: 80a08 - 80b00
__free_memory_core, range: 0x812e8058 - 0x83000000, pfn: 812e9 - 83000
__free_memory_core, range: 0x85000000 - 0x85600000, pfn: 85000 - 85600
__free_memory_core, range: 0x86a00000 - 0x87e00000, pfn: 86a00 - 87e00
__free_memory_core, range: 0x8bd00000 - 0x8c500000, pfn: 8bd00 - 8c500
__free_memory_core, range: 0x8e300000 - 0x8ed00000, pfn: 8e300 - 8ed00
__free_memory_core, range: 0x90d00000 - 0xaf2c0000, pfn: 90d00 - af2c0
__free_memory_core, range: 0xaf430000 - 0xaf450000, pfn: af430 - af450
__free_memory_core, range: 0xaf510000 - 0xaf540000, pfn: af510 - af540
__free_memory_core, range: 0xaf560000 - 0xaf580000, pfn: af560 - af580
__free_memory_core, range: 0xafd98000 - 0xafdc8000, pfn: afd98 - afdc8
__free_memory_core, range: 0xafdd8000 - 0xafe00000, pfn: afdd8 - afe00
__free_memory_core, range: 0xafe18000 - 0xafe80000, pfn: afe18 - afe80
__free_memory_core, range: 0xafee0000 - 0xaff00000, pfn: afee0 - aff00
__free_memory_core, range: 0xaff80000 - 0xaff8d000, pfn: aff80 - aff8d
__free_memory_core, range: 0xafff2000 - 0xafff4580, pfn: afff2 - afff4
__free_memory_core, range: 0xafffe000 - 0xafffe0e0, pfn: afffe - afffe
__free_memory_core, range: 0xafffe4fc - 0xafffe500, pfn: affff - afffe
__free_memory_core, range: 0xafffe6e4 - 0xafffe700, pfn: affff - afffe
__free_memory_core, range: 0xafffe8dc - 0xafffe8e0, pfn: affff - afffe
__free_memory_core, range: 0xafffe970 - 0xafffe980, pfn: affff - afffe
__free_memory_core, range: 0xafffe990 - 0xafffe9a0, pfn: affff - afffe
__free_memory_core, range: 0xafffe9a4 - 0xafffe9c0, pfn: affff - afffe
__free_memory_core, range: 0xafffeb54 - 0xafffeb60, pfn: affff - afffe
__free_memory_core, range: 0xafffecf4 - 0xafffed00, pfn: affff - afffe
__free_memory_core, range: 0xafffefc4 - 0xafffefd8, pfn: affff - afffe
__free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
__free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
__free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200
__free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200
__free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200
__free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200
free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
free_highpages, range_pfn [e0800, e0c00], range_addr [e0800000, e0c00000]
free_highpages, range_pfn [f4b00, f7000], range_addr [f4b00000, f7000000]
free_highpages, range_pfn [fda00, fffff], range_addr [fda00000, ffffffff]

>
>>>> there are some scene, so I select HOLES_IN_ZONE in ARCH_HISI(ARM) to solve
>>>> this issue in our 5.10, should we select HOLES_IN_ZONE in all ARM or only in
>>>> ARCH_HISI, any better solution? Thanks.
>>>
>>> I don't think that HOLES_IN_ZONE is the right solution. I believe that we
>>> must keep the memory map aligned on pageblock boundaries. That's surely not the
>>> case for SPARSEMEM as of now, and if my fix is not enough we need to find
>>> where it went wrong.
>>>
>>> Besides, I'd say that if it is possible to update your firmware to make the
>>> memory layout reported to the kernel less, hmm, esoteric, you would hit
>>> less corner cases.
>>
>> Sorry, memory layout is customized and we can't change it, some memory is
>> for special purposes by our production.
>
> I understand that this memory cannot be used by Linux, but the firmware may
> supply the kernel with actual physical memory layout and then mark all
> the special purpose memory that kernel should not touch as reserved.

We can only modify the kernel, so that is not practical for our products,
and it looks like a workaround. We need to find a way to solve the issue
from the kernel side.

[1] https://lore.kernel.org/lkml/YIpY8TXCSc7Lfa2Z@kernel.org