From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())
To: Mike Rapoport
CC: Andrew Morton, Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, David Hildenbrand, Marc Zyngier, Mark Rutland, Mike Rapoport, Will Deacon
References: <20210421065108.1987-1-rppt@kernel.org> <9aa68d26-d736-3b75-4828-f148964eb7f0@huawei.com> <33fa74c2-f32d-f224-eb30-acdb717179ff@huawei.com> <2a1592ad-bc9d-4664-fd19-f7448a37edc0@huawei.com>
From: Kefeng Wang
Message-ID: <52f7d03b-7219-46bc-c62d-b976bc31ebd5@huawei.com>
Date: Sun, 25 Apr 2021 15:51:56 +0800
Sender: owner-linux-mm@kvack.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="------------DED9ED8B6053C923FBA02074"

--------------DED9ED8B6053C923FBA02074
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable


On 2021/4/25 15:19, Mike Rapoport wrote:
> On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
>> I tested this patchset (plus the arm32 change, like arm64 does) based on
>> LTS 5.10 and added some debug logging; the useful info is shown below.
>> If we enable HOLES_IN_ZONE, there is no panic.
>>
>> Any ideas? Thanks.
>
> Are there any changes on top of 5.10 except for pfn_valid() patch?
> Do you see this panic on 5.10 without the changes?

Yes, there is some BSP support for the ARM board on top of 5.10. The same
panic occurs with or without your patch. The panicking pfn=de600 is in the
range [dcc00, de700], whose memmap is freed by free_memmap:

    free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de700, de700000

We see that the PC is in PageLRU, for the same reason as in the arm64 panic log:

   "PageBuddy in move_freepages returns false
    Then we call PageLRU, the macro calls PF_HEAD which is compound_page()
    compound_page reads page->compound_head, it is 0xffffffffffffffff, so it
    returns 0xfffffffffffffffe - and accessing this address causes crash"
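The arithmetic in the quoted explanation can be reproduced in user space. The block below is only a rough model, not the kernel code: fake_compound_head() is a hypothetical name, and the convention that bit 0 of the compound_head word marks a tail page (with the head address stored as the value minus one) is an assumption about how the kernel helper behaves. It also assumes that the compound_head word of the stale struct page holds 0xffffffff, like the page-flags value printed in the debug log.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit model of the head-page lookup.
 * page_addr:     address of the struct page being inspected
 * compound_head: raw word read from page->compound_head
 *
 * Assumption: bit 0 set means "tail page", and the head page address
 * is the stored word with that bit cleared (word - 1).
 */
uint32_t fake_compound_head(uint32_t page_addr, uint32_t compound_head)
{
	/* A struct page address is at least word aligned in this model. */
	assert((page_addr & 1u) == 0);

	if (compound_head & 1u)
		return compound_head - 1u;	/* treated as a tail page */

	return page_addr;			/* page is its own head */
}
```

With page = ef3cc000 and a compound_head word of ffffffff, the model returns fffffffe, which is the value in r0 and the faulting virtual address in the oops.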
> Can you see stack backtrace beyond move_freepages_block?

I was running an OOM test, so the backtrace is in the memory allocation path:

[<c02383c8>] (move_freepages_block) from [<c0238668>] (steal_suitable_fallback+0x174/0x1f4)
[<c0238668>] (steal_suitable_fallback) from [<c023999c>] (get_page_from_freelist+0x490/0x9a4)
[<c023999c>] (get_page_from_freelist) from [<c023a4dc>] (__alloc_pages_nodemask+0x188/0xc08)
[<c023a4dc>] (__alloc_pages_nodemask) from [<c0223078>] (alloc_zeroed_user_highpage_movable+0x14/0x3c)
[<c0223078>] (alloc_zeroed_user_highpage_movable) from [<c0226768>] (handle_mm_fault+0x254/0xac8)
[<c0226768>] (handle_mm_fault) from [<c04ba09c>] (do_page_fault+0x228/0x2f4)
[<c04ba09c>] (do_page_fault) from [<c0111d80>] (do_DataAbort+0x48/0xd0)
[<c0111d80>] (do_DataAbort) from [<c0100e00>] (__dabt_usr+0x40/0x60)

>> Zone ranges:
>>   Normal   [mem 0x0000000080a00000-0x00000000b01fffff]
>>   HighMem  [mem 0x00000000b0200000-0x00000000ffffefff]
>> Movable zone start for each node
>> Early memory node ranges
>>   node   0: [mem 0x0000000080a00000-0x00000000855fffff]
>>   node   0: [mem 0x0000000086a00000-0x0000000087dfffff]
>>   node   0: [mem 0x000000008bd00000-0x000000008c4fffff]
>>   node   0: [mem 0x000000008e300000-0x000000008ecfffff]
>>   node   0: [mem 0x0000000090d00000-0x00000000bfffffff]
>>   node   0: [mem 0x00000000cc000000-0x00000000dc9fffff]
>>   node   0: [mem 0x00000000de700000-0x00000000de9fffff]
>>   node   0: [mem 0x00000000e0800000-0x00000000e0bfffff]
>>   node   0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
>>   node   0: [mem 0x00000000fda00000-0x00000000ffffefff]

>> ----> free_memmap, start_pfn = 85800,  85800000 end_pfn = 86a00, 86a00000
>> ----> free_memmap, start_pfn = 8c800,  8c800000 end_pfn = 8e300, 8e300000
>> ----> free_memmap, start_pfn = 8f000,  8f000000 end_pfn = 90000, 90000000
>> ----> free_memmap, start_pfn = dcc00,  dcc00000 end_pfn = de700, de700000
>> ----> free_memmap, start_pfn = dec00,  dec00000 end_pfn = e0000, e0000000
>> ----> free_memmap, start_pfn = e0c00,  e0c00000 end_pfn = e4000, e4000000
>> ----> free_memmap, start_pfn = f7000,  f7000000 end_pfn = f8000, f8000000
>> === >move_freepages: start_pfn/end_pfn [de601, de7ff], [de600000, de7ff000]
>> :  pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff
>> 8<--- cut here ---
>> Unable to handle kernel paging request at virtual address fffffffe
>> pgd = 5dd50df5
>> [fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000
>> Internal error: Oops: 37 [#1] SMP ARM
>> Modules linked in: gmac(O)
>> CPU: 2 PID: 635 Comm: test-oom Tainted: G           O      5.10.0+ #31
>> Hardware name: Hisilicon A9
>> PC is at move_freepages_block+0x150/0x278
>> LR is at move_freepages_block+0x150/0x278
>> pc : [<c02383a4>]    lr : [<c02383a4>]    psr: 200e0393
>> sp : c4179cf8  ip : 00000000  fp : 00000001
>> r10: c4179d58  r9 : 000de7ff  r8 : 00000000
>> r7 : c0863280  r6 : 000de600  r5 : 000de600  r4 : ef3cc000
>> r3 : ffffffff  r2 : 00000000  r1 : ef5d069c  r0 : fffffffe
>> Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
>> Control: 1ac5387d  Table: 83b0c04a  DAC: 55555555
>> Process test-oom (pid: 635, stack limit = 0x25d667df)
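
To cross-check the log, the memory node ranges and the free_memmap ranges above can be turned into pfn tables and queried. This is a user-space sketch under stated assumptions: 4 KiB pages (consistent with pfn de600 mapping to physical de600000), free_memmap ranges treated as half-open [start_pfn, end_pfn), and helper names (pfn_is_memory, pfn_memmap_freed) that are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pfn_range {
	unsigned long start;	/* first pfn in the range */
	unsigned long end;	/* one past the last pfn */
};

/* "Early memory node ranges" from the boot log, converted to pfns. */
const struct pfn_range mem_ranges[] = {
	{ 0x80a00, 0x85600 }, { 0x86a00, 0x87e00 }, { 0x8bd00, 0x8c500 },
	{ 0x8e300, 0x8ed00 }, { 0x90d00, 0xc0000 }, { 0xcc000, 0xdca00 },
	{ 0xde700, 0xdea00 }, { 0xe0800, 0xe0c00 }, { 0xf4b00, 0xf7000 },
	{ 0xfda00, 0xfffff },
};

/* Ranges printed by the free_memmap debug lines. */
const struct pfn_range freed_memmap[] = {
	{ 0x85800, 0x86a00 }, { 0x8c800, 0x8e300 }, { 0x8f000, 0x90000 },
	{ 0xdcc00, 0xde700 }, { 0xdec00, 0xe0000 }, { 0xe0c00, 0xe4000 },
	{ 0xf7000, 0xf8000 },
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Return true if pfn falls inside any of the n half-open ranges. */
bool pfn_in_ranges(const struct pfn_range *r, size_t n, unsigned long pfn)
{
	assert(r != NULL);

	for (size_t i = 0; i < n; i++)
		if (pfn >= r[i].start && pfn < r[i].end)
			return true;
	return false;
}

/* Hypothetical helper: is this pfn backed by a reported memory range? */
bool pfn_is_memory(unsigned long pfn)
{
	return pfn_in_ranges(mem_ranges, ARRAY_LEN(mem_ranges), pfn);
}

/* Hypothetical helper: was the memmap for this pfn handed to free_memmap? */
bool pfn_memmap_freed(unsigned long pfn)
{
	return pfn_in_ranges(freed_memmap, ARRAY_LEN(freed_memmap), pfn);
}
```

Under these assumptions, pfn de600 is not in any memory range and its memmap was freed, while pfn de700 is valid memory. This suggests that the block scanned by move_freepages, which ends at de7ff, straddles the boundary between a freed memmap hole and real memory.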


    
--------------DED9ED8B6053C923FBA02074--