From: Mike Rapoport <rppt@linux.ibm.com>
To: "Qian Cai (QUIC)" <quic_qiancai@quicinc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Will Deacon <will@kernel.org>,
	Marc Zyngier <maz@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: Arm64 crash while reading memory sysfs
Date: Wed, 26 May 2021 20:24:44 +0300
Message-ID: <YK6EXNZHY1xt7Kjs@linux.ibm.com>
In-Reply-To: <CY4PR0201MB35539FF5EE729283C4241F5A8E249@CY4PR0201MB3553.namprd02.prod.outlook.com>

On Wed, May 26, 2021 at 12:09:14PM +0000, Qian Cai (QUIC) wrote:
> >
> > On Tue, May 25, 2021 at 03:25:59PM +0000, Qian Cai (QUIC) wrote:
> > > Reverting the patchset "arm64: drop pfn_valid_within() and simplify pfn_valid()" [1] from today's linux-next fixed a crash while
> > reading files under /sys/devices/system/memory.

Does the issue persist if you only revert the latest patch in the series?
In next-20210525 it would be commit
89fb47db72f2 ("arm64-drop-pfn_valid_within-and-simplify-pfn_valid-fix")
and commit
dfe215e9bac2 ("arm64: drop pfn_valid_within() and simplify pfn_valid()").

> > Can you please send the beginning of the boot log, up to the
> > "Memory: xK/yK available ..."
> > line?
>
> [ 0.000000] NUMA: Failed to initialise from firmware
> [ 0.000000] NUMA: Faking a node at [mem 0x0000000090000000-0x0000009fffffffff]
> [ 0.000000] NUMA: NODE_DATA [mem 0x9ffefbabc0-0x9ffefbffff]
> [ 0.000000] Zone ranges:
> [ 0.000000] Normal [mem 0x0000000090000000-0x0000009fffffffff]
> [ 0.000000] Movable zone start for each node
> [ 0.000000] Early memory node ranges
> [ 0.000000] node 0: [mem 0x0000000090000000-0x0000000091ffffff]
> [ 0.000000] node 0: [mem 0x0000000092000000-0x00000000928fffff]
> [ 0.000000] node 0: [mem 0x0000000092900000-0x00000000fffbffff]
> [ 0.000000] node 0: [mem 0x00000000fffc0000-0x00000000ffffffff]
> [ 0.000000] node 0: [mem 0x0000000880000000-0x0000000fffffffff]
> [ 0.000000] node 0: [mem 0x0000008800000000-0x0000009ff5aeffff]
> [ 0.000000] node 0: [mem 0x0000009ff5af0000-0x0000009ff5b2ffff]
> [ 0.000000] node 0: [mem 0x0000009ff5b30000-0x0000009ff5baffff]
> [ 0.000000] node 0: [mem 0x0000009ff5bb0000-0x0000009ff7deffff]
> [ 0.000000] node 0: [mem 0x0000009ff7df0000-0x0000009ff7e5ffff]
> [ 0.000000] node 0: [mem 0x0000009ff7e60000-0x0000009ff7ffffff]
> [ 0.000000] node 0: [mem 0x0000009ff8000000-0x0000009fffffffff]
> [ 0.000000] Initmem setup node 0 [mem 0x0000000090000000-0x0000009fffffffff]
> [ 0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
> [ 0.000000] Memory: 777216K/133955584K available (17920K kernel code, 118786K rwdata, 4416K rodata, 6080K init, 67276K bss, 17379072K reserved, 0K cma-reserved)

The available and reserved sizes look weird.
Can you post the log with memblock=debug and mminit_loglevel=4 added to the
kernel command line?

> > > [1] https://lore.kernel.org/kvmarm/20210511100550.28178-1-rppt@kernel.org/
> > >
> > > [ 247.669668][ T1443] kernel BUG at include/linux/mm.h:1383!
> > > [ 247.675987][ T1443] Internal error: Oops - BUG: 0 [#1] SMP
> > > [ 247.681472][ T1443] Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit
> > nvme mlx5_core i2c_core nvme_core firmware_class
> > > [ 247.696894][ T1443] CPU: 15 PID: 1443 Comm: ranbug Not tainted 5.13.0-rc3-next-20210524+ #11
> > > [ 247.705326][ T1443] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> > > [ 247.713842][ T1443] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> > > [ 247.720536][ T1443] pc : test_pages_in_a_zone+0x23c/0x300
> > > [ 247.725935][ T1443] lr : test_pages_in_a_zone+0x23c/0x300

Do we know what PFN triggers it?

Can you please run with this patch:

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 70620d0dd923..b9d1dd0dae5f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1443,6 +1443,12 @@ struct zone *test_pages_in_a_zone(unsigned long start_pfn,
 				i++;
 			if (i == MAX_ORDER_NR_PAGES || pfn + i >= end_pfn)
 				continue;
+
+			if (!pfn_valid(pfn))
+				pr_info("%s: pfn %lx is not valid\n", __func__, pfn);
+			else if (PagePoisoned(pfn_to_page(pfn)))
+				dump_page(pfn_to_page(pfn), "");
+
 			/* Check if we got outside of the zone */
 			if (zone && !zone_spans_pfn(zone, pfn + i))
 				return NULL;

-- 
Sincerely yours,
Mike.
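For readers of the archive, the remark that the available and reserved sizes "look weird" can be checked by hand against the quoted boot log. The following is only an illustrative sketch and not part of the original message: the struct `mem_span` and the helpers `span_total_kib()` and `unaccounted_kib()` are hypothetical names, and the twelve "Early memory node ranges" are merged by hand into three contiguous spans. It sums the spans and subtracts the two figures from the "Memory:" line.

```c
#include <stddef.h>
#include <stdint.h>

/* One [start, end] physical range, inclusive, as printed in the boot log. */
struct mem_span {
	uint64_t start;
	uint64_t end;
};

/*
 * Assumption: the twelve "Early memory node ranges" from the quoted log,
 * merged by hand where one range ends right before the next one starts.
 */
static const struct mem_span spans[] = {
	{ 0x0000000090000000ULL, 0x00000000ffffffffULL },
	{ 0x0000000880000000ULL, 0x0000000fffffffffULL },
	{ 0x0000008800000000ULL, 0x0000009fffffffffULL },
};

/* Sum the sizes of n inclusive spans and return the total in KiB. */
static uint64_t span_total_kib(const struct mem_span *s, size_t n)
{
	uint64_t bytes = 0;

	for (size_t i = 0; i < n; i++)
		bytes += s[i].end - s[i].start + 1;

	return bytes / 1024;
}

/* KiB of the total covered by neither the "available" nor the "reserved" figure. */
static uint64_t unaccounted_kib(uint64_t total, uint64_t avail, uint64_t reserved)
{
	return total - avail - reserved;
}
```

Calling `span_total_kib(spans, 3)` gives 133955584, which matches the total in the "Memory:" line. With the reported 777216K available and 17379072K reserved, `unaccounted_kib()` returns 115799296, so those two figures together cover only a small part of the roughly 128 GiB of RAM. This is just the arithmetic behind the numbers; it does not say which of them is wrong or why.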