From: Jianyong Wu <Jianyong.Wu@arm.com> To: Mark Rutland <Mark.Rutland@arm.com> Cc: Catalin Marinas <Catalin.Marinas@arm.com>, "will@kernel.org" <will@kernel.org>, Anshuman Khandual <Anshuman.Khandual@arm.com>, "akpm@linux-foundation.org" <akpm@linux-foundation.org>, "david@redhat.com" <david@redhat.com>, "quic_qiancai@quicinc.com" <quic_qiancai@quicinc.com>, "ardb@kernel.org" <ardb@kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>, "gshan@redhat.com" <gshan@redhat.com>, Justin He <Justin.He@arm.com>, nd <nd@arm.com> Subject: RE: [PATCH v3] arm64/mm: avoid fixmap race condition when create pud mapping Date: Fri, 17 Dec 2021 10:09:04 +0000 [thread overview] Message-ID: <AM9PR08MB727694AAAB8247FDF9AD73DAF4789@AM9PR08MB7276.eurprd08.prod.outlook.com> (raw) In-Reply-To: <YbxYuETndF9LmJz4@FVFF77S0Q05N> Hi Mark, > -----Original Message----- > From: Mark Rutland <mark.rutland@arm.com> > Sent: Friday, December 17, 2021 5:31 PM > To: Jianyong Wu <Jianyong.Wu@arm.com> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; will@kernel.org; Anshuman > Khandual <Anshuman.Khandual@arm.com>; akpm@linux-foundation.org; > david@redhat.com; quic_qiancai@quicinc.com; ardb@kernel.org; linux- > kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; > gshan@redhat.com; Justin He <Justin.He@arm.com>; nd <nd@arm.com> > Subject: Re: [PATCH v3] arm64/mm: avoid fixmap race condition when create > pud mapping > > On Thu, Dec 16, 2021 at 04:28:12PM +0800, Jianyong Wu wrote: > > The 'fixmap' is a global resource and is used recursively by create > > pud mapping(), leading to a potential race condition in the presence > > of a concurrent call to alloc_init_pud(): > > > > kernel_init thread virtio-mem workqueue thread > > ================== =========================== > > > > alloc_init_pud(...) alloc_init_pud(...) > > pudp = pud_set_fixmap_offset(...) pudp = pud_set_fixmap_offset(...) > > READ_ONCE(*pudp) > > pud_clear_fixmap(...) > > READ_ONCE(*pudp) // CRASH! > > > > As kernel may sleep during creating pud mapping, introduce a mutex > > lock to serialise use of the fixmap entries by alloc_init_pud(). > > > > Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> > > Since there were deadlock issues with the last version, it would be very nice > if we could check this with at least: > > * CONFIG_DEBUG_ATOMIC_SLEEP > * CONFIG_PROVE_LOCKING > > ... so that we can be reasonably certain that we're not introducing some > livelock/deadlock scenario. > I enable these 2 configs and test for the current patch. No warning related with this change found. > Are you able to reproduce the problem for testing, or was this found by > inspection? Do you have any instructions for reproducing the problem? e.g. > can this easily be tested with QEMU? > I test it using Cloud Hypervisor not QEMU. I find the bug when I tested the incoming feature of virtio-mem using Cloud Hypervisor. I think we can reproduce this bug using QEMU, but as there is no virtio-mem support for the current QEMU, we can only test the ACPI-based memory hotplug. However, I think it's not easy to do and I have not tried that. In my test: firstly, start a VM and hotplug a large size of memory using virtio-mem and reboot or kexec a new kernel. When the kernel booting, memory hotplugged by virtio-mem will be added within kernel_init. As both of kernel init and memory add thread will update page table, "alloc_pud_init" may be executed concurrently. I think it's not easy to reproduce the bug using ACPI based memory hotplug, as we must hotplug memory at the same time of kernel_init to crash with it. > If you're able to reproduce the issue, it would be nice to have an example > backtrace of when this goes wrong. > Yes, this bug occurs when kernel init, the function execute flow is: ------------------------- kernel_init kernel_init_freeable ... do_initcall ... module_init [A] ... mark_readonly mark_rodata_ro [B] ------------------------- [A] can contains memory hotplug init therefore both [A] and [B] can update page table at the same time and may lead to crash. Thanks Jianyong Wu > Thanks, > Mark. > > > --- > > > > Change log: > > > > from v2 to v3: > > change spin lock to mutex lock as kernel may sleep when create > > pud map. > > > > arch/arm64/mm/mmu.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index > > acfae9b41cc8..e680a6a8ca40 100644 > > --- a/arch/arm64/mm/mmu.c > > +++ b/arch/arm64/mm/mmu.c > > @@ -63,6 +63,7 @@ static pmd_t bm_pmd[PTRS_PER_PMD] > __page_aligned_bss > > __maybe_unused; static pud_t bm_pud[PTRS_PER_PUD] > __page_aligned_bss > > __maybe_unused; > > > > static DEFINE_SPINLOCK(swapper_pgdir_lock); > > +static DEFINE_MUTEX(fixmap_lock); > > > > void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd) { @@ -329,6 +330,11 @@ > > static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long > end, > > } > > BUG_ON(p4d_bad(p4d)); > > > > + /* > > + * We only have one fixmap entry per page-table level, so take > > + * the fixmap lock until we're done. > > + */ > > + mutex_lock(&fixmap_lock); > > pudp = pud_set_fixmap_offset(p4dp, addr); > > do { > > pud_t old_pud = READ_ONCE(*pudp); > > @@ -359,6 +365,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned > long addr, unsigned long end, > > } while (pudp++, addr = next, addr != end); > > > > pud_clear_fixmap(); > > + mutex_unlock(&fixmap_lock); > > } > > > > static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys, > > -- > > 2.17.1 > >
WARNING: multiple messages have this Message-ID (diff)
From: Jianyong Wu <Jianyong.Wu@arm.com> To: Mark Rutland <Mark.Rutland@arm.com> Cc: Catalin Marinas <Catalin.Marinas@arm.com>, "will@kernel.org" <will@kernel.org>, Anshuman Khandual <Anshuman.Khandual@arm.com>, "akpm@linux-foundation.org" <akpm@linux-foundation.org>, "david@redhat.com" <david@redhat.com>, "quic_qiancai@quicinc.com" <quic_qiancai@quicinc.com>, "ardb@kernel.org" <ardb@kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>, "gshan@redhat.com" <gshan@redhat.com>, Justin He <Justin.He@arm.com>, nd <nd@arm.com> Subject: RE: [PATCH v3] arm64/mm: avoid fixmap race condition when create pud mapping Date: Fri, 17 Dec 2021 10:09:04 +0000 [thread overview] Message-ID: <AM9PR08MB727694AAAB8247FDF9AD73DAF4789@AM9PR08MB7276.eurprd08.prod.outlook.com> (raw) In-Reply-To: <YbxYuETndF9LmJz4@FVFF77S0Q05N> Hi Mark, > -----Original Message----- > From: Mark Rutland <mark.rutland@arm.com> > Sent: Friday, December 17, 2021 5:31 PM > To: Jianyong Wu <Jianyong.Wu@arm.com> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; will@kernel.org; Anshuman > Khandual <Anshuman.Khandual@arm.com>; akpm@linux-foundation.org; > david@redhat.com; quic_qiancai@quicinc.com; ardb@kernel.org; linux- > kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org; > gshan@redhat.com; Justin He <Justin.He@arm.com>; nd <nd@arm.com> > Subject: Re: [PATCH v3] arm64/mm: avoid fixmap race condition when create > pud mapping > > On Thu, Dec 16, 2021 at 04:28:12PM +0800, Jianyong Wu wrote: > > The 'fixmap' is a global resource and is used recursively by create > > pud mapping(), leading to a potential race condition in the presence > > of a concurrent call to alloc_init_pud(): > > > > kernel_init thread virtio-mem workqueue thread > > ================== =========================== > > > > alloc_init_pud(...) alloc_init_pud(...) > > pudp = pud_set_fixmap_offset(...) pudp = pud_set_fixmap_offset(...) > > READ_ONCE(*pudp) > > pud_clear_fixmap(...) > > READ_ONCE(*pudp) // CRASH! > > > > As kernel may sleep during creating pud mapping, introduce a mutex > > lock to serialise use of the fixmap entries by alloc_init_pud(). > > > > Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> > > Since there were deadlock issues with the last version, it would be very nice > if we could check this with at least: > > * CONFIG_DEBUG_ATOMIC_SLEEP > * CONFIG_PROVE_LOCKING > > ... so that we can be reasonably certain that we're not introducing some > livelock/deadlock scenario. > I enable these 2 configs and test for the current patch. No warning related with this change found. > Are you able to reproduce the problem for testing, or was this found by > inspection? Do you have any instructions for reproducing the problem? e.g. > can this easily be tested with QEMU? > I test it using Cloud Hypervisor not QEMU. I find the bug when I tested the incoming feature of virtio-mem using Cloud Hypervisor. I think we can reproduce this bug using QEMU, but as there is no virtio-mem support for the current QEMU, we can only test the ACPI-based memory hotplug. However, I think it's not easy to do and I have not tried that. In my test: firstly, start a VM and hotplug a large size of memory using virtio-mem and reboot or kexec a new kernel. When the kernel booting, memory hotplugged by virtio-mem will be added within kernel_init. As both of kernel init and memory add thread will update page table, "alloc_pud_init" may be executed concurrently. I think it's not easy to reproduce the bug using ACPI based memory hotplug, as we must hotplug memory at the same time of kernel_init to crash with it. > If you're able to reproduce the issue, it would be nice to have an example > backtrace of when this goes wrong. > Yes, this bug occurs when kernel init, the function execute flow is: ------------------------- kernel_init kernel_init_freeable ... do_initcall ... module_init [A] ... mark_readonly mark_rodata_ro [B] ------------------------- [A] can contains memory hotplug init therefore both [A] and [B] can update page table at the same time and may lead to crash. Thanks Jianyong Wu > Thanks, > Mark. > > > --- > > > > Change log: > > > > from v2 to v3: > > change spin lock to mutex lock as kernel may sleep when create > > pud map. > > > > arch/arm64/mm/mmu.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index > > acfae9b41cc8..e680a6a8ca40 100644 > > --- a/arch/arm64/mm/mmu.c > > +++ b/arch/arm64/mm/mmu.c > > @@ -63,6 +63,7 @@ static pmd_t bm_pmd[PTRS_PER_PMD] > __page_aligned_bss > > __maybe_unused; static pud_t bm_pud[PTRS_PER_PUD] > __page_aligned_bss > > __maybe_unused; > > > > static DEFINE_SPINLOCK(swapper_pgdir_lock); > > +static DEFINE_MUTEX(fixmap_lock); > > > > void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd) { @@ -329,6 +330,11 @@ > > static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long > end, > > } > > BUG_ON(p4d_bad(p4d)); > > > > + /* > > + * We only have one fixmap entry per page-table level, so take > > + * the fixmap lock until we're done. > > + */ > > + mutex_lock(&fixmap_lock); > > pudp = pud_set_fixmap_offset(p4dp, addr); > > do { > > pud_t old_pud = READ_ONCE(*pudp); > > @@ -359,6 +365,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned > long addr, unsigned long end, > > } while (pudp++, addr = next, addr != end); > > > > pud_clear_fixmap(); > > + mutex_unlock(&fixmap_lock); > > } > > > > static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys, > > -- > > 2.17.1 > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2021-12-17 10:09 UTC|newest] Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-12-16 8:28 [PATCH v3] arm64/mm: avoid fixmap race condition when create pud mapping Jianyong Wu 2021-12-16 8:28 ` Jianyong Wu 2021-12-16 15:19 ` David Hildenbrand 2021-12-16 15:19 ` David Hildenbrand 2021-12-17 9:30 ` Mark Rutland 2021-12-17 9:30 ` Mark Rutland 2021-12-17 10:09 ` Jianyong Wu [this message] 2021-12-17 10:09 ` Jianyong Wu 2022-01-05 18:03 ` Catalin Marinas 2022-01-05 18:03 ` Catalin Marinas 2022-01-06 10:13 ` Jianyong Wu 2022-01-06 10:13 ` Jianyong Wu 2022-01-06 15:56 ` Catalin Marinas 2022-01-06 15:56 ` Catalin Marinas 2022-01-07 9:10 ` Jianyong Wu 2022-01-07 9:10 ` Jianyong Wu 2022-01-07 10:42 ` Catalin Marinas 2022-01-07 10:42 ` Catalin Marinas 2022-01-26 4:20 ` Justin He 2022-01-26 4:20 ` Justin He 2022-01-26 8:36 ` Ard Biesheuvel 2022-01-26 8:36 ` Ard Biesheuvel 2022-01-26 10:09 ` Jianyong Wu 2022-01-26 10:09 ` Jianyong Wu 2022-01-26 10:12 ` Ard Biesheuvel 2022-01-26 10:12 ` Ard Biesheuvel 2022-01-26 10:17 ` David Hildenbrand 2022-01-26 10:17 ` David Hildenbrand 2022-01-26 10:28 ` Jianyong Wu 2022-01-26 10:28 ` Jianyong Wu 2022-01-26 10:30 ` David Hildenbrand 2022-01-26 10:30 ` David Hildenbrand 2022-01-26 10:31 ` David Hildenbrand 2022-01-26 10:31 ` David Hildenbrand 2022-01-27 6:24 ` Jianyong Wu 2022-01-27 6:24 ` Jianyong Wu 2022-01-27 12:22 ` David Hildenbrand 2022-01-27 12:22 ` David Hildenbrand 2022-01-27 12:34 ` Catalin Marinas 2022-01-27 12:34 ` Catalin Marinas 2022-01-31 8:13 ` Jianyong Wu 2022-01-31 8:13 ` Jianyong Wu 2022-01-31 8:10 ` Jianyong Wu 2022-01-31 8:10 ` Jianyong Wu 2022-01-27 1:31 ` Justin He 2022-01-27 1:31 ` Justin He 2022-01-07 10:53 ` Catalin Marinas 2022-01-07 10:53 ` Catalin Marinas
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=AM9PR08MB727694AAAB8247FDF9AD73DAF4789@AM9PR08MB7276.eurprd08.prod.outlook.com \ --to=jianyong.wu@arm.com \ --cc=Anshuman.Khandual@arm.com \ --cc=Catalin.Marinas@arm.com \ --cc=Justin.He@arm.com \ --cc=Mark.Rutland@arm.com \ --cc=akpm@linux-foundation.org \ --cc=ardb@kernel.org \ --cc=david@redhat.com \ --cc=gshan@redhat.com \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=nd@arm.com \ --cc=quic_qiancai@quicinc.com \ --cc=will@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.