All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jianyong Wu <Jianyong.Wu@arm.com>
To: Mark Rutland <Mark.Rutland@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	Anshuman Khandual <Anshuman.Khandual@arm.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"david@redhat.com" <david@redhat.com>,
	"quic_qiancai@quicinc.com" <quic_qiancai@quicinc.com>,
	"ardb@kernel.org" <ardb@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"gshan@redhat.com" <gshan@redhat.com>,
	Justin He <Justin.He@arm.com>, nd <nd@arm.com>
Subject: RE: [PATCH v3] arm64/mm: avoid fixmap race condition when create pud mapping
Date: Fri, 17 Dec 2021 10:09:04 +0000	[thread overview]
Message-ID: <AM9PR08MB727694AAAB8247FDF9AD73DAF4789@AM9PR08MB7276.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <YbxYuETndF9LmJz4@FVFF77S0Q05N>

Hi Mark,

> -----Original Message-----
> From: Mark Rutland <mark.rutland@arm.com>
> Sent: Friday, December 17, 2021 5:31 PM
> To: Jianyong Wu <Jianyong.Wu@arm.com>
> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; will@kernel.org; Anshuman
> Khandual <Anshuman.Khandual@arm.com>; akpm@linux-foundation.org;
> david@redhat.com; quic_qiancai@quicinc.com; ardb@kernel.org; linux-
> kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> gshan@redhat.com; Justin He <Justin.He@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v3] arm64/mm: avoid fixmap race condition when create
> pud mapping
> 
> On Thu, Dec 16, 2021 at 04:28:12PM +0800, Jianyong Wu wrote:
> > The 'fixmap' is a global resource and is used recursively by create
> > pud mapping(), leading to a potential race condition in the presence
> > of a concurrent call to alloc_init_pud():
> >
> > kernel_init thread                          virtio-mem workqueue thread
> > ==================                          ===========================
> >
> >   alloc_init_pud(...)                       alloc_init_pud(...)
> >   pudp = pud_set_fixmap_offset(...)         pudp = pud_set_fixmap_offset(...)
> >   READ_ONCE(*pudp)
> >   pud_clear_fixmap(...)
> >                                             READ_ONCE(*pudp) // CRASH!
> >
> > As kernel may sleep during creating pud mapping, introduce a mutex
> > lock to serialise use of the fixmap entries by alloc_init_pud().
> >
> > Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
> 
> Since there were deadlock issues with the last version, it would be very nice
> if we could check this with at least:
> 
> * CONFIG_DEBUG_ATOMIC_SLEEP
> * CONFIG_PROVE_LOCKING
> 
> ... so that we can be reasonably certain that we're not introducing some
> livelock/deadlock scenario.
> 

I enable these 2 configs and test for the current patch. No warning related with this change found. 

> Are you able to reproduce the problem for testing, or was this found by
> inspection? Do you have any instructions for reproducing the problem? e.g.
> can this easily be tested with QEMU?
> 

I test it using Cloud Hypervisor not QEMU. I find the bug when I tested the incoming feature of virtio-mem using Cloud Hypervisor.
I think we can reproduce this bug using QEMU, but as there is no virtio-mem support for the current QEMU, we can only test the ACPI-based memory hotplug. However, I think it's not easy to do and I have not tried that.

In my test: firstly, start a VM and hotplug a large size of memory using virtio-mem and reboot or kexec a new kernel. When the kernel booting, memory hotplugged by virtio-mem will be added within kernel_init. As both of kernel init and memory add thread will update page table, "alloc_pud_init" may be executed concurrently. 

I think it's not easy to reproduce the bug using ACPI based memory hotplug, as we must hotplug memory at the same time of kernel_init to crash with it. 

> If you're able to reproduce the issue, it would be nice to have an example
> backtrace of when this goes wrong.
> 
Yes, this bug occurs when kernel init, the function execute flow is:
-------------------------
kernel_init
  kernel_init_freeable
    ...
      do_initcall
        ...
          module_init [A]

  ...
  mark_readonly
    mark_rodata_ro [B]
-------------------------
[A] can contains memory hotplug init therefore both [A] and [B] can
update page table at the same time and may lead to crash.

Thanks
Jianyong Wu

> Thanks,
> Mark.
> 
> > ---
> >
> > Change log:
> >
> > from v2 to v3:
> >      change spin lock to mutex lock as kernel may sleep when create
> > pud map.
> >
> >  arch/arm64/mm/mmu.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index
> > acfae9b41cc8..e680a6a8ca40 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -63,6 +63,7 @@ static pmd_t bm_pmd[PTRS_PER_PMD]
> __page_aligned_bss
> > __maybe_unused;  static pud_t bm_pud[PTRS_PER_PUD]
> __page_aligned_bss
> > __maybe_unused;
> >
> >  static DEFINE_SPINLOCK(swapper_pgdir_lock);
> > +static DEFINE_MUTEX(fixmap_lock);
> >
> >  void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)  { @@ -329,6 +330,11 @@
> > static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long
> end,
> >  	}
> >  	BUG_ON(p4d_bad(p4d));
> >
> > +	/*
> > +	 * We only have one fixmap entry per page-table level, so take
> > +	 * the fixmap lock until we're done.
> > +	 */
> > +	mutex_lock(&fixmap_lock);
> >  	pudp = pud_set_fixmap_offset(p4dp, addr);
> >  	do {
> >  		pud_t old_pud = READ_ONCE(*pudp);
> > @@ -359,6 +365,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned
> long addr, unsigned long end,
> >  	} while (pudp++, addr = next, addr != end);
> >
> >  	pud_clear_fixmap();
> > +	mutex_unlock(&fixmap_lock);
> >  }
> >
> >  static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> > --
> > 2.17.1
> >

WARNING: multiple messages have this Message-ID (diff)
From: Jianyong Wu <Jianyong.Wu@arm.com>
To: Mark Rutland <Mark.Rutland@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	Anshuman Khandual <Anshuman.Khandual@arm.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"david@redhat.com" <david@redhat.com>,
	"quic_qiancai@quicinc.com" <quic_qiancai@quicinc.com>,
	"ardb@kernel.org" <ardb@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"gshan@redhat.com" <gshan@redhat.com>,
	Justin He <Justin.He@arm.com>, nd <nd@arm.com>
Subject: RE: [PATCH v3] arm64/mm: avoid fixmap race condition when create pud mapping
Date: Fri, 17 Dec 2021 10:09:04 +0000	[thread overview]
Message-ID: <AM9PR08MB727694AAAB8247FDF9AD73DAF4789@AM9PR08MB7276.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <YbxYuETndF9LmJz4@FVFF77S0Q05N>

Hi Mark,

> -----Original Message-----
> From: Mark Rutland <mark.rutland@arm.com>
> Sent: Friday, December 17, 2021 5:31 PM
> To: Jianyong Wu <Jianyong.Wu@arm.com>
> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; will@kernel.org; Anshuman
> Khandual <Anshuman.Khandual@arm.com>; akpm@linux-foundation.org;
> david@redhat.com; quic_qiancai@quicinc.com; ardb@kernel.org; linux-
> kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> gshan@redhat.com; Justin He <Justin.He@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v3] arm64/mm: avoid fixmap race condition when create
> pud mapping
> 
> On Thu, Dec 16, 2021 at 04:28:12PM +0800, Jianyong Wu wrote:
> > The 'fixmap' is a global resource and is used recursively by create
> > pud mapping(), leading to a potential race condition in the presence
> > of a concurrent call to alloc_init_pud():
> >
> > kernel_init thread                          virtio-mem workqueue thread
> > ==================                          ===========================
> >
> >   alloc_init_pud(...)                       alloc_init_pud(...)
> >   pudp = pud_set_fixmap_offset(...)         pudp = pud_set_fixmap_offset(...)
> >   READ_ONCE(*pudp)
> >   pud_clear_fixmap(...)
> >                                             READ_ONCE(*pudp) // CRASH!
> >
> > As kernel may sleep during creating pud mapping, introduce a mutex
> > lock to serialise use of the fixmap entries by alloc_init_pud().
> >
> > Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
> 
> Since there were deadlock issues with the last version, it would be very nice
> if we could check this with at least:
> 
> * CONFIG_DEBUG_ATOMIC_SLEEP
> * CONFIG_PROVE_LOCKING
> 
> ... so that we can be reasonably certain that we're not introducing some
> livelock/deadlock scenario.
> 

I enable these 2 configs and test for the current patch. No warning related with this change found. 

> Are you able to reproduce the problem for testing, or was this found by
> inspection? Do you have any instructions for reproducing the problem? e.g.
> can this easily be tested with QEMU?
> 

I test it using Cloud Hypervisor not QEMU. I find the bug when I tested the incoming feature of virtio-mem using Cloud Hypervisor.
I think we can reproduce this bug using QEMU, but as there is no virtio-mem support for the current QEMU, we can only test the ACPI-based memory hotplug. However, I think it's not easy to do and I have not tried that.

In my test: firstly, start a VM and hotplug a large size of memory using virtio-mem and reboot or kexec a new kernel. When the kernel booting, memory hotplugged by virtio-mem will be added within kernel_init. As both of kernel init and memory add thread will update page table, "alloc_pud_init" may be executed concurrently. 

I think it's not easy to reproduce the bug using ACPI based memory hotplug, as we must hotplug memory at the same time of kernel_init to crash with it. 

> If you're able to reproduce the issue, it would be nice to have an example
> backtrace of when this goes wrong.
> 
Yes, this bug occurs when kernel init, the function execute flow is:
-------------------------
kernel_init
  kernel_init_freeable
    ...
      do_initcall
        ...
          module_init [A]

  ...
  mark_readonly
    mark_rodata_ro [B]
-------------------------
[A] can contains memory hotplug init therefore both [A] and [B] can
update page table at the same time and may lead to crash.

Thanks
Jianyong Wu

> Thanks,
> Mark.
> 
> > ---
> >
> > Change log:
> >
> > from v2 to v3:
> >      change spin lock to mutex lock as kernel may sleep when create
> > pud map.
> >
> >  arch/arm64/mm/mmu.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index
> > acfae9b41cc8..e680a6a8ca40 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -63,6 +63,7 @@ static pmd_t bm_pmd[PTRS_PER_PMD]
> __page_aligned_bss
> > __maybe_unused;  static pud_t bm_pud[PTRS_PER_PUD]
> __page_aligned_bss
> > __maybe_unused;
> >
> >  static DEFINE_SPINLOCK(swapper_pgdir_lock);
> > +static DEFINE_MUTEX(fixmap_lock);
> >
> >  void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)  { @@ -329,6 +330,11 @@
> > static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long
> end,
> >  	}
> >  	BUG_ON(p4d_bad(p4d));
> >
> > +	/*
> > +	 * We only have one fixmap entry per page-table level, so take
> > +	 * the fixmap lock until we're done.
> > +	 */
> > +	mutex_lock(&fixmap_lock);
> >  	pudp = pud_set_fixmap_offset(p4dp, addr);
> >  	do {
> >  		pud_t old_pud = READ_ONCE(*pudp);
> > @@ -359,6 +365,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned
> long addr, unsigned long end,
> >  	} while (pudp++, addr = next, addr != end);
> >
> >  	pud_clear_fixmap();
> > +	mutex_unlock(&fixmap_lock);
> >  }
> >
> >  static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> > --
> > 2.17.1
> >

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-12-17 10:09 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-16  8:28 [PATCH v3] arm64/mm: avoid fixmap race condition when create pud mapping Jianyong Wu
2021-12-16  8:28 ` Jianyong Wu
2021-12-16 15:19 ` David Hildenbrand
2021-12-16 15:19   ` David Hildenbrand
2021-12-17  9:30 ` Mark Rutland
2021-12-17  9:30   ` Mark Rutland
2021-12-17 10:09   ` Jianyong Wu [this message]
2021-12-17 10:09     ` Jianyong Wu
2022-01-05 18:03 ` Catalin Marinas
2022-01-05 18:03   ` Catalin Marinas
2022-01-06 10:13   ` Jianyong Wu
2022-01-06 10:13     ` Jianyong Wu
2022-01-06 15:56     ` Catalin Marinas
2022-01-06 15:56       ` Catalin Marinas
2022-01-07  9:10       ` Jianyong Wu
2022-01-07  9:10         ` Jianyong Wu
2022-01-07 10:42         ` Catalin Marinas
2022-01-07 10:42           ` Catalin Marinas
2022-01-26  4:20           ` Justin He
2022-01-26  4:20             ` Justin He
2022-01-26  8:36             ` Ard Biesheuvel
2022-01-26  8:36               ` Ard Biesheuvel
2022-01-26 10:09               ` Jianyong Wu
2022-01-26 10:09                 ` Jianyong Wu
2022-01-26 10:12                 ` Ard Biesheuvel
2022-01-26 10:12                   ` Ard Biesheuvel
2022-01-26 10:17                   ` David Hildenbrand
2022-01-26 10:17                     ` David Hildenbrand
2022-01-26 10:28                     ` Jianyong Wu
2022-01-26 10:28                       ` Jianyong Wu
2022-01-26 10:30                       ` David Hildenbrand
2022-01-26 10:30                         ` David Hildenbrand
2022-01-26 10:31                         ` David Hildenbrand
2022-01-26 10:31                           ` David Hildenbrand
2022-01-27  6:24                           ` Jianyong Wu
2022-01-27  6:24                             ` Jianyong Wu
2022-01-27 12:22                             ` David Hildenbrand
2022-01-27 12:22                               ` David Hildenbrand
2022-01-27 12:34                               ` Catalin Marinas
2022-01-27 12:34                                 ` Catalin Marinas
2022-01-31  8:13                                 ` Jianyong Wu
2022-01-31  8:13                                   ` Jianyong Wu
2022-01-31  8:10                               ` Jianyong Wu
2022-01-31  8:10                                 ` Jianyong Wu
2022-01-27  1:31               ` Justin He
2022-01-27  1:31                 ` Justin He
2022-01-07 10:53         ` Catalin Marinas
2022-01-07 10:53           ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AM9PR08MB727694AAAB8247FDF9AD73DAF4789@AM9PR08MB7276.eurprd08.prod.outlook.com \
    --to=jianyong.wu@arm.com \
    --cc=Anshuman.Khandual@arm.com \
    --cc=Catalin.Marinas@arm.com \
    --cc=Justin.He@arm.com \
    --cc=Mark.Rutland@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ardb@kernel.org \
    --cc=david@redhat.com \
    --cc=gshan@redhat.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nd@arm.com \
    --cc=quic_qiancai@quicinc.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.