Re: [PATCH v1] arm64/mm: avoid race condition of update page table when kernel init

From: Catalin Marinas <catalin.marinas@arm.com>
To: David Hildenbrand <david@redhat.com>
Cc: Jianyong Wu <Jianyong.Wu@arm.com>,
	Anshuman Khandual <Anshuman.Khandual@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	 "maz@kernel.org" <maz@kernel.org>,
	"ardb@kernel.org" <ardb@kernel.org>,
	"gshan@redhat.com" <gshan@redhat.com>,
	Justin He <Justin.He@arm.com>, nd <nd@arm.com>
Subject: Re: [PATCH v1] arm64/mm: avoid race condition of update page table when kernel init
Date: Mon, 6 Dec 2021 15:14:43 +0000	[thread overview]
Message-ID: <Ya4o4xa53YZ0SXls@arm.com> (raw)
In-Reply-To: <3c971e70-f8c7-4406-d098-74e92f3c7dc4@redhat.com>

On Fri, Dec 03, 2021 at 07:13:31PM +0100, David Hildenbrand wrote:
> On 03.12.21 18:44, Catalin Marinas wrote:
> > On Thu, Oct 28, 2021 at 08:36:07AM +0100, Jianyong Wu wrote:
> >> From Anshuman Khandual <anshuman.khandual@arm.com>:
> >>> On 10/27/21 3:18 PM, Jianyong Wu wrote:
> >>>> Race condition of page table update can happen in kernel boot period
> >>>> as both of memory hotplug action when kernel init and the
> >>>> mark_rodata_ro can update page table. For virtio-mem, the function excute flow chart is:
> >>>>
> >>>> -------------------------
> >>>> kernel_init
> >>>>   kernel_init_freeable
> >>>>     ...
> >>>>       do_initcall
> >>>>         ...
> >>>>           module_init [A]
> >>>>
> >>>>   ...
> >>>>   mark_readonly
> >>>>     mark_rodata_ro [B]
> >>>> -------------------------
> > [...]
> >>>> We can see that the error derived from the l3 translation as the pte
> >>>> value is *0*. That is because the fixmap has been clear when access.
> >>>>
> >>>> Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
> >>>> ---
> >>>>  arch/arm64/mm/mmu.c | 2 ++
> >>>>  1 file changed, 2 insertions(+)
> >>>>
> >>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index
> >>>> cfd9deb347c3..567dfba8f08a 100644
> >>>> --- a/arch/arm64/mm/mmu.c
> >>>> +++ b/arch/arm64/mm/mmu.c
> >>>> @@ -564,8 +564,10 @@ void mark_rodata_ro(void)
> >>>>      * to cover NOTES and EXCEPTION_TABLE.
> >>>>      */
> >>>>     section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata;
> >>>> +   get_online_mems();
> >>>>     update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
> >>>>                         section_size, PAGE_KERNEL_RO);
> >>>> +   put_online_mems();
> >>>>
> >>>>     debug_checkwx();
> >>>>  }
> >>>
> >>> While this should solve the current problem i.e race between concurrent
> >>> memory hotplug operation and mark_rodata_ro(), but I am still wondering
> >>> whether this is the fix at the right place and granularity. Basically a hotplug
> >>> operation queued in an work queue at [A] can execute during [B] is the root
> >>> cause of this problem.
> >>
> >> Not exactly, this issue doesn't only happen at the the *pure* kernel
> >> boot. For example, hotplug memory through VM monitor when VM boot. We
> >> can't foresee when that happen. Thus, this issue can affect all kinds
> >> of memory hotplug mechanism, including ACPI based memory hotplug and
> >> virtio-mem. I'm not sure that fix it here is the best way. If the race
> >> only happens between kernel init and memory hotplug, I think it's fine
> >> to fix it here. IMO, this issue results from the race for "fixmap"
> >> resource. I wonder why this global resource is not protected by a
> >> lock. Maybe we can add one and fix it there.
> > 
> > IIUC the race is caused by multiple attempts to use the fixmap at the
> > same time. We can add a fixmap_lock and hold it during
> > __create_pgd_mapping().
> 
> IIRC that's something along the lines I suggested, so, yes :)

Yeah, I just echoed what you and Anshuman said ;).

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel