From: David Hildenbrand <david@redhat.com>
To: Jianyong Wu <Jianyong.Wu@arm.com>, Ard Biesheuvel <ardb@kernel.org>
Cc: Justin He <Justin.He@arm.com>,
	Catalin Marinas <Catalin.Marinas@arm.com>,
	"will@kernel.org" <will@kernel.org>,
	Anshuman Khandual <Anshuman.Khandual@arm.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"quic_qiancai@quicinc.com" <quic_qiancai@quicinc.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"gshan@redhat.com" <gshan@redhat.com>, nd <nd@arm.com>
Subject: Re: [PATCH v3] arm64/mm: avoid fixmap race condition when create pud mapping
Date: Wed, 26 Jan 2022 11:30:52 +0100
Message-ID: <65fdd873-1f93-56e3-c7a5-98d621c5dbd8@redhat.com>
In-Reply-To: <AM9PR08MB72764111B775352448D75CD9F4209@AM9PR08MB7276.eurprd08.prod.outlook.com>

On 26.01.22 11:28, Jianyong Wu wrote:
> Hi David,
> 
>> -----Original Message-----
>> From: David Hildenbrand <david@redhat.com>
>> Sent: Wednesday, January 26, 2022 6:18 PM
>> To: Ard Biesheuvel <ardb@kernel.org>; Jianyong Wu
>> <Jianyong.Wu@arm.com>
>> Cc: Justin He <Justin.He@arm.com>; Catalin Marinas
>> <Catalin.Marinas@arm.com>; will@kernel.org; Anshuman Khandual
>> <Anshuman.Khandual@arm.com>; akpm@linux-foundation.org;
>> quic_qiancai@quicinc.com; linux-kernel@vger.kernel.org;
>> linux-arm-kernel@lists.infradead.org; gshan@redhat.com; nd <nd@arm.com>
>> Subject: Re: [PATCH v3] arm64/mm: avoid fixmap race condition when create
>> pud mapping
>>
>> On 26.01.22 11:12, Ard Biesheuvel wrote:
>>> On Wed, 26 Jan 2022 at 11:09, Jianyong Wu <Jianyong.Wu@arm.com> wrote:
>>>>
>>>> Hi Ard,
>>>>
>>>>> -----Original Message-----
>>>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>>> Sent: Wednesday, January 26, 2022 4:37 PM
>>>>> To: Justin He <Justin.He@arm.com>
>>>>> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Jianyong Wu
>>>>> <Jianyong.Wu@arm.com>; will@kernel.org; Anshuman Khandual
>>>>> <Anshuman.Khandual@arm.com>; akpm@linux-foundation.org;
>>>>> david@redhat.com; quic_qiancai@quicinc.com;
>>>>> linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>>>>> gshan@redhat.com; nd <nd@arm.com>
>>>>> Subject: Re: [PATCH v3] arm64/mm: avoid fixmap race condition when
>>>>> create pud mapping
>>>>>
>>>>> On Wed, 26 Jan 2022 at 05:21, Justin He <Justin.He@arm.com> wrote:
>>>>>>
>>>>>> Hi Catalin
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Catalin Marinas <catalin.marinas@arm.com>
>>>>>>> Sent: Friday, January 7, 2022 6:43 PM
>>>>>>> To: Jianyong Wu <Jianyong.Wu@arm.com>
>>>>>>> Cc: will@kernel.org; Anshuman Khandual <Anshuman.Khandual@arm.com>;
>>>>>>> akpm@linux-foundation.org; david@redhat.com;
>>>>>>> quic_qiancai@quicinc.com; ardb@kernel.org;
>>>>>>> linux-kernel@vger.kernel.org;
>>>>>>> linux-arm-kernel@lists.infradead.org; gshan@redhat.com; Justin He
>>>>>>> <Justin.He@arm.com>; nd <nd@arm.com>
>>>>>>> Subject: Re: [PATCH v3] arm64/mm: avoid fixmap race condition when
>>>>>>> create pud mapping
>>>>>>>
>>>>>>> On Fri, Jan 07, 2022 at 09:10:57AM +0000, Jianyong Wu wrote:
>>>>>>>> Hi Catalin,
>>>>>>>>
>>>>>>>> I have roughly found the root cause.
>>>>>>>> alloc_init_pud is called at the very beginning of kernel boot, in
>>>>>>>> create_mapping_noalloc, before any memory allocator is initialized.
>>>>>>>> But the lockdep checks may need to allocate memory, so the kernel
>>>>>>>> takes an exception when acquiring the lock. (I have not found the
>>>>>>>> exact code that causes this issue.) That is to say, we may not be
>>>>>>>> able to use a lock this early.
>>>>>>>>
>>>>>>>> I have come up with two ways to address it:
>>>>>>>> 1) Skip the deadlock check in the lockdep code at the very beginning
>>>>>>>>    of kernel boot.
>>>>>>>> 2) Provide two versions of __create_pgd_mapping, one that takes the
>>>>>>>>    lock and one that does not. There may be no possibility of a race
>>>>>>>>    in memory mapping at the very beginning of kernel boot, so we can
>>>>>>>>    safely use the no-lock version of __create_pgd_mapping there.
>>>>>>>> In my testing, the issue goes away if no lock is held in
>>>>>>>> create_mapping_noalloc. I think create_mapping_noalloc is called
>>>>>>>> early enough to avoid memory-mapping race conditions; however, I
>>>>>>>> have not proved it.
>>>>>>>
>>>>>>> I think method 2 would work better, but rather than implementing
>>>>>>> new nolock functions I'd add a NO_LOCK flag and check it in
>>>>>>> alloc_init_pud() before mutex_lock/unlock. Also add a comment when
>>>>>>> passing the NO_LOCK flag on why it's needed and why there wouldn't
>>>>>>> be any races at that stage (early boot etc.)
>>>>>>>
>>>>>> The problematic code path is:
>>>>>> __primary_switched
>>>>>>         early_fdt_map->fixmap_remap_fdt
>>>>>>                 create_mapping_noalloc->alloc_init_pud
>>>>>>                         mutex_lock (with Jianyong's patch)
>>>>>>
>>>>>> The problem seems to be that we clear the BSS segment twice if
>>>>>> KASLR is enabled. Hence, some of the static variables used by the
>>>>>> lockdep init process get messed up. That is to say, with KASLR enabled
>>>>>> we might initialize lockdep twice if we add mutex_lock/unlock in
>>>>>> alloc_init_pud().
>>>>>>
>>>>>
>>>>> Thanks for tracking that down.
>>>>>
>>>>> Note that clearing the BSS twice is not the root problem here. The
>>>>> root problem is that we set global state while the kernel runs at
>>>>> the default link time address, and then refer to it again after the
>>>>> entire kernel has been shifted in the kernel VA space. Such global
>>>>> state could consist of mutable pointers to statically allocated data
>>>>> (which would be reset to their default values after the relocation
>>>>> code runs again), or global pointer variables in BSS.
>>>>> In either case, relying on such a global variable after the second
>>>>> relocation performed by KASLR would be risky, and so we should avoid
>>>>> manipulating global state at all if it might involve pointers to
>>>>> statically allocated data structures.
>>>>>
>>>>>> In other words, invoking mutex_lock/unlock at such an early boot stage
>>>>>> might be unsafe, because lockdep inserts lock_acquire/release as
>>>>>> complex hooks.
>>>>>>
>>>>>> In summary, would it be better if Jianyong split the early boot and
>>>>>> late boot cases? E.g. introduce a nolock version of
>>>>>> create_mapping_noalloc().
>>>>>>
>>>>>> What do you think of it?
>>>>>>
>>>>>
>>>>> The pre-KASLR case definitely doesn't need a lock. But given that
>>>>> create_mapping_noalloc() is only used to map the FDT, which happens
>>>>> very early one way or the other, wouldn't it be better to move the
>>>>> lock/unlock into other callers of __create_pgd_mapping()? (and make
>>>>> sure no other users of the fixmap slots exist)
>>>>
>>>> There are several callers of __create_pgd_mapping. I think some of them
>>>> need no fixmap lock because they are called so early. I have listed all
>>>> of them here:
>>>> create_mapping_noalloc:  no lock
>>>> create_pgd_mapping:      no lock
>>>> __map_memblock:          no lock
>>>> map_kernel_segment:      no lock
>>>> map_entry_trampoline:    no lock
>>>> update_mapping_prot:     need lock
>>>> arch_add_memory:         need lock
>>>>
>>>> WDYT?
>>>>
>>>
>>> That seems reasonable, but it needs to be documented clearly in the code.
>>>
>>
>> Just a random thought, could we rely on system_state to do the locking
>> conditionally?
> 
> I can't see the point. At the early stages of kernel boot, we definitely need no lock. Also, I think we should keep it simple.
> 

Is e.g.,

if (system_state < SYSTEM_RUNNING)
	/* lock */

if (system_state < SYSTEM_RUNNING)
	/* unlock */

more complicated than checking individual users and eventually getting
it wrong?

> Thanks
> Jianyong


-- 
Thanks,

David / dhildenb

