From: Anshuman Khandual <anshuman.khandual@arm.com>
To: David Hildenbrand <david@redhat.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Cc: catalin.marinas@arm.com, will@kernel.org, ardb@kernel.org,
	"Mark Rutland" <mark.rutland@arm.com>,
	"James Morse" <james.morse@arm.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Mike Rapoport" <rppt@linux.ibm.com>
Subject: Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
Date: Thu, 28 Jan 2021 13:12:28 +0530	[thread overview]
Message-ID: <05db75d4-f5ec-4481-19fa-5fb622f97969@arm.com> (raw)
In-Reply-To: <8ad7d1d2-6d0a-1c3c-5c18-3d5b8ca5feb8@redhat.com>


On 1/27/21 2:59 PM, David Hildenbrand wrote:
> On 27.01.21 05:06, Anshuman Khandual wrote:
>>
>>
>> On 1/25/21 2:43 PM, David Hildenbrand wrote:
>>> On 25.01.21 07:22, Anshuman Khandual wrote:
>>>>
>>>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>>>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>>>>> the MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory
>>>>> is that it does not have memblock entries. Hence memblock_is_map_memory()
>>>>> will invariably fail via memblock_search() for a ZONE_DEVICE based address,
>>>>> which in turn makes pfn_valid() fail, which is wrong. memblock_is_map_memory()
>>>>> needs to be skipped for such memory ranges. As ZONE_DEVICE memory gets
>>>>> hotplugged into the system via memremap_pages() called from a driver, its
>>>>> memory sections will not have SECTION_IS_EARLY set.
>>>>>
>>>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in its memblock
>>>>> regions, because the MEMBLOCK_NOMAP flag was specifically designed and set
>>>>> for firmware reserved memory regions. memblock_is_map_memory() can simply be
>>>>> skipped for such memory, as it is always going to be positive, which is an
>>>>> optimization for normal hotplug memory. Like ZONE_DEVICE based memory, all
>>>>> hotplugged normal memory will also not have SECTION_IS_EARLY set for its
>>>>> sections.
>>>>>
>>>>> Skipping memblock_is_map_memory() for all non-early memory sections would
>>>>> fix the pfn_valid() problem for ZONE_DEVICE based memory and also improve
>>>>> its performance for normal hotplug memory.
>>>>>
>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>> Cc: Will Deacon <will@kernel.org>
>>>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>>>> Cc: linux-arm-kernel@lists.infradead.org
>>>>> Cc: linux-kernel@vger.kernel.org
>>>>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>>
>>>> Hello David/Mike,
>>>>
>>>> Given that we would need to rework early sections and memblock semantics via
>>>> a new config, i.e. EARLY_SECTION_MEMMAP_HOLES, along with possible changes to
>>>> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, I was wondering whether these
>>>> patches, which fix a problem (and improve performance), could be merged
>>>> first. After that, I could start working on the proposed rework. Could you
>>>> please let me know your thoughts on this? Thank you.
>>>
>>> As I said, we might have to throw in a pfn_section_valid() check, to
>>> catch non-section-aligned ZONE_DEVICE ranges (I assume this is possible
>>> on arm64 as well, no?).
>>
>> Should pfn_section_valid() be called only for !early_section(), i.e. normal
>> hotplug and ZONE_DEVICE memory? Early boot memory should always be
>> section aligned.
> 
> Well, at least on x86-64 that does not hold: you can have early sections intersect with ZONE_DEVICE memory.
> 
> E.g., have 64MB boot memory in a section. Later, we add ZONE_DEVICE memory which might cover the remaining 64MB. For pfn_valid() on x86-64, we always return "true" for such sections, because we always have the memmap for the whole early section allocated during boot. So, there it's "simple".

This is the generic pfn_valid() used on x86. As you mentioned, it does not
test pfn_section_valid() if the section is early, assuming that vmemmap
coverage is complete for early sections.

#ifndef CONFIG_HAVE_ARCH_PFN_VALID
static inline int pfn_valid(unsigned long pfn)
{
        struct mem_section *ms;

        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
        ms = __nr_to_section(pfn_to_section_nr(pfn));
        if (!valid_section(ms))
                return 0;
        /*
         * Traditionally early sections always returned pfn_valid() for
         * the entire section-sized span.
         */
        return early_section(ms) || pfn_section_valid(ms, pfn);
}
#endif

Looking at the code, it seems like early sections get initialized via
sparse_init() only at section granularity, but please correct me if that
is not the case.
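
For comparison, here is a minimal sketch of what the arm64 pfn_valid() could
look like with the direction being discussed here, i.e. consulting memblock
only for early sections and falling back to pfn_section_valid() otherwise.
This is only an illustration of the idea, not the exact hunk from the posted
patch:

#ifdef CONFIG_HAVE_ARCH_PFN_VALID
int pfn_valid(unsigned long pfn)
{
        phys_addr_t addr = PFN_PHYS(pfn);

        /* Reject pfns whose upper bits get lost in the phys_addr_t conversion */
        if (PHYS_PFN(addr) != pfn)
                return 0;

#ifdef CONFIG_SPARSEMEM
        {
                struct mem_section *ms;

                if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                        return 0;

                ms = __pfn_to_section(pfn);
                if (!valid_section(ms))
                        return 0;

                /*
                 * ZONE_DEVICE memory has no memblock entries and normal
                 * hotplugged memory never has MEMBLOCK_NOMAP set, so the
                 * memblock check is only meaningful for early sections.
                 * Non-early sections may be populated at subsection
                 * granularity, hence the pfn_section_valid() check.
                 */
                if (!early_section(ms))
                        return pfn_section_valid(ms, pfn);
        }
#endif
        return memblock_is_map_memory(addr);
}
#endif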

> 
> Now, arm64 seems to discard some parts of the vmemmap, so the remaining 64MB in such an early section might not have a memmap anymore? TBH, I don't know.

I did not get that. Could you please be more specific about how arm64
discards parts of the vmemmap?

> 
> Most probably only performing the check for
> !early_section() is sufficient on arm64, but I really can't tell as I don't know what we're actually discarding and if something as described for x86-64 is even possible on arm64.

It seems like direct users of arch_add_memory() and __add_pages(), such as
pagemap_range(), can cause subsection hotplug and vmemmap mappings. So
pfn_section_valid() should be applicable only for !early_section() memory.

A simple test on arm64 shows that both boot memory and traditional memory
hotplug get their entire subsection_map populated, but that might not
always be true for ZONE_DEVICE memory.
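
For reference, pfn_section_valid() boils down to a test against the
per-section subsection_map bitmap, roughly along these lines (paraphrased
from include/linux/mmzone.h; the in-tree version may differ in details):

static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
        /* Each bit in subsection_map covers one subsection of the section */
        int idx = subsection_map_index(pfn);

        return test_bit(idx, ms->usage->subsection_map);
}

So the check only passes for pfns whose subsection has actually been
populated, which is exactly what matters for subsection-sized ZONE_DEVICE
ranges.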

> 
> We should really try to take the magic out of arm64 vmemmap handling.

I would really like to understand more about this.

> 
>>
>>>
>>> Apart from that, I'm fine with a simple fix upfront that can be more
>>> easily backported if needed. (Q: do we need to? Is this stable material?)
>>>
>>
>> Right, an upfront fix here would help with backporting. AFAICS it should be
>> backported to stable, as pte_devmap and ZONE_DEVICE have been around
>> for some time now. Do you have a particular stable version that needs to
>> be tagged in the patch?
> 
> I haven't looked yet TBH. I guess it is broken since ZONE_DEVICE was enabled on arm64?
> 
Sure, will figure this out.
