* [RFC 0/2] arm64/mm: Fix pfn_valid() for ZONE_DEVIE based memory
@ 2020-12-22  7:12 ` Anshuman Khandual
  0 siblings, 0 replies; 32+ messages in thread
From: Anshuman Khandual @ 2020-12-22  7:12 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, ardb, Anshuman Khandual, Mark Rutland,
	James Morse, Robin Murphy, Jérôme Glisse, Dan Williams,
	David Hildenbrand

This series fixes pfn_valid() for ZONE_DEVICE based memory and also improves
its performance for normal hotplug memory. While here, it also reorganizes
pfn_valid() on CONFIG_SPARSEMEM. This series has been lightly tested on the
current mainline tree.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org

Anshuman Khandual (2):
  arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  arm64/mm: Reorganize pfn_valid()

 arch/arm64/mm/init.c | 46 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 5 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 32+ messages in thread
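
For context, the symptom the series addresses can be sketched with the
illustrative kernel snippet below. It is not part of the posted patches;
"device_pfn" is a placeholder for any pfn inside a range that a driver has
mapped with memremap_pages(), for example via pmem or device-dax.

#include <linux/memremap.h>
#include <linux/mm.h>
#include <linux/printk.h>

/*
 * Illustration only: device_pfn comes from a ZONE_DEVICE range created
 * elsewhere by a driver.
 */
static void show_zone_device_pfn_valid_bug(unsigned long device_pfn)
{
	struct page *page = pfn_to_page(device_pfn);

	/*
	 * The struct page exists and is initialized, yet pfn_valid()
	 * currently returns 0 on arm64 because the range has no
	 * memblock entry for memblock_is_map_memory() to find.
	 */
	pr_info("pfn_valid(%#lx) = %d, page = %px\n",
		device_pfn, pfn_valid(device_pfn), page);
}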

* [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2020-12-22  7:12 ` Anshuman Khandual
@ 2020-12-22  7:12   ` Anshuman Khandual
  -1 siblings, 0 replies; 32+ messages in thread
From: Anshuman Khandual @ 2020-12-22  7:12 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, ardb, Anshuman Khandual, Mark Rutland,
	James Morse, Robin Murphy, Jérôme Glisse, Dan Williams,
	David Hildenbrand

pfn_valid() validates a pfn but basically it checks for a valid struct page
backing for that pfn. It should always return positive for memory ranges
backed with struct page mapping. But currently pfn_valid() fails for all
ZONE_DEVICE based memory types even though they have struct page mapping.

pfn_valid() asserts that there is a memblock entry for a given pfn without
MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
that they do not have memblock entries. Hence memblock_is_map_memory() will
invariably fail via memblock_search() for a ZONE_DEVICE based address. This
eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
into the system via memremap_pages() called from a driver, their respective
memory sections will not have SECTION_IS_EARLY set.

Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
for firmware reserved memory regions. memblock_is_map_memory() can just be
skipped as its always going to be positive and that will be an optimization
for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
normal memory too will not have SECTION_IS_EARLY set for their sections.

Skipping memblock_is_map_memory() for all non early memory sections would
fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
performance for normal hotplug memory as well.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/mm/init.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 75addb36354a..ee23bda00c28 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
 
 	if (!valid_section(__pfn_to_section(pfn)))
 		return 0;
+
+	/*
+	 * ZONE_DEVICE memory does not have the memblock entries.
+	 * memblock_is_map_memory() check for ZONE_DEVICE based
+	 * addresses will always fail. Even the normal hotplugged
+	 * memory will never have MEMBLOCK_NOMAP flag set in their
+	 * memblock entries. Skip memblock search for all non early
+	 * memory sections covering all of hotplug memory including
+	 * both normal and ZONE_DEVIE based.
+	 */
+	if (!early_section(__pfn_to_section(pfn)))
+		return 1;
 #endif
 	return memblock_is_map_memory(addr);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread
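
To make the "no memblock entry" point concrete, here is a rough sketch of
how a driver creates ZONE_DEVICE memory. The address range, pagemap type
and lack of teardown are placeholders for illustration, not code from any
real driver.

#include <linux/err.h>
#include <linux/memremap.h>
#include <linux/numa.h>

/* Hypothetical 1GB device memory window. */
static struct dev_pagemap example_pgmap = {
	.type	  = MEMORY_DEVICE_GENERIC,
	.range	  = {
		.start	= 0x900000000ULL,
		.end	= 0x93fffffffULL,
	},
	.nr_range = 1,
};

static int example_map_device_memory(void)
{
	void *vaddr = memremap_pages(&example_pgmap, NUMA_NO_NODE);

	if (IS_ERR(vaddr))
		return PTR_ERR(vaddr);

	/*
	 * memremap_pages() created struct pages for this range but never
	 * added a memblock region. The new sections also do not have
	 * SECTION_IS_EARLY set, which is what the patch above keys off
	 * to skip the memblock_is_map_memory() check.
	 */
	return 0;
}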

* [RFC 2/2] arm64/mm: Reorganize pfn_valid()
  2020-12-22  7:12 ` Anshuman Khandual
@ 2020-12-22  7:12   ` Anshuman Khandual
  -1 siblings, 0 replies; 32+ messages in thread
From: Anshuman Khandual @ 2020-12-22  7:12 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, ardb, Anshuman Khandual, Mark Rutland,
	James Morse, Robin Murphy, Jérôme Glisse, Dan Williams,
	David Hildenbrand

pfn_valid() calls pfn_to_section_nr() and __pfn_to_section() multiple times
when CONFIG_SPARSEMEM is enabled. This can be avoided by fetching the memory
section once and reusing it. Hence split pfn_valid() into two separate
definitions depending on whether CONFIG_SPARSEMEM is enabled. Also replace
the open coded pfn <--> addr conversions with __[pfn|phys]_to_[phys|pfn]().
This does not cause any functional change.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/mm/init.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index ee23bda00c28..c3387798a3be 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -212,18 +212,25 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
 	free_area_init(max_zone_pfns);
 }
 
+#ifdef CONFIG_SPARSEMEM
 int pfn_valid(unsigned long pfn)
 {
-	phys_addr_t addr = pfn << PAGE_SHIFT;
+	struct mem_section *ms = __pfn_to_section(pfn);
+	phys_addr_t addr = __pfn_to_phys(pfn);
 
-	if ((addr >> PAGE_SHIFT) != pfn)
+	/*
+	 * Ensure the upper PAGE_SHIFT bits are clear in the
+	 * pfn. Else it might lead to false positives when
+	 * some of the upper bits are set, but the lower bits
+	 * match a valid pfn.
+	 */
+	if (__phys_to_pfn(addr) != pfn)
 		return 0;
 
-#ifdef CONFIG_SPARSEMEM
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
 
-	if (!valid_section(__pfn_to_section(pfn)))
+	if (!valid_section(ms))
 		return 0;
 
 	/*
@@ -235,11 +242,28 @@ int pfn_valid(unsigned long pfn)
 	 * memory sections covering all of hotplug memory including
 	 * both normal and ZONE_DEVIE based.
 	 */
-	if (!early_section(__pfn_to_section(pfn)))
+	if (!early_section(ms))
 		return 1;
-#endif
+
 	return memblock_is_map_memory(addr);
 }
+#else
+int pfn_valid(unsigned long pfn)
+{
+	phys_addr_t addr = __pfn_to_phys(pfn);
+
+	/*
+	 * Ensure the upper PAGE_SHIFT bits are clear in the
+	 * pfn. Else it might lead to false positives when
+	 * some of the upper bits are set, but the lower bits
+	 * match a valid pfn.
+	 */
+	if (__phys_to_pfn(addr) != pfn)
+		return 0;
+
+	return memblock_is_map_memory(addr);
+}
+#endif
 EXPORT_SYMBOL(pfn_valid);
 
 static phys_addr_t memory_limit = PHYS_ADDR_MAX;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread
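
The pfn round-trip check kept by both variants can be demonstrated with a
small standalone program. It is purely illustrative, assumes 4K pages, and
mirrors the __pfn_to_phys()/__phys_to_pfn() pair used in the patch.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT	12	/* assumed 4K pages */

/* Mirrors the check in pfn_valid(): shift to a physical address and back. */
static int pfn_round_trips(uint64_t pfn)
{
	uint64_t addr = pfn << PAGE_SHIFT;	/* __pfn_to_phys(pfn) */

	return (addr >> PAGE_SHIFT) == pfn;	/* __phys_to_pfn(addr) */
}

int main(void)
{
	uint64_t good_pfn = 0x80000;			/* 2 GiB with 4K pages */
	uint64_t bad_pfn  = good_pfn | (1ULL << 60);	/* upper bits set */

	/* The left shift drops bad_pfn's upper bits, so it cannot round trip. */
	printf("good pfn round trips: %d\n", pfn_round_trips(good_pfn));
	printf("bad pfn round trips:  %d\n", pfn_round_trips(bad_pfn));
	return 0;
}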

* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2020-12-22  7:12   ` Anshuman Khandual
@ 2020-12-22  9:11     ` David Hildenbrand
  -1 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand @ 2020-12-22  9:11 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, ardb, Mark Rutland, James Morse,
	Robin Murphy, Jérôme Glisse, Dan Williams,
	Mike Rapoport, Michael Ellerman

On 22.12.20 08:12, Anshuman Khandual wrote:
> pfn_valid() validates a pfn but basically it checks for a valid struct page
> backing for that pfn. It should always return positive for memory ranges
> backed with struct page mapping. But currently pfn_valid() fails for all
> ZONE_DEVICE based memory types even though they have struct page mapping.
> 
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
> 
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as its always going to be positive and that will be an optimization
> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
> normal memory too will not have SECTION_IS_EARLY set for their sections.
> 
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> performance for normal hotplug memory as well.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  arch/arm64/mm/init.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 75addb36354a..ee23bda00c28 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>  
>  	if (!valid_section(__pfn_to_section(pfn)))
>  		return 0;
> +
> +	/*
> +	 * ZONE_DEVICE memory does not have the memblock entries.
> +	 * memblock_is_map_memory() check for ZONE_DEVICE based
> +	 * addresses will always fail. Even the normal hotplugged
> +	 * memory will never have MEMBLOCK_NOMAP flag set in their
> +	 * memblock entries. Skip memblock search for all non early
> +	 * memory sections covering all of hotplug memory including
> +	 * both normal and ZONE_DEVIE based.
> +	 */
> +	if (!early_section(__pfn_to_section(pfn)))
> +		return 1;

Actually, I think we want to check for partial present sections.

Maybe we can rather switch to generic pfn_valid() and tweak it to
something like

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..7b1fcce5bd5a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
                return 0;
        /*
         * Traditionally early sections always returned pfn_valid() for
-        * the entire section-sized span.
+        * the entire section-sized span. Some archs might have holes in
+        * early sections, so double check with memblock if configured.
         */
-       return early_section(ms) || pfn_section_valid(ms, pfn);
+       if (early_section(ms))
+               return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
+                      memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
+       return pfn_section_valid(ms, pfn);
 }
 #endif



Which users are remaining that require us to add/remove memblocks when
hot(un)plugging memory

 $ git grep KEEP_MEM | grep memory_hotplug
mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {


I think one user we would have to handle is
arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
does not rely on memblock_is_map_memory.

-- 
Thanks,

David / dhildenb


^ permalink raw reply related	[flat|nested] 32+ messages in thread
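
For readability, the generic pfn_valid() in include/linux/mmzone.h would
read roughly as below with the tweak above applied. The unchanged lines are
reconstructed from the mmzone.h of that time, and
CONFIG_EARLY_SECTION_MEMMAP_HOLES is the hypothetical option proposed in
this mail, not an existing Kconfig symbol.

static inline int pfn_valid(unsigned long pfn)
{
	struct mem_section *ms;

	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;
	ms = __nr_to_section(pfn_to_section_nr(pfn));
	if (!valid_section(ms))
		return 0;
	/*
	 * Traditionally early sections always returned pfn_valid() for
	 * the entire section-sized span. Some archs might have holes in
	 * early sections, so double check with memblock if configured.
	 */
	if (early_section(ms))
		return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
		       memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
	return pfn_section_valid(ms, pfn);
}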

* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2020-12-22  9:11     ` David Hildenbrand
@ 2021-01-04  6:18       ` Anshuman Khandual
  -1 siblings, 0 replies; 32+ messages in thread
From: Anshuman Khandual @ 2021-01-04  6:18 UTC (permalink / raw)
  To: David Hildenbrand, linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, ardb, Mark Rutland, James Morse,
	Robin Murphy, Jérôme Glisse, Dan Williams,
	Mike Rapoport, Michael Ellerman


On 12/22/20 2:41 PM, David Hildenbrand wrote:
> On 22.12.20 08:12, Anshuman Khandual wrote:
>> pfn_valid() validates a pfn but basically it checks for a valid struct page
>> backing for that pfn. It should always return positive for memory ranges
>> backed with struct page mapping. But currently pfn_valid() fails for all
>> ZONE_DEVICE based memory types even though they have struct page mapping.
>>
>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>> into the system via memremap_pages() called from a driver, their respective
>> memory sections will not have SECTION_IS_EARLY set.
>>
>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>> skipped as its always going to be positive and that will be an optimization
>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>
>> Skipping memblock_is_map_memory() for all non early memory sections would
>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>> performance for normal hotplug memory as well.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Ard Biesheuvel <ardb@kernel.org>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  arch/arm64/mm/init.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 75addb36354a..ee23bda00c28 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>>  
>>  	if (!valid_section(__pfn_to_section(pfn)))
>>  		return 0;
>> +
>> +	/*
>> +	 * ZONE_DEVICE memory does not have the memblock entries.
>> +	 * memblock_is_map_memory() check for ZONE_DEVICE based
>> +	 * addresses will always fail. Even the normal hotplugged
>> +	 * memory will never have MEMBLOCK_NOMAP flag set in their
>> +	 * memblock entries. Skip memblock search for all non early
>> +	 * memory sections covering all of hotplug memory including
>> +	 * both normal and ZONE_DEVIE based.
>> +	 */
>> +	if (!early_section(__pfn_to_section(pfn)))
>> +		return 1;
> 
> Actually, I think we want to check for partial present sections.
> 
> Maybe we can rather switch to generic pfn_valid() and tweak it to
> something like
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fb3bf696c05e..7b1fcce5bd5a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
>                 return 0;
>         /*
>          * Traditionally early sections always returned pfn_valid() for
> -        * the entire section-sized span.
> +        * the entire section-sized span. Some archs might have holes in
> +        * early sections, so double check with memblock if configured.
>          */
> -       return early_section(ms) || pfn_section_valid(ms, pfn);
> +       if (early_section(ms))
> +               return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
> +                      memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
> +       return pfn_section_valid(ms, pfn);
>  }
>  #endif

Could not find CONFIG_EARLY_SECTION_MEMMAP_HOLES. Are you suggesting to
create this config which could track platform scenarios where all early
sections might not have mmap coverage such as arm64 ?

> 
> 
> 
> Which users are remaining that require us to add/remove memblocks when
> hot(un)plugging memory
> 
>  $ git grep KEEP_MEM | grep memory_hotplug
> mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {

Did not follow, do we want to drop ARCH_KEEP_MEMBLOCK ? Without it arm64
will not be able to track MEMBLOCK_NOMAP memory at runtime.

> 
> 
> I think one user we would have to handle is
> arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
> does not rely on memblock_is_map_memory.

memblock_is_map_memory() is currently used only on arm/arm64 platforms.
Apart from the above example in valid_phys_addr_range(), there are some
other memblock_is_map_memory() call sites as well. But then, we are not
trying to completely drop memblock_is_map_memory() or are we ?

^ permalink raw reply	[flat|nested] 32+ messages in thread
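
For reference, the /dev/mem user mentioned above checks both that the range
is memory known to memblock and that it is mapped. The function below is a
paraphrase of arch/arm64/mm/mmap.c from that era, so treat the details as
illustrative rather than authoritative.

/* Roughly what arm64's valid_phys_addr_range() does. */
int valid_phys_addr_range(phys_addr_t addr, size_t size)
{
	/*
	 * The whole range must be covered by memory known to memblock,
	 * and the start must not fall in a MEMBLOCK_NOMAP (firmware
	 * reserved) region, i.e. it must be in the linear map.
	 */
	return memblock_is_region_memory(addr, size) &&
	       memblock_is_map_memory(addr);
}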

* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2020-12-22  7:12   ` Anshuman Khandual
@ 2021-01-04 15:36     ` Jonathan Cameron
  -1 siblings, 0 replies; 32+ messages in thread
From: Jonathan Cameron @ 2021-01-04 15:36 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, linux-kernel, Mark Rutland, catalin.marinas,
	David Hildenbrand, Robin Murphy, Jérôme Glisse,
	James Morse, Dan Williams, will, ardb

On Tue, 22 Dec 2020 12:42:23 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:

> pfn_valid() validates a pfn but basically it checks for a valid struct page
> backing for that pfn. It should always return positive for memory ranges
> backed with struct page mapping. But currently pfn_valid() fails for all
> ZONE_DEVICE based memory types even though they have struct page mapping.
> 
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
> 
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as its always going to be positive and that will be an optimization
> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged

typo: ZONE_DEVIE

> normal memory too will not have SECTION_IS_EARLY set for their sections.
> 
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> performance for normal hotplug memory as well.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  arch/arm64/mm/init.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 75addb36354a..ee23bda00c28 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>  
>  	if (!valid_section(__pfn_to_section(pfn)))
>  		return 0;
> +
> +	/*
> +	 * ZONE_DEVICE memory does not have the memblock entries.
> +	 * memblock_is_map_memory() check for ZONE_DEVICE based
> +	 * addresses will always fail. Even the normal hotplugged
> +	 * memory will never have MEMBLOCK_NOMAP flag set in their
> +	 * memblock entries. Skip memblock search for all non early
> +	 * memory sections covering all of hotplug memory including
> +	 * both normal and ZONE_DEVIE based.

Here as well + the cover letter title.

> +	 */
> +	if (!early_section(__pfn_to_section(pfn)))
> +		return 1;
>  #endif
>  	return memblock_is_map_memory(addr);
>  }


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2021-01-04 15:36     ` Jonathan Cameron
@ 2021-01-05  3:25       ` Anshuman Khandual
  -1 siblings, 0 replies; 32+ messages in thread
From: Anshuman Khandual @ 2021-01-05  3:25 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-arm-kernel, linux-kernel, Mark Rutland, catalin.marinas,
	David Hildenbrand, Robin Murphy, Jérôme Glisse,
	James Morse, Dan Williams, will, ardb


On 1/4/21 9:06 PM, Jonathan Cameron wrote:
> On Tue, 22 Dec 2020 12:42:23 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
>> pfn_valid() validates a pfn but basically it checks for a valid struct page
>> backing for that pfn. It should always return positive for memory ranges
>> backed with struct page mapping. But currently pfn_valid() fails for all
>> ZONE_DEVICE based memory types even though they have struct page mapping.
>>
>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>> into the system via memremap_pages() called from a driver, their respective
>> memory sections will not have SECTION_IS_EARLY set.
>>
>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>> skipped as its always going to be positive and that will be an optimization
>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
> 
> typo: ZONE_DEVIE
> 
>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>
>> Skipping memblock_is_map_memory() for all non early memory sections would
>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>> performance for normal hotplug memory as well.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Ard Biesheuvel <ardb@kernel.org>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  arch/arm64/mm/init.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 75addb36354a..ee23bda00c28 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>>  
>>  	if (!valid_section(__pfn_to_section(pfn)))
>>  		return 0;
>> +
>> +	/*
>> +	 * ZONE_DEVICE memory does not have the memblock entries.
>> +	 * memblock_is_map_memory() check for ZONE_DEVICE based
>> +	 * addresses will always fail. Even the normal hotplugged
>> +	 * memory will never have MEMBLOCK_NOMAP flag set in their
>> +	 * memblock entries. Skip memblock search for all non early
>> +	 * memory sections covering all of hotplug memory including
>> +	 * both normal and ZONE_DEVIE based.
> 
> Here as well + the cover letter title.

My bad, will fix all three instances.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2021-01-04  6:18       ` Anshuman Khandual
@ 2021-01-11 10:31         ` David Hildenbrand
  -1 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand @ 2021-01-11 10:31 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, ardb, Mark Rutland, James Morse,
	Robin Murphy, Jérôme Glisse, Dan Williams,
	Mike Rapoport, Michael Ellerman

On 04.01.21 07:18, Anshuman Khandual wrote:
> 
> On 12/22/20 2:41 PM, David Hildenbrand wrote:
>> On 22.12.20 08:12, Anshuman Khandual wrote:
>>> pfn_valid() validates a pfn but basically it checks for a valid struct page
>>> backing for that pfn. It should always return positive for memory ranges
>>> backed with struct page mapping. But currently pfn_valid() fails for all
>>> ZONE_DEVICE based memory types even though they have struct page mapping.
>>>
>>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>>> into the system via memremap_pages() called from a driver, their respective
>>> memory sections will not have SECTION_IS_EARLY set.
>>>
>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>>> skipped as its always going to be positive and that will be an optimization
>>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
>>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>>
>>> Skipping memblock_is_map_memory() for all non early memory sections would
>>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>>> performance for normal hotplug memory as well.
>>>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>> ---
>>>  arch/arm64/mm/init.c | 12 ++++++++++++
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 75addb36354a..ee23bda00c28 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -225,6 +225,18 @@ int pfn_valid(unsigned long pfn)
>>>  
>>>  	if (!valid_section(__pfn_to_section(pfn)))
>>>  		return 0;
>>> +
>>> +	/*
>>> +	 * ZONE_DEVICE memory does not have the memblock entries.
>>> +	 * memblock_is_map_memory() check for ZONE_DEVICE based
>>> +	 * addresses will always fail. Even the normal hotplugged
>>> +	 * memory will never have MEMBLOCK_NOMAP flag set in their
>>> +	 * memblock entries. Skip memblock search for all non early
>>> +	 * memory sections covering all of hotplug memory including
>>> +	 * both normal and ZONE_DEVIE based.
>>> +	 */
>>> +	if (!early_section(__pfn_to_section(pfn)))
>>> +		return 1;
>>
>> Actually, I think we want to check for partial present sections.
>>
>> Maybe we can rather switch to generic pfn_valid() and tweak it to
>> something like
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index fb3bf696c05e..7b1fcce5bd5a 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
>>                 return 0;
>>         /*
>>          * Traditionally early sections always returned pfn_valid() for
>> -        * the entire section-sized span.
>> +        * the entire section-sized span. Some archs might have holes in
>> +        * early sections, so double check with memblock if configured.
>>          */
>> -       return early_section(ms) || pfn_section_valid(ms, pfn);
>> +       if (early_section(ms))
>> +               return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
>> +                      memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
>> +       return pfn_section_valid(ms, pfn);
>>  }
>>  #endif
> 
> Could not find CONFIG_EARLY_SECTION_MEMMAP_HOLES. Are you suggesting to
> create this config which could track platform scenarios where all early
> sections might not have mmap coverage such as arm64 ?

Yes, a new config that states what's actually happening.

> 
>>
>>
>>
>> Which users are remaining that require us to add/remove memblocks when
>> hot(un)plugging memory
>>
>>  $ git grep KEEP_MEM | grep memory_hotplug
>> mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
>> mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
>> mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
> 
> Did not follow, do we want to drop ARCH_KEEP_MEMBLOCK ? Without it arm64
> will not be able to track MEMBLOCK_NOMAP memory at runtime.

I'd only like the hot(un)plug parts gone for now, if possible: I don't
really see a need there that could not be handled more easily, as
in the proposed pfn_valid() changes.

I understand that current handling of memory holes in early sections and
memory marked as MEMBLOCK_NOMAP requires ARCH_KEEP_MEMBLOCK for now.

> 
>>
>>
>> I think one user we would have to handle is
>> arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
>> does not rely on memblock_is_map_memory.
> 
> memblock_is_map_memory() is currently used only on arm/arm64 platforms.
> Apart from the above example in valid_phys_addr_range(), there are some
> other memblock_is_map_memory() call sites as well. But then, we are not
> trying to completely drop memblock_is_map_memory() or are we ?

No, just change the semantics: only relevant for early sections. Imagine
freezing MEMBLOCK state after boot.

Only early sections can have memory holes and might be marked
MEMBLOCK_NOMAP. For hotplugged memory, we don't have to call
memblock_is_map_memory().
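
For illustration, a minimal sketch of what arm64's pfn_valid() could look
like with that semantic, i.e. memblock consulted only for early (boot)
sections. This is not one of the posted patches, only a sketch against
arch/arm64/mm/init.c assuming CONFIG_SPARSEMEM and the existing helpers:

int pfn_valid(unsigned long pfn)
{
        phys_addr_t addr = PFN_PHYS(pfn);
        struct mem_section *ms;

        /* Reject pfns whose physical address does not round-trip. */
        if (PHYS_PFN(addr) != pfn)
                return 0;

        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;

        ms = __pfn_to_section(pfn);
        if (!valid_section(ms))
                return 0;

        /*
         * Hotplugged and ZONE_DEVICE sections have no memblock entries;
         * the subsection map is authoritative for them.
         */
        if (!early_section(ms))
                return pfn_section_valid(ms, pfn);

        /*
         * Early sections may contain memmap holes and MEMBLOCK_NOMAP
         * ranges, so memblock stays authoritative for those.
         */
        return memblock_is_map_memory(addr);
}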

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2021-01-11 10:31         ` David Hildenbrand
@ 2021-01-11 14:00           ` Mike Rapoport
  -1 siblings, 0 replies; 32+ messages in thread
From: Mike Rapoport @ 2021-01-11 14:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, linux-arm-kernel, linux-kernel,
	catalin.marinas, will, ardb, Mark Rutland, James Morse,
	Robin Murphy, Jérôme Glisse, Dan Williams,
	Michael Ellerman

On Mon, Jan 11, 2021 at 11:31:02AM +0100, David Hildenbrand wrote:
> On 04.01.21 07:18, Anshuman Khandual wrote:

...

> >> Actually, I think we want to check for partial present sections.
> >>
> >> Maybe we can rather switch to generic pfn_valid() and tweak it to
> >> something like
> >>
> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index fb3bf696c05e..7b1fcce5bd5a 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -1382,9 +1382,13 @@ static inline int pfn_valid(unsigned long pfn)
> >>                 return 0;
> >>         /*
> >>          * Traditionally early sections always returned pfn_valid() for
> >> -        * the entire section-sized span.
> >> +        * the entire section-sized span. Some archs might have holes in
> >> +        * early sections, so double check with memblock if configured.
> >>          */
> >> -       return early_section(ms) || pfn_section_valid(ms, pfn);
> >> +       if (early_section(ms))
> >> +               return IS_ENABLED(CONFIG_EARLY_SECTION_MEMMAP_HOLES) ?
> >> +                      memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
> >> +       return pfn_section_valid(ms, pfn);
> >>  }
> >>  #endif
> > 
> > Could not find CONFIG_EARLY_SECTION_MEMMAP_HOLES. Are you suggesting to
> > create this config which could track platform scenarios where all early
> > sections might not have mmap coverage such as arm64 ?
> 
> Yes, a new config that states what's actually happening.

Just to clarify, there are several architectures that may free parts of
the memmap with FLATMEM/SPARSEMEM, and this functionality is gated by
CONFIG_HAVE_ARCH_PFN_VALID.

So if arm64 is to use a generic version, this should also be taken into
account.

> >> Which users are remaining that require us to add/remove memblocks when
> >> hot(un)plugging memory
> >>
> >>  $ git grep KEEP_MEM | grep memory_hotplug
> >> mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> >> mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
> >> mm/memory_hotplug.c:    if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
> > 
> > Did not follow, do we want to drop ARCH_KEEP_MEMBLOCK ? Without it arm64
> > will not be able to track MEMBLOCK_NOMAP memory at runtime.
> 
> I'd only like the hot(un)plug parts gone for now, if possible: I don't
> see the need for that handling really that cannot be handled easier, as
> in the proposed pfn_valid() changes.
> 
> I understand that current handling of memory holes in early sections and
> memory marked as MEMBLOCK_NOMAP requires ARCH_KEEP_MEMBLOCK for now.
 
This is mostly about the holes in the memory map. I believe it does not
matter whether that memory is NOMAP or not; the holes in the memmap are
punched for anything in memblock.memory.

It seems to me that for arm64's pfn_valid() we can safely replace
memblock_is_map_memory() with memblock_is_memory(), not that it would
change much.
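
For illustration, that substitution would be a one-line change at the tail
of arm64's pfn_valid(), sketched below against the current code (assuming
the existing local variable addr = pfn << PAGE_SHIFT; a sketch only, not a
posted patch):

-	return memblock_is_map_memory(addr);
+	return memblock_is_memory(addr);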

> >> I think one user we would have to handle is
> >> arch/arm64/mm/mmap.c:valid_phys_addr_range(). AFAIS, powerpc at least
> >> does not rely on memblock_is_map_memory.
> > 
> > memblock_is_map_memory() is currently used only on arm/arm64 platforms.
> > Apart from the above example in valid_phys_addr_range(), there are some
> > other memblock_is_map_memory() call sites as well. But then, we are not
> > trying to completely drop memblock_is_map_memory() or are we ?
> 
> No, just change the semantics: only relevant for early sections. Imagine
> freezing MEMBLOCK state after boot.
> 
> Only early sections can have memory holes and might be marked
> MEMBLOCK_NOMAP. For hotplugged memory, we don't have to call
> memblock_is_map_memory().
>
> -- 
> Thanks,
> 
> David / dhildenb

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2020-12-22  7:12   ` Anshuman Khandual
@ 2021-01-25  6:22     ` Anshuman Khandual
  -1 siblings, 0 replies; 32+ messages in thread
From: Anshuman Khandual @ 2021-01-25  6:22 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, ardb, Mark Rutland, James Morse,
	Robin Murphy, Jérôme Glisse, Dan Williams,
	David Hildenbrand, Mike Rapoport


On 12/22/20 12:42 PM, Anshuman Khandual wrote:
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
> 
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as it's always going to be positive and that will be an optimization
> for the normal hotplug memory. Like ZONE_DEVICE based memory, all hotplugged
> normal memory too will not have SECTION_IS_EARLY set for their sections.
> 
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> performance for normal hotplug memory as well.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>

Hello David/Mike,

Given that we would need to rework early section and memblock semantics via a
new config, i.e. EARLY_SECTION_MEMMAP_HOLES, and also make some possible changes
to ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, I am wondering whether these
patches, which fix a problem (and improve performance), could be merged first.
After that, I could start working on the proposed rework. Could you please let
me know your thoughts on this? Thank you.

- Anshuman

^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2021-01-25  6:22     ` Anshuman Khandual
@ 2021-01-25  7:31       ` Mike Rapoport
  -1 siblings, 0 replies; 32+ messages in thread
From: Mike Rapoport @ 2021-01-25  7:31 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, will, ardb,
	Mark Rutland, James Morse, Robin Murphy, Jérôme Glisse,
	Dan Williams, David Hildenbrand

On Mon, Jan 25, 2021 at 11:52:32AM +0530, Anshuman Khandual wrote:
> 
> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
> > pfn_valid() asserts that there is a memblock entry for a given pfn without
> > MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> > that they do not have memblock entries. Hence memblock_is_map_memory() will
> > invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> > eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> > to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> > into the system via memremap_pages() called from a driver, their respective
> > memory sections will not have SECTION_IS_EARLY set.
> > 
> > Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
> > regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
> > for firmware reserved memory regions. memblock_is_map_memory() can just be
> > skipped as it's always going to be positive and that will be an optimization
> > for the normal hotplug memory. Like ZONE_DEVICE based memory, all hotplugged
> > normal memory too will not have SECTION_IS_EARLY set for their sections.
> > 
> > Skipping memblock_is_map_memory() for all non early memory sections would
> > fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
> > performance for normal hotplug memory as well.
> > 
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Ard Biesheuvel <ardb@kernel.org>
> > Cc: Robin Murphy <robin.murphy@arm.com>
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> 
> Hello David/Mike,
> 
> Given that we would need to rework early sections, memblock semantics via a
> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
> which fixes a problem (and improves performance) can be merged first. After
> that, I could start working on the proposed rework. Could you please let me
> know your thoughts on this. Thank you.

I didn't object to these patches, I think they are fine.
I agree that we can look into updating arm64's pfn_valid(), maybe right
after the decrease of the section size lands.
 
> - Anshuman

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2021-01-25  6:22     ` Anshuman Khandual
@ 2021-01-25  9:13       ` David Hildenbrand
  -1 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand @ 2021-01-25  9:13 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, ardb, Mark Rutland, James Morse,
	Robin Murphy, Jérôme Glisse, Dan Williams,
	Mike Rapoport

On 25.01.21 07:22, Anshuman Khandual wrote:
> 
> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>> into the system via memremap_pages() called from a driver, their respective
>> memory sections will not have SECTION_IS_EARLY set.
>>
>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>> skipped as it's always going to be positive and that will be an optimization
>> for the normal hotplug memory. Like ZONE_DEVICE based memory, all hotplugged
>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>
>> Skipping memblock_is_map_memory() for all non early memory sections would
>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>> performance for normal hotplug memory as well.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Ard Biesheuvel <ardb@kernel.org>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> 
> Hello David/Mike,
> 
> Given that we would need to rework early sections, memblock semantics via a
> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
> which fixes a problem (and improves performance) can be merged first. After
> that, I could start working on the proposed rework. Could you please let me
> know your thoughts on this. Thank you.

As I said, we might have to throw in a pfn_section_valid() check, to
catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
on arm64 as well, no?).
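
For illustration, one way that check could be folded into the posted arm64
hunk; a sketch only (with __pfn_to_section() simply called twice), not what
was actually posted:

-	if (!early_section(__pfn_to_section(pfn)))
-		return 1;
+	if (!early_section(__pfn_to_section(pfn)))
+		return pfn_section_valid(__pfn_to_section(pfn), pfn);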

Apart from that, I'm fine with a simple fix upfront, that can be more
easily backported if needed. (Q: do we? is this stable material?)

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2021-01-25  7:31       ` Mike Rapoport
@ 2021-01-27  3:46         ` Anshuman Khandual
  -1 siblings, 0 replies; 32+ messages in thread
From: Anshuman Khandual @ 2021-01-27  3:46 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, will, ardb,
	Mark Rutland, James Morse, Robin Murphy, Jérôme Glisse,
	Dan Williams, David Hildenbrand



On 1/25/21 1:01 PM, Mike Rapoport wrote:
> On Mon, Jan 25, 2021 at 11:52:32AM +0530, Anshuman Khandual wrote:
>>
>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>>> into the system via memremap_pages() called from a driver, their respective
>>> memory sections will not have SECTION_IS_EARLY set.
>>>
>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>>> skipped as it's always going to be positive and that will be an optimization
>>> for the normal hotplug memory. Like ZONE_DEVICE based memory, all hotplugged
>>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>>
>>> Skipping memblock_is_map_memory() for all non early memory sections would
>>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>>> performance for normal hotplug memory as well.
>>>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>
>> Hello David/Mike,
>>
>> Given that we would need to rework early sections, memblock semantics via a
>> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
>> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
>> which fixes a problem (and improves performance) can be merged first. After
>> that, I could start working on the proposed rework. Could you please let me
>> know your thoughts on this. Thank you.
> 
> I didn't object to these patches, I think they are fine.
> I agree that we can look into update of arm64's pfn_valid(), maybe right
> after decrease of section size lands in.

Sure, will drop the RFC tag and prepare these patches.

^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2021-01-25  9:13       ` David Hildenbrand
@ 2021-01-27  4:06         ` Anshuman Khandual
  -1 siblings, 0 replies; 32+ messages in thread
From: Anshuman Khandual @ 2021-01-27  4:06 UTC (permalink / raw)
  To: David Hildenbrand, linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, ardb, Mark Rutland, James Morse,
	Robin Murphy, Jérôme Glisse, Dan Williams,
	Mike Rapoport



On 1/25/21 2:43 PM, David Hildenbrand wrote:
> On 25.01.21 07:22, Anshuman Khandual wrote:
>>
>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>>> into the system via memremap_pages() called from a driver, their respective
>>> memory sections will not have SECTION_IS_EARLY set.
>>>
>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>>> skipped as it's always going to be positive and that will be an optimization
>>> for the normal hotplug memory. Like ZONE_DEVICE based memory, all hotplugged
>>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>>
>>> Skipping memblock_is_map_memory() for all non early memory sections would
>>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>>> performance for normal hotplug memory as well.
>>>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>
>> Hello David/Mike,
>>
>> Given that we would need to rework early sections, memblock semantics via a
>> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
>> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
>> which fixes a problem (and improves performance) can be merged first. After
>> that, I could start working on the proposed rework. Could you please let me
>> know your thoughts on this. Thank you.
> 
> As I said, we might have to throw in an pfn_section_valid() check, to
> catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
> on arm64 as well, no?).

pfn_section_valid() should be called only for !early_section(), i.e. normal
hotplug and ZONE_DEVICE memory? Because early boot memory should always
be section aligned.

> 
> Apart from that, I'm fine with a simple fix upfront, that can be more
> easily backported if needed. (Q: do we? is this stable material?)
> 

Right, an upfront fix here would help with backporting. AFAICS it should be
backported to stable, as pte_devmap and ZONE_DEVICE have been around
for some time now. Do you have a particular stable version which needs to
be tagged in the patch?

^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2021-01-27  4:06         ` Anshuman Khandual
@ 2021-01-27  9:29           ` David Hildenbrand
  -1 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand @ 2021-01-27  9:29 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, ardb, Mark Rutland, James Morse,
	Robin Murphy, Jérôme Glisse, Dan Williams,
	Mike Rapoport

On 27.01.21 05:06, Anshuman Khandual wrote:
> 
> 
> On 1/25/21 2:43 PM, David Hildenbrand wrote:
>> On 25.01.21 07:22, Anshuman Khandual wrote:
>>>
>>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>>>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>>>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>>>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>>>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>>>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>>>> into the system via memremap_pages() called from a driver, their respective
>>>> memory sections will not have SECTION_IS_EARLY set.
>>>>
>>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>>>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>>>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>>>> skipped as it's always going to be positive and that will be an optimization
>>>> for the normal hotplug memory. Like ZONE_DEVICE based memory, all hotplugged
>>>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>>>
>>>> Skipping memblock_is_map_memory() for all non early memory sections would
>>>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>>>> performance for normal hotplug memory as well.
>>>>
>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>> Cc: Will Deacon <will@kernel.org>
>>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>>> Cc: linux-arm-kernel@lists.infradead.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>
>>> Hello David/Mike,
>>>
>>> Given that we would need to rework early sections, memblock semantics via a
>>> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
>>> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
>>> which fixes a problem (and improves performance) can be merged first. After
>>> that, I could start working on the proposed rework. Could you please let me
>>> know your thoughts on this. Thank you.
>>
>> As I said, we might have to throw in an pfn_section_valid() check, to
>> catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
>> on arm64 as well, no?).
> 
> pfn_section_valid() should be called only for !early_section() i.e normal
> hotplug and ZONE_DEVICE memory ? Because early boot memory should always
> be section aligned.

Well, not necessarily: at least on x86-64 you can have early sections
intersect with ZONE_DEVICE memory.

E.g., have 64MB boot memory in a section. Later, we add ZONE_DEVICE 
memory which might cover the remaining 64MB. For pfn_valid() on x86-64, 
we always return "true" for such sections, because we always have the 
memmap for the whole early section allocated during boot. So, there it's 
"simple".

Now, arm64 seems to discard some parts of the vmemmap, so the remaining 
64MB in such an early section might not have a memmap anymore? TBH, I 
don't know.

Most probably only performing the check for
!early_section() is sufficient on arm64, but I really can't tell as I 
don't know what we're actually discarding and if something as described 
for x86-64 is even possible on arm64.

We should really try to take the magic out of arm64 vmemmap handling.

> 
>>
>> Apart from that, I'm fine with a simple fix upfront, that can be more
>> easily backported if needed. (Q: do we? is this stable material?)
>>
> 
> Right, an upfront fix here would help in backporting. AFAICS it should be
> backported to the stable as pte_devmap and ZONE_DEVICE have been around
> for some time now. Do you have a particular stable version which needs to
> be tagged in the patch ?

I haven't looked yet TBH. I guess it is broken since ZONE_DEVICE was 
enabled on arm64?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
@ 2021-01-27  9:29           ` David Hildenbrand
  0 siblings, 0 replies; 32+ messages in thread
From: David Hildenbrand @ 2021-01-27  9:29 UTC (permalink / raw)
  To: Anshuman Khandual, linux-arm-kernel, linux-kernel
  Cc: Mark Rutland, will, catalin.marinas, Mike Rapoport,
	Jérôme Glisse, James Morse, Dan Williams, Robin Murphy,
	ardb

On 27.01.21 05:06, Anshuman Khandual wrote:
> 
> 
> On 1/25/21 2:43 PM, David Hildenbrand wrote:
>> On 25.01.21 07:22, Anshuman Khandual wrote:
>>>
>>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>>>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>>>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>>>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>>>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>>>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>>>> into the system via memremap_pages() called from a driver, their respective
>>>> memory sections will not have SECTION_IS_EARLY set.
>>>>
>>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in their memblock
>>>> regions. Because the flag MEMBLOCK_NOMAP was specifically designed and set
>>>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>>>> skipped as its always going to be positive and that will be an optimization
>>>> for the normal hotplug memory. Like ZONE_DEVIE based memory, all hotplugged
>>>> normal memory too will not have SECTION_IS_EARLY set for their sections.
>>>>
>>>> Skipping memblock_is_map_memory() for all non early memory sections would
>>>> fix pfn_valid() problem for ZONE_DEVICE based memory and also improve its
>>>> performance for normal hotplug memory as well.
>>>>
>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>> Cc: Will Deacon <will@kernel.org>
>>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>>> Cc: linux-arm-kernel@lists.infradead.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>
>>> Hello David/Mike,
>>>
>>> Given that we would need to rework early sections, memblock semantics via a
>>> new config i.e EARLY_SECTION_MEMMAP_HOLES and also some possible changes to
>>> ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, wondering if these patches here
>>> which fixes a problem (and improves performance) can be merged first. After
>>> that, I could start working on the proposed rework. Could you please let me
>>> know your thoughts on this. Thank you.
>>
>> As I said, we might have to throw in an pfn_section_valid() check, to
>> catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
>> on arm64 as well, no?).
> 
> pfn_section_valid() should be called only for !early_section() i.e normal
> hotplug and ZONE_DEVICE memory ? Because early boot memory should always
> be section aligned.

Well, at least not on x86-64 you can have early sections intersect with 
ZONE_DEVICE memory.

E.g., have 64MB boot memory in a section. Later, we add ZONE_DEVICE 
memory which might cover the remaining 64MB. For pfn_valid() on x86-64, 
we always return "true" for such sections, because we always have the 
memmap for the whole early section allocated during boot. So, there it's 
"simple".

Now, arm64 seems to discard some parts of the vmemmap, so the remaining 
64MB in such an early section might not have a memmap anymore? TBH, I 
don't know.

Most probably only performing the check for
!early_section() is sufficient on arm64, but I really can't tell as I 
don't know what we're actually discarding and if something as described 
for x86-64 is even possible on arm64.

We should really try to take the magic out of arm64 vmemmap handling.

> 
>>
>> Apart from that, I'm fine with a simple fix upfront, that can be more
>> easily backported if needed. (Q: do we? is this stable material?)
>>
> 
> Right, an upfront fix here would help with backporting. AFAICS it should be
> backported to stable as pte_devmap and ZONE_DEVICE have been around
> for some time now. Do you have a particular stable version which needs to
> be tagged in the patch?

I haven't looked yet TBH. I guess it is broken since ZONE_DEVICE was 
enabled on arm64?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory
  2021-01-27  9:29           ` David Hildenbrand
@ 2021-01-28  7:42             ` Anshuman Khandual
  -1 siblings, 0 replies; 32+ messages in thread
From: Anshuman Khandual @ 2021-01-28  7:42 UTC (permalink / raw)
  To: David Hildenbrand, linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, ardb, Mark Rutland, James Morse,
	Robin Murphy, Jérôme Glisse, Dan Williams,
	Mike Rapoport


On 1/27/21 2:59 PM, David Hildenbrand wrote:
> On 27.01.21 05:06, Anshuman Khandual wrote:
>>
>>
>> On 1/25/21 2:43 PM, David Hildenbrand wrote:
>>> On 25.01.21 07:22, Anshuman Khandual wrote:
>>>>
>>>> On 12/22/20 12:42 PM, Anshuman Khandual wrote:
>>>>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>>>>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>>>>> that it does not have memblock entries. Hence memblock_is_map_memory() will
>>>>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>>>>> eventually fails pfn_valid(), which is wrong. memblock_is_map_memory() needs
>>>>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>>>>> into the system via memremap_pages() called from a driver, its respective
>>>>> memory sections will not have SECTION_IS_EARLY set.
>>>>>
>>>>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in its memblock
>>>>> regions, because the MEMBLOCK_NOMAP flag was specifically designed for and
>>>>> set on firmware reserved memory regions. memblock_is_map_memory() can just
>>>>> be skipped as it is always going to be positive, which is an optimization
>>>>> for normal hotplug memory. Like ZONE_DEVICE based memory, hotplugged normal
>>>>> memory will not have SECTION_IS_EARLY set for its sections either.
>>>>>
>>>>> Skipping memblock_is_map_memory() for all non-early memory sections would
>>>>> fix the pfn_valid() problem for ZONE_DEVICE based memory and also improve
>>>>> its performance for normal hotplug memory.
>>>>>
>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>> Cc: Will Deacon <will@kernel.org>
>>>>> Cc: Ard Biesheuvel <ardb@kernel.org>
>>>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>>>> Cc: linux-arm-kernel@lists.infradead.org
>>>>> Cc: linux-kernel@vger.kernel.org
>>>>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>>
>>>> Hello David/Mike,
>>>>
>>>> Given that we would need to rework early sections and memblock semantics via
>>>> a new config, i.e. EARLY_SECTION_MEMMAP_HOLES, and also make some possible
>>>> changes to ARCH_KEEP_MEMBLOCK and HAVE_ARCH_PFN_VALID, I am wondering if
>>>> these patches, which fix a problem (and improve performance), can be merged
>>>> first. After that, I could start working on the proposed rework. Could you
>>>> please let me know your thoughts on this? Thank you.
>>>
>>> As I said, we might have to throw in a pfn_section_valid() check, to
>>> catch not-section-aligned ZONE_DEVICE ranges (I assume this is possible
>>> on arm64 as well, no?).
>>
>> pfn_section_valid() should be called only for !early_section(), i.e. normal
>> hotplug and ZONE_DEVICE memory? Because early boot memory should always
>> be section aligned.
> 
> Well, that is not always the case, at least not on x86-64: you can have early sections intersect with ZONE_DEVICE memory.
> 
> E.g., a section might have 64MB of boot memory. Later, we add ZONE_DEVICE memory which might cover the remaining 64MB. For pfn_valid() on x86-64, we always return "true" for such sections, because we always have the memmap for the whole early section allocated during boot. So, there it's "simple".

This is the generic pfn_valid() used on x86. As you mentioned, it does
not test pfn_section_valid() if the section is early, assuming that the
vmemmap coverage is complete.

#ifndef CONFIG_HAVE_ARCH_PFN_VALID
static inline int pfn_valid(unsigned long pfn)
{
        struct mem_section *ms;

        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
        ms = __nr_to_section(pfn_to_section_nr(pfn));
        if (!valid_section(ms))
                return 0;
        /*
         * Traditionally early sections always returned pfn_valid() for
         * the entire section-sized span.
         */
        return early_section(ms) || pfn_section_valid(ms, pfn);
}
#endif

Looking at the code, it seems like early sections get initialized via
sparse_init() only at section granularity, but please correct me if
that is wrong.
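
For reference, the boot-time path (paraphrased from mm/sparse.c of that
era -- an approximation from memory, not a verbatim quote) marks whole
sections present from the memblock ranges, with no subsection granularity
involved at this stage:

static void __init memblocks_present(void)
{
        unsigned long start, end;
        int i, nid;

        /* Every memblock range marks the sections it touches as present. */
        for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid)
                memory_present(nid, start, end);
}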

> 
> Now, arm64 seems to discard some parts of the vmemmap, so the remaining 64MB in such an early section might not have a memmap anymore? TBH, I don't know.

I did not get that. Could you please be more specific about how arm64
discards parts of the vmemmap?

> 
> Most probably only performing the check for
> !early_section() is sufficient on arm64, but I really can't tell as I don't know what we're actually discarding and if something as described for x86-64 is even possible on arm64.

It seems like direct users of arch_add_memory() and __add_pages(), such as
pagemap_range(), can cause subsection hotplug and vmemmap mapping. So
pfn_section_valid() should be applicable only for !early_section().

A simple test on arm64 shows that both boot memory and traditional
memory hotplug get the entire subsection_map populated, but that might
not always be true for ZONE_DEVICE memory.
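
For completeness, pfn_section_valid() just consults that per-section
subsection_map. Roughly (quoted from memory of include/linux/mmzone.h,
so minor differences between kernel versions are possible):

static inline int subsection_map_index(unsigned long pfn)
{
        return (pfn & ~(PAGE_SECTION_MASK)) / PAGES_PER_SUBSECTION;
}

#ifdef CONFIG_SPARSEMEM_VMEMMAP
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
        int idx = subsection_map_index(pfn);

        /* Bits are set per subsection at hotplug time, e.g. by memremap_pages(). */
        return test_bit(idx, ms->usage->subsection_map);
}
#else
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
        return 1;
}
#endif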

> 
> We should really try to take the magic out of arm64 vmemmap handling.

I would really like to understand more about this.

> 
>>
>>>
>>> Apart from that, I'm fine with a simple fix upfront, that can be more
>>> easily backported if needed. (Q: do we? is this stable material?)
>>>
>>
>> Right, an upfront fix here would help with backporting. AFAICS it should be
>> backported to stable as pte_devmap and ZONE_DEVICE have been around
>> for some time now. Do you have a particular stable version which needs to
>> be tagged in the patch?
> 
> I haven't looked yet TBH. I guess it is broken since ZONE_DEVICE was enabled on arm64?
> 
Sure, will figure this out.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-01-28  7:43 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-22  7:12 [RFC 0/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory Anshuman Khandual
2020-12-22  7:12 ` Anshuman Khandual
2020-12-22  7:12 ` [RFC 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE " Anshuman Khandual
2020-12-22  7:12   ` Anshuman Khandual
2020-12-22  9:11   ` David Hildenbrand
2020-12-22  9:11     ` David Hildenbrand
2021-01-04  6:18     ` Anshuman Khandual
2021-01-04  6:18       ` Anshuman Khandual
2021-01-11 10:31       ` David Hildenbrand
2021-01-11 10:31         ` David Hildenbrand
2021-01-11 14:00         ` Mike Rapoport
2021-01-11 14:00           ` Mike Rapoport
2021-01-04 15:36   ` Jonathan Cameron
2021-01-04 15:36     ` Jonathan Cameron
2021-01-05  3:25     ` Anshuman Khandual
2021-01-05  3:25       ` Anshuman Khandual
2021-01-25  6:22   ` Anshuman Khandual
2021-01-25  6:22     ` Anshuman Khandual
2021-01-25  7:31     ` Mike Rapoport
2021-01-25  7:31       ` Mike Rapoport
2021-01-27  3:46       ` Anshuman Khandual
2021-01-27  3:46         ` Anshuman Khandual
2021-01-25  9:13     ` David Hildenbrand
2021-01-25  9:13       ` David Hildenbrand
2021-01-27  4:06       ` Anshuman Khandual
2021-01-27  4:06         ` Anshuman Khandual
2021-01-27  9:29         ` David Hildenbrand
2021-01-27  9:29           ` David Hildenbrand
2021-01-28  7:42           ` Anshuman Khandual
2021-01-28  7:42             ` Anshuman Khandual
2020-12-22  7:12 ` [RFC 2/2] arm64/mm: Reorganize pfn_valid() Anshuman Khandual
2020-12-22  7:12   ` Anshuman Khandual
