linux-mm.kvack.org archive mirror
* [PATCH v2 0/3] Cleanup and fixups for vmemmap handling
@ 2021-02-03 10:47 Oscar Salvador
  2021-02-03 10:47 ` [PATCH v2 1/3] x86/vmemmap: Drop handling of 4K unaligned vmemmap range Oscar Salvador
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Oscar Salvador @ 2021-02-03 10:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, Michal Hocko, linux-mm, linux-kernel,
	Oscar Salvador

Hi,

this series contains cleanups to remove dead code that handles
unaligned cases for 4K and 1GB pages (patch#1 and patch#2) when
removing the vmemmap range, and a fix (patch#3) to handle the case
when two vmemmap ranges intersect a PMD.

More details can be found in the respective changelogs.


 v1 -> v2:
 - Remove dead code in remove_pud_table as well
 - Addressed feedback by David
 - Place the vmemmap functions that take care of unaligned PMDs
   within CONFIG_SPARSEMEM_VMEMMAP

Oscar Salvador (3):
  x86/vmemmap: Drop handling of 4K unaligned vmemmap range
  x86/vmemmap: Drop handling of 1GB vmemmap ranges
  x86/vmemmap: Handle unpopulated sub-pmd ranges

 arch/x86/mm/init_64.c | 166 ++++++++++++++++++++++++------------------
 1 file changed, 96 insertions(+), 70 deletions(-)

-- 
2.26.2




* [PATCH v2 1/3] x86/vmemmap: Drop handling of 4K unaligned vmemmap range
  2021-02-03 10:47 [PATCH v2 0/3] Cleanup and fixups for vmemmap handling Oscar Salvador
@ 2021-02-03 10:47 ` Oscar Salvador
  2021-02-03 13:29   ` David Hildenbrand
  2021-02-11 21:25   ` Oscar Salvador
  2021-02-03 10:47 ` [PATCH v2 2/3] x86/vmemmap: Drop handling of 1GB vmemmap ranges Oscar Salvador
  2021-02-03 10:47 ` [PATCH v2 3/3] x86/vmemmap: Handle unpopulated sub-pmd ranges Oscar Salvador
  2 siblings, 2 replies; 10+ messages in thread
From: Oscar Salvador @ 2021-02-03 10:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, Michal Hocko, linux-mm, linux-kernel,
	Oscar Salvador

remove_pte_table() is prepared to handle the case where either the
start or the end of the range is not PAGE aligned.
This cannot actually happen:

__populate_section_memmap enforces the range to be PMD aligned,
so as long as the size of the struct page remains a multiple of 8,
the vmemmap range will be aligned to PAGE_SIZE.

Drop the dead code and place a VM_BUG_ON in vmemmap_{populate,free}
to catch nasty cases.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: David Hildenbrand <david@redhat.com>
---
 arch/x86/mm/init_64.c | 48 ++++++++++++-------------------------------
 1 file changed, 13 insertions(+), 35 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b5a3fa4033d3..b0e1d215c83e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -962,7 +962,6 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 {
 	unsigned long next, pages = 0;
 	pte_t *pte;
-	void *page_addr;
 	phys_addr_t phys_addr;
 
 	pte = pte_start + pte_index(addr);
@@ -983,42 +982,15 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 		if (phys_addr < (phys_addr_t)0x40000000)
 			return;
 
-		if (PAGE_ALIGNED(addr) && PAGE_ALIGNED(next)) {
-			/*
-			 * Do not free direct mapping pages since they were
-			 * freed when offlining, or simplely not in use.
-			 */
-			if (!direct)
-				free_pagetable(pte_page(*pte), 0);
-
-			spin_lock(&init_mm.page_table_lock);
-			pte_clear(&init_mm, addr, pte);
-			spin_unlock(&init_mm.page_table_lock);
+		if (!direct)
+			free_pagetable(pte_page(*pte), 0);
 
-			/* For non-direct mapping, pages means nothing. */
-			pages++;
-		} else {
-			/*
-			 * If we are here, we are freeing vmemmap pages since
-			 * direct mapped memory ranges to be freed are aligned.
-			 *
-			 * If we are not removing the whole page, it means
-			 * other page structs in this page are being used and
-			 * we canot remove them. So fill the unused page_structs
-			 * with 0xFD, and remove the page when it is wholly
-			 * filled with 0xFD.
-			 */
-			memset((void *)addr, PAGE_INUSE, next - addr);
-
-			page_addr = page_address(pte_page(*pte));
-			if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
-				free_pagetable(pte_page(*pte), 0);
+		spin_lock(&init_mm.page_table_lock);
+		pte_clear(&init_mm, addr, pte);
+		spin_unlock(&init_mm.page_table_lock);
 
-				spin_lock(&init_mm.page_table_lock);
-				pte_clear(&init_mm, addr, pte);
-				spin_unlock(&init_mm.page_table_lock);
-			}
-		}
+		/* For non-direct mapping, pages means nothing. */
+		pages++;
 	}
 
 	/* Call free_pte_table() in remove_pmd_table(). */
@@ -1197,6 +1169,9 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct,
 void __ref vmemmap_free(unsigned long start, unsigned long end,
 		struct vmem_altmap *altmap)
 {
+	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
+	VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));
+
 	remove_pagetable(start, end, false, altmap);
 }
 
@@ -1556,6 +1531,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 {
 	int err;
 
+	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
+	VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));
+
 	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
 		err = vmemmap_populate_basepages(start, end, node, NULL);
 	else if (boot_cpu_has(X86_FEATURE_PSE))
-- 
2.26.2




* [PATCH v2 2/3] x86/vmemmap: Drop handling of 1GB vmemmap ranges
  2021-02-03 10:47 [PATCH v2 0/3] Cleanup and fixups for vmemmap handling Oscar Salvador
  2021-02-03 10:47 ` [PATCH v2 1/3] x86/vmemmap: Drop handling of 4K unaligned vmemmap range Oscar Salvador
@ 2021-02-03 10:47 ` Oscar Salvador
  2021-02-03 13:33   ` David Hildenbrand
  2021-02-03 10:47 ` [PATCH v2 3/3] x86/vmemmap: Handle unpopulated sub-pmd ranges Oscar Salvador
  2 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador @ 2021-02-03 10:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, Michal Hocko, linux-mm, linux-kernel,
	Oscar Salvador

We never get to allocate 1GB pages when mapping the vmemmap range.
Drop the dead code both for the aligned and unaligned cases and leave
only the direct map handling.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: David Hildenbrand <david@redhat.com>
---
 arch/x86/mm/init_64.c | 31 ++++---------------------------
 1 file changed, 4 insertions(+), 27 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b0e1d215c83e..28729c6b9775 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1062,7 +1062,6 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 	unsigned long next, pages = 0;
 	pmd_t *pmd_base;
 	pud_t *pud;
-	void *page_addr;
 
 	pud = pud_start + pud_index(addr);
 	for (; addr < end; addr = next, pud++) {
@@ -1072,32 +1071,10 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 			continue;
 
 		if (pud_large(*pud)) {
-			if (IS_ALIGNED(addr, PUD_SIZE) &&
-			    IS_ALIGNED(next, PUD_SIZE)) {
-				if (!direct)
-					free_pagetable(pud_page(*pud),
-						       get_order(PUD_SIZE));
-
-				spin_lock(&init_mm.page_table_lock);
-				pud_clear(pud);
-				spin_unlock(&init_mm.page_table_lock);
-				pages++;
-			} else {
-				/* If here, we are freeing vmemmap pages. */
-				memset((void *)addr, PAGE_INUSE, next - addr);
-
-				page_addr = page_address(pud_page(*pud));
-				if (!memchr_inv(page_addr, PAGE_INUSE,
-						PUD_SIZE)) {
-					free_pagetable(pud_page(*pud),
-						       get_order(PUD_SIZE));
-
-					spin_lock(&init_mm.page_table_lock);
-					pud_clear(pud);
-					spin_unlock(&init_mm.page_table_lock);
-				}
-			}
-
+			spin_lock(&init_mm.page_table_lock);
+			pud_clear(pud);
+			spin_unlock(&init_mm.page_table_lock);
+			pages++;
 			continue;
 		}
 
-- 
2.26.2




* [PATCH v2 3/3] x86/vmemmap: Handle unpopulated sub-pmd ranges
  2021-02-03 10:47 [PATCH v2 0/3] Cleanup and fixups for vmemmap handling Oscar Salvador
  2021-02-03 10:47 ` [PATCH v2 1/3] x86/vmemmap: Drop handling of 4K unaligned vmemmap range Oscar Salvador
  2021-02-03 10:47 ` [PATCH v2 2/3] x86/vmemmap: Drop handling of 1GB vmemmap ranges Oscar Salvador
@ 2021-02-03 10:47 ` Oscar Salvador
  2 siblings, 0 replies; 10+ messages in thread
From: Oscar Salvador @ 2021-02-03 10:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, Michal Hocko, linux-mm, linux-kernel,
	Oscar Salvador

When the memmap of a section is not a multiple of 2MB in size (i.e.
when sizeof(struct page) is not a multiple of 64), sections do not
span full PMDs anymore, and so when populating them some parts of the
PMD will remain unused.
Because of this, PMDs will be left behind when depopulating sections,
since remove_pmd_table() thinks that those unused parts are still in
use.

Fix this by marking the unused parts with PAGE_UNUSED, so memchr_inv()
will do the right thing and will let us free the PMD when the last user
of it is gone.

This patch is based on a similar patch by David Hildenbrand:

https://lore.kernel.org/linux-mm/20200722094558.9828-9-david@redhat.com/
https://lore.kernel.org/linux-mm/20200722094558.9828-10-david@redhat.com/

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 arch/x86/mm/init_64.c | 87 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 79 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 28729c6b9775..967cd244e623 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -871,7 +871,74 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return add_pages(nid, start_pfn, nr_pages, params);
 }
 
-#define PAGE_INUSE 0xFD
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+#define PAGE_UNUSED 0xFD
+
+/*
+ * The unused vmemmap range, which was not yet memset(PAGE_UNUSED), ranges
+ * from unused_pmd_start to the next PMD_SIZE boundary.
+ */
+static unsigned long unused_pmd_start __meminitdata;
+
+static void __meminit vmemmap_flush_unused_pmd(void)
+{
+	if (!unused_pmd_start)
+		return;
+	/*
+	 * Clears [unused_pmd_start, PMD_END)
+	 */
+	memset((void *)unused_pmd_start, PAGE_UNUSED,
+	       ALIGN(unused_pmd_start, PMD_SIZE) - unused_pmd_start);
+	unused_pmd_start = 0;
+}
+
+/* Returns true if the PMD is completely unused and thus it can be freed */
+static bool __meminit vmemmap_unuse_sub_pmd(unsigned long addr, unsigned long end)
+{
+	unsigned long start = ALIGN_DOWN(addr, PMD_SIZE);
+
+	vmemmap_flush_unused_pmd();
+	memset((void *)addr, PAGE_UNUSED, end - addr);
+
+	return !memchr_inv((void *)start, PAGE_UNUSED, PMD_SIZE);
+}
+
+static void __meminit vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
+{
+	/*
+	 * We only optimize if the new used range directly follows the
+	 * previously unused range (esp., when populating consecutive sections).
+	 */
+	if (unused_pmd_start == start) {
+		if (likely(IS_ALIGNED(end, PMD_SIZE)))
+			unused_pmd_start = 0;
+		else
+			unused_pmd_start = end;
+		return;
+	}
+
+	vmemmap_flush_unused_pmd();
+}
+
+static void __meminit vmemmap_use_new_sub_pmd(unsigned long start, unsigned long end)
+{
+	vmemmap_flush_unused_pmd();
+
+	/*
+	 * Mark the unused parts of the new memmap range
+	 */
+	if (!IS_ALIGNED(start, PMD_SIZE))
+		memset((void *)ALIGN_DOWN(start, PMD_SIZE), PAGE_UNUSED,
+		       start - ALIGN_DOWN(start, PMD_SIZE));
+	/*
+	 * We want to avoid memset(PAGE_UNUSED) when populating the vmemmap of
+	 * consecutive sections. Remember for the last added PMD the last
+	 * unused range in the populated PMD.
+	 */
+	if (!IS_ALIGNED(end, PMD_SIZE))
+		unused_pmd_start = end;
+}
+#endif
 
 static void __meminit free_pagetable(struct page *page, int order)
 {
@@ -1006,7 +1073,6 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 	unsigned long next, pages = 0;
 	pte_t *pte_base;
 	pmd_t *pmd;
-	void *page_addr;
 
 	pmd = pmd_start + pmd_index(addr);
 	for (; addr < end; addr = next, pmd++) {
@@ -1027,12 +1093,11 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 				spin_unlock(&init_mm.page_table_lock);
 				pages++;
 			} else {
-				/* If here, we are freeing vmemmap pages. */
-				memset((void *)addr, PAGE_INUSE, next - addr);
-
-				page_addr = page_address(pmd_page(*pmd));
-				if (!memchr_inv(page_addr, PAGE_INUSE,
-						PMD_SIZE)) {
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+				/*
+				 * Free the PMD if the whole range is unused.
+				 */
+				if (vmemmap_unuse_sub_pmd(addr, next)) {
 					free_hugepage_table(pmd_page(*pmd),
 							    altmap);
 
@@ -1040,6 +1105,7 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 					pmd_clear(pmd);
 					spin_unlock(&init_mm.page_table_lock);
 				}
+#endif
 			}
 
 			continue;
@@ -1490,11 +1556,16 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 
 				addr_end = addr + PMD_SIZE;
 				p_end = p + PMD_SIZE;
+
+				if (!IS_ALIGNED(addr, PMD_SIZE) ||
+				    !IS_ALIGNED(next, PMD_SIZE))
+					vmemmap_use_new_sub_pmd(addr, next);
 				continue;
 			} else if (altmap)
 				return -ENOMEM; /* no fallback */
 		} else if (pmd_large(*pmd)) {
 			vmemmap_verify((pte_t *)pmd, node, addr, next);
+			vmemmap_use_sub_pmd(addr, next);
 			continue;
 		}
 		if (vmemmap_populate_basepages(addr, next, node, NULL))
-- 
2.26.2




* Re: [PATCH v2 1/3] x86/vmemmap: Drop handling of 4K unaligned vmemmap range
  2021-02-03 10:47 ` [PATCH v2 1/3] x86/vmemmap: Drop handling of 4K unaligned vmemmap range Oscar Salvador
@ 2021-02-03 13:29   ` David Hildenbrand
  2021-02-11 21:25   ` Oscar Salvador
  1 sibling, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2021-02-03 13:29 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H . Peter Anvin, Michal Hocko,
	linux-mm, linux-kernel

On 03.02.21 11:47, Oscar Salvador wrote:
> remove_pte_table() is prepared to handle the case where either the
> start or the end of the range is not PAGE aligned.
> This cannot actually happen:
> 
> __populate_section_memmap enforces the range to be PMD aligned,
> so as long as the size of the struct page remains a multiple of 8,
> the vmemmap range will be aligned to PAGE_SIZE.
> 
> Drop the dead code and place a VM_BUG_ON in vmemmap_{populate,free}
> to catch nasty cases.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Suggested-by: David Hildenbrand <david@redhat.com>
> ---
>   arch/x86/mm/init_64.c | 48 ++++++++++++-------------------------------
>   1 file changed, 13 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index b5a3fa4033d3..b0e1d215c83e 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -962,7 +962,6 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>   {
>   	unsigned long next, pages = 0;
>   	pte_t *pte;
> -	void *page_addr;
>   	phys_addr_t phys_addr;
>   
>   	pte = pte_start + pte_index(addr);
> @@ -983,42 +982,15 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>   		if (phys_addr < (phys_addr_t)0x40000000)
>   			return;
>   
> -		if (PAGE_ALIGNED(addr) && PAGE_ALIGNED(next)) {
> -			/*
> -			 * Do not free direct mapping pages since they were
> -			 * freed when offlining, or simplely not in use.
> -			 */
> -			if (!direct)
> -				free_pagetable(pte_page(*pte), 0);
> -
> -			spin_lock(&init_mm.page_table_lock);
> -			pte_clear(&init_mm, addr, pte);
> -			spin_unlock(&init_mm.page_table_lock);
> +		if (!direct)
> +			free_pagetable(pte_page(*pte), 0);
>   
> -			/* For non-direct mapping, pages means nothing. */
> -			pages++;
> -		} else {
> -			/*
> -			 * If we are here, we are freeing vmemmap pages since
> -			 * direct mapped memory ranges to be freed are aligned.
> -			 *
> -			 * If we are not removing the whole page, it means
> -			 * other page structs in this page are being used and
> -			 * we canot remove them. So fill the unused page_structs
> -			 * with 0xFD, and remove the page when it is wholly
> -			 * filled with 0xFD.
> -			 */
> -			memset((void *)addr, PAGE_INUSE, next - addr);
> -
> -			page_addr = page_address(pte_page(*pte));
> -			if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
> -				free_pagetable(pte_page(*pte), 0);
> +		spin_lock(&init_mm.page_table_lock);
> +		pte_clear(&init_mm, addr, pte);
> +		spin_unlock(&init_mm.page_table_lock);
>   
> -				spin_lock(&init_mm.page_table_lock);
> -				pte_clear(&init_mm, addr, pte);
> -				spin_unlock(&init_mm.page_table_lock);
> -			}
> -		}
> +		/* For non-direct mapping, pages means nothing. */
> +		pages++;
>   	}
>   
>   	/* Call free_pte_table() in remove_pmd_table(). */
> @@ -1197,6 +1169,9 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct,
>   void __ref vmemmap_free(unsigned long start, unsigned long end,
>   		struct vmem_altmap *altmap)
>   {
> +	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
> +	VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));
> +
>   	remove_pagetable(start, end, false, altmap);
>   }
>   
> @@ -1556,6 +1531,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>   {
>   	int err;
>   
> +	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
> +	VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));
> +
>   	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
>   		err = vmemmap_populate_basepages(start, end, node, NULL);
>   	else if (boot_cpu_has(X86_FEATURE_PSE))
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2 2/3] x86/vmemmap: Drop handling of 1GB vmemmap ranges
  2021-02-03 10:47 ` [PATCH v2 2/3] x86/vmemmap: Drop handling of 1GB vmemmap ranges Oscar Salvador
@ 2021-02-03 13:33   ` David Hildenbrand
  2021-02-03 14:10     ` Oscar Salvador
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2021-02-03 13:33 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H . Peter Anvin, Michal Hocko,
	linux-mm, linux-kernel

On 03.02.21 11:47, Oscar Salvador wrote:
> We never get to allocate 1GB pages when mapping the vmemmap range.
> Drop the dead code both for the aligned and unaligned cases and leave
> only the direct map handling.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Suggested-by: David Hildenbrand <david@redhat.com>
> ---
>   arch/x86/mm/init_64.c | 31 ++++---------------------------
>   1 file changed, 4 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index b0e1d215c83e..28729c6b9775 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1062,7 +1062,6 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
>   	unsigned long next, pages = 0;
>   	pmd_t *pmd_base;
>   	pud_t *pud;
> -	void *page_addr;
>   
>   	pud = pud_start + pud_index(addr);
>   	for (; addr < end; addr = next, pud++) {
> @@ -1072,32 +1071,10 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
>   			continue;
>   
>   		if (pud_large(*pud)) {
> -			if (IS_ALIGNED(addr, PUD_SIZE) &&
> -			    IS_ALIGNED(next, PUD_SIZE)) {
> -				if (!direct)
> -					free_pagetable(pud_page(*pud),
> -						       get_order(PUD_SIZE));
> -
> -				spin_lock(&init_mm.page_table_lock);
> -				pud_clear(pud);
> -				spin_unlock(&init_mm.page_table_lock);
> -				pages++;
> -			} else {
> -				/* If here, we are freeing vmemmap pages. */
> -				memset((void *)addr, PAGE_INUSE, next - addr);
> -
> -				page_addr = page_address(pud_page(*pud));
> -				if (!memchr_inv(page_addr, PAGE_INUSE,
> -						PUD_SIZE)) {
> -					free_pagetable(pud_page(*pud),
> -						       get_order(PUD_SIZE));
> -
> -					spin_lock(&init_mm.page_table_lock);
> -					pud_clear(pud);
> -					spin_unlock(&init_mm.page_table_lock);
> -				}
> -			}
> -
> +			spin_lock(&init_mm.page_table_lock);
> +			pud_clear(pud);
> +			spin_unlock(&init_mm.page_table_lock);
> +			pages++;
>   			continue;
>   		}

One problem I see with the existing code / this change making it more obvious
is that when trying to remove at a different granularity than we added (e.g.,
unplug a 128MB DIMM available during boot), we remove the direct map of
unrelated DIMMs.

I think we should keep the

if (IS_ALIGNED(addr, PUD_SIZE) &&
     IS_ALIGNED(next, PUD_SIZE)) {
...
}

bits. Thoughts?

Apart from that looks good.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2 2/3] x86/vmemmap: Drop handling of 1GB vmemmap ranges
  2021-02-03 13:33   ` David Hildenbrand
@ 2021-02-03 14:10     ` Oscar Salvador
  2021-02-03 14:12       ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Oscar Salvador @ 2021-02-03 14:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, Michal Hocko, linux-mm, linux-kernel

On Wed, Feb 03, 2021 at 02:33:56PM +0100, David Hildenbrand wrote:
> One problem I see with existing code / this change making more obvious is
> that when trying to remove in other granularity than we added (e.g., unplug
> a 128MB DIMM available during boot), we remove the direct map of unrelated
> DIMMs.

So, let me see if I understand your concern.

We have a range that was mapped with 1GB page, and we try to remove
a 128MB chunk from it.
Yes, in that case we would clear the pud, and that is bad, so we should
keep the PAGE_ALIGNED checks.

Now, let us assume that scenario.
If you have a 1GB mapped range and you remove it in smaller chunks bit by bit
(e.g. 128MB), the direct mapping of that range will never be cleared (and the
page tables won't be freed), unless I am missing something?

-- 
Oscar Salvador
SUSE L3



* Re: [PATCH v2 2/3] x86/vmemmap: Drop handling of 1GB vmemmap ranges
  2021-02-03 14:10     ` Oscar Salvador
@ 2021-02-03 14:12       ` David Hildenbrand
  2021-02-03 14:15         ` Oscar Salvador
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2021-02-03 14:12 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, Michal Hocko, linux-mm, linux-kernel

On 03.02.21 15:10, Oscar Salvador wrote:
> On Wed, Feb 03, 2021 at 02:33:56PM +0100, David Hildenbrand wrote:
>> One problem I see with existing code / this change making more obvious is
>> that when trying to remove in other granularity than we added (e.g., unplug
>> a 128MB DIMM available during boot), we remove the direct map of unrelated
>> DIMMs.
> 
> So, let me see if I understand your concern.
> 
> We have a range that was mapped with 1GB page, and we try to remove
> a 128MB chunk from it.
> Yes, in that case we would clear the pud, and that is bad, so we should
> keep the PAGE_ALIGNED checks.
> 
> Now, let us assume that scenario.
> If you have a 1GB mapped range and you remove it in smaller chunks bit by bit
> (e.g: 128M), the direct mapping of that range will never be cleared unless

No, that's exactly what's happening. Good thing is that it barely ever 
happens, so I assume leaving behind some direct mapping / page tables is 
not that bad.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2 2/3] x86/vmemmap: Drop handling of 1GB vmemmap ranges
  2021-02-03 14:12       ` David Hildenbrand
@ 2021-02-03 14:15         ` Oscar Salvador
  0 siblings, 0 replies; 10+ messages in thread
From: Oscar Salvador @ 2021-02-03 14:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, Michal Hocko, linux-mm, linux-kernel

On Wed, Feb 03, 2021 at 03:12:05PM +0100, David Hildenbrand wrote:
> On 03.02.21 15:10, Oscar Salvador wrote:
> > On Wed, Feb 03, 2021 at 02:33:56PM +0100, David Hildenbrand wrote:
> > > One problem I see with existing code / this change making more obvious is
> > > that when trying to remove in other granularity than we added (e.g., unplug
> > > a 128MB DIMM available during boot), we remove the direct map of unrelated
> > > DIMMs.
> > 
> > So, let me see if I understand your concern.
> > 
> > We have a range that was mapped with 1GB page, and we try to remove
> > a 128MB chunk from it.
> > Yes, in that case we would clear the pud, and that is bad, so we should
> > keep the PAGE_ALIGNED checks.
> > 
> > Now, let us assume that scenario.
> > If you have a 1GB mapped range and you remove it in smaller chunks bit by bit
> > (e.g: 128M), the direct mapping of that range will never be cleared unless
> 
> No, that's exactly what's happening. Good thing is that it barely ever
> happens, so I assume leaving behind some direct mapping / page tables is not
> that bad.

Sorry, I meant that that is the current situation now.

Then let us keep the PAGE_ALIGNED stuff.

I shall resend a v3 later today.


thanks for the review ;-)

-- 
Oscar Salvador
SUSE L3



* Re: [PATCH v2 1/3] x86/vmemmap: Drop handling of 4K unaligned vmemmap range
  2021-02-03 10:47 ` [PATCH v2 1/3] x86/vmemmap: Drop handling of 4K unaligned vmemmap range Oscar Salvador
  2021-02-03 13:29   ` David Hildenbrand
@ 2021-02-11 21:25   ` Oscar Salvador
  1 sibling, 0 replies; 10+ messages in thread
From: Oscar Salvador @ 2021-02-11 21:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, Michal Hocko, linux-mm, linux-kernel

On Wed, Feb 03, 2021 at 11:47:48AM +0100, Oscar Salvador wrote:
> remove_pte_table() is prepared to handle the case where either the
> start or the end of the range is not PAGE aligned.
> This cannot actually happen:
> 
> __populate_section_memmap enforces the range to be PMD aligned,
> so as long as the size of the struct page remains a multiple of 8,
> the vmemmap range will be aligned to PAGE_SIZE.
> 
> Drop the dead code and place a VM_BUG_ON in vmemmap_{populate,free}
> to catch nasty cases.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Suggested-by: David Hildenbrand <david@redhat.com>

Hi Andrew, 

the cover letter got lost somehow, but is this patchset, [1] and [2]
on your radar?

[1] https://lore.kernel.org/linux-mm/20210204134325.7237-3-osalvador@suse.de/
[2] https://lore.kernel.org/linux-mm/20210204134325.7237-4-osalvador@suse.de/

thanks


-- 
Oscar Salvador
SUSE L3


