linux-mm.kvack.org archive mirror
* [PATCH v1 0/9] s390: implement and optimize vmemmap_free()
@ 2020-07-03 13:39 David Hildenbrand
  2020-07-03 13:39 ` [PATCH v1 1/9] s390/vmem: rename vmem_add_mem() to vmem_add_range() David Hildenbrand
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-03 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, linux-mm, David Hildenbrand, Christian Borntraeger,
	Gerald Schaefer, Heiko Carstens, Vasily Gorbik

This series is based on the latest s390/features branch [1]. It implements
vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
- Freeing empty page tables (now also done for the identity mapping).
- Handling cases where the vmemmap of a section does not fill huge pages
  completely.

vmemmap_free() is currently never used, unless adding standby memory fails
(unlikely). This is relevant for virtio-mem, which adds/removes memory in
memory block/section granularity (it always removes memory in the same
granularity in which it was added).

I gave this a proper test with my virtio-mem prototype (which I will share
once the basic QEMU implementation is upstream), both with 56 byte memmap
per page and 64 byte memmap per page, with and without huge page support.
In both cases, removing memory (routed through arch_remove_memory()) results
in
- all populated vmemmap pages getting removed/freed
- all applicable page tables for the vmemmap getting removed/freed
- all applicable page tables for the identity mapping getting removed/freed
Unfortunately, I don't have access to bigger environments or z/VM (esp. DCSS)
setups.

This is the basis for real memory hotunplug support for s390x and should
complete my journey into the s390x vmem/vmemmap code for now :)

What needs double-checking is TLB flushing. AFAIKS, as there are no valid
accesses, doing a single range flush at the end is sufficient, both when
removing the vmemmap pages and when removing the identity mapping.

Along the way, some minor cleanups.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features

David Hildenbrand (9):
  s390/vmem: rename vmem_add_mem() to vmem_add_range()
  s390/vmem: recursive implementation of vmem_remove_range()
  s390/vmemmap: implement vmemmap_free()
  s390/vmemmap: cleanup when vmemmap_populate() fails
  s390/vmemmap: take the vmem_mutex when populating/freeing
  s390/vmem: cleanup empty page tables
  s390/vmemmap: fallback to PTEs if mapping large PMD fails
  s390/vmemmap: remember unused sub-pmd ranges
  s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive
    sections

 arch/s390/mm/vmem.c | 400 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 338 insertions(+), 62 deletions(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v1 1/9] s390/vmem: rename vmem_add_mem() to vmem_add_range()
  2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
@ 2020-07-03 13:39 ` David Hildenbrand
  2020-07-03 13:39 ` [PATCH v1 2/9] s390/vmem: recursive implementation of vmem_remove_range() David Hildenbrand
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-03 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, linux-mm, David Hildenbrand, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Gerald Schaefer

Let's match the name to vmem_remove_range().

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/vmem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 3b9e71654c37b..66c5333020ead 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -57,7 +57,7 @@ pte_t __ref *vmem_pte_alloc(void)
 /*
  * Add a physical memory range to the 1:1 mapping.
  */
-static int vmem_add_mem(unsigned long start, unsigned long size)
+static int vmem_add_range(unsigned long start, unsigned long size)
 {
 	unsigned long pgt_prot, sgt_prot, r3_prot;
 	unsigned long pages4k, pages1m, pages2g;
@@ -308,7 +308,7 @@ int vmem_add_mapping(unsigned long start, unsigned long size)
 		return -ERANGE;
 
 	mutex_lock(&vmem_mutex);
-	ret = vmem_add_mem(start, size);
+	ret = vmem_add_range(start, size);
 	if (ret)
 		vmem_remove_range(start, size);
 	mutex_unlock(&vmem_mutex);
@@ -325,7 +325,7 @@ void __init vmem_map_init(void)
 	struct memblock_region *reg;
 
 	for_each_memblock(memory, reg)
-		vmem_add_mem(reg->base, reg->size);
+		vmem_add_range(reg->base, reg->size);
 	__set_memory((unsigned long)_stext,
 		     (unsigned long)(_etext - _stext) >> PAGE_SHIFT,
 		     SET_MEMORY_RO | SET_MEMORY_X);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v1 2/9] s390/vmem: recursive implementation of vmem_remove_range()
  2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
  2020-07-03 13:39 ` [PATCH v1 1/9] s390/vmem: rename vmem_add_mem() to vmem_add_range() David Hildenbrand
@ 2020-07-03 13:39 ` David Hildenbrand
  2020-07-03 13:39 ` [PATCH v1 3/9] s390/vmemmap: implement vmemmap_free() David Hildenbrand
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-03 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, linux-mm, David Hildenbrand, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Gerald Schaefer

We want to reuse the same functionality in vmemmap_free(). Let's start by
introducing a recursive remove_pagetable(), inspired by x86. We'll extend it
next to cover the vmemmap in a similar way.

A recursive implementation makes it easier to expand individual cases
without harming readability. In addition, we minimize traversing the
whole hierarchy over and over again.

One change is that we no longer unmap large PMDs/PUDs that are not completely
covered by the request. This should never happen for direct mappings, unless
memory is removed in a different granularity than it was added, which would
be broken already.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/vmem.c | 153 +++++++++++++++++++++++++++++++-------------
 1 file changed, 107 insertions(+), 46 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 66c5333020ead..6fe156c3f035c 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -138,64 +138,125 @@ static int vmem_add_range(unsigned long start, unsigned long size)
 	return ret;
 }
 
-/*
- * Remove a physical memory range from the 1:1 mapping.
- * Currently only invalidates page table entries.
- */
-static void vmem_remove_range(unsigned long start, unsigned long size)
+static void remove_pte_table(pmd_t *pmd, unsigned long addr,
+			     unsigned long end)
 {
-	unsigned long pages4k, pages1m, pages2g;
-	unsigned long end = start + size;
-	unsigned long address = start;
-	pgd_t *pg_dir;
-	p4d_t *p4_dir;
-	pud_t *pu_dir;
-	pmd_t *pm_dir;
-	pte_t *pt_dir;
+	unsigned long pages = 0;
+	pte_t *pte;
 
-	pages4k = pages1m = pages2g = 0;
-	while (address < end) {
-		pg_dir = pgd_offset_k(address);
-		if (pgd_none(*pg_dir)) {
-			address += PGDIR_SIZE;
+	pte = pte_offset_kernel(pmd, addr);
+	for (; addr < end; addr += PAGE_SIZE, pte++) {
+		if (pte_none(*pte))
 			continue;
-		}
-		p4_dir = p4d_offset(pg_dir, address);
-		if (p4d_none(*p4_dir)) {
-			address += P4D_SIZE;
+
+		pte_clear(&init_mm, addr, pte);
+		pages++;
+	}
+
+	update_page_count(PG_DIRECT_MAP_4K, -pages);
+}
+
+static void remove_pmd_table(pud_t *pud, unsigned long addr,
+			     unsigned long end)
+{
+	unsigned long next, pages = 0;
+	pmd_t *pmd;
+
+	pmd = pmd_offset(pud, addr);
+	for (; addr < end; addr = next, pmd++) {
+		next = pmd_addr_end(addr, end);
+
+		if (pmd_none(*pmd))
 			continue;
-		}
-		pu_dir = pud_offset(p4_dir, address);
-		if (pud_none(*pu_dir)) {
-			address += PUD_SIZE;
+
+		if (pmd_large(*pmd)) {
+			if (IS_ALIGNED(addr, PMD_SIZE) &&
+			    IS_ALIGNED(next, PMD_SIZE)) {
+				pmd_clear(pmd);
+				pages++;
+			}
 			continue;
 		}
-		if (pud_large(*pu_dir)) {
-			pud_clear(pu_dir);
-			address += PUD_SIZE;
-			pages2g++;
+
+		remove_pte_table(pmd, addr, next);
+	}
+
+	update_page_count(PG_DIRECT_MAP_1M, -pages);
+}
+
+static void remove_pud_table(p4d_t *p4d, unsigned long addr,
+			     unsigned long end)
+{
+	unsigned long next, pages = 0;
+	pud_t *pud;
+
+	pud = pud_offset(p4d, addr);
+	for (; addr < end; addr = next, pud++) {
+		next = pud_addr_end(addr, end);
+
+		if (pud_none(*pud))
 			continue;
-		}
-		pm_dir = pmd_offset(pu_dir, address);
-		if (pmd_none(*pm_dir)) {
-			address += PMD_SIZE;
+
+		if (pud_large(*pud)) {
+			if (IS_ALIGNED(addr, PUD_SIZE) &&
+			    IS_ALIGNED(next, PUD_SIZE)) {
+				pud_clear(pud);
+				pages++;
+			}
 			continue;
 		}
-		if (pmd_large(*pm_dir)) {
-			pmd_clear(pm_dir);
-			address += PMD_SIZE;
-			pages1m++;
+
+		remove_pmd_table(pud, addr, next);
+	}
+
+	update_page_count(PG_DIRECT_MAP_2G, -pages);
+}
+
+static void remove_p4d_table(pgd_t *pgd, unsigned long addr,
+			     unsigned long end)
+{
+	unsigned long next;
+	p4d_t *p4d;
+
+	p4d = p4d_offset(pgd, addr);
+	for (; addr < end; addr = next, p4d++) {
+		next = p4d_addr_end(addr, end);
+
+		if (p4d_none(*p4d))
 			continue;
-		}
-		pt_dir = pte_offset_kernel(pm_dir, address);
-		pte_clear(&init_mm, address, pt_dir);
-		address += PAGE_SIZE;
-		pages4k++;
+
+		remove_pud_table(p4d, addr, next);
 	}
+}
+
+static void remove_pagetable(unsigned long start, unsigned long end)
+{
+	unsigned long addr, next;
+	pgd_t *pgd;
+
+	if (WARN_ON_ONCE(!PAGE_ALIGNED(start | end)))
+		return;
+
+	for (addr = start; addr < end; addr = next) {
+		next = pgd_addr_end(addr, end);
+		pgd = pgd_offset_k(addr);
+
+		if (pgd_none(*pgd))
+			continue;
+
+		remove_p4d_table(pgd, addr, next);
+	}
+
 	flush_tlb_kernel_range(start, end);
-	update_page_count(PG_DIRECT_MAP_4K, -pages4k);
-	update_page_count(PG_DIRECT_MAP_1M, -pages1m);
-	update_page_count(PG_DIRECT_MAP_2G, -pages2g);
+}
+
+/*
+ * Remove a physical memory range from the 1:1 mapping.
+ * Currently only invalidates page table entries.
+ */
+static void vmem_remove_range(unsigned long start, unsigned long size)
+{
+	remove_pagetable(start, start + size);
 }
 
 /*
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v1 3/9] s390/vmemmap: implement vmemmap_free()
  2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
  2020-07-03 13:39 ` [PATCH v1 1/9] s390/vmem: rename vmem_add_mem() to vmem_add_range() David Hildenbrand
  2020-07-03 13:39 ` [PATCH v1 2/9] s390/vmem: recursive implementation of vmem_remove_range() David Hildenbrand
@ 2020-07-03 13:39 ` David Hildenbrand
  2020-07-03 13:39 ` [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails David Hildenbrand
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-03 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, linux-mm, David Hildenbrand, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Gerald Schaefer

Reuse the shiny new remove_pagetable(), tweaking it to handle freeing of
vmemmap pages, similar to the x86-64 variant (passing "bool direct" to
distinguish).

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/vmem.c | 46 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 6fe156c3f035c..16e109c292bf5 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -29,6 +29,15 @@ static void __ref *vmem_alloc_pages(unsigned int order)
 	return (void *) memblock_phys_alloc(size, size);
 }
 
+static void vmem_free_pages(unsigned long addr, int order)
+{
+	/* We don't expect boot memory to be removed ever. */
+	if (!slab_is_available() ||
+	    WARN_ON_ONCE(PageReserved(phys_to_page(addr))))
+		return;
+	free_pages(addr, order);
+}
+
 void *vmem_crst_alloc(unsigned long val)
 {
 	unsigned long *table;
@@ -139,7 +148,7 @@ static int vmem_add_range(unsigned long start, unsigned long size)
 }
 
 static void remove_pte_table(pmd_t *pmd, unsigned long addr,
-			     unsigned long end)
+			     unsigned long end, bool direct)
 {
 	unsigned long pages = 0;
 	pte_t *pte;
@@ -149,15 +158,18 @@ static void remove_pte_table(pmd_t *pmd, unsigned long addr,
 		if (pte_none(*pte))
 			continue;
 
+		if (!direct)
+			vmem_free_pages(pfn_to_phys(pte_pfn(*pte)), 0);
 		pte_clear(&init_mm, addr, pte);
 		pages++;
 	}
 
-	update_page_count(PG_DIRECT_MAP_4K, -pages);
+	if (direct)
+		update_page_count(PG_DIRECT_MAP_4K, -pages);
 }
 
 static void remove_pmd_table(pud_t *pud, unsigned long addr,
-			     unsigned long end)
+			     unsigned long end, bool direct)
 {
 	unsigned long next, pages = 0;
 	pmd_t *pmd;
@@ -172,20 +184,24 @@ static void remove_pmd_table(pud_t *pud, unsigned long addr,
 		if (pmd_large(*pmd)) {
 			if (IS_ALIGNED(addr, PMD_SIZE) &&
 			    IS_ALIGNED(next, PMD_SIZE)) {
+				if (!direct)
+					vmem_free_pages(pmd_deref(*pmd),
+							get_order(PMD_SIZE));
 				pmd_clear(pmd);
 				pages++;
 			}
 			continue;
 		}
 
-		remove_pte_table(pmd, addr, next);
+		remove_pte_table(pmd, addr, next, direct);
 	}
 
-	update_page_count(PG_DIRECT_MAP_1M, -pages);
+	if (direct)
+		update_page_count(PG_DIRECT_MAP_1M, -pages);
 }
 
 static void remove_pud_table(p4d_t *p4d, unsigned long addr,
-			     unsigned long end)
+			     unsigned long end, bool direct)
 {
 	unsigned long next, pages = 0;
 	pud_t *pud;
@@ -200,20 +216,22 @@ static void remove_pud_table(p4d_t *p4d, unsigned long addr,
 		if (pud_large(*pud)) {
 			if (IS_ALIGNED(addr, PUD_SIZE) &&
 			    IS_ALIGNED(next, PUD_SIZE)) {
+				WARN_ON_ONCE(!direct);
 				pud_clear(pud);
 				pages++;
 			}
 			continue;
 		}
 
-		remove_pmd_table(pud, addr, next);
+		remove_pmd_table(pud, addr, next, direct);
 	}
 
-	update_page_count(PG_DIRECT_MAP_2G, -pages);
+	if (direct)
+		update_page_count(PG_DIRECT_MAP_2G, -pages);
 }
 
 static void remove_p4d_table(pgd_t *pgd, unsigned long addr,
-			     unsigned long end)
+			     unsigned long end, bool direct)
 {
 	unsigned long next;
 	p4d_t *p4d;
@@ -225,11 +243,12 @@ static void remove_p4d_table(pgd_t *pgd, unsigned long addr,
 		if (p4d_none(*p4d))
 			continue;
 
-		remove_pud_table(p4d, addr, next);
+		remove_pud_table(p4d, addr, next, direct);
 	}
 }
 
-static void remove_pagetable(unsigned long start, unsigned long end)
+static void remove_pagetable(unsigned long start, unsigned long end,
+			     bool direct)
 {
 	unsigned long addr, next;
 	pgd_t *pgd;
@@ -244,7 +263,7 @@ static void remove_pagetable(unsigned long start, unsigned long end)
 		if (pgd_none(*pgd))
 			continue;
 
-		remove_p4d_table(pgd, addr, next);
+		remove_p4d_table(pgd, addr, next, direct);
 	}
 
 	flush_tlb_kernel_range(start, end);
@@ -256,7 +275,7 @@ static void remove_pagetable(unsigned long start, unsigned long end)
  */
 static void vmem_remove_range(unsigned long start, unsigned long size)
 {
-	remove_pagetable(start, start + size);
+	remove_pagetable(start, start + size, true);
 }
 
 /*
@@ -351,6 +370,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 void vmemmap_free(unsigned long start, unsigned long end,
 		struct vmem_altmap *altmap)
 {
+	remove_pagetable(start, end, false);
 }
 
 void vmem_remove_mapping(unsigned long start, unsigned long size)
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails
  2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-07-03 13:39 ` [PATCH v1 3/9] s390/vmemmap: implement vmemmap_free() David Hildenbrand
@ 2020-07-03 13:39 ` David Hildenbrand
  2020-07-03 17:09   ` kernel test robot
  2020-07-04 11:48   ` kernel test robot
  2020-07-03 13:39 ` [PATCH v1 5/9] s390/vmemmap: take the vmem_mutex when populating/freeing David Hildenbrand
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-03 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, linux-mm, David Hildenbrand, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Gerald Schaefer

Clean up what we partially added in case vmemmap_populate() fails.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/vmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 16e109c292bf5..bcddabd509da8 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -364,6 +364,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	}
 	ret = 0;
 out:
+	if (ret)
+		vmemmap_free(start, end, altmap);
 	return ret;
 }
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v1 5/9] s390/vmemmap: take the vmem_mutex when populating/freeing
  2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-07-03 13:39 ` [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails David Hildenbrand
@ 2020-07-03 13:39 ` David Hildenbrand
  2020-07-03 13:39 ` [PATCH v1 6/9] s390/vmem: cleanup empty page tables David Hildenbrand
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-03 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, linux-mm, David Hildenbrand, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Gerald Schaefer

Let's synchronize all accesses to the 1:1 and vmemmap mappings. This will
be especially relevant when we want to clean up empty page tables that could
be shared by both. Avoid races when removing tables that might just be about
to get reused.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/vmem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index bcddabd509da8..aa968f67d7f9f 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -293,6 +293,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	pte_t *pt_dir;
 	int ret = -ENOMEM;
 
+	mutex_lock(&vmem_mutex);
 	pgt_prot = pgprot_val(PAGE_KERNEL);
 	sgt_prot = pgprot_val(SEGMENT_KERNEL);
 	if (!MACHINE_HAS_NX) {
@@ -364,6 +365,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	}
 	ret = 0;
 out:
+	mutex_unlock(&vmem_mutex);
 	if (ret)
 		vmemmap_free(start, end, altmap);
 	return ret;
@@ -372,7 +374,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 void vmemmap_free(unsigned long start, unsigned long end,
 		struct vmem_altmap *altmap)
 {
+	mutex_lock(&vmem_mutex);
 	remove_pagetable(start, end, false);
+	mutex_unlock(&vmem_mutex);
 }
 
 void vmem_remove_mapping(unsigned long start, unsigned long size)
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v1 6/9] s390/vmem: cleanup empty page tables
  2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-07-03 13:39 ` [PATCH v1 5/9] s390/vmemmap: take the vmem_mutex when populating/freeing David Hildenbrand
@ 2020-07-03 13:39 ` David Hildenbrand
  2020-07-03 13:39 ` [PATCH v1 7/9] s390/vmemmap: fallback to PTEs if mapping large PMD fails David Hildenbrand
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-03 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, linux-mm, David Hildenbrand, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Gerald Schaefer

Let's clean up empty page tables. Consider only page tables that fully
fall into the identity mapping and the vmemmap range.

As there are no valid accesses to vmem/vmemmap within non-populated ranges,
the single TLB flush at the end should be sufficient.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/vmem.c | 98 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 97 insertions(+), 1 deletion(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index aa968f67d7f9f..5239130770b7b 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -63,6 +63,15 @@ pte_t __ref *vmem_pte_alloc(void)
 	return pte;
 }
 
+static void vmem_pte_free(unsigned long *table)
+{
+	/* We don't expect boot memory to be removed ever. */
+	if (!slab_is_available() ||
+	    WARN_ON_ONCE(PageReserved(virt_to_page(table))))
+		return;
+	page_table_free(&init_mm, table);
+}
+
 /*
  * Add a physical memory range to the 1:1 mapping.
  */
@@ -168,6 +177,21 @@ static void remove_pte_table(pmd_t *pmd, unsigned long addr,
 		update_page_count(PG_DIRECT_MAP_4K, -pages);
 }
 
+static void try_free_pte_table(pmd_t *pmd, unsigned long start)
+{
+	pte_t *pte;
+	int i;
+
+	/* We can safely assume this is fully in 1:1 mapping & vmemmap area */
+	pte = pte_offset_kernel(pmd, start);
+	for (i = 0; i < PTRS_PER_PTE; i++, pte++)
+		if (!pte_none(*pte))
+			return;
+
+	vmem_pte_free(__va(pmd_deref(*pmd)));
+	pmd_clear(pmd);
+}
+
 static void remove_pmd_table(pud_t *pud, unsigned long addr,
 			     unsigned long end, bool direct)
 {
@@ -194,12 +218,36 @@ static void remove_pmd_table(pud_t *pud, unsigned long addr,
 		}
 
 		remove_pte_table(pmd, addr, next, direct);
+		try_free_pte_table(pmd, addr & PMD_MASK);
 	}
 
 	if (direct)
 		update_page_count(PG_DIRECT_MAP_1M, -pages);
 }
 
+static void try_free_pmd_table(pud_t *pud, unsigned long start)
+{
+	const unsigned long end = start + PUD_SIZE;
+	pmd_t *pmd;
+	int i;
+
+	/* Don't mess with any tables not fully in 1:1 mapping & vmemmap area */
+	if (end > VMALLOC_START)
+		return;
+#ifdef CONFIG_KASAN
+	if (start < KASAN_SHADOW_END && KASAN_SHADOW_START > end)
+		return;
+#endif
+
+	pmd = pmd_offset(pud, start);
+	for (i = 0; i < PTRS_PER_PMD; i++, pmd++)
+		if (!pmd_none(*pmd))
+			return;
+
+	vmem_free_pages(pud_deref(*pud), CRST_ALLOC_ORDER);
+	pud_clear(pud);
+}
+
 static void remove_pud_table(p4d_t *p4d, unsigned long addr,
 			     unsigned long end, bool direct)
 {
@@ -224,12 +272,36 @@ static void remove_pud_table(p4d_t *p4d, unsigned long addr,
 		}
 
 		remove_pmd_table(pud, addr, next, direct);
+		try_free_pmd_table(pud, addr & PUD_MASK);
 	}
 
 	if (direct)
 		update_page_count(PG_DIRECT_MAP_2G, -pages);
 }
 
+static void try_free_pud_table(p4d_t *p4d, unsigned long start)
+{
+	const unsigned long end = start + P4D_SIZE;
+	pud_t *pud;
+	int i;
+
+	/* Don't mess with any tables not fully in 1:1 mapping & vmemmap area */
+	if (end > VMALLOC_START)
+		return;
+#ifdef CONFIG_KASAN
+	if (start < KASAN_SHADOW_END && KASAN_SHADOW_START > end)
+		return;
+#endif
+
+	pud = pud_offset(p4d, start);
+	for (i = 0; i < PTRS_PER_PUD; i++, pud++)
+		if (!pud_none(*pud))
+			return;
+
+	vmem_free_pages(p4d_deref(*p4d), CRST_ALLOC_ORDER);
+	p4d_clear(p4d);
+}
+
 static void remove_p4d_table(pgd_t *pgd, unsigned long addr,
 			     unsigned long end, bool direct)
 {
@@ -244,9 +316,33 @@ static void remove_p4d_table(pgd_t *pgd, unsigned long addr,
 			continue;
 
 		remove_pud_table(p4d, addr, next, direct);
+		try_free_pud_table(p4d, addr & P4D_MASK);
 	}
 }
 
+static void try_free_p4d_table(pgd_t *pgd, unsigned long start)
+{
+	const unsigned long end = start + PGDIR_SIZE;
+	p4d_t *p4d;
+	int i;
+
+	/* Don't mess with any tables not fully in 1:1 mapping & vmemmap area */
+	if (end > VMALLOC_START)
+		return;
+#ifdef CONFIG_KASAN
+	if (start < KASAN_SHADOW_END && KASAN_SHADOW_START > end)
+		return;
+#endif
+
+	p4d = p4d_offset(pgd, start);
+	for (i = 0; i < PTRS_PER_P4D; i++, p4d++)
+		if (!p4d_none(*p4d))
+			return;
+
+	vmem_free_pages(pgd_deref(*pgd), CRST_ALLOC_ORDER);
+	pgd_clear(pgd);
+}
+
 static void remove_pagetable(unsigned long start, unsigned long end,
 			     bool direct)
 {
@@ -264,6 +360,7 @@ static void remove_pagetable(unsigned long start, unsigned long end,
 			continue;
 
 		remove_p4d_table(pgd, addr, next, direct);
+		try_free_p4d_table(pgd, addr & PGDIR_MASK);
 	}
 
 	flush_tlb_kernel_range(start, end);
@@ -271,7 +368,6 @@ static void remove_pagetable(unsigned long start, unsigned long end,
 
 /*
  * Remove a physical memory range from the 1:1 mapping.
- * Currently only invalidates page table entries.
  */
 static void vmem_remove_range(unsigned long start, unsigned long size)
 {
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v1 7/9] s390/vmemmap: fallback to PTEs if mapping large PMD fails
  2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
                   ` (5 preceding siblings ...)
  2020-07-03 13:39 ` [PATCH v1 6/9] s390/vmem: cleanup empty page tables David Hildenbrand
@ 2020-07-03 13:39 ` David Hildenbrand
  2020-07-03 13:39 ` [PATCH v1 8/9] s390/vmemmap: remember unused sub-pmd ranges David Hildenbrand
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-03 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, linux-mm, David Hildenbrand, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Gerald Schaefer

Let's fall back to single pages if we are short on huge pages. There is no
need to stop memory hotplug.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/vmem.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 5239130770b7b..b7fdb9536707f 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -422,23 +422,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		}
 
 		pm_dir = pmd_offset(pu_dir, address);
-		if (pmd_none(*pm_dir)) {
+		if (pmd_none(*pm_dir) && MACHINE_HAS_EDAT1) {
+			void *new_page;
+
 			/* Use 1MB frames for vmemmap if available. We always
 			 * use large frames even if they are only partially
 			 * used.
 			 * Otherwise we would have also page tables since
 			 * vmemmap_populate gets called for each section
 			 * separately. */
-			if (MACHINE_HAS_EDAT1) {
-				void *new_page;
-
-				new_page = vmemmap_alloc_block(PMD_SIZE, node);
-				if (!new_page)
-					goto out;
+			new_page = vmemmap_alloc_block(PMD_SIZE, node);
+			if (new_page) {
 				pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
 				address = (address + PMD_SIZE) & PMD_MASK;
 				continue;
 			}
+		}
+		if (pmd_none(*pm_dir)) {
 			pt_dir = vmem_pte_alloc();
 			if (!pt_dir)
 				goto out;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v1 8/9] s390/vmemmap: remember unused sub-pmd ranges
  2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
                   ` (6 preceding siblings ...)
  2020-07-03 13:39 ` [PATCH v1 7/9] s390/vmemmap: fallback to PTEs if mapping large PMD fails David Hildenbrand
@ 2020-07-03 13:39 ` David Hildenbrand
  2020-07-03 13:39 ` [PATCH v1 9/9] s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections David Hildenbrand
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-03 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, linux-mm, David Hildenbrand, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Gerald Schaefer

With a memmap size of 56 bytes or 72 bytes per page, the memmap for a
256 MB section won't span full PMDs. As we populate single sections and
depopulate single sections, the depopulation step would not be able to
free all vmemmap PMDs anymore.

Do it similarly to x86, marking the unused memmap ranges in a special way
(padding them with 0xFD).

This allows us to add/remove sections, cleaning up all allocated
vmemmap pages even if the memmap size is not a multiple of 16 bytes per page.

A 56 byte memmap can, for example, be created with !CONFIG_MEMCG and
!CONFIG_SLUB.
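
To illustrate the numbers (my own arithmetic, not part of the patch; on s390,
PMD_SIZE is 1 MB and a section is 256 MB, i.e. 65536 4k pages), assuming the
section's memmap starts PMD-aligned:

  65536 pages * 64 bytes = 4 MB memmap   -> exactly 4 PMDs
  65536 pages * 56 bytes = 3.5 MB memmap -> the last PMD is only half used
  65536 pages * 72 bytes = 4.5 MB memmap -> the last PMD is only half used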

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/vmem.c | 66 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 60 insertions(+), 6 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index b7fdb9536707f..a981ff5d47223 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -72,6 +72,42 @@ static void vmem_pte_free(unsigned long *table)
 	page_table_free(&init_mm, table);
 }
 
+#define PAGE_UNUSED 0xFD
+
+static void vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
+{
+	/*
+	 * As we expect to add in the same granularity as we remove, it's
+	 * sufficient to mark only some piece used to block the memmap page from
+	 * getting removed (just in case the memmap never gets initialized,
+	 * e.g., because the memory block never gets onlined).
+	 */
+	memset(__va(start), 0, sizeof(struct page));
+}
+
+static void vmemmap_use_new_sub_pmd(unsigned long start, unsigned long end)
+{
+	void *page = __va(ALIGN_DOWN(start, PMD_SIZE));
+
+	/* Could be our memmap page is filled with PAGE_UNUSED already ... */
+	vmemmap_use_sub_pmd(start, end);
+
+	/* Mark the unused parts of the new memmap page PAGE_UNUSED. */
+	if (!IS_ALIGNED(start, PMD_SIZE))
+		memset(page, PAGE_UNUSED, start - __pa(page));
+	if (!IS_ALIGNED(end, PMD_SIZE))
+		memset(__va(end), PAGE_UNUSED, __pa(page) + PMD_SIZE - end);
+}
+
+/* Returns true if the PMD is completely unused and can be freed. */
+static bool vmemmap_unuse_sub_pmd(unsigned long start, unsigned long end)
+{
+	void *page = __va(ALIGN_DOWN(start, PMD_SIZE));
+
+	memset(__va(start), PAGE_UNUSED, end - start);
+	return !memchr_inv(page, PAGE_UNUSED, PMD_SIZE);
+}
+
 /*
  * Add a physical memory range to the 1:1 mapping.
  */
@@ -213,6 +249,11 @@ static void remove_pmd_table(pud_t *pud, unsigned long addr,
 							get_order(PMD_SIZE));
 				pmd_clear(pmd);
 				pages++;
+			} else if (!direct &&
+				   vmemmap_unuse_sub_pmd(addr, next)) {
+				vmem_free_pages(pmd_deref(*pmd),
+						get_order(PMD_SIZE));
+				pmd_clear(pmd);
 			}
 			continue;
 		}
@@ -381,7 +422,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		struct vmem_altmap *altmap)
 {
 	unsigned long pgt_prot, sgt_prot;
-	unsigned long address = start;
+	unsigned long next, address = start;
 	pgd_t *pg_dir;
 	p4d_t *p4_dir;
 	pud_t *pu_dir;
@@ -425,16 +466,27 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		if (pmd_none(*pm_dir) && MACHINE_HAS_EDAT1) {
 			void *new_page;
 
-			/* Use 1MB frames for vmemmap if available. We always
+			/*
+			 * Use 1MB frames for vmemmap if available. We always
 			 * use large frames even if they are only partially
-			 * used.
+			 * used, and mark the unused parts using PAGE_UNUSED.
+			 *
+			 * This is only an issue in some setups. E.g.,
+			 * a full section with a 64 byte memmap per page needs
+			 * 4 MB in total. However, with 56 bytes, it's 3.5 MB.
+			 *
 			 * Otherwise we would have also page tables since
 			 * vmemmap_populate gets called for each section
-			 * separately. */
+			 * separately.
+			 */
 			new_page = vmemmap_alloc_block(PMD_SIZE, node);
 			if (new_page) {
 				pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
-				address = (address + PMD_SIZE) & PMD_MASK;
+				next = pmd_addr_end(address, end);
+				if (!IS_ALIGNED(next, PMD_SIZE) ||
+				    !IS_ALIGNED(address, PMD_SIZE))
+					vmemmap_use_new_sub_pmd(address, next);
+				address = next;
 				continue;
 			}
 		}
@@ -444,7 +496,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 				goto out;
 			pmd_populate(&init_mm, pm_dir, pt_dir);
 		} else if (pmd_large(*pm_dir)) {
-			address = (address + PMD_SIZE) & PMD_MASK;
+			next = pmd_addr_end(address, end);
+			vmemmap_use_sub_pmd(address, next);
+			address = next;
 			continue;
 		}
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v1 9/9] s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections
  2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
                   ` (7 preceding siblings ...)
  2020-07-03 13:39 ` [PATCH v1 8/9] s390/vmemmap: remember unused sub-pmd ranges David Hildenbrand
@ 2020-07-03 13:39 ` David Hildenbrand
  2020-07-03 15:48 ` [PATCH v1 0/9] s390: implement and optimize vmemmap_free() Heiko Carstens
  2020-07-07 12:08 ` Heiko Carstens
  10 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-03 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-s390, linux-mm, David Hildenbrand, Heiko Carstens,
	Vasily Gorbik, Christian Borntraeger, Gerald Schaefer

Let's avoid memset(PAGE_UNUSED) when adding consecutive sections, in case
the vmemmap of a single section does not span full PMDs.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/mm/vmem.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index a981ff5d47223..9db15fc864fc8 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -74,7 +74,22 @@ static void vmem_pte_free(unsigned long *table)
 
 #define PAGE_UNUSED 0xFD
 
-static void vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
+/*
+ * The unused vmemmap range, which was not yet memset(PAGE_UNUSED), ranges
+ * from unused_pmd_start to the next PMD_SIZE boundary.
+ */
+unsigned long unused_pmd_start;
+
+static void vmemmap_flush_unused_pmd(void)
+{
+	if (!unused_pmd_start)
+		return;
+	memset(__va(unused_pmd_start), PAGE_UNUSED,
+	       ALIGN(unused_pmd_start, PMD_SIZE) - unused_pmd_start);
+	unused_pmd_start = 0;
+}
+
+static void __vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
 {
 	/*
 	 * As we expect to add in the same granularity as we remove, it's
@@ -85,18 +100,41 @@ static void vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
 	memset(__va(start), 0, sizeof(struct page));
 }
 
+static void vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
+{
+	/*
+	 * We only optimize if the new used range directly follows the
+	 * previously unused range (esp., when populating consecutive sections).
+	 */
+	if (unused_pmd_start == start) {
+		unused_pmd_start = end;
+		if (likely(IS_ALIGNED(unused_pmd_start, PMD_SIZE)))
+			unused_pmd_start = 0;
+		return;
+	}
+	vmemmap_flush_unused_pmd();
+	__vmemmap_use_sub_pmd(start, end);
+}
+
 static void vmemmap_use_new_sub_pmd(unsigned long start, unsigned long end)
 {
 	void *page = __va(ALIGN_DOWN(start, PMD_SIZE));
 
+	vmemmap_flush_unused_pmd();
+
 	/* Could be our memmap page is filled with PAGE_UNUSED already ... */
-	vmemmap_use_sub_pmd(start, end);
+	__vmemmap_use_sub_pmd(start, end);
 
 	/* Mark the unused parts of the new memmap page PAGE_UNUSED. */
 	if (!IS_ALIGNED(start, PMD_SIZE))
 		memset(page, PAGE_UNUSED, start - __pa(page));
+	/*
+	 * We want to avoid memset(PAGE_UNUSED) when populating the vmemmap of
+	 * consecutive sections. Remember for the last added PMD the last
+	 * unused range in the populated PMD.
+	 */
 	if (!IS_ALIGNED(end, PMD_SIZE))
-		memset(__va(end), PAGE_UNUSED, __pa(page) + PMD_SIZE - end);
+		unused_pmd_start = end;
 }
 
 /* Returns true if the PMD is completely unused and can be freed. */
@@ -104,6 +142,7 @@ static bool vmemmap_unuse_sub_pmd(unsigned long start, unsigned long end)
 {
 	void *page = __va(ALIGN_DOWN(start, PMD_SIZE));
 
+	vmemmap_flush_unused_pmd();
 	memset(__va(start), PAGE_UNUSED, end - start);
 	return !memchr_inv(page, PAGE_UNUSED, PMD_SIZE);
 }
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()
  2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
                   ` (8 preceding siblings ...)
  2020-07-03 13:39 ` [PATCH v1 9/9] s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections David Hildenbrand
@ 2020-07-03 15:48 ` Heiko Carstens
  2020-07-07 12:08 ` Heiko Carstens
  10 siblings, 0 replies; 20+ messages in thread
From: Heiko Carstens @ 2020-07-03 15:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-s390, linux-mm, Christian Borntraeger,
	Gerald Schaefer, Vasily Gorbik

On Fri, Jul 03, 2020 at 03:39:08PM +0200, David Hildenbrand wrote:
> This series is based on the latest s390/features branch [1]. It implements
> vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
> - Freeing empty page tables (now also done for the identity mapping).
> - Handling cases where the vmemmap of a section does not fill huge pages
>   completely.

Nice! You implemented some things I "always wanted to do". Maybe I
should just do nothing and wait until patches appear ;)

Will take a look at the series next week. Thanks!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails
  2020-07-03 13:39 ` [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails David Hildenbrand
@ 2020-07-03 17:09   ` kernel test robot
  2020-07-06  7:30     ` David Hildenbrand
  2020-07-04 11:48   ` kernel test robot
  1 sibling, 1 reply; 20+ messages in thread
From: kernel test robot @ 2020-07-03 17:09 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: kbuild-all, linux-s390, linux-mm, David Hildenbrand,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
	Gerald Schaefer

[-- Attachment #1: Type: text/plain, Size: 5197 bytes --]

Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on s390/features]
[also build test ERROR on next-20200703]
[cannot apply to linux/master kvms390/next linus/master v5.8-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/David-Hildenbrand/s390-implement-and-optimize-vmemmap_free/20200703-214348
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-alldefconfig (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   arch/s390/mm/vmem.c: In function 'vmemmap_populate':
>> arch/s390/mm/vmem.c:368:3: error: implicit declaration of function 'vmemmap_free'; did you mean 'vmem_altmap_free'? [-Werror=implicit-function-declaration]
     368 |   vmemmap_free(start, end, altmap);
         |   ^~~~~~~~~~~~
         |   vmem_altmap_free
   arch/s390/mm/vmem.c: At top level:
   arch/s390/mm/vmem.c:372:6: warning: no previous prototype for 'vmemmap_free' [-Wmissing-prototypes]
     372 | void vmemmap_free(unsigned long start, unsigned long end,
         |      ^~~~~~~~~~~~
>> arch/s390/mm/vmem.c:372:6: warning: conflicting types for 'vmemmap_free'
   arch/s390/mm/vmem.c:368:3: note: previous implicit declaration of 'vmemmap_free' was here
     368 |   vmemmap_free(start, end, altmap);
         |   ^~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +368 arch/s390/mm/vmem.c

   280	
   281	/*
   282	 * Add a backed mem_map array to the virtual mem_map array.
   283	 */
   284	int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
   285			struct vmem_altmap *altmap)
   286	{
   287		unsigned long pgt_prot, sgt_prot;
   288		unsigned long address = start;
   289		pgd_t *pg_dir;
   290		p4d_t *p4_dir;
   291		pud_t *pu_dir;
   292		pmd_t *pm_dir;
   293		pte_t *pt_dir;
   294		int ret = -ENOMEM;
   295	
   296		pgt_prot = pgprot_val(PAGE_KERNEL);
   297		sgt_prot = pgprot_val(SEGMENT_KERNEL);
   298		if (!MACHINE_HAS_NX) {
   299			pgt_prot &= ~_PAGE_NOEXEC;
   300			sgt_prot &= ~_SEGMENT_ENTRY_NOEXEC;
   301		}
   302		for (address = start; address < end;) {
   303			pg_dir = pgd_offset_k(address);
   304			if (pgd_none(*pg_dir)) {
   305				p4_dir = vmem_crst_alloc(_REGION2_ENTRY_EMPTY);
   306				if (!p4_dir)
   307					goto out;
   308				pgd_populate(&init_mm, pg_dir, p4_dir);
   309			}
   310	
   311			p4_dir = p4d_offset(pg_dir, address);
   312			if (p4d_none(*p4_dir)) {
   313				pu_dir = vmem_crst_alloc(_REGION3_ENTRY_EMPTY);
   314				if (!pu_dir)
   315					goto out;
   316				p4d_populate(&init_mm, p4_dir, pu_dir);
   317			}
   318	
   319			pu_dir = pud_offset(p4_dir, address);
   320			if (pud_none(*pu_dir)) {
   321				pm_dir = vmem_crst_alloc(_SEGMENT_ENTRY_EMPTY);
   322				if (!pm_dir)
   323					goto out;
   324				pud_populate(&init_mm, pu_dir, pm_dir);
   325			}
   326	
   327			pm_dir = pmd_offset(pu_dir, address);
   328			if (pmd_none(*pm_dir)) {
   329				/* Use 1MB frames for vmemmap if available. We always
   330				 * use large frames even if they are only partially
   331				 * used.
   332				 * Otherwise we would have also page tables since
   333				 * vmemmap_populate gets called for each section
   334				 * separately. */
   335				if (MACHINE_HAS_EDAT1) {
   336					void *new_page;
   337	
   338					new_page = vmemmap_alloc_block(PMD_SIZE, node);
   339					if (!new_page)
   340						goto out;
   341					pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
   342					address = (address + PMD_SIZE) & PMD_MASK;
   343					continue;
   344				}
   345				pt_dir = vmem_pte_alloc();
   346				if (!pt_dir)
   347					goto out;
   348				pmd_populate(&init_mm, pm_dir, pt_dir);
   349			} else if (pmd_large(*pm_dir)) {
   350				address = (address + PMD_SIZE) & PMD_MASK;
   351				continue;
   352			}
   353	
   354			pt_dir = pte_offset_kernel(pm_dir, address);
   355			if (pte_none(*pt_dir)) {
   356				void *new_page;
   357	
   358				new_page = vmemmap_alloc_block(PAGE_SIZE, node);
   359				if (!new_page)
   360					goto out;
   361				pte_val(*pt_dir) = __pa(new_page) | pgt_prot;
   362			}
   363			address += PAGE_SIZE;
   364		}
   365		ret = 0;
   366	out:
   367		if (ret)
 > 368			vmemmap_free(start, end, altmap);
   369		return ret;
   370	}
   371	
 > 372	void vmemmap_free(unsigned long start, unsigned long end,
   373			struct vmem_altmap *altmap)
   374	{
   375		remove_pagetable(start, end, false);
   376	}
   377	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7970 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails
  2020-07-03 13:39 ` [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails David Hildenbrand
  2020-07-03 17:09   ` kernel test robot
@ 2020-07-04 11:48   ` kernel test robot
  1 sibling, 0 replies; 20+ messages in thread
From: kernel test robot @ 2020-07-04 11:48 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: kbuild-all, clang-built-linux, linux-s390, linux-mm,
	David Hildenbrand, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Gerald Schaefer

[-- Attachment #1: Type: text/plain, Size: 11815 bytes --]

Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on s390/features]
[also build test ERROR on next-20200703]
[cannot apply to linux/master kvms390/next linus/master v5.8-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/David-Hildenbrand/s390-implement-and-optimize-vmemmap_free/20200703-214348
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-randconfig-r036-20200701 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project ca464639a1c9dd3944eb055ffd2796e8c2e7639f)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:19:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x000000ffUL) << 24) |            \
                     ^
   In file included from arch/s390/mm/vmem.c:7:
   In file included from include/linux/memblock.h:14:
   In file included from arch/s390/include/asm/dma.h:5:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:45: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:20:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x0000ff00UL) <<  8) |            \
                     ^
   In file included from arch/s390/mm/vmem.c:7:
   In file included from include/linux/memblock.h:14:
   In file included from arch/s390/include/asm/dma.h:5:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:45: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:21:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0x00ff0000UL) >>  8) |            \
                     ^
   In file included from arch/s390/mm/vmem.c:7:
   In file included from include/linux/memblock.h:14:
   In file included from arch/s390/include/asm/dma.h:5:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:45: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:119:21: note: expanded from macro '__swab32'
           ___constant_swab32(x) :                 \
                              ^
   include/uapi/linux/swab.h:22:12: note: expanded from macro '___constant_swab32'
           (((__u32)(x) & (__u32)0xff000000UL) >> 24)))
                     ^
   In file included from arch/s390/mm/vmem.c:7:
   In file included from include/linux/memblock.h:14:
   In file included from arch/s390/include/asm/dma.h:5:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:490:45: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:120:12: note: expanded from macro '__swab32'
           __fswab32(x))
                     ^
   In file included from arch/s390/mm/vmem.c:7:
   In file included from include/linux/memblock.h:14:
   In file included from arch/s390/include/asm/dma.h:5:
   In file included from arch/s390/include/asm/io.h:72:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:46: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew(cpu_to_le16(value), PCI_IOBASE + addr);
                                            ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:46: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel(cpu_to_le32(value), PCI_IOBASE + addr);
                                            ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> arch/s390/mm/vmem.c:368:3: error: implicit declaration of function 'vmemmap_free' [-Werror,-Wimplicit-function-declaration]
                   vmemmap_free(start, end, altmap);
                   ^
>> arch/s390/mm/vmem.c:372:6: error: conflicting types for 'vmemmap_free'
   void vmemmap_free(unsigned long start, unsigned long end,
        ^
   arch/s390/mm/vmem.c:368:3: note: previous implicit declaration is here
                   vmemmap_free(start, end, altmap);
                   ^
   20 warnings and 2 errors generated.

vim +/vmemmap_free +368 arch/s390/mm/vmem.c

   280	
   281	/*
   282	 * Add a backed mem_map array to the virtual mem_map array.
   283	 */
   284	int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
   285			struct vmem_altmap *altmap)
   286	{
   287		unsigned long pgt_prot, sgt_prot;
   288		unsigned long address = start;
   289		pgd_t *pg_dir;
   290		p4d_t *p4_dir;
   291		pud_t *pu_dir;
   292		pmd_t *pm_dir;
   293		pte_t *pt_dir;
   294		int ret = -ENOMEM;
   295	
   296		pgt_prot = pgprot_val(PAGE_KERNEL);
   297		sgt_prot = pgprot_val(SEGMENT_KERNEL);
   298		if (!MACHINE_HAS_NX) {
   299			pgt_prot &= ~_PAGE_NOEXEC;
   300			sgt_prot &= ~_SEGMENT_ENTRY_NOEXEC;
   301		}
   302		for (address = start; address < end;) {
   303			pg_dir = pgd_offset_k(address);
   304			if (pgd_none(*pg_dir)) {
   305				p4_dir = vmem_crst_alloc(_REGION2_ENTRY_EMPTY);
   306				if (!p4_dir)
   307					goto out;
   308				pgd_populate(&init_mm, pg_dir, p4_dir);
   309			}
   310	
   311			p4_dir = p4d_offset(pg_dir, address);
   312			if (p4d_none(*p4_dir)) {
   313				pu_dir = vmem_crst_alloc(_REGION3_ENTRY_EMPTY);
   314				if (!pu_dir)
   315					goto out;
   316				p4d_populate(&init_mm, p4_dir, pu_dir);
   317			}
   318	
   319			pu_dir = pud_offset(p4_dir, address);
   320			if (pud_none(*pu_dir)) {
   321				pm_dir = vmem_crst_alloc(_SEGMENT_ENTRY_EMPTY);
   322				if (!pm_dir)
   323					goto out;
   324				pud_populate(&init_mm, pu_dir, pm_dir);
   325			}
   326	
   327			pm_dir = pmd_offset(pu_dir, address);
   328			if (pmd_none(*pm_dir)) {
   329				/* Use 1MB frames for vmemmap if available. We always
   330				 * use large frames even if they are only partially
   331				 * used.
   332				 * Otherwise we would have also page tables since
   333				 * vmemmap_populate gets called for each section
   334				 * separately. */
   335				if (MACHINE_HAS_EDAT1) {
   336					void *new_page;
   337	
   338					new_page = vmemmap_alloc_block(PMD_SIZE, node);
   339					if (!new_page)
   340						goto out;
   341					pmd_val(*pm_dir) = __pa(new_page) | sgt_prot;
   342					address = (address + PMD_SIZE) & PMD_MASK;
   343					continue;
   344				}
   345				pt_dir = vmem_pte_alloc();
   346				if (!pt_dir)
   347					goto out;
   348				pmd_populate(&init_mm, pm_dir, pt_dir);
   349			} else if (pmd_large(*pm_dir)) {
   350				address = (address + PMD_SIZE) & PMD_MASK;
   351				continue;
   352			}
   353	
   354			pt_dir = pte_offset_kernel(pm_dir, address);
   355			if (pte_none(*pt_dir)) {
   356				void *new_page;
   357	
   358				new_page = vmemmap_alloc_block(PAGE_SIZE, node);
   359				if (!new_page)
   360					goto out;
   361				pte_val(*pt_dir) = __pa(new_page) | pgt_prot;
   362			}
   363			address += PAGE_SIZE;
   364		}
   365		ret = 0;
   366	out:
   367		if (ret)
 > 368			vmemmap_free(start, end, altmap);
   369		return ret;
   370	}
   371	
 > 372	void vmemmap_free(unsigned long start, unsigned long end,
   373			struct vmem_altmap *altmap)
   374	{
   375		remove_pagetable(start, end, false);
   376	}
   377	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29832 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails
  2020-07-03 17:09   ` kernel test robot
@ 2020-07-06  7:30     ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-06  7:30 UTC (permalink / raw)
  To: kernel test robot, linux-kernel
  Cc: kbuild-all, linux-s390, linux-mm, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Gerald Schaefer

On 03.07.20 19:09, kernel test robot wrote:
> Hi David,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on s390/features]
> [also build test ERROR on next-20200703]
> [cannot apply to linux/master kvms390/next linus/master v5.8-rc3]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use  as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/David-Hildenbrand/s390-implement-and-optimize-vmemmap_free/20200703-214348
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
> config: s390-alldefconfig (attached as .config)
> compiler: s390-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=s390 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All error/warnings (new ones prefixed by >>):
> 
>    arch/s390/mm/vmem.c: In function 'vmemmap_populate':
>>> arch/s390/mm/vmem.c:368:3: error: implicit declaration of function 'vmemmap_free'; did you mean 'vmem_altmap_free'? [-Werror=implicit-function-declaration]
>      368 |   vmemmap_free(start, end, altmap);
>          |   ^~~~~~~~~~~~
>          |   vmem_altmap_free
>    arch/s390/mm/vmem.c: At top level:
>    arch/s390/mm/vmem.c:372:6: warning: no previous prototype for 'vmemmap_free' [-Wmissing-prototypes]
>      372 | void vmemmap_free(unsigned long start, unsigned long end,
>          |      ^~~~~~~~~~~~
>>> arch/s390/mm/vmem.c:372:6: warning: conflicting types for 'vmemmap_free'
>    arch/s390/mm/vmem.c:368:3: note: previous implicit declaration of 'vmemmap_free' was here
>      368 |   vmemmap_free(start, end, altmap);
>          |   ^~~~~~~~~~~~
>    cc1: some warnings being treated as errors

Ah, vmemmap_free() is only declared when CONFIG_MEMORY_HOTPLUG is enabled,
so the call in the error path runs into a missing prototype. Easy to fix.
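
Just as a sketch of one possible fix (not necessarily what v2 will end up
doing): let the error path in vmemmap_populate() call the internal
remove_pagetable() helper directly, so nothing depends on the prototype
that is only visible with CONFIG_MEMORY_HOTPLUG:

	ret = 0;
out:
	if (ret)
		/* tear down everything we populated so far */
		remove_pagetable(start, end, false);
	return ret;
}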


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()
  2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
                   ` (9 preceding siblings ...)
  2020-07-03 15:48 ` [PATCH v1 0/9] s390: implement and optimize vmemmap_free() Heiko Carstens
@ 2020-07-07 12:08 ` Heiko Carstens
  2020-07-07 12:13   ` David Hildenbrand
  10 siblings, 1 reply; 20+ messages in thread
From: Heiko Carstens @ 2020-07-07 12:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-s390, linux-mm, Christian Borntraeger,
	Gerald Schaefer, Vasily Gorbik

On Fri, Jul 03, 2020 at 03:39:08PM +0200, David Hildenbrand wrote:
> This series is based on the latest s390/features branch [1]. It implements
> vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
> - Freeing empty page tables (now also done for identity mapping).
> - Handling cases where the vmemmap of a section does not fill huge pages
>   completely.
> 
> vmemmap_free() is currently never used, unless adding standby memory fails
> (unlikely). This is relevant for virtio-mem, which adds/removes memory
> in memory block/section granularity (always removes memory in the same
> granularity it added it).
> 
> I gave this a proper test with my virtio-mem prototype (which I will share
> once the basic QEMU implementation is upstream), both with 56 byte memmap
> per page and 64 byte memmap per page, with and without huge page support.
> In both cases, removing memory (routed through arch_remove_memory()) will
> result in
> - all populated vmemmap pages to get removed/freed
> - all applicable page tables for the vmemmap getting removed/freed
> - all applicable page tables for the identity mapping getting removed/freed
> Unfortunately, I don't have access to bigger and z/VM (esp. dcss)
> environments.
> 
> This is the basis for real memory hotunplug support for s390x and should
> complete my journey to s390x vmem/vmemmap code for now :)
> 
> What needs double-checking is tlb flushing. AFAIKS, as there are no valid
> accesses, doing a single range flush at the end is sufficient, both when
> removing vmemmap pages and the identity mapping.
> 
> Along, some minor cleanups.

Hmm.. I really would like to see if there would be only a single page
table walker left in vmem.c, which handles both adding and removing
things.
Now we end up with two different page table walk implementations
within the same file. However, I'm not sure whether it is worth the
effort to unify them.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()
  2020-07-07 12:08 ` Heiko Carstens
@ 2020-07-07 12:13   ` David Hildenbrand
  2020-07-08  6:50     ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2020-07-07 12:13 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-kernel, linux-s390, linux-mm, Christian Borntraeger,
	Gerald Schaefer, Vasily Gorbik

On 07.07.20 14:08, Heiko Carstens wrote:
> On Fri, Jul 03, 2020 at 03:39:08PM +0200, David Hildenbrand wrote:
>> This series is based on the latest s390/features branch [1]. It implements
>> vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
>> - Freeing empty page tables (now also done for identity mapping).
>> - Handling cases where the vmemmap of a section does not fill huge pages
>>   completely.
>>
>> vmemmap_free() is currently never used, unless adding standby memory fails
>> (unlikely). This is relevant for virtio-mem, which adds/removes memory
>> in memory block/section granularity (always removes memory in the same
>> granularity it added it).
>>
>> I gave this a proper test with my virtio-mem prototype (which I will share
>> once the basic QEMU implementation is upstream), both with 56 byte memmap
>> per page and 64 byte memmap per page, with and without huge page support.
>> In both cases, removing memory (routed through arch_remove_memory()) will
>> result in
>> - all populated vmemmap pages to get removed/freed
>> - all applicable page tables for the vmemmap getting removed/freed
>> - all applicable page tables for the identity mapping getting removed/freed
>> Unfortunately, I don't have access to bigger and z/VM (esp. dcss)
>> environments.
>>
>> This is the basis for real memory hotunplug support for s390x and should
>> complete my journey to s390x vmem/vmemmap code for now :)
>>
>> What needs double-checking is tlb flushing. AFAIKS, as there are no valid
>> accesses, doing a single range flush at the end is sufficient, both when
>> removing vmemmap pages and the identity mapping.
>>
>> Along, some minor cleanups.
> 
> Hmm.. I really would like to see if there would be only a single page
> table walker left in vmem.c, which handles both adding and removing
> things.
> Now we end up with two different page table walk implementations
> within the same file. However, I'm not sure whether it is worth the
> effort to unify them.

I tried to unify vmemmap_populate() and vmem_add_range() already and
didn't like the end result ... so, unifying these along with the removal
part won't be any better - most probably. Open for suggestions :)

(at least arm64 and x86-64 handle it similarly)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()
  2020-07-07 12:13   ` David Hildenbrand
@ 2020-07-08  6:50     ` David Hildenbrand
  2020-07-08 12:16       ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2020-07-08  6:50 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-kernel, linux-s390, linux-mm, Christian Borntraeger,
	Gerald Schaefer, Vasily Gorbik

On 07.07.20 14:13, David Hildenbrand wrote:
> On 07.07.20 14:08, Heiko Carstens wrote:
>> On Fri, Jul 03, 2020 at 03:39:08PM +0200, David Hildenbrand wrote:
>>> This series is based on the latest s390/features branch [1]. It implements
>>> vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
>>> - Freeing empty page tables (now also done for identity mapping).
>>> - Handling cases where the vmemmap of a section does not fill huge pages
>>>   completely.
>>>
>>> vmemmap_free() is currently never used, unless adding standby memory fails
>>> (unlikely). This is relevant for virtio-mem, which adds/removes memory
>>> in memory block/section granularity (always removes memory in the same
>>> granularity it added it).
>>>
>>> I gave this a proper test with my virtio-mem prototype (which I will share
>>> once the basic QEMU implementation is upstream), both with 56 byte memmap
>>> per page and 64 byte memmap per page, with and without huge page support.
>>> In both cases, removing memory (routed through arch_remove_memory()) will
>>> result in
>>> - all populated vmemmap pages to get removed/freed
>>> - all applicable page tables for the vmemmap getting removed/freed
>>> - all applicable page tables for the identity mapping getting removed/freed
>>> Unfortunately, I don't have access to bigger and z/VM (esp. dcss)
>>> environments.
>>>
>>> This is the basis for real memory hotunplug support for s390x and should
>>> complete my journey to s390x vmem/vmemmap code for now :)
>>>
>>> What needs double-checking is tlb flushing. AFAIKS, as there are no valid
>>> accesses, doing a single range flush at the end is sufficient, both when
>>> removing vmemmap pages and the identity mapping.
>>>
>>> Along, some minor cleanups.
>>
>> Hmm.. I really would like to see if there would be only a single page
>> table walker left in vmem.c, which handles both adding and removing
>> things.
>> Now we end up with two different page table walk implementations
>> within the same file. However, I'm not sure whether it is worth the
>> effort to unify them.
> 
> I tried to unify vmemmap_populate() and vmem_add_range() already and
> didn't like the end result ... so, unifying these along with the removal
> part won't be any better - most probably. Open for suggestions :)
> 
> (at least arm64 and x86-64 handle it similarly)
> 

I'll play with something like

static void modify_pagetable(unsigned long start, unsigned long end,
			     bool direct, bool add)

and see how it turns out.
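
Roughly what I have in mind for the top level (a completely untested
sketch; it will probably want to return an error for the add case rather
than void, and modify_p4d_table() is just a placeholder for the
lower-level helpers that would mirror the existing walkers):

static int modify_pagetable(unsigned long start, unsigned long end,
			    bool direct, bool add)
{
	unsigned long addr, next;
	int ret = -ENOMEM;
	pgd_t *pgd;
	p4d_t *p4d;

	for (addr = start; addr < end; addr = next) {
		next = pgd_addr_end(addr, end);
		pgd = pgd_offset_k(addr);

		if (pgd_none(*pgd)) {
			/* Nothing mapped here, nothing to remove. */
			if (!add)
				continue;
			p4d = vmem_crst_alloc(_REGION2_ENTRY_EMPTY);
			if (!p4d)
				goto out;
			pgd_populate(&init_mm, pgd, p4d);
		}
		ret = modify_p4d_table(pgd, addr, next, direct, add);
		if (ret)
			goto out;
	}
	ret = 0;
out:
	/* A single range flush at the end should be sufficient. */
	if (!add)
		flush_tlb_kernel_range(start, end);
	return ret;
}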

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()
  2020-07-08  6:50     ` David Hildenbrand
@ 2020-07-08 12:16       ` David Hildenbrand
  2020-07-10 13:57         ` Heiko Carstens
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2020-07-08 12:16 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-kernel, linux-s390, linux-mm, Christian Borntraeger,
	Gerald Schaefer, Vasily Gorbik

On 08.07.20 08:50, David Hildenbrand wrote:
> On 07.07.20 14:13, David Hildenbrand wrote:
>> On 07.07.20 14:08, Heiko Carstens wrote:
>>> On Fri, Jul 03, 2020 at 03:39:08PM +0200, David Hildenbrand wrote:
>>>> This series is based on the latest s390/features branch [1]. It implements
>>>> vmemmap_free(), consolidating it with vmem_add_range(), and optimizes it by
>>>> - Freeing empty page tables (now also done for identity mapping).
>>>> - Handling cases where the vmemmap of a section does not fill huge pages
>>>>   completely.
>>>>
>>>> vmemmap_free() is currently never used, unless adding standby memory fails
>>>> (unlikely). This is relevant for virtio-mem, which adds/removes memory
>>>> in memory block/section granularity (always removes memory in the same
>>>> granularity it added it).
>>>>
>>>> I gave this a proper test with my virtio-mem prototype (which I will share
>>>> once the basic QEMU implementation is upstream), both with 56 byte memmap
>>>> per page and 64 byte memmap per page, with and without huge page support.
>>>> In both cases, removing memory (routed through arch_remove_memory()) will
>>>> result in
>>>> - all populated vmemmap pages to get removed/freed
>>>> - all applicable page tables for the vmemmap getting removed/freed
>>>> - all applicable page tables for the identity mapping getting removed/freed
>>>> Unfortunately, I don't have access to bigger and z/VM (esp. dcss)
>>>> environments.
>>>>
>>>> This is the basis for real memory hotunplug support for s390x and should
>>>> complete my journey to s390x vmem/vmemmap code for now :)
>>>>
>>>> What needs double-checking is tlb flushing. AFAIKS, as there are no valid
>>>> accesses, doing a single range flush at the end is sufficient, both when
>>>> removing vmemmap pages and the identity mapping.
>>>>
>>>> Along, some minor cleanups.
>>>
>>> Hmm.. I really would like to see if there would be only a single page
>>> table walker left in vmem.c, which handles both adding and removing
>>> things.
>>> Now we end up with two different page table walk implementations
>>> within the same file. However, I'm not sure whether it is worth the
>>> effort to unify them.
>>
>> I tried to unify vmemmap_populate() and vmem_add_range() already and
>> didn't like the end result ... so, unifying these along with the removal
>> part won't be any better - most probably. Open for suggestions :)
>>
>> (at least arm64 and x86-64 handle it similarly)
>>
> 
> I'll play with something like
> 
> static void modify_pagetable(unsigned long start, unsigned long end,
> 			     bool direct, bool add)
> 
> and see how it turns out.
> 

Did a quick hack. With a single walker (modify_pagetable) I get

 arch/s390/mm/vmem.c | 628 ++++++++++++++++++++++++++++++--------------
 1 file changed, 434 insertions(+), 194 deletions(-)

Overall looks cleaner, only modify_pte_table() and modify_pmd_table()
are a little more involved ...
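
For illustration, the pte level is where the add/remove and direct/vmemmap
combinations all meet; roughly like this (again just a sketch, not the
final code, and vmem_free_pages() is a placeholder name for whatever ends
up freeing the vmemmap backing pages):

static int modify_pte_table(pmd_t *pmd, unsigned long addr, unsigned long end,
			    bool direct, bool add)
{
	unsigned long prot = pgprot_val(PAGE_KERNEL);
	pte_t *pte;

	if (!MACHINE_HAS_NX)
		prot &= ~_PAGE_NOEXEC;

	pte = pte_offset_kernel(pmd, addr);
	for (; addr < end; addr += PAGE_SIZE, pte++) {
		if (!add) {
			if (pte_none(*pte))
				continue;
			/* vmemmap pages were allocated, the direct mapping was not */
			if (!direct)
				vmem_free_pages(pfn_to_phys(pte_pfn(*pte)), 0);
			pte_clear(&init_mm, addr, pte);
		} else if (pte_none(*pte)) {
			if (direct) {
				/* identity mapping: virtual == physical */
				pte_val(*pte) = addr | prot;
			} else {
				void *new_page = vmemmap_alloc_block(PAGE_SIZE,
								     NUMA_NO_NODE);

				if (!new_page)
					return -ENOMEM;
				pte_val(*pte) = __pa(new_page) | prot;
			}
		}
	}
	return 0;
}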

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()
  2020-07-08 12:16       ` David Hildenbrand
@ 2020-07-10 13:57         ` Heiko Carstens
  2020-07-10 14:02           ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Heiko Carstens @ 2020-07-10 13:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-s390, linux-mm, Christian Borntraeger,
	Gerald Schaefer, Vasily Gorbik

On Wed, Jul 08, 2020 at 02:16:39PM +0200, David Hildenbrand wrote:
> >>> Hmm.. I really would like to see if there would be only a single page
> >>> table walker left in vmem.c, which handles both adding and removing
> >>> things.
> >>> Now we end up with two different page table walk implementations
> >>> within the same file. However, I'm not sure whether it is worth the
> >>> effort to unify them.
> >>
> >> I tried to unify vmemmap_populate() and vmem_add_range() already and
> >> didn't like the end result ... so, unifying these along with the removal
> >> part won't be any better - most probably. Open for suggestions :)
> >>
> >> (at least arm64 and x86-64 handle it similarly)
> >>
> > 
> > I'll play with something like
> > 
> > static void modify_pagetable(unsigned long start, unsigned long end,
> > 			     bool direct, bool add)
> > 
> > and see how it turns out.
> > 
> 
> Did a quick hack. With a single walker (modify_pagetable) I get
> 
>  arch/s390/mm/vmem.c | 628 ++++++++++++++++++++++++++++++--------------
>  1 file changed, 434 insertions(+), 194 deletions(-)
> 
> Overall looks cleaner, only modify_pte_table() and modify_pmd_table()
> are a little more involved ...

Would you mind resending the series with this integrated?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1 0/9] s390: implement and optimize vmemmap_free()
  2020-07-10 13:57         ` Heiko Carstens
@ 2020-07-10 14:02           ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-07-10 14:02 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: linux-kernel, linux-s390, linux-mm, Christian Borntraeger,
	Gerald Schaefer, Vasily Gorbik

On 10.07.20 15:57, Heiko Carstens wrote:
> On Wed, Jul 08, 2020 at 02:16:39PM +0200, David Hildenbrand wrote:
>>>>> Hmm.. I really would like to see if there would be only a single page
>>>>> table walker left in vmem.c, which handles both adding and removing
>>>>> things.
>>>>> Now we end up with two different page table walk implementations
>>>>> within the same file. However not sure if it is worth the effort to
>>>>> unify them though.
>>>>
>>>> I tried to unify vmemmap_populate() and vmem_add_range() already and
>>>> didn't like the end result ... so, unifying these along with the removal
>>>> part won't be any better - most probably. Open for suggestions :)
>>>>
>>>> (at least arm64 and x86-64 handle it similarly)
>>>>
>>>
>>> I'll play with something like
>>>
>>> static void modify_pagetable(unsigned long start, unsigned long end,
>>> 			     bool direct, bool add)
>>>
>>> and see how it turns out.
>>>
>>
>> Did a quick hack. With a single walker (modify_pagetable) I get
>>
>>  arch/s390/mm/vmem.c | 628 ++++++++++++++++++++++++++++++--------------
>>  1 file changed, 434 insertions(+), 194 deletions(-)
>>
>> Overall looks cleaner, only modify_pte_table() and modify_pmd_table()
>> are a little more involved ...
> 
> Would you mind to resend the series with this integrated?
> 

Yes, did some testing yesterday. Want to reshuffle/cleanup some things
before posting.

Will be out of office next week, so expect a v2 sometime in 1-2 weeks.

Cheers!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-07-10 14:04 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-03 13:39 [PATCH v1 0/9] s390: implement and optimize vmemmap_free() David Hildenbrand
2020-07-03 13:39 ` [PATCH v1 1/9] s390/vmem: rename vmem_add_mem() to vmem_add_range() David Hildenbrand
2020-07-03 13:39 ` [PATCH v1 2/9] s390/vmem: recursive implementation of vmem_remove_range() David Hildenbrand
2020-07-03 13:39 ` [PATCH v1 3/9] s390/vmemmap: implement vmemmap_free() David Hildenbrand
2020-07-03 13:39 ` [PATCH v1 4/9] s390/vmemmap: cleanup when vmemmap_populate() fails David Hildenbrand
2020-07-03 17:09   ` kernel test robot
2020-07-06  7:30     ` David Hildenbrand
2020-07-04 11:48   ` kernel test robot
2020-07-03 13:39 ` [PATCH v1 5/9] s390/vmemmap: take the vmem_mutex when populating/freeing David Hildenbrand
2020-07-03 13:39 ` [PATCH v1 6/9] s390/vmem: cleanup empty page tables David Hildenbrand
2020-07-03 13:39 ` [PATCH v1 7/9] s390/vmemmap: fallback to PTEs if mapping large PMD fails David Hildenbrand
2020-07-03 13:39 ` [PATCH v1 8/9] s390/vmemmap: remember unused sub-pmd ranges David Hildenbrand
2020-07-03 13:39 ` [PATCH v1 9/9] s390/vmemmap: avoid memset(PAGE_UNUSED) when adding consecutive sections David Hildenbrand
2020-07-03 15:48 ` [PATCH v1 0/9] s390: implement and optimize vmemmap_free() Heiko Carstens
2020-07-07 12:08 ` Heiko Carstens
2020-07-07 12:13   ` David Hildenbrand
2020-07-08  6:50     ` David Hildenbrand
2020-07-08 12:16       ` David Hildenbrand
2020-07-10 13:57         ` Heiko Carstens
2020-07-10 14:02           ` David Hildenbrand
