linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
@ 2020-11-11 14:53 David Hildenbrand
  2020-11-11 14:53 ` [PATCH v2 1/8] powernv/memtrace: don't leak kernel memory to user space David Hildenbrand
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-11-11 14:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Andrew Morton,
	Aneesh Kumar K.V, Benjamin Herrenschmidt, Michael Ellerman,
	Michal Hocko, Michal Hocko, Mike Rapoport, Nicholas Piggin,
	Oscar Salvador, Paul Mackerras, Rashmica Gupta, Wei Yang

Based on latest linux/master

powernv/memtrace is the only in-kernel user that rips out random memory
it never added (doesn't own) in order to allocate memory without a
linear mapping. Let's stop abusing memory hot(un)plug infrastructure for
that - use alloc_contig_pages() for allocating memory and remove the
linear mapping manually.

The original idea was discussed in:
 https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com

I only tested via QEMU TCG with a single NUMA node- see patch #8 for more
details.

Error handling and cleanup handling in memtrace code is a mess - that
should definitely get cleaned up sooner or later. Once we have __GFP_ZERO
support for alloc_contig_pages(), we can drop manual clearing. I added
a TODO for now, so this series can go via the powerpc tree - the __GFP_ZERO
change is then better suited via the mm tree, along with support for
__GFP_ZERO.

v1 -> v2:
- Tweaks to patch descriptions
- "powernv/memtrace: don't leak kernel memory to user space"
-- Added. Reported by Michael.
- "powernv/memtrace: fix crashing the kernel when enabling concurrently"
-- Added, discovered while testing.
- "powerpc/mm: protect linear mapping modifications by a mutex"
-- Added. Although we currently won't have concurrency, this is cleaner and
   future-proof.
- "powerepc/book3s64/hash: drop WARN_ON in hash__remove_section_mapping"
-- Added. Suggested by Oscar
- "powernv/memtrace: don't abuse memory hot(un)plug infrastructure for
   memory allocations"
-- Reshuffle the code to make review easier.
-- Add a TODO regarding __GFP_ZERO. Adapt to changed page clearing code.
-- Use GFP_KERNEL | __GFP_THISNODE | __GFP_NOWARN for allocations.


David Hildenbrand (8):
  powernv/memtrace: don't leak kernel memory to user space
  powernv/memtrace: fix crashing the kernel when enabling concurrently
  powerpc/mm: factor out creating/removing linear mapping
  powerpc/mm: protect linear mapping modifications by a mutex
  powerpc/mm: print warning in arch_remove_linear_mapping()
  powerepc/book3s64/hash: drop WARN_ON in hash__remove_section_mapping
  powerpc/mm: remove linear mapping if __add_pages() fails in
    arch_add_memory()
  powernv/memtrace: don't abuse memory hot(un)plug infrastructure for
    memory allocations

 arch/powerpc/mm/book3s64/hash_utils.c     |   1 -
 arch/powerpc/mm/mem.c                     |  53 +++++--
 arch/powerpc/platforms/powernv/Kconfig    |   8 +-
 arch/powerpc/platforms/powernv/memtrace.c | 175 ++++++++++------------
 include/linux/memory_hotplug.h            |   3 +
 5 files changed, 125 insertions(+), 115 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/8] powernv/memtrace: don't leak kernel memory to user space
  2020-11-11 14:53 [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
@ 2020-11-11 14:53 ` David Hildenbrand
  2020-11-17 15:13   ` Oscar Salvador
  2020-11-11 14:53 ` [PATCH v2 2/8] powernv/memtrace: fix crashing the kernel when enabling concurrently David Hildenbrand
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2020-11-11 14:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Michael Ellerman,
	stable, Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Oscar Salvador,
	Wei Yang

We currently leak kernel memory to user space, because memory offlining
doesn't do any implicit clearing of memory and we are missing explicit
clearing of memory.

Let's keep it simple and clear pages before removing the linear mapping.

Reproduced in QEMU/TCG with 10 GiB of main memory:
  [root@localhost ~]# dd obs=9G if=/dev/urandom of=/dev/null
  [... wait until "free -m" used counter no longer changes and cancel]
  19665802+0 records in
  1+0 records out
  9663676416 bytes (9.7 GB, 9.0 GiB) copied, 135.548 s, 71.3 MB/s
  [root@localhost ~]# cat /sys/devices/system/memory/block_size_bytes
  40000000
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  402.978663][ T1086] page:000000001bc4bc74 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24900
  [  402.980063][ T1086] flags: 0x7ffff000001000(reserved)
  [  402.980415][ T1086] raw: 007ffff000001000 c00c000000924008 c00c000000924008 0000000000000000
  [  402.980627][ T1086] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
  [  402.980845][ T1086] page dumped because: unmovable page
  [  402.989608][ T1086] Offlined Pages 16384
  [  403.324155][ T1086] memtrace: Allocated trace memory on node 0 at 0x0000000200000000

Before this patch:
  [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace  | head
  00000000  c8 25 72 51 4d 26 36 c5  5c c2 56 15 d5 1a cd 10  |.%rQM&6.\.V.....|
  00000010  19 b9 50 b2 cb e3 60 b8  ec 0a f3 ec 4b 3c 39 f0  |..P...`.....K<9.|$
  00000020  4e 5a 4c cf bd 26 19 ff  37 79 13 67 24 b7 b8 57  |NZL..&..7y.g$..W|$
  00000030  98 3e f5 be 6f 14 6a bd  a4 52 bc 6e e9 e0 c1 5d  |.>..o.j..R.n...]|$
  00000040  76 b3 ae b5 88 d7 da e3  64 23 85 2c 10 88 07 b6  |v.......d#.,....|$
  00000050  9a d8 91 de f7 50 27 69  2e 64 9c 6f d3 19 45 79  |.....P'i.d.o..Ey|$
  00000060  6a 6f 8a 61 71 19 1f c7  f1 df 28 26 ca 0f 84 55  |jo.aq.....(&...U|$
  00000070  01 3f be e4 e2 e1 da ff  7b 8c 8e 32 37 b4 24 53  |.?......{..27.$S|$
  00000080  1b 70 30 45 56 e6 8c c4  0e b5 4c fb 9f dd 88 06  |.p0EV.....L.....|$
  00000090  ef c4 18 79 f1 60 b1 5c  79 59 4d f4 36 d7 4a 5c  |...y.`.\yYM.6.J\|$

After this patch:
  [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace  | head
  00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  *
  40000000

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org # v4.14+
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 6828108486f8..eea1f94482ff 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -67,6 +67,23 @@ static int change_memblock_state(struct memory_block *mem, void *arg)
 	return 0;
 }
 
+static void memtrace_clear_range(unsigned long start_pfn,
+				 unsigned long nr_pages)
+{
+	unsigned long pfn;
+
+	/*
+	 * As pages are offline, we cannot trust the memmap anymore. As HIGHMEM
+	 * does not apply, avoid passing around "struct page" and use
+	 * clear_page() instead directly.
+	 */
+	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++) {
+		if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
+			cond_resched();
+		clear_page(__va(PFN_PHYS(pfn)));
+	}
+}
+
 /* called with device_hotplug_lock held */
 static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 {
@@ -111,6 +128,11 @@ static u64 memtrace_alloc_node(u32 nid, u64 size)
 	lock_device_hotplug();
 	for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -= nr_pages) {
 		if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true) {
+			/*
+			 * Clear the range while we still have a linear
+			 * mapping.
+			 */
+			memtrace_clear_range(base_pfn, nr_pages);
 			/*
 			 * Remove memory in memory block size chunks so that
 			 * iomem resources are always split to the same size and
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/8] powernv/memtrace: fix crashing the kernel when enabling concurrently
  2020-11-11 14:53 [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
  2020-11-11 14:53 ` [PATCH v2 1/8] powernv/memtrace: don't leak kernel memory to user space David Hildenbrand
@ 2020-11-11 14:53 ` David Hildenbrand
  2020-11-17 15:22   ` Oscar Salvador
  2020-11-11 14:53 ` [PATCH v2 3/8] powerpc/mm: factor out creating/removing linear mapping David Hildenbrand
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2020-11-11 14:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, stable,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rashmica Gupta

It's very easy to crash the kernel right now by simply trying to enable
memtrace concurrently, hammering on the "enable" interface

loop.sh:
  #!/bin/bash

  dmesg --console-off

  while true; do
          echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  done

[root@localhost ~]# loop.sh &
[root@localhost ~]# loop.sh &

Resulting quickly in a kernel crash. Let's properly protect using a
mutex.

Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org# v4.14+
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index eea1f94482ff..0e42fe2d7b6a 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -30,6 +30,7 @@ struct memtrace_entry {
 	char name[16];
 };
 
+static DEFINE_MUTEX(memtrace_mutex);
 static u64 memtrace_size;
 
 static struct memtrace_entry *memtrace_array;
@@ -279,6 +280,7 @@ static int memtrace_online(void)
 
 static int memtrace_enable_set(void *data, u64 val)
 {
+	int rc = -EAGAIN;
 	u64 bytes;
 
 	/*
@@ -291,25 +293,31 @@ static int memtrace_enable_set(void *data, u64 val)
 		return -EINVAL;
 	}
 
+	mutex_lock(&memtrace_mutex);
+
 	/* Re-add/online previously removed/offlined memory */
 	if (memtrace_size) {
 		if (memtrace_online())
-			return -EAGAIN;
+			goto out_unlock;
 	}
 
-	if (!val)
-		return 0;
+	if (!val) {
+		rc = 0;
+		goto out_unlock;
+	}
 
 	/* Offline and remove memory */
 	if (memtrace_init_regions_runtime(val))
-		return -EINVAL;
+		goto out_unlock;
 
 	if (memtrace_init_debugfs())
-		return -EINVAL;
+		goto out_unlock;
 
 	memtrace_size = val;
-
-	return 0;
+	rc = 0;
+out_unlock:
+	mutex_unlock(&memtrace_mutex);
+	return rc;
 }
 
 static int memtrace_enable_get(void *data, u64 *val)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 3/8] powerpc/mm: factor out creating/removing linear mapping
  2020-11-11 14:53 [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
  2020-11-11 14:53 ` [PATCH v2 1/8] powernv/memtrace: don't leak kernel memory to user space David Hildenbrand
  2020-11-11 14:53 ` [PATCH v2 2/8] powernv/memtrace: fix crashing the kernel when enabling concurrently David Hildenbrand
@ 2020-11-11 14:53 ` David Hildenbrand
  2020-11-17 15:27   ` Oscar Salvador
  2020-11-11 14:53 ` [PATCH v2 4/8] powerpc/mm: protect linear mapping modifications by a mutex David Hildenbrand
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2020-11-11 14:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Oscar Salvador,
	Wei Yang

We want to stop abusing memory hotplug infrastructure in memtrace code
to perform allocations and remove the linear mapping. Instead we will use
alloc_contig_pages() and remove the linear mapping manually.

Let's factor out creating/removing the linear mapping into
arch_create_linear_mapping() / arch_remove_linear_mapping() - so in the
future, we might be able to have whole arch_add_memory() /
arch_remove_memory() be implemented in common code.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/mm/mem.c          | 41 +++++++++++++++++++++++-----------
 include/linux/memory_hotplug.h |  3 +++
 2 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 01ec2a252f09..8a86d81f8df0 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -120,34 +120,26 @@ static void flush_dcache_range_chunked(unsigned long start, unsigned long stop,
 	}
 }
 
-int __ref arch_add_memory(int nid, u64 start, u64 size,
-			  struct mhp_params *params)
+int __ref arch_create_linear_mapping(int nid, u64 start, u64 size,
+				     struct mhp_params *params)
 {
-	unsigned long start_pfn = start >> PAGE_SHIFT;
-	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int rc;
 
 	start = (unsigned long)__va(start);
 	rc = create_section_mapping(start, start + size, nid,
 				    params->pgprot);
 	if (rc) {
-		pr_warn("Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n",
+		pr_warn("Unable to create linear mapping for 0x%llx..0x%llx: %d\n",
 			start, start + size, rc);
 		return -EFAULT;
 	}
-
-	return __add_pages(nid, start_pfn, nr_pages, params);
+	return 0;
 }
 
-void __ref arch_remove_memory(int nid, u64 start, u64 size,
-			     struct vmem_altmap *altmap)
+void __ref arch_remove_linear_mapping(u64 start, u64 size)
 {
-	unsigned long start_pfn = start >> PAGE_SHIFT;
-	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
-
 	/* Remove htab bolted mappings for this section of memory */
 	start = (unsigned long)__va(start);
 	flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
@@ -160,6 +152,29 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 	 */
 	vm_unmap_aliases();
 }
+
+int __ref arch_add_memory(int nid, u64 start, u64 size,
+			  struct mhp_params *params)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	int rc;
+
+	rc = arch_create_linear_mapping(nid, start, size, params);
+	if (rc)
+		return rc;
+	return __add_pages(nid, start_pfn, nr_pages, params);
+}
+
+void __ref arch_remove_memory(int nid, u64 start, u64 size,
+			      struct vmem_altmap *altmap)
+{
+	unsigned long start_pfn = start >> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+
+	__remove_pages(start_pfn, nr_pages, altmap);
+	arch_remove_linear_mapping(start, size);
+}
 #endif
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index d65c6fdc5cfc..00b9e9bd3850 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -375,6 +375,9 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
 extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned start_pfn,
 		unsigned long nr_pages);
+extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
+				      struct mhp_params *params);
+void arch_remove_linear_mapping(u64 start, u64 size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 4/8] powerpc/mm: protect linear mapping modifications by a mutex
  2020-11-11 14:53 [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-11-11 14:53 ` [PATCH v2 3/8] powerpc/mm: factor out creating/removing linear mapping David Hildenbrand
@ 2020-11-11 14:53 ` David Hildenbrand
  2020-11-17 15:37   ` Oscar Salvador
  2020-11-11 14:53 ` [PATCH v2 5/8] powerpc/mm: print warning in arch_remove_linear_mapping() David Hildenbrand
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2020-11-11 14:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Oscar Salvador,
	Wei Yang

This code currently relies on mem_hotplug_begin()/mem_hotplug_done() -
create_section_mapping()/remove_section_mapping() implementations
cannot tollerate getting called concurrently.

Let's prepare for callers (memtrace) not holding any such locks (and
don't force them to mess with memory hotplug locks).

Other parts in these functions don't seem to rely on external locking.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/mm/mem.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 8a86d81f8df0..ca5c4b54c366 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -58,6 +58,7 @@
 #define CPU_FTR_NOEXECUTE	0
 #endif
 
+static DEFINE_MUTEX(linear_mapping_mutex);
 unsigned long long memory_limit;
 bool init_mem_is_free;
 
@@ -126,8 +127,10 @@ int __ref arch_create_linear_mapping(int nid, u64 start, u64 size,
 	int rc;
 
 	start = (unsigned long)__va(start);
+	mutex_lock(&linear_mapping_mutex);
 	rc = create_section_mapping(start, start + size, nid,
 				    params->pgprot);
+	mutex_unlock(&linear_mapping_mutex);
 	if (rc) {
 		pr_warn("Unable to create linear mapping for 0x%llx..0x%llx: %d\n",
 			start, start + size, rc);
@@ -144,7 +147,9 @@ void __ref arch_remove_linear_mapping(u64 start, u64 size)
 	start = (unsigned long)__va(start);
 	flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
 
+	mutex_lock(&linear_mapping_mutex);
 	ret = remove_section_mapping(start, start + size);
+	mutex_unlock(&linear_mapping_mutex);
 	WARN_ON_ONCE(ret);
 
 	/* Ensure all vmalloc mappings are flushed in case they also
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 5/8] powerpc/mm: print warning in arch_remove_linear_mapping()
  2020-11-11 14:53 [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-11-11 14:53 ` [PATCH v2 4/8] powerpc/mm: protect linear mapping modifications by a mutex David Hildenbrand
@ 2020-11-11 14:53 ` David Hildenbrand
  2020-11-11 14:53 ` [PATCH v2 6/8] powerepc/book3s64/hash: drop WARN_ON in hash__remove_section_mapping David Hildenbrand
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-11-11 14:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Oscar Salvador,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rashmica Gupta, Andrew Morton, Mike Rapoport, Michal Hocko,
	Wei Yang

Let's print a warning similar to in arch_add_linear_mapping() instead of
WARN_ON_ONCE() and eventually crashing the kernel.

Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/mm/mem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index ca5c4b54c366..c5755b9efb64 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -150,7 +150,9 @@ void __ref arch_remove_linear_mapping(u64 start, u64 size)
 	mutex_lock(&linear_mapping_mutex);
 	ret = remove_section_mapping(start, start + size);
 	mutex_unlock(&linear_mapping_mutex);
-	WARN_ON_ONCE(ret);
+	if (ret)
+		pr_warn("Unable to remove linear mapping for 0x%llx..0x%llx: %d\n",
+			start, start + size, ret);
 
 	/* Ensure all vmalloc mappings are flushed in case they also
 	 * hit that section of memory
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 6/8] powerepc/book3s64/hash: drop WARN_ON in hash__remove_section_mapping
  2020-11-11 14:53 [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-11-11 14:53 ` [PATCH v2 5/8] powerpc/mm: print warning in arch_remove_linear_mapping() David Hildenbrand
@ 2020-11-11 14:53 ` David Hildenbrand
  2020-11-17 15:45   ` Oscar Salvador
  2020-11-11 14:53 ` [PATCH v2 7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() David Hildenbrand
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2020-11-11 14:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Oscar Salvador,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rashmica Gupta, Andrew Morton, Mike Rapoport, Michal Hocko,
	Wei Yang, Aneesh Kumar K.V, Nicholas Piggin

The single caller (arch_remove_linear_mapping()) prints a proper warning
when this function fails. No need to eventually crash the kernel - let's
drop this WARN_ON.

Suggested-by: Oscar Salvador <osalvador@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/mm/book3s64/hash_utils.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 24702c0a92e0..d2dcb7757c68 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -845,7 +845,6 @@ int hash__remove_section_mapping(unsigned long start, unsigned long end)
 {
 	int rc = htab_remove_mapping(start, end, mmu_linear_psize,
 				     mmu_kernel_ssize);
-	WARN_ON(rc < 0);
 
 	if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
 		pr_warn("Hash collision while resizing HPT\n");
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
  2020-11-11 14:53 [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
                   ` (5 preceding siblings ...)
  2020-11-11 14:53 ` [PATCH v2 6/8] powerepc/book3s64/hash: drop WARN_ON in hash__remove_section_mapping David Hildenbrand
@ 2020-11-11 14:53 ` David Hildenbrand
  2020-11-17 15:51   ` Oscar Salvador
  2020-11-11 14:53 ` [PATCH v2 8/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
  2020-11-25 11:57 ` [PATCH v2 0/8] " Michael Ellerman
  8 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2020-11-11 14:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Oscar Salvador,
	Wei Yang

Let's revert what we did in case seomthing goes wrong and we return an
error - as already done on arm64 and s390x.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/mm/mem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index c5755b9efb64..8b946ec68d1b 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -170,7 +170,10 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
 	rc = arch_create_linear_mapping(nid, start, size, params);
 	if (rc)
 		return rc;
-	return __add_pages(nid, start_pfn, nr_pages, params);
+	rc = __add_pages(nid, start_pfn, nr_pages, params);
+	if (rc)
+		arch_remove_linear_mapping(start, size);
+	return rc;
 }
 
 void __ref arch_remove_memory(int nid, u64 start, u64 size,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 8/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
  2020-11-11 14:53 [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
                   ` (6 preceding siblings ...)
  2020-11-11 14:53 ` [PATCH v2 7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() David Hildenbrand
@ 2020-11-11 14:53 ` David Hildenbrand
  2020-11-17 16:45   ` Oscar Salvador
  2020-11-25 11:57 ` [PATCH v2 0/8] " Michael Ellerman
  8 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2020-11-11 14:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Michal Hocko,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rashmica Gupta, Andrew Morton, Mike Rapoport, Michal Hocko,
	Oscar Salvador, Wei Yang

Let's use alloc_contig_pages() for allocating memory and remove the
linear mapping manually via arch_remove_linear_mapping(). Mark all pages
PG_offline, such that they will definitely not get touched - e.g.,
when hibernating. When freeing memory, try to revert what we did.

The original idea was discussed in:
 https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com

This is similar to CONFIG_DEBUG_PAGEALLOC handling on other
architectures, whereby only single pages are unmapped from the linear
mapping. Let's mimic what memory hot(un)plug would do with the linear
mapping.

We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies. Add a TODO
that we want to use __GFP_ZERO for clearing once alloc_contig_pages()
understands that.

Tested with in QEMU/TCG with 10 GiB of main memory:
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  105.903043][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  145.042493][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
  [  145.049019][ T1080] memtrace: Freed trace memory back on node 0
  [  145.333960][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
  [root@localhost ~]# echo 0x80000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  213.606916][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
  [  213.613855][ T1080] memtrace: Freed trace memory back on node 0
  [  214.185094][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
  [root@localhost ~]# echo 0x100000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  234.874872][ T1080] radix-mmu: Mapped 0x0000000080000000-0x0000000100000000 with 64.0 KiB pages
  [  234.886974][ T1080] memtrace: Freed trace memory back on node 0
  [  234.890153][ T1080] memtrace: Failed to allocate trace memory on node 0
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  259.490196][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000

I also made sure allocated memory is properly zeroed.

Note 1: We currently won't be allocating from ZONE_MOVABLE - because our
	pages are not movable. However, as we don't run with any memory
	hot(un)plug mechanism around, we could make an exception to
	increase the chance of allocations succeeding.

Note 2: PG_reserved isn't sufficient. E.g., kernel_page_present() used
	along PG_reserved in hibernation code will always return "true"
	on powerpc, resulting in the pages getting touched. It's too
	generic - e.g., indicates boot allocations.

Note 3: For now, we keep using memory_block_size_bytes() as minimum
	granularity.

Suggested-by: Michal Hocko <mhocko@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/Kconfig    |   8 +-
 arch/powerpc/platforms/powernv/memtrace.c | 163 ++++++++--------------
 2 files changed, 62 insertions(+), 109 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 938803eab0ad..619b093a0657 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -27,11 +27,11 @@ config OPAL_PRD
 	  recovery diagnostics on OpenPower machines
 
 config PPC_MEMTRACE
-	bool "Enable removal of RAM from kernel mappings for tracing"
-	depends on PPC_POWERNV && MEMORY_HOTREMOVE
+	bool "Enable runtime allocation of RAM for tracing"
+	depends on PPC_POWERNV && MEMORY_HOTPLUG && CONTIG_ALLOC
 	help
-	  Enabling this option allows for the removal of memory (RAM)
-	  from the kernel mappings to be used for hardware tracing.
+	  Enabling this option allows for runtime allocation of memory (RAM)
+	  for hardware tracing.
 
 config PPC_VAS
 	bool "IBM Virtual Accelerator Switchboard (VAS)"
diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 0e42fe2d7b6a..5fc9408bb0b3 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -51,33 +51,12 @@ static const struct file_operations memtrace_fops = {
 	.open	= simple_open,
 };
 
-static int check_memblock_online(struct memory_block *mem, void *arg)
-{
-	if (mem->state != MEM_ONLINE)
-		return -1;
-
-	return 0;
-}
-
-static int change_memblock_state(struct memory_block *mem, void *arg)
-{
-	unsigned long state = (unsigned long)arg;
-
-	mem->state = state;
-
-	return 0;
-}
-
 static void memtrace_clear_range(unsigned long start_pfn,
 				 unsigned long nr_pages)
 {
 	unsigned long pfn;
 
-	/*
-	 * As pages are offline, we cannot trust the memmap anymore. As HIGHMEM
-	 * does not apply, avoid passing around "struct page" and use
-	 * clear_page() instead directly.
-	 */
+	/* As HIGHMEM does not apply, use clear_page() directly. */
 	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++) {
 		if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
 			cond_resched();
@@ -85,72 +64,39 @@ static void memtrace_clear_range(unsigned long start_pfn,
 	}
 }
 
-/* called with device_hotplug_lock held */
-static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
-{
-	const unsigned long start = PFN_PHYS(start_pfn);
-	const unsigned long size = PFN_PHYS(nr_pages);
-
-	if (walk_memory_blocks(start, size, NULL, check_memblock_online))
-		return false;
-
-	walk_memory_blocks(start, size, (void *)MEM_GOING_OFFLINE,
-			   change_memblock_state);
-
-	if (offline_pages(start_pfn, nr_pages)) {
-		walk_memory_blocks(start, size, (void *)MEM_ONLINE,
-				   change_memblock_state);
-		return false;
-	}
-
-	walk_memory_blocks(start, size, (void *)MEM_OFFLINE,
-			   change_memblock_state);
-
-
-	return true;
-}
-
 static u64 memtrace_alloc_node(u32 nid, u64 size)
 {
-	u64 start_pfn, end_pfn, nr_pages, pfn;
-	u64 base_pfn;
-	u64 bytes = memory_block_size_bytes();
+	const unsigned long nr_pages = PHYS_PFN(size);
+	unsigned long pfn, start_pfn;
+	struct page *page;
 
-	if (!node_spanned_pages(nid))
+	/*
+	 * Trace memory needs to be aligned to the size, which is guaranteed
+	 * by alloc_contig_pages().
+	 */
+	page = alloc_contig_pages(nr_pages, GFP_KERNEL | __GFP_THISNODE |
+				  __GFP_NOWARN, nid, NULL);
+	if (!page)
 		return 0;
+	start_pfn = page_to_pfn(page);
 
-	start_pfn = node_start_pfn(nid);
-	end_pfn = node_end_pfn(nid);
-	nr_pages = size >> PAGE_SHIFT;
-
-	/* Trace memory needs to be aligned to the size */
-	end_pfn = round_down(end_pfn - nr_pages, nr_pages);
-
-	lock_device_hotplug();
-	for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -= nr_pages) {
-		if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true) {
-			/*
-			 * Clear the range while we still have a linear
-			 * mapping.
-			 */
-			memtrace_clear_range(base_pfn, nr_pages);
-			/*
-			 * Remove memory in memory block size chunks so that
-			 * iomem resources are always split to the same size and
-			 * we never try to remove memory that spans two iomem
-			 * resources.
-			 */
-			end_pfn = base_pfn + nr_pages;
-			for (pfn = base_pfn; pfn < end_pfn; pfn += bytes>> PAGE_SHIFT) {
-				__remove_memory(nid, pfn << PAGE_SHIFT, bytes);
-			}
-			unlock_device_hotplug();
-			return base_pfn << PAGE_SHIFT;
-		}
-	}
-	unlock_device_hotplug();
+	/*
+	 * Clear the range while we still have a linear mapping.
+	 *
+	 * TODO: use __GFP_ZERO with alloc_contig_pages() once supported.
+	 */
+	memtrace_clear_range(start_pfn, nr_pages);
 
-	return 0;
+	/*
+	 * Set pages PageOffline(), to indicate that nobody (e.g., hibernation,
+	 * dumping, ...) should be touching these pages.
+	 */
+	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++)
+		__SetPageOffline(pfn_to_page(pfn));
+
+	arch_remove_linear_mapping(PFN_PHYS(start_pfn), size);
+
+	return PFN_PHYS(start_pfn);
 }
 
 static int memtrace_init_regions_runtime(u64 size)
@@ -220,16 +166,30 @@ static int memtrace_init_debugfs(void)
 	return ret;
 }
 
-static int online_mem_block(struct memory_block *mem, void *arg)
+static int memtrace_free(int nid, u64 start, u64 size)
 {
-	return device_online(&mem->dev);
+	struct mhp_params params = { .pgprot = PAGE_KERNEL };
+	const unsigned long nr_pages = PHYS_PFN(size);
+	const unsigned long start_pfn = PHYS_PFN(start);
+	unsigned long pfn;
+	int ret;
+
+	ret = arch_create_linear_mapping(nid, start, size, &params);
+	if (ret)
+		return ret;
+
+	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++)
+		__ClearPageOffline(pfn_to_page(pfn));
+
+	free_contig_range(start_pfn, nr_pages);
+	return 0;
 }
 
 /*
- * Iterate through the chunks of memory we have removed from the kernel
- * and attempt to add them back to the kernel.
+ * Iterate through the chunks of memory we allocated and attempt to expose
+ * them back to the kernel.
  */
-static int memtrace_online(void)
+static int memtrace_free_regions(void)
 {
 	int i, ret = 0;
 	struct memtrace_entry *ent;
@@ -237,7 +197,7 @@ static int memtrace_online(void)
 	for (i = memtrace_array_nr - 1; i >= 0; i--) {
 		ent = &memtrace_array[i];
 
-		/* We have onlined this chunk previously */
+		/* We have freed this chunk previously */
 		if (ent->nid == NUMA_NO_NODE)
 			continue;
 
@@ -247,30 +207,25 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size, MHP_NONE)) {
-			pr_err("Failed to add trace memory to node %d\n",
+		if (memtrace_free(ent->nid, ent->start, ent->size)) {
+			pr_err("Failed to free trace memory on node %d\n",
 				ent->nid);
 			ret += 1;
 			continue;
 		}
 
-		lock_device_hotplug();
-		walk_memory_blocks(ent->start, ent->size, NULL,
-				   online_mem_block);
-		unlock_device_hotplug();
-
 		/*
-		 * Memory was added successfully so clean up references to it
-		 * so on reentry we can tell that this chunk was added.
+		 * Memory was freed successfully so clean up references to it
+		 * so on reentry we can tell that this chunk was freed.
 		 */
 		debugfs_remove_recursive(ent->dir);
-		pr_info("Added trace memory back to node %d\n", ent->nid);
+		pr_info("Freed trace memory back on node %d\n", ent->nid);
 		ent->size = ent->start = ent->nid = NUMA_NO_NODE;
 	}
 	if (ret)
 		return ret;
 
-	/* If all chunks of memory were added successfully, reset globals */
+	/* If all chunks of memory were freed successfully, reset globals */
 	kfree(memtrace_array);
 	memtrace_array = NULL;
 	memtrace_size = 0;
@@ -295,18 +250,16 @@ static int memtrace_enable_set(void *data, u64 val)
 
 	mutex_lock(&memtrace_mutex);
 
-	/* Re-add/online previously removed/offlined memory */
-	if (memtrace_size) {
-		if (memtrace_online())
-			goto out_unlock;
-	}
+	/* Free all previously allocated memory. */
+	if (memtrace_size && memtrace_free_regions())
+		goto out_unlock;
 
 	if (!val) {
 		rc = 0;
 		goto out_unlock;
 	}
 
-	/* Offline and remove memory */
+	/* Allocate memory. */
 	if (memtrace_init_regions_runtime(val))
 		goto out_unlock;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/8] powernv/memtrace: don't leak kernel memory to user space
  2020-11-11 14:53 ` [PATCH v2 1/8] powernv/memtrace: don't leak kernel memory to user space David Hildenbrand
@ 2020-11-17 15:13   ` Oscar Salvador
  0 siblings, 0 replies; 20+ messages in thread
From: Oscar Salvador @ 2020-11-17 15:13 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman, stable,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Wei Yang

On Wed, Nov 11, 2020 at 03:53:15PM +0100, David Hildenbrand wrote:
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
> Cc: stable@vger.kernel.org # v4.14+
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/8] powernv/memtrace: fix crashing the kernel when enabling concurrently
  2020-11-11 14:53 ` [PATCH v2 2/8] powernv/memtrace: fix crashing the kernel when enabling concurrently David Hildenbrand
@ 2020-11-17 15:22   ` Oscar Salvador
  0 siblings, 0 replies; 20+ messages in thread
From: Oscar Salvador @ 2020-11-17 15:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, stable, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta

On Wed, Nov 11, 2020 at 03:53:16PM +0100, David Hildenbrand wrote:
> Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
> Cc: stable@vger.kernel.org# v4.14+
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/8] powerpc/mm: factor out creating/removing linear mapping
  2020-11-11 14:53 ` [PATCH v2 3/8] powerpc/mm: factor out creating/removing linear mapping David Hildenbrand
@ 2020-11-17 15:27   ` Oscar Salvador
  0 siblings, 0 replies; 20+ messages in thread
From: Oscar Salvador @ 2020-11-17 15:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Wei Yang

On Wed, Nov 11, 2020 at 03:53:17PM +0100, David Hildenbrand wrote:
> We want to stop abusing memory hotplug infrastructure in memtrace code
> to perform allocations and remove the linear mapping. Instead we will use
> alloc_contig_pages() and remove the linear mapping manually.
> 
> Let's factor out creating/removing the linear mapping into
> arch_create_linear_mapping() / arch_remove_linear_mapping() - so in the
> future, we might be able to have whole arch_add_memory() /
> arch_remove_memory() be implemented in common code.
> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/8] powerpc/mm: protect linear mapping modifications by a mutex
  2020-11-11 14:53 ` [PATCH v2 4/8] powerpc/mm: protect linear mapping modifications by a mutex David Hildenbrand
@ 2020-11-17 15:37   ` Oscar Salvador
  2020-11-17 15:46     ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Oscar Salvador @ 2020-11-17 15:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Wei Yang

On Wed, Nov 11, 2020 at 03:53:18PM +0100, David Hildenbrand wrote:
> @@ -144,7 +147,9 @@ void __ref arch_remove_linear_mapping(u64 start, u64 size)
>  	start = (unsigned long)__va(start);
>  	flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
>  
> +	mutex_lock(&linear_mapping_mutex);
>  	ret = remove_section_mapping(start, start + size);
> +	mutex_unlock(&linear_mapping_mutex);
>  	WARN_ON_ONCE(ret);

My expertise in this area is low, so bear with me.

Why we do not need to protect flush_dcache_range_chunked and
vm_unmap_aliases?

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 6/8] powerepc/book3s64/hash: drop WARN_ON in hash__remove_section_mapping
  2020-11-11 14:53 ` [PATCH v2 6/8] powerepc/book3s64/hash: drop WARN_ON in hash__remove_section_mapping David Hildenbrand
@ 2020-11-17 15:45   ` Oscar Salvador
  0 siblings, 0 replies; 20+ messages in thread
From: Oscar Salvador @ 2020-11-17 15:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Wei Yang,
	Aneesh Kumar K.V, Nicholas Piggin

On Wed, Nov 11, 2020 at 03:53:20PM +0100, David Hildenbrand wrote:
> The single caller (arch_remove_linear_mapping()) prints a proper warning
> when this function fails. No need to eventually crash the kernel - let's
> drop this WARN_ON.
> 
> Suggested-by: Oscar Salvador <osalvador@suse.de>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/8] powerpc/mm: protect linear mapping modifications by a mutex
  2020-11-17 15:37   ` Oscar Salvador
@ 2020-11-17 15:46     ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2020-11-17 15:46 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Wei Yang

On 17.11.20 16:37, Oscar Salvador wrote:
> On Wed, Nov 11, 2020 at 03:53:18PM +0100, David Hildenbrand wrote:
>> @@ -144,7 +147,9 @@ void __ref arch_remove_linear_mapping(u64 start, u64 size)
>>   	start = (unsigned long)__va(start);
>>   	flush_dcache_range_chunked(start, start + size, FLUSH_CHUNK_SIZE);
>>   
>> +	mutex_lock(&linear_mapping_mutex);
>>   	ret = remove_section_mapping(start, start + size);
>> +	mutex_unlock(&linear_mapping_mutex);
>>   	WARN_ON_ONCE(ret);
> 
> My expertise in this area is low, so bear with me.
> 
> Why we do not need to protect flush_dcache_range_chunked and
> vm_unmap_aliases?
> 

vm_unmap_aliases does own locking and can handle concurrent calls.


flush_dcache_range_chunked()->flush_dcache_range() ends up as a sequence 
of memory barriers paired with dcbf instructions.

dcbf: Copies modified cache blocks to main storage and invalidates the 
copy in the data cache.

It's called from various places and no global variables seem to be 
involved, so it looks like it doesn't need any kind of locking.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
  2020-11-11 14:53 ` [PATCH v2 7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() David Hildenbrand
@ 2020-11-17 15:51   ` Oscar Salvador
  2020-11-17 15:53     ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Oscar Salvador @ 2020-11-17 15:51 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Wei Yang

On Wed, Nov 11, 2020 at 03:53:21PM +0100, David Hildenbrand wrote:
> Let's revert what we did in case seomthing goes wrong and we return an
"something" :-)
> error - as already done on arm64 and s390x.
> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
  2020-11-17 15:51   ` Oscar Salvador
@ 2020-11-17 15:53     ` David Hildenbrand
  2020-11-18  2:00       ` Michael Ellerman
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2020-11-17 15:53 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, Rashmica Gupta,
	Andrew Morton, Mike Rapoport, Michal Hocko, Wei Yang

On 17.11.20 16:51, Oscar Salvador wrote:
> On Wed, Nov 11, 2020 at 03:53:21PM +0100, David Hildenbrand wrote:
>> Let's revert what we did in case seomthing goes wrong and we return an
> "something" :-)

Thanks! :)

@Michael, I assume if I don't have to resend, this can be fixed up?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 8/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
  2020-11-11 14:53 ` [PATCH v2 8/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
@ 2020-11-17 16:45   ` Oscar Salvador
  0 siblings, 0 replies; 20+ messages in thread
From: Oscar Salvador @ 2020-11-17 16:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Michal Hocko,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rashmica Gupta, Andrew Morton, Mike Rapoport, Michal Hocko,
	Wei Yang

On Wed, Nov 11, 2020 at 03:53:22PM +0100, David Hildenbrand wrote:
> Suggested-by: Michal Hocko <mhocko@kernel.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

LGTM, and much cleaner definetely:

Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
  2020-11-17 15:53     ` David Hildenbrand
@ 2020-11-18  2:00       ` Michael Ellerman
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Ellerman @ 2020-11-18  2:00 UTC (permalink / raw)
  To: David Hildenbrand, Oscar Salvador
  Cc: linux-kernel, linux-mm, linuxppc-dev, Benjamin Herrenschmidt,
	Paul Mackerras, Rashmica Gupta, Andrew Morton, Mike Rapoport,
	Michal Hocko, Wei Yang

David Hildenbrand <david@redhat.com> writes:
> On 17.11.20 16:51, Oscar Salvador wrote:
>> On Wed, Nov 11, 2020 at 03:53:21PM +0100, David Hildenbrand wrote:
>>> Let's revert what we did in case seomthing goes wrong and we return an
>> "something" :-)
>
> Thanks! :)
>
> @Michael, I assume if I don't have to resend, this can be fixed up?

Yep, I fixed it up.

cheers

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
  2020-11-11 14:53 [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
                   ` (7 preceding siblings ...)
  2020-11-11 14:53 ` [PATCH v2 8/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
@ 2020-11-25 11:57 ` Michael Ellerman
  8 siblings, 0 replies; 20+ messages in thread
From: Michael Ellerman @ 2020-11-25 11:57 UTC (permalink / raw)
  To: linux-kernel, David Hildenbrand
  Cc: Rashmica Gupta, Michael Ellerman, Andrew Morton, Paul Mackerras,
	Mike Rapoport, Michal Hocko, Michal Hocko, Nicholas Piggin,
	Wei Yang, Oscar Salvador, linux-mm, Benjamin Herrenschmidt,
	Aneesh Kumar K.V, linuxppc-dev

On Wed, 11 Nov 2020 15:53:14 +0100, David Hildenbrand wrote:
> Based on latest linux/master
> 
> powernv/memtrace is the only in-kernel user that rips out random memory
> it never added (doesn't own) in order to allocate memory without a
> linear mapping. Let's stop abusing memory hot(un)plug infrastructure for
> that - use alloc_contig_pages() for allocating memory and remove the
> linear mapping manually.
> 
> [...]

Applied to powerpc/next.

[1/8] powerpc/powernv/memtrace: Don't leak kernel memory to user space
      https://git.kernel.org/powerpc/c/c74cf7a3d59a21b290fe0468f5b470d0b8ee37df
[2/8] powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently
      https://git.kernel.org/powerpc/c/d6718941a2767fb383e105d257d2105fe4f15f0e
[3/8] powerpc/mm: factor out creating/removing linear mapping
      https://git.kernel.org/powerpc/c/4abb1e5b63ac3281275315fc6b0cde0b9c2e2e42
[4/8] powerpc/mm: protect linear mapping modifications by a mutex
      https://git.kernel.org/powerpc/c/e5b2af044f31bf18defa557a8cd11c23caefa34c
[5/8] powerpc/mm: print warning in arch_remove_linear_mapping()
      https://git.kernel.org/powerpc/c/1f73ad3e8d755dbec52fcec98618a7ce4de12af2
[6/8] powerpc/book3s64/hash: Drop WARN_ON in hash__remove_section_mapping()
      https://git.kernel.org/powerpc/c/d8bd9a121c2f2bc8b36da930dc91b69fd2a705e2
[7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()
      https://git.kernel.org/powerpc/c/ca2c36cae9d48b180ea51259e35ab3d95d327df2
[8/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations
      https://git.kernel.org/powerpc/c/0bd4b96d99108b7ea9bac0573957483be7781d70

cheers

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-11-25 11:58 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-11 14:53 [PATCH v2 0/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
2020-11-11 14:53 ` [PATCH v2 1/8] powernv/memtrace: don't leak kernel memory to user space David Hildenbrand
2020-11-17 15:13   ` Oscar Salvador
2020-11-11 14:53 ` [PATCH v2 2/8] powernv/memtrace: fix crashing the kernel when enabling concurrently David Hildenbrand
2020-11-17 15:22   ` Oscar Salvador
2020-11-11 14:53 ` [PATCH v2 3/8] powerpc/mm: factor out creating/removing linear mapping David Hildenbrand
2020-11-17 15:27   ` Oscar Salvador
2020-11-11 14:53 ` [PATCH v2 4/8] powerpc/mm: protect linear mapping modifications by a mutex David Hildenbrand
2020-11-17 15:37   ` Oscar Salvador
2020-11-17 15:46     ` David Hildenbrand
2020-11-11 14:53 ` [PATCH v2 5/8] powerpc/mm: print warning in arch_remove_linear_mapping() David Hildenbrand
2020-11-11 14:53 ` [PATCH v2 6/8] powerepc/book3s64/hash: drop WARN_ON in hash__remove_section_mapping David Hildenbrand
2020-11-17 15:45   ` Oscar Salvador
2020-11-11 14:53 ` [PATCH v2 7/8] powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory() David Hildenbrand
2020-11-17 15:51   ` Oscar Salvador
2020-11-17 15:53     ` David Hildenbrand
2020-11-18  2:00       ` Michael Ellerman
2020-11-11 14:53 ` [PATCH v2 8/8] powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations David Hildenbrand
2020-11-17 16:45   ` Oscar Salvador
2020-11-25 11:57 ` [PATCH v2 0/8] " Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).