All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] powerpc/powernv: Add config option for removal of memory
@ 2017-04-28  5:42 Rashmica Gupta
  2017-04-28  5:42 ` [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing Rashmica Gupta
  2017-04-28  9:39 ` [PATCH 1/2] powerpc/powernv: Add config option for removal of memory Anshuman Khandual
  0 siblings, 2 replies; 14+ messages in thread
From: Rashmica Gupta @ 2017-04-28  5:42 UTC (permalink / raw)
  To: linuxppc-dev, mpe, anton, npiggin, bsingharora, oohall; +Cc: Rashmica Gupta

Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
---
 arch/powerpc/platforms/powernv/Kconfig  | 4 ++++
 arch/powerpc/platforms/powernv/Makefile | 1 +
 2 files changed, 5 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 6a6f4ef..1b8b3a8 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -30,3 +30,7 @@ config OPAL_PRD
 	help
 	  This enables the opal-prd driver, a facility to run processor
 	  recovery diagnostics on OpenPower machines
+
+config HARDWARE_TRACING
+	bool 'Enable removal of memory for hardware memory tracing'
+	depends on PPC_POWERNV && MEMORY_HOTPLUG
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb..e61be1b 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
 obj-$(CONFIG_TRACEPOINTS)	+= opal-tracepoints.o
 obj-$(CONFIG_OPAL_PRD)	+= opal-prd.o
+obj-$(CONFIG_HARDWARE_TRACING)	+= memtrace.o
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing
  2017-04-28  5:42 [PATCH 1/2] powerpc/powernv: Add config option for removal of memory Rashmica Gupta
@ 2017-04-28  5:42 ` Rashmica Gupta
  2017-04-28  9:52   ` Anshuman Khandual
  2017-04-28  9:39 ` [PATCH 1/2] powerpc/powernv: Add config option for removal of memory Anshuman Khandual
  1 sibling, 1 reply; 14+ messages in thread
From: Rashmica Gupta @ 2017-04-28  5:42 UTC (permalink / raw)
  To: linuxppc-dev, mpe, anton, npiggin, bsingharora, oohall; +Cc: Rashmica Gupta

Some powerpc hardware features may want to gain access to a chunk of
undisturbed real memory.  This update provides a means to unplug said memory
from the kernel with a set of debugfs calls.  By writing an integer containing
 the size of memory to be unplugged into
/sys/kernel/debug/powerpc/memtrace/enable, the code will remove that much
memory from the end of each available chip's memory space (ie each memory node).
In addition, the means to read out the contents of the unplugged memory is also
provided by reading out the /sys/kernel/debug/powerpc/memtrace/<chip-id>/trace
file.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>

---
This requires the 'Wire up hpte_removebolted for powernv' patch.

RFC -> v1: Added in two missing locks. Replaced the open-coded flush_memory_region() with the existing
flush_inval_dcache_range(start, end).

memtrace_offline_pages() is open-coded because offline_pages is designed to be
called through the sysfs interface - not directly.

We could move the offlining of pages to userspace, which removes some of this
open-coding. This would then require passing info to the kernel such that it
can then remove the memory that has been offlined. This could be done using
notifiers, but this isn't simple due to locking (remove_memory needs
mem_hotplug_begin() which the sysfs interface already has). This could also be
done through the debugfs interface (similar to what is done here). Either way,
this would require the process that needs the memory to have open-coded code
which it shouldn't really be involved with.

As the current remove_memory() function requires the memory to already be
offlined, it makes sense to keep the offlining and removal of memory
functionality grouped together so that a process can simply make one request to
unplug some memory. Ideally there would be a kernel function we could call that
would offline the memory and then remove it.


 arch/powerpc/platforms/powernv/memtrace.c | 276 ++++++++++++++++++++++++++++++
 1 file changed, 276 insertions(+)
 create mode 100644 arch/powerpc/platforms/powernv/memtrace.c

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
new file mode 100644
index 0000000..86184b1
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -0,0 +1,276 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) IBM Corporation, 2014
+ *
+ * Author: Anton Blanchard <anton@au.ibm.com>
+ */
+
+#define pr_fmt(fmt) "powernv-memtrace: " fmt
+
+#include <linux/bitops.h>
+#include <linux/string.h>
+#include <linux/memblock.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/fs.h>
+#include <linux/debugfs.h>
+#include <linux/slab.h>
+#include <linux/memory.h>
+#include <linux/memory_hotplug.h>
+#include <asm/machdep.h>
+#include <asm/debugfs.h>
+#include <asm/cacheflush.h>
+
+struct memtrace_entry {
+	void *mem;
+	u64 start;
+	u64 size;
+	u32 nid;
+	struct dentry *dir;
+	char name[16];
+};
+
+static struct memtrace_entry *memtrace_array;
+static unsigned int memtrace_array_nr;
+
+static ssize_t memtrace_read(struct file *filp, char __user *ubuf,
+			     size_t count, loff_t *ppos)
+{
+	struct memtrace_entry *ent = filp->private_data;
+
+	return simple_read_from_buffer(ubuf, count, ppos, ent->mem, ent->size);
+}
+
+static bool valid_memtrace_range(struct memtrace_entry *dev,
+				 unsigned long start, unsigned long size)
+{
+	if ((dev->start <= start) &&
+	    ((start + size) <= (dev->start + dev->size)))
+		return true;
+
+	return false;
+}
+
+static int memtrace_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	unsigned long size = vma->vm_end - vma->vm_start;
+	struct memtrace_entry *dev = filp->private_data;
+
+	if (!valid_memtrace_range(dev, vma->vm_pgoff << PAGE_SHIFT, size))
+		return -EINVAL;
+
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+
+	if (io_remap_pfn_range(vma, vma->vm_start,
+			       vma->vm_pgoff + (dev->start >> PAGE_SHIFT),
+			       size, vma->vm_page_prot))
+		return -EAGAIN;
+
+	return 0;
+}
+
+static const struct file_operations memtrace_fops = {
+	.llseek = default_llseek,
+	.read	= memtrace_read,
+	.mmap	= memtrace_mmap,
+	.open	= simple_open,
+};
+
+static int check_memblock_online(struct memory_block *mem, void *arg)
+{
+	if (mem->state != MEM_ONLINE)
+		return -1;
+
+	return 0;
+}
+
+static int change_memblock_state(struct memory_block *mem, void *arg)
+{
+	unsigned long state = (unsigned long)arg;
+
+	mem->state = state;
+	return 0;
+}
+
+static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
+{
+	u64 end_pfn = start_pfn + nr_pages - 1;
+
+	if (walk_memory_range(start_pfn, end_pfn, NULL,
+	    check_memblock_online))
+		return false;
+
+	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
+			  change_memblock_state);
+
+	mem_hotplug_begin();
+	if (offline_pages(start_pfn, nr_pages)) {
+		walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
+				  change_memblock_state);
+		mem_hotplug_done();
+		return false;
+	}
+
+	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
+			  change_memblock_state);
+	mem_hotplug_done();
+
+	/* Clear the dcache to remove any references to the memory */
+	flush_inval_dcache_range((u64)__va(start_pfn << PAGE_SHIFT),
+									(u64)__va(end_pfn << PAGE_SHIFT));
+
+	/* Now remove memory from the mappings */
+	lock_device_hotplug();
+	remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
+	unlock_device_hotplug();
+
+	return true;
+}
+
+static u64 memtrace_alloc_node(u32 nid, u64 size)
+{
+	u64 start_pfn, end_pfn, nr_pages;
+	u64 base_pfn;
+
+	if (!NODE_DATA(nid) || !node_spanned_pages(nid))
+		return 0;
+
+	start_pfn = node_start_pfn(nid);
+	end_pfn = node_end_pfn(nid);
+	nr_pages = size >> PAGE_SHIFT;
+
+	/* Trace memory needs to be aligned to the size */
+	end_pfn = round_down(end_pfn - nr_pages, nr_pages);
+
+	for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -= nr_pages) {
+		if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true)
+			return base_pfn << PAGE_SHIFT;
+	}
+
+	return 0;
+}
+
+static int memtrace_init_regions_runtime(u64 size)
+{
+	u64 m;
+	u32 nid;
+
+	memtrace_array = kzalloc(sizeof(struct memtrace_entry) *
+				num_online_nodes(), GFP_KERNEL);
+	if (!memtrace_array) {
+		pr_err("Failed to allocate memtrace_array\n");
+		return -EINVAL;
+	}
+
+	for_each_online_node(nid) {
+		m = memtrace_alloc_node(nid, size);
+		/*
+		 * A node might not have any local memory, so warn but
+		 * continue on.
+		 */
+		if (!m) {
+			pr_err("Failed to allocate trace memory on node %d\n",
+				 nid);
+		} else {
+			pr_info("Allocated trace memory on node %d at 0x%016llx\n",
+				 nid, m);
+
+			memtrace_array[memtrace_array_nr].start = m;
+			memtrace_array[memtrace_array_nr].size = size;
+			memtrace_array[memtrace_array_nr].nid = nid;
+			memtrace_array_nr++;
+		}
+	}
+	return 0;
+}
+
+static struct dentry *memtrace_debugfs_dir;
+
+static int memtrace_init_debugfs(void)
+{
+	int ret = 0;
+	int i;
+
+	for (i = 0; i < memtrace_array_nr; i++) {
+		struct dentry *dir;
+		struct memtrace_entry *ent = &memtrace_array[i];
+
+		ent->mem = ioremap(ent->start, ent->size);
+		/* Warn but continue on */
+		if (!ent->mem) {
+			pr_err("Failed to map trace memory at 0x%llx\n",
+				 ent->start);
+			ret = -1;
+			continue;
+		}
+
+		snprintf(ent->name, 16, "%08x", ent->nid);
+		dir = debugfs_create_dir(ent->name, memtrace_debugfs_dir);
+		if (!dir)
+			return -1;
+
+		ent->dir = dir;
+		debugfs_create_file("trace", 0400, dir, ent, &memtrace_fops);
+		debugfs_create_x64("start", 0400, dir, &ent->start);
+		debugfs_create_x64("size", 0400, dir, &ent->size);
+		debugfs_create_u32("node", 0400, dir, &ent->nid);
+	}
+
+	return ret;
+}
+
+static u64 memtrace_size;
+
+static int memtrace_enable_set(void *data, u64 val)
+{
+	if (memtrace_size)
+		return -EINVAL;
+
+	if (!val)
+		return -EINVAL;
+
+	/* Make sure size is aligned to a memory block */
+	if (val & (memory_block_size_bytes()-1))
+		return -EINVAL;
+
+	if (memtrace_init_regions_runtime(val))
+		return -EINVAL;
+
+	if (memtrace_init_debugfs())
+		return -EINVAL;
+
+	memtrace_size = val;
+
+	return 0;
+}
+
+static int memtrace_enable_get(void *data, u64 *val)
+{
+	*val = memtrace_size;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get, memtrace_enable_set, "0x%016llx\n");
+
+static int memtrace_init(void)
+{
+	memtrace_debugfs_dir = debugfs_create_dir("memtrace", powerpc_debugfs_root);
+	if (!memtrace_debugfs_dir)
+		return -1;
+
+	debugfs_create_file("enable", 0600, memtrace_debugfs_dir,
+			    NULL, &memtrace_init_fops);
+
+	return 0;
+}
+machine_device_initcall(powernv, memtrace_init);
+
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] powerpc/powernv: Add config option for removal of memory
  2017-04-28  5:42 [PATCH 1/2] powerpc/powernv: Add config option for removal of memory Rashmica Gupta
  2017-04-28  5:42 ` [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing Rashmica Gupta
@ 2017-04-28  9:39 ` Anshuman Khandual
  2017-05-03  3:52   ` Rashmica Gupta
  1 sibling, 1 reply; 14+ messages in thread
From: Anshuman Khandual @ 2017-04-28  9:39 UTC (permalink / raw)
  To: Rashmica Gupta, linuxppc-dev, mpe, anton, npiggin, bsingharora, oohall

On 04/28/2017 11:12 AM, Rashmica Gupta wrote:
> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>

Please describe a bit about the function/feature you are trying
to add with this new config option. The subject says " Add config
option for removal of memory" but I guess its not related to
memory hotplug but about hardware enabled tracing IIUC. Hence
it should have some amount of description.

> ---
>  arch/powerpc/platforms/powernv/Kconfig  | 4 ++++
>  arch/powerpc/platforms/powernv/Makefile | 1 +
>  2 files changed, 5 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
> index 6a6f4ef..1b8b3a8 100644
> --- a/arch/powerpc/platforms/powernv/Kconfig
> +++ b/arch/powerpc/platforms/powernv/Kconfig
> @@ -30,3 +30,7 @@ config OPAL_PRD
>  	help
>  	  This enables the opal-prd driver, a facility to run processor
>  	  recovery diagnostics on OpenPower machines
> +
> +config HARDWARE_TRACING

This is too generic for platform specific feature and also it does
not intend to fit into a generic HW tracing infrastructure. IMHO
it should be named something like "PPC64_HARDWARE_TRACING" or
something similar.

> +	bool 'Enable removal of memory for hardware memory tracing'

If this memory is going to be taken out of memblock like normal
memory hotplug and eventually goes away from kernel control, then
you need to be more specific about its usage.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing
  2017-04-28  5:42 ` [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing Rashmica Gupta
@ 2017-04-28  9:52   ` Anshuman Khandual
  2017-05-03  3:52     ` Rashmica Gupta
  0 siblings, 1 reply; 14+ messages in thread
From: Anshuman Khandual @ 2017-04-28  9:52 UTC (permalink / raw)
  To: Rashmica Gupta, linuxppc-dev, mpe, anton, npiggin, bsingharora, oohall

On 04/28/2017 11:12 AM, Rashmica Gupta wrote:
> Some powerpc hardware features may want to gain access to a chunk of

What kind of features ? Please add specifics.

> undisturbed real memory.  This update provides a means to unplug said memory

Undisturbed ? Meaning part of memblock and currently inside the buddy
allocator which we are trying to hot unplug out ?

> from the kernel with a set of debugfs calls.  By writing an integer containing
>  the size of memory to be unplugged into

Does the size has some constraints like aligned with memblock section
size ? LMB size ? page block size ? etc. Please add the details.

> /sys/kernel/debug/powerpc/memtrace/enable, the code will remove that much
> memory from the end of each available chip's memory space (ie each memory node).

<size> amount (I guess bytes in this case) of memory will be removed
from the end of the NUMA node ? Whats the guarantee that they would be
free at that time and not being pinned by some process ? If its not
guaranteed to be freed, then interface description should state that
clearly.

> In addition, the means to read out the contents of the unplugged memory is also
> provided by reading out the /sys/kernel/debug/powerpc/memtrace/<chip-id>/trace
> file.

All of the debugfs file interfaces added here should be documented some
where in detail.

> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
> 
> ---
> This requires the 'Wire up hpte_removebolted for powernv' patch.
> 
> RFC -> v1: Added in two missing locks. Replaced the open-coded flush_memory_region() with the existing
> flush_inval_dcache_range(start, end).
> 
> memtrace_offline_pages() is open-coded because offline_pages is designed to be
> called through the sysfs interface - not directly.
> 
> We could move the offlining of pages to userspace, which removes some of this
> open-coding. This would then require passing info to the kernel such that it
> can then remove the memory that has been offlined. This could be done using
> notifiers, but this isn't simple due to locking (remove_memory needs
> mem_hotplug_begin() which the sysfs interface already has). This could also be
> done through the debugfs interface (similar to what is done here). Either way,
> this would require the process that needs the memory to have open-coded code
> which it shouldn't really be involved with.
> 
> As the current remove_memory() function requires the memory to already be
> offlined, it makes sense to keep the offlining and removal of memory
> functionality grouped together so that a process can simply make one request to
> unplug some memory. Ideally there would be a kernel function we could call that
> would offline the memory and then remove it.
> 
> 
>  arch/powerpc/platforms/powernv/memtrace.c | 276 ++++++++++++++++++++++++++++++
>  1 file changed, 276 insertions(+)
>  create mode 100644 arch/powerpc/platforms/powernv/memtrace.c
> 
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> new file mode 100644
> index 0000000..86184b1
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -0,0 +1,276 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * Copyright (C) IBM Corporation, 2014
> + *
> + * Author: Anton Blanchard <anton@au.ibm.com>
> + */
> +
> +#define pr_fmt(fmt) "powernv-memtrace: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/string.h>
> +#include <linux/memblock.h>
> +#include <linux/init.h>
> +#include <linux/moduleparam.h>
> +#include <linux/fs.h>
> +#include <linux/debugfs.h>
> +#include <linux/slab.h>
> +#include <linux/memory.h>
> +#include <linux/memory_hotplug.h>
> +#include <asm/machdep.h>
> +#include <asm/debugfs.h>
> +#include <asm/cacheflush.h>
> +
> +struct memtrace_entry {
> +	void *mem;
> +	u64 start;
> +	u64 size;
> +	u32 nid;
> +	struct dentry *dir;
> +	char name[16];
> +};

Little bit of description about the structure here will help.

> +
> +static struct memtrace_entry *memtrace_array;
> +static unsigned int memtrace_array_nr;
> +
> +static ssize_t memtrace_read(struct file *filp, char __user *ubuf,
> +			     size_t count, loff_t *ppos)
> +{
> +	struct memtrace_entry *ent = filp->private_data;
> +
> +	return simple_read_from_buffer(ubuf, count, ppos, ent->mem, ent->size);
> +}
> +
> +static bool valid_memtrace_range(struct memtrace_entry *dev,
> +				 unsigned long start, unsigned long size)
> +{
> +	if ((dev->start <= start) &&

Switch the position of start and dev->start above. Will make
it easy while reading.

> +	    ((start + size) <= (dev->start + dev->size)))
> +		return true;
> +
> +	return false;
> +}
> +
> +static int memtrace_mmap(struct file *filp, struct vm_area_struct *vma)
> +{
> +	unsigned long size = vma->vm_end - vma->vm_start;
> +	struct memtrace_entry *dev = filp->private_data;
> +
> +	if (!valid_memtrace_range(dev, vma->vm_pgoff << PAGE_SHIFT, size))
> +		return -EINVAL;
> +
> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

Why we do this ? Its coming from real RAM not IO memory. Then the page
protection still needs changes ?

> +
> +	if (io_remap_pfn_range(vma, vma->vm_start,
> +			       vma->vm_pgoff + (dev->start >> PAGE_SHIFT),
> +			       size, vma->vm_page_prot))

You can just call remap_pfn_rang() instead though they are all the same.
There is nothing I/O here should be explicit.

> +		return -EAGAIN;
> +
> +	return 0;
> +}
> +
> +static const struct file_operations memtrace_fops = {
> +	.llseek = default_llseek,
> +	.read	= memtrace_read,
> +	.mmap	= memtrace_mmap,
> +	.open	= simple_open,
> +};
> +
> +static int check_memblock_online(struct memory_block *mem, void *arg)
> +{
> +	if (mem->state != MEM_ONLINE)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int change_memblock_state(struct memory_block *mem, void *arg)
> +{
> +	unsigned long state = (unsigned long)arg;
> +
> +	mem->state = state;
> +	return 0;
> +}
> +
> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
> +{
> +	u64 end_pfn = start_pfn + nr_pages - 1;
> +
> +	if (walk_memory_range(start_pfn, end_pfn, NULL,
> +	    check_memblock_online))
> +		return false;
> +
> +	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
> +			  change_memblock_state);
> +

walk_memory_range() might be expensive, cant we just change the state
to MEM_GOING_OFFLINE while checking the state for MEM_ONLINE during
the first loop and bail out if any of the memblock is not in MEM_ONLINE
in the first place.

> +	mem_hotplug_begin();
> +	if (offline_pages(start_pfn, nr_pages)) {
> +		walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
> +				  change_memblock_state);
> +		mem_hotplug_done();

Right, this can remain as is. If we fail to offline pages, mark the
memory blocks as MEM_ONLINE again.

> +		return false;
> +	}
> +
> +	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
> +			  change_memblock_state);
> +	mem_hotplug_done();

Right.

> +
> +	/* Clear the dcache to remove any references to the memory */
> +	flush_inval_dcache_range((u64)__va(start_pfn << PAGE_SHIFT),
> +									(u64)__va(end_pfn << PAGE_SHIFT));

I am wondering why this is required now when we dont do anything for
cache flushing calls from core VM. If its really required now then
it also should be required during memory hot unplug operations in
general as well.

/*
 * No cache flushing is required when address mappings are changed,
 * because the caches on PowerPCs are physically addressed.
 */
#define flush_cache_all()			do { } while (0)
#define flush_cache_mm(mm)			do { } while (0)
#define flush_cache_dup_mm(mm)			do { } while (0)
#define flush_cache_range(vma, start, end)	do { } while (0)
#define flush_cache_page(vma, vmaddr, pfn)	do { } while (0)
#define flush_icache_page(vma, page)		do { } while (0)
#define flush_cache_vmap(start, end)		do { } while (0)
#define flush_cache_vunmap(start, end)		do { } while (0)


> +
> +	/* Now remove memory from the mappings */
> +	lock_device_hotplug();
> +	remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
> +	unlock_device_hotplug();

Right. Now we have successfully taken down the memory.

> +
> +	return true;
> +}
> +
> +static u64 memtrace_alloc_node(u32 nid, u64 size)
> +{
> +	u64 start_pfn, end_pfn, nr_pages;
> +	u64 base_pfn;
> +
> +	if (!NODE_DATA(nid) || !node_spanned_pages(nid))
> +		return 0;

Why NODE_DATA check is required here ? Each node should have one
allocated and initialized by now, else we have bigger problems.

Is there any specific reason to check for spanned pages instead
of present/managed pages.

> +
> +	start_pfn = node_start_pfn(nid);
> +	end_pfn = node_end_pfn(nid);
> +	nr_pages = size >> PAGE_SHIFT;
> +
> +	/* Trace memory needs to be aligned to the size */
> +	end_pfn = round_down(end_pfn - nr_pages, nr_pages);
> +
> +	for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -= nr_pages) {
> +		if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true)
> +			return base_pfn << PAGE_SHIFT;
> +	}
> +
> +	return 0;
> +}
> +
> +static int memtrace_init_regions_runtime(u64 size)
> +{
> +	u64 m;
> +	u32 nid;
> +
> +	memtrace_array = kzalloc(sizeof(struct memtrace_entry) *
> +				num_online_nodes(), GFP_KERNEL);
> +	if (!memtrace_array) {
> +		pr_err("Failed to allocate memtrace_array\n");
> +		return -EINVAL;
> +	}
> +
> +	for_each_online_node(nid) {
> +		m = memtrace_alloc_node(nid, size);
> +		/*
> +		 * A node might not have any local memory, so warn but
> +		 * continue on.
> +		 */
> +		if (!m) {
> +			pr_err("Failed to allocate trace memory on node %d\n",
> +				 nid);
> +		} else {
> +			pr_info("Allocated trace memory on node %d at 0x%016llx\n",
> +				 nid, m);
> +
> +			memtrace_array[memtrace_array_nr].start = m;
> +			memtrace_array[memtrace_array_nr].size = size;
> +			memtrace_array[memtrace_array_nr].nid = nid;
> +			memtrace_array_nr++;
> +		}
> +	}
> +	return 0;
> +}

All the pr_info() and pr_err() prints should have a "memtrace :" before the
actual string to make it clear that its coming from this feature.

> +
> +static struct dentry *memtrace_debugfs_dir;
> +
> +static int memtrace_init_debugfs(void)
> +{
> +	int ret = 0;
> +	int i;
> +
> +	for (i = 0; i < memtrace_array_nr; i++) {
> +		struct dentry *dir;
> +		struct memtrace_entry *ent = &memtrace_array[i];
> +
> +		ent->mem = ioremap(ent->start, ent->size);
> +		/* Warn but continue on */
> +		if (!ent->mem) {
> +			pr_err("Failed to map trace memory at 0x%llx\n",
> +				 ent->start);
> +			ret = -1;
> +			continue;
> +		}
> +
> +		snprintf(ent->name, 16, "%08x", ent->nid);
> +		dir = debugfs_create_dir(ent->name, memtrace_debugfs_dir);
> +		if (!dir)
> +			return -1;
> +
> +		ent->dir = dir;
> +		debugfs_create_file("trace", 0400, dir, ent, &memtrace_fops);
> +		debugfs_create_x64("start", 0400, dir, &ent->start);
> +		debugfs_create_x64("size", 0400, dir, &ent->size);
> +		debugfs_create_u32("node", 0400, dir, &ent->nid);
> +	}

Oh okay, its creating all the four files. Please create corresponding
to each of the files some where. Documentation/ABI/testing lists the
actual system ABI on /sys/ not the sys/kernel/debug/ ones I guess.

> +
> +	return ret;
> +}
> +
> +static u64 memtrace_size;
> +
> +static int memtrace_enable_set(void *data, u64 val)
> +{
> +	if (memtrace_size)
> +		return -EINVAL;
> +
> +	if (!val)
> +		return -EINVAL;
> +
> +	/* Make sure size is aligned to a memory block */
> +	if (val & (memory_block_size_bytes()-1))

As I have mentioned earlier, this should be mentioned in the interface
description some where.

> +		return -EINVAL;
> +
> +	if (memtrace_init_regions_runtime(val))
> +		return -EINVAL;
> +
> +	if (memtrace_init_debugfs())
> +		return -EINVAL;
> +
> +	memtrace_size = val;
> +
> +	return 0;
> +}
> +
> +static int memtrace_enable_get(void *data, u64 *val)
> +{
> +	*val = memtrace_size;
> +	return 0;
> +}
> +
> +DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get, memtrace_enable_set, "0x%016llx\n");
> +
> +static int memtrace_init(void)
> +{
> +	memtrace_debugfs_dir = debugfs_create_dir("memtrace", powerpc_debugfs_root);
> +	if (!memtrace_debugfs_dir)
> +		return -1;
> +
> +	debugfs_create_file("enable", 0600, memtrace_debugfs_dir,
> +			    NULL, &memtrace_init_fops);
> +
> +	return 0;
> +}
> +machine_device_initcall(powernv, memtrace_init);
> +
> 

BTW how we start the tracing process for the trace to be collected in the
interface before we can read them ? This interface does not seem to have
a handler. When it directs the HW to start collecting the traces ?

debugfs_create_x64("start", 0400, dir, &ent->start);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] powerpc/powernv: Add config option for removal of memory
  2017-04-28  9:39 ` [PATCH 1/2] powerpc/powernv: Add config option for removal of memory Anshuman Khandual
@ 2017-05-03  3:52   ` Rashmica Gupta
  0 siblings, 0 replies; 14+ messages in thread
From: Rashmica Gupta @ 2017-05-03  3:52 UTC (permalink / raw)
  To: Anshuman Khandual, linuxppc-dev, mpe, anton, npiggin,
	bsingharora, oohall


On 28/04/17 19:39, Anshuman Khandual wrote:
> On 04/28/2017 11:12 AM, Rashmica Gupta wrote:
>> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
> Please describe a bit about the function/feature you are trying
> to add with this new config option. The subject says " Add config
> option for removal of memory" but I guess its not related to
> memory hotplug but about hardware enabled tracing IIUC.

Correct!

> Hence
> it should have some amount of description.
>
>> ---
>>   arch/powerpc/platforms/powernv/Kconfig  | 4 ++++
>>   arch/powerpc/platforms/powernv/Makefile | 1 +
>>   2 files changed, 5 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
>> index 6a6f4ef..1b8b3a8 100644
>> --- a/arch/powerpc/platforms/powernv/Kconfig
>> +++ b/arch/powerpc/platforms/powernv/Kconfig
>> @@ -30,3 +30,7 @@ config OPAL_PRD
>>   	help
>>   	  This enables the opal-prd driver, a facility to run processor
>>   	  recovery diagnostics on OpenPower machines
>> +
>> +config HARDWARE_TRACING
> This is too generic for platform specific feature and also it does
> not intend to fit into a generic HW tracing infrastructure. IMHO
> it should be named something like "PPC64_HARDWARE_TRACING" or
> something similar.
>
>> +	bool 'Enable removal of memory for hardware memory tracing'
> If this memory is going to be taken out of memblock like normal
> memory hotplug and eventually goes away from kernel control, then
> you need to be more specific about its usage.
>

All good points! Thanks for the review.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing
  2017-04-28  9:52   ` Anshuman Khandual
@ 2017-05-03  3:52     ` Rashmica Gupta
  2017-05-03  6:08       ` Rashmica Gupta
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Rashmica Gupta @ 2017-05-03  3:52 UTC (permalink / raw)
  To: Anshuman Khandual, Anton Blanchard, linuxppc-dev, mpe, anton,
	npiggin, bsingharora, oohall

On 28/04/17 19:52, Anshuman Khandual wrote:
> On 04/28/2017 11:12 AM, Rashmica Gupta wrote:
>> Some powerpc hardware features may want to gain access to a chunk of
> What kind of features ? Please add specifics.
>
>> undisturbed real memory.  This update provides a means to unplug said memory
> Undisturbed ? Meaning part of memblock and currently inside the buddy
> allocator which we are trying to hot unplug out ?
>
>> from the kernel with a set of debugfs calls.  By writing an integer containing
>>   the size of memory to be unplugged into
> Does the size has some constraints like aligned with memblock section
> size ? LMB size ? page block size ? etc. Please add the details.

Will do.

>
>> /sys/kernel/debug/powerpc/memtrace/enable, the code will remove that much
>> memory from the end of each available chip's memory space (ie each memory node).
> <size> amount (I guess bytes in this case) of memory will be removed
> from the end of the NUMA node ? Whats the guarantee that they would be
> free at that time and not being pinned by some process ? If its not
> guaranteed to be freed, then interface description should state that
> clearly.

We start looking from the end of the NUMA node but of course there is no 
guarantee
that we will always be able to find some memory there that we are able 
to remove.

>> In addition, the means to read out the contents of the unplugged memory is also
>> provided by reading out the /sys/kernel/debug/powerpc/memtrace/<chip-id>/trace
>> file.
> All of the debugfs file interfaces added here should be documented some
> where in detail.
>
>> Signed-off-by: Anton Blanchard <anton@samba.org>
>> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
>>
>> ---
>> This requires the 'Wire up hpte_removebolted for powernv' patch.
>>
>> RFC -> v1: Added in two missing locks. Replaced the open-coded flush_memory_region() with the existing
>> flush_inval_dcache_range(start, end).
>>
>> memtrace_offline_pages() is open-coded because offline_pages is designed to be
>> called through the sysfs interface - not directly.
>>
>> We could move the offlining of pages to userspace, which removes some of this
>> open-coding. This would then require passing info to the kernel such that it
>> can then remove the memory that has been offlined. This could be done using
>> notifiers, but this isn't simple due to locking (remove_memory needs
>> mem_hotplug_begin() which the sysfs interface already has). This could also be
>> done through the debugfs interface (similar to what is done here). Either way,
>> this would require the process that needs the memory to have open-coded code
>> which it shouldn't really be involved with.
>>
>> As the current remove_memory() function requires the memory to already be
>> offlined, it makes sense to keep the offlining and removal of memory
>> functionality grouped together so that a process can simply make one request to
>> unplug some memory. Ideally there would be a kernel function we could call that
>> would offline the memory and then remove it.
>>
>>
>>   arch/powerpc/platforms/powernv/memtrace.c | 276 ++++++++++++++++++++++++++++++
>>   1 file changed, 276 insertions(+)
>>   create mode 100644 arch/powerpc/platforms/powernv/memtrace.c
>>
>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
>> new file mode 100644
>> index 0000000..86184b1
>> --- /dev/null
>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>> @@ -0,0 +1,276 @@
>> +/*
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * Copyright (C) IBM Corporation, 2014
>> + *
>> + * Author: Anton Blanchard <anton@au.ibm.com>
>> + */
>> +
>> +#define pr_fmt(fmt) "powernv-memtrace: " fmt
>> +
>> +#include <linux/bitops.h>
>> +#include <linux/string.h>
>> +#include <linux/memblock.h>
>> +#include <linux/init.h>
>> +#include <linux/moduleparam.h>
>> +#include <linux/fs.h>
>> +#include <linux/debugfs.h>
>> +#include <linux/slab.h>
>> +#include <linux/memory.h>
>> +#include <linux/memory_hotplug.h>
>> +#include <asm/machdep.h>
>> +#include <asm/debugfs.h>
>> +#include <asm/cacheflush.h>
>> +
>> +struct memtrace_entry {
>> +	void *mem;
>> +	u64 start;
>> +	u64 size;
>> +	u32 nid;
>> +	struct dentry *dir;
>> +	char name[16];
>> +};
> Little bit of description about the structure here will help.

Something like 'this enables us to keep track of the memory removed from 
each node'?

>> +
>> +static struct memtrace_entry *memtrace_array;
>> +static unsigned int memtrace_array_nr;
>> +
>> +static ssize_t memtrace_read(struct file *filp, char __user *ubuf,
>> +			     size_t count, loff_t *ppos)
>> +{
>> +	struct memtrace_entry *ent = filp->private_data;
>> +
>> +	return simple_read_from_buffer(ubuf, count, ppos, ent->mem, ent->size);
>> +}
>> +
>> +static bool valid_memtrace_range(struct memtrace_entry *dev,
>> +				 unsigned long start, unsigned long size)
>> +{
>> +	if ((dev->start <= start) &&
> Switch the position of start and dev->start above. Will make
> it easy while reading.
>
>> +	    ((start + size) <= (dev->start + dev->size)))
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>> +static int memtrace_mmap(struct file *filp, struct vm_area_struct *vma)
>> +{
>> +	unsigned long size = vma->vm_end - vma->vm_start;
>> +	struct memtrace_entry *dev = filp->private_data;
>> +
>> +	if (!valid_memtrace_range(dev, vma->vm_pgoff << PAGE_SHIFT, size))
>> +		return -EINVAL;
>> +
>> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> Why we do this ? Its coming from real RAM not IO memory. Then the page
> protection still needs changes ?

Once the memory is removed from the kernel mappings we want to mark it as
uncachable.

>> +
>> +	if (io_remap_pfn_range(vma, vma->vm_start,
>> +			       vma->vm_pgoff + (dev->start >> PAGE_SHIFT),
>> +			       size, vma->vm_page_prot))
> You can just call remap_pfn_rang() instead though they are all the same.
> There is nothing I/O here should be explicit.

Good point.

>> +		return -EAGAIN;
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct file_operations memtrace_fops = {
>> +	.llseek = default_llseek,
>> +	.read	= memtrace_read,
>> +	.mmap	= memtrace_mmap,
>> +	.open	= simple_open,
>> +};
>> +
>> +static int check_memblock_online(struct memory_block *mem, void *arg)
>> +{
>> +	if (mem->state != MEM_ONLINE)
>> +		return -1;
>> +
>> +	return 0;
>> +}
>> +
>> +static int change_memblock_state(struct memory_block *mem, void *arg)
>> +{
>> +	unsigned long state = (unsigned long)arg;
>> +
>> +	mem->state = state;
>> +	return 0;
>> +}
>> +
>> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
>> +{
>> +	u64 end_pfn = start_pfn + nr_pages - 1;
>> +
>> +	if (walk_memory_range(start_pfn, end_pfn, NULL,
>> +	    check_memblock_online))
>> +		return false;
>> +
>> +	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
>> +			  change_memblock_state);
>> +
> walk_memory_range() might be expensive, cant we just change the state
> to MEM_GOING_OFFLINE while checking the state for MEM_ONLINE during
> the first loop and bail out if any of the memblock is not in MEM_ONLINE
> in the first place.

Good idea.

>> +	mem_hotplug_begin();
>> +	if (offline_pages(start_pfn, nr_pages)) {
>> +		walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
>> +				  change_memblock_state);
>> +		mem_hotplug_done();
> Right, this can remain as is. If we fail to offline pages, mark the
> memory blocks as MEM_ONLINE again.
>
>> +		return false;
>> +	}
>> +
>> +	walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>> +			  change_memblock_state);
>> +	mem_hotplug_done();
> Right.
>
>> +
>> +	/* Clear the dcache to remove any references to the memory */
>> +	flush_inval_dcache_range((u64)__va(start_pfn << PAGE_SHIFT),
>> +				   (u64)__va(end_pfn << PAGE_SHIFT));
> I am wondering why this is required now when we dont do anything for
> cache flushing calls from core VM. If its really required now then
> it also should be required during memory hot unplug operations in
> general as well.

I could not see if this was being done when removing memory so figured
that it was better to put it in than not do it.

>
> /*
>   * No cache flushing is required when address mappings are changed,
>   * because the caches on PowerPCs are physically addressed.
>   */
> #define flush_cache_all()			do { } while (0)
> #define flush_cache_mm(mm)			do { } while (0)
> #define flush_cache_dup_mm(mm)			do { } while (0)
> #define flush_cache_range(vma, start, end)	do { } while (0)
> #define flush_cache_page(vma, vmaddr, pfn)	do { } while (0)
> #define flush_icache_page(vma, page)		do { } while (0)
> #define flush_cache_vmap(start, end)		do { } while (0)
> #define flush_cache_vunmap(start, end)		do { } while (0)
>
>
>> +
>> +	/* Now remove memory from the mappings */
>> +	lock_device_hotplug();
>> +	remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
>> +	unlock_device_hotplug();
> Right. Now we have successfully taken down the memory.
>
>> +
>> +	return true;
>> +}
>> +
>> +static u64 memtrace_alloc_node(u32 nid, u64 size)
>> +{
>> +	u64 start_pfn, end_pfn, nr_pages;
>> +	u64 base_pfn;
>> +
>> +	if (!NODE_DATA(nid) || !node_spanned_pages(nid))
>> +		return 0;
> Why NODE_DATA check is required here ? Each node should have one
> allocated and initialized by now, else we have bigger problems.
> Is there any specific reason to check for spanned pages instead
> of present/managed pages.

Anton wrote this check, so will need to confirm with him. I assume
we check node_spanned_pages() rather than node_present_pages()
because in arch/powerpc/mm/numa.c we set node_spanned_pages() and
not node_present_pages()?

>
>> +
>> +	start_pfn = node_start_pfn(nid);
>> +	end_pfn = node_end_pfn(nid);
>> +	nr_pages = size >> PAGE_SHIFT;
>> +
>> +	/* Trace memory needs to be aligned to the size */
>> +	end_pfn = round_down(end_pfn - nr_pages, nr_pages);
>> +
>> +	for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -= nr_pages) {
>> +		if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true)
>> +			return base_pfn << PAGE_SHIFT;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int memtrace_init_regions_runtime(u64 size)
>> +{
>> +	u64 m;
>> +	u32 nid;
>> +
>> +	memtrace_array = kzalloc(sizeof(struct memtrace_entry) *
>> +				num_online_nodes(), GFP_KERNEL);
>> +	if (!memtrace_array) {
>> +		pr_err("Failed to allocate memtrace_array\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	for_each_online_node(nid) {
>> +		m = memtrace_alloc_node(nid, size);
>> +		/*
>> +		 * A node might not have any local memory, so warn but
>> +		 * continue on.
>> +		 */
>> +		if (!m) {
>> +			pr_err("Failed to allocate trace memory on node %d\n",
>> +				 nid);
>> +		} else {
>> +			pr_info("Allocated trace memory on node %d at 0x%016llx\n",
>> +				 nid, m);
>> +
>> +			memtrace_array[memtrace_array_nr].start = m;
>> +			memtrace_array[memtrace_array_nr].size = size;
>> +			memtrace_array[memtrace_array_nr].nid = nid;
>> +			memtrace_array_nr++;
>> +		}
>> +	}
>> +	return 0;
>> +}
> All the pr_info() and pr_err() prints should have a "memtrace :" before the
> actual string to make it clear that its coming from this feature.
>
Good point!
>> +
>> +static struct dentry *memtrace_debugfs_dir;
>> +
>> +static int memtrace_init_debugfs(void)
>> +{
>> +	int ret = 0;
>> +	int i;
>> +
>> +	for (i = 0; i < memtrace_array_nr; i++) {
>> +		struct dentry *dir;
>> +		struct memtrace_entry *ent = &memtrace_array[i];
>> +
>> +		ent->mem = ioremap(ent->start, ent->size);
>> +		/* Warn but continue on */
>> +		if (!ent->mem) {
>> +			pr_err("Failed to map trace memory at 0x%llx\n",
>> +				 ent->start);
>> +			ret = -1;
>> +			continue;
>> +		}
>> +
>> +		snprintf(ent->name, 16, "%08x", ent->nid);
>> +		dir = debugfs_create_dir(ent->name, memtrace_debugfs_dir);
>> +		if (!dir)
>> +			return -1;
>> +
>> +		ent->dir = dir;
>> +		debugfs_create_file("trace", 0400, dir, ent, &memtrace_fops);
>> +		debugfs_create_x64("start", 0400, dir, &ent->start);
>> +		debugfs_create_x64("size", 0400, dir, &ent->size);
>> +		debugfs_create_u32("node", 0400, dir, &ent->nid);
>> +	}
> Oh okay, its creating all the four files. Please create corresponding
> to each of the files some where. Documentation/ABI/testing lists the
> actual system ABI on /sys/ not the sys/kernel/debug/ ones I guess.

I'm not exactly sure what you are saying here... Seeing that there is
documentation about debugfs files in Documentation/ABI/testing, I'll 
follow suit
and put it there.

>> +
>> +	return ret;
>> +}
>> +
>> +static u64 memtrace_size;
>> +
>> +static int memtrace_enable_set(void *data, u64 val)
>> +{
>> +	if (memtrace_size)
>> +		return -EINVAL;
>> +
>> +	if (!val)
>> +		return -EINVAL;
>> +
>> +	/* Make sure size is aligned to a memory block */
>> +	if (val & (memory_block_size_bytes()-1))
> As I have mentioned earlier, this should be mentioned in the interface
> description some where.
>
>> +		return -EINVAL;
>> +
>> +	if (memtrace_init_regions_runtime(val))
>> +		return -EINVAL;
>> +
>> +	if (memtrace_init_debugfs())
>> +		return -EINVAL;
>> +
>> +	memtrace_size = val;
>> +
>> +	return 0;
>> +}
>> +
>> +static int memtrace_enable_get(void *data, u64 *val)
>> +{
>> +	*val = memtrace_size;
>> +	return 0;
>> +}
>> +
>> +DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get, memtrace_enable_set, "0x%016llx\n");
>> +
>> +static int memtrace_init(void)
>> +{
>> +	memtrace_debugfs_dir = debugfs_create_dir("memtrace", powerpc_debugfs_root);
>> +	if (!memtrace_debugfs_dir)
>> +		return -1;
>> +
>> +	debugfs_create_file("enable", 0600, memtrace_debugfs_dir,
>> +			    NULL, &memtrace_init_fops);
>> +
>> +	return 0;
>> +}
>> +machine_device_initcall(powernv, memtrace_init);
>> +
>>
> BTW how we start the tracing process for the trace to be collected in the
> interface before we can read them ? This interface does not seem to have
> a handler. When it directs the HW to start collecting the traces ?
>
> debugfs_create_x64("start", 0400, dir, &ent->start);
>
>
I think you're asking 'what is actually going to call this code and do 
the tracing'?
There is some userspace code in the works to do this. The person who was 
writing it
is doing something else now, and I think it's a bit gross so I'm trying 
to tidy it
up a little.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing
  2017-05-03  3:52     ` Rashmica Gupta
@ 2017-05-03  6:08       ` Rashmica Gupta
  2017-05-03 11:25         ` Anshuman Khandual
  2017-05-03  6:08       ` Rashmica Gupta
  2017-05-03 11:56       ` Anshuman Khandual
  2 siblings, 1 reply; 14+ messages in thread
From: Rashmica Gupta @ 2017-05-03  6:08 UTC (permalink / raw)
  To: linuxppc-dev, Anshuman Khandual



On 03/05/17 13:52, Rashmica Gupta wrote:
> On 28/04/17 19:52, Anshuman Khandual wrote:
> ....
>>> +static int check_memblock_online(struct memory_block *mem, void *arg)
>>> +{
>>> +    if (mem->state != MEM_ONLINE)
>>> +        return -1;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int change_memblock_state(struct memory_block *mem, void *arg)
>>> +{
>>> +    unsigned long state = (unsigned long)arg;
>>> +
>>> +    mem->state = state;
>>> +    return 0;
>>> +}
>>> +
>>> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 
>>> nr_pages)
>>> +{
>>> +    u64 end_pfn = start_pfn + nr_pages - 1;
>>> +
>>> +    if (walk_memory_range(start_pfn, end_pfn, NULL,
>>> +        check_memblock_online))
>>> +        return false;
>>> +
>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
>>> +              change_memblock_state);
>>> +
>> walk_memory_range() might be expensive, cant we just change the state
>> to MEM_GOING_OFFLINE while checking the state for MEM_ONLINE during
>> the first loop and bail out if any of the memblock is not in MEM_ONLINE
>> in the first place.
>
> Good idea.
>

This is assuming that it's more likely that the state of memory will be 
MEM_ONLINE rather than anything else (if the state isn't MEM_ONLINE we 
will still have to do a second call of walk_memory_range() to revert the 
state of any memory blocks that we changed). Seems like a reasonable 
assumption to me, thoughts?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing
  2017-05-03  3:52     ` Rashmica Gupta
  2017-05-03  6:08       ` Rashmica Gupta
@ 2017-05-03  6:08       ` Rashmica Gupta
  2017-05-03 11:56       ` Anshuman Khandual
  2 siblings, 0 replies; 14+ messages in thread
From: Rashmica Gupta @ 2017-05-03  6:08 UTC (permalink / raw)
  To: linuxppc-dev, Anshuman Khandual



On 03/05/17 13:52, Rashmica Gupta wrote:
> On 28/04/17 19:52, Anshuman Khandual wrote:
> ....
>>> +static int check_memblock_online(struct memory_block *mem, void *arg)
>>> +{
>>> +    if (mem->state != MEM_ONLINE)
>>> +        return -1;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int change_memblock_state(struct memory_block *mem, void *arg)
>>> +{
>>> +    unsigned long state = (unsigned long)arg;
>>> +
>>> +    mem->state = state;
>>> +    return 0;
>>> +}
>>> +
>>> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 
>>> nr_pages)
>>> +{
>>> +    u64 end_pfn = start_pfn + nr_pages - 1;
>>> +
>>> +    if (walk_memory_range(start_pfn, end_pfn, NULL,
>>> +        check_memblock_online))
>>> +        return false;
>>> +
>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
>>> +              change_memblock_state);
>>> +
>> walk_memory_range() might be expensive, cant we just change the state
>> to MEM_GOING_OFFLINE while checking the state for MEM_ONLINE during
>> the first loop and bail out if any of the memblock is not in MEM_ONLINE
>> in the first place.
>
> Good idea.
>

This is assuming that it's more likely that the state of memory will be 
MEM_ONLINE rather than anything else (if the state isn't MEM_ONLINE we 
will still have to do a second call of walk_memory_range() to revert the 
state of any memory blocks that we changed). Seems like a reasonable 
assumption to me, thoughts?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing
  2017-05-03  6:08       ` Rashmica Gupta
@ 2017-05-03 11:25         ` Anshuman Khandual
  0 siblings, 0 replies; 14+ messages in thread
From: Anshuman Khandual @ 2017-05-03 11:25 UTC (permalink / raw)
  To: Rashmica Gupta, linuxppc-dev

On 05/03/2017 11:38 AM, Rashmica Gupta wrote:
> 
> 
> On 03/05/17 13:52, Rashmica Gupta wrote:
>> On 28/04/17 19:52, Anshuman Khandual wrote:
>> ....
>>>> +static int check_memblock_online(struct memory_block *mem, void *arg)
>>>> +{
>>>> +    if (mem->state != MEM_ONLINE)
>>>> +        return -1;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int change_memblock_state(struct memory_block *mem, void *arg)
>>>> +{
>>>> +    unsigned long state = (unsigned long)arg;
>>>> +
>>>> +    mem->state = state;
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64
>>>> nr_pages)
>>>> +{
>>>> +    u64 end_pfn = start_pfn + nr_pages - 1;
>>>> +
>>>> +    if (walk_memory_range(start_pfn, end_pfn, NULL,
>>>> +        check_memblock_online))
>>>> +        return false;
>>>> +
>>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
>>>> +              change_memblock_state);
>>>> +
>>> walk_memory_range() might be expensive, cant we just change the state
>>> to MEM_GOING_OFFLINE while checking the state for MEM_ONLINE during
>>> the first loop and bail out if any of the memblock is not in MEM_ONLINE
>>> in the first place.
>>
>> Good idea.
>>
> 
> This is assuming that it's more likely that the state of memory will be
> MEM_ONLINE rather than anything else (if the state isn't MEM_ONLINE we
> will still have to do a second call of walk_memory_range() to revert the
> state of any memory blocks that we changed). Seems like a reasonable
> assumption to me, thoughts?

Revert the state of memory blocks that we changed till the point we
discover that something is not MEM_ONLINE and when we decide to abort.
In that case we have to remember all the changes we have done till
that point for us to revert back. Lets keep it as it is.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing
  2017-05-03  3:52     ` Rashmica Gupta
  2017-05-03  6:08       ` Rashmica Gupta
  2017-05-03  6:08       ` Rashmica Gupta
@ 2017-05-03 11:56       ` Anshuman Khandual
  2017-05-09  7:06         ` Rashmica Gupta
  2 siblings, 1 reply; 14+ messages in thread
From: Anshuman Khandual @ 2017-05-03 11:56 UTC (permalink / raw)
  To: Rashmica Gupta, Anton Blanchard, linuxppc-dev, mpe, npiggin,
	bsingharora, oohall

On 05/03/2017 09:22 AM, Rashmica Gupta wrote:
> On 28/04/17 19:52, Anshuman Khandual wrote:
>> On 04/28/2017 11:12 AM, Rashmica Gupta wrote:
>>> Some powerpc hardware features may want to gain access to a chunk of
>> What kind of features ? Please add specifics.
>>
>>> undisturbed real memory.  This update provides a means to unplug said
>>> memory
>> Undisturbed ? Meaning part of memblock and currently inside the buddy
>> allocator which we are trying to hot unplug out ?
>>
>>> from the kernel with a set of debugfs calls.  By writing an integer
>>> containing
>>>   the size of memory to be unplugged into
>> Does the size has some constraints like aligned with memblock section
>> size ? LMB size ? page block size ? etc. Please add the details.
> 
> Will do.
> 
>>
>>> /sys/kernel/debug/powerpc/memtrace/enable, the code will remove that
>>> much
>>> memory from the end of each available chip's memory space (ie each
>>> memory node).
>> <size> amount (I guess bytes in this case) of memory will be removed
>> from the end of the NUMA node ? Whats the guarantee that they would be
>> free at that time and not being pinned by some process ? If its not
>> guaranteed to be freed, then interface description should state that
>> clearly.
> 
> We start looking from the end of the NUMA node but of course there is no
> guarantee
> that we will always be able to find some memory there that we are able
> to remove.


Okay. Do we have interface for giving this memory back to the buddy
allocator again when we are done with HW tracing ? If not we need to
add one.

> 
>>> In addition, the means to read out the contents of the unplugged
>>> memory is also
>>> provided by reading out the
>>> /sys/kernel/debug/powerpc/memtrace/<chip-id>/trace
>>> file.
>> All of the debugfs file interfaces added here should be documented some
>> where in detail.
>>
>>> Signed-off-by: Anton Blanchard <anton@samba.org>
>>> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
>>>
>>> ---
>>> This requires the 'Wire up hpte_removebolted for powernv' patch.
>>>
>>> RFC -> v1: Added in two missing locks. Replaced the open-coded
>>> flush_memory_region() with the existing
>>> flush_inval_dcache_range(start, end).
>>>
>>> memtrace_offline_pages() is open-coded because offline_pages is
>>> designed to be
>>> called through the sysfs interface - not directly.
>>>
>>> We could move the offlining of pages to userspace, which removes some
>>> of this
>>> open-coding. This would then require passing info to the kernel such
>>> that it
>>> can then remove the memory that has been offlined. This could be done
>>> using
>>> notifiers, but this isn't simple due to locking (remove_memory needs
>>> mem_hotplug_begin() which the sysfs interface already has). This
>>> could also be
>>> done through the debugfs interface (similar to what is done here).
>>> Either way,
>>> this would require the process that needs the memory to have
>>> open-coded code
>>> which it shouldn't really be involved with.
>>>
>>> As the current remove_memory() function requires the memory to
>>> already be
>>> offlined, it makes sense to keep the offlining and removal of memory
>>> functionality grouped together so that a process can simply make one
>>> request to
>>> unplug some memory. Ideally there would be a kernel function we could
>>> call that
>>> would offline the memory and then remove it.
>>>
>>>
>>>   arch/powerpc/platforms/powernv/memtrace.c | 276
>>> ++++++++++++++++++++++++++++++
>>>   1 file changed, 276 insertions(+)
>>>   create mode 100644 arch/powerpc/platforms/powernv/memtrace.c
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c
>>> b/arch/powerpc/platforms/powernv/memtrace.c
>>> new file mode 100644
>>> index 0000000..86184b1
>>> --- /dev/null
>>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>>> @@ -0,0 +1,276 @@
>>> +/*
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * Copyright (C) IBM Corporation, 2014
>>> + *
>>> + * Author: Anton Blanchard <anton@au.ibm.com>
>>> + */
>>> +
>>> +#define pr_fmt(fmt) "powernv-memtrace: " fmt
>>> +
>>> +#include <linux/bitops.h>
>>> +#include <linux/string.h>
>>> +#include <linux/memblock.h>
>>> +#include <linux/init.h>
>>> +#include <linux/moduleparam.h>
>>> +#include <linux/fs.h>
>>> +#include <linux/debugfs.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/memory.h>
>>> +#include <linux/memory_hotplug.h>
>>> +#include <asm/machdep.h>
>>> +#include <asm/debugfs.h>
>>> +#include <asm/cacheflush.h>
>>> +
>>> +struct memtrace_entry {
>>> +    void *mem;
>>> +    u64 start;
>>> +    u64 size;
>>> +    u32 nid;
>>> +    struct dentry *dir;
>>> +    char name[16];
>>> +};
>> Little bit of description about the structure here will help.
> 
> Something like 'this enables us to keep track of the memory removed from
> each node'?

Right, something like that.

> 
>>> +
>>> +static struct memtrace_entry *memtrace_array;
>>> +static unsigned int memtrace_array_nr;
>>> +
>>> +static ssize_t memtrace_read(struct file *filp, char __user *ubuf,
>>> +                 size_t count, loff_t *ppos)
>>> +{
>>> +    struct memtrace_entry *ent = filp->private_data;
>>> +
>>> +    return simple_read_from_buffer(ubuf, count, ppos, ent->mem,
>>> ent->size);
>>> +}
>>> +
>>> +static bool valid_memtrace_range(struct memtrace_entry *dev,
>>> +                 unsigned long start, unsigned long size)
>>> +{
>>> +    if ((dev->start <= start) &&
>> Switch the position of start and dev->start above. Will make
>> it easy while reading.
>>
>>> +        ((start + size) <= (dev->start + dev->size)))
>>> +        return true;
>>> +
>>> +    return false;
>>> +}
>>> +
>>> +static int memtrace_mmap(struct file *filp, struct vm_area_struct *vma)
>>> +{
>>> +    unsigned long size = vma->vm_end - vma->vm_start;
>>> +    struct memtrace_entry *dev = filp->private_data;
>>> +
>>> +    if (!valid_memtrace_range(dev, vma->vm_pgoff << PAGE_SHIFT, size))
>>> +        return -EINVAL;
>>> +
>>> +    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>> Why we do this ? Its coming from real RAM not IO memory. Then the page
>> protection still needs changes ?
> 
> Once the memory is removed from the kernel mappings we want to mark it as
> uncachable.

Got it but why ? Uncachable marking are for pages which will be mapped
to IO ranges which should not be cached just to prevent the possibility
of stale data.

> 
>>> +
>>> +    if (io_remap_pfn_range(vma, vma->vm_start,
>>> +                   vma->vm_pgoff + (dev->start >> PAGE_SHIFT),
>>> +                   size, vma->vm_page_prot))
>> You can just call remap_pfn_rang() instead though they are all the same.
>> There is nothing I/O here should be explicit.
> 
> Good point.
> 
>>> +        return -EAGAIN;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static const struct file_operations memtrace_fops = {
>>> +    .llseek = default_llseek,
>>> +    .read    = memtrace_read,
>>> +    .mmap    = memtrace_mmap,
>>> +    .open    = simple_open,
>>> +};
>>> +
>>> +static int check_memblock_online(struct memory_block *mem, void *arg)
>>> +{
>>> +    if (mem->state != MEM_ONLINE)
>>> +        return -1;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int change_memblock_state(struct memory_block *mem, void *arg)
>>> +{
>>> +    unsigned long state = (unsigned long)arg;
>>> +
>>> +    mem->state = state;
>>> +    return 0;
>>> +}
>>> +
>>> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64
>>> nr_pages)
>>> +{
>>> +    u64 end_pfn = start_pfn + nr_pages - 1;
>>> +
>>> +    if (walk_memory_range(start_pfn, end_pfn, NULL,
>>> +        check_memblock_online))
>>> +        return false;
>>> +
>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
>>> +              change_memblock_state);
>>> +
>> walk_memory_range() might be expensive, cant we just change the state
>> to MEM_GOING_OFFLINE while checking the state for MEM_ONLINE during
>> the first loop and bail out if any of the memblock is not in MEM_ONLINE
>> in the first place.
> 
> Good idea.
> 
>>> +    mem_hotplug_begin();
>>> +    if (offline_pages(start_pfn, nr_pages)) {
>>> +        walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
>>> +                  change_memblock_state);
>>> +        mem_hotplug_done();
>> Right, this can remain as is. If we fail to offline pages, mark the
>> memory blocks as MEM_ONLINE again.
>>
>>> +        return false;
>>> +    }
>>> +
>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>>> +              change_memblock_state);
>>> +    mem_hotplug_done();
>> Right.
>>
>>> +
>>> +    /* Clear the dcache to remove any references to the memory */
>>> +    flush_inval_dcache_range((u64)__va(start_pfn << PAGE_SHIFT),
>>> +                   (u64)__va(end_pfn << PAGE_SHIFT));
>> I am wondering why this is required now when we dont do anything for
>> cache flushing calls from core VM. If its really required now then
>> it also should be required during memory hot unplug operations in
>> general as well.
> 
> I could not see if this was being done when removing memory so figured
> that it was better to put it in than not do it.

Looking at the definitions I had pointed out before which gets
called from core VM, powerpc does not need to do anything specific
for cache invalidation or flushing. But I am not really sure on
this. So let it be.

> 
>>
>> /*
>>   * No cache flushing is required when address mappings are changed,
>>   * because the caches on PowerPCs are physically addressed.
>>   */
>> #define flush_cache_all()            do { } while (0)
>> #define flush_cache_mm(mm)            do { } while (0)
>> #define flush_cache_dup_mm(mm)            do { } while (0)
>> #define flush_cache_range(vma, start, end)    do { } while (0)
>> #define flush_cache_page(vma, vmaddr, pfn)    do { } while (0)
>> #define flush_icache_page(vma, page)        do { } while (0)
>> #define flush_cache_vmap(start, end)        do { } while (0)
>> #define flush_cache_vunmap(start, end)        do { } while (0)
>>
>>
>>> +
>>> +    /* Now remove memory from the mappings */
>>> +    lock_device_hotplug();
>>> +    remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages <<
>>> PAGE_SHIFT);
>>> +    unlock_device_hotplug();
>> Right. Now we have successfully taken down the memory.
>>
>>> +
>>> +    return true;
>>> +}
>>> +
>>> +static u64 memtrace_alloc_node(u32 nid, u64 size)
>>> +{
>>> +    u64 start_pfn, end_pfn, nr_pages;
>>> +    u64 base_pfn;
>>> +
>>> +    if (!NODE_DATA(nid) || !node_spanned_pages(nid))
>>> +        return 0;
>> Why NODE_DATA check is required here ? Each node should have one
>> allocated and initialized by now, else we have bigger problems.
>> Is there any specific reason to check for spanned pages instead
>> of present/managed pages.
> 
> Anton wrote this check, so will need to confirm with him. I assume
> we check node_spanned_pages() rather than node_present_pages()
> because in arch/powerpc/mm/numa.c we set node_spanned_pages() and
> not node_present_pages()?

I guess any thing is okay but NODE_DATA() seems redundant though.

struct pglist_data {
        ..........................
	unsigned long node_present_pages; /* total number of physical pages */
	unsigned long node_spanned_pages; /* total size of physical page
					     range, including holes */

}

>>
>>> +
>>> +    start_pfn = node_start_pfn(nid);
>>> +    end_pfn = node_end_pfn(nid);
>>> +    nr_pages = size >> PAGE_SHIFT;
>>> +
>>> +    /* Trace memory needs to be aligned to the size */
>>> +    end_pfn = round_down(end_pfn - nr_pages, nr_pages);
>>> +
>>> +    for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -=
>>> nr_pages) {
>>> +        if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true)
>>> +            return base_pfn << PAGE_SHIFT;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int memtrace_init_regions_runtime(u64 size)
>>> +{
>>> +    u64 m;
>>> +    u32 nid;
>>> +
>>> +    memtrace_array = kzalloc(sizeof(struct memtrace_entry) *
>>> +                num_online_nodes(), GFP_KERNEL);
>>> +    if (!memtrace_array) {
>>> +        pr_err("Failed to allocate memtrace_array\n");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    for_each_online_node(nid) {
>>> +        m = memtrace_alloc_node(nid, size);
>>> +        /*
>>> +         * A node might not have any local memory, so warn but
>>> +         * continue on.
>>> +         */
>>> +        if (!m) {
>>> +            pr_err("Failed to allocate trace memory on node %d\n",
>>> +                 nid);
>>> +        } else {
>>> +            pr_info("Allocated trace memory on node %d at 0x%016llx\n",
>>> +                 nid, m);
>>> +
>>> +            memtrace_array[memtrace_array_nr].start = m;
>>> +            memtrace_array[memtrace_array_nr].size = size;
>>> +            memtrace_array[memtrace_array_nr].nid = nid;
>>> +            memtrace_array_nr++;
>>> +        }
>>> +    }
>>> +    return 0;
>>> +}
>> All the pr_info() and pr_err() prints should have a "memtrace :"
>> before the
>> actual string to make it clear that its coming from this feature.
>>
> Good point!
>>> +
>>> +static struct dentry *memtrace_debugfs_dir;
>>> +
>>> +static int memtrace_init_debugfs(void)
>>> +{
>>> +    int ret = 0;
>>> +    int i;
>>> +
>>> +    for (i = 0; i < memtrace_array_nr; i++) {
>>> +        struct dentry *dir;
>>> +        struct memtrace_entry *ent = &memtrace_array[i];
>>> +
>>> +        ent->mem = ioremap(ent->start, ent->size);
>>> +        /* Warn but continue on */
>>> +        if (!ent->mem) {
>>> +            pr_err("Failed to map trace memory at 0x%llx\n",
>>> +                 ent->start);
>>> +            ret = -1;
>>> +            continue;
>>> +        }
>>> +
>>> +        snprintf(ent->name, 16, "%08x", ent->nid);
>>> +        dir = debugfs_create_dir(ent->name, memtrace_debugfs_dir);
>>> +        if (!dir)
>>> +            return -1;
>>> +
>>> +        ent->dir = dir;
>>> +        debugfs_create_file("trace", 0400, dir, ent, &memtrace_fops);
>>> +        debugfs_create_x64("start", 0400, dir, &ent->start);
>>> +        debugfs_create_x64("size", 0400, dir, &ent->size);
>>> +        debugfs_create_u32("node", 0400, dir, &ent->nid);
>>> +    }
>> Oh okay, its creating all the four files. Please create corresponding
>> to each of the files some where. Documentation/ABI/testing lists the
>> actual system ABI on /sys/ not the sys/kernel/debug/ ones I guess.
> 
> I'm not exactly sure what you are saying here... Seeing that there is
> documentation about debugfs files in Documentation/ABI/testing, I'll
> follow suit
> and put it there.

I meant the same.

> 
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +static u64 memtrace_size;
>>> +
>>> +static int memtrace_enable_set(void *data, u64 val)
>>> +{
>>> +    if (memtrace_size)
>>> +        return -EINVAL;
>>> +
>>> +    if (!val)
>>> +        return -EINVAL;
>>> +
>>> +    /* Make sure size is aligned to a memory block */
>>> +    if (val & (memory_block_size_bytes()-1))
>> As I have mentioned earlier, this should be mentioned in the interface
>> description some where.
>>
>>> +        return -EINVAL;
>>> +
>>> +    if (memtrace_init_regions_runtime(val))
>>> +        return -EINVAL;
>>> +
>>> +    if (memtrace_init_debugfs())
>>> +        return -EINVAL;
>>> +
>>> +    memtrace_size = val;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int memtrace_enable_get(void *data, u64 *val)
>>> +{
>>> +    *val = memtrace_size;
>>> +    return 0;
>>> +}
>>> +
>>> +DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
>>> memtrace_enable_set, "0x%016llx\n");
>>> +
>>> +static int memtrace_init(void)
>>> +{
>>> +    memtrace_debugfs_dir = debugfs_create_dir("memtrace",
>>> powerpc_debugfs_root);
>>> +    if (!memtrace_debugfs_dir)
>>> +        return -1;
>>> +
>>> +    debugfs_create_file("enable", 0600, memtrace_debugfs_dir,
>>> +                NULL, &memtrace_init_fops);
>>> +
>>> +    return 0;
>>> +}
>>> +machine_device_initcall(powernv, memtrace_init);
>>> +
>>>
>> BTW how we start the tracing process for the trace to be collected in the
>> interface before we can read them ? This interface does not seem to have
>> a handler. When it directs the HW to start collecting the traces ?
>>
>> debugfs_create_x64("start", 0400, dir, &ent->start);
>>
>>
> I think you're asking 'what is actually going to call this code and do
> the tracing'?

No, when you call this interface, where is the routine to start the
actual tracing invoking appropriate platform functions or HW
instructions ? I dont see such a function associated with 'start'
interface mentioned above.


> There is some userspace code in the works to do this. The person who was
> writing it
> is doing something else now, and I think it's a bit gross so I'm trying
> to tidy it
> up a little.

Okay, please do provide some pointers when the code is ready which will
help in better understanding of this interface.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing
  2017-05-03 11:56       ` Anshuman Khandual
@ 2017-05-09  7:06         ` Rashmica Gupta
  2017-05-14  4:55           ` Anshuman Khandual
  0 siblings, 1 reply; 14+ messages in thread
From: Rashmica Gupta @ 2017-05-09  7:06 UTC (permalink / raw)
  To: Anshuman Khandual, Rashmica Gupta, Anton Blanchard, linuxppc-dev,
	mpe, npiggin, bsingharora, oohall

Sorry for the late reply, I somehow missed this.


On 03/05/17 21:56, Anshuman Khandual wrote:
> On 05/03/2017 09:22 AM, Rashmica Gupta wrote:
>> On 28/04/17 19:52, Anshuman Khandual wrote:
>>> On 04/28/2017 11:12 AM, Rashmica Gupta wrote:
>>>> Some powerpc hardware features may want to gain access to a chunk of
>>> What kind of features ? Please add specifics.
>>>
>>>> undisturbed real memory.  This update provides a means to unplug said
>>>> memory
>>> Undisturbed ? Meaning part of memblock and currently inside the buddy
>>> allocator which we are trying to hot unplug out ?
>>>
>>>> from the kernel with a set of debugfs calls.  By writing an integer
>>>> containing
>>>>    the size of memory to be unplugged into
>>> Does the size has some constraints like aligned with memblock section
>>> size ? LMB size ? page block size ? etc. Please add the details.
>> Will do.
>>
>>>> /sys/kernel/debug/powerpc/memtrace/enable, the code will remove that
>>>> much
>>>> memory from the end of each available chip's memory space (ie each
>>>> memory node).
>>> <size> amount (I guess bytes in this case) of memory will be removed
>>> from the end of the NUMA node ? Whats the guarantee that they would be
>>> free at that time and not being pinned by some process ? If its not
>>> guaranteed to be freed, then interface description should state that
>>> clearly.
>> We start looking from the end of the NUMA node but of course there is no
>> guarantee
>> that we will always be able to find some memory there that we are able
>> to remove.
>
> Okay. Do we have interface for giving this memory back to the buddy
> allocator again when we are done with HW tracing ? If not we need to
> add one.

Not at the moment. Last time I spoke to Anton he said something along 
the lines
of it not being too important as if you are getting the hardware traces 
for debugging
purposes you are probably not worried about a bit of memory being out of 
action.

However I can't see why having an interface to online the memory would 
be a bad thing,
so I'll look into it.

>>>> In addition, the means to read out the contents of the unplugged
>>>> memory is also
>>>> provided by reading out the
>>>> /sys/kernel/debug/powerpc/memtrace/<chip-id>/trace
>>>> file.
>>> All of the debugfs file interfaces added here should be documented some
>>> where in detail.
>>>
>>>> Signed-off-by: Anton Blanchard <anton@samba.org>
>>>> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
>>>>
>>>> ---
>>>> This requires the 'Wire up hpte_removebolted for powernv' patch.
>>>>
>>>> RFC -> v1: Added in two missing locks. Replaced the open-coded
>>>> flush_memory_region() with the existing
>>>> flush_inval_dcache_range(start, end).
>>>>
>>>> memtrace_offline_pages() is open-coded because offline_pages is
>>>> designed to be
>>>> called through the sysfs interface - not directly.
>>>>
>>>> We could move the offlining of pages to userspace, which removes some
>>>> of this
>>>> open-coding. This would then require passing info to the kernel such
>>>> that it
>>>> can then remove the memory that has been offlined. This could be done
>>>> using
>>>> notifiers, but this isn't simple due to locking (remove_memory needs
>>>> mem_hotplug_begin() which the sysfs interface already has). This
>>>> could also be
>>>> done through the debugfs interface (similar to what is done here).
>>>> Either way,
>>>> this would require the process that needs the memory to have
>>>> open-coded code
>>>> which it shouldn't really be involved with.
>>>>
>>>> As the current remove_memory() function requires the memory to
>>>> already be
>>>> offlined, it makes sense to keep the offlining and removal of memory
>>>> functionality grouped together so that a process can simply make one
>>>> request to
>>>> unplug some memory. Ideally there would be a kernel function we could
>>>> call that
>>>> would offline the memory and then remove it.
>>>>
>>>>
>>>>    arch/powerpc/platforms/powernv/memtrace.c | 276
>>>> ++++++++++++++++++++++++++++++
>>>>    1 file changed, 276 insertions(+)
>>>>    create mode 100644 arch/powerpc/platforms/powernv/memtrace.c
>>>>
>>>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c
>>>> b/arch/powerpc/platforms/powernv/memtrace.c
>>>> new file mode 100644
>>>> index 0000000..86184b1
>>>> --- /dev/null
>>>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>>>> @@ -0,0 +1,276 @@
>>>> +/*
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License as published by
>>>> + * the Free Software Foundation; either version 2 of the License, or
>>>> + * (at your option) any later version.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * Copyright (C) IBM Corporation, 2014
>>>> + *
>>>> + * Author: Anton Blanchard <anton@au.ibm.com>
>>>> + */
>>>> +
>>>> +#define pr_fmt(fmt) "powernv-memtrace: " fmt
>>>> +
>>>> +#include <linux/bitops.h>
>>>> +#include <linux/string.h>
>>>> +#include <linux/memblock.h>
>>>> +#include <linux/init.h>
>>>> +#include <linux/moduleparam.h>
>>>> +#include <linux/fs.h>
>>>> +#include <linux/debugfs.h>
>>>> +#include <linux/slab.h>
>>>> +#include <linux/memory.h>
>>>> +#include <linux/memory_hotplug.h>
>>>> +#include <asm/machdep.h>
>>>> +#include <asm/debugfs.h>
>>>> +#include <asm/cacheflush.h>
>>>> +
>>>> +struct memtrace_entry {
>>>> +    void *mem;
>>>> +    u64 start;
>>>> +    u64 size;
>>>> +    u32 nid;
>>>> +    struct dentry *dir;
>>>> +    char name[16];
>>>> +};
>>> Little bit of description about the structure here will help.
>> Something like 'this enables us to keep track of the memory removed from
>> each node'?
> Right, something like that.
>
>>>> +
>>>> +static struct memtrace_entry *memtrace_array;
>>>> +static unsigned int memtrace_array_nr;
>>>> +
>>>> +static ssize_t memtrace_read(struct file *filp, char __user *ubuf,
>>>> +                 size_t count, loff_t *ppos)
>>>> +{
>>>> +    struct memtrace_entry *ent = filp->private_data;
>>>> +
>>>> +    return simple_read_from_buffer(ubuf, count, ppos, ent->mem,
>>>> ent->size);
>>>> +}
>>>> +
>>>> +static bool valid_memtrace_range(struct memtrace_entry *dev,
>>>> +                 unsigned long start, unsigned long size)
>>>> +{
>>>> +    if ((dev->start <= start) &&
>>> Switch the position of start and dev->start above. Will make
>>> it easy while reading.
>>>
>>>> +        ((start + size) <= (dev->start + dev->size)))
>>>> +        return true;
>>>> +
>>>> +    return false;
>>>> +}
>>>> +
>>>> +static int memtrace_mmap(struct file *filp, struct vm_area_struct *vma)
>>>> +{
>>>> +    unsigned long size = vma->vm_end - vma->vm_start;
>>>> +    struct memtrace_entry *dev = filp->private_data;
>>>> +
>>>> +    if (!valid_memtrace_range(dev, vma->vm_pgoff << PAGE_SHIFT, size))
>>>> +        return -EINVAL;
>>>> +
>>>> +    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>>> Why we do this ? Its coming from real RAM not IO memory. Then the page
>>> protection still needs changes ?
>> Once the memory is removed from the kernel mappings we want to mark it as
>> uncachable.
> Got it but why ? Uncachable marking are for pages which will be mapped
> to IO ranges which should not be cached just to prevent the possibility
> of stale data.
>>>> +
>>>> +    if (io_remap_pfn_range(vma, vma->vm_start,
>>>> +                   vma->vm_pgoff + (dev->start >> PAGE_SHIFT),
>>>> +                   size, vma->vm_page_prot))
>>> You can just call remap_pfn_rang() instead though they are all the same.
>>> There is nothing I/O here should be explicit.
>> Good point.
>>
>>>> +        return -EAGAIN;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static const struct file_operations memtrace_fops = {
>>>> +    .llseek = default_llseek,
>>>> +    .read    = memtrace_read,
>>>> +    .mmap    = memtrace_mmap,
>>>> +    .open    = simple_open,
>>>> +};
>>>> +
>>>> +static int check_memblock_online(struct memory_block *mem, void *arg)
>>>> +{
>>>> +    if (mem->state != MEM_ONLINE)
>>>> +        return -1;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int change_memblock_state(struct memory_block *mem, void *arg)
>>>> +{
>>>> +    unsigned long state = (unsigned long)arg;
>>>> +
>>>> +    mem->state = state;
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64
>>>> nr_pages)
>>>> +{
>>>> +    u64 end_pfn = start_pfn + nr_pages - 1;
>>>> +
>>>> +    if (walk_memory_range(start_pfn, end_pfn, NULL,
>>>> +        check_memblock_online))
>>>> +        return false;
>>>> +
>>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
>>>> +              change_memblock_state);
>>>> +
>>> walk_memory_range() might be expensive, cant we just change the state
>>> to MEM_GOING_OFFLINE while checking the state for MEM_ONLINE during
>>> the first loop and bail out if any of the memblock is not in MEM_ONLINE
>>> in the first place.
>> Good idea.
>>
>>>> +    mem_hotplug_begin();
>>>> +    if (offline_pages(start_pfn, nr_pages)) {
>>>> +        walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
>>>> +                  change_memblock_state);
>>>> +        mem_hotplug_done();
>>> Right, this can remain as is. If we fail to offline pages, mark the
>>> memory blocks as MEM_ONLINE again.
>>>
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>>>> +              change_memblock_state);
>>>> +    mem_hotplug_done();
>>> Right.
>>>
>>>> +
>>>> +    /* Clear the dcache to remove any references to the memory */
>>>> +    flush_inval_dcache_range((u64)__va(start_pfn << PAGE_SHIFT),
>>>> +                   (u64)__va(end_pfn << PAGE_SHIFT));
>>> I am wondering why this is required now when we dont do anything for
>>> cache flushing calls from core VM. If its really required now then
>>> it also should be required during memory hot unplug operations in
>>> general as well.
>> I could not see if this was being done when removing memory so figured
>> that it was better to put it in than not do it.
> Looking at the definitions I had pointed out before which gets
> called from core VM, powerpc does not need to do anything specific
> for cache invalidation or flushing. But I am not really sure on
> this. So let it be.
>
>>> /*
>>>    * No cache flushing is required when address mappings are changed,
>>>    * because the caches on PowerPCs are physically addressed.
>>>    */
>>> #define flush_cache_all()            do { } while (0)
>>> #define flush_cache_mm(mm)            do { } while (0)
>>> #define flush_cache_dup_mm(mm)            do { } while (0)
>>> #define flush_cache_range(vma, start, end)    do { } while (0)
>>> #define flush_cache_page(vma, vmaddr, pfn)    do { } while (0)
>>> #define flush_icache_page(vma, page)        do { } while (0)
>>> #define flush_cache_vmap(start, end)        do { } while (0)
>>> #define flush_cache_vunmap(start, end)        do { } while (0)
>>>
>>>
>>>> +
>>>> +    /* Now remove memory from the mappings */
>>>> +    lock_device_hotplug();
>>>> +    remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages <<
>>>> PAGE_SHIFT);
>>>> +    unlock_device_hotplug();
>>> Right. Now we have successfully taken down the memory.
>>>
>>>> +
>>>> +    return true;
>>>> +}
>>>> +
>>>> +static u64 memtrace_alloc_node(u32 nid, u64 size)
>>>> +{
>>>> +    u64 start_pfn, end_pfn, nr_pages;
>>>> +    u64 base_pfn;
>>>> +
>>>> +    if (!NODE_DATA(nid) || !node_spanned_pages(nid))
>>>> +        return 0;
>>> Why NODE_DATA check is required here ? Each node should have one
>>> allocated and initialized by now, else we have bigger problems.
>>> Is there any specific reason to check for spanned pages instead
>>> of present/managed pages.
>> Anton wrote this check, so will need to confirm with him. I assume
>> we check node_spanned_pages() rather than node_present_pages()
>> because in arch/powerpc/mm/numa.c we set node_spanned_pages() and
>> not node_present_pages()?
> I guess any thing is okay but NODE_DATA() seems redundant though.

Agreed.

>
> struct pglist_data {
>          ..........................
> 	unsigned long node_present_pages; /* total number of physical pages */
> 	unsigned long node_spanned_pages; /* total size of physical page
> 					     range, including holes */
>
> }
>
>>>> +
>>>> +    start_pfn = node_start_pfn(nid);
>>>> +    end_pfn = node_end_pfn(nid);
>>>> +    nr_pages = size >> PAGE_SHIFT;
>>>> +
>>>> +    /* Trace memory needs to be aligned to the size */
>>>> +    end_pfn = round_down(end_pfn - nr_pages, nr_pages);
>>>> +
>>>> +    for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -=
>>>> nr_pages) {
>>>> +        if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true)
>>>> +            return base_pfn << PAGE_SHIFT;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int memtrace_init_regions_runtime(u64 size)
>>>> +{
>>>> +    u64 m;
>>>> +    u32 nid;
>>>> +
>>>> +    memtrace_array = kzalloc(sizeof(struct memtrace_entry) *
>>>> +                num_online_nodes(), GFP_KERNEL);
>>>> +    if (!memtrace_array) {
>>>> +        pr_err("Failed to allocate memtrace_array\n");
>>>> +        return -EINVAL;
>>>> +    }
>>>> +
>>>> +    for_each_online_node(nid) {
>>>> +        m = memtrace_alloc_node(nid, size);
>>>> +        /*
>>>> +         * A node might not have any local memory, so warn but
>>>> +         * continue on.
>>>> +         */
>>>> +        if (!m) {
>>>> +            pr_err("Failed to allocate trace memory on node %d\n",
>>>> +                 nid);
>>>> +        } else {
>>>> +            pr_info("Allocated trace memory on node %d at 0x%016llx\n",
>>>> +                 nid, m);
>>>> +
>>>> +            memtrace_array[memtrace_array_nr].start = m;
>>>> +            memtrace_array[memtrace_array_nr].size = size;
>>>> +            memtrace_array[memtrace_array_nr].nid = nid;
>>>> +            memtrace_array_nr++;
>>>> +        }
>>>> +    }
>>>> +    return 0;
>>>> +}
>>> All the pr_info() and pr_err() prints should have a "memtrace :"
>>> before the
>>> actual string to make it clear that its coming from this feature.
>>>
>> Good point!
>>>> +
>>>> +static struct dentry *memtrace_debugfs_dir;
>>>> +
>>>> +static int memtrace_init_debugfs(void)
>>>> +{
>>>> +    int ret = 0;
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i < memtrace_array_nr; i++) {
>>>> +        struct dentry *dir;
>>>> +        struct memtrace_entry *ent = &memtrace_array[i];
>>>> +
>>>> +        ent->mem = ioremap(ent->start, ent->size);
>>>> +        /* Warn but continue on */
>>>> +        if (!ent->mem) {
>>>> +            pr_err("Failed to map trace memory at 0x%llx\n",
>>>> +                 ent->start);
>>>> +            ret = -1;
>>>> +            continue;
>>>> +        }
>>>> +
>>>> +        snprintf(ent->name, 16, "%08x", ent->nid);
>>>> +        dir = debugfs_create_dir(ent->name, memtrace_debugfs_dir);
>>>> +        if (!dir)
>>>> +            return -1;
>>>> +
>>>> +        ent->dir = dir;
>>>> +        debugfs_create_file("trace", 0400, dir, ent, &memtrace_fops);
>>>> +        debugfs_create_x64("start", 0400, dir, &ent->start);
>>>> +        debugfs_create_x64("size", 0400, dir, &ent->size);
>>>> +        debugfs_create_u32("node", 0400, dir, &ent->nid);
>>>> +    }
>>> Oh okay, its creating all the four files. Please create corresponding
>>> to each of the files some where. Documentation/ABI/testing lists the
>>> actual system ABI on /sys/ not the sys/kernel/debug/ ones I guess.
>> I'm not exactly sure what you are saying here... Seeing that there is
>> documentation about debugfs files in Documentation/ABI/testing, I'll
>> follow suit
>> and put it there.
> I meant the same.
>
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static u64 memtrace_size;
>>>> +
>>>> +static int memtrace_enable_set(void *data, u64 val)
>>>> +{
>>>> +    if (memtrace_size)
>>>> +        return -EINVAL;
>>>> +
>>>> +    if (!val)
>>>> +        return -EINVAL;
>>>> +
>>>> +    /* Make sure size is aligned to a memory block */
>>>> +    if (val & (memory_block_size_bytes()-1))
>>> As I have mentioned earlier, this should be mentioned in the interface
>>> description some where.
>>>
>>>> +        return -EINVAL;
>>>> +
>>>> +    if (memtrace_init_regions_runtime(val))
>>>> +        return -EINVAL;
>>>> +
>>>> +    if (memtrace_init_debugfs())
>>>> +        return -EINVAL;
>>>> +
>>>> +    memtrace_size = val;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int memtrace_enable_get(void *data, u64 *val)
>>>> +{
>>>> +    *val = memtrace_size;
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
>>>> memtrace_enable_set, "0x%016llx\n");
>>>> +
>>>> +static int memtrace_init(void)
>>>> +{
>>>> +    memtrace_debugfs_dir = debugfs_create_dir("memtrace",
>>>> powerpc_debugfs_root);
>>>> +    if (!memtrace_debugfs_dir)
>>>> +        return -1;
>>>> +
>>>> +    debugfs_create_file("enable", 0600, memtrace_debugfs_dir,
>>>> +                NULL, &memtrace_init_fops);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +machine_device_initcall(powernv, memtrace_init);
>>>> +
>>>>
>>> BTW how we start the tracing process for the trace to be collected in the
>>> interface before we can read them ? This interface does not seem to have
>>> a handler. When it directs the HW to start collecting the traces ?
>>>
>>> debugfs_create_x64("start", 0400, dir, &ent->start);
>>>
>>>
>> I think you're asking 'what is actually going to call this code and do
>> the tracing'?
> No, when you call this interface, where is the routine to start the
> actual tracing invoking appropriate platform functions or HW
> instructions ? I dont see such a function associated with 'start'
> interface mentioned above.

Essentially,
DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
memtrace_enable_set, "0x%016llx\n");

means that when a number is written to the memtrace/enable file, the 
number is
read as a u64 and passed to memtrace_enable_set()

>
>> There is some userspace code in the works to do this. The person who was
>> writing it
>> is doing something else now, and I think it's a bit gross so I'm trying
>> to tidy it
>> up a little.
> Okay, please do provide some pointers when the code is ready which will
> help in better understanding of this interface.
>
Will do!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing
  2017-05-09  7:06         ` Rashmica Gupta
@ 2017-05-14  4:55           ` Anshuman Khandual
  2017-05-15  3:34             ` Rashmica Gupta
  0 siblings, 1 reply; 14+ messages in thread
From: Anshuman Khandual @ 2017-05-14  4:55 UTC (permalink / raw)
  To: Rashmica Gupta, Anton Blanchard, linuxppc-dev, mpe, npiggin,
	bsingharora, oohall

On 05/09/2017 12:36 PM, Rashmica Gupta wrote:
> Sorry for the late reply, I somehow missed this.
> 
> 
> On 03/05/17 21:56, Anshuman Khandual wrote:
>> On 05/03/2017 09:22 AM, Rashmica Gupta wrote:
>>> On 28/04/17 19:52, Anshuman Khandual wrote:
>>>> On 04/28/2017 11:12 AM, Rashmica Gupta wrote:
>>>>> Some powerpc hardware features may want to gain access to a chunk of
>>>> What kind of features ? Please add specifics.
>>>>
>>>>> undisturbed real memory.  This update provides a means to unplug said
>>>>> memory
>>>> Undisturbed ? Meaning part of memblock and currently inside the buddy
>>>> allocator which we are trying to hot unplug out ?
>>>>
>>>>> from the kernel with a set of debugfs calls.  By writing an integer
>>>>> containing
>>>>>    the size of memory to be unplugged into
>>>> Does the size has some constraints like aligned with memblock section
>>>> size ? LMB size ? page block size ? etc. Please add the details.
>>> Will do.
>>>
>>>>> /sys/kernel/debug/powerpc/memtrace/enable, the code will remove that
>>>>> much
>>>>> memory from the end of each available chip's memory space (ie each
>>>>> memory node).
>>>> <size> amount (I guess bytes in this case) of memory will be removed
>>>> from the end of the NUMA node ? Whats the guarantee that they would be
>>>> free at that time and not being pinned by some process ? If its not
>>>> guaranteed to be freed, then interface description should state that
>>>> clearly.
>>> We start looking from the end of the NUMA node but of course there is no
>>> guarantee
>>> that we will always be able to find some memory there that we are able
>>> to remove.
>>
>> Okay. Do we have interface for giving this memory back to the buddy
>> allocator again when we are done with HW tracing ? If not we need to
>> add one.
> 
> Not at the moment. Last time I spoke to Anton he said something along
> the lines
> of it not being too important as if you are getting the hardware traces
> for debugging
> purposes you are probably not worried about a bit of memory being out of
> action.
> 
> However I can't see why having an interface to online the memory would
> be a bad thing,
> so I'll look into it.

Yes, the interface to put them back into buddy is important even if the
amount of memory is very less for tracing. Just need to trigger hotplug
and online procedure to put it back.

> 
>>>>> In addition, the means to read out the contents of the unplugged
>>>>> memory is also
>>>>> provided by reading out the
>>>>> /sys/kernel/debug/powerpc/memtrace/<chip-id>/trace
>>>>> file.
>>>> All of the debugfs file interfaces added here should be documented some
>>>> where in detail.
>>>>
>>>>> Signed-off-by: Anton Blanchard <anton@samba.org>
>>>>> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
>>>>>
>>>>> ---
>>>>> This requires the 'Wire up hpte_removebolted for powernv' patch.
>>>>>
>>>>> RFC -> v1: Added in two missing locks. Replaced the open-coded
>>>>> flush_memory_region() with the existing
>>>>> flush_inval_dcache_range(start, end).
>>>>>
>>>>> memtrace_offline_pages() is open-coded because offline_pages is
>>>>> designed to be
>>>>> called through the sysfs interface - not directly.
>>>>>
>>>>> We could move the offlining of pages to userspace, which removes some
>>>>> of this
>>>>> open-coding. This would then require passing info to the kernel such
>>>>> that it
>>>>> can then remove the memory that has been offlined. This could be done
>>>>> using
>>>>> notifiers, but this isn't simple due to locking (remove_memory needs
>>>>> mem_hotplug_begin() which the sysfs interface already has). This
>>>>> could also be
>>>>> done through the debugfs interface (similar to what is done here).
>>>>> Either way,
>>>>> this would require the process that needs the memory to have
>>>>> open-coded code
>>>>> which it shouldn't really be involved with.
>>>>>
>>>>> As the current remove_memory() function requires the memory to
>>>>> already be
>>>>> offlined, it makes sense to keep the offlining and removal of memory
>>>>> functionality grouped together so that a process can simply make one
>>>>> request to
>>>>> unplug some memory. Ideally there would be a kernel function we could
>>>>> call that
>>>>> would offline the memory and then remove it.
>>>>>
>>>>>
>>>>>    arch/powerpc/platforms/powernv/memtrace.c | 276
>>>>> ++++++++++++++++++++++++++++++
>>>>>    1 file changed, 276 insertions(+)
>>>>>    create mode 100644 arch/powerpc/platforms/powernv/memtrace.c
>>>>>
>>>>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c
>>>>> b/arch/powerpc/platforms/powernv/memtrace.c
>>>>> new file mode 100644
>>>>> index 0000000..86184b1
>>>>> --- /dev/null
>>>>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>>>>> @@ -0,0 +1,276 @@
>>>>> +/*
>>>>> + * This program is free software; you can redistribute it and/or
>>>>> modify
>>>>> + * it under the terms of the GNU General Public License as
>>>>> published by
>>>>> + * the Free Software Foundation; either version 2 of the License, or
>>>>> + * (at your option) any later version.
>>>>> + *
>>>>> + * This program is distributed in the hope that it will be useful,
>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>> + * GNU General Public License for more details.
>>>>> + *
>>>>> + * Copyright (C) IBM Corporation, 2014
>>>>> + *
>>>>> + * Author: Anton Blanchard <anton@au.ibm.com>
>>>>> + */
>>>>> +
>>>>> +#define pr_fmt(fmt) "powernv-memtrace: " fmt
>>>>> +
>>>>> +#include <linux/bitops.h>
>>>>> +#include <linux/string.h>
>>>>> +#include <linux/memblock.h>
>>>>> +#include <linux/init.h>
>>>>> +#include <linux/moduleparam.h>
>>>>> +#include <linux/fs.h>
>>>>> +#include <linux/debugfs.h>
>>>>> +#include <linux/slab.h>
>>>>> +#include <linux/memory.h>
>>>>> +#include <linux/memory_hotplug.h>
>>>>> +#include <asm/machdep.h>
>>>>> +#include <asm/debugfs.h>
>>>>> +#include <asm/cacheflush.h>
>>>>> +
>>>>> +struct memtrace_entry {
>>>>> +    void *mem;
>>>>> +    u64 start;
>>>>> +    u64 size;
>>>>> +    u32 nid;
>>>>> +    struct dentry *dir;
>>>>> +    char name[16];
>>>>> +};
>>>> Little bit of description about the structure here will help.
>>> Something like 'this enables us to keep track of the memory removed from
>>> each node'?
>> Right, something like that.
>>
>>>>> +
>>>>> +static struct memtrace_entry *memtrace_array;
>>>>> +static unsigned int memtrace_array_nr;
>>>>> +
>>>>> +static ssize_t memtrace_read(struct file *filp, char __user *ubuf,
>>>>> +                 size_t count, loff_t *ppos)
>>>>> +{
>>>>> +    struct memtrace_entry *ent = filp->private_data;
>>>>> +
>>>>> +    return simple_read_from_buffer(ubuf, count, ppos, ent->mem,
>>>>> ent->size);
>>>>> +}
>>>>> +
>>>>> +static bool valid_memtrace_range(struct memtrace_entry *dev,
>>>>> +                 unsigned long start, unsigned long size)
>>>>> +{
>>>>> +    if ((dev->start <= start) &&
>>>> Switch the position of start and dev->start above. Will make
>>>> it easy while reading.
>>>>
>>>>> +        ((start + size) <= (dev->start + dev->size)))
>>>>> +        return true;
>>>>> +
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>> +static int memtrace_mmap(struct file *filp, struct vm_area_struct
>>>>> *vma)
>>>>> +{
>>>>> +    unsigned long size = vma->vm_end - vma->vm_start;
>>>>> +    struct memtrace_entry *dev = filp->private_data;
>>>>> +
>>>>> +    if (!valid_memtrace_range(dev, vma->vm_pgoff << PAGE_SHIFT,
>>>>> size))
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>>>> Why we do this ? Its coming from real RAM not IO memory. Then the page
>>>> protection still needs changes ?
>>> Once the memory is removed from the kernel mappings we want to mark
>>> it as
>>> uncachable.
>> Got it but why ? Uncachable marking are for pages which will be mapped
>> to IO ranges which should not be cached just to prevent the possibility
>> of stale data.
>>>>> +
>>>>> +    if (io_remap_pfn_range(vma, vma->vm_start,
>>>>> +                   vma->vm_pgoff + (dev->start >> PAGE_SHIFT),
>>>>> +                   size, vma->vm_page_prot))
>>>> You can just call remap_pfn_rang() instead though they are all the
>>>> same.
>>>> There is nothing I/O here should be explicit.
>>> Good point.
>>>
>>>>> +        return -EAGAIN;
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static const struct file_operations memtrace_fops = {
>>>>> +    .llseek = default_llseek,
>>>>> +    .read    = memtrace_read,
>>>>> +    .mmap    = memtrace_mmap,
>>>>> +    .open    = simple_open,
>>>>> +};
>>>>> +
>>>>> +static int check_memblock_online(struct memory_block *mem, void *arg)
>>>>> +{
>>>>> +    if (mem->state != MEM_ONLINE)
>>>>> +        return -1;
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static int change_memblock_state(struct memory_block *mem, void *arg)
>>>>> +{
>>>>> +    unsigned long state = (unsigned long)arg;
>>>>> +
>>>>> +    mem->state = state;
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64
>>>>> nr_pages)
>>>>> +{
>>>>> +    u64 end_pfn = start_pfn + nr_pages - 1;
>>>>> +
>>>>> +    if (walk_memory_range(start_pfn, end_pfn, NULL,
>>>>> +        check_memblock_online))
>>>>> +        return false;
>>>>> +
>>>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
>>>>> +              change_memblock_state);
>>>>> +
>>>> walk_memory_range() might be expensive, cant we just change the state
>>>> to MEM_GOING_OFFLINE while checking the state for MEM_ONLINE during
>>>> the first loop and bail out if any of the memblock is not in MEM_ONLINE
>>>> in the first place.
>>> Good idea.
>>>
>>>>> +    mem_hotplug_begin();
>>>>> +    if (offline_pages(start_pfn, nr_pages)) {
>>>>> +        walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
>>>>> +                  change_memblock_state);
>>>>> +        mem_hotplug_done();
>>>> Right, this can remain as is. If we fail to offline pages, mark the
>>>> memory blocks as MEM_ONLINE again.
>>>>
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>>>>> +              change_memblock_state);
>>>>> +    mem_hotplug_done();
>>>> Right.
>>>>
>>>>> +
>>>>> +    /* Clear the dcache to remove any references to the memory */
>>>>> +    flush_inval_dcache_range((u64)__va(start_pfn << PAGE_SHIFT),
>>>>> +                   (u64)__va(end_pfn << PAGE_SHIFT));
>>>> I am wondering why this is required now when we dont do anything for
>>>> cache flushing calls from core VM. If its really required now then
>>>> it also should be required during memory hot unplug operations in
>>>> general as well.
>>> I could not see if this was being done when removing memory so figured
>>> that it was better to put it in than not do it.
>> Looking at the definitions I had pointed out before which gets
>> called from core VM, powerpc does not need to do anything specific
>> for cache invalidation or flushing. But I am not really sure on
>> this. So let it be.
>>
>>>> /*
>>>>    * No cache flushing is required when address mappings are changed,
>>>>    * because the caches on PowerPCs are physically addressed.
>>>>    */
>>>> #define flush_cache_all()            do { } while (0)
>>>> #define flush_cache_mm(mm)            do { } while (0)
>>>> #define flush_cache_dup_mm(mm)            do { } while (0)
>>>> #define flush_cache_range(vma, start, end)    do { } while (0)
>>>> #define flush_cache_page(vma, vmaddr, pfn)    do { } while (0)
>>>> #define flush_icache_page(vma, page)        do { } while (0)
>>>> #define flush_cache_vmap(start, end)        do { } while (0)
>>>> #define flush_cache_vunmap(start, end)        do { } while (0)
>>>>
>>>>
>>>>> +
>>>>> +    /* Now remove memory from the mappings */
>>>>> +    lock_device_hotplug();
>>>>> +    remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages <<
>>>>> PAGE_SHIFT);
>>>>> +    unlock_device_hotplug();
>>>> Right. Now we have successfully taken down the memory.
>>>>
>>>>> +
>>>>> +    return true;
>>>>> +}
>>>>> +
>>>>> +static u64 memtrace_alloc_node(u32 nid, u64 size)
>>>>> +{
>>>>> +    u64 start_pfn, end_pfn, nr_pages;
>>>>> +    u64 base_pfn;
>>>>> +
>>>>> +    if (!NODE_DATA(nid) || !node_spanned_pages(nid))
>>>>> +        return 0;
>>>> Why NODE_DATA check is required here ? Each node should have one
>>>> allocated and initialized by now, else we have bigger problems.
>>>> Is there any specific reason to check for spanned pages instead
>>>> of present/managed pages.
>>> Anton wrote this check, so will need to confirm with him. I assume
>>> we check node_spanned_pages() rather than node_present_pages()
>>> because in arch/powerpc/mm/numa.c we set node_spanned_pages() and
>>> not node_present_pages()?
>> I guess any thing is okay but NODE_DATA() seems redundant though.
> 
> Agreed.
> 
>>
>> struct pglist_data {
>>          ..........................
>>     unsigned long node_present_pages; /* total number of physical
>> pages */
>>     unsigned long node_spanned_pages; /* total size of physical page
>>                          range, including holes */
>>
>> }
>>
>>>>> +
>>>>> +    start_pfn = node_start_pfn(nid);
>>>>> +    end_pfn = node_end_pfn(nid);
>>>>> +    nr_pages = size >> PAGE_SHIFT;
>>>>> +
>>>>> +    /* Trace memory needs to be aligned to the size */
>>>>> +    end_pfn = round_down(end_pfn - nr_pages, nr_pages);
>>>>> +
>>>>> +    for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -=
>>>>> nr_pages) {
>>>>> +        if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true)
>>>>> +            return base_pfn << PAGE_SHIFT;
>>>>> +    }
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static int memtrace_init_regions_runtime(u64 size)
>>>>> +{
>>>>> +    u64 m;
>>>>> +    u32 nid;
>>>>> +
>>>>> +    memtrace_array = kzalloc(sizeof(struct memtrace_entry) *
>>>>> +                num_online_nodes(), GFP_KERNEL);
>>>>> +    if (!memtrace_array) {
>>>>> +        pr_err("Failed to allocate memtrace_array\n");
>>>>> +        return -EINVAL;
>>>>> +    }
>>>>> +
>>>>> +    for_each_online_node(nid) {
>>>>> +        m = memtrace_alloc_node(nid, size);
>>>>> +        /*
>>>>> +         * A node might not have any local memory, so warn but
>>>>> +         * continue on.
>>>>> +         */
>>>>> +        if (!m) {
>>>>> +            pr_err("Failed to allocate trace memory on node %d\n",
>>>>> +                 nid);
>>>>> +        } else {
>>>>> +            pr_info("Allocated trace memory on node %d at
>>>>> 0x%016llx\n",
>>>>> +                 nid, m);
>>>>> +
>>>>> +            memtrace_array[memtrace_array_nr].start = m;
>>>>> +            memtrace_array[memtrace_array_nr].size = size;
>>>>> +            memtrace_array[memtrace_array_nr].nid = nid;
>>>>> +            memtrace_array_nr++;
>>>>> +        }
>>>>> +    }
>>>>> +    return 0;
>>>>> +}
>>>> All the pr_info() and pr_err() prints should have a "memtrace :"
>>>> before the
>>>> actual string to make it clear that its coming from this feature.
>>>>
>>> Good point!
>>>>> +
>>>>> +static struct dentry *memtrace_debugfs_dir;
>>>>> +
>>>>> +static int memtrace_init_debugfs(void)
>>>>> +{
>>>>> +    int ret = 0;
>>>>> +    int i;
>>>>> +
>>>>> +    for (i = 0; i < memtrace_array_nr; i++) {
>>>>> +        struct dentry *dir;
>>>>> +        struct memtrace_entry *ent = &memtrace_array[i];
>>>>> +
>>>>> +        ent->mem = ioremap(ent->start, ent->size);
>>>>> +        /* Warn but continue on */
>>>>> +        if (!ent->mem) {
>>>>> +            pr_err("Failed to map trace memory at 0x%llx\n",
>>>>> +                 ent->start);
>>>>> +            ret = -1;
>>>>> +            continue;
>>>>> +        }
>>>>> +
>>>>> +        snprintf(ent->name, 16, "%08x", ent->nid);
>>>>> +        dir = debugfs_create_dir(ent->name, memtrace_debugfs_dir);
>>>>> +        if (!dir)
>>>>> +            return -1;
>>>>> +
>>>>> +        ent->dir = dir;
>>>>> +        debugfs_create_file("trace", 0400, dir, ent, &memtrace_fops);
>>>>> +        debugfs_create_x64("start", 0400, dir, &ent->start);
>>>>> +        debugfs_create_x64("size", 0400, dir, &ent->size);
>>>>> +        debugfs_create_u32("node", 0400, dir, &ent->nid);
>>>>> +    }
>>>> Oh okay, its creating all the four files. Please create corresponding
>>>> to each of the files some where. Documentation/ABI/testing lists the
>>>> actual system ABI on /sys/ not the sys/kernel/debug/ ones I guess.
>>> I'm not exactly sure what you are saying here... Seeing that there is
>>> documentation about debugfs files in Documentation/ABI/testing, I'll
>>> follow suit
>>> and put it there.
>> I meant the same.
>>
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +static u64 memtrace_size;
>>>>> +
>>>>> +static int memtrace_enable_set(void *data, u64 val)
>>>>> +{
>>>>> +    if (memtrace_size)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    if (!val)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    /* Make sure size is aligned to a memory block */
>>>>> +    if (val & (memory_block_size_bytes()-1))
>>>> As I have mentioned earlier, this should be mentioned in the interface
>>>> description some where.
>>>>
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    if (memtrace_init_regions_runtime(val))
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    if (memtrace_init_debugfs())
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    memtrace_size = val;
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static int memtrace_enable_get(void *data, u64 *val)
>>>>> +{
>>>>> +    *val = memtrace_size;
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
>>>>> memtrace_enable_set, "0x%016llx\n");
>>>>> +
>>>>> +static int memtrace_init(void)
>>>>> +{
>>>>> +    memtrace_debugfs_dir = debugfs_create_dir("memtrace",
>>>>> powerpc_debugfs_root);
>>>>> +    if (!memtrace_debugfs_dir)
>>>>> +        return -1;
>>>>> +
>>>>> +    debugfs_create_file("enable", 0600, memtrace_debugfs_dir,
>>>>> +                NULL, &memtrace_init_fops);
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +machine_device_initcall(powernv, memtrace_init);
>>>>> +
>>>>>
>>>> BTW how we start the tracing process for the trace to be collected
>>>> in the
>>>> interface before we can read them ? This interface does not seem to
>>>> have
>>>> a handler. When it directs the HW to start collecting the traces ?
>>>>
>>>> debugfs_create_x64("start", 0400, dir, &ent->start);
>>>>
>>>>
>>> I think you're asking 'what is actually going to call this code and do
>>> the tracing'?
>> No, when you call this interface, where is the routine to start the
>> actual tracing invoking appropriate platform functions or HW
>> instructions ? I dont see such a function associated with 'start'
>> interface mentioned above.
> 
> Essentially,
> DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
> memtrace_enable_set, "0x%016llx\n");
> 
> means that when a number is written to the memtrace/enable file, the
> number is
> read as a u64 and passed to memtrace_enable_set()

Right, then we create the following interfaces for each entry of the memory
trace. By now all the memory ranges are ioremapped.

+		debugfs_create_file("trace", 0400, dir, ent, &memtrace_fops);
+		debugfs_create_x64("start", 0400, dir, &ent->start);
+		debugfs_create_x64("size", 0400, dir, &ent->size);
+		debugfs_create_u32("node", 0400, dir, &ent->nid);

Then what really starts the HW trace ? where we tell the HW to start tracing
and put the trace details in the memory buffer allocated and ioremapped ?
IIUC just making the memory ioremapped() wont start the trace automatically.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing
  2017-05-14  4:55           ` Anshuman Khandual
@ 2017-05-15  3:34             ` Rashmica Gupta
  2017-05-15  4:35               ` Anshuman Khandual
  0 siblings, 1 reply; 14+ messages in thread
From: Rashmica Gupta @ 2017-05-15  3:34 UTC (permalink / raw)
  To: Anshuman Khandual, Rashmica Gupta, Anton Blanchard, linuxppc-dev,
	mpe, npiggin, bsingharora, oohall



On 14/05/17 14:55, Anshuman Khandual wrote:
> On 05/09/2017 12:36 PM, Rashmica Gupta wrote:
>> Sorry for the late reply, I somehow missed this.
>>
>>
>> On 03/05/17 21:56, Anshuman Khandual wrote:
>>> On 05/03/2017 09:22 AM, Rashmica Gupta wrote:
>>>> On 28/04/17 19:52, Anshuman Khandual wrote:
>>>>> On 04/28/2017 11:12 AM, Rashmica Gupta wrote:
>>>>>> Some powerpc hardware features may want to gain access to a chunk of
>>>>> What kind of features ? Please add specifics.
>>>>>
>>>>>> undisturbed real memory.  This update provides a means to unplug said
>>>>>> memory
>>>>> Undisturbed ? Meaning part of memblock and currently inside the buddy
>>>>> allocator which we are trying to hot unplug out ?
>>>>>
>>>>>> from the kernel with a set of debugfs calls.  By writing an integer
>>>>>> containing
>>>>>>     the size of memory to be unplugged into
>>>>> Does the size has some constraints like aligned with memblock section
>>>>> size ? LMB size ? page block size ? etc. Please add the details.
>>>> Will do.
>>>>
>>>>>> /sys/kernel/debug/powerpc/memtrace/enable, the code will remove that
>>>>>> much
>>>>>> memory from the end of each available chip's memory space (ie each
>>>>>> memory node).
>>>>> <size> amount (I guess bytes in this case) of memory will be removed
>>>>> from the end of the NUMA node ? Whats the guarantee that they would be
>>>>> free at that time and not being pinned by some process ? If its not
>>>>> guaranteed to be freed, then interface description should state that
>>>>> clearly.
>>>> We start looking from the end of the NUMA node but of course there is no
>>>> guarantee
>>>> that we will always be able to find some memory there that we are able
>>>> to remove.
>>> Okay. Do we have interface for giving this memory back to the buddy
>>> allocator again when we are done with HW tracing ? If not we need to
>>> add one.
>> Not at the moment. Last time I spoke to Anton he said something along
>> the lines
>> of it not being too important as if you are getting the hardware traces
>> for debugging
>> purposes you are probably not worried about a bit of memory being out of
>> action.
>>
>> However I can't see why having an interface to online the memory would
>> be a bad thing,
>> so I'll look into it.
> Yes, the interface to put them back into buddy is important even if the
> amount of memory is very less for tracing. Just need to trigger hotplug
> and online procedure to put it back.
>
>>>>>> In addition, the means to read out the contents of the unplugged
>>>>>> memory is also
>>>>>> provided by reading out the
>>>>>> /sys/kernel/debug/powerpc/memtrace/<chip-id>/trace
>>>>>> file.
>>>>> All of the debugfs file interfaces added here should be documented some
>>>>> where in detail.
>>>>>
>>>>>> Signed-off-by: Anton Blanchard <anton@samba.org>
>>>>>> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
>>>>>>
>>>>>> ---
>>>>>> This requires the 'Wire up hpte_removebolted for powernv' patch.
>>>>>>
>>>>>> RFC -> v1: Added in two missing locks. Replaced the open-coded
>>>>>> flush_memory_region() with the existing
>>>>>> flush_inval_dcache_range(start, end).
>>>>>>
>>>>>> memtrace_offline_pages() is open-coded because offline_pages is
>>>>>> designed to be
>>>>>> called through the sysfs interface - not directly.
>>>>>>
>>>>>> We could move the offlining of pages to userspace, which removes some
>>>>>> of this
>>>>>> open-coding. This would then require passing info to the kernel such
>>>>>> that it
>>>>>> can then remove the memory that has been offlined. This could be done
>>>>>> using
>>>>>> notifiers, but this isn't simple due to locking (remove_memory needs
>>>>>> mem_hotplug_begin() which the sysfs interface already has). This
>>>>>> could also be
>>>>>> done through the debugfs interface (similar to what is done here).
>>>>>> Either way,
>>>>>> this would require the process that needs the memory to have
>>>>>> open-coded code
>>>>>> which it shouldn't really be involved with.
>>>>>>
>>>>>> As the current remove_memory() function requires the memory to
>>>>>> already be
>>>>>> offlined, it makes sense to keep the offlining and removal of memory
>>>>>> functionality grouped together so that a process can simply make one
>>>>>> request to
>>>>>> unplug some memory. Ideally there would be a kernel function we could
>>>>>> call that
>>>>>> would offline the memory and then remove it.
>>>>>>
>>>>>>
>>>>>>     arch/powerpc/platforms/powernv/memtrace.c | 276
>>>>>> ++++++++++++++++++++++++++++++
>>>>>>     1 file changed, 276 insertions(+)
>>>>>>     create mode 100644 arch/powerpc/platforms/powernv/memtrace.c
>>>>>>
>>>>>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c
>>>>>> b/arch/powerpc/platforms/powernv/memtrace.c
>>>>>> new file mode 100644
>>>>>> index 0000000..86184b1
>>>>>> --- /dev/null
>>>>>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>>>>>> @@ -0,0 +1,276 @@
>>>>>> +/*
>>>>>> + * This program is free software; you can redistribute it and/or
>>>>>> modify
>>>>>> + * it under the terms of the GNU General Public License as
>>>>>> published by
>>>>>> + * the Free Software Foundation; either version 2 of the License, or
>>>>>> + * (at your option) any later version.
>>>>>> + *
>>>>>> + * This program is distributed in the hope that it will be useful,
>>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>>> + * GNU General Public License for more details.
>>>>>> + *
>>>>>> + * Copyright (C) IBM Corporation, 2014
>>>>>> + *
>>>>>> + * Author: Anton Blanchard <anton@au.ibm.com>
>>>>>> + */
>>>>>> +
>>>>>> +#define pr_fmt(fmt) "powernv-memtrace: " fmt
>>>>>> +
>>>>>> +#include <linux/bitops.h>
>>>>>> +#include <linux/string.h>
>>>>>> +#include <linux/memblock.h>
>>>>>> +#include <linux/init.h>
>>>>>> +#include <linux/moduleparam.h>
>>>>>> +#include <linux/fs.h>
>>>>>> +#include <linux/debugfs.h>
>>>>>> +#include <linux/slab.h>
>>>>>> +#include <linux/memory.h>
>>>>>> +#include <linux/memory_hotplug.h>
>>>>>> +#include <asm/machdep.h>
>>>>>> +#include <asm/debugfs.h>
>>>>>> +#include <asm/cacheflush.h>
>>>>>> +
>>>>>> +struct memtrace_entry {
>>>>>> +    void *mem;
>>>>>> +    u64 start;
>>>>>> +    u64 size;
>>>>>> +    u32 nid;
>>>>>> +    struct dentry *dir;
>>>>>> +    char name[16];
>>>>>> +};
>>>>> Little bit of description about the structure here will help.
>>>> Something like 'this enables us to keep track of the memory removed from
>>>> each node'?
>>> Right, something like that.
>>>
>>>>>> +
>>>>>> +static struct memtrace_entry *memtrace_array;
>>>>>> +static unsigned int memtrace_array_nr;
>>>>>> +
>>>>>> +static ssize_t memtrace_read(struct file *filp, char __user *ubuf,
>>>>>> +                 size_t count, loff_t *ppos)
>>>>>> +{
>>>>>> +    struct memtrace_entry *ent = filp->private_data;
>>>>>> +
>>>>>> +    return simple_read_from_buffer(ubuf, count, ppos, ent->mem,
>>>>>> ent->size);
>>>>>> +}
>>>>>> +
>>>>>> +static bool valid_memtrace_range(struct memtrace_entry *dev,
>>>>>> +                 unsigned long start, unsigned long size)
>>>>>> +{
>>>>>> +    if ((dev->start <= start) &&
>>>>> Switch the position of start and dev->start above. Will make
>>>>> it easy while reading.
>>>>>
>>>>>> +        ((start + size) <= (dev->start + dev->size)))
>>>>>> +        return true;
>>>>>> +
>>>>>> +    return false;
>>>>>> +}
>>>>>> +
>>>>>> +static int memtrace_mmap(struct file *filp, struct vm_area_struct
>>>>>> *vma)
>>>>>> +{
>>>>>> +    unsigned long size = vma->vm_end - vma->vm_start;
>>>>>> +    struct memtrace_entry *dev = filp->private_data;
>>>>>> +
>>>>>> +    if (!valid_memtrace_range(dev, vma->vm_pgoff << PAGE_SHIFT,
>>>>>> size))
>>>>>> +        return -EINVAL;
>>>>>> +
>>>>>> +    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>>>>> Why we do this ? Its coming from real RAM not IO memory. Then the page
>>>>> protection still needs changes ?
>>>> Once the memory is removed from the kernel mappings we want to mark
>>>> it as
>>>> uncachable.
>>> Got it but why ? Uncachable marking are for pages which will be mapped
>>> to IO ranges which should not be cached just to prevent the possibility
>>> of stale data.
>>>>>> +
>>>>>> +    if (io_remap_pfn_range(vma, vma->vm_start,
>>>>>> +                   vma->vm_pgoff + (dev->start >> PAGE_SHIFT),
>>>>>> +                   size, vma->vm_page_prot))
>>>>> You can just call remap_pfn_rang() instead though they are all the
>>>>> same.
>>>>> There is nothing I/O here should be explicit.
>>>> Good point.
>>>>
>>>>>> +        return -EAGAIN;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static const struct file_operations memtrace_fops = {
>>>>>> +    .llseek = default_llseek,
>>>>>> +    .read    = memtrace_read,
>>>>>> +    .mmap    = memtrace_mmap,
>>>>>> +    .open    = simple_open,
>>>>>> +};
>>>>>> +
>>>>>> +static int check_memblock_online(struct memory_block *mem, void *arg)
>>>>>> +{
>>>>>> +    if (mem->state != MEM_ONLINE)
>>>>>> +        return -1;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int change_memblock_state(struct memory_block *mem, void *arg)
>>>>>> +{
>>>>>> +    unsigned long state = (unsigned long)arg;
>>>>>> +
>>>>>> +    mem->state = state;
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64
>>>>>> nr_pages)
>>>>>> +{
>>>>>> +    u64 end_pfn = start_pfn + nr_pages - 1;
>>>>>> +
>>>>>> +    if (walk_memory_range(start_pfn, end_pfn, NULL,
>>>>>> +        check_memblock_online))
>>>>>> +        return false;
>>>>>> +
>>>>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
>>>>>> +              change_memblock_state);
>>>>>> +
>>>>> walk_memory_range() might be expensive, cant we just change the state
>>>>> to MEM_GOING_OFFLINE while checking the state for MEM_ONLINE during
>>>>> the first loop and bail out if any of the memblock is not in MEM_ONLINE
>>>>> in the first place.
>>>> Good idea.
>>>>
>>>>>> +    mem_hotplug_begin();
>>>>>> +    if (offline_pages(start_pfn, nr_pages)) {
>>>>>> +        walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
>>>>>> +                  change_memblock_state);
>>>>>> +        mem_hotplug_done();
>>>>> Right, this can remain as is. If we fail to offline pages, mark the
>>>>> memory blocks as MEM_ONLINE again.
>>>>>
>>>>>> +        return false;
>>>>>> +    }
>>>>>> +
>>>>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>>>>>> +              change_memblock_state);
>>>>>> +    mem_hotplug_done();
>>>>> Right.
>>>>>
>>>>>> +
>>>>>> +    /* Clear the dcache to remove any references to the memory */
>>>>>> +    flush_inval_dcache_range((u64)__va(start_pfn << PAGE_SHIFT),
>>>>>> +                   (u64)__va(end_pfn << PAGE_SHIFT));
>>>>> I am wondering why this is required now when we dont do anything for
>>>>> cache flushing calls from core VM. If its really required now then
>>>>> it also should be required during memory hot unplug operations in
>>>>> general as well.
>>>> I could not see if this was being done when removing memory so figured
>>>> that it was better to put it in than not do it.
>>> Looking at the definitions I had pointed out before which gets
>>> called from core VM, powerpc does not need to do anything specific
>>> for cache invalidation or flushing. But I am not really sure on
>>> this. So let it be.
>>>
>>>>> /*
>>>>>     * No cache flushing is required when address mappings are changed,
>>>>>     * because the caches on PowerPCs are physically addressed.
>>>>>     */
>>>>> #define flush_cache_all()            do { } while (0)
>>>>> #define flush_cache_mm(mm)            do { } while (0)
>>>>> #define flush_cache_dup_mm(mm)            do { } while (0)
>>>>> #define flush_cache_range(vma, start, end)    do { } while (0)
>>>>> #define flush_cache_page(vma, vmaddr, pfn)    do { } while (0)
>>>>> #define flush_icache_page(vma, page)        do { } while (0)
>>>>> #define flush_cache_vmap(start, end)        do { } while (0)
>>>>> #define flush_cache_vunmap(start, end)        do { } while (0)
>>>>>
>>>>>
>>>>>> +
>>>>>> +    /* Now remove memory from the mappings */
>>>>>> +    lock_device_hotplug();
>>>>>> +    remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages <<
>>>>>> PAGE_SHIFT);
>>>>>> +    unlock_device_hotplug();
>>>>> Right. Now we have successfully taken down the memory.
>>>>>
>>>>>> +
>>>>>> +    return true;
>>>>>> +}
>>>>>> +
>>>>>> +static u64 memtrace_alloc_node(u32 nid, u64 size)
>>>>>> +{
>>>>>> +    u64 start_pfn, end_pfn, nr_pages;
>>>>>> +    u64 base_pfn;
>>>>>> +
>>>>>> +    if (!NODE_DATA(nid) || !node_spanned_pages(nid))
>>>>>> +        return 0;
>>>>> Why NODE_DATA check is required here ? Each node should have one
>>>>> allocated and initialized by now, else we have bigger problems.
>>>>> Is there any specific reason to check for spanned pages instead
>>>>> of present/managed pages.
>>>> Anton wrote this check, so will need to confirm with him. I assume
>>>> we check node_spanned_pages() rather than node_present_pages()
>>>> because in arch/powerpc/mm/numa.c we set node_spanned_pages() and
>>>> not node_present_pages()?
>>> I guess any thing is okay but NODE_DATA() seems redundant though.
>> Agreed.
>>
>>> struct pglist_data {
>>>           ..........................
>>>      unsigned long node_present_pages; /* total number of physical
>>> pages */
>>>      unsigned long node_spanned_pages; /* total size of physical page
>>>                           range, including holes */
>>>
>>> }
>>>
>>>>>> +
>>>>>> +    start_pfn = node_start_pfn(nid);
>>>>>> +    end_pfn = node_end_pfn(nid);
>>>>>> +    nr_pages = size >> PAGE_SHIFT;
>>>>>> +
>>>>>> +    /* Trace memory needs to be aligned to the size */
>>>>>> +    end_pfn = round_down(end_pfn - nr_pages, nr_pages);
>>>>>> +
>>>>>> +    for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -=
>>>>>> nr_pages) {
>>>>>> +        if (memtrace_offline_pages(nid, base_pfn, nr_pages) == true)
>>>>>> +            return base_pfn << PAGE_SHIFT;
>>>>>> +    }
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int memtrace_init_regions_runtime(u64 size)
>>>>>> +{
>>>>>> +    u64 m;
>>>>>> +    u32 nid;
>>>>>> +
>>>>>> +    memtrace_array = kzalloc(sizeof(struct memtrace_entry) *
>>>>>> +                num_online_nodes(), GFP_KERNEL);
>>>>>> +    if (!memtrace_array) {
>>>>>> +        pr_err("Failed to allocate memtrace_array\n");
>>>>>> +        return -EINVAL;
>>>>>> +    }
>>>>>> +
>>>>>> +    for_each_online_node(nid) {
>>>>>> +        m = memtrace_alloc_node(nid, size);
>>>>>> +        /*
>>>>>> +         * A node might not have any local memory, so warn but
>>>>>> +         * continue on.
>>>>>> +         */
>>>>>> +        if (!m) {
>>>>>> +            pr_err("Failed to allocate trace memory on node %d\n",
>>>>>> +                 nid);
>>>>>> +        } else {
>>>>>> +            pr_info("Allocated trace memory on node %d at
>>>>>> 0x%016llx\n",
>>>>>> +                 nid, m);
>>>>>> +
>>>>>> +            memtrace_array[memtrace_array_nr].start = m;
>>>>>> +            memtrace_array[memtrace_array_nr].size = size;
>>>>>> +            memtrace_array[memtrace_array_nr].nid = nid;
>>>>>> +            memtrace_array_nr++;
>>>>>> +        }
>>>>>> +    }
>>>>>> +    return 0;
>>>>>> +}
>>>>> All the pr_info() and pr_err() prints should have a "memtrace :"
>>>>> before the
>>>>> actual string to make it clear that its coming from this feature.
>>>>>
>>>> Good point!
>>>>>> +
>>>>>> +static struct dentry *memtrace_debugfs_dir;
>>>>>> +
>>>>>> +static int memtrace_init_debugfs(void)
>>>>>> +{
>>>>>> +    int ret = 0;
>>>>>> +    int i;
>>>>>> +
>>>>>> +    for (i = 0; i < memtrace_array_nr; i++) {
>>>>>> +        struct dentry *dir;
>>>>>> +        struct memtrace_entry *ent = &memtrace_array[i];
>>>>>> +
>>>>>> +        ent->mem = ioremap(ent->start, ent->size);
>>>>>> +        /* Warn but continue on */
>>>>>> +        if (!ent->mem) {
>>>>>> +            pr_err("Failed to map trace memory at 0x%llx\n",
>>>>>> +                 ent->start);
>>>>>> +            ret = -1;
>>>>>> +            continue;
>>>>>> +        }
>>>>>> +
>>>>>> +        snprintf(ent->name, 16, "%08x", ent->nid);
>>>>>> +        dir = debugfs_create_dir(ent->name, memtrace_debugfs_dir);
>>>>>> +        if (!dir)
>>>>>> +            return -1;
>>>>>> +
>>>>>> +        ent->dir = dir;
>>>>>> +        debugfs_create_file("trace", 0400, dir, ent, &memtrace_fops);
>>>>>> +        debugfs_create_x64("start", 0400, dir, &ent->start);
>>>>>> +        debugfs_create_x64("size", 0400, dir, &ent->size);
>>>>>> +        debugfs_create_u32("node", 0400, dir, &ent->nid);
>>>>>> +    }
>>>>> Oh okay, its creating all the four files. Please create corresponding
>>>>> to each of the files some where. Documentation/ABI/testing lists the
>>>>> actual system ABI on /sys/ not the sys/kernel/debug/ ones I guess.
>>>> I'm not exactly sure what you are saying here... Seeing that there is
>>>> documentation about debugfs files in Documentation/ABI/testing, I'll
>>>> follow suit
>>>> and put it there.
>>> I meant the same.
>>>
>>>>>> +
>>>>>> +    return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static u64 memtrace_size;
>>>>>> +
>>>>>> +static int memtrace_enable_set(void *data, u64 val)
>>>>>> +{
>>>>>> +    if (memtrace_size)
>>>>>> +        return -EINVAL;
>>>>>> +
>>>>>> +    if (!val)
>>>>>> +        return -EINVAL;
>>>>>> +
>>>>>> +    /* Make sure size is aligned to a memory block */
>>>>>> +    if (val & (memory_block_size_bytes()-1))
>>>>> As I have mentioned earlier, this should be mentioned in the interface
>>>>> description some where.
>>>>>
>>>>>> +        return -EINVAL;
>>>>>> +
>>>>>> +    if (memtrace_init_regions_runtime(val))
>>>>>> +        return -EINVAL;
>>>>>> +
>>>>>> +    if (memtrace_init_debugfs())
>>>>>> +        return -EINVAL;
>>>>>> +
>>>>>> +    memtrace_size = val;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int memtrace_enable_get(void *data, u64 *val)
>>>>>> +{
>>>>>> +    *val = memtrace_size;
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
>>>>>> memtrace_enable_set, "0x%016llx\n");
>>>>>> +
>>>>>> +static int memtrace_init(void)
>>>>>> +{
>>>>>> +    memtrace_debugfs_dir = debugfs_create_dir("memtrace",
>>>>>> powerpc_debugfs_root);
>>>>>> +    if (!memtrace_debugfs_dir)
>>>>>> +        return -1;
>>>>>> +
>>>>>> +    debugfs_create_file("enable", 0600, memtrace_debugfs_dir,
>>>>>> +                NULL, &memtrace_init_fops);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +machine_device_initcall(powernv, memtrace_init);
>>>>>> +
>>>>>>
>>>>> BTW how we start the tracing process for the trace to be collected
>>>>> in the
>>>>> interface before we can read them ? This interface does not seem to
>>>>> have
>>>>> a handler. When it directs the HW to start collecting the traces ?
>>>>>
>>>>> debugfs_create_x64("start", 0400, dir, &ent->start);
>>>>>
>>>>>
>>>> I think you're asking 'what is actually going to call this code and do
>>>> the tracing'?
>>> No, when you call this interface, where is the routine to start the
>>> actual tracing invoking appropriate platform functions or HW
>>> instructions ? I dont see such a function associated with 'start'
>>> interface mentioned above.
>> Essentially,
>> DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
>> memtrace_enable_set, "0x%016llx\n");
>>
>> means that when a number is written to the memtrace/enable file, the
>> number is
>> read as a u64 and passed to memtrace_enable_set()
> Right, then we create the following interfaces for each entry of the memory
> trace. By now all the memory ranges are ioremapped.
>
> +		debugfs_create_file("trace", 0400, dir, ent, &memtrace_fops);
> +		debugfs_create_x64("start", 0400, dir, &ent->start);
> +		debugfs_create_x64("size", 0400, dir, &ent->size);
> +		debugfs_create_u32("node", 0400, dir, &ent->nid);
>
> Then what really starts the HW trace ? where we tell the HW to start tracing
> and put the trace details in the memory buffer allocated and ioremapped ?
> IIUC just making the memory ioremapped() wont start the trace automatically.
>

Ah sorry, misunderstood you. Yes, this patch is only to remove the 
memory from kernel mappings so the hardware tracing can use it - not the 
actual code that invokes the tracing. That code involves some hardware 
information that hasn't been made publicly accessible, so it is not my 
place to share that. The hardware trace will be written to the 'removed' 
memory directly by the hardware and the debugfs files allow us to read 
the trace without having to reonline the memory.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing
  2017-05-15  3:34             ` Rashmica Gupta
@ 2017-05-15  4:35               ` Anshuman Khandual
  0 siblings, 0 replies; 14+ messages in thread
From: Anshuman Khandual @ 2017-05-15  4:35 UTC (permalink / raw)
  To: Rashmica Gupta, Anton Blanchard, linuxppc-dev, mpe, npiggin,
	bsingharora, oohall

On 05/15/2017 09:04 AM, Rashmica Gupta wrote:
> 
> 
> On 14/05/17 14:55, Anshuman Khandual wrote:
>> On 05/09/2017 12:36 PM, Rashmica Gupta wrote:
>>> Sorry for the late reply, I somehow missed this.
>>>
>>>
>>> On 03/05/17 21:56, Anshuman Khandual wrote:
>>>> On 05/03/2017 09:22 AM, Rashmica Gupta wrote:
>>>>> On 28/04/17 19:52, Anshuman Khandual wrote:
>>>>>> On 04/28/2017 11:12 AM, Rashmica Gupta wrote:
>>>>>>> Some powerpc hardware features may want to gain access to a chunk of
>>>>>> What kind of features ? Please add specifics.
>>>>>>
>>>>>>> undisturbed real memory.  This update provides a means to unplug
>>>>>>> said
>>>>>>> memory
>>>>>> Undisturbed ? Meaning part of memblock and currently inside the buddy
>>>>>> allocator which we are trying to hot unplug out ?
>>>>>>
>>>>>>> from the kernel with a set of debugfs calls.  By writing an integer
>>>>>>> containing
>>>>>>>     the size of memory to be unplugged into
>>>>>> Does the size has some constraints like aligned with memblock section
>>>>>> size ? LMB size ? page block size ? etc. Please add the details.
>>>>> Will do.
>>>>>
>>>>>>> /sys/kernel/debug/powerpc/memtrace/enable, the code will remove that
>>>>>>> much
>>>>>>> memory from the end of each available chip's memory space (ie each
>>>>>>> memory node).
>>>>>> <size> amount (I guess bytes in this case) of memory will be removed
>>>>>> from the end of the NUMA node ? Whats the guarantee that they
>>>>>> would be
>>>>>> free at that time and not being pinned by some process ? If its not
>>>>>> guaranteed to be freed, then interface description should state that
>>>>>> clearly.
>>>>> We start looking from the end of the NUMA node but of course there
>>>>> is no
>>>>> guarantee
>>>>> that we will always be able to find some memory there that we are able
>>>>> to remove.
>>>> Okay. Do we have interface for giving this memory back to the buddy
>>>> allocator again when we are done with HW tracing ? If not we need to
>>>> add one.
>>> Not at the moment. Last time I spoke to Anton he said something along
>>> the lines
>>> of it not being too important as if you are getting the hardware traces
>>> for debugging
>>> purposes you are probably not worried about a bit of memory being out of
>>> action.
>>>
>>> However I can't see why having an interface to online the memory would
>>> be a bad thing,
>>> so I'll look into it.
>> Yes, the interface to put them back into buddy is important even if the
>> amount of memory is very less for tracing. Just need to trigger hotplug
>> and online procedure to put it back.
>>
>>>>>>> In addition, the means to read out the contents of the unplugged
>>>>>>> memory is also
>>>>>>> provided by reading out the
>>>>>>> /sys/kernel/debug/powerpc/memtrace/<chip-id>/trace
>>>>>>> file.
>>>>>> All of the debugfs file interfaces added here should be documented
>>>>>> some
>>>>>> where in detail.
>>>>>>
>>>>>>> Signed-off-by: Anton Blanchard <anton@samba.org>
>>>>>>> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
>>>>>>>
>>>>>>> ---
>>>>>>> This requires the 'Wire up hpte_removebolted for powernv' patch.
>>>>>>>
>>>>>>> RFC -> v1: Added in two missing locks. Replaced the open-coded
>>>>>>> flush_memory_region() with the existing
>>>>>>> flush_inval_dcache_range(start, end).
>>>>>>>
>>>>>>> memtrace_offline_pages() is open-coded because offline_pages is
>>>>>>> designed to be
>>>>>>> called through the sysfs interface - not directly.
>>>>>>>
>>>>>>> We could move the offlining of pages to userspace, which removes
>>>>>>> some
>>>>>>> of this
>>>>>>> open-coding. This would then require passing info to the kernel such
>>>>>>> that it
>>>>>>> can then remove the memory that has been offlined. This could be
>>>>>>> done
>>>>>>> using
>>>>>>> notifiers, but this isn't simple due to locking (remove_memory needs
>>>>>>> mem_hotplug_begin() which the sysfs interface already has). This
>>>>>>> could also be
>>>>>>> done through the debugfs interface (similar to what is done here).
>>>>>>> Either way,
>>>>>>> this would require the process that needs the memory to have
>>>>>>> open-coded code
>>>>>>> which it shouldn't really be involved with.
>>>>>>>
>>>>>>> As the current remove_memory() function requires the memory to
>>>>>>> already be
>>>>>>> offlined, it makes sense to keep the offlining and removal of memory
>>>>>>> functionality grouped together so that a process can simply make one
>>>>>>> request to
>>>>>>> unplug some memory. Ideally there would be a kernel function we
>>>>>>> could
>>>>>>> call that
>>>>>>> would offline the memory and then remove it.
>>>>>>>
>>>>>>>
>>>>>>>     arch/powerpc/platforms/powernv/memtrace.c | 276
>>>>>>> ++++++++++++++++++++++++++++++
>>>>>>>     1 file changed, 276 insertions(+)
>>>>>>>     create mode 100644 arch/powerpc/platforms/powernv/memtrace.c
>>>>>>>
>>>>>>> diff --git a/arch/powerpc/platforms/powernv/memtrace.c
>>>>>>> b/arch/powerpc/platforms/powernv/memtrace.c
>>>>>>> new file mode 100644
>>>>>>> index 0000000..86184b1
>>>>>>> --- /dev/null
>>>>>>> +++ b/arch/powerpc/platforms/powernv/memtrace.c
>>>>>>> @@ -0,0 +1,276 @@
>>>>>>> +/*
>>>>>>> + * This program is free software; you can redistribute it and/or
>>>>>>> modify
>>>>>>> + * it under the terms of the GNU General Public License as
>>>>>>> published by
>>>>>>> + * the Free Software Foundation; either version 2 of the
>>>>>>> License, or
>>>>>>> + * (at your option) any later version.
>>>>>>> + *
>>>>>>> + * This program is distributed in the hope that it will be useful,
>>>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>>>> + * GNU General Public License for more details.
>>>>>>> + *
>>>>>>> + * Copyright (C) IBM Corporation, 2014
>>>>>>> + *
>>>>>>> + * Author: Anton Blanchard <anton@au.ibm.com>
>>>>>>> + */
>>>>>>> +
>>>>>>> +#define pr_fmt(fmt) "powernv-memtrace: " fmt
>>>>>>> +
>>>>>>> +#include <linux/bitops.h>
>>>>>>> +#include <linux/string.h>
>>>>>>> +#include <linux/memblock.h>
>>>>>>> +#include <linux/init.h>
>>>>>>> +#include <linux/moduleparam.h>
>>>>>>> +#include <linux/fs.h>
>>>>>>> +#include <linux/debugfs.h>
>>>>>>> +#include <linux/slab.h>
>>>>>>> +#include <linux/memory.h>
>>>>>>> +#include <linux/memory_hotplug.h>
>>>>>>> +#include <asm/machdep.h>
>>>>>>> +#include <asm/debugfs.h>
>>>>>>> +#include <asm/cacheflush.h>
>>>>>>> +
>>>>>>> +struct memtrace_entry {
>>>>>>> +    void *mem;
>>>>>>> +    u64 start;
>>>>>>> +    u64 size;
>>>>>>> +    u32 nid;
>>>>>>> +    struct dentry *dir;
>>>>>>> +    char name[16];
>>>>>>> +};
>>>>>> Little bit of description about the structure here will help.
>>>>> Something like 'this enables us to keep track of the memory removed
>>>>> from
>>>>> each node'?
>>>> Right, something like that.
>>>>
>>>>>>> +
>>>>>>> +static struct memtrace_entry *memtrace_array;
>>>>>>> +static unsigned int memtrace_array_nr;
>>>>>>> +
>>>>>>> +static ssize_t memtrace_read(struct file *filp, char __user *ubuf,
>>>>>>> +                 size_t count, loff_t *ppos)
>>>>>>> +{
>>>>>>> +    struct memtrace_entry *ent = filp->private_data;
>>>>>>> +
>>>>>>> +    return simple_read_from_buffer(ubuf, count, ppos, ent->mem,
>>>>>>> ent->size);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static bool valid_memtrace_range(struct memtrace_entry *dev,
>>>>>>> +                 unsigned long start, unsigned long size)
>>>>>>> +{
>>>>>>> +    if ((dev->start <= start) &&
>>>>>> Switch the position of start and dev->start above. Will make
>>>>>> it easy while reading.
>>>>>>
>>>>>>> +        ((start + size) <= (dev->start + dev->size)))
>>>>>>> +        return true;
>>>>>>> +
>>>>>>> +    return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int memtrace_mmap(struct file *filp, struct vm_area_struct
>>>>>>> *vma)
>>>>>>> +{
>>>>>>> +    unsigned long size = vma->vm_end - vma->vm_start;
>>>>>>> +    struct memtrace_entry *dev = filp->private_data;
>>>>>>> +
>>>>>>> +    if (!valid_memtrace_range(dev, vma->vm_pgoff << PAGE_SHIFT,
>>>>>>> size))
>>>>>>> +        return -EINVAL;
>>>>>>> +
>>>>>>> +    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>>>>>> Why we do this ? Its coming from real RAM not IO memory. Then the
>>>>>> page
>>>>>> protection still needs changes ?
>>>>> Once the memory is removed from the kernel mappings we want to mark
>>>>> it as
>>>>> uncachable.
>>>> Got it but why ? Uncachable marking are for pages which will be mapped
>>>> to IO ranges which should not be cached just to prevent the possibility
>>>> of stale data.
>>>>>>> +
>>>>>>> +    if (io_remap_pfn_range(vma, vma->vm_start,
>>>>>>> +                   vma->vm_pgoff + (dev->start >> PAGE_SHIFT),
>>>>>>> +                   size, vma->vm_page_prot))
>>>>>> You can just call remap_pfn_rang() instead though they are all the
>>>>>> same.
>>>>>> There is nothing I/O here should be explicit.
>>>>> Good point.
>>>>>
>>>>>>> +        return -EAGAIN;
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static const struct file_operations memtrace_fops = {
>>>>>>> +    .llseek = default_llseek,
>>>>>>> +    .read    = memtrace_read,
>>>>>>> +    .mmap    = memtrace_mmap,
>>>>>>> +    .open    = simple_open,
>>>>>>> +};
>>>>>>> +
>>>>>>> +static int check_memblock_online(struct memory_block *mem, void
>>>>>>> *arg)
>>>>>>> +{
>>>>>>> +    if (mem->state != MEM_ONLINE)
>>>>>>> +        return -1;
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int change_memblock_state(struct memory_block *mem, void
>>>>>>> *arg)
>>>>>>> +{
>>>>>>> +    unsigned long state = (unsigned long)arg;
>>>>>>> +
>>>>>>> +    mem->state = state;
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64
>>>>>>> nr_pages)
>>>>>>> +{
>>>>>>> +    u64 end_pfn = start_pfn + nr_pages - 1;
>>>>>>> +
>>>>>>> +    if (walk_memory_range(start_pfn, end_pfn, NULL,
>>>>>>> +        check_memblock_online))
>>>>>>> +        return false;
>>>>>>> +
>>>>>>> +    walk_memory_range(start_pfn, end_pfn, (void
>>>>>>> *)MEM_GOING_OFFLINE,
>>>>>>> +              change_memblock_state);
>>>>>>> +
>>>>>> walk_memory_range() might be expensive, cant we just change the state
>>>>>> to MEM_GOING_OFFLINE while checking the state for MEM_ONLINE during
>>>>>> the first loop and bail out if any of the memblock is not in
>>>>>> MEM_ONLINE
>>>>>> in the first place.
>>>>> Good idea.
>>>>>
>>>>>>> +    mem_hotplug_begin();
>>>>>>> +    if (offline_pages(start_pfn, nr_pages)) {
>>>>>>> +        walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE,
>>>>>>> +                  change_memblock_state);
>>>>>>> +        mem_hotplug_done();
>>>>>> Right, this can remain as is. If we fail to offline pages, mark the
>>>>>> memory blocks as MEM_ONLINE again.
>>>>>>
>>>>>>> +        return false;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>>>>>>> +              change_memblock_state);
>>>>>>> +    mem_hotplug_done();
>>>>>> Right.
>>>>>>
>>>>>>> +
>>>>>>> +    /* Clear the dcache to remove any references to the memory */
>>>>>>> +    flush_inval_dcache_range((u64)__va(start_pfn << PAGE_SHIFT),
>>>>>>> +                   (u64)__va(end_pfn << PAGE_SHIFT));
>>>>>> I am wondering why this is required now when we dont do anything for
>>>>>> cache flushing calls from core VM. If its really required now then
>>>>>> it also should be required during memory hot unplug operations in
>>>>>> general as well.
>>>>> I could not see if this was being done when removing memory so figured
>>>>> that it was better to put it in than not do it.
>>>> Looking at the definitions I had pointed out before which gets
>>>> called from core VM, powerpc does not need to do anything specific
>>>> for cache invalidation or flushing. But I am not really sure on
>>>> this. So let it be.
>>>>
>>>>>> /*
>>>>>>     * No cache flushing is required when address mappings are
>>>>>> changed,
>>>>>>     * because the caches on PowerPCs are physically addressed.
>>>>>>     */
>>>>>> #define flush_cache_all()            do { } while (0)
>>>>>> #define flush_cache_mm(mm)            do { } while (0)
>>>>>> #define flush_cache_dup_mm(mm)            do { } while (0)
>>>>>> #define flush_cache_range(vma, start, end)    do { } while (0)
>>>>>> #define flush_cache_page(vma, vmaddr, pfn)    do { } while (0)
>>>>>> #define flush_icache_page(vma, page)        do { } while (0)
>>>>>> #define flush_cache_vmap(start, end)        do { } while (0)
>>>>>> #define flush_cache_vunmap(start, end)        do { } while (0)
>>>>>>
>>>>>>
>>>>>>> +
>>>>>>> +    /* Now remove memory from the mappings */
>>>>>>> +    lock_device_hotplug();
>>>>>>> +    remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages <<
>>>>>>> PAGE_SHIFT);
>>>>>>> +    unlock_device_hotplug();
>>>>>> Right. Now we have successfully taken down the memory.
>>>>>>
>>>>>>> +
>>>>>>> +    return true;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static u64 memtrace_alloc_node(u32 nid, u64 size)
>>>>>>> +{
>>>>>>> +    u64 start_pfn, end_pfn, nr_pages;
>>>>>>> +    u64 base_pfn;
>>>>>>> +
>>>>>>> +    if (!NODE_DATA(nid) || !node_spanned_pages(nid))
>>>>>>> +        return 0;
>>>>>> Why NODE_DATA check is required here ? Each node should have one
>>>>>> allocated and initialized by now, else we have bigger problems.
>>>>>> Is there any specific reason to check for spanned pages instead
>>>>>> of present/managed pages.
>>>>> Anton wrote this check, so will need to confirm with him. I assume
>>>>> we check node_spanned_pages() rather than node_present_pages()
>>>>> because in arch/powerpc/mm/numa.c we set node_spanned_pages() and
>>>>> not node_present_pages()?
>>>> I guess any thing is okay but NODE_DATA() seems redundant though.
>>> Agreed.
>>>
>>>> struct pglist_data {
>>>>           ..........................
>>>>      unsigned long node_present_pages; /* total number of physical
>>>> pages */
>>>>      unsigned long node_spanned_pages; /* total size of physical page
>>>>                           range, including holes */
>>>>
>>>> }
>>>>
>>>>>>> +
>>>>>>> +    start_pfn = node_start_pfn(nid);
>>>>>>> +    end_pfn = node_end_pfn(nid);
>>>>>>> +    nr_pages = size >> PAGE_SHIFT;
>>>>>>> +
>>>>>>> +    /* Trace memory needs to be aligned to the size */
>>>>>>> +    end_pfn = round_down(end_pfn - nr_pages, nr_pages);
>>>>>>> +
>>>>>>> +    for (base_pfn = end_pfn; base_pfn > start_pfn; base_pfn -=
>>>>>>> nr_pages) {
>>>>>>> +        if (memtrace_offline_pages(nid, base_pfn, nr_pages) ==
>>>>>>> true)
>>>>>>> +            return base_pfn << PAGE_SHIFT;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int memtrace_init_regions_runtime(u64 size)
>>>>>>> +{
>>>>>>> +    u64 m;
>>>>>>> +    u32 nid;
>>>>>>> +
>>>>>>> +    memtrace_array = kzalloc(sizeof(struct memtrace_entry) *
>>>>>>> +                num_online_nodes(), GFP_KERNEL);
>>>>>>> +    if (!memtrace_array) {
>>>>>>> +        pr_err("Failed to allocate memtrace_array\n");
>>>>>>> +        return -EINVAL;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    for_each_online_node(nid) {
>>>>>>> +        m = memtrace_alloc_node(nid, size);
>>>>>>> +        /*
>>>>>>> +         * A node might not have any local memory, so warn but
>>>>>>> +         * continue on.
>>>>>>> +         */
>>>>>>> +        if (!m) {
>>>>>>> +            pr_err("Failed to allocate trace memory on node %d\n",
>>>>>>> +                 nid);
>>>>>>> +        } else {
>>>>>>> +            pr_info("Allocated trace memory on node %d at
>>>>>>> 0x%016llx\n",
>>>>>>> +                 nid, m);
>>>>>>> +
>>>>>>> +            memtrace_array[memtrace_array_nr].start = m;
>>>>>>> +            memtrace_array[memtrace_array_nr].size = size;
>>>>>>> +            memtrace_array[memtrace_array_nr].nid = nid;
>>>>>>> +            memtrace_array_nr++;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>> All the pr_info() and pr_err() prints should have a "memtrace :"
>>>>>> before the
>>>>>> actual string to make it clear that its coming from this feature.
>>>>>>
>>>>> Good point!
>>>>>>> +
>>>>>>> +static struct dentry *memtrace_debugfs_dir;
>>>>>>> +
>>>>>>> +static int memtrace_init_debugfs(void)
>>>>>>> +{
>>>>>>> +    int ret = 0;
>>>>>>> +    int i;
>>>>>>> +
>>>>>>> +    for (i = 0; i < memtrace_array_nr; i++) {
>>>>>>> +        struct dentry *dir;
>>>>>>> +        struct memtrace_entry *ent = &memtrace_array[i];
>>>>>>> +
>>>>>>> +        ent->mem = ioremap(ent->start, ent->size);
>>>>>>> +        /* Warn but continue on */
>>>>>>> +        if (!ent->mem) {
>>>>>>> +            pr_err("Failed to map trace memory at 0x%llx\n",
>>>>>>> +                 ent->start);
>>>>>>> +            ret = -1;
>>>>>>> +            continue;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        snprintf(ent->name, 16, "%08x", ent->nid);
>>>>>>> +        dir = debugfs_create_dir(ent->name, memtrace_debugfs_dir);
>>>>>>> +        if (!dir)
>>>>>>> +            return -1;
>>>>>>> +
>>>>>>> +        ent->dir = dir;
>>>>>>> +        debugfs_create_file("trace", 0400, dir, ent,
>>>>>>> &memtrace_fops);
>>>>>>> +        debugfs_create_x64("start", 0400, dir, &ent->start);
>>>>>>> +        debugfs_create_x64("size", 0400, dir, &ent->size);
>>>>>>> +        debugfs_create_u32("node", 0400, dir, &ent->nid);
>>>>>>> +    }
>>>>>> Oh okay, its creating all the four files. Please create corresponding
>>>>>> to each of the files some where. Documentation/ABI/testing lists the
>>>>>> actual system ABI on /sys/ not the sys/kernel/debug/ ones I guess.
>>>>> I'm not exactly sure what you are saying here... Seeing that there is
>>>>> documentation about debugfs files in Documentation/ABI/testing, I'll
>>>>> follow suit
>>>>> and put it there.
>>>> I meant the same.
>>>>
>>>>>>> +
>>>>>>> +    return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static u64 memtrace_size;
>>>>>>> +
>>>>>>> +static int memtrace_enable_set(void *data, u64 val)
>>>>>>> +{
>>>>>>> +    if (memtrace_size)
>>>>>>> +        return -EINVAL;
>>>>>>> +
>>>>>>> +    if (!val)
>>>>>>> +        return -EINVAL;
>>>>>>> +
>>>>>>> +    /* Make sure size is aligned to a memory block */
>>>>>>> +    if (val & (memory_block_size_bytes()-1))
>>>>>> As I have mentioned earlier, this should be mentioned in the
>>>>>> interface
>>>>>> description some where.
>>>>>>
>>>>>>> +        return -EINVAL;
>>>>>>> +
>>>>>>> +    if (memtrace_init_regions_runtime(val))
>>>>>>> +        return -EINVAL;
>>>>>>> +
>>>>>>> +    if (memtrace_init_debugfs())
>>>>>>> +        return -EINVAL;
>>>>>>> +
>>>>>>> +    memtrace_size = val;
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int memtrace_enable_get(void *data, u64 *val)
>>>>>>> +{
>>>>>>> +    *val = memtrace_size;
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
>>>>>>> memtrace_enable_set, "0x%016llx\n");
>>>>>>> +
>>>>>>> +static int memtrace_init(void)
>>>>>>> +{
>>>>>>> +    memtrace_debugfs_dir = debugfs_create_dir("memtrace",
>>>>>>> powerpc_debugfs_root);
>>>>>>> +    if (!memtrace_debugfs_dir)
>>>>>>> +        return -1;
>>>>>>> +
>>>>>>> +    debugfs_create_file("enable", 0600, memtrace_debugfs_dir,
>>>>>>> +                NULL, &memtrace_init_fops);
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +machine_device_initcall(powernv, memtrace_init);
>>>>>>> +
>>>>>>>
>>>>>> BTW how we start the tracing process for the trace to be collected
>>>>>> in the
>>>>>> interface before we can read them ? This interface does not seem to
>>>>>> have
>>>>>> a handler. When it directs the HW to start collecting the traces ?
>>>>>>
>>>>>> debugfs_create_x64("start", 0400, dir, &ent->start);
>>>>>>
>>>>>>
>>>>> I think you're asking 'what is actually going to call this code and do
>>>>> the tracing'?
>>>> No, when you call this interface, where is the routine to start the
>>>> actual tracing invoking appropriate platform functions or HW
>>>> instructions ? I dont see such a function associated with 'start'
>>>> interface mentioned above.
>>> Essentially,
>>> DEFINE_SIMPLE_ATTRIBUTE(memtrace_init_fops, memtrace_enable_get,
>>> memtrace_enable_set, "0x%016llx\n");
>>>
>>> means that when a number is written to the memtrace/enable file, the
>>> number is
>>> read as a u64 and passed to memtrace_enable_set()
>> Right, then we create the following interfaces for each entry of the
>> memory
>> trace. By now all the memory ranges are ioremapped.
>>
>> +        debugfs_create_file("trace", 0400, dir, ent, &memtrace_fops);
>> +        debugfs_create_x64("start", 0400, dir, &ent->start);
>> +        debugfs_create_x64("size", 0400, dir, &ent->size);
>> +        debugfs_create_u32("node", 0400, dir, &ent->nid);
>>
>> Then what really starts the HW trace ? where we tell the HW to start
>> tracing
>> and put the trace details in the memory buffer allocated and ioremapped ?
>> IIUC just making the memory ioremapped() wont start the trace
>> automatically.
>>
> 
> Ah sorry, misunderstood you. Yes, this patch is only to remove the
> memory from kernel mappings so the hardware tracing can use it - not the
> actual code that invokes the tracing. That code involves some hardware
> information that hasn't been made publicly accessible, so it is not my
> place to share that. The hardware trace will be written to the 'removed'
> memory directly by the hardware and the debugfs files allow us to read
> the trace without having to reonline the memory.

It makes sense now. Thanks Rashmica.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-05-15  4:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-28  5:42 [PATCH 1/2] powerpc/powernv: Add config option for removal of memory Rashmica Gupta
2017-04-28  5:42 ` [PATCH 2/2] v1 powerpc/powernv: Enable removal of memory for in memory tracing Rashmica Gupta
2017-04-28  9:52   ` Anshuman Khandual
2017-05-03  3:52     ` Rashmica Gupta
2017-05-03  6:08       ` Rashmica Gupta
2017-05-03 11:25         ` Anshuman Khandual
2017-05-03  6:08       ` Rashmica Gupta
2017-05-03 11:56       ` Anshuman Khandual
2017-05-09  7:06         ` Rashmica Gupta
2017-05-14  4:55           ` Anshuman Khandual
2017-05-15  3:34             ` Rashmica Gupta
2017-05-15  4:35               ` Anshuman Khandual
2017-04-28  9:39 ` [PATCH 1/2] powerpc/powernv: Add config option for removal of memory Anshuman Khandual
2017-05-03  3:52   ` Rashmica Gupta

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.