linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] pseries: Move memory hotplug to the kernel
@ 2014-09-15 20:26 Nathan Fontenot
  2014-09-15 20:29 ` [PATCH 1/5] pseries: Define rtas hotplug event sections Nathan Fontenot
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-15 20:26 UTC (permalink / raw)
  To: linuxppc-dev

In order to better support device hotplug (cpu, memory, and pci) in the
PowerVM and PowerKVM environments, the handling of device hotplug
could be updated so that the act of hotplugging a device occurs entirely
in the kernel. This patch set begins to address this by moving
memory hotplug to the kernel. Patches to follow will do the same
for cpu and pci devices.

To provide background, the current handling of memory hotplug is
handled by the drmgr command. This command is invoked when memory
add/remove requests are made at the HMC and conveyed to a partition
through the RSCT framework. The drmgr command then performs parts
of the hotplug in user-space and makes requests to the kernel to perform
other pieces. This is not really ideal, we can do everything in the
kernel and do it faster.

In this patchset, hotplug events will now be communicated to the kernel
in the form of rtas hotplug events. For PowerKVM systems this is done
by qemu using the ras epow interrupt. For PowerVM systems the drmgr
command will be updated to create a rtas hotplug event and send it to
the kernel via a new /proc/powerpc/dlpar interface. Both of these
entry points for hotplug rtas events then call a common routine
for handling rtas hotplug events.

-Nathan

Patch 1/5
- Add definition of hotplug rtas event sections.

Patch 2/5
- export the dlpar_[acquire|release]drc() routines.

Patch 3/5
- Create the new /proc/powerpc/dlpar interface

Patch 4/5
- Implement memory hotplug add in the kernel.

Patch 5/5
- Implement memory hotplug remove in the kernel.

 include/asm/rtas.h                 |   26 ++
 platforms/pseries/dlpar.c          |   63 ++++++-
 platforms/pseries/hotplug-memory.c |  332 ++++++++++++++++++++++++++++++++++++-
 platforms/pseries/pseries.h        |   12 +
 4 files changed, 431 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/5] pseries: Define rtas hotplug event sections
  2014-09-15 20:26 [PATCH 0/5] pseries: Move memory hotplug to the kernel Nathan Fontenot
@ 2014-09-15 20:29 ` Nathan Fontenot
  2014-09-17  7:06   ` [1/5] " Michael Ellerman
  2014-09-15 20:30 ` [PATCH 2/5] pseries: Export drc_[acquire|release]_drc() routines Nathan Fontenot
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-15 20:29 UTC (permalink / raw)
  To: linuxppc-dev

In order to handle device hotplug in the kernel on pseries the hotplug
notification will be communicated to the kernel in the form of a
rtas hotplug event. This patch adds the definition of rtas hotplug event
sections.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |   26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index b390f55..a01879e 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -273,6 +273,7 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
 #define PSERIES_ELOG_SECT_ID_MANUFACT_INFO	(('M' << 8) | 'I')
 #define PSERIES_ELOG_SECT_ID_CALL_HOME		(('C' << 8) | 'H')
 #define PSERIES_ELOG_SECT_ID_USER_DEF		(('U' << 8) | 'D')
+#define PSERIES_ELOG_SECT_ID_HOTPLUG		(('H' << 8) | 'P')
 
 /* Vendor specific Platform Event Log Format, Version 6, section header */
 struct pseries_errorlog {
@@ -296,6 +297,31 @@ inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
 	return be16_to_cpu(sect->length);
 }
 
+/* RTAS pseries hotplug errorlog section */
+struct pseries_hp_errorlog {
+	uint8_t	resource;
+	uint8_t	action;
+	uint8_t	id_type;
+	uint8_t	reserved;
+	union {
+		__be32	drc_index;
+		__be32	drc_count;
+		char	drc_name[1];
+	} _drc_u;
+};
+
+#define PSERIES_HP_ELOG_RESOURCE_CPU	1
+#define PSERIES_HP_ELOG_RESOURCE_MEM	2
+#define PSERIES_HP_ELOG_RESOURCE_SLOT	3
+#define PSERIES_HP_ELOG_RESOURCE_PHB	4
+
+#define PSERIES_HP_ELOG_ACTION_ADD	1
+#define PSERIES_HP_ELOG_ACTION_REMOVE	2
+
+#define PSERIES_HP_ELOG_ID_DRC_NAME	1
+#define PSERIES_HP_ELOG_ID_DRC_INDEX	2
+#define PSERIES_HP_ELOG_ID_DRC_COUNT	3
+
 struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
 					      uint16_t section_id);
 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/5] pseries: Export drc_[acquire|release]_drc() routines
  2014-09-15 20:26 [PATCH 0/5] pseries: Move memory hotplug to the kernel Nathan Fontenot
  2014-09-15 20:29 ` [PATCH 1/5] pseries: Define rtas hotplug event sections Nathan Fontenot
@ 2014-09-15 20:30 ` Nathan Fontenot
  2014-09-17  7:07   ` [2/5] " Michael Ellerman
  2014-09-15 20:31 ` [PATCH 3/5] pseries: Create device hotplug entry point Nathan Fontenot
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-15 20:30 UTC (permalink / raw)
  To: linuxppc-dev

Export the routines to acquire and release a drc index.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/pseries.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 361add6..b94516b 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -59,6 +59,8 @@ extern void dlpar_free_cc_property(struct property *);
 extern struct device_node *dlpar_configure_connector(u32, struct device_node *);
 extern int dlpar_attach_node(struct device_node *);
 extern int dlpar_detach_node(struct device_node *);
+extern int dlpar_acquire_drc(u32);
+extern int dlpar_release_drc(u32);
 
 /* PCI root bridge prepare function override for pseries */
 struct pci_host_bridge;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/5] pseries: Create device hotplug entry point
  2014-09-15 20:26 [PATCH 0/5] pseries: Move memory hotplug to the kernel Nathan Fontenot
  2014-09-15 20:29 ` [PATCH 1/5] pseries: Define rtas hotplug event sections Nathan Fontenot
  2014-09-15 20:30 ` [PATCH 2/5] pseries: Export drc_[acquire|release]_drc() routines Nathan Fontenot
@ 2014-09-15 20:31 ` Nathan Fontenot
  2014-09-17  7:07   ` [3/5] " Michael Ellerman
  2014-09-15 20:32 ` [PATCH 4/5] pseries: Implement memory hotplug add in the kernel Nathan Fontenot
  2014-09-15 20:33 ` [PATCH 5/5] pseries: Implement memory hotplug remove " Nathan Fontenot
  4 siblings, 1 reply; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-15 20:31 UTC (permalink / raw)
  To: linuxppc-dev

For pseries system the kernel will be notified of hotplug requests in
the form of rtas hotplug events. This patch creates a common routine that
can handle these requests in both the PowerVM anbd PowerKVM environments,
handle_dlpar_errorlog(). This also creates the initial memory hotplug
request handling stub.

For PowerVM this patch also creates a new /proc file that the drmgr
command will use to write rtas hotplug events to.

For future PowerKVM handling the rtas check-exception code can pass
any rtas hotplug events received to handle_dlpar_errorlog().

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/dlpar.c          |   63 +++++++++++++++++++++++
 arch/powerpc/platforms/pseries/hotplug-memory.c |   22 ++++++++
 arch/powerpc/platforms/pseries/pseries.h        |   10 ++++
 3 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index a2450b8..574ec73 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -16,7 +16,9 @@
 #include <linux/cpu.h>
 #include <linux/slab.h>
 #include <linux/of.h>
+#include <linux/proc_fs.h>
 #include "offline_states.h"
+#include "pseries.h"
 
 #include <asm/prom.h>
 #include <asm/machdep.h>
@@ -530,13 +532,72 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
 	return count;
 }
 
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
+
+static int handle_dlpar_errorlog(struct rtas_error_log *error_log)
+{
+	struct pseries_errorlog *pseries_log;
+	struct pseries_hp_errorlog *hp_elog;
+	int rc = -EINVAL;
+
+	pseries_log = get_pseries_errorlog(error_log,
+					   PSERIES_ELOG_SECT_ID_HOTPLUG);
+	if (!pseries_log)
+		return rc;
+
+	hp_elog = (struct pseries_hp_errorlog *)pseries_log->data;
+	if (!hp_elog)
+		return rc;
+
+	switch (hp_elog->resource) {
+	case PSERIES_HP_ELOG_RESOURCE_MEM:
+		rc = dlpar_memory(hp_elog);
+		break;
+	}
+
+	return rc;
+}
+
+static ssize_t dlpar_write(struct file *file, const char __user *buf,
+			   size_t count, loff_t *offset)
+{
+	char *event_buf;
+	int rc;
+
+	event_buf = kmalloc(count + 1, GFP_KERNEL);
+	if (!event_buf)
+		return -ENOMEM;
+
+	rc = copy_from_user(event_buf, buf, count);
+	if (rc) {
+		kfree(event_buf);
+		return rc;
+	}
+
+	rc = handle_dlpar_errorlog((struct rtas_error_log *)event_buf);
+	kfree(event_buf);
+	return rc ? rc : count;
+}
+
+static const struct file_operations dlpar_fops = {
+	.write = dlpar_write,
+	.llseek = noop_llseek,
+};
+
 static int __init pseries_dlpar_init(void)
 {
+	struct proc_dir_entry *proc_ent;
+
+	proc_ent = proc_create("powerpc/dlpar", S_IWUSR, NULL, &dlpar_fops);
+	if (proc_ent)
+		proc_set_size(proc_ent, 0);
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 	ppc_md.cpu_probe = dlpar_cpu_probe;
 	ppc_md.cpu_release = dlpar_cpu_release;
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
 	return 0;
 }
 machine_device_initcall(pseries, pseries_dlpar_init);
 
-#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 24abc5c..0e60e15 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -9,6 +9,8 @@
  *      2 of the License, or (at your option) any later version.
  */
 
+#define pr_fmt(fmt)	"pseries-hotplug: " fmt
+
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/memblock.h>
@@ -20,6 +22,9 @@
 #include <asm/machdep.h>
 #include <asm/prom.h>
 #include <asm/sparsemem.h>
+#include <asm/rtas.h>
+
+DEFINE_MUTEX(dlpar_mem_mutex);
 
 unsigned long pseries_memory_block_size(void)
 {
@@ -150,6 +155,23 @@ static inline int pseries_remove_mem_node(struct device_node *np)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
+{
+	int rc = 0;
+
+	mutex_lock(&dlpar_mem_mutex);
+
+	switch (hp_elog->action) {
+	default:
+		pr_err("Invalid action (%d) specified\n", hp_elog->action);
+		rc = -EINVAL;
+		break;
+	}
+
+	mutex_unlock(&dlpar_mem_mutex);
+	return rc;
+}
+
 static int pseries_add_mem_node(struct device_node *np)
 {
 	const char *type;
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index b94516b..28bd994 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -11,6 +11,7 @@
 #define _PSERIES_PSERIES_H
 
 #include <linux/interrupt.h>
+#include <asm/rtas.h>
 
 struct device_node;
 
@@ -62,6 +63,15 @@ extern int dlpar_detach_node(struct device_node *);
 extern int dlpar_acquire_drc(u32);
 extern int dlpar_release_drc(u32);
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern int dlpar_memory(struct pseries_hp_errorlog *);
+#else
+static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
+{
+	return -ENOTSUPP;
+}
+#endif
+
 /* PCI root bridge prepare function override for pseries */
 struct pci_host_bridge;
 int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/5] pseries: Implement memory hotplug add in the kernel
  2014-09-15 20:26 [PATCH 0/5] pseries: Move memory hotplug to the kernel Nathan Fontenot
                   ` (2 preceding siblings ...)
  2014-09-15 20:31 ` [PATCH 3/5] pseries: Create device hotplug entry point Nathan Fontenot
@ 2014-09-15 20:32 ` Nathan Fontenot
  2014-09-17  7:07   ` [4/5] " Michael Ellerman
  2014-09-15 20:33 ` [PATCH 5/5] pseries: Implement memory hotplug remove " Nathan Fontenot
  4 siblings, 1 reply; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-15 20:32 UTC (permalink / raw)
  To: linuxppc-dev

This patch adds the ability to do memory hotplug adding in the kernel.

Currently the hotplug add/remove of memory is handled by the drmgr
command. The drmgr command performs the add/remove by performing
some work in user-space and making requests to the kernel to handle 
other pieces. By moving all of the work to the kernel we can do the
add and remove faster, and provide a common place to do memory hotplug
for both the PowerVM and PowerKVM environments.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |  170 +++++++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 0e60e15..b254773 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -17,6 +17,7 @@
 #include <linux/vmalloc.h>
 #include <linux/memory.h>
 #include <linux/memory_hotplug.h>
+#include <linux/slab.h>
 
 #include <asm/firmware.h>
 #include <asm/machdep.h>
@@ -24,6 +25,8 @@
 #include <asm/sparsemem.h>
 #include <asm/rtas.h>
 
+#include "pseries.h"
+
 DEFINE_MUTEX(dlpar_mem_mutex);
 
 unsigned long pseries_memory_block_size(void)
@@ -69,6 +72,53 @@ unsigned long pseries_memory_block_size(void)
 	return memblock_size;
 }
 
+static void dlpar_free_drconf_property(struct property *prop)
+{
+	kfree(prop->name);
+	kfree(prop->value);
+	kfree(prop);
+}
+
+static struct property *dlpar_clone_drconf_property(struct device_node *dn)
+{
+	struct property *prop, *new_prop;
+
+	prop = of_find_property(dn, "ibm,dynamic-memory", NULL);
+	if (!prop)
+		return NULL;
+
+	new_prop = kzalloc(sizeof(*new_prop), GFP_KERNEL);
+	if (!new_prop)
+		return NULL;
+
+	new_prop->name = kstrdup(prop->name, GFP_KERNEL);
+	new_prop->value = kmalloc(prop->length + 1, GFP_KERNEL);
+	if (!new_prop->name || !new_prop->value) {
+		dlpar_free_drconf_property(new_prop);
+		return NULL;
+	}
+
+	memcpy(new_prop->value, prop->value, prop->length);
+	new_prop->length = prop->length;
+	*(((char *)new_prop->value) + new_prop->length) = 0;
+
+	return new_prop;
+}
+
+static struct memory_block *lmb_to_memblock(struct of_drconf_cell *lmb)
+{
+	unsigned long section_nr;
+	struct mem_section *mem_sect;
+	struct memory_block *mem_block;
+	u64 phys_addr = be64_to_cpu(lmb->base_addr);
+
+	section_nr = pfn_to_section_nr(PFN_DOWN(phys_addr));
+	mem_sect = __nr_to_section(section_nr);
+
+	mem_block = find_memory_block(mem_sect);
+	return mem_block;
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static int pseries_remove_memory(u64 start, u64 size)
 {
@@ -155,13 +205,133 @@ static inline int pseries_remove_mem_node(struct device_node *np)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+static int dlpar_add_one_lmb(struct of_drconf_cell *lmb)
+{
+	struct memory_block *mem_block;
+	u64 phys_addr;
+	unsigned long pages_per_block;
+	unsigned long block_sz;
+	int nid, sections_per_block;
+	int rc;
+
+	phys_addr = be64_to_cpu(lmb->base_addr);
+	block_sz = memory_block_size_bytes();
+	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+	pages_per_block = PAGES_PER_SECTION * sections_per_block;
+
+	if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
+		return -EINVAL;
+
+	nid = memory_add_physaddr_to_nid(phys_addr);
+	rc = add_memory(nid, phys_addr, block_sz);
+	if (rc)
+		return rc;
+
+	rc = memblock_add(phys_addr, block_sz);
+	if (rc) {
+		remove_memory(nid, phys_addr, block_sz);
+		return rc;
+	}
+
+	mem_block = lmb_to_memblock(lmb);
+	if (!mem_block) {
+		remove_memory(nid, phys_addr, block_sz);
+		return -EINVAL;
+	}
+
+	rc = device_online(&mem_block->dev);
+	put_device(&mem_block->dev);
+	if (rc)
+		remove_memory(nid, phys_addr, block_sz);
+
+	return rc;
+}
+
+static int dlpar_memory_add(struct pseries_hp_errorlog *hp_elog)
+{
+	struct of_drconf_cell *lmb;
+	struct device_node *dn;
+	struct property *prop;
+	uint32_t entries, *p;
+	int i, lmbs_to_add;
+	int lmbs_added = 0;
+	int rc = -EINVAL;
+
+	if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT) {
+		lmbs_to_add = be32_to_cpu(hp_elog->_drc_u.drc_count);
+		pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
+	} else {
+		lmbs_to_add = 1;
+		pr_info("Attempting to hot-add LMB, drc index %x\n",
+			be32_to_cpu(hp_elog->_drc_u.drc_index));
+	}
+
+	dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+	if (!dn)
+		return -EINVAL;
+
+	prop = dlpar_clone_drconf_property(dn);
+	if (!prop) {
+		of_node_put(dn);
+		return -EINVAL;
+	}
+
+	p = prop->value;
+	entries = be32_to_cpu(*p++);
+	lmb = (struct of_drconf_cell *)p;
+
+	for (i = 0; i < entries; i++, lmb++) {
+		u32 drc_index = be32_to_cpu(lmb->drc_index);
+
+		if (lmbs_to_add == lmbs_added)
+			break;
+
+		if (be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED)
+			continue;
+
+		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX
+		    && lmb->drc_index != hp_elog->_drc_u.drc_index)
+			continue;
+
+		rc = dlpar_acquire_drc(drc_index);
+		if (rc)
+			continue;
+
+		rc = dlpar_add_one_lmb(lmb);
+		if (rc) {
+			dlpar_release_drc(drc_index);
+			continue;
+		}
+
+		lmb->flags |= cpu_to_be32(DRCONF_MEM_ASSIGNED);
+		lmbs_added++;
+		pr_info("Memory at %llx (drc index %x) has been hot-added\n",
+			be64_to_cpu(lmb->base_addr), drc_index);
+	}
+
+	if (lmbs_added)
+		rc = of_update_property(dn, prop);
+	else
+		dlpar_free_drconf_property(prop);
+
+	of_node_put(dn);
+	return rc ? rc : lmbs_added;
+}
+
 int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
 {
 	int rc = 0;
 
+	if (hp_elog->id_type != PSERIES_HP_ELOG_ID_DRC_COUNT
+	    && hp_elog->id_type != PSERIES_HP_ELOG_ID_DRC_INDEX)
+		return -EINVAL;
+
 	mutex_lock(&dlpar_mem_mutex);
 
 	switch (hp_elog->action) {
+	case PSERIES_HP_ELOG_ACTION_ADD:
+		rc = dlpar_memory_add(hp_elog);
+		break;
 	default:
 		pr_err("Invalid action (%d) specified\n", hp_elog->action);
 		rc = -EINVAL;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/5] pseries: Implement memory hotplug remove in the kernel
  2014-09-15 20:26 [PATCH 0/5] pseries: Move memory hotplug to the kernel Nathan Fontenot
                   ` (3 preceding siblings ...)
  2014-09-15 20:32 ` [PATCH 4/5] pseries: Implement memory hotplug add in the kernel Nathan Fontenot
@ 2014-09-15 20:33 ` Nathan Fontenot
  2014-09-17  7:07   ` [5/5] " Michael Ellerman
  4 siblings, 1 reply; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-15 20:33 UTC (permalink / raw)
  To: linuxppc-dev

This patch adds the ability to do memory hotplug remove in the kernel.

Currently the hotplug add/remove of memory is handled by the drmgr
command. The drmgr command performs the add/remove by performing
some work in user-space and making requests to the kernel to handle
other pieces. By moving all of the work to the kernel we can do the
add and remove faster, and provide a common place to do memory hotplug
for both the PowerVM and PowerKVM environments.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |  140 +++++++++++++++++++++++
 1 file changed, 139 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index b254773..160c424 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -193,7 +193,137 @@ static int pseries_remove_mem_node(struct device_node *np)
 	pseries_remove_memblock(base, lmb_size);
 	return 0;
 }
+
+static int lmb_is_removable(struct of_drconf_cell *lmb)
+{
+	int i, scns_per_block;
+	int rc = 1;
+	unsigned long pfn, block_sz;
+	u64 phys_addr;
+
+	phys_addr = be64_to_cpu(lmb->base_addr);
+	block_sz = memory_block_size_bytes();
+	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+
+	for (i = 0; i < scns_per_block; i++) {
+		pfn = PFN_DOWN(phys_addr);
+		if (!pfn_present(pfn))
+			continue;
+
+		rc &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
+		phys_addr += MIN_MEMORY_BLOCK_SIZE;
+	}
+
+	return rc;
+}
+
+static int dlpar_add_one_lmb(struct of_drconf_cell *);
+
+static int dlpar_remove_one_lmb(struct of_drconf_cell *lmb)
+{
+	struct memory_block *mem_block;
+	unsigned long block_sz;
+	u64 phys_addr;
+	int nid, rc;
+
+	block_sz = memory_block_size_bytes();
+	phys_addr = be64_to_cpu(lmb->base_addr);
+	nid = memory_add_physaddr_to_nid(phys_addr);
+
+	if (!pfn_valid(phys_addr >> PAGE_SHIFT)) {
+		memblock_remove(phys_addr, block_sz);
+		return 0;
+	}
+
+	mem_block = lmb_to_memblock(lmb);
+	if (!mem_block)
+		return -EINVAL;
+
+	rc = device_offline(&mem_block->dev);
+	put_device(&mem_block->dev);
+	if (rc)
+		return rc;
+
+	remove_memory(nid, phys_addr, block_sz);
+	memblock_remove(phys_addr, block_sz);
+
+	return 0;
+}
+
+static int dlpar_memory_remove(struct pseries_hp_errorlog *hp_elog)
+{
+	struct of_drconf_cell *lmb;
+	struct device_node *dn;
+	struct property *prop;
+	int lmbs_to_remove, lmbs_removed = 0;
+	int i, entries;
+	int rc = -EINVAL;
+	uint32_t *p;
+
+	if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT) {
+		lmbs_to_remove = be32_to_cpu(hp_elog->_drc_u.drc_count);
+		pr_info("Attempting to hot-remove %d LMB(s)\n", lmbs_to_remove);
+	} else {
+		lmbs_to_remove = 1;
+		pr_info("Attempting to hot-remove LMB, drc index %x\n",
+			be32_to_cpu(hp_elog->_drc_u.drc_index));
+	}
+
+	dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+	if (!dn)
+		return -EINVAL;
+
+	prop = dlpar_clone_drconf_property(dn);
+	if (!prop) {
+		of_node_put(dn);
+		return -EINVAL;
+	}
+
+	p = prop->value;
+	entries = be32_to_cpu(*p++);
+	lmb = (struct of_drconf_cell *)p;
+
+	for (i = 0; i < entries; i++, lmb++) {
+		u32 drc_index = be32_to_cpu(lmb->drc_index);
+
+		if (lmbs_to_remove == lmbs_removed)
+			break;
+
+		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX
+		    && lmb->drc_index != hp_elog->_drc_u.drc_index)
+			continue;
+
+		if (!(be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED)
+		    || !lmb_is_removable(lmb))
+			continue;
+
+		rc = dlpar_remove_one_lmb(lmb);
+		if (rc)
+			continue;
+
+		rc = dlpar_release_drc(drc_index);
+		if (rc) {
+			dlpar_add_one_lmb(lmb);
+			continue;
+		}
+
+		lmb->flags &= cpu_to_be32(~DRCONF_MEM_ASSIGNED);
+		lmbs_removed++;
+		pr_info("Memory at %llx (drc index %x) has been hot-removed\n",
+			be64_to_cpu(lmb->base_addr), drc_index);
+	}
+
+	if (lmbs_removed)
+		rc = of_update_property(dn, prop);
+	else
+		dlpar_free_drconf_property(prop);
+
+	of_node_put(dn);
+	return rc ? rc : lmbs_removed;
+}
+
 #else
+
 static inline int pseries_remove_memblock(unsigned long base,
 					  unsigned int memblock_size)
 {
@@ -203,6 +333,11 @@ static inline int pseries_remove_mem_node(struct device_node *np)
 {
 	return 0;
 }
+static inline int dlpar_memory_remove(struct pseries_hp_errorlog *hp_elog)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 static int dlpar_add_one_lmb(struct of_drconf_cell *lmb)
@@ -320,7 +455,7 @@ static int dlpar_memory_add(struct pseries_hp_errorlog *hp_elog)
 
 int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
 {
-	int rc = 0;
+	int rc;
 
 	if (hp_elog->id_type != PSERIES_HP_ELOG_ID_DRC_COUNT
 	    && hp_elog->id_type != PSERIES_HP_ELOG_ID_DRC_INDEX)
@@ -332,6 +467,9 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
 	case PSERIES_HP_ELOG_ACTION_ADD:
 		rc = dlpar_memory_add(hp_elog);
 		break;
+	case PSERIES_HP_ELOG_ACTION_REMOVE:
+		rc = dlpar_memory_remove(hp_elog);
+		break;
 	default:
 		pr_err("Invalid action (%d) specified\n", hp_elog->action);
 		rc = -EINVAL;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [1/5] pseries: Define rtas hotplug event sections
  2014-09-15 20:29 ` [PATCH 1/5] pseries: Define rtas hotplug event sections Nathan Fontenot
@ 2014-09-17  7:06   ` Michael Ellerman
  2014-09-17 16:49     ` Nathan Fontenot
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Ellerman @ 2014-09-17  7:06 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev


On Mon, 2014-09-15 at 15:29 -0500, Nathan Fontenot wrote:
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index b390f55..a01879e 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -273,6 +273,7 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
>  #define PSERIES_ELOG_SECT_ID_MANUFACT_INFO	(('M' << 8) | 'I')
>  #define PSERIES_ELOG_SECT_ID_CALL_HOME		(('C' << 8) | 'H')
>  #define PSERIES_ELOG_SECT_ID_USER_DEF		(('U' << 8) | 'D')
> +#define PSERIES_ELOG_SECT_ID_HOTPLUG		(('H' << 8) | 'P')
>  
>  /* Vendor specific Platform Event Log Format, Version 6, section header */
>  struct pseries_errorlog {
> @@ -296,6 +297,31 @@ inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
>  	return be16_to_cpu(sect->length);
>  }
>  
> +/* RTAS pseries hotplug errorlog section */
> +struct pseries_hp_errorlog {
> +	uint8_t	resource;
> +	uint8_t	action;
> +	uint8_t	id_type;
> +	uint8_t	reserved;

These should be u8.

> +	union {
> +		__be32	drc_index;
> +		__be32	drc_count;
> +		char	drc_name[1];

I don't see drc_name used?

> +	} _drc_u;
> +};

cheers

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [2/5] pseries: Export drc_[acquire|release]_drc() routines
  2014-09-15 20:30 ` [PATCH 2/5] pseries: Export drc_[acquire|release]_drc() routines Nathan Fontenot
@ 2014-09-17  7:07   ` Michael Ellerman
  2014-09-17 16:50     ` Nathan Fontenot
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Ellerman @ 2014-09-17  7:07 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev


On Mon, 2014-09-15 at 15:30 -0500, Nathan Fontenot wrote:
> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
> index 361add6..b94516b 100644
> --- a/arch/powerpc/platforms/pseries/pseries.h
> +++ b/arch/powerpc/platforms/pseries/pseries.h
> @@ -59,6 +59,8 @@ extern void dlpar_free_cc_property(struct property *);
>  extern struct device_node *dlpar_configure_connector(u32, struct device_node *);
>  extern int dlpar_attach_node(struct device_node *);
>  extern int dlpar_detach_node(struct device_node *);
> +extern int dlpar_acquire_drc(u32);
> +extern int dlpar_release_drc(u32);

Please name the parameters.

And don't bother with extern.

cheers

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [3/5] pseries: Create device hotplug entry point
  2014-09-15 20:31 ` [PATCH 3/5] pseries: Create device hotplug entry point Nathan Fontenot
@ 2014-09-17  7:07   ` Michael Ellerman
  2014-09-17 19:15     ` Nathan Fontenot
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Ellerman @ 2014-09-17  7:07 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev


On Mon, 2014-09-15 at 15:31 -0500, Nathan Fontenot wrote:
> For pseries system the kernel will be notified of hotplug requests in
> the form of rtas hotplug events. 

Can you flesh that design out a bit for me, I don't entirely get how it's going
to work.

The kernel gets the rtas hotplug events (in rtasd.c) and spits them out to
userspace, which then writes them back in ?

> This patch creates a common routine that can handle these requests in both
> the PowerVM anbd PowerKVM environments, handle_dlpar_errorlog(). This also
                ^
> creates the initial memory hotplug request handling stub.
>
> For PowerVM this patch also creates a new /proc file that the drmgr
> command will use to write rtas hotplug events to.

Why is this different between phyp and KVM?

> For future PowerKVM handling the rtas check-exception code can pass
> any rtas hotplug events received to handle_dlpar_errorlog().

Internally to the kernel you mean?

> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
> index a2450b8..574ec73 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -16,7 +16,9 @@
>  #include <linux/cpu.h>
>  #include <linux/slab.h>
>  #include <linux/of.h>
> +#include <linux/proc_fs.h>
>  #include "offline_states.h"
> +#include "pseries.h"
>  
>  #include <asm/prom.h>
>  #include <asm/machdep.h>
> @@ -530,13 +532,72 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
>  	return count;
>  }
>  
> +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */

That is really confusing, but I think it's just a diff artifact?

> +static int handle_dlpar_errorlog(struct rtas_error_log *error_log)
> +{
> +	struct pseries_errorlog *pseries_log;
> +	struct pseries_hp_errorlog *hp_elog;
> +	int rc = -EINVAL;
> +
> +	pseries_log = get_pseries_errorlog(error_log,
> +					   PSERIES_ELOG_SECT_ID_HOTPLUG);
> +	if (!pseries_log)
> +		return rc;
> +
> +	hp_elog = (struct pseries_hp_errorlog *)pseries_log->data;
> +	if (!hp_elog)
> +		return rc;

I don't see how that can happen?

struct pseries_errorlog {
	__be16 id;			/* 0x00 2-byte ASCII section ID	*/
	__be16 length;			/* 0x02 Section length in bytes	*/
	uint8_t version;		/* 0x04 Section version		*/
	uint8_t subtype;		/* 0x05 Section subtype		*/
	__be16 creator_component;	/* 0x06 Creator component ID	*/
	uint8_t data[];			/* 0x08 Start of section data	*/
};

Should you be checking for length == 0 instead ?

Also I think the code will probably end up cleaner if you do the endian
conversions immediately when you read the hp_elog, rather than passing it
around in BE and having to remember to convert at all the usages.

> +	switch (hp_elog->resource) {
> +	case PSERIES_HP_ELOG_RESOURCE_MEM:
> +		rc = dlpar_memory(hp_elog);
> +		break;

Please add:

  default:
	pr_warn_ratelimited("Unknown resource ..")

Or something.


> +	}
> +
> +	return rc;
> +}
> +
> +static ssize_t dlpar_write(struct file *file, const char __user *buf,
> +			   size_t count, loff_t *offset)
> +{
> +	char *event_buf;
> +	int rc;
> +
> +	event_buf = kmalloc(count + 1, GFP_KERNEL);

Why + 1 ? It's not null-terminated AFAICS.

> +	if (!event_buf)
> +		return -ENOMEM;
> +
> +	rc = copy_from_user(event_buf, buf, count);
> +	if (rc) {
> +		kfree(event_buf);
> +		return rc;
> +	}
> +
> +	rc = handle_dlpar_errorlog((struct rtas_error_log *)event_buf);

If you start with a struct rtas_error_log * you shouldn't need any casts.

> +	kfree(event_buf);
> +	return rc ? rc : count;
> +}
> +
> +static const struct file_operations dlpar_fops = {
> +	.write = dlpar_write,
> +	.llseek = noop_llseek,
> +};
> +
>  static int __init pseries_dlpar_init(void)
>  {
> +	struct proc_dir_entry *proc_ent;
> +
> +	proc_ent = proc_create("powerpc/dlpar", S_IWUSR, NULL, &dlpar_fops);
> +	if (proc_ent)
> +		proc_set_size(proc_ent, 0);

  else
	error message at least please

Why are we putting it in /proc, can't it go in /sys/kernel like the mobility
stuff?

> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 24abc5c..0e60e15 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -20,6 +22,9 @@
>  #include <asm/machdep.h>
>  #include <asm/prom.h>
>  #include <asm/sparsemem.h>
> +#include <asm/rtas.h>
> +
> +DEFINE_MUTEX(dlpar_mem_mutex);

static ?

> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
> index b94516b..28bd994 100644
> --- a/arch/powerpc/platforms/pseries/pseries.h
> +++ b/arch/powerpc/platforms/pseries/pseries.h
> @@ -62,6 +63,15 @@ extern int dlpar_detach_node(struct device_node *);
>  extern int dlpar_acquire_drc(u32);
>  extern int dlpar_release_drc(u32);
>  
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +extern int dlpar_memory(struct pseries_hp_errorlog *);
> +#else
> +static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
> +{
> +	return -ENOTSUPP;

EOPNOTSUPP is a bit more standard.

cheers

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [4/5] pseries: Implement memory hotplug add in the kernel
  2014-09-15 20:32 ` [PATCH 4/5] pseries: Implement memory hotplug add in the kernel Nathan Fontenot
@ 2014-09-17  7:07   ` Michael Ellerman
  2014-09-17 19:45     ` Nathan Fontenot
  2014-09-24 20:57     ` Nathan Fontenot
  0 siblings, 2 replies; 19+ messages in thread
From: Michael Ellerman @ 2014-09-17  7:07 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev


On Mon, 2014-09-15 at 15:32 -0500, Nathan Fontenot wrote:
> This patch adds the ability to do memory hotplug adding in the kernel.
> 
> Currently the hotplug add/remove of memory is handled by the drmgr
> command. The drmgr command performs the add/remove by performing
> some work in user-space and making requests to the kernel to handle 
> other pieces. By moving all of the work to the kernel we can do the
> add and remove faster, and provide a common place to do memory hotplug
> for both the PowerVM and PowerKVM environments.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/hotplug-memory.c |  170 +++++++++++++++++++++++
>  1 file changed, 170 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 0e60e15..b254773 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -17,6 +17,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/memory.h>
>  #include <linux/memory_hotplug.h>
> +#include <linux/slab.h>
>  
>  #include <asm/firmware.h>
>  #include <asm/machdep.h>
> @@ -24,6 +25,8 @@
>  #include <asm/sparsemem.h>
>  #include <asm/rtas.h>
>  
> +#include "pseries.h"
> +
>  DEFINE_MUTEX(dlpar_mem_mutex);
>  
>  unsigned long pseries_memory_block_size(void)
> @@ -69,6 +72,53 @@ unsigned long pseries_memory_block_size(void)
>  	return memblock_size;
>  }
>  
> +static void dlpar_free_drconf_property(struct property *prop)
> +{
> +	kfree(prop->name);
> +	kfree(prop->value);
> +	kfree(prop);
> +}
> +
> +static struct property *dlpar_clone_drconf_property(struct device_node *dn)
> +{
> +	struct property *prop, *new_prop;
> +
> +	prop = of_find_property(dn, "ibm,dynamic-memory", NULL);
> +	if (!prop)
> +		return NULL;
> +
> +	new_prop = kzalloc(sizeof(*new_prop), GFP_KERNEL);
> +	if (!new_prop)
> +		return NULL;
> +
> +	new_prop->name = kstrdup(prop->name, GFP_KERNEL);
> +	new_prop->value = kmalloc(prop->length + 1, GFP_KERNEL);
> +	if (!new_prop->name || !new_prop->value) {
> +		dlpar_free_drconf_property(new_prop);
> +		return NULL;
> +	}
> +
> +	memcpy(new_prop->value, prop->value, prop->length);
> +	new_prop->length = prop->length;
> +	*(((char *)new_prop->value) + new_prop->length) = 0;

It's not a string, is it?

> +	return new_prop;
> +}
> +
> +static struct memory_block *lmb_to_memblock(struct of_drconf_cell *lmb)
> +{
> +	unsigned long section_nr;
> +	struct mem_section *mem_sect;
> +	struct memory_block *mem_block;
> +	u64 phys_addr = be64_to_cpu(lmb->base_addr);
> +
> +	section_nr = pfn_to_section_nr(PFN_DOWN(phys_addr));
> +	mem_sect = __nr_to_section(section_nr);
> +
> +	mem_block = find_memory_block(mem_sect);
> +	return mem_block;
> +}
> +
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  static int pseries_remove_memory(u64 start, u64 size)
>  {
> @@ -155,13 +205,133 @@ static inline int pseries_remove_mem_node(struct device_node *np)
>  }
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
> +static int dlpar_add_one_lmb(struct of_drconf_cell *lmb)
> +{
> +	struct memory_block *mem_block;
> +	u64 phys_addr;
> +	unsigned long pages_per_block;
> +	unsigned long block_sz;
> +	int nid, sections_per_block;
> +	int rc;
> +
> +	phys_addr = be64_to_cpu(lmb->base_addr);

of_drconf_cell needs endian annotations.

> +	block_sz = memory_block_size_bytes();
> +	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
> +	pages_per_block = PAGES_PER_SECTION * sections_per_block;
> +
> +	if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
> +		return -EINVAL;
> +
> +	nid = memory_add_physaddr_to_nid(phys_addr);
> +	rc = add_memory(nid, phys_addr, block_sz);
> +	if (rc)
> +		return rc;
> +
> +	rc = memblock_add(phys_addr, block_sz);
> +	if (rc) {
> +		remove_memory(nid, phys_addr, block_sz);
> +		return rc;
> +	}
> +
> +	mem_block = lmb_to_memblock(lmb);
> +	if (!mem_block) {
> +		remove_memory(nid, phys_addr, block_sz);
> +		return -EINVAL;
> +	}

That could all use a lot of comments. ie. why do we have to add it twice?

> +	rc = device_online(&mem_block->dev);
> +	put_device(&mem_block->dev);
> +	if (rc)
> +		remove_memory(nid, phys_addr, block_sz);
> +
> +	return rc;
> +}
> +
> +static int dlpar_memory_add(struct pseries_hp_errorlog *hp_elog)
> +{
> +	struct of_drconf_cell *lmb;
> +	struct device_node *dn;
> +	struct property *prop;
> +	uint32_t entries, *p;

*p should be __be32.

> +	int i, lmbs_to_add;
> +	int lmbs_added = 0;
> +	int rc = -EINVAL;

Don't pre-initialise your rc variables.

> +	if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT) {
> +		lmbs_to_add = be32_to_cpu(hp_elog->_drc_u.drc_count);
> +		pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
> +	} else {
> +		lmbs_to_add = 1;
> +		pr_info("Attempting to hot-add LMB, drc index %x\n",
> +			be32_to_cpu(hp_elog->_drc_u.drc_index));
> +	}
> +
> +	dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
> +	if (!dn)
> +		return -EINVAL;
> +
> +	prop = dlpar_clone_drconf_property(dn);
> +	if (!prop) {
> +		of_node_put(dn);
> +		return -EINVAL;
> +	}
> +
> +	p = prop->value;
> +	entries = be32_to_cpu(*p++);
> +	lmb = (struct of_drconf_cell *)p;


So if I'm reading this right the hp_elog either contains an index or a count of
LMBs to add. But it doesn't contain anything about which address ranges to add
or any of those details. That is all in the ibm,dynamic-memory property - but
how did it get in there?

> +
> +	for (i = 0; i < entries; i++, lmb++) {
> +		u32 drc_index = be32_to_cpu(lmb->drc_index);
> +
> +		if (lmbs_to_add == lmbs_added)
> +			break;
> +
> +		if (be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED)
> +			continue;
> +
> +		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX
> +		    && lmb->drc_index != hp_elog->_drc_u.drc_index)
> +			continue;
> +
> +		rc = dlpar_acquire_drc(drc_index);
> +		if (rc)
> +			continue;
> +
> +		rc = dlpar_add_one_lmb(lmb);
> +		if (rc) {
> +			dlpar_release_drc(drc_index);
> +			continue;
> +		}

In both the above error cases you just move along. That means we potentially
hotplugged some memory but not everything that we were asked to. That seems
like a bad idea, we should either do everything or nothing.


> +
> +		lmb->flags |= cpu_to_be32(DRCONF_MEM_ASSIGNED);
> +		lmbs_added++;
> +		pr_info("Memory at %llx (drc index %x) has been hot-added\n",
> +			be64_to_cpu(lmb->base_addr), drc_index);
> +	}
> +
> +	if (lmbs_added)
> +		rc = of_update_property(dn, prop);
> +	else
> +		dlpar_free_drconf_property(prop);

The value of rc here is not clear. It could be EINVAL or it could be the result
of the last dlpar_add_one_lmb(lmb). gcc would have told you that if you hadn't
initialised it.

> +
> +	of_node_put(dn);
> +	return rc ? rc : lmbs_added;

This looks wrong.

Doesn't the rc eventually go back to dlpar_write(), which expects 0 for success?

That should show up as the write failing in userspace.


>  int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>  {
>  	int rc = 0;

Don't initialise to zero, that way gcc can tell you if there's a path where you
forget to initialise it. It also means you can't accidentally return success.

> +	if (hp_elog->id_type != PSERIES_HP_ELOG_ID_DRC_COUNT
> +	    && hp_elog->id_type != PSERIES_HP_ELOG_ID_DRC_INDEX)
> +		return -EINVAL;

This would look nicer as a switch I think.

>  	mutex_lock(&dlpar_mem_mutex);
>  
>  	switch (hp_elog->action) {
> +	case PSERIES_HP_ELOG_ACTION_ADD:
> +		rc = dlpar_memory_add(hp_elog);
> +		break;
>  	default:
>  		pr_err("Invalid action (%d) specified\n", hp_elog->action);
>  		rc = -EINVAL;
> 

cheers

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [5/5] pseries: Implement memory hotplug remove in the kernel
  2014-09-15 20:33 ` [PATCH 5/5] pseries: Implement memory hotplug remove " Nathan Fontenot
@ 2014-09-17  7:07   ` Michael Ellerman
  2014-09-17 19:58     ` Nathan Fontenot
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Ellerman @ 2014-09-17  7:07 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev list


On Mon, 2014-09-15 at 15:33 -0500, Nathan Fontenot wrote:
> This patch adds the ability to do memory hotplug remove in the kernel.
> 
> Currently the hotplug add/remove of memory is handled by the drmgr
> command. The drmgr command performs the add/remove by performing
> some work in user-space and making requests to the kernel to handle
> other pieces. By moving all of the work to the kernel we can do the
> add and remove faster, and provide a common place to do memory hotplug
> for both the PowerVM and PowerKVM environments.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/hotplug-memory.c |  140 +++++++++++++++++++++++
>  1 file changed, 139 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index b254773..160c424 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -193,7 +193,137 @@ static int pseries_remove_mem_node(struct device_node *np)
>  	pseries_remove_memblock(base, lmb_size);
>  	return 0;
>  }
> +
> +static int lmb_is_removable(struct of_drconf_cell *lmb)
> +{

Do we not already have something like this?

> +	int i, scns_per_block;
> +	int rc = 1;

I can see this makes the &= work below.

But what if block_sz / MIN_MEMORY_BLOCK_SIZE = 0 ?

> +	unsigned long pfn, block_sz;
> +	u64 phys_addr;
> +
> +	phys_addr = be64_to_cpu(lmb->base_addr);
> +	block_sz = memory_block_size_bytes();
> +	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
> +
> +	for (i = 0; i < scns_per_block; i++) {
> +		pfn = PFN_DOWN(phys_addr);
> +		if (!pfn_present(pfn))
> +			continue;
> +
> +		rc &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
> +		phys_addr += MIN_MEMORY_BLOCK_SIZE;
> +	}
> +
> +	return rc;
> +}

> +static int dlpar_memory_remove(struct pseries_hp_errorlog *hp_elog)
> +{

...

> +}

Most of the same comments as for add.

cheers

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [1/5] pseries: Define rtas hotplug event sections
  2014-09-17  7:06   ` [1/5] " Michael Ellerman
@ 2014-09-17 16:49     ` Nathan Fontenot
  0 siblings, 0 replies; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-17 16:49 UTC (permalink / raw)
  To: linuxppc-dev

On 09/17/2014 02:06 AM, Michael Ellerman wrote:
> 
> On Mon, 2014-09-15 at 15:29 -0500, Nathan Fontenot wrote:
>> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
>> index b390f55..a01879e 100644
>> --- a/arch/powerpc/include/asm/rtas.h
>> +++ b/arch/powerpc/include/asm/rtas.h
>> @@ -273,6 +273,7 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
>>  #define PSERIES_ELOG_SECT_ID_MANUFACT_INFO	(('M' << 8) | 'I')
>>  #define PSERIES_ELOG_SECT_ID_CALL_HOME		(('C' << 8) | 'H')
>>  #define PSERIES_ELOG_SECT_ID_USER_DEF		(('U' << 8) | 'D')
>> +#define PSERIES_ELOG_SECT_ID_HOTPLUG		(('H' << 8) | 'P')
>>  
>>  /* Vendor specific Platform Event Log Format, Version 6, section header */
>>  struct pseries_errorlog {
>> @@ -296,6 +297,31 @@ inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
>>  	return be16_to_cpu(sect->length);
>>  }
>>  
>> +/* RTAS pseries hotplug errorlog section */
>> +struct pseries_hp_errorlog {
>> +	uint8_t	resource;
>> +	uint8_t	action;
>> +	uint8_t	id_type;
>> +	uint8_t	reserved;
> 
> These should be u8.

ok.

> 
>> +	union {
>> +		__be32	drc_index;
>> +		__be32	drc_count;
>> +		char	drc_name[1];
> 
> I don't see drc_name used?

I don't use drc_name in this patch set but the drc_name piece is
part of the rtas hotplug section definition and I wanted to provide
a complete definition of the section.

-Nathan

> 
>> +	} _drc_u;
>> +};
> 
> cheers
> 
> 
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [2/5] pseries: Export drc_[acquire|release]_drc() routines
  2014-09-17  7:07   ` [2/5] " Michael Ellerman
@ 2014-09-17 16:50     ` Nathan Fontenot
  0 siblings, 0 replies; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-17 16:50 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On 09/17/2014 02:07 AM, Michael Ellerman wrote:
> 
> On Mon, 2014-09-15 at 15:30 -0500, Nathan Fontenot wrote:
>> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
>> index 361add6..b94516b 100644
>> --- a/arch/powerpc/platforms/pseries/pseries.h
>> +++ b/arch/powerpc/platforms/pseries/pseries.h
>> @@ -59,6 +59,8 @@ extern void dlpar_free_cc_property(struct property *);
>>  extern struct device_node *dlpar_configure_connector(u32, struct device_node *);
>>  extern int dlpar_attach_node(struct device_node *);
>>  extern int dlpar_detach_node(struct device_node *);
>> +extern int dlpar_acquire_drc(u32);
>> +extern int dlpar_release_drc(u32);
> 
> Please name the parameters.

Will do.

> 
> And don't bother with extern.

I was following the convention used in the file, droppping the
extern is fine with me though.

-Nathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [3/5] pseries: Create device hotplug entry point
  2014-09-17  7:07   ` [3/5] " Michael Ellerman
@ 2014-09-17 19:15     ` Nathan Fontenot
  2014-09-23  1:15       ` Tyrel Datwyler
  0 siblings, 1 reply; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-17 19:15 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On 09/17/2014 02:07 AM, Michael Ellerman wrote:
> 
> On Mon, 2014-09-15 at 15:31 -0500, Nathan Fontenot wrote:
>> For pseries system the kernel will be notified of hotplug requests in
>> the form of rtas hotplug events. 
> 
> Can you flesh that design out a bit for me, I don't entirely get how it's going
> to work.
> 
> The kernel gets the rtas hotplug events (in rtasd.c) and spits them out to
> userspace, which then writes them back in ?
> 
>> This patch creates a common routine that can handle these requests in both
>> the PowerVM anbd PowerKVM environments, handle_dlpar_errorlog(). This also
>                 ^
>> creates the initial memory hotplug request handling stub.
>>
>> For PowerVM this patch also creates a new /proc file that the drmgr
>> command will use to write rtas hotplug events to.
> 
> Why is this different between phyp and KVM?
> 
>> For future PowerKVM handling the rtas check-exception code can pass
>> any rtas hotplug events received to handle_dlpar_errorlog().
> 
> Internally to the kernel you mean?
> 

Perhaps a better explanation of how things work today and where I see
them going is needed. I was trying to avoid a long explanation and I
don't think my shortened explanation worked. I'll include this in v2
of the patchset too.

The current hotplug (or dlpar) of devices (the process is generally the
same for memory, cpu, and pci) on PowerVM systems is initiated
from the HMC, which communicates the request to the partitions through
the RSCT framework. The RSCT framework then invokes the drmgr command.
The drmgr command performs the hotplug operation by doing some pieces,
such as most of the rtas calls and device tree parsing, in userspace
and make requests to the kernel to online/offline the device, update the
device tree and add/remove the device.

For PowerKVM the approach is to follow what is currently being done for
pci hotplug. A hotplug request is initiated from the host. QEMU then
sends an EPOW interrupt to the guest which causes the guest to make the
rtas,check-exception call. In QEMU, the rtas,check-exception call
returns a rtas hotplug event to the guest. I was using this same framework
to also enable memory (and next cpu) hotplug.

You are correct that the current pci hotplug path for PowerKVM involves
the kernel receiving the rtas event, passing it to rtas_errd in userspace,
and having rtas_errd invoke drmgr. The drmgr command then handles the request
as described above for PowerVM systems.

There is no need for this circuitous route, we should just handle the entire
hotplug of devices in the kernel. What I am hoping to do is to enable this
by moving the code to handle hotplug from drmgr into the kernel and 
provide a single path for handling hotplug for PowerVM and PowerKVM. To
make this work for PowerKVM we will update the kernel rtas code to
recognize rtas hotplug events returned from rtas,check-exception calls
and call handle_dlpar_errorlog(). The hotplug rtas event is never sent out
to userspace.

For PowerVM systems, I created the /proc/powerpc/dlpar file that a rtas
hotplug event can be written to and passed to handle_dlpar_errorlog().
There is no chance of updating how we receive hotplug requests on PowerVM
systems.

Hopefully that explains the design better.

>> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
>> index a2450b8..574ec73 100644
>> --- a/arch/powerpc/platforms/pseries/dlpar.c
>> +++ b/arch/powerpc/platforms/pseries/dlpar.c
>> @@ -16,7 +16,9 @@
>>  #include <linux/cpu.h>
>>  #include <linux/slab.h>
>>  #include <linux/of.h>
>> +#include <linux/proc_fs.h>
>>  #include "offline_states.h"
>> +#include "pseries.h"
>>  
>>  #include <asm/prom.h>
>>  #include <asm/machdep.h>
>> @@ -530,13 +532,72 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
>>  	return count;
>>  }
>>  
>> +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
> 
> That is really confusing, but I think it's just a diff artifact?

Yes, diff artifact.

> 
>> +static int handle_dlpar_errorlog(struct rtas_error_log *error_log)
>> +{
>> +	struct pseries_errorlog *pseries_log;
>> +	struct pseries_hp_errorlog *hp_elog;
>> +	int rc = -EINVAL;
>> +
>> +	pseries_log = get_pseries_errorlog(error_log,
>> +					   PSERIES_ELOG_SECT_ID_HOTPLUG);
>> +	if (!pseries_log)
>> +		return rc;
>> +
>> +	hp_elog = (struct pseries_hp_errorlog *)pseries_log->data;
>> +	if (!hp_elog)
>> +		return rc;
> 
> I don't see how that can happen?
> 
> struct pseries_errorlog {
> 	__be16 id;			/* 0x00 2-byte ASCII section ID	*/
> 	__be16 length;			/* 0x02 Section length in bytes	*/
> 	uint8_t version;		/* 0x04 Section version		*/
> 	uint8_t subtype;		/* 0x05 Section subtype		*/
> 	__be16 creator_component;	/* 0x06 Creator component ID	*/
> 	uint8_t data[];			/* 0x08 Start of section data	*/
> };
> 
> Should you be checking for length == 0 instead ?
>

You are correct.
 
> Also I think the code will probably end up cleaner if you do the endian
> conversions immediately when you read the hp_elog, rather than passing it
> around in BE and having to remember to convert at all the usages.
> 

Agreed. I'll clean that up.

>> +	switch (hp_elog->resource) {
>> +	case PSERIES_HP_ELOG_RESOURCE_MEM:
>> +		rc = dlpar_memory(hp_elog);
>> +		break;
> 
> Please add:
> 
>   default:
> 	pr_warn_ratelimited("Unknown resource ..")
> 
> Or something.

can do.

> 
> 
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +static ssize_t dlpar_write(struct file *file, const char __user *buf,
>> +			   size_t count, loff_t *offset)
>> +{
>> +	char *event_buf;
>> +	int rc;
>> +
>> +	event_buf = kmalloc(count + 1, GFP_KERNEL);
> 
> Why + 1 ? It's not null-terminated AFAICS.

I think that's just habit. You're correct, it's not needed.

> 
>> +	if (!event_buf)
>> +		return -ENOMEM;
>> +
>> +	rc = copy_from_user(event_buf, buf, count);
>> +	if (rc) {
>> +		kfree(event_buf);
>> +		return rc;
>> +	}
>> +
>> +	rc = handle_dlpar_errorlog((struct rtas_error_log *)event_buf);
> 
> If you start with a struct rtas_error_log * you shouldn't need any casts.

good point.

> 
>> +	kfree(event_buf);
>> +	return rc ? rc : count;
>> +}
>> +
>> +static const struct file_operations dlpar_fops = {
>> +	.write = dlpar_write,
>> +	.llseek = noop_llseek,
>> +};
>> +
>>  static int __init pseries_dlpar_init(void)
>>  {
>> +	struct proc_dir_entry *proc_ent;
>> +
>> +	proc_ent = proc_create("powerpc/dlpar", S_IWUSR, NULL, &dlpar_fops);
>> +	if (proc_ent)
>> +		proc_set_size(proc_ent, 0);
> 
>   else
> 	error message at least please
> 
> Why are we putting it in /proc, can't it go in /sys/kernel like the mobility
> stuff?

I have no personal preference, I'll move it to /sys/kernel and update the
creation handling.

> 
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index 24abc5c..0e60e15 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -20,6 +22,9 @@
>>  #include <asm/machdep.h>
>>  #include <asm/prom.h>
>>  #include <asm/sparsemem.h>
>> +#include <asm/rtas.h>
>> +
>> +DEFINE_MUTEX(dlpar_mem_mutex);
> 
> static ?

yes, that should have been static.

> 
>> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
>> index b94516b..28bd994 100644
>> --- a/arch/powerpc/platforms/pseries/pseries.h
>> +++ b/arch/powerpc/platforms/pseries/pseries.h
>> @@ -62,6 +63,15 @@ extern int dlpar_detach_node(struct device_node *);
>>  extern int dlpar_acquire_drc(u32);
>>  extern int dlpar_release_drc(u32);
>>  
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +extern int dlpar_memory(struct pseries_hp_errorlog *);
>> +#else
>> +static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>> +{
>> +	return -ENOTSUPP;
> 
> EOPNOTSUPP is a bit more standard.

ok.

Thanks for all the feedback.

-Nathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [4/5] pseries: Implement memory hotplug add in the kernel
  2014-09-17  7:07   ` [4/5] " Michael Ellerman
@ 2014-09-17 19:45     ` Nathan Fontenot
  2014-09-24 20:57     ` Nathan Fontenot
  1 sibling, 0 replies; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-17 19:45 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On 09/17/2014 02:07 AM, Michael Ellerman wrote:
> 
> On Mon, 2014-09-15 at 15:32 -0500, Nathan Fontenot wrote:
>> This patch adds the ability to do memory hotplug adding in the kernel.
>>
>> Currently the hotplug add/remove of memory is handled by the drmgr
>> command. The drmgr command performs the add/remove by performing
>> some work in user-space and making requests to the kernel to handle 
>> other pieces. By moving all of the work to the kernel we can do the
>> add and remove faster, and provide a common place to do memory hotplug
>> for both the PowerVM and PowerKVM environments.
>>
>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-memory.c |  170 +++++++++++++++++++++++
>>  1 file changed, 170 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index 0e60e15..b254773 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -17,6 +17,7 @@
>>  #include <linux/vmalloc.h>
>>  #include <linux/memory.h>
>>  #include <linux/memory_hotplug.h>
>> +#include <linux/slab.h>
>>  
>>  #include <asm/firmware.h>
>>  #include <asm/machdep.h>
>> @@ -24,6 +25,8 @@
>>  #include <asm/sparsemem.h>
>>  #include <asm/rtas.h>
>>  
>> +#include "pseries.h"
>> +
>>  DEFINE_MUTEX(dlpar_mem_mutex);
>>  
>>  unsigned long pseries_memory_block_size(void)
>> @@ -69,6 +72,53 @@ unsigned long pseries_memory_block_size(void)
>>  	return memblock_size;
>>  }
>>  
>> +static void dlpar_free_drconf_property(struct property *prop)
>> +{
>> +	kfree(prop->name);
>> +	kfree(prop->value);
>> +	kfree(prop);
>> +}
>> +
>> +static struct property *dlpar_clone_drconf_property(struct device_node *dn)
>> +{
>> +	struct property *prop, *new_prop;
>> +
>> +	prop = of_find_property(dn, "ibm,dynamic-memory", NULL);
>> +	if (!prop)
>> +		return NULL;
>> +
>> +	new_prop = kzalloc(sizeof(*new_prop), GFP_KERNEL);
>> +	if (!new_prop)
>> +		return NULL;
>> +
>> +	new_prop->name = kstrdup(prop->name, GFP_KERNEL);
>> +	new_prop->value = kmalloc(prop->length + 1, GFP_KERNEL);
>> +	if (!new_prop->name || !new_prop->value) {
>> +		dlpar_free_drconf_property(new_prop);
>> +		return NULL;
>> +	}
>> +
>> +	memcpy(new_prop->value, prop->value, prop->length);
>> +	new_prop->length = prop->length;
>> +	*(((char *)new_prop->value) + new_prop->length) = 0;
> 
> It's not a string, is it?

No, property->value is a void*. I'll drop that line of code.

> 
>> +	return new_prop;
>> +}
>> +
>> +static struct memory_block *lmb_to_memblock(struct of_drconf_cell *lmb)
>> +{
>> +	unsigned long section_nr;
>> +	struct mem_section *mem_sect;
>> +	struct memory_block *mem_block;
>> +	u64 phys_addr = be64_to_cpu(lmb->base_addr);
>> +
>> +	section_nr = pfn_to_section_nr(PFN_DOWN(phys_addr));
>> +	mem_sect = __nr_to_section(section_nr);
>> +
>> +	mem_block = find_memory_block(mem_sect);
>> +	return mem_block;
>> +}
>> +
>>  #ifdef CONFIG_MEMORY_HOTREMOVE
>>  static int pseries_remove_memory(u64 start, u64 size)
>>  {
>> @@ -155,13 +205,133 @@ static inline int pseries_remove_mem_node(struct device_node *np)
>>  }
>>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>>  
>> +static int dlpar_add_one_lmb(struct of_drconf_cell *lmb)
>> +{
>> +	struct memory_block *mem_block;
>> +	u64 phys_addr;
>> +	unsigned long pages_per_block;
>> +	unsigned long block_sz;
>> +	int nid, sections_per_block;
>> +	int rc;
>> +
>> +	phys_addr = be64_to_cpu(lmb->base_addr);
> 
> of_drconf_cell needs endian annotations.

Yes it does. I can include a patch to update the struct.

> 
>> +	block_sz = memory_block_size_bytes();
>> +	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>> +	pages_per_block = PAGES_PER_SECTION * sections_per_block;
>> +
>> +	if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
>> +		return -EINVAL;
>> +
>> +	nid = memory_add_physaddr_to_nid(phys_addr);
>> +	rc = add_memory(nid, phys_addr, block_sz);
>> +	if (rc)
>> +		return rc;
>> +
>> +	rc = memblock_add(phys_addr, block_sz);
>> +	if (rc) {
>> +		remove_memory(nid, phys_addr, block_sz);
>> +		return rc;
>> +	}
>> +
>> +	mem_block = lmb_to_memblock(lmb);
>> +	if (!mem_block) {
>> +		remove_memory(nid, phys_addr, block_sz);
>> +		return -EINVAL;
>> +	}
> 
> That could all use a lot of comments. ie. why do we have to add it twice?

We don't actually add it twice, though I can see how one could think
that based on the names of the routines called. I'll add comments to 
clarify this in v2 of the patch.

memory_add_physaddr_to_nid(), this doesn't add anything despite its naming.
The routine finds the node id for the specified physical address.

add_memory(), this actually adds the memory.

memblock_add(), this informs the memory block information tracking about the
newly added memory. Why this is not done as part of add_memory I don't know.

> 
>> +	rc = device_online(&mem_block->dev);
>> +	put_device(&mem_block->dev);
>> +	if (rc)
>> +		remove_memory(nid, phys_addr, block_sz);
>> +
>> +	return rc;
>> +}
>> +
>> +static int dlpar_memory_add(struct pseries_hp_errorlog *hp_elog)
>> +{
>> +	struct of_drconf_cell *lmb;
>> +	struct device_node *dn;
>> +	struct property *prop;
>> +	uint32_t entries, *p;
> 
> *p should be __be32.

Yes.

> 
>> +	int i, lmbs_to_add;
>> +	int lmbs_added = 0;
>> +	int rc = -EINVAL;
> 
> Don't pre-initialise your rc variables.

I did this here for a reason. When asking to add memory by drc_index it
is possible to fall out of the for() loop traversing the lmb entries
and not find the requested drc_index.

Adding a check for this situation after the loop would do the same thing
and probably make this situation more clear.
 
> 
>> +	if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT) {
>> +		lmbs_to_add = be32_to_cpu(hp_elog->_drc_u.drc_count);
>> +		pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
>> +	} else {
>> +		lmbs_to_add = 1;
>> +		pr_info("Attempting to hot-add LMB, drc index %x\n",
>> +			be32_to_cpu(hp_elog->_drc_u.drc_index));
>> +	}
>> +
>> +	dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
>> +	if (!dn)
>> +		return -EINVAL;
>> +
>> +	prop = dlpar_clone_drconf_property(dn);
>> +	if (!prop) {
>> +		of_node_put(dn);
>> +		return -EINVAL;
>> +	}
>> +
>> +	p = prop->value;
>> +	entries = be32_to_cpu(*p++);
>> +	lmb = (struct of_drconf_cell *)p;
> 
> 
> So if I'm reading this right the hp_elog either contains an index or a count of
> LMBs to add. But it doesn't contain anything about which address ranges to add
> or any of those details. That is all in the ibm,dynamic-memory property - but
> how did it get in there?

The ibm,dynamic-memory property of the device tree is passed to the kernel
by phyp/qemu. The property is an array that associates each LMB with a
starting physical address and associativity.

> 
>> +
>> +	for (i = 0; i < entries; i++, lmb++) {
>> +		u32 drc_index = be32_to_cpu(lmb->drc_index);
>> +
>> +		if (lmbs_to_add == lmbs_added)
>> +			break;
>> +
>> +		if (be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED)
>> +			continue;
>> +
>> +		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX
>> +		    && lmb->drc_index != hp_elog->_drc_u.drc_index)
>> +			continue;
>> +
>> +		rc = dlpar_acquire_drc(drc_index);
>> +		if (rc)
>> +			continue;
>> +
>> +		rc = dlpar_add_one_lmb(lmb);
>> +		if (rc) {
>> +			dlpar_release_drc(drc_index);
>> +			continue;
>> +		}
> 
> In both the above error cases you just move along. That means we potentially
> hotplugged some memory but not everything that we were asked to. That seems
> like a bad idea, we should either do everything or nothing.
> 
> 

That can be done, though will require some additional tracking.

The current hotplug handling for PowerVM make a best effort and tries to
add/remove as much of the requested memory as possible. I was going with
that same approach here, but have no problem moving to an all or nothing
approach.

We will need to keep track of the LMBs added/removed during a request so
we can return to the original state if the request cannot be satisfied.
 
>> +
>> +		lmb->flags |= cpu_to_be32(DRCONF_MEM_ASSIGNED);
>> +		lmbs_added++;
>> +		pr_info("Memory at %llx (drc index %x) has been hot-added\n",
>> +			be64_to_cpu(lmb->base_addr), drc_index);
>> +	}
>> +
>> +	if (lmbs_added)
>> +		rc = of_update_property(dn, prop);
>> +	else
>> +		dlpar_free_drconf_property(prop);
> 
> The value of rc here is not clear. It could be EINVAL or it could be the result
> of the last dlpar_add_one_lmb(lmb). gcc would have told you that if you hadn't
> initialised it.
> 
>> +
>> +	of_node_put(dn);
>> +	return rc ? rc : lmbs_added;
> 
> This looks wrong.
> 
> Doesn't the rc eventually go back to dlpar_write(), which expects 0 for success?
> 
> That should show up as the write failing in userspace.
> 
> 

Based on previous comments I think the handling of rc will be updated so
we either return success or failure.

>>  int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>>  {
>>  	int rc = 0;
> 
> Don't initialise to zero, that way gcc can tell you if there's a path where you
> forget to initialise it. It also means you can't accidentally return success.
> 
>> +	if (hp_elog->id_type != PSERIES_HP_ELOG_ID_DRC_COUNT
>> +	    && hp_elog->id_type != PSERIES_HP_ELOG_ID_DRC_INDEX)
>> +		return -EINVAL;
> 
> This would look nicer as a switch I think.

can do.

-Nathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [5/5] pseries: Implement memory hotplug remove in the kernel
  2014-09-17  7:07   ` [5/5] " Michael Ellerman
@ 2014-09-17 19:58     ` Nathan Fontenot
  0 siblings, 0 replies; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-17 19:58 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev list

On 09/17/2014 02:07 AM, Michael Ellerman wrote:
> 
> On Mon, 2014-09-15 at 15:33 -0500, Nathan Fontenot wrote:
>> This patch adds the ability to do memory hotplug remove in the kernel.
>>
>> Currently the hotplug add/remove of memory is handled by the drmgr
>> command. The drmgr command performs the add/remove by performing
>> some work in user-space and making requests to the kernel to handle
>> other pieces. By moving all of the work to the kernel we can do the
>> add and remove faster, and provide a common place to do memory hotplug
>> for both the PowerVM and PowerKVM environments.
>>
>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-memory.c |  140 +++++++++++++++++++++++
>>  1 file changed, 139 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index b254773..160c424 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -193,7 +193,137 @@ static int pseries_remove_mem_node(struct device_node *np)
>>  	pseries_remove_memblock(base, lmb_size);
>>  	return 0;
>>  }
>> +
>> +static int lmb_is_removable(struct of_drconf_cell *lmb)
>> +{
> 
> Do we not already have something like this?

No. Perhaps your thinking of the code in drivers/base/memory.c that
handles the sysfs removable file. That code just calls the same
is_mem_section_removable() routine.

> 
>> +	int i, scns_per_block;
>> +	int rc = 1;
> 
> I can see this makes the &= work below.
> 
> But what if block_sz / MIN_MEMORY_BLOCK_SIZE = 0 ?

If that happens, something else is really wrong. Most
likely a malformed device tree.

For pseries MIN_MEMORY_BLOCK_SIZE is defined to be the smallest
LMB size we suppport, 16MB.

I can add a pr_warn() statement here and bail if that happens.

> 
>> +	unsigned long pfn, block_sz;
>> +	u64 phys_addr;
>> +
>> +	phys_addr = be64_to_cpu(lmb->base_addr);
>> +	block_sz = memory_block_size_bytes();
>> +	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>> +
>> +	for (i = 0; i < scns_per_block; i++) {
>> +		pfn = PFN_DOWN(phys_addr);
>> +		if (!pfn_present(pfn))
>> +			continue;
>> +
>> +		rc &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
>> +		phys_addr += MIN_MEMORY_BLOCK_SIZE;
>> +	}
>> +
>> +	return rc;
>> +}
> 
>> +static int dlpar_memory_remove(struct pseries_hp_errorlog *hp_elog)
>> +{
> 
> ...
> 
>> +}
> 
> Most of the same comments as for add.
> 

ok, I'll go through them and apply them to the remove code.

Thanks for the review.

-Nathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [3/5] pseries: Create device hotplug entry point
  2014-09-17 19:15     ` Nathan Fontenot
@ 2014-09-23  1:15       ` Tyrel Datwyler
  2014-09-23 14:43         ` Nathan Fontenot
  0 siblings, 1 reply; 19+ messages in thread
From: Tyrel Datwyler @ 2014-09-23  1:15 UTC (permalink / raw)
  To: Nathan Fontenot, Michael Ellerman; +Cc: linuxppc-dev

On 09/17/2014 12:15 PM, Nathan Fontenot wrote:
> On 09/17/2014 02:07 AM, Michael Ellerman wrote:
>>
>> On Mon, 2014-09-15 at 15:31 -0500, Nathan Fontenot wrote:
>>> For pseries system the kernel will be notified of hotplug requests in
>>> the form of rtas hotplug events. 
>>
>> Can you flesh that design out a bit for me, I don't entirely get how it's going
>> to work.
>>
>> The kernel gets the rtas hotplug events (in rtasd.c) and spits them out to
>> userspace, which then writes them back in ?
>>
>>> This patch creates a common routine that can handle these requests in both
>>> the PowerVM anbd PowerKVM environments, handle_dlpar_errorlog(). This also
>>                 ^
>>> creates the initial memory hotplug request handling stub.
>>>
>>> For PowerVM this patch also creates a new /proc file that the drmgr
>>> command will use to write rtas hotplug events to.
>>
>> Why is this different between phyp and KVM?
>>
>>> For future PowerKVM handling the rtas check-exception code can pass
>>> any rtas hotplug events received to handle_dlpar_errorlog().
>>
>> Internally to the kernel you mean?
>>
> 
> Perhaps a better explanation of how things work today and where I see
> them going is needed. I was trying to avoid a long explanation and I
> don't think my shortened explanation worked. I'll include this in v2
> of the patchset too.
> 
> The current hotplug (or dlpar) of devices (the process is generally the
> same for memory, cpu, and pci) on PowerVM systems is initiated
> from the HMC, which communicates the request to the partitions through
> the RSCT framework. The RSCT framework then invokes the drmgr command.
> The drmgr command performs the hotplug operation by doing some pieces,
> such as most of the rtas calls and device tree parsing, in userspace
> and make requests to the kernel to online/offline the device, update the
> device tree and add/remove the device.
> 
> For PowerKVM the approach is to follow what is currently being done for
> pci hotplug. A hotplug request is initiated from the host. QEMU then
> sends an EPOW interrupt to the guest which causes the guest to make the
> rtas,check-exception call. In QEMU, the rtas,check-exception call
> returns a rtas hotplug event to the guest. I was using this same framework
> to also enable memory (and next cpu) hotplug.
> 
> You are correct that the current pci hotplug path for PowerKVM involves
> the kernel receiving the rtas event, passing it to rtas_errd in userspace,
> and having rtas_errd invoke drmgr. The drmgr command then handles the request
> as described above for PowerVM systems.
> 
> There is no need for this circuitous route, we should just handle the entire
> hotplug of devices in the kernel. What I am hoping to do is to enable this
> by moving the code to handle hotplug from drmgr into the kernel and 
> provide a single path for handling hotplug for PowerVM and PowerKVM. To
> make this work for PowerKVM we will update the kernel rtas code to
> recognize rtas hotplug events returned from rtas,check-exception calls
> and call handle_dlpar_errorlog(). The hotplug rtas event is never sent out
> to userspace.

Wouldn't we still want the event surfaced to userspace so that it can at
least be logged?

-Tyrel

> 
> For PowerVM systems, I created the /proc/powerpc/dlpar file that a rtas
> hotplug event can be written to and passed to handle_dlpar_errorlog().
> There is no chance of updating how we receive hotplug requests on PowerVM
> systems.
> 
> Hopefully that explains the design better.
> 
>>> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
>>> index a2450b8..574ec73 100644
>>> --- a/arch/powerpc/platforms/pseries/dlpar.c
>>> +++ b/arch/powerpc/platforms/pseries/dlpar.c
>>> @@ -16,7 +16,9 @@
>>>  #include <linux/cpu.h>
>>>  #include <linux/slab.h>
>>>  #include <linux/of.h>
>>> +#include <linux/proc_fs.h>
>>>  #include "offline_states.h"
>>> +#include "pseries.h"
>>>  
>>>  #include <asm/prom.h>
>>>  #include <asm/machdep.h>
>>> @@ -530,13 +532,72 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
>>>  	return count;
>>>  }
>>>  
>>> +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>>
>> That is really confusing, but I think it's just a diff artifact?
> 
> Yes, diff artifact.
> 
>>
>>> +static int handle_dlpar_errorlog(struct rtas_error_log *error_log)
>>> +{
>>> +	struct pseries_errorlog *pseries_log;
>>> +	struct pseries_hp_errorlog *hp_elog;
>>> +	int rc = -EINVAL;
>>> +
>>> +	pseries_log = get_pseries_errorlog(error_log,
>>> +					   PSERIES_ELOG_SECT_ID_HOTPLUG);
>>> +	if (!pseries_log)
>>> +		return rc;
>>> +
>>> +	hp_elog = (struct pseries_hp_errorlog *)pseries_log->data;
>>> +	if (!hp_elog)
>>> +		return rc;
>>
>> I don't see how that can happen?
>>
>> struct pseries_errorlog {
>> 	__be16 id;			/* 0x00 2-byte ASCII section ID	*/
>> 	__be16 length;			/* 0x02 Section length in bytes	*/
>> 	uint8_t version;		/* 0x04 Section version		*/
>> 	uint8_t subtype;		/* 0x05 Section subtype		*/
>> 	__be16 creator_component;	/* 0x06 Creator component ID	*/
>> 	uint8_t data[];			/* 0x08 Start of section data	*/
>> };
>>
>> Should you be checking for length == 0 instead ?
>>
> 
> You are correct.
>  
>> Also I think the code will probably end up cleaner if you do the endian
>> conversions immediately when you read the hp_elog, rather than passing it
>> around in BE and having to remember to convert at all the usages.
>>
> 
> Agreed. I'll clean that up.
> 
>>> +	switch (hp_elog->resource) {
>>> +	case PSERIES_HP_ELOG_RESOURCE_MEM:
>>> +		rc = dlpar_memory(hp_elog);
>>> +		break;
>>
>> Please add:
>>
>>   default:
>> 	pr_warn_ratelimited("Unknown resource ..")
>>
>> Or something.
> 
> can do.
> 
>>
>>
>>> +	}
>>> +
>>> +	return rc;
>>> +}
>>> +
>>> +static ssize_t dlpar_write(struct file *file, const char __user *buf,
>>> +			   size_t count, loff_t *offset)
>>> +{
>>> +	char *event_buf;
>>> +	int rc;
>>> +
>>> +	event_buf = kmalloc(count + 1, GFP_KERNEL);
>>
>> Why + 1 ? It's not null-terminated AFAICS.
> 
> I think that's just habit. You're correct, it's not needed.
> 
>>
>>> +	if (!event_buf)
>>> +		return -ENOMEM;
>>> +
>>> +	rc = copy_from_user(event_buf, buf, count);
>>> +	if (rc) {
>>> +		kfree(event_buf);
>>> +		return rc;
>>> +	}
>>> +
>>> +	rc = handle_dlpar_errorlog((struct rtas_error_log *)event_buf);
>>
>> If you start with a struct rtas_error_log * you shouldn't need any casts.
> 
> good point.
> 
>>
>>> +	kfree(event_buf);
>>> +	return rc ? rc : count;
>>> +}
>>> +
>>> +static const struct file_operations dlpar_fops = {
>>> +	.write = dlpar_write,
>>> +	.llseek = noop_llseek,
>>> +};
>>> +
>>>  static int __init pseries_dlpar_init(void)
>>>  {
>>> +	struct proc_dir_entry *proc_ent;
>>> +
>>> +	proc_ent = proc_create("powerpc/dlpar", S_IWUSR, NULL, &dlpar_fops);
>>> +	if (proc_ent)
>>> +		proc_set_size(proc_ent, 0);
>>
>>   else
>> 	error message at least please
>>
>> Why are we putting it in /proc, can't it go in /sys/kernel like the mobility
>> stuff?
> 
> I have no personal preference, I'll move it to /sys/kernel and update the
> creation handling.
> 
>>
>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> index 24abc5c..0e60e15 100644
>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> @@ -20,6 +22,9 @@
>>>  #include <asm/machdep.h>
>>>  #include <asm/prom.h>
>>>  #include <asm/sparsemem.h>
>>> +#include <asm/rtas.h>
>>> +
>>> +DEFINE_MUTEX(dlpar_mem_mutex);
>>
>> static ?
> 
> yes, that should have been static.
> 
>>
>>> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
>>> index b94516b..28bd994 100644
>>> --- a/arch/powerpc/platforms/pseries/pseries.h
>>> +++ b/arch/powerpc/platforms/pseries/pseries.h
>>> @@ -62,6 +63,15 @@ extern int dlpar_detach_node(struct device_node *);
>>>  extern int dlpar_acquire_drc(u32);
>>>  extern int dlpar_release_drc(u32);
>>>  
>>> +#ifdef CONFIG_MEMORY_HOTPLUG
>>> +extern int dlpar_memory(struct pseries_hp_errorlog *);
>>> +#else
>>> +static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>>> +{
>>> +	return -ENOTSUPP;
>>
>> EOPNOTSUPP is a bit more standard.
> 
> ok.
> 
> Thanks for all the feedback.
> 
> -Nathan
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [3/5] pseries: Create device hotplug entry point
  2014-09-23  1:15       ` Tyrel Datwyler
@ 2014-09-23 14:43         ` Nathan Fontenot
  0 siblings, 0 replies; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-23 14:43 UTC (permalink / raw)
  To: Tyrel Datwyler, Michael Ellerman; +Cc: linuxppc-dev

On 09/22/2014 08:15 PM, Tyrel Datwyler wrote:
> On 09/17/2014 12:15 PM, Nathan Fontenot wrote:
>> On 09/17/2014 02:07 AM, Michael Ellerman wrote:
>>>
>>> On Mon, 2014-09-15 at 15:31 -0500, Nathan Fontenot wrote:
>>>> For pseries system the kernel will be notified of hotplug requests in
>>>> the form of rtas hotplug events. 
>>>
>>> Can you flesh that design out a bit for me, I don't entirely get how it's going
>>> to work.
>>>
>>> The kernel gets the rtas hotplug events (in rtasd.c) and spits them out to
>>> userspace, which then writes them back in ?
>>>
>>>> This patch creates a common routine that can handle these requests in both
>>>> the PowerVM anbd PowerKVM environments, handle_dlpar_errorlog(). This also
>>>                 ^
>>>> creates the initial memory hotplug request handling stub.
>>>>
>>>> For PowerVM this patch also creates a new /proc file that the drmgr
>>>> command will use to write rtas hotplug events to.
>>>
>>> Why is this different between phyp and KVM?
>>>
>>>> For future PowerKVM handling the rtas check-exception code can pass
>>>> any rtas hotplug events received to handle_dlpar_errorlog().
>>>
>>> Internally to the kernel you mean?
>>>
>>
>> Perhaps a better explanation of how things work today and where I see
>> them going is needed. I was trying to avoid a long explanation and I
>> don't think my shortened explanation worked. I'll include this in v2
>> of the patchset too.
>>
>> The current hotplug (or dlpar) of devices (the process is generally the
>> same for memory, cpu, and pci) on PowerVM systems is initiated
>> from the HMC, which communicates the request to the partitions through
>> the RSCT framework. The RSCT framework then invokes the drmgr command.
>> The drmgr command performs the hotplug operation by doing some pieces,
>> such as most of the rtas calls and device tree parsing, in userspace
>> and make requests to the kernel to online/offline the device, update the
>> device tree and add/remove the device.
>>
>> For PowerKVM the approach is to follow what is currently being done for
>> pci hotplug. A hotplug request is initiated from the host. QEMU then
>> sends an EPOW interrupt to the guest which causes the guest to make the
>> rtas,check-exception call. In QEMU, the rtas,check-exception call
>> returns a rtas hotplug event to the guest. I was using this same framework
>> to also enable memory (and next cpu) hotplug.
>>
>> You are correct that the current pci hotplug path for PowerKVM involves
>> the kernel receiving the rtas event, passing it to rtas_errd in userspace,
>> and having rtas_errd invoke drmgr. The drmgr command then handles the request
>> as described above for PowerVM systems.
>>
>> There is no need for this circuitous route, we should just handle the entire
>> hotplug of devices in the kernel. What I am hoping to do is to enable this
>> by moving the code to handle hotplug from drmgr into the kernel and 
>> provide a single path for handling hotplug for PowerVM and PowerKVM. To
>> make this work for PowerKVM we will update the kernel rtas code to
>> recognize rtas hotplug events returned from rtas,check-exception calls
>> and call handle_dlpar_errorlog(). The hotplug rtas event is never sent out
>> to userspace.
> 
> Wouldn't we still want the event surfaced to userspace so that it can at
> least be logged?
> 

The only logging of hotplug/dlpar events we do is putting a notification
iv /var/log/messages. This is done today by the drmgr command.

I can add a pr_info message to log the hotplug/dlpar request and it's
success/failure.

Also, I believe one of the longer term goals is to not require the rtas_errd
daemon for PowerKVM.

-Nathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [4/5] pseries: Implement memory hotplug add in the kernel
  2014-09-17  7:07   ` [4/5] " Michael Ellerman
  2014-09-17 19:45     ` Nathan Fontenot
@ 2014-09-24 20:57     ` Nathan Fontenot
  1 sibling, 0 replies; 19+ messages in thread
From: Nathan Fontenot @ 2014-09-24 20:57 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On 09/17/2014 02:07 AM, Michael Ellerman wrote:
> 
> On Mon, 2014-09-15 at 15:32 -0500, Nathan Fontenot wrote:
>> This patch adds the ability to do memory hotplug adding in the kernel.
>>
>> Currently the hotplug add/remove of memory is handled by the drmgr
>> command. The drmgr command performs the add/remove by performing
>> some work in user-space and making requests to the kernel to handle 
>> other pieces. By moving all of the work to the kernel we can do the
>> add and remove faster, and provide a common place to do memory hotplug
>> for both the PowerVM and PowerKVM environments.
>>
>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> ---

>> +	for (i = 0; i < entries; i++, lmb++) {
>> +		u32 drc_index = be32_to_cpu(lmb->drc_index);
>> +
>> +		if (lmbs_to_add == lmbs_added)
>> +			break;
>> +
>> +		if (be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED)
>> +			continue;
>> +
>> +		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX
>> +		    && lmb->drc_index != hp_elog->_drc_u.drc_index)
>> +			continue;
>> +
>> +		rc = dlpar_acquire_drc(drc_index);
>> +		if (rc)
>> +			continue;
>> +
>> +		rc = dlpar_add_one_lmb(lmb);
>> +		if (rc) {
>> +			dlpar_release_drc(drc_index);
>> +			continue;
>> +		}
> 
> In both the above error cases you just move along. That means we potentially
> hotplugged some memory but not everything that we were asked to. That seems
> like a bad idea, we should either do everything or nothing.
> 

Michael, how set are you on the all or nothing approach?

Note that I think the all or nothing approach is best but I think it will
present some problems. We do memory add (and remove) on a LMB basis, so it
is possible to hit a scenario in which we cannot revert back to the original
state. For example, a request to add 5 LMBs only succeeds in adding 4 LMBs.
There is no guarantee that we then remove the 4 MLBs that were added. That
memory could be in use somewhere that it cannot be moved.

I would suggest we continue with the current approach in that we try to
satisfy the request but not try to roll-back the changes if the entire
request cannot be satisfied.

-Nathan   

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-09-24 20:57 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-15 20:26 [PATCH 0/5] pseries: Move memory hotplug to the kernel Nathan Fontenot
2014-09-15 20:29 ` [PATCH 1/5] pseries: Define rtas hotplug event sections Nathan Fontenot
2014-09-17  7:06   ` [1/5] " Michael Ellerman
2014-09-17 16:49     ` Nathan Fontenot
2014-09-15 20:30 ` [PATCH 2/5] pseries: Export drc_[acquire|release]_drc() routines Nathan Fontenot
2014-09-17  7:07   ` [2/5] " Michael Ellerman
2014-09-17 16:50     ` Nathan Fontenot
2014-09-15 20:31 ` [PATCH 3/5] pseries: Create device hotplug entry point Nathan Fontenot
2014-09-17  7:07   ` [3/5] " Michael Ellerman
2014-09-17 19:15     ` Nathan Fontenot
2014-09-23  1:15       ` Tyrel Datwyler
2014-09-23 14:43         ` Nathan Fontenot
2014-09-15 20:32 ` [PATCH 4/5] pseries: Implement memory hotplug add in the kernel Nathan Fontenot
2014-09-17  7:07   ` [4/5] " Michael Ellerman
2014-09-17 19:45     ` Nathan Fontenot
2014-09-24 20:57     ` Nathan Fontenot
2014-09-15 20:33 ` [PATCH 5/5] pseries: Implement memory hotplug remove " Nathan Fontenot
2014-09-17  7:07   ` [5/5] " Michael Ellerman
2014-09-17 19:58     ` Nathan Fontenot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).