All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] pseries: Move memory hotplug to the kernel
@ 2014-11-17 21:44 Nathan Fontenot
  2014-11-17 21:48 ` [PATCH v2 1/6] pseries: Define rtas hotplug event sections Nathan Fontenot
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-17 21:44 UTC (permalink / raw)
  To: linuxppc-dev

In order to better support device hotplug (cpu, memory, and pci) in the
PowerVM and PowerKVM environments, the handling of device hotplug
could be updated so that the act of hotplugging a device occurs entirely
in the kernel. This patch set begins to address this by moving
memory hotplug to the kernel. Patches to follow will do the same
for cpu and pci devices.

To provide background, the current handling of memory hotplug is
handled by the drmgr command. This command is invoked when memory
add/remove requests are made at the HMC and conveyed to a partition
through the RSCT framework. The drmgr command then performs parts
of the hotplug in user-space and makes requests to the kernel to perform
other pieces. This is not really ideal, we can do everything in the
kernel and do it faster.

In this patchset, hotplug events will now be communicated to the kernel
in the form of rtas hotplug events. For PowerKVM systems this is done
by qemu using the ras epow interrupt. For PowerVM systems the drmgr
command will be updated to create a rtas hotplug event and send it to
the kernel via a new /sys/kernel/dlpar interface. Both of these
entry points for hotplug rtas events then call a common routine
for handling rtas hotplug events.

-Nathan

Patch 1/6
- Add definition of hotplug rtas event sections.

Patch 2/6
- Update struct of_drconf_cell to use __be64/__be32
 
Patch 3/6
- Export the dlpar_[acquire|release]drc() routines.

Patch 4/6
- Create the new /sys/kernel/dlpar interface

Patch 5/6
- Implement memory hotplug add in the kernel.

Patch 6/6
- Implement memory hotplug remove in the kernel.

 include/asm/prom.h                 |   10 
 include/asm/rtas.h                 |   26 ++
 platforms/pseries/dlpar.c          |   72 +++++
 platforms/pseries/hotplug-memory.c |  469 ++++++++++++++++++++++++++++++++++++-
 platforms/pseries/pseries.h        |   12 
 5 files changed, 576 insertions(+), 13 deletions(-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/6] pseries: Define rtas hotplug event sections
  2014-11-17 21:44 [PATCH v2 0/6] pseries: Move memory hotplug to the kernel Nathan Fontenot
@ 2014-11-17 21:48 ` Nathan Fontenot
  2014-11-17 21:50 ` [PATCH v2 2/6] pseries: Update of_drconf_cell struct for endian-ness Nathan Fontenot
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-17 21:48 UTC (permalink / raw)
  To: linuxppc-dev

In order to handle device hotplug in the kernel on pseries hotplug
notifications will be communicated to the kernel in the form of a
rtas hotplug events. This patch adds the definition of rtas hotplug event
sections.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |   26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index b390f55..77e112e 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -273,6 +273,7 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
 #define PSERIES_ELOG_SECT_ID_MANUFACT_INFO	(('M' << 8) | 'I')
 #define PSERIES_ELOG_SECT_ID_CALL_HOME		(('C' << 8) | 'H')
 #define PSERIES_ELOG_SECT_ID_USER_DEF		(('U' << 8) | 'D')
+#define PSERIES_ELOG_SECT_ID_HOTPLUG		(('H' << 8) | 'P')
 
 /* Vendor specific Platform Event Log Format, Version 6, section header */
 struct pseries_errorlog {
@@ -296,6 +297,31 @@ inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
 	return be16_to_cpu(sect->length);
 }
 
+/* RTAS pseries hotplug errorlog section */
+struct pseries_hp_errorlog {
+	u8	resource;
+	u8	action;
+	u8	id_type;
+	u8	reserved;
+	union {
+		__be32	drc_index;
+		__be32	drc_count;
+		char	drc_name[1];
+	} _drc_u;
+};
+
+#define PSERIES_HP_ELOG_RESOURCE_CPU	1
+#define PSERIES_HP_ELOG_RESOURCE_MEM	2
+#define PSERIES_HP_ELOG_RESOURCE_SLOT	3
+#define PSERIES_HP_ELOG_RESOURCE_PHB	4
+
+#define PSERIES_HP_ELOG_ACTION_ADD	1
+#define PSERIES_HP_ELOG_ACTION_REMOVE	2
+
+#define PSERIES_HP_ELOG_ID_DRC_NAME	1
+#define PSERIES_HP_ELOG_ID_DRC_INDEX	2
+#define PSERIES_HP_ELOG_ID_DRC_COUNT	3
+
 struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
 					      uint16_t section_id);
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/6] pseries: Update of_drconf_cell struct for endian-ness
  2014-11-17 21:44 [PATCH v2 0/6] pseries: Move memory hotplug to the kernel Nathan Fontenot
  2014-11-17 21:48 ` [PATCH v2 1/6] pseries: Define rtas hotplug event sections Nathan Fontenot
@ 2014-11-17 21:50 ` Nathan Fontenot
  2014-11-17 21:51 ` [PATCH v2 3/6] pseries: Create new device hotplug entry point Nathan Fontenot
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-17 21:50 UTC (permalink / raw)
  To: linuxppc-dev

The of_drconf_cell defines each LMB for a system in the device tree property
ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory. The values are
in BE format by definition, this patch updates the of_drconf_cell struct
to reflect the proper endian-ness.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/prom.h |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 7f436ba..6f1cbe3 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -85,11 +85,11 @@ extern int of_get_ibm_chip_id(struct device_node *np);
  * ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
  */
 struct of_drconf_cell {
-	u64	base_addr;
-	u32	drc_index;
-	u32	reserved;
-	u32	aa_index;
-	u32	flags;
+	__be64	base_addr;
+	__be32	drc_index;
+	__be32	reserved;
+	__be32	aa_index;
+	__be32	flags;
 };
 
 #define DRCONF_MEM_ASSIGNED	0x00000008

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 3/6] pseries: Create new device hotplug entry point
  2014-11-17 21:44 [PATCH v2 0/6] pseries: Move memory hotplug to the kernel Nathan Fontenot
  2014-11-17 21:48 ` [PATCH v2 1/6] pseries: Define rtas hotplug event sections Nathan Fontenot
  2014-11-17 21:50 ` [PATCH v2 2/6] pseries: Update of_drconf_cell struct for endian-ness Nathan Fontenot
@ 2014-11-17 21:51 ` Nathan Fontenot
  2014-11-17 22:53   ` Gavin Shan
                     ` (2 more replies)
  2014-11-17 21:53 ` [PATCH v2 4/6] pseries: Export the acquire/release drc index routines Nathan Fontenot
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-17 21:51 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman

Create a new entry point for device hotplug on pseries that will
work for both PowerVM and PowerKVM systems.

The current process to hotplug (or dlpar) devices (generally the same
process for memory, cpu, and pci devices) on PowerVM systems is initiated
from the HMC, which communicates the request to the partitions through
the RSCT framework. The RSCT framework then invokes the drmgr command.
The drmgr command performs the hotplug operation by doing some pieces,
such as most of the rtas calls and device tree parsing, in userspace
and make requests to the kernel to online/offline the device, update the
device tree and add/remove the device.

For PowerKVM the approach for device hotplug is to follow what is currently
being done for pci hotplug. A hotplug request is initiated from the host,
QEMU then generates an EPOW interrupt to the guest which causes the guest
to make the rtas,check-exception call. In QEMU, the rtas,check-exception call
returns a rtas hotplug event to the guest.

Please note that the current pci hotplug path for PowerKVM involves the
kernel receiving the rtas hotplug event, passing it to rtas_errd in
userspace, and having rtas_errd invoke drmgr. The drmgr command then
handles the request as described above for PowerVM systems. This is to
be updated to perform pci completely in the kernel in a later patch set.

There is no need for this circuitous route, we should handle the entire
hotplug of devices in the kernel. What I am planning is to enable this
by moving the code to handle device hotplug from drmgr into the kernel to
provide a single path for both PowerVM and PowerKVM systems. This patch
provides the common entry point. For PowerKVM a future update to the kernel
rtas code will recognize rtas hotplug events returned from
rtas,check-exception calls and use the common entry point to handle device
hotplug entirely in the kernel.

For PowerVM systems, this patch creates the /sys/kernel/dlpar file that rtas
hotplug events can be written to by drmgr and passed to the common entry point.
There is no chance of updating how we receive hotplug requests on PowerVM
systems.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/dlpar.c          |   72 ++++++++++++++++++++++-
 arch/powerpc/platforms/pseries/hotplug-memory.c |   19 ++++++
 arch/powerpc/platforms/pseries/pseries.h        |   10 +++
 3 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index c22bb1b..ec825d3 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -10,6 +10,8 @@
  * 2 as published by the Free Software Foundation.
  */
 
+#define pr_fmt(fmt)	"dlpar: " fmt
+
 #include <linux/kernel.h>
 #include <linux/notifier.h>
 #include <linux/spinlock.h>
@@ -535,13 +537,79 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
 	return count;
 }
 
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
+
+static int handle_dlpar_errorlog(struct rtas_error_log *error_log)
+{
+	struct pseries_errorlog *pseries_log;
+	struct pseries_hp_errorlog *hp_elog;
+	int rc;
+
+	pseries_log = get_pseries_errorlog(error_log,
+					   PSERIES_ELOG_SECT_ID_HOTPLUG);
+	if (!pseries_log || (pseries_log->length == 0))
+		return -EINVAL;
+
+	hp_elog = (struct pseries_hp_errorlog *)pseries_log->data;
+
+	/* Go ahead and convert the hotplug type to the correct endianness
+	 * to avoid converting it everywhere we use it.
+	 */
+	switch (hp_elog->id_type) {
+	case PSERIES_HP_ELOG_ID_DRC_COUNT:
+		hp_elog->_drc_u.drc_count =
+					be32_to_cpu(hp_elog->_drc_u.drc_count);
+	case PSERIES_HP_ELOG_ID_DRC_INDEX:
+		hp_elog->_drc_u.drc_index =
+					be32_to_cpu(hp_elog->_drc_u.drc_index);
+	}
+
+	switch (hp_elog->resource) {
+	case PSERIES_HP_ELOG_RESOURCE_MEM:
+		rc = dlpar_memory(hp_elog);
+		break;
+	default:
+		pr_warn_ratelimited("Invalid resource (%d) specified\n",
+				    hp_elog->resource);
+		rc = -EINVAL;
+		break;
+	}
+
+	return rc;
+}
+
+static ssize_t dlpar_store(struct file *filp, struct kobject *kobj,
+			   struct bin_attribute *bin_attr, char *buf,
+			   loff_t pos, size_t count)
+{
+	struct rtas_error_log *error_log;
+	int rc;
+
+	error_log = kmalloc(count, GFP_KERNEL);
+	if (!error_log)
+		return -ENOMEM;
+
+	memcpy(error_log, buf, count);
+
+	rc = handle_dlpar_errorlog(error_log);
+	kfree(error_log);
+	return rc ? rc : count;
+}
+
+static BIN_ATTR(dlpar, S_IWUSR, NULL, dlpar_store, 0);
+
 static int __init pseries_dlpar_init(void)
 {
+	int rc;
+
+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
 	ppc_md.cpu_probe = dlpar_cpu_probe;
 	ppc_md.cpu_release = dlpar_cpu_release;
+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
-	return 0;
+	rc = sysfs_create_bin_file(kernel_kobj, &bin_attr_dlpar);
+
+	return rc;
 }
 machine_device_initcall(pseries, pseries_dlpar_init);
 
-#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 3cb256c..69d178b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -9,6 +9,8 @@
  *      2 of the License, or (at your option) any later version.
  */
 
+#define pr_fmt(fmt)	"pseries-hotplug-mem: " fmt
+
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/memblock.h>
@@ -134,6 +136,23 @@ static inline int pseries_remove_mem_node(struct device_node *np)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
+{
+	int rc = 0;
+
+	lock_device_hotplug();
+
+	switch (hp_elog->action) {
+	default:
+		pr_err("Invalid action (%d) specified\n", hp_elog->action);
+		rc = -EINVAL;
+		break;
+	}
+
+	unlock_device_hotplug();
+	return rc;
+}
+
 static int pseries_add_mem_node(struct device_node *np)
 {
 	const char *type;
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 239bee5..40e0339 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -11,6 +11,7 @@
 #define _PSERIES_PSERIES_H
 
 #include <linux/interrupt.h>
+#include <asm/rtas.h>
 
 struct device_node;
 
@@ -63,6 +64,15 @@ extern int dlpar_detach_node(struct device_node *);
 int dlpar_acquire_drc(u32 drc_index);
 int dlpar_release_drc(u32 drc_index);
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+int dlpar_memory(struct pseries_hp_errorlog *hp_elog);
+#else
+static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 /* PCI root bridge prepare function override for pseries */
 struct pci_host_bridge;
 int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 4/6] pseries: Export the acquire/release drc index routines
  2014-11-17 21:44 [PATCH v2 0/6] pseries: Move memory hotplug to the kernel Nathan Fontenot
                   ` (2 preceding siblings ...)
  2014-11-17 21:51 ` [PATCH v2 3/6] pseries: Create new device hotplug entry point Nathan Fontenot
@ 2014-11-17 21:53 ` Nathan Fontenot
  2014-11-17 21:54 ` [PATCH v2 5/6] pseries: Implement memory hotplug add in the kernel Nathan Fontenot
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-17 21:53 UTC (permalink / raw)
  To: linuxppc-dev

Export the routines to acquire and release a drc index.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/pseries.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 1796c54..239bee5 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -60,6 +60,8 @@ extern struct device_node *dlpar_configure_connector(__be32,
 						struct device_node *);
 extern int dlpar_attach_node(struct device_node *);
 extern int dlpar_detach_node(struct device_node *);
+int dlpar_acquire_drc(u32 drc_index);
+int dlpar_release_drc(u32 drc_index);
 
 /* PCI root bridge prepare function override for pseries */
 struct pci_host_bridge;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 5/6] pseries: Implement memory hotplug add in the kernel
  2014-11-17 21:44 [PATCH v2 0/6] pseries: Move memory hotplug to the kernel Nathan Fontenot
                   ` (3 preceding siblings ...)
  2014-11-17 21:53 ` [PATCH v2 4/6] pseries: Export the acquire/release drc index routines Nathan Fontenot
@ 2014-11-17 21:54 ` Nathan Fontenot
  2014-11-21  7:49   ` Cyril Bur
  2014-11-17 21:56 ` [PATCH v2 6/6] pseries: Implement memory hotplug remove " Nathan Fontenot
  2014-11-18  2:00 ` [PATCH v2 0/6] pseries: Move memory hotplug to " Cyril Bur
  6 siblings, 1 reply; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-17 21:54 UTC (permalink / raw)
  To: linuxppc-dev

Move handling of memory hotplug add on pseries completely into the kernel.

The current memory hotplug add path involves the drmgr command doing part
of this work in userspace and requesting the kernel to do additional pieces.
This patch allows us to handle the act completely in the kernel via rtas
hotplug events. This allows us to perform the operation faster and provide
a common memory hotplug add path for PowerVM and PowerKVM systems.

The patch does introduce a static rtas_hp_event variable that is set to
true when updating the device tree during memory hotplug initiated from
a rtas hotplug event. This is needed because we do not need to do the
work in the of notifier, this work is already performed in handling the
hotplug request. At a later time we can remove this when we deprecate the
previous method of memory hotplug.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |  244 +++++++++++++++++++++++
 1 file changed, 243 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 69d178b..b57d42b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -16,6 +16,7 @@
 #include <linux/memblock.h>
 #include <linux/memory.h>
 #include <linux/memory_hotplug.h>
+#include <linux/slab.h>
 
 #include <asm/firmware.h>
 #include <asm/machdep.h>
@@ -23,6 +24,8 @@
 #include <asm/sparsemem.h>
 #include "pseries.h"
 
+static bool rtas_hp_event;
+
 unsigned long pseries_memory_block_size(void)
 {
 	struct device_node *np;
@@ -66,6 +69,52 @@ unsigned long pseries_memory_block_size(void)
 	return memblock_size;
 }
 
+static void dlpar_free_drconf_property(struct property *prop)
+{
+	kfree(prop->name);
+	kfree(prop->value);
+	kfree(prop);
+}
+
+static struct property *dlpar_clone_drconf_property(struct device_node *dn)
+{
+	struct property *prop, *new_prop;
+
+	prop = of_find_property(dn, "ibm,dynamic-memory", NULL);
+	if (!prop)
+		return NULL;
+
+	new_prop = kzalloc(sizeof(*new_prop), GFP_KERNEL);
+	if (!new_prop)
+		return NULL;
+
+	new_prop->name = kstrdup(prop->name, GFP_KERNEL);
+	new_prop->value = kmalloc(prop->length, GFP_KERNEL);
+	if (!new_prop->name || !new_prop->value) {
+		dlpar_free_drconf_property(new_prop);
+		return NULL;
+	}
+
+	memcpy(new_prop->value, prop->value, prop->length);
+	new_prop->length = prop->length;
+
+	return new_prop;
+}
+
+static struct memory_block *lmb_to_memblock(struct of_drconf_cell *lmb)
+{
+	unsigned long section_nr;
+	struct mem_section *mem_sect;
+	struct memory_block *mem_block;
+	u64 phys_addr = be64_to_cpu(lmb->base_addr);
+
+	section_nr = pfn_to_section_nr(PFN_DOWN(phys_addr));
+	mem_sect = __nr_to_section(section_nr);
+
+	mem_block = find_memory_block(mem_sect);
+	return mem_block;
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
@@ -136,19 +185,209 @@ static inline int pseries_remove_mem_node(struct device_node *np)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+static int dlpar_add_lmb(struct of_drconf_cell *lmb)
+{
+	struct memory_block *mem_block;
+	u64 phys_addr;
+	uint32_t drc_index;
+	unsigned long pages_per_block;
+	unsigned long block_sz;
+	int nid, sections_per_block;
+	int rc;
+
+	if (be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED)
+		return -EINVAL;
+
+	phys_addr = be64_to_cpu(lmb->base_addr);
+	drc_index = be32_to_cpu(lmb->drc_index);
+	block_sz = memory_block_size_bytes();
+	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+	pages_per_block = PAGES_PER_SECTION * sections_per_block;
+
+	if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
+		return -EINVAL;
+
+	rc = dlpar_acquire_drc(drc_index);
+	if (rc)
+		return rc;
+
+	/* Find the node id for this address */
+	nid = memory_add_physaddr_to_nid(phys_addr);
+
+	/* Add the memory */
+	rc = add_memory(nid, phys_addr, block_sz);
+	if (rc) {
+		dlpar_release_drc(drc_index);
+		return rc;
+	}
+
+	/* Register this block of memory */
+	rc = memblock_add(phys_addr, block_sz);
+	if (rc) {
+		remove_memory(nid, phys_addr, block_sz);
+		dlpar_release_drc(drc_index);
+		return rc;
+	}
+
+	mem_block = lmb_to_memblock(lmb);
+	if (!mem_block) {
+		remove_memory(nid, phys_addr, block_sz);
+		dlpar_release_drc(drc_index);
+		return -EINVAL;
+	}
+
+	rc = device_online(&mem_block->dev);
+	put_device(&mem_block->dev);
+	if (rc) {
+		remove_memory(nid, phys_addr, block_sz);
+		dlpar_release_drc(drc_index);
+		return rc;
+	}
+
+	lmb->flags |= cpu_to_be32(DRCONF_MEM_ASSIGNED);
+	return 0;
+}
+
+static int dlpar_memory_add_by_count(struct pseries_hp_errorlog *hp_elog,
+				     struct property *prop)
+{
+	struct of_drconf_cell *lmbs;
+	uint32_t num_lmbs;
+	__be32 *p;
+	int i, lmbs_to_add;
+	int lmbs_available = 0;
+	int lmbs_added = 0;
+	int rc;
+
+	lmbs_to_add = be32_to_cpu(hp_elog->_drc_u.drc_count);
+	pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
+
+	if (lmbs_to_add == 0)
+		return -EINVAL;
+
+	p = prop->value;
+	num_lmbs = be32_to_cpu(*p++);
+	lmbs = (struct of_drconf_cell *)p;
+
+	/* Validate that there are enough LMBs to satisfy the request */
+	for (i = 0; i < num_lmbs; i++) {
+		if (!(be32_to_cpu(lmbs[i].flags) & DRCONF_MEM_ASSIGNED))
+			lmbs_available++;
+	}
+
+	if (lmbs_available < lmbs_to_add)
+		return -EINVAL;
+
+	for (i = 0; i < num_lmbs; i++) {
+		if (lmbs_to_add == lmbs_added)
+			break;
+
+		rc = dlpar_add_lmb(&lmbs[i]);
+		if (rc)
+			continue;
+
+		lmbs_added++;
+		pr_info("Memory at %llx (drc index %x) has been hot-added\n",
+			be64_to_cpu(lmbs[i].base_addr),
+			be32_to_cpu(lmbs[i].drc_index));
+
+		/* Mark this lmb so we can remove it later if all of the
+		 * requested LMBs cannot be added.
+		 */
+		lmbs[i].reserved = 1;
+	}
+
+	if (lmbs_added != lmbs_to_add) {
+		/* TODO: remove added lmbs */
+		rc = -EINVAL;
+	}
+
+	/* Clear the reserved fields */
+	for (i = 0; i < num_lmbs; i++)
+		lmbs[i].reserved = 0;
+
+	return rc;
+}
+
+static int dlpar_memory_add_by_index(struct pseries_hp_errorlog *hp_elog,
+				     struct property *prop)
+{
+	struct of_drconf_cell *lmbs;
+	uint32_t num_lmbs, drc_index;
+	__be32 *p;
+	int i, lmb_found;
+	int rc;
+
+	drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index);
+	pr_info("Attempting to hot-add LMB, drc index %x\n", drc_index);
+
+	p = prop->value;
+	num_lmbs = be32_to_cpu(*p++);
+	lmbs = (struct of_drconf_cell *)p;
+
+	lmb_found = 0;
+	for (i = 0; i < num_lmbs; i++) {
+		if (lmbs[i].drc_index == hp_elog->_drc_u.drc_index) {
+			lmb_found = 1;
+			rc = dlpar_add_lmb(&lmbs[i]);
+			break;
+		}
+	}
+
+	if (!lmb_found)
+		rc = -EINVAL;
+
+	if (rc)
+		pr_info("Failed to hot-add memory, drc index %x\n", drc_index);
+	else
+		pr_info("Memory at %llx (drc index %x) has been hot-added\n",
+			be64_to_cpu(lmbs[i].base_addr), drc_index);
+
+	return rc;
+}
+
 int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
 {
-	int rc = 0;
+	struct device_node *dn;
+	struct property *prop;
+	int rc;
 
 	lock_device_hotplug();
 
+	dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+	if (!dn)
+		return -EINVAL;
+
+	prop = dlpar_clone_drconf_property(dn);
+	if (!prop) {
+		of_node_put(dn);
+		return -EINVAL;
+	}
+
 	switch (hp_elog->action) {
+	case PSERIES_HP_ELOG_ACTION_ADD:
+		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
+			rc = dlpar_memory_add_by_count(hp_elog, prop);
+		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
+			rc = dlpar_memory_add_by_index(hp_elog, prop);
+		else
+			rc = -EINVAL;
+		break;
 	default:
 		pr_err("Invalid action (%d) specified\n", hp_elog->action);
 		rc = -EINVAL;
 		break;
 	}
 
+	if (rc)
+		dlpar_free_drconf_property(prop);
+	else {
+		rtas_hp_event = true;
+		of_update_property(dn, prop);
+		rtas_hp_event = false;
+	}
+
+	of_node_put(dn);
 	unlock_device_hotplug();
 	return rc;
 }
@@ -193,6 +432,9 @@ static int pseries_update_drconf_memory(struct of_prop_reconfig *pr)
 	__be32 *p;
 	int i, rc = -EINVAL;
 
+	if (rtas_hp_event)
+		return 0;
+
 	memblock_size = pseries_memory_block_size();
 	if (!memblock_size)
 		return -EINVAL;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 6/6] pseries: Implement memory hotplug remove in the kernel
  2014-11-17 21:44 [PATCH v2 0/6] pseries: Move memory hotplug to the kernel Nathan Fontenot
                   ` (4 preceding siblings ...)
  2014-11-17 21:54 ` [PATCH v2 5/6] pseries: Implement memory hotplug add in the kernel Nathan Fontenot
@ 2014-11-17 21:56 ` Nathan Fontenot
  2014-11-21  7:49   ` Cyril Bur
  2014-11-18  2:00 ` [PATCH v2 0/6] pseries: Move memory hotplug to " Cyril Bur
  6 siblings, 1 reply; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-17 21:56 UTC (permalink / raw)
  To: linuxppc-dev

Move handling of memory hotplug remove on pseries completely into the kernel.

The current memory hotplug remove path involves the drmgr command doing part
of this work in userspace and requesting the kernel to do additional pieces.
This patch allows us to handle the act completely in the kernel via rtas
hotplug events. This allows us to perform the operation faster and provide
a common memory hotplug remove path for PowerVM and PowerKVM systems.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |  206 ++++++++++++++++++++++-
 1 file changed, 201 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index b57d42b..c8189e8 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -173,6 +173,179 @@ static int pseries_remove_mem_node(struct device_node *np)
 	pseries_remove_memblock(base, lmb_size);
 	return 0;
 }
+
+static int lmb_is_removable(struct of_drconf_cell *lmb)
+{
+	int i, scns_per_block;
+	int rc = 1;
+	unsigned long pfn, block_sz;
+	u64 phys_addr;
+
+	if (!(be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED))
+		return -1;
+
+	phys_addr = be64_to_cpu(lmb->base_addr);
+	block_sz = memory_block_size_bytes();
+	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+
+	for (i = 0; i < scns_per_block; i++) {
+		pfn = PFN_DOWN(phys_addr);
+		if (!pfn_present(pfn))
+			continue;
+
+		rc &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
+		phys_addr += MIN_MEMORY_BLOCK_SIZE;
+	}
+
+	return rc;
+}
+
+static int dlpar_add_lmb(struct of_drconf_cell *);
+
+static int dlpar_remove_lmb(struct of_drconf_cell *lmb)
+{
+	struct memory_block *mem_block;
+	unsigned long block_sz;
+	u64 phys_addr;
+	uint32_t drc_index;
+	int nid, rc;
+
+	if (!lmb_is_removable(lmb))
+		return -EINVAL;
+
+	phys_addr = be64_to_cpu(lmb->base_addr);
+	drc_index = be32_to_cpu(lmb->drc_index);
+
+	mem_block = lmb_to_memblock(lmb);
+	if (!mem_block)
+		return -EINVAL;
+
+	rc = device_offline(&mem_block->dev);
+	put_device(&mem_block->dev);
+	if (rc)
+		return rc;
+
+	block_sz = pseries_memory_block_size();
+	nid = memory_add_physaddr_to_nid(phys_addr);
+
+	remove_memory(nid, phys_addr, block_sz);
+
+	/* Update memory regions for memory remove */
+	memblock_remove(phys_addr, block_sz);
+
+	dlpar_release_drc(drc_index);
+
+	lmb->flags &= cpu_to_be32(~DRCONF_MEM_ASSIGNED);
+	pr_info("Memory at %llx (drc index %x) has been hot-removed\n",
+		be64_to_cpu(lmb->base_addr), drc_index);
+
+	return 0;
+}
+
+static int dlpar_memory_remove_by_count(struct pseries_hp_errorlog *hp_elog,
+					struct property *prop)
+{
+	struct of_drconf_cell *lmbs;
+	int lmbs_to_remove, lmbs_removed = 0;
+	int lmbs_available = 0;
+	uint32_t num_lmbs;
+	__be32 *p;
+	int i, rc;
+
+	lmbs_to_remove = be32_to_cpu(hp_elog->_drc_u.drc_count);
+	pr_info("Attempting to hot-remove %d LMB(s)\n", lmbs_to_remove);
+
+	if (lmbs_to_remove == 0)
+		return -EINVAL;
+
+	p = prop->value;
+	num_lmbs = be32_to_cpu(*p++);
+	lmbs = (struct of_drconf_cell *)p;
+
+	/* Validate that there are enough LMBs to satisfy the request */
+	for (i = 0; i < num_lmbs; i++) {
+		if (be32_to_cpu(lmbs[i].flags) & DRCONF_MEM_ASSIGNED)
+			lmbs_available++;
+	}
+
+	if (lmbs_available < lmbs_to_remove)
+		return -EINVAL;
+
+	for (i = 0; i < num_lmbs; i++) {
+		if (lmbs_to_remove == lmbs_removed)
+			break;
+
+		rc = dlpar_remove_lmb(&lmbs[i]);
+		if (rc)
+			continue;
+
+		lmbs_removed++;
+
+		/* Mark this lmb so we can add it later if all of the
+		 * requested LMBs cannot be removed.
+		 */
+		lmbs[i].reserved = 1;
+	}
+
+	if (lmbs_removed != lmbs_to_remove) {
+		pr_err("Memory hot-remove failed, adding LMB's back\n");
+
+		for (i = 0; i < num_lmbs; i++) {
+			if (!lmbs[i].reserved)
+				continue;
+
+			rc = dlpar_add_lmb(&lmbs[i]);
+			if (rc)
+				pr_err("Failed to add LMB back, drc index %x\n",
+				       be32_to_cpu(lmbs[i].drc_index));
+
+			lmbs[i].reserved = 0;
+		}
+		rc = -EINVAL;
+	} else {
+		/* remove any reserved markings */
+		for (i = 0; i < num_lmbs; i++)
+			lmbs[i].reserved = 0;
+	}
+
+	return rc;
+}
+
+static int dlpar_memory_remove_by_index(struct pseries_hp_errorlog *hp_elog,
+					struct property *prop)
+{
+	struct of_drconf_cell *lmbs;
+	uint32_t num_lmbs, drc_index;
+	int lmb_found;
+	__be32 *p;
+	int i, rc;
+
+	drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index);
+	pr_info("Attempting to hot-remove LMB, drc index %x\n", drc_index);
+
+	p = prop->value;
+	num_lmbs = be32_to_cpu(*p++);
+	lmbs = (struct of_drconf_cell *)p;
+
+	lmb_found = 0;
+	for (i = 0; i < num_lmbs; i++) {
+		if (lmbs[i].drc_index == hp_elog->_drc_u.drc_index) {
+			lmb_found = 1;
+			rc = dlpar_remove_lmb(&lmbs[i]);
+			break;
+		}
+	}
+
+	if (!lmb_found)
+		rc = -EINVAL;
+
+	if (rc)
+		pr_info("Failed to hot-remove memory, drc index %x\n",
+			drc_index);
+
+	return rc;
+}
+
 #else
 static inline int pseries_remove_memblock(unsigned long base,
 					  unsigned int memblock_size)
@@ -183,6 +356,11 @@ static inline int pseries_remove_mem_node(struct device_node *np)
 {
 	return 0;
 }
+static inline int dlpar_memory_remove(struct pseries_hp_errorlog *hp_elog)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 static int dlpar_add_lmb(struct of_drconf_cell *lmb)
@@ -298,14 +476,24 @@ static int dlpar_memory_add_by_count(struct pseries_hp_errorlog *hp_elog,
 	}
 
 	if (lmbs_added != lmbs_to_add) {
-		/* TODO: remove added lmbs */
+		pr_err("Memory hot-add failed, removing any added LMBs\n");
+
+		for (i = 0; i < num_lmbs; i++) {
+			if (!lmbs[i].reserved)
+				continue;
+
+			rc = dlpar_remove_lmb(&lmbs[i]);
+			if (rc)
+				pr_err("Failed to remove LMB, drc index %x\n",
+				       be32_to_cpu(lmbs[i].drc_index));
+		}
 		rc = -EINVAL;
+	} else {
+		/* Clear the reserved fields */
+		for (i = 0; i < num_lmbs; i++)
+			lmbs[i].reserved = 0;
 	}
 
-	/* Clear the reserved fields */
-	for (i = 0; i < num_lmbs; i++)
-		lmbs[i].reserved = 0;
-
 	return rc;
 }
 
@@ -373,6 +561,14 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
 		else
 			rc = -EINVAL;
 		break;
+	case PSERIES_HP_ELOG_ACTION_REMOVE:
+		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
+			rc = dlpar_memory_remove_by_count(hp_elog, prop);
+		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
+			rc = dlpar_memory_remove_by_index(hp_elog, prop);
+		else
+			rc = -EINVAL;
+		break;
 	default:
 		pr_err("Invalid action (%d) specified\n", hp_elog->action);
 		rc = -EINVAL;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/6] pseries: Create new device hotplug entry point
  2014-11-17 21:51 ` [PATCH v2 3/6] pseries: Create new device hotplug entry point Nathan Fontenot
@ 2014-11-17 22:53   ` Gavin Shan
  2014-11-18 18:18     ` Nathan Fontenot
  2014-11-21  7:49   ` Cyril Bur
  2014-11-26 23:44   ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 20+ messages in thread
From: Gavin Shan @ 2014-11-17 22:53 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Mon, Nov 17, 2014 at 03:51:42PM -0600, Nathan Fontenot wrote:
>Create a new entry point for device hotplug on pseries that will
>work for both PowerVM and PowerKVM systems.
>
>The current process to hotplug (or dlpar) devices (generally the same
>process for memory, cpu, and pci devices) on PowerVM systems is initiated
>from the HMC, which communicates the request to the partitions through
>the RSCT framework. The RSCT framework then invokes the drmgr command.
>The drmgr command performs the hotplug operation by doing some pieces,
>such as most of the rtas calls and device tree parsing, in userspace
>and make requests to the kernel to online/offline the device, update the
>device tree and add/remove the device.
>
>For PowerKVM the approach for device hotplug is to follow what is currently
>being done for pci hotplug. A hotplug request is initiated from the host,
>QEMU then generates an EPOW interrupt to the guest which causes the guest
>to make the rtas,check-exception call. In QEMU, the rtas,check-exception call
>returns a rtas hotplug event to the guest.
>
>Please note that the current pci hotplug path for PowerKVM involves the
>kernel receiving the rtas hotplug event, passing it to rtas_errd in
>userspace, and having rtas_errd invoke drmgr. The drmgr command then
>handles the request as described above for PowerVM systems. This is to
>be updated to perform pci completely in the kernel in a later patch set.
>
>There is no need for this circuitous route, we should handle the entire
>hotplug of devices in the kernel. What I am planning is to enable this
>by moving the code to handle device hotplug from drmgr into the kernel to
>provide a single path for both PowerVM and PowerKVM systems. This patch
>provides the common entry point. For PowerKVM a future update to the kernel
>rtas code will recognize rtas hotplug events returned from
>rtas,check-exception calls and use the common entry point to handle device
>hotplug entirely in the kernel.
>
>For PowerVM systems, this patch creates the /sys/kernel/dlpar file that rtas
>hotplug events can be written to by drmgr and passed to the common entry point.
>There is no chance of updating how we receive hotplug requests on PowerVM
>systems.
>
>Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>---
> arch/powerpc/platforms/pseries/dlpar.c          |   72 ++++++++++++++++++++++-
> arch/powerpc/platforms/pseries/hotplug-memory.c |   19 ++++++
> arch/powerpc/platforms/pseries/pseries.h        |   10 +++
> 3 files changed, 99 insertions(+), 2 deletions(-)
>
>diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
>index c22bb1b..ec825d3 100644
>--- a/arch/powerpc/platforms/pseries/dlpar.c
>+++ b/arch/powerpc/platforms/pseries/dlpar.c
>@@ -10,6 +10,8 @@
>  * 2 as published by the Free Software Foundation.
>  */
> 
>+#define pr_fmt(fmt)	"dlpar: " fmt
>+
> #include <linux/kernel.h>
> #include <linux/notifier.h>
> #include <linux/spinlock.h>
>@@ -535,13 +537,79 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
> 	return count;
> }
> 
>+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>+
>+static int handle_dlpar_errorlog(struct rtas_error_log *error_log)
>+{
>+	struct pseries_errorlog *pseries_log;
>+	struct pseries_hp_errorlog *hp_elog;
>+	int rc;
>+
>+	pseries_log = get_pseries_errorlog(error_log,
>+					   PSERIES_ELOG_SECT_ID_HOTPLUG);
>+	if (!pseries_log || (pseries_log->length == 0))
>+		return -EINVAL;
>+
>+	hp_elog = (struct pseries_hp_errorlog *)pseries_log->data;
>+
>+	/* Go ahead and convert the hotplug type to the correct endianness
>+	 * to avoid converting it everywhere we use it.
>+	 */
>+	switch (hp_elog->id_type) {
>+	case PSERIES_HP_ELOG_ID_DRC_COUNT:
>+		hp_elog->_drc_u.drc_count =
>+					be32_to_cpu(hp_elog->_drc_u.drc_count);
>+	case PSERIES_HP_ELOG_ID_DRC_INDEX:
>+		hp_elog->_drc_u.drc_index =
>+					be32_to_cpu(hp_elog->_drc_u.drc_index);
>+	}
>+

It seems that "break" was missed for all cases.

>+	switch (hp_elog->resource) {
>+	case PSERIES_HP_ELOG_RESOURCE_MEM:
>+		rc = dlpar_memory(hp_elog);
>+		break;
>+	default:
>+		pr_warn_ratelimited("Invalid resource (%d) specified\n",
>+				    hp_elog->resource);
>+		rc = -EINVAL;
>+		break;

Unnecessary "break" here.

>+	}
>+
>+	return rc;
>+}
>+
>+static ssize_t dlpar_store(struct file *filp, struct kobject *kobj,
>+			   struct bin_attribute *bin_attr, char *buf,
>+			   loff_t pos, size_t count)
>+{
>+	struct rtas_error_log *error_log;
>+	int rc;
>+
>+	error_log = kmalloc(count, GFP_KERNEL);
>+	if (!error_log)
>+		return -ENOMEM;
>+
>+	memcpy(error_log, buf, count);
>+
>+	rc = handle_dlpar_errorlog(error_log);
>+	kfree(error_log);
>+	return rc ? rc : count;
>+}
>+
>+static BIN_ATTR(dlpar, S_IWUSR, NULL, dlpar_store, 0);
>+
> static int __init pseries_dlpar_init(void)
> {
>+	int rc;
>+
>+#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
> 	ppc_md.cpu_probe = dlpar_cpu_probe;
> 	ppc_md.cpu_release = dlpar_cpu_release;
>+#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
> 
>-	return 0;
>+	rc = sysfs_create_bin_file(kernel_kobj, &bin_attr_dlpar);
>+
>+	return rc;
> }
> machine_device_initcall(pseries, pseries_dlpar_init);
> 
>-#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>index 3cb256c..69d178b 100644
>--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>@@ -9,6 +9,8 @@
>  *      2 of the License, or (at your option) any later version.
>  */
> 
>+#define pr_fmt(fmt)	"pseries-hotplug-mem: " fmt
>+
> #include <linux/of.h>
> #include <linux/of_address.h>
> #include <linux/memblock.h>
>@@ -134,6 +136,23 @@ static inline int pseries_remove_mem_node(struct device_node *np)
> }
> #endif /* CONFIG_MEMORY_HOTREMOVE */
> 
>+int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>+{
>+	int rc = 0;
>+
>+	lock_device_hotplug();
>+
>+	switch (hp_elog->action) {
>+	default:
>+		pr_err("Invalid action (%d) specified\n", hp_elog->action);
>+		rc = -EINVAL;
>+		break;
>+	}
>+
>+	unlock_device_hotplug();
>+	return rc;
>+}
>+
> static int pseries_add_mem_node(struct device_node *np)
> {
> 	const char *type;
>diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
>index 239bee5..40e0339 100644
>--- a/arch/powerpc/platforms/pseries/pseries.h
>+++ b/arch/powerpc/platforms/pseries/pseries.h
>@@ -11,6 +11,7 @@
> #define _PSERIES_PSERIES_H
> 
> #include <linux/interrupt.h>
>+#include <asm/rtas.h>
> 
> struct device_node;
> 
>@@ -63,6 +64,15 @@ extern int dlpar_detach_node(struct device_node *);
> int dlpar_acquire_drc(u32 drc_index);
> int dlpar_release_drc(u32 drc_index);
> 
>+#ifdef CONFIG_MEMORY_HOTPLUG
>+int dlpar_memory(struct pseries_hp_errorlog *hp_elog);
>+#else
>+static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>+{
>+	return -EOPNOTSUPP;
>+}
>+#endif
>+
> /* PCI root bridge prepare function override for pseries */
> struct pci_host_bridge;
> int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);
>
>_______________________________________________
>Linuxppc-dev mailing list
>Linuxppc-dev@lists.ozlabs.org
>https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] pseries: Move memory hotplug to the kernel
  2014-11-17 21:44 [PATCH v2 0/6] pseries: Move memory hotplug to the kernel Nathan Fontenot
                   ` (5 preceding siblings ...)
  2014-11-17 21:56 ` [PATCH v2 6/6] pseries: Implement memory hotplug remove " Nathan Fontenot
@ 2014-11-18  2:00 ` Cyril Bur
  2014-11-18 18:34   ` Nathan Fontenot
  6 siblings, 1 reply; 20+ messages in thread
From: Cyril Bur @ 2014-11-18  2:00 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

Hi Nathan,

I tried to apply these to Linus' tree and Mpes tree and to stable and
got several problems, I got stuck at the third hunk in patch 5.

Could you point out where I'm going wrong?

Thanks,

Cyril

On Mon, 2014-11-17 at 15:44 -0600, Nathan Fontenot wrote:
> In order to better support device hotplug (cpu, memory, and pci) in the
> PowerVM and PowerKVM environments, the handling of device hotplug
> could be updated so that the act of hotplugging a device occurs entirely
> in the kernel. This patch set begins to address this by moving
> memory hotplug to the kernel. Patches to follow will do the same
> for cpu and pci devices.
> 
> To provide background, the current handling of memory hotplug is
> handled by the drmgr command. This command is invoked when memory
> add/remove requests are made at the HMC and conveyed to a partition
> through the RSCT framework. The drmgr command then performs parts
> of the hotplug in user-space and makes requests to the kernel to perform
> other pieces. This is not really ideal, we can do everything in the
> kernel and do it faster.
> 
> In this patchset, hotplug events will now be communicated to the kernel
> in the form of rtas hotplug events. For PowerKVM systems this is done
> by qemu using the ras epow interrupt. For PowerVM systems the drmgr
> command will be updated to create a rtas hotplug event and send it to
> the kernel via a new /sys/kernel/dlpar interface. Both of these
> entry points for hotplug rtas events then call a common routine
> for handling rtas hotplug events.
> 
> -Nathan
> 
> Patch 1/6
> - Add definition of hotplug rtas event sections.
> 
> Patch 2/6
> - Update struct of_drconf_cell to use __be64/__be32
>  
> Patch 3/6
> - Export the dlpar_[acquire|release]drc() routines.
> 
> Patch 4/6
> - Create the new /sys/kernel/dlpar interface
> 
> Patch 5/6
> - Implement memory hotplug add in the kernel.
> 
> Patch 6/6
> - Implement memory hotplug remove in the kernel.
> 
>  include/asm/prom.h                 |   10 
>  include/asm/rtas.h                 |   26 ++
>  platforms/pseries/dlpar.c          |   72 +++++
>  platforms/pseries/hotplug-memory.c |  469 ++++++++++++++++++++++++++++++++++++-
>  platforms/pseries/pseries.h        |   12 
>  5 files changed, 576 insertions(+), 13 deletions(-)
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/6] pseries: Create new device hotplug entry point
  2014-11-17 22:53   ` Gavin Shan
@ 2014-11-18 18:18     ` Nathan Fontenot
  0 siblings, 0 replies; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-18 18:18 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

On 11/17/2014 04:53 PM, Gavin Shan wrote:
> On Mon, Nov 17, 2014 at 03:51:42PM -0600, Nathan Fontenot wrote:
>> Create a new entry point for device hotplug on pseries that will
>> work for both PowerVM and PowerKVM systems.
>>
>> The current process to hotplug (or dlpar) devices (generally the same
>> process for memory, cpu, and pci devices) on PowerVM systems is initiated
>>from the HMC, which communicates the request to the partitions through
>> the RSCT framework. The RSCT framework then invokes the drmgr command.
>> The drmgr command performs the hotplug operation by doing some pieces,
>> such as most of the rtas calls and device tree parsing, in userspace
>> and make requests to the kernel to online/offline the device, update the
>> device tree and add/remove the device.
>>
>> For PowerKVM the approach for device hotplug is to follow what is currently
>> being done for pci hotplug. A hotplug request is initiated from the host,
>> QEMU then generates an EPOW interrupt to the guest which causes the guest
>> to make the rtas,check-exception call. In QEMU, the rtas,check-exception call
>> returns a rtas hotplug event to the guest.
>>
>> Please note that the current pci hotplug path for PowerKVM involves the
>> kernel receiving the rtas hotplug event, passing it to rtas_errd in
>> userspace, and having rtas_errd invoke drmgr. The drmgr command then
>> handles the request as described above for PowerVM systems. This is to
>> be updated to perform pci completely in the kernel in a later patch set.
>>
>> There is no need for this circuitous route, we should handle the entire
>> hotplug of devices in the kernel. What I am planning is to enable this
>> by moving the code to handle device hotplug from drmgr into the kernel to
>> provide a single path for both PowerVM and PowerKVM systems. This patch
>> provides the common entry point. For PowerKVM a future update to the kernel
>> rtas code will recognize rtas hotplug events returned from
>> rtas,check-exception calls and use the common entry point to handle device
>> hotplug entirely in the kernel.
>>
>> For PowerVM systems, this patch creates the /sys/kernel/dlpar file that rtas
>> hotplug events can be written to by drmgr and passed to the common entry point.
>> There is no chance of updating how we receive hotplug requests on PowerVM
>> systems.
>>
>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/platforms/pseries/dlpar.c          |   72 ++++++++++++++++++++++-
>> arch/powerpc/platforms/pseries/hotplug-memory.c |   19 ++++++
>> arch/powerpc/platforms/pseries/pseries.h        |   10 +++
>> 3 files changed, 99 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
>> index c22bb1b..ec825d3 100644
>> --- a/arch/powerpc/platforms/pseries/dlpar.c
>> +++ b/arch/powerpc/platforms/pseries/dlpar.c
>> @@ -10,6 +10,8 @@
>>  * 2 as published by the Free Software Foundation.
>>  */
>>
>> +#define pr_fmt(fmt)	"dlpar: " fmt
>> +
>> #include <linux/kernel.h>
>> #include <linux/notifier.h>
>> #include <linux/spinlock.h>
>> @@ -535,13 +537,79 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
>> 	return count;
>> }
>>
>> +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>> +
>> +static int handle_dlpar_errorlog(struct rtas_error_log *error_log)
>> +{
>> +	struct pseries_errorlog *pseries_log;
>> +	struct pseries_hp_errorlog *hp_elog;
>> +	int rc;
>> +
>> +	pseries_log = get_pseries_errorlog(error_log,
>> +					   PSERIES_ELOG_SECT_ID_HOTPLUG);
>> +	if (!pseries_log || (pseries_log->length == 0))
>> +		return -EINVAL;
>> +
>> +	hp_elog = (struct pseries_hp_errorlog *)pseries_log->data;
>> +
>> +	/* Go ahead and convert the hotplug type to the correct endianness
>> +	 * to avoid converting it everywhere we use it.
>> +	 */
>> +	switch (hp_elog->id_type) {
>> +	case PSERIES_HP_ELOG_ID_DRC_COUNT:
>> +		hp_elog->_drc_u.drc_count =
>> +					be32_to_cpu(hp_elog->_drc_u.drc_count);
>> +	case PSERIES_HP_ELOG_ID_DRC_INDEX:
>> +		hp_elog->_drc_u.drc_index =
>> +					be32_to_cpu(hp_elog->_drc_u.drc_index);
>> +	}
>> +
> 
> It seems that "break" was missed for all cases.

Yep, it was.

> 
>> +	switch (hp_elog->resource) {
>> +	case PSERIES_HP_ELOG_RESOURCE_MEM:
>> +		rc = dlpar_memory(hp_elog);
>> +		break;
>> +	default:
>> +		pr_warn_ratelimited("Invalid resource (%d) specified\n",
>> +				    hp_elog->resource);
>> +		rc = -EINVAL;
>> +		break;
> 
> Unnecessary "break" here.

I'll remove this one and send an updated patch.

-Nathan

> 
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +static ssize_t dlpar_store(struct file *filp, struct kobject *kobj,
>> +			   struct bin_attribute *bin_attr, char *buf,
>> +			   loff_t pos, size_t count)
>> +{
>> +	struct rtas_error_log *error_log;
>> +	int rc;
>> +
>> +	error_log = kmalloc(count, GFP_KERNEL);
>> +	if (!error_log)
>> +		return -ENOMEM;
>> +
>> +	memcpy(error_log, buf, count);
>> +
>> +	rc = handle_dlpar_errorlog(error_log);
>> +	kfree(error_log);
>> +	return rc ? rc : count;
>> +}
>> +
>> +static BIN_ATTR(dlpar, S_IWUSR, NULL, dlpar_store, 0);
>> +
>> static int __init pseries_dlpar_init(void)
>> {
>> +	int rc;
>> +
>> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
>> 	ppc_md.cpu_probe = dlpar_cpu_probe;
>> 	ppc_md.cpu_release = dlpar_cpu_release;
>> +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>>
>> -	return 0;
>> +	rc = sysfs_create_bin_file(kernel_kobj, &bin_attr_dlpar);
>> +
>> +	return rc;
>> }
>> machine_device_initcall(pseries, pseries_dlpar_init);
>>
>> -#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index 3cb256c..69d178b 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -9,6 +9,8 @@
>>  *      2 of the License, or (at your option) any later version.
>>  */
>>
>> +#define pr_fmt(fmt)	"pseries-hotplug-mem: " fmt
>> +
>> #include <linux/of.h>
>> #include <linux/of_address.h>
>> #include <linux/memblock.h>
>> @@ -134,6 +136,23 @@ static inline int pseries_remove_mem_node(struct device_node *np)
>> }
>> #endif /* CONFIG_MEMORY_HOTREMOVE */
>>
>> +int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>> +{
>> +	int rc = 0;
>> +
>> +	lock_device_hotplug();
>> +
>> +	switch (hp_elog->action) {
>> +	default:
>> +		pr_err("Invalid action (%d) specified\n", hp_elog->action);
>> +		rc = -EINVAL;
>> +		break;
>> +	}
>> +
>> +	unlock_device_hotplug();
>> +	return rc;
>> +}
>> +
>> static int pseries_add_mem_node(struct device_node *np)
>> {
>> 	const char *type;
>> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
>> index 239bee5..40e0339 100644
>> --- a/arch/powerpc/platforms/pseries/pseries.h
>> +++ b/arch/powerpc/platforms/pseries/pseries.h
>> @@ -11,6 +11,7 @@
>> #define _PSERIES_PSERIES_H
>>
>> #include <linux/interrupt.h>
>> +#include <asm/rtas.h>
>>
>> struct device_node;
>>
>> @@ -63,6 +64,15 @@ extern int dlpar_detach_node(struct device_node *);
>> int dlpar_acquire_drc(u32 drc_index);
>> int dlpar_release_drc(u32 drc_index);
>>
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +int dlpar_memory(struct pseries_hp_errorlog *hp_elog);
>> +#else
>> +static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>> +#endif
>> +
>> /* PCI root bridge prepare function override for pseries */
>> struct pci_host_bridge;
>> int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] pseries: Move memory hotplug to the kernel
  2014-11-18  2:00 ` [PATCH v2 0/6] pseries: Move memory hotplug to " Cyril Bur
@ 2014-11-18 18:34   ` Nathan Fontenot
  2014-11-18 22:59     ` Cyril Bur
  0 siblings, 1 reply; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-18 18:34 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev

On 11/17/2014 08:00 PM, Cyril Bur wrote:
> Hi Nathan,
> 
> I tried to apply these to Linus' tree and Mpes tree and to stable and
> got several problems, I got stuck at the third hunk in patch 5.

I based these patches off of mpe's -next tree. I did a fresh pull of
mpe's tree and found that they do apply with some fuzz to the master branch.

Which tree were you having issue with patch 5?

-Nathan

> 
> Could you point out where I'm going wrong?
> 
> Thanks,
> 
> Cyril
> 
> On Mon, 2014-11-17 at 15:44 -0600, Nathan Fontenot wrote:
>> In order to better support device hotplug (cpu, memory, and pci) in the
>> PowerVM and PowerKVM environments, the handling of device hotplug
>> could be updated so that the act of hotplugging a device occurs entirely
>> in the kernel. This patch set begins to address this by moving
>> memory hotplug to the kernel. Patches to follow will do the same
>> for cpu and pci devices.
>>
>> To provide background, the current handling of memory hotplug is
>> handled by the drmgr command. This command is invoked when memory
>> add/remove requests are made at the HMC and conveyed to a partition
>> through the RSCT framework. The drmgr command then performs parts
>> of the hotplug in user-space and makes requests to the kernel to perform
>> other pieces. This is not really ideal, we can do everything in the
>> kernel and do it faster.
>>
>> In this patchset, hotplug events will now be communicated to the kernel
>> in the form of rtas hotplug events. For PowerKVM systems this is done
>> by qemu using the ras epow interrupt. For PowerVM systems the drmgr
>> command will be updated to create a rtas hotplug event and send it to
>> the kernel via a new /sys/kernel/dlpar interface. Both of these
>> entry points for hotplug rtas events then call a common routine
>> for handling rtas hotplug events.
>>
>> -Nathan
>>
>> Patch 1/6
>> - Add definition of hotplug rtas event sections.
>>
>> Patch 2/6
>> - Update struct of_drconf_cell to use __be64/__be32
>>  
>> Patch 3/6
>> - Export the dlpar_[acquire|release]drc() routines.
>>
>> Patch 4/6
>> - Create the new /sys/kernel/dlpar interface
>>
>> Patch 5/6
>> - Implement memory hotplug add in the kernel.
>>
>> Patch 6/6
>> - Implement memory hotplug remove in the kernel.
>>
>>  include/asm/prom.h                 |   10 
>>  include/asm/rtas.h                 |   26 ++
>>  platforms/pseries/dlpar.c          |   72 +++++
>>  platforms/pseries/hotplug-memory.c |  469 ++++++++++++++++++++++++++++++++++++-
>>  platforms/pseries/pseries.h        |   12 
>>  5 files changed, 576 insertions(+), 13 deletions(-)
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] pseries: Move memory hotplug to the kernel
  2014-11-18 18:34   ` Nathan Fontenot
@ 2014-11-18 22:59     ` Cyril Bur
  0 siblings, 0 replies; 20+ messages in thread
From: Cyril Bur @ 2014-11-18 22:59 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Tue, 2014-11-18 at 12:34 -0600, Nathan Fontenot wrote:
> On 11/17/2014 08:00 PM, Cyril Bur wrote:
> > Hi Nathan,
> > 
> > I tried to apply these to Linus' tree and Mpes tree and to stable and
> > got several problems, I got stuck at the third hunk in patch 5.
> 
> I based these patches off of mpe's -next tree. I did a fresh pull of
> mpe's tree and found that they do apply with some fuzz to the master branch.
> 
Got them onto mpe's -next thanks.

> Which tree were you having issue with patch 5?

Looks like 16d0f5c4af76b0c3424290937bf1ac22adf439b1 was the cause of my
problems. 
> 
> -Nathan
> 
> > 
> > Could you point out where I'm going wrong?
> > 
> > Thanks,
> > 
> > Cyril
> > 
> > On Mon, 2014-11-17 at 15:44 -0600, Nathan Fontenot wrote:
> >> In order to better support device hotplug (cpu, memory, and pci) in the
> >> PowerVM and PowerKVM environments, the handling of device hotplug
> >> could be updated so that the act of hotplugging a device occurs entirely
> >> in the kernel. This patch set begins to address this by moving
> >> memory hotplug to the kernel. Patches to follow will do the same
> >> for cpu and pci devices.
> >>
> >> To provide background, the current handling of memory hotplug is
> >> handled by the drmgr command. This command is invoked when memory
> >> add/remove requests are made at the HMC and conveyed to a partition
> >> through the RSCT framework. The drmgr command then performs parts
> >> of the hotplug in user-space and makes requests to the kernel to perform
> >> other pieces. This is not really ideal, we can do everything in the
> >> kernel and do it faster.
> >>
> >> In this patchset, hotplug events will now be communicated to the kernel
> >> in the form of rtas hotplug events. For PowerKVM systems this is done
> >> by qemu using the ras epow interrupt. For PowerVM systems the drmgr
> >> command will be updated to create a rtas hotplug event and send it to
> >> the kernel via a new /sys/kernel/dlpar interface. Both of these
> >> entry points for hotplug rtas events then call a common routine
> >> for handling rtas hotplug events.
> >>
> >> -Nathan
> >>
> >> Patch 1/6
> >> - Add definition of hotplug rtas event sections.
> >>
> >> Patch 2/6
> >> - Update struct of_drconf_cell to use __be64/__be32
> >>  
> >> Patch 3/6
> >> - Export the dlpar_[acquire|release]drc() routines.
> >>
> >> Patch 4/6
> >> - Create the new /sys/kernel/dlpar interface
> >>
> >> Patch 5/6
> >> - Implement memory hotplug add in the kernel.
> >>
> >> Patch 6/6
> >> - Implement memory hotplug remove in the kernel.
> >>
> >>  include/asm/prom.h                 |   10 
> >>  include/asm/rtas.h                 |   26 ++
> >>  platforms/pseries/dlpar.c          |   72 +++++
> >>  platforms/pseries/hotplug-memory.c |  469 ++++++++++++++++++++++++++++++++++++-
> >>  platforms/pseries/pseries.h        |   12 
> >>  5 files changed, 576 insertions(+), 13 deletions(-)
> >>
> >> _______________________________________________
> >> Linuxppc-dev mailing list
> >> Linuxppc-dev@lists.ozlabs.org
> >> https://lists.ozlabs.org/listinfo/linuxppc-dev
> > 
> > 
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev
> > 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/6] pseries: Create new device hotplug entry point
  2014-11-17 21:51 ` [PATCH v2 3/6] pseries: Create new device hotplug entry point Nathan Fontenot
  2014-11-17 22:53   ` Gavin Shan
@ 2014-11-21  7:49   ` Cyril Bur
  2014-11-24 14:45     ` Nathan Fontenot
  2014-11-26 23:44   ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 20+ messages in thread
From: Cyril Bur @ 2014-11-21  7:49 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev


On Mon, 2014-11-17 at 15:51 -0600, Nathan Fontenot wrote:
> Create a new entry point for device hotplug on pseries that will
> work for both PowerVM and PowerKVM systems.
> 
> The current process to hotplug (or dlpar) devices (generally the same
> process for memory, cpu, and pci devices) on PowerVM systems is initiated
> from the HMC, which communicates the request to the partitions through
> the RSCT framework. The RSCT framework then invokes the drmgr command.
> The drmgr command performs the hotplug operation by doing some pieces,
> such as most of the rtas calls and device tree parsing, in userspace
> and make requests to the kernel to online/offline the device, update the
> device tree and add/remove the device.
> 
> For PowerKVM the approach for device hotplug is to follow what is currently
> being done for pci hotplug. A hotplug request is initiated from the host,
> QEMU then generates an EPOW interrupt to the guest which causes the guest
> to make the rtas,check-exception call. In QEMU, the rtas,check-exception call
> returns a rtas hotplug event to the guest.

Please forgive my ignorance of the exact details of how this all works.
I've been trying to wrap my head around how it works in a little more
detail than your high level overview. Correct me where I go wrong.

So the EPOW interrupt comes in and QEMU receives the
rtas,check-exception call and returns the rtas hotplug event. As you
state below, the connection of the arrival of the rtas hotplug event to
the hotplug code will be made in a subsequent patch.

Here is my understanding of what happens when a hotplug event gets
processed.
An LMB is selected
from /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory in the
device tree which is populated at boot time (although it's possible that
it can updated once the guest is running?) either because it matches a
specific drc-index or simply because it is in the list and we are to
hotplug 'count' LMBs. So there's a maximum amount of memory that can be
hotplugged (without device tree updates...)?

Once selected the kernel informs the hypervisor that it is going to use
that LMB with two RTAS calls, 'get-sensor-state' and 'set-indicator'
which first checks the actual state of that LMB with the hypervisor and
then marks it as 'in use' (in dlpar_acquire_drc).

After that, find the (NUMA?) node id of the memory and inform the
generic kernel about this new memory.

For some reason memblock needs to be informed separately (does it need
to be informed exactly?), which as you pointed out in the previous
version of this patchset you're not sure why it isn't done in
add_memory, and you still do it because it's what currently happens? I'd
very much like to know why this sequence of events but I suspect the
explanation might get quite involved.

Finally mark the LMB as assigned in its device tree node. This is
bookkeeping right?

This process is repeated for each LMB that should be added.

In the event a failure a best effort rollback is done, as you mentioned
in the previous version, it may not always be possible but at least it's
attempted.

Once all this succeeds update the device tree.

Basically the exact reverse of this process happens for unplug.

I do have questions about this process: What is the most likely part to
fail? I have no idea how feasible it would be but perhaps trying to do
the likely failure on all the LBMs might help unwinding and perhaps
provide a guarantee that it can be completely rolled back. I'm really
not sure but by the looks of things are going to be pretty reversible up
until device_online.

I can't help but notice the duplication with memory_probe_store
(although that doesn't do any rollback). Probably unavoidable and its
really not much code.

Thanks in advance for the clarifications,

Cyril
> 
> Please note that the current pci hotplug path for PowerKVM involves the
> kernel receiving the rtas hotplug event, passing it to rtas_errd in
> userspace, and having rtas_errd invoke drmgr. The drmgr command then
> handles the request as described above for PowerVM systems. This is to
> be updated to perform pci completely in the kernel in a later patch set.
> 
> There is no need for this circuitous route, we should handle the entire
> hotplug of devices in the kernel. What I am planning is to enable this
> by moving the code to handle device hotplug from drmgr into the kernel to
> provide a single path for both PowerVM and PowerKVM systems. This patch
> provides the common entry point. For PowerKVM a future update to the kernel
> rtas code will recognize rtas hotplug events returned from
> rtas,check-exception calls and use the common entry point to handle device
> hotplug entirely in the kernel.
> 
> For PowerVM systems, this patch creates the /sys/kernel/dlpar file that rtas
> hotplug events can be written to by drmgr and passed to the common entry point.
> There is no chance of updating how we receive hotplug requests on PowerVM
> systems.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/dlpar.c          |   72 ++++++++++++++++++++++-
>  arch/powerpc/platforms/pseries/hotplug-memory.c |   19 ++++++
>  arch/powerpc/platforms/pseries/pseries.h        |   10 +++
>  3 files changed, 99 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
> index c22bb1b..ec825d3 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -10,6 +10,8 @@
>   * 2 as published by the Free Software Foundation.
>   */
>  
> +#define pr_fmt(fmt)	"dlpar: " fmt
> +
>  #include <linux/kernel.h>
>  #include <linux/notifier.h>
>  #include <linux/spinlock.h>
> @@ -535,13 +537,79 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
>  	return count;
>  }
>  
> +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
> +
> +static int handle_dlpar_errorlog(struct rtas_error_log *error_log)
> +{
> +	struct pseries_errorlog *pseries_log;
> +	struct pseries_hp_errorlog *hp_elog;
> +	int rc;
> +
> +	pseries_log = get_pseries_errorlog(error_log,
> +					   PSERIES_ELOG_SECT_ID_HOTPLUG);

> +	if (!pseries_log || (pseries_log->length == 0))
> +		return -EINVAL;
> +
> +	hp_elog = (struct pseries_hp_errorlog *)pseries_log->data;
Maybe rather than doing this here it might be worth having a function
that returns you a pseries_hp_errorlog from a pseries_errorlog but also
with everything in CPU endian which will make 5/6 and 6/6 look nicer.
> +
> +	/* Go ahead and convert the hotplug type to the correct endianness
> +	 * to avoid converting it everywhere we use it.
> +	 */
> +	switch (hp_elog->id_type) {
> +	case PSERIES_HP_ELOG_ID_DRC_COUNT:
> +		hp_elog->_drc_u.drc_count =
> +					be32_to_cpu(hp_elog->_drc_u.drc_count);
> +	case PSERIES_HP_ELOG_ID_DRC_INDEX:
> +		hp_elog->_drc_u.drc_index =
> +					be32_to_cpu(hp_elog->_drc_u.drc_index);
> +	}
You're converting __be32 to CPU endian but storing them back into a
__be32? This will upset sparse.
Should this not also have a default case?

> +
> +	switch (hp_elog->resource) {
> +	case PSERIES_HP_ELOG_RESOURCE_MEM:
> +		rc = dlpar_memory(hp_elog);
> +		break;
> +	default:
> +		pr_warn_ratelimited("Invalid resource (%d) specified\n",
> +				    hp_elog->resource);
> +		rc = -EINVAL;
> +		break;
> +	}
> +
> +	return rc;
> +}
> +
> +static ssize_t dlpar_store(struct file *filp, struct kobject *kobj,
> +			   struct bin_attribute *bin_attr, char *buf,
> +			   loff_t pos, size_t count)
> +{
> +	struct rtas_error_log *error_log;
> +	int rc;
> +
> +	error_log = kmalloc(count, GFP_KERNEL);
> +	if (!error_log)
> +		return -ENOMEM;
> +
> +	memcpy(error_log, buf, count);
> +
> +	rc = handle_dlpar_errorlog(error_log);
> +	kfree(error_log);
> +	return rc ? rc : count;
> +}
> +
> +static BIN_ATTR(dlpar, S_IWUSR, NULL, dlpar_store, 0);
> +
>  static int __init pseries_dlpar_init(void)
>  {
> +	int rc;
> +
> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
>  	ppc_md.cpu_probe = dlpar_cpu_probe;
>  	ppc_md.cpu_release = dlpar_cpu_release;
> +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>  
> -	return 0;
> +	rc = sysfs_create_bin_file(kernel_kobj, &bin_attr_dlpar);
> +
> +	return rc;
>  }
>  machine_device_initcall(pseries, pseries_dlpar_init);
>  
> -#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 3cb256c..69d178b 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -9,6 +9,8 @@
>   *      2 of the License, or (at your option) any later version.
>   */
>  
> +#define pr_fmt(fmt)	"pseries-hotplug-mem: " fmt
> +
>  #include <linux/of.h>
>  #include <linux/of_address.h>
>  #include <linux/memblock.h>
> @@ -134,6 +136,23 @@ static inline int pseries_remove_mem_node(struct device_node *np)
>  }
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
> +int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
> +{
> +	int rc = 0;
> +
> +	lock_device_hotplug();
> +
> +	switch (hp_elog->action) {
> +	default:
> +		pr_err("Invalid action (%d) specified\n", hp_elog->action);
> +		rc = -EINVAL;
> +		break;
> +	}
> +
> +	unlock_device_hotplug();
> +	return rc;
> +}
> +
>  static int pseries_add_mem_node(struct device_node *np)
>  {
>  	const char *type;
> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
> index 239bee5..40e0339 100644
> --- a/arch/powerpc/platforms/pseries/pseries.h
> +++ b/arch/powerpc/platforms/pseries/pseries.h
> @@ -11,6 +11,7 @@
>  #define _PSERIES_PSERIES_H
>  
>  #include <linux/interrupt.h>
> +#include <asm/rtas.h>
>  
>  struct device_node;
>  
> @@ -63,6 +64,15 @@ extern int dlpar_detach_node(struct device_node *);
>  int dlpar_acquire_drc(u32 drc_index);
>  int dlpar_release_drc(u32 drc_index);
>  
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int dlpar_memory(struct pseries_hp_errorlog *hp_elog);
> +#else
> +static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
> +{
> +	return -EOPNOTSUPP;
> +}
> +#endif
> +
>  /* PCI root bridge prepare function override for pseries */
>  struct pci_host_bridge;
>  int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/6] pseries: Implement memory hotplug add in the kernel
  2014-11-17 21:54 ` [PATCH v2 5/6] pseries: Implement memory hotplug add in the kernel Nathan Fontenot
@ 2014-11-21  7:49   ` Cyril Bur
  2014-11-24 14:56     ` Nathan Fontenot
  0 siblings, 1 reply; 20+ messages in thread
From: Cyril Bur @ 2014-11-21  7:49 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev


On Mon, 2014-11-17 at 15:54 -0600, Nathan Fontenot wrote:
> Move handling of memory hotplug add on pseries completely into the kernel.
> 
> The current memory hotplug add path involves the drmgr command doing part
> of this work in userspace and requesting the kernel to do additional pieces.
> This patch allows us to handle the act completely in the kernel via rtas
> hotplug events. This allows us to perform the operation faster and provide
> a common memory hotplug add path for PowerVM and PowerKVM systems.
> 
> The patch does introduce a static rtas_hp_event variable that is set to
> true when updating the device tree during memory hotplug initiated from
> a rtas hotplug event. This is needed because we do not need to do the
> work in the of notifier, this work is already performed in handling the
> hotplug request. At a later time we can remove this when we deprecate the
> previous method of memory hotplug.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/hotplug-memory.c |  244 +++++++++++++++++++++++
>  1 file changed, 243 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 69d178b..b57d42b 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -16,6 +16,7 @@
>  #include <linux/memblock.h>
>  #include <linux/memory.h>
>  #include <linux/memory_hotplug.h>
> +#include <linux/slab.h>
>  
>  #include <asm/firmware.h>
>  #include <asm/machdep.h>
> @@ -23,6 +24,8 @@
>  #include <asm/sparsemem.h>
>  #include "pseries.h"
>  
> +static bool rtas_hp_event;
> +
>  unsigned long pseries_memory_block_size(void)
>  {
>  	struct device_node *np;
> @@ -66,6 +69,52 @@ unsigned long pseries_memory_block_size(void)
>  	return memblock_size;
>  }
>  
> +static void dlpar_free_drconf_property(struct property *prop)
> +{
> +	kfree(prop->name);
> +	kfree(prop->value);
> +	kfree(prop);
> +}
> +
> +static struct property *dlpar_clone_drconf_property(struct device_node *dn)
> +{
> +	struct property *prop, *new_prop;
> +
> +	prop = of_find_property(dn, "ibm,dynamic-memory", NULL);
> +	if (!prop)
> +		return NULL;
> +
> +	new_prop = kzalloc(sizeof(*new_prop), GFP_KERNEL);
> +	if (!new_prop)
> +		return NULL;
> +
> +	new_prop->name = kstrdup(prop->name, GFP_KERNEL);
> +	new_prop->value = kmalloc(prop->length, GFP_KERNEL);
> +	if (!new_prop->name || !new_prop->value) {
> +		dlpar_free_drconf_property(new_prop);
> +		return NULL;
> +	}
> +
> +	memcpy(new_prop->value, prop->value, prop->length);
> +	new_prop->length = prop->length;
> +
> +	return new_prop;
> +}
> +
> +static struct memory_block *lmb_to_memblock(struct of_drconf_cell *lmb)
> +{
> +	unsigned long section_nr;
> +	struct mem_section *mem_sect;
> +	struct memory_block *mem_block;
> +	u64 phys_addr = be64_to_cpu(lmb->base_addr);
> +
> +	section_nr = pfn_to_section_nr(PFN_DOWN(phys_addr));
> +	mem_sect = __nr_to_section(section_nr);
> +
> +	mem_block = find_memory_block(mem_sect);
> +	return mem_block;
> +}
> +
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
>  {
> @@ -136,19 +185,209 @@ static inline int pseries_remove_mem_node(struct device_node *np)
>  }
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
> +static int dlpar_add_lmb(struct of_drconf_cell *lmb)
> +{
> +	struct memory_block *mem_block;
> +	u64 phys_addr;
> +	uint32_t drc_index;
I started commenting this and it turns out you've used uint32_t almost
everywhere for values you've pulled from the device tree or the elog.
These should be u32.
> +	unsigned long pages_per_block;
> +	unsigned long block_sz;
> +	int nid, sections_per_block;
> +	int rc;
> +
> +	if (be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED)
> +		return -EINVAL;
> +
> +	phys_addr = be64_to_cpu(lmb->base_addr);
> +	drc_index = be32_to_cpu(lmb->drc_index);
> +	block_sz = memory_block_size_bytes();
> +	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
> +	pages_per_block = PAGES_PER_SECTION * sections_per_block;
> +
Perhaps I'm being a bit slow here but it isn't exactly clear what you're
getting with all those variables and could you explain what that
statement below is checking for?

> +	if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
> +		return -EINVAL;
> +
> +	rc = dlpar_acquire_drc(drc_index);
> +	if (rc)
> +		return rc;
> +
> +	/* Find the node id for this address */
> +	nid = memory_add_physaddr_to_nid(phys_addr);
> +
> +	/* Add the memory */
> +	rc = add_memory(nid, phys_addr, block_sz);
> +	if (rc) {
> +		dlpar_release_drc(drc_index);
> +		return rc;
> +	}
> +
> +	/* Register this block of memory */
> +	rc = memblock_add(phys_addr, block_sz);
> +	if (rc) {
> +		remove_memory(nid, phys_addr, block_sz);
> +		dlpar_release_drc(drc_index);
> +		return rc;
> +	}
> +
> +	mem_block = lmb_to_memblock(lmb);
> +	if (!mem_block) {
> +		remove_memory(nid, phys_addr, block_sz);
> +		dlpar_release_drc(drc_index);
> +		return -EINVAL;
> +	}
> +
> +	rc = device_online(&mem_block->dev);
> +	put_device(&mem_block->dev);
> +	if (rc) {
> +		remove_memory(nid, phys_addr, block_sz);
> +		dlpar_release_drc(drc_index);
> +		return rc;
> +	}
> +
> +	lmb->flags |= cpu_to_be32(DRCONF_MEM_ASSIGNED);
> +	return 0;
> +}
> +
> +static int dlpar_memory_add_by_count(struct pseries_hp_errorlog *hp_elog,
> +				     struct property *prop)
> +{
> +	struct of_drconf_cell *lmbs;
> +	uint32_t num_lmbs;
> +	__be32 *p;
> +	int i, lmbs_to_add;
> +	int lmbs_available = 0;
> +	int lmbs_added = 0;
> +	int rc;
> +
> +	lmbs_to_add = be32_to_cpu(hp_elog->_drc_u.drc_count);
Didn't you already do the endian conversion back in
handle_dlpar_errorlog?
> +	pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
> +
> +	if (lmbs_to_add == 0)
> +		return -EINVAL;
> +
> +	p = prop->value;
> +	num_lmbs = be32_to_cpu(*p++);
> +	lmbs = (struct of_drconf_cell *)p;
> +
> +	/* Validate that there are enough LMBs to satisfy the request */
> +	for (i = 0; i < num_lmbs; i++) {
> +		if (!(be32_to_cpu(lmbs[i].flags) & DRCONF_MEM_ASSIGNED))
> +			lmbs_available++;
> +	}
> +
> +	if (lmbs_available < lmbs_to_add)
> +		return -EINVAL;
> +
I'm wondering why the next 3 lines when the if could be incorporated
into the for condition
for (i = 0; i < num_lmbs && lmbs_to_add != lmbs_added; i++) {
(or lmbs_to_add < lmbs_added)...
> +	for (i = 0; i < num_lmbs; i++) {
> +		if (lmbs_to_add == lmbs_added)
> +			break;
> +
> +		rc = dlpar_add_lmb(&lmbs[i]);
> +		if (rc)
> +			continue;
> +
> +		lmbs_added++;
> +		pr_info("Memory at %llx (drc index %x) has been hot-added\n",
> +			be64_to_cpu(lmbs[i].base_addr),
> +			be32_to_cpu(lmbs[i].drc_index));
This message feels a tad premature, you're going to roll back if the
requested amount couldn't be added which is good but then there's going
to be a confusing mix of 'hot-added' followed by 'hot-removed'. Could
this message be moved to when you clear the reserved fields?
> +
> +		/* Mark this lmb so we can remove it later if all of the
> +		 * requested LMBs cannot be added.
> +		 */
> +		lmbs[i].reserved = 1;
Writing 1 like that into a __be variable will upset sparse. 

As a more general comment that it might be worth not passing around __be
structures and convert them all into CPU endian in one place so that all
these functions can do away with the endian conversion calls.
> +	}
> +
> +	if (lmbs_added != lmbs_to_add) {
> +		/* TODO: remove added lmbs */
> +		rc = -EINVAL;
> +	}
> +
> +	/* Clear the reserved fields */
> +	for (i = 0; i < num_lmbs; i++)
> +		lmbs[i].reserved = 0;
> +
> +	return rc;
> +}
> +
> +static int dlpar_memory_add_by_index(struct pseries_hp_errorlog *hp_elog,
> +				     struct property *prop)
> +{
> +	struct of_drconf_cell *lmbs;
> +	uint32_t num_lmbs, drc_index;
> +	__be32 *p;
> +	int i, lmb_found;
> +	int rc;
> +
> +	drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index);
Didn't you already do the endian conversion back in
handle_dlpar_errorlog?
> +	pr_info("Attempting to hot-add LMB, drc index %x\n", drc_index);
> +
> +	p = prop->value;
> +	num_lmbs = be32_to_cpu(*p++);
> +	lmbs = (struct of_drconf_cell *)p;
> +
> +	lmb_found = 0;
> +	for (i = 0; i < num_lmbs; i++) {
> +		if (lmbs[i].drc_index == hp_elog->_drc_u.drc_index) {
Probably wanted to use your local variable drc_index here also
lmbs[i].drc_index is __be and I don't think you've converted it but the
other side of the condition has been...
> +			lmb_found = 1;
> +			rc = dlpar_add_lmb(&lmbs[i]);
> +			break;
> +		}
> +	}
> +
> +	if (!lmb_found)
> +		rc = -EINVAL;
> +
> +	if (rc)
> +		pr_info("Failed to hot-add memory, drc index %x\n", drc_index);
> +	else
> +		pr_info("Memory at %llx (drc index %x) has been hot-added\n",
> +			be64_to_cpu(lmbs[i].base_addr), drc_index);
> +
> +	return rc;
> +}
> +
>  int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>  {
> -	int rc = 0;
> +	struct device_node *dn;
> +	struct property *prop;
> +	int rc;
>  
>  	lock_device_hotplug();
>  
> +	dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
> +	if (!dn)
> +		return -EINVAL;
> +
> +	prop = dlpar_clone_drconf_property(dn);
> +	if (!prop) {
> +		of_node_put(dn);
> +		return -EINVAL;
> +	}
> +
>  	switch (hp_elog->action) {
> +	case PSERIES_HP_ELOG_ACTION_ADD:
> +		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
> +			rc = dlpar_memory_add_by_count(hp_elog, prop);
> +		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
> +			rc = dlpar_memory_add_by_index(hp_elog, prop);
> +		else
> +			rc = -EINVAL;
> +		break;
>  	default:
>  		pr_err("Invalid action (%d) specified\n", hp_elog->action);
>  		rc = -EINVAL;
>  		break;
>  	}
>  
> +	if (rc)
> +		dlpar_free_drconf_property(prop);
> +	else {
> +		rtas_hp_event = true;
> +		of_update_property(dn, prop);
> +		rtas_hp_event = false;
> +	}
> +
> +	of_node_put(dn);
>  	unlock_device_hotplug();
>  	return rc;
>  }
> @@ -193,6 +432,9 @@ static int pseries_update_drconf_memory(struct of_prop_reconfig *pr)
>  	__be32 *p;
>  	int i, rc = -EINVAL;
>  
> +	if (rtas_hp_event)
> +		return 0;
> +
>  	memblock_size = pseries_memory_block_size();
>  	if (!memblock_size)
>  		return -EINVAL;
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 6/6] pseries: Implement memory hotplug remove in the kernel
  2014-11-17 21:56 ` [PATCH v2 6/6] pseries: Implement memory hotplug remove " Nathan Fontenot
@ 2014-11-21  7:49   ` Cyril Bur
  2014-11-24 15:03     ` Nathan Fontenot
  0 siblings, 1 reply; 20+ messages in thread
From: Cyril Bur @ 2014-11-21  7:49 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev


On Mon, 2014-11-17 at 15:56 -0600, Nathan Fontenot wrote:
> Move handling of memory hotplug remove on pseries completely into the kernel.
> 
> The current memory hotplug remove path involves the drmgr command doing part
> of this work in userspace and requesting the kernel to do additional pieces.
> This patch allows us to handle the act completely in the kernel via rtas
> hotplug events. This allows us to perform the operation faster and provide
> a common memory hotplug remove path for PowerVM and PowerKVM systems.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/hotplug-memory.c |  206 ++++++++++++++++++++++-
>  1 file changed, 201 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index b57d42b..c8189e8 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -173,6 +173,179 @@ static int pseries_remove_mem_node(struct device_node *np)
>  	pseries_remove_memblock(base, lmb_size);
>  	return 0;
>  }
> +
> +static int lmb_is_removable(struct of_drconf_cell *lmb)
> +{
> +	int i, scns_per_block;
> +	int rc = 1;
> +	unsigned long pfn, block_sz;
> +	u64 phys_addr;
> +
> +	if (!(be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED))
> +		return -1;
This makes me kind of nervous. You're using the return value of
lmb_is_removable as a boolean but it returns three possible values -1,0
and 1. Functionally it looks correct to me so its not a massive issue
> +
> +	phys_addr = be64_to_cpu(lmb->base_addr);
> +	block_sz = memory_block_size_bytes();
> +	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
> +
> +	for (i = 0; i < scns_per_block; i++) {
> +		pfn = PFN_DOWN(phys_addr);
> +		if (!pfn_present(pfn))
> +			continue;
> +
> +		rc &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
> +		phys_addr += MIN_MEMORY_BLOCK_SIZE;
> +	}
> +
> +	return rc;
> +}
> +
> +static int dlpar_add_lmb(struct of_drconf_cell *);
> +
> +static int dlpar_remove_lmb(struct of_drconf_cell *lmb)
> +{
> +	struct memory_block *mem_block;
> +	unsigned long block_sz;
> +	u64 phys_addr;
> +	uint32_t drc_index;
> +	int nid, rc;
> +
> +	if (!lmb_is_removable(lmb))
> +		return -EINVAL;
> +
> +	phys_addr = be64_to_cpu(lmb->base_addr);
> +	drc_index = be32_to_cpu(lmb->drc_index);
> +
> +	mem_block = lmb_to_memblock(lmb);
> +	if (!mem_block)
> +		return -EINVAL;
> +
> +	rc = device_offline(&mem_block->dev);
> +	put_device(&mem_block->dev);
> +	if (rc)
> +		return rc;
> +
> +	block_sz = pseries_memory_block_size();
> +	nid = memory_add_physaddr_to_nid(phys_addr);
> +
> +	remove_memory(nid, phys_addr, block_sz);
> +
> +	/* Update memory regions for memory remove */
> +	memblock_remove(phys_addr, block_sz);
> +
> +	dlpar_release_drc(drc_index);
> +
> +	lmb->flags &= cpu_to_be32(~DRCONF_MEM_ASSIGNED);
> +	pr_info("Memory at %llx (drc index %x) has been hot-removed\n",
> +		be64_to_cpu(lmb->base_addr), drc_index);
dlpar_add_lmb doesn't print anything but dlpar_remove_lmb does? Related
to my comment about printing a 'hot-add' messages prematurely, perhaps
move this to the callers
> +
> +	return 0;
> +}
> +
> +static int dlpar_memory_remove_by_count(struct pseries_hp_errorlog *hp_elog,
> +					struct property *prop)
> +{
> +	struct of_drconf_cell *lmbs;
> +	int lmbs_to_remove, lmbs_removed = 0;
> +	int lmbs_available = 0;
> +	uint32_t num_lmbs;
> +	__be32 *p;
> +	int i, rc;
> +
> +	lmbs_to_remove = be32_to_cpu(hp_elog->_drc_u.drc_count);
Didn't you already do the endian conversion back in
handle_dlpar_errorlog?
> +	pr_info("Attempting to hot-remove %d LMB(s)\n", lmbs_to_remove);
> +
> +	if (lmbs_to_remove == 0)
> +		return -EINVAL;
> +
> +	p = prop->value;
> +	num_lmbs = be32_to_cpu(*p++);
> +	lmbs = (struct of_drconf_cell *)p;
> +
> +	/* Validate that there are enough LMBs to satisfy the request */
> +	for (i = 0; i < num_lmbs; i++) {
> +		if (be32_to_cpu(lmbs[i].flags) & DRCONF_MEM_ASSIGNED)
> +			lmbs_available++;
> +	}
> +
> +	if (lmbs_available < lmbs_to_remove)
> +		return -EINVAL;
> +
> +	for (i = 0; i < num_lmbs; i++) {
> +		if (lmbs_to_remove == lmbs_removed)
> +			break;
> +
> +		rc = dlpar_remove_lmb(&lmbs[i]);
> +		if (rc)
> +			continue;
> +
> +		lmbs_removed++;
> +
> +		/* Mark this lmb so we can add it later if all of the
> +		 * requested LMBs cannot be removed.
> +		 */
> +		lmbs[i].reserved = 1;
> +	}
> +
> +	if (lmbs_removed != lmbs_to_remove) {
> +		pr_err("Memory hot-remove failed, adding LMB's back\n");
> +
> +		for (i = 0; i < num_lmbs; i++) {
> +			if (!lmbs[i].reserved)
> +				continue;
> +
> +			rc = dlpar_add_lmb(&lmbs[i]);
> +			if (rc)
If this happens you this will leave an LMB removed but an error code
from this function will cause dlpar_memory to discard the dn prop that
it passed in. If you couldn't roll back completely the caller should
still update the dn property with the ones that it has left
removed/failed to re-add. Perhaps this function needs a mechanism to
report success/failure (as it currently does) and a mechanism to say
whether or not the dn property is dirty or not.
> +				pr_err("Failed to add LMB back, drc index %x\n",
> +				       be32_to_cpu(lmbs[i].drc_index));
> +
On a clean (everything rolled back successfully) failure, I don't think
you _need_ to bother although this is nice cleanup, you didn't do it
dlpar_memory_add_by_count - I'm in favour of being clean especially
since on an 'unclean' failure you should clear that flag - if you're
going to still update the device tree.
> +			lmbs[i].reserved = 0;
> +		}
> +		rc = -EINVAL;
> +	} else {
> +		/* remove any reserved markings */
> +		for (i = 0; i < num_lmbs; i++)
> +			lmbs[i].reserved = 0;
> +	}
Hmmmm since the if statements are checking for failures. It might be
more clear to return -EINVAL (rather than setting rc) and the code in
the else block can be moved out of it and come to think of it, return 0.
Otherwise it looks odd to me with the 'normal' success case code being
in an else statement.
> +
> +	return rc;
> +}
> +
> +static int dlpar_memory_remove_by_index(struct pseries_hp_errorlog *hp_elog,
> +					struct property *prop)
> +{
> +	struct of_drconf_cell *lmbs;
> +	uint32_t num_lmbs, drc_index;
> +	int lmb_found;
> +	__be32 *p;
> +	int i, rc;
> +
> +	drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index);
Didn't you already do the endian conversion back in
handle_dlpar_errorlog?
> +	pr_info("Attempting to hot-remove LMB, drc index %x\n", drc_index);
> +
> +	p = prop->value;
> +	num_lmbs = be32_to_cpu(*p++);
> +	lmbs = (struct of_drconf_cell *)p;
> +
> +	lmb_found = 0;
> +	for (i = 0; i < num_lmbs; i++) {
> +		if (lmbs[i].drc_index == hp_elog->_drc_u.drc_index) {
> +			lmb_found = 1;
> +			rc = dlpar_remove_lmb(&lmbs[i]);
> +			break;
> +		}
> +	}
> +
> +	if (!lmb_found)
> +		rc = -EINVAL;
> +
> +	if (rc)
> +		pr_info("Failed to hot-remove memory, drc index %x\n",
> +			drc_index);
> +
> +	return rc;
> +}
> +
>  #else
>  static inline int pseries_remove_memblock(unsigned long base,
>  					  unsigned int memblock_size)
> @@ -183,6 +356,11 @@ static inline int pseries_remove_mem_node(struct device_node *np)
>  {
>  	return 0;
>  }
> +static inline int dlpar_memory_remove(struct pseries_hp_errorlog *hp_elog)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  static int dlpar_add_lmb(struct of_drconf_cell *lmb)
> @@ -298,14 +476,24 @@ static int dlpar_memory_add_by_count(struct pseries_hp_errorlog *hp_elog,
>  	}
>  
>  	if (lmbs_added != lmbs_to_add) {
> -		/* TODO: remove added lmbs */
> +		pr_err("Memory hot-add failed, removing any added LMBs\n");
> +
> +		for (i = 0; i < num_lmbs; i++) {
> +			if (!lmbs[i].reserved)
> +				continue;
> +
> +			rc = dlpar_remove_lmb(&lmbs[i]);
> +			if (rc)
There is a very much related comment in dlpar_memory_remove_by_count
about this condition being true.
> +				pr_err("Failed to remove LMB, drc index %x\n",
> +				       be32_to_cpu(lmbs[i].drc_index));

You don't clear the reserved flag here?

> +		}
>  		rc = -EINVAL;
Same observation as in dlpar_memory_remove_by_count
> +	} else {
> +		/* Clear the reserved fields */
> +		for (i = 0; i < num_lmbs; i++)
> +			lmbs[i].reserved = 0;
>  	}
>  
> -	/* Clear the reserved fields */
> -	for (i = 0; i < num_lmbs; i++)
> -		lmbs[i].reserved = 0;
> -
>  	return rc;
>  }
>  
> @@ -373,6 +561,14 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>  		else
>  			rc = -EINVAL;
>  		break;
> +	case PSERIES_HP_ELOG_ACTION_REMOVE:
> +		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
> +			rc = dlpar_memory_remove_by_count(hp_elog, prop);
> +		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
> +			rc = dlpar_memory_remove_by_index(hp_elog, prop);
> +		else
> +			rc = -EINVAL;
> +		break;
>  	default:
>  		pr_err("Invalid action (%d) specified\n", hp_elog->action);
>  		rc = -EINVAL;
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/6] pseries: Create new device hotplug entry point
  2014-11-21  7:49   ` Cyril Bur
@ 2014-11-24 14:45     ` Nathan Fontenot
  0 siblings, 0 replies; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-24 14:45 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev

On 11/21/2014 01:49 AM, Cyril Bur wrote:
> 
> On Mon, 2014-11-17 at 15:51 -0600, Nathan Fontenot wrote:
>> Create a new entry point for device hotplug on pseries that will
>> work for both PowerVM and PowerKVM systems.
>>
>> The current process to hotplug (or dlpar) devices (generally the same
>> process for memory, cpu, and pci devices) on PowerVM systems is initiated
>> from the HMC, which communicates the request to the partitions through
>> the RSCT framework. The RSCT framework then invokes the drmgr command.
>> The drmgr command performs the hotplug operation by doing some pieces,
>> such as most of the rtas calls and device tree parsing, in userspace
>> and make requests to the kernel to online/offline the device, update the
>> device tree and add/remove the device.
>>
>> For PowerKVM the approach for device hotplug is to follow what is currently
>> being done for pci hotplug. A hotplug request is initiated from the host,
>> QEMU then generates an EPOW interrupt to the guest which causes the guest
>> to make the rtas,check-exception call. In QEMU, the rtas,check-exception call
>> returns a rtas hotplug event to the guest.
> 
> Please forgive my ignorance of the exact details of how this all works.
> I've been trying to wrap my head around how it works in a little more
> detail than your high level overview. Correct me where I go wrong.
>
> So the EPOW interrupt comes in and QEMU receives the
> rtas,check-exception call and returns the rtas hotplug event. As you
> state below, the connection of the arrival of the rtas hotplug event to
> the hotplug code will be made in a subsequent patch.
> 

Correct. This should be a fairly straightforward patch once we're ready to
enable this.

> Here is my understanding of what happens when a hotplug event gets
> processed.
> An LMB is selected
> from /ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory in the
> device tree which is populated at boot time (although it's possible that
> it can updated once the guest is running?) either because it matches a
> specific drc-index or simply because it is in the list and we are to
> hotplug 'count' LMBs. So there's a maximum amount of memory that can be
> hotplugged (without device tree updates...)?

I don't think you could update the ibm,dynamic-memory property to add/remove
LMB's to it after booting. The property should contain all of the possible
LMB's that the system could use.

> 
> Once selected the kernel informs the hypervisor that it is going to use
> that LMB with two RTAS calls, 'get-sensor-state' and 'set-indicator'
> which first checks the actual state of that LMB with the hypervisor and
> then marks it as 'in use' (in dlpar_acquire_drc).
> 
> After that, find the (NUMA?) node id of the memory and inform the
> generic kernel about this new memory.
> 
> For some reason memblock needs to be informed separately (does it need
> to be informed exactly?), which as you pointed out in the previous
> version of this patchset you're not sure why it isn't done in
> add_memory, and you still do it because it's what currently happens? I'd
> very much like to know why this sequence of events but I suspect the
> explanation might get quite involved.
>

Why it is like this, I don't know.

It's possible the call to memblock_add|remove could be moved to the
arch_add|remove_memory routines in powerpc/mm/mem.c but this would
take some investigation and is probably beyond the scope of
these patches.
 
> Finally mark the LMB as assigned in its device tree node. This is
> bookkeeping right?

Correct.

> 
> This process is repeated for each LMB that should be added.
> 
> In the event a failure a best effort rollback is done, as you mentioned
> in the previous version, it may not always be possible but at least it's
> attempted.
> 
> Once all this succeeds update the device tree.
> 
> Basically the exact reverse of this process happens for unplug.
> 
> I do have questions about this process: What is the most likely part to
> fail? I have no idea how feasible it would be but perhaps trying to do
> the likely failure on all the LBMs might help unwinding and perhaps
> provide a guarantee that it can be completely rolled back. I'm really
> not sure but by the looks of things are going to be pretty reversible up
> until device_online.
>

The most likely failure is when trying to roll back after a failure when
adding memory. The failure I see most often when trying to remove memory
is failing to offline the memory of an LMB. This usually occurs due to
some portion of the pages not being able to be migrated elsewhere (such
as being pinned). Once the lmb is made available we have the potential
hit this scenario where a previously added LMB cannot be removed.
 
 
> I can't help but notice the duplication with memory_probe_store
> (although that doesn't do any rollback). Probably unavoidable and its
> really not much code.
>

There is some duplication. The memory_probe_store routine is used for
the current method of memory hotplug add. Once the framework is in place
for using rtas hotplug events I hope to deprecate the old method and
remove this code.
 
> Thanks in advance for the clarifications,
> 
> Cyril
>>
>> Please note that the current pci hotplug path for PowerKVM involves the
>> kernel receiving the rtas hotplug event, passing it to rtas_errd in
>> userspace, and having rtas_errd invoke drmgr. The drmgr command then
>> handles the request as described above for PowerVM systems. This is to
>> be updated to perform pci completely in the kernel in a later patch set.
>>
>> There is no need for this circuitous route, we should handle the entire
>> hotplug of devices in the kernel. What I am planning is to enable this
>> by moving the code to handle device hotplug from drmgr into the kernel to
>> provide a single path for both PowerVM and PowerKVM systems. This patch
>> provides the common entry point. For PowerKVM a future update to the kernel
>> rtas code will recognize rtas hotplug events returned from
>> rtas,check-exception calls and use the common entry point to handle device
>> hotplug entirely in the kernel.
>>
>> For PowerVM systems, this patch creates the /sys/kernel/dlpar file that rtas
>> hotplug events can be written to by drmgr and passed to the common entry point.
>> There is no chance of updating how we receive hotplug requests on PowerVM
>> systems.
>>
>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/dlpar.c          |   72 ++++++++++++++++++++++-
>>  arch/powerpc/platforms/pseries/hotplug-memory.c |   19 ++++++
>>  arch/powerpc/platforms/pseries/pseries.h        |   10 +++
>>  3 files changed, 99 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
>> index c22bb1b..ec825d3 100644
>> --- a/arch/powerpc/platforms/pseries/dlpar.c
>> +++ b/arch/powerpc/platforms/pseries/dlpar.c
>> @@ -10,6 +10,8 @@
>>   * 2 as published by the Free Software Foundation.
>>   */
>>  
>> +#define pr_fmt(fmt)	"dlpar: " fmt
>> +
>>  #include <linux/kernel.h>
>>  #include <linux/notifier.h>
>>  #include <linux/spinlock.h>
>> @@ -535,13 +537,79 @@ static ssize_t dlpar_cpu_release(const char *buf, size_t count)
>>  	return count;
>>  }
>>  
>> +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>> +
>> +static int handle_dlpar_errorlog(struct rtas_error_log *error_log)
>> +{
>> +	struct pseries_errorlog *pseries_log;
>> +	struct pseries_hp_errorlog *hp_elog;
>> +	int rc;
>> +
>> +	pseries_log = get_pseries_errorlog(error_log,
>> +					   PSERIES_ELOG_SECT_ID_HOTPLUG);
> 
>> +	if (!pseries_log || (pseries_log->length == 0))
>> +		return -EINVAL;
>> +
>> +	hp_elog = (struct pseries_hp_errorlog *)pseries_log->data;
> Maybe rather than doing this here it might be worth having a function
> that returns you a pseries_hp_errorlog from a pseries_errorlog but also
> with everything in CPU endian which will make 5/6 and 6/6 look nicer.

I've thought about this, but I need to have parts of the errorlog, namely
the drc_index when used, in BE format when comparing to the device tree
property values and in cpu format when used directly.

I'll look over the patches again and see if I can do some pre-conversion.

>> +
>> +	/* Go ahead and convert the hotplug type to the correct endianness
>> +	 * to avoid converting it everywhere we use it.
>> +	 */
>> +	switch (hp_elog->id_type) {
>> +	case PSERIES_HP_ELOG_ID_DRC_COUNT:
>> +		hp_elog->_drc_u.drc_count =
>> +					be32_to_cpu(hp_elog->_drc_u.drc_count);
>> +	case PSERIES_HP_ELOG_ID_DRC_INDEX:
>> +		hp_elog->_drc_u.drc_index =
>> +					be32_to_cpu(hp_elog->_drc_u.drc_index);
>> +	}
> You're converting __be32 to CPU endian but storing them back into a
> __be32? This will upset sparse.

Hmm... good point. This also makes doing the pre-conversion mentioned above a
bit harder.

> Should this not also have a default case?

Yep, to be fixed.

-Nathan

> 
>> +
>> +	switch (hp_elog->resource) {
>> +	case PSERIES_HP_ELOG_RESOURCE_MEM:
>> +		rc = dlpar_memory(hp_elog);
>> +		break;
>> +	default:
>> +		pr_warn_ratelimited("Invalid resource (%d) specified\n",
>> +				    hp_elog->resource);
>> +		rc = -EINVAL;
>> +		break;
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +static ssize_t dlpar_store(struct file *filp, struct kobject *kobj,
>> +			   struct bin_attribute *bin_attr, char *buf,
>> +			   loff_t pos, size_t count)
>> +{
>> +	struct rtas_error_log *error_log;
>> +	int rc;
>> +
>> +	error_log = kmalloc(count, GFP_KERNEL);
>> +	if (!error_log)
>> +		return -ENOMEM;
>> +
>> +	memcpy(error_log, buf, count);
>> +
>> +	rc = handle_dlpar_errorlog(error_log);
>> +	kfree(error_log);
>> +	return rc ? rc : count;
>> +}
>> +
>> +static BIN_ATTR(dlpar, S_IWUSR, NULL, dlpar_store, 0);
>> +
>>  static int __init pseries_dlpar_init(void)
>>  {
>> +	int rc;
>> +
>> +#ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
>>  	ppc_md.cpu_probe = dlpar_cpu_probe;
>>  	ppc_md.cpu_release = dlpar_cpu_release;
>> +#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>>  
>> -	return 0;
>> +	rc = sysfs_create_bin_file(kernel_kobj, &bin_attr_dlpar);
>> +
>> +	return rc;
>>  }
>>  machine_device_initcall(pseries, pseries_dlpar_init);
>>  
>> -#endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index 3cb256c..69d178b 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -9,6 +9,8 @@
>>   *      2 of the License, or (at your option) any later version.
>>   */
>>  
>> +#define pr_fmt(fmt)	"pseries-hotplug-mem: " fmt
>> +
>>  #include <linux/of.h>
>>  #include <linux/of_address.h>
>>  #include <linux/memblock.h>
>> @@ -134,6 +136,23 @@ static inline int pseries_remove_mem_node(struct device_node *np)
>>  }
>>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>>  
>> +int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>> +{
>> +	int rc = 0;
>> +
>> +	lock_device_hotplug();
>> +
>> +	switch (hp_elog->action) {
>> +	default:
>> +		pr_err("Invalid action (%d) specified\n", hp_elog->action);
>> +		rc = -EINVAL;
>> +		break;
>> +	}
>> +
>> +	unlock_device_hotplug();
>> +	return rc;
>> +}
>> +
>>  static int pseries_add_mem_node(struct device_node *np)
>>  {
>>  	const char *type;
>> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
>> index 239bee5..40e0339 100644
>> --- a/arch/powerpc/platforms/pseries/pseries.h
>> +++ b/arch/powerpc/platforms/pseries/pseries.h
>> @@ -11,6 +11,7 @@
>>  #define _PSERIES_PSERIES_H
>>  
>>  #include <linux/interrupt.h>
>> +#include <asm/rtas.h>
>>  
>>  struct device_node;
>>  
>> @@ -63,6 +64,15 @@ extern int dlpar_detach_node(struct device_node *);
>>  int dlpar_acquire_drc(u32 drc_index);
>>  int dlpar_release_drc(u32 drc_index);
>>  
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +int dlpar_memory(struct pseries_hp_errorlog *hp_elog);
>> +#else
>> +static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>> +#endif
>> +
>>  /* PCI root bridge prepare function override for pseries */
>>  struct pci_host_bridge;
>>  int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/6] pseries: Implement memory hotplug add in the kernel
  2014-11-21  7:49   ` Cyril Bur
@ 2014-11-24 14:56     ` Nathan Fontenot
  0 siblings, 0 replies; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-24 14:56 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev

On 11/21/2014 01:49 AM, Cyril Bur wrote:
> 
> On Mon, 2014-11-17 at 15:54 -0600, Nathan Fontenot wrote:
>> Move handling of memory hotplug add on pseries completely into the kernel.
>>
>> The current memory hotplug add path involves the drmgr command doing part
>> of this work in userspace and requesting the kernel to do additional pieces.
>> This patch allows us to handle the act completely in the kernel via rtas
>> hotplug events. This allows us to perform the operation faster and provide
>> a common memory hotplug add path for PowerVM and PowerKVM systems.
>>
>> The patch does introduce a static rtas_hp_event variable that is set to
>> true when updating the device tree during memory hotplug initiated from
>> a rtas hotplug event. This is needed because we do not need to do the
>> work in the of notifier, this work is already performed in handling the
>> hotplug request. At a later time we can remove this when we deprecate the
>> previous method of memory hotplug.
>>
>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-memory.c |  244 +++++++++++++++++++++++
>>  1 file changed, 243 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index 69d178b..b57d42b 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -16,6 +16,7 @@
>>  #include <linux/memblock.h>
>>  #include <linux/memory.h>
>>  #include <linux/memory_hotplug.h>
>> +#include <linux/slab.h>
>>  
>>  #include <asm/firmware.h>
>>  #include <asm/machdep.h>
>> @@ -23,6 +24,8 @@
>>  #include <asm/sparsemem.h>
>>  #include "pseries.h"
>>  
>> +static bool rtas_hp_event;
>> +
>>  unsigned long pseries_memory_block_size(void)
>>  {
>>  	struct device_node *np;
>> @@ -66,6 +69,52 @@ unsigned long pseries_memory_block_size(void)
>>  	return memblock_size;
>>  }
>>  
>> +static void dlpar_free_drconf_property(struct property *prop)
>> +{
>> +	kfree(prop->name);
>> +	kfree(prop->value);
>> +	kfree(prop);
>> +}
>> +
>> +static struct property *dlpar_clone_drconf_property(struct device_node *dn)
>> +{
>> +	struct property *prop, *new_prop;
>> +
>> +	prop = of_find_property(dn, "ibm,dynamic-memory", NULL);
>> +	if (!prop)
>> +		return NULL;
>> +
>> +	new_prop = kzalloc(sizeof(*new_prop), GFP_KERNEL);
>> +	if (!new_prop)
>> +		return NULL;
>> +
>> +	new_prop->name = kstrdup(prop->name, GFP_KERNEL);
>> +	new_prop->value = kmalloc(prop->length, GFP_KERNEL);
>> +	if (!new_prop->name || !new_prop->value) {
>> +		dlpar_free_drconf_property(new_prop);
>> +		return NULL;
>> +	}
>> +
>> +	memcpy(new_prop->value, prop->value, prop->length);
>> +	new_prop->length = prop->length;
>> +
>> +	return new_prop;
>> +}
>> +
>> +static struct memory_block *lmb_to_memblock(struct of_drconf_cell *lmb)
>> +{
>> +	unsigned long section_nr;
>> +	struct mem_section *mem_sect;
>> +	struct memory_block *mem_block;
>> +	u64 phys_addr = be64_to_cpu(lmb->base_addr);
>> +
>> +	section_nr = pfn_to_section_nr(PFN_DOWN(phys_addr));
>> +	mem_sect = __nr_to_section(section_nr);
>> +
>> +	mem_block = find_memory_block(mem_sect);
>> +	return mem_block;
>> +}
>> +
>>  #ifdef CONFIG_MEMORY_HOTREMOVE
>>  static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
>>  {
>> @@ -136,19 +185,209 @@ static inline int pseries_remove_mem_node(struct device_node *np)
>>  }
>>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>>  
>> +static int dlpar_add_lmb(struct of_drconf_cell *lmb)
>> +{
>> +	struct memory_block *mem_block;
>> +	u64 phys_addr;
>> +	uint32_t drc_index;
> I started commenting this and it turns out you've used uint32_t almost
> everywhere for values you've pulled from the device tree or the elog.
> These should be u32.

ok, that should be an easy fix.

>> +	unsigned long pages_per_block;
>> +	unsigned long block_sz;
>> +	int nid, sections_per_block;
>> +	int rc;
>> +
>> +	if (be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED)
>> +		return -EINVAL;
>> +
>> +	phys_addr = be64_to_cpu(lmb->base_addr);
>> +	drc_index = be32_to_cpu(lmb->drc_index);
>> +	block_sz = memory_block_size_bytes();
>> +	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>> +	pages_per_block = PAGES_PER_SECTION * sections_per_block;
>> +
> Perhaps I'm being a bit slow here but it isn't exactly clear what you're
> getting with all those variables and could you explain what that
> statement below is checking for?
> 
>> +	if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
>> +		return -EINVAL;

We could probably get rid of this check, The values were are validating
come from the device tree and if it's wrong we have more serious problems
than a failure in mem hotplug.

This would also remove the need for the pages_per_block and sections_per_block
variables.

>> +
>> +	rc = dlpar_acquire_drc(drc_index);
>> +	if (rc)
>> +		return rc;
>> +
>> +	/* Find the node id for this address */
>> +	nid = memory_add_physaddr_to_nid(phys_addr);
>> +
>> +	/* Add the memory */
>> +	rc = add_memory(nid, phys_addr, block_sz);
>> +	if (rc) {
>> +		dlpar_release_drc(drc_index);
>> +		return rc;
>> +	}
>> +
>> +	/* Register this block of memory */
>> +	rc = memblock_add(phys_addr, block_sz);
>> +	if (rc) {
>> +		remove_memory(nid, phys_addr, block_sz);
>> +		dlpar_release_drc(drc_index);
>> +		return rc;
>> +	}
>> +
>> +	mem_block = lmb_to_memblock(lmb);
>> +	if (!mem_block) {
>> +		remove_memory(nid, phys_addr, block_sz);
>> +		dlpar_release_drc(drc_index);
>> +		return -EINVAL;
>> +	}
>> +
>> +	rc = device_online(&mem_block->dev);
>> +	put_device(&mem_block->dev);
>> +	if (rc) {
>> +		remove_memory(nid, phys_addr, block_sz);
>> +		dlpar_release_drc(drc_index);
>> +		return rc;
>> +	}
>> +
>> +	lmb->flags |= cpu_to_be32(DRCONF_MEM_ASSIGNED);
>> +	return 0;
>> +}
>> +
>> +static int dlpar_memory_add_by_count(struct pseries_hp_errorlog *hp_elog,
>> +				     struct property *prop)
>> +{
>> +	struct of_drconf_cell *lmbs;
>> +	uint32_t num_lmbs;
>> +	__be32 *p;
>> +	int i, lmbs_to_add;
>> +	int lmbs_available = 0;
>> +	int lmbs_added = 0;
>> +	int rc;
>> +
>> +	lmbs_to_add = be32_to_cpu(hp_elog->_drc_u.drc_count);

> Didn't you already do the endian conversion back in
> handle_dlpar_errorlog?

Yep, already fixed in the next patchset.

>> +	pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
>> +
>> +	if (lmbs_to_add == 0)
>> +		return -EINVAL;
>> +
>> +	p = prop->value;
>> +	num_lmbs = be32_to_cpu(*p++);
>> +	lmbs = (struct of_drconf_cell *)p;
>> +
>> +	/* Validate that there are enough LMBs to satisfy the request */
>> +	for (i = 0; i < num_lmbs; i++) {
>> +		if (!(be32_to_cpu(lmbs[i].flags) & DRCONF_MEM_ASSIGNED))
>> +			lmbs_available++;
>> +	}
>> +
>> +	if (lmbs_available < lmbs_to_add)
>> +		return -EINVAL;
>> +
> I'm wondering why the next 3 lines when the if could be incorporated
> into the for condition
> for (i = 0; i < num_lmbs && lmbs_to_add != lmbs_added; i++) {
> (or lmbs_to_add < lmbs_added)...

It could. I tend to put less in the for() statement. Just my preference on
readability.

>> +	for (i = 0; i < num_lmbs; i++) {
>> +		if (lmbs_to_add == lmbs_added)
>> +			break;
>> +
>> +		rc = dlpar_add_lmb(&lmbs[i]);
>> +		if (rc)
>> +			continue;
>> +
>> +		lmbs_added++;
>> +		pr_info("Memory at %llx (drc index %x) has been hot-added\n",
>> +			be64_to_cpu(lmbs[i].base_addr),
>> +			be32_to_cpu(lmbs[i].drc_index));

> This message feels a tad premature, you're going to roll back if the
> requested amount couldn't be added which is good but then there's going
> to be a confusing mix of 'hot-added' followed by 'hot-removed'. Could
> this message be moved to when you clear the reserved fields?

Ahh... good point. I will do that.

>> +
>> +		/* Mark this lmb so we can remove it later if all of the
>> +		 * requested LMBs cannot be added.
>> +		 */
>> +		lmbs[i].reserved = 1;
> Writing 1 like that into a __be variable will upset sparse. 
> 
> As a more general comment that it might be worth not passing around __be
> structures and convert them all into CPU endian in one place so that all
> these functions can do away with the endian conversion calls.

This may be possible. I will have to look at how the struct is passed around
what is used in it. It would be nice to have fewer conversion calls.

>> +	}
>> +
>> +	if (lmbs_added != lmbs_to_add) {
>> +		/* TODO: remove added lmbs */
>> +		rc = -EINVAL;
>> +	}
>> +
>> +	/* Clear the reserved fields */
>> +	for (i = 0; i < num_lmbs; i++)
>> +		lmbs[i].reserved = 0;
>> +
>> +	return rc;
>> +}
>> +
>> +static int dlpar_memory_add_by_index(struct pseries_hp_errorlog *hp_elog,
>> +				     struct property *prop)
>> +{
>> +	struct of_drconf_cell *lmbs;
>> +	uint32_t num_lmbs, drc_index;
>> +	__be32 *p;
>> +	int i, lmb_found;
>> +	int rc;
>> +
>> +	drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index);
> Didn't you already do the endian conversion back in
> handle_dlpar_errorlog?
>> +	pr_info("Attempting to hot-add LMB, drc index %x\n", drc_index);
>> +
>> +	p = prop->value;
>> +	num_lmbs = be32_to_cpu(*p++);
>> +	lmbs = (struct of_drconf_cell *)p;
>> +
>> +	lmb_found = 0;
>> +	for (i = 0; i < num_lmbs; i++) {
>> +		if (lmbs[i].drc_index == hp_elog->_drc_u.drc_index) {
> Probably wanted to use your local variable drc_index here also
> lmbs[i].drc_index is __be and I don't think you've converted it but the
> other side of the condition has been...

Yep.

-Nathan

>> +			lmb_found = 1;
>> +			rc = dlpar_add_lmb(&lmbs[i]);
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!lmb_found)
>> +		rc = -EINVAL;
>> +
>> +	if (rc)
>> +		pr_info("Failed to hot-add memory, drc index %x\n", drc_index);
>> +	else
>> +		pr_info("Memory at %llx (drc index %x) has been hot-added\n",
>> +			be64_to_cpu(lmbs[i].base_addr), drc_index);
>> +
>> +	return rc;
>> +}
>> +
>>  int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>>  {
>> -	int rc = 0;
>> +	struct device_node *dn;
>> +	struct property *prop;
>> +	int rc;
>>  
>>  	lock_device_hotplug();
>>  
>> +	dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
>> +	if (!dn)
>> +		return -EINVAL;
>> +
>> +	prop = dlpar_clone_drconf_property(dn);
>> +	if (!prop) {
>> +		of_node_put(dn);
>> +		return -EINVAL;
>> +	}
>> +
>>  	switch (hp_elog->action) {
>> +	case PSERIES_HP_ELOG_ACTION_ADD:
>> +		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
>> +			rc = dlpar_memory_add_by_count(hp_elog, prop);
>> +		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
>> +			rc = dlpar_memory_add_by_index(hp_elog, prop);
>> +		else
>> +			rc = -EINVAL;
>> +		break;
>>  	default:
>>  		pr_err("Invalid action (%d) specified\n", hp_elog->action);
>>  		rc = -EINVAL;
>>  		break;
>>  	}
>>  
>> +	if (rc)
>> +		dlpar_free_drconf_property(prop);
>> +	else {
>> +		rtas_hp_event = true;
>> +		of_update_property(dn, prop);
>> +		rtas_hp_event = false;
>> +	}
>> +
>> +	of_node_put(dn);
>>  	unlock_device_hotplug();
>>  	return rc;
>>  }
>> @@ -193,6 +432,9 @@ static int pseries_update_drconf_memory(struct of_prop_reconfig *pr)
>>  	__be32 *p;
>>  	int i, rc = -EINVAL;
>>  
>> +	if (rtas_hp_event)
>> +		return 0;
>> +
>>  	memblock_size = pseries_memory_block_size();
>>  	if (!memblock_size)
>>  		return -EINVAL;
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 6/6] pseries: Implement memory hotplug remove in the kernel
  2014-11-21  7:49   ` Cyril Bur
@ 2014-11-24 15:03     ` Nathan Fontenot
  2014-11-24 22:05       ` Cyril Bur
  0 siblings, 1 reply; 20+ messages in thread
From: Nathan Fontenot @ 2014-11-24 15:03 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev



On 11/21/2014 01:49 AM, Cyril Bur wrote:
> 
> On Mon, 2014-11-17 at 15:56 -0600, Nathan Fontenot wrote:
>> Move handling of memory hotplug remove on pseries completely into the kernel.
>>
>> The current memory hotplug remove path involves the drmgr command doing part
>> of this work in userspace and requesting the kernel to do additional pieces.
>> This patch allows us to handle the act completely in the kernel via rtas
>> hotplug events. This allows us to perform the operation faster and provide
>> a common memory hotplug remove path for PowerVM and PowerKVM systems.
>>
>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-memory.c |  206 ++++++++++++++++++++++-
>>  1 file changed, 201 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> index b57d42b..c8189e8 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>> @@ -173,6 +173,179 @@ static int pseries_remove_mem_node(struct device_node *np)
>>  	pseries_remove_memblock(base, lmb_size);
>>  	return 0;
>>  }
>> +
>> +static int lmb_is_removable(struct of_drconf_cell *lmb)
>> +{
>> +	int i, scns_per_block;
>> +	int rc = 1;
>> +	unsigned long pfn, block_sz;
>> +	u64 phys_addr;
>> +
>> +	if (!(be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED))
>> +		return -1;
> This makes me kind of nervous. You're using the return value of
> lmb_is_removable as a boolean but it returns three possible values -1,0
> and 1. Functionally it looks correct to me so its not a massive issue

Oh yuck. this should be a boolean return. I'll fix that.

>> +
>> +	phys_addr = be64_to_cpu(lmb->base_addr);
>> +	block_sz = memory_block_size_bytes();
>> +	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>> +
>> +	for (i = 0; i < scns_per_block; i++) {
>> +		pfn = PFN_DOWN(phys_addr);
>> +		if (!pfn_present(pfn))
>> +			continue;
>> +
>> +		rc &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
>> +		phys_addr += MIN_MEMORY_BLOCK_SIZE;
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +static int dlpar_add_lmb(struct of_drconf_cell *);
>> +
>> +static int dlpar_remove_lmb(struct of_drconf_cell *lmb)
>> +{
>> +	struct memory_block *mem_block;
>> +	unsigned long block_sz;
>> +	u64 phys_addr;
>> +	uint32_t drc_index;
>> +	int nid, rc;
>> +
>> +	if (!lmb_is_removable(lmb))
>> +		return -EINVAL;
>> +
>> +	phys_addr = be64_to_cpu(lmb->base_addr);
>> +	drc_index = be32_to_cpu(lmb->drc_index);
>> +
>> +	mem_block = lmb_to_memblock(lmb);
>> +	if (!mem_block)
>> +		return -EINVAL;
>> +
>> +	rc = device_offline(&mem_block->dev);
>> +	put_device(&mem_block->dev);
>> +	if (rc)
>> +		return rc;
>> +
>> +	block_sz = pseries_memory_block_size();
>> +	nid = memory_add_physaddr_to_nid(phys_addr);
>> +
>> +	remove_memory(nid, phys_addr, block_sz);
>> +
>> +	/* Update memory regions for memory remove */
>> +	memblock_remove(phys_addr, block_sz);
>> +
>> +	dlpar_release_drc(drc_index);
>> +
>> +	lmb->flags &= cpu_to_be32(~DRCONF_MEM_ASSIGNED);
>> +	pr_info("Memory at %llx (drc index %x) has been hot-removed\n",
>> +		be64_to_cpu(lmb->base_addr), drc_index);
> dlpar_add_lmb doesn't print anything but dlpar_remove_lmb does? Related
> to my comment about printing a 'hot-add' messages prematurely, perhaps
> move this to the callers

Yes, I'll update the remove mem messages also.

>> +
>> +	return 0;
>> +}
>> +
>> +static int dlpar_memory_remove_by_count(struct pseries_hp_errorlog *hp_elog,
>> +					struct property *prop)
>> +{
>> +	struct of_drconf_cell *lmbs;
>> +	int lmbs_to_remove, lmbs_removed = 0;
>> +	int lmbs_available = 0;
>> +	uint32_t num_lmbs;
>> +	__be32 *p;
>> +	int i, rc;
>> +
>> +	lmbs_to_remove = be32_to_cpu(hp_elog->_drc_u.drc_count);
> Didn't you already do the endian conversion back in
> handle_dlpar_errorlog?
>> +	pr_info("Attempting to hot-remove %d LMB(s)\n", lmbs_to_remove);
>> +
>> +	if (lmbs_to_remove == 0)
>> +		return -EINVAL;
>> +
>> +	p = prop->value;
>> +	num_lmbs = be32_to_cpu(*p++);
>> +	lmbs = (struct of_drconf_cell *)p;
>> +
>> +	/* Validate that there are enough LMBs to satisfy the request */
>> +	for (i = 0; i < num_lmbs; i++) {
>> +		if (be32_to_cpu(lmbs[i].flags) & DRCONF_MEM_ASSIGNED)
>> +			lmbs_available++;
>> +	}
>> +
>> +	if (lmbs_available < lmbs_to_remove)
>> +		return -EINVAL;
>> +
>> +	for (i = 0; i < num_lmbs; i++) {
>> +		if (lmbs_to_remove == lmbs_removed)
>> +			break;
>> +
>> +		rc = dlpar_remove_lmb(&lmbs[i]);
>> +		if (rc)
>> +			continue;
>> +
>> +		lmbs_removed++;
>> +
>> +		/* Mark this lmb so we can add it later if all of the
>> +		 * requested LMBs cannot be removed.
>> +		 */
>> +		lmbs[i].reserved = 1;
>> +	}
>> +
>> +	if (lmbs_removed != lmbs_to_remove) {
>> +		pr_err("Memory hot-remove failed, adding LMB's back\n");
>> +
>> +		for (i = 0; i < num_lmbs; i++) {
>> +			if (!lmbs[i].reserved)
>> +				continue;
>> +
>> +			rc = dlpar_add_lmb(&lmbs[i]);
>> +			if (rc)
> If this happens you this will leave an LMB removed but an error code
> from this function will cause dlpar_memory to discard the dn prop that
> it passed in. If you couldn't roll back completely the caller should
> still update the dn property with the ones that it has left
> removed/failed to re-add. Perhaps this function needs a mechanism to
> report success/failure (as it currently does) and a mechanism to say
> whether or not the dn property is dirty or not.

Good catch. The device tree should be updated to denote lmb's that were
added but we failed to remove because of some error.

>> +				pr_err("Failed to add LMB back, drc index %x\n",
>> +				       be32_to_cpu(lmbs[i].drc_index));
>> +
> On a clean (everything rolled back successfully) failure, I don't think
> you _need_ to bother although this is nice cleanup, you didn't do it
> dlpar_memory_add_by_count - I'm in favour of being clean especially
> since on an 'unclean' failure you should clear that flag - if you're
> going to still update the device tree.

I'll re-visit the cleanup code to make sure the device tree property is left
in the correct state on a failure during roll-back.

>> +			lmbs[i].reserved = 0;
>> +		}
>> +		rc = -EINVAL;
>> +	} else {
>> +		/* remove any reserved markings */
>> +		for (i = 0; i < num_lmbs; i++)
>> +			lmbs[i].reserved = 0;
>> +	}
> Hmmmm since the if statements are checking for failures. It might be
> more clear to return -EINVAL (rather than setting rc) and the code in
> the else block can be moved out of it and come to think of it, return 0.
> Otherwise it looks odd to me with the 'normal' success case code being
> in an else statement.

I like this.

>> +
>> +	return rc;
>> +}
>> +
>> +static int dlpar_memory_remove_by_index(struct pseries_hp_errorlog *hp_elog,
>> +					struct property *prop)
>> +{
>> +	struct of_drconf_cell *lmbs;
>> +	uint32_t num_lmbs, drc_index;
>> +	int lmb_found;
>> +	__be32 *p;
>> +	int i, rc;
>> +
>> +	drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index);
> Didn't you already do the endian conversion back in
> handle_dlpar_errorlog?
>> +	pr_info("Attempting to hot-remove LMB, drc index %x\n", drc_index);
>> +
>> +	p = prop->value;
>> +	num_lmbs = be32_to_cpu(*p++);
>> +	lmbs = (struct of_drconf_cell *)p;
>> +
>> +	lmb_found = 0;
>> +	for (i = 0; i < num_lmbs; i++) {
>> +		if (lmbs[i].drc_index == hp_elog->_drc_u.drc_index) {
>> +			lmb_found = 1;
>> +			rc = dlpar_remove_lmb(&lmbs[i]);
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!lmb_found)
>> +		rc = -EINVAL;
>> +
>> +	if (rc)
>> +		pr_info("Failed to hot-remove memory, drc index %x\n",
>> +			drc_index);
>> +
>> +	return rc;
>> +}
>> +
>>  #else
>>  static inline int pseries_remove_memblock(unsigned long base,
>>  					  unsigned int memblock_size)
>> @@ -183,6 +356,11 @@ static inline int pseries_remove_mem_node(struct device_node *np)
>>  {
>>  	return 0;
>>  }
>> +static inline int dlpar_memory_remove(struct pseries_hp_errorlog *hp_elog)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>> +
>>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>>  
>>  static int dlpar_add_lmb(struct of_drconf_cell *lmb)
>> @@ -298,14 +476,24 @@ static int dlpar_memory_add_by_count(struct pseries_hp_errorlog *hp_elog,
>>  	}
>>  
>>  	if (lmbs_added != lmbs_to_add) {
>> -		/* TODO: remove added lmbs */
>> +		pr_err("Memory hot-add failed, removing any added LMBs\n");
>> +
>> +		for (i = 0; i < num_lmbs; i++) {
>> +			if (!lmbs[i].reserved)
>> +				continue;
>> +
>> +			rc = dlpar_remove_lmb(&lmbs[i]);
>> +			if (rc)
> There is a very much related comment in dlpar_memory_remove_by_count
> about this condition being true.
>> +				pr_err("Failed to remove LMB, drc index %x\n",
>> +				       be32_to_cpu(lmbs[i].drc_index));
> 
> You don't clear the reserved flag here?
In the failure case here the device tree property is not updated, so no
cleanup needed. This is not really obvious so it would probably be best
to do the cleanup.
 
> 
>> +		}
>>  		rc = -EINVAL;
> Same observation as in dlpar_memory_remove_by_count

ok.

Thanks for the review.

-Nathan

>> +	} else {
>> +		/* Clear the reserved fields */
>> +		for (i = 0; i < num_lmbs; i++)
>> +			lmbs[i].reserved = 0;
>>  	}
>>  
>> -	/* Clear the reserved fields */
>> -	for (i = 0; i < num_lmbs; i++)
>> -		lmbs[i].reserved = 0;
>> -
>>  	return rc;
>>  }
>>  
>> @@ -373,6 +561,14 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
>>  		else
>>  			rc = -EINVAL;
>>  		break;
>> +	case PSERIES_HP_ELOG_ACTION_REMOVE:
>> +		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
>> +			rc = dlpar_memory_remove_by_count(hp_elog, prop);
>> +		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
>> +			rc = dlpar_memory_remove_by_index(hp_elog, prop);
>> +		else
>> +			rc = -EINVAL;
>> +		break;
>>  	default:
>>  		pr_err("Invalid action (%d) specified\n", hp_elog->action);
>>  		rc = -EINVAL;
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 6/6] pseries: Implement memory hotplug remove in the kernel
  2014-11-24 15:03     ` Nathan Fontenot
@ 2014-11-24 22:05       ` Cyril Bur
  0 siblings, 0 replies; 20+ messages in thread
From: Cyril Bur @ 2014-11-24 22:05 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Mon, 2014-11-24 at 09:03 -0600, Nathan Fontenot wrote:
> 
> On 11/21/2014 01:49 AM, Cyril Bur wrote:
> > 
> > On Mon, 2014-11-17 at 15:56 -0600, Nathan Fontenot wrote:
> >> Move handling of memory hotplug remove on pseries completely into the kernel.
> >>
> >> The current memory hotplug remove path involves the drmgr command doing part
> >> of this work in userspace and requesting the kernel to do additional pieces.
> >> This patch allows us to handle the act completely in the kernel via rtas
> >> hotplug events. This allows us to perform the operation faster and provide
> >> a common memory hotplug remove path for PowerVM and PowerKVM systems.
> >>
> >> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> >> ---
> >>  arch/powerpc/platforms/pseries/hotplug-memory.c |  206 ++++++++++++++++++++++-
> >>  1 file changed, 201 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> >> index b57d42b..c8189e8 100644
> >> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> >> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> >> @@ -173,6 +173,179 @@ static int pseries_remove_mem_node(struct device_node *np)
> >>  	pseries_remove_memblock(base, lmb_size);
> >>  	return 0;
> >>  }
> >> +
> >> +static int lmb_is_removable(struct of_drconf_cell *lmb)
> >> +{
> >> +	int i, scns_per_block;
> >> +	int rc = 1;
> >> +	unsigned long pfn, block_sz;
> >> +	u64 phys_addr;
> >> +
> >> +	if (!(be32_to_cpu(lmb->flags) & DRCONF_MEM_ASSIGNED))
> >> +		return -1;
> > This makes me kind of nervous. You're using the return value of
> > lmb_is_removable as a boolean but it returns three possible values -1,0
> > and 1. Functionally it looks correct to me so its not a massive issue
> 
> Oh yuck. this should be a boolean return. I'll fix that.
> 
> >> +
> >> +	phys_addr = be64_to_cpu(lmb->base_addr);
> >> +	block_sz = memory_block_size_bytes();
> >> +	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
> >> +
> >> +	for (i = 0; i < scns_per_block; i++) {
> >> +		pfn = PFN_DOWN(phys_addr);
> >> +		if (!pfn_present(pfn))
> >> +			continue;
> >> +
> >> +		rc &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
> >> +		phys_addr += MIN_MEMORY_BLOCK_SIZE;
> >> +	}
> >> +
> >> +	return rc;
> >> +}
> >> +
> >> +static int dlpar_add_lmb(struct of_drconf_cell *);
> >> +
> >> +static int dlpar_remove_lmb(struct of_drconf_cell *lmb)
> >> +{
> >> +	struct memory_block *mem_block;
> >> +	unsigned long block_sz;
> >> +	u64 phys_addr;
> >> +	uint32_t drc_index;
> >> +	int nid, rc;
> >> +
> >> +	if (!lmb_is_removable(lmb))
> >> +		return -EINVAL;
> >> +
> >> +	phys_addr = be64_to_cpu(lmb->base_addr);
> >> +	drc_index = be32_to_cpu(lmb->drc_index);
> >> +
> >> +	mem_block = lmb_to_memblock(lmb);
> >> +	if (!mem_block)
> >> +		return -EINVAL;
> >> +
> >> +	rc = device_offline(&mem_block->dev);
> >> +	put_device(&mem_block->dev);
> >> +	if (rc)
> >> +		return rc;
> >> +
> >> +	block_sz = pseries_memory_block_size();
> >> +	nid = memory_add_physaddr_to_nid(phys_addr);
> >> +
> >> +	remove_memory(nid, phys_addr, block_sz);
> >> +
> >> +	/* Update memory regions for memory remove */
> >> +	memblock_remove(phys_addr, block_sz);
> >> +
> >> +	dlpar_release_drc(drc_index);
> >> +
> >> +	lmb->flags &= cpu_to_be32(~DRCONF_MEM_ASSIGNED);
> >> +	pr_info("Memory at %llx (drc index %x) has been hot-removed\n",
> >> +		be64_to_cpu(lmb->base_addr), drc_index);
> > dlpar_add_lmb doesn't print anything but dlpar_remove_lmb does? Related
> > to my comment about printing a 'hot-add' messages prematurely, perhaps
> > move this to the callers
> 
> Yes, I'll update the remove mem messages also.
> 
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static int dlpar_memory_remove_by_count(struct pseries_hp_errorlog *hp_elog,
> >> +					struct property *prop)
> >> +{
> >> +	struct of_drconf_cell *lmbs;
> >> +	int lmbs_to_remove, lmbs_removed = 0;
> >> +	int lmbs_available = 0;
> >> +	uint32_t num_lmbs;
> >> +	__be32 *p;
> >> +	int i, rc;
> >> +
> >> +	lmbs_to_remove = be32_to_cpu(hp_elog->_drc_u.drc_count);
> > Didn't you already do the endian conversion back in
> > handle_dlpar_errorlog?
> >> +	pr_info("Attempting to hot-remove %d LMB(s)\n", lmbs_to_remove);
> >> +
> >> +	if (lmbs_to_remove == 0)
> >> +		return -EINVAL;
> >> +
> >> +	p = prop->value;
> >> +	num_lmbs = be32_to_cpu(*p++);
> >> +	lmbs = (struct of_drconf_cell *)p;
> >> +
> >> +	/* Validate that there are enough LMBs to satisfy the request */
> >> +	for (i = 0; i < num_lmbs; i++) {
> >> +		if (be32_to_cpu(lmbs[i].flags) & DRCONF_MEM_ASSIGNED)
> >> +			lmbs_available++;
> >> +	}
> >> +
> >> +	if (lmbs_available < lmbs_to_remove)
> >> +		return -EINVAL;
> >> +
> >> +	for (i = 0; i < num_lmbs; i++) {
> >> +		if (lmbs_to_remove == lmbs_removed)
> >> +			break;
> >> +
> >> +		rc = dlpar_remove_lmb(&lmbs[i]);
> >> +		if (rc)
> >> +			continue;
> >> +
> >> +		lmbs_removed++;
> >> +
> >> +		/* Mark this lmb so we can add it later if all of the
> >> +		 * requested LMBs cannot be removed.
> >> +		 */
> >> +		lmbs[i].reserved = 1;
> >> +	}
> >> +
> >> +	if (lmbs_removed != lmbs_to_remove) {
> >> +		pr_err("Memory hot-remove failed, adding LMB's back\n");
> >> +
> >> +		for (i = 0; i < num_lmbs; i++) {
> >> +			if (!lmbs[i].reserved)
> >> +				continue;
> >> +
> >> +			rc = dlpar_add_lmb(&lmbs[i]);
> >> +			if (rc)
> > If this happens you this will leave an LMB removed but an error code
> > from this function will cause dlpar_memory to discard the dn prop that
> > it passed in. If you couldn't roll back completely the caller should
> > still update the dn property with the ones that it has left
> > removed/failed to re-add. Perhaps this function needs a mechanism to
> > report success/failure (as it currently does) and a mechanism to say
> > whether or not the dn property is dirty or not.
> 
> Good catch. The device tree should be updated to denote lmb's that were
> added but we failed to remove because of some error.
> 
> >> +				pr_err("Failed to add LMB back, drc index %x\n",
> >> +				       be32_to_cpu(lmbs[i].drc_index));
> >> +
> > On a clean (everything rolled back successfully) failure, I don't think
> > you _need_ to bother although this is nice cleanup, you didn't do it
> > dlpar_memory_add_by_count - I'm in favour of being clean especially
> > since on an 'unclean' failure you should clear that flag - if you're
> > going to still update the device tree.
> 
> I'll re-visit the cleanup code to make sure the device tree property is left
> in the correct state on a failure during roll-back.
> 
> >> +			lmbs[i].reserved = 0;
> >> +		}
> >> +		rc = -EINVAL;
> >> +	} else {
> >> +		/* remove any reserved markings */
> >> +		for (i = 0; i < num_lmbs; i++)
> >> +			lmbs[i].reserved = 0;
> >> +	}
> > Hmmmm since the if statements are checking for failures. It might be
> > more clear to return -EINVAL (rather than setting rc) and the code in
> > the else block can be moved out of it and come to think of it, return 0.
> > Otherwise it looks odd to me with the 'normal' success case code being
> > in an else statement.
> 
> I like this.
> 
> >> +
> >> +	return rc;
> >> +}
> >> +
> >> +static int dlpar_memory_remove_by_index(struct pseries_hp_errorlog *hp_elog,
> >> +					struct property *prop)
> >> +{
> >> +	struct of_drconf_cell *lmbs;
> >> +	uint32_t num_lmbs, drc_index;
> >> +	int lmb_found;
> >> +	__be32 *p;
> >> +	int i, rc;
> >> +
> >> +	drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index);
> > Didn't you already do the endian conversion back in
> > handle_dlpar_errorlog?
> >> +	pr_info("Attempting to hot-remove LMB, drc index %x\n", drc_index);
> >> +
> >> +	p = prop->value;
> >> +	num_lmbs = be32_to_cpu(*p++);
> >> +	lmbs = (struct of_drconf_cell *)p;
> >> +
> >> +	lmb_found = 0;
> >> +	for (i = 0; i < num_lmbs; i++) {
> >> +		if (lmbs[i].drc_index == hp_elog->_drc_u.drc_index) {
> >> +			lmb_found = 1;
> >> +			rc = dlpar_remove_lmb(&lmbs[i]);
> >> +			break;
> >> +		}
> >> +	}
> >> +
> >> +	if (!lmb_found)
> >> +		rc = -EINVAL;
> >> +
> >> +	if (rc)
> >> +		pr_info("Failed to hot-remove memory, drc index %x\n",
> >> +			drc_index);
> >> +
> >> +	return rc;
> >> +}
> >> +
> >>  #else
> >>  static inline int pseries_remove_memblock(unsigned long base,
> >>  					  unsigned int memblock_size)
> >> @@ -183,6 +356,11 @@ static inline int pseries_remove_mem_node(struct device_node *np)
> >>  {
> >>  	return 0;
> >>  }
> >> +static inline int dlpar_memory_remove(struct pseries_hp_errorlog *hp_elog)
> >> +{
> >> +	return -EOPNOTSUPP;
> >> +}
> >> +
> >>  #endif /* CONFIG_MEMORY_HOTREMOVE */
> >>  
> >>  static int dlpar_add_lmb(struct of_drconf_cell *lmb)
> >> @@ -298,14 +476,24 @@ static int dlpar_memory_add_by_count(struct pseries_hp_errorlog *hp_elog,
> >>  	}
> >>  
> >>  	if (lmbs_added != lmbs_to_add) {
> >> -		/* TODO: remove added lmbs */
> >> +		pr_err("Memory hot-add failed, removing any added LMBs\n");
> >> +
> >> +		for (i = 0; i < num_lmbs; i++) {
> >> +			if (!lmbs[i].reserved)
> >> +				continue;
> >> +
> >> +			rc = dlpar_remove_lmb(&lmbs[i]);
> >> +			if (rc)
> > There is a very much related comment in dlpar_memory_remove_by_count
> > about this condition being true.
> >> +				pr_err("Failed to remove LMB, drc index %x\n",
> >> +				       be32_to_cpu(lmbs[i].drc_index));
> > 
> > You don't clear the reserved flag here?
> In the failure case here the device tree property is not updated, so no
> cleanup needed. This is not really obvious so it would probably be best
> to do the cleanup.
>  
Of course except when you fail to roll back and you'll be forced to
update the device tree - but I suppose clearing it here or not really
depends on how you do that...
> > 
> >> +		}
> >>  		rc = -EINVAL;
> > Same observation as in dlpar_memory_remove_by_count
> 
> ok.
> 
> Thanks for the review.
> 
> -Nathan
> 
> >> +	} else {
> >> +		/* Clear the reserved fields */
> >> +		for (i = 0; i < num_lmbs; i++)
> >> +			lmbs[i].reserved = 0;
> >>  	}
> >>  
> >> -	/* Clear the reserved fields */
> >> -	for (i = 0; i < num_lmbs; i++)
> >> -		lmbs[i].reserved = 0;
> >> -
> >>  	return rc;
> >>  }
> >>  
> >> @@ -373,6 +561,14 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
> >>  		else
> >>  			rc = -EINVAL;
> >>  		break;
> >> +	case PSERIES_HP_ELOG_ACTION_REMOVE:
> >> +		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
> >> +			rc = dlpar_memory_remove_by_count(hp_elog, prop);
> >> +		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
> >> +			rc = dlpar_memory_remove_by_index(hp_elog, prop);
> >> +		else
> >> +			rc = -EINVAL;
> >> +		break;
> >>  	default:
> >>  		pr_err("Invalid action (%d) specified\n", hp_elog->action);
> >>  		rc = -EINVAL;
> >>
> >> _______________________________________________
> >> Linuxppc-dev mailing list
> >> Linuxppc-dev@lists.ozlabs.org
> >> https://lists.ozlabs.org/listinfo/linuxppc-dev
> > 
> > 
> > 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/6] pseries: Create new device hotplug entry point
  2014-11-17 21:51 ` [PATCH v2 3/6] pseries: Create new device hotplug entry point Nathan Fontenot
  2014-11-17 22:53   ` Gavin Shan
  2014-11-21  7:49   ` Cyril Bur
@ 2014-11-26 23:44   ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 20+ messages in thread
From: Benjamin Herrenschmidt @ 2014-11-26 23:44 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Mon, 2014-11-17 at 15:51 -0600, Nathan Fontenot wrote:
> 
> For PowerVM systems, this patch creates the /sys/kernel/dlpar file that rtas
> hotplug events can be written to by drmgr and passed to the common entry point.
> There is no chance of updating how we receive hotplug requests on PowerVM
> systems.

I'm not convinced this is the right place for that file, might be worth asking
Greg KH what he thinks here.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-11-26 23:44 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-17 21:44 [PATCH v2 0/6] pseries: Move memory hotplug to the kernel Nathan Fontenot
2014-11-17 21:48 ` [PATCH v2 1/6] pseries: Define rtas hotplug event sections Nathan Fontenot
2014-11-17 21:50 ` [PATCH v2 2/6] pseries: Update of_drconf_cell struct for endian-ness Nathan Fontenot
2014-11-17 21:51 ` [PATCH v2 3/6] pseries: Create new device hotplug entry point Nathan Fontenot
2014-11-17 22:53   ` Gavin Shan
2014-11-18 18:18     ` Nathan Fontenot
2014-11-21  7:49   ` Cyril Bur
2014-11-24 14:45     ` Nathan Fontenot
2014-11-26 23:44   ` Benjamin Herrenschmidt
2014-11-17 21:53 ` [PATCH v2 4/6] pseries: Export the acquire/release drc index routines Nathan Fontenot
2014-11-17 21:54 ` [PATCH v2 5/6] pseries: Implement memory hotplug add in the kernel Nathan Fontenot
2014-11-21  7:49   ` Cyril Bur
2014-11-24 14:56     ` Nathan Fontenot
2014-11-17 21:56 ` [PATCH v2 6/6] pseries: Implement memory hotplug remove " Nathan Fontenot
2014-11-21  7:49   ` Cyril Bur
2014-11-24 15:03     ` Nathan Fontenot
2014-11-24 22:05       ` Cyril Bur
2014-11-18  2:00 ` [PATCH v2 0/6] pseries: Move memory hotplug to " Cyril Bur
2014-11-18 18:34   ` Nathan Fontenot
2014-11-18 22:59     ` Cyril Bur

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.