linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN
@ 2013-03-25 18:43 Nathan Fontenot
  2013-03-25 18:51 ` [PATCH v2 1/11] Expose pseries devicetree_update() Nathan Fontenot
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 18:43 UTC (permalink / raw)
  To: linuxppc-dev

Newer firmware on Power systems can transparently reassign platform resources
(CPU and Memory) in use. For instance, if a processor or memory unit is
predicted to fail, the platform may transparently move the processing to an
equivalent unused processor or the memory state to an equivalent unused
memory unit. However, reassigning resources across NUMA boundaries may alter
the performance of the partition. When such reassignment is necessary, the
Platform Resource Reassignment Notification (PRRN) option provides a
mechanism to inform the Linux kernel of changes to the NUMA affinity of
its platform resources.

PRRN Events are RTAS events sent up through the event-scan mechanism on
Power. When these events are received, the system needs to get the updated
device tree affinity information for the affected CPUs/memory via the
rtas update-nodes and update-properties calls. This information is then
used to update the NUMA affinity of the CPUs/Memory in the kernel.

This patch set adds the ability to recognize PRRN events, update the device
tree and kernel information for CPUs (memory will be handled in a later
patch), and adds an interface to enable/disable topology updates from /proc.

Additionally, these updates solve an existing problem with the VPHN (Virtual
Processor Home Node) capability and allow us to re-enable this feature.

Nathan Fontenot

Updates for Version 2 of this patchset

- Merged the functionality of platform_has_feature into the existing
  firmware_has_feature routine.
- Corrected the new way certain bits in the architecture vector are
  defined based on config options.
---

 arch/powerpc/include/asm/firmware.h               |    3 
 arch/powerpc/include/asm/prom.h                   |   46 ++---
 arch/powerpc/include/asm/rtas.h                   |    2 
 arch/powerpc/kernel/prom_init.c                   |   98 ++---------
 arch/powerpc/kernel/rtasd.c                       |   35 ++++
 arch/powerpc/mm/numa.c                            |  183 ++++++++++++++--------
 arch/powerpc/platforms/pseries/firmware.c         |    1 
 powerpc/arch/powerpc/include/asm/firmware.h       |    4 
 powerpc/arch/powerpc/include/asm/prom.h           |   73 ++++++++
 powerpc/arch/powerpc/include/asm/rtas.h           |    1 
 powerpc/arch/powerpc/include/asm/topology.h       |    5 
 powerpc/arch/powerpc/kernel/prom_init.c           |    2 
 powerpc/arch/powerpc/kernel/rtasd.c               |    6 
 powerpc/arch/powerpc/mm/numa.c                    |   62 +++++++
 powerpc/arch/powerpc/platforms/pseries/firmware.c |   67 +++++++-
 powerpc/arch/powerpc/platforms/pseries/mobility.c |   21 +-
 powerpc/arch/powerpc/platforms/pseries/pseries.h  |    5 
 powerpc/arch/powerpc/platforms/pseries/setup.c    |   40 +++-
 18 files changed, 455 insertions(+), 199 deletions(-)


* [PATCH v2 1/11] Expose pseries devicetree_update()
  2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
@ 2013-03-25 18:51 ` Nathan Fontenot
  2013-04-04  3:09   ` Paul Mackerras
  2013-03-25 18:52 ` [PATCH v2 2/11] Add PRRN Event Handler Nathan Fontenot
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 18:51 UTC (permalink / raw)
  To: linuxppc-dev

From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>

Newer firmware on Power systems can transparently reassign platform resources
(CPU and Memory) in use. For instance, if a processor or memory unit is
predicted to fail, the platform may transparently move the processing to an
equivalent unused processor or the memory state to an equivalent unused
memory unit. However, reassigning resources across NUMA boundaries may alter
the performance of the partition. When such reassignment is necessary, the
Platform Resource Reassignment Notification (PRRN) option provides a
mechanism to inform the Linux kernel of changes to the NUMA affinity of
its platform resources.

When rtasd receives a PRRN event, it needs to make a series of RTAS
calls (ibm,update-nodes and ibm,update-properties) to retrieve the
updated device tree information. These calls are already handled in the
pseries_devicetree_update() routine used in partition migration.

This patch simply exposes pseries_devicetree_update() so it can be
called by rtasd. pseries_devicetree_update() and supporting functions
are also modified to take a 32-bit 'scope' parameter. This parameter is
required by the ibm,update-nodes/ibm,update-properties RTAS calls, and
the appropriate value is contained within the RTAS event for PRRN
notifications. In pseries_devicetree_update() it was previously
hard-coded to 1, the scope value for partition migration.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h           |    1 +
 arch/powerpc/platforms/pseries/mobility.c |   21 ++++++++++++---------
 2 files changed, 13 insertions(+), 9 deletions(-)
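
The scope plumbing described above can be pictured with a small userspace
sketch. It is not kernel code: fake_rtas_call(), sketch_update_node() and
sketch_devicetree_update() are hypothetical stand-ins that only mirror the
shape of rtas_call(), update_dt_node() and pseries_devicetree_update(). The
point it tries to show is that one caller-supplied scope value reaches every
low-level call, where a hard-coded 1 was used before.

```c
#include <stdint.h>

#define MIGRATION_SCOPE 1	/* scope value used for partition migration */

/* Hypothetical stand-in for rtas_call(): records the scope it was given. */
static int32_t last_scope_seen;
static int calls_made;

static int fake_rtas_call(int token, int32_t scope)
{
	(void)token;
	last_scope_seen = scope;
	calls_made++;
	return 0;
}

/* Mirrors the shape of update_dt_node(): forwards the caller's scope. */
static int sketch_update_node(uint32_t phandle, int32_t scope)
{
	(void)phandle;
	return fake_rtas_call(2 /* update-properties */, scope);
}

/* Mirrors the shape of pseries_devicetree_update(s32 scope). */
static int sketch_devicetree_update(int32_t scope)
{
	int rc = fake_rtas_call(1 /* update-nodes */, scope);

	if (rc)
		return rc;
	return sketch_update_node(0x1234, scope);
}
```

With this shape, the migration path would call
sketch_devicetree_update(MIGRATION_SCOPE), while another caller such as the
PRRN handler could pass whatever scope its event carries.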

Index: powerpc/arch/powerpc/include/asm/rtas.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/rtas.h	2013-03-20 08:24:15.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/rtas.h	2013-03-20 08:51:59.000000000 -0500
@@ -276,6 +276,7 @@
 		const char *uname, int depth, void *data);
 
 extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
+extern int pseries_devicetree_update(s32 scope);
 
 #ifdef CONFIG_PPC_RTAS_DAEMON
 extern void rtas_cancel_event_scan(void);
Index: powerpc/arch/powerpc/platforms/pseries/mobility.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/mobility.c	2013-03-20 08:24:15.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/mobility.c	2013-03-20 08:51:59.000000000 -0500
@@ -37,14 +37,16 @@
 #define UPDATE_DT_NODE	0x02000000
 #define ADD_DT_NODE	0x03000000
 
-static int mobility_rtas_call(int token, char *buf)
+#define MIGRATION_SCOPE	(1)
+
+static int mobility_rtas_call(int token, char *buf, s32 scope)
 {
 	int rc;
 
 	spin_lock(&rtas_data_buf_lock);
 
 	memcpy(rtas_data_buf, buf, RTAS_DATA_BUF_SIZE);
-	rc = rtas_call(token, 2, 1, NULL, rtas_data_buf, 1);
+	rc = rtas_call(token, 2, 1, NULL, rtas_data_buf, scope);
 	memcpy(buf, rtas_data_buf, RTAS_DATA_BUF_SIZE);
 
 	spin_unlock(&rtas_data_buf_lock);
@@ -123,7 +125,7 @@
 	return 0;
 }
 
-static int update_dt_node(u32 phandle)
+static int update_dt_node(u32 phandle, s32 scope)
 {
 	struct update_props_workarea *upwa;
 	struct device_node *dn;
@@ -151,7 +153,8 @@
 	upwa->phandle = phandle;
 
 	do {
-		rc = mobility_rtas_call(update_properties_token, rtas_buf);
+		rc = mobility_rtas_call(update_properties_token, rtas_buf,
+					scope);
 		if (rc < 0)
 			break;
 
@@ -219,7 +222,7 @@
 	return rc;
 }
 
-static int pseries_devicetree_update(void)
+int pseries_devicetree_update(s32 scope)
 {
 	char *rtas_buf;
 	u32 *data;
@@ -235,7 +238,7 @@
 		return -ENOMEM;
 
 	do {
-		rc = mobility_rtas_call(update_nodes_token, rtas_buf);
+		rc = mobility_rtas_call(update_nodes_token, rtas_buf, scope);
 		if (rc && rc != 1)
 			break;
 
@@ -256,7 +259,7 @@
 					delete_dt_node(phandle);
 					break;
 				case UPDATE_DT_NODE:
-					update_dt_node(phandle);
+					update_dt_node(phandle, scope);
 					break;
 				case ADD_DT_NODE:
 					drc_index = *data++;
@@ -276,7 +279,7 @@
 	int rc;
 	int activate_fw_token;
 
-	rc = pseries_devicetree_update();
+	rc = pseries_devicetree_update(MIGRATION_SCOPE);
 	if (rc) {
 		printk(KERN_ERR "Initial post-mobility device tree update "
 		       "failed: %d\n", rc);
@@ -292,7 +295,7 @@
 
 	rc = rtas_call(activate_fw_token, 0, 1, NULL);
 	if (!rc) {
-		rc = pseries_devicetree_update();
+		rc = pseries_devicetree_update(MIGRATION_SCOPE);
 		if (rc)
 			printk(KERN_ERR "Secondary post-mobility device tree "
 			       "update failed: %d\n", rc);


* [PATCH v2 2/11] Add PRRN Event Handler
  2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
  2013-03-25 18:51 ` [PATCH v2 1/11] Expose pseries devicetree_update() Nathan Fontenot
@ 2013-03-25 18:52 ` Nathan Fontenot
  2013-04-04  3:34   ` Paul Mackerras
  2013-04-10  8:30   ` Michael Ellerman
  2013-03-25 18:53 ` [PATCH v2 3/11] Move architecture vector definitions to prom.h Nathan Fontenot
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 18:52 UTC (permalink / raw)
  To: linuxppc-dev

From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>

A PRRN event is signaled via the RTAS event-scan mechanism, which
returns a Hot Plug Event message "fixed part" indicating "Platform
Resource Reassignment". In response to the Hot Plug Event message,
we must call ibm,update-nodes to determine which resources were
reassigned and then ibm,update-properties to obtain the new affinity
information about those resources.

The PRRN event-scan RTAS message contains only the "fixed part" with
the "Type" field set to the value 160 and no Extended Event Log. The
four-byte Extended Event Log Length field is repurposed (since no
Extended Event Log message is included) to pass the "scope" parameter
that causes the ibm,update-nodes to return the nodes affected by the
specific resource reassignment.

This patch adds a handler in rtasd for PRRN RTAS events. The function
pseries_devicetree_update() (from mobility.c) is used to make the
ibm,update-nodes/ibm,update-properties RTAS calls. Updating the NUMA maps
(handled by a subsequent patch) will require significant processing,
so pseries_devicetree_update() is called from an asynchronous workqueue
to allow rtasd to continue processing events. Since we flush all work
on the queue before handling any new work, there should only be one event
being handled at a time.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h |    2 ++
 arch/powerpc/kernel/rtasd.c     |   35 ++++++++++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 1 deletion(-)
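
As a rough illustration of the event decoding described above, the following
userspace sketch models only the two fields that matter here. The struct
sketch_rtas_event and the helper sketch_prrn_scope() are hypothetical and much
simpler than the kernel's struct rtas_error_log. It assumes, as the patch
does, that the Extended Event Log Length field carries the scope and that the
negated value is what gets passed on to the device tree update.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTAS_TYPE_PRRN 0xA0	/* decimal 160, as in the patch */

/* Simplified, hypothetical view of the RTAS event "fixed part". */
struct sketch_rtas_event {
	uint8_t  type;
	uint32_t extended_log_length;	/* carries the scope for PRRN events */
};

/*
 * Returns true and stores the value to hand to the device tree update
 * (the negated scope) when the event is a PRRN event; otherwise returns
 * false and leaves *scope_out untouched.
 */
static bool sketch_prrn_scope(const struct sketch_rtas_event *ev,
			      int32_t *scope_out)
{
	if (ev->type != RTAS_TYPE_PRRN)
		return false;

	*scope_out = -(int32_t)ev->extended_log_length;
	return true;
}
```

Events of any other type fall through unchanged, which matches the handler
logging every event but scheduling an update only for PRRN.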

Index: powerpc/arch/powerpc/include/asm/rtas.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/rtas.h	2013-03-20 08:51:59.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/rtas.h	2013-03-20 08:52:08.000000000 -0500
@@ -143,6 +143,8 @@
 #define RTAS_TYPE_PMGM_TIME_ALARM	0x6f
 #define RTAS_TYPE_PMGM_CONFIG_CHANGE	0x70
 #define RTAS_TYPE_PMGM_SERVICE_PROC	0x71
+/* Platform Resource Reassignment Notification */
+#define RTAS_TYPE_PRRN			0xA0
 
 /* RTAS check-exception vector offset */
 #define RTAS_VECTOR_EXTERNAL_INTERRUPT	0x500
Index: powerpc/arch/powerpc/kernel/rtasd.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/rtasd.c	2013-03-20 08:24:14.000000000 -0500
+++ powerpc/arch/powerpc/kernel/rtasd.c	2013-03-20 08:52:08.000000000 -0500
@@ -87,6 +87,8 @@
 			return "Resource Deallocation Event";
 		case RTAS_TYPE_DUMP:
 			return "Dump Notification Event";
+		case RTAS_TYPE_PRRN:
+			return "Platform Resource Reassignment Event";
 	}
 
 	return rtas_type[0];
@@ -265,7 +267,38 @@
 		spin_unlock_irqrestore(&rtasd_log_lock, s);
 		return;
 	}
+}
+
+static s32 update_scope;
+
+static void prrn_work_fn(struct work_struct *work)
+{
+	/*
+	 * For PRRN, we must pass the negative of the scope value in
+	 * the RTAS event.
+	 */
+	pseries_devicetree_update(-update_scope);
+}
+static DECLARE_WORK(prrn_work, prrn_work_fn);
+
+void prrn_schedule_update(u32 scope)
+{
+	flush_work(&prrn_work);
+	update_scope = scope;
+	schedule_work(&prrn_work);
+}
+
+static void pseries_handle_event(const struct rtas_error_log *log)
+{
+	pSeries_log_error((char *)log, ERR_TYPE_RTAS_LOG, 0);
+
+	if (log->type == RTAS_TYPE_PRRN)
+		/* For PRRN Events the extended log length is used to denote
+		 * the scope for calling rtas update-nodes.
+		 */
+		prrn_schedule_update(log->extended_log_length);
 
+	return;
 }
 
 static int rtas_log_open(struct inode * inode, struct file * file)
@@ -389,7 +422,7 @@
 		}
 
 		if (error == 0)
-			pSeries_log_error(logdata, ERR_TYPE_RTAS_LOG, 0);
+			pseries_handle_event((struct rtas_error_log *)logdata);
 
 	} while(error == 0);
 }


* [PATCH v2 3/11] Move architecture vector definitions to prom.h
  2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
  2013-03-25 18:51 ` [PATCH v2 1/11] Expose pseries devicetree_update() Nathan Fontenot
  2013-03-25 18:52 ` [PATCH v2 2/11] Add PRRN Event Handler Nathan Fontenot
@ 2013-03-25 18:53 ` Nathan Fontenot
  2013-03-25 18:54 ` [PATCH v2 4/11] Update firmware_has_feature() to check architecture bits Nathan Fontenot
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 18:53 UTC (permalink / raw)
  To: linuxppc-dev

As part of handling PRRN events we will need to check the
vector 5 portion of the architecture bits reported in the device tree
to ensure that PRRN event handling is enabled. In order to do this a
new platform_has_feature call is introduced (in a subsequent patch) to
make this check.  To avoid having to re-define bits in the architecture
vector the bits are moved to prom.h.

This patch is the first step in implementing the platform_has_feature
call by simply moving the bit definitions from prom_init.c to asm/prom.h.
There are no functional changes.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

---
 arch/powerpc/include/asm/prom.h |   73 ++++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/prom_init.c |   75 +++-------------------------------------
 2 files changed, 79 insertions(+), 69 deletions(-)
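
For readers unfamiliar with the layout of the architecture vector whose bit
definitions are moved here, the sketch below shows how the W() macro from
prom_init.c spreads one 32-bit word over four bytes of an unsigned char
array. The array sketch_vec and the reader sketch_read_word() are
hypothetical and exist only for this illustration; the real vector is much
longer.

```c
#include <stddef.h>
#include <stdint.h>

/* Same expansion as the W() macro in prom_init.c: one word, four bytes. */
#define W(x)	((x) >> 24) & 0xff, ((x) >> 16) & 0xff, \
		((x) >> 8) & 0xff, (x) & 0xff

/* Hypothetical miniature vector: one PVR mask/value pair, then one byte. */
static const unsigned char sketch_vec[] = {
	W(0xffff0000), W(0x003f0000),	/* POWER7 pair from the patch */
	0x80,				/* a plain byte-sized entry */
};

/* Reassemble the big-endian word stored at byte offset off. */
static uint32_t sketch_read_word(const unsigned char *vec, size_t off)
{
	return ((uint32_t)vec[off] << 24) | ((uint32_t)vec[off + 1] << 16) |
	       ((uint32_t)vec[off + 2] << 8) | (uint32_t)vec[off + 3];
}
```

Because words are stored most significant byte first, byte-sized option bits
such as the OV5_* values can sit in the same array next to them.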

Index: powerpc/arch/powerpc/include/asm/prom.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/prom.h	2013-03-20 08:24:13.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/prom.h	2013-03-20 08:52:59.000000000 -0500
@@ -74,6 +74,79 @@
 #define DRCONF_MEM_AI_INVALID	0x00000040
 #define DRCONF_MEM_RESERVED	0x00000080
 
+#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
+/*
+ * There are two methods for telling firmware what our capabilities are.
+ * Newer machines have an "ibm,client-architecture-support" method on the
+ * root node.  For older machines, we have to call the "process-elf-header"
+ * method in the /packages/elf-loader node, passing it a fake 32-bit
+ * ELF header containing a couple of PT_NOTE sections that contain
+ * structures that contain various information.
+ */
+
+/* New method - extensible architecture description vector. */
+
+/* Option vector bits - generic bits in byte 1 */
+#define OV_IGNORE		0x80	/* ignore this vector */
+#define OV_CESSATION_POLICY	0x40	/* halt if unsupported option present*/
+
+/* Option vector 1: processor architectures supported */
+#define OV1_PPC_2_00		0x80	/* set if we support PowerPC 2.00 */
+#define OV1_PPC_2_01		0x40	/* set if we support PowerPC 2.01 */
+#define OV1_PPC_2_02		0x20	/* set if we support PowerPC 2.02 */
+#define OV1_PPC_2_03		0x10	/* set if we support PowerPC 2.03 */
+#define OV1_PPC_2_04		0x08	/* set if we support PowerPC 2.04 */
+#define OV1_PPC_2_05		0x04	/* set if we support PowerPC 2.05 */
+#define OV1_PPC_2_06		0x02	/* set if we support PowerPC 2.06 */
+#define OV1_PPC_2_07		0x01	/* set if we support PowerPC 2.07 */
+
+/* Option vector 2: Open Firmware options supported */
+#define OV2_REAL_MODE		0x20	/* set if we want OF in real mode */
+
+/* Option vector 3: processor options supported */
+#define OV3_FP			0x80	/* floating point */
+#define OV3_VMX			0x40	/* VMX/Altivec */
+#define OV3_DFP			0x20	/* decimal FP */
+
+/* Option vector 4: IBM PAPR implementation */
+#define OV4_MIN_ENT_CAP		0x01	/* minimum VP entitled capacity */
+
+/* Option vector 5: PAPR/OF options supported */
+#define OV5_LPAR		0x80	/* logical partitioning supported */
+#define OV5_SPLPAR		0x40	/* shared-processor LPAR supported */
+/* ibm,dynamic-reconfiguration-memory property supported */
+#define OV5_DRCONF_MEMORY	0x20
+#define OV5_LARGE_PAGES		0x10	/* large pages supported */
+#define OV5_DONATE_DEDICATE_CPU	0x02	/* donate dedicated CPU support */
+/* PCIe/MSI support.  Without MSI full PCIe is not supported */
+#ifdef CONFIG_PCI_MSI
+#define OV5_MSI			0x01	/* PCIe/MSI support */
+#else
+#define OV5_MSI			0x00
+#endif /* CONFIG_PCI_MSI */
+#ifdef CONFIG_PPC_SMLPAR
+#define OV5_CMO			0x80	/* Cooperative Memory Overcommitment */
+#define OV5_XCMO		0x40	/* Page Coalescing */
+#else
+#define OV5_CMO			0x00
+#define OV5_XCMO		0x00
+#endif
+#define OV5_TYPE1_AFFINITY	0x80	/* Type 1 NUMA affinity */
+#define OV5_PFO_HW_RNG		0x80	/* PFO Random Number Generator */
+#define OV5_PFO_HW_842		0x40	/* PFO Compression Accelerator */
+#define OV5_PFO_HW_ENCR		0x20	/* PFO Encryption Accelerator */
+#define OV5_SUB_PROCESSORS	0x01	/* 1,2,or 4 Sub-Processors supported */
+
+/* Option Vector 6: IBM PAPR hints */
+#define OV6_LINUX		0x02	/* Linux is our OS */
+
+/*
+ * The architecture vector has an array of PVR mask/value pairs,
+ * followed by # option vectors - 1, followed by the option vectors.
+ */
+extern unsigned char ibm_architecture_vec[];
+#endif
+
 /* These includes are put at the bottom because they may contain things
  * that are overridden by this file.  Ideally they shouldn't be included
  * by this file, but there are a bunch of .c files that currently depend
Index: powerpc/arch/powerpc/kernel/prom_init.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/prom_init.c	2013-03-20 08:24:13.000000000 -0500
+++ powerpc/arch/powerpc/kernel/prom_init.c	2013-03-20 08:52:59.000000000 -0500
@@ -627,16 +627,11 @@
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
 /*
- * There are two methods for telling firmware what our capabilities are.
- * Newer machines have an "ibm,client-architecture-support" method on the
- * root node.  For older machines, we have to call the "process-elf-header"
- * method in the /packages/elf-loader node, passing it a fake 32-bit
- * ELF header containing a couple of PT_NOTE sections that contain
- * structures that contain various information.
- */
-
-/*
- * New method - extensible architecture description vector.
+ * The architecture vector has an array of PVR mask/value pairs,
+ * followed by # option vectors - 1, followed by the option vectors.
+ *
+ * See prom.h for the definition of the bits specified in the
+ * architecture vector.
  *
  * Because the description vector contains a mix of byte and word
  * values, we declare it as an unsigned char array, and use this
@@ -645,65 +640,7 @@
 #define W(x)	((x) >> 24) & 0xff, ((x) >> 16) & 0xff, \
 		((x) >> 8) & 0xff, (x) & 0xff
 
-/* Option vector bits - generic bits in byte 1 */
-#define OV_IGNORE		0x80	/* ignore this vector */
-#define OV_CESSATION_POLICY	0x40	/* halt if unsupported option present*/
-
-/* Option vector 1: processor architectures supported */
-#define OV1_PPC_2_00		0x80	/* set if we support PowerPC 2.00 */
-#define OV1_PPC_2_01		0x40	/* set if we support PowerPC 2.01 */
-#define OV1_PPC_2_02		0x20	/* set if we support PowerPC 2.02 */
-#define OV1_PPC_2_03		0x10	/* set if we support PowerPC 2.03 */
-#define OV1_PPC_2_04		0x08	/* set if we support PowerPC 2.04 */
-#define OV1_PPC_2_05		0x04	/* set if we support PowerPC 2.05 */
-#define OV1_PPC_2_06		0x02	/* set if we support PowerPC 2.06 */
-#define OV1_PPC_2_07		0x01	/* set if we support PowerPC 2.07 */
-
-/* Option vector 2: Open Firmware options supported */
-#define OV2_REAL_MODE		0x20	/* set if we want OF in real mode */
-
-/* Option vector 3: processor options supported */
-#define OV3_FP			0x80	/* floating point */
-#define OV3_VMX			0x40	/* VMX/Altivec */
-#define OV3_DFP			0x20	/* decimal FP */
-
-/* Option vector 4: IBM PAPR implementation */
-#define OV4_MIN_ENT_CAP		0x01	/* minimum VP entitled capacity */
-
-/* Option vector 5: PAPR/OF options supported */
-#define OV5_LPAR		0x80	/* logical partitioning supported */
-#define OV5_SPLPAR		0x40	/* shared-processor LPAR supported */
-/* ibm,dynamic-reconfiguration-memory property supported */
-#define OV5_DRCONF_MEMORY	0x20
-#define OV5_LARGE_PAGES		0x10	/* large pages supported */
-#define OV5_DONATE_DEDICATE_CPU 0x02	/* donate dedicated CPU support */
-/* PCIe/MSI support.  Without MSI full PCIe is not supported */
-#ifdef CONFIG_PCI_MSI
-#define OV5_MSI			0x01	/* PCIe/MSI support */
-#else
-#define OV5_MSI			0x00
-#endif /* CONFIG_PCI_MSI */
-#ifdef CONFIG_PPC_SMLPAR
-#define OV5_CMO			0x80	/* Cooperative Memory Overcommitment */
-#define OV5_XCMO			0x40	/* Page Coalescing */
-#else
-#define OV5_CMO			0x00
-#define OV5_XCMO			0x00
-#endif
-#define OV5_TYPE1_AFFINITY	0x80	/* Type 1 NUMA affinity */
-#define OV5_PFO_HW_RNG		0x80	/* PFO Random Number Generator */
-#define OV5_PFO_HW_842		0x40	/* PFO Compression Accelerator */
-#define OV5_PFO_HW_ENCR		0x20	/* PFO Encryption Accelerator */
-#define OV5_SUB_PROCESSORS	0x01    /* 1,2,or 4 Sub-Processors supported */
-
-/* Option Vector 6: IBM PAPR hints */
-#define OV6_LINUX		0x02	/* Linux is our OS */
-
-/*
- * The architecture vector has an array of PVR mask/value pairs,
- * followed by # option vectors - 1, followed by the option vectors.
- */
-static unsigned char ibm_architecture_vec[] = {
+unsigned char ibm_architecture_vec[] = {
 	W(0xfffe0000), W(0x003a0000),	/* POWER5/POWER5+ */
 	W(0xffff0000), W(0x003e0000),	/* POWER6 */
 	W(0xffff0000), W(0x003f0000),	/* POWER7 */


* [PATCH v2 4/11] Update firmware_has_feature() to check architecture bits
  2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
                   ` (2 preceding siblings ...)
  2013-03-25 18:53 ` [PATCH v2 3/11] Move architecture vector definitions to prom.h Nathan Fontenot
@ 2013-03-25 18:54 ` Nathan Fontenot
  2013-04-04  4:19   ` Paul Mackerras
  2013-03-25 18:56 ` [PATCH v2 5/11] Update numa.c to use updated firmware_has_feature() Nathan Fontenot
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 18:54 UTC (permalink / raw)
  To: linuxppc-dev

The firmware_has_feature() function makes it easy to check for supported
features of the hypervisor. This patch extends the capability of the
firmware_has_feature() function to include checking for specified bits
in vector 5 of the architecture vector as is reported in the device tree.

As part of this the #defines used for the architecture vector are
moved to prom.h and re-defined such that the vector 5 options have the vector
index and the feature bits encoded into them. This makes it much simpler
to add bits from the architecture vector to the checking done in
firmware_has_feature().

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/firmware.h       |    4 +
 arch/powerpc/include/asm/prom.h           |   45 +++++++++-----------
 arch/powerpc/kernel/prom_init.c           |   23 +++++++---
 arch/powerpc/platforms/pseries/firmware.c |   67 ++++++++++++++++++++++++++----
 arch/powerpc/platforms/pseries/pseries.h  |    5 +-
 arch/powerpc/platforms/pseries/setup.c    |   40 ++++++++++++-----
 6 files changed, 131 insertions(+), 53 deletions(-)
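
To make the encoding concrete, here is a minimal userspace sketch. The
OV5_FEAT() and OV5_INDX() macros and the two OV5_* values are taken from this
patch; the helper sketch_vec5_has() is hypothetical and is not part of the
series. It assumes the same indexing that fw_vec5_feature_init() uses, with
vec5[0] holding the length byte and the high byte of each define selecting
the byte to test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Macros and values as defined in the prom.h hunk of this patch. */
#define OV5_FEAT(x)	((x) & 0xff)
#define OV5_INDX(x)	((x) >> 8)
#define OV5_TYPE1_AFFINITY	0x0580	/* byte 5, bit 0x80 */
#define OV5_DRCONF_MEMORY	0x0220	/* byte 2, bit 0x20 */

/*
 * Hypothetical helper: test one encoded feature against a raw copy of
 * vector 5. Index 0 is the length byte, so it never holds a feature,
 * and indexes at or beyond len are treated as "not present".
 */
static bool sketch_vec5_has(const unsigned char *vec5, size_t len,
			    unsigned int feature)
{
	size_t index = OV5_INDX(feature);

	if (index == 0 || index >= len)
		return false;

	return (vec5[index] & OV5_FEAT(feature)) != 0;
}
```

Since each define names both the byte and the bit, a lookup table only has to
pair it with a FW_FEATURE_* flag, which is roughly what
vec5_fw_features_table does.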

Index: powerpc/arch/powerpc/include/asm/prom.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/prom.h	2013-03-25 10:47:54.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/prom.h	2013-03-25 11:07:56.000000000 -0500
@@ -111,31 +111,27 @@
 /* Option vector 4: IBM PAPR implementation */
 #define OV4_MIN_ENT_CAP		0x01	/* minimum VP entitled capacity */
 
-/* Option vector 5: PAPR/OF options supported */
-#define OV5_LPAR		0x80	/* logical partitioning supported */
-#define OV5_SPLPAR		0x40	/* shared-processor LPAR supported */
+/* Option vector 5: PAPR/OF options supported
+ * These bits are also used for the platform_has_feature() call so
+ * we encode the vector index in the define and use the OV5_FEAT()
+ * and OV5_INDX() macros to extract the desired information.
+ */
+#define OV5_FEAT(x)	((x) & 0xff)
+#define OV5_INDX(x)	((x) >> 8)
+#define OV5_LPAR		0x0280	/* logical partitioning supported */
+#define OV5_SPLPAR		0x0240	/* shared-processor LPAR supported */
 /* ibm,dynamic-reconfiguration-memory property supported */
-#define OV5_DRCONF_MEMORY	0x20
-#define OV5_LARGE_PAGES		0x10	/* large pages supported */
-#define OV5_DONATE_DEDICATE_CPU	0x02	/* donate dedicated CPU support */
-/* PCIe/MSI support.  Without MSI full PCIe is not supported */
-#ifdef CONFIG_PCI_MSI
-#define OV5_MSI			0x01	/* PCIe/MSI support */
-#else
-#define OV5_MSI			0x00
-#endif /* CONFIG_PCI_MSI */
-#ifdef CONFIG_PPC_SMLPAR
-#define OV5_CMO			0x80	/* Cooperative Memory Overcommitment */
-#define OV5_XCMO		0x40	/* Page Coalescing */
-#else
-#define OV5_CMO			0x00
-#define OV5_XCMO		0x00
-#endif
-#define OV5_TYPE1_AFFINITY	0x80	/* Type 1 NUMA affinity */
-#define OV5_PFO_HW_RNG		0x80	/* PFO Random Number Generator */
-#define OV5_PFO_HW_842		0x40	/* PFO Compression Accelerator */
-#define OV5_PFO_HW_ENCR		0x20	/* PFO Encryption Accelerator */
-#define OV5_SUB_PROCESSORS	0x01	/* 1,2,or 4 Sub-Processors supported */
+#define OV5_DRCONF_MEMORY	0x0220
+#define OV5_LARGE_PAGES		0x0210	/* large pages supported */
+#define OV5_DONATE_DEDICATE_CPU	0x0202	/* donate dedicated CPU support */
+#define OV5_MSI			0x0201	/* PCIe/MSI support */
+#define OV5_CMO			0x0480	/* Cooperative Memory Overcommitment */
+#define OV5_XCMO		0x0440	/* Page Coalescing */
+#define OV5_TYPE1_AFFINITY	0x0580	/* Type 1 NUMA affinity */
+#define OV5_PFO_HW_RNG		0x0E80	/* PFO Random Number Generator */
+#define OV5_PFO_HW_842		0x0E40	/* PFO Compression Accelerator */
+#define OV5_PFO_HW_ENCR		0x0E20	/* PFO Encryption Accelerator */
+#define OV5_SUB_PROCESSORS	0x0F01	/* 1,2,or 4 Sub-Processors supported */
 
 /* Option Vector 6: IBM PAPR hints */
 #define OV6_LINUX		0x02	/* Linux is our OS */
@@ -145,6 +141,7 @@
  * followed by # option vectors - 1, followed by the option vectors.
  */
 extern unsigned char ibm_architecture_vec[];
+bool platform_has_feature(unsigned int);
 #endif
 
 /* These includes are put at the bottom because they may contain things
Index: powerpc/arch/powerpc/kernel/prom_init.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/prom_init.c	2013-03-25 10:47:54.000000000 -0500
+++ powerpc/arch/powerpc/kernel/prom_init.c	2013-03-25 11:07:56.000000000 -0500
@@ -684,11 +684,21 @@
 	/* option vector 5: PAPR/OF options */
 	19 - 2,				/* length */
 	0,				/* don't ignore, don't halt */
-	OV5_LPAR | OV5_SPLPAR | OV5_LARGE_PAGES | OV5_DRCONF_MEMORY |
-	OV5_DONATE_DEDICATE_CPU | OV5_MSI,
+	OV5_FEAT(OV5_LPAR) | OV5_FEAT(OV5_SPLPAR) | OV5_FEAT(OV5_LARGE_PAGES) |
+	OV5_FEAT(OV5_DRCONF_MEMORY) | OV5_FEAT(OV5_DONATE_DEDICATE_CPU) |
+#ifdef CONFIG_PCI_MSI
+	/* PCIe/MSI support.  Without MSI full PCIe is not supported */
+	OV5_FEAT(OV5_MSI),
+#else
+	0,
+#endif
+	0,
+#ifdef CONFIG_PPC_SMLPAR
+	OV5_FEAT(OV5_CMO) | OV5_FEAT(OV5_XCMO),
+#else
 	0,
-	OV5_CMO | OV5_XCMO,
-	OV5_TYPE1_AFFINITY,
+#endif
+	OV5_FEAT(OV5_TYPE1_AFFINITY),
 	0,
 	0,
 	0,
@@ -702,8 +712,9 @@
 	0,
 	0,
 	0,
-	OV5_PFO_HW_RNG | OV5_PFO_HW_ENCR | OV5_PFO_HW_842,
-	OV5_SUB_PROCESSORS,
+	OV5_FEAT(OV5_PFO_HW_RNG) | OV5_FEAT(OV5_PFO_HW_ENCR) |
+	OV5_FEAT(OV5_PFO_HW_842),
+	OV5_FEAT(OV5_SUB_PROCESSORS),
 	/* option vector 6: IBM PAPR hints */
 	4 - 2,				/* length */
 	0,
Index: powerpc/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/setup.c	2013-03-25 10:22:22.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/setup.c	2013-03-25 11:09:45.000000000 -0500
@@ -628,25 +628,39 @@
  * Called very early, MMU is off, device-tree isn't unflattened
  */
 
-static int __init pSeries_probe_hypertas(unsigned long node,
-					 const char *uname, int depth,
-					 void *data)
+static int __init pseries_probe_fw_features(unsigned long node,
+					    const char *uname, int depth,
+					    void *data)
 {
-	const char *hypertas;
+	const char *prop;
 	unsigned long len;
+	static int hypertas_found;
+	static int vec5_found;
 
-	if (depth != 1 ||
-	    (strcmp(uname, "rtas") != 0 && strcmp(uname, "rtas@0") != 0))
+	if (depth != 1)
 		return 0;
 
-	hypertas = of_get_flat_dt_prop(node, "ibm,hypertas-functions", &len);
-	if (!hypertas)
-		return 1;
+	if (!strcmp(uname, "rtas") || !strcmp(uname, "rtas@0")) {
+		prop = of_get_flat_dt_prop(node, "ibm,hypertas-functions",
+					   &len);
+		if (prop) {
+			powerpc_firmware_features |= FW_FEATURE_LPAR;
+			fw_hypertas_feature_init(prop, len);
+		}
+
+		hypertas_found = 1;
+	}
+
+	if (!strcmp(uname, "chosen")) {
+		prop = of_get_flat_dt_prop(node, "ibm,architecture-vec-5",
+					   &len);
+		if (prop)
+			fw_vec5_feature_init(prop, len);
 
-	powerpc_firmware_features |= FW_FEATURE_LPAR;
-	fw_feature_init(hypertas, len);
+		vec5_found = 1;
+	}
 
-	return 1;
+	return hypertas_found && vec5_found;
 }
 
 static int __init pSeries_probe(void)
@@ -669,7 +683,7 @@
 	pr_debug("pSeries detected, looking for LPAR capability...\n");
 
 	/* Now try to figure out if we are running on LPAR */
-	of_scan_flat_dt(pSeries_probe_hypertas, NULL);
+	of_scan_flat_dt(pseries_probe_fw_features, NULL);
 
 	if (firmware_has_feature(FW_FEATURE_LPAR))
 		hpte_init_lpar();
Index: powerpc/arch/powerpc/platforms/pseries/firmware.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/firmware.c	2013-03-25 10:22:22.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/firmware.c	2013-03-25 11:11:27.000000000 -0500
@@ -31,15 +31,15 @@
 typedef struct {
     unsigned long val;
     char * name;
-} firmware_feature_t;
+} hypertas_fw_feature_t;
 
 /*
  * The names in this table match names in rtas/ibm,hypertas-functions.  If the
  * entry ends in a '*', only upto the '*' is matched.  Otherwise the entire
  * string must match.
  */
-static __initdata firmware_feature_t
-firmware_features_table[FIRMWARE_MAX_FEATURES] = {
+static __initdata hypertas_fw_feature_t
+hypertas_fw_features_table[FIRMWARE_MAX_FEATURES] = {
 	{FW_FEATURE_PFT,		"hcall-pft"},
 	{FW_FEATURE_TCE,		"hcall-tce"},
 	{FW_FEATURE_SPRG0,		"hcall-sprg0"},
@@ -69,16 +69,16 @@
  * device-tree/ibm,hypertas-functions.  Ultimately this functionality may
  * be moved into prom.c prom_init().
  */
-void __init fw_feature_init(const char *hypertas, unsigned long len)
+void __init fw_hypertas_feature_init(const char *hypertas, unsigned long len)
 {
 	const char *s;
 	int i;
 
-	pr_debug(" -> fw_feature_init()\n");
+	pr_debug(" -> fw_hypertas_feature_init()\n");
 
 	for (s = hypertas; s < hypertas + len; s += strlen(s) + 1) {
 		for (i = 0; i < FIRMWARE_MAX_FEATURES; i++) {
-			const char *name = firmware_features_table[i].name;
+			const char *name = hypertas_fw_features_table[i].name;
 			size_t size;
 			/* check value against table of strings */
 			if (!name)
@@ -96,10 +96,61 @@
 
 			/* we have a match */
 			powerpc_firmware_features |=
-				firmware_features_table[i].val;
+				hypertas_fw_features_table[i].val;
 			break;
 		}
 	}
 
-	pr_debug(" <- fw_feature_init()\n");
+	pr_debug(" <- fw_hypertas_feature_init()\n");
+}
+
+struct vec5_fw_feature {
+	unsigned long	val;
+	unsigned int	feature;
+
+};
+
+static __initdata struct vec5_fw_feature
+vec5_fw_features_table[FIRMWARE_MAX_FEATURES] = {
+	{FW_FEATURE_TYPE1_AFFINITY,	OV5_TYPE1_AFFINITY},
+};
+
+void __init fw_vec5_feature_init(const char *vec5, unsigned long len)
+{
+	const char *s;
+	int index;
+	int i, j;
+
+	pr_debug(" -> fw_vec5_feature_init()\n");
+
+	/* vec5[0] is the length, no need to check */
+	for (s = &vec5[1], index = 1; s < vec5 + len; s++, index++) {
+		if (*s == 0)
+			continue;
+
+		/* Check each bit for a possible match */
+		for (i = 0; i < 8; i++) {
+			unsigned int feat = (index << 8) | (1 << i);
+
+			if ((*s & OV5_FEAT(feat)) == 0)
+				continue;
+
+			/* Look for a match */
+			for (j = 0; j < FIRMWARE_MAX_FEATURES; j++) {
+				if (vec5_fw_features_table[j].val == 0)
+					continue;
+
+				if (vec5_fw_features_table[j].feature != feat)
+					continue;
+
+				/* we have a match */
+				powerpc_firmware_features |=
+					vec5_fw_features_table[j].val;
+
+				break;
+			}
+		}
+	}
+
+	pr_debug(" <- fw_vec5_feature_init()\n");
 }
Index: powerpc/arch/powerpc/include/asm/firmware.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/firmware.h	2013-03-25 10:22:22.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/firmware.h	2013-03-25 11:07:56.000000000 -0500
@@ -51,6 +51,7 @@
 #define FW_FEATURE_OPALv2	ASM_CONST(0x0000000020000000)
 #define FW_FEATURE_SET_MODE	ASM_CONST(0x0000000040000000)
 #define FW_FEATURE_BEST_ENERGY	ASM_CONST(0x0000000080000000)
+#define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0000000100000000)
 
 #ifndef __ASSEMBLY__
 
@@ -65,7 +66,8 @@
 		FW_FEATURE_BULK_REMOVE | FW_FEATURE_XDABR |
 		FW_FEATURE_MULTITCE | FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
 		FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
-		FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY,
+		FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
+		FW_FEATURE_TYPE1_AFFINITY,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_OPALv2,
 	FW_FEATURE_POWERNV_ALWAYS = 0,
Index: powerpc/arch/powerpc/platforms/pseries/pseries.h
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/pseries.h	2013-03-25 10:22:22.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/pseries.h	2013-03-25 11:07:56.000000000 -0500
@@ -19,7 +19,10 @@
 
 #include <linux/of.h>
 
-extern void __init fw_feature_init(const char *hypertas, unsigned long len);
+extern void __init fw_hypertas_feature_init(const char *hypertas,
+					    unsigned long len);
+extern void __init fw_vec5_feature_init(const char *vec5,
+					unsigned long len);
 
 struct pt_regs;
 

* [PATCH v2 5/11] Update numa.c to use updated firmware_has_feature()
  2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
                   ` (3 preceding siblings ...)
  2013-03-25 18:54 ` [PATCH v2 4/11] Update firmware_has_feature() to check architecture bits Nathan Fontenot
@ 2013-03-25 18:56 ` Nathan Fontenot
  2013-04-04  4:20   ` Paul Mackerras
  2013-03-25 18:57 ` [PATCH v2 6/11] Update CPU Maps Nathan Fontenot
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 18:56 UTC (permalink / raw)
  To: linuxppc-dev

Update the numa code to use the updated firmware_has_feature() when checking
for type 1 affinity.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
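For reference, the open-coded test removed below and a lookup keyed on
OV5_TYPE1_AFFINITY should land on the same bit. A minimal userspace sketch,
assuming OV5_FEAT() and OV5_INDX() split a feature code into its bit mask
(low byte) and byte index (high byte) as defined earlier in this series;
vec5_has_feature() is a hypothetical helper, not a kernel function:

```c
#include <stddef.h>

/* Assumed to match the macros introduced earlier in this series. */
#define OV5_FEAT(x)		((x) & 0xff)	/* bit mask within the byte */
#define OV5_INDX(x)		((x) >> 8)	/* byte index into vector 5 */
#define OV5_TYPE1_AFFINITY	0x0580		/* byte 5, bit 0x80 */

/* Values from the open-coded check that this patch deletes. */
#define VEC5_AFFINITY_BYTE	5
#define VEC5_AFFINITY		0x80

/* Hypothetical helper: test one OV5_* feature code against a vec5 buffer. */
static int vec5_has_feature(const unsigned char *vec5, size_t len,
			    unsigned int feat)
{
	size_t idx = OV5_INDX(feat);

	if (!vec5 || idx >= len)
		return 0;

	return (vec5[idx] & OV5_FEAT(feat)) != 0;
}
```

With a buffer whose byte 5 has 0x80 set, vec5_has_feature(buf, len,
OV5_TYPE1_AFFINITY) reports the feature, which is the same condition the
old vec5[VEC5_AFFINITY_BYTE] & VEC5_AFFINITY test checked.
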
 arch/powerpc/mm/numa.c |   22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)

Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-03-20 12:25:42.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-03-20 12:26:29.000000000 -0500
@@ -291,9 +291,7 @@
 static int __init find_min_common_depth(void)
 {
 	int depth;
-	struct device_node *chosen;
 	struct device_node *root;
-	const char *vec5;
 
 	if (firmware_has_feature(FW_FEATURE_OPAL))
 		root = of_find_node_by_path("/ibm,opal");
@@ -325,24 +323,10 @@
 
 	distance_ref_points_depth /= sizeof(int);
 
-#define VEC5_AFFINITY_BYTE	5
-#define VEC5_AFFINITY		0x80
-
-	if (firmware_has_feature(FW_FEATURE_OPAL))
+	if (firmware_has_feature(FW_FEATURE_OPAL) ||
+	    firmware_has_feature(FW_FEATURE_TYPE1_AFFINITY)) {
+		dbg("Using form 1 affinity\n");
 		form1_affinity = 1;
-	else {
-		chosen = of_find_node_by_path("/chosen");
-		if (chosen) {
-			vec5 = of_get_property(chosen,
-					       "ibm,architecture-vec-5", NULL);
-			if (vec5 && (vec5[VEC5_AFFINITY_BYTE] &
-							VEC5_AFFINITY)) {
-				dbg("Using form 1 affinity\n");
-				form1_affinity = 1;
-			}
-
-			of_node_put(chosen);
-		}
 	}
 
 	if (form1_affinity) {

* [PATCH v2 6/11] Update CPU Maps
  2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
                   ` (4 preceding siblings ...)
  2013-03-25 18:56 ` [PATCH v2 5/11] Update numa.c to use updated firmware_has_feature() Nathan Fontenot
@ 2013-03-25 18:57 ` Nathan Fontenot
  2013-04-04  4:42   ` Paul Mackerras
  2013-03-25 18:58 ` [PATCH v2 7/11] Use stop machine to update cpu maps Nathan Fontenot
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 18:57 UTC (permalink / raw)
  To: linuxppc-dev

From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>

Platform events such as partition migration or the new PRRN firmware
feature can cause the NUMA characteristics of a CPU to change, and these
changes will be reflected in the device tree nodes for the affected
CPUs.

This patch registers a handler for Open Firmware device tree updates
and reconfigures the CPU and node maps whenever the associativity
changes. Currently, this is accomplished by marking the affected CPUs in
the cpu_associativity_changes_mask and allowing
arch_update_cpu_topology() to retrieve the new associativity information
using hcall_vphn().

Protecting the NUMA cpu maps from concurrent access during an update
operation will be addressed in a subsequent patch in this series.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
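The staging scheme described above can be sketched in userspace as follows.
This is only an illustration of the bookkeeping, not the kernel code: a
64-bit word stands in for cpu_associativity_changes_mask, THREADS_PER_CORE
is an assumed value, and sibling_mask(), stage_update() and
process_updates() are hypothetical names. The point is that bits accumulate
until they are processed, and each bit is cleared only once its CPU has
been handled.

```c
#include <stdint.h>

#define THREADS_PER_CORE 4	/* assumption for this sketch */

/* Stand-in for cpu_associativity_changes_mask (CPUs 0..63 only). */
static uint64_t pending_mask;

/* All hardware threads that share a core with 'cpu'. */
static uint64_t sibling_mask(unsigned int cpu)
{
	unsigned int first = cpu - (cpu % THREADS_PER_CORE);

	return ((UINT64_C(1) << THREADS_PER_CORE) - 1) << first;
}

/* Device tree update seen for 'cpu': mark every sibling as pending. */
static void stage_update(unsigned int cpu)
{
	pending_mask |= sibling_mask(cpu);
}

/* Handle every pending CPU, clearing its bit once it has been handled. */
static int process_updates(void)
{
	int handled = 0;

	for (unsigned int cpu = 0; cpu < 64; cpu++) {
		if (!(pending_mask & (UINT64_C(1) << cpu)))
			continue;

		/* The real code looks up the new node and remaps here. */
		pending_mask &= ~(UINT64_C(1) << cpu);
		handled++;
	}

	return handled;
}
```

Two staged cores leave eight bits pending, and a single processing pass
then drains the mask.
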

 arch/powerpc/include/asm/firmware.h       |    3 
 arch/powerpc/include/asm/prom.h           |    1 
 arch/powerpc/mm/numa.c                    |   99 ++++++++++++++++++++++--------
 arch/powerpc/platforms/pseries/firmware.c |    1 
 4 files changed, 79 insertions(+), 25 deletions(-)

Index: powerpc/arch/powerpc/include/asm/prom.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/prom.h	2013-03-25 11:07:56.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/prom.h	2013-03-25 11:27:11.000000000 -0500
@@ -128,6 +128,7 @@
 #define OV5_CMO			0x0480	/* Cooperative Memory Overcommitment */
 #define OV5_XCMO		0x0440	/* Page Coalescing */
 #define OV5_TYPE1_AFFINITY	0x0580	/* Type 1 NUMA affinity */
+#define OV5_PRRN		0x0540	/* Platform Resource Reassignment */
 #define OV5_PFO_HW_RNG		0x0E80	/* PFO Random Number Generator */
 #define OV5_PFO_HW_842		0x0E40	/* PFO Compression Accelerator */
 #define OV5_PFO_HW_ENCR		0x0E20	/* PFO Encryption Accelerator */
Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-03-25 11:22:44.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-03-25 11:27:11.000000000 -0500
@@ -1257,7 +1257,8 @@
 static u8 vphn_cpu_change_counts[NR_CPUS][MAX_DISTANCE_REF_POINTS];
 static cpumask_t cpu_associativity_changes_mask;
 static int vphn_enabled;
-static void set_topology_timer(void);
+static int prrn_enabled;
+static void reset_topology_timer(void);
 
 /*
  * Store the current values of the associativity change counters in the
@@ -1293,11 +1294,9 @@
  */
 static int update_cpu_associativity_changes_mask(void)
 {
-	int cpu, nr_cpus = 0;
+	int cpu;
 	cpumask_t *changes = &cpu_associativity_changes_mask;
 
-	cpumask_clear(changes);
-
 	for_each_possible_cpu(cpu) {
 		int i, changed = 0;
 		u8 *counts = vphn_cpu_change_counts[cpu];
@@ -1311,11 +1310,10 @@
 		}
 		if (changed) {
 			cpumask_set_cpu(cpu, changes);
-			nr_cpus++;
 		}
 	}
 
-	return nr_cpus;
+	return cpumask_weight(changes);
 }
 
 /*
@@ -1416,7 +1414,7 @@
 	unsigned int associativity[VPHN_ASSOC_BUFSIZE] = {0};
 	struct device *dev;
 
-	for_each_cpu(cpu,&cpu_associativity_changes_mask) {
+	for_each_cpu(cpu, &cpu_associativity_changes_mask) {
 		vphn_get_associativity(cpu, associativity);
 		nid = associativity_to_nid(associativity);
 
@@ -1438,6 +1436,7 @@
 		dev = get_cpu_device(cpu);
 		if (dev)
 			kobject_uevent(&dev->kobj, KOBJ_CHANGE);
+		cpumask_clear_cpu(cpu, &cpu_associativity_changes_mask);
 		changed = 1;
 	}
 
@@ -1457,37 +1456,80 @@
 
 static void topology_timer_fn(unsigned long ignored)
 {
-	if (!vphn_enabled)
-		return;
-	if (update_cpu_associativity_changes_mask() > 0)
+	if (prrn_enabled && cpumask_weight(&cpu_associativity_changes_mask))
 		topology_schedule_update();
-	set_topology_timer();
+	else if (vphn_enabled) {
+		if (update_cpu_associativity_changes_mask() > 0)
+			topology_schedule_update();
+		reset_topology_timer();
+	}
 }
 static struct timer_list topology_timer =
 	TIMER_INITIALIZER(topology_timer_fn, 0, 0);
 
-static void set_topology_timer(void)
+static void reset_topology_timer(void)
 {
 	topology_timer.data = 0;
 	topology_timer.expires = jiffies + 60 * HZ;
-	add_timer(&topology_timer);
+	mod_timer(&topology_timer, topology_timer.expires);
+}
+
+static void stage_topology_update(int core_id)
+{
+	cpumask_or(&cpu_associativity_changes_mask,
+		&cpu_associativity_changes_mask, cpu_sibling_mask(core_id));
+	reset_topology_timer();
 }
 
+static int dt_update_callback(struct notifier_block *nb,
+				unsigned long action, void *data)
+{
+	struct of_prop_reconfig *update;
+	int rc = NOTIFY_DONE;
+
+	switch (action) {
+	case OF_RECONFIG_ADD_PROPERTY:
+	case OF_RECONFIG_UPDATE_PROPERTY:
+		update = (struct of_prop_reconfig *)data;
+		if (!of_prop_cmp(update->dn->type, "cpu")) {
+			u32 core_id;
+			of_property_read_u32(update->dn, "reg", &core_id);
+			stage_topology_update(core_id);
+			rc = NOTIFY_OK;
+		}
+		break;
+	}
+
+	return rc;
+}
+
+static struct notifier_block dt_update_nb = {
+	.notifier_call = dt_update_callback,
+};
+
 /*
- * Start polling for VPHN associativity changes.
+ * Start polling for associativity changes.
  */
 int start_topology_update(void)
 {
 	int rc = 0;
 
-	/* Disabled until races with load balancing are fixed */
-	if (0 && firmware_has_feature(FW_FEATURE_VPHN) &&
-	    get_lppaca()->shared_proc) {
-		vphn_enabled = 1;
-		setup_cpu_associativity_change_counters();
-		init_timer_deferrable(&topology_timer);
-		set_topology_timer();
-		rc = 1;
+	if (firmware_has_feature(FW_FEATURE_PRRN)) {
+		if (!prrn_enabled) {
+			prrn_enabled = 1;
+			vphn_enabled = 0;
+			rc = of_reconfig_notifier_register(&dt_update_nb);
+		}
+	} else if (0 && firmware_has_feature(FW_FEATURE_VPHN) &&
+		   get_lppaca()->shared_proc) {
+		/* Disabled until races with load balancing are fixed */
+		if (!vphn_enabled) {
+			prrn_enabled = 0;
+			vphn_enabled = 1;
+			setup_cpu_associativity_change_counters();
+			init_timer_deferrable(&topology_timer);
+			reset_topology_timer();
+		}
 	}
 
 	return rc;
@@ -1499,7 +1541,16 @@
  */
 int stop_topology_update(void)
 {
-	vphn_enabled = 0;
-	return del_timer_sync(&topology_timer);
+	int rc = 0;
+
+	if (prrn_enabled) {
+		prrn_enabled = 0;
+		rc = of_reconfig_notifier_unregister(&dt_update_nb);
+	} else if (vphn_enabled) {
+		vphn_enabled = 0;
+		rc = del_timer_sync(&topology_timer);
+	}
+
+	return rc;
 }
 #endif /* CONFIG_PPC_SPLPAR */
Index: powerpc/arch/powerpc/include/asm/firmware.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/firmware.h	2013-03-25 11:07:56.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/firmware.h	2013-03-25 11:27:11.000000000 -0500
@@ -52,6 +52,7 @@
 #define FW_FEATURE_SET_MODE	ASM_CONST(0x0000000040000000)
 #define FW_FEATURE_BEST_ENERGY	ASM_CONST(0x0000000080000000)
 #define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0000000100000000)
+#define FW_FEATURE_PRRN		ASM_CONST(0x0000000200000000)
 
 #ifndef __ASSEMBLY__
 
@@ -67,7 +68,7 @@
 		FW_FEATURE_MULTITCE | FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
 		FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
 		FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
-		FW_FEATURE_TYPE1_AFFINITY,
+		FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_OPALv2,
 	FW_FEATURE_POWERNV_ALWAYS = 0,
Index: powerpc/arch/powerpc/platforms/pseries/firmware.c
===================================================================
--- powerpc.orig/arch/powerpc/platforms/pseries/firmware.c	2013-03-25 11:11:27.000000000 -0500
+++ powerpc/arch/powerpc/platforms/pseries/firmware.c	2013-03-25 11:27:11.000000000 -0500
@@ -113,6 +113,7 @@
 static __initdata struct vec5_fw_feature
 vec5_fw_features_table[FIRMWARE_MAX_FEATURES] = {
 	{FW_FEATURE_TYPE1_AFFINITY,	OV5_TYPE1_AFFINITY},
+	{FW_FEATURE_PRRN,		OV5_PRRN},
 };
 
 void __init fw_vec5_feature_init(const char *vec5, unsigned long len)

* [PATCH v2 7/11] Use stop machine to update cpu maps
  2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
                   ` (5 preceding siblings ...)
  2013-03-25 18:57 ` [PATCH v2 6/11] Update CPU Maps Nathan Fontenot
@ 2013-03-25 18:58 ` Nathan Fontenot
  2013-04-04  4:46   ` Paul Mackerras
  2013-03-25 18:59 ` [PATCH v2 8/11] Update numa cpu vdso info Nathan Fontenot
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 18:58 UTC (permalink / raw)
  To: linuxppc-dev

From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>

The new PRRN firmware feature allows CPU and memory resources to be
transparently reassigned across NUMA boundaries. When this happens, the
kernel must update the node maps to reflect the new affinity
information.

Although the NUMA maps can be protected by locking primitives during the
update itself, this is insufficient to prevent concurrent accesses to these
structures. Since cpumask_of_node() hands out a pointer to these
structures, they can still be modified outside of the lock. Furthermore,
tracking down each usage of these pointers and adding locks would be quite
invasive and difficult to maintain.

Situations like these are best handled using stop_machine(). Since the NUMA
affinity updates are exceptionally rare events, this approach has the
benefit of not adding any overhead while accessing the NUMA maps during
normal operation.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
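The escaping-pointer problem described above can be shown with a small
single-threaded userspace sketch. Everything here is a stand-in chosen for
illustration: node_to_cpumask_sketch is a two-entry array rather than the
kernel's node maps, map_lock_held is a plain flag rather than a real lock,
and the function names are hypothetical. It only demonstrates that a
pointer handed out under the lock keeps observing later updates after the
lock has been dropped.

```c
#include <stdint.h>

#define NR_NODES 2

/* Stand-in for the per-node CPU maps: node 0 = CPUs 0-3, node 1 = CPUs 4-7. */
static uint64_t node_to_cpumask_sketch[NR_NODES] = { 0x0F, 0xF0 };
static int map_lock_held;	/* stand-in for a real lock */

/* Hands out a pointer into the map; the pointer outlives the lock. */
static const uint64_t *cpumask_of_node_sketch(int nid)
{
	const uint64_t *mask;

	map_lock_held = 1;
	mask = &node_to_cpumask_sketch[nid];
	map_lock_held = 0;

	return mask;
}

/* Moves one CPU between nodes while "holding" the lock. */
static void move_cpu_sketch(unsigned int cpu, int old_nid, int new_nid)
{
	map_lock_held = 1;
	node_to_cpumask_sketch[old_nid] &= ~(UINT64_C(1) << cpu);
	node_to_cpumask_sketch[new_nid] |= UINT64_C(1) << cpu;
	map_lock_held = 0;
}
```

A caller that fetched the node 0 mask before move_cpu_sketch(3, 0, 1) will
read a different value through the same pointer afterwards, even though
both operations took the lock. In the kernel the two sides can run
concurrently, which is why the update is done under stop_machine() instead.
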
 arch/powerpc/mm/numa.c |   51 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 16 deletions(-)

Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-03-20 12:26:36.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-03-20 12:27:43.000000000 -0500
@@ -22,6 +22,7 @@
 #include <linux/pfn.h>
 #include <linux/cpuset.h>
 #include <linux/node.h>
+#include <linux/stop_machine.h>
 #include <asm/sparsemem.h>
 #include <asm/prom.h>
 #include <asm/smp.h>
@@ -1254,6 +1255,12 @@
 
 /* Virtual Processor Home Node (VPHN) support */
 #ifdef CONFIG_PPC_SPLPAR
+struct topology_update_data {
+	int cpu;
+	int old_nid;
+	int new_nid;
+};
+
 static u8 vphn_cpu_change_counts[NR_CPUS][MAX_DISTANCE_REF_POINTS];
 static cpumask_t cpu_associativity_changes_mask;
 static int vphn_enabled;
@@ -1405,34 +1412,46 @@
 }
 
 /*
+ * Update the CPU maps and sysfs entries for a single CPU when its NUMA
+ * characteristics change. This function doesn't perform any locking and is
+ * only safe to call from stop_machine().
+ */
+static int update_cpu_topology(void *data)
+{
+	struct topology_update_data *update = data;
+
+	if (!update)
+		return -EINVAL;
+
+	unregister_cpu_under_node(update->cpu, update->old_nid);
+	unmap_cpu_from_node(update->cpu);
+	map_cpu_to_node(update->cpu, update->new_nid);
+	register_cpu_under_node(update->cpu, update->new_nid);
+
+	return 0;
+}
+
+/*
  * Update the node maps and sysfs entries for each cpu whose home node
  * has changed. Returns 1 when the topology has changed, and 0 otherwise.
  */
 int arch_update_cpu_topology(void)
 {
-	int cpu, nid, old_nid, changed = 0;
+	int cpu, changed = 0;
+	struct topology_update_data update;
 	unsigned int associativity[VPHN_ASSOC_BUFSIZE] = {0};
 	struct device *dev;
 
 	for_each_cpu(cpu, &cpu_associativity_changes_mask) {
+		update.cpu = cpu;
 		vphn_get_associativity(cpu, associativity);
-		nid = associativity_to_nid(associativity);
-
-		if (nid < 0 || !node_online(nid))
-			nid = first_online_node;
+		update.new_nid = associativity_to_nid(associativity);
 
-		old_nid = numa_cpu_lookup_table[cpu];
-
-		/* Disable hotplug while we update the cpu
-		 * masks and sysfs.
-		 */
-		get_online_cpus();
-		unregister_cpu_under_node(cpu, old_nid);
-		unmap_cpu_from_node(cpu);
-		map_cpu_to_node(cpu, nid);
-		register_cpu_under_node(cpu, nid);
-		put_online_cpus();
+		if (update.new_nid < 0 || !node_online(update.new_nid))
+			update.new_nid = first_online_node;
 
+		update.old_nid = numa_cpu_lookup_table[cpu];
+		stop_machine(update_cpu_topology, &update, cpu_online_mask);
 		dev = get_cpu_device(cpu);
 		if (dev)
 			kobject_uevent(&dev->kobj, KOBJ_CHANGE);

* [PATCH v2 8/11] Update numa cpu vdso info
  2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
                   ` (6 preceding siblings ...)
  2013-03-25 18:58 ` [PATCH v2 7/11] Use stop machine to update cpu maps Nathan Fontenot
@ 2013-03-25 18:59 ` Nathan Fontenot
  2013-03-25 19:00 ` [PATCH v2 9/11] Re-enable Virtual Processor Home Node capabilities Nathan Fontenot
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 18:59 UTC (permalink / raw)
  To: linuxppc-dev

From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>

Commit 18ad51dd34 ("powerpc: Add VDSO version of getcpu") adds
vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3.

This patch ensures that this information is also updated when the NUMA
affinity of a cpu changes.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
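To see why the cached value goes stale, here is a userspace sketch of
packing a CPU number and NUMA node into one word. The field layout (CPU in
the low 12 bits, node in the 16 bits starting at bit 16) is an assumption
made for this illustration and should be checked against
vdso_getcpu_init() itself; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: CPU number in bits 0-11, NUMA node in bits 16-31. */
static uint64_t pack_getcpu_val(unsigned int cpu, unsigned int node)
{
	return (uint64_t)(cpu & 0xfff) | ((uint64_t)(node & 0xffff) << 16);
}

static unsigned int getcpu_val_to_cpu(uint64_t val)
{
	return (unsigned int)(val & 0xfff);
}

static unsigned int getcpu_val_to_node(uint64_t val)
{
	return (unsigned int)((val >> 16) & 0xffff);
}
```

A value packed while CPU 7 belonged to node 1 keeps decoding to node 1
until it is packed again with the new node, which is what calling
vdso_getcpu_init() from the update path takes care of.
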
 arch/powerpc/mm/numa.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-03-20 12:27:43.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-03-20 12:27:46.000000000 -0500
@@ -30,6 +30,7 @@
 #include <asm/paca.h>
 #include <asm/hvcall.h>
 #include <asm/setup.h>
+#include <asm/vdso.h>
 
 static int numa_enabled = 1;
 
@@ -1426,6 +1427,7 @@
 	unregister_cpu_under_node(update->cpu, update->old_nid);
 	unmap_cpu_from_node(update->cpu);
 	map_cpu_to_node(update->cpu, update->new_nid);
+	vdso_getcpu_init();
 	register_cpu_under_node(update->cpu, update->new_nid);
 
 	return 0;
@@ -1440,8 +1442,11 @@
 	int cpu, changed = 0;
 	struct topology_update_data update;
 	unsigned int associativity[VPHN_ASSOC_BUFSIZE] = {0};
+	cpumask_t updated_cpu;
 	struct device *dev;
 
+	cpumask_clear(&updated_cpu);
+
 	for_each_cpu(cpu, &cpu_associativity_changes_mask) {
 		update.cpu = cpu;
 		vphn_get_associativity(cpu, associativity);
@@ -1451,7 +1456,8 @@
 			update.new_nid = first_online_node;
 
 		update.old_nid = numa_cpu_lookup_table[cpu];
-		stop_machine(update_cpu_topology, &update, cpu_online_mask);
+		cpumask_set_cpu(cpu, &updated_cpu);
+		stop_machine(update_cpu_topology, &update, &updated_cpu);
 		dev = get_cpu_device(cpu);
 		if (dev)
 			kobject_uevent(&dev->kobj, KOBJ_CHANGE);

* [PATCH v2 9/11] Re-enable Virtual Processor Home Node capabilities
  2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
                   ` (7 preceding siblings ...)
  2013-03-25 18:59 ` [PATCH v2 8/11] Update numa cpu vdso info Nathan Fontenot
@ 2013-03-25 19:00 ` Nathan Fontenot
  2013-03-25 19:01 ` [PATCH v2 10/11] Enable PRRN Nathan Fontenot
  2013-03-25 19:02 ` [PATCH v2 11/11] Add /proc interface to control topology updates Nathan Fontenot
  10 siblings, 0 replies; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 19:00 UTC (permalink / raw)
  To: linuxppc-dev

From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>

The new PRRN firmware feature provides a more convenient and event-driven
interface than VPHN for notifying Linux of changes to the NUMA affinity of
platform resources. However, for practical reasons, it may not be feasible
for some customers to update to the latest firmware. For these customers,
the VPHN feature supported on previous firmware versions may still be the
best option.

The VPHN feature was previously disabled due to races with the load
balancing code when accessing the NUMA cpu maps, but the new stop_machine()
approach protects the NUMA cpu maps from these concurrent accesses. It
should be safe to re-enable this feature now.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-03-20 12:27:46.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-03-20 12:27:48.000000000 -0500
@@ -1545,9 +1545,8 @@
 			vphn_enabled = 0;
 			rc = of_reconfig_notifier_register(&dt_update_nb);
 		}
-	} else if (0 && firmware_has_feature(FW_FEATURE_VPHN) &&
+	} else if (firmware_has_feature(FW_FEATURE_VPHN) &&
 		   get_lppaca()->shared_proc) {
-		/* Disabled until races with load balancing are fixed */
 		if (!vphn_enabled) {
 			prrn_enabled = 0;
 			vphn_enabled = 1;

* [PATCH v2 10/11] Enable PRRN
  2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
                   ` (8 preceding siblings ...)
  2013-03-25 19:00 ` [PATCH v2 9/11] Re-enable Virtual Processor Home Node capabilities Nathan Fontenot
@ 2013-03-25 19:01 ` Nathan Fontenot
  2013-03-25 19:02 ` [PATCH v2 11/11] Add /proc interface to control topology updates Nathan Fontenot
  10 siblings, 0 replies; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 19:01 UTC (permalink / raw)
  To: linuxppc-dev

The Linux kernel and platform firmware negotiate their mutual support
of the PRRN option via the ibm,client-architecture-support interface.
This patch simply sets the appropriate fields in the client architecture
vector to indicate Linux support and will cause the firmware to begin
sending PRRN events via the RTAS event-scan mechanism.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
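The one-line change below works because both options live in the same byte
of vector 5. A small sketch of the arithmetic, assuming OV5_FEAT() and
OV5_INDX() extract the low-byte bit mask and the high-byte index as defined
earlier in this series; ov5_affinity_byte() is a hypothetical name used
only here:

```c
/* Assumed to match the macros introduced earlier in this series. */
#define OV5_FEAT(x)		((x) & 0xff)	/* bit mask within the byte */
#define OV5_INDX(x)		((x) >> 8)	/* byte index into vector 5 */

#define OV5_TYPE1_AFFINITY	0x0580	/* Type 1 NUMA affinity */
#define OV5_PRRN		0x0540	/* Platform Resource Reassignment */

/* Value of the vector 5 byte that advertises both options. */
static unsigned char ov5_affinity_byte(void)
{
	return OV5_FEAT(OV5_TYPE1_AFFINITY) | OV5_FEAT(OV5_PRRN);
}
```

Both codes share byte index 5, so OR-ing their masks (0x80 and 0x40) into
the existing array slot is enough to advertise PRRN support.
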
 arch/powerpc/kernel/prom_init.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: powerpc/arch/powerpc/kernel/prom_init.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/prom_init.c	2013-03-20 12:25:38.000000000 -0500
+++ powerpc/arch/powerpc/kernel/prom_init.c	2013-03-20 12:27:50.000000000 -0500
@@ -698,7 +698,7 @@
 #else
 	0,
 #endif
-	OV5_FEAT(OV5_TYPE1_AFFINITY),
+	OV5_FEAT(OV5_TYPE1_AFFINITY) | OV5_FEAT(OV5_PRRN),
 	0,
 	0,
 	0,

* [PATCH v2 11/11] Add /proc interface to control topology updates
  2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
                   ` (9 preceding siblings ...)
  2013-03-25 19:01 ` [PATCH v2 10/11] Enable PRRN Nathan Fontenot
@ 2013-03-25 19:02 ` Nathan Fontenot
  2013-04-10  6:59   ` Michael Ellerman
  10 siblings, 1 reply; 26+ messages in thread
From: Nathan Fontenot @ 2013-03-25 19:02 UTC (permalink / raw)
  To: linuxppc-dev

There are instances in which we do not want topology updates to occur.
To allow this, a /proc interface (/proc/powerpc/topology_updates) is
introduced so that topology updates can be enabled and disabled.

This patch also adds a prrn_is_enabled() call so that PRRN events are
handled in the kernel only if topology updating is enabled.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
---
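The parsing done by the write handler can be tried out in userspace with
the sketch below. It mirrors the handler's logic (copy at most three
bytes, terminate, compare against "on" and "off") but uses memcpy() in
place of copy_from_user(); parse_topology_cmd() and the enum are
hypothetical names introduced for this illustration.

```c
#include <stddef.h>
#include <string.h>

enum topology_cmd {
	TOPOLOGY_INVALID = -1,
	TOPOLOGY_OFF = 0,
	TOPOLOGY_ON = 1,
};

/* Classify a buffer written to the control file. */
static enum topology_cmd parse_topology_cmd(const char *buf, size_t count)
{
	char kbuf[4];	/* "on" or "off" plus NUL */
	size_t len = count < 3 ? count : 3;

	memcpy(kbuf, buf, len);
	kbuf[len] = '\0';

	if (!strncmp(kbuf, "on", 2))
		return TOPOLOGY_ON;
	if (!strncmp(kbuf, "off", 3))
		return TOPOLOGY_OFF;

	return TOPOLOGY_INVALID;
}
```

Because only the first three bytes are examined, a trailing newline as
produced by echo is ignored, and anything that does not start with "on"
or "off" is rejected.
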
 arch/powerpc/include/asm/topology.h |    5 ++
 arch/powerpc/kernel/rtasd.c         |    6 ++-
 arch/powerpc/mm/numa.c              |   62 +++++++++++++++++++++++++++++++++++-
 3 files changed, 70 insertions(+), 3 deletions(-)

Index: powerpc/arch/powerpc/mm/numa.c
===================================================================
--- powerpc.orig/arch/powerpc/mm/numa.c	2013-03-20 12:27:48.000000000 -0500
+++ powerpc/arch/powerpc/mm/numa.c	2013-03-20 12:27:52.000000000 -0500
@@ -23,6 +23,9 @@
 #include <linux/cpuset.h>
 #include <linux/node.h>
 #include <linux/stop_machine.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/uaccess.h>
 #include <asm/sparsemem.h>
 #include <asm/prom.h>
 #include <asm/smp.h>
@@ -1558,7 +1561,6 @@
 
 	return rc;
 }
-__initcall(start_topology_update);
 
 /*
  * Disable polling for VPHN associativity changes.
@@ -1577,4 +1579,62 @@
 
 	return rc;
 }
+
+inline int prrn_is_enabled(void)
+{
+	return prrn_enabled;
+}
+
+static int topology_read(struct seq_file *file, void *v)
+{
+	if (vphn_enabled || prrn_enabled)
+		seq_puts(file, "on\n");
+	else
+		seq_puts(file, "off\n");
+
+	return 0;
+}
+
+static int topology_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, topology_read, NULL);
+}
+
+static ssize_t topology_write(struct file *file, const char __user *buf,
+			      size_t count, loff_t *off)
+{
+	char kbuf[4]; /* "on" or "off" plus null. */
+	int read_len;
+
+	read_len = count < 3 ? count : 3;
+	if (copy_from_user(kbuf, buf, read_len))
+		return -EINVAL;
+
+	kbuf[read_len] = '\0';
+
+	if (!strncmp(kbuf, "on", 2))
+		start_topology_update();
+	else if (!strncmp(kbuf, "off", 3))
+		stop_topology_update();
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+static const struct file_operations topology_ops = {
+	.read = seq_read,
+	.write = topology_write,
+	.open = topology_open,
+	.release = single_release
+};
+
+static int topology_update_init(void)
+{
+	start_topology_update();
+	proc_create("powerpc/topology_updates", 0644, NULL, &topology_ops);
+
+	return 0;
+}
+device_initcall(topology_update_init);
 #endif /* CONFIG_PPC_SPLPAR */
Index: powerpc/arch/powerpc/include/asm/topology.h
===================================================================
--- powerpc.orig/arch/powerpc/include/asm/topology.h	2013-03-20 12:25:37.000000000 -0500
+++ powerpc/arch/powerpc/include/asm/topology.h	2013-03-20 12:27:52.000000000 -0500
@@ -71,6 +71,7 @@
 #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
 extern int start_topology_update(void);
 extern int stop_topology_update(void);
+extern inline int prrn_is_enabled(void);
 #else
 static inline int start_topology_update(void)
 {
@@ -80,6 +81,10 @@
 {
 	return 0;
 }
+static inline int prrn_is_enabled(void)
+{
+	return 0;
+}
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
 #include <asm-generic/topology.h>
Index: powerpc/arch/powerpc/kernel/rtasd.c
===================================================================
--- powerpc.orig/arch/powerpc/kernel/rtasd.c	2013-03-20 12:25:37.000000000 -0500
+++ powerpc/arch/powerpc/kernel/rtasd.c	2013-03-20 12:27:52.000000000 -0500
@@ -292,11 +292,13 @@
 {
 	pSeries_log_error((char *)log, ERR_TYPE_RTAS_LOG, 0);
 
-	if (log->type == RTAS_TYPE_PRRN)
+	if (log->type == RTAS_TYPE_PRRN) {
 		/* For PRRN Events the extended log length is used to denote
 		 * the scope for calling rtas update-nodes.
 		 */
-		prrn_schedule_update(log->extended_log_length);
+		if (prrn_is_enabled())
+			prrn_schedule_update(log->extended_log_length);
+	}
 
 	return;
 }

* Re: [PATCH v2 1/11] Expose pseries devicetree_update()
  2013-03-25 18:51 ` [PATCH v2 1/11] Expose pseries devicetree_update() Nathan Fontenot
@ 2013-04-04  3:09   ` Paul Mackerras
  0 siblings, 0 replies; 26+ messages in thread
From: Paul Mackerras @ 2013-04-04  3:09 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Mon, Mar 25, 2013 at 01:51:38PM -0500, Nathan Fontenot wrote:
> From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
> 
> Newer firmware on Power systems can transparently reassign platform resources
> (CPU and Memory) in use. For instance, if a processor or memory unit is
> predicted to fail, the platform may transparently move the processing to an
> equivalent unused processor or the memory state to an equivalent unused
> memory unit. However, reassigning resources across NUMA boundaries may alter
> the performance of the partition. When such reassignment is necessary, the
> Platform Resource Reassignment Notification (PRRN) option provides a
> mechanism to inform the Linux kernel of changes to the NUMA affinity of
> its platform resources.
> 
> When rtasd receives a PRRN event, it needs to make a series of RTAS
> calls (ibm,update-nodes and ibm,update-properties) to retrieve the
> updated device tree information. These calls are already handled in the
> pseries_devtree_update() routine used in partition migration.
> 
> This patch simply exposes pseries_devicetree_update() so it can be
> called by rtasd. pseries_devicetree_update() and supporting functions
> are also modified to take a 32-bit 'scope' parameter. This parameter is
> required by the ibm,update-nodes/ibm,update-properties RTAS calls, and
> the appropriate value is contained within the RTAS event for PRRN
> notifications. In pseries_devicetree_update() it was previously
> hard-coded to 1, the scope value for partition migration.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

Acked-by: Paul Mackerras <paulus@samba.org>

* Re: [PATCH v2 2/11] Add PRRN Event Handler
  2013-03-25 18:52 ` [PATCH v2 2/11] Add PRRN Event Handler Nathan Fontenot
@ 2013-04-04  3:34   ` Paul Mackerras
  2013-04-04  7:16     ` Benjamin Herrenschmidt
  2013-04-05 15:43     ` Nathan Fontenot
  2013-04-10  8:30   ` Michael Ellerman
  1 sibling, 2 replies; 26+ messages in thread
From: Paul Mackerras @ 2013-04-04  3:34 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Mon, Mar 25, 2013 at 01:52:32PM -0500, Nathan Fontenot wrote:
> From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
> 
> A PRRN event is signaled via the RTAS event-scan mechanism, which
> returns a Hot Plug Event message "fixed part" indicating "Platform
> Resource Reassignment". In response to the Hot Plug Event message,
> we must call ibm,update-nodes to determine which resources were
> reassigned and then ibm,update-properties to obtain the new affinity
> information about those resources.
> 
> The PRRN event-scan RTAS message contains only the "fixed part" with
> the "Type" field set to the value 160 and no Extended Event Log. The
> four-byte Extended Event Log Length field is repurposed (since no
> Extended Event Log message is included) to pass the "scope" parameter
> that causes the ibm,update-nodes to return the nodes affected by the
> specific resource reassignment.
> 
> This patch adds a handler in rtasd for PRRN RTAS events. The function
> pseries_devicetree_update() (from mobility.c) is used to make the
> ibm,update-nodes/ibm,update-properties RTAS calls. Updating the NUMA maps
> (handled by a subsequent patch) will require significant processing,
> so pseries_devicetree_update() is called from an asynchronous workqueue
> to allow rtasd to continue processing events. Since we flush all work
> on the queue before handling any new work there should only be one event
> in flight of being handled at a time.
            ^^ "of" is superfluous
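
The event layout in the quoted description can be sketched roughly as
below. This is only an illustration: struct fixed_part and
prrn_scope_arg() are hypothetical names, and the struct is a simplified
stand-in for the real RTAS header, which packs more fields. The type
value 160 and the negated scope come from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

#define RTAS_TYPE_PRRN 160	/* "Type" value given in the patch description */

/* Hypothetical, simplified view of the event "fixed part". */
struct fixed_part {
	uint8_t  type;			/* event type */
	uint32_t extended_log_length;	/* carries the scope for PRRN events */
};

/*
 * Returns 1 and stores the value to hand to the device tree update when
 * the event is a PRRN event, 0 otherwise.
 */
int prrn_scope_arg(const struct fixed_part *ev, int32_t *arg)
{
	assert(ev && arg);
	if (ev->type != RTAS_TYPE_PRRN)
		return 0;
	/* The patch passes the negative of the scope found in the event. */
	*arg = -(int32_t)ev->extended_log_length;
	return 1;
}
```

A caller would check the return value and, on 1, queue the work that
performs the device tree update with the stored argument.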

In the worst case where PRRN events come close together in time, the
flush_work will block for however long it takes to do this
"significant processing", meaning that we're no better off using a
workqueue.  Do we have any reason to think that these PRRN events will
normally be widely spaced in time?  If so you should mention it in the
patch description.

Also, rtasd isn't actually a task, it's just a function that gets run
via schedule_delayed_work_on() and re-schedules itself each time it
runs.  Is there any deadlock possibility in calling flush_work from a
work function?

Paul.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/11] Update firmware_has_feature() to check architecture bits
  2013-03-25 18:54 ` [PATCH v2 4/11] Update firmware_has_feature() to check architecture bits Nathan Fontenot
@ 2013-04-04  4:19   ` Paul Mackerras
  0 siblings, 0 replies; 26+ messages in thread
From: Paul Mackerras @ 2013-04-04  4:19 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Mon, Mar 25, 2013 at 01:54:54PM -0500, Nathan Fontenot wrote:
> The firmware_has_feature() function makes it easy to check for supported
> features of the hypervisor. This patch extends the capability of the
> firmware_has_feature() function to include checking for specified bits
> in vector 5 of the architecture vector as is reported in the device tree.
> 
> As part of this the #defines used for the architecture vector are
> moved to prom.h and re-defined such that the vector 5 options have the vector
> index and the feature bits encoded into them. This makes for a much
> simpler design to add bits from the architecture vector to be added to
> the checking done in firmware_has_feature().
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

Acked-by: Paul Mackerras <paulus@samba.org>

The inner loop in fw_vec5_feature_init is perhaps a bit less efficient
than it could be, but I don't imagine it's going to take a noticeable
amount of time.

Paul.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/11] Update numa.c to use updated firmware_has_feature()
  2013-03-25 18:56 ` [PATCH v2 5/11] Update numa.c to use updated firmware_has_feature() Nathan Fontenot
@ 2013-04-04  4:20   ` Paul Mackerras
  0 siblings, 0 replies; 26+ messages in thread
From: Paul Mackerras @ 2013-04-04  4:20 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Mon, Mar 25, 2013 at 01:56:05PM -0500, Nathan Fontenot wrote:
> Update the numa code to use the updated firmware_has_feature() when checking
> for type 1 affinity.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

Acked-by: Paul Mackerras <paulus@samba.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 6/11] Update CPU Maps
  2013-03-25 18:57 ` [PATCH v2 6/11] Update CPU Maps Nathan Fontenot
@ 2013-04-04  4:42   ` Paul Mackerras
  2013-04-05 18:02     ` Nathan Fontenot
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Mackerras @ 2013-04-04  4:42 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Mon, Mar 25, 2013 at 01:57:08PM -0500, Nathan Fontenot wrote:
> From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
> 
> Platform events such as partition migration or the new PRRN firmware
> feature can cause the NUMA characteristics of a CPU to change, and these
> changes will be reflected in the device tree nodes for the affected
> CPUs.
> 
> This patch registers a handler for Open Firmware device tree updates
> and reconfigures the CPU and node maps whenever the associativity
> changes. Currently, this is accomplished by marking the affected CPUs in
> the cpu_associativity_changes_mask and allowing
> arch_update_cpu_topology() to retrieve the new associativity information
> using hcall_vphn().
> 
> Protecting the NUMA cpu maps from concurrent access during an update
> operation will be addressed in a subsequent patch in this series.
> 
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

[snip]

> +	if (firmware_has_feature(OV5_PRRN)) {

Shouldn't this be FW_FEATURE_PRRN?  How well has this patch been
tested? :-/

Paul.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/11] Use stop machine to update cpu maps
  2013-03-25 18:58 ` [PATCH v2 7/11] Use stop machine to update cpu maps Nathan Fontenot
@ 2013-04-04  4:46   ` Paul Mackerras
  2013-04-05 18:22     ` Nathan Fontenot
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Mackerras @ 2013-04-04  4:46 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Mon, Mar 25, 2013 at 01:58:04PM -0500, Nathan Fontenot wrote:
> From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
> 
> The new PRRN firmware feature allows CPU and memory resources to be
> transparently reassigned across NUMA boundaries. When this happens, the
> kernel must update the node maps to reflect the new affinity
> information.
> 
> Although the NUMA maps can be protected by locking primitives during the
> update itself, this is insufficient to prevent concurrent accesses to these
> structures. Since cpumask_of_node() hands out a pointer to these
> structures, they can still be modified outside of the lock. Furthermore,
> tracking down each usage of these pointers and adding locks would be quite
> invasive and difficult to maintain.
> 
> Situations like these are best handled using stop_machine(). Since the NUMA
> affinity updates are exceptionally rare events, this approach has the
> benefit of not adding any overhead while accessing the NUMA maps during
> normal operation.

I notice you do one stop_machine() call for every cpu whose affinity
has changed.  Couldn't we update the affinity for them all in one
stop_machine call?  Given that stopping the whole machine can be quite
slow, wouldn't it be better to do one call rather than potentially
many?
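
A rough userspace sketch of the batching idea, not kernel code:
fake_stop_machine() is a made-up stand-in that only counts how often the
machine would be stopped, and struct batch, apply_batch() and
update_all() are hypothetical names. It assumes the changed CPUs can be
collected into a mask before any stop is issued.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64

int stop_calls;			/* counts simulated machine stops */
int cpu_node[NR_CPUS];		/* stand-in for the cpu-to-node map */

/* Stand-in for stop_machine(): run fn(data) once, counting the stop. */
int fake_stop_machine(int (*fn)(void *), void *data)
{
	stop_calls++;
	return fn(data);
}

struct batch {
	uint64_t changed;	/* bit n set: CPU n has a new node */
	int new_node[NR_CPUS];
};

/* Applies every pending update during a single stop. */
int apply_batch(void *data)
{
	struct batch *b = data;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (b->changed & (UINT64_C(1) << cpu))
			cpu_node[cpu] = b->new_node[cpu];
	return 0;
}

int update_all(struct batch *b)
{
	assert(b);
	if (!b->changed)
		return 0;	/* nothing to do, no stop needed */
	return fake_stop_machine(apply_batch, b);
}
```

With two CPUs marked in the mask, update_all() stops the (simulated)
machine once rather than twice.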

Paul.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/11] Add PRRN Event Handler
  2013-04-04  3:34   ` Paul Mackerras
@ 2013-04-04  7:16     ` Benjamin Herrenschmidt
  2013-04-05 15:43     ` Nathan Fontenot
  1 sibling, 0 replies; 26+ messages in thread
From: Benjamin Herrenschmidt @ 2013-04-04  7:16 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Nathan Fontenot, linuxppc-dev

On Thu, 2013-04-04 at 14:34 +1100, Paul Mackerras wrote:
> Also, rtasd isn't actually a task, it's just a function that gets run
> via schedule_delayed_work_on() and re-schedules itself each time it
> runs.  Is there any deadlock possibility in calling flush_work from a
> work function?

There used to be, but I'm not familiar with the "new" implementation of
the work queue stuff.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/11] Add PRRN Event Handler
  2013-04-04  3:34   ` Paul Mackerras
  2013-04-04  7:16     ` Benjamin Herrenschmidt
@ 2013-04-05 15:43     ` Nathan Fontenot
  1 sibling, 0 replies; 26+ messages in thread
From: Nathan Fontenot @ 2013-04-05 15:43 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

On 04/03/2013 10:34 PM, Paul Mackerras wrote:
> On Mon, Mar 25, 2013 at 01:52:32PM -0500, Nathan Fontenot wrote:
>> From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
>>
>> A PRRN event is signaled via the RTAS event-scan mechanism, which
>> returns a Hot Plug Event message "fixed part" indicating "Platform
>> Resource Reassignment". In response to the Hot Plug Event message,
>> we must call ibm,update-nodes to determine which resources were
>> reassigned and then ibm,update-properties to obtain the new affinity
>> information about those resources.
>>
>> The PRRN event-scan RTAS message contains only the "fixed part" with
>> the "Type" field set to the value 160 and no Extended Event Log. The
>> four-byte Extended Event Log Length field is repurposed (since no
>> Extended Event Log message is included) to pass the "scope" parameter
>> that causes the ibm,update-nodes to return the nodes affected by the
>> specific resource reassignment.
>>
>> This patch adds a handler in rtasd for PRRN RTAS events. The function
>> pseries_devicetree_update() (from mobility.c) is used to make the
>> ibm,update-nodes/ibm,update-properties RTAS calls. Updating the NUMA maps
>> (handled by a subsequent patch) will require significant processing,
>> so pseries_devicetree_update() is called from an asynchronous workqueue
>> to allow rtasd to continue processing events. Since we flush all work
>> on the queue before handling any new work there should only be one event
>> in flight of being handled at a time.
>             ^^ "of" is superfluous

will remove it.

> 
> In the worst case where PRRN events come close together in time, the
> flush_work will block for however long it takes to do this
> "significant processing", meaning that we're no better off using a
> workqueue.  Do we have any reason to think that these PRRN events will
> normally be widely spaced in time?  If so you should mention it in the
> patch description.

Yes. PRRN events can only be triggered from the HMC by an IBM tech who has
to actually log into a customer system and initiate the PRRN event. There
is no method for a user to initiate a PRRN event. Given this, it is safe
to assume that these events will be widely spaced in time.

> 
> Also, rtasd isn't actually a task, it's just a function that gets run
> via schedule_delayed_work_on() and re-schedules itself each time it
> runs.  Is there any deadlock possibility in calling flush_work from a
> work function?

I don't know of any but I will investigate.

Thanks for the feedback.
-Nathan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 6/11] Update CPU Maps
  2013-04-04  4:42   ` Paul Mackerras
@ 2013-04-05 18:02     ` Nathan Fontenot
  0 siblings, 0 replies; 26+ messages in thread
From: Nathan Fontenot @ 2013-04-05 18:02 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

On 04/03/2013 11:42 PM, Paul Mackerras wrote:
> On Mon, Mar 25, 2013 at 01:57:08PM -0500, Nathan Fontenot wrote:
>> From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
>>
>> Platform events such as partition migration or the new PRRN firmware
>> feature can cause the NUMA characteristics of a CPU to change, and these
>> changes will be reflected in the device tree nodes for the affected
>> CPUs.
>>
>> This patch registers a handler for Open Firmware device tree updates
>> and reconfigures the CPU and node maps whenever the associativity
>> changes. Currently, this is accomplished by marking the affected CPUs in
>> the cpu_associativity_changes_mask and allowing
>> arch_update_cpu_topology() to retrieve the new associativity information
>> using hcall_vphn().
>>
>> Protecting the NUMA cpu maps from concurrent access during an update
>> operation will be addressed in a subsequent patch in this series.
>>
>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
> 
> [snip]
> 
>> +	if (firmware_has_feature(OV5_PRRN)) {
> 
> Shouldn't this be FW_FEATURE_PRRN?  How well has this patch been
> tested? :-/

Yes this should have been FW_FEATURE_PRRN.

I know I tested this and it took some digging to find out why my test succeeded
even though I used the wrong value in the call to firmware_has_feature. The value
for OV5_PRRN (0x0540) just happens to match some of the bits that are set in
the powerpc_firmware_features bit field and causes the check to return true. My
test worked out of sheer luck. I'll update this patch and re-test to ensure it
works with the real value.

This does make me think: should we update firmware_has_feature() to avoid this
kind of false positive in the future? Something like

#define firmware_has_feature(feature)                                              \
        ((FW_FEATURE_ALWAYS & (feature)) == (feature) ||                           \
         (FW_FEATURE_POSSIBLE & powerpc_firmware_features & (feature)) == (feature))
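
To illustrate the difference, here is a small standalone sketch. It
assumes the existing check is a plain non-zero test, as the false
positive described above suggests. has_any(), has_all() and the feature
word used in demo() are made up for the example; only the OV5_PRRN value
is taken from this message.

```c
#include <assert.h>

#define OV5_PRRN 0x0540UL	/* value quoted in this message */

/* Non-zero test: true if ANY requested bit is set. */
int has_any(unsigned long features, unsigned long mask)
{
	return (features & mask) != 0;
}

/* Stricter test: true only if ALL requested bits are set. */
int has_all(unsigned long features, unsigned long mask)
{
	return (features & mask) == mask;
}

void demo(void)
{
	/* Hypothetical feature word sharing a single bit with OV5_PRRN. */
	unsigned long features = 0x0040UL;

	assert(has_any(features, OV5_PRRN));	/* false positive */
	assert(!has_all(features, OV5_PRRN));	/* correctly rejected */
}
```

For single-bit FW_FEATURE_* values the two tests agree, so the stricter
form should only change the outcome when a multi-bit value is passed in
by mistake.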

-Nathan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/11] Use stop machine to update cpu maps
  2013-04-04  4:46   ` Paul Mackerras
@ 2013-04-05 18:22     ` Nathan Fontenot
  2013-04-23  0:23       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 26+ messages in thread
From: Nathan Fontenot @ 2013-04-05 18:22 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

On 04/03/2013 11:46 PM, Paul Mackerras wrote:
> On Mon, Mar 25, 2013 at 01:58:04PM -0500, Nathan Fontenot wrote:
>> From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
>>
>> The new PRRN firmware feature allows CPU and memory resources to be
>> transparently reassigned across NUMA boundaries. When this happens, the
>> kernel must update the node maps to reflect the new affinity
>> information.
>>
>> Although the NUMA maps can be protected by locking primitives during the
>> update itself, this is insufficient to prevent concurrent accesses to these
>> structures. Since cpumask_of_node() hands out a pointer to these
>> structures, they can still be modified outside of the lock. Furthermore,
>> tracking down each usage of these pointers and adding locks would be quite
>> invasive and difficult to maintain.
>>
>> Situations like these are best handled using stop_machine(). Since the NUMA
>> affinity updates are exceptionally rare events, this approach has the
>> benefit of not adding any overhead while accessing the NUMA maps during
>> normal operation.
> 
> I notice you do one stop_machine() call for every cpu whose affinity
> has changed.  Couldn't we update the affinity for them all in one
> stop_machine call?  Given that stopping the whole machine can be quite
> slow, wouldn't it be better to do one call rather than potentially
> many?
> 

Agreed, having to call stop_machine() for each cpu that gets updated is
pretty brutal. The plus side is that PRRN events should be a rare occurrence
and not cause too much pain.

The current design ties into the of notification chain so that we can do
the affinity update when the affinity property in the device tree is updated.
Switching to doing one stop and updating all of the cpus would require a
design change....and....

I went back and looked at the code again and there is another issue with
the way this is done. Tying into the of notification chain is great for
being informed of when a property changes but the code (from patch 6/11)

+	case OF_RECONFIG_ADD_PROPERTY:
+	case OF_RECONFIG_UPDATE_PROPERTY:
+		update = (struct of_prop_reconfig *)data;
+		if (!of_prop_cmp(update->dn->type, "cpu")) {
+			u32 core_id;
+			of_property_read_u32(update->dn, "reg", &core_id);
+			stage_topology_update(core_id);
+			rc = NOTIFY_OK;
+		}
+		break;

Does not check to see which property is being updated and just assumes
the affinity is being updated. This code as is will do an affinity update
every time any property of a cpu is updated or added.

Since this needs an update I will also look at possibly doing this so
that we call stop_machine only once.
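
As a sketch of the missing check, the handler could compare the name of
the property being changed before staging an update. The structs below
are hypothetical stand-ins for the device tree types, and the property
name "ibm,associativity" is an assumption for the example; the thread
only says "the affinity property".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the device tree structures. */
struct node {
	const char *type;
	unsigned int reg;
};

struct prop_update {
	const struct node *dn;
	const char *prop_name;
};

/* Assumed name of the affinity property; not stated in this thread. */
#define AFFINITY_PROP "ibm,associativity"

/* Returns 1 if this update should stage a topology update for the CPU. */
int wants_topology_update(const struct prop_update *u)
{
	assert(u && u->dn);
	if (strcmp(u->dn->type, "cpu") != 0)
		return 0;	/* not a CPU node */
	if (strcmp(u->prop_name, AFFINITY_PROP) != 0)
		return 0;	/* some other CPU property changed */
	return 1;
}
```

Only a CPU node whose affinity property changed would then reach
stage_topology_update(); any other property update is ignored.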

-- 
-Nathan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 11/11] Add /proc interface to control topology updates
  2013-03-25 19:02 ` [PATCH v2 11/11] Add /proc interface to control topology updates Nathan Fontenot
@ 2013-04-10  6:59   ` Michael Ellerman
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Ellerman @ 2013-04-10  6:59 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Mon, Mar 25, 2013 at 02:02:09PM -0500, Nathan Fontenot wrote:
> There are instances in which we do not want topology updates to occur.
> In order to allow this a /proc interface (/proc/powerpc/topology_updates)
> is introduced so that topology updates can be enabled and disabled.
> 
> This patch also adds a prrn_is_enabled() call so that PRRN events are
> handled in the kernel only if topology updating is enabled.

Hi Nathan,
 
> Index: powerpc/arch/powerpc/mm/numa.c
> ===================================================================
> --- powerpc.orig/arch/powerpc/mm/numa.c	2013-03-20 12:27:48.000000000 -0500
> +++ powerpc/arch/powerpc/mm/numa.c	2013-03-20 12:27:52.000000000 -0500
> @@ -1577,4 +1579,62 @@
>  
>  	return rc;
>  }
> +
> +inline int prrn_is_enabled(void)
> +{
> +	return prrn_enabled;
> +}

...

> Index: powerpc/arch/powerpc/include/asm/topology.h
> ===================================================================
> --- powerpc.orig/arch/powerpc/include/asm/topology.h	2013-03-20 12:25:37.000000000 -0500
> +++ powerpc/arch/powerpc/include/asm/topology.h	2013-03-20 12:27:52.000000000 -0500
> @@ -71,6 +71,7 @@
>  #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
>  extern int start_topology_update(void);
>  extern int stop_topology_update(void);
> +extern inline int prrn_is_enabled(void);

This doesn't compile for me, with:

arch/powerpc/kernel/rtasd.c: In function 'rtas_event_scan':
arch/powerpc/include/asm/topology.h:74:19: sorry, unimplemented: inlining failed in call to 'prrn_is_enabled': function body not available
arch/powerpc/kernel/rtasd.c:299:22: sorry, unimplemented: called from here
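
The error presumably arises because the header promises an inline
function whose body is only visible in numa.c. One possible arrangement,
sketched here as a standalone file rather than as the actual fix, is to
define the accessor as static inline where every caller can see the
body (the other option being to drop "inline" from the declaration).
The default value of prrn_enabled and check_prrn_toggle() are made up
for the example.

```c
#include <assert.h>

/* Flag controlled through the /proc interface; default is hypothetical. */
int prrn_enabled = 1;

/* Header side: the body is visible to every file that includes it. */
static inline int prrn_is_enabled(void)
{
	return prrn_enabled;
}

/* Minimal self-check of the accessor. */
void check_prrn_toggle(void)
{
	prrn_enabled = 0;
	assert(!prrn_is_enabled());
	prrn_enabled = 1;
	assert(prrn_is_enabled());
}
```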


cheers

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/11] Add PRRN Event Handler
  2013-03-25 18:52 ` [PATCH v2 2/11] Add PRRN Event Handler Nathan Fontenot
  2013-04-04  3:34   ` Paul Mackerras
@ 2013-04-10  8:30   ` Michael Ellerman
  2013-04-15 20:12     ` Nathan Fontenot
  1 sibling, 1 reply; 26+ messages in thread
From: Michael Ellerman @ 2013-04-10  8:30 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev

On Mon, Mar 25, 2013 at 01:52:32PM -0500, Nathan Fontenot wrote:
> From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
> 
> A PRRN event is signaled via the RTAS event-scan mechanism, which
> returns a Hot Plug Event message "fixed part" indicating "Platform
> Resource Reassignment". In response to the Hot Plug Event message,
> we must call ibm,update-nodes to determine which resources were
> reassigned and then ibm,update-properties to obtain the new affinity
> information about those resources.
..

> Index: powerpc/arch/powerpc/kernel/rtasd.c
> ===================================================================
> --- powerpc.orig/arch/powerpc/kernel/rtasd.c	2013-03-20 08:24:14.000000000 -0500
> +++ powerpc/arch/powerpc/kernel/rtasd.c	2013-03-20 08:52:08.000000000 -0500
> @@ -87,6 +87,8 @@
>  			return "Resource Deallocation Event";
>  		case RTAS_TYPE_DUMP:
>  			return "Dump Notification Event";
> +		case RTAS_TYPE_PRRN:
> +			return "Platform Resource Reassignment Event";
>  	}
>  
>  	return rtas_type[0];
> @@ -265,7 +267,38 @@
>  		spin_unlock_irqrestore(&rtasd_log_lock, s);
>  		return;
>  	}
> +}
> +
> +static s32 update_scope;
> +
> +static void prrn_work_fn(struct work_struct *work)
> +{
> +	/*
> +	 * For PRRN, we must pass the negative of the scope value in
> +	 * the RTAS event.
> +	 */
> +	pseries_devicetree_update(-update_scope);
> +}
> +static DECLARE_WORK(prrn_work, prrn_work_fn);

This breaks the 32-bit build (ppc6xx_defconfig):

arch/powerpc/kernel/rtasd.c:280: undefined reference to `pseries_devicetree_update'
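
One conventional way to avoid this kind of link failure is a stub for
configurations that do not build the pseries code. The sketch below is
a standalone illustration under that assumption, not necessarily the
fix that should go into the series; handle_prrn() is a hypothetical
caller and the int32_t parameter mirrors the 32-bit scope from patch
1/11.

```c
#include <assert.h>
#include <stdint.h>

/*
 * CONFIG_PPC_PSERIES is left undefined here, as it would be for a
 * 32-bit configuration such as ppc6xx_defconfig.
 */
#ifdef CONFIG_PPC_PSERIES
int pseries_devicetree_update(int32_t scope);	/* provided by mobility.c */
#else
/* Stub: nothing to update when the pseries code is not built. */
static inline int pseries_devicetree_update(int32_t scope)
{
	(void)scope;
	return 0;
}
#endif

/* Hypothetical caller standing in for the PRRN work function. */
int handle_prrn(int32_t scope)
{
	return pseries_devicetree_update(-scope);
}
```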

cheers

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/11] Add PRRN Event Handler
  2013-04-10  8:30   ` Michael Ellerman
@ 2013-04-15 20:12     ` Nathan Fontenot
  0 siblings, 0 replies; 26+ messages in thread
From: Nathan Fontenot @ 2013-04-15 20:12 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On 04/10/2013 03:30 AM, Michael Ellerman wrote:
> On Mon, Mar 25, 2013 at 01:52:32PM -0500, Nathan Fontenot wrote:
>> From: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
>>
>> A PRRN event is signaled via the RTAS event-scan mechanism, which
>> returns a Hot Plug Event message "fixed part" indicating "Platform
>> Resource Reassignment". In response to the Hot Plug Event message,
>> we must call ibm,update-nodes to determine which resources were
>> reassigned and then ibm,update-properties to obtain the new affinity
>> information about those resources.
> ..
> 
>> Index: powerpc/arch/powerpc/kernel/rtasd.c
>> ===================================================================
>> --- powerpc.orig/arch/powerpc/kernel/rtasd.c	2013-03-20 08:24:14.000000000 -0500
>> +++ powerpc/arch/powerpc/kernel/rtasd.c	2013-03-20 08:52:08.000000000 -0500
>> @@ -87,6 +87,8 @@
>>  			return "Resource Deallocation Event";
>>  		case RTAS_TYPE_DUMP:
>>  			return "Dump Notification Event";
>> +		case RTAS_TYPE_PRRN:
>> +			return "Platform Resource Reassignment Event";
>>  	}
>>  
>>  	return rtas_type[0];
>> @@ -265,7 +267,38 @@
>>  		spin_unlock_irqrestore(&rtasd_log_lock, s);
>>  		return;
>>  	}
>> +}
>> +
>> +static s32 update_scope;
>> +
>> +static void prrn_work_fn(struct work_struct *work)
>> +{
>> +	/*
>> +	 * For PRRN, we must pass the negative of the scope value in
>> +	 * the RTAS event.
>> +	 */
>> +	pseries_devicetree_update(-update_scope);
>> +}
>> +static DECLARE_WORK(prrn_work, prrn_work_fn);
> 
> This breaks the 32-bit build (ppc6xx_defconfig):
> 
> arch/powerpc/kernel/rtasd.c:280: undefined reference to `pseries_devicetree_update'
> 

I'm not seeing this error. rtasd.c compiles fine, but I am hitting another
error later in the build that keeps it from finishing.

arch/powerpc/platforms/52xx/mpc52xx_pic.c: In function ‘mpc52xx_irqhost_map’:
arch/powerpc/platforms/52xx/mpc52xx_pic.c:343: error: ‘irqchip’ may be used uninitialized in this function


-Nathan 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/11] Use stop machine to update cpu maps
  2013-04-05 18:22     ` Nathan Fontenot
@ 2013-04-23  0:23       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 26+ messages in thread
From: Benjamin Herrenschmidt @ 2013-04-23  0:23 UTC (permalink / raw)
  To: Nathan Fontenot; +Cc: linuxppc-dev, Paul Mackerras

On Fri, 2013-04-05 at 13:22 -0500, Nathan Fontenot wrote:

> Agreed, having to call stop_machine() for each cpu that gets updated is
> pretty brutal. The plus side is that PRRN events should be a rare occurrence
> and not cause too much pain.

So that doesn't happen on VPHN changes ?

> The current design ties into the of notification chain so that we can do
> the affinity update when the affinity property in the device tree is updated.
> Switching to doing one stop and updating all of the cpus would require a
> design change....and....
> 
> I went back and looked at the code again and there is another issue with
> the way this is done. Tying into the of notification chain is great for
> being informed of when a property changes but the code (from patch 6/11)
> 
> +	case OF_RECONFIG_ADD_PROPERTY:
> +	case OF_RECONFIG_UPDATE_PROPERTY:
> +		update = (struct of_prop_reconfig *)data;
> +		if (!of_prop_cmp(update->dn->type, "cpu")) {
> +			u32 core_id;
> +			of_property_read_u32(update->dn, "reg", &core_id);
> +			stage_topology_update(core_id);
> +			rc = NOTIFY_OK;
> +		}
> +		break;
> 
> Does not check to see which property is being updated and just assumes
> the affinity is being updated. This code as is will do an affinity update
> every time any property of a cpu is updated or added.
> 
> Since this needs an update I will also look at possibly doing this so
> that we call stop_machine only once.

Any new patch set ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-04-23  0:27 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-25 18:43 [PATCH v2 0/11] NUMA CPU Reconfiguration using PRRN Nathan Fontenot
2013-03-25 18:51 ` [PATCH v2 1/11] Expose pseries devicetree_update() Nathan Fontenot
2013-04-04  3:09   ` Paul Mackerras
2013-03-25 18:52 ` [PATCH v2 2/11] Add PRRN Event Handler Nathan Fontenot
2013-04-04  3:34   ` Paul Mackerras
2013-04-04  7:16     ` Benjamin Herrenschmidt
2013-04-05 15:43     ` Nathan Fontenot
2013-04-10  8:30   ` Michael Ellerman
2013-04-15 20:12     ` Nathan Fontenot
2013-03-25 18:53 ` [PATCH v2 3/11] Move architecture vector definitions to prom.h Nathan Fontenot
2013-03-25 18:54 ` [PATCH v2 4/11] Update firmware_has_feature() to check architecture bits Nathan Fontenot
2013-04-04  4:19   ` Paul Mackerras
2013-03-25 18:56 ` [PATCH v2 5/11] Update numa.c to use updated firmware_has_feature() Nathan Fontenot
2013-04-04  4:20   ` Paul Mackerras
2013-03-25 18:57 ` [PATCH v2 6/11] Update CPU Maps Nathan Fontenot
2013-04-04  4:42   ` Paul Mackerras
2013-04-05 18:02     ` Nathan Fontenot
2013-03-25 18:58 ` [PATCH v2 7/11] Use stop machine to update cpu maps Nathan Fontenot
2013-04-04  4:46   ` Paul Mackerras
2013-04-05 18:22     ` Nathan Fontenot
2013-04-23  0:23       ` Benjamin Herrenschmidt
2013-03-25 18:59 ` [PATCH v2 8/11] Update numa cpu vdso info Nathan Fontenot
2013-03-25 19:00 ` [PATCH v2 9/11] Re-enable Virtual Private Home Node capabilities Nathan Fontenot
2013-03-25 19:01 ` [PATCH v2 10/11] Enable PRRN Nathan Fontenot
2013-03-25 19:02 ` [PATCH v2 11/11] Add /proc interface to control topology updates Nathan Fontenot
2013-04-10  6:59   ` Michael Ellerman
