linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/4] pseries: Add cede support for cpu-offline
@ 2009-10-09  8:30 Gautham R Shenoy
  2009-10-09  8:30 ` [PATCH v4 1/4] pSeries: extended_cede_processor() helper function Gautham R Shenoy
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Gautham R Shenoy @ 2009-10-09  8:30 UTC (permalink / raw)
  To: Nathan Fontenot, Benjamin Herrenschmidt, linuxppc-dev,
	Peter Zijlstra, linux-kernel
  Cc: Arun R Bharadwaj

Hi,

This is version 4 of patch series that provides a framework to choose the
state a pseries CPU must be put to when it is offlined.

Previous versions can be found here:
Version 3: http://lkml.org/lkml/2009/9/15/164
Version 2: http://lkml.org/lkml/2009/8/28/102
Version 1: http://lkml.org/lkml/2009/8/6/236

This patch series differs considerably from the previous iterations in that:
- It is based on top of Nathan Fontenot's
  "pseries kernel handling of dynamic logical partitioning v2" patches
  found here: http://lkml.org/lkml/2009/9/18/173

- It does away with the sysfs tunables "available_hotplug_states" and
  "currrent_hotplug_state", thereby refraining from exposing any platform
  specific option to the userspace.

- It provides a framework that discovers the support for ceding the vcpu
  to the hypervisor with a latency specifier value. When such a support is
  present, it would put the offlined CPUs into a ceded state while
  setting the cede   latency specifier value in the VPA indicating
  the latency expectation of the   guest OS. Hypervisor may use this for
  better energy savings.

- On platforms which don't have support for ceding with a latency specifier
  hint, it puts the offlined CPUs into the rtas_stop_self() state.

- Currently, Nathan's patches have provided sysfs interfaces
	/sys/devices/system/cpu/{probe, release}
  to dynamically allocate and deallocate resources respectively. The current
  process of allocating and deallocating CPUs would require two steps,
  namely:
	- Write to the "probe/release" file
	- Write to the "online" file.

  This patch-series combines these two operations such that when the user
  writes to the "probe", the CPUs would not only get allocated to the
  partition but also onlined. Similarly, when the user writes to "release",
  the CPUs would be offlined before they are deallocated.

  Thus, the user will now have to play with only one set of sysfs tunables
  for performing deallocate and the sysfs tunable "online" can be used for
  the purpose of deactivating the CPU while still letting the LPAR own it.

- It provides serializations for handling the writes the the "online" file
  against the writes to "probe/release" pair so that at any given instant,
  the user could either be deallocating the CPUs or deactivating it, but
  not both.

With this approach, I hope that the objection regarding the layering
violation that was raised by Peter for previous approaches is
addressed, while also providing clean interfaces for the userspace tools
to distinguish between the deallocation and deactivation without exposing
any platform specific details.

The patchset has been tested on the available pseries platforms and it
works as per the expectations.

Comments are very much appreciated.
---

Gautham R Shenoy (4):
      pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
      pseries: Add code to online/offline CPUs of a DLPAR node.
      pSeries: Add hooks to put the CPU into an appropriate offline state
      pSeries: extended_cede_processor() helper function.


 arch/powerpc/include/asm/lppaca.h               |    9 +
 arch/powerpc/platforms/pseries/dlpar.c          |  129 ++++++++++++++++++-
 arch/powerpc/platforms/pseries/hotplug-cpu.c    |  157 ++++++++++++++++++++++-
 arch/powerpc/platforms/pseries/offline_states.h |   18 +++
 arch/powerpc/platforms/pseries/plpar_wrappers.h |   22 +++
 arch/powerpc/platforms/pseries/smp.c            |   19 +++
 arch/powerpc/xmon/xmon.c                        |    3 
 drivers/base/cpu.c                              |    2 
 include/linux/cpu.h                             |   13 ++
 9 files changed, 356 insertions(+), 16 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_states.h

-- 
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v4 1/4] pSeries: extended_cede_processor() helper function.
  2009-10-09  8:30 [PATCH v4 0/4] pseries: Add cede support for cpu-offline Gautham R Shenoy
@ 2009-10-09  8:30 ` Gautham R Shenoy
  2009-10-09  8:30 ` [PATCH v4 2/4] pSeries: Add hooks to put the CPU into an appropriate offline state Gautham R Shenoy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Gautham R Shenoy @ 2009-10-09  8:30 UTC (permalink / raw)
  To: Nathan Fontenot, Benjamin Herrenschmidt, linuxppc-dev,
	Peter Zijlstra, linux-kernel
  Cc: Arun R Bharadwaj

This patch provides an extended_cede_processor() helper function
which takes the cede latency hint as an argument. This hint is to be passed
on to the hypervisor to cede to the corresponding state on platforms
which support it.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/lppaca.h               |    9 ++++++++-
 arch/powerpc/platforms/pseries/plpar_wrappers.h |   22 ++++++++++++++++++++++
 arch/powerpc/xmon/xmon.c                        |    3 ++-
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index f78f65c..14b592d 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -100,7 +100,14 @@ struct lppaca {
 	// Used to pass parms from the OS to PLIC for SetAsrAndRfid
 	u64	saved_gpr3;		// Saved GPR3                   x20-x27
 	u64	saved_gpr4;		// Saved GPR4                   x28-x2F
-	u64	saved_gpr5;		// Saved GPR5                   x30-x37
+	union {
+		u64	saved_gpr5;	/* Saved GPR5               x30-x37 */
+		struct {
+			u8	cede_latency_hint;  /*			x30 */
+			u8	reserved[7];        /*		    x31-x36 */
+		} fields;
+	} gpr5_dword;
+
 
 	u8	dtl_enable_mask;	// Dispatch Trace Log mask	x38-x38
 	u8	donate_dedicated_cpu;	// Donate dedicated CPU cycles  x39-x39
diff --git a/arch/powerpc/platforms/pseries/plpar_wrappers.h b/arch/powerpc/platforms/pseries/plpar_wrappers.h
index a24a6b2..0603c91 100644
--- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -9,11 +9,33 @@ static inline long poll_pending(void)
 	return plpar_hcall_norets(H_POLL_PENDING);
 }
 
+static inline u8 get_cede_latency_hint(void)
+{
+	return get_lppaca()->gpr5_dword.fields.cede_latency_hint;
+}
+
+static inline void set_cede_latency_hint(u8 latency_hint)
+{
+	get_lppaca()->gpr5_dword.fields.cede_latency_hint = latency_hint;
+}
+
 static inline long cede_processor(void)
 {
 	return plpar_hcall_norets(H_CEDE);
 }
 
+static inline long extended_cede_processor(unsigned long latency_hint)
+{
+	long rc;
+	u8 old_latency_hint = get_cede_latency_hint();
+
+	set_cede_latency_hint(latency_hint);
+	rc = cede_processor();
+	set_cede_latency_hint(old_latency_hint);
+
+	return rc;
+}
+
 static inline long vpa_call(unsigned long flags, unsigned long cpu,
 		unsigned long vpa)
 {
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index c6f0a71..57124cf 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1623,7 +1623,8 @@ static void super_regs(void)
 			       ptrLpPaca->saved_srr0, ptrLpPaca->saved_srr1);
 			printf("    Saved Gpr3=%.16lx  Saved Gpr4=%.16lx \n",
 			       ptrLpPaca->saved_gpr3, ptrLpPaca->saved_gpr4);
-			printf("    Saved Gpr5=%.16lx \n", ptrLpPaca->saved_gpr5);
+			printf("    Saved Gpr5=%.16lx \n",
+				ptrLpPaca->gpr5_dword.saved_gpr5);
 		}
 #endif
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v4 2/4] pSeries: Add hooks to put the CPU into an appropriate offline state
  2009-10-09  8:30 [PATCH v4 0/4] pseries: Add cede support for cpu-offline Gautham R Shenoy
  2009-10-09  8:30 ` [PATCH v4 1/4] pSeries: extended_cede_processor() helper function Gautham R Shenoy
@ 2009-10-09  8:30 ` Gautham R Shenoy
  2009-10-09  8:31 ` [PATCH v4 3/4] pseries: Add code to online/offline CPUs of a DLPAR node Gautham R Shenoy
  2009-10-09  8:31 ` [PATCH v4 4/4] pseries: Serialize cpu hotplug operations during deactivate Vs deallocate Gautham R Shenoy
  3 siblings, 0 replies; 6+ messages in thread
From: Gautham R Shenoy @ 2009-10-09  8:30 UTC (permalink / raw)
  To: Nathan Fontenot, Benjamin Herrenschmidt, linuxppc-dev,
	Peter Zijlstra, linux-kernel
  Cc: Arun R Bharadwaj

When a CPU is offlined on POWER currently, we call rtas_stop_self() and hand
the CPU back to the resource pool. This path is used for DLPAR which will
cause a change in the LPAR configuration which will be visible outside.

This patch changes the default state a CPU is put into when it is offlined.
On platforms which support ceding the processor to the hypervisor with
latency hint specifier value, during a cpu offline operation,
instead of calling rtas_stop_self(), we cede the vCPU to the hypervisor
while passing a latency hint specifier value. The Hypervisor can use this hint
to provide better energy savings. Also, during the offline
operation, the control of the vCPU remains with the LPAR as oppposed to
returning it to the resource pool.

The patch achieves this by creating an infrastructure to set the
preferred_offline_state() which can be either
- CPU_STATE_OFFLINE: which is the current behaviour of calling
  rtas_stop_self()

- CPU_STATE_INACTIVE: which cedes the vCPU to the hypervisor with the latency
  hint specifier.

The codepath which wants to perform a DLPAR operation can set the
preferred_offline_state() of a CPU to CPU_STATE_OFFLINE before invoking
cpu_down().

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c    |  157 ++++++++++++++++++++++-
 arch/powerpc/platforms/pseries/offline_states.h |   18 +++
 arch/powerpc/platforms/pseries/smp.c            |   19 +++
 3 files changed, 185 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_states.h

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index ebff6d9..8a47db4 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,7 @@
 #include <asm/pSeries_reconfig.h>
 #include "xics.h"
 #include "plpar_wrappers.h"
+#include "offline_states.h"
 
 /* This version can't take the spinlock, because it never returns */
 static struct rtas_args rtas_stop_self_args = {
@@ -39,6 +40,37 @@ static struct rtas_args rtas_stop_self_args = {
 	.rets = &rtas_stop_self_args.args[0],
 };
 
+static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) =
+							CPU_STATE_OFFLINE;
+static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = CPU_STATE_OFFLINE;
+
+static enum cpu_state_vals default_offline_state = CPU_STATE_OFFLINE;
+
+enum cpu_state_vals get_cpu_current_state(int cpu)
+{
+	return per_cpu(current_state, cpu);
+}
+
+void set_cpu_current_state(int cpu, enum cpu_state_vals state)
+{
+	per_cpu(current_state, cpu) = state;
+}
+
+enum cpu_state_vals get_preferred_offline_state(int cpu)
+{
+	return per_cpu(preferred_offline_state, cpu);
+}
+
+void set_preferred_offline_state(int cpu, enum cpu_state_vals state)
+{
+	per_cpu(preferred_offline_state, cpu) = state;
+}
+
+void set_default_offline_state(int cpu)
+{
+	per_cpu(preferred_offline_state, cpu) = default_offline_state;
+}
+
 static void rtas_stop_self(void)
 {
 	struct rtas_args *args = &rtas_stop_self_args;
@@ -56,11 +88,66 @@ static void rtas_stop_self(void)
 
 static void pseries_mach_cpu_die(void)
 {
+	unsigned int cpu = smp_processor_id();
+	unsigned int hwcpu = hard_smp_processor_id();
+	u8 cede_latency_hint = 0;
+
 	local_irq_disable();
 	idle_task_exit();
 	xics_teardown_cpu();
-	unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow()));
-	rtas_stop_self();
+
+	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+		set_cpu_current_state(cpu, CPU_STATE_INACTIVE);
+		cede_latency_hint = 2;
+
+		get_lppaca()->idle = 1;
+		if (!get_lppaca()->shared_proc)
+			get_lppaca()->donate_dedicated_cpu = 1;
+
+		printk(KERN_INFO
+			"cpu %u (hwid %u) ceding for offline with hint %d\n",
+			cpu, hwcpu, cede_latency_hint);
+		while (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+			extended_cede_processor(cede_latency_hint);
+			printk(KERN_INFO "cpu %u (hwid %u) returned from cede.\n",
+				cpu, hwcpu);
+			printk(KERN_INFO
+			"Decrementer value = %x Timebase value = %llx\n",
+			get_dec(), get_tb());
+		}
+
+		printk(KERN_INFO "cpu %u (hwid %u) got prodded to go online\n",
+			cpu, hwcpu);
+
+		if (!get_lppaca()->shared_proc)
+			get_lppaca()->donate_dedicated_cpu = 0;
+		get_lppaca()->idle = 0;
+	}
+
+	if (get_preferred_offline_state(cpu) == CPU_STATE_ONLINE) {
+		unregister_slb_shadow(hwcpu, __pa(get_slb_shadow()));
+
+		/*
+		 * NOTE: Calling start_secondary() here for now to
+		 * start new context.
+		 * However, need to do it cleanly by resetting the
+		 * stack pointer.
+		 */
+		start_secondary();
+		goto out_bug;
+
+	}
+
+	if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
+
+		set_cpu_current_state(cpu, CPU_STATE_OFFLINE);
+		unregister_slb_shadow(hard_smp_processor_id(),
+					__pa(get_slb_shadow()));
+		rtas_stop_self();
+		goto out_bug;
+	}
+
+out_bug:
 	/* Should never get here... */
 	BUG();
 	for(;;);
@@ -109,15 +196,28 @@ static int pseries_cpu_disable(void)
 static void pseries_cpu_die(unsigned int cpu)
 {
 	int tries;
-	int cpu_status;
+	int cpu_status = 1;
 	unsigned int pcpu = get_hard_smp_processor_id(cpu);
 
-	for (tries = 0; tries < 25; tries++) {
-		cpu_status = query_cpu_stopped(pcpu);
-		if (cpu_status == 0 || cpu_status == -1)
-			break;
-		cpu_relax();
+	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
+		cpu_status = 1;
+		for (tries = 0; tries < 1000; tries++) {
+			if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
+				cpu_status = 0;
+				break;
+			}
+			cpu_relax();
+		}
+	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
+
+		for (tries = 0; tries < 25; tries++) {
+			cpu_status = query_cpu_stopped(pcpu);
+			if (cpu_status == 0 || cpu_status == -1)
+				break;
+			cpu_relax();
+		}
 	}
+
 	if (cpu_status != 0) {
 		printk("Querying DEAD? cpu %i (%i) shows %i\n",
 		       cpu, pcpu, cpu_status);
@@ -252,10 +352,41 @@ static struct notifier_block pseries_smp_nb = {
 	.notifier_call = pseries_smp_notifier,
 };
 
+#define MAX_CEDE_LATENCY_LEVELS		4
+#define	CEDE_LATENCY_PARAM_LENGTH	10
+#define CEDE_LATENCY_PARAM_MAX_LENGTH	\
+	(MAX_CEDE_LATENCY_LEVELS * CEDE_LATENCY_PARAM_LENGTH * sizeof(char))
+#define CEDE_LATENCY_TOKEN		45
+
+static char cede_parameters[CEDE_LATENCY_PARAM_MAX_LENGTH];
+
+static int parse_cede_parameters(void)
+{
+	int call_status;
+
+	memset(cede_parameters, 0, CEDE_LATENCY_PARAM_MAX_LENGTH);
+	call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
+				NULL,
+				CEDE_LATENCY_TOKEN,
+				__pa(cede_parameters),
+				CEDE_LATENCY_PARAM_MAX_LENGTH);
+
+	if (call_status != 0)
+		printk(KERN_INFO "CEDE_LATENCY: \
+			%s %s Error calling get-system-parameter(0x%x)\n",
+			__FILE__, __func__, call_status);
+	else
+		printk(KERN_INFO "CEDE_LATENCY: \
+			get-system-parameter successful.\n");
+
+	return call_status;
+}
+
 static int __init pseries_cpu_hotplug_init(void)
 {
 	struct device_node *np;
 	const char *typep;
+	int cpu;
 
 	for_each_node_by_name(np, "interrupt-controller") {
 		typep = of_get_property(np, "compatible", NULL);
@@ -283,8 +414,16 @@ static int __init pseries_cpu_hotplug_init(void)
 	smp_ops->cpu_die = pseries_cpu_die;
 
 	/* Processors can be added/removed only on LPAR */
-	if (firmware_has_feature(FW_FEATURE_LPAR))
+	if (firmware_has_feature(FW_FEATURE_LPAR)) {
 		pSeries_reconfig_notifier_register(&pseries_smp_nb);
+		cpu_maps_update_begin();
+		if (parse_cede_parameters() == 0) {
+			default_offline_state = CPU_STATE_INACTIVE;
+			for_each_online_cpu(cpu)
+				set_default_offline_state(cpu);
+		}
+		cpu_maps_update_done();
+	}
 
 	return 0;
 }
diff --git a/arch/powerpc/platforms/pseries/offline_states.h b/arch/powerpc/platforms/pseries/offline_states.h
new file mode 100644
index 0000000..22574e0
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/offline_states.h
@@ -0,0 +1,18 @@
+#ifndef _OFFLINE_STATES_H_
+#define _OFFLINE_STATES_H_
+
+/* Cpu offline states go here */
+enum cpu_state_vals {
+	CPU_STATE_OFFLINE,
+	CPU_STATE_INACTIVE,
+	CPU_STATE_ONLINE,
+	CPU_MAX_OFFLINE_STATES
+};
+
+extern enum cpu_state_vals get_cpu_current_state(int cpu);
+extern void set_cpu_current_state(int cpu, enum cpu_state_vals state);
+extern enum cpu_state_vals get_preferred_offline_state(int cpu);
+extern void set_preferred_offline_state(int cpu, enum cpu_state_vals state);
+extern void set_default_offline_state(int cpu);
+extern int start_secondary(void);
+#endif
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 440000c..8868c01 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -48,6 +48,7 @@
 #include "plpar_wrappers.h"
 #include "pseries.h"
 #include "xics.h"
+#include "offline_states.h"
 
 
 /*
@@ -84,6 +85,9 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
 	/* Fixup atomic count: it exited inside IRQ handler. */
 	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;
 
+	if (get_cpu_current_state(lcpu) == CPU_STATE_INACTIVE)
+		goto out;
+
 	/* 
 	 * If the RTAS start-cpu token does not exist then presume the
 	 * cpu is already spinning.
@@ -98,6 +102,7 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
 		return 0;
 	}
 
+out:
 	return 1;
 }
 
@@ -111,12 +116,16 @@ static void __devinit smp_xics_setup_cpu(int cpu)
 		vpa_init(cpu);
 
 	cpu_clear(cpu, of_spin_map);
+	set_cpu_current_state(cpu, CPU_STATE_ONLINE);
+	set_default_offline_state(cpu);
 
 }
 #endif /* CONFIG_XICS */
 
 static void __devinit smp_pSeries_kick_cpu(int nr)
 {
+	long rc;
+	unsigned long hcpuid;
 	BUG_ON(nr < 0 || nr >= NR_CPUS);
 
 	if (!smp_startup_cpu(nr))
@@ -128,6 +137,16 @@ static void __devinit smp_pSeries_kick_cpu(int nr)
 	 * the processor will continue on to secondary_start
 	 */
 	paca[nr].cpu_start = 1;
+
+	set_preferred_offline_state(nr, CPU_STATE_ONLINE);
+
+	if (get_cpu_current_state(nr) == CPU_STATE_INACTIVE) {
+		hcpuid = get_hard_smp_processor_id(nr);
+		rc = plpar_hcall_norets(H_PROD, hcpuid);
+		if (rc != H_SUCCESS)
+			panic("Error: Prod to wake up processor %d Ret= %ld\n",
+				nr, rc);
+	}
 }
 
 static int smp_pSeries_cpu_bootable(unsigned int nr)

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v4 3/4] pseries: Add code to online/offline CPUs of a DLPAR node.
  2009-10-09  8:30 [PATCH v4 0/4] pseries: Add cede support for cpu-offline Gautham R Shenoy
  2009-10-09  8:30 ` [PATCH v4 1/4] pSeries: extended_cede_processor() helper function Gautham R Shenoy
  2009-10-09  8:30 ` [PATCH v4 2/4] pSeries: Add hooks to put the CPU into an appropriate offline state Gautham R Shenoy
@ 2009-10-09  8:31 ` Gautham R Shenoy
  2009-10-09  8:31 ` [PATCH v4 4/4] pseries: Serialize cpu hotplug operations during deactivate Vs deallocate Gautham R Shenoy
  3 siblings, 0 replies; 6+ messages in thread
From: Gautham R Shenoy @ 2009-10-09  8:31 UTC (permalink / raw)
  To: Nathan Fontenot, Benjamin Herrenschmidt, linuxppc-dev,
	Peter Zijlstra, linux-kernel
  Cc: Arun R Bharadwaj

Currently the cpu-allocation/deallocation on pSeries is a
two step process from the Userspace.

- Set the indicators and update the device tree by writing to the sysfs
  tunable "probe" during allocation and "release" during deallocation.
- Online / Offline the CPUs of the allocated/would_be_deallocated node by
  writing to the sysfs tunable "online".

This patch adds kernel code to online/offline the CPUs soon_after/just_before
they have been allocated/would_be_deallocated. This way, the userspace tool
that performs DLPAR operations would only have to deal with one set of sysfs
tunables namely "probe" and release".

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
---
 arch/powerpc/platforms/pseries/dlpar.c |  103 ++++++++++++++++++++++++++++++++
 1 files changed, 102 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 84d156b..9752386 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -20,7 +20,7 @@
 #include <linux/sysdev.h>
 #include <linux/sysfs.h>
 #include <linux/cpu.h>
-
+#include "offline_states.h"
 
 #include <asm/prom.h>
 #include <asm/machdep.h>
@@ -357,6 +357,98 @@ int remove_device_tree_nodes(struct device_node *dn)
 	return rc;
 }
 
+int online_node_cpus(struct device_node *dn)
+{
+	int rc = 0;
+	unsigned int cpu;
+	int len, nthreads, i;
+	const u32 *intserv;
+
+	intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
+	if (!intserv)
+		return -EINVAL;
+
+	nthreads = len / sizeof(u32);
+
+	cpu_maps_update_begin();
+	for (i = 0; i < nthreads; i++) {
+		for_each_present_cpu(cpu) {
+			if (get_hard_smp_processor_id(cpu) != intserv[i])
+				continue;
+			BUG_ON(get_cpu_current_state(cpu)
+					!= CPU_STATE_OFFLINE);
+			cpu_maps_update_done();
+			rc = cpu_up(cpu);
+			if (rc)
+				goto out;
+			cpu_maps_update_begin();
+
+			break;
+		}
+		if (cpu == num_possible_cpus())
+			printk(KERN_WARNING "Could not find cpu to online "
+			       "with physical id 0x%x\n", intserv[i]);
+	}
+	cpu_maps_update_done();
+
+out:
+	return rc;
+
+}
+
+int offline_node_cpus(struct device_node *dn)
+{
+	int rc = 0;
+	unsigned int cpu;
+	int len, nthreads, i;
+	const u32 *intserv;
+
+	intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
+	if (!intserv)
+		return -EINVAL;
+
+	nthreads = len / sizeof(u32);
+
+	cpu_maps_update_begin();
+	for (i = 0; i < nthreads; i++) {
+		for_each_present_cpu(cpu) {
+			if (get_hard_smp_processor_id(cpu) != intserv[i])
+				continue;
+
+			if (get_cpu_current_state(cpu) == CPU_STATE_OFFLINE)
+				break;
+
+			if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) {
+				cpu_maps_update_done();
+				rc = cpu_down(cpu);
+				if (rc)
+					goto out;
+				cpu_maps_update_begin();
+				break;
+
+			}
+
+			/*
+			 * The cpu is in CPU_STATE_INACTIVE.
+			 * Upgrade it's state to CPU_STATE_OFFLINE.
+			 */
+			set_preferred_offline_state(cpu, CPU_STATE_OFFLINE);
+			BUG_ON(plpar_hcall_norets(H_PROD, intserv[i])
+								!= H_SUCCESS);
+			__cpu_die(cpu);
+			break;
+		}
+		if (cpu == num_possible_cpus())
+			printk(KERN_WARNING "Could not find cpu to offline "
+			       "with physical id 0x%x\n", intserv[i]);
+	}
+	cpu_maps_update_done();
+
+out:
+	return rc;
+
+}
+
 #define DR_ENTITY_SENSE		9003
 #define DR_ENTITY_PRESENT	1
 #define DR_ENTITY_UNUSABLE	2
@@ -591,6 +683,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
 	if (rc)
 		release_drc(drc_index);
 
+	rc = online_node_cpus(dn);
+
 	return rc ? rc : count;
 }
 
@@ -611,6 +705,11 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
 		return -EINVAL;
 	}
 
+	rc = offline_node_cpus(dn);
+
+	if (rc)
+		goto out;
+
 	rc = release_drc(*drc_index);
 	if (rc) {
 		of_node_put(dn);
@@ -622,6 +721,8 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
 		acquire_drc(*drc_index);
 
 	of_node_put(dn);
+
+out:
 	return rc ? rc : count;
 }
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v4 4/4] pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
  2009-10-09  8:30 [PATCH v4 0/4] pseries: Add cede support for cpu-offline Gautham R Shenoy
                   ` (2 preceding siblings ...)
  2009-10-09  8:31 ` [PATCH v4 3/4] pseries: Add code to online/offline CPUs of a DLPAR node Gautham R Shenoy
@ 2009-10-09  8:31 ` Gautham R Shenoy
  2009-10-27  3:23   ` Benjamin Herrenschmidt
  3 siblings, 1 reply; 6+ messages in thread
From: Gautham R Shenoy @ 2009-10-09  8:31 UTC (permalink / raw)
  To: Nathan Fontenot, Benjamin Herrenschmidt, linuxppc-dev,
	Peter Zijlstra, linux-kernel
  Cc: Arun R Bharadwaj

Currently the cpu-allocation/deallocation process comprises of two steps:
- Set the indicators and to update the device tree with DLPAR node
  information.

- Online/offline the allocated/deallocated CPU.

This is achieved by writing to the sysfs tunables "probe" during allocation
and "release" during deallocation.

At the sametime, the userspace can independently online/offline the CPUs of
the system using the sysfs tunable "online".

It is quite possible that when a userspace tool offlines a CPU
for the purpose of deallocation and is in the process of updating the device
tree, some other userspace tool could bring the CPU back online by writing to
the "online" sysfs tunable thereby causing the deallocate process to fail.

The solution to this is to serialize writes to the "probe/release" sysfs
tunable with the writes to the "online" sysfs tunable.

This patch employs a mutex to provide this serialization, which is a no-op on
all architectures except PPC_PSERIES

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 arch/powerpc/platforms/pseries/dlpar.c |   26 ++++++++++++++++++++++----
 drivers/base/cpu.c                     |    2 ++
 include/linux/cpu.h                    |   13 +++++++++++++
 3 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 9752386..fc261e6 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -644,6 +644,18 @@ static ssize_t memory_release_store(struct class *class, const char *buf,
 	return rc ? -1 : count;
 }
 
+static DEFINE_MUTEX(pseries_cpu_hotplug_mutex);
+
+void cpu_hotplug_driver_lock()
+{
+	mutex_lock(&pseries_cpu_hotplug_mutex);
+}
+
+void cpu_hotplug_driver_unlock()
+{
+	mutex_unlock(&pseries_cpu_hotplug_mutex);
+}
+
 static ssize_t cpu_probe_store(struct class *class, const char *buf,
 			       size_t count)
 {
@@ -656,14 +668,15 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
 	if (rc)
 		return -EINVAL;
 
+	cpu_hotplug_driver_lock();
 	rc = acquire_drc(drc_index);
 	if (rc)
-		return rc;
+		goto out;
 
 	dn = configure_connector(drc_index);
 	if (!dn) {
 		release_drc(drc_index);
-		return rc;
+		goto out;
 	}
 
 	/* fixup dn name */
@@ -672,7 +685,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
 	if (!cpu_name) {
 		free_cc_nodes(dn);
 		release_drc(drc_index);
-		return -ENOMEM;
+		rc = -ENOMEM;
+		goto out;
 	}
 
 	sprintf(cpu_name, "/cpus/%s", dn->full_name);
@@ -684,6 +698,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
 		release_drc(drc_index);
 
 	rc = online_node_cpus(dn);
+out:
+	cpu_hotplug_driver_unlock();
 
 	return rc ? rc : count;
 }
@@ -705,6 +721,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
 		return -EINVAL;
 	}
 
+	cpu_hotplug_driver_lock();
 	rc = offline_node_cpus(dn);
 
 	if (rc)
@@ -713,7 +730,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
 	rc = release_drc(*drc_index);
 	if (rc) {
 		of_node_put(dn);
-		return rc;
+		goto out;
 	}
 
 	rc = remove_device_tree_nodes(dn);
@@ -723,6 +740,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
 	of_node_put(dn);
 
 out:
+	cpu_hotplug_driver_unlock();
 	return rc ? rc : count;
 }
 
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..07c3f05 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -35,6 +35,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
 	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
 	ssize_t ret;
 
+	cpu_hotplug_driver_lock();
 	switch (buf[0]) {
 	case '0':
 		ret = cpu_down(cpu->sysdev.id);
@@ -49,6 +50,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
 	default:
 		ret = -EINVAL;
 	}
+	cpu_hotplug_driver_unlock();
 
 	if (ret >= 0)
 		ret = count;
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4753619..b0ad4e1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -115,6 +115,19 @@ extern void put_online_cpus(void);
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
 int cpu_down(unsigned int cpu);
 
+#ifdef CONFIG_PPC_PSERIES
+extern void cpu_hotplug_driver_lock(void);
+extern void cpu_hotplug_driver_unlock(void);
+#else
+static inline void cpu_hotplug_driver_lock(void)
+{
+}
+
+static inline void cpu_hotplug_driver_unlock(void)
+{
+}
+#endif
+
 #else		/* CONFIG_HOTPLUG_CPU */
 
 #define get_online_cpus()	do { } while (0)

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 4/4] pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
  2009-10-09  8:31 ` [PATCH v4 4/4] pseries: Serialize cpu hotplug operations during deactivate Vs deallocate Gautham R Shenoy
@ 2009-10-27  3:23   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-27  3:23 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Arun R Bharadwaj, linuxppc-dev, Peter Zijlstra, linux-kernel

On Fri, 2009-10-09 at 14:01 +0530, Gautham R Shenoy wrote:
> Currently the cpu-allocation/deallocation process comprises of two steps:
> - Set the indicators and to update the device tree with DLPAR node
>   information.
> 
> - Online/offline the allocated/deallocated CPU.
> 
> This is achieved by writing to the sysfs tunables "probe" during allocation
> and "release" during deallocation.
> 
> At the sametime, the userspace can independently online/offline the CPUs of
> the system using the sysfs tunable "online".
> 
> It is quite possible that when a userspace tool offlines a CPU
> for the purpose of deallocation and is in the process of updating the device
> tree, some other userspace tool could bring the CPU back online by writing to
> the "online" sysfs tunable thereby causing the deallocate process to fail.
> 
> The solution to this is to serialize writes to the "probe/release" sysfs
> tunable with the writes to the "online" sysfs tunable.
> 
> This patch employs a mutex to provide this serialization, which is a no-op on
> all architectures except PPC_PSERIES
> 
> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>

Peter, did you get a chance to review this one ?

Cheers,
Ben.

> ---
>  arch/powerpc/platforms/pseries/dlpar.c |   26 ++++++++++++++++++++++----
>  drivers/base/cpu.c                     |    2 ++
>  include/linux/cpu.h                    |   13 +++++++++++++
>  3 files changed, 37 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
> index 9752386..fc261e6 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -644,6 +644,18 @@ static ssize_t memory_release_store(struct class *class, const char *buf,
>  	return rc ? -1 : count;
>  }
>  
> +static DEFINE_MUTEX(pseries_cpu_hotplug_mutex);
> +
> +void cpu_hotplug_driver_lock()
> +{
> +	mutex_lock(&pseries_cpu_hotplug_mutex);
> +}
> +
> +void cpu_hotplug_driver_unlock()
> +{
> +	mutex_unlock(&pseries_cpu_hotplug_mutex);
> +}
> +
>  static ssize_t cpu_probe_store(struct class *class, const char *buf,
>  			       size_t count)
>  {
> @@ -656,14 +668,15 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
>  	if (rc)
>  		return -EINVAL;
>  
> +	cpu_hotplug_driver_lock();
>  	rc = acquire_drc(drc_index);
>  	if (rc)
> -		return rc;
> +		goto out;
>  
>  	dn = configure_connector(drc_index);
>  	if (!dn) {
>  		release_drc(drc_index);
> -		return rc;
> +		goto out;
>  	}
>  
>  	/* fixup dn name */
> @@ -672,7 +685,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
>  	if (!cpu_name) {
>  		free_cc_nodes(dn);
>  		release_drc(drc_index);
> -		return -ENOMEM;
> +		rc = -ENOMEM;
> +		goto out;
>  	}
>  
>  	sprintf(cpu_name, "/cpus/%s", dn->full_name);
> @@ -684,6 +698,8 @@ static ssize_t cpu_probe_store(struct class *class, const char *buf,
>  		release_drc(drc_index);
>  
>  	rc = online_node_cpus(dn);
> +out:
> +	cpu_hotplug_driver_unlock();
>  
>  	return rc ? rc : count;
>  }
> @@ -705,6 +721,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
>  		return -EINVAL;
>  	}
>  
> +	cpu_hotplug_driver_lock();
>  	rc = offline_node_cpus(dn);
>  
>  	if (rc)
> @@ -713,7 +730,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
>  	rc = release_drc(*drc_index);
>  	if (rc) {
>  		of_node_put(dn);
> -		return rc;
> +		goto out;
>  	}
>  
>  	rc = remove_device_tree_nodes(dn);
> @@ -723,6 +740,7 @@ static ssize_t cpu_release_store(struct class *class, const char *buf,
>  	of_node_put(dn);
>  
>  out:
> +	cpu_hotplug_driver_unlock();
>  	return rc ? rc : count;
>  }
>  
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index e62a4cc..07c3f05 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -35,6 +35,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
>  	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
>  	ssize_t ret;
>  
> +	cpu_hotplug_driver_lock();
>  	switch (buf[0]) {
>  	case '0':
>  		ret = cpu_down(cpu->sysdev.id);
> @@ -49,6 +50,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
>  	default:
>  		ret = -EINVAL;
>  	}
> +	cpu_hotplug_driver_unlock();
>  
>  	if (ret >= 0)
>  		ret = count;
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 4753619..b0ad4e1 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -115,6 +115,19 @@ extern void put_online_cpus(void);
>  #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
>  int cpu_down(unsigned int cpu);
>  
> +#ifdef CONFIG_PPC_PSERIES
> +extern void cpu_hotplug_driver_lock(void);
> +extern void cpu_hotplug_driver_unlock(void);
> +#else
> +static inline void cpu_hotplug_driver_lock(void)
> +{
> +}
> +
> +static inline void cpu_hotplug_driver_unlock(void)
> +{
> +}
> +#endif
> +
>  #else		/* CONFIG_HOTPLUG_CPU */
>  
>  #define get_online_cpus()	do { } while (0)
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-10-27  3:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-09  8:30 [PATCH v4 0/4] pseries: Add cede support for cpu-offline Gautham R Shenoy
2009-10-09  8:30 ` [PATCH v4 1/4] pSeries: extended_cede_processor() helper function Gautham R Shenoy
2009-10-09  8:30 ` [PATCH v4 2/4] pSeries: Add hooks to put the CPU into an appropriate offline state Gautham R Shenoy
2009-10-09  8:31 ` [PATCH v4 3/4] pseries: Add code to online/offline CPUs of a DLPAR node Gautham R Shenoy
2009-10-09  8:31 ` [PATCH v4 4/4] pseries: Serialize cpu hotplug operations during deactivate Vs deallocate Gautham R Shenoy
2009-10-27  3:23   ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).