* [PATCH v2 0/2] cpu: pseries: Offline state framework.
@ 2009-08-28 10:00 ` Gautham R Shenoy
From: Gautham R Shenoy @ 2009-08-28 10:00 UTC
  To: Joel Schopp, Benjamin Herrenschmidt, Vaidyanathan Srinivasan,
	Peter Zijlstra, Dipankar Sarma
  Cc: Balbir Singh, Venkatesh Pallipadi, linuxppc-dev, linux-kernel,
	Darrick J. Wong

Hi,

This is version 2 of the patch series that provides a cpu-offline framework
enabling administrators to choose the state an offline CPU is put into
when the underlying architecture exposes multiple such states.

Version 1 of the patch can be found here:
http://lkml.org/lkml/2009/8/6/236

The patch series exposes the following sysfs tunables to
allow the system administrator to choose the state of a CPU:

To query the available hotplug states, one needs to read the sysfs tunable:
	/sys/devices/system/cpu/cpu<number>/available_hotplug_states
To query or set the current state, one needs to read/write the sysfs tunable:
	/sys/devices/system/cpu/cpu<number>/current_state
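A minimal userspace sketch of how a tool might consume these files follows. It assumes that available_hotplug_states holds whitespace-separated state names (which is what the pSeries driver in patch 2/2 emits); the helpers cpu_attr_path() and state_is_available() are hypothetical and are not part of this patchset.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the sysfs path of one per-CPU attribute,
 * e.g. "available_hotplug_states" or "current_state".
 * Returns false if the path does not fit in the caller's buffer. */
static bool cpu_attr_path(char *out, size_t len, unsigned int cpu,
                          const char *attr)
{
    int n = snprintf(out, len, "/sys/devices/system/cpu/cpu%u/%s", cpu, attr);
    return n >= 0 && (size_t)n < len;
}

/* Hypothetical helper: return true if 'state' appears as a whole word in
 * 'list', the text read from available_hotplug_states (assumed to be
 * whitespace-separated names followed by a newline). */
static bool state_is_available(const char *list, const char *state)
{
    size_t want = strlen(state);
    const char *p = list;

    while (*p) {
        /* Skip separators: spaces, tabs and the trailing newline. */
        p += strspn(p, " \t\n");
        if (!*p)
            break;
        /* Length of the current token. */
        size_t tok = strcspn(p, " \t\n");
        if (tok == want && strncmp(p, state, want) == 0)
            return true;
        p += tok;
    }
    return false;
}
```

In practice a tool would read available_hotplug_states into a buffer, call state_is_available(), and only then write the chosen name into current_state; matching whole tokens avoids accepting a prefix such as "deact".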

The patchset ensures that the writes to the "current_state" sysfs file are
serialized against the writes to the "online" file.

This patchset also contains the offline state driver implemented for
pSeries. For pSeries, we define three available_hotplug_states. They are:

	online: The processor is online.

	deallocate: This is the default behaviour when the cpu is offlined,
	even in the absence of this driver. The CPU makes an
	rtas_stop_self() call and hands the CPU back to the resource pool,
	thereby effectively deallocating that vCPU from the LPAR.
	NOTE: This results in a configuration change to the LPAR
	which is visible to the outside world.

	deactivate: This cedes the vCPU to the hypervisor, which
	in turn can put the vCPU time to the best use.
	NOTE: This option DOES NOT result in a configuration change,
	and the vCPU remains entitled to the LPAR to which it
	previously belonged.
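The pSeries driver matches the state names above case-insensitively (it uses strnicmp()). The following userspace sketch restates that lookup. It assumes the enum ordering DEALLOCATE, DEACTIVATE, ONLINE and a 16-byte CPU_STATES_LEN, since the definitions in offline_driver.h are not visible here; strncasecmp() stands in for the kernel's strnicmp(), and state_from_name() is a hypothetical name.

```c
#include <stddef.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Assumed ordering; the table below is indexed by these values. */
enum cpu_state_vals {
    CPU_DEALLOCATE,
    CPU_DEACTIVATE,
    CPU_STATE_ONLINE,
    CPU_MAX_OFFLINE_STATES
};

#define CPU_STATES_LEN 16   /* assumed maximum length of a state name */

static const char *const state_names[CPU_MAX_OFFLINE_STATES] = {
    [CPU_DEALLOCATE]   = "deallocate",
    [CPU_DEACTIVATE]   = "deactivate",
    [CPU_STATE_ONLINE] = "online",
};

/* Map a user-supplied name to a state value, ignoring case.
 * Returns CPU_MAX_OFFLINE_STATES if the name is not recognised. */
static enum cpu_state_vals state_from_name(const char *name)
{
    for (int i = 0; i < CPU_MAX_OFFLINE_STATES; i++)
        if (strncasecmp(name, state_names[i], CPU_STATES_LEN) == 0)
            return (enum cpu_state_vals)i;
    return CPU_MAX_OFFLINE_STATES;
}
```

Using the terminating enum value as a "not found" sentinel mirrors the driver's own loop, which treats reaching CPU_MAX_OFFLINE_STATES as an invalid name.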

Awaiting your feedback.
---

Gautham R Shenoy (2):
      cpu: Implement cpu-offline-state driver for pSeries.
      cpu: Offline state Framework.


 arch/powerpc/platforms/pseries/Makefile         |    2 
 arch/powerpc/platforms/pseries/hotplug-cpu.c    |   76 +++++++++-
 arch/powerpc/platforms/pseries/offline_driver.c |  161 +++++++++++++++++++++
 arch/powerpc/platforms/pseries/offline_driver.h |   20 +++
 arch/powerpc/platforms/pseries/smp.c            |   17 ++
 drivers/base/cpu.c                              |  176 ++++++++++++++++++++++-
 include/linux/cpu.h                             |   30 ++++
 7 files changed, 465 insertions(+), 17 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.c
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

-- 
Thanks and Regards
gautham.


* [PATCH v2 1/2] cpu: Offline state Framework.
  2009-08-28 10:00 ` Gautham R Shenoy
@ 2009-08-28 10:00   ` Gautham R Shenoy
From: Gautham R Shenoy @ 2009-08-28 10:00 UTC
  To: Joel Schopp, Benjamin Herrenschmidt, Vaidyanathan Srinivasan,
	Peter Zijlstra, Dipankar Sarma
  Cc: Balbir Singh, Venkatesh Pallipadi, linuxppc-dev, linux-kernel,
	Darrick J. Wong

Provide an interface by which the system administrator can decide which
state a CPU should enter when it is offlined.

To query the hotplug states, one needs to perform a read on the sysfs tunable:
	/sys/devices/system/cpu/cpu<number>/available_hotplug_states

To query or set the current state for a particular CPU, one needs to
use the sysfs interface:
	/sys/devices/system/cpu/cpu<number>/current_state

This patch implements the architecture-independent bits of the
cpu-offline-state framework.

Architectures which want to expose multiple offline states to
userspace are expected to write a driver which registers
with this framework.

Such a driver should:
- Implement the callbacks defined in struct cpu_offline_driver,
  which this framework invokes when the corresponding
  sysfs interfaces are read or written.

- Ensure that the following operation puts the CPU in the same state
  as it would be put in the absence of the driver:
	echo 0 > /sys/devices/system/cpu/cpu<number>/online
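The first requirement, together with the checks that register_cpu_offline_driver() performs, can be modelled in userspace as below. This is only a sketch: model_register(), the stub callbacks and the two driver instances are hypothetical, and the mutex and sysfs file creation done by the real function are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Same callback table as the one this patch adds to include/linux/cpu.h. */
struct cpu_offline_driver {
    ssize_t (*read_available_states)(unsigned int cpu, char *buf);
    ssize_t (*read_current_state)(unsigned int cpu, char *buf);
    ssize_t (*write_current_state)(unsigned int cpu, const char *buf);
};

/* Userspace model of register_cpu_offline_driver()'s checks. */
static struct cpu_offline_driver *registered;

static int model_register(struct cpu_offline_driver *drv)
{
    if (registered)
        return -EEXIST;          /* only one driver may be registered */
    if (!(drv->read_available_states && drv->read_current_state &&
          drv->write_current_state))
        return -EINVAL;          /* every callback is mandatory */
    registered = drv;
    return 0;
}

/* Hypothetical stub callbacks standing in for an architecture's driver. */
static ssize_t stub_read(unsigned int cpu, char *buf)
{
    (void)cpu;
    buf[0] = '\n';
    return 1;
}

static ssize_t stub_write(unsigned int cpu, const char *buf)
{
    (void)cpu;
    (void)buf;
    return 0;
}

static struct cpu_offline_driver full = { stub_read, stub_read, stub_write };
static struct cpu_offline_driver partial = { stub_read, NULL, stub_write };
```

For the second requirement, the pSeries driver in patch 2/2 initialises each CPU's preferred offline state to CPU_DEALLOCATE, so a plain write of 0 to "online" still ends in rtas_stop_self().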

This framework also serializes writes to the "current_state"
sysfs tunable with respect to writes to the "online" sysfs tunable.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 drivers/base/cpu.c  |  176 ++++++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/cpu.h |   30 +++++++++
 2 files changed, 197 insertions(+), 9 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index e62a4cc..73efc55 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -20,7 +20,161 @@ EXPORT_SYMBOL(cpu_sysdev_class);
 
 static DEFINE_PER_CPU(struct sys_device *, cpu_sys_devices);
 
+struct sys_device *get_cpu_sysdev(unsigned cpu)
+{
+	if (cpu < nr_cpu_ids && cpu_possible(cpu))
+		return per_cpu(cpu_sys_devices, cpu);
+	else
+		return NULL;
+}
+EXPORT_SYMBOL_GPL(get_cpu_sysdev);
+
+
 #ifdef CONFIG_HOTPLUG_CPU
+
+struct cpu_offline_driver *cpu_offline_driver;
+static DEFINE_MUTEX(cpu_offline_driver_lock);
+
+ssize_t show_available_states(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+	int cpu_num = cpu->sysdev.id;
+	ssize_t ret;
+
+	mutex_lock(&cpu_offline_driver_lock);
+	if (!cpu_offline_driver) {
+		ret = -EEXIST;
+		goto out_unlock;
+	}
+
+	ret = cpu_offline_driver->read_available_states(cpu_num, buf);
+
+out_unlock:
+	mutex_unlock(&cpu_offline_driver_lock);
+
+	return ret;
+
+}
+
+ssize_t show_current_state(struct sys_device *dev,
+			struct sysdev_attribute *attr, char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+	int cpu_num = cpu->sysdev.id;
+	ssize_t ret = 0;
+
+	mutex_lock(&cpu_offline_driver_lock);
+	if (!cpu_offline_driver) {
+		ret = -EEXIST;
+		goto out_unlock;
+	}
+
+	ret = cpu_offline_driver->read_current_state(cpu_num, buf);
+
+out_unlock:
+	mutex_unlock(&cpu_offline_driver_lock);
+
+	return ret;
+}
+
+ssize_t store_current_state(struct sys_device *dev,
+			struct sysdev_attribute *attr,
+			const char *buf, size_t count)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
+	int cpu_num = cpu->sysdev.id;
+	ssize_t ret = count;
+
+	mutex_lock(&cpu_offline_driver_lock);
+	if (!cpu_offline_driver) {
+		ret = -EEXIST;
+		goto out_unlock;
+	}
+
+	ret = cpu_offline_driver->write_current_state(cpu_num, buf);
+
+out_unlock:
+	mutex_unlock(&cpu_offline_driver_lock);
+
+	if (ret >= 0)
+		ret = count;
+	return ret;
+}
+
+static SYSDEV_ATTR(available_hotplug_states, 0444, show_available_states,
+								NULL);
+static SYSDEV_ATTR(current_state, 0644, show_current_state,
+						store_current_state);
+
+/* Should be called with cpu_offline_driver_lock held */
+void cpu_offline_driver_add_cpu(struct sys_device *cpu_sys_dev)
+{
+	if (!cpu_offline_driver || !cpu_sys_dev)
+		return;
+
+	sysdev_create_file(cpu_sys_dev, &attr_available_hotplug_states);
+	sysdev_create_file(cpu_sys_dev, &attr_current_state);
+}
+
+/* Should be called with cpu_offline_driver_lock held */
+void cpu_offline_driver_remove_cpu(struct sys_device *cpu_sys_dev)
+{
+	if (!cpu_offline_driver || !cpu_sys_dev)
+		return;
+
+	sysdev_remove_file(cpu_sys_dev, &attr_available_hotplug_states);
+	sysdev_remove_file(cpu_sys_dev, &attr_current_state);
+
+}
+
+int register_cpu_offline_driver(struct cpu_offline_driver *arch_cpu_driver)
+{
+	int ret = 0;
+	int cpu;
+	mutex_lock(&cpu_offline_driver_lock);
+
+	if (cpu_offline_driver != NULL) {
+		ret = -EEXIST;
+		goto out_unlock;
+	}
+
+	if (!(arch_cpu_driver->read_available_states &&
+	      arch_cpu_driver->read_current_state &&
+	      arch_cpu_driver->write_current_state)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	cpu_offline_driver = arch_cpu_driver;
+
+	for_each_possible_cpu(cpu)
+		cpu_offline_driver_add_cpu(get_cpu_sysdev(cpu));
+
+out_unlock:
+	mutex_unlock(&cpu_offline_driver_lock);
+	return ret;
+}
+
+void unregister_cpu_offline_driver(struct cpu_offline_driver *arch_cpu_driver)
+{
+	int cpu;
+	mutex_lock(&cpu_offline_driver_lock);
+
+	if (!cpu_offline_driver) {
+		WARN_ON(1);
+		mutex_unlock(&cpu_offline_driver_lock);
+		return;
+	}
+
+	for_each_possible_cpu(cpu)
+		cpu_offline_driver_remove_cpu(get_cpu_sysdev(cpu));
+
+	cpu_offline_driver = NULL;
+	mutex_unlock(&cpu_offline_driver_lock);
+}
+
+
 static ssize_t show_online(struct sys_device *dev, struct sysdev_attribute *attr,
 			   char *buf)
 {
@@ -35,6 +189,7 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
 	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
 	ssize_t ret;
 
+	mutex_lock(&cpu_offline_driver_lock);
 	switch (buf[0]) {
 	case '0':
 		ret = cpu_down(cpu->sysdev.id);
@@ -50,6 +205,8 @@ static ssize_t __ref store_online(struct sys_device *dev, struct sysdev_attribut
 		ret = -EINVAL;
 	}
 
+	mutex_unlock(&cpu_offline_driver_lock);
+
 	if (ret >= 0)
 		ret = count;
 	return ret;
@@ -59,23 +216,33 @@ static SYSDEV_ATTR(online, 0644, show_online, store_online);
 static void __cpuinit register_cpu_control(struct cpu *cpu)
 {
 	sysdev_create_file(&cpu->sysdev, &attr_online);
+	mutex_lock(&cpu_offline_driver_lock);
+	cpu_offline_driver_add_cpu(&cpu->sysdev);
+	mutex_unlock(&cpu_offline_driver_lock);
 }
+
 void unregister_cpu(struct cpu *cpu)
 {
 	int logical_cpu = cpu->sysdev.id;
 
 	unregister_cpu_under_node(logical_cpu, cpu_to_node(logical_cpu));
 
+	mutex_lock(&cpu_offline_driver_lock);
+	cpu_offline_driver_remove_cpu(&cpu->sysdev);
+	mutex_unlock(&cpu_offline_driver_lock);
+
 	sysdev_remove_file(&cpu->sysdev, &attr_online);
 
 	sysdev_unregister(&cpu->sysdev);
 	per_cpu(cpu_sys_devices, logical_cpu) = NULL;
 	return;
 }
+
 #else /* ... !CONFIG_HOTPLUG_CPU */
 static inline void register_cpu_control(struct cpu *cpu)
 {
 }
+
 #endif /* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_KEXEC
@@ -224,15 +391,6 @@ int __cpuinit register_cpu(struct cpu *cpu, int num)
 	return error;
 }
 
-struct sys_device *get_cpu_sysdev(unsigned cpu)
-{
-	if (cpu < nr_cpu_ids && cpu_possible(cpu))
-		return per_cpu(cpu_sys_devices, cpu);
-	else
-		return NULL;
-}
-EXPORT_SYMBOL_GPL(get_cpu_sysdev);
-
 int __init cpu_dev_init(void)
 {
 	int err;
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4d668e0..7636420 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -51,6 +51,36 @@ struct notifier_block;
 #ifdef CONFIG_HOTPLUG_CPU
 extern int register_cpu_notifier(struct notifier_block *nb);
 extern void unregister_cpu_notifier(struct notifier_block *nb);
+
+/*
+ * struct cpu_offline_driver: Callbacks for cpu-offline state framework.
+ *
+ * Defines the hooks for the architecture dependent callbacks
+ * which can be invoked when the user queries the
+ * available_hotplug_states and current_state and sets the current_state.
+ *
+ * read_available_states: Called when the user queries available_hotplug_states.
+ * @cpu: Cpu for which available_hotplug_states are being queried.
+ * @buf: Buffer in which the available hotplug states are to be populated.
+ *
+ * read_current_state: Called when the user queries current_state.
+ * @cpu: Cpu for which current_state is being queried.
+ * @buf: Buffer in which the current_state value is to be returned.
+ *
+ * write_current_state: Called when the user wants to set the current_state.
+ * @cpu: Cpu for which the current state is being set.
+ * @buf: Buffer containing the string corresponding to the current_state
+ * that needs to be set.
+ */
+struct cpu_offline_driver {
+	ssize_t (*read_available_states)(unsigned int cpu, char *buf);
+	ssize_t (*read_current_state)(unsigned int cpu, char *buf);
+	ssize_t (*write_current_state)(unsigned int cpu, const char *buf);
+};
+
+extern int register_cpu_offline_driver(struct cpu_offline_driver *driver);
+extern void unregister_cpu_offline_driver(struct cpu_offline_driver *driver);
+
 #else
 
 #ifndef MODULE



* [PATCH v2 2/2] cpu: Implement cpu-offline-state driver for pSeries.
  2009-08-28 10:00 ` Gautham R Shenoy
@ 2009-08-28 10:00   ` Gautham R Shenoy
From: Gautham R Shenoy @ 2009-08-28 10:00 UTC
  To: Joel Schopp, Benjamin Herrenschmidt, Vaidyanathan Srinivasan,
	Peter Zijlstra, Dipankar Sarma
  Cc: Balbir Singh, Venkatesh Pallipadi, linuxppc-dev, linux-kernel,
	Darrick J. Wong

This patch implements the callbacks that handle reads/writes
into the sysfs interfaces:

/sys/devices/system/cpu/cpu<number>/available_hotplug_states
and
/sys/devices/system/cpu/cpu<number>/current_state
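For reference, the output format of pSeries_read_available_states() (each available state name followed by a space, then a single newline, with a guard against overrunning the page-sized sysfs buffer) can be reproduced in userspace roughly as follows. format_available_states() and PAGE_SIZE_MODEL are hypothetical names, and snprintf() with explicit length checks stands in for the kernel's scnprintf() and its own guard.

```c
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE_MODEL 4096   /* stand-in for the kernel's PAGE_SIZE */

static const char *const names[] = { "deallocate", "deactivate", "online" };

/* Write every state name followed by a space, then a newline, stopping
 * early if the next name would not leave room for "\n" and the NUL.
 * Precondition: size >= 2. Returns the number of bytes written. */
static size_t format_available_states(char *buf, size_t size)
{
    size_t ret = 0;

    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        int n = snprintf(buf + ret, size - ret, "%s ", names[i]);
        if (n < 0 || (size_t)n >= size - ret - 1)
            break;   /* keep room for the trailing newline */
        ret += (size_t)n;
    }
    /* Overwrites any partially written name left by a failed attempt. */
    ret += (size_t)snprintf(buf + ret, size - ret, "\n");
    return ret;
}
```

With a full page the result is "deallocate deactivate online \n"; with a deliberately tiny buffer the list is cut at a name boundary rather than mid-word.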

Currently, the patch defines two states which the processor can enter when it
is offlined. They are:

- deallocate: This is the default behaviour when the cpu is offlined, even
  in the absence of this driver.
  The CPU makes an rtas_stop_self() call and hands the
  CPU back to the resource pool, thereby effectively deallocating
  that vCPU from the LPAR. This results in a configuration change to the
  LPAR which is visible to the outside world.

- deactivate: This cedes the vCPU to the hypervisor, which in turn can put the
  vCPU time to the best use. This option does not result in a configuration
  change, and the vCPU remains entitled to the LPAR to which it previously
  belonged.
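The transition rules enforced by pSeries_write_current_state() can be summarised as a pure function: writing the current state again is rejected, "online" triggers cpu_up(), and an offline state may only be requested while the CPU is online, so switching directly between deallocate and deactivate is refused. The sketch below is a userspace restatement only; check_transition() and enum transition_action are hypothetical, and the enum ordering is assumed because offline_driver.h is not visible here.

```c
#include <errno.h>

enum cpu_state_vals {   /* assumed ordering */
    CPU_DEALLOCATE,
    CPU_DEACTIVATE,
    CPU_STATE_ONLINE,
    CPU_MAX_OFFLINE_STATES
};

/* Hypothetical: what the driver would do after a successful check. */
enum transition_action { ACTION_NONE, ACTION_CPU_UP, ACTION_CPU_DOWN };

/* Decide what writing 'req' to current_state should do, given the CPU's
 * current state 'cur'. Returns 0 and sets *action, or -EINVAL. */
static int check_transition(enum cpu_state_vals cur, enum cpu_state_vals req,
                            enum transition_action *action)
{
    *action = ACTION_NONE;

    if (req >= CPU_MAX_OFFLINE_STATES || req == cur)
        return -EINVAL;
    if (req == CPU_STATE_ONLINE) {
        *action = ACTION_CPU_UP;    /* any offline state -> online */
        return 0;
    }
    /* deallocate <-> deactivate must go through online first. */
    if (cur != CPU_STATE_ONLINE)
        return -EINVAL;
    *action = ACTION_CPU_DOWN;      /* set preferred state, then cpu_down() */
    return 0;
}
```

Rejecting a direct deallocate/deactivate switch keeps the model simple: a deallocated vCPU has already been returned to the resource pool, so it must be brought back online before it can be ceded instead.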

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 arch/powerpc/platforms/pseries/Makefile         |    2 
 arch/powerpc/platforms/pseries/hotplug-cpu.c    |   76 ++++++++++-
 arch/powerpc/platforms/pseries/offline_driver.c |  161 +++++++++++++++++++++++
 arch/powerpc/platforms/pseries/offline_driver.h |   20 +++
 arch/powerpc/platforms/pseries/smp.c            |   17 ++
 5 files changed, 268 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.c
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 790c0b8..3a569c7 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_KEXEC)	+= kexec.o
 obj-$(CONFIG_PCI)	+= pci.o pci_dlpar.o
 obj-$(CONFIG_PSERIES_MSI)	+= msi.o
 
-obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu.o
+obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu.o offline_driver.o
 obj-$(CONFIG_MEMORY_HOTPLUG)	+= hotplug-memory.o
 
 obj-$(CONFIG_HVC_CONSOLE)	+= hvconsole.o
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index a20ead8..6880a1d 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,7 @@
 #include <asm/pSeries_reconfig.h>
 #include "xics.h"
 #include "plpar_wrappers.h"
+#include "offline_driver.h"
 
 /* This version can't take the spinlock, because it never returns */
 static struct rtas_args rtas_stop_self_args = {
@@ -54,13 +55,62 @@ static void rtas_stop_self(void)
 	panic("Alas, I survived.\n");
 }
 
+static void cede_on_offline(void)
+{
+	unsigned int cpu = smp_processor_id();
+	unsigned int hwcpu = hard_smp_processor_id();
+
+	get_lppaca()->idle = 1;
+	if (!get_lppaca()->shared_proc)
+		get_lppaca()->donate_dedicated_cpu = 1;
+
+	printk(KERN_INFO "cpu %u (hwid %u) ceding for offline with hint %d\n",
+			cpu, hwcpu, cede_latency_hint);
+	while (get_preferred_offline_state(cpu) != CPU_STATE_ONLINE) {
+		cede_processor();
+		printk(KERN_INFO "cpu %u (hwid %u) returned from cede.\n",
+			cpu, hwcpu);
+	}
+
+	printk(KERN_INFO "cpu %u (hwid %u) got prodded to go online\n",
+		cpu, hwcpu);
+
+	if (!get_lppaca()->shared_proc)
+		get_lppaca()->donate_dedicated_cpu = 0;
+	get_lppaca()->idle = 0;
+	unregister_slb_shadow(hwcpu, __pa(get_slb_shadow()));
+
+	/*
+	 * NOTE: Calling start_secondary() here for now to start
+	 * a new context.
+	 *
+	 * However, need to do it cleanly by resetting the stack
+	 * pointer.
+	 */
+	start_secondary();
+}
+
 static void pseries_mach_cpu_die(void)
 {
+	unsigned int cpu = smp_processor_id();
+
 	local_irq_disable();
 	idle_task_exit();
 	xics_teardown_cpu();
-	unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow()));
-	rtas_stop_self();
+
+	if (get_preferred_offline_state(cpu) == CPU_DEALLOCATE) {
+
+		set_cpu_current_state(cpu, CPU_DEALLOCATE);
+		unregister_slb_shadow(hard_smp_processor_id(),
+					__pa(get_slb_shadow()));
+		rtas_stop_self();
+		goto out_bug;
+	} else if (get_preferred_offline_state(cpu) == CPU_DEACTIVATE) {
+		set_cpu_current_state(cpu, CPU_DEACTIVATE);
+		cede_on_offline();
+	}
+
+out_bug:
 	/* Should never get here... */
 	BUG();
 	for(;;);
@@ -112,11 +162,23 @@ static void pseries_cpu_die(unsigned int cpu)
 	int cpu_status;
 	unsigned int pcpu = get_hard_smp_processor_id(cpu);
 
-	for (tries = 0; tries < 25; tries++) {
-		cpu_status = query_cpu_stopped(pcpu);
-		if (cpu_status == 0 || cpu_status == -1)
-			break;
-		cpu_relax();
+	if (get_preferred_offline_state(cpu) == CPU_DEACTIVATE) {
+		cpu_status = 1;
+		for (tries = 0; tries < 1000; tries++) {
+			if (get_cpu_current_state(cpu) == CPU_DEACTIVATE) {
+				cpu_status = 0;
+				break;
+			}
+			cpu_relax();
+		}
+	} else {
+
+		for (tries = 0; tries < 25; tries++) {
+			cpu_status = query_cpu_stopped(pcpu);
+			if (cpu_status == 0 || cpu_status == -1)
+				break;
+			cpu_relax();
+		}
 	}
 	if (cpu_status != 0) {
 		printk("Querying DEAD? cpu %i (%i) shows %i\n",
diff --git a/arch/powerpc/platforms/pseries/offline_driver.c b/arch/powerpc/platforms/pseries/offline_driver.c
new file mode 100644
index 0000000..e75e6e5
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/offline_driver.c
@@ -0,0 +1,161 @@
+#include "offline_driver.h"
+#include <linux/cpu.h>
+#include <linux/percpu-defs.h>
+
+struct cpu_hotplug_state {
+	enum cpu_state_vals state_val;
+	const char *state_name;
+	int available;
+} pSeries_cpu_hotplug_states[] = {
+	{CPU_DEALLOCATE, "deallocate", 1},
+	{CPU_DEACTIVATE, "deactivate", 1},
+	{CPU_STATE_ONLINE, "online", 1},
+	{CPU_MAX_OFFLINE_STATES, "", 0},
+};
+
+static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) =
+							CPU_DEALLOCATE;
+static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = CPU_DEALLOCATE;
+
+static enum cpu_state_vals default_offline_state = CPU_DEALLOCATE;
+
+enum cpu_state_vals get_cpu_current_state(int cpu)
+{
+	return per_cpu(current_state, cpu);
+}
+
+void set_cpu_current_state(int cpu, enum cpu_state_vals state)
+{
+	per_cpu(current_state, cpu) = state;
+}
+
+enum cpu_state_vals get_preferred_offline_state(int cpu)
+{
+	return per_cpu(preferred_offline_state, cpu);
+}
+
+void set_preferred_offline_state(int cpu, enum cpu_state_vals state)
+{
+	per_cpu(preferred_offline_state, cpu) = state;
+}
+
+void set_default_offline_state(int cpu)
+{
+	per_cpu(preferred_offline_state, cpu) = default_offline_state;
+}
+
+static const char *get_cpu_hotplug_state_name(enum cpu_state_vals state_val)
+{
+	return pSeries_cpu_hotplug_states[state_val].state_name;
+}
+
+static bool cpu_hotplug_state_available(enum cpu_state_vals state_val)
+{
+	return pSeries_cpu_hotplug_states[state_val].available;
+}
+
+ssize_t pSeries_read_available_states(unsigned int cpu, char *buf)
+{
+	int state;
+	ssize_t ret = 0;
+
+	for (state = CPU_DEALLOCATE; state < CPU_MAX_OFFLINE_STATES; state++) {
+		if (!cpu_hotplug_state_available(state))
+			continue;
+
+		if (ret >= (ssize_t) ((PAGE_SIZE / sizeof(char))
+					- (CPU_STATES_LEN + 2)))
+			goto out;
+		ret += scnprintf(&buf[ret], CPU_STATES_LEN, "%s ",
+				get_cpu_hotplug_state_name(state));
+	}
+
+out:
+	ret += sprintf(&buf[ret], "\n");
+	return ret;
+}
+
+ssize_t pSeries_read_current_state(unsigned int cpu, char *buf)
+{
+	int state = get_cpu_current_state(cpu);
+
+	return scnprintf(buf, CPU_STATES_LEN, "%s\n",
+				get_cpu_hotplug_state_name(state));
+}
+
+ssize_t pSeries_write_current_state(unsigned int cpu, const char *buf)
+{
+	int ret;
+	char state_name[CPU_STATES_LEN];
+	int i;
+	struct sys_device *dev = get_cpu_sysdev(cpu);
+	ret = sscanf(buf, "%15s", state_name);
+
+	if (ret != 1) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	for (i = CPU_DEALLOCATE; i < CPU_MAX_OFFLINE_STATES; i++)
+		if (!strnicmp(state_name,
+				get_cpu_hotplug_state_name(i),
+				CPU_STATES_LEN))
+			break;
+
+	if (i == CPU_MAX_OFFLINE_STATES) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (i == get_cpu_current_state(cpu)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (i == CPU_STATE_ONLINE) {
+		ret = cpu_up(cpu);
+		if (!ret)
+			kobject_uevent(&dev->kobj, KOBJ_ONLINE);
+		goto out_unlock;
+	}
+
+	switch (i) {
+	case CPU_DEALLOCATE:
+		if (get_cpu_current_state(cpu) == CPU_DEACTIVATE) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		break;
+	case CPU_DEACTIVATE:
+		if (get_cpu_current_state(cpu) == CPU_DEALLOCATE) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+		break;
+	default:
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	set_preferred_offline_state(cpu, i);
+	ret = cpu_down(cpu);
+	if (!ret)
+		kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
+
+out_unlock:
+	return ret;
+}
+
+struct cpu_offline_driver pSeries_offline_driver = {
+	.read_available_states = pSeries_read_available_states,
+	.read_current_state = pSeries_read_current_state,
+	.write_current_state = pSeries_write_current_state,
+};
+
+static int pseries_hotplug_driver_init(void)
+{
+	return register_cpu_offline_driver(&pSeries_offline_driver);
+}
+
+arch_initcall(pseries_hotplug_driver_init);
diff --git a/arch/powerpc/platforms/pseries/offline_driver.h b/arch/powerpc/platforms/pseries/offline_driver.h
new file mode 100644
index 0000000..77b8f76
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/offline_driver.h
@@ -0,0 +1,20 @@
+#ifndef _OFFLINE_DRIVER_H_
+#define _OFFLINE_DRIVER_H_
+
+#define CPU_STATES_LEN	16
+
+/* Cpu offline states go here */
+enum cpu_state_vals {
+	CPU_DEALLOCATE,
+	CPU_DEACTIVATE,
+	CPU_STATE_ONLINE,
+	CPU_MAX_OFFLINE_STATES
+};
+
+extern enum cpu_state_vals get_cpu_current_state(int cpu);
+extern void set_cpu_current_state(int cpu, enum cpu_state_vals state);
+extern enum cpu_state_vals get_preferred_offline_state(int cpu);
+extern void set_preferred_offline_state(int cpu, enum cpu_state_vals state);
+extern int start_secondary(void);
+extern void set_default_offline_state(int cpu);
+#endif
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 1f8f6cf..cfea8db 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -48,6 +48,7 @@
 #include "plpar_wrappers.h"
 #include "pseries.h"
 #include "xics.h"
+#include "offline_driver.h"
 
 
 /*
@@ -86,6 +87,9 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
 	/* Fixup atomic count: it exited inside IRQ handler. */
 	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;
 
+	if (get_cpu_current_state(lcpu) == CPU_DEACTIVATE)
+		goto out;
+
 	/* 
 	 * If the RTAS start-cpu token does not exist then presume the
 	 * cpu is already spinning.
@@ -100,6 +104,7 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
 		return 0;
 	}
 
+out:
 	return 1;
 }
 
@@ -113,12 +118,15 @@ static void __devinit smp_xics_setup_cpu(int cpu)
 		vpa_init(cpu);
 
 	cpu_clear(cpu, of_spin_map);
+	set_cpu_current_state(cpu, CPU_STATE_ONLINE);
+	set_default_offline_state(cpu);
 
 }
 #endif /* CONFIG_XICS */
 
 static void __devinit smp_pSeries_kick_cpu(int nr)
 {
+	long rc;
 	BUG_ON(nr < 0 || nr >= NR_CPUS);
 
 	if (!smp_startup_cpu(nr))
@@ -130,6 +138,15 @@ static void __devinit smp_pSeries_kick_cpu(int nr)
 	 * the processor will continue on to secondary_start
 	 */
 	paca[nr].cpu_start = 1;
+
+	set_preferred_offline_state(nr, CPU_STATE_ONLINE);
+
+	if (get_cpu_current_state(nr) == CPU_DEACTIVATE) {
+		rc = plpar_hcall_norets(H_PROD, nr);
+		if (rc != H_SUCCESS)
+			panic("Error: Prod to wake up processor %d Ret= %ld\n",
+				nr, rc);
+	}
 }
 
 static int smp_pSeries_cpu_bootable(unsigned int nr)


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/2] cpu: Implement cpu-offline-state driver for pSeries.
@ 2009-08-28 10:00   ` Gautham R Shenoy
  0 siblings, 0 replies; 26+ messages in thread
From: Gautham R Shenoy @ 2009-08-28 10:00 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Vaidyanathan Srinivasan,
	Peter Zijlstra, Dipankar Sarma
  Cc: Darrick J. Wong, linuxppc-dev, linux-kernel, Venkatesh Pallipadi

This patch implements the callbacks that handle reads from and writes to
the sysfs interfaces

/sys/devices/system/cpu/cpu<number>/available_hotplug_states
and
/sys/devices/system/cpu/cpu<number>/current_state

Currently, the patch defines two states that the processor can enter when it
is offlined:

- deallocate: This is the default behaviour when the CPU is offlined, even
  in the absence of this driver.
  The CPU makes an rtas_stop_self() call and hands the
  CPU back to the resource pool, thereby effectively deallocating
  that vCPU from the LPAR. This results in a configuration change to the
  LPAR which is visible to the outside world.

- deactivate: This cedes the vCPU to the hypervisor, which in turn can put the
  vCPU's time to best use. This option does not result in a configuration
  change, and the vCPU remains entitled to the LPAR to which it
  belonged earlier.
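
The checks that pSeries_write_current_state() in offline_driver.c applies to a
write into current_state amount to a small state machine: an online CPU may
move to either offline state, an offline CPU may only come back online, and a
write naming the current state (or an unknown state) fails with -EINVAL. Below
is a minimal userspace model of those checks only. parse_state() and
check_transition() are hypothetical names, strcasecmp() stands in for the
kernel's strnicmp(), and cpu_up()/cpu_down() are not modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <strings.h>

/* Same enumeration order as offline_driver.h. */
enum cpu_state_vals {
	CPU_DEALLOCATE,
	CPU_DEACTIVATE,
	CPU_STATE_ONLINE,
	CPU_MAX_OFFLINE_STATES
};

static const char *const state_names[CPU_MAX_OFFLINE_STATES] = {
	"deallocate", "deactivate", "online",
};

/* Map a name written to current_state onto a state, ignoring case. */
static int parse_state(const char *name)
{
	int i;

	for (i = CPU_DEALLOCATE; i < CPU_MAX_OFFLINE_STATES; i++)
		if (strcasecmp(name, state_names[i]) == 0)
			return i;
	return -EINVAL;
}

/*
 * Model of the validation in pSeries_write_current_state(): returns 0 when
 * the write would go on to call cpu_up()/cpu_down(), -EINVAL otherwise.
 */
static int check_transition(enum cpu_state_vals cur, const char *name)
{
	int req = parse_state(name);

	if (req < 0 || req == (int)cur)
		return -EINVAL;		/* unknown name, or no change */
	if (req == CPU_STATE_ONLINE)
		return 0;		/* either offline state -> online */
	if (cur != CPU_STATE_ONLINE)
		return -EINVAL;		/* no direct deallocate <-> deactivate */
	return 0;			/* online -> deallocate or deactivate */
}
```

Rejecting deallocate <-> deactivate directly means a CPU always passes
through online when switching between the two offline states.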

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 arch/powerpc/platforms/pseries/Makefile         |    2 
 arch/powerpc/platforms/pseries/hotplug-cpu.c    |   76 ++++++++++-
 arch/powerpc/platforms/pseries/offline_driver.c |  161 +++++++++++++++++++++++
 arch/powerpc/platforms/pseries/offline_driver.h |   20 +++
 arch/powerpc/platforms/pseries/smp.c            |   17 ++
 5 files changed, 268 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.c
 create mode 100644 arch/powerpc/platforms/pseries/offline_driver.h

diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 790c0b8..3a569c7 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_KEXEC)	+= kexec.o
 obj-$(CONFIG_PCI)	+= pci.o pci_dlpar.o
 obj-$(CONFIG_PSERIES_MSI)	+= msi.o
 
-obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu.o
+obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu.o offline_driver.o
 obj-$(CONFIG_MEMORY_HOTPLUG)	+= hotplug-memory.o
 
 obj-$(CONFIG_HVC_CONSOLE)	+= hvconsole.o
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index a20ead8..6880a1d 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -30,6 +30,7 @@
 #include <asm/pSeries_reconfig.h>
 #include "xics.h"
 #include "plpar_wrappers.h"
+#include "offline_driver.h"
 
 /* This version can't take the spinlock, because it never returns */
 static struct rtas_args rtas_stop_self_args = {
@@ -54,13 +55,62 @@ static void rtas_stop_self(void)
 	panic("Alas, I survived.\n");
 }
 
+static void cede_on_offline(void)
+{
+	unsigned int cpu = smp_processor_id();
+	unsigned int hwcpu = hard_smp_processor_id();
+
+	get_lppaca()->idle = 1;
+	if (!get_lppaca()->shared_proc)
+		get_lppaca()->donate_dedicated_cpu = 1;
+
+	printk(KERN_INFO "cpu %u (hwid %u) ceding for offline with hint %d\n",
+			cpu, hwcpu, cede_latency_hint);
+	while (get_preferred_offline_state(cpu) != CPU_STATE_ONLINE) {
+		cede_processor();
+		printk(KERN_INFO "cpu %u (hwid %u) returned from cede.\n",
+			cpu, hwcpu);
+	}
+
+	printk(KERN_INFO "cpu %u (hwid %u) got prodded to go online\n",
+		cpu, hwcpu);
+
+	if (!get_lppaca()->shared_proc)
+		get_lppaca()->donate_dedicated_cpu = 0;
+	get_lppaca()->idle = 0;
+	unregister_slb_shadow(hwcpu, __pa(get_slb_shadow()));
+
+	/*
+	 * NOTE: Calling start_secondary() here for now to start
+	 * a new context.
+	 *
+	 * However, need to do it cleanly by resetting the stack
+	 * pointer.
+	 */
+	start_secondary();
+}
+
 static void pseries_mach_cpu_die(void)
 {
+	unsigned int cpu = smp_processor_id();
+
 	local_irq_disable();
 	idle_task_exit();
 	xics_teardown_cpu();
-	unregister_slb_shadow(hard_smp_processor_id(), __pa(get_slb_shadow()));
-	rtas_stop_self();
+
+	if (get_preferred_offline_state(cpu) == CPU_DEALLOCATE) {
+
+		set_cpu_current_state(cpu, CPU_DEALLOCATE);
+		unregister_slb_shadow(hard_smp_processor_id(),
+					__pa(get_slb_shadow()));
+		rtas_stop_self();
+		goto out_bug;
+	} else if (get_preferred_offline_state(cpu) == CPU_DEACTIVATE) {
+		set_cpu_current_state(cpu, CPU_DEACTIVATE);
+		cede_on_offline();
+	}
+
+out_bug:
 	/* Should never get here... */
 	BUG();
 	for(;;);
@@ -112,11 +162,23 @@ static void pseries_cpu_die(unsigned int cpu)
 	int cpu_status;
 	unsigned int pcpu = get_hard_smp_processor_id(cpu);
 
-	for (tries = 0; tries < 25; tries++) {
-		cpu_status = query_cpu_stopped(pcpu);
-		if (cpu_status == 0 || cpu_status == -1)
-			break;
-		cpu_relax();
+	if (get_preferred_offline_state(cpu) == CPU_DEACTIVATE) {
+		cpu_status = 1;
+		for (tries = 0; tries < 1000; tries++) {
+			if (get_cpu_current_state(cpu) == CPU_DEACTIVATE) {
+				cpu_status = 0;
+				break;
+			}
+			cpu_relax();
+		}
+	} else {
+
+		for (tries = 0; tries < 25; tries++) {
+			cpu_status = query_cpu_stopped(pcpu);
+			if (cpu_status == 0 || cpu_status == -1)
+				break;
+			cpu_relax();
+		}
 	}
 	if (cpu_status != 0) {
 		printk("Querying DEAD? cpu %i (%i) shows %i\n",
diff --git a/arch/powerpc/platforms/pseries/offline_driver.c b/arch/powerpc/platforms/pseries/offline_driver.c
new file mode 100644
index 0000000..e75e6e5
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/offline_driver.c
@@ -0,0 +1,161 @@
+#include "offline_driver.h"
+#include <linux/cpu.h>
+#include <linux/percpu-defs.h>
+
+struct cpu_hotplug_state {
+	enum cpu_state_vals state_val;
+	const char *state_name;
+	int available;
+} pSeries_cpu_hotplug_states[] = {
+	{CPU_DEALLOCATE, "deallocate", 1},
+	{CPU_DEACTIVATE, "deactivate", 1},
+	{CPU_STATE_ONLINE, "online", 1},
+	{CPU_MAX_OFFLINE_STATES, "", 0},
+};
+
+static DEFINE_PER_CPU(enum cpu_state_vals, preferred_offline_state) =
+							CPU_DEALLOCATE;
+static DEFINE_PER_CPU(enum cpu_state_vals, current_state) = CPU_DEALLOCATE;
+
+static enum cpu_state_vals default_offline_state = CPU_DEALLOCATE;
+
+enum cpu_state_vals get_cpu_current_state(int cpu)
+{
+	return per_cpu(current_state, cpu);
+}
+
+void set_cpu_current_state(int cpu, enum cpu_state_vals state)
+{
+	per_cpu(current_state, cpu) = state;
+}
+
+enum cpu_state_vals get_preferred_offline_state(int cpu)
+{
+	return per_cpu(preferred_offline_state, cpu);
+}
+
+void set_preferred_offline_state(int cpu, enum cpu_state_vals state)
+{
+	per_cpu(preferred_offline_state, cpu) = state;
+}
+
+void set_default_offline_state(int cpu)
+{
+	per_cpu(preferred_offline_state, cpu) = default_offline_state;
+}
+
+static const char *get_cpu_hotplug_state_name(enum cpu_state_vals state_val)
+{
+	return pSeries_cpu_hotplug_states[state_val].state_name;
+}
+
+static bool cpu_hotplug_state_available(enum cpu_state_vals state_val)
+{
+	return pSeries_cpu_hotplug_states[state_val].available;
+}
+
+ssize_t pSeries_read_available_states(unsigned int cpu, char *buf)
+{
+	int state;
+	ssize_t ret = 0;
+
+	for (state = CPU_DEALLOCATE; state < CPU_MAX_OFFLINE_STATES; state++) {
+		if (!cpu_hotplug_state_available(state))
+			continue;
+
+		if (ret >= (ssize_t) ((PAGE_SIZE / sizeof(char))
+					- (CPU_STATES_LEN + 2)))
+			goto out;
+		ret += scnprintf(&buf[ret], CPU_STATES_LEN, "%s ",
+				get_cpu_hotplug_state_name(state));
+	}
+
+out:
+	ret += sprintf(&buf[ret], "\n");
+	return ret;
+}
+
+ssize_t pSeries_read_current_state(unsigned int cpu, char *buf)
+{
+	int state = get_cpu_current_state(cpu);
+
+	return scnprintf(buf, CPU_STATES_LEN, "%s\n",
+				get_cpu_hotplug_state_name(state));
+}
+
+ssize_t pSeries_write_current_state(unsigned int cpu, const char *buf)
+{
+	int ret;
+	char state_name[CPU_STATES_LEN];
+	int i;
+	struct sys_device *dev = get_cpu_sysdev(cpu);
+	ret = sscanf(buf, "%15s", state_name);
+
+	if (ret != 1) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	for (i = CPU_DEALLOCATE; i < CPU_MAX_OFFLINE_STATES; i++)
+		if (!strnicmp(state_name,
+				get_cpu_hotplug_state_name(i),
+				CPU_STATES_LEN))
+			break;
+
+	if (i == CPU_MAX_OFFLINE_STATES) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (i == get_cpu_current_state(cpu)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (i == CPU_STATE_ONLINE) {
+		ret = cpu_up(cpu);
+		if (!ret)
+			kobject_uevent(&dev->kobj, KOBJ_ONLINE);
+		goto out_unlock;
+	}
+
+	switch (i) {
+	case CPU_DEALLOCATE:
+		if (get_cpu_current_state(cpu) == CPU_DEACTIVATE) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		break;
+	case CPU_DEACTIVATE:
+		if (get_cpu_current_state(cpu) == CPU_DEALLOCATE) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+		break;
+	default:
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	set_preferred_offline_state(cpu, i);
+	ret = cpu_down(cpu);
+	if (!ret)
+		kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
+
+out_unlock:
+	return ret;
+}
+
+struct cpu_offline_driver pSeries_offline_driver = {
+	.read_available_states = pSeries_read_available_states,
+	.read_current_state = pSeries_read_current_state,
+	.write_current_state = pSeries_write_current_state,
+};
+
+static int pseries_hotplug_driver_init(void)
+{
+	return register_cpu_offline_driver(&pSeries_offline_driver);
+}
+
+arch_initcall(pseries_hotplug_driver_init);
diff --git a/arch/powerpc/platforms/pseries/offline_driver.h b/arch/powerpc/platforms/pseries/offline_driver.h
new file mode 100644
index 0000000..77b8f76
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/offline_driver.h
@@ -0,0 +1,20 @@
+#ifndef _OFFLINE_DRIVER_H_
+#define _OFFLINE_DRIVER_H_
+
+#define CPU_STATES_LEN	16
+
+/* Cpu offline states go here */
+enum cpu_state_vals {
+	CPU_DEALLOCATE,
+	CPU_DEACTIVATE,
+	CPU_STATE_ONLINE,
+	CPU_MAX_OFFLINE_STATES
+};
+
+extern enum cpu_state_vals get_cpu_current_state(int cpu);
+extern void set_cpu_current_state(int cpu, enum cpu_state_vals state);
+extern enum cpu_state_vals get_preferred_offline_state(int cpu);
+extern void set_preferred_offline_state(int cpu, enum cpu_state_vals state);
+extern int start_secondary(void);
+extern void set_default_offline_state(int cpu);
+#endif
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 1f8f6cf..cfea8db 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -48,6 +48,7 @@
 #include "plpar_wrappers.h"
 #include "pseries.h"
 #include "xics.h"
+#include "offline_driver.h"
 
 
 /*
@@ -86,6 +87,9 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
 	/* Fixup atomic count: it exited inside IRQ handler. */
 	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;
 
+	if (get_cpu_current_state(lcpu) == CPU_DEACTIVATE)
+		goto out;
+
 	/* 
 	 * If the RTAS start-cpu token does not exist then presume the
 	 * cpu is already spinning.
@@ -100,6 +104,7 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
 		return 0;
 	}
 
+out:
 	return 1;
 }
 
@@ -113,12 +118,15 @@ static void __devinit smp_xics_setup_cpu(int cpu)
 		vpa_init(cpu);
 
 	cpu_clear(cpu, of_spin_map);
+	set_cpu_current_state(cpu, CPU_STATE_ONLINE);
+	set_default_offline_state(cpu);
 
 }
 #endif /* CONFIG_XICS */
 
 static void __devinit smp_pSeries_kick_cpu(int nr)
 {
+	long rc;
 	BUG_ON(nr < 0 || nr >= NR_CPUS);
 
 	if (!smp_startup_cpu(nr))
@@ -130,6 +138,15 @@ static void __devinit smp_pSeries_kick_cpu(int nr)
 	 * the processor will continue on to secondary_start
 	 */
 	paca[nr].cpu_start = 1;
+
+	set_preferred_offline_state(nr, CPU_STATE_ONLINE);
+
+	if (get_cpu_current_state(nr) == CPU_DEACTIVATE) {
+		rc = plpar_hcall_norets(H_PROD, nr);
+		if (rc != H_SUCCESS)
+			panic("Error: Prod to wake up processor %d Ret= %ld\n",
+				nr, rc);
+	}
 }
 
 static int smp_pSeries_cpu_bootable(unsigned int nr)

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] cpu: Offline state Framework.
  2009-08-28 10:00   ` Gautham R Shenoy
@ 2009-09-02  4:49     ` Andrew Morton
  -1 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2009-09-02  4:49 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Joel Schopp, Benjamin Herrenschmidt, Vaidyanathan Srinivasan,
	Peter Zijlstra, Dipankar Sarma, Balbir Singh,
	Venkatesh Pallipadi, linuxppc-dev, linux-kernel, Darrick J. Wong

On Fri, 28 Aug 2009 15:30:16 +0530 Gautham R Shenoy <ego@in.ibm.com> wrote:

> Provide an interface by which the system administrator can decide which state
> the CPU should go to when it is offlined.
> 
> To query the hotplug states, one needs to read the sysfs tunable:
> 	/sys/devices/system/cpu/cpu<number>/available_hotplug_states
> 
> To query or set the current state for a particular CPU, one needs to
> use the sysfs interface:
> 	/sys/devices/system/cpu/cpu<number>/current_state
> 
> This patch implements the architecture independent bits of the
> cpu-offline-state framework.
> 
> Architectures which want to expose multiple offline states to
> userspace are expected to write a driver which can register
> with this framework.
> 
> Such a driver should:
> - Implement the callbacks defined in struct cpu_offline_driver,
>   which this framework calls when the corresponding
>   sysfs files are read or written.
> 
> - Ensure that the following operation puts the CPU in the same state
>   as it would be in the absence of the driver:
> 	echo 0 > /sys/devices/system/cpu/cpu<number>/online
> 
> This framework also serializes the writes to the "current_state"
> sysfs file with respect to the writes to the "online" sysfs tunable.
> 

It would be nice to document this new userspace interface somewhere.
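
As an illustration of what such documentation might show, here is a minimal
userspace sketch (not part of the patch) of how a tool could use the two
files: look a state name up in available_hotplug_states, then write that name
to current_state. cpu_state_available() and cpu_set_state() are hypothetical
helpers; cpu_dir is taken as a parameter (for example
/sys/devices/system/cpu/cpu1) so the helpers can also be tried against an
ordinary directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 1 if 'state' is listed in available_hotplug_states, 0 if not, -1 on error. */
static int cpu_state_available(const char *cpu_dir, const char *state)
{
	char path[512], buf[4096], *tok, *save;
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "%s/available_hotplug_states", cpu_dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	/* The file holds space-separated state names and a trailing newline. */
	for (tok = strtok_r(buf, " \n", &save); tok;
	     tok = strtok_r(NULL, " \n", &save))
		if (strcmp(tok, state) == 0)
			return 1;
	return 0;
}

/* Write 'state' to current_state: 0 on success, -1 on error (see errno). */
static int cpu_set_state(const char *cpu_dir, const char *state)
{
	char path[512];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "%s/current_state", cpu_dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", state) > 0;
	/*
	 * stdio buffers the write, so an error such as -EINVAL from the
	 * store callback is reported when fclose() flushes the buffer.
	 */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A tool would typically call cpu_state_available(dir, "deactivate") and, if it
returns 1, cpu_set_state(dir, "deactivate"). With the pSeries driver in patch
2/2, writing the state the CPU is already in fails with EINVAL.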


> +struct cpu_offline_driver *cpu_offline_driver;
> +static DEFINE_MUTEX(cpu_offline_driver_lock);
> +
> +ssize_t show_available_states(struct sys_device *dev,
> +			struct sysdev_attribute *attr, char *buf)
> +{
> +	struct cpu *cpu = container_of(dev, struct cpu, sysdev);
> +	int cpu_num = cpu->sysdev.id;
> +	ssize_t ret;
> +
> +	mutex_lock(&cpu_offline_driver_lock);
> +	if (!cpu_offline_driver) {
> +		ret = -EEXIST;
> +		goto out_unlock;
> +	}
> +
> +	ret = cpu_offline_driver->read_available_states(cpu_num, buf);
> +
> +out_unlock:
> +	mutex_unlock(&cpu_offline_driver_lock);
> +
> +	return ret;
> +
> +}

The patch adds boatloads of global symbols which do not have names
which are appropriate for global symbols.

> +ssize_t show_current_state(struct sys_device *dev,
> +			struct sysdev_attribute *attr, char *buf)

Like that.

> +ssize_t store_current_state(struct sys_device *dev,
> +			struct sysdev_attribute *attr,
> +			const char *buf, size_t count)

And that.

> +
> +static SYSDEV_ATTR(available_hotplug_states, 0444, show_available_states,
> +								NULL);
> +static SYSDEV_ATTR(current_state, 0644, show_current_state,
> +						store_current_state);
> +
> +/* Should be called with cpu_offline_driver_lock held */
> +void cpu_offline_driver_add_cpu(struct sys_device *cpu_sys_dev)
> +{
> +	if (!cpu_offline_driver || !cpu_sys_dev)
> +		return;
> +
> +	sysdev_create_file(cpu_sys_dev, &attr_available_hotplug_states);
> +	sysdev_create_file(cpu_sys_dev, &attr_current_state);
> +}
> +
> +/* Should be called with cpu_offline_driver_lock held */
> +void cpu_offline_driver_remove_cpu(struct sys_device *cpu_sys_dev)
> +{
> +	if (!cpu_offline_driver || !cpu_sys_dev)
> +		return;
> +
> +	sysdev_remove_file(cpu_sys_dev, &attr_available_hotplug_states);
> +	sysdev_remove_file(cpu_sys_dev, &attr_current_state);
> +
> +}

Please don't just ignore possible error returns.
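
sysdev_create_file() returns an int, so the add path can fail. One way to act
on this, sketched in userspace: propagate the first error and remove the file
that was already created, so a CPU ends up with both attributes or neither.
create_attr() and remove_attr() are hypothetical stand-ins for
sysdev_create_file() and sysdev_remove_file(), and struct fake_dev exists only
to make the sketch testable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct fake_dev {
	bool has_available;	/* available_hotplug_states created */
	bool has_current;	/* current_state created */
	int fail_on;		/* attr_id whose creation should fail, 0 = none */
};

enum attr_id { ATTR_AVAILABLE = 1, ATTR_CURRENT = 2 };

static int create_attr(struct fake_dev *dev, enum attr_id id)
{
	if (dev->fail_on == (int)id)
		return -ENOMEM;
	if (id == ATTR_AVAILABLE)
		dev->has_available = true;
	else
		dev->has_current = true;
	return 0;
}

static void remove_attr(struct fake_dev *dev, enum attr_id id)
{
	if (id == ATTR_AVAILABLE)
		dev->has_available = false;
	else
		dev->has_current = false;
}

/* Create both files or neither; hand the first error back to the caller. */
static int offline_driver_add_cpu(struct fake_dev *dev)
{
	int ret;

	ret = create_attr(dev, ATTR_AVAILABLE);
	if (ret)
		return ret;

	ret = create_attr(dev, ATTR_CURRENT);
	if (ret) {
		remove_attr(dev, ATTR_AVAILABLE);	/* unwind the first file */
		return ret;
	}
	return 0;
}
```

register_cpu_offline_driver() would then need to check this result for each
CPU and remove the files already added to earlier CPUs before failing.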

> +int register_cpu_offline_driver(struct cpu_offline_driver *arch_cpu_driver)
> +{
> +	int ret = 0;
> +	int cpu;
> +	mutex_lock(&cpu_offline_driver_lock);
> +

The blank line goes after end-of-locals and before start-of-code.

> +	if (cpu_offline_driver != NULL) {
> +		ret = -EEXIST;
> +		goto out_unlock;
> +	}
> +
> +	if (!(arch_cpu_driver->read_available_states &&
> +	      arch_cpu_driver->read_current_state &&
> +	      arch_cpu_driver->write_current_state)) {
> +		ret = -EINVAL;
> +		goto out_unlock;

This seems pretty pointless.  Just let the code oops - the developer
will notice fairly quickly.

> +	}
> +
> +	cpu_offline_driver = arch_cpu_driver;
> +
> +	for_each_possible_cpu(cpu)
> +		cpu_offline_driver_add_cpu(get_cpu_sysdev(cpu));
> +
> +out_unlock:
> +	mutex_unlock(&cpu_offline_driver_lock);
> +	return ret;
> +}
> +
> +void unregister_cpu_offline_driver(struct cpu_offline_driver *arch_cpu_driver)
> +{
> +	int cpu;
> +	mutex_lock(&cpu_offline_driver_lock);
> +
> +	if (!cpu_offline_driver) {
> +		WARN_ON(1);

	if (WARN_ON(!cpu_offline_driver)) {

> +		mutex_unlock(&cpu_offline_driver_lock);
> +		return;
> +	}
> +
> +	for_each_possible_cpu(cpu)
> +		cpu_offline_driver_remove_cpu(get_cpu_sysdev(cpu));
> +
> +	cpu_offline_driver = NULL;
> +	mutex_unlock(&cpu_offline_driver_lock);
> +}
> +
> +
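
The suggested form works because WARN_ON() evaluates to the condition it tests
(the generic version in include/asm-generic/bug.h is a GNU C statement
expression), so a single test both warns and selects the early return. Below
is a userspace imitation of that idiom applied to a simplified unregister
path; it needs GCC or Clang for the statement expression, the mutex is
omitted, and struct driver, registered and warnings are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

static int warnings;	/* counts warnings so the behaviour can be checked */

/* Imitation of WARN_ON(): report when cond is true, then evaluate to it. */
#define WARN_ON(cond) ({						\
	int warn_hit_ = !!(cond);					\
	if (warn_hit_) {						\
		warnings++;						\
		fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
	}								\
	warn_hit_;							\
})

struct driver { const char *name; };
static struct driver *registered;

static void unregister_driver(struct driver *drv)
{
	(void)drv;	/* unused, as in unregister_cpu_offline_driver() */

	if (WARN_ON(!registered))
		return;

	registered = NULL;
}
```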


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
  2009-08-28 10:00 ` Gautham R Shenoy
@ 2009-09-02  5:33   ` Peter Zijlstra
  -1 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2009-09-02  5:33 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Joel Schopp, Benjamin Herrenschmidt, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Venkatesh Pallipadi, linuxppc-dev,
	linux-kernel, Darrick J. Wong

On Fri, 2009-08-28 at 15:30 +0530, Gautham R Shenoy wrote:
> Hi,
> 
> This is version 2 of the patch series providing a cpu-offline framework
> that enables administrators to choose the state an offline CPU is put
> into when the underlying architecture exposes multiple such states.
> 
> Version 1 of the Patch can be found here:
> http://lkml.org/lkml/2009/8/6/236
> 
> The patch series exposes the following sysfs tunables to
> allow the system administrator to choose the state of a CPU:
> 
> To query the available hotplug states, one needs to read the sysfs tunable:
> 	/sys/devices/system/cpu/cpu<number>/available_hotplug_states
> To query or set the current state, one needs to read/write the sysfs tunable:
> 	/sys/devices/system/cpu/cpu<number>/current_state
> 
> The patchset ensures that the writes to the "current_state" sysfs file are
> serialized against the writes to the "online" file.
> 
> This patchset also contains the offline state driver implemented for
> pSeries. For pSeries, we define three available_hotplug_states. They are:
> 
> 	online: The processor is online.
> 
> 	deallocate: This is the default behaviour when the CPU is offlined,
> 	even in the absence of this driver. The CPU makes an
> 	rtas_stop_self() call and hands the CPU back to the resource pool,
> 	thereby effectively deallocating that vCPU from the LPAR.
> 	NOTE: This would result in a configuration change to the LPAR
> 	which is visible to the outside world.
> 
> 	deactivate: This cedes the vCPU to the hypervisor, which
> 	in turn can put the vCPU's time to best use.
> 	NOTE: This option DOES NOT result in a configuration change,
> 	and the vCPU remains entitled to the LPAR to which it
> 	belonged earlier.
> 
> Awaiting your feedback.

I'm still thinking this is a bad idea.

The OS should only know about online/offline.

Use the hypervisor interface to deal with the cpu once it's offline.

That is, I think this interface you propose is a layering violation.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
  2009-09-02  5:33   ` Peter Zijlstra
@ 2009-09-02 20:02     ` Pavel Machek
  -1 siblings, 0 replies; 26+ messages in thread
From: Pavel Machek @ 2009-09-02 20:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gautham R Shenoy, Joel Schopp, Benjamin Herrenschmidt,
	Vaidyanathan Srinivasan, Dipankar Sarma, Balbir Singh,
	Venkatesh Pallipadi, linuxppc-dev, linux-kernel, Darrick J. Wong

On Wed 2009-09-02 07:33:31, Peter Zijlstra wrote:
> On Fri, 2009-08-28 at 15:30 +0530, Gautham R Shenoy wrote:
> > Hi,
> > 
> > This is the version 2 of the patch series to provide a cpu-offline framework
> > that enables administrators to choose the state the offline CPU must be put
> > into when multiple such states are exposed by the underlying architecture.
> > 
> > Version 1 of the Patch can be found here:
> > http://lkml.org/lkml/2009/8/6/236
> > 
> > The patch-series exposes the following sysfs tunables to
> > allow the system-administrator to choose the state of a CPU:
> > 
> > To query the available hotplug states, one needs to read the sysfs tunable:
> > 	/sys/devices/system/cpu/cpu<number>/available_hotplug_states
> > To query or set the current state, one needs to read/write the sysfs tunable:
> > 	/sys/devices/system/cpu/cpu<number>/current_states
> > 
> > The patchset ensures that the writes to the "current_state" sysfs file are
> > serialized against the writes to the "online" file.
> > 
> > This patchset also contains the offline state driver implemented for
> > pSeries. For pSeries, we define three available_hotplug_states. They are:
> > 
> > 	online: The processor is online.
> > 
> > 	deallocate: This is the default behaviour when the cpu is offlined,
> > 	even in the absence of this driver. The CPU would make an
> > 	rtas_stop_self() call and hand the CPU back to the resource pool,
> > 	thereby effectively deallocating that vCPU from the LPAR.
> > 	NOTE: This would result in a configuration change to the LPAR
> > 	which is visible to the outside world.
> > 
> > 	deactivate: This cedes the vCPU to the hypervisor which
> > 	in turn can put the vCPU time to the best use.
> > 	NOTE: This option DOES NOT result in a configuration change
> > 	and the vCPU would still be entitled to the LPAR to which it earlier
> > 	belonged.
> > 
> > Awaiting your feedback.
> 
> I'm still thinking this is a bad idea.
> 
> The OS should only know about online/offline.
> 
> Use the hypervisor interface to deal with the cpu once it's offline.
> 
> That is, I think this interface you propose is a layering violation.

Agreed. Plus, an interface like 'go to this state during offline'
then 'go offline' is strange/stupid. In the hypervisor case, you might
want to change the 'state' of a cpu that is already offline.
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
  2009-09-02  5:33   ` Peter Zijlstra
@ 2009-09-24  0:48     ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 26+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-24  0:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gautham R Shenoy, Joel Schopp, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Venkatesh Pallipadi, linuxppc-dev,
	linux-kernel, Darrick J. Wong

On Wed, 2009-09-02 at 07:33 +0200, Peter Zijlstra wrote:
> 
> I'm still thinking this is a bad idea.
> 
> The OS should only know about online/offline.
> 
> Use the hypervisor interface to deal with the cpu once it's offline.
> 
> That is, I think this interface you propose is a layering violation.
> 
I don't quite follow your logic here. This is useful for more than just
hypervisors. For example, take the HV out of the picture for a moment
and imagine that the HW has the ability to offline CPU in various power
levels, with varying latencies to bring them back.

For example, the HW can put them in some low power state where they can
be re-plugged quickly, or can shut down entire power planes completely,
possibly allowing physical hotplug, but that takes much longer to bring
them back into the pool.

In any case, regarding the pseries case, this is how our hypervisor
works and I don't think we can change it, other than by always going
into the "cede" function and having some weird separate interface in the
arch to then whack them into some different state.

Ben.



* Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
  2009-09-24  0:48     ` Benjamin Herrenschmidt
@ 2009-09-24  7:51       ` Peter Zijlstra
  -1 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2009-09-24  7:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Gautham R Shenoy, Joel Schopp, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Venkatesh Pallipadi, linuxppc-dev,
	linux-kernel, Darrick J. Wong

On Thu, 2009-09-24 at 10:48 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2009-09-02 at 07:33 +0200, Peter Zijlstra wrote:
> > 
> > I'm still thinking this is a bad idea.
> > 
> > The OS should only know about online/offline.
> > 
> > Use the hypervisor interface to deal with the cpu once it's offline.
> > 
> > That is, I think this interface you propose is a layering violation.
> > 
> I don't quite follow your logic here. This is useful for more than just
> hypervisors. For example, take the HV out of the picture for a moment
> and imagine that the HW has the ability to offline CPU in various power
> levels, with varying latencies to bring them back.

cpu-hotplug is an utter slow path, anybody saying latency and hotplug in
the same sentence doesn't seem to grasp either or both concepts.




* Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
  2009-09-24  7:51       ` Peter Zijlstra
@ 2009-09-24  8:38         ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 26+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-24  8:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gautham R Shenoy, Joel Schopp, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Venkatesh Pallipadi, linuxppc-dev,
	linux-kernel, Darrick J. Wong

On Thu, 2009-09-24 at 09:51 +0200, Peter Zijlstra wrote:
> > I don't quite follow your logic here. This is useful for more than just
> > hypervisors. For example, take the HV out of the picture for a moment
> > and imagine that the HW has the ability to offline CPU in various power
> > levels, with varying latencies to bring them back.
> 
> cpu-hotplug is an utter slow path, anybody saying latency and hotplug in
> the same sentence doesn't seem to grasp either or both concepts.

Let's forget about latency then. Let's imagine I want to set a CPU
offline to save power, vs. setting it offline -and- opening the back
door of the machine to actually physically replace it :-)

In any case, I don't see the added feature as being problematic, and
not such a "layering violation" as you seem to imply it is. It's a
convenient way to atomically take the CPU out -and- convey some
information about the "intent" to the hypervisor, and I really fail
to see why you have such strong objections to it.

Ben.



* Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
  2009-09-24  8:38         ` Benjamin Herrenschmidt
@ 2009-09-24 11:33           ` Peter Zijlstra
  -1 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2009-09-24 11:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Gautham R Shenoy, Joel Schopp, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Venkatesh Pallipadi, linuxppc-dev,
	linux-kernel, Darrick J. Wong

On Thu, 2009-09-24 at 18:38 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2009-09-24 at 09:51 +0200, Peter Zijlstra wrote:
> > > I don't quite follow your logic here. This is useful for more than just
> > > hypervisors. For example, take the HV out of the picture for a moment
> > > and imagine that the HW has the ability to offline CPU in various power
> > > levels, with varying latencies to bring them back.
> > 
> > cpu-hotplug is an utter slow path, anybody saying latency and hotplug in
> > the same sentence doesn't seem to grasp either or both concepts.
> 
> Let's forget about latency then. Let's imagine I want to set a CPU
> offline to save power, vs. setting it offline -and- opening the back
> door of the machine to actually physically replace it :-)

If the hardware is capable of physical hotplug, then surely powering the
socket down saves most power and is the preferred mode?

> In any case, I don't see the added feature as being problematic, and
> not such a "layering violation" as you seem to imply it is. It's a
> convenient way to atomically take the CPU out -and- convey some
> information about the "intent" to the hypervisor, and I really fail
> to see why you have such strong objections to it.

Ignorance on my part probably :-)

I'm simply not seeing a use case for it, except for the virt case, which
I think we should bug the virt interface with and not the cpu-hotplug
interface.




* Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
  2009-09-24 11:33           ` Peter Zijlstra
@ 2009-09-24 11:41             ` Arjan van de Ven
  -1 siblings, 0 replies; 26+ messages in thread
From: Arjan van de Ven @ 2009-09-24 11:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Benjamin Herrenschmidt, Gautham R Shenoy, Joel Schopp,
	Vaidyanathan Srinivasan, Dipankar Sarma, Balbir Singh,
	Venkatesh Pallipadi, linuxppc-dev, linux-kernel, Darrick J. Wong

On Thu, 24 Sep 2009 13:33:07 +0200
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Thu, 2009-09-24 at 18:38 +1000, Benjamin Herrenschmidt wrote:
> > On Thu, 2009-09-24 at 09:51 +0200, Peter Zijlstra wrote:
> > > > I don't quite follow your logic here. This is useful for more
> > > > than just hypervisors. For example, take the HV out of the
> > > > picture for a moment and imagine that the HW has the ability to
> > > > offline CPU in various power levels, with varying latencies to
> > > > bring them back.
> > > 
> > > cpu-hotplug is an utter slow path, anybody saying latency and
> > > hotplug in the same sentence doesn't seem to grasp either or both
> > > concepts.
> > 
> > Let's forget about latency then. Let's imagine I want to set a CPU
> > offline to save power, vs. setting it offline -and- opening the back
> > door of the machine to actually physically replace it :-)
> 
> If the hardware is capable of physical hotplug, then surely powering
> the socket down saves most power and is the preferred mode?

btw just to take away a perception that generally powering down sockets
helps; it does not help for all cpus. Some cpus are so efficient in idle
that the incremental gain one would get by "offlining" a core is just
not worth it
(in fact, in x86, it's the same thing)

I obviously can't speak for p-series cpus, just wanted to point out
that there is no universal truth about "offlining saves power".

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

* Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
  2009-09-24 11:41             ` Arjan van de Ven
@ 2009-09-25  7:25               ` Vaidyanathan Srinivasan
  -1 siblings, 0 replies; 26+ messages in thread
From: Vaidyanathan Srinivasan @ 2009-09-25  7:25 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Peter Zijlstra, Benjamin Herrenschmidt, Gautham R Shenoy,
	Joel Schopp, Dipankar Sarma, Balbir Singh, Venkatesh Pallipadi,
	linuxppc-dev, linux-kernel, Darrick J. Wong

* Arjan van de Ven <arjan@infradead.org> [2009-09-24 13:41:23]:

> On Thu, 24 Sep 2009 13:33:07 +0200
> Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
> > On Thu, 2009-09-24 at 18:38 +1000, Benjamin Herrenschmidt wrote:
> > > On Thu, 2009-09-24 at 09:51 +0200, Peter Zijlstra wrote:
> > > > > I don't quite follow your logic here. This is useful for more
> > > > > than just hypervisors. For example, take the HV out of the
> > > > > picture for a moment and imagine that the HW has the ability to
> > > > > offline CPU in various power levels, with varying latencies to
> > > > > bring them back.
> > > > 
> > > > cpu-hotplug is an utter slow path, anybody saying latency and
> > > > hotplug in the same sentence doesn't seem to grasp either or both
> > > > concepts.
> > > 
> > > Let's forget about latency then. Let's imagine I want to set a CPU
> > > offline to save power, vs. setting it offline -and- opening the back
> > > door of the machine to actually physically replace it :-)
> > 
> > If the hardware is capable of physical hotplug, then surely powering
> > the socket down saves most power and is the preferred mode?
> 
> btw just to take away a perception that generally powering down sockets
> helps; it does not help for all cpus. Some cpus are so efficient in idle
> that the incremental gain one would get by "offlining" a core is just
> not worth it
> (in fact, in x86, it's the same thing)
> 
> I obviously can't speak for p-series cpus, just wanted to point out
> that there is no universal truth about "offlining saves power".

Hi Arjan,

As you have said, on some cpus the extra effort of offlining does not
save us any extra power, and the state will be the same as idle.  The
assertion that offlining saves power is still valid: it could be the same
as idle or better, depending on the architecture and implementation.

On x86 we still need the code (Venki posted) to take cpus to C6 on
offline to save power; otherwise offlining consumes more power than idle
due to the C1/hlt state.  This framework can help here as well if we have
any apprehension about making the lowest sleep state the default on x86 and
want the administrator to decide.

--Vaidy

* Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
  2009-09-25  7:25               ` Vaidyanathan Srinivasan
@ 2009-09-25  7:42                 ` Arjan van de Ven
  -1 siblings, 0 replies; 26+ messages in thread
From: Arjan van de Ven @ 2009-09-25  7:42 UTC (permalink / raw)
  To: svaidy
  Cc: Peter Zijlstra, Benjamin Herrenschmidt, Gautham R Shenoy,
	Joel Schopp, Dipankar Sarma, Balbir Singh, Venkatesh Pallipadi,
	linuxppc-dev, linux-kernel, Darrick J. Wong

On Fri, 25 Sep 2009 12:55:49 +0530
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:

> > I obviously can't speak for p-series cpus, just wanted to point out
> > that there is no universal truth about "offlining saves power".
> 
> Hi Arjan,
> 
> As you have said, on some cpus the extra effort of offlining does not
> save us any extra power, and the state will be the same as idle.  The
> assertion that offlining saves power is still valid: it could be the same
> as idle or better, depending on the architecture and implementation.
> 
> On x86 we still need the code (Venki posted) to take cpus to C6 on
> offline to save power; otherwise offlining consumes more power than idle
> due to the C1/hlt state.  This framework can help here as well if we have
> any apprehension about making the lowest sleep state the default on x86 and
> want the administrator to decide.

even with Venki's patch, all our measurements indicate that taking
cores away is a net loss on x86. Don't let that stop what you do for
powerpc, but for x86 it's not a win. Linux is good at keeping cores in
C6 long enough that the downside of offlining is bigger...



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

Thread overview: 26+ messages
2009-08-28 10:00 [PATCH v2 0/2] cpu: pseries: Offline state framework Gautham R Shenoy
2009-08-28 10:00 ` [PATCH v2 1/2] cpu: Offline state Framework Gautham R Shenoy
2009-09-02  4:49   ` Andrew Morton
2009-08-28 10:00 ` [PATCH v2 2/2] cpu: Implement cpu-offline-state driver for pSeries Gautham R Shenoy
2009-09-02  5:33 ` [PATCH v2 0/2] cpu: pseries: Offline state framework Peter Zijlstra
2009-09-02 20:02   ` Pavel Machek
2009-09-24  0:48   ` Benjamin Herrenschmidt
2009-09-24  7:51     ` Peter Zijlstra
2009-09-24  8:38       ` Benjamin Herrenschmidt
2009-09-24 11:33         ` Peter Zijlstra
2009-09-24 11:41           ` Arjan van de Ven
2009-09-25  7:25             ` Vaidyanathan Srinivasan
2009-09-25  7:42               ` Arjan van de Ven
