All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
@ 2011-05-11  7:58 ` MyungJoo Ham
  0 siblings, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-11  7:58 UTC (permalink / raw)
  To: linux-pm
  Cc: linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park, myungjoo.ham

With OPPs, a device may have multiple operable frequency and voltage
sets. However, there can be multiple possible operable sets and a system
will need to choose one from them. In order to reduce the power
consumption (by reducing frequency and voltage) without affecting the
performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
scheme may be used.

This patch introduces the DVFS capability to non-CPU devices with OPPs.
DVFS is a techique whereby the frequency and supplied voltage of a
device is adjusted on-the-fly. DVFS usually sets the frequency as low
as possible with given conditions (such as QoS assurance) and adjusts
voltage according to the chosen frequency in order to reduce power
consumption and heat dissipation.

The generic DVFS for devices, DEVFREQ, may appear quite similar with
/drivers/cpufreq.  However, CPUFREQ does not allow to have multiple
devices registered and is not suitable to have multiple heterogenous
devices with different (but simple) governors.

Normally, DVFS mechanism controls frequency based on the demand for
the device, and then, chooses voltage based on the chosen frequency.
DEVFREQ also controls the frequency based on the governor's frequency
recommendation and let OPP pick up the pair of frequency and voltage
based on the recommended frequency. Then, the chosen OPP is passed to
device driver's "target" callback.

Tested with Exynos4-NURI board.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

--
Thank you for your valuable comments, Rafael, Greg, Pavel, and Colin.

Changes from v1(RFC)
- Rename: DVFS --> DEVFREQ
- Revised governor design
    . Governor receives the whole struct devfreq
    . Governor should gather usage information (thru get_dev_status)
itself
- Periodic monitoring runs only when needed.
- DEVFREQ no more deals with voltage information directly
- Removed some printks.
- Some cosmetics update
- Use freezable_wq.
---
 drivers/base/power/Makefile  |    1 +
 drivers/base/power/devfreq.c |  353 ++++++++++++++++++++++++++++++++++++++++++
 drivers/base/power/opp.c     |    6 +
 include/linux/devfreq.h      |  108 +++++++++++++
 kernel/power/Kconfig         |   25 +++
 5 files changed, 493 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/power/devfreq.c
 create mode 100644 include/linux/devfreq.h

diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
index 118c1b9..d7f0ad7 100644
--- a/drivers/base/power/Makefile
+++ b/drivers/base/power/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_PM_SLEEP)	+= main.o wakeup.o
 obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 obj-$(CONFIG_PM_OPP)	+= opp.o
+obj-$(CONFIG_PM_DEVFREQ)	+= devfreq.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
 ccflags-$(CONFIG_PM_VERBOSE)   += -DDEBUG
diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
new file mode 100644
index 0000000..8e2e45b
--- /dev/null
+++ b/drivers/base/power/devfreq.c
@@ -0,0 +1,353 @@
+/*
+ * DEVFREQ: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
+ *	    for Non-CPU Devices Based on OPP.
+ *
+ * Copyright (C) 2011 Samsung Electronics
+ *	MyungJoo Ham <myungjoo.ham@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/opp.h>
+#include <linux/devfreq.h>
+#include <linux/workqueue.h>
+#include <linux/platform_device.h>
+#include <linux/list.h>
+#include <linux/printk.h>
+
+/*
+ * DEVFREQ Monitoring Interval in ms.
+ * It is recommended to be "jiffy_in_ms" * n, where n is an integer >= 1.
+ */
+#define DEVFREQ_INTERVAL	20
+
+/*
+ * devfreq_work periodically (given by DEVFREQ_INTERVAL) monitors every
+ * registered device.
+ */
+static bool monitoring;
+static struct workqueue_struct *devfreq_wq;
+static struct delayed_work devfreq_work;
+/* The list of all device-devfreq */
+static LIST_HEAD(devfreq_list);
+/* Exclusive access to devfreq_list and its elements */
+static DEFINE_MUTEX(devfreq_list_lock);
+
+/**
+ * find_device_devfreq() - find devfreq struct using device pointer
+ * @dev:	device pointer used to lookup device DEVFREQ.
+ *
+ * Search the list of device DEVFREQs and return the matched device's
+ * DEVFREQ info.
+ */
+static struct devfreq *find_device_devfreq(struct device *dev)
+{
+	struct devfreq *tmp_devfreq, *devfreq = ERR_PTR(-ENODEV);
+
+	if (unlikely(IS_ERR_OR_NULL(dev))) {
+		pr_err("%s: Invalid parameters\n", __func__);
+		return ERR_PTR(-EINVAL);
+	}
+
+	list_for_each_entry(tmp_devfreq, &devfreq_list, node) {
+		if (tmp_devfreq->dev == dev) {
+			devfreq = tmp_devfreq;
+			break;
+		}
+	}
+
+	return devfreq;
+}
+
+#define dev_dbg_once(dev, fmt, ...)				\
+	if (!once) {						\
+		once = 1;					\
+		dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);	\
+	}
+/**
+ * devfreq_do() - Check the usage profile of a given device and configure
+ *		frequency and voltage accordingly
+ * @devfreq:	DEVFREQ info of the given device
+ */
+static int devfreq_do(struct devfreq *devfreq)
+{
+	struct opp *opp;
+	unsigned long freq;
+	int err;
+	static int once;
+
+	err = devfreq->governor->get_target_freq(devfreq, &freq);
+	if (err) {
+		dev_dbg_once(devfreq->dev, "%s: get_target_freq error(%d)\n",
+			     __func__, err);
+		return err;
+	}
+
+	opp = opp_find_freq_ceil(devfreq->dev, &freq);
+	if (opp == ERR_PTR(-ENODEV))
+		opp = opp_find_freq_floor(devfreq->dev, &freq);
+
+	if (IS_ERR(opp)) {
+		dev_dbg_once(devfreq->dev, "%s: Cannot find opp with %luHz.\n",
+			     __func__, freq);
+		return PTR_ERR(opp);
+	}
+
+	freq = opp_get_freq(opp);
+	if (devfreq->previous_freq != freq) {
+		err = devfreq->profile->target(devfreq->dev, opp);
+		if (!err)
+			devfreq->previous_freq = freq;
+	}
+
+	if (err)
+		dev_dbg_once(devfreq->dev, "%s: Cannot set %luHz/%luuV\n",
+			     __func__, opp_get_freq(opp), opp_get_voltage(opp));
+	return err;
+}
+
+/**
+ * devfreq_monitor() - Regularly run devfreq_do() and support device DEVFREQ tickle.
+ * @work: the work struct used to run devfreq_monitor periodically.
+ */
+static void devfreq_monitor(struct work_struct *work)
+{
+	struct devfreq *devfreq;
+	int error;
+	int reserved = 0;
+	static int once;
+
+	mutex_lock(&devfreq_list_lock);
+
+	list_for_each_entry(devfreq, &devfreq_list, node) {
+		if (devfreq->next_polling == 0)
+			continue;
+
+		reserved++;
+
+		if (devfreq->tickle) {
+			devfreq->tickle--;
+			continue;
+		}
+		if (devfreq->next_polling == 1) {
+			error = devfreq_do(devfreq);
+			if (error && !once) {
+				once = 1;
+				dev_err(devfreq->dev, "devfreq_do error(%d)\n",
+					error);
+			}
+			devfreq->next_polling = DIV_ROUND_UP(
+						devfreq->profile->polling_ms,
+						DEVFREQ_INTERVAL);
+		} else {
+			devfreq->next_polling--;
+		}
+	}
+
+	if (reserved) {
+		monitoring = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	} else {
+		monitoring = false;
+	}
+
+	mutex_unlock(&devfreq_list_lock);
+}
+
+/**
+ * devfreq_add_device() - Add devfreq feature to the device
+ * @dev:	the device to add devfreq feature.
+ * @profile:	device-specific profile to run devfreq.
+ * @governor:	the policy to choose frequency.
+ */
+int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
+		       struct devfreq_governor *governor)
+{
+	struct devfreq *new_devfreq, *devfreq;
+	int err = 0;
+
+	if (!dev || !profile || !governor) {
+		dev_err(dev, "%s: Invalid parameters.\n", __func__);
+		return -EINVAL;
+	}
+
+	mutex_lock(&devfreq_list_lock);
+
+	devfreq = find_device_devfreq(dev);
+	if (!IS_ERR(devfreq)) {
+		dev_err(dev, "%s: Unable to create DEVFREQ for the device. "
+			"It already has one.\n", __func__);
+		err = -EINVAL;
+		goto out;
+	}
+
+	new_devfreq = kzalloc(sizeof(struct devfreq), GFP_KERNEL);
+	if (!new_devfreq) {
+		dev_err(dev, "%s: Unable to create DEVFREQ for the device\n",
+			__func__);
+		err = -ENOMEM;
+		goto out;
+	}
+
+	new_devfreq->dev = dev;
+	new_devfreq->profile = profile;
+	new_devfreq->governor = governor;
+	new_devfreq->next_polling = DIV_ROUND_UP(profile->polling_ms,
+						 DEVFREQ_INTERVAL);
+	new_devfreq->previous_freq = profile->initial_freq;
+
+	list_add(&new_devfreq->node, &devfreq_list);
+
+	if (devfreq_wq && new_devfreq->next_polling && !monitoring) {
+		monitoring = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	}
+out:
+	mutex_unlock(&devfreq_list_lock);
+
+	return err;
+}
+
+/**
+ * devfreq_remove_device() - Remove DEVFREQ feature from a device.
+ * @device:	the device to remove devfreq feature.
+ */
+int devfreq_remove_device(struct device *dev)
+{
+	struct devfreq *devfreq;
+
+	if (!dev)
+		return -EINVAL;
+
+	mutex_lock(&devfreq_list_lock);
+	devfreq = find_device_devfreq(dev);
+	if (IS_ERR(devfreq)) {
+		dev_err(dev, "%s: Unable to find DEVFREQ entry for the device.\n",
+			__func__);
+		mutex_unlock(&devfreq_list_lock);
+		return -EINVAL;
+	}
+
+	list_del(&devfreq->node);
+
+	kfree(devfreq);
+
+	mutex_unlock(&devfreq_list_lock);
+
+	return 0;
+}
+
+/**
+ * devfreq_update() - Notify that the device OPP has been changed.
+ * @dev:	the device whose OPP has been changed.
+ * @may_not_exist:	do not print error message even if the device
+ *			does not have devfreq entry.
+ */
+int devfreq_update(struct device *dev, bool may_not_exist)
+{
+	struct devfreq *devfreq;
+	int err = 0;
+
+	mutex_lock(&devfreq_list_lock);
+
+	devfreq = find_device_devfreq(dev);
+	if (IS_ERR(devfreq)) {
+		if (may_not_exist && PTR_ERR(devfreq) == -EINVAL)
+			goto out;
+
+		err = PTR_ERR(devfreq);
+		goto out;
+	}
+
+	if (devfreq->tickle) {
+		unsigned long freq = devfreq->profile->max_freq;
+		struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
+
+		if (!IS_ERR(opp) && devfreq->previous_freq != freq) {
+			err = devfreq->profile->target(devfreq->dev, opp);
+			if (!err)
+				devfreq->previous_freq = opp_get_freq(opp);
+		}
+	} else {
+		err = devfreq_do(devfreq);
+	}
+
+out:
+	mutex_unlock(&devfreq_list_lock);
+	return err;
+}
+
+/**
+ * devfreq_tickle_device() - Guarantee maximum operation speed for a while
+ *			instaneously.
+ * @dev:	the device to be tickled.
+ * @duration_ms:	the duration of tickle effect.
+ *
+ * Tickle sets the device at the maximum frequency instaneously and
+ * the maximum frequency is guaranteed to be used for the given duration.
+ * For faster user reponse time, an input event may tickle a related device
+ * so that the input event does not need to wait for the DEVFREQ to react with
+ * normal interval.
+ */
+int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
+{
+	struct devfreq *devfreq;
+	struct opp *opp;
+	unsigned long freq;
+	int err = 0;
+
+	mutex_lock(&devfreq_list_lock);
+	devfreq = find_device_devfreq(dev);
+	if (!IS_ERR(devfreq)) {
+		freq = devfreq->profile->max_freq;
+		opp = opp_find_freq_floor(devfreq->dev, &freq);
+		freq = opp_get_freq(opp);
+		if (devfreq->previous_freq != freq) {
+			err = devfreq->profile->target(devfreq->dev, opp);
+			if (!err)
+				devfreq->previous_freq = freq;
+		}
+		if (err)
+			dev_err(dev, "%s: Cannot set frequency.\n", __func__);
+		else
+			devfreq->tickle = DIV_ROUND_UP(duration_ms,
+						       DEVFREQ_INTERVAL);
+	}
+
+	if (devfreq_wq && !monitoring) {
+		monitoring = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	}
+	mutex_unlock(&devfreq_list_lock);
+
+	if (IS_ERR(devfreq)) {
+		dev_err(dev, "%s: Cannot find devfreq.\n", __func__);
+		err = PTR_ERR(devfreq);
+	}
+
+	return err;
+}
+
+static int __init devfreq_init(void)
+{
+	mutex_lock(&devfreq_list_lock);
+
+	monitoring = false;
+	devfreq_wq = create_freezable_workqueue("devfreq_wq");
+	INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
+	mutex_unlock(&devfreq_list_lock);
+
+	devfreq_monitor(&devfreq_work.work);
+	return 0;
+}
+late_initcall(devfreq_init);
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index 56a6899..4b6b995 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -21,6 +21,7 @@
 #include <linux/rculist.h>
 #include <linux/rcupdate.h>
 #include <linux/opp.h>
+#include <linux/devfreq.h>
 
 /*
  * Internal data structure organization with the OPP layer library is as
@@ -428,6 +429,8 @@ int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
 	list_add_rcu(&new_opp->node, head);
 	mutex_unlock(&dev_opp_list_lock);
 
+	/* Notify generic dvfs for the change */
+	devfreq_update(dev, true);
 	return 0;
 }
 
@@ -512,6 +515,9 @@ unlock:
 	mutex_unlock(&dev_opp_list_lock);
 out:
 	kfree(new_opp);
+
+	/* Notify generic dvfs for the change */
+	devfreq_update(dev, true);
 	return r;
 }
 
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
new file mode 100644
index 0000000..d08e9f5
--- /dev/null
+++ b/include/linux/devfreq.h
@@ -0,0 +1,108 @@
+/*
+ * DEVFREQ: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
+ *	    for Non-CPU Devices Based on OPP.
+ *
+ * Copyright (C) 2011 Samsung Electronics
+ *	MyungJoo Ham <myungjoo.ham@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __LINUX_DEVFREQ_H__
+#define __LINUX_DEVFREQ_H__
+
+struct devfreq;
+struct devfreq_dev_status {
+	/* both since the last measure */
+	unsigned long total_time;
+	unsigned long busy_time;
+	unsigned long current_frequency;
+};
+
+struct devfreq_dev_profile {
+	unsigned long max_freq; /* may be larger than the actual value */
+	unsigned long initial_freq;
+	int polling_ms;	/* 0 for at opp change only */
+
+	int (*target)(struct device *dev, struct opp *opp);
+	int (*get_dev_status)(struct device *dev,
+			      struct devfreq_dev_status *stat);
+};
+
+/**
+ * struct devfreq_governor - DEVFREQ Policy Governor
+ * @data	Governor's internal data. The framework does not care of it.
+ * @get_target_freq	Returns desired operating frequency for the device.
+ *			Basically, get_target_freq will run
+ *			devfreq_dev_profile.get_dev_status() to get the
+ *			status of the device (load = busy_time / total_time).
+ */
+struct devfreq_governor {
+	void *data; /* private data for get_target_freq */
+	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
+};
+
+/**
+ * struct devfreq - Device DEVFREQ structure
+ * @node	list node - contains the devices with DEVFREQ that have been
+ *		registered.
+ * @dev		device pointer
+ * @profile	device-specific devfreq profile
+ * @governor	method how to choose frequency based on the usage.
+ * @previous_freq	previously configured frequency value.
+ * @next_polling	the number of remaining "devfreq_monitor" executions to
+ *			reevaluate frequency/voltage of the device. Set by
+ *			profile's polling_ms interval.
+ * @tickle	positive if DEVFREQ-tickling is activated for the device.
+ *		at each executino of devfreq_monitor, tickle is decremented.
+ *		User may tickle a device-devfreq in order to set maximum
+ *		frequency instaneously with some guaranteed duration.
+ *
+ * This structure stores the DEVFREQ information for a give device.
+ */
+struct devfreq {
+	struct list_head node;
+
+	struct device *dev;
+	struct devfreq_dev_profile *profile;
+	struct devfreq_governor *governor;
+
+	unsigned long previous_freq;
+	unsigned int next_polling;
+	unsigned int tickle;
+};
+
+#if defined(CONFIG_PM_DEVFREQ)
+extern int devfreq_add_device(struct device *dev,
+			   struct devfreq_dev_profile *profile,
+			   struct devfreq_governor *governor);
+extern int devfreq_remove_device(struct device *dev);
+extern int devfreq_update(struct device *dev, bool may_not_exist);
+extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
+#else /* !CONFIG_PM_DEVFREQ */
+static int devfreq_add_device(struct device *dev,
+			   struct devfreq_dev_profile *profile,
+			   struct devfreq_governor *governor)
+{
+	return 0;
+}
+
+static int devfreq_remove_device(struct device *dev)
+{
+	return 0;
+}
+
+static int devfreq_update(struct device *dev, bool may_not_exist)
+{
+	return 0;
+}
+
+static int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
+{
+	return 0;
+}
+#endif /* CONFIG_PM_DEVFREQ */
+
+#endif /* __LINUX_DEVFREQ_H__ */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 4603f08..e5d2e36 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -225,3 +225,28 @@ config PM_OPP
 	  representing individual voltage domains and provides SOC
 	  implementations a ready to use framework to manage OPPs.
 	  For more information, read <file:Documentation/power/opp.txt>
+
+config PM_DEVFREQ
+	bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
+	depends on PM_OPP
+	help
+	  With OPP support, a device may have a list of frequencies and
+	  voltages available. DEVFREQ, a generic DVFS framework can be
+	  registered for a device with OPP support in order to let the
+	  governor provided to DEVFREQ choose an operating frequency
+	  based on the OPP's list and the policy given with DEVFREQ.
+
+	  Each device may have its own governor and policy. DEVFREQ can
+	  reevaluate the device state periodically and/or based on the
+	  OPP list changes (each frequency/voltage pair in OPP may be
+	  disabled or enabled).
+
+	  Like some CPUs with CPUFREQ, a device may have multiple clocks.
+	  However, because the clock frequencies of a single device are
+	  determined by the single device's state, an instance of DEVFREQ
+	  is attached to a single device and returns a "representative"
+	  clock frequency from the OPP of the device, which is also attached
+	  to a device by 1-to-1. The device registering DEVFREQ takes the
+	  responsiblity to "interpret" the frequency listed in OPP and
+	  to set its every clock accordingly with the "target" callback
+	  given to DEVFREQ.
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
@ 2011-05-11  7:58 ` MyungJoo Ham
  0 siblings, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-11  7:58 UTC (permalink / raw)
  To: linux-pm
  Cc: Nishanth Menon, Len Brown, Greg Kroah-Hartman, linux-kernel,
	Kyungmin Park, Thomas Gleixner

With OPPs, a device may have multiple operable frequency and voltage
sets. However, there can be multiple possible operable sets and a system
will need to choose one from them. In order to reduce the power
consumption (by reducing frequency and voltage) without affecting the
performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
scheme may be used.

This patch introduces the DVFS capability to non-CPU devices with OPPs.
DVFS is a techique whereby the frequency and supplied voltage of a
device is adjusted on-the-fly. DVFS usually sets the frequency as low
as possible with given conditions (such as QoS assurance) and adjusts
voltage according to the chosen frequency in order to reduce power
consumption and heat dissipation.

The generic DVFS for devices, DEVFREQ, may appear quite similar with
/drivers/cpufreq.  However, CPUFREQ does not allow to have multiple
devices registered and is not suitable to have multiple heterogenous
devices with different (but simple) governors.

Normally, DVFS mechanism controls frequency based on the demand for
the device, and then, chooses voltage based on the chosen frequency.
DEVFREQ also controls the frequency based on the governor's frequency
recommendation and let OPP pick up the pair of frequency and voltage
based on the recommended frequency. Then, the chosen OPP is passed to
device driver's "target" callback.

Tested with Exynos4-NURI board.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

--
Thank you for your valuable comments, Rafael, Greg, Pavel, and Colin.

Changes from v1(RFC)
- Rename: DVFS --> DEVFREQ
- Revised governor design
    . Governor receives the whole struct devfreq
    . Governor should gather usage information (thru get_dev_status)
itself
- Periodic monitoring runs only when needed.
- DEVFREQ no more deals with voltage information directly
- Removed some printks.
- Some cosmetics update
- Use freezable_wq.
---
 drivers/base/power/Makefile  |    1 +
 drivers/base/power/devfreq.c |  353 ++++++++++++++++++++++++++++++++++++++++++
 drivers/base/power/opp.c     |    6 +
 include/linux/devfreq.h      |  108 +++++++++++++
 kernel/power/Kconfig         |   25 +++
 5 files changed, 493 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/power/devfreq.c
 create mode 100644 include/linux/devfreq.h

diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
index 118c1b9..d7f0ad7 100644
--- a/drivers/base/power/Makefile
+++ b/drivers/base/power/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_PM_SLEEP)	+= main.o wakeup.o
 obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 obj-$(CONFIG_PM_OPP)	+= opp.o
+obj-$(CONFIG_PM_DEVFREQ)	+= devfreq.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
 ccflags-$(CONFIG_PM_VERBOSE)   += -DDEBUG
diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
new file mode 100644
index 0000000..8e2e45b
--- /dev/null
+++ b/drivers/base/power/devfreq.c
@@ -0,0 +1,353 @@
+/*
+ * DEVFREQ: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
+ *	    for Non-CPU Devices Based on OPP.
+ *
+ * Copyright (C) 2011 Samsung Electronics
+ *	MyungJoo Ham <myungjoo.ham@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/opp.h>
+#include <linux/devfreq.h>
+#include <linux/workqueue.h>
+#include <linux/platform_device.h>
+#include <linux/list.h>
+#include <linux/printk.h>
+
+/*
+ * DEVFREQ Monitoring Interval in ms.
+ * It is recommended to be "jiffy_in_ms" * n, where n is an integer >= 1.
+ */
+#define DEVFREQ_INTERVAL	20
+
+/*
+ * devfreq_work periodically (given by DEVFREQ_INTERVAL) monitors every
+ * registered device.
+ */
+static bool monitoring;
+static struct workqueue_struct *devfreq_wq;
+static struct delayed_work devfreq_work;
+/* The list of all device-devfreq */
+static LIST_HEAD(devfreq_list);
+/* Exclusive access to devfreq_list and its elements */
+static DEFINE_MUTEX(devfreq_list_lock);
+
+/**
+ * find_device_devfreq() - find devfreq struct using device pointer
+ * @dev:	device pointer used to lookup device DEVFREQ.
+ *
+ * Search the list of device DEVFREQs and return the matched device's
+ * DEVFREQ info.
+ */
+static struct devfreq *find_device_devfreq(struct device *dev)
+{
+	struct devfreq *tmp_devfreq, *devfreq = ERR_PTR(-ENODEV);
+
+	if (unlikely(IS_ERR_OR_NULL(dev))) {
+		pr_err("%s: Invalid parameters\n", __func__);
+		return ERR_PTR(-EINVAL);
+	}
+
+	list_for_each_entry(tmp_devfreq, &devfreq_list, node) {
+		if (tmp_devfreq->dev == dev) {
+			devfreq = tmp_devfreq;
+			break;
+		}
+	}
+
+	return devfreq;
+}
+
+#define dev_dbg_once(dev, fmt, ...)				\
+	if (!once) {						\
+		once = 1;					\
+		dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);	\
+	}
+/**
+ * devfreq_do() - Check the usage profile of a given device and configure
+ *		frequency and voltage accordingly
+ * @devfreq:	DEVFREQ info of the given device
+ */
+static int devfreq_do(struct devfreq *devfreq)
+{
+	struct opp *opp;
+	unsigned long freq;
+	int err;
+	static int once;
+
+	err = devfreq->governor->get_target_freq(devfreq, &freq);
+	if (err) {
+		dev_dbg_once(devfreq->dev, "%s: get_target_freq error(%d)\n",
+			     __func__, err);
+		return err;
+	}
+
+	opp = opp_find_freq_ceil(devfreq->dev, &freq);
+	if (opp == ERR_PTR(-ENODEV))
+		opp = opp_find_freq_floor(devfreq->dev, &freq);
+
+	if (IS_ERR(opp)) {
+		dev_dbg_once(devfreq->dev, "%s: Cannot find opp with %luHz.\n",
+			     __func__, freq);
+		return PTR_ERR(opp);
+	}
+
+	freq = opp_get_freq(opp);
+	if (devfreq->previous_freq != freq) {
+		err = devfreq->profile->target(devfreq->dev, opp);
+		if (!err)
+			devfreq->previous_freq = freq;
+	}
+
+	if (err)
+		dev_dbg_once(devfreq->dev, "%s: Cannot set %luHz/%luuV\n",
+			     __func__, opp_get_freq(opp), opp_get_voltage(opp));
+	return err;
+}
+
+/**
+ * devfreq_monitor() - Regularly run devfreq_do() and support device DEVFREQ tickle.
+ * @work: the work struct used to run devfreq_monitor periodically.
+ */
+static void devfreq_monitor(struct work_struct *work)
+{
+	struct devfreq *devfreq;
+	int error;
+	int reserved = 0;
+	static int once;
+
+	mutex_lock(&devfreq_list_lock);
+
+	list_for_each_entry(devfreq, &devfreq_list, node) {
+		if (devfreq->next_polling == 0)
+			continue;
+
+		reserved++;
+
+		if (devfreq->tickle) {
+			devfreq->tickle--;
+			continue;
+		}
+		if (devfreq->next_polling == 1) {
+			error = devfreq_do(devfreq);
+			if (error && !once) {
+				once = 1;
+				dev_err(devfreq->dev, "devfreq_do error(%d)\n",
+					error);
+			}
+			devfreq->next_polling = DIV_ROUND_UP(
+						devfreq->profile->polling_ms,
+						DEVFREQ_INTERVAL);
+		} else {
+			devfreq->next_polling--;
+		}
+	}
+
+	if (reserved) {
+		monitoring = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	} else {
+		monitoring = false;
+	}
+
+	mutex_unlock(&devfreq_list_lock);
+}
+
+/**
+ * devfreq_add_device() - Add devfreq feature to the device
+ * @dev:	the device to add devfreq feature.
+ * @profile:	device-specific profile to run devfreq.
+ * @governor:	the policy to choose frequency.
+ */
+int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
+		       struct devfreq_governor *governor)
+{
+	struct devfreq *new_devfreq, *devfreq;
+	int err = 0;
+
+	if (!dev || !profile || !governor) {
+		dev_err(dev, "%s: Invalid parameters.\n", __func__);
+		return -EINVAL;
+	}
+
+	mutex_lock(&devfreq_list_lock);
+
+	devfreq = find_device_devfreq(dev);
+	if (!IS_ERR(devfreq)) {
+		dev_err(dev, "%s: Unable to create DEVFREQ for the device. "
+			"It already has one.\n", __func__);
+		err = -EINVAL;
+		goto out;
+	}
+
+	new_devfreq = kzalloc(sizeof(struct devfreq), GFP_KERNEL);
+	if (!new_devfreq) {
+		dev_err(dev, "%s: Unable to create DEVFREQ for the device\n",
+			__func__);
+		err = -ENOMEM;
+		goto out;
+	}
+
+	new_devfreq->dev = dev;
+	new_devfreq->profile = profile;
+	new_devfreq->governor = governor;
+	new_devfreq->next_polling = DIV_ROUND_UP(profile->polling_ms,
+						 DEVFREQ_INTERVAL);
+	new_devfreq->previous_freq = profile->initial_freq;
+
+	list_add(&new_devfreq->node, &devfreq_list);
+
+	if (devfreq_wq && new_devfreq->next_polling && !monitoring) {
+		monitoring = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	}
+out:
+	mutex_unlock(&devfreq_list_lock);
+
+	return err;
+}
+
+/**
+ * devfreq_remove_device() - Remove DEVFREQ feature from a device.
+ * @device:	the device to remove devfreq feature.
+ */
+int devfreq_remove_device(struct device *dev)
+{
+	struct devfreq *devfreq;
+
+	if (!dev)
+		return -EINVAL;
+
+	mutex_lock(&devfreq_list_lock);
+	devfreq = find_device_devfreq(dev);
+	if (IS_ERR(devfreq)) {
+		dev_err(dev, "%s: Unable to find DEVFREQ entry for the device.\n",
+			__func__);
+		mutex_unlock(&devfreq_list_lock);
+		return -EINVAL;
+	}
+
+	list_del(&devfreq->node);
+
+	kfree(devfreq);
+
+	mutex_unlock(&devfreq_list_lock);
+
+	return 0;
+}
+
+/**
+ * devfreq_update() - Notify that the device OPP has been changed.
+ * @dev:	the device whose OPP has been changed.
+ * @may_not_exist:	do not print error message even if the device
+ *			does not have devfreq entry.
+ */
+int devfreq_update(struct device *dev, bool may_not_exist)
+{
+	struct devfreq *devfreq;
+	int err = 0;
+
+	mutex_lock(&devfreq_list_lock);
+
+	devfreq = find_device_devfreq(dev);
+	if (IS_ERR(devfreq)) {
+		if (may_not_exist && PTR_ERR(devfreq) == -EINVAL)
+			goto out;
+
+		err = PTR_ERR(devfreq);
+		goto out;
+	}
+
+	if (devfreq->tickle) {
+		unsigned long freq = devfreq->profile->max_freq;
+		struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
+
+		if (!IS_ERR(opp) && devfreq->previous_freq != freq) {
+			err = devfreq->profile->target(devfreq->dev, opp);
+			if (!err)
+				devfreq->previous_freq = opp_get_freq(opp);
+		}
+	} else {
+		err = devfreq_do(devfreq);
+	}
+
+out:
+	mutex_unlock(&devfreq_list_lock);
+	return err;
+}
+
+/**
+ * devfreq_tickle_device() - Guarantee maximum operation speed for a while
+ *			instaneously.
+ * @dev:	the device to be tickled.
+ * @duration_ms:	the duration of tickle effect.
+ *
+ * Tickle sets the device at the maximum frequency instaneously and
+ * the maximum frequency is guaranteed to be used for the given duration.
+ * For faster user reponse time, an input event may tickle a related device
+ * so that the input event does not need to wait for the DEVFREQ to react with
+ * normal interval.
+ */
+int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
+{
+	struct devfreq *devfreq;
+	struct opp *opp;
+	unsigned long freq;
+	int err = 0;
+
+	mutex_lock(&devfreq_list_lock);
+	devfreq = find_device_devfreq(dev);
+	if (!IS_ERR(devfreq)) {
+		freq = devfreq->profile->max_freq;
+		opp = opp_find_freq_floor(devfreq->dev, &freq);
+		freq = opp_get_freq(opp);
+		if (devfreq->previous_freq != freq) {
+			err = devfreq->profile->target(devfreq->dev, opp);
+			if (!err)
+				devfreq->previous_freq = freq;
+		}
+		if (err)
+			dev_err(dev, "%s: Cannot set frequency.\n", __func__);
+		else
+			devfreq->tickle = DIV_ROUND_UP(duration_ms,
+						       DEVFREQ_INTERVAL);
+	}
+
+	if (devfreq_wq && !monitoring) {
+		monitoring = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	}
+	mutex_unlock(&devfreq_list_lock);
+
+	if (IS_ERR(devfreq)) {
+		dev_err(dev, "%s: Cannot find devfreq.\n", __func__);
+		err = PTR_ERR(devfreq);
+	}
+
+	return err;
+}
+
+static int __init devfreq_init(void)
+{
+	mutex_lock(&devfreq_list_lock);
+
+	monitoring = false;
+	devfreq_wq = create_freezable_workqueue("devfreq_wq");
+	INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
+	mutex_unlock(&devfreq_list_lock);
+
+	devfreq_monitor(&devfreq_work.work);
+	return 0;
+}
+late_initcall(devfreq_init);
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index 56a6899..4b6b995 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -21,6 +21,7 @@
 #include <linux/rculist.h>
 #include <linux/rcupdate.h>
 #include <linux/opp.h>
+#include <linux/devfreq.h>
 
 /*
  * Internal data structure organization with the OPP layer library is as
@@ -428,6 +429,8 @@ int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
 	list_add_rcu(&new_opp->node, head);
 	mutex_unlock(&dev_opp_list_lock);
 
+	/* Notify generic dvfs for the change */
+	devfreq_update(dev, true);
 	return 0;
 }
 
@@ -512,6 +515,9 @@ unlock:
 	mutex_unlock(&dev_opp_list_lock);
 out:
 	kfree(new_opp);
+
+	/* Notify generic dvfs for the change */
+	devfreq_update(dev, true);
 	return r;
 }
 
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
new file mode 100644
index 0000000..d08e9f5
--- /dev/null
+++ b/include/linux/devfreq.h
@@ -0,0 +1,108 @@
+/*
+ * DEVFREQ: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
+ *	    for Non-CPU Devices Based on OPP.
+ *
+ * Copyright (C) 2011 Samsung Electronics
+ *	MyungJoo Ham <myungjoo.ham@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __LINUX_DEVFREQ_H__
+#define __LINUX_DEVFREQ_H__
+
+struct devfreq;
+struct devfreq_dev_status {
+	/* both since the last measure */
+	unsigned long total_time;
+	unsigned long busy_time;
+	unsigned long current_frequency;
+};
+
+struct devfreq_dev_profile {
+	unsigned long max_freq; /* may be larger than the actual value */
+	unsigned long initial_freq;
+	int polling_ms;	/* 0 for at opp change only */
+
+	int (*target)(struct device *dev, struct opp *opp);
+	int (*get_dev_status)(struct device *dev,
+			      struct devfreq_dev_status *stat);
+};
+
+/**
+ * struct devfreq_governor - DEVFREQ Policy Governor
+ * @data	Governor's internal data. The framework does not care of it.
+ * @get_target_freq	Returns desired operating frequency for the device.
+ *			Basically, get_target_freq will run
+ *			devfreq_dev_profile.get_dev_status() to get the
+ *			status of the device (load = busy_time / total_time).
+ */
+struct devfreq_governor {
+	void *data; /* private data for get_target_freq */
+	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
+};
+
+/**
+ * struct devfreq - Device DEVFREQ structure
+ * @node	list node - contains the devices with DEVFREQ that have been
+ *		registered.
+ * @dev		device pointer
+ * @profile	device-specific devfreq profile
+ * @governor	method how to choose frequency based on the usage.
+ * @previous_freq	previously configured frequency value.
+ * @next_polling	the number of remaining "devfreq_monitor" executions to
+ *			reevaluate frequency/voltage of the device. Set by
+ *			profile's polling_ms interval.
+ * @tickle	positive if DEVFREQ-tickling is activated for the device.
+ *		at each executino of devfreq_monitor, tickle is decremented.
+ *		User may tickle a device-devfreq in order to set maximum
+ *		frequency instaneously with some guaranteed duration.
+ *
+ * This structure stores the DEVFREQ information for a give device.
+ */
+struct devfreq {
+	struct list_head node;
+
+	struct device *dev;
+	struct devfreq_dev_profile *profile;
+	struct devfreq_governor *governor;
+
+	unsigned long previous_freq;
+	unsigned int next_polling;
+	unsigned int tickle;
+};
+
+#if defined(CONFIG_PM_DEVFREQ)
+extern int devfreq_add_device(struct device *dev,
+			   struct devfreq_dev_profile *profile,
+			   struct devfreq_governor *governor);
+extern int devfreq_remove_device(struct device *dev);
+extern int devfreq_update(struct device *dev, bool may_not_exist);
+extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
+#else /* !CONFIG_PM_DEVFREQ */
+static int devfreq_add_device(struct device *dev,
+			   struct devfreq_dev_profile *profile,
+			   struct devfreq_governor *governor)
+{
+	return 0;
+}
+
+static int devfreq_remove_device(struct device *dev)
+{
+	return 0;
+}
+
+static int devfreq_update(struct device *dev, bool may_not_exist)
+{
+	return 0;
+}
+
+static int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
+{
+	return 0;
+}
+#endif /* CONFIG_PM_DEVFREQ */
+
+#endif /* __LINUX_DEVFREQ_H__ */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 4603f08..e5d2e36 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -225,3 +225,28 @@ config PM_OPP
 	  representing individual voltage domains and provides SOC
 	  implementations a ready to use framework to manage OPPs.
 	  For more information, read <file:Documentation/power/opp.txt>
+
+config PM_DEVFREQ
+	bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
+	depends on PM_OPP
+	help
+	  With OPP support, a device may have a list of frequencies and
+	  voltages available. DEVFREQ, a generic DVFS framework can be
+	  registered for a device with OPP support in order to let the
+	  governor provided to DEVFREQ choose an operating frequency
+	  based on the OPP's list and the policy given with DEVFREQ.
+
+	  Each device may have its own governor and policy. DEVFREQ can
+	  reevaluate the device state periodically and/or based on the
+	  OPP list changes (each frequency/voltage pair in OPP may be
+	  disabled or enabled).
+
+	  Like some CPUs with CPUFREQ, a device may have multiple clocks.
+	  However, because the clock frequencies of a single device are
+	  determined by the single device's state, an instance of DEVFREQ
+	  is attached to a single device and returns a "representative"
+	  clock frequency from the OPP of the device, which is also attached
+	  to a device by 1-to-1. The device registering DEVFREQ takes the
+	  responsiblity to "interpret" the frequency listed in OPP and
+	  to set its every clock accordingly with the "target" callback
+	  given to DEVFREQ.
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 2/3] PM / DEVFREQ: add example governors
  2011-05-11  7:58 ` MyungJoo Ham
@ 2011-05-11  7:58   ` MyungJoo Ham
  -1 siblings, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-11  7:58 UTC (permalink / raw)
  To: linux-pm
  Cc: linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park, myungjoo.ham

Three CPUFREQ-like governors are provided as examples.

powersave: use the lowest frequency possible. The user (device) should
set the polling_ms as 0 because polling is useless for this governor.

performance: use the highest freqeuncy possible. The user (device)
should set the polling_ms as 0 because polling is useless for this
governor.

simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.

When a user updates OPP entries (enable/disable/add), OPP framework
automatically notifies DEVFREQ to update operating frequency
accordingly. Thus, DEVFREQ users (device drivers) do not need to update
DEVFREQ manually with OPP entry updates or set polling_ms for powersave
, performance, or any other "static" governors.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 drivers/base/power/devfreq.c |   69 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/devfreq.h      |    5 +++
 2 files changed, 74 insertions(+), 0 deletions(-)

diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
index 8e2e45b..251d761 100644
--- a/drivers/base/power/devfreq.c
+++ b/drivers/base/power/devfreq.c
@@ -351,3 +351,72 @@ static int __init devfreq_init(void)
 	return 0;
 }
 late_initcall(devfreq_init);
+
+static int devfreq_powersave_func(struct devfreq *df,
+				  unsigned long *freq)
+{
+	*freq = 0; /* devfreq_do will run "ceiling" to 0 */
+	return 0;
+}
+
+struct devfreq_governor devfreq_powersave = {
+	.get_target_freq = devfreq_powersave_func,
+};
+
+static int devfreq_performance_func(struct devfreq *df,
+				    unsigned long *freq)
+{
+	*freq = UINT_MAX; /* devfreq_do will run "floor" */
+	return 0;
+}
+
+struct devfreq_governor devfreq_performance = {
+	.get_target_freq = devfreq_performance_func,
+};
+
+/* Constants for DevFreq-Simple-Ondemand (DFSO) */
+#define DFSO_UPTHRESHOLD	(90)
+#define DFSO_DOWNDIFFERENCTIAL	(5)
+static int devfreq_simple_ondemand_func(struct devfreq *df,
+					unsigned long *freq)
+{
+	struct devfreq_dev_status stat;
+	int err = df->profile->get_dev_status(df->dev, &stat);
+	unsigned long long a, b;
+
+	if (err)
+		return err;
+
+	/* Set MAX if it's busy enough */
+	if (stat.busy_time * 100 >
+	    stat.total_time * DFSO_UPTHRESHOLD) {
+		*freq = UINT_MAX;
+		return 0;
+	}
+
+	/* Set MAX if we do not know the initial frequency */
+	if (stat.current_frequency == 0) {
+		*freq = UINT_MAX;
+		return 0;
+	}
+
+	/* Keep the current frequency */
+	if (stat.busy_time * 100 >
+	    stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
+		*freq = stat.current_frequency;
+		return 0;
+	}
+
+	/* Set the desired frequency based on the load */
+	a = (unsigned long long) stat.busy_time * stat.current_frequency;
+	b = div_u64(a, stat.total_time);
+	b *= 100;
+	b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
+	*freq = (unsigned long) b;
+
+	return 0;
+}
+
+struct devfreq_governor devfreq_simple_ondemand = {
+	.get_target_freq = devfreq_simple_ondemand_func,
+};
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index d08e9f5..ec41ba6 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -81,6 +81,11 @@ extern int devfreq_add_device(struct device *dev,
 extern int devfreq_remove_device(struct device *dev);
 extern int devfreq_update(struct device *dev, bool may_not_exist);
 extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
+
+extern struct devfreq_governor devfreq_powersave;
+extern struct devfreq_governor devfreq_performance;
+extern struct devfreq_governor devfreq_simple_ondemand;
+
 #else /* !CONFIG_PM_DEVFREQ */
 static int devfreq_add_device(struct device *dev,
 			   struct devfreq_dev_profile *profile,
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 2/3] PM / DEVFREQ: add example governors
@ 2011-05-11  7:58   ` MyungJoo Ham
  0 siblings, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-11  7:58 UTC (permalink / raw)
  To: linux-pm
  Cc: Nishanth Menon, Len Brown, Greg Kroah-Hartman, linux-kernel,
	Kyungmin Park, Thomas Gleixner

Three CPUFREQ-like governors are provided as examples.

powersave: use the lowest frequency possible. The user (device) should
set the polling_ms as 0 because polling is useless for this governor.

performance: use the highest freqeuncy possible. The user (device)
should set the polling_ms as 0 because polling is useless for this
governor.

simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.

When a user updates OPP entries (enable/disable/add), OPP framework
automatically notifies DEVFREQ to update operating frequency
accordingly. Thus, DEVFREQ users (device drivers) do not need to update
DEVFREQ manually with OPP entry updates or set polling_ms for powersave
, performance, or any other "static" governors.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 drivers/base/power/devfreq.c |   69 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/devfreq.h      |    5 +++
 2 files changed, 74 insertions(+), 0 deletions(-)

diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
index 8e2e45b..251d761 100644
--- a/drivers/base/power/devfreq.c
+++ b/drivers/base/power/devfreq.c
@@ -351,3 +351,72 @@ static int __init devfreq_init(void)
 	return 0;
 }
 late_initcall(devfreq_init);
+
+static int devfreq_powersave_func(struct devfreq *df,
+				  unsigned long *freq)
+{
+	*freq = 0; /* devfreq_do will run "ceiling" to 0 */
+	return 0;
+}
+
+struct devfreq_governor devfreq_powersave = {
+	.get_target_freq = devfreq_powersave_func,
+};
+
+static int devfreq_performance_func(struct devfreq *df,
+				    unsigned long *freq)
+{
+	*freq = UINT_MAX; /* devfreq_do will run "floor" */
+	return 0;
+}
+
+struct devfreq_governor devfreq_performance = {
+	.get_target_freq = devfreq_performance_func,
+};
+
+/* Constants for DevFreq-Simple-Ondemand (DFSO) */
+#define DFSO_UPTHRESHOLD	(90)
+#define DFSO_DOWNDIFFERENCTIAL	(5)
+static int devfreq_simple_ondemand_func(struct devfreq *df,
+					unsigned long *freq)
+{
+	struct devfreq_dev_status stat;
+	int err = df->profile->get_dev_status(df->dev, &stat);
+	unsigned long long a, b;
+
+	if (err)
+		return err;
+
+	/* Set MAX if it's busy enough */
+	if (stat.busy_time * 100 >
+	    stat.total_time * DFSO_UPTHRESHOLD) {
+		*freq = UINT_MAX;
+		return 0;
+	}
+
+	/* Set MAX if we do not know the initial frequency */
+	if (stat.current_frequency == 0) {
+		*freq = UINT_MAX;
+		return 0;
+	}
+
+	/* Keep the current frequency */
+	if (stat.busy_time * 100 >
+	    stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
+		*freq = stat.current_frequency;
+		return 0;
+	}
+
+	/* Set the desired frequency based on the load */
+	a = (unsigned long long) stat.busy_time * stat.current_frequency;
+	b = div_u64(a, stat.total_time);
+	b *= 100;
+	b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
+	*freq = (unsigned long) b;
+
+	return 0;
+}
+
+struct devfreq_governor devfreq_simple_ondemand = {
+	.get_target_freq = devfreq_simple_ondemand_func,
+};
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index d08e9f5..ec41ba6 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -81,6 +81,11 @@ extern int devfreq_add_device(struct device *dev,
 extern int devfreq_remove_device(struct device *dev);
 extern int devfreq_update(struct device *dev, bool may_not_exist);
 extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
+
+extern struct devfreq_governor devfreq_powersave;
+extern struct devfreq_governor devfreq_performance;
+extern struct devfreq_governor devfreq_simple_ondemand;
+
 #else /* !CONFIG_PM_DEVFREQ */
 static int devfreq_add_device(struct device *dev,
 			   struct devfreq_dev_profile *profile,
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-05-11  7:58 ` MyungJoo Ham
@ 2011-05-11  7:58   ` MyungJoo Ham
  -1 siblings, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-11  7:58 UTC (permalink / raw)
  To: linux-pm
  Cc: linux-kernel, Rafael J. Wysocki, Greg Kroah-Hartman, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park, myungjoo.ham

1. System-wide sysfs interface
- tickle_all	R: number of tickle_all execution
		W: tickle all devfreq devices
- min_interval	R: devfreq monitoring base interval in ms
- monitoring	R: shows whether devfreq monitoring is active or
  not.

2. Device specific sysfs interface
- tickle	R: number of tickle execution for the device
		W: tickle the device

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 drivers/base/power/devfreq.c |  191 ++++++++++++++++++++++++++++++++++++-----
 include/linux/devfreq.h      |    3 +
 2 files changed, 170 insertions(+), 24 deletions(-)

diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
index 251d761..ba1b606 100644
--- a/drivers/base/power/devfreq.c
+++ b/drivers/base/power/devfreq.c
@@ -40,6 +40,9 @@ static LIST_HEAD(devfreq_list);
 /* Exclusive access to devfreq_list and its elements */
 static DEFINE_MUTEX(devfreq_list_lock);
 
+static struct kobject *devfreq_kobj;
+static struct attribute_group dev_attr_group;
+
 /**
  * find_device_devfreq() - find devfreq struct using device pointer
  * @dev:	device pointer used to lookup device DEVFREQ.
@@ -211,6 +214,8 @@ int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
 		queue_delayed_work(devfreq_wq, &devfreq_work,
 				   msecs_to_jiffies(DEVFREQ_INTERVAL));
 	}
+
+	sysfs_update_group(&dev->kobj, &dev_attr_group);
 out:
 	mutex_unlock(&devfreq_list_lock);
 
@@ -237,6 +242,8 @@ int devfreq_remove_device(struct device *dev)
 		return -EINVAL;
 	}
 
+	sysfs_remove_group(&dev->kobj, &dev_attr_group);
+
 	list_del(&devfreq->node);
 
 	kfree(devfreq);
@@ -286,6 +293,38 @@ out:
 	return err;
 }
 
+int _devfreq_tickle_device(struct devfreq *df, unsigned long delay)
+{
+	int err = 0;
+	unsigned long freq;
+	struct opp *opp;
+
+	freq = df->profile->max_freq;
+	opp = opp_find_freq_floor(df->dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	freq = opp_get_freq(opp);
+	if (df->previous_freq != freq) {
+		err = df->profile->target(df->dev, opp);
+		if (!err)
+			df->previous_freq = freq;
+	}
+	if (err) {
+		dev_err(df->dev, "%s: Cannot set frequency.\n", __func__);
+	} else {
+		df->tickle = delay;
+		df->num_tickle++;
+	}
+
+	if (devfreq_wq && !monitoring) {
+		monitoring = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	}
+
+	return err;
+}
 /**
  * devfreq_tickle_device() - Guarantee maximum operation speed for a while
  *			instaneously.
@@ -301,43 +340,133 @@ out:
 int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
 {
 	struct devfreq *devfreq;
-	struct opp *opp;
-	unsigned long freq;
 	int err = 0;
+	unsigned long delay; /* in num DEVFREQ_INTERVAL */
 
 	mutex_lock(&devfreq_list_lock);
 	devfreq = find_device_devfreq(dev);
-	if (!IS_ERR(devfreq)) {
-		freq = devfreq->profile->max_freq;
-		opp = opp_find_freq_floor(devfreq->dev, &freq);
-		freq = opp_get_freq(opp);
-		if (devfreq->previous_freq != freq) {
-			err = devfreq->profile->target(devfreq->dev, opp);
-			if (!err)
-				devfreq->previous_freq = freq;
-		}
-		if (err)
-			dev_err(dev, "%s: Cannot set frequency.\n", __func__);
-		else
-			devfreq->tickle = DIV_ROUND_UP(duration_ms,
-						       DEVFREQ_INTERVAL);
+	delay = DIV_ROUND_UP(duration_ms, DEVFREQ_INTERVAL);
+
+	if (IS_ERR(devfreq))
+		err = PTR_ERR(devfreq);
+	else
+		err = _devfreq_tickle_device(devfreq, delay);
+
+	mutex_unlock(&devfreq_list_lock);
+
+	return err;
+}
+
+static int num_tickle_all;
+static ssize_t tickle_all(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t count)
+{
+	int duration;
+	struct devfreq *tmp;
+	unsigned long delay;
+
+	sscanf(buf, "%d", &duration);
+	if (duration < DEVFREQ_INTERVAL)
+		duration = DEVFREQ_INTERVAL;
+
+	if (unlikely(IS_ERR_OR_NULL(dev))) {
+		pr_err("%s: Invalid parameters\n", __func__);
+		return -EINVAL;
 	}
 
-	if (devfreq_wq && !monitoring) {
-		monitoring = true;
-		queue_delayed_work(devfreq_wq, &devfreq_work,
-				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
+
+	mutex_lock(&devfreq_list_lock);
+	list_for_each_entry(tmp, &devfreq_list, node) {
+		_devfreq_tickle_device(tmp, delay);
 	}
 	mutex_unlock(&devfreq_list_lock);
 
-	if (IS_ERR(devfreq)) {
-		dev_err(dev, "%s: Cannot find devfreq.\n", __func__);
-		err = PTR_ERR(devfreq);
+	num_tickle_all++;
+	return count;
+}
+
+static ssize_t show_num_tickle_all(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", num_tickle_all);
+}
+
+static ssize_t show_min_interval(struct device *dev,
+				 struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", DEVFREQ_INTERVAL);
+}
+
+static ssize_t show_monitoring(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", monitoring ? 1 : 0);
+}
+
+static DEVICE_ATTR(tickle_all, 0644, show_num_tickle_all, tickle_all);
+static DEVICE_ATTR(min_interval, 0444, show_min_interval, NULL);
+static DEVICE_ATTR(monitoring, 0444, show_monitoring, NULL);
+static struct attribute *devfreq_entries[] = {
+	&dev_attr_tickle_all.attr,
+	&dev_attr_min_interval.attr,
+	&dev_attr_monitoring.attr,
+	NULL,
+};
+static struct attribute_group devfreq_attr_group = {
+	.name	= "devfreq",
+	.attrs	= devfreq_entries,
+};
+
+static ssize_t tickle(struct device *dev, struct device_attribute *attr,
+		      const char *buf, size_t count)
+{
+	int duration;
+	struct devfreq *df;
+	unsigned long delay;
+
+	sscanf(buf, "%d", &duration);
+	if (duration < DEVFREQ_INTERVAL)
+		duration = DEVFREQ_INTERVAL;
+
+	if (unlikely(IS_ERR_OR_NULL(dev))) {
+		pr_err("%s: Invalid parameters\n", __func__);
+		return -EINVAL;
 	}
 
-	return err;
+	delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
+
+	mutex_lock(&devfreq_list_lock);
+	df = find_device_devfreq(dev);
+	_devfreq_tickle_device(df, delay);
+	mutex_unlock(&devfreq_list_lock);
+
+	return count;
 }
 
+static ssize_t show_num_tickle(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df;
+
+	df = find_device_devfreq(dev);
+
+	if (!IS_ERR(df))
+		return sprintf(buf, "%d\n", df->num_tickle);
+
+	return PTR_ERR(df);
+}
+
+static DEVICE_ATTR(tickle, 0644, show_num_tickle, tickle);
+static struct attribute *dev_entries[] = {
+	&dev_attr_tickle.attr,
+	NULL,
+};
+static struct attribute_group dev_attr_group = {
+	.name	= NULL,
+	.attrs	= dev_entries,
+};
+
 static int __init devfreq_init(void)
 {
 	mutex_lock(&devfreq_list_lock);
@@ -345,6 +474,20 @@ static int __init devfreq_init(void)
 	monitoring = false;
 	devfreq_wq = create_freezable_workqueue("devfreq_wq");
 	INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
+
+	/* Create sysfs */
+#ifdef CONFIG_PM
+	devfreq_kobj = kobject_create_and_add("devfreq", power_kobj);
+	if (!devfreq_kobj) {
+		pr_err("Unable to create DEVFREQ kobject.\n");
+		goto out;
+	}
+	if (sysfs_create_group(devfreq_kobj, &devfreq_attr_group)) {
+		pr_err("Unable to create DEVFREQ sysfs entries.\n");
+		goto out;
+	}
+#endif
+out:
 	mutex_unlock(&devfreq_list_lock);
 
 	devfreq_monitor(&devfreq_work.work);
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index ec41ba6..f6e38ee 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -59,6 +59,7 @@ struct devfreq_governor {
  *		at each executino of devfreq_monitor, tickle is decremented.
  *		User may tickle a device-devfreq in order to set maximum
  *		frequency instaneously with some guaranteed duration.
+ * @num_tickle	number of tickle calls.
  *
  * This structure stores the DEVFREQ information for a give device.
  */
@@ -72,6 +73,8 @@ struct devfreq {
 	unsigned long previous_freq;
 	unsigned int next_polling;
 	unsigned int tickle;
+
+	unsigned int num_tickle;
 };
 
 #if defined(CONFIG_PM_DEVFREQ)
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
@ 2011-05-11  7:58   ` MyungJoo Ham
  0 siblings, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-11  7:58 UTC (permalink / raw)
  To: linux-pm
  Cc: Nishanth Menon, Len Brown, Greg Kroah-Hartman, linux-kernel,
	Kyungmin Park, Thomas Gleixner

1. System-wide sysfs interface
- tickle_all	R: number of tickle_all execution
		W: tickle all devfreq devices
- min_interval	R: devfreq monitoring base interval in ms
- monitoring	R: shows whether devfreq monitoring is active or
  not.

2. Device specific sysfs interface
- tickle	R: number of tickle execution for the device
		W: tickle the device

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 drivers/base/power/devfreq.c |  191 ++++++++++++++++++++++++++++++++++++-----
 include/linux/devfreq.h      |    3 +
 2 files changed, 170 insertions(+), 24 deletions(-)

diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
index 251d761..ba1b606 100644
--- a/drivers/base/power/devfreq.c
+++ b/drivers/base/power/devfreq.c
@@ -40,6 +40,9 @@ static LIST_HEAD(devfreq_list);
 /* Exclusive access to devfreq_list and its elements */
 static DEFINE_MUTEX(devfreq_list_lock);
 
+static struct kobject *devfreq_kobj;
+static struct attribute_group dev_attr_group;
+
 /**
  * find_device_devfreq() - find devfreq struct using device pointer
  * @dev:	device pointer used to lookup device DEVFREQ.
@@ -211,6 +214,8 @@ int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
 		queue_delayed_work(devfreq_wq, &devfreq_work,
 				   msecs_to_jiffies(DEVFREQ_INTERVAL));
 	}
+
+	sysfs_update_group(&dev->kobj, &dev_attr_group);
 out:
 	mutex_unlock(&devfreq_list_lock);
 
@@ -237,6 +242,8 @@ int devfreq_remove_device(struct device *dev)
 		return -EINVAL;
 	}
 
+	sysfs_remove_group(&dev->kobj, &dev_attr_group);
+
 	list_del(&devfreq->node);
 
 	kfree(devfreq);
@@ -286,6 +293,38 @@ out:
 	return err;
 }
 
+int _devfreq_tickle_device(struct devfreq *df, unsigned long delay)
+{
+	int err = 0;
+	unsigned long freq;
+	struct opp *opp;
+
+	freq = df->profile->max_freq;
+	opp = opp_find_freq_floor(df->dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	freq = opp_get_freq(opp);
+	if (df->previous_freq != freq) {
+		err = df->profile->target(df->dev, opp);
+		if (!err)
+			df->previous_freq = freq;
+	}
+	if (err) {
+		dev_err(df->dev, "%s: Cannot set frequency.\n", __func__);
+	} else {
+		df->tickle = delay;
+		df->num_tickle++;
+	}
+
+	if (devfreq_wq && !monitoring) {
+		monitoring = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	}
+
+	return err;
+}
 /**
  * devfreq_tickle_device() - Guarantee maximum operation speed for a while
  *			instaneously.
@@ -301,43 +340,133 @@ out:
 int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
 {
 	struct devfreq *devfreq;
-	struct opp *opp;
-	unsigned long freq;
 	int err = 0;
+	unsigned long delay; /* in num DEVFREQ_INTERVAL */
 
 	mutex_lock(&devfreq_list_lock);
 	devfreq = find_device_devfreq(dev);
-	if (!IS_ERR(devfreq)) {
-		freq = devfreq->profile->max_freq;
-		opp = opp_find_freq_floor(devfreq->dev, &freq);
-		freq = opp_get_freq(opp);
-		if (devfreq->previous_freq != freq) {
-			err = devfreq->profile->target(devfreq->dev, opp);
-			if (!err)
-				devfreq->previous_freq = freq;
-		}
-		if (err)
-			dev_err(dev, "%s: Cannot set frequency.\n", __func__);
-		else
-			devfreq->tickle = DIV_ROUND_UP(duration_ms,
-						       DEVFREQ_INTERVAL);
+	delay = DIV_ROUND_UP(duration_ms, DEVFREQ_INTERVAL);
+
+	if (IS_ERR(devfreq))
+		err = PTR_ERR(devfreq);
+	else
+		err = _devfreq_tickle_device(devfreq, delay);
+
+	mutex_unlock(&devfreq_list_lock);
+
+	return err;
+}
+
+static int num_tickle_all;
+static ssize_t tickle_all(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t count)
+{
+	int duration;
+	struct devfreq *tmp;
+	unsigned long delay;
+
+	sscanf(buf, "%d", &duration);
+	if (duration < DEVFREQ_INTERVAL)
+		duration = DEVFREQ_INTERVAL;
+
+	if (unlikely(IS_ERR_OR_NULL(dev))) {
+		pr_err("%s: Invalid parameters\n", __func__);
+		return -EINVAL;
 	}
 
-	if (devfreq_wq && !monitoring) {
-		monitoring = true;
-		queue_delayed_work(devfreq_wq, &devfreq_work,
-				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
+
+	mutex_lock(&devfreq_list_lock);
+	list_for_each_entry(tmp, &devfreq_list, node) {
+		_devfreq_tickle_device(tmp, delay);
 	}
 	mutex_unlock(&devfreq_list_lock);
 
-	if (IS_ERR(devfreq)) {
-		dev_err(dev, "%s: Cannot find devfreq.\n", __func__);
-		err = PTR_ERR(devfreq);
+	num_tickle_all++;
+	return count;
+}
+
+static ssize_t show_num_tickle_all(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", num_tickle_all);
+}
+
+static ssize_t show_min_interval(struct device *dev,
+				 struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", DEVFREQ_INTERVAL);
+}
+
+static ssize_t show_monitoring(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", monitoring ? 1 : 0);
+}
+
+static DEVICE_ATTR(tickle_all, 0644, show_num_tickle_all, tickle_all);
+static DEVICE_ATTR(min_interval, 0444, show_min_interval, NULL);
+static DEVICE_ATTR(monitoring, 0444, show_monitoring, NULL);
+static struct attribute *devfreq_entries[] = {
+	&dev_attr_tickle_all.attr,
+	&dev_attr_min_interval.attr,
+	&dev_attr_monitoring.attr,
+	NULL,
+};
+static struct attribute_group devfreq_attr_group = {
+	.name	= "devfreq",
+	.attrs	= devfreq_entries,
+};
+
+static ssize_t tickle(struct device *dev, struct device_attribute *attr,
+		      const char *buf, size_t count)
+{
+	int duration;
+	struct devfreq *df;
+	unsigned long delay;
+
+	sscanf(buf, "%d", &duration);
+	if (duration < DEVFREQ_INTERVAL)
+		duration = DEVFREQ_INTERVAL;
+
+	if (unlikely(IS_ERR_OR_NULL(dev))) {
+		pr_err("%s: Invalid parameters\n", __func__);
+		return -EINVAL;
 	}
 
-	return err;
+	delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
+
+	mutex_lock(&devfreq_list_lock);
+	df = find_device_devfreq(dev);
+	_devfreq_tickle_device(df, delay);
+	mutex_unlock(&devfreq_list_lock);
+
+	return count;
 }
 
+static ssize_t show_num_tickle(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df;
+
+	df = find_device_devfreq(dev);
+
+	if (!IS_ERR(df))
+		return sprintf(buf, "%d\n", df->num_tickle);
+
+	return PTR_ERR(df);
+}
+
+static DEVICE_ATTR(tickle, 0644, show_num_tickle, tickle);
+static struct attribute *dev_entries[] = {
+	&dev_attr_tickle.attr,
+	NULL,
+};
+static struct attribute_group dev_attr_group = {
+	.name	= NULL,
+	.attrs	= dev_entries,
+};
+
 static int __init devfreq_init(void)
 {
 	mutex_lock(&devfreq_list_lock);
@@ -345,6 +474,20 @@ static int __init devfreq_init(void)
 	monitoring = false;
 	devfreq_wq = create_freezable_workqueue("devfreq_wq");
 	INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
+
+	/* Create sysfs */
+#ifdef CONFIG_PM
+	devfreq_kobj = kobject_create_and_add("devfreq", power_kobj);
+	if (!devfreq_kobj) {
+		pr_err("Unable to create DEVFREQ kobject.\n");
+		goto out;
+	}
+	if (sysfs_create_group(devfreq_kobj, &devfreq_attr_group)) {
+		pr_err("Unable to create DEVFREQ sysfs entries.\n");
+		goto out;
+	}
+#endif
+out:
 	mutex_unlock(&devfreq_list_lock);
 
 	devfreq_monitor(&devfreq_work.work);
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index ec41ba6..f6e38ee 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -59,6 +59,7 @@ struct devfreq_governor {
  *		at each executino of devfreq_monitor, tickle is decremented.
  *		User may tickle a device-devfreq in order to set maximum
  *		frequency instaneously with some guaranteed duration.
+ * @num_tickle	number of tickle calls.
  *
  * This structure stores the DEVFREQ information for a give device.
  */
@@ -72,6 +73,8 @@ struct devfreq {
 	unsigned long previous_freq;
 	unsigned int next_polling;
 	unsigned int tickle;
+
+	unsigned int num_tickle;
 };
 
 #if defined(CONFIG_PM_DEVFREQ)
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-05-11  7:58   ` MyungJoo Ham
  (?)
  (?)
@ 2011-05-11 22:55   ` Greg KH
  2011-05-17  5:04     ` MyungJoo Ham
  2011-05-17  5:04     ` MyungJoo Ham
  -1 siblings, 2 replies; 36+ messages in thread
From: Greg KH @ 2011-05-11 22:55 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: linux-pm, linux-kernel, Rafael J. Wysocki, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park, myungjoo.ham

On Wed, May 11, 2011 at 04:58:43PM +0900, MyungJoo Ham wrote:
> 1. System-wide sysfs interface
> - tickle_all	R: number of tickle_all execution
> 		W: tickle all devfreq devices
> - min_interval	R: devfreq monitoring base interval in ms
> - monitoring	R: shows whether devfreq monitoring is active or
>   not.
> 
> 2. Device specific sysfs interface
> - tickle	R: number of tickle execution for the device
> 		W: tickle the device

Any sysfs file change/addition/removal needs to have a
Documentation/ABI/ entry as well.  Please add that to this patch instead
of burying it in the changelog entry, where no one will be able to find
it in the future.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-05-11  7:58   ` MyungJoo Ham
  (?)
@ 2011-05-11 22:55   ` Greg KH
  -1 siblings, 0 replies; 36+ messages in thread
From: Greg KH @ 2011-05-11 22:55 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, linux-kernel, Kyungmin Park, linux-pm, Thomas Gleixner

On Wed, May 11, 2011 at 04:58:43PM +0900, MyungJoo Ham wrote:
> 1. System-wide sysfs interface
> - tickle_all	R: number of tickle_all execution
> 		W: tickle all devfreq devices
> - min_interval	R: devfreq monitoring base interval in ms
> - monitoring	R: shows whether devfreq monitoring is active or
>   not.
> 
> 2. Device specific sysfs interface
> - tickle	R: number of tickle execution for the device
> 		W: tickle the device

Any sysfs file change/addition/removal needs to have a
Documentation/ABI/ entry as well.  Please add that to this patch instead
of burying it in the changelog entry, where no one will be able to find
it in the future.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-05-11 22:55   ` Greg KH
  2011-05-17  5:04     ` MyungJoo Ham
@ 2011-05-17  5:04     ` MyungJoo Ham
  2011-05-17 18:32       ` Greg KH
                         ` (3 more replies)
  1 sibling, 4 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-17  5:04 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-pm, linux-kernel, Rafael J. Wysocki, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park

On Thu, May 12, 2011 at 7:55 AM, Greg KH <gregkh@suse.de> wrote:
> On Wed, May 11, 2011 at 04:58:43PM +0900, MyungJoo Ham wrote:
>> 1. System-wide sysfs interface
>> - tickle_all  R: number of tickle_all execution
>>               W: tickle all devfreq devices
>> - min_interval        R: devfreq monitoring base interval in ms
>> - monitoring  R: shows whether devfreq monitoring is active or
>>   not.
>>
>> 2. Device specific sysfs interface
>> - tickle      R: number of tickle execution for the device
>>               W: tickle the device
>
> Any sysfs file change/addition/removal needs to have a
> Documentation/ABI/ entry as well.  Please add that to this patch instead
> of burying it in the changelog entry, where no one will be able to find
> it in the future.
>
> thanks,
>
> greg k-h
>

Sure, I'll add Documentation/ABI/testing/* entries.

However, would it be appropriate for "1. System-wide sysfs interface"
to be in "sysfs-class-power" and for "2. Device specific sysfs
interface" to be in "sysfs-devices-devfreq"?

System-wide sysfs interface is in /sys/class/power/devfreq/* and
Device specific sysfs interface is in /sys/...DEVICE.../devfreq/* .

Thank you.


- MyungJoo
-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-05-11 22:55   ` Greg KH
@ 2011-05-17  5:04     ` MyungJoo Ham
  2011-05-17  5:04     ` MyungJoo Ham
  1 sibling, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-17  5:04 UTC (permalink / raw)
  To: Greg KH; +Cc: Len Brown, linux-kernel, Kyungmin Park, linux-pm, Thomas Gleixner

On Thu, May 12, 2011 at 7:55 AM, Greg KH <gregkh@suse.de> wrote:
> On Wed, May 11, 2011 at 04:58:43PM +0900, MyungJoo Ham wrote:
>> 1. System-wide sysfs interface
>> - tickle_all  R: number of tickle_all execution
>>               W: tickle all devfreq devices
>> - min_interval        R: devfreq monitoring base interval in ms
>> - monitoring  R: shows whether devfreq monitoring is active or
>>   not.
>>
>> 2. Device specific sysfs interface
>> - tickle      R: number of tickle execution for the device
>>               W: tickle the device
>
> Any sysfs file change/addition/removal needs to have a
> Documentation/ABI/ entry as well.  Please add that to this patch instead
> of burying it in the changelog entry, where no one will be able to find
> it in the future.
>
> thanks,
>
> greg k-h
>

Sure, I'll add Documentation/ABI/testing/* entries.

However, would it be appropriate for "1. System-wide sysfs interface"
to be in "sysfs-class-power" and for "2. Device specific sysfs
interface" to be in "sysfs-devices-devfreq"?

System-wide sysfs interface is in /sys/class/power/devfreq/* and
Device specific sysfs interface is in /sys/...DEVICE.../devfreq/* .

Thank you.


- MyungJoo
-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-05-17  5:04     ` MyungJoo Ham
@ 2011-05-17 18:32       ` Greg KH
  2011-05-17 18:32       ` Greg KH
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Greg KH @ 2011-05-17 18:32 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: linux-pm, linux-kernel, Rafael J. Wysocki, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park

On Tue, May 17, 2011 at 02:04:52PM +0900, MyungJoo Ham wrote:
> On Thu, May 12, 2011 at 7:55 AM, Greg KH <gregkh@suse.de> wrote:
> > On Wed, May 11, 2011 at 04:58:43PM +0900, MyungJoo Ham wrote:
> >> 1. System-wide sysfs interface
> >> - tickle_all  R: number of tickle_all execution
> >>               W: tickle all devfreq devices
> >> - min_interval        R: devfreq monitoring base interval in ms
> >> - monitoring  R: shows whether devfreq monitoring is active or
> >>   not.
> >>
> >> 2. Device specific sysfs interface
> >> - tickle      R: number of tickle execution for the device
> >>               W: tickle the device
> >
> > Any sysfs file change/addition/removal needs to have a
> > Documentation/ABI/ entry as well.  Please add that to this patch instead
> > of burying it in the changelog entry, where no one will be able to find
> > it in the future.
> >
> > thanks,
> >
> > greg k-h
> >
> 
> Sure, I'll add Documentation/ABI/testing/* entries.
> 
> However, would it be appropriate for "1. System-wide sysfs interface"
> to be in "sysfs-class-power" and for "2. Device specific sysfs
> interface" to be in "sysfs-devices-devfreq"?
> 
> System-wide sysfs interface is in /sys/class/power/devfreq/* and
> Device specific sysfs interface is in /sys/...DEVICE.../devfreq/* .

That makes sense.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-05-17  5:04     ` MyungJoo Ham
  2011-05-17 18:32       ` Greg KH
@ 2011-05-17 18:32       ` Greg KH
  2011-05-17 22:41       ` Rafael J. Wysocki
  2011-05-17 22:41       ` Rafael J. Wysocki
  3 siblings, 0 replies; 36+ messages in thread
From: Greg KH @ 2011-05-17 18:32 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: Len Brown, linux-kernel, Kyungmin Park, linux-pm, Thomas Gleixner

On Tue, May 17, 2011 at 02:04:52PM +0900, MyungJoo Ham wrote:
> On Thu, May 12, 2011 at 7:55 AM, Greg KH <gregkh@suse.de> wrote:
> > On Wed, May 11, 2011 at 04:58:43PM +0900, MyungJoo Ham wrote:
> >> 1. System-wide sysfs interface
> >> - tickle_all  R: number of tickle_all execution
> >>               W: tickle all devfreq devices
> >> - min_interval        R: devfreq monitoring base interval in ms
> >> - monitoring  R: shows whether devfreq monitoring is active or
> >>   not.
> >>
> >> 2. Device specific sysfs interface
> >> - tickle      R: number of tickle execution for the device
> >>               W: tickle the device
> >
> > Any sysfs file change/addition/removal needs to have a
> > Documentation/ABI/ entry as well.  Please add that to this patch instead
> > of burying it in the changelog entry, where no one will be able to find
> > it in the future.
> >
> > thanks,
> >
> > greg k-h
> >
> 
> Sure, I'll add Documentation/ABI/testing/* entries.
> 
> However, would it be appropriate for "1. System-wide sysfs interface"
> to be in "sysfs-class-power" and for "2. Device specific sysfs
> interface" to be in "sysfs-devices-devfreq"?
> 
> System-wide sysfs interface is in /sys/class/power/devfreq/* and
> Device specific sysfs interface is in /sys/...DEVICE.../devfreq/* .

That makes sense.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-11  7:58 ` MyungJoo Ham
                   ` (2 preceding siblings ...)
  (?)
@ 2011-05-17 22:36 ` Rafael J. Wysocki
  2011-05-18  8:22   ` MyungJoo Ham
  2011-05-18  8:22   ` MyungJoo Ham
  -1 siblings, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-17 22:36 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: linux-pm, linux-kernel, Greg Kroah-Hartman, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park, myungjoo.ham

Hi,

On Wednesday, May 11, 2011, MyungJoo Ham wrote:
> With OPPs, a device may have multiple operable frequency and voltage
> sets. However, there can be multiple possible operable sets and a system
> will need to choose one from them. In order to reduce the power
> consumption (by reducing frequency and voltage) without affecting the
> performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
> scheme may be used.
> 
> This patch introduces the DVFS capability to non-CPU devices with OPPs.
> DVFS is a techique whereby the frequency and supplied voltage of a
> device is adjusted on-the-fly. DVFS usually sets the frequency as low
> as possible with given conditions (such as QoS assurance) and adjusts
> voltage according to the chosen frequency in order to reduce power
> consumption and heat dissipation.
> 
> The generic DVFS for devices, DEVFREQ, may appear quite similar with
> /drivers/cpufreq.  However, CPUFREQ does not allow to have multiple
> devices registered and is not suitable to have multiple heterogenous
> devices with different (but simple) governors.
> 
> Normally, DVFS mechanism controls frequency based on the demand for
> the device, and then, chooses voltage based on the chosen frequency.
> DEVFREQ also controls the frequency based on the governor's frequency
> recommendation and let OPP pick up the pair of frequency and voltage
> based on the recommended frequency. Then, the chosen OPP is passed to
> device driver's "target" callback.
> 
> Tested with Exynos4-NURI board.
> 
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> 
> --
> Thank you for your valuable comments, Rafael, Greg, Pavel, and Colin.
> 
> Changes from v1(RFC)
> - Rename: DVFS --> DEVFREQ
> - Revised governor design
>     . Governor receives the whole struct devfreq
>     . Governor should gather usage information (thru get_dev_status)
> itself
> - Periodic monitoring runs only when needed.
> - DEVFREQ no more deals with voltage information directly
> - Removed some printks.
> - Some cosmetics update
> - Use freezable_wq.
> ---
>  drivers/base/power/Makefile  |    1 +
>  drivers/base/power/devfreq.c |  353 ++++++++++++++++++++++++++++++++++++++++++
>  drivers/base/power/opp.c     |    6 +
>  include/linux/devfreq.h      |  108 +++++++++++++
>  kernel/power/Kconfig         |   25 +++
>  5 files changed, 493 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/power/devfreq.c
>  create mode 100644 include/linux/devfreq.h
> 
> diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
> index 118c1b9..d7f0ad7 100644
> --- a/drivers/base/power/Makefile
> +++ b/drivers/base/power/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_PM_SLEEP)	+= main.o wakeup.o
>  obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
>  obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
>  obj-$(CONFIG_PM_OPP)	+= opp.o
> +obj-$(CONFIG_PM_DEVFREQ)	+= devfreq.o
>  
>  ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
>  ccflags-$(CONFIG_PM_VERBOSE)   += -DDEBUG
> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
> new file mode 100644
> index 0000000..8e2e45b
> --- /dev/null
> +++ b/drivers/base/power/devfreq.c
> @@ -0,0 +1,353 @@
> +/*
> + * DEVFREQ: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
> + *	    for Non-CPU Devices Based on OPP.
> + *
> + * Copyright (C) 2011 Samsung Electronics
> + *	MyungJoo Ham <myungjoo.ham@samsung.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/opp.h>
> +#include <linux/devfreq.h>
> +#include <linux/workqueue.h>
> +#include <linux/platform_device.h>
> +#include <linux/list.h>
> +#include <linux/printk.h>
> +
> +/*
> + * DEVFREQ Monitoring Interval in ms.
> + * It is recommended to be "jiffy_in_ms" * n, where n is an integer >= 1.
> + */
> +#define DEVFREQ_INTERVAL	20
> +
> +/*
> + * devfreq_work periodically (given by DEVFREQ_INTERVAL) monitors every
> + * registered device.
> + */

I'm not sure what that means.  Does it happen only if monitoring is 'true'?
If so, perhaps it could be called 'polling'?

> +static bool monitoring;
> +static struct workqueue_struct *devfreq_wq;
> +static struct delayed_work devfreq_work;
> +/* The list of all device-devfreq */
> +static LIST_HEAD(devfreq_list);
> +/* Exclusive access to devfreq_list and its elements */
> +static DEFINE_MUTEX(devfreq_list_lock);
> +
> +/**
> + * find_device_devfreq() - find devfreq struct using device pointer
> + * @dev:	device pointer used to lookup device DEVFREQ.
> + *
> + * Search the list of device DEVFREQs and return the matched device's
> + * DEVFREQ info.
> + */
> +static struct devfreq *find_device_devfreq(struct device *dev)
> +{
> +	struct devfreq *tmp_devfreq, *devfreq = ERR_PTR(-ENODEV);
> +
> +	if (unlikely(IS_ERR_OR_NULL(dev))) {
> +		pr_err("%s: Invalid parameters\n", __func__);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	list_for_each_entry(tmp_devfreq, &devfreq_list, node) {
> +		if (tmp_devfreq->dev == dev) {
> +			devfreq = tmp_devfreq;

Well, it looks like you could simply do "return tmp_devfreq" here
(then you'd only need one local pointer).

> +			break;
> +		}
> +	}
> +
> +	return devfreq;

And return ERR_PTR(-ENODEV) here.

> +}
> +
> +#define dev_dbg_once(dev, fmt, ...)				\
> +	if (!once) {						\
> +		once = 1;					\
> +		dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);	\
> +	}

Why do you need this?

> +/**
> + * devfreq_do() - Check the usage profile of a given device and configure
> + *		frequency and voltage accordingly
> + * @devfreq:	DEVFREQ info of the given device
> + */
> +static int devfreq_do(struct devfreq *devfreq)
> +{
> +	struct opp *opp;
> +	unsigned long freq;
> +	int err;
> +	static int once;

No, please.  Either make it a part of the macro, or remove the macro
entirely.

> +
> +	err = devfreq->governor->get_target_freq(devfreq, &freq);
> +	if (err) {
> +		dev_dbg_once(devfreq->dev, "%s: get_target_freq error(%d)\n",
> +			     __func__, err);
> +		return err;
> +	}
> +
> +	opp = opp_find_freq_ceil(devfreq->dev, &freq);
> +	if (opp == ERR_PTR(-ENODEV))
> +		opp = opp_find_freq_floor(devfreq->dev, &freq);
> +
> +	if (IS_ERR(opp)) {
> +		dev_dbg_once(devfreq->dev, "%s: Cannot find opp with %luHz.\n",
> +			     __func__, freq);
> +		return PTR_ERR(opp);
> +	}
> +
> +	freq = opp_get_freq(opp);
> +	if (devfreq->previous_freq != freq) {
> +		err = devfreq->profile->target(devfreq->dev, opp);
> +		if (!err)
> +			devfreq->previous_freq = freq;

I'd do

    if (devfreq->previous_freq == freq)
        return 0;

    err = devfreq->profile->target(devfreq->dev, opp);
    if (err) {
       print message or something;
       return err;
    }

    devfreq->previous_freq = freq;
    return 0;

> +	}
> +
> +	if (err)
> +		dev_dbg_once(devfreq->dev, "%s: Cannot set %luHz/%luuV\n",
> +			     __func__, opp_get_freq(opp), opp_get_voltage(opp));
> +	return err;
> +}
> +
> +/**
> + * devfreq_monitor() - Regularly run devfreq_do() and support device DEVFREQ tickle.
> + * @work: the work struct used to run devfreq_monitor periodically.
> + */
> +static void devfreq_monitor(struct work_struct *work)
> +{
> +	struct devfreq *devfreq;
> +	int error;
> +	int reserved = 0;
> +	static int once;
> +
> +	mutex_lock(&devfreq_list_lock);
> +
> +	list_for_each_entry(devfreq, &devfreq_list, node) {
> +		if (devfreq->next_polling == 0)
> +			continue;
> +
> +		reserved++;

Why is the variable called 'reserved'?

> +
> +		if (devfreq->tickle) {
> +			devfreq->tickle--;
> +			continue;
> +		}

I'd put a coment above that explaining what's going on here.

> +		if (devfreq->next_polling == 1) {
> +			error = devfreq_do(devfreq);
> +			if (error && !once) {
> +				once = 1;
> +				dev_err(devfreq->dev, "devfreq_do error(%d)\n",
> +					error);

Hmm.  I'm not sure, but wouldn't it make sense to drop the device from
the list if devfreq_do() returns error code?

> +			}
> +			devfreq->next_polling = DIV_ROUND_UP(
> +						devfreq->profile->polling_ms,
> +						DEVFREQ_INTERVAL);
> +		} else {
> +			devfreq->next_polling--;
> +		}
> +	}
> +
> +	if (reserved) {
> +		monitoring = true;
> +		queue_delayed_work(devfreq_wq, &devfreq_work,
> +				   msecs_to_jiffies(DEVFREQ_INTERVAL));
> +	} else {
> +		monitoring = false;
> +	}
> +
> +	mutex_unlock(&devfreq_list_lock);
> +}
> +
> +/**
> + * devfreq_add_device() - Add devfreq feature to the device
> + * @dev:	the device to add devfreq feature.
> + * @profile:	device-specific profile to run devfreq.
> + * @governor:	the policy to choose frequency.
> + */
> +int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
> +		       struct devfreq_governor *governor)
> +{
> +	struct devfreq *new_devfreq, *devfreq;
> +	int err = 0;
> +
> +	if (!dev || !profile || !governor) {
> +		dev_err(dev, "%s: Invalid parameters.\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&devfreq_list_lock);
> +
> +	devfreq = find_device_devfreq(dev);
> +	if (!IS_ERR(devfreq)) {
> +		dev_err(dev, "%s: Unable to create DEVFREQ for the device. "
> +			"It already has one.\n", __func__);
> +		err = -EINVAL;
> +		goto out;
> +	}
> +
> +	new_devfreq = kzalloc(sizeof(struct devfreq), GFP_KERNEL);
> +	if (!new_devfreq) {
> +		dev_err(dev, "%s: Unable to create DEVFREQ for the device\n",
> +			__func__);
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +
> +	new_devfreq->dev = dev;
> +	new_devfreq->profile = profile;
> +	new_devfreq->governor = governor;
> +	new_devfreq->next_polling = DIV_ROUND_UP(profile->polling_ms,
> +						 DEVFREQ_INTERVAL);
> +	new_devfreq->previous_freq = profile->initial_freq;
> +
> +	list_add(&new_devfreq->node, &devfreq_list);
> +
> +	if (devfreq_wq && new_devfreq->next_polling && !monitoring) {
> +		monitoring = true;
> +		queue_delayed_work(devfreq_wq, &devfreq_work,
> +				   msecs_to_jiffies(DEVFREQ_INTERVAL));
> +	}
> +out:
> +	mutex_unlock(&devfreq_list_lock);
> +
> +	return err;
> +}
> +
> +/**
> + * devfreq_remove_device() - Remove DEVFREQ feature from a device.
> + * @device:	the device to remove devfreq feature.
> + */
> +int devfreq_remove_device(struct device *dev)
> +{
> +	struct devfreq *devfreq;
> +
> +	if (!dev)
> +		return -EINVAL;
> +
> +	mutex_lock(&devfreq_list_lock);
> +	devfreq = find_device_devfreq(dev);
> +	if (IS_ERR(devfreq)) {
> +		dev_err(dev, "%s: Unable to find DEVFREQ entry for the device.\n",
> +			__func__);
> +		mutex_unlock(&devfreq_list_lock);
> +		return -EINVAL;
> +	}
> +
> +	list_del(&devfreq->node);
> +
> +	kfree(devfreq);
> +
> +	mutex_unlock(&devfreq_list_lock);
> +
> +	return 0;
> +}
> +
> +/**
> + * devfreq_update() - Notify that the device OPP has been changed.
> + * @dev:	the device whose OPP has been changed.
> + * @may_not_exist:	do not print error message even if the device
> + *			does not have devfreq entry.
> + */
> +int devfreq_update(struct device *dev, bool may_not_exist)
> +{
> +	struct devfreq *devfreq;
> +	int err = 0;
> +
> +	mutex_lock(&devfreq_list_lock);
> +
> +	devfreq = find_device_devfreq(dev);
> +	if (IS_ERR(devfreq)) {
> +		if (may_not_exist && PTR_ERR(devfreq) == -EINVAL)
> +			goto out;
> +

This is kind of strange.  Why don't you simply pass -ENODEV to the caller
so that it can decide?

> +		err = PTR_ERR(devfreq);
> +		goto out;
> +	}
> +
> +	if (devfreq->tickle) {
> +		unsigned long freq = devfreq->profile->max_freq;
> +		struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
> +
> +		if (!IS_ERR(opp) && devfreq->previous_freq != freq) {
> +			err = devfreq->profile->target(devfreq->dev, opp);
> +			if (!err)
> +				devfreq->previous_freq = opp_get_freq(opp);
> +		}

This looks like a code duplication from devfreq_do().  Would it make sense
to put it into a separate function?

> +	} else {
> +		err = devfreq_do(devfreq);
> +	}
> +
> +out:
> +	mutex_unlock(&devfreq_list_lock);
> +	return err;
> +}
> +
> +/**
> + * devfreq_tickle_device() - Guarantee maximum operation speed for a while
> + *			instaneously.
> + * @dev:	the device to be tickled.
> + * @duration_ms:	the duration of tickle effect.
> + *
> + * Tickle sets the device at the maximum frequency instaneously and
> + * the maximum frequency is guaranteed to be used for the given duration.
> + * For faster user reponse time, an input event may tickle a related device
> + * so that the input event does not need to wait for the DEVFREQ to react with
> + * normal interval.
> + */
> +int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
> +{
> +	struct devfreq *devfreq;
> +	struct opp *opp;
> +	unsigned long freq;
> +	int err = 0;
> +
> +	mutex_lock(&devfreq_list_lock);
> +	devfreq = find_device_devfreq(dev);

I think we should return here if the device hasn't been found.
Do you want to set 'monitoring' unconditionally and schedule the work item
unconditionally in this function and if so, why not to do that at the
beginning?

> +	if (!IS_ERR(devfreq)) {
> +		freq = devfreq->profile->max_freq;
> +		opp = opp_find_freq_floor(devfreq->dev, &freq);
> +		freq = opp_get_freq(opp);
> +		if (devfreq->previous_freq != freq) {
> +			err = devfreq->profile->target(devfreq->dev, opp);
> +			if (!err)
> +				devfreq->previous_freq = freq;
> +		}
> +		if (err)
> +			dev_err(dev, "%s: Cannot set frequency.\n", __func__);
> +		else
> +			devfreq->tickle = DIV_ROUND_UP(duration_ms,
> +						       DEVFREQ_INTERVAL);
> +	}
> +
> +	if (devfreq_wq && !monitoring) {
> +		monitoring = true;
> +		queue_delayed_work(devfreq_wq, &devfreq_work,
> +				   msecs_to_jiffies(DEVFREQ_INTERVAL));
> +	}
> +	mutex_unlock(&devfreq_list_lock);
> +
> +	if (IS_ERR(devfreq)) {
> +		dev_err(dev, "%s: Cannot find devfreq.\n", __func__);
> +		err = PTR_ERR(devfreq);
> +	}
> +
> +	return err;
> +}
> +
> +static int __init devfreq_init(void)
> +{
> +	mutex_lock(&devfreq_list_lock);
> +
> +	monitoring = false;
> +	devfreq_wq = create_freezable_workqueue("devfreq_wq");
> +	INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
> +	mutex_unlock(&devfreq_list_lock);
> +
> +	devfreq_monitor(&devfreq_work.work);
> +	return 0;
> +}
> +late_initcall(devfreq_init);
> diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
> index 56a6899..4b6b995 100644
> --- a/drivers/base/power/opp.c
> +++ b/drivers/base/power/opp.c
> @@ -21,6 +21,7 @@
>  #include <linux/rculist.h>
>  #include <linux/rcupdate.h>
>  #include <linux/opp.h>
> +#include <linux/devfreq.h>
>  
>  /*
>   * Internal data structure organization with the OPP layer library is as
> @@ -428,6 +429,8 @@ int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
>  	list_add_rcu(&new_opp->node, head);
>  	mutex_unlock(&dev_opp_list_lock);
>  
> +	/* Notify generic dvfs for the change */
> +	devfreq_update(dev, true);
>  	return 0;
>  }
>  
> @@ -512,6 +515,9 @@ unlock:
>  	mutex_unlock(&dev_opp_list_lock);
>  out:
>  	kfree(new_opp);
> +
> +	/* Notify generic dvfs for the change */
> +	devfreq_update(dev, true);
>  	return r;
>  }
>  
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> new file mode 100644
> index 0000000..d08e9f5
> --- /dev/null
> +++ b/include/linux/devfreq.h
> @@ -0,0 +1,108 @@
> +/*
> + * DEVFREQ: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
> + *	    for Non-CPU Devices Based on OPP.
> + *
> + * Copyright (C) 2011 Samsung Electronics
> + *	MyungJoo Ham <myungjoo.ham@samsung.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef __LINUX_DEVFREQ_H__
> +#define __LINUX_DEVFREQ_H__
> +
> +struct devfreq;
> +struct devfreq_dev_status {
> +	/* both since the last measure */
> +	unsigned long total_time;
> +	unsigned long busy_time;
> +	unsigned long current_frequency;
> +};
> +
> +struct devfreq_dev_profile {
> +	unsigned long max_freq; /* may be larger than the actual value */
> +	unsigned long initial_freq;
> +	int polling_ms;	/* 0 for at opp change only */
> +
> +	int (*target)(struct device *dev, struct opp *opp);
> +	int (*get_dev_status)(struct device *dev,
> +			      struct devfreq_dev_status *stat);
> +};
> +
> +/**
> + * struct devfreq_governor - DEVFREQ Policy Governor
> + * @data	Governor's internal data. The framework does not care of it.
> + * @get_target_freq	Returns desired operating frequency for the device.
> + *			Basically, get_target_freq will run
> + *			devfreq_dev_profile.get_dev_status() to get the
> + *			status of the device (load = busy_time / total_time).
> + */
> +struct devfreq_governor {
> +	void *data; /* private data for get_target_freq */
> +	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
> +};
> +
> +/**
> + * struct devfreq - Device DEVFREQ structure
> + * @node	list node - contains the devices with DEVFREQ that have been
> + *		registered.
> + * @dev		device pointer
> + * @profile	device-specific devfreq profile
> + * @governor	method how to choose frequency based on the usage.
> + * @previous_freq	previously configured frequency value.
> + * @next_polling	the number of remaining "devfreq_monitor" executions to
> + *			reevaluate frequency/voltage of the device. Set by
> + *			profile's polling_ms interval.
> + * @tickle	positive if DEVFREQ-tickling is activated for the device.
> + *		at each executino of devfreq_monitor, tickle is decremented.
> + *		User may tickle a device-devfreq in order to set maximum
> + *		frequency instaneously with some guaranteed duration.
> + *
> + * This structure stores the DEVFREQ information for a give device.
> + */
> +struct devfreq {
> +	struct list_head node;
> +
> +	struct device *dev;
> +	struct devfreq_dev_profile *profile;
> +	struct devfreq_governor *governor;
> +
> +	unsigned long previous_freq;
> +	unsigned int next_polling;
> +	unsigned int tickle;
> +};
> +
> +#if defined(CONFIG_PM_DEVFREQ)
> +extern int devfreq_add_device(struct device *dev,
> +			   struct devfreq_dev_profile *profile,
> +			   struct devfreq_governor *governor);
> +extern int devfreq_remove_device(struct device *dev);
> +extern int devfreq_update(struct device *dev, bool may_not_exist);
> +extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
> +#else /* !CONFIG_PM_DEVFREQ */
> +static int devfreq_add_device(struct device *dev,
> +			   struct devfreq_dev_profile *profile,
> +			   struct devfreq_governor *governor)
> +{
> +	return 0;
> +}
> +
> +static int devfreq_remove_device(struct device *dev)
> +{
> +	return 0;
> +}
> +
> +static int devfreq_update(struct device *dev, bool may_not_exist)
> +{
> +	return 0;
> +}
> +
> +static int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_PM_DEVFREQ */
> +
> +#endif /* __LINUX_DEVFREQ_H__ */
> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> index 4603f08..e5d2e36 100644
> --- a/kernel/power/Kconfig
> +++ b/kernel/power/Kconfig
> @@ -225,3 +225,28 @@ config PM_OPP
>  	  representing individual voltage domains and provides SOC
>  	  implementations a ready to use framework to manage OPPs.
>  	  For more information, read <file:Documentation/power/opp.txt>
> +
> +config PM_DEVFREQ
> +	bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
> +	depends on PM_OPP

This assumes the user will know if his/her platform uses that code.
It may be a good idea to make it depend on a user-invisible option that
can be selected by the platform.

> +	help
> +	  With OPP support, a device may have a list of frequencies and
> +	  voltages available. DEVFREQ, a generic DVFS framework can be
> +	  registered for a device with OPP support in order to let the
> +	  governor provided to DEVFREQ choose an operating frequency
> +	  based on the OPP's list and the policy given with DEVFREQ.
> +
> +	  Each device may have its own governor and policy. DEVFREQ can
> +	  reevaluate the device state periodically and/or based on the
> +	  OPP list changes (each frequency/voltage pair in OPP may be
> +	  disabled or enabled).
> +
> +	  Like some CPUs with CPUFREQ, a device may have multiple clocks.
> +	  However, because the clock frequencies of a single device are
> +	  determined by the single device's state, an instance of DEVFREQ
> +	  is attached to a single device and returns a "representative"
> +	  clock frequency from the OPP of the device, which is also attached
> +	  to a device by 1-to-1. The device registering DEVFREQ takes the
> +	  responsiblity to "interpret" the frequency listed in OPP and
> +	  to set its every clock accordingly with the "target" callback
> +	  given to DEVFREQ.
> 

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-11  7:58 ` MyungJoo Ham
                   ` (3 preceding siblings ...)
  (?)
@ 2011-05-17 22:36 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-17 22:36 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner

Hi,

On Wednesday, May 11, 2011, MyungJoo Ham wrote:
> With OPPs, a device may have multiple operable frequency and voltage
> sets. However, there can be multiple possible operable sets and a system
> will need to choose one from them. In order to reduce the power
> consumption (by reducing frequency and voltage) without affecting the
> performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
> scheme may be used.
> 
> This patch introduces the DVFS capability to non-CPU devices with OPPs.
> DVFS is a techique whereby the frequency and supplied voltage of a
> device is adjusted on-the-fly. DVFS usually sets the frequency as low
> as possible with given conditions (such as QoS assurance) and adjusts
> voltage according to the chosen frequency in order to reduce power
> consumption and heat dissipation.
> 
> The generic DVFS for devices, DEVFREQ, may appear quite similar with
> /drivers/cpufreq.  However, CPUFREQ does not allow to have multiple
> devices registered and is not suitable to have multiple heterogenous
> devices with different (but simple) governors.
> 
> Normally, DVFS mechanism controls frequency based on the demand for
> the device, and then, chooses voltage based on the chosen frequency.
> DEVFREQ also controls the frequency based on the governor's frequency
> recommendation and let OPP pick up the pair of frequency and voltage
> based on the recommended frequency. Then, the chosen OPP is passed to
> device driver's "target" callback.
> 
> Tested with Exynos4-NURI board.
> 
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> 
> --
> Thank you for your valuable comments, Rafael, Greg, Pavel, and Colin.
> 
> Changes from v1(RFC)
> - Rename: DVFS --> DEVFREQ
> - Revised governor design
>     . Governor receives the whole struct devfreq
>     . Governor should gather usage information (thru get_dev_status)
> itself
> - Periodic monitoring runs only when needed.
> - DEVFREQ no more deals with voltage information directly
> - Removed some printks.
> - Some cosmetics update
> - Use freezable_wq.
> ---
>  drivers/base/power/Makefile  |    1 +
>  drivers/base/power/devfreq.c |  353 ++++++++++++++++++++++++++++++++++++++++++
>  drivers/base/power/opp.c     |    6 +
>  include/linux/devfreq.h      |  108 +++++++++++++
>  kernel/power/Kconfig         |   25 +++
>  5 files changed, 493 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/power/devfreq.c
>  create mode 100644 include/linux/devfreq.h
> 
> diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
> index 118c1b9..d7f0ad7 100644
> --- a/drivers/base/power/Makefile
> +++ b/drivers/base/power/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_PM_SLEEP)	+= main.o wakeup.o
>  obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
>  obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
>  obj-$(CONFIG_PM_OPP)	+= opp.o
> +obj-$(CONFIG_PM_DEVFREQ)	+= devfreq.o
>  
>  ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
>  ccflags-$(CONFIG_PM_VERBOSE)   += -DDEBUG
> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
> new file mode 100644
> index 0000000..8e2e45b
> --- /dev/null
> +++ b/drivers/base/power/devfreq.c
> @@ -0,0 +1,353 @@
> +/*
> + * DEVFREQ: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
> + *	    for Non-CPU Devices Based on OPP.
> + *
> + * Copyright (C) 2011 Samsung Electronics
> + *	MyungJoo Ham <myungjoo.ham@samsung.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/err.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/opp.h>
> +#include <linux/devfreq.h>
> +#include <linux/workqueue.h>
> +#include <linux/platform_device.h>
> +#include <linux/list.h>
> +#include <linux/printk.h>
> +
> +/*
> + * DEVFREQ Monitoring Interval in ms.
> + * It is recommended to be "jiffy_in_ms" * n, where n is an integer >= 1.
> + */
> +#define DEVFREQ_INTERVAL	20
> +
> +/*
> + * devfreq_work periodically (given by DEVFREQ_INTERVAL) monitors every
> + * registered device.
> + */

I'm not sure what that means.  Does it happen only if monitoring is 'true'?
If so, perhaps it could be called 'polling'?

> +static bool monitoring;
> +static struct workqueue_struct *devfreq_wq;
> +static struct delayed_work devfreq_work;
> +/* The list of all device-devfreq */
> +static LIST_HEAD(devfreq_list);
> +/* Exclusive access to devfreq_list and its elements */
> +static DEFINE_MUTEX(devfreq_list_lock);
> +
> +/**
> + * find_device_devfreq() - find devfreq struct using device pointer
> + * @dev:	device pointer used to lookup device DEVFREQ.
> + *
> + * Search the list of device DEVFREQs and return the matched device's
> + * DEVFREQ info.
> + */
> +static struct devfreq *find_device_devfreq(struct device *dev)
> +{
> +	struct devfreq *tmp_devfreq, *devfreq = ERR_PTR(-ENODEV);
> +
> +	if (unlikely(IS_ERR_OR_NULL(dev))) {
> +		pr_err("%s: Invalid parameters\n", __func__);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	list_for_each_entry(tmp_devfreq, &devfreq_list, node) {
> +		if (tmp_devfreq->dev == dev) {
> +			devfreq = tmp_devfreq;

Well, it looks like you could simply do "return tmp_devfreq" here
(then you'd only need one local pointer).

> +			break;
> +		}
> +	}
> +
> +	return devfreq;

And return ERR_PTR(-ENODEV) here.

> +}
> +
> +#define dev_dbg_once(dev, fmt, ...)				\
> +	if (!once) {						\
> +		once = 1;					\
> +		dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);	\
> +	}

Why do you need this?

> +/**
> + * devfreq_do() - Check the usage profile of a given device and configure
> + *		frequency and voltage accordingly
> + * @devfreq:	DEVFREQ info of the given device
> + */
> +static int devfreq_do(struct devfreq *devfreq)
> +{
> +	struct opp *opp;
> +	unsigned long freq;
> +	int err;
> +	static int once;

No, please.  Either make it a part of the macro, or remove the macro
entirely.

> +
> +	err = devfreq->governor->get_target_freq(devfreq, &freq);
> +	if (err) {
> +		dev_dbg_once(devfreq->dev, "%s: get_target_freq error(%d)\n",
> +			     __func__, err);
> +		return err;
> +	}
> +
> +	opp = opp_find_freq_ceil(devfreq->dev, &freq);
> +	if (opp == ERR_PTR(-ENODEV))
> +		opp = opp_find_freq_floor(devfreq->dev, &freq);
> +
> +	if (IS_ERR(opp)) {
> +		dev_dbg_once(devfreq->dev, "%s: Cannot find opp with %luHz.\n",
> +			     __func__, freq);
> +		return PTR_ERR(opp);
> +	}
> +
> +	freq = opp_get_freq(opp);
> +	if (devfreq->previous_freq != freq) {
> +		err = devfreq->profile->target(devfreq->dev, opp);
> +		if (!err)
> +			devfreq->previous_freq = freq;

I'd do

    if (devfreq->previous_freq == freq)
        return 0;

    err = devfreq->profile->target(devfreq->dev, opp);
    if (err) {
       print message or something;
       return err;
    }

    devfreq->previous_freq = freq;
    return 0;

> +	}
> +
> +	if (err)
> +		dev_dbg_once(devfreq->dev, "%s: Cannot set %luHz/%luuV\n",
> +			     __func__, opp_get_freq(opp), opp_get_voltage(opp));
> +	return err;
> +}
> +
> +/**
> + * devfreq_monitor() - Regularly run devfreq_do() and support device DEVFREQ tickle.
> + * @work: the work struct used to run devfreq_monitor periodically.
> + */
> +static void devfreq_monitor(struct work_struct *work)
> +{
> +	struct devfreq *devfreq;
> +	int error;
> +	int reserved = 0;
> +	static int once;
> +
> +	mutex_lock(&devfreq_list_lock);
> +
> +	list_for_each_entry(devfreq, &devfreq_list, node) {
> +		if (devfreq->next_polling == 0)
> +			continue;
> +
> +		reserved++;

Why is the variable called 'reserved'?

> +
> +		if (devfreq->tickle) {
> +			devfreq->tickle--;
> +			continue;
> +		}

I'd put a coment above that explaining what's going on here.

> +		if (devfreq->next_polling == 1) {
> +			error = devfreq_do(devfreq);
> +			if (error && !once) {
> +				once = 1;
> +				dev_err(devfreq->dev, "devfreq_do error(%d)\n",
> +					error);

Hmm.  I'm not sure, but wouldn't it make sense to drop the device from
the list if devfreq_do() returns error code?

> +			}
> +			devfreq->next_polling = DIV_ROUND_UP(
> +						devfreq->profile->polling_ms,
> +						DEVFREQ_INTERVAL);
> +		} else {
> +			devfreq->next_polling--;
> +		}
> +	}
> +
> +	if (reserved) {
> +		monitoring = true;
> +		queue_delayed_work(devfreq_wq, &devfreq_work,
> +				   msecs_to_jiffies(DEVFREQ_INTERVAL));
> +	} else {
> +		monitoring = false;
> +	}
> +
> +	mutex_unlock(&devfreq_list_lock);
> +}
> +
> +/**
> + * devfreq_add_device() - Add devfreq feature to the device
> + * @dev:	the device to add devfreq feature.
> + * @profile:	device-specific profile to run devfreq.
> + * @governor:	the policy to choose frequency.
> + */
> +int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
> +		       struct devfreq_governor *governor)
> +{
> +	struct devfreq *new_devfreq, *devfreq;
> +	int err = 0;
> +
> +	if (!dev || !profile || !governor) {
> +		dev_err(dev, "%s: Invalid parameters.\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&devfreq_list_lock);
> +
> +	devfreq = find_device_devfreq(dev);
> +	if (!IS_ERR(devfreq)) {
> +		dev_err(dev, "%s: Unable to create DEVFREQ for the device. "
> +			"It already has one.\n", __func__);
> +		err = -EINVAL;
> +		goto out;
> +	}
> +
> +	new_devfreq = kzalloc(sizeof(struct devfreq), GFP_KERNEL);
> +	if (!new_devfreq) {
> +		dev_err(dev, "%s: Unable to create DEVFREQ for the device\n",
> +			__func__);
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +
> +	new_devfreq->dev = dev;
> +	new_devfreq->profile = profile;
> +	new_devfreq->governor = governor;
> +	new_devfreq->next_polling = DIV_ROUND_UP(profile->polling_ms,
> +						 DEVFREQ_INTERVAL);
> +	new_devfreq->previous_freq = profile->initial_freq;
> +
> +	list_add(&new_devfreq->node, &devfreq_list);
> +
> +	if (devfreq_wq && new_devfreq->next_polling && !monitoring) {
> +		monitoring = true;
> +		queue_delayed_work(devfreq_wq, &devfreq_work,
> +				   msecs_to_jiffies(DEVFREQ_INTERVAL));
> +	}
> +out:
> +	mutex_unlock(&devfreq_list_lock);
> +
> +	return err;
> +}
> +
> +/**
> + * devfreq_remove_device() - Remove DEVFREQ feature from a device.
> + * @device:	the device to remove devfreq feature.
> + */
> +int devfreq_remove_device(struct device *dev)
> +{
> +	struct devfreq *devfreq;
> +
> +	if (!dev)
> +		return -EINVAL;
> +
> +	mutex_lock(&devfreq_list_lock);
> +	devfreq = find_device_devfreq(dev);
> +	if (IS_ERR(devfreq)) {
> +		dev_err(dev, "%s: Unable to find DEVFREQ entry for the device.\n",
> +			__func__);
> +		mutex_unlock(&devfreq_list_lock);
> +		return -EINVAL;
> +	}
> +
> +	list_del(&devfreq->node);
> +
> +	kfree(devfreq);
> +
> +	mutex_unlock(&devfreq_list_lock);
> +
> +	return 0;
> +}
> +
> +/**
> + * devfreq_update() - Notify that the device OPP has been changed.
> + * @dev:	the device whose OPP has been changed.
> + * @may_not_exist:	do not print error message even if the device
> + *			does not have devfreq entry.
> + */
> +int devfreq_update(struct device *dev, bool may_not_exist)
> +{
> +	struct devfreq *devfreq;
> +	int err = 0;
> +
> +	mutex_lock(&devfreq_list_lock);
> +
> +	devfreq = find_device_devfreq(dev);
> +	if (IS_ERR(devfreq)) {
> +		if (may_not_exist && PTR_ERR(devfreq) == -EINVAL)
> +			goto out;
> +

This is kind of strange.  Why don't you simply pass -ENODEV to the caller
so that it can decide?

> +		err = PTR_ERR(devfreq);
> +		goto out;
> +	}
> +
> +	if (devfreq->tickle) {
> +		unsigned long freq = devfreq->profile->max_freq;
> +		struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
> +
> +		if (!IS_ERR(opp) && devfreq->previous_freq != freq) {
> +			err = devfreq->profile->target(devfreq->dev, opp);
> +			if (!err)
> +				devfreq->previous_freq = opp_get_freq(opp);
> +		}

This looks like a code duplication from devfreq_do().  Would it make sense
to put it into a separate function?

> +	} else {
> +		err = devfreq_do(devfreq);
> +	}
> +
> +out:
> +	mutex_unlock(&devfreq_list_lock);
> +	return err;
> +}
> +
> +/**
> + * devfreq_tickle_device() - Guarantee maximum operation speed for a while
> + *			instaneously.
> + * @dev:	the device to be tickled.
> + * @duration_ms:	the duration of tickle effect.
> + *
> + * Tickle sets the device at the maximum frequency instaneously and
> + * the maximum frequency is guaranteed to be used for the given duration.
> + * For faster user reponse time, an input event may tickle a related device
> + * so that the input event does not need to wait for the DEVFREQ to react with
> + * normal interval.
> + */
> +int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
> +{
> +	struct devfreq *devfreq;
> +	struct opp *opp;
> +	unsigned long freq;
> +	int err = 0;
> +
> +	mutex_lock(&devfreq_list_lock);
> +	devfreq = find_device_devfreq(dev);

I think we should return here if the device hasn't been found.
Do you want to set 'monitoring' unconditionally and schedule the work item
unconditionally in this function and if so, why not to do that at the
beginning?

> +	if (!IS_ERR(devfreq)) {
> +		freq = devfreq->profile->max_freq;
> +		opp = opp_find_freq_floor(devfreq->dev, &freq);
> +		freq = opp_get_freq(opp);
> +		if (devfreq->previous_freq != freq) {
> +			err = devfreq->profile->target(devfreq->dev, opp);
> +			if (!err)
> +				devfreq->previous_freq = freq;
> +		}
> +		if (err)
> +			dev_err(dev, "%s: Cannot set frequency.\n", __func__);
> +		else
> +			devfreq->tickle = DIV_ROUND_UP(duration_ms,
> +						       DEVFREQ_INTERVAL);
> +	}
> +
> +	if (devfreq_wq && !monitoring) {
> +		monitoring = true;
> +		queue_delayed_work(devfreq_wq, &devfreq_work,
> +				   msecs_to_jiffies(DEVFREQ_INTERVAL));
> +	}
> +	mutex_unlock(&devfreq_list_lock);
> +
> +	if (IS_ERR(devfreq)) {
> +		dev_err(dev, "%s: Cannot find devfreq.\n", __func__);
> +		err = PTR_ERR(devfreq);
> +	}
> +
> +	return err;
> +}
> +
> +static int __init devfreq_init(void)
> +{
> +	mutex_lock(&devfreq_list_lock);
> +
> +	monitoring = false;
> +	devfreq_wq = create_freezable_workqueue("devfreq_wq");
> +	INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
> +	mutex_unlock(&devfreq_list_lock);
> +
> +	devfreq_monitor(&devfreq_work.work);
> +	return 0;
> +}
> +late_initcall(devfreq_init);
> diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
> index 56a6899..4b6b995 100644
> --- a/drivers/base/power/opp.c
> +++ b/drivers/base/power/opp.c
> @@ -21,6 +21,7 @@
>  #include <linux/rculist.h>
>  #include <linux/rcupdate.h>
>  #include <linux/opp.h>
> +#include <linux/devfreq.h>
>  
>  /*
>   * Internal data structure organization with the OPP layer library is as
> @@ -428,6 +429,8 @@ int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
>  	list_add_rcu(&new_opp->node, head);
>  	mutex_unlock(&dev_opp_list_lock);
>  
> +	/* Notify generic dvfs for the change */
> +	devfreq_update(dev, true);
>  	return 0;
>  }
>  
> @@ -512,6 +515,9 @@ unlock:
>  	mutex_unlock(&dev_opp_list_lock);
>  out:
>  	kfree(new_opp);
> +
> +	/* Notify generic dvfs for the change */
> +	devfreq_update(dev, true);
>  	return r;
>  }
>  
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> new file mode 100644
> index 0000000..d08e9f5
> --- /dev/null
> +++ b/include/linux/devfreq.h
> @@ -0,0 +1,108 @@
> +/*
> + * DEVFREQ: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
> + *	    for Non-CPU Devices Based on OPP.
> + *
> + * Copyright (C) 2011 Samsung Electronics
> + *	MyungJoo Ham <myungjoo.ham@samsung.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef __LINUX_DEVFREQ_H__
> +#define __LINUX_DEVFREQ_H__
> +
> +struct devfreq;
> +struct devfreq_dev_status {
> +	/* both since the last measure */
> +	unsigned long total_time;
> +	unsigned long busy_time;
> +	unsigned long current_frequency;
> +};
> +
> +struct devfreq_dev_profile {
> +	unsigned long max_freq; /* may be larger than the actual value */
> +	unsigned long initial_freq;
> +	int polling_ms;	/* 0 for at opp change only */
> +
> +	int (*target)(struct device *dev, struct opp *opp);
> +	int (*get_dev_status)(struct device *dev,
> +			      struct devfreq_dev_status *stat);
> +};
> +
> +/**
> + * struct devfreq_governor - DEVFREQ Policy Governor
> + * @data	Governor's internal data. The framework does not care of it.
> + * @get_target_freq	Returns desired operating frequency for the device.
> + *			Basically, get_target_freq will run
> + *			devfreq_dev_profile.get_dev_status() to get the
> + *			status of the device (load = busy_time / total_time).
> + */
> +struct devfreq_governor {
> +	void *data; /* private data for get_target_freq */
> +	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
> +};
> +
> +/**
> + * struct devfreq - Device DEVFREQ structure
> + * @node	list node - contains the devices with DEVFREQ that have been
> + *		registered.
> + * @dev		device pointer
> + * @profile	device-specific devfreq profile
> + * @governor	method how to choose frequency based on the usage.
> + * @previous_freq	previously configured frequency value.
> + * @next_polling	the number of remaining "devfreq_monitor" executions to
> + *			reevaluate frequency/voltage of the device. Set by
> + *			profile's polling_ms interval.
> + * @tickle	positive if DEVFREQ-tickling is activated for the device.
> + *		at each executino of devfreq_monitor, tickle is decremented.
> + *		User may tickle a device-devfreq in order to set maximum
> + *		frequency instaneously with some guaranteed duration.
> + *
> + * This structure stores the DEVFREQ information for a give device.
> + */
> +struct devfreq {
> +	struct list_head node;
> +
> +	struct device *dev;
> +	struct devfreq_dev_profile *profile;
> +	struct devfreq_governor *governor;
> +
> +	unsigned long previous_freq;
> +	unsigned int next_polling;
> +	unsigned int tickle;
> +};
> +
> +#if defined(CONFIG_PM_DEVFREQ)
> +extern int devfreq_add_device(struct device *dev,
> +			   struct devfreq_dev_profile *profile,
> +			   struct devfreq_governor *governor);
> +extern int devfreq_remove_device(struct device *dev);
> +extern int devfreq_update(struct device *dev, bool may_not_exist);
> +extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
> +#else /* !CONFIG_PM_DEVFREQ */
> +static int devfreq_add_device(struct device *dev,
> +			   struct devfreq_dev_profile *profile,
> +			   struct devfreq_governor *governor)
> +{
> +	return 0;
> +}
> +
> +static int devfreq_remove_device(struct device *dev)
> +{
> +	return 0;
> +}
> +
> +static int devfreq_update(struct device *dev, bool may_not_exist)
> +{
> +	return 0;
> +}
> +
> +static int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_PM_DEVFREQ */
> +
> +#endif /* __LINUX_DEVFREQ_H__ */
> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> index 4603f08..e5d2e36 100644
> --- a/kernel/power/Kconfig
> +++ b/kernel/power/Kconfig
> @@ -225,3 +225,28 @@ config PM_OPP
>  	  representing individual voltage domains and provides SOC
>  	  implementations a ready to use framework to manage OPPs.
>  	  For more information, read <file:Documentation/power/opp.txt>
> +
> +config PM_DEVFREQ
> +	bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
> +	depends on PM_OPP

This assumes the user will know if his/her platform uses that code.
It may be a good idea to make it depend on a user-invisible option that
can be selected by the platform.

> +	help
> +	  With OPP support, a device may have a list of frequencies and
> +	  voltages available. DEVFREQ, a generic DVFS framework can be
> +	  registered for a device with OPP support in order to let the
> +	  governor provided to DEVFREQ choose an operating frequency
> +	  based on the OPP's list and the policy given with DEVFREQ.
> +
> +	  Each device may have its own governor and policy. DEVFREQ can
> +	  reevaluate the device state periodically and/or based on the
> +	  OPP list changes (each frequency/voltage pair in OPP may be
> +	  disabled or enabled).
> +
> +	  Like some CPUs with CPUFREQ, a device may have multiple clocks.
> +	  However, because the clock frequencies of a single device are
> +	  determined by the single device's state, an instance of DEVFREQ
> +	  is attached to a single device and returns a "representative"
> +	  clock frequency from the OPP of the device, which is also attached
> +	  to a device by 1-to-1. The device registering DEVFREQ takes the
> +	  responsiblity to "interpret" the frequency listed in OPP and
> +	  to set its every clock accordingly with the "target" callback
> +	  given to DEVFREQ.
> 

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/3] PM / DEVFREQ: add example governors
  2011-05-11  7:58   ` MyungJoo Ham
  (?)
  (?)
@ 2011-05-17 22:39   ` Rafael J. Wysocki
  2011-05-18  0:48     ` MyungJoo Ham
  2011-05-18  0:48     ` MyungJoo Ham
  -1 siblings, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-17 22:39 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: linux-pm, linux-kernel, Greg Kroah-Hartman, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park, myungjoo.ham

On Wednesday, May 11, 2011, MyungJoo Ham wrote:
> Three CPUFREQ-like governors are provided as examples.
> 
> powersave: use the lowest frequency possible. The user (device) should
> set the polling_ms as 0 because polling is useless for this governor.
> 
> performance: use the highest freqeuncy possible. The user (device)
> should set the polling_ms as 0 because polling is useless for this
> governor.
> 
> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
> 
> When a user updates OPP entries (enable/disable/add), OPP framework
> automatically notifies DEVFREQ to update operating frequency
> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
> , performance, or any other "static" governors.

Well, do you expect anyone to actually use them?  If not, it would make
more sense to put them into a doc.

Thanks,
Rafael


> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> ---
>  drivers/base/power/devfreq.c |   69 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/devfreq.h      |    5 +++
>  2 files changed, 74 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
> index 8e2e45b..251d761 100644
> --- a/drivers/base/power/devfreq.c
> +++ b/drivers/base/power/devfreq.c
> @@ -351,3 +351,72 @@ static int __init devfreq_init(void)
>  	return 0;
>  }
>  late_initcall(devfreq_init);
> +
> +static int devfreq_powersave_func(struct devfreq *df,
> +				  unsigned long *freq)
> +{
> +	*freq = 0; /* devfreq_do will run "ceiling" to 0 */
> +	return 0;
> +}
> +
> +struct devfreq_governor devfreq_powersave = {
> +	.get_target_freq = devfreq_powersave_func,
> +};
> +
> +static int devfreq_performance_func(struct devfreq *df,
> +				    unsigned long *freq)
> +{
> +	*freq = UINT_MAX; /* devfreq_do will run "floor" */
> +	return 0;
> +}
> +
> +struct devfreq_governor devfreq_performance = {
> +	.get_target_freq = devfreq_performance_func,
> +};
> +
> +/* Constants for DevFreq-Simple-Ondemand (DFSO) */
> +#define DFSO_UPTHRESHOLD	(90)
> +#define DFSO_DOWNDIFFERENCTIAL	(5)
> +static int devfreq_simple_ondemand_func(struct devfreq *df,
> +					unsigned long *freq)
> +{
> +	struct devfreq_dev_status stat;
> +	int err = df->profile->get_dev_status(df->dev, &stat);
> +	unsigned long long a, b;
> +
> +	if (err)
> +		return err;
> +
> +	/* Set MAX if it's busy enough */
> +	if (stat.busy_time * 100 >
> +	    stat.total_time * DFSO_UPTHRESHOLD) {
> +		*freq = UINT_MAX;
> +		return 0;
> +	}
> +
> +	/* Set MAX if we do not know the initial frequency */
> +	if (stat.current_frequency == 0) {
> +		*freq = UINT_MAX;
> +		return 0;
> +	}
> +
> +	/* Keep the current frequency */
> +	if (stat.busy_time * 100 >
> +	    stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
> +		*freq = stat.current_frequency;
> +		return 0;
> +	}
> +
> +	/* Set the desired frequency based on the load */
> +	a = (unsigned long long) stat.busy_time * stat.current_frequency;
> +	b = div_u64(a, stat.total_time);
> +	b *= 100;
> +	b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
> +	*freq = (unsigned long) b;
> +
> +	return 0;
> +}
> +
> +struct devfreq_governor devfreq_simple_ondemand = {
> +	.get_target_freq = devfreq_simple_ondemand_func,
> +};
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index d08e9f5..ec41ba6 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -81,6 +81,11 @@ extern int devfreq_add_device(struct device *dev,
>  extern int devfreq_remove_device(struct device *dev);
>  extern int devfreq_update(struct device *dev, bool may_not_exist);
>  extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
> +
> +extern struct devfreq_governor devfreq_powersave;
> +extern struct devfreq_governor devfreq_performance;
> +extern struct devfreq_governor devfreq_simple_ondemand;
> +
>  #else /* !CONFIG_PM_DEVFREQ */
>  static int devfreq_add_device(struct device *dev,
>  			   struct devfreq_dev_profile *profile,
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/3] PM / DEVFREQ: add example governors
  2011-05-11  7:58   ` MyungJoo Ham
  (?)
@ 2011-05-17 22:39   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-17 22:39 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner

On Wednesday, May 11, 2011, MyungJoo Ham wrote:
> Three CPUFREQ-like governors are provided as examples.
> 
> powersave: use the lowest frequency possible. The user (device) should
> set the polling_ms as 0 because polling is useless for this governor.
> 
> performance: use the highest freqeuncy possible. The user (device)
> should set the polling_ms as 0 because polling is useless for this
> governor.
> 
> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
> 
> When a user updates OPP entries (enable/disable/add), OPP framework
> automatically notifies DEVFREQ to update operating frequency
> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
> , performance, or any other "static" governors.

Well, do you expect anyone to actually use them?  If not, it would make
more sense to put them into a doc.

Thanks,
Rafael


> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> ---
>  drivers/base/power/devfreq.c |   69 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/devfreq.h      |    5 +++
>  2 files changed, 74 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
> index 8e2e45b..251d761 100644
> --- a/drivers/base/power/devfreq.c
> +++ b/drivers/base/power/devfreq.c
> @@ -351,3 +351,72 @@ static int __init devfreq_init(void)
>  	return 0;
>  }
>  late_initcall(devfreq_init);
> +
> +static int devfreq_powersave_func(struct devfreq *df,
> +				  unsigned long *freq)
> +{
> +	*freq = 0; /* devfreq_do will run "ceiling" to 0 */
> +	return 0;
> +}
> +
> +struct devfreq_governor devfreq_powersave = {
> +	.get_target_freq = devfreq_powersave_func,
> +};
> +
> +static int devfreq_performance_func(struct devfreq *df,
> +				    unsigned long *freq)
> +{
> +	*freq = UINT_MAX; /* devfreq_do will run "floor" */
> +	return 0;
> +}
> +
> +struct devfreq_governor devfreq_performance = {
> +	.get_target_freq = devfreq_performance_func,
> +};
> +
> +/* Constants for DevFreq-Simple-Ondemand (DFSO) */
> +#define DFSO_UPTHRESHOLD	(90)
> +#define DFSO_DOWNDIFFERENCTIAL	(5)
> +static int devfreq_simple_ondemand_func(struct devfreq *df,
> +					unsigned long *freq)
> +{
> +	struct devfreq_dev_status stat;
> +	int err = df->profile->get_dev_status(df->dev, &stat);
> +	unsigned long long a, b;
> +
> +	if (err)
> +		return err;
> +
> +	/* Set MAX if it's busy enough */
> +	if (stat.busy_time * 100 >
> +	    stat.total_time * DFSO_UPTHRESHOLD) {
> +		*freq = UINT_MAX;
> +		return 0;
> +	}
> +
> +	/* Set MAX if we do not know the initial frequency */
> +	if (stat.current_frequency == 0) {
> +		*freq = UINT_MAX;
> +		return 0;
> +	}
> +
> +	/* Keep the current frequency */
> +	if (stat.busy_time * 100 >
> +	    stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
> +		*freq = stat.current_frequency;
> +		return 0;
> +	}
> +
> +	/* Set the desired frequency based on the load */
> +	a = (unsigned long long) stat.busy_time * stat.current_frequency;
> +	b = div_u64(a, stat.total_time);
> +	b *= 100;
> +	b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
> +	*freq = (unsigned long) b;
> +
> +	return 0;
> +}
> +
> +struct devfreq_governor devfreq_simple_ondemand = {
> +	.get_target_freq = devfreq_simple_ondemand_func,
> +};
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index d08e9f5..ec41ba6 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -81,6 +81,11 @@ extern int devfreq_add_device(struct device *dev,
>  extern int devfreq_remove_device(struct device *dev);
>  extern int devfreq_update(struct device *dev, bool may_not_exist);
>  extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
> +
> +extern struct devfreq_governor devfreq_powersave;
> +extern struct devfreq_governor devfreq_performance;
> +extern struct devfreq_governor devfreq_simple_ondemand;
> +
>  #else /* !CONFIG_PM_DEVFREQ */
>  static int devfreq_add_device(struct device *dev,
>  			   struct devfreq_dev_profile *profile,
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-05-17  5:04     ` MyungJoo Ham
                         ` (2 preceding siblings ...)
  2011-05-17 22:41       ` Rafael J. Wysocki
@ 2011-05-17 22:41       ` Rafael J. Wysocki
  2011-05-18  0:43         ` MyungJoo Ham
  2011-05-18  0:43         ` MyungJoo Ham
  3 siblings, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-17 22:41 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: Greg KH, linux-pm, linux-kernel, Mark Brown, Jiejing Zhang,
	Pavel Machek, Colin Cross, Nishanth Menon, Thomas Gleixner,
	Len Brown, Kyungmin Park

On Tuesday, May 17, 2011, MyungJoo Ham wrote:
> On Thu, May 12, 2011 at 7:55 AM, Greg KH <gregkh@suse.de> wrote:
> > On Wed, May 11, 2011 at 04:58:43PM +0900, MyungJoo Ham wrote:
> >> 1. System-wide sysfs interface
> >> - tickle_all  R: number of tickle_all execution
> >>               W: tickle all devfreq devices
> >> - min_interval        R: devfreq monitoring base interval in ms
> >> - monitoring  R: shows whether devfreq monitoring is active or
> >>   not.
> >>
> >> 2. Device specific sysfs interface
> >> - tickle      R: number of tickle execution for the device
> >>               W: tickle the device
> >
> > Any sysfs file change/addition/removal needs to have a
> > Documentation/ABI/ entry as well.  Please add that to this patch instead
> > of burying it in the changelog entry, where no one will be able to find
> > it in the future.
> >
> > thanks,
> >
> > greg k-h
> >
> 
> Sure, I'll add Documentation/ABI/testing/* entries.
> 
> However, would it be appropriate for "1. System-wide sysfs interface"
> to be in "sysfs-class-power" and for "2. Device specific sysfs
> interface" to be in "sysfs-devices-devfreq"?
> 
> System-wide sysfs interface is in /sys/class/power/devfreq/* and

Hmm.  Why not to use /sys/power/ that's already there?

> Device specific sysfs interface is in /sys/...DEVICE.../devfreq/* .

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-05-17  5:04     ` MyungJoo Ham
  2011-05-17 18:32       ` Greg KH
  2011-05-17 18:32       ` Greg KH
@ 2011-05-17 22:41       ` Rafael J. Wysocki
  2011-05-17 22:41       ` Rafael J. Wysocki
  3 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-17 22:41 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: Len Brown, Greg KH, linux-kernel, Kyungmin Park, linux-pm,
	Thomas Gleixner

On Tuesday, May 17, 2011, MyungJoo Ham wrote:
> On Thu, May 12, 2011 at 7:55 AM, Greg KH <gregkh@suse.de> wrote:
> > On Wed, May 11, 2011 at 04:58:43PM +0900, MyungJoo Ham wrote:
> >> 1. System-wide sysfs interface
> >> - tickle_all  R: number of tickle_all execution
> >>               W: tickle all devfreq devices
> >> - min_interval        R: devfreq monitoring base interval in ms
> >> - monitoring  R: shows whether devfreq monitoring is active or
> >>   not.
> >>
> >> 2. Device specific sysfs interface
> >> - tickle      R: number of tickle execution for the device
> >>               W: tickle the device
> >
> > Any sysfs file change/addition/removal needs to have a
> > Documentation/ABI/ entry as well.  Please add that to this patch instead
> > of burying it in the changelog entry, where no one will be able to find
> > it in the future.
> >
> > thanks,
> >
> > greg k-h
> >
> 
> Sure, I'll add Documentation/ABI/testing/* entries.
> 
> However, would it be appropriate for "1. System-wide sysfs interface"
> to be in "sysfs-class-power" and for "2. Device specific sysfs
> interface" to be in "sysfs-devices-devfreq"?
> 
> System-wide sysfs interface is in /sys/class/power/devfreq/* and

Hmm.  Why not to use /sys/power/ that's already there?

> Device specific sysfs interface is in /sys/...DEVICE.../devfreq/* .

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-05-17 22:41       ` Rafael J. Wysocki
  2011-05-18  0:43         ` MyungJoo Ham
@ 2011-05-18  0:43         ` MyungJoo Ham
  1 sibling, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-18  0:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, linux-pm, linux-kernel, Mark Brown, Jiejing Zhang,
	Pavel Machek, Colin Cross, Nishanth Menon, Thomas Gleixner,
	Len Brown, Kyungmin Park

On Wed, May 18, 2011 at 7:41 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Tuesday, May 17, 2011, MyungJoo Ham wrote:
>> On Thu, May 12, 2011 at 7:55 AM, Greg KH <gregkh@suse.de> wrote:
>> > On Wed, May 11, 2011 at 04:58:43PM +0900, MyungJoo Ham wrote:
>> >> 1. System-wide sysfs interface
>> >> - tickle_all  R: number of tickle_all execution
>> >>               W: tickle all devfreq devices
>> >> - min_interval        R: devfreq monitoring base interval in ms
>> >> - monitoring  R: shows whether devfreq monitoring is active or
>> >>   not.
>> >>
>> >> 2. Device specific sysfs interface
>> >> - tickle      R: number of tickle execution for the device
>> >>               W: tickle the device
>> >
>> > Any sysfs file change/addition/removal needs to have a
>> > Documentation/ABI/ entry as well.  Please add that to this patch instead
>> > of burying it in the changelog entry, where no one will be able to find
>> > it in the future.
>> >
>> > thanks,
>> >
>> > greg k-h
>> >
>>
>> Sure, I'll add Documentation/ABI/testing/* entries.
>>
>> However, would it be appropriate for "1. System-wide sysfs interface"
>> to be in "sysfs-class-power" and for "2. Device specific sysfs
>> interface" to be in "sysfs-devices-devfreq"?
>>
>> System-wide sysfs interface is in /sys/class/power/devfreq/* and
>
> Hmm.  Why not to use /sys/power/ that's already there?

Ah.. that was my mistake. I'd put it at /sys/power/devfreq/*.

Thank you for cleaning that up.

>
>> Device specific sysfs interface is in /sys/...DEVICE.../devfreq/* .
>
> Thanks,
> Rafael
>

Cheers!
- MyungJoo
-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-05-17 22:41       ` Rafael J. Wysocki
@ 2011-05-18  0:43         ` MyungJoo Ham
  2011-05-18  0:43         ` MyungJoo Ham
  1 sibling, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-18  0:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, Greg KH, linux-kernel, Kyungmin Park, linux-pm,
	Thomas Gleixner

On Wed, May 18, 2011 at 7:41 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Tuesday, May 17, 2011, MyungJoo Ham wrote:
>> On Thu, May 12, 2011 at 7:55 AM, Greg KH <gregkh@suse.de> wrote:
>> > On Wed, May 11, 2011 at 04:58:43PM +0900, MyungJoo Ham wrote:
>> >> 1. System-wide sysfs interface
>> >> - tickle_all  R: number of tickle_all execution
>> >>               W: tickle all devfreq devices
>> >> - min_interval        R: devfreq monitoring base interval in ms
>> >> - monitoring  R: shows whether devfreq monitoring is active or
>> >>   not.
>> >>
>> >> 2. Device specific sysfs interface
>> >> - tickle      R: number of tickle execution for the device
>> >>               W: tickle the device
>> >
>> > Any sysfs file change/addition/removal needs to have a
>> > Documentation/ABI/ entry as well.  Please add that to this patch instead
>> > of burying it in the changelog entry, where no one will be able to find
>> > it in the future.
>> >
>> > thanks,
>> >
>> > greg k-h
>> >
>>
>> Sure, I'll add Documentation/ABI/testing/* entries.
>>
>> However, would it be appropriate for "1. System-wide sysfs interface"
>> to be in "sysfs-class-power" and for "2. Device specific sysfs
>> interface" to be in "sysfs-devices-devfreq"?
>>
>> System-wide sysfs interface is in /sys/class/power/devfreq/* and
>
> Hmm.  Why not to use /sys/power/ that's already there?

Ah.. that was my mistake. I'd put it at /sys/power/devfreq/*.

Thank you for cleaning that up.

>
>> Device specific sysfs interface is in /sys/...DEVICE.../devfreq/* .
>
> Thanks,
> Rafael
>

Cheers!
- MyungJoo
-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/3] PM / DEVFREQ: add example governors
  2011-05-17 22:39   ` Rafael J. Wysocki
@ 2011-05-18  0:48     ` MyungJoo Ham
  2011-05-18 19:46       ` Rafael J. Wysocki
  2011-05-18 19:46       ` Rafael J. Wysocki
  2011-05-18  0:48     ` MyungJoo Ham
  1 sibling, 2 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-18  0:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Greg Kroah-Hartman, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park

2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
> On Wednesday, May 11, 2011, MyungJoo Ham wrote:
>> Three CPUFREQ-like governors are provided as examples.
>>
>> powersave: use the lowest frequency possible. The user (device) should
>> set the polling_ms as 0 because polling is useless for this governor.
>>
>> performance: use the highest freqeuncy possible. The user (device)
>> should set the polling_ms as 0 because polling is useless for this
>> governor.
>>
>> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
>>
>> When a user updates OPP entries (enable/disable/add), OPP framework
>> automatically notifies DEVFREQ to update operating frequency
>> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
>> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
>> , performance, or any other "static" governors.
>
> Well, do you expect anyone to actually use them?  If not, it would make
> more sense to put them into a doc.

According to our experiences of DVFS(although this "DEVFREQ" is not
applied to them, yet) in memory-bus and GPU,
I expect most DEVFREQ users might use "simple_ondemand" and
expect "powersave" and "performance" will probably mostly used while
testing and debugging.
("userspace"-like governor would be also useful for that purpose, but
I'd add it later)


Cheers!
- MyungJoo

>
> Thanks,
> Rafael
>
>
>> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>> ---
>>  drivers/base/power/devfreq.c |   69 ++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/devfreq.h      |    5 +++
>>  2 files changed, 74 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
>> index 8e2e45b..251d761 100644
>> --- a/drivers/base/power/devfreq.c
>> +++ b/drivers/base/power/devfreq.c
>> @@ -351,3 +351,72 @@ static int __init devfreq_init(void)
>>       return 0;
>>  }
>>  late_initcall(devfreq_init);
>> +
>> +static int devfreq_powersave_func(struct devfreq *df,
>> +                               unsigned long *freq)
>> +{
>> +     *freq = 0; /* devfreq_do will run "ceiling" to 0 */
>> +     return 0;
>> +}
>> +
>> +struct devfreq_governor devfreq_powersave = {
>> +     .get_target_freq = devfreq_powersave_func,
>> +};
>> +
>> +static int devfreq_performance_func(struct devfreq *df,
>> +                                 unsigned long *freq)
>> +{
>> +     *freq = UINT_MAX; /* devfreq_do will run "floor" */
>> +     return 0;
>> +}
>> +
>> +struct devfreq_governor devfreq_performance = {
>> +     .get_target_freq = devfreq_performance_func,
>> +};
>> +
>> +/* Constants for DevFreq-Simple-Ondemand (DFSO) */
>> +#define DFSO_UPTHRESHOLD     (90)
>> +#define DFSO_DOWNDIFFERENCTIAL       (5)
>> +static int devfreq_simple_ondemand_func(struct devfreq *df,
>> +                                     unsigned long *freq)
>> +{
>> +     struct devfreq_dev_status stat;
>> +     int err = df->profile->get_dev_status(df->dev, &stat);
>> +     unsigned long long a, b;
>> +
>> +     if (err)
>> +             return err;
>> +
>> +     /* Set MAX if it's busy enough */
>> +     if (stat.busy_time * 100 >
>> +         stat.total_time * DFSO_UPTHRESHOLD) {
>> +             *freq = UINT_MAX;
>> +             return 0;
>> +     }
>> +
>> +     /* Set MAX if we do not know the initial frequency */
>> +     if (stat.current_frequency == 0) {
>> +             *freq = UINT_MAX;
>> +             return 0;
>> +     }
>> +
>> +     /* Keep the current frequency */
>> +     if (stat.busy_time * 100 >
>> +         stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
>> +             *freq = stat.current_frequency;
>> +             return 0;
>> +     }
>> +
>> +     /* Set the desired frequency based on the load */
>> +     a = (unsigned long long) stat.busy_time * stat.current_frequency;
>> +     b = div_u64(a, stat.total_time);
>> +     b *= 100;
>> +     b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
>> +     *freq = (unsigned long) b;
>> +
>> +     return 0;
>> +}
>> +
>> +struct devfreq_governor devfreq_simple_ondemand = {
>> +     .get_target_freq = devfreq_simple_ondemand_func,
>> +};
>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>> index d08e9f5..ec41ba6 100644
>> --- a/include/linux/devfreq.h
>> +++ b/include/linux/devfreq.h
>> @@ -81,6 +81,11 @@ extern int devfreq_add_device(struct device *dev,
>>  extern int devfreq_remove_device(struct device *dev);
>>  extern int devfreq_update(struct device *dev, bool may_not_exist);
>>  extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
>> +
>> +extern struct devfreq_governor devfreq_powersave;
>> +extern struct devfreq_governor devfreq_performance;
>> +extern struct devfreq_governor devfreq_simple_ondemand;
>> +
>>  #else /* !CONFIG_PM_DEVFREQ */
>>  static int devfreq_add_device(struct device *dev,
>>                          struct devfreq_dev_profile *profile,
>>
>
>



-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/3] PM / DEVFREQ: add example governors
  2011-05-17 22:39   ` Rafael J. Wysocki
  2011-05-18  0:48     ` MyungJoo Ham
@ 2011-05-18  0:48     ` MyungJoo Ham
  1 sibling, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-18  0:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner

2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
> On Wednesday, May 11, 2011, MyungJoo Ham wrote:
>> Three CPUFREQ-like governors are provided as examples.
>>
>> powersave: use the lowest frequency possible. The user (device) should
>> set the polling_ms as 0 because polling is useless for this governor.
>>
>> performance: use the highest freqeuncy possible. The user (device)
>> should set the polling_ms as 0 because polling is useless for this
>> governor.
>>
>> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
>>
>> When a user updates OPP entries (enable/disable/add), OPP framework
>> automatically notifies DEVFREQ to update operating frequency
>> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
>> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
>> , performance, or any other "static" governors.
>
> Well, do you expect anyone to actually use them?  If not, it would make
> more sense to put them into a doc.

According to our experiences of DVFS(although this "DEVFREQ" is not
applied to them, yet) in memory-bus and GPU,
I expect most DEVFREQ users might use "simple_ondemand" and
expect "powersave" and "performance" will probably mostly used while
testing and debugging.
("userspace"-like governor would be also useful for that purpose, but
I'd add it later)


Cheers!
- MyungJoo

>
> Thanks,
> Rafael
>
>
>> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>> ---
>>  drivers/base/power/devfreq.c |   69 ++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/devfreq.h      |    5 +++
>>  2 files changed, 74 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
>> index 8e2e45b..251d761 100644
>> --- a/drivers/base/power/devfreq.c
>> +++ b/drivers/base/power/devfreq.c
>> @@ -351,3 +351,72 @@ static int __init devfreq_init(void)
>>       return 0;
>>  }
>>  late_initcall(devfreq_init);
>> +
>> +static int devfreq_powersave_func(struct devfreq *df,
>> +                               unsigned long *freq)
>> +{
>> +     *freq = 0; /* devfreq_do will run "ceiling" to 0 */
>> +     return 0;
>> +}
>> +
>> +struct devfreq_governor devfreq_powersave = {
>> +     .get_target_freq = devfreq_powersave_func,
>> +};
>> +
>> +static int devfreq_performance_func(struct devfreq *df,
>> +                                 unsigned long *freq)
>> +{
>> +     *freq = UINT_MAX; /* devfreq_do will run "floor" */
>> +     return 0;
>> +}
>> +
>> +struct devfreq_governor devfreq_performance = {
>> +     .get_target_freq = devfreq_performance_func,
>> +};
>> +
>> +/* Constants for DevFreq-Simple-Ondemand (DFSO) */
>> +#define DFSO_UPTHRESHOLD     (90)
>> +#define DFSO_DOWNDIFFERENCTIAL       (5)
>> +static int devfreq_simple_ondemand_func(struct devfreq *df,
>> +                                     unsigned long *freq)
>> +{
>> +     struct devfreq_dev_status stat;
>> +     int err = df->profile->get_dev_status(df->dev, &stat);
>> +     unsigned long long a, b;
>> +
>> +     if (err)
>> +             return err;
>> +
>> +     /* Set MAX if it's busy enough */
>> +     if (stat.busy_time * 100 >
>> +         stat.total_time * DFSO_UPTHRESHOLD) {
>> +             *freq = UINT_MAX;
>> +             return 0;
>> +     }
>> +
>> +     /* Set MAX if we do not know the initial frequency */
>> +     if (stat.current_frequency == 0) {
>> +             *freq = UINT_MAX;
>> +             return 0;
>> +     }
>> +
>> +     /* Keep the current frequency */
>> +     if (stat.busy_time * 100 >
>> +         stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
>> +             *freq = stat.current_frequency;
>> +             return 0;
>> +     }
>> +
>> +     /* Set the desired frequency based on the load */
>> +     a = (unsigned long long) stat.busy_time * stat.current_frequency;
>> +     b = div_u64(a, stat.total_time);
>> +     b *= 100;
>> +     b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
>> +     *freq = (unsigned long) b;
>> +
>> +     return 0;
>> +}
>> +
>> +struct devfreq_governor devfreq_simple_ondemand = {
>> +     .get_target_freq = devfreq_simple_ondemand_func,
>> +};
>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>> index d08e9f5..ec41ba6 100644
>> --- a/include/linux/devfreq.h
>> +++ b/include/linux/devfreq.h
>> @@ -81,6 +81,11 @@ extern int devfreq_add_device(struct device *dev,
>>  extern int devfreq_remove_device(struct device *dev);
>>  extern int devfreq_update(struct device *dev, bool may_not_exist);
>>  extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
>> +
>> +extern struct devfreq_governor devfreq_powersave;
>> +extern struct devfreq_governor devfreq_performance;
>> +extern struct devfreq_governor devfreq_simple_ondemand;
>> +
>>  #else /* !CONFIG_PM_DEVFREQ */
>>  static int devfreq_add_device(struct device *dev,
>>                          struct devfreq_dev_profile *profile,
>>
>
>



-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-17 22:36 ` [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs Rafael J. Wysocki
@ 2011-05-18  8:22   ` MyungJoo Ham
  2011-05-18 20:02     ` Rafael J. Wysocki
  2011-05-18 20:02     ` Rafael J. Wysocki
  2011-05-18  8:22   ` MyungJoo Ham
  1 sibling, 2 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-18  8:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Greg Kroah-Hartman, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park

2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
> Hi,
>
> On Wednesday, May 11, 2011, MyungJoo Ham wrote:
[]
>> +/*
>> + * devfreq_work periodically (given by DEVFREQ_INTERVAL) monitors every
>> + * registered device.
>> + */
>
> I'm not sure what that means.  Does it happen only if monitoring is 'true'?
> If so, perhaps it could be called 'polling'?
>

Yes, it is correct. I'll change the name to polling.

>> +static bool monitoring;
[]
>> +                     devfreq = tmp_devfreq;
>
> Well, it looks like you could simply do "return tmp_devfreq" here
> (then you'd only need one local pointer).
>
>> +     return devfreq;
>
> And return ERR_PTR(-ENODEV) here.

Good. I'll do that.

>
>> +}
>> +
>> +#define dev_dbg_once(dev, fmt, ...)                          \
>> +     if (!once) {                                            \
>> +             once = 1;                                       \
>> +             dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
>> +     }
>
> Why do you need this?
>

This devfreq_do is going to be called periodically; thus, I want to
print a message if there is an error, but not too many messages with
the repeated calls.

Besides, I'd change the macro like this:

#define dev_dbg_once(dev, fmt, ...)                          \
     {                                                                \
             static int once;                                   \
             if (!once) {                                            \
                     once = 1;                                       \
                     dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
             }                                                 \
     }

so that "static int once;" in functions can be removed.


>> +     static int once;
>
> No, please.  Either make it a part of the macro, or remove the macro
> entirely.
>
>> +
>> +     err = devfreq->governor->get_target_freq(devfreq, &freq);
>> +     if (err) {
>> +             dev_dbg_once(devfreq->dev, "%s: get_target_freq error(%d)\n",
>> +                          __func__, err);
>> +             return err;
>> +     }
>> +
>> +     opp = opp_find_freq_ceil(devfreq->dev, &freq);
>> +     if (opp == ERR_PTR(-ENODEV))
>> +             opp = opp_find_freq_floor(devfreq->dev, &freq);
>> +
>> +     if (IS_ERR(opp)) {
>> +             dev_dbg_once(devfreq->dev, "%s: Cannot find opp with %luHz.\n",
>> +                          __func__, freq);
>> +             return PTR_ERR(opp);
>> +     }
>> +
>> +     freq = opp_get_freq(opp);
>> +     if (devfreq->previous_freq != freq) {
>> +             err = devfreq->profile->target(devfreq->dev, opp);
>> +             if (!err)
>> +                     devfreq->previous_freq = freq;
>
> I'd do
>
>    if (devfreq->previous_freq == freq)
>        return 0;
>
>    err = devfreq->profile->target(devfreq->dev, opp);
>    if (err) {
>       print message or something;
>       return err;
>    }
>
>    devfreq->previous_freq = freq;
>    return 0;
>

Good. That looks more pretty. I'll take that form.

>> +     }
>> +
>> +     if (err)
>> +             dev_dbg_once(devfreq->dev, "%s: Cannot set %luHz/%luuV\n",
>> +                          __func__, opp_get_freq(opp), opp_get_voltage(opp));
>> +     return err;
>> +}
>> +
>> +/**
>> + * devfreq_monitor() - Regularly run devfreq_do() and support device DEVFREQ tickle.
>> + * @work: the work struct used to run devfreq_monitor periodically.
>> + */
>> +static void devfreq_monitor(struct work_struct *work)
>> +{
>> +     struct devfreq *devfreq;
>> +     int error;
>> +     int reserved = 0;
>> +     static int once;
>> +
>> +     mutex_lock(&devfreq_list_lock);
>> +
>> +     list_for_each_entry(devfreq, &devfreq_list, node) {
>> +             if (devfreq->next_polling == 0)
>> +                     continue;
>> +
>> +             reserved++;
>
> Why is the variable called 'reserved'?

The variable reserves the next devfreq_monitor call. If there is no
one reserves, we stop monitoring.

However, I've found a bug in this routine (not affecting "reserved++"
directly though) that a tickled device without polling governor may
not return to its original frequency. It is easily addressed by adding
one more condition to the if statement before reserved++;. I'll fix
that in the next revision.

>
>> +
>> +             if (devfreq->tickle) {
>> +                     devfreq->tickle--;
>> +                     continue;
>> +             }
>
> I'd put a coment above that explaining what's going on here.

Ok.

>
>> +             if (devfreq->next_polling == 1) {
>> +                     error = devfreq_do(devfreq);
>> +                     if (error && !once) {
>> +                             once = 1;
>> +                             dev_err(devfreq->dev, "devfreq_do error(%d)\n",
>> +                                     error);
>
> Hmm.  I'm not sure, but wouldn't it make sense to drop the device from
> the list if devfreq_do() returns error code?

Umm... yeah.. that option (calling devfreq_remove_device() for errors)
is also possible, which will also remove the need for the macro you've
mentioned.

However, when the error is temporary or the device has blocked
changing frequencies temporarily from target callback or governor, it
could be not so desirable.

So, I'm considering to call devfreq_remove_device() at error if the
error is not "EAGAIN". That will also remove the need for the macro
and debug messages above. How about that?

[]
>> +int devfreq_update(struct device *dev, bool may_not_exist)
>> +{
>> +     struct devfreq *devfreq;
>> +     int err = 0;
>> +
>> +     mutex_lock(&devfreq_list_lock);
>> +
>> +     devfreq = find_device_devfreq(dev);
>> +     if (IS_ERR(devfreq)) {
>> +             if (may_not_exist && PTR_ERR(devfreq) == -EINVAL)
>> +                     goto out;
>> +
>
> This is kind of strange.  Why don't you simply pass -ENODEV to the caller
> so that it can decide?

Ewww.. Indeed. I'll remove that lines.

>
>> +             err = PTR_ERR(devfreq);
>> +             goto out;
>> +     }
>> +
>> +     if (devfreq->tickle) {
>> +             unsigned long freq = devfreq->profile->max_freq;
>> +             struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
>> +
>> +             if (!IS_ERR(opp) && devfreq->previous_freq != freq) {
>> +                     err = devfreq->profile->target(devfreq->dev, opp);
>> +                     if (!err)
>> +                             devfreq->previous_freq = opp_get_freq(opp);
>> +             }
>
> This looks like a code duplication from devfreq_do().  Would it make sense
> to put it into a separate function?

devfreq_do() does not handle tickling. In devfreq_monitor(), it also
calls devfreq_do only when it is not being tickled.

>
>> +     } else {
>> +             err = devfreq_do(devfreq);
>> +     }
>> +
[]
>> +int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
>> +{
>> +     struct devfreq *devfreq;
>> +     struct opp *opp;
>> +     unsigned long freq;
>> +     int err = 0;
>> +
>> +     mutex_lock(&devfreq_list_lock);
>> +     devfreq = find_device_devfreq(dev);
>
> I think we should return here if the device hasn't been found.
> Do you want to set 'monitoring' unconditionally and schedule the work item
> unconditionally in this function and if so, why not to do that at the
> beginning?

Uhhh.. yes, this part required some cleaning. In fact, the cleaning is
done with the "add sysfs interface" patch, which added sysfs interface
to tickle. I'll restructure the patches and move the cleaned part from
the "add sysfs interface" patch.

[]
>> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
>> index 4603f08..e5d2e36 100644
>> --- a/kernel/power/Kconfig
>> +++ b/kernel/power/Kconfig
>> @@ -225,3 +225,28 @@ config PM_OPP
>>         representing individual voltage domains and provides SOC
>>         implementations a ready to use framework to manage OPPs.
>>         For more information, read <file:Documentation/power/opp.txt>
>> +
>> +config PM_DEVFREQ
>> +     bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
>> +     depends on PM_OPP
>
> This assumes the user will know if his/her platform uses that code.
> It may be a good idea to make it depend on a user-invisible option that
> can be selected by the platform.

I think that like CPUFREQ, users will want to enable and disable
DEVFREQ feature by choice although they cannot choose the governor
directly. I'm also considering to allow users to set governors
forcibly and globally at menuconfig (like CPUFREQ does). With CPUFREQ,
such options helped a lot in troubleshooting of CPU related issues.

Do you think it'd be better to have DEVFREQ enabled unconditionally
(if PM_OPP is available) nonetheless?

>>
>
> Thanks,
> Rafael
>


Cheers!

- MyungJoo
-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-17 22:36 ` [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs Rafael J. Wysocki
  2011-05-18  8:22   ` MyungJoo Ham
@ 2011-05-18  8:22   ` MyungJoo Ham
  1 sibling, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-18  8:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner

2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
> Hi,
>
> On Wednesday, May 11, 2011, MyungJoo Ham wrote:
[]
>> +/*
>> + * devfreq_work periodically (given by DEVFREQ_INTERVAL) monitors every
>> + * registered device.
>> + */
>
> I'm not sure what that means.  Does it happen only if monitoring is 'true'?
> If so, perhaps it could be called 'polling'?
>

Yes, it is correct. I'll change the name to polling.

>> +static bool monitoring;
[]
>> +                     devfreq = tmp_devfreq;
>
> Well, it looks like you could simply do "return tmp_devfreq" here
> (then you'd only need one local pointer).
>
>> +     return devfreq;
>
> And return ERR_PTR(-ENODEV) here.

Good. I'll do that.

>
>> +}
>> +
>> +#define dev_dbg_once(dev, fmt, ...)                          \
>> +     if (!once) {                                            \
>> +             once = 1;                                       \
>> +             dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
>> +     }
>
> Why do you need this?
>

This devfreq_do is going to be called periodically; thus, I want to
print a message if there is an error, but not too many messages with
the repeated calls.

Besides, I'd change the macro like this:

#define dev_dbg_once(dev, fmt, ...)                          \
     {                                                                \
             static int once;                                   \
             if (!once) {                                            \
                     once = 1;                                       \
                     dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
             }                                                 \
     }

so that "static int once;" in functions can be removed.


>> +     static int once;
>
> No, please.  Either make it a part of the macro, or remove the macro
> entirely.
>
>> +
>> +     err = devfreq->governor->get_target_freq(devfreq, &freq);
>> +     if (err) {
>> +             dev_dbg_once(devfreq->dev, "%s: get_target_freq error(%d)\n",
>> +                          __func__, err);
>> +             return err;
>> +     }
>> +
>> +     opp = opp_find_freq_ceil(devfreq->dev, &freq);
>> +     if (opp == ERR_PTR(-ENODEV))
>> +             opp = opp_find_freq_floor(devfreq->dev, &freq);
>> +
>> +     if (IS_ERR(opp)) {
>> +             dev_dbg_once(devfreq->dev, "%s: Cannot find opp with %luHz.\n",
>> +                          __func__, freq);
>> +             return PTR_ERR(opp);
>> +     }
>> +
>> +     freq = opp_get_freq(opp);
>> +     if (devfreq->previous_freq != freq) {
>> +             err = devfreq->profile->target(devfreq->dev, opp);
>> +             if (!err)
>> +                     devfreq->previous_freq = freq;
>
> I'd do
>
>    if (devfreq->previous_freq == freq)
>        return 0;
>
>    err = devfreq->profile->target(devfreq->dev, opp);
>    if (err) {
>       print message or something;
>       return err;
>    }
>
>    devfreq->previous_freq = freq;
>    return 0;
>

Good. That looks more pretty. I'll take that form.

>> +     }
>> +
>> +     if (err)
>> +             dev_dbg_once(devfreq->dev, "%s: Cannot set %luHz/%luuV\n",
>> +                          __func__, opp_get_freq(opp), opp_get_voltage(opp));
>> +     return err;
>> +}
>> +
>> +/**
>> + * devfreq_monitor() - Regularly run devfreq_do() and support device DEVFREQ tickle.
>> + * @work: the work struct used to run devfreq_monitor periodically.
>> + */
>> +static void devfreq_monitor(struct work_struct *work)
>> +{
>> +     struct devfreq *devfreq;
>> +     int error;
>> +     int reserved = 0;
>> +     static int once;
>> +
>> +     mutex_lock(&devfreq_list_lock);
>> +
>> +     list_for_each_entry(devfreq, &devfreq_list, node) {
>> +             if (devfreq->next_polling == 0)
>> +                     continue;
>> +
>> +             reserved++;
>
> Why is the variable called 'reserved'?

The variable reserves the next devfreq_monitor call. If there is no
one reserves, we stop monitoring.

However, I've found a bug in this routine (not affecting "reserved++"
directly though) that a tickled device without polling governor may
not return to its original frequency. It is easily addressed by adding
one more condition to the if statement before reserved++;. I'll fix
that in the next revision.

>
>> +
>> +             if (devfreq->tickle) {
>> +                     devfreq->tickle--;
>> +                     continue;
>> +             }
>
> I'd put a coment above that explaining what's going on here.

Ok.

>
>> +             if (devfreq->next_polling == 1) {
>> +                     error = devfreq_do(devfreq);
>> +                     if (error && !once) {
>> +                             once = 1;
>> +                             dev_err(devfreq->dev, "devfreq_do error(%d)\n",
>> +                                     error);
>
> Hmm.  I'm not sure, but wouldn't it make sense to drop the device from
> the list if devfreq_do() returns error code?

Umm... yeah.. that option (calling devfreq_remove_device() for errors)
is also possible, which will also remove the need for the macro you've
mentioned.

However, when the error is temporary or the device has blocked
changing frequencies temporarily from target callback or governor, it
could be not so desirable.

So, I'm considering to call devfreq_remove_device() at error if the
error is not "EAGAIN". That will also remove the need for the macro
and debug messages above. How about that?

[]
>> +int devfreq_update(struct device *dev, bool may_not_exist)
>> +{
>> +     struct devfreq *devfreq;
>> +     int err = 0;
>> +
>> +     mutex_lock(&devfreq_list_lock);
>> +
>> +     devfreq = find_device_devfreq(dev);
>> +     if (IS_ERR(devfreq)) {
>> +             if (may_not_exist && PTR_ERR(devfreq) == -EINVAL)
>> +                     goto out;
>> +
>
> This is kind of strange.  Why don't you simply pass -ENODEV to the caller
> so that it can decide?

Ewww.. Indeed. I'll remove that lines.

>
>> +             err = PTR_ERR(devfreq);
>> +             goto out;
>> +     }
>> +
>> +     if (devfreq->tickle) {
>> +             unsigned long freq = devfreq->profile->max_freq;
>> +             struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
>> +
>> +             if (!IS_ERR(opp) && devfreq->previous_freq != freq) {
>> +                     err = devfreq->profile->target(devfreq->dev, opp);
>> +                     if (!err)
>> +                             devfreq->previous_freq = opp_get_freq(opp);
>> +             }
>
> This looks like a code duplication from devfreq_do().  Would it make sense
> to put it into a separate function?

devfreq_do() does not handle tickling. In devfreq_monitor(), it also
calls devfreq_do only when it is not being tickled.

>
>> +     } else {
>> +             err = devfreq_do(devfreq);
>> +     }
>> +
[]
>> +int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
>> +{
>> +     struct devfreq *devfreq;
>> +     struct opp *opp;
>> +     unsigned long freq;
>> +     int err = 0;
>> +
>> +     mutex_lock(&devfreq_list_lock);
>> +     devfreq = find_device_devfreq(dev);
>
> I think we should return here if the device hasn't been found.
> Do you want to set 'monitoring' unconditionally and schedule the work item
> unconditionally in this function and if so, why not to do that at the
> beginning?

Uhhh.. yes, this part required some cleaning. In fact, the cleaning is
done with the "add sysfs interface" patch, which added sysfs interface
to tickle. I'll restructure the patches and move the cleaned part from
the "add sysfs interface" patch.

[]
>> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
>> index 4603f08..e5d2e36 100644
>> --- a/kernel/power/Kconfig
>> +++ b/kernel/power/Kconfig
>> @@ -225,3 +225,28 @@ config PM_OPP
>>         representing individual voltage domains and provides SOC
>>         implementations a ready to use framework to manage OPPs.
>>         For more information, read <file:Documentation/power/opp.txt>
>> +
>> +config PM_DEVFREQ
>> +     bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
>> +     depends on PM_OPP
>
> This assumes the user will know if his/her platform uses that code.
> It may be a good idea to make it depend on a user-invisible option that
> can be selected by the platform.

I think that like CPUFREQ, users will want to enable and disable
DEVFREQ feature by choice although they cannot choose the governor
directly. I'm also considering to allow users to set governors
forcibly and globally at menuconfig (like CPUFREQ does). With CPUFREQ,
such options helped a lot in troubleshooting of CPU related issues.

Do you think it'd be better to have DEVFREQ enabled unconditionally
(if PM_OPP is available) nonetheless?

>>
>
> Thanks,
> Rafael
>


Cheers!

- MyungJoo
-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/3] PM / DEVFREQ: add example governors
  2011-05-18  0:48     ` MyungJoo Ham
  2011-05-18 19:46       ` Rafael J. Wysocki
@ 2011-05-18 19:46       ` Rafael J. Wysocki
  2011-05-27  4:42         ` MyungJoo Ham
  2011-05-27  4:42         ` MyungJoo Ham
  1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-18 19:46 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: linux-pm, linux-kernel, Greg Kroah-Hartman, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park

On Wednesday, May 18, 2011, MyungJoo Ham wrote:
> 2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
> > On Wednesday, May 11, 2011, MyungJoo Ham wrote:
> >> Three CPUFREQ-like governors are provided as examples.
> >>
> >> powersave: use the lowest frequency possible. The user (device) should
> >> set the polling_ms as 0 because polling is useless for this governor.
> >>
> >> performance: use the highest freqeuncy possible. The user (device)
> >> should set the polling_ms as 0 because polling is useless for this
> >> governor.
> >>
> >> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
> >>
> >> When a user updates OPP entries (enable/disable/add), OPP framework
> >> automatically notifies DEVFREQ to update operating frequency
> >> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
> >> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
> >> , performance, or any other "static" governors.
> >
> > Well, do you expect anyone to actually use them?  If not, it would make
> > more sense to put them into a doc.
> 
> According to our experiences of DVFS(although this "DEVFREQ" is not
> applied to them, yet) in memory-bus and GPU,
> I expect most DEVFREQ users might use "simple_ondemand" and
> expect "powersave" and "performance" will probably mostly used while
> testing and debugging.
> ("userspace"-like governor would be also useful for that purpose, but
> I'd add it later)

It would be good to have at least one in-tree user for each of them
(not necessarily from the start, but at one point in the future at least).

So if you have any _specific_ users in mind, please let me know.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/3] PM / DEVFREQ: add example governors
  2011-05-18  0:48     ` MyungJoo Ham
@ 2011-05-18 19:46       ` Rafael J. Wysocki
  2011-05-18 19:46       ` Rafael J. Wysocki
  1 sibling, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-18 19:46 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner

On Wednesday, May 18, 2011, MyungJoo Ham wrote:
> 2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
> > On Wednesday, May 11, 2011, MyungJoo Ham wrote:
> >> Three CPUFREQ-like governors are provided as examples.
> >>
> >> powersave: use the lowest frequency possible. The user (device) should
> >> set the polling_ms as 0 because polling is useless for this governor.
> >>
> >> performance: use the highest freqeuncy possible. The user (device)
> >> should set the polling_ms as 0 because polling is useless for this
> >> governor.
> >>
> >> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
> >>
> >> When a user updates OPP entries (enable/disable/add), OPP framework
> >> automatically notifies DEVFREQ to update operating frequency
> >> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
> >> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
> >> , performance, or any other "static" governors.
> >
> > Well, do you expect anyone to actually use them?  If not, it would make
> > more sense to put them into a doc.
> 
> According to our experiences of DVFS(although this "DEVFREQ" is not
> applied to them, yet) in memory-bus and GPU,
> I expect most DEVFREQ users might use "simple_ondemand" and
> expect "powersave" and "performance" will probably mostly used while
> testing and debugging.
> ("userspace"-like governor would be also useful for that purpose, but
> I'd add it later)

It would be good to have at least one in-tree user for each of them
(not necessarily from the start, but at one point in the future at least).

So if you have any _specific_ users in mind, please let me know.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-18  8:22   ` MyungJoo Ham
  2011-05-18 20:02     ` Rafael J. Wysocki
@ 2011-05-18 20:02     ` Rafael J. Wysocki
  2011-05-18 20:21       ` Pavel Machek
                         ` (3 more replies)
  1 sibling, 4 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-18 20:02 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: linux-pm, linux-kernel, Greg Kroah-Hartman, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park

On Wednesday, May 18, 2011, MyungJoo Ham wrote:
> 2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
> > Hi,

Hi,

... 
> >
> >> +}
> >> +
> >> +#define dev_dbg_once(dev, fmt, ...)                          \
> >> +     if (!once) {                                            \
> >> +             once = 1;                                       \
> >> +             dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
> >> +     }
> >
> > Why do you need this?
> >
> 
> This devfreq_do is going to be called periodically; thus, I want to
> print a message if there is an error, but not too many messages with
> the repeated calls.
> 
> Besides, I'd change the macro like this:
> 
> #define dev_dbg_once(dev, fmt, ...)                          \
>      {                                                                \
>              static int once;                                   \
>              if (!once) {                                            \
>                      once = 1;                                       \
>                      dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
>              }                                                 \
>      }
> 
> so that "static int once;" in functions can be removed.

That's a good change in my opinion, but since there is the dynamic debug
feature, I don't think you need to worry too much about that (the user
can always disable output from those dev_dbg() statements if they generate
too much noise).

...
> >> +     list_for_each_entry(devfreq, &devfreq_list, node) {
> >> +             if (devfreq->next_polling == 0)
> >> +                     continue;
> >> +
> >> +             reserved++;
> >
> > Why is the variable called 'reserved'?
> 
> The variable reserves the next devfreq_monitor call. If there is no
> one reserves, we stop monitoring.

So I'd call it 'continue_polling' or something like this.

> However, I've found a bug in this routine (not affecting "reserved++"
> directly though) that a tickled device without polling governor may
> not return to its original frequency. It is easily addressed by adding
> one more condition to the if statement before reserved++;. I'll fix
> that in the next revision.

OK

> >
> >> +
> >> +             if (devfreq->tickle) {
> >> +                     devfreq->tickle--;
> >> +                     continue;
> >> +             }
> >
> > I'd put a coment above that explaining what's going on here.
> 
> Ok.
> 
> >
> >> +             if (devfreq->next_polling == 1) {
> >> +                     error = devfreq_do(devfreq);
> >> +                     if (error && !once) {
> >> +                             once = 1;
> >> +                             dev_err(devfreq->dev, "devfreq_do error(%d)\n",
> >> +                                     error);
> >
> > Hmm.  I'm not sure, but wouldn't it make sense to drop the device from
> > the list if devfreq_do() returns error code?
> 
> Umm... yeah.. that option (calling devfreq_remove_device() for errors)
> is also possible, which will also remove the need for the macro you've
> mentioned.

Yes.

> However, when the error is temporary or the device has blocked
> changing frequencies temporarily from target callback or governor, it
> could be not so desirable.

I guess we need some experience here.  Namely, it's difficult to say
what's going to be more frequent, devices that have temporary failures
or such that either work or not work at all.

That said, I think the simpler approach is to drop devices from the list
on errors (perhaps depending on the type of the error).

> So, I'm considering to call devfreq_remove_device() at error if the
> error is not "EAGAIN". That will also remove the need for the macro
> and debug messages above. How about that?

Sounds reasonable.

...
> >> @@ -225,3 +225,28 @@ config PM_OPP
> >>         representing individual voltage domains and provides SOC
> >>         implementations a ready to use framework to manage OPPs.
> >>         For more information, read <file:Documentation/power/opp.txt>
> >> +
> >> +config PM_DEVFREQ
> >> +     bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
> >> +     depends on PM_OPP
> >
> > This assumes the user will know if his/her platform uses that code.
> > It may be a good idea to make it depend on a user-invisible option that
> > can be selected by the platform.
> 
> I think that like CPUFREQ, users will want to enable and disable
> DEVFREQ feature by choice although they cannot choose the governor
> directly. I'm also considering to allow users to set governors
> forcibly and globally at menuconfig (like CPUFREQ does). With CPUFREQ,
> such options helped a lot in troubleshooting of CPU related issues.
> 
> Do you think it'd be better to have DEVFREQ enabled unconditionally
> (if PM_OPP is available) nonetheless?

First off, it doesn't make sense to enable it if the platform is not going to
use it.  That's why I think it should depend on a platform-selected option.
Only if that option is set the user should be given the choice to select
DEVFREQ.

Second, I'm not sure if that's a good idea to force DEVFREQ is the platform
is going to use it.  Perhaps in the future if there are no major issues with
it, we can do that.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-18  8:22   ` MyungJoo Ham
@ 2011-05-18 20:02     ` Rafael J. Wysocki
  2011-05-18 20:02     ` Rafael J. Wysocki
  1 sibling, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-18 20:02 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner

On Wednesday, May 18, 2011, MyungJoo Ham wrote:
> 2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
> > Hi,

Hi,

... 
> >
> >> +}
> >> +
> >> +#define dev_dbg_once(dev, fmt, ...)                          \
> >> +     if (!once) {                                            \
> >> +             once = 1;                                       \
> >> +             dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
> >> +     }
> >
> > Why do you need this?
> >
> 
> This devfreq_do is going to be called periodically; thus, I want to
> print a message if there is an error, but not too many messages with
> the repeated calls.
> 
> Besides, I'd change the macro like this:
> 
> #define dev_dbg_once(dev, fmt, ...)                          \
>      {                                                                \
>              static int once;                                   \
>              if (!once) {                                            \
>                      once = 1;                                       \
>                      dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
>              }                                                 \
>      }
> 
> so that "static int once;" in functions can be removed.

That's a good change in my opinion, but since there is the dynamic debug
feature, I don't think you need to worry too much about that (the user
can always disable output from those dev_dbg() statements if they generate
too much noise).

...
> >> +     list_for_each_entry(devfreq, &devfreq_list, node) {
> >> +             if (devfreq->next_polling == 0)
> >> +                     continue;
> >> +
> >> +             reserved++;
> >
> > Why is the variable called 'reserved'?
> 
> The variable reserves the next devfreq_monitor call. If there is no
> one reserves, we stop monitoring.

So I'd call it 'continue_polling' or something like this.

> However, I've found a bug in this routine (not affecting "reserved++"
> directly though) that a tickled device without polling governor may
> not return to its original frequency. It is easily addressed by adding
> one more condition to the if statement before reserved++;. I'll fix
> that in the next revision.

OK

> >
> >> +
> >> +             if (devfreq->tickle) {
> >> +                     devfreq->tickle--;
> >> +                     continue;
> >> +             }
> >
> > I'd put a coment above that explaining what's going on here.
> 
> Ok.
> 
> >
> >> +             if (devfreq->next_polling == 1) {
> >> +                     error = devfreq_do(devfreq);
> >> +                     if (error && !once) {
> >> +                             once = 1;
> >> +                             dev_err(devfreq->dev, "devfreq_do error(%d)\n",
> >> +                                     error);
> >
> > Hmm.  I'm not sure, but wouldn't it make sense to drop the device from
> > the list if devfreq_do() returns error code?
> 
> Umm... yeah.. that option (calling devfreq_remove_device() for errors)
> is also possible, which will also remove the need for the macro you've
> mentioned.

Yes.

> However, when the error is temporary or the device has blocked
> changing frequencies temporarily from target callback or governor, it
> could be not so desirable.

I guess we need some experience here.  Namely, it's difficult to say
what's going to be more frequent, devices that have temporary failures
or such that either work or not work at all.

That said, I think the simpler approach is to drop devices from the list
on errors (perhaps depending on the type of the error).

> So, I'm considering to call devfreq_remove_device() at error if the
> error is not "EAGAIN". That will also remove the need for the macro
> and debug messages above. How about that?

Sounds reasonable.

...
> >> @@ -225,3 +225,28 @@ config PM_OPP
> >>         representing individual voltage domains and provides SOC
> >>         implementations a ready to use framework to manage OPPs.
> >>         For more information, read <file:Documentation/power/opp.txt>
> >> +
> >> +config PM_DEVFREQ
> >> +     bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
> >> +     depends on PM_OPP
> >
> > This assumes the user will know if his/her platform uses that code.
> > It may be a good idea to make it depend on a user-invisible option that
> > can be selected by the platform.
> 
> I think that like CPUFREQ, users will want to enable and disable
> DEVFREQ feature by choice although they cannot choose the governor
> directly. I'm also considering to allow users to set governors
> forcibly and globally at menuconfig (like CPUFREQ does). With CPUFREQ,
> such options helped a lot in troubleshooting of CPU related issues.
> 
> Do you think it'd be better to have DEVFREQ enabled unconditionally
> (if PM_OPP is available) nonetheless?

First off, it doesn't make sense to enable it if the platform is not going to
use it.  That's why I think it should depend on a platform-selected option.
Only if that option is set the user should be given the choice to select
DEVFREQ.

Second, I'm not sure if that's a good idea to force DEVFREQ is the platform
is going to use it.  Perhaps in the future if there are no major issues with
it, we can do that.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-18 20:02     ` Rafael J. Wysocki
@ 2011-05-18 20:21       ` Pavel Machek
  2011-05-18 20:29         ` Rafael J. Wysocki
  2011-05-18 20:29         ` Rafael J. Wysocki
  2011-05-18 20:21       ` Pavel Machek
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 36+ messages in thread
From: Pavel Machek @ 2011-05-18 20:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: myungjoo.ham, linux-pm, linux-kernel, Greg Kroah-Hartman,
	Mark Brown, Jiejing Zhang, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park

Hi!

> > >> +#define dev_dbg_once(dev, fmt, ...)                          \
> > >> +     if (!once) {                                            \
> > >> +             once = 1;                                       \
> > >> +             dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
> > >> +     }
> > >
> > > Why do you need this?
> > >
> > 
> > This devfreq_do is going to be called periodically; thus, I want to
> > print a message if there is an error, but not too many messages with
> > the repeated calls.
> > 
> > Besides, I'd change the macro like this:
> > 
> > #define dev_dbg_once(dev, fmt, ...)                          \
> >      {                                                                \
> >              static int once;                                   \
> >              if (!once) {                                            \
> >                      once = 1;                                       \
> >                      dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
> >              }                                                 \
> >      }
> > 
> > so that "static int once;" in functions can be removed.
> 
> That's a good change in my opinion, but since there is the dynamic debug
> feature, I don't think you need to worry too much about that (the user
> can always disable output from those dev_dbg() statements if they generate
> too much noise).

Well... we do print-once in other places, too. And that way, we can
maybe enable those prints by default...
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-18 20:02     ` Rafael J. Wysocki
  2011-05-18 20:21       ` Pavel Machek
@ 2011-05-18 20:21       ` Pavel Machek
  2011-05-20  5:36       ` MyungJoo Ham
  2011-05-20  5:36       ` MyungJoo Ham
  3 siblings, 0 replies; 36+ messages in thread
From: Pavel Machek @ 2011-05-18 20:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner

Hi!

> > >> +#define dev_dbg_once(dev, fmt, ...)                          \
> > >> +     if (!once) {                                            \
> > >> +             once = 1;                                       \
> > >> +             dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
> > >> +     }
> > >
> > > Why do you need this?
> > >
> > 
> > This devfreq_do is going to be called periodically; thus, I want to
> > print a message if there is an error, but not too many messages with
> > the repeated calls.
> > 
> > Besides, I'd change the macro like this:
> > 
> > #define dev_dbg_once(dev, fmt, ...)                          \
> >      {                                                                \
> >              static int once;                                   \
> >              if (!once) {                                            \
> >                      once = 1;                                       \
> >                      dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
> >              }                                                 \
> >      }
> > 
> > so that "static int once;" in functions can be removed.
> 
> That's a good change in my opinion, but since there is the dynamic debug
> feature, I don't think you need to worry too much about that (the user
> can always disable output from those dev_dbg() statements if they generate
> too much noise).

Well... we do print-once in other places, too. And that way, we can
maybe enable those prints by default...
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-18 20:21       ` Pavel Machek
  2011-05-18 20:29         ` Rafael J. Wysocki
@ 2011-05-18 20:29         ` Rafael J. Wysocki
  1 sibling, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-18 20:29 UTC (permalink / raw)
  To: Pavel Machek
  Cc: myungjoo.ham, linux-pm, linux-kernel, Greg Kroah-Hartman,
	Mark Brown, Jiejing Zhang, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park

On Wednesday, May 18, 2011, Pavel Machek wrote:
> Hi!
> 
> > > >> +#define dev_dbg_once(dev, fmt, ...)                          \
> > > >> +     if (!once) {                                            \
> > > >> +             once = 1;                                       \
> > > >> +             dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
> > > >> +     }
> > > >
> > > > Why do you need this?
> > > >
> > > 
> > > This devfreq_do is going to be called periodically; thus, I want to
> > > print a message if there is an error, but not too many messages with
> > > the repeated calls.
> > > 
> > > Besides, I'd change the macro like this:
> > > 
> > > #define dev_dbg_once(dev, fmt, ...)                          \
> > >      {                                                                \
> > >              static int once;                                   \
> > >              if (!once) {                                            \
> > >                      once = 1;                                       \
> > >                      dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
> > >              }                                                 \
> > >      }
> > > 
> > > so that "static int once;" in functions can be removed.
> > 
> > That's a good change in my opinion, but since there is the dynamic debug
> > feature, I don't think you need to worry too much about that (the user
> > can always disable output from those dev_dbg() statements if they generate
> > too much noise).
> 
> Well... we do print-once in other places, too. And that way, we can
> maybe enable those prints by default...

I'm not sure what you mean?

Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-18 20:21       ` Pavel Machek
@ 2011-05-18 20:29         ` Rafael J. Wysocki
  2011-05-18 20:29         ` Rafael J. Wysocki
  1 sibling, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2011-05-18 20:29 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner

On Wednesday, May 18, 2011, Pavel Machek wrote:
> Hi!
> 
> > > >> +#define dev_dbg_once(dev, fmt, ...)                          \
> > > >> +     if (!once) {                                            \
> > > >> +             once = 1;                                       \
> > > >> +             dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
> > > >> +     }
> > > >
> > > > Why do you need this?
> > > >
> > > 
> > > This devfreq_do is going to be called periodically; thus, I want to
> > > print a message if there is an error, but not too many messages with
> > > the repeated calls.
> > > 
> > > Besides, I'd change the macro like this:
> > > 
> > > #define dev_dbg_once(dev, fmt, ...)                          \
> > >      {                                                                \
> > >              static int once;                                   \
> > >              if (!once) {                                            \
> > >                      once = 1;                                       \
> > >                      dev_dbg(dev, pr_fmt(fmt), ##__VA_ARGS__);       \
> > >              }                                                 \
> > >      }
> > > 
> > > so that "static int once;" in functions can be removed.
> > 
> > That's a good change in my opinion, but since there is the dynamic debug
> > feature, I don't think you need to worry too much about that (the user
> > can always disable output from those dev_dbg() statements if they generate
> > too much noise).
> 
> Well... we do print-once in other places, too. And that way, we can
> maybe enable those prints by default...

I'm not sure what you mean?

Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-18 20:02     ` Rafael J. Wysocki
  2011-05-18 20:21       ` Pavel Machek
  2011-05-18 20:21       ` Pavel Machek
@ 2011-05-20  5:36       ` MyungJoo Ham
  2011-05-20  5:36       ` MyungJoo Ham
  3 siblings, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-20  5:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Greg Kroah-Hartman, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park

Hello,

On Thu, May 19, 2011 at 5:02 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Wednesday, May 18, 2011, MyungJoo Ham wrote:
>> 2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
>> > Hi,
>
> Hi,
>
> ...
[]
>>
>> Umm... yeah.. that option (calling devfreq_remove_device() for errors)
>> is also possible, which will also remove the need for the macro you've
>> mentioned.
>
> Yes.
>
>> However, when the error is temporary or the device has blocked
>> changing frequencies temporarily from target callback or governor, it
>> could be not so desirable.
>
> I guess we need some experience here.  Namely, it's difficult to say
> what's going to be more frequent, devices that have temporary failures
> or such that either work or not work at all.
>
> That said, I think the simpler approach is to drop devices from the list
> on errors (perhaps depending on the type of the error).
>
>> So, I'm considering to call devfreq_remove_device() at error if the
>> error is not "EAGAIN". That will also remove the need for the macro
>> and debug messages above. How about that?
>
> Sounds reasonable.

Alright, I'll try this in the next revision.

>
> ...
>> >> @@ -225,3 +225,28 @@ config PM_OPP
>> >>         representing individual voltage domains and provides SOC
>> >>         implementations a ready to use framework to manage OPPs.
>> >>         For more information, read <file:Documentation/power/opp.txt>
>> >> +
>> >> +config PM_DEVFREQ
>> >> +     bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
>> >> +     depends on PM_OPP
>> >
>> > This assumes the user will know if his/her platform uses that code.
>> > It may be a good idea to make it depend on a user-invisible option that
>> > can be selected by the platform.
>>
>> I think that like CPUFREQ, users will want to enable and disable
>> DEVFREQ feature by choice although they cannot choose the governor
>> directly. I'm also considering to allow users to set governors
>> forcibly and globally at menuconfig (like CPUFREQ does). With CPUFREQ,
>> such options helped a lot in troubleshooting of CPU related issues.
>>
>> Do you think it'd be better to have DEVFREQ enabled unconditionally
>> (if PM_OPP is available) nonetheless?
>
> First off, it doesn't make sense to enable it if the platform is not going to
> use it.  That's why I think it should depend on a platform-selected option.
> Only if that option is set the user should be given the choice to select
> DEVFREQ.
>
> Second, I'm not sure if that's a good idea to force DEVFREQ is the platform
> is going to use it.  Perhaps in the future if there are no major issues with
> it, we can do that.
>
> Thanks,
> Rafael
>

I see.

I'll open an option to enable/diable DEVFREQ and will make it depends
on OPP and add platform-selected option like OPP does.


Thank you.


Cheers! It's Friday :)
- MyungJoo

-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-05-18 20:02     ` Rafael J. Wysocki
                         ` (2 preceding siblings ...)
  2011-05-20  5:36       ` MyungJoo Ham
@ 2011-05-20  5:36       ` MyungJoo Ham
  3 siblings, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-20  5:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner

Hello,

On Thu, May 19, 2011 at 5:02 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Wednesday, May 18, 2011, MyungJoo Ham wrote:
>> 2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
>> > Hi,
>
> Hi,
>
> ...
[]
>>
>> Umm... yeah.. that option (calling devfreq_remove_device() for errors)
>> is also possible, which will also remove the need for the macro you've
>> mentioned.
>
> Yes.
>
>> However, when the error is temporary or the device has blocked
>> changing frequencies temporarily from target callback or governor, it
>> could be not so desirable.
>
> I guess we need some experience here.  Namely, it's difficult to say
> what's going to be more frequent, devices that have temporary failures
> or such that either work or not work at all.
>
> That said, I think the simpler approach is to drop devices from the list
> on errors (perhaps depending on the type of the error).
>
>> So, I'm considering to call devfreq_remove_device() at error if the
>> error is not "EAGAIN". That will also remove the need for the macro
>> and debug messages above. How about that?
>
> Sounds reasonable.

Alright, I'll try this in the next revision.

>
> ...
>> >> @@ -225,3 +225,28 @@ config PM_OPP
>> >>         representing individual voltage domains and provides SOC
>> >>         implementations a ready to use framework to manage OPPs.
>> >>         For more information, read <file:Documentation/power/opp.txt>
>> >> +
>> >> +config PM_DEVFREQ
>> >> +     bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
>> >> +     depends on PM_OPP
>> >
>> > This assumes the user will know if his/her platform uses that code.
>> > It may be a good idea to make it depend on a user-invisible option that
>> > can be selected by the platform.
>>
>> I think that like CPUFREQ, users will want to enable and disable
>> DEVFREQ feature by choice although they cannot choose the governor
>> directly. I'm also considering to allow users to set governors
>> forcibly and globally at menuconfig (like CPUFREQ does). With CPUFREQ,
>> such options helped a lot in troubleshooting of CPU related issues.
>>
>> Do you think it'd be better to have DEVFREQ enabled unconditionally
>> (if PM_OPP is available) nonetheless?
>
> First off, it doesn't make sense to enable it if the platform is not going to
> use it.  That's why I think it should depend on a platform-selected option.
> Only if that option is set the user should be given the choice to select
> DEVFREQ.
>
> Second, I'm not sure if that's a good idea to force DEVFREQ is the platform
> is going to use it.  Perhaps in the future if there are no major issues with
> it, we can do that.
>
> Thanks,
> Rafael
>

I see.

I'll open an option to enable/diable DEVFREQ and will make it depends
on OPP and add platform-selected option like OPP does.


Thank you.


Cheers! It's Friday :)
- MyungJoo

-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/3] PM / DEVFREQ: add example governors
  2011-05-18 19:46       ` Rafael J. Wysocki
@ 2011-05-27  4:42         ` MyungJoo Ham
  2011-05-27  4:42         ` MyungJoo Ham
  1 sibling, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-27  4:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linux-kernel, Greg Kroah-Hartman, Mark Brown,
	Jiejing Zhang, Pavel Machek, Colin Cross, Nishanth Menon,
	Thomas Gleixner, Len Brown, Kyungmin Park

On Thu, May 19, 2011 at 4:46 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Wednesday, May 18, 2011, MyungJoo Ham wrote:
>> 2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
>> > On Wednesday, May 11, 2011, MyungJoo Ham wrote:
>> >> Three CPUFREQ-like governors are provided as examples.
>> >>
>> >> powersave: use the lowest frequency possible. The user (device) should
>> >> set the polling_ms as 0 because polling is useless for this governor.
>> >>
>> >> performance: use the highest freqeuncy possible. The user (device)
>> >> should set the polling_ms as 0 because polling is useless for this
>> >> governor.
>> >>
>> >> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
>> >>
>> >> When a user updates OPP entries (enable/disable/add), OPP framework
>> >> automatically notifies DEVFREQ to update operating frequency
>> >> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
>> >> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
>> >> , performance, or any other "static" governors.
>> >
>> > Well, do you expect anyone to actually use them?  If not, it would make
>> > more sense to put them into a doc.
>>
>> According to our experiences of DVFS(although this "DEVFREQ" is not
>> applied to them, yet) in memory-bus and GPU,
>> I expect most DEVFREQ users might use "simple_ondemand" and
>> expect "powersave" and "performance" will probably mostly used while
>> testing and debugging.
>> ("userspace"-like governor would be also useful for that purpose, but
>> I'd add it later)
>
> It would be good to have at least one in-tree user for each of them
> (not necessarily from the start, but at one point in the future at least).
>
> So if you have any _specific_ users in mind, please let me know.

For Exynos4, the DRAM bus, which is included in cpufreq and altered
along with CPU, but would be better to alter independently, and the
G3D core (MALI) are the candidate. In our kernel hacks, the DRAM bus
frequency varies independently from CPU with DRAM usage monitoring and
I think it can be easily modified to use DEVFREQ; thus, I'm going to
"translate" this to DEVFREQ next time. For MALI, I don't know whether
there is non-copyrighted driver or now, but they do support DVFS and
are using DVFS with some PITA driver support.

These two (EXYNOS4 DRAM bus and MALI) are the specific users in mind for now.
Probably, out-of-SoC devices such as WiFi, BT, and others may be the
candidate as well.

>
> Thanks,
> Rafael
>



-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/3] PM / DEVFREQ: add example governors
  2011-05-18 19:46       ` Rafael J. Wysocki
  2011-05-27  4:42         ` MyungJoo Ham
@ 2011-05-27  4:42         ` MyungJoo Ham
  1 sibling, 0 replies; 36+ messages in thread
From: MyungJoo Ham @ 2011-05-27  4:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, Greg Kroah-Hartman, linux-kernel, Kyungmin Park,
	linux-pm, Thomas Gleixner

On Thu, May 19, 2011 at 4:46 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Wednesday, May 18, 2011, MyungJoo Ham wrote:
>> 2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>:
>> > On Wednesday, May 11, 2011, MyungJoo Ham wrote:
>> >> Three CPUFREQ-like governors are provided as examples.
>> >>
>> >> powersave: use the lowest frequency possible. The user (device) should
>> >> set the polling_ms as 0 because polling is useless for this governor.
>> >>
>> >> performance: use the highest freqeuncy possible. The user (device)
>> >> should set the polling_ms as 0 because polling is useless for this
>> >> governor.
>> >>
>> >> simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.
>> >>
>> >> When a user updates OPP entries (enable/disable/add), OPP framework
>> >> automatically notifies DEVFREQ to update operating frequency
>> >> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
>> >> DEVFREQ manually with OPP entry updates or set polling_ms for powersave
>> >> , performance, or any other "static" governors.
>> >
>> > Well, do you expect anyone to actually use them?  If not, it would make
>> > more sense to put them into a doc.
>>
>> According to our experiences of DVFS(although this "DEVFREQ" is not
>> applied to them, yet) in memory-bus and GPU,
>> I expect most DEVFREQ users might use "simple_ondemand" and
>> expect "powersave" and "performance" will probably mostly used while
>> testing and debugging.
>> ("userspace"-like governor would be also useful for that purpose, but
>> I'd add it later)
>
> It would be good to have at least one in-tree user for each of them
> (not necessarily from the start, but at one point in the future at least).
>
> So if you have any _specific_ users in mind, please let me know.

For Exynos4, the DRAM bus, which is included in cpufreq and altered
along with CPU, but would be better to alter independently, and the
G3D core (MALI) are the candidate. In our kernel hacks, the DRAM bus
frequency varies independently from CPU with DRAM usage monitoring and
I think it can be easily modified to use DEVFREQ; thus, I'm going to
"translate" this to DEVFREQ next time. For MALI, I don't know whether
there is non-copyrighted driver or now, but they do support DVFS and
are using DVFS with some PITA driver support.

These two (EXYNOS4 DRAM bus and MALI) are the specific users in mind for now.
Probably, out-of-SoC devices such as WiFi, BT, and others may be the
candidate as well.

>
> Thanks,
> Rafael
>



-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2011-05-27  4:42 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-11  7:58 [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs MyungJoo Ham
2011-05-11  7:58 ` MyungJoo Ham
2011-05-11  7:58 ` [PATCH v2 2/3] PM / DEVFREQ: add example governors MyungJoo Ham
2011-05-11  7:58   ` MyungJoo Ham
2011-05-17 22:39   ` Rafael J. Wysocki
2011-05-17 22:39   ` Rafael J. Wysocki
2011-05-18  0:48     ` MyungJoo Ham
2011-05-18 19:46       ` Rafael J. Wysocki
2011-05-18 19:46       ` Rafael J. Wysocki
2011-05-27  4:42         ` MyungJoo Ham
2011-05-27  4:42         ` MyungJoo Ham
2011-05-18  0:48     ` MyungJoo Ham
2011-05-11  7:58 ` [PATCH v2 3/3] PM / DEVFREQ: add sysfs interface (including user tickling) MyungJoo Ham
2011-05-11  7:58   ` MyungJoo Ham
2011-05-11 22:55   ` Greg KH
2011-05-11 22:55   ` Greg KH
2011-05-17  5:04     ` MyungJoo Ham
2011-05-17  5:04     ` MyungJoo Ham
2011-05-17 18:32       ` Greg KH
2011-05-17 18:32       ` Greg KH
2011-05-17 22:41       ` Rafael J. Wysocki
2011-05-17 22:41       ` Rafael J. Wysocki
2011-05-18  0:43         ` MyungJoo Ham
2011-05-18  0:43         ` MyungJoo Ham
2011-05-17 22:36 ` [PATCH v2 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs Rafael J. Wysocki
2011-05-18  8:22   ` MyungJoo Ham
2011-05-18 20:02     ` Rafael J. Wysocki
2011-05-18 20:02     ` Rafael J. Wysocki
2011-05-18 20:21       ` Pavel Machek
2011-05-18 20:29         ` Rafael J. Wysocki
2011-05-18 20:29         ` Rafael J. Wysocki
2011-05-18 20:21       ` Pavel Machek
2011-05-20  5:36       ` MyungJoo Ham
2011-05-20  5:36       ` MyungJoo Ham
2011-05-18  8:22   ` MyungJoo Ham
2011-05-17 22:36 ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.