* Re: [PATCH v4 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-07-15  8:11 ` [PATCH v4 3/3] PM / DEVFREQ: add sysfs interface (including user tickling) MyungJoo Ham
@ 2011-06-09 17:11   ` Pavel Machek
  2011-07-19  2:14     ` MyungJoo Ham
  0 siblings, 1 reply; 30+ messages in thread
From: Pavel Machek @ 2011-06-09 17:11 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm, Thomas Gleixner

Hi!

> +What:		/sys/devices/.../power/devfreq_polling_interval
> +Date:		July 2011
> +Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> +		The /sys/devices/.../power/devfreq_polling_interval shows the
> +		requested polling interval of the corresponding device.

AFAICT, polling only makes sense for one governor; and I guess more
governor parameters will be needed in future. Should polling interval
be governor-specific?
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
@ 2011-07-15  8:11 MyungJoo Ham
  2011-07-15  8:11 ` [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs MyungJoo Ham
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: MyungJoo Ham @ 2011-07-15  8:11 UTC (permalink / raw)
  To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner

For a usage example, please look at
http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq

In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
and other related clocks simply follow the determined DDR RAM clock.

The DEVFREQ driver for Exynos4210 memory bus is at
/arch/arm/mach-exynos4/devfreq_bus.c in the git tree.

MyungJoo Ham (3):
  PM: Introduce DEVFREQ: generic DVFS framework with device-specific
    OPPs
  PM / DEVFREQ: add example governors
  PM / DEVFREQ: add sysfs interface (including user tickling)

 Documentation/ABI/testing/sysfs-devices-power |   50 ++
 Documentation/ABI/testing/sysfs-power         |   43 ++
 drivers/base/power/Makefile                   |    1 +
 drivers/base/power/devfreq.c                  |  714 +++++++++++++++++++++++++
 drivers/base/power/opp.c                      |    9 +
 include/linux/devfreq.h                       |  119 ++++
 kernel/power/Kconfig                          |   34 ++
 7 files changed, 970 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/power/devfreq.c
 create mode 100644 include/linux/devfreq.h

-- 
1.7.4.1


* [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-07-15  8:11 [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices MyungJoo Ham
@ 2011-07-15  8:11 ` MyungJoo Ham
  2011-08-02 18:45   ` Kevin Hilman
  2011-08-02 21:56   ` Kevin Hilman
  2011-07-15  8:11 ` [PATCH v4 2/3] PM / DEVFREQ: add example governors MyungJoo Ham
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 30+ messages in thread
From: MyungJoo Ham @ 2011-07-15  8:11 UTC (permalink / raw)
  To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner

With OPPs, a device may have multiple operable frequency and voltage
sets, and the system needs to choose one of them. In order to reduce
the power consumption (by reducing frequency and voltage) without
affecting the performance too much, a Dynamic Voltage and Frequency
Scaling (DVFS) scheme may be used.

This patch introduces the DVFS capability to non-CPU devices with OPPs.
DVFS is a technique whereby the frequency and supplied voltage of a
device are adjusted on-the-fly. DVFS usually sets the frequency as low
as possible with given conditions (such as QoS assurance) and adjusts
voltage according to the chosen frequency in order to reduce power
consumption and heat dissipation.

The generic DVFS for devices, DEVFREQ, may appear quite similar to
/drivers/cpufreq.  However, CPUFREQ does not allow multiple devices to
be registered and is not suitable for multiple heterogeneous devices
with different (but simple) governors.

Normally, a DVFS mechanism controls frequency based on the demand for
the device, and then chooses voltage based on the chosen frequency.
DEVFREQ also controls the frequency based on the governor's frequency
recommendation and lets OPP pick the pair of frequency and voltage
based on the recommended frequency. Then, the chosen OPP is passed to
the device driver's "target" callback.
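
The selection step described above can be sketched in user space. This
is only an illustration, not the kernel OPP API: struct sketch_opp,
sketch_bus_table and sketch_pick_opp() are hypothetical names, the
table is assumed to be sorted by ascending frequency, and the voltages
are made-up placeholders. It mirrors the "ceil first, floor as
fallback" order that devfreq_do() uses in the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for one OPP entry; not the kernel's struct opp. */
struct sketch_opp {
	unsigned long freq;	/* Hz */
	unsigned long u_volt;	/* microvolts */
};

/*
 * Frequencies follow the 133/266/400MHz example from the cover letter.
 * The voltages are made-up placeholders.
 */
const struct sketch_opp sketch_bus_table[] = {
	{ 133000000UL, 950000UL },
	{ 266000000UL, 1000000UL },
	{ 400000000UL, 1100000UL },
};

/*
 * Pick an entry from a table sorted by ascending frequency: the lowest
 * entry with freq >= wanted ("ceil") or, if the request is above every
 * entry, the highest entry ("floor" fallback).
 * Returns NULL for an empty table.
 */
const struct sketch_opp *sketch_pick_opp(const struct sketch_opp *table,
					 size_t n, unsigned long wanted)
{
	size_t i;

	if (table == NULL || n == 0)
		return NULL;

	for (i = 0; i < n; i++) {
		/* The lookup relies on ascending order. */
		assert(i == 0 || table[i - 1].freq <= table[i].freq);
		if (table[i].freq >= wanted)
			return &table[i];
	}
	return &table[n - 1];
}
```

With this table, a recommendation of 200MHz resolves to the 266MHz
entry, while a request above 400MHz falls back to the 400MHz entry.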

Tested with memory bus of Exynos4-NURI board.

The test code with board support for Exynos4-NURI is at
http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

--
Thank you for your valuable comments, Rafael, Greg, Pavel, and Colin.

Changed from v3
- In kerneldoc comments, DEVFREQ has been replaced by devfreq
- Revised the mechanism for removing devfreq entries on error
- Added and revised comments
- Removed unnecessary code
- Allow a governor to be given a name
- Bugfix: a tickle call may cancel an older tickle call that is still in
  effect.

Changed from v2
- Code style revised and cleaned up.
- Remove DEVFREQ entries that incur errors except for EAGAIN
- Bug fixed: tickle for devices without polling governors

Changes from v1(RFC)
- Rename: DVFS --> DEVFREQ
- Revised governor design
    . Governor receives the whole struct devfreq
    . Governor should gather usage information (thru get_dev_status)
      itself
- Periodic monitoring runs only when needed.
- DEVFREQ no longer deals with voltage information directly
- Removed some printks.
- Some cosmetic updates
- Use freezable_wq.
---
 drivers/base/power/Makefile  |    1 +
 drivers/base/power/devfreq.c |  397 ++++++++++++++++++++++++++++++++++++++++++
 drivers/base/power/opp.c     |    9 +
 include/linux/devfreq.h      |  111 ++++++++++++
 kernel/power/Kconfig         |   34 ++++
 5 files changed, 552 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/power/devfreq.c
 create mode 100644 include/linux/devfreq.h

diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
index 3647e11..20118dc 100644
--- a/drivers/base/power/Makefile
+++ b/drivers/base/power/Makefile
@@ -4,5 +4,6 @@ obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 obj-$(CONFIG_PM_OPP)	+= opp.o
 obj-$(CONFIG_HAVE_CLK)	+= clock_ops.o
+obj-$(CONFIG_PM_DEVFREQ)	+= devfreq.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
\ No newline at end of file
diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
new file mode 100644
index 0000000..aba9768
--- /dev/null
+++ b/drivers/base/power/devfreq.c
@@ -0,0 +1,397 @@
+/*
+ * devfreq: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
+ *	    for Non-CPU Devices Based on OPP.
+ *
+ * Copyright (C) 2011 Samsung Electronics
+ *	MyungJoo Ham <myungjoo.ham@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/opp.h>
+#include <linux/devfreq.h>
+#include <linux/workqueue.h>
+#include <linux/platform_device.h>
+#include <linux/list.h>
+#include <linux/printk.h>
+
+/*
+ * devfreq polling interval in ms.
+ * It is recommended to be "jiffy_in_ms" * n, where n is an integer >= 1.
+ */
+#define DEVFREQ_INTERVAL	20
+
+/*
+ * devfreq_work periodically (given by DEVFREQ_INTERVAL) monitors every
+ * registered device.
+ */
+static bool polling;
+static struct workqueue_struct *devfreq_wq;
+static struct delayed_work devfreq_work;
+
+/* The list of all device-devfreq */
+static LIST_HEAD(devfreq_list);
+static DEFINE_MUTEX(devfreq_list_lock);
+
+/**
+ * find_device_devfreq() - find devfreq struct using device pointer
+ * @dev:	device pointer used to lookup device devfreq.
+ *
+ * Search the list of device devfreqs and return the matched device's
+ * devfreq info. devfreq_list_lock should be held by the caller.
+ */
+static struct devfreq *find_device_devfreq(struct device *dev)
+{
+	struct devfreq *tmp_devfreq;
+
+	if (unlikely(IS_ERR_OR_NULL(dev))) {
+		pr_err("%s: Invalid parameters\n", __func__);
+		return ERR_PTR(-EINVAL);
+	}
+
+	list_for_each_entry(tmp_devfreq, &devfreq_list, node) {
+		if (tmp_devfreq->dev == dev)
+			return tmp_devfreq;
+	}
+
+	return ERR_PTR(-ENODEV);
+}
+
+/**
+ * devfreq_do() - Check the usage profile of a given device and configure
+ *		frequency and voltage accordingly
+ * @devfreq:	devfreq info of the given device
+ */
+static int devfreq_do(struct devfreq *devfreq)
+{
+	struct opp *opp;
+	unsigned long freq;
+	int err;
+
+	err = devfreq->governor->get_target_freq(devfreq, &freq);
+	if (err)
+		return err;
+
+	opp = opp_find_freq_ceil(devfreq->dev, &freq);
+	if (opp == ERR_PTR(-ENODEV))
+		opp = opp_find_freq_floor(devfreq->dev, &freq);
+
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	if (devfreq->previous_freq == freq)
+		return 0;
+
+	err = devfreq->profile->target(devfreq->dev, opp);
+	if (err)
+		return err;
+
+	devfreq->previous_freq = freq;
+	return 0;
+}
+
+/**
+ * devfreq_monitor() - Periodically run devfreq_do() and support
+ *		     device devfreq tickle.
+ * @work: the work struct used to run devfreq_monitor periodically.
+ *
+ * Tickle is to force a device to operate at its maximum operable frequency
+ * for a while temporarily. Look at devfreq_tickle_device() for more
+ * information about tickle.
+ */
+static void devfreq_monitor(struct work_struct *work)
+{
+	struct devfreq *devfreq, *tmp;
+	int error;
+
+	mutex_lock(&devfreq_list_lock);
+
+	polling = false;
+
+	list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) {
+		/*
+		 * If the device is tickled and the tickle duration is left,
+		 * do not change the frequency for a while
+		 */
+		if (devfreq->tickle) {
+			polling = true;
+			devfreq->tickle--;
+
+			/*
+			 * If tickling is ending and the device is not going
+			 * to poll, force the device to poll next time so that
+			 * it can return to the original frequency.
+			 * However, as a non-polling device has 0 polling_ms,
+			 * it will not poll again later.
+			 */
+			if (devfreq->tickle == 0 && devfreq->next_polling == 0)
+				devfreq->next_polling = 1;
+
+			continue;
+		}
+
+		if (devfreq->next_polling == 0)
+			continue;
+
+		polling = true;
+
+		if (devfreq->next_polling-- == 1) {
+			error = devfreq_do(devfreq);
+
+			/* Remove a devfreq with an error. */
+			if (error && error != -EAGAIN) {
+				dev_err(devfreq->dev, "devfreq_do error(%d). "
+					"devfreq is removed from the device\n",
+					error);
+
+				list_del(&devfreq->node);
+				kfree(devfreq);
+
+				continue;
+			}
+			devfreq->next_polling = DIV_ROUND_UP(
+						devfreq->profile->polling_ms,
+						DEVFREQ_INTERVAL);
+		}
+	}
+
+	if (polling)
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+
+	mutex_unlock(&devfreq_list_lock);
+}
+
+/**
+ * devfreq_add_device() - Add devfreq feature to the device
+ * @dev:	the device to add devfreq feature.
+ * @profile:	device-specific profile to run devfreq.
+ * @governor:	the policy to choose frequency.
+ */
+int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
+		       struct devfreq_governor *governor)
+{
+	struct devfreq *new_devfreq, *devfreq;
+	int err = 0;
+
+	if (!dev || !profile || !governor) {
+		dev_err(dev, "%s: Invalid parameters.\n", __func__);
+		return -EINVAL;
+	}
+
+	mutex_lock(&devfreq_list_lock);
+
+	devfreq = find_device_devfreq(dev);
+	if (!IS_ERR(devfreq)) {
+		dev_err(dev, "%s: Unable to create devfreq for the device. "
+			"It already has one.\n", __func__);
+		err = -EINVAL;
+		goto out;
+	}
+
+	new_devfreq = kzalloc(sizeof(struct devfreq), GFP_KERNEL);
+	if (!new_devfreq) {
+		dev_err(dev, "%s: Unable to create devfreq for the device\n",
+			__func__);
+		err = -ENOMEM;
+		goto out;
+	}
+
+	new_devfreq->dev = dev;
+	new_devfreq->profile = profile;
+	new_devfreq->governor = governor;
+	new_devfreq->next_polling = DIV_ROUND_UP(profile->polling_ms,
+						 DEVFREQ_INTERVAL);
+	new_devfreq->previous_freq = profile->initial_freq;
+
+	list_add(&new_devfreq->node, &devfreq_list);
+
+	if (devfreq_wq && new_devfreq->next_polling && !polling) {
+		polling = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	}
+out:
+	mutex_unlock(&devfreq_list_lock);
+
+	return err;
+}
+
+/**
+ * devfreq_remove_device() - Remove devfreq feature from a device.
+ * @dev:	the device to remove devfreq feature from.
+ */
+int devfreq_remove_device(struct device *dev)
+{
+	struct devfreq *devfreq;
+
+	if (!dev)
+		return -EINVAL;
+
+	mutex_lock(&devfreq_list_lock);
+	devfreq = find_device_devfreq(dev);
+	if (IS_ERR(devfreq)) {
+		dev_err(dev, "%s: Unable to find devfreq entry for the device.\n",
+			__func__);
+		mutex_unlock(&devfreq_list_lock);
+		return -EINVAL;
+	}
+
+	list_del(&devfreq->node);
+
+	kfree(devfreq);
+
+	mutex_unlock(&devfreq_list_lock);
+
+	return 0;
+}
+
+/**
+ * devfreq_update() - Notify that the device OPP has been changed.
+ * @dev:	the device whose OPP has been changed.
+ */
+int devfreq_update(struct device *dev)
+{
+	struct devfreq *devfreq;
+	int err = 0;
+
+	mutex_lock(&devfreq_list_lock);
+
+	devfreq = find_device_devfreq(dev);
+	if (IS_ERR(devfreq)) {
+		err = PTR_ERR(devfreq);
+		goto out;
+	}
+
+	/*
+	 * If the maximum frequency available is changed either by
+	 * enabling higher frequency or disabling the current
+	 * maximum frequency, we need to adjust the frequency
+	 * (tickle) again if the device is currently being tickled.
+	 */
+	if (devfreq->tickle) {
+		unsigned long freq = devfreq->profile->max_freq;
+		struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
+
+		if (IS_ERR(opp)) {
+			err = PTR_ERR(opp);
+			goto out;
+		}
+
+		/* Max freq available is not changed */
+		if (devfreq->previous_freq == freq)
+			goto out;
+
+		/* Tickle again. Max freq available is changed */
+		err = devfreq->profile->target(devfreq->dev, opp);
+		if (!err)
+			devfreq->previous_freq = freq;
+	} else {
+		/* Reevaluate the proper frequency */
+		err = devfreq_do(devfreq);
+	}
+
+out:
+	mutex_unlock(&devfreq_list_lock);
+	return err;
+}
+
+/**
+ * _devfreq_tickle_device() - Set the operating frequency to maximum and
+ *			    keep the frequency for the designated delay.
+ * @df:		devfreq entry of the device being tickled.
+ * @delay:	duration of tickle effect in units of polling intervals.
+ */
+static int _devfreq_tickle_device(struct devfreq *df, unsigned long delay)
+{
+	int err = 0;
+	unsigned long freq;
+	struct opp *opp;
+
+	freq = df->profile->max_freq;
+	opp = opp_find_freq_floor(df->dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	if (df->previous_freq != freq) {
+		err = df->profile->target(df->dev, opp);
+		if (!err)
+			df->previous_freq = freq;
+	}
+	if (err) {
+		dev_err(df->dev, "%s: Cannot set frequency.\n", __func__);
+	} else {
+		/* Do not shorten tickle duration with a new tickle call */
+		if (df->tickle < delay)
+			df->tickle = delay;
+
+		df->num_tickle++;
+	}
+
+	if (devfreq_wq && !polling) {
+		polling = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	}
+
+	return err;
+}
+
+/**
+ * devfreq_tickle_device() - Guarantee maximum operation speed for a while
+ *			instantaneously.
+ * @dev:	the device to be tickled.
+ * @duration_ms:	the duration of tickle effect.
+ *
+ * Tickle sets the device at the maximum frequency instantaneously and
+ * the maximum frequency is guaranteed to be used for the given duration.
+ * For faster user response time, an input event may tickle a related device
+ * so that the input event does not need to wait for the devfreq to react at
+ * the normal interval.
+ *
+ * _devfreq_tickle_device() is used as a helper function for tickling.
+ */
+int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
+{
+	struct devfreq *devfreq;
+	int err = 0;
+	unsigned long delay; /* in # of DEVFREQ_INTERVAL */
+
+	mutex_lock(&devfreq_list_lock);
+	devfreq = find_device_devfreq(dev);
+	delay = DIV_ROUND_UP(duration_ms, DEVFREQ_INTERVAL);
+
+	if (IS_ERR(devfreq))
+		err = PTR_ERR(devfreq);
+	else
+		err = _devfreq_tickle_device(devfreq, delay);
+
+	mutex_unlock(&devfreq_list_lock);
+
+	return err;
+}
+
+/**
+ * devfreq_init() - Initialize data structure for devfreq framework and
+ *		  start polling registered devfreq devices.
+ */
+static int __init devfreq_init(void)
+{
+	mutex_lock(&devfreq_list_lock);
+
+	polling = false;
+	devfreq_wq = create_freezable_workqueue("devfreq_wq");
+	INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
+	mutex_unlock(&devfreq_list_lock);
+
+	devfreq_monitor(&devfreq_work.work);
+	return 0;
+}
+late_initcall(devfreq_init);
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index 56a6899..819c1b3 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -21,6 +21,7 @@
 #include <linux/rculist.h>
 #include <linux/rcupdate.h>
 #include <linux/opp.h>
+#include <linux/devfreq.h>
 
 /*
  * Internal data structure organization with the OPP layer library is as
@@ -428,6 +429,11 @@ int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
 	list_add_rcu(&new_opp->node, head);
 	mutex_unlock(&dev_opp_list_lock);
 
+	/*
+	 * Notify generic dvfs for the change and ignore error
+	 * because the device may not have a devfreq entry
+	 */
+	devfreq_update(dev);
 	return 0;
 }
 
@@ -512,6 +518,9 @@ unlock:
 	mutex_unlock(&dev_opp_list_lock);
 out:
 	kfree(new_opp);
+
+	/* Notify generic dvfs for the change and ignore error */
+	devfreq_update(dev);
 	return r;
 }
 
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
new file mode 100644
index 0000000..7c881cc
--- /dev/null
+++ b/include/linux/devfreq.h
@@ -0,0 +1,111 @@
+/*
+ * devfreq: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
+ *	    for Non-CPU Devices Based on OPP.
+ *
+ * Copyright (C) 2011 Samsung Electronics
+ *	MyungJoo Ham <myungjoo.ham@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __LINUX_DEVFREQ_H__
+#define __LINUX_DEVFREQ_H__
+
+#define DEVFREQ_NAME_LEN 16
+
+struct devfreq;
+struct devfreq_dev_status {
+	/* both since the last measure */
+	unsigned long total_time;
+	unsigned long busy_time;
+	unsigned long current_frequency;
+};
+
+struct devfreq_dev_profile {
+	unsigned long max_freq; /* may be larger than the actual value */
+	unsigned long initial_freq;
+	int polling_ms;	/* 0: reevaluate at OPP changes only */
+
+	int (*target)(struct device *dev, struct opp *opp);
+	int (*get_dev_status)(struct device *dev,
+			      struct devfreq_dev_status *stat);
+};
+
+/**
+ * struct devfreq_governor - DEVFREQ Policy Governor
+ * @data	Governor's internal data. The framework does not care of it.
+ * @get_target_freq	Returns desired operating frequency for the device.
+ *			Basically, get_target_freq will run
+ *			devfreq_dev_profile.get_dev_status() to get the
+ *			status of the device (load = busy_time / total_time).
+ */
+struct devfreq_governor {
+	char name[DEVFREQ_NAME_LEN];
+	void *data; /* private data for get_target_freq */
+	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
+};
+
+/**
+ * struct devfreq - Device DEVFREQ structure
+ * @node	list node - contains the devices with DEVFREQ that have been
+ *		registered.
+ * @dev		device pointer
+ * @profile	device-specific devfreq profile
+ * @governor	method how to choose frequency based on the usage.
+ * @previous_freq	previously configured frequency value.
+ * @next_polling	the number of remaining "devfreq_monitor" executions to
+ *			reevaluate frequency/voltage of the device. Set by
+ *			profile's polling_ms interval.
+ * @tickle	positive if DEVFREQ-tickling is activated for the device.
+ *		at each execution of devfreq_monitor, tickle is decremented.
+ *		User may tickle a device-devfreq in order to set maximum
+ *		frequency instantaneously with some guaranteed duration.
+ *
+ * This structure stores the DEVFREQ information for a given device.
+ */
+struct devfreq {
+	struct list_head node;
+
+	struct device *dev;
+	struct devfreq_dev_profile *profile;
+	struct devfreq_governor *governor;
+
+	unsigned long previous_freq;
+	unsigned int next_polling;
+	unsigned int tickle;
+};
+
+#if defined(CONFIG_PM_DEVFREQ)
+extern int devfreq_add_device(struct device *dev,
+			   struct devfreq_dev_profile *profile,
+			   struct devfreq_governor *governor);
+extern int devfreq_remove_device(struct device *dev);
+extern int devfreq_update(struct device *dev);
+extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
+#else /* !CONFIG_PM_DEVFREQ */
+static int devfreq_add_device(struct device *dev,
+			   struct devfreq_dev_profile *profile,
+			   struct devfreq_governor *governor)
+{
+	return 0;
+}
+
+static int devfreq_remove_device(struct device *dev)
+{
+	return 0;
+}
+
+static int devfreq_update(struct device *dev)
+{
+	return 0;
+}
+
+static int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
+{
+	return 0;
+}
+#endif /* CONFIG_PM_DEVFREQ */
+
+#endif /* __LINUX_DEVFREQ_H__ */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 87f4d24..b7e15c8 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -227,3 +227,37 @@ config PM_OPP
 config PM_RUNTIME_CLK
 	def_bool y
 	depends on PM_RUNTIME && HAVE_CLK
+
+config ARCH_HAS_DEVFREQ
+	bool
+	depends on ARCH_HAS_OPP
+	help
+	  Denotes that the architecture supports DEVFREQ. If the architecture
+	  supports multiple OPP entries per device and the frequency of the
+	  devices with OPPs may be altered dynamically, the architecture
+	  supports DEVFREQ.
+
+config PM_DEVFREQ
+	bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
+	depends on PM_OPP && ARCH_HAS_DEVFREQ
+	help
+	  With OPP support, a device may have a list of frequencies and
+	  voltages available. DEVFREQ, a generic DVFS framework can be
+	  registered for a device with OPP support in order to let the
+	  governor provided to DEVFREQ choose an operating frequency
+	  based on the OPP's list and the policy given with DEVFREQ.
+
+	  Each device may have its own governor and policy. DEVFREQ can
+	  reevaluate the device state periodically and/or based on the
+	  OPP list changes (each frequency/voltage pair in OPP may be
+	  disabled or enabled).
+
+	  Like some CPUs with CPUFREQ, a device may have multiple clocks.
+	  However, because the clock frequencies of a single device are
+	  determined by the single device's state, an instance of DEVFREQ
+	  is attached to a single device and returns a "representative"
+	  clock frequency from the OPP of the device, which is also attached
+	  to a device by 1-to-1. The device registering DEVFREQ takes the
+	  responsibility to "interpret" the frequency listed in OPP and
+	  to set its every clock accordingly with the "target" callback
+	  given to DEVFREQ.
-- 
1.7.4.1


* [PATCH v4 2/3] PM / DEVFREQ: add example governors
  2011-07-15  8:11 [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices MyungJoo Ham
  2011-07-15  8:11 ` [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs MyungJoo Ham
@ 2011-07-15  8:11 ` MyungJoo Ham
  2011-07-15  8:11 ` [PATCH v4 3/3] PM / DEVFREQ: add sysfs interface (including user tickling) MyungJoo Ham
  2011-07-28 22:10 ` [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices Rafael J. Wysocki
  3 siblings, 0 replies; 30+ messages in thread
From: MyungJoo Ham @ 2011-07-15  8:11 UTC (permalink / raw)
  To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner

Three CPUFREQ-like governors are provided as examples.

powersave: use the lowest frequency possible. The user (device) should
set the polling_ms as 0 because polling is useless for this governor.

performance: use the highest frequency possible. The user (device)
should set the polling_ms as 0 because polling is useless for this
governor.

simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.

When a user updates OPP entries (enable/disable/add), the OPP framework
automatically notifies DEVFREQ to update the operating frequency
accordingly. Thus, DEVFREQ users (device drivers) do not need to update
DEVFREQ manually with OPP entry updates or set polling_ms for powersave,
performance, or any other "static" governors.
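
For readers who want to try the simple_ondemand decision outside the
kernel, here is a rough user-space sketch of the same thresholds (90%
up-threshold, 5% down-differential). It is not the patch code: the
names sketch_ondemand_target() and SKETCH_* are hypothetical, it takes
32-bit inputs and uses 64-bit intermediates instead of the patch's
right shift to avoid overflow, and UINT32_MAX stands in for "as high
as possible".

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_UPTHRESHOLD	90u
#define SKETCH_DOWNDIFFERENTIAL	5u

/*
 * Returns the frequency the governor would ask for, or UINT32_MAX to mean
 * "as high as possible". busy_time and total_time cover the same sampling
 * window; cur_freq is the current frequency (0 if unknown).
 */
uint32_t sketch_ondemand_target(uint32_t busy_time, uint32_t total_time,
				uint32_t cur_freq)
{
	uint64_t busy = busy_time;
	uint64_t total = total_time;
	uint64_t f;

	/* No samples or unknown current frequency: ask for the maximum. */
	if (total == 0 || cur_freq == 0)
		return UINT32_MAX;

	/* Load above 90%: ask for the maximum. */
	if (busy * 100 > total * SKETCH_UPTHRESHOLD)
		return UINT32_MAX;

	/* Load above 85% and up to 90%: keep the current frequency. */
	if (busy * 100 > total * (SKETCH_UPTHRESHOLD - SKETCH_DOWNDIFFERENTIAL))
		return cur_freq;

	/* Otherwise scale so the load would sit near 88% (90 - 5 / 2 in integer math). */
	f = busy * cur_freq / total;
	f = f * 100 / (SKETCH_UPTHRESHOLD - SKETCH_DOWNDIFFERENTIAL / 2);

	/* Load is at most 85% here, so the result stays below cur_freq. */
	assert(f <= UINT32_MAX);
	return (uint32_t)f;
}
```

The returned value is only a recommendation. In the patch, devfreq_do()
then maps it onto an available OPP, so a result that falls between two
OPP frequencies is rounded up to the next one.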

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

---
Changes from v3:
- Bugfixes on simple-ondemand governor (divide by zero / overflow)
- Style fixes
- Give names to governors
---
 drivers/base/power/devfreq.c |   85 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/devfreq.h      |    5 ++
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
index aba9768..e5a73aa 100644
--- a/drivers/base/power/devfreq.c
+++ b/drivers/base/power/devfreq.c
@@ -395,3 +395,88 @@ static int __init devfreq_init(void)
 	return 0;
 }
 late_initcall(devfreq_init);
+
+static int devfreq_powersave_func(struct devfreq *df,
+				  unsigned long *freq)
+{
+	*freq = 0; /* devfreq_do will run "ceiling" to 0 */
+	return 0;
+}
+
+struct devfreq_governor devfreq_powersave = {
+	.name = "powersave",
+	.get_target_freq = devfreq_powersave_func,
+};
+
+static int devfreq_performance_func(struct devfreq *df,
+				    unsigned long *freq)
+{
+	*freq = UINT_MAX; /* devfreq_do will run "floor" */
+	return 0;
+}
+
+struct devfreq_governor devfreq_performance = {
+	.name = "performance",
+	.get_target_freq = devfreq_performance_func,
+};
+
+/* Constants for DevFreq-Simple-Ondemand (DFSO) */
+#define DFSO_UPTHRESHOLD	(90)
+#define DFSO_DOWNDIFFERENCTIAL	(5)
+static int devfreq_simple_ondemand_func(struct devfreq *df,
+					unsigned long *freq)
+{
+	struct devfreq_dev_status stat;
+	int err = df->profile->get_dev_status(df->dev, &stat);
+	unsigned long long a, b;
+
+	if (err)
+		return err;
+
+	/* Assume MAX if it is going to be divided by zero */
+	if (stat.total_time == 0) {
+		*freq = UINT_MAX;
+		return 0;
+	}
+
+	/* Prevent overflow */
+	if (stat.busy_time >= (1 << 24) || stat.total_time >= (1 << 24)) {
+		stat.busy_time >>= 7;
+		stat.total_time >>= 7;
+	}
+
+	/* Set MAX if it's busy enough */
+	if (stat.busy_time * 100 >
+	    stat.total_time * DFSO_UPTHRESHOLD) {
+		*freq = UINT_MAX;
+		return 0;
+	}
+
+	/* Set MAX if we do not know the initial frequency */
+	if (stat.current_frequency == 0) {
+		*freq = UINT_MAX;
+		return 0;
+	}
+
+	/* Keep the current frequency */
+	if (stat.busy_time * 100 >
+	    stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
+		*freq = stat.current_frequency;
+		return 0;
+	}
+
+	/* Set the desired frequency based on the load */
+	a = stat.busy_time;
+	a *= stat.current_frequency;
+	b = div_u64(a, stat.total_time);
+	b *= 100;
+	b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
+	*freq = (unsigned long) b;
+
+	return 0;
+}
+
+struct devfreq_governor devfreq_simple_ondemand = {
+	.name = "simple_ondemand",
+	.get_target_freq = devfreq_simple_ondemand_func,
+};
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 7c881cc..baa074c 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -84,6 +84,11 @@ extern int devfreq_add_device(struct device *dev,
 extern int devfreq_remove_device(struct device *dev);
 extern int devfreq_update(struct device *dev);
 extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
+
+extern struct devfreq_governor devfreq_powersave;
+extern struct devfreq_governor devfreq_performance;
+extern struct devfreq_governor devfreq_simple_ondemand;
+
 #else /* !CONFIG_PM_DEVFREQ */
 static int devfreq_add_device(struct device *dev,
 			   struct devfreq_dev_profile *profile,
-- 
1.7.4.1


* [PATCH v4 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-07-15  8:11 [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices MyungJoo Ham
  2011-07-15  8:11 ` [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs MyungJoo Ham
  2011-07-15  8:11 ` [PATCH v4 2/3] PM / DEVFREQ: add example governors MyungJoo Ham
@ 2011-07-15  8:11 ` MyungJoo Ham
  2011-06-09 17:11   ` Pavel Machek
  2011-07-28 22:10 ` [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices Rafael J. Wysocki
  3 siblings, 1 reply; 30+ messages in thread
From: MyungJoo Ham @ 2011-07-15  8:11 UTC (permalink / raw)
  To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner

1. System-wide sysfs interface /sys/power/
- tickle_all	R: number of tickle_all executions
		W: tickle all devfreq devices
- min_interval	R: devfreq monitoring base interval in ms
- monitoring	R: shows whether devfreq monitoring is active or not

2. Device specific sysfs interface /sys/devices/.../power/devfreq_*
- tickle	R: number of tickle executions for the device
		W: tickle the device
- governor	R: name of governor
- cur_freq	R: current frequency
- max_freq	R: maximum operable frequency
- min_freq	R: minimum operable frequency
- polling_interval	R: polling interval in ms given with devfreq profile
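
A value written to the per-device tickle entry is a duration in ms,
which the framework counts in units of DEVFREQ_INTERVAL (20 ms in
patch 1/3), and a newer tickle does not shorten one that is still in
effect. The arithmetic can be sketched in user space as follows. The
names sketch_tickle_ticks(), sketch_tickle_update() and
SKETCH_INTERVAL_MS are hypothetical and not part of the patch.

```c
#include <assert.h>

#define SKETCH_INTERVAL_MS	20UL	/* mirrors DEVFREQ_INTERVAL from patch 1/3 */

/* Round a duration in ms up to a whole number of monitoring intervals. */
unsigned long sketch_tickle_ticks(unsigned long duration_ms)
{
	/* Written without "duration_ms + INTERVAL - 1" so it cannot wrap. */
	return duration_ms / SKETCH_INTERVAL_MS +
	       (duration_ms % SKETCH_INTERVAL_MS != 0);
}

/*
 * Remaining tickle count after a new request: a new tickle may extend the
 * countdown but never shortens one that is still running.
 */
unsigned long sketch_tickle_update(unsigned long remaining,
				   unsigned long duration_ms)
{
	unsigned long ticks = sketch_tickle_ticks(duration_ms);
	unsigned long result = remaining < ticks ? ticks : remaining;

	assert(result >= remaining);
	return result;
}
```

Under this reading, writing 50 to the tickle entry asks for 3
monitoring intervals, i.e. roughly 60 ms at the maximum frequency.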

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

--
Changed from v3
- corrected sysfs API usage
- corrected error messages
- moved sysfs entry location
- added sysfs entries

Changed from v2
- add ABI entries for devfreq sysfs interface
---
 Documentation/ABI/testing/sysfs-devices-power |   50 ++++++
 Documentation/ABI/testing/sysfs-power         |   43 +++++
 drivers/base/power/devfreq.c                  |  232 +++++++++++++++++++++++++
 include/linux/devfreq.h                       |    3 +
 4 files changed, 328 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power
index 8ffbc25..692f845 100644
--- a/Documentation/ABI/testing/sysfs-devices-power
+++ b/Documentation/ABI/testing/sysfs-devices-power
@@ -165,3 +165,53 @@ Description:
 
 		Not all drivers support this attribute.  If it isn't supported,
 		attempts to read or write it will yield I/O errors.
+
+What:		/sys/devices/.../power/devfreq_tickle
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_tickle file allows users
+		to force the corresponding device to operate at its maximum
+		operable frequency instantaneously and temporarily. After a
+		designated duration has passed, the operating frequency returns
+		to normal. When a user reads the tickle entry, it returns
+		the number of tickle executions for the device. When a user
+		writes to the tickle entry with the tickle duration in ms,
+		the effect of device tickling is held for the designated
+		duration. Note that the duration is rounded up to a multiple
+		of the value DEVFREQ_INTERVAL defined in devfreq.c
+
+What:		/sys/devices/.../power/devfreq_governor
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_governor shows the name
+		of the governor used by the corresponding device.
+
+What:		/sys/devices/.../power/devfreq_cur_freq
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_cur_freq shows the current
+		frequency of the corresponding device.
+
+What:		/sys/devices/.../power/devfreq_max_freq
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_max_freq shows the
+		maximum operable frequency of the corresponding device.
+
+What:		/sys/devices/.../power/devfreq_min_freq
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_min_freq shows the
+		minimum operable frequency of the corresponding device.
+
+What:		/sys/devices/.../power/devfreq_polling_interval
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_polling_interval shows the
+		requested polling interval of the corresponding device.
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
index b464d12..4d8434b 100644
--- a/Documentation/ABI/testing/sysfs-power
+++ b/Documentation/ABI/testing/sysfs-power
@@ -172,3 +172,46 @@ Description:
 
 		Reading from this file will display the current value, which is
 		set to 1 MB by default.
+
+What:		/sys/power/devfreq/
+Date:		May 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/power/devfreq directory will contain files that will
+		provide a unified interface to the DEVFREQ, a generic DVFS
+		(dynamic voltage and frequency scaling) framework.
+
+What:		/sys/power/devfreq/tickle_all
+Date:		May 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/power/devfreq/tickle_all file allows user space to
+		force every device with DEVFREQ to operate at the maximum
+		frequency of the device instantaneously and temporarily. After
+		a designated delay has passed, the operating frequency returns
+		to normal. If a user reads the tickle_all entry, it returns
+		the number of tickle_all executions. When writing to the
+		tickle_all entry, the user should supply the duration of the
+		tickle in ms (the "designated delay" mentioned before). Then,
+		the effect of tickle_all will hold for the denoted duration.
+		Note that the duration is rounded up to a multiple of
+		DEVFREQ_INTERVAL, defined in /drivers/base/power/devfreq.c.
+
+What:		/sys/power/devfreq/min_interval
+Date:		May 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/power/devfreq/min_interval file shows the monitoring
+		period defined by DEVFREQ_INTERVAL in
+		/drivers/base/power/devfreq.c. The duration of device tickling
+		is rounded up to a multiple of DEVFREQ_INTERVAL.
+
+What:		/sys/power/devfreq/monitoring
+Date:		May 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/power/devfreq/monitoring file shows whether DEVFREQ
+		is periodically monitoring. Periodic monitoring is activated
+		if there is a device that wants periodic monitoring for DVFS or
+		there is a device that is tickled (and the tickling duration is
+		not yet expired).
diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
index e5a73aa..a62e757 100644
--- a/drivers/base/power/devfreq.c
+++ b/drivers/base/power/devfreq.c
@@ -40,6 +40,9 @@ static struct delayed_work devfreq_work;
 static LIST_HEAD(devfreq_list);
 static DEFINE_MUTEX(devfreq_list_lock);
 
+static struct kobject *devfreq_kobj;
+static struct attribute_group dev_attr_group;
+
 /**
  * find_device_devfreq() - find devfreq struct using device pointer
  * @dev:	device pointer used to lookup device devfreq.
@@ -151,6 +154,8 @@ static void devfreq_monitor(struct work_struct *work)
 					"devfreq is removed from the device\n",
 					error);
 
+				sysfs_unmerge_group(&devfreq->dev->kobj,
+						    &dev_attr_group);
 				list_del(&devfreq->node);
 				kfree(devfreq);
 
@@ -218,6 +223,8 @@ int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
 		queue_delayed_work(devfreq_wq, &devfreq_work,
 				   msecs_to_jiffies(DEVFREQ_INTERVAL));
 	}
+
+	sysfs_merge_group(&dev->kobj, &dev_attr_group);
 out:
 	mutex_unlock(&devfreq_list_lock);
 
@@ -244,6 +251,8 @@ int devfreq_remove_device(struct device *dev)
 		return -EINVAL;
 	}
 
+	sysfs_unmerge_group(&dev->kobj, &dev_attr_group);
+
 	list_del(&devfreq->node);
 
 	kfree(devfreq);
@@ -378,6 +387,215 @@ int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
 	return err;
 }
 
+static int num_tickle_all;
+
+static ssize_t tickle_all_store(struct kobject *kobj,
+				struct kobj_attribute *attr, const char *buf,
+				size_t count)
+{
+	int duration = 0;
+	struct devfreq *tmp;
+	unsigned long delay;
+
+	sscanf(buf, "%d", &duration);
+	if (duration < DEVFREQ_INTERVAL)
+		duration = DEVFREQ_INTERVAL;
+
+	delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
+
+	mutex_lock(&devfreq_list_lock);
+	list_for_each_entry(tmp, &devfreq_list, node) {
+		_devfreq_tickle_device(tmp, delay);
+	}
+	mutex_unlock(&devfreq_list_lock);
+
+	num_tickle_all++;
+	return count;
+}
+
+static ssize_t tickle_all_show(struct kobject *kobj,
+				   struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", num_tickle_all);
+}
+
+static ssize_t min_interval_show(struct kobject *kobj,
+				 struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", DEVFREQ_INTERVAL);
+}
+
+static ssize_t monitoring_show(struct kobject *kobj,
+			       struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", polling ? 1 : 0);
+}
+
+static struct kobj_attribute tickle_all_attr = {
+	.attr = {
+		.name = "tickle_all",
+		.mode = 0644,
+	},
+	.show = tickle_all_show,
+	.store = tickle_all_store,
+};
+static struct kobj_attribute min_interval_attr = {
+	.attr = {
+		.name = "min_interval",
+		.mode = 0444,
+	},
+	.show = min_interval_show,
+};
+static struct kobj_attribute monitoring_attr = {
+	.attr = {
+		.name = "monitoring",
+		.mode = 0444,
+	},
+	.show = monitoring_show,
+};
+static struct attribute *devfreq_entries[] = {
+	&tickle_all_attr.attr,
+	&min_interval_attr.attr,
+	&monitoring_attr.attr,
+	NULL,
+};
+static struct attribute_group devfreq_attr_group = {
+	.name	= NULL,
+	.attrs	= devfreq_entries,
+};
+
+static ssize_t tickle(struct device *dev, struct device_attribute *attr,
+		      const char *buf, size_t count)
+{
+	int duration = 0;
+	struct devfreq *df;
+	unsigned long delay;
+
+	sscanf(buf, "%d", &duration);
+	if (duration < DEVFREQ_INTERVAL)
+		duration = DEVFREQ_INTERVAL;
+
+	if (unlikely(IS_ERR_OR_NULL(dev))) {
+		pr_err("%s: Null or invalid device.\n", __func__);
+		return -EINVAL;
+	}
+
+	delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
+
+	mutex_lock(&devfreq_list_lock);
+	df = find_device_devfreq(dev);
+	_devfreq_tickle_device(df, delay);
+	mutex_unlock(&devfreq_list_lock);
+
+	return count;
+}
+
+static ssize_t show_num_tickle(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+
+	if (!IS_ERR(df))
+		return sprintf(buf, "%d\n", df->num_tickle);
+
+	return PTR_ERR(df);
+}
+
+static ssize_t show_governor(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+
+	if (IS_ERR(df))
+		return PTR_ERR(df);
+	if (!df->governor)
+		return -EINVAL;
+
+	return sprintf(buf, "%s\n", df->governor->name);
+}
+
+static ssize_t show_freq(struct device *dev,
+			 struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+
+	if (IS_ERR(df))
+		return PTR_ERR(df);
+
+	return sprintf(buf, "%lu\n", df->previous_freq);
+}
+
+static ssize_t show_max_freq(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+	unsigned long freq = ULONG_MAX;
+	struct opp *opp;
+
+	if (IS_ERR(df))
+		return PTR_ERR(df);
+	if (!df->dev)
+		return -EINVAL;
+
+	opp = opp_find_freq_floor(df->dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	return sprintf(buf, "%lu\n", freq);
+}
+
+static ssize_t show_min_freq(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+	unsigned long freq = 0;
+	struct opp *opp;
+
+	if (IS_ERR(df))
+		return PTR_ERR(df);
+	if (!df->dev)
+		return -EINVAL;
+
+	opp = opp_find_freq_ceil(df->dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	return sprintf(buf, "%lu\n", freq);
+}
+
+static ssize_t show_polling_interval(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+
+	if (IS_ERR(df))
+		return PTR_ERR(df);
+	if (!df->profile)
+		return -EINVAL;
+
+	return sprintf(buf, "%d\n", df->profile->polling_ms);
+}
+
+static DEVICE_ATTR(devfreq_tickle, 0644, show_num_tickle, tickle);
+static DEVICE_ATTR(devfreq_governor, 0444, show_governor, NULL);
+static DEVICE_ATTR(devfreq_cur_freq, 0444, show_freq, NULL);
+static DEVICE_ATTR(devfreq_max_freq, 0444, show_max_freq, NULL);
+static DEVICE_ATTR(devfreq_min_freq, 0444, show_min_freq, NULL);
+static DEVICE_ATTR(devfreq_polling_interval, 0444, show_polling_interval, NULL);
+static struct attribute *dev_entries[] = {
+	&dev_attr_devfreq_tickle.attr,
+	&dev_attr_devfreq_governor.attr,
+	&dev_attr_devfreq_cur_freq.attr,
+	&dev_attr_devfreq_max_freq.attr,
+	&dev_attr_devfreq_min_freq.attr,
+	&dev_attr_devfreq_polling_interval.attr,
+	NULL,
+};
+static struct attribute_group dev_attr_group = {
+	.name	= power_group_name,
+	.attrs	= dev_entries,
+};
+
 /**
  * devfreq_init() - Initialize data structure for devfreq framework and
  *		  start polling registered devfreq devices.
@@ -389,6 +607,20 @@ static int __init devfreq_init(void)
 	polling = false;
 	devfreq_wq = create_freezable_workqueue("devfreq_wq");
 	INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
+
+#ifdef CONFIG_PM
+	/* Create sysfs */
+	devfreq_kobj = kobject_create_and_add("devfreq", power_kobj);
+	if (!devfreq_kobj) {
+		pr_err("Unable to create devfreq kobject.\n");
+		goto out;
+	}
+	if (sysfs_create_group(devfreq_kobj, &devfreq_attr_group)) {
+		pr_err("Unable to create devfreq sysfs entries.\n");
+		goto out;
+	}
+#endif
+out:
 	mutex_unlock(&devfreq_list_lock);
 
 	devfreq_monitor(&devfreq_work.work);
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index baa074c..f6e4e3b 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -62,6 +62,7 @@ struct devfreq_governor {
  *		at each executino of devfreq_monitor, tickle is decremented.
  *		User may tickle a device-devfreq in order to set maximum
  *		frequency instaneously with some guaranteed duration.
+ * @num_tickle	number of tickle calls.
  *
  * This structure stores the DEVFREQ information for a give device.
  */
@@ -75,6 +76,8 @@ struct devfreq {
 	unsigned long previous_freq;
 	unsigned int next_polling;
 	unsigned int tickle;
+
+	unsigned int num_tickle;
 };
 
 #if defined(CONFIG_PM_DEVFREQ)
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
  2011-06-09 17:11   ` Pavel Machek
@ 2011-07-19  2:14     ` MyungJoo Ham
  0 siblings, 0 replies; 30+ messages in thread
From: MyungJoo Ham @ 2011-07-19  2:14 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm, Thomas Gleixner

On Fri, Jun 10, 2011 at 2:11 AM, Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
>
>> +What:                /sys/devices/.../power/devfreq_polling_interval
>> +Date:                July 2011
>> +Contact:     MyungJoo Ham <myungjoo.ham@samsung.com>
>> +Description:
>> +             The /sys/devices/.../power/devfreq_polling_interval shows the
>> +             requested polling interval of the corresponding device.
>
> AFAICT, polling only makes sense for one governor; and I guess more
> governor parameters will be needed in future. Should polling interval
> be governor-specific?

struct devfreq_governor {
        char name[DEVFREQ_NAME_LEN];
        void *data; /* private data for get_target_freq */
        int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
};

The "data" entry in struct devfreq_governor is meant to hold governor
parameters as well as internal governor data in the future; none of
the current three example governors needs it.

Unlike CPUFREQ, we have many heterogeneous devices to support with
DEVFREQ. Thus, a specific device may need a governor of its own
unique design and can supply such a governor (one not among the
examples in devfreq.c) through devfreq_add_device.
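
As a rough illustration of the point above, here is a userspace sketch of
a device-specific governor that carries its tunables through "data".
Only struct devfreq_governor is taken from this thread; the reduced
struct devfreq, the DEVFREQ_NAME_LEN value, struct fixed_step_params and
fixed_step_target() are hypothetical names made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEVFREQ_NAME_LEN 16	/* assumed length, for illustration only */

struct devfreq;

struct devfreq_governor {
	char name[DEVFREQ_NAME_LEN];
	void *data; /* private data for get_target_freq */
	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
};

/* Hypothetical, reduced stand-in for the real struct devfreq. */
struct devfreq {
	struct devfreq_governor *governor;
	unsigned long previous_freq;
};

/* Hypothetical per-governor parameters carried through ->data. */
struct fixed_step_params {
	unsigned long step_hz;
	unsigned long max_hz;
};

/* Hypothetical governor: step the frequency up once, capped at max_hz. */
int fixed_step_target(struct devfreq *this, unsigned long *freq)
{
	struct fixed_step_params *p;

	assert(this != NULL && freq != NULL);
	if (!this->governor || !this->governor->data)
		return -EINVAL;

	p = this->governor->data;
	*freq = this->previous_freq + p->step_hz;
	if (*freq > p->max_hz)
		*freq = p->max_hz;
	return 0;
}
```

The parameters live with the governor instance, so a driver could hand
in different values for different devices without touching the core.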


And, yes. Among those three examples, only "simple-ondemand" needs a
polling interval.

However, any driver may add its own governor that uses polling, and
tickling requires a polling interval regardless of the governor used.

Therefore, I think the polling interval is not governor-specific in devfreq.
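
To make the dependency of tickling on the monitoring period concrete,
the clamp-and-round-up step used by the sysfs store handlers in patch
3/3 can be sketched in userspace C. The 20 ms interval is an assumed
value for the example only (the real DEVFREQ_INTERVAL is defined in
devfreq.c), and tickle_periods() is a hypothetical helper name.

```c
#include <assert.h>

/* Assumed value for illustration; the real one is defined in devfreq.c. */
#define DEVFREQ_INTERVAL_MS 20

/* Userspace equivalent of the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Convert a requested tickle duration in ms into a number of monitoring
 * periods: clamp to at least one interval, then round up.
 */
unsigned long tickle_periods(int duration_ms)
{
	assert(DEVFREQ_INTERVAL_MS > 0);

	if (duration_ms < DEVFREQ_INTERVAL_MS)
		duration_ms = DEVFREQ_INTERVAL_MS;

	return DIV_ROUND_UP((unsigned long)duration_ms,
			    (unsigned long)DEVFREQ_INTERVAL_MS);
}
```

Under that assumed interval, a 21 ms request holds for two monitoring
periods, and anything at or below 20 ms (including 0) holds for one.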


Thank you.
- MyungJoo

> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>



-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-07-15  8:11 [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices MyungJoo Ham
                   ` (2 preceding siblings ...)
  2011-07-15  8:11 ` [PATCH v4 3/3] PM / DEVFREQ: add sysfs interface (including user tickling) MyungJoo Ham
@ 2011-07-28 22:10 ` Rafael J. Wysocki
  2011-07-29  4:46   ` Turquette, Mike
  2011-08-02 22:02   ` Kevin Hilman
  3 siblings, 2 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2011-07-28 22:10 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm, Thomas Gleixner

On Friday, July 15, 2011, MyungJoo Ham wrote:
> For a usage example, please look at
> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
> 
> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
> and other related clocks simply follow the determined DDR RAM clock.
> 
> The DEVFREQ driver for Exynos4210 memory bus is at
> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
> 
> MyungJoo Ham (3):
>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
>     OPPs
>   PM / DEVFREQ: add example governors
>   PM / DEVFREQ: add sysfs interface (including user tickling)

OK, I'm going to take the patches for 3.2.

Thanks,
Rafael


>  Documentation/ABI/testing/sysfs-devices-power |   50 ++
>  Documentation/ABI/testing/sysfs-power         |   43 ++
>  drivers/base/power/Makefile                   |    1 +
>  drivers/base/power/devfreq.c                  |  714 +++++++++++++++++++++++++
>  drivers/base/power/opp.c                      |    9 +
>  include/linux/devfreq.h                       |  119 ++++
>  kernel/power/Kconfig                          |   34 ++
>  7 files changed, 970 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/power/devfreq.c
>  create mode 100644 include/linux/devfreq.h
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-07-28 22:10 ` [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices Rafael J. Wysocki
@ 2011-07-29  4:46   ` Turquette, Mike
  2011-07-29  9:10     ` Rafael J. Wysocki
  2011-08-02 22:02   ` Kevin Hilman
  1 sibling, 1 reply; 30+ messages in thread
From: Turquette, Mike @ 2011-07-29  4:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, MyungJoo Ham,
	Thomas Gleixner, linux-pm

On Thu, Jul 28, 2011 at 3:10 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Friday, July 15, 2011, MyungJoo Ham wrote:
>> For a usage example, please look at
>> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>>
>> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
>> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
>> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
>> and other related clocks simply follow the determined DDR RAM clock.
>>
>> The DEVFREQ driver for Exynos4210 memory bus is at
>> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
>>
>> MyungJoo Ham (3):
>>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
>>     OPPs
>>   PM / DEVFREQ: add example governors
>>   PM / DEVFREQ: add sysfs interface (including user tickling)
>
> OK, I'm going to take the patches for 3.2.

Have any other platforms signed up to use this mechanism to manage
their peripheral DVFS?

Thanks,
Mike

> Thanks,
> Rafael
>
>
>>  Documentation/ABI/testing/sysfs-devices-power |   50 ++
>>  Documentation/ABI/testing/sysfs-power         |   43 ++
>>  drivers/base/power/Makefile                   |    1 +
>>  drivers/base/power/devfreq.c                  |  714 +++++++++++++++++++++++++
>>  drivers/base/power/opp.c                      |    9 +
>>  include/linux/devfreq.h                       |  119 ++++
>>  kernel/power/Kconfig                          |   34 ++
>>  7 files changed, 970 insertions(+), 0 deletions(-)
>>  create mode 100644 drivers/base/power/devfreq.c
>>  create mode 100644 include/linux/devfreq.h
>>
>>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-07-29  4:46   ` Turquette, Mike
@ 2011-07-29  9:10     ` Rafael J. Wysocki
  2011-07-30  1:02       ` Turquette, Mike
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2011-07-29  9:10 UTC (permalink / raw)
  To: Turquette, Mike
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, MyungJoo Ham,
	Thomas Gleixner, linux-pm

On Friday, July 29, 2011, Turquette, Mike wrote:
> On Thu, Jul 28, 2011 at 3:10 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Friday, July 15, 2011, MyungJoo Ham wrote:
> >> For a usage example, please look at
> >> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
> >>
> >> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
> >> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
> >> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
> >> and other related clocks simply follow the determined DDR RAM clock.
> >>
> >> The DEVFREQ driver for Exynos4210 memory bus is at
> >> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
> >>
> >> MyungJoo Ham (3):
> >>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
> >>     OPPs
> >>   PM / DEVFREQ: add example governors
> >>   PM / DEVFREQ: add sysfs interface (including user tickling)
> >
> > OK, I'm going to take the patches for 3.2.
> 
> Have any other platforms signed up to use this mechanism to manage
> their peripheral DVFS?

Not that I know of, but one initial user is sufficient for me.
So if you have anything _against_ the patches, please speak up.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-07-29  9:10     ` Rafael J. Wysocki
@ 2011-07-30  1:02       ` Turquette, Mike
  2011-07-30 21:23         ` Rafael J. Wysocki
  2011-08-01  6:22         ` MyungJoo Ham
  0 siblings, 2 replies; 30+ messages in thread
From: Turquette, Mike @ 2011-07-30  1:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, MyungJoo Ham,
	Thomas Gleixner, linux-pm

On Fri, Jul 29, 2011 at 2:10 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Friday, July 29, 2011, Turquette, Mike wrote:
>> On Thu, Jul 28, 2011 at 3:10 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > On Friday, July 15, 2011, MyungJoo Ham wrote:
>> >> For a usage example, please look at
>> >> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>> >>
>> >> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
>> >> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
>> >> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
>> >> and other related clocks simply follow the determined DDR RAM clock.
>> >>
>> >> The DEVFREQ driver for Exynos4210 memory bus is at
>> >> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
>> >>
>> >> MyungJoo Ham (3):
>> >>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
>> >>     OPPs
>> >>   PM / DEVFREQ: add example governors
>> >>   PM / DEVFREQ: add sysfs interface (including user tickling)
>> >
>> > OK, I'm going to take the patches for 3.2.
>>
>> Have any other platforms signed up to use this mechanism to manage
>> their peripheral DVFS?
>
> Not that I know of, but one initial user is sufficient for me.
> So if you have anything _against_ the patches, please speak up.

I do have some concerns.  Let me start by saying that I'm defining a
"governor" as some active piece of executing code, probably a looping
workqueue that inspects activity/idleness of a device and then makes a
determination regarding clock frequency.

devfreq seems to be a good framework for creating DVFS governors.
However I think that most scalable devices on an SoC do *not* need a
governor, and many scalable devices won't have performance counters or
any other way to implement such introspection.

Some examples include an MMC controller, which might change its clock
rate depending on the class of card that the user has inserted.  Or
even a "smartish" device like a GPU lacking performance counters; its
driver will ramp up frequency when there is work to be done and kick
off a timeout.  If no new work comes in before the timeout then the
driver will drop the frequency.

A governor is not required in these cases (as they are event driven)
and devfreq is quite heavyweight for such applications.  What is
needed is a QoS-style software layer that allows throughput requests
to be made from an initiator device towards a target device.  This
layer should aggregate requests since many initiator devices may make
requests to the same target device.  This layer I'm describing, which
does not exist today, should be where the actual DVFS transition takes
place.  That could take the form of a clk_set_rate call in the clock
framework (as described by Colin in V1 of this series), or some other
not-yet-realized dvfs_set_opp, or something like Jean Pihet's
per-device PM QoS patches or whatever.  For the purposes of this email
I don't really care which framework implements the QoS request
aggregation.
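
Purely as an illustration of the aggregation idea, here is a userspace
sketch in which several initiators post minimum-frequency requests
against one target and the layer picks the most demanding one.  Every
name below (tput_target, tput_request_update, and so on) is
hypothetical; no such kernel API exists in this series.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQUESTS 8	/* arbitrary bound for the sketch */

/* Hypothetical: one throughput request from an initiator device. */
struct tput_request {
	int active;
	unsigned long min_khz;	/* lowest target frequency tolerated */
};

/* Hypothetical: a scalable target device and its pending requests. */
struct tput_target {
	unsigned long floor_khz;	/* used when nobody asks for more */
	struct tput_request req[MAX_REQUESTS];
};

/* Aggregate by taking the most demanding active request. */
unsigned long tput_aggregate(const struct tput_target *t)
{
	unsigned long khz;
	size_t i;

	assert(t != NULL);
	khz = t->floor_khz;
	for (i = 0; i < MAX_REQUESTS; i++)
		if (t->req[i].active && t->req[i].min_khz > khz)
			khz = t->req[i].min_khz;
	return khz;
}

/* Add or update the request in slot 'id'; returns the new aggregate. */
unsigned long tput_request_update(struct tput_target *t, size_t id,
				  unsigned long min_khz)
{
	assert(t != NULL && id < MAX_REQUESTS);
	t->req[id].active = 1;
	t->req[id].min_khz = min_khz;
	return tput_aggregate(t);
}

/* Drop the request in slot 'id'; returns the new aggregate. */
unsigned long tput_request_remove(struct tput_target *t, size_t id)
{
	assert(t != NULL && id < MAX_REQUESTS);
	t->req[id].active = 0;
	return tput_aggregate(t);
}
```

The return value of each call is the frequency the layer would then
program, which is where the actual DVFS transition would happen.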

The point of describing this nonexistent API is that devfreq should
really be just another input into it.  A governor that can measure bus
saturation is really cool, but it may not yield optimal results
compared to several drivers which make QoS-style requests and ensure
that performance is guaranteed for their particular needs during their
transactions.  The good news is that we don't have to choose between
performance counter introspection and software QoS requests: both the
driver requests and the governor should all feed as inputs into the
QoS-style DVFS mechanism.

Taking that logic to its inevitable conclusion, tickle doesn't belong
inside the governor at all.  If some device X wants to ramp up the
frequency of device Y, it should just make a QoS-style throughput
request towards device Y, possibly with a timeout (keeping the
original idea of tickle intact).  This is entirely a separate idea
from a governor's introspective workqueue loop.

For userspace, a sysfs entry for tickle would also not feed into the
governor, but some dummy struct device *user would probably be the
initiator device and it would simply call the QoS-style throughput
API.
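
A minimal sketch of tickle expressed as a timed request, again in
userspace C with hypothetical names (timed_request, effective_khz): the
governor's choice and any unexpired requests are combined, and an
expired tickle simply stops contributing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a throughput request that lapses at a given time. */
struct timed_request {
	unsigned long min_khz;
	unsigned long expires_ms;	/* ignored once now_ms >= expires_ms */
};

/*
 * Combine the governor's proposal with all unexpired requests; the
 * result is the highest frequency any of the inputs asks for.
 */
unsigned long effective_khz(unsigned long governor_khz,
			    const struct timed_request *reqs, size_t n,
			    unsigned long now_ms)
{
	unsigned long khz = governor_khz;
	size_t i;

	assert(reqs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (now_ms < reqs[i].expires_ms && reqs[i].min_khz > khz)
			khz = reqs[i].min_khz;
	return khz;
}
```

In this form the governor is just one more input, and a sysfs tickle
would amount to adding one timed_request on behalf of a dummy "user"
initiator.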

In summary my objections to this series are:
1) devfreq should not be the *final* software layer to invoke a DVFS
transition as it has not taken all constraints into account.
2) a devfreq governor represents just one constraint out of many to be
considered for any given scalable device.

My objection to these patches getting merged is that I think they are
a bit ahead of their time.  We need to know what the real DVFS API
looks like underneath devfreq first, since devfreq should really be
built on top of it.

Regards,
Mike

> Thanks,
> Rafael
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-07-30  1:02       ` Turquette, Mike
@ 2011-07-30 21:23         ` Rafael J. Wysocki
  2011-08-01 21:47           ` Turquette, Mike
  2011-08-01  6:22         ` MyungJoo Ham
  1 sibling, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2011-07-30 21:23 UTC (permalink / raw)
  To: Turquette, Mike
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, MyungJoo Ham,
	Thomas Gleixner, linux-pm

On Saturday, July 30, 2011, Turquette, Mike wrote:
> On Fri, Jul 29, 2011 at 2:10 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Friday, July 29, 2011, Turquette, Mike wrote:
> >> On Thu, Jul 28, 2011 at 3:10 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> > On Friday, July 15, 2011, MyungJoo Ham wrote:
> >> >> For a usage example, please look at
> >> >> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
> >> >>
> >> >> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
> >> >> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
> >> >> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
> >> >> and other related clocks simply follow the determined DDR RAM clock.
> >> >>
> >> >> The DEVFREQ driver for Exynos4210 memory bus is at
> >> >> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
> >> >>
> >> >> MyungJoo Ham (3):
> >> >>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
> >> >>     OPPs
> >> >>   PM / DEVFREQ: add example governors
> >> >>   PM / DEVFREQ: add sysfs interface (including user tickling)
> >> >
> >> > OK, I'm going to take the patches for 3.2.
> >>
> >> Have any other platforms signed up to use this mechanism to manage
> >> their peripheral DVFS?
> >
> > Not that I know of, but one initial user is sufficient for me.
> > So if you have anything _against_ the patches, please speak up.
> 
> I do have some concerns.  Let me start by saying that I'm defining a
> "governor" as some active piece of executing code, probably a looping
> workqueue that inspects activity/idleness of a device and then makes a
> determination regarding clock frequency.
> 
> devfreq seems to be a good framework for creating DVFS governors.
> However I think that most scalable devices on an SoC do *not* need a
> governor, and many scalable devices won't have performance counters or
> any other way to implement such introspection.

OK, so I'd like to see what the author of the patch series has to say
in the face of your comments below.

> Some examples include an MMC controller, which might change its clock
> rate depending on the class of card that the user has inserted.  Or
> even a "smartish" device like a GPU lacking performance counters; its
> driver will ramp up frequency when there is work to be done and kick
> off a timeout.  If no new work comes in before the timeout then the
> driver will drop the frequency.
> 
> A governor is not required in these cases (as they are event driven)
> and devfreq is quite heavyweight for such applications.  What is
> needed is a QoS-style software layer that allows throughput requests
> to be made from an initiator device towards a target device.  This
> layer should aggregate requests since many initiator devices may make
> requests to the same target device.  This layer I'm describing, which
> does not exist today, should be where the actual DVFS transition takes
> place.  That could take the form of a clk_set_rate call in the clock
> framework (as described by Colin in V1 of this series), or some other
> not-yet-realized dvfs_set_opp, or something like Jean Pihet's
> per-device PM QoS patches or whatever.  For the purposes of this email
> I don't really care which framework implements the QoS request
> aggregation.
> 
> The point of describing this nonexistent API is that devfreq should
> really be just another input into it.  A governor that can measure bus
> saturation is really cool, but it may not yield optimal results
> compared to several drivers which make QoS-style requests and ensure
> that performance is guaranteed for their particular needs during their
> transactions.  The good news is that we don't have to choose between
> performance counter introspection and software QoS requests: both the
> driver requests and the governor should all feed as inputs into the
> QoS-style DVFS mechanism.
> 
> Taking that logic to its inevitable conclusion, tickle doesn't belong
> inside the governor at all.  If some device X wants to ramp up the
> frequency of device Y, it should just make a QoS-style throughput
> request towards device Y, possibly with a timeout (keeping the
> original idea of tickle intact).  This is entirely a separate idea
> from a governor's introspective workqueue loop.
> 
> For userspace, a sysfs entry for tickle would also not feed into the
> governor, but some dummy struct device *user would probably be the
> initiator device and it would simply call the QoS-style throughput
> API.
> 
> In summary my objections to this series are:
> 1) devfreq should not be the *final* software layer to invoke a DVFS
> transition as it has not taken all constraints into account.
> 2) a devfreq governor represents just one constraint out of many to be
> considered for any given scalable device.
> 
> My objection to these patches getting merged is that I think they are
> a bit ahead of their time.

Still, we're merging quite a number of patches being ahead of their time.
The resulting code is then modified as we learn what's wrong with it or
how to improve it.

Why exactly do you think this approach will not work in this particular case?

> We need to know what the real DVFS API looks like underneath devfreq first,
> since devfreq should really be built on top of it.

Well, quite frankly, if we generally adopted that point of view, much of the
useful functionality we have in the kernel right now wouldn't be merged at all.

Think of the USB subsystem, for one example, that has been rewritten from
scratch no fewer than three times.

The code appears to be reasonably isolated and simple enough to merge.
There is a subsystem wanting to use it and I don't see anyone forcing
anybody else to adopt it.  If it's not suitable to you, you won't be using it,
plain and simple.  And if you come up with some better code to replace it,
I won't have any problems with taking that too.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-07-30  1:02       ` Turquette, Mike
  2011-07-30 21:23         ` Rafael J. Wysocki
@ 2011-08-01  6:22         ` MyungJoo Ham
  2011-08-01 22:01           ` Turquette, Mike
  1 sibling, 1 reply; 30+ messages in thread
From: MyungJoo Ham @ 2011-08-01  6:22 UTC (permalink / raw)
  To: Turquette, Mike
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

Hello.

On Sat, Jul 30, 2011 at 10:02 AM, Turquette, Mike <mturquette@ti.com> wrote:
> On Fri, Jul 29, 2011 at 2:10 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> On Friday, July 29, 2011, Turquette, Mike wrote:
>>> On Thu, Jul 28, 2011 at 3:10 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>>> > On Friday, July 15, 2011, MyungJoo Ham wrote:
>>> >> For a usage example, please look at
>>> >> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>>> >>
>>> >> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
>>> >> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
>>> >> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
>>> >> and other related clocks simply follow the determined DDR RAM clock.
>>> >>
>>> >> The DEVFREQ driver for Exynos4210 memory bus is at
>>> >> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
>>> >>
>>> >> MyungJoo Ham (3):
>>> >>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
>>> >>     OPPs
>>> >>   PM / DEVFREQ: add example governors
>>> >>   PM / DEVFREQ: add sysfs interface (including user tickling)
>>> >
>>> > OK, I'm going to take the patches for 3.2.
>>>
>>> Have any other platforms signed up to use this mechanism to manage
>>> their peripheral DVFS?
>>
>> Not that I know of, but one initial user is sufficient for me.
>> So if you have anything _against_ the patches, please speak up.
>
> I do have some concerns.  Let me start by saying that I'm defining a
> "governor" as some active piece of executing code, probably a looping
> workqueue that inspects activity/idleness of a device and then makes a
> determination regarding clock frequency.
>
> devfreq seems to be good framework for creating DVFS governors.
> However I think that most scalable devices on an SoC do *not* need a
> governor, and many scalable devices won't have performance counters or
> any other way to implement such introspection.

Yes. Except for the static or userspace-driven ones (such as
"performance", "powersave", and "userspace", although "userspace" is
not implemented for devfreq yet), governors run a looping workqueue
that inspects the activity/idleness of a device and determines its
frequency. However, the inspection is done through a callback provided
by each device, not directly by devfreq itself. Therefore, if there is
any way to measure the activity (not just performance counters; the
number of requests or function calls should be fine in many cases),
normal governors like "simple-ondemand" will work.
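
A minimal userspace sketch of that split, assuming a hypothetical
callback type and hypothetical names (usage_stat, get_usage_fn,
ondemand_pick, sample_get_usage); this is not the devfreq API, and the
threshold rule is only one plausible on-demand policy. The device
reports how busy it was, and the governor picks the lowest frequency
that would have kept utilization at or below up_pct percent.

```c
#include <stddef.h>

/* Hypothetical usage snapshot filled in by a device driver callback. */
struct usage_stat {
	unsigned long busy_time;     /* time the device was active */
	unsigned long total_time;    /* length of the sampling window */
	unsigned long cur_freq_khz;  /* frequency during the window */
};

/* Hypothetical callback type: the device measures itself. */
typedef int (*get_usage_fn)(struct usage_stat *stat);

/* Example driver callback; a real one might count requests instead. */
static struct usage_stat sample_window = { 30, 100, 400000 };

static int sample_get_usage(struct usage_stat *stat)
{
	*stat = sample_window;
	return 0;
}

/*
 * Return the lowest entry of freqs_khz[] (sorted ascending) that would
 * have served the measured load at no more than up_pct percent
 * utilization. Falls back to the highest frequency when no entry is
 * fast enough or when no measurement is available.
 */
unsigned long ondemand_pick(get_usage_fn get_usage,
			    const unsigned long *freqs_khz, size_t n,
			    unsigned int up_pct)
{
	struct usage_stat st;
	unsigned long long need;
	size_t i;

	if (n == 0)
		return 0;
	if (get_usage(&st) != 0 || st.total_time == 0)
		return freqs_khz[n - 1];

	/* freq * up_pct / 100 >= cur_freq * busy / total, without division */
	need = (unsigned long long)st.cur_freq_khz * 100 * st.busy_time;
	for (i = 0; i < n; i++) {
		unsigned long long have =
			(unsigned long long)freqs_khz[i] * up_pct * st.total_time;
		if (have >= need)
			return freqs_khz[i];
	}
	return freqs_khz[n - 1];
}
```

The governor never touches hardware counters itself; swapping
sample_get_usage for a request-counting callback changes nothing in
ondemand_pick.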

> Some examples include a MMC controller, which might change its clock
> rate depending on the class of card that the user has inserted.  Or
> even a "smartish" device like a GPU lacking performance counters; its
> driver will ramp up frequency when there is work to be done and kick
> off a timeout.  If no new work comes in before the timeout then the
> driver will drop the frequency.

In the "simple MMC controller w/o performance counter" case, there are
the following ways to use devfreq even if counting the number of
requests or function calls is not possible.

Method 1) Use the "userspace" governor and let a user process choose
the frequency based on the card class.
Method 2) Use any "reasonable" governor and let the device driver keep
only the "valid" frequencies enabled.
   As a rough example: if class < 6, disable freq > 40MHz; if
class < 10, disable freq > 80MHz; and so on. If we do not have
performance counters or any other mechanism to monitor the
activity, the "performance" governor along with a clock-gated MMC
driver will save enough power.
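
Method 2 could look roughly like the userspace sketch below. The
struct and function names (opp, mmc_apply_class, performance_pick) and
the frequency table used with it are hypothetical; only the class
thresholds come from the rough example above. The driver reacts to a
card-insertion event by enabling or disabling table entries, and a
"performance"-style governor simply takes the highest enabled one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one OPP table entry. */
struct opp {
	unsigned long freq_hz;
	bool enabled;
};

/* Cap implied by the card class (rough rule from the text); 0 = no cap. */
static unsigned long class_cap_hz(unsigned int card_class)
{
	if (card_class < 6)
		return 40000000UL;
	if (card_class < 10)
		return 80000000UL;
	return 0;
}

/* Driver side: called once per card insertion, so nothing has to poll. */
void mmc_apply_class(struct opp *table, size_t n, unsigned int card_class)
{
	unsigned long cap = class_cap_hz(card_class);
	size_t i;

	for (i = 0; i < n; i++)
		table[i].enabled = (cap == 0 || table[i].freq_hz <= cap);
}

/* "performance"-style choice: highest enabled frequency, 0 if none. */
unsigned long performance_pick(const struct opp *table, size_t n)
{
	unsigned long best = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (table[i].enabled && table[i].freq_hz > best)
			best = table[i].freq_hz;
	return best;
}
```

With a made-up table of 20, 40, 80, and 160MHz, a class 4 card ends up
at 40MHz and a class 10 card at 160MHz without any monitoring loop.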

For GPUs without anything to monitor the activities, we may do the
same as the MMC case.

However, the H/W I've got now (Exynos4210) has performance
counters (PPMU) for many blocks: 3D (MALI GPU), ACP, CAMIF, CPU, DMC0,
DMC1 (memory controllers), FSYS, IMAGE, LCD0, LCD1, MFC_L, MFC_R, TV,
LEFT_BUS, and RIGHT_BUS. I don't think Exynos4 is an exceptionally
fancy SoC (millions have already been sold for phones), and other mobile
SoCs (at least the flagship models) will have them very soon (or already
have them). Along with this patch, in the example at the git branch
link, we control the DMC0/DMC1 blocks.

> A governor is not required in these cases (as they are event driven)
> and devfreq is quite heavyweight for such applications.  What is
> needed is a QoS-style software layer that allows throughput requests
> to be made from an initiator device towards a target device.  This
> layer should aggregate requests since many initiator devices may make
> requests to the same target device.  This layer I'm describing, which
> does not exist today, should be where the actual DVFS transition takes
> place.  That could take the form of a clk_set_rate call in the clock
> framework (as described by Colin in V1 of this series), or some other
> not-yet-realized dvfs_set_opp, or something like Jean Pihet's
> per-device PM QoS patches or whatever.  For the purposes of this email
> I don't really care which framework implements the QoS request
> aggregation.

Such aggregation could also be done with governors. If a
governor-device pair does not want to poll, devfreq won't loop
unless some other governor-device pair wants it to. If the device is
event-driven, users may just "allow/disallow" frequencies with the OPP
framework and devfreq will choose a proper frequency with the given
governor for the device. If every device uses "static" or
"event-driven" governors such as powersave/performance/userspace,
there will be no polling/looping.

When a device is going to be directly controlled by userspace, we'll
need a "userspace" governor (the same as the userspace governor of
cpufreq).

If there is a QoS request for a devfreq-ed device, the request could
be made with OPP's frequency enable/disable. If a device must run at
400MHz or faster, all frequencies under 400MHz could simply be
disabled w/ OPP. Devfreq governors cannot override such
frequency enable/disable configurations.

However, if such QoS requests need delays (timers) like tickle, a
generalized tickle supplied with a frequency or a percentage of the
max frequency might work (e.g., tickle(dev, frequency, duration); ).
This generalized tickle would then hold the device at the requested
frequency or higher by disabling lower frequencies temporarily.
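
A rough userspace sketch of such a generalized tickle follows. The
names (tickle_state, tickle_freq, effective_freq) are hypothetical,
time is passed in explicitly instead of using timers, and the floor is
applied by clamping the governor's choice rather than by disabling
OPPs, which is a simplification of the idea above.

```c
#include <stdbool.h>

/* Hypothetical per-device tickle state. */
struct tickle_state {
	unsigned long floor_khz;   /* minimum frequency being held */
	unsigned long expires_ms;  /* absolute time at which the hold ends */
};

/*
 * Request at least freq_khz for duration_ms starting at now_ms.
 * Overlapping requests are merged conservatively: the higher floor and
 * the later expiry win, so a merged hold may last longer than either
 * caller asked for.
 */
void tickle_freq(struct tickle_state *t, unsigned long now_ms,
		 unsigned long freq_khz, unsigned long duration_ms)
{
	unsigned long expires = now_ms + duration_ms;
	bool active = now_ms < t->expires_ms;

	if (!active) {
		t->floor_khz = freq_khz;
		t->expires_ms = expires;
		return;
	}
	if (freq_khz > t->floor_khz)
		t->floor_khz = freq_khz;
	if (expires > t->expires_ms)
		t->expires_ms = expires;
}

/* Frequency actually applied: the governor's choice, raised to the floor. */
unsigned long effective_freq(const struct tickle_state *t,
			     unsigned long now_ms, unsigned long governor_khz)
{
	if (now_ms < t->expires_ms && governor_khz < t->floor_khz)
		return t->floor_khz;
	return governor_khz;
}
```

For instance, after tickle_freq(&t, 1000, 400000, 50), a governor asking
for 266000 at time 1010 gets 400000, while the same request at time
1050 passes through unchanged.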

>
> The point of describing this nonexistent API is that devfreq should
> really be just another input into it.  A governor that can measure bus
> saturation is really cool, but it may not yield optimal results
> compared to several drivers which make QoS-style requests and ensure
> that performance is guaranteed for their particular needs during their
> transactions.  The good news is that we don't have to choose between
> performance counter introspection and software QoS requests: both the
> driver requests and the governor should all feed as inputs into the
> QoS-style DVFS mechanism.
>
> Taking that logic to its inevitable conclusion, tickle doesn't belong
> inside the governor at all.  If some device X wants to ramp up the
> frequency of device Y, it should just make a QoS-style throughput
> request towards device Y, possibly with a timeout (keeping the
> original idea of tickle intact).  This is entirely a separate idea
> from a governor's introspective workqueue loop.

Although tickle shares the same loop with governors, tickle does
not belong inside governors. Tickle overrides the decisions of
governors; the governor's decision function is not called while the
device is being tickled. However, generalizing the tickle function so
that it may take "at least xx % of max frequency" or "operate at
xx kHz or higher" as an option seems reasonable for QoS requests. Such
options might be implemented in a later version of devfreq. This
requires modifying the tickle function interface or adding another
interface for tickle. However, if such QoS requests do not need a
duration, we can just go with OPP's frequency enable/disable and
disable the frequencies lower than the QoS requirement.

Thus, I guess this QoS issue is not very significant for
devfreq, and it can be easily mitigated by adding another interface or
modifying the interface of the tickle function.

>
> For userspace, a sysfs entry for tickle would also not feed into the
> governor, but some dummy struct device *user would probably be the
> initiator device and it would simply call the QoS-style throughput
> API.
>
> In summary my objections to this series are:
> 1) devfreq should not be the *final* software layer to invoke a DVFS
> transition as it has not taken all constraints into account.
> 2) a devfreq governor represents just one constraint out of many to be
> considered for any given scalable device.

If the concern is about the QoS requests, I guess generalizing tickle
as above would be sufficient. For devices without performance counters
or any other mechanism to infer usage statistics, the "performance"
governor with event-driven OPP freq-enable/disable should be fine.

>
> My objection to these patches getting merged is that I think they are
> a bit ahead of their time.  We need to know what the real DVFS API
> looks like underneath devfreq first, since devfreq should really be
> built on top of it.
>
> Regards,
> Mike
>
>> Thanks,
>> Rafael
>>
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm
>

Cheers!
MyungJoo.


-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-07-30 21:23         ` Rafael J. Wysocki
@ 2011-08-01 21:47           ` Turquette, Mike
  0 siblings, 0 replies; 30+ messages in thread
From: Turquette, Mike @ 2011-08-01 21:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, MyungJoo Ham,
	Thomas Gleixner, linux-pm

On Sat, Jul 30, 2011 at 2:23 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Saturday, July 30, 2011, Turquette, Mike wrote:
>> On Fri, Jul 29, 2011 at 2:10 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > On Friday, July 29, 2011, Turquette, Mike wrote:
>> >> On Thu, Jul 28, 2011 at 3:10 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> >> > On Friday, July 15, 2011, MyungJoo Ham wrote:
>> >> >> For a usage example, please look at
>> >> >> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>> >> >>
>> >> >> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
>> >> >> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
>> >> >> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
>> >> >> and other related clocks simply follow the determined DDR RAM clock.
>> >> >>
>> >> >> The DEVFREQ driver for Exynos4210 memory bus is at
>> >> >> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
>> >> >>
>> >> >> MyungJoo Ham (3):
>> >> >>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
>> >> >>     OPPs
>> >> >>   PM / DEVFREQ: add example governors
>> >> >>   PM / DEVFREQ: add sysfs interface (including user tickling)
>> >> >
>> >> > OK, I'm going to take the patches for 3.2.
>> >>
>> >> Have any other platforms signed up to use this mechanism to manage
>> >> their peripheral DVFS?
>> >
>> > Not that I know of, but one initial user is sufficient for me.
>> > So if you have anything _against_ the patches, please speak up.
>>
>> I do have some concerns.  Let me start by saying that I'm defining a
>> "governor" as some active piece of executing code, probably a looping
>> workqueue that inspects activity/idleness of a device and then makes a
>> determination regarding clock frequency.
>>
>> devfreq seems to be good framework for creating DVFS governors.
>> However I think that most scalable devices on an SoC do *not* need a
>> governor, and many scalable devices won't have performance counters or
>> any other way to implement such introspection.
>
> OK, so I'd like to see what the author of the patch series has to say
> in the face of your comments below.
>
>> Some examples include a MMC controller, which might change its clock
>> rate depending on the class of card that the user has inserted.  Or
>> even a "smartish" device like a GPU lacking performance counters; its
>> driver will ramp up frequency when there is work to be done and kick
>> off a timeout.  If no new work comes in before the timeout then the
>> driver will drop the frequency.
>>
>> A governor is not required in these cases (as they are event driven)
>> and devfreq is quite heavyweight for such applications.  What is
>> needed is a QoS-style software layer that allows throughput requests
>> to be made from an initiator device towards a target device.  This
>> layer should aggregate requests since many initiator devices may make
>> requests to the same target device.  This layer I'm describing, which
>> does not exist today, should be where the actual DVFS transition takes
>> place.  That could take the form of a clk_set_rate call in the clock
>> framework (as described by Colin in V1 of this series), or some other
>> not-yet-realized dvfs_set_opp, or something like Jean Pihet's
>> per-device PM QoS patches or whatever.  For the purposes of this email
>> I don't really care which framework implements the QoS request
>> aggregation.
>>
>> The point of describing this nonexistent API is that devfreq should
>> really be just another input into it.  A governor that can measure bus
>> saturation is really cool, but it may not yield optimal results
>> compared to several drivers which make QoS-style requests and ensure
>> that performance is guaranteed for their particular needs during their
>> transactions.  The good news is that we don't have to choose between
>> performance counter introspection and software QoS requests: both the
>> driver requests and the governor should all feed as inputs into the
>> QoS-style DVFS mechanism.
>>
>> Taking that logic to its inevitable conclusion, tickle doesn't belong
>> inside the governor at all.  If some device X wants to ramp up the
>> frequency of device Y, it should just make a QoS-style throughput
>> request towards device Y, possibly with a timeout (keeping the
>> original idea of tickle intact).  This is entirely a separate idea
>> from a governor's introspective workqueue loop.
>>
>> For userspace, a sysfs entry for tickle would also not feed into the
>> governor, but some dummy struct device *user would probably be the
>> initiator device and it would simply call the QoS-style throughput
>> API.
>>
>> In summary my objections to this series are:
>> 1) devfreq should not be the *final* software layer to invoke a DVFS
>> transition as it has not taken all constraints into account.
>> 2) a devfreq governor represents just one constraint out of many to be
>> considered for any given scalable device.
>>
>> My objection to these patches getting merged is that I think they are
>> a bit ahead of their time.
>
> Still, we're merging quite a number of patches being ahead of their time.
> The resulting code is then modified as we learn what's wrong with it or
> how to improve it.
>
> Why exactly do you think this approach will not work in this particular case?

I shouldn't have objected to the patches being merged, because I think
they are OK for a subset of the bigger DVFS problem.  The "governor"
method seems fine for devices with performance counters and whose
drivers do not express performance constraints.

>> We need to know what the real DVFS API looks like underneath devfreq first,
>> since devfreq should really be built on top of it.
>
> Well, quite frankly, if we generally adopted that point of view, much of the
> useful functionality we have in the kernel right now wouldn't be merged at all.

Yes yes, I got ahead of myself; my real goal is to restart the
discussion that popped up in the V1 patchset regarding some API for
handling clock/voltage scaling for DVFS transitions.  However it
doesn't mean that these patches need to be blocked.

I do have an overall concern about the approach mentioned by MyungJoo
where drivers start enabling/disabling OPPs and then the governors
compensate by raising frequency/voltage the next time their workqueues
loop around.  That seems entirely backwards for devices that express
performance needs on an event-driven basis, so my concerns are more
for how these patches *might* be used in the future, and less about
how they look right now.

We still don't have a good way to manage DVFS transitions and
aggregate QoS requests for an SoC.  My hope is when that problem gets
solved devfreq will use those new APIs in its ->target function.  I
also hope that too many drivers won't start using "tickle" as a
substitute for a real DVFS API.

Regards,
Mike

> Think of the USB subsystem, for one example, that has been rewritten from
> scratch no fewer than three times.
>
> The code appears to be reasonably isolated and simple enough to merge.
> There is a subsystem wanting to use it and I don't see anyone forcing
> anybody else to adopt it.  If it's not suitable to you, you won't be using it,
> plain and simple.  And if you come up with some better code to replace it,
> I won't have any problems with taking that too.
>
> Thanks,
> Rafael
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-01  6:22         ` MyungJoo Ham
@ 2011-08-01 22:01           ` Turquette, Mike
  2011-08-02  7:17             ` MyungJoo Ham
  0 siblings, 1 reply; 30+ messages in thread
From: Turquette, Mike @ 2011-08-01 22:01 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

On Sun, Jul 31, 2011 at 11:22 PM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
> Hello.
>
> On Sat, Jul 30, 2011 at 10:02 AM, Turquette, Mike <mturquette@ti.com> wrote:
>> On Fri, Jul 29, 2011 at 2:10 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>>> On Friday, July 29, 2011, Turquette, Mike wrote:
>>>> On Thu, Jul 28, 2011 at 3:10 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>>>> > On Friday, July 15, 2011, MyungJoo Ham wrote:
>>>> >> For a usage example, please look at
>>>> >> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>>>> >>
>>>> >> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
>>>> >> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
>>>> >> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
>>>> >> and other related clocks simply follow the determined DDR RAM clock.
>>>> >>
>>>> >> The DEVFREQ driver for Exynos4210 memory bus is at
>>>> >> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
>>>> >>
>>>> >> MyungJoo Ham (3):
>>>> >>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
>>>> >>     OPPs
>>>> >>   PM / DEVFREQ: add example governors
>>>> >>   PM / DEVFREQ: add sysfs interface (including user tickling)
>>>> >
>>>> > OK, I'm going to take the patches for 3.2.
>>>>
>>>> Have any other platforms signed up to use this mechanism to manage
>>>> their peripheral DVFS?
>>>
>>> Not that I know of, but one initial user is sufficient for me.
>>> So if you have anything _against_ the patches, please speak up.
>>
>> I do have some concerns.  Let me start by saying that I'm defining a
>> "governor" as some active piece of executing code, probably a looping
>> workqueue that inspects activity/idleness of a device and then makes a
>> determination regarding clock frequency.
>>
>> devfreq seems to be good framework for creating DVFS governors.
>> However I think that most scalable devices on an SoC do *not* need a
>> governor, and many scalable devices won't have performance counters or
>> any other way to implement such introspection.
>
> Yes, governors except for some static or userspace-driven ones (such
> as "performance", "powersave", and "userspace" although "userspace" is
> not implemented for devfreq yet), they loop workqueue that inspects
> activity/idleness of a device and determines frequency. However, the
> inspection is done with a callback provided by each device, not done
> directly by the devfreq itself. Therefore, if there is any way to
> measure the activities (not just performance counters, number of
> requests/function calls should be fine for many cases), normal
> governors like "simple-ondemand" will work.

Maybe I'm not understanding how the devfreq requests would be made
from drivers.  Can you explain an example where a single target device
named X has constraints placed on its clock rate from two different
drivers Y & Z?  Imagine in this case that there are no performance
counters or any way in hardware to monitor device saturation.

>> Some examples include a MMC controller, which might change its clock
>> rate depending on the class of card that the user has inserted.  Or
>> even a "smartish" device like a GPU lacking performance counters; its
>> driver will ramp up frequency when there is work to be done and kick
>> off a timeout.  If no new work comes in before the timeout then the
>> driver will drop the frequency.
>
> In the "simple MMC controller w/o performance counter" case, there are
> following ways to use devfreq even if using the number of requests or
> functions calls is not possible.
>
> Method 1) use "userspace" governor and let user process choose
> frequency based on the class

I'm less interested in userspace control of MMC controller operating
frequency and much more interested in how devfreq might arbitrate QoS
requests from multiple "client" devices.

> Method 2) use any "reasonable" governor and let the device driver set
> only "valid" frequencies enabled.

Can you elaborate on this?  I'm not sure I understand how this will
look in driver code.  Maybe the example I requested above will shed
some light.

>   For a rough example, we may do if class < 6, disable freq > 40MHz,
> class < 10, disable freq > 80MHz, and so on. If we do not have
> performance counters or any other mechanisms to monitor the
> activities, "performance" governor along with clock-gated MMC driver
> will save enough power.
>
> For GPUs without anything to monitor the activities, we may do the
> same as the MMC case.
>
> However, with the H/W I've got now, (Exynos4210), we have performance
> counters (PPMU) for many blocks: 3D(MALI GPU), ACP, CAMIF, CPU, DMC0,
> DMC1 (memory controllers), FSYS, IMAGE, LCD0, LCD1, MFC_L, MFC_R, TV,
> LEFT_BUS, and RIGHT_BUS. I don't think Exynos4 is an exceptionally
> fancy SoC (already millions are sold for phones) and other mobile SoCs
> (at least for flagship models) will have them very soon (or already
> have them). Along with this patch, in the example with git branch
> link, we control DMC0/DMC1 blocks. And,

I agree devfreq is well-suited for such hardware.

>> A governor is not required in these cases (as they are event driven)
>> and devfreq is quite heavyweight for such applications.  What is
>> needed is a QoS-style software layer that allows throughput requests
>> to be made from an initiator device towards a target device.  This
>> layer should aggregate requests since many initiator devices may make
>> requests to the same target device.  This layer I'm describing, which
>> does not exist today, should be where the actual DVFS transition takes
>> place.  That could take the form of a clk_set_rate call in the clock
>> framework (as described by Colin in V1 of this series), or some other
>> not-yet-realized dvfs_set_opp, or something like Jean Pihet's
>> per-device PM QoS patches or whatever.  For the purposes of this email
>> I don't really care which framework implements the QoS request
>> aggregation.
>
> Such aggregation could be also done with governors. If the
> governor-device pair does not want to poll devfreq wouldn't loop
> unless there is any governor-device pair that wants to do so. If it is
> event-driven, users may just "allow/disallow" frequencies with OPP
> framework and devfreq will choose proper frequency with the given
> governor for the device. If every device uses "static" or
> "event-driven" governors such as powersave/performance/userspace,
> there will be no polling/looping.

So drivers must disable OPPs, and then the non-polling devfreq
governor will have to be notified by the OPP code and then run its
->target code again?  This sounds backwards to me.

devfreq seems like an ideal bit of code to understand the constraints
needed by a device (via the workqueue/monitor loop) and then request
those needs via the proper API.  It seems entirely wrong to me to have
other device drivers send their QoS needs to devfreq.

I'm starting to sound like a broken record though, and I've rescinded
my NAK in my reply to Rafael.  If you could explain how multiple
drivers can request their performance needs to a devfreq governor
(same question I asked above) then that would be really helpful.

Thanks,
Mike

> When it is going to be directly controlled by userspace, we'll need a
> "userspace" governor (same with userspace governor of cpufreq).
>
> If there is a QoS request for a devfreq-ed device, the request could
> be done with OPP's frequency enable/disable. If a device is to be
> executed at 400MHz or faster, all frequencies under 400MHz could be
> simply disabled w/ OPP. Devfreq governors cannot override such
> frequency enable/disable configurations.
>
> However, if such QoS requests need delays (timers) like tickle, a
> generalized tickle supplied with frequency or percent of max-frequency
> might work. (i.e., tickle(dev, frequency, duration); ) Then, this
> generalized tickle will hold at the request frequency or higher by
> disabling lower frequencies temporarily.
>
>>
>> The point of describing this nonexistent API is that devfreq should
>> really be just another input into it.  A governor that can measure bus
>> saturation is really cool, but it may not yield optimal results
>> compared to several drivers which make QoS-style requests and ensure
>> that performance is guaranteed for their particular needs during their
>> transactions.  The good news is that we don't have to choose between
>> performance counter introspection and software QoS requests: both the
>> driver requests and the governor should all feed as inputs into the
>> QoS-style DVFS mechanism.
>>
>> Taking that logic to its inevitable conclusion, tickle doesn't belong
>> inside the governor at all.  If some device X wants to ramp up the
>> frequency of device Y, it should just make a QoS-style throughput
>> request towards device Y, possibly with a timeout (keeping the
>> original idea of tickle intact).  This is entirely a separate idea
>> from a governor's introspective workqueue loop.
>
> Although tickle is sharing the same loop with governors, tickle does
> not belong inside governors. Tickle overrides the decisions of
> governors; governor's decision function is not called if the device is
> being tickled. However, generalizing the tickle function so that it
> may take "at least at xx % of max frequency" or "operate at least xx
> khz" as an option seems reasonable for QoS requests. And such options
> might be implemented for next version of devfreq later. This requires
> modification in tickle function interface or adding another interface
> for tickle function. However, if such QoS requests do not need
> duration set, we can just go with OPP's frequency enable/disable and
> disable lower-than-QoS-requirement frequencies.
>
> Thus, I guess this QoS issue is somewhat not very significant for
> devfreq. And it can be easily mitigated by adding another interface or
> modifying the interface of tickle function.
>
>>
>> For userspace, a sysfs entry for tickle would also not feed into the
>> governor, but some dummy struct device *user would probably be the
>> initiator device and it would simply call the QoS-style throughput
>> API.
>>
>> In summary my objections to this series are:
>> 1) devfreq should not be the *final* software layer to invoke a DVFS
>> transition as it has not taken all constraints into account.
>> 2) a devfreq governor represents just one constraint out of many to be
>> considered for any given scalable device.
>
> If the concern is about the QoS requests, I guess generalizing tickle
> would be sufficient as above. For devices without performance counters
> and any other mechanisms to infer the usage statistics, "performance"
> governor with event-driven OPP freq-enable/disable should be fine.
>
>>
>> My objection to these patches getting merged is that I think they are
>> a bit ahead of their time.  We need to know what the real DVFS API
>> looks like underneath devfreq first, since devfreq should really be
>> built on top of it.
>>
>> Regards,
>> Mike
>>
>>> Thanks,
>>> Rafael
>>>
>>
>
> Cheers!
> MyungJoo.
>
>
> --
> MyungJoo Ham (함명주), Ph.D.
> Mobile Software Platform Lab,
> Digital Media and Communications (DMC) Business
> Samsung Electronics
> cell: 82-10-6714-2858
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-01 22:01           ` Turquette, Mike
@ 2011-08-02  7:17             ` MyungJoo Ham
  0 siblings, 0 replies; 30+ messages in thread
From: MyungJoo Ham @ 2011-08-02  7:17 UTC (permalink / raw)
  To: Turquette, Mike
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

On Tue, Aug 2, 2011 at 7:01 AM, Turquette, Mike <mturquette@ti.com> wrote:
>
> Maybe I'm not understanding how the devfreq requests would be made
> from drivers.  Can you explain an example where a single target device
> named X has constraints placed on it's clock rate from two different
> drivers Y & Z?  Imagine in this case that there are no performance
> counters or any way in hardware to monitor device saturation.

Ok, what you want to see is the case where X has a clock with OPP and
DEVFREQ and Y and Z are going to give constraints on that X's clock,
right?

In such a case, DEVFREQ does not interfere directly with the
relation between X <--> Y/Z.

Y and Z can put constraints on X's clock through the OPP interface
(opp_enable(dev, freq) and opp_disable(dev, freq)) without needing to
be DEVFREQ-aware.

DEVFREQ chooses frequencies from the enabled OPPs regardless of the
governor chosen.

However, if your concern is about the inconsistency between Y and Z
caused by one calling opp_enable and thereby "cancelling" the other's
opp_disable, DEVFREQ provides no protection against it, and the writers
of Y/Z will need to do some bothersome work unless a QoS request feature
(or generalized tickle) is added to DEVFREQ. Anyway, (as will be
discussed below) I guess the QoS request feature might wait for the next
version of DEVFREQ.
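
The inconsistency can be reproduced in a few lines of userspace C. The
sketch below assumes an OPP's availability behaves like a single
on/off flag from the callers' point of view; all names (opp_flag,
opp_counted, and their helpers) are hypothetical. It contrasts the flag
with a counted variant in which an OPP stays disabled until every
requester has re-enabled it.

```c
#include <stdbool.h>

/* Model 1: a single availability flag shared by all callers. */
struct opp_flag {
	bool enabled;
};

void flag_disable(struct opp_flag *o) { o->enabled = false; }
void flag_enable(struct opp_flag *o)  { o->enabled = true; }

/* Model 2 (hypothetical): count the outstanding disable requests. */
struct opp_counted {
	unsigned int disable_count;
};

void counted_disable(struct opp_counted *o)
{
	o->disable_count++;
}

void counted_enable(struct opp_counted *o)
{
	if (o->disable_count > 0)
		o->disable_count--;
}

bool counted_is_enabled(const struct opp_counted *o)
{
	return o->disable_count == 0;
}
```

With the flag model, if Y and Z both disable 133MHz and Y later
re-enables it, the OPP is available again even though Z still needs it
disabled. With the counted model the OPP stays disabled until Z also
releases it, which is the kind of bookkeeping a QoS request feature
would take off the drivers' hands.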

>
>>> Some examples include a MMC controller, which might change its clock
>>> rate depending on the class of card that the user has inserted.  Or
>>> even a "smartish" device like a GPU lacking performance counters; its
>>> driver will ramp up frequency when there is work to be done and kick
>>> off a timeout.  If no new work comes in before the timeout then the
>>> driver will drop the frequency.
>>
>> In the "simple MMC controller w/o performance counter" case, there are
>> the following ways to use devfreq even if using the number of requests
>> or function calls is not possible.
>>
>> Method 1) use "userspace" governor and let user process choose
>> frequency based on the class
>
> I'm less interested in userspace control of MMC controller operating
> frequency and much more interested in how devfreq might arbitrate QoS
> requests from multiple "client" devices.
>
>> Method 2) use any "reasonable" governor and let the device driver set
>> only "valid" frequencies enabled.
>
> Can you elaborate on this?  I'm not sure I understand how this will
> look in driver code.  Maybe the example I requested above will shed
> some light.

If you are concerned about the consistency (between Y and Z's
enable/disable calls) problem in the previous X/Y/Z example, it'd be
addressed with "generalized tickle" or "QoS requests" unless those Y
and Z are aware of each other. I'd say such a feature is for the
"next" version of DEVFREQ as it is not going to affect the framework
itself significantly.

Anyway, the interfaces I'm thinking about are:
Method1:
   /* sets dev's frequency at freq or higher */
   id = devfreq_qos_request(dev, freq);
   devfreq_qos_release(dev, id);
OR
   /* this_dev sets target_dev's frequency at freq or higher */
   devfreq_qos_request(this_dev, target_dev, freq);
   devfreq_qos_release(this_dev, target_dev);

Method1 would be suitable for usual QoS requests from related devices.
If multiple QoS requests are active, the highest requested freq is
used.
Internally, devfreq will manage a list (sorted by freq, descending) of
"this_dev" or "id" entries per target_dev and enforce that the target
freq is at least the highest freq in the list.
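
The bookkeeping for Method1 could look roughly like the following
user-space sketch. It is only an illustration under assumptions: a fixed
size array stands in for the per-target list, and the names toy_qos,
toy_qos_request, toy_qos_release and toy_qos_floor are hypothetical, not
the proposed devfreq_qos_* functions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_REQ 8

/* Hypothetical per-target request table; names are illustrative only. */
struct toy_qos {
	unsigned long freq[TOY_MAX_REQ]; /* kept sorted, highest first */
	int id[TOY_MAX_REQ];
	size_t n;
	int next_id;
};

/* Add a request; returns its id, or -1 if the table is full. */
static int toy_qos_request(struct toy_qos *q, unsigned long freq)
{
	size_t i, pos = 0;

	assert(q != NULL);
	if (q->n == TOY_MAX_REQ)
		return -1;
	/* find the first slot whose frequency is lower than the new one */
	while (pos < q->n && q->freq[pos] >= freq)
		pos++;
	for (i = q->n; i > pos; i--) {
		q->freq[i] = q->freq[i - 1];
		q->id[i] = q->id[i - 1];
	}
	q->freq[pos] = freq;
	q->id[pos] = q->next_id;
	q->n++;
	return q->next_id++;
}

/* Drop the request with the given id; returns 0 on success, -1 if unknown. */
static int toy_qos_release(struct toy_qos *q, int id)
{
	size_t i;

	assert(q != NULL);
	for (i = 0; i < q->n; i++) {
		if (q->id[i] != id)
			continue;
		/* close the gap left by the released entry */
		for (; i + 1 < q->n; i++) {
			q->freq[i] = q->freq[i + 1];
			q->id[i] = q->id[i + 1];
		}
		q->n--;
		return 0;
	}
	return -1;
}

/* Frequency floor to enforce: head of the list, or 0 with no requests. */
static unsigned long toy_qos_floor(const struct toy_qos *q)
{
	return q->n ? q->freq[0] : 0;
}
```

Keeping the list sorted makes reading the floor trivial, at the cost of a
linear insert and release, which should be acceptable for the handful of
requesters a device is likely to have.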

Method2:
   /*
    * sets dev's frequency at its maximum frequency * rate / 100
    * for duration in ms
    */
   devfreq_tickle(dev, rate, duration);

Method2 would be suitable for reacting to inputs (e.g., a user hitting
a key, clicking a mouse, touching a screen, ...).
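
As a rough illustration of Method2, the sketch below keeps a frequency
floor of max_freq * rate / 100 until a deadline passes. It is a user-space
model under assumptions: time is passed in explicitly in milliseconds, and
the names toy_tickle, toy_tickle_start and toy_tickle_floor are
hypothetical.

```c
#include <assert.h>

/* Hypothetical tickle state; field names are illustrative only. */
struct toy_tickle {
	unsigned long max_freq;   /* Hz */
	unsigned long floor_freq; /* Hz, meaningful while the tickle is active */
	unsigned long expires_ms; /* absolute time at which the tickle ends */
};

/* Hold the device at max_freq * rate / 100 for duration_ms from now_ms. */
static void toy_tickle_start(struct toy_tickle *t, unsigned int rate,
			     unsigned long now_ms, unsigned long duration_ms)
{
	assert(rate <= 100);
	/* divide first so that large frequencies cannot overflow */
	t->floor_freq = t->max_freq / 100 * rate;
	t->expires_ms = now_ms + duration_ms;
}

/* Frequency floor at time now_ms; 0 once the tickle has expired. */
static unsigned long toy_tickle_floor(const struct toy_tickle *t,
				      unsigned long now_ms)
{
	return now_ms < t->expires_ms ? t->floor_freq : 0;
}
```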



>
>>   As a rough example: if class < 6, disable freq > 40MHz; if
>> class < 10, disable freq > 80MHz; and so on. If we do not have
>> performance counters or any other mechanism to monitor the
>> activities, the "performance" governor along with a clock-gated MMC
>> driver will save enough power.
>>
>> For GPUs without anything to monitor the activities, we may do the
>> same as the MMC case.
>>
>> However, with the H/W I've got now, (Exynos4210), we have performance
>> counters (PPMU) for many blocks: 3D(MALI GPU), ACP, CAMIF, CPU, DMC0,
>> DMC1 (memory controllers), FSYS, IMAGE, LCD0, LCD1, MFC_L, MFC_R, TV,
>> LEFT_BUS, and RIGHT_BUS. I don't think Exynos4 is an exceptionally
>> fancy SoC (already millions are sold for phones) and other mobile SoCs
>> (at least for flagship models) will have them very soon (or already
>> have them). Along with this patch, in the example with git branch
>> link, we control DMC0/DMC1 blocks. And,
>
> I agree devfreq is well-suited for such hardware.
>
>>> A governor is not required in these cases (as they are event driven)
>>> and devfreq is quite heavyweight for such applications.  What is
>>> needed is a QoS-style software layer that allows throughput requests
>>> to be made from an initiator device towards a target device.  This
>>> layer should aggregate requests since many initiator devices may make
>>> requests to the same target device.  This layer I'm describing, which
>>> does not exist today, should be where the actual DVFS transition takes
>>> place.  That could take the form of a clk_set_rate call in the clock
>>> framework (as described by Colin in V1 of this series), or some other
>>> not-yet-realized dvfs_set_opp, or something like Jean Pihet's
>>> per-device PM QoS patches or whatever.  For the purposes of this email
>>> I don't really care which framework implements the QoS request
>>> aggregation.
>>
>> Such aggregation could also be done with governors. If the
>> governor-device pair does not want to poll, devfreq wouldn't loop
>> unless there is some governor-device pair that wants to do so. If it is
>> event-driven, users may just "allow/disallow" frequencies with OPP
>> framework and devfreq will choose proper frequency with the given
>> governor for the device. If every device uses "static" or
>> "event-driven" governors such as powersave/performance/userspace,
>> there will be no polling/looping.
>
> So drivers must disable OPPs, and then the non-polling devfreq
> governor will have to be notified by the OPP code and then run its
> ->target code again?  This sounds backwards to me.

DEVFREQ (not its governors; governors only "recommend" a proper
frequency to the DEVFREQ framework when DEVFREQ asks for one) is already
notified of any OPP change (add/disable/enable) so that DEVFREQ
won't choose disabled frequencies. That way, disabling and enabling
frequencies at OPP takes effect immediately in DEVFREQ.

A more semantically sound approach may be to let OPP have a notifier
(per device) so that changes in OPP availability go to the
"OPP consumers". However, at least for now, DEVFREQ is the only one
that needs such notification. Therefore, using such a notifier per
device only for DEVFREQ (moreover, not all OPP'ed devices use
DEVFREQ) could incur too much overhead, as a notifier is heavier than a
simple function call.

>
> devfreq seems like an ideal bit of code to understand the constraints
> needed by a device (via the workqueue/monitor loop) and then request
> those needs via the proper API.  It seems entirely wrong to me to have
> other device drivers send their QoS needs to devfreq.

Tickle is an approach for temporal QoS requests. And, I understand
that there is a need for non-temporal QoS requests. However, I guess
it might be OK to leave QoS requests as a "next TODO" item for
DEVFREQ. Besides, some engineers on my side have already asked for a
QoS request feature for DEVFREQ as well. :)

>
> I'm starting to sound like a broken record though, and I've rescinded
> my NAK in my reply to Rafael.  If you could explain how multiple
> drivers can request their performance needs to a devfreq governor
> (same question I asked above) then that would be really helpful.

Without the QoS request feature (Method1 above), using
opp_enable/disable is the only feasible way unless tickle fits the
need. (Managing "canceling disable" could be bothersome without
Method1 anyway...)

>
> Thanks,
> Mike
>
-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-07-15  8:11 ` [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs MyungJoo Ham
@ 2011-08-02 18:45   ` Kevin Hilman
  2011-08-03  8:06     ` MyungJoo Ham
  2011-08-02 21:56   ` Kevin Hilman
  1 sibling, 1 reply; 30+ messages in thread
From: Kevin Hilman @ 2011-08-02 18:45 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, linux-pm, Greg Kroah-Hartman, Thomas Gleixner, Kyungmin Park

MyungJoo Ham <myungjoo.ham@samsung.com> writes:

> With OPPs, a device may have multiple operable frequency and voltage
> sets. However, there can be multiple possible operable sets and a system
> will need to choose one from them. In order to reduce the power
> consumption (by reducing frequency and voltage) without affecting the
> performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
> scheme may be used.
>
> This patch introduces the DVFS capability to non-CPU devices with OPPs.
> DVFS is a technique whereby the frequency and supplied voltage of a
> device is adjusted on-the-fly. DVFS usually sets the frequency as low
> as possible with given conditions (such as QoS assurance) and adjusts
> voltage according to the chosen frequency in order to reduce power
> consumption and heat dissipation.
>
> The generic DVFS for devices, DEVFREQ, may appear quite similar with
> /drivers/cpufreq.  However, CPUFREQ does not allow to have multiple
> devices registered and is not suitable to have multiple heterogenous
> devices with different (but simple) governors.
>
> Normally, DVFS mechanism controls frequency based on the demand for
> the device, and then, chooses voltage based on the chosen frequency.
> DEVFREQ also controls the frequency based on the governor's frequency
> recommendation and let OPP pick up the pair of frequency and voltage
> based on the recommended frequency. Then, the chosen OPP is passed to
> device driver's "target" callback.
>
> Tested with memory bus of Exynos4-NURI board.
>
> The test code with board support for Exynos4-NURI is at
> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>
> --
> Thank you for your valuable comments, Rafael, Greg, Pavel, and Colin.
>
> Changed from v3
> - In kerneldoc comments, DEVFREQ has been replaced by devfreq

FYI... there are still lots of kerneldoc comments in this version with
DEVFREQ instead of devfreq, particularly in devfreq.h.

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-07-15  8:11 ` [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs MyungJoo Ham
  2011-08-02 18:45   ` Kevin Hilman
@ 2011-08-02 21:56   ` Kevin Hilman
  2011-08-03  6:02     ` MyungJoo Ham
  1 sibling, 1 reply; 30+ messages in thread
From: Kevin Hilman @ 2011-08-02 21:56 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, linux-pm, Greg Kroah-Hartman, Thomas Gleixner, Kyungmin Park

MyungJoo Ham <myungjoo.ham@samsung.com> writes:

> With OPPs, a device may have multiple operable frequency and voltage
> sets. However, there can be multiple possible operable sets and a system
> will need to choose one from them. In order to reduce the power
> consumption (by reducing frequency and voltage) without affecting the
> performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
> scheme may be used.
>
> This patch introduces the DVFS capability to non-CPU devices with OPPs.
> DVFS is a technique whereby the frequency and supplied voltage of a
> device is adjusted on-the-fly. DVFS usually sets the frequency as low
> as possible with given conditions (such as QoS assurance) and adjusts
> voltage according to the chosen frequency in order to reduce power
> consumption and heat dissipation.
>
> The generic DVFS for devices, DEVFREQ, may appear quite similar with
> /drivers/cpufreq.  However, CPUFREQ does not allow to have multiple
> devices registered and is not suitable to have multiple heterogenous
> devices with different (but simple) governors.
>
> Normally, DVFS mechanism controls frequency based on the demand for
> the device, and then, chooses voltage based on the chosen frequency.
> DEVFREQ also controls the frequency based on the governor's frequency
> recommendation and let OPP pick up the pair of frequency and voltage
> based on the recommended frequency. Then, the chosen OPP is passed to
> device driver's "target" callback.
>
> Tested with memory bus of Exynos4-NURI board.
>
> The test code with board support for Exynos4-NURI is at
> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

[...]

> +int devfreq_update(struct device *dev)
> +{
> +	struct devfreq *devfreq;
> +	int err = 0;
> +
> +	mutex_lock(&devfreq_list_lock);
> +
> +	devfreq = find_device_devfreq(dev);
> +	if (IS_ERR(devfreq)) {
> +		err = PTR_ERR(devfreq);
> +		goto out;
> +	}
> +
> +	/*
> +	 * If the maximum frequency available is changed either by
> +	 * enabling higher frequency or disabling the current
> +	 * maximum frequency, we need to adjust the frequency
> +	 * (tickle) again if the device has been being tickled.
> +	 */
> +	if (devfreq->tickle) {
> +		unsigned long freq = devfreq->profile->max_freq;
> +		struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
> +
> +		if (IS_ERR(opp)) {
> +			err = PTR_ERR(opp);
> +			goto out;
> +		}
> +
> +		/* Max freq available is not changed */
> +		if (devfreq->previous_freq == freq)
> +			goto out;
> +
> +		/* Tickle again. Max freq available is changed */
> +		err = devfreq->profile->target(devfreq->dev, opp);
> +		if (!err)
> +			devfreq->previous_freq = freq;

This looks an awful lot like _devfreq_tickle_device()... wondering why
that is not called here.

> +	} else {
> +		/* Reevaluate the proper frequency */
> +		err = devfreq_do(devfreq);
> +	}
> +
> +out:
> +	mutex_unlock(&devfreq_list_lock);
> +	return err;
> +}

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-07-28 22:10 ` [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices Rafael J. Wysocki
  2011-07-29  4:46   ` Turquette, Mike
@ 2011-08-02 22:02   ` Kevin Hilman
  2011-08-03  7:03     ` MyungJoo Ham
  1 sibling, 1 reply; 30+ messages in thread
From: Kevin Hilman @ 2011-08-02 22:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, MyungJoo Ham,
	Thomas Gleixner, linux-pm

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> On Friday, July 15, 2011, MyungJoo Ham wrote:
>> For a usage example, please look at
>> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>> 
>> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
>> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
>> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
>> and other related clocks simply follow the determined DDR RAM clock.
>> 
>> The DEVFREQ driver for Exynos4210 memory bus is at
>> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
>> 
>> MyungJoo Ham (3):
>>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
>>     OPPs
>>   PM / DEVFREQ: add example governors
>>   PM / DEVFREQ: add sysfs interface (including user tickling)
>
> OK, I'm going to take the patches for 3.2.

Sorry for being late to the discussion, but personally I don't think
this should be merged for v3.2.

First, I think the governor part of this series is basically fine, and
can see some cases where it would be useful, but as Mike has pointed
out, there is still a majority of devices for which a governor like this
would be overkill.

My main problem is with the QoS aspects.  There is significant overlap
between this approach and the per-device PM QoS approach currently being
proposed by Jean Pihet, and I think any sort of per-device DVFS should
be built on top of a more generic per-device QoS layer (such as Jean's.)

This series currently provides a *very* basic QoS mechanism (e.g. fixed
duration frequency constraint) in the form of "tickle", which BTW I seem
to be having a hard time understanding (more on that below...)

More importantly though, this series also introduces a sysfs layer for
doing its QoS-like stuff, so adding this and then adding a more generic
per-device QoS is asking for confusion about how userspace is to do QoS.
And adding a sysfs interface may turn out to be difficult to remove.

Basically, without a more general constraints mechanism in place, I
don't see how this can be generally useful since there are too many
assumptions made with the current "tickle" approach, and as Mike has
pointed out, it cannot cleanly handle cases where there might be
multiple DVFS-related constraints on a given device.

OK, back to "tickle"...  I haven't yet fully understood how that
interface is intended to be used, or who the potential users might be
and it is not documented in the code or changelog.  I also didn't see
any users of that API (except the sysfs code.)

IIUC, tickle is just basically a way to set a frequency constraint on a
device for a fixed duration.  However, if tickle has been requested, any
OPP change will also force a change to the highest performance OPP
temporarily before changing to the target OPP.

Maybe I'm not understanding the usage of it fully, but that seems like
hard-coding policy into the framework that might not be appropriate.
For example, what if there are other devices with constraints such that
they cannot currently scale frequency/voltage?

Maybe MyungJoo can explain in more detail the usecases for tickle?

Thanks,

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-08-02 21:56   ` Kevin Hilman
@ 2011-08-03  6:02     ` MyungJoo Ham
  0 siblings, 0 replies; 30+ messages in thread
From: MyungJoo Ham @ 2011-08-03  6:02 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Kyungmin Park, Len Brown, linux-pm, Greg Kroah-Hartman, Thomas Gleixner

On Wed, Aug 3, 2011 at 6:56 AM, Kevin Hilman <khilman@ti.com> wrote:
> MyungJoo Ham <myungjoo.ham@samsung.com> writes:
>
>> With OPPs, a device may have multiple operable frequency and voltage
>> sets. However, there can be multiple possible operable sets and a system
>> will need to choose one from them. In order to reduce the power
>> consumption (by reducing frequency and voltage) without affecting the
>> performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
>> scheme may be used.
>>
>> This patch introduces the DVFS capability to non-CPU devices with OPPs.
>> DVFS is a technique whereby the frequency and supplied voltage of a
>> device is adjusted on-the-fly. DVFS usually sets the frequency as low
>> as possible with given conditions (such as QoS assurance) and adjusts
>> voltage according to the chosen frequency in order to reduce power
>> consumption and heat dissipation.
>>
>> The generic DVFS for devices, DEVFREQ, may appear quite similar with
>> /drivers/cpufreq.  However, CPUFREQ does not allow to have multiple
>> devices registered and is not suitable to have multiple heterogenous
>> devices with different (but simple) governors.
>>
>> Normally, DVFS mechanism controls frequency based on the demand for
>> the device, and then, chooses voltage based on the chosen frequency.
>> DEVFREQ also controls the frequency based on the governor's frequency
>> recommendation and let OPP pick up the pair of frequency and voltage
>> based on the recommended frequency. Then, the chosen OPP is passed to
>> device driver's "target" callback.
>>
>> Tested with memory bus of Exynos4-NURI board.
>>
>> The test code with board support for Exynos4-NURI is at
>> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>>
>> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>
> [...]
>
>> +int devfreq_update(struct device *dev)
>> +{
>> +     struct devfreq *devfreq;
>> +     int err = 0;
>> +
>> +     mutex_lock(&devfreq_list_lock);
>> +
>> +     devfreq = find_device_devfreq(dev);
>> +     if (IS_ERR(devfreq)) {
>> +             err = PTR_ERR(devfreq);
>> +             goto out;
>> +     }
>> +
>> +     /*
>> +      * If the maximum frequency available is changed either by
>> +      * enabling higher frequency or disabling the current
>> +      * maximum frequency, we need to adjust the frequency
>> +      * (tickle) again if the device has been being tickled.
>> +      */
>> +     if (devfreq->tickle) {
>> +             unsigned long freq = devfreq->profile->max_freq;
>> +             struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
>> +
>> +             if (IS_ERR(opp)) {
>> +                     err = PTR_ERR(opp);
>> +                     goto out;
>> +             }
>> +
>> +             /* Max freq available is not changed */
>> +             if (devfreq->previous_freq == freq)
>> +                     goto out;
>> +
>> +             /* Tickle again. Max freq available is changed */
>> +             err = devfreq->profile->target(devfreq->dev, opp);
>> +             if (!err)
>> +                     devfreq->previous_freq = freq;
>
> This looks an awful lot like _devfreq_tickle_device()... wondering why
> that is not called here.
>

The device is already tickled by a user and we do not need to update
the timing/delay information about the current tickle. We only need to
adjust frequency in case the maximum available frequency has been
changed.

Anyway, in order to reduce redundant code, I will modify
_devfreq_tickle_device so that it can be called by devfreq_update().
For now, if devfreq_update() used _devfreq_tickle_device without
modification, it would calculate num_tickle incorrectly and would
require setting the delay of _devfreq_tickle_device again. I will let
_devfreq_tickle_device take delay = 0 for devfreq_update().

Thanks.

- MyungJoo

>> +     } else {
>> +             /* Reevaluate the proper frequency */
>> +             err = devfreq_do(devfreq);
>> +     }
>> +
>> +out:
>> +     mutex_unlock(&devfreq_list_lock);
>> +     return err;
>> +}
>
> Kevin
>



-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-02 22:02   ` Kevin Hilman
@ 2011-08-03  7:03     ` MyungJoo Ham
  2011-08-03 17:31       ` Turquette, Mike
  2011-08-03 18:33       ` Kevin Hilman
  0 siblings, 2 replies; 30+ messages in thread
From: MyungJoo Ham @ 2011-08-03  7:03 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

On Wed, Aug 3, 2011 at 7:02 AM, Kevin Hilman <khilman@ti.com> wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
>
>> On Friday, July 15, 2011, MyungJoo Ham wrote:
>>> For a usage example, please look at
>>> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>>>
>>> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
>>> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
>>> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
>>> and other related clocks simply follow the determined DDR RAM clock.
>>>
>>> The DEVFREQ driver for Exynos4210 memory bus is at
>>> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
>>>
>>> MyungJoo Ham (3):
>>>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
>>>     OPPs
>>>   PM / DEVFREQ: add example governors
>>>   PM / DEVFREQ: add sysfs interface (including user tickling)
>>
>> OK, I'm going to take the patches for 3.2.
>
> Sorry for being late to the discussion, but personally I don't think
> this should be merged for v3.2.
>
> First, I think the governor part of this series is basically fine, and
> can see some cases where it would be useful, but as Mike has pointed
> out, there is still a majority of devices for which a governor like this
> would be overkill.
>
> My main problem is with the QoS aspects.  There is significant overlap
> between this approach and the per-device PM QoS approach currently being
> proposed by Jean Pihet, and I think any sort of per-device DVFS should
> be built on top of a more generic per-device QoS layer (such as Jean's.)

Then, what about adding a governor? Besides, DEVFREQ is designed to
allow each device to create its own governors. The governors in
devfreq.c are meant to be used in common by "many" devices.

For PM QoS, we may have several approaches:
1. Adding a PM QoS-aware governor that takes PM QoS parameters.
2. Modifying OPP to be aware of PM QoS so that only OPPs meeting the
requirement are enabled if the PM QoS requirement specifies frequency
or voltage conditions explicitly.
3. Modifying DEVFREQ to be aware of PM QoS and let DEVFREQ ignore
frequencies that do not satisfy the PM QoS requirement. As in approach
2, the requirement also needs to be explicit about frequency / voltage
conditions or we need to interpret devfreq_dev_status result in terms
of PM QoS.
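
Approach 3 could be sketched as follows, assuming the PM QoS requirement
has already been translated into an explicit minimum frequency. This is a
user-space illustration only; toy_opp_entry and toy_pick_freq are
hypothetical names, and the table is assumed to be sorted by ascending
frequency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical OPP entry; the table is assumed sorted by ascending freq. */
struct toy_opp_entry {
	unsigned long freq; /* Hz */
	bool enabled;
};

/*
 * Return the lowest enabled frequency that is >= both the governor's
 * recommendation and the QoS floor, or 0 if no enabled OPP qualifies.
 */
static unsigned long toy_pick_freq(const struct toy_opp_entry *table, size_t n,
				   unsigned long recommended,
				   unsigned long qos_floor)
{
	unsigned long want = recommended > qos_floor ? recommended : qos_floor;
	size_t i;

	assert(table != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (table[i].enabled && table[i].freq >= want)
			return table[i].freq;
	return 0;
}
```

The governor keeps recommending whatever it likes; the QoS floor only
narrows the set of frequencies the framework is willing to pick from.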

>
> This series currently provides a *very* basic QoS mechanism (e.g. fixed
> duration frequency constraint) in the form of "tickle", which BTW I seem
> to be having a hard time understanding (more on that below...)
>
> More importantly though, this series also introduces a sysfs layer for
> doing its QoS-like stuff, so adding this and then adding a more generic
> per-device QoS is asking for confusion about how userspace is to do QoS.
> And adding a sysfs interface may turn out to be difficult to remove.
>
> Basically, without a more general constraints mechanism in place, I
> don't see how this can be generally useful since there are too many
> assumptions made with the current "tickle" approach, and as Mike has
> pointed out, it cannot cleanly handle cases where there might be
> multiple DVFS-related constraints on a given device.
>
> OK, back to "tickle"...  I haven't yet fully understood how that
> interface is intended to be used, or who the potential users might be
> and it is not documented in the code or changelog.  I also didn't see
> any users of that API (except the sysfs code.)
>
> IIUC, tickle is just basically a way to set a frequency constraint on a
> device for a fixed duration.  However, if tickle has been requested, any
> OPP change will also force a change to the highest performance OPP
> temporarily before changing to the target OPP.
>
> Maybe I'm not understanding the usage of it fully, but that seems like
> hard-coding policy into the framework that might not be appropriate.
> For example, what if there are other devices with constraints such that
> they cannot currently scale frequency/voltage?
>
> Maybe MyungJoo can explain in more detail the usecases for tickle?

Tickle is not for QoS between devices. It is for faster reaction to
(human) user inputs on the DVFS side, where waiting for DVFS to react
takes too much time and reducing the polling interval costs too much. In
fact, this tickling method was quite effective with CPUFREQ's ondemand
governor (not upstreamed). We may tune DVFS constants to let it react
faster with a lower threshold; however, this results in higher power
consumption under small loads.


Here is a more detailed description of the issue that tickling is
intended to tackle: the response time of the GUI. With DVFS (CPUFREQ),
we have been suffering from slow response to user input (e.g., dragging
on the touch screen of a mobile device in an "application drawer" or a
"web browser"). Let's assume a system with a CPU that runs in the range
of 100MHz ~ 2GHz and a GUI that requires at least 1.5GHz for smooth
transitions. If we use a 60Hz display and a 20ms CPUFREQ polling
interval, a sudden user input requires 30ms of delay on average to get
the CPU to work at high speed, which loses 2 frames (often, it takes
more time/frames for CPUFREQ to react; I don't know why). If a user
repeatedly drags and stops on the touch screen, such delay and lost
frames become very noticeable and the screen lags behind the user's
finger. With CPUFREQ-ondemand, tickling brought the result of such a
test close to that of the "performance" governor.
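
One way to arrive at the 30ms and 2-frame figures is sketched below. It
rests on two assumptions that are not spelled out in this mail: the input
arrives at a uniformly random point within the polling interval, so the
next sample is half an interval away on average, and the governor needs
one more full interval of observed load before it raises the frequency.
The function names are hypothetical.

```c
#include <assert.h>

/*
 * Average reaction delay in microseconds, ASSUMING half an interval until
 * the next sample plus one full interval of observed load.
 */
static unsigned long toy_avg_delay_us(unsigned long poll_us)
{
	return poll_us / 2 + poll_us;
}

/* Number of display frames touched by the delay, rounded up. */
static unsigned long toy_frames_lost(unsigned long delay_us,
				     unsigned long refresh_hz)
{
	unsigned long frame_us;

	assert(refresh_hz > 0);
	frame_us = 1000000UL / refresh_hz; /* 16666 us at 60Hz */
	return (delay_us + frame_us - 1) / frame_us;
}
```

With a 20ms polling interval this gives 10ms + 20ms = 30ms, and 30ms spans
two 16.7ms frames at 60Hz.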


As for the sysfs interface... we actually do not need a sysfs interface
for tickling if we allow the interrupt handlers of HCI devices (touch
screens, keyboards, mice, ...) to call the tickle function. However,
it seems that many people don't like that idea; thus, I thought I
needed to allow userspace user-input handlers to tickle.

>
> Thanks,
>
> Kevin
>



-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
  2011-08-02 18:45   ` Kevin Hilman
@ 2011-08-03  8:06     ` MyungJoo Ham
  0 siblings, 0 replies; 30+ messages in thread
From: MyungJoo Ham @ 2011-08-03  8:06 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Kyungmin Park, Len Brown, linux-pm, Greg Kroah-Hartman, Thomas Gleixner

On Wed, Aug 3, 2011 at 3:45 AM, Kevin Hilman <khilman@ti.com> wrote:
> MyungJoo Ham <myungjoo.ham@samsung.com> writes:
>
>> With OPPs, a device may have multiple operable frequency and voltage
>> sets. However, there can be multiple possible operable sets and a system
>> will need to choose one from them. In order to reduce the power
>> consumption (by reducing frequency and voltage) without affecting the
>> performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
>> scheme may be used.
>>
>> This patch introduces the DVFS capability to non-CPU devices with OPPs.
>> DVFS is a technique whereby the frequency and supplied voltage of a
>> device is adjusted on-the-fly. DVFS usually sets the frequency as low
>> as possible with given conditions (such as QoS assurance) and adjusts
>> voltage according to the chosen frequency in order to reduce power
>> consumption and heat dissipation.
>>
>> The generic DVFS for devices, DEVFREQ, may appear quite similar with
>> /drivers/cpufreq.  However, CPUFREQ does not allow to have multiple
>> devices registered and is not suitable to have multiple heterogenous
>> devices with different (but simple) governors.
>>
>> Normally, DVFS mechanism controls frequency based on the demand for
>> the device, and then, chooses voltage based on the chosen frequency.
>> DEVFREQ also controls the frequency based on the governor's frequency
>> recommendation and let OPP pick up the pair of frequency and voltage
>> based on the recommended frequency. Then, the chosen OPP is passed to
>> device driver's "target" callback.
>>
>> Tested with memory bus of Exynos4-NURI board.
>>
>> The test code with board support for Exynos4-NURI is at
>> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>>
>> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>
>> --
>> Thank you for your valuable comments, Rafael, Greg, Pavel, and Colin.
>>
>> Changed from v3
>> - In kerneldoc comments, DEVFREQ has been replaced by devfreq
>
> FYI... there are still lots of kerneldoc comments in this version with
> DEVFREQ instead of devfreq, particularly in devfreq.h.
>
> Kevin

Ah.. I'll correct them.

Besides, I've found that the governor's private data is better
located in struct devfreq than in struct devfreq_governor, as such
private data is "per-device" and the same governor (at the same memory
location) may be shared between different devices.
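
A simplified sketch of that layout is given here. The structs are
hypothetical stand-ins (toy_governor, toy_devfreq), not the definitions in
the patch; they only show why per-device data cannot live in an object
that several devices share.

```c
#include <assert.h>
#include <stddef.h>

struct toy_devfreq;

/* Hypothetical governor: shared between devices, so it holds no device state. */
struct toy_governor {
	const char *name;
	unsigned long (*get_target_freq)(struct toy_devfreq *df);
};

/* Hypothetical per-device object; not the patch's struct devfreq. */
struct toy_devfreq {
	const struct toy_governor *governor;
	void *governor_data; /* per-device private data for the governor */
};

/* Example governor: returns the frequency stored in the device's own data. */
static unsigned long toy_fixed_target(struct toy_devfreq *df)
{
	assert(df != NULL && df->governor_data != NULL);
	return *(unsigned long *)df->governor_data;
}

/* One governor instance that any number of devices may point to. */
static const struct toy_governor toy_fixed_governor = {
	.name = "toy_fixed",
	.get_target_freq = toy_fixed_target,
};
```

Two devices can then point at the same toy_fixed_governor while each
carries its own governor_data, so one device's tuning never overwrites
another's.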


Cheers!
MyungJoo

-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-03  7:03     ` MyungJoo Ham
@ 2011-08-03 17:31       ` Turquette, Mike
  2011-08-03 18:33       ` Kevin Hilman
  1 sibling, 0 replies; 30+ messages in thread
From: Turquette, Mike @ 2011-08-03 17:31 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

On Wed, Aug 3, 2011 at 12:03 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
> On Wed, Aug 3, 2011 at 7:02 AM, Kevin Hilman <khilman@ti.com> wrote:
>> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
>>
>>> On Friday, July 15, 2011, MyungJoo Ham wrote:
>>>> For a usage example, please look at
>>>> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>>>>
>>>> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
>>>> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
>>>> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
>>>> and other related clocks simply follow the determined DDR RAM clock.
>>>>
>>>> The DEVFREQ driver for Exynos4210 memory bus is at
>>>> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
>>>>
>>>> MyungJoo Ham (3):
>>>>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
>>>>     OPPs
>>>>   PM / DEVFREQ: add example governors
>>>>   PM / DEVFREQ: add sysfs interface (including user tickling)
>>>
>>> OK, I'm going to take the patches for 3.2.
>>
>> Sorry for being late to the discussion, but personally I don't think
>> this should be merged for v3.2.
>>
>> First, I think the governor part of this series is basically fine, and
>> can see some cases where it would be useful, but as Mike has pointed
>> out, there is still a majority of devices for which a governor like this
>> would be overkill.
>>
>> My main problem is with the QoS aspects.  There is significant overlap
>> between this approach and the per-device PM QoS approach currently being
>> proposed by Jean Pihet, and I think any sort of per-device DVFS should
>> be built on top of a more generic per-device QoS layer (such as Jean's.)
>
> Then, what about adding a governor? Besides, DEVFREQ is designed to

No one is saying that you can't have a governor to control DVFS
transitions for your device.  The point is that the governor should
make a call to the per-device PM QoS DVFS API in its ->target()
function.

The fundamental difference is this: you want devfreq to be the final
place where DVFS transitions actually occur (meaning clocks and
voltage rails get scaled).  Kevin and I are both arguing for devfreq
to be policy built on top of a dedicated DVFS API for managing those
transitions.
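
To make the layering concrete, here is a minimal sketch. Every name in
it (struct dvfs_dev, dvfs_request_rate(), devfreq_target()) is
hypothetical and only stands in for the dedicated DVFS API being argued
for; none of it is an existing kernel interface. The point is that the
policy side never touches clocks or rails itself, it only forwards a
request to the mechanism, which remains free to honour constraints held
by other users.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DVFS layer: the only place that scales clocks and rails. */
struct dvfs_dev {
	unsigned long cur_khz;   /* rate currently programmed */
	unsigned long floor_khz; /* strongest constraint held by other users */
};

/* Hypothetical mechanism call: applies a request, honouring the floor. */
static unsigned long dvfs_request_rate(struct dvfs_dev *d, unsigned long khz)
{
	assert(d != NULL);
	d->cur_khz = khz < d->floor_khz ? d->floor_khz : khz;
	return d->cur_khz;
}

/* Policy side: ->target() only forwards the governor's wish. */
static unsigned long devfreq_target(struct dvfs_dev *d, unsigned long wanted_khz)
{
	return dvfs_request_rate(d, wanted_khz);
}
```

With a floor of 266000 kHz held by someone else, a governor request for
133000 kHz ends up at 266000 kHz, while a request for 400000 kHz goes
through unchanged.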

(Kevin, let me know if I'm putting words in your mouth)

> allow each device to create its own governors. The governors in the
> devfreq.c are meant to be used commonly by "many" devices.
>
> For PM QoS, we may have several approaches:
> 1. Adding a governor aware of PM QoS taking PM QoS parameters.
> 2. Modifying OPP to be aware of PM QoS so that only OPPs meeting the
> requirement are enabled if the PM QoS requirement specifies frequency
> or voltage conditions explicitly.
> 3. Modifying DEVFREQ to be aware of PM QoS and let DEVFREQ ignore
> frequencies that do not satisfy the PM QoS requirement. As in approach
> 2, the requirement also needs to be explicit about frequency / voltage
> conditions or we need to interpret devfreq_dev_status result in terms
> of PM QoS.
>
>>
>> This series currently provides a *very* basic QoS mechanism (e.g. fixed
>> duration frequency constraint) in the form of "tickle", which BTW I seem
>> to be having a hard time understanding (more on that below...)
>>
>> More importantly though, this series also introduces a sysfs layer for
>> doing its QoS-like stuff, so adding this and then adding a more generic
>> per-device QoS is asking for confusion about how userspace is to do QoS.
>> And a sysfs interface, once added, may turn out to be difficult to remove.
>>
>> Basically, without a more general constraints mechanism in place, I
>> don't see how this can be generally useful since there are too many
>> assumptions made with the current "tickle" approach, and as Mike has
>> pointed out, it cannot cleanly handle cases where there might be
>> multiple DVFS-related constraints on a given device.
>>
>> OK, back to "tickle"...  I haven't yet fully understood how that
>> interface is intended to be used, or who the potential users might be
>> and it is not documented in the code or changelog.  I also didn't see
>> any users of that API (except the sysfs code.)
>>
>> IIUC, tickle is just basically a way to set a frequency constraint on a
>> device for a fixed duration.  However, if tickle has been requested, any
>> OPP change will also force a change to the highest performance OPP
>> temporarily before changing to the target OPP.
>>
>> Maybe I'm not understanding the usage of it fully, but that seems like
>> hard-coding policy into the framework that might not be appropriate.
>> For example, what if there are other devices with constraints such that
>> they cannot currently scale frequency/voltage?
>>
>> Maybe MyungJoo can explain in more detail the usecases for tickle?
>
> Tickle is not for QoS between devices. It is for faster reaction to
> (human) user inputs at DVFS side where waiting for DVFS's reaction
> takes too much time and reducing polling interval costs too much. In
> fact, this tickling method was quite effective with CPUFREQ's ondemand
> governor (not upstreamed). We may tune DVFS constants to let it react
> faster with lower threshold; however, this results in higher power
> consumption with small load.

Do you think that your "tickle" method would go upstream for CPUfreq?
devfreq is analogous to CPUfreq in almost every way, so if that
solution would not be accepted for CPUfreq, then it should not be
accepted for devfreq either.

> Here is a more detailed description of the issue that tickling is
> intended to tackle: the response time of the GUI. With DVFS (CPUFREQ),
> we have been suffering from slow response to user input (e.g., dragging
> the touch screen of a mobile device in an "application drawer" or a
> "web browser"). Let's assume a system with a CPU that runs in the range
> of 100MHz ~ 2GHz and a GUI that requires at least 1.5GHz for smooth
> transitions. If we use a 60Hz display and a 20ms CPUFREQ polling
> interval, a sudden user input incurs 30ms of delay on average before
> the CPU runs at high speed, which loses 2 frames (often, it takes more
> time/frames for CPUFREQ to react; I don't know why). If a user
> repeatedly drags and stops on the touch screen, such delay and lost
> frames become very noticeable and the screen lags behind the user's
> finger. With CPUFREQ-ondemand, tickling brought the results of such
> tests almost in line with those of the "performance" governor.
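
For reference, the 30ms / 2 frames figures quoted above can be
reproduced with the sketch below. The model is an assumption, since the
quoted text does not say how the 30ms was derived: on average the input
arrives halfway through a polling interval, and the governor then needs
one further full interval to observe the load. The function names are
made up for illustration.

```c
#include <assert.h>

/* Assumed model: wait half a polling interval on average for the next
 * sample window to start, then one full interval for the governor to
 * observe the new load. */
static unsigned int avg_reaction_us(unsigned int poll_us)
{
	assert(poll_us > 0);
	return poll_us / 2 + poll_us;
}

/* Display frames (rounded up) that elapse during delay_us at refresh_hz. */
static unsigned int frames_lost(unsigned int delay_us, unsigned int refresh_hz)
{
	unsigned long long num = (unsigned long long)delay_us * refresh_hz;

	return (unsigned int)((num + 999999ULL) / 1000000ULL);
}
```

Under that assumption a 20ms interval gives 30000us of average delay,
and at 60Hz (about 16.7ms per frame) that covers 1.8 frame periods,
which rounds up to 2 frames.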

This can be achieved entirely with a DVFS framework that devices can
use to scale the CPU's frequency.  It requires no changes to CPUfreq
core or to the ondemand governor.  The CPUfreq driver makes a call to
the DVFS scaling API in its ->target function, as does any other
device which needs to hold a constraint against the CPU's clock speed
(in your case it sounds like the touchscreen driver would hold this
constraint).  Again, the "tickle" concept here is made redundant.

Regards,
Mike

> For the sysfs interface... we actually do not need a sysfs interface
> for tickling if we allow the interrupt handlers of HCI devices (touch
> screens, keyboards, mice, ...) to call the tickle function. However,
> it seems that many people don't like that idea; thus, I thought I
> needed to allow userspace user-input handlers to tickle.
>
>>
>> Thanks,
>>
>> Kevin
>>
>
>
>
> --
> MyungJoo Ham (함명주), Ph.D.
> Mobile Software Platform Lab,
> Digital Media and Communications (DMC) Business
> Samsung Electronics
> cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-03  7:03     ` MyungJoo Ham
  2011-08-03 17:31       ` Turquette, Mike
@ 2011-08-03 18:33       ` Kevin Hilman
  2011-08-04  8:15         ` MyungJoo Ham
  1 sibling, 1 reply; 30+ messages in thread
From: Kevin Hilman @ 2011-08-03 18:33 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

MyungJoo Ham <myungjoo.ham@samsung.com> writes:

> On Wed, Aug 3, 2011 at 7:02 AM, Kevin Hilman <khilman@ti.com> wrote:

[...]

>> Maybe I'm not understanding the usage of it fully, but that seems like
>> hard-coding policy into the framework that might not be appropriate.
>> For example, what if there are other devices with constraints such that
>> they cannot currently scale frequency/voltage?
>>
>> Maybe MyungJoo can explain in more detail the usecases for tickle?
>
> Tickle is not for QoS between devices. It is for faster reaction to
> (human) user inputs at DVFS side where waiting for DVFS's reaction
> takes too much time and reducing polling interval costs too much. 

This is exactly what quality of service (QoS) is about.

The user (whether it's a human user input or another device) has low
quality and expects higher quality.  It wants to request better quality,
so it needs a way to request it.   

The "tickle" approach proposed here is simply a "request max
frequency for duration X" QoS request.
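
As a rough illustration of that equivalence (a sketch only; struct
qos_req, tickle() and effective_khz() are hypothetical names, not part
of the posted patches or of any existing QoS API), a tickle can be
written as an ordinary minimum-frequency request that carries an expiry
time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical QoS request: hold at least min_khz until expires_ms. */
struct qos_req {
	unsigned long min_khz;
	unsigned long expires_ms;
};

static bool qos_req_active(const struct qos_req *r, unsigned long now_ms)
{
	assert(r != NULL);
	return now_ms < r->expires_ms;
}

/* "Tickle" is then just: request the maximum rate for a fixed duration. */
static struct qos_req tickle(unsigned long max_khz, unsigned long now_ms,
			     unsigned long duration_ms)
{
	struct qos_req r = { .min_khz = max_khz,
			     .expires_ms = now_ms + duration_ms };
	return r;
}

/* Rate to run at: the governor's choice, raised by a live request. */
static unsigned long effective_khz(const struct qos_req *r,
				   unsigned long governor_khz,
				   unsigned long now_ms)
{
	if (qos_req_active(r, now_ms) && governor_khz < r->min_khz)
		return r->min_khz;
	return governor_khz;
}
```

While the request is live the device is held at the requested maximum;
once it expires the governor's own choice applies again, with no
tickle-specific code path left in the framework.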

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-03 18:33       ` Kevin Hilman
@ 2011-08-04  8:15         ` MyungJoo Ham
  2011-08-04 21:59           ` Turquette, Mike
  0 siblings, 1 reply; 30+ messages in thread
From: MyungJoo Ham @ 2011-08-04  8:15 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

On Thu, Aug 4, 2011 at 3:33 AM, Kevin Hilman <khilman@ti.com> wrote:
> MyungJoo Ham <myungjoo.ham@samsung.com> writes:
>
>> On Wed, Aug 3, 2011 at 7:02 AM, Kevin Hilman <khilman@ti.com> wrote:
>
> [...]
>
>>> Maybe I'm not understanding the usage of it fully, but that seems like
>>> hard-coding policy into the framework that might not be appropriate.
>>> For example, what if there are other devices with constraints such that
>>> they cannot currently scale frequency/voltage?
>>>
>>> Maybe MyungJoo can explain in more detail the usecases for tickle?
>>
>> Tickle is not for QoS between devices. It is for faster reaction to
>> (human) user inputs at DVFS side where waiting for DVFS's reaction
>> takes too much time and reducing polling interval costs too much.
>
> This is exactly what quality of service (QoS) is about.
>
> The user (whether it's a human user input or another device) has low
> quality and expects higher quality.  It wants to request better quality,
> so it needs a way to request it.
>
> The proposed "tickle" approach proposed here is simply a "request max
> frequency for duration X" QoS request.
>
> Kevin
>

Ok.. I see.

Now, I can agree with you that tickle is a subset of QoS requests.

As long as we have a QoS request feature on devices with either OPP or
DEVFREQ, tickling is not needed.

I'll remove tickle in the next revision (along with some bugfixes for
bugs found recently).


Anyway, it appears that clock-rate-wise QoS requests may be dealt with
at the OPP level, so that only the OPPs meeting the QoS requests (with
frequency or voltage specifications) are enabled and returned by the
opp_find_* functions. Maybe we will need to separate enable/disable by
opp_enable()/opp_disable() from enable/disable by QoS requests so that
the two may have different semantics. Then, QoS requests cannot
override opp_disable() and opp_enable() cannot override QoS requests.
This way, any clock-setting code properly based on OPP (including any
customized devfreq governors) cannot violate QoS requests.
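
A minimal sketch of that separation, under the assumption that each OPP
carries two independent flags. The struct and the simplified lookup
below are hypothetical and are not the real OPP library; they only show
that an entry is returned when both the driver and QoS allow it, so
neither side can override the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct opp_entry {
	unsigned long khz;
	bool drv_enabled; /* toggled by the driver's enable/disable calls */
	bool qos_allowed; /* toggled only by QoS requests */
};

static bool opp_usable(const struct opp_entry *o)
{
	return o->drv_enabled && o->qos_allowed;
}

/* Lowest usable entry with khz >= want; table sorted ascending.
 * Returns NULL if nothing qualifies. */
static const struct opp_entry *find_freq_ceil(const struct opp_entry *tbl,
					      size_t n, unsigned long want)
{
	assert(tbl != NULL);
	for (size_t i = 0; i < n; i++)
		if (opp_usable(&tbl[i]) && tbl[i].khz >= want)
			return &tbl[i];
	return NULL;
}
```

Any caller that goes through the lookup, including a custom governor,
only ever sees entries that satisfy both conditions.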

How about this concept of getting QoS requests associated with clock rates?



Cheers!
MyungJoo.
-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-04  8:15         ` MyungJoo Ham
@ 2011-08-04 21:59           ` Turquette, Mike
  2011-08-05  6:18             ` MyungJoo Ham
  0 siblings, 1 reply; 30+ messages in thread
From: Turquette, Mike @ 2011-08-04 21:59 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

On Thu, Aug 4, 2011 at 1:15 AM, MyungJoo Ham <myungjoo.ham@gmail.com> wrote:
> On Thu, Aug 4, 2011 at 3:33 AM, Kevin Hilman <khilman@ti.com> wrote:
>> MyungJoo Ham <myungjoo.ham@samsung.com> writes:
>>
>>> On Wed, Aug 3, 2011 at 7:02 AM, Kevin Hilman <khilman@ti.com> wrote:
>>
>> [...]
>>
>>>> Maybe I'm not understanding the usage of it fully, but that seems like
>>>> hard-coding policy into the framework that might not be appropriate.
>>>> For example, what if there are other devices with constraints such that
>>>> they cannot currently scale frequency/voltage?
>>>>
>>>> Maybe MyungJoo can explain in more detail the usecases for tickle?
>>>
>>> Tickle is not for QoS between devices. It is for faster reaction to
>>> (human) user inputs at DVFS side where waiting for DVFS's reaction
>>> takes too much time and reducing polling interval costs too much.
>>
>> This is exactly what quality of service (QoS) is about.
>>
>> The user (whether it's a human user input or another device) has low
>> quality and expects higher quality.  It wants to request better quality,
>> so it needs a way to request it.
>>
>> The proposed "tickle" approach proposed here is simply a "request max
>> frequency for duration X" QoS request.
>>
>> Kevin
>>
>
> Ok.. I see.
>
> Now, I can agree with you that tickle is subset of QoS request.
>
> As long as we have QoS request feature on devices with either OPP or
> DEVFREQ, tickling is not needed.
>
> I'll remove tickle in the next revision (along with some bugfixes for
> bugs found recently).
>
>
> Anyway, it appears that clock-rate-wise QoS request may be dealt at
> OPP so that the OPPs meeting the QoS requests w/ frequency or voltage
> specifications are enabled and returned with opp_find_* functions.
> Maybe we will need to separate enable/disable by
> opp_enable()/opp_disable() from enable/disable by QoS requests so that
> the two may have different semantics. Then, QoS requests cannot
> override opp_disable and opp_enable cannot override QoS requests. This
> way, any clock-setting code properly based on OPP (including any
> customized devfreq governors) cannot violate QoS requests.

If devfreq used the QoS API in its ->target() call then this would
not be an issue, which further illustrates the idea of devfreq simply
being a policy feeding into a proper QoS API.

Regards,
Mike

> How about this concept of getting QoS requests associated with clock rates?
>
>
>
> Cheers!
> MyungJoo.
> --
> MyungJoo Ham, Ph.D.
> Mobile Software Platform Lab,
> Digital Media and Communications (DMC) Business
> Samsung Electronics
> cell: 82-10-6714-2858
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-04 21:59           ` Turquette, Mike
@ 2011-08-05  6:18             ` MyungJoo Ham
  2011-08-08 19:13               ` Turquette, Mike
  0 siblings, 1 reply; 30+ messages in thread
From: MyungJoo Ham @ 2011-08-05  6:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Turquette, Mike
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

On Fri, Aug 5, 2011 at 6:59 AM, Turquette, Mike <mturquette@ti.com> wrote:
> On Thu, Aug 4, 2011 at 1:15 AM, MyungJoo Ham <myungjoo.ham@gmail.com> wrote:
>>
>> Ok.. I see.
>>
>> Now, I can agree with you that tickle is subset of QoS request.
>>
>> As long as we have QoS request feature on devices with either OPP or
>> DEVFREQ, tickling is not needed.
>>
>> I'll remove tickle in the next revision (along with some bugfixes for
>> bugs found recently).
>>
>>
>> Anyway, it appears that clock-rate-wise QoS request may be dealt at
>> OPP so that the OPPs meeting the QoS requests w/ frequency or voltage
>> specifications are enabled and returned with opp_find_* functions.
>> Maybe we will need to separate enable/disable by
>> opp_enable()/opp_disable() from enable/disable by QoS requests so that
>> the two may have different semantics. Then, QoS requests cannot
>> override opp_disable and opp_enable cannot override QoS requests. This
>> way, any clock-setting code properly based on OPP (including any
>> customized devfreq governors) cannot violate QoS requests.
>
> If devfreq used the QoS API in it's ->target() call then this would
> not be an issue, and further illustrates the idea of devfreq simply
> being a policy into a proper QoS API.
>
> Regards,
> Mike
>

Yes, if we choose an OPP that meets the PM-QoS requests before calling
->target() in devfreq_do(), there wouldn't be an issue.

However, if a device is using DEVFREQ, it also means the device has
OPPs (mandatory for DEVFREQ). If the device is using PM QoS as well as
OPP, I guess a correctly implemented device driver needs to call
opp_enable() and opp_disable() according to PM QoS's update_target()
calls through the PM QoS notifier callback.

Then, for such drivers, DEVFREQ automatically meets PM QoS requests
without any modification; as long as the OPPs that meet QoS are enabled
in the device driver's PM QoS callback, there is no QoS issue.

Therefore, it now appears that neither OPP nor DEVFREQ should be
allowed to touch the PM QoS APIs directly; instead, the device driver
should do so from its notifier by simply calling
opp_enable()/opp_disable() if the frequency is the key QoS "actuator".

If we let DEVFREQ handle its corresponding devices' PM QoS APIs, both
the device driver and its DEVFREQ code would be handling the PM QoS
API redundantly (or worse, inconsistently).



Cheers,
MyungJoo

>> How about this concept of getting QoS requests associated with clock rates?
>>
>>
>>
>> Cheers!
>> MyungJoo.
>> --
>> MyungJoo Ham, Ph.D.
>> Mobile Software Platform Lab,
>> Digital Media and Communications (DMC) Business
>> Samsung Electronics
>> cell: 82-10-6714-2858
>>
>



-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-05  6:18             ` MyungJoo Ham
@ 2011-08-08 19:13               ` Turquette, Mike
  2011-08-09  5:27                 ` MyungJoo Ham
  0 siblings, 1 reply; 30+ messages in thread
From: Turquette, Mike @ 2011-08-08 19:13 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

On Thu, Aug 4, 2011 at 11:18 PM, MyungJoo Ham <myungjoo.ham@gmail.com> wrote:
> On Fri, Aug 5, 2011 at 6:59 AM, Turquette, Mike <mturquette@ti.com> wrote:
>> On Thu, Aug 4, 2011 at 1:15 AM, MyungJoo Ham <myungjoo.ham@gmail.com> wrote:
>>>
>>> Ok.. I see.
>>>
>>> Now, I can agree with you that tickle is subset of QoS request.
>>>
>>> As long as we have QoS request feature on devices with either OPP or
>>> DEVFREQ, tickling is not needed.
>>>
>>> I'll remove tickle in the next revision (along with some bugfixes for
>>> bugs found recently).
>>>
>>>
>>> Anyway, it appears that clock-rate-wise QoS request may be dealt at
>>> OPP so that the OPPs meeting the QoS requests w/ frequency or voltage
>>> specifications are enabled and returned with opp_find_* functions.
>>> Maybe we will need to separate enable/disable by
>>> opp_enable()/opp_disable() from enable/disable by QoS requests so that
>>> the two may have different semantics. Then, QoS requests cannot
>>> override opp_disable and opp_enable cannot override QoS requests. This
>>> way, any clock-setting code properly based on OPP (including any
>>> customized devfreq governors) cannot violate QoS requests.
>>
>> If devfreq used the QoS API in it's ->target() call then this would
>> not be an issue, and further illustrates the idea of devfreq simply
>> being a policy into a proper QoS API.
>>
>> Regards,
>> Mike
>>
>
> Yes, if we choose an OPP that meets the PM-QoS requests before calling
> ->target() in devfreq_do(), there wouldn't be an issue.
>
> However, if a device is using DEVFREQ, it also means the device has
> OPPs (mandatory for DEVFREQ). If the device is using PM QoS as well as
> OPP, I guess the correctly implemented device driver needs to call
> opp_enable() and opp_disable() according to PM QoS's update_target()
> calls through the PM QoS notifier cb.
>
> Then, for such drivers, DEVFREQ automatically meets PM QoS requests
> without any modification; as long as QoS meeting OPPs are enabled at
> the device driver's PM QoS callback, there is no QoS issue.
>
> Therefore, now, it appears that neither OPP or DEVFREQ should be
> allowed to directly touch PM QoS APIs, but, the device driver should
> do so with notifier by simply calling opp-enable/disable if the
> frequency is the key QoS "actuator".
>
> If we are going to let DEVFREQ handle its corresponding devices' PM
> QoS APIs, it would mean that both device driver and its DEVFREQ codes
> will be handling PM QoS API duplicated (or worse, inconsistently).

I'm not getting too bogged down with the OPP specifics because I don't
know if that interface is going to be used in the future, and I don't
think that devfreq will need to know about OPPs once the DVFS QoS API
exists.  In that case, devfreq can just request clock frequencies
through the QoS API.  I view devfreq's usage of the OPP library as
temporary.

The real issue here is that we don't want some weird feedback loop
with device QoS requests and devfreq targets stepping on each other.

One way to handle this is to partition QoS use in drivers away from
devfreq usage.  For example, if a GPU supports performance counters
and can introspect its own usage, then it is a perfect candidate for
devfreq; on the flip-side device drivers should *not* be allowed to
put performance/qos constraints on this particular GPU, since we
assume that the performance counters/devfreq governor will handle the
whole job for us.  Since this centralizes the decision-making for the
GPU it is safe for the devfreq->target() call to use the QoS APIs for
controlling the GPU, since no one else will.  This avoids the feedback
loop.

On the other hand, if the GPU does not support performance counters
then it should not use devfreq at all and rely 100% on QoS constraints
from various sources: the GPU driver might request a high OPP every
time work comes in, coupled with a timeout; if a QoS knob is exported
to userspace then some OpenGL library might hold constraints through
it; or if some other kernel driver (video-related?) needs to use the
GPU then it can hold a constraint through the QoS API.

So there is a clear partition of QoS API usage between devices that
support performance counters and ones that do not.  We want to avoid a
feedback loop here.
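
For the second case (no performance counters, constraints only), a
sketch of how several holders could be combined. This assumes each
constraint is a plain minimum frequency; the struct, the owner strings
and aggregate_floor() are hypothetical, not an existing API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device constraint: "run at no less than min_khz". */
struct qos_constraint {
	const char *owner;
	unsigned long min_khz; /* 0 means the holder asks for nothing */
};

/* The device must satisfy every holder, so the strongest floor wins. */
static unsigned long aggregate_floor(const struct qos_constraint *c, size_t n)
{
	unsigned long floor_khz = 0;

	assert(c != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (c[i].min_khz > floor_khz)
			floor_khz = c[i].min_khz;
	return floor_khz;
}
```

Because the result depends only on the set of held constraints, there
is no second decision-maker for it to fight with, which is the feedback
loop this partitioning is meant to avoid.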

Regards,
Mike

> Cheers,
> MyungJoo
>
>>> How about this concept of getting QoS requests associated with clock rates?
>>>
>>>
>>>
>>> Cheers!
>>> MyungJoo.
>>> --
>>> MyungJoo Ham, Ph.D.
>>> Mobile Software Platform Lab,
>>> Digital Media and Communications (DMC) Business
>>> Samsung Electronics
>>> cell: 82-10-6714-2858
>>>
>>
>
>
>
> --
> MyungJoo Ham, Ph.D.
> Mobile Software Platform Lab,
> Digital Media and Communications (DMC) Business
> Samsung Electronics
> cell: 82-10-6714-2858
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-08 19:13               ` Turquette, Mike
@ 2011-08-09  5:27                 ` MyungJoo Ham
  2011-08-11  1:28                   ` Turquette, Mike
  0 siblings, 1 reply; 30+ messages in thread
From: MyungJoo Ham @ 2011-08-09  5:27 UTC (permalink / raw)
  To: Turquette, Mike
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

On Tue, Aug 9, 2011 at 4:13 AM, Turquette, Mike <mturquette@ti.com> wrote:
> On Thu, Aug 4, 2011 at 11:18 PM, MyungJoo Ham <myungjoo.ham@gmail.com> wrote:
>> On Fri, Aug 5, 2011 at 6:59 AM, Turquette, Mike <mturquette@ti.com> wrote:
>>> On Thu, Aug 4, 2011 at 1:15 AM, MyungJoo Ham <myungjoo.ham@gmail.com> wrote:
>>>>
>>>> Ok.. I see.
>>>>
>>>> Now, I can agree with you that tickle is subset of QoS request.
>>>>
>>>> As long as we have QoS request feature on devices with either OPP or
>>>> DEVFREQ, tickling is not needed.
>>>>
>>>> I'll remove tickle in the next revision (along with some bugfixes for
>>>> bugs found recently).
>>>>
>>>>
>>>> Anyway, it appears that clock-rate-wise QoS request may be dealt at
>>>> OPP so that the OPPs meeting the QoS requests w/ frequency or voltage
>>>> specifications are enabled and returned with opp_find_* functions.
>>>> Maybe we will need to separate enable/disable by
>>>> opp_enable()/opp_disable() from enable/disable by QoS requests so that
>>>> the two may have different semantics. Then, QoS requests cannot
>>>> override opp_disable and opp_enable cannot override QoS requests. This
>>>> way, any clock-setting code properly based on OPP (including any
>>>> customized devfreq governors) cannot violate QoS requests.
>>>
>>> If devfreq used the QoS API in it's ->target() call then this would
>>> not be an issue, and further illustrates the idea of devfreq simply
>>> being a policy into a proper QoS API.
>>>
>>> Regards,
>>> Mike
>>>
>>
>> Yes, if we choose an OPP that meets the PM-QoS requests before calling
>> ->target() in devfreq_do(), there wouldn't be an issue.
>>
>> However, if a device is using DEVFREQ, it also means the device has
>> OPPs (mandatory for DEVFREQ). If the device is using PM QoS as well as
>> OPP, I guess the correctly implemented device driver needs to call
>> opp_enable() and opp_disable() according to PM QoS's update_target()
>> calls through the PM QoS notifier cb.
>>
>> Then, for such drivers, DEVFREQ automatically meets PM QoS requests
>> without any modification; as long as QoS meeting OPPs are enabled at
>> the device driver's PM QoS callback, there is no QoS issue.
>>
>> Therefore, now, it appears that neither OPP or DEVFREQ should be
>> allowed to directly touch PM QoS APIs, but, the device driver should
>> do so with notifier by simply calling opp-enable/disable if the
>> frequency is the key QoS "actuator".
>>
>> If we are going to let DEVFREQ handle its corresponding devices' PM
>> QoS APIs, it would mean that both device driver and its DEVFREQ codes
>> will be handling PM QoS API duplicated (or worse, inconsistently).
>
> I'm not getting too bogged down with the OPP specifics because I don't
> know if that interface is going to be used in the future, and I don't
> think that devfreq will need to know about OPPs once the DVFS QoS API
> exists.  In that case, devfreq can just requests clock frequencies
> through the QoS API.  I view devfreq's usage of the OPP library as
> temporary.
>
> The real issue here is that we don't want some weird feedback loop
> with device QoS requests and devfreq targets stepping on each other.
>
> One way to handle this is to partition QoS use in drivers away from
> devfreq usage.  For example, if a GPU supports performance counters
> and can introspect its own usage, then it is a perfect candidate for
> devfreq; on the flip-side device drivers should *not* be allowed to
> put performance/qos constraints on this particular GPU, since we
> assume that the performance counters/devfreq governor will handle the
> whole job for us.  Since this centralizes the decision-making for the
> GPU it is safe for the devfreq->target() call to use the QoS APIs for
> controlling the GPU, since no one else will.  This avoids the feedback
> loop.
>
> On the other hand, if the GPU does not support performance counters
> then it should not use devfreq at all and rely 100% of QoS constraints
> from various sources: the GPU driver might request a high OPP every
> time work comes in, coupled with a timeout; if a QoS knob is exported
> to userspace then some OpenGL library might hold constraints through
> it; or some other kernel driver (video-related?) needs to use the GPU
> then it can hold a constraint through the QoS API.
>
> So there is a clear partition of QoS API usage between devices that
> support performance counters and ones that do not.  We want to avoid a
> feedback loop here.

Such a weird feedback loop may happen if the resulting frequency ranges
from QoS are inconsistent with those from DEVFREQ. However, if DEVFREQ
provides frequencies that do not violate QoS requirements, there would
be no such feedback loop. If they are consistent (DEVFREQ never gives
frequencies that do NOT satisfy QoS), DEVFREQ is transparent to PM
QoS.

For now, with OPPs, DEVFREQ will always select frequencies that
satisfy QoS requirements if the driver implementation is correct.

However, as you mentioned, if we assume that OPP is going to disappear
in a future kernel, then we will somehow need to design an interface
for device drivers to provide lists of available frequencies to
DEVFREQ; those lists need to be updatable at runtime, and they are
where QoS requirements are applied. As with the current DEVFREQ
implementation, if DEVFREQ chooses frequencies only from those enabled
by the driver (meeting the QoS requirements), things are the same as in
the cases with OPPs. The reason why QoS requirements are interpreted
in device drivers, not in DEVFREQ, is that the device driver itself is
the only entity that can translate QoS requests into frequencies
(latency/#ops/... --> Hz).

As for the other point, disabling (or ignoring?) PM QoS for devices
with DEVFREQ, I guess that should be left to each device driver's
decision. Even when DVFS is used for a device, there are cases where
QoS requests are needed as well, because DEVFREQ is based on polling,
which incurs a response time that is not acceptable in some cases:
e.g., a sudden user input at idle in a system with a high-speed
display. Besides, there could be another type of PM QoS requirement:
"under xxx MHz" in order to restrict (or as a response to) the
temperature or power consumption rate, which should affect DEVFREQ's
behavior. For battery-powered devices (laptops will want to do so as
well in userspace), I guess such features will soon be needed; at
least, we are working on it.
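
To illustrate both kinds of requirement together, here is a sketch that
assumes the driver has already translated its QoS requests into a
frequency window [qos_min, qos_max]. pick_khz() and its parameters are
hypothetical and only show the selection rule: the governor's
recommendation is honoured as far as the window allows.

```c
#include <assert.h>
#include <stddef.h>

/* Pick from an ascending table the lowest rate >= want that lies inside
 * [qos_min, qos_max]; if no such rate exists, fall back to the highest
 * rate inside the window. Returns 0 if the window excludes every entry. */
static unsigned long pick_khz(const unsigned long *tbl, size_t n,
			      unsigned long want,
			      unsigned long qos_min, unsigned long qos_max)
{
	unsigned long best = 0;

	assert(tbl != NULL && qos_min <= qos_max);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i] < qos_min || tbl[i] > qos_max)
			continue;
		best = tbl[i];
		if (tbl[i] >= want)
			break;
	}
	return best;
}
```

With the 133/266/400MHz table from the example board, a floor of 200MHz
turns a 133MHz recommendation into 266MHz, and an "under 300MHz" cap
turns a 400MHz recommendation into 266MHz.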


Cheers!
MyungJoo

-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858


* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-09  5:27                 ` MyungJoo Ham
@ 2011-08-11  1:28                   ` Turquette, Mike
  2011-08-17 10:07                     ` MyungJoo Ham
  0 siblings, 1 reply; 30+ messages in thread
From: Turquette, Mike @ 2011-08-11  1:28 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

On Mon, Aug 8, 2011 at 10:27 PM, MyungJoo Ham <myungjoo.ham@gmail.com> wrote:
> On Tue, Aug 9, 2011 at 4:13 AM, Turquette, Mike <mturquette@ti.com> wrote:
>> On Thu, Aug 4, 2011 at 11:18 PM, MyungJoo Ham <myungjoo.ham@gmail.com> wrote:
>>> On Fri, Aug 5, 2011 at 6:59 AM, Turquette, Mike <mturquette@ti.com> wrote:
>>>> On Thu, Aug 4, 2011 at 1:15 AM, MyungJoo Ham <myungjoo.ham@gmail.com> wrote:
>>>>>
>>>>> Ok.. I see.
>>>>>
>>>>> Now, I can agree with you that tickle is subset of QoS request.
>>>>>
>>>>> As long as we have QoS request feature on devices with either OPP or
>>>>> DEVFREQ, tickling is not needed.
>>>>>
>>>>> I'll remove tickle in the next revision (along with some bugfixes for
>>>>> bugs found recently).
>>>>>
>>>>>
>>>>> Anyway, it appears that clock-rate-wise QoS request may be dealt at
>>>>> OPP so that the OPPs meeting the QoS requests w/ frequency or voltage
>>>>> specifications are enabled and returned with opp_find_* functions.
>>>>> Maybe we will need to separate enable/disable by
>>>>> opp_enable()/opp_disable() from enable/disable by QoS requests so that
>>>>> the two may have different semantics. Then, QoS requests cannot
>>>>> override opp_disable and opp_enable cannot override QoS requests. This
>>>>> way, any clock-setting code properly based on OPP (including any
>>>>> customized devfreq governors) cannot violate QoS requests.
>>>>
>>>> If devfreq used the QoS API in it's ->target() call then this would
>>>> not be an issue, and further illustrates the idea of devfreq simply
>>>> being a policy into a proper QoS API.
>>>>
>>>> Regards,
>>>> Mike
>>>>
>>>
>>> Yes, if we choose an OPP that meets the PM-QoS requests before calling
>>> ->target() in devfreq_do(), there wouldn't be an issue.
>>>
>>> However, if a device is using DEVFREQ, it also means the device has
>>> OPPs (mandatory for DEVFREQ). If the device is using PM QoS as well as
>>> OPP, I guess the correctly implemented device driver needs to call
>>> opp_enable() and opp_disable() according to PM QoS's update_target()
>>> calls through the PM QoS notifier cb.
>>>
>>> Then, for such drivers, DEVFREQ automatically meets PM QoS requests
>>> without any modification; as long as QoS meeting OPPs are enabled at
>>> the device driver's PM QoS callback, there is no QoS issue.
>>>
>>> Therefore, now, it appears that neither OPP or DEVFREQ should be
>>> allowed to directly touch PM QoS APIs, but, the device driver should
>>> do so with notifier by simply calling opp-enable/disable if the
>>> frequency is the key QoS "actuator".
>>>
>>> If we are going to let DEVFREQ handle its corresponding devices' PM
>>> QoS APIs, it would mean that both device driver and its DEVFREQ codes
>>> will be handling PM QoS API duplicated (or worse, inconsistently).
>>
>> I'm not getting too bogged down with the OPP specifics because I don't
>> know if that interface is going to be used in the future, and I don't
>> think that devfreq will need to know about OPPs once the DVFS QoS API
>> exists.  In that case, devfreq can just requests clock frequencies
>> through the QoS API.  I view devfreq's usage of the OPP library as
>> temporary.
>>
>> The real issue here is that we don't want some weird feedback loop
>> with device QoS requests and devfreq targets stepping on each other.
>>
>> One way to handle this is to partition QoS use in drivers away from
>> devfreq usage.  For example, if a GPU supports performance counters
>> and can introspect its own usage, then it is a perfect candidate for
>> devfreq; on the flip-side device drivers should *not* be allowed to
>> put performance/qos constraints on this particular GPU, since we
>> assume that the performance counters/devfreq governor will handle the
>> whole job for us.  Since this centralizes the decision-making for the
>> GPU it is safe for the devfreq->target() call to use the QoS APIs for
>> controlling the GPU, since no one else will.  This avoids the feedback
>> loop.
>>
>> On the other hand, if the GPU does not support performance counters
>> then it should not use devfreq at all and rely 100% of QoS constraints
>> from various sources: the GPU driver might request a high OPP every
>> time work comes in, coupled with a timeout; if a QoS knob is exported
>> to userspace then some OpenGL library might hold constraints through
>> it; or some other kernel driver (video-related?) needs to use the GPU
>> then it can hold a constraint through the QoS API.
>>
>> So there is a clear partition of QoS API usage between devices that
>> support performance counters and ones that do not.  We want to avoid a
>> feedback loop here.
>
> Such weird feedback loop may happen if the result frequency ranges
> from QoS are inconsistent with those from DEVFREQ. However, if DEVFREQ
> provides frequencies that do not violate QoS requirements, there would
> be no such feedback loop. If they are consistent (DEVFREQ never gives
> frequencies that do NOT satisfy QoS), DEVFREQ is transparent to PM
> QoS.

If two devices, x and y, both place frequency constraints on device z,
then QoS should probably aggregate their requests instead of just
taking the max.  Then if devfreq also determines that it needs to make
a QoS request against device z, then that will get aggregated on top
of x and y's requests.  This isn't really what we want since x and y's
requests may be adequate and devfreq's performance counter sampling is
just requesting performance for operations which have already placed
requests.
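
To make the concern concrete, here is a rough sketch comparing two
aggregation policies. It assumes that "aggregate" above means summing
the requests; that reading, and the names model_aggregate_max and
model_aggregate_sum, are assumptions for illustration and not an
existing QoS API.

```c
#include <assert.h>
#include <stddef.h>

/* Policy 1: the effective request is the largest single request. */
unsigned long model_aggregate_max(const unsigned long *req_hz, size_t n)
{
	unsigned long max = 0;

	assert(req_hz != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (req_hz[i] > max)
			max = req_hz[i];
	return max;
}

/* Policy 2 (assumed meaning of "aggregate"): requests add up. */
unsigned long model_aggregate_sum(const unsigned long *req_hz, size_t n)
{
	unsigned long sum = 0;

	assert(req_hz != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sum += req_hz[i];
	return sum;
}
```

With x asking for 100 MHz and y for 150 MHz, the summed request is
250 MHz. If devfreq then adds its own 200 MHz request based on load
that x and y caused, the sum becomes 450 MHz even though the original
250 MHz may already have been adequate.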

> For now, with OPPs, DEVFREQ will always select frequencies that
> satisfy QoS requirements if the driver implementation is correct.
>
> However, as you mentioned, if we assume that OPP is going to disappear
> in the future kernel, then we will somehow need to design an interface
> for device drivers to provide lists of available frequencies to
> DEVFREQ and those lists are going to be runtime-update-able and where
> QoS requirements are applied. As with the current DEVFREQ

I'm not saying that OPP is going to disappear because I have no idea
what the future holds.

My point is that devfreq doesn't need to know about OPP at all.  The
code assumes that OPPs exist for the device and take a frequency as
part of devfreq->profile->get_target_freq (or something like that) and
they can just pass that frequency to devfreq->profile->target.  If
that device has support for the OPP library then it can do the job of
looking up OPPs, etc.
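
A minimal sketch of that split, as a user-space model: the framework
side only moves a frequency from one callback to the other, and any
table lookup stays inside the driver. The callback names follow the
ones mentioned above, which are themselves approximate ("or something
like that"); the struct layout, model_devfreq_step, and the model_bus
driver are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical profile: the framework sees only frequencies in Hz. */
struct model_profile {
	unsigned long (*get_target_freq)(void *drv);
	int (*target)(void *drv, unsigned long freq_hz);
};

/* Framework side: no knowledge of OPPs, just pass the number along. */
int model_devfreq_step(const struct model_profile *p, void *drv)
{
	assert(p && p->get_target_freq && p->target);
	return p->target(drv, p->get_target_freq(drv));
}

/* Example driver that keeps its own table (standing in for OPPs). */
struct model_bus {
	unsigned long wanted_hz;
	unsigned long current_hz;
};

static const unsigned long model_bus_table[] = {
	133000000UL, 266000000UL, 400000000UL
};

unsigned long model_bus_get_target_freq(void *drv)
{
	return ((struct model_bus *)drv)->wanted_hz;
}

/* Driver side: round up to the nearest supported rate, or use the top. */
int model_bus_target(void *drv, unsigned long freq_hz)
{
	struct model_bus *bus = drv;
	size_t n = sizeof(model_bus_table) / sizeof(model_bus_table[0]);

	for (size_t i = 0; i < n; i++) {
		if (model_bus_table[i] >= freq_hz) {
			bus->current_hz = model_bus_table[i];
			return 0;
		}
	}
	bus->current_hz = model_bus_table[n - 1];
	return 0;
}
```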

Regards,
Mike

> implementation, if DEVFREQ chooses frequencies only from those enabled
> by the driver (meeting the QoS requirements), things are same with the
> cases with OPPs. The reason why QoS requirements are being interpreted
> at device drivers, not at DEVFREQ, is that the device driver itself is
> the only entity that can interpret QoS requests to frequencies
> (latency/#ops/... --> Hz).
>
> And, for the another point, making PM QoS disabled (or ignored?) for
> devices with DEVFREQ, I guess that should be left for each device
> driver's decision. Even when DVFS is used for a device, there are
> cases where QoS requests are needed as well because DEVFREQ is based
> on polling which incurs response time that is not acceptable for some
> cases: e.g., a sudden user input at idle in a system with high-speed
> displays. Besides, there could be another type of PM QoS requirements:
> "under xxx MHz" in order to restrict (or as a response to) the
> temperature or power consumption rate, which should affect DEVFREQ's
> behavior. For battery-powered devices (laptops will want to do so as
> well at userspace), I guess such features will be soon needed; at
> least, we are working on it.


* Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
  2011-08-11  1:28                   ` Turquette, Mike
@ 2011-08-17 10:07                     ` MyungJoo Ham
  0 siblings, 0 replies; 30+ messages in thread
From: MyungJoo Ham @ 2011-08-17 10:07 UTC (permalink / raw)
  To: Turquette, Mike
  Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner, linux-pm

On Thu, Aug 11, 2011 at 10:28 AM, Turquette, Mike <mturquette@ti.com> wrote:
> On Mon, Aug 8, 2011 at 10:27 PM, MyungJoo Ham <myungjoo.ham@gmail.com> wrote:
>> On Tue, Aug 9, 2011 at 4:13 AM, Turquette, Mike <mturquette@ti.com> wrote:
>>> I'm not getting too bogged down with the OPP specifics because I don't
>>> know if that interface is going to be used in the future, and I don't
>>> think that devfreq will need to know about OPPs once the DVFS QoS API
>>> exists.  In that case, devfreq can just requests clock frequencies
>>> through the QoS API.  I view devfreq's usage of the OPP library as
>>> temporary.
>>>
>>> The real issue here is that we don't want some weird feedback loop
>>> with device QoS requests and devfreq targets stepping on each other.
>>>
>>> One way to handle this is to partition QoS use in drivers away from
>>> devfreq usage.  For example, if a GPU supports performance counters
>>> and can introspect its own usage, then it is a perfect candidate for
>>> devfreq; on the flip-side device drivers should *not* be allowed to
>>> put performance/qos constraints on this particular GPU, since we
>>> assume that the performance counters/devfreq governor will handle the
>>> whole job for us.  Since this centralizes the decision-making for the
>>> GPU it is safe for the devfreq->target() call to use the QoS APIs for
>>> controlling the GPU, since no one else will.  This avoids the feedback
>>> loop.
>>>
>>> On the other hand, if the GPU does not support performance counters
>>> then it should not use devfreq at all and rely 100% of QoS constraints
>>> from various sources: the GPU driver might request a high OPP every
>>> time work comes in, coupled with a timeout; if a QoS knob is exported
>>> to userspace then some OpenGL library might hold constraints through
>>> it; or some other kernel driver (video-related?) needs to use the GPU
>>> then it can hold a constraint through the QoS API.
>>>
>>> So there is a clear partition of QoS API usage between devices that
>>> support performance counters and ones that do not.  We want to avoid a
>>> feedback loop here.
>>
>> Such weird feedback loop may happen if the result frequency ranges
>> from QoS are inconsistent with those from DEVFREQ. However, if DEVFREQ
>> provides frequencies that do not violate QoS requirements, there would
>> be no such feedback loop. If they are consistent (DEVFREQ never gives
>> frequencies that do NOT satisfy QoS), DEVFREQ is transparent to PM
>> QoS.
>
> If two devices, x and y both place frequency constraints on device z,
> then QoS should probably aggregate their requests instead of just
> taking the max.  Then if devfreq also determines that it needs to make
> a QoS request against device z, then that will get aggregated on top
> of x and y's requests.  This isn't really what we want since x and y's
> requests may be adequate and devfreq's performance counter sampling is
> just requesting performance for operations which have already placed
> requests.

DEVFREQ won't give the target device frequencies that contradict PM
QoS inputs. DEVFREQ will choose one of the frequencies that satisfy
the PM QoS inputs from both x and y (probably aggregated by the PM QoS
framework), as long as the PM QoS part of the target device driver is
properly implemented, i.e., it enables the satisfying OPPs and
disables the non-satisfying ones.

>
>> For now, with OPPs, DEVFREQ will always select frequencies that
>> satisfy QoS requirements if the driver implementation is correct.
>>
>> However, as you mentioned, if we assume that OPP is going to disappear
>> in the future kernel, then we will somehow need to design an interface
>> for device drivers to provide lists of available frequencies to
>> DEVFREQ and those lists are going to be runtime-update-able and where
>> QoS requirements are applied. As with the current DEVFREQ
>
> I'm not saying that OPP is going to disappear because I have no idea
> what the future holds.
>
> My point is that devfreq doesn't need to know about OPP at all.  The
> code assumes that OPPs exist for the device and take a frequency as
> part of devfreq->profile->get_target_freq (or something like that) and
> they can just pass that frequency to devfreq->profile->target.  If
> that device has support for the OPP library than it can do the job of
> looking up OPPs, etc.

As discussed in the other thread, OPP is what describes the data that
every DVFS-capable device has. Is there any reason to redundantly
define data structures and functions when publicly available ones
already represent the same things? Every DEVFREQ device will have
multiple frequency/voltage pairs associated with it and will use
functions equivalent to those in the OPP library.

>
> Regards,
> Mike
>

Cheers!
MyungJoo


Thread overview: 30+ messages
2011-07-15  8:11 [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices MyungJoo Ham
2011-07-15  8:11 ` [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs MyungJoo Ham
2011-08-02 18:45   ` Kevin Hilman
2011-08-03  8:06     ` MyungJoo Ham
2011-08-02 21:56   ` Kevin Hilman
2011-08-03  6:02     ` MyungJoo Ham
2011-07-15  8:11 ` [PATCH v4 2/3] PM / DEVFREQ: add example governors MyungJoo Ham
2011-07-15  8:11 ` [PATCH v4 3/3] PM / DEVFREQ: add sysfs interface (including user tickling) MyungJoo Ham
2011-06-09 17:11   ` Pavel Machek
2011-07-19  2:14     ` MyungJoo Ham
2011-07-28 22:10 ` [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices Rafael J. Wysocki
2011-07-29  4:46   ` Turquette, Mike
2011-07-29  9:10     ` Rafael J. Wysocki
2011-07-30  1:02       ` Turquette, Mike
2011-07-30 21:23         ` Rafael J. Wysocki
2011-08-01 21:47           ` Turquette, Mike
2011-08-01  6:22         ` MyungJoo Ham
2011-08-01 22:01           ` Turquette, Mike
2011-08-02  7:17             ` MyungJoo Ham
2011-08-02 22:02   ` Kevin Hilman
2011-08-03  7:03     ` MyungJoo Ham
2011-08-03 17:31       ` Turquette, Mike
2011-08-03 18:33       ` Kevin Hilman
2011-08-04  8:15         ` MyungJoo Ham
2011-08-04 21:59           ` Turquette, Mike
2011-08-05  6:18             ` MyungJoo Ham
2011-08-08 19:13               ` Turquette, Mike
2011-08-09  5:27                 ` MyungJoo Ham
2011-08-11  1:28                   ` Turquette, Mike
2011-08-17 10:07                     ` MyungJoo Ham
