* Re: linux-next: build failure after merge of the cpufreq tree
@ 2012-03-05  1:46 ` MyungJoo Ham
  0 siblings, 0 replies; 19+ messages in thread
From: MyungJoo Ham @ 2012-03-05  1:46 UTC (permalink / raw)
  To: Stephen Rothwell, Dave Jones
  Cc: linux-next, linux-kernel, 박경민,
	Rafael J. Wysocki <rjw@sisk.pl>, linux-pm

Hello,

The patch "CPUfreq ondemand: handle QoS request on DVFS response latency" that introduced the mentioned errors requires the patch "PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency" get merged, too. I guess that patch will be at pm-qos tree if get merged (v2 patch has been released 7 days ago).


Cheers!
MyungJoo.

------- Original Message -------
Sender : Stephen Rothwell<sfr@canb.auug.org.au>
Date : 2012-03-01 11:56 (GMT+09:00)
Title : linux-next: build failure after merge of the cpufreq tree

Hi Dave,

After merging the cpufreq tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/cpufreq/cpufreq_ondemand.c: In function 'cpufreq_gov_dbs_init':
drivers/cpufreq/cpufreq_ondemand.c:880:28: error: 'PM_QOS_DVFS_RESPONSE_LATENCY' undeclared (first use in this function)
drivers/cpufreq/cpufreq_ondemand.c: In function 'cpufreq_gov_dbs_exit':
drivers/cpufreq/cpufreq_ondemand.c:896:25: error: 'PM_QOS_DVFS_RESPONSE_LATENCY' undeclared (first use in this function)

Caused by commit 500e8ca39c56 ("[CPUFREQ] ondemand: handle QoS request on
DVFS response latency").

I have used the cpufreq tree from next-20120229 for today.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



--
MyungJoo Ham (함명주), Ph.D.
System S/W Lab, S/W Platform Team, Software Center
Samsung Electronics
Cell: +82-10-6714-2858


* [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-05  1:46 ` MyungJoo Ham
@ 2012-03-07  5:02 ` MyungJoo Ham
  2012-03-07  9:02   ` Rafael J. Wysocki
  2012-03-08  3:47   ` mark gross
  -1 siblings, 2 replies; 19+ messages in thread
From: MyungJoo Ham @ 2012-03-07  5:02 UTC (permalink / raw)
  To: Rafael J. Wysocki, Stephen Rothwell, Dave Jones
  Cc: linux-pm, linux-next, Len Brown, Pavel Machek, Kevin Hilman,
	Jean Pihet, markgross, kyungmin.park, myungjoo.ham, linux-kernel

1. CPU_DMA_THROUGHPUT

This might look similar to CPU_DMA_LATENCY. However, there are H/W
blocks that create QoS requirements based on DMA throughput, not
latency, while the services of those QoS-requesting H/W blocks are
short-term bursts that cannot be responded to effectively by DVFS
mechanisms (CPUFreq and Devfreq).

In the Exynos4412 systems being tested, such H/W blocks include the
MFC (multi-function codec)'s decoding and encoding features, TV-out
(including HDMI), and cameras. When the display operates at 60Hz,
each chunk of work should be done within 16ms, and the workload on
DMA is not well spread: it fluctuates between frames (some frames
require more than others), and within a frame the workload also
fluctuates heavily. The tasks within a frame are usually not
parallelized; they are processed by specific H/W blocks, not CPU
cores. Those blocks often have PPMU capabilities; however, they
would need to be polled very frequently (more often than every 5ms)
to let DVFS mechanisms react properly.

For such specific tasks, allowing them to issue QoS requests seems
adequate because DVFS mechanisms (with a polling rate of 5ms or
longer) cannot keep up with them. Besides, the device drivers know
exactly when to request and cancel QoS.

2. DVFS_LATENCY

Both CPUFreq and Devfreq have a response latency to a sudden
workload increase. With a near-100% (e.g., 95%) up-threshold, the
average response latency is approximately 1.5 x the polling rate.

A specific polling rate (e.g., 100ms) may generally fit a given
system; however, there can be exceptions. For example:
- When user input suddenly starts (typing, clicking, moving cursors,
  and such), the user might need full performance immediately.
  However, we do not know whether full performance is actually
  needed until we calculate the utilization; thus, we need to
  calculate it faster on user input or similar events. Specifying
  QoS on CPU processing power or memory bandwidth at every user
  input is overkill because in many cases such a speed-up isn't
  necessary.
- When a device driver needs a faster performance response from the
  DVFS mechanism. This could be addressed by simply issuing QoS
  requests. However, such QoS requests may keep the system running
  fast unnecessarily in some cases, especially if a) the device's
  resource usage bursts for some duration (e.g., 100ms-long bursts)
  and b) the driver doesn't know when such bursts come. MMC/WiFi
  often showed such behavior, although part (b) might be addressed
  with further effort.

The cases shown above can be tackled by placing QoS requests on the
response time or latency of the DVFS mechanism, which is directly
related to its polling interval (if the DVFS mechanism is polling
based).
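
For illustration (not part of this patch), here is a minimal sketch
of how a driver might use the proposed classes through the existing
pm_qos request API; the mfc_* names are made up and the
750000 kbytes/sec value is an arbitrary example:

#include <linux/pm_qos.h>

static struct pm_qos_request mfc_dma_qos;

/* Entering a decoding burst: ask for DMA throughput in kbytes/sec. */
static void mfc_start_decoding(void)
{
	pm_qos_add_request(&mfc_dma_qos, PM_QOS_CPU_DMA_THROUGHPUT,
			   750000);
}

/* Burst done: drop the request; the class falls back to its
 * default value (0). */
static void mfc_stop_decoding(void)
{
	pm_qos_remove_request(&mfc_dma_qos);
}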

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

--
Changes from v2
- Rebased on the recent PM QoS patches, resolving the merge conflict.

Changes from RFC(v1)
- Added omitted part (registering new classes)
---
 include/linux/pm_qos.h |    4 ++++
 kernel/power/qos.c     |   31 ++++++++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index c8a541e..0ee7caa 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -14,6 +14,8 @@ enum {
 	PM_QOS_CPU_DMA_LATENCY,
 	PM_QOS_NETWORK_LATENCY,
 	PM_QOS_NETWORK_THROUGHPUT,
+	PM_QOS_CPU_DMA_THROUGHPUT,
+	PM_QOS_DVFS_RESPONSE_LATENCY,
 
 	/* insert new class ID */
 	PM_QOS_NUM_CLASSES,
@@ -24,6 +26,8 @@ enum {
 #define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
 #define PM_QOS_NETWORK_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
 #define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE	0
+#define PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE	0
+#define PM_QOS_DVFS_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
 #define PM_QOS_DEV_LAT_DEFAULT_VALUE		0
 
 struct pm_qos_request {
diff --git a/kernel/power/qos.c b/kernel/power/qos.c
index d6d6dbd..3e122db 100644
--- a/kernel/power/qos.c
+++ b/kernel/power/qos.c
@@ -101,11 +101,40 @@ static struct pm_qos_object network_throughput_pm_qos = {
 };
 
 
+static BLOCKING_NOTIFIER_HEAD(cpu_dma_throughput_notifier);
+static struct pm_qos_constraints cpu_dma_tput_constraints = {
+	.list = PLIST_HEAD_INIT(cpu_dma_tput_constraints.list),
+	.target_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
+	.default_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
+	.type = PM_QOS_MAX,
+	.notifiers = &cpu_dma_throughput_notifier,
+};
+static struct pm_qos_object cpu_dma_throughput_pm_qos = {
+	.constraints = &cpu_dma_tput_constraints,
+	.name = "cpu_dma_throughput",
+};
+
+
+static BLOCKING_NOTIFIER_HEAD(dvfs_lat_notifier);
+static struct pm_qos_constraints dvfs_lat_constraints = {
+	.list = PLIST_HEAD_INIT(dvfs_lat_constraints.list),
+	.target_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
+	.default_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
+	.type = PM_QOS_MIN,
+	.notifiers = &dvfs_lat_notifier,
+};
+static struct pm_qos_object dvfs_lat_pm_qos = {
+	.constraints = &dvfs_lat_constraints,
+	.name = "dvfs_latency",
+};
+
 static struct pm_qos_object *pm_qos_array[] = {
 	&null_pm_qos,
 	&cpu_dma_pm_qos,
 	&network_lat_pm_qos,
-	&network_throughput_pm_qos
+	&network_throughput_pm_qos,
+	&cpu_dma_throughput_pm_qos,
+	&dvfs_lat_pm_qos,
 };
 
 static ssize_t pm_qos_power_write(struct file *filp, const char __user *buf,
-- 
1.7.4.1



* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-07  5:02 ` [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency MyungJoo Ham
@ 2012-03-07  9:02   ` Rafael J. Wysocki
  2012-03-07  9:36     ` MyungJoo Ham
  2012-03-08  3:47   ` mark gross
  1 sibling, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2012-03-07  9:02 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Stephen Rothwell, Dave Jones, linux-pm, linux-next, Len Brown,
	Pavel Machek, Kevin Hilman, Jean Pihet, markgross, kyungmin.park,
	myungjoo.ham, linux-kernel

Hi,

Can you please post all of your outstanding PM-related patches that
you want me to look at in one series, so that they appear in context?

I'm struggling to understand what you need all those changes for.

Thanks,
Rafael


On Wednesday, March 07, 2012, MyungJoo Ham wrote:
> 1. CPU_DMA_THROUGHPUT
> 
> This might look similar to CPU_DMA_LATENCY. However, there are H/W
> blocks that create QoS requirements based on DMA throughput, not
> latency, while the services of those QoS-requesting H/W blocks are
> short-term bursts that cannot be responded to effectively by DVFS
> mechanisms (CPUFreq and Devfreq).
>
> In the Exynos4412 systems being tested, such H/W blocks include the
> MFC (multi-function codec)'s decoding and encoding features, TV-out
> (including HDMI), and cameras. When the display operates at 60Hz,
> each chunk of work should be done within 16ms, and the workload on
> DMA is not well spread: it fluctuates between frames (some frames
> require more than others), and within a frame the workload also
> fluctuates heavily. The tasks within a frame are usually not
> parallelized; they are processed by specific H/W blocks, not CPU
> cores. Those blocks often have PPMU capabilities; however, they
> would need to be polled very frequently (more often than every 5ms)
> to let DVFS mechanisms react properly.
>
> For such specific tasks, allowing them to issue QoS requests seems
> adequate because DVFS mechanisms (with a polling rate of 5ms or
> longer) cannot keep up with them. Besides, the device drivers know
> exactly when to request and cancel QoS.
>
> 2. DVFS_LATENCY
>
> Both CPUFreq and Devfreq have a response latency to a sudden
> workload increase. With a near-100% (e.g., 95%) up-threshold, the
> average response latency is approximately 1.5 x the polling rate.
>
> A specific polling rate (e.g., 100ms) may generally fit a given
> system; however, there can be exceptions. For example:
> - When user input suddenly starts (typing, clicking, moving cursors,
>   and such), the user might need full performance immediately.
>   However, we do not know whether full performance is actually
>   needed until we calculate the utilization; thus, we need to
>   calculate it faster on user input or similar events. Specifying
>   QoS on CPU processing power or memory bandwidth at every user
>   input is overkill because in many cases such a speed-up isn't
>   necessary.
> - When a device driver needs a faster performance response from the
>   DVFS mechanism. This could be addressed by simply issuing QoS
>   requests. However, such QoS requests may keep the system running
>   fast unnecessarily in some cases, especially if a) the device's
>   resource usage bursts for some duration (e.g., 100ms-long bursts)
>   and b) the driver doesn't know when such bursts come. MMC/WiFi
>   often showed such behavior, although part (b) might be addressed
>   with further effort.
>
> The cases shown above can be tackled by placing QoS requests on the
> response time or latency of the DVFS mechanism, which is directly
> related to its polling interval (if the DVFS mechanism is polling
> based).
> 
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> 
> --
> Changes from v2
> - Rebased on the recent PM QoS patches, resolving the merge conflict.
> 
> Changes from RFC(v1)
> - Added omitted part (registering new classes)
> ---
>  include/linux/pm_qos.h |    4 ++++
>  kernel/power/qos.c     |   31 ++++++++++++++++++++++++++++++-
>  2 files changed, 34 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> index c8a541e..0ee7caa 100644
> --- a/include/linux/pm_qos.h
> +++ b/include/linux/pm_qos.h
> @@ -14,6 +14,8 @@ enum {
>  	PM_QOS_CPU_DMA_LATENCY,
>  	PM_QOS_NETWORK_LATENCY,
>  	PM_QOS_NETWORK_THROUGHPUT,
> +	PM_QOS_CPU_DMA_THROUGHPUT,
> +	PM_QOS_DVFS_RESPONSE_LATENCY,
>  
>  	/* insert new class ID */
>  	PM_QOS_NUM_CLASSES,
> @@ -24,6 +26,8 @@ enum {
>  #define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
>  #define PM_QOS_NETWORK_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
>  #define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE	0
> +#define PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE	0
> +#define PM_QOS_DVFS_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
>  #define PM_QOS_DEV_LAT_DEFAULT_VALUE		0
>  
>  struct pm_qos_request {
> diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> index d6d6dbd..3e122db 100644
> --- a/kernel/power/qos.c
> +++ b/kernel/power/qos.c
> @@ -101,11 +101,40 @@ static struct pm_qos_object network_throughput_pm_qos = {
>  };
>  
>  
> +static BLOCKING_NOTIFIER_HEAD(cpu_dma_throughput_notifier);
> +static struct pm_qos_constraints cpu_dma_tput_constraints = {
> +	.list = PLIST_HEAD_INIT(cpu_dma_tput_constraints.list),
> +	.target_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
> +	.default_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
> +	.type = PM_QOS_MAX,
> +	.notifiers = &cpu_dma_throughput_notifier,
> +};
> +static struct pm_qos_object cpu_dma_throughput_pm_qos = {
> +	.constraints = &cpu_dma_tput_constraints,
> +	.name = "cpu_dma_throughput",
> +};
> +
> +
> +static BLOCKING_NOTIFIER_HEAD(dvfs_lat_notifier);
> +static struct pm_qos_constraints dvfs_lat_constraints = {
> +	.list = PLIST_HEAD_INIT(dvfs_lat_constraints.list),
> +	.target_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
> +	.default_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
> +	.type = PM_QOS_MIN,
> +	.notifiers = &dvfs_lat_notifier,
> +};
> +static struct pm_qos_object dvfs_lat_pm_qos = {
> +	.constraints = &dvfs_lat_constraints,
> +	.name = "dvfs_latency",
> +};
> +
>  static struct pm_qos_object *pm_qos_array[] = {
>  	&null_pm_qos,
>  	&cpu_dma_pm_qos,
>  	&network_lat_pm_qos,
> -	&network_throughput_pm_qos
> +	&network_throughput_pm_qos,
> +	&cpu_dma_throughput_pm_qos,
> +	&dvfs_lat_pm_qos,
>  };
>  
>  static ssize_t pm_qos_power_write(struct file *filp, const char __user *buf,
> 



* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-07  9:02   ` Rafael J. Wysocki
@ 2012-03-07  9:36     ` MyungJoo Ham
  2012-03-09  8:17       ` MyungJoo Ham
  0 siblings, 1 reply; 19+ messages in thread
From: MyungJoo Ham @ 2012-03-07  9:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stephen Rothwell, Dave Jones, linux-pm, linux-next, Len Brown,
	Pavel Machek, Kevin Hilman, Jean Pihet, markgross, kyungmin.park,
	linux-kernel

2012/3/7 Rafael J. Wysocki <rjw@sisk.pl>:
> Hi,
>
> Can you please post all of your outstanding PM-related patches that
> you want me to look at in one series, so that they appear in context?
>
> I'm struggling to understand what you need all those changes for.
>

Hello Rafael,


I've put the patches at

- devfreq patches (based on your pm-devfreq branch)
http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/devfreq-for-next

- pm-qos patches (based on your pm-qos branch)
http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/pm_qos-for-next


- In order to help understanding, all related patches are at (please
do not pull from here)
http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/devfreq
devfreq patches + pm-qos patches + cpufreq patches combined (based on
Linux 3.3-rc6 + some test patches).
However, it does not include the recent PM-QoS patches in the pm-qos
branch, so those patches are older and differ from the ones above.
Please do not pull from here into your branches.


Anyway, we are syncing our local repositories with infradead.org now,
so there could be some lag; infradead.org is not showing the current
version yet.
(... it even seems that the server is down now. I'll retry the sync
when it comes back up.)


Thank you.


Cheers!
MyungJoo.

> Thanks,
> Rafael
>
>
> On Wednesday, March 07, 2012, MyungJoo Ham wrote:
>> 1. CPU_DMA_THROUGHPUT
>>
>> This might look similar to CPU_DMA_LATENCY. However, there are H/W
>> blocks that create QoS requirements based on DMA throughput, not
>> latency, while the services of those QoS-requesting H/W blocks are
>> short-term bursts that cannot be responded to effectively by DVFS
>> mechanisms (CPUFreq and Devfreq).
>>
>> In the Exynos4412 systems being tested, such H/W blocks include the
>> MFC (multi-function codec)'s decoding and encoding features, TV-out
>> (including HDMI), and cameras. When the display operates at 60Hz,
>> each chunk of work should be done within 16ms, and the workload on
>> DMA is not well spread: it fluctuates between frames (some frames
>> require more than others), and within a frame the workload also
>> fluctuates heavily. The tasks within a frame are usually not
>> parallelized; they are processed by specific H/W blocks, not CPU
>> cores. Those blocks often have PPMU capabilities; however, they
>> would need to be polled very frequently (more often than every 5ms)
>> to let DVFS mechanisms react properly.
>>
>> For such specific tasks, allowing them to issue QoS requests seems
>> adequate because DVFS mechanisms (with a polling rate of 5ms or
>> longer) cannot keep up with them. Besides, the device drivers know
>> exactly when to request and cancel QoS.
>>
>> 2. DVFS_LATENCY
>>
>> Both CPUFreq and Devfreq have a response latency to a sudden
>> workload increase. With a near-100% (e.g., 95%) up-threshold, the
>> average response latency is approximately 1.5 x the polling rate.
>>
>> A specific polling rate (e.g., 100ms) may generally fit a given
>> system; however, there can be exceptions. For example:
>> - When user input suddenly starts (typing, clicking, moving cursors,
>>   and such), the user might need full performance immediately.
>>   However, we do not know whether full performance is actually
>>   needed until we calculate the utilization; thus, we need to
>>   calculate it faster on user input or similar events. Specifying
>>   QoS on CPU processing power or memory bandwidth at every user
>>   input is overkill because in many cases such a speed-up isn't
>>   necessary.
>> - When a device driver needs a faster performance response from the
>>   DVFS mechanism. This could be addressed by simply issuing QoS
>>   requests. However, such QoS requests may keep the system running
>>   fast unnecessarily in some cases, especially if a) the device's
>>   resource usage bursts for some duration (e.g., 100ms-long bursts)
>>   and b) the driver doesn't know when such bursts come. MMC/WiFi
>>   often showed such behavior, although part (b) might be addressed
>>   with further effort.
>>
>> The cases shown above can be tackled by placing QoS requests on the
>> response time or latency of the DVFS mechanism, which is directly
>> related to its polling interval (if the DVFS mechanism is polling
>> based).
>>
>> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>
>> --
>> Changes from v2
>> - Rebased on the recent PM QoS patches, resolving the merge conflict.
>>
>> Changes from RFC(v1)
>> - Added omitted part (registering new classes)
>> ---
>>  include/linux/pm_qos.h |    4 ++++
>>  kernel/power/qos.c     |   31 ++++++++++++++++++++++++++++++-
>>  2 files changed, 34 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
>> index c8a541e..0ee7caa 100644
>> --- a/include/linux/pm_qos.h
>> +++ b/include/linux/pm_qos.h
>> @@ -14,6 +14,8 @@ enum {
>>       PM_QOS_CPU_DMA_LATENCY,
>>       PM_QOS_NETWORK_LATENCY,
>>       PM_QOS_NETWORK_THROUGHPUT,
>> +     PM_QOS_CPU_DMA_THROUGHPUT,
>> +     PM_QOS_DVFS_RESPONSE_LATENCY,
>>
>>       /* insert new class ID */
>>       PM_QOS_NUM_CLASSES,
>> @@ -24,6 +26,8 @@ enum {
>>  #define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE     (2000 * USEC_PER_SEC)
>>  #define PM_QOS_NETWORK_LAT_DEFAULT_VALUE     (2000 * USEC_PER_SEC)
>>  #define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE      0
>> +#define PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE      0
>> +#define PM_QOS_DVFS_LAT_DEFAULT_VALUE        (2000 * USEC_PER_SEC)
>>  #define PM_QOS_DEV_LAT_DEFAULT_VALUE         0
>>
>>  struct pm_qos_request {
>> diff --git a/kernel/power/qos.c b/kernel/power/qos.c
>> index d6d6dbd..3e122db 100644
>> --- a/kernel/power/qos.c
>> +++ b/kernel/power/qos.c
>> @@ -101,11 +101,40 @@ static struct pm_qos_object network_throughput_pm_qos = {
>>  };
>>
>>
>> +static BLOCKING_NOTIFIER_HEAD(cpu_dma_throughput_notifier);
>> +static struct pm_qos_constraints cpu_dma_tput_constraints = {
>> +     .list = PLIST_HEAD_INIT(cpu_dma_tput_constraints.list),
>> +     .target_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
>> +     .default_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
>> +     .type = PM_QOS_MAX,
>> +     .notifiers = &cpu_dma_throughput_notifier,
>> +};
>> +static struct pm_qos_object cpu_dma_throughput_pm_qos = {
>> +     .constraints = &cpu_dma_tput_constraints,
>> +     .name = "cpu_dma_throughput",
>> +};
>> +
>> +
>> +static BLOCKING_NOTIFIER_HEAD(dvfs_lat_notifier);
>> +static struct pm_qos_constraints dvfs_lat_constraints = {
>> +     .list = PLIST_HEAD_INIT(dvfs_lat_constraints.list),
>> +     .target_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
>> +     .default_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
>> +     .type = PM_QOS_MIN,
>> +     .notifiers = &dvfs_lat_notifier,
>> +};
>> +static struct pm_qos_object dvfs_lat_pm_qos = {
>> +     .constraints = &dvfs_lat_constraints,
>> +     .name = "dvfs_latency",
>> +};
>> +
>>  static struct pm_qos_object *pm_qos_array[] = {
>>       &null_pm_qos,
>>       &cpu_dma_pm_qos,
>>       &network_lat_pm_qos,
>> -     &network_throughput_pm_qos
>> +     &network_throughput_pm_qos,
>> +     &cpu_dma_throughput_pm_qos,
>> +     &dvfs_lat_pm_qos,
>>  };
>>
>>  static ssize_t pm_qos_power_write(struct file *filp, const char __user *buf,
>>
>



-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab, DMC Business, Samsung Electronics


* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-07  5:02 ` [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency MyungJoo Ham
  2012-03-07  9:02   ` Rafael J. Wysocki
@ 2012-03-08  3:47   ` mark gross
  2012-03-09  5:53     ` MyungJoo Ham
  2012-03-10 22:25     ` Rafael J. Wysocki
  1 sibling, 2 replies; 19+ messages in thread
From: mark gross @ 2012-03-08  3:47 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Rafael J. Wysocki, Stephen Rothwell, Dave Jones, linux-pm,
	linux-next, Len Brown, Pavel Machek, Kevin Hilman, Jean Pihet,
	markgross, kyungmin.park, myungjoo.ham, linux-kernel

On Wed, Mar 07, 2012 at 02:02:01PM +0900, MyungJoo Ham wrote:
> 1. CPU_DMA_THROUGHPUT
> 
> This might look similar to CPU_DMA_LATENCY. However, there are H/W
> blocks that create QoS requirements based on DMA throughput, not
> latency, while the services of those QoS-requesting H/W blocks are
> short-term bursts that cannot be responded to effectively by DVFS
> mechanisms (CPUFreq and Devfreq).
>
> In the Exynos4412 systems being tested, such H/W blocks include the
> MFC (multi-function codec)'s decoding and encoding features, TV-out
> (including HDMI), and cameras. When the display operates at 60Hz,
> each chunk of work should be done within 16ms, and the workload on
> DMA is not well spread: it fluctuates between frames (some frames
> require more than others), and within a frame the workload also
> fluctuates heavily. The tasks within a frame are usually not
> parallelized; they are processed by specific H/W blocks, not CPU
> cores. Those blocks often have PPMU capabilities; however, they
> would need to be polled very frequently (more often than every 5ms)
> to let DVFS mechanisms react properly.
>
> For such specific tasks, allowing them to issue QoS requests seems
> adequate because DVFS mechanisms (with a polling rate of 5ms or
> longer) cannot keep up with them. Besides, the device drivers know
> exactly when to request and cancel QoS.
>
> 2. DVFS_LATENCY
>
> Both CPUFreq and Devfreq have a response latency to a sudden
> workload increase. With a near-100% (e.g., 95%) up-threshold, the
> average response latency is approximately 1.5 x the polling rate.
>
> A specific polling rate (e.g., 100ms) may generally fit a given
> system; however, there can be exceptions. For example:
> - When user input suddenly starts (typing, clicking, moving cursors,
>   and such), the user might need full performance immediately.
>   However, we do not know whether full performance is actually
>   needed until we calculate the utilization; thus, we need to
>   calculate it faster on user input or similar events. Specifying
>   QoS on CPU processing power or memory bandwidth at every user
>   input is overkill because in many cases such a speed-up isn't
>   necessary.
> - When a device driver needs a faster performance response from the
>   DVFS mechanism. This could be addressed by simply issuing QoS
>   requests. However, such QoS requests may keep the system running
>   fast unnecessarily in some cases, especially if a) the device's
>   resource usage bursts for some duration (e.g., 100ms-long bursts)
>   and b) the driver doesn't know when such bursts come. MMC/WiFi
>   often showed such behavior, although part (b) might be addressed
>   with further effort.
>
> The cases shown above can be tackled by placing QoS requests on the
> response time or latency of the DVFS mechanism, which is directly
> related to its polling interval (if the DVFS mechanism is polling
> based).
> 
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> 
> --
> Changes from v2
> - Rebased on the recent PM QoS patches, resolving the merge conflict.
> 
> Changes from RFC(v1)
> - Added omitted part (registering new classes)
> ---
>  include/linux/pm_qos.h |    4 ++++
>  kernel/power/qos.c     |   31 ++++++++++++++++++++++++++++++-
>  2 files changed, 34 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> index c8a541e..0ee7caa 100644
> --- a/include/linux/pm_qos.h
> +++ b/include/linux/pm_qos.h
> @@ -14,6 +14,8 @@ enum {
>  	PM_QOS_CPU_DMA_LATENCY,
>  	PM_QOS_NETWORK_LATENCY,
>  	PM_QOS_NETWORK_THROUGHPUT,
> +	PM_QOS_CPU_DMA_THROUGHPUT,
> +	PM_QOS_DVFS_RESPONSE_LATENCY,
>  
>  	/* insert new class ID */
>  	PM_QOS_NUM_CLASSES,
> @@ -24,6 +26,8 @@ enum {
>  #define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
>  #define PM_QOS_NETWORK_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
>  #define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE	0
> +#define PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE	0
> +#define PM_QOS_DVFS_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
>  #define PM_QOS_DEV_LAT_DEFAULT_VALUE		0
>  
>  struct pm_qos_request {
> diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> index d6d6dbd..3e122db 100644
> --- a/kernel/power/qos.c
> +++ b/kernel/power/qos.c
> @@ -101,11 +101,40 @@ static struct pm_qos_object network_throughput_pm_qos = {
>  };
>  
>  
> +static BLOCKING_NOTIFIER_HEAD(cpu_dma_throughput_notifier);
> +static struct pm_qos_constraints cpu_dma_tput_constraints = {
> +	.list = PLIST_HEAD_INIT(cpu_dma_tput_constraints.list),
> +	.target_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
> +	.default_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
> +	.type = PM_QOS_MAX,
> +	.notifiers = &cpu_dma_throughput_notifier,
> +};
> +static struct pm_qos_object cpu_dma_throughput_pm_qos = {
> +	.constraints = &cpu_dma_tput_constraints,
> +	.name = "cpu_dma_throughput",
> +};
> +
> +
> +static BLOCKING_NOTIFIER_HEAD(dvfs_lat_notifier);
> +static struct pm_qos_constraints dvfs_lat_constraints = {
> +	.list = PLIST_HEAD_INIT(dvfs_lat_constraints.list),
> +	.target_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
> +	.default_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
> +	.type = PM_QOS_MIN,
> +	.notifiers = &dvfs_lat_notifier,
> +};
> +static struct pm_qos_object dvfs_lat_pm_qos = {
> +	.constraints = &dvfs_lat_constraints,
> +	.name = "dvfs_latency",
> +};
> +
>  static struct pm_qos_object *pm_qos_array[] = {
>  	&null_pm_qos,
>  	&cpu_dma_pm_qos,
>  	&network_lat_pm_qos,
> -	&network_throughput_pm_qos
> +	&network_throughput_pm_qos,
> +	&cpu_dma_throughput_pm_qos,
> +	&dvfs_lat_pm_qos,
>  };
>  
>  static ssize_t pm_qos_power_write(struct file *filp, const char __user *buf,
> -- 
> 1.7.4.1
>

The cpu_dma_throughput looks OK to me.  I do, however, wonder about the
dvfs_lat_pm_qos.  Should that knob be exposed to user mode?  Does that
matter so much?  Why can't dvfs_lat use the cpu_dma_lat?

BTW I'll be out of town for the next 10 days and probably will not get
to this email account until I get home.

--mark



* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-08  3:47   ` mark gross
@ 2012-03-09  5:53     ` MyungJoo Ham
  2012-03-10 22:53       ` Rafael J. Wysocki
  2012-03-10 22:25     ` Rafael J. Wysocki
  1 sibling, 1 reply; 19+ messages in thread
From: MyungJoo Ham @ 2012-03-09  5:53 UTC (permalink / raw)
  To: markgross
  Cc: Rafael J. Wysocki, Stephen Rothwell, Dave Jones, linux-pm,
	linux-next, Len Brown, Pavel Machek, Kevin Hilman, Jean Pihet,
	kyungmin.park, linux-kernel

On Thu, Mar 8, 2012 at 12:47 PM, mark gross <markgross@thegnar.org> wrote:
> On Wed, Mar 07, 2012 at 02:02:01PM +0900, MyungJoo Ham wrote:
>> 1. CPU_DMA_THROUGHPUT
...
>> 2. DVFS_LATENCY
>
> The cpu_dma_throughput looks OK to me.  I do, however, wonder about the
> dvfs_lat_pm_qos.  Should that knob be exposed to user mode?  Does that
> matter so much?  Why can't dvfs_lat use the cpu_dma_lat?
>
> BTW I'll be out of town for the next 10 days and probably will not get
> to this email account until I get home.
>
> --mark
>

1. Should DVFS Latency be exposed to user mode?

It would depend on the policy of the given system; however, yes, there
are systems that require a user interface for DVFS Latency.
Take the example of user-input response (response to user clicks,
typing, touching, etc.): a user program (probably platform s/w or
middleware) may issue QoS requests. Besides, when a new "application"
is starting, such "middleware" may want faster responses from the
DVFS mechanisms.
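
As a minimal sketch of what such a knob could look like from user
mode (assuming the new class is picked up by the usual PM QoS
misc-device interface, which would name the node /dev/dvfs_latency
after the pm_qos_object in the patch; the 50ms value is only an
example), middleware could hold a request like this:

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
	int32_t target_us = 50000;	/* 50ms DVFS response latency */
	int fd = open("/dev/dvfs_latency", O_RDWR);

	if (fd < 0)
		return 1;
	/* The request stays active while the fd is held open. */
	if (write(fd, &target_us, sizeof(target_us)) != sizeof(target_us))
		return 1;
	sleep(2);	/* e.g., while an application launches */
	close(fd);	/* closing the fd drops the request */
	return 0;
}

The request is also dropped automatically when the process exits,
which suits short-lived middleware helpers.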


2. Does DVFS Latency matter?

Yes. In our experimental sets with Exynos4210 (the chip found in
Galaxy S2 equivalents; not exactly the same, as the tests were
conducted on Tizen, not Android), we could see a noticeable
difference with the bare eye in user-input responses. When we
shortened the DVFS polling interval on touches, the touch responses
improved greatly; e.g., from losing 10 frames to losing 0 or 1 frame
on a sudden input rush.

3. Why not replace DVFS Latency w/ CPU-DMA-Latency/Throughput?

When we implemented the user-input response enhancement with CPU-DMA
QoS requests, PM QoS unconditionally increased CPU and bus
frequencies/voltages on user input. However, in many cases that is
unnecessary: a user input means that there will be unexpected changes
soon, but it does not mean that the load will increase. Thus,
allowing the DVFS mechanism to react faster was enough to shorten the
response time without increasing frequencies and voltages when not
needed. There was a significant difference in power consumption with
this change when the user inputs did not involve heavy graphics jobs;
e.g., typing a text message.



Cheers!
MyungJoo.

-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab, DMC Business, Samsung Electronics


* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-07  9:36     ` MyungJoo Ham
@ 2012-03-09  8:17       ` MyungJoo Ham
  2012-03-10 22:22         ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: MyungJoo Ham @ 2012-03-09  8:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stephen Rothwell, Dave Jones, linux-pm, linux-next, Len Brown,
	Pavel Machek, Kevin Hilman, Jean Pihet, markgross, kyungmin.park,
	linux-kernel

On Wed, Mar 7, 2012 at 6:36 PM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
> 2012/3/7 Rafael J. Wysocki <rjw@sisk.pl>:
>> Hi,
>>
>> Can you please post all of your outstanding PM-related patches that
>> you want me to look at in one series, so that they appear in context?
>>
>> I'm struggling to understand what you need all those changes for.
>>
>
> Hello Rafael,
>
>
> I've put the patches at
>
> - devfreq patches (based on your pm-devfreq branch)
> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/devfreq-for-next
>
> - pm-qos patches (based on your pm-qos branch)
> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/pm_qos-for-next
>
>
> - In order to help understanding, all related patches are at (please
> do not pull from here)
> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/devfreq
> devfreq patches + pm-qos patches + cpufreq patches combined (based on
> Linux 3.3-rc6 + some test patches).
> However, it does not include the recent PM-QoS patches in the pm-qos
> branch, so those patches are older and differ from the ones above.
> Please do not pull from here into your branches.
>

Hi Rafael,


Here are the to-be-pulled branches for your linux-pm.git (pm-devfreq,
pm-qos); they have some updates since the last submissions (like those
mentioned by Mike).
Please use the following addresses if you are fine with pulling them.


Cheers!
MyungJoo.

-------- PM / devfreq (fitted for pm-devfreq branch) ----------
The following changes since commit e4c9d8efe6bdc844071d68960dfa2003c5cf6449:

  Merge branch 'devfreq-for-next' of
git://git.infradead.org/users/kmpark/linux-samsung into pm-devfreq
(2012-01-25 00:02:08 +0100)

are available in the git repository at:

  git://git.infradead.org/users/kmpark/linux-samsung devfreq-for-next

GITWEB: http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/devfreq-for-next

MyungJoo Ham (2):
      PM / devfreq: add PM QoS support
      PM / devfreq: add relation of recommended frequency.

 drivers/devfreq/devfreq.c     |  193 +++++++++++++++++++++++++++++++++++++++--
 drivers/devfreq/exynos4_bus.c |   14 ++-
 include/linux/devfreq.h       |   53 +++++++++++-
 3 files changed, 245 insertions(+), 15 deletions(-)



-------- PM / QoS (fitted for pm-qos branch) ----------
The following changes since commit a9b542ee607a8afafa9447292394959fc84ea650:

  PM / QoS: unconditionally build the feature (2012-02-13 16:23:42 +0100)

are available in the git repository at:
  git://git.infradead.org/users/kmpark/linux-samsung pm_qos-for-next

GITWEB: http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/pm_qos-for-next

MyungJoo Ham (2):
      PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
      PM / QoS: add pm_qos_update_request_timeout API

 include/linux/pm_qos.h |    8 +++++
 kernel/power/qos.c     |   81 +++++++++++++++++++++++++++++++++++++++++++++++-




-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab, DMC Business, Samsung Electronics


* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-09  8:17       ` MyungJoo Ham
@ 2012-03-10 22:22         ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2012-03-10 22:22 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: Stephen Rothwell, Dave Jones, linux-pm, linux-next, Len Brown,
	Pavel Machek, Kevin Hilman, Jean Pihet, markgross, kyungmin.park,
	linux-kernel

Hi,

On Friday, March 09, 2012, MyungJoo Ham wrote:
> On Wed, Mar 7, 2012 at 6:36 PM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
> > 2012/3/7 Rafael J. Wysocki <rjw@sisk.pl>:
[...]
> Hi Rafael,
> 
> 
> Here are the to-be-pulled branches for your linux-pm.git (pm-devfreq,
> pm-qos); they have some updates since the last submissions (like those
> mentioned by Mike).
> Please use the following addresses if you are fine with pulling them.
> 
> 
> Cheers!
> MyungJoo.
> 
> -------- PM / devfreq (fitted for pm-devfreq branch) ----------
> The following changes since commit e4c9d8efe6bdc844071d68960dfa2003c5cf6449:
> 
>   Merge branch 'devfreq-for-next' of
> git://git.infradead.org/users/kmpark/linux-samsung into pm-devfreq
> (2012-01-25 00:02:08 +0100)
> 
> are available in the git repository at:
> 
>   git://git.infradead.org/users/kmpark/linux-samsung devfreq-for-next
> 
> GITWEB: http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/devfreq-for-next
> 
> MyungJoo Ham (2):
>       PM / devfreq: add PM QoS support
>       PM / devfreq: add relation of recommended frequency.

This patch I'm fine with, and I can either take it as a standalone patch
or pull it from your tree if you put it into a separate branch for me.

The previous one, however, I'm not entirely comfortable with and I'd like
to discuss it in more detail.  Can you please post it for discussion?

>  drivers/devfreq/devfreq.c     |  193 +++++++++++++++++++++++++++++++++++++++--
>  drivers/devfreq/exynos4_bus.c |   14 ++-
>  include/linux/devfreq.h       |   53 +++++++++++-
>  3 files changed, 245 insertions(+), 15 deletions(-)
> 
> 
> 
> -------- PM / QoS (fitted for pm-qos branch) ----------
> The following changes since commit a9b542ee607a8afafa9447292394959fc84ea650:
> 
>   PM / QoS: unconditionally build the feature (2012-02-13 16:23:42 +0100)
> 
> are available in the git repository at:
>   git://git.infradead.org/users/kmpark/linux-samsung pm_qos-for-next
> 
> GITWEB: http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/pm_qos-for-next
> 
> MyungJoo Ham (2):
>       PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
>       PM / QoS: add pm_qos_update_request_timeout API

I'd prefer all discussions to settle down before pulling these two.

Thanks,
Rafael


* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-08  3:47   ` mark gross
  2012-03-09  5:53     ` MyungJoo Ham
@ 2012-03-10 22:25     ` Rafael J. Wysocki
  1 sibling, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2012-03-10 22:25 UTC (permalink / raw)
  To: markgross, MyungJoo Ham
  Cc: Stephen Rothwell, Dave Jones, linux-pm, linux-next, Len Brown,
	Pavel Machek, Kevin Hilman, Jean Pihet, kyungmin.park,
	myungjoo.ham, linux-kernel

On Thursday, March 08, 2012, mark gross wrote:
> On Wed, Mar 07, 2012 at 02:02:01PM +0900, MyungJoo Ham wrote:
> > 1. CPU_DMA_THROUGHPUT
> > 
> > This might look similar to CPU_DMA_LATENCY. However, there are H/W
> > blocks that create QoS requirements based on DMA throughput, not
> > latency, while the services of those QoS-requesting H/W blocks are
> > short-term bursts that cannot be responded to effectively by DVFS
> > mechanisms (CPUFreq and Devfreq).
> >
> > In the Exynos4412 systems being tested, such H/W blocks include the
> > MFC (multi-function codec)'s decoding and encoding features, TV-out
> > (including HDMI), and cameras. When the display operates at 60Hz,
> > each chunk of work should be done within 16ms, and the workload on
> > DMA is not well spread: it fluctuates between frames (some frames
> > require more than others), and within a frame the workload also
> > fluctuates heavily. The tasks within a frame are usually not
> > parallelized; they are processed by specific H/W blocks, not CPU
> > cores. Those blocks often have PPMU capabilities; however, they
> > would need to be polled very frequently (more often than every 5ms)
> > to let DVFS mechanisms react properly.
> >
> > For such specific tasks, allowing them to issue QoS requests seems
> > adequate because DVFS mechanisms (with a polling rate of 5ms or
> > longer) cannot keep up with them. Besides, the device drivers know
> > exactly when to request and cancel QoS.
> >
> > 2. DVFS_LATENCY
> >
> > Both CPUFreq and Devfreq have a response latency to a sudden
> > workload increase. With a near-100% (e.g., 95%) up-threshold, the
> > average response latency is approximately 1.5 x the polling rate.
> >
> > A specific polling rate (e.g., 100ms) may generally fit a given
> > system; however, there can be exceptions. For example:
> > - When user input suddenly starts (typing, clicking, moving cursors,
> >   and such), the user might need full performance immediately.
> >   However, we do not know whether full performance is actually
> >   needed until we calculate the utilization; thus, we need to
> >   calculate it faster on user input or similar events. Specifying
> >   QoS on CPU processing power or memory bandwidth at every user
> >   input is overkill because in many cases such a speed-up isn't
> >   necessary.
> > - When a device driver needs a faster performance response from the
> >   DVFS mechanism. This could be addressed by simply issuing QoS
> >   requests. However, such QoS requests may keep the system running
> >   fast unnecessarily in some cases, especially if a) the device's
> >   resource usage bursts for some duration (e.g., 100ms-long bursts)
> >   and b) the driver doesn't know when such bursts come. MMC/WiFi
> >   often showed such behavior, although part (b) might be addressed
> >   with further effort.
> >
> > The cases shown above can be tackled by placing QoS requests on the
> > response time or latency of the DVFS mechanism, which is directly
> > related to its polling interval (if the DVFS mechanism is polling
> > based).
> > 
> > Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> > Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> > 
> > --
> > Changes from v2
> > - Rebased on the recent PM QoS patches, resolving the merge conflict.
> > 
> > Changes from RFC(v1)
> > - Added omitted part (registering new classes)
> > ---
> >  include/linux/pm_qos.h |    4 ++++
> >  kernel/power/qos.c     |   31 ++++++++++++++++++++++++++++++-
> >  2 files changed, 34 insertions(+), 1 deletions(-)
> > 
> > diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
> > index c8a541e..0ee7caa 100644
> > --- a/include/linux/pm_qos.h
> > +++ b/include/linux/pm_qos.h
> > @@ -14,6 +14,8 @@ enum {
> >  	PM_QOS_CPU_DMA_LATENCY,
> >  	PM_QOS_NETWORK_LATENCY,
> >  	PM_QOS_NETWORK_THROUGHPUT,
> > +	PM_QOS_CPU_DMA_THROUGHPUT,
> > +	PM_QOS_DVFS_RESPONSE_LATENCY,
> >  
> >  	/* insert new class ID */
> >  	PM_QOS_NUM_CLASSES,
> > @@ -24,6 +26,8 @@ enum {
> >  #define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
> >  #define PM_QOS_NETWORK_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
> >  #define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE	0
> > +#define PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE	0
> > +#define PM_QOS_DVFS_LAT_DEFAULT_VALUE	(2000 * USEC_PER_SEC)
> >  #define PM_QOS_DEV_LAT_DEFAULT_VALUE		0
> >  
> >  struct pm_qos_request {
> > diff --git a/kernel/power/qos.c b/kernel/power/qos.c
> > index d6d6dbd..3e122db 100644
> > --- a/kernel/power/qos.c
> > +++ b/kernel/power/qos.c
> > @@ -101,11 +101,40 @@ static struct pm_qos_object network_throughput_pm_qos = {
> >  };
> >  
> >  
> > +static BLOCKING_NOTIFIER_HEAD(cpu_dma_throughput_notifier);
> > +static struct pm_qos_constraints cpu_dma_tput_constraints = {
> > +	.list = PLIST_HEAD_INIT(cpu_dma_tput_constraints.list),
> > +	.target_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
> > +	.default_value = PM_QOS_CPU_DMA_THROUGHPUT_DEFAULT_VALUE,
> > +	.type = PM_QOS_MAX,
> > +	.notifiers = &cpu_dma_throughput_notifier,
> > +};
> > +static struct pm_qos_object cpu_dma_throughput_pm_qos = {
> > +	.constraints = &cpu_dma_tput_constraints,
> > +	.name = "cpu_dma_throughput",
> > +};
> > +
> > +
> > +static BLOCKING_NOTIFIER_HEAD(dvfs_lat_notifier);
> > +static struct pm_qos_constraints dvfs_lat_constraints = {
> > +	.list = PLIST_HEAD_INIT(dvfs_lat_constraints.list),
> > +	.target_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
> > +	.default_value = PM_QOS_DVFS_LAT_DEFAULT_VALUE,
> > +	.type = PM_QOS_MIN,
> > +	.notifiers = &dvfs_lat_notifier,
> > +};
> > +static struct pm_qos_object dvfs_lat_pm_qos = {
> > +	.constraints = &dvfs_lat_constraints,
> > +	.name = "dvfs_latency",
> > +};
> > +
> >  static struct pm_qos_object *pm_qos_array[] = {
> >  	&null_pm_qos,
> >  	&cpu_dma_pm_qos,
> >  	&network_lat_pm_qos,
> > -	&network_throughput_pm_qos
> > +	&network_throughput_pm_qos,
> > +	&cpu_dma_throughput_pm_qos,
> > +	&dvfs_lat_pm_qos,
> >  };
> >  
> >  static ssize_t pm_qos_power_write(struct file *filp, const char __user *buf,
> >
> 
> The cpu_dma_throughput looks OK to me.

I agree with Mark, but I'm not sure about the name.  Specifically, I'm not sure
what the CPU has to do with that?

Rafael


* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-09  5:53     ` MyungJoo Ham
@ 2012-03-10 22:53       ` Rafael J. Wysocki
  2012-03-16  8:30         ` MyungJoo Ham
  2012-03-18 16:50         ` mark gross
  0 siblings, 2 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2012-03-10 22:53 UTC (permalink / raw)
  To: myungjoo.ham
  Cc: markgross, Stephen Rothwell, Dave Jones, linux-pm, linux-next,
	Len Brown, Pavel Machek, Kevin Hilman, Jean Pihet, kyungmin.park,
	linux-kernel

On Friday, March 09, 2012, MyungJoo Ham wrote:
> On Thu, Mar 8, 2012 at 12:47 PM, mark gross <markgross@thegnar.org> wrote:
> > On Wed, Mar 07, 2012 at 02:02:01PM +0900, MyungJoo Ham wrote:
> >> 1. CPU_DMA_THROUGHPUT
> ...
> >> 2. DVFS_LATENCY
> >
> > The cpu_dma_throughput looks OK to me.  I do, however, wonder about the
> > dvfs_lat_pm_qos.  Should that knob be exposed to user mode?  Does that
> > matter so much?  Why can't dvfs_lat use the cpu_dma_lat?
> >
> > BTW I'll be out of town for the next 10 days and probably will not get
> > to this email account until I get home.
> >
> > --mark
> >
> 
> 1. Should DVFS Latency be exposed to user mode?
> 
> It would depend on the policy of the given system; however, yes, there
> are systems that require a user interface for DVFS Latency.
> Take the example of user-input response (response to user clicks,
> typing, touching, etc.): a user program (probably platform s/w or
> middleware) may issue QoS requests. Besides, when a new "application"
> is starting, such "middleware" may want faster responses from the
> DVFS mechanisms.

But this is a global knob, isn't it?  And it seems that a per-device one
is needed rather than that?

It also applies to your CPU_DMA_THROUGHPUT thing, doesn't it?

> 2. Does DVFS Latency matter?
> 
> Yes. In our experimental sets with Exynos4210 (the chip found in
> Galaxy S2 equivalents; not exactly the same, as the tests were
> conducted on Tizen, not Android), we could see a noticeable
> difference with the bare eye in user-input responses. When we
> shortened the DVFS polling interval on touches, the touch responses
> improved greatly; e.g., from losing 10 frames to losing 0 or 1 frame
> on a sudden input rush.

Well, this basically means PM QoS matters, which is kind of obvious.
It doesn't mean that it can't be implemented in a better way, though.

> 3. Why not replace DVFS Latency w/ CPU-DMA-Latency/Throughput?
> 
> When we implemented the user-input response enhancement with CPU-DMA
> QoS requests, PM QoS unconditionally increased CPU and bus
> frequencies/voltages on user input. However, in many cases that is
> unnecessary: a user input means that there will be unexpected changes
> soon, but it does not mean that the load will increase. Thus,
> allowing the DVFS mechanism to react faster was enough to shorten the
> response time without increasing frequencies and voltages when not
> needed. There was a significant difference in power consumption with
> this change when the user inputs did not involve heavy graphics jobs;
> e.g., typing a text message.

Again, you're arguing for having PM QoS rather than not having it.  You don't
have to do that. :-)

Generally speaking, I don't think we should add any more PM QoS "classes"
as defined in pm_qos.h, since they are global and there's only one
list of requests per class.  While that may be good for CPU power
management (in an SMP system all CPUs are identical, so the same list of
requests may be applied to all of them), it generally isn't for I/O
devices (some of them work in different time scales, for example).

So, for example, most likely, a list of PM QoS requests for storage devices
shouldn't be applied to input devices (keyboards and mice to be precise) and
vice versa.

On the other hand, I don't think that applications should access PM QoS
interfaces associated with individual devices directly, because they may
not have enough information about the relationships between devices in the
system.  So, perhaps, there needs to be an interface allowing applications
to specify their PM QoS expectations in a general way (e.g. "I want <number>
disk I/O throughput") and a code layer between that interface and device
drivers translating those expecataions into PM QoS requests for specific
devices.  However, that would require support from subsystems throughout
the kernel (e.g. if an application wants specific disk I/O throughput,
we need to figure out what disks are used by that application and apply
appropriate PM QoS requests to them on behalf of it and that may require
support from the VFS and the block layer).

I don't really think we have sufficiently understood the problem area yet.

Thanks,
Rafael



* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-10 22:53       ` Rafael J. Wysocki
@ 2012-03-16  8:30         ` MyungJoo Ham
  2012-03-17  0:01           ` Rafael J. Wysocki
  2012-03-18 17:06           ` mark gross
  2012-03-18 16:50         ` mark gross
  1 sibling, 2 replies; 19+ messages in thread
From: MyungJoo Ham @ 2012-03-16  8:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: markgross, Stephen Rothwell, Dave Jones, linux-pm, linux-next,
	Len Brown, Pavel Machek, Kevin Hilman, Jean Pihet, kyungmin.park,
	linux-kernel

On Sun, Mar 11, 2012 at 7:53 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Friday, March 09, 2012, MyungJoo Ham wrote:
>> On Thu, Mar 8, 2012 at 12:47 PM, mark gross <markgross@thegnar.org> wrote:
>> > On Wed, Mar 07, 2012 at 02:02:01PM +0900, MyungJoo Ham wrote:
>> >> 1. CPU_DMA_THROUGHPUT
>> ...
>> >> 2. DVFS_LATENCY
>> >
>> > The cpu_dma_throughput looks OK to me.  I do, however, wonder about the
>> > dvfs_lat_pm_qos.  Should that knob be exposed to user mode?  Does that
>> > matter so much?  Why can't dvfs_lat use the cpu_dma_lat?
>> >
>> > BTW I'll be out of town for the next 10 days and probably will not get
>> > to this email account until I get home.
>> >
>> > --mark
>> >
>>
>> 1. Should DVFS Latency be exposed to user mode?
>>
>> It would depend on the policy of the given system; however, yes, there
>> are systems that require a user interface for DVFS Latency.
>> Take the example of user-input response (response to user clicks,
>> typing, touching, etc.): a user program (probably platform s/w or
>> middleware) may issue QoS requests. Besides, when a new "application"
>> is starting, such "middleware" may want faster responses from the
>> DVFS mechanisms.
>
> But this is a global knob, isn't it?  And it seems that a per-device one
> is needed rather than that?
>
> It also applies to your CPU_DMA_THROUGHPUT thing, doesn't it?


Yes, the two are global knobs, and both control multiple devices
simultaneously, not just a single device. I suppose per-device QoS is
appropriate for QoS requests directed at a single device. Am I right
about this?


Let's assume that, in an example system, we have devfreq on the GPU,
memory interface, and main bus, plus CPUfreq (Exynos5 will have them
all separated).

If we use per-device QoS for DVFS latency, then in order to control
the DVFS response latency we will need to make QoS requests to all
four devices independently, instead of to the global DVFS_LATENCY QoS
class. With the global class, we have a single shared QoS request
list for these four DVFS devices, saying that the DVFS response
should happen within "50ms" of a sudden utilization increase.

We may be able to use "dev_pm_qos_add_notifier()" on a virtual device
representing "DVFS Latency" or "DMA Throughput" and let the GPU, CPU,
main bus, and memory interface listen to the events from the virtual
device. Hmm..., do you recommend this approach: creating a device
representing "DVFS" as a whole (both CPUFreq and the device drivers
of devfreq)?

CPU_DMA_THROUGHPUT is quite similar to CPU_DMA_LATENCY. However, we
think it is additionally needed because many IPs (in-SoC devices) need
to specify their DMA usage in "kbytes/sec", not "usecs/ops". For
example, a video-decoding chip's device driver may say it requires
"750000 kbytes/sec" for 1080p60, "300000 kbytes/sec" for 720p60, and
so on, which affects CPUfreq, the memory interface, and the main bus
at the same time.

>
>> 2. Does DVFS Latency matter?
>>
>> Yes. In our experimental sets with Exynos4210 (the chip found in
>> Galaxy S2 equivalents; not exactly the same, as the tests were
>> conducted on Tizen, not Android), we could see a noticeable
>> difference with the bare eye in user-input responses. When we
>> shortened the DVFS polling interval on touches, the touch responses
>> improved greatly; e.g., from losing 10 frames to losing 0 or 1 frame
>> on a sudden input rush.
>
> Well, this basically means PM QoS matters, which is kind of obvious.
> It doesn't mean that it can't be implemented in a better way, though.

For DVFS-Latency and DMA-Throughput, I think normal per-device PM QoS
(one device per QoS knob) isn't appropriate because multiple devices
are required to react simultaneously.

It is possible to let multiple devices react by adding notifiers with
dev_pm_qos_add_notifier(). However, I felt that this wasn't its
purpose and it might get ugly. Anyway, was allowing multiple devices
to change their frequencies/voltages based on a single per-device QoS
list the purpose of dev_pm_qos_add_notifier()?


Just throwing out an idea and suggestion in case that was the purpose:
I speculate that if we are going to do this (supporting multiple
devices per qos knob without adding a QoS class), we'd better create a
"qos class device" in /drivers/qos/ and let that device handle
multiple devices depending on a single "qos class". Probably, this
would transform the "global PM-QoS class" that notifies related
devices into a "QoS class device" that notifies related devices.

>
>> 3. Why not replace DVFS Latency w/ CPU-DMA-Latency/Throughput?
>>
>> When we implement the user-input response enhancement with CPU-DMA QoS
>> requests, the PM-QoS will unconditionally increase CPU and BUS
>> frequencies/voltages with user inputs. However, in many cases that is
>> unnecessary; i.e., a user input means that there will be unexpected
>> changes soon, but the change does not mean that the load will
>> increase. Thus, allowing the DVFS mechanism to react faster was enough
>> to shorten the response time without increasing frequencies and
>> voltages when not needed. There was a significant difference in power
>> consumption with this change if the user inputs did not involve
>> drastic graphics jobs; e.g., typing a text message.
>
> Again, you're arguing for having PM QoS rather than not having it.  You don't
> have to do that. :-)
>
> Generally speaking, I don't think we should add any more PM QoS "classes"
> as defined in pm_qos.h, since they are global and there's only one
> list of requests per class.  While that may be good for CPU power
> management (in an SMP system all CPUs are identical, so the same list of
> requests may be applied to all of them), it generally isn't for I/O
> devices (some of them work in different time scales, for example).
>
> So, for example, most likely, a list of PM QoS requests for storage devices
> shouldn't be applied to input devices (keyboards and mice to be precise) and
> vice versa.
>
> On the other hand, I don't think that applications should access PM QoS
> interfaces associated with individual devices directly, because they may
> not have enough information about the relationships between devices in the
> system.  So, perhaps, there needs to be an interface allowing applications
> to specify their PM QoS expectations in a general way (e.g. "I want <number>
> disk I/O throughput") and a code layer between that interface and device
> drivers translating those expectations into PM QoS requests for specific
> devices.

With the DVFS Latency PM QoS class, we can say "I want the system to
react within 50ms to any sudden utilization increase". Without it, we
have to say, for example, "CPUFreq/Ondemand should set its interval at
25ms, Devfreq/Bus should set its interval at 25ms, and Devfreq/GPU
should set its interval at 10ms."

And with the CPU Throughput PM QoS class, we can say "I want 1000000
kbytes/sec of DMA transfer". Without it, we have to say "the memory
interface should run at 1000000 kbytes/sec, the Exynos4412 core should
be at least at 500MHz, and the bus should be at least at 166MHz".
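
To put the two styles side by side, a hedged sketch
(PM_QOS_DVFS_RESPONSE_LATENCY is the constant from the patch; the
device pointers, value units, and function names are assumptions):

#include <linux/device.h>
#include <linux/pm_qos.h>

static struct pm_qos_request dvfs_req;
static struct dev_pm_qos_request cpu_r, bus_r, gpu_r;

/* With the proposed global class: one request covers every DVFS
 * device at once (assuming a millisecond unit). */
static void request_dvfs_response_50ms(void)
{
        pm_qos_add_request(&dvfs_req, PM_QOS_DVFS_RESPONSE_LATENCY, 50);
}

/* Without it: one hand-tuned request per DVFS device. */
static void request_the_hard_way(struct device *cpu_dev,
                                 struct device *bus_dev,
                                 struct device *gpu_dev)
{
        dev_pm_qos_add_request(cpu_dev, &cpu_r, 25);
        dev_pm_qos_add_request(bus_dev, &bus_r, 25);
        dev_pm_qos_add_request(gpu_dev, &gpu_r, 10);
}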


Thank you.


Cheers!
MyungJoo.


> However, that would require support from subsystems throughout
> the kernel (e.g. if an application wants specific disk I/O throughput,
> we need to figure out what disks are used by that application and apply
> appropriate PM QoS requests to them on behalf of it and that may require
> support from the VFS and the block layer).
>
> I don't really think we have sufficiently understood the problem area yet.
>
> Thanks,
> Rafael
>



-- 
MyungJoo Ham, Ph.D.
System S/W Lab, S/W Center, Samsung Electronics

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-16  8:30         ` MyungJoo Ham
@ 2012-03-17  0:01           ` Rafael J. Wysocki
  2012-03-18 17:06           ` mark gross
  1 sibling, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2012-03-17  0:01 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: markgross, Stephen Rothwell, Dave Jones, linux-pm, linux-next,
	Len Brown, Pavel Machek, Kevin Hilman, Jean Pihet, kyungmin.park,
	linux-kernel

On Friday, March 16, 2012, MyungJoo Ham wrote:
> On Sun, Mar 11, 2012 at 7:53 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Friday, March 09, 2012, MyungJoo Ham wrote:
> >> On Thu, Mar 8, 2012 at 12:47 PM, mark gross <markgross@thegnar.org> wrote:
> >> > On Wed, Mar 07, 2012 at 02:02:01PM +0900, MyungJoo Ham wrote:
> >> >> 1. CPU_DMA_THROUGHPUT
> >> ...
> >> >> 2. DVFS_LATENCY
> >> >
> >> > The cpu_dma_throughput looks ok to me.  I do, however, wonder about the
> >> > dvfs_lat_pm_qos.  Should that knob be exposed to user mode?  Does that
> >> > matter so much?  Why can't dvfs_lat use the cpu_dma_lat?
> >> >
> >> > BTW I'll be out of town for the next 10 days and probably will not get
> >> > to this email account until I get home.
> >> >
> >> > --mark
> >> >
> >>
> >> 1. Should DVFS Latency be exposed to user mode?
> >>
> >> It would depend on the policy of the given system; however, yes, there
> >> are systems that require a user interface for DVFS Latency.
> >> Take user input response (response to user clicks, typing, touches,
> >> etc.) as an example: a user program (probably platform s/w or
> >> middleware) may issue QoS requests. Besides, when a new "application"
> >> is starting, such "middleware" may want faster responses from DVFS
> >> mechanisms.
> >
> > But this is a global knob, isn't it?  And it seems that a per-device one
> > is needed rather than that?
> >
> > It also applies to your CPU_DMA_THROUGHPUT thing, doesn't it?
> 
> 
> Yes, the two are global knobs, and both control multiple devices
> simultaneously, not just a single device. I suppose per-device QoS is
> appropriate for QoS requests directed to a single device. Am I right
> about this one?
> 
> 
> Let's assume that, in an example system, we have devfreq on the GPU,
> the memory interface, and the main bus, plus CPUfreq (Exynos5 will
> have them all separated).
> 
> If we use per-device QoS for DVFS LATENCY, then in order to control
> the DVFS response latency, we will need to make QoS requests to all
> four devices independently, not to the global DVFS LATENCY QoS class.
> With the class, we could instead have a single shared QoS request list
> for these four DVFS devices, saying that the DVFS response should be
> done within "50ms" after a sudden utilization increase.

I think that the fact that you use the same value for all of those things
is a policy decision.  You might as well use different values for different
things, and the decision whether or not to do that should be left to user
space, IMO.

> We may be able to use "dev_pm_qos_add_notifier()" for a virtual device
> representing "DVFS Latency" or "DMA Throughput" and let the GPU, CPU,
> main bus, and memory interface listen to the events from the virtual
> device. Hmm... do you recommend this approach of creating a device
> representing "DVFS" as a whole (covering both CPUfreq and the device
> drivers of devfreq)?

While there may be an interface representing a "global" or "default"
setting, I don't think it really should be a device.  Just a separate
interface with a well-defined purpose.

> CPU_DMA_THROUGHPUT is quite similar to CPU_DMA_LATENCY. However, we
> think it is additionally needed because many IPs (in-SoC devices) need
> to specify their DMA usage in "kbytes/sec", not "usecs/ops". For
> example, a video-decoding chip device driver may say it requires
> "750000kbytes/sec" for 1080p60, "300000kbytes/sec" for 720p60, and so
> on, which affects CPUfreq, the memory interface, and the main bus at
> the same time.

That depends on what you want to use that for.  I'd really prefer to
see it in one patch along with the user.

> >> 2. Does DVFS Latency matter?
> >>
> >> Yes, in our experimental sets w/ Exynos4210 (the chips used in Galaxy
> >> S2 equivalents; not exactly the same, as the tests were conducted not
> >> on Android but on Tizen), we could see a noticeable difference w/ bare
> >> eyes in user-input responses. When we shortened the DVFS polling
> >> interval on touches, the touch responses were greatly improved; e.g.,
> >> going from losing 10 frames to losing 0 or 1 frame for a sudden input
> >> rush.
> >
> > Well, this basically means PM QoS matters, which is kind of obvious.
> > It doesn't mean that it can't be implemented in a better way, though.
> 
> For DVFS-Latency and DMA-Throughput, I think a normal pm-qos-dev (one
> device per qos knob) isn't appropriate because there are multiple
> devices that are required to react simultaneously.
> 
> It is possible to let multiple devices react by adding notifiers with
> dev_pm_qos_add_notifier(). However, I felt that this wasn't its
> purpose and that it might make things ugly.

I agree with that.

> Anyway, was allowing multiple devices to change their
> frequencies/voltages for a single per-device QoS list the purpose of
> dev_pm_qos_add_notifier()?

Well, please first tell me why exactly those devices are _required_ to
react simultaneously.

> Just throwing out an idea and suggestion in case that was the purpose:
> I speculate that if we are going to do this (supporting multiple
> devices per qos knob without adding a QoS class), we'd better create a
> "qos class device" in /drivers/qos/ and let that device handle
> multiple devices depending on a single "qos class". Probably, this
> would transform the "global PM-QoS class" that notifies related
> devices into a "QoS class device" that notifies related devices.

At a general level, it would make sense to use a single PM QoS "knob"
for multiple devices at the same time, with an in-kernel API to
add/remove devices to/from such a "class".  The details depend on the
implementation, though.
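
To make that idea concrete, a purely hypothetical sketch of such an
in-kernel API; none of these names exist in the kernel today:

#include <linux/device.h>

struct pm_qos_class;

/* Create a named "knob" that aggregates requests for a group
 * of devices. */
struct pm_qos_class *pm_qos_class_create(const char *name);

/* Attach or detach a device; attached devices get notified
 * whenever the class's aggregate constraint changes. */
int pm_qos_class_add_device(struct pm_qos_class *c, struct device *dev);
int pm_qos_class_remove_device(struct pm_qos_class *c, struct device *dev);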

> >
> >> 3. Why not replace DVFS Latency w/ CPU-DMA-Latency/Throughput?
> >>
> >> When we implement the user-input response enhancement with CPU-DMA QoS
> >> requests, the PM-QoS will unconditionally increase CPU and BUS
> >> frequencies/voltages with user inputs. However, in many cases that is
> >> unnecessary; i.e., a user input means that there will be unexpected
> >> changes soon, but the change does not mean that the load will
> >> increase. Thus, allowing the DVFS mechanism to react faster was enough
> >> to shorten the response time without increasing frequencies and
> >> voltages when not needed. There was a significant difference in power
> >> consumption with this change if the user inputs did not involve
> >> drastic graphics jobs; e.g., typing a text message.
> >
> > Again, you're arguing for having PM QoS rather than not having it.  You don't
> > have to do that. :-)
> >
> > Generally speaking, I don't think we should add any more PM QoS "classes"
> > as defined in pm_qos.h, since they are global and there's only one
> > list of requests per class.  While that may be good for CPU power
> > management (in an SMP system all CPUs are identical, so the same list of
> > requests may be applied to all of them), it generally isn't for I/O
> > devices (some of them work in different time scales, for example).
> >
> > So, for example, most likely, a list of PM QoS requests for storage devices
> > shouldn't be applied to input devices (keyboards and mice to be precise) and
> > vice versa.
> >
> > On the other hand, I don't think that applications should access PM QoS
> > interfaces associated with individual devices directly, because they may
> > not have enough information about the relationships between devices in the
> > system.  So, perhaps, there needs to be an interface allowing applications
> > to specify their PM QoS expectations in a general way (e.g. "I want <number>
> > disk I/O throughput") and a code layer between that interface and device
> > drivers translating those expectations into PM QoS requests for specific
> > devices.
> 
> With the DVFS Latency PM QoS class, we can say "I want the system to
> react within 50ms to any sudden utilization increase". Without it, we
> have to say, for example, "CPUFreq/Ondemand should set its interval at
> 25ms, Devfreq/Bus should set its interval at 25ms, and Devfreq/GPU
> should set its interval at 10ms."
> 
> And with the CPU Throughput PM QoS class, we can say "I want 1000000
> kbytes/sec of DMA transfer". Without it, we have to say "the memory
> interface should run at 1000000 kbytes/sec, the Exynos4412 core should
> be at least at 500MHz, and the bus should be at least at 166MHz".

That really depends on the design of the user space "management" layer.
Ideally, it should translate global requirements (like "I want the system to
react in 50ms for any sudden utilization increases") into specific PM QoS
settings (presumably with some user or platform designer configuration
input).  If we try to do that in the kernel, we'll end up with a quite
inflexible system.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-10 22:53       ` Rafael J. Wysocki
  2012-03-16  8:30         ` MyungJoo Ham
@ 2012-03-18 16:50         ` mark gross
  1 sibling, 0 replies; 19+ messages in thread
From: mark gross @ 2012-03-18 16:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: myungjoo.ham, markgross, Stephen Rothwell, Dave Jones, linux-pm,
	linux-next, Len Brown, Pavel Machek, Kevin Hilman, Jean Pihet,
	kyungmin.park, linux-kernel

On Sat, Mar 10, 2012 at 11:53:23PM +0100, Rafael J. Wysocki wrote:
> On Friday, March 09, 2012, MyungJoo Ham wrote:
> > On Thu, Mar 8, 2012 at 12:47 PM, mark gross <markgross@thegnar.org> wrote:
> > > On Wed, Mar 07, 2012 at 02:02:01PM +0900, MyungJoo Ham wrote:
> > >> 1. CPU_DMA_THROUGHPUT
> > ...
> > >> 2. DVFS_LATENCY
> > >
> > > The cpu_dma_throughput looks ok to me.  I do, however, wonder about the
> > > dvfs_lat_pm_qos.  Should that knob be exposed to user mode?  Does that
> > > matter so much?  Why can't dvfs_lat use the cpu_dma_lat?
> > >
> > > BTW I'll be out of town for the next 10 days and probably will not get
> > > to this email account until I get home.
> > >
> > > --mark
> > >
> > 
> > 1. Should DVFS Latency be exposed to user mode?
> > 
> > It would depend on the policy of the given system; however, yes, there
> > are systems that require a user interface for DVFS Latency.
> > Take user input response (response to user clicks, typing, touches,
> > etc.) as an example: a user program (probably platform s/w or
> > middleware) may issue QoS requests. Besides, when a new "application"
> > is starting, such "middleware" may want faster responses from DVFS
> > mechanisms.
> 
> But this is a global knob, isn't it?  And it seems that a per-device one
> is needed rather than that?
> 
> It also applies to your CPU_DMA_THROUGHPUT thing, doesn't it?
> 
> > 2. Does DVFS Latency matter?
> > 
> > Yes, in our experimental sets w/ Exynos4210 (the chips used in Galaxy
> > S2 equivalents; not exactly the same, as the tests were conducted not
> > on Android but on Tizen), we could see a noticeable difference w/ bare
> > eyes in user-input responses. When we shortened the DVFS polling
> > interval on touches, the touch responses were greatly improved; e.g.,
> > going from losing 10 frames to losing 0 or 1 frame for a sudden input
> > rush.
> 
> Well, this basically means PM QoS matters, which is kind of obvious.
> It doesn't mean that it can't be implemented in a better way, though.
> 
> > 3. Why not replace DVFS Latency w/ CPU-DMA-Latency/Throughput?
> > 
> > When we implement the user-input response enhancement with CPU-DMA QoS
> > requests, the PM-QoS will unconditionally increase CPU and BUS
> > frequencies/voltages with user inputs. However, in many cases that is
> > unnecessary; i.e., a user input means that there will be unexpected
> > changes soon, but the change does not mean that the load will
> > increase. Thus, allowing the DVFS mechanism to react faster was enough
> > to shorten the response time without increasing frequencies and
> > voltages when not needed. There was a significant difference in power
> > consumption with this change if the user inputs did not involve
> > drastic graphics jobs; e.g., typing a text message.
> 
> Again, you're arguing for having PM QoS rather than not having it.  You don't
> have to do that. :-)
> 
> Generally speaking, I don't think we should add any more PM QoS "classes"
> as defined in pm_qos.h, since they are global and there's only one
> list of requests per class.  While that may be good for CPU power
> management (in an SMP system all CPUs are identical, so the same list of
> requests may be applied to all of them), it generally isn't for I/O
> devices (some of them work in different time scales, for example).
> 
> So, for example, most likely, a list of PM QoS requests for storage devices
> shouldn't be applied to input devices (keyboards and mice to be precise) and
> vice versa.
> 
> On the other hand, I don't think that applications should access PM QoS
> interfaces associated with individual devices directly, because they may
> not have enough information about the relationships between devices in the
> system.  So, perhaps, there needs to be an interface allowing applications
> to specify their PM QoS expectations in a general way (e.g. "I want <number>
> disk I/O throughput") and a code layer between that interface and device
> drivers translating those expectations into PM QoS requests for specific
> devices.  However, that would require support from subsystems throughout
> the kernel (e.g. if an application wants specific disk I/O throughput,
> we need to figure out what disks are used by that application and apply
> appropriate PM QoS requests to them on behalf of it and that may require
> support from the VFS and the block layer).

FWIW, the thought experiment I try to do (but sometimes forget to do)
is to consider how a qos constraint can be expressed in a
platform-independent way, i.e., can I write an application or
middleware in such a way that it can express the exact same qos-request
on an ARM-based system and an x86-based system (or even a different ARM
system with, say, many cores or different performance characteristics)
and have it work right?

If the answer is no and you need to tune the application for the
platform it's running on, then we need to step back and think things
through before exposing them to user mode.  Candidate implementations
need to scale across architectures and board implementations.
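
For reference, a minimal userspace sketch against the one such
interface that exists today, /dev/cpu_dma_latency (the request holds
only while the fd stays open); the 100-usec value is a made-up,
platform-specific tuning number, which is exactly the portability
problem above:

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/dev/cpu_dma_latency", O_WRONLY);
        int32_t usecs = 100;    /* platform-specific magic number */

        if (fd < 0)
                return 1;
        write(fd, &usecs, sizeof(usecs));
        /* ... latency-sensitive work; closing fd drops the request. */
        close(fd);
        return 0;
}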

> I don't really think we have sufficiently understood the problem area yet.

I agree.  However, I do know that this is an area we need to work on.
FWIW, x86 SoCs are also starting to have uses for such things as well.

--mark

> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-16  8:30         ` MyungJoo Ham
  2012-03-17  0:01           ` Rafael J. Wysocki
@ 2012-03-18 17:06           ` mark gross
  2012-03-26 12:06             ` MyungJoo Ham
  1 sibling, 1 reply; 19+ messages in thread
From: mark gross @ 2012-03-18 17:06 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: Rafael J. Wysocki, markgross, Stephen Rothwell, Dave Jones,
	linux-pm, linux-next, Len Brown, Pavel Machek, Kevin Hilman,
	Jean Pihet, kyungmin.park, linux-kernel

On Fri, Mar 16, 2012 at 05:30:33PM +0900, MyungJoo Ham wrote:
> On Sun, Mar 11, 2012 at 7:53 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Friday, March 09, 2012, MyungJoo Ham wrote:
> >> On Thu, Mar 8, 2012 at 12:47 PM, mark gross <markgross@thegnar.org> wrote:
> >> > On Wed, Mar 07, 2012 at 02:02:01PM +0900, MyungJoo Ham wrote:
> >> >> 1. CPU_DMA_THROUGHPUT
> >> ...
> >> >> 2. DVFS_LATENCY
> >> >
> >> > The cpu_dma_throughput looks ok to me.  I do, however, wonder about the
> >> > dvfs_lat_pm_qos.  Should that knob be exposed to user mode?  Does that
> >> > matter so much?  Why can't dvfs_lat use the cpu_dma_lat?
> >> >
> >> > BTW I'll be out of town for the next 10 days and probably will not get
> >> > to this email account until I get home.
> >> >
> >> > --mark
> >> >
> >>
> >> 1. Should DVFS Latency be exposed to user mode?
> >>
> >> It would depend on the policy of the given system; however, yes, there
> >> are systems that require a user interface for DVFS Latency.
> >> Take user input response (response to user clicks, typing, touches,
> >> etc.) as an example: a user program (probably platform s/w or
> >> middleware) may issue QoS requests. Besides, when a new "application"
> >> is starting, such "middleware" may want faster responses from DVFS
> >> mechanisms.
> >
> > But this is a global knob, isn't it?  And it seems that a per-device one
> > is needed rather than that?
> >
> > It also applies to your CPU_DMA_THROUGHPUT thing, doesn't it?
> 
> 
> Yes, the two are global knobs, and both control multiple devices
> simultaneously, not just a single device. I suppose per-device QoS is
> appropriate for QoS requests directed to a single device. Am I right
> about this one?
> 
> 
> Let's assume that, in an example system, we have devfreq on the GPU,
> the memory interface, and the main bus, plus CPUfreq (Exynos5 will
> have them all separated).
> 
> If we use per-device QoS for DVFS LATENCY, then in order to control
> the DVFS response latency, we will need to make QoS requests to all
> four devices independently, not to the global DVFS LATENCY QoS class.
> With the class, we could instead have a single shared QoS request list
> for these four DVFS devices, saying that the DVFS response should be
> done within "50ms" after a sudden utilization increase.
> 
> We may be able to use "dev_pm_qos_add_notifier()" for a virtual device
> representing "DVFS Latency" or "DMA Throughput" and let the GPU, CPU,
> main bus, and memory interface listen to the events from the virtual
> device. Hmm... do you recommend this approach of creating a device
> representing "DVFS" as a whole (covering both CPUfreq and the device
> drivers of devfreq)?
> 
> CPU_DMA_THROUGHPUT is quite similar to CPU_DMA_LATENCY. However, we
> think it is additionally needed because many IPs (in-SoC devices) need
> to specify their DMA usage in "kbytes/sec", not "usecs/ops". For
> example, a video-decoding chip device driver may say it requires
> "750000kbytes/sec" for 1080p60, "300000kbytes/sec" for 720p60, and so
> on, which affects CPUfreq, the memory interface, and the main bus at
> the same time.
I have an example of a need for cpu_dma_throughput for x86 SoCs as
well.  Mostly my example comes down to ondemand thinking the workload
is low (the gpu is doing all the work) while the workload needs higher
clock rates between frame times to avoid buffer-underrunning the gfx
pipe.

My version of the patch didn't fly too well because it failed to offer a
scalable definition of the units of cpu_dma_throughput.  I tried using
kHz as the unit (the unit used in cpufreq).  However, applications
written to assume kHz units on one system would need to be rewritten on
the next.  Perhaps using bandwidth would be better than throughput?



> >
> >> 2. Does DVFS Latency matter?
> >>
> >> Yes, in our experimental sets w/ Exynos4210 (the chips used in Galaxy
> >> S2 equivalents; not exactly the same, as the tests were conducted not
> >> on Android but on Tizen), we could see a noticeable difference w/ bare
> >> eyes in user-input responses. When we shortened the DVFS polling
> >> interval on touches, the touch responses were greatly improved; e.g.,
> >> going from losing 10 frames to losing 0 or 1 frame for a sudden input
> >> rush.
> >
> > Well, this basically means PM QoS matters, which is kind of obvious.
> > It doesn't mean that it can't be implemented in a better way, though.
> 
> For DVFS-Latency and DMA-Throughput, I think a normal pm-qos-dev (one
> device per qos knob) isn't appropriate because there are multiple
> devices that are required to react simultaneously.
> 
> It is possible to let multiple devices react by adding notifiers with
> dev_pm_qos_add_notifier(). However, I felt that this wasn't its
> purpose and that it might make things ugly. Anyway, was allowing
> multiple devices to change their frequencies/voltages for a single
> per-device QoS list the purpose of dev_pm_qos_add_notifier()?
> 
> 
> Just throwing out an idea and suggestion in case that was the purpose:
> I speculate that if we are going to do this (supporting multiple
> devices per qos knob without adding a QoS class), we'd better create a
> "qos class device" in /drivers/qos/ and let that device handle
> multiple devices depending on a single "qos class". Probably, this
> would transform the "global PM-QoS class" that notifies related
> devices into a "QoS class device" that notifies related devices.
> 
> >
> >> 3. Why not replace DVFS Latency w/ CPU-DMA-Latency/Throughput?
> >>
> >> When we implement the user-input response enhancement with CPU-DMA QoS
> >> requests, the PM-QoS will unconditionally increase CPU and BUS
> >> frequencies/voltages with user inputs. However, in many cases that is
> >> unnecessary; i.e., a user input means that there will be unexpected
> >> changes soon, but the change does not mean that the load will
> >> increase. Thus, allowing the DVFS mechanism to react faster was enough
> >> to shorten the response time without increasing frequencies and
> >> voltages when not needed. There was a significant difference in power
> >> consumption with this change if the user inputs did not involve
> >> drastic graphics jobs; e.g., typing a text message.
> >
> > Again, you're arguing for having PM QoS rather than not having it.  You don't
> > have to do that. :-)
> >
> > Generally speaking, I don't think we should add any more PM QoS "classes"
> > as defined in pm_qos.h, since they are global and there's only one
> > list of requests per class.  While that may be good for CPU power
> > management (in an SMP system all CPUs are identical, so the same list of
> > requests may be applied to all of them), it generally isn't for I/O
> > devices (some of them work in different time scales, for example).
> >
> > So, for example, most likely, a list of PM QoS requests for storage devices
> > shouldn't be applied to input devices (keyboards and mice to be precise) and
> > vice versa.
> >
> > On the other hand, I don't think that applications should access PM QoS
> > interfaces associated with individual devices directly, because they may
> > not have enough information about the relationships between devices in the
> > system.  So, perhaps, there needs to be an interface allowing applications
> > to specify their PM QoS expectations in a general way (e.g. "I want <number>
> > disk I/O throughput") and a code layer between that interface and device
> > drivers translating those expectations into PM QoS requests for specific
> > devices.
> 
> With the DVFS Latency PM QoS class, we can say "I want the system to
> react within 50ms to any sudden utilization increase". Without it, we
> have to say, for example, "CPUFreq/Ondemand should set its interval at
> 25ms, Devfreq/Bus should set its interval at 25ms, and Devfreq/GPU
> should set its interval at 10ms."
> 
> And with the CPU Throughput PM QoS class, we can say "I want 1000000
> kbytes/sec of DMA transfer". Without it, we have to say "the memory
> interface should run at 1000000 kbytes/sec, the Exynos4412 core should
> be at least at 500MHz, and the bus should be at least at 166MHz".
> 

What things are coming down to is that we need to see if we can
identify good abstractions that are portable / scalable across ISAs
and boards, such that applications would not need to be changed to
work correctly across all of them.

One issue I have with adding a single DVFS latency and throughput
pm-qos parameter is that what device the DVFS *really* means changes
from one board to the next, thus making it impossible to abstract to
user mode.

--mark

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-18 17:06           ` mark gross
@ 2012-03-26 12:06             ` MyungJoo Ham
  2012-03-26 14:38               ` mark gross
  0 siblings, 1 reply; 19+ messages in thread
From: MyungJoo Ham @ 2012-03-26 12:06 UTC (permalink / raw)
  To: markgross
  Cc: Rafael J. Wysocki, Stephen Rothwell, Dave Jones, linux-pm,
	linux-next, Len Brown, Pavel Machek, Kevin Hilman, Jean Pihet,
	kyungmin.park, linux-kernel

On Mon, Mar 19, 2012 at 2:06 AM, mark gross <markgross@thegnar.org> wrote:
> On Fri, Mar 16, 2012 at 05:30:33PM +0900, MyungJoo Ham wrote:
>> On Sun, Mar 11, 2012 at 7:53 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>> > On Friday, March 09, 2012, MyungJoo Ham wrote:
>> >> On Thu, Mar 8, 2012 at 12:47 PM, mark gross <markgross@thegnar.org> wrote:
>> >> > On Wed, Mar 07, 2012 at 02:02:01PM +0900, MyungJoo Ham wrote:
>> >> >> 1. CPU_DMA_THROUGHPUT
>> >> ...
>> >> >> 2. DVFS_LATENCY
>> >> >
>> >> > The cpu_dma_throughput looks ok to me.  I do, however, wonder about the
>> >> > dvfs_lat_pm_qos.  Should that knob be exposed to user mode?  Does that
>> >> > matter so much?  Why can't dvfs_lat use the cpu_dma_lat?
>> >> >
>> >> > BTW I'll be out of town for the next 10 days and probably will not get
>> >> > to this email account until I get home.
>> >> >
>> >> > --mark
>> >> >
>> >>
>> >> 1. Should DVFS Latency be exposed to user mode?
>> >>
>> >> It would depend on the policy of the given system; however, yes, there
>> >> are systems that require a user interface for DVFS Latency.
>> >> Take user input response (response to user clicks, typing, touches,
>> >> etc.) as an example: a user program (probably platform s/w or
>> >> middleware) may issue QoS requests. Besides, when a new "application"
>> >> is starting, such "middleware" may want faster responses from DVFS
>> >> mechanisms.
>> >
>> > But this is a global knob, isn't it?  And it seems that a per-device one
>> > is needed rather than that?
>> >
>> > It also applies to your CPU_DMA_THROUGHPUT thing, doesn't it?
>>
>>
>> Yes, the two are global knobs, and both control multiple devices
>> simultaneously, not just a single device. I suppose per-device QoS is
>> appropriate for QoS requests directed to a single device. Am I right
>> about this one?
>>
>>
>> Let's assume that, in an example system, we have devfreq on the GPU,
>> the memory interface, and the main bus, plus CPUfreq (Exynos5 will
>> have them all separated).
>>
>> If we use per-device QoS for DVFS LATENCY, then in order to control
>> the DVFS response latency, we will need to make QoS requests to all
>> four devices independently, not to the global DVFS LATENCY QoS class.
>> With the class, we could instead have a single shared QoS request list
>> for these four DVFS devices, saying that the DVFS response should be
>> done within "50ms" after a sudden utilization increase.
>>
>> We may be able to use "dev_pm_qos_add_notifier()" for a virtual device
>> representing "DVFS Latency" or "DMA Throughput" and let the GPU, CPU,
>> main bus, and memory interface listen to the events from the virtual
>> device. Hmm... do you recommend this approach of creating a device
>> representing "DVFS" as a whole (covering both CPUfreq and the device
>> drivers of devfreq)?
>>
>> CPU_DMA_THROUGHPUT is quite similar to CPU_DMA_LATENCY. However, we
>> think it is additionally needed because many IPs (in-SoC devices) need
>> to specify their DMA usage in "kbytes/sec", not "usecs/ops". For
>> example, a video-decoding chip device driver may say it requires
>> "750000kbytes/sec" for 1080p60, "300000kbytes/sec" for 720p60, and so
>> on, which affects CPUfreq, the memory interface, and the main bus at
>> the same time.
> I have an example of a need for cpu_dma_throughput for x86 SoCs as
> well.  Mostly my example comes down to ondemand thinking the workload
> is low (the gpu is doing all the work) while the workload needs higher
> clock rates between frame times to avoid buffer-underrunning the gfx
> pipe.
>
> My version of the patch didn't fly too well because it failed to offer a
> scalable definition of the units of cpu_dma_throughput.  I tried using
> kHz as the unit (the unit used in cpufreq).  However, applications
> written to assume kHz units on one system would need to be rewritten on
> the next.  Perhaps using bandwidth would be better than throughput?
>
>

The unit itself won't change whether we use bandwidth or throughput
here; i.e., throughput is often referred to as "effective bandwidth".
For applications and middleware, throughput is more attractive than
bandwidth because they will prefer to express what they want
(throughput, or "effective" bandwidth), not what the devices should do
(bandwidth). For example, the bandwidth of a 100MHz-128bit bus is
12.8Gbps; however, if this bus is considered to be saturated at 30% of
its bandwidth, the throughput will be around 3.84Gbps. The QoS users
will prefer the latter because it can be expressed independently of
the architecture; it doesn't matter whether it saturates at 30% or
50%. Besides, such variables should be known to the bus device driver
(i.e., the "exynos4-bus" devfreq driver, which assumes 30% or 40%
depending on the architecture).
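
A quick worked example with the numbers above:

  raw bandwidth:      100 MHz x 128 bit = 12,800 Mbit/s = 12.8 Gbit/s
  usable throughput:  12.8 Gbit/s x 0.30 = 3.84 Gbit/s

A request phrased as throughput stays meaningful even if another board
saturates at, say, 50% instead of 30%.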

Anyway, we've been using kHz with a test version of this (driven by
GPUs like your example), which has to be changed even between Exynos4
series. :)

>
>> >
>> >> 2. Does DVFS Latency matter?
>> >>
>> >> Yes, in our experimental sets w/ Exynos4210 (the chips used in Galaxy
>> >> S2 equivalents; not exactly the same, as the tests were conducted not
>> >> on Android but on Tizen), we could see a noticeable difference w/ bare
>> >> eyes in user-input responses. When we shortened the DVFS polling
>> >> interval on touches, the touch responses were greatly improved; e.g.,
>> >> going from losing 10 frames to losing 0 or 1 frame for a sudden input
>> >> rush.
>> >
>> > Well, this basically means PM QoS matters, which is kind of obvious.
>> > It doesn't mean that it can't be implemented in a better way, though.
>>
>> For DVFS-Latency and DMA-Throughput, I think a normal pm-qos-dev (one
>> device per qos knob) isn't appropriate because there are multiple
>> devices that are required to react simultaneously.
>>
>> It is possible to let multiple devices react by adding notifiers with
>> dev_pm_qos_add_notifier(). However, I felt that this wasn't its
>> purpose and that it might make things ugly. Anyway, was allowing
>> multiple devices to change their frequencies/voltages for a single
>> per-device QoS list the purpose of dev_pm_qos_add_notifier()?
>>
>>
>> Just throwing out an idea and suggestion in case that was the purpose:
>> I speculate that if we are going to do this (supporting multiple
>> devices per qos knob without adding a QoS class), we'd better create a
>> "qos class device" in /drivers/qos/ and let that device handle
>> multiple devices depending on a single "qos class". Probably, this
>> would transform the "global PM-QoS class" that notifies related
>> devices into a "QoS class device" that notifies related devices.
>>
>> >
>> >> 3. Why not replace DVFS Latency w/ CPU-DMA-Latency/Throughput?
>> >>
>> >> When we implement the user-input response enhancement with CPU-DMA QoS
>> >> requests, the PM-QoS will unconditionally increase CPU and BUS
>> >> frequencies/voltages with user inputs. However, in many cases that is
>> >> unnecessary; i.e., a user input means that there will be unexpected
>> >> changes soon, but the change does not mean that the load will
>> >> increase. Thus, allowing the DVFS mechanism to react faster was enough
>> >> to shorten the response time without increasing frequencies and
>> >> voltages when not needed. There was a significant difference in power
>> >> consumption with this change if the user inputs did not involve
>> >> drastic graphics jobs; e.g., typing a text message.
>> >
>> > Again, you're arguing for having PM QoS rather than not having it.  You don't
>> > have to do that. :-)
>> >
>> > Generally speaking, I don't think we should add any more PM QoS "classes"
>> > as defined in pm_qos.h, since they are global and there's only one
>> > list of requests per class.  While that may be good for CPU power
>> > management (in an SMP system all CPUs are identical, so the same list of
>> > requests may be applied to all of them), it generally isn't for I/O
>> > devices (some of them work in different time scales, for example).
>> >
>> > So, for example, most likely, a list of PM QoS requests for storage devices
>> > shouldn't be applied to input devices (keyboards and mice to be precise) and
>> > vice versa.
>> >
>> > On the other hand, I don't think that applications should access PM QoS
>> > interfaces associated with individual devices directly, because they may
>> > not have enough information about the relationships between devices in the
>> > system.  So, perhaps, there needs to be an interface allowing applications
>> > to specify their PM QoS expectations in a general way (e.g. "I want <number>
>> > disk I/O throughput") and a code layer between that interface and device
>> > drivers translating those expectations into PM QoS requests for specific
>> > devices.
>>
>> With the DVFS Latency PM QoS class, we can say "I want the system to
>> react within 50ms to any sudden utilization increase". Without it, we
>> have to say, for example, "CPUFreq/Ondemand should set its interval at
>> 25ms, Devfreq/Bus should set its interval at 25ms, and Devfreq/GPU
>> should set its interval at 10ms."
>>
>> And with the CPU Throughput PM QoS class, we can say "I want 1000000
>> kbytes/sec of DMA transfer". Without it, we have to say "the memory
>> interface should run at 1000000 kbytes/sec, the Exynos4412 core should
>> be at least at 500MHz, and the bus should be at least at 166MHz".
>>
>
> What things are coming down to is we need to see if we can identify good
> abstractions that can be portable / scalable across ISA's and boards,
> such that applications would not need to be changed to work correctly
> across all of them.
>
> One issue I have with adding a single DVFS latency and throughput pm-qos
> parameter is that what Device the DVFS *really* means changes from one
> board to the next.  Thus making it impossible to abstract to user mode.
>
> --mark

Does "what QoS/DVFS really means changes from one board to the next"
mean that the specific behavior with the same QoS request changes from
one board to the next?

In other words, for a 10MByte/sec CPU_DMA_THROUGHPUT QoS request,
- In board A, it means setting the CPU at least at 200MHz, the
memory interface at 166MHz, and the bus at 166MHz.
- In board B, it means setting the bus at least at 200MHz.
Or, for a 50ms DVFS_LATENCY QoS request,
- In board A, it means setting the cpufreq interval at 30ms.
- In board B, it means setting the cpufreq interval at 30ms,
devfreq-memory at 30ms, and devfreq-GPU at 20ms.

Is the issue you've mentioned the difference between board A and B
above? or something else I'm missing?


My understanding is that (global) PM-QoS abstracts (hides) what PM-QoS
really does to the hardware from userspace or other QoS-using device
drivers by letting the requirement be expressed on the user side (what
users want, not what the h/w or device drivers do).

What userspace needs to know (or to express) is:
NETWORK_THROUGHPUT: how many kbytes per sec can we send? (abstracting
how the bus and NIC react to such requests; e.g., some NIC does 100MHz
for 10Mbps, some other NICs do 33MHz for 10Mbps, and such)
CPU_DMA_THROUGHPUT: how many kbytes per sec can we send? (abstracting
what the bus device driver and memory-interface driver do; e.g., for
100Mbps, bus: 100MHz, mif: 100MHz in machine A; for 100Mbps, bus:
133MHz, mif: 200MHz in machine B, and such)
DVFS_LATENCY: how long may we wait for DVFS mechanisms to react?
(abstracting how each devfreq/cpufreq driver behaves). An FPS game
might want 10ms, a web browser may want 20ms, a touch screen event
might want 10ms, and a menu screen (home screen) manager may want
50ms. Based on this QoS value, the polling interval of cpufreq/devfreq
devices may be changed. If there were only cpufreq among the DVFS
mechanisms, I wouldn't bother with this issue. However, there may be
multiple DVFS devices with independent polling intervals; thus,
without an interface like DVFS_LATENCY, userspace or QoS-requesting
device drivers would need to know and control every DVFS device's
polling interval, making them dependent on specific boards/devices.
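
As a rough sketch (not from the patch itself) of how a devfreq driver
could honor that value, assuming the PM_QOS_DVFS_RESPONSE_LATENCY
constant from the patch and a millisecond unit:

#include <linux/devfreq.h>
#include <linux/kernel.h>
#include <linux/pm_qos.h>

/* Clamp this device's polling interval to the aggregate DVFS
 * latency target currently requested system-wide. */
static unsigned int bus_polling_interval(struct devfreq *df)
{
        s32 lat_ms = pm_qos_request(PM_QOS_DVFS_RESPONSE_LATENCY);

        return min_t(unsigned int, df->profile->polling_ms, lat_ms);
}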


P.S. For the example Rafael wanted for CPU_DMA_THROUGHPUT, I'll create
one when the corresponding (QoS-requesting) device drivers are ready
to be upstreamed. That device driver still works in kHz of the memory
interface for now.

Cheers!
MyungJoo.
-- 
MyungJoo Ham, Ph.D.
System S/W Lab, S/W Center, Samsung Electronics

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency
  2012-03-26 12:06             ` MyungJoo Ham
@ 2012-03-26 14:38               ` mark gross
  0 siblings, 0 replies; 19+ messages in thread
From: mark gross @ 2012-03-26 14:38 UTC (permalink / raw)
  To: MyungJoo Ham
  Cc: markgross, Rafael J. Wysocki, Stephen Rothwell, Dave Jones,
	linux-pm, linux-next, Len Brown, Pavel Machek, Kevin Hilman,
	Jean Pihet, kyungmin.park, linux-kernel

On Mon, Mar 26, 2012 at 09:06:48PM +0900, MyungJoo Ham wrote:
> On Mon, Mar 19, 2012 at 2:06 AM, mark gross <markgross@thegnar.org> wrote:
> > On Fri, Mar 16, 2012 at 05:30:33PM +0900, MyungJoo Ham wrote:
> >> On Sun, Mar 11, 2012 at 7:53 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> > On Friday, March 09, 2012, MyungJoo Ham wrote:
> >> >> On Thu, Mar 8, 2012 at 12:47 PM, mark gross <markgross@thegnar.org> wrote:
> >> >> > On Wed, Mar 07, 2012 at 02:02:01PM +0900, MyungJoo Ham wrote:
> >> >> >> 1. CPU_DMA_THROUGHPUT
> >> >> ...
> >> >> >> 2. DVFS_LATENCY
> >> >> >
> >> >> > The cpu_dma_throughput looks ok to me.  I do, however, wonder about the
> >> >> > dvfs_lat_pm_qos.  Should that knob be exposed to user mode?  Does that
> >> >> > matter so much?  Why can't dvfs_lat use the cpu_dma_lat?
> >> >> >
> >> >> > BTW I'll be out of town for the next 10 days and probably will not get
> >> >> > to this email account until I get home.
> >> >> >
> >> >> > --mark
> >> >> >
> >> >>
> >> >> 1. Should DVFS Latency be exposed to user mode?
> >> >>
> >> >> It would depend on the policy of the given system; however, yes, there
> >> >> are systems that require a user interface for DVFS Latency.
> >> >> Take user input response (response to user clicks, typing, touches,
> >> >> etc.) as an example: a user program (probably platform s/w or
> >> >> middleware) may issue QoS requests. Besides, when a new "application"
> >> >> is starting, such "middleware" may want faster responses from DVFS
> >> >> mechanisms.
> >> >
> >> > But this is a global knob, isn't it?  And it seems that a per-device one
> >> > is needed rather than that?
> >> >
> >> > It also applies to your CPU_DMA_THROUGHPUT thing, doesn't it?
> >>
> >>
> >> Yes, the two are global knobs, and both control multiple devices
> >> simultaneously, not just a single device. I suppose per-device QoS is
> >> appropriate for QoS requests directed to a single device. Am I right
> >> about this one?
> >>
> >>
> >> Let's assume that, in an example system, we have devfreq on the GPU,
> >> the memory interface, and the main bus, plus CPUfreq (Exynos5 will
> >> have them all separated).
> >>
> >> If we use per-device QoS for DVFS LATENCY, then in order to control
> >> the DVFS response latency, we will need to make QoS requests to all
> >> four devices independently, not to the global DVFS LATENCY QoS class.
> >> With the class, we could instead have a single shared QoS request list
> >> for these four DVFS devices, saying that the DVFS response should be
> >> done within "50ms" after a sudden utilization increase.
> >>
> >> We may be able to use "dev_pm_qos_add_notifier()" for a virtual device
> >> representing "DVFS Latency" or "DMA Throughput" and let the GPU, CPU,
> >> main bus, and memory interface listen to the events from the virtual
> >> device. Hmm... do you recommend this approach of creating a device
> >> representing "DVFS" as a whole (covering both CPUfreq and the device
> >> drivers of devfreq)?
> >>
> >> CPU_DMA_THROUGHPUT is quite similar to CPU_DMA_LATENCY. However, we
> >> think it is additionally needed because many IPs (in-SoC devices) need
> >> to specify their DMA usage in "kbytes/sec", not "usecs/ops". For
> >> example, a video-decoding chip device driver may say it requires
> >> "750000kbytes/sec" for 1080p60, "300000kbytes/sec" for 720p60, and so
> >> on, which affects CPUfreq, the memory interface, and the main bus at
> >> the same time.
> > I have an example of a need for cpu_dma_throughput for x86 SoCs as
> > well.  Mostly my example comes down to ondemand thinking the workload
> > is low (the gpu is doing all the work) while the workload needs higher
> > clock rates between frame times to avoid buffer-underrunning the gfx
> > pipe.
> >
> > My version of the patch didn't fly too well because it failed to offer a
> > scalable definition of the units of cpu_dma_throughput.  I tried using
> > kHz as the unit (the unit used in cpufreq).  However, applications
> > written to assume kHz units on one system would need to be rewritten on
> > the next.  Perhaps using bandwidth would be better than throughput?
> >
> >
> 
> The unit itself won't change whether we use bandwidth or throughput
> here; i.e., throughput is often referred to as "effective bandwidth".
> For applications and middleware, throughput is more attractive than
> bandwidth because they will prefer to express what they want
> (throughput, or "effective" bandwidth), not what the devices should do
> (bandwidth). For example, the bandwidth of a 100MHz-128bit bus is
> 12.8Gbps; however, if this bus is considered to be saturated at 30% of
> its bandwidth, the throughput will be around 3.84Gbps. The QoS users
> will prefer the latter because it can be expressed independently of
> the architecture; it doesn't matter whether it saturates at 30% or
> 50%. Besides, such variables should be known to the bus device driver
> (i.e., the "exynos4-bus" devfreq driver, which assumes 30% or 40%
> depending on the architecture).
> 
> Anyway, we've been using kHz with a test version of this (driven by
> GPUs like your example), which has to be changed even between Exynos4
> series. :)
> 
> >
> >> >
> >> >> 2. Does DVFS Latency matter?
> >> >>
> >> >> Yes, in our experimental sets w/ Exynos4210 (the chips used in Galaxy
> >> >> S2 equivalents; not exactly the same, as the tests were conducted not
> >> >> on Android but on Tizen), we could see a noticeable difference w/ bare
> >> >> eyes in user-input responses. When we shortened the DVFS polling
> >> >> interval on touches, the touch responses were greatly improved; e.g.,
> >> >> going from losing 10 frames to losing 0 or 1 frame for a sudden input
> >> >> rush.
> >> >
> >> > Well, this basically means PM QoS matters, which is kind of obvious.
> >> > It doesn't mean that it can't be implemented in a better way, though.
> >>
> >> For DVFS-Latency and DMA-Throughput, I think a normal pm-qos-dev (one
> >> device per qos knob) isn't appropriate because there are multiple
> >> devices that are required to react simultaneously.
> >>
> >> It is possible to let multiple devices react by adding notifiers with
> >> dev_pm_qos_add_notifier(). However, I felt that this wasn't its
> >> purpose and that it might make things ugly. Anyway, was allowing
> >> multiple devices to change their frequencies/voltages for a single
> >> per-device QoS list the purpose of dev_pm_qos_add_notifier()?
> >>
> >>
> >> Just throwing out an idea and suggestion in case that was the purpose:
> >> I speculate that if we are going to do this (supporting multiple
> >> devices per qos knob without adding a QoS class), we'd better create a
> >> "qos class device" in /drivers/qos/ and let that device handle
> >> multiple devices depending on a single "qos class". Probably, this
> >> would transform the "global PM-QoS class" that notifies related
> >> devices into a "QoS class device" that notifies related devices.
> >>
> >> >
> >> >> 3. Why not replace DVFS Latency w/ CPU-DMA-Latency/Throughput?
> >> >>
> >> >> When we implement the user-input response enhancement with CPU-DMA QoS
> >> >> requests, the PM-QoS will unconditionally increase CPU and BUS
> >> >> frequencies/voltages with user inputs. However, in many cases that is
> >> >> unnecessary; i.e., a user input means that there will be unexpected
> >> >> changes soon, but the change does not mean that the load will
> >> >> increase. Thus, allowing the DVFS mechanism to react faster was enough
> >> >> to shorten the response time without increasing frequencies and
> >> >> voltages when not needed. There was a significant difference in power
> >> >> consumption with this change if the user inputs did not involve
> >> >> drastic graphics jobs; e.g., typing a text message.
> >> >
> >> > Again, you're arguing for having PM QoS rather than not having it.  You don't
> >> > have to do that. :-)
> >> >
> >> > Generally speaking, I don't think we should add any more PM QoS "classes"
> >> > as defined in pm_qos.h, since they are global and there's only one
> >> > list of requests per class.  While that may be good for CPU power
> >> > management (in an SMP system all CPUs are identical, so the same list of
> >> > requests may be applied to all of them), it generally isn't for I/O
> >> > devices (some of them work in different time scales, for example).
> >> >
> >> > So, for example, most likely, a list of PM QoS requests for storage devices
> >> > shouldn't be applied to input devices (keyboards and mice to be precise) and
> >> > vice versa.
> >> >
> >> > On the other hand, I don't think that applications should access PM QoS
> >> > interfaces associated with individual devices directly, because they may
> >> > not have enough information about the relationships between devices in the
> >> > system.  So, perhaps, there needs to be an interface allowing applications
> >> > to specify their PM QoS expectations in a general way (e.g. "I want <number>
> >> > disk I/O throughput") and a code layer between that interface and device
> >> > drivers translating those expectations into PM QoS requests for specific
> >> > devices.
> >>
> >> With the DVFS Latency PM QoS class, we can say "I want the system to
> >> react within 50ms to any sudden utilization increase". Without it, we
> >> have to say, for example, "CPUFreq/Ondemand should set its interval at
> >> 25ms, Devfreq/Bus should set its interval at 25ms, and Devfreq/GPU
> >> should set its interval at 10ms."
> >>
> >> And with the CPU Throughput PM QoS class, we can say "I want 1000000
> >> kbytes/sec of DMA transfer". Without it, we have to say "the memory
> >> interface should run at 1000000 kbytes/sec, the Exynos4412 core should
> >> be at least at 500MHz, and the bus should be at least at 166MHz".
> >>
> >
> > What things are coming down to is that we need to see if we can
> > identify good abstractions that are portable / scalable across ISAs
> > and boards, such that applications would not need to be changed to
> > work correctly across all of them.
> >
> > One issue I have with adding a single DVFS latency and throughput
> > pm-qos parameter is that what device the DVFS *really* means changes
> > from one board to the next, thus making it impossible to abstract to
> > user mode.
> >
> > --mark
> 
> Does "what QoS/DVFS really means changes from one board to the next"
> mean that the specific behavior with the same QoS request changes from
> one board to the next?

I mean that on one board the DVFS could be used for constraining the
VFS of, say, the audio codec, but on another board it's used for
constraining the VFS of the graphics chip.

> In other words, for a 10MByte/sec CPU_DMA_THROUGHPUT QoS request,
> - In board A, it means setting the CPU at least at 200MHz, the
> memory interface at 166MHz, and the bus at 166MHz.
> - In board B, it means setting the bus at least at 200MHz.
> Or, for a 50ms DVFS_LATENCY QoS request,
> - In board A, it means setting the cpufreq interval at 30ms.
> - In board B, it means setting the cpufreq interval at 30ms,
> devfreq-memory at 30ms, and devfreq-GPU at 20ms.
> 
> Is the issue you've mentioned the difference between board A and B
> above? or something else I'm missing?
> 
> 
> My understanding is that (global) PM-QoS abstracts (hides) what PM-QoS
> really does to the hardware from userspace or other QoS-using device
> drivers by letting the requirement be expressed on the user side (what
> users want, not what the h/w or device drivers do).
> 
> What userspace needs to know (or to express) is:
> NETWORK_THROUGHPUT: how many kbytes per sec can we send? (abstracting
> how the bus and NIC react to such requests; e.g., some NIC does 100MHz
> for 10Mbps, some other NICs do 33MHz for 10Mbps, and such)
> CPU_DMA_THROUGHPUT: how many kbytes per sec can we send? (abstracting
> what the bus device driver and memory-interface driver do; e.g., for
> 100Mbps, bus: 100MHz, mif: 100MHz in machine A; for 100Mbps, bus:
> 133MHz, mif: 200MHz in machine B, and such)
> DVFS_LATENCY: how long may we wait for DVFS mechanisms to react?
For which device? There can be a number of devices that can do VFS.
Which one should DVFS_LATENCY be associated with? And how would user
mode know which one it's affecting by hitting the misc device node
interface?

> (abstracting how each devfreq/cpufreq driver behaves). An FPS game
> might want 10ms, a web browser may want 20ms, a touch screen event
> might want 10ms, and a menu screen (home screen) manager may want
> 50ms. Based on this QoS value, the polling interval of cpufreq/devfreq
> devices may be changed. If there were only cpufreq among the DVFS
> mechanisms, I wouldn't bother with this issue. However, there may be
> multiple DVFS devices with independent polling intervals; thus,
> without an interface like DVFS_LATENCY, userspace or QoS-requesting
> device drivers would need to know and control every DVFS device's
> polling interval, making them dependent on specific boards/devices.
> 
> 
> P.S. For the example Rafael wanted for CPU_DMA_THROUGHPUT, I'll create
> one when the corresponding (QoS-requesting) device drivers are ready
> to be upstreamed. That device driver still works in kHz of the memory
> interface for now.
>
That sounds cool.

--mark



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: build failure after merge of the cpufreq tree
  2012-03-01  2:56 linux-next: build failure after merge of the cpufreq tree Stephen Rothwell
@ 2012-03-01  3:01 ` Dave Jones
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Jones @ 2012-03-01  3:01 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linux-next, linux-kernel, MyungJoo Ham, Kyungmin Park

On Thu, Mar 01, 2012 at 01:56:01PM +1100, Stephen Rothwell wrote:
 > Hi Dave,
 > 
 > After merging the cpufreq tree, today's linux-next build (x86_64
 > allmodconfig) failed like this:
 > 
 > drivers/cpufreq/cpufreq_ondemand.c: In function 'cpufreq_gov_dbs_init':
 > drivers/cpufreq/cpufreq_ondemand.c:880:28: error: 'PM_QOS_DVFS_RESPONSE_LATENCY' undeclared (first use in this function)
 > drivers/cpufreq/cpufreq_ondemand.c: In function 'cpufreq_gov_dbs_exit':
 > drivers/cpufreq/cpufreq_ondemand.c:896:25: error: 'PM_QOS_DVFS_RESPONSE_LATENCY' undeclared (first use in this function)
 > 
 > Caused by commit 500e8ca39c56 ("[CPUFREQ] ondemand: handle QoS request on
 > DVFS response latency").
 > 
 > I have used the cpufreq tree from next-20120229 for today.

Gah, this patch was dependent upon another that didn't apply cleanly.

	Dave


^ permalink raw reply	[flat|nested] 19+ messages in thread

* linux-next: build failure after merge of the cpufreq tree
@ 2012-03-01  2:56 Stephen Rothwell
  2012-03-01  3:01 ` Dave Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Stephen Rothwell @ 2012-03-01  2:56 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-next, linux-kernel, MyungJoo Ham, Kyungmin Park

[-- Attachment #1: Type: text/plain, Size: 775 bytes --]

Hi Dave,

After merging the cpufreq tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/cpufreq/cpufreq_ondemand.c: In function 'cpufreq_gov_dbs_init':
drivers/cpufreq/cpufreq_ondemand.c:880:28: error: 'PM_QOS_DVFS_RESPONSE_LATENCY' undeclared (first use in this function)
drivers/cpufreq/cpufreq_ondemand.c: In function 'cpufreq_gov_dbs_exit':
drivers/cpufreq/cpufreq_ondemand.c:896:25: error: 'PM_QOS_DVFS_RESPONSE_LATENCY' undeclared (first use in this function)

Caused by commit 500e8ca39c56 ("[CPUFREQ] ondemand: handle QoS request on
DVFS response latency").

I have used the cpufreq tree from next-20120229 for today.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-03-26 14:38 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-05  1:46 linux-next: build failure after merge of the cpufreq tree MyungJoo Ham
2012-03-05  1:46 ` MyungJoo Ham
2012-03-07  5:02 ` [PATCH v3] PM / QoS: Introduce new classes: DMA-Throughput and DVFS-Latency MyungJoo Ham
2012-03-07  9:02   ` Rafael J. Wysocki
2012-03-07  9:36     ` MyungJoo Ham
2012-03-09  8:17       ` MyungJoo Ham
2012-03-10 22:22         ` Rafael J. Wysocki
2012-03-08  3:47   ` mark gross
2012-03-09  5:53     ` MyungJoo Ham
2012-03-10 22:53       ` Rafael J. Wysocki
2012-03-16  8:30         ` MyungJoo Ham
2012-03-17  0:01           ` Rafael J. Wysocki
2012-03-18 17:06           ` mark gross
2012-03-26 12:06             ` MyungJoo Ham
2012-03-26 14:38               ` mark gross
2012-03-18 16:50         ` mark gross
2012-03-10 22:25     ` Rafael J. Wysocki
  -- strict thread matches above, loose matches on Subject: below --
2012-03-01  2:56 linux-next: build failure after merge of the cpufreq tree Stephen Rothwell
2012-03-01  3:01 ` Dave Jones
