linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
@ 2020-04-28 16:41 Qais Yousef
  2020-04-28 16:41 ` [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs Qais Yousef
  2020-04-29 11:32 ` [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Pavan Kondeti
  0 siblings, 2 replies; 11+ messages in thread
From: Qais Yousef @ 2020-04-28 16:41 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Qais Yousef, Jonathan Corbet, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, Quentin Perret,
	Valentin Schneider, Patrick Bellasi, Pavan Kondeti, linux-doc,
	linux-kernel, linux-fsdevel

RT tasks by default run at the highest capacity/performance level. When
uclamp is selected this default behavior is retained by enforcing the
requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be
uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum
value.

This is also referred to as 'the default boost value of RT tasks'.

See commit 1a00d999971c ("sched/uclamp: Set default clamps for RT tasks").

On battery powered devices, it is desired to control this default
(currently hardcoded) behavior at runtime to reduce energy consumed by
RT tasks.

For example, a mobile device manufacturer where big.LITTLE architecture
is dominant, the performance of the little cores varies across SoCs, and
on high end ones the big cores could be too power hungry.

Given the diversity of SoCs, the new knob allows manufactures to tune
the best performance/power for RT tasks for the particular hardware they
run on.

They could opt to further tune the value when the user selects
a different power saving mode or when the device is actively charging.

The runtime aspect of it further helps in creating a single kernel image
that can be run on multiple devices that require different tuning.

Keep in mind that a lot of RT tasks in the system are created by the
kernel. On Android for instance I can see over 50 RT tasks, only
a handful of which created by the Android framework.

To control the default behavior globally by system admins and device
integrators, introduce the new sysctl_sched_uclamp_util_min_rt_default
to change the default boost value of the RT tasks.

I anticipate this to be mostly in the form of modifying the init script
of a particular device.

Whenever the new default changes, it'd be applied lazily on the next
opportunity the scheduler needs to calculate the effective uclamp.min
value for the task, assuming that it still uses the system default value
and not a user applied one.

Tested on Juno-r2 in combination with the RT capacity awareness [1].
By default an RT task will go to the highest capacity CPU and run at the
maximum frequency, which is particularly energy inefficient on high end
mobile devices because the biggest core[s] are 'huge' and power hungry.

With this patch the RT task can be controlled to run anywhere by
default, and doesn't cause the frequency to be maximum all the time.
Yet any task that really needs to be boosted can easily escape this
default behavior by modifying its requested uclamp.min value
(p->uclamp_req[UCLAMP_MIN]) via sched_setattr() syscall.

[1] 804d402fb6f6: ("sched/rt: Make RT capacity-aware")

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
CC: Jonathan Corbet <corbet@lwn.net>
CC: Juri Lelli <juri.lelli@redhat.com>
CC: Vincent Guittot <vincent.guittot@linaro.org>
CC: Dietmar Eggemann <dietmar.eggemann@arm.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ben Segall <bsegall@google.com>
CC: Mel Gorman <mgorman@suse.de>
CC: Luis Chamberlain <mcgrof@kernel.org>
CC: Kees Cook <keescook@chromium.org>
CC: Iurii Zaikin <yzaikin@google.com>
CC: Quentin Perret <qperret@google.com>
CC: Valentin Schneider <valentin.schneider@arm.com>
CC: Patrick Bellasi <patrick.bellasi@matbug.net>
CC: Pavan Kondeti <pkondeti@codeaurora.org>
CC: linux-doc@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
---

Changes in v3:

	* Do the sync in uclamp_eff_get() (Patrck & Dietmar)
	* Rename to sysctl_sched_uclamp_util_min_rt_default (Patrick, Steve,
	  Dietmar)
	* Ensure the sync is applied only to RT tasks (Patrick)

v2 can be found here (apologies forgot to mark it as v2 in the subject)

https://lore.kernel.org/lkml/20200403123020.13897-1-qais.yousef@arm.com/

 include/linux/sched/sysctl.h |  1 +
 kernel/sched/core.c          | 63 +++++++++++++++++++++++++++++++++---
 kernel/sysctl.c              |  7 ++++
 3 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index d4f6215ee03f..e62cef019094 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -59,6 +59,7 @@ extern int sysctl_sched_rt_runtime;
 #ifdef CONFIG_UCLAMP_TASK
 extern unsigned int sysctl_sched_uclamp_util_min;
 extern unsigned int sysctl_sched_uclamp_util_max;
+extern unsigned int sysctl_sched_uclamp_util_min_rt_default;
 #endif
 
 #ifdef CONFIG_CFS_BANDWIDTH
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9a2fbf98fd6f..17325b4aa451 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -790,6 +790,26 @@ unsigned int sysctl_sched_uclamp_util_min = SCHED_CAPACITY_SCALE;
 /* Max allowed maximum utilization */
 unsigned int sysctl_sched_uclamp_util_max = SCHED_CAPACITY_SCALE;
 
+/*
+ * By default RT tasks run at the maximum performance point/capacity of the
+ * system. Uclamp enforces this by always setting UCLAMP_MIN of RT tasks to
+ * SCHED_CAPACITY_SCALE.
+ *
+ * This knob allows admins to change the default behavior when uclamp is being
+ * used. In battery powered devices, particularly, running at the maximum
+ * capacity and frequency will increase energy consumption and shorten the
+ * battery life.
+ *
+ * This knob only affects RT tasks that their uclamp_se->user_defined == false.
+ *
+ * This knob will not override the system default sched_util_clamp_min defined
+ * above.
+ *
+ * Any modification is applied lazily on the next attempt to calculate the
+ * effective value of the task.
+ */
+unsigned int sysctl_sched_uclamp_util_min_rt_default = SCHED_CAPACITY_SCALE;
+
 /* All clamps are required to be less or equal than these values */
 static struct uclamp_se uclamp_default[UCLAMP_CNT];
 
@@ -872,6 +892,14 @@ unsigned int uclamp_rq_max_value(struct rq *rq, enum uclamp_id clamp_id,
 	return uclamp_idle_value(rq, clamp_id, clamp_value);
 }
 
+static void uclamp_sync_util_min_rt_default(struct task_struct *p)
+{
+	struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
+
+	if (unlikely(rt_task(p)) && !uc_se->user_defined)
+		uclamp_se_set(uc_se, sysctl_sched_uclamp_util_min_rt_default, false);
+}
+
 static inline struct uclamp_se
 uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
 {
@@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
 static inline struct uclamp_se
 uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
 {
-	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
-	struct uclamp_se uc_max = uclamp_default[clamp_id];
+	struct uclamp_se uc_req, uc_max;
+
+	/*
+	 * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
+	 */
+	uclamp_sync_util_min_rt_default(p);
+
+	uc_req = uclamp_tg_restrict(p, clamp_id);
+	uc_max = uclamp_default[clamp_id];
 
 	/* System default restrictions always apply */
 	if (unlikely(uc_req.value > uc_max.value))
@@ -1114,12 +1149,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
 				loff_t *ppos)
 {
 	bool update_root_tg = false;
-	int old_min, old_max;
+	int old_min, old_max, old_min_rt;
 	int result;
 
 	mutex_lock(&uclamp_mutex);
 	old_min = sysctl_sched_uclamp_util_min;
 	old_max = sysctl_sched_uclamp_util_max;
+	old_min_rt = sysctl_sched_uclamp_util_min_rt_default;
 
 	result = proc_dointvec(table, write, buffer, lenp, ppos);
 	if (result)
@@ -1133,6 +1169,18 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
 		goto undo;
 	}
 
+	/*
+	 * The new value will be applied to RT tasks the next time the
+	 * scheduler needs to calculate the effective uclamp.min for that task,
+	 * assuming the task is using the system default and not a user
+	 * specified value. In the latter we shall leave the value as the user
+	 * requested.
+	 */
+	if (sysctl_sched_uclamp_util_min_rt_default > SCHED_CAPACITY_SCALE) {
+		result = -EINVAL;
+		goto undo;
+	}
+
 	if (old_min != sysctl_sched_uclamp_util_min) {
 		uclamp_se_set(&uclamp_default[UCLAMP_MIN],
 			      sysctl_sched_uclamp_util_min, false);
@@ -1158,6 +1206,7 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
 undo:
 	sysctl_sched_uclamp_util_min = old_min;
 	sysctl_sched_uclamp_util_max = old_max;
+	sysctl_sched_uclamp_util_min_rt_default = old_min_rt;
 done:
 	mutex_unlock(&uclamp_mutex);
 
@@ -1200,9 +1249,13 @@ static void __setscheduler_uclamp(struct task_struct *p,
 		if (uc_se->user_defined)
 			continue;
 
-		/* By default, RT tasks always get 100% boost */
+		/*
+		 * By default, RT tasks always get 100% boost, which the admins
+		 * are allowed to change via
+		 * sysctl_sched_uclamp_util_min_rt_default knob.
+		 */
 		if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
-			clamp_value = uclamp_none(UCLAMP_MAX);
+			clamp_value = sysctl_sched_uclamp_util_min_rt_default;
 
 		uclamp_se_set(uc_se, clamp_value, false);
 	}
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a176d8727a3..64117363c502 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -453,6 +453,13 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= sysctl_sched_uclamp_handler,
 	},
+	{
+		.procname	= "sched_util_clamp_min_rt_default",
+		.data		= &sysctl_sched_uclamp_util_min_rt_default,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= sysctl_sched_uclamp_handler,
+	},
 #endif
 #ifdef CONFIG_SCHED_AUTOGROUP
 	{
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs
  2020-04-28 16:41 [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Qais Yousef
@ 2020-04-28 16:41 ` Qais Yousef
  2020-04-28 17:43   ` Randy Dunlap
       [not found]   ` <BL0PR14MB377949FBF2B798EEC425EE719AAA0@BL0PR14MB3779.namprd14.prod.outlook.com>
  2020-04-29 11:32 ` [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Pavan Kondeti
  1 sibling, 2 replies; 11+ messages in thread
From: Qais Yousef @ 2020-04-28 16:41 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Qais Yousef, Jonathan Corbet, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, Quentin Perret,
	Valentin Schneider, Patrick Bellasi, Pavan Kondeti, linux-doc,
	linux-kernel, linux-fsdevel

Uclamp exposes 3 sysctl knobs:

	* sched_util_clamp_min
	* sched_util_clamp_max
	* sched_util_clamp_min_rt_default

Document them in sysctl/kernel.rst.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
CC: Jonathan Corbet <corbet@lwn.net>
CC: Juri Lelli <juri.lelli@redhat.com>
CC: Vincent Guittot <vincent.guittot@linaro.org>
CC: Dietmar Eggemann <dietmar.eggemann@arm.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ben Segall <bsegall@google.com>
CC: Mel Gorman <mgorman@suse.de>
CC: Luis Chamberlain <mcgrof@kernel.org>
CC: Kees Cook <keescook@chromium.org>
CC: Iurii Zaikin <yzaikin@google.com>
CC: Quentin Perret <qperret@google.com>
CC: Valentin Schneider <valentin.schneider@arm.com>
CC: Patrick Bellasi <patrick.bellasi@matbug.net>
CC: Pavan Kondeti <pkondeti@codeaurora.org>
CC: linux-doc@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
---
 Documentation/admin-guide/sysctl/kernel.rst | 48 +++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 0d427fd10941..e7255f71493c 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -940,6 +940,54 @@ Enables/disables scheduler statistics. Enabling this feature
 incurs a small amount of overhead in the scheduler but is
 useful for debugging and performance tuning.
 
+sched_util_clamp_min:
+=====================
+
+Max allowed *minimum* utilization.
+
+Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
+value.
+
+It means that any requested uclamp.min value cannot be greater than
+sched_util_clamp_min, ie: it is restricted to the range
+[0:sched_util_clamp_min].
+
+sched_util_clamp_max:
+=====================
+
+Max allowed *maximum* utilization.
+
+Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
+value.
+
+It means that any requested uclamp.max value cannot be greater than
+sched_util_clamp_max, ie: it is restricted to the range
+[0:sched_util_clamp_max].
+
+sched_util_clamp_min_rt_default:
+================================
+
+By default Linux is tuned for performance. Which means that RT tasks always run
+at the highest frequency and most capable (highest capacity) CPU (in
+heterogeneous systems).
+
+Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
+SCHED_CAPACITY_SCALE (1024) by default. Which effectively boosts the tasks to
+run at the highest frequency and bias them to run on the biggest CPU.
+
+This knob allows admins to change the default behavior when uclamp is being
+used. In battery powered devices particularly, running at the maximum
+capacity and frequency will increase energy consumption and shorten the battery
+life.
+
+This knob is only effective for RT tasks which the user hasn't modified their
+requested uclamp.min value via sched_setattr() syscall.
+
+This knob will not escape the constraint imposed by sched_util_clamp_min
+defined above.
+
+Any modification is applied lazily on the next opportunity the scheduler needs
+to calculate the effective value of uclamp.min of the task.
 
 seccomp
 =======
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs
  2020-04-28 16:41 ` [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs Qais Yousef
@ 2020-04-28 17:43   ` Randy Dunlap
  2020-04-29  9:54     ` Qais Yousef
       [not found]   ` <BL0PR14MB377949FBF2B798EEC425EE719AAA0@BL0PR14MB3779.namprd14.prod.outlook.com>
  1 sibling, 1 reply; 11+ messages in thread
From: Randy Dunlap @ 2020-04-28 17:43 UTC (permalink / raw)
  To: Qais Yousef, Peter Zijlstra, Ingo Molnar
  Cc: Jonathan Corbet, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Quentin Perret, Valentin Schneider,
	Patrick Bellasi, Pavan Kondeti, linux-doc, linux-kernel,
	linux-fsdevel

Hi--

I have a few corrections for you below:

On 4/28/20 9:41 AM, Qais Yousef wrote:
> Uclamp exposes 3 sysctl knobs:
> 
> 	* sched_util_clamp_min
> 	* sched_util_clamp_max
> 	* sched_util_clamp_min_rt_default
> 
> Document them in sysctl/kernel.rst.
> 
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> CC: Jonathan Corbet <corbet@lwn.net>
> CC: Juri Lelli <juri.lelli@redhat.com>
> CC: Vincent Guittot <vincent.guittot@linaro.org>
> CC: Dietmar Eggemann <dietmar.eggemann@arm.com>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Ben Segall <bsegall@google.com>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Luis Chamberlain <mcgrof@kernel.org>
> CC: Kees Cook <keescook@chromium.org>
> CC: Iurii Zaikin <yzaikin@google.com>
> CC: Quentin Perret <qperret@google.com>
> CC: Valentin Schneider <valentin.schneider@arm.com>
> CC: Patrick Bellasi <patrick.bellasi@matbug.net>
> CC: Pavan Kondeti <pkondeti@codeaurora.org>
> CC: linux-doc@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: linux-fsdevel@vger.kernel.org
> ---
>  Documentation/admin-guide/sysctl/kernel.rst | 48 +++++++++++++++++++++
>  1 file changed, 48 insertions(+)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 0d427fd10941..e7255f71493c 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -940,6 +940,54 @@ Enables/disables scheduler statistics. Enabling this feature
>  incurs a small amount of overhead in the scheduler but is
>  useful for debugging and performance tuning.
>  
> +sched_util_clamp_min:
> +=====================
> +
> +Max allowed *minimum* utilization.
> +
> +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> +value.
> +
> +It means that any requested uclamp.min value cannot be greater than
> +sched_util_clamp_min, ie: it is restricted to the range

                         i.e., it is

> +[0:sched_util_clamp_min].
> +
> +sched_util_clamp_max:
> +=====================
> +
> +Max allowed *maximum* utilization.
> +
> +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> +value.
> +
> +It means that any requested uclamp.max value cannot be greater than
> +sched_util_clamp_max, ie: it is restricted to the range

                         i.e., it is

> +[0:sched_util_clamp_max].
> +
> +sched_util_clamp_min_rt_default:
> +================================
> +
> +By default Linux is tuned for performance. Which means that RT tasks always run
> +at the highest frequency and most capable (highest capacity) CPU (in
> +heterogeneous systems).
> +
> +Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
> +SCHED_CAPACITY_SCALE (1024) by default. Which effectively boosts the tasks to

                               by default, which

> +run at the highest frequency and bias them to run on the biggest CPU.

                                    biases them

> +
> +This knob allows admins to change the default behavior when uclamp is being
> +used. In battery powered devices particularly, running at the maximum
> +capacity and frequency will increase energy consumption and shorten the battery
> +life.
> +
> +This knob is only effective for RT tasks which the user hasn't modified their
> +requested uclamp.min value via sched_setattr() syscall.
> +
> +This knob will not escape the constraint imposed by sched_util_clamp_min
> +defined above.
> +
> +Any modification is applied lazily on the next opportunity the scheduler needs
> +to calculate the effective value of uclamp.min of the task.
>  
>  seccomp
>  =======
> 

thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs
  2020-04-28 17:43   ` Randy Dunlap
@ 2020-04-29  9:54     ` Qais Yousef
  0 siblings, 0 replies; 11+ messages in thread
From: Qais Yousef @ 2020-04-29  9:54 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Peter Zijlstra, Ingo Molnar, Jonathan Corbet, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Quentin Perret, Valentin Schneider, Patrick Bellasi,
	Pavan Kondeti, linux-doc, linux-kernel, linux-fsdevel

On 04/28/20 10:43, Randy Dunlap wrote:
> Hi--
> 
> I have a few corrections for you below:

Thanks Randy. I applied all your suggestions for the next version.

Thanks

--
Qais Yousef

> 
> On 4/28/20 9:41 AM, Qais Yousef wrote:
> > Uclamp exposes 3 sysctl knobs:
> > 
> > 	* sched_util_clamp_min
> > 	* sched_util_clamp_max
> > 	* sched_util_clamp_min_rt_default
> > 
> > Document them in sysctl/kernel.rst.
> > 
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > CC: Jonathan Corbet <corbet@lwn.net>
> > CC: Juri Lelli <juri.lelli@redhat.com>
> > CC: Vincent Guittot <vincent.guittot@linaro.org>
> > CC: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > CC: Steven Rostedt <rostedt@goodmis.org>
> > CC: Ben Segall <bsegall@google.com>
> > CC: Mel Gorman <mgorman@suse.de>
> > CC: Luis Chamberlain <mcgrof@kernel.org>
> > CC: Kees Cook <keescook@chromium.org>
> > CC: Iurii Zaikin <yzaikin@google.com>
> > CC: Quentin Perret <qperret@google.com>
> > CC: Valentin Schneider <valentin.schneider@arm.com>
> > CC: Patrick Bellasi <patrick.bellasi@matbug.net>
> > CC: Pavan Kondeti <pkondeti@codeaurora.org>
> > CC: linux-doc@vger.kernel.org
> > CC: linux-kernel@vger.kernel.org
> > CC: linux-fsdevel@vger.kernel.org
> > ---
> >  Documentation/admin-guide/sysctl/kernel.rst | 48 +++++++++++++++++++++
> >  1 file changed, 48 insertions(+)
> > 
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 0d427fd10941..e7255f71493c 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -940,6 +940,54 @@ Enables/disables scheduler statistics. Enabling this feature
> >  incurs a small amount of overhead in the scheduler but is
> >  useful for debugging and performance tuning.
> >  
> > +sched_util_clamp_min:
> > +=====================
> > +
> > +Max allowed *minimum* utilization.
> > +
> > +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> > +value.
> > +
> > +It means that any requested uclamp.min value cannot be greater than
> > +sched_util_clamp_min, ie: it is restricted to the range
> 
>                          i.e., it is
> 
> > +[0:sched_util_clamp_min].
> > +
> > +sched_util_clamp_max:
> > +=====================
> > +
> > +Max allowed *maximum* utilization.
> > +
> > +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> > +value.
> > +
> > +It means that any requested uclamp.max value cannot be greater than
> > +sched_util_clamp_max, ie: it is restricted to the range
> 
>                          i.e., it is
> 
> > +[0:sched_util_clamp_max].
> > +
> > +sched_util_clamp_min_rt_default:
> > +================================
> > +
> > +By default Linux is tuned for performance. Which means that RT tasks always run
> > +at the highest frequency and most capable (highest capacity) CPU (in
> > +heterogeneous systems).
> > +
> > +Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
> > +SCHED_CAPACITY_SCALE (1024) by default. Which effectively boosts the tasks to
> 
>                                by default, which
> 
> > +run at the highest frequency and bias them to run on the biggest CPU.
> 
>                                     biases them
> 
> > +
> > +This knob allows admins to change the default behavior when uclamp is being
> > +used. In battery powered devices particularly, running at the maximum
> > +capacity and frequency will increase energy consumption and shorten the battery
> > +life.
> > +
> > +This knob is only effective for RT tasks which the user hasn't modified their
> > +requested uclamp.min value via sched_setattr() syscall.
> > +
> > +This knob will not escape the constraint imposed by sched_util_clamp_min
> > +defined above.
> > +
> > +Any modification is applied lazily on the next opportunity the scheduler needs
> > +to calculate the effective value of uclamp.min of the task.
> >  
> >  seccomp
> >  =======
> > 
> 
> thanks.
> -- 
> ~Randy
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
  2020-04-28 16:41 [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Qais Yousef
  2020-04-28 16:41 ` [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs Qais Yousef
@ 2020-04-29 11:32 ` Pavan Kondeti
  2020-04-29 12:30   ` Qais Yousef
  1 sibling, 1 reply; 11+ messages in thread
From: Pavan Kondeti @ 2020-04-29 11:32 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Peter Zijlstra, Ingo Molnar, Jonathan Corbet, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Quentin Perret, Valentin Schneider, Patrick Bellasi, linux-doc,
	linux-kernel, linux-fsdevel

Hi Qais,

On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:

[...]

>  
> +static void uclamp_sync_util_min_rt_default(struct task_struct *p)
> +{
> +	struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
> +
> +	if (unlikely(rt_task(p)) && !uc_se->user_defined)
> +		uclamp_se_set(uc_se, sysctl_sched_uclamp_util_min_rt_default, false);
> +}

Unlike system default clamp values, RT default value is written to
p->uclamp_req[UCLAMP_MIN]. A user may not be able to set the uclamp.max to a
lower value than sysctl_sched_uclamp_util_min_rt_default. This is not a
big deal. Just sharing my observation. Is this how you expected it to work?

> +
>  static inline struct uclamp_se
>  uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
>  {
> @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
>  static inline struct uclamp_se
>  uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
>  {
> -	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
> -	struct uclamp_se uc_max = uclamp_default[clamp_id];
> +	struct uclamp_se uc_req, uc_max;
> +
> +	/*
> +	 * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
> +	 */
> +	uclamp_sync_util_min_rt_default(p);
> +
> +	uc_req = uclamp_tg_restrict(p, clamp_id);
> +	uc_max = uclamp_default[clamp_id];

We are calling uclamp_sync_util_min_rt_default() unnecessarily for
clamp_id == UCLAMP_MAX case. Would it be better to have a separate
uclamp_default for RT like uclamp_default_rt and select uc_max based
on task policy? Since all tunables are handled in sysctl_sched_uclamp_handler
we can cover the case of uclamp_util_min < uclamp_util_min_rt.

>  
>  	/* System default restrictions always apply */
>  	if (unlikely(uc_req.value > uc_max.value))
> @@ -1114,12 +1149,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
>  				loff_t *ppos)
>  {
>  	bool update_root_tg = false;
> -	int old_min, old_max;
> +	int old_min, old_max, old_min_rt;
>  	int result;
>  
>  	mutex_lock(&uclamp_mutex);
>  	old_min = sysctl_sched_uclamp_util_min;
>  	old_max = sysctl_sched_uclamp_util_max;
> +	old_min_rt = sysctl_sched_uclamp_util_min_rt_default;
>  
>  	result = proc_dointvec(table, write, buffer, lenp, ppos);
>  	if (result)
> @@ -1133,6 +1169,18 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
>  		goto undo;
>  	}
>  
> +	/*
> +	 * The new value will be applied to RT tasks the next time the
> +	 * scheduler needs to calculate the effective uclamp.min for that task,
> +	 * assuming the task is using the system default and not a user
> +	 * specified value. In the latter we shall leave the value as the user
> +	 * requested.
> +	 */
> +	if (sysctl_sched_uclamp_util_min_rt_default > SCHED_CAPACITY_SCALE) {
> +		result = -EINVAL;
> +		goto undo;
> +	}
> +
>  	if (old_min != sysctl_sched_uclamp_util_min) {
>  		uclamp_se_set(&uclamp_default[UCLAMP_MIN],
>  			      sysctl_sched_uclamp_util_min, false);
> @@ -1158,6 +1206,7 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
>  undo:
>  	sysctl_sched_uclamp_util_min = old_min;
>  	sysctl_sched_uclamp_util_max = old_max;
> +	sysctl_sched_uclamp_util_min_rt_default = old_min_rt;
>  done:
>  	mutex_unlock(&uclamp_mutex);
>  
> @@ -1200,9 +1249,13 @@ static void __setscheduler_uclamp(struct task_struct *p,
>  		if (uc_se->user_defined)
>  			continue;
>  
> -		/* By default, RT tasks always get 100% boost */
> +		/*
> +		 * By default, RT tasks always get 100% boost, which the admins
> +		 * are allowed to change via
> +		 * sysctl_sched_uclamp_util_min_rt_default knob.
> +		 */
>  		if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
> -			clamp_value = uclamp_none(UCLAMP_MAX);
> +			clamp_value = sysctl_sched_uclamp_util_min_rt_default;
>  
>  		uclamp_se_set(uc_se, clamp_value, false);
>  	}
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 8a176d8727a3..64117363c502 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -453,6 +453,13 @@ static struct ctl_table kern_table[] = {
>  		.mode		= 0644,
>  		.proc_handler	= sysctl_sched_uclamp_handler,
>  	},
> +	{
> +		.procname	= "sched_util_clamp_min_rt_default",
> +		.data		= &sysctl_sched_uclamp_util_min_rt_default,
> +		.maxlen		= sizeof(unsigned int),
> +		.mode		= 0644,
> +		.proc_handler	= sysctl_sched_uclamp_handler,
> +	},
>  #endif
>  #ifdef CONFIG_SCHED_AUTOGROUP
>  	{
> -- 
> 2.17.1
> 

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
  2020-04-29 11:32 ` [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Pavan Kondeti
@ 2020-04-29 12:30   ` Qais Yousef
  2020-04-29 15:21     ` Pavan Kondeti
  2020-04-30 18:21     ` Dietmar Eggemann
  0 siblings, 2 replies; 11+ messages in thread
From: Qais Yousef @ 2020-04-29 12:30 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Peter Zijlstra, Ingo Molnar, Jonathan Corbet, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Quentin Perret, Valentin Schneider, Patrick Bellasi, linux-doc,
	linux-kernel, linux-fsdevel

Hi Pavan

On 04/29/20 17:02, Pavan Kondeti wrote:
> Hi Qais,
> 
> On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:
> 
> [...]
> 
> >  
> > +static void uclamp_sync_util_min_rt_default(struct task_struct *p)
> > +{
> > +	struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
> > +
> > +	if (unlikely(rt_task(p)) && !uc_se->user_defined)
> > +		uclamp_se_set(uc_se, sysctl_sched_uclamp_util_min_rt_default, false);
> > +}
> 
> Unlike system default clamp values, RT default value is written to
> p->uclamp_req[UCLAMP_MIN]. A user may not be able to set the uclamp.max to a
> lower value than sysctl_sched_uclamp_util_min_rt_default. This is not a
> big deal. Just sharing my observation. Is this how you expected it to work?

This is how I expect it to work yes.

Note that the sysctl_sched_uclamp_util_{min,max}, or 'system default clamp
values' as you called them, are constraints. They define the range any task is
_allowed to_ *request*.

The new sysctl_sched_uclamp_util_min_rt_default is setting the default
*request* for RT tasks. And this was done since historically RT tasks has
always run at the highest performance point in Linux.

Can you think of a scenario where setting the default *request* of uclamp.max
for all RT tasks helps?

I'm thinking, the global constrain of sysctl_sched_uclamp_util_max should
suffice. Or one can use cgroup to selectively cap some tasks.

For example if you don't want any task in the system to be boosted to anything
higher than 800, just do

	sysctl_sched_uclamp_util_max = 800

Or if you want to be selective, the same thing can be achieved by creating
a cgroup cpu controller that has uclamp.max = 0.8 and attaching tasks to it.

> 
> > +
> >  static inline struct uclamp_se
> >  uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> >  {
> > @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> >  static inline struct uclamp_se
> >  uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> >  {
> > -	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
> > -	struct uclamp_se uc_max = uclamp_default[clamp_id];
> > +	struct uclamp_se uc_req, uc_max;
> > +
> > +	/*
> > +	 * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
> > +	 */
> > +	uclamp_sync_util_min_rt_default(p);
> > +
> > +	uc_req = uclamp_tg_restrict(p, clamp_id);
> > +	uc_max = uclamp_default[clamp_id];
> 
> We are calling uclamp_sync_util_min_rt_default() unnecessarily for
> clamp_id == UCLAMP_MAX case. Would it be better to have a separate

It was actually intentional to make sure we update the value ASAP. I didn't
think it's a lot of overhead. I can further protect with a check to verify
whether the value has changed if it seems heavy handed.

> uclamp_default for RT like uclamp_default_rt and select uc_max based
> on task policy? Since all tunables are handled in sysctl_sched_uclamp_handler
> we can cover the case of uclamp_util_min < uclamp_util_min_rt.

Hmm I'm not sure I got you.

uclamp_default[] is setting the constraints. I.e. what's the range it's allowed
to take.

What's added here is simply setting the requested uclamp.min value, which if
the constraints allow it will be applied.

For example. If

	sysctl_sched_uclamp_util_min = 0
	sysctl_sched_uclamp_util_min_rt_default = 400

uclamp_eff_get() will return 0 as effective uclamp.min value for the task. So
while the task still requests to be boosted to 1024, but the system constraint
prevents it from getting it. But as soon as the system constraint has changed,
the task might be able to get what it wants.

For example, if at runtime sysctl_sched_uclamp_util_min was modified to 1024

	sysctl_sched_uclamp_util_min = 1024
	sysctl_sched_uclamp_util_min_rt_default = 400

uclamp_eff_get() return 400 for the task now, as requested.

Did this help answering your questions?

Thanks

--
Qais Yousef

> 
> >  
> >  	/* System default restrictions always apply */
> >  	if (unlikely(uc_req.value > uc_max.value))
> > @@ -1114,12 +1149,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> >  				loff_t *ppos)
> >  {
> >  	bool update_root_tg = false;
> > -	int old_min, old_max;
> > +	int old_min, old_max, old_min_rt;
> >  	int result;
> >  
> >  	mutex_lock(&uclamp_mutex);
> >  	old_min = sysctl_sched_uclamp_util_min;
> >  	old_max = sysctl_sched_uclamp_util_max;
> > +	old_min_rt = sysctl_sched_uclamp_util_min_rt_default;
> >  
> >  	result = proc_dointvec(table, write, buffer, lenp, ppos);
> >  	if (result)
> > @@ -1133,6 +1169,18 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> >  		goto undo;
> >  	}
> >  
> > +	/*
> > +	 * The new value will be applied to RT tasks the next time the
> > +	 * scheduler needs to calculate the effective uclamp.min for that task,
> > +	 * assuming the task is using the system default and not a user
> > +	 * specified value. In the latter we shall leave the value as the user
> > +	 * requested.
> > +	 */
> > +	if (sysctl_sched_uclamp_util_min_rt_default > SCHED_CAPACITY_SCALE) {
> > +		result = -EINVAL;
> > +		goto undo;
> > +	}
> > +
> >  	if (old_min != sysctl_sched_uclamp_util_min) {
> >  		uclamp_se_set(&uclamp_default[UCLAMP_MIN],
> >  			      sysctl_sched_uclamp_util_min, false);
> > @@ -1158,6 +1206,7 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> >  undo:
> >  	sysctl_sched_uclamp_util_min = old_min;
> >  	sysctl_sched_uclamp_util_max = old_max;
> > +	sysctl_sched_uclamp_util_min_rt_default = old_min_rt;
> >  done:
> >  	mutex_unlock(&uclamp_mutex);
> >  
> > @@ -1200,9 +1249,13 @@ static void __setscheduler_uclamp(struct task_struct *p,
> >  		if (uc_se->user_defined)
> >  			continue;
> >  
> > -		/* By default, RT tasks always get 100% boost */
> > +		/*
> > +		 * By default, RT tasks always get 100% boost, which the admins
> > +		 * are allowed to change via
> > +		 * sysctl_sched_uclamp_util_min_rt_default knob.
> > +		 */
> >  		if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
> > -			clamp_value = uclamp_none(UCLAMP_MAX);
> > +			clamp_value = sysctl_sched_uclamp_util_min_rt_default;
> >  
> >  		uclamp_se_set(uc_se, clamp_value, false);
> >  	}
> > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > index 8a176d8727a3..64117363c502 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -453,6 +453,13 @@ static struct ctl_table kern_table[] = {
> >  		.mode		= 0644,
> >  		.proc_handler	= sysctl_sched_uclamp_handler,
> >  	},
> > +	{
> > +		.procname	= "sched_util_clamp_min_rt_default",
> > +		.data		= &sysctl_sched_uclamp_util_min_rt_default,
> > +		.maxlen		= sizeof(unsigned int),
> > +		.mode		= 0644,
> > +		.proc_handler	= sysctl_sched_uclamp_handler,
> > +	},
> >  #endif
> >  #ifdef CONFIG_SCHED_AUTOGROUP
> >  	{
> > -- 
> > 2.17.1
> > 
> 
> Thanks,
> Pavan
> 
> -- 
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
  2020-04-29 12:30   ` Qais Yousef
@ 2020-04-29 15:21     ` Pavan Kondeti
  2020-04-29 15:40       ` Qais Yousef
  2020-04-30 18:21     ` Dietmar Eggemann
  1 sibling, 1 reply; 11+ messages in thread
From: Pavan Kondeti @ 2020-04-29 15:21 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Peter Zijlstra, Ingo Molnar, Jonathan Corbet, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Quentin Perret, Valentin Schneider, Patrick Bellasi, linux-doc,
	linux-kernel, linux-fsdevel

Hi Qais,

On Wed, Apr 29, 2020 at 01:30:57PM +0100, Qais Yousef wrote:
> Hi Pavan
> 
> On 04/29/20 17:02, Pavan Kondeti wrote:
> > Hi Qais,
> > 
> > On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:
> > 
> > [...]
> > 
> > >  
> > > +static void uclamp_sync_util_min_rt_default(struct task_struct *p)
> > > +{
> > > +	struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
> > > +
> > > +	if (unlikely(rt_task(p)) && !uc_se->user_defined)
> > > +		uclamp_se_set(uc_se, sysctl_sched_uclamp_util_min_rt_default, false);
> > > +}
> > 
> > Unlike system default clamp values, RT default value is written to
> > p->uclamp_req[UCLAMP_MIN]. A user may not be able to set the uclamp.max to a
> > lower value than sysctl_sched_uclamp_util_min_rt_default. This is not a
> > big deal. Just sharing my observation. Is this how you expected it to work?
> 
> This is how I expect it to work yes.
> 
> Note that the sysctl_sched_uclamp_util_{min,max}, or 'system default clamp
> values' as you called them, are constraints. They define the range any task is
> _allowed to_ *request*.
> 
> The new sysctl_sched_uclamp_util_min_rt_default is setting the default
> *request* for RT tasks. And this was done since historically RT tasks has
> always run at the highest performance point in Linux.
> 
> Can you think of a scenario where setting the default *request* of uclamp.max
> for all RT tasks helps?
> 

It was a hypothetical question :-) Thanks for clearly explaining the
difference between system default clamp values (constraints) vs the newly
introduced sysctl_sched_uclamp_util_min_rt_default (request for RT tasks).

> I'm thinking, the global constrain of sysctl_sched_uclamp_util_max should
> suffice. Or one can use cgroup to selectively cap some tasks.
> 
> For example if you don't want any task in the system to be boosted to anything
> higher than 800, just do
> 
> 	sysctl_sched_uclamp_util_max = 800
> 
> Or if you want to be selective, the same thing can be achieved by creating
> a cgroup cpu controller that has uclamp.max = 0.8 and attaching tasks to it.
> 
> > 
> > > +
> > >  static inline struct uclamp_se
> > >  uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> > >  {
> > > @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> > >  static inline struct uclamp_se
> > >  uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> > >  {
> > > -	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
> > > -	struct uclamp_se uc_max = uclamp_default[clamp_id];
> > > +	struct uclamp_se uc_req, uc_max;
> > > +
> > > +	/*
> > > +	 * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
> > > +	 */
> > > +	uclamp_sync_util_min_rt_default(p);
> > > +
> > > +	uc_req = uclamp_tg_restrict(p, clamp_id);
> > > +	uc_max = uclamp_default[clamp_id];
> > 
> > We are calling uclamp_sync_util_min_rt_default() unnecessarily for
> > clamp_id == UCLAMP_MAX case. Would it be better to have a separate
> 
> It was actually intentional to make sure we update the value ASAP. I didn't
> think it's a lot of overhead. I can further protect with a check to verify
> whether the value has changed if it seems heavy handed.
> 
> > uclamp_default for RT like uclamp_default_rt and select uc_max based
> > on task policy? Since all tunables are handled in sysctl_sched_uclamp_handler
> > we can cover the case of uclamp_util_min < uclamp_util_min_rt.
> 
> Hmm I'm not sure I got you.
> 
> uclamp_default[] is setting the constraints. I.e. what's the range it's allowed
> to take.
> 
> What's added here is simply setting the requested uclamp.min value, which if
> the constraints allow it will be applied.
> 
> For example. If
> 
> 	sysctl_sched_uclamp_util_min = 0
> 	sysctl_sched_uclamp_util_min_rt_default = 400
> 
> uclamp_eff_get() will return 0 as effective uclamp.min value for the task. So
> while the task still requests to be boosted to 1024, but the system constraint
> prevents it from getting it. But as soon as the system constraint has changed,
> the task might be able to get what it wants.
> 
> For example, if at runtime sysctl_sched_uclamp_util_min was modified to 1024
> 
> 	sysctl_sched_uclamp_util_min = 1024
> 	sysctl_sched_uclamp_util_min_rt_default = 400
> 
> uclamp_eff_get() return 400 for the task now, as requested.
> 

Yes, I did understand the relation between sysctl_sched_uclamp_util_min and
sysctl_sched_uclamp_util_min_rt_default . It is not clear why we are copying
sysctl_sched_uclamp_util_min_rt_default into p->uclamp_req[UCLAMP_MIN] though
user did not explicitly asked for it. In the above reply, you clearly
mentioned that the new knob works like a request from the user space for all
RT tasks.

As we are copying the sysctl_sched_uclamp_util_min_rt_default value into
p->uclamp_req[UCLAMP_MIN], user gets it when sched_getattr() is called though
sched_setattr() was not called before. I guess that is expected behavior with
your definition of this new tunable. Thanks for answering the question in
detail.

Thanks,
Pavan

> Thanks
> 
> --
> Qais Yousef
> 
> > 
> > >  
> > >  	/* System default restrictions always apply */
> > >  	if (unlikely(uc_req.value > uc_max.value))
> > > @@ -1114,12 +1149,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > >  				loff_t *ppos)
> > >  {
> > >  	bool update_root_tg = false;
> > > -	int old_min, old_max;
> > > +	int old_min, old_max, old_min_rt;
> > >  	int result;
> > >  
> > >  	mutex_lock(&uclamp_mutex);
> > >  	old_min = sysctl_sched_uclamp_util_min;
> > >  	old_max = sysctl_sched_uclamp_util_max;
> > > +	old_min_rt = sysctl_sched_uclamp_util_min_rt_default;
> > >  
> > >  	result = proc_dointvec(table, write, buffer, lenp, ppos);
> > >  	if (result)
> > > @@ -1133,6 +1169,18 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > >  		goto undo;
> > >  	}
> > >  
> > > +	/*
> > > +	 * The new value will be applied to RT tasks the next time the
> > > +	 * scheduler needs to calculate the effective uclamp.min for that task,
> > > +	 * assuming the task is using the system default and not a user
> > > +	 * specified value. In the latter we shall leave the value as the user
> > > +	 * requested.
> > > +	 */
> > > +	if (sysctl_sched_uclamp_util_min_rt_default > SCHED_CAPACITY_SCALE) {
> > > +		result = -EINVAL;
> > > +		goto undo;
> > > +	}
> > > +
> > >  	if (old_min != sysctl_sched_uclamp_util_min) {
> > >  		uclamp_se_set(&uclamp_default[UCLAMP_MIN],
> > >  			      sysctl_sched_uclamp_util_min, false);
> > > @@ -1158,6 +1206,7 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > >  undo:
> > >  	sysctl_sched_uclamp_util_min = old_min;
> > >  	sysctl_sched_uclamp_util_max = old_max;
> > > +	sysctl_sched_uclamp_util_min_rt_default = old_min_rt;
> > >  done:
> > >  	mutex_unlock(&uclamp_mutex);
> > >  
> > > @@ -1200,9 +1249,13 @@ static void __setscheduler_uclamp(struct task_struct *p,
> > >  		if (uc_se->user_defined)
> > >  			continue;
> > >  
> > > -		/* By default, RT tasks always get 100% boost */
> > > +		/*
> > > +		 * By default, RT tasks always get 100% boost, which the admins
> > > +		 * are allowed to change via
> > > +		 * sysctl_sched_uclamp_util_min_rt_default knob.
> > > +		 */
> > >  		if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
> > > -			clamp_value = uclamp_none(UCLAMP_MAX);
> > > +			clamp_value = sysctl_sched_uclamp_util_min_rt_default;
> > >  
> > >  		uclamp_se_set(uc_se, clamp_value, false);
> > >  	}
> > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > > index 8a176d8727a3..64117363c502 100644
> > > --- a/kernel/sysctl.c
> > > +++ b/kernel/sysctl.c
> > > @@ -453,6 +453,13 @@ static struct ctl_table kern_table[] = {
> > >  		.mode		= 0644,
> > >  		.proc_handler	= sysctl_sched_uclamp_handler,
> > >  	},
> > > +	{
> > > +		.procname	= "sched_util_clamp_min_rt_default",
> > > +		.data		= &sysctl_sched_uclamp_util_min_rt_default,
> > > +		.maxlen		= sizeof(unsigned int),
> > > +		.mode		= 0644,
> > > +		.proc_handler	= sysctl_sched_uclamp_handler,
> > > +	},
> > >  #endif
> > >  #ifdef CONFIG_SCHED_AUTOGROUP
> > >  	{
> > > -- 
> > > 2.17.1
> > > 
> > 
> > Thanks,
> > Pavan
> > 
> > -- 
> > Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
  2020-04-29 15:21     ` Pavan Kondeti
@ 2020-04-29 15:40       ` Qais Yousef
  0 siblings, 0 replies; 11+ messages in thread
From: Qais Yousef @ 2020-04-29 15:40 UTC (permalink / raw)
  To: Pavan Kondeti
  Cc: Peter Zijlstra, Ingo Molnar, Jonathan Corbet, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Quentin Perret, Valentin Schneider, Patrick Bellasi, linux-doc,
	linux-kernel, linux-fsdevel

On 04/29/20 20:51, Pavan Kondeti wrote:
> As we are copying the sysctl_sched_uclamp_util_min_rt_default value into
> p->uclamp_req[UCLAMP_MIN], user gets it when sched_getattr() is called though
> sched_setattr() was not called before. I guess that is expected behavior with
> your definition of this new tunable. Thanks for answering the question in
> detail.

Yes. That's the original design without this patch actually. Though before it
was always set to 1024.

Thanks for having a look!

--
Qais Yousef

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs
       [not found]   ` <BL0PR14MB377949FBF2B798EEC425EE719AAA0@BL0PR14MB3779.namprd14.prod.outlook.com>
@ 2020-04-30 10:03     ` Qais Yousef
  0 siblings, 0 replies; 11+ messages in thread
From: Qais Yousef @ 2020-04-30 10:03 UTC (permalink / raw)
  To: Tao Zhou
  Cc: Peter Zijlstra, Ingo Molnar, Jonathan Corbet, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Quentin Perret, Valentin Schneider, Patrick Bellasi,
	Pavan Kondeti, linux-doc, linux-kernel, linux-fsdevel

On 04/30/20 13:06, Tao Zhou wrote:
> Hi,
> 
> On Tue, Apr 28, 2020 at 05:41:34PM +0100, Qais Yousef wrote:
> > Uclamp exposes 3 sysctl knobs:
> > 
> > 	* sched_util_clamp_min
> > 	* sched_util_clamp_max
> > 	* sched_util_clamp_min_rt_default
>  
> > Document them in sysctl/kernel.rst.
> > 
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > CC: Jonathan Corbet <corbet@lwn.net>
> > CC: Juri Lelli <juri.lelli@redhat.com>
> > CC: Vincent Guittot <vincent.guittot@linaro.org>
> > CC: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > CC: Steven Rostedt <rostedt@goodmis.org>
> > CC: Ben Segall <bsegall@google.com>
> > CC: Mel Gorman <mgorman@suse.de>
> > CC: Luis Chamberlain <mcgrof@kernel.org>
> > CC: Kees Cook <keescook@chromium.org>
> > CC: Iurii Zaikin <yzaikin@google.com>
> > CC: Quentin Perret <qperret@google.com>
> > CC: Valentin Schneider <valentin.schneider@arm.com>
> > CC: Patrick Bellasi <patrick.bellasi@matbug.net>
> > CC: Pavan Kondeti <pkondeti@codeaurora.org>
> > CC: linux-doc@vger.kernel.org
> > CC: linux-kernel@vger.kernel.org
> > CC: linux-fsdevel@vger.kernel.org
> > ---
> >  Documentation/admin-guide/sysctl/kernel.rst | 48 +++++++++++++++++++++
> >  1 file changed, 48 insertions(+)
> > 
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 0d427fd10941..e7255f71493c 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -940,6 +940,54 @@ Enables/disables scheduler statistics. Enabling this feature
> >  incurs a small amount of overhead in the scheduler but is
> >  useful for debugging and performance tuning.
> >  
> > +sched_util_clamp_min:
> > +=====================
> > +
> > +Max allowed *minimum* utilization.
> > +
> > +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> > +value.
> > +
> > +It means that any requested uclamp.min value cannot be greater than
>                                                           ^^^^^^^
> 
> Seems that 'greater' should be 'smaller'.
> And the range is [sched_util_clamp_min:SCHED_CAPACITY_SCALE]
> Uclamp request should not below this system value.
> Or I am totally wrong for memory leak.

No the range is [0:sched_util_clamp_min].

It is what it is implemented by this condition:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v5.7-rc3#n913

So if the requested uclamp.min value is greater than sched_util_clamp_min, we
return sched_util_clamp_min.

Thanks

--
Qais Yousef

> 
> Thank you
> 
> > +sched_util_clamp_min, ie: it is restricted to the range
> > +[0:sched_util_clamp_min].
> > +
> > +sched_util_clamp_max:
> > +=====================
> > +
> > +Max allowed *maximum* utilization.
> > +
> > +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> > +value.
> > +
> > +It means that any requested uclamp.max value cannot be greater than
> > +sched_util_clamp_max, ie: it is restricted to the range
> > +[0:sched_util_clamp_max].
> > +
> > +sched_util_clamp_min_rt_default:
> > +================================
> > +
> > +By default Linux is tuned for performance. Which means that RT tasks always run
> > +at the highest frequency and most capable (highest capacity) CPU (in
> > +heterogeneous systems).
> > +
> > +Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
> > +SCHED_CAPACITY_SCALE (1024) by default. Which effectively boosts the tasks to
> > +run at the highest frequency and bias them to run on the biggest CPU.
> > +
> > +This knob allows admins to change the default behavior when uclamp is being
> > +used. In battery powered devices particularly, running at the maximum
> > +capacity and frequency will increase energy consumption and shorten the battery
> > +life.
> > +
> > +This knob is only effective for RT tasks which the user hasn't modified their
> > +requested uclamp.min value via sched_setattr() syscall.
> > +
> > +This knob will not escape the constraint imposed by sched_util_clamp_min
> > +defined above.
> > +
> > +Any modification is applied lazily on the next opportunity the scheduler needs
> > +to calculate the effective value of uclamp.min of the task.
> >  
> >  seccomp
> >  =======
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
  2020-04-29 12:30   ` Qais Yousef
  2020-04-29 15:21     ` Pavan Kondeti
@ 2020-04-30 18:21     ` Dietmar Eggemann
  2020-05-01 11:03       ` Qais Yousef
  1 sibling, 1 reply; 11+ messages in thread
From: Dietmar Eggemann @ 2020-04-30 18:21 UTC (permalink / raw)
  To: Qais Yousef, Pavan Kondeti
  Cc: Peter Zijlstra, Ingo Molnar, Jonathan Corbet, Juri Lelli,
	Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, Quentin Perret,
	Valentin Schneider, Patrick Bellasi, linux-doc, linux-kernel,
	linux-fsdevel

On 29/04/2020 14:30, Qais Yousef wrote:
> Hi Pavan
> 
> On 04/29/20 17:02, Pavan Kondeti wrote:
>> Hi Qais,
>>
>> On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:

[...]

>>> @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
>>>  static inline struct uclamp_se
>>>  uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
>>>  {
>>> -	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
>>> -	struct uclamp_se uc_max = uclamp_default[clamp_id];
>>> +	struct uclamp_se uc_req, uc_max;
>>> +
>>> +	/*
>>> +	 * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
>>> +	 */
>>> +	uclamp_sync_util_min_rt_default(p);
>>> +
>>> +	uc_req = uclamp_tg_restrict(p, clamp_id);
>>> +	uc_max = uclamp_default[clamp_id];
>>
>> We are calling uclamp_sync_util_min_rt_default() unnecessarily for
>> clamp_id == UCLAMP_MAX case. Would it be better to have a separate
> 
> It was actually intentional to make sure we update the value ASAP. I didn't
> think it's a lot of overhead. I can further protect with a check to verify
> whether the value has changed if it seems heavy handed.

Users of uclamp_eff_value()->uclamp_eff_get() ((like
rt_task_fits_capacity())) always call both ids.

So calling uclamp_sync_util_min_rt_default() only for UCLAMP_MIN would
make sense. It's overhead in the fast path for rt tasks.

Since changes to sched_util_clamp_min_rt_default will be fairly rare,
you might even want to consider only doing the uclamp_se_set(...,
min_rt_default, ...) in case

  uc_se->value != sysctl_sched_uclamp_util_min_rt_default

[...]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
  2020-04-30 18:21     ` Dietmar Eggemann
@ 2020-05-01 11:03       ` Qais Yousef
  0 siblings, 0 replies; 11+ messages in thread
From: Qais Yousef @ 2020-05-01 11:03 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Pavan Kondeti, Peter Zijlstra, Ingo Molnar, Jonathan Corbet,
	Juri Lelli, Vincent Guittot, Steven Rostedt, Ben Segall,
	Mel Gorman, Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Quentin Perret, Valentin Schneider, Patrick Bellasi, linux-doc,
	linux-kernel, linux-fsdevel

On 04/30/20 20:21, Dietmar Eggemann wrote:
> On 29/04/2020 14:30, Qais Yousef wrote:
> > Hi Pavan
> > 
> > On 04/29/20 17:02, Pavan Kondeti wrote:
> >> Hi Qais,
> >>
> >> On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:
> 
> [...]
> 
> >>> @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> >>>  static inline struct uclamp_se
> >>>  uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> >>>  {
> >>> -	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
> >>> -	struct uclamp_se uc_max = uclamp_default[clamp_id];
> >>> +	struct uclamp_se uc_req, uc_max;
> >>> +
> >>> +	/*
> >>> +	 * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
> >>> +	 */
> >>> +	uclamp_sync_util_min_rt_default(p);
> >>> +
> >>> +	uc_req = uclamp_tg_restrict(p, clamp_id);
> >>> +	uc_max = uclamp_default[clamp_id];
> >>
> >> We are calling uclamp_sync_util_min_rt_default() unnecessarily for
> >> clamp_id == UCLAMP_MAX case. Would it be better to have a separate
> > 
> > It was actually intentional to make sure we update the value ASAP. I didn't
> > think it's a lot of overhead. I can further protect with a check to verify
> > whether the value has changed if it seems heavy handed.
> 
> Users of uclamp_eff_value()->uclamp_eff_get() ((like
> rt_task_fits_capacity())) always call both ids.
> 
> So calling uclamp_sync_util_min_rt_default() only for UCLAMP_MIN would
> make sense. It's overhead in the fast path for rt tasks.
> 
> Since changes to sched_util_clamp_min_rt_default will be fairly rare,
> you might even want to consider only doing the uclamp_se_set(...,
> min_rt_default, ...) in case
> 
>   uc_se->value != sysctl_sched_uclamp_util_min_rt_default

Okay will do though I'm not convinced this micro optimization makes any
difference. p->uclamp_req[] is already read, so it should be cache hot.
uclamp_set_se() performs 3 writes on it, so I expect no stalls. Even if it
was a write-through cache, there's usually a write buffer that I think should
be able to hide the 3 writes. Write-through + linux is a bad idea in general.

In contrary, the complex if condition might make branch prediction less
effective, hence this micro optimization might create a negative effect.

So I don't see this a clear win and it would be hard to know without proper
measurement really. It is cheaper sometimes to always execute something than
sprinkle more branches to avoid executing this simple code.

I just realized though that I didn't define the
uclamp_sync_util_min_rt_default() as inline, which I should to avoid a function
call. Compiler *should* be smart though :p

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-05-01 11:03 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-28 16:41 [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Qais Yousef
2020-04-28 16:41 ` [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs Qais Yousef
2020-04-28 17:43   ` Randy Dunlap
2020-04-29  9:54     ` Qais Yousef
     [not found]   ` <BL0PR14MB377949FBF2B798EEC425EE719AAA0@BL0PR14MB3779.namprd14.prod.outlook.com>
2020-04-30 10:03     ` Qais Yousef
2020-04-29 11:32 ` [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Pavan Kondeti
2020-04-29 12:30   ` Qais Yousef
2020-04-29 15:21     ` Pavan Kondeti
2020-04-29 15:40       ` Qais Yousef
2020-04-30 18:21     ` Dietmar Eggemann
2020-05-01 11:03       ` Qais Yousef

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).