* [PATCH 0/2] sched: Optionally skip uclamp logic in fast path
@ 2020-06-18 19:55 Qais Yousef
  2020-06-18 19:55 ` [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq Qais Yousef
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Qais Yousef @ 2020-06-18 19:55 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Qais Yousef, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
	Chris Redpath, Lukasz Luba, linux-kernel

This series attempts to address the report that uclamp logic can sometimes be
expensive, showing up as a regression in netperf UDP_STREAM under certain
conditions.

The first patch fixes how struct uclamp_rq is initialized, which is required
by the 2nd patch that contains the real 'fix'.

Worth noting that the root cause of the overhead is believed to be system
specific, or related to certain potential code/data layout issues that lead
to worse I/D $ (instruction/data cache) performance.

Different systems exhibited different behaviors, and the regression did
disappear in certain kernel versions while attempting to reproduce it.

More info can be found here:

https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/

Having the static key seemed the best way to ensure the effect of uclamp is
minimized for kernels that compile it in but don't have a userspace that uses
it. This allows distros to distribute uclamp-capable kernels by default
without having to compromise on performance for the systems that could be
affected.
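
For reference, this relies on the generic static key pattern below (a minimal
sketch of the mechanism only, with made-up names, not the actual scheduler
code):

```
#include <linux/jump_label.h>

/* Defaults to true, i.e. the feature is unused until somebody opts in. */
DEFINE_STATIC_KEY_TRUE(feature_unused);

void hot_path(void)
{
	/*
	 * While the key is in its default state this is a runtime-patched
	 * jump, not a memory load plus test, so the fast path pays almost
	 * nothing for the compiled-in feature.
	 */
	if (static_branch_likely(&feature_unused))
		return;

	/* ... feature logic, only reached after someone opted in ... */
}

void opt_in(void)
{
	/* Flipping the key rewrites all the branch sites at runtime. */
	if (static_branch_unlikely(&feature_unused))
		static_branch_disable(&feature_unused);
}
```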

Thanks

--
Qais Yousef

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
CC: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Chris Redpath <chrid.redpath@arm.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: linux-kernel@vger.kernel.org

Qais Yousef (2):
  sched/uclamp: Fix initialization of struct uclamp_rq
  sched/uclamp: Protect uclamp fast path code with static key

 kernel/sched/core.c | 73 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 64 insertions(+), 9 deletions(-)

-- 
2.17.1



* [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq
  2020-06-18 19:55 [PATCH 0/2] sched: Optionally skip uclamp logic in fast path Qais Yousef
@ 2020-06-18 19:55 ` Qais Yousef
  2020-06-19 10:36   ` Valentin Schneider
  2020-06-19 17:30   ` Peter Zijlstra
  2020-06-18 19:55 ` [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key Qais Yousef
  2020-06-22  9:06 ` [PATCH 0/2] sched: Optionally skip uclamp logic in fast path Lukasz Luba
  2 siblings, 2 replies; 26+ messages in thread
From: Qais Yousef @ 2020-06-18 19:55 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Qais Yousef, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
	Chris Redpath, Lukasz Luba, linux-kernel

struct uclamp_rq was zeroed out entirely on the assumption that the first
call to uclamp_rq_inc() would initialize it correctly in accordance with the
default settings.

But when the next patch introduces a static key to skip
uclamp_rq_{inc,dec}() until userspace opts in to use uclamp, schedutil
will fail to perform any frequency changes because
rq->uclamp[UCLAMP_MAX].value is zeroed at init and stays as such, which
means all rqs are capped to 0 by default.

Fix it by making sure we do proper initialization at init without
relying on uclamp_rq_inc() doing it later.
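
In other words, the intended state after init (a sketch only; uclamp_none()
returns 0 for UCLAMP_MIN and SCHED_CAPACITY_SCALE for UCLAMP_MAX) is:

```
	rq->uclamp[UCLAMP_MIN].value = 0;			/* no boosting */
	rq->uclamp[UCLAMP_MAX].value = SCHED_CAPACITY_SCALE;	/* 1024, no capping */
```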

Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
CC: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Chris Redpath <chrid.redpath@arm.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/core.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a43c84c27c6f..e19d2b915406 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1248,6 +1248,20 @@ static void uclamp_fork(struct task_struct *p)
 	}
 }
 
+static void __init init_uclamp_rq(struct rq *rq)
+{
+	enum uclamp_id clamp_id;
+	struct uclamp_rq *uc_rq = rq->uclamp;
+
+	for_each_clamp_id(clamp_id) {
+		memset(uc_rq[clamp_id].bucket,
+		       0,
+		       sizeof(struct uclamp_bucket)*UCLAMP_BUCKETS);
+
+		uc_rq[clamp_id].value = uclamp_none(clamp_id);
+	}
+}
+
 static void __init init_uclamp(void)
 {
 	struct uclamp_se uc_max = {};
@@ -1257,8 +1271,7 @@ static void __init init_uclamp(void)
 	mutex_init(&uclamp_mutex);
 
 	for_each_possible_cpu(cpu) {
-		memset(&cpu_rq(cpu)->uclamp, 0,
-				sizeof(struct uclamp_rq)*UCLAMP_CNT);
+		init_uclamp_rq(cpu_rq(cpu));
 		cpu_rq(cpu)->uclamp_flags = 0;
 	}
 
-- 
2.17.1



* [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-18 19:55 [PATCH 0/2] sched: Optionally skip uclamp logic in fast path Qais Yousef
  2020-06-18 19:55 ` [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq Qais Yousef
@ 2020-06-18 19:55 ` Qais Yousef
  2020-06-19 10:36   ` Valentin Schneider
                     ` (2 more replies)
  2020-06-22  9:06 ` [PATCH 0/2] sched: Optionally skip uclamp logic in fast path Lukasz Luba
  2 siblings, 3 replies; 26+ messages in thread
From: Qais Yousef @ 2020-06-18 19:55 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Qais Yousef, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
	Chris Redpath, Lukasz Luba, linux-kernel

There is a report that when uclamp is enabled, a netperf UDP test
regresses compared to a kernel compiled without uclamp.

https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/

While investigating the root cause, there was no sign that the uclamp
code is doing anything particularly expensive, but it could suffer from bad
cache behavior under certain circumstances that are yet to be
understood.

https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/

To reduce the pressure on the fast path anyway, add a static key that is
by default will skip executing uclamp logic in the
enqueue/dequeue_task() fast path in the until it's needed.

As soon as the user starts using util clamp by:

	1. Changing uclamp value of a task with sched_setattr()
	2. Modifying the default sysctl_sched_util_clamp_{min, max}
	3. Modifying the default cpu.uclamp.{min, max} value in cgroup

We flip the static key now that the user has opted to use util clamp,
effectively re-introducing uclamp logic in the enqueue/dequeue_task()
fast path. It stays on from that point forward until the next reboot.

This should help minimize the effect of util clamp on workloads that
don't need it but still allow distros to ship their kernels with uclamp
compiled in by default.

SCHED_WARN_ON() in uclamp_rq_dec_id() was removed since now we can end
up with an unbalanced call to uclamp_rq_dec_id() if we flip the key while
a task is running in the rq. Since we know it is harmless, we just
quietly return if we attempt a uclamp_rq_dec_id() when
rq->uclamp[].bucket[].tasks is 0.
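
A hypothetical timeline of the now-harmless unbalanced case:

```
	enqueue_task(p)    // key still in 'unused' state: uclamp_rq_inc() skipped
	sched_setattr(...) // first uclamp user shows up: the key gets disabled
	dequeue_task(p)    // uclamp_rq_dec() runs this time, but p was never
	                   // accounted, so bucket->tasks is already 0
```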

The following results demonstrate how this helps on a 2-socket Xeon E5
2x10-cores system.

                                   nouclamp                 uclamp      uclamp-static-key
Hmean     send-64         162.43 (   0.00%)      157.84 *  -2.82%*      163.39 *   0.59%*
Hmean     send-128        324.71 (   0.00%)      314.78 *  -3.06%*      326.18 *   0.45%*
Hmean     send-256        641.55 (   0.00%)      628.67 *  -2.01%*      648.12 *   1.02%*
Hmean     send-1024      2525.28 (   0.00%)     2448.26 *  -3.05%*     2543.73 *   0.73%*
Hmean     send-2048      4836.14 (   0.00%)     4712.08 *  -2.57%*     4867.69 *   0.65%*
Hmean     send-3312      7540.83 (   0.00%)     7425.45 *  -1.53%*     7621.06 *   1.06%*
Hmean     send-4096      9124.53 (   0.00%)     8948.82 *  -1.93%*     9276.25 *   1.66%*
Hmean     send-8192     15589.67 (   0.00%)    15486.35 *  -0.66%*    15819.98 *   1.48%*
Hmean     send-16384    26386.47 (   0.00%)    25752.25 *  -2.40%*    26773.74 *   1.47%*

Reported-by: Mel Gorman <mgorman@suse.de>
Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
CC: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Chris Redpath <chrid.redpath@arm.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: linux-kernel@vger.kernel.org
---

This takes a different approach from PSI, which introduced a config option

```
      CONFIG_PSI_DEFAULT_DISABLED

        Require boot parameter to enable pressure stall information
        tracking (NEW)

      boot param psi
```

via commit e0c274472d5d "psi: make disabling/enabling easier for vendor kernels"

uclamp has clearer points of entry when userspace would like to use it, so we
can automatically flip the switch if the kernel is running with a userspace
that wants to use util clamp, without any extra userspace-visible switches.

I wanted to make this dependent on schedutil being the governor too, but
besides the complexity, uclamp is used for capacity awareness. We could
certainly construct a more complex condition, but I'm not sure it's worth it.
Open to hearing more opinions and points of view on this :)
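
For illustration, opting in via entry point 1 could look like this from the
userspace side (a hypothetical sketch; glibc has no sched_setattr() wrapper,
so the struct layout and flag values are copied from the uapi headers):

```
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copy of the uapi layout (include/uapi/linux/sched/types.h) */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

#define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
#define SCHED_FLAG_UTIL_CLAMP_MAX	0x40

int main(void)
{
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_flags	= SCHED_FLAG_UTIL_CLAMP_MIN |
				  SCHED_FLAG_UTIL_CLAMP_MAX,
		.sched_util_min	= 0,
		.sched_util_max	= 512,	/* cap this task to ~half capacity */
	};

	/* The first uclamp request in the system disables the static key. */
	return syscall(SYS_sched_setattr, 0, &attr, 0);
}
```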


 kernel/sched/core.c | 56 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 49 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e19d2b915406..0824e1bfb484 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -793,6 +793,25 @@ unsigned int sysctl_sched_uclamp_util_max = SCHED_CAPACITY_SCALE;
 /* All clamps are required to be less or equal than these values */
 static struct uclamp_se uclamp_default[UCLAMP_CNT];
 
+/*
+ * This static key is used to reduce the uclamp overhead in the fast path. It
+ * only disables the call to uclamp_rq_{inc, dec}() in enqueue/dequeue_task().
+ *
+ * This allows users to continue to enable uclamp in their kernel config with
+ * minimum uclamp overhead in the fast path.
+ *
+ * As soon as userspace modifies any of the uclamp knobs, the static key is
+ * disabled, since we have actual users that make use of uclamp
+ * functionality.
+ *
+ * The knobs that would disable this static key are:
+ *
+ *   * A task modifying its uclamp value with sched_setattr().
+ *   * An admin modifying the sysctl_sched_uclamp_{min, max} via procfs.
+ *   * An admin modifying the cgroup cpu.uclamp.{min, max}
+ */
+DEFINE_STATIC_KEY_TRUE(sched_uclamp_unused);
+
 /* Integer rounded range for each bucket */
 #define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
 
@@ -993,9 +1012,16 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 	lockdep_assert_held(&rq->lock);
 
 	bucket = &uc_rq->bucket[uc_se->bucket_id];
-	SCHED_WARN_ON(!bucket->tasks);
-	if (likely(bucket->tasks))
-		bucket->tasks--;
+
+	/*
+	 * This could happen if sched_uclamp_unused was disabled while the
+	 * current task was running, hence we could end up with unbalanced call
+	 * to uclamp_rq_dec_id().
+	 */
+	if (unlikely(!bucket->tasks))
+		return;
+
+	bucket->tasks--;
 	uc_se->active = false;
 
 	/*
@@ -1031,6 +1057,9 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
 {
 	enum uclamp_id clamp_id;
 
+	if (static_branch_likely(&sched_uclamp_unused))
+		return;
+
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;
 
@@ -1046,6 +1075,9 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
 {
 	enum uclamp_id clamp_id;
 
+	if (static_branch_likely(&sched_uclamp_unused))
+		return;
+
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;
 
@@ -1155,9 +1187,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
 		update_root_tg = true;
 	}
 
-	if (update_root_tg)
+	if (update_root_tg) {
 		uclamp_update_root_tg();
 
+		if (static_branch_unlikely(&sched_uclamp_unused))
+			static_branch_disable(&sched_uclamp_unused);
+	}
+
 	/*
 	 * We update all RUNNABLE tasks only when task groups are in use.
 	 * Otherwise, keep it simple and do just a lazy update at each next
@@ -1221,6 +1257,9 @@ static void __setscheduler_uclamp(struct task_struct *p,
 	if (likely(!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)))
 		return;
 
+	if (static_branch_unlikely(&sched_uclamp_unused))
+		static_branch_disable(&sched_uclamp_unused);
+
 	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {
 		uclamp_se_set(&p->uclamp_req[UCLAMP_MIN],
 			      attr->sched_util_min, true);
@@ -1260,6 +1299,8 @@ static void __init init_uclamp_rq(struct rq *rq)
 
 		uc_rq[clamp_id].value = uclamp_none(clamp_id);
 	}
+
+	rq->uclamp_flags = 0;
 }
 
 static void __init init_uclamp(void)
@@ -1270,10 +1311,8 @@ static void __init init_uclamp(void)
 
 	mutex_init(&uclamp_mutex);
 
-	for_each_possible_cpu(cpu) {
+	for_each_possible_cpu(cpu)
 		init_uclamp_rq(cpu_rq(cpu));
-		cpu_rq(cpu)->uclamp_flags = 0;
-	}
 
 	for_each_clamp_id(clamp_id) {
 		uclamp_se_set(&init_task.uclamp_req[clamp_id],
@@ -7315,6 +7354,9 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
 	if (req.ret)
 		return req.ret;
 
+	if (static_branch_unlikely(&sched_uclamp_unused))
+		static_branch_disable(&sched_uclamp_unused);
+
 	mutex_lock(&uclamp_mutex);
 	rcu_read_lock();
 
-- 
2.17.1



* Re: [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq
  2020-06-18 19:55 ` [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq Qais Yousef
@ 2020-06-19 10:36   ` Valentin Schneider
  2020-06-19 17:30   ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2020-06-19 10:36 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel


On 18/06/20 20:55, Qais Yousef wrote:
> struct uclamp_rq was zeroed out entirely on the assumption that the first
> call to uclamp_rq_inc() would initialize it correctly in accordance with the
> default settings.
>
> But when the next patch introduces a static key to skip
> uclamp_rq_{inc,dec}() until userspace opts in to use uclamp, schedutil
> will fail to perform any frequency changes because
> rq->uclamp[UCLAMP_MAX].value is zeroed at init and stays as such, which
> means all rqs are capped to 0 by default.
>
> Fix it by making sure we do proper initialization at init without
> relying on uclamp_rq_inc() doing it later.
>
> Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> CC: Patrick Bellasi <patrick.bellasi@matbug.net>
> Cc: Chris Redpath <chrid.redpath@arm.com>
> Cc: Lukasz Luba <lukasz.luba@arm.com>
> Cc: linux-kernel@vger.kernel.org

Small nit below, otherwise:
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

> ---
>  kernel/sched/core.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a43c84c27c6f..e19d2b915406 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1248,6 +1248,20 @@ static void uclamp_fork(struct task_struct *p)
>       }
>  }
>
> +static void __init init_uclamp_rq(struct rq *rq)
> +{
> +	enum uclamp_id clamp_id;
> +	struct uclamp_rq *uc_rq = rq->uclamp;
> +
> +	for_each_clamp_id(clamp_id) {
> +		memset(uc_rq[clamp_id].bucket,
> +		       0,
> +		       sizeof(struct uclamp_bucket)*UCLAMP_BUCKETS);
> +
> +		uc_rq[clamp_id].value = uclamp_none(clamp_id);
> +	}
> +}
> +
>  static void __init init_uclamp(void)
>  {
>       struct uclamp_se uc_max = {};
> @@ -1257,8 +1271,7 @@ static void __init init_uclamp(void)
>       mutex_init(&uclamp_mutex);
>
>       for_each_possible_cpu(cpu) {
> -		memset(&cpu_rq(cpu)->uclamp, 0,
> -				sizeof(struct uclamp_rq)*UCLAMP_CNT);
> +		init_uclamp_rq(cpu_rq(cpu));
>               cpu_rq(cpu)->uclamp_flags = 0;

That flags assignment ought to be squashed in the init function as well -
and actually you do that in patch 2, so I suppose that's a rebase accident.

>       }


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-18 19:55 ` [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key Qais Yousef
@ 2020-06-19 10:36   ` Valentin Schneider
  2020-06-19 11:57     ` Mel Gorman
  2020-06-19 12:51     ` Qais Yousef
  2020-06-19 10:39   ` Valentin Schneider
  2020-06-19 17:45   ` Peter Zijlstra
  2 siblings, 2 replies; 26+ messages in thread
From: Valentin Schneider @ 2020-06-19 10:36 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel


On 18/06/20 20:55, Qais Yousef wrote:
> There is a report that when uclamp is enabled, a netperf UDP test
> regresses compared to a kernel compiled without uclamp.
>
> https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/
>

ISTR the perennial form for those is: https://lkml.kernel.org/r/<message-id>

> While investigating the root cause, there was no sign that the uclamp
> code is doing anything particularly expensive, but it could suffer from bad
> cache behavior under certain circumstances that are yet to be
> understood.
>
> https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/

Ditto on the URL

>
> To reduce the pressure on the fast path anyway, add a static key that is
                                                                        ^^
                                                                     s/is//

> by default will skip executing uclamp logic in the
> enqueue/dequeue_task() fast path in the until it's needed.
                                   ^^^^^^
                                 s/in the//

> As soon as the user starts using util clamp by:
>
>       1. Changing uclamp value of a task with sched_setattr()
>       2. Modifying the default sysctl_sched_util_clamp_{min, max}
>       3. Modifying the default cpu.uclamp.{min, max} value in cgroup
>
> We flip the static key now that the user has opted to use util clamp,
> effectively re-introducing uclamp logic in the enqueue/dequeue_task()
> fast path. It stays on from that point forward until the next reboot.
>
> This should help minimize the effect of util clamp on workloads that
> don't need it but still allow distros to ship their kernels with uclamp
> compiled in by default.
>
> SCHED_WARN_ON() in uclamp_rq_dec_id() was removed since now we can end
> up with an unbalanced call to uclamp_rq_dec_id() if we flip the key while
> a task is running in the rq. Since we know it is harmless, we just
> quietly return if we attempt a uclamp_rq_dec_id() when
> rq->uclamp[].bucket[].tasks is 0.
>

I have an extra comment about that down in the diff.

Also, I think it would be worth mentioning in the changelog why we use the
same static key with different likelihoods - unlikely in infrequent paths,
and likely in the eq/dq hotpath.
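
To spell it out, the same key ends up checked with opposite hints, i.e.
(lifted from the patch):

```
	/* eq/dq hot path: lay out the code for the default (unused) state */
	if (static_branch_likely(&sched_uclamp_unused))
		return;

	/* opt-in slow paths: flipping the key is the rare, one-off event */
	if (static_branch_unlikely(&sched_uclamp_unused))
		static_branch_disable(&sched_uclamp_unused);
```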

> The following results demonstrate how this helps on a 2-socket Xeon E5
> 2x10-cores system.
>
>                                    nouclamp                 uclamp      uclamp-static-key
> Hmean     send-64         162.43 (   0.00%)      157.84 *  -2.82%*      163.39 *   0.59%*
> Hmean     send-128        324.71 (   0.00%)      314.78 *  -3.06%*      326.18 *   0.45%*
> Hmean     send-256        641.55 (   0.00%)      628.67 *  -2.01%*      648.12 *   1.02%*
> Hmean     send-1024      2525.28 (   0.00%)     2448.26 *  -3.05%*     2543.73 *   0.73%*
> Hmean     send-2048      4836.14 (   0.00%)     4712.08 *  -2.57%*     4867.69 *   0.65%*
> Hmean     send-3312      7540.83 (   0.00%)     7425.45 *  -1.53%*     7621.06 *   1.06%*
> Hmean     send-4096      9124.53 (   0.00%)     8948.82 *  -1.93%*     9276.25 *   1.66%*
> Hmean     send-8192     15589.67 (   0.00%)    15486.35 *  -0.66%*    15819.98 *   1.48%*
> Hmean     send-16384    26386.47 (   0.00%)    25752.25 *  -2.40%*    26773.74 *   1.47%*
>

Am I reading this correctly in that compiling in uclamp but having the
static key enabled gives a slight improvement compared to not compiling in
uclamp? I suppose the important bit is that we're not seeing regressions
anymore, but still.

> Reported-by: Mel Gorman <mgorman@suse.de>
> Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> CC: Patrick Bellasi <patrick.bellasi@matbug.net>
> Cc: Chris Redpath <chrid.redpath@arm.com>
> Cc: Lukasz Luba <lukasz.luba@arm.com>
> Cc: linux-kernel@vger.kernel.org
>
>  kernel/sched/core.c | 56 +++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 49 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e19d2b915406..0824e1bfb484 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -993,9 +1012,16 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
>       lockdep_assert_held(&rq->lock);
>
>       bucket = &uc_rq->bucket[uc_se->bucket_id];
> -	SCHED_WARN_ON(!bucket->tasks);
> -	if (likely(bucket->tasks))
> -		bucket->tasks--;
> +
> +	/*
> +	 * This could happen if sched_uclamp_unused was disabled while the
> +	 * current task was running, hence we could end up with unbalanced call
> +	 * to uclamp_rq_dec_id().
> +	 */
> +	if (unlikely(!bucket->tasks))
> +		return;

I'm slightly worried about silent returns for cases like these, can we try
to cook something up to preserve the previous SCHED_WARN_ON()? Say,
something like the horrendous below - alternatively it might be feasible
with some clever p->on_rq flag.

---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b62e6aaf28f0..09a7891eb481 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -695,6 +695,9 @@ struct task_struct {
        struct uclamp_se		uclamp_req[UCLAMP_CNT];
        /* Effective clamp values used for a scheduling entity */
        struct uclamp_se		uclamp[UCLAMP_CNT];
+#ifdef CONFIG_SCHED_DEBUG
+	int                             uclamp_unused_enqueue;
+#endif
 #endif

 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a712dcb682b..2a723e9d5219 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1018,8 +1018,10 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
         * current task was running, hence we could end up with unbalanced call
         * to uclamp_rq_dec_id().
         */
-	if (unlikely(!bucket->tasks))
+	if (unlikely(!bucket->tasks)) {
+		SCHED_WARN_ON(!p->uclamp_unused_enqueue);
                return;
+	}

        bucket->tasks--;
        uc_se->active = false;
@@ -1049,8 +1051,16 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
 {
        enum uclamp_id clamp_id;

-	if (static_branch_likely(&sched_uclamp_unused))
+#ifdef CONFIG_SCHED_DEBUG
+	p->uclamp_unused_enqueue = 0;
+#endif
+
+	if (static_branch_likely(&sched_uclamp_unused)) {
+#ifdef CONFIG_SCHED_DEBUG
+		p->uclamp_unused_enqueue = 1;
+#endif
                return;
+	}

        if (unlikely(!p->sched_class->uclamp_enabled))
                return;
@@ -1075,6 +1085,10 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)

        for_each_clamp_id(clamp_id)
                uclamp_rq_dec_id(rq, p, clamp_id);
+
+#ifdef CONFIG_SCHED_DEBUG
+	p->uclamp_unused_enqueue = 0;
+#endif
 }

 static inline void
---


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-18 19:55 ` [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key Qais Yousef
  2020-06-19 10:36   ` Valentin Schneider
@ 2020-06-19 10:39   ` Valentin Schneider
  2020-06-19 17:45   ` Peter Zijlstra
  2 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2020-06-19 10:39 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel


On 18/06/20 20:55, Qais Yousef wrote:
> Reported-by: Mel Gorman <mgorman@suse.de>
> Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> CC: Patrick Bellasi <patrick.bellasi@matbug.net>
> Cc: Chris Redpath <chrid.redpath@arm.com>

Seeing as I just got complaints from the mail server: that address is misspelt.

> Cc: Lukasz Luba <lukasz.luba@arm.com>
> Cc: linux-kernel@vger.kernel.org


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 10:36   ` Valentin Schneider
@ 2020-06-19 11:57     ` Mel Gorman
  2020-06-19 12:17       ` Valentin Schneider
                         ` (2 more replies)
  2020-06-19 12:51     ` Qais Yousef
  1 sibling, 3 replies; 26+ messages in thread
From: Mel Gorman @ 2020-06-19 11:57 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Qais Yousef, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

On Fri, Jun 19, 2020 at 11:36:46AM +0100, Valentin Schneider wrote:
> >                                    nouclamp                 uclamp      uclamp-static-key
> > Hmean     send-64         162.43 (   0.00%)      157.84 *  -2.82%*      163.39 *   0.59%*
> > Hmean     send-128        324.71 (   0.00%)      314.78 *  -3.06%*      326.18 *   0.45%*
> > Hmean     send-256        641.55 (   0.00%)      628.67 *  -2.01%*      648.12 *   1.02%*
> > Hmean     send-1024      2525.28 (   0.00%)     2448.26 *  -3.05%*     2543.73 *   0.73%*
> > Hmean     send-2048      4836.14 (   0.00%)     4712.08 *  -2.57%*     4867.69 *   0.65%*
> > Hmean     send-3312      7540.83 (   0.00%)     7425.45 *  -1.53%*     7621.06 *   1.06%*
> > Hmean     send-4096      9124.53 (   0.00%)     8948.82 *  -1.93%*     9276.25 *   1.66%*
> > Hmean     send-8192     15589.67 (   0.00%)    15486.35 *  -0.66%*    15819.98 *   1.48%*
> > Hmean     send-16384    26386.47 (   0.00%)    25752.25 *  -2.40%*    26773.74 *   1.47%*
> >
> 
> Am I reading this correctly in that compiling in uclamp but having the
> static key enabled gives a slight improvement compared to not compiling in
> uclamp? I suppose the important bit is that we're not seeing regressions
> anymore, but still.
> 

I haven't reviewed the series in depth because from your review, another
version is likely in the works. However, it is not that unusual to
see small fluctuations like this that are counter-intuitive. The report
indicates the difference is likely outside of the noise with * around the
percentage difference instead of () but it could be small boot-to-boot
variance, differences in code layout, slight differences in slab usage
patterns etc. The definitive evidence that uclamp overhead is not there
is whether the uclamp functions show up in annotated profiles or not.
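
For example, something along these lines (a rough sketch only; the exact
perf invocation and workload flags will differ per setup):

```
perf record -a -g -- netperf -t UDP_STREAM -l 30
perf report                   # any uclamp-related cost in the hot symbols?
perf annotate activate_task   # the uclamp bits are inlined around here
```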

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 11:57     ` Mel Gorman
@ 2020-06-19 12:17       ` Valentin Schneider
  2020-06-19 12:55       ` Qais Yousef
  2020-06-19 14:51       ` Qais Yousef
  2 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2020-06-19 12:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Qais Yousef, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel


On 19/06/20 12:57, Mel Gorman wrote:
> On Fri, Jun 19, 2020 at 11:36:46AM +0100, Valentin Schneider wrote:
>> >                                    nouclamp                 uclamp      uclamp-static-key
>> > Hmean     send-64         162.43 (   0.00%)      157.84 *  -2.82%*      163.39 *   0.59%*
>> > Hmean     send-128        324.71 (   0.00%)      314.78 *  -3.06%*      326.18 *   0.45%*
>> > Hmean     send-256        641.55 (   0.00%)      628.67 *  -2.01%*      648.12 *   1.02%*
>> > Hmean     send-1024      2525.28 (   0.00%)     2448.26 *  -3.05%*     2543.73 *   0.73%*
>> > Hmean     send-2048      4836.14 (   0.00%)     4712.08 *  -2.57%*     4867.69 *   0.65%*
>> > Hmean     send-3312      7540.83 (   0.00%)     7425.45 *  -1.53%*     7621.06 *   1.06%*
>> > Hmean     send-4096      9124.53 (   0.00%)     8948.82 *  -1.93%*     9276.25 *   1.66%*
>> > Hmean     send-8192     15589.67 (   0.00%)    15486.35 *  -0.66%*    15819.98 *   1.48%*
>> > Hmean     send-16384    26386.47 (   0.00%)    25752.25 *  -2.40%*    26773.74 *   1.47%*
>> >
>>
>> Am I reading this correctly in that compiling in uclamp but having the
>> static key enabled gives a slight improvement compared to not compiling in
>> uclamp? I suppose the important bit is that we're not seeing regressions
>> anymore, but still.
>>
>
> I haven't reviewed the series in depth because from your review, another
> version is likely in the works.

I don't want Qais to hate me here - I think you could start the performance
testing on this version if you feel like it, given my comments were mostly
on changelog / debug options - the core of that patch shouldn't change
much.

> However, it is not that unusual to
> see small fluctuations like this that are counter-intuitive. The report
> indicates the difference is likely outside of the noise with * around the
> percentage difference instead of () but it could be small boot-to-boot
> variance, differences in code layout, slight differences in slab usage
> patterns etc. The definitive evidence that uclamp overhead is not there
> is whether the uclamp functions show up in annotated profiles or not.

I see, thanks! I suppose if we have access to individual samples we can
also run some statistical tests / stare at some boxplots to see how it
compares.


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 10:36   ` Valentin Schneider
  2020-06-19 11:57     ` Mel Gorman
@ 2020-06-19 12:51     ` Qais Yousef
  2020-06-19 13:23       ` Steven Rostedt
  2020-06-19 13:25       ` Valentin Schneider
  1 sibling, 2 replies; 26+ messages in thread
From: Qais Yousef @ 2020-06-19 12:51 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

On 06/19/20 11:36, Valentin Schneider wrote:
> 
> On 18/06/20 20:55, Qais Yousef wrote:
> > There is a report that when uclamp is enabled, a netperf UDP test
> > regresses compared to a kernel compiled without uclamp.
> >
> > https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/
> >
> 
> ISTR the perennial form for those is: https://lkml.kernel.org/r/<message-id>

The link is a correct permalink from lore and contains the message-id as Peter
likes, and he has accepted this form before.

If you look closely you'll see that what you suggest is just moving 'lkml' to
replace lore in the dns name and put an /r/. I don't see a need to enforce one
form over the other as the one I used is much easier to get.

If Peter really insists I'll be happy to change.

[...]

> > +	 * This could happen if sched_uclamp_unused was disabled while the
> > +	 * current task was running, hence we could end up with unbalanced call
> > +	 * to uclamp_rq_dec_id().
> > +	 */
> > +	if (unlikely(!bucket->tasks))
> > +		return;
> 
> I'm slightly worried about silent returns for cases like these, can we try
> to cook something up to preserve the previous SCHED_WARN_ON()? Say,
> something like the horrendous below - alternatively it might be feasible
> with some clever p->on_rq flag.

I am really against extra churn and debug code to detect an impossible case
that is not really tricky for reviewers to discern. Outside of enqueue/dequeue
path, it's only used in uclamp_update_active(). It is quite easy to see that
it's impossible, except for the legit case now when we have a static key
changing when a task is running.

I am strongly against extra debug code just to be safe. It ends up with
confusion down the line and extra complexity, and since this is the hot path,
potentially extra variables to mess with cache behavior.

Thanks

--
Qais Yousef

> 
> ---
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b62e6aaf28f0..09a7891eb481 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -695,6 +695,9 @@ struct task_struct {
>         struct uclamp_se		uclamp_req[UCLAMP_CNT];
>         /* Effective clamp values used for a scheduling entity */
>         struct uclamp_se		uclamp[UCLAMP_CNT];
> +#ifdef CONFIG_SCHED_DEBUG
> +	int                             uclamp_unused_enqueue;
> +#endif
>  #endif
> 
>  #ifdef CONFIG_PREEMPT_NOTIFIERS
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2a712dcb682b..2a723e9d5219 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1018,8 +1018,10 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
>          * current task was running, hence we could end up with unbalanced call
>          * to uclamp_rq_dec_id().
>          */
> -	if (unlikely(!bucket->tasks))
> +	if (unlikely(!bucket->tasks)) {
> +		SCHED_WARN_ON(!p->uclamp_unused_enqueue);
>                 return;
> +	}
> 
>         bucket->tasks--;
>         uc_se->active = false;
> @@ -1049,8 +1051,16 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
>  {
>         enum uclamp_id clamp_id;
> 
> -	if (static_branch_likely(&sched_uclamp_unused))
> +#ifdef CONFIG_SCHED_DEBUG
> +	p->uclamp_unused_enqueue = 0;
> +#endif
> +
> +	if (static_branch_likely(&sched_uclamp_unused)) {
> +#ifdef CONFIG_SCHED_DEBUG
> +		p->uclamp_unused_enqueue = 1;
> +#endif
>                 return;
> +	}
> 
>         if (unlikely(!p->sched_class->uclamp_enabled))
>                 return;
> @@ -1075,6 +1085,10 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
> 
>         for_each_clamp_id(clamp_id)
>                 uclamp_rq_dec_id(rq, p, clamp_id);
> +
> +#ifdef CONFIG_SCHED_DEBUG
> +	p->uclamp_unused_enqueue = 0;
> +#endif
>  }
> 
>  static inline void
> ---


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 11:57     ` Mel Gorman
  2020-06-19 12:17       ` Valentin Schneider
@ 2020-06-19 12:55       ` Qais Yousef
  2020-06-19 14:51       ` Qais Yousef
  2 siblings, 0 replies; 26+ messages in thread
From: Qais Yousef @ 2020-06-19 12:55 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Valentin Schneider, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

On 06/19/20 12:57, Mel Gorman wrote:
> On Fri, Jun 19, 2020 at 11:36:46AM +0100, Valentin Schneider wrote:
> > >                                    nouclamp                 uclamp      uclamp-static-key
> > > Hmean     send-64         162.43 (   0.00%)      157.84 *  -2.82%*      163.39 *   0.59%*
> > > Hmean     send-128        324.71 (   0.00%)      314.78 *  -3.06%*      326.18 *   0.45%*
> > > Hmean     send-256        641.55 (   0.00%)      628.67 *  -2.01%*      648.12 *   1.02%*
> > > Hmean     send-1024      2525.28 (   0.00%)     2448.26 *  -3.05%*     2543.73 *   0.73%*
> > > Hmean     send-2048      4836.14 (   0.00%)     4712.08 *  -2.57%*     4867.69 *   0.65%*
> > > Hmean     send-3312      7540.83 (   0.00%)     7425.45 *  -1.53%*     7621.06 *   1.06%*
> > > Hmean     send-4096      9124.53 (   0.00%)     8948.82 *  -1.93%*     9276.25 *   1.66%*
> > > Hmean     send-8192     15589.67 (   0.00%)    15486.35 *  -0.66%*    15819.98 *   1.48%*
> > > Hmean     send-16384    26386.47 (   0.00%)    25752.25 *  -2.40%*    26773.74 *   1.47%*
> > >
> > 
> > Am I reading this correctly in that compiling in uclamp but having the
> > static key enabled gives a slight improvement compared to not compiling in
> > uclamp? I suppose the important bit is that we're not seeing regressions
> > anymore, but still.
> > 
> 
> I haven't reviewed the series in depth because from your review, another
> version is likely in the works. However, it is not that unusual to
> see small fluctuations like this that are counter-intuitive. The report
> indicates the difference is likely outside of the noise with * around the
> percentage difference instead of () but it could be small boot-to-boot
> variance, differences in code layout, slight differences in slab usage
> patterns etc. The definitive evidence that uclamp overhead is not there
> is whether the uclamp functions show up in annotated profiles or not.

I certainly have seen weird variations in the numbers. If you've seen my
numbers in the links below, I was baffled when I moved to 5.7-rc2 and couldn't
reproduce it again.

https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/

I think the hot path can be sensitive to code/data layout variations, and now
that uclamp has added more variables to be accessed, this sensitivity could
manifest in more ways, methinks.

I am re-running the test now with perf record, but I'm not sure I'll be able
to provide the numbers by the end of the day. If it is easy for you to pick
this up, I'd appreciate it if you could kick off a test.

But it's Friday after all.. :-)

Thanks

--
Qais Yousef


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 12:51     ` Qais Yousef
@ 2020-06-19 13:23       ` Steven Rostedt
  2020-06-19 13:25       ` Valentin Schneider
  1 sibling, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2020-06-19 13:23 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Valentin Schneider, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

On Fri, 19 Jun 2020 13:51:48 +0100
Qais Yousef <qais.yousef@arm.com> wrote:

> > On 18/06/20 20:55, Qais Yousef wrote:  
> > > There is a report that when uclamp is enabled, a netperf UDP test
> > > regresses compared to a kernel compiled without uclamp.
> > >
> > > https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/
> > >  
> > 
> > ISTR the perennial form for those is: https://lkml.kernel.org/r/<message-id>  
> 
> The link is a correct permalink from lore and contains the message-id as Peter
> likes, and he has accepted this form before.
> 
> If you look closely you'll see that what you suggest is just moving 'lkml' to
> replace lore in the dns name and put an /r/. I don't see a need to enforce one
> form over the other as the one I used is much easier to get.

The two produce the same result, and I personally have used both. I'm
starting to move over to lore over lkml, as that seems to be becoming
the more popular form.

> 
> If Peter really insists I'll be happy to change.

But I agree, if Peter is insistent on lkml over lore, then it really
doesn't make a difference to switch it. As I said, they are identical
in what they produce.

-- Steve


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 12:51     ` Qais Yousef
  2020-06-19 13:23       ` Steven Rostedt
@ 2020-06-19 13:25       ` Valentin Schneider
  2020-06-19 14:13         ` Qais Yousef
  2020-06-19 19:47         ` Peter Zijlstra
  1 sibling, 2 replies; 26+ messages in thread
From: Valentin Schneider @ 2020-06-19 13:25 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel


On 19/06/20 13:51, Qais Yousef wrote:
> On 06/19/20 11:36, Valentin Schneider wrote:
>>
>> On 18/06/20 20:55, Qais Yousef wrote:
>> > There is a report that when uclamp is enabled, a netperf UDP test
>> > regresses compared to a kernel compiled without uclamp.
>> >
>> > https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/
>> >
>>
>> ISTR the perennial form for those is: https://lkml.kernel.org/r/<message-id>
>
> The link is a correct permalink from lore and contains the message-id as Peter
> likes, and he has accepted this form before.
>

I think the objections I remember were on using lkml.org rather than
lkml.kernel.org. Sorry!

> If you look closely you'll see that what you suggest is just moving 'lkml' to
> replace lore in the dns name and put an /r/. I don't see a need to enforce one
> form over the other as the one I used is much easier to get.
>

My assumption would be that while lore may fade (it hasn't been there for
that long, who knows what will come next), lkml.kernel.org ought to be
perennial. Keyword here being "assumption".

> If Peter really insists I'll be happy to change.
>
> [...]
>
>> > +	 * This could happen if sched_uclamp_unused was disabled while the
>> > +	 * current task was running, hence we could end up with unbalanced call
>> > +	 * to uclamp_rq_dec_id().
>> > +	 */
>> > +	if (unlikely(!bucket->tasks))
>> > +		return;
>>
>> I'm slightly worried about silent returns for cases like these, can we try
>> to cook something up to preserve the previous SCHED_WARN_ON()? Say,
>> something like the horrendous below - alternatively it might be feasible
>> with some clever p->on_rq flag.
>
> I am really against extra churn and debug code to detect an impossible case
> that is not really tricky for reviewers to discern. Outside of enqueue/dequeue
> path, it's only used in uclamp_update_active(). It is quite easy to see that
> it's impossible, except for the legit case now when we have a static key
> changing when a task is running.
>

Provided it isn't too much of a head scratcher (and admittedly what I am
suggesting is borderline here), I believe it is worthwhile to add debug
helpers in what are assumed to be impossible cases - even more so in this
case, seeing as it had been deemed worth checking previously. We've been
proved wrong on the "impossible" nature of some things before.

We have a few of those checks strewn over the scheduler code, so it's not
like we would be starting a new trend.

> I am strongly against extra debug code just to be safe. It ends up with
> confusion down the line and extra complexity, and since this is the hot path,
> potentially extra variables to mess with cache behavior.
>

Hence why I'd put this under CONFIG_SCHED_DEBUG.

> Thanks


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 13:25       ` Valentin Schneider
@ 2020-06-19 14:13         ` Qais Yousef
  2020-06-19 15:17           ` Valentin Schneider
  2020-06-19 19:47         ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2020-06-19 14:13 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

On 06/19/20 14:25, Valentin Schneider wrote:
> 
> On 19/06/20 13:51, Qais Yousef wrote:
> > On 06/19/20 11:36, Valentin Schneider wrote:
> >>
> >> On 18/06/20 20:55, Qais Yousef wrote:
> >> > There is a report that when uclamp is enabled, a netperf UDP test
> >> > regresses compared to a kernel compiled without uclamp.
> >> >
> >> > https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/
> >> >
> >>
> >> ISTR the perennial form for those is: https://lkml.kernel.org/r/<message-id>
> >
> > The link is a correct permalink from lore and contains the message-id as Peter
> > likes, and he has accepted this form before.
> >
> 
> I think the objections I remember were on using lkml.org rather than
> lkml.kernel.org. Sorry!
> 
> > If you look closely you'll see that what you suggest is just moving 'lkml' to
> > replace lore in the dns name and put an /r/. I don't see a need to enforce one
> > form over the other as the one I used is much easier to get.
> >
> 
> My assumption would be that while lore may fade (it hasn't been there for
> that long, who knows what will come next), lkml.kernel.org ought to be
> perennial. Keyword here being "assumption".
> 
> > If Peter really insists I'll be happy to change.
> >
> > [...]
> >
> >> > +	 * This could happen if sched_uclamp_unused was disabled while the
> >> > +	 * current task was running, hence we could end up with unbalanced call
> >> > +	 * to uclamp_rq_dec_id().
> >> > +	 */
> >> > +	if (unlikely(!bucket->tasks))
> >> > +		return;
> >>
> >> I'm slightly worried about silent returns for cases like these, can we try
> >> to cook something up to preserve the previous SCHED_WARN_ON()? Say,
> >> something like the horrendous below - alternatively it might be feasible
> >> with some clever p->on_rq flag.
> >
> > I am really against extra churn and debug code to detect an impossible case
> > that is not really tricky for reviewers to discern. Outside of enqueue/dequeue
> > path, it's only used in uclamp_update_active(). It is quite easy to see that
> > it's impossible, except for the legit case now when we have a static key
> > changing when a task is running.
> >
> 
> Provided it isn't too much of a head scratcher (and admittedly what I am
> suggesting is borderline here), I believe it is worthwhile to add debug
> helpers in what are assumed to be impossible cases - even more so in this
> case, seeing as it had been deemed worth checking previously. We've been
> proved wrong on the "impossible" nature of some things before.
> 
> We have a few of those checks strewn over the scheduler code, so it's not
> like we would be starting a new trend.

I am sorry I am still not bought in. I think the parts you're talking about are
in the lockless part of the scheduler which are really hard to debug as several
cpus could be traversing these data from different code paths. But here this is
just extra churn.

If an imbalance has happened this means either:

	1. enqueue/dequeue_task() is imbalanced itself
	2. uclamp_update_active() calls dec without inc.

If 1 happened we have more reasons to be worried. For 2 the function
takes task_rq_lock() and does dec/inc in an obvious way.
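
For reference, uclamp_update_active() is roughly:

```
static void uclamp_update_active(struct task_struct *p, enum uclamp_id clamp_id)
{
	struct rq_flags rf;
	struct rq *rq;

	/* Serializes against enqueue/dequeue of p */
	rq = task_rq_lock(p, &rf);

	/* Only tasks already refcounted in the rq buckets get dec+inc'ed */
	if (p->uclamp[clamp_id].active) {
		uclamp_rq_dec_id(rq, p, clamp_id);
		uclamp_rq_inc_id(rq, p, clamp_id);
	}

	task_rq_unlock(rq, p, &rf);
}
```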

So I don't see any reason to add new info in task_struct and sprinkle #ifdefs
to protect against something that, as far as I can see, we can reason
correctly about now.

We don't use pr_debug() in scheduler (I guess no computer would have booted
with that on), otherwise that would have been a good candidate for one, yes.
But we can't do that.

Thanks

--
Qais Yousef


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 11:57     ` Mel Gorman
  2020-06-19 12:17       ` Valentin Schneider
  2020-06-19 12:55       ` Qais Yousef
@ 2020-06-19 14:51       ` Qais Yousef
  2 siblings, 0 replies; 26+ messages in thread
From: Qais Yousef @ 2020-06-19 14:51 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Valentin Schneider, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

On 06/19/20 12:57, Mel Gorman wrote:
> On Fri, Jun 19, 2020 at 11:36:46AM +0100, Valentin Schneider wrote:
> > >                                    nouclamp                 uclamp      uclamp-static-key
> > > Hmean     send-64         162.43 (   0.00%)      157.84 *  -2.82%*      163.39 *   0.59%*
> > > Hmean     send-128        324.71 (   0.00%)      314.78 *  -3.06%*      326.18 *   0.45%*
> > > Hmean     send-256        641.55 (   0.00%)      628.67 *  -2.01%*      648.12 *   1.02%*
> > > Hmean     send-1024      2525.28 (   0.00%)     2448.26 *  -3.05%*     2543.73 *   0.73%*
> > > Hmean     send-2048      4836.14 (   0.00%)     4712.08 *  -2.57%*     4867.69 *   0.65%*
> > > Hmean     send-3312      7540.83 (   0.00%)     7425.45 *  -1.53%*     7621.06 *   1.06%*
> > > Hmean     send-4096      9124.53 (   0.00%)     8948.82 *  -1.93%*     9276.25 *   1.66%*
> > > Hmean     send-8192     15589.67 (   0.00%)    15486.35 *  -0.66%*    15819.98 *   1.48%*
> > > Hmean     send-16384    26386.47 (   0.00%)    25752.25 *  -2.40%*    26773.74 *   1.47%*
> > >
> > 
> > Am I reading this correctly in that compiling in uclamp but having the
> > static key enabled gives a slight improvement compared to not compiling in
> > uclamp? I suppose the important bit is that we're not seeing regressions
> > anymore, but still.
> > 
> 
> I haven't reviewed the series in depth because from your review, another
> version is likely in the works. However, it is not that unusual to
> see small fluctuations like this that are counter-intuitive. The report
> indicates the difference is likely outside of the noise with * around the
> percentage difference instead of () but it could be small boot-to-boot
> variance, differences in code layout, slight differences in slab usage
> patterns etc. The definitive evidence that uclamp overhead is not there
> is whether the uclamp functions show up in annotated profiles or not.

The diff between nouclamp and uclamp-static-key (disabling uclamp in fast path)

     8.73%     -1.55%  [kernel.kallsyms]        [k] try_to_wake_up
     0.07%     +0.04%  [kernel.kallsyms]        [k] deactivate_task
     0.13%     -0.02%  [kernel.kallsyms]        [k] activate_task

The diff between nouclamp and uclamp-static-key (uclamp actively used in the
fast path)

     8.73%     -0.72%  [kernel.kallsyms]        [k] try_to_wake_up
     0.13%     +0.39%  [kernel.kallsyms]        [k] activate_task
     0.07%     +0.38%  [kernel.kallsyms]        [k] deactivate_task

I will include these numbers in the commit message in v2.

Thanks

--
Qais Yousef


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 14:13         ` Qais Yousef
@ 2020-06-19 15:17           ` Valentin Schneider
  2020-06-19 17:25             ` Qais Yousef
  0 siblings, 1 reply; 26+ messages in thread
From: Valentin Schneider @ 2020-06-19 15:17 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel


On 19/06/20 15:13, Qais Yousef wrote:
> On 06/19/20 14:25, Valentin Schneider wrote:
>>
>> On 19/06/20 13:51, Qais Yousef wrote:
>> > On 06/19/20 11:36, Valentin Schneider wrote:
>> >>
>> >> On 18/06/20 20:55, Qais Yousef wrote:
>> >> > There is a report that when uclamp is enabled, a netperf UDP test
>> >> > regresses compared to a kernel compiled without uclamp.
>> >> >
>> >> > https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/
>> >> >
>> >>
>> >> ISTR the perennial form for those is: https://lkml.kernel.org/r/<message-id>
>> >
>> > The link is a correct permalink from lore and contains the message-id as Peter
>> > likes, and he has accepted this form before.
>> >
>>
>> I think the objections I remember were on using lkml.org rather than
>> lkml.kernel.org. Sorry!
>>
>> > If you look closely you'll see that what you suggest is just moving 'lkml' to
>> > replace lore in the dns name and put an /r/. I don't see a need to enforce one
>> > form over the other as the one I used is much easier to get.
>> >
>>
>> My assumption would be that while lore may fade (it hasn't been there for
>> that long, who knows what will come next), lkml.kernel.org ought to be
>> perennial. Keyword here being "assumption".
>>
>> > If Peter really insists I'll be happy to change.
>> >
>> > [...]
>> >
>> >> > +	 * This could happen if sched_uclamp_unused was disabled while the
>> >> > +	 * current task was running, hence we could end up with unbalanced call
>> >> > +	 * to uclamp_rq_dec_id().
>> >> > +	 */
>> >> > +	if (unlikely(!bucket->tasks))
>> >> > +		return;
>> >>
>> >> I'm slightly worried about silent returns for cases like these, can we try
>> >> to cook something up to preserve the previous SCHED_WARN_ON()? Say,
>> >> something like the horrendous below - alternatively it might be feasible
>> >> with some clever p->on_rq flag.
>> >
>> > I am really against extra churn and debug code to detect an impossible case
>> > that is not really tricky for reviewers to discern. Outside of enqueue/dequeue
>> > path, it's only used in uclamp_update_active(). It is quite easy to see that
>> > it's impossible, except for the legit case now when we have a static key
>> > changing when a task is running.
>> >
>>
>> Provided it isn't too much of a head scratcher (and admittedly what I am
>> suggesting is borderline here), I believe it is worthwhile to add debug
>> helpers in what are assumed to be impossible cases - even more so in this
>> case, seeing as it had been deemed worth checking previously. We've been
>> proved wrong on the "impossible" nature of some things before.
>>
>> We have a few of those checks strewn over the scheduler code, so it's not
>> like we would be starting a new trend.
>
> I am sorry I am still not bought in. I think the parts you're talking about are
> in the lockless part of the scheduler which are really hard to debug as several
> cpus could be traversing these data from different code paths.

Not necessarily just those, see pick_next_task(),
active_load_balance_cpu_stop(), or a good proportion of SCHED_WARN_ON()'s.

> But here this is
> just extra churn.
>
> If an imbalance has happened this means either:
>
>       1. enqueue/dequeue_task() is imbalanced itself
>       2. uclamp_update_active() calls dec without inc.
>
> If 1 happened we have more reasons to be worried. For 2 the function
> takes task_rq_lock() and does dec/inc in an obvious way.
>

True. I won't argue over the feasibility of the scenarios we are currently
aware of; my point was that if they do happen, it's nice to have debug
helpers in the right places, as the final breakage can happen much further
downstream.

FWIW I don't like the diff I suggested at all, but if we can come up with a
cleverer scheme I think we should do it, as per the above.

> So I don't see any reason to add new info in task_struct and sprinkle #ifdefs
> to protect against something that, as far as I can see, we can reason
> correctly about now.
>
> We don't use pr_debug() in scheduler (I guess no computer would have booted
> with that on), otherwise that would have been a good candidate for one, yes.
> But we can't do that.
>
> Thanks


* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 15:17           ` Valentin Schneider
@ 2020-06-19 17:25             ` Qais Yousef
  2020-06-19 18:52               ` Valentin Schneider
  0 siblings, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2020-06-19 17:25 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

On 06/19/20 16:17, Valentin Schneider wrote:

[...]

> > But here this is
> > just extra churn.
> >
> > If an imbalance has happened, this means either:
> >
> >       1. enqueue/dequeue_task() is imbalanced itself
> >       2. uclamp_update_active() calls dec without inc.
> >
> > If 1 happened, we have bigger things to worry about. For 2, the function
> > takes task_rq_lock() and does dec/inc in an obvious way.
> >
> 
> True. I won't argue over the feasibility of the scenarios we are currently
> aware of; my point was that if they do happen, it's nice to have debug
> helpers in the right places, as the final breakage can happen much further
> downstream.
> 
> FWIW I don't like the diff I suggested at all, but if we can come up with a
> cleverer scheme I think we should do it, as per the above.

There's also the fact that this whole effort is about avoiding, where
possible, anything that is not strictly necessary in the fast path.

Keep in mind that my patch introducing the sysctl is not accepted yet
because it introduces such a thing, though in that case it's not a debug-only
feature. CONFIG_SCHED_DEBUG does get enabled by distros because it exports a
lot of useful info.

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq
  2020-06-18 19:55 ` [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq Qais Yousef
  2020-06-19 10:36   ` Valentin Schneider
@ 2020-06-19 17:30   ` Peter Zijlstra
  2020-06-19 17:39     ` Qais Yousef
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2020-06-19 17:30 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
	Chris Redpath, Lukasz Luba, linux-kernel

On Thu, Jun 18, 2020 at 08:55:24PM +0100, Qais Yousef wrote:

> +	for_each_clamp_id(clamp_id) {
> +		memset(uc_rq[clamp_id].bucket,
> +		       0,
> +		       sizeof(struct uclamp_bucket)*UCLAMP_BUCKETS);
> +
> +		uc_rq[clamp_id].value = uclamp_none(clamp_id);

I think you can replace all that with:

		*uc_rq = (struct uclamp_rq){
			.value = uclamp_none(clamp_id),
		};

it's shorter and is free of weird line-breaks :-)

> +	}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq
  2020-06-19 17:30   ` Peter Zijlstra
@ 2020-06-19 17:39     ` Qais Yousef
  2020-06-19 18:13       ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2020-06-19 17:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
	Chris Redpath, Lukasz Luba, linux-kernel

On 06/19/20 19:30, Peter Zijlstra wrote:
> On Thu, Jun 18, 2020 at 08:55:24PM +0100, Qais Yousef wrote:
> 
> > +	for_each_clamp_id(clamp_id) {
> > +		memset(uc_rq[clamp_id].bucket,
> > +		       0,
> > +		       sizeof(struct uclamp_bucket)*UCLAMP_BUCKETS);
> > +
> > +		uc_rq[clamp_id].value = uclamp_none(clamp_id);
> 
> I think you can replace all that with:
> 
> 		*uc_rq = (struct uclamp_rq){
> 			.value = uclamp_none(clamp_id),
> 		};
> 
> it's shorter and is free of weird line-breaks :-)

Sure. I just sent v2 so that hopefully people will be encouraged to run
tests. But I will fix it in v3.

Do we actually need to 0 out anything here? Shouldn't the runqueues all be in
BSS which gets initialized to 0 by default at boot?

Maybe it's better to stay explicit...

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-18 19:55 ` [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key Qais Yousef
  2020-06-19 10:36   ` Valentin Schneider
  2020-06-19 10:39   ` Valentin Schneider
@ 2020-06-19 17:45   ` Peter Zijlstra
  2020-06-19 17:53     ` Qais Yousef
  2 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2020-06-19 17:45 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
	Chris Redpath, Lukasz Luba, linux-kernel

On Thu, Jun 18, 2020 at 08:55:25PM +0100, Qais Yousef wrote:

> +/*
> + * This static key is used to reduce the uclamp overhead in the fast path. It
> + * only disables the call to uclamp_rq_{inc, dec}() in enqueue/dequeue_task().
> + *
> + * This allows users to continue to enable uclamp in their kernel config with
> + * minimum uclamp overhead in the fast path.
> + *
> + * As soon as userspace modifies any of the uclamp knobs, the static key is
> + * disabled, since we have actual users that make use of uclamp
> + * functionality.
> + *
> + * The knobs that would disable this static key are:
> + *
> + *   * A task modifying its uclamp value with sched_setattr().
> + *   * An admin modifying the sysctl_sched_uclamp_{min, max} via procfs.
> + *   * An admin modifying the cgroup cpu.uclamp.{min, max}
> + */
> +DEFINE_STATIC_KEY_TRUE(sched_uclamp_unused);

Maybe call the thing: 'sched_uclamp_users', instead?


> +		if (static_branch_unlikely(&sched_uclamp_unused))
> +			static_branch_disable(&sched_uclamp_unused);


> +	if (static_branch_unlikely(&sched_uclamp_unused))
> +		static_branch_disable(&sched_uclamp_unused);


> +	if (static_branch_unlikely(&sched_uclamp_unused))
> +		static_branch_disable(&sched_uclamp_unused);

That's an anti-pattern... just static_branch_disable(), or _enable()
with a 'better' name is sufficient.
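
To spell out the difference, a minimal sketch of both forms (key name taken
from the patch):

	/* Anti-pattern: guarding the disable behind a branch test. */
	if (static_branch_unlikely(&sched_uclamp_unused))
		static_branch_disable(&sched_uclamp_unused);

	/*
	 * Preferred: call it unconditionally. Disabling an already
	 * disabled key is handled internally by the static key code.
	 */
	static_branch_disable(&sched_uclamp_unused);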

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 17:45   ` Peter Zijlstra
@ 2020-06-19 17:53     ` Qais Yousef
  0 siblings, 0 replies; 26+ messages in thread
From: Qais Yousef @ 2020-06-19 17:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
	Chris Redpath, Lukasz Luba, linux-kernel

On 06/19/20 19:45, Peter Zijlstra wrote:
> On Thu, Jun 18, 2020 at 08:55:25PM +0100, Qais Yousef wrote:
> 
> > +/*
> > + * This static key is used to reduce the uclamp overhead in the fast path. It
> > + * only disables the call to uclamp_rq_{inc, dec}() in enqueue/dequeue_task().
> > + *
> > + * This allows users to continue to enable uclamp in their kernel config with
> > + * minimum uclamp overhead in the fast path.
> > + *
> > + * As soon as userspace modifies any of the uclamp knobs, the static key is
> > + * disabled, since we have actual users that make use of uclamp
> > + * functionality.
> > + *
> > + * The knobs that would disable this static key are:
> > + *
> > + *   * A task modifying its uclamp value with sched_setattr().
> > + *   * An admin modifying the sysctl_sched_uclamp_{min, max} via procfs.
> > + *   * An admin modifying the cgroup cpu.uclamp.{min, max}
> > + */
> > +DEFINE_STATIC_KEY_TRUE(sched_uclamp_unused);
> 
> Maybe call the thing: 'sched_uclamp_users', instead?
> 
> 
> > +		if (static_branch_unlikely(&sched_uclamp_unused))
> > +			static_branch_disable(&sched_uclamp_unused);
> 
> 
> > +	if (static_branch_unlikely(&sched_uclamp_unused))
> > +		static_branch_disable(&sched_uclamp_unused);
> 
> 
> > +	if (static_branch_unlikely(&sched_uclamp_unused))
> > +		static_branch_disable(&sched_uclamp_unused);
> 
> That's an anti-pattern... just static_branch_disable(), or _enable()
> with a 'better' name is sufficient.

I misread the code. I saw there's a WARN_ON_ONCE(), but that only triggers if
the atomic variable has a value that is not in the (0, 1) range.

So yes we can call it unconditionally.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq
  2020-06-19 17:39     ` Qais Yousef
@ 2020-06-19 18:13       ` Peter Zijlstra
  2020-06-19 18:42         ` Qais Yousef
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2020-06-19 18:13 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
	Chris Redpath, Lukasz Luba, linux-kernel

On Fri, Jun 19, 2020 at 06:39:44PM +0100, Qais Yousef wrote:
> On 06/19/20 19:30, Peter Zijlstra wrote:
> > On Thu, Jun 18, 2020 at 08:55:24PM +0100, Qais Yousef wrote:
> > 
> > > +	for_each_clamp_id(clamp_id) {
> > > +		memset(uc_rq[clamp_id].bucket,
> > > +		       0,
> > > +		       sizeof(struct uclamp_bucket)*UCLAMP_BUCKETS);
> > > +
> > > +		uc_rq[clamp_id].value = uclamp_none(clamp_id);
> > 
> > I think you can replace all that with:
> > 
> > 		*uc_rq = (struct uclamp_rq){
> > 			.value = uclamp_none(clamp_id),
> > 		};
> > 
> > it's shorter and is free of weird line-breaks :-)
> 
> Sure. I just sent v2 so that hopefully people will be encouraged to run
> tests. But I will fix it in v3.
> 
> Do we actually need to 0 out anything here? Shouldn't the runqueues all be in
> BSS which gets initialized to 0 by default at boot?
> 
> Maybe it's better to stay explicit...

C99 named initializer (as used here) explicitly zero initializes all
unnamed members. Is that explicit enough? ;-)
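
A stand-alone illustration of that rule (a toy struct standing in for the
real uclamp_rq, which has the same shape):

	struct toy {
		unsigned int value;
		unsigned int bucket[4];
	};

	struct toy t = (struct toy){
		.value = 42,
	};
	/*
	 * t.bucket[0..3] end up zero-initialized: C99 guarantees that
	 * members without a designator are initialized to zero, which
	 * is what makes the open-coded memset() redundant.
	 */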

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq
  2020-06-19 18:13       ` Peter Zijlstra
@ 2020-06-19 18:42         ` Qais Yousef
  2020-06-22 10:30           ` Qais Yousef
  0 siblings, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2020-06-19 18:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
	Chris Redpath, Lukasz Luba, linux-kernel

On 06/19/20 20:13, Peter Zijlstra wrote:
> On Fri, Jun 19, 2020 at 06:39:44PM +0100, Qais Yousef wrote:
> > On 06/19/20 19:30, Peter Zijlstra wrote:
> > > On Thu, Jun 18, 2020 at 08:55:24PM +0100, Qais Yousef wrote:
> > > 
> > > > +	for_each_clamp_id(clamp_id) {
> > > > +		memset(uc_rq[clamp_id].bucket,
> > > > +		       0,
> > > > +		       sizeof(struct uclamp_bucket)*UCLAMP_BUCKETS);
> > > > +
> > > > +		uc_rq[clamp_id].value = uclamp_none(clamp_id);
> > > 
> > > I think you can replace all that with:
> > > 
> > > 		*uc_rq = (struct uclamp_rq){
> > > 			.value = uclamp_none(clamp_id),
> > > 		};
> > > 
> > > it's shorter and is free of weird line-breaks :-)
> > 
> > Sure. I just sent v2 so that hopefully people will be encouraged to run
> > tests. But I will fix it in v3.
> > 
> > Do we actually need to 0 out anything here? Shouldn't the runqueues all be in
> > BSS which gets initialized to 0 by default at boot?
> > 
> > Maybe it's better to stay explicit...
> 
> C99 named initializer (as used here) explicitly zero initializes all
> unnamed members. Is that explicit enough? ;-)

Hehe yes, but what I meant is that unless

	DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);

has some special rules, it should be in the BSS and already zeroed out when
we do sched_init(). So do we really need to explicitly zero out anything? It
seems redundant, but then again maybe being explicit is more readable, so
maybe it's better to keep it the way it is (named initializer of the struct).

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 17:25             ` Qais Yousef
@ 2020-06-19 18:52               ` Valentin Schneider
  0 siblings, 0 replies; 26+ messages in thread
From: Valentin Schneider @ 2020-06-19 18:52 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel


On 19/06/20 18:25, Qais Yousef wrote:
> On 06/19/20 16:17, Valentin Schneider wrote:
>
> [...]
>
>> > But here this is
>> > just extra churn.
>> >
>> > If an imbalance has happened, this means either:
>> >
>> >       1. enqueue/dequeue_task() is imbalanced itself
>> >       2. uclamp_update_active() calls dec without inc.
>> >
>> > If 1 happened, we have bigger things to worry about. For 2, the function
>> > takes task_rq_lock() and does dec/inc in an obvious way.
>> >
>>
>> True. I won't argue over the feasibility of the scenarios we are currently
>> aware of; my point was that if they do happen, it's nice to have debug
>> helpers in the right places, as the final breakage can happen much further
>> downstream.
>>
>> FWIW I don't like the diff I suggested at all, but if we can come up with a
>> cleverer scheme I think we should do it, as per the above.
>
> There's also the fact that this whole effort is about avoiding, where
> possible, anything that is not strictly necessary in the fast path.
>
> Keep in mind that my patch introducing the sysctl is not accepted yet
> because it introduces such a thing, though in that case it's not a debug-only
> feature. CONFIG_SCHED_DEBUG does get enabled by distros because it exports a
> lot of useful info.

Sigh, true, but they really shouldn't. The whole point of having
SCHED_WARN_ON() is that it's a no-op on !SCHED_DEBUG kernels, which should
be any "production" kernel :(

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-19 13:25       ` Valentin Schneider
  2020-06-19 14:13         ` Qais Yousef
@ 2020-06-19 19:47         ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2020-06-19 19:47 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Qais Yousef, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

On Fri, Jun 19, 2020 at 02:25:20PM +0100, Valentin Schneider wrote:
> 
> On 19/06/20 13:51, Qais Yousef wrote:
> > On 06/19/20 11:36, Valentin Schneider wrote:
> >>
> >> On 18/06/20 20:55, Qais Yousef wrote:
> >> > There is a report that when uclamp is enabled, a netperf UDP test
> >> > regresses compared to a kernel compiled without uclamp.
> >> >
> >> > https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/
> >> >
> >>
> >> ISTR the perennial form for those is: https://lkml.kernel.org/r/<message-id>
> >
> > The link is a correct permalink from lore and contains the message-id, as
> > Peter likes, and he has accepted this form before.
> >
> 
> I think the objections I remember were on using lkml.org rather than
> lkml.kernel.org. Sorry!

:-) Yeah, lkml.org is bad.

That said, I prefer the lkml.kernel.org/r/ variant over
lore.kernel.org/lkml/ in part because it existed first and I'm lazy, in
part because it's shorter and in part because lkml.kernel.org/r/ is an
explicit redirect and lore is 'just' an archive it can redirect to
(although admittedly, it looks rather unlikely lore is going to die).

I'll not make a big point of it though, and will in all likelihood accept
lore links. The important point is that the link has the Message-ID in it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/2] sched: Optionally skip uclamp logic in fast path
  2020-06-18 19:55 [PATCH 0/2] sched: Optionally skip uclamp logic in fast path Qais Yousef
  2020-06-18 19:55 ` [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq Qais Yousef
  2020-06-18 19:55 ` [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key Qais Yousef
@ 2020-06-22  9:06 ` Lukasz Luba
  2 siblings, 0 replies; 26+ messages in thread
From: Lukasz Luba @ 2020-06-22  9:06 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, linux-kernel

Hi Qais,

On 6/18/20 8:55 PM, Qais Yousef wrote:
> This series attempts to address the report that uclamp logic could be expensive
> sometimes and shows a regression in netperf UDP_STREAM under certain
> conditions.
> 
> The first patch is a fix for how struct uclamp_rq is initialized which is
> required by the 2nd patch which contains the real 'fix'.
> 
> Worth noting that the root cause of the overhead is believed to be system
> specific or related to potential certain code/data layout issues, leading to
> worse I/D $ performance.
> 
> Different systems exhibited different behaviors and the regression did
> disappear in certain kernel versions while attempting to reproduce.
> 
> More info can be found here:
> 
> https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/
> 
> Having the static key seemed the best thing to do to ensure the effect of
> uclamp is minimized for kernels that compile it in but don't have a userspace
> that uses it, which will allow distros to distribute uclamp capable kernels by
> default without having to compromise on performance for some systems that could
> be affected.
> 
> Thanks
> 
> --
> Qais Yousef
> 

I have given it a try on my machine
(HP server, 2 sockets, 24 CPUs, x86 64-bit, 4 NUMA nodes, AMD Opteron 6174,
L2 512KB/cpu, L3 6MB/node, RAM 40GB/node).
Kernel v5.7-rc7 with the openSUSE 15.1 distro config.
NUMA control for pinning tasks has not been used.

The results are better than the last time I checked this uclamp
issue [1]. Here are the results for a kernel built from the openSUSE 15.1
distro config + uclamp tasks + task groups (similar to the 3rd kernel from
the tests in [1]).
They are really good in terms of netperf-udp performance and in how they
deal with statistical noise due to context switches and/or tasks jumping
around CPUs.

The netperf-udp results (100 tests for each UDP size):
                         ./v5.7-rc7-base    ./v5.7-rc7-uclamp-tsk-grp-fix
Hmean     send-64         62.36 (   0.00%)       66.27 *   6.27%*
Hmean     send-128       124.24 (   0.00%)      132.03 *   6.27%*
Hmean     send-256       244.81 (   0.00%)      261.21 *   6.70%*
Hmean     send-1024      922.18 (   0.00%)      985.84 *   6.90%*
Hmean     send-2048      1716.61 (   0.00%)     1811.30 *   5.52%*
Hmean     send-3312      2564.73 (   0.00%)     2690.62 *   4.91%*
Hmean     send-4096      2967.01 (   0.00%)     3128.71 *   5.45%*
Hmean     send-8192      4834.31 (   0.00%)     5028.15 *   4.01%*
Hmean     send-16384     7569.17 (   0.00%)     7734.05 *   2.18%*
Hmean     recv-64         62.35 (   0.00%)       66.27 *   6.28%*
Hmean     recv-128       124.24 (   0.00%)      132.03 *   6.27%*
Hmean     recv-256       244.79 (   0.00%)      261.20 *   6.70%*
Hmean     recv-1024      922.10 (   0.00%)      985.82 *   6.91%*
Hmean     recv-2048      1716.61 (   0.00%)     1811.29 *   5.52%*
Hmean     recv-3312      2564.46 (   0.00%)     2690.60 *   4.92%*
Hmean     recv-4096      2967.00 (   0.00%)     3128.71 *   5.45%*
Hmean     recv-8192      4834.06 (   0.00%)     5028.05 *   4.01%*
Hmean     recv-16384     7568.70 (   0.00%)     7733.69 *   2.18%*

Statistics showing performance when there is context-switch
noise and/or tasks jumping around CPUs. This is from the
netperf-udp benchmark, running only the 64B test case once with
tracing, repeated 100 times in a bash loop.
Traced function performance is combined and presented in statistical form
(pandas dataframe describe() output).
It can be compared with the basic kernel results, or with the results for
the similar kernel (but w/o this fix) present at [1].
For completeness I also put them below.

kernel with uclamp tasks + task groups + this fix
activate_task
              Hit    Time_us  Avg_us  s^2_us
count     101.00     101.00  101.00  101.00
mean   26,269.33  19,397.98    1.15    0.51
std   123,464.10  90,121.64    0.36    0.19
min       101.00     161.49    0.37    0.03
50%       408.00     479.45    1.26    0.50
75%       720.00     704.05    1.40    0.60
90%     1,795.00   1,071.86    1.57    0.72
95%     3,688.00   1,776.87    1.61    0.79
99%   733,737.00 518,448.60    1.73    1.03
max   756,631.00 537,865.40    1.76    1.06
deactivate_task
                Hit    Time_us  Avg_us  s^2_us
count       101.00     101.00  101.00  101.00
mean    111,714.44  55,791.32    0.80    0.27
std     307,358.56 153,230.31    0.26    0.14
min          88.00      91.73    0.31    0.00
50%         464.00     381.30    0.90    0.29
75%       1,118.00     622.70    1.01    0.36
90%     517,991.00 255,669.50    1.10    0.44
95%     997,663.00 484,013.20    1.12    0.47
99%   1,189,980.00 578,987.30    1.14    0.51
max   1,422,640.00 686,828.60    1.16    0.60


Basic kernel traced functions performance
(no uclamp, no fixes, just built based on distro config)

activate_task
                Hit    Time_us  Avg_us  s^2_us
count       138.00     138.00  138.00  138.00
mean     20,387.44  14,587.33    1.15    0.53
std     114,980.19  81,427.51    0.42    0.23
min         110.00     181.68    0.32    0.00
50%         411.00     461.55    1.32    0.54
75%         881.75     760.08    1.47    0.66
90%       2,885.60   1,302.03    1.61    0.80
95%      55,318.05  41,273.41    1.66    0.92
99%     501,660.04 358,939.04    1.77    1.09
max   1,131,457.00 798,097.30    1.80    1.42
deactivate_task
                Hit    Time_us  Avg_us  s^2_us
count       138.00     138.00  138.00  138.00
mean     81,828.83  39,991.61    0.81    0.28
std     260,130.01 126,386.89    0.28    0.14
min          97.00      92.35    0.26    0.00
50%         424.00     340.35    0.94    0.30
75%       1,062.25     684.98    1.05    0.37
90%     330,657.50 168,320.94    1.11    0.46
95%     748,920.70 359,498.23    1.15    0.51
99%   1,094,614.76 528,459.50    1.21    0.56
max   1,630,473.00 789,476.50    1.25    0.60

Old kernel (w/o fix, uclamp tasks + task groups) which had this uclamp issue

activate_task
                Hit      Time_us  Avg_us  s^2_us
count       273.00       273.00  273.00  273.00
mean     15,958.34    16,471.84    1.58    0.67
std     105,096.88   108,322.03    0.43    0.32
min           3.00         4.96    0.41    0.00
50%         245.00       400.23    1.70    0.64
75%         384.00       565.53    1.85    0.78
90%       1,602.00     1,069.08    1.95    0.95
95%       3,403.00     1,573.74    2.01    1.13
99%     589,484.56   604,992.57    2.11    1.75
max   1,035,866.00 1,096,975.00    2.40    3.08
deactivate_task
                Hit      Time_us  Avg_us  s^2_us
count       273.00       273.00  273.00  273.00
mean     94,607.02    63,433.12    1.02    0.34
std     325,130.91   216,844.92    0.28    0.16
min           2.00         2.79    0.29    0.00
50%         244.00       291.49    1.11    0.36
75%         496.00       448.72    1.19    0.43
90%     120,304.60    82,964.94    1.25    0.55
95%     945,480.60   626,793.58    1.33    0.60
99%   1,485,959.96 1,010,615.72    1.40    0.68
max   2,120,682.00 1,403,280.00    1.80    1.11


Based on these statistics, this fix has a better distribution at almost all
marked points and better performance results for netperf-udp.

I can also run tests for a kernel with just uclamp tasks today, if
needed.

Regards,
Lukasz

[1] https://lore.kernel.org/lkml/d9c951da-87eb-ab20-9434-f15b34096d66@arm.com/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq
  2020-06-19 18:42         ` Qais Yousef
@ 2020-06-22 10:30           ` Qais Yousef
  0 siblings, 0 replies; 26+ messages in thread
From: Qais Yousef @ 2020-06-22 10:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
	Chris Redpath, Lukasz Luba, linux-kernel

On 06/19/20 19:42, Qais Yousef wrote:
> On 06/19/20 20:13, Peter Zijlstra wrote:
> > On Fri, Jun 19, 2020 at 06:39:44PM +0100, Qais Yousef wrote:
> > > On 06/19/20 19:30, Peter Zijlstra wrote:
> > > > On Thu, Jun 18, 2020 at 08:55:24PM +0100, Qais Yousef wrote:
> > > > 
> > > > > +	for_each_clamp_id(clamp_id) {
> > > > > +		memset(uc_rq[clamp_id].bucket,
> > > > > +		       0,
> > > > > +		       sizeof(struct uclamp_bucket)*UCLAMP_BUCKETS);
> > > > > +
> > > > > +		uc_rq[clamp_id].value = uclamp_none(clamp_id);
> > > > 
> > > > I think you can replace all that with:
> > > > 
> > > > 		*uc_rq = (struct uclamp_rq){
> > > > 			.value = uclamp_none(clamp_id),
> > > > 		};
> > > > 
> > > > it's shorter and is free of weird line-breaks :-)
> > > 
> > > Sure. I just sent v2 so that hopefully people will be encouraged to run
> > > tests. But I will fix it in v3.
> > > 
> > > Do we actually need to 0 out anything here? Shouldn't the runqueues all be in
> > > BSS which gets initialized to 0 by default at boot?
> > > 
> > > Maybe it's better to stay explicit...
> > 
> > C99 named initializer (as used here) explicitly zero initializes all
> > unnamed members. Is that explicit enough? ;-)
> 
> Hehe yes, but what I meant is that unless
> 
> 	DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
> 
> has some special rules, it should be in the BSS and already zeroed out when
> we do sched_init(). So do we really need to explicitly zero out anything? It
> seems redundant, but then again maybe being explicit is more readable, so
> maybe it's better to keep it the way it is (named initializer of the struct).

FWIW, they end up in the .data section actually, so they're assumed to be
initialized. So we must explicitly initialize everything...
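
For reference, a simplified sketch of what that macro expands to (assuming
SMP; the exact attributes vary by architecture and config):

	/*
	 * DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues) roughly
	 * becomes the following - the explicit section attribute places
	 * the variable in a named .data..percpu section rather than in
	 * .bss.
	 */
	__attribute__((section(".data..percpu..shared_aligned")))
	struct rq runqueues ____cacheline_aligned_in_smp;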

Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-06-22 10:30 UTC | newest]

Thread overview: 26+ messages
2020-06-18 19:55 [PATCH 0/2] sched: Optionally skip uclamp logic in fast path Qais Yousef
2020-06-18 19:55 ` [PATCH 1/2] sched/uclamp: Fix initialization of struct uclamp_rq Qais Yousef
2020-06-19 10:36   ` Valentin Schneider
2020-06-19 17:30   ` Peter Zijlstra
2020-06-19 17:39     ` Qais Yousef
2020-06-19 18:13       ` Peter Zijlstra
2020-06-19 18:42         ` Qais Yousef
2020-06-22 10:30           ` Qais Yousef
2020-06-18 19:55 ` [PATCH 2/2] sched/uclamp: Protect uclamp fast path code with static key Qais Yousef
2020-06-19 10:36   ` Valentin Schneider
2020-06-19 11:57     ` Mel Gorman
2020-06-19 12:17       ` Valentin Schneider
2020-06-19 12:55       ` Qais Yousef
2020-06-19 14:51       ` Qais Yousef
2020-06-19 12:51     ` Qais Yousef
2020-06-19 13:23       ` Steven Rostedt
2020-06-19 13:25       ` Valentin Schneider
2020-06-19 14:13         ` Qais Yousef
2020-06-19 15:17           ` Valentin Schneider
2020-06-19 17:25             ` Qais Yousef
2020-06-19 18:52               ` Valentin Schneider
2020-06-19 19:47         ` Peter Zijlstra
2020-06-19 10:39   ` Valentin Schneider
2020-06-19 17:45   ` Peter Zijlstra
2020-06-19 17:53     ` Qais Yousef
2020-06-22  9:06 ` [PATCH 0/2] sched: Optionally skip uclamp logic in fast path Lukasz Luba
