All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
@ 2023-05-30 13:55 Vineeth Pillai
  2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Vineeth Pillai @ 2023-05-30 13:55 UTC (permalink / raw)
  To: luca.abeni, Juri Lelli, Daniel Bristot de Oliveira,
	Peter Zijlstra, Ingo Molnar, Vincent Guittot, Steven Rostedt,
	Joel Fernandes, youssefesmat, Dietmar Eggemann, Ben Segall,
	Mel Gorman, Valentin Schneider
  Cc: Vineeth Pillai, Jonathan Corbet, linux-kernel, linux-doc

According to the GRUB[1] rule, the runtime is depreciated as:
  "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)

To guarantee that deadline tasks doesn't starve lower class tasks,
we do not allocate the full bandwidth of the cpu to deadline tasks.
Maximum bandwidth usable by deadline tasks is denoted by "Umax".
Considering Umax, equation (1) becomes:
  "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)

Current implementation has a minor bug in equation (2), which this
patch fixes.

The reclamation logic is verified by a sample program which creates
multiple deadline threads and observing their utilization. The tests
were run on an isolated cpu(isolcpus=3) on a 4 cpu system.

Tests on 6.3.0
==============

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69

RUN 3: 2 tasks
  Task 1: runtime=1ms, deadline=period=10ms
  Task 2: runtime=1ms, deadline=period=100ms
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23

As seen above, the reclamation doesn't reclaim the maximum allowed
bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
bandwidth also comes down.

Tests with this patch applied
=============================

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21

RUN 3: 2 tasks
  Task 1: runtime=1ms, deadline=period=10ms
  Task 2: runtime=1ms, deadline=period=100ms
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73

Running tasks on all cpus allowing for migration also showed that
the utilization is reclaimed to the maximum. Running 10 tasks on
3 cpus SCHED_FLAG_RECLAIM - top shows:
%Cpu0  : 94.6 us,  0.0 sy,  0.0 ni,  5.4 id,  0.0 wa
%Cpu1  : 95.2 us,  0.0 sy,  0.0 ni,  4.8 id,  0.0 wa
%Cpu2  : 95.8 us,  0.0 sy,  0.0 ni,  4.2 id,  0.0 wa

[1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
     (2015). Parallel and sequential reclaiming in multicore
     real-time global scheduling.

Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
---
 kernel/sched/deadline.c | 50 +++++++++++++++++++----------------------
 kernel/sched/sched.h    |  6 +++++
 2 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 71b24371a6f7..dfb59a363560 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
 }
 
 /*
- * This function implements the GRUB accounting rule:
- * according to the GRUB reclaiming algorithm, the runtime is
- * not decreased as "dq = -dt", but as
- * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
+ * This function implements the GRUB accounting rule. According to the
+ * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
+ * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
  * where u is the utilization of the task, Umax is the maximum reclaimable
  * utilization, Uinact is the (per-runqueue) inactive utilization, computed
  * as the difference between the "total runqueue utilization" and the
- * runqueue active utilization, and Uextra is the (per runqueue) extra
+ * "runqueue active utilization", and Uextra is the (per runqueue) extra
  * reclaimable utilization.
- * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
- * multiplied by 2^BW_SHIFT, the result has to be shifted right by
- * BW_SHIFT.
- * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT,
- * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
- * Since delta is a 64 bit variable, to have an overflow its value
- * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
- * So, overflow is not an issue here.
+ * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied
+ * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
+ * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
+ * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
+ * Since delta is a 64 bit variable, to have an overflow its value should be
+ * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is
+ * not an issue here.
  */
 static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 {
-	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
 	u64 u_act;
-	u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT;
+	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
 
 	/*
-	 * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)},
-	 * we compare u_inact + rq->dl.extra_bw with
-	 * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
-	 * u_inact + rq->dl.extra_bw can be larger than
-	 * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
-	 * leading to wrong results)
+	 * Instead of computing max{u, (u_max - u_inact - u_extra)}, we
+	 * compare u_inact + u_extra with u_max - u, because u_inact + u_extra
+	 * can be larger than u_max. So, u_max - u_inact - u_extra would be
+	 * negative leading to wrong results.
 	 */
-	if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
-		u_act = u_act_min;
+	if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
+		u_act = dl_se->dl_bw;
 	else
-		u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
+		u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
 
+	u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
 	return (delta * u_act) >> BW_SHIFT;
 }
 
@@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
 {
 	if (global_rt_runtime() == RUNTIME_INF) {
 		dl_rq->bw_ratio = 1 << RATIO_SHIFT;
-		dl_rq->extra_bw = 1 << BW_SHIFT;
+		dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
 	} else {
 		dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
 			  global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
-		dl_rq->extra_bw = to_ratio(global_rt_period(),
-						    global_rt_runtime());
+		dl_rq->max_bw = dl_rq->extra_bw =
+			to_ratio(global_rt_period(), global_rt_runtime());
 	}
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3e8df6d31c1e..73027c2806dc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -753,6 +753,12 @@ struct dl_rq {
 	u64			this_bw;
 	u64			extra_bw;
 
+	/*
+	 * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
+	 * tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).
+	 */
+	u64			max_bw;
+
 	/*
 	 * Inverse of the fraction of CPU utilization that can be reclaimed
 	 * by the GRUB algorithm.
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation
  2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
@ 2023-05-30 13:55 ` Vineeth Pillai
  2023-05-30 14:12   ` luca abeni
                     ` (3 more replies)
  2023-05-30 14:12 ` [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB luca abeni
                   ` (4 subsequent siblings)
  5 siblings, 4 replies; 12+ messages in thread
From: Vineeth Pillai @ 2023-05-30 13:55 UTC (permalink / raw)
  To: luca.abeni, Juri Lelli, Daniel Bristot de Oliveira,
	Peter Zijlstra, Ingo Molnar, Vincent Guittot, Steven Rostedt,
	Joel Fernandes, youssefesmat, Dietmar Eggemann, Ben Segall,
	Mel Gorman, Valentin Schneider
  Cc: Vineeth Pillai, Jonathan Corbet, linux-kernel, linux-doc

Update the details of GRUB to reflect the updated logic.

Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
---
 Documentation/scheduler/sched-deadline.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/scheduler/sched-deadline.rst b/Documentation/scheduler/sched-deadline.rst
index 9d9be52f221a..9fe4846079bb 100644
--- a/Documentation/scheduler/sched-deadline.rst
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -203,12 +203,15 @@ Deadline Task Scheduling
   - Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
     runqueue, including the tasks in Inactive state.
 
+  - Maximum usable bandwidth (max_bw): This is the maximum bandwidth usable by
+    deadline tasks and is currently set to the RT capacity.
+
 
  The algorithm reclaims the bandwidth of the tasks in Inactive state.
  It does so by decrementing the runtime of the executing task Ti at a pace equal
  to
 
-           dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
+           dq = -(max{ Ui, (Umax - Uinact - Uextra) } / Umax) dt
 
  where:
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
  2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
  2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
@ 2023-05-30 14:12 ` luca abeni
  2023-06-01 10:34 ` Daniel Bristot de Oliveira
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: luca abeni @ 2023-05-30 14:12 UTC (permalink / raw)
  To: Vineeth Pillai
  Cc: Juri Lelli, Daniel Bristot de Oliveira, Peter Zijlstra,
	Ingo Molnar, Vincent Guittot, Steven Rostedt, Joel Fernandes,
	youssefesmat, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Valentin Schneider, Jonathan Corbet, linux-kernel, linux-doc

I think this patch is OK


			Thanks,
				Luca

On Tue, 30 May 2023 09:55:25 -0400
Vineeth Pillai <vineeth@bitbyteword.org> wrote:

> According to the GRUB[1] rule, the runtime is depreciated as:
>   "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
> 
> To guarantee that deadline tasks doesn't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
>   "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)
> 
> Current implementation has a minor bug in equation (2), which this
> patch fixes.
> 
> The reclamation logic is verified by a sample program which creates
> multiple deadline threads and observing their utilization. The tests
> were run on an isolated cpu(isolcpus=3) on a 4 cpu system.
> 
> Tests on 6.3.0
> ==============
> 
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
> 
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> 
> RUN 3: 2 tasks
>   Task 1: runtime=1ms, deadline=period=10ms
>   Task 2: runtime=1ms, deadline=period=100ms
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23
> 
> As seen above, the reclamation doesn't reclaim the maximum allowed
> bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
> bandwidth also comes down.
> 
> Tests with this patch applied
> =============================
> 
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16
> 
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21
> 
> RUN 3: 2 tasks
>   Task 1: runtime=1ms, deadline=period=10ms
>   Task 2: runtime=1ms, deadline=period=100ms
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73
> 
> Running tasks on all cpus allowing for migration also showed that
> the utilization is reclaimed to the maximum. Running 10 tasks on
> 3 cpus SCHED_FLAG_RECLAIM - top shows:
> %Cpu0  : 94.6 us,  0.0 sy,  0.0 ni,  5.4 id,  0.0 wa
> %Cpu1  : 95.2 us,  0.0 sy,  0.0 ni,  4.8 id,  0.0 wa
> %Cpu2  : 95.8 us,  0.0 sy,  0.0 ni,  4.2 id,  0.0 wa
> 
> [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
>      (2015). Parallel and sequential reclaiming in multicore
>      real-time global scheduling.
> 
> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
> ---
>  kernel/sched/deadline.c | 50
> +++++++++++++++++++---------------------- kernel/sched/sched.h    |
> 6 +++++ 2 files changed, 29 insertions(+), 27 deletions(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 71b24371a6f7..dfb59a363560 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct
> sched_dl_entity *dl_se) }
>  
>  /*
> - * This function implements the GRUB accounting rule:
> - * according to the GRUB reclaiming algorithm, the runtime is
> - * not decreased as "dq = -dt", but as
> - * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
> + * This function implements the GRUB accounting rule. According to
> the
> + * GRUB reclaiming algorithm, the runtime is not decreased as "dq =
> -dt",
> + * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
>   * where u is the utilization of the task, Umax is the maximum
> reclaimable
>   * utilization, Uinact is the (per-runqueue) inactive utilization,
> computed
>   * as the difference between the "total runqueue utilization" and the
> - * runqueue active utilization, and Uextra is the (per runqueue)
> extra
> + * "runqueue active utilization", and Uextra is the (per runqueue)
> extra
>   * reclaimable utilization.
> - * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
> - * multiplied by 2^BW_SHIFT, the result has to be shifted right by
> - * BW_SHIFT.
> - * Since rq->dl.bw_ratio contains 1 / Umax multiplied by
> 2^RATIO_SHIFT,
> - * dl_bw is multiped by rq->dl.bw_ratio and shifted right by
> RATIO_SHIFT.
> - * Since delta is a 64 bit variable, to have an overflow its value
> - * should be larger than 2^(64 - 20 - 8), which is more than 64
> seconds.
> - * So, overflow is not an issue here.
> + * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
> multiplied
> + * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
> + * Since rq->dl.bw_ratio contains 1 / Umax multiplied by
> 2^RATIO_SHIFT, dl_bw
> + * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
> + * Since delta is a 64 bit variable, to have an overflow its value
> should be
> + * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So,
> overflow is
> + * not an issue here.
>   */
>  static u64 grub_reclaim(u64 delta, struct rq *rq, struct
> sched_dl_entity *dl_se) {
> -	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot -
> Uact */ u64 u_act;
> -	u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >>
> RATIO_SHIFT;
> +	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot -
> Uact */ 
>  	/*
> -	 * Instead of computing max{u * bw_ratio, (1 - u_inact -
> u_extra)},
> -	 * we compare u_inact + rq->dl.extra_bw with
> -	 * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
> -	 * u_inact + rq->dl.extra_bw can be larger than
> -	 * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
> -	 * leading to wrong results)
> +	 * Instead of computing max{u, (u_max - u_inact - u_extra)},
> we
> +	 * compare u_inact + u_extra with u_max - u, because u_inact
> + u_extra
> +	 * can be larger than u_max. So, u_max - u_inact - u_extra
> would be
> +	 * negative leading to wrong results.
>  	 */
> -	if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
> -		u_act = u_act_min;
> +	if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
> +		u_act = dl_se->dl_bw;
>  	else
> -		u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
> +		u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
>  
> +	u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
>  	return (delta * u_act) >> BW_SHIFT;
>  }
>  
> @@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq
> *dl_rq) {
>  	if (global_rt_runtime() == RUNTIME_INF) {
>  		dl_rq->bw_ratio = 1 << RATIO_SHIFT;
> -		dl_rq->extra_bw = 1 << BW_SHIFT;
> +		dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
>  	} else {
>  		dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
>  			  global_rt_period()) >> (BW_SHIFT -
> RATIO_SHIFT);
> -		dl_rq->extra_bw = to_ratio(global_rt_period(),
> -
> global_rt_runtime());
> +		dl_rq->max_bw = dl_rq->extra_bw =
> +			to_ratio(global_rt_period(),
> global_rt_runtime()); }
>  }
>  
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3e8df6d31c1e..73027c2806dc 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -753,6 +753,12 @@ struct dl_rq {
>  	u64			this_bw;
>  	u64			extra_bw;
>  
> +	/*
> +	 * Maximum available bandwidth for reclaiming by
> SCHED_FLAG_RECLAIM
> +	 * tasks of this rq. Used in calculation of reclaimable
> bandwidth(GRUB).
> +	 */
> +	u64			max_bw;
> +
>  	/*
>  	 * Inverse of the fraction of CPU utilization that can be
> reclaimed
>  	 * by the GRUB algorithm.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation
  2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
@ 2023-05-30 14:12   ` luca abeni
  2023-06-01 10:35   ` Daniel Bristot de Oliveira
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: luca abeni @ 2023-05-30 14:12 UTC (permalink / raw)
  To: Vineeth Pillai
  Cc: Juri Lelli, Daniel Bristot de Oliveira, Peter Zijlstra,
	Ingo Molnar, Vincent Guittot, Steven Rostedt, Joel Fernandes,
	youssefesmat, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Valentin Schneider, Jonathan Corbet, linux-kernel, linux-doc

I think this patch is also OK


			Thanks,
				Luca

On Tue, 30 May 2023 09:55:26 -0400
Vineeth Pillai <vineeth@bitbyteword.org> wrote:

> Update the details of GRUB to reflect the updated logic.
> 
> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
> ---
>  Documentation/scheduler/sched-deadline.rst | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/scheduler/sched-deadline.rst
> b/Documentation/scheduler/sched-deadline.rst index
> 9d9be52f221a..9fe4846079bb 100644 ---
> a/Documentation/scheduler/sched-deadline.rst +++
> b/Documentation/scheduler/sched-deadline.rst @@ -203,12 +203,15 @@
> Deadline Task Scheduling
>    - Total bandwidth (this_bw): this is the sum of all tasks
> "belonging" to the runqueue, including the tasks in Inactive state.
>  
> +  - Maximum usable bandwidth (max_bw): This is the maximum bandwidth
> usable by
> +    deadline tasks and is currently set to the RT capacity.
> +
>  
>   The algorithm reclaims the bandwidth of the tasks in Inactive state.
>   It does so by decrementing the runtime of the executing task Ti at
> a pace equal to
>  
> -           dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
> +           dq = -(max{ Ui, (Umax - Uinact - Uextra) } / Umax) dt
>  
>   where:
>  


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
  2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
  2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
  2023-05-30 14:12 ` [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB luca abeni
@ 2023-06-01 10:34 ` Daniel Bristot de Oliveira
  2023-06-01 11:56 ` Dietmar Eggemann
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Daniel Bristot de Oliveira @ 2023-06-01 10:34 UTC (permalink / raw)
  To: Vineeth Pillai, luca.abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: Jonathan Corbet, linux-kernel, linux-doc, Vincent Guittot,
	Steven Rostedt, Joel Fernandes, youssefesmat, Dietmar Eggemann,
	Mel Gorman, Valentin Schneider, Ben Segall

On 5/30/23 15:55, Vineeth Pillai wrote:
> According to the GRUB[1] rule, the runtime is depreciated as:
>   "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
> 
> To guarantee that deadline tasks doesn't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
>   "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)
> 
> Current implementation has a minor bug in equation (2), which this
> patch fixes.
> 
> The reclamation logic is verified by a sample program which creates
> multiple deadline threads and observing their utilization. The tests
> were run on an isolated cpu(isolcpus=3) on a 4 cpu system.
> 
> Tests on 6.3.0
> ==============
> 
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
> 
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> 
> RUN 3: 2 tasks
>   Task 1: runtime=1ms, deadline=period=10ms
>   Task 2: runtime=1ms, deadline=period=100ms
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23
> 
> As seen above, the reclamation doesn't reclaim the maximum allowed
> bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
> bandwidth also comes down.
> 
> Tests with this patch applied
> =============================
> 
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16
> 
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21
> 
> RUN 3: 2 tasks
>   Task 1: runtime=1ms, deadline=period=10ms
>   Task 2: runtime=1ms, deadline=period=100ms
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73
> 
> Running tasks on all cpus allowing for migration also showed that
> the utilization is reclaimed to the maximum. Running 10 tasks on
> 3 cpus SCHED_FLAG_RECLAIM - top shows:
> %Cpu0  : 94.6 us,  0.0 sy,  0.0 ni,  5.4 id,  0.0 wa
> %Cpu1  : 95.2 us,  0.0 sy,  0.0 ni,  4.8 id,  0.0 wa
> %Cpu2  : 95.8 us,  0.0 sy,  0.0 ni,  4.2 id,  0.0 wa
> 
> [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
>      (2015). Parallel and sequential reclaiming in multicore
>      real-time global scheduling.

So, I did some testing, mainly to validate the "one task cannot run on
two CPUs,"  that is, a case in which a higher utilization task would
always have its % of CPU available, even in the presence of low utilization
trying to reclaim all the CPU time. E.g.,

	Task 1: runtime=1ms, deadline=period=10ms with reclaim
	Task 2: runtime=1ms, deadline=period=10ms with reclaim
	Task 3: runtime 8ms, deadline=period=10ms without reclaim

On two CPUs task 3 always have 80% available... the other tasks
do not get 95% because of migrations.

With 3+ cpus, the tasks can run up to 95% because there are CPUs to
everyone.

I played with migrate disable to force timelines... and yet, I failed
to break things so... we are good :-).

Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>

Thanks!
-- Daniel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation
  2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
  2023-05-30 14:12   ` luca abeni
@ 2023-06-01 10:35   ` Daniel Bristot de Oliveira
  2023-06-01 12:55   ` Juri Lelli
  2023-06-19 11:04   ` [tip: sched/core] " tip-bot2 for Vineeth Pillai
  3 siblings, 0 replies; 12+ messages in thread
From: Daniel Bristot de Oliveira @ 2023-06-01 10:35 UTC (permalink / raw)
  To: Vineeth Pillai, luca.abeni, Juri Lelli, Peter Zijlstra,
	Ingo Molnar, Jonathan Corbet
  Cc: linux-kernel, linux-doc, Vincent Guittot, Joel Fernandes,
	Steven Rostedt, youssefesmat, Dietmar Eggemann, Ben Segall,
	Valentin Schneider, Mel Gorman

On 5/30/23 15:55, Vineeth Pillai wrote:
> Update the details of GRUB to reflect the updated logic.
> 
> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>

Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>

Thanks!
-- Daniel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
  2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
                   ` (2 preceding siblings ...)
  2023-06-01 10:34 ` Daniel Bristot de Oliveira
@ 2023-06-01 11:56 ` Dietmar Eggemann
  2023-06-05  1:57   ` Vineeth Remanan Pillai
  2023-06-01 12:54 ` Juri Lelli
  2023-06-19 11:04 ` [tip: sched/core] " tip-bot2 for Vineeth Pillai
  5 siblings, 1 reply; 12+ messages in thread
From: Dietmar Eggemann @ 2023-06-01 11:56 UTC (permalink / raw)
  To: Vineeth Pillai, luca.abeni, Juri Lelli,
	Daniel Bristot de Oliveira, Peter Zijlstra, Ingo Molnar,
	Vincent Guittot, Steven Rostedt, Joel Fernandes, youssefesmat,
	Ben Segall, Mel Gorman, Valentin Schneider
  Cc: Jonathan Corbet, linux-kernel, linux-doc

On 30/05/2023 15:55, Vineeth Pillai wrote:

[...]

> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 71b24371a6f7..dfb59a363560 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
>  }
>  
>  /*
> - * This function implements the GRUB accounting rule:
> - * according to the GRUB reclaiming algorithm, the runtime is
> - * not decreased as "dq = -dt", but as
> - * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
> + * This function implements the GRUB accounting rule. According to the
> + * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
> + * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
>   * where u is the utilization of the task, Umax is the maximum reclaimable
>   * utilization, Uinact is the (per-runqueue) inactive utilization, computed
>   * as the difference between the "total runqueue utilization" and the
> - * runqueue active utilization, and Uextra is the (per runqueue) extra
> + * "runqueue active utilization", and Uextra is the (per runqueue) extra
>   * reclaimable utilization.
> - * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
> - * multiplied by 2^BW_SHIFT, the result has to be shifted right by
> - * BW_SHIFT.
> - * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT,
> - * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
> - * Since delta is a 64 bit variable, to have an overflow its value
> - * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
> - * So, overflow is not an issue here.
> + * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied
> + * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
> + * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
> + * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.

nit-pick:

s/multiped/multiplied

> + * Since delta is a 64 bit variable, to have an overflow its value should be
> + * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is
> + * not an issue here.
>   */
>  static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
>  {
> -	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
>  	u64 u_act;
> -	u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT;
> +	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
>  
>  	/*
> -	 * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)},
> -	 * we compare u_inact + rq->dl.extra_bw with
> -	 * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
> -	 * u_inact + rq->dl.extra_bw can be larger than
> -	 * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
> -	 * leading to wrong results)
> +	 * Instead of computing max{u, (u_max - u_inact - u_extra)}, we
> +	 * compare u_inact + u_extra with u_max - u, because u_inact + u_extra
> +	 * can be larger than u_max. So, u_max - u_inact - u_extra would be
> +	 * negative leading to wrong results.
>  	 */
> -	if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
> -		u_act = u_act_min;
> +	if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
> +		u_act = dl_se->dl_bw;
>  	else
> -		u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
> +		u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
>  
> +	u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
>  	return (delta * u_act) >> BW_SHIFT;
>  }
>  
> @@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
>  {
>  	if (global_rt_runtime() == RUNTIME_INF) {
>  		dl_rq->bw_ratio = 1 << RATIO_SHIFT;
> -		dl_rq->extra_bw = 1 << BW_SHIFT;
> +		dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
>  	} else {
>  		dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
>  			  global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
> -		dl_rq->extra_bw = to_ratio(global_rt_period(),
> -						    global_rt_runtime());
> +		dl_rq->max_bw = dl_rq->extra_bw =
> +			to_ratio(global_rt_period(), global_rt_runtime());
>  	}
>  }
>  
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3e8df6d31c1e..73027c2806dc 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -753,6 +753,12 @@ struct dl_rq {
>  	u64			this_bw;
>  	u64			extra_bw;
>  
> +	/*
> +	 * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
> +	 * tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).
> +	 */
> +	u64			max_bw;
> +
>  	/*
>  	 * Inverse of the fraction of CPU utilization that can be reclaimed
>  	 * by the GRUB algorithm.

Not related to this patch directly but I still can't see how `GRUB-PA`
with schedutil CPUfreq governor should work together with GRUB reclaiming.

CPU frequency is influenced by Uact (rq->dl.running_bw):

sugov_get_util() -> effective_cpu_util() -> cpu_bw_dl() ->

      return rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT

But the extra GRUB reclaim runtime is calculated based on rq->dl.max_bw
and AFAICS, Uact is not adjusted by scaled_delta_exec?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
  2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
                   ` (3 preceding siblings ...)
  2023-06-01 11:56 ` Dietmar Eggemann
@ 2023-06-01 12:54 ` Juri Lelli
  2023-06-19 11:04 ` [tip: sched/core] " tip-bot2 for Vineeth Pillai
  5 siblings, 0 replies; 12+ messages in thread
From: Juri Lelli @ 2023-06-01 12:54 UTC (permalink / raw)
  To: Vineeth Pillai
  Cc: luca.abeni, Daniel Bristot de Oliveira, Peter Zijlstra,
	Ingo Molnar, Vincent Guittot, Steven Rostedt, Joel Fernandes,
	youssefesmat, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Valentin Schneider, Jonathan Corbet, linux-kernel, linux-doc

Hi!

On 30/05/23 09:55, Vineeth Pillai wrote:
> According to the GRUB[1] rule, the runtime is depreciated as:
>   "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
> 
> To guarantee that deadline tasks doesn't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
>   "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)
> 
> Current implementation has a minor bug in equation (2), which this
> patch fixes.
> 
> The reclamation logic is verified by a sample program which creates
> multiple deadline threads and observing their utilization. The tests
> were run on an isolated cpu(isolcpus=3) on a 4 cpu system.
> 
> Tests on 6.3.0
> ==============
> 
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
> 
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> 
> RUN 3: 2 tasks
>   Task 1: runtime=1ms, deadline=period=10ms
>   Task 2: runtime=1ms, deadline=period=100ms
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23
> 
> As seen above, the reclamation doesn't reclaim the maximum allowed
> bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
> bandwidth also comes down.
> 
> Tests with this patch applied
> =============================
> 
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16
> 
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21
> 
> RUN 3: 2 tasks
>   Task 1: runtime=1ms, deadline=period=10ms
>   Task 2: runtime=1ms, deadline=period=100ms
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73
> 
> Running tasks on all cpus allowing for migration also showed that
> the utilization is reclaimed to the maximum. Running 10 tasks on
> 3 cpus SCHED_FLAG_RECLAIM - top shows:
> %Cpu0  : 94.6 us,  0.0 sy,  0.0 ni,  5.4 id,  0.0 wa
> %Cpu1  : 95.2 us,  0.0 sy,  0.0 ni,  4.8 id,  0.0 wa
> %Cpu2  : 95.8 us,  0.0 sy,  0.0 ni,  4.2 id,  0.0 wa
> 
> [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
>      (2015). Parallel and sequential reclaiming in multicore
>      real-time global scheduling.
> 
> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
> ---

This looks good to me too. Thanks a lot for working on this and of
course to Luca and Daniel who reviewed and played with it as well.

Acked-by: Juri Lelli <juri.lelli@redhat.com>

Best,
Juri


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation
  2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
  2023-05-30 14:12   ` luca abeni
  2023-06-01 10:35   ` Daniel Bristot de Oliveira
@ 2023-06-01 12:55   ` Juri Lelli
  2023-06-19 11:04   ` [tip: sched/core] " tip-bot2 for Vineeth Pillai
  3 siblings, 0 replies; 12+ messages in thread
From: Juri Lelli @ 2023-06-01 12:55 UTC (permalink / raw)
  To: Vineeth Pillai
  Cc: luca.abeni, Daniel Bristot de Oliveira, Peter Zijlstra,
	Ingo Molnar, Vincent Guittot, Steven Rostedt, Joel Fernandes,
	youssefesmat, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Valentin Schneider, Jonathan Corbet, linux-kernel, linux-doc

On 30/05/23 09:55, Vineeth Pillai wrote:
> Update the details of GRUB to reflect the updated logic.
> 
> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
> ---

Acked-by: Juri Lelli <juri.lelli@redhat.com>

Best,
Juri


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB
  2023-06-01 11:56 ` Dietmar Eggemann
@ 2023-06-05  1:57   ` Vineeth Remanan Pillai
  0 siblings, 0 replies; 12+ messages in thread
From: Vineeth Remanan Pillai @ 2023-06-05  1:57 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: luca.abeni, Juri Lelli, Daniel Bristot de Oliveira,
	Peter Zijlstra, Ingo Molnar, Vincent Guittot, Steven Rostedt,
	Joel Fernandes, youssefesmat, Ben Segall, Mel Gorman,
	Valentin Schneider, Jonathan Corbet, linux-kernel, linux-doc

Hi Dietmar,

Sorry for my late response..

> > + * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
> > + * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
>
> nit-pick:
>
> s/multiped/multiplied
>
Sorry I missed this. I am working on couple more GRUB fixes and will fix
this along with those changes.

> Not related to this patch directly but I still can't see how `GRUB-PA`
> with schedutil CPUfreq governor should work together with GRUB reclaiming.
>
> CPU frequency is influenced by Uact (rq->dl.running_bw):
>
> sugov_get_util() -> effective_cpu_util() -> cpu_bw_dl() ->
>
>       return rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT
>
> But the extra GRUB reclaim runtime is calculated based on rq->dl.max_bw
> and AFAICS, Uact is not adjusted by scaled_delta_exec?
>
I haven't had a chance to look at GRUB-PA till now and after reading
your mail, I had a quick read of GRUB-PA paper[1] :-). From what I
understood, the dea of GRUB-PA is not to reclaim to the maximum
allowable bandwidth, but to scale down the frequency to the required
utilization(running_bw). This way the tasks run longer(similar to
reclaiming) but using less power.

GRUB reclaim is calculated indirectly based on running_bw as well as
"Uinact = this_bw - running_bw". We factor in reserved and running bw
into the equation and then scale it down to max_bw. So if my
understanding is correct, GRUB-PA doesn't technically reclaim extra
processing capacity, but just allows the task to run longer at lower
frequency so as to save power. I might be wrong and not an expert
here. Luca, please correct me if I am wrong.

Thanks,
Vineeth

[1] C. Scordino, G. Lipari, A Resource Reservation Algorithm for Power-Aware
    Scheduling of Periodic and Aperiodic Real-Time Tasks, IEEE Transactions
    on Computers, December 2006.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tip: sched/core] sched/deadline: Fix bandwidth reclaim equation in GRUB
  2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
                   ` (4 preceding siblings ...)
  2023-06-01 12:54 ` Juri Lelli
@ 2023-06-19 11:04 ` tip-bot2 for Vineeth Pillai
  5 siblings, 0 replies; 12+ messages in thread
From: tip-bot2 for Vineeth Pillai @ 2023-06-19 11:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vineeth Pillai (Google), Peter Zijlstra (Intel),
	Daniel Bristot de Oliveira, Juri Lelli, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     6a9d623aad89539eca71eb264db6b9d538620ad5
Gitweb:        https://git.kernel.org/tip/6a9d623aad89539eca71eb264db6b9d538620ad5
Author:        Vineeth Pillai <vineeth@bitbyteword.org>
AuthorDate:    Tue, 30 May 2023 09:55:25 -04:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 16 Jun 2023 22:08:11 +02:00

sched/deadline: Fix bandwidth reclaim equation in GRUB

According to the GRUB[1] rule, the runtime is depreciated as:
  "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)

To guarantee that deadline tasks doesn't starve lower class tasks,
we do not allocate the full bandwidth of the cpu to deadline tasks.
Maximum bandwidth usable by deadline tasks is denoted by "Umax".
Considering Umax, equation (1) becomes:
  "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)

Current implementation has a minor bug in equation (2), which this
patch fixes.

The reclamation logic is verified by a sample program which creates
multiple deadline threads and observing their utilization. The tests
were run on an isolated cpu(isolcpus=3) on a 4 cpu system.

Tests on 6.3.0
==============

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69

RUN 3: 2 tasks
  Task 1: runtime=1ms, deadline=period=10ms
  Task 2: runtime=1ms, deadline=period=100ms
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23

As seen above, the reclamation doesn't reclaim the maximum allowed
bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
bandwidth also comes down.

Tests with this patch applied
=============================

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21

RUN 3: 2 tasks
  Task 1: runtime=1ms, deadline=period=10ms
  Task 2: runtime=1ms, deadline=period=100ms
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73

Running tasks on all cpus allowing for migration also showed that
the utilization is reclaimed to the maximum. Running 10 tasks on
3 cpus SCHED_FLAG_RECLAIM - top shows:
%Cpu0  : 94.6 us,  0.0 sy,  0.0 ni,  5.4 id,  0.0 wa
%Cpu1  : 95.2 us,  0.0 sy,  0.0 ni,  4.8 id,  0.0 wa
%Cpu2  : 95.8 us,  0.0 sy,  0.0 ni,  4.2 id,  0.0 wa

[1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
     (2015). Parallel and sequential reclaiming in multicore
     real-time global scheduling.

Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20230530135526.2385378-1-vineeth@bitbyteword.org
---
 kernel/sched/deadline.c | 50 ++++++++++++++++++----------------------
 kernel/sched/sched.h    |  6 +++++-
 2 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f827067..e41a36b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1253,43 +1253,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
 }
 
 /*
- * This function implements the GRUB accounting rule:
- * according to the GRUB reclaiming algorithm, the runtime is
- * not decreased as "dq = -dt", but as
- * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
+ * This function implements the GRUB accounting rule. According to the
+ * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
+ * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
  * where u is the utilization of the task, Umax is the maximum reclaimable
  * utilization, Uinact is the (per-runqueue) inactive utilization, computed
  * as the difference between the "total runqueue utilization" and the
- * runqueue active utilization, and Uextra is the (per runqueue) extra
+ * "runqueue active utilization", and Uextra is the (per runqueue) extra
  * reclaimable utilization.
- * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
- * multiplied by 2^BW_SHIFT, the result has to be shifted right by
- * BW_SHIFT.
- * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT,
- * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
- * Since delta is a 64 bit variable, to have an overflow its value
- * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
- * So, overflow is not an issue here.
+ * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied
+ * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
+ * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
+ * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
+ * Since delta is a 64 bit variable, to have an overflow its value should be
+ * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is
+ * not an issue here.
  */
 static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 {
-	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
 	u64 u_act;
-	u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT;
+	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
 
 	/*
-	 * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)},
-	 * we compare u_inact + rq->dl.extra_bw with
-	 * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
-	 * u_inact + rq->dl.extra_bw can be larger than
-	 * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
-	 * leading to wrong results)
+	 * Instead of computing max{u, (u_max - u_inact - u_extra)}, we
+	 * compare u_inact + u_extra with u_max - u, because u_inact + u_extra
+	 * can be larger than u_max. So, u_max - u_inact - u_extra would be
+	 * negative leading to wrong results.
 	 */
-	if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
-		u_act = u_act_min;
+	if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
+		u_act = dl_se->dl_bw;
 	else
-		u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
+		u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
 
+	u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
 	return (delta * u_act) >> BW_SHIFT;
 }
 
@@ -2788,12 +2784,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
 {
 	if (global_rt_runtime() == RUNTIME_INF) {
 		dl_rq->bw_ratio = 1 << RATIO_SHIFT;
-		dl_rq->extra_bw = 1 << BW_SHIFT;
+		dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
 	} else {
 		dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
 			  global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
-		dl_rq->extra_bw = to_ratio(global_rt_period(),
-						    global_rt_runtime());
+		dl_rq->max_bw = dl_rq->extra_bw =
+			to_ratio(global_rt_period(), global_rt_runtime());
 	}
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 556496c..36e23e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -748,6 +748,12 @@ struct dl_rq {
 	u64			extra_bw;
 
 	/*
+	 * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
+	 * tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).
+	 */
+	u64			max_bw;
+
+	/*
 	 * Inverse of the fraction of CPU utilization that can be reclaimed
 	 * by the GRUB algorithm.
 	 */

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [tip: sched/core] sched/deadline: Update GRUB description in the documentation
  2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
                     ` (2 preceding siblings ...)
  2023-06-01 12:55   ` Juri Lelli
@ 2023-06-19 11:04   ` tip-bot2 for Vineeth Pillai
  3 siblings, 0 replies; 12+ messages in thread
From: tip-bot2 for Vineeth Pillai @ 2023-06-19 11:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vineeth Pillai (Google), Peter Zijlstra (Intel),
	Daniel Bristot de Oliveira, Juri Lelli, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     e20f204c88d595c04fc9197794bb68c0fbabd902
Gitweb:        https://git.kernel.org/tip/e20f204c88d595c04fc9197794bb68c0fbabd902
Author:        Vineeth Pillai <vineeth@bitbyteword.org>
AuthorDate:    Tue, 30 May 2023 09:55:26 -04:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 16 Jun 2023 22:08:12 +02:00

sched/deadline: Update GRUB description in the documentation

Update the details of GRUB to reflect the updated logic.

Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20230530135526.2385378-2-vineeth@bitbyteword.org
---
 Documentation/scheduler/sched-deadline.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/scheduler/sched-deadline.rst b/Documentation/scheduler/sched-deadline.rst
index 9d9be52..9fe4846 100644
--- a/Documentation/scheduler/sched-deadline.rst
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -203,12 +203,15 @@ Deadline Task Scheduling
   - Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
     runqueue, including the tasks in Inactive state.
 
+  - Maximum usable bandwidth (max_bw): This is the maximum bandwidth usable by
+    deadline tasks and is currently set to the RT capacity.
+
 
  The algorithm reclaims the bandwidth of the tasks in Inactive state.
  It does so by decrementing the runtime of the executing task Ti at a pace equal
  to
 
-           dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
+           dq = -(max{ Ui, (Umax - Uinact - Uextra) } / Umax) dt
 
  where:
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-06-19 11:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-30 13:55 [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB Vineeth Pillai
2023-05-30 13:55 ` [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation Vineeth Pillai
2023-05-30 14:12   ` luca abeni
2023-06-01 10:35   ` Daniel Bristot de Oliveira
2023-06-01 12:55   ` Juri Lelli
2023-06-19 11:04   ` [tip: sched/core] " tip-bot2 for Vineeth Pillai
2023-05-30 14:12 ` [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB luca abeni
2023-06-01 10:34 ` Daniel Bristot de Oliveira
2023-06-01 11:56 ` Dietmar Eggemann
2023-06-05  1:57   ` Vineeth Remanan Pillai
2023-06-01 12:54 ` Juri Lelli
2023-06-19 11:04 ` [tip: sched/core] " tip-bot2 for Vineeth Pillai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.