linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] Add statistics and document for cfs bandwidth burst
@ 2021-07-30  7:09 Huaixin Chang
  2021-07-30  7:09 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Huaixin Chang @ 2021-07-30  7:09 UTC (permalink / raw)
  To: peterz
  Cc: anderson, baruah, bsegall, changhuaixin, dietmar.eggemann,
	dtcccc, juri.lelli, khlebnikov, linux-kernel, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong

Huaixin Chang (2):
  sched/fair: Add cfs bandwidth burst statistics
  sched/fair: Add document for burstable CFS bandwidth

 Documentation/admin-guide/cgroup-v2.rst |  8 ++++
 Documentation/scheduler/sched-bwc.rst   | 85 +++++++++++++++++++++++++++++----
 kernel/sched/core.c                     | 13 +++--
 kernel/sched/fair.c                     |  9 ++++
 kernel/sched/sched.h                    |  3 ++
 5 files changed, 105 insertions(+), 13 deletions(-)

-- 
2.14.4.44.g2045bb6
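
For readers who want to consume the two counters this series adds to
cpu.stat (nr_bursts and burst_usec), here is a minimal userspace sketch
in C. It assumes cpu.stat is plain text with one "key value" pair per
line, as the seq_printf() calls in patch 1 suggest. stat_lookup() is a
hypothetical helper name, not part of the kernel or of any library.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "key value" in cpu.stat-style text.
 * Returns 0 and stores the value in *out on success, -1 if the key
 * is absent or has no numeric value.
 */
static int stat_lookup(const char *text, const char *key,
		       unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* The key must start the line and be followed by a space. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ' &&
		    sscanf(line + klen + 1, "%llu", out) == 1)
			return 0;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return -1;
}
```

Reading the file itself (for example with fopen() and fread() on the
group's cpu.stat) is left out of the sketch. Dividing burst_usec by
nr_bursts then gives the average time above quota per bursting period.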



* [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics
  2021-07-30  7:09 [PATCH 0/2] Add statistics and document for cfs bandwidth burst Huaixin Chang
@ 2021-07-30  7:09 ` Huaixin Chang
  2021-08-12 12:18   ` changhuaixin
  2021-07-30  7:09 ` [PATCH 2/2] sched/fair: Add document for burstable CFS bandwidth Huaixin Chang
  2021-08-12 21:38 ` [PATCH 0/2] Add statistics and document for cfs bandwidth burst Tejun Heo
  2 siblings, 1 reply; 5+ messages in thread
From: Huaixin Chang @ 2021-07-30  7:09 UTC (permalink / raw)
  To: peterz
  Cc: anderson, baruah, bsegall, changhuaixin, dietmar.eggemann,
	dtcccc, juri.lelli, khlebnikov, linux-kernel, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong

Two new statistics are introduced to expose the internals of the burst
feature and to explain why burst helps or not.

nr_bursts:  number of periods in which a bandwidth burst occurs
burst_usec: cumulative wall-time that any CPUs have used
	    above quota in the respective periods

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
---
 kernel/sched/core.c  | 13 ++++++++++---
 kernel/sched/fair.c  |  9 +++++++++
 kernel/sched/sched.h |  3 +++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d9ff40f4661..9a286c8a1354 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10088,6 +10088,9 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
 		seq_printf(sf, "wait_sum %llu\n", ws);
 	}
 
+	seq_printf(sf, "nr_bursts %d\n", cfs_b->nr_burst);
+	seq_printf(sf, "burst_usec %llu\n", div_u64(cfs_b->burst_time, NSEC_PER_USEC));
+
 	return 0;
 }
 #endif /* CONFIG_CFS_BANDWIDTH */
@@ -10184,16 +10187,20 @@ static int cpu_extra_stat_show(struct seq_file *sf,
 	{
 		struct task_group *tg = css_tg(css);
 		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
-		u64 throttled_usec;
+		u64 throttled_usec, burst_usec;
 
 		throttled_usec = cfs_b->throttled_time;
 		do_div(throttled_usec, NSEC_PER_USEC);
+		burst_usec = cfs_b->burst_time;
+		do_div(burst_usec, NSEC_PER_USEC);
 
 		seq_printf(sf, "nr_periods %d\n"
 			   "nr_throttled %d\n"
-			   "throttled_usec %llu\n",
+			   "throttled_usec %llu\n"
+			   "nr_bursts %d\n"
+			   "burst_usec %llu\n",
 			   cfs_b->nr_periods, cfs_b->nr_throttled,
-			   throttled_usec);
+			   throttled_usec, cfs_b->nr_burst, burst_usec);
 	}
 #endif
 	return 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 44c452072a1b..464371f364f1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4655,11 +4655,20 @@ static inline u64 sched_cfs_bandwidth_slice(void)
  */
 void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 {
+	s64 runtime;
+
 	if (unlikely(cfs_b->quota == RUNTIME_INF))
 		return;
 
 	cfs_b->runtime += cfs_b->quota;
+	runtime = cfs_b->runtime_snap - cfs_b->runtime;
+	if (runtime > 0) {
+		cfs_b->burst_time += runtime;
+		cfs_b->nr_burst++;
+	}
+
 	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
+	cfs_b->runtime_snap = cfs_b->runtime;
 }
 
 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 14a41a243f7b..80e4322727b4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -367,6 +367,7 @@ struct cfs_bandwidth {
 	u64			quota;
 	u64			runtime;
 	u64			burst;
+	u64			runtime_snap;
 	s64			hierarchical_quota;
 
 	u8			idle;
@@ -379,7 +380,9 @@ struct cfs_bandwidth {
 	/* Statistics: */
 	int			nr_periods;
 	int			nr_throttled;
+	int			nr_burst;
 	u64			throttled_time;
+	u64			burst_time;
 #endif
 };
 
-- 
2.14.4.44.g2045bb6
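
The accounting added to __refill_cfs_bandwidth_runtime() can be tried
outside the kernel. The following is a userspace sketch under stated
assumptions: struct bw_sim and both helper names are hypothetical
stand-ins for the struct cfs_bandwidth fields touched by this patch,
the RUNTIME_INF early return is omitted, and consumption is modelled as
a simple subtraction rather than the per-CPU slice transfers the
scheduler really performs.

```c
#include <stdint.h>

/* Userspace stand-in for the struct cfs_bandwidth fields used here. */
struct bw_sim {
	uint64_t quota;		/* runtime replenished each period */
	uint64_t burst;		/* cap on accumulated unused runtime */
	uint64_t runtime;	/* runtime currently available */
	uint64_t runtime_snap;	/* runtime right after the last refill */
	int nr_burst;		/* periods in which usage exceeded quota */
	uint64_t burst_time;	/* cumulative usage above quota */
};

/* Hypothetical helper: use up runtime, never more than what is left. */
static void bw_sim_consume(struct bw_sim *b, uint64_t used)
{
	b->runtime -= (used < b->runtime) ? used : b->runtime;
}

/* Same steps as the patched refill function, minus the RUNTIME_INF check. */
static void bw_sim_refill(struct bw_sim *b)
{
	uint64_t cap = b->quota + b->burst;
	int64_t over;

	b->runtime += b->quota;
	/* snapshot - (snapshot - used + quota) == used - quota */
	over = (int64_t)(b->runtime_snap - b->runtime);
	if (over > 0) {
		b->burst_time += (uint64_t)over;
		b->nr_burst++;
	}

	b->runtime = (b->runtime < cap) ? b->runtime : cap;
	b->runtime_snap = b->runtime;
}
```

With quota = 20 and burst = 10 (any time unit), starting from
runtime = runtime_snap = 20, a period that uses 5 followed by a period
that uses 28 leaves nr_burst at 1 and burst_time at 8: only the second
period ran above quota, by 8 units.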



* [PATCH 2/2] sched/fair: Add document for burstable CFS bandwidth
  2021-07-30  7:09 [PATCH 0/2] Add statistics and document for cfs bandwidth burst Huaixin Chang
  2021-07-30  7:09 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
@ 2021-07-30  7:09 ` Huaixin Chang
  2021-08-12 21:38 ` [PATCH 0/2] Add statistics and document for cfs bandwidth burst Tejun Heo
  2 siblings, 0 replies; 5+ messages in thread
From: Huaixin Chang @ 2021-07-30  7:09 UTC (permalink / raw)
  To: peterz
  Cc: anderson, baruah, bsegall, changhuaixin, dietmar.eggemann,
	dtcccc, juri.lelli, khlebnikov, linux-kernel, luca.abeni,
	mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj,
	tommaso.cucinotta, vincent.guittot, xiyou.wangcong

Basic description of usage and effect for CFS Bandwidth Control Burst.

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  8 ++++
 Documentation/scheduler/sched-bwc.rst   | 85 +++++++++++++++++++++++++++++----
 2 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 5c7377b5bd3e..c79477089c53 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1016,6 +1016,8 @@ All time durations are in microseconds.
 	- nr_periods
 	- nr_throttled
 	- throttled_usec
+	- nr_bursts
+	- burst_usec
 
   cpu.weight
 	A read-write single value file which exists on non-root
@@ -1047,6 +1049,12 @@ All time durations are in microseconds.
 	$PERIOD duration.  "max" for $MAX indicates no limit.  If only
 	one number is written, $MAX is updated.
 
+  cpu.max.burst
+	A read-write single value file which exists on non-root
+	cgroups.  The default is "0".
+
+	The burst must be in the range [0, $MAX].
+
   cpu.pressure
 	A read-write nested-keyed file.
 
diff --git a/Documentation/scheduler/sched-bwc.rst b/Documentation/scheduler/sched-bwc.rst
index 1fc73555f5c4..0b2a3b2e3369 100644
--- a/Documentation/scheduler/sched-bwc.rst
+++ b/Documentation/scheduler/sched-bwc.rst
@@ -22,39 +22,89 @@ cfs_quota units at each period boundary. As threads consume this bandwidth it
 is transferred to cpu-local "silos" on a demand basis. The amount transferred
 within each of these updates is tunable and described as the "slice".
 
+Burst feature
+-------------
+This feature borrows time now against our future underrun, at the cost of
+increased interference against the other system users. All nicely bounded.
+
+Traditional (UP-EDF) bandwidth control is something like:
+
+  (U = \Sum u_i) <= 1
+
+This guarantees both that every deadline is met and that the system is
+stable. After all, if U were > 1, then for every second of walltime,
+we'd have to run more than a second of program time, and obviously miss
+our deadline, but the next deadline will be further out still, there is
+never time to catch up, unbounded fail.
+
+The burst feature observes that a workload doesn't always execute the full
+quota; this enables one to describe u_i as a statistical distribution.
+
+For example, have u_i = {x,e}_i, where x is the p(95) and x+e p(100)
+(the traditional WCET). This effectively allows u to be smaller,
+increasing the efficiency (we can pack more tasks in the system), but at
+the cost of missing deadlines when all the odds line up. However, it
+does maintain stability, since every overrun must be paired with an
+underrun as long as our x is above the average.
+
+That is, suppose we have 2 tasks, both specify a p(95) value, then we
+have a p(95)*p(95) = 90.25% chance both tasks are within their quota and
+everything is good. At the same time we have a p(5)p(5) = 0.25% chance
+both tasks will exceed their quota at the same time (guaranteed deadline
+fail). Somewhere in between there's a threshold where one exceeds and
+the other doesn't underrun enough to compensate; this depends on the
+specific CDFs.
+
+At the same time, we can say that the worst case deadline miss will be
+\Sum e_i; that is, there is a bounded tardiness (under the assumption
+that x+e is indeed WCET).
+
+The interference when using burst is evaluated by the probability of
+missing the deadline and the average WCET. Test results showed that when
+there are many cgroups or the CPU is underutilized, the interference is
+limited. More details are shown in:
+https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/
+
 Management
 ----------
-Quota and period are managed within the cpu subsystem via cgroupfs.
+Quota, period and burst are managed within the cpu subsystem via cgroupfs.
 
 .. note::
    The cgroupfs files described in this section are only applicable
    to cgroup v1. For cgroup v2, see
    :ref:`Documentation/admin-guide/cgroup-v2.rst <cgroup-v2-cpu>`.
 
-- cpu.cfs_quota_us: the total available run-time within a period (in
-  microseconds)
+- cpu.cfs_quota_us: run-time replenished within a period (in microseconds)
 - cpu.cfs_period_us: the length of a period (in microseconds)
 - cpu.stat: exports throttling statistics [explained further below]
+- cpu.cfs_burst_us: the maximum accumulated run-time (in microseconds)
 
 The default values are::
 
 	cpu.cfs_period_us=100ms
-	cpu.cfs_quota=-1
+	cpu.cfs_quota_us=-1
+	cpu.cfs_burst_us=0
 
 A value of -1 for cpu.cfs_quota_us indicates that the group does not have any
 bandwidth restriction in place, such a group is described as an unconstrained
 bandwidth group. This represents the traditional work-conserving behavior for
 CFS.
 
-Writing any (valid) positive value(s) will enact the specified bandwidth limit.
-The minimum quota allowed for the quota or period is 1ms. There is also an
-upper bound on the period length of 1s. Additional restrictions exist when
-bandwidth limits are used in a hierarchical fashion, these are explained in
-more detail below.
+Writing any (valid) positive value(s) no smaller than cpu.cfs_burst_us will
+enact the specified bandwidth limit. The minimum quota allowed for the quota or
+period is 1ms. There is also an upper bound on the period length of 1s.
+Additional restrictions exist when bandwidth limits are used in a hierarchical
+fashion, these are explained in more detail below.
 
 Writing any negative value to cpu.cfs_quota_us will remove the bandwidth limit
 and return the group to an unconstrained state once more.
 
+A value of 0 for cpu.cfs_burst_us indicates that the group can not accumulate
+any unused bandwidth. It makes the traditional bandwidth control behavior for
+CFS unchanged. Writing any (valid) positive value(s) no larger than
+cpu.cfs_quota_us into cpu.cfs_burst_us will enact the cap on unused bandwidth
+accumulation.
+
 Any updates to a group's bandwidth specification will result in it becoming
 unthrottled if it is in a constrained state.
 
@@ -74,7 +124,7 @@ for more fine-grained consumption.
 
 Statistics
 ----------
-A group's bandwidth statistics are exported via 3 fields in cpu.stat.
+A group's bandwidth statistics are exported via 5 fields in cpu.stat.
 
 cpu.stat:
 
@@ -82,6 +132,9 @@ cpu.stat:
 - nr_throttled: Number of times the group has been throttled/limited.
 - throttled_time: The total time duration (in nanoseconds) for which entities
   of the group have been throttled.
+- nr_bursts: Number of periods in which a burst occurs.
+- burst_usec: Cumulative wall-time that any CPUs have used above quota in
+  the respective periods.
 
 This interface is read-only.
 
@@ -179,3 +232,15 @@ Examples
 
    By using a small period here we are ensuring a consistent latency
    response at the expense of burst capacity.
+
+4. Limit a group to 40% of 1 CPU, and allow it to accumulate up to 20% of
+   1 CPU additionally, in case accumulation has been done.
+
+   With 50ms period, 20ms quota will be equivalent to 40% of 1 CPU.
+   And 10ms burst will be equivalent to 20% of 1 CPU.
+
+	# echo 20000 > cpu.cfs_quota_us /* quota = 20ms */
+	# echo 50000 > cpu.cfs_period_us /* period = 50ms */
+	# echo 10000 > cpu.cfs_burst_us /* burst = 10ms */
+
+   A larger burst setting (no larger than quota) allows greater burst capacity.
-- 
2.14.4.44.g2045bb6
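
The constraints and the arithmetic in example 4 of the documentation
above can be checked with a short userspace sketch. It only models a
constrained group (a quota of -1, meaning no limit, is out of scope),
it ignores the hierarchical restrictions the document mentions, and the
function and macro names are illustrative assumptions rather than
kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds taken from the documentation text; the names are made up. */
#define MIN_QUOTA_PERIOD_US	1000ULL		/* 1ms minimum, quota and period */
#define MAX_PERIOD_US		1000000ULL	/* 1s maximum period length */

/* True if the three cgroup v1 values satisfy the documented limits. */
static bool cfs_settings_valid(uint64_t quota_us, uint64_t period_us,
			       uint64_t burst_us)
{
	if (period_us < MIN_QUOTA_PERIOD_US || period_us > MAX_PERIOD_US)
		return false;
	if (quota_us < MIN_QUOTA_PERIOD_US)
		return false;
	/* Burst may be 0 (feature off) and must not exceed quota. */
	return burst_us <= quota_us;
}

/* Share of one CPU, in percent, that runtime_us represents per period. */
static double cpu_share_percent(uint64_t runtime_us, uint64_t period_us)
{
	return 100.0 * (double)runtime_us / (double)period_us;
}
```

For the values in example 4, cfs_settings_valid(20000, 50000, 10000)
holds, and cpu_share_percent() gives 40 for the quota and 20 for the
burst. Call the validity check first, since cpu_share_percent() does
not guard against a zero period.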



* Re: [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics
  2021-07-30  7:09 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
@ 2021-08-12 12:18   ` changhuaixin
  0 siblings, 0 replies; 5+ messages in thread
From: changhuaixin @ 2021-08-12 12:18 UTC (permalink / raw)
  To: Huaixin Chang
  Cc: Peter Zijlstra, anderson, baruah, Benjamin Segall,
	Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list,
	luca.abeni, Mel Gorman, Ingo Molnar, Odin Ugedal, Odin Ugedal,
	pauld, Paul Turner, Steven Rostedt, Shanpei Chen, Tejun Heo,
	tommaso.cucinotta, Vincent Guittot, xiyou.wangcong

Ping.

The statistics code is simpler than the version discussed before. Mind having a look at it?




* Re: [PATCH 0/2] Add statistics and document for cfs bandwidth burst
  2021-07-30  7:09 [PATCH 0/2] Add statistics and document for cfs bandwidth burst Huaixin Chang
  2021-07-30  7:09 ` [PATCH 1/2] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
  2021-07-30  7:09 ` [PATCH 2/2] sched/fair: Add document for burstable CFS bandwidth Huaixin Chang
@ 2021-08-12 21:38 ` Tejun Heo
  2 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2021-08-12 21:38 UTC (permalink / raw)
  To: Huaixin Chang
  Cc: peterz, anderson, baruah, bsegall, dietmar.eggemann, dtcccc,
	juri.lelli, khlebnikov, linux-kernel, luca.abeni, mgorman, mingo,
	odin, odin, pauld, pjt, rostedt, shanpeic, tommaso.cucinotta,
	vincent.guittot, xiyou.wangcong

On Fri, Jul 30, 2021 at 03:09:54PM +0800, Huaixin Chang wrote:
> Huaixin Chang (2):
>   sched/fair: Add cfs bandwidth burst statistics
>   sched/fair: Add document for burstable CFS bandwidth

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

