* [RFC PATCH] sched: Enable root level cgroup bandwidth control
@ 2022-05-18 10:08 Fam Zheng
  2022-05-18 10:21 ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Fam Zheng @ 2022-05-18 10:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, zhouchengming, Vincent Guittot, fam,
	Peter Zijlstra, Mel Gorman, Ingo Molnar, songmuchun, Juri Lelli,
	Fam Zheng

In the data center there sometimes comes a need to throttle down a
server. Cgroup is a natural choice for reducing the CPU quota of running
tasks, but there is no such interface for the root group.

Alternative solutions such as cpufreq control exist, with the help
of e.g. intel-pstate or acpi-cpufreq, but they are not always available,
depending on the hardware and BIOS.

This patch allows capping the global cpu utilization.

Currently, writing a positive integer to the v1 root cgroup file:

        /sys/fs/cgroup/cpu/cpu.cfs_quota_us

is rejected by the kernel (-EINVAL). There are no such entries in v2
either, because of the CFTYPE_NOT_ON_ROOT flag.

Remove this limitation, and check the root cfs_rq's throttled state when
picking the next task.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Fam Zheng <fam.zheng@bytedance.com>
---
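The rejection described above can be probed from user space. The following is a minimal sketch, not part of the patch: `quota_for_percent()` and `write_cgroup_value()` are hypothetical helper names, and the caller is assumed to pass the path of the control file (for v1, `/sys/fs/cgroup/cpu/cpu.cfs_quota_us`) and a period that matches `cpu.cfs_period_us`.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Quota (microseconds of runtime per period) that caps `ncpus` CPUs at
 * `percent` of their combined capacity. Hypothetical helper, not a
 * kernel API.
 */
static long long quota_for_percent(int ncpus, long long period_us, int percent)
{
	return (long long)ncpus * period_us * percent / 100;
}

/*
 * Write a decimal value to a cgroup control file.
 * Returns 0 on success or a negative errno, for example -EINVAL when
 * the kernel rejects the write.
 */
static int write_cgroup_value(const char *path, long long value)
{
	char buf[32];
	int len, fd, ret = 0;
	ssize_t n;

	len = snprintf(buf, sizeof(buf), "%lld\n", value);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, buf, len);
	if (n < 0)
		ret = -errno;		/* kernel refused the value */
	else if (n != len)
		ret = -EIO;		/* short write, treated as an error */

	close(fd);
	return ret;
}
```

With a period of 100000 us, `quota_for_percent(8, 100000, 50)` yields 400000, i.e. half of an 8-CPU machine. On a kernel without this patch, passing that value and the v1 root path to `write_cgroup_value()` should return -EINVAL, per the description above.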
 kernel/sched/core.c | 13 ++++---------
 kernel/sched/fair.c |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d58c0389eb23..c30c8a4d006a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10402,9 +10402,6 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota,
 	int i, ret = 0, runtime_enabled, runtime_was_enabled;
 	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
 
-	if (tg == &root_task_group)
-		return -EINVAL;
-
 	/*
 	 * Ensure we have at some amount of bandwidth every period.  This is
 	 * to prevent reaching a state of large arrears when throttled via
@@ -10632,12 +10629,10 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data)
 	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
 	s64 quota = 0, parent_quota = -1;
 
-	if (!tg->parent) {
-		quota = RUNTIME_INF;
-	} else {
+	quota = normalize_cfs_quota(tg, d);
+	if (tg->parent) {
 		struct cfs_bandwidth *parent_b = &tg->parent->cfs_bandwidth;
 
-		quota = normalize_cfs_quota(tg, d);
 		parent_quota = parent_b->hierarchical_quota;
 
 		/*
@@ -10983,13 +10978,13 @@ static struct cftype cpu_files[] = {
 #ifdef CONFIG_CFS_BANDWIDTH
 	{
 		.name = "max",
-		.flags = CFTYPE_NOT_ON_ROOT,
+		.flags = 0,
 		.seq_show = cpu_max_show,
 		.write = cpu_max_write,
 	},
 	{
 		.name = "max.burst",
-		.flags = CFTYPE_NOT_ON_ROOT,
+		.flags = 0,
 		.read_u64 = cpu_cfs_burst_read_u64,
 		.write_u64 = cpu_cfs_burst_write_u64,
 	},
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a68482d66535..dd8c7eb9b648 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7310,7 +7310,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 			if (unlikely(check_cfs_rq_runtime(cfs_rq))) {
 				cfs_rq = &rq->cfs;
 
-				if (!cfs_rq->nr_running)
+				if (!cfs_rq->nr_running || cfs_rq_throttled(cfs_rq))
 					goto idle;
 
 				goto simple;
-- 
2.25.1
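
The effect of the tg_cfs_schedulable_down() hunk can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: `struct group`, `schedulable_down()` and the `root_capped` switch are hypothetical, and the child-versus-parent rules are my reading of the code that the hunk's context elides (v2 takes the minimum; v1 inherits when unset and rejects a child quota above the parent's).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stands in for the kernel's "no limit" value. */
#define RUNTIME_INF UINT64_MAX

/* Hypothetical, simplified stand-in for struct task_group. */
struct group {
	struct group *parent;	/* NULL for the root group */
	uint64_t quota_ratio;	/* normalized quota/period, or RUNTIME_INF */
	uint64_t hierarchical;	/* effective limit, filled in by the walk */
};

/*
 * Compute the effective limit of one group, assuming its parent has
 * already been visited. root_capped selects the pre-patch (0) or
 * post-patch (1) treatment of the root; v2 selects cgroup v2 rules.
 */
static int schedulable_down(struct group *g, int root_capped, int v2)
{
	uint64_t quota;

	if (!g->parent) {
		/* Pre-patch: the root is always unlimited. */
		quota = root_capped ? g->quota_ratio : RUNTIME_INF;
	} else {
		uint64_t parent_quota = g->parent->hierarchical;

		quota = g->quota_ratio;
		if (v2) {
			/* Assumed v2 rule: always take the minimum. */
			if (parent_quota < quota)
				quota = parent_quota;
		} else if (quota == RUNTIME_INF) {
			/* Assumed v1 rule: inherit when no limit is set. */
			quota = parent_quota;
		} else if (parent_quota != RUNTIME_INF && quota > parent_quota) {
			return -EINVAL;
		}
	}

	g->hierarchical = quota;
	return 0;
}
```

In this model, a root with a ratio of 500 and an unlimited child gives the child an effective limit of 500 once `root_capped` is set, whereas with `root_capped` cleared both stay unlimited.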



* Re: [RFC PATCH] sched: Enable root level cgroup bandwidth control
  2022-05-18 10:08 [RFC PATCH] sched: Enable root level cgroup bandwidth control Fam Zheng
@ 2022-05-18 10:21 ` Peter Zijlstra
  2022-05-18 10:38   ` [External] " Feiran Zheng .
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2022-05-18 10:21 UTC (permalink / raw)
  To: Fam Zheng
  Cc: linux-kernel, Steven Rostedt, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, zhouchengming,
	Vincent Guittot, fam, Mel Gorman, Ingo Molnar, songmuchun,
	Juri Lelli

On Wed, May 18, 2022 at 11:08:41AM +0100, Fam Zheng wrote:
> In the data center there sometimes comes a need to throttle down a
> server, 

Why?


* Re: [External] Re: [RFC PATCH] sched: Enable root level cgroup bandwidth control
  2022-05-18 10:21 ` Peter Zijlstra
@ 2022-05-18 10:38   ` Feiran Zheng .
  2022-05-18 12:03     ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Feiran Zheng . @ 2022-05-18 10:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Steven Rostedt, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, zhouchengming,
	Vincent Guittot, fam, Mel Gorman, Ingo Molnar, songmuchun,
	Juri Lelli

On Wed, May 18, 2022 at 11:21 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, May 18, 2022 at 11:08:41AM +0100, Fam Zheng wrote:
> > In the data center there sometimes comes a need to throttle down a
> > server,
>
> Why?

For economic reasons there can be over-provisioning in DC power
supply (UPS capacity etc.) because the racks are not expected to reach
maximum utilization. But the workload can be client driven,
depending on how many users are online, and in the end the power
supply may overload and trip. To avoid that, once a threshold is reached,
some servers need to be brought down or throttled. The latter is
obviously going to be much smoother.


* Re: [External] Re: [RFC PATCH] sched: Enable root level cgroup bandwidth control
  2022-05-18 10:38   ` [External] " Feiran Zheng .
@ 2022-05-18 12:03     ` Vincent Guittot
  2022-05-18 12:55       ` Feiran Zheng .
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2022-05-18 12:03 UTC (permalink / raw)
  To: Feiran Zheng .
  Cc: Peter Zijlstra, linux-kernel, Steven Rostedt, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, zhouchengming, fam,
	Mel Gorman, Ingo Molnar, songmuchun, Juri Lelli

On Wed, 18 May 2022 at 12:38, Feiran Zheng . <fam.zheng@bytedance.com> wrote:
>
> On Wed, May 18, 2022 at 11:21 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, May 18, 2022 at 11:08:41AM +0100, Fam Zheng wrote:
> > > In the data center there sometimes comes a need to throttle down a
> > > server,
> >
> > Why?
>
> For economical reasons there can be over-provisioning in DC power
> supply (UPS capacity etc) because the utilization expectation of the
> racks is not maximum value. But the workload can be client driven,
> depending on how many users are online, and in the end the power
> supply may overload and trip itself. To avoid that, upon a threshold,
> some servers need to be brought down or throttled. The latter is
> obviously going to be much more smooth.

This looks like thermal or power budget management. We have other ways
to do so with powercap or idle injection. Did you consider those
solutions ?


* Re: [External] Re: [RFC PATCH] sched: Enable root level cgroup bandwidth control
  2022-05-18 12:03     ` Vincent Guittot
@ 2022-05-18 12:55       ` Feiran Zheng .
  2022-05-18 14:31         ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Feiran Zheng . @ 2022-05-18 12:55 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, linux-kernel, Steven Rostedt, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, zhouchengming, fam,
	Mel Gorman, Ingo Molnar, songmuchun, Juri Lelli

On Wed, May 18, 2022 at 1:03 PM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Wed, 18 May 2022 at 12:38, Feiran Zheng . <fam.zheng@bytedance.com> wrote:
> >
> > On Wed, May 18, 2022 at 11:21 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, May 18, 2022 at 11:08:41AM +0100, Fam Zheng wrote:
> > > > In the data center there sometimes comes a need to throttle down a
> > > > server,
> > >
> > > Why?
> >
> > For economical reasons there can be over-provisioning in DC power
> > supply (UPS capacity etc) because the utilization expectation of the
> > racks is not maximum value. But the workload can be client driven,
> > depending on how many users are online, and in the end the power
> > supply may overload and trip itself. To avoid that, upon a threshold,
> > some servers need to be brought down or throttled. The latter is
> > obviously going to be much more smooth.
>
> This looks like thermal or power budget management. We have other ways
> to do so with powercap or idle injection. Did you consider those
> solutions ?

Hi Vincent,

I looked at powercap, and it seems to be Intel only? Any idea about AMD/ARM?
There seems to be nothing for them under drivers/powercap/.

I don't know the idle injection interface; can you please give more hints?

I also plan to test uclamp, but I still need to learn more about that.

Fam


* Re: [External] Re: [RFC PATCH] sched: Enable root level cgroup bandwidth control
  2022-05-18 12:55       ` Feiran Zheng .
@ 2022-05-18 14:31         ` Vincent Guittot
  2022-05-18 15:55           ` Feiran Zheng .
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2022-05-18 14:31 UTC (permalink / raw)
  To: Feiran Zheng .
  Cc: Peter Zijlstra, linux-kernel, Steven Rostedt, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, zhouchengming, fam,
	Mel Gorman, Ingo Molnar, songmuchun, Juri Lelli, Daniel Lezcano,
	Cristian Marussi

Adding Daniel and Cristian, who work on some powercap implementations

On Wed, 18 May 2022 at 14:56, Feiran Zheng . <fam.zheng@bytedance.com> wrote:
>
> On Wed, May 18, 2022 at 1:03 PM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Wed, 18 May 2022 at 12:38, Feiran Zheng . <fam.zheng@bytedance.com> wrote:
> > >
> > > On Wed, May 18, 2022 at 11:21 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Wed, May 18, 2022 at 11:08:41AM +0100, Fam Zheng wrote:
> > > > > In the data center there sometimes comes a need to throttle down a
> > > > > server,
> > > >
> > > > Why?
> > >
> > > For economical reasons there can be over-provisioning in DC power
> > > supply (UPS capacity etc) because the utilization expectation of the
> > > racks is not maximum value. But the workload can be client driven,
> > > depending on how many users are online, and in the end the power
> > > supply may overload and trip itself. To avoid that, upon a threshold,
> > > some servers need to be brought down or throttled. The latter is
> > > obviously going to be much more smooth.
> >
> > This looks like thermal or power budget management. We have other ways
> > to do so with powercap or idle injection. Did you consider those
> > solutions ?
>
> Hi Vincent,
>
> I looked at powercap, and it seems Intel only? Any idea about AMD/ARM?
> There seems nothing for them under drivers/powercap/.

there is a DTPM powercap provider in the latest kernel, and an SCMI
powercap provider is under review
>
> I don't know the idle injection interface, can you please give more hints?

idle injection can be used with the cpuidle cooling device, and there was
some discussion about making a dtpm idle injection device, but I think this
has never been sent to the mailing list


>
> I also plan to test uclamp, still need to learn more about that.
>
> Fam


* Re: [External] Re: [RFC PATCH] sched: Enable root level cgroup bandwidth control
  2022-05-18 14:31         ` Vincent Guittot
@ 2022-05-18 15:55           ` Feiran Zheng .
  2022-05-23 15:36             ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Feiran Zheng . @ 2022-05-18 15:55 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, linux-kernel, Steven Rostedt, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, zhouchengming, fam,
	Mel Gorman, Ingo Molnar, songmuchun, Juri Lelli, Daniel Lezcano,
	Cristian Marussi

On Wed, May 18, 2022 at 3:31 PM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> there is a DTPM powercap provider in the latest kernel and a scmi
> power capp provider is under review


Thanks, so DTPM can be a good solution for ARM. We could also deal
with AMD with acpi-cpufreq if powercap is not supported yet.

That aside, I think cpu cgroup has a familiar and simple sysfs
interface and, more importantly, is hardware agnostic, so it would be
really nice to have.

Alternatively, I assume we can look into a device-independent idle
injection mechanism?

Fam


* Re: [External] Re: [RFC PATCH] sched: Enable root level cgroup bandwidth control
  2022-05-18 15:55           ` Feiran Zheng .
@ 2022-05-23 15:36             ` Vincent Guittot
  0 siblings, 0 replies; 8+ messages in thread
From: Vincent Guittot @ 2022-05-23 15:36 UTC (permalink / raw)
  To: Feiran Zheng .
  Cc: Peter Zijlstra, linux-kernel, Steven Rostedt, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, zhouchengming, fam,
	Mel Gorman, Ingo Molnar, songmuchun, Juri Lelli, Daniel Lezcano,
	Cristian Marussi

On Wed, 18 May 2022 at 17:55, Feiran Zheng . <fam.zheng@bytedance.com> wrote:
>
> On Wed, May 18, 2022 at 3:31 PM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> > there is a DTPM powercap provider in the latest kernel and a scmi
> > power capp provider is under review
>
>
> Thanks, so DTPM can be a good solution for ARM. We could also deal
> with AMD with acpi-cpufreq if powercap is not supported yet.
>
> That aside, I think cpu cgroup has a familiar and simple sysfs
> interface, and is more importantly hardware agnostic so it would be
> really nice to have.

cgroup is about allocating runtime to a group, but you want to force the
system idle for power considerations, so it looks like abusing the
interface

>
> Alternatively, I assume we can look into a device-independent idle
> injection mechanism?

Yes, idle injection is device-independent and fits your needs better

the thermal framework already supports a cpu idle cooling device, but I'm not
sure your case is only related to thermal, so you might want a more
generic interface like powercap --> dtpm --> idle injection

>
> Fam

