From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Quentin Perret <quentin.perret@arm.com>,
Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Paul Turner <pjt@google.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
Joel Fernandes <joelaf@google.com>,
Steve Muckle <smuckle@google.com>,
Suren Baghdasaryan <surenb@google.com>,
Alessio Balsini <balsini@android.com>
Subject: Re: [PATCH v11 1/5] sched/core: uclamp: Extend CPU's cgroup controller
Date: Mon, 15 Jul 2019 14:38:01 +0100
Message-ID: <20190715133801.yohhd2hywzsv3uyf@e110439-lin>
In-Reply-To: <20190708110838.4ohd7pqx5ngkzcsu@queper01-lin>
On 08-Jul 12:08, Quentin Perret wrote:
> Hi Patrick,
Hi Quentin!
> On Monday 08 Jul 2019 at 09:43:53 (+0100), Patrick Bellasi wrote:
> > +static inline int uclamp_scale_from_percent(char *buf, u64 *value)
> > +{
> > + *value = SCHED_CAPACITY_SCALE;
> > +
> > + buf = strim(buf);
> > + if (strncmp("max", buf, 4)) {
> > + s64 percent;
> > + int ret;
> > +
> > + ret = cgroup_parse_float(buf, 2, &percent);
> > + if (ret)
> > + return ret;
> > +
> > + percent <<= SCHED_CAPACITY_SHIFT;
> > + *value = DIV_ROUND_CLOSEST_ULL(percent, 10000);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static inline u64 uclamp_percent_from_scale(u64 value)
> > +{
> > + return DIV_ROUND_CLOSEST_ULL(value * 10000, SCHED_CAPACITY_SCALE);
> > +}
>
> FWIW, I tried the patches and realized these conversions result in a
> 'funny' behaviour from a user's perspective. Things like this happen:
>
> $ echo 20 > cpu.uclamp.min
> $ cat cpu.uclamp.min
> 20.2
> $ echo 20.2 > cpu.uclamp.min
> $ cat cpu.uclamp.min
> 20.21
>
> Having looked at the code, I get why this is happening, but I'm not sure
> if a random user will. It's not an issue per se, but it's just a bit
> weird.
Yes, that's what we get if we need to use a "two decimal digit
precision percentage" to represent a 1024 range in kernel space.
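For reference, the round trip can be reproduced in user space. The sketch below is only an illustration: div_round_closest(), scale_from_pct(), pct_from_scale() and format_pct() are hypothetical stand-ins for DIV_ROUND_CLOSEST_ULL(), the two uclamp_*() helpers quoted above and the "%llu.%u" format used by cpu_uclamp_print(). Input parsing is assumed to have already produced the percentage times 100, as cgroup_parse_float(buf, 2, ...) does.

```c
#include <stdint.h>
#include <stdio.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1ULL << SCHED_CAPACITY_SHIFT)

/* User-space stand-in for the kernel's DIV_ROUND_CLOSEST_ULL() */
static uint64_t div_round_closest(uint64_t x, uint64_t divisor)
{
	return (x + divisor / 2) / divisor;
}

/* pct is the percentage times 100, e.g. 2000 for "20" or "20.00" */
static uint64_t scale_from_pct(uint64_t pct)
{
	return div_round_closest(pct << SCHED_CAPACITY_SHIFT, 10000);
}

/* Inverse conversion: 0..1024 scale back to percentage times 100 */
static uint64_t pct_from_scale(uint64_t scale)
{
	return div_round_closest(scale * 10000, SCHED_CAPACITY_SCALE);
}

/* Same "%llu.%u" layout as the print path: the remainder is NOT zero-padded */
static void format_pct(uint64_t pct, char *out, size_t len)
{
	snprintf(out, len, "%llu.%u",
		 (unsigned long long)(pct / 100), (unsigned int)(pct % 100));
}
```

With these helpers, 2000 ("20") maps to a scale of 205 and reads back as 2002, which the unpadded remainder renders as "20.2" even though the value is 20.02. Writing 2020 ("20.2") then maps to 207 and reads back as 2021, i.e. "20.21". Both rounding steps and the missing zero padding contribute to the drift Quentin reported.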
I don't think the "percent <=> utilization" conversion code can be
made more robust. The only alternative I see for reading back exactly
what was written is to store the actual request in kernel space,
alongside its conversion to the SCHED_CAPACITY_SCALE range required by
the actual scheduler code.
Something along these lines (on top of what we have in this series):
---8<---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ddc5fcd4b9cf..82b28cfa5c3f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7148,40 +7148,35 @@ static void cpu_util_update_eff(struct cgroup_subsys_state *css)
}
}
-static inline int uclamp_scale_from_percent(char *buf, u64 *value)
+static inline int uclamp_scale_from_percent(char *buf, s64 *percent, u64 *scale)
{
- *value = SCHED_CAPACITY_SCALE;
+ *scale = SCHED_CAPACITY_SCALE;
buf = strim(buf);
if (strncmp("max", buf, 4)) {
- s64 percent;
int ret;
- ret = cgroup_parse_float(buf, 2, &percent);
+ ret = cgroup_parse_float(buf, 2, percent);
if (ret)
return ret;
- percent <<= SCHED_CAPACITY_SHIFT;
- *value = DIV_ROUND_CLOSEST_ULL(percent, 10000);
+ *scale = *percent << SCHED_CAPACITY_SHIFT;
+ *scale = DIV_ROUND_CLOSEST_ULL(*scale, 10000);
}
return 0;
}
-static inline u64 uclamp_percent_from_scale(u64 value)
-{
- return DIV_ROUND_CLOSEST_ULL(value * 10000, SCHED_CAPACITY_SCALE);
-}
-
static ssize_t cpu_uclamp_min_write(struct kernfs_open_file *of,
char *buf, size_t nbytes,
loff_t off)
{
struct task_group *tg;
u64 min_value;
+ s64 percent;
int ret;
- ret = uclamp_scale_from_percent(buf, &min_value);
+ ret = uclamp_scale_from_percent(buf, &percent, &min_value);
if (ret)
return ret;
if (min_value > SCHED_CAPACITY_SCALE)
@@ -7197,6 +7192,9 @@ static ssize_t cpu_uclamp_min_write(struct kernfs_open_file *of,
/* Update effective clamps to track the most restrictive value */
cpu_util_update_eff(of_css(of));
+ /* Keep track of the actual requested value */
+ tg->uclamp_pct[UCLAMP_MIN] = percent;
+
rcu_read_unlock();
mutex_unlock(&uclamp_mutex);
@@ -7209,9 +7207,10 @@ static ssize_t cpu_uclamp_max_write(struct kernfs_open_file *of,
{
struct task_group *tg;
u64 max_value;
+ s64 percent;
int ret;
- ret = uclamp_scale_from_percent(buf, &max_value);
+ ret = uclamp_scale_from_percent(buf, &percent, &max_value);
if (ret)
return ret;
if (max_value > SCHED_CAPACITY_SCALE)
@@ -7227,6 +7226,9 @@ static ssize_t cpu_uclamp_max_write(struct kernfs_open_file *of,
/* Update effective clamps to track the most restrictive value */
cpu_util_update_eff(of_css(of));
+ /* Keep track of the actual requested value */
+ tg->uclamp_pct[UCLAMP_MAX] = percent;
+
rcu_read_unlock();
mutex_unlock(&uclamp_mutex);
@@ -7251,7 +7253,7 @@ static inline void cpu_uclamp_print(struct seq_file *sf,
return;
}
- percent = uclamp_percent_from_scale(util_clamp);
+ percent = tg->uclamp_pct[clamp_id];
percent = div_u64_rem(percent, 100, &rem);
seq_printf(sf, "%llu.%u\n", percent, rem);
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0e37f4a4e536..4f9b0c660310 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -395,6 +395,8 @@ struct task_group {
struct cfs_bandwidth cfs_bandwidth;
#ifdef CONFIG_UCLAMP_TASK_GROUP
+ /* The two decimal precision [%] value requested from user-space */
+ unsigned int uclamp_pct[UCLAMP_CNT];
/* Clamp values requested for a task group */
struct uclamp_se uclamp_req[UCLAMP_CNT];
/* Effective clamp values used for a task group */
---8<---
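Stripped of the cgroup plumbing, the idea in the diff reduces to keeping the request next to its scaled value. The following is a minimal user-space sketch under that reading, not the kernel code: struct clamp_setting, clamp_write() and clamp_read_pct() are hypothetical names, whereas the diff keeps the two values in tg->uclamp_pct[] and tg->uclamp_req[].

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10

/* Hypothetical record pairing the user request with its scaled form */
struct clamp_setting {
	uint32_t pct;	/* request as written, percentage times 100 */
	uint32_t scale;	/* same request on the 0..1024 capacity scale */
};

/* Write path: remember the request and derive the scheduler value once */
static void clamp_write(struct clamp_setting *c, uint32_t pct)
{
	uint64_t scaled = (uint64_t)pct << SCHED_CAPACITY_SHIFT;

	c->pct = pct;
	/* Round to closest, as DIV_ROUND_CLOSEST_ULL(scaled, 10000) would */
	c->scale = (uint32_t)((scaled + 5000) / 10000);
}

/* Read path: report the stored request, with no inverse conversion */
static uint32_t clamp_read_pct(const struct clamp_setting *c)
{
	return c->pct;
}
```

Writing 2000 ("20") still yields a scale of 205 for the scheduler, but the read path returns 2000 unchanged, so user space gets back what it wrote. The cost is one extra stored value per clamp index in each task group.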
> I guess one way to fix this would be to revert back to having a
> 1024-scale for the cgroup interface too ... Though I understand Tejun
> wanted % for consistency with other things.
Yes, that would be another option, which would also keep the per-task
and system-wide APIs aligned with the cgroups one. Although, AFAIU,
having two different APIs is not considered a major issue.
> So, I'm not sure if this is still up for discussion, but in any case I
> wanted to say I support your original idea of using a 1024-scale for the
> cgroups interface, since that would solve the 'issue' above and keeps
> things consistent with the per-task API too.
Right, I'm personally leaning more toward either going back to using
SCHED_CAPACITY_SCALE or adding the small change I suggested above.
Tejun, Peter: any preference? Alternative suggestions?
> Thanks,
> Quentin
Cheers,
Patrick
--
#include <best/regards.h>
Patrick Bellasi