From: shrikanth hegde <sshegde@linux.vnet.ibm.com>
To: Benjamin Segall <bsegall@google.com>
Cc: mingo@redhat.com, peterz@infradead.org,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
tglx@linutronix.de, srikar@linux.vnet.ibm.com,
arjan@linux.intel.com, svaidy@linux.ibm.com,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Interleave cfs bandwidth timers for improved single thread performance at low utilization
Date: Wed, 15 Feb 2023 16:31:29 +0530
Message-ID: <cd37483e-bf11-ec74-c240-74935bb44809@linux.vnet.ibm.com>
In-Reply-To: <xm268rh06i97.fsf@google.com>
>>
>>         6.2.rc5                 |  with patch
>>         1CG power    2CG power  |  1CG power   2CG      power
>> 1Core   218    44    315    46  |  219    45   277(+12%)   47(-2%)
>>         219    43    315    45  |  219    44   244(+22%)   48(-6%)
>>                                 |
>> 2Core   108    48    158    52  |  109    50   114(+26%)   59(-13%)
>>         109    49    157    52  |  109    49   136(+13%)   56(-7%)
>>                                 |
>> 4Core    60    59     89    65  |   62    58    72(+19%)   68(-5%)
>>          61    61     90    65  |   62    60    68(+24%)   73(-12%)
>>                                 |
>> 8Core    33    77     48    83  |   33    77    37(+23%)   91(-10%)
>>          33    77     48    84  |   33    77    38(+21%)   90(-7%)
>>
>> There is no benefit at higher utilization (50% or more), but there is no
>> degradation either.
>>
>> This is RFC PATCH v2, in which the code has been moved from hrtimer to
>> sched. This patch sets the timer's initial value to a multiple of
>> period/10. The timers can still align if the cgroups were started within
>> the same period/10 interval, but in a real-life workload the start times
>> give sufficient randomness. Interleaving could be improved by being more
>> deterministic. For example, with 2 cgroups the initial values should be
>> 0/50ms, or 10/60ms, and so on; with 3 cgroups, 0/3/6ms or 1/4/7ms, etc.
>> That is more complicated, as it has to account for cgroup addition and
>> deletion, and for accuracy with respect to period/quota. If that approach
>> is preferred here, I will come up with a patch for it.
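A rough sketch of the more deterministic scheme mentioned above, assuming
all cgroups share one period and are spread evenly across it from a common
base offset. This is a hypothetical userspace illustration only, not part
of the patch: interleave_offset_ns() and its parameters are made-up names,
not kernel API, and times are plain int64_t nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS 1000000LL /* hypothetical helper constant */

/*
 * Hypothetical helper (not kernel code): initial timer offset, within one
 * period, for the idx-th of n cgroups that share the same period. The n
 * timers are spaced evenly across the period, shifted by base_ns.
 */
static int64_t interleave_offset_ns(unsigned int idx, unsigned int n,
				    int64_t period_ns, int64_t base_ns)
{
	assert(n > 0 && idx < n && period_ns > 0 && base_ns >= 0);
	/* Evenly spaced slot, wrapped back into [0, period). */
	return (base_ns + (int64_t)idx * (period_ns / n)) % period_ns;
}
```

With a 100ms period and two cgroups this gives offsets of 0 and 50ms, or
10 and 60ms with a 10ms base, as in the example above. A real
implementation would also have to recompute the offsets whenever a cgroup
is added or removed, which this sketch ignores.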
>
> This does seem vaguely reasonable, though the power argument of
> consolidating wakeups and such is something that we intentionally do in
> other situations.
>
Thank you, Benjamin, for taking a look and spending time reviewing this.
> How reasonable do you think it is to just say (and what do the
> equivalent numbers look like on your particular benchmark) "put some
> variance on your period config if you want variance"?
>
Run-to-run variance is expected with this patch, since it uses the time
at which the timer is started, rounded down to the last period/10 step,
as the basis for interleaving. That is what I inferred from your comment
about variance; please correct me if I misunderstood.
>>
>> Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
>> ---
>> kernel/sched/fair.c | 17 ++++++++++++++---
>> 1 file changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ff4dbbae3b10..7b69c329e05d 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5939,14 +5939,25 @@ static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>>
>> void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>> {
>> - lockdep_assert_held(&cfs_b->lock);
>> + struct hrtimer *period_timer = &cfs_b->period_timer;
>> + s64 incr = ktime_to_ns(cfs_b->period) / 10;
>> + ktime_t delta;
>> + u64 orun = 1;
>>
>> + lockdep_assert_held(&cfs_b->lock);
>> if (cfs_b->period_active)
>> return;
>>
>> cfs_b->period_active = 1;
>> - hrtimer_forward_now(&cfs_b->period_timer, cfs_b->period);
>> - hrtimer_start_expires(&cfs_b->period_timer, HRTIMER_MODE_ABS_PINNED);
>> + delta = ktime_sub(period_timer->base->get_time(),
>> + hrtimer_get_expires(period_timer));
>> + if (unlikely(delta >= cfs_b->period)) {
>
> Probably could have a short comment here that's something like "forward
> the hrtimer by period / 10 to reduce synchronized wakeups"
>
Sure. Will do in the next version of this patch.
>> + orun = ktime_divns(delta, incr);
>> + hrtimer_add_expires_ns(period_timer, incr * orun);
>> + }
>> +
>> + hrtimer_forward_now(period_timer, cfs_b->period);
>> + hrtimer_start_expires(period_timer, HRTIMER_MODE_ABS_PINNED);
>> }
>>
>> static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>> --
>> 2.31.1
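To see what the patched start_cfs_bandwidth() does to the expiry time,
the arithmetic can be modelled in plain userspace C. This is a hedged
sketch, not kernel code: model_forward() approximates hrtimer_forward()
(advance the expiry by whole intervals until it lies after now), all
times are int64_t nanoseconds instead of ktime_t, and hrtimer resolution
and locking are ignored. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of hrtimer_forward(): if the expiry is not in the future, advance
 * it by whole intervals so that it lands strictly after 'now'.
 * Returns the number of intervals added.
 */
static int64_t model_forward(int64_t *expires, int64_t now, int64_t interval)
{
	int64_t delta = now - *expires;
	int64_t orun;

	if (delta < 0)
		return 0;	/* already in the future: unchanged */
	orun = delta / interval + 1;
	*expires += orun * interval;
	return orun;
}

/* Unpatched start: phase stays locked to the old expiry (whole periods). */
static int64_t model_start_orig(int64_t expires, int64_t now, int64_t period)
{
	model_forward(&expires, now, period);
	return expires;
}

/*
 * Patched start: if the old expiry is at least one period stale, first
 * move it forward in steps of period/10 to the last step at or before
 * 'now', then forward by whole periods as before.
 */
static int64_t model_start_patched(int64_t expires, int64_t now, int64_t period)
{
	int64_t incr = period / 10;
	int64_t delta = now - expires;

	assert(incr > 0);
	if (delta >= period)
		expires += incr * (delta / incr);
	model_forward(&expires, now, period);
	return expires;
}
```

Under these assumptions, with a 100ms period and an old expiry of 0, a
timer started at 1.234567890s expires at 1.3s without the patch but at
1.33s with it, while any start time from 1.20s up to (but not including)
1.21s still yields 1.3s. That is the residual alignment, and the source
of run-to-run variance, discussed above.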