* [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()
@ 2018-07-28 0:24 Cong Wang
2018-07-30 5:28 ` Xunlei Pang
0 siblings, 1 reply; 11+ messages in thread
From: Cong Wang @ 2018-07-28 0:24 UTC (permalink / raw)
To: linux-kernel
Cc: Cong Wang, Xunlei Pang, Ben Segall, Linus Torvalds,
Peter Zijlstra, Thomas Gleixner
Each time we sync cfs_rq->runtime_expires with cfs_b->runtime_expires,
we should sync its ->expires_seq too. However it is missing
for distribute_cfs_runtime(), especially the slack timer call path.
Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
Cc: Xunlei Pang <xlpang@linux.alibaba.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
kernel/sched/fair.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f0a0be4d344..910c50db3d74 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4857,7 +4857,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 }
 
 static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
-		u64 remaining, u64 expires)
+		u64 remaining, u64 expires, int expires_seq)
 {
 	struct cfs_rq *cfs_rq;
 	u64 runtime;
@@ -4880,6 +4880,7 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
 
 		cfs_rq->runtime_remaining += runtime;
 		cfs_rq->runtime_expires = expires;
+		cfs_rq->expires_seq = expires_seq;
 
 		/* we check whether we're throttled above */
 		if (cfs_rq->runtime_remaining > 0)
@@ -4905,7 +4906,7 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
 static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
 {
 	u64 runtime, runtime_expires;
-	int throttled;
+	int throttled, expires_seq;
 
 	/* no need to continue the timer with no bandwidth constraint */
 	if (cfs_b->quota == RUNTIME_INF)
@@ -4933,6 +4934,7 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
 	cfs_b->nr_throttled += overrun;
 
 	runtime_expires = cfs_b->runtime_expires;
+	expires_seq = cfs_b->expires_seq;
 
 	/*
 	 * This check is repeated as we are holding onto the new bandwidth while
@@ -4946,7 +4948,7 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
 		raw_spin_unlock(&cfs_b->lock);
 		/* we can't nest cfs_b->lock while distributing bandwidth */
 		runtime = distribute_cfs_runtime(cfs_b, runtime,
-						 runtime_expires);
+						 runtime_expires, expires_seq);
 		raw_spin_lock(&cfs_b->lock);
 
 		throttled = !list_empty(&cfs_b->throttled_cfs_rq);
@@ -5055,6 +5057,7 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
 {
 	u64 runtime = 0, slice = sched_cfs_bandwidth_slice();
+	int expires_seq;
 	u64 expires;
 
 	/* confirm we're still not at a refresh boundary */
@@ -5068,12 +5071,13 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
 		runtime = cfs_b->runtime;
 
 	expires = cfs_b->runtime_expires;
+	expires_seq = cfs_b->expires_seq;
 	raw_spin_unlock(&cfs_b->lock);
 
 	if (!runtime)
 		return;
 
-	runtime = distribute_cfs_runtime(cfs_b, runtime, expires);
+	runtime = distribute_cfs_runtime(cfs_b, runtime, expires, expires_seq);
 
 	raw_spin_lock(&cfs_b->lock);
 	if (expires == cfs_b->runtime_expires)
--
2.14.4
* Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()
2018-07-28 0:24 [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime() Cong Wang
@ 2018-07-30 5:28 ` Xunlei Pang
2018-07-30 17:32 ` bsegall
2018-07-30 17:55 ` Cong Wang
0 siblings, 2 replies; 11+ messages in thread
From: Xunlei Pang @ 2018-07-30 5:28 UTC (permalink / raw)
To: Cong Wang, linux-kernel
Cc: Ben Segall, Linus Torvalds, Peter Zijlstra, Thomas Gleixner
Hi Cong,
On 7/28/18 8:24 AM, Cong Wang wrote:
> Each time we sync cfs_rq->runtime_expires with cfs_b->runtime_expires,
> we should sync its ->expires_seq too. However it is missing
> for distribute_cfs_runtime(), especially the slack timer call path.
I don't think it's a problem, as expires_seq will get synced in
assign_cfs_rq_runtime().
Thanks,
Xunlei
* Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()
2018-07-30 5:28 ` Xunlei Pang
@ 2018-07-30 17:32 ` bsegall
2018-07-30 17:55 ` Cong Wang
1 sibling, 0 replies; 11+ messages in thread
From: bsegall @ 2018-07-30 17:32 UTC (permalink / raw)
To: Xunlei Pang
Cc: Cong Wang, linux-kernel, Ben Segall, Linus Torvalds,
Peter Zijlstra, Thomas Gleixner
Xunlei Pang <xlpang@linux.alibaba.com> writes:
> Hi Cong,
>
> On 7/28/18 8:24 AM, Cong Wang wrote:
>> Each time we sync cfs_rq->runtime_expires with cfs_b->runtime_expires,
>> we should sync its ->expires_seq too. However it is missing
>> for distribute_cfs_runtime(), especially the slack timer call path.
>
> I don't think it's a problem, as expires_seq will get synced in
> assign_cfs_rq_runtime().
>
> Thanks,
> Xunlei
It does seem unlikely to actually come up since the cfs_rq would have to
not run until the period was expired-locally-but-not-globally, but
there's no reason to not fix it.
* Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()
2018-07-30 5:28 ` Xunlei Pang
2018-07-30 17:32 ` bsegall
@ 2018-07-30 17:55 ` Cong Wang
2018-07-31 14:58 ` Xunlei Pang
1 sibling, 1 reply; 11+ messages in thread
From: Cong Wang @ 2018-07-30 17:55 UTC (permalink / raw)
To: Xunlei Pang
Cc: LKML, Ben Segall, Linus Torvalds, Peter Zijlstra, Thomas Gleixner
On Sun, Jul 29, 2018 at 10:29 PM Xunlei Pang <xlpang@linux.alibaba.com> wrote:
>
> Hi Cong,
>
> On 7/28/18 8:24 AM, Cong Wang wrote:
> > Each time we sync cfs_rq->runtime_expires with cfs_b->runtime_expires,
> > we should sync its ->expires_seq too. However it is missing
> > for distribute_cfs_runtime(), especially the slack timer call path.
>
> I don't think it's a problem, as expires_seq will get synced in
> assign_cfs_rq_runtime().
Sure, but there is a small window during which they are not synced.
Why do you want to wait until the next assign_cfs_rq_runtime() when
you already know runtime_expires is synced?
Also, expire_cfs_rq_runtime() is called before assign_cfs_rq_runtime()
inside __account_cfs_rq_runtime(), which means the check of
cfs_rq->expires_seq is not accurate for unthrottling case if the clock
drift happens soon enough?
* Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()
2018-07-30 17:55 ` Cong Wang
@ 2018-07-31 14:58 ` Xunlei Pang
2018-07-31 17:13 ` bsegall
0 siblings, 1 reply; 11+ messages in thread
From: Xunlei Pang @ 2018-07-31 14:58 UTC (permalink / raw)
To: Cong Wang
Cc: LKML, Ben Segall, Linus Torvalds, Peter Zijlstra, Thomas Gleixner
On 7/31/18 1:55 AM, Cong Wang wrote:
> On Sun, Jul 29, 2018 at 10:29 PM Xunlei Pang <xlpang@linux.alibaba.com> wrote:
>>
>> Hi Cong,
>>
>> On 7/28/18 8:24 AM, Cong Wang wrote:
>>> Each time we sync cfs_rq->runtime_expires with cfs_b->runtime_expires,
>>> we should sync its ->expires_seq too. However it is missing
>>> for distribute_cfs_runtime(), especially the slack timer call path.
>>
>> I don't think it's a problem, as expires_seq will get synced in
>> assign_cfs_rq_runtime().
>
> Sure, but there is a small window during which they are not synced.
> Why do you want to wait until the next assign_cfs_rq_runtime() when
> you already know runtime_expires is synced?
>
> Also, expire_cfs_rq_runtime() is called before assign_cfs_rq_runtime()
> inside __account_cfs_rq_runtime(), which means the check of
> cfs_rq->expires_seq is not accurate for unthrottling case if the clock
> drift happens soon enough?
>
expire_cfs_rq_runtime():
if (cfs_rq->expires_seq == cfs_b->expires_seq) {
/* extend local deadline, drift is bounded above by 2 ticks */
cfs_rq->runtime_expires += TICK_NSEC;
} else {
/* global deadline is ahead, expiration has passed */
cfs_rq->runtime_remaining = 0;
}
So if clock drift happens soon, then expires_seq decides the correct
thing we should do: if cfs_b->expires_seq advanced, then clear the stale
cfs_rq->runtime_remaining from the slack timer of the past period, then
assign_cfs_rq_runtime() will refresh them afterwards, otherwise it is a
real clock drift. I am still not getting where the race is?
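The decision described above can be sketched as a small userspace model. This is not kernel code: the `model_` names, the result enum, and the `MODEL_TICK_NSEC` value are assumptions made for illustration, and only the three checks mentioned in this thread (deadline still ahead, runtime already negative, sequence comparison) are represented.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick length for the model only (4 ms); the real TICK_NSEC depends on HZ. */
#define MODEL_TICK_NSEC 4000000ULL

/* Hypothetical, stripped-down stand-ins for cfs_bandwidth and cfs_rq. */
struct model_cfs_b  { uint64_t runtime_expires; int expires_seq; };
struct model_cfs_rq { int64_t runtime_remaining; uint64_t runtime_expires; int expires_seq; };

enum model_expire_result {
    MODEL_DEADLINE_AHEAD,   /* local deadline not reached, nothing to do */
    MODEL_ALREADY_NEGATIVE, /* no quota left to expire */
    MODEL_DRIFT_EXTENDED,   /* same period: treat as clock drift, extend deadline */
    MODEL_QUOTA_CLEARED     /* global period advanced: drop stale local quota */
};

enum model_expire_result model_expire(struct model_cfs_rq *rq,
                                      const struct model_cfs_b *b,
                                      uint64_t now)
{
    assert(rq && b);

    /* signed difference, as in the comparison quoted in the thread */
    if ((int64_t)(now - rq->runtime_expires) < 0)
        return MODEL_DEADLINE_AHEAD;

    if (rq->runtime_remaining < 0)
        return MODEL_ALREADY_NEGATIVE;

    if (rq->expires_seq == b->expires_seq) {
        rq->runtime_expires += MODEL_TICK_NSEC;
        return MODEL_DRIFT_EXTENDED;
    }

    rq->runtime_remaining = 0;
    return MODEL_QUOTA_CLEARED;
}
```

In this model, the only way to reach the sequence comparison is with a passed local deadline and a non-negative runtime_remaining, which is the window the rest of the thread argues about.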
* Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()
2018-07-31 14:58 ` Xunlei Pang
@ 2018-07-31 17:13 ` bsegall
2018-07-31 20:55 ` Cong Wang
0 siblings, 1 reply; 11+ messages in thread
From: bsegall @ 2018-07-31 17:13 UTC (permalink / raw)
To: Xunlei Pang
Cc: Cong Wang, LKML, Ben Segall, Linus Torvalds, Peter Zijlstra,
Thomas Gleixner
Xunlei Pang <xlpang@linux.alibaba.com> writes:
> On 7/31/18 1:55 AM, Cong Wang wrote:
>> On Sun, Jul 29, 2018 at 10:29 PM Xunlei Pang <xlpang@linux.alibaba.com> wrote:
>>>
>>> Hi Cong,
>>>
>>> On 7/28/18 8:24 AM, Cong Wang wrote:
>>>> Each time we sync cfs_rq->runtime_expires with cfs_b->runtime_expires,
>>>> we should sync its ->expires_seq too. However it is missing
>>>> for distribute_cfs_runtime(), especially the slack timer call path.
>>>
>>> I don't think it's a problem, as expires_seq will get synced in
>>> assign_cfs_rq_runtime().
>>
>> Sure, but there is a small window during which they are not synced.
>> Why do you want to wait until the next assign_cfs_rq_runtime() when
>> you already know runtime_expires is synced?
>>
>> Also, expire_cfs_rq_runtime() is called before assign_cfs_rq_runtime()
>> inside __account_cfs_rq_runtime(), which means the check of
>> cfs_rq->expires_seq is not accurate for unthrottling case if the clock
>> drift happens soon enough?
>>
>
> expire_cfs_rq_runtime():
> if (cfs_rq->expires_seq == cfs_b->expires_seq) {
> /* extend local deadline, drift is bounded above by 2 ticks */
> cfs_rq->runtime_expires += TICK_NSEC;
> } else {
> /* global deadline is ahead, expiration has passed */
> cfs_rq->runtime_remaining = 0;
> }
>
> So if clock drift happens soon, then expires_seq decides the correct
> thing we should do: if cfs_b->expires_seq advanced, then clear the stale
> cfs_rq->runtime_remaining from the slack timer of the past period, then
> assign_cfs_rq_runtime() will refresh them afterwards, otherwise it is a
> real clock drift. I am still not getting where the race is?
Nothing /important/ goes wrong because distribute_cfs_runtime only fills
runtime_remaining up to 1, not a real amount.
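As a rough illustration of "fills runtime_remaining up to 1", the top-up step can be modelled as below. This is a userspace sketch, not the kernel function: `model_top_up` and its parameters are hypothetical names, and it assumes the runqueue is throttled (runtime_remaining at or below zero) when it is topped up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hand a throttled runqueue just enough runtime to reach +1 ns, limited by
 * what is left in the pool. Returns the amount granted and reduces *pool by it.
 */
uint64_t model_top_up(int64_t *runtime_remaining, uint64_t *pool)
{
    assert(runtime_remaining && pool);
    assert(*runtime_remaining <= 0); /* model assumption: only throttled queues */

    uint64_t want = (uint64_t)(-(*runtime_remaining)) + 1;
    uint64_t grant = want < *pool ? want : *pool;

    *pool -= grant;
    *runtime_remaining += (int64_t)grant;
    return grant;
}
```

With a deficit of 500 ns and 1000 ns in the pool, the model grants 501 ns and leaves runtime_remaining at 1. With only 100 ns in the pool, it grants all 100 ns and the runqueue stays negative.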
* Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()
2018-07-31 17:13 ` bsegall
@ 2018-07-31 20:55 ` Cong Wang
2018-08-01 3:24 ` Xunlei Pang
2018-08-01 17:17 ` bsegall
0 siblings, 2 replies; 11+ messages in thread
From: Cong Wang @ 2018-07-31 20:55 UTC (permalink / raw)
To: Ben Segall
Cc: Xunlei Pang, LKML, Linus Torvalds, Peter Zijlstra, Thomas Gleixner
On Tue, Jul 31, 2018 at 10:13 AM <bsegall@google.com> wrote:
>
> Xunlei Pang <xlpang@linux.alibaba.com> writes:
>
> > On 7/31/18 1:55 AM, Cong Wang wrote:
> >> On Sun, Jul 29, 2018 at 10:29 PM Xunlei Pang <xlpang@linux.alibaba.com> wrote:
> >>>
> >>> Hi Cong,
> >>>
> >>> On 7/28/18 8:24 AM, Cong Wang wrote:
> >>>> Each time we sync cfs_rq->runtime_expires with cfs_b->runtime_expires,
> >>>> we should sync its ->expires_seq too. However it is missing
> >>>> for distribute_cfs_runtime(), especially the slack timer call path.
> >>>
> >>> I don't think it's a problem, as expires_seq will get synced in
> >>> assign_cfs_rq_runtime().
> >>
> >> Sure, but there is a small window during which they are not synced.
> >> Why do you want to wait until the next assign_cfs_rq_runtime() when
> >> you already know runtime_expires is synced?
> >>
> >> Also, expire_cfs_rq_runtime() is called before assign_cfs_rq_runtime()
> >> inside __account_cfs_rq_runtime(), which means the check of
> >> cfs_rq->expires_seq is not accurate for unthrottling case if the clock
> >> drift happens soon enough?
> >>
> >
> > expire_cfs_rq_runtime():
> > if (cfs_rq->expires_seq == cfs_b->expires_seq) {
> > /* extend local deadline, drift is bounded above by 2 ticks */
> > cfs_rq->runtime_expires += TICK_NSEC;
> > } else {
> > /* global deadline is ahead, expiration has passed */
> > cfs_rq->runtime_remaining = 0;
> > }
> >
> > So if clock drift happens soon, then expires_seq decides the correct
> > thing we should do: if cfs_b->expires_seq advanced, then clear the stale
> > cfs_rq->runtime_remaining from the slack timer of the past period, then
> > assign_cfs_rq_runtime() will refresh them afterwards, otherwise it is a
> > real clock drift. I am still not getting where the race is?
But expires_seq is supposed to be the same here, after
distribute_cfs_runtime(), therefore runtime_remaining is not supposed
to be cleared.
Which part am I misunderstanding? Should expires_seq not be the same here?
Or are you saying that wrongly clearing runtime_remaining is fine?
>
> Nothing /important/ goes wrong because distribute_cfs_runtime only fills
> runtime_remaining up to 1, not a real amount.
No, runtime_remaining is updated right before expire_cfs_rq_runtime():
static void __account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
{
/* dock delta_exec before expiring quota (as it could span periods) */
cfs_rq->runtime_remaining -= delta_exec;
expire_cfs_rq_runtime(cfs_rq);
so almost certainly it can't be 1.
Which means the following check could be passed:
	if (cfs_rq->runtime_remaining < 0)
		return;
so we reach the clock drift logic inside expire_cfs_rq_runtime(),
where the two expires_seq values are supposed to be the same, as they
should be synced.
So without the patch, do we wrongly clear runtime_remaining?
Thanks.
* Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()
2018-07-31 20:55 ` Cong Wang
@ 2018-08-01 3:24 ` Xunlei Pang
2018-08-03 18:57 ` Cong Wang
2018-08-01 17:17 ` bsegall
1 sibling, 1 reply; 11+ messages in thread
From: Xunlei Pang @ 2018-08-01 3:24 UTC (permalink / raw)
To: Cong Wang, Ben Segall
Cc: LKML, Linus Torvalds, Peter Zijlstra, Thomas Gleixner
On 8/1/18 4:55 AM, Cong Wang wrote:
> On Tue, Jul 31, 2018 at 10:13 AM <bsegall@google.com> wrote:
>>
>> Xunlei Pang <xlpang@linux.alibaba.com> writes:
>>
>>> On 7/31/18 1:55 AM, Cong Wang wrote:
>>>> On Sun, Jul 29, 2018 at 10:29 PM Xunlei Pang <xlpang@linux.alibaba.com> wrote:
>>>>>
>>>>> Hi Cong,
>>>>>
>>>>> On 7/28/18 8:24 AM, Cong Wang wrote:
>>>>>> Each time we sync cfs_rq->runtime_expires with cfs_b->runtime_expires,
>>>>>> we should sync its ->expires_seq too. However it is missing
>>>>>> for distribute_cfs_runtime(), especially the slack timer call path.
>>>>>
>>>>> I don't think it's a problem, as expires_seq will get synced in
>>>>> assign_cfs_rq_runtime().
>>>>
>>>> Sure, but there is a small window during which they are not synced.
>>>> Why do you want to wait until the next assign_cfs_rq_runtime() when
>>>> you already know runtime_expires is synced?
>>>>
>>>> Also, expire_cfs_rq_runtime() is called before assign_cfs_rq_runtime()
>>>> inside __account_cfs_rq_runtime(), which means the check of
>>>> cfs_rq->expires_seq is not accurate for unthrottling case if the clock
>>>> drift happens soon enough?
>>>>
>>>
>>> expire_cfs_rq_runtime():
>>> if (cfs_rq->expires_seq == cfs_b->expires_seq) {
>>> /* extend local deadline, drift is bounded above by 2 ticks */
>>> cfs_rq->runtime_expires += TICK_NSEC;
>>> } else {
>>> /* global deadline is ahead, expiration has passed */
>>> cfs_rq->runtime_remaining = 0;
>>> }
>>>
>>> So if clock drift happens soon, then expires_seq decides the correct
>>> thing we should do: if cfs_b->expires_seq advanced, then clear the stale
>>> cfs_rq->runtime_remaining from the slack timer of the past period, then
>>> assign_cfs_rq_runtime() will refresh them afterwards, otherwise it is a
>>> real clock drift. I am still not getting where the race is?
>
> But expires_seq is supposed to be the same here, after
> distribute_cfs_runtime(), therefore runtime_remaining is not supposed
> to be cleared.
>
> Which part am I misunderstanding? Should expires_seq not be the same here?
> Or are you saying that wrongly clearing runtime_remaining is fine?
>
Let's see the unthrottle cases.
1. for the periodic timer
distribute_cfs_runtime updates the throttled cfs_rq->runtime_expires to
be a new value, so expire_cfs_rq_runtime does nothing because of:
rq_clock(rq_of(cfs_rq)) - cfs_rq->runtime_expires < 0
Afterwards assign_cfs_rq_runtime() will sync its expires_seq.
2. for the slack timer
the two expires_seq should be the same, so if clock drift happens soon,
expire_cfs_rq_runtime regards it as true clock drift:
cfs_rq->runtime_expires += TICK_NSEC
If the global expires_seq happens to advance, it doesn't matter either:
expire_cfs_rq_runtime will clear the stale cfs_rq->runtime_remaining as
expected.
>
>>
>> Nothing /important/ goes wrong because distribute_cfs_runtime only fills
>> runtime_remaining up to 1, not a real amount.
>
> No, runtime_remaining is updated right before expire_cfs_rq_runtime():
>
> static void __account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
> {
> /* dock delta_exec before expiring quota (as it could span periods) */
> cfs_rq->runtime_remaining -= delta_exec;
> expire_cfs_rq_runtime(cfs_rq);
>
> so almost certainly it can't be 1.
I think Ben means that it first gets a distribution of 1 to run after
unthrottling; soon it will have a negative runtime_remaining and go
to assign_cfs_rq_runtime().
Thanks,
Xunlei
>
> Which means the following check could be passed:
>
> 	if (cfs_rq->runtime_remaining < 0)
> 		return;
>
> so we reach the clock drift logic inside expire_cfs_rq_runtime(),
> where the two expires_seq values are supposed to be the same, as they
> should be synced.
> So without the patch, do we wrongly clear runtime_remaining?
>
> Thanks.
>
* Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()
2018-08-01 3:24 ` Xunlei Pang
@ 2018-08-03 18:57 ` Cong Wang
0 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2018-08-03 18:57 UTC (permalink / raw)
To: Xunlei Pang
Cc: Ben Segall, LKML, Linus Torvalds, Peter Zijlstra, Thomas Gleixner
On Tue, Jul 31, 2018 at 8:24 PM Xunlei Pang <xlpang@linux.alibaba.com> wrote:
>
> Let's see the unthrottle cases.
> 1. for the periodic timer
> distribute_cfs_runtime updates the throttled cfs_rq->runtime_expires to
> be a new value, so expire_cfs_rq_runtime does nothing because of:
> rq_clock(rq_of(cfs_rq)) - cfs_rq->runtime_expires < 0
>
> Afterwards assign_cfs_rq_runtime() will sync its expires_seq.
Is there any guarantee that rq_clock(cfs_rq) is always ahead of
cfs_rq->runtime_expires in this case?
I doubt it, because cfs_rq->runtime_expires could be assigned
from a sched_clock() reading on a different CPU running the periodic timer.
Also, rq_clock() lags behind sched_clock() even on the same CPU:
sometimes by merely hundreds of nanoseconds, sometimes by
tens of thousands of nanoseconds in my environment. (I have a
different patch to address this, but I am still not sure it is correct.)
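To make the clock concern concrete, the deadline test quoted above is a signed difference of two timestamps that may come from different clocks. The sketch below is a userspace illustration only. The `model_` names are hypothetical, and the timestamps are made-up values rather than measurements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when the deadline is still ahead of the local clock reading.
 * The cast to a signed type keeps the comparison meaningful across
 * counter wraparound (relies on the usual two's-complement conversion).
 */
bool model_deadline_ahead(uint64_t local_now, uint64_t expires)
{
    return (int64_t)(local_now - expires) < 0;
}

/* The same deadline judged by two clocks that disagree by a few ns. */
void model_deadline_selfcheck(void)
{
    uint64_t expires = 1000;  /* stamped using some other CPU's clock */

    assert(model_deadline_ahead(990, expires));   /* lagging clock: still ahead */
    assert(!model_deadline_ahead(1010, expires)); /* leading clock: already passed */
}
```

The point of the model is that the outcome of the first check depends on which clock produced each operand, so a lagging or leading local clock can flip the result near the deadline.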
>
> 2. for the slack timer
> the two expires_seq should be the same, so if clock drift happens soon,
> expire_cfs_rq_runtime regards it as true clock drift:
> cfs_rq->runtime_expires += TICK_NSEC
> If the global expires_seq happens to advance, it doesn't matter either:
> expire_cfs_rq_runtime will clear the stale cfs_rq->runtime_remaining as
> expected.
Hmm, that looks like it is due to the runtime_refresh_within() check in
the slack timer.
>
> >
> >>
> >> Nothing /important/ goes wrong because distribute_cfs_runtime only fills
> >> runtime_remaining up to 1, not a real amount.
> >
> > No, runtime_remaining is updated right before expire_cfs_rq_runtime():
> >
> > static void __account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
> > {
> > /* dock delta_exec before expiring quota (as it could span periods) */
> > cfs_rq->runtime_remaining -= delta_exec;
> > expire_cfs_rq_runtime(cfs_rq);
> >
> > so almost certainly it can't be 1.
>
> I think Ben means that it first gets a distribution of 1 to run after
> unthrottling; soon it will have a negative runtime_remaining and go
> to assign_cfs_rq_runtime().
That is obvious; the value being 1 in distribute_cfs_runtime is not
relevant to the discussion here.
* Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()
2018-07-31 20:55 ` Cong Wang
2018-08-01 3:24 ` Xunlei Pang
@ 2018-08-01 17:17 ` bsegall
2018-08-03 21:56 ` Cong Wang
1 sibling, 1 reply; 11+ messages in thread
From: bsegall @ 2018-08-01 17:17 UTC (permalink / raw)
To: Cong Wang
Cc: Ben Segall, Xunlei Pang, LKML, Linus Torvalds, Peter Zijlstra,
Thomas Gleixner
Cong Wang <xiyou.wangcong@gmail.com> writes:
> On Tue, Jul 31, 2018 at 10:13 AM <bsegall@google.com> wrote:
>>
>> Xunlei Pang <xlpang@linux.alibaba.com> writes:
>>
>> > On 7/31/18 1:55 AM, Cong Wang wrote:
>> >> On Sun, Jul 29, 2018 at 10:29 PM Xunlei Pang <xlpang@linux.alibaba.com> wrote:
>> >>>
>> >>> Hi Cong,
>> >>>
>> >>> On 7/28/18 8:24 AM, Cong Wang wrote:
>> >>>> Each time we sync cfs_rq->runtime_expires with cfs_b->runtime_expires,
>> >>>> we should sync its ->expires_seq too. However it is missing
>> >>>> for distribute_cfs_runtime(), especially the slack timer call path.
>> >>>
>> >>> I don't think it's a problem, as expires_seq will get synced in
>> >>> assign_cfs_rq_runtime().
>> >>
>> >> Sure, but there is a small window during which they are not synced.
>> >> Why do you want to wait until the next assign_cfs_rq_runtime() when
>> >> you already know runtime_expires is synced?
>> >>
>> >> Also, expire_cfs_rq_runtime() is called before assign_cfs_rq_runtime()
>> >> inside __account_cfs_rq_runtime(), which means the check of
>> >> cfs_rq->expires_seq is not accurate for unthrottling case if the clock
>> >> drift happens soon enough?
>> >>
>> >
>> > expire_cfs_rq_runtime():
>> > if (cfs_rq->expires_seq == cfs_b->expires_seq) {
>> > /* extend local deadline, drift is bounded above by 2 ticks */
>> > cfs_rq->runtime_expires += TICK_NSEC;
>> > } else {
>> > /* global deadline is ahead, expiration has passed */
>> > cfs_rq->runtime_remaining = 0;
>> > }
>> >
>> > So if clock drift happens soon, then expires_seq decides the correct
>> > thing we should do: if cfs_b->expires_seq advanced, then clear the stale
>> > cfs_rq->runtime_remaining from the slack timer of the past period, then
>> > assign_cfs_rq_runtime() will refresh them afterwards, otherwise it is a
>> > real clock drift. I am still not getting where the race is?
>
> But expires_seq is supposed to be the same here, after
> distribute_cfs_runtime(), therefore runtime_remaining is not supposed
> to be cleared.
>
> Which part am I misunderstanding? Should expires_seq not be the same here?
> Or are you saying that wrongly clearing runtime_remaining is fine?
>
>
>>
>> Nothing /important/ goes wrong because distribute_cfs_runtime only fills
>> runtime_remaining up to 1, not a real amount.
>
> No, runtime_remaining is updated right before expire_cfs_rq_runtime():
>
> static void __account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
> {
> /* dock delta_exec before expiring quota (as it could span periods) */
> cfs_rq->runtime_remaining -= delta_exec;
> expire_cfs_rq_runtime(cfs_rq);
>
> so almost certainly it can't be 1.
Yes, in practice what's actually going to happen is that the
runtime_remaining will be put to 1 by distribute, the cfs_rq will be
unthrottled, and then when it runs it will go negative immediately and
hit the negative check in expires, so expires_seq being wrong will not
actually matter. In addition, the worst thing that will happen if one of
the account_cfs_rq_runtime(cfs_rq, 0) paths is hit first is that it will
lose 1ns of quota, which also doesn't really matter.
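The two outcomes described above can be walked through in a small userspace model. Everything here is a simplification made for illustration: the `model_` names and `MODEL_TICK_NSEC` are hypothetical, the distribute step is assumed to have enough pool to reach 1, and the `sync_seq` flag stands in for applying the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TICK_NSEC 4000000ULL /* assumed tick length, model only */

struct model_cfs_b  { uint64_t runtime_expires; int expires_seq; };
struct model_cfs_rq { int64_t runtime_remaining; uint64_t runtime_expires; int expires_seq; };

/* Distribute step: top up to 1 and copy the deadline; sync_seq models the patch. */
void model_distribute(struct model_cfs_rq *rq, const struct model_cfs_b *b,
                      bool sync_seq)
{
    assert(rq && b);
    rq->runtime_remaining = 1;
    rq->runtime_expires = b->runtime_expires;
    if (sync_seq)
        rq->expires_seq = b->expires_seq;
}

/*
 * Account step: charge delta_exec first, then run the expiry checks.
 * Returns the quota (in ns) discarded by the expiry logic.
 */
int64_t model_account(struct model_cfs_rq *rq, const struct model_cfs_b *b,
                      uint64_t now, uint64_t delta_exec)
{
    assert(rq && b);
    rq->runtime_remaining -= (int64_t)delta_exec;

    if ((int64_t)(now - rq->runtime_expires) < 0)
        return 0;                       /* deadline still ahead */
    if (rq->runtime_remaining < 0)
        return 0;                       /* negative check: seq never consulted */
    if (rq->expires_seq == b->expires_seq) {
        rq->runtime_expires += MODEL_TICK_NSEC;
        return 0;                       /* treated as clock drift */
    }

    int64_t lost = rq->runtime_remaining;
    rq->runtime_remaining = 0;          /* stale seq: local quota cleared */
    return lost;
}
```

In the model, a stale sequence number only matters when the accounting call charges nothing while the local deadline has already passed; in that case exactly the 1 ns handed out by the distribute step is discarded. With `sync_seq` set, the same call extends the local deadline instead.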
* Re: [PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()
2018-08-01 17:17 ` bsegall
@ 2018-08-03 21:56 ` Cong Wang
0 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2018-08-03 21:56 UTC (permalink / raw)
To: Ben Segall
Cc: Xunlei Pang, LKML, Linus Torvalds, Peter Zijlstra, Thomas Gleixner
On Wed, Aug 1, 2018 at 10:17 AM <bsegall@google.com> wrote:
> Yes, in practice what's actually going to happen is that the
> runtime_remaining will be put to 1 by distribute, the cfs_rq will be
> unthrottled, and then when it runs it will go negative immediately and
> hit the negative check in expires, so expires_seq being wrong will not
> actually matter. In addition, the worst thing that will happen if one of
> the account_cfs_rq_runtime(cfs_rq, 0) paths is hit first is that it will
> lose 1ns of quota, which also doesn't really matter.
Ah, I see.
Thanks!