stable.vger.kernel.org archive mirror
* [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
       [not found] <20190319130005.25492-1-pauld@redhat.com>
@ 2019-04-03  8:38 ` tip-bot for Phil Auld
       [not found]   ` <20190405141524.05DDA2186A@mail.kernel.org>
  2019-04-16 15:32 ` [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() " tip-bot for Phil Auld
  1 sibling, 1 reply; 8+ messages in thread
From: tip-bot for Phil Auld @ 2019-04-03  8:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, pauld, efault, tglx, peterz, anton, bsegall, torvalds,
	stable, hpa, linux-kernel

Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
Author:     Phil Auld <pauld@redhat.com>
AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 3 Apr 2019 09:50:23 +0200

sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup

With an extremely short cfs_period_us setting on a parent task group with a large
number of children, the for loop in sched_cfs_period_timer can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0.  The large number of children can make
do_sched_cfs_period_timer() take longer than the period.

 NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
 RIP: 0010:tg_nop+0x0/0x10
  <IRQ>
  walk_tg_tree_from+0x29/0xb0
  unthrottle_cfs_rq+0xe0/0x1a0
  distribute_cfs_runtime+0xd3/0xf0
  sched_cfs_period_timer+0xcb/0x160
  ? sched_cfs_slack_timer+0xd0/0xd0
  __hrtimer_run_queues+0xfb/0x270
  hrtimer_interrupt+0x122/0x270
  smp_apic_timer_interrupt+0x6a/0x140
  apic_timer_interrupt+0xf/0x20
  </IRQ>

To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before the next period expires.  This preserves the relative runtime
quota while preventing the hard lockup.

A warning is issued reporting this state and the new values.

Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..d4cce633eac8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	unsigned long flags;
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock_irqsave(&cfs_b->lock, flags);
 	for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota /= old;
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				new/NSEC_PER_USEC,
+				cfs_b->quota/NSEC_PER_USEC);
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
 	}
 	if (idle)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
       [not found]   ` <20190405141524.05DDA2186A@mail.kernel.org>
@ 2019-04-08 13:42     ` Phil Auld
  2019-04-08 14:50       ` Sasha Levin
  0 siblings, 1 reply; 8+ messages in thread
From: Phil Auld @ 2019-04-08 13:42 UTC (permalink / raw)
  To: Sasha Levin
  Cc: tip-bot for Phil Auld, linux-tip-commits, mingo, efault,
	Anton Blanchard, Ben Segall, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, stable

On Fri, Apr 05, 2019 at 02:15:23PM +0000 Sasha Levin wrote:
> Hi,
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
> 
> The bot has tested the following trees: v5.0.6, v4.19.33, v4.14.110, v4.9.167, v4.4.178, v3.18.138.
> 
> v5.0.6: Failed to apply! Possible dependencies:
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> 
> v4.19.33: Failed to apply! Possible dependencies:
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> 
> v4.14.110: Failed to apply! Possible dependencies:
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>

This is a minor context difference. There is no actual dependency on the
c0ad4aa4d841 patch.  It would be easy to produce a new version that could
go in these trees. I'm not sure what the right action is in that case.
Should I spin a new version with the different locking in the context?


Also, I can't find this commit in tip. It's visible in gitweb but it's not
in the tip tree as far as I can tell. 


> v4.9.167: Failed to apply! Possible dependencies:
>     8a8c69c32778 ("sched/core: Add rq->lock wrappers")
>     92509b732baf ("sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock")
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>     cb42c9a3ebbb ("sched/core: Add debugging code to catch missing update_rq_clock() calls")
>     d1ccc66df8bf ("sched/core: Clean up comments")
>     d8ac897137a2 ("sched/core: Add wrappers for lockdep_(un)pin_lock()")
> 
> v4.4.178: Failed to apply! Possible dependencies:
>     2a67e741bbbc ("rcu: Create transitive rnp->lock acquisition functions")
>     3e71a462dd48 ("sched/core: Move task_rq_lock() out of line")
>     3ea94de15ce9 ("sched/core: Fix incorrect wait time and wait count statistics")
>     46a5d164db53 ("rcu: Stop disabling interrupts in scheduler fastpaths")
>     8a8c69c32778 ("sched/core: Add rq->lock wrappers")
>     958c5f848e17 ("stop_machine: Change stop_one_cpu() to rely on cpu_stop_queue_work()")
>     bf89a304722f ("stop_machine: Avoid a sleep and wakeup in stop_one_cpu()")
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>     cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
>     d8ac897137a2 ("sched/core: Add wrappers for lockdep_(un)pin_lock()")
>     e7904a28f533 ("locking/lockdep, sched/core: Implement a better lock pinning scheme")
>     fecbf6f01fbd ("rcu: Simplify rcu_sched_qs() control flow")
> 
> v3.18.138: Failed to apply! Possible dependencies:
>     1a43a14a5bd9 ("sched: Fix schedule_tail() to disable preemption")
>     1b537c7d1e58 ("sched/core: Remove check of p->sched_class")
>     1d7e974cbf2f ("sched/deadline: Don't check SD_BALANCE_FORK")
>     3960c8c0c789 ("sched: Make dl_task_time() use task_rq_lock()")
>     3e71a462dd48 ("sched/core: Move task_rq_lock() out of line")
>     4c9a4bc89a9c ("sched: Allow balance callbacks for check_class_changed()")
>     5cc389bcee08 ("sched: Move code around")
>     5e16bbc2fb40 ("sched: Streamline the task migration locking a little")
>     67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()")
>     6c1d9410f007 ("sched: Move p->nr_cpus_allowed check to select_task_rq()")
>     74b8a4cb6ce3 ("sched: Clarify ordering between task_rq_lock() and move_queued_task()")
>     8a8c69c32778 ("sched/core: Add rq->lock wrappers")
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>     cbce1a686700 ("sched,lockdep: Employ lock pinning")
>     dfa50b605c2a ("sched: Make finish_task_switch() return 'struct rq *'")
>     f4e9d94a5bf6 ("sched/deadline: Don't balance during wakeup if wakee is pinned")
> 
> 
> How should we proceed with this patch?
>

I haven't looked that far back.  This patch should be pretty self-contained. I don't
think it's worth pulling it back to these trees if it would require all of these
patches.  The risk would outweigh the reward.


Cheers,
Phil


> --
> Thanks,
> Sasha

-- 


* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-08 13:42     ` Phil Auld
@ 2019-04-08 14:50       ` Sasha Levin
  2019-04-08 17:03         ` Phil Auld
  0 siblings, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2019-04-08 14:50 UTC (permalink / raw)
  To: Phil Auld
  Cc: tip-bot for Phil Auld, linux-tip-commits, mingo, efault,
	Anton Blanchard, Ben Segall, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, stable

On Mon, Apr 08, 2019 at 09:42:14AM -0400, Phil Auld wrote:
>On Fri, Apr 05, 2019 at 02:15:23PM +0000 Sasha Levin wrote:
>> Hi,
>>
>> [This is an automated email]
>>
>> This commit has been processed because it contains a -stable tag.
>> The stable tag indicates that it's relevant for the following trees: all
>>
>> The bot has tested the following trees: v5.0.6, v4.19.33, v4.14.110, v4.9.167, v4.4.178, v3.18.138.
>>
>> v5.0.6: Failed to apply! Possible dependencies:
>>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>>
>> v4.19.33: Failed to apply! Possible dependencies:
>>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>>
>> v4.14.110: Failed to apply! Possible dependencies:
>>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>>
>
>This is a minor context difference. There is no actual dependency on the
>c0ad4aa4d841 patch.  It would be easy to produce new version that could
>go in these trees. I'm not sure what the right action is in that case.
>Should I spin a new version with the different locking in the context?

Please do :)

The algorithm does the dependency analysis by looking at surrounding
code rather than actual functional dependency.

--
Thanks,
Sasha


* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-08 14:50       ` Sasha Levin
@ 2019-04-08 17:03         ` Phil Auld
  2019-04-08 17:06           ` Sasha Levin
  0 siblings, 1 reply; 8+ messages in thread
From: Phil Auld @ 2019-04-08 17:03 UTC (permalink / raw)
  To: Sasha Levin
  Cc: tip-bot for Phil Auld, linux-tip-commits, mingo, efault,
	Anton Blanchard, Ben Segall, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, stable

On Mon, Apr 08, 2019 at 10:50:26AM -0400 Sasha Levin wrote:
> On Mon, Apr 08, 2019 at 09:42:14AM -0400, Phil Auld wrote:
> > On Fri, Apr 05, 2019 at 02:15:23PM +0000 Sasha Levin wrote:
> > > Hi,
> > > 
> > > [This is an automated email]
> > > 
> > > This commit has been processed because it contains a -stable tag.
> > > The stable tag indicates that it's relevant for the following trees: all
> > > 
> > > The bot has tested the following trees: v5.0.6, v4.19.33, v4.14.110, v4.9.167, v4.4.178, v3.18.138.
> > > 
> > > v5.0.6: Failed to apply! Possible dependencies:
> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > 
> > > v4.19.33: Failed to apply! Possible dependencies:
> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > 
> > > v4.14.110: Failed to apply! Possible dependencies:
> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > 
> > 
> > This is a minor context difference. There is no actual dependency on the
> > c0ad4aa4d841 patch.  It would be easy to produce new version that could
> > go in these trees. I'm not sure what the right action is in that case.
> > Should I spin a new version with the different locking in the context?
> 
> Please do :)
>

Sure. I'm just not sure how to post it. It only shows up in this tip-bot
email and on gitweb. It's not in tip.git and not in Linus' upstream tree.

I've updated the patch and it will apply now to v5.0.6, v4.19.33, v4.14.110,
v4.9.167, and v4.4.178, all with increasing offsets but nothing else.

v3.18.138 won't take it without more work. I'd be inclined to skip that one.


> The algorithm does the dependency analysis by looking at surrounding
> code rather than actual functional dependency.


That's pretty cool though :)

Cheers,
Phil


>
> --
> Thanks,
> Sasha

-- 


* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-08 17:03         ` Phil Auld
@ 2019-04-08 17:06           ` Sasha Levin
  2019-04-08 17:31             ` Phil Auld
  0 siblings, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2019-04-08 17:06 UTC (permalink / raw)
  To: Phil Auld
  Cc: tip-bot for Phil Auld, linux-tip-commits, mingo, efault,
	Anton Blanchard, Ben Segall, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, stable

On Mon, Apr 08, 2019 at 01:03:05PM -0400, Phil Auld wrote:
>On Mon, Apr 08, 2019 at 10:50:26AM -0400 Sasha Levin wrote:
>> On Mon, Apr 08, 2019 at 09:42:14AM -0400, Phil Auld wrote:
>> > On Fri, Apr 05, 2019 at 02:15:23PM +0000 Sasha Levin wrote:
>> > > Hi,
>> > >
>> > > [This is an automated email]
>> > >
>> > > This commit has been processed because it contains a -stable tag.
>> > > The stable tag indicates that it's relevant for the following trees: all
>> > >
>> > > The bot has tested the following trees: v5.0.6, v4.19.33, v4.14.110, v4.9.167, v4.4.178, v3.18.138.
>> > >
>> > > v5.0.6: Failed to apply! Possible dependencies:
>> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>> > >
>> > > v4.19.33: Failed to apply! Possible dependencies:
>> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>> > >
>> > > v4.14.110: Failed to apply! Possible dependencies:
>> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>> > >
>> >
>> > This is a minor context difference. There is no actual dependency on the
>> > c0ad4aa4d841 patch.  It would be easy to produce new version that could
>> > go in these trees. I'm not sure what the right action is in that case.
>> > Should I spin a new version with the different locking in the context?
>>
>> Please do :)
>>
>
>Sure. I'm just not sure how to post it. It only shows up in this tip-bot
>email and on gitweb. It's not in tip.git and not in Linus' upstream tree.
>
>I've updated the patch at it will apply now to v5.0.6, v4.19.33, v4.14.110,
>v4.9.167, v4.4.178  all with increasing offsets but nothing else.

You can either reply to this thread with the patch(es), or just send
them out and annotate one way or the other that they should go to their
appropriate stable trees.

>v3.18.138 won't take it without more work. I'd be inclined to skip that one.

No problem there.


--
Thanks,
Sasha


* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-08 17:06           ` Sasha Levin
@ 2019-04-08 17:31             ` Phil Auld
  0 siblings, 0 replies; 8+ messages in thread
From: Phil Auld @ 2019-04-08 17:31 UTC (permalink / raw)
  To: Sasha Levin
  Cc: tip-bot for Phil Auld, linux-tip-commits, mingo, efault,
	Anton Blanchard, Ben Segall, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, stable

On Mon, Apr 08, 2019 at 01:06:12PM -0400 Sasha Levin wrote:
> On Mon, Apr 08, 2019 at 01:03:05PM -0400, Phil Auld wrote:
> > On Mon, Apr 08, 2019 at 10:50:26AM -0400 Sasha Levin wrote:
> > > On Mon, Apr 08, 2019 at 09:42:14AM -0400, Phil Auld wrote:
> > > > On Fri, Apr 05, 2019 at 02:15:23PM +0000 Sasha Levin wrote:
> > > > > Hi,
> > > > >
> > > > > [This is an automated email]
> > > > >
> > > > > This commit has been processed because it contains a -stable tag.
> > > > > The stable tag indicates that it's relevant for the following trees: all
> > > > >
> > > > > The bot has tested the following trees: v5.0.6, v4.19.33, v4.14.110, v4.9.167, v4.4.178, v3.18.138.
> > > > >
> > > > > v5.0.6: Failed to apply! Possible dependencies:
> > > > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > > >
> > > > > v4.19.33: Failed to apply! Possible dependencies:
> > > > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > > >
> > > > > v4.14.110: Failed to apply! Possible dependencies:
> > > > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > > >
> > > >
> > > > This is a minor context difference. There is no actual dependency on the
> > > > c0ad4aa4d841 patch.  It would be easy to produce new version that could
> > > > go in these trees. I'm not sure what the right action is in that case.
> > > > Should I spin a new version with the different locking in the context?
> > > 
> > > Please do :)
> > > 
> > 
> > Sure. I'm just not sure how to post it. It only shows up in this tip-bot
> > email and on gitweb. It's not in tip.git and not in Linus' upstream tree.
> > 
> > I've updated the patch at it will apply now to v5.0.6, v4.19.33, v4.14.110,
> > v4.9.167, v4.4.178  all with increasing offsets but nothing else.
> 
> You can either reply to this thread with the patch(es), or just send
> them out and annotate one way or the other that they should go to their
> appropriate stable trees.
>

Okay, I'll do that.  Am I the only one who finds it strange that the commit
exists only in this email and gitweb but not the actual git tree(s)?


> > v3.18.138 won't take it without more work. I'd be inclined to skip that one.
> 
> No problem there.
> 
> 
> --
> Thanks,
> Sasha

-- 


* [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
       [not found] <20190319130005.25492-1-pauld@redhat.com>
  2019-04-03  8:38 ` [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup tip-bot for Phil Auld
@ 2019-04-16 15:32 ` tip-bot for Phil Auld
  2019-04-16 19:26   ` Phil Auld
  1 sibling, 1 reply; 8+ messages in thread
From: tip-bot for Phil Auld @ 2019-04-16 15:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, bsegall, linux-kernel, pauld, stable, anton, tglx,
	peterz, hpa, mingo

Commit-ID:  2e8e19226398db8265a8e675fcc0118b9e80c9e8
Gitweb:     https://git.kernel.org/tip/2e8e19226398db8265a8e675fcc0118b9e80c9e8
Author:     Phil Auld <pauld@redhat.com>
AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 16 Apr 2019 16:50:05 +0200

sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup

With an extremely short cfs_period_us setting on a parent task group with a large
number of children, the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0.  The large number of children can make
do_sched_cfs_period_timer() take longer than the period.

 NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
 RIP: 0010:tg_nop+0x0/0x10
  <IRQ>
  walk_tg_tree_from+0x29/0xb0
  unthrottle_cfs_rq+0xe0/0x1a0
  distribute_cfs_runtime+0xd3/0xf0
  sched_cfs_period_timer+0xcb/0x160
  ? sched_cfs_slack_timer+0xd0/0xd0
  __hrtimer_run_queues+0xfb/0x270
  hrtimer_interrupt+0x122/0x270
  smp_apic_timer_interrupt+0x6a/0x140
  apic_timer_interrupt+0xf/0x20
  </IRQ>

To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before the next period expires.  This preserves the relative runtime
quota while preventing the hard lockup.

A warning is issued reporting this state and the new values.

Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..a4d9e14bf138 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	unsigned long flags;
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock_irqsave(&cfs_b->lock, flags);
 	for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				div_u64(new, NSEC_PER_USEC),
+				div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
 	}
 	if (idle)


* Re: [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
  2019-04-16 15:32 ` [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() " tip-bot for Phil Auld
@ 2019-04-16 19:26   ` Phil Auld
  0 siblings, 0 replies; 8+ messages in thread
From: Phil Auld @ 2019-04-16 19:26 UTC (permalink / raw)
  To: sashal
  Cc: tglx, hpa, peterz, mingo, torvalds, linux-kernel, bsegall, stable, anton


Hi Sasha,

On Tue, Apr 16, 2019 at 08:32:09AM -0700 tip-bot for Phil Auld wrote:
> Commit-ID:  2e8e19226398db8265a8e675fcc0118b9e80c9e8
> Gitweb:     https://git.kernel.org/tip/2e8e19226398db8265a8e675fcc0118b9e80c9e8
> Author:     Phil Auld <pauld@redhat.com>
> AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 16 Apr 2019 16:50:05 +0200
> 
> sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
> 
> With extremely short cfs_period_us setting on a parent task group with a large
> number of children the for loop in sched_cfs_period_timer() can run until the
> watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> will ever return 0.  The large number of children can make
> do_sched_cfs_period_timer() take longer than the period.
> 
>  NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
>  RIP: 0010:tg_nop+0x0/0x10
>   <IRQ>
>   walk_tg_tree_from+0x29/0xb0
>   unthrottle_cfs_rq+0xe0/0x1a0
>   distribute_cfs_runtime+0xd3/0xf0
>   sched_cfs_period_timer+0xcb/0x160
>   ? sched_cfs_slack_timer+0xd0/0xd0
>   __hrtimer_run_queues+0xfb/0x270
>   hrtimer_interrupt+0x122/0x270
>   smp_apic_timer_interrupt+0x6a/0x140
>   apic_timer_interrupt+0xf/0x20
>   </IRQ>
> 
> To prevent this we add protection to the loop that detects when the loop has run
> too many times and scales the period and quota up, proportionally, so that the timer
> can complete before then next period expires.  This preserves the relative runtime
> quota while preventing the hard lockup.
> 
> A warning is issued reporting this state and the new values.
> 
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: <stable@vger.kernel.org>
> Cc: Anton Blanchard <anton@ozlabs.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---

The above commit won't work on the stable trees. Below is an updated version
that will work on v5.0.7, v4.19.34, v4.14.111, v4.9.168, and v4.4.178 with 
increasing offsets. I believe v3.18.138 will require more work, so that one is not
included.

There is only a minor change to context; none of the actual changes in the patch are
different.


Thanks,
Phil
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 310d0637fe4b..f0380229b6f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4859,12 +4859,15 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
 		container_of(timer, struct cfs_bandwidth, period_timer);
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock(&cfs_b->lock);
 	for (;;) {
@@ -4872,6 +4875,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+			pr_warn_ratelimited(
+        "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+	                        smp_processor_id(),
+	                        div_u64(new, NSEC_PER_USEC),
+                                div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun);
 	}
 	if (idle)



-- 


end of thread, other threads:[~2019-04-16 19:26 UTC | newest]

Thread overview: 8+ messages
-- links below jump to the message on this page --
     [not found] <20190319130005.25492-1-pauld@redhat.com>
2019-04-03  8:38 ` [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup tip-bot for Phil Auld
     [not found]   ` <20190405141524.05DDA2186A@mail.kernel.org>
2019-04-08 13:42     ` Phil Auld
2019-04-08 14:50       ` Sasha Levin
2019-04-08 17:03         ` Phil Auld
2019-04-08 17:06           ` Sasha Levin
2019-04-08 17:31             ` Phil Auld
2019-04-16 15:32 ` [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() " tip-bot for Phil Auld
2019-04-16 19:26   ` Phil Auld
