All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
@ 2019-03-19 13:00 Phil Auld
  2019-03-21 18:01 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Phil Auld @ 2019-03-19 13:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ben Segall, Ingo Molnar, Peter Zijlstra, Anton Blanchard

sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup

With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0.  The large number of children can make
do_sched_cfs_period_timer() take longer than the period.

[  217.264946] NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
[  217.264948] Modules linked in: sunrpc iTCO_wdt gpio_ich iTCO_vendor_support intel_powerclamp coretemp kvm_intel
+kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_si intel_cstate joydev
+ipmi_devintf pcspkr hpilo intel_uncore sg hpwdt ipmi_msghandler acpi_power_meter i7core_edac lpc_ich acpi_cpufreq
+ip_tables xfs libcrc32c sr_mod sd_mod cdrom ata_generic radeon i2c_algo_bit drm_kms_helper syscopyarea
+sysfillrect sysimgblt fb_sys_fops ttm ata_piix drm serio_raw crc32c_intel hpsa myri10ge libata dca
+scsi_transport_sas netxen_nic dm_mirror dm_region_hash dm_log dm_mod
[  217.264964] CPU: 24 PID: 46684 Comm: myapp Not tainted 5.0.0-rc7+ #25
[  217.264965] Hardware name: HP ProLiant DL580 G7, BIOS P65 08/16/2015
[  217.264965] RIP: 0010:tg_nop+0x0/0x10
[  217.264966] Code: 83 c9 08 f0 48 0f b1 0f 48 39 c2 74 0e 48 89 c2 f7 c2 00 00 20 00 75 dc 31 c0 c3 b8 01 00 00
+00 c3 66 0f 1f 84 00 00 00 00 00 <66> 66 66 66 90 31 c0 c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 8b
[  217.264967] RSP: 0000:ffff976a7f703e48 EFLAGS: 00000087
[  217.264967] RAX: ffff976a7c7c8f00 RBX: ffff976a6f4fad00 RCX: ffff976a7c7c90f0
[  217.264968] RDX: ffff976a6f4faee0 RSI: ffff976a7f961f00 RDI: ffff976a6f4fad00
[  217.264968] RBP: ffff976a7f961f00 R08: 0000000000000002 R09: ffffffad2c9b3331
[  217.264969] R10: 0000000000000000 R11: 0000000000000000 R12: ffff976a7c7c8f00
[  217.264969] R13: ffffffffb2305c00 R14: 0000000000000000 R15: ffffffffb22f8510
[  217.264970] FS:  00007f6240678740(0000) GS:ffff976a7f700000(0000) knlGS:0000000000000000
[  217.264970] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  217.264971] CR2: 00000000006dee20 CR3: 000000bf2bffc005 CR4: 00000000000206e0
[  217.264971] Call Trace:
[  217.264971]  <IRQ>
[  217.264972]  walk_tg_tree_from+0x29/0xb0
[  217.264972]  unthrottle_cfs_rq+0xe0/0x1a0
[  217.264972]  distribute_cfs_runtime+0xd3/0xf0
[  217.264973]  sched_cfs_period_timer+0xcb/0x160
[  217.264973]  ? sched_cfs_slack_timer+0xd0/0xd0
[  217.264973]  __hrtimer_run_queues+0xfb/0x270
[  217.264974]  hrtimer_interrupt+0x122/0x270
[  217.264974]  smp_apic_timer_interrupt+0x6a/0x140
[  217.264975]  apic_timer_interrupt+0xf/0x20
[  217.264975]  </IRQ>
[  217.264975] RIP: 0033:0x7f6240125fe5
[  217.264976] Code: 44 17 d0 f3 44 0f 7f 47 30 f3 44 0f 7f 44 17 c0 48 01 fa 48 83 e2 c0 48 39 d1 74 a3 66 0f 1f
+84 00 00 00 00 00 66 44 0f 7f 01 <66> 44 0f 7f 41 10 66 44 0f 7f 41 20 66 44 0f 7f 41 30 48 83 c1 40
...

To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before then next period expires.  This preserves the relative runtime
quota while preventing the hard lockup.

A warning is issued reporting this state and the new values.

v2: Math reworked/simplified by Peter Zijlstra.

Signed-off-by: Phil Auld <pauld@redhat.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anton Blanchard <anton@ozlabs.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ea74d43924b2..4e43a40ec13c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	unsigned long flags;
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock_irqsave(&cfs_b->lock, flags);
 	for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota /= old;
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				new/NSEC_PER_USEC,
+				cfs_b->quota/NSEC_PER_USEC);
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
 	}
 	if (idle)
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-03-19 13:00 [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup Phil Auld
@ 2019-03-21 18:01 ` Peter Zijlstra
  2019-03-21 18:32   ` Phil Auld
  2019-04-03  8:38 ` [tip:sched/core] " tip-bot for Phil Auld
  2019-04-16 15:32 ` [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() " tip-bot for Phil Auld
  2 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2019-03-21 18:01 UTC (permalink / raw)
  To: Phil Auld; +Cc: linux-kernel, Ben Segall, Ingo Molnar, Anton Blanchard

On Tue, Mar 19, 2019 at 09:00:05AM -0400, Phil Auld wrote:
> sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
> 
> With extremely short cfs_period_us setting on a parent task group with a large
> number of children the for loop in sched_cfs_period_timer can run until the
> watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> will ever return 0.  The large number of children can make
> do_sched_cfs_period_timer() take longer than the period.

> 
> To prevent this we add protection to the loop that detects when the loop has run
> too many times and scales the period and quota up, proportionally, so that the timer
> can complete before then next period expires.  This preserves the relative runtime
> quota while preventing the hard lockup.
> 
> A warning is issued reporting this state and the new values.
> 
> v2: Math reworked/simplified by Peter Zijlstra.
> 
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Anton Blanchard <anton@ozlabs.org>

Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-03-21 18:01 ` Peter Zijlstra
@ 2019-03-21 18:32   ` Phil Auld
  0 siblings, 0 replies; 16+ messages in thread
From: Phil Auld @ 2019-03-21 18:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ben Segall, Ingo Molnar, Anton Blanchard

On Thu, Mar 21, 2019 at 07:01:37PM +0100 Peter Zijlstra wrote:
> On Tue, Mar 19, 2019 at 09:00:05AM -0400, Phil Auld wrote:
> > sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
> > 
> > With extremely short cfs_period_us setting on a parent task group with a large
> > number of children the for loop in sched_cfs_period_timer can run until the
> > watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> > will ever return 0.  The large number of children can make
> > do_sched_cfs_period_timer() take longer than the period.
> 
> > 
> > To prevent this we add protection to the loop that detects when the loop has run
> > too many times and scales the period and quota up, proportionally, so that the timer
> > can complete before then next period expires.  This preserves the relative runtime
> > quota while preventing the hard lockup.
> > 
> > A warning is issued reporting this state and the new values.
> > 
> > v2: Math reworked/simplified by Peter Zijlstra.
> > 
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: Anton Blanchard <anton@ozlabs.org>
> 
> Thanks!

Thank you for your time and help.

What do you think about Cc: stable?


Cheers,
Phil

-- 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-03-19 13:00 [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup Phil Auld
  2019-03-21 18:01 ` Peter Zijlstra
@ 2019-04-03  8:38 ` tip-bot for Phil Auld
       [not found]   ` <20190405141524.05DDA2186A@mail.kernel.org>
  2019-04-09 12:48   ` Phil Auld
  2019-04-16 15:32 ` [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() " tip-bot for Phil Auld
  2 siblings, 2 replies; 16+ messages in thread
From: tip-bot for Phil Auld @ 2019-04-03  8:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, pauld, efault, tglx, peterz, anton, bsegall, torvalds,
	stable, hpa, linux-kernel

Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
Author:     Phil Auld <pauld@redhat.com>
AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 3 Apr 2019 09:50:23 +0200

sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup

With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0.  The large number of children can make
do_sched_cfs_period_timer() take longer than the period.

 NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
 RIP: 0010:tg_nop+0x0/0x10
  <IRQ>
  walk_tg_tree_from+0x29/0xb0
  unthrottle_cfs_rq+0xe0/0x1a0
  distribute_cfs_runtime+0xd3/0xf0
  sched_cfs_period_timer+0xcb/0x160
  ? sched_cfs_slack_timer+0xd0/0xd0
  __hrtimer_run_queues+0xfb/0x270
  hrtimer_interrupt+0x122/0x270
  smp_apic_timer_interrupt+0x6a/0x140
  apic_timer_interrupt+0xf/0x20
  </IRQ>

To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before then next period expires.  This preserves the relative runtime
quota while preventing the hard lockup.

A warning is issued reporting this state and the new values.

Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..d4cce633eac8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	unsigned long flags;
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock_irqsave(&cfs_b->lock, flags);
 	for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota /= old;
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				new/NSEC_PER_USEC,
+				cfs_b->quota/NSEC_PER_USEC);
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
 	}
 	if (idle)

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
       [not found]   ` <20190405141524.05DDA2186A@mail.kernel.org>
@ 2019-04-08 13:42     ` Phil Auld
  2019-04-08 14:50       ` Sasha Levin
  0 siblings, 1 reply; 16+ messages in thread
From: Phil Auld @ 2019-04-08 13:42 UTC (permalink / raw)
  To: Sasha Levin
  Cc: tip-bot for Phil Auld, linux-tip-commits, mingo, efault,
	Anton Blanchard, Ben Segall, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, stable

On Fri, Apr 05, 2019 at 02:15:23PM +0000 Sasha Levin wrote:
> Hi,
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
> 
> The bot has tested the following trees: v5.0.6, v4.19.33, v4.14.110, v4.9.167, v4.4.178, v3.18.138.
> 
> v5.0.6: Failed to apply! Possible dependencies:
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> 
> v4.19.33: Failed to apply! Possible dependencies:
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> 
> v4.14.110: Failed to apply! Possible dependencies:
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>

This is a minor context difference. There is no actual dependency on the
c0ad4aa4d841 patch.  It would be easy to produce new version that could
go in these trees. I'm not sure what the right action is in that case.
Should I spin a new version with the different locking in the context?


Also, I can't find this commit in tip. It's visible in gitweb but it's not
in the tip tree as far as I can tell. 


> v4.9.167: Failed to apply! Possible dependencies:
>     8a8c69c32778 ("sched/core: Add rq->lock wrappers")
>     92509b732baf ("sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock")
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>     cb42c9a3ebbb ("sched/core: Add debugging code to catch missing update_rq_clock() calls")
>     d1ccc66df8bf ("sched/core: Clean up comments")
>     d8ac897137a2 ("sched/core: Add wrappers for lockdep_(un)pin_lock()")
> 
> v4.4.178: Failed to apply! Possible dependencies:
>     2a67e741bbbc ("rcu: Create transitive rnp->lock acquisition functions")
>     3e71a462dd48 ("sched/core: Move task_rq_lock() out of line")
>     3ea94de15ce9 ("sched/core: Fix incorrect wait time and wait count statistics")
>     46a5d164db53 ("rcu: Stop disabling interrupts in scheduler fastpaths")
>     8a8c69c32778 ("sched/core: Add rq->lock wrappers")
>     958c5f848e17 ("stop_machine: Change stop_one_cpu() to rely on cpu_stop_queue_work()")
>     bf89a304722f ("stop_machine: Avoid a sleep and wakeup in stop_one_cpu()")
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>     cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
>     d8ac897137a2 ("sched/core: Add wrappers for lockdep_(un)pin_lock()")
>     e7904a28f533 ("locking/lockdep, sched/core: Implement a better lock pinning scheme")
>     fecbf6f01fbd ("rcu: Simplify rcu_sched_qs() control flow")
> 
> v3.18.138: Failed to apply! Possible dependencies:
>     1a43a14a5bd9 ("sched: Fix schedule_tail() to disable preemption")
>     1b537c7d1e58 ("sched/core: Remove check of p->sched_class")
>     1d7e974cbf2f ("sched/deadline: Don't check SD_BALANCE_FORK")
>     3960c8c0c789 ("sched: Make dl_task_time() use task_rq_lock()")
>     3e71a462dd48 ("sched/core: Move task_rq_lock() out of line")
>     4c9a4bc89a9c ("sched: Allow balance callbacks for check_class_changed()")
>     5cc389bcee08 ("sched: Move code around")
>     5e16bbc2fb40 ("sched: Streamline the task migration locking a little")
>     67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()")
>     6c1d9410f007 ("sched: Move p->nr_cpus_allowed check to select_task_rq()")
>     74b8a4cb6ce3 ("sched: Clarify ordering between task_rq_lock() and move_queued_task()")
>     8a8c69c32778 ("sched/core: Add rq->lock wrappers")
>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>     cbce1a686700 ("sched,lockdep: Employ lock pinning")
>     dfa50b605c2a ("sched: Make finish_task_switch() return 'struct rq *'")
>     f4e9d94a5bf6 ("sched/deadline: Don't balance during wakeup if wakee is pinned")
> 
> 
> How should we proceed with this patch?
>

I haven't look that far back.  This patch should be pretty self contained. I don't
think it's worth pulling it back to these trees if it would require all of these
patches.  The risk would out-weigh the reward.


Cheers,
Phil


> --
> Thanks,
> Sasha

-- 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-08 13:42     ` Phil Auld
@ 2019-04-08 14:50       ` Sasha Levin
  2019-04-08 17:03         ` Phil Auld
  0 siblings, 1 reply; 16+ messages in thread
From: Sasha Levin @ 2019-04-08 14:50 UTC (permalink / raw)
  To: Phil Auld
  Cc: tip-bot for Phil Auld, linux-tip-commits, mingo, efault,
	Anton Blanchard, Ben Segall, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, stable

On Mon, Apr 08, 2019 at 09:42:14AM -0400, Phil Auld wrote:
>On Fri, Apr 05, 2019 at 02:15:23PM +0000 Sasha Levin wrote:
>> Hi,
>>
>> [This is an automated email]
>>
>> This commit has been processed because it contains a -stable tag.
>> The stable tag indicates that it's relevant for the following trees: all
>>
>> The bot has tested the following trees: v5.0.6, v4.19.33, v4.14.110, v4.9.167, v4.4.178, v3.18.138.
>>
>> v5.0.6: Failed to apply! Possible dependencies:
>>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>>
>> v4.19.33: Failed to apply! Possible dependencies:
>>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>>
>> v4.14.110: Failed to apply! Possible dependencies:
>>     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>>
>
>This is a minor context difference. There is no actual dependency on the
>c0ad4aa4d841 patch.  It would be easy to produce new version that could
>go in these trees. I'm not sure what the right action is in that case.
>Should I spin a new version with the different locking in the context?

Please do :)

The algorithm does the dependency analysis by looking at surrounding
code rather than actual functional dependency.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-08 14:50       ` Sasha Levin
@ 2019-04-08 17:03         ` Phil Auld
  2019-04-08 17:06           ` Sasha Levin
  0 siblings, 1 reply; 16+ messages in thread
From: Phil Auld @ 2019-04-08 17:03 UTC (permalink / raw)
  To: Sasha Levin
  Cc: tip-bot for Phil Auld, linux-tip-commits, mingo, efault,
	Anton Blanchard, Ben Segall, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, stable

On Mon, Apr 08, 2019 at 10:50:26AM -0400 Sasha Levin wrote:
> On Mon, Apr 08, 2019 at 09:42:14AM -0400, Phil Auld wrote:
> > On Fri, Apr 05, 2019 at 02:15:23PM +0000 Sasha Levin wrote:
> > > Hi,
> > > 
> > > [This is an automated email]
> > > 
> > > This commit has been processed because it contains a -stable tag.
> > > The stable tag indicates that it's relevant for the following trees: all
> > > 
> > > The bot has tested the following trees: v5.0.6, v4.19.33, v4.14.110, v4.9.167, v4.4.178, v3.18.138.
> > > 
> > > v5.0.6: Failed to apply! Possible dependencies:
> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > 
> > > v4.19.33: Failed to apply! Possible dependencies:
> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > 
> > > v4.14.110: Failed to apply! Possible dependencies:
> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > 
> > 
> > This is a minor context difference. There is no actual dependency on the
> > c0ad4aa4d841 patch.  It would be easy to produce new version that could
> > go in these trees. I'm not sure what the right action is in that case.
> > Should I spin a new version with the different locking in the context?
> 
> Please do :)
>

Sure. I'm just not sure how to post it. It only shows up in this tip-bot
email and on gitweb. It's not in tip.git and not in Linus' upstream tree.

I've updated the patch at it will apply now to v5.0.6, v4.19.33, v4.14.110,
v4.9.167, v4.4.178  all with increasing offsets but nothing else.

v3.18.138 won't take it without more work. I'd be inclined to skip that one.


> The algorithm does the dependency analysis by looking at surrounding
> code rather than actual functional dependency.


That's pretty cool though :)

Cheers,
Phil


>
> --
> Thanks,
> Sasha

-- 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-08 17:03         ` Phil Auld
@ 2019-04-08 17:06           ` Sasha Levin
  2019-04-08 17:31             ` Phil Auld
  0 siblings, 1 reply; 16+ messages in thread
From: Sasha Levin @ 2019-04-08 17:06 UTC (permalink / raw)
  To: Phil Auld
  Cc: tip-bot for Phil Auld, linux-tip-commits, mingo, efault,
	Anton Blanchard, Ben Segall, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, stable

On Mon, Apr 08, 2019 at 01:03:05PM -0400, Phil Auld wrote:
>On Mon, Apr 08, 2019 at 10:50:26AM -0400 Sasha Levin wrote:
>> On Mon, Apr 08, 2019 at 09:42:14AM -0400, Phil Auld wrote:
>> > On Fri, Apr 05, 2019 at 02:15:23PM +0000 Sasha Levin wrote:
>> > > Hi,
>> > >
>> > > [This is an automated email]
>> > >
>> > > This commit has been processed because it contains a -stable tag.
>> > > The stable tag indicates that it's relevant for the following trees: all
>> > >
>> > > The bot has tested the following trees: v5.0.6, v4.19.33, v4.14.110, v4.9.167, v4.4.178, v3.18.138.
>> > >
>> > > v5.0.6: Failed to apply! Possible dependencies:
>> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>> > >
>> > > v4.19.33: Failed to apply! Possible dependencies:
>> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>> > >
>> > > v4.14.110: Failed to apply! Possible dependencies:
>> > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
>> > >
>> >
>> > This is a minor context difference. There is no actual dependency on the
>> > c0ad4aa4d841 patch.  It would be easy to produce new version that could
>> > go in these trees. I'm not sure what the right action is in that case.
>> > Should I spin a new version with the different locking in the context?
>>
>> Please do :)
>>
>
>Sure. I'm just not sure how to post it. It only shows up in this tip-bot
>email and on gitweb. It's not in tip.git and not in Linus' upstream tree.
>
>I've updated the patch at it will apply now to v5.0.6, v4.19.33, v4.14.110,
>v4.9.167, v4.4.178  all with increasing offsets but nothing else.

You can either reply to this thread with the patch(es), or just send
them out and annotate one way or the other that they should go to their
appropriate stable trees.

>v3.18.138 won't take it without more work. I'd be inclined to skip that one.

No problem there.


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-08 17:06           ` Sasha Levin
@ 2019-04-08 17:31             ` Phil Auld
  0 siblings, 0 replies; 16+ messages in thread
From: Phil Auld @ 2019-04-08 17:31 UTC (permalink / raw)
  To: Sasha Levin
  Cc: tip-bot for Phil Auld, linux-tip-commits, mingo, efault,
	Anton Blanchard, Ben Segall, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, stable

On Mon, Apr 08, 2019 at 01:06:12PM -0400 Sasha Levin wrote:
> On Mon, Apr 08, 2019 at 01:03:05PM -0400, Phil Auld wrote:
> > On Mon, Apr 08, 2019 at 10:50:26AM -0400 Sasha Levin wrote:
> > > On Mon, Apr 08, 2019 at 09:42:14AM -0400, Phil Auld wrote:
> > > > On Fri, Apr 05, 2019 at 02:15:23PM +0000 Sasha Levin wrote:
> > > > > Hi,
> > > > >
> > > > > [This is an automated email]
> > > > >
> > > > > This commit has been processed because it contains a -stable tag.
> > > > > The stable tag indicates that it's relevant for the following trees: all
> > > > >
> > > > > The bot has tested the following trees: v5.0.6, v4.19.33, v4.14.110, v4.9.167, v4.4.178, v3.18.138.
> > > > >
> > > > > v5.0.6: Failed to apply! Possible dependencies:
> > > > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > > >
> > > > > v4.19.33: Failed to apply! Possible dependencies:
> > > > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > > >
> > > > > v4.14.110: Failed to apply! Possible dependencies:
> > > > >     c0ad4aa4d841 ("sched/fair: Robustify CFS-bandwidth timer locking")
> > > > >
> > > >
> > > > This is a minor context difference. There is no actual dependency on the
> > > > c0ad4aa4d841 patch.  It would be easy to produce new version that could
> > > > go in these trees. I'm not sure what the right action is in that case.
> > > > Should I spin a new version with the different locking in the context?
> > > 
> > > Please do :)
> > > 
> > 
> > Sure. I'm just not sure how to post it. It only shows up in this tip-bot
> > email and on gitweb. It's not in tip.git and not in Linus' upstream tree.
> > 
> > I've updated the patch at it will apply now to v5.0.6, v4.19.33, v4.14.110,
> > v4.9.167, v4.4.178  all with increasing offsets but nothing else.
> 
> You can either reply to this thread with the patch(es), or just send
> them out and annotate one way or the other that they should go to their
> appropriate stable trees.
>

Okay, I'll do that.  Am I the only one who finds it strange that the commit
exists only in this email and gitweb but not the actual git tree(s)?


> > v3.18.138 won't take it without more work. I'd be inclined to skip that one.
> 
> No problem there.
> 
> 
> --
> Thanks,
> Sasha

-- 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-03  8:38 ` [tip:sched/core] " tip-bot for Phil Auld
       [not found]   ` <20190405141524.05DDA2186A@mail.kernel.org>
@ 2019-04-09 12:48   ` Phil Auld
  2019-04-09 13:05     ` Peter Zijlstra
  1 sibling, 1 reply; 16+ messages in thread
From: Phil Auld @ 2019-04-09 12:48 UTC (permalink / raw)
  To: linux-kernel, hpa, peterz, bsegall, torvalds, anton, efault, mingo, tglx
  Cc: linux-tip-commits

Hi Ingo, Peter,

On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> Author:     Phil Auld <pauld@redhat.com>
> AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Wed, 3 Apr 2019 09:50:23 +0200

This commit seems to have gotten lost. It's not in tip and now the 
direct gitweb link is also showing bad commit reference. 

Did this fall victim to a reset or something?


Thanks,

Phil


> 
> sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
> 
> With extremely short cfs_period_us setting on a parent task group with a large
> number of children the for loop in sched_cfs_period_timer can run until the
> watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> will ever return 0.  The large number of children can make
> do_sched_cfs_period_timer() take longer than the period.
> 
>  NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
>  RIP: 0010:tg_nop+0x0/0x10
>   <IRQ>
>   walk_tg_tree_from+0x29/0xb0
>   unthrottle_cfs_rq+0xe0/0x1a0
>   distribute_cfs_runtime+0xd3/0xf0
>   sched_cfs_period_timer+0xcb/0x160
>   ? sched_cfs_slack_timer+0xd0/0xd0
>   __hrtimer_run_queues+0xfb/0x270
>   hrtimer_interrupt+0x122/0x270
>   smp_apic_timer_interrupt+0x6a/0x140
>   apic_timer_interrupt+0xf/0x20
>   </IRQ>
> 
> To prevent this we add protection to the loop that detects when the loop has run
> too many times and scales the period and quota up, proportionally, so that the timer
> can complete before then next period expires.  This preserves the relative runtime
> quota while preventing the hard lockup.
> 
> A warning is issued reporting this state and the new values.
> 
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Anton Blanchard <anton@ozlabs.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: <stable@vger.kernel.org>
> Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/sched/fair.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 40bd1e27b1b7..d4cce633eac8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
>  	return HRTIMER_NORESTART;
>  }
>  
> +extern const u64 max_cfs_quota_period;
> +
>  static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
>  {
>  	struct cfs_bandwidth *cfs_b =
> @@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
>  	unsigned long flags;
>  	int overrun;
>  	int idle = 0;
> +	int count = 0;
>  
>  	raw_spin_lock_irqsave(&cfs_b->lock, flags);
>  	for (;;) {
> @@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
>  		if (!overrun)
>  			break;
>  
> +		if (++count > 3) {
> +			u64 new, old = ktime_to_ns(cfs_b->period);
> +
> +			new = (old * 147) / 128; /* ~115% */
> +			new = min(new, max_cfs_quota_period);
> +
> +			cfs_b->period = ns_to_ktime(new);
> +
> +			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
> +			cfs_b->quota *= new;
> +			cfs_b->quota /= old;
> +
> +			pr_warn_ratelimited(
> +	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
> +				smp_processor_id(),
> +				new/NSEC_PER_USEC,
> +				cfs_b->quota/NSEC_PER_USEC);
> +
> +			/* reset count so we don't come right back in here */
> +			count = 0;
> +		}
> +
>  		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
>  	}
>  	if (idle)

-- 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-09 12:48   ` Phil Auld
@ 2019-04-09 13:05     ` Peter Zijlstra
  2019-04-09 13:15       ` Phil Auld
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Peter Zijlstra @ 2019-04-09 13:05 UTC (permalink / raw)
  To: Phil Auld
  Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx,
	linux-tip-commits

On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> Hi Ingo, Peter,
> 
> On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > Author:     Phil Auld <pauld@redhat.com>
> > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
> 
> This commit seems to have gotten lost. It's not in tip and now the 
> direct gitweb link is also showing bad commit reference. 
> 
> Did this fall victim to a reset or something?

It had (trivial) builds fails on 32 bit. I have a fixed up version
around somewhere, but that hasn't made it back in yet.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-09 13:05     ` Peter Zijlstra
@ 2019-04-09 13:15       ` Phil Auld
  2019-04-16 13:33       ` Phil Auld
  2019-04-16 16:18       ` Phil Auld
  2 siblings, 0 replies; 16+ messages in thread
From: Phil Auld @ 2019-04-09 13:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx,
	linux-tip-commits

On Tue, Apr 09, 2019 at 03:05:27PM +0200 Peter Zijlstra wrote:
> On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> > Hi Ingo, Peter,
> > 
> > On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Author:     Phil Auld <pauld@redhat.com>
> > > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > > Committer:  Ingo Molnar <mingo@kernel.org>
> > > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
> > 
> > This commit seems to have gotten lost. It's not in tip and now the 
> > direct gitweb link is also showing bad commit reference. 
> > 
> > Did this fall victim to a reset or something?
> 
> It had (trivial) builds fails on 32 bit. I have a fixed up version
> around somewhere, but that hasn't made it back in yet.

Drat. Sorry about that. At least I'm not going crazy...  

Do you want me to fix it and push it back or will you do that? 


Thanks,
Phil


-- 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-09 13:05     ` Peter Zijlstra
  2019-04-09 13:15       ` Phil Auld
@ 2019-04-16 13:33       ` Phil Auld
  2019-04-16 16:18       ` Phil Auld
  2 siblings, 0 replies; 16+ messages in thread
From: Phil Auld @ 2019-04-16 13:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx,
	linux-tip-commits

On Tue, Apr 09, 2019 at 03:05:27PM +0200 Peter Zijlstra wrote:
> On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> > Hi Ingo, Peter,
> > 
> > On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Author:     Phil Auld <pauld@redhat.com>
> > > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > > Committer:  Ingo Molnar <mingo@kernel.org>
> > > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
> > 
> > This commit seems to have gotten lost. It's not in tip and now the 
> > direct gitweb link is also showing bad commit reference. 
> > 
> > Did this fall victim to a reset or something?
> 
> It had (trivial) builds fails on 32 bit. I have a fixed up version
> around somewhere, but that hasn't made it back in yet.


Friendly ping...


Thanks,
Phil

-- 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
  2019-03-19 13:00 [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup Phil Auld
  2019-03-21 18:01 ` Peter Zijlstra
  2019-04-03  8:38 ` [tip:sched/core] " tip-bot for Phil Auld
@ 2019-04-16 15:32 ` tip-bot for Phil Auld
  2019-04-16 19:26   ` Phil Auld
  2 siblings, 1 reply; 16+ messages in thread
From: tip-bot for Phil Auld @ 2019-04-16 15:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, bsegall, linux-kernel, pauld, stable, anton, tglx,
	peterz, hpa, mingo

Commit-ID:  2e8e19226398db8265a8e675fcc0118b9e80c9e8
Gitweb:     https://git.kernel.org/tip/2e8e19226398db8265a8e675fcc0118b9e80c9e8
Author:     Phil Auld <pauld@redhat.com>
AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 16 Apr 2019 16:50:05 +0200

sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup

With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0.  The large number of children can make
do_sched_cfs_period_timer() take longer than the period.

 NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
 RIP: 0010:tg_nop+0x0/0x10
  <IRQ>
  walk_tg_tree_from+0x29/0xb0
  unthrottle_cfs_rq+0xe0/0x1a0
  distribute_cfs_runtime+0xd3/0xf0
  sched_cfs_period_timer+0xcb/0x160
  ? sched_cfs_slack_timer+0xd0/0xd0
  __hrtimer_run_queues+0xfb/0x270
  hrtimer_interrupt+0x122/0x270
  smp_apic_timer_interrupt+0x6a/0x140
  apic_timer_interrupt+0xf/0x20
  </IRQ>

To prevent this we add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the timer
can complete before then next period expires.  This preserves the relative runtime
quota while preventing the hard lockup.

A warning is issued reporting this state and the new values.

Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40bd1e27b1b7..a4d9e14bf138 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4885,6 +4885,8 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -4892,6 +4894,7 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 	unsigned long flags;
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock_irqsave(&cfs_b->lock, flags);
 	for (;;) {
@@ -4899,6 +4902,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+			pr_warn_ratelimited(
+	"cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+				smp_processor_id(),
+				div_u64(new, NSEC_PER_USEC),
+				div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
 	}
 	if (idle)

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [tip:sched/core] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup
  2019-04-09 13:05     ` Peter Zijlstra
  2019-04-09 13:15       ` Phil Auld
  2019-04-16 13:33       ` Phil Auld
@ 2019-04-16 16:18       ` Phil Auld
  2 siblings, 0 replies; 16+ messages in thread
From: Phil Auld @ 2019-04-16 16:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, hpa, bsegall, torvalds, anton, efault, mingo, tglx,
	linux-tip-commits

On Tue, Apr 09, 2019 at 03:05:27PM +0200 Peter Zijlstra wrote:
> On Tue, Apr 09, 2019 at 08:48:16AM -0400, Phil Auld wrote:
> > Hi Ingo, Peter,
> > 
> > On Wed, Apr 03, 2019 at 01:38:39AM -0700 tip-bot for Phil Auld wrote:
> > > Commit-ID:  06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Gitweb:     https://git.kernel.org/tip/06ec5d30e8d57b820d44df6340dcb25010d6d0fa
> > > Author:     Phil Auld <pauld@redhat.com>
> > > AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> > > Committer:  Ingo Molnar <mingo@kernel.org>
> > > CommitDate: Wed, 3 Apr 2019 09:50:23 +0200
> > 
> > This commit seems to have gotten lost. It's not in tip and now the 
> > direct gitweb link is also showing bad commit reference. 
> > 
> > Did this fall victim to a reset or something?
> 
> It had (trivial) builds fails on 32 bit. I have a fixed up version
> around somewhere, but that hasn't made it back in yet.


Thank you :)

I'll post the follow up version for the stable trees soon.


-- 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
  2019-04-16 15:32 ` [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() " tip-bot for Phil Auld
@ 2019-04-16 19:26   ` Phil Auld
  0 siblings, 0 replies; 16+ messages in thread
From: Phil Auld @ 2019-04-16 19:26 UTC (permalink / raw)
  To: sashal
  Cc: tglx, hpa, peterz, mingo, torvalds, linux-kernel, bsegall, stable, anton


Hi Sasha,

On Tue, Apr 16, 2019 at 08:32:09AM -0700 tip-bot for Phil Auld wrote:
> Commit-ID:  2e8e19226398db8265a8e675fcc0118b9e80c9e8
> Gitweb:     https://git.kernel.org/tip/2e8e19226398db8265a8e675fcc0118b9e80c9e8
> Author:     Phil Auld <pauld@redhat.com>
> AuthorDate: Tue, 19 Mar 2019 09:00:05 -0400
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 16 Apr 2019 16:50:05 +0200
> 
> sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
> 
> With extremely short cfs_period_us setting on a parent task group with a large
> number of children the for loop in sched_cfs_period_timer() can run until the
> watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
> will ever return 0.  The large number of children can make
> do_sched_cfs_period_timer() take longer than the period.
> 
>  NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
>  RIP: 0010:tg_nop+0x0/0x10
>   <IRQ>
>   walk_tg_tree_from+0x29/0xb0
>   unthrottle_cfs_rq+0xe0/0x1a0
>   distribute_cfs_runtime+0xd3/0xf0
>   sched_cfs_period_timer+0xcb/0x160
>   ? sched_cfs_slack_timer+0xd0/0xd0
>   __hrtimer_run_queues+0xfb/0x270
>   hrtimer_interrupt+0x122/0x270
>   smp_apic_timer_interrupt+0x6a/0x140
>   apic_timer_interrupt+0xf/0x20
>   </IRQ>
> 
> To prevent this we add protection to the loop that detects when the loop has run
> too many times and scales the period and quota up, proportionally, so that the timer
> can complete before then next period expires.  This preserves the relative runtime
> quota while preventing the hard lockup.
> 
> A warning is issued reporting this state and the new values.
> 
> Signed-off-by: Phil Auld <pauld@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: <stable@vger.kernel.org>
> Cc: Anton Blanchard <anton@ozlabs.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---

The above commit won't work on the stable trees. Below is an updated version
that will work on v5.0.7, v4.19.34, v4.14.111, v4.9.168, and v4.4.178 with 
increasing offsets. I believe v3.18.138 will require more so that one is not 
included.

There is only a minor change to context, none of actual changes in the patch are
different.


Thanks,
Phil
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 310d0637fe4b..f0380229b6f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4859,12 +4859,15 @@ static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
+extern const u64 max_cfs_quota_period;
+
 static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
 		container_of(timer, struct cfs_bandwidth, period_timer);
 	int overrun;
 	int idle = 0;
+	int count = 0;
 
 	raw_spin_lock(&cfs_b->lock);
 	for (;;) {
@@ -4872,6 +4875,28 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 		if (!overrun)
 			break;
 
+		if (++count > 3) {
+			u64 new, old = ktime_to_ns(cfs_b->period);
+
+			new = (old * 147) / 128; /* ~115% */
+			new = min(new, max_cfs_quota_period);
+
+			cfs_b->period = ns_to_ktime(new);
+
+			/* since max is 1s, this is limited to 1e9^2, which fits in u64 */
+			cfs_b->quota *= new;
+			cfs_b->quota = div64_u64(cfs_b->quota, old);
+
+			pr_warn_ratelimited(
+        "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us %lld, cfs_quota_us = %lld)\n",
+	                        smp_processor_id(),
+	                        div_u64(new, NSEC_PER_USEC),
+                                div_u64(cfs_b->quota, NSEC_PER_USEC));
+
+			/* reset count so we don't come right back in here */
+			count = 0;
+		}
+
 		idle = do_sched_cfs_period_timer(cfs_b, overrun);
 	}
 	if (idle)



-- 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-04-16 19:26 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-19 13:00 [PATCH v2] sched/fair: Limit sched_cfs_period_timer loop to avoid hard lockup Phil Auld
2019-03-21 18:01 ` Peter Zijlstra
2019-03-21 18:32   ` Phil Auld
2019-04-03  8:38 ` [tip:sched/core] " tip-bot for Phil Auld
     [not found]   ` <20190405141524.05DDA2186A@mail.kernel.org>
2019-04-08 13:42     ` Phil Auld
2019-04-08 14:50       ` Sasha Levin
2019-04-08 17:03         ` Phil Auld
2019-04-08 17:06           ` Sasha Levin
2019-04-08 17:31             ` Phil Auld
2019-04-09 12:48   ` Phil Auld
2019-04-09 13:05     ` Peter Zijlstra
2019-04-09 13:15       ` Phil Auld
2019-04-16 13:33       ` Phil Auld
2019-04-16 16:18       ` Phil Auld
2019-04-16 15:32 ` [tip:sched/urgent] sched/fair: Limit sched_cfs_period_timer() " tip-bot for Phil Auld
2019-04-16 19:26   ` Phil Auld

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.