From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752469AbaC3HYK (ORCPT ); Sun, 30 Mar 2014 03:24:10 -0400 Received: from mail-ee0-f54.google.com ([74.125.83.54]:36244 "EHLO mail-ee0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751257AbaC3HYI (ORCPT ); Sun, 30 Mar 2014 03:24:08 -0400 Message-ID: <1396164244.28950.15.camel@marge.simpson.net> Subject: [PATCH] sched: update_rq_clock() must skip ONE update From: Mike Galbraith To: Peter Zijlstra , Ingo Molnar Cc: LKML , Linus Torvalds Date: Sun, 30 Mar 2014 09:24:04 +0200 Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.2.3 Content-Transfer-Encoding: 7bit Mime-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org NEWSFLASH: Wimpy latent scheduler buglet turns axe-murderer... Root cause being us staying in kernel for weeks at a time during boot: Subverting ftrace, we see below that updates _can_ be skipped for AGES, which can make very bad things happen. It's supposed to skip ONE, as an optimization. not lord knows how many. Here we see we've been skipping since 4915630446, resulting in the watchdog being credited with a 137ms delta_exec. Bogus, and.. modprobe-134 [000] 5.027903: update_rq_clock <-scheduler_tick modprobe-134 [000] 5.037857: update_rq_clock <-scheduler_tick modprobe-134 [000] 5.037864: update_rq_clock <-enqueue_task modprobe-134 [000] 5.037866: enqueue_task_watchdog: WATCHDOG/0 enqueue 4915630446 modprobe-134 [000] 5.044965: update_rq_clock <-scheduler_tick modprobe-134 [000] 5.046275: update_rq_clock <-scheduler_tick modprobe-134 [000] 5.050274: update_rq_clock <-scheduler_tick modprobe-134 [000] 5.052862: update_rq_clock <-dequeue_task modprobe-134 [000] 5.052866: pick_next_task_watchdog: WATCHDOG/0 select 4915630446 watchdog/0-7 [000] 5.052868: finish_task_switch: WATCHDOG/0 runs 5052867644 watchdog/0-7 [000] 5.052869: watchdog: WATCHDOG/0 DELAYED 137238048 max: 137238048 watchdog/0-7 [000] 5.052872: update_rq_clock <-dequeue_task watchdog/0-7 [000] 5.052873: dequeue_task_watchdog: WATCHDOG/0 dequeue 5052872018 watchdog/0-7 [000] 5.052874: put_prev_task_watchdog: WATCHDOG/0 STOP trace : delta_exec 137241572 sum_exec 137254287 ..delta_exec was sufficient in a NOPREEMPT kernel on zillion disk box that disk discovery combined with swsusp wanting to find out if it needs to resume, that the featherweight watchdog was credited with delta_exec HUGE enough to be incapacitated by the rt throttle long enough for the hard lockup detector to trigger. Throttled watchdog is bad juju, led to -ENOBOOT for IO beast. Cc: Signed-off-by: Mike Galbraith --- kernel/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -118,7 +118,7 @@ void update_rq_clock(struct rq *rq) { s64 delta; - if (rq->skip_clock_update > 0) + if (rq->skip_clock_update-- > 0) return; delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;