From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1036915AbdEZCK7 (ORCPT ); Thu, 25 May 2017 22:10:59 -0400 Received: from mail-wm0-f67.google.com ([74.125.82.67]:33931 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932931AbdEZCK4 (ORCPT ); Thu, 25 May 2017 22:10:56 -0400 Date: Fri, 26 May 2017 04:10:53 +0200 From: Frederic Weisbecker To: Ingo Molnar Cc: LKML , Peter Zijlstra , Rik van Riel , James Hartsock , Thomas Gleixner , stable@vger.kernel.org, Tim Wright , Pavel Machek Subject: Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3 Message-ID: <20170526021051.GA27061@lerouge> References: <1495201910-12466-1-git-send-email-fweisbec@gmail.com> <20170523072508.3bzyipgwnyjcbef6@gmail.com> <20170523131042.GA26103@lerouge> <20170524071628.ftwjuh27jpdo6qkm@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170524071628.ftwjuh27jpdo6qkm@gmail.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 24, 2017 at 09:16:28AM +0200, Ingo Molnar wrote: > So the interdiff between your two patches and the 3 commits already queued up is: > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index e3043873fcdc..30253ed0380b 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -150,12 +150,6 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs) > touch_softlockup_watchdog_sched(); > if (is_idle_task(current)) > ts->idle_jiffies++; > - /* > - * In case the current tick fired too early past its expected > - * expiration, make sure we don't bypass the next clock reprogramming > - * to the same deadline. > - */ > - ts->next_tick = 0; > } > #endif > update_process_times(user_mode(regs)); > @@ -1103,8 +1097,15 @@ static void tick_nohz_handler(struct clock_event_device *dev) > tick_sched_handle(ts, regs); > > /* No need to reprogram if we are running tickless */ > - if (unlikely(ts->tick_stopped)) > + if (unlikely(ts->tick_stopped)) { > + /* > + * In case the current tick fired too early past its expected > + * expiration, make sure we don't bypass the next clock reprogramming > + * to the same deadline. > + */ > + ts->next_tick = 0; > return; > + } > > hrtimer_forward(&ts->sched_timer, now, tick_period); > tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1); > @@ -1202,12 +1203,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer) > */ > if (regs) > tick_sched_handle(ts, regs); > - else > - ts->next_tick = 0; > > /* No need to reprogram if we are in idle or full dynticks mode */ > - if (unlikely(ts->tick_stopped)) > + if (unlikely(ts->tick_stopped)) { > + /* > + * In case the current tick fired too early past its expected > + * expiration, make sure we don't bypass the next clock reprogramming > + * to the same deadline. > + */ > + ts->next_tick = 0; > return HRTIMER_NORESTART; > + } > > hrtimer_forward(timer, now, tick_period); > > > ... so the two are not the same - I'd rather not rebase it, I'd like to keep what > is working, we had problems with these changes before ... > > If you'd like the changes in this interdiff to be applied as well, please add a > changelog to it and post it as a fourth patch. > > Thanks, > > Ingo So if you like, you can replace the top patch with the following. It's exactly the same code, I've only added a comment and a changelog: --- >>From 72956bf08c3b2e506a5ce5ec4faac9fd6b097307 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Mon, 15 May 2017 14:56:50 +0200 Subject: [PATCH] nohz: Reset next_tick cache even when the timer has no regs The tick IRQ regs can be NULL if hrtimer_interrupt() is called from non-interrupt contexts (ex: hotplug CPU down). For such very special path we forget to clean the cached next tick deadline. If we are in dynticks mode and the actual timer deadline is ahead of us, we might perform a buggy bypass of the next clock reprogramming. In fact since CPU down is the only user I'm aware of, this fix is likely unnecessary as dying CPUs already clean their tick deadline cache. But given how hard it is to debug such timer cache related issue, we should never be short on paranoid measures. Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Ingo Molnar --- kernel/time/tick-sched.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 764d290..ed18ca5 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -1200,8 +1200,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer) * Do not call, when we are not in irq context and have * no valid regs pointer */ - if (regs) + if (regs) { tick_sched_handle(ts, regs); + } else { + /* + * IRQ regs are NULL if hrtimer_interrupt() is called from + * non-interrupt contexts (ex: hotplug cpu down). Make sure to + * clean the cached next tick deadline to avoid buggy bypass of + * clock reprog. + */ + ts->next_tick = 0; + } /* No need to reprogram if we are in idle or full dynticks mode */ if (unlikely(ts->tick_stopped)) -- 2.7.4