linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3
@ 2017-05-19 13:51 Frederic Weisbecker
  2017-05-19 13:51 ` [PATCH 1/2] nohz: Add hrtimer sanity check Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2017-05-19 13:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Rik van Riel,
	James Hartsock, Thomas Gleixner, stable, Tim Wright,
	Pavel Machek

v2 had issues on -tip tree and triggered a warning. It seems to have
disappeared. Perhaps it was due to another timer issue. Anyway this
version brings more debugging informations, with a layout that is more
bisection-friendly and it also handles ticks that fire outside IRQ
context and thus carry NULL irq regs. This happen when
hrtimer_interrupt() is called on hotplug cpu down for example.

We'll see if the issue arises again.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	nohz/fixes

HEAD: cd15f46b284f04dbedd065a9d99a4e0badae379a

Thanks,
	Frederic
---

Frederic Weisbecker (2):
      nohz: Add hrtimer sanity check
      nohz: Fix collision between tick and other hrtimers, again


 kernel/time/tick-sched.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
 kernel/time/tick-sched.h |  2 ++
 2 files changed, 45 insertions(+), 5 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] nohz: Add hrtimer sanity check
  2017-05-19 13:51 [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3 Frederic Weisbecker
@ 2017-05-19 13:51 ` Frederic Weisbecker
  2017-05-19 13:51 ` [PATCH 2/2] nohz: Fix collision between tick and other hrtimers, again Frederic Weisbecker
  2017-05-23  7:25 ` [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3 Ingo Molnar
  2 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2017-05-19 13:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Rik van Riel,
	James Hartsock, Thomas Gleixner, stable, Tim Wright,
	Pavel Machek

Make sure that the clockevent device is never programmed to a deadline
later than the tick.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tim Wright <tim@binbash.co.uk>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: James Hartsock <hartsjc@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/time/tick-sched.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 64c97fc..d212bb6 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -771,8 +771,13 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	tick = expires;
 
 	/* Skip reprogram of event if its not changed */
-	if (ts->tick_stopped && (expires == dev->next_event))
-		goto out;
+	if (ts->tick_stopped) {
+		if (hrtimer_active(&ts->sched_timer))
+			WARN_ON_ONCE(hrtimer_get_expires(&ts->sched_timer) < dev->next_event);
+
+		if (expires == dev->next_event)
+			goto out;
+	}
 
 	/*
 	 * nohz_stop_sched_tick can be called several times before
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] nohz: Fix collision between tick and other hrtimers, again
  2017-05-19 13:51 [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3 Frederic Weisbecker
  2017-05-19 13:51 ` [PATCH 1/2] nohz: Add hrtimer sanity check Frederic Weisbecker
@ 2017-05-19 13:51 ` Frederic Weisbecker
  2017-05-23  7:25 ` [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3 Ingo Molnar
  2 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2017-05-19 13:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra, Rik van Riel,
	James Hartsock, Thomas Gleixner, stable, Tim Wright,
	Pavel Machek

This restores commit:

  24b91e360ef5: ("nohz: Fix collision between tick and other hrtimers")

... which got reverted by commit:

  558e8e27e73f: ('Revert "nohz: Fix collision between tick and other hrtimers"')

... due to a regression where CPUs spuriously stopped ticking.

The bug happened when a tick fired too early past its expected expiration:
on IRQ exit the tick was scheduled again to the same deadline but skipped
reprogramming because ts->next_tick still kept in cache the deadline.
This has been fixed now with resetting ts->next_tick from the tick
itself. Extra care has also been taken to prevent from obsolete values
throughout CPU hotplug operations. Also in order to honour further
paranoia, this version of the patch takes care about tick interrupts
whose regs are NULL. It happens when hrtimer_interrupt() is called from
non-interrupt contexts (ex: hotplug cpu down).

When the tick is stopped and an interrupt occurs afterward, we check on
that interrupt exit if the next tick needs to be rescheduled. If it
doesn't need any update, we don't want to do anything.

In order to check if the tick needs an update, we compare it against the
clockevent device deadline. Now that's a problem because the clockevent
device is at a lower level than the tick itself if it is implemented
on top of hrtimer.

Every hrtimer share this clockevent device. So comparing the next tick
deadline against the clockevent device deadline is wrong because the
device may be programmed for another hrtimer whose deadline collides
with the tick. As a result we may end up not reprogramming the tick
accidentally.

In a worst case scenario under full dynticks mode, the tick stops firing
as it is supposed to every 1hz, leaving /proc/stat stalled:

      Task in a full dynticks CPU
      ----------------------------

      * hrtimer A is queued 2 seconds ahead
      * the tick is stopped, scheduled 1 second ahead
      * tick fires 1 second later
      * on tick exit, nohz schedules the tick 1 second ahead but sees
        the clockevent device is already programmed to that deadline,
        fooled by hrtimer A, the tick isn't rescheduled.
      * hrtimer A is cancelled before its deadline
      * tick never fires again until an interrupt happens...

In order to fix this, store the next tick deadline to the tick_sched
local structure and reuse that value later to check whether we need to
reprogram the clock after an interrupt.

On the other hand, ts->sleep_length still wants to know about the next
clock event and not just the tick, so we want to improve the related
comment to avoid confusion.

Reported-and-tested-by: Tim Wright <tim@binbash.co.uk>
Reported-and-tested-by: Pavel Machek <pavel@ucw.cz>
Reported-by: James Hartsock <hartsjc@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/time/tick-sched.c | 49 ++++++++++++++++++++++++++++++++++++++++--------
 kernel/time/tick-sched.h |  2 ++
 2 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d212bb6..30253ed 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -660,6 +660,12 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
 		hrtimer_start_expires(&ts->sched_timer, HRTIMER_MODE_ABS_PINNED);
 	else
 		tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
+
+	/*
+	 * Reset to make sure next tick stop doesn't get fooled by past
+	 * cached clock deadline.
+	 */
+	ts->next_tick = 0;
 }
 
 static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
@@ -771,12 +777,15 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	tick = expires;
 
 	/* Skip reprogram of event if its not changed */
-	if (ts->tick_stopped) {
-		if (hrtimer_active(&ts->sched_timer))
-			WARN_ON_ONCE(hrtimer_get_expires(&ts->sched_timer) < dev->next_event);
-
-		if (expires == dev->next_event)
+	if (ts->tick_stopped && (expires == ts->next_tick)) {
+		/* Sanity check: make sure clockevent is actually programmed */
+		if (likely(dev->next_event <= ts->next_tick))
 			goto out;
+
+		WARN_ON_ONCE(1);
+		printk_once("basemono: %llu ts->next_tick: %llu dev->next_event: %llu timer->active: %d timer->expires: %llu\n",
+			    basemono, ts->next_tick, dev->next_event,
+			    hrtimer_active(&ts->sched_timer), hrtimer_get_expires(&ts->sched_timer));
 	}
 
 	/*
@@ -796,6 +805,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		trace_tick_stop(1, TICK_DEP_MASK_NONE);
 	}
 
+	ts->next_tick = tick;
+
 	/*
 	 * If the expiration time == KTIME_MAX, then we simply stop
 	 * the tick timer.
@@ -811,7 +822,10 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	else
 		tick_program_event(tick, 1);
 out:
-	/* Update the estimated sleep length */
+	/*
+	 * Update the estimated sleep length until the next timer
+	 * (not only the tick).
+	 */
 	ts->sleep_length = ktime_sub(dev->next_event, now);
 	return tick;
 }
@@ -869,6 +883,11 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	if (unlikely(!cpu_online(cpu))) {
 		if (cpu == tick_do_timer_cpu)
 			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+		/*
+		 * Make sure the CPU doesn't get fooled by obsolete tick
+		 * deadline if it comes back online later.
+		 */
+		ts->next_tick = 0;
 		return false;
 	}
 
@@ -1078,8 +1097,15 @@ static void tick_nohz_handler(struct clock_event_device *dev)
 	tick_sched_handle(ts, regs);
 
 	/* No need to reprogram if we are running tickless  */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped)) {
+		/*
+		 * In case the current tick fired too early past its expected
+		 * expiration, make sure we don't bypass the next clock reprogramming
+		 * to the same deadline.
+		 */
+		ts->next_tick = 0;
 		return;
+	}
 
 	hrtimer_forward(&ts->sched_timer, now, tick_period);
 	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
@@ -1179,8 +1205,15 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 		tick_sched_handle(ts, regs);
 
 	/* No need to reprogram if we are in idle or full dynticks mode */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped)) {
+		/*
+		 * In case the current tick fired too early past its expected
+		 * expiration, make sure we don't bypass the next clock reprogramming
+		 * to the same deadline.
+		 */
+		ts->next_tick = 0;
 		return HRTIMER_NORESTART;
+	}
 
 	hrtimer_forward(timer, now, tick_period);
 
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index bf38226..075444e 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -27,6 +27,7 @@ enum tick_nohz_mode {
  *			timer is modified for nohz sleeps. This is necessary
  *			to resume the tick timer operation in the timeline
  *			when the CPU returns from nohz sleep.
+ * @next_tick:		Next tick to be fired when in dynticks mode.
  * @tick_stopped:	Indicator that the idle tick has been stopped
  * @idle_jiffies:	jiffies at the entry to idle for idle time accounting
  * @idle_calls:		Total number of idle calls
@@ -44,6 +45,7 @@ struct tick_sched {
 	unsigned long			check_clocks;
 	enum tick_nohz_mode		nohz_mode;
 	ktime_t				last_tick;
+	ktime_t				next_tick;
 	int				inidle;
 	int				tick_stopped;
 	unsigned long			idle_jiffies;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3
  2017-05-19 13:51 [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3 Frederic Weisbecker
  2017-05-19 13:51 ` [PATCH 1/2] nohz: Add hrtimer sanity check Frederic Weisbecker
  2017-05-19 13:51 ` [PATCH 2/2] nohz: Fix collision between tick and other hrtimers, again Frederic Weisbecker
@ 2017-05-23  7:25 ` Ingo Molnar
  2017-05-23 13:10   ` Frederic Weisbecker
  2 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2017-05-23  7:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Rik van Riel, James Hartsock,
	Thomas Gleixner, stable, Tim Wright, Pavel Machek


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> v2 had issues on -tip tree and triggered a warning. It seems to have
> disappeared. Perhaps it was due to another timer issue. Anyway this
> version brings more debugging informations, with a layout that is more
> bisection-friendly and it also handles ticks that fire outside IRQ
> context and thus carry NULL irq regs. This happen when
> hrtimer_interrupt() is called on hotplug cpu down for example.
> 
> We'll see if the issue arises again.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	nohz/fixes
> 
> HEAD: cd15f46b284f04dbedd065a9d99a4e0badae379a
> 
> Thanks,
> 	Frederic
> ---
> 
> Frederic Weisbecker (2):
>       nohz: Add hrtimer sanity check
>       nohz: Fix collision between tick and other hrtimers, again
> 
> 
>  kernel/time/tick-sched.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
>  kernel/time/tick-sched.h |  2 ++
>  2 files changed, 45 insertions(+), 5 deletions(-)

So I think the 3 commits queued up right now:

 99fa871820cf: nohz: Reset next_tick cache even when the timer has no regs
 411fe24e6b7c: nohz: Fix collision between tick and other hrtimers, again
 ce6cf9a15d62: nohz: Add hrtimer sanity check

are OK and I'd not rebase them unless there's some breakage.

One thing I noticed: your second series does appear to have:

 99fa871820cf: nohz: Reset next_tick cache even when the timer has no regs

is that intentional? That is pretty much the only commit I'd love to rebase with a 
proper description added.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3
  2017-05-23  7:25 ` [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3 Ingo Molnar
@ 2017-05-23 13:10   ` Frederic Weisbecker
  2017-05-24  7:16     ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2017-05-23 13:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Rik van Riel, James Hartsock,
	Thomas Gleixner, stable, Tim Wright, Pavel Machek

On Tue, May 23, 2017 at 09:25:08AM +0200, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > v2 had issues on -tip tree and triggered a warning. It seems to have
> > disappeared. Perhaps it was due to another timer issue. Anyway this
> > version brings more debugging informations, with a layout that is more
> > bisection-friendly and it also handles ticks that fire outside IRQ
> > context and thus carry NULL irq regs. This happen when
> > hrtimer_interrupt() is called on hotplug cpu down for example.
> > 
> > We'll see if the issue arises again.
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > 	nohz/fixes
> > 
> > HEAD: cd15f46b284f04dbedd065a9d99a4e0badae379a
> > 
> > Thanks,
> > 	Frederic
> > ---
> > 
> > Frederic Weisbecker (2):
> >       nohz: Add hrtimer sanity check
> >       nohz: Fix collision between tick and other hrtimers, again
> > 
> > 
> >  kernel/time/tick-sched.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
> >  kernel/time/tick-sched.h |  2 ++
> >  2 files changed, 45 insertions(+), 5 deletions(-)
> 
> So I think the 3 commits queued up right now:
> 
>  99fa871820cf: nohz: Reset next_tick cache even when the timer has no regs
>  411fe24e6b7c: nohz: Fix collision between tick and other hrtimers, again
>  ce6cf9a15d62: nohz: Add hrtimer sanity check
> 
> are OK and I'd not rebase them unless there's some breakage.
> 
> One thing I noticed: your second series does appear to have:
> 
>  99fa871820cf: nohz: Reset next_tick cache even when the timer has no regs
> 
> is that intentional? That is pretty much the only commit I'd love to rebase with a 
> proper description added.

Yes in my latest series I melted "nohz: Reset next_tick cache even when the timer has no regs"
into "nohz: Fix collision between tick and other hrtimers, again" because it's a fixup and
keeping that patch separate may break bisection.

So ideally, it would be nice if you could fixup 411fe24e6b7c with 99fa871820cf. That's roughly
all I did in my latest series.

Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3
  2017-05-23 13:10   ` Frederic Weisbecker
@ 2017-05-24  7:16     ` Ingo Molnar
  2017-05-24 13:28       ` Frederic Weisbecker
  2017-05-26  2:10       ` Frederic Weisbecker
  0 siblings, 2 replies; 12+ messages in thread
From: Ingo Molnar @ 2017-05-24  7:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Rik van Riel, James Hartsock,
	Thomas Gleixner, stable, Tim Wright, Pavel Machek


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Tue, May 23, 2017 at 09:25:08AM +0200, Ingo Molnar wrote:
> > 
> > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > > v2 had issues on -tip tree and triggered a warning. It seems to have
> > > disappeared. Perhaps it was due to another timer issue. Anyway this
> > > version brings more debugging informations, with a layout that is more
> > > bisection-friendly and it also handles ticks that fire outside IRQ
> > > context and thus carry NULL irq regs. This happen when
> > > hrtimer_interrupt() is called on hotplug cpu down for example.
> > > 
> > > We'll see if the issue arises again.
> > > 
> > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > > 	nohz/fixes
> > > 
> > > HEAD: cd15f46b284f04dbedd065a9d99a4e0badae379a
> > > 
> > > Thanks,
> > > 	Frederic
> > > ---
> > > 
> > > Frederic Weisbecker (2):
> > >       nohz: Add hrtimer sanity check
> > >       nohz: Fix collision between tick and other hrtimers, again
> > > 
> > > 
> > >  kernel/time/tick-sched.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
> > >  kernel/time/tick-sched.h |  2 ++
> > >  2 files changed, 45 insertions(+), 5 deletions(-)
> > 
> > So I think the 3 commits queued up right now:
> > 
> >  99fa871820cf: nohz: Reset next_tick cache even when the timer has no regs
> >  411fe24e6b7c: nohz: Fix collision between tick and other hrtimers, again
> >  ce6cf9a15d62: nohz: Add hrtimer sanity check
> > 
> > are OK and I'd not rebase them unless there's some breakage.
> > 
> > One thing I noticed: your second series does appear to have:
> > 
> >  99fa871820cf: nohz: Reset next_tick cache even when the timer has no regs
> > 
> > is that intentional? That is pretty much the only commit I'd love to rebase with a 
> > proper description added.
> 
> Yes in my latest series I melted "nohz: Reset next_tick cache even when the timer has no regs"
> into "nohz: Fix collision between tick and other hrtimers, again" because it's a fixup and
> keeping that patch separate may break bisection.
> 
> So ideally, it would be nice if you could fixup 411fe24e6b7c with 99fa871820cf. That's roughly
> all I did in my latest series.

So the interdiff between your two patches and the 3 commits already queued up is:

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e3043873fcdc..30253ed0380b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -150,12 +150,6 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 		touch_softlockup_watchdog_sched();
 		if (is_idle_task(current))
 			ts->idle_jiffies++;
-		/*
-		 * In case the current tick fired too early past its expected
-		 * expiration, make sure we don't bypass the next clock reprogramming
-		 * to the same deadline.
-		 */
-		ts->next_tick = 0;
 	}
 #endif
 	update_process_times(user_mode(regs));
@@ -1103,8 +1097,15 @@ static void tick_nohz_handler(struct clock_event_device *dev)
 	tick_sched_handle(ts, regs);
 
 	/* No need to reprogram if we are running tickless  */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped)) {
+		/*
+		 * In case the current tick fired too early past its expected
+		 * expiration, make sure we don't bypass the next clock reprogramming
+		 * to the same deadline.
+		 */
+		ts->next_tick = 0;
 		return;
+	}
 
 	hrtimer_forward(&ts->sched_timer, now, tick_period);
 	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
@@ -1202,12 +1203,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	 */
 	if (regs)
 		tick_sched_handle(ts, regs);
-	else
-		ts->next_tick = 0;
 
 	/* No need to reprogram if we are in idle or full dynticks mode */
-	if (unlikely(ts->tick_stopped))
+	if (unlikely(ts->tick_stopped)) {
+		/*
+		 * In case the current tick fired too early past its expected
+		 * expiration, make sure we don't bypass the next clock reprogramming
+		 * to the same deadline.
+		 */
+		ts->next_tick = 0;
 		return HRTIMER_NORESTART;
+	}
 
 	hrtimer_forward(timer, now, tick_period);
 

... so the two are not the same - I'd rather not rebase it, I'd like to keep what 
is working, we had problems with these changes before ...

If you'd like the changes in this interdiff to be applied as well, please add a 
changelog to it and post it as a fourth patch.

Thanks,

	Ingo

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3
  2017-05-24  7:16     ` Ingo Molnar
@ 2017-05-24 13:28       ` Frederic Weisbecker
  2017-05-26  2:10       ` Frederic Weisbecker
  1 sibling, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2017-05-24 13:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Rik van Riel, James Hartsock,
	Thomas Gleixner, stable, Tim Wright, Pavel Machek

On Wed, May 24, 2017 at 09:16:28AM +0200, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Tue, May 23, 2017 at 09:25:08AM +0200, Ingo Molnar wrote:
> > > 
> > > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > 
> > > > v2 had issues on -tip tree and triggered a warning. It seems to have
> > > > disappeared. Perhaps it was due to another timer issue. Anyway this
> > > > version brings more debugging informations, with a layout that is more
> > > > bisection-friendly and it also handles ticks that fire outside IRQ
> > > > context and thus carry NULL irq regs. This happen when
> > > > hrtimer_interrupt() is called on hotplug cpu down for example.
> > > > 
> > > > We'll see if the issue arises again.
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > > > 	nohz/fixes
> > > > 
> > > > HEAD: cd15f46b284f04dbedd065a9d99a4e0badae379a
> > > > 
> > > > Thanks,
> > > > 	Frederic
> > > > ---
> > > > 
> > > > Frederic Weisbecker (2):
> > > >       nohz: Add hrtimer sanity check
> > > >       nohz: Fix collision between tick and other hrtimers, again
> > > > 
> > > > 
> > > >  kernel/time/tick-sched.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
> > > >  kernel/time/tick-sched.h |  2 ++
> > > >  2 files changed, 45 insertions(+), 5 deletions(-)
> > > 
> > > So I think the 3 commits queued up right now:
> > > 
> > >  99fa871820cf: nohz: Reset next_tick cache even when the timer has no regs
> > >  411fe24e6b7c: nohz: Fix collision between tick and other hrtimers, again
> > >  ce6cf9a15d62: nohz: Add hrtimer sanity check
> > > 
> > > are OK and I'd not rebase them unless there's some breakage.
> > > 
> > > One thing I noticed: your second series does appear to have:
> > > 
> > >  99fa871820cf: nohz: Reset next_tick cache even when the timer has no regs
> > > 
> > > is that intentional? That is pretty much the only commit I'd love to rebase with a 
> > > proper description added.
> > 
> > Yes in my latest series I melted "nohz: Reset next_tick cache even when the timer has no regs"
> > into "nohz: Fix collision between tick and other hrtimers, again" because it's a fixup and
> > keeping that patch separate may break bisection.
> > 
> > So ideally, it would be nice if you could fixup 411fe24e6b7c with 99fa871820cf. That's roughly
> > all I did in my latest series.
> 
> So the interdiff between your two patches and the 3 commits already queued up is:
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index e3043873fcdc..30253ed0380b 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -150,12 +150,6 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
>  		touch_softlockup_watchdog_sched();
>  		if (is_idle_task(current))
>  			ts->idle_jiffies++;
> -		/*
> -		 * In case the current tick fired too early past its expected
> -		 * expiration, make sure we don't bypass the next clock reprogramming
> -		 * to the same deadline.
> -		 */
> -		ts->next_tick = 0;
>  	}
>  #endif
>  	update_process_times(user_mode(regs));
> @@ -1103,8 +1097,15 @@ static void tick_nohz_handler(struct clock_event_device *dev)
>  	tick_sched_handle(ts, regs);
>  
>  	/* No need to reprogram if we are running tickless  */
> -	if (unlikely(ts->tick_stopped))
> +	if (unlikely(ts->tick_stopped)) {
> +		/*
> +		 * In case the current tick fired too early past its expected
> +		 * expiration, make sure we don't bypass the next clock reprogramming
> +		 * to the same deadline.
> +		 */
> +		ts->next_tick = 0;
>  		return;
> +	}
>  
>  	hrtimer_forward(&ts->sched_timer, now, tick_period);
>  	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
> @@ -1202,12 +1203,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
>  	 */
>  	if (regs)
>  		tick_sched_handle(ts, regs);
> -	else
> -		ts->next_tick = 0;
>  
>  	/* No need to reprogram if we are in idle or full dynticks mode */
> -	if (unlikely(ts->tick_stopped))
> +	if (unlikely(ts->tick_stopped)) {
> +		/*
> +		 * In case the current tick fired too early past its expected
> +		 * expiration, make sure we don't bypass the next clock reprogramming
> +		 * to the same deadline.
> +		 */
> +		ts->next_tick = 0;
>  		return HRTIMER_NORESTART;
> +	}
>  
>  	hrtimer_forward(timer, now, tick_period);
>  
> 
> ... so the two are not the same - I'd rather not rebase it, I'd like to keep what 
> is working, we had problems with these changes before ...
> 
> If you'd like the changes in this interdiff to be applied as well, please add a 
> changelog to it and post it as a fourth patch.

After all, things are ok as they are. The difference is (at least intended to be) cosmetic
and I'm not sure it's even better with the new version of the patches.

What can I do for the changelog of the top patch in your current branch? Should I repost
the patch with a changelog? I may need to add a comment as well on the code. In the end you'll
need to only rebase that one and the code diff will only be an added comment. How does that sound?

Thanks.

> 
> Thanks,
> 
> 	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3
  2017-05-24  7:16     ` Ingo Molnar
  2017-05-24 13:28       ` Frederic Weisbecker
@ 2017-05-26  2:10       ` Frederic Weisbecker
  2017-05-26  6:13         ` Ingo Molnar
  1 sibling, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2017-05-26  2:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Rik van Riel, James Hartsock,
	Thomas Gleixner, stable, Tim Wright, Pavel Machek

On Wed, May 24, 2017 at 09:16:28AM +0200, Ingo Molnar wrote:
> So the interdiff between your two patches and the 3 commits already queued up is:
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index e3043873fcdc..30253ed0380b 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -150,12 +150,6 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
>  		touch_softlockup_watchdog_sched();
>  		if (is_idle_task(current))
>  			ts->idle_jiffies++;
> -		/*
> -		 * In case the current tick fired too early past its expected
> -		 * expiration, make sure we don't bypass the next clock reprogramming
> -		 * to the same deadline.
> -		 */
> -		ts->next_tick = 0;
>  	}
>  #endif
>  	update_process_times(user_mode(regs));
> @@ -1103,8 +1097,15 @@ static void tick_nohz_handler(struct clock_event_device *dev)
>  	tick_sched_handle(ts, regs);
>  
>  	/* No need to reprogram if we are running tickless  */
> -	if (unlikely(ts->tick_stopped))
> +	if (unlikely(ts->tick_stopped)) {
> +		/*
> +		 * In case the current tick fired too early past its expected
> +		 * expiration, make sure we don't bypass the next clock reprogramming
> +		 * to the same deadline.
> +		 */
> +		ts->next_tick = 0;
>  		return;
> +	}
>  
>  	hrtimer_forward(&ts->sched_timer, now, tick_period);
>  	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
> @@ -1202,12 +1203,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
>  	 */
>  	if (regs)
>  		tick_sched_handle(ts, regs);
> -	else
> -		ts->next_tick = 0;
>  
>  	/* No need to reprogram if we are in idle or full dynticks mode */
> -	if (unlikely(ts->tick_stopped))
> +	if (unlikely(ts->tick_stopped)) {
> +		/*
> +		 * In case the current tick fired too early past its expected
> +		 * expiration, make sure we don't bypass the next clock reprogramming
> +		 * to the same deadline.
> +		 */
> +		ts->next_tick = 0;
>  		return HRTIMER_NORESTART;
> +	}
>  
>  	hrtimer_forward(timer, now, tick_period);
>  
> 
> ... so the two are not the same - I'd rather not rebase it, I'd like to keep what 
> is working, we had problems with these changes before ...
> 
> If you'd like the changes in this interdiff to be applied as well, please add a 
> changelog to it and post it as a fourth patch.
> 
> Thanks,
> 
> 	Ingo

So if you like, you can replace the top patch with the following. It's exactly
the same code, I've only added a comment and a changelog:

---
>From 72956bf08c3b2e506a5ce5ec4faac9fd6b097307 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <fweisbec@gmail.com>
Date: Mon, 15 May 2017 14:56:50 +0200
Subject: [PATCH] nohz: Reset next_tick cache even when the timer has no regs

The tick IRQ regs can be NULL if hrtimer_interrupt() is called from
non-interrupt contexts (ex: hotplug CPU down). For such very special
path we forget to clean the cached next tick deadline. If we are in
dynticks mode and the actual timer deadline is ahead of us, we might
perform a buggy bypass of the next clock reprogramming.

In fact since CPU down is the only user I'm aware of, this fix is likely
unnecessary as dying CPUs already clean their tick deadline cache. But
given how hard it is to debug such timer cache related issue, we should
never be short on paranoid measures.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/time/tick-sched.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 764d290..ed18ca5 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1200,8 +1200,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	 * Do not call, when we are not in irq context and have
 	 * no valid regs pointer
 	 */
-	if (regs)
+	if (regs) {
 		tick_sched_handle(ts, regs);
+	} else {
+		/*
+		 * IRQ regs are NULL if hrtimer_interrupt() is called from
+		 * non-interrupt contexts (ex: hotplug cpu down). Make sure to
+		 * clean the cached next tick deadline to avoid buggy bypass of
+		 * clock reprog.
+		 */
+		ts->next_tick = 0;
+	}
 
 	/* No need to reprogram if we are in idle or full dynticks mode */
 	if (unlikely(ts->tick_stopped))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3
  2017-05-26  2:10       ` Frederic Weisbecker
@ 2017-05-26  6:13         ` Ingo Molnar
  2017-05-29 13:55           ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2017-05-26  6:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Rik van Riel, James Hartsock,
	Thomas Gleixner, stable, Tim Wright, Pavel Machek


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Wed, May 24, 2017 at 09:16:28AM +0200, Ingo Molnar wrote:
> > So the interdiff between your two patches and the 3 commits already queued up is:
> > 
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index e3043873fcdc..30253ed0380b 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -150,12 +150,6 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
> >  		touch_softlockup_watchdog_sched();
> >  		if (is_idle_task(current))
> >  			ts->idle_jiffies++;
> > -		/*
> > -		 * In case the current tick fired too early past its expected
> > -		 * expiration, make sure we don't bypass the next clock reprogramming
> > -		 * to the same deadline.
> > -		 */
> > -		ts->next_tick = 0;
> >  	}
> >  #endif
> >  	update_process_times(user_mode(regs));
> > @@ -1103,8 +1097,15 @@ static void tick_nohz_handler(struct clock_event_device *dev)
> >  	tick_sched_handle(ts, regs);
> >  
> >  	/* No need to reprogram if we are running tickless  */
> > -	if (unlikely(ts->tick_stopped))
> > +	if (unlikely(ts->tick_stopped)) {
> > +		/*
> > +		 * In case the current tick fired too early past its expected
> > +		 * expiration, make sure we don't bypass the next clock reprogramming
> > +		 * to the same deadline.
> > +		 */
> > +		ts->next_tick = 0;
> >  		return;
> > +	}
> >  
> >  	hrtimer_forward(&ts->sched_timer, now, tick_period);
> >  	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
> > @@ -1202,12 +1203,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
> >  	 */
> >  	if (regs)
> >  		tick_sched_handle(ts, regs);
> > -	else
> > -		ts->next_tick = 0;
> >  
> >  	/* No need to reprogram if we are in idle or full dynticks mode */
> > -	if (unlikely(ts->tick_stopped))
> > +	if (unlikely(ts->tick_stopped)) {
> > +		/*
> > +		 * In case the current tick fired too early past its expected
> > +		 * expiration, make sure we don't bypass the next clock reprogramming
> > +		 * to the same deadline.
> > +		 */
> > +		ts->next_tick = 0;
> >  		return HRTIMER_NORESTART;
> > +	}
> >  
> >  	hrtimer_forward(timer, now, tick_period);
> >  
> > 
> > ... so the two are not the same - I'd rather not rebase it, I'd like to keep what 
> > is working, we had problems with these changes before ...
> > 
> > If you'd like the changes in this interdiff to be applied as well, please add a 
> > changelog to it and post it as a fourth patch.
> > 
> > Thanks,
> > 
> > 	Ingo
> 
> So if you like, you can replace the top patch with the following. It's exactly
> the same code, I've only added a comment and a changelog:
> 
> ---
> From 72956bf08c3b2e506a5ce5ec4faac9fd6b097307 Mon Sep 17 00:00:00 2001
> From: Frederic Weisbecker <fweisbec@gmail.com>
> Date: Mon, 15 May 2017 14:56:50 +0200
> Subject: [PATCH] nohz: Reset next_tick cache even when the timer has no regs
> 
> The tick IRQ regs can be NULL if hrtimer_interrupt() is called from
> non-interrupt contexts (ex: hotplug CPU down). For such very special
> path we forget to clean the cached next tick deadline. If we are in
> dynticks mode and the actual timer deadline is ahead of us, we might
> perform a buggy bypass of the next clock reprogramming.
> 
> In fact since CPU down is the only user I'm aware of, this fix is likely
> unnecessary as dying CPUs already clean their tick deadline cache. But
> given how hard it is to debug such timer cache related issue, we should
> never be short on paranoid measures.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/time/tick-sched.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 764d290..ed18ca5 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1200,8 +1200,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
>  	 * Do not call, when we are not in irq context and have
>  	 * no valid regs pointer
>  	 */
> -	if (regs)
> +	if (regs) {
>  		tick_sched_handle(ts, regs);
> +	} else {
> +		/*
> +		 * IRQ regs are NULL if hrtimer_interrupt() is called from
> +		 * non-interrupt contexts (ex: hotplug cpu down). Make sure to
> +		 * clean the cached next tick deadline to avoid buggy bypass of
> +		 * clock reprog.
> +		 */
> +		ts->next_tick = 0;
> +	}
>  
>  	/* No need to reprogram if we are in idle or full dynticks mode */
>  	if (unlikely(ts->tick_stopped))

Well, this does not answer my question: between latest tip:timers/nohz and the 
patches you posted there's a delta, so it's not just a pure rebase.

I can do a rebase to resolve the bisectability problem (which isn't very serious 
by the way, only a single commit wide window, right?), but only if 'git diff 
old_branch new_branch' comes up empty.

In every other case let's iterate the existing timers/nohz with additional 
patches, ok? I'd rather have a finegrained iteration with well-tested intermediate 
stages than break things again.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3
  2017-05-26  6:13         ` Ingo Molnar
@ 2017-05-29 13:55           ` Frederic Weisbecker
  2017-05-30  5:47             ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2017-05-29 13:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Rik van Riel, James Hartsock,
	Thomas Gleixner, stable, Tim Wright, Pavel Machek

On Fri, May 26, 2017 at 08:13:20AM +0200, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > On Wed, May 24, 2017 at 09:16:28AM +0200, Ingo Molnar wrote:
> > > So the interdiff between your two patches and the 3 commits already queued up is:
> > > 
> > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > index e3043873fcdc..30253ed0380b 100644
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -150,12 +150,6 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
> > >  		touch_softlockup_watchdog_sched();
> > >  		if (is_idle_task(current))
> > >  			ts->idle_jiffies++;
> > > -		/*
> > > -		 * In case the current tick fired too early past its expected
> > > -		 * expiration, make sure we don't bypass the next clock reprogramming
> > > -		 * to the same deadline.
> > > -		 */
> > > -		ts->next_tick = 0;
> > >  	}
> > >  #endif
> > >  	update_process_times(user_mode(regs));
> > > @@ -1103,8 +1097,15 @@ static void tick_nohz_handler(struct clock_event_device *dev)
> > >  	tick_sched_handle(ts, regs);
> > >  
> > >  	/* No need to reprogram if we are running tickless  */
> > > -	if (unlikely(ts->tick_stopped))
> > > +	if (unlikely(ts->tick_stopped)) {
> > > +		/*
> > > +		 * In case the current tick fired too early past its expected
> > > +		 * expiration, make sure we don't bypass the next clock reprogramming
> > > +		 * to the same deadline.
> > > +		 */
> > > +		ts->next_tick = 0;
> > >  		return;
> > > +	}
> > >  
> > >  	hrtimer_forward(&ts->sched_timer, now, tick_period);
> > >  	tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
> > > @@ -1202,12 +1203,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
> > >  	 */
> > >  	if (regs)
> > >  		tick_sched_handle(ts, regs);
> > > -	else
> > > -		ts->next_tick = 0;
> > >  
> > >  	/* No need to reprogram if we are in idle or full dynticks mode */
> > > -	if (unlikely(ts->tick_stopped))
> > > +	if (unlikely(ts->tick_stopped)) {
> > > +		/*
> > > +		 * In case the current tick fired too early past its expected
> > > +		 * expiration, make sure we don't bypass the next clock reprogramming
> > > +		 * to the same deadline.
> > > +		 */
> > > +		ts->next_tick = 0;
> > >  		return HRTIMER_NORESTART;
> > > +	}
> > >  
> > >  	hrtimer_forward(timer, now, tick_period);
> > >  
> > > 
> > > ... so the two are not the same - I'd rather not rebase it, I'd like to keep what 
> > > is working, we had problems with these changes before ...
> > > 
> > > If you'd like the changes in this interdiff to be applied as well, please add a 
> > > changelog to it and post it as a fourth patch.
> > > 
> > > Thanks,
> > > 
> > > 	Ingo
> > 
> > So if you like, you can replace the top patch with the following. It's exactly
> > the same code, I've only added a comment and a changelog:
> > 
> > ---
> > From 72956bf08c3b2e506a5ce5ec4faac9fd6b097307 Mon Sep 17 00:00:00 2001
> > From: Frederic Weisbecker <fweisbec@gmail.com>
> > Date: Mon, 15 May 2017 14:56:50 +0200
> > Subject: [PATCH] nohz: Reset next_tick cache even when the timer has no regs
> > 
> > The tick IRQ regs can be NULL if hrtimer_interrupt() is called from
> > non-interrupt contexts (ex: hotplug CPU down). For such very special
> > path we forget to clean the cached next tick deadline. If we are in
> > dynticks mode and the actual timer deadline is ahead of us, we might
> > perform a buggy bypass of the next clock reprogramming.
> > 
> > In fact since CPU down is the only user I'm aware of, this fix is likely
> > unnecessary as dying CPUs already clean their tick deadline cache. But
> > given how hard it is to debug such timer cache related issue, we should
> > never be short on paranoid measures.
> > 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > ---
> >  kernel/time/tick-sched.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 764d290..ed18ca5 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -1200,8 +1200,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
> >  	 * Do not call, when we are not in irq context and have
> >  	 * no valid regs pointer
> >  	 */
> > -	if (regs)
> > +	if (regs) {
> >  		tick_sched_handle(ts, regs);
> > +	} else {
> > +		/*
> > +		 * IRQ regs are NULL if hrtimer_interrupt() is called from
> > +		 * non-interrupt contexts (ex: hotplug cpu down). Make sure to
> > +		 * clean the cached next tick deadline to avoid buggy bypass of
> > +		 * clock reprog.
> > +		 */
> > +		ts->next_tick = 0;
> > +	}
> >  
> >  	/* No need to reprogram if we are in idle or full dynticks mode */
> >  	if (unlikely(ts->tick_stopped))
> 
> Well, this does not answer my question: between latest tip:timers/nohz and the 
> patches you posted there's a delta, so it's not just a pure rebase.

Yeah but like I said, you can forget the series I posted because the diff is
mostly cosmetic and things are actually ok as they are in tip:timers/nohz

The only thing that bothers me is the fact that the HEAD of this branch doesn't have
a changelog or even just a comment.

> 
> I can do a rebase to resolve the bisectability problem (which isn't very serious 
> by the way, only a single commit wide window, right?), but only if 'git diff 
> old_branch new_branch' comes up empty.
> 
> In every other case let's iterate the existing timers/nohz with additional 
> patches, ok? I'd rather have a finegrained iteration with well-tested intermediate 
> stages than break things again.

Ok so either we simply fixup HEAD~ with HEAD or we provide a changelog to the very last
patch. Which way do you prefer?

Thanks.

> 
> Thanks,
> 
> 	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3
  2017-05-29 13:55           ` Frederic Weisbecker
@ 2017-05-30  5:47             ` Ingo Molnar
  2017-05-30 12:51               ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2017-05-30  5:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Rik van Riel, James Hartsock,
	Thomas Gleixner, stable, Tim Wright, Pavel Machek


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> > Well, this does not answer my question: between latest tip:timers/nohz and the 
> > patches you posted there's a delta, so it's not just a pure rebase.
> 
> Yeah but like I said, you can forget the series I posted because the diff is
> mostly cosmetic and things are actually ok as they are in tip:timers/nohz
> 
> The only thing that bothers me is the fact that the HEAD of this branch doesn't have
> a changelog or even just a comment.

We can still amend that - is this changelog what you had in mind:

  nohz: Reset next_tick cache even when the timer has no regs

  Handle tick interrupts whose regs are NULL, out of general paranoia. It happens 
  when hrtimer_interrupt() is called from non-interrupt contexts, such as hotplug
  CPU down events.

?

Or you can send me a longer version as well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3
  2017-05-30  5:47             ` Ingo Molnar
@ 2017-05-30 12:51               ` Frederic Weisbecker
  0 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2017-05-30 12:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Rik van Riel, James Hartsock,
	Thomas Gleixner, stable, Tim Wright, Pavel Machek

On Tue, May 30, 2017 at 07:47:09AM +0200, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > > Well, this does not answer my question: between latest tip:timers/nohz and the 
> > > patches you posted there's a delta, so it's not just a pure rebase.
> > 
> > Yeah but like I said, you can forget the series I posted because the diff is
> > mostly cosmetic and things are actually ok as they are in tip:timers/nohz
> > 
> > The only thing that bothers me is the fact that the HEAD of this branch doesn't have
> > a changelog or even just a comment.
> 
> We can still amend that - is this changelog what you had in mind:
> 
>   nohz: Reset next_tick cache even when the timer has no regs
> 
>   Handle tick interrupts whose regs are NULL, out of general paranoia. It happens 
>   when hrtimer_interrupt() is called from non-interrupt contexts, such as hotplug
>   CPU down events.
> 
> ?

Yep that one is fine. Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-05-30 12:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-19 13:51 [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3 Frederic Weisbecker
2017-05-19 13:51 ` [PATCH 1/2] nohz: Add hrtimer sanity check Frederic Weisbecker
2017-05-19 13:51 ` [PATCH 2/2] nohz: Fix collision between tick and other hrtimers, again Frederic Weisbecker
2017-05-23  7:25 ` [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v3 Ingo Molnar
2017-05-23 13:10   ` Frederic Weisbecker
2017-05-24  7:16     ` Ingo Molnar
2017-05-24 13:28       ` Frederic Weisbecker
2017-05-26  2:10       ` Frederic Weisbecker
2017-05-26  6:13         ` Ingo Molnar
2017-05-29 13:55           ` Frederic Weisbecker
2017-05-30  5:47             ` Ingo Molnar
2017-05-30 12:51               ` Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).