linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/2] timers: Fix base clock forwarding issues
@ 2016-10-22 11:07 Thomas Gleixner
  2016-10-22 11:07 ` [patch 1/2] timers: Prevent base clock rewind when forwarding clock Thomas Gleixner
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Thomas Gleixner @ 2016-10-22 11:07 UTC (permalink / raw)
  To: LKML
  Cc: Ashton Holmes, Michal Necasek, stern, michael.thayer,
	knut.osmundsen, Peter Zijlstra, Ingo Molnar, rt

Several people have reported USB timeout issues starting in 4.8 which are
related to the timer wheel overhaul.

The following two patches adress the brown paperbag bugs in that code.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [patch 1/2] timers: Prevent base clock rewind when forwarding clock
  2016-10-22 11:07 [patch 0/2] timers: Fix base clock forwarding issues Thomas Gleixner
@ 2016-10-22 11:07 ` Thomas Gleixner
  2016-10-25 14:58   ` [tip:timers/urgent] " tip-bot for Thomas Gleixner
  2016-10-22 11:07 ` [patch 2/2] timers: Prevent base clock corruption when forwarding Thomas Gleixner
       [not found] ` <CA+OSeukz5OSzHdezcUpL1hYj9FurYxu8Nwm5CToLjvyCNvesbQ@mail.gmail.com>
  2 siblings, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2016-10-22 11:07 UTC (permalink / raw)
  To: LKML
  Cc: Ashton Holmes, Michal Necasek, stern, michael.thayer,
	knut.osmundsen, Peter Zijlstra, Ingo Molnar, rt, stable

[-- Attachment #1: timers--Prevent-base-clock-rewind-when-forwarding-clock.patch --]
[-- Type: text/plain, Size: 3435 bytes --]

Ashton and Michal reported, that kernel versions 4.8 and later suffer from
USB timeouts which are caused by the timer wheel rework.

This is caused by a bug in the base clock forwarding mechanism, which leads
to timers expiring early. The scenario which leads to this is:

run_timers()
  while (jiffies >= base->clk) {
    collect_expired_timers();
    base->clk++;
    expire_timers();
  }          

So base->clk = jiffies + 1. Now the cpu goes idle:

idle()
  get_next_timer_interrupt()
    nextevt = __next_time_interrupt();
    if (time_after(nextevt, base->clk))
       	base->clk = jiffies;

jiffies has not advanced since run_timers(), so this assignment effectively
decrements base->clk by one.

base->clk is the index into the timer wheel arrays. So let's assume the
following state after the base->clk increment in run_timers():

 jiffies = 0
 base->clk = 1

A timer gets enqueued with an expiry delta of 63 ticks (which is the case
with the USB timeout and HZ=250) so the resulting bucket index is:

  base->clk + delta = 1 + 63 = 64

The timer goes into the first wheel level. The array size is 64 so it ends
up in bucket 0, which is correct as it takes 63 ticks to advance base->clk
to index into bucket 0 again.

If the cpu goes idle before jiffies advance, then the bug in the forwarding
mechanism sets base->clk back to 0, so the next invocation of run_timers()
at the next tick will index into bucket 0 and therefore expire the timer 62
ticks too early.

Instead of blindly setting base->clk to jiffies we must make the forwarding
conditional on jiffies > base->clk, but we cannot use jiffies for this as
we might run into the following issue:

  if (time_after(jiffies, base->clk) {
    if (time_after(nextevt, base->clk))
       base->clk = jiffies;

jiffies can increment between the check and the assigment far enough to
advance beyond nextevt. So we need to use a stable value for checking.

get_next_timer_interrupt() has the basej argument which is the jiffies
value snapshot taken in the calling code. So we can just that.

Thanks to Ashton for bisecting and providing trace data!

Fixes: a683f390b93f("timers: Forward the wheel clock whenever possible")
Reported-by: Ashton Holmes <scoopta@gmail.com>
Reported-by: Michal Necasek <michal.necasek@oracle.com>
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: michael.thayer@oracle.com
Cc: knut.osmundsen@oracle.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timer.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1510,12 +1510,16 @@ u64 get_next_timer_interrupt(unsigned lo
 	is_max_delta = (nextevt == base->clk + NEXT_TIMER_MAX_DELTA);
 	base->next_expiry = nextevt;
 	/*
-	 * We have a fresh next event. Check whether we can forward the base:
+	 * We have a fresh next event. Check whether we can forward the
+	 * base. We can only do that when @basej is past base->clk
+	 * otherwise we might rewind base->clk.
 	 */
-	if (time_after(nextevt, jiffies))
-		base->clk = jiffies;
-	else if (time_after(nextevt, base->clk))
-		base->clk = nextevt;
+	if (time_after(basej, base->clk)) {
+		if (time_after(nextevt, basej))
+			base->clk = basej;
+		else if (time_after(nextevt, base->clk))
+			base->clk = nextevt;
+	}
 
 	if (time_before_eq(nextevt, basej)) {
 		expires = basem;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [patch 2/2] timers: Prevent base clock corruption when forwarding
  2016-10-22 11:07 [patch 0/2] timers: Fix base clock forwarding issues Thomas Gleixner
  2016-10-22 11:07 ` [patch 1/2] timers: Prevent base clock rewind when forwarding clock Thomas Gleixner
@ 2016-10-22 11:07 ` Thomas Gleixner
  2016-10-25 14:59   ` [tip:timers/urgent] " tip-bot for Thomas Gleixner
       [not found] ` <CA+OSeukz5OSzHdezcUpL1hYj9FurYxu8Nwm5CToLjvyCNvesbQ@mail.gmail.com>
  2 siblings, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2016-10-22 11:07 UTC (permalink / raw)
  To: LKML
  Cc: Ashton Holmes, Michal Necasek, stern, michael.thayer,
	knut.osmundsen, Peter Zijlstra, Ingo Molnar, rt, stable

[-- Attachment #1: timers--Prevent-base-clock-corruption-when-forwarding.patch --]
[-- Type: text/plain, Size: 3361 bytes --]

When a timer is enqueued we try to forward the timer base clock. This
mechanism has two issues:

1) Forwarding a remote base unlocked

The forwarding function is called from get_target_base() with the current
timer base lock held. But if the new target base is a different base than
the current base (can happen with NOHZ, sigh!) then the forwarding is done
on an unlocked base. This can lead to corruption of base->clk.

Solution is simple: Invoke the forwarding after the target base is locked.

2) Possible corruption due to jiffies advancing

This is similar to the issue in get_net_timer_interrupt() which was
fixed in the previous patch. jiffies can advance between check and
assignement and therefor advancing base->clk after the next expiry
value.

 So we need to read jiffies once into a local variable once and do the
checks and assignment with the local copy.

Fixes: a683f390b93f("timers: Forward the wheel clock whenever possible")
Reported-by: Ashton Holmes <scoopta@gmail.com>
Reported-by: Michal Necasek <michal.necasek@oracle.com>
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: michael.thayer@oracle.com
Cc: knut.osmundsen@oracle.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timer.c |   23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -878,7 +878,7 @@ static inline struct timer_base *get_tim
 
 #ifdef CONFIG_NO_HZ_COMMON
 static inline struct timer_base *
-__get_target_base(struct timer_base *base, unsigned tflags)
+get_target_base(struct timer_base *base, unsigned tflags)
 {
 #ifdef CONFIG_SMP
 	if ((tflags & TIMER_PINNED) || !base->migration_enabled)
@@ -891,25 +891,27 @@ static inline struct timer_base *
 
 static inline void forward_timer_base(struct timer_base *base)
 {
+	unsigned long jnow = READ_ONCE(jiffies);
+
 	/*
 	 * We only forward the base when it's idle and we have a delta between
 	 * base clock and jiffies.
 	 */
-	if (!base->is_idle || (long) (jiffies - base->clk) < 2)
+	if (!base->is_idle || (long) (jnow - base->clk) < 2)
 		return;
 
 	/*
 	 * If the next expiry value is > jiffies, then we fast forward to
 	 * jiffies otherwise we forward to the next expiry value.
 	 */
-	if (time_after(base->next_expiry, jiffies))
-		base->clk = jiffies;
+	if (time_after(base->next_expiry, jnow))
+		base->clk = jnow;
 	else
 		base->clk = base->next_expiry;
 }
 #else
 static inline struct timer_base *
-__get_target_base(struct timer_base *base, unsigned tflags)
+get_target_base(struct timer_base *base, unsigned tflags)
 {
 	return get_timer_this_cpu_base(tflags);
 }
@@ -917,14 +919,6 @@ static inline struct timer_base *
 static inline void forward_timer_base(struct timer_base *base) { }
 #endif
 
-static inline struct timer_base *
-get_target_base(struct timer_base *base, unsigned tflags)
-{
-	struct timer_base *target = __get_target_base(base, tflags);
-
-	forward_timer_base(target);
-	return target;
-}
 
 /*
  * We are using hashed locking: Holding per_cpu(timer_bases[x]).lock means
@@ -1025,6 +1019,9 @@ static inline int
 		}
 	}
 
+	/* Try to forward a stale timer base clock */
+	forward_timer_base(base);
+
 	timer->expires = expires;
 	/*
 	 * If 'idx' was calculated above and the base time did not advance

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip:timers/urgent] timers: Prevent base clock rewind when forwarding clock
  2016-10-22 11:07 ` [patch 1/2] timers: Prevent base clock rewind when forwarding clock Thomas Gleixner
@ 2016-10-25 14:58   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-10-25 14:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: michal.necasek, michael.thayer, hpa, linux-kernel, mingo, tglx,
	scoopta, peterz

Commit-ID:  041ad7bc758db259bb960ef795197dd14aab19a6
Gitweb:     http://git.kernel.org/tip/041ad7bc758db259bb960ef795197dd14aab19a6
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 22 Oct 2016 11:07:35 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 25 Oct 2016 16:32:50 +0200

timers: Prevent base clock rewind when forwarding clock

Ashton and Michael reported, that kernel versions 4.8 and later suffer from
USB timeouts which are caused by the timer wheel rework.

This is caused by a bug in the base clock forwarding mechanism, which leads
to timers expiring early. The scenario which leads to this is:

run_timers()
  while (jiffies >= base->clk) {
    collect_expired_timers();
    base->clk++;
    expire_timers();
  }          

So base->clk = jiffies + 1. Now the cpu goes idle:

idle()
  get_next_timer_interrupt()
    nextevt = __next_time_interrupt();
    if (time_after(nextevt, base->clk))
       	base->clk = jiffies;

jiffies has not advanced since run_timers(), so this assignment effectively
decrements base->clk by one.

base->clk is the index into the timer wheel arrays. So let's assume the
following state after the base->clk increment in run_timers():

 jiffies = 0
 base->clk = 1

A timer gets enqueued with an expiry delta of 63 ticks (which is the case
with the USB timeout and HZ=250) so the resulting bucket index is:

  base->clk + delta = 1 + 63 = 64

The timer goes into the first wheel level. The array size is 64 so it ends
up in bucket 0, which is correct as it takes 63 ticks to advance base->clk
to index into bucket 0 again.

If the cpu goes idle before jiffies advance, then the bug in the forwarding
mechanism sets base->clk back to 0, so the next invocation of run_timers()
at the next tick will index into bucket 0 and therefore expire the timer 62
ticks too early.

Instead of blindly setting base->clk to jiffies we must make the forwarding
conditional on jiffies > base->clk, but we cannot use jiffies for this as
we might run into the following issue:

  if (time_after(jiffies, base->clk) {
    if (time_after(nextevt, base->clk))
       base->clk = jiffies;

jiffies can increment between the check and the assigment far enough to
advance beyond nextevt. So we need to use a stable value for checking.

get_next_timer_interrupt() has the basej argument which is the jiffies
value snapshot taken in the calling code. So we can just that.

Thanks to Ashton for bisecting and providing trace data!

Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Reported-by: Ashton Holmes <scoopta@gmail.com>
Reported-by: Michael Thayer <michael.thayer@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Necasek <michal.necasek@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: knut.osmundsen@oracle.com
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161022110552.175308322@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/timer.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ccf9130..7c446fb 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1523,12 +1523,16 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 	is_max_delta = (nextevt == base->clk + NEXT_TIMER_MAX_DELTA);
 	base->next_expiry = nextevt;
 	/*
-	 * We have a fresh next event. Check whether we can forward the base:
+	 * We have a fresh next event. Check whether we can forward the
+	 * base. We can only do that when @basej is past base->clk
+	 * otherwise we might rewind base->clk.
 	 */
-	if (time_after(nextevt, jiffies))
-		base->clk = jiffies;
-	else if (time_after(nextevt, base->clk))
-		base->clk = nextevt;
+	if (time_after(basej, base->clk)) {
+		if (time_after(nextevt, basej))
+			base->clk = basej;
+		else if (time_after(nextevt, base->clk))
+			base->clk = nextevt;
+	}
 
 	if (time_before_eq(nextevt, basej)) {
 		expires = basem;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [tip:timers/urgent] timers: Prevent base clock corruption when forwarding
  2016-10-22 11:07 ` [patch 2/2] timers: Prevent base clock corruption when forwarding Thomas Gleixner
@ 2016-10-25 14:59   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-10-25 14:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, peterz, michael.thayer, tglx, michal.necasek,
	mingo, scoopta

Commit-ID:  6bad6bccf2d717f652d37e63cf261eaa23466009
Gitweb:     http://git.kernel.org/tip/6bad6bccf2d717f652d37e63cf261eaa23466009
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 22 Oct 2016 11:07:37 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 25 Oct 2016 16:32:50 +0200

timers: Prevent base clock corruption when forwarding

When a timer is enqueued we try to forward the timer base clock. This
mechanism has two issues:

1) Forwarding a remote base unlocked

The forwarding function is called from get_target_base() with the current
timer base lock held. But if the new target base is a different base than
the current base (can happen with NOHZ, sigh!) then the forwarding is done
on an unlocked base. This can lead to corruption of base->clk.

Solution is simple: Invoke the forwarding after the target base is locked.

2) Possible corruption due to jiffies advancing

This is similar to the issue in get_net_timer_interrupt() which was fixed
in the previous patch. jiffies can advance between check and assignement
and therefore advancing base->clk beyond the next expiry value.

So we need to read jiffies into a local variable once and do the checks and
assignment with the local copy.

Fixes: a683f390b93f("timers: Forward the wheel clock whenever possible")
Reported-by: Ashton Holmes <scoopta@gmail.com>
Reported-by: Michael Thayer <michael.thayer@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Necasek <michal.necasek@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: knut.osmundsen@oracle.com
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161022110552.253640125@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/timer.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 7c446fb..c611c47 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -878,7 +878,7 @@ static inline struct timer_base *get_timer_base(u32 tflags)
 
 #ifdef CONFIG_NO_HZ_COMMON
 static inline struct timer_base *
-__get_target_base(struct timer_base *base, unsigned tflags)
+get_target_base(struct timer_base *base, unsigned tflags)
 {
 #ifdef CONFIG_SMP
 	if ((tflags & TIMER_PINNED) || !base->migration_enabled)
@@ -891,25 +891,27 @@ __get_target_base(struct timer_base *base, unsigned tflags)
 
 static inline void forward_timer_base(struct timer_base *base)
 {
+	unsigned long jnow = READ_ONCE(jiffies);
+
 	/*
 	 * We only forward the base when it's idle and we have a delta between
 	 * base clock and jiffies.
 	 */
-	if (!base->is_idle || (long) (jiffies - base->clk) < 2)
+	if (!base->is_idle || (long) (jnow - base->clk) < 2)
 		return;
 
 	/*
 	 * If the next expiry value is > jiffies, then we fast forward to
 	 * jiffies otherwise we forward to the next expiry value.
 	 */
-	if (time_after(base->next_expiry, jiffies))
-		base->clk = jiffies;
+	if (time_after(base->next_expiry, jnow))
+		base->clk = jnow;
 	else
 		base->clk = base->next_expiry;
 }
 #else
 static inline struct timer_base *
-__get_target_base(struct timer_base *base, unsigned tflags)
+get_target_base(struct timer_base *base, unsigned tflags)
 {
 	return get_timer_this_cpu_base(tflags);
 }
@@ -917,14 +919,6 @@ __get_target_base(struct timer_base *base, unsigned tflags)
 static inline void forward_timer_base(struct timer_base *base) { }
 #endif
 
-static inline struct timer_base *
-get_target_base(struct timer_base *base, unsigned tflags)
-{
-	struct timer_base *target = __get_target_base(base, tflags);
-
-	forward_timer_base(target);
-	return target;
-}
 
 /*
  * We are using hashed locking: Holding per_cpu(timer_bases[x]).lock means
@@ -1037,6 +1031,9 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 		}
 	}
 
+	/* Try to forward a stale timer base clock */
+	forward_timer_base(base);
+
 	timer->expires = expires;
 	/*
 	 * If 'idx' was calculated above and the base time did not advance

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [patch 0/2] timers: Fix base clock forwarding issues
       [not found] ` <CA+OSeukz5OSzHdezcUpL1hYj9FurYxu8Nwm5CToLjvyCNvesbQ@mail.gmail.com>
@ 2016-10-27 20:42   ` Thomas Gleixner
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2016-10-27 20:42 UTC (permalink / raw)
  To: Ashton Holmes
  Cc: LKML, Michal Necasek, stern, michael.thayer, knut.osmundsen,
	Peter Zijlstra, Ingo Molnar, rt

On Thu, 27 Oct 2016, Ashton Holmes wrote:
> On Sat, Oct 22, 2016 at 4:10 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > Several people have reported USB timeout issues starting in 4.8 which are
> > related to the timer wheel overhaul.
> >
> > The following two patches adress the brown paperbag bugs in that code.
> >
> > Thanks,
> >
> >         tglx
> >
> Sorry it took me so long to reply. I've been rather busy. I've been running
> kernel 4.8.4 with these two patches for the past 2 days and have yet to
> encounter any issues. Thanks.

Thanks for confirmation and all the help with tracking this down!

       tglx
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-10-27 20:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-22 11:07 [patch 0/2] timers: Fix base clock forwarding issues Thomas Gleixner
2016-10-22 11:07 ` [patch 1/2] timers: Prevent base clock rewind when forwarding clock Thomas Gleixner
2016-10-25 14:58   ` [tip:timers/urgent] " tip-bot for Thomas Gleixner
2016-10-22 11:07 ` [patch 2/2] timers: Prevent base clock corruption when forwarding Thomas Gleixner
2016-10-25 14:59   ` [tip:timers/urgent] " tip-bot for Thomas Gleixner
     [not found] ` <CA+OSeukz5OSzHdezcUpL1hYj9FurYxu8Nwm5CToLjvyCNvesbQ@mail.gmail.com>
2016-10-27 20:42   ` [patch 0/2] timers: Fix base clock forwarding issues Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).