linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: John Stultz <john.stultz@linaro.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	David Gibson <david@gibson.dropbear.id.au>,
	Liav Rehana <liavr@mellanox.com>,
	Chris Metcalf <cmetcalf@mellanox.com>,
	Richard Cochran <richardcochran@gmail.com>,
	Parit Bhargava <prarit@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	"Christopher S. Hall" <christopher.s.hall@intel.com>
Subject: [patch 1/6] timekeeping: Force unsigned clocksource to nanoseconds conversion
Date: Thu, 08 Dec 2016 20:49:32 -0000	[thread overview]
Message-ID: <20161208204228.688545601@linutronix.de> (raw)
In-Reply-To: 20161208202623.883855034@linutronix.de

[-- Attachment #1: timekeeping--Force-unsigned-conversions.patch --]
[-- Type: text/plain, Size: 4972 bytes --]

The clocksource delta to nanoseconds conversion is using signed math, but
the delta is unsigned. This makes the conversion space smaller than
necessary and in case of a multiplication overflow the conversion can
become negative. The conversion is done with scaled math:

    s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;

Shifting a signed integer right obvioulsy preserves the sign, which has
interesting consequences:
 
 - Time jumps backwards
 
 - __iter_div_u64_rem() which is used in one of the calling code pathes
   will take forever to piecewise calculate the seconds/nanoseconds part.

This has been reported by several people with different scenarios:

David observed that when stopping a VM with a debugger:

 "It was essentially the stopped by debugger case.  I forget exactly why,
  but the guest was being explicitly stopped from outside, it wasn't just
  scheduling lag.  I think it was something in the vicinity of 10 minutes
  stopped."

 When lifting the stop the machine went dead.

The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.

But this was also observed on a live system by Liav:

 "When the OS is too overloaded, delta will get a high enough value for the
  msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
  after the shift the nsec variable will gain a value similar to
  0xffffffffff000000."

Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
("time: Add cycles to nanoseconds translation"). It had been fixed a year
ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
timekeeping_get_ns()").

Though it's not surprising that the issue has been reintroduced because the
function itself and the whole call chain uses s64 for the result and the
propagation of it. The change in this recent commit is subtle:

   s64 nsec;

-  nsec = (d * m + n) >> s:
+  nsec = d * m + n;
+  nsec >>= s;

d being type of cycles_t adds another level of obfuscation.

This wouldn't have happened if the previous change to unsigned computation
would have made the 'nsec' variable u64 right away and a follow up patch
had cleaned up the whole call chain.

There have been patches submitted which basically did a revert of the above
patch leaving everything else unchanged as signed. Back to square one. This
spawned a admittedly pointless discussion about potential users which rely
on the unsigned behaviour until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion as they made
finally false claims about the consequences for eventual users which expect
signed results.

Despite delta being cycles_t, aka. u64, it's very well possible to hand in
a signed negative value and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as user after the propably unintended signed conversion.

Though in sensitive code like this it's better to analyze it proper and
make sure that nothing relies on this than hunting the subtle wreckage half
a year later. After analyzing all call chains it stands that no caller can
hand in a negative value (which actually would work due to the s64 cast)
and rely on the signed math to do the right thing.

Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow up patch.

This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merily procrastinates
it. When the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.

It does neither solve the issue of clocksources with a small counter width
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non issue.

I took the liberty to claim authorship for this simply because
analyzing all callsites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignore the
rest.

Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timekeeping.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -299,10 +299,10 @@ u32 (*arch_gettimeoffset)(void) = defaul
 static inline u32 arch_gettimeoffset(void) { return 0; }
 #endif
 
-static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
+static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
 					  cycle_t delta)
 {
-	s64 nsec;
+	u64 nsec;
 
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;

  reply	other threads:[~2016-12-08 20:52 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-08 20:49 [patch 0/6] timekeeping: Cure the signed/unsigned wreckage Thomas Gleixner
2016-12-08 20:49 ` Thomas Gleixner [this message]
2016-12-08 23:38   ` [patch 1/6] timekeeping: Force unsigned clocksource to nanoseconds conversion David Gibson
2016-12-09 11:13   ` [tip:timers/core] timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 2/6] timekeeping: Make the conversion call chain consistently unsigned Thomas Gleixner
2016-12-08 23:39   ` David Gibson
2016-12-09 11:13   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 3/6] timekeeping: Get rid of pointless typecasts Thomas Gleixner
2016-12-08 23:40   ` David Gibson
2016-12-09 11:14   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 4/6] timekeeping: Use mul_u64_u32_shr() instead of open coding it Thomas Gleixner
2016-12-08 23:41   ` David Gibson
2016-12-09 11:14   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 5/6] [RFD] timekeeping: Provide optional 128bit math Thomas Gleixner
2016-12-09  4:08   ` Ingo Molnar
2016-12-09  4:29     ` Ingo Molnar
2016-12-09  4:39       ` John Stultz
2016-12-09  4:48     ` Peter Zijlstra
2016-12-09  5:22       ` Ingo Molnar
2016-12-09  5:41         ` Peter Zijlstra
2016-12-09  5:11   ` Peter Zijlstra
2016-12-09  6:08     ` Peter Zijlstra
2016-12-09  5:26   ` Peter Zijlstra
2016-12-09  6:38     ` Peter Zijlstra
2016-12-09  8:30       ` Peter Zijlstra
2016-12-09  9:11         ` Peter Zijlstra
2016-12-09 10:01         ` Peter Zijlstra
2016-12-09 17:32         ` Chris Metcalf
2017-01-14 12:51         ` [tip:timers/core] math64, timers: Fix 32bit mul_u64_u32_shr() and friends tip-bot for Peter Zijlstra
2016-12-09 10:18       ` [patch 5/6] [RFD] timekeeping: Provide optional 128bit math Peter Zijlstra
2016-12-09 17:20         ` Chris Metcalf
2016-12-08 20:49 ` [patch 6/6] [RFD] timekeeping: Get rid of cycle_t Thomas Gleixner
2016-12-08 23:43   ` David Gibson
2016-12-09  4:52 ` [patch 0/6] timekeeping: Cure the signed/unsigned wreckage John Stultz
2016-12-09  5:30 ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161208204228.688545601@linutronix.de \
    --to=tglx@linutronix.de \
    --cc=christopher.s.hall@intel.com \
    --cc=cmetcalf@mellanox.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=john.stultz@linaro.org \
    --cc=liavr@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lvivier@redhat.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=prarit@redhat.com \
    --cc=richardcochran@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).