linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	John Stultz <john.stultz@linaro.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Liav Rehana <liavr@mellanox.com>,
	Chris Metcalf <cmetcalf@mellanox.com>,
	Richard Cochran <richardcochran@gmail.com>,
	Parit Bhargava <prarit@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	"Christopher S. Hall" <christopher.s.hall@intel.com>
Subject: Re: [patch 1/6] timekeeping: Force unsigned clocksource to nanoseconds conversion
Date: Fri, 9 Dec 2016 10:38:44 +1100	[thread overview]
Message-ID: <20161208233844.GJ13139@umbus.fritz.box> (raw)
In-Reply-To: <20161208204228.688545601@linutronix.de>

[-- Attachment #1: Type: text/plain, Size: 5654 bytes --]

On Thu, Dec 08, 2016 at 08:49:32PM -0000, Thomas Gleixner wrote:
> The clocksource delta to nanoseconds conversion is using signed math, but
> the delta is unsigned. This makes the conversion space smaller than
> necessary and in case of a multiplication overflow the conversion can
> become negative. The conversion is done with scaled math:
> 
>     s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
> 
> Shifting a signed integer right obvioulsy preserves the sign, which has
> interesting consequences:
>  
>  - Time jumps backwards
>  
>  - __iter_div_u64_rem() which is used in one of the calling code pathes
>    will take forever to piecewise calculate the seconds/nanoseconds part.
> 
> This has been reported by several people with different scenarios:
> 
> David observed that when stopping a VM with a debugger:
> 
>  "It was essentially the stopped by debugger case.  I forget exactly why,
>   but the guest was being explicitly stopped from outside, it wasn't just
>   scheduling lag.  I think it was something in the vicinity of 10 minutes
>   stopped."
> 
>  When lifting the stop the machine went dead.
> 
> The stopped by debugger case is not really interesting, but nevertheless it
> would be a good thing not to die completely.
> 
> But this was also observed on a live system by Liav:
> 
>  "When the OS is too overloaded, delta will get a high enough value for the
>   msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
>   after the shift the nsec variable will gain a value similar to
>   0xffffffffff000000."
> 
> Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
> ("time: Add cycles to nanoseconds translation"). It had been fixed a year
> ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
> timekeeping_get_ns()").
> 
> Though it's not surprising that the issue has been reintroduced because the
> function itself and the whole call chain uses s64 for the result and the
> propagation of it. The change in this recent commit is subtle:
> 
>    s64 nsec;
> 
> -  nsec = (d * m + n) >> s:
> +  nsec = d * m + n;
> +  nsec >>= s;
> 
> d being type of cycles_t adds another level of obfuscation.
> 
> This wouldn't have happened if the previous change to unsigned computation
> would have made the 'nsec' variable u64 right away and a follow up patch
> had cleaned up the whole call chain.
> 
> There have been patches submitted which basically did a revert of the above
> patch leaving everything else unchanged as signed. Back to square one. This
> spawned a admittedly pointless discussion about potential users which rely
> on the unsigned behaviour until someone pointed out that it had been fixed
> before. The changelogs of said patches added further confusion as they made
> finally false claims about the consequences for eventual users which expect
> signed results.
> 
> Despite delta being cycles_t, aka. u64, it's very well possible to hand in
> a signed negative value and the signed computation will happily return the
> correct result. But nobody actually sat down and analyzed the code which
> was added as user after the propably unintended signed conversion.
> 
> Though in sensitive code like this it's better to analyze it proper and
> make sure that nothing relies on this than hunting the subtle wreckage half
> a year later. After analyzing all call chains it stands that no caller can
> hand in a negative value (which actually would work due to the s64 cast)
> and rely on the signed math to do the right thing.
> 
> Change the conversion function to unsigned math. The conversion of all call
> chains is done in a follow up patch.
> 
> This solves the starvation issue, which was caused by the negative result,
> but it does not solve the underlying problem. It merily procrastinates
> it. When the timekeeper update is deferred long enough that the unsigned
> multiplication overflows, then time going backwards is observable again.
> 
> It does neither solve the issue of clocksources with a small counter width
> which will wrap around possibly several times and cause random time stamps
> to be generated. But those are usually not found on systems used for
> virtualization, so this is likely a non issue.
> 
> I took the liberty to claim authorship for this simply because
> analyzing all callsites and writing the changelog took substantially
> more time than just making the simple s/s64/u64/ change and ignore the
> rest.
> 
> Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
> Reported-by: David Gibson <david@gibson.dropbear.id.au>
> Reported-by: Liav Rehana <liavr@mellanox.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  kernel/time/timekeeping.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -299,10 +299,10 @@ u32 (*arch_gettimeoffset)(void) = defaul
>  static inline u32 arch_gettimeoffset(void) { return 0; }
>  #endif
>  
> -static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
> +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
>  					  cycle_t delta)
>  {
> -	s64 nsec;
> +	u64 nsec;
>  
>  	nsec = delta * tkr->mult + tkr->xtime_nsec;
>  	nsec >>= tkr->shift;
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  reply	other threads:[~2016-12-08 23:47 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-08 20:49 [patch 0/6] timekeeping: Cure the signed/unsigned wreckage Thomas Gleixner
2016-12-08 20:49 ` [patch 1/6] timekeeping: Force unsigned clocksource to nanoseconds conversion Thomas Gleixner
2016-12-08 23:38   ` David Gibson [this message]
2016-12-09 11:13   ` [tip:timers/core] timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 2/6] timekeeping: Make the conversion call chain consistently unsigned Thomas Gleixner
2016-12-08 23:39   ` David Gibson
2016-12-09 11:13   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 3/6] timekeeping: Get rid of pointless typecasts Thomas Gleixner
2016-12-08 23:40   ` David Gibson
2016-12-09 11:14   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 4/6] timekeeping: Use mul_u64_u32_shr() instead of open coding it Thomas Gleixner
2016-12-08 23:41   ` David Gibson
2016-12-09 11:14   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 5/6] [RFD] timekeeping: Provide optional 128bit math Thomas Gleixner
2016-12-09  4:08   ` Ingo Molnar
2016-12-09  4:29     ` Ingo Molnar
2016-12-09  4:39       ` John Stultz
2016-12-09  4:48     ` Peter Zijlstra
2016-12-09  5:22       ` Ingo Molnar
2016-12-09  5:41         ` Peter Zijlstra
2016-12-09  5:11   ` Peter Zijlstra
2016-12-09  6:08     ` Peter Zijlstra
2016-12-09  5:26   ` Peter Zijlstra
2016-12-09  6:38     ` Peter Zijlstra
2016-12-09  8:30       ` Peter Zijlstra
2016-12-09  9:11         ` Peter Zijlstra
2016-12-09 10:01         ` Peter Zijlstra
2016-12-09 17:32         ` Chris Metcalf
2017-01-14 12:51         ` [tip:timers/core] math64, timers: Fix 32bit mul_u64_u32_shr() and friends tip-bot for Peter Zijlstra
2016-12-09 10:18       ` [patch 5/6] [RFD] timekeeping: Provide optional 128bit math Peter Zijlstra
2016-12-09 17:20         ` Chris Metcalf
2016-12-08 20:49 ` [patch 6/6] [RFD] timekeeping: Get rid of cycle_t Thomas Gleixner
2016-12-08 23:43   ` David Gibson
2016-12-09  4:52 ` [patch 0/6] timekeeping: Cure the signed/unsigned wreckage John Stultz
2016-12-09  5:30 ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161208233844.GJ13139@umbus.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=christopher.s.hall@intel.com \
    --cc=cmetcalf@mellanox.com \
    --cc=john.stultz@linaro.org \
    --cc=liavr@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lvivier@redhat.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=prarit@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).