All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	John Stultz <john.stultz@linaro.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Liav Rehana <liavr@mellanox.com>,
	Chris Metcalf <cmetcalf@mellanox.com>,
	Richard Cochran <richardcochran@gmail.com>,
	Parit Bhargava <prarit@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	"Christopher S. Hall" <christopher.s.hall@intel.com>
Subject: Re: [patch 1/6] timekeeping: Force unsigned clocksource to nanoseconds conversion
Date: Fri, 9 Dec 2016 10:38:44 +1100	[thread overview]
Message-ID: <20161208233844.GJ13139@umbus.fritz.box> (raw)
In-Reply-To: <20161208204228.688545601@linutronix.de>

[-- Attachment #1: Type: text/plain, Size: 5654 bytes --]

On Thu, Dec 08, 2016 at 08:49:32PM -0000, Thomas Gleixner wrote:
> The clocksource delta to nanoseconds conversion is using signed math, but
> the delta is unsigned. This makes the conversion space smaller than
> necessary and in case of a multiplication overflow the conversion can
> become negative. The conversion is done with scaled math:
> 
>     s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
> 
> Shifting a signed integer right obvioulsy preserves the sign, which has
> interesting consequences:
>  
>  - Time jumps backwards
>  
>  - __iter_div_u64_rem() which is used in one of the calling code pathes
>    will take forever to piecewise calculate the seconds/nanoseconds part.
> 
> This has been reported by several people with different scenarios:
> 
> David observed that when stopping a VM with a debugger:
> 
>  "It was essentially the stopped by debugger case.  I forget exactly why,
>   but the guest was being explicitly stopped from outside, it wasn't just
>   scheduling lag.  I think it was something in the vicinity of 10 minutes
>   stopped."
> 
>  When lifting the stop the machine went dead.
> 
> The stopped by debugger case is not really interesting, but nevertheless it
> would be a good thing not to die completely.
> 
> But this was also observed on a live system by Liav:
> 
>  "When the OS is too overloaded, delta will get a high enough value for the
>   msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
>   after the shift the nsec variable will gain a value similar to
>   0xffffffffff000000."
> 
> Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
> ("time: Add cycles to nanoseconds translation"). It had been fixed a year
> ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
> timekeeping_get_ns()").
> 
> Though it's not surprising that the issue has been reintroduced because the
> function itself and the whole call chain uses s64 for the result and the
> propagation of it. The change in this recent commit is subtle:
> 
>    s64 nsec;
> 
> -  nsec = (d * m + n) >> s:
> +  nsec = d * m + n;
> +  nsec >>= s;
> 
> d being type of cycles_t adds another level of obfuscation.
> 
> This wouldn't have happened if the previous change to unsigned computation
> would have made the 'nsec' variable u64 right away and a follow up patch
> had cleaned up the whole call chain.
> 
> There have been patches submitted which basically did a revert of the above
> patch leaving everything else unchanged as signed. Back to square one. This
> spawned a admittedly pointless discussion about potential users which rely
> on the unsigned behaviour until someone pointed out that it had been fixed
> before. The changelogs of said patches added further confusion as they made
> finally false claims about the consequences for eventual users which expect
> signed results.
> 
> Despite delta being cycles_t, aka. u64, it's very well possible to hand in
> a signed negative value and the signed computation will happily return the
> correct result. But nobody actually sat down and analyzed the code which
> was added as user after the propably unintended signed conversion.
> 
> Though in sensitive code like this it's better to analyze it proper and
> make sure that nothing relies on this than hunting the subtle wreckage half
> a year later. After analyzing all call chains it stands that no caller can
> hand in a negative value (which actually would work due to the s64 cast)
> and rely on the signed math to do the right thing.
> 
> Change the conversion function to unsigned math. The conversion of all call
> chains is done in a follow up patch.
> 
> This solves the starvation issue, which was caused by the negative result,
> but it does not solve the underlying problem. It merily procrastinates
> it. When the timekeeper update is deferred long enough that the unsigned
> multiplication overflows, then time going backwards is observable again.
> 
> It does neither solve the issue of clocksources with a small counter width
> which will wrap around possibly several times and cause random time stamps
> to be generated. But those are usually not found on systems used for
> virtualization, so this is likely a non issue.
> 
> I took the liberty to claim authorship for this simply because
> analyzing all callsites and writing the changelog took substantially
> more time than just making the simple s/s64/u64/ change and ignore the
> rest.
> 
> Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
> Reported-by: David Gibson <david@gibson.dropbear.id.au>
> Reported-by: Liav Rehana <liavr@mellanox.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  kernel/time/timekeeping.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -299,10 +299,10 @@ u32 (*arch_gettimeoffset)(void) = defaul
>  static inline u32 arch_gettimeoffset(void) { return 0; }
>  #endif
>  
> -static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
> +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
>  					  cycle_t delta)
>  {
> -	s64 nsec;
> +	u64 nsec;
>  
>  	nsec = delta * tkr->mult + tkr->xtime_nsec;
>  	nsec >>= tkr->shift;
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  reply	other threads:[~2016-12-08 23:47 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-08 20:49 [patch 0/6] timekeeping: Cure the signed/unsigned wreckage Thomas Gleixner
2016-12-08 20:49 ` [patch 1/6] timekeeping: Force unsigned clocksource to nanoseconds conversion Thomas Gleixner
2016-12-08 23:38   ` David Gibson [this message]
2016-12-09 11:13   ` [tip:timers/core] timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 2/6] timekeeping: Make the conversion call chain consistently unsigned Thomas Gleixner
2016-12-08 23:39   ` David Gibson
2016-12-09 11:13   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 3/6] timekeeping: Get rid of pointless typecasts Thomas Gleixner
2016-12-08 23:40   ` David Gibson
2016-12-09 11:14   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 4/6] timekeeping: Use mul_u64_u32_shr() instead of open coding it Thomas Gleixner
2016-12-08 23:41   ` David Gibson
2016-12-09 11:14   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 5/6] [RFD] timekeeping: Provide optional 128bit math Thomas Gleixner
2016-12-09  4:08   ` Ingo Molnar
2016-12-09  4:29     ` Ingo Molnar
2016-12-09  4:39       ` John Stultz
2016-12-09  4:48     ` Peter Zijlstra
2016-12-09  5:22       ` Ingo Molnar
2016-12-09  5:41         ` Peter Zijlstra
2016-12-09  5:11   ` Peter Zijlstra
2016-12-09  6:08     ` Peter Zijlstra
2016-12-09  5:26   ` Peter Zijlstra
2016-12-09  6:38     ` Peter Zijlstra
2016-12-09  8:30       ` Peter Zijlstra
2016-12-09  9:11         ` Peter Zijlstra
2016-12-09 10:01         ` Peter Zijlstra
2016-12-09 17:32         ` Chris Metcalf
2017-01-14 12:51         ` [tip:timers/core] math64, timers: Fix 32bit mul_u64_u32_shr() and friends tip-bot for Peter Zijlstra
2016-12-09 10:18       ` [patch 5/6] [RFD] timekeeping: Provide optional 128bit math Peter Zijlstra
2016-12-09 17:20         ` Chris Metcalf
2016-12-08 20:49 ` [patch 6/6] [RFD] timekeeping: Get rid of cycle_t Thomas Gleixner
2016-12-08 23:43   ` David Gibson
2016-12-09  4:52 ` [patch 0/6] timekeeping: Cure the signed/unsigned wreckage John Stultz
2016-12-09  5:30 ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161208233844.GJ13139@umbus.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=christopher.s.hall@intel.com \
    --cc=cmetcalf@mellanox.com \
    --cc=john.stultz@linaro.org \
    --cc=liavr@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lvivier@redhat.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=prarit@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.