From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753970AbcLHXrq (ORCPT ); Thu, 8 Dec 2016 18:47:46 -0500 Received: from ozlabs.org ([103.22.144.67]:46475 "EHLO ozlabs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752376AbcLHXrl (ORCPT ); Thu, 8 Dec 2016 18:47:41 -0500 Date: Fri, 9 Dec 2016 10:38:44 +1100 From: David Gibson To: Thomas Gleixner Cc: LKML , John Stultz , Peter Zijlstra , Ingo Molnar , Liav Rehana , Chris Metcalf , Richard Cochran , Parit Bhargava , Laurent Vivier , "Christopher S. Hall" Subject: Re: [patch 1/6] timekeeping: Force unsigned clocksource to nanoseconds conversion Message-ID: <20161208233844.GJ13139@umbus.fritz.box> References: <20161208202623.883855034@linutronix.de> <20161208204228.688545601@linutronix.de> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="zPXeIxDajdrcF2en" Content-Disposition: inline In-Reply-To: <20161208204228.688545601@linutronix.de> User-Agent: Mutt/1.7.1 (2016-10-04) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --zPXeIxDajdrcF2en Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Dec 08, 2016 at 08:49:32PM -0000, Thomas Gleixner wrote: > The clocksource delta to nanoseconds conversion is using signed math, but > the delta is unsigned. This makes the conversion space smaller than > necessary and in case of a multiplication overflow the conversion can > become negative. The conversion is done with scaled math: >=20 > s64 nsec_delta =3D ((s64)clkdelta * clk->mult) >> clk->shift; >=20 > Shifting a signed integer right obvioulsy preserves the sign, which has > interesting consequences: > =20 > - Time jumps backwards > =20 > - __iter_div_u64_rem() which is used in one of the calling code pathes > will take forever to piecewise calculate the seconds/nanoseconds part. >=20 > This has been reported by several people with different scenarios: >=20 > David observed that when stopping a VM with a debugger: >=20 > "It was essentially the stopped by debugger case. I forget exactly why, > but the guest was being explicitly stopped from outside, it wasn't just > scheduling lag. I think it was something in the vicinity of 10 minutes > stopped." >=20 > When lifting the stop the machine went dead. >=20 > The stopped by debugger case is not really interesting, but nevertheless = it > would be a good thing not to die completely. >=20 > But this was also observed on a live system by Liav: >=20 > "When the OS is too overloaded, delta will get a high enough value for t= he > msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so > after the shift the nsec variable will gain a value similar to > 0xffffffffff000000." >=20 > Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8 > ("time: Add cycles to nanoseconds translation"). It had been fixed a year > ago already in commit 35a4933a8959 ("time: Avoid signed overflow in > timekeeping_get_ns()"). >=20 > Though it's not surprising that the issue has been reintroduced because t= he > function itself and the whole call chain uses s64 for the result and the > propagation of it. The change in this recent commit is subtle: >=20 > s64 nsec; >=20 > - nsec =3D (d * m + n) >> s: > + nsec =3D d * m + n; > + nsec >>=3D s; >=20 > d being type of cycles_t adds another level of obfuscation. >=20 > This wouldn't have happened if the previous change to unsigned computation > would have made the 'nsec' variable u64 right away and a follow up patch > had cleaned up the whole call chain. >=20 > There have been patches submitted which basically did a revert of the abo= ve > patch leaving everything else unchanged as signed. Back to square one. Th= is > spawned a admittedly pointless discussion about potential users which rely > on the unsigned behaviour until someone pointed out that it had been fixed > before. The changelogs of said patches added further confusion as they ma= de > finally false claims about the consequences for eventual users which expe= ct > signed results. >=20 > Despite delta being cycles_t, aka. u64, it's very well possible to hand in > a signed negative value and the signed computation will happily return the > correct result. But nobody actually sat down and analyzed the code which > was added as user after the propably unintended signed conversion. >=20 > Though in sensitive code like this it's better to analyze it proper and > make sure that nothing relies on this than hunting the subtle wreckage ha= lf > a year later. After analyzing all call chains it stands that no caller can > hand in a negative value (which actually would work due to the s64 cast) > and rely on the signed math to do the right thing. >=20 > Change the conversion function to unsigned math. The conversion of all ca= ll > chains is done in a follow up patch. >=20 > This solves the starvation issue, which was caused by the negative result, > but it does not solve the underlying problem. It merily procrastinates > it. When the timekeeper update is deferred long enough that the unsigned > multiplication overflows, then time going backwards is observable again. >=20 > It does neither solve the issue of clocksources with a small counter width > which will wrap around possibly several times and cause random time stamps > to be generated. But those are usually not found on systems used for > virtualization, so this is likely a non issue. >=20 > I took the liberty to claim authorship for this simply because > analyzing all callsites and writing the changelog took substantially > more time than just making the simple s/s64/u64/ change and ignore the > rest. >=20 > Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation") > Reported-by: David Gibson > Reported-by: Liav Rehana > Signed-off-by: Thomas Gleixner Reviewed-by: David Gibson > --- > kernel/time/timekeeping.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) >=20 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -299,10 +299,10 @@ u32 (*arch_gettimeoffset)(void) =3D defaul > static inline u32 arch_gettimeoffset(void) { return 0; } > #endif > =20 > -static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr, > +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, > cycle_t delta) > { > - s64 nsec; > + u64 nsec; > =20 > nsec =3D delta * tkr->mult + tkr->xtime_nsec; > nsec >>=3D tkr->shift; >=20 >=20 --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --zPXeIxDajdrcF2en Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJYSe8BAAoJEGw4ysog2bOSIcsQAJPLwIkA6WV7HqaxTQxwKVSf /Ywv5ZK3fdhpPSLOZr3bvMwoORyyrpkX7gU9vmT28M0Rpmn+S9fcfvUNPcknPb9f dDdDmHyBLKCXir3ZOnqcmqC+FinFZeSsJIqoSq+kmjuVsXNpXncgKG5bH/K5svHf 8wRsQMWXdj1DB0TZXQ9Y4DvpO4MYOwJuhxLsrbltMaNKPRxa8zLIMXzIQZcywJ6X azWkrvH2PfRxTfZ+yv3yq8n8Jd9eDSbGpERFff/2B2A5aJWnWY0e7m66kMCue1N5 nLeEpB/RbD8oH0oFoL49YHV9yHloptQBvU/yDSn78iK3up/hjeKqv832I20oVpFq IVPrYTXlxOBP/FRJpMXdWQR6lHeDfplGPcRPtIScyBAIjBPdy6GQO/AWsqO4zYy9 kOZI5tqnaY+0Hif6F/rircnUe+sOMb1VKbosPIUCnPGoJktypJpVKKIlJK6sP/ni DQ/ccpcNzklLAq8uM4oailSlprl9W04mg8IH1nkgRIwRRTEDkeRReC6Tj6GCqLPz IfD4Kzl9TuSHoAoFlxC2Ta98R+7IyRCvUnWhDRAKh2e5u1jmeTCoqw4Z7XS0Z3ci EHWkg2yBaTJw1ZfzH/T7CO/aJj0oVxGCTnT/AGwlyQFFhAgplzURxmpTuNcwNekj g2Hgq7XCrZ/NyrecQ5Z4 =ftIE -----END PGP SIGNATURE----- --zPXeIxDajdrcF2en--