On Thu, Dec 08, 2016 at 08:49:32PM -0000, Thomas Gleixner wrote:
> The clocksource delta to nanoseconds conversion is using signed math, but
> the delta is unsigned. This makes the conversion space smaller than
> necessary and in case of a multiplication overflow the conversion can
> become negative. The conversion is done with scaled math:
>
>     s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
>
> Shifting a signed integer right obviously preserves the sign, which has
> interesting consequences:
>
>  - Time jumps backwards
>
>  - __iter_div_u64_rem() which is used in one of the calling code paths
>    will take forever to piecewise calculate the seconds/nanoseconds part.
>
> This has been reported by several people with different scenarios:
>
> David observed that when stopping a VM with a debugger:
>
>  "It was essentially the stopped by debugger case. I forget exactly why,
>   but the guest was being explicitly stopped from outside, it wasn't just
>   scheduling lag. I think it was something in the vicinity of 10 minutes
>   stopped."
>
> When lifting the stop the machine went dead.
>
> The stopped by debugger case is not really interesting, but nevertheless
> it would be a good thing not to die completely.
>
> But this was also observed on a live system by Liav:
>
>  "When the OS is too overloaded, delta will get a high enough value for
>   the msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and
>   so after the shift the nsec variable will gain a value similar to
>   0xffffffffff000000."
>
> Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
> ("time: Add cycles to nanoseconds translation"). It had been fixed a year
> ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
> timekeeping_get_ns()").
>
> Though it's not surprising that the issue has been reintroduced, because
> the function itself and the whole call chain use s64 for the result and
> its propagation.
> The change in this recent commit is subtle:
>
>     s64 nsec;
>
>   - nsec = (d * m + n) >> s;
>   + nsec = d * m + n;
>   + nsec >>= s;
>
> d being of type cycles_t adds another level of obfuscation.
>
> This wouldn't have happened if the previous change to unsigned
> computation had made the 'nsec' variable u64 right away and a follow-up
> patch had cleaned up the whole call chain.
>
> There have been patches submitted which basically did a revert of the
> above patch, leaving everything else unchanged as signed. Back to square
> one. This spawned an admittedly pointless discussion about potential
> users which rely on the unsigned behaviour, until someone pointed out
> that it had been fixed before. The changelogs of said patches added
> further confusion, as they finally made false claims about the
> consequences for eventual users which expect signed results.
>
> Despite delta being cycles_t, aka. u64, it's very well possible to hand
> in a signed negative value and the signed computation will happily return
> the correct result. But nobody actually sat down and analyzed the code
> which was added as user after the probably unintended signed conversion.
>
> Though in sensitive code like this it's better to analyze it properly and
> make sure that nothing relies on this than to hunt the subtle wreckage
> half a year later. After analyzing all call chains it stands that no
> caller can hand in a negative value (which actually would work due to the
> s64 cast) and rely on the signed math to do the right thing.
>
> Change the conversion function to unsigned math. The conversion of all
> call chains is done in a follow-up patch.
>
> This solves the starvation issue, which was caused by the negative
> result, but it does not solve the underlying problem. It merely
> procrastinates it. When the timekeeper update is deferred long enough
> that the unsigned multiplication overflows, then time going backwards is
> observable again.
> Nor does it solve the issue of clocksources with a small counter width,
> which will wrap around possibly several times and cause random time
> stamps to be generated. But those are usually not found on systems used
> for virtualization, so this is likely a non-issue.
>
> I took the liberty to claim authorship for this simply because analyzing
> all call sites and writing the changelog took substantially more time
> than just making the simple s/s64/u64/ change and ignoring the rest.
>
> Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
> Reported-by: David Gibson
> Reported-by: Liav Rehana
> Signed-off-by: Thomas Gleixner

Reviewed-by: David Gibson

> ---
>  kernel/time/timekeeping.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -299,10 +299,10 @@ u32 (*arch_gettimeoffset)(void) = defaul
>  static inline u32 arch_gettimeoffset(void) { return 0; }
>  #endif
>
> -static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
> +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
> 					  cycle_t delta)
>  {
> -	s64 nsec;
> +	u64 nsec;
>
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
> 	nsec >>= tkr->shift;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson