On Thu, Dec 01, 2016 at 12:21:02AM +0100, Thomas Gleixner wrote:
> On Wed, 30 Nov 2016, David Gibson wrote:
> > On Tue, Nov 29, 2016 at 03:22:17PM +0100, Thomas Gleixner wrote:
> > > If we have legitimate use cases with a negative delta, then this patch
> > > breaks them no matter what. See the basic C course section in the second
> > > link.
> >
> > So, fwiw, when I first wrote a variant on this, I wasn't trying to fix
> > every case - just to make the consequences less bad if something goes
> > wrong. An overflow here can still mess up timekeeping, it's true, but
> > time going backwards tends to cause things to go horribly, horribly
> > wrong - which was why I spotted this in the first place.
>
> I completely understand the intention.
>
> We _cannot_ make that whole thing unsigned when it is not 100% clear
> that there is no legitimate caller which hands in a negative delta and
> rightfully expects to get a negative nanoseconds value handed back.

But.. delta is a cycle_t, which is typedef'd to u64, so how could it
be negative? This is why I believed my original version (35a4933) to
be safe - it was merely removing a signed intermediate from what was
essentially an unsigned calculation (technically the output was
signed, but the right shift means that's not relevant).

> If someone sits down and proves that this cannot happen there is no reason
> to hold that off.
>
> But that still does not solve the underlying root cause. Assume the
> following:
>
>     T1 = base + to_nsec(delta1)
>
> where delta1 is big, but the multiplication does not overflow 64bit
>
> Now wait a bit and do:
>
>     T2 = base + to_nsec(delta2)
>
> now delta2 is big enough, so the multiplication does overflow 64bit
> now delta2 is big enough to overflow 64bit with the multiplication.
>
> The result is T2 < T1, i.e. time goes backwards.

Hm, I see. Do we ever actually update time that way (at least core
system time), rather than using the last result as a base?
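
For illustration, the quoted T1/T2 scenario can be reproduced in a few
lines of userspace C. This is only a sketch under stated assumptions,
not the kernel code: DEMO_MULT, DEMO_SHIFT, the delta values, and
to_nsec_clamped() are hypothetical names and numbers picked to make
the wrap easy to see, and to_nsec() here is just a stand-in for the
multiply-and-shift conversion being discussed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical conversion factors: mult / 2^shift == 1 ns per cycle. */
#define DEMO_MULT  (UINT64_C(1) << 24)
#define DEMO_SHIFT 24

/*
 * Stand-in for the quoted to_nsec(): unsigned multiply, then shift.
 * The 64-bit product wraps modulo 2^64 once delta >= 2^40.
 */
uint64_t to_nsec(uint64_t delta)
{
	return (delta * DEMO_MULT) >> DEMO_SHIFT;
}

/*
 * One possible mitigation (an assumption, not what the kernel does):
 * clamp delta to the largest value whose product still fits in 64 bits.
 */
uint64_t to_nsec_clamped(uint64_t delta)
{
	const uint64_t max_delta = UINT64_MAX / DEMO_MULT;

	if (delta > max_delta)
		delta = max_delta;
	return (delta * DEMO_MULT) >> DEMO_SHIFT;
}

/* Walks through the quoted scenario with a fixed, never-updated base. */
void demo_time_goes_backwards(void)
{
	const uint64_t base = 1000;
	const uint64_t delta1 = (UINT64_C(1) << 40) - 1; /* product fits */
	const uint64_t delta2 = (UINT64_C(1) << 40) + 5; /* product wraps */

	uint64_t t1 = base + to_nsec(delta1);
	uint64_t t2 = base + to_nsec(delta2);

	/* delta2 > delta1, yet the wrapped product makes T2 < T1. */
	assert(delta2 > delta1);
	assert(t2 < t1);

	/* With clamping, the later reading never falls below the earlier. */
	assert(base + to_nsec_clamped(delta2) >= t1);
}
```

Calling demo_time_goes_backwards() from a test program exercises both
cases. Note that clamping only stops the result from going backwards;
the clamped value is still wrong once the delta is past the limit, so
it does not address why 'base' went stale in the first place.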

It does seem like the safer approach might be to clamp the result in
case of overflow, though.

> All what the unsigned conversion does is to procrastinate the problem by a
> factor of 2. So instead of failing after 10 seconds we fail after 20
> seconds. And just because you never observed the 20 seconds problem it does
> not go away magically.

At least in the case I was observing I'm pretty sure we weren't
updating time that way - we always used a delta from the last value,
so to_nsec() returning always positive was enough to make time not go
backwards.

> The proper solution is to figure out WHY we are running into that situation
> at all. So far all I have seen are symptom reports and fairy tales about
> ftp connections, but no real root cause analysis.

In the case I hit, it was due to running in a VM that had been stopped
for a substantial amount of time, so nothing that's actually under the
guest kernel's control. The bug-as-reported was that if the VM was
suspended for too long it would blow up immediately upon resume.

> The only reason for this to happen is that 'base' does not get updated for
> a too long time, so the delta grows into the overflow range.
>
> We already have protection against idle sleeping too long for this to
> happen. If the idle protection is not working then it needs to be fixed.
>
> if some other situation can cause the base not to be updated for a long
> time, then this needs to be fixed.
>
> Curing the symptom is a guarantee that the root cause will show another
> symptom sooner than later.
>
> Thanks,
>
>     tglx
>

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson