From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757991AbcLACNZ (ORCPT <rfc822;w@1wt.eu>);
        Wed, 30 Nov 2016 21:13:25 -0500
Received: from ozlabs.org ([103.22.144.67]:43549 "EHLO ozlabs.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1753128AbcLACNX (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 30 Nov 2016 21:13:23 -0500
Date: Thu, 1 Dec 2016 13:12:33 +1100
From: David Gibson <david@gibson.dropbear.id.au>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>,
        lkml <linux-kernel@vger.kernel.org>, Liav Rehana <liavr@mellanox.com>,
        Chris Metcalf <cmetcalf@mellanox.com>,
        Richard Cochran <richardcochran@gmail.com>,
        Ingo Molnar <mingo@kernel.org>, Prarit Bhargava <prarit@redhat.com>,
        Laurent Vivier <lvivier@redhat.com>,
        "Christopher S . Hall" <christopher.s.hall@intel.com>,
        "4.6+" <stable@vger.kernel.org>
Subject: Re: [PATCH] timekeeping: Change type of nsec variable to unsigned in
 its calculation.
Message-ID: <20161201021233.GI19891@umbus>
References: <1479531216-25361-1-git-send-email-john.stultz@linaro.org>
 <alpine.DEB.2.20.1611291520070.4358@nanos>
 <20161129235727.GA19891@umbus>
 <alpine.DEB.2.20.1611302355070.3619@nanos>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
        protocol="application/pgp-signature"; boundary="9s922KAXlWjPfK/Q"
Content-Disposition: inline
In-Reply-To: <alpine.DEB.2.20.1611302355070.3619@nanos>
User-Agent: Mutt/1.7.1 (2016-10-04)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


--9s922KAXlWjPfK/Q
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Dec 01, 2016 at 12:21:02AM +0100, Thomas Gleixner wrote:
> On Wed, 30 Nov 2016, David Gibson wrote:
> > On Tue, Nov 29, 2016 at 03:22:17PM +0100, Thomas Gleixner wrote:
> > > If we have legitimate use cases with a negative delta, then this patch
> > > breaks them no matter what. See the basic C course section in the sec=
ond
> > > link.
> >=20
> > So, fwiw, when I first wrote a variant on this, I wasn't trying to fix
> > every case - just to make the consequences less bad if something goes
> > wrong.  An overflow here can still mess up timekeeping, it's true, but
> > time going backwards tends to cause things to go horribly, horribly
> > wrong - which was why I spotted this in the first place.
>=20
> I completely understand the intention.
>=20
> We _cannot_ make that whole thing unsigned when it is not 100% clear
> that there is no legitimate caller which hands in a negative delta and
> rightfully expects to get a negative nanoseconds value handed back.

But.. delta is a cycle_t, which is typedef'd to u64, so how could it
be negative?

This is why I believed my original version (35a4933) to be safe - it
was merely removing a signed intermediate from what was essentially an
unsigned calculation (technically the output was signed, but the right
shift means that's not relevant).

> If someone sits down and proves that this cannot happen there is no reason
> to hold that off.
>=20
> But that still does not solve the underlying root cause. Assume the
> following:
>=20
>   T1 =3D base + to_nsec(delta1)
>=20
>        where delta1 is big, but the multiplication does not overflow 64bit
>=20
>   Now wait a bit and do:
>      =20
>   T2 =3D base + to_nsec(delta2)
>=20
>        now delta2 is big enough, so the  multiplication does overflow 64b=
it
>        now delta2 is big enough to overflow 64bit with the multiplication.
>=20
>   The result is T2 < T1, i.e. time goes backwards.

Hm, I see.  Do we ever actually update time that way (at least core
system time), rather than using the last result as a base?

It does seem like the safer approach might be to clamp the result in
case of overflow, though.

> All what the unsigned conversion does is to procrastinate the problem by a
> factor of 2. So instead of failing after 10 seconds we fail after 20
> seconds. And just because you never observed the 20 seconds problem it do=
es
> not go away magically.

At least in the case I was observing I'm pretty sure we weren't
updating time that way - we always used a delta from the last value,
so to_nsec() returning always positive was enough to make time not go
backwards.

> The proper solution is to figure out WHY we are running into that situati=
on
> at all. So far all I have seen are symptom reports and fairy tales about
> ftp connections, but no real root cause analysis.

In the case I hit, it was due to running in a VM that had been stopped
for a substantial amount of time, so nothing that's actually under the
guest kernel's control.  The bug-as-reported was that if the VM was
suspended for too long it would blow up immediately upon resume.

> The only reason for this to happen is that 'base' does not get updated for
> a too long time, so the delta grows into the overflow range.
>=20
> We already have protection against idle sleeping too long for this to
> happen. If the idle protection is not working then it needs to be fixed.
>=20
> if some other situation can cause the base not to be updated for a long
> time, then this needs to be fixed.
>=20
> Curing the symptom is a guarantee that the root cause will show another
> symptom sooner than later.
>=20
> Thanks,
>=20
> 	tglx
>=20

--=20
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--9s922KAXlWjPfK/Q
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJYP4cNAAoJEGw4ysog2bOSWaAP/2N07AzidUAnrV2KoBrNvK/i
ttTh6mO4qrqC/zvQonThKfVU+AnycN44IcPoMfsnoh8Sqy++/gyS4dYr4NKdBYhJ
DNM5quwiX1GeOO1yNHzQxrqlHFrfi7oQxTwdcSlYExPwmHeVhpmDsIuGHL8pdkoX
EZ4tQwmcwqfZEMTEiEdAMOCo26MMzznUXy8bUmgR+A/LBPSas2pOB6nRr/59Bfi8
84MNn/GU91GoTWds+YpUFe2S1yEqDjqQ0d1bcQ/VqSGhbNwEiuvrfd1aqTCDiwfx
lj2dbOE1NHPvFPAyW10Kwdl1rgGT6eoZ4CynSXy7sBO5rFR4lX5+FcOmB6m/SfU8
xfkb8DlOIUvQo6VrpzoPpQ4+x8bj2IlSNDYm9g9QcKviDVDUt6H4cawyPHU3+fVi
jMqkCai9Lc6uEaZpQStOyty0XxAbsOAeKixYsXq2qlxKnWTcVTrAYUbpaE2m+ODR
4WnTTnsUkkKHf8S+ZULUVoVDzjHiSd9DaFpQ7HWtettlPlXki5KOHJLGAf6ghFKM
Qf8+oy2k89oLwHKezoGo1iH+J7tfR1RrPkDqVT+NSc7cOQzh8uUbLoUQzEdYVRnD
zt1vf4/uGg0F3zqLpwdZaqlPppRf4ocGTbkikDG95DEXmfZ+DYqetIekO85bS5Jy
m/USIiEVWJgmYDKhaGUN
=wKZH
-----END PGP SIGNATURE-----

--9s922KAXlWjPfK/Q--