Re: [PATCH] timekeeping: Change type of nsec variable to unsigned in its calculation.

From: Thomas Gleixner <tglx@linutronix.de>
To: John Stultz <john.stultz@linaro.org>
Cc: David Gibson <david@gibson.dropbear.id.au>,
	lkml <linux-kernel@vger.kernel.org>,
	Liav Rehana <liavr@mellanox.com>,
	Chris Metcalf <cmetcalf@mellanox.com>,
	Richard Cochran <richardcochran@gmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	Prarit Bhargava <prarit@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	"Christopher S . Hall" <christopher.s.hall@intel.com>,
	"4.6+" <stable@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH] timekeeping: Change type of nsec variable to unsigned in its calculation.
Date: Thu, 1 Dec 2016 21:46:30 +0100 (CET)	[thread overview]
Message-ID: <alpine.DEB.2.20.1612012127260.3666@nanos> (raw)
In-Reply-To: <CALAqxLXSEYMrzZOPhg1ezzatqgRp9w_iyg=a_yZZXmK2rGDxOg@mail.gmail.com>

On Thu, 1 Dec 2016, John Stultz wrote:
> On Thu, Dec 1, 2016 at 3:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > So yes, we can make all this unsigned and all worries about negative deltas
> > are moot.
> 
> Sorry for the slow response, and thanks David, for stepping in here.
> 
> So apologies for not rewriting the commit message, but this is the
> reason I came around on this patch. I didn't see negative deltas as
> valid and so moving to u64 seemed proper.

I understand that, but the changelog is just a fairy tale and not really
helpful in explaining WHY this is the right thing to do.

Aside of that just making that single point u64 is just a sloppy
hack. Either we clean up the whole chain or leave it as is.

And that's something I was asking for from the very beginning, but all I
got so far was handwaving and changelogs which are worse than no changelog
at all.

> > But we really should get rid of that cycle_t typedef and simply use u64 and
> > be done with it. All this typedeffery for no value is just causing
> > confusion. I'm very well able to confuse myself, so I don't need extra
> > stimulus.
> 
> Yea, it can be obscuring. However I worry if we just have a bunch of
> u64s around folks (or maybe just me :) will mix up which are cycles
> and which are nanoseconds more easily.

Well, I really prefer non obscure data types and having:

      u64 nsec, cycles;

makes it pretty clear.

> > We have two options to deal with the issue for wide enough clocksources:
> >
> > 1) Checking that delta is less than clocksource->max_cycles.
> >
> >    I really hate this because we invested a lot of thoughts to squeeze
> >    everything we need for the time getters hotpath into a single cache
> >    line. Accessing clocksource->max_cycles forces us to touch another
> >    one. Bah!
> >
> >    Aside of that what guarantees that we never run into a situation where
> >    something doing timekeeping updates (NTP, PTP, PPS ...) uses such a
> >    clamped value and comes to completely bogus conclusions? Are we going to
> >    analyze and fixup all of that in order to prevent such wreckage?
> >
> >    I seriously doubt that given the fact, that nobody sat down and analyzed
> >    the signed/unsigned issue proper, which is way less complex.
> >
> > 2) Use mul_u64_u32_shr()
> >
> >    That works without an extra cache line, but it's more expensive in terms
> >    of text size especially on architectures which do not support the mul64
> >    expansion to 128bit natively.
> >
> >    But that seems like the most robust solution. We can be clever and make
> >    this conditional on both a configuration switch and a static key which
> >    can be turned on by guests. I'll send out a RFC series later today.
> >
> > Yet another proof that virtualization is creating more problems than it
> > solves.
> 
> I would also suggest:
> 3) If the systems are halted for longer then the timekeeping core
> expects, the system will "miss" or "lose" some portion of that halted
> time, but otherwise the system will function properly.  Which is the
> result with this patch.

Wrong. This is not the result with this patch.

If the time advances enough to overflow the unsigned mult, which is
entirely possible as it takes just twice the time of the negative overflow,
then time will go backwards again and that's not 'miss' or 'lose', that's
just broken.

If we want to prevent that, then we either have to clamp the delta value,
which is the worst choice or use 128bit math to avoid the overflow.

> I'm not sure if its really worth trying to recover that time or be
> perfect in those situations. Especially since on narrow clocksources
> you'll have the same result.

We can deal with the 64bit overflow at least for wide clocksources which
all virtualizaton infected architectures provide in a sane way.

For bare metal systems with narrow clocksources the whole issue is non
existant we can make the 128bit math depend on both a config switch and a
static key, so bare metal will not have to take the burden.

Thanks,

	tglx