* Unexpected clocksource overflow in nsec conversion
@ 2012-01-13 16:21 Will Deacon
  2012-01-13 19:49 ` John Stultz
  0 siblings, 1 reply; 3+ messages in thread
From: Will Deacon @ 2012-01-13 16:21 UTC
  To: john.stultz, tglx; +Cc: linux-kernel

Hi Thomas, John,

I'm having some problems with sched_clock where I experience unexpected
overflow due to clocksource->mult being set too high for the width of my
clocksource.

My clocksource parameters are:

	Frequency:	100MHz
	Width:		56 bits (i.e. mask of (1 << 56) - 1)

	[ following calculated by clocks_calc_mult_shift ]
	Shift:		24
	Mult:		0x0a000000 (167772160)

The reason for the huge multiplier seems to be this code in
__clocksource_updatefreq_scale:


void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
{
	u64 sec;

	/*
	 * Calc the maximum number of seconds which we can run before
	 * wrapping around. For clocksources which have a mask > 32bit
	 * we need to limit the max sleep time to have a good
	 * conversion precision. 10 minutes is still a reasonable
	 * amount. That results in a shift value of 24 for a
	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
	 * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
	 * margin as we do in clocksource_max_deferment()
	 */
	sec = (cs->mask - (cs->mask >> 3));
	do_div(sec, freq);
	do_div(sec, scale);
	if (!sec)
		sec = 1;
	else if (sec > 600 && cs->mask > UINT_MAX)
		sec = 600;


where we truncate the maximum period to 10 minutes in order to improve
the precision. Since we don't update cs->mask, doesn't this leave us in
a situation where the clocksource can overflow in the ns domain without
overflowing in terms of ticks? I can certainly trigger this situation
with my counter, which results in negative exec_runtimes for tasks, leading
to nasty scheduling issues. When the clocksource passes from 0x199997b3cd
to 0x1999a6e97c, the time (ns) unexpectedly wraps from 0xffffed0602 to
0x851ed8.
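
For illustration, here is a minimal userspace sketch (not kernel code) of
that wrap. With maxsec clamped to 600 s, clocks_calc_mult_shift() lands on
shift = 24 and mult = 10 * 2^24 = 0x0a000000, i.e. exactly 10 ns per cycle
at 100 MHz, so the 64-bit product cycles * mult exceeds 2^64 once the
counter passes ~0x1999999999:

#include <stdint.h>
#include <stdio.h>

/* Same form as the (cycles * mult) >> shift conversion: the intermediate
 * 64-bit product silently wraps. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

int main(void)
{
	uint32_t mult = 0x0a000000, shift = 24;

	/* cycles * mult still fits in 64 bits: prints 0xffffed0602 */
	printf("%#llx\n", (unsigned long long)cyc2ns(0x199997b3cdULL, mult, shift));

	/* cycles * mult now exceeds 2^64 and wraps: prints 0x851ed8 */
	printf("%#llx\n", (unsigned long long)cyc2ns(0x1999a6e97cULL, mult, shift));

	return 0;
}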

So, should we update the clocksource mask when forcing the maximum period
to 600 seconds, or am I missing something?

Cheers,

Will


* Re: Unexpected clocksource overflow in nsec conversion
  2012-01-13 16:21 Unexpected clocksource overflow in nsec conversion Will Deacon
@ 2012-01-13 19:49 ` John Stultz
  2012-01-15 15:14   ` Will Deacon
  0 siblings, 1 reply; 3+ messages in thread
From: John Stultz @ 2012-01-13 19:49 UTC
  To: Will Deacon; +Cc: tglx, linux-kernel

On Fri, 2012-01-13 at 16:21 +0000, Will Deacon wrote:
> Hi Thomas, John,
> 
> I'm having some problems with sched_clock where I experience unexpected
> overflow due to clocksource->mult being set too high for the width of my
> clocksource.
> 
> My clocksource parameters are:
> 
> 	Frequency:	100MHz
> 	Width:		56 bits (i.e. mask of (1 << 56) - 1)
> 
> 	[ following calculated by clocks_calc_mult_shift ]
> 	Shift:		24
> 	Mult:		0x0a000000 (167772160)


Sigh. Yea. In the past, occasional sched_clock rollovers weren't an
issue; the idea was that it was a potentially unreliable but really
fast clock, and any errors would be fleeting. But over time
sched_clock's requirements have grown.

I think you probably need to split the sched_clock mult/shift pair from
the clocksource, as they really have different requirements. The
clocksource needs a larger shift value so we can make fine-grained
adjustments to keep accurate time, whereas sched_clock should have a
low shift value to avoid frequent overflows.
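
As a rough sketch of what I mean (the function and variable names here are
only illustrative), the sched_clock pair could be sized for the counter's
full wrap time instead of the 600 second timekeeping clamp:

static u32 sched_mult, sched_shift;

static void __init setup_sched_clock_scale(void)
{
	u64 wrap_sec = CLOCKSOURCE_MASK(56);	/* 2^56 - 1 cycles	*/

	do_div(wrap_sec, 100000000);		/* at 100 MHz: ~22 years */

	/*
	 * Ask for a mult/shift pair that stays overflow-free over the
	 * whole wrap period, which forces a much smaller shift than
	 * the timekeeping pair gets.
	 */
	clocks_calc_mult_shift(&sched_mult, &sched_shift,
			       100000000, NSEC_PER_SEC, (u32)wrap_sec);
}

sched_clock() would then do its (cyc * sched_mult) >> sched_shift with
that pair, while timekeeping keeps the precise one.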

Even so, the sched_clock code doesn't do any sort of periodic
accumulation, so overflows, either at the counter level or at the
multiply level once the counter gets large enough (see the recent
208-day bugs on x86), will crop up eventually.

The hard part is that the locking required to do periodic accumulation
goes contrary to what sched_clock is all about.


> The reason for the huge multiplier seems to be this code in
> __clocksource_updatefreq_scale:
> 
> 
> void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
> {
> 	u64 sec;
> 
> 	/*
> 	 * Calc the maximum number of seconds which we can run before
> 	 * wrapping around. For clocksources which have a mask > 32bit
> 	 * we need to limit the max sleep time to have a good
> 	 * conversion precision. 10 minutes is still a reasonable
> 	 * amount. That results in a shift value of 24 for a
> 	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
> 	 * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
> 	 * margin as we do in clocksource_max_deferment()
> 	 */
> 	sec = (cs->mask - (cs->mask >> 3));
> 	do_div(sec, freq);
> 	do_div(sec, scale);
> 	if (!sec)
> 		sec = 1;
> 	else if (sec > 600 && cs->mask > UINT_MAX)
> 		sec = 600;
> 
> 
> where we truncate the maximum period to 10 minutes in order to improve
> the precision. Since we don't update cs->mask, doesn't this leave us in
> a situation where the clocksource can overflow in the ns domain without
> overflowing in terms of ticks? I can certainly trigger this situation
> with my counter, which results in negative exec_runtimes for tasks, leading
> to nasty scheduling issues. When the clocksource passes from 0x199997b3cd
> to 0x1999a6e97c, the time (ns) unexpectedly wraps from 0xffffed0602 to
> 0x851ed8.
> 
> So, should we update the clocksource mask when forcing the maximum period
> to 600 seconds, or am I missing something?

No. The logic above is really focused on timekeeping requirements,
not sched_clock. With timekeeping we periodically accumulate time,
creating a new cycle_last base from which we generate cycle deltas.
This keeps the cycle portion that is multiplied small.
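
Schematically (this is a simplification, not the real timekeeping code),
the accumulation looks like the below, so it is only the small delta, never
the full counter value, that goes through the multiply:

static u64 cycle_last;	/* counter value at the last accumulation	*/
static u64 ns_base;	/* nanoseconds accumulated so far		*/

/* Called periodically, long before delta * mult could reach 2^64. */
static void accumulate(u64 now, u32 mult, u32 shift)
{
	u64 delta = (now - cycle_last) & CLOCKSOURCE_MASK(56);

	ns_base += (delta * mult) >> shift;
	cycle_last = now;
}

/* Readers only ever multiply the short interval since the last
 * accumulation. */
static u64 current_ns(u64 now, u32 mult, u32 shift)
{
	u64 delta = (now - cycle_last) & CLOCKSOURCE_MASK(56);

	return ns_base + ((delta * mult) >> shift);
}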

Again, sched_clock doesn't accumulate, so when the counter gets large
enough, the multiply can overflow. On x86 we've split the multiply to
avoid this for now, but this doesn't help on other architectures where
the counter overflows.
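
The split works roughly like this (a sketch of the idea, not the exact x86
code): treat the cycle count as quotient and remainder with respect to
2^shift, so neither partial product can reach 2^64 for a counter of this
width:

/*
 * Compute (cyc * mult) >> shift without the intermediate product
 * wrapping: cyc = quot * 2^shift + rem, and the quot part shifts out
 * exactly, so all the rounding still happens in the rem term.
 */
static u64 cyc2ns_split(u64 cyc, u32 mult, u32 shift)
{
	u64 quot = cyc >> shift;
	u64 rem  = cyc & ((1ULL << shift) - 1);

	return quot * mult + ((rem * mult) >> shift);
}

With the numbers from your mail (cyc = 0x1999a6e97c, mult = 0x0a000000,
shift = 24) this returns the expected 0x10000851ed8 instead of wrapping
to 0x851ed8.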

So this area definitely needs work, and unfortunately I've not had much
time to work on it.

For a short-term solution, you can maybe look at enabling
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK and setting sched_clock_stable to 0.
This will probably be necessary on any arch using the clocksource logic for
sched_clock.

thanks
-john



* Re: Unexpected clocksource overflow in nsec conversion
  2012-01-13 19:49 ` John Stultz
@ 2012-01-15 15:14   ` Will Deacon
  0 siblings, 0 replies; 3+ messages in thread
From: Will Deacon @ 2012-01-15 15:14 UTC
  To: John Stultz; +Cc: tglx, linux-kernel

On Fri, Jan 13, 2012 at 07:49:51PM +0000, John Stultz wrote:
> On Fri, 2012-01-13 at 16:21 +0000, Will Deacon wrote:
> > I'm having some problems with sched_clock where I experience unexpected
> > overflow due to clocksource->mult being set too high for the width of my
> > clocksource.
> > 
> > My clocksource parameters are:
> > 
> > 	Frequency:	100MHz
> > 	Width:		56 bits (i.e. mask of (1 << 56) - 1)
> > 
> > 	[ following calculated by clocks_calc_mult_shift ]
> > 	Shift:		24
> > 	Mult:		0x0a000000 (167772160)
> 
> 
> Sigh. Yea. In the past, occasional sched_clock rollovers weren't an
> issue; the idea was that it was a potentially unreliable but really
> fast clock, and any errors would be fleeting. But over time
> sched_clock's requirements have grown.
> 
> I think you probably need to split the sched_clock mult/shift pair from
> the clocksource, as they really have different requirements. The
> clocksource needs a larger shift value so we can make fine-grained
> adjustments to keep accurate time, whereas sched_clock should have a
> low shift value to avoid frequent overflows.

Ah, ok then. The hardware counter I'm using is guaranteed not to roll over
within 40 years, so I'm happy to ignore hardware overflow at the moment. So
if I avoid using clocksource_cyc2ns from sched_clock then I can also avoid
the large multiplier.

> > So, should we update the clocksource mask when forcing the maximum period
> > to 600 seconds, or am I missing something?
> 
> No. The logic above is really focused on timekeeping requirements,
> not sched_clock. With timekeeping we periodically accumulate time,
> creating a new cycle_last base from which we generate cycle deltas.
> This keeps the cycle portion that is multiplied small.

Right - as long as the timekeeping stuff only uses periods of up to 10
minutes then we're safe (rather than looking at the clocksource mask,
for example).

> Again, sched_clock doesn't accumulate, so when the counter gets large
> enough, the multiply can overflow. On x86 we've split the multiply to
> avoid this for now, but this doesn't help on other architectures where
> the counter overflows.

Given that the CPU I'm working with has hardware integer division, I guess I
could just return ticks * (NSEC_PER_SEC / freq) from my sched_clock() and
forget about the scaling maths?
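
Something along these lines, perhaps (read_counter() is just a stand-in
for whatever reads the hardware counter), since NSEC_PER_SEC / 100000000
is exactly 10:

extern u64 read_counter(void);	/* hypothetical platform counter read */

unsigned long long notrace sched_clock(void)
{
	/* 10 ns per tick at 100 MHz, so no mult/shift pair at all */
	return read_counter() * (NSEC_PER_SEC / 100000000);
}

With a 56-bit counter that product stays below 2^60, so nothing can wrap
before the counter itself does.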

Cheers,

Will

