linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* One of these things (CONFIG_HZ) is not like the others..
@ 2013-01-21 20:01 Matt Sealey
  2013-01-21 20:41 ` Arnd Bergmann
  0 siblings, 1 reply; 48+ messages in thread
From: Matt Sealey @ 2013-01-21 20:01 UTC (permalink / raw)
  To: Linux ARM Kernel ML
  Cc: Arnd Bergmann, LKML, Peter Zijlstra, Ingo Molnar,
	Russell King - ARM Linux

Hello all,

Understanding that this is a bit of a digression, I have a related
nitpick to discussion of the patch "arm: kconfig: don't select TWD
with local timer for Armada 370/XP" which is allowing me to explain
myself a little better given Arnd's recommendation for it, since I was
looking for a really good way to describe it without seeming too
focused on a particular configuration item..

So, to recap, there is a discussion going on about where HAVE_ lives
and what ARCH_MULTIPLATFORM breakes when using HAVE_. I think this is
related, at least, to configuration reworks to make ARCH_MULTIPLATFORM
a truly "inclusive" place..

ARM seems to be the only "major" platform not using the
kernel/Kconfig.hz definitions, instead rolling it's own and setting
what could be described as both reasonable and unreasonable defaults
for platforms. If we're going wholesale for multiplatform on ARM then
having CONFIG_HZ be selected dependent on platform options seems
rather curious since building a kernel for Exynos, OMAP or so will
force the default to a value which is not truly desired by the
maintainers.

config HZ
        int
        default 200 if ARCH_EBSA110 || ARCH_S3C24XX || ARCH_S5P64X0 || \
                ARCH_S5PV210 || ARCH_EXYNOS4
        default OMAP_32K_TIMER_HZ if ARCH_OMAP && OMAP_32K_TIMER
        default AT91_TIMER_HZ if ARCH_AT91
        default SHMOBILE_TIMER_HZ if ARCH_SHMOBILE
        default 100

There is a patch floating around ("ARM: OMAP2+: timer: remove
CONFIG_OMAP_32K_TIMER")
which modifies the OMAP line, so I'll ignore that for my below
example, and I saw a patch for adding Exynos5 processors to the top
default somewhere around here.

So, based on those getting in, in my case here, I can see a situation where;

* I build multiplatform for i.MX6 and Exynos4/5 ARCH_MULTIPLATFORM, I
will get CONFIG_HZ=200.

* If I built for just i.MX6, I will get CONFIG_HZ=100.

Either way, if I boot a kernel on i.MX6, CONFIG_HZ depends on the
other ARM platforms I also want to boot on it.. this is not exactly
multiplatform compliant, right?

In fact, if I want any other value without meeting any of the other
defaults I am *forced* to have a CONFIG_HZ value of 100 (running
oldconfig will set any value back to this), because none of the
standard (100/300/1000 as I see on x86 and PPC) selection entries or
the override control are present or sourced in the main
arch/arm/Kconfig.

This seems infuriatingly inconsistent - and I am absolutely sure that
the default for Samsung platforms is basically totally unreasonable
(and definitely not multiplatform-aware) behavior in forcing some
default setting.

For AT91 and SHMOBILE, I am not sure at all.. given the need for the
OMAP platform to know what it's timer frequency is, maybe they can be
worked around the same way as the OMAP patch so the dependencies get
removed, but I also don't understand why the actual value CONFIG_HZ
would really matter in these cases (except that it would stop the
kernel trying to check or queue timer events more often than the timer
is capable of running.. surely this is a runtime issue and proper use
of the sched_clock implementation handles this?)

This could in theory be resolved by having the arch-specific Kconfigs
add for example CONFIG_HZ_MY_ARCH (similar to kernel/Kconfig.hz's
CONFIG_HZ_1000 which selects 1000 as the "default") and selecting it
if !ARCH_MULTIPLATFORM, which keeps these special little "my arch is
different to your arch" quirks out of a core configuration file. That
way Exynos-only kernels keep their 200, and AT91 keeps it's.. whatever
that config item resolves to (128 I think), and they would pop up in
the list with 100/300/1000. Also, on ARCH_MULTIPLATFORM kernels, the
default-setting behavior is turned off, so all you'd see is
100/300/1000 and an opportunity to set your own value.

This is, I think, what should be the case - that rather than
"magically" selecting CONFIG_HZ's value, it should be up to the
configurator (individual, maintainer shipping a defconfig,
distribution) of the kernel. And, why not document that "foo" arch
runs better with "CONFIG_HZ_MY_ARCH" and instruct configurators of the
kernel to do the right thing, or pick the average value, or specific
lowest-common-denominator value, instead of forcing the value to the
default for the highest/lowest/random arch that met the dependency of
the "default" directive? The Kconfig system isn't smart enough to
handle this automatically for multiplatform.

Additionally, using kernel/Kconfig.hz is a predicate for enabling
(forced enabling, even) CONFIG_SCHED_HRTICK which is defined nowhere
else. I don't know how many ARM systems here benefit from this, if
there is a benefit, or what this really means.. if you really have a
high resolution timer (and hrtimers enabled) that would assist the
scheduler this way, is it supposed to make a big difference to the way
the scheduler works for the better or worse? Is this actually
overridden by ARM sched_clock handling or so? Shouldn't there be a
help entry or some documentation for what this option does? I have
CC'd the scheduler maintainers because I'd really like to know what I
am doing here before I venture into putting patches out which could
potentially rip open spacetime and have us all sucked in..

And I guess I have one more question before I do attempt to open that
tear, what really is the effect of CONFIG_HZ vs. CONFIG_NO_HZ vs. ARM
sched_clock, and usage of the new hooks to register a real timer as
ARM delay_timer? I have patches I can modify for upstream that add
both device tree implementation and probing of i.MX highres
clocksources (GPT and EPIT) and registration of sched_clock and delay
timer implementations based on these clocks, but while the code
compiles and seems to work, the ACTUAL effect of these (and the
fundamental requirements for the clocks being used) seems to be
information only in the minds of the people who wrote the code. It's
not that obvious to me what the true effect of using a non-architected
ARM core timer for at least the delay_timer is, and I have some really
odd lpj values and very strange re-calibrations popping out (with
constant rate for the timer, lpj goes down.. when using the
delay_timer implementation, shouldn't lpj be still relative to the
timer rate and NOT cpu frequency?) when using cpufreq on i.MX5 when I
do it, and whether CONFIG_SCHED_HRTICK is a good or bad idea..

Apologies for the insane number of questions here, but fully
appreciative of any answers,

--
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 20:01 One of these things (CONFIG_HZ) is not like the others Matt Sealey
@ 2013-01-21 20:41 ` Arnd Bergmann
  2013-01-21 21:00   ` John Stultz
                     ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Arnd Bergmann @ 2013-01-21 20:41 UTC (permalink / raw)
  To: Matt Sealey
  Cc: Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar,
	Russell King - ARM Linux, John Stultz, Ben Dooks

On Monday 21 January 2013, Matt Sealey wrote:
> 
> ARM seems to be the only "major" platform not using the
> kernel/Kconfig.hz definitions, instead rolling it's own and setting
> what could be described as both reasonable and unreasonable defaults
> for platforms. If we're going wholesale for multiplatform on ARM then
> having CONFIG_HZ be selected dependent on platform options seems
> rather curious since building a kernel for Exynos, OMAP or so will
> force the default to a value which is not truly desired by the
> maintainers.

Agreed 100%.

(adding John Stultz to Cc, he's the local time expert)

> config HZ
>         int
>         default 200 if ARCH_EBSA110 || ARCH_S3C24XX || ARCH_S5P64X0 || \
>                 ARCH_S5PV210 || ARCH_EXYNOS4
>         default OMAP_32K_TIMER_HZ if ARCH_OMAP && OMAP_32K_TIMER
>         default AT91_TIMER_HZ if ARCH_AT91
>         default SHMOBILE_TIMER_HZ if ARCH_SHMOBILE
>         default 100
> 
> There is a patch floating around ("ARM: OMAP2+: timer: remove
> CONFIG_OMAP_32K_TIMER")
> which modifies the OMAP line, so I'll ignore that for my below
> example, and I saw a patch for adding Exynos5 processors to the top
> default somewhere around here.
> 
> So, based on those getting in, in my case here, I can see a situation where;
> 
> * I build multiplatform for i.MX6 and Exynos4/5 ARCH_MULTIPLATFORM, I
> will get CONFIG_HZ=200.
> 
> * If I built for just i.MX6, I will get CONFIG_HZ=100.
> 
> Either way, if I boot a kernel on i.MX6, CONFIG_HZ depends on the
> other ARM platforms I also want to boot on it.. this is not exactly
> multiplatform compliant, right?

Right. It's pretty clear that the above logic does not work
with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
select NO_HZ to make the question much less interesting.

Regarding the defaults, I would suggest putting them into all the
defaults into the defconfig files and removing the other hardcoding
otherwise. Ben Dooks and Russell are probably the best to know 
what triggered the 200 HZ for s3c24xx and for ebsa110. My guess
is that the other samsung ones are the result of cargo cult
programming.

at91 and omap set the HZ value to something that is derived
from their hardware timer, but we have also forever had logic
to calculate the exact time when that does not match. This code
has very recently been moved into the new register_refined_jiffies()
function. John can probably tell is if this solves all the problems
for these platforms.

> Additionally, using kernel/Kconfig.hz is a predicate for enabling
> (forced enabling, even) CONFIG_SCHED_HRTICK which is defined nowhere
> else. I don't know how many ARM systems here benefit from this, if
> there is a benefit, or what this really means.. if you really have a
> high resolution timer (and hrtimers enabled) that would assist the
> scheduler this way, is it supposed to make a big difference to the way
> the scheduler works for the better or worse? Is this actually
> overridden by ARM sched_clock handling or so? Shouldn't there be a
> help entry or some documentation for what this option does? I have
> CC'd the scheduler maintainers because I'd really like to know what I
> am doing here before I venture into putting patches out which could
> potentially rip open spacetime and have us all sucked in..

Yes, that sounds like yet another bug.

	Arnd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 20:41 ` Arnd Bergmann
@ 2013-01-21 21:00   ` John Stultz
  2013-01-21 21:12     ` Russell King - ARM Linux
  2013-01-21 21:14     ` Matt Sealey
  2013-01-21 21:02   ` Matt Sealey
  2013-01-21 21:03   ` Russell King - ARM Linux
  2 siblings, 2 replies; 48+ messages in thread
From: John Stultz @ 2013-01-21 21:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Matt Sealey, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, Russell King - ARM Linux, Ben Dooks

On 01/21/2013 12:41 PM, Arnd Bergmann wrote:
> On Monday 21 January 2013, Matt Sealey wrote:
>> config HZ
>>          int
>>          default 200 if ARCH_EBSA110 || ARCH_S3C24XX || ARCH_S5P64X0 || \
>>                  ARCH_S5PV210 || ARCH_EXYNOS4
>>          default OMAP_32K_TIMER_HZ if ARCH_OMAP && OMAP_32K_TIMER
>>          default AT91_TIMER_HZ if ARCH_AT91
>>          default SHMOBILE_TIMER_HZ if ARCH_SHMOBILE
>>          default 100
>>
>> There is a patch floating around ("ARM: OMAP2+: timer: remove
>> CONFIG_OMAP_32K_TIMER")
>> which modifies the OMAP line, so I'll ignore that for my below
>> example, and I saw a patch for adding Exynos5 processors to the top
>> default somewhere around here.
>>
>> So, based on those getting in, in my case here, I can see a situation where;
>>
>> * I build multiplatform for i.MX6 and Exynos4/5 ARCH_MULTIPLATFORM, I
>> will get CONFIG_HZ=200.
>>
>> * If I built for just i.MX6, I will get CONFIG_HZ=100.
>>
>> Either way, if I boot a kernel on i.MX6, CONFIG_HZ depends on the
>> other ARM platforms I also want to boot on it.. this is not exactly
>> multiplatform compliant, right?
> Right. It's pretty clear that the above logic does not work
> with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
> select NO_HZ to make the question much less interesting.

Although, even with NO_HZ, we still have some sense of HZ.

> Regarding the defaults, I would suggest putting them into all the
> defaults into the defconfig files and removing the other hardcoding
> otherwise. Ben Dooks and Russell are probably the best to know
> what triggered the 200 HZ for s3c24xx and for ebsa110. My guess
> is that the other samsung ones are the result of cargo cult
> programming.
>
> at91 and omap set the HZ value to something that is derived
> from their hardware timer, but we have also forever had logic
> to calculate the exact time when that does not match. This code
> has very recently been moved into the new register_refined_jiffies()
> function. John can probably tell is if this solves all the problems
> for these platforms.

Yea, as far as timekeeping is concerned, we shouldn't be HZ dependent 
(and the register_refined_jiffies is really only necessary if you're not 
expecting a proper clocksource to eventually be registered), assuming 
the hardware can do something close to the HZ value requested.

So I'd probably want to hear about what history caused the specific 200 
HZ selections, as I suspect there's actual hardware limitations there. 
So if you can not get actual timer ticks any faster then 200 HZ on that 
hardware, setting HZ higher could cause some jiffies related timer 
trouble (ie: if the kernel thinks HZ is 1000 but the hardware can only 
do 200, that's a different problem then if the hardware actually can 
only do 999.8 HZ). So things like timer-wheel timeouts may not happen 
when they should.

I suspect the best approach for multi-arch in those cases may be to 
select HZ=100 and use HRT to allow more modern systems to have 
finer-grained timers.

thanks
-john




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 20:41 ` Arnd Bergmann
  2013-01-21 21:00   ` John Stultz
@ 2013-01-21 21:02   ` Matt Sealey
  2013-01-21 22:30     ` Arnd Bergmann
  2013-01-21 21:03   ` Russell King - ARM Linux
  2 siblings, 1 reply; 48+ messages in thread
From: Matt Sealey @ 2013-01-21 21:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar,
	Russell King - ARM Linux, John Stultz, Ben Dooks

On Mon, Jan 21, 2013 at 2:41 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Monday 21 January 2013, Matt Sealey wrote:
>>
>> ARM seems to be the only "major" platform not using the
>> kernel/Kconfig.hz definitions, instead rolling it's own and setting
>> what could be described as both reasonable and unreasonable defaults
>> for platforms. If we're going wholesale for multiplatform on ARM then
>> having CONFIG_HZ be selected dependent on platform options seems
>> rather curious since building a kernel for Exynos, OMAP or so will
>> force the default to a value which is not truly desired by the
>> maintainers.
>
> Agreed 100%.
>
> (adding John Stultz to Cc, he's the local time expert)

Hi, John! Welcome to the fray :)

>> config HZ
>>         int
>>         default 200 if ARCH_EBSA110 || ARCH_S3C24XX || ARCH_S5P64X0 || \
>>                 ARCH_S5PV210 || ARCH_EXYNOS4
>>         default OMAP_32K_TIMER_HZ if ARCH_OMAP && OMAP_32K_TIMER
>>         default AT91_TIMER_HZ if ARCH_AT91
>>         default SHMOBILE_TIMER_HZ if ARCH_SHMOBILE
>>         default 100
>>

[snip]

>> Either way, if I boot a kernel on i.MX6, CONFIG_HZ depends on the
>> other ARM platforms I also want to boot on it.. this is not exactly
>> multiplatform compliant, right?
>
> Right. It's pretty clear that the above logic does not work
> with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
> select NO_HZ to make the question much less interesting.
>
> Regarding the defaults, I would suggest putting them into all the
> defaults into the defconfig files and removing the other hardcoding
> otherwise. Ben Dooks and Russell are probably the best to know
> what triggered the 200 HZ for s3c24xx and for ebsa110. My guess
> is that the other samsung ones are the result of cargo cult
> programming.
>
> at91 and omap set the HZ value to something that is derived
> from their hardware timer, but we have also forever had logic
> to calculate the exact time when that does not match. This code
> has very recently been moved into the new register_refined_jiffies()
> function. John can probably tell is if this solves all the problems
> for these platforms.

I would be very interested. My plan would be then (providing John
responds in the affirmative) to basically submit a patch to remove the
8 lines pasted above and source kernel/Kconfig.hz instead. I'm doing
this now on a local kernel tree and I can't see any real problem with
it.

It would then be up to the above-mentioned maintainers to decide if
they are part of the cargo cult and don't need it or refine their
board files to match the New World Order of using Kconfig.hz. The
unconfigured kernel default is 100 anyway which is lower than all the
above default setting, so I would technically be causing a regression
on those platforms... do I want to be responsible for that? Probably
not, but as I said, it's not affecting (in fact, it may be
*improving*) the platforms I care about.

>> Additionally, using kernel/Kconfig.hz is a predicate for enabling
>> (forced enabling, even) CONFIG_SCHED_HRTICK which is defined nowhere
>> else. I don't know how many ARM systems here benefit from this, if
>> there is a benefit, or what this really means.. if you really have a
>> high resolution timer (and hrtimers enabled) that would assist the
>> scheduler this way, is it supposed to make a big difference to the way
>> the scheduler works for the better or worse? Is this actually
>> overridden by ARM sched_clock handling or so? Shouldn't there be a
>> help entry or some documentation for what this option does? I have
>> CC'd the scheduler maintainers because I'd really like to know what I
>> am doing here before I venture into putting patches out which could
>> potentially rip open spacetime and have us all sucked in..
>
> Yes, that sounds like yet another bug.

So is that a bug in that it is not available to ARM right now, a bug
in that it would be impossible for anyone on ARM to have ever tested
this code, or a bug in that it should NEVER be enabled for ARM for
some reason? John? Ingo? :)

-- 
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 20:41 ` Arnd Bergmann
  2013-01-21 21:00   ` John Stultz
  2013-01-21 21:02   ` Matt Sealey
@ 2013-01-21 21:03   ` Russell King - ARM Linux
  2013-01-21 23:23     ` Tony Lindgren
  2 siblings, 1 reply; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-21 21:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Matt Sealey, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, John Stultz, Ben Dooks

On Mon, Jan 21, 2013 at 08:41:17PM +0000, Arnd Bergmann wrote:
> On Monday 21 January 2013, Matt Sealey wrote:
> > 
> > ARM seems to be the only "major" platform not using the
> > kernel/Kconfig.hz definitions, instead rolling it's own and setting
> > what could be described as both reasonable and unreasonable defaults
> > for platforms.

No, you've got this totally wrong.

They're not defaults.  And I object to your use of "unreasonable" too.
I've no idea where you get that from.

There's a reason why we have different HZ rates - some platforms just
can't do the standard 100Hz tick rate.  No way - their timers can't
divide down to that interrupt rate.  Sorry to spoil your ivory tower
with a few facts, but your statement is just rediculous.

The reason we don't use kernel/Kconfig.hz is precisely because of that;
we _HAVE_ to have different HZ definitions on different platforms, and
you'll notice that kernel/Kconfig.hz makes _no_ prevision for this.

Now, while things have moved forwards and we have clocksource/clockevent
support, not every platform can support this timekeeping structure;
ebsa110 certainly can't.  There's one timer and one timer only which
is usable, which even needs to be manually reloaded by the CPU.  No
other independent counter to act as a clock source.

As for Samsung and the rest I can't comment.  The original reason OMAP
used this though was because the 32768Hz counter can't produce 100Hz
without a .1% error - too much error under pre-clocksource
implementations for timekeeping.  Whether that's changed with the
clocksource/clockevent support needs to be checked.

It's entirely possible with the modern clocksource/clockevent support
that many of these platforms can have their alternative HZ tick rates
removed - but there will continue to be a subset which can't, and all
the time that we have such a subset, kernel/Kconfig.hz can't be used
without modification.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 21:00   ` John Stultz
@ 2013-01-21 21:12     ` Russell King - ARM Linux
  2013-01-21 22:18       ` John Stultz
  2013-01-21 22:20       ` Matt Sealey
  2013-01-21 21:14     ` Matt Sealey
  1 sibling, 2 replies; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-21 21:12 UTC (permalink / raw)
  To: John Stultz
  Cc: Arnd Bergmann, Matt Sealey, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar, Ben Dooks

On Mon, Jan 21, 2013 at 01:00:15PM -0800, John Stultz wrote:
> So if you can not get actual timer ticks any faster then 200 HZ on that  
> hardware, setting HZ higher could cause some jiffies related timer  
> trouble

Err, no John.  It's the other way around - especially on some platforms
which are incapable of being converted to the clock source support.

EBSA110 has _one_ counter.  It counts down at a certain rate, and when
it rolls over from 0 to FFFF, it produces an interrupt and continues
counting down from FFFF.

To produce anything close to a reasonable regular tick rate from that,
the only way to do it is - with interrupts disabled - read the current
value to find out how far the timer has rolled over, and set it so that
the next event will expire as close as possible to the desired HZ rate.

So, none of the clcokevent stuff can be used; and we rely _purely_ on
counting interrupts in jiffy based increments to provide any reference
of time.

Moreover, because the counter is only 16-bit, and it's clocked from
something around 7MHz, well, maths will tell you why 200Hz had to be
chosen rather than 100Hz.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 21:00   ` John Stultz
  2013-01-21 21:12     ` Russell King - ARM Linux
@ 2013-01-21 21:14     ` Matt Sealey
  2013-01-21 22:36       ` John Stultz
  1 sibling, 1 reply; 48+ messages in thread
From: Matt Sealey @ 2013-01-21 21:14 UTC (permalink / raw)
  To: John Stultz
  Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, Russell King - ARM Linux

On Mon, Jan 21, 2013 at 3:00 PM, John Stultz <john.stultz@linaro.org> wrote:
> On 01/21/2013 12:41 PM, Arnd Bergmann wrote:
>>
>> Right. It's pretty clear that the above logic does not work
>> with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
>> select NO_HZ to make the question much less interesting.
>
> Although, even with NO_HZ, we still have some sense of HZ.

I wonder if you can confirm my understanding of this by the way? The
way I think this works is;

CONFIG_HZ on it's own defines the rate at which the kernel wakes up
from sleeping on the job, and checks for current or expired timer
events such that it can do things like schedule_work (as in
workqueues) or perform scheduler (as in processes/tasks) operations.

CONFIG_NO_HZ turns on logic which effectively only wakes up at a
*maximum* of CONFIG_HZ times per second, but otherwise will go to
sleep and stay that way if no events actually happened (so, we rely on
a timer interrupt popping up).

In this case, no matter whether CONFIG_HZ=1000 or CONFIG_HZ=250 (for
example) combined with CONFIG_NO_HZ and less than e.g. 250 things
happening per second will wake up "exactly" the same number of times?

CONFIG_HZ=1000 with CONFIG_NO_HZ would be an effective, all-round
solution here, then, and CONFIG_HZ=100 should be a reasonable default
(as it is anyway with an otherwise-unconfigured kernel on any other
platform) for !CONFIG_NO_HZ.

I have to admit, the only reason I noticed the above is because I was
reading one of CK's BFS logs and reading it makes it seem like the
above is the case, but I have no idea if he thinks BFS makes that the
case or if the current CFQ scheduler makes that the case, or if this
is simply.. the case.. (can you see this is kind of confusing to me as
this is basically not written anywhere except maybe an LWN article
from 2008 I read up on? :)

>> Regarding the defaults, I would suggest putting them into all the
>> defaults into the defconfig files and removing the other hardcoding
>> otherwise. Ben Dooks and Russell are probably the best to know
>> what triggered the 200 HZ for s3c24xx and for ebsa110. My guess
>> is that the other samsung ones are the result of cargo cult
>> programming.
>>
>> at91 and omap set the HZ value to something that is derived
>> from their hardware timer, but we have also forever had logic
>> to calculate the exact time when that does not match. This code
>> has very recently been moved into the new register_refined_jiffies()
>> function. John can probably tell is if this solves all the problems
>> for these platforms.
>
>
> Yea, as far as timekeeping is concerned, we shouldn't be HZ dependent (and
> the register_refined_jiffies is really only necessary if you're not
> expecting a proper clocksource to eventually be registered), assuming the
> hardware can do something close to the HZ value requested.
>
> So I'd probably want to hear about what history caused the specific 200 HZ
> selections, as I suspect there's actual hardware limitations there. So if
> you can not get actual timer ticks any faster then 200 HZ on that hardware,
> setting HZ higher could cause some jiffies related timer trouble (ie: if the
> kernel thinks HZ is 1000 but the hardware can only do 200, that's a
> different problem then if the hardware actually can only do 999.8 HZ). So
> things like timer-wheel timeouts may not happen when they should.
>
> I suspect the best approach for multi-arch in those cases may be to select
> HZ=100

As above, or "not select anything at all" since HZ=100 if you don't
touch anything, right?

If someone picks HZ=1000 and their platform can't support it, then
that's their own damn problem (don't touch things you don't
understand, right? ;)

> and use HRT to allow more modern systems to have finer-grained
> timers.

My question really has to be is CONFIG_SCHED_HRTICK useful, what
exactly is it going to do on ARM here since nobody can ever have
enabled it? Is it going to keel over and explode if nobody registers a
non-jiffies sched_clock (since the jiffies clock is technically
reporting itself as a ridiculously high resolution clocksource..)?

Or is this one of those things that if your platform doesn't have a
real high resolution timer, you shouldn't enable HRTIMERS and
therefore not enable SCHED_HRTICK as a result? That affects
ARCH_MULTIPLATFORM here. Is the solution as simple as
ARCH_MULTIPLATFORM compliant platforms kind of have to have a high
resolution timer? Documentation to that effect?

-- 
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 21:12     ` Russell King - ARM Linux
@ 2013-01-21 22:18       ` John Stultz
  2013-01-21 22:44         ` Russell King - ARM Linux
  2013-01-21 22:20       ` Matt Sealey
  1 sibling, 1 reply; 48+ messages in thread
From: John Stultz @ 2013-01-21 22:18 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Matt Sealey, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar, Ben Dooks

On 01/21/2013 01:12 PM, Russell King - ARM Linux wrote:
> On Mon, Jan 21, 2013 at 01:00:15PM -0800, John Stultz wrote:
>> So if you can not get actual timer ticks any faster then 200 HZ on that
>> hardware, setting HZ higher could cause some jiffies related timer
>> trouble
> Err, no John.  It's the other way around - especially on some platforms
> which are incapable of being converted to the clock source support.
>
> EBSA110 has _one_ counter.  It counts down at a certain rate, and when
> it rolls over from 0 to FFFF, it produces an interrupt and continues
> counting down from FFFF.
>
> To produce anything close to a reasonable regular tick rate from that,
> the only way to do it is - with interrupts disabled - read the current
> value to find out how far the timer has rolled over, and set it so that
> the next event will expire as close as possible to the desired HZ rate.
>
> So, none of the clcokevent stuff can be used; and we rely _purely_ on
> counting interrupts in jiffy based increments to provide any reference
> of time.
> Moreover, because the counter is only 16-bit, and it's clocked from
> something around 7MHz, well, maths will tell you why 200Hz had to be
> chosen rather than 100Hz.

Ah, so the counter can't do anything *lower* then ~107HZ, right? (7MHZ/2^16)

So we used to have the ACTHZ code to handle error from the HZ rate 
requested and the HZ rate possible given the underlying hardware. That's 
been moved to the register_refined_jiffies(), but do you have a sense if 
there a reason it couldn't be used? I don't quite recall the bounds at 
this second, so ~7% error might very well be too large.

So yes, I suspect these sorts of platforms, where there are no modern 
clocksource/clockevent driver, as well as further constraints (like 
specific HZ) are likely not good candidates for a multi-arch build.

thanks
-john

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 21:12     ` Russell King - ARM Linux
  2013-01-21 22:18       ` John Stultz
@ 2013-01-21 22:20       ` Matt Sealey
  2013-01-21 22:42         ` Russell King - ARM Linux
  1 sibling, 1 reply; 48+ messages in thread
From: Matt Sealey @ 2013-01-21 22:20 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar, Ben Dooks

On Mon, Jan 21, 2013 at 3:12 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 21, 2013 at 01:00:15PM -0800, John Stultz wrote:
>> So if you can not get actual timer ticks any faster then 200 HZ on that
>> hardware, setting HZ higher could cause some jiffies related timer
>> trouble
>
> Err, no John.  It's the other way around - especially on some platforms
> which are incapable of being converted to the clock source support.
>
> EBSA110 has _one_ counter.  It counts down at a certain rate, and when
> it rolls over from 0 to FFFF, it produces an interrupt and continues
> counting down from FFFF.
>
> To produce anything close to a reasonable regular tick rate from that,
> the only way to do it is - with interrupts disabled - read the current
> value to find out how far the timer has rolled over, and set it so that
> the next event will expire as close as possible to the desired HZ rate.
>
> So, none of the clcokevent stuff can be used; and we rely _purely_ on
> counting interrupts in jiffy based increments to provide any reference
> of time.
>
> Moreover, because the counter is only 16-bit, and it's clocked from
> something around 7MHz, well, maths will tell you why 200Hz had to be
> chosen rather than 100Hz.

I am sorry it sounded if I was being high and mighty about not being
able to select my own HZ (or being forced by Exynos to be 200 or by
not being able to test an Exynos board, forced to default to 100). My
real "grievance" here is we got a configuration item for the scheduler
which is being left out of ARM configurations which *can* use high
resolution timers, but I don't know if this is a real problem or not,
hence asking about it, and that HZ=100 is the ARM default whether we
might be able to select that or not.. which seems low.

HZ=250 is the "current" kernel default if you don't touch anything, it
seems, apologies for thinking it was HZ=100. And that is too high for
EBSA110 and a couple of other boards, especially where HZ must equal
some exact divisor being pumped right into some timer unit.
Understood. Surely the correct divisor should be *derived* from HZ and
not just dumped into the timer though, so HZ being set to an exact
divisor (but a round-down-to-acceptable-value) is kind of a hacky
concept..?

For the global kernel guys, I'd ask what is the reasoning for using
HZ=250 by default, I wonder? It seems like this number is from the
dark ages (pre-git, pre-bitkeeper, maybe pre-recorded history ;) and
the reason is lost. Why not HZ=100 or HZ=300 (if the help text is to
be believed, and it is probably older than God, HZ=300 is great for
playing back NTSC-format video.. :)? I can side with you on the
premise that in actual fact, defining a default HZ value in the
non-arch-specific kernel proper is a little quirky and it should be
something the arches do themselves (i.e. move the default-setting
stuff at the end into the arch/*/Kconfig - I would expect that now
i386 CPU support is gone from arch/x86, there's potentially a better
value than HZ=250 for the default?).

Anyway, a patch for ARM could perhaps end up like this:

~~
if ARCH_MULTIPLATFORM
source kernel/Kconfig.hz
else
HZ
    default 100
endif

HZ
    default 200 if ARCH_EBSA110 || ARCH_ETC_ETC || ARCH_UND_SO_WEITER
    # any previous platform definitions where *really* required here.
    # but not default 100 since it would override kernel/Kconfig.hz every time
~~

Which preserves all previous behaviors on all possible ARM arch
combinations, but where no reasonable override is set.. Kconfig.hz is
king. I cannot imagine any situation except for AT91 or OMAP could not
do this in their own {mach,plat}-*/Kconfigs and not in the core
config, which cleans up the extra HZ block.

We can agree that the "default 200 if.." list is unwieldy and Arnd is
right in that there is some cargo-cult programming going on here,
right?

Even if we assume EBSA110 and a couple others are really affected by
having such timer setups, therefore "reasonable", I'd challenge anyone
to tell me Exynos4 or the S5P platforms do not have high resolution
timers capable of handling more than HZ=200 (or the default HZ=250)
which I would class as "unreasonable".. this is why I said it was
possibly both. I am not one to judge some of these platforms I've
never even heard of, that is why I am *asking* about it before I even
think of doing anything about it.

I tested this a few weeks ago with a *few* defconfigs (by sourcing
Kconfig.hz above the existing HZ definitions) and it does effectively
override the value I went in and stabbed into menuconfig, in the
resultant generated local .config file - if they themselves are
sourced AFTER the source kernel/Kconfig.hz (which they pretty much
are) in arch/arm/Kconfig.

Could we also at least agree that if EBSA110 can handle HZ=200 with a
16-bit timer, or HZ=128 for OMAP and that AT91 will override it to 100
on it's own, then that "default 100" is overly restrictive and we
could remove it, allowing each {mach,plat}-*/Kconfig owner to
investigate and find the correct HZ value and implement an override or
selection, or just allow free configuration?

As far as I can tell AT91 and SHMOBILE only supply defaults because HZ
*must* meet some exact timer divisor (OMAP says "Kernel internal timer
frequency should be a divisor of 32768") in which case their timer
drivers should not be so stupid and instead round down to the nearest
acceptable timer divisor or WARN_ON if the compile-time values are
unacceptable at runtime before anyone sees any freakish behavior. Is
it a hard requirement for the ARM architecture that a woefully
mis-configured kernel MUST boot completely to userspace?

--
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 21:02   ` Matt Sealey
@ 2013-01-21 22:30     ` Arnd Bergmann
  2013-01-21 22:45       ` Russell King - ARM Linux
  0 siblings, 1 reply; 48+ messages in thread
From: Arnd Bergmann @ 2013-01-21 22:30 UTC (permalink / raw)
  To: Matt Sealey
  Cc: Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar,
	Russell King - ARM Linux, John Stultz, Ben Dooks

On Monday 21 January 2013, Matt Sealey wrote:
> So is that a bug in that it is not available to ARM right now, a bug
> in that it would be impossible for anyone on ARM to have ever tested
> this code, or a bug in that it should NEVER be enabled for ARM for
> some reason? John? Ingo? :)
> 

I think it's a bug that it's not available. That does not look intentional.

	Arnd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 21:14     ` Matt Sealey
@ 2013-01-21 22:36       ` John Stultz
  2013-01-21 22:49         ` Russell King - ARM Linux
  2013-01-21 22:54         ` Matt Sealey
  0 siblings, 2 replies; 48+ messages in thread
From: John Stultz @ 2013-01-21 22:36 UTC (permalink / raw)
  To: Matt Sealey
  Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, Russell King - ARM Linux

On 01/21/2013 01:14 PM, Matt Sealey wrote:
> On Mon, Jan 21, 2013 at 3:00 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On 01/21/2013 12:41 PM, Arnd Bergmann wrote:
>>> Right. It's pretty clear that the above logic does not work
>>> with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
>>> select NO_HZ to make the question much less interesting.
>> Although, even with NO_HZ, we still have some sense of HZ.
> I wonder if you can confirm my understanding of this by the way? The
> way I think this works is;
>
> CONFIG_HZ on it's own defines the rate at which the kernel wakes up
> from sleeping on the job, and checks for current or expired timer
> events such that it can do things like schedule_work (as in
> workqueues) or perform scheduler (as in processes/tasks) operations.

CONFIG_HZ defines the length of a jiffy.

In the absence of NOHZ and HRT, HZ defines how frequently the 
timer/scheduler tick will fire.

> CONFIG_NO_HZ turns on logic which effectively only wakes up at a
> *maximum* of CONFIG_HZ times per second, but otherwise will go to
> sleep and stay that way if no events actually happened (so, we rely on
> a timer interrupt popping up).

NOHZ adds logic which basically allows us to skip ticks if the cpu is idle.

And HRT adds logic which allows us to fire timers more frequently then HZ.

> In this case, no matter whether CONFIG_HZ=1000 or CONFIG_HZ=250 (for
> example) combined with CONFIG_NO_HZ and less than e.g. 250 things
> happening per second will wake up "exactly" the same number of times?
Ideally, if both systems are completely idle, they may see similar 
number of actual interrupts.

But when the cpus are running processes, the HZ=1000 system will see 
more frequent interrupts, since the timer/scheduler interrupt will jump 
in 4 times more frequently.


> CONFIG_HZ=1000 with CONFIG_NO_HZ would be an effective, all-round
> solution here, then, and CONFIG_HZ=100 should be a reasonable default
> (as it is anyway with an otherwise-unconfigured kernel on any other
> platform) for !CONFIG_NO_HZ.

Eeehhh... I'm not sure this is follows.

>>
>> Yea, as far as timekeeping is concerned, we shouldn't be HZ dependent (and
>> the register_refined_jiffies is really only necessary if you're not
>> expecting a proper clocksource to eventually be registered), assuming the
>> hardware can do something close to the HZ value requested.
>>
>> So I'd probably want to hear about what history caused the specific 200 HZ
>> selections, as I suspect there's actual hardware limitations there. So if
>> you can not get actual timer ticks any faster then 200 HZ on that hardware,
>> setting HZ higher could cause some jiffies related timer trouble (ie: if the
>> kernel thinks HZ is 1000 but the hardware can only do 200, that's a
>> different problem then if the hardware actually can only do 999.8 HZ). So
>> things like timer-wheel timeouts may not happen when they should.
>>
>> I suspect the best approach for multi-arch in those cases may be to select
>> HZ=100
> As above, or "not select anything at all" since HZ=100 if you don't
> touch anything, right?

Well, Russell brought up a case that doesn't handle this. If a system 
*can't* do HZ=100, but can do HZ=200.

Though there are hacks, of course, that might get around this (skip 
every other interrupt at 200HZ).

> If someone picks HZ=1000 and their platform can't support it, then
> that's their own damn problem (don't touch things you don't
> understand, right? ;)
Well, ideally with kconfig we try to add proper dependencies so 
impossible options aren't left to the user.
HZ is a common enough knob to turn on most systems, I don't know if 
leaving the user rope to hang himself is a great idea.

>
>> and use HRT to allow more modern systems to have finer-grained
>> timers.
> My question really has to be is CONFIG_SCHED_HRTICK useful, what
> exactly is it going to do on ARM here since nobody can ever have
> enabled it? Is it going to keel over and explode if nobody registers a
> non-jiffies sched_clock (since the jiffies clock is technically
> reporting itself as a ridiculously high resolution clocksource..)?
??? Not following this at all.  jiffies is the *MOST* coarse resolution 
clocksource there is (at least that I'm aware of.. I recall someone 
wanting to do a 60Hz clocksource, but I don't think that ever happened).

> Or is this one of those things that if your platform doesn't have a
> real high resolution timer, you shouldn't enable HRTIMERS and
> therefore not enable SCHED_HRTICK as a result? That affects
> ARCH_MULTIPLATFORM here. Is the solution as simple as
> ARCH_MULTIPLATFORM compliant platforms kind of have to have a high
> resolution timer? Documentation to that effect?

SO HRITMERS was designed to be be build time enabled, while still giving 
you a functioning system if it was booted on a system that didn't 
support clockevents.  We boot with standard HZ, and only switch over to 
HRT mode if we have a proper clocksource and clockevent driver.

However, HRTIMERS or NOHZ doesn't fix the case of having a system boot 
with HZ=1000 or HZ=100 if the system can *only* do HZ=200.

thanks
-john




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:20       ` Matt Sealey
@ 2013-01-21 22:42         ` Russell King - ARM Linux
  2013-01-21 23:23           ` Matt Sealey
  0 siblings, 1 reply; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-21 22:42 UTC (permalink / raw)
  To: Matt Sealey
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar, Ben Dooks

On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
> I am sorry it sounded if I was being high and mighty about not being
> able to select my own HZ (or being forced by Exynos to be 200 or by
> not being able to test an Exynos board, forced to default to 100). My
> real "grievance" here is we got a configuration item for the scheduler
> which is being left out of ARM configurations which *can* use high
> resolution timers, but I don't know if this is a real problem or not,
> hence asking about it, and that HZ=100 is the ARM default whether we
> might be able to select that or not.. which seems low.

Well, I have a versatile platform here.  It's the inteligence behind
the power control system for booting the boards on the nightly tests
(currently disabled because I'm waiting for my main server to lock up
again, and I need to use one of the serial ports for that.)

The point is, it talks via I2C to a load of power monitors to read
samples out.  It does this at sub-100Hz intervals.  Yet the kernel is
built with HZ=100.  NO_HZ=y and highres timers are enabled... works
fine.

So, no, HZ=100 is not a limit in that scenario.  With NO_HZ=y and
highres timers, it all works with epoll() - you get the interval that
you're after.  I've verified this with calls to gettimeofday() and
the POSIX clocks.

> HZ=250 is the "current" kernel default if you don't touch anything, it
> seems, apologies for thinking it was HZ=100.

Actually, it always used to be 100Hz on everything, including x86.
It got upped when there were interactivity issues... which haven't
been reported on ARM - so why change something that we know works and
everyone is happy with?

> And that is too high for
> EBSA110 and a couple of other boards, especially where HZ must equal
> some exact divisor being pumped right into some timer unit.

EBSA110 can do 250Hz, but it'll mean manually recalculating the timer
arithmetic - because it's not a "reloading" counter - software has to
manually reload it, and you have to take account of how far it's
rolled over to get anything close to a regular interrupt rate which
NTP is happy with.  And believe me, it used to be one of two main NTP
broadcasting servers on my network, so I know it works.

> Understood. Surely the correct divisor should be *derived* from HZ and
> not just dumped into the timer though, so HZ being set to an exact
> divisor (but a round-down-to-acceptable-value) is kind of a hacky
> concept..?

No.  See above.  It's not a simple bit of maths.  You need to know how
fast the CPU runs, and how many instructions it takes to read the
current value, modify it, write it back and factor that into the
calculation.  Get it wrong - by even as little as one count - and the
error is too large, and NTP fails to sync.

> For the global kernel guys, I'd ask what is the reasoning for using
> HZ=250 by default, I wonder? It seems like this number is from the
> dark ages (pre-git, pre-bitkeeper, maybe pre-recorded history ;) and
> the reason is lost. Why not HZ=100 or HZ=300 (if the help text is to
> be believed, and it is probably older than God, HZ=300 is great for
> playing back NTSC-format video.. :)? I can side with you on the
> premise that in actual fact, defining a default HZ value in the
> non-arch-specific kernel proper is a little quirky and it should be
> something the arches do themselves (i.e. move the default-setting
> stuff at the end into the arch/*/Kconfig - I would expect that now
> i386 CPU support is gone from arch/x86, there's potentially a better
> value than HZ=250 for the default?).

>From what I remember, the history is that HZ used to be 100.  Then it
became 1000 as an experiment to do with desktop interactivity.  That
was found to be too heavy, so it was then dropped by a factor of 4 as
a compromise.

That's why kernel/Kconfig.hz has 100, 250 and 1000 - those are the
values which were tried on x86 many years ago.

> 
> Anyway, a patch for ARM could perhaps end up like this:
> 
> ~~
> if ARCH_MULTIPLATFORM
> source kernel/Kconfig.hz
> else
> HZ
>     default 100
> endif
> 
> HZ
>     default 200 if ARCH_EBSA110 || ARCH_ETC_ETC || ARCH_UND_SO_WEITER
>     # any previous platform definitions where *really* required here.
>     # but not default 100 since it would override kernel/Kconfig.hz every time

That doesn't work - if you define the same symbol twice, one definition
takes priority over the other (I don't remember which way it works).
They don't accumulate.

> Which preserves all previous behaviors on all possible ARM arch
> combinations, but where no reasonable override is set.. Kconfig.hz is
> king. I cannot imagine any situation except for AT91 or OMAP could not
> do this in their own {mach,plat}-*/Kconfigs and not in the core
> config, which cleans up the extra HZ block.

Because... it simply doesn't work like that.  Try it and check to see
what Kconfig produces.

We know this, because our FRAME_POINTER config overrides the generic
one - not partially, but totally and utterly in every way.

> Could we also at least agree that if EBSA110 can handle HZ=200 with a
> 16-bit timer, or HZ=128 for OMAP and that AT91 will override it to 100
> on it's own, then that "default 100" is overly restrictive and we
> could remove it, allowing each {mach,plat}-*/Kconfig owner to
> investigate and find the correct HZ value and implement an override or
> selection, or just allow free configuration?

I just don't see how that's remotely possible.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:18       ` John Stultz
@ 2013-01-21 22:44         ` Russell King - ARM Linux
  2013-01-22  8:27           ` Arnd Bergmann
  0 siblings, 1 reply; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-21 22:44 UTC (permalink / raw)
  To: John Stultz
  Cc: Arnd Bergmann, Matt Sealey, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar, Ben Dooks

On Mon, Jan 21, 2013 at 02:18:20PM -0800, John Stultz wrote:
> So we used to have the ACTHZ code to handle error from the HZ rate  
> requested and the HZ rate possible given the underlying hardware. That's  
> been moved to the register_refined_jiffies(), but do you have a sense if  
> there a reason it couldn't be used? I don't quite recall the bounds at  
> this second, so ~7% error might very well be too large.
>
> So yes, I suspect these sorts of platforms, where there are no modern  
> clocksource/clockevent driver, as well as further constraints (like  
> specific HZ) are likely not good candidates for a multi-arch build.

In this particular case, EBSA110 is not a candidate for multi-arch
build anyway, because it's ARMv4 and we're only really bothering with
ARMv6 and better.

Not only that, but the IO stuff on it is sufficiently obscure and
non-standard...

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:30     ` Arnd Bergmann
@ 2013-01-21 22:45       ` Russell King - ARM Linux
  2013-01-21 23:01         ` Matt Sealey
  0 siblings, 1 reply; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-21 22:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Matt Sealey, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, John Stultz, Ben Dooks

On Mon, Jan 21, 2013 at 10:30:07PM +0000, Arnd Bergmann wrote:
> On Monday 21 January 2013, Matt Sealey wrote:
> > So is that a bug in that it is not available to ARM right now, a bug
> > in that it would be impossible for anyone on ARM to have ever tested
> > this code, or a bug in that it should NEVER be enabled for ARM for
> > some reason? John? Ingo? :)
> > 
> 
> I think it's a bug that it's not available. That does not look intentional.

What's a bug?  kernel/Kconfig.hz not being available?  No, it's
intentional.  (See my replies).

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:36       ` John Stultz
@ 2013-01-21 22:49         ` Russell King - ARM Linux
  2013-01-21 22:54         ` Matt Sealey
  1 sibling, 0 replies; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-21 22:49 UTC (permalink / raw)
  To: John Stultz
  Cc: Matt Sealey, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 02:36:13PM -0800, John Stultz wrote:
> Well, Russell brought up a case that doesn't handle this. If a system  
> *can't* do HZ=100, but can do HZ=200.
>
> Though there are hacks, of course, that might get around this (skip  
> every other interrupt at 200HZ).

Note: in the early days of EBSA110 support, yes, we did that, so that
we could have HZ=100 everywhere.  _However_ it sufficiently peturbed
NTP that it basically was unable to slew the clock in any sane manner.
I never got to the bottom of why that was, and when USER_HZ was
decoupled from the kernel HZ, it allowed the problem to be fixed, and
the kernel code to become a _lot_ cleaner.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:36       ` John Stultz
  2013-01-21 22:49         ` Russell King - ARM Linux
@ 2013-01-21 22:54         ` Matt Sealey
  2013-01-21 23:13           ` Russell King - ARM Linux
                             ` (2 more replies)
  1 sibling, 3 replies; 48+ messages in thread
From: Matt Sealey @ 2013-01-21 22:54 UTC (permalink / raw)
  To: John Stultz
  Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, Russell King - ARM Linux

On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org> wrote:
> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>>
>> On Mon, Jan 21, 2013 at 3:00 PM, John Stultz <john.stultz@linaro.org>
>> wrote:
>>>
>>> On 01/21/2013 12:41 PM, Arnd Bergmann wrote:
>>>>
>>>> Right. It's pretty clear that the above logic does not work
>>>> with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
>>>> select NO_HZ to make the question much less interesting.
>>>
>>> Although, even with NO_HZ, we still have some sense of HZ.
>>
>> In this case, no matter whether CONFIG_HZ=1000 or CONFIG_HZ=250 (for
>> example) combined with CONFIG_NO_HZ and less than e.g. 250 things
>> happening per second will wake up "exactly" the same number of times?
>
> Ideally, if both systems are completely idle, they may see similar number of
> actual interrupts.
>
> But when the cpus are running processes, the HZ=1000 system will see more
> frequent interrupts, since the timer/scheduler interrupt will jump in 4
> times more frequently.

Understood..

>> CONFIG_HZ=1000 with CONFIG_NO_HZ would be an effective, all-round
>> solution here, then, and CONFIG_HZ=100 should be a reasonable default
>> (as it is anyway with an otherwise-unconfigured kernel on any other
>> platform) for !CONFIG_NO_HZ.
>
> Eeehhh... I'm not sure this is follows.

Okay, I'm happy to be wrong on this...

>> As above, or "not select anything at all" since HZ=100 if you don't
>> touch anything, right?
>
> Well, Russell brought up a case that doesn't handle this. If a system
> *can't* do HZ=100, but can do HZ=200.
>
> Though there are hacks, of course, that might get around this (skip every
> other interrupt at 200HZ).

Hmm, I think it might be appreciated for people looking at this stuff
(same as I stumbled into it) for a little comment on WHY the default
is 200. That way you don't wonder even if you know why EBSA110 has a
HZ=200 default, why Exynos is lumped in there too (to reduce the
number of interrupts firing? Maybe the Exynos timer interrupt is kind
of a horrid core NMI kind of thing and it's desirable for it not to be
every millisecond, or maybe it has the same restrictions as EBSA110,
but where would anyone go to find out this information?)

>> If someone picks HZ=1000 and their platform can't support it, then
>> that's their own damn problem (don't touch things you don't
>> understand, right? ;)
>
> Well, ideally with kconfig we try to add proper dependencies so impossible
> options aren't left to the user.
> HZ is a common enough knob to turn on most systems, I don't know if leaving
> the user rope to hang himself is a great idea.

I think then the default 100 at the end of the arch/arm/Kconfig is
saying "you are not allowed to know that such a thing as rope even
exists," when in fact what we should be doing is just making sure they
can't swing it over the rafters.. am I taking the analogy too far? :)

>> My question really has to be is CONFIG_SCHED_HRTICK useful, what
>> exactly is it going to do on ARM here since nobody can ever have
>> enabled it? Is it going to keel over and explode if nobody registers a
>> non-jiffies sched_clock (since the jiffies clock is technically
>> reporting itself as a ridiculously high resolution clocksource..)?
>
> ??? Not following this at all.  jiffies is the *MOST* coarse resolution
> clocksource there is (at least that I'm aware of.. I recall someone wanting
> to do a 60Hz clocksource, but I don't think that ever happened).

Is that based on it's clocksource rating (probably worse than a real
hrtimer) or it's reported resolution? Because on i.MX51 if I force it
to use the jiffies clock the debug on the kernel log is telling me it
has a higher resolution (it TELLS me that it ticks "as fast" as the
CPU frequency and wraps less than my real timer).

I know where the 60Hz clocksource might come from, the old Amiga
platforms have one based on the PSU frequency (50Hz in Europe, 60Hz
US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at
least, it is precisely the vsync clock for synchronizing your display
output on TV-out, which makes it completely useful for the framebuffer
driver), but.. you just won't expect to assign it as sched_clock or
your delay timer. And if anyone does I'd expect they'd know full well
it'd not run so well.

>> Or is this one of those things that if your platform doesn't have a
>> real high resolution timer, you shouldn't enable HRTIMERS and
>> therefore not enable SCHED_HRTICK as a result? That affects
>> ARCH_MULTIPLATFORM here. Is the solution as simple as
>> ARCH_MULTIPLATFORM compliant platforms kind of have to have a high
>> resolution timer? Documentation to that effect?
>
> SO HRITMERS was designed to be be build time enabled, while still giving you
> a functioning system if it was booted on a system that didn't support
> clockevents.  We boot with standard HZ, and only switch over to HRT mode if
> we have a proper clocksource and clockevent driver.

Okay. I'm still a little confused as to what SCHED_HRTICK actually
makes a difference to, though.

>From that description, we are booting with standard HZ on ARM, and the
core sched_clock (as in we can call setup_sched_clock)
and/or/both/optionally using a real delay_timer switch to HRT mode if
we have the right equipment available in the kernel and at runtime on
the SoC.. but the process scheduler isn't compiled with the means to
actually take advantage of us being in HRT mode?

> However, HRTIMERS or NOHZ doesn't fix the case of having a system boot with
> HZ=1000 or HZ=100 if the system can *only* do HZ=200.

A simple BUILD_BUG_ON and a BUG_ON right after each other in the
appropriate clocksource driver solves that.. if there's an insistence
on having at least some rope, we can put them in a field and tell them
they have to use the moon to actually hang themselves...

-- 
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:45       ` Russell King - ARM Linux
@ 2013-01-21 23:01         ` Matt Sealey
  0 siblings, 0 replies; 48+ messages in thread
From: Matt Sealey @ 2013-01-21 23:01 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, John Stultz, Ben Dooks

On Mon, Jan 21, 2013 at 4:45 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 21, 2013 at 10:30:07PM +0000, Arnd Bergmann wrote:
>> On Monday 21 January 2013, Matt Sealey wrote:
>> > So is that a bug in that it is not available to ARM right now, a bug
>> > in that it would be impossible for anyone on ARM to have ever tested
>> > this code, or a bug in that it should NEVER be enabled for ARM for
>> > some reason? John? Ingo? :)
>> >
>>
>> I think it's a bug that it's not available. That does not look intentional.
>
> What's a bug?  kernel/Kconfig.hz not being available?  No, it's
> intentional.  (See my replies).

The bug I saw as real is that CONFIG_SCHED_HRTICK is defined only in
kernel/Kconfig.hz (and used in kernel/sched only) - so if we want that
functionality enabled we will also have to opencode it in
arch/arm/Kconfig. Everyone else, by virtue of using kernel/Kconfig.hz,
gets this config item enabled for free if they have hrtimers or
generic smp helpers.. if I understood what John just said, this means
on ARM, since we don't use kernel/Kconfig.hz and we don't also define
an item for CONFIG_SCHED_HRTICK, the process scheduler is completely
oblivious that we're running in HRT mode?

The thing I don't know is real is if that really matters one bit..

-- 
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:54         ` Matt Sealey
@ 2013-01-21 23:13           ` Russell King - ARM Linux
  2013-01-21 23:30             ` Matt Sealey
  2013-01-22  0:38           ` John Stultz
  2013-01-22  0:51           ` John Stultz
  2 siblings, 1 reply; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-21 23:13 UTC (permalink / raw)
  To: Matt Sealey
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 04:54:31PM -0600, Matt Sealey wrote:
> Hmm, I think it might be appreciated for people looking at this stuff
> (same as I stumbled into it) for a little comment on WHY the default
> is 200. That way you don't wonder even if you know why EBSA110 has a
> HZ=200 default, why Exynos is lumped in there too (to reduce the
> number of interrupts firing?

Err, _reduce_ ?

Can you please explain why changing HZ from 100 to 200 is a reduction?

> Maybe the Exynos timer interrupt is kind
> of a horrid core NMI kind of thing and it's desirable for it not to be
> every millisecond,

Huh?  HZ=100 is centisecond intervals...

> or maybe it has the same restrictions as EBSA110,
> but where would anyone go to find out this information?)

Maybe the mailing list archives.  No, not these ones.  The full ones
on lists.arm.linux.org.uk.  The lurker archives contain every email
that has been on these mailing lists stretching back into the late
1990s.  They are the only _full_ archives (give or take a few problems
with connectivity between lists.arm.linux.org.uk and lists.infradead.org
throwing the archiver subscription off.)

> I think then the default 100 at the end of the arch/arm/Kconfig is
> saying "you are not allowed to know that such a thing as rope even
> exists," when in fact what we should be doing is just making sure they
> can't swing it over the rafters.. am I taking the analogy too far? :)

I think you're understanding is just waaaayyyyy off.  That default is
there because that is the _architecture_ _default_ and there _has_ to
be a default.  No, including kernel/Kconfig.hz won't give us any kind
of non-specified default because, as I've already said in one of my
other mails, you can't supplement Kconfig symbol definitions by
declaring it multiple times.

> I know where the 60Hz clocksource might come from, the old Amiga
> platforms have one based on the PSU frequency (50Hz in Europe, 60Hz
> US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at
> least, it is precisely the vsync clock for synchronizing your display
> output on TV-out, which makes it completely useful for the framebuffer
> driver), but.. you just won't expect to assign it as sched_clock or
> your delay timer. And if anyone does I'd expect they'd know full well
> it'd not run so well.

Except in the UK where it'd be 50Hz for the TV out.  (Lengthy irrelevant
explanation why this is so for UK cut.)

> >From that description, we are booting with standard HZ on ARM, and the
> core sched_clock (as in we can call setup_sched_clock)
> and/or/both/optionally using a real delay_timer switch to HRT mode if
> we have the right equipment available in the kernel and at runtime on
> the SoC.. but the process scheduler isn't compiled with the means to
> actually take advantage of us being in HRT mode?

Don't mix sched_clock() into this; it has nothing to do with HZ at all.
You're confusing your apples with your oranges.

> A simple BUILD_BUG_ON and a BUG_ON right after each other in the
> appropriate clocksource driver solves that.. if there's an insistence
> on having at least some rope, we can put them in a field and tell them
> they have to use the moon to actually hang themselves...

No it doesn't - it introduces a whole load of new ways to make the
kernel build or boot fail for pointless reasons - more failures, more
regressions.

No thank you.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 21:03   ` Russell King - ARM Linux
@ 2013-01-21 23:23     ` Tony Lindgren
  2013-01-22  6:23       ` Santosh Shilimkar
  0 siblings, 1 reply; 48+ messages in thread
From: Tony Lindgren @ 2013-01-21 23:23 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Matt Sealey, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar, John Stultz, Ben Dooks

* Russell King - ARM Linux <linux@arm.linux.org.uk> [130121 13:07]:
> 
> As for Samsung and the rest I can't comment.  The original reason OMAP
> used this though was because the 32768Hz counter can't produce 100Hz
> without a .1% error - too much error under pre-clocksource
> implementations for timekeeping.  Whether that's changed with the
> clocksource/clockevent support needs to be checked.

Yes that's why HZ was originally set to 128. That value (or some multiple)
still makes sense when the 32 KiHZ clock source is being used. Of course
we should rely on the local timer when running for the SoCs that have
them.

Regards,

Tony

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:42         ` Russell King - ARM Linux
@ 2013-01-21 23:23           ` Matt Sealey
  2013-01-21 23:49             ` Russell King - ARM Linux
  0 siblings, 1 reply; 48+ messages in thread
From: Matt Sealey @ 2013-01-21 23:23 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 4:42 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
>> I am sorry it sounded if I was being high and mighty about not being
>> able to select my own HZ (or being forced by Exynos to be 200 or by
>> not being able to test an Exynos board, forced to default to 100). My
>> real "grievance" here is we got a configuration item for the scheduler
>> which is being left out of ARM configurations which *can* use high
>> resolution timers, but I don't know if this is a real problem or not,
>> hence asking about it, and that HZ=100 is the ARM default whether we
>> might be able to select that or not.. which seems low.
>
> Well, I have a versatile platform here.  It's the inteligence behind
> the power control system for booting the boards on the nightly tests
> (currently disabled because I'm waiting for my main server to lock up
> again, and I need to use one of the serial ports for that.)
>
> The point is, it talks via I2C to a load of power monitors to read
> samples out.  It does this at sub-100Hz intervals.  Yet the kernel is
> built with HZ=100.  NO_HZ=y and highres timers are enabled... works
> fine.
>
> So, no, HZ=100 is not a limit in that scenario.  With NO_HZ=y and
> highres timers, it all works with epoll() - you get the interval that
> you're after.  I've verified this with calls to gettimeofday() and
> the POSIX clocks.

Okay.

So, can you read this (it's short):

http://ck.kolivas.org/patches/bfs/bfs-configuration-faq.txt

And please tell me if he's batshit crazy and I should completely
ignore any scheduler discussion that isn't ARM-specific, or maybe..
and I can almost guarantee this, he doesn't have an ARM platform so
he's just delightfully ill-informed about anything but his quad-core
x86?

>> HZ=250 is the "current" kernel default if you don't touch anything, it
>> seems, apologies for thinking it was HZ=100.
>
> Actually, it always used to be 100Hz on everything, including x86.
> It got upped when there were interactivity issues... which haven't
> been reported on ARM - so why change something that we know works and
> everyone is happy with?

I don't know. I guess this is why I included Ingo and Peter as they
seem to be responsible for core HZ-related things; why have HZ=250 on
x86 when CONFIG_NO_HZ and HZ=100 would work just as effectively? Isn't
CONFIG_NO_HZ the default on x86 and PPC and.. pretty much everything
else?

I know Con K. has been accused many times of peddling snake-oil... but
he has pretty graphs and benchmarks that kind of bear him out on most
things even if the results do not get his work upstream. I can't fault
the statistical significance of his results.. but even a placebo
effect can be graphed, correlation is not causation, etc, etc. - I
don't know if anything real filters down into the documentation
though.

>> And that is too high for
>> EBSA110 and a couple of other boards, especially where HZ must equal
>> some exact divisor being pumped right into some timer unit.
>
> EBSA110 can do 250Hz, but it'll mean manually recalculating the timer
> arithmetic - because it's not a "reloading" counter - software has to
> manually reload it, and you have to take account of how far it's
> rolled over to get anything close to a regular interrupt rate which
> NTP is happy with.  And believe me, it used to be one of two main NTP
> broadcasting servers on my network, so I know it works.

A-ha...

>> Anyway, a patch for ARM could perhaps end up like this:
>>
>> ~~
>> if ARCH_MULTIPLATFORM
>> source kernel/Kconfig.hz
>> else
>> HZ
>>     default 100
>> endif
>>
>> HZ
>>     default 200 if ARCH_EBSA110 || ARCH_ETC_ETC || ARCH_UND_SO_WEITER
>>     # any previous platform definitions where *really* required here.
>>     # but not default 100 since it would override kernel/Kconfig.hz every time
>
> That doesn't work - if you define the same symbol twice, one definition
> takes priority over the other (I don't remember which way it works).
> They don't accumulate.

Well I did some testing.. a couple days of poking around, and they
don't need to accumulate.

> Because... it simply doesn't work like that.  Try it and check to see
> what Kconfig produces.

I did test it.. whatever you define last, sticks, and it's down to the
order they're parsed in the tree - luckily, arch/arm/Kconfig is
sourced first, which sources the mach/plat stuff way down at  the
bottom. As long as you have your "default" set somewhere, any further
default just has to be sourced or added later in *one* of the
Kconfigs, same as building any C file with "gcc -E" and spitting it
out.

Someone, at the end of it all, has to set some default, and as long as
the one you want is the last one, everything is shiny.

> We know this, because our FRAME_POINTER config overrides the generic
> one - not partially, but totally and utterly in every way.

But for something as simple as CONFIG_HZ getting a value.. it works
okay. If Kconfig.hz sets CONFIG_HZ=250 because CONFIG_HZ_250 is
default yes, and it CONFIG_HZ defaults to 250 if it's set, and then
you put

HZ
   default 100

Right after it, or right after it's source in arch/x86/Kconfig, or
whatever, that "default" is what sticks and what ends up in CONFIG_HZ
in the local .config.

> I just don't see how that's remotely possible.

Maybe I tested it wrong, you'd know better than I exactly how (and I
would appreciate knowing how so I can go back and test it again :)

--
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 23:13           ` Russell King - ARM Linux
@ 2013-01-21 23:30             ` Matt Sealey
  2013-01-22  0:02               ` Russell King - ARM Linux
  0 siblings, 1 reply; 48+ messages in thread
From: Matt Sealey @ 2013-01-21 23:30 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 5:13 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 21, 2013 at 04:54:31PM -0600, Matt Sealey wrote:
>> Hmm, I think it might be appreciated for people looking at this stuff
>> (same as I stumbled into it) for a little comment on WHY the default
>> is 200. That way you don't wonder even if you know why EBSA110 has a
>> HZ=200 default, why Exynos is lumped in there too (to reduce the
>> number of interrupts firing?
>
> Err, _reduce_ ?
>
> Can you please explain why changing HZ from 100 to 200 is a reduction?

We were talking about HZ=1000 at the time, sorry...

>> Maybe the Exynos timer interrupt is kind
>> of a horrid core NMI kind of thing and it's desirable for it not to be
>> every millisecond,
>
> Huh?  HZ=100 is centisecond intervals...

See above..

> I think you're understanding is just waaaayyyyy off.  That default is
> there because that is the _architecture_ _default_ and there _has_ to
> be a default.  No, including kernel/Kconfig.hz won't give us any kind
> of non-specified default because, as I've already said in one of my
> other mails, you can't supplement Kconfig symbol definitions by
> declaring it multiple times.

Okay, so the real



>> I know where the 60Hz clocksource might come from, the old Amiga
>> platforms have one based on the PSU frequency (50Hz in Europe, 60Hz
>> US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at
>> least, it is precisely the vsync clock for synchronizing your display
>> output on TV-out, which makes it completely useful for the framebuffer
>> driver), but.. you just won't expect to assign it as sched_clock or
>> your delay timer. And if anyone does I'd expect they'd know full well
>> it'd not run so well.
>
> Except in the UK where it'd be 50Hz for the TV out.  (Lengthy irrelevant
> explanation why this is so for UK cut.)

Read again: "50Hz in Europe".

Australia too. I'm British and I used to have more EU-manufactured
Amigas than I knew what to do with.. so.. just like your NTP story, I
definitely know this already.

>> >From that description, we are booting with standard HZ on ARM, and the
>> core sched_clock (as in we can call setup_sched_clock)
>> and/or/both/optionally using a real delay_timer switch to HRT mode if
>> we have the right equipment available in the kernel and at runtime on
>> the SoC.. but the process scheduler isn't compiled with the means to
>> actually take advantage of us being in HRT mode?
>
> Don't mix sched_clock() into this; it has nothing to do with HZ at all.
> You're confusing your apples with your oranges.

Okay..

>> A simple BUILD_BUG_ON and a BUG_ON right after each other in the
>> appropriate clocksource driver solves that.. if there's an insistence
>> on having at least some rope, we can put them in a field and tell them
>> they have to use the moon to actually hang themselves...
>
> No it doesn't - it introduces a whole load of new ways to make the
> kernel build or boot fail for pointless reasons - more failures, more
> regressions.
>
> No thank you.

But it would effectively stop users drinking kool-aid.. if you set
your HZ to something stupid, you don't even get a kernel to build, and
certainly don't get to boot past the first 40 lines of boot messages..
I think most people would rather a build error, or a runtime
unmistakable, unmissable warning than a subtle and almost
imperceptible skew in NTP synchronization, to use your example.

-- 
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 23:23           ` Matt Sealey
@ 2013-01-21 23:49             ` Russell King - ARM Linux
  2013-01-22  0:09               ` Matt Sealey
  0 siblings, 1 reply; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-21 23:49 UTC (permalink / raw)
  To: Matt Sealey
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 05:23:33PM -0600, Matt Sealey wrote:
> On Mon, Jan 21, 2013 at 4:42 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
> >> I am sorry it sounded if I was being high and mighty about not being
> >> able to select my own HZ (or being forced by Exynos to be 200 or by
> >> not being able to test an Exynos board, forced to default to 100). My
> >> real "grievance" here is we got a configuration item for the scheduler
> >> which is being left out of ARM configurations which *can* use high
> >> resolution timers, but I don't know if this is a real problem or not,
> >> hence asking about it, and that HZ=100 is the ARM default whether we
> >> might be able to select that or not.. which seems low.
> >
> > Well, I have a versatile platform here.  It's the inteligence behind
> > the power control system for booting the boards on the nightly tests
> > (currently disabled because I'm waiting for my main server to lock up
> > again, and I need to use one of the serial ports for that.)
> >
> > The point is, it talks via I2C to a load of power monitors to read
> > samples out.  It does this at sub-100Hz intervals.  Yet the kernel is
> > built with HZ=100.  NO_HZ=y and highres timers are enabled... works
> > fine.
> >
> > So, no, HZ=100 is not a limit in that scenario.  With NO_HZ=y and
> > highres timers, it all works with epoll() - you get the interval that
> > you're after.  I've verified this with calls to gettimeofday() and
> > the POSIX clocks.
> 
> Okay.
> 
> So, can you read this (it's short):
> 
> http://ck.kolivas.org/patches/bfs/bfs-configuration-faq.txt
> 
> And please tell me if he's batshit crazy and I should completely
> ignore any scheduler discussion that isn't ARM-specific, or maybe..
> and I can almost guarantee this, he doesn't have an ARM platform so
> he's just delightfully ill-informed about anything but his quad-core
> x86?

Well... my x86 laptop is... HZ=1000, NO_HZ, HIGH_RES enabled, ondemand...
doesn't really fit into any of those categories given there.  I'd suggest
that what's given there is a suggestion/opinion based on behaviours
observed on x86 platforms.

Whether it's appropriate for other architectures is not really a proven
point - is it worth running ARM at 1000Hz when the load from running at
100Hz is measurable as a definite error in loops_per_jiffy calibration?
Remember - the load from the interrupt handler at 1000Hz is 10x the load
at 100Hz.

Do you want to spend more cycles per second on the possibly multi-layer
IRQ servicing and timer servicing?

And what about the interrupt latency issue that we've hit several times
already with devices taking longer than 10ms to service their peripherals
because the driver doesn't make use of delayed works/tasklets/etc.

The lack of reasonable device DMA too has an impact for many drivers - the
CPU has to spend more time in interrupt handlers (which are now run to the
exclusion of any other interrupt in the system) performing PIO - or in the
case of those systems which _do_ have DMA, they may end up having to do
cache maintanence over large cache ranges from IRQ context which x86
doesn't have to do.

There's many factors here, and the choice of what the right HZ is for a
platform is not as clear cut as one may think.  Given all the additional
overheads we have on ARM because of the lack of memory coherency, the
generally bad DMA support, etc, I think what we currently have is still
right as an architecture default - 100Hz.

> I did test it.. whatever you define last, sticks, and it's down to the
> order they're parsed in the tree - luckily, arch/arm/Kconfig is
> sourced first, which sources the mach/plat stuff way down at  the
> bottom. As long as you have your "default" set somewhere, any further
> default just has to be sourced or added later in *one* of the
> Kconfigs, same as building any C file with "gcc -E" and spitting it
> out.
> 
> Someone, at the end of it all, has to set some default, and as long as
> the one you want is the last one, everything is shiny.

Actually, we're both wrong.  There seems to be two things which
inflence it, and it basically comes down to this:

- the value a particular symbol has comes from the _first_ declaration
  which a value is assigned to a symbol.

So:

config HZ
        int
        default 300

config HZ
        int
        default 100 if OPT1
        default 200 if OPT2
        default 400

takes on the value of 300 no matter what combination of OPT1 and OPT2
are enabled.

config HZ
        int
        default 100 if OPT1
        default 200 if OPT2
        default 400
           
config HZ
        int
        default 300

never takes the value 300, but 100, 200 or 400.

config HZ
        int
        default 100 if OPT1
        default 200 if OPT2
           
config HZ
        int
        default 300

Will now take 100, 200, or 300 depending on which of OPT1/OPT2 are enabled.

So, we _can_ use kernel/Kconfig.hz, but it's not very nice at all: we will
be presenting users with configutation options for the HZ value which will
be _silently_ ignored by Kconfig if we have a platform which overrides this.

Probably fine if you think that Kconfig is a developers tool and you edit
the configuration files (and therefore you expect them to know what they're
doing, and how this stuff works), but not if you think that Kconfig users
should be presented with meaningful options when configuring their kernel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 23:30             ` Matt Sealey
@ 2013-01-22  0:02               ` Russell King - ARM Linux
  0 siblings, 0 replies; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-22  0:02 UTC (permalink / raw)
  To: Matt Sealey
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 05:30:31PM -0600, Matt Sealey wrote:
> But it would effectively stop users drinking kool-aid.. if you set
> your HZ to something stupid, you don't even get a kernel to build, and
> certainly don't get to boot past the first 40 lines of boot messages..
> I think most people would rather a build error, or a runtime
> unmistakable, unmissable warning than a subtle and almost
> imperceptible skew in NTP synchronization, to use your example.

1. a kernel which doesn't build.  What do you think both Arnd and myself
   have been doing for the last few years, building such things as
   random configurations and such like, finding stuff that doesn't work
   and fixing the kernel so that we end up with _NO_ configuration which
   fails to build.

   Are you seriously about to tell us that we're wasting our time and we
   should just let the kernel build fail in all horrid sorts of ways?

2. As for NTP behaviour... well, have you ever experienced a system where
   NTP has to keep doing step corrections on the time of day, where some
   steps (eg, backwards) cause services to quit because time of day must
   be monotonic...

What you're proposing is that we litter the ARM arch with all sorts of
tests for CONFIG_HZ and #error out on ones that don't make sense.  I
think you're smoking crack.

What I think is that we should _not_ allow CONFIG_HZ to be set to
anything which isn't appropriate for the platforms - or indeed the
reverse.  That's going to be extremely difficult to do with multi-arch
because it's effectively a two-way dependency.

I don't think we can do that with kernel/Kconfig.hz unless we introduce
another layer of permissive configurations for the HZ_1000... etc, but
I'm not sure that anyone outside ARM would like even that.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 23:49             ` Russell King - ARM Linux
@ 2013-01-22  0:09               ` Matt Sealey
  2013-01-22  0:26                 ` Matt Sealey
  0 siblings, 1 reply; 48+ messages in thread
From: Matt Sealey @ 2013-01-22  0:09 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar

Okay so the final resolution of this is;

1) That the arch/arm/Kconfig HZ block is suffering from some cruft

I think we could all be fairly confident that Exynos4 or S5P does not
require HZ=200 - in theory, it has no such timer restrictions like
EBSA110 (the docs I have show a perfectly capable 32-bit timer with a
double-digits MHz input clock, since these are multimedia-class SoCs
it'd be seriously f**ked up if they didn't).

But while some of the entries on this line may be cargo-cult
programming, the original addition on top of EBSA110 *may* be one of
your "unreported" responsiveness issues.

We could just let some Samsung employees complain when Android 6.x
starts to get laggy with a 3.8 kernel because we forced their HZ=100.

What I would do is predicate a fixed, obvious default on
ARCH_MULTIPLATFORM so that it would get the benefit of a default HZ
that you agree with. It wouldn't CHANGE anything, but it makes it look
less funky, since the non-multiplatform settings would be somewhere
else (it either needs more comments or an if - either way - otherwise
it's potentially confusing);

if ARCH_MULTIPLATFORM
config HZ
   int
   default 100
else
   # old config HZ block here
endif

2) We need to add config SCHED_HRTICK as a copy and paste from
kernel/Kconfig.hz since.. well, I still don't understand exactly what
the true effect would be, but I assume since Arnd is concerned and
John's explanation rings true that it really should be enabled on ARM
systems with the exact same dependencies as kernel/Kconfig.hz.

Or not.. I see it as an oddity until I understand if we really care
about it, but the code seems to be fairly important to the scheduler
and also enabled by default almost everywhere else, which means only
people with really freakish SMP architectures with no ability to use
GENERIC_SMP_HELPERS have ever run these code paths besides ARM. That
kind of leaves ARM in the doghouse.. who knows what weirdo scheduler
reactions are related to it not being enabled. Maybe when it is, HZ
*would* need to be allowed to be bumped when using this code path?
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.


On Mon, Jan 21, 2013 at 5:49 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 21, 2013 at 05:23:33PM -0600, Matt Sealey wrote:
>> On Mon, Jan 21, 2013 at 4:42 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
>> >> I am sorry it sounded if I was being high and mighty about not being
>> >> able to select my own HZ (or being forced by Exynos to be 200 or by
>> >> not being able to test an Exynos board, forced to default to 100). My
>> >> real "grievance" here is we got a configuration item for the scheduler
>> >> which is being left out of ARM configurations which *can* use high
>> >> resolution timers, but I don't know if this is a real problem or not,
>> >> hence asking about it, and that HZ=100 is the ARM default whether we
>> >> might be able to select that or not.. which seems low.
>> >
>> > Well, I have a versatile platform here.  It's the inteligence behind
>> > the power control system for booting the boards on the nightly tests
>> > (currently disabled because I'm waiting for my main server to lock up
>> > again, and I need to use one of the serial ports for that.)
>> >
>> > The point is, it talks via I2C to a load of power monitors to read
>> > samples out.  It does this at sub-100Hz intervals.  Yet the kernel is
>> > built with HZ=100.  NO_HZ=y and highres timers are enabled... works
>> > fine.
>> >
>> > So, no, HZ=100 is not a limit in that scenario.  With NO_HZ=y and
>> > highres timers, it all works with epoll() - you get the interval that
>> > you're after.  I've verified this with calls to gettimeofday() and
>> > the POSIX clocks.
>>
>> Okay.
>>
>> So, can you read this (it's short):
>>
>> http://ck.kolivas.org/patches/bfs/bfs-configuration-faq.txt
>>
>> And please tell me if he's batshit crazy and I should completely
>> ignore any scheduler discussion that isn't ARM-specific, or maybe..
>> and I can almost guarantee this, he doesn't have an ARM platform so
>> he's just delightfully ill-informed about anything but his quad-core
>> x86?
>
> Well... my x86 laptop is... HZ=1000, NO_HZ, HIGH_RES enabled, ondemand...
> doesn't really fit into any of those categories given there.  I'd suggest
> that what's given there is a suggestion/opinion based on behaviours
> observed on x86 platforms.
>
> Whether it's appropriate for other architectures is not really a proven
> point - is it worth running ARM at 1000Hz when the load from running at
> 100Hz is measurable as a definite error in loops_per_jiffy calibration?
> Remember - the load from the interrupt handler at 1000Hz is 10x the load
> at 100Hz.
>
> Do you want to spend more cycles per second on the possibly multi-layer
> IRQ servicing and timer servicing?
>
> And what about the interrupt latency issue that we've hit several times
> already with devices taking longer than 10ms to service their peripherals
> because the driver doesn't make use of delayed works/tasklets/etc.
>
> The lack of reasonable device DMA too has an impact for many drivers - the
> CPU has to spend more time in interrupt handlers (which are now run to the
> exclusion of any other interrupt in the system) performing PIO - or in the
> case of those systems which _do_ have DMA, they may end up having to do
> cache maintanence over large cache ranges from IRQ context which x86
> doesn't have to do.
>
> There's many factors here, and the choice of what the right HZ is for a
> platform is not as clear cut as one may think.  Given all the additional
> overheads we have on ARM because of the lack of memory coherency, the
> generally bad DMA support, etc, I think what we currently have is still
> right as an architecture default - 100Hz.
>
>> I did test it.. whatever you define last, sticks, and it's down to the
>> order they're parsed in the tree - luckily, arch/arm/Kconfig is
>> sourced first, which sources the mach/plat stuff way down at  the
>> bottom. As long as you have your "default" set somewhere, any further
>> default just has to be sourced or added later in *one* of the
>> Kconfigs, same as building any C file with "gcc -E" and spitting it
>> out.
>>
>> Someone, at the end of it all, has to set some default, and as long as
>> the one you want is the last one, everything is shiny.
>
> Actually, we're both wrong.  There seems to be two things which
> inflence it, and it basically comes down to this:
>
> - the value a particular symbol has comes from the _first_ declaration
>   which a value is assigned to a symbol.
>
> So:
>
> config HZ
>         int
>         default 300
>
> config HZ
>         int
>         default 100 if OPT1
>         default 200 if OPT2
>         default 400
>
> takes on the value of 300 no matter what combination of OPT1 and OPT2
> are enabled.
>
> config HZ
>         int
>         default 100 if OPT1
>         default 200 if OPT2
>         default 400
>
> config HZ
>         int
>         default 300
>
> never takes the value 300, but 100, 200 or 400.
>
> config HZ
>         int
>         default 100 if OPT1
>         default 200 if OPT2
>
> config HZ
>         int
>         default 300
>
> Will now take 100, 200, or 300 depending on which of OPT1/OPT2 are enabled.
>
> So, we _can_ use kernel/Kconfig.hz, but it's not very nice at all: we will
> be presenting users with configutation options for the HZ value which will
> be _silently_ ignored by Kconfig if we have a platform which overrides this.
>
> Probably fine if you think that Kconfig is a developers tool and you edit
> the configuration files (and therefore you expect them to know what they're
> doing, and how this stuff works), but not if you think that Kconfig users
> should be presented with meaningful options when configuring their kernel.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22  0:09               ` Matt Sealey
@ 2013-01-22  0:26                 ` Matt Sealey
  0 siblings, 0 replies; 48+ messages in thread
From: Matt Sealey @ 2013-01-22  0:26 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar, linux-samsung-soc, Ben Dooks,
	Kukjin Kim

On Mon, Jan 21, 2013 at 6:09 PM, Matt Sealey <matt@genesi-usa.com> wrote:

[LAKML: about lack of SCHED_HRTICK because we don't use
kernel/Kconfig.hz on ARM)]

> kind of leaves ARM in the doghouse.. who knows what weirdo scheduler
> reactions are related to it not being enabled. Maybe when it is, HZ
> *would* need to be allowed to be bumped when using this code path?

Or conversely maybe this is exactly why the Samsung maintainers
decided they need HZ=200, because SCHED_HRTICK isn't being enabled and
they're experiencing some multimedia lag because of it?

-- 
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:54         ` Matt Sealey
  2013-01-21 23:13           ` Russell King - ARM Linux
@ 2013-01-22  0:38           ` John Stultz
  2013-01-22  0:51           ` John Stultz
  2 siblings, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-01-22  0:38 UTC (permalink / raw)
  To: Matt Sealey
  Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, Russell King - ARM Linux, Thomas Gleixner

On 01/21/2013 02:54 PM, Matt Sealey wrote:
> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>>> Or is this one of those things that if your platform doesn't have a
>>> real high resolution timer, you shouldn't enable HRTIMERS and
>>> therefore not enable SCHED_HRTICK as a result? That affects
>>> ARCH_MULTIPLATFORM here. Is the solution as simple as
>>> ARCH_MULTIPLATFORM compliant platforms kind of have to have a high
>>> resolution timer? Documentation to that effect?
>> SO HRITMERS was designed to be be build time enabled, while still giving you
>> a functioning system if it was booted on a system that didn't support
>> clockevents.  We boot with standard HZ, and only switch over to HRT mode if
>> we have a proper clocksource and clockevent driver.
> Okay. I'm still a little confused as to what SCHED_HRTICK actually
> makes a difference to, though.
>
>  From that description, we are booting with standard HZ on ARM, and the
> core sched_clock (as in we can call setup_sched_clock)
> and/or/both/optionally using a real delay_timer switch to HRT mode if
> we have the right equipment available in the kernel and at runtime on
> the SoC.. but the process scheduler isn't compiled with the means to
> actually take advantage of us being in HRT mode?

So I'm actually not super familiar with SCHED_HRTICK details, but from 
my brief skim of it it looks like its useful for turning off the 
periodic timer tick, and allowing the scheduler tick to be triggered by 
an hrtimer itself (There's a number of these interesting inversions that 
go on in switching to HRT mode - for instance, standard timer ticks are 
switched to being hrtimer events themselves).

This likely has the benefit of time-accurate preemption (well, long 
term, as if the timer granularity isn't matching you could be delayed up 
to a tick - but it wouldn't drift).

I'm guessing Thomas would probably know best what the potential issues 
would be from running ((CONFIG_HRTIMER  || CONFIG_NO_HZ) && 
!CONFIG_SCHED_HRTICK).

thanks
-john

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:54         ` Matt Sealey
  2013-01-21 23:13           ` Russell King - ARM Linux
  2013-01-22  0:38           ` John Stultz
@ 2013-01-22  0:51           ` John Stultz
  2013-01-22  1:06             ` Matt Sealey
  2 siblings, 1 reply; 48+ messages in thread
From: John Stultz @ 2013-01-22  0:51 UTC (permalink / raw)
  To: Matt Sealey
  Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, Russell King - ARM Linux

On 01/21/2013 02:54 PM, Matt Sealey wrote:
> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>>> My question really has to be is CONFIG_SCHED_HRTICK useful, what
>>> exactly is it going to do on ARM here since nobody can ever have
>>> enabled it? Is it going to keel over and explode if nobody registers a
>>> non-jiffies sched_clock (since the jiffies clock is technically
>>> reporting itself as a ridiculously high resolution clocksource..)?
>> ??? Not following this at all.  jiffies is the *MOST* coarse resolution
>> clocksource there is (at least that I'm aware of.. I recall someone wanting
>> to do a 60Hz clocksource, but I don't think that ever happened).
> Is that based on it's clocksource rating (probably worse than a real
> hrtimer) or it's reported resolution? Because on i.MX51 if I force it
> to use the jiffies clock the debug on the kernel log is telling me it
> has a higher resolution (it TELLS me that it ticks "as fast" as the
> CPU frequency and wraps less than my real timer).
So the clocksource rating is supposed to be defined by the clocksource 
driver writer, and just provides a way for the clocksource core to 
select the best clocksource given a set of clocksources. It is not 
defined as any sort of calculated mapping to any property of the 
clocksource itself (although some driver writers might compute a ratings 
value in that way, but I feel the static ranking is much simpler). The 
comment above struct clocksource in clocksource.h tries to explain this.

As far as jiffies rating, from jiffies.c:
     .rating        = 1, /* lowest valid rating*/

So I'm not sure what you mean by "the debug on the kernel log is telling 
me it has a higher resolution".



> I know where the 60Hz clocksource might come from, the old Amiga
> platforms have one based on the PSU frequency (50Hz in Europe, 60Hz
> US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at
> least, it is precisely the vsync clock for synchronizing your display
> output on TV-out, which makes it completely useful for the framebuffer
> driver), but.. you just won't expect to assign it as sched_clock or
> your delay timer. And if anyone does I'd expect they'd know full well
> it'd not run so well.

Yes, in the case I was remembering, the 60HZ was driven by the 
electrical line.

thanks
-john



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22  0:51           ` John Stultz
@ 2013-01-22  1:06             ` Matt Sealey
  2013-01-22  1:18               ` Russell King - ARM Linux
  2013-01-22  1:31               ` John Stultz
  0 siblings, 2 replies; 48+ messages in thread
From: Matt Sealey @ 2013-01-22  1:06 UTC (permalink / raw)
  To: John Stultz
  Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, Russell King - ARM Linux

On Mon, Jan 21, 2013 at 6:51 PM, John Stultz <john.stultz@linaro.org> wrote:
> On 01/21/2013 02:54 PM, Matt Sealey wrote:
>>
>> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org>
>> wrote:
>>>
>>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>
> As far as jiffies rating, from jiffies.c:
>     .rating        = 1, /* lowest valid rating*/
>
> So I'm not sure what you mean by "the debug on the kernel log is telling me
> it has a higher resolution".

Oh, it is just if I actually don't run setup_sched_clock on my
platform, it gives a little message (with #define DEBUG 1 in
sched_clock.c) about who setup the last sched_clock. Since you only
get one chance, and I was fiddling with setup_sched_clock being probed
from multiple possible timers from device tree (i.MX3 has a crapload
of valid timers, which one you use right now is basically forced by
the not-quite-fully-DT-only code and some funky iomap tricks).

And what I got was, if I use the real hardware timer, it runs at 66MHz
and says it has 15ns resolution and wraps every 500 seconds or so. The
jiffies timer says it's 750MHz, with a 2ns resoluton.. you get the
drift. The generic reporting of how "good" the sched_clock source is
kind of glosses over the quality rating of the clock source and at
first glance (if you're not paying that much attention), it is a
little bit misleading..

> Yes, in the case I was remembering, the 60HZ was driven by the electrical
> line.

While I have your attention, what would be the minimum "good" speed to
run the sched_clock or delay timer implementation from? My rudimentary
scribblings in my notebook give me a value of "don't bother" with less
than 10KHz based on HZ=100, so I'm wondering if a direct 32.768KHz
clock would do (i.MX osc clock input if I can supply it to one of the
above myriad timers) since this would be low-power compared to a 66MHz
one (by a couple mA anyway). I also have a bunch of questions about
the delay timer requirements.. I might mail you personally.. or would
you prefer on-list?

-- 
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22  1:06             ` Matt Sealey
@ 2013-01-22  1:18               ` Russell King - ARM Linux
  2013-01-22  1:56                 ` Matt Sealey
  2013-01-22  1:31               ` John Stultz
  1 sibling, 1 reply; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-22  1:18 UTC (permalink / raw)
  To: Matt Sealey
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 07:06:59PM -0600, Matt Sealey wrote:
> On Mon, Jan 21, 2013 at 6:51 PM, John Stultz <john.stultz@linaro.org> wrote:
> > On 01/21/2013 02:54 PM, Matt Sealey wrote:
> >>
> >> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org>
> >> wrote:
> >>>
> >>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
> >
> > As far as jiffies rating, from jiffies.c:
> >     .rating        = 1, /* lowest valid rating*/
> >
> > So I'm not sure what you mean by "the debug on the kernel log is telling me
> > it has a higher resolution".
> 
> Oh, it is just if I actually don't run setup_sched_clock on my
> platform, it gives a little message (with #define DEBUG 1 in
> sched_clock.c)

sched_clock() has nothing to do with time keeping, and that
HZ/NO_HZ/HRTIMERS don't affect it (when it isn't being derived from
jiffies).

Now, sched_clock() is there to give the scheduler a _fast_ to access,
higher resolution clock than is available from other sources, so that
there's ways of accurately measuring the amount of time processes run
for, and other such measurements - and it uses that to determine how
to schedule a particular task and when to preempt it.

Not providing it means you get those measurements at HZ-based resolution,
which is suboptimal for tasks which run often for sub-HZ periods (which
can end up accumulating zero run time.)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22  1:06             ` Matt Sealey
  2013-01-22  1:18               ` Russell King - ARM Linux
@ 2013-01-22  1:31               ` John Stultz
  2013-01-22  2:10                 ` Matt Sealey
  1 sibling, 1 reply; 48+ messages in thread
From: John Stultz @ 2013-01-22  1:31 UTC (permalink / raw)
  To: Matt Sealey
  Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, Russell King - ARM Linux

On 01/21/2013 05:06 PM, Matt Sealey wrote:
> On Mon, Jan 21, 2013 at 6:51 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On 01/21/2013 02:54 PM, Matt Sealey wrote:
>>> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org>
>>> wrote:
>>>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>> As far as jiffies rating, from jiffies.c:
>>      .rating        = 1, /* lowest valid rating*/
>>
>> So I'm not sure what you mean by "the debug on the kernel log is telling me
>> it has a higher resolution".
> Oh, it is just if I actually don't run setup_sched_clock on my
> platform, it gives a little message (with #define DEBUG 1 in
> sched_clock.c) about who setup the last sched_clock. Since you only
> get one chance, and I was fiddling with setup_sched_clock being probed
> from multiple possible timers from device tree (i.MX3 has a crapload
> of valid timers, which one you use right now is basically forced by
> the not-quite-fully-DT-only code and some funky iomap tricks).
>
> And what I got was, if I use the real hardware timer, it runs at 66MHz
> and says it has 15ns resolution and wraps every 500 seconds or so. The
> jiffies timer says it's 750MHz, with a 2ns resoluton.. you get the
> drift. The generic reporting of how "good" the sched_clock source is
> kind of glosses over the quality rating of the clock source and at
> first glance (if you're not paying that much attention), it is a
> little bit misleading..

I've got no clue on this. sched_clock is arch specific, and while ARM 
does use clocksources for sched_clock, what you're seeing is a detail of 
the ARM implementation and not the clocksource code (one complication is 
that clocksources rating values are for the requirements of timekeeping, 
which are different then the requirements for sched_clock - so the 
confusion is understandable).


>> Yes, in the case I was remembering, the 60HZ was driven by the electrical
>> line.
> While I have your attention, what would be the minimum "good" speed to
> run the sched_clock or delay timer implementation from? My rudimentary
> scribblings in my notebook give me a value of "don't bother" with less
> than 10KHz based on HZ=100, so I'm wondering if a direct 32.768KHz
> clock would do (i.MX osc clock input if I can supply it to one of the
> above myriad timers) since this would be low-power compared to a 66MHz
> one (by a couple mA anyway). I also have a bunch of questions about
> the delay timer requirements.. I might mail you personally.. or would
> you prefer on-list?
So there are probably other folks who could better comment on 
sched_clock() or the delay timer (I'm guessing the delay() 
implementation is what you mean by that) design trade-offs.

My first *guess* would be that for delay, you probably want a counter 
that has half-usec granularity or finer (~5Mhz), since udelay is likely 
the most common usage, and coarser then that and you might cause driver 
issues.  Though you could probably get away with a cpu loop based delay 
and avoid requiring a high res counter.

For sched_clock(), the standard reply is probably "as fast and as 
fine-graned as you can get".  But as far as a lower-bound, I'd expect 
the CONFIG_HZ value would be a good bet, as many systems use jiffies for 
their sched_clock without major issue, though I'm sure there are 
interactivity trade-offs.

But again, someone more familiar with the scheduler and driver 
requirements would probably be more informational.

thanks
-john



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22  1:18               ` Russell King - ARM Linux
@ 2013-01-22  1:56                 ` Matt Sealey
  0 siblings, 0 replies; 48+ messages in thread
From: Matt Sealey @ 2013-01-22  1:56 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 7:18 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 21, 2013 at 07:06:59PM -0600, Matt Sealey wrote:
>> On Mon, Jan 21, 2013 at 6:51 PM, John Stultz <john.stultz@linaro.org> wrote:
>> > On 01/21/2013 02:54 PM, Matt Sealey wrote:
>
> sched_clock() has nothing to do with time keeping, and that
> HZ/NO_HZ/HRTIMERS don't affect it (when it isn't being derived from
> jiffies).
>
> Now, sched_clock() is there to give the scheduler a _fast_ to access,
> higher resolution clock than is available from other sources, so that
> there's ways of accurately measuring the amount of time processes run
> for,

That depends on what you meant by timekeeping, right?

I'm really not concerned about the wallclock time, more about the
accuracy of the scheduler clock (tick?), preemption, accurate delays
(i.e. if I msleep(10) does it delay for 10ms or for 40ms because my
delay timer is inaccurate? I'd rather it was better but closer to
10ms), and whether the scheduler (the thing that tells my userspace
whether firefox is running now, or totem, or any other task) is using
the correct high resolution periodic, oneshot, repeatable (however it
repeats) timers *properly* given that this magic config item is
missing on ARM.

That magic config item being CONFIG_SCHED_HRTICK which is referenced a
bunch in kernel/sched/*.[ch] but *ONLY* defined as a Kconfig item in
kernel/Kconfig.hz.

Do we need to copy that Kconfig item out to arch/arm/Kconfig, that's
the question?

>  and other such measurements - and it uses that to determine how
> to schedule a particular task and when to preempt it.
>
> Not providing it means you get those measurements at HZ-based resolution,
> which is suboptimal for tasks which run often for sub-HZ periods (which
> can end up accumulating zero run time.)

Okay, and John said earlier:

John Stultz:
> So I'm actually not super familiar with SCHED_HRTICK details, but from my
> brief skim of it it looks like its useful for turning off the periodic timer
> tick, and allowing the scheduler tick to be triggered by an hrtimer itself
> (There's a number of these interesting inversions that go on in switching to
> HRT mode - for instance, standard timer ticks are switched to being hrtimer
> events themselves).
>
> This likely has the benefit of time-accurate preemption (well, long term, as
> if the timer granularity isn't matching you could be delayed up to a tick -
> but it wouldn't drift).
>
> I'm guessing Thomas would probably know best what the potential issues would
> be from running ((CONFIG_HRTIMER  || CONFIG_NO_HZ) && !CONFIG_SCHED_HRTICK).

If SCHED_HRTICK isn't enabled but setup_sched_clock has been given an
accessor for a real, hardware, fast, high resolution counter that
meets all the needs of sched_clock, what's going on? If it's enabled,
what extra is it doing that, say, my_plat_read_sched_clock doesn't?

--
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22  1:31               ` John Stultz
@ 2013-01-22  2:10                 ` Matt Sealey
  2013-01-31 21:31                   ` Thomas Gleixner
  0 siblings, 1 reply; 48+ messages in thread
From: Matt Sealey @ 2013-01-22  2:10 UTC (permalink / raw)
  To: John Stultz, Thomas Gleixner
  Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra,
	Ingo Molnar, Russell King - ARM Linux

Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.


On Mon, Jan 21, 2013 at 7:31 PM, John Stultz <john.stultz@linaro.org> wrote:
> On 01/21/2013 05:06 PM, Matt Sealey wrote:
>>
>> On Mon, Jan 21, 2013 at 6:51 PM, John Stultz <john.stultz@linaro.org>
>> wrote:
>>>
>>> On 01/21/2013 02:54 PM, Matt Sealey wrote:
>>>>
>>>> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org>
>>>> wrote:
>>>>>
>>>>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>>>
>>> As far as jiffies rating, from jiffies.c:
>>>      .rating        = 1, /* lowest valid rating*/
>>>
>>> So I'm not sure what you mean by "the debug on the kernel log is telling
>>> me
>>> it has a higher resolution".
>>
>> Oh, it is just if I actually don't run setup_sched_clock on my
>> platform, it gives a little message (with #define DEBUG 1 in
>> sched_clock.c) about who setup the last sched_clock. Since you only
>> get one chance, and I was fiddling with setup_sched_clock being probed
>> from multiple possible timers from device tree (i.MX3 has a crapload
>> of valid timers, which one you use right now is basically forced by
>> the not-quite-fully-DT-only code and some funky iomap tricks).
>>
>> And what I got was, if I use the real hardware timer, it runs at 66MHz
>> and says it has 15ns resolution and wraps every 500 seconds or so. The
>> jiffies timer says it's 750MHz, with a 2ns resoluton.. you get the
>> drift. The generic reporting of how "good" the sched_clock source is
>> kind of glosses over the quality rating of the clock source and at
>> first glance (if you're not paying that much attention), it is a
>> little bit misleading..
>
>
> I've got no clue on this. sched_clock is arch specific, and while ARM does
> use clocksources for sched_clock, what you're seeing is a detail of the ARM
> implementation and not the clocksource code (one complication is that
> clocksources rating values are for the requirements of timekeeping, which
> are different then the requirements for sched_clock - so the confusion is
> understandable).
>
>
>
>>> Yes, in the case I was remembering, the 60HZ was driven by the electrical
>>> line.
>>
>> While I have your attention, what would be the minimum "good" speed to
>> run the sched_clock or delay timer implementation from? My rudimentary
>> scribblings in my notebook give me a value of "don't bother" with less
>> than 10KHz based on HZ=100, so I'm wondering if a direct 32.768KHz
>> clock would do (i.MX osc clock input if I can supply it to one of the
>> above myriad timers) since this would be low-power compared to a 66MHz
>> one (by a couple mA anyway). I also have a bunch of questions about
>> the delay timer requirements.. I might mail you personally.. or would
>> you prefer on-list?
>
> So there are probably other folks who could better comment on sched_clock()
> or the delay timer (I'm guessing the delay() implementation is what you mean
> by that) design trade-offs.

I'm specifically talking about if I do

static struct delay_timer imx_gpt_delay_timer = {
        .read_current_timer = imx_gpt_read_current_timer,
};

and then something like:

imx_gpt_delay_timer.freq = clk_get_rate(clk_per);
register_current_timer_delay(&imx_gpt_delay_timer);

In the sense that now (as of kernel 3.7 iirc), I have an ability to
have the delay implementation use this awesome fast accessor (which is
nothing to do with a 'clocksource' as in the subsystem..) to get to my
(here at least) 66.5MHz counter (up or down, i.MX has both, but I
dunno if you can use a down counter for delay_timer, or if that's
preferred, or what.. there are no examples of it.. but it seems to
work.. that said I can't imagine what would be an immediately visible
and not totally random effect of doing it "wrong", maybe that delays
are instantly returned, that could be very hard or impossible to ever
notice compared to not being able to browse the internet on the target
device.. it might pop up on some randomly-not-resetting platform
device or so, though..)

And I can also put sched_clock on a completely different timer. Does
that make any sense at all? I wouldn't know, it's not documented.

And if I wanted to I could register 8 more timers. That seems rather
excessive, but the ability to use those extra 8 as clock outputs from
the SoC or otherwise directly use comparators is useful to some
people, does Linux in general really give a damn about having 8 timers
of the same quality being available when most systems barely have two
clocksources anyway (on x86, tsc and hpet - on ARM I guess twd and
some SoC-specific timer). I dunno how many people might actually want
to define in a device tree, but I figure every single one is not a bad
thing and which ones end up as sched_clock, delay_timer or just plain
registered clocksources, or not registered as a clocksource and
accessed as some kind of comparator through some kooky ioctl API, is
something you would also configure...

> But again, someone more familiar with the scheduler and driver requirements
> would probably be more informational.

Okay. I assume that's a combination of Russell and Thomas..

-- 
Matt Sealey

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 23:23     ` Tony Lindgren
@ 2013-01-22  6:23       ` Santosh Shilimkar
  2013-01-22  9:31         ` Arnd Bergmann
  0 siblings, 1 reply; 48+ messages in thread
From: Santosh Shilimkar @ 2013-01-22  6:23 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Russell King - ARM Linux, Arnd Bergmann, Peter Zijlstra,
	Matt Sealey, LKML, Ben Dooks, Ingo Molnar, John Stultz,
	Linux ARM Kernel ML

On Tuesday 22 January 2013 04:53 AM, Tony Lindgren wrote:
> * Russell King - ARM Linux <linux@arm.linux.org.uk> [130121 13:07]:
>>
>> As for Samsung and the rest I can't comment.  The original reason OMAP
>> used this though was because the 32768Hz counter can't produce 100Hz
>> without a .1% error - too much error under pre-clocksource
>> implementations for timekeeping.  Whether that's changed with the
>> clocksource/clockevent support needs to be checked.
>
> Yes that's why HZ was originally set to 128. That value (or some multiple)
> still makes sense when the 32 KiHZ clock source is being used. Of course
> we should rely on the local timer when running for the SoCs that have
> them.
>
This is right. It was only because of the drift associated when clocked
with 32KHz. Even on SOCs where local timers are available for power
management reasons we need to switch to 32KHz clocked device in
low power states. Hence the HZ value should be multiple of 32 on
OMAP.

Regards
Santosh


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:44         ` Russell King - ARM Linux
@ 2013-01-22  8:27           ` Arnd Bergmann
  0 siblings, 0 replies; 48+ messages in thread
From: Arnd Bergmann @ 2013-01-22  8:27 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: John Stultz, Matt Sealey, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar

On Monday 21 January 2013, Russell King - ARM Linux wrote:
> In this particular case, EBSA110 is not a candidate for multi-arch
> build anyway, because it's ARMv4 and we're only really bothering with
> ARMv6 and better.
> 
> Not only that, but the IO stuff on it is sufficiently obscure and
> non-standard...

Right, no point worrying about EBSA110. We need to work out OMAP and
Exynos/S5P though: As long as OMAP needs 128HZ and Exynos needs 200HZ,
we can never have them in the same kernel.

	Arnd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22  6:23       ` Santosh Shilimkar
@ 2013-01-22  9:31         ` Arnd Bergmann
  2013-01-22 10:14           ` Santosh Shilimkar
  0 siblings, 1 reply; 48+ messages in thread
From: Arnd Bergmann @ 2013-01-22  9:31 UTC (permalink / raw)
  To: Santosh Shilimkar
  Cc: Tony Lindgren, Russell King - ARM Linux, Peter Zijlstra,
	Matt Sealey, LKML, Ben Dooks, Ingo Molnar, John Stultz,
	Linux ARM Kernel ML

On Tuesday 22 January 2013, Santosh Shilimkar wrote:
> On Tuesday 22 January 2013 04:53 AM, Tony Lindgren wrote:
> > * Russell King - ARM Linux <linux@arm.linux.org.uk> [130121 13:07]:
> >>
> >> As for Samsung and the rest I can't comment.  The original reason OMAP
> >> used this though was because the 32768Hz counter can't produce 100Hz
> >> without a .1% error - too much error under pre-clocksource
> >> implementations for timekeeping.  Whether that's changed with the
> >> clocksource/clockevent support needs to be checked.
> >
> > Yes that's why HZ was originally set to 128. That value (or some multiple)
> > still makes sense when the 32 KiHZ clock source is being used. Of course
> > we should rely on the local timer when running for the SoCs that have
> > them.
> >
> This is right. It was only because of the drift associated when clocked
> with 32KHz. Even on SOCs where local timers are available for power
> management reasons we need to switch to 32KHz clocked device in
> low power states. Hence the HZ value should be multiple of 32 on
> OMAP.

I need some help understanding what the two of you are saying, because
it sounds to me that you imply we cannot have a multiplatform kernel
that includes OMAP and another platform that needs (or wants) a different
HZ value.

However, I also thought that when using a proper clocksource driver,
the HZ setting has absolutely no impact on the drift of the wall clock,
because those two are decoupled.

Even when using the HZ based clocksource (for whatever reason you
would want to do that), I thought there should be no drift as long
as the CLOCK_TICK_RATE (in older kernels) or the register_refined_jiffies()
(in older kernels) setting matches the hardware timer frequency.

What am I missing?

	Arnd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22  9:31         ` Arnd Bergmann
@ 2013-01-22 10:14           ` Santosh Shilimkar
  2013-01-22 14:51             ` Russell King - ARM Linux
  0 siblings, 1 reply; 48+ messages in thread
From: Santosh Shilimkar @ 2013-01-22 10:14 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Tony Lindgren, Russell King - ARM Linux, Peter Zijlstra,
	Matt Sealey, LKML, Ben Dooks, Ingo Molnar, John Stultz,
	Linux ARM Kernel ML

On Tuesday 22 January 2013 03:01 PM, Arnd Bergmann wrote:
> On Tuesday 22 January 2013, Santosh Shilimkar wrote:
>> On Tuesday 22 January 2013 04:53 AM, Tony Lindgren wrote:
>>> * Russell King - ARM Linux <linux@arm.linux.org.uk> [130121 13:07]:
>>>>
>>>> As for Samsung and the rest I can't comment.  The original reason OMAP
>>>> used this though was because the 32768Hz counter can't produce 100Hz
>>>> without a .1% error - too much error under pre-clocksource
>>>> implementations for timekeeping.  Whether that's changed with the
>>>> clocksource/clockevent support needs to be checked.
>>>
>>> Yes that's why HZ was originally set to 128. That value (or some multiple)
>>> still makes sense when the 32 KiHZ clock source is being used. Of course
>>> we should rely on the local timer when running for the SoCs that have
>>> them.
>>>
>> This is right. It was only because of the drift associated when clocked
>> with 32KHz. Even on SOCs where local timers are available for power
>> management reasons we need to switch to 32KHz clocked device in
>> low power states. Hence the HZ value should be multiple of 32 on
>> OMAP.
>
> I need some help understanding what the two of you are saying, because
> it sounds to me that you imply we cannot have a multiplatform kernel
> that includes OMAP and another platform that needs (or wants) a different
> HZ value.
>
Sorry for not being clear enough. On OMAP, 32KHz is the only clock which
is always running(even during low power states) and hence the clock
source and clock event have been clocked using 32KHz clock. As mentioned
by RMK, with 32768 Hz clock and HZ = 100, there will be always an
error of 0.1 %. This accuracy also impacts the timer tick interval.
This was the reason, OMAP has been using the HZ = 128.

There is a hardware feature to implement 1 ms correction on the timer
to overcome such an issue but it was not supported on OMAP2 devices.
OMAP3/4/5 does support it. Though one attempt [1] was made to support
it in kernel. This will ofcourse will address the tick interval
corrections.

> However, I also thought that when using a proper clocksource driver,
> the HZ setting has absolutely no impact on the drift of the wall clock,
> because those two are decoupled.
>
I am not too sure about this. I was under impression that tick (clock
event) ticking accuracy does impact the kernel time keeping as well.

> Even when using the HZ based clocksource (for whatever reason you
> would want to do that), I thought there should be no drift as long
> as the CLOCK_TICK_RATE (in older kernels) or the register_refined_jiffies()
> (in older kernels) setting matches the hardware timer frequency.
>
> What am I missing?
>
The issue is with hardware timer frequency itself since with HZ = 100 or
200, the timer tick will not be accurate. Hope this gives bit more info.

Regards,
Santosh
[1] https://patchwork.kernel.org/patch/107364/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 10:14           ` Santosh Shilimkar
@ 2013-01-22 14:51             ` Russell King - ARM Linux
  2013-01-22 15:05               ` Santosh Shilimkar
                                 ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-22 14:51 UTC (permalink / raw)
  To: Santosh Shilimkar
  Cc: Arnd Bergmann, Tony Lindgren, Peter Zijlstra, Matt Sealey, LKML,
	Ben Dooks, Ingo Molnar, John Stultz, Linux ARM Kernel ML

On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote:
> Sorry for not being clear enough. On OMAP, 32KHz is the only clock which
> is always running(even during low power states) and hence the clock
> source and clock event have been clocked using 32KHz clock. As mentioned
> by RMK, with 32768 Hz clock and HZ = 100, there will be always an
> error of 0.1 %. This accuracy also impacts the timer tick interval.
> This was the reason, OMAP has been using the HZ = 128.

Ok.  Let's look at this.  As far as time-of-day is concerned, this
shouldn't really matter with the clocksource/clockevent based system
that we now have (where *important point* platforms have been converted
over.)

Any platform providing a clocksource will override the jiffy-based
clocksource.  The measurement of time-of-day passing is now based on
the difference in values read from the clocksource, not from the actual
tick rate.

Anything _not_ providing a clock source will be reliant on jiffies
incrementing, which in turn _requires_ one timer interrupt per jiffies
at a known rate (which is HZ).

Now, that's the time of day, what about jiffies?  Well, jiffies is
incremented based on a certain number of nsec having passed since the
last jiffy update.  That means the code copes with dropped ticks and
the like.

However, if your actual interrupt rate is close to the desired HZ, then
it can lead to some interesting effects (and noise):

- if the interrupt rate is slightly faster than HZ, then you can end up
  with updates being delayed by 2x interrupt rate.
- if the interrupt rate is slightly slower than HZ, you can occasionally
  end up with jiffies incrementing by two.
- if your interrupt rate is dead on HZ, then other system noise can come
  into effect and you may get maybe zero, one or two jiffy increments per
  interrupt.

(You have to think about time passing in NS, where jiffy updates should
be vs where the timer interrupts happen.)  See tick_do_update_jiffies64()
for the details.

The timer infrastructure is jiffy based - which includes scheduling where
the scheduler does not use hrtimers.  That means a slight discrepency
between HZ and the actual interrupt rate can cause around 1/HZ jitter.
That's a matter of fact due to how the code works.

So, actually, I think the accuracy of HZ has much overall effect _provided_
a platform provides a clocksource to the accuracy of jiffy based timers
nor timekeeping.  For those which don't, the accuracy of the timer
interrupt to HZ is very important.

(This is just based on reading some code and not on practical
experiments - I'd suggest some research of this is done, trying HZ=100
on OMAP's 32kHz timers, checking whether there's any drift, checking
how accurately a single task can be woken from various select/poll/epoll
delays, and checking whether NTP works.)

And I think further discussion is pointless until such research has been
done (or someone who _really_ knows the time keeping/timer/sched code
inside out comments.)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 14:51             ` Russell King - ARM Linux
@ 2013-01-22 15:05               ` Santosh Shilimkar
  2013-01-28  6:08                 ` Santosh Shilimkar
  2013-01-22 17:31               ` Arnd Bergmann
  2013-01-22 18:59               ` John Stultz
  2 siblings, 1 reply; 48+ messages in thread
From: Santosh Shilimkar @ 2013-01-22 15:05 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Tony Lindgren, Peter Zijlstra, Matt Sealey, LKML,
	Ben Dooks, Ingo Molnar, John Stultz, Linux ARM Kernel ML

On Tuesday 22 January 2013 08:21 PM, Russell King - ARM Linux wrote:
> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote:
>> Sorry for not being clear enough. On OMAP, 32KHz is the only clock which
>> is always running(even during low power states) and hence the clock
>> source and clock event have been clocked using 32KHz clock. As mentioned
>> by RMK, with 32768 Hz clock and HZ = 100, there will be always an
>> error of 0.1 %. This accuracy also impacts the timer tick interval.
>> This was the reason, OMAP has been using the HZ = 128.
>
> Ok.  Let's look at this.  As far as time-of-day is concerned, this
> shouldn't really matter with the clocksource/clockevent based system
> that we now have (where *important point* platforms have been converted
> over.)
>
> Any platform providing a clocksource will override the jiffy-based
> clocksource.  The measurement of time-of-day passing is now based on
> the difference in values read from the clocksource, not from the actual
> tick rate.
>
> Anything _not_ providing a clock source will be reliant on jiffies
> incrementing, which in turn _requires_ one timer interrupt per jiffies
> at a known rate (which is HZ).
>
> Now, that's the time of day, what about jiffies?  Well, jiffies is
> incremented based on a certain number of nsec having passed since the
> last jiffy update.  That means the code copes with dropped ticks and
> the like.
>
> However, if your actual interrupt rate is close to the desired HZ, then
> it can lead to some interesting effects (and noise):
>
> - if the interrupt rate is slightly faster than HZ, then you can end up
>    with updates being delayed by 2x interrupt rate.
> - if the interrupt rate is slightly slower than HZ, you can occasionally
>    end up with jiffies incrementing by two.
> - if your interrupt rate is dead on HZ, then other system noise can come
>    into effect and you may get maybe zero, one or two jiffy increments per
>    interrupt.
>
> (You have to think about time passing in NS, where jiffy updates should
> be vs where the timer interrupts happen.)  See tick_do_update_jiffies64()
> for the details.
>
> The timer infrastructure is jiffy based - which includes scheduling where
> the scheduler does not use hrtimers.  That means a slight discrepency
> between HZ and the actual interrupt rate can cause around 1/HZ jitter.
> That's a matter of fact due to how the code works.
>
> So, actually, I think the accuracy of HZ has much overall effect _provided_
> a platform provides a clocksource to the accuracy of jiffy based timers
> nor timekeeping.  For those which don't, the accuracy of the timer
> interrupt to HZ is very important.
>
> (This is just based on reading some code and not on practical
> experiments - I'd suggest some research of this is done, trying HZ=100
> on OMAP's 32kHz timers, checking whether there's any drift, checking
> how accurately a single task can be woken from various select/poll/epoll
> delays, and checking whether NTP works.)
>
Thanks for expanding it. It is really helpful.

> And I think further discussion is pointless until such research has been
> done (or someone who _really_ knows the time keeping/timer/sched code
> inside out comments.)
>
Fully agree about experimentation to re-asses the drift.
 From what I recollect from past, few OMAP customers did
report the time drift issue and that is how the switch
from 100 --> 128 happened.

Anyway I have added the suggested task to my long todo list.

Regards,
Santosh




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 14:51             ` Russell King - ARM Linux
  2013-01-22 15:05               ` Santosh Shilimkar
@ 2013-01-22 17:31               ` Arnd Bergmann
  2013-01-22 18:59               ` John Stultz
  2 siblings, 0 replies; 48+ messages in thread
From: Arnd Bergmann @ 2013-01-22 17:31 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Santosh Shilimkar, Tony Lindgren, Peter Zijlstra, Matt Sealey,
	LKML, Ben Dooks, Ingo Molnar, John Stultz, Linux ARM Kernel ML

On Tuesday 22 January 2013, Russell King - ARM Linux wrote:
> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote:
> > Sorry for not being clear enough. On OMAP, 32KHz is the only clock which
> > is always running(even during low power states) and hence the clock
> > source and clock event have been clocked using 32KHz clock. As mentioned
> > by RMK, with 32768 Hz clock and HZ = 100, there will be always an
> > error of 0.1 %. This accuracy also impacts the timer tick interval.
> > This was the reason, OMAP has been using the HZ = 128.
> 
> Ok.  Let's look at this.  As far as time-of-day is concerned, this
> shouldn't really matter with the clocksource/clockevent based system
> that we now have (where *important point* platforms have been converted
> over.)
>
> Any platform providing a clocksource will override the jiffy-based
> clocksource.  The measurement of time-of-day passing is now based on
> the difference in values read from the clocksource, not from the actual
> tick rate.

Ok, that was my reading as well.

> - if the interrupt rate is slightly faster than HZ, then you can end up
>   with updates being delayed by 2x interrupt rate.
> - if the interrupt rate is slightly slower than HZ, you can occasionally
>   end up with jiffies incrementing by two.
> - if your interrupt rate is dead on HZ, then other system noise can come
>   into effect and you may get maybe zero, one or two jiffy increments per
>   interrupt.
> 
> (You have to think about time passing in NS, where jiffy updates should
> be vs where the timer interrupts happen.)  See tick_do_update_jiffies64()
> for the details.

Ah, right. I forgot about this case. So when we have an accurate clocksource,
rather than relying on the timer tick as the sole source for timekeeping,
the jiffies64 variable may be less accurate (up to almost two jiffies
diff, rather than almost one jiffy).

> The timer infrastructure is jiffy based - which includes scheduling where
> the scheduler does not use hrtimers.  That means a slight discrepency
> between HZ and the actual interrupt rate can cause around 1/HZ jitter.
> That's a matter of fact due to how the code works.

Yes, the two jiffies accuracy I mentioned above would be the result of
the 1 jiffy jitter plus 1 jiffy from the limited resolution.

> So, actually, I think the accuracy of HZ has much overall effect _provided_
> a platform provides a clocksource to the accuracy of jiffy based timers
> nor timekeeping.  For those which don't, the accuracy of the timer
> interrupt to HZ is very important.

This is where I don't see the same problem that you are seeing. Shouldn't
the old ACT_HZ calculation based on CLOCK_TICK_RATE have prevented this?

Note that all PC-like systems traditionally have a CLOCK_TICK_RATE of
1193182 Hz, which does not accurately divide into any of the normal
HZ values, the jiffies clocksource used to have code in it to make
up for this problem. Nowadays, since John's b3c869d35 "jiffies: Remove
compile time assumptions about CLOCK_TICK_RATE" patch in 3.7, the
logic in part of the refined_jiffies clock source that is used currently
only on x86.

I do agree that any platform that is using neither a platform specific
clocksource nor the refined_jiffies would suffer from the drift as
you describe. OMAP was in fact using CLOCK_TICK_RATE correctly, but
is not using the refined_jiffies clocksource now because it has
its own clocksource implementation.

> And I think further discussion is pointless until such research has been
> done (or someone who _really_ knows the time keeping/timer/sched code
> inside out comments.)

Maybe John has some more insights here, he seems to be the one that
understands it better than any of us.

	Arnd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 14:51             ` Russell King - ARM Linux
  2013-01-22 15:05               ` Santosh Shilimkar
  2013-01-22 17:31               ` Arnd Bergmann
@ 2013-01-22 18:59               ` John Stultz
  2013-01-22 21:52                 ` Tony Lindgren
  2 siblings, 1 reply; 48+ messages in thread
From: John Stultz @ 2013-01-22 18:59 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Santosh Shilimkar, Arnd Bergmann, Tony Lindgren, Peter Zijlstra,
	Matt Sealey, LKML, Ben Dooks, Ingo Molnar, Linux ARM Kernel ML

On 01/22/2013 06:51 AM, Russell King - ARM Linux wrote:
> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote:
>> Sorry for not being clear enough. On OMAP, 32KHz is the only clock which
>> is always running(even during low power states) and hence the clock
>> source and clock event have been clocked using 32KHz clock. As mentioned
>> by RMK, with 32768 Hz clock and HZ = 100, there will be always an
>> error of 0.1 %. This accuracy also impacts the timer tick interval.
>> This was the reason, OMAP has been using the HZ = 128.
> Ok.  Let's look at this.  As far as time-of-day is concerned, this
> shouldn't really matter with the clocksource/clockevent based system
> that we now have (where *important point* platforms have been converted
> over.)
>
> Any platform providing a clocksource will override the jiffy-based
> clocksource.  The measurement of time-of-day passing is now based on
> the difference in values read from the clocksource, not from the actual
> tick rate.
>
> Anything _not_ providing a clock source will be reliant on jiffies
> incrementing, which in turn _requires_ one timer interrupt per jiffies
> at a known rate (which is HZ).

Correct. As long as we have a fine-grained hardware clocksource 
installed, HZ error should not affect timekeeping in any major way.


> Now, that's the time of day, what about jiffies?  Well, jiffies is
> incremented based on a certain number of nsec having passed since the
> last jiffy update.  That means the code copes with dropped ticks and
> the like.
>
> However, if your actual interrupt rate is close to the desired HZ, then
> it can lead to some interesting effects (and noise):
>
> - if the interrupt rate is slightly faster than HZ, then you can end up
>    with updates being delayed by 2x interrupt rate.
> - if the interrupt rate is slightly slower than HZ, you can occasionally
>    end up with jiffies incrementing by two.
> - if your interrupt rate is dead on HZ, then other system noise can come
>    into effect and you may get maybe zero, one or two jiffy increments per
>    interrupt.
>
> (You have to think about time passing in NS, where jiffy updates should
> be vs where the timer interrupts happen.)  See tick_do_update_jiffies64()
> for the details.

Correct, with HRT, we actually trigger the HZ-frequency timer tick from 
an hrtimer (which expires based on the system time driven by the 
clocksource). Thus even if there is a theoretical error between the 
ideal HZ and what the hardware can do, that error will not propagate 
forward.

Instead, you may only see timer jitter on the order of how fine-grained 
the timer hardware can be triggered. If that is relatively fine, it 
shouldn't be an issue, if its relatively coarse (closer to HZ), then 
there may be the noise effects you list above. Although that should be 
mostly ok since jiffy timers will always have a few jiffys of jitter due 
to the granularity (ie: when setting a jiffies timer, you don't how how 
far into the current jiffy you are).

In the case where we don't have HRT, and the timers are triggered by the 
HZ periodic interrupt, then there is a mix of possibilities, for 
hrtimers you'll still see the behavior you list above (since they are 
still time based), but for jiffies timers, the rules are mostly inverted 
(if the interrupt rate is fast, jiffies timers will trigger sooner, if 
the rate is slow, jiffies timers will trigger later).

And if you are using jiffies for time (and not using the 
register_refined_jiffies code), then everything will follow the 
interrupt freq. So if interrupts are faster then HZ, time will move 
faster, timers will expire early, etc.


> The timer infrastructure is jiffy based - which includes scheduling where
> the scheduler does not use hrtimers.  That means a slight discrepency
> between HZ and the actual interrupt rate can cause around 1/HZ jitter.
> That's a matter of fact due to how the code works.
>
> So, actually, I think the accuracy of HZ has much overall effect _provided_
> a platform provides a clocksource to the accuracy of jiffy based timers
> nor timekeeping.  For those which don't, the accuracy of the timer
> interrupt to HZ is very important.

I think you're right, but I suspect there are some typos in the above. 
So to clarify:

The accuracy of HZ shouldn't have much affect on timekeeping on systems 
that use fine-grained clocksources. Though for systems that use 
jiffies/arch_gettimeoffset() HZ accuracy is more important. However, the 
register_refined_jiffies() call should allow for smaller error on those 
systems to be corrected.

The accuracy of HZ may have some affect on systems that do not have a 
clockevent driver and do not use hrt mode. It should be relatively bounded


> (This is just based on reading some code and not on practical
> experiments - I'd suggest some research of this is done, trying HZ=100
> on OMAP's 32kHz timers, checking whether there's any drift, checking
> how accurately a single task can be woken from various select/poll/epoll
> delays, and checking whether NTP works.)

Yea, for omap and other more "modern" systems with clocksources and 
clockevents, HZ=100 should be ok. Although I'd still like to see the 
experiments run, since as always, there may be bugs (I'd be interested 
in hearing about).

Even on systems w/o clocksources and clockevents, small HZ error should 
be able to be managed via the register_refined_jiffies() and I'd like to 
hear if folks have issues with that (there may be bounds limits I've not 
run into - so I'd like to get that fixed if so).

The only really problematic cases are systems where there aren't 
clocksources nor clockevents, and the hardware has specific limits on 
what HZ ranges it can do (ie the EBSA110), but I think we're all ok with 
those not being able to be compiled into a multi-platform kernel.

thanks
-john


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 18:59               ` John Stultz
@ 2013-01-22 21:52                 ` Tony Lindgren
  2013-01-23  5:18                   ` Santosh Shilimkar
  0 siblings, 1 reply; 48+ messages in thread
From: Tony Lindgren @ 2013-01-22 21:52 UTC (permalink / raw)
  To: John Stultz
  Cc: Russell King - ARM Linux, Santosh Shilimkar, Arnd Bergmann,
	Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar,
	Linux ARM Kernel ML

* John Stultz <john.stultz@linaro.org> [130122 11:02]:
> 
> Correct, with HRT, we actually trigger the HZ-frequency timer tick
> from an hrtimer (which expires based on the system time driven by
> the clocksource). Thus even if there is a theoretical error between
> the ideal HZ and what the hardware can do, that error will not
> propagate forward.

If there's no cumulative error, sounds like the way to go is to select
HRT for ARM multiplatform builds and set the HZ to 100 then.

Regards,

Tony

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 21:52                 ` Tony Lindgren
@ 2013-01-23  5:18                   ` Santosh Shilimkar
  0 siblings, 0 replies; 48+ messages in thread
From: Santosh Shilimkar @ 2013-01-23  5:18 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: John Stultz, Russell King - ARM Linux, Arnd Bergmann,
	Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar,
	Linux ARM Kernel ML

On Wednesday 23 January 2013 03:22 AM, Tony Lindgren wrote:
> * John Stultz <john.stultz@linaro.org> [130122 11:02]:
>>
>> Correct, with HRT, we actually trigger the HZ-frequency timer tick
>> from an hrtimer (which expires based on the system time driven by
>> the clocksource). Thus even if there is a theoretical error between
>> the ideal HZ and what the hardware can do, that error will not
>> propagate forward.
>
> If there's no cumulative error, sounds like the way to go is to select
> HRT for ARM multiplatform builds and set the HZ to 100 then.
>
HIGH_RES_TIMERS are always enabled by default for OMAP as well as
multi-platform build.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 15:05               ` Santosh Shilimkar
@ 2013-01-28  6:08                 ` Santosh Shilimkar
  2013-01-29  0:01                   ` John Stultz
  0 siblings, 1 reply; 48+ messages in thread
From: Santosh Shilimkar @ 2013-01-28  6:08 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Tony Lindgren, Peter Zijlstra, Matt Sealey, LKML,
	Ben Dooks, Ingo Molnar, John Stultz, Linux ARM Kernel ML

On Tuesday 22 January 2013 08:35 PM, Santosh Shilimkar wrote:
> On Tuesday 22 January 2013 08:21 PM, Russell King - ARM Linux wrote:
>> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote:
>>> Sorry for not being clear enough. On OMAP, 32KHz is the only clock which
>>> is always running(even during low power states) and hence the clock
>>> source and clock event have been clocked using 32KHz clock. As mentioned
>>> by RMK, with 32768 Hz clock and HZ = 100, there will be always an
>>> error of 0.1 %. This accuracy also impacts the timer tick interval.
>>> This was the reason, OMAP has been using the HZ = 128.
>>
>> Ok.  Let's look at this.  As far as time-of-day is concerned, this
>> shouldn't really matter with the clocksource/clockevent based system
>> that we now have (where *important point* platforms have been converted
>> over.)
>>
>> Any platform providing a clocksource will override the jiffy-based
>> clocksource.  The measurement of time-of-day passing is now based on
>> the difference in values read from the clocksource, not from the actual
>> tick rate.
>>
>> Anything _not_ providing a clock source will be reliant on jiffies
>> incrementing, which in turn _requires_ one timer interrupt per jiffies
>> at a known rate (which is HZ).
>>
>> Now, that's the time of day, what about jiffies?  Well, jiffies is
>> incremented based on a certain number of nsec having passed since the
>> last jiffy update.  That means the code copes with dropped ticks and
>> the like.
>>
>> However, if your actual interrupt rate is close to the desired HZ, then
>> it can lead to some interesting effects (and noise):
>>
>> - if the interrupt rate is slightly faster than HZ, then you can end up
>>    with updates being delayed by 2x interrupt rate.
>> - if the interrupt rate is slightly slower than HZ, you can occasionally
>>    end up with jiffies incrementing by two.
>> - if your interrupt rate is dead on HZ, then other system noise can come
>>    into effect and you may get maybe zero, one or two jiffy increments
>> per
>>    interrupt.
>>
>> (You have to think about time passing in NS, where jiffy updates should
>> be vs where the timer interrupts happen.)  See tick_do_update_jiffies64()
>> for the details.
>>
>> The timer infrastructure is jiffy based - which includes scheduling where
>> the scheduler does not use hrtimers.  That means a slight discrepency
>> between HZ and the actual interrupt rate can cause around 1/HZ jitter.
>> That's a matter of fact due to how the code works.
>>
>> So, actually, I think the accuracy of HZ has much overall effect
>> _provided_
>> a platform provides a clocksource to the accuracy of jiffy based timers
>> nor timekeeping.  For those which don't, the accuracy of the timer
>> interrupt to HZ is very important.
>>
>> (This is just based on reading some code and not on practical
>> experiments - I'd suggest some research of this is done, trying HZ=100
>> on OMAP's 32kHz timers, checking whether there's any drift, checking
>> how accurately a single task can be woken from various select/poll/epoll
>> delays, and checking whether NTP works.)
>>
> Thanks for expanding it. It is really helpful.
>
>> And I think further discussion is pointless until such research has been
>> done (or someone who _really_ knows the time keeping/timer/sched code
>> inside out comments.)
>>
> Fully agree about experimentation to re-asses the drift.
>  From what I recollect from past, few OMAP customers did
> report the time drift issue and that is how the switch
> from 100 --> 128 happened.
>
> Anyway I have added the suggested task to my long todo list.
>
So I tried to see if any time drift with HZ = 100 on OMAP. I ran the
setup for 62 hours and 27 mins with time synced up once with NTP server.
I measure about ~174 millisecond drift which is almost noise considering
the observed duration was ~224820000 milliseconds.

Am re-running the setup with HZ = 128 for similar time frame to see if
the minimal drift observed goes away.

Once through that, I will send a patch to update the OMAP to use
HZ = 100 and possibly get rid of the custom OMAP HZ config.

Regards,
Santosh



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-28  6:08                 ` Santosh Shilimkar
@ 2013-01-29  0:01                   ` John Stultz
  2013-01-29  6:43                     ` Santosh Shilimkar
  0 siblings, 1 reply; 48+ messages in thread
From: John Stultz @ 2013-01-29  0:01 UTC (permalink / raw)
  To: Santosh Shilimkar
  Cc: Russell King - ARM Linux, Arnd Bergmann, Tony Lindgren,
	Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar,
	Linux ARM Kernel ML

On 01/27/2013 10:08 PM, Santosh Shilimkar wrote:
> On Tuesday 22 January 2013 08:35 PM, Santosh Shilimkar wrote:
>> On Tuesday 22 January 2013 08:21 PM, Russell King - ARM Linux wrote:
>>> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote:
>>>> Sorry for not being clear enough. On OMAP, 32KHz is the only clock 
>>>> which
>>>> is always running(even during low power states) and hence the clock
>>>> source and clock event have been clocked using 32KHz clock. As 
>>>> mentioned
>>>> by RMK, with 32768 Hz clock and HZ = 100, there will be always an
>>>> error of 0.1 %. This accuracy also impacts the timer tick interval.
>>>> This was the reason, OMAP has been using the HZ = 128.
>>>
>>> Ok.  Let's look at this.  As far as time-of-day is concerned, this
>>> shouldn't really matter with the clocksource/clockevent based system
>>> that we now have (where *important point* platforms have been converted
>>> over.)
>>>
>>> Any platform providing a clocksource will override the jiffy-based
>>> clocksource.  The measurement of time-of-day passing is now based on
>>> the difference in values read from the clocksource, not from the actual
>>> tick rate.
>>>
>>> Anything _not_ providing a clock source will be reliant on jiffies
>>> incrementing, which in turn _requires_ one timer interrupt per jiffies
>>> at a known rate (which is HZ).
>>>
>>> Now, that's the time of day, what about jiffies?  Well, jiffies is
>>> incremented based on a certain number of nsec having passed since the
>>> last jiffy update.  That means the code copes with dropped ticks and
>>> the like.
>>>
>>> However, if your actual interrupt rate is close to the desired HZ, then
>>> it can lead to some interesting effects (and noise):
>>>
>>> - if the interrupt rate is slightly faster than HZ, then you can end up
>>>    with updates being delayed by 2x interrupt rate.
>>> - if the interrupt rate is slightly slower than HZ, you can 
>>> occasionally
>>>    end up with jiffies incrementing by two.
>>> - if your interrupt rate is dead on HZ, then other system noise can 
>>> come
>>>    into effect and you may get maybe zero, one or two jiffy increments
>>> per
>>>    interrupt.
>>>
>>> (You have to think about time passing in NS, where jiffy updates should
>>> be vs where the timer interrupts happen.)  See 
>>> tick_do_update_jiffies64()
>>> for the details.
>>>
>>> The timer infrastructure is jiffy based - which includes scheduling 
>>> where
>>> the scheduler does not use hrtimers.  That means a slight discrepency
>>> between HZ and the actual interrupt rate can cause around 1/HZ jitter.
>>> That's a matter of fact due to how the code works.
>>>
>>> So, actually, I think the accuracy of HZ has much overall effect
>>> _provided_
>>> a platform provides a clocksource to the accuracy of jiffy based timers
>>> nor timekeeping.  For those which don't, the accuracy of the timer
>>> interrupt to HZ is very important.
>>>
>>> (This is just based on reading some code and not on practical
>>> experiments - I'd suggest some research of this is done, trying HZ=100
>>> on OMAP's 32kHz timers, checking whether there's any drift, checking
>>> how accurately a single task can be woken from various 
>>> select/poll/epoll
>>> delays, and checking whether NTP works.)
>>>
>> Thanks for expanding it. It is really helpful.
>>
>>> And I think further discussion is pointless until such research has 
>>> been
>>> done (or someone who _really_ knows the time keeping/timer/sched code
>>> inside out comments.)
>>>
>> Fully agree about experimentation to re-asses the drift.
>>  From what I recollect from past, few OMAP customers did
>> report the time drift issue and that is how the switch
>> from 100 --> 128 happened.
>>
>> Anyway I have added the suggested task to my long todo list.
>>
> So I tried to see if any time drift with HZ = 100 on OMAP. I ran the
> setup for 62 hours and 27 mins with time synced up once with NTP server.
> I measure about ~174 millisecond drift which is almost noise considering
> the observed duration was ~224820000 milliseconds.

So 174ms drift doesn't sound great, as < 2ms (often much less - though 
that depends on how close the server is) can be expected with NTP. 
Although its not clear how you were measuring: Did you see a max 174ms 
offset while trying to sync with NTP? Was that offset shortly after 
starting NTP or after NTP converged down?

thanks
-john


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-29  0:01                   ` John Stultz
@ 2013-01-29  6:43                     ` Santosh Shilimkar
  2013-01-29 10:06                       ` Russell King - ARM Linux
  2013-01-29 18:43                       ` John Stultz
  0 siblings, 2 replies; 48+ messages in thread
From: Santosh Shilimkar @ 2013-01-29  6:43 UTC (permalink / raw)
  To: John Stultz
  Cc: Russell King - ARM Linux, Arnd Bergmann, Tony Lindgren,
	Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar,
	Linux ARM Kernel ML

Jon,

On Tuesday 29 January 2013 05:31 AM, John Stultz wrote:
> On 01/27/2013 10:08 PM, Santosh Shilimkar wrote:
>> On Tuesday 22 January 2013 08:35 PM, Santosh Shilimkar wrote:
>>> On Tuesday 22 January 2013 08:21 PM, Russell King - ARM Linux wrote:
>>>> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote:

[..]

>>> Thanks for expanding it. It is really helpful.
>>>
>>>> And I think further discussion is pointless until such research has
>>>> been
>>>> done (or someone who _really_ knows the time keeping/timer/sched code
>>>> inside out comments.)
>>>>
>>> Fully agree about experimentation to re-asses the drift.
>>>  From what I recollect from past, few OMAP customers did
>>> report the time drift issue and that is how the switch
>>> from 100 --> 128 happened.
>>>
>>> Anyway I have added the suggested task to my long todo list.
>>>
>> So I tried to see if any time drift with HZ = 100 on OMAP. I ran the
>> setup for 62 hours and 27 mins with time synced up once with NTP server.
>> I measure about ~174 millisecond drift which is almost noise considering
>> the observed duration was ~224820000 milliseconds.
>
> So 174ms drift doesn't sound great, as < 2ms (often much less - though
> that depends on how close the server is) can be expected with NTP.
> Although its not clear how you were measuring: Did you see a max 174ms
> offset while trying to sync with NTP? Was that offset shortly after
> starting NTP or after NTP converged down?
>
To avoid the server latency, we didn't do continuous sync. The time was 
synced in the beginning and after 62.5 hours (#ntpd -qg) and the drift
of about 174 ms was observed. As you said this could be because of
server sync time along with probably some addition from system calls
from #ntpd. As mentioned, the other run with HZ = 128 which started
15 hours 20 mins is already showing about 24 mS drift now. I will
let it run for couple of more days just to have similar duration run.

Regards,
santosh




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-29  6:43                     ` Santosh Shilimkar
@ 2013-01-29 10:06                       ` Russell King - ARM Linux
  2013-01-29 18:43                       ` John Stultz
  1 sibling, 0 replies; 48+ messages in thread
From: Russell King - ARM Linux @ 2013-01-29 10:06 UTC (permalink / raw)
  To: Santosh Shilimkar
  Cc: John Stultz, Arnd Bergmann, Tony Lindgren, Peter Zijlstra,
	Matt Sealey, LKML, Ben Dooks, Ingo Molnar, Linux ARM Kernel ML

On Tue, Jan 29, 2013 at 12:13:46PM +0530, Santosh Shilimkar wrote:
> To avoid the server latency, we didn't do continuous sync. The time was  
> synced in the beginning and after 62.5 hours (#ntpd -qg) and the drift
> of about 174 ms was observed. As you said this could be because of
> server sync time along with probably some addition from system calls
> from #ntpd. As mentioned, the other run with HZ = 128 which started
> 15 hours 20 mins is already showing about 24 mS drift now. I will
> let it run for couple of more days just to have similar duration run.

Hmm.  I wonder if ntpd -qg will cause ntp to read the drift file and
adjust the kernel time keeping using that information...

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-29  6:43                     ` Santosh Shilimkar
  2013-01-29 10:06                       ` Russell King - ARM Linux
@ 2013-01-29 18:43                       ` John Stultz
  1 sibling, 0 replies; 48+ messages in thread
From: John Stultz @ 2013-01-29 18:43 UTC (permalink / raw)
  To: Santosh Shilimkar
  Cc: Russell King - ARM Linux, Arnd Bergmann, Tony Lindgren,
	Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar,
	Linux ARM Kernel ML

On 01/28/2013 10:43 PM, Santosh Shilimkar wrote:
> Jon,
>
> On Tuesday 29 January 2013 05:31 AM, John Stultz wrote:
>> On 01/27/2013 10:08 PM, Santosh Shilimkar wrote:
>>> On Tuesday 22 January 2013 08:35 PM, Santosh Shilimkar wrote:
>>>> On Tuesday 22 January 2013 08:21 PM, Russell King - ARM Linux wrote:
>>>>> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote:
>
> [..]
>
>>>> Thanks for expanding it. It is really helpful.
>>>>
>>>>> And I think further discussion is pointless until such research has
>>>>> been
>>>>> done (or someone who _really_ knows the time keeping/timer/sched code
>>>>> inside out comments.)
>>>>>
>>>> Fully agree about experimentation to re-asses the drift.
>>>>  From what I recollect from past, few OMAP customers did
>>>> report the time drift issue and that is how the switch
>>>> from 100 --> 128 happened.
>>>>
>>>> Anyway I have added the suggested task to my long todo list.
>>>>
>>> So I tried to see if any time drift with HZ = 100 on OMAP. I ran the
>>> setup for 62 hours and 27 mins with time synced up once with NTP 
>>> server.
>>> I measure about ~174 millisecond drift which is almost noise 
>>> considering
>>> the observed duration was ~224820000 milliseconds.
>>
>> So 174ms drift doesn't sound great, as < 2ms (often much less - though
>> that depends on how close the server is) can be expected with NTP.
>> Although its not clear how you were measuring: Did you see a max 174ms
>> offset while trying to sync with NTP? Was that offset shortly after
>> starting NTP or after NTP converged down?
>>
> To avoid the server latency, we didn't do continuous sync. The time 
> was synced in the beginning and after 62.5 hours (#ntpd -qg) and the 
> drift
> of about 174 ms was observed. As you said this could be because of
> server sync time along with probably some addition from system calls
> from #ntpd. 

Ahh.. Ok. Thanks for the clarification. After a one time sync, ~774ppb 
drift is surprisingly good!


> As mentioned, the other run with HZ = 128 which started
> 15 hours 20 mins is already showing about 24 mS drift now. I will
> let it run for couple of more days just to have similar duration run.

Yea, this is also great drift wise (but its not surprising, as both 
cases we're keeping time off of the same clocksource, and HZ shouldn't 
come into play). But its good to have the timekeeping side validated.

thanks
-john







^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22  2:10                 ` Matt Sealey
@ 2013-01-31 21:31                   ` Thomas Gleixner
  0 siblings, 0 replies; 48+ messages in thread
From: Thomas Gleixner @ 2013-01-31 21:31 UTC (permalink / raw)
  To: Matt Sealey
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
	Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux

On Mon, 21 Jan 2013, Matt Sealey wrote:
> And if I wanted to I could register 8 more timers. That seems rather
> excessive, but the ability to use those extra 8 as clock outputs from
> the SoC or otherwise directly use comparators is useful to some
> people, does Linux in general really give a damn about having 8 timers
> of the same quality being available when most systems barely have two
> clocksources anyway (on x86, tsc and hpet - on ARM I guess twd and
> some SoC-specific timer). I dunno how many people might actually want

If you want to use that timers just for delivering arbitrary timer
events, then no. There is no point to have a gazillion of timer
interrupts happening w/o being coordinated. We have a pretty well
structured timer event infrastructure for precise and more timeout
oriented events, which are pretty happy to be served by a single per
cpu event device.

If you want to use the extra timers for other purposes (PWM, timer
triggered DMA transfers, etc...) then they are not in any way related
to the timers/timekeeping core.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2013-01-31 21:31 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-21 20:01 One of these things (CONFIG_HZ) is not like the others Matt Sealey
2013-01-21 20:41 ` Arnd Bergmann
2013-01-21 21:00   ` John Stultz
2013-01-21 21:12     ` Russell King - ARM Linux
2013-01-21 22:18       ` John Stultz
2013-01-21 22:44         ` Russell King - ARM Linux
2013-01-22  8:27           ` Arnd Bergmann
2013-01-21 22:20       ` Matt Sealey
2013-01-21 22:42         ` Russell King - ARM Linux
2013-01-21 23:23           ` Matt Sealey
2013-01-21 23:49             ` Russell King - ARM Linux
2013-01-22  0:09               ` Matt Sealey
2013-01-22  0:26                 ` Matt Sealey
2013-01-21 21:14     ` Matt Sealey
2013-01-21 22:36       ` John Stultz
2013-01-21 22:49         ` Russell King - ARM Linux
2013-01-21 22:54         ` Matt Sealey
2013-01-21 23:13           ` Russell King - ARM Linux
2013-01-21 23:30             ` Matt Sealey
2013-01-22  0:02               ` Russell King - ARM Linux
2013-01-22  0:38           ` John Stultz
2013-01-22  0:51           ` John Stultz
2013-01-22  1:06             ` Matt Sealey
2013-01-22  1:18               ` Russell King - ARM Linux
2013-01-22  1:56                 ` Matt Sealey
2013-01-22  1:31               ` John Stultz
2013-01-22  2:10                 ` Matt Sealey
2013-01-31 21:31                   ` Thomas Gleixner
2013-01-21 21:02   ` Matt Sealey
2013-01-21 22:30     ` Arnd Bergmann
2013-01-21 22:45       ` Russell King - ARM Linux
2013-01-21 23:01         ` Matt Sealey
2013-01-21 21:03   ` Russell King - ARM Linux
2013-01-21 23:23     ` Tony Lindgren
2013-01-22  6:23       ` Santosh Shilimkar
2013-01-22  9:31         ` Arnd Bergmann
2013-01-22 10:14           ` Santosh Shilimkar
2013-01-22 14:51             ` Russell King - ARM Linux
2013-01-22 15:05               ` Santosh Shilimkar
2013-01-28  6:08                 ` Santosh Shilimkar
2013-01-29  0:01                   ` John Stultz
2013-01-29  6:43                     ` Santosh Shilimkar
2013-01-29 10:06                       ` Russell King - ARM Linux
2013-01-29 18:43                       ` John Stultz
2013-01-22 17:31               ` Arnd Bergmann
2013-01-22 18:59               ` John Stultz
2013-01-22 21:52                 ` Tony Lindgren
2013-01-23  5:18                   ` Santosh Shilimkar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).