* One of these things (CONFIG_HZ) is not like the others..
From: Matt Sealey @ 2013-01-21 20:01 UTC
To: Linux ARM Kernel ML
Cc: Arnd Bergmann, LKML, Peter Zijlstra, Ingo Molnar,
    Russell King - ARM Linux

Hello all,

Understanding that this is a bit of a digression, I have a related
nitpick to discussion of the patch "arm: kconfig: don't select TWD
with local timer for Armada 370/XP" which is allowing me to explain
myself a little better given Arnd's recommendation for it, since I was
looking for a really good way to describe it without seeming too
focused on a particular configuration item..

So, to recap, there is a discussion going on about where HAVE_ lives
and what ARCH_MULTIPLATFORM breaks when using HAVE_. I think this is
related, at least, to configuration reworks to make ARCH_MULTIPLATFORM
a truly "inclusive" place..

ARM seems to be the only "major" platform not using the
kernel/Kconfig.hz definitions, instead rolling its own and setting
what could be described as both reasonable and unreasonable defaults
for platforms. If we're going wholesale for multiplatform on ARM then
having CONFIG_HZ be selected dependent on platform options seems
rather curious since building a kernel for Exynos, OMAP or so will
force the default to a value which is not truly desired by the
maintainers.

    config HZ
            int
            default 200 if ARCH_EBSA110 || ARCH_S3C24XX || ARCH_S5P64X0 || \
                    ARCH_S5PV210 || ARCH_EXYNOS4
            default OMAP_32K_TIMER_HZ if ARCH_OMAP && OMAP_32K_TIMER
            default AT91_TIMER_HZ if ARCH_AT91
            default SHMOBILE_TIMER_HZ if ARCH_SHMOBILE
            default 100

There is a patch floating around ("ARM: OMAP2+: timer: remove
CONFIG_OMAP_32K_TIMER") which modifies the OMAP line, so I'll ignore
that for my below example, and I saw a patch for adding Exynos5
processors to the top default somewhere around here.

So, based on those getting in, in my case here, I can see a situation
where;

* I build multiplatform for i.MX6 and Exynos4/5 ARCH_MULTIPLATFORM, I
  will get CONFIG_HZ=200.

* If I built for just i.MX6, I will get CONFIG_HZ=100.

Either way, if I boot a kernel on i.MX6, CONFIG_HZ depends on the
other ARM platforms I also want to boot on it.. this is not exactly
multiplatform compliant, right?

In fact, if I want any other value without meeting any of the other
defaults I am *forced* to have a CONFIG_HZ value of 100 (running
oldconfig will set any value back to this), because none of the
standard (100/300/1000 as I see on x86 and PPC) selection entries or
the override control are present or sourced in the main
arch/arm/Kconfig.

This seems infuriatingly inconsistent - and I am absolutely sure that
the default for Samsung platforms is basically totally unreasonable
(and definitely not multiplatform-aware) behavior in forcing some
default setting. For AT91 and SHMOBILE, I am not sure at all.. given
the need for the OMAP platform to know what its timer frequency is,
maybe they can be worked around the same way as the OMAP patch so the
dependencies get removed, but I also don't understand why the actual
value CONFIG_HZ would really matter in these cases (except that it
would stop the kernel trying to check or queue timer events more often
than the timer is capable of running.. surely this is a runtime issue
and proper use of the sched_clock implementation handles this?)

This could in theory be resolved by having the arch-specific Kconfigs
add for example CONFIG_HZ_MY_ARCH (similar to kernel/Kconfig.hz's
CONFIG_HZ_1000 which selects 1000 as the "default") and selecting it
if !ARCH_MULTIPLATFORM, which keeps these special little "my arch is
different to your arch" quirks out of a core configuration file. That
way Exynos-only kernels keep their 200, and AT91 keeps its.. whatever
that config item resolves to (128 I think), and they would pop up in
the list with 100/300/1000.
Also, on ARCH_MULTIPLATFORM kernels, the default-setting behavior is
turned off, so all you'd see is 100/300/1000 and an opportunity to set
your own value.

This is, I think, what should be the case - that rather than
"magically" selecting CONFIG_HZ's value, it should be up to the
configurator (individual, maintainer shipping a defconfig,
distribution) of the kernel. And, why not document that "foo" arch
runs better with "CONFIG_HZ_MY_ARCH" and instruct configurators of the
kernel to do the right thing, or pick the average value, or specific
lowest-common-denominator value, instead of forcing the value to the
default for the highest/lowest/random arch that met the dependency of
the "default" directive? The Kconfig system isn't smart enough to
handle this automatically for multiplatform.

Additionally, using kernel/Kconfig.hz is a prerequisite for enabling
(forced enabling, even) CONFIG_SCHED_HRTICK which is defined nowhere
else. I don't know how many ARM systems here benefit from this, if
there is a benefit, or what this really means.. if you really have a
high resolution timer (and hrtimers enabled) that would assist the
scheduler this way, is it supposed to make a big difference to the way
the scheduler works for the better or worse? Is this actually
overridden by ARM sched_clock handling or so? Shouldn't there be a
help entry or some documentation for what this option does? I have
CC'd the scheduler maintainers because I'd really like to know what I
am doing here before I venture into putting patches out which could
potentially rip open spacetime and have us all sucked in..

And I guess I have one more question before I do attempt to open that
tear, what really is the effect of CONFIG_HZ vs. CONFIG_NO_HZ vs. ARM
sched_clock, and usage of the new hooks to register a real timer as
ARM delay_timer?
I have patches I can modify for upstream that add both device tree
implementation and probing of i.MX highres clocksources (GPT and EPIT)
and registration of sched_clock and delay timer implementations based
on these clocks, but while the code compiles and seems to work, the
ACTUAL effect of these (and the fundamental requirements for the
clocks being used) seems to be information only in the minds of the
people who wrote the code. It's not that obvious to me what the true
effect of using a non-architected ARM core timer for at least the
delay_timer is, and I have some really odd lpj values and very strange
re-calibrations popping out (with constant rate for the timer, lpj
goes down.. when using the delay_timer implementation, shouldn't lpj
be still relative to the timer rate and NOT cpu frequency?) when using
cpufreq on i.MX5 when I do it, and whether CONFIG_SCHED_HRTICK is a
good or bad idea..

Apologies for the insane number of questions here, but fully
appreciative of any answers,

--
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.
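
The multiplatform scenario in the message above follows from Kconfig
using the first "default" line whose condition holds. What follows is a
minimal sketch of that rule, not kernel code: it models only the literal
200 and 100 defaults from the quoted fragment, and default_hz and the
ARCH_MXC stand-in for an i.MX6 build are illustrative assumptions.

```python
# Simplified model of how Kconfig resolves an option that has several
# "default ... if ..." lines: the first default whose condition holds wins.
# The OMAP/AT91/SHMOBILE timer symbols are deliberately not modelled.

HZ_200_PLATFORMS = {
    "ARCH_EBSA110", "ARCH_S3C24XX", "ARCH_S5P64X0",
    "ARCH_S5PV210", "ARCH_EXYNOS4",
}


def default_hz(enabled):
    """Return the default HZ for a set of enabled ARCH_* symbols."""
    defaults = [
        (200, lambda e: bool(e & HZ_200_PLATFORMS)),
        (100, lambda e: True),  # unconditional fallback, listed last
    ]
    for value, condition in defaults:
        if condition(enabled):
            return value
    raise AssertionError("unreachable: the last default is unconditional")


# "ARCH_MXC" stands in for an i.MX6 build here (illustrative assumption).
imx6_only = default_hz({"ARCH_MXC"})
imx6_plus_exynos4 = default_hz({"ARCH_MXC", "ARCH_EXYNOS4"})
```

Under these assumptions the i.MX6-only build resolves to 100 and the
combined build to 200, which is the inconsistency the message describes:
the value seen on i.MX6 depends on which other platforms were enabled.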
* Re: One of these things (CONFIG_HZ) is not like the others..
From: Arnd Bergmann @ 2013-01-21 20:41 UTC
To: Matt Sealey
Cc: Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar,
    Russell King - ARM Linux, John Stultz, Ben Dooks

On Monday 21 January 2013, Matt Sealey wrote:
>
> ARM seems to be the only "major" platform not using the
> kernel/Kconfig.hz definitions, instead rolling its own and setting
> what could be described as both reasonable and unreasonable defaults
> for platforms. If we're going wholesale for multiplatform on ARM then
> having CONFIG_HZ be selected dependent on platform options seems
> rather curious since building a kernel for Exynos, OMAP or so will
> force the default to a value which is not truly desired by the
> maintainers.

Agreed 100%. (adding John Stultz to Cc, he's the local time expert)

> config HZ
>         int
>         default 200 if ARCH_EBSA110 || ARCH_S3C24XX || ARCH_S5P64X0 || \
>                 ARCH_S5PV210 || ARCH_EXYNOS4
>         default OMAP_32K_TIMER_HZ if ARCH_OMAP && OMAP_32K_TIMER
>         default AT91_TIMER_HZ if ARCH_AT91
>         default SHMOBILE_TIMER_HZ if ARCH_SHMOBILE
>         default 100
>
> There is a patch floating around ("ARM: OMAP2+: timer: remove
> CONFIG_OMAP_32K_TIMER")
> which modifies the OMAP line, so I'll ignore that for my below
> example, and I saw a patch for adding Exynos5 processors to the top
> default somewhere around here.
>
> So, based on those getting in, in my case here, I can see a situation where;
>
> * I build multiplatform for i.MX6 and Exynos4/5 ARCH_MULTIPLATFORM, I
> will get CONFIG_HZ=200.
>
> * If I built for just i.MX6, I will get CONFIG_HZ=100.
>
> Either way, if I boot a kernel on i.MX6, CONFIG_HZ depends on the
> other ARM platforms I also want to boot on it.. this is not exactly
> multiplatform compliant, right?

Right. It's pretty clear that the above logic does not work with
multiplatform.
Maybe we should just make ARCH_MULTIPLATFORM select NO_HZ to make the
question much less interesting.

Regarding the defaults, I would suggest putting all the defaults into
the defconfig files and removing the other hardcoding otherwise. Ben
Dooks and Russell are probably the best to know what triggered the
200 HZ for s3c24xx and for ebsa110. My guess is that the other samsung
ones are the result of cargo cult programming.

at91 and omap set the HZ value to something that is derived from their
hardware timer, but we have also forever had logic to calculate the
exact time when that does not match. This code has very recently been
moved into the new register_refined_jiffies() function. John can
probably tell us if this solves all the problems for these platforms.

> Additionally, using kernel/Kconfig.hz is a prerequisite for enabling
> (forced enabling, even) CONFIG_SCHED_HRTICK which is defined nowhere
> else. I don't know how many ARM systems here benefit from this, if
> there is a benefit, or what this really means.. if you really have a
> high resolution timer (and hrtimers enabled) that would assist the
> scheduler this way, is it supposed to make a big difference to the way
> the scheduler works for the better or worse? Is this actually
> overridden by ARM sched_clock handling or so? Shouldn't there be a
> help entry or some documentation for what this option does? I have
> CC'd the scheduler maintainers because I'd really like to know what I
> am doing here before I venture into putting patches out which could
> potentially rip open spacetime and have us all sucked in..

Yes, that sounds like yet another bug.

	Arnd
* Re: One of these things (CONFIG_HZ) is not like the others..
From: John Stultz @ 2013-01-21 21:00 UTC
To: Arnd Bergmann
Cc: Matt Sealey, Linux ARM Kernel ML, LKML, Peter Zijlstra,
    Ingo Molnar, Russell King - ARM Linux, Ben Dooks

On 01/21/2013 12:41 PM, Arnd Bergmann wrote:
> On Monday 21 January 2013, Matt Sealey wrote:
>> config HZ
>>         int
>>         default 200 if ARCH_EBSA110 || ARCH_S3C24XX || ARCH_S5P64X0 || \
>>                 ARCH_S5PV210 || ARCH_EXYNOS4
>>         default OMAP_32K_TIMER_HZ if ARCH_OMAP && OMAP_32K_TIMER
>>         default AT91_TIMER_HZ if ARCH_AT91
>>         default SHMOBILE_TIMER_HZ if ARCH_SHMOBILE
>>         default 100
>>
>> There is a patch floating around ("ARM: OMAP2+: timer: remove
>> CONFIG_OMAP_32K_TIMER")
>> which modifies the OMAP line, so I'll ignore that for my below
>> example, and I saw a patch for adding Exynos5 processors to the top
>> default somewhere around here.
>>
>> So, based on those getting in, in my case here, I can see a situation where;
>>
>> * I build multiplatform for i.MX6 and Exynos4/5 ARCH_MULTIPLATFORM, I
>> will get CONFIG_HZ=200.
>>
>> * If I built for just i.MX6, I will get CONFIG_HZ=100.
>>
>> Either way, if I boot a kernel on i.MX6, CONFIG_HZ depends on the
>> other ARM platforms I also want to boot on it.. this is not exactly
>> multiplatform compliant, right?
> Right. It's pretty clear that the above logic does not work
> with multiplatform. Maybe we should just make ARCH_MULTIPLATFORM
> select NO_HZ to make the question much less interesting.

Although, even with NO_HZ, we still have some sense of HZ.

> Regarding the defaults, I would suggest putting all the
> defaults into the defconfig files and removing the other hardcoding
> otherwise. Ben Dooks and Russell are probably the best to know
> what triggered the 200 HZ for s3c24xx and for ebsa110.
> My guess is that the other samsung ones are the result of cargo cult
> programming.
>
> at91 and omap set the HZ value to something that is derived
> from their hardware timer, but we have also forever had logic
> to calculate the exact time when that does not match. This code
> has very recently been moved into the new register_refined_jiffies()
> function. John can probably tell us if this solves all the problems
> for these platforms.

Yea, as far as timekeeping is concerned, we shouldn't be HZ dependent
(and the register_refined_jiffies is really only necessary if you're
not expecting a proper clocksource to eventually be registered),
assuming the hardware can do something close to the HZ value
requested.

So I'd probably want to hear about what history caused the specific
200 HZ selections, as I suspect there's actual hardware limitations
there. So if you can not get actual timer ticks any faster than 200 HZ
on that hardware, setting HZ higher could cause some jiffies related
timer trouble (ie: if the kernel thinks HZ is 1000 but the hardware
can only do 200, that's a different problem than if the hardware
actually can only do 999.8 HZ). So things like timer-wheel timeouts
may not happen when they should.

I suspect the best approach for multi-arch in those cases may be to
select HZ=100 and use HRT to allow more modern systems to have
finer-grained timers.

thanks
-john
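
The 1000 vs. 200 vs. 999.8 comparison in the message above can be put
in numbers. This is a rough sketch only: it assumes jiffies advances by
exactly one per timer interrupt and that nothing corrects for the
mismatch, and wall_time_ms is a hypothetical helper, not a kernel
function.

```python
def wall_time_ms(timeout_ms, config_hz, actual_tick_hz):
    """Wall-clock duration of a jiffies-based timeout, in milliseconds.

    Assumes one jiffy per tick interrupt and no correction for the
    difference between the configured and the real tick rate.
    """
    # The timeout is converted to jiffies using the configured HZ...
    jiffies_to_wait = timeout_ms * config_hz // 1000
    # ...but each jiffy really takes 1/actual_tick_hz seconds to elapse.
    return jiffies_to_wait * 1000 / actual_tick_hz


# Kernel believes HZ=1000, hardware only ticks at 200 Hz.
badly_off = wall_time_ms(100, 1000, 200)
# Kernel believes HZ=1000, hardware ticks at 999.8 Hz.
slightly_off = wall_time_ms(100, 1000, 999.8)
```

With these assumptions a 100 ms timeout stretches to 500 ms in the
first case and to roughly 100.02 ms in the second, which is why the two
are described as different problems.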
* Re: One of these things (CONFIG_HZ) is not like the others..
From: Russell King - ARM Linux @ 2013-01-21 21:12 UTC
To: John Stultz
Cc: Arnd Bergmann, Matt Sealey, Linux ARM Kernel ML, LKML,
    Peter Zijlstra, Ingo Molnar, Ben Dooks

On Mon, Jan 21, 2013 at 01:00:15PM -0800, John Stultz wrote:
> So if you can not get actual timer ticks any faster than 200 HZ on that
> hardware, setting HZ higher could cause some jiffies related timer
> trouble

Err, no John. It's the other way around - especially on some platforms
which are incapable of being converted to the clock source support.

EBSA110 has _one_ counter. It counts down at a certain rate, and when
it rolls over from 0 to FFFF, it produces an interrupt and continues
counting down from FFFF.

To produce anything close to a reasonable regular tick rate from that,
the only way to do it is - with interrupts disabled - read the current
value to find out how far the timer has rolled over, and set it so that
the next event will expire as close as possible to the desired HZ rate.

So, none of the clockevent stuff can be used; and we rely _purely_ on
counting interrupts in jiffy based increments to provide any reference
of time.

Moreover, because the counter is only 16-bit, and it's clocked from
something around 7MHz, well, maths will tell you why 200Hz had to be
chosen rather than 100Hz.
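
The "maths" mentioned at the end of the message above can be spelled
out. This is a small sketch under one stated assumption: the clock is
taken as exactly 7,000,000 Hz, whereas the message only says "around
7MHz". The helper names are hypothetical.

```python
COUNTER_BITS = 16
CLOCK_HZ = 7_000_000  # assumed; the message only says "around 7MHz"


def reload_count(tick_hz, clock_hz=CLOCK_HZ):
    """Counter clock cycles between interrupts for the requested tick rate."""
    return round(clock_hz / tick_hz)


def fits_counter(tick_hz, clock_hz=CLOCK_HZ, bits=COUNTER_BITS):
    """True if one tick period fits in a single run of the counter."""
    return reload_count(tick_hz, clock_hz) <= 2 ** bits


counts_at_100hz = reload_count(100)  # cycles needed for a 10 ms period
counts_at_200hz = reload_count(200)  # cycles needed for a 5 ms period
```

At the assumed clock, a 100 Hz tick needs 70000 counts, more than the
65536 a 16-bit counter can hold, while a 200 Hz tick needs 35000 and
fits.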
* Re: One of these things (CONFIG_HZ) is not like the others..
From: John Stultz @ 2013-01-21 22:18 UTC
To: Russell King - ARM Linux
Cc: Arnd Bergmann, Matt Sealey, Linux ARM Kernel ML, LKML,
    Peter Zijlstra, Ingo Molnar, Ben Dooks

On 01/21/2013 01:12 PM, Russell King - ARM Linux wrote:
> On Mon, Jan 21, 2013 at 01:00:15PM -0800, John Stultz wrote:
>> So if you can not get actual timer ticks any faster than 200 HZ on that
>> hardware, setting HZ higher could cause some jiffies related timer
>> trouble
> Err, no John. It's the other way around - especially on some platforms
> which are incapable of being converted to the clock source support.
>
> EBSA110 has _one_ counter. It counts down at a certain rate, and when
> it rolls over from 0 to FFFF, it produces an interrupt and continues
> counting down from FFFF.
>
> To produce anything close to a reasonable regular tick rate from that,
> the only way to do it is - with interrupts disabled - read the current
> value to find out how far the timer has rolled over, and set it so that
> the next event will expire as close as possible to the desired HZ rate.
>
> So, none of the clockevent stuff can be used; and we rely _purely_ on
> counting interrupts in jiffy based increments to provide any reference
> of time.
> Moreover, because the counter is only 16-bit, and it's clocked from
> something around 7MHz, well, maths will tell you why 200Hz had to be
> chosen rather than 100Hz.

Ah, so the counter can't do anything *lower* than ~107HZ, right?
(7MHZ/2^16)

So we used to have the ACTHZ code to handle error from the HZ rate
requested and the HZ rate possible given the underlying hardware.
That's been moved to the register_refined_jiffies(), but do you have a
sense if there is a reason it couldn't be used?
I don't quite recall the bounds at this second, so ~7% error might very well be too large. So yes, I suspect these sorts of platforms, where there are no modern clocksource/clockevent driver, as well as further constraints (like specific HZ) are likely not good candidates for a multi-arch build. thanks -john ^ permalink raw reply [flat|nested] 96+ messages in thread
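The ~107HZ and ~7% figures in the message above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes a 16-bit counter clocked at exactly 7 MHz (the thread only says "something around 7MHz"), and the names `CLOCK_HZ`, `min_tick_rate` and `counts_per_tick` are illustrative, not taken from any kernel source.

```python
# Back-of-the-envelope check of the EBSA110 numbers quoted in this thread.
# Assumption: a 16-bit counter clocked at exactly 7 MHz (the thread only
# says "something around 7MHz"), so all results are approximate.

COUNTER_BITS = 16
CLOCK_HZ = 7_000_000  # assumed input clock


def min_tick_rate(clock_hz, bits):
    """Slowest interrupt rate: one interrupt per full 2^bits counts."""
    return clock_hz / (1 << bits)


def counts_per_tick(clock_hz, hz):
    """Counter ticks that must elapse between interrupts for a given HZ."""
    return clock_hz / hz


if __name__ == "__main__":
    slowest = min_tick_rate(CLOCK_HZ, COUNTER_BITS)
    print(f"slowest possible tick rate: {slowest:.1f} Hz")
    for hz in (100, 200):
        counts = counts_per_tick(CLOCK_HZ, hz)
        fits = counts <= (1 << COUNTER_BITS)
        print(f"HZ={hz}: {counts:.0f} counts per tick, fits in 16 bits: {fits}")
    # Relative error if HZ=100 is requested but the hardware delivers `slowest`.
    print(f"error when asking for HZ=100: {(slowest - 100) / 100:.1%}")
```

Under that assumption, HZ=100 would need 70000 counts per tick, which does not fit in 16 bits, while HZ=200 needs 35000, which does.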
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:18 ` John Stultz
@ 2013-01-21 22:44 ` Russell King - ARM Linux
  -1 siblings, 0 replies; 96+ messages in thread
From: Russell King - ARM Linux @ 2013-01-21 22:44 UTC (permalink / raw)
  To: John Stultz
  Cc: Arnd Bergmann, Matt Sealey, Linux ARM Kernel ML, LKML,
      Peter Zijlstra, Ingo Molnar, Ben Dooks

On Mon, Jan 21, 2013 at 02:18:20PM -0800, John Stultz wrote:
> So we used to have the ACTHZ code to handle error from the HZ rate
> requested and the HZ rate possible given the underlying hardware. That's
> been moved to the register_refined_jiffies(), but do you have a sense if
> there is a reason it couldn't be used? I don't quite recall the bounds at
> this second, so ~7% error might very well be too large.
>
> So yes, I suspect these sorts of platforms, where there is no modern
> clocksource/clockevent driver, as well as further constraints (like
> specific HZ) are likely not good candidates for a multi-arch build.

In this particular case, EBSA110 is not a candidate for multi-arch
build anyway, because it's ARMv4 and we're only really bothering with
ARMv6 and better.

Not only that, but the IO stuff on it is sufficiently obscure and
non-standard...
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:44 ` Russell King - ARM Linux
@ 2013-01-22  8:27 ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2013-01-22 8:27 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: John Stultz, Matt Sealey, Linux ARM Kernel ML, LKML,
      Peter Zijlstra, Ingo Molnar

On Monday 21 January 2013, Russell King - ARM Linux wrote:
> In this particular case, EBSA110 is not a candidate for multi-arch
> build anyway, because it's ARMv4 and we're only really bothering with
> ARMv6 and better.
>
> Not only that, but the IO stuff on it is sufficiently obscure and
> non-standard...

Right, no point worrying about EBSA110. We need to work out OMAP and
Exynos/S5P though: As long as OMAP needs 128HZ and Exynos needs 200HZ,
we can never have them in the same kernel.

	Arnd
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 21:12 ` Russell King - ARM Linux
@ 2013-01-21 22:20 ` Matt Sealey
  -1 siblings, 0 replies; 96+ messages in thread
From: Matt Sealey @ 2013-01-21 22:20 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
      Peter Zijlstra, Ingo Molnar, Ben Dooks

On Mon, Jan 21, 2013 at 3:12 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 21, 2013 at 01:00:15PM -0800, John Stultz wrote:
>> So if you can not get actual timer ticks any faster than 200 HZ on that
>> hardware, setting HZ higher could cause some jiffies related timer
>> trouble
>
> Err, no John. It's the other way around - especially on some platforms
> which are incapable of being converted to the clock source support.
>
> EBSA110 has _one_ counter. It counts down at a certain rate, and when
> it rolls over from 0 to FFFF, it produces an interrupt and continues
> counting down from FFFF.
>
> To produce anything close to a reasonable regular tick rate from that,
> the only way to do it is - with interrupts disabled - read the current
> value to find out how far the timer has rolled over, and set it so that
> the next event will expire as close as possible to the desired HZ rate.
>
> So, none of the clockevent stuff can be used; and we rely _purely_ on
> counting interrupts in jiffy based increments to provide any reference
> of time.
>
> Moreover, because the counter is only 16-bit, and it's clocked from
> something around 7MHz, well, maths will tell you why 200Hz had to be
> chosen rather than 100Hz.

I am sorry if it sounded as if I was being high and mighty about not
being able to select my own HZ (or being forced by Exynos to be 200 or
by not being able to test an Exynos board, forced to default to 100). My
real "grievance" here is we got a configuration item for the scheduler
which is being left out of ARM configurations which *can* use high
resolution timers, but I don't know if this is a real problem or not,
hence asking about it, and that HZ=100 is the ARM default whether we
might be able to select that or not.. which seems low.

HZ=250 is the "current" kernel default if you don't touch anything, it
seems, apologies for thinking it was HZ=100. And that is too high for
EBSA110 and a couple of other boards, especially where HZ must equal
some exact divisor being pumped right into some timer unit.

Understood. Surely the correct divisor should be *derived* from HZ and
not just dumped into the timer though, so HZ being set to an exact
divisor (but a round-down-to-acceptable-value) is kind of a hacky
concept..?

For the global kernel guys, I'd ask what is the reasoning for using
HZ=250 by default, I wonder? It seems like this number is from the
dark ages (pre-git, pre-bitkeeper, maybe pre-recorded history ;) and
the reason is lost. Why not HZ=100 or HZ=300 (if the help text is to
be believed, and it is probably older than God, HZ=300 is great for
playing back NTSC-format video.. :)? I can side with you on the
premise that in actual fact, defining a default HZ value in the
non-arch-specific kernel proper is a little quirky and it should be
something the arches do themselves (i.e. move the default-setting
stuff at the end into the arch/*/Kconfig - I would expect that now
i386 CPU support is gone from arch/x86, there's potentially a better
value than HZ=250 for the default?).

Anyway, a patch for ARM could perhaps end up like this:

~~
if ARCH_MULTIPLATFORM
source kernel/Kconfig.hz
else
HZ
    default 100
endif

HZ
    default 200 if ARCH_EBSA110 || ARCH_ETC_ETC || ARCH_UND_SO_WEITER
    # any previous platform definitions where *really* required here.
    # but not default 100 since it would override kernel/Kconfig.hz every time
~~

Which preserves all previous behaviors on all possible ARM arch
combinations, but where no reasonable override is set.. Kconfig.hz is
king. I cannot imagine any situation except for AT91 or OMAP could not
do this in their own {mach,plat}-*/Kconfigs and not in the core
config, which cleans up the extra HZ block.

We can agree that the "default 200 if.." list is unwieldy and Arnd is
right in that there is some cargo-cult programming going on here,
right? Even if we assume EBSA110 and a couple others are really
affected by having such timer setups, therefore "reasonable", I'd
challenge anyone to tell me Exynos4 or the S5P platforms do not have
high resolution timers capable of handling more than HZ=200 (or the
default HZ=250) which I would class as "unreasonable".. this is why I
said it was possibly both. I am not one to judge some of these
platforms I've never even heard of, that is why I am *asking* about it
before I even think of doing anything about it.

I tested this a few weeks ago with a *few* defconfigs (by sourcing
Kconfig.hz above the existing HZ definitions) and it does effectively
override the value I went in and stabbed into menuconfig, in the
resultant generated local .config file - if they themselves are
sourced AFTER the source kernel/Kconfig.hz (which they pretty much
are) in arch/arm/Kconfig.

Could we also at least agree that if EBSA110 can handle HZ=200 with a
16-bit timer, or HZ=128 for OMAP and that AT91 will override it to 100
on its own, then that "default 100" is overly restrictive and we
could remove it, allowing each {mach,plat}-*/Kconfig owner to
investigate and find the correct HZ value and implement an override or
selection, or just allow free configuration?

As far as I can tell AT91 and SHMOBILE only supply defaults because HZ
*must* meet some exact timer divisor (OMAP says "Kernel internal timer
frequency should be a divisor of 32768") in which case their timer
drivers should not be so stupid and instead round down to the nearest
acceptable timer divisor or WARN_ON if the compile-time values are
unacceptable at runtime before anyone sees any freakish behavior. Is
it a hard requirement for the ARM architecture that a woefully
mis-configured kernel MUST boot completely to userspace?

--
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.
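The "round down to the nearest acceptable timer divisor, or warn" idea in the message above can be sketched roughly as follows. This is an illustration only: it assumes the constraint is exactly the one in the quoted OMAP help text (HZ should be a divisor of 32768), and `nearest_divisor_hz` is a hypothetical helper, not an existing kernel function. In the kernel the warning would be something like WARN_ON; here Python's `warnings` module stands in for it.

```python
# Sketch only: checks HZ candidates against the "divisor of 32768" rule
# quoted from the OMAP help text and rounds down when the rule is violated.
# nearest_divisor_hz() is a hypothetical helper, not a kernel API.
import warnings

TIMER_CLOCK_HZ = 32768  # the 32 kHz timer named in the OMAP help text


def is_exact_divisor(hz, clock_hz=TIMER_CLOCK_HZ):
    """True if the timer clock divides evenly into `hz` ticks per second."""
    return clock_hz % hz == 0


def nearest_divisor_hz(hz, clock_hz=TIMER_CLOCK_HZ):
    """Return hz if it divides clock_hz, otherwise the next lower divisor."""
    if hz < 1:
        raise ValueError("HZ must be a positive integer")
    candidate = hz
    while clock_hz % candidate != 0:
        candidate -= 1  # always terminates: 1 divides everything
    if candidate != hz:
        warnings.warn(f"HZ={hz} does not divide {clock_hz}; using {candidate}")
    return candidate


if __name__ == "__main__":
    for hz in (100, 128, 200, 250, 1000):
        print(hz, "->", nearest_divisor_hz(hz))
```

Because 32768 is a power of two, only power-of-two HZ values pass unchanged under this rule: 128 is accepted as-is, while 100 and 200 would be rounded down to 64 and 128 respectively.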
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:20 ` Matt Sealey
@ 2013-01-21 22:42 ` Russell King - ARM Linux
  -1 siblings, 0 replies; 96+ messages in thread
From: Russell King - ARM Linux @ 2013-01-21 22:42 UTC (permalink / raw)
  To: Matt Sealey
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
      Peter Zijlstra, Ingo Molnar, Ben Dooks

On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
> I am sorry if it sounded as if I was being high and mighty about not
> being able to select my own HZ (or being forced by Exynos to be 200 or
> by not being able to test an Exynos board, forced to default to 100). My
> real "grievance" here is we got a configuration item for the scheduler
> which is being left out of ARM configurations which *can* use high
> resolution timers, but I don't know if this is a real problem or not,
> hence asking about it, and that HZ=100 is the ARM default whether we
> might be able to select that or not.. which seems low.

Well, I have a versatile platform here. It's the intelligence behind
the power control system for booting the boards on the nightly tests
(currently disabled because I'm waiting for my main server to lock up
again, and I need to use one of the serial ports for that.)

The point is, it talks via I2C to a load of power monitors to read
samples out. It does this at sub-100Hz intervals. Yet the kernel is
built with HZ=100. NO_HZ=y and highres timers are enabled... works
fine.

So, no, HZ=100 is not a limit in that scenario. With NO_HZ=y and
highres timers, it all works with epoll() - you get the interval that
you're after. I've verified this with calls to gettimeofday() and
the POSIX clocks.

> HZ=250 is the "current" kernel default if you don't touch anything, it
> seems, apologies for thinking it was HZ=100.

Actually, it always used to be 100Hz on everything, including x86.
It got upped when there were interactivity issues... which haven't
been reported on ARM - so why change something that we know works and
everyone is happy with?

> And that is too high for
> EBSA110 and a couple of other boards, especially where HZ must equal
> some exact divisor being pumped right into some timer unit.

EBSA110 can do 250Hz, but it'll mean manually recalculating the timer
arithmetic - because it's not a "reloading" counter - software has to
manually reload it, and you have to take account of how far it's
rolled over to get anything close to a regular interrupt rate which
NTP is happy with. And believe me, it used to be one of two main NTP
broadcasting servers on my network, so I know it works.

> Understood. Surely the correct divisor should be *derived* from HZ and
> not just dumped into the timer though, so HZ being set to an exact
> divisor (but a round-down-to-acceptable-value) is kind of a hacky
> concept..?

No. See above. It's not a simple bit of maths. You need to know how
fast the CPU runs, and how many instructions it takes to read the
current value, modify it, write it back and factor that into the
calculation. Get it wrong - by even as little as one count - and the
error is too large, and NTP fails to sync.

> For the global kernel guys, I'd ask what is the reasoning for using
> HZ=250 by default, I wonder? It seems like this number is from the
> dark ages (pre-git, pre-bitkeeper, maybe pre-recorded history ;) and
> the reason is lost. Why not HZ=100 or HZ=300 (if the help text is to
> be believed, and it is probably older than God, HZ=300 is great for
> playing back NTSC-format video.. :)? I can side with you on the
> premise that in actual fact, defining a default HZ value in the
> non-arch-specific kernel proper is a little quirky and it should be
> something the arches do themselves (i.e. move the default-setting
> stuff at the end into the arch/*/Kconfig - I would expect that now
> i386 CPU support is gone from arch/x86, there's potentially a better
> value than HZ=250 for the default?).

From what I remember, the history is that HZ used to be 100. Then it
became 1000 as an experiment to do with desktop interactivity. That
was found to be too heavy, so it was then dropped by a factor of 4 as
a compromise. That's why kernel/Kconfig.hz has 100, 250 and 1000 -
those are the values which were tried on x86 many years ago.

>
> Anyway, a patch for ARM could perhaps end up like this:
>
> ~~
> if ARCH_MULTIPLATFORM
> source kernel/Kconfig.hz
> else
> HZ
>     default 100
> endif
>
> HZ
>     default 200 if ARCH_EBSA110 || ARCH_ETC_ETC || ARCH_UND_SO_WEITER
>     # any previous platform definitions where *really* required here.
>     # but not default 100 since it would override kernel/Kconfig.hz every time

That doesn't work - if you define the same symbol twice, one definition
takes priority over the other (I don't remember which way it works).
They don't accumulate.

> Which preserves all previous behaviors on all possible ARM arch
> combinations, but where no reasonable override is set.. Kconfig.hz is
> king. I cannot imagine any situation except for AT91 or OMAP could not
> do this in their own {mach,plat}-*/Kconfigs and not in the core
> config, which cleans up the extra HZ block.

Because... it simply doesn't work like that. Try it and check to see
what Kconfig produces.

We know this, because our FRAME_POINTER config overrides the generic
one - not partially, but totally and utterly in every way.

> Could we also at least agree that if EBSA110 can handle HZ=200 with a
> 16-bit timer, or HZ=128 for OMAP and that AT91 will override it to 100
> on its own, then that "default 100" is overly restrictive and we
> could remove it, allowing each {mach,plat}-*/Kconfig owner to
> investigate and find the correct HZ value and implement an override or
> selection, or just allow free configuration?

I just don't see how that's remotely possible.
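The software reload Russell describes for the EBSA110 (read the counter, work out how far it has run since the rollover, write back a corrected value) can be modelled in a few lines. This is a simplified model, not the EBSA110 driver: it assumes the counter wraps from 0 to 0xFFFF and interrupts at that wrap, that a written value v wraps again after v + 1 counts, and that a known number of counts (`overhead`) elapses between reading and writing. The 35000-count period assumes a 7 MHz clock and HZ=200; `reload_value` and `simulate` are hypothetical names.

```python
# Simplified model of compensating a non-reloading 16-bit down-counter.
# Assumptions (illustrative, not taken from EBSA110 documentation):
#  - the counter wraps 0 -> 0xFFFF and raises the interrupt at that wrap;
#  - a value v written to the counter wraps again after v + 1 counts;
#  - `overhead` counts elapse between reading the counter and writing it.

WRAP = 0x10000


def reload_value(current, period, overhead=0):
    """Value to write so the next wrap lands `period` counts after the last."""
    elapsed = (WRAP - 1 - current) + overhead  # counts since the last wrap
    remaining = period - elapsed               # counts still to wait
    if not 0 < remaining <= WRAP:
        raise ValueError("interrupt serviced too late for this period")
    return remaining - 1


def simulate(period, latencies, overhead=0):
    """Wrap-to-wrap interval for each interrupt service latency (in counts)."""
    intervals = []
    for latency in latencies:
        current = WRAP - 1 - latency  # what the handler reads
        written = reload_value(current, period, overhead)
        # latency before the read, overhead before the write, then the
        # written value runs down and wraps after written + 1 counts.
        intervals.append(latency + overhead + written + 1)
    return intervals


if __name__ == "__main__":
    # Period of 35000 counts: assumed 7 MHz clock at HZ=200.
    print(simulate(35000, [3, 10, 250], overhead=12))
```

In this model the interval stays at exactly 35000 counts regardless of how late the interrupt is serviced, but only if `overhead` matches the real read-modify-write cost; an `overhead` that is off by one count shifts every tick by one count.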
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:42 ` Russell King - ARM Linux
@ 2013-01-21 23:23 ` Matt Sealey
  -1 siblings, 0 replies; 96+ messages in thread
From: Matt Sealey @ 2013-01-21 23:23 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
      Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 4:42 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
>> I am sorry if it sounded as if I was being high and mighty about not
>> being able to select my own HZ (or being forced by Exynos to be 200 or
>> by not being able to test an Exynos board, forced to default to 100). My
>> real "grievance" here is we got a configuration item for the scheduler
>> which is being left out of ARM configurations which *can* use high
>> resolution timers, but I don't know if this is a real problem or not,
>> hence asking about it, and that HZ=100 is the ARM default whether we
>> might be able to select that or not.. which seems low.
>
> Well, I have a versatile platform here. It's the intelligence behind
> the power control system for booting the boards on the nightly tests
> (currently disabled because I'm waiting for my main server to lock up
> again, and I need to use one of the serial ports for that.)
>
> The point is, it talks via I2C to a load of power monitors to read
> samples out. It does this at sub-100Hz intervals. Yet the kernel is
> built with HZ=100. NO_HZ=y and highres timers are enabled... works
> fine.
>
> So, no, HZ=100 is not a limit in that scenario. With NO_HZ=y and
> highres timers, it all works with epoll() - you get the interval that
> you're after. I've verified this with calls to gettimeofday() and
> the POSIX clocks.

Okay.

So, can you read this (it's short):

http://ck.kolivas.org/patches/bfs/bfs-configuration-faq.txt

And please tell me if he's batshit crazy and I should completely
ignore any scheduler discussion that isn't ARM-specific, or maybe..
and I can almost guarantee this, he doesn't have an ARM platform so
he's just delightfully ill-informed about anything but his quad-core
x86?

>> HZ=250 is the "current" kernel default if you don't touch anything, it
>> seems, apologies for thinking it was HZ=100.
>
> Actually, it always used to be 100Hz on everything, including x86.
> It got upped when there were interactivity issues... which haven't
> been reported on ARM - so why change something that we know works and
> everyone is happy with?

I don't know. I guess this is why I included Ingo and Peter as they
seem to be responsible for core HZ-related things; why have HZ=250 on
x86 when CONFIG_NO_HZ and HZ=100 would work just as effectively? Isn't
CONFIG_NO_HZ the default on x86 and PPC and.. pretty much everything
else?

I know Con K. has been accused many times of peddling snake-oil... but
he has pretty graphs and benchmarks that kind of bear him out on most
things even if the results do not get his work upstream. I can't fault
the statistical significance of his results.. but even a placebo
effect can be graphed, correlation is not causation, etc, etc. - I
don't know if anything real filters down into the documentation
though.

>> And that is too high for
>> EBSA110 and a couple of other boards, especially where HZ must equal
>> some exact divisor being pumped right into some timer unit.
>
> EBSA110 can do 250Hz, but it'll mean manually recalculating the timer
> arithmetic - because it's not a "reloading" counter - software has to
> manually reload it, and you have to take account of how far it's
> rolled over to get anything close to a regular interrupt rate which
> NTP is happy with. And believe me, it used to be one of two main NTP
> broadcasting servers on my network, so I know it works.

A-ha...

>> Anyway, a patch for ARM could perhaps end up like this:
>>
>> ~~
>> if ARCH_MULTIPLATFORM
>> source kernel/Kconfig.hz
>> else
>> HZ
>>     default 100
>> endif
>>
>> HZ
>>     default 200 if ARCH_EBSA110 || ARCH_ETC_ETC || ARCH_UND_SO_WEITER
>>     # any previous platform definitions where *really* required here.
>>     # but not default 100 since it would override kernel/Kconfig.hz every time
>
> That doesn't work - if you define the same symbol twice, one definition
> takes priority over the other (I don't remember which way it works).
> They don't accumulate.

Well I did some testing.. a couple days of poking around, and they
don't need to accumulate.

> Because... it simply doesn't work like that. Try it and check to see
> what Kconfig produces.

I did test it.. whatever you define last, sticks, and it's down to the
order they're parsed in the tree - luckily, arch/arm/Kconfig is
sourced first, which sources the mach/plat stuff way down at the
bottom. As long as you have your "default" set somewhere, any further
default just has to be sourced or added later in *one* of the
Kconfigs, same as building any C file with "gcc -E" and spitting it
out.

Someone, at the end of it all, has to set some default, and as long as
the one you want is the last one, everything is shiny.

> We know this, because our FRAME_POINTER config overrides the generic
> one - not partially, but totally and utterly in every way.

But for something as simple as CONFIG_HZ getting a value.. it works
okay.

If Kconfig.hz sets CONFIG_HZ=250 because CONFIG_HZ_250 is default yes,
and CONFIG_HZ defaults to 250 if it's set, and then you put

HZ
    default 100

right after it, or right after it's sourced in arch/x86/Kconfig, or
whatever, that "default" is what sticks and what ends up in CONFIG_HZ
in the local .config.

> I just don't see how that's remotely possible.

Maybe I tested it wrong, you'd know better than I exactly how (and I
would appreciate knowing how so I can go back and test it again :)

--
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 23:23 ` Matt Sealey
@ 2013-01-21 23:49 ` Russell King - ARM Linux
  -1 siblings, 0 replies; 96+ messages in thread
From: Russell King - ARM Linux @ 2013-01-21 23:49 UTC (permalink / raw)
  To: Matt Sealey
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
      Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 05:23:33PM -0600, Matt Sealey wrote:
> On Mon, Jan 21, 2013 at 4:42 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
> >> I am sorry if it sounded as if I was being high and mighty about not
> >> being able to select my own HZ (or being forced by Exynos to be 200 or
> >> by not being able to test an Exynos board, forced to default to 100). My
> >> real "grievance" here is we got a configuration item for the scheduler
> >> which is being left out of ARM configurations which *can* use high
> >> resolution timers, but I don't know if this is a real problem or not,
> >> hence asking about it, and that HZ=100 is the ARM default whether we
> >> might be able to select that or not.. which seems low.
> >
> > Well, I have a versatile platform here. It's the intelligence behind
> > the power control system for booting the boards on the nightly tests
> > (currently disabled because I'm waiting for my main server to lock up
> > again, and I need to use one of the serial ports for that.)
> >
> > The point is, it talks via I2C to a load of power monitors to read
> > samples out. It does this at sub-100Hz intervals. Yet the kernel is
> > built with HZ=100. NO_HZ=y and highres timers are enabled... works
> > fine.
> >
> > So, no, HZ=100 is not a limit in that scenario. With NO_HZ=y and
> > highres timers, it all works with epoll() - you get the interval that
> > you're after. I've verified this with calls to gettimeofday() and
> > the POSIX clocks.
>
> Okay.
>
> So, can you read this (it's short):
>
> http://ck.kolivas.org/patches/bfs/bfs-configuration-faq.txt
>
> And please tell me if he's batshit crazy and I should completely
> ignore any scheduler discussion that isn't ARM-specific, or maybe..
> and I can almost guarantee this, he doesn't have an ARM platform so
> he's just delightfully ill-informed about anything but his quad-core
> x86?

Well... my x86 laptop is... HZ=1000, NO_HZ, HIGH_RES enabled, ondemand...
doesn't really fit into any of those categories given there. I'd suggest
that what's given there is a suggestion/opinion based on behaviours
observed on x86 platforms.

Whether it's appropriate for other architectures is not really a proven
point - is it worth running ARM at 1000Hz when the load from running at
100Hz is measurable as a definite error in loops_per_jiffy calibration?
Remember - the load from the interrupt handler at 1000Hz is 10x the load
at 100Hz.

Do you want to spend more cycles per second on the possibly multi-layer
IRQ servicing and timer servicing?

And what about the interrupt latency issue that we've hit several times
already with devices taking longer than 10ms to service their peripherals
because the driver doesn't make use of delayed works/tasklets/etc.

The lack of reasonable device DMA too has an impact for many drivers - the
CPU has to spend more time in interrupt handlers (which are now run to the
exclusion of any other interrupt in the system) performing PIO - or in the
case of those systems which _do_ have DMA, they may end up having to do
cache maintenance over large cache ranges from IRQ context which x86
doesn't have to do.

There are many factors here, and the choice of what the right HZ is for a
platform is not as clear cut as one may think. Given all the additional
overheads we have on ARM because of the lack of memory coherency, the
generally bad DMA support, etc, I think what we currently have is still
right as an architecture default - 100Hz.

> I did test it.. whatever you define last, sticks, and it's down to the
> order they're parsed in the tree - luckily, arch/arm/Kconfig is
> sourced first, which sources the mach/plat stuff way down at the
> bottom. As long as you have your "default" set somewhere, any further
> default just has to be sourced or added later in *one* of the
> Kconfigs, same as building any C file with "gcc -E" and spitting it
> out.
>
> Someone, at the end of it all, has to set some default, and as long as
> the one you want is the last one, everything is shiny.

Actually, we're both wrong. There seem to be two things which
influence it, and it basically comes down to this:

- the value a particular symbol has comes from the _first_ declaration
  in which a value is assigned to the symbol.

So:

    config HZ
        int
        default 300

    config HZ
        int
        default 100 if OPT1
        default 200 if OPT2
        default 400

takes on the value of 300 no matter what combination of OPT1 and OPT2
are enabled.

    config HZ
        int
        default 100 if OPT1
        default 200 if OPT2
        default 400

    config HZ
        int
        default 300

never takes the value 300, but 100, 200 or 400.

    config HZ
        int
        default 100 if OPT1
        default 200 if OPT2

    config HZ
        int
        default 300

Will now take 100, 200, or 300 depending on which of OPT1/OPT2 are enabled.

So, we _can_ use kernel/Kconfig.hz, but it's not very nice at all: we will
be presenting users with configuration options for the HZ value which will
be _silently_ ignored by Kconfig if we have a platform which overrides this.

Probably fine if you think that Kconfig is a developer's tool and you edit
the configuration files (and therefore you expect them to know what they're
doing, and how this stuff works), but not if you think that Kconfig users
should be presented with meaningful options when configuring their kernel.

^ permalink raw reply	[flat|nested] 96+ messages in thread
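[Editorial sketch, not part of the original mail.] The rule Russell describes above (the first `default` whose condition holds wins, scanning the definitions of a symbol in parse order) can be modelled in a few lines of Python. This is only a toy model of that stated behaviour, not the real Kconfig parser: it ignores prompts, user-set values, ranges and compound expressions, and `resolve_default` is a made-up name.

```python
from typing import Optional, Sequence, Set, Tuple

# One "default" line: (value, condition). A condition of None means the
# default is unconditional; otherwise it is a single symbol name.
Default = Tuple[int, Optional[str]]


def resolve_default(definitions: Sequence[Sequence[Default]],
                    enabled: Set[str]) -> Optional[int]:
    """Return the first default whose condition holds, in parse order."""
    for definition in definitions:          # each "config HZ" block, in order
        for value, condition in definition:  # each "default ..." line, in order
            if condition is None or condition in enabled:
                return value
    return None  # no default applied


# The three layouts from the mail above.
CASE_1 = [[(300, None)],
          [(100, "OPT1"), (200, "OPT2"), (400, None)]]
CASE_2 = [[(100, "OPT1"), (200, "OPT2"), (400, None)],
          [(300, None)]]
CASE_3 = [[(100, "OPT1"), (200, "OPT2")],
          [(300, None)]]

if __name__ == "__main__":
    for name, case in (("case 1", CASE_1), ("case 2", CASE_2), ("case 3", CASE_3)):
        for enabled in (set(), {"OPT1"}, {"OPT2"}):
            print(name, sorted(enabled), resolve_default(case, enabled))
```

Under this model, case 1 always yields 300, case 2 yields 100, 200 or 400 but never 300, and case 3 yields 100, 200 or 300, which matches the three outcomes listed in the mail.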
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 23:49 ` Russell King - ARM Linux
@ 2013-01-22  0:09 ` Matt Sealey
  -1 siblings, 0 replies; 96+ messages in thread
From: Matt Sealey @ 2013-01-22  0:09 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML,
      Peter Zijlstra, Ingo Molnar

Okay, so the final resolution of this is:

1) That the arch/arm/Kconfig HZ block is suffering from some cruft

I think we could all be fairly confident that Exynos4 or S5P does not
require HZ=200 - in theory, it has no such timer restrictions like
EBSA110 (the docs I have show a perfectly capable 32-bit timer with a
double-digits MHz input clock, since these are multimedia-class SoCs
it'd be seriously f**ked up if they didn't).

But while some of the entries on this line may be cargo-cult
programming, the original addition on top of EBSA110 *may* be one of
your "unreported" responsiveness issues.

We could just let some Samsung employees complain when Android 6.x
starts to get laggy with a 3.8 kernel because we forced their HZ=100.

What I would do is predicate a fixed, obvious default on
ARCH_MULTIPLATFORM so that it would get the benefit of a default HZ
that you agree with. It wouldn't CHANGE anything, but it makes it look
less funky, since the non-multiplatform settings would be somewhere
else (it either needs more comments or an if - either way - otherwise
it's potentially confusing);

if ARCH_MULTIPLATFORM
config HZ
    int
    default 100
else
# old config HZ block here
endif

2) We need to add config SCHED_HRTICK as a copy and paste from
kernel/Kconfig.hz since.. well, I still don't understand exactly what
the true effect would be, but I assume since Arnd is concerned and
John's explanation rings true that it really should be enabled on ARM
systems with the exact same dependencies as kernel/Kconfig.hz.

Or not.. I see it as an oddity until I understand if we really care
about it, but the code seems to be fairly important to the scheduler
and also enabled by default almost everywhere else, which means only
people with really freakish SMP architectures with no ability to use
GENERIC_SMP_HELPERS have ever run these code paths besides ARM. That
kind of leaves ARM in the doghouse.. who knows what weirdo scheduler
reactions are related to it not being enabled. Maybe when it is, HZ
*would* need to be allowed to be bumped when using this code path?

Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.


On Mon, Jan 21, 2013 at 5:49 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, Jan 21, 2013 at 05:23:33PM -0600, Matt Sealey wrote:
>> On Mon, Jan 21, 2013 at 4:42 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
>> >> I am sorry if it sounded as if I was being high and mighty about not
>> >> being able to select my own HZ (or being forced by Exynos to be 200 or
>> >> by not being able to test an Exynos board, forced to default to 100). My
>> >> real "grievance" here is we got a configuration item for the scheduler
>> >> which is being left out of ARM configurations which *can* use high
>> >> resolution timers, but I don't know if this is a real problem or not,
>> >> hence asking about it, and that HZ=100 is the ARM default whether we
>> >> might be able to select that or not.. which seems low.
>> >
>> > Well, I have a versatile platform here. It's the intelligence behind
>> > the power control system for booting the boards on the nightly tests
>> > (currently disabled because I'm waiting for my main server to lock up
>> > again, and I need to use one of the serial ports for that.)
>> >
>> > The point is, it talks via I2C to a load of power monitors to read
>> > samples out. It does this at sub-100Hz intervals. Yet the kernel is
>> > built with HZ=100. NO_HZ=y and highres timers are enabled... works
>> > fine.
>> >
>> > So, no, HZ=100 is not a limit in that scenario. With NO_HZ=y and
>> > highres timers, it all works with epoll() - you get the interval that
>> > you're after. I've verified this with calls to gettimeofday() and
>> > the POSIX clocks.
>>
>> Okay.
>>
>> So, can you read this (it's short):
>>
>> http://ck.kolivas.org/patches/bfs/bfs-configuration-faq.txt
>>
>> And please tell me if he's batshit crazy and I should completely
>> ignore any scheduler discussion that isn't ARM-specific, or maybe..
>> and I can almost guarantee this, he doesn't have an ARM platform so
>> he's just delightfully ill-informed about anything but his quad-core
>> x86?
>
> Well... my x86 laptop is... HZ=1000, NO_HZ, HIGH_RES enabled, ondemand...
> doesn't really fit into any of those categories given there. I'd suggest
> that what's given there is a suggestion/opinion based on behaviours
> observed on x86 platforms.
>
> Whether it's appropriate for other architectures is not really a proven
> point - is it worth running ARM at 1000Hz when the load from running at
> 100Hz is measurable as a definite error in loops_per_jiffy calibration?
> Remember - the load from the interrupt handler at 1000Hz is 10x the load
> at 100Hz.
>
> Do you want to spend more cycles per second on the possibly multi-layer
> IRQ servicing and timer servicing?
>
> And what about the interrupt latency issue that we've hit several times
> already with devices taking longer than 10ms to service their peripherals
> because the driver doesn't make use of delayed works/tasklets/etc.
>
> The lack of reasonable device DMA too has an impact for many drivers - the
> CPU has to spend more time in interrupt handlers (which are now run to the
> exclusion of any other interrupt in the system) performing PIO - or in the
> case of those systems which _do_ have DMA, they may end up having to do
> cache maintenance over large cache ranges from IRQ context which x86
> doesn't have to do.
>
> There are many factors here, and the choice of what the right HZ is for a
> platform is not as clear cut as one may think. Given all the additional
> overheads we have on ARM because of the lack of memory coherency, the
> generally bad DMA support, etc, I think what we currently have is still
> right as an architecture default - 100Hz.
>
>> I did test it.. whatever you define last, sticks, and it's down to the
>> order they're parsed in the tree - luckily, arch/arm/Kconfig is
>> sourced first, which sources the mach/plat stuff way down at the
>> bottom. As long as you have your "default" set somewhere, any further
>> default just has to be sourced or added later in *one* of the
>> Kconfigs, same as building any C file with "gcc -E" and spitting it
>> out.
>>
>> Someone, at the end of it all, has to set some default, and as long as
>> the one you want is the last one, everything is shiny.
>
> Actually, we're both wrong. There seem to be two things which
> influence it, and it basically comes down to this:
>
> - the value a particular symbol has comes from the _first_ declaration
>   in which a value is assigned to the symbol.
>
> So:
>
>     config HZ
>         int
>         default 300
>
>     config HZ
>         int
>         default 100 if OPT1
>         default 200 if OPT2
>         default 400
>
> takes on the value of 300 no matter what combination of OPT1 and OPT2
> are enabled.
>
>     config HZ
>         int
>         default 100 if OPT1
>         default 200 if OPT2
>         default 400
>
>     config HZ
>         int
>         default 300
>
> never takes the value 300, but 100, 200 or 400.
>
>     config HZ
>         int
>         default 100 if OPT1
>         default 200 if OPT2
>
>     config HZ
>         int
>         default 300
>
> Will now take 100, 200, or 300 depending on which of OPT1/OPT2 are enabled.
>
> So, we _can_ use kernel/Kconfig.hz, but it's not very nice at all: we will
> be presenting users with configuration options for the HZ value which will
> be _silently_ ignored by Kconfig if we have a platform which overrides this.
>
> Probably fine if you think that Kconfig is a developer's tool and you edit
> the configuration files (and therefore you expect them to know what they're
> doing, and how this stuff works), but not if you think that Kconfig users
> should be presented with meaningful options when configuring their kernel.

^ permalink raw reply	[flat|nested] 96+ messages in thread
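[Editorial sketch, not part of the original mail.] Point 1 of Matt's proposal (a fixed default of 100 when ARCH_MULTIPLATFORM is set, otherwise the existing arch/arm/Kconfig HZ block quoted at the start of the thread) can be modelled as below. This is an illustration in Python, not a patch: `arm_default_hz` is a made-up name, and the platform timer values (OMAP_32K_TIMER_HZ and friends) are passed in by the caller because the thread does not pin them down. Note also that the Kconfig language, as far as I know, has `if`/`endif` but no `else`, so a real patch would need something like a second `if !ARCH_MULTIPLATFORM` block.

```python
from typing import Mapping, Set

# Platforms that the existing block forces to HZ=200.
FIXED_200 = frozenset({"ARCH_EBSA110", "ARCH_S3C24XX", "ARCH_S5P64X0",
                       "ARCH_S5PV210", "ARCH_EXYNOS4"})


def arm_default_hz(enabled: Set[str], timer_hz: Mapping[str, int]) -> int:
    """Model of the proposed default: 100 for multiplatform, old block otherwise."""
    if "ARCH_MULTIPLATFORM" in enabled:
        return 100
    # "old config HZ block here": first matching default wins.
    if enabled & FIXED_200:
        return 200
    if {"ARCH_OMAP", "OMAP_32K_TIMER"} <= enabled:
        return timer_hz["OMAP_32K_TIMER_HZ"]
    if "ARCH_AT91" in enabled:
        return timer_hz["AT91_TIMER_HZ"]
    if "ARCH_SHMOBILE" in enabled:
        return timer_hz["SHMOBILE_TIMER_HZ"]
    return 100


if __name__ == "__main__":
    # 128 is a placeholder value for illustration only.
    timers = {"OMAP_32K_TIMER_HZ": 128}
    print(arm_default_hz({"ARCH_EXYNOS4"}, timers))
    print(arm_default_hz({"ARCH_MULTIPLATFORM", "ARCH_EXYNOS4"}, timers))
    print(arm_default_hz({"ARCH_OMAP", "OMAP_32K_TIMER"}, timers))
```

In this model an Exynos4-only build keeps 200, while any multiplatform build gets 100 regardless of which platforms are folded in, which is the property the proposal is after.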
* One of these things (CONFIG_HZ) is not like the others.. @ 2013-01-22 0:09 ` Matt Sealey 0 siblings, 0 replies; 96+ messages in thread From: Matt Sealey @ 2013-01-22 0:09 UTC (permalink / raw) To: linux-arm-kernel

Okay so the final resolution of this is;

1) That the arch/arm/Kconfig HZ block is suffering from some cruft

I think we could all be fairly confident that Exynos4 or S5P does not require HZ=200 - in theory, it has no such timer restrictions like EBSA110 (the docs I have show a perfectly capable 32-bit timer with a double-digits MHz input clock, since these are multimedia-class SoCs it'd be seriously f**ked up if they didn't). But while some of the entries on this line may be cargo-cult programming, the original addition on top of EBSA110 *may* be one of your "unreported" responsiveness issues. We could just let some Samsung employees complain when Android 6.x starts to get laggy with a 3.8 kernel because we forced their HZ=100.

What I would do is predicate a fixed, obvious default on ARCH_MULTIPLATFORM so that it would get the benefit of a default HZ that you agree with. It wouldn't CHANGE anything, but it makes it look less funky, since the non-multiplatform settings would be somewhere else (it either needs more comments or an if - either way - otherwise it's potentially confusing);

if ARCH_MULTIPLATFORM
config HZ
    int
    default 100
else
# old config HZ block here
endif

2) We need to add config SCHED_HRTICK as a copy and paste from kernel/Kconfig.hz since.. well, I still don't understand exactly what the true effect would be, but I assume since Arnd is concerned and John's explanation rings true that it really should be enabled on ARM systems with the exact same dependencies as kernel/Kconfig.hz. Or not..

I see it as an oddity until I understand if we really care about it, but the code seems to be fairly important to the scheduler and also enabled by default almost everywhere else, which means only people with really freakish SMP architectures with no ability to use GENERIC_SMP_HELPERS have ever run these code paths besides ARM. That kind of leaves ARM in the doghouse.. who knows what weirdo scheduler reactions are related to it not being enabled. Maybe when it is, HZ *would* need to be allowed to be bumped when using this code path?

Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

On Mon, Jan 21, 2013 at 5:49 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> On Mon, Jan 21, 2013 at 05:23:33PM -0600, Matt Sealey wrote:
>> On Mon, Jan 21, 2013 at 4:42 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
>> >> I am sorry it sounded as if I was being high and mighty about not being
>> >> able to select my own HZ (or being forced by Exynos to be 200 or by
>> >> not being able to test an Exynos board, forced to default to 100). My
>> >> real "grievance" here is we got a configuration item for the scheduler
>> >> which is being left out of ARM configurations which *can* use high
>> >> resolution timers, but I don't know if this is a real problem or not,
>> >> hence asking about it, and that HZ=100 is the ARM default whether we
>> >> might be able to select that or not.. which seems low.
>> >
>> > Well, I have a versatile platform here. It's the intelligence behind
>> > the power control system for booting the boards on the nightly tests
>> > (currently disabled because I'm waiting for my main server to lock up
>> > again, and I need to use one of the serial ports for that.)
>> >
>> > The point is, it talks via I2C to a load of power monitors to read
>> > samples out. It does this at sub-100Hz intervals. Yet the kernel is
>> > built with HZ=100. NO_HZ=y and highres timers are enabled... works
>> > fine.
>> >
>> > So, no, HZ=100 is not a limit in that scenario. With NO_HZ=y and
>> > highres timers, it all works with epoll() - you get the interval that
>> > you're after. I've verified this with calls to gettimeofday() and
>> > the POSIX clocks.
>>
>> Okay.
>>
>> So, can you read this (it's short):
>>
>> http://ck.kolivas.org/patches/bfs/bfs-configuration-faq.txt
>>
>> And please tell me if he's batshit crazy and I should completely
>> ignore any scheduler discussion that isn't ARM-specific, or maybe..
>> and I can almost guarantee this, he doesn't have an ARM platform so
>> he's just delightfully ill-informed about anything but his quad-core
>> x86?
>
> Well... my x86 laptop is... HZ=1000, NO_HZ, HIGH_RES enabled, ondemand...
> doesn't really fit into any of those categories given there. I'd suggest
> that what's given there is a suggestion/opinion based on behaviours
> observed on x86 platforms.
>
> Whether it's appropriate for other architectures is not really a proven
> point - is it worth running ARM at 1000Hz when the load from running at
> 100Hz is measurable as a definite error in loops_per_jiffy calibration?
> Remember - the load from the interrupt handler at 1000Hz is 10x the load
> at 100Hz.
>
> Do you want to spend more cycles per second on the possibly multi-layer
> IRQ servicing and timer servicing?
>
> And what about the interrupt latency issue that we've hit several times
> already with devices taking longer than 10ms to service their peripherals
> because the driver doesn't make use of delayed works/tasklets/etc.
>
> The lack of reasonable device DMA too has an impact for many drivers - the
> CPU has to spend more time in interrupt handlers (which are now run to the
> exclusion of any other interrupt in the system) performing PIO - or in the
> case of those systems which _do_ have DMA, they may end up having to do
> cache maintenance over large cache ranges from IRQ context which x86
> doesn't have to do.
>
> There are many factors here, and the choice of what the right HZ is for a
> platform is not as clear cut as one may think. Given all the additional
> overheads we have on ARM because of the lack of memory coherency, the
> generally bad DMA support, etc, I think what we currently have is still
> right as an architecture default - 100Hz.
>
>> I did test it.. whatever you define last, sticks, and it's down to the
>> order they're parsed in the tree - luckily, arch/arm/Kconfig is
>> sourced first, which sources the mach/plat stuff way down at the
>> bottom. As long as you have your "default" set somewhere, any further
>> default just has to be sourced or added later in *one* of the
>> Kconfigs, same as building any C file with "gcc -E" and spitting it
>> out.
>>
>> Someone, at the end of it all, has to set some default, and as long as
>> the one you want is the last one, everything is shiny.
>
> Actually, we're both wrong. There seem to be two things which
> influence it, and it basically comes down to this:
>
> - the value a particular symbol has comes from the _first_ declaration
>   in which a value is assigned to the symbol.
>
> So:
>
> config HZ
>     int
>     default 300
>
> config HZ
>     int
>     default 100 if OPT1
>     default 200 if OPT2
>     default 400
>
> takes on the value of 300 no matter what combination of OPT1 and OPT2
> are enabled.
>
> config HZ
>     int
>     default 100 if OPT1
>     default 200 if OPT2
>     default 400
>
> config HZ
>     int
>     default 300
>
> never takes the value 300, but 100, 200 or 400.
>
> config HZ
>     int
>     default 100 if OPT1
>     default 200 if OPT2
>
> config HZ
>     int
>     default 300
>
> Will now take 100, 200, or 300 depending on which of OPT1/OPT2 are enabled.
>
> So, we _can_ use kernel/Kconfig.hz, but it's not very nice at all: we will
> be presenting users with configuration options for the HZ value which will
> be _silently_ ignored by Kconfig if we have a platform which overrides this.
>
> Probably fine if you think that Kconfig is a developer's tool and you edit
> the configuration files (and therefore you expect them to know what they're
> doing, and how this stuff works), but not if you think that Kconfig users
> should be presented with meaningful options when configuring their kernel.

^ permalink raw reply [flat|nested] 96+ messages in thread
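The three Kconfig cases Russell lists can be reproduced with a small model. This is only a sketch, written in Python rather than Kconfig, and it assumes nothing beyond the rule stated in the mail: scanning the declarations of a symbol in the order they are parsed, the first default whose condition holds wins. The function name resolve_default and the tuple layout are made up for illustration; this is not how the real Kconfig parser is structured.

```python
def resolve_default(declarations, enabled):
    """Return the first applicable default, scanning declarations in parse order.

    declarations: list of "config HZ" blocks, each a list of (value, condition)
                  pairs; condition is a symbol name, or None for an
                  unconditional "default N" line.
    enabled:      set of symbol names that are switched on.
    """
    for defaults in declarations:
        for value, condition in defaults:
            if condition is None or condition in enabled:
                return value
    return None  # no default applied at all


# Case 1: unconditional default declared first, so it always wins.
CASE_FIRST_UNCONDITIONAL = [
    [(300, None)],
    [(100, "OPT1"), (200, "OPT2"), (400, None)],
]

# Case 2: the first block ends in an unconditional default, so 300 is unreachable.
CASE_LAST_UNREACHABLE = [
    [(100, "OPT1"), (200, "OPT2"), (400, None)],
    [(300, None)],
]

# Case 3: the first block has no unconditional default, so 300 acts as a fallback.
CASE_FALLBACK = [
    [(100, "OPT1"), (200, "OPT2")],
    [(300, None)],
]

if __name__ == "__main__":
    cases = [
        ("case 1", CASE_FIRST_UNCONDITIONAL),
        ("case 2", CASE_LAST_UNREACHABLE),
        ("case 3", CASE_FALLBACK),
    ]
    for name, case in cases:
        for enabled in (set(), {"OPT1"}, {"OPT2"}, {"OPT1", "OPT2"}):
            print(name, sorted(enabled), resolve_default(case, enabled))
```

Case 3 is the layout that would let a platform Kconfig override kernel/Kconfig.hz, and it also shows the objection: whenever OPT1 or OPT2 is set, whatever the later block offers is silently ignored.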
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-22 0:09 ` Matt Sealey @ 2013-01-22 0:26 ` Matt Sealey -1 siblings, 0 replies; 96+ messages in thread From: Matt Sealey @ 2013-01-22 0:26 UTC (permalink / raw) To: Russell King - ARM Linux Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, linux-samsung-soc, Ben Dooks, Kukjin Kim

On Mon, Jan 21, 2013 at 6:09 PM, Matt Sealey <matt@genesi-usa.com> wrote:

[LAKML: about lack of SCHED_HRTICK because we don't use kernel/Kconfig.hz on ARM]

> kind of leaves ARM in the doghouse.. who knows what weirdo scheduler
> reactions are related to it not being enabled. Maybe when it is, HZ
> *would* need to be allowed to be bumped when using this code path?

Or conversely maybe this is exactly why the Samsung maintainers decided they need HZ=200, because SCHED_HRTICK isn't being enabled and they're experiencing some multimedia lag because of it?

--
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 21:00 ` John Stultz @ 2013-01-21 21:14 ` Matt Sealey -1 siblings, 0 replies; 96+ messages in thread From: Matt Sealey @ 2013-01-21 21:14 UTC (permalink / raw) To: John Stultz Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux

On Mon, Jan 21, 2013 at 3:00 PM, John Stultz <john.stultz@linaro.org> wrote:
> On 01/21/2013 12:41 PM, Arnd Bergmann wrote:
>>
>> Right. It's pretty clear that the above logic does not work
>> with multiplatform. Maybe we should just make ARCH_MULTIPLATFORM
>> select NO_HZ to make the question much less interesting.
>
> Although, even with NO_HZ, we still have some sense of HZ.

I wonder if you can confirm my understanding of this by the way? The way I think this works is;

CONFIG_HZ on its own defines the rate at which the kernel wakes up from sleeping on the job, and checks for current or expired timer events such that it can do things like schedule_work (as in workqueues) or perform scheduler (as in processes/tasks) operations.

CONFIG_NO_HZ turns on logic which effectively only wakes up at a *maximum* of CONFIG_HZ times per second, but otherwise will go to sleep and stay that way if no events actually happened (so, we rely on a timer interrupt popping up).

In this case, no matter whether CONFIG_HZ=1000 or CONFIG_HZ=250 (for example) combined with CONFIG_NO_HZ and less than e.g. 250 things happening per second will wake up "exactly" the same number of times?

CONFIG_HZ=1000 with CONFIG_NO_HZ would be an effective, all-round solution here, then, and CONFIG_HZ=100 should be a reasonable default (as it is anyway with an otherwise-unconfigured kernel on any other platform) for !CONFIG_NO_HZ.

I have to admit, the only reason I noticed the above is because I was reading one of CK's BFS logs and reading it makes it seem like the above is the case, but I have no idea if he thinks BFS makes that the case or if the current CFS scheduler makes that the case, or if this is simply.. the case.. (can you see this is kind of confusing to me as this is basically not written anywhere except maybe an LWN article from 2008 I read up on? :)

>> Regarding the defaults, I would suggest putting all the
>> defaults into the defconfig files and removing the other hardcoding
>> otherwise. Ben Dooks and Russell are probably the best to know
>> what triggered the 200 HZ for s3c24xx and for ebsa110. My guess
>> is that the other samsung ones are the result of cargo cult
>> programming.
>>
>> at91 and omap set the HZ value to something that is derived
>> from their hardware timer, but we have also forever had logic
>> to calculate the exact time when that does not match. This code
>> has very recently been moved into the new register_refined_jiffies()
>> function. John can probably tell us if this solves all the problems
>> for these platforms.
>
>
> Yea, as far as timekeeping is concerned, we shouldn't be HZ dependent (and
> the register_refined_jiffies is really only necessary if you're not
> expecting a proper clocksource to eventually be registered), assuming the
> hardware can do something close to the HZ value requested.
>
> So I'd probably want to hear about what history caused the specific 200 HZ
> selections, as I suspect there's actual hardware limitations there. So if
> you can not get actual timer ticks any faster than 200 HZ on that hardware,
> setting HZ higher could cause some jiffies related timer trouble (ie: if the
> kernel thinks HZ is 1000 but the hardware can only do 200, that's a
> different problem than if the hardware actually can only do 999.8 HZ). So
> things like timer-wheel timeouts may not happen when they should.
>
> I suspect the best approach for multi-arch in those cases may be to select
> HZ=100

As above, or "not select anything at all" since HZ=100 if you don't touch anything, right?

If someone picks HZ=1000 and their platform can't support it, then that's their own damn problem (don't touch things you don't understand, right? ;)

> and use HRT to allow more modern systems to have finer-grained
> timers.

My question really has to be is CONFIG_SCHED_HRTICK useful, what exactly is it going to do on ARM here since nobody can ever have enabled it? Is it going to keel over and explode if nobody registers a non-jiffies sched_clock (since the jiffies clock is technically reporting itself as a ridiculously high resolution clocksource..)?

Or is this one of those things that if your platform doesn't have a real high resolution timer, you shouldn't enable HRTIMERS and therefore not enable SCHED_HRTICK as a result? That affects ARCH_MULTIPLATFORM here. Is the solution as simple as ARCH_MULTIPLATFORM compliant platforms kind of have to have a high resolution timer? Documentation to that effect?

--
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.

^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 21:14 ` Matt Sealey @ 2013-01-21 22:36 ` John Stultz -1 siblings, 0 replies; 96+ messages in thread From: John Stultz @ 2013-01-21 22:36 UTC (permalink / raw) To: Matt Sealey Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux

On 01/21/2013 01:14 PM, Matt Sealey wrote:
> On Mon, Jan 21, 2013 at 3:00 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On 01/21/2013 12:41 PM, Arnd Bergmann wrote:
>>> Right. It's pretty clear that the above logic does not work
>>> with multiplatform. Maybe we should just make ARCH_MULTIPLATFORM
>>> select NO_HZ to make the question much less interesting.
>> Although, even with NO_HZ, we still have some sense of HZ.
> I wonder if you can confirm my understanding of this by the way? The
> way I think this works is;
>
> CONFIG_HZ on its own defines the rate at which the kernel wakes up
> from sleeping on the job, and checks for current or expired timer
> events such that it can do things like schedule_work (as in
> workqueues) or perform scheduler (as in processes/tasks) operations.

CONFIG_HZ defines the length of a jiffy. In the absence of NOHZ and HRT, HZ defines how frequently the timer/scheduler tick will fire.

> CONFIG_NO_HZ turns on logic which effectively only wakes up at a
> *maximum* of CONFIG_HZ times per second, but otherwise will go to
> sleep and stay that way if no events actually happened (so, we rely on
> a timer interrupt popping up).

NOHZ adds logic which basically allows us to skip ticks if the cpu is idle. And HRT adds logic which allows us to fire timers more frequently than HZ.

> In this case, no matter whether CONFIG_HZ=1000 or CONFIG_HZ=250 (for
> example) combined with CONFIG_NO_HZ and less than e.g. 250 things
> happening per second will wake up "exactly" the same number of times?

Ideally, if both systems are completely idle, they may see a similar number of actual interrupts.

But when the cpus are running processes, the HZ=1000 system will see more frequent interrupts, since the timer/scheduler interrupt will jump in 4 times more frequently.

> CONFIG_HZ=1000 with CONFIG_NO_HZ would be an effective, all-round
> solution here, then, and CONFIG_HZ=100 should be a reasonable default
> (as it is anyway with an otherwise-unconfigured kernel on any other
> platform) for !CONFIG_NO_HZ.

Eeehhh... I'm not sure this follows.

>>
>> Yea, as far as timekeeping is concerned, we shouldn't be HZ dependent (and
>> the register_refined_jiffies is really only necessary if you're not
>> expecting a proper clocksource to eventually be registered), assuming the
>> hardware can do something close to the HZ value requested.
>>
>> So I'd probably want to hear about what history caused the specific 200 HZ
>> selections, as I suspect there's actual hardware limitations there. So if
>> you can not get actual timer ticks any faster than 200 HZ on that hardware,
>> setting HZ higher could cause some jiffies related timer trouble (ie: if the
>> kernel thinks HZ is 1000 but the hardware can only do 200, that's a
>> different problem than if the hardware actually can only do 999.8 HZ). So
>> things like timer-wheel timeouts may not happen when they should.
>>
>> I suspect the best approach for multi-arch in those cases may be to select
>> HZ=100
> As above, or "not select anything at all" since HZ=100 if you don't
> touch anything, right?

Well, Russell brought up a case that doesn't handle this. If a system *can't* do HZ=100, but can do HZ=200.

Though there are hacks, of course, that might get around this (skip every other interrupt at 200HZ).

> If someone picks HZ=1000 and their platform can't support it, then
> that's their own damn problem (don't touch things you don't
> understand, right? ;)

Well, ideally with kconfig we try to add proper dependencies so impossible options aren't left to the user. HZ is a common enough knob to turn on most systems, I don't know if leaving the user rope to hang himself is a great idea.

>
>> and use HRT to allow more modern systems to have finer-grained
>> timers.
> My question really has to be is CONFIG_SCHED_HRTICK useful, what
> exactly is it going to do on ARM here since nobody can ever have
> enabled it? Is it going to keel over and explode if nobody registers a
> non-jiffies sched_clock (since the jiffies clock is technically
> reporting itself as a ridiculously high resolution clocksource..)?

??? Not following this at all. jiffies is the *MOST* coarse resolution clocksource there is (at least that I'm aware of.. I recall someone wanting to do a 60Hz clocksource, but I don't think that ever happened).

> Or is this one of those things that if your platform doesn't have a
> real high resolution timer, you shouldn't enable HRTIMERS and
> therefore not enable SCHED_HRTICK as a result? That affects
> ARCH_MULTIPLATFORM here. Is the solution as simple as
> ARCH_MULTIPLATFORM compliant platforms kind of have to have a high
> resolution timer? Documentation to that effect?

So HRTIMERS was designed to be build-time enabled, while still giving you a functioning system if it was booted on a system that didn't support clockevents. We boot with standard HZ, and only switch over to HRT mode if we have a proper clocksource and clockevent driver.

However, HRTIMERS or NOHZ doesn't fix the case of having a system boot with HZ=1000 or HZ=100 if the system can *only* do HZ=200.

thanks
-john

^ permalink raw reply [flat|nested] 96+ messages in thread
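To put numbers on John's points, here is a small arithmetic sketch in Python. It is not kernel code: the helper names are invented for this example, and the round-up in timeout_to_jiffies is a simplifying assumption about how a millisecond timeout maps onto whole jiffies when high resolution timers are not in use.

```python
def jiffy_ms(hz):
    """Length of one jiffy in milliseconds for a given HZ."""
    return 1000.0 / hz


def ticks_while_busy(hz, seconds):
    """Periodic ticks taken by a CPU that is busy for the whole interval.

    NOHZ only skips ticks on an idle CPU, so a busy CPU still sees HZ
    ticks per second.
    """
    return hz * seconds


def timeout_to_jiffies(timeout_ms, hz):
    """Whole jiffies needed to cover timeout_ms, rounded up (assumption)."""
    return (timeout_ms * hz + 999) // 1000


if __name__ == "__main__":
    for hz in (100, 200, 250, 1000):
        jiffies = timeout_to_jiffies(15, hz)
        print("HZ=%d: jiffy=%.1f ms, busy ticks/s=%d, 15 ms timeout=%d jiffies (%.1f ms)"
              % (hz, jiffy_ms(hz), ticks_while_busy(hz, 1), jiffies,
                 jiffies * jiffy_ms(hz)))
```

With these definitions a busy HZ=1000 system takes four times as many ticks as a HZ=250 one, which is the ratio John quotes, and ten times as many as HZ=100, which is the load ratio Russell mentions earlier in the thread. The timeout column shows the granularity cost of a low HZ when only jiffies-based timers are available.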
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 22:36 ` John Stultz @ 2013-01-21 22:49 ` Russell King - ARM Linux -1 siblings, 0 replies; 96+ messages in thread From: Russell King - ARM Linux @ 2013-01-21 22:49 UTC (permalink / raw) To: John Stultz Cc: Matt Sealey, Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 02:36:13PM -0800, John Stultz wrote:
> Well, Russell brought up a case that doesn't handle this. If a system
> *can't* do HZ=100, but can do HZ=200.
>
> Though there are hacks, of course, that might get around this (skip
> every other interrupt at 200HZ).

Note: in the early days of EBSA110 support, yes, we did that, so that we could have HZ=100 everywhere. _However_ it sufficiently perturbed NTP that it basically was unable to slew the clock in any sane manner. I never got to the bottom of why that was, and when USER_HZ was decoupled from the kernel HZ, it allowed the problem to be fixed, and the kernel code to become a _lot_ cleaner.

^ permalink raw reply [flat|nested] 96+ messages in thread
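For readers who have not seen the "skip every other interrupt" hack, a toy model may help. This is a Python sketch with made-up names (DividedTick, simulate), not the old EBSA110 code, and it only shows the counting: it does not reproduce the NTP slewing problem Russell describes, whose cause he says he never found.

```python
class DividedTick:
    """Toy model: hardware interrupts at one rate, jiffies advance at rate/divisor."""

    def __init__(self, divisor=2):
        self.divisor = divisor
        self.hw_interrupts = 0
        self.jiffies = 0

    def hardware_interrupt(self):
        # Count every hardware interrupt, but only treat every
        # divisor-th one as a kernel tick.
        self.hw_interrupts += 1
        if self.hw_interrupts % self.divisor == 0:
            self.jiffies += 1


def simulate(hw_hz, seconds, divisor=2):
    """Return the jiffies accumulated after `seconds` of hardware ticks."""
    tick = DividedTick(divisor)
    for _ in range(hw_hz * seconds):
        tick.hardware_interrupt()
    return tick.jiffies


if __name__ == "__main__":
    print("200 Hz hardware, divisor 2, 1 s:", simulate(200, 1), "jiffies")
```

The interrupt load stays at the hardware rate while jiffies advance at half of it, so the hack buys a uniform HZ=100 without reducing interrupt overhead.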
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 22:36 ` John Stultz @ 2013-01-21 22:54 ` Matt Sealey -1 siblings, 0 replies; 96+ messages in thread From: Matt Sealey @ 2013-01-21 22:54 UTC (permalink / raw) To: John Stultz Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org> wrote: > On 01/21/2013 01:14 PM, Matt Sealey wrote: >> >> On Mon, Jan 21, 2013 at 3:00 PM, John Stultz <john.stultz@linaro.org> >> wrote: >>> >>> On 01/21/2013 12:41 PM, Arnd Bergmann wrote: >>>> >>>> Right. It's pretty clear that the above logic does not work >>>> with multiplatform. Maybe we should just make ARCH_MULTIPLATFORM >>>> select NO_HZ to make the question much less interesting. >>> >>> Although, even with NO_HZ, we still have some sense of HZ. >> >> In this case, no matter whether CONFIG_HZ=1000 or CONFIG_HZ=250 (for >> example) combined with CONFIG_NO_HZ and less than e.g. 250 things >> happening per second will wake up "exactly" the same number of times? > > Ideally, if both systems are completely idle, they may see similar number of > actual interrupts. > > But when the cpus are running processes, the HZ=1000 system will see more > frequent interrupts, since the timer/scheduler interrupt will jump in 4 > times more frequently. Understood.. >> CONFIG_HZ=1000 with CONFIG_NO_HZ would be an effective, all-round >> solution here, then, and CONFIG_HZ=100 should be a reasonable default >> (as it is anyway with an otherwise-unconfigured kernel on any other >> platform) for !CONFIG_NO_HZ. > > Eeehhh... I'm not sure this is follows. Okay, I'm happy to be wrong on this... >> As above, or "not select anything at all" since HZ=100 if you don't >> touch anything, right? > > Well, Russell brought up a case that doesn't handle this. If a system > *can't* do HZ=100, but can do HZ=200. 
> > Though there are hacks, of course, that might get around this (skip every > other interrupt at 200HZ). Hmm, I think it might be appreciated for people looking at this stuff (same as I stumbled into it) for a little comment on WHY the default is 200. That way you don't wonder even if you know why EBSA110 has a HZ=200 default, why Exynos is lumped in there too (to reduce the number of interrupts firing? Maybe the Exynos timer interrupt is kind of a horrid core NMI kind of thing and it's desirable for it not to be every millisecond, or maybe it has the same restrictions as EBSA110, but where would anyone go to find out this information?) >> If someone picks HZ=1000 and their platform can't support it, then >> that's their own damn problem (don't touch things you don't >> understand, right? ;) > > Well, ideally with kconfig we try to add proper dependencies so impossible > options aren't left to the user. > HZ is a common enough knob to turn on most systems, I don't know if leaving > the user rope to hang himself is a great idea. I think then the default 100 at the end of the arch/arm/Kconfig is saying "you are not allowed to know that such a thing as rope even exists," when in fact what we should be doing is just making sure they can't swing it over the rafters.. am I taking the analogy too far? :) >> My question really has to be is CONFIG_SCHED_HRTICK useful, what >> exactly is it going to do on ARM here since nobody can ever have >> enabled it? Is it going to keel over and explode if nobody registers a >> non-jiffies sched_clock (since the jiffies clock is technically >> reporting itself as a ridiculously high resolution clocksource..)? > > ??? Not following this at all. jiffies is the *MOST* coarse resolution > clocksource there is (at least that I'm aware of.. I recall someone wanting > to do a 60Hz clocksource, but I don't think that ever happened). Is that based on it's clocksource rating (probably worse than a real hrtimer) or it's reported resolution? 
Because on i.MX51 if I force it to use the jiffies clock the debug on the kernel log is telling me it has a higher resolution (it TELLS me that it ticks "as fast" as the CPU frequency and wraps less than my real timer). I know where the 60Hz clocksource might come from, the old Amiga platforms have one based on the PSU frequency (50Hz in Europe, 60Hz US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at least, it is precisely the vsync clock for synchronizing your display output on TV-out, which makes it completely useful for the framebuffer driver), but.. you just won't expect to assign it as sched_clock or your delay timer. And if anyone does I'd expect they'd know full well it'd not run so well. >> Or is this one of those things that if your platform doesn't have a >> real high resolution timer, you shouldn't enable HRTIMERS and >> therefore not enable SCHED_HRTICK as a result? That affects >> ARCH_MULTIPLATFORM here. Is the solution as simple as >> ARCH_MULTIPLATFORM compliant platforms kind of have to have a high >> resolution timer? Documentation to that effect? > > So HRTIMERS was designed to be build time enabled, while still giving you > a functioning system if it was booted on a system that didn't support > clockevents. We boot with standard HZ, and only switch over to HRT mode if > we have a proper clocksource and clockevent driver. Okay. I'm still a little confused as to what SCHED_HRTICK actually makes a difference to, though. From that description, we are booting with standard HZ on ARM, and the core sched_clock (as in we can call setup_sched_clock) and/or/both/optionally using a real delay_timer switch to HRT mode if we have the right equipment available in the kernel and at runtime on the SoC.. but the process scheduler isn't compiled with the means to actually take advantage of us being in HRT mode?
> However, HRTIMERS or NOHZ doesn't fix the case of having a system boot with > HZ=1000 or HZ=100 if the system can *only* do HZ=200. A simple BUILD_BUG_ON and a BUG_ON right after each other in the appropriate clocksource driver solves that.. if there's an insistence on having at least some rope, we can put them in a field and tell them they have to use the moon to actually hang themselves... -- Matt Sealey <matt@genesi-usa.com> Product Development Analyst, Genesi USA, Inc. ^ permalink raw reply [flat|nested] 96+ messages in thread
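For readers following the BUILD_BUG_ON / BUG_ON suggestion above, here is a minimal user-space sketch of what such a pair of checks could look like. It is only an illustration under stated assumptions: the `CONFIG_HZ` fallback value, `PLATFORM_REQUIRED_HZ` and `check_tick_rate()` are hypothetical names invented here, C11 `_Static_assert` stands in for the kernel's BUILD_BUG_ON, and nothing below is taken from an actual clocksource driver.

```c
/*
 * Sketch only: every name in this block is hypothetical.
 * In a real kernel build CONFIG_HZ comes from Kconfig; the fallback
 * below exists so the example compiles on its own.
 */
#ifndef CONFIG_HZ
#define CONFIG_HZ 200
#endif

/* Assumed constraint: a timer that can only be programmed for 200 Hz. */
#define PLATFORM_REQUIRED_HZ 200

/* Build-time half: compilation fails if the configured HZ is unsupported. */
_Static_assert(CONFIG_HZ == PLATFORM_REQUIRED_HZ,
               "this timer only supports HZ=200");

/*
 * Run-time half: returns 0 when the requested tick rate is usable and
 * -1 otherwise. A driver following the suggestion would refuse to
 * continue (BUG_ON) on the -1 case instead of returning.
 */
static int check_tick_rate(int hz)
{
    return hz == PLATFORM_REQUIRED_HZ ? 0 : -1;
}
```

Note that on a multiplatform kernel the build-time half would fire for any image that includes such a platform alongside one wanting a different HZ.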
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 22:54 ` Matt Sealey @ 2013-01-21 23:13 ` Russell King - ARM Linux -1 siblings, 0 replies; 96+ messages in thread From: Russell King - ARM Linux @ 2013-01-21 23:13 UTC (permalink / raw) To: Matt Sealey Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar On Mon, Jan 21, 2013 at 04:54:31PM -0600, Matt Sealey wrote: > Hmm, I think it might be appreciated for people looking at this stuff > (same as I stumbled into it) for a little comment on WHY the default > is 200. That way you don't wonder even if you know why EBSA110 has a > HZ=200 default, why Exynos is lumped in there too (to reduce the > number of interrupts firing? Err, _reduce_ ? Can you please explain why changing HZ from 100 to 200 is a reduction? > Maybe the Exynos timer interrupt is kind > of a horrid core NMI kind of thing and it's desirable for it not to be > every millisecond, Huh? HZ=100 is centisecond intervals... > or maybe it has the same restrictions as EBSA110, > but where would anyone go to find out this information?) Maybe the mailing list archives. No, not these ones. The full ones on lists.arm.linux.org.uk. The lurker archives contain every email that has been on these mailing lists stretching back into the late 1990s. They are the only _full_ archives (give or take a few problems with connectivity between lists.arm.linux.org.uk and lists.infradead.org throwing the archiver subscription off.) > I think then the default 100 at the end of the arch/arm/Kconfig is > saying "you are not allowed to know that such a thing as rope even > exists," when in fact what we should be doing is just making sure they > can't swing it over the rafters.. am I taking the analogy too far? :) I think your understanding is just waaaayyyyy off. That default is there because that is the _architecture_ _default_ and there _has_ to be a default.
No, including kernel/Kconfig.hz won't give us any kind of non-specified default because, as I've already said in one of my other mails, you can't supplement Kconfig symbol definitions by declaring it multiple times. > I know where the 60Hz clocksource might come from, the old Amiga > platforms have one based on the PSU frequency (50Hz in Europe, 60Hz > US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at > least, it is precisely the vsync clock for synchronizing your display > output on TV-out, which makes it completely useful for the framebuffer > driver), but.. you just won't expect to assign it as sched_clock or > your delay timer. And if anyone does I'd expect they'd know full well > it'd not run so well. Except in the UK where it'd be 50Hz for the TV out. (Lengthy irrelevant explanation why this is so for UK cut.) > From that description, we are booting with standard HZ on ARM, and the > core sched_clock (as in we can call setup_sched_clock) > and/or/both/optionally using a real delay_timer switch to HRT mode if > we have the right equipment available in the kernel and at runtime on > the SoC.. but the process scheduler isn't compiled with the means to > actually take advantage of us being in HRT mode? Don't mix sched_clock() into this; it has nothing to do with HZ at all. You're confusing your apples with your oranges. > A simple BUILD_BUG_ON and a BUG_ON right after each other in the > appropriate clocksource driver solves that.. if there's an insistence > on having at least some rope, we can put them in a field and tell them > they have to use the moon to actually hang themselves... No it doesn't - it introduces a whole load of new ways to make the kernel build or boot fail for pointless reasons - more failures, more regressions. No thank you. ^ permalink raw reply [flat|nested] 96+ messages in thread
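As a quick reference for the HZ figures traded in the message above (HZ=100 is centisecond intervals, HZ=1000 is millisecond intervals, and 200 is twice as many ticks as 100), the arithmetic can be sketched as below. The helper names `tick_period_us()` and `ticks_in()` are made up for this illustration and are not kernel functions; the second helper assumes a busy CPU with a purely periodic tick, i.e. no NO_HZ idle savings.

```c
/*
 * Sketch only: helper names are hypothetical, not kernel APIs.
 */

/* Length of one tick in microseconds for a given tick rate (HZ). */
static unsigned long tick_period_us(unsigned int hz)
{
    return 1000000UL / hz;
}

/*
 * Number of periodic timer interrupts taken over `seconds` seconds,
 * assuming the tick fires every period without being suppressed.
 */
static unsigned long ticks_in(unsigned int hz, unsigned int seconds)
{
    return (unsigned long)hz * seconds;
}
```

With these definitions, `tick_period_us(100)` is 10000 (a centisecond), `tick_period_us(200)` is 5000 and `tick_period_us(1000)` is 1000 (a millisecond), so moving from 100 to 200 doubles the interrupt count rather than reducing it.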
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 23:13 ` Russell King - ARM Linux @ 2013-01-21 23:30 ` Matt Sealey -1 siblings, 0 replies; 96+ messages in thread From: Matt Sealey @ 2013-01-21 23:30 UTC (permalink / raw) To: Russell King - ARM Linux Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar On Mon, Jan 21, 2013 at 5:13 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Mon, Jan 21, 2013 at 04:54:31PM -0600, Matt Sealey wrote: >> Hmm, I think it might be appreciated for people looking at this stuff >> (same as I stumbled into it) for a little comment on WHY the default >> is 200. That way you don't wonder even if you know why EBSA110 has a >> HZ=200 default, why Exynos is lumped in there too (to reduce the >> number of interrupts firing? > > Err, _reduce_ ? > > Can you please explain why changing HZ from 100 to 200 is a reduction? We were talking about HZ=1000 at the time, sorry... >> Maybe the Exynos timer interrupt is kind >> of a horrid core NMI kind of thing and it's desirable for it not to be >> every millisecond, > > Huh? HZ=100 is centisecond intervals... See above.. > I think you're understanding is just waaaayyyyy off. That default is > there because that is the _architecture_ _default_ and there _has_ to > be a default. No, including kernel/Kconfig.hz won't give us any kind > of non-specified default because, as I've already said in one of my > other mails, you can't supplement Kconfig symbol definitions by > declaring it multiple times. Okay, so the real >> I know where the 60Hz clocksource might come from, the old Amiga >> platforms have one based on the PSU frequency (50Hz in Europe, 60Hz >> US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at >> least, it is precisely the vsync clock for synchronizing your display >> output on TV-out, which makes it completely useful for the framebuffer >> driver), but.. 
you just won't expect to assign it as sched_clock or >> your delay timer. And if anyone does I'd expect they'd know full well >> it'd not run so well. > > Except in the UK where it'd be 50Hz for the TV out. (Lengthy irrelevant > explanation why this is so for UK cut.) Read again: "50Hz in Europe". Australia too. I'm British and I used to have more EU-manufactured Amigas than I knew what to do with.. so.. just like your NTP story, I definitely know this already. >> From that description, we are booting with standard HZ on ARM, and the >> core sched_clock (as in we can call setup_sched_clock) >> and/or/both/optionally using a real delay_timer switch to HRT mode if >> we have the right equipment available in the kernel and at runtime on >> the SoC.. but the process scheduler isn't compiled with the means to >> actually take advantage of us being in HRT mode? > > Don't mix sched_clock() into this; it has nothing to do with HZ at all. > You're confusing your apples with your oranges. Okay.. >> A simple BUILD_BUG_ON and a BUG_ON right after each other in the >> appropriate clocksource driver solves that.. if there's an insistence >> on having at least some rope, we can put them in a field and tell them >> they have to use the moon to actually hang themselves... > > No it doesn't - it introduces a whole load of new ways to make the > kernel build or boot fail for pointless reasons - more failures, more > regressions. > > No thank you. But it would effectively stop users drinking kool-aid.. if you set your HZ to something stupid, you don't even get a kernel to build, and certainly don't get to boot past the first 40 lines of boot messages.. I think most people would rather a build error, or a runtime unmistakable, unmissable warning than a subtle and almost imperceptible skew in NTP synchronization, to use your example. -- Matt Sealey <matt@genesi-usa.com> Product Development Analyst, Genesi USA, Inc. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 23:30 ` Matt Sealey @ 2013-01-22 0:02 ` Russell King - ARM Linux -1 siblings, 0 replies; 96+ messages in thread From: Russell King - ARM Linux @ 2013-01-22 0:02 UTC (permalink / raw) To: Matt Sealey Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar On Mon, Jan 21, 2013 at 05:30:31PM -0600, Matt Sealey wrote: > But it would effectively stop users drinking kool-aid.. if you set > your HZ to something stupid, you don't even get a kernel to build, and > certainly don't get to boot past the first 40 lines of boot messages.. > I think most people would rather a build error, or a runtime > unmistakable, unmissable warning than a subtle and almost > imperceptible skew in NTP synchronization, to use your example. 1. a kernel which doesn't build. What do you think both Arnd and myself have been doing for the last few years, building such things as random configurations and such like, finding stuff that doesn't work and fixing the kernel so that we end up with _NO_ configuration which fails to build. Are you seriously about to tell us that we're wasting our time and we should just let the kernel build fail in all horrid sorts of ways? 2. As for NTP behaviour... well, have you ever experienced a system where NTP has to keep doing step corrections on the time of day, where some steps (eg, backwards) cause services to quit because time of day must be monotonic... What you're proposing is that we litter the ARM arch with all sorts of tests for CONFIG_HZ and #error out on ones that don't make sense. I think you're smoking crack. What I think is that we should _not_ allow CONFIG_HZ to be set to anything which isn't appropriate for the platforms - or indeed the reverse. That's going to be extremely difficult to do with multi-arch because it's effectively a two-way dependency. 
I don't think we can do that with kernel/Kconfig.hz unless we introduce another layer of permissive configurations for the HZ_1000... etc, but I'm not sure that anyone outside ARM would like even that. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 22:54 ` Matt Sealey @ 2013-01-22 0:38 ` John Stultz -1 siblings, 0 replies; 96+ messages in thread From: John Stultz @ 2013-01-22 0:38 UTC (permalink / raw) To: Matt Sealey Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux, Thomas Gleixner On 01/21/2013 02:54 PM, Matt Sealey wrote: > On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org> wrote: >> On 01/21/2013 01:14 PM, Matt Sealey wrote: >>> Or is this one of those things that if your platform doesn't have a >>> real high resolution timer, you shouldn't enable HRTIMERS and >>> therefore not enable SCHED_HRTICK as a result? That affects >>> ARCH_MULTIPLATFORM here. Is the solution as simple as >>> ARCH_MULTIPLATFORM compliant platforms kind of have to have a high >>> resolution timer? Documentation to that effect? >> SO HRITMERS was designed to be be build time enabled, while still giving you >> a functioning system if it was booted on a system that didn't support >> clockevents. We boot with standard HZ, and only switch over to HRT mode if >> we have a proper clocksource and clockevent driver. > Okay. I'm still a little confused as to what SCHED_HRTICK actually > makes a difference to, though. > > From that description, we are booting with standard HZ on ARM, and the > core sched_clock (as in we can call setup_sched_clock) > and/or/both/optionally using a real delay_timer switch to HRT mode if > we have the right equipment available in the kernel and at runtime on > the SoC.. but the process scheduler isn't compiled with the means to > actually take advantage of us being in HRT mode? 
So I'm actually not super familiar with SCHED_HRTICK details, but from my brief skim of it, it looks like it's useful for turning off the periodic timer tick, and allowing the scheduler tick to be triggered by an hrtimer itself (There's a number of these interesting inversions that go on in switching to HRT mode - for instance, standard timer ticks are switched to being hrtimer events themselves). This likely has the benefit of time-accurate preemption (well, long term, as if the timer granularity isn't matching you could be delayed up to a tick - but it wouldn't drift). I'm guessing Thomas would probably know best what the potential issues would be from running ((CONFIG_HRTIMER || CONFIG_NO_HZ) && !CONFIG_SCHED_HRTICK). thanks -john ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-21 22:54 ` Matt Sealey
@ 2013-01-22 0:51 ` John Stultz
From: John Stultz @ 2013-01-22 0:51 UTC
To: Matt Sealey
Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux

On 01/21/2013 02:54 PM, Matt Sealey wrote:
> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>>> My question really has to be is CONFIG_SCHED_HRTICK useful, what
>>> exactly is it going to do on ARM here since nobody can ever have
>>> enabled it? Is it going to keel over and explode if nobody registers a
>>> non-jiffies sched_clock (since the jiffies clock is technically
>>> reporting itself as a ridiculously high resolution clocksource..)?
>> ??? Not following this at all. jiffies is the *MOST* coarse resolution
>> clocksource there is (at least that I'm aware of.. I recall someone wanting
>> to do a 60Hz clocksource, but I don't think that ever happened).
> Is that based on its clocksource rating (probably worse than a real
> hrtimer) or its reported resolution? Because on i.MX51 if I force it
> to use the jiffies clock the debug on the kernel log is telling me it
> has a higher resolution (it TELLS me that it ticks "as fast" as the
> CPU frequency and wraps less than my real timer).

So the clocksource rating is supposed to be defined by the clocksource driver writer, and just provides a way for the clocksource core to select the best clocksource given a set of clocksources. It is not defined as any sort of calculated mapping to any property of the clocksource itself (although some driver writers might compute a ratings value in that way, but I feel the static ranking is much simpler). The comment above struct clocksource in clocksource.h tries to explain this.

As far as jiffies rating, from jiffies.c:

    .rating = 1, /* lowest valid rating*/

So I'm not sure what you mean by "the debug on the kernel log is telling me it has a higher resolution".

> I know where the 60Hz clocksource might come from, the old Amiga
> platforms have one based on the PSU frequency (50Hz in Europe, 60Hz
> US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at
> least, it is precisely the vsync clock for synchronizing your display
> output on TV-out, which makes it completely useful for the framebuffer
> driver), but.. you just won't expect to assign it as sched_clock or
> your delay timer. And if anyone does I'd expect they'd know full well
> it'd not run so well.

Yes, in the case I was remembering, the 60Hz was driven by the electrical line.

thanks
-john
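As a rough illustration of the rating point John makes above, here is a minimal userspace sketch, not kernel code. The struct and function names (demo_clocksource, pick_best) are hypothetical stand-ins; the only value taken from the thread is the jiffies rating of 1. It assumes nothing more than "highest static rating wins", which is how the message describes the selection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct clocksource. */
struct demo_clocksource {
    const char *name;
    int rating;               /* static rank assigned by the driver author */
};

/* Return the highest-rated entry, or NULL when the set is empty. */
static const struct demo_clocksource *
pick_best(const struct demo_clocksource *set, size_t count)
{
    const struct demo_clocksource *best = NULL;

    assert(set != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (best == NULL || set[i].rating > best->rating)
            best = &set[i];
    }
    return best;
}
```

Given a set such as { "jiffies", 1 } and { "soc-timer", 300 } (the second rating is made up for the example), pick_best() returns the soc-timer entry. The rating only orders the candidates; it says nothing about the frequency or resolution a particular debug message happens to print.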
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 0:51 ` John Stultz
@ 2013-01-22 1:06 ` Matt Sealey
From: Matt Sealey @ 2013-01-22 1:06 UTC
To: John Stultz
Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux

On Mon, Jan 21, 2013 at 6:51 PM, John Stultz <john.stultz@linaro.org> wrote:
> On 01/21/2013 02:54 PM, Matt Sealey wrote:
>>
>> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org>
>> wrote:
>>>
>>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>
> As far as jiffies rating, from jiffies.c:
>     .rating = 1, /* lowest valid rating*/
>
> So I'm not sure what you mean by "the debug on the kernel log is telling me
> it has a higher resolution".

Oh, it is just that if I don't actually run setup_sched_clock on my platform, it gives a little message (with #define DEBUG 1 in sched_clock.c) about who set up the last sched_clock. You only get one chance, and I was fiddling with setup_sched_clock being probed from multiple possible timers from device tree (i.MX3 has a crapload of valid timers; which one you use right now is basically forced by the not-quite-fully-DT-only code and some funky iomap tricks).

And what I got was: if I use the real hardware timer, it runs at 66MHz and says it has 15ns resolution and wraps every 500 seconds or so. The jiffies timer says it's 750MHz, with a 2ns resolution.. you get the drift. The generic reporting of how "good" the sched_clock source is kind of glosses over the quality rating of the clock source, and at first glance (if you're not paying that much attention) it is a little bit misleading..

> Yes, in the case I was remembering, the 60HZ was driven by the electrical
> line.

While I have your attention, what would be the minimum "good" speed to run the sched_clock or delay timer implementation from? My rudimentary scribblings in my notebook give me a value of "don't bother" with less than 10kHz based on HZ=100, so I'm wondering if a direct 32.768kHz clock would do (i.MX osc clock input, if I can supply it to one of the above myriad timers), since this would be low-power compared to a 66MHz one (by a couple mA anyway). I also have a bunch of questions about the delay timer requirements.. I might mail you personally.. or would you prefer on-list?

--
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 1:06 ` Matt Sealey
@ 2013-01-22 1:18 ` Russell King - ARM Linux
From: Russell King - ARM Linux @ 2013-01-22 1:18 UTC
To: Matt Sealey
Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 07:06:59PM -0600, Matt Sealey wrote:
> On Mon, Jan 21, 2013 at 6:51 PM, John Stultz <john.stultz@linaro.org> wrote:
> > On 01/21/2013 02:54 PM, Matt Sealey wrote:
> >>
> >> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org>
> >> wrote:
> >>>
> >>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
> >
> > As far as jiffies rating, from jiffies.c:
> >     .rating = 1, /* lowest valid rating*/
> >
> > So I'm not sure what you mean by "the debug on the kernel log is telling me
> > it has a higher resolution".
>
> Oh, it is just if I actually don't run setup_sched_clock on my
> platform, it gives a little message (with #define DEBUG 1 in
> sched_clock.c)

sched_clock() has nothing to do with time keeping, and HZ/NO_HZ/HRTIMERS don't affect it (when it isn't being derived from jiffies).

Now, sched_clock() is there to give the scheduler a _fast_ to access, higher resolution clock than is available from other sources, so that there are ways of accurately measuring the amount of time processes run for, and other such measurements - and it uses that to determine how to schedule a particular task and when to preempt it.

Not providing it means you get those measurements at HZ-based resolution, which is suboptimal for tasks which run often for sub-HZ periods (which can end up accumulating zero run time.)
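Russell's last point, that short-running tasks can accumulate zero run time at HZ-based resolution, can be sketched with a small userspace model. This is only an illustration under stated assumptions: the helper names (coarse_now_ns, accounted_ns) are hypothetical, the "coarse" clock simply rounds down to the last tick, and the real scheduler accounting is more involved than a sum of end minus start.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Timestamp as seen by a clock that only advances once per tick (hz > 0). */
static uint64_t coarse_now_ns(uint64_t now_ns, unsigned int hz)
{
    uint64_t tick_ns;

    assert(hz > 0);
    tick_ns = NSEC_PER_SEC / hz;
    return (now_ns / tick_ns) * tick_ns;
}

/*
 * Total run time charged to a task that runs `bursts` times, once per
 * `period_ns`, each burst starting `offset_ns` into the period and lasting
 * `burst_ns`. hz == 0 means "use the exact timestamps".
 */
static uint64_t accounted_ns(unsigned int bursts, uint64_t period_ns,
                             uint64_t offset_ns, uint64_t burst_ns,
                             unsigned int hz)
{
    uint64_t total = 0;

    for (unsigned int k = 0; k < bursts; k++) {
        uint64_t start = (uint64_t)k * period_ns + offset_ns;
        uint64_t end = start + burst_ns;

        if (hz != 0) {
            /* Both ends are read from the tick-resolution clock. */
            start = coarse_now_ns(start, hz);
            end = coarse_now_ns(end, hz);
        }
        total += end - start;
    }
    return total;
}
```

With HZ=100, a task that runs for 2ms once every 10ms, starting 1ms after each tick, is charged accounted_ns(100, 10000000, 1000000, 2000000, 100) == 0, while the exact clock (hz == 0) gives 200ms. If the same burst instead straddles a tick edge (offset 9ms), each burst is charged a full 10ms tick. Either way the coarse measurement misrepresents what the task actually used.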
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 1:18 ` Russell King - ARM Linux
@ 2013-01-22 1:56 ` Matt Sealey
From: Matt Sealey @ 2013-01-22 1:56 UTC
To: Russell King - ARM Linux
Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar

On Mon, Jan 21, 2013 at 7:18 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> On Mon, Jan 21, 2013 at 07:06:59PM -0600, Matt Sealey wrote:
>> On Mon, Jan 21, 2013 at 6:51 PM, John Stultz <john.stultz@linaro.org> wrote:
>> > On 01/21/2013 02:54 PM, Matt Sealey wrote:
>
> sched_clock() has nothing to do with time keeping, and that
> HZ/NO_HZ/HRTIMERS don't affect it (when it isn't being derived from
> jiffies).
>
> Now, sched_clock() is there to give the scheduler a _fast_ to access,
> higher resolution clock than is available from other sources, so that
> there's ways of accurately measuring the amount of time processes run
> for,

That depends on what you meant by timekeeping, right?

I'm really not concerned about the wallclock time, more about the accuracy of the scheduler clock (tick?), preemption, accurate delays (i.e. if I msleep(10) does it delay for 10ms or for 40ms because my delay timer is inaccurate? I'd rather it was better but closer to 10ms), and whether the scheduler (the thing that tells my userspace whether firefox is running now, or totem, or any other task) is using the correct high resolution periodic, oneshot, repeatable (however it repeats) timers *properly*, given that this magic config item is missing on ARM.

That magic config item being CONFIG_SCHED_HRTICK, which is referenced a bunch in kernel/sched/*.[ch] but *ONLY* defined as a Kconfig item in kernel/Kconfig.hz.

Do we need to copy that Kconfig item out to arch/arm/Kconfig? That's the question.

> and other such measurements - and it uses that to determine how
> to schedule a particular task and when to preempt it.
>
> Not providing it means you get those measurements at HZ-based resolution,
> which is suboptimal for tasks which run often for sub-HZ periods (which
> can end up accumulating zero run time.)

Okay, and John said earlier:

John Stultz:
> So I'm actually not super familiar with SCHED_HRTICK details, but from my
> brief skim of it it looks like its useful for turning off the periodic timer
> tick, and allowing the scheduler tick to be triggered by an hrtimer itself
> (There's a number of these interesting inversions that go on in switching to
> HRT mode - for instance, standard timer ticks are switched to being hrtimer
> events themselves).
>
> This likely has the benefit of time-accurate preemption (well, long term, as
> if the timer granularity isn't matching you could be delayed up to a tick -
> but it wouldn't drift).
>
> I'm guessing Thomas would probably know best what the potential issues would
> be from running ((CONFIG_HRTIMER || CONFIG_NO_HZ) && !CONFIG_SCHED_HRTICK).

If SCHED_HRTICK isn't enabled but setup_sched_clock has been given an accessor for a real, hardware, fast, high resolution counter that meets all the needs of sched_clock, what's going on? If it's enabled, what extra is it doing that, say, my_plat_read_sched_clock doesn't?

--
Matt Sealey <matt@genesi-usa.com>
Product Development Analyst, Genesi USA, Inc.
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 1:06 ` Matt Sealey
@ 2013-01-22 1:31 ` John Stultz
From: John Stultz @ 2013-01-22 1:31 UTC
To: Matt Sealey
Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux

On 01/21/2013 05:06 PM, Matt Sealey wrote:
> On Mon, Jan 21, 2013 at 6:51 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On 01/21/2013 02:54 PM, Matt Sealey wrote:
>>> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org>
>>> wrote:
>>>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>> As far as jiffies rating, from jiffies.c:
>>     .rating = 1, /* lowest valid rating*/
>>
>> So I'm not sure what you mean by "the debug on the kernel log is telling me
>> it has a higher resolution".
> Oh, it is just if I actually don't run setup_sched_clock on my
> platform, it gives a little message (with #define DEBUG 1 in
> sched_clock.c) about who setup the last sched_clock. Since you only
> get one chance, and I was fiddling with setup_sched_clock being probed
> from multiple possible timers from device tree (i.MX3 has a crapload
> of valid timers, which one you use right now is basically forced by
> the not-quite-fully-DT-only code and some funky iomap tricks).
>
> And what I got was, if I use the real hardware timer, it runs at 66MHz
> and says it has 15ns resolution and wraps every 500 seconds or so. The
> jiffies timer says it's 750MHz, with a 2ns resolution.. you get the
> drift. The generic reporting of how "good" the sched_clock source is
> kind of glosses over the quality rating of the clock source and at
> first glance (if you're not paying that much attention), it is a
> little bit misleading..

I've got no clue on this. sched_clock is arch specific, and while ARM does use clocksources for sched_clock, what you're seeing is a detail of the ARM implementation and not the clocksource code (one complication is that clocksource rating values are for the requirements of timekeeping, which are different than the requirements for sched_clock - so the confusion is understandable).

>> Yes, in the case I was remembering, the 60HZ was driven by the electrical
>> line.
> While I have your attention, what would be the minimum "good" speed to
> run the sched_clock or delay timer implementation from? My rudimentary
> scribblings in my notebook give me a value of "don't bother" with less
> than 10KHz based on HZ=100, so I'm wondering if a direct 32.768KHz
> clock would do (i.MX osc clock input if I can supply it to one of the
> above myriad timers) since this would be low-power compared to a 66MHz
> one (by a couple mA anyway). I also have a bunch of questions about
> the delay timer requirements.. I might mail you personally.. or would
> you prefer on-list?

So there are probably other folks who could better comment on sched_clock() or the delay timer (I'm guessing the delay() implementation is what you mean by that) design trade-offs.

My first *guess* would be that for delay, you probably want a counter that has half-usec granularity or finer (~5MHz), since udelay is likely the most common usage, and coarser than that you might cause driver issues. Though you could probably get away with a cpu loop based delay and avoid requiring a high res counter.

For sched_clock(), the standard reply is probably "as fast and as fine-grained as you can get". But as far as a lower bound, I'd expect the CONFIG_HZ value would be a good bet, as many systems use jiffies for their sched_clock without major issue, though I'm sure there are interactivity trade-offs.

But again, someone more familiar with the scheduler and driver requirements would probably be more informative.

thanks
-john
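To put numbers on the granularity guess above, the following sketch works out how long one counter tick lasts and how many ticks a udelay()-style request needs at a given counter frequency. It is plain userspace arithmetic, not the kernel's delay code, and the helper names (tick_period_ns, ticks_for_udelay) are hypothetical. The frequencies used are the ones mentioned in the thread (32.768kHz, ~5MHz, 66MHz).

```c
#include <assert.h>
#include <stdint.h>

/* Length of one counter tick in nanoseconds, rounded down. */
static uint64_t tick_period_ns(uint64_t freq_hz)
{
    assert(freq_hz > 0);
    return 1000000000ull / freq_hz;
}

/* Counter ticks needed to wait at least `usecs` microseconds (rounded up). */
static uint64_t ticks_for_udelay(uint64_t freq_hz, uint64_t usecs)
{
    return (usecs * freq_hz + 999999ull) / 1000000ull;
}
```

At 32768Hz one tick is about 30.5us (tick_period_ns(32768) is 30517), so a 1us request still rounds up to a whole tick and can overshoot by roughly thirty times. At 5MHz a tick is 200ns, and at 66MHz a 1us delay is 66 ticks. For reference, a half-microsecond tick corresponds to 2MHz, so 5MHz sits on the "or finer" side of that guess.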
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-22 1:31 ` John Stultz @ 2013-01-22 2:10 ` Matt Sealey -1 siblings, 0 replies; 96+ messages in thread From: Matt Sealey @ 2013-01-22 2:10 UTC (permalink / raw) To: John Stultz, Thomas Gleixner Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux Matt Sealey <matt@genesi-usa.com> Product Development Analyst, Genesi USA, Inc. On Mon, Jan 21, 2013 at 7:31 PM, John Stultz <john.stultz@linaro.org> wrote: > On 01/21/2013 05:06 PM, Matt Sealey wrote: >> >> On Mon, Jan 21, 2013 at 6:51 PM, John Stultz <john.stultz@linaro.org> >> wrote: >>> >>> On 01/21/2013 02:54 PM, Matt Sealey wrote: >>>> >>>> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz <john.stultz@linaro.org> >>>> wrote: >>>>> >>>>> On 01/21/2013 01:14 PM, Matt Sealey wrote: >>> >>> As far as jiffies rating, from jiffies.c: >>> .rating = 1, /* lowest valid rating*/ >>> >>> So I'm not sure what you mean by "the debug on the kernel log is telling >>> me >>> it has a higher resolution". >> >> Oh, it is just if I actually don't run setup_sched_clock on my >> platform, it gives a little message (with #define DEBUG 1 in >> sched_clock.c) about who setup the last sched_clock. Since you only >> get one chance, and I was fiddling with setup_sched_clock being probed >> from multiple possible timers from device tree (i.MX3 has a crapload >> of valid timers, which one you use right now is basically forced by >> the not-quite-fully-DT-only code and some funky iomap tricks). >> >> And what I got was, if I use the real hardware timer, it runs at 66MHz >> and says it has 15ns resolution and wraps every 500 seconds or so. The >> jiffies timer says it's 750MHz, with a 2ns resoluton.. you get the >> drift. 
The generic reporting of how "good" the sched_clock source is >> kind of glosses over the quality rating of the clock source and at >> first glance (if you're not paying that much attention), it is a >> little bit misleading.. > > > I've got no clue on this. sched_clock is arch specific, and while ARM does > use clocksources for sched_clock, what you're seeing is a detail of the ARM > implementation and not the clocksource code (one complication is that > clocksources rating values are for the requirements of timekeeping, which > are different then the requirements for sched_clock - so the confusion is > understandable). > > > >>> Yes, in the case I was remembering, the 60HZ was driven by the electrical >>> line. >> >> While I have your attention, what would be the minimum "good" speed to >> run the sched_clock or delay timer implementation from? My rudimentary >> scribblings in my notebook give me a value of "don't bother" with less >> than 10KHz based on HZ=100, so I'm wondering if a direct 32.768KHz >> clock would do (i.MX osc clock input if I can supply it to one of the >> above myriad timers) since this would be low-power compared to a 66MHz >> one (by a couple mA anyway). I also have a bunch of questions about >> the delay timer requirements.. I might mail you personally.. or would >> you prefer on-list? > > So there are probably other folks who could better comment on sched_clock() > or the delay timer (I'm guessing the delay() implementation is what you mean > by that) design trade-offs. I'm specifically talking about if I do static struct delay_timer imx_gpt_delay_timer = { .read_current_timer = imx_gpt_read_current_timer, }; and then something like: imx_gpt_delay_timer.freq = clk_get_rate(clk_per); register_current_timer_delay(&imx_gpt_delay_timer); In the sense that now (as of kernel 3.7 iirc), I have an ability to have the delay implementation use this awesome fast accessor (which is nothing to do with a 'clocksource' as in the subsystem..) 
to get to my (here at least) 66.5MHz counter (up or down, i.MX has both, but I dunno if you can use a down counter for delay_timer, or if that's preferred, or what.. there are no examples of it.. but it seems to work.. that said I can't imagine what would be an immediately visible and not totally random effect of doing it "wrong", maybe that delays are instantly returned, that could be very hard or impossible to ever notice compared to not being able to browse the internet on the target device.. it might pop up on some randomly-not-resetting platform device or so, though..) And I can also put sched_clock on a completely different timer. Does that make any sense at all? I wouldn't know, it's not documented. And if I wanted to I could register 8 more timers. That seems rather excessive, but the ability to use those extra 8 as clock outputs from the SoC or otherwise directly use comparators is useful to some people, does Linux in general really give a damn about having 8 timers of the same quality being available when most systems barely have two clocksources anyway (on x86, tsc and hpet - on ARM I guess twd and some SoC-specific timer). I dunno how many people might actually want to define in a device tree, but I figure every single one is not a bad thing and which ones end up as sched_clock, delay_timer or just plain registered clocksources, or not registered as a clocksource and accessed as some kind of comparator through some kooky ioctl API, is something you would also configure... > But again, someone more familiar with the scheduler and driver requirements > would probably be more informational. Okay. I assume that's a combination of Russell and Thomas.. -- Matt Sealey ^ permalink raw reply [flat|nested] 96+ messages in thread
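On the up-versus-down counter question in the message above, here is a minimal user-space sketch. It assumes the timer-based delay loop spins on an unsigned `(now - start) < cycles` comparison; that is an assumption to check against arch/arm/lib/delay.c, not something confirmed in this thread. The counter is simulated, and `sim_counter`, `sim_read`, `sim_delay` and `sim_selfcheck` are made-up names, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated free-running 32-bit timer: each read advances it by `step`.
 * step = 1 models an up counter, step = 0xFFFFFFFF (-1 mod 2^32) a down counter. */
struct sim_counter {
	uint32_t value;
	uint32_t step;
};

uint32_t sim_read(struct sim_counter *c)
{
	uint32_t v = c->value;

	c->value += c->step;	/* unsigned arithmetic wraps modulo 2^32 */
	return v;
}

/* Assumed loop shape: spin until `cycles` ticks have elapsed.
 * Returns how many times the loop body ran before the delay ended. */
uint32_t sim_delay(struct sim_counter *c, uint32_t cycles)
{
	uint32_t start = sim_read(c);
	uint32_t polls = 0;

	while ((uint32_t)(sim_read(c) - start) < cycles)
		polls++;
	return polls;
}

/* Usage: an up counter that wraps mid-delay versus a down counter. */
void sim_selfcheck(void)
{
	struct sim_counter up = { 0xFFFFFFF0u, 1u };
	struct sim_counter down = { 1000u, 0xFFFFFFFFu };

	assert(sim_delay(&up, 100) == 99);	/* full wait, even across the wrap */
	assert(sim_delay(&down, 100) == 0);	/* difference is huge at once: no wait */
}
```

Under that assumption, an up counter behaves correctly even when it wraps during the delay, while a raw down counter makes the delay return immediately, which matches the "delays are instantly returned" guess. If the loop really has this shape, a down counter would presumably need its value inverted in the read accessor before being handed to the delay code.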
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-22 2:10 ` Matt Sealey @ 2013-01-31 21:31 ` Thomas Gleixner -1 siblings, 0 replies; 96+ messages in thread From: Thomas Gleixner @ 2013-01-31 21:31 UTC (permalink / raw) To: Matt Sealey Cc: John Stultz, Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux

On Mon, 21 Jan 2013, Matt Sealey wrote:
> And if I wanted to I could register 8 more timers. That seems rather
> excessive, but the ability to use those extra 8 as clock outputs from
> the SoC or otherwise directly use comparators is useful to some
> people, does Linux in general really give a damn about having 8 timers
> of the same quality being available when most systems barely have two
> clocksources anyway (on x86, tsc and hpet - on ARM I guess twd and
> some SoC-specific timer). I dunno how many people might actually want

If you want to use those timers just for delivering arbitrary timer events, then no. There is no point in having a gazillion timer interrupts happening w/o being coordinated. We have a pretty well structured timer event infrastructure for precise and more timeout oriented events, which are pretty happy to be served by a single per cpu event device.

If you want to use the extra timers for other purposes (PWM, timer triggered DMA transfers, etc...) then they are not in any way related to the timers/timekeeping core.

Thanks,

	tglx

^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 20:41 ` Arnd Bergmann @ 2013-01-21 21:02 ` Matt Sealey -1 siblings, 0 replies; 96+ messages in thread From: Matt Sealey @ 2013-01-21 21:02 UTC (permalink / raw) To: Arnd Bergmann Cc: Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux, John Stultz, Ben Dooks On Mon, Jan 21, 2013 at 2:41 PM, Arnd Bergmann <arnd@arndb.de> wrote: > On Monday 21 January 2013, Matt Sealey wrote: >> >> ARM seems to be the only "major" platform not using the >> kernel/Kconfig.hz definitions, instead rolling it's own and setting >> what could be described as both reasonable and unreasonable defaults >> for platforms. If we're going wholesale for multiplatform on ARM then >> having CONFIG_HZ be selected dependent on platform options seems >> rather curious since building a kernel for Exynos, OMAP or so will >> force the default to a value which is not truly desired by the >> maintainers. > > Agreed 100%. > > (adding John Stultz to Cc, he's the local time expert) Hi, John! Welcome to the fray :) >> config HZ >> int >> default 200 if ARCH_EBSA110 || ARCH_S3C24XX || ARCH_S5P64X0 || \ >> ARCH_S5PV210 || ARCH_EXYNOS4 >> default OMAP_32K_TIMER_HZ if ARCH_OMAP && OMAP_32K_TIMER >> default AT91_TIMER_HZ if ARCH_AT91 >> default SHMOBILE_TIMER_HZ if ARCH_SHMOBILE >> default 100 >> [snip] >> Either way, if I boot a kernel on i.MX6, CONFIG_HZ depends on the >> other ARM platforms I also want to boot on it.. this is not exactly >> multiplatform compliant, right? > > Right. It's pretty clear that the above logic does not work > with multiplatform. Maybe we should just make ARCH_MULTIPLATFORM > select NO_HZ to make the question much less interesting. > > Regarding the defaults, I would suggest putting them into all the > defaults into the defconfig files and removing the other hardcoding > otherwise. 
Ben Dooks and Russell are probably the best to know > what triggered the 200 HZ for s3c24xx and for ebsa110. My guess > is that the other samsung ones are the result of cargo cult > programming. > > at91 and omap set the HZ value to something that is derived > from their hardware timer, but we have also forever had logic > to calculate the exact time when that does not match. This code > has very recently been moved into the new register_refined_jiffies() > function. John can probably tell is if this solves all the problems > for these platforms. I would be very interested. My plan would be then (providing John responds in the affirmative) to basically submit a patch to remove the 8 lines pasted above and source kernel/Kconfig.hz instead. I'm doing this now on a local kernel tree and I can't see any real problem with it. It would then be up to the above-mentioned maintainers to decide if they are part of the cargo cult and don't need it or refine their board files to match the New World Order of using Kconfig.hz. The unconfigured kernel default is 100 anyway which is lower than all the above default setting, so I would technically be causing a regression on those platforms... do I want to be responsible for that? Probably not, but as I said, it's not affecting (in fact, it may be *improving*) the platforms I care about. >> Additionally, using kernel/Kconfig.hz is a predicate for enabling >> (forced enabling, even) CONFIG_SCHED_HRTICK which is defined nowhere >> else. I don't know how many ARM systems here benefit from this, if >> there is a benefit, or what this really means.. if you really have a >> high resolution timer (and hrtimers enabled) that would assist the >> scheduler this way, is it supposed to make a big difference to the way >> the scheduler works for the better or worse? Is this actually >> overridden by ARM sched_clock handling or so? Shouldn't there be a >> help entry or some documentation for what this option does? 
I have >> CC'd the scheduler maintainers because I'd really like to know what I >> am doing here before I venture into putting patches out which could >> potentially rip open spacetime and have us all sucked in.. > > Yes, that sounds like yet another bug. So is that a bug in that it is not available to ARM right now, a bug in that it would be impossible for anyone on ARM to have ever tested this code, or a bug in that it should NEVER be enabled for ARM for some reason? John? Ingo? :) -- Matt Sealey <matt@genesi-usa.com> Product Development Analyst, Genesi USA, Inc. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 21:02 ` Matt Sealey @ 2013-01-21 22:30 ` Arnd Bergmann -1 siblings, 0 replies; 96+ messages in thread From: Arnd Bergmann @ 2013-01-21 22:30 UTC (permalink / raw) To: Matt Sealey Cc: Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, Russell King - ARM Linux, John Stultz, Ben Dooks On Monday 21 January 2013, Matt Sealey wrote: > So is that a bug in that it is not available to ARM right now, a bug > in that it would be impossible for anyone on ARM to have ever tested > this code, or a bug in that it should NEVER be enabled for ARM for > some reason? John? Ingo? :) > I think it's a bug that it's not available. That does not look intentional. Arnd ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 22:30 ` Arnd Bergmann @ 2013-01-21 22:45 ` Russell King - ARM Linux -1 siblings, 0 replies; 96+ messages in thread From: Russell King - ARM Linux @ 2013-01-21 22:45 UTC (permalink / raw) To: Arnd Bergmann Cc: Matt Sealey, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, John Stultz, Ben Dooks On Mon, Jan 21, 2013 at 10:30:07PM +0000, Arnd Bergmann wrote: > On Monday 21 January 2013, Matt Sealey wrote: > > So is that a bug in that it is not available to ARM right now, a bug > > in that it would be impossible for anyone on ARM to have ever tested > > this code, or a bug in that it should NEVER be enabled for ARM for > > some reason? John? Ingo? :) > > > > I think it's a bug that it's not available. That does not look intentional. What's a bug? kernel/Kconfig.hz not being available? No, it's intentional. (See my replies). ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 22:45 ` Russell King - ARM Linux @ 2013-01-21 23:01 ` Matt Sealey -1 siblings, 0 replies; 96+ messages in thread From: Matt Sealey @ 2013-01-21 23:01 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Arnd Bergmann, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, John Stultz, Ben Dooks On Mon, Jan 21, 2013 at 4:45 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Mon, Jan 21, 2013 at 10:30:07PM +0000, Arnd Bergmann wrote: >> On Monday 21 January 2013, Matt Sealey wrote: >> > So is that a bug in that it is not available to ARM right now, a bug >> > in that it would be impossible for anyone on ARM to have ever tested >> > this code, or a bug in that it should NEVER be enabled for ARM for >> > some reason? John? Ingo? :) >> > >> >> I think it's a bug that it's not available. That does not look intentional. > > What's a bug? kernel/Kconfig.hz not being available? No, it's > intentional. (See my replies). The bug I saw as real is that CONFIG_SCHED_HRTICK is defined only in kernel/Kconfig.hz (and used in kernel/sched only) - so if we want that functionality enabled we will also have to opencode it in arch/arm/Kconfig. Everyone else, by virtue of using kernel/Kconfig.hz, gets this config item enabled for free if they have hrtimers or generic smp helpers.. if I understood what John just said, this means on ARM, since we don't use kernel/Kconfig.hz and we don't also define an item for CONFIG_SCHED_HRTICK, the process scheduler is completely oblivious that we're running in HRT mode? The thing I don't know is real is if that really matters one bit.. -- Matt Sealey <matt@genesi-usa.com> Product Development Analyst, Genesi USA, Inc. ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 20:41 ` Arnd Bergmann @ 2013-01-21 21:03 ` Russell King - ARM Linux -1 siblings, 0 replies; 96+ messages in thread From: Russell King - ARM Linux @ 2013-01-21 21:03 UTC (permalink / raw) To: Arnd Bergmann Cc: Matt Sealey, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, John Stultz, Ben Dooks

On Mon, Jan 21, 2013 at 08:41:17PM +0000, Arnd Bergmann wrote:
> On Monday 21 January 2013, Matt Sealey wrote:
> >
> > ARM seems to be the only "major" platform not using the
> > kernel/Kconfig.hz definitions, instead rolling it's own and setting
> > what could be described as both reasonable and unreasonable defaults
> > for platforms.

No, you've got this totally wrong. They're not defaults. And I object to your use of "unreasonable" too. I've no idea where you get that from.

There's a reason why we have different HZ rates - some platforms just can't do the standard 100Hz tick rate. No way - their timers can't divide down to that interrupt rate. Sorry to spoil your ivory tower with a few facts, but your statement is just ridiculous.

The reason we don't use kernel/Kconfig.hz is precisely because of that; we _HAVE_ to have different HZ definitions on different platforms, and you'll notice that kernel/Kconfig.hz makes _no_ provision for this.

Now, while things have moved forwards and we have clocksource/clockevent support, not every platform can support this timekeeping structure; ebsa110 certainly can't. There's one timer and one timer only which is usable, which even needs to be manually reloaded by the CPU. No other independent counter to act as a clock source.

As for Samsung and the rest I can't comment. The original reason OMAP used this though was because the 32768Hz counter can't produce 100Hz without a .1% error - too much error under pre-clocksource implementations for timekeeping. Whether that's changed with the clocksource/clockevent support needs to be checked.
It's entirely possible with the modern clocksource/clockevent support that many of these platforms can have their alternative HZ tick rates removed - but there will continue to be a subset which can't, and all the time that we have such a subset, kernel/Kconfig.hz can't be used without modification. ^ permalink raw reply [flat|nested] 96+ messages in thread
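The ".1% error" figure for a 32768Hz counter can be reproduced with a little integer arithmetic. The following is a sketch only: `tick_reload`, `tick_error_ppm` and `tick_selfcheck` are illustrative names, and it assumes the tick is produced by counting a whole number of counter cycles rounded to the nearest integer, which is an assumption rather than a description of any particular OMAP timer driver.

```c
#include <assert.h>
#include <stdint.h>

/* Nearest whole number of counter cycles per tick for a given HZ. */
uint32_t tick_reload(uint32_t clk_hz, uint32_t hz)
{
	return (clk_hz + hz / 2) / hz;
}

/* Error of the achieved tick period against the ideal 1/hz, in parts per
 * million, truncated toward zero. Positive means each tick is too long. */
int64_t tick_error_ppm(uint32_t clk_hz, uint32_t hz)
{
	int64_t cycles = (int64_t)tick_reload(clk_hz, hz) * hz;

	return (cycles - (int64_t)clk_hz) * 1000000 / (int64_t)clk_hz;
}

/* Usage: HZ=100 versus HZ=128 on a 32768Hz counter. */
void tick_selfcheck(void)
{
	assert(tick_reload(32768, 100) == 328);		/* 327.68 rounded up */
	assert(tick_error_ppm(32768, 100) == 976);	/* roughly 0.1% */
	assert(tick_reload(32768, 128) == 256);		/* divides exactly */
	assert(tick_error_ppm(32768, 128) == 0);
}
```

With HZ=100 the nearest reload value is 328 cycles, so every tick runs about 976 ppm long, or roughly 0.1%. With HZ=128 the divide is exact at 256 cycles and the error disappears, which is consistent with the choice of 128 described in this thread.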
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 21:03 ` Russell King - ARM Linux @ 2013-01-21 23:23 ` Tony Lindgren -1 siblings, 0 replies; 96+ messages in thread From: Tony Lindgren @ 2013-01-21 23:23 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Arnd Bergmann, Matt Sealey, Linux ARM Kernel ML, LKML, Peter Zijlstra, Ingo Molnar, John Stultz, Ben Dooks * Russell King - ARM Linux <linux@arm.linux.org.uk> [130121 13:07]: > > As for Samsung and the rest I can't comment. The original reason OMAP > used this though was because the 32768Hz counter can't produce 100Hz > without a .1% error - too much error under pre-clocksource > implementations for timekeeping. Whether that's changed with the > clocksource/clockevent support needs to be checked. Yes that's why HZ was originally set to 128. That value (or some multiple) still makes sense when the 32 KiHZ clock source is being used. Of course we should rely on the local timer when running for the SoCs that have them. Regards, Tony ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-21 23:23 ` Tony Lindgren @ 2013-01-22 6:23 ` Santosh Shilimkar -1 siblings, 0 replies; 96+ messages in thread From: Santosh Shilimkar @ 2013-01-22 6:23 UTC (permalink / raw) To: Tony Lindgren Cc: Russell King - ARM Linux, Arnd Bergmann, Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar, John Stultz, Linux ARM Kernel ML On Tuesday 22 January 2013 04:53 AM, Tony Lindgren wrote: > * Russell King - ARM Linux <linux@arm.linux.org.uk> [130121 13:07]: >> >> As for Samsung and the rest I can't comment. The original reason OMAP >> used this though was because the 32768Hz counter can't produce 100Hz >> without a .1% error - too much error under pre-clocksource >> implementations for timekeeping. Whether that's changed with the >> clocksource/clockevent support needs to be checked. > > Yes that's why HZ was originally set to 128. That value (or some multiple) > still makes sense when the 32 KiHZ clock source is being used. Of course > we should rely on the local timer when running for the SoCs that have > them. > This is right. It was only because of the drift associated when clocked with 32KHz. Even on SOCs where local timers are available for power management reasons we need to switch to 32KHz clocked device in low power states. Hence the HZ value should be multiple of 32 on OMAP. Regards Santosh ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-22 6:23 ` Santosh Shilimkar @ 2013-01-22 9:31 ` Arnd Bergmann -1 siblings, 0 replies; 96+ messages in thread From: Arnd Bergmann @ 2013-01-22 9:31 UTC (permalink / raw) To: Santosh Shilimkar Cc: Tony Lindgren, Russell King - ARM Linux, Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar, John Stultz, Linux ARM Kernel ML On Tuesday 22 January 2013, Santosh Shilimkar wrote: > On Tuesday 22 January 2013 04:53 AM, Tony Lindgren wrote: > > * Russell King - ARM Linux <linux@arm.linux.org.uk> [130121 13:07]: > >> > >> As for Samsung and the rest I can't comment. The original reason OMAP > >> used this though was because the 32768Hz counter can't produce 100Hz > >> without a .1% error - too much error under pre-clocksource > >> implementations for timekeeping. Whether that's changed with the > >> clocksource/clockevent support needs to be checked. > > > > Yes that's why HZ was originally set to 128. That value (or some multiple) > > still makes sense when the 32 KiHZ clock source is being used. Of course > > we should rely on the local timer when running for the SoCs that have > > them. > > > This is right. It was only because of the drift associated when clocked > with 32KHz. Even on SOCs where local timers are available for power > management reasons we need to switch to 32KHz clocked device in > low power states. Hence the HZ value should be multiple of 32 on > OMAP. I need some help understanding what the two of you are saying, because it sounds to me that you imply we cannot have a multiplatform kernel that includes OMAP and another platform that needs (or wants) a different HZ value. However, I also thought that when using a proper clocksource driver, the HZ setting has absolutely no impact on the drift of the wall clock, because those two are decoupled. 
Even when using the HZ based clocksource (for whatever reason you would want to do that), I thought there should be no drift as long as the CLOCK_TICK_RATE (in older kernels) or the register_refined_jiffies() (in newer kernels) setting matches the hardware timer frequency. What am I missing? Arnd ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-22 9:31 ` Arnd Bergmann @ 2013-01-22 10:14 ` Santosh Shilimkar -1 siblings, 0 replies; 96+ messages in thread From: Santosh Shilimkar @ 2013-01-22 10:14 UTC (permalink / raw) To: Arnd Bergmann Cc: Tony Lindgren, Russell King - ARM Linux, Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar, John Stultz, Linux ARM Kernel ML On Tuesday 22 January 2013 03:01 PM, Arnd Bergmann wrote: > On Tuesday 22 January 2013, Santosh Shilimkar wrote: >> On Tuesday 22 January 2013 04:53 AM, Tony Lindgren wrote: >>> * Russell King - ARM Linux <linux@arm.linux.org.uk> [130121 13:07]: >>>> >>>> As for Samsung and the rest I can't comment. The original reason OMAP >>>> used this though was because the 32768Hz counter can't produce 100Hz >>>> without a .1% error - too much error under pre-clocksource >>>> implementations for timekeeping. Whether that's changed with the >>>> clocksource/clockevent support needs to be checked. >>> >>> Yes that's why HZ was originally set to 128. That value (or some multiple) >>> still makes sense when the 32 KiHZ clock source is being used. Of course >>> we should rely on the local timer when running for the SoCs that have >>> them. >>> >> This is right. It was only because of the drift associated when clocked >> with 32KHz. Even on SOCs where local timers are available for power >> management reasons we need to switch to 32KHz clocked device in >> low power states. Hence the HZ value should be multiple of 32 on >> OMAP. > > I need some help understanding what the two of you are saying, because > it sounds to me that you imply we cannot have a multiplatform kernel > that includes OMAP and another platform that needs (or wants) a different > HZ value. > Sorry for not being clear enough. On OMAP, 32KHz is the only clock which is always running(even during low power states) and hence the clock source and clock event have been clocked using 32KHz clock. 
As mentioned by RMK, with a 32768 Hz clock and HZ = 100, there will always be an error of 0.1 %. This inaccuracy also impacts the timer tick interval. This was the reason OMAP has been using HZ = 128. There is a hardware feature to implement a 1 ms correction on the timer to overcome such an issue, but it was not supported on OMAP2 devices. OMAP3/4/5 do support it, and one attempt [1] was made to support it in the kernel. This would of course address the tick interval corrections. > However, I also thought that when using a proper clocksource driver, > the HZ setting has absolutely no impact on the drift of the wall clock, > because those two are decoupled. > I am not too sure about this. I was under the impression that tick (clock event) accuracy does impact the kernel timekeeping as well. > Even when using the HZ based clocksource (for whatever reason you > would want to do that), I thought there should be no drift as long > as the CLOCK_TICK_RATE (in older kernels) or the register_refined_jiffies() > (in newer kernels) setting matches the hardware timer frequency. > > What am I missing? > The issue is with the hardware timer frequency itself, since with HZ = 100 or 200 the timer tick will not be accurate. Hope this gives a bit more info. Regards, Santosh [1] https://patchwork.kernel.org/patch/107364/ ^ permalink raw reply [flat|nested] 96+ messages in thread
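The 0.1 % figure quoted above can be reproduced with a little arithmetic. The sketch below is only an illustration, assuming the tick is generated by counting a whole number of 32768 Hz cycles and that the nearest integer divider is chosen; the function name tick_error is made up for this example and is not a kernel interface.

```python
# Illustration only: assumes the tick is produced by counting a whole
# number of cycles of the 32768 Hz counter discussed in the thread.
CLOCK_HZ = 32768


def tick_error(hz):
    """Return (counts_per_tick, actual_rate_hz, relative_error) for a requested HZ."""
    counts = round(CLOCK_HZ / hz)      # nearest integer divider (an assumption)
    actual = CLOCK_HZ / counts         # tick rate the hardware really delivers
    return counts, actual, (actual - hz) / hz


for hz in (100, 128, 200):
    counts, actual, err = tick_error(hz)
    print(f"HZ={hz}: {counts} counts/tick, actual {actual:.3f} Hz, "
          f"error {err * 100:+.3f}%")
```

With these assumptions HZ = 128 divides evenly (256 counts per tick), while HZ = 100 and HZ = 200 both land roughly 0.1 % away from the requested rate, which matches the numbers given in the thread.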
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-22 10:14 ` Santosh Shilimkar @ 2013-01-22 14:51 ` Russell King - ARM Linux -1 siblings, 0 replies; 96+ messages in thread From: Russell King - ARM Linux @ 2013-01-22 14:51 UTC (permalink / raw) To: Santosh Shilimkar Cc: Arnd Bergmann, Tony Lindgren, Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar, John Stultz, Linux ARM Kernel ML On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote: > Sorry for not being clear enough. On OMAP, 32KHz is the only clock which > is always running(even during low power states) and hence the clock > source and clock event have been clocked using 32KHz clock. As mentioned > by RMK, with 32768 Hz clock and HZ = 100, there will be always an > error of 0.1 %. This accuracy also impacts the timer tick interval. > This was the reason, OMAP has been using the HZ = 128. Ok. Let's look at this. As far as time-of-day is concerned, this shouldn't really matter with the clocksource/clockevent based system that we now have (where *important point* platforms have been converted over.) Any platform providing a clocksource will override the jiffy-based clocksource. The measurement of time-of-day passing is now based on the difference in values read from the clocksource, not from the actual tick rate. Anything _not_ providing a clock source will be reliant on jiffies incrementing, which in turn _requires_ one timer interrupt per jiffies at a known rate (which is HZ). Now, that's the time of day, what about jiffies? Well, jiffies is incremented based on a certain number of nsec having passed since the last jiffy update. That means the code copes with dropped ticks and the like. However, if your actual interrupt rate is close to the desired HZ, then it can lead to some interesting effects (and noise): - if the interrupt rate is slightly faster than HZ, then you can end up with updates being delayed by 2x interrupt rate. 
- if the interrupt rate is slightly slower than HZ, you can occasionally end up with jiffies incrementing by two. - if your interrupt rate is dead on HZ, then other system noise can come into effect and you may get maybe zero, one or two jiffy increments per interrupt. (You have to think about time passing in NS, where jiffy updates should be vs where the timer interrupts happen.) See tick_do_update_jiffies64() for the details. The timer infrastructure is jiffy based - which includes scheduling where the scheduler does not use hrtimers. That means a slight discrepancy between HZ and the actual interrupt rate can cause around 1/HZ jitter. That's a matter of fact due to how the code works. So, actually, I don't think the accuracy of HZ has much overall effect, _provided_ a platform provides a clocksource, on the accuracy of either jiffy-based timers or timekeeping. For those which don't, the accuracy of the timer interrupt to HZ is very important. (This is just based on reading some code and not on practical experiments - I'd suggest some research of this is done, trying HZ=100 on OMAP's 32kHz timers, checking whether there's any drift, checking how accurately a single task can be woken from various select/poll/epoll delays, and checking whether NTP works.) And I think further discussion is pointless until such research has been done (or someone who _really_ knows the time keeping/timer/sched code inside out comments.) ^ permalink raw reply [flat|nested] 96+ messages in thread
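The jiffy update behaviour described in this message can be illustrated with a toy model. This is a simplified sketch, not the kernel's code: it assumes jiffies advance by the number of whole 1/HZ periods elapsed since the last update, that interrupts arrive with perfectly regular spacing and no latency, and that time is kept in integer nanoseconds. The name simulate and the two interrupt periods (328 and 327 cycles of a 32768 Hz counter, rounded to whole nanoseconds) are assumptions made for the example.

```python
NSEC_PER_SEC = 1_000_000_000


def simulate(hz, irq_period_ns, n_irqs):
    """Toy model: count interrupts that advance jiffies by 0, 1 or 2+.

    Jiffies advance by the number of whole 1/HZ periods that have
    elapsed since the last jiffy update (a simplification of the
    behaviour described for tick_do_update_jiffies64()).
    """
    tick_ns = NSEC_PER_SEC // hz
    last_update = 0                      # time of the last accounted jiffy
    jiffies = 0
    histogram = {0: 0, 1: 0, 2: 0}       # key 2 means "two or more"
    for i in range(1, n_irqs + 1):
        now = i * irq_period_ns          # idealised, jitter-free interrupt
        ticks = (now - last_update) // tick_ns
        last_update += ticks * tick_ns
        jiffies += ticks
        histogram[min(ticks, 2)] += 1
    return jiffies, histogram


# Interrupts slightly slower than HZ=100 (328 cycles at 32768 Hz)
print(simulate(100, 10_009_766, 10_000))
# Interrupts slightly faster than HZ=100 (327 cycles at 32768 Hz)
print(simulate(100, 9_979_248, 10_000))
```

In this model the slower interrupt source never skips a jiffy update but occasionally advances jiffies by two, and the faster one occasionally produces an interrupt with no jiffy increment, which is the roughly 1/HZ jitter mentioned above. Real systems add interrupt latency and other noise that the sketch ignores.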
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-22 14:51 ` Russell King - ARM Linux @ 2013-01-22 15:05 ` Santosh Shilimkar -1 siblings, 0 replies; 96+ messages in thread From: Santosh Shilimkar @ 2013-01-22 15:05 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Arnd Bergmann, Tony Lindgren, Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar, John Stultz, Linux ARM Kernel ML On Tuesday 22 January 2013 08:21 PM, Russell King - ARM Linux wrote: > On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote: >> Sorry for not being clear enough. On OMAP, 32KHz is the only clock which >> is always running(even during low power states) and hence the clock >> source and clock event have been clocked using 32KHz clock. As mentioned >> by RMK, with 32768 Hz clock and HZ = 100, there will be always an >> error of 0.1 %. This accuracy also impacts the timer tick interval. >> This was the reason, OMAP has been using the HZ = 128. > > Ok. Let's look at this. As far as time-of-day is concerned, this > shouldn't really matter with the clocksource/clockevent based system > that we now have (where *important point* platforms have been converted > over.) > > Any platform providing a clocksource will override the jiffy-based > clocksource. The measurement of time-of-day passing is now based on > the difference in values read from the clocksource, not from the actual > tick rate. > > Anything _not_ providing a clock source will be reliant on jiffies > incrementing, which in turn _requires_ one timer interrupt per jiffies > at a known rate (which is HZ). > > Now, that's the time of day, what about jiffies? Well, jiffies is > incremented based on a certain number of nsec having passed since the > last jiffy update. That means the code copes with dropped ticks and > the like. 
> > However, if your actual interrupt rate is close to the desired HZ, then > it can lead to some interesting effects (and noise): > > - if the interrupt rate is slightly faster than HZ, then you can end up > with updates being delayed by 2x interrupt rate. > - if the interrupt rate is slightly slower than HZ, you can occasionally > end up with jiffies incrementing by two. > - if your interrupt rate is dead on HZ, then other system noise can come > into effect and you may get maybe zero, one or two jiffy increments per > interrupt. > > (You have to think about time passing in NS, where jiffy updates should > be vs where the timer interrupts happen.) See tick_do_update_jiffies64() > for the details. > > The timer infrastructure is jiffy based - which includes scheduling where > the scheduler does not use hrtimers. That means a slight discrepancy > between HZ and the actual interrupt rate can cause around 1/HZ jitter. > That's a matter of fact due to how the code works. > > So, actually, I don't think the accuracy of HZ has much overall effect, > _provided_ a platform provides a clocksource, on the accuracy of either > jiffy-based timers or timekeeping. For those which don't, the accuracy > of the timer interrupt to HZ is very important. > > (This is just based on reading some code and not on practical > experiments - I'd suggest some research of this is done, trying HZ=100 > on OMAP's 32kHz timers, checking whether there's any drift, checking > how accurately a single task can be woken from various select/poll/epoll > delays, and checking whether NTP works.) > Thanks for expanding it. It is really helpful. > And I think further discussion is pointless until such research has been > done (or someone who _really_ knows the time keeping/timer/sched code > inside out comments.) > Fully agree about experimentation to re-assess the drift. From what I recollect from the past, a few OMAP customers did report the time drift issue and that is how the switch from 100 --> 128 happened.
Anyway I have added the suggested task to my long todo list. Regards, Santosh ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-22 15:05 ` Santosh Shilimkar @ 2013-01-28 6:08 ` Santosh Shilimkar -1 siblings, 0 replies; 96+ messages in thread From: Santosh Shilimkar @ 2013-01-28 6:08 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Arnd Bergmann, Tony Lindgren, Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar, John Stultz, Linux ARM Kernel ML On Tuesday 22 January 2013 08:35 PM, Santosh Shilimkar wrote: > On Tuesday 22 January 2013 08:21 PM, Russell King - ARM Linux wrote: >> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote: >>> Sorry for not being clear enough. On OMAP, 32KHz is the only clock which >>> is always running(even during low power states) and hence the clock >>> source and clock event have been clocked using 32KHz clock. As mentioned >>> by RMK, with 32768 Hz clock and HZ = 100, there will be always an >>> error of 0.1 %. This accuracy also impacts the timer tick interval. >>> This was the reason, OMAP has been using the HZ = 128. >> >> Ok. Let's look at this. As far as time-of-day is concerned, this >> shouldn't really matter with the clocksource/clockevent based system >> that we now have (where *important point* platforms have been converted >> over.) >> >> Any platform providing a clocksource will override the jiffy-based >> clocksource. The measurement of time-of-day passing is now based on >> the difference in values read from the clocksource, not from the actual >> tick rate. >> >> Anything _not_ providing a clock source will be reliant on jiffies >> incrementing, which in turn _requires_ one timer interrupt per jiffies >> at a known rate (which is HZ). >> >> Now, that's the time of day, what about jiffies? Well, jiffies is >> incremented based on a certain number of nsec having passed since the >> last jiffy update. That means the code copes with dropped ticks and >> the like. 
>> >> However, if your actual interrupt rate is close to the desired HZ, then >> it can lead to some interesting effects (and noise): >> >> - if the interrupt rate is slightly faster than HZ, then you can end up >> with updates being delayed by 2x interrupt rate. >> - if the interrupt rate is slightly slower than HZ, you can occasionally >> end up with jiffies incrementing by two. >> - if your interrupt rate is dead on HZ, then other system noise can come >> into effect and you may get maybe zero, one or two jiffy increments >> per >> interrupt. >> >> (You have to think about time passing in NS, where jiffy updates should >> be vs where the timer interrupts happen.) See tick_do_update_jiffies64() >> for the details. >> >> The timer infrastructure is jiffy based - which includes scheduling where >> the scheduler does not use hrtimers. That means a slight discrepency >> between HZ and the actual interrupt rate can cause around 1/HZ jitter. >> That's a matter of fact due to how the code works. >> >> So, actually, I think the accuracy of HZ has much overall effect >> _provided_ >> a platform provides a clocksource to the accuracy of jiffy based timers >> nor timekeeping. For those which don't, the accuracy of the timer >> interrupt to HZ is very important. >> >> (This is just based on reading some code and not on practical >> experiments - I'd suggest some research of this is done, trying HZ=100 >> on OMAP's 32kHz timers, checking whether there's any drift, checking >> how accurately a single task can be woken from various select/poll/epoll >> delays, and checking whether NTP works.) >> > Thanks for expanding it. It is really helpful. > >> And I think further discussion is pointless until such research has been >> done (or someone who _really_ knows the time keeping/timer/sched code >> inside out comments.) >> > Fully agree about experimentation to re-asses the drift. 
> From what I recollect from past, few OMAP customers did > report the time drift issue and that is how the switch > from 100 --> 128 happened. > > Anyway I have added the suggested task to my long todo list. > So I checked for time drift with HZ = 100 on OMAP. I ran the setup for 62 hours and 27 minutes, with the time synced once against an NTP server. I measured a drift of about 174 milliseconds, which is almost noise considering the observed duration was ~224820000 milliseconds. I am re-running the setup with HZ = 128 for a similar time frame to see if the minimal drift observed goes away. Once through that, I will send a patch to update OMAP to use HZ = 100 and possibly get rid of the custom OMAP HZ config. Regards, Santosh ^ permalink raw reply [flat|nested] 96+ messages in thread
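To put the reported measurement in proportion, the figures from this message can be converted to parts per million and compared with what an uncorrected 0.1 % error would amount to over the same window. This is plain arithmetic on the numbers quoted above; the helper name drift_ppm is made up for the example.

```python
def drift_ppm(drift_ms, hours, minutes):
    """Return (elapsed_ms, drift in parts per million) for a test run."""
    elapsed_ms = (hours * 60 + minutes) * 60 * 1000
    return elapsed_ms, drift_ms / elapsed_ms * 1e6


elapsed_ms, ppm = drift_ppm(174, 62, 27)
print(f"elapsed: {elapsed_ms} ms, measured drift: {ppm:.2f} ppm")

# For comparison: what a 0.1 % (1000 ppm) error would accumulate
# over the same window, in seconds.
print(f"0.1 % of the window: {elapsed_ms * 0.001 / 1000:.1f} s")
```

The elapsed time works out to the 224820000 ms quoted in the message, and the measured drift is under 1 ppm, several orders of magnitude below the 0.1 % tick error discussed earlier in the thread.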
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-28 6:08 ` Santosh Shilimkar @ 2013-01-29 0:01 ` John Stultz -1 siblings, 0 replies; 96+ messages in thread From: John Stultz @ 2013-01-29 0:01 UTC (permalink / raw) To: Santosh Shilimkar Cc: Russell King - ARM Linux, Arnd Bergmann, Tony Lindgren, Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar, Linux ARM Kernel ML On 01/27/2013 10:08 PM, Santosh Shilimkar wrote: > On Tuesday 22 January 2013 08:35 PM, Santosh Shilimkar wrote: >> On Tuesday 22 January 2013 08:21 PM, Russell King - ARM Linux wrote: >>> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote: >>>> Sorry for not being clear enough. On OMAP, 32KHz is the only clock >>>> which >>>> is always running(even during low power states) and hence the clock >>>> source and clock event have been clocked using 32KHz clock. As >>>> mentioned >>>> by RMK, with 32768 Hz clock and HZ = 100, there will be always an >>>> error of 0.1 %. This accuracy also impacts the timer tick interval. >>>> This was the reason, OMAP has been using the HZ = 128. >>> >>> Ok. Let's look at this. As far as time-of-day is concerned, this >>> shouldn't really matter with the clocksource/clockevent based system >>> that we now have (where *important point* platforms have been converted >>> over.) >>> >>> Any platform providing a clocksource will override the jiffy-based >>> clocksource. The measurement of time-of-day passing is now based on >>> the difference in values read from the clocksource, not from the actual >>> tick rate. >>> >>> Anything _not_ providing a clock source will be reliant on jiffies >>> incrementing, which in turn _requires_ one timer interrupt per jiffies >>> at a known rate (which is HZ). >>> >>> Now, that's the time of day, what about jiffies? Well, jiffies is >>> incremented based on a certain number of nsec having passed since the >>> last jiffy update. That means the code copes with dropped ticks and >>> the like. 
>>> >>> However, if your actual interrupt rate is close to the desired HZ, then >>> it can lead to some interesting effects (and noise): >>> >>> - if the interrupt rate is slightly faster than HZ, then you can end up >>> with updates being delayed by 2x the interrupt period. >>> - if the interrupt rate is slightly slower than HZ, you can occasionally >>> end up with jiffies incrementing by two. >>> - if your interrupt rate is dead on HZ, then other system noise can come >>> into effect and you may get maybe zero, one or two jiffy increments per >>> interrupt. >>> >>> (You have to think about time passing in NS, where jiffy updates should >>> be vs where the timer interrupts happen.) See tick_do_update_jiffies64() >>> for the details. >>> >>> The timer infrastructure is jiffy based - which includes scheduling where >>> the scheduler does not use hrtimers. That means a slight discrepancy >>> between HZ and the actual interrupt rate can cause around 1/HZ jitter. >>> That's a matter of fact due to how the code works. >>> >>> So, actually, I [don't] think the accuracy of HZ has much overall effect >>> on the accuracy of jiffy based timers or timekeeping, _provided_ a >>> platform provides a clocksource. For those which don't, the accuracy of >>> the timer interrupt to HZ is very important. >>> >>> (This is just based on reading some code and not on practical >>> experiments - I'd suggest some research of this is done, trying HZ=100 >>> on OMAP's 32kHz timers, checking whether there's any drift, checking >>> how accurately a single task can be woken from various select/poll/epoll >>> delays, and checking whether NTP works.) >>> >> Thanks for expanding it. It is really helpful. >> >>> And I think further discussion is pointless until such research has been >>> done (or someone who _really_ knows the time keeping/timer/sched code >>> inside out comments.) >>> >> Fully agree about experimentation to reassess the drift.
>> From what I recollect, a few OMAP customers did >> report the time drift issue and that is how the switch >> from 100 --> 128 happened. >> >> Anyway I have added the suggested task to my long todo list. >> > So I tried to see whether there is any time drift with HZ = 100 on OMAP. > I ran the setup for 62 hours and 27 minutes with the time synced once > against an NTP server. I measured a drift of about 174 milliseconds, > which is almost noise considering the observed duration was > ~224820000 milliseconds. So 174ms drift doesn't sound great, as < 2ms (often much less - though that depends on how close the server is) can be expected with NTP. Although it's not clear how you were measuring: Did you see a max 174ms offset while trying to sync with NTP? Was that offset shortly after starting NTP or after NTP converged down? thanks -john ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-29 0:01 ` John Stultz @ 2013-01-29 6:43 ` Santosh Shilimkar -1 siblings, 0 replies; 96+ messages in thread From: Santosh Shilimkar @ 2013-01-29 6:43 UTC (permalink / raw) To: John Stultz Cc: Russell King - ARM Linux, Arnd Bergmann, Tony Lindgren, Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar, Linux ARM Kernel ML Jon, On Tuesday 29 January 2013 05:31 AM, John Stultz wrote: > On 01/27/2013 10:08 PM, Santosh Shilimkar wrote: >> On Tuesday 22 January 2013 08:35 PM, Santosh Shilimkar wrote: >>> On Tuesday 22 January 2013 08:21 PM, Russell King - ARM Linux wrote: >>>> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote: [..] >>> Thanks for expanding it. It is really helpful. >>> >>>> And I think further discussion is pointless until such research has been >>>> done (or someone who _really_ knows the time keeping/timer/sched code >>>> inside out comments.) >>>> >>> Fully agree about experimentation to reassess the drift. >>> From what I recollect, a few OMAP customers did >>> report the time drift issue and that is how the switch >>> from 100 --> 128 happened. >>> >>> Anyway I have added the suggested task to my long todo list. >>> >> So I tried to see whether there is any time drift with HZ = 100 on OMAP. >> I ran the setup for 62 hours and 27 minutes with the time synced once >> against an NTP server. I measured a drift of about 174 milliseconds, >> which is almost noise considering the observed duration was >> ~224820000 milliseconds. > > So 174ms drift doesn't sound great, as < 2ms (often much less - though > that depends on how close the server is) can be expected with NTP. > Although it's not clear how you were measuring: Did you see a max 174ms > offset while trying to sync with NTP? Was that offset shortly after > starting NTP or after NTP converged down? > To avoid the server latency, we didn't do continuous sync.
The time was synced at the beginning and again after 62.5 hours (#ntpd -qg), and a drift of about 174 ms was observed. As you said, this could be due to the server sync time, probably along with some contribution from the system calls made by #ntpd. As mentioned, the other run with HZ = 128, which started 15 hours 20 mins ago, is already showing about 24 ms of drift. I will let it run for a couple more days to get a run of similar duration. Regards, santosh ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-29 6:43 ` Santosh Shilimkar @ 2013-01-29 10:06 ` Russell King - ARM Linux -1 siblings, 0 replies; 96+ messages in thread From: Russell King - ARM Linux @ 2013-01-29 10:06 UTC (permalink / raw) To: Santosh Shilimkar Cc: John Stultz, Arnd Bergmann, Tony Lindgren, Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar, Linux ARM Kernel ML On Tue, Jan 29, 2013 at 12:13:46PM +0530, Santosh Shilimkar wrote: > To avoid the server latency, we didn't do continuous sync. The time was > synced at the beginning and again after 62.5 hours (#ntpd -qg), and a > drift of about 174 ms was observed. As you said, this could be due to the > server sync time, probably along with some contribution from the system > calls made by #ntpd. As mentioned, the other run with HZ = 128, which > started 15 hours 20 mins ago, is already showing about 24 ms of drift. > I will let it run for a couple more days to get a run of similar duration. Hmm. I wonder if ntpd -qg will cause ntp to read the drift file and adjust the kernel time keeping using that information... ^ permalink raw reply [flat|nested] 96+ messages in thread
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-29 6:43 ` Santosh Shilimkar @ 2013-01-29 18:43 ` John Stultz -1 siblings, 0 replies; 96+ messages in thread From: John Stultz @ 2013-01-29 18:43 UTC (permalink / raw) To: Santosh Shilimkar Cc: Russell King - ARM Linux, Arnd Bergmann, Tony Lindgren, Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar, Linux ARM Kernel ML On 01/28/2013 10:43 PM, Santosh Shilimkar wrote: > Jon, > > On Tuesday 29 January 2013 05:31 AM, John Stultz wrote: >> On 01/27/2013 10:08 PM, Santosh Shilimkar wrote: >>> On Tuesday 22 January 2013 08:35 PM, Santosh Shilimkar wrote: >>>> On Tuesday 22 January 2013 08:21 PM, Russell King - ARM Linux wrote: >>>>> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote: > > [..] > >>>> Thanks for expanding it. It is really helpful. >>>> >>>>> And I think further discussion is pointless until such research has been >>>>> done (or someone who _really_ knows the time keeping/timer/sched code >>>>> inside out comments.) >>>>> >>>> Fully agree about experimentation to reassess the drift. >>>> From what I recollect, a few OMAP customers did >>>> report the time drift issue and that is how the switch >>>> from 100 --> 128 happened. >>>> >>>> Anyway I have added the suggested task to my long todo list. >>>> >>> So I tried to see whether there is any time drift with HZ = 100 on OMAP. >>> I ran the setup for 62 hours and 27 minutes with the time synced once >>> against an NTP server. I measured a drift of about 174 milliseconds, >>> which is almost noise considering the observed duration was >>> ~224820000 milliseconds. >> >> So 174ms drift doesn't sound great, as < 2ms (often much less - though >> that depends on how close the server is) can be expected with NTP. >> Although it's not clear how you were measuring: Did you see a max 174ms >> offset while trying to sync with NTP? Was that offset shortly after >> starting NTP or after NTP converged down?
>> > To avoid the server latency, we didn't do continuous sync. The time was > synced at the beginning and again after 62.5 hours (#ntpd -qg), and a > drift of about 174 ms was observed. As you said, this could be due to the > server sync time, probably along with some contribution from the system > calls made by #ntpd. Ahh.. Ok. Thanks for the clarification. After a one time sync, ~774ppb drift is surprisingly good! > As mentioned, the other run with HZ = 128, which started 15 hours > 20 mins ago, is already showing about 24 ms of drift. I will let it run > for a couple more days to get a run of similar duration. Yea, this is also great drift wise (but it's not surprising, as in both cases we're keeping time off of the same clocksource, and HZ shouldn't come into play). But it's good to have the timekeeping side validated. thanks -john ^ permalink raw reply [flat|nested] 96+ messages in thread
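For reference, the ~774ppb figure follows directly from the numbers reported in this thread (174 ms over 62 hours 27 minutes). A minimal sketch of the arithmetic, where the helper name drift_ppb is hypothetical:

```python
def drift_ppb(drift_ms: float, hours: int, minutes: int) -> float:
    """Drift as parts per billion of the elapsed time."""
    elapsed_ms = (hours * 60 + minutes) * 60 * 1000
    return drift_ms / elapsed_ms * 1e9


if __name__ == "__main__":
    # HZ = 100 run: 174 ms over 62 h 27 min (224820000 ms elapsed).
    print(f"HZ=100: {drift_ppb(174, 62, 27):.0f} ppb")
    # HZ = 128 run so far: 24 ms over 15 h 20 min.
    print(f"HZ=128: {drift_ppb(24, 15, 20):.0f} ppb")
```

Both runs land well under 1 ppm. The two intervals differ in length, so the figures are only loosely comparable until the HZ = 128 run has covered a similar duration.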
* Re: One of these things (CONFIG_HZ) is not like the others.. 2013-01-22 14:51 ` Russell King - ARM Linux @ 2013-01-22 17:31 ` Arnd Bergmann -1 siblings, 0 replies; 96+ messages in thread From: Arnd Bergmann @ 2013-01-22 17:31 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Santosh Shilimkar, Tony Lindgren, Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar, John Stultz, Linux ARM Kernel ML On Tuesday 22 January 2013, Russell King - ARM Linux wrote: > On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote: > > Sorry for not being clear enough. On OMAP, 32KHz is the only clock which > > is always running(even during low power states) and hence the clock > > source and clock event have been clocked using 32KHz clock. As mentioned > > by RMK, with 32768 Hz clock and HZ = 100, there will be always an > > error of 0.1 %. This accuracy also impacts the timer tick interval. > > This was the reason, OMAP has been using the HZ = 128. > > Ok. Let's look at this. As far as time-of-day is concerned, this > shouldn't really matter with the clocksource/clockevent based system > that we now have (where *important point* platforms have been converted > over.) > > Any platform providing a clocksource will override the jiffy-based > clocksource. The measurement of time-of-day passing is now based on > the difference in values read from the clocksource, not from the actual > tick rate. Ok, that was my reading as well. > - if the interrupt rate is slightly faster than HZ, then you can end up > with updates being delayed by 2x interrupt rate. > - if the interrupt rate is slightly slower than HZ, you can occasionally > end up with jiffies incrementing by two. > - if your interrupt rate is dead on HZ, then other system noise can come > into effect and you may get maybe zero, one or two jiffy increments per > interrupt. > > (You have to think about time passing in NS, where jiffy updates should > be vs where the timer interrupts happen.) 
See tick_do_update_jiffies64() > for the details. Ah, right. I forgot about this case. So when we have an accurate clocksource, rather than relying on the timer tick as the sole source for timekeeping, the jiffies64 variable may be less accurate (up to almost two jiffies diff, rather than almost one jiffy). > The timer infrastructure is jiffy based - which includes scheduling where > the scheduler does not use hrtimers. That means a slight discrepancy > between HZ and the actual interrupt rate can cause around 1/HZ jitter. > That's a matter of fact due to how the code works. Yes, the two jiffies accuracy I mentioned above would be the result of the 1 jiffy jitter plus 1 jiffy from the limited resolution. > So, actually, I [don't] think the accuracy of HZ has much overall effect > on the accuracy of jiffy based timers or timekeeping, _provided_ a > platform provides a clocksource. For those which don't, the accuracy of > the timer interrupt to HZ is very important. This is where I don't see the same problem that you are seeing. Shouldn't the old ACT_HZ calculation based on CLOCK_TICK_RATE have prevented this? Note that all PC-like systems traditionally have a CLOCK_TICK_RATE of 1193182 Hz, which does not divide evenly into any of the normal HZ values; the jiffies clocksource used to have code in it to make up for this problem. Nowadays, since John's b3c869d35 "jiffies: Remove compile time assumptions about CLOCK_TICK_RATE" patch in 3.7, the logic is part of the refined_jiffies clock source, which is currently used only on x86. I do agree that any platform that is using neither a platform specific clocksource nor the refined_jiffies would suffer from the drift as you describe. OMAP was in fact using CLOCK_TICK_RATE correctly, but is not using the refined_jiffies clocksource now because it has its own clocksource implementation.
> And I think further discussion is pointless until such research has been > done (or someone who _really_ knows the time keeping/timer/sched code > inside out comments.) Maybe John has some more insights here, he seems to be the one that understands it better than any of us. Arnd ^ permalink raw reply [flat|nested] 96+ messages in thread
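The divisibility problem discussed above (the 0.1 % error for a 32768 Hz clock at HZ = 100, and 1193182 Hz not dividing evenly into common HZ values) can be checked with a few lines of arithmetic. This is a minimal sketch assuming the timer is programmed with the nearest whole number of input clock cycles per tick; the helper name tick_error is hypothetical and this is not the kernel's ACT_HZ code.

```python
def tick_error(clock_hz: int, hz: int):
    """Return (counts per tick, achieved tick rate in Hz, relative error).

    Assumes the timer reload value is the input clock divided by HZ,
    rounded to the nearest whole count.
    """
    counts = (clock_hz + hz // 2) // hz   # nearest whole number of cycles
    actual_hz = clock_hz / counts         # tick rate the hardware delivers
    rel_error = (actual_hz - hz) / hz     # signed fraction of requested HZ
    return counts, actual_hz, rel_error


if __name__ == "__main__":
    for clock_hz, hz in [(32768, 100), (32768, 128),
                         (1193182, 100), (1193182, 1000)]:
        counts, actual_hz, err = tick_error(clock_hz, hz)
        print(f"{clock_hz} Hz, HZ={hz}: {counts} counts/tick, "
              f"{actual_hz:.4f} Hz, error {err * 100:+.4f} %")
```

For 32768 Hz and HZ = 100 this gives 328 counts per tick and an error of roughly -0.1 %, while HZ = 128 divides exactly (256 counts per tick, no error).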
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 14:51 ` Russell King - ARM Linux
@ 2013-01-22 18:59   ` John Stultz
From: John Stultz @ 2013-01-22 18:59 UTC
To: Russell King - ARM Linux
Cc: Santosh Shilimkar, Arnd Bergmann, Tony Lindgren, Peter Zijlstra,
    Matt Sealey, LKML, Ben Dooks, Ingo Molnar, Linux ARM Kernel ML

On 01/22/2013 06:51 AM, Russell King - ARM Linux wrote:
> On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote:
>> Sorry for not being clear enough. On OMAP, 32KHz is the only clock which
>> is always running(even during low power states) and hence the clock
>> source and clock event have been clocked using 32KHz clock. As mentioned
>> by RMK, with 32768 Hz clock and HZ = 100, there will be always an
>> error of 0.1 %. This accuracy also impacts the timer tick interval.
>> This was the reason, OMAP has been using the HZ = 128.
> Ok. Let's look at this. As far as time-of-day is concerned, this
> shouldn't really matter with the clocksource/clockevent based system
> that we now have (where *important point* platforms have been converted
> over.)
>
> Any platform providing a clocksource will override the jiffy-based
> clocksource. The measurement of time-of-day passing is now based on
> the difference in values read from the clocksource, not from the actual
> tick rate.
>
> Anything _not_ providing a clock source will be reliant on jiffies
> incrementing, which in turn _requires_ one timer interrupt per jiffies
> at a known rate (which is HZ).

Correct. As long as we have a fine-grained hardware clocksource
installed, HZ error should not affect timekeeping in any major way.

> Now, that's the time of day, what about jiffies? Well, jiffies is
> incremented based on a certain number of nsec having passed since the
> last jiffy update. That means the code copes with dropped ticks and
> the like.
>
> However, if your actual interrupt rate is close to the desired HZ, then
> it can lead to some interesting effects (and noise):
>
> - if the interrupt rate is slightly faster than HZ, then you can end up
>   with updates being delayed by 2x interrupt rate.
> - if the interrupt rate is slightly slower than HZ, you can occasionally
>   end up with jiffies incrementing by two.
> - if your interrupt rate is dead on HZ, then other system noise can come
>   into effect and you may get maybe zero, one or two jiffy increments per
>   interrupt.
>
> (You have to think about time passing in NS, where jiffy updates should
> be vs where the timer interrupts happen.) See tick_do_update_jiffies64()
> for the details.

Correct, with HRT, we actually trigger the HZ-frequency timer tick from
an hrtimer (which expires based on the system time driven by the
clocksource). Thus even if there is a theoretical error between the
ideal HZ and what the hardware can do, that error will not propagate
forward. Instead, you may only see timer jitter on the order of how
fine-grained the timer hardware can be triggered. If that is relatively
fine, it shouldn't be an issue; if it's relatively coarse (closer to
HZ), then there may be the noise effects you list above. Although that
should be mostly ok, since jiffy timers will always have a few jiffies
of jitter due to the granularity (ie: when setting a jiffies timer, you
don't know how far into the current jiffy you are).

In the case where we don't have HRT, and the timers are triggered by
the HZ periodic interrupt, then there is a mix of possibilities: for
hrtimers you'll still see the behavior you list above (since they are
still time based), but for jiffies timers, the rules are mostly inverted
(if the interrupt rate is fast, jiffies timers will trigger sooner, if
the rate is slow, jiffies timers will trigger later).

And if you are using jiffies for time (and not using the
register_refined_jiffies code), then everything will follow the
interrupt freq. So if interrupts are faster than HZ, time will move
faster, timers will expire early, etc.

> The timer infrastructure is jiffy based - which includes scheduling where
> the scheduler does not use hrtimers. That means a slight discrepency
> between HZ and the actual interrupt rate can cause around 1/HZ jitter.
> That's a matter of fact due to how the code works.
>
> So, actually, I think the accuracy of HZ has much overall effect _provided_
> a platform provides a clocksource to the accuracy of jiffy based timers
> nor timekeeping. For those which don't, the accuracy of the timer
> interrupt to HZ is very important.

I think you're right, but I suspect there are some typos in the above.
So to clarify:

The accuracy of HZ shouldn't have much effect on timekeeping on systems
that use fine-grained clocksources. Though for systems that use
jiffies/arch_gettimeoffset(), HZ accuracy is more important. However,
the register_refined_jiffies() call should allow for smaller error on
those systems to be corrected.

The accuracy of HZ may have some effect on systems that do not have a
clockevent driver and do not use hrt mode. It should be relatively
bounded.

> (This is just based on reading some code and not on practical
> experiments - I'd suggest some research of this is done, trying HZ=100
> on OMAP's 32kHz timers, checking whether there's any drift, checking
> how accurately a single task can be woken from various select/poll/epoll
> delays, and checking whether NTP works.)

Yea, for omap and other more "modern" systems with clocksources and
clockevents, HZ=100 should be ok. Although I'd still like to see the
experiments run, since as always, there may be bugs (I'd be interested
in hearing about).

Even on systems w/o clocksources and clockevents, small HZ error should
be able to be managed via the register_refined_jiffies() and I'd like to
hear if folks have issues with that (there may be bounds limits I've not
run into - so I'd like to get that fixed if so).

The only really problematic cases are systems where there aren't
clocksources nor clockevents, and the hardware has specific limits on
what HZ ranges it can do (ie the EBSA110), but I think we're all ok with
those not being able to be compiled into a multi-platform kernel.

thanks
-john
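[Editor's illustration] The three interrupt-rate cases quoted in the
message above can be reproduced with a very rough model of the
"increment jiffies by the number of whole tick periods elapsed since the
last update" logic. This is only a sketch under stated assumptions, not
the kernel's tick_do_update_jiffies64(): the function names
simulate_jiffy_updates() and irq_period_ns() are hypothetical, time is
idealised (no interrupt latency or system noise), and the 327/328 cycle
counts are simply the two integer dividers nearest to 32768/100.

```python
NSEC_PER_SEC = 1_000_000_000
TIMER_HZ = 32768  # rate of the 32 kHz timer quoted in the thread


def irq_period_ns(cycles):
    """Interrupt period in ns if the timer fires every `cycles` timer cycles."""
    return round(cycles * NSEC_PER_SEC / TIMER_HZ)


def simulate_jiffy_updates(hz, period_ns, n_irqs):
    """Return the jiffies increment observed at each of n_irqs interrupts.

    Simplified model (an assumption, not kernel code): at every interrupt,
    jiffies advances by the number of whole 1/HZ periods that have elapsed
    since the last update, and the last-update time advances by that many
    periods.
    """
    tick_ns = NSEC_PER_SEC // hz      # ideal length of one jiffy
    last_update_ns = 0                # always a multiple of tick_ns
    increments = []
    for i in range(1, n_irqs + 1):
        now_ns = i * period_ns        # idealised interrupt arrival time
        ticks = (now_ns - last_update_ns) // tick_ns
        last_update_ns += ticks * tick_ns
        increments.append(ticks)
    return increments


# Interrupts slightly slower than HZ=100 (328 cycles is about 10.01 ms).
slow = simulate_jiffy_updates(100, irq_period_ns(328), 3000)
# Interrupts slightly faster than HZ=100 (327 cycles is about 9.98 ms).
fast = simulate_jiffy_updates(100, irq_period_ns(327), 3000)
# Interrupts exactly on HZ=100, with no noise in this model.
exact = simulate_jiffy_updates(100, NSEC_PER_SEC // 100, 3000)

print("slow :", sorted(set(slow)), "double increments:", slow.count(2))
print("fast :", sorted(set(fast)), "skipped updates:", fast.count(0))
print("exact:", sorted(set(exact)))
```

In this model the slow case occasionally bumps jiffies by two, and the
fast case occasionally leaves jiffies unchanged until the next
interrupt. The "dead on HZ" case only misbehaves once noise is added,
which the sketch deliberately leaves out.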
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 18:59 ` John Stultz
@ 2013-01-22 21:52   ` Tony Lindgren
From: Tony Lindgren @ 2013-01-22 21:52 UTC
To: John Stultz
Cc: Russell King - ARM Linux, Santosh Shilimkar, Arnd Bergmann,
    Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar,
    Linux ARM Kernel ML

* John Stultz <john.stultz@linaro.org> [130122 11:02]:
>
> Correct, with HRT, we actually trigger the HZ-frequency timer tick
> from an hrtimer (which expires based on the system time driven by
> the clocksource). Thus even if there is a theoretical error between
> the ideal HZ and what the hardware can do, that error will not
> propagate forward.

If there's no cumulative error, sounds like the way to go is to select
HRT for ARM multiplatform builds and set the HZ to 100 then.

Regards,

Tony
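[Editor's illustration] For a sense of scale, the "0.1 %" figure quoted
earlier in the thread, and what it would mean if it were allowed to
accumulate, can be worked out with a few lines of arithmetic. This is a
back-of-the-envelope sketch only: tick_error() and drift_per_day() are
hypothetical helper names, the divider is assumed to be the nearest
integer number of timer cycles, and the drift figure applies only to a
system that counts every interrupt as exactly 1/HZ seconds. A system
that reads time from a clocksource, as described above, does not
accumulate this error.

```python
SECONDS_PER_DAY = 86400


def tick_error(timer_hz, hz):
    """Nearest integer divider for the requested HZ and the resulting error.

    Returns (cycles_per_tick, actual_hz, relative_error), where
    relative_error is (actual_hz - hz) / hz.
    """
    cycles = round(timer_hz / hz)     # timer cycles programmed per tick
    actual_hz = timer_hz / cycles     # interrupt rate the hardware delivers
    return cycles, actual_hz, (actual_hz - hz) / hz


def drift_per_day(relative_error):
    """Seconds gained (+) or lost (-) per day by pure tick counting."""
    return relative_error * SECONDS_PER_DAY


for hz in (100, 128):
    cycles, actual_hz, err = tick_error(32768, hz)
    print(f"HZ={hz}: {cycles} cycles/tick, actual {actual_hz:.3f} Hz, "
          f"error {err * 100:.3f} %, drift {drift_per_day(err):.1f} s/day")
```

With a 32768 Hz timer, HZ=128 divides exactly (256 cycles per tick),
while HZ=100 needs 327.68 cycles and so cannot be hit exactly, which is
the reason given in the thread for OMAP's choice of 128.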
* Re: One of these things (CONFIG_HZ) is not like the others..
  2013-01-22 21:52 ` Tony Lindgren
@ 2013-01-23  5:18   ` Santosh Shilimkar
From: Santosh Shilimkar @ 2013-01-23 5:18 UTC
To: Tony Lindgren
Cc: John Stultz, Russell King - ARM Linux, Arnd Bergmann,
    Peter Zijlstra, Matt Sealey, LKML, Ben Dooks, Ingo Molnar,
    Linux ARM Kernel ML

On Wednesday 23 January 2013 03:22 AM, Tony Lindgren wrote:
> * John Stultz <john.stultz@linaro.org> [130122 11:02]:
>>
>> Correct, with HRT, we actually trigger the HZ-frequency timer tick
>> from an hrtimer (which expires based on the system time driven by
>> the clocksource). Thus even if there is a theoretical error between
>> the ideal HZ and what the hardware can do, that error will not
>> propagate forward.
>
> If there's no cumulative error, sounds like the way to go is to select
> HRT for ARM multiplatform builds and set the HZ to 100 then.
>
HIGH_RES_TIMERS are always enabled by default for OMAP as well as
multi-platform build.

Regards,
Santosh