* [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device
@ 2014-05-29  9:23 ` Lorenzo Pieralisi
  0 siblings, 0 replies; 16+ messages in thread
From: Lorenzo Pieralisi @ 2014-05-29  9:23 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: mark.rutland, Lorenzo Pieralisi, Preeti U Murthy, Will Deacon

On platforms implementing CPU power management, the CPUidle subsystem
can allow CPUs to enter idle states where local timers logic is lost on power
down. To keep the software timers functional the kernel relies on an
always-on broadcast timer to be present in the platform to relay the
interrupt signalling the timer expiries.

For platforms implementing CPU core gating that do not implement an always-on
HW timer or implement it in a broken way, this patch adds code to initialize
the kernel software broadcast hrtimer upon boot. It relies on a dynamically
chosen CPU to be always powered-up. This CPU then relays the timer interrupt
to CPUs in deep-idle states through its HW local timer device.

On systems with power management capabilities but no functional HW broadcast
tick device, the hrtimer based clock event device allows the kernel to
enter high-resolution timer mode, which improves system latencies and saves
dynamic power.

The side effect of having a CPU always-on has implications on power management
platform capabilities and makes CPUidle suboptimal, since at least a CPU is
kept always in a shallow idle state by the kernel to relay timer interrupts,
but at least leaves the kernel with a functional system with some working power
management capabilities.

The hrtimer based clock event device has lowest possible rating so that,
if a platform contains a functional HW clock event device with broadcast
capabilities, that device is always chosen as a tick broadcast device instead
of the software based one, now present by default.

Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm64/kernel/time.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index 29c39d5..3d43900 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -18,6 +18,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/clockchips.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
@@ -67,6 +68,8 @@ void __init time_init(void)
 
 	clocksource_of_init();
 
+	tick_setup_hrtimer_broadcast();
+
 	arch_timer_rate = arch_timer_get_rate();
 	if (!arch_timer_rate)
 		panic("Unable to initialise architected timer.\n");
-- 
1.8.4



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device
  2014-05-29  9:23 ` Lorenzo Pieralisi
@ 2014-05-29 10:14   ` Will Deacon
  -1 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2014-05-29 10:14 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-arm-kernel, linux-kernel, Mark Rutland, Preeti U Murthy

On Thu, May 29, 2014 at 10:23:01AM +0100, Lorenzo Pieralisi wrote:
> On platforms implementing CPU power management, the CPUidle subsystem
> can allow CPUs to enter idle states where local timers logic is lost on power
> down. To keep the software timers functional the kernel relies on an
> always-on broadcast timer to be present in the platform to relay the
> interrupt signalling the timer expiries.
> 
> For platforms implementing CPU core gating that do not implement an always-on
> HW timer or implement it in a broken way, this patch adds code to initialize
> the kernel software broadcast hrtimer upon boot. It relies on a dynamically
> chosen CPU to be always powered-up. This CPU then relays the timer interrupt
> to CPUs in deep-idle states through its HW local timer device.
> 
> On systems with power management capabilities but no functional HW broadcast
> tick device, the hrtimer based clock event device allows the kernel to
> enter high-resolution timer mode, which improves system latencies and saves
> dynamic power.
> 
> The side effect of having a CPU always-on has implications on power management
> platform capabilities and makes CPUidle suboptimal, since at least a CPU is
> kept always in a shallow idle state by the kernel to relay timer interrupts,
> but at least leaves the kernel with a functional system with some working power
> management capabilities.
> 
> The hrtimer based clock event device has lowest possible rating so that,
> if a platform contains a functional HW clock event device with broadcast
> capabilities, that device is always chosen as a tick broadcast device instead
> of the software based one, now present by default.
> 
> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Thanks Lorenzo,

  Acked-by: Will Deacon <will.deacon@arm.com>

With this patch applied, cyclictest starts reporting sane numbers and I
no longer see latency-related failures in LTP when it does things like
test timeouts and nanosleep()s.

Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device
  2014-05-29  9:23 ` Lorenzo Pieralisi
@ 2014-05-29 11:04   ` Preeti U Murthy
  -1 siblings, 0 replies; 16+ messages in thread
From: Preeti U Murthy @ 2014-05-29 11:04 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-arm-kernel, linux-kernel, mark.rutland, Will Deacon

Hi Lorenzo,

On 05/29/2014 02:53 PM, Lorenzo Pieralisi wrote:
> On platforms implementing CPU power management, the CPUidle subsystem
> can allow CPUs to enter idle states where local timers logic is lost on power
> down. To keep the software timers functional the kernel relies on an
> always-on broadcast timer to be present in the platform to relay the
> interrupt signalling the timer expiries.
> 
> For platforms implementing CPU core gating that do not implement an always-on
> HW timer or implement it in a broken way, this patch adds code to initialize
> the kernel software broadcast hrtimer upon boot. It relies on a dynamically

It would be best to use the term "hrtimer based broadcast device"
throughout the changelog for uniformity and to avoid confusion instead
of mixing it with "software broadcast".

> chosen CPU to be always powered-up. This CPU then relays the timer interrupt
> to CPUs in deep-idle states through its HW local timer device.
> 
> On systems with power management capabilities but no functional HW broadcast
> tick device, the hrtimer based clock event device allows the kernel to
> enter high-resolution timer mode, which improves system latencies and saves
> dynamic power.

Sorry but I do not understand the above paragraph. What do you mean by
"allows the kernel to enter high resolution timer mode" ? And how does
it improve system latency? I understand that the hrtimer based
clockevent device saves dynamic power since it provides a mechanism in
which cpus can enter deeper idle states.

> 
> The side effect of having a CPU always-on has implications on power management
> platform capabilities and makes CPUidle suboptimal, since at least a CPU is
> kept always in a shallow idle state by the kernel to relay timer interrupts,
> but at least leaves the kernel with a functional system with some working power
> management capabilities.
> 
> The hrtimer based clock event device has lowest possible rating so that,
> if a platform contains a functional HW clock event device with broadcast
> capabilities, that device is always chosen as a tick broadcast device instead
> of the software based one, now present by default.

I think this statement "instead of the software based one, now present
by default" is incorrect. The hrtimer based clock event device will come
into picture only when the arch calls tick_setup_hrtimer_broadcast()
explicitly. Otherwise either the arch should register a real clock
device which does broadcast or should disable deep idle states where the
local timers stop. So I would suggest skipping the last paragraph as it
is not conveying anything in specific. The fact that a clock device with
the highest rating will be chosen is already known and need not be
mentioned explicitly IMHO.

> 
> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> ---
>  arch/arm64/kernel/time.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
> index 29c39d5..3d43900 100644
> --- a/arch/arm64/kernel/time.c
> +++ b/arch/arm64/kernel/time.c
> @@ -18,6 +18,7 @@
>   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>   */
>  
> +#include <linux/clockchips.h>
>  #include <linux/export.h>
>  #include <linux/kernel.h>
>  #include <linux/interrupt.h>
> @@ -67,6 +68,8 @@ void __init time_init(void)
>  
>  	clocksource_of_init();
>  
> +	tick_setup_hrtimer_broadcast();
> +
>  	arch_timer_rate = arch_timer_get_rate();
>  	if (!arch_timer_rate)
>  		panic("Unable to initialise architected timer.\n");
> 

You have defined flag "CPUIDLE_FLAG_TIMER_STOP" for your deep idle
states in which timer stops right?

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device
  2014-05-29 11:04   ` Preeti U Murthy
@ 2014-05-29 12:39     ` Mark Rutland
  -1 siblings, 0 replies; 16+ messages in thread
From: Mark Rutland @ 2014-05-29 12:39 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Lorenzo Pieralisi, linux-arm-kernel, linux-kernel, Will Deacon

Hi Preeti,

On Thu, May 29, 2014 at 12:04:36PM +0100, Preeti U Murthy wrote:
> Hi Lorenzo,
> 
> On 05/29/2014 02:53 PM, Lorenzo Pieralisi wrote:
> > On platforms implementing CPU power management, the CPUidle subsystem
> > can allow CPUs to enter idle states where local timers logic is lost on power
> > down. To keep the software timers functional the kernel relies on an
> > always-on broadcast timer to be present in the platform to relay the
> > interrupt signalling the timer expiries.
> > 
> > For platforms implementing CPU core gating that do not implement an always-on
> > HW timer or implement it in a broken way, this patch adds code to initialize
> > the kernel software broadcast hrtimer upon boot. It relies on a dynamically
> 
> It would be best to use the term "hrtimer based broadcast device"
> throughout the changelog for uniformity and to avoid confusion instead
> of mixing it with "software broadcast".
> 
> > chosen CPU to be always powered-up. This CPU then relays the timer interrupt
> > to CPUs in deep-idle states through its HW local timer device.
> > 
> > On systems with power management capabilities but no functional HW broadcast
> > tick device, the hrtimer based clock event device allows the kernel to
> > enter high-resolution timer mode, which improves system latencies and saves
> > dynamic power.
> 
> Sorry but I do not understand the above paragraph. What do you mean by
> "allows the kernel to enter high resolution timer mode" ? And how does
> it improve system latency? I understand that the hrtimer based
> clockevent device saves dynamic power since it provides a mechanism in
> which cpus can enter deeper idle states.

When there's no oneshot-capable broadcast device and the CPU-local
clock_event_device has the CLOCK_EVT_FEAT_C3STOP flag,
tick_is_oneshot_available will return false. Thus
tick_check_oneshot_change will return false, and hrtimer_switch_to_hres
will never switch to high resolution mode (and we can also never enter
NOHZ mode), leaving us stuck in periodic mode.
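
The check described above can be sketched roughly as follows. This is a
simplified userspace model, not the kernel source: struct model_dev, the
MODEL_FEAT_* bits and model_oneshot_available() are hypothetical names
that only mirror the decision being discussed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature bits standing in for the kernel's CLOCK_EVT_FEAT_* flags. */
#define MODEL_FEAT_ONESHOT 0x1u
#define MODEL_FEAT_C3STOP  0x2u

/* Hypothetical, stripped-down stand-in for a clock event device. */
struct model_dev {
	unsigned int features;
};

/*
 * Model of the oneshot availability check.
 * local: the CPU-local clock event device (NULL if none).
 * bc:    the tick broadcast device (NULL if none is registered).
 */
static bool model_oneshot_available(const struct model_dev *local,
				    const struct model_dev *bc)
{
	/* No local device, or it cannot do oneshot: stay periodic. */
	if (!local || !(local->features & MODEL_FEAT_ONESHOT))
		return false;

	/* The local timer keeps running in deep idle: no broadcast needed. */
	if (!(local->features & MODEL_FEAT_C3STOP))
		return true;

	/* The local timer stops in deep idle: a oneshot broadcast device is required. */
	return bc != NULL && (bc->features & MODEL_FEAT_ONESHOT) != 0;
}
```

With a local device that is oneshot-capable but marked C3STOP, the model
returns false when bc is NULL and true once a oneshot-capable broadcast
device (such as the hrtimer based one) is supplied.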

In periodic mode ticks occur at fixed intervals, and any timer wakeup
will occur after the tick following the requested wakeup time, adding
some amount of latency over what would be possible with high resolution
mode. Additionally, as we can only wake up at said ticks and not between
them, it's possible that several timers for intervals shorter than that
tick interval will fire at once upon a timer tick. Any tasks which
requested these wakeups will fight for CPU time, and some will see
additional latency because of this.
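
To put rough numbers on that, here is a minimal sketch of the arithmetic.
It is an idealised model under stated assumptions: ticks fall at exact
multiples of tick_ns counted from time 0, tick_ns is non-zero, and high
resolution mode wakes up exactly at the requested time. The function
names are hypothetical and do not exist in the kernel.

```c
#include <stdint.h>

/*
 * Periodic mode: an expiry is only noticed on the first tick at or
 * after the requested time, so round up to the next tick boundary.
 */
static uint64_t periodic_wakeup_ns(uint64_t requested_ns, uint64_t tick_ns)
{
	uint64_t ticks = (requested_ns + tick_ns - 1) / tick_ns;

	return ticks * tick_ns;
}

/* High resolution mode (idealised): the event fires at the expiry itself. */
static uint64_t hres_wakeup_ns(uint64_t requested_ns)
{
	return requested_ns;
}

/* Extra latency that periodic mode adds over the idealised hres case. */
static uint64_t added_latency_ns(uint64_t requested_ns, uint64_t tick_ns)
{
	return periodic_wakeup_ns(requested_ns, tick_ns) -
	       hres_wakeup_ns(requested_ns);
}
```

Assuming a 10ms tick (HZ=100), a wakeup requested at 1ms is delivered at
10ms, i.e. 9ms late, and wakeups requested at 4ms and 7ms are both
delivered on that same tick, which is the bunching effect described
above.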

> > 
> > The side effect of having a CPU always-on has implications on power management
> > platform capabilities and makes CPUidle suboptimal, since at least a CPU is
> > kept always in a shallow idle state by the kernel to relay timer interrupts,
> > but at least leaves the kernel with a functional system with some working power
> > management capabilities.
> > 
> > The hrtimer based clock event device has lowest possible rating so that,
> > if a platform contains a functional HW clock event device with broadcast
> > capabilities, that device is always chosen as a tick broadcast device instead
> > of the software based one, now present by default.
> 
> I think this statement "instead of the software based one, now present
> by default" is incorrect. The hrtimer based clock event device will come
> into picture only when the arch calls tick_setup_hrtimer_broadcast()
> explicitly. Otherwise either the arch should register a real clock
> device which does broadcast or should disable deep idle states where the
> local timers stop. So I would suggest skipping the last paragraph as it
> is not conveying anything in specific. The fact that a clock device with
> the highest rating will be chosen is already known and need not be
> mentioned explicitly IMHO.

I think it is worth keeping the paragraph to allay anyone's fear that
the hrtimer based broadcast device might be selected in preference to a
real suitable clock. I would otherwise not be aware that the hrtimer
based broadcast device had the lowest rating (and would have to go and
look that up separately).

As the arch code has delegated timer registration to
clocksource_of_init, it doesn't know whether any of the real devices
that may have been registered are suitable as a broadcast source for
oneshot events. So we can't conditionally register the hrtimer based
broadcast device.
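
A minimal sketch of why unconditional registration is harmless, assuming
selection simply prefers the candidate with the highest rating. The
struct and function names are hypothetical, and the rating values used
below (0 for the hrtimer based device, 400 for a hardware timer) are
illustrative assumptions only.

```c
#include <stddef.h>

/* Hypothetical stand-in for a broadcast-capable clock event device. */
struct model_bc_dev {
	const char *name;
	int rating;
};

/* Return the highest-rated candidate, or NULL if there are none. */
static const struct model_bc_dev *
model_pick_broadcast(const struct model_bc_dev *devs, size_t n)
{
	const struct model_bc_dev *best = NULL;

	for (size_t i = 0; i < n; i++) {
		/* A later device only wins with a strictly higher rating. */
		if (!best || devs[i].rating > best->rating)
			best = &devs[i];
	}
	return best;
}
```

Given { "hrtimer", 0 } and { "hw_timer", 400 }, the model picks the
hardware timer regardless of registration order; with only the hrtimer
entry present it falls back to that one.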

Perhaps we could replace "now present by default" with "which is
unconditionally registered in case no suitable hardware device is
present"?

Cheers,
Mark.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device
  2014-05-29 11:04   ` Preeti U Murthy
@ 2014-05-29 14:25     ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 16+ messages in thread
From: Lorenzo Pieralisi @ 2014-05-29 14:25 UTC (permalink / raw)
  To: Preeti U Murthy; +Cc: linux-arm-kernel, linux-kernel, Mark Rutland, Will Deacon

Hi Preeti,

On Thu, May 29, 2014 at 12:04:36PM +0100, Preeti U Murthy wrote:
> Hi Lorenzo,
> 
> On 05/29/2014 02:53 PM, Lorenzo Pieralisi wrote:
> > On platforms implementing CPU power management, the CPUidle subsystem
> > can allow CPUs to enter idle states where local timers logic is lost on power
> > down. To keep the software timers functional the kernel relies on an
> > always-on broadcast timer to be present in the platform to relay the
> > interrupt signalling the timer expiries.
> > 
> > For platforms implementing CPU core gating that do not implement an always-on
> > HW timer or implement it in a broken way, this patch adds code to initialize
> > the kernel software broadcast hrtimer upon boot. It relies on a dynamically
> 
> It would be best to use the term "hrtimer based broadcast device"
> throughout the changelog for uniformity and to avoid confusion instead
> of mixing it with "software broadcast".

Agreed.

> > chosen CPU to be always powered-up. This CPU then relays the timer interrupt
> > to CPUs in deep-idle states through its HW local timer device.
> > 
> > On systems with power management capabilities but no functional HW broadcast
> > tick device, the hrtimer based clock event device allows the kernel to
> > enter high-resolution timer mode, which improves system latencies and saves
> > dynamic power.
> 
> Sorry but I do not understand the above paragraph. What do you mean by
> "allows the kernel to enter high resolution timer mode" ? And how does
> it improve system latency? I understand that the hrtimer based
> clockevent device saves dynamic power since it provides a mechanism in
> which cpus can enter deeper idle states.

See Mark's reply, I have nothing to add. I will remove this paragraph anyway.

> > The side effect of having a CPU always-on has implications on power management
> > platform capabilities and makes CPUidle suboptimal, since at least a CPU is
> > kept always in a shallow idle state by the kernel to relay timer interrupts,
> > but at least leaves the kernel with a functional system with some working power
> > management capabilities.
> > 
> > The hrtimer based clock event device has lowest possible rating so that,
> > if a platform contains a functional HW clock event device with broadcast
> > capabilities, that device is always chosen as a tick broadcast device instead
> > of the software based one, now present by default.
> 
> I think this statement "instead of the software based one, now present
> by default" is incorrect. The hrtimer based clock event device will come
> into picture only when the arch calls tick_setup_hrtimer_broadcast()
> explicitly. Otherwise either the arch should register a real clock
> device which does broadcast or should disable deep idle states where the
> local timers stop. So I would suggest skipping the last paragraph as it
> is not conveying anything in specific. The fact that a clock device with
> the highest rating will be chosen is already known and need not be
> mentioned explicitly IMHO.
> 
> > 
> > Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Acked-by: Mark Rutland <mark.rutland@arm.com>
> > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > ---
> >  arch/arm64/kernel/time.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
> > index 29c39d5..3d43900 100644
> > --- a/arch/arm64/kernel/time.c
> > +++ b/arch/arm64/kernel/time.c
> > @@ -18,6 +18,7 @@
> >   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >   */
> >  
> > +#include <linux/clockchips.h>
> >  #include <linux/export.h>
> >  #include <linux/kernel.h>
> >  #include <linux/interrupt.h>
> > @@ -67,6 +68,8 @@ void __init time_init(void)
> >  
> >  	clocksource_of_init();
> >  
> > +	tick_setup_hrtimer_broadcast();
> > +
> >  	arch_timer_rate = arch_timer_get_rate();
> >  	if (!arch_timer_rate)
> >  		panic("Unable to initialise architected timer.\n");
> > 
> 
> You have defined flag "CPUIDLE_FLAG_TIMER_STOP" for your deep idle
> states in which timer stops right?

Yes, I would have noticed otherwise =)

Thanks,
Lorenzo



* Re: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device
  2014-05-29 12:39     ` Mark Rutland
@ 2014-05-29 14:29       ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 16+ messages in thread
From: Lorenzo Pieralisi @ 2014-05-29 14:29 UTC (permalink / raw)
  To: Mark Rutland; +Cc: Preeti U Murthy, linux-arm-kernel, linux-kernel, Will Deacon

On Thu, May 29, 2014 at 01:39:29PM +0100, Mark Rutland wrote:

[...]

> > > The side effect of having a CPU always-on has implications on power management
> > > platform capabilities and makes CPUidle suboptimal, since at least a CPU is
> > > kept always in a shallow idle state by the kernel to relay timer interrupts,
> > > but at least leaves the kernel with a functional system with some working power
> > > management capabilities.
> > > 
> > > The hrtimer based clock event device has lowest possible rating so that,
> > > if a platform contains a functional HW clock event device with broadcast
> > > capabilities, that device is always chosen as a tick broadcast device instead
> > > of the software based one, now present by default.
> > 
> > I think this statement "instead of the software based one, now present
> > by default" is incorrect. The hrtimer based clock event device will come
> > into picture only when the arch calls tick_setup_hrtimer_broadcast()
> > explicitly. Otherwise either the arch should register a real clock
> > device which does broadcast or should disable deep idle states where the
> > local timers stop. So I would suggest skipping the last paragraph as it
> > is not conveying anything in specific. The fact that a clock device with
> > the highest rating will be chosen is already known and need not be
> > mentioned explicitly IMHO.
> 
> I think it is worth keeping the paragraph to allay anyone's fear that
> the hrtimer based broadcast device might be selected in preference to a
> real suitable clock. I would otherwise not be aware that the hrtimer
> based broadcast device had the lowest rating (and would have to go and
> look that up separately).
> 
> As the arch code has delegated timer registration to
> clocksource_of_init, it doesn't know whether any of the real devices
> that may have been registered are suitable as a broadcast source for
> oneshot events. So we can't conditionally register the hrtimer based
> broadcast device.
> 
> Perhaps we could replace "now present by default" with "which is
> unconditionally registered in case no suitable hardware device is
> present"?

How about this:

-- >8 --
Subject: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event
 device

On platforms implementing CPU power management, the CPUidle subsystem
can allow CPUs to enter idle states where local timers logic is lost on power
down. To keep the software timers functional the kernel relies on an
always-on broadcast timer to be present in the platform to relay the
interrupt signalling the timer expiries.

For platforms implementing CPU core gating that do not implement an always-on
HW timer or implement it in a broken way, this patch adds code to initialize
the kernel hrtimer based clock event device upon boot (which can be chosen as
tick broadcast device by the kernel).
It relies on a dynamically chosen CPU to be always powered-up. This CPU then
relays the timer interrupt to CPUs in deep-idle states through its HW local
timer device.

The side effect of having a CPU always-on has implications on power management
platform capabilities and makes CPUidle suboptimal, since at least a CPU is
kept always in a shallow idle state by the kernel to relay timer interrupts,
but at least leaves the kernel with a functional system with some working power
management capabilities.

The hrtimer based clock event device has lowest possible rating so that,
if a platform contains a functional HW clock event device with broadcast
capabilities, that device is always chosen as a tick broadcast device instead
of the hrtimer based one, which is unconditionally registered in case no
suitable hardware clock event device is present.

Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm64/kernel/time.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index 6815987..1a7125c 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -18,6 +18,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/clockchips.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
@@ -69,6 +70,8 @@ void __init time_init(void)
 	of_clk_init(NULL);
 	clocksource_of_init();
 
+	tick_setup_hrtimer_broadcast();
+
 	arch_timer_rate = arch_timer_get_rate();
 	if (!arch_timer_rate)
 		panic("Unable to initialise architected timer.\n");
-- 
1.8.4




* Re: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device
  2014-05-29 14:29       ` Lorenzo Pieralisi
@ 2014-05-29 16:48         ` Mark Rutland
  -1 siblings, 0 replies; 16+ messages in thread
From: Mark Rutland @ 2014-05-29 16:48 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Preeti U Murthy, linux-arm-kernel, linux-kernel, Will Deacon

On Thu, May 29, 2014 at 03:29:12PM +0100, Lorenzo Pieralisi wrote:
> On Thu, May 29, 2014 at 01:39:29PM +0100, Mark Rutland wrote:
> 
> [...]
> 
> > > > The side effect of having a CPU always-on has implications on power management
> > > > platform capabilities and makes CPUidle suboptimal, since at least a CPU is
> > > > kept always in a shallow idle state by the kernel to relay timer interrupts,
> > > > but at least leaves the kernel with a functional system with some working power
> > > > management capabilities.
> > > > 
> > > > The hrtimer based clock event device has lowest possible rating so that,
> > > > if a platform contains a functional HW clock event device with broadcast
> > > > capabilities, that device is always chosen as a tick broadcast device instead
> > > > of the software based one, now present by default.
> > > 
> > > I think this statement "instead of the software based one, now present
> > > by default" is incorrect. The hrtimer based clock event device will come
> > > into picture only when the arch calls tick_setup_hrtimer_broadcast()
> > > explicitly. Otherwise either the arch should register a real clock
> > > device which does broadcast or should disable deep idle states where the
> > > local timers stop. So I would suggest skipping the last paragraph as it
> > > is not conveying anything in specific. The fact that a clock device with
> > > the highest rating will be chosen is already known and need not be
> > > mentioned explicitly IMHO.
> > 
> > I think it is worth keeping the paragraph to allay anyone's fear that
> > the hrtimer based broadcast device might be selected in preference to a
> > real suitable clock. I would otherwise not be aware that the hrtimer
> > based broadcast device had the lowest rating (and would have to go and
> > look that up separately).
> > 
> > As the arch code has delegated timer registration to
> > > clocksource_of_init, it doesn't know whether any of the real devices
> > that may have been registered are suitable as a broadcast source for
> > oneshot events. So we can't conditionally register the hrtimer based
> > broadcast device.
> > 
> > Perhaps we could replace "now present by default" with "which is
> > unconditionally registered in case no suitable hardware device is
> > present"?
> 
> How about this:
> 
> -- >8 --
> Subject: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event
>  device
> 
> On platforms implementing CPU power management, the CPUidle subsystem
> can allow CPUs to enter idle states where local timers logic is lost on power
> down. To keep the software timers functional the kernel relies on an
> always-on broadcast timer to be present in the platform to relay the
> interrupt signalling the timer expiries.
> 
> For platforms implementing CPU core gating that do not implement an always-on
> HW timer or implement it in a broken way, this patch adds code to initialize
> the kernel hrtimer based clock event device upon boot (which can be chosen as
> tick broadcast device by the kernel).
> It relies on a dynamically chosen CPU to be always powered-up. This CPU then
> relays the timer interrupt to CPUs in deep-idle states through its HW local
> timer device.
> 
> The side effect of having a CPU always-on has implications on power management
> platform capabilities and makes CPUidle suboptimal, since at least a CPU is
> kept always in a shallow idle state by the kernel to relay timer interrupts,
> but at least leaves the kernel with a functional system with some working power
> management capabilities.

I think "The side effect of" is redundant, but otherwise this is fine.

> 
> The hrtimer based clock event device has lowest possible rating so that,
> if a platform contains a functional HW clock event device with broadcast
> capabilities, that device is always chosen as a tick broadcast device instead
> of the hrtimer based one, which is unconditionally registered in case no
> suitable hardware clock event device is present.

The last paragraph jumps back and forward a bit. How about:

The hrtimer based clock event device is unconditionally registered, but
has the lowest possible rating such that any broadcast-capable HW clock
event device present will be chosen in preference as the tick broadcast
device.

Cheers,
Mark.


* Re: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device
  2014-05-29 12:39     ` Mark Rutland
@ 2014-05-30  5:54       ` Preeti U Murthy
  -1 siblings, 0 replies; 16+ messages in thread
From: Preeti U Murthy @ 2014-05-30  5:54 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Lorenzo Pieralisi, linux-arm-kernel, linux-kernel, Will Deacon

On 05/29/2014 06:09 PM, Mark Rutland wrote:
> Hi Preeti,
> 
> On Thu, May 29, 2014 at 12:04:36PM +0100, Preeti U Murthy wrote:
>> Hi Lorenzo,
>>
>> On 05/29/2014 02:53 PM, Lorenzo Pieralisi wrote:
>>> On platforms implementing CPU power management, the CPUidle subsystem
>>> can allow CPUs to enter idle states where local timers logic is lost on power
>>> down. To keep the software timers functional the kernel relies on an
>>> always-on broadcast timer to be present in the platform to relay the
>>> interrupt signalling the timer expiries.
>>>
>>> For platforms implementing CPU core gating that do not implement an always-on
>>> HW timer or implement it in a broken way, this patch adds code to initialize
>>> the kernel software broadcast hrtimer upon boot. It relies on a dynamically
>>
>> It would be best to use the term "hrtimer based broadcast device"
>> throughout the changelog for uniformity and to avoid confusion instead
>> of mixing it with "software broadcast".
>>
>>> chosen CPU to be always powered-up. This CPU then relays the timer interrupt
>>> to CPUs in deep-idle states through its HW local timer device.
>>>
>>> On systems with power management capabilities but no functional HW broadcast
>>> tick device, the hrtimer based clock event device allows the kernel to
>>> enter high-resolution timer mode, which improves system latencies and saves
>>> dynamic power.
>>
>> Sorry but I do not understand the above paragraph. What do you mean by
>> "allows the kernel to enter high resolution timer mode" ? And how does
>> it improve system latency? I understand that the hrtimer based
>> clockevent device saves dynamic power since it provides a mechanism in
>> which cpus can enter deeper idle states.
> 
> When there's no oneshot-capable broadcast device and the CPU-local
> clock_event_device has the CLOCK_EVT_FEAT_C3STOP flag,
> tick_is_oneshot_available will return false. Thus
> tick_check_oneshot_change will return false, and hrtimer_switch_to_hres
> will never switch to high resolution mode (and we can also never enter
> NOHZ mode), leaving us stuck in periodic mode.
> 
> In periodic mode ticks occur at fixed intervals, and any timer wakeup
> will occur after the tick following the requested wakeup time, adding
> some amount of latency over what would be possible with high resolution
> mode. Additionally, as we can only wake up at said ticks and not between
> them, it's possible that several timers for intervals shorter than that
> tick interval will fire at once upon a timer tick. Any tasks which
> requested these wakeups will fight for CPU time, and some will see
> additional latency because of this.

Ah ok I see now. Thanks for the explanation :)

> 
>>>
>>> The side effect of having a CPU always-on has implications on power management
>>> platform capabilities and makes CPUidle suboptimal, since at least a CPU is
>>> kept always in a shallow idle state by the kernel to relay timer interrupts,
>>> but at least leaves the kernel with a functional system with some working power
>>> management capabilities.
>>>
>>> The hrtimer based clock event device has lowest possible rating so that,
>>> if a platform contains a functional HW clock event device with broadcast
>>> capabilities, that device is always chosen as a tick broadcast device instead
>>> of the software based one, now present by default.
>>
>> I think this statement "instead of the software based one, now present
>> by default" is incorrect. The hrtimer based clock event device will come
>> into picture only when the arch calls tick_setup_hrtimer_broadcast()
>> explicitly. Otherwise either the arch should register a real clock
>> device which does broadcast or should disable deep idle states where the
>> local timers stop. So I would suggest skipping the last paragraph as it
>> is not conveying anything in specific. The fact that a clock device with
>> the highest rating will be chosen is already known and need not be
>> mentioned explicitly IMHO.
> 
> I think it is worth keeping the paragraph to allay anyone's fear that
> the hrtimer based broadcast device might be selected in preference to a
> real suitable clock. I would otherwise not be aware that the hrtimer
> based broadcast device had the lowest rating (and would have to go and
> look that up separately).
> 
> As the arch code has delegated timer registration to
> clocksource_of_init, it doesn't know whether any of the real devices
> that may have been registered are suitable as a broadcast source for
> oneshot events. So we can't conditionally register the hrtimer based
> broadcast device.
> 
> Perhaps we could replace "now present by default" with "which is
> unconditionally registered in case no suitable hardware device is
> present"?

Yes I would say "which gets registered in case no suitable hardware
device is present." This would make it clearer.

Regards
Preeti U Murthy
> 
> Cheers,
> Mark.
> 



* [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device
@ 2014-05-30  5:54       ` Preeti U Murthy
  0 siblings, 0 replies; 16+ messages in thread
From: Preeti U Murthy @ 2014-05-30  5:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/29/2014 06:09 PM, Mark Rutland wrote:
> Hi Preeti,
> 
> On Thu, May 29, 2014 at 12:04:36PM +0100, Preeti U Murthy wrote:
>> Hi Lorenzo,
>>
>> On 05/29/2014 02:53 PM, Lorenzo Pieralisi wrote:
>>> On platforms implementing CPU power management, the CPUidle subsystem
>>> can allow CPUs to enter idle states where local timers logic is lost on power
>>> down. To keep the software timers functional the kernel relies on an
>>> always-on broadcast timer to be present in the platform to relay the
>>> interrupt signalling the timer expiries.
>>>
>>> For platforms implementing CPU core gating that do not implement an always-on
>>> HW timer or implement it in a broken way, this patch adds code to initialize
>>> the kernel software broadcast hrtimer upon boot. It relies on a dynamically
>>
>> It would be best to use the term "hrtimer based broadcast device"
>> throughout the changelog for uniformity and to avoid confusion instead
>> of mixing it with "software broadcast".
>>
>>> chosen CPU to be always powered-up. This CPU then relays the timer interrupt
>>> to CPUs in deep-idle states through its HW local timer device.
>>>
>>> On systems with power management capabilities but no functional HW broadcast
>>> tick device, the hrtimer based clock event device allows the kernel to
>>> enter high-resolution timer mode, which improves system latencies and saves
>>> dynamic power.
>>
>> Sorry but I do not understand the above paragraph. What do you mean by
>> "allows the kernel to enter high resolution timer mode" ? And how does
>> it improve system latency? I understand that the hrtimer based
>> clockevent device saves dynamic power since it provides a mechanism in
>> which cpus can enter deeper idle states.
> 
> When there's no oneshot-capable broadcast device and the CPU-local
> clock_event_device has the CLOCK_EVT_FEAT_C3STOP flag,
> tick_is_oneshot_available will return false. Thus
> tick_check_oneshot_change will return false, and hrtimer_switch_to_hres
> will never switch to high resolution mode (and we can also never enter
> NOHZ mode), leaving us stuck in periodic mode.
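
The decision described in the paragraph above can be sketched as a small
userspace model. This is only an illustration under stated assumptions, not
the kernel code: the struct, the model_* function names and the numeric
values of the feature bits are all hypothetical; only the ONESHOT/C3STOP
distinction is taken from the explanation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative feature bits: the names echo the kernel flags, the values are assumed. */
#define FEAT_ONESHOT (1u << 0)	/* device can be programmed for single events */
#define FEAT_C3STOP  (1u << 1)	/* device stops in deep idle states */

/* Hypothetical, stripped-down stand-in for struct clock_event_device. */
struct model_clock_event_device {
	unsigned int features;
};

/* True if a broadcast device exists and supports oneshot events. */
static bool model_broadcast_oneshot_available(const struct model_clock_event_device *bc)
{
	return bc != NULL && (bc->features & FEAT_ONESHOT) != 0;
}

/*
 * Model of the check: a local device that stops in deep idle (C3STOP)
 * can only be used in oneshot mode if a oneshot-capable broadcast
 * device is there to back it up.
 */
static bool model_tick_is_oneshot_available(const struct model_clock_event_device *local,
					    const struct model_clock_event_device *bc)
{
	assert(local != NULL);
	if (!(local->features & FEAT_ONESHOT))
		return false;
	if (!(local->features & FEAT_C3STOP))
		return true;
	return model_broadcast_oneshot_available(bc);
}
```

In this model, a C3STOP local timer with no broadcast device yields false,
which corresponds to the "stuck in periodic mode" case; passing in any
oneshot-capable broadcast device (such as the hrtimer based one) flips the
result to true.
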
> 
> In periodic mode ticks occur at fixed intervals, and any timer wakeup
> will occur after the tick following the requested wakeup time, adding
> some amount of latency over what would be possible with high resolution
> mode. Additionally, as we can only wake up at said ticks and not between
> them, it's possible that several timers for intervals shorter than that
> tick interval will fire at once upon a timer tick. Any tasks which
> requested these wakeups will fight for CPU time, and some will see
> additional latency because of this.
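
To put numbers on the periodic-mode latency described above, here is a
minimal arithmetic sketch. It assumes a timer in periodic mode fires at the
first tick boundary at or after its requested expiry; the function names are
hypothetical and the 10 ms tick used in the note below is just an example
value.

```c
#include <assert.h>
#include <stdint.h>

/* First tick boundary at or after requested_ns, with a tick every period_ns. */
static uint64_t model_periodic_wakeup_ns(uint64_t requested_ns, uint64_t period_ns)
{
	assert(period_ns > 0);
	/* Round the requested time up to a whole number of ticks. */
	uint64_t ticks = (requested_ns + period_ns - 1) / period_ns;
	return ticks * period_ns;
}

/* Extra delay seen by the timer compared with an exact (high resolution) expiry. */
static uint64_t model_added_latency_ns(uint64_t requested_ns, uint64_t period_ns)
{
	return model_periodic_wakeup_ns(requested_ns, period_ns) - requested_ns;
}
```

With a 10 ms tick, timers requested for 1 ms, 3 ms and 7 ms all land on the
10 ms tick in this model, so they see 9 ms, 7 ms and 3 ms of added latency
and fire together, which is the contention case mentioned above.
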

Ah ok I see now. Thanks for the explanation :)

> 
>>>
>>> The side effect of having a CPU always-on has implications on power management
>>> platform capabilities and makes CPUidle suboptimal, since at least a CPU is
>>> kept always in a shallow idle state by the kernel to relay timer interrupts,
>>> but at least leaves the kernel with a functional system with some working power
>>> management capabilities.
>>>
>>> The hrtimer based clock event device has lowest possible rating so that,
>>> if a platform contains a functional HW clock event device with broadcast
>>> capabilities, that device is always chosen as a tick broadcast device instead
>>> of the software based one, now present by default.
>>
>> I think this statement "instead of the software based one, now present
>> by default" is incorrect. The hrtimer based clock event device will come
>> into picture only when the arch calls tick_setup_hrtimer_broadcast()
>> explicitly. Otherwise either the arch should register a real clock
>> device which does broadcast or should disable deep idle states where the
>> local timers stop. So I would suggest skipping the last paragraph as it
>> is not conveying anything in specific. The fact that a clock device with
>> the highest rating will be chosen is already known and need not be
>> mentioned explicitly IMHO.
> 
> I think it is worth keeping the paragraph to allay anyone's fear that
> the hrtimer based broadcast device might be selected in preference to a
> real suitable clock. I would otherwise not be aware that the hrtimer
> based broadcast device had the lowest rating (and would have to go and
> look that up separately).
> 
> As the arch code has delegated timer registration to
> clocksource_of_init, it doesn't know whether any of the real devices
> that may have been registered are suitable as a broadcast source for
> oneshot events. So we can't conditionally register the hrtimer based
> broadcast device.
> 
> Perhaps we could replace "now present by default" with "which is
> unconditionally registered in case no suitable hardware device is
> present"?

Yes, I would say "which gets registered in case no suitable hardware
device is present." This would make it clearer.
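
The rating argument discussed above (register the hrtimer based device
unconditionally and let any suitable hardware device win) can be sketched
as follows. This is a simplified model, not the kernel's selection code: the
struct, the function name, the device names and the rating values 0 and 400
used in the note below are all assumptions, and the real check also looks at
feature flags that are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a broadcast-capable clock event device. */
struct model_bc_device {
	const char *name;
	int rating;	/* higher means preferred */
};

/*
 * Keep the current broadcast device unless the candidate has a strictly
 * higher rating. A device with the lowest rating can therefore never
 * displace a hardware device that is already selected.
 */
static const struct model_bc_device *
model_pick_broadcast(const struct model_bc_device *cur,
		     const struct model_bc_device *candidate)
{
	assert(candidate != NULL);
	if (cur == NULL || candidate->rating > cur->rating)
		return candidate;
	return cur;
}
```

With an hrtimer based device rated 0 and a hardware timer rated 400, the
model picks the hardware timer regardless of registration order, and falls
back to the hrtimer based device only when nothing else was registered.
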

Regards
Preeti U Murthy
> 
> Cheers,
> Mark.
> 



Thread overview:
2014-05-29  9:23 [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device Lorenzo Pieralisi
2014-05-29 10:14 ` Will Deacon
2014-05-29 11:04 ` Preeti U Murthy
2014-05-29 12:39   ` Mark Rutland
2014-05-29 14:29     ` Lorenzo Pieralisi
2014-05-29 16:48       ` Mark Rutland
2014-05-30  5:54     ` Preeti U Murthy
2014-05-29 14:25   ` Lorenzo Pieralisi
