linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
@ 2009-04-20 21:16 Jon Hunter
  2009-04-21  6:35 ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Hunter @ 2009-04-20 21:16 UTC (permalink / raw)
  To: linux-kernel

Hello,

 From reviewing the dynamic tick code, I noticed that the "max_delta_ns"
member of the "clock_event_device" structure, represents the upper limit
between timer events in nanoseconds. The variable, "max_delta_ns", is
defined as an unsigned long. An unsigned long is a 32-bit integer for 
32-bit machines and is a 64-bit integer for 64-bit machines (if -m64 
option is used for gcc). Also see [1].

The variable, "max_delta_ns", is configured by calling function 
"clockevent_delta2ns()". The maximum value that "max_delta_ns" can be 
set to by calling clockevent_delta2ns(), is LONG_MAX. For a 32-bit 
machine LONG_MAX is equal to 0x7fffffff and in nanoseconds this equates 
to ~2.15 seconds. Hence, the maximum sleep time for a 32-bit machine is 
~2.15 seconds, where as for a 64-bit machine it will be many years.

Therefore, I wanted to propose changing the type of max_delta_ns to be 
"unsigned long long" instead of "unsigned long". My understanding is 
that a variable of type "unsigned long long" is 64-bits for both 32-bit 
and 64-bit machines and hence this would allow a 32-bit machine to sleep 
for longer than ~2.15 seconds. Please note that I also ended up making 
"min_delta_ns" of type "unsigned long long" too and although this is 
probably very unnecessary, it made the patch simpler. See below.

Anyway, making this change has allowed my 32-bit machine to sleep for 
longer than ~2.15 seconds and thought this could be of interest to 
others. Your comments/feedback would be appreciated.

[1] http://en.wikipedia.org/wiki/64-bit#64-bit_data_models

Cheers
Jon


Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
  include/linux/clockchips.h |    6 +++---
  kernel/hrtimer.c           |    2 +-
  kernel/time/clockevents.c  |   10 +++++-----
  kernel/time/tick-oneshot.c |    2 +-
  kernel/time/timer_list.c   |    4 ++--
  5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 3a1dbba..8154bc6 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -77,8 +77,8 @@ enum clock_event_nofitiers {
  struct clock_event_device {
  	const char		*name;
  	unsigned int		features;
-	unsigned long		max_delta_ns;
-	unsigned long		min_delta_ns;
+	unsigned long long	max_delta_ns;
+	unsigned long long	min_delta_ns;
  	unsigned long		mult;
  	int			shift;
  	int			rating;
@@ -116,7 +116,7 @@ static inline unsigned long div_sc(unsigned long 
ticks, unsigned long nsec,
  }

  /* Clock event layer functions */
-extern unsigned long clockevent_delta2ns(unsigned long latch,
+extern unsigned long long clockevent_delta2ns(unsigned long latch,
  					 struct clock_event_device *evt);
  extern void clockevents_register_device(struct clock_event_device *dev);

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cb8a15c..5b1cdc4 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1199,7 +1199,7 @@ hrtimer_interrupt_hanging(struct 
clock_event_device *dev,
  	force_clock_reprogram = 1;
  	dev->min_delta_ns = (unsigned long)try_time.tv64 * 3;
  	printk(KERN_WARNING "hrtimer: interrupt too slow, "
-		"forcing clock min delta to %lu ns\n", dev->min_delta_ns);
+		"forcing clock min delta to %llu ns\n", dev->min_delta_ns);
  }
  /*
   * High resolution timer interrupt
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index d13be21..3fa07b3 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -36,10 +36,10 @@ static DEFINE_SPINLOCK(clockevents_lock);
   *
   * Math helper, returns latch value converted to nanoseconds (bound 
checked)
   */
-unsigned long clockevent_delta2ns(unsigned long latch,
+unsigned long long clockevent_delta2ns(unsigned long latch,
  				  struct clock_event_device *evt)
  {
-	u64 clc = ((u64) latch << evt->shift);
+	unsigned long long clc = ((unsigned long long) latch << evt->shift);

  	if (unlikely(!evt->mult)) {
  		evt->mult = 1;
@@ -49,10 +49,10 @@ unsigned long clockevent_delta2ns(unsigned long latch,
  	do_div(clc, evt->mult);
  	if (clc < 1000)
  		clc = 1000;
-	if (clc > LONG_MAX)
-		clc = LONG_MAX;
+	if (clc > LLONG_MAX)
+		clc = LLONG_MAX;

-	return (unsigned long) clc;
+	return clc;
  }

  /**
diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
index 2e8de67..857087b 100644
--- a/kernel/time/tick-oneshot.c
+++ b/kernel/time/tick-oneshot.c
@@ -50,7 +50,7 @@ int tick_dev_program_event(struct clock_event_device 
*dev, ktime_t expires,
  				dev->min_delta_ns += dev->min_delta_ns >> 1;

  			printk(KERN_WARNING
-			       "CE: %s increasing min_delta_ns to %lu nsec\n",
+			       "CE: %s increasing min_delta_ns to %llu nsec\n",
  			       dev->name ? dev->name : "?",
  			       dev->min_delta_ns << 1);

diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index a999b92..3bf30b4 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -204,8 +204,8 @@ print_tickdevice(struct seq_file *m, struct 
tick_device *td, int cpu)
  		return;
  	}
  	SEQ_printf(m, "%s\n", dev->name);
-	SEQ_printf(m, " max_delta_ns:   %lu\n", dev->max_delta_ns);
-	SEQ_printf(m, " min_delta_ns:   %lu\n", dev->min_delta_ns);
+	SEQ_printf(m, " max_delta_ns:   %llu\n", dev->max_delta_ns);
+	SEQ_printf(m, " min_delta_ns:   %llu\n", dev->min_delta_ns);
  	SEQ_printf(m, " mult:           %lu\n", dev->mult);
  	SEQ_printf(m, " shift:          %d\n", dev->shift);
  	SEQ_printf(m, " mode:           %d\n", dev->mode);
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
  2009-04-20 21:16 [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds Jon Hunter
@ 2009-04-21  6:35 ` Ingo Molnar
  2009-04-21 20:32   ` john stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2009-04-21  6:35 UTC (permalink / raw)
  To: Jon Hunter, Thomas Gleixner, John Stultz; +Cc: linux-kernel


* Jon Hunter <jon-hunter@ti.com> wrote:

> Hello,
>
> From reviewing the dynamic tick code, I noticed that the 
> "max_delta_ns" member of the "clock_event_device" structure, 
> represents the upper limit between timer events in nanoseconds. 
> The variable, "max_delta_ns", is defined as an unsigned long. An 
> unsigned long is a 32-bit integer for 32-bit machines and is a 
> 64-bit integer for 64-bit machines (if -m64 option is used for 
> gcc). Also see [1].
>
> The variable, "max_delta_ns", is configured by calling function 
> "clockevent_delta2ns()". The maximum value that "max_delta_ns" can 
> be set to by calling clockevent_delta2ns(), is LONG_MAX. For a 
> 32-bit machine LONG_MAX is equal to 0x7fffffff and in nanoseconds 
> this equates to ~2.15 seconds. Hence, the maximum sleep time for a 
> 32-bit machine is ~2.15 seconds, where as for a 64-bit machine it 
> will be many years.
>
> Therefore, I wanted to propose changing the type of max_delta_ns 
> to be "unsigned long long" instead of "unsigned long". My 
> understanding is that a variable of type "unsigned long long" is 
> 64-bits for both 32-bit and 64-bit machines and hence this would 
> allow a 32-bit machine to sleep for longer than ~2.15 seconds. 
> Please note that I also ended up making "min_delta_ns" of type 
> "unsigned long long" too and although this is probably very 
> unnecessary, it made the patch simpler. See below.
>
> Anyway, making this change has allowed my 32-bit machine to sleep 
> for longer than ~2.15 seconds and thought this could be of 
> interest to others. Your comments/feedback would be appreciated.
>
> [1] http://en.wikipedia.org/wiki/64-bit#64-bit_data_models
>
> Cheers
> Jon
>
>
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>
> ---
>  include/linux/clockchips.h |    6 +++---
>  kernel/hrtimer.c           |    2 +-
>  kernel/time/clockevents.c  |   10 +++++-----
>  kernel/time/tick-oneshot.c |    2 +-
>  kernel/time/timer_list.c   |    4 ++--
>  5 files changed, 12 insertions(+), 12 deletions(-)

Looks like a good thing at first sight. Thomas, John, what do you 
think?

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
  2009-04-21  6:35 ` Ingo Molnar
@ 2009-04-21 20:32   ` john stultz
  2009-04-21 23:20     ` Jon Hunter
  0 siblings, 1 reply; 30+ messages in thread
From: john stultz @ 2009-04-21 20:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Jon Hunter, Thomas Gleixner, linux-kernel

On Tue, 2009-04-21 at 08:35 +0200, Ingo Molnar wrote:
> * Jon Hunter <jon-hunter@ti.com> wrote:
> > Anyway, making this change has allowed my 32-bit machine to sleep 
> > for longer than ~2.15 seconds and thought this could be of 
> > interest to others. Your comments/feedback would be appreciated.
> >
> Looks like a good thing at first sight. Thomas, John, what do you 
> think?

The concern is many clocksources wrap after a handful of seconds. The
acpi_pm is the best example (its only 24 bits wide). 

I brought this issue up earlier, and provided some example code that
could be used to limit the idle time appropriately for the current
clocksource here:

http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02693.html

Jon: Did you see that mail, or is there a reason you didn't adapt this
code into your patch? 


thanks
-john



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
  2009-04-21 20:32   ` john stultz
@ 2009-04-21 23:20     ` Jon Hunter
  2009-04-22  0:02       ` john stultz
  2009-04-22  0:05       ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds john stultz
  0 siblings, 2 replies; 30+ messages in thread
From: Jon Hunter @ 2009-04-21 23:20 UTC (permalink / raw)
  To: john stultz; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel


john stultz wrote:
> The concern is many clocksources wrap after a handful of seconds. The
> acpi_pm is the best example (its only 24 bits wide). 
> 
> I brought this issue up earlier, and provided some example code that
> could be used to limit the idle time appropriately for the current
> clocksource here:
> 
> http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02693.html
> 
> Jon: Did you see that mail, or is there a reason you didn't adapt this
> code into your patch? 

Hi John, yes I did read this email and thanks for bringing this up.

As I looked at this more I noticed that for 64-bit machines that the 
max_delta_ns would be a 64-bit integer already and so this change would 
only have an impact for 32-bit machines. I understand that there are 
more 32-bit machines that 64-bit. However, I was trying to understand 
how the wrapping of clocksources, such as the one you mention above, is 
handled today for 64-bit machines that could theoretically sleep for 
longer periods.

In addition to this as I was reviewing the "tick_nohz_stop_sched_tick()" 
function that is configuring the dynamic tick and I noticed that this 
code would actually stop the timer altogether if the time for the next 
timer event is greater than NEXT_TIMER_MAX_DELTA jiffies. See code 
snippet below. This is very unlikely, however, if this scenario was to 
occur what would be the impact on the clocksource?

           /*
           * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
           * there is no timer pending or at least extremly far 

           * into the future (12 days for HZ=1000). In this case
           * we simply stop the tick timer:
           */
           if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
                   ts->idle_expires.tv64 = KTIME_MAX;
                   if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
                           hrtimer_cancel(&ts->sched_timer);
                   goto out;
           }

I understand that clocksources need to be handled correctly, but as I 
looked into this more I wanted to clarify how this is handled today for 
64-bit machines. I appreciate your comments and feedback.

Cheers
Jon

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
  2009-04-21 23:20     ` Jon Hunter
@ 2009-04-22  0:02       ` john stultz
  2009-05-07 14:52         ` Jon Hunter
  2009-04-22  0:05       ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds john stultz
  1 sibling, 1 reply; 30+ messages in thread
From: john stultz @ 2009-04-22  0:02 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Tue, 2009-04-21 at 18:20 -0500, Jon Hunter wrote:
> john stultz wrote:
> > The concern is many clocksources wrap after a handful of seconds. The
> > acpi_pm is the best example (its only 24 bits wide). 
> > 
> > I brought this issue up earlier, and provided some example code that
> > could be used to limit the idle time appropriately for the current
> > clocksource here:
> > 
> > http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02693.html
> > 
> > Jon: Did you see that mail, or is there a reason you didn't adapt this
> > code into your patch? 
> 
> Hi John, yes I did read this email and thanks for bringing this up.
> 
> As I looked at this more I noticed that for 64-bit machines that the 
> max_delta_ns would be a 64-bit integer already and so this change would 
> only have an impact for 32-bit machines. I understand that there are 
> more 32-bit machines that 64-bit. However, I was trying to understand 
> how the wrapping of clocksources, such as the one you mention above, is 
> handled today for 64-bit machines that could theoretically sleep for 
> longer periods.

As the actual max_delta_ns currently comes from the clockevent driver,
that is our current limitation. For instance, on a box I'm using, the
lapic's max_delta_ns is a little more then 600ms. The HPET's does allow
for ~149sec timers, which would break with acpi_pm, but I suspect boxes
using the HPET for tick interrupts are currently using HPET for the
clocksource as well, which keeps it safe.

So I suspect its mostly luck that system configs don't hit the issue
that's saved us so far on 64bit boxes.

So yes, while your patch in of itself doesn't create the issue, it does
open the window a bit more. I'm just saying we need to add the
clocksource limiting factor in before more odd configs trip over it. :)


> In addition to this as I was reviewing the "tick_nohz_stop_sched_tick()" 
> function that is configuring the dynamic tick and I noticed that this 
> code would actually stop the timer altogether if the time for the next 
> timer event is greater than NEXT_TIMER_MAX_DELTA jiffies. See code 
> snippet below. This is very unlikely, however, if this scenario was to 
> occur what would be the impact on the clocksource?
> 
>            /*
>            * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
>            * there is no timer pending or at least extremly far 
> 
>            * into the future (12 days for HZ=1000). In this case
>            * we simply stop the tick timer:
>            */
>            if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
>                    ts->idle_expires.tv64 = KTIME_MAX;
>                    if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
>                            hrtimer_cancel(&ts->sched_timer);
>                    goto out;
>            }

Fair point. Thomas? Why do we kill the timer in the above case? Can that
catch us on an UP box? If so what would ever wake us up if there were no
other system interrupts?

thanks
-john





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
  2009-04-21 23:20     ` Jon Hunter
  2009-04-22  0:02       ` john stultz
@ 2009-04-22  0:05       ` john stultz
  2009-04-22  3:07         ` Jon Hunter
  1 sibling, 1 reply; 30+ messages in thread
From: john stultz @ 2009-04-22  0:05 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Tue, 2009-04-21 at 18:20 -0500, Jon Hunter wrote:
> john stultz wrote:
> > The concern is many clocksources wrap after a handful of seconds. The
> > acpi_pm is the best example (its only 24 bits wide). 
> > 
> > I brought this issue up earlier, and provided some example code that
> > could be used to limit the idle time appropriately for the current
> > clocksource here:
> > 
> > http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02693.html
> > 
> > Jon: Did you see that mail, or is there a reason you didn't adapt this
> > code into your patch? 
> 
> Hi John, yes I did read this email and thanks for bringing this up.
> 
> As I looked at this more I noticed that for 64-bit machines that the 
> max_delta_ns would be a 64-bit integer already and so this change would 
> only have an impact for 32-bit machines. I understand that there are 
> more 32-bit machines that 64-bit. However, I was trying to understand 
> how the wrapping of clocksources, such as the one you mention above, is 
> handled today for 64-bit machines that could theoretically sleep for 
> longer periods.

One other minor comment nit, if we're really meaning that max_delta_ns
is a 64bit value, should we not just be using s64 and be explicit
instead of converting longs to long longs?

thanks
-john



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
  2009-04-22  0:05       ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds john stultz
@ 2009-04-22  3:07         ` Jon Hunter
  2009-04-22 15:30           ` Chris Friesen
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Hunter @ 2009-04-22  3:07 UTC (permalink / raw)
  To: john stultz; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

john stultz wrote:
> One other minor comment nit, if we're really meaning that max_delta_ns
> is a 64bit value, should we not just be using s64 and be explicit
> instead of converting longs to long longs?

Thanks for the feedback. I am in two minds about this. I agree and I 
would prefer to use s64/u64 as this is explicit with regard to the size 
of the data type. However, for right or wrong I ended up with long long 
because...

a). The function clockevent_delta2ns() uses LONG_MAX (or in my 
suggestion LLONG_MAX) as the upper limit. LONG_MAX and LLONG_MAX are 
defined as a long and long long respectively.

#define LONG_MAX        ((long)(~0UL>>1))
#define LLONG_MAX       ((long long)(~0ULL>>1))

b). There are a couple prints in the kernel for display max_delta_ns and 
min_delta_ns. The prints use %lu and %llu to indicate long and long long 
types respectively.

So to avoid using casts or possibly a type mismatch for some 
architecture that I may have overlooked I kept it as long long. So this 
assumes that long long will be a 64-bit type which I don't like. 
However, this would not cause any compilation issues even if long long 
turned out to be 32-bits or 128-bits (if this is even possible). We 
could use u64 and cast where necessary to be safe/correct if this is 
preferred.

Cheers
Jon

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
  2009-04-22  3:07         ` Jon Hunter
@ 2009-04-22 15:30           ` Chris Friesen
  2009-04-22 17:04             ` Jon Hunter
  0 siblings, 1 reply; 30+ messages in thread
From: Chris Friesen @ 2009-04-22 15:30 UTC (permalink / raw)
  To: Jon Hunter; +Cc: john stultz, Ingo Molnar, Thomas Gleixner, linux-kernel

Jon Hunter wrote:

> So this 
> assumes that long long will be a 64-bit type which I don't like. 
> However, this would not cause any compilation issues even if long long 
> turned out to be 32-bits or 128-bits (if this is even possible).

Isn't "long long" guaranteed to be 64-bit on all linux systems?

Unless the width is critical, I'd prefer to stay away from u64 until it 
gets unified between architectures.  I recently ran into a problem 
printk-ing a "u64" value because it was a different type on ppc64 than 
x86-64.


Chris

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
  2009-04-22 15:30           ` Chris Friesen
@ 2009-04-22 17:04             ` Jon Hunter
  2009-04-22 18:53               ` Geert Uytterhoeven
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Hunter @ 2009-04-22 17:04 UTC (permalink / raw)
  To: Chris Friesen; +Cc: john stultz, Ingo Molnar, Thomas Gleixner, linux-kernel


Chris Friesen wrote:
> Isn't "long long" guaranteed to be 64-bit on all linux systems?

If long long is guaranteed to be 64-bits this is the way to go. Looks 
like there was some previous discussion on making u64 always a long 
long, but I am not sure that this happened [1]. So may be this does 
confirm this?

> Unless the width is critical, I'd prefer to stay away from u64 until it 
> gets unified between architectures.  I recently ran into a problem 
> printk-ing a "u64" value because it was a different type on ppc64 than 
> x86-64.

It is not critical but maybe more ideal, as it would be nice to be 
explicit that this variable is intended to be 64bits. In fact the issue 
you saw with the printk is one of the reasons that I previously 
mentioned of why I had opted to stay with long long. I also found that 
this issue was discussed in the thread I mentioned above [1]. Seems like 
a common problem.

The alternative is to use u64 and make sure that all printks cast the 
variable to long long where necessary. However, this is not clean and 
you do run the risk of a new print being added that does not take this 
into account and breaks the code for some architectures. So I wished to 
avoid this.

For this specific case using long long should be fine. Even if there is 
a case where long long is not 64bits, then this would not break 
functionality, simply increase of decrease the dynamic range of 
max_delta_ns and min_delta_ns.

Cheers
Jon

[1] http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/2805.html



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for  more than 2.15 seconds
  2009-04-22 17:04             ` Jon Hunter
@ 2009-04-22 18:53               ` Geert Uytterhoeven
  0 siblings, 0 replies; 30+ messages in thread
From: Geert Uytterhoeven @ 2009-04-22 18:53 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Chris Friesen, john stultz, Ingo Molnar, Thomas Gleixner, linux-kernel

On Wed, Apr 22, 2009 at 19:04, Jon Hunter <jon-hunter@ti.com> wrote:
> Chris Friesen wrote:
>>
>> Isn't "long long" guaranteed to be 64-bit on all linux systems?
>
> If long long is guaranteed to be 64-bits this is the way to go. Looks like
> there was some previous discussion on making u64 always a long long, but I
> am not sure that this happened [1]. So may be this does confirm this?
>
>> Unless the width is critical, I'd prefer to stay away from u64 until it
>> gets unified between architectures.  I recently ran into a problem
>> printk-ing a "u64" value because it was a different type on ppc64 than
>> x86-64.
>
> It is not critical but maybe more ideal, as it would be nice to be explicit
> that this variable is intended to be 64bits. In fact the issue you saw with
> the printk is one of the reasons that I previously mentioned of why I had
> opted to stay with long long. I also found that this issue was discussed in
> the thread I mentioned above [1]. Seems like a common problem.
>
> The alternative is to use u64 and make sure that all printks cast the
> variable to long long where necessary. However, this is not clean and you do
> run the risk of a new print being added that does not take this into account
> and breaks the code for some architectures. So I wished to avoid this.

That's why recently, u64 became `unsigned long long' on ppc64. So please stay
away from the casts.

-- 
Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds
  2009-04-22  0:02       ` john stultz
@ 2009-05-07 14:52         ` Jon Hunter
  2009-05-08  0:54           ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formore " john stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Hunter @ 2009-05-07 14:52 UTC (permalink / raw)
  To: john stultz; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel


john stultz wrote:
> So yes, while your patch in of itself doesn't create the issue, it does
> open the window a bit more. I'm just saying we need to add the
> clocksource limiting factor in before more odd configs trip over it. :)

Thanks, this helps. I have been looking into this some more and it seems 
to me that the appropriate place in the tick_nohz_stop_sched_tick() 
function to check the clocksource upper limit would be in the following 
code segment:

	/* Read jiffies and the time when jiffies were updated last */
          do {
                  seq = read_seqbegin(&xtime_lock);
                  last_update = last_jiffies_update;
                  last_jiffies = jiffies;
          } while (read_seqretry(&xtime_lock, seq));

          /* Get the next timer wheel timer */
          next_jiffies = get_next_timer_interrupt(last_jiffies);
          delta_jiffies = next_jiffies - last_jiffies;


Here is my thinking...

1). The above do-while is holding the xtime_lock for updating a couple 
variables. Given that we would need to hold this lock when querying the 
max time that the clocksource can be deferred it would seem to make 
sense to do it in this same loop to avoid requesting the same lock twice 
in the same function.

2). The function "get_next_timer_interrupt()" returns the time of when 
the next timer event is due to expire. If we know the maximum number of 
jiffies that can elapse before the clocksource wraps, then we can simply 
compare delta_jiffies with the maximum jiffies for the clocksource and 
adjust delta_jiffies if it is greater than the maximum jiffies.

3). The maximum jiffies that can elapse before a clocksource wraps 
should be a constant value which can be derived from the clocksource 
mask value. Therefore, I was thinking that it would be more 
efficient/optimal to calculate the maximum jiffies that can elapse 
before the clocksource wraps when the clocksource is registered and 
store it in the main clocksource structure.

So taking the above into account, I was thinking of something along the 
lines of the following. Let me know if you have any thoughts/comments.

Cheers
Jon


Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
  include/linux/clockchips.h  |    6 +++---
  include/linux/clocksource.h |   16 ++++++++++++++++
  include/linux/time.h        |    1 +
  kernel/hrtimer.c            |    2 +-
  kernel/time/clockevents.c   |   10 +++++-----
  kernel/time/clocksource.c   |    6 ++++++
  kernel/time/tick-oneshot.c  |    2 +-
  kernel/time/tick-sched.c    |   17 ++++++++++++++++-
  kernel/time/timekeeping.c   |   10 ++++++++++
  kernel/time/timer_list.c    |    4 ++--
  10 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 3a1dbba..8154bc6 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -77,8 +77,8 @@ enum clock_event_nofitiers {
  struct clock_event_device {
  	const char		*name;
  	unsigned int		features;
-	unsigned long		max_delta_ns;
-	unsigned long		min_delta_ns;
+	unsigned long long	max_delta_ns;
+	unsigned long long	min_delta_ns;
  	unsigned long		mult;
  	int			shift;
  	int			rating;
@@ -116,7 +116,7 @@ static inline unsigned long div_sc(unsigned long 
ticks, unsigned long nsec,
  }

  /* Clock event layer functions */
-extern unsigned long clockevent_delta2ns(unsigned long latch,
+extern unsigned long long clockevent_delta2ns(unsigned long latch,
  					 struct clock_event_device *evt);
  extern void clockevents_register_device(struct clock_event_device *dev);

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 5a40d14..2e45f54 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -151,6 +151,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc,
   * @mult:		cycle to nanosecond multiplier (adjusted by NTP)
   * @mult_orig:		cycle to nanosecond multiplier (unadjusted by NTP)
   * @shift:		cycle to nanosecond divisor (power of two)
+ * @max_jiffies:	max jiffies that can elapse before the clocksource wraps
   * @flags:		flags describing special propertiesDynamic Tick: Allow 
32-bit machines to sleep for more than 2.15 seconds
   * @vread:		vsyscall based read
   * @resume:		resume function for the clocksource, if necessary
@@ -171,6 +172,7 @@ struct clocksource {
  	u32 mult;
  	u32 mult_orig;
  	u32 shift;
+	unsigned long max_jiffies;
  	unsigned long flags;
  	cycle_t (*vread)(void);
  	void (*resume)(void);
@@ -322,6 +324,20 @@ static inline s64 cyc2ns(struct clocksource *cs, 
cycle_t cycles)
  }

  /**
+ * cyc2jiffies - converts clocksource cycles to jiffies
+ * @cs:         Pointer to clocksource
+ * @cycles:     Cycles
+ *
+ */
+static inline unsigned long cyc2jiffies(struct clocksource *cs, cycle_t 
cycles)
+{
+	struct timespec ts;
+
+	ts = ns_to_timespec(cyc2ns(cs, cycles));
+	return timespec_to_jiffies(&ts);
+}
+
+/**
   * clocksource_calculate_interval - Calculates a clocksource interval 
struct
   *
   * @c:		Pointer to clocksource.
diff --git a/include/linux/time.h b/include/linux/time.h
index 242f624..51f80aa 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);

  extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
  extern int timekeeping_valid_for_hres(void);
+extern unsigned long timekeeping_max_jiffies(void);
  extern void update_wall_time(void);
  extern void update_xtime_cache(u64 nsec);

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cb8a15c..5b1cdc4 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1199,7 +1199,7 @@ hrtimer_interrupt_hanging(struct 
clock_event_device *dev,
  	force_clock_reprogram = 1;
  	dev->min_delta_ns = (unsigned long)try_time.tv64 * 3;
  	printk(KERN_WARNING "hrtimer: interrupt too slow, "
-		"forcing clock min delta to %lu ns\n", dev->min_delta_ns);
+		"forcing clock min delta to %llu ns\n", dev->min_delta_ns);
  }
  /*
   * High resolution timer interrupt
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index d13be21..3fa07b3 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -36,10 +36,10 @@ static DEFINE_SPINLOCK(clockevents_lock);
   *
   * Math helper, returns latch value converted to nanoseconds (bound 
checked)
   */
-unsigned long clockevent_delta2ns(unsigned long latch,
+unsigned long long clockevent_delta2ns(unsigned long latch,
  				  struct clock_event_device *evt)
  {
-	u64 clc = ((u64) latch << evt->shift);
+	unsigned long long clc = ((unsigned long long) latch << evt->shift);

  	if (unlikely(!evt->mult)) {
  		evt->mult = 1;
@@ -49,10 +49,10 @@ unsigned long clockevent_delta2ns(unsigned long latch,
  	do_div(clc, evt->mult);
  	if (clc < 1000)
  		clc = 1000;
-	if (clc > LONG_MAX)
-		clc = LONG_MAX;
+	if (clc > LLONG_MAX)
+		clc = LLONG_MAX;

-	return (unsigned long) clc;
+	return clc;
  }

  /**
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index ecfd7b5..c6cbf47 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -405,6 +405,12 @@ int clocksource_register(struct clocksource *c)
  	/* save mult_orig on registration */
  	c->mult_orig = c->mult;

+	/*
+	 * calculate max jiffies than can occur
+	 * before clocksource wraps
+	 */
+	c->max_jiffies = cyc2jiffies(c, c->mask);
+
  	spin_lock_irqsave(&clocksource_lock, flags);
  	ret = clocksource_enqueue(c);
  	if (!ret)
diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
index 2e8de67..857087b 100644
--- a/kernel/time/tick-oneshot.c
+++ b/kernel/time/tick-oneshot.c
@@ -50,7 +50,7 @@ int tick_dev_program_event(struct clock_event_device 
*dev, ktime_t expires,
  				dev->min_delta_ns += dev->min_delta_ns >> 1;

  			printk(KERN_WARNING
-			       "CE: %s increasing min_delta_ns to %lu nsec\n",
+			       "CE: %s increasing min_delta_ns to %llu nsec\n",
  			       dev->name ? dev->name : "?",
  			       dev->min_delta_ns << 1);

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d3f1ef4..983eb5f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -212,7 +212,8 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
   */
  void tick_nohz_stop_sched_tick(int inidle)
  {
-	unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
+	unsigned long seq, flags;
+	unsigned long last_jiffies, next_jiffies, delta_jiffies, max_jiffies;
  	struct tick_sched *ts;
  	ktime_t last_update, expires, now;
  	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
@@ -264,12 +265,26 @@ void tick_nohz_stop_sched_tick(int inidle)
  		seq = read_seqbegin(&xtime_lock);
  		last_update = last_jiffies_update;
  		last_jiffies = jiffies;
+		max_jiffies = timekeeping_max_jiffies();
  	} while (read_seqretry(&xtime_lock, seq));

  	/* Get the next timer wheel timer */
  	next_jiffies = get_next_timer_interrupt(last_jiffies);
  	delta_jiffies = next_jiffies - last_jiffies;

+	/*
+	 * Make sure that the delta jiffies is not greater than
+	 * the max jiffies for the current clocksource.
+	 */
+	if (delta_jiffies >= max_jiffies) {
+
+		/*
+		 * Set delta jiffies to the maximum number of jiffies
+		 * less 1 to ensure that the clocksource will not wrap.
+		 */
+		delta_jiffies = max_jiffies - 1;
+	}
+
  	if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu))
  		delta_jiffies = 1;
  	/*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 687dff4..cd35bfc 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -271,6 +271,16 @@ int timekeeping_valid_for_hres(void)
  }

  /**
+ * timekeeping_max_jiffies - returns max jiffies the current 
clocksource can count
+ *
+ * IMPORTANT: Must be called with xtime_lock held!
+ */
+unsigned long timekeeping_max_jiffies(void)
+{
+	return clock->max_jiffies;
+}
+
+/**
   * read_persistent_clock -  Return time in seconds from the persistent 
clock.
   *
   * Weak dummy function for arches that do not yet support it.
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index a999b92..3bf30b4 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -204,8 +204,8 @@ print_tickdevice(struct seq_file *m, struct 
tick_device *td, int cpu)
  		return;
  	}
  	SEQ_printf(m, "%s\n", dev->name);
-	SEQ_printf(m, " max_delta_ns:   %lu\n", dev->max_delta_ns);
-	SEQ_printf(m, " min_delta_ns:   %lu\n", dev->min_delta_ns);
+	SEQ_printf(m, " max_delta_ns:   %llu\n", dev->max_delta_ns);
+	SEQ_printf(m, " min_delta_ns:   %llu\n", dev->min_delta_ns);
  	SEQ_printf(m, " mult:           %lu\n", dev->mult);
  	SEQ_printf(m, " shift:          %d\n", dev->shift);
  	SEQ_printf(m, " mode:           %d\n", dev->mode);
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep  formore than 2.15 seconds
  2009-05-07 14:52         ` Jon Hunter
@ 2009-05-08  0:54           ` john stultz
  2009-05-08 16:05             ` Jon Hunter
  0 siblings, 1 reply; 30+ messages in thread
From: john stultz @ 2009-05-08  0:54 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Thu, 2009-05-07 at 09:52 -0500, Jon Hunter wrote:
> john stultz wrote:
> > So yes, while your patch in of itself doesn't create the issue, it does
> > open the window a bit more. I'm just saying we need to add the
> > clocksource limiting factor in before more odd configs trip over it. :)
> 
> Thanks, this helps. I have been looking into this some more and it seems 
> to me that the appropriate place in the tick_nohz_stop_sched_tick() 
> function to check the clocksource upper limit would be in the following 
> code segment:
> 
> 	/* Read jiffies and the time when jiffies were updated last */
>           do {
>                   seq = read_seqbegin(&xtime_lock);
>                   last_update = last_jiffies_update;
>                   last_jiffies = jiffies;
>           } while (read_seqretry(&xtime_lock, seq));
> 
>           /* Get the next timer wheel timer */
>           next_jiffies = get_next_timer_interrupt(last_jiffies);
>           delta_jiffies = next_jiffies - last_jiffies;
> 
> 
> Here is my thinking...
> 
> 1). The above do-while is holding the xtime_lock for updating a couple 
> variables. Given that we would need to hold this lock when querying the 
> max time that the clocksource can be deferred it would seem to make 
> sense to do it in this same loop to avoid requesting the same lock twice 
> in the same function.

Yep. It would be easy to pull:
	max_time_delta = timekeeping_max_deferment()
when you read jiffies.


> 2). The function "get_next_timer_interrupt()" returns the time of when 
> the next timer event is due to expire. If we know the maximum number of 
> jiffies that can elapse before the clocksource wraps, then we can simply 
> compare delta_jiffies with the maximum jiffies for the clocksource and 
> adjust delta_jiffies if it is greater than the maximum jiffies.

Urr. Lets move away from jiffies. Jiffies bad. Human time good.

Its easy to get the max value in ns right now, last_update is already a
ktime_t. 

I think checking if expires (little bit lower in the same function) is
larger then (last_update + max_time_delta) would be much much simpler.


> 3). The maximum jiffies that can elapse before a clocksource wraps 
> should be a constant value which can be derived from the clocksource 
> mask value. Therefore, I was thinking that it would be more 
> efficient/optimal to calculate the maximum jiffies that can elapse 
> before the clocksource wraps when the clocksource is registered and 
> store it in the main clocksource structure.

Yeeks. No, lets not do this. Cluttering up the clocksource with jiffies
values is totally unnecessary.


> So taking the above into account, I was thinking of something along the 
> lines of the following. Let me know if you have any thoughts/comments.
> 
> Cheers
> Jon
> 
> 
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>


So while I really don't like your patch, I think you have the right
idea. Just keep things in nanoseconds, rather then converting them to
jiffies first. It will be much simpler patch and won't affect as much
code.

Look at the simple accessor patch I sent earlier:
http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02693.html

You could require the xtime_lock be held while calling to doge grabbing
it twice and plug it right in.

thanks
-john


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep  formore than 2.15 seconds
  2009-05-08  0:54           ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formore " john stultz
@ 2009-05-08 16:05             ` Jon Hunter
  2009-05-09  0:51               ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan " john stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Hunter @ 2009-05-08 16:05 UTC (permalink / raw)
  To: john stultz; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel


john stultz wrote:
> Yep. It would be easy to pull:
> 	max_time_delta = timekeeping_max_deferment()
> when you read jiffies.

Ok, will do.

> Urr. Lets move away from jiffies. Jiffies bad. Human time good.
> 
> Its easy to get the max value in ns right now, last_update is already a
> ktime_t. 
> 
> I think checking if expires (little bit lower in the same function) is
> larger then (last_update + max_time_delta) would be much much simpler.

No problem. I will do this too. I had a bit of a tough time figuring out 
what was best here.

> Yeeks. No, lets not do this. Cluttering up the clocksource with jiffies
> values is totally unnecessary.

Sorry about that. I have a tendency to try to reduce run-time 
computation if I can do it once at the beginning. However, I also 
dislike clutter so we can keep it the way you recommend.

> So while I really don't like your patch, I think you have the right
> idea. Just keep things in nanoseconds, rather then converting them to
> jiffies first. It will be much simpler patch and won't affect as much
> code.
> 
> Look at the simple accessor patch I sent earlier:
> http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02693.html
> 
> You could require the xtime_lock be held while calling to doge grabbing
> it twice and plug it right in.

Done. See below. Please note that the below patch is simply for handling 
the wrapping of clocksources and does not include the original patch to 
convert max_delta_ns to long long (to keep things simple for now).

I still have a couple concerns:

1). The use of delta_jiffies

The below patch does not update delta_jiffies. In the current code 
delta_jiffies in a couple places after "expires" is calculated. So if we 
adjust expires to account for clocksource wrap, we should avoid using 
delta_jiffies later on. The first place delta_jiffies is used after 
expires is calculated is here:

                  if (delta_jiffies > 1)
                          cpumask_set_cpu(cpu, nohz_cpu_mask);

The second place is here:

                  /*
                   * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
                   * there is no timer pending or at least extremly far
                   * into the future (12 days for HZ=1000). In this case
                   * we simply stop the tick timer:
                   */
                  if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
                          ts->idle_expires.tv64 = KTIME_MAX;
                          if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
                                  hrtimer_cancel(&ts->sched_timer);
                          goto out;

Currently I have modified these compares so that the comparisons are 
done in nanoseconds and not jiffies to be safe. Let me know your thoughts.


2). Clocksource max deferment

In your original patch you suggested that we should reduce the max time 
returned by the function timekeeping_max_deferment by some amount. Would 
it make sense to reduce this by a jiffie? In the current dynamic tick 
code we will only defer the tick if the next event is greater than or 
equal to 1 jiffie.

Cheers
Jon



Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
  include/linux/time.h      |    1 +
  kernel/time/tick-sched.c  |   36 +++++++++++++++++++++++++-----------
  kernel/time/timekeeping.c |   14 ++++++++++++++
  3 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index 242f624..090be07 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);

  extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
  extern int timekeeping_valid_for_hres(void);
+extern s64 timekeeping_max_deferment(void);
  extern void update_wall_time(void);
  extern void update_xtime_cache(u64 nsec);

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d3f1ef4..5f9ba13 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  	ktime_t last_update, expires, now;
  	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
  	int cpu;
+	s64 time_delta, max_time_delta;

  	local_irq_save(flags);

@@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  		seq = read_seqbegin(&xtime_lock);
  		last_update = last_jiffies_update;
  		last_jiffies = jiffies;
+		max_time_delta = timekeeping_max_deferment();
  	} while (read_seqretry(&xtime_lock, seq));

  	/* Get the next timer wheel timer */
@@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
  	if ((long)delta_jiffies >= 1) {

  		/*
-		* calculate the expiry time for the next timer wheel
-		* timer
-		*/
-		expires = ktime_add_ns(last_update, tick_period.tv64 *
-				   delta_jiffies);
+		 * Calculate the time delta for the next timer event.
+		 * If the time delta exceeds the maximum time delta
+		 * permitted by the current clocksource then adjust
+		 * the time delta accordingly to ensure the
+		 * clocksource does not wrap.
+		 * /
+		time_delta = tick_period.tv64 * delta_jiffies;
+
+		if (time_delta > max_time_delta)
+			time_delta = max_time_delta;
+
+		/*
+		 * calculate the expiry time for the next timer wheel
+		 * timer
+		 */
+		expires = ktime_add_ns(last_update, time_delta);

  		/*
  		 * If this cpu is the one which updates jiffies, then
@@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  		if (cpu == tick_do_timer_cpu)
  			tick_do_timer_cpu = TICK_DO_TIMER_NONE;

-		if (delta_jiffies > 1)
+		if (time_delta > tick_period.tv64)
  			cpumask_set_cpu(cpu, nohz_cpu_mask);

  		/* Skip reprogram of event if its not changed */
@@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
  		ts->idle_sleeps++;

  		/*
-		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
-		 * there is no timer pending or at least extremly far
-		 * into the future (12 days for HZ=1000). In this case
-		 * we simply stop the tick timer:
+		 * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
+		 * signals that there is no timer pending or at least
+		 * extremely far into the future (12 days for HZ=1000).
+		 * In this case we simply stop the tick timer:
  		 */
-		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
+		if (unlikely(time_delta >=
+				(tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
  			ts->idle_expires.tv64 = KTIME_MAX;
  			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
  				hrtimer_cancel(&ts->sched_timer);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 687dff4..a2ce815 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -271,6 +271,20 @@ int timekeeping_valid_for_hres(void)
  }

  /**
+ * timekeeping_max_deferment - Returns max time the clocksource can be 
deferred
+ *
+ * IMPORTANT: Must be called with xtime_lock held!
+ */
+s64 timekeeping_max_deferment(void)
+{
+	s64 max_nsecs;
+
+	max_nsecs = cyc2ns(clock, clock->mask);
+
+	return max_nsecs; /* XXX maybe reduce by some amount to be safe? */
+}
+
+/**
   * read_persistent_clock -  Return time in seconds from the persistent 
clock.
   *
   * Weak dummy function for arches that do not yet support it.
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep   formorethan 2.15 seconds
  2009-05-08 16:05             ` Jon Hunter
@ 2009-05-09  0:51               ` john stultz
  2009-05-12 23:35                 ` Jon Hunter
  0 siblings, 1 reply; 30+ messages in thread
From: john stultz @ 2009-05-09  0:51 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri, 2009-05-08 at 11:05 -0500, Jon Hunter wrote:
> john stultz wrote:
> > Yep. It would be easy to pull:
> > 	max_time_delta = timekeeping_max_deferment()
> > when you read jiffies.
> 
> Ok, will do.
> 
> > Urr. Lets move away from jiffies. Jiffies bad. Human time good.
> > 
> > Its easy to get the max value in ns right now, last_update is already a
> > ktime_t. 
> > 
> > I think checking if expires (little bit lower in the same function) is
> > larger then (last_update + max_time_delta) would be much much simpler.
> 
> No problem. I will do this too. I had a bit of a tough time figuring out 
> what was best here.
> 
> > Yeeks. No, lets not do this. Cluttering up the clocksource with jiffies
> > values is totally unnecessary.
> 
> Sorry about that. I have a tendency to try to reduce run-time 
> computation if I can do it once at the beginning. However, I also 
> dislike clutter so we can keep it the way you recommend.
> 
> > So while I really don't like your patch, I think you have the right
> > idea. Just keep things in nanoseconds, rather then converting them to
> > jiffies first. It will be much simpler patch and won't affect as much
> > code.
> > 
> > Look at the simple accessor patch I sent earlier:
> > http://lkml.indiana.edu/hypermail/linux/kernel/0901.3/02693.html
> > 
> > You could require the xtime_lock be held while calling to doge grabbing
> > it twice and plug it right in.
> 
> Done. See below. Please note that the below patch is simply for handling 
> the wrapping of clocksources and does not include the original patch to 
> convert max_delta_ns to long long (to keep things simple for now).
> 
> I still have a couple concerns:
> 
> 1). The use of delta_jiffies
> 
> The below patch does not update delta_jiffies. In the current code 
> delta_jiffies in a couple places after "expires" is calculated. So if we 
> adjust expires to account for clocksource wrap, we should avoid using 
> delta_jiffies later on. The first place delta_jiffies is used after 
> expires is calculated is here:
> 
>                   if (delta_jiffies > 1)
>                           cpumask_set_cpu(cpu, nohz_cpu_mask);
> 
> The second place is here:
> 
>                   /*
>                    * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
>                    * there is no timer pending or at least extremly far
>                    * into the future (12 days for HZ=1000). In this case
>                    * we simply stop the tick timer:
>                    */
>                   if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
>                           ts->idle_expires.tv64 = KTIME_MAX;
>                           if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
>                                   hrtimer_cancel(&ts->sched_timer);
>                           goto out;
> 
> Currently I have modified these compares so that the comparisons are 
> done in nanoseconds and not jiffies to be safe. Let me know your thoughts.

Yep. Looks good to me.

> 
> 2). Clocksource max deferment
> 
> In your original patch you suggested that we should reduce the max time 
> returned by the function timekeeping_max_deferment by some amount. Would 
> it make sense to reduce this by a jiffie? In the current dynamic tick 
> code we will only defer the tick if the next event is greater than or 
> equal to 1 jiffie.

Yea. NSEC_PER_SEC/HZ would probably be safe. I was initially thinking
being more paranoid and just dividing it in half, but that's probably a
bit silly.

As far the decision to defer if the next even is greater then one jiffy
away, that seems reasonable, but I'd not embed that into the
timekeeping_max_deferrment(). 

I'm suggesting we drop timekeeping_max_deferrment() down since that's
the absolute maximum and we're sure to break if we actually wait that
long (since the time between clocksource reads would certainly be longer
due to execution delay). 1HZ seems reasonable, since we should easily be
able to run the tick code twice in that time, as well as it should be
easily within the interrupt programming granularity.

Any additional decisions as to how far out we should be before we start
skipping ticks would be up to the tick resched code, and shouldn't be in
the timekeeping function.

Sound sane? If so add that in and I'll ack it.

> 
> 
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>

This looks *much* better to me. Thanks for reworking it!

Thomas, this all sound/look good to you?

thanks
-john

> ---
>   include/linux/time.h      |    1 +
>   kernel/time/tick-sched.c  |   36 +++++++++++++++++++++++++-----------
>   kernel/time/timekeeping.c |   14 ++++++++++++++
>   3 files changed, 40 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 242f624..090be07 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);
> 
>   extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
>   extern int timekeeping_valid_for_hres(void);
> +extern s64 timekeeping_max_deferment(void);
>   extern void update_wall_time(void);
>   extern void update_xtime_cache(u64 nsec);
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index d3f1ef4..5f9ba13 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   	ktime_t last_update, expires, now;
>   	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
>   	int cpu;
> +	s64 time_delta, max_time_delta;
> 
>   	local_irq_save(flags);
> 
> @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		seq = read_seqbegin(&xtime_lock);
>   		last_update = last_jiffies_update;
>   		last_jiffies = jiffies;
> +		max_time_delta = timekeeping_max_deferment();
>   	} while (read_seqretry(&xtime_lock, seq));
> 
>   	/* Get the next timer wheel timer */
> @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
>   	if ((long)delta_jiffies >= 1) {
> 
>   		/*
> -		* calculate the expiry time for the next timer wheel
> -		* timer
> -		*/
> -		expires = ktime_add_ns(last_update, tick_period.tv64 *
> -				   delta_jiffies);
> +		 * Calculate the time delta for the next timer event.
> +		 * If the time delta exceeds the maximum time delta
> +		 * permitted by the current clocksource then adjust
> +		 * the time delta accordingly to ensure the
> +		 * clocksource does not wrap.
> +		 * /
> +		time_delta = tick_period.tv64 * delta_jiffies;
> +
> +		if (time_delta > max_time_delta)
> +			time_delta = max_time_delta;
> +
> +		/*
> +		 * calculate the expiry time for the next timer wheel
> +		 * timer
> +		 */
> +		expires = ktime_add_ns(last_update, time_delta);
> 
>   		/*
>   		 * If this cpu is the one which updates jiffies, then
> @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		if (cpu == tick_do_timer_cpu)
>   			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> 
> -		if (delta_jiffies > 1)
> +		if (time_delta > tick_period.tv64)
>   			cpumask_set_cpu(cpu, nohz_cpu_mask);
> 
>   		/* Skip reprogram of event if its not changed */
> @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		ts->idle_sleeps++;
> 
>   		/*
> -		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
> -		 * there is no timer pending or at least extremly far
> -		 * into the future (12 days for HZ=1000). In this case
> -		 * we simply stop the tick timer:
> +		 * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
> +		 * signals that there is no timer pending or at least
> +		 * extremely far into the future (12 days for HZ=1000).
> +		 * In this case we simply stop the tick timer:
>   		 */
> -		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
> +		if (unlikely(time_delta >=
> +				(tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
>   			ts->idle_expires.tv64 = KTIME_MAX;
>   			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
>   				hrtimer_cancel(&ts->sched_timer);
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 687dff4..a2ce815 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -271,6 +271,20 @@ int timekeeping_valid_for_hres(void)
>   }
> 
>   /**
> + * timekeeping_max_deferment - Returns max time the clocksource can be 
> deferred
> + *
> + * IMPORTANT: Must be called with xtime_lock held!
> + */
> +s64 timekeeping_max_deferment(void)
> +{
> +	s64 max_nsecs;
> +
> +	max_nsecs = cyc2ns(clock, clock->mask);
> +
> +	return max_nsecs; /* XXX maybe reduce by some amount to be safe? */
> +}
> +
> +/**
>    * read_persistent_clock -  Return time in seconds from the persistent 
> clock.
>    *
>    * Weak dummy function for arches that do not yet support it.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep   formorethan 2.15 seconds
  2009-05-09  0:51               ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan " john stultz
@ 2009-05-12 23:35                 ` Jon Hunter
  2009-05-12 23:58                   ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds john stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Hunter @ 2009-05-12 23:35 UTC (permalink / raw)
  To: john stultz; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel


john stultz wrote:
> Yea. NSEC_PER_SEC/HZ would probably be safe. I was initially thinking
> being more paranoid and just dividing it in half, but that's probably a
> bit silly.

Thanks, I have added the code to subtract NSEC_PER_SEC/HZ. Should we 
have any concerns about the adjustment of the mult value? This is the 
only thing that could impact the value returned from 
timekeeping_max_deferment(). I am not familiar with exactly how this is 
working so just wanted to ask.

> As far the decision to defer if the next even is greater then one jiffy
> away, that seems reasonable, but I'd not embed that into the
> timekeeping_max_deferrment(). 
> 
> I'm suggesting we drop timekeeping_max_deferrment() down since that's
> the absolute maximum and we're sure to break if we actually wait that
> long (since the time between clocksource reads would certainly be longer
> due to execution delay). 1HZ seems reasonable, since we should easily be
> able to run the tick code twice in that time, as well as it should be
> easily within the interrupt programming granularity.
> 
> Any additional decisions as to how far out we should be before we start
> skipping ticks would be up to the tick resched code, and shouldn't be in
> the timekeeping function.
> 
> Sound sane? If so add that in and I'll ack it.

Yes, agree. See below. By the way I have kept the below patch separate 
from the original I posted here:

http://marc.info/?l=linux-kernel&m=124026224019895&w=2

I was not sure if you would prefer to keep these as two patch series or 
make it one single patch. Let me know if you would like me to combine or 
re-post as a two patch series.

Please note that the environment I have been running some basic tests on 
is a single core ARM device. I just wanted to let you know in case you 
have any concerns with this.

> This looks *much* better to me. Thanks for reworking it!

Great! No problem. Thanks for your help and feedback.

Cheers
Jon


Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
  include/linux/time.h      |    1 +
  kernel/time/tick-sched.c  |   36 +++++++++++++++++++++++++-----------
  kernel/time/timekeeping.c |   19 +++++++++++++++++++
  3 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index 242f624..090be07 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);

  extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
  extern int timekeeping_valid_for_hres(void);
+extern s64 timekeeping_max_deferment(void);
  extern void update_wall_time(void);
  extern void update_xtime_cache(u64 nsec);

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d3f1ef4..f0155ae 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  	ktime_t last_update, expires, now;
  	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
  	int cpu;
+	s64 time_delta, max_time_delta;

  	local_irq_save(flags);

@@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  		seq = read_seqbegin(&xtime_lock);
  		last_update = last_jiffies_update;
  		last_jiffies = jiffies;
+		max_time_delta = timekeeping_max_deferment();
  	} while (read_seqretry(&xtime_lock, seq));

  	/* Get the next timer wheel timer */
@@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
  	if ((long)delta_jiffies >= 1) {

  		/*
-		* calculate the expiry time for the next timer wheel
-		* timer
-		*/
-		expires = ktime_add_ns(last_update, tick_period.tv64 *
-				   delta_jiffies);
+		 * Calculate the time delta for the next timer event.
+		 * If the time delta exceeds the maximum time delta
+		 * permitted by the current clocksource then adjust
+		 * the time delta accordingly to ensure the
+		 * clocksource does not wrap.
+		 */
+		time_delta = tick_period.tv64 * delta_jiffies;
+
+		if (time_delta > max_time_delta)
+			time_delta = max_time_delta;
+
+		/*
+		 * calculate the expiry time for the next timer wheel
+		 * timer
+		 */
+		expires = ktime_add_ns(last_update, time_delta);

  		/*
  		 * If this cpu is the one which updates jiffies, then
@@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  		if (cpu == tick_do_timer_cpu)
  			tick_do_timer_cpu = TICK_DO_TIMER_NONE;

-		if (delta_jiffies > 1)
+		if (time_delta > tick_period.tv64)
  			cpumask_set_cpu(cpu, nohz_cpu_mask);

  		/* Skip reprogram of event if its not changed */
@@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
  		ts->idle_sleeps++;

  		/*
-		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
-		 * there is no timer pending or at least extremly far
-		 * into the future (12 days for HZ=1000). In this case
-		 * we simply stop the tick timer:
+		 * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
+		 * signals that there is no timer pending or at least
+		 * extremely far into the future (12 days for HZ=1000).
+		 * In this case we simply stop the tick timer:
  		 */
-		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
+		if (unlikely(time_delta >=
+				(tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
  			ts->idle_expires.tv64 = KTIME_MAX;
  			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
  				hrtimer_cancel(&ts->sched_timer);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 687dff4..7617fbe 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -271,6 +271,25 @@ int timekeeping_valid_for_hres(void)
  }

  /**
+ * timekeeping_max_deferment - Returns max time the clocksource can be 
deferred
+ *
+ * IMPORTANT: Must be called with xtime_lock held!
+ */
+s64 timekeeping_max_deferment(void)
+{
+	s64 max_nsecs;
+
+	/*
+	 * Limit the time the clocksource can be
+	 * deferred by one jiffie period to ensure
+	 * that the clocksource will not wrap.
+	 */
+	max_nsecs = cyc2ns(clock, clock->mask) - (NSEC_PER_SEC/HZ);
+
+	return max_nsecs;
+}
+
+/**
   * read_persistent_clock -  Return time in seconds from the persistent 
clock.
   *
   * Weak dummy function for arches that do not yet support it.
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep    formorethan2.15 seconds
  2009-05-12 23:35                 ` Jon Hunter
@ 2009-05-12 23:58                   ` john stultz
  2009-05-13 15:14                     ` Jon Hunter
  0 siblings, 1 reply; 30+ messages in thread
From: john stultz @ 2009-05-12 23:58 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Tue, 2009-05-12 at 18:35 -0500, Jon Hunter wrote:
> john stultz wrote:
> > Yea. NSEC_PER_SEC/HZ would probably be safe. I was initially thinking
> > being more paranoid and just dividing it in half, but that's probably a
> > bit silly.
> 
> Thanks, I have added the code to subtract NSEC_PER_SEC/HZ. Should we 
> have any concerns about the adjustment of the mult value? This is the 
> only thing that could impact the value returned from 
> timekeeping_max_deferment(). I am not familiar with exactly how this is 
> working so just wanted to ask.

Well, the mult adjustments should be quite small, especially compared to
the NSEC_PER_SEC/HZ adjustment.

Hmm... Although, I guess we could get bitten if the max_deferment was
like an hour, and the adjustment was enough that it scaled out to and we
ended up being a second late or so. So you have a point.

But since the clockevent driver is not scaled, we probably can get away
with using the orig_mult value instead of mult, and be ok.

Alternatively instead of NSEC_PER_SEC/HZ, we could always drop the
larger of NSEC_PER_SEC/HZ or max_deferment/10? That way we should scale
up without a problem. 

I suspect it would be tough to hit this issue though.

> > As far the decision to defer if the next even is greater then one jiffy
> > away, that seems reasonable, but I'd not embed that into the
> > timekeeping_max_deferrment(). 
> > 
> > I'm suggesting we drop timekeeping_max_deferrment() down since that's
> > the absolute maximum and we're sure to break if we actually wait that
> > long (since the time between clocksource reads would certainly be longer
> > due to execution delay). 1HZ seems reasonable, since we should easily be
> > able to run the tick code twice in that time, as well as it should be
> > easily within the interrupt programming granularity.
> > 
> > Any additional decisions as to how far out we should be before we start
> > skipping ticks would be up to the tick resched code, and shouldn't be in
> > the timekeeping function.
> > 
> > Sound sane? If so add that in and I'll ack it.
> 
> Yes, agree. See below. By the way I have kept the below patch separate 
> from the original I posted here:
> 
> http://marc.info/?l=linux-kernel&m=124026224019895&w=2
> 
> I was not sure if you would prefer to keep these as two patch series or 
> make it one single patch. Let me know if you would like me to combine or 
> re-post as a two patch series.

Two patches should be fine.

> Please note that the environment I have been running some basic tests on 
> is a single core ARM device. I just wanted to let you know in case you 
> have any concerns with this.
> 
> > This looks *much* better to me. Thanks for reworking it!
> 
> Great! No problem. Thanks for your help and feedback.
> 
> Cheers
> Jon
> 
> 
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>

Looks good overall. We may want to add the -10% (or -5%) to be totally
safe, but that's likely just me being paranoid.

Also one more safety issue below.

Otherwise,
Acked-by: John Stultz <johnstul@us.ibm.com>


thanks
-john


> ---
>   include/linux/time.h      |    1 +
>   kernel/time/tick-sched.c  |   36 +++++++++++++++++++++++++-----------
>   kernel/time/timekeeping.c |   19 +++++++++++++++++++
>   3 files changed, 45 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 242f624..090be07 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);
> 
>   extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
>   extern int timekeeping_valid_for_hres(void);
> +extern s64 timekeeping_max_deferment(void);
>   extern void update_wall_time(void);
>   extern void update_xtime_cache(u64 nsec);
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index d3f1ef4..f0155ae 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   	ktime_t last_update, expires, now;
>   	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
>   	int cpu;
> +	s64 time_delta, max_time_delta;
> 
>   	local_irq_save(flags);
> 
> @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		seq = read_seqbegin(&xtime_lock);
>   		last_update = last_jiffies_update;
>   		last_jiffies = jiffies;
> +		max_time_delta = timekeeping_max_deferment();
>   	} while (read_seqretry(&xtime_lock, seq));
> 
>   	/* Get the next timer wheel timer */
> @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
>   	if ((long)delta_jiffies >= 1) {
> 
>   		/*
> -		* calculate the expiry time for the next timer wheel
> -		* timer
> -		*/
> -		expires = ktime_add_ns(last_update, tick_period.tv64 *
> -				   delta_jiffies);
> +		 * Calculate the time delta for the next timer event.
> +		 * If the time delta exceeds the maximum time delta
> +		 * permitted by the current clocksource then adjust
> +		 * the time delta accordingly to ensure the
> +		 * clocksource does not wrap.
> +		 */
> +		time_delta = tick_period.tv64 * delta_jiffies;
> +
> +		if (time_delta > max_time_delta)
> +			time_delta = max_time_delta;
> +
> +		/*
> +		 * calculate the expiry time for the next timer wheel
> +		 * timer
> +		 */
> +		expires = ktime_add_ns(last_update, time_delta);
> 
>   		/*
>   		 * If this cpu is the one which updates jiffies, then
> @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		if (cpu == tick_do_timer_cpu)
>   			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> 
> -		if (delta_jiffies > 1)
> +		if (time_delta > tick_period.tv64)
>   			cpumask_set_cpu(cpu, nohz_cpu_mask);
> 
>   		/* Skip reprogram of event if its not changed */
> @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		ts->idle_sleeps++;
> 
>   		/*
> -		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
> -		 * there is no timer pending or at least extremly far
> -		 * into the future (12 days for HZ=1000). In this case
> -		 * we simply stop the tick timer:
> +		 * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
> +		 * signals that there is no timer pending or at least
> +		 * extremely far into the future (12 days for HZ=1000).
> +		 * In this case we simply stop the tick timer:
>   		 */
> -		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
> +		if (unlikely(time_delta >=
> +				(tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
>   			ts->idle_expires.tv64 = KTIME_MAX;
>   			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
>   				hrtimer_cancel(&ts->sched_timer);
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 687dff4..7617fbe 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -271,6 +271,25 @@ int timekeeping_valid_for_hres(void)
>   }
> 
>   /**
> + * timekeeping_max_deferment - Returns max time the clocksource can be 
> deferred
> + *
> + * IMPORTANT: Must be called with xtime_lock held!
> + */
> +s64 timekeeping_max_deferment(void)
> +{
> +	s64 max_nsecs;
> +
> +	/*
> +	 * Limit the time the clocksource can be
> +	 * deferred by one jiffie period to ensure
> +	 * that the clocksource will not wrap.
> +	 */
> +	max_nsecs = cyc2ns(clock, clock->mask) - (NSEC_PER_SEC/HZ);
> +

This seems really unlikely, but you might want to add something like:

	if (max_nsecs < 0)
		max_nsecs = 0;

To avoid negative underflows. I don't see how a system could be running
in highres mode if the clocksource isn't continuous for longer then a
tick, but probably a good idea none the less.


> +	return max_nsecs;
> +}
> +
> +/**
>    * read_persistent_clock -  Return time in seconds from the persistent 
> clock.
>    *
>    * Weak dummy function for arches that do not yet support it.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep    formorethan2.15 seconds
  2009-05-12 23:58                   ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds john stultz
@ 2009-05-13 15:14                     ` Jon Hunter
  2009-05-13 16:41                       ` John Stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Hunter @ 2009-05-13 15:14 UTC (permalink / raw)
  To: john stultz; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel


john stultz wrote:
> Well, the mult adjustments should be quite small, especially compared to
> the NSEC_PER_SEC/HZ adjustment.
> 
> Hmm... Although, I guess we could get bitten if the max_deferment was
> like an hour, and the adjustment was enough that it scaled out to and we
> ended up being a second late or so. So you have a point.
> 
> But since the clockevent driver is not scaled, we probably can get away
> with using the orig_mult value instead of mult, and be ok.
> 
> Alternatively instead of NSEC_PER_SEC/HZ, we could always drop the
> larger of NSEC_PER_SEC/HZ or max_deferment/10? That way we should scale
> up without a problem. 

Yes, may be this would be a safer option. Thinking about this I was 
wondering if we should always use max_deferement/10, because I did not 
think that there would ever be a case where NSEC_PER_SEC/HZ would be 
greater. If NSEC_PER_SEC/HZ was greater than max_deferement/10 this 
would imply that the clocksource would wrap after only 10 jiffies, if I 
have the math right...

> I suspect it would be tough to hit this issue though.

Agree.

> Two patches should be fine.

Ok, I will re-post as two once we have the final version.

> Looks good overall. We may want to add the -10% (or -5%) to be totally
> safe, but that's likely just me being paranoid.

I am paranoid too! Do you care if we use 6.25% or 12.5% margin instead 
of 10% or 5%? This way we can avoid a 64-bit division by using a simple 
shift. See below. I have implemented a 6.25% margin for now. Let me know 
your thoughts.

One final question, I noticed in clocksource.h that the definition of 
function cyc2ns returns a type of s64, however, in the function itself a 
variable of type u64 is used and returned. Should this function be 
modified as follows?

  static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
  {
-       u64 ret = (u64)cycles;
+       s64 ret = (s64)cycles;
         ret = (ret * cs->mult) >> cs->shift;
         return ret;
  }

Cheers
Jon


Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
  include/linux/time.h      |    1 +
  kernel/time/tick-sched.c  |   36 +++++++++++++++++++++++++-----------
  kernel/time/timekeeping.c |   28 ++++++++++++++++++++++++++++
  3 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index 242f624..090be07 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);

  extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
  extern int timekeeping_valid_for_hres(void);
+extern s64 timekeeping_max_deferment(void);
  extern void update_wall_time(void);
  extern void update_xtime_cache(u64 nsec);

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d3f1ef4..f0155ae 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  	ktime_t last_update, expires, now;
  	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
  	int cpu;
+	s64 time_delta, max_time_delta;

  	local_irq_save(flags);

@@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  		seq = read_seqbegin(&xtime_lock);
  		last_update = last_jiffies_update;
  		last_jiffies = jiffies;
+		max_time_delta = timekeeping_max_deferment();
  	} while (read_seqretry(&xtime_lock, seq));

  	/* Get the next timer wheel timer */
@@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
  	if ((long)delta_jiffies >= 1) {

  		/*
-		* calculate the expiry time for the next timer wheel
-		* timer
-		*/
-		expires = ktime_add_ns(last_update, tick_period.tv64 *
-				   delta_jiffies);
+		 * Calculate the time delta for the next timer event.
+		 * If the time delta exceeds the maximum time delta
+		 * permitted by the current clocksource then adjust
+		 * the time delta accordingly to ensure the
+		 * clocksource does not wrap.
+		 */
+		time_delta = tick_period.tv64 * delta_jiffies;
+
+		if (time_delta > max_time_delta)
+			time_delta = max_time_delta;
+
+		/*
+		 * calculate the expiry time for the next timer wheel
+		 * timer
+		 */
+		expires = ktime_add_ns(last_update, time_delta);

  		/*
  		 * If this cpu is the one which updates jiffies, then
@@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  		if (cpu == tick_do_timer_cpu)
  			tick_do_timer_cpu = TICK_DO_TIMER_NONE;

-		if (delta_jiffies > 1)
+		if (time_delta > tick_period.tv64)
  			cpumask_set_cpu(cpu, nohz_cpu_mask);

  		/* Skip reprogram of event if its not changed */
@@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
  		ts->idle_sleeps++;

  		/*
-		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
-		 * there is no timer pending or at least extremly far
-		 * into the future (12 days for HZ=1000). In this case
-		 * we simply stop the tick timer:
+		 * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
+		 * signals that there is no timer pending or at least
+		 * extremely far into the future (12 days for HZ=1000).
+		 * In this case we simply stop the tick timer:
  		 */
-		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
+		if (unlikely(time_delta >=
+				(tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
  			ts->idle_expires.tv64 = KTIME_MAX;
  			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
  				hrtimer_cancel(&ts->sched_timer);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 687dff4..e764ac8 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -271,6 +271,34 @@ int timekeeping_valid_for_hres(void)
  }

  /**
+ * timekeeping_max_deferment - Returns max time the clocksource can be 
deferred
+ *
+ * IMPORTANT: Must be called with xtime_lock held!
+ */
+s64 timekeeping_max_deferment(void)
+{
+	s64 max_nsecs, margin;
+
+	max_nsecs = cyc2ns(clock, clock->mask);
+
+	/*
+	 * To ensure that the clocksource does not wrap whilst we are idle,
+	 * let's limit the time the clocksource can be deferred by 6.25% of
+	 * the total time the clocksource can count. Please note a margin
+	 * of 6.25% is used because this can be computed with a shift,
+	 * versus say 5% which would require division.
+	 */
+	margin = max_nsecs >> 4;
+
+	max_nsecs = max_nsecs - margin;
+
+	if (max_nsecs < 0)
+		max_nsecs = 0;
+
+	return max_nsecs;
+}
+
+/**
   * read_persistent_clock -  Return time in seconds from the persistent 
clock.
   *
   * Weak dummy function for arches that do not yet support it.
-- 
1.6.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep    formorethan2.15 seconds
  2009-05-13 15:14                     ` Jon Hunter
@ 2009-05-13 16:41                       ` John Stultz
  2009-05-13 17:54                         ` Jon Hunter
  0 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2009-05-13 16:41 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Wed, 2009-05-13 at 10:14 -0500, Jon Hunter wrote:
> john stultz wrote:
> > Well, the mult adjustments should be quite small, especially compared to
> > the NSEC_PER_SEC/HZ adjustment.
> > 
> > Hmm... Although, I guess we could get bitten if the max_deferment was
> > like an hour, and the adjustment was enough that it scaled out to and we
> > ended up being a second late or so. So you have a point.
> > 
> > But since the clockevent driver is not scaled, we probably can get away
> > with using the orig_mult value instead of mult, and be ok.
> > 
> > Alternatively instead of NSEC_PER_SEC/HZ, we could always drop the
> > larger of NSEC_PER_SEC/HZ or max_deferment/10? That way we should scale
> > up without a problem. 
> 
> Yes, may be this would be a safer option. Thinking about this I was 
> wondering if we should always use max_deferement/10, because I did not 
> think that there would ever be a case where NSEC_PER_SEC/HZ would be 
> greater. If NSEC_PER_SEC/HZ was greater than max_deferement/10 this 
> would imply that the clocksource would wrap after only 10 jiffies, if I 
> have the math right...

Right, but even with such limitiations, if an arch can skip every 5
ticks, they probably will try, right? :)


> > I suspect it would be tough to hit this issue though.
> 
> Agree.
> 
> > Two patches should be fine.
> 
> Ok, I will re-post as two once we have the final version.
> 
> > Looks good overall. We may want to add the -10% (or -5%) to be totally
> > safe, but that's likely just me being paranoid.
> 
> I am paranoid too! Do you care if we use 6.25% or 12.5% margin instead 
> of 10% or 5%? This way we can avoid a 64-bit division by using a simple 
> shift. See below. I have implemented a 6.25% margin for now. Let me know 
> your thoughts.

That sounds reasonable to me.

> One final question, I noticed in clocksource.h that the definition of 
> function cyc2ns returns a type of s64, however, in the function itself a 
> variable of type u64 is used and returned. Should this function be 
> modified as follows?
> 
>   static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
>   {
> -       u64 ret = (u64)cycles;
> +       s64 ret = (s64)cycles;
>          ret = (ret * cs->mult) >> cs->shift;
>          return ret;
>   }

Damn. So this brings up an issue I had missed prior.

Given that a clocksoure's shift value is calculated so that the
cycles*mult multiplication won't overflow 64 bits, there may be inherent
assumptions in the clocksource code that limit the results of
timekeeping_max_deferrment() other then just the clocksource mask value.

So even if the clocksource doesn't wrap over the interval, it could be
large enough to cause the cyc2ns function to break given a large enough
cycle interval.

Right now that assumption is spread out in the individual clocksource
drivers. I've got a calculate_shift() helper patch sitting around that
will help unify that, but even so there still is a trade-off in that:

1) The larger the shift value, the finer grained (smoother) we can be in
making NTP adjustments. 

2) The smaller the shift value, the smaller the mult value, so the
longer the cycle interval length can be before we overflow.

So I'm not sure how we can better extend this in all cases.

I'll have to think about how that would change
timekeeping_max_deferment() and how we'd have to calculate a reasonable
max efficiently.

Other then this issue (which is my fault for not noticing it earlier),
you're patch looks great. I just feel badly for making you rev this
thing over and over. 

One option if you're itching to push it in and be done with it: Make
timekeeping_max_deferment() return just 1 second for now. Your patch
provides the right infrastructure for the timekeeping code to provide
its limits to the clockevents code. So you can use a safe short constant
value for now, and we can extend that out correctly in a future patch.

Sorry again for not catching this until now. :(

thanks
-john




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep    formorethan2.15 seconds
  2009-05-13 16:41                       ` John Stultz
@ 2009-05-13 17:54                         ` Jon Hunter
  2009-05-13 19:21                           ` John Stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Hunter @ 2009-05-13 17:54 UTC (permalink / raw)
  To: John Stultz; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel


John Stultz wrote:
>>> Alternatively instead of NSEC_PER_SEC/HZ, we could always drop the
>>> larger of NSEC_PER_SEC/HZ or max_deferment/10? That way we should scale
>>> up without a problem. 
>> Yes, may be this would be a safer option. Thinking about this I was 
>> wondering if we should always use max_deferement/10, because I did not 
>> think that there would ever be a case where NSEC_PER_SEC/HZ would be 
>> greater. If NSEC_PER_SEC/HZ was greater than max_deferement/10 this 
>> would imply that the clocksource would wrap after only 10 jiffies, if I 
>> have the math right...
> 
> Right, but even with such limitiations, if an arch can skip every 5
> ticks, they probably will try, right? :)

Sure, but I guess I was wondering if there would ever be a clocksource 
that would overflow in 10-20 ticks? If not then it would be safe to 
always use -10% or -5% margin and we can forget about NSEC_PER_SEC/HZ.

Unless I am understanding this wrong, but I thought we are just trying 
to make sure we never sleep for a time longer than the total time a 
clocksource can count.

> That sounds reasonable to me.

Great.

>> One final question, I noticed in clocksource.h that the definition of 
>> function cyc2ns returns a type of s64, however, in the function itself a 
>> variable of type u64 is used and returned. Should this function be 
>> modified as follows?
>>
>>   static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
>>   {
>> -       u64 ret = (u64)cycles;
>> +       s64 ret = (s64)cycles;
>>          ret = (ret * cs->mult) >> cs->shift;
>>          return ret;
>>   }
> 
> Damn. So this brings up an issue I had missed prior.

Any comments on whether this should be u64 versus s64?

> I'll have to think about how that would change
> timekeeping_max_deferment() and how we'd have to calculate a reasonable
> max efficiently.
> 
> Other then this issue (which is my fault for not noticing it earlier),
> you're patch looks great. I just feel badly for making you rev this
> thing over and over. 

No problem, its fine. Its more important for us to get this right so I 
am happy to help where I can.

> One option if you're itching to push it in and be done with it: Make
> timekeeping_max_deferment() return just 1 second for now. Your patch
> provides the right infrastructure for the timekeeping code to provide
> its limits to the clockevents code. So you can use a safe short constant
> value for now, and we can extend that out correctly in a future patch.

How about going back to your original thought and making it 50% margin 
for now? In other words, use max_deferment/2? Therefore, for clocksource 
that can count for 10s of years before overflowing it will not be as 
severe.

> Sorry again for not catching this until now. :(

No problem at all. Thanks for all the inputs.

Cheers
Jon



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep    formorethan2.15 seconds
  2009-05-13 17:54                         ` Jon Hunter
@ 2009-05-13 19:21                           ` John Stultz
  2009-05-15 16:35                             ` Jon Hunter
  0 siblings, 1 reply; 30+ messages in thread
From: John Stultz @ 2009-05-13 19:21 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Wed, 2009-05-13 at 12:54 -0500, Jon Hunter wrote:
> John Stultz wrote:
> >>> Alternatively instead of NSEC_PER_SEC/HZ, we could always drop the
> >>> larger of NSEC_PER_SEC/HZ or max_deferment/10? That way we should scale
> >>> up without a problem. 
> >> Yes, may be this would be a safer option. Thinking about this I was 
> >> wondering if we should always use max_deferement/10, because I did not 
> >> think that there would ever be a case where NSEC_PER_SEC/HZ would be 
> >> greater. If NSEC_PER_SEC/HZ was greater than max_deferement/10 this 
> >> would imply that the clocksource would wrap after only 10 jiffies, if I 
> >> have the math right...
> > 
> > Right, but even with such limitiations, if an arch can skip every 5
> > ticks, they probably will try, right? :)
> 
> Sure, but I guess I was wondering if there would ever be a clocksource 
> that would overflow in 10-20 ticks? If not then it would be safe to 
> always use -10% or -5% margin and we can forget about NSEC_PER_SEC/HZ.
> 
> Unless I am understanding this wrong, but I thought we are just trying 
> to make sure we never sleep for a time longer than the total time a 
> clocksource can count.
> 
> > That sounds reasonable to me.
> 
> Great.
> 
> >> One final question, I noticed in clocksource.h that the definition of 
> >> function cyc2ns returns a type of s64, however, in the function itself a 
> >> variable of type u64 is used and returned. Should this function be 
> >> modified as follows?
> >>
> >>   static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
> >>   {
> >> -       u64 ret = (u64)cycles;
> >> +       s64 ret = (s64)cycles;
> >>          ret = (ret * cs->mult) >> cs->shift;
> >>          return ret;
> >>   }
> > 
> > Damn. So this brings up an issue I had missed prior.
> 
> Any comments on whether this should be u64 versus s64?

I'd leave it alone for now. I'm concerns that in large multiplies, if
its a s64 the sign might get extended down by the shift. I need to look
at it in more detail though.


> > I'll have to think about how that would change
> > timekeeping_max_deferment() and how we'd have to calculate a reasonable
> > max efficiently.
> > 
> > Other then this issue (which is my fault for not noticing it earlier),
> > you're patch looks great. I just feel badly for making you rev this
> > thing over and over. 
> 
> No problem, its fine. Its more important for us to get this right so I 
> am happy to help where I can.
> 
> > One option if you're itching to push it in and be done with it: Make
> > timekeeping_max_deferment() return just 1 second for now. Your patch
> > provides the right infrastructure for the timekeeping code to provide
> > its limits to the clockevents code. So you can use a safe short constant
> > value for now, and we can extend that out correctly in a future patch.
> 
> How about going back to your original thought and making it 50% margin 
> for now? In other words, use max_deferment/2? Therefore, for clocksource 
> that can count for 10s of years before overflowing it will not be as 
> severe.

Well, even the 50% margin might be trouble, since it may overflow the
cyc2ns() function if the shift is large (on many clocksources is is
fairly large).

-john





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep    formorethan2.15 seconds
  2009-05-13 19:21                           ` John Stultz
@ 2009-05-15 16:35                             ` Jon Hunter
  2009-05-15 18:55                               ` Jon Hunter
  2009-05-16  1:18                               ` John Stultz
  0 siblings, 2 replies; 30+ messages in thread
From: Jon Hunter @ 2009-05-15 16:35 UTC (permalink / raw)
  To: John Stultz; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel


John Stultz wrote:
>>>> One final question, I noticed in clocksource.h that the definition of 
>>>> function cyc2ns returns a type of s64, however, in the function itself a 
>>>> variable of type u64 is used and returned. Should this function be 
>>>> modified as follows?
>>>>
>>>>   static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
>>>>   {
>>>> -       u64 ret = (u64)cycles;
>>>> +       s64 ret = (s64)cycles;
>>>>          ret = (ret * cs->mult) >> cs->shift;
>>>>          return ret;
>>>>   }
>>> Damn. So this brings up an issue I had missed prior.
>> Any comments on whether this should be u64 versus s64?
> 
> I'd leave it alone for now. I'm concerns that in large multiplies, if
> its a s64 the sign might get extended down by the shift. I need to look
> at it in more detail though.

I have been thinking about this some more and I do agree that there is a 
chance that the multiply could overflow if the "cycles" and "mult" are 
large. From the perspective of the timekeeping_max_deferment() function 
this would be very likely for 64-bit clocksources when the mask will be 
equal to (2^64)-1. Therefore, how about modifying the function as 
follows in order to catch any occurrences of overflow?

Let me know if this is aligned with your thinking or if I am barking up 
the wrong tree here.

Cheers
Jon


diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 507235a..8204373 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -316,8 +316,32 @@ static inline void clocksource_disable(struct 
clocksource *cs)
   */
  static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
  {
-	s64 ret = (s64)cycles;
-	ret = (ret * cs->mult) >> cs->shift;
+	s64 ret;
+	u64 upper, lower, overflow;
+
+	/*
+	 * Split the calculation into two halves to ensure
+	 * that we can catch any overflow that may occur.
+	 */
+	upper = ((cycles >> 32) * cs->mult) >> cs->shift;
+	lower = ((cycles & 0xFFFFFFFF) * cs->mult) >> cs->shift;
+
+	/*
+	 * Check to see if the result will overflow. If
+	 * overflow is non-zero then the result is greater
+	 * than 63-bits which is the max positive value
+	 * for a signed result.
+	 */
+	overflow = (upper + (lower >> 32)) >> 31;
+
+	/*
+	 * If the result overflows, return the max value we can.
+	 */
+	if (overflow)
+		ret = LONG_MAX;
+	else
+		ret = (s64)((upper << 32) + lower);
+
  	return ret;
  }

-- 
1.6.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep    formorethan2.15 seconds
  2009-05-15 16:35                             ` Jon Hunter
@ 2009-05-15 18:55                               ` Jon Hunter
  2009-05-16  1:29                                 ` John Stultz
  2009-05-16  1:18                               ` John Stultz
  1 sibling, 1 reply; 30+ messages in thread
From: Jon Hunter @ 2009-05-15 18:55 UTC (permalink / raw)
  To: John Stultz; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel


Jon Hunter wrote:
> +	/*
> +	 * If the result overflows, return the max value we can.
> +	 */
> +	if (overflow)
> +		ret = LONG_MAX;
> +	else
> +		ret = (s64)((upper << 32) + lower);
> +
>   	return ret;
>   }

Correction. Should have been LLONG_MAX and not LONG_MAX in the above. 
See below.

Jon


diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 5a40d14..647f228 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -316,8 +316,32 @@ static inline void clocksource_disable(struct 
clocksource *cs)
   */
  static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
  {
-       u64 ret = (u64)cycles;
-       ret = (ret * cs->mult) >> cs->shift;
+       s64 ret;
+       u64 upper, lower, overflow;
+
+       /*
+        * Split the calculation into two halves to ensure
+        * that we can catch any overflow that may occur.
+        */
+       upper = ((cycles >> 32) * cs->mult) >> cs->shift;
+       lower = ((cycles & 0xFFFFFFFF) * cs->mult) >> cs->shift;
+
+       /*
+        * Check to see if the result will overflow. If
+        * overflow is non-zero then the result is greater
+        * than 63-bits which is the max positive value
+        * for a signed result.
+        */
+       overflow = (upper + (lower >> 32)) >> 31;
+
+       /*
+        * If the result overflows, return the max value we can.
+        */
+       if (overflow)
+               ret = LLONG_MAX;
+       else
+               ret = (s64)((upper << 32) + lower);
+
         return ret;
  }


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds
  2009-05-15 16:35                             ` Jon Hunter
  2009-05-15 18:55                               ` Jon Hunter
@ 2009-05-16  1:18                               ` John Stultz
  2009-05-22 18:21                                 ` Jon Hunter
  1 sibling, 1 reply; 30+ messages in thread
From: John Stultz @ 2009-05-16  1:18 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri, 2009-05-15 at 11:35 -0500, Jon Hunter wrote:
> John Stultz wrote:
> >>>> One final question, I noticed in clocksource.h that the definition of 
> >>>> function cyc2ns returns a type of s64, however, in the function itself a 
> >>>> variable of type u64 is used and returned. Should this function be 
> >>>> modified as follows?
> >>>>
> >>>>   static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
> >>>>   {
> >>>> -       u64 ret = (u64)cycles;
> >>>> +       s64 ret = (s64)cycles;
> >>>>          ret = (ret * cs->mult) >> cs->shift;
> >>>>          return ret;
> >>>>   }
> >>> Damn. So this brings up an issue I had missed prior.
> >> Any comments on whether this should be u64 versus s64?
> > 
> > I'd leave it alone for now. I'm concerns that in large multiplies, if
> > its a s64 the sign might get extended down by the shift. I need to look
> > at it in more detail though.
> 
> I have been thinking about this some more and I do agree that there is a 
> chance that the multiply could overflow if the "cycles" and "mult" are 
> large. From the perspective of the timekeeping_max_deferment() function 
> this would be very likely for 64-bit clocksources when the mask will be 
> equal to (2^64)-1. Therefore, how about modifying the function as 
> follows in order to catch any occurrences of overflow?
> 
> Let me know if this is aligned with your thinking or if I am barking up 
> the wrong tree here.

So cyc2ns is a very very hot path, so I don't think we want to muck with
that much.

Instead of catching the overflows, and then trying to handle them, we
really want to prevent overflows from happening. That is the main point
of the timekeeping_max_deferrment() interface after all ;)

So thinking about it a bit more, what we really want from
timekeeping_max_deferrment() is roughly:

	/* find the max cycle value that would overflow the mult */
	max_cycles = -1UUL/clocksource->mult;

	/* pick the smaller of max_cycles or the mask value */
	max_cycles = min(max_cycles, clocksource->mask);

	/* convert max_cycles into ns */
	max_time = cyc2ns(clocksource, max_cycles);

	/* take off 5% of the max to make sure we don't show up late */
	max_time = max_time - max_time/20;


We should be able to make this reasonably fast via:
	max_cycles = 1<<(64 - ilog2(clocksource->mult) - 1);
	max_cycles = min(max_cycles, clocksource->mask);
	max_time = cyc2ns(clocksource, max_cycles);
	max_time = max_time - max_time >> 4;

Does that seem reasonable?

thanks
-john




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds
  2009-05-15 18:55                               ` Jon Hunter
@ 2009-05-16  1:29                                 ` John Stultz
  0 siblings, 0 replies; 30+ messages in thread
From: John Stultz @ 2009-05-16  1:29 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri, 2009-05-15 at 13:55 -0500, Jon Hunter wrote:
> Jon Hunter wrote:
> > +	/*
> > +	 * If the result overflows, return the max value we can.
> > +	 */
> > +	if (overflow)
> > +		ret = LONG_MAX;
> > +	else
> > +		ret = (s64)((upper << 32) + lower);
> > +
> >   	return ret;
> >   }
> 
> Correction. Should have been LLONG_MAX and not LONG_MAX in the above. 
> See below.

As I said before, I don't think we want to go this path, but as an
aside, if we did want to catch and handle overflows (which would be bad
for timekeeping as time would seemingly stop increasing, instead we
really just want to ensure they never happen), you'd not want to return
LLONG_MAX, since that would be *way* too big (500 years of nanoseconds)!

Instead you'd probably want:  LLONG_MAX >> clocksource->shift

Which would better mimic the (cycles*mult) >> shift calculation.

But I think the outline I provided in my last email is the better
solution.

thanks
-john


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds
  2009-05-16  1:18                               ` John Stultz
@ 2009-05-22 18:21                                 ` Jon Hunter
  2009-05-22 19:23                                   ` john stultz
  2009-05-22 19:59                                   ` Thomas Gleixner
  0 siblings, 2 replies; 30+ messages in thread
From: Jon Hunter @ 2009-05-22 18:21 UTC (permalink / raw)
  To: John Stultz; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel


John Stultz wrote:
> So cyc2ns is a very very hot path, so I don't think we want to muck with
> that much.
> 
> Instead of catching the overflows, and then trying to handle them, we
> really want to prevent overflows from happening. That is the main point
> of the timekeeping_max_deferrment() interface after all ;)
> 
> So thinking about it a bit more, what we really want from
> timekeeping_max_deferrment() is roughly:
> 
> 	/* find the max cycle value that would overflow the mult */
> 	max_cycles = -1UUL/clocksource->mult;
> 
> 	/* pick the smaller of max_cycles or the mask value */
> 	max_cycles = min(max_cycles, clocksource->mask);
> 
> 	/* convert max_cycles into ns */
> 	max_time = cyc2ns(clocksource, max_cycles);
> 
> 	/* take off 5% of the max to make sure we don't show up late */
> 	max_time = max_time - max_time/20;
> 
> 
> We should be able to make this reasonably fast via:
> 	max_cycles = 1<<(64 - ilog2(clocksource->mult) - 1);
> 	max_cycles = min(max_cycles, clocksource->mask);
> 	max_time = cyc2ns(clocksource, max_cycles);
> 	max_time = max_time - max_time >> 4;
> 
> Does that seem reasonable?

Yes, good idea. Sorry for the delay in getting back to this. I have 
implemented the above and incorporated into the patch below.

Please note that the only modification I made to the above was to add 1 
to result of the log2. I found that the result of the log2 was rounded 
down and so this was still making max_cycles too big. Therefore to be on 
the safe side I add 1 to the result of log2 which will ensure max_cycles 
is not too big.

Thanks for the inputs. Let me know your thoughts.

Cheers
Jon


Signed-off-by: Jon Hunter <jon-hunter@ti.com>
---
  include/linux/time.h      |    1 +
  kernel/time/tick-sched.c  |   36 +++++++++++++++++++++++----------
  kernel/time/timekeeping.c |   47 
+++++++++++++++++++++++++++++++++++++++++++++
  3 files changed, 73 insertions(+), 11 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index 242f624..090be07 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);

  extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
  extern int timekeeping_valid_for_hres(void);
+extern s64 timekeeping_max_deferment(void);
  extern void update_wall_time(void);
  extern void update_xtime_cache(u64 nsec);

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d3f1ef4..f0155ae 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  	ktime_t last_update, expires, now;
  	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
  	int cpu;
+	s64 time_delta, max_time_delta;

  	local_irq_save(flags);

@@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  		seq = read_seqbegin(&xtime_lock);
  		last_update = last_jiffies_update;
  		last_jiffies = jiffies;
+		max_time_delta = timekeeping_max_deferment();
  	} while (read_seqretry(&xtime_lock, seq));

  	/* Get the next timer wheel timer */
@@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
  	if ((long)delta_jiffies >= 1) {

  		/*
-		* calculate the expiry time for the next timer wheel
-		* timer
-		*/
-		expires = ktime_add_ns(last_update, tick_period.tv64 *
-				   delta_jiffies);
+		 * Calculate the time delta for the next timer event.
+		 * If the time delta exceeds the maximum time delta
+		 * permitted by the current clocksource then adjust
+		 * the time delta accordingly to ensure the
+		 * clocksource does not wrap.
+		 */
+		time_delta = tick_period.tv64 * delta_jiffies;
+
+		if (time_delta > max_time_delta)
+			time_delta = max_time_delta;
+
+		/*
+		 * calculate the expiry time for the next timer wheel
+		 * timer
+		 */
+		expires = ktime_add_ns(last_update, time_delta);

  		/*
  		 * If this cpu is the one which updates jiffies, then
@@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
  		if (cpu == tick_do_timer_cpu)
  			tick_do_timer_cpu = TICK_DO_TIMER_NONE;

-		if (delta_jiffies > 1)
+		if (time_delta > tick_period.tv64)
  			cpumask_set_cpu(cpu, nohz_cpu_mask);

  		/* Skip reprogram of event if its not changed */
@@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
  		ts->idle_sleeps++;

  		/*
-		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
-		 * there is no timer pending or at least extremly far
-		 * into the future (12 days for HZ=1000). In this case
-		 * we simply stop the tick timer:
+		 * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
+		 * signals that there is no timer pending or at least
+		 * extremely far into the future (12 days for HZ=1000).
+		 * In this case we simply stop the tick timer:
  		 */
-		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
+		if (unlikely(time_delta >=
+				(tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
  			ts->idle_expires.tv64 = KTIME_MAX;
  			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
  				hrtimer_cancel(&ts->sched_timer);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 687dff4..608fc6f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -271,6 +271,53 @@ int timekeeping_valid_for_hres(void)
  }

  /**
+ * timekeeping_max_deferment - Returns max time the clocksource can be 
deferred
+ *
+ * IMPORTANT: Must be called with xtime_lock held!
+ */
+s64 timekeeping_max_deferment(void)
+{
+	s64 max_nsecs;
+	u64 max_cycles;
+
+	/*
+	 * Calculate the maximum number of cycles that we can pass to the
+	 * cyc2ns function without overflowing a 64-bit signed result. The
+	 * maximum number of cycles is equal to ULLONG_MAX/clock->mult which
+	 * is equivalent to the below.
+	 * max_cycles < (2^63)/clock->mult
+	 * max_cycles < 2^(log2((2^63)/clock->mult))
+	 * max_cycles < 2^(log2(2^63) - log2(clock->mult))
+	 * max_cycles < 2^(63 - log2(clock->mult))
+	 * max_cycles < 1 << (63 - log2(clock->mult))
+	 * Please note that we add 1 to the result of the log2 to account for
+	 * any rounding errors, ensure the above inequality is satisfied and
+	 * no overflow will occur.
+	 */
+	max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1));
+
+	/*
+	 * The actual maximum number of cycles we can defer the clocksource is
+	 * determined by the minimum of max_cycles and clock->mask.
+	 */
+	max_cycles = min(max_cycles, clock->mask);
+	max_nsecs = cyc2ns(clock, max_cycles);
+
+	/*
+	 * To ensure that the clocksource does not wrap whilst we are idle,
+	 * limit the time the clocksource can be deferred by 6.25%. Please
+	 * note a margin of 6.25% is used because this can be computed with
+	 * a shift, versus say 5% which would require division.
+	 */
+	max_nsecs = max_nsecs - (max_nsecs >> 4);
+
+	if (max_nsecs < 0)
+		max_nsecs = 0;
+
+	return max_nsecs;
+}
+
+/**
   * read_persistent_clock -  Return time in seconds from the persistent 
clock.
   *
   * Weak dummy function for arches that do not yet support it.
-- 
1.6.1







^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds
  2009-05-22 18:21                                 ` Jon Hunter
@ 2009-05-22 19:23                                   ` john stultz
  2009-05-22 19:54                                     ` Thomas Gleixner
  2009-05-22 19:59                                   ` Thomas Gleixner
  1 sibling, 1 reply; 30+ messages in thread
From: john stultz @ 2009-05-22 19:23 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel

On Fri, 2009-05-22 at 13:21 -0500, Jon Hunter wrote:
> John Stultz wrote:
> > So cyc2ns is a very very hot path, so I don't think we want to muck with
> > that much.
> > 
> > Instead of catching the overflows, and then trying to handle them, we
> > really want to prevent overflows from happening. That is the main point
> > of the timekeeping_max_deferrment() interface after all ;)
> > 
> > So thinking about it a bit more, what we really want from
> > timekeeping_max_deferrment() is roughly:
> > 
> > 	/* find the max cycle value that would overflow the mult */
> > 	max_cycles = -1UUL/clocksource->mult;
> > 
> > 	/* pick the smaller of max_cycles or the mask value */
> > 	max_cycles = min(max_cycles, clocksource->mask);
> > 
> > 	/* convert max_cycles into ns */
> > 	max_time = cyc2ns(clocksource, max_cycles);
> > 
> > 	/* take off 5% of the max to make sure we don't show up late */
> > 	max_time = max_time - max_time/20;
> > 
> > 
> > We should be able to make this reasonably fast via:
> > 	max_cycles = 1<<(64 - ilog2(clocksource->mult) - 1);
> > 	max_cycles = min(max_cycles, clocksource->mask);
> > 	max_time = cyc2ns(clocksource, max_cycles);
> > 	max_time = max_time - max_time >> 4;
> > 
> > Does that seem reasonable?
> 
> Yes, good idea. Sorry for the delay in getting back to this. I have 
> implemented the above and incorporated into the patch below.
> 
> Please note that the only modification I made to the above was to add 1 
> to result of the log2. I found that the result of the log2 was rounded 
> down and so this was still making max_cycles too big. Therefore to be on 
> the safe side I add 1 to the result of log2 which will ensure max_cycles 
> is not too big.

Yep you're right. I was trying to handle that with the -1, but my
mistake was using 1<<64 not 1<<63. Thanks for catching that!


> 
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>

Acked-by: John Stultz <johnstul@us.ibm.com>

This looks great to me!

Thanks for putting up with my nit-picky review and revised suggestions!

Thomas, you ok to pick this up?
-john

> ---
>   include/linux/time.h      |    1 +
>   kernel/time/tick-sched.c  |   36 +++++++++++++++++++++++----------
>   kernel/time/timekeeping.c |   47 
> +++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 73 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/time.h b/include/linux/time.h
> index 242f624..090be07 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -130,6 +130,7 @@ extern void monotonic_to_bootbased(struct timespec *ts);
> 
>   extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
>   extern int timekeeping_valid_for_hres(void);
> +extern s64 timekeeping_max_deferment(void);
>   extern void update_wall_time(void);
>   extern void update_xtime_cache(u64 nsec);
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index d3f1ef4..f0155ae 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -217,6 +217,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   	ktime_t last_update, expires, now;
>   	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
>   	int cpu;
> +	s64 time_delta, max_time_delta;
> 
>   	local_irq_save(flags);
> 
> @@ -264,6 +265,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		seq = read_seqbegin(&xtime_lock);
>   		last_update = last_jiffies_update;
>   		last_jiffies = jiffies;
> +		max_time_delta = timekeeping_max_deferment();
>   	} while (read_seqretry(&xtime_lock, seq));
> 
>   	/* Get the next timer wheel timer */
> @@ -283,11 +285,22 @@ void tick_nohz_stop_sched_tick(int inidle)
>   	if ((long)delta_jiffies >= 1) {
> 
>   		/*
> -		* calculate the expiry time for the next timer wheel
> -		* timer
> -		*/
> -		expires = ktime_add_ns(last_update, tick_period.tv64 *
> -				   delta_jiffies);
> +		 * Calculate the time delta for the next timer event.
> +		 * If the time delta exceeds the maximum time delta
> +		 * permitted by the current clocksource then adjust
> +		 * the time delta accordingly to ensure the
> +		 * clocksource does not wrap.
> +		 */
> +		time_delta = tick_period.tv64 * delta_jiffies;
> +
> +		if (time_delta > max_time_delta)
> +			time_delta = max_time_delta;
> +
> +		/*
> +		 * calculate the expiry time for the next timer wheel
> +		 * timer
> +		 */
> +		expires = ktime_add_ns(last_update, time_delta);
> 
>   		/*
>   		 * If this cpu is the one which updates jiffies, then
> @@ -300,7 +313,7 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		if (cpu == tick_do_timer_cpu)
>   			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> 
> -		if (delta_jiffies > 1)
> +		if (time_delta > tick_period.tv64)
>   			cpumask_set_cpu(cpu, nohz_cpu_mask);
> 
>   		/* Skip reprogram of event if its not changed */
> @@ -332,12 +345,13 @@ void tick_nohz_stop_sched_tick(int inidle)
>   		ts->idle_sleeps++;
> 
>   		/*
> -		 * delta_jiffies >= NEXT_TIMER_MAX_DELTA signals that
> -		 * there is no timer pending or at least extremly far
> -		 * into the future (12 days for HZ=1000). In this case
> -		 * we simply stop the tick timer:
> +		 * time_delta >= (tick_period.tv64 * NEXT_TIMER_MAX_DELTA)
> +		 * signals that there is no timer pending or at least
> +		 * extremely far into the future (12 days for HZ=1000).
> +		 * In this case we simply stop the tick timer:
>   		 */
> -		if (unlikely(delta_jiffies >= NEXT_TIMER_MAX_DELTA)) {
> +		if (unlikely(time_delta >=
> +				(tick_period.tv64 * NEXT_TIMER_MAX_DELTA))) {
>   			ts->idle_expires.tv64 = KTIME_MAX;
>   			if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
>   				hrtimer_cancel(&ts->sched_timer);
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 687dff4..608fc6f 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -271,6 +271,53 @@ int timekeeping_valid_for_hres(void)
>   }
> 
>   /**
> + * timekeeping_max_deferment - Returns max time the clocksource can be 
> deferred
> + *
> + * IMPORTANT: Must be called with xtime_lock held!
> + */
> +s64 timekeeping_max_deferment(void)
> +{
> +	s64 max_nsecs;
> +	u64 max_cycles;
> +
> +	/*
> +	 * Calculate the maximum number of cycles that we can pass to the
> +	 * cyc2ns function without overflowing a 64-bit signed result. The
> +	 * maximum number of cycles is equal to ULLONG_MAX/clock->mult which
> +	 * is equivalent to the below.
> +	 * max_cycles < (2^63)/clock->mult
> +	 * max_cycles < 2^(log2((2^63)/clock->mult))
> +	 * max_cycles < 2^(log2(2^63) - log2(clock->mult))
> +	 * max_cycles < 2^(63 - log2(clock->mult))
> +	 * max_cycles < 1 << (63 - log2(clock->mult))
> +	 * Please note that we add 1 to the result of the log2 to account for
> +	 * any rounding errors, ensure the above inequality is satisfied and
> +	 * no overflow will occur.
> +	 */
> +	max_cycles = 1ULL << (63 - (ilog2(clock->mult) + 1));
> +
> +	/*
> +	 * The actual maximum number of cycles we can defer the clocksource is
> +	 * determined by the minimum of max_cycles and clock->mask.
> +	 */
> +	max_cycles = min(max_cycles, clock->mask);
> +	max_nsecs = cyc2ns(clock, max_cycles);
> +
> +	/*
> +	 * To ensure that the clocksource does not wrap whilst we are idle,
> +	 * limit the time the clocksource can be deferred by 6.25%. Please
> +	 * note a margin of 6.25% is used because this can be computed with
> +	 * a shift, versus say 5% which would require division.
> +	 */
> +	max_nsecs = max_nsecs - (max_nsecs >> 4);
> +
> +	if (max_nsecs < 0)
> +		max_nsecs = 0;
> +
> +	return max_nsecs;
> +}
> +
> +/**
>    * read_persistent_clock -  Return time in seconds from the persistent 
> clock.
>    *
>    * Weak dummy function for arches that do not yet support it.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds
  2009-05-22 19:23                                   ` john stultz
@ 2009-05-22 19:54                                     ` Thomas Gleixner
  2009-05-26 15:12                                       ` Jon Hunter
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2009-05-22 19:54 UTC (permalink / raw)
  To: john stultz; +Cc: Jon Hunter, Ingo Molnar, linux-kernel

On Fri, 22 May 2009, john stultz wrote:
> Acked-by: John Stultz <johnstul@us.ibm.com>
> 
> This looks great to me!
> 
> Thanks for putting up with my nit-picky review and revised suggestions!
> 
> Thomas, you ok to pick this up?

Yup, will do.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds
  2009-05-22 18:21                                 ` Jon Hunter
  2009-05-22 19:23                                   ` john stultz
@ 2009-05-22 19:59                                   ` Thomas Gleixner
  1 sibling, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2009-05-22 19:59 UTC (permalink / raw)
  To: Jon Hunter; +Cc: John Stultz, Ingo Molnar, linux-kernel

On Fri, 22 May 2009, Jon Hunter wrote:
> Cheers
> Jon

  -ENOCHANGELOG
 
> Signed-off-by: Jon Hunter <jon-hunter@ti.com>

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds
  2009-05-22 19:54                                     ` Thomas Gleixner
@ 2009-05-26 15:12                                       ` Jon Hunter
  2009-05-26 20:26                                         ` john stultz
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Hunter @ 2009-05-26 15:12 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: john stultz, Ingo Molnar, linux-kernel


Thomas Gleixner wrote:
> On Fri, 22 May 2009, john stultz wrote:
>> Acked-by: John Stultz <johnstul@us.ibm.com>
>>
>> This looks great to me!
>>
>> Thanks for putting up with my nit-picky review and revised suggestions!
>>
>> Thomas, you ok to pick this up?
> 
> Yup, will do.

Hi Thomas, John,

Would you also be willing to accept the initial patch that I submitted?

See: http://marc.info/?l=linux-kernel&m=124026224019895&w=2

If so, I could resubmit the two patches as a patch series called 
something like "dynamic tick: enabling longer sleep times on 32-bit 
machines."

Let me know your thoughts. I will also also add a changelog too.

Cheers
Jon


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds
  2009-05-26 15:12                                       ` Jon Hunter
@ 2009-05-26 20:26                                         ` john stultz
  0 siblings, 0 replies; 30+ messages in thread
From: john stultz @ 2009-05-26 20:26 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel

On Tue, 2009-05-26 at 10:12 -0500, Jon Hunter wrote:
> Thomas Gleixner wrote:
> > On Fri, 22 May 2009, john stultz wrote:
> >> Acked-by: John Stultz <johnstul@us.ibm.com>
> >>
> >> This looks great to me!
> >>
> >> Thanks for putting up with my nit-picky review and revised suggestions!
> >>
> >> Thomas, you ok to pick this up?
> > 
> > Yup, will do.
> 
> Hi Thomas, John,
> 
> Would you also be willing to accept the initial patch that I submitted?
> 
> See: http://marc.info/?l=linux-kernel&m=124026224019895&w=2
> 
> If so, I could resubmit the two patches as a patch series called 
> something like "dynamic tick: enabling longer sleep times on 32-bit 
> machines."
> 
> Let me know your thoughts. I will also also add a changelog too.

Yea, I think both of them look good. So I'd resend as a patch set with
proper patch descriptions on both.

thanks
-john



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2009-05-26 20:26 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-20 21:16 [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds Jon Hunter
2009-04-21  6:35 ` Ingo Molnar
2009-04-21 20:32   ` john stultz
2009-04-21 23:20     ` Jon Hunter
2009-04-22  0:02       ` john stultz
2009-05-07 14:52         ` Jon Hunter
2009-05-08  0:54           ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formore " john stultz
2009-05-08 16:05             ` Jon Hunter
2009-05-09  0:51               ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan " john stultz
2009-05-12 23:35                 ` Jon Hunter
2009-05-12 23:58                   ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep formorethan2.15 seconds john stultz
2009-05-13 15:14                     ` Jon Hunter
2009-05-13 16:41                       ` John Stultz
2009-05-13 17:54                         ` Jon Hunter
2009-05-13 19:21                           ` John Stultz
2009-05-15 16:35                             ` Jon Hunter
2009-05-15 18:55                               ` Jon Hunter
2009-05-16  1:29                                 ` John Stultz
2009-05-16  1:18                               ` John Stultz
2009-05-22 18:21                                 ` Jon Hunter
2009-05-22 19:23                                   ` john stultz
2009-05-22 19:54                                     ` Thomas Gleixner
2009-05-26 15:12                                       ` Jon Hunter
2009-05-26 20:26                                         ` john stultz
2009-05-22 19:59                                   ` Thomas Gleixner
2009-04-22  0:05       ` [RFC][PATCH] Dynamic Tick: Allow 32-bit machines to sleep for more than 2.15 seconds john stultz
2009-04-22  3:07         ` Jon Hunter
2009-04-22 15:30           ` Chris Friesen
2009-04-22 17:04             ` Jon Hunter
2009-04-22 18:53               ` Geert Uytterhoeven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).