All of lore.kernel.org
 help / color / mirror / Atom feed
* [U-Boot] [RFC] ARM timing code refactoring
@ 2011-01-22 10:20 Albert ARIBAUD
  2011-01-22 10:42 ` Reinhard Meyer
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Albert ARIBAUD @ 2011-01-22 10:20 UTC (permalink / raw)
  To: u-boot

Hi All,

I am starting this thread to revive and, hopefully, come to a general 
agreement on how timers should be implemented and used in the ARM 
architecture, and get rid of current quick fixes. Let us start with 
Reinhard's proposal:

> There were several suggestions about that in the past (including from
> me) that involve rework everywhere HZ related timeouts are used. I
> still prefer a method as follows (because it does not need repeated
> mul/div calculations nor necessarily 64 bit arithmetic):

Agreed for unnecessary mult-div, but 64-bit we would not avoid, and 
should not IMO, when the HW has it.

> u32 timeout = timeout_init(100); /* 100ms timeout */
>
> do {...} while (!timed_out(timeout));
>
> Internally it would be like:
>
> timeout_init(x):
> return fast_tick + (x * fast_tick_rate) / CONFIG_SYS_HZ;
> /* this might need 64 bit precision in some implementations */
>
> time_out(x):
> return ((i32)(x - fast_tick)) < 0;
>
> If the tick were really high speed (and then 64 bits), fast_tick
> could be derived by shifting the tick some bits to the right.

The only thing I slightly dislike about the overall idea is the signed 
rather than unsigned comparison in the timeout function (I prefer using 
the full 32-bit range, even if only as an academic point) and the fact 
that the value of the timeout is encoded in advance in the loop control 
variable 'timeout'.

I'd rather have a single 'time(x)' (or 'ticks_elapsed(x)', names are 
negotiable) macro which subtract its argument from the current ticks, 
e.g. 'then = time(0)' would set 'then' to the number of ticks elapsed 
from boot, while 'now = time(then)' would set 'now' the ticks elapsed 
from 'then'; and a 'ms_to_ticks(x)' (again, or 'milliseconds(x)') :

	#define time(x) (ticks - x)
	#define ms_to_ticks(m) ( (m * fast_tick_rate) / CONFIG_SYS_HZ)

Note that time(x) assumes unsigned arguments and amounts to an unsigned 
compare, because we're always computing an difference time, i.e. even 
with x = 2 and ticks = 1, the result is correct -- that's assuming ticks 
is monotonous 32-bits (or 64-bits for the platforms that can support it 
as an atomic value)

Your example would then become

	then = time(0);
	do {...} while ( time(then) < ms_to_ticks(100) );

... moving the actual timeout value impact from the time sample before 
the 'while' to the 'while' condition@then end.

For expressiveness, added macros such as:

	#define now() time(0)
  	#define ms_elapsed(then,ms) ( time(then) < ms_to_ticks(ms) )

... would allow writing the same example as:

	then = now();
	do {...} while ( !ms_elapsed(then,100) );

> But, as long as we cannot agree on something, there will be no
> time spent to make patches...

Makes sense, hence this specific thread. :)

> Best Regards,
> Reinhard

Amicalement,
-- 
Albert.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-22 10:20 [U-Boot] [RFC] ARM timing code refactoring Albert ARIBAUD
@ 2011-01-22 10:42 ` Reinhard Meyer
  2011-01-22 11:32   ` Albert ARIBAUD
  2011-01-22 11:00 ` [U-Boot] [RFC] U-boot (was: ARM) " Reinhard Meyer
  2011-01-22 19:19 ` [U-Boot] [RFC] ARM timing code refactoring Wolfgang Denk
  2 siblings, 1 reply; 31+ messages in thread
From: Reinhard Meyer @ 2011-01-22 10:42 UTC (permalink / raw)
  To: u-boot

Dear Albert ARIBAUD,
> Hi All,
>
> I am starting this thread to revive and, hopefully, come to a general
> agreement on how timers should be implemented and used in the ARM
> architecture, and get rid of current quick fixes. Let us start with
> Reinhard's proposal:
>
>> There were several suggestions about that in the past (including from
>> me) that involve rework everywhere HZ related timeouts are used. I
>> still prefer a method as follows (because it does not need repeated
>> mul/div calculations nor necessarily 64 bit arithmetic):
>
> Agreed for unnecessary mult-div, but 64-bit we would not avoid, and
> should not IMO, when the HW has it.
>
>> u32 timeout = timeout_init(100); /* 100ms timeout */
>>
>> do {...} while (!timed_out(timeout));
>>
>> Internally it would be like:
>>
>> timeout_init(x):
>> return fast_tick + (x * fast_tick_rate) / CONFIG_SYS_HZ;
>> /* this might need 64 bit precision in some implementations */
>>
>> time_out(x):
>> return ((i32)(x - fast_tick))<  0;
>>
>> If the tick were really high speed (and then 64 bits), fast_tick
>> could be derived by shifting the tick some bits to the right.
>
> The only thing I slightly dislike about the overall idea is the signed
> rather than unsigned comparison in the timeout function (I prefer using
> the full 32-bit range, even if only as an academic point) and the fact
> that the value of the timeout is encoded in advance in the loop control
> variable 'timeout'.

1. you always need signed compares there, unless you never anticipate a
rollover of your timer value to zero.
2. whats the problem with initializing the timeout value at the beginning?

>
> I'd rather have a single 'time(x)' (or 'ticks_elapsed(x)', names are
> negotiable) macro which subtract its argument from the current ticks,
> e.g. 'then = time(0)' would set 'then' to the number of ticks elapsed
> from boot, while 'now = time(then)' would set 'now' the ticks elapsed
> from 'then'; and a 'ms_to_ticks(x)' (again, or 'milliseconds(x)') :
>
> 	#define time(x) (ticks - x)
> 	#define ms_to_ticks(m) ( (m * fast_tick_rate) / CONFIG_SYS_HZ)

We have exactly an equivalent of this in use at AT91.
It works only as long as the ticks value does not roll back to zero -
which in the current 64 bit implementation I did is after the end of
the universe...

Note also that "fast_tick_rate" would not be a constant in AT91, it is
dynamically calculated from main xtal frequency measured against the 32kHz
xtal and PLL settings.

>
> Note that time(x) assumes unsigned arguments and amounts to an unsigned
> compare, because we're always computing an difference time, i.e. even
> with x = 2 and ticks = 1, the result is correct -- that's assuming ticks
> is monotonous 32-bits (or 64-bits for the platforms that can support it
> as an atomic value)

Assume: Monotonous AND never wrapping back to zero!

>
> Your example would then become
>
> 	then = time(0);
> 	do {...} while ( time(then)<  ms_to_ticks(100) );

That looks ugly to me. We don't want to see the high speed(64 bit) values
in the drivers - I think.

>
> ... moving the actual timeout value impact from the time sample before
> the 'while' to the 'while' condition at then end.

Which does a multiply and a divide in 64 bit each loop iteration...
(fast_tick_rate being a variable)

>
> For expressiveness, added macros such as:
>
> 	#define now() time(0)
>    	#define ms_elapsed(then,ms) ( time(then)<  ms_to_ticks(ms) )
>
> ... would allow writing the same example as:
>
> 	then = now();
> 	do {...} while ( !ms_elapsed(then,100) );
>

Why make everything so complicated???

>> But, as long as we cannot agree on something, there will be no
>> time spent to make patches...
>
> Makes sense, hence this specific thread. :)

The how-many-th thread about timers is this ? :)

Best Regards,
Reinhard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] U-boot (was: ARM) timing code refactoring
  2011-01-22 10:20 [U-Boot] [RFC] ARM timing code refactoring Albert ARIBAUD
  2011-01-22 10:42 ` Reinhard Meyer
@ 2011-01-22 11:00 ` Reinhard Meyer
  2011-01-22 12:22   ` [U-Boot] [RFC] U-boot Albert ARIBAUD
  2011-01-22 19:19 ` [U-Boot] [RFC] ARM timing code refactoring Wolfgang Denk
  2 siblings, 1 reply; 31+ messages in thread
From: Reinhard Meyer @ 2011-01-22 11:00 UTC (permalink / raw)
  To: u-boot

Dear Albert ARIBAUD,

this is not an ARM local issue.

The timeouts are used in generic drivers all around u-boot.

Have a grep for get_timer, reset_timer...

The most ugly use is with reset_timer involved, where the internal
pseudo-tick is reset to zero, so all calls to get_timer are
relative to that moment.

We are looking at replacing all those occurrences of reset_timer
and get_timer with "better" methods.

Best Regards,
Reinhard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-22 10:42 ` Reinhard Meyer
@ 2011-01-22 11:32   ` Albert ARIBAUD
  0 siblings, 0 replies; 31+ messages in thread
From: Albert ARIBAUD @ 2011-01-22 11:32 UTC (permalink / raw)
  To: u-boot

Le 22/01/2011 11:42, Reinhard Meyer a ?crit :
> Dear Albert ARIBAUD,
>> Hi All,
>>
>> I am starting this thread to revive and, hopefully, come to a general
>> agreement on how timers should be implemented and used in the ARM
>> architecture, and get rid of current quick fixes. Let us start with
>> Reinhard's proposal:
>>
>>> There were several suggestions about that in the past (including from
>>> me) that involve rework everywhere HZ related timeouts are used. I
>>> still prefer a method as follows (because it does not need repeated
>>> mul/div calculations nor necessarily 64 bit arithmetic):
>>
>> Agreed for unnecessary mult-div, but 64-bit we would not avoid, and
>> should not IMO, when the HW has it.
>>
>>> u32 timeout = timeout_init(100); /* 100ms timeout */
>>>
>>> do {...} while (!timed_out(timeout));
>>>
>>> Internally it would be like:
>>>
>>> timeout_init(x):
>>> return fast_tick + (x * fast_tick_rate) / CONFIG_SYS_HZ;
>>> /* this might need 64 bit precision in some implementations */
>>>
>>> time_out(x):
>>> return ((i32)(x - fast_tick))< 0;
>>>
>>> If the tick were really high speed (and then 64 bits), fast_tick
>>> could be derived by shifting the tick some bits to the right.
>>
>> The only thing I slightly dislike about the overall idea is the signed
>> rather than unsigned comparison in the timeout function (I prefer using
>> the full 32-bit range, even if only as an academic point) and the fact
>> that the value of the timeout is encoded in advance in the loop control
>> variable 'timeout'.
>
> 1. you always need signed compares there, unless you never anticipate a
> rollover of your timer value to zero.

I don't think rollover to zero is an issue if the tick counter is 
free-running on 32 (or 64 bits); and if it is free-running on some other 
number of bits, the time() macro just needs anding to work.

> 2. whats the problem with initializing the timeout value at the beginning?

There isn't a problem as such; simply, as the timeout is not 
conceptually needed at the beginning of the timing loop, I would favor 
implementations which do not need it at the beginning either.

>> I'd rather have a single 'time(x)' (or 'ticks_elapsed(x)', names are
>> negotiable) macro which subtract its argument from the current ticks,
>> e.g. 'then = time(0)' would set 'then' to the number of ticks elapsed
>> from boot, while 'now = time(then)' would set 'now' the ticks elapsed
>> from 'then'; and a 'ms_to_ticks(x)' (again, or 'milliseconds(x)') :
>>
>> #define time(x) (ticks - x)
>> #define ms_to_ticks(m) ( (m * fast_tick_rate) / CONFIG_SYS_HZ)
>
> We have exactly an equivalent of this in use at AT91.
> It works only as long as the ticks value does not roll back to zero -
> which in the current 64 bit implementation I did is after the end of
> the universe...

Let us take a case where the tick value rolls over 0. Say, ticks is 
equal to 2^32 - 100, and we should delay for 200 ticks. At the beginning 
of the loop, 'then' becomes equal to 2^32 - 100. While the loop runs, 
time(then) equals 'ticks - then', or, '2^32 - 100 + n - (2^32 - 100)', 
or '2^32 - 100 - 2^32 + 100 +n', or (since we're computing modulo 2^32) 
'-100 + 100 +n', or 'n', where n is the number of ticks elapsed since 
'now' was caculated. When n is 200 or more, the loop end condition is 
met. Crossing the 0 value did not affect the computation -- or did it?

> Note also that "fast_tick_rate" would not be a constant in AT91, it is
> dynamically calculated from main xtal frequency measured against the 32kHz
> xtal and PLL settings.

That's another issue entirely, with two consequences, but they don't 
really differ with respect to which proposal, yours or mine, is considered.

1. This makes the compiler unable to optimize ms to tick conversions at 
compile time. Not a really big deal, and unavoidable as soon as the 
decision is taken to calibrate the fast tick at run time.

2. If calibration is done continuously, then ticks *may* not be 
monotonous. If it is done once at startup, a la bogomips calculation, 
then we should be safe.

>> Note that time(x) assumes unsigned arguments and amounts to an unsigned
>> compare, because we're always computing an difference time, i.e. even
>> with x = 2 and ticks = 1, the result is correct -- that's assuming ticks
>> is monotonous 32-bits (or 64-bits for the platforms that can support it
>> as an atomic value)
>
> Assume: Monotonous AND never wrapping back to zero!

I think zero wraparound is not an issue if the ticks are on N (ideally, 
32 or 64) bits, which is most probably the case for a free-running 
counter. We need to assume monotonous at least during delay loops; since 
we have little or no parallel execution, we should be able to avoid 
recalibration while a delay loop is running, right?

>> Your example would then become
>>
>> then = time(0);
>> do {...} while ( time(then)< ms_to_ticks(100) );
>
> That looks ugly to me. We don't want to see the high speed(64 bit) values
> in the drivers - I think.

We don't see them, actually; but why would HW tick values would be 
unwelcome in drivers that already handle, and actually are designed to 
handle, lots of other HW related values? And besides, how does this 
differ from your proposal in this respect?

>> ... moving the actual timeout value impact from the time sample before
>> the 'while' to the 'while' condition at then end.
>
> Which does a multiply and a divide in 64 bit each loop iteration...
> (fast_tick_rate being a variable)

Not if the compiler is half decent; it will optimize out the constant 
expression. And if you don't trust the compiler, you can pre-compute the 
value of the delay.

>> For expressiveness, added macros such as:
>>
>> #define now() time(0)
>> #define ms_elapsed(then,ms) ( time(then)< ms_to_ticks(ms) )
>>
>> ... would allow writing the same example as:
>>
>> then = now();
>> do {...} while ( !ms_elapsed(then,100) );
>
> Why make everything so complicated???

I am surprised by this question. The last point I make is to simplify 
things, not complicate them, and besides, it is only for expressiveness 
and clarity of the code.

> The how-many-th thread about timers is this ? :)

The last one, trust me. :)

> Best Regards,
> Reinhard

Amicalement,
-- 
Albert.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] U-boot
  2011-01-22 11:00 ` [U-Boot] [RFC] U-boot (was: ARM) " Reinhard Meyer
@ 2011-01-22 12:22   ` Albert ARIBAUD
  0 siblings, 0 replies; 31+ messages in thread
From: Albert ARIBAUD @ 2011-01-22 12:22 UTC (permalink / raw)
  To: u-boot

Hi Reinhard,

Le 22/01/2011 12:00, Reinhard Meyer a ?crit :
> Dear Albert ARIBAUD,
>
> this is not an ARM local issue.

Well, there *is* an ARM specific side of it (use of gd variables during 
relocation), and that is what prompted me to start the RFC, but 
generalization to U-boot is welcome if it leads to a satisfactory 
solution. If not, at least I'll adopt a solution for ARM.

> The timeouts are used in generic drivers all around u-boot.
>
> Have a grep for get_timer, reset_timer...
>
> The most ugly use is with reset_timer involved, where the internal
> pseudo-tick is reset to zero, so all calls to get_timer are
> relative to that moment.
>
> We are looking at replacing all those occurrences of reset_timer
> and get_timer with "better" methods.

Seems to me this replacement is quite straightforward, as most uses of 
reset_timer() and subsequent get_timer are actually loops, functionally 
the same as our proposals, only instead of using a local to store the 
start or end time, they reset ticks to zero -- which, I concur, is a Bad 
Thing.

But 'reset_timer()' calls just need to be replaced with your 'timeout = 
timeout_init(N)' or my 'then = now()' and 'get_timer() > N' or 
'get_timer_masked() > N' by your 'time_out(timeout)' or my 
'ms_elapsed(N)'. Seems to me like a tedious effort, possibly involving 
occasional time-out value conversions, but not a difficulty.

What do I miss?

> Best Regards,
> Reinhard

Amicalement,
-- 
Albert.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-22 10:20 [U-Boot] [RFC] ARM timing code refactoring Albert ARIBAUD
  2011-01-22 10:42 ` Reinhard Meyer
  2011-01-22 11:00 ` [U-Boot] [RFC] U-boot (was: ARM) " Reinhard Meyer
@ 2011-01-22 19:19 ` Wolfgang Denk
  2011-01-22 20:17   ` Albert ARIBAUD
  2 siblings, 1 reply; 31+ messages in thread
From: Wolfgang Denk @ 2011-01-22 19:19 UTC (permalink / raw)
  To: u-boot

Dear Albert ARIBAUD,

In message <4D3AAF63.1030600@free.fr> you wrote:
> 
> Agreed for unnecessary mult-div, but 64-bit we would not avoid, and 
> should not IMO, when the HW has it.

When attempting to come up with true generic code, we should probably
_always_ use a (virtual) unsigned 64 bit counter.

> > u32 timeout = timeout_init(100); /* 100ms timeout */
> >
> > do {...} while (!timed_out(timeout));
> >
> > Internally it would be like:
> >
> > timeout_init(x):
> > return fast_tick + (x * fast_tick_rate) / CONFIG_SYS_HZ;
> > /* this might need 64 bit precision in some implementations */
> >
> > time_out(x):
> > return ((i32)(x - fast_tick)) < 0;
> >
> > If the tick were really high speed (and then 64 bits), fast_tick
> > could be derived by shifting the tick some bits to the right.
> 
> The only thing I slightly dislike about the overall idea is the signed 
> rather than unsigned comparison in the timeout function (I prefer using 
> the full 32-bit range, even if only as an academic point) and the fact 
> that the value of the timeout is encoded in advance in the loop control 
> variable 'timeout'.

Please don't.  Use an unsigned counter and allow it to roll over.
Anything else causes additional (and unnecessary) restrictions and
makes the code more complicated.

> I'd rather have a single 'time(x)' (or 'ticks_elapsed(x)', names are 
> negotiable) macro which subtract its argument from the current ticks, 

I'm not sure what you have in mind with "substract".  I strongly
recommend to avoid problems with wrap-around etc. right from the
beginning, and let unsigned arithmentcs handle this for us.

> e.g. 'then = time(0)' would set 'then' to the number of ticks elapsed 
> from boot, while 'now = time(then)' would set 'now' the ticks elapsed 
> from 'then'; and a 'ms_to_ticks(x)' (again, or 'milliseconds(x)') :

Do we really need such a function?  As far as I can tell we don't
really have any time (in the sense of wallclock time) related
requirements, we only need delta times, and nobody really cares about
the starting point.

We should _always_ be able to use the standard approach of

	start = get_timer()

	while ((get_timer() - start) < timeout) {
		... wait ...
	}

> Your example would then become
> 
> 	then = time(0);
> 	do {...} while ( time(then) < ms_to_ticks(100) );

We should NOT do this.  It is bound to break as soon as the code 
in the loop (the "..." part) needs to implement a timeout, too.

> 	#define now() time(0)
>   	#define ms_elapsed(then,ms) ( time(then) < ms_to_ticks(ms) )

This is not a good isea for the same reason.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
There has been an alarming increase in the number of things you  know
nothing about.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-22 19:19 ` [U-Boot] [RFC] ARM timing code refactoring Wolfgang Denk
@ 2011-01-22 20:17   ` Albert ARIBAUD
  2011-01-22 21:26     ` Wolfgang Denk
  0 siblings, 1 reply; 31+ messages in thread
From: Albert ARIBAUD @ 2011-01-22 20:17 UTC (permalink / raw)
  To: u-boot

Hi Wolfgang,

Le 22/01/2011 20:19, Wolfgang Denk a ?crit :

> Dear Albert ARIBAUD,

>> Agreed for unnecessary mult-div, but 64-bit we would not avoid, and
>> should not IMO, when the HW has it.
>
> When attempting to come up with true generic code, we should probably
> _always_ use a (virtual) unsigned 64 bit counter.

That's fine with me. We'll just have to keep in mind that with a 32-bit 
HW counter, the upper 32 bits of the 64-bit virtual counter must be 
managed by SW -- which should not prove to be too much of a problem, but 
will force us to handle HW-to-SW rollovers in get_timer() on each 
virtual timer reads.

>>> If the tick were really high speed (and then 64 bits), fast_tick
>>> could be derived by shifting the tick some bits to the right.
>>
>> The only thing I slightly dislike about the overall idea is the signed
>> rather than unsigned comparison in the timeout function (I prefer using
>> the full 32-bit range, even if only as an academic point) and the fact
>> that the value of the timeout is encoded in advance in the loop control
>> variable 'timeout'.
>
> Please don't.  Use an unsigned counter and allow it to roll over.
> Anything else causes additional (and unnecessary) restrictions and
> makes the code more complicated.

Hmm... don't we actually agree here? I was also expressing my preference 
for unsigned.

>> I'd rather have a single 'time(x)' (or 'ticks_elapsed(x)', names are
>> negotiable) macro which subtract its argument from the current ticks,
>
> I'm not sure what you have in mind with "substract".  I strongly
> recommend to avoid problems with wrap-around etc. right from the
> beginning, and let unsigned arithmentcs handle this for us.

That's what I meant with my comment about subtracting: the difference of 
two unsigned integers is always correct.

>> e.g. 'then = time(0)' would set 'then' to the number of ticks elapsed
>> from boot, while 'now = time(then)' would set 'now' the ticks elapsed
>> from 'then'; and a 'ms_to_ticks(x)' (again, or 'milliseconds(x)') :
>
> Do we really need such a function?  As far as I can tell we don't
> really have any time (in the sense of wallclock time) related
> requirements, we only need delta times, and nobody really cares about
> the starting point.

Hmm... My idea with providing time() with an argument was that precisely 
since we are interested only in elapsed time, not absolute time, our 
basic time function should be able to tell us relative times.

> We should _always_ be able to use the standard approach of
>
> 	start = get_timer()
>
> 	while ((get_timer() - start)<  timeout) {
> suall		... wait ...
> 	}

Functionally this is the same as what I suggested, minus the absolute 
get_timer() vs relative time(x) call. Either way works, I'm not going to 
try and make a point that we use "my" time(x).

We still need ms-to-tick and us-to-tick conversions, though, because 
usually timing constraints will be expressed in us or ms, not in ticks.

Just one thing, however:

>> Your example would then become
>>
>> 	then = time(0);
>> 	do {...} while ( time(then)<  ms_to_ticks(100) );
>
> We should NOT do this.  It is bound to break as soon as the code
> in the loop (the "..." part) needs to implement a timeout, too.

I'm not sure to understand why. Can you develop how you think this would 
break with an inner timeout?

>> 	#define now() time(0)
>>    	#define ms_elapsed(then,ms) ( time(then)<  ms_to_ticks(ms) )
>
> This is not a good isea for the same reason.

Those two were just syntactic sugaring anyway; if you don't like them, 
no worries.

> Best regards,
>
> Wolfgang Denk

Amicalement,
-- 
Albert.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-22 20:17   ` Albert ARIBAUD
@ 2011-01-22 21:26     ` Wolfgang Denk
  2011-01-22 21:51       ` Reinhard Meyer
  2011-01-22 22:13       ` Albert ARIBAUD
  0 siblings, 2 replies; 31+ messages in thread
From: Wolfgang Denk @ 2011-01-22 21:26 UTC (permalink / raw)
  To: u-boot

Dear Albert ARIBAUD,

In message <4D3B3B5C.2060205@free.fr> you wrote:
> 
> > When attempting to come up with true generic code, we should probably
> > _always_ use a (virtual) unsigned 64 bit counter.
> 
> That's fine with me. We'll just have to keep in mind that with a 32-bit 
> HW counter, the upper 32 bits of the 64-bit virtual counter must be 
> managed by SW -- which should not prove to be too much of a problem, but 
> will force us to handle HW-to-SW rollovers in get_timer() on each 
> virtual timer reads.

Yes.

> >> The only thing I slightly dislike about the overall idea is the signed
> >> rather than unsigned comparison in the timeout function (I prefer using
> >> the full 32-bit range, even if only as an academic point) and the fact
> >> that the value of the timeout is encoded in advance in the loop control
> >> variable 'timeout'.
> >
> > Please don't.  Use an unsigned counter and allow it to roll over.
> > Anything else causes additional (and unnecessary) restrictions and
> > makes the code more complicated.
> 
> Hmm... don't we actually agree here? I was also expressing my preference 
> for unsigned.

I'm happy if we agree. I just wan't sure.

> > I'm not sure what you have in mind with "substract".  I strongly
> > recommend to avoid problems with wrap-around etc. right from the
> > beginning, and let unsigned arithmentcs handle this for us.
> 
> That's what I meant with my comment about subtracting: the difference of 
> two unsigned integers is always correct.

Good.

> >> e.g. 'then = time(0)' would set 'then' to the number of ticks elapsed
> >> from boot, while 'now = time(then)' would set 'now' the ticks elapsed
> >> from 'then'; and a 'ms_to_ticks(x)' (again, or 'milliseconds(x)') :
> >
> > Do we really need such a function?  As far as I can tell we don't
> > really have any time (in the sense of wallclock time) related
> > requirements, we only need delta times, and nobody really cares about
> > the starting point.
> 
> Hmm... My idea with providing time() with an argument was that precisely 
> since we are interested only in elapsed time, not absolute time, our 
> basic time function should be able to tell us relative times.

The disadvantage of this approach is that such calls cannot be nested,
i. e. you must always make sure that the code run within the timeout
loop does not attempt to set up a timeout on it's own.  From my point
of view, this is a killing point.

> Functionally this is the same as what I suggested, minus the absolute 
> get_timer() vs relative time(x) call. Either way works, I'm not going to 
> try and make a point that we use "my" time(x).

See above for the fundamental difference - not in the implementation
for a single timeout, but from a system-wide point of view.

> >> 	then = time(0);
> >> 	do {...} while ( time(then)<  ms_to_ticks(100) );
> >
> > We should NOT do this.  It is bound to break as soon as the code
> > in the loop (the "..." part) needs to implement a timeout, too.
> 
> I'm not sure to understand why. Can you develop how you think this would 
> break with an inner timeout?

Sure:

	/* implement something which needs a 100 ms timeout */
	then = time(0);

	do {
		int then_nested;
		... do something...
		... do more...
		/* now do something which needs a 5 ms timeout */
		then_nested = time(0);
		do {
			...
		} while (time(then_nested) < ms_to_ticks(5));

	} while (time(then) < ms_to_ticks(100));

You see the problem?


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Remember, there's a big difference between kneeling down and  bending
over.                                                   - Frank Zappa

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-22 21:26     ` Wolfgang Denk
@ 2011-01-22 21:51       ` Reinhard Meyer
  2011-01-23 10:12         ` Wolfgang Denk
  2011-01-22 22:13       ` Albert ARIBAUD
  1 sibling, 1 reply; 31+ messages in thread
From: Reinhard Meyer @ 2011-01-22 21:51 UTC (permalink / raw)
  To: u-boot

Dear Wolfgang Denk,
> Dear Albert ARIBAUD,

With all this half quoting and deleting of important parts,
my original proposal was lost again.

If you really care to look at it, it

1. does not have issues with rollover
2. does not have problems with nested timeouts
3. does 64 bit mul/div calculations only once at the begin of a timeout
4. is easy and straightforward to use
5. needs very few lines of code to implement

Calculating the end time at the begin of a loop is not a problem,
I am not aware of any timeout loops where the timeout value needs to be changed
inside the loop.

Reinhard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-22 21:26     ` Wolfgang Denk
  2011-01-22 21:51       ` Reinhard Meyer
@ 2011-01-22 22:13       ` Albert ARIBAUD
  2011-01-23 16:15         ` Wolfgang Denk
  1 sibling, 1 reply; 31+ messages in thread
From: Albert ARIBAUD @ 2011-01-22 22:13 UTC (permalink / raw)
  To: u-boot

Le 22/01/2011 22:26, Wolfgang Denk a ?crit :

>> Hmm... My idea with providing time() with an argument was that precisely
>> since we are interested only in elapsed time, not absolute time, our
>> basic time function should be able to tell us relative times.
>
> The disadvantage of this approach is that such calls cannot be nested,
> i. e. you must always make sure that the code run within the timeout
> loop does not attempt to set up a timeout on it's own.  From my point
> of view, this is a killing point.
>
>> Functionally this is the same as what I suggested, minus the absolute
>> get_timer() vs relative time(x) call. Either way works, I'm not going to
>> try and make a point that we use "my" time(x).
>
> See above for the fundamental difference - not in the implementation
> for a single timeout, but from a system-wide point of view.
>
>>>> 	then = time(0);
>>>> 	do {...} while ( time(then)<   ms_to_ticks(100) );
>>>
>>> We should NOT do this.  It is bound to break as soon as the code
>>> in the loop (the "..." part) needs to implement a timeout, too.
>>
>> I'm not sure to understand why. Can you develop how you think this would
>> break with an inner timeout?
>
> Sure:
>
> 	/* implement something which needs a 100 ms timeout */
> 	then = time(0);
>
> 	do {
> 		int then_nested;
> 		... do something...
> 		... do more...
> 		/* now do something which needs a 5 ms timeout */
> 		then_nested = time(0);
> 		do {
> 			...
> 		} while (time(then_nested)<  ms_to_ticks(5));
>
> 	} while (time(then)<  ms_to_ticks(100));
>
> You see the problem?

Actually no, I don't. As a reminder, I am considering the following 
definitions:

	#define time(n) (ticks-n)
	#define ms_to_ticks(ms) (ms * fast_tick_rate) / CONFIG_SYS_HZ

Neither of these has any side effect, so I am at loss as to why that 
would break when used in nested loops; each loop has its own reference 
start time by assigning time(0) to its own variable (then and 
then_nested), and each has its own elapsed time computation by passing 
its own variable to time() and comparing with its own constant timeout 
value. Can you point the exact instruction that would break in the code 
above, and why it would?

> Best regards,
>
> Wolfgang Denk

Amicalement,
-- 
Albert.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-22 21:51       ` Reinhard Meyer
@ 2011-01-23 10:12         ` Wolfgang Denk
  2011-01-23 10:26           ` Reinhard Meyer
  0 siblings, 1 reply; 31+ messages in thread
From: Wolfgang Denk @ 2011-01-23 10:12 UTC (permalink / raw)
  To: u-boot

Dear Reinhard Meyer,

In message <4D3B5171.7090700@emk-elektronik.de> you wrote:
> 
> With all this half quoting and deleting of important parts,
> my original proposal was lost again.

This is a prettyu long running thread, and I am not exactly sure what
your original proposal actually was.  Could you please post a
reference?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
"Out of register space (ugh)"
- vi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-23 10:12         ` Wolfgang Denk
@ 2011-01-23 10:26           ` Reinhard Meyer
  2011-01-23 16:23             ` Wolfgang Denk
  0 siblings, 1 reply; 31+ messages in thread
From: Reinhard Meyer @ 2011-01-23 10:26 UTC (permalink / raw)
  To: u-boot

Dear Wolfgang Denk,
> In message<4D3B5171.7090700@emk-elektronik.de>  you wrote:
>>
>> With all this half quoting and deleting of important parts,
>> my original proposal was lost again.
>
> This is a prettyu long running thread, and I am not exactly sure what
> your original proposal actually was.  Could you please post a
> reference?
I just repost here:

There were several suggestions about that in the past (including from me)
that involve rework everywhere HZ related timeouts are used. I still
prefer a method as follows (because it does not need repeated mul/div calculations
nor necessarily 64 bit arithmetic):

u32 timeout = timeout_init(100); /* 100ms timeout */

do {...} while (!timed_out(timeout));

Internally it would be like:

timeout_init(x):
   return fast_tick + (x * fast_tick_rate) / CONFIG_SYS_HZ;
   /* this might need 64 bit precision in some implementations */

time_out(x):
   return ((i32)(x - fast_tick)) < 0;

If the tick were really high speed (and then 64 bits),
fast_tick could be derived by shifting the tick some bits to the right.

But, as long as we cannot agree on something, there will be no time spent
to make patches...

Best Regards,
Reinhard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-22 22:13       ` Albert ARIBAUD
@ 2011-01-23 16:15         ` Wolfgang Denk
  0 siblings, 0 replies; 31+ messages in thread
From: Wolfgang Denk @ 2011-01-23 16:15 UTC (permalink / raw)
  To: u-boot

Dear Albert ARIBAUD,

In message <4D3B568D.3080903@free.fr> you wrote:
> 
> > You see the problem?
> 
> Actually no, I don't. As a reminder, I am considering the following 
> definitions:
> 
> 	#define time(n) (ticks-n)
> 	#define ms_to_ticks(ms) (ms * fast_tick_rate) / CONFIG_SYS_HZ

Ah, I see. Well, I have to admit that I did not see this part; but
then, it is still similar to what the current get_timer(t) call is
doing (at least in it's reference implementation on PowerPC, cf.
arch/powerpc/lib/interrupts.c), except that no set_timer(t) is
provided.

In acy case - I vote to get rid of this funktion to set some "time
base value", and provide a plain void call interface to get_timer()
[or whatever the corresponding function would be called].

Also, I vote to always explicitly write the timeout loop as I gave in
the example, because this makes the meachnism clear to the reader of
the code - hiding it in macros or functions may only lead to conusions
or different (and broken) implementations across architectures.

> Neither of these has any side effect, so I am at loss as to why that 
> would break when used in nested loops; each loop has its own reference 

My reading was that "time(0)" would perform some reset of a variable
to give a wrap-around free starting point. Sorry for not reading all
of the previous context.

> start time by assigning time(0) to its own variable (then and 
> then_nested), and each has its own elapsed time computation by passing 
> its own variable to time() and comparing with its own constant timeout 

I don't see any use of the extra argument to the time() function /
macro, nor any need / use for such a function / macro at all.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
"Faith: not *wanting* to know what is true."    - Friedrich Nietzsche

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-23 10:26           ` Reinhard Meyer
@ 2011-01-23 16:23             ` Wolfgang Denk
  2011-01-23 18:47               ` Reinhard Meyer
  0 siblings, 1 reply; 31+ messages in thread
From: Wolfgang Denk @ 2011-01-23 16:23 UTC (permalink / raw)
  To: u-boot

Dear Reinhard Meyer,

In message <4D3C0271.4070306@emk-elektronik.de> you wrote:
>
> There were several suggestions about that in the past (including from me)
> that involve rework everywhere HZ related timeouts are used. I still
> prefer a method as follows (because it does not need repeated mul/div calculations
> nor necessarily 64 bit arithmetic):
> 
> u32 timeout = timeout_init(100); /* 100ms timeout */
> 
> do {...} while (!timed_out(timeout));

I dislike this approach. I immediately fear the same problem I just
saw (incorrectly) in Albert's proposal - timeout_init() seems to
store the timeouut information in some internal varoable, which is
then checked by timed_out() - this is bound to fail as soon as
somebody atttempts to nest timeouts.

Your implementation may be different, but you can bet sooner or later
comes up with such a bugy implementation.

And it is not needed.

PLease see my proposal: we do not needs several timer or timeout
related functions, all we need is a plain "get timer" function,
without any arguments.  And the resulting code makes it obvious to the
reader that such loops can be nested as you like.

> time_out(x):
>    return ((i32)(x - fast_tick)) < 0;
> 
> If the tick were really high speed (and then 64 bits),
> fast_tick could be derived by shifting the tick some bits to the right.

I have no idea what "fast_tick" versus "tick" means here, nor why we
would need more than one tick.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
As a general rule, the freedom of any people can  be  judged  by  the
volume of their laughter.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-23 16:23             ` Wolfgang Denk
@ 2011-01-23 18:47               ` Reinhard Meyer
  2011-01-23 19:35                 ` Wolfgang Denk
  0 siblings, 1 reply; 31+ messages in thread
From: Reinhard Meyer @ 2011-01-23 18:47 UTC (permalink / raw)
  To: u-boot

Dear Wolfgang Denk,
> Dear Reinhard Meyer,
>
> In message<4D3C0271.4070306@emk-elektronik.de>  you wrote:
>>
>> There were several suggestions about that in the past (including from me)
>> that involve rework everywhere HZ related timeouts are used. I still
>> prefer a method as follows (because it does not need repeated mul/div calculations
>> nor necessarily 64 bit arithmetic):
>>
>> u32 timeout = timeout_init(100); /* 100ms timeout */
>>
>> do {...} while (!timed_out(timeout));
>
> I dislike this approach. I immediately fear the same problem I just
> saw (incorrectly) in Albert's proposal - timeout_init() seems to
> store the timeouut information in some internal varoable, which is
> then checked by timed_out() - this is bound to fail as soon as
> somebody atttempts to nest timeouts.

Excuse me: the timeout info is stored in the user's variable "timeout",
that should be quite obvious!

Nested timeouts pose no problem as long as the user uses different (local)
vars to store the return value of timeout_init().

>
> Your implementation may be different, but you can bet sooner or later
> comes up with such a bugy implementation.

There is nothing buggy.

>
> And it is not needed.
>
> PLease see my proposal: we do not needs several timer or timeout
> related functions, all we need is a plain "get timer" function,
> without any arguments.  And the resulting code makes it obvious to the
> reader that such loops can be nested as you like.
>

If you demand that this get_timer returns in HZ units,
that will not be possible on most hardware without complicated code.
We have discussed that long ago...

>> time_out(x):
>>     return ((i32)(x - fast_tick))<  0;
>>
>> If the tick were really high speed (and then 64 bits),
>> fast_tick could be derived by shifting the tick some bits to the right.
>
> I have no idea what "fast_tick" versus "tick" means here, nor why we
> would need more than one tick.

Well, you could try to understand:
tick=the "at hardware speed running" timer, if that's incrementing too fast for
32 bit "timeout" vars for reasonable timeouts (up to a minute?),
it could be divided by a power of two before use,
or we plainly use u64 for the "timeout" variables; probably simpler than shifting...

Reinhard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-23 18:47               ` Reinhard Meyer
@ 2011-01-23 19:35                 ` Wolfgang Denk
  2011-01-23 20:59                   ` Albert ARIBAUD
  0 siblings, 1 reply; 31+ messages in thread
From: Wolfgang Denk @ 2011-01-23 19:35 UTC (permalink / raw)
  To: u-boot

Dear Reinhard Meyer,

In message <4D3C77BC.50006@emk-elektronik.de> you wrote:
>
> >> u32 timeout = timeout_init(100); /* 100ms timeout */
> >>
> >> do {...} while (!timed_out(timeout));
> >
> > I dislike this approach. I immediately fear the same problem I just
> > saw (incorrectly) in Albert's proposal - timeout_init() seems to
> > store the timeouut information in some internal varoable, which is
> > then checked by timed_out() - this is bound to fail as soon as
> > somebody atttempts to nest timeouts.
> 
> Excuse me: the timeout info is stored in the user's variable "timeout",
> that should be quite obvious!

Is it? Why?  It could be as well that timeout_init(100) keeps some
internal information about an initialized timeout - the name suggests
it.

> Nested timeouts pose no problem as long as the user uses different (local)
> vars to store the return value of timeout_init().

Yes, but you need to see the implementation of timeout_init() and
timed_out() to make sure.

> > Your implementation may be different, but you can bet sooner or later
> > comes up with such a bugy implementation.
> 
> There is nothing buggy.

Not in your proposal - but the interface allows (or even invites) to
come up with incorrect implementations, and history has shown that we
don't catch all of these.  So let's chose an interface which has no
such problems.

> > PLease see my proposal: we do not needs several timer or timeout
> > related functions, all we need is a plain "get timer" function,
> > without any arguments.  And the resulting code makes it obvious to the
> > reader that such loops can be nested as you like.
> 
> If you demand that this get_timer returns in HZ units,

If we agree on get_timer(), then indeed the resolution shall ne
milliseconds, as this is how this function is defined.

On the other hand, milliseconds are too long for some types of
timeouts, so for short timeouts we would use get_ticks().

For reference, please see the PPC implementation (in
arch/powerpc/lib/ticks.S, arch/powerpc/lib/time.c,
arch/powerpc/lib/interrupts.c).

At the moment I would suggest to change the existing interface like
that:

* Drop the set_timer() function.

* Change get_timer() to take no argument, i. e.:

	unsigned long get_timer(void);

  get_timer() returns a monotonous upward counting time stamp with a
  resolution of milliseconds. After reaching ULONG_MAX the timer wraps
  around to 0.

  The get_timer() implementation may be interrupt based and is only
  available after relocation.

* Provide a fast, low-level, system dependent timer function

	unsigned long long get_ticks(void);

  get_ticks() returns a monotonous upward counting time stamp with a
  system-specific resolution. No assumptions should be made about the
  resolution. After reaching ULLONG_MAX the timer wraps around to 0.

  It is mandatory that get_ticks() is available before relocation.

* Provide a set of utility functions:

  ->	void wait_ticks(unsigned long ticks);

  Delay execution for "ticks" ticks.

  ->	unsigned long usec2ticks(unsigned long usec);

  Convert microseconds into ticks; intended for SHORT delays only
  (maximum depending on system clock, usually below 1 second).

  ->	void __udelay(unsigned long usec);

  Delay execution for "usec" microseconds; intended for SHORT delays
  only (maximum depending on system clock, usually below 1 second).
  If all architectures followed the above suggestion, we could move
  the PPC implementation to common code:

  	void __udelay(unsigned long usec) 
	{
		ulong ticks = usec2ticks(usec);
		wait_ticks(ticks); 
	}

  __udelay() can reliably be used before relocation.

  ->	void udelay(unsigned long usec)

  Similar to __udelay() with the additional functionality to trigger
  the watchdog timer for long delays.



> that will not be possible on most hardware without complicated code.
> We have discussed that long ago...

I am aware of this.

> Well, you could try to understand:
> tick=the "at hardware speed running" timer, if that's incrementing too fast for
> 32 bit "timeout" vars for reasonable timeouts (up to a minute?),

See above.  For short, high resolution timeouts you can use
get_ticks() and friends.  For long delays you can use get_timer().

Note that "reasonable timeouts (up to a minute?)" are only very
infrequently needed, and don't need the high resolution of
get_ticks(), so these would naturally be implemented on the base of
get_timer().


We have been using this implementation for more than a decade on
PowerPC.  The only thing you need is a monotonous upward counting
64 bit "time base" counter where you can read the system ticks from.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
"Just think of a computer as hardware you can program."
- Nigel de la Tierre

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-23 19:35                 ` Wolfgang Denk
@ 2011-01-23 20:59                   ` Albert ARIBAUD
  2011-01-23 21:22                     ` Reinhard Meyer
  0 siblings, 1 reply; 31+ messages in thread
From: Albert ARIBAUD @ 2011-01-23 20:59 UTC (permalink / raw)
  To: u-boot

Le 23/01/2011 20:35, Wolfgang Denk a ?crit :

> At the moment I would suggest to change the existing interface like
> that:
>
> * Drop the set_timer() function.
>
> * Change get_timer() to take no argument, i. e.:
>
> 	unsigned long get_timer(void);
>
>    get_timer() returns a monotonous upward counting time stamp with a
>    resolution of milliseconds. After reaching ULONG_MAX the timer wraps
>    around to 0.
>
>    The get_timer() implementation may be interrupt based and is only
>    available after relocation.
>
> * Provide a fast, low-level, system dependent timer function
>
> 	unsigned long long get_ticks(void);
>
>    get_ticks() returns a monotonous upward counting time stamp with a
>    system-specific resolution. No assumptions should be made about the
>    resolution. After reaching ULLONG_MAX the timer wraps around to 0.
>
>    It is mandatory that get_ticks() is available before relocation.
>
> * Provide a set of utility functions:
>
>    ->	void wait_ticks(unsigned long ticks);
>
>    Delay execution for "ticks" ticks.
>
>    ->	unsigned long usec2ticks(unsigned long usec);
>
>    Convert microseconds into ticks; intended for SHORT delays only
>    (maximum depending on system clock, usually below 1 second).
>
>    ->	void __udelay(unsigned long usec);
>
>    Delay execution for "usec" microseconds; intended for SHORT delays
>    only (maximum depending on system clock, usually below 1 second).
>    If all architectures followed the above suggestion, we could move
>    the PPC implementation to common code:
>
>    	void __udelay(unsigned long usec)
> 	{
> 		ulong ticks = usec2ticks(usec);
> 		wait_ticks(ticks);
> 	}
>
>    __udelay() can reliably be used before relocation.
>
>    ->	void udelay(unsigned long usec)
>
>    Similar to __udelay() with the additional functionality to trigger
>    the watchdog timer for long delays.
>
>
>
>> that will not be possible on most hardware without complicated code.
>> We have discussed that long ago...
>
> I am aware of this.
>
>> Well, you could try to understand:
>> tick=the "at hardware speed running" timer, if that's incrementing too fast for
>> 32 bit "timeout" vars for reasonable timeouts (up to a minute?),
>
> See above.  For short, high resolution timeouts you can use
> get_ticks() and friends.  For long delays you can use get_timer().
>
> Note that "reasonable timeouts (up to a minute?)" are only very
> infrequently needed, and don't need the high resolution of
> get_ticks(), so these would naturally be implemented on the base of
> get_timer().
>
>
> We have been using this implementation for more than a decade on
> PowerPC.  The only thing you need is a monotonous upward counting
> 64 bit "time base" counter where you can read the system ticks from.
>
> Best regards,
>
> Wolfgang Denk

This proposal covers what I was thinking of (oubviously I had not looked 
into PPC implementations) and the few differences with my proposal are 
not worth fighting over, so overall I am fine with the above.

Let us hear from others now, and if we reach an agreement, then we'll 
start discussing implementation.

Amicalement,
-- 
Albert.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-23 20:59                   ` Albert ARIBAUD
@ 2011-01-23 21:22                     ` Reinhard Meyer
  2011-01-23 22:01                       ` Reinhard Meyer
  2011-01-23 22:57                       ` Wolfgang Denk
  0 siblings, 2 replies; 31+ messages in thread
From: Reinhard Meyer @ 2011-01-23 21:22 UTC (permalink / raw)
  To: u-boot

On 23.01.2011 21:59, Albert ARIBAUD wrote:
> Le 23/01/2011 20:35, Wolfgang Denk a ?crit :
>
>> At the moment I would suggest to change the existing interface like
>> that:
>>
>> * Drop the set_timer() function.
>>
>> * Change get_timer() to take no argument, i. e.:
>>
>> unsigned long get_timer(void);
>>
>> get_timer() returns a monotonous upward counting time stamp with a
>> resolution of milliseconds. After reaching ULONG_MAX the timer wraps
>> around to 0.

Exactly that wrap makes the situation so complicated, since the simple code
u32 get_timer(void)
{
  return (ticks * 1000ULL) / tickspersec;
}
won't do that wrap.

>>
>> The get_timer() implementation may be interrupt based and is only
>> available after relocation.

Currently it is used before relocation in some places, I think I have
seen it in NAND drivers... That would have to be changed then.

>>
>> * Provide a fast, low-level, system dependent timer function
>>
>> unsigned long long get_ticks(void);
>>
>> get_ticks() returns a monotonous upward counting time stamp with a
>> system-specific resolution. No assumptions should be made about the
>> resolution. After reaching ULLONG_MAX the timer wraps around to 0.
>>
>> It is mandatory that get_ticks() is available before relocation.
>>
>> * Provide a set of utility functions:
>>
>> -> void wait_ticks(unsigned long ticks);
>>
>> Delay execution for "ticks" ticks.
>>
>> -> unsigned long usec2ticks(unsigned long usec);
>>
>> Convert microseconds into ticks; intended for SHORT delays only
>> (maximum depending on system clock, usually below 1 second).
>>
>> -> void __udelay(unsigned long usec);
>>
>> Delay execution for "usec" microseconds; intended for SHORT delays
>> only (maximum depending on system clock, usually below 1 second).
>> If all architectures followed the above suggestion, we could move
>> the PPC implementation to common code:
>>
>> void __udelay(unsigned long usec)
>> {
>> ulong ticks = usec2ticks(usec);
>> wait_ticks(ticks);
>> }
>>
>> __udelay() can reliably be used before relocation.
>>
>> -> void udelay(unsigned long usec)
>>
>> Similar to __udelay() with the additional functionality to trigger
>> the watchdog timer for long delays.
>>
>>
>>
>>> that will not be possible on most hardware without complicated code.
>>> We have discussed that long ago...
>>
>> I am aware of this.
>>
>>> Well, you could try to understand:
>>> tick=the "at hardware speed running" timer, if that's incrementing too fast for
>>> 32 bit "timeout" vars for reasonable timeouts (up to a minute?),
>>
>> See above. For short, high resolution timeouts you can use
>> get_ticks() and friends. For long delays you can use get_timer().
>>
>> Note that "reasonable timeouts (up to a minute?)" are only very
>> infrequently needed, and don't need the high resolution of
>> get_ticks(), so these would naturally be implemented on the base of
>> get_timer().
>>
>>
>> We have been using this implementation for more than a decade on
>> PowerPC. The only thing you need is a monotonous upward counting
>> 64 bit "time base" counter where you can read the system ticks from.
>>
>> Best regards,
>>
>> Wolfgang Denk
>
> This proposal covers what I was thinking of (oubviously I had not looked into PPC implementations) and the few differences with my proposal are not worth fighting over, so overall I am fine with the above.
>
> Let us hear from others now, and if we reach an agreement, then we'll start discussing implementation.
>
> Amicalement,

This is already implemented functionally very closely (apart from factoring and the
get_timer(void) change) to this in AT91, the only (academic) hitch is that it will
burp a few billion years after each reset :)

Check arch/arm/cpu/arm926ejs/at91/timer.c

What bothers me is the need for 64 bit mul/div in each loop iteration, for CPUs without
hardware for that this might slow down data transfer loops of the style

u32 start_time = get_timer();
do {
	if ("data_ready")
		/* transfer a byte */
	if (get_timer() - start_time > timeout)
		/* fail and exit loop */
} while (--"bytestodo" > 0);

since get_timer() will be somewhat like:

	return (tick * 1000ULL) / tickspersec;

As I stated before, tickspersec is a variable in, for example, AT91. So the
expression cannot be optimized by the compiler.

Reinhard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-23 21:22                     ` Reinhard Meyer
@ 2011-01-23 22:01                       ` Reinhard Meyer
  2011-01-23 22:57                       ` Wolfgang Denk
  1 sibling, 0 replies; 31+ messages in thread
From: Reinhard Meyer @ 2011-01-23 22:01 UTC (permalink / raw)
  To: u-boot

On 23.01.2011 22:22, Reinhard Meyer wrote:
> On 23.01.2011 21:59, Albert ARIBAUD wrote:
>> Le 23/01/2011 20:35, Wolfgang Denk a ?crit :
>>
>>> At the moment I would suggest to change the existing interface like
>>> that:
>>>
>>> * Drop the set_timer() function.
>>>
>>> * Change get_timer() to take no argument, i. e.:
>>>
>>> unsigned long get_timer(void);
>>>
>>> get_timer() returns a monotonous upward counting time stamp with a
>>> resolution of milliseconds. After reaching ULONG_MAX the timer wraps
>>> around to 0.
>
> Exactly that wrap makes the situation so complicated, since the simple code
> u32 get_timer(void)
> {
>    return (ticks * 1000ULL) / tickspersec;
> }
> won't do that wrap.
>
>>>
>>> The get_timer() implementation may be interrupt based and is only
>>> available after relocation.
>
> Currently it is used before relocation in some places, I think I have
> seen it in NAND drivers... That would have to be changed then.
>
>>>
>>> * Provide a fast, low-level, system dependent timer function
>>>
>>> unsigned long long get_ticks(void);
>>>
>>> get_ticks() returns a monotonous upward counting time stamp with a
>>> system-specific resolution. No assumptions should be made about the
>>> resolution. After reaching ULLONG_MAX the timer wraps around to 0.
>>>
>>> It is mandatory that get_ticks() is available before relocation.
>>>
>>> * Provide a set of utility functions:
>>>
>>> ->  void wait_ticks(unsigned long ticks);
>>>
>>> Delay execution for "ticks" ticks.
>>>
>>> ->  unsigned long usec2ticks(unsigned long usec);
>>>
>>> Convert microseconds into ticks; intended for SHORT delays only
>>> (maximum depending on system clock, usually below 1 second).
>>>
>>> ->  void __udelay(unsigned long usec);
>>>
>>> Delay execution for "usec" microseconds; intended for SHORT delays
>>> only (maximum depending on system clock, usually below 1 second).
>>> If all architectures followed the above suggestion, we could move
>>> the PPC implementation to common code:
>>>
>>> void __udelay(unsigned long usec)
>>> {
>>> ulong ticks = usec2ticks(usec);
>>> wait_ticks(ticks);
>>> }
>>>
>>> __udelay() can reliably be used before relocation.
>>>
>>> ->  void udelay(unsigned long usec)
>>>
>>> Similar to __udelay() with the additional functionality to trigger
>>> the watchdog timer for long delays.
>>>
>>>
>>>
>>>> that will not be possible on most hardware without complicated code.
>>>> We have discussed that long ago...
>>>
>>> I am aware of this.
>>>
>>>> Well, you could try to understand:
>>>> tick=the "at hardware speed running" timer, if that's incrementing too fast for
>>>> 32 bit "timeout" vars for reasonable timeouts (up to a minute?),
>>>
>>> See above. For short, high resolution timeouts you can use
>>> get_ticks() and friends. For long delays you can use get_timer().
>>>
>>> Note that "reasonable timeouts (up to a minute?)" are only very
>>> infrequently needed, and don't need the high resolution of
>>> get_ticks(), so these would naturally be implemented on the base of
>>> get_timer().
>>>
>>>
>>> We have been using this implementation for more than a decade on
>>> PowerPC. The only thing you need is a monotonous upward counting
>>> 64 bit "time base" counter where you can read the system ticks from.
>>>
>>> Best regards,
>>>
>>> Wolfgang Denk
>>
>> This proposal covers what I was thinking of (oubviously I had not looked into PPC implementations) and the few differences with my proposal are not worth fighting over, so overall I am fine with the above.
>>
>> Let us hear from others now, and if we reach an agreement, then we'll start discussing implementation.
>>
>> Amicalement,
>
> This is already implemented functionally very closely (apart from factoring and the
> get_timer(void) change) to this in AT91, the only (academic) hitch is that it will
> burp a few billion years after each reset :)
>
> Check arch/arm/cpu/arm926ejs/at91/timer.c

look at u-boot-atmel.git, rework101229 branch, this reworked version already is minus the
reset_timer() function that is not needed anymore

>
> What bothers me is the need for 64 bit mul/div in each loop iteration, for CPUs without
> hardware for that this might slow down data transfer loops of the style
>
> u32 start_time = get_timer();
> do {
> 	if ("data_ready")
> 		/* transfer a byte */
> 	if (get_timer() - start_time>  timeout)
> 		/* fail and exit loop */
> } while (--"bytestodo">  0);
>
> since get_timer() will be somewhat like:
>
> 	return (tick * 1000ULL) / tickspersec;
>
> As I stated before, tickspersec is a variable in, for example, AT91. So the
> expression cannot be optimized by the compiler.
>
> Reinhard
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-23 21:22                     ` Reinhard Meyer
  2011-01-23 22:01                       ` Reinhard Meyer
@ 2011-01-23 22:57                       ` Wolfgang Denk
  2011-01-24  1:42                         ` J. William Campbell
  1 sibling, 1 reply; 31+ messages in thread
From: Wolfgang Denk @ 2011-01-23 22:57 UTC (permalink / raw)
  To: u-boot

Dear Reinhard Meyer,

In message <4D3C9BFC.1010907@emk-elektronik.de> you wrote:
>
> >> get_timer() returns a monotonous upward counting time stamp with a
> >> resolution of milliseconds. After reaching ULONG_MAX the timer wraps
> >> around to 0.
> 
> Exactly that wrap makes the situation so complicated, since the simple code
> u32 get_timer(void)
> {
>   return (ticks * 1000ULL) / tickspersec;
> }
> won't do that wrap.

Do you have a better suggestion?

> >> The get_timer() implementation may be interrupt based and is only
> >> available after relocation.
> 
> Currently it is used before relocation in some places, I think I have
> seen it in NAND drivers... That would have to be changed then.

Indeed.  It is unreliable or even broken now.

> This is already implemented functionally very closely (apart from factoring and the
> get_timer(void) change) to this in AT91, the only (academic) hitch is that it will
> burp a few billion years after each reset :)

> What bothers me is the need for 64 bit mul/div in each loop iteration, for CPUs without
> hardware for that this might slow down data transfer loops of the style
> 
> u32 start_time = get_timer();
> do {
> 	if ("data_ready")
> 		/* transfer a byte */
> 	if (get_timer() - start_time > timeout)
> 		/* fail and exit loop */
> } while (--"bytestodo" > 0);
> 
> since get_timer() will be somewhat like:
> 
> 	return (tick * 1000ULL) / tickspersec;
> 
> As I stated before, tickspersec is a variable in, for example, AT91. So the
> expression cannot be optimized by the compiler.

I don't think this is the only way to implement this. How does Linux
derive time info from jiffies?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
You get a wonderful view from the point of no return.
                                    - Terry Pratchett, _Making_Money_

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-23 22:57                       ` Wolfgang Denk
@ 2011-01-24  1:42                         ` J. William Campbell
  2011-01-24  7:24                           ` Albert ARIBAUD
  0 siblings, 1 reply; 31+ messages in thread
From: J. William Campbell @ 2011-01-24  1:42 UTC (permalink / raw)
  To: u-boot

On 1/23/2011 2:57 PM, Wolfgang Denk wrote:
> Dear Reinhard Meyer,
>
> In message<4D3C9BFC.1010907@emk-elektronik.de>  you wrote:
>>>> get_timer() returns a monotonous upward counting time stamp with a
>>>> resolution of milliseconds. After reaching ULONG_MAX the timer wraps
>>>> around to 0.
>> Exactly that wrap makes the situation so complicated, since the simple code
>> u32 get_timer(void)
>> {
>>    return (ticks * 1000ULL) / tickspersec;
>> }
>> won't do that wrap.
> Do you have a better suggestion?
>
>>>> The get_timer() implementation may be interrupt based and is only
>>>> available after relocation.
>> Currently it is used before relocation in some places, I think I have
>> seen it in NAND drivers... That would have to be changed then.
> Indeed.  It is unreliable or even broken now.
>
>> This is already implemented functionally very closely (apart from factoring and the
>> get_timer(void) change) to this in AT91, the only (academic) hitch is that it will
>> burp a few billion years after each reset :)
>> What bothers me is the need for 64 bit mul/div in each loop iteration, for CPUs without
>> hardware for that this might slow down data transfer loops of the style
>>
>> u32 start_time = get_timer();
>> do {
>> 	if ("data_ready")
>> 		/* transfer a byte */
>> 	if (get_timer() - start_time>  timeout)
>> 		/* fail and exit loop */
>> } while (--"bytestodo">  0);
>>
>> since get_timer() will be somewhat like:
>>
>> 	return (tick * 1000ULL) / tickspersec;
>>
>> As I stated before, tickspersec is a variable in, for example, AT91. So the
>> expression cannot be optimized by the compiler.
> I don't think this is the only way to implement this. How does Linux
> derive time info from jiffies?

Hi All,
           In order to avoid doing 64 bit math, we can define a "jiffie" 
or a "bogo_ms" that is the 64 bit timebase shifted right such that the 
lsb of the bottom 32 bits has a resolution of between 0.5 ms and 1 ms. 
It is then possible to convert the difference between two jiffie/bogo_ms 
values to a number of ms using a 32 bit multiply and a right shift of 16 
bits, with essentially negligible error.  get_bogo_ms() would return a 
32 bit number in bogo_ms, thus the timing loop would be written.

u32 start_time = get_bogo_ms();
do {
     if ("data_ready")
         /* transfer a byte */
     if (bogo_ms_to_ms(get_timer() - start_time)>  TIMEOUT_IN_MS)
         /* fail and exit loop */
} while (--"bytestodo">  0);

u32 get_bogo_ms()
{
         u64 tick;
         read(tick);

          return (tick>>  gd->timer_shift);
}
u32 bogo_ms_to_ms(u32 x)
{
    /* this code assumes the resulting ms will be between 0 and 65535, 
or 65 seconds */
        return ((x * gd->cvt_bogo_ms_to_ms)>>  16); /* cvt_bogo_ms_to_ms 
is a 16 bit binary fraction */
}

All the above code assumes timeouts are 65 seconds or less, which I 
think is probably fair. Conversion of ms values up to 65 seconds to 
bogo_ms is also easy, and a 32 bit multiplied result is all that is 
required.
What is not so easy is converting a 32 bit timer value to ms.  It can be 
done if the CPU can do a 32 by 32 multiply to produce a 64 bit result, 
use the msb, and possibly correct the result by an add if  bit 32,of the 
timer is set.  You need a 33 bit counter in bogo_ms to get a monotonic, 
accurate 32 bit counter in ms. The powerpc can use a mulhw operation to 
do this, and any CPU that will produce a 64 bit product can do this. 
However, many CPUs do not produce 64 bit products easily. Using division 
to do these operations are even less appealing, as many CPUs do not 
provide hardware division at all. Since it is not necessary to do this 
conversion to easily use timeouts with 1 ms resolution and accuracy,  I 
think the idea of not using a timer in ms but rather bogo_ms/jiffies is 
possibly better?

Best Regards,
Bill Campbell

> Best regards,
>
> Wolfgang Denk
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-24  1:42                         ` J. William Campbell
@ 2011-01-24  7:24                           ` Albert ARIBAUD
  2011-01-24  7:50                             ` Reinhard Meyer
                                               ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Albert ARIBAUD @ 2011-01-24  7:24 UTC (permalink / raw)
  To: u-boot

Le 24/01/2011 02:42, J. William Campbell a ?crit :

> Hi All,
>             In order to avoid doing 64 bit math, we can define a "jiffie"
> or a "bogo_ms" that is the 64 bit timebase shifted right such that the
> lsb of the bottom 32 bits has a resolution of between 0.5 ms and 1 ms.
> It is then possible to convert the difference between two jiffie/bogo_ms
> values to a number of ms using a 32 bit multiply and a right shift of 16
> bits, with essentially negligible error.  get_bogo_ms() would return a
> 32 bit number in bogo_ms, thus the timing loop would be written.
>
> u32 start_time = get_bogo_ms();
> do {
>       if ("data_ready")
>           /* transfer a byte */
>       if (bogo_ms_to_ms(get_timer() - start_time)>   TIMEOUT_IN_MS)
>           /* fail and exit loop */
> } while (--"bytestodo">   0);
>
> u32 get_bogo_ms()
> {
>           u64 tick;
>           read(tick);
>
>            return (tick>>   gd->timer_shift);
> }
> u32 bogo_ms_to_ms(u32 x)
> {
>      /* this code assumes the resulting ms will be between 0 and 65535,
> or 65 seconds */
>          return ((x * gd->cvt_bogo_ms_to_ms)>>   16); /* cvt_bogo_ms_to_ms
> is a 16 bit binary fraction */
> }
>
> All the above code assumes timeouts are 65 seconds or less, which I
> think is probably fair. Conversion of ms values up to 65 seconds to
> bogo_ms is also easy, and a 32 bit multiplied result is all that is
> required.
> What is not so easy is converting a 32 bit timer value to ms.  It can be
> done if the CPU can do a 32 by 32 multiply to produce a 64 bit result,
> use the msb, and possibly correct the result by an add if  bit 32,of the
> timer is set.  You need a 33 bit counter in bogo_ms to get a monotonic,
> accurate 32 bit counter in ms. The powerpc can use a mulhw operation to
> do this, and any CPU that will produce a 64 bit product can do this.
> However, many CPUs do not produce 64 bit products easily. Using division
> to do these operations are even less appealing, as many CPUs do not
> provide hardware division at all. Since it is not necessary to do this
> conversion to easily use timeouts with 1 ms resolution and accuracy,  I
> think the idea of not using a timer in ms but rather bogo_ms/jiffies is
> possibly better?
>
> Best Regards,
> Bill Campbell

That is assuming a 64-bit timebase, isn't it? for CPUs / SoCs that don't 
have such a timebase but only a 32-bit timer, the bogo_ms/jiffy would 
not go through the full 32-bit range, which would cause issues with the 
timing loops on rollover -- and while a timeout of more than 65 sec may 
not be too likely, a timeout starting near the wraparound value of 
bogo_ms still could happen.

Besides, the 'tick' unit of time makes physical sense but the bogo_ms 
would not, while still not being a common timing value -- reminds me of 
my ms_to_ticks conversion macro that Wolfgang did not like.

In a more general perspective, I'd like to see where where exactly we 
need 64-bit multiply/divide operations in Wolfgang's proposal before we 
try to get rid of it. In my understanding:

- get_timer() works in pure ticks, not ms, and thus does not need 
multiply/divide; it may at most need to implement a carry over from 32 
bit to 64 bits *if* the HW counter is 32 bits *and if* we want a 64-bit 
virtual counter.

- get_time() works in ms, and thus needs scale conversion, so possibly a 
multiply/divide but possibly some other method, to convert a tick value 
to an ms value.

That's where I come back to one point of my proposal: if we can get a 
general framework for get_timer() to return a 64-bit free-running tick 
value, then we might not need a ms-based get_time() at all, because we 
could use get_timer() as well for ms timings, provided we can convert 
our timeout from ms to ticks, i.e.

	/* let's wait 200 milliseconds */
	/* Timing loop uses ticks: convert 200 ms to 'timeout' ticks */
	timeout = ms_to_ticks(200);
	u32 start = get_timer(); /* start time, in ticks */
	do {
		...
	} while ( (get_timer() -start) < timeout);

This way, a timing loop would not involve anything more complex than a 
64-bit subtraction and comparison; the only division/multiplication 
involved would be in the timeout computation, out of the loop.

Amicalement,
-- 
Albert.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-24  7:24                           ` Albert ARIBAUD
@ 2011-01-24  7:50                             ` Reinhard Meyer
  2011-01-24 12:59                               ` Wolfgang Denk
  2011-01-24  8:25                             ` Andreas Bießmann
                                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Reinhard Meyer @ 2011-01-24  7:50 UTC (permalink / raw)
  To: u-boot

Dear all,

its quite funny to see how we go around in circles here, this proposal of Albert
now is quite close to my original proposal. Only that I factored the ms_to_ticks
AND the comparison into the timer calls:

u64 timer_setup(u32 timeout_in_ms)
{
	return get_ticks() + ms_to_ticks(timeout_in_ms);
}

int timer_expired(u64 endtime)
{
	/*
	 * convert the unsigned difference to signed, to easyly
	 * check for "carry". In assembly we could just do a BCC
	 * after the subtraction to see whether get_ticks()
	 * has passed ahead of endtime.
	 */
	return (signed)(endtime - get_ticks()) < 0;
}

What can be more pragmatic and trivial than those two functions??

Usage then:

 	/* let's wait 200 milliseconds */
 	u64 endtime = timer_setup(200);
 	do {
 		...
 	} while (!timer_expired(endtime));

 
> That's where I come back to one point of my proposal: if we can get a 
> general framework for get_timer() to return a 64-bit free-running tick 

We have that already at least on PowerPC and AT91. Its called u64 get_ticks(void)
and returns a free running 64 bit value. An associated function,
u64 get_tbclk(void) returns the frequency of that tick.

I don't think that this part of the framework needs to be discussed -
except *maybe* for function names.

> value, then we might not need a ms-based get_time() at all, because we 
> could use get_timer() as well for ms timings, provided we can convert 
> our timeout from ms to ticks, i.e.
> 
> 	/* let's wait 200 milliseconds */
> 	/* Timing loop uses ticks: convert 200 ms to 'timeout' ticks */
> 	timeout = ms_to_ticks(200);
> 	u32 start = get_timer(); /* start time, in ticks */
> 	do {
> 		...
> 	} while ( (get_timer() -start) < timeout);

Mandatory u64 for start AND timeout, please.
It is the same functionality as my proposal, just bears more places where
"users" might make mistakes.

But I am sure that Wolfgang will not like either of our proposals, because
the variables used in "userspace" are not in ms.

Reinhard

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-24  7:24                           ` Albert ARIBAUD
  2011-01-24  7:50                             ` Reinhard Meyer
@ 2011-01-24  8:25                             ` Andreas Bießmann
  2011-01-24 11:58                               ` Albert ARIBAUD
  2011-01-24 12:54                             ` Wolfgang Denk
  2011-01-24 13:02                             ` Wolfgang Denk
  3 siblings, 1 reply; 31+ messages in thread
From: Andreas Bießmann @ 2011-01-24  8:25 UTC (permalink / raw)
  To: u-boot

Dear Albert ARIBAUD,

Am 24.01.2011 um 08:24 schrieb Albert ARIBAUD:

> Le 24/01/2011 02:42, J. William Campbell a ?crit :
> 
>> Hi All,
>>            In order to avoid doing 64 bit math, we can define a "jiffie"
>> or a "bogo_ms" that is the 64 bit timebase shifted right such that the
>> lsb of the bottom 32 bits has a resolution of between 0.5 ms and 1 ms.
>> It is then possible to convert the difference between two jiffie/bogo_ms
>> values to a number of ms using a 32 bit multiply and a right shift of 16
>> bits, with essentially negligible error.  get_bogo_ms() would return a
>> 32 bit number in bogo_ms, thus the timing loop would be written.
>> 
>> u32 start_time = get_bogo_ms();
>> do {
>>      if ("data_ready")
>>          /* transfer a byte */
>>      if (bogo_ms_to_ms(get_timer() - start_time)>   TIMEOUT_IN_MS)
>>          /* fail and exit loop */
>> } while (--"bytestodo">   0);
>> 
>> u32 get_bogo_ms()
>> {
>>          u64 tick;
>>          read(tick);
>> 
>>           return (tick>>   gd->timer_shift);
>> }
>> u32 bogo_ms_to_ms(u32 x)
>> {
>>     /* this code assumes the resulting ms will be between 0 and 65535,
>> or 65 seconds */
>>         return ((x * gd->cvt_bogo_ms_to_ms)>>   16); /* cvt_bogo_ms_to_ms
>> is a 16 bit binary fraction */
>> }
>> 
>> All the above code assumes timeouts are 65 seconds or less, which I
>> think is probably fair. Conversion of ms values up to 65 seconds to
>> bogo_ms is also easy, and a 32 bit multiplied result is all that is
>> required.
>> What is not so easy is converting a 32 bit timer value to ms.  It can be
>> done if the CPU can do a 32 by 32 multiply to produce a 64 bit result,
>> use the msb, and possibly correct the result by an add if  bit 32,of the
>> timer is set.  You need a 33 bit counter in bogo_ms to get a monotonic,
>> accurate 32 bit counter in ms. The powerpc can use a mulhw operation to
>> do this, and any CPU that will produce a 64 bit product can do this.
>> However, many CPUs do not produce 64 bit products easily. Using division
>> to do these operations are even less appealing, as many CPUs do not
>> provide hardware division at all. Since it is not necessary to do this
>> conversion to easily use timeouts with 1 ms resolution and accuracy,  I
>> think the idea of not using a timer in ms but rather bogo_ms/jiffies is
>> possibly better?
>> 
>> Best Regards,
>> Bill Campbell
> 
> That is assuming a 64-bit timebase, isn't it? for CPUs / SoCs that don't 
> have such a timebase but only a 32-bit timer, the bogo_ms/jiffy would 
> not go through the full 32-bit range, which would cause issues with the 
> timing loops on rollover -- and while a timeout of more than 65 sec may 
> not be too likely, a timeout starting near the wraparound value of 
> bogo_ms still could happen.

I agree with the possibility of wrap around near the end of u32 'bogo_ms' counter. Therefore we do need to define some constraints for such a '64 bit free running tick counter'. It could be implemented to overflow in some seconds as the u32 bogo_ms would do.

> Besides, the 'tick' unit of time makes physical sense but the bogo_ms 
> would not, while still not being a common timing value -- reminds me of 
> my ms_to_ticks conversion macro that Wolfgang did not like.

I also dislike to have another virtual physical dimension defined here.

> In a more general perspective, I'd like to see where where exactly we 
> need 64-bit multiply/divide operations in Wolfgang's proposal before we 
> try to get rid of it. In my understanding:
> 
> - get_timer() works in pure ticks, not ms, and thus does not need 
> multiply/divide; it may at most need to implement a carry over from 32 
> bit to 64 bits *if* the HW counter is 32 bits *and if* we want a 64-bit 
> virtual counter.
> 
> - get_time() works in ms, and thus needs scale conversion, so possibly a 
> multiply/divide but possibly some other method, to convert a tick value 
> to an ms value.
> 
> That's where I come back to one point of my proposal: if we can get a 
> general framework for get_timer() to return a 64-bit free-running tick 
> value, then we might not need a ms-based get_time() at all, because we 
> could use get_timer() as well for ms timings, provided we can convert 
> our timeout from ms to ticks, i.e.
> 
> 	/* let's wait 200 milliseconds */
> 	/* Timing loop uses ticks: convert 200 ms to 'timeout' ticks */
> 	timeout = ms_to_ticks(200);
> 	u32 start = get_timer(); /* start time, in ticks */
> 	do {
> 		...
> 	} while ( (get_timer() -start) < timeout);

You may think about the following change to this proposal:

/* lets wait 200 ms */
/* get the end point of our timeout in ticks */
u64 timeout_end = get_timer() + ms_to_ticks(200);
do {
 ...
} while ( get_timer() < timeout_end);

If I got Reinhard's proposal correct this is exactly what he meant. He call it 'timer_init(timeout_val)' and 'is_timeout()' but I feel this is exactly what he described.

First we calculate the timeout in ticks, then  just compare the 'now()' value with the end point of the timeout loop. I claim this approach is a bit better than yours on systems that can not do 64 bit instructions natively.

> This way, a timing loop would not involve anything more complex than a 
> 64-bit subtraction and comparison; the only division/multiplication 
> involved would be in the timeout computation, out of the loop.

You forgot to mention the 'ms_to_ticks()' could be pre-calculated by preprocessor in most cases. This may be a huge performance gain in most cases.

regards

Andreas Bie?mann

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-24  8:25                             ` Andreas Bießmann
@ 2011-01-24 11:58                               ` Albert ARIBAUD
  2011-01-24 12:06                                 ` Albert ARIBAUD
  2011-01-24 12:58                                 ` Andreas Bießmann
  0 siblings, 2 replies; 31+ messages in thread
From: Albert ARIBAUD @ 2011-01-24 11:58 UTC (permalink / raw)
  To: u-boot

Hi Andreas,

Le 24/01/2011 09:25, Andreas Bie?mann a ?crit :

>> That's where I come back to one point of my proposal: if we can get a
>> general framework for get_timer() to return a 64-bit free-running tick
>> value, then we might not need a ms-based get_time() at all, because we
>> could use get_timer() as well for ms timings, provided we can convert
>> our timeout from ms to ticks, i.e.
>>
>> 	/* let's wait 200 milliseconds */
>> 	/* Timing loop uses ticks: convert 200 ms to 'timeout' ticks */
>> 	timeout = ms_to_ticks(200);
>> 	u32 start = get_timer(); /* start time, in ticks */
>> 	do {
>> 		...
>> 	} while ( (get_timer() -start)<  timeout);
>
> You may think about the following change to this proposal:
>
> /* lets wait 200 ms */
> /* get the end point of our timeout in ticks */
> u64 timeout_end = get_timer() + ms_to_ticks(200);
> do {
>   ...
> } while ( get_timer()<  timeout_end);

The problem here is that in the loop exit condition you replace a 
difference between two unsigned times (which always yields the correct 
duration) with a comparison of two dates (which does not).

For instance, if at loop entry get_timer() was, say, 10 ticks to 
rollover and the loop timing was 12 ticks, you end up with an end date 
of 2. If your loop body runs long enough, get_timer() may already have 
gone past this and will this stay greater than timeout_end for a very 
long time.

OTOH, using get_timer() on entry of loop and subtracting it from 
get_timer()@each loop iteration always yields the time elapsed, 
unaffected by rollover. You can then safely compare this elapsed time 
with the time-out value.

Amicalement,
-- 
Albert.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-24 11:58                               ` Albert ARIBAUD
@ 2011-01-24 12:06                                 ` Albert ARIBAUD
  2011-01-24 12:58                                 ` Andreas Bießmann
  1 sibling, 0 replies; 31+ messages in thread
From: Albert ARIBAUD @ 2011-01-24 12:06 UTC (permalink / raw)
  To: u-boot

Le 24/01/2011 12:58, Albert ARIBAUD a ?crit :

> For instance, if at loop entry get_timer() was, say, 10 ticks to
> rollover and the loop timing was 12 ticks, you end up with an end date
> of 2. If your loop body runs long enough, get_timer() may already have
> gone past this and will this stay greater than timeout_end for a very
> long time.

I should always wait for the coffee to produce its effect before 
posting. The right example is having timeout_end *not* have rolled over 
(thus being a very high number) and the loop body being long enough that 
get_timer() is called at the first loop condition test *after* rollover, 
thus being very small.

> OTOH, using get_timer() on entry of loop and subtracting it from
> get_timer() at each loop iteration always yields the time elapsed,
> unaffected by rollover. You can then safely compare this elapsed time
> with the time-out value.
>
> Amicalement,


Amicalement,
-- 
Albert.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-24  7:24                           ` Albert ARIBAUD
  2011-01-24  7:50                             ` Reinhard Meyer
  2011-01-24  8:25                             ` Andreas Bießmann
@ 2011-01-24 12:54                             ` Wolfgang Denk
  2011-01-24 13:02                             ` Wolfgang Denk
  3 siblings, 0 replies; 31+ messages in thread
From: Wolfgang Denk @ 2011-01-24 12:54 UTC (permalink / raw)
  To: u-boot

Dear Albert ARIBAUD,

In message <4D3D2942.4060600@free.fr> you wrote:
>
> - get_timer() works in pure ticks, not ms, and thus does not need
> multiply/divide; it may at most need to implement a carry over from 32
> bit to 64 bits *if* the HW counter is 32 bits *and if* we want a 64-bit
> virtual counter.
> - get_time() works in ms, and thus needs scale conversion, so possibly a
> multiply/divide but possibly some other method, to convert a tick value
> to an ms value.

No.  There is get_ticks(), which operates on ticks, and there is
get_timer(), which returns milliseconds.


> That's where I come back to one point of my proposal: if we can get a
> general framework for get_timer() to return a 64-bit free-running tick
> value, then we might not need a ms-based get_time() at all, because we
> could use get_timer() as well for ms timings, provided we can convert

Then you will always have to use 64 bit variables for all related
operations.  Note that get_timer() returns "unsigned long"", which
allows for smaller code in the majority of use cases.


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
Program maintenance is an entropy-increasing process,  and  even  its
most skilfull execution only delays the subsidence of the system into
unfixable obsolescence.       - Fred Brooks, "The Mythical Man Month"

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-24 11:58                               ` Albert ARIBAUD
  2011-01-24 12:06                                 ` Albert ARIBAUD
@ 2011-01-24 12:58                                 ` Andreas Bießmann
  1 sibling, 0 replies; 31+ messages in thread
From: Andreas Bießmann @ 2011-01-24 12:58 UTC (permalink / raw)
  To: u-boot

Hi Albert,

Am 24.01.2011 12:58, schrieb Albert ARIBAUD:
> Hi Andreas,
> 
> Le 24/01/2011 09:25, Andreas Bie?mann a ?crit :
> 
>>> That's where I come back to one point of my proposal: if we can get a
>>> general framework for get_timer() to return a 64-bit free-running tick
>>> value, then we might not need a ms-based get_time() at all, because we
>>> could use get_timer() as well for ms timings, provided we can convert
>>> our timeout from ms to ticks, i.e.
>>>
>>>     /* let's wait 200 milliseconds */
>>>     /* Timing loop uses ticks: convert 200 ms to 'timeout' ticks */
>>>     timeout = ms_to_ticks(200);
>>>     u32 start = get_timer(); /* start time, in ticks */
>>>     do {
>>>         ...
>>>     } while ( (get_timer() -start)<  timeout);
>>
>> You may think about the following change to this proposal:
>>
>> /* lets wait 200 ms */
>> /* get the end point of our timeout in ticks */
>> u64 timeout_end = get_timer() + ms_to_ticks(200);
>> do {
>>   ...
>> } while ( get_timer()<  timeout_end);
> 
> The problem here is that in the loop exit condition you replace a
> difference between two unsigned times (which always yields the correct
> duration) with a comparison of two dates (which does not).

Ok, I got your point.

regards

Andreas Bie?mann

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-24  7:50                             ` Reinhard Meyer
@ 2011-01-24 12:59                               ` Wolfgang Denk
  0 siblings, 0 replies; 31+ messages in thread
From: Wolfgang Denk @ 2011-01-24 12:59 UTC (permalink / raw)
  To: u-boot

Dear Reinhard Meyer,

In message <4D3D2F34.6020903@emk-elektronik.de> you wrote:
> Dear all,
> 
> its quite funny to see how we go around in circles here, this proposal of Albert
> now is quite close to my original proposal. Only that I factored the ms_to_ticks
> AND the comparison into the timer calls:
> 
> u64 timer_setup(u32 timeout_in_ms)
> {
> 	return get_ticks() + ms_to_ticks(timeout_in_ms);
> }

No.  I said this severaltimes before, and I repeat it here one other
time:  I do not want to see such a function.  There nothing to be done
to "set up a timer".


> 	/*
> 	 * convert the unsigned difference to signed, to easyly
> 	 * check for "carry". In assembly we could just do a BCC
> 	 * after the subtraction to see whether get_ticks()
> 	 * has passed ahead of endtime.
> 	 */
> 	return (signed)(endtime - get_ticks()) < 0;

Are you sure this is really working? Why do you insist on this
approach instead of using the proven to be correct way to write this?

> > 	u32 start = get_timer(); /* start time, in ticks */

No. I do not see any reason to change existing interfaces.

If ticks are wanted, then use get_ticks().  And deal with the overhead
or 64 bit arithmetics.

But in general code I prefer get_timer() - milliseconds, u32.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
Substitute "damn" every time you're inclined to write "very"; your
editor will delete it and the writing will be just as it should be.
                - Mark Twain

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-24  7:24                           ` Albert ARIBAUD
                                               ` (2 preceding siblings ...)
  2011-01-24 12:54                             ` Wolfgang Denk
@ 2011-01-24 13:02                             ` Wolfgang Denk
  2011-01-24 16:23                               ` J. William Campbell
  3 siblings, 1 reply; 31+ messages in thread
From: Wolfgang Denk @ 2011-01-24 13:02 UTC (permalink / raw)
  To: u-boot

Dear Albert ARIBAUD,

In message <4D3D2942.4060600@free.fr> you wrote:
> 
> That is assuming a 64-bit timebase, isn't it? for CPUs / SoCs that don't >
> have such a timebase but only a 32-bit timer, the bogo_ms/jiffy would >
> not go through the full 32-bit range, which would cause issues with the >
> timing loops on rollover -- and while a timeout of more than 65 sec may >
> not be too likely, a timeout starting near the wraparound value of >
> bogo_ms still could happen.

Sorry, but I don't get it.  What exactly is the problem with a 32 bit
counter, and why would it not go through the full 32-bit range?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de
Vor allem kein Gedanke! Nichts ist kompromittierender als ein  Gedan-
ke!            - Friedrich Wilhelm Nietzsche _Der Fall Wagner_ (1888)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [U-Boot] [RFC] ARM timing code refactoring
  2011-01-24 13:02                             ` Wolfgang Denk
@ 2011-01-24 16:23                               ` J. William Campbell
  0 siblings, 0 replies; 31+ messages in thread
From: J. William Campbell @ 2011-01-24 16:23 UTC (permalink / raw)
  To: u-boot

On 1/24/2011 5:02 AM, Wolfgang Denk wrote:
> Dear Albert ARIBAUD,
>
> In message<4D3D2942.4060600@free.fr>  you wrote:
>> That is assuming a 64-bit timebase, isn't it? for CPUs / SoCs that don't>
>> have such a timebase but only a 32-bit timer, the bogo_ms/jiffy would>
>> not go through the full 32-bit range, which would cause issues with the>
>> timing loops on rollover -- and while a timeout of more than 65 sec may>
>> not be too likely, a timeout starting near the wraparound value of>
>> bogo_ms still could happen.
Hi All,
        If you use my approach of shifting right by n bits  (in order to 
get a counter that has "about" 1 ms resolution),  zeros are shifted into 
the top. The counter would never reach more than (32-n) bits in 
magnitude. This is easily fixed by creating a "virtual" n msbs for the 
counter, as must be done if a 64 bit counter is to be simulated. 
Actually, I only need "n" bits, but in any case it requires detecting 
that the counter has "backed up" mod 32 bits and if so, increase the 
virtual counter by 1.

If you really, really, really want a timer that "ticks" at a 1 ms rate, 
instead of bogo_ms/jiffies, I propose the following:

u32 get_time()
{
         u32 delta_t_bogo_ms;
        u32  t_save;
#if defined(TIMER_IS_64_BITS)
         u64 tick;
         read(&tick);

          t_save  = tick >> gd->timer_shift;
          delta_t_bogo_ms = t_bogo_ms - gd->prev_timer;
#else
       read(t_save);
       delta_t_bogo_ms = (t_save  - gd->prev_timer) >> gd->timer_shift;
#endif
       gd->prev_timer = t_save;      /* save previous counter */
       if (delta_t_bogo_ms < gd->bogo_ms_per_65_sec)
       {
              gd->fract_timer    +=  delta_t_bogo_ms * 
gd->cvt_bogo_ms_to_ms; /* accumulate fraction of ms */
              gd->timer_in_ms += gd->fract_timer >> 16;
             gd->fract_timer &= 0X0000FFFF;
       }
       else
             gd->fract_timer = 0; /* start accumulating from 0 fraction */
       return(gd->timer_in_ms)
}

            This routine will create a timer in ms, that will be 
accurate as long as this routine is called either once about ever 65 
seconds or once per time base rollover, whichever is less. Nested timer 
use is ok. If the timer is not called frequently enough, it will return 
the same value twice, but after that it will start timing normally. This 
shouldn't matter, as the second returned value will  be the start of a 
timing loop. Timeout values can be the full 32 bits, as long as you keep 
calling the routine frequently enough. No initialization is required.
           Note that you can (and probably should) use the bottom 32 
bits of the hardware timebase as a 32 bit timebase unless the clock 
would overflow in 65 seconds (running faster than about 66 MHz), or if 
you want to  relax the 65 seconds.
If you want to save a word in the gd data,  use 0X10000 instead of 
bogo_ms_per_65_sec (or a more precise value if you know it). Note that 
gd->timer_shift and gd->cvt_bogo_ms_to_ms can also be replaced by 
constants if the clock speed is fixed.

This is more expensive than using the bogo_ms timer, but does have the 
advantage that everything is in ms. FWIW, I think converting from ms to 
some other unit for loop control is fine, as long as we have a standard 
routine to do that that is "cheap". However, others may not agree. For 
sure, passing around 64 bit tick values to do this process is, IMHO, 
vast overkill and not a good general solution, as many processors really 
don't like to do 64 bit operations.

Best Regards.
Bill Campbell


> Sorry, but I don't get it.  What exactly is the problem with a 32 bit
> counter, and why would it not go through the full 32-bit range?
>
> Best regards,
>
> Wolfgang Denk
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2011-01-24 16:23 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-22 10:20 [U-Boot] [RFC] ARM timing code refactoring Albert ARIBAUD
2011-01-22 10:42 ` Reinhard Meyer
2011-01-22 11:32   ` Albert ARIBAUD
2011-01-22 11:00 ` [U-Boot] [RFC] U-boot (was: ARM) " Reinhard Meyer
2011-01-22 12:22   ` [U-Boot] [RFC] U-boot Albert ARIBAUD
2011-01-22 19:19 ` [U-Boot] [RFC] ARM timing code refactoring Wolfgang Denk
2011-01-22 20:17   ` Albert ARIBAUD
2011-01-22 21:26     ` Wolfgang Denk
2011-01-22 21:51       ` Reinhard Meyer
2011-01-23 10:12         ` Wolfgang Denk
2011-01-23 10:26           ` Reinhard Meyer
2011-01-23 16:23             ` Wolfgang Denk
2011-01-23 18:47               ` Reinhard Meyer
2011-01-23 19:35                 ` Wolfgang Denk
2011-01-23 20:59                   ` Albert ARIBAUD
2011-01-23 21:22                     ` Reinhard Meyer
2011-01-23 22:01                       ` Reinhard Meyer
2011-01-23 22:57                       ` Wolfgang Denk
2011-01-24  1:42                         ` J. William Campbell
2011-01-24  7:24                           ` Albert ARIBAUD
2011-01-24  7:50                             ` Reinhard Meyer
2011-01-24 12:59                               ` Wolfgang Denk
2011-01-24  8:25                             ` Andreas Bießmann
2011-01-24 11:58                               ` Albert ARIBAUD
2011-01-24 12:06                                 ` Albert ARIBAUD
2011-01-24 12:58                                 ` Andreas Bießmann
2011-01-24 12:54                             ` Wolfgang Denk
2011-01-24 13:02                             ` Wolfgang Denk
2011-01-24 16:23                               ` J. William Campbell
2011-01-22 22:13       ` Albert ARIBAUD
2011-01-23 16:15         ` Wolfgang Denk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.