All of lore.kernel.org
 help / color / mirror / Atom feed
* Guarantee udelay(N) spins at least N microseconds
@ 2015-04-10 11:25 Mason
  2015-04-10 11:42 ` Willy Tarreau
  2015-04-10 11:44 ` Russell King - ARM Linux
  0 siblings, 2 replies; 16+ messages in thread
From: Mason @ 2015-04-10 11:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hello everyone,

This is take 2 of my tiny delay.c patch

Problem statement

When converting microseconds to timer cycles in __timer_udelay() and
__timer_const_udelay(), the result is rounded down(*), which means the
system will not spin as long as requested (specifically, between
epsilon and 1 cycle shorter).

If I understand correctly, most drivers expect udelay(N) to spin for
at least N ?s. Is that correct? In that use case, spinning less might
introduce subtle heisenbugs.


Typical example

timer->freq = 90 kHz && HZ = 100
(thus UDELAY_MULT = 107374 && ticks_per_jiffy = 900)

udelay(10) => __timer_const_udelay(10*107374)
            => __timer_delay((1073740*900) >> 30)
            => __timer_delay(0)

So udelay(10) resolves to no delay at all.


(*) 2^41 / 10^6 = 2199023,255552
2199023 < 2^41 / 10^6
UDELAY_MULT = 2199023*HZ / 2^11 < 2^30*HZ / 10^6

cycles = N * UDELAY_MULT * freq/HZ / 2^30
        < N * 2^30*HZ / 10^6 * freq/HZ / 2^30
        < N / 10^6 * freq


Proposed fix

Since results are always rounded down, all we need is to increment
the result by 1 to round it up.

Would someone ACK the patch below?

Regards.


Patch against 4.0-rc4

diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 312d43e..3cfbd07 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -66,7 +66,7 @@ static void __timer_const_udelay(unsigned long xloops)
  {
         unsigned long long loops = xloops;
         loops *= arm_delay_ops.ticks_per_jiffy;
-       __timer_delay(loops >> UDELAY_SHIFT);
+       __timer_delay((loops >> UDELAY_SHIFT) + 1);
  }
  
  static void __timer_udelay(unsigned long usecs)

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 11:25 Guarantee udelay(N) spins at least N microseconds Mason
@ 2015-04-10 11:42 ` Willy Tarreau
  2015-04-10 14:53   ` Mason
  2015-04-10 11:44 ` Russell King - ARM Linux
  1 sibling, 1 reply; 16+ messages in thread
From: Willy Tarreau @ 2015-04-10 11:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Fri, Apr 10, 2015 at 01:25:37PM +0200, Mason wrote:
> Hello everyone,
> 
> This is take 2 of my tiny delay.c patch
> 
> Problem statement
> 
> When converting microseconds to timer cycles in __timer_udelay() and
> __timer_const_udelay(), the result is rounded down(*), which means the
> system will not spin as long as requested (specifically, between
> epsilon and 1 cycle shorter).
> 
> If I understand correctly, most drivers expect udelay(N) to spin for
> at least N ?s. Is that correct? In that use case, spinning less might
> introduce subtle heisenbugs.
> 
> 
> Typical example
> 
> timer->freq = 90 kHz && HZ = 100
> (thus UDELAY_MULT = 107374 && ticks_per_jiffy = 900)
> 
> udelay(10) => __timer_const_udelay(10*107374)
>            => __timer_delay((1073740*900) >> 30)
>            => __timer_delay(0)
> 
> So udelay(10) resolves to no delay at all.
> 
> 
> (*) 2^41 / 10^6 = 2199023,255552
> 2199023 < 2^41 / 10^6
> UDELAY_MULT = 2199023*HZ / 2^11 < 2^30*HZ / 10^6
> 
> cycles = N * UDELAY_MULT * freq/HZ / 2^30
>        < N * 2^30*HZ / 10^6 * freq/HZ / 2^30
>        < N / 10^6 * freq
> 
> 
> Proposed fix
> 
> Since results are always rounded down, all we need is to increment
> the result by 1 to round it up.
> 
> Would someone ACK the patch below?
> 
> Regards.
> 
> 
> Patch against 4.0-rc4
> 
> diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
> index 312d43e..3cfbd07 100644
> --- a/arch/arm/lib/delay.c
> +++ b/arch/arm/lib/delay.c
> @@ -66,7 +66,7 @@ static void __timer_const_udelay(unsigned long xloops)
>  {
>         unsigned long long loops = xloops;
>         loops *= arm_delay_ops.ticks_per_jiffy;
> -       __timer_delay(loops >> UDELAY_SHIFT);
> +       __timer_delay((loops >> UDELAY_SHIFT) + 1);
>  }

If loops is a multiple of 2 ^ UDELAY_SHIFT, then your result is too
high by one. The proper way to round by excess is the following :

    __timer_delay((loops + (1 << UDELAY_SHIFT) - 1) >> UDELAY_SHIFT);

That way it does +1 for every value of loop not an exact multiple
of 2^UDELAY_SHIFT.

Regards,
willy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 11:25 Guarantee udelay(N) spins at least N microseconds Mason
  2015-04-10 11:42 ` Willy Tarreau
@ 2015-04-10 11:44 ` Russell King - ARM Linux
  2015-04-10 12:41   ` Mason
  1 sibling, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2015-04-10 11:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 10, 2015 at 01:25:37PM +0200, Mason wrote:
> If I understand correctly, most drivers expect udelay(N) to spin for
> at least N ?s. Is that correct? In that use case, spinning less might
> introduce subtle heisenbugs.

We've never guaranteed this.

The fact is that udelay() can delay for _approximately_ the time you
ask for - it might be slightly shorter, or it could be much longer
than you expect.  On most UP implementations using the software loop
it will typically be around 1% slower than requested.

Adding 1us to every delay is going to be very bad.  Rather than doing
that, why not arrange for the rounding error to be accomodated?

> Typical example
> 
> timer->freq = 90 kHz && HZ = 100
> (thus UDELAY_MULT = 107374 && ticks_per_jiffy = 900)
> 
> udelay(10) => __timer_const_udelay(10*107374)
>            => __timer_delay((1073740*900) >> 30)
>            => __timer_delay(0)

In other words, add (1 << 30) - 1 before shifting right by 30.

-- 
FTTC broadband for 0.8mile line: currently@10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 11:44 ` Russell King - ARM Linux
@ 2015-04-10 12:41   ` Mason
  2015-04-10 15:06     ` Russell King - ARM Linux
  0 siblings, 1 reply; 16+ messages in thread
From: Mason @ 2015-04-10 12:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/04/2015 13:44, Russell King - ARM Linux wrote:
> On Fri, Apr 10, 2015 at 01:25:37PM +0200, Mason wrote:
>> If I understand correctly, most drivers expect udelay(N) to spin for
>> at least N ?s. Is that correct? In that use case, spinning less might
>> introduce subtle heisenbugs.
>
> We've never guaranteed this.
>
> The fact is that udelay() can delay for _approximately_ the time you
> ask for - it might be slightly shorter, or it could be much longer
> than you expect.

OK, but asking for 10 ?s and spinning 0 is a problem, right?

> On most UP implementations using the software loop
> it will typically be around 1% slower than requested.

Please correct any misconception, it seems to me that typical
driver writers, reading "operation X takes 10 ?s to complete"
will write udelay(10); (if they want to spin-wait)

Do you think they should take the inherent imprecision of loop-based
delays into account, and add a small cushion to be safe?

Also, it seems timer-based delays were introduced, in part, because
they did away with the imprecision of loop-based delays on DVFS
systems.

> Adding 1us to every delay is going to be very bad.  Rather than doing
> that, why not arrange for the rounding error to be accommodated?

Are you saying that my proposed patch blindly adds a 1 ?s delay?

In fact, it adds one /timer cycle/ (which, in my 90 kHz example,
is much more than 1 ?s), and this increment is just rounding up
the conversion result.

So if a user wants to spin N ?s, and N ?s is equivalent to
X.7 cycles, then we delay X+1 cycles.

Regards.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 11:42 ` Willy Tarreau
@ 2015-04-10 14:53   ` Mason
  2015-04-10 15:06     ` Willy Tarreau
  0 siblings, 1 reply; 16+ messages in thread
From: Mason @ 2015-04-10 14:53 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Willy,

On 10/04/2015 13:42, Willy Tarreau wrote:
> Hi,
>
> On Fri, Apr 10, 2015 at 01:25:37PM +0200, Mason wrote:
>> Hello everyone,
>>
>> This is take 2 of my tiny delay.c patch
>>
>> Problem statement
>>
>> When converting microseconds to timer cycles in __timer_udelay() and
>> __timer_const_udelay(), the result is rounded down(*), which means the
>> system will not spin as long as requested (specifically, between
>> epsilon and 1 cycle shorter).
>>
>> If I understand correctly, most drivers expect udelay(N) to spin for
>> at least N ?s. Is that correct? In that use case, spinning less might
>> introduce subtle heisenbugs.
>>
>>
>> Typical example
>>
>> timer->freq = 90 kHz && HZ = 100
>> (thus UDELAY_MULT = 107374 && ticks_per_jiffy = 900)
>>
>> udelay(10) => __timer_const_udelay(10*107374)
>>             => __timer_delay((1073740*900) >> 30)
>>             => __timer_delay(0)
>>
>> So udelay(10) resolves to no delay at all.
>>
>>
>> (*) 2^41 / 10^6 = 2199023,255552
>> 2199023 < 2^41 / 10^6
>> UDELAY_MULT = 2199023*HZ / 2^11 < 2^30*HZ / 10^6
>>
>> cycles = N * UDELAY_MULT * freq/HZ / 2^30
>>         < N * 2^30*HZ / 10^6 * freq/HZ / 2^30
>>         < N / 10^6 * freq
>>
>>
>> Proposed fix
>>
>> Since results are always rounded down, all we need is to increment
>> the result by 1 to round it up.
>>
>> Would someone ACK the patch below?
>>
>> Regards.
>>
>>
>> Patch against 4.0-rc4
>>
>> diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
>> index 312d43e..3cfbd07 100644
>> --- a/arch/arm/lib/delay.c
>> +++ b/arch/arm/lib/delay.c
>> @@ -66,7 +66,7 @@ static void __timer_const_udelay(unsigned long xloops)
>>   {
>>          unsigned long long loops = xloops;
>>          loops *= arm_delay_ops.ticks_per_jiffy;
>> -       __timer_delay(loops >> UDELAY_SHIFT);
>> +       __timer_delay((loops >> UDELAY_SHIFT) + 1);
>>   }
>
> If loops is a multiple of 2 ^ UDELAY_SHIFT, then your result is too
> high by one. The proper way to round by excess is the following :
>
>      __timer_delay((loops + (1 << UDELAY_SHIFT) - 1) >> UDELAY_SHIFT);
>
> That way it does +1 for every value of loop not an exact multiple
> of 2^UDELAY_SHIFT.

The important thing to realize is that xloops is already rounded down,
because we use 2199023 as an approximation of 2^41 / 10^6.

Thus, even when 'loops' is a multiple of 2^30, we'll want to round up.

Illustration

timer->freq = 100*2^20 && HZ = 100
(thus UDELAY_MULT = 107374 && ticks_per_jiffy = 2^20)

Suppose udelay(512)
so we want to spin for 512 / 10^6 * 100*2^20 = 53687,0912 cycles
i.e. 53688 cycles if we round up.

loops = 512 * 107374 * 2^20 = 53687 * 2^30

If we just add (2^30-1) before shifting by 30, the result comes out
to 53687, but (loops >> 30) + 1 is closer to what we really want.

One might argue that the difference between 53687 and 53688 is
lost in the noise and thus irrelevant. I can agree with that.

Which is why I chose the simpler

   __timer_delay((loops >> UDELAY_SHIFT) + 1);

over

   __timer_delay((loops + (1 << UDELAY_SHIFT) - 1) >> UDELAY_SHIFT);


Do you disagree with this logic?

Regards.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 12:41   ` Mason
@ 2015-04-10 15:06     ` Russell King - ARM Linux
  2015-04-10 15:30       ` Mason
  0 siblings, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2015-04-10 15:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 10, 2015 at 02:41:37PM +0200, Mason wrote:
> On 10/04/2015 13:44, Russell King - ARM Linux wrote:
> >On Fri, Apr 10, 2015 at 01:25:37PM +0200, Mason wrote:
> >>If I understand correctly, most drivers expect udelay(N) to spin for
> >>at least N ?s. Is that correct? In that use case, spinning less might
> >>introduce subtle heisenbugs.
> >
> >We've never guaranteed this.
> >
> >The fact is that udelay() can delay for _approximately_ the time you
> >ask for - it might be slightly shorter, or it could be much longer
> >than you expect.
> 
> OK, but asking for 10 ?s and spinning 0 is a problem, right?

Potentially.

> >On most UP implementations using the software loop
> >it will typically be around 1% slower than requested.
> 
> Please correct any misconception, it seems to me that typical
> driver writers, reading "operation X takes 10 ?s to complete"
> will write udelay(10); (if they want to spin-wait)

Arguments do not matter here, sorry.  What I said above is the way it
behaves, period.  It's not for discussion.

It may interest you that I discussed this issue with Linus, and Linus
also said that it _isn't_ a problem and it _isn't_ something we care
to fix.

So, like it or not, we're stuck with udelay(10) being possible to
delay by 9.99us intead of 10us.

Where the inaccuracy comes from is entirely down to the way we calculate
loops_per_jiffy - this is the number of loop cycles between two timer
interrupts - but this does _not_ take account of the time to execute a
timer interrupt.

So, loops_per_jiffy is the number of loop cycles between two timer
interrupts minus the time taken to service the timer interrupt.

And what this means is that udelay(n) where 'n' is less than the
period between two timer interrupts /will/ be, and is /expected to
be/ potentially shorter than the requested period.

There's no getting away from that, we can't estimate how long the timer
interrupt takes to handle without the use of an external timer, and if
we've got an external timer, we might as well use it for all delays.

> Do you think they should take the inherent imprecision of loop-based
> delays into account, and add a small cushion to be safe?

No.  See above.  Not doing that.  Live with it.

Fix the rounding errors if you wish, but do _not_ try to fix the
"udelay() may return slightly earlier than requested" problem.  I
will not apply a patch to fix _that_ problem.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 14:53   ` Mason
@ 2015-04-10 15:06     ` Willy Tarreau
  0 siblings, 0 replies; 16+ messages in thread
From: Willy Tarreau @ 2015-04-10 15:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mason,

On Fri, Apr 10, 2015 at 04:53:10PM +0200, Mason wrote:
> The important thing to realize is that xloops is already rounded down,
> because we use 2199023 as an approximation of 2^41 / 10^6.
> 
> Thus, even when 'loops' is a multiple of 2^30, we'll want to round up.
> 
> Illustration
> 
> timer->freq = 100*2^20 && HZ = 100
> (thus UDELAY_MULT = 107374 && ticks_per_jiffy = 2^20)
> 
> Suppose udelay(512)
> so we want to spin for 512 / 10^6 * 100*2^20 = 53687,0912 cycles
> i.e. 53688 cycles if we round up.
> 
> loops = 512 * 107374 * 2^20 = 53687 * 2^30
> 
> If we just add (2^30-1) before shifting by 30, the result comes out
> to 53687, but (loops >> 30) + 1 is closer to what we really want.
> 
> One might argue that the difference between 53687 and 53688 is
> lost in the noise and thus irrelevant. I can agree with that.
> 
> Which is why I chose the simpler
> 
>   __timer_delay((loops >> UDELAY_SHIFT) + 1);
> 
> over
> 
>   __timer_delay((loops + (1 << UDELAY_SHIFT) - 1) >> UDELAY_SHIFT);
> 
> 
> Do you disagree with this logic?

OK it seems to make sense then.

Regards,
Willy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 15:06     ` Russell King - ARM Linux
@ 2015-04-10 15:30       ` Mason
  2015-04-10 16:08         ` Russell King - ARM Linux
  0 siblings, 1 reply; 16+ messages in thread
From: Mason @ 2015-04-10 15:30 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Russell,

I appreciate (very much so) that you spend time replying to me,
but I also sense a lot of animosity, and I don't know what I've
done wrong to deserve it :-(

On 10/04/2015 17:06, Russell King - ARM Linux wrote:
> On Fri, Apr 10, 2015 at 02:41:37PM +0200, Mason wrote:
>> On 10/04/2015 13:44, Russell King - ARM Linux wrote:
>>> On Fri, Apr 10, 2015 at 01:25:37PM +0200, Mason wrote:
>>>> If I understand correctly, most drivers expect udelay(N) to spin for
>>>> at least N ?s. Is that correct? In that use case, spinning less might
>>>> introduce subtle heisenbugs.
>>>
>>> We've never guaranteed this.
>>>
>>> The fact is that udelay() can delay for _approximately_ the time you
>>> ask for - it might be slightly shorter, or it could be much longer
>>> than you expect.
>>
>> OK, but asking for 10 ?s and spinning 0 is a problem, right?
>
> Potentially.
>
>>> On most UP implementations using the software loop
>>> it will typically be around 1% slower than requested.
>>
>> Please correct any misconception, it seems to me that typical
>> driver writers, reading "operation X takes 10 ?s to complete"
>> will write udelay(10); (if they want to spin-wait)
>
> Arguments do not matter here, sorry.  What I said above is the way it
> behaves, period.  It's not for discussion.
>
> It may interest you that I discussed this issue with Linus, and Linus
> also said that it _isn't_ a problem and it _isn't_ something we care
> to fix.
>
> So, like it or not, we're stuck with udelay(10) being possible to
> delay by 9.99us instead of 10us.
>
> Where the inaccuracy comes from is entirely down to the way we calculate
> loops_per_jiffy - this is the number of loop cycles between two timer
> interrupts - but this does _not_ take account of the time to execute a
> timer interrupt.
>
> So, loops_per_jiffy is the number of loop cycles between two timer
> interrupts minus the time taken to service the timer interrupt.
>
> And what this means is that udelay(n) where 'n' is less than the
> period between two timer interrupts /will/ be, and is /expected to
> be/ potentially shorter than the requested period.

You've made it clear how loop-based delays are implemented; and also
that loop-based delays are typically 1% shorter than requested.
(Thanks for the overview, by the way.) Please note that I haven't
touched to the loop-based code, I'm only discussing the timer-based
code.

> There's no getting away from that, we can't estimate how long the timer
> interrupt takes to handle without the use of an external timer, and if
> we've got an external timer, we might as well use it for all delays.

Exactly! And my patch only changes __timer_const_udelay() so again I'm
not touching loop-based code.

>> Do you think they should take the inherent imprecision of loop-based
>> delays into account, and add a small cushion to be safe?
>
> No.  See above.  Not doing that.  Live with it.

See, I don't understand your "Not doing that" statement.
I didn't suggest changing the implementation, I asked your opinion
(and the opinion of others on the list) whether driver _writers_
should take into account _in their code_ the 1% error you mentioned.

Specifically, should a driver writer use

   udelay(101);

when his spec says to spin 100 ?s?

(Anyway, this is just a tangential question, as I digest the ins
and outs of kernel and driver development.)

> Fix the rounding errors if you wish, but do _not_ try to fix the
> "udelay() may return slightly earlier than requested" problem.  I
> will not apply a patch to fix _that_ problem.

Can I submit as-is the one-line patch I proposed earlier?

Regards.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 15:30       ` Mason
@ 2015-04-10 16:08         ` Russell King - ARM Linux
  2015-04-10 20:01           ` Mason
  0 siblings, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2015-04-10 16:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 10, 2015 at 05:30:24PM +0200, Mason wrote:
> I appreciate (very much so) that you spend time replying to me,
> but I also sense a lot of animosity, and I don't know what I've
> done wrong to deserve it :-(

I'm putting the point across strongly because I really don't think
there is an issue to be solved here.

> On 10/04/2015 17:06, Russell King - ARM Linux wrote:
> >And what this means is that udelay(n) where 'n' is less than the
> >period between two timer interrupts /will/ be, and is /expected to
> >be/ potentially shorter than the requested period.
> 
> You've made it clear how loop-based delays are implemented; and also
> that loop-based delays are typically 1% shorter than requested.
> (Thanks for the overview, by the way.) Please note that I haven't
> touched to the loop-based code, I'm only discussing the timer-based
> code.

1% is a figure I pulled out of the air.  It really depends on the CPU
instructions per cycle, and how much work is being done in the timer
interrupt handler.

> >There's no getting away from that, we can't estimate how long the timer
> >interrupt takes to handle without the use of an external timer, and if
> >we've got an external timer, we might as well use it for all delays.
> 
> Exactly! And my patch only changes __timer_const_udelay() so again I'm
> not touching loop-based code.

What I'm trying to get through to you is that udelay() as a _whole_ does
not provide a guarantee that it will wait for _at least_ the time you
asked for.  All that it does is provide an _approximate_ delay.

Yes, we can improve the timer delays to provide a guaranteed delay of at
least the requested period.  We _can't_ do the same for the loop-based
delay.

So, if we fix the timer based delays, udelay() will then have two differing
expectations depending on whether it's using a timer or a loop based delay.
That is bad.  It's extremely poor.  The expectations of an API should not
change because of a different implementation, that's the way bugs happen.

That's why...

> >No.  See above.  Not doing that.  Live with it.

it's not a problem, and why we're not going to fix the timer code to
provide a minimum guaranteed delay.

> Specifically, should a driver writer use
> 
>   udelay(101);
> 
> when his spec says to spin 100 ?s?
> 
> (Anyway, this is just a tangential question, as I digest the ins
> and outs of kernel and driver development.)

A driver writer should always use the required delay plus an adequate
cushion to ensure that the delay required by the hardware is met.

I suggest you read this:

	https://lkml.org/lkml/2011/1/9/37

which is the discussion I had with Linus on the point you are raising
here, which is the 4th hit in google for "udelay shorter delays".

Given that some udelay() implementations have been known to be as much
as 50% off, that suggests using a delay value of 200 in your above
example for a delay of 100?s.  ARM is /relatively/ good in that
regard, cpufreq and scheduling effects withstanding.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 16:08         ` Russell King - ARM Linux
@ 2015-04-10 20:01           ` Mason
  2015-04-10 20:42             ` Russell King - ARM Linux
  0 siblings, 1 reply; 16+ messages in thread
From: Mason @ 2015-04-10 20:01 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/04/2015 18:08, Russell King - ARM Linux wrote:
> On Fri, Apr 10, 2015 at 05:30:24PM +0200, Mason wrote:
>> I appreciate (very much so) that you spend time replying to me,
>> but I also sense a lot of animosity, and I don't know what I've
>> done wrong to deserve it :-(
> 
> I'm putting the point across strongly because I really don't think
> there is an issue to be solved here.
> 
>> On 10/04/2015 17:06, Russell King - ARM Linux wrote:
>>> And what this means is that udelay(n) where 'n' is less than the
>>> period between two timer interrupts /will/ be, and is /expected to
>>> be/ potentially shorter than the requested period.
>>
>> You've made it clear how loop-based delays are implemented; and also
>> that loop-based delays are typically 1% shorter than requested.
>> (Thanks for the overview, by the way.) Please note that I haven't
>> touched to the loop-based code, I'm only discussing the timer-based
>> code.
> 
> 1% is a figure I pulled out of the air.  It really depends on the CPU
> instructions per cycle, and how much work is being done in the timer
> interrupt handler.
> 
>>> There's no getting away from that, we can't estimate how long the timer
>>> interrupt takes to handle without the use of an external timer, and if
>>> we've got an external timer, we might as well use it for all delays.
>>
>> Exactly! And my patch only changes __timer_const_udelay() so again I'm
>> not touching loop-based code.
> 
> What I'm trying to get through to you is that udelay() as a _whole_ does
> not provide a guarantee that it will wait for _at least_ the time you
> asked for.  All that it does is provide an _approximate_ delay.

I've taken time to digest the information you provided.

Let's forget the subject of my message, i.e. trying to
guarantee udelay(N) spins at least N microseconds.

There is, however, an important difference between loop-based
delays and timer-based delays; CPU frequencies typically fall
in the 50-5000 MHz range, while timer frequencies typically
span tens of kHz up to hundreds of MHz. For example, 90 kHz
is sometimes provided in multimedia systems (MPEG TS).

And rounding down the cycle count becomes a real problem
with coarse timers. Consider the aforementioned 90 kHz
timer, i.e. T = 11.1 ?s

If a driver wants to spin for 5 ?s, and the author was
careful enough to double the number, for a safe cushion,
he'll request 10 ?s. But since 10 < 11.1, the request
gets rounded down to 0.

Therefore, I think timer-based loops should round up.

If you NAK the patch, that's OK. I'll drop the issue,
and only apply it locally.

Regards.

Patch provided for easier reference:

diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 312d43e..3cfbd07 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -66,7 +66,7 @@ static void __timer_const_udelay(unsigned long xloops)
  {
         unsigned long long loops = xloops;
         loops *= arm_delay_ops.ticks_per_jiffy;
-       __timer_delay(loops >> UDELAY_SHIFT);
+       __timer_delay((loops >> UDELAY_SHIFT) + 1);
  }

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 20:01           ` Mason
@ 2015-04-10 20:42             ` Russell King - ARM Linux
  2015-04-10 21:22               ` Mason
  0 siblings, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2015-04-10 20:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 10, 2015 at 10:01:35PM +0200, Mason wrote:
> There is, however, an important difference between loop-based
> delays and timer-based delays; CPU frequencies typically fall
> in the 50-5000 MHz range, while timer frequencies typically
> span tens of kHz up to hundreds of MHz. For example, 90 kHz
> is sometimes provided in multimedia systems (MPEG TS).

Why would you want to use such a slowly clocked counter for something
which is supposed to be able to produce delays in the micro-second and
potentially the nanosecond range?

get_cycles(), which is what the timer based delay is based upon, is
supposed to be a _high resolution counter_, preferably running at
the same kind of speeds as the CPU, though with a fixed clock rate.
It most definitely is not supposed to be in the kHz range.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 20:42             ` Russell King - ARM Linux
@ 2015-04-10 21:22               ` Mason
  2015-04-11  7:30                 ` Russell King - ARM Linux
  0 siblings, 1 reply; 16+ messages in thread
From: Mason @ 2015-04-10 21:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/04/2015 22:42, Russell King - ARM Linux wrote:
> On Fri, Apr 10, 2015 at 10:01:35PM +0200, Mason wrote:
>> There is, however, an important difference between loop-based
>> delays and timer-based delays; CPU frequencies typically fall
>> in the 50-5000 MHz range, while timer frequencies typically
>> span tens of kHz up to hundreds of MHz. For example, 90 kHz
>> is sometimes provided in multimedia systems (MPEG TS).
> 
> Why would you want to use such a slowly clocked counter for something
> which is supposed to be able to produce delays in the micro-second and
> potentially the nanosecond range?
> 
> get_cycles(), which is what the timer based delay is based upon, is
> supposed to be a _high resolution counter_, preferably running at
> the same kind of speeds as the CPU, though with a fixed clock rate.
> It most definitely is not supposed to be in the kHz range.

If there's only a single fixed clock in the system, I'd
use it for sched_clock, clocksource, and timer delay.
Are there other options?

It was you who wrote some time ago: "Timers are preferred
because of the problems with the software delay loop."
(My system implements DVFS.)

It seems to me that a 90 kHz timer is still better than
the jiffy counter, or am I mistaken again?

Regards.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-10 21:22               ` Mason
@ 2015-04-11  7:30                 ` Russell King - ARM Linux
  2015-04-11 11:57                   ` Mason
  0 siblings, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2015-04-11  7:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 10, 2015 at 11:22:56PM +0200, Mason wrote:
> On 10/04/2015 22:42, Russell King - ARM Linux wrote:
> > On Fri, Apr 10, 2015 at 10:01:35PM +0200, Mason wrote:
> >> There is, however, an important difference between loop-based
> >> delays and timer-based delays; CPU frequencies typically fall
> >> in the 50-5000 MHz range, while timer frequencies typically
> >> span tens of kHz up to hundreds of MHz. For example, 90 kHz
> >> is sometimes provided in multimedia systems (MPEG TS).
> > 
> > Why would you want to use such a slowly clocked counter for something
> > which is supposed to be able to produce delays in the micro-second and
> > potentially the nanosecond range?
> > 
> > get_cycles(), which is what the timer based delay is based upon, is
> > supposed to be a _high resolution counter_, preferably running at
> > the same kind of speeds as the CPU, though with a fixed clock rate.
> > It most definitely is not supposed to be in the kHz range.
> 
> If there's only a single fixed clock in the system, I'd
> use it for sched_clock, clocksource, and timer delay.
> Are there other options?
> 
> It was you who wrote some time ago: "Timers are preferred
> because of the problems with the software delay loop."
> (My system implements DVFS.)
> 
> It seems to me that a 90 kHz timer is still better than
> the jiffy counter, or am I mistaken again?

Given the choice of a 90kHz timer vs using a calibrated software delay
loop, the software delay loop wins.  I never envisioned that someone
would be silly enough to think that a 90kHz timer would somehow be
suitable to replace a software delay loop calibrated against a timer.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-11  7:30                 ` Russell King - ARM Linux
@ 2015-04-11 11:57                   ` Mason
  2015-04-11 12:10                     ` Russell King - ARM Linux
  0 siblings, 1 reply; 16+ messages in thread
From: Mason @ 2015-04-11 11:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/04/2015 09:30, Russell King - ARM Linux wrote:

> On Fri, Apr 10, 2015 at 11:22:56PM +0200, Mason wrote:
>
>> It was you who wrote some time ago: "Timers are preferred
>> because of the problems with the software delay loop."
>> (My system implements DVFS.)
>>
>> It seems to me that a 90 kHz timer is still better than
>> the jiffy counter, or am I mistaken again?
> 
> Given the choice of a 90kHz timer vs using a calibrated software
> delay loop, the software delay loop wins.  I never envisioned that
> someone would be silly enough to think

I'm full of surprises.

> that a 90kHz timer would somehow be suitable to replace a software
> delay loop calibrated against a timer.

Only one message ago, you were arguing that loop-based delays
could be up to 50% inaccurate. Thus, if one wanted to spin for
500 ?s, they'd have to request 1 ms just to be sure. An 11 ?s
accuracy looks like a better deal to me, overall.

Add DVFS to the mix, and that 500 ?s loop-based delay turns into
a 50 ?s delay when the other core decides to boost the cluster
from 100 MHz to 1 GHz. And then drivers break randomly.

Regards.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-11 11:57                   ` Mason
@ 2015-04-11 12:10                     ` Russell King - ARM Linux
  2015-04-11 13:45                       ` Mason
  0 siblings, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2015-04-11 12:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Apr 11, 2015 at 01:57:12PM +0200, Mason wrote:
> On 11/04/2015 09:30, Russell King - ARM Linux wrote:
> 
> > On Fri, Apr 10, 2015 at 11:22:56PM +0200, Mason wrote:
> >
> >> It was you who wrote some time ago: "Timers are preferred
> >> because of the problems with the software delay loop."
> >> (My system implements DVFS.)
> >>
> >> It seems to me that a 90 kHz timer is still better than
> >> the jiffy counter, or am I mistaken again?
> > 
> > Given the choice of a 90kHz timer vs using a calibrated software
> > delay loop, the software delay loop wins.  I never envisioned that
> > someone would be silly enough to think
> 
> I'm full of surprises.
> 
> > that a 90kHz timer would somehow be suitable to replace a software
> > delay loop calibrated against a timer.
> 
> Only one message ago, you were arguing that loop-based delays
> could be up to 50% inaccurate. Thus, if one wanted to spin for
> 500 ?s, they'd have to request 1 ms just to be sure. An 11 ?s
> accuracy looks like a better deal to me, overall.
> 
> Add DVFS to the mix, and that 500 ?s loop-based delay turns into
> a 50 ?s delay when the other core decides to boost the cluster
> from 100 MHz to 1 GHz. And then drivers break randomly.

*Think* please.

What you've just said is total rubbish.

As you've already found out, with a 90kHz clock, if you request a 5?s
delay, you get a zero delay.

That's because 90kHz has _insufficient_ resolution to be used for a
microsecond delay, and in that case it is _FAR_ better to use the
software delay loop.

Asking for a 1?s and getting an 11?s delay is _total_ bollocks.  It's
worse than a 50% error.  It's an 1100% error.

As I say, *THINK*.  I know it's a foreign idea.

You're not going to convince me to accept the idea that a 90kHz
counter is going to be suitable for a delay.  In fact, what you're
actually doing is convincing me that we need to put a test into
arch/arm/lib/delay.c to stop this stupidity: prevent it considering
any counter which does not have a clock rate in the MHz range.

That means we will reject your stupid 90kHz counter, and your
problem will be solved.

What this also means is that if you only have a 90kHz counter
available, doing DVFS is also a stupid idea.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Guarantee udelay(N) spins at least N microseconds
  2015-04-11 12:10                     ` Russell King - ARM Linux
@ 2015-04-11 13:45                       ` Mason
  0 siblings, 0 replies; 16+ messages in thread
From: Mason @ 2015-04-11 13:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/04/2015 14:10, Russell King - ARM Linux wrote:

> What this also means is that if you only have a 90kHz counter
> available, doing DVFS is also a stupid idea.

I must be missing something, because I don't see why DVFS
(i.e. saving energy when the system is idle) is a stupid
idea when only a 90 kHz counter is available?

What's the connection?

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-04-11 13:45 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-10 11:25 Guarantee udelay(N) spins at least N microseconds Mason
2015-04-10 11:42 ` Willy Tarreau
2015-04-10 14:53   ` Mason
2015-04-10 15:06     ` Willy Tarreau
2015-04-10 11:44 ` Russell King - ARM Linux
2015-04-10 12:41   ` Mason
2015-04-10 15:06     ` Russell King - ARM Linux
2015-04-10 15:30       ` Mason
2015-04-10 16:08         ` Russell King - ARM Linux
2015-04-10 20:01           ` Mason
2015-04-10 20:42             ` Russell King - ARM Linux
2015-04-10 21:22               ` Mason
2015-04-11  7:30                 ` Russell King - ARM Linux
2015-04-11 11:57                   ` Mason
2015-04-11 12:10                     ` Russell King - ARM Linux
2015-04-11 13:45                       ` Mason

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.