* How to get better precision out of getrusage on the ARM?
@ 2015-12-22 14:30 Patrick Doyle
  2015-12-22 14:49 ` Russell King - ARM Linux
  0 siblings, 1 reply; 16+ messages in thread
From: Patrick Doyle @ 2015-12-22 14:30 UTC (permalink / raw)
  To: linux-arm-kernel

Short version:
My application running on a Cortex-A5 processor (the SAMA5D2 from
Atmel) calls getrusage(RUSAGE_THREAD, ...), which returns cpu time
quantized to the kernel tick frequency (100Hz or 1Khz, depending on
CONFIG_HZ_100 vs CONFIG_HZ_1000).
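
For reference, the measurement boils down to something like this (a
minimal sketch, not my actual application code):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
        struct rusage ru;

        if (getrusage(RUSAGE_THREAD, &ru) == 0)
                /* both values come back quantized to the tick period */
                printf("user %ld.%06lds  sys %ld.%06lds\n",
                       (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
                       (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
        return 0;
}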

How can I get better precision for sched_clock on the (Cortex-A5) ARM?
 The x86 uses the TSC.

tl;dr version:

I see that RUSAGE_THREAD percolates through to use the clock for
CLOCK_THREAD_CPUTIME_ID, which percolates through to calling
task_sched_runtime(), which returns p->se.sum_exec_runtime, which is
ultimately updated by the value returned by sched_clock_cpu().

Ok, so what is sched_clock()?

According to Documentation/timers/timekeeping.txt:

"In addition to the clock sources and clock events there is a special weak
function in the kernel called sched_clock(). This function shall return the
number of nanoseconds since the system was started. An architecture may or
may not provide an implementation of sched_clock() on its own. If a local
implementation is not provided, the system jiffy counter will be used as
sched_clock()."

OK, so it seems to me that the ARM architecture does not define a
custom sched_clock(); instead, it relies upon the default
sched_clock(), which uses the clock source with the highest update
rate.  (Please correct me if I'm wrong -- I'm trying to educate myself
in a very short period of time here).

A system can have various clock sources.  A clock source may or may
not register itself with a call to sched_clock_register(). My Atmel
SAMA5D2 processor has two clock sources: timer-atmel-pit.c and
tcb_clksrc.c, neither of which calls sched_clock_register(), so the
default jiffy_sched_clock_read() function is registered, leading to my
100 Hz/1kHz timing granularity.

I'm looking for some advice on where to go from here.  I could modify
timer-atmel-pit.c or tcb_clksrc.c to call sched_clock_register() with
the appropriate free-running clock counter (tcb_clksrc looks like the
better candidate for that, I think).  But I feel like this
should be a solved problem and I shouldn't have to do this.  (I don't
mind doing this, I just hate reinventing wheels in the world of open
software).

I found a patch submitted in 2011
(http://lists.infradead.org/pipermail/linux-arm-kernel/2011-April/049549.html)
that proposes to provide a clocksource-based sched_clock(), but I
don't see any evidence of that in my 4.1 kernel.  Why wasn't that
patch accepted?

While I'm at it, would somebody mind pointing me in the direction of
documentation of the rationale behind the whole clocksource
abstraction to begin with?  How does the kernel choose between
multiple clock sources (yes, I know about the 'rating' field), and what
does it ultimately use its clocksource for, if not for sched_clock()?  Or,
why doesn't sched_clock() use the selected clocksource?

Thanks.

--wpd

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2015-12-22 14:30 How to get better precision out of getrusage on the ARM? Patrick Doyle
@ 2015-12-22 14:49 ` Russell King - ARM Linux
  2015-12-22 14:57   ` Patrick Doyle
  0 siblings, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2015-12-22 14:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 22, 2015 at 09:30:30AM -0500, Patrick Doyle wrote:
> Short version:
> My application running on a Cortex-A5 processor (the SAMA5D2 from
> Atmel) calls getrusage(RUSAGE_THREAD, ...), which returns cpu time
> quantized to the kernel tick frequency (100Hz or 1Khz, depending on
> CONFIG_HZ_100 vs CONFIG_HZ_1000).
> 
> How can I get better precision for sched_clock on the (Cortex-A5) ARM?
>  The x86 uses the TSC.

You need to provide a sched_clock() implementation using
sched_clock_register().
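
Something along these lines, perhaps (a sketch only -- the mapped base
address, the 0x10 counter offset and the 10 MHz rate below are
placeholders, not the real TC driver code):

#include <linux/init.h>
#include <linux/io.h>
#include <linux/sched_clock.h>

static void __iomem *counter_base;      /* mapped free-running counter */

static u64 notrace my_sched_clock_read(void)
{
        /* must return a value that increments monotonically at the
         * rate passed to sched_clock_register() below */
        return readl_relaxed(counter_base + 0x10);
}

static void __init my_timer_init(void)
{
        /* 32-bit counter ticking at, say, 10 MHz */
        sched_clock_register(my_sched_clock_read, 32, 10000000);
}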

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2015-12-22 14:49 ` Russell King - ARM Linux
@ 2015-12-22 14:57   ` Patrick Doyle
  2015-12-22 15:13     ` Russell King - ARM Linux
  0 siblings, 1 reply; 16+ messages in thread
From: Patrick Doyle @ 2015-12-22 14:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 22, 2015 at 9:49 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Dec 22, 2015 at 09:30:30AM -0500, Patrick Doyle wrote:
>> Short version:
>> My application running on a Cortex-A5 processor (the SAMA5D2 from
>> Atmel) calls getrusage(RUSAGE_THREAD, ...), which returns cpu time
>> quantized to the kernel tick frequency (100Hz or 1Khz, depending on
>> CONFIG_HZ_100 vs CONFIG_HZ_1000).
>>
>> How can I get better precision for sched_clock on the (Cortex-A5) ARM?
>>  The x86 uses the TSC.
>
> You need to provide a sched_clock() implementation using
> sched_clock_register().
>
Thank you.  I tried that by modifying the (Atmel-specific) tcb_clksrc,
and got a sched_clock that never incremented.  I have a question out
at the Atmel forum asking about this, and will continue to investigate
on my own, but I figured it was about time to ask the experts :-)

How does the kernel generate its tick clock?  Does it use a
clocksource (or the clkevt abstraction), or does it rely upon the
(PIT) timer registered much earlier in the boot process?

I guess I'm confused about the relationships between clocksources,
clock events, sched_clock, and the HZ tick interrupt.  Can anybody
point me at a good place to demystify these things for me?

--wpd

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2015-12-22 14:57   ` Patrick Doyle
@ 2015-12-22 15:13     ` Russell King - ARM Linux
  2015-12-22 16:28       ` Patrick Doyle
  0 siblings, 1 reply; 16+ messages in thread
From: Russell King - ARM Linux @ 2015-12-22 15:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 22, 2015 at 09:57:57AM -0500, Patrick Doyle wrote:
> On Tue, Dec 22, 2015 at 9:49 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Tue, Dec 22, 2015 at 09:30:30AM -0500, Patrick Doyle wrote:
> >> Short version:
> >> My application running on a Cortex-A5 processor (the SAMA5D2 from
> >> Atmel) calls getrusage(RUSAGE_THREAD, ...), which returns cpu time
> >> quantized to the kernel tick frequency (100Hz or 1Khz, depending on
> >> CONFIG_HZ_100 vs CONFIG_HZ_1000).
> >>
> >> How can I get better precision for sched_clock on the (Cortex-A5) ARM?
> >>  The x86 uses the TSC.
> >
> > You need to provide a sched_clock() implementation using
> > sched_clock_register().
> >
> Thank you, I tried that by modifying the (Atmel specific) tcb_clcksrc,
> and got a sched_clock that never incremented.  I have a question out
> at the Atmel forum asking about this, and will continue to investigate
> on my own, but I figured it was about time to ask the experts :-)
> 
> How does the kernel generate its tick clock?  Does it use a
> clocksource (or the clkevt abstraction), or does it rely upon the
> (PIT) timer registered much earlier in the boot process?
> 
> I guess I'm confused about the relationships between clocksources,
> clock events, sched_clock, and the HZ tick interrupt.  Can anybody
> point me at a good place to demystify these things for me?

sched_clock is an entirely separate clock from the rest of timekeeping.

However, when a platform doesn't supply a sched_clock, it falls back to
using a HZ-based time source - that being the kernel's jiffy counter.

Otherwise, if a platform supplies a sched_clock, registered through
sched_clock_register(), the counter which is read must increment
monotonically at the rate specified to sched_clock_register().

If you're not seeing sched_clock() increment, that suggests that
the read function supplied to sched_clock_register() is returning a
constant value.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2015-12-22 15:13     ` Russell King - ARM Linux
@ 2015-12-22 16:28       ` Patrick Doyle
  2015-12-22 21:23         ` Patrick Doyle
  0 siblings, 1 reply; 16+ messages in thread
From: Patrick Doyle @ 2015-12-22 16:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 22, 2015 at 10:13 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Dec 22, 2015 at 09:57:57AM -0500, Patrick Doyle wrote:
>> On Tue, Dec 22, 2015 at 9:49 AM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > On Tue, Dec 22, 2015 at 09:30:30AM -0500, Patrick Doyle wrote:
>> >> Short version:
>> >> My application running on a Cortex-A5 processor (the SAMA5D2 from
>> >> Atmel) calls getrusage(RUSAGE_THREAD, ...), which returns cpu time
>> >> quantized to the kernel tick frequency (100Hz or 1Khz, depending on
>> >> CONFIG_HZ_100 vs CONFIG_HZ_1000).
>> >>
>> >> How can I get better precision for sched_clock on the (Cortex-A5) ARM?
>> >>  The x86 uses the TSC.
>> >
>> > You need to provide a sched_clock() implementation using
>> > sched_clock_register().
>> >
>> Thank you, I tried that by modifying the (Atmel specific) tcb_clcksrc,
>> and got a sched_clock that never incremented.  I have a question out
>> at the Atmel forum asking about this, and will continue to investigate
>> on my own, but I figured it was about time to ask the experts :-)
Thank you for your help.

Now my sched_clock is incrementing (I have no idea why it wasn't
yesterday -- it must have been 5:00-itis).  I have passed a readback
function to sched_clock_register(), but I'm still seeing 1ms
resolution for the values returned by getrusage().  Are there other
configuration items I should be setting to get better resolution?  I
have tried setting CONFIG_HIGH_RES_TIMERS and
CONFIG_IRQ_TIME_ACCOUNTING.  It seems like I should choose
VIRT_CPU_ACCOUNTING_NATIVE, but that choice is not enabled because
HAVE_VIRT_CPU_ACCOUNTING is not defined.  HAVE_VIRT_CPU_ACCOUNTING
doesn't seem to be defined for the ARM architecture.

Any more thoughts or ideas?

--wpd

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2015-12-22 16:28       ` Patrick Doyle
@ 2015-12-22 21:23         ` Patrick Doyle
  2015-12-30 15:00           ` Patrick Doyle
  0 siblings, 1 reply; 16+ messages in thread
From: Patrick Doyle @ 2015-12-22 21:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 22, 2015 at 11:28 AM, Patrick Doyle <wpdster@gmail.com> wrote:
> On Tue, Dec 22, 2015 at 10:13 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Tue, Dec 22, 2015 at 09:57:57AM -0500, Patrick Doyle wrote:
>>> On Tue, Dec 22, 2015 at 9:49 AM, Russell King - ARM Linux
>>> <linux@arm.linux.org.uk> wrote:
>>> > On Tue, Dec 22, 2015 at 09:30:30AM -0500, Patrick Doyle wrote:
>>> >> Short version:
>>> >> My application running on a Cortex-A5 processor (the SAMA5D2 from
>>> >> Atmel) calls getrusage(RUSAGE_THREAD, ...), which returns cpu time
>>> >> quantized to the kernel tick frequency (100Hz or 1Khz, depending on
>>> >> CONFIG_HZ_100 vs CONFIG_HZ_1000).
>>> >>
>>> >> How can I get better precision for sched_clock on the (Cortex-A5) ARM?
>>> >>  The x86 uses the TSC.
>>> >
>>> > You need to provide a sched_clock() implementation using
>>> > sched_clock_register().
>>> >
>>> Thank you, I tried that by modifying the (Atmel specific) tcb_clcksrc,
>>> and got a sched_clock that never incremented.  I have a question out
>>> at the Atmel forum asking about this, and will continue to investigate
>>> on my own, but I figured it was about time to ask the experts :-)
> Thank you for your help.
>
> Now my sched_clock is incrementing, (I have no idea why it wasn't
> yesterday -- it must have been 5:00-itis).  I have passed a readback
> function to sched_clock_register(), but I'm still seeing 1ms
> resolution for the values returned by getrusage().  Are there other
> configuration items I should be setting to get better resolution?  I
> have tried setting CONFIG_HIGH_RES_TIMERS and
> CONFIG_IRQ_TIME_ACCOUNTING.  It seems like I should choose
> VIRT_CPU_ACCOUNTING_NATIVE, but that choice is not enabled because
> HAVE_VIRT_CPU_ACCOUNTING is not defined.  HAVE_VIRT_SPU_ACCOUNTING
> doesn't seem to be defined for the ARM architecture.

OK, answering my own question, it seems that I had to select
CONFIG_VIRT_CPU_ACCOUNTING_GEN, also known as "Full dynticks CPU time
accounting" (which somewhat confused me) in order to get the time
resolution provided by my counter.

I have also figured out why sched_clock wasn't incrementing yesterday
-- it seems to stop incrementing once my 32-bit counter wraps around.

I thought that sched_clock_register() would take care of wrap-around
issues.  I call it with bits = 32.  (A 32-bit counter at the 10 MHz rate
I'm using wraps after 2^32 / 10 MHz ~= 429 s, which matches the roughly 7
minutes I see.)  What else should I do in order for sched_clock to
continue counting beyond the first 7 minutes of life?

--wpd

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2015-12-22 21:23         ` Patrick Doyle
@ 2015-12-30 15:00           ` Patrick Doyle
  2015-12-30 15:52             ` Patrick Doyle
  2016-01-19  0:16             ` Alexandre Belloni
  0 siblings, 2 replies; 16+ messages in thread
From: Patrick Doyle @ 2015-12-30 15:00 UTC (permalink / raw)
  To: linux-arm-kernel

Continuing on...
I now have a CLOCKSOURCE_OF_DECLARE()'d 10 MHz clock source running
on my ARM processor (Atmel SAMA5D2 Xplained board).  It registers
itself through sched_clock_register() to provide a high-resolution
sched_clock.  Once I turned on "Full dynticks CPU time accounting"
(CONFIG_VIRT_CPU_ACCOUNTING_GEN), I was able to get better than jiffy
resolution from my calls to getrusage(RUSAGE_THREAD,..).  But things
still aren't quite right.  I am using getrusage() to provide some
runtime profile information to an existing application (that was
ported to run on Linux instead of a custom RTOS).  I have code that
looks like:

tick()
// commented out code that used to do something
tock()

where tick() & tock() are my profile "start" and "stop" points that
call getrusage() to record and accumulate time spent between calls
to tick() & tock().  Most of the time, I get a delta of 0 between the
two calls, which I expect.  But occasionally, I get a delta ranging
between 800us and 1000us, which I don't understand at all.  It seems
like my thread is being "charged" for time spent doing something else.
Perhaps an interrupt occurred and its time got charged to my thread;
perhaps a higher priority thread ran for 1ms, I don't know (yet).
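
For what it's worth, tick() and tock() amount to roughly the following
(a simplified sketch, not the actual code -- it only tracks user time):

#define _GNU_SOURCE
#include <sys/resource.h>
#include <sys/time.h>

static struct timeval t_start;
static long long total_us;      /* accumulated thread CPU time, in us */

static long long tv_us(struct timeval tv)
{
        return tv.tv_sec * 1000000LL + tv.tv_usec;
}

void tick(void)
{
        struct rusage ru;

        getrusage(RUSAGE_THREAD, &ru);
        t_start = ru.ru_utime;
}

void tock(void)
{
        struct rusage ru;

        getrusage(RUSAGE_THREAD, &ru);
        /* usually 0, but occasionally a whole tick (800-1000us) shows up */
        total_us += tv_us(ru.ru_utime) - tv_us(t_start);
}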

Does anybody have any suggestions as to where I might look, or as to
what kernel CONFIG options might make the most sense for an
application such as this?

--wpd

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2015-12-30 15:00           ` Patrick Doyle
@ 2015-12-30 15:52             ` Patrick Doyle
  2016-01-01 18:14               ` Corey Minyard
  2016-01-19  0:16             ` Alexandre Belloni
  1 sibling, 1 reply; 16+ messages in thread
From: Patrick Doyle @ 2015-12-30 15:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 30, 2015 at 10:00 AM, Patrick Doyle <wpdster@gmail.com> wrote:
> Continuing on...
> I now have a CLOCKSOURCE_OF_DECLARED()'ed 10 MHz clock source running
> on my ARM processor (Atmel SAMA5D2 Xplained board).  It registers
> itself through sched_clock_register() to provide a high resolution
> sched clock.  Once I turned on "Full dynticks CPU time accounting"
> (CONFIG_VIRT_CPU_ACCOUNTING_GEN), I was able to get better than jiffy
> resolution from my calls to getrusage(RUSAGE_THREAD,..).  But things
> still aren't quite right.  I am using getrusage() to provide some
> runtime profile information to an existing application (that was
> ported to run on Linux instead of a custom RTOS).  I have code that
> looks like:
>
> tick()
> // commented out code that used to do something
> tock()
>
> where tick() & tock() are my profile "start" and "stop" points that
> call getrusage() to record and and accumulate time spent between calls
> to tick() & tock().  Most of the time, I get a delta of 0 between the
> two calls, which I expect.  But occasionally, I get a delta ranging
> between 800us and 1000us, which I don't understand at all.  It seems
> like my thread is being "charged" for time spent doing something else.
> Perhaps an interrupt occurred and its time got charged to my thread;
> perhaps a higher priority thread ran for 1ms, I don't know (yet).
>
> Does anybody have any suggestions as to where I might look, or as to
> what kernel CONFIG options might make the most sense for an
> application such as this?
>
> --wpd

A couple more (confusing) data points...
- Changing the tick rate to 100Hz results in deltas as extreme as 9400us.
- Using clock_gettime(CLOCK_THREAD_CPUTIME_ID,...) instead of
getrusage(RUSAGE_THREAD,...) gives much more believable numbers in the
15-25us range, but still with a few bizarre excursions to 41, 69, and
172us (for one random test case).
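
The clock_gettime() variant looks roughly like this (sketch only; older
glibc needs -lrt):

#include <stdio.h>
#include <time.h>

/* CPU time consumed by the calling thread, in nanoseconds. */
static long long thread_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
        return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
        long long a = thread_ns();
        /* profiled region would go here */
        long long b = thread_ns();

        printf("delta = %lld ns\n", b - a);
        return 0;
}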

--wpd

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2015-12-30 15:52             ` Patrick Doyle
@ 2016-01-01 18:14               ` Corey Minyard
  2016-01-04 15:46                 ` Patrick Doyle
  2016-01-19  4:50                 ` Yang, Wenyou
  0 siblings, 2 replies; 16+ messages in thread
From: Corey Minyard @ 2016-01-01 18:14 UTC (permalink / raw)
  To: linux-arm-kernel

Those numbers are statistical: if a tick occurs while something is
running, that something is assigned the entire timer tick.  So, as you
have found, you can get some pretty unusual numbers out of this,
especially with short intervals.
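
As a toy illustration (this is not the kernel's actual code), tick-based
accounting amounts to:

/* Whichever task is current when the HZ interrupt fires is charged a
 * full jiffy, however little of that jiffy it actually ran. */
#define TOY_HZ 1000

struct toy_task {
        unsigned long long cpu_us;      /* accounted CPU time, microseconds */
};

static void toy_account_tick(struct toy_task *cur)
{
        cur->cpu_us += 1000000 / TOY_HZ;        /* 1000us per tick at HZ=1000 */
}

A thread that runs for 50us but is on the CPU when the tick fires gets
charged 1000us; one that runs for 900us entirely between ticks gets
charged nothing.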

I have some patches at https://sourceforge.net/projects/microstate that 
add the ability to do accurate accounting of time, if you really need that.

-corey

On 12/30/2015 09:52 AM, Patrick Doyle wrote:
> On Wed, Dec 30, 2015 at 10:00 AM, Patrick Doyle <wpdster@gmail.com> wrote:
>> Continuing on...
>> I now have a CLOCKSOURCE_OF_DECLARED()'ed 10 MHz clock source running
>> on my ARM processor (Atmel SAMA5D2 Xplained board).  It registers
>> itself through sched_clock_register() to provide a high resolution
>> sched clock.  Once I turned on "Full dynticks CPU time accounting"
>> (CONFIG_VIRT_CPU_ACCOUNTING_GEN), I was able to get better than jiffy
>> resolution from my calls to getrusage(RUSAGE_THREAD,..).  But things
>> still aren't quite right.  I am using getrusage() to provide some
>> runtime profile information to an existing application (that was
>> ported to run on Linux instead of a custom RTOS).  I have code that
>> looks like:
>>
>> tick()
>> // commented out code that used to do something
>> tock()
>>
>> where tick() & tock() are my profile "start" and "stop" points that
>> call getrusage() to record and and accumulate time spent between calls
>> to tick() & tock().  Most of the time, I get a delta of 0 between the
>> two calls, which I expect.  But occasionally, I get a delta ranging
>> between 800us and 1000us, which I don't understand at all.  It seems
>> like my thread is being "charged" for time spent doing something else.
>> Perhaps an interrupt occurred and its time got charged to my thread;
>> perhaps a higher priority thread ran for 1ms, I don't know (yet).
>>
>> Does anybody have any suggestions as to where I might look, or as to
>> what kernel CONFIG options might make the most sense for an
>> application such as this?
>>
>> --wpd
> A couple of more (confusing) data points...
> - Changing the tick rate to 100Hz results in deltas as extreme as 9400us.
> - Using clock_gettime(CLOCK_THREAD_CPUTIME_ID,...) instead of
> getrusage(RUSAGE_THREAD,...) gives much more believable numbers in the
> 15-25us range, but still with a few bizarre excursions to 41, 69, and
> 172us (for one random test case).
>
> --wpd
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2016-01-01 18:14               ` Corey Minyard
@ 2016-01-04 15:46                 ` Patrick Doyle
  2016-01-19  4:50                 ` Yang, Wenyou
  1 sibling, 0 replies; 16+ messages in thread
From: Patrick Doyle @ 2016-01-04 15:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 1, 2016 at 1:14 PM, Corey Minyard <tcminyard@gmail.com> wrote:
> Those numbers are statistical, if a tick occurs while something is running,
> that something is assigned the entire timer tick.  So as you have found, you
> can get some pretty unusual numbers on this, especially with short
> intervals.
>
> I have some patches at https://sourceforge.net/projects/microstate that add
> the ability to do accurate accounting of time, if you really need that.
Thank you Corey,
That makes sense.  I have found that using
clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...) provides the information I
require.  I was confused by the accounting of timer ticks, and
especially by the fact that the accounting changed from exactly 1000us
(or 10000us) to a range centered about 1000us as I fiddled with config
parameters.  I felt sure that one of those config options would result
in a thread being charged only for the time it actually used.
clock_gettime() returns a thread-specific timer that I can use.

At this point, I remain confused/disappointed by the 3us time interval
between calls to clock_gettime().  That feels like a long time for a
pair of context switches into the kernel to get the thread timer, but
I guess that's what it is (on my 500 MHz Cortex-A5).

--wpd

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2015-12-30 15:00           ` Patrick Doyle
  2015-12-30 15:52             ` Patrick Doyle
@ 2016-01-19  0:16             ` Alexandre Belloni
  2016-01-19 14:19               ` Patrick Doyle
  1 sibling, 1 reply; 16+ messages in thread
From: Alexandre Belloni @ 2016-01-19  0:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Patrick,

I was wondering whether you had the chance to test this patch from the
RT tree:
http://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=v4.4-rt2&id=57142bdff523a67657d0b2603eaa91df58b88bd8

when trying your sched_clock experiment?

On 30/12/2015 at 10:00:46 -0500, Patrick Doyle wrote :
> Continuing on...
> I now have a CLOCKSOURCE_OF_DECLARED()'ed 10 MHz clock source running
> on my ARM processor (Atmel SAMA5D2 Xplained board).  It registers
> itself through sched_clock_register() to provide a high resolution
> sched clock.  Once I turned on "Full dynticks CPU time accounting"
> (CONFIG_VIRT_CPU_ACCOUNTING_GEN), I was able to get better than jiffy
> resolution from my calls to getrusage(RUSAGE_THREAD,..).  But things
> still aren't quite right.  I am using getrusage() to provide some
> runtime profile information to an existing application (that was
> ported to run on Linux instead of a custom RTOS).  I have code that
> looks like:
> 
> tick()
> // commented out code that used to do something
> tock()
> 
> where tick() & tock() are my profile "start" and "stop" points that
> call getrusage() to record and and accumulate time spent between calls
> to tick() & tock().  Most of the time, I get a delta of 0 between the
> two calls, which I expect.  But occasionally, I get a delta ranging
> between 800us and 1000us, which I don't understand at all.  It seems
> like my thread is being "charged" for time spent doing something else.
> Perhaps an interrupt occurred and its time got charged to my thread;
> perhaps a higher priority thread ran for 1ms, I don't know (yet).
> 
> Does anybody have any suggestions as to where I might look, or as to
> what kernel CONFIG options might make the most sense for an
> application such as this?
> 
> --wpd
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2016-01-01 18:14               ` Corey Minyard
  2016-01-04 15:46                 ` Patrick Doyle
@ 2016-01-19  4:50                 ` Yang, Wenyou
  2016-01-19 14:36                   ` Patrick Doyle
  1 sibling, 1 reply; 16+ messages in thread
From: Yang, Wenyou @ 2016-01-19  4:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Patrick,

On 2016/1/2 2:14, Corey Minyard wrote:
> Those numbers are statistical, if a tick occurs while something is 
> running, that something is assigned the entire timer tick.  So as you 
> have found, you can get some pretty unusual numbers on this, 
> especially with short intervals.
>
> I have some patches at https://sourceforge.net/projects/microstate 
> that add the ability to do accurate accounting of time, if you really 
> need that.

I want to run similar tests on my side.

It seems you are not using this project.  What do you use as your
application?  Could you give some information?

Have your questions been resolved?  What is remaining?

Thank you!

>
> -corey
>
> On 12/30/2015 09:52 AM, Patrick Doyle wrote:
>> On Wed, Dec 30, 2015 at 10:00 AM, Patrick Doyle <wpdster@gmail.com> 
>> wrote:
>>> Continuing on...
>>> I now have a CLOCKSOURCE_OF_DECLARED()'ed 10 MHz clock source running
>>> on my ARM processor (Atmel SAMA5D2 Xplained board).  It registers
>>> itself through sched_clock_register() to provide a high resolution
>>> sched clock.  Once I turned on "Full dynticks CPU time accounting"
>>> (CONFIG_VIRT_CPU_ACCOUNTING_GEN), I was able to get better than jiffy
>>> resolution from my calls to getrusage(RUSAGE_THREAD,..).  But things
>>> still aren't quite right.  I am using getrusage() to provide some
>>> runtime profile information to an existing application (that was
>>> ported to run on Linux instead of a custom RTOS).  I have code that
>>> looks like:
>>>
>>> tick()
>>> // commented out code that used to do something
>>> tock()
>>>
>>> where tick() & tock() are my profile "start" and "stop" points that
>>> call getrusage() to record and and accumulate time spent between calls
>>> to tick() & tock().  Most of the time, I get a delta of 0 between the
>>> two calls, which I expect.  But occasionally, I get a delta ranging
>>> between 800us and 1000us, which I don't understand at all.  It seems
>>> like my thread is being "charged" for time spent doing something else.
>>> Perhaps an interrupt occurred and its time got charged to my thread;
>>> perhaps a higher priority thread ran for 1ms, I don't know (yet).
>>>
>>> Does anybody have any suggestions as to where I might look, or as to
>>> what kernel CONFIG options might make the most sense for an
>>> application such as this?
>>>
>>> --wpd
>> A couple of more (confusing) data points...
>> - Changing the tick rate to 100Hz results in deltas as extreme as 
>> 9400us.
>> - Using clock_gettime(CLOCK_THREAD_CPUTIME_ID,...) instead of
>> getrusage(RUSAGE_THREAD,...) gives much more believable numbers in the
>> 15-25us range, but still with a few bizarre excursions to 41, 69, and
>> 172us (for one random test case).
>>
>> --wpd
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Best Regards,
Wenyou Yang

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2016-01-19  0:16             ` Alexandre Belloni
@ 2016-01-19 14:19               ` Patrick Doyle
  0 siblings, 0 replies; 16+ messages in thread
From: Patrick Doyle @ 2016-01-19 14:19 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Alexandre,
I missed that patch.  I'll have to take a look at that.  Thank you.

Regardless, it seems that I need to modify tcb_clksrc.c to register
itself through CLOCKSOURCE_OF_DECLARE(), set the appropriate kernel
configuration options, and use clock_gettime(CLOCK_THREAD_CPUTIME_ID)
in order to get better than 1 tick resolution out of clock_gettime for
my threads.  Unfortunately, the tcb_clksrc.c modifications appear to
be incompatible with the design intent of atmel_tclib.c.  I am waiting
for the opportunity to talk to Atmel about this.  (I need to ping my
local FAE about this again).  I also need to turn up the heat on this
back-burnered task.  (I got some numbers, none of us are happy with
them, and I need to see what I can do to get better numbers, when I
can make the time to do so.)
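
The shape of the change I have in mind is roughly this (a generic
sketch, not my actual tcb_clksrc.c patch -- the compatible string,
register offsets and rating are placeholders, and the init prototype
varies between kernel versions):

#include <linux/clocksource.h>
#include <linux/init.h>
#include <linux/io.h>
#include <linux/of.h>
#include <linux/of_address.h>
#include <linux/sched_clock.h>

static void __iomem *timer_base;

static u64 notrace example_sched_clock_read(void)
{
        return readl_relaxed(timer_base + 0x10);        /* counter value register */
}

static void __init example_timer_init(struct device_node *np)
{
        timer_base = of_iomap(np, 0);

        /* 32-bit up-counter at 10 MHz, rated above the PIT clocksource */
        clocksource_mmio_init(timer_base + 0x10, "example-timer", 10000000,
                              300, 32, clocksource_mmio_readl_up);
        sched_clock_register(example_sched_clock_read, 32, 10000000);
}
CLOCKSOURCE_OF_DECLARE(example_timer, "vendor,free-running-timer",
                       example_timer_init);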

--wpd


On Mon, Jan 18, 2016 at 7:16 PM, Alexandre Belloni
<alexandre.belloni@free-electrons.com> wrote:
> Hi Patrick,
>
> I was wondering whether you had the chance to test this patch from the
> RT tree:
> http://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=v4.4-rt2&id=57142bdff523a67657d0b2603eaa91df58b88bd8
>
> when trying your sched_clock experiment?
>
> On 30/12/2015 at 10:00:46 -0500, Patrick Doyle wrote :
>> Continuing on...
>> I now have a CLOCKSOURCE_OF_DECLARED()'ed 10 MHz clock source running
>> on my ARM processor (Atmel SAMA5D2 Xplained board).  It registers
>> itself through sched_clock_register() to provide a high resolution
>> sched clock.  Once I turned on "Full dynticks CPU time accounting"
>> (CONFIG_VIRT_CPU_ACCOUNTING_GEN), I was able to get better than jiffy
>> resolution from my calls to getrusage(RUSAGE_THREAD,..).  But things
>> still aren't quite right.  I am using getrusage() to provide some
>> runtime profile information to an existing application (that was
>> ported to run on Linux instead of a custom RTOS).  I have code that
>> looks like:
>>
>> tick()
>> // commented out code that used to do something
>> tock()
>>
>> where tick() & tock() are my profile "start" and "stop" points that
>> call getrusage() to record and and accumulate time spent between calls
>> to tick() & tock().  Most of the time, I get a delta of 0 between the
>> two calls, which I expect.  But occasionally, I get a delta ranging
>> between 800us and 1000us, which I don't understand at all.  It seems
>> like my thread is being "charged" for time spent doing something else.
>> Perhaps an interrupt occurred and its time got charged to my thread;
>> perhaps a higher priority thread ran for 1ms, I don't know (yet).
>>
>> Does anybody have any suggestions as to where I might look, or as to
>> what kernel CONFIG options might make the most sense for an
>> application such as this?
>>
>> --wpd
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> --
> Alexandre Belloni, Free Electrons
> Embedded Linux, Kernel and Android engineering
> http://free-electrons.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2016-01-19  4:50                 ` Yang, Wenyou
@ 2016-01-19 14:36                   ` Patrick Doyle
  2016-01-20  1:24                     ` Yang, Wenyou
  2016-01-20 14:35                     ` Corey Minyard
  0 siblings, 2 replies; 16+ messages in thread
From: Patrick Doyle @ 2016-01-19 14:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 18, 2016 at 11:50 PM, Yang, Wenyou <wenyou.yang@atmel.com> wrote:
> On 2016/1/2 2:14, Corey Minyard wrote:
>> I have some patches at https://sourceforge.net/projects/microstate that
>> add the ability to do accurate accounting of time, if you really need that.
>
>
> I want to run this similar tests on my side.
>
> It seems you don't use this project. What project do you use as your
> application?  Could you give some information?
>
> Did your questions resolve?  What is remaining?
Hello Wenyou,
For my project, I was porting an existing application that ran on a
custom embedded OS.  Part of that custom embedded OS provided
per-thread CPU runtime timers.  As I looked around for a mechanism to
implement that, I found that clock_gettime(2) with the
CLOCK_THREAD_CPUTIME_ID clk_id should provide the same capability in a
POSIX-ly correct manner.

The problem I ran into was that the values returned by
CLOCK_THREAD_CPUTIME_ID were quantized to multiples of the system tick
period.  So I hacked up tcb_clksrc.c to register itself using
CLOCKSOURCE_OF_DECLARE() and set the appropriate kernel options
(CONFIG_HIGH_RES_TIMERS and CONFIG_NO_HZ_IDLE, I think, but I can't
seem to find my notes or config files from that work -- I hate Monday
mornings, even virtual ones!).  As I noted in my previous email, I am
not happy with my changes to tcb_clksrc.c and would like the
opportunity to discuss them with you and your colleagues at Atmel and
figure out the best way to approach that.  I'm happy to send the
patches to you, but I don't think they should go into the linux-at91
tree without a lot more thought applied.

--wpd

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2016-01-19 14:36                   ` Patrick Doyle
@ 2016-01-20  1:24                     ` Yang, Wenyou
  2016-01-20 14:35                     ` Corey Minyard
  1 sibling, 0 replies; 16+ messages in thread
From: Yang, Wenyou @ 2016-01-20  1:24 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Patrick,

Adding Cyrille to the loop; Cyrille is a timer guru in our company.

> -----Original Message-----
> From: Patrick Doyle [mailto:wpdster at gmail.com]
> Sent: 2016?1?19? 22:37
> To: Yang, Wenyou <Wenyou.Yang@atmel.com>
> Cc: Corey Minyard <tcminyard@gmail.com>; linux-arm-
> kernel at lists.infradead.org
> Subject: Re: How to get better precision out of getrusage on the ARM?
> 
> On Mon, Jan 18, 2016 at 11:50 PM, Yang, Wenyou <wenyou.yang@atmel.com>
> wrote:
> > On 2016/1/2 2:14, Corey Minyard wrote:
> >> I have some patches at https://sourceforge.net/projects/microstate
> >> that add the ability to do accurate accounting of time, if you really need that.
> >
> >
> > I want to run this similar tests on my side.
> >
> > It seems you don't use this project. What project do you use as your
> > application?  Could you give some information?
> >
> > Did your questions resolve?  What is remaining?
> Hello Wenyou,
> For my project, I was porting an existing application that ran on a custom
> embedded OS.  Part of that custom embedded OS provided per-thread CPU
> runtime timers.  As I looked around for a mechanism to implement that, I found
> clock_gettime(2) using the CLOCK_THREAD_CPUTIME_ID clk_id should provide
> the same capability in a Posix-ly correct manner.

Thank you for your information.

> 
> The problem I ran into was the values returned by
> CLOCK_THREAD_CPUTIME_ID were quantized to multiples of the system tick
> frequency.  So I hacked up tcb_clksrc.c to register itself using
> CLOCKSOURCE_OF_DECLARE() and set the appropriate kernel options
> (CONFIG_HIGH_RES_TIMERS and CONFIG_NO_HZ_IDLE, I think, but I can't
> seem to find my notes or config files from that work -- I hate Monday mornings,
> even virtual ones!).  As I noted in my previous email, I am not happy with my
> changes to tcb_clksrc.c and would like the opportunity to discuss them with you
> and your colleagues at Atmel and figure out the best way to approach that.  I'm
> happy to send the patches to you, but I don't think they should go into the linux-
> at91 tree without a lot more thought applied.

Please send me your patch; with it, we will have a good starting point for investigating it.

As you know, we are spending time on this issue.

Don't worry.

Thanks.


Best Regards,
Wenyou Yang

^ permalink raw reply	[flat|nested] 16+ messages in thread

* How to get better precision out of getrusage on the ARM?
  2016-01-19 14:36                   ` Patrick Doyle
  2016-01-20  1:24                     ` Yang, Wenyou
@ 2016-01-20 14:35                     ` Corey Minyard
  1 sibling, 0 replies; 16+ messages in thread
From: Corey Minyard @ 2016-01-20 14:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/19/2016 08:36 AM, Patrick Doyle wrote:
> On Mon, Jan 18, 2016 at 11:50 PM, Yang, Wenyou <wenyou.yang@atmel.com> wrote:
>> On 2016/1/2 2:14, Corey Minyard wrote:
>>> I have some patches at https://sourceforge.net/projects/microstate that
>>> add the ability to do accurate accounting of time, if you really need that.
>>
>> I want to run this similar tests on my side.
>>
>> It seems you don't use this project. What project do you use as your
>> application?  Could you give some information?
>>
>> Did your questions resolve?  What is remaining?
> Hello Wenyou,
> For my project, I was porting an existing application that ran on a
> custom embedded OS.  Part of that custom embedded OS provided
> per-thread CPU runtime timers.  As I looked around for a mechanism to
> implement that, I found clock_gettime(2) using the
> CLOCK_THREAD_CPUTIME_ID clk_id should provide the same capability in a
> Posix-ly correct manner.

Those numbers are still statistical by default.  They're the same numbers
as getrusage(), just behind a POSIX interface.

I forgot: there is a new kernel option that may provide what you need,
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE.  I have not tested this, but from
the description it may do what you want.  It appears to be available only
on some arches.  There are some other timekeeping options in init/Kconfig;
you may want to look at those.

-corey

> The problem I ran into was the values returned by
> CLOCK_THREAD_CPUTIME_ID were quantized to multiples of the system tick
> frequency.  So I hacked up tcb_clksrc.c to register itself using
> CLOCKSOURCE_OF_DECLARE() and set the appropriate kernel options
> (CONFIG_HIGH_RES_TIMERS and CONFIG_NO_HZ_IDLE, I think, but I can't
> seem to find my notes or config files from that work -- I hate Monday
> mornings, even virtual ones!).  As I noted in my previous email, I am
> not happy with my changes to tcb_clksrc.c and would like the
> opportunity to discuss them with you and your colleagues at Atmel and
> figure out the best way to approach that.  I'm happy to send the
> patches to you, but I don't think they should go into the linux-at91
> tree without a lot more thought applied.
>
> --wpd

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-01-20 14:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-22 14:30 How to get better precision out of getrusage on the ARM? Patrick Doyle
2015-12-22 14:49 ` Russell King - ARM Linux
2015-12-22 14:57   ` Patrick Doyle
2015-12-22 15:13     ` Russell King - ARM Linux
2015-12-22 16:28       ` Patrick Doyle
2015-12-22 21:23         ` Patrick Doyle
2015-12-30 15:00           ` Patrick Doyle
2015-12-30 15:52             ` Patrick Doyle
2016-01-01 18:14               ` Corey Minyard
2016-01-04 15:46                 ` Patrick Doyle
2016-01-19  4:50                 ` Yang, Wenyou
2016-01-19 14:36                   ` Patrick Doyle
2016-01-20  1:24                     ` Yang, Wenyou
2016-01-20 14:35                     ` Corey Minyard
2016-01-19  0:16             ` Alexandre Belloni
2016-01-19 14:19               ` Patrick Doyle
