linux-kernel.vger.kernel.org archive mirror
* bug seen with dynticks from CONFIG_HARDIRQS_SW_RESEND
@ 2007-05-16 23:20 Woodruff, Richard
  2007-05-17 10:10 ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Woodruff, Richard @ 2007-05-16 23:20 UTC (permalink / raw)
  To: tglx; +Cc: linux-kernel

Hi,

In testing we noticed intermittent crashes in profile_tick() when
dyntick was enabled.

The crashes happened because the frame pointer per_cpu____irq_regs value
was 0.  That code does a user_mode(get_irq_regs()).  Currently regs is
set only upon real hardware entry for an IRQ.
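
For reference, the accessors involved look roughly like this (a sketch
along the lines of include/asm-generic/irq_regs.h; not a verbatim
copy):

        DECLARE_PER_CPU(struct pt_regs *, __irq_regs);

        static inline struct pt_regs *get_irq_regs(void)
        {
                /* NULL unless an interrupt entry path published a frame */
                return __get_cpu_var(__irq_regs);
        }

        static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
        {
                struct pt_regs *old_regs, **pp_regs = &__get_cpu_var(__irq_regs);

                old_regs = *pp_regs;
                *pp_regs = new_regs;
                return old_regs;
        }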

The crash path shows resend_irqs() could be called within a context
where set_irq_regs() was not executed.  In one specific case this was
from
softirq->tasklet_action(resend_tasklet)->resend_irqs->handle_level_irq->
handle_IRQ_event->...->profile_tick.
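
The actual dereference is in profile_tick(); roughly (from
kernel/profile.c of this era, abridged):

        void profile_tick(int type)
        {
                struct pt_regs *regs = get_irq_regs();

                /* regs is 0 on the resend path; user_mode(regs) reads the
                 * saved status word out of *regs and oopses */
                if (type == CPU_PROFILING && timer_hook)
                        timer_hook(regs);
                if (!user_mode(regs) &&
                    cpu_isset(smp_processor_id(), prof_cpu_mask))
                        profile_hit(type, (void *)profile_pc(regs));
        }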

It seems anyone calling kernel/irq/manage.c:enable_irq() at the wrong
time can trigger this crash.

Creating a fake stack and doing a set_irq_regs() fixes the crash.  Would
it be useful to set a pointer to the entry context on all state changes?
For ease I just hacked a default fake stack into the init process after
fork time, so there is never a 0, but that doesn't seem so nice.
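
For illustration only, the hack amounts to something like this (the
helper name is made up, and it trades the crash for bogus profiling
data):

        /* dummy frame so get_irq_regs() never returns 0 on resend paths */
        static DEFINE_PER_CPU(struct pt_regs, fake_irq_regs);

        static void resend_one_irq(struct irq_desc *desc, unsigned int irq)
        {
                struct pt_regs *old_regs;

                old_regs = set_irq_regs(&__get_cpu_var(fake_irq_regs));
                desc->handle_irq(irq, desc);
                set_irq_regs(old_regs);
        }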

Regards,
Richard W.



* Re: bug seen with dynticks from CONFIG_HARDIRQS_SW_RESEND
  2007-05-16 23:20 bug seen with dynticks from CONFIG_HARDIRQS_SW_RESEND Woodruff, Richard
@ 2007-05-17 10:10 ` Thomas Gleixner
  2007-05-17 20:14   ` Woodruff, Richard
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2007-05-17 10:10 UTC (permalink / raw)
  To: Woodruff, Richard; +Cc: linux-kernel, Ingo Molnar

On Wed, 2007-05-16 at 18:20 -0500, Woodruff, Richard wrote:
> The crashes happened because the frame pointer per_cpu____irq_regs value
> was 0.  That code does a user_mode(get_irq_regs()).  Currently regs is
> set only upon real hardware entry for an IRQ.
> 
> The crash path shows resend_irqs() could be called within a context
> where set_irq_regs() was not executed.  In one specific case this was
> from
> softirq->tasklet_action(resend_tasklet)->resend_irqs->handle_level_irq->
> handle_IRQ_event->...->profile_tick.
> 
> It seems anyone calling kernel/irq/manage.c:enable_irq() at the wrong
> time can trigger this crash.

which code is disabling / enabling the timer interrupt ?

	tglx




* RE: bug seen with dynticks from CONFIG_HARDIRQS_SW_RESEND
  2007-05-17 10:10 ` Thomas Gleixner
@ 2007-05-17 20:14   ` Woodruff, Richard
  2007-05-17 20:38     ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Woodruff, Richard @ 2007-05-17 20:14 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Ingo Molnar

> On Wed, 2007-05-16 at 18:20 -0500, Woodruff, Richard wrote:
> > The crashes happened because the frame pointer per_cpu____irq_regs
> > value was 0.  That code does a user_mode(get_irq_regs()).  Currently
> > regs is set only upon real hardware entry for an IRQ.
> >
> > The crash path shows resend_irqs() could be called within a context
> > where set_irq_regs() was not executed.  In one specific case this was
> > from
> > softirq->tasklet_action(resend_tasklet)->resend_irqs->handle_level_irq->
> > handle_IRQ_event->...->profile_tick.
> >
> > It seems anyone calling kernel/irq/manage.c:enable_irq() at the wrong
> > time can trigger this crash.
> 
> which code is disabling / enabling the timer interrupt ?

- No one in this case is calling enable_irq(#timer).  The failure is
triggered from a non-tick-related enable_irq(#x).  The function
handle_IRQ_event() always calls handle_dynamic_tick().  Thus every real
interrupt, or fake interrupt through resend_irq, will touch the timer
code paths.

To better describe:
  -0- User space does an ioctl to a driver
  -1- The driver calls enable_irq(#x)
  -2- This triggers a check_irq_resend()
  -3- This causes a tasklet schedule of the resend_tasklet for #x
  -4- The driver later does a spin_unlock_bh()
  -5- This triggers a check for softirqs/tasklets
  -6- The resend_tasklet is run and calls desc->handle_irq
  -7- This calls handle_level_irq
  -8- This calls handle_IRQ_event
  -9- This first calls handle_dynamic_tick
  -A- This calls through the tick code to the tick update
  -B- Finally you die in profile_tick
  -C- Boom: dereference of 0 from user_mode(regs)

As there was no real interrupt, the frame marker for irq_regs was never
set and the system dies.  Entry was via a trap from the ioctl, not the
IRQ entry path (do_irq).
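
The resend machinery looks roughly like this (paraphrased from
kernel/irq/resend.c of this era; note that nothing on this path calls
set_irq_regs()):

        static DECLARE_BITMAP(irqs_resend, NR_IRQS);

        static void resend_irqs(unsigned long arg)
        {
                struct irq_desc *desc;
                int irq;

                while (!bitmap_empty(irqs_resend, NR_IRQS)) {
                        irq = find_first_bit(irqs_resend, NR_IRQS);
                        clear_bit(irq, irqs_resend);
                        desc = irq_desc + irq;
                        local_irq_disable();
                        desc->handle_irq(irq, desc);    /* step -6- above */
                        local_irq_enable();
                }
        }

        /* runs in softirq context, e.g. from the spin_unlock_bh() in -4- */
        static DECLARE_TASKLET(resend_tasklet, resend_irqs, 0);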

A dummy non-zero frame allows it to work but doesn't give true
profiling.  The resend path seems generally unsafe today.  Why not set
irq_regs on traps as well?

Regards,
Richard W.



* RE: bug seen with dynticks from CONFIG_HARDIRQS_SW_RESEND
  2007-05-17 20:14   ` Woodruff, Richard
@ 2007-05-17 20:38     ` Thomas Gleixner
  2007-05-17 22:24       ` Woodruff, Richard
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2007-05-17 20:38 UTC (permalink / raw)
  To: Woodruff, Richard; +Cc: linux-kernel, Ingo Molnar

On Thu, 2007-05-17 at 15:14 -0500, Woodruff, Richard wrote:
> > which code is disabling / enabling the timer interrupt ?
> 
> - No one in this case is calling enable_irq(#timer).  The failure is
> triggered from a non-tick-related enable_irq(#x).  The function
> handle_IRQ_event() always calls handle_dynamic_tick().  Thus every real
> interrupt, or fake interrupt through resend_irq, will touch the timer
> code paths.
> 
> To better describe:
>   -0- User space does an ioctl to a driver
>   -1- The driver calls enable_irq(#x)
>   -2- This triggers a check_irq_resend()
>   -3- This causes a tasklet schedule of the resend_tasklet for #x
>   -4- The driver later does a spin_unlock_bh()
>   -5- This triggers a check for softirqs/tasklets
>   -6- The resend_tasklet is run and calls desc->handle_irq
>   -7- This calls handle_level_irq
>   -8- This calls handle_IRQ_event
>   -9- This first calls handle_dynamic_tick

This is the original ARM dyntick stuff, right ?

The dyntick support on your architecture is broken.  Why does it fiddle
with the timer when the system is not idle?

This stuff should go away ASAP.  A lot of ARMs are already converted to
clockevents and are using the generic NOHZ implementation, which does
not have those problems at all.

	tglx





* RE: bug seen with dynticks from CONFIG_HARDIRQS_SW_RESEND
  2007-05-17 20:38     ` Thomas Gleixner
@ 2007-05-17 22:24       ` Woodruff, Richard
  2007-05-18  7:49         ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Woodruff, Richard @ 2007-05-17 22:24 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Ingo Molnar

> This is the original ARM dyntick stuff, right ?

Yes, this is a version that is not using clocksource.

> The dyntick support on your architecture is broken.  Why does it fiddle
> with the timer when the system is not idle?

I can't yet run the test sequence on the latest kernel so I'll have to
wait to experiment.  A brief look at the new code seems to show a
similar path, but I need to actually run through it to understand
better.


On the irq_resend() path handle_dynamic_tick() is still called as
before.  The difference now is that the handler looks like it will jump
into some generic event code in tick-common.c.  This still calls
tick_periodic(), which calls profile_tick() in and out of one-shot
mode.
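
At a quick read, that handler is roughly (a sketch of tick_periodic()
in kernel/time/tick-common.c; details from memory):

        static void tick_periodic(int cpu)
        {
                if (tick_do_timer_cpu == cpu) {
                        write_seqlock(&xtime_lock);
                        do_timer(1);
                        write_sequnlock(&xtime_lock);
                }

                /* same pattern as before: assumes irq_regs was set on entry */
                update_process_times(user_mode(get_irq_regs()));
                profile_tick(CPU_PROFILING);
        }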

It might be that the path still exists, or I've just not spent enough
time understanding the real flow.

Thanks for the insight and the nice code.

> This stuff should go away ASAP.  A lot of ARMs are already converted to
> clockevents and are using the generic NOHZ implementation, which does
> not have those problems at all.

Ok. Thanks. I look forward to using that when I can.

Regards,
Richard W. 


* RE: bug seen with dynticks from CONFIG_HARDIRQS_SW_RESEND
  2007-05-17 22:24       ` Woodruff, Richard
@ 2007-05-18  7:49         ` Thomas Gleixner
  2008-04-18 15:43           ` Higher latency with dynamic tick (need for an io-ondemand governor?) Woodruff, Richard
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2007-05-18  7:49 UTC (permalink / raw)
  To: Woodruff, Richard; +Cc: linux-kernel, Ingo Molnar

On Thu, 2007-05-17 at 17:24 -0500, Woodruff, Richard wrote:
> > This is the original ARM dyntick stuff, right ?
> 
> Yes, this is a version that is not using clocksource.
> 
> > The dyntick support on your architecture is broken.  Why does it fiddle
> > with the timer when the system is not idle?
> 
> I can't yet run the test sequence on the latest kernel so I'll have to
> wait to experiment.  A brief look at the new code seems to show a
> similar path, but I need to actually run through it to understand
> better.
> 
> 
> On the irq_resend() path handle_dynamic_tick() is still called as
> before. 

No.  NOHZ makes handle_dynamic_tick() a NOP.  handle_dynamic_tick()
depends on CONFIG_NO_IDLE_HZ, which is not used when NOHZ is active.

The problem could only arise when something disabled/enabled the timer
interrupt.
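
Roughly, the guard looks like this (a sketch; the exact header location
may differ):

        #ifdef CONFIG_NO_IDLE_HZ
        extern void handle_dynamic_tick(struct irqaction *action);
        #else
        static inline void handle_dynamic_tick(struct irqaction *action) { }
        #endif

So with NOHZ (and CONFIG_NO_IDLE_HZ unset) the call in
handle_IRQ_event() compiles away entirely.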

	tglx




* Higher latency with dynamic tick (need for an io-ondemand governor?)
  2007-05-18  7:49         ` Thomas Gleixner
@ 2008-04-18 15:43           ` Woodruff, Richard
  2008-04-19  3:45             ` [linux-pm] " David Brownell
                               ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Woodruff, Richard @ 2008-04-18 15:43 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar; +Cc: linux-kernel, linux-pm


Hi,

When capturing some traces with dynamic tick we noticed that the
interrupt latency goes up a good amount.  If you look at the trace, the
gpio IRQ is now offset a good amount.  The good news, I guess, is that
it's pretty predictable.

* If we couple this with progressively higher latency C-states we see
that IO speed can fall by a good amount, especially for PIO mixes.  Now
if QoS is maintained you may or may not care.

I was wondering what thoughts there might be on optimizing this.

One thought was if an io-ondemand governor of some sort was used.  It
could track interrupt statistics and be fed back into cpuidle.  When
there is a high interrupt load period it could shrink the acceptable
latency and thus help choose a C-state which favors throughput.  Some
moving average window could be used to track it.

Perhaps a new interrupt attribute could be attached at irq request time
to allow the tracking of bandwidth-important devices.
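
Purely as illustration, the kind of feedback I have in mind (all names
and numbers here are made up):

        /* hypothetical io-ondemand hook: smooth IRQ arrivals per sample
         * period and shrink the cpuidle latency budget under load */
        #define IO_EWMA_SHIFT   3       /* weight 1/8 for new samples */

        struct io_ondemand {
                long avg_irqs;                  /* smoothed IRQs per period */
                unsigned long last_count;       /* total IRQ count last time */
        };

        static unsigned int io_ondemand_latency_us(struct io_ondemand *g,
                                                   unsigned long irq_count)
        {
                long delta = irq_count - g->last_count;

                g->last_count = irq_count;
                g->avg_irqs += (delta - g->avg_irqs) >> IO_EWMA_SHIFT;

                /* busy: favor throughput; quiet: allow deep C-states */
                return g->avg_irqs > 16 ? 100 : 10000;
        }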

The attached trace was captured on a .22 kernel.  The same should be
available in a bit on a .24 kernel.

Regards,
Richard W.


[-- Attachment #2: idle_trace.PNG --]
[-- Type: image/png, Size: 58498 bytes --]


* Re: [linux-pm] Higher latency with dynamic tick (need for an io-ondemand governor?)
  2008-04-18 15:43           ` Higher latency with dynamic tick (need for an io-ondemand governor?) Woodruff, Richard
@ 2008-04-19  3:45             ` David Brownell
  2008-04-19  7:13               ` Thomas Gleixner
  2008-04-20  6:19             ` Arjan van de Ven
  2008-04-20 12:41             ` Andi Kleen
  2 siblings, 1 reply; 16+ messages in thread
From: David Brownell @ 2008-04-19  3:45 UTC (permalink / raw)
  To: linux-pm; +Cc: Woodruff, Richard, Thomas Gleixner, Ingo Molnar, linux-kernel

On Friday 18 April 2008, Woodruff, Richard wrote:
> When capturing some traces with dynamic tick we noticed that the
> interrupt latency goes up a good amount.  If you look at the trace, the
> gpio IRQ is now offset a good amount.  The good news, I guess, is that
> it's pretty predictable.

That is, about 24 usec on this CPU ... an ARM v7, which I'm guessing
is an OMAP34xx running fairly fast (order of 4x faster than most ARMs).

Similar issues were noted, also using ETM trace, on an ARM920 core [1]
from Atmel.  There, the overhead of NO_HZ was observed to be more like
150 usec of per-IRQ overhead, which is enough to make NO_HZ non-viable
in some configurations.


> I was wondering what thoughts there might be on optimizing this.

Cutting down the math implied by jiffies updates might help.
The 64 bit math for ktime structs isn't cheap; purely by eyeball,
that was almost 1/3 the cost of that 24 usec (mostly __do_div64).

- Dave

[1] http://marc.info/?l=linux-kernel&m=120471594714499&w=2



* Re: [linux-pm] Higher latency with dynamic tick (need for an io-ondemand governor?)
  2008-04-19  3:45             ` [linux-pm] " David Brownell
@ 2008-04-19  7:13               ` Thomas Gleixner
  2008-04-19 22:49                 ` david
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2008-04-19  7:13 UTC (permalink / raw)
  To: David Brownell; +Cc: linux-pm, Woodruff, Richard, Ingo Molnar, linux-kernel


On Fri, 18 Apr 2008, David Brownell wrote:
> On Friday 18 April 2008, Woodruff, Richard wrote:
> > When capturing some traces with dynamic tick we noticed that the
> > interrupt latency goes up a good amount.  If you look at the trace,
> > the gpio IRQ is now offset a good amount.  The good news, I guess, is
> > that it's pretty predictable.
> 
> That is, about 24 usec on this CPU ... an ARM v7, which I'm guessing
> is an OMAP34xx running fairly fast (order of 4x faster than most ARMs).
> 
> Similar issues were noted, also using ETM trace, on an ARM920 core [1]
> from Atmel.  There, the overhead of NO_HZ was observed to be more like
> 150 usec of per-IRQ overhead, which is enough to make NO_HZ non-viable
> in some configurations.
> 
> 
> > I was wondering what thoughts there might be on optimizing this.
> 
> Cutting down the math implied by jiffies updates might help.
> The 64 bit math for ktime structs isn't cheap; purely by eyeball,
> that was almost 1/3 the cost of that 24 usec (mostly __do_div64).

Hmm, I have no really good idea for avoiding the div64 in the case of a
long idle sleep.  Any brilliant patches are welcome :)

Thanks,
	tglx


* Re: [linux-pm] Higher latency with dynamic tick (need for an io-ondemand governor?)
  2008-04-19  7:13               ` Thomas Gleixner
@ 2008-04-19 22:49                 ` david
  2008-04-20  3:51                   ` David Brownell
  0 siblings, 1 reply; 16+ messages in thread
From: david @ 2008-04-19 22:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Brownell, linux-pm, Woodruff, Richard, Ingo Molnar, linux-kernel


On Sat, 19 Apr 2008, Thomas Gleixner wrote:

> On Fri, 18 Apr 2008, David Brownell wrote:
>> On Friday 18 April 2008, Woodruff, Richard wrote:
>>> When capturing some traces with dynamic tick we noticed that the
>>> interrupt latency goes up a good amount.  If you look at the trace,
>>> the gpio IRQ is now offset a good amount.  The good news, I guess, is
>>> that it's pretty predictable.
>>
>> That is, about 24 usec on this CPU ... an ARM v7, which I'm guessing
>> is an OMAP34xx running fairly fast (order of 4x faster than most ARMs).
>>
>> Similar issues were noted, also using ETM trace, on an ARM920 core [1]
>> from Atmel.  There, the overhead of NO_HZ was observed to be more like
>> 150 usec of per-IRQ overhead, which is enough to make NO_HZ non-viable
>> in some configurations.
>>
>>
>>> I was wondering what thoughts there might be on optimizing this.
>>
>> Cutting down the math implied by jiffies updates might help.
>> The 64 bit math for ktime structs isn't cheap; purely by eyeball,
>> that was almost 1/3 the cost of that 24 usec (mostly __do_div64).
>
> Hmm, I have no really good idea for avoiding the div64 in the case of
> a long idle sleep.  Any brilliant patches are welcome :)

how long is a 'long idle sleep', and how common are such sleeps?  is it
possibly worth the cost of a test in the hot path to see if you need to
do the 64-bit math or can get away with 32-bit math (at least on some
platforms)?
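
for example (an untested sketch, using the delta/tick_period names from
tick_do_update_jiffies64(); deltas under ~4 seconds of nanoseconds fit
in 32 bits, so the test is a single compare):

        u64 delta_ns = ktime_to_ns(delta);
        s64 incr = ktime_to_ns(tick_period);
        unsigned long ticks;

        if (likely(delta_ns < 0x100000000ULL))
                /* cheap 32-bit division covers the common case */
                ticks = (u32)delta_ns / (u32)incr;
        else
                /* rare long sleeps still take the 64-bit path */
                ticks = ktime_divns(delta, incr);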

David Lang


* Re: [linux-pm] Higher latency with dynamic tick (need for an io-ondemand governor?)
  2008-04-19 22:49                 ` david
@ 2008-04-20  3:51                   ` David Brownell
  0 siblings, 0 replies; 16+ messages in thread
From: David Brownell @ 2008-04-20  3:51 UTC (permalink / raw)
  To: david
  Cc: Thomas Gleixner, linux-pm, Woodruff, Richard, Ingo Molnar, linux-kernel

On Saturday 19 April 2008, david@lang.hm wrote:
> On Sat, 19 Apr 2008, Thomas Gleixner wrote:
> 
> > On Fri, 18 Apr 2008, David Brownell wrote:
> >> On Friday 18 April 2008, Woodruff, Richard wrote:
> >>> When capturing some traces with dynamic tick we noticed that the
> >>> interrupt latency goes up a good amount.
> >>
> >>> I was wondering what thoughts there might be on optimizing this.
> >>
> >> Cutting down the math implied by jiffies updates might help.

And update_wall_time() costs, too.


> >> The 64 bit math for ktime structs isn't cheap; purely by eyeball,
> >> that was almost 1/3 the cost of that 24 usec (mostly __do_div64).
> >
> > Hmm, I have no really good idea for avoiding the div64 in the case
> > of a long idle sleep.  Any brilliant patches are welcome :)

That is, in tick_do_update_jiffies64()?

                delta = ktime_sub(delta, tick_period);
                last_jiffies_update = ktime_add(last_jiffies_update,
                                                tick_period);

                /* Slow path for long timeouts */
                if (unlikely(delta.tv64 >= tick_period.tv64)) {
                        s64 incr = ktime_to_ns(tick_period);

                        ticks = ktime_divns(delta, incr);

                        last_jiffies_update = ktime_add_ns(last_jiffies_update,
                                                           incr * ticks);
                }
                do_timer(++ticks);

Some math not shown here is converting clocksource values
to ktimes ... cyc2ns() has a comment about needing some
optimization; I wonder if that's an issue here.

Maybe turning tick_period into an *actual* constant (it's
a function of HZ) would help a bit; "incr" too.

Re the "ticks = ktime_divns(...)":  since "incr" is constant,
the first thing that comes to mind is a binary search over a
precomputed table.

For HZ=100 (common for ARM) a table of size 128 would exceed
the normal range of NO_HZ tick rates ... down to below 1 HZ.


> how long is a 'long idle sleep', and how common are such sleeps?

The above code says "unlikely()" but that presumes very busy
systems.  I would have assumed taking more than one tick was
the most common case, since most systems spend more time idle
than working.  I certainly observe it to be the common case,
and it's a power management optimization goal.


> is it possibly worth the cost of a test in the hot path to see if you
> need to do the 64-bit math or can get away with 32-bit math (at least
> on some platforms)?

Possibly opening a can of worms, I'll observe that when the
concern is just to update jiffies, converting to ktime values
seems all but needless.  Deltas at the level of a clocksource
can be mapped to jiffies as easily as deltas at the nsec level,
saving some work...

Those delta tables could use just 32 bit values in the most
common cases:  clocksource ticking at less than 4 GHz, and
the IRQs firing more often than once a second. 

- Dave


* Re: Higher latency with dynamic tick (need for an io-ondemand governor?)
  2008-04-18 15:43           ` Higher latency with dynamic tick (need for an io-ondemand governor?) Woodruff, Richard
  2008-04-19  3:45             ` [linux-pm] " David Brownell
@ 2008-04-20  6:19             ` Arjan van de Ven
  2008-04-20 14:09               ` Woodruff, Richard
  2008-04-20 12:41             ` Andi Kleen
  2 siblings, 1 reply; 16+ messages in thread
From: Arjan van de Ven @ 2008-04-20  6:19 UTC (permalink / raw)
  To: Woodruff, Richard; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, linux-pm

On Fri, 18 Apr 2008 10:43:32 -0500
"Woodruff, Richard" <r-woodruff2@ti.com> wrote:

> Hi,
> 
> When capturing some traces with dynamic tick we noticed that the
> interrupt latency goes up a good amount.  If you look at the trace,
> the gpio IRQ is now offset a good amount.  The good news, I guess,
> is that it's pretty predictable.
> 
> * If we couple this with progressively higher latency C-states we
> see that IO speed can fall by a good amount, especially for PIO
> mixes.  Now if QoS is maintained you may or may not care.
> 
> I was wondering what thoughts there might be on optimizing this.
> 
> One thought was if an io-ondemand governor of some sort was used.
> It could track interrupt statistics and be fed back into cpuidle.
> When there is a high interrupt load period it could shrink the
> acceptable latency and thus help choose a C-state which favors
> throughput.  Some moving average window could be used to track it.
> 
> Perhaps a new interrupt attribute could be attached at irq request
> time to allow the tracking of bandwidth-important devices.
> 
> The attached trace was captured on a .22 kernel.  The same should
> be available in a bit on a .24 kernel.


So right now we have the pm_qos framework (and before that we had a simpler version of this);
so if your realtime (or realtime-like) system cannot deal with latency longer than X usec,
you can just tell the kernel, and the deeper power states that have this latency just won't get used.
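
E.g., from a driver (a sketch against the pm_qos interface as I remember it;
check linux/pm_qos_params.h for the exact form):

        #include <linux/pm_qos_params.h>

        /* never let CPU wakeup latency exceed 100 usec while we run */
        pm_qos_add_requirement(PM_QOS_CPU_DMA_LATENCY, "mydriver", 100);

        /* ... and drop the constraint when the driver is done: */
        pm_qos_remove_requirement(PM_QOS_CPU_DMA_LATENCY, "mydriver");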

What you're mentioning is sort-of-kinda different. It's the "most of the time go as deep as you can,
but when I do IO, it hurts throughput" case.
There are two approaches to that in principle:
1) Work based on historic behavior, and go less deep when there's lots of activity in the (recent) past.
   A few folks at Intel are working on something like this.
2) Have the IO layer tell the kernel "heads up, something coming down soon".
   This is more involved, especially since it's harder to predict when the disk will be done.
   (It could be a 10 msec seek, but it could also be in the disk's cache memory, or it could be an SSD, or
   the disk may have to read the sector 5 times because of weak magnetics... it's all over the map.)
   Another complication is that we need to only do this for "synchronous" IOs, which is known at higher
   layers in the block stack, but I think gets lost towards the bottom.

There's another problem with 2) in a multicore world: all packages EXCEPT the one which will get the
irq can go to a deeper state anyway... but it might be hard to predict which CPU will get the completion irq.


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: Higher latency with dynamic tick (need for an io-ondemand governor?)
  2008-04-18 15:43           ` Higher latency with dynamic tick (need for an io-ondemand governor?) Woodruff, Richard
  2008-04-19  3:45             ` [linux-pm] " David Brownell
  2008-04-20  6:19             ` Arjan van de Ven
@ 2008-04-20 12:41             ` Andi Kleen
  2008-04-20 14:21               ` Woodruff, Richard
  2 siblings, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2008-04-20 12:41 UTC (permalink / raw)
  To: Woodruff, Richard
  Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, linux-pm, lenb

"Woodruff, Richard" <r-woodruff2@ti.com> writes:

> When capturing some traces with dynamic tick we noticed that the
> interrupt latency goes up a good amount.  If you look at the trace, the
> gpio IRQ is now offset a good amount.  The good news, I guess, is that
> it's pretty predictable.
>
> * If we couple this with progressively higher latency C-states we see
> that IO speed can fall by a good amount, especially for PIO mixes.  Now
> if QoS is maintained you may or may not care.
>
> I was wondering what thoughts there might be on optimizing this.
>
> One thought was if an io-ondemand governor of some sort was used.  It
> could track interrupt statistics and be fed back into cpuidle.  When
> there is a high interrupt load period it could shrink the acceptable
> latency and thus help choose a C-state which favors throughput.  Some
> moving average window could be used to track it.
>
> Perhaps a new interrupt attribute could be attached at irq request time
> to allow the tracking of bandwidth-important devices.
>
> The attached trace was captured on a .22 kernel.  The same should be
> available in a bit on a .24 kernel.

Are you talking about x86? 

On older x86 this effect should have been handled by the C-state
algorithm taking the bus master activity register into account (which
should also trigger for interrupts).

But I think the register has been nop'ed on newer platforms, so indeed
we'll need some way to handle this.

-Andi


* RE: Higher latency with dynamic tick (need for an io-ondemand governor?)
  2008-04-20  6:19             ` Arjan van de Ven
@ 2008-04-20 14:09               ` Woodruff, Richard
  0 siblings, 0 replies; 16+ messages in thread
From: Woodruff, Richard @ 2008-04-20 14:09 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, linux-pm

On Sun, 20 Apr 2008 01:20 -0500
"Arjan van de Ven" <arjan@infradead.org> wrote:

> So right now we have the pm_qos framework (and before that we had a
> simpler version of this); so if your realtime (or realtime-like)
> system cannot deal with latency longer than X usec, you can just tell
> the kernel, and the deeper power states that have this latency just
> won't get used.

Yes.  We're already using the older version today (sparingly) to set the worst acceptable latency before failure.  This does work.

> What you're mentioning is sort-of-kinda different. It's the "most of
> the time go as deep as you can, but when I do IO, it hurts throughput"
> case.

That's 100% correct.

> There are two approaches to that in principle:
> 1) Work based on historic behavior, and go less deep when there's lots
>    of activity in the (recent) past.
>    A few folks at Intel are working on something like this.

Any data to share here? 

Interrupt frequency seemed like a good pivot; I thought I might experiment starting there.  Opinion?

> 2) Have the IO layer tell the kernel "heads up, something coming down
>    soon".
>    This is more involved, especially since it's harder to predict when
>    the disk will be done.  (It could be a 10 msec seek, but it could
>    also be in the disk's cache memory, or it could be an SSD, or the
>    disk may have to read the sector 5 times because of weak
>    magnetics... it's all over the map.)
>    Another complication is that we need to only do this for
>    "synchronous" IOs, which is known at higher layers in the block
>    stack, but I think gets lost towards the bottom.

Not so many phones have a disk, so the above device example matters less for us.  However, I've been wondering if the use of an mmap to non-file-backed devices and the use of the fadvise() hint might be useful for other classes of devices.  We have some media devices which do special mappings.

For UMPCs, is the disk the main problem device slowing down the system?

Some of the slowdowns I was talking about were happening with HS-USB devices.

> There's another problem with 2) in a multicore world: all packages
> EXCEPT the one which will get the irq can go to a deeper state
> anyway... but it might be hard to predict which CPU will get the
> completion irq.

I see.  I expect there are some interesting problems there.  I would hope some affinity would keep an IRQ on the same CPU most of the time.  But like you say, if it's offline or in a very high latency state you would have to re-route that IRQ to a CPU capable of handling it.

I know of some IRQ IP blocks with some QoS features; this seems like a place to use them.

Regards,
Richard W.



* RE: Higher latency with dynamic tick (need for an io-ondemand governor?)
  2008-04-20 12:41             ` Andi Kleen
@ 2008-04-20 14:21               ` Woodruff, Richard
  2008-04-20 14:26                 ` Andi Kleen
  0 siblings, 1 reply; 16+ messages in thread
From: Woodruff, Richard @ 2008-04-20 14:21 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, linux-pm, lenb

"Andi Kleen" <andi@firstfloor.org> writes:
> Are you talking about x86?

ARM (TI-OMAP)

> On older x86 this effect should have been handled by the C-state
> algorithm taking the bus master activity register into account (which
> should also trigger for interrupts).

Well, today we still do use the bus master hook for ARM.  In my terms it is an 'OCP bus initiator', not a master, but it is logically the same.

If some device is actively pushing data on the bus, this will limit the C-state choice.  In our hardware, if you are using the 'hw-auto' features you won't hit the state anyway even if you try, as hw-activity figures into some automatic state machines.  However, the cost of context saving for some of the states is high enough that the bus-master activity check is still worth it.  You want to avoid unnecessary context stacking if you are not going to hit the state.

> But I think the register has been nop'ed on newer platforms, so indeed
> we'll need some way to handle this.

The activity is still useful.  Not sure about the underlying x86 hardware implementation.  For me it's just exporting some activity for masters & slaves as seen by the interconnect.

Regards,
Richard W.



* Re: Higher latency with dynamic tick (need for an io-ondemand governor?)
  2008-04-20 14:21               ` Woodruff, Richard
@ 2008-04-20 14:26                 ` Andi Kleen
  0 siblings, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2008-04-20 14:26 UTC (permalink / raw)
  To: Woodruff, Richard
  Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, linux-pm, lenb

Woodruff, Richard wrote:
> "Andi Kleen" <andi@firstfloor.org> writes:
>> Are you talking about x86?
> 
> ARM (TI-OMAP)

Sorry, I was confused because you used the term "C-state", which is normally ACPI (x86/ia64)
specific.  If someone says C-states I assume ACPI, and usually x86 by default,
due to the lack of deeper sleep states on most ia64s.

> Not sure about the underlying x86 hardware implementation.

On x86 the trend is for the hardware/firmware/SMM to do more and more of this on its own,
as in deciding by itself how deep it wants to sleep.

-Andi

