linux-kernel.vger.kernel.org archive mirror
* RE: HT and idle = poll
@ 2003-03-06 21:15 Nakajima, Jun
  2003-03-06 22:42 ` Alan Cox
  0 siblings, 1 reply; 12+ messages in thread
From: Nakajima, Jun @ 2003-03-06 21:15 UTC
  To: Linus Torvalds, Alan Cox; +Cc: Linux Kernel Mailing List

Linus,

That's correct. Basically, mwait is similar to hlt, but you can avoid sending an IPI to wake up the waiting processor. Unlike with hlt, a write to the address specified by monitor wakes the processor up.

So our plan is to use monitor/mwait in the kernel, for example in the idle loop, to lower the latency.

Jun
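
The wake-on-write behaviour described above implies an arm, re-check, wait ordering. The following is a minimal user-space sketch of that control flow only. MONITOR and MWAIT are privileged instructions, so they are replaced here by function-pointer hooks; the names wait_ops, idle_wait, demo_arm and demo_wait are hypothetical and do not come from the thread or from any kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks standing in for the privileged MONITOR and MWAIT
 * instructions, so the control flow can be exercised in user space. */
struct wait_ops {
    void (*arm)(const volatile void *addr); /* MONITOR: watch this address */
    void (*wait)(void);                     /* MWAIT: sleep until it is written */
};

/* Wait until *flag becomes non-zero.
 * Returns how many times the wait hook was entered. */
static unsigned idle_wait(volatile int *flag, const struct wait_ops *ops)
{
    unsigned sleeps = 0;

    assert(flag != NULL && ops != NULL);
    while (!*flag) {
        ops->arm(flag);   /* arm the monitor first ... */
        if (*flag)        /* ... then re-check, so a write that raced */
            break;        /* with arming is not slept through */
        ops->wait();
        sleeps++;
    }
    return sleeps;
}

/* Demo stand-ins (assumptions for illustration only). */
static volatile int demo_flag;

static void demo_arm(const volatile void *addr)
{
    (void)addr;           /* a real MONITOR would record the address */
}

static void demo_wait(void)
{
    demo_flag = 1;        /* pretend another CPU wrote the flag while we slept */
}
```

With the demo hooks, idle_wait(&demo_flag, &ops) enters the wait hook once when the flag starts at 0, and not at all when the flag is already set. The re-check between arming and waiting is the part that lets the waker use a plain store instead of an IPI.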

> -----Original Message-----
> From: Linus Torvalds [mailto:torvalds@transmeta.com]
> Sent: Thursday, March 06, 2003 12:09 PM
> To: Alan Cox
> Cc: Linux Kernel Mailing List
> Subject: Re: HT and idle = poll
> 
> 
> On 6 Mar 2003, Alan Cox wrote:
> > On Thu, 2003-03-06 at 19:30, Linus Torvalds wrote:
> > > >So, don't use idle=poll with HT when you know your workload has idle time!  I
> > > >have not tried oprofile, but it stands to reason that this would be a
> >
> > idle=poll probably needs to be doing "rep nop" in a tight loop.
> 
> We already do that. It's not enough. The HT thing will still steal cycles
> continually, since the "rep nop" is really only equivalent to a
> "sched_yield()".
> 
> Think of "rep nop" as yielding, and "mwait" as a true wait.
> 
> (I don't actually have any real information on "mwait", so I may be wrong
> about the details on the new instructions. They looked obvious enough,
> though).
> 
> 		Linus
> 

* RE: HT and idle = poll
  2003-03-06 21:15 HT and idle = poll Nakajima, Jun
@ 2003-03-06 22:42 ` Alan Cox
  0 siblings, 0 replies; 12+ messages in thread
From: Alan Cox @ 2003-03-06 22:42 UTC
  To: Nakajima, Jun; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Thu, 2003-03-06 at 21:15, Nakajima, Jun wrote:
> Linus,
> 
> That's correct. Basically, mwait is similar to hlt, but you can avoid sending an IPI to wake up the waiting processor. Unlike with hlt, a write to the address specified by monitor wakes the processor up.
> 
> So our plan is to use monitor/mwait in the kernel, for example in the idle loop, to lower the latency.

That's nice. It means you've got the basis of the instructions (although not quite the same
exact functionality) as Brian Grayson proposed four years ago with Armadillo.


* Re: HT and idle = poll
  2003-03-06 22:22   ` Martin J. Bligh
@ 2003-03-06 23:59     ` John Levon
  0 siblings, 0 replies; 12+ messages in thread
From: John Levon @ 2003-03-06 23:59 UTC
  To: Martin J. Bligh; +Cc: linux-kernel

On Thu, Mar 06, 2003 at 02:22:48PM -0800, Martin J. Bligh wrote:

> BTW, could someone give a brief summary of why idle=poll is needed for 
> oprofile, I'd love to add it to the "documentation for dummies" file I
> was writing.

Because events like CPU_CLK_UNHALTED don't tick when the cpu is halted,
so the idle time doesn't show up properly in the kernel profile.
idle=poll doesn't hlt so the profile for poll_idle() reflects the actual
idle percentage.

Something like that anyway.

john
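
The effect described above can be put as simple arithmetic. This is an idealized sketch, assuming profile samples are proportional to counted cycles and that a halting idle loop contributes no counted cycles at all; the function name idle_share_in_profile is hypothetical and is not part of oprofile.

```c
#include <assert.h>

/* Fraction of profile samples attributed to the idle loop, for a CPU
 * that is truly idle `idle_frac` of the time (0..1).
 * Idealized model: with a halting idle loop an unhalted-cycle event does
 * not count while idle, so idle collects no samples; with a polling idle
 * loop, idle cycles count like any others. */
static double idle_share_in_profile(double idle_frac, int idle_polls)
{
    assert(idle_frac >= 0.0 && idle_frac <= 1.0);

    double busy = 1.0 - idle_frac;
    double idle_counted = idle_polls ? idle_frac : 0.0;
    double total = busy + idle_counted;

    return total > 0.0 ? idle_counted / total : 0.0;
}
```

Under this model a CPU that is 40% idle shows an idle share of 0 when the idle loop halts, and about 0.4 when it polls, which is the discrepancy idle=poll is meant to remove.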

* Re: HT and idle = poll
  2003-03-06 20:08     ` Linus Torvalds
@ 2003-03-06 22:36       ` Eric Northup
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Northup @ 2003-03-06 22:36 UTC
  To: Linus Torvalds, Alan Cox; +Cc: Linux Kernel Mailing List, Andrew Theurer

On Thursday 06 March 2003 03:08 pm, Linus Torvalds wrote:
> On 6 Mar 2003, Alan Cox wrote:
> > idle=poll probably needs to be doing "rep nop" in a tight loop.
>
> We already do that. It's not enough. The HT thing will still steal cycles
> continually, since the "rep nop" is really only equivalent to a
> "sched_yield()".

(Perhaps a naive idea) Right now, there is a single "rep nop" per poll.  What 
happens if you unroll the loop a few times:

while (!condition) {
	cpu_relax();
	cpu_relax();
	cpu_relax();
}

?  I have no HT hardware so can't test this.

-Eric
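
For anyone who wants to try the unrolling idea outside the kernel, here is a user-space sketch, assuming GCC or Clang inline assembly. cpu_relax() below is a stand-in for the kernel helper of the same name, and poll_flag and its parameters are hypothetical. It makes no claim about whether extra pauses actually help on HT hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* User-space stand-in for the kernel's cpu_relax(): on x86 this is
 * "rep; nop" (the PAUSE instruction); elsewhere just a compiler barrier. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Poll *flag, issuing `relaxes_per_poll` pauses between reads.
 * Gives up after `max_polls` reads; returns the number of reads done. */
static unsigned long poll_flag(atomic_int *flag, unsigned relaxes_per_poll,
                               unsigned long max_polls)
{
    unsigned long polls = 0;

    assert(flag != NULL);
    while (polls < max_polls) {
        polls++;
        if (atomic_load_explicit(flag, memory_order_acquire))
            break;
        for (unsigned i = 0; i < relaxes_per_poll; i++)
            cpu_relax();
    }
    return polls;
}
```

Calling poll_flag(&flag, 3, limit) corresponds to the three-way unrolled loop above; passing 1 gives the single "rep nop" per poll case. The max_polls bound exists only so the sketch terminates when nothing sets the flag.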

* Re: HT and idle = poll
  2003-03-06 19:30 ` Linus Torvalds
  2003-03-06 19:52   ` Davide Libenzi
  2003-03-06 21:09   ` Alan Cox
@ 2003-03-06 22:22   ` Martin J. Bligh
  2003-03-06 23:59     ` John Levon
  2 siblings, 1 reply; 12+ messages in thread
From: Martin J. Bligh @ 2003-03-06 22:22 UTC
  To: linux-kernel; +Cc: John Levon

> Andrew Theurer  <habanero@us.ibm.com> wrote:
>> The test:  kernbench (average of 5 kernel compiles) with -j2 on a 2 physical/4 
>> logical P4 system.  This is on 2.5.64-HTschedB3:
>> 
>> idle != poll: Elapsed: 136.692s User: 249.846s System: 30.596s CPU: 204.8%
>> idle  = poll: Elapsed: 161.868s User: 295.738s System: 32.966s CPU: 202.6%
>> 
>> A 15.5% increase in compile times.
>> 
>> So, don't use idle=poll with HT when you know your workload has idle time!  I 
>> have not tried oprofile, but it stands to reason that this would be a 
>> problem.  There's no point in using idle=poll with oprofile and HT anyway, as 
>> the cpu utilization is totally wrong with HT to begin with (more on that 
>> later).
>> 
>> Presumably a logical cpu polling while idle uses too many cpu resources 
>> unnecessarily and significantly affects the performance of its sibling. 
> 
> Btw, I think this is exactly what the new HT prescott instructions are
> for: instead of having busy loops polling for a change in memory (be it
> a spinlock or a "need_resched" flag), new HT CPU's will support a
> "mwait" instruction. 
> 
> But yes, at least for now, I really don't think you should really _ever_
> use "idle=poll" on HT-enabled hardware. The idle CPU's will just suck
> cycles from the real work.

BTW, could someone give a brief summary of why idle=poll is needed for 
oprofile, I'd love to add it to the "documentation for dummies" file I
was writing.

M.


* Re: HT and idle = poll
  2003-03-06 19:30 ` Linus Torvalds
  2003-03-06 19:52   ` Davide Libenzi
@ 2003-03-06 21:09   ` Alan Cox
  2003-03-06 20:08     ` Linus Torvalds
  2003-03-06 22:22   ` Martin J. Bligh
  2 siblings, 1 reply; 12+ messages in thread
From: Alan Cox @ 2003-03-06 21:09 UTC
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Thu, 2003-03-06 at 19:30, Linus Torvalds wrote:
> >So, don't use idle=poll with HT when you know your workload has idle time!  I 
> >have not tried oprofile, but it stands to reason that this would be a 

idle=poll probably needs to be doing "rep nop" in a tight loop. That
ironically also saves more power than "hlt" on PIV last time someone
investigated



* Re: HT and idle = poll
  2003-03-06 20:05     ` Linus Torvalds
@ 2003-03-06 20:52       ` Davide Libenzi
  0 siblings, 0 replies; 12+ messages in thread
From: Davide Libenzi @ 2003-03-06 20:52 UTC
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Thu, 6 Mar 2003, Linus Torvalds wrote:

>
> On Thu, 6 Mar 2003, Davide Libenzi wrote:
> >
> > Not only. The polling CPU will also shoot a storm of memory requests,
> > clobbering the CPU's memory I/O stages.
>
> Well, that would only be true with a really crappy CPU with no caches.
>
> Polling the same location (as long as it's a pure poll, not trying to do
> some locked read-modify-write cycle) should be fine. At least for
> something like idle-polling, where the one location it _is_ polling should
> not actually be touched by anybody else until the wakeup actually happens.

We are talking about HT, aren't we? Cores share execution units, and memory
requests are issued to the memory I/O units of the CPU before the cache
circuitry gets involved. Something like "while (!run);" will generate an
enormous number of memory I/O requests on the CPU's memory units, which
are shared by the cores. Even on a non-HT CPU, the above loop creates
problems with the latency of exiting the loop once the condition becomes
true. This is because of the huge number of alloc requests issued, which
must, on exiting the loop, be 1) discarded and 2) checked against
reordering. But I don't think the exit latency matters a lot here.



- Davide


* Re: HT and idle = poll
  2003-03-06 21:09   ` Alan Cox
@ 2003-03-06 20:08     ` Linus Torvalds
  2003-03-06 22:36       ` Eric Northup
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2003-03-06 20:08 UTC
  To: Alan Cox; +Cc: Linux Kernel Mailing List


On 6 Mar 2003, Alan Cox wrote:
> On Thu, 2003-03-06 at 19:30, Linus Torvalds wrote:
> > >So, don't use idle=poll with HT when you know your workload has idle time!  I 
> > >have not tried oprofile, but it stands to reason that this would be a 
> 
> idle=poll probably needs to be doing "rep nop" in a tight loop.

We already do that. It's not enough. The HT thing will still steal cycles 
continually, since the "rep nop" is really only equivalent to a 
"sched_yield()".

Think of "rep nop" as yielding, and "mwait" as a true wait.

(I don't actually have any real information on "mwait", so I may be wrong 
about the details on the new instructions. They looked obvious enough, 
though).

		Linus


* Re: HT and idle = poll
  2003-03-06 19:52   ` Davide Libenzi
@ 2003-03-06 20:05     ` Linus Torvalds
  2003-03-06 20:52       ` Davide Libenzi
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2003-03-06 20:05 UTC
  To: Davide Libenzi; +Cc: linux-kernel


On Thu, 6 Mar 2003, Davide Libenzi wrote:
> 
> Not only. The polling CPU will also shoot a storm of memory requests,
> clobbering the CPU's memory I/O stages.

Well, that would only be true with a really crappy CPU with no caches.

Polling the same location (as long as it's a pure poll, not trying to do 
some locked read-modify-write cycle) should be fine. At least for 
something like idle-polling, where the one location it _is_ polling should 
not actually be touched by anybody else until the wakeup actually happens.

		Linus


* Re: HT and idle = poll
  2003-03-06 19:30 ` Linus Torvalds
@ 2003-03-06 19:52   ` Davide Libenzi
  2003-03-06 20:05     ` Linus Torvalds
  2003-03-06 21:09   ` Alan Cox
  2003-03-06 22:22   ` Martin J. Bligh
  2 siblings, 1 reply; 12+ messages in thread
From: Davide Libenzi @ 2003-03-06 19:52 UTC
  To: Linus Torvalds; +Cc: linux-kernel

On Thu, 6 Mar 2003, Linus Torvalds wrote:

> But yes, at least for now, I really don't think you should really _ever_
> use "idle=poll" on HT-enabled hardware. The idle CPU's will just suck
> cycles from the real work.

Not only. The polling CPU will also shoot a storm of memory requests,
clobbering the CPU's memory I/O stages.



- Davide


* Re: HT and idle = poll
  2003-03-06  5:18 Andrew Theurer
@ 2003-03-06 19:30 ` Linus Torvalds
  2003-03-06 19:52   ` Davide Libenzi
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Linus Torvalds @ 2003-03-06 19:30 UTC
  To: linux-kernel

In article <200303052318.04647.habanero@us.ibm.com>,
Andrew Theurer  <habanero@us.ibm.com> wrote:
>The test:  kernbench (average of 5 kernel compiles) with -j2 on a 2 physical/4 
>logical P4 system.  This is on 2.5.64-HTschedB3:
>
>idle != poll: Elapsed: 136.692s User: 249.846s System: 30.596s CPU: 204.8%
>idle  = poll: Elapsed: 161.868s User: 295.738s System: 32.966s CPU: 202.6%
>
>A 15.5% increase in compile times.
>
>So, don't use idle=poll with HT when you know your workload has idle time!  I 
>have not tried oprofile, but it stands to reason that this would be a 
>problem.  There's no point in using idle=poll with oprofile and HT anyway, as 
>the cpu utilization is totally wrong with HT to begin with (more on that 
>later).
>
>Presumably a logical cpu polling while idle uses too many cpu resources 
>unnecessarily and significantly affects the performance of its sibling. 

Btw, I think this is exactly what the new HT prescott instructions are
for: instead of having busy loops polling for a change in memory (be it
a spinlock or a "need_resched" flag), new HT CPU's will support a
"mwait" instruction. 

But yes, at least for now, I really don't think you should really _ever_
use "idle=poll" on HT-enabled hardware. The idle CPU's will just suck
cycles from the real work.

		Linus

* HT and idle = poll
@ 2003-03-06  5:18 Andrew Theurer
  2003-03-06 19:30 ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Theurer @ 2003-03-06  5:18 UTC
  To: linux-kernel

The test:  kernbench (average of 5 kernel compiles) with -j2 on a 2 physical/4 
logical P4 system.  This is on 2.5.64-HTschedB3:

idle != poll: Elapsed: 136.692s User: 249.846s System: 30.596s CPU: 204.8%
idle  = poll: Elapsed: 161.868s User: 295.738s System: 32.966s CPU: 202.6%

A 15.5% increase in compile times.

So, don't use idle=poll with HT when you know your workload has idle time!  I 
have not tried oprofile, but it stands to reason that this would be a 
problem.  There's no point in using idle=poll with oprofile and HT anyway, as 
the cpu utilization is totally wrong with HT to begin with (more on that 
later).

Presumably a logical cpu polling while idle uses too many cpu resources 
unnecessarily and significantly affects the performance of its sibling. 

-Andrew Theurer
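
A small worked check of the elapsed-time figures above. The helper name percent_change is hypothetical. The quoted 15.5% appears to match the difference taken relative to the slower idle=poll run; taken relative to the idle != poll baseline, the same two numbers give roughly 18.4%.

```c
#include <assert.h>

/* Relative change from `base` to `other`, as a percentage of `base`. */
static double percent_change(double base, double other)
{
    assert(base > 0.0);
    return (other - base) / base * 100.0;
}
```

percent_change(136.692, 161.868) is about 18.4 (slowdown when switching to idle=poll), while percent_change(161.868, 136.692) is about -15.55 (time saved when switching back). Either way the conclusion of the post is unchanged: polling siblings cost the compile a substantial amount of time.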


Thread overview: 12+ messages
2003-03-06 21:15 HT and idle = poll Nakajima, Jun
2003-03-06 22:42 ` Alan Cox
  -- strict thread matches above, loose matches on Subject: below --
2003-03-06  5:18 Andrew Theurer
2003-03-06 19:30 ` Linus Torvalds
2003-03-06 19:52   ` Davide Libenzi
2003-03-06 20:05     ` Linus Torvalds
2003-03-06 20:52       ` Davide Libenzi
2003-03-06 21:09   ` Alan Cox
2003-03-06 20:08     ` Linus Torvalds
2003-03-06 22:36       ` Eric Northup
2003-03-06 22:22   ` Martin J. Bligh
2003-03-06 23:59     ` John Levon