linux-kernel.vger.kernel.org archive mirror
* RE: [PATCH] idle using PNI monitor/mwait
@ 2003-07-09  0:35 Nakajima, Jun
  2003-07-09  6:41 ` Zwane Mwaikambo
  0 siblings, 1 reply; 8+ messages in thread
From: Nakajima, Jun @ 2003-07-09  0:35 UTC
  To: Linus Torvalds
  Cc: linux-kernel, Saxena, Sunil, Mallick, Asit K, Pallipadi, Venkatesh

That's right. If we have a lot of high-contention locks in the kernel,
we need to fix the code first, to get benefits for the other
architectures. 

"mwait" granularity (64-byte, for example) is given by the cpuid
instruction, and we did not use it because 1) it's unlikely that the
other fields of the task structure are modified when it's idle, 2) the
processor needs to check the flag after mwait anyway, to avoid waking up
with a false signal caused by other break events (i.e. mwait is a hint).
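The granularity mentioned above can also be read from user space. The sketch below assumes a GCC- or Clang-compatible x86 compiler that ships `<cpuid.h>` with `__get_cpuid()`. CPUID leaf 1, ECX bit 3 is the MONITOR/MWAIT feature bit that the patch calls `X86_FEATURE_MWAIT`, and leaf 5 reports the smallest and largest monitor-line sizes in EAX[15:0] and EBX[15:0]. The names `struct monitor_line`, `decode_monitor_leaf()` and `query_monitor_line()` are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>	/* GCC/Clang helper providing __get_cpuid() */
#endif

/* CPUID.01H:ECX bit 3: MONITOR/MWAIT supported (X86_FEATURE_MWAIT in the patch). */
#define CPUID1_ECX_MONITOR (1u << 3)

/* Hypothetical container for the leaf-5 monitor-line sizes, in bytes. */
struct monitor_line {
	unsigned int min_bytes;	/* CPUID.05H:EAX[15:0]: smallest monitor-line size */
	unsigned int max_bytes;	/* CPUID.05H:EBX[15:0]: largest monitor-line size */
};

/* Pure decoder for the raw leaf-5 registers, so it can be tested on any host. */
struct monitor_line decode_monitor_leaf(uint32_t eax, uint32_t ebx)
{
	struct monitor_line m = { eax & 0xffffu, ebx & 0xffffu };
	return m;
}

/* Fills *out and returns true only if the CPU reports MONITOR/MWAIT and leaf 5. */
bool query_monitor_line(struct monitor_line *out)
{
	assert(out);
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & CPUID1_ECX_MONITOR))
		return false;
	if (!__get_cpuid(5, &eax, &ebx, &ecx, &edx))	/* 0 if leaf 5 is unsupported */
		return false;
	*out = decode_monitor_leaf(eax, ebx);
	return true;
#else
	(void)out;
	return false;	/* not an x86 CPU */
#endif
}
```

The patch deliberately does not consult these values: it monitors `current_thread_info()->flags` and rechecks `need_resched()` after every wake-up, so a write elsewhere in the same monitor line costs only an extra loop iteration.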

Jun

> -----Original Message-----
> From: Linus Torvalds [mailto:torvalds@osdl.org]
> Sent: Tuesday, July 08, 2003 4:34 PM
> To: Nakajima, Jun
> Cc: linux-kernel@vger.kernel.org; Saxena, Sunil; Mallick, Asit K;
> Pallipadi, Venkatesh
> Subject: Re: [PATCH] idle using PNI monitor/mwait
> 
> 
> On Tue, 8 Jul 2003, Nakajima, Jun wrote:
> >
> > Attached is a patch that enables PNI (Prescott New Instructions)
> > monitor/mwait in kernel idle (opcodes are now public). Basically MWAIT
> > is similar to hlt, but you can avoid IPI to wake up the processor
> > waiting. A write (by another processor) to the address range specified
> > by MONITOR would wake up the processor waiting on MWAIT.
> 
> How about spinlocks? Does it make sense to make the contention code use
> mwait too, or are the latencies too high? Not that we have a lot of
> high-contention locks any more, so maybe it doesn't much matter.
> 
> Also, wasn't there some flag to set the "mwait" granularity? I don't see
> anything like that in the patch..
> 
> 		Linus



* RE: [PATCH] idle using PNI monitor/mwait
  2003-07-09  0:35 [PATCH] idle using PNI monitor/mwait Nakajima, Jun
@ 2003-07-09  6:41 ` Zwane Mwaikambo
  0 siblings, 0 replies; 8+ messages in thread
From: Zwane Mwaikambo @ 2003-07-09  6:41 UTC
  To: Nakajima, Jun
  Cc: Linus Torvalds, linux-kernel, Saxena, Sunil, Mallick, Asit K,
	Pallipadi, Venkatesh

On Tue, 8 Jul 2003, Nakajima, Jun wrote:

> That's right. If we have a lot of high-contention locks in the kernel,
> we need to fix the code first, to get benefits for the other
> architectures. 
> 
> "mwait" granularity (64-byte, for example) is given by the cpuid
> instruction, and we did not use it because 1) it's unlikely that the
> other fields of the task structure are modified when it's idle, 2) the
> processor needs to check the flag after mwait anyway, to avoid waking up
> with a false signal caused by other break events (i.e. mwait is a hint).

It could still be very handy for polling loops of the form:

while (!ready)
	__asm__ __volatile__("pause" ::: "memory");

Jun, would there be any thermal advantages over polling with pause?
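A hedged user-space version of the polling loop above, assuming C11 `<stdatomic.h>` and GCC-style inline assembly: the flag is read with an acquire load on every iteration, so the compiler cannot cache it in a register, while `pause` only changes how the CPU spins. `poll_with_pause()`, `relax_pause()` and the `max_spins` bound are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* PAUSE plus a compiler barrier, as in the loop above. */
#define relax_pause() __asm__ __volatile__("pause" ::: "memory")
#else
#define relax_pause() ((void)0)	/* hypothetical fallback on non-x86 hosts */
#endif

/*
 * Poll *ready until it becomes nonzero or max_spins iterations have passed.
 * The acquire load runs on every iteration, so it cannot be hoisted out of
 * the loop; if the writer uses a release store, data it published before
 * setting the flag is visible once the flag is seen.
 * Returns true if the flag was observed set.
 */
bool poll_with_pause(atomic_int *ready, unsigned long max_spins)
{
	assert(ready);
	for (unsigned long i = 0; i < max_spins; i++) {
		if (atomic_load_explicit(ready, memory_order_acquire))
			return true;
		relax_pause();
	}
	return atomic_load_explicit(ready, memory_order_acquire) != 0;
}
```

MWAIT would replace the `relax_pause()` spin with a real wait on the cache line holding `ready`, but it is generally restricted to kernel mode, which is why this sketch keeps `pause`.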

Thanks,
	Zwane
-- 
function.linuxpower.ca


* RE: [PATCH] idle using PNI monitor/mwait
@ 2003-07-10  1:17 Saxena, Sunil
  0 siblings, 0 replies; 8+ messages in thread
From: Saxena, Sunil @ 2003-07-10  1:17 UTC
  To: Zwane Mwaikambo, Nakajima, Jun
  Cc: Linus Torvalds, linux-kernel, Mallick, Asit K, Pallipadi, Venkatesh

There may be thermal advantages, but, as with "pause", they would be
implementation-specific.

Thanks
Sunil

-----Original Message-----
From: Zwane Mwaikambo [mailto:zwane@arm.linux.org.uk] 
Sent: Tuesday, July 08, 2003 11:42 PM
To: Nakajima, Jun
Cc: Linus Torvalds; linux-kernel@vger.kernel.org; Saxena, Sunil;
Mallick, Asit K; Pallipadi, Venkatesh
Subject: RE: [PATCH] idle using PNI monitor/mwait

On Tue, 8 Jul 2003, Nakajima, Jun wrote:

> That's right. If we have a lot of high-contention locks in the kernel,
> we need to fix the code first, to get benefits for the other
> architectures. 
> 
> "mwait" granularity (64-byte, for example) is given by the cpuid
> instruction, and we did not use it because 1) it's unlikely that the
> other fields of the task structure are modified when it's idle, 2) the
> processor needs to check the flag after mwait anyway, to avoid waking up
> with a false signal caused by other break events (i.e. mwait is a hint).

It could still be very handy for polling loops of the form:

while (!ready)
	__asm__ __volatile__("pause" ::: "memory");

Jun, would there be any thermal advantages over polling with pause?

Thanks,
	Zwane
-- 
function.linuxpower.ca


* RE: [PATCH] idle using PNI monitor/mwait
@ 2003-07-09 17:01 Mallick, Asit K
  0 siblings, 0 replies; 8+ messages in thread
From: Mallick, Asit K @ 2003-07-09 17:01 UTC
  To: Nakajima, Jun, Linus Torvalds
  Cc: linux-kernel, Saxena, Sunil, Pallipadi, Venkatesh

Linus,

We are analyzing the performance of mwait in lock-contention code. We
do not have all the data yet, and will let you know what benefit mwait
brings there.
Thanks,
Asit


> -----Original Message-----
> From: Nakajima, Jun 
> Sent: Tuesday, July 08, 2003 5:36 PM
> To: 'Linus Torvalds'
> Cc: linux-kernel@vger.kernel.org; Saxena, Sunil; Mallick, 
> Asit K; Pallipadi, Venkatesh
> Subject: RE: [PATCH] idle using PNI monitor/mwait
> 
> 
> That's right. If we have a lot of high-contention locks in 
> the kernel, we need to fix the code first, to get benefits 
> for the other architectures. 
> 
> "mwait" granularity (64-byte, for example) is given by the 
> cpuid instruction, and we did not use it because 1) it's 
> unlikely that the other fields of the task structure are 
> modified when it's idle, 2) the processor needs to check the 
> flag after mwait anyway, to avoid waking up with a false 
> signal caused by other break events (i.e. mwait is a hint).
> 
> Jun
> 
> > -----Original Message-----
> > From: Linus Torvalds [mailto:torvalds@osdl.org]
> > Sent: Tuesday, July 08, 2003 4:34 PM
> > To: Nakajima, Jun
> > Cc: linux-kernel@vger.kernel.org; Saxena, Sunil; Mallick, Asit K;
> > Pallipadi, Venkatesh
> > Subject: Re: [PATCH] idle using PNI monitor/mwait
> > 
> > 
> > On Tue, 8 Jul 2003, Nakajima, Jun wrote:
> > >
> > > Attached is a patch that enables PNI (Prescott New Instructions)
> > > monitor/mwait in kernel idle (opcodes are now public). Basically MWAIT
> > > is similar to hlt, but you can avoid IPI to wake up the processor
> > > waiting. A write (by another processor) to the address range specified
> > > by MONITOR would wake up the processor waiting on MWAIT.
> > 
> > How about spinlocks? Does it make sense to make the contention code use
> > mwait too, or are the latencies too high? Not that we have a lot of
> > high-contention locks any more, so maybe it doesn't much matter.
> > 
> > Also, wasn't there some flag to set the "mwait" granularity? I don't see
> > anything like that in the patch..
> > 
> > 		Linus
> 
> 


* RE: [PATCH] idle using PNI monitor/mwait
@ 2003-07-09 16:39 Mallick, Asit K
  0 siblings, 0 replies; 8+ messages in thread
From: Mallick, Asit K @ 2003-07-09 16:39 UTC
  To: Alan Cox, Nakajima, Jun
  Cc: Linus Torvalds, Linux Kernel Mailing List, Saxena, Sunil,
	Pallipadi, Venkatesh

Alan,
MWAIT does not depend directly on the processor: any bus-master write to
the monitored range will wake up the MWAIT. So your DMA example will also work.
Thanks,
Asit


> -----Original Message-----
> From: Alan Cox [mailto:alan@lxorguk.ukuu.org.uk] 
> Sent: Wednesday, July 09, 2003 4:00 AM
> To: Nakajima, Jun
> Cc: Linus Torvalds; Linux Kernel Mailing List; Saxena, Sunil; 
> Mallick, Asit K; Pallipadi, Venkatesh
> Subject: Re: [PATCH] idle using PNI monitor/mwait
> 
> 
> On Tue, 2003-07-08 at 22:23, Nakajima, Jun wrote:
> > Hi Linus,
> > 
> > Attached is a patch that enables PNI (Prescott New Instructions)
> > monitor/mwait in kernel idle (opcodes are now public). Basically MWAIT
> > is similar to hlt, but you can avoid IPI to wake up the processor
> > waiting. A write (by another processor) to the address range specified
> > by MONITOR would wake up the processor waiting on MWAIT.
> 
> Is mwait dependent on cached CPU memory and the cache-exclusivity logic,
> or directly on the processor? In other words, can I use mwait in future
> to wait for DMA to hit a given location? I'm mostly thinking about
> debugging uses.
> 
> 


* Re: [PATCH] idle using PNI monitor/mwait
  2003-07-08 21:23 Nakajima, Jun
  2003-07-08 23:34 ` Linus Torvalds
@ 2003-07-09 10:59 ` Alan Cox
  1 sibling, 0 replies; 8+ messages in thread
From: Alan Cox @ 2003-07-09 10:59 UTC
  To: Nakajima, Jun
  Cc: Linus Torvalds, Linux Kernel Mailing List, Saxena, Sunil,
	Mallick, Asit K, Pallipadi, Venkatesh

On Tue, 2003-07-08 at 22:23, Nakajima, Jun wrote:
> Hi Linus,
> 
> Attached is a patch that enables PNI (Prescott New Instructions)
> monitor/mwait in kernel idle (opcodes are now public). Basically MWAIT
> is similar to hlt, but you can avoid IPI to wake up the processor
> waiting. A write (by another processor) to the address range specified
> by MONITOR would wake up the processor waiting on MWAIT.

Is mwait dependent on cached CPU memory and the cache-exclusivity logic,
or directly on the processor? In other words, can I use mwait in future
to wait for DMA to hit a given location? I'm mostly thinking about
debugging uses.



* Re: [PATCH] idle using PNI monitor/mwait
  2003-07-08 21:23 Nakajima, Jun
@ 2003-07-08 23:34 ` Linus Torvalds
  2003-07-09 10:59 ` Alan Cox
  1 sibling, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2003-07-08 23:34 UTC
  To: Nakajima, Jun
  Cc: linux-kernel, Saxena, Sunil, Mallick, Asit K, Pallipadi, Venkatesh


On Tue, 8 Jul 2003, Nakajima, Jun wrote:
> 
> Attached is a patch that enables PNI (Prescott New Instructions)
> monitor/mwait in kernel idle (opcodes are now public). Basically MWAIT
> is similar to hlt, but you can avoid IPI to wake up the processor
> waiting. A write (by another processor) to the address range specified
> by MONITOR would wake up the processor waiting on MWAIT.

How about spinlocks? Does it make sense to make the contention code use 
mwait too, or are the latencies too high? Not that we have a lot of 
high-contention locks any more, so maybe it doesn't much matter.
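The question above concerns the contended path of a spinlock. Below is a minimal sketch of a test-and-test-and-set lock, assuming C11 atomics and GCC-style inline assembly; `struct ttas_lock` and its functions are hypothetical and are not the i386 kernel spinlock. The comment in `ttas_lock()` marks the read-only spin where a MONITOR/MWAIT pair could stand in for `pause`, which is exactly where the wake-up latency being asked about would matter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define spin_pause() __asm__ __volatile__("pause" ::: "memory")
#else
#define spin_pause() ((void)0)	/* hypothetical fallback on non-x86 hosts */
#endif

/* Hypothetical test-and-test-and-set lock: 0 = unlocked, 1 = locked. */
struct ttas_lock {
	atomic_int locked;
};

bool ttas_trylock(struct ttas_lock *l)
{
	/* Only attempt the atomic exchange when the lock looks free. */
	return atomic_load_explicit(&l->locked, memory_order_relaxed) == 0 &&
	       atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

void ttas_lock(struct ttas_lock *l)
{
	assert(l);
	while (!ttas_trylock(l)) {
		/*
		 * Contended: spin on plain reads so the cache line stays shared.
		 * This is the spot where MONITOR on &l->locked followed by MWAIT
		 * could replace the pause loop, if its wake-up latency is low enough.
		 */
		while (atomic_load_explicit(&l->locked, memory_order_relaxed))
			spin_pause();
	}
}

void ttas_unlock(struct ttas_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```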

Also, wasn't there some flag to set the "mwait" granularity? I don't see 
anything like that in the patch..

		Linus




* [PATCH] idle using PNI monitor/mwait
@ 2003-07-08 21:23 Nakajima, Jun
  2003-07-08 23:34 ` Linus Torvalds
  2003-07-09 10:59 ` Alan Cox
  0 siblings, 2 replies; 8+ messages in thread
From: Nakajima, Jun @ 2003-07-08 21:23 UTC
  To: Linus Torvalds
  Cc: linux-kernel, Saxena, Sunil, Mallick, Asit K, Pallipadi, Venkatesh

[-- Attachment #1: Type: text/plain, Size: 4348 bytes --]

Hi Linus,

Attached is a patch that enables PNI (Prescott New Instructions)
monitor/mwait in kernel idle (opcodes are now public). Basically MWAIT
is similar to hlt, but you can avoid IPI to wake up the processor
waiting. A write (by another processor) to the address range specified
by MONITOR would wake up the processor waiting on MWAIT.
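The handshake described above can be modeled in user space. This sketch loosely follows `mwait_idle()` in the patch and the scheduler's practice of skipping the reschedule IPI when the target has `TIF_POLLING_NRFLAG` set. The flag bits, `struct idle_cpu`, and the function-pointer stand-ins for MONITOR and MWAIT are hypothetical, since the real instructions are privileged. The simulated wait returns spuriously twice before another CPU sets the flag, to show why the loop re-checks after every wake-up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bits standing in for TIF_NEED_RESCHED and TIF_POLLING_NRFLAG. */
#define NEED_RESCHED	(1u << 0)
#define POLLING		(1u << 1)

struct idle_cpu {
	atomic_uint flags;	/* the word MONITOR would be armed on */
};

/* Stand-ins for MONITOR and MWAIT; the wait may return spuriously. */
typedef void (*arm_fn)(struct idle_cpu *);
typedef void (*wait_fn)(struct idle_cpu *);

static bool resched_pending(struct idle_cpu *c)
{
	return (atomic_load(&c->flags) & NEED_RESCHED) != 0;
}

/* Waker side: set NEED_RESCHED; an IPI is needed only if it was clear and
 * the target is not polling (i.e. not sitting in MWAIT on the flags word). */
bool wake_needs_ipi(struct idle_cpu *c)
{
	unsigned int old = atomic_fetch_or(&c->flags, NEED_RESCHED);

	return !(old & NEED_RESCHED) && !(old & POLLING);
}

/*
 * Idle side, shaped like mwait_idle(): advertise polling, arm the monitor,
 * re-check before waiting (a write that landed before the arm would not end
 * the wait), and re-check after every return because MWAIT is only a hint.
 * Returns the number of waits performed.
 */
unsigned int idle_loop(struct idle_cpu *c, arm_fn arm, wait_fn do_wait)
{
	unsigned int waits = 0;

	if (resched_pending(c))
		return 0;
	atomic_fetch_or(&c->flags, POLLING);
	do {
		arm(c);
		if (resched_pending(c))
			break;
		do_wait(c);
		waits++;
	} while (!resched_pending(c));
	atomic_fetch_and(&c->flags, ~POLLING);
	return waits;
}

/* Simulation helpers: two spurious wake-ups, then a remote "write". */
static unsigned int sim_wakeups;

void sim_arm(struct idle_cpu *c)
{
	(void)c;	/* a real MONITOR would arm on &c->flags here */
}

void sim_wait(struct idle_cpu *c)
{
	if (++sim_wakeups == 3)
		(void)wake_needs_ipi(c);	/* another CPU sets NEED_RESCHED */
}
```

Because the waker sees `POLLING` set, `wake_needs_ipi()` reports that no IPI is required; the write to the monitored word is what ends the wait.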

Please apply.

Thanks,
Jun




[-- Attachment #2: mwait-2.5.74.patch --]
[-- Type: application/octet-stream, Size: 3807 bytes --]

diff -ur /build/orig/linux-2.5.74/arch/i386/kernel/cpu/intel.c linux-2.5.74/arch/i386/kernel/cpu/intel.c
--- /build/orig/linux-2.5.74/arch/i386/kernel/cpu/intel.c	2003-07-02 13:43:55.000000000 -0700
+++ linux-2.5.74/arch/i386/kernel/cpu/intel.c	2003-07-08 09:18:28.000000000 -0700
@@ -13,6 +13,7 @@
 
 static int disable_P4_HT __initdata = 0;
 extern int trap_init_f00f_bug(void);
+extern void select_idle_routine(const struct cpuinfo_x86 *c);
 
 #ifdef CONFIG_X86_INTEL_USERCOPY
 /*
@@ -172,7 +173,7 @@
 	}
 #endif
 
-
+	select_idle_routine(c);
 	if (c->cpuid_level > 1) {
 		/* supports eax=2  call */
 		int i, j, n;
diff -ur /build/orig/linux-2.5.74/arch/i386/kernel/process.c linux-2.5.74/arch/i386/kernel/process.c
--- /build/orig/linux-2.5.74/arch/i386/kernel/process.c	2003-07-02 13:38:40.000000000 -0700
+++ linux-2.5.74/arch/i386/kernel/process.c	2003-07-08 11:52:42.000000000 -0700
@@ -148,11 +148,56 @@
 	}
 }
 
+/*
+ * This uses new MONITOR/MWAIT instructions on P4 processors with PNI, 
+ * which can obviate IPI to trigger checking of need_resched.
+ * We execute MONITOR against need_resched and enter optimized wait state 
+ * through MWAIT. Whenever someone changes need_resched, we would be woken 
+ * up from MWAIT (without an IPI).
+ */
+static void mwait_idle (void)
+{
+	local_irq_enable();
+
+	if (!need_resched()) {
+		set_thread_flag(TIF_POLLING_NRFLAG);
+		do {
+			__monitor((void *)&current_thread_info()->flags, 0, 0);
+			if (need_resched())
+				break;
+			__mwait(0, 0);
+		} while (!need_resched());
+		clear_thread_flag(TIF_POLLING_NRFLAG);
+	}
+}
+
+void __init select_idle_routine(const struct cpuinfo_x86 *c)
+{
+	if (cpu_has(c, X86_FEATURE_MWAIT)) {
+		printk("Monitor/Mwait feature present.\n");
+		/*
+		 * Skip if setup has overridden idle.
+		 * Also take care of systems with asymmetric CPUs:
+		 * use mwait_idle only if all CPUs support it;
+		 * if not, fall back to default_idle().
+		 */
+		if (!pm_idle) {
+			pm_idle = mwait_idle;
+		}
+		return;
+	}
+	pm_idle = default_idle;
+	return;
+}
+
 static int __init idle_setup (char *str)
 {
 	if (!strncmp(str, "poll", 4)) {
 		printk("using polling idle threads.\n");
 		pm_idle = poll_idle;
+	} else if (!strncmp(str, "halt", 4)) {
+		printk("using halt in idle threads.\n");
+		pm_idle = default_idle;
 	}
 
 	return 1;
diff -ur /build/orig/linux-2.5.74/include/asm-i386/cpufeature.h linux-2.5.74/include/asm-i386/cpufeature.h
--- /build/orig/linux-2.5.74/include/asm-i386/cpufeature.h	2003-07-02 13:51:50.000000000 -0700
+++ linux-2.5.74/include/asm-i386/cpufeature.h	2003-07-08 09:18:28.000000000 -0700
@@ -71,6 +71,8 @@
 
 /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
 #define X86_FEATURE_EST		(4*32+ 7) /* Enhanced SpeedStep */
+#define X86_FEATURE_MWAIT	(4*32+ 3) /* Monitor/Mwait support */
+
 
 /* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
 #define X86_FEATURE_XSTORE	(5*32+ 2) /* on-CPU RNG present (xstore insn) */
diff -ur /build/orig/linux-2.5.74/include/asm-i386/processor.h linux-2.5.74/include/asm-i386/processor.h
--- /build/orig/linux-2.5.74/include/asm-i386/processor.h	2003-07-02 13:40:24.000000000 -0700
+++ linux-2.5.74/include/asm-i386/processor.h	2003-07-08 09:18:28.000000000 -0700
@@ -272,6 +272,22 @@
 #define pc98 0
 #endif
 
+static __inline__ void __monitor(const void *eax, unsigned long ecx, 
+		unsigned long edx)
+{
+	/* "monitor %eax,%ecx,%edx;" */
+	asm volatile(
+		".byte 0x0f,0x01,0xc8;"
+		: :"a" (eax), "c" (ecx), "d"(edx));
+}
+
+static __inline__ void __mwait(unsigned long eax, unsigned long ecx)
+{
+	/* "mwait %eax,%ecx;" */
+	asm volatile(
+		".byte 0x0f,0x01,0xc9;"
+		: :"a" (eax), "c" (ecx));
+}
 
 /* from system description table in BIOS.  Mostly for MCA use, but
 others may find it useful. */


Thread overview: 8+ messages
2003-07-09  0:35 [PATCH] idle using PNI monitor/mwait Nakajima, Jun
2003-07-09  6:41 ` Zwane Mwaikambo
  -- strict thread matches above, loose matches on Subject: below --
2003-07-10  1:17 Saxena, Sunil
2003-07-09 17:01 Mallick, Asit K
2003-07-09 16:39 Mallick, Asit K
2003-07-08 21:23 Nakajima, Jun
2003-07-08 23:34 ` Linus Torvalds
2003-07-09 10:59 ` Alan Cox
