linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: SSE related security hole
       [not found] ` <a9ncgs$2s2$1@cesium.transmeta.com.suse.lists.linux.kernel>
@ 2002-04-19 14:06   ` Andi Kleen
  2002-04-19 18:00     ` Doug Ledford
  0 siblings, 1 reply; 38+ messages in thread
From: Andi Kleen @ 2002-04-19 14:06 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:
> 
> Perhaps the right thing to do is to have a description in data of the
> desired initialization state and just F[NX]RSTOR it?

Sounds like the cleanest solution. The state could be saved at CPU bootup
with just MXCSR initialized.

I'll implement that for x86-64.

-Andi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: SSE related security hole
  2002-04-19 14:06   ` SSE related security hole Andi Kleen
@ 2002-04-19 18:00     ` Doug Ledford
  2002-04-19 21:04       ` Andrea Arcangeli
  0 siblings, 1 reply; 38+ messages in thread
From: Doug Ledford @ 2002-04-19 18:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: H. Peter Anvin, linux-kernel

On Fri, Apr 19, 2002 at 04:06:17PM +0200, Andi Kleen wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> > 
> > Perhaps the right thing to do is to have a description in data of the
> > desired initialization state and just F[NX]RSTOR it?
> 
> Sounds like the cleanest solution. The state could be saved at CPU bootup
> with just MXCSR initialized.
> 
> I'll implement that for x86-64.

Ummm...last I knew, fxrstor is *expensive*.  The fninit/xor regs setup is 
likely *very* much faster.  Someone should check this before we sacrifice 
100 cycles needlessly or something.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: SSE related security hole
  2002-04-19 18:00     ` Doug Ledford
@ 2002-04-19 21:04       ` Andrea Arcangeli
  2002-04-19 21:35         ` H. Peter Anvin
  2002-04-19 22:18         ` Jan Hubicka
  0 siblings, 2 replies; 38+ messages in thread
From: Andrea Arcangeli @ 2002-04-19 21:04 UTC (permalink / raw)
  To: Andi Kleen, H. Peter Anvin, linux-kernel; +Cc: Jan Hubicka

On Fri, Apr 19, 2002 at 02:00:31PM -0400, Doug Ledford wrote:
> On Fri, Apr 19, 2002 at 04:06:17PM +0200, Andi Kleen wrote:
> > "H. Peter Anvin" <hpa@zytor.com> writes:
> > > 
> > > Perhaps the right thing to do is to have a description in data of the
> > > desired initialization state and just F[NX]RSTOR it?
> > 
> > Sounds like the cleanest solution. The state could be saved at CPU bootup
> > with just MXCSR initialized.
> > 
> > I'll implement that for x86-64.
> 
> Ummm...last I knew, fxrstor is *expensive*.  The fninit/xor regs setup is 
> likely *very* much faster.  Someone should check this before we sacrifice 
> 100 cycles needlessly or something.

most probably yes, fxrestor needs to read ram, pxor also takes some
icache and bytecode ram but it sounds like it will be faster.

Maybe we could also interleave the pxor with the xorps, since they uses
different parts of the cpu, Honza?

diff -urN ref/arch/x86_64/kernel/i387.c xmm/arch/x86_64/kernel/i387.c
--- ref/arch/x86_64/kernel/i387.c	Fri Apr 19 19:37:30 2002
+++ xmm/arch/x86_64/kernel/i387.c	Fri Apr 19 19:39:02 2002
@@ -34,6 +34,31 @@
 	struct task_struct *me = current;
 	__asm__("fninit");
 	load_mxcsr(0x1f80);
+	asm volatile("pxor %mm0, %mm0\n\t"
+		     "pxor %mm1, %mm1\n\t"
+		     "pxor %mm2, %mm2\n\t"
+		     "pxor %mm3, %mm3\n\t"
+		     "pxor %mm4, %mm4\n\t"
+		     "pxor %mm5, %mm5\n\t"
+		     "pxor %mm6, %mm6\n\t"
+		     "pxor %mm7, %mm7\n\t"
+		     "emms\n\t"
+		     "xorps %xmm0,  %xmm0\n\t"
+		     "xorps %xmm1,  %xmm1\n\t"
+		     "xorps %xmm2,  %xmm2\n\t"
+		     "xorps %xmm3,  %xmm3\n\t"
+		     "xorps %xmm4,  %xmm4\n\t"
+		     "xorps %xmm5,  %xmm5\n\t"
+		     "xorps %xmm6,  %xmm6\n\t"
+		     "xorps %xmm7,  %xmm7\n\t"
+		     "xorps %xmm8,  %xmm8\n\t"
+		     "xorps %xmm9,  %xmm9\n\t"
+		     "xorps %xmm10, %xmm10\n\t"
+		     "xorps %xmm11, %xmm11\n\t"
+		     "xorps %xmm12, %xmm12\n\t"
+		     "xorps %xmm13, %xmm13\n\t"
+		     "xorps %xmm14, %xmm14\n\t"
+		     "xorps %xmm15, %xmm15\n");
 	me->used_math = 1;
 }
 

Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: SSE related security hole
  2002-04-19 21:04       ` Andrea Arcangeli
@ 2002-04-19 21:35         ` H. Peter Anvin
  2002-04-19 21:42           ` Andi Kleen
  2002-04-19 23:12           ` [PATCH] " Brian Gerst
  2002-04-19 22:18         ` Jan Hubicka
  1 sibling, 2 replies; 38+ messages in thread
From: H. Peter Anvin @ 2002-04-19 21:35 UTC (permalink / raw)
  To: andrea; +Cc: ak, hpa, linux-kernel, jh

>>
>> Ummm...last I knew, fxrstor is *expensive*.  The fninit/xor regs setup
>> is  likely *very* much faster.  Someone should check this before we
>> sacrifice  100 cycles needlessly or something.
>
> most probably yes, fxrestor needs to read ram, pxor also takes some
> icache and bytecode ram but it sounds like it will be faster.
>
> Maybe we could also interleave the pxor with the xorps, since they uses
> different parts of the cpu, Honza?
>

You almost certainly should.  The reason I suggested FXRSTOR is that it
would initialize the entire FPU, including any state that future
processors may add, thus reducing the likelihood of any funnies in the
future.




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: SSE related security hole
  2002-04-19 21:35         ` H. Peter Anvin
@ 2002-04-19 21:42           ` Andi Kleen
  2002-04-20  3:23             ` Andrea Arcangeli
  2002-04-19 23:12           ` [PATCH] " Brian Gerst
  1 sibling, 1 reply; 38+ messages in thread
From: Andi Kleen @ 2002-04-19 21:42 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: andrea, ak, linux-kernel, jh

On Fri, Apr 19, 2002 at 02:35:57PM -0700, H. Peter Anvin wrote:
> >>
> >> Ummm...last I knew, fxrstor is *expensive*.  The fninit/xor regs setup
> >> is  likely *very* much faster.  Someone should check this before we
> >> sacrifice  100 cycles needlessly or something.
> >
> > most probably yes, fxrestor needs to read ram, pxor also takes some
> > icache and bytecode ram but it sounds like it will be faster.
> >
> > Maybe we could also interleave the pxor with the xorps, since they uses
> > different parts of the cpu, Honza?
> >
> 
> You almost certainly should.  The reason I suggested FXRSTOR is that it
> would initialize the entire FPU, including any state that future
> processors may add, thus reducing the likelihood of any funnies in the
> future.

That's also why I like it. 

-Andi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: SSE related security hole
  2002-04-19 21:04       ` Andrea Arcangeli
  2002-04-19 21:35         ` H. Peter Anvin
@ 2002-04-19 22:18         ` Jan Hubicka
  1 sibling, 0 replies; 38+ messages in thread
From: Jan Hubicka @ 2002-04-19 22:18 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Jan Hubicka

> On Fri, Apr 19, 2002 at 02:00:31PM -0400, Doug Ledford wrote:
> > On Fri, Apr 19, 2002 at 04:06:17PM +0200, Andi Kleen wrote:
> > > "H. Peter Anvin" <hpa@zytor.com> writes:
> > > > 
> > > > Perhaps the right thing to do is to have a description in data of the
> > > > desired initialization state and just F[NX]RSTOR it?
> > > 
> > > Sounds like the cleanest solution. The state could be saved at CPU bootup
> > > with just MXCSR initialized.
> > > 
> > > I'll implement that for x86-64.
> > 
> > Ummm...last I knew, fxrstor is *expensive*.  The fninit/xor regs setup is 
> > likely *very* much faster.  Someone should check this before we sacrifice 
> > 100 cycles needlessly or something.
> 
> most probably yes, fxrestor needs to read ram, pxor also takes some
> icache and bytecode ram but it sounds like it will be faster.
> 
> Maybe we could also interleave the pxor with the xorps, since they uses
> different parts of the cpu, Honza?

Yes, I guess that should help to at least some chips.
Definitly nothing to loose :)

Honza

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH] Re: SSE related security hole
  2002-04-19 21:35         ` H. Peter Anvin
  2002-04-19 21:42           ` Andi Kleen
@ 2002-04-19 23:12           ` Brian Gerst
  2002-04-19 23:41             ` Linus Torvalds
  1 sibling, 1 reply; 38+ messages in thread
From: Brian Gerst @ 2002-04-19 23:12 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: andrea, ak, linux-kernel, jh, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 866 bytes --]

H. Peter Anvin wrote:
 >>> Ummm...last I knew, fxrstor is *expensive*.  The fninit/xor regs
 >>> setup is  likely *very* much faster.  Someone should check this
 >>> before we sacrifice  100 cycles needlessly or something.
 >>
 >> most probably yes, fxrestor needs to read ram, pxor also takes
 >> some icache and bytecode ram but it sounds like it will be faster.
 >>
 >> Maybe we could also interleave the pxor with the xorps, since they
 >> uses different parts of the cpu, Honza?
 >>
 >
 >
 > You almost certainly should.  The reason I suggested FXRSTOR is that
 > it would initialize the entire FPU, including any state that future
 > processors may add, thus reducing the likelihood of any funnies in
 > the future.

Here's a patch to do just that.  It initializes the saved state in the
task struct and falls through to restore_fpu().

-- 

						Brian Gerst

[-- Attachment #2: fpuclear-1 --]
[-- Type: text/plain, Size: 1951 bytes --]

diff -urN linux-2.5.8/arch/i386/kernel/i387.c linux/arch/i386/kernel/i387.c
--- linux-2.5.8/arch/i386/kernel/i387.c	Thu Mar  7 21:18:32 2002
+++ linux/arch/i386/kernel/i387.c	Fri Apr 19 18:23:53 2002
@@ -31,13 +31,20 @@
  * value at reset if we support XMM instructions and then
  * remeber the current task has used the FPU.
  */
-void init_fpu(void)
+void init_fpu(struct task_struct *tsk)
 {
-	__asm__("fninit");
-	if ( cpu_has_xmm )
-		load_mxcsr(0x1f80);
-		
-	current->used_math = 1;
+	if ( cpu_has_xmm ) {
+		memset(&tsk->thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct));
+		tsk->thread.i387.fxsave.cwd = 0x37f;
+		tsk->thread.i387.fxsave.mxcsr = 0x1f80;
+	} else {
+		memset(&tsk->thread.i387.fsave, 0, sizeof(struct i387_fsave_struct));
+		tsk->thread.i387.fsave.cwd = 0xffff037f;
+		tsk->thread.i387.fsave.swd = 0xffff0000;
+		tsk->thread.i387.fsave.twd = 0xffffffff;
+		tsk->thread.i387.fsave.fos = 0xffff0000;
+	}
+	tsk->used_math = 1;
 }
 
 /*
diff -urN linux-2.5.8/arch/i386/kernel/traps.c linux/arch/i386/kernel/traps.c
--- linux-2.5.8/arch/i386/kernel/traps.c	Sun Apr 14 23:48:18 2002
+++ linux/arch/i386/kernel/traps.c	Fri Apr 19 18:22:12 2002
@@ -757,13 +757,12 @@
  */
 asmlinkage void math_state_restore(struct pt_regs regs)
 {
+	struct task_struct *tsk = current;
 	clts();		/* Allow maths ops (or we recurse) */
 
-	if (current->used_math) {
-		restore_fpu(current);
-	} else {
-		init_fpu();
-	}
+	if (!tsk->used_math)
+		init_fpu(tsk);
+	restore_fpu(tsk);
 	set_thread_flag(TIF_USEDFPU);	/* So we fnsave on switch_to() */
 }
 
diff -urN linux-2.5.8/include/asm-i386/i387.h linux/include/asm-i386/i387.h
--- linux-2.5.8/include/asm-i386/i387.h	Fri Apr 19 18:44:48 2002
+++ linux/include/asm-i386/i387.h	Fri Apr 19 18:46:12 2002
@@ -17,7 +17,7 @@
 #include <asm/sigcontext.h>
 #include <asm/user.h>
 
-extern void init_fpu(void);
+extern void init_fpu(struct task_struct *);
 /*
  * FPU lazy state save handling...
  */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-19 23:12           ` [PATCH] " Brian Gerst
@ 2002-04-19 23:41             ` Linus Torvalds
  2002-04-20  0:01               ` H. Peter Anvin
  2002-04-20  0:08               ` Brian Gerst
  0 siblings, 2 replies; 38+ messages in thread
From: Linus Torvalds @ 2002-04-19 23:41 UTC (permalink / raw)
  To: Brian Gerst; +Cc: H. Peter Anvin, andrea, ak, linux-kernel, jh



On Fri, 19 Apr 2002, Brian Gerst wrote:
>
> Here's a patch to do just that.  It initializes the saved state in the
> task struct and falls through to restore_fpu().

One issue is whether we _can_ restore a "generated" image like this: it's
entirely possible that Intel might save away internal CPU shadow data in
the save-state structure, and future CPU's might be unhappy about loading
data that doesn't conform to what the CPU would save.

That said, the same issue is certainly true for just doing "xorps", since
that will not clear any potential future CPU state.

I get this feeling that Intel screwed up on specifying how to initialize
this whole state.

		Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-19 23:41             ` Linus Torvalds
@ 2002-04-20  0:01               ` H. Peter Anvin
  2002-04-20  0:09                 ` Linus Torvalds
  2002-04-20  0:08               ` Brian Gerst
  1 sibling, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2002-04-20  0:01 UTC (permalink / raw)
  To: torvalds; +Cc: bgerst, hpa, andrea, ak, linux-kernel, jh

>
> I get this feeling that Intel screwed up on specifying how to
> initialize this whole state.
>

Indeed.  Logically, FNINIT should have been extended to initialize it all -
- it is a security hole that it doesn't initialize MMX properly.
Alternatively, for SSE only, an INITP instruction could have been added
that an SSE-enabled OS can use at the time OSXFSR or whatever that flag is
called is set.




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-19 23:41             ` Linus Torvalds
  2002-04-20  0:01               ` H. Peter Anvin
@ 2002-04-20  0:08               ` Brian Gerst
  2002-04-20  0:21                 ` Linus Torvalds
  2002-04-20  4:21                 ` Andrea Arcangeli
  1 sibling, 2 replies; 38+ messages in thread
From: Brian Gerst @ 2002-04-20  0:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, andrea, ak, linux-kernel, jh

[-- Attachment #1: Type: text/plain, Size: 1128 bytes --]

Linus Torvalds wrote:
> 
> On Fri, 19 Apr 2002, Brian Gerst wrote:
> 
>>Here's a patch to do just that.  It initializes the saved state in the
>>task struct and falls through to restore_fpu().
> 
> 
> One issue is whether we _can_ restore a "generated" image like this: it's
> entirely possible that Intel might save away internal CPU shadow data in
> the save-state structure, and future CPU's might be unhappy about loading
> data that doesn't conform to what the CPU would save.
> 
> That said, the same issue is certainly true for just doing "xorps", since
> that will not clear any potential future CPU state.
> 
> I get this feeling that Intel screwed up on specifying how to initialize
> this whole state.
> 
> 		Linus
> 

I don't know about Intel, but the Athlon doesn't appear to save anything 
in the "reserved" areas.  The values that I use to initialize the state 
are from what is saved after loading the stack with zeroes (fldz) and 
then doing finit.  I have samples of the saved state from a P2, a K6-2, 
and an Athlon XP.

Attached is a revised patch.  I had the cpu_has_xmm test wrong.

-- 

						Brian Gerst

[-- Attachment #2: fpuclear-2 --]
[-- Type: text/plain, Size: 1971 bytes --]

diff -urN linux-2.5.8/arch/i386/kernel/i387.c linux/arch/i386/kernel/i387.c
--- linux-2.5.8/arch/i386/kernel/i387.c	Thu Mar  7 21:18:32 2002
+++ linux/arch/i386/kernel/i387.c	Fri Apr 19 19:35:14 2002
@@ -31,13 +31,21 @@
  * value at reset if we support XMM instructions and then
  * remeber the current task has used the FPU.
  */
-void init_fpu(void)
+void init_fpu(struct task_struct *tsk)
 {
-	__asm__("fninit");
-	if ( cpu_has_xmm )
-		load_mxcsr(0x1f80);
-		
-	current->used_math = 1;
+	if (cpu_has_fxsr) {
+		memset(&tsk->thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct));
+		tsk->thread.i387.fxsave.cwd = 0x37f;
+		if (cpu_has_xmm)
+			tsk->thread.i387.fxsave.mxcsr = 0x1f80;
+	} else {
+		memset(&tsk->thread.i387.fsave, 0, sizeof(struct i387_fsave_struct));
+		tsk->thread.i387.fsave.cwd = 0xffff037f;
+		tsk->thread.i387.fsave.swd = 0xffff0000;
+		tsk->thread.i387.fsave.twd = 0xffffffff;
+		tsk->thread.i387.fsave.fos = 0xffff0000;
+	}
+	tsk->used_math = 1;
 }
 
 /*
diff -urN linux-2.5.8/arch/i386/kernel/traps.c linux/arch/i386/kernel/traps.c
--- linux-2.5.8/arch/i386/kernel/traps.c	Sun Apr 14 23:48:18 2002
+++ linux/arch/i386/kernel/traps.c	Fri Apr 19 18:22:12 2002
@@ -757,13 +757,12 @@
  */
 asmlinkage void math_state_restore(struct pt_regs regs)
 {
+	struct task_struct *tsk = current;
 	clts();		/* Allow maths ops (or we recurse) */
 
-	if (current->used_math) {
-		restore_fpu(current);
-	} else {
-		init_fpu();
-	}
+	if (!tsk->used_math)
+		init_fpu(tsk);
+	restore_fpu(tsk);
 	set_thread_flag(TIF_USEDFPU);	/* So we fnsave on switch_to() */
 }
 
diff -urN linux-2.5.8/include/asm-i386/i387.h linux/include/asm-i386/i387.h
--- linux-2.5.8/include/asm-i386/i387.h	Fri Apr 19 18:44:48 2002
+++ linux/include/asm-i386/i387.h	Fri Apr 19 19:34:03 2002
@@ -17,7 +17,7 @@
 #include <asm/sigcontext.h>
 #include <asm/user.h>
 
-extern void init_fpu(void);
+extern void init_fpu(struct task_struct *);
 /*
  * FPU lazy state save handling...
  */

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20  0:01               ` H. Peter Anvin
@ 2002-04-20  0:09                 ` Linus Torvalds
  2002-04-20  0:11                   ` Brian Gerst
                                     ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Linus Torvalds @ 2002-04-20  0:09 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: bgerst, andrea, ak, linux-kernel, jh



On Fri, 19 Apr 2002, H. Peter Anvin wrote:
>
> Indeed.  Logically, FNINIT should have been extended to initialize it all -
> - it is a security hole that it doesn't initialize MMX properly.

Well, MMX should arguably be initialized with "emms", so the proper
sequence migth be something like

	if (sse)
		asm("emms");
	asm("fninit");

What does emms do to SSE2?

		Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20  0:09                 ` Linus Torvalds
@ 2002-04-20  0:11                   ` Brian Gerst
  2002-04-20  0:19                   ` H. Peter Anvin
  2002-04-20  0:31                   ` Alan Cox
  2 siblings, 0 replies; 38+ messages in thread
From: Brian Gerst @ 2002-04-20  0:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, andrea, ak, linux-kernel, jh

Linus Torvalds wrote:
> 
> On Fri, 19 Apr 2002, H. Peter Anvin wrote:
> 
>>Indeed.  Logically, FNINIT should have been extended to initialize it all -
>>- it is a security hole that it doesn't initialize MMX properly.
> 
> 
> Well, MMX should arguably be initialized with "emms", so the proper
> sequence migth be something like
> 
> 	if (sse)
> 		asm("emms");
> 	asm("fninit");
> 
> What does emms do to SSE2?
> 
> 		Linus
> 

All emms does is reset the tag word.  It doesn't touch the registers.

-- 

						Brian Gerst


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20  0:09                 ` Linus Torvalds
  2002-04-20  0:11                   ` Brian Gerst
@ 2002-04-20  0:19                   ` H. Peter Anvin
  2002-04-20  0:29                     ` Linus Torvalds
  2002-04-20  0:31                   ` Alan Cox
  2 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2002-04-20  0:19 UTC (permalink / raw)
  To: torvalds; +Cc: hpa, bgerst, andrea, ak, linux-kernel, jh

>
>
> On Fri, 19 Apr 2002, H. Peter Anvin wrote:
>>
>> Indeed.  Logically, FNINIT should have been extended to initialize it
>> all - - it is a security hole that it doesn't initialize MMX properly.
>
> Well, MMX should arguably be initialized with "emms", so the proper
> sequence migth be something like
>

Remember that one of the original design goals of MMX was to run on
unmodified operating systems (and it can't be turned off); thus needing
any extra initialization is a bug.

    -hpa




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20  0:08               ` Brian Gerst
@ 2002-04-20  0:21                 ` Linus Torvalds
  2002-04-20  4:21                 ` Andrea Arcangeli
  1 sibling, 0 replies; 38+ messages in thread
From: Linus Torvalds @ 2002-04-20  0:21 UTC (permalink / raw)
  To: Brian Gerst; +Cc: H. Peter Anvin, andrea, ak, linux-kernel, jh



On Fri, 19 Apr 2002, Brian Gerst wrote:
>
> I don't know about Intel, but the Athlon doesn't appear to save anything
> in the "reserved" areas.

Yes, I think it will work with current CPU's, I'm just worried about
future state extensions.

That said, I personally certainly prefer this approach which is guaranteed
to set as much as possible to a known state, even in the presense of
future extensions. If future extensions don't like loading zeroes, at
least the state will be _consistent_, which on the whole is really the
most important thing.

Applied.

		Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20  0:19                   ` H. Peter Anvin
@ 2002-04-20  0:29                     ` Linus Torvalds
  0 siblings, 0 replies; 38+ messages in thread
From: Linus Torvalds @ 2002-04-20  0:29 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: bgerst, andrea, ak, linux-kernel, jh



On Fri, 19 Apr 2002, H. Peter Anvin wrote:
>
> Remember that one of the original design goals of MMX was to run on
> unmodified operating systems (and it can't be turned off); thus needing
> any extra initialization is a bug.

Oh, I agree 100%. The sanest thing would certainly have been to make sure
"fninit" just clears everything inclusing MMX and full XMM state. Oh,
well.

		Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20  0:09                 ` Linus Torvalds
  2002-04-20  0:11                   ` Brian Gerst
  2002-04-20  0:19                   ` H. Peter Anvin
@ 2002-04-20  0:31                   ` Alan Cox
  2 siblings, 0 replies; 38+ messages in thread
From: Alan Cox @ 2002-04-20  0:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, bgerst, andrea, ak, linux-kernel, jh

> Well, MMX should arguably be initialized with "emms", so the proper
> sequence migth be something like

Does emms make any guarantees about the register state ?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: SSE related security hole
  2002-04-19 21:42           ` Andi Kleen
@ 2002-04-20  3:23             ` Andrea Arcangeli
  0 siblings, 0 replies; 38+ messages in thread
From: Andrea Arcangeli @ 2002-04-20  3:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: H. Peter Anvin, linux-kernel, jh, bgerst

On Fri, Apr 19, 2002 at 11:42:06PM +0200, Andi Kleen wrote:
> On Fri, Apr 19, 2002 at 02:35:57PM -0700, H. Peter Anvin wrote:
> > would initialize the entire FPU, including any state that future
> > processors may add, thus reducing the likelihood of any funnies in the
> > future.
> 
> That's also why I like it. 

Trusting the "boot state" of the cpu would require the BIOS to match the
linux ABI. The FPU must be in a known initialized state at the linux
level, not at the BIOS level, as first for the mxcsr, but also the other
registers should be set to zero by default so gcc can exploit that (I
guess that's what gcc is just doing and that's why Honza noticed it). so
if new future processors will add new stuff, the new stuff will have to
be initialized again in the "fxrestor" default payload in linux (so
requiring a modification to the OS), and having to change the default
rxrestor payload for a new cpu is equivalent to add another xor there
(modulo the runtime check on the cpu features that could be avoided with
two separate exception handlers for each cpu revision but it's fast
enough that it doesn't matter at the moment on x86).

Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20  0:08               ` Brian Gerst
  2002-04-20  0:21                 ` Linus Torvalds
@ 2002-04-20  4:21                 ` Andrea Arcangeli
  2002-04-20  4:35                   ` Linus Torvalds
  1 sibling, 1 reply; 38+ messages in thread
From: Andrea Arcangeli @ 2002-04-20  4:21 UTC (permalink / raw)
  To: Brian Gerst; +Cc: Linus Torvalds, H. Peter Anvin, ak, linux-kernel, jh

On Fri, Apr 19, 2002 at 08:08:15PM -0400, Brian Gerst wrote:
> diff -urN linux-2.5.8/arch/i386/kernel/i387.c linux/arch/i386/kernel/i387.c
> --- linux-2.5.8/arch/i386/kernel/i387.c	Thu Mar  7 21:18:32 2002
> +++ linux/arch/i386/kernel/i387.c	Fri Apr 19 19:35:14 2002
> @@ -31,13 +31,21 @@
>   * value at reset if we support XMM instructions and then
>   * remeber the current task has used the FPU.
>   */
> -void init_fpu(void)
> +void init_fpu(struct task_struct *tsk)
>  {
> -	__asm__("fninit");
> -	if ( cpu_has_xmm )
> -		load_mxcsr(0x1f80);
> -		
> -	current->used_math = 1;
> +	if (cpu_has_fxsr) {
> +		memset(&tsk->thread.i387.fxsave, 0, sizeof(struct i387_fxsave_struct));
> +		tsk->thread.i387.fxsave.cwd = 0x37f;
> +		if (cpu_has_xmm)
> +			tsk->thread.i387.fxsave.mxcsr = 0x1f80;
> +	} else {
> +		memset(&tsk->thread.i387.fsave, 0, sizeof(struct i387_fsave_struct));
> +		tsk->thread.i387.fsave.cwd = 0xffff037f;
> +		tsk->thread.i387.fsave.swd = 0xffff0000;
> +		tsk->thread.i387.fsave.twd = 0xffffffff;
> +		tsk->thread.i387.fsave.fos = 0xffff0000;
> +	}
> +	tsk->used_math = 1;
>  }
>  
>  /*
> diff -urN linux-2.5.8/arch/i386/kernel/traps.c linux/arch/i386/kernel/traps.c
> --- linux-2.5.8/arch/i386/kernel/traps.c	Sun Apr 14 23:48:18 2002
> +++ linux/arch/i386/kernel/traps.c	Fri Apr 19 18:22:12 2002
> @@ -757,13 +757,12 @@
>   */
>  asmlinkage void math_state_restore(struct pt_regs regs)
>  {
> +	struct task_struct *tsk = current;
>  	clts();		/* Allow maths ops (or we recurse) */
>  
> -	if (current->used_math) {
> -		restore_fpu(current);
> -	} else {
> -		init_fpu();
> -	}
> +	if (!tsk->used_math)
> +		init_fpu(tsk);
> +	restore_fpu(tsk);
>  	set_thread_flag(TIF_USEDFPU);	/* So we fnsave on switch_to() */
>  }
>  

I don't think it's good enough for merging yet. If you really want to do
the fxrestor, you should at least do the init_fpu only once during
bootup. The fxrestor is probably just overkill, but the memset + the
initializations is completly superflous in a fast path, I'd also use the
proper set_fpu_cwd and friends instead of doing it by hand.  Even better
is to merge the:

			/* Simulate an empty FPU. */
			set_fpu_cwd(child, 0x037f);
			set_fpu_swd(child, 0x0000);
			set_fpu_twd(child, 0xffff);
			set_fpu_mxcsr(child, 0x1f80);

			/* Simulate an empty FPU. */
			set_fpu_cwd(child, 0x037f);
			set_fpu_swd(child, 0x0000);
			set_fpu_twd(child, 0xffff);

in ptrace.c in a single function instead of duplicating functionality by
hand.

I still think the xor will be faster, no dcache pollution at all and
less I/O to ram. Future features can require change to the "empty FPU"
state anyways.

Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20  4:21                 ` Andrea Arcangeli
@ 2002-04-20  4:35                   ` Linus Torvalds
  2002-04-20  5:07                     ` Andrea Arcangeli
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2002-04-20  4:35 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh



On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
>
> I don't think it's good enough for merging yet. If you really want to do
> the fxrestor, you should at least do the init_fpu only once during
> bootup.

No, it needs to be done once per process FP state init, ie "flush_thread".

Think about it - we do _not_ copy the FP registers over an execve().

> The fxrestor is probably just overkill, but the memset + the
> initializations is completly superflous in a fast path,

That's no fast path, that's a "this process has never used the FPU before,
so we'd better make sure that it starts off with a really clean slate".

There is not just "one" FP state per kernel - there is one per process.
There _has_ to be.

> I still think the xor will be faster, no dcache pollution at all and
> less I/O to ram. Future features can require change to the "empty FPU"
> state anyways.

But the point is that people may still use a 2.4.x kernel on a P4-SSE3,
which only adds a few new instructions, and which re-uses the old SSE2
save area.

No kernel support necessary - it's transparent to the kernel, which won't
ever even know that some new fields in the save area are now used.

That was the whole point of MMX - it worked without any new OS support.
Intel learnt somewhat from past mistakes and made the save area bigger
than necessary, so that they can add new extensions without needing to
upgrade the OS yet another time.

THAT is the reason we can't just zero the SSE registers - because if we
do, we'll have the same problem next time around.

		Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20  4:35                   ` Linus Torvalds
@ 2002-04-20  5:07                     ` Andrea Arcangeli
  2002-04-20 16:27                       ` Linus Torvalds
  0 siblings, 1 reply; 38+ messages in thread
From: Andrea Arcangeli @ 2002-04-20  5:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh

On Fri, Apr 19, 2002 at 09:35:45PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
> >
> > I don't think it's good enough for merging yet. If you really want to do
> > the fxrestor, you should at least do the init_fpu only once during
> > bootup.
> 
> No, it needs to be done once per process FP state init, ie "flush_thread".

Note that with init_fpu I meant the init_fpu written in the patch. All
you need is a:

	fxrstor "default fpu state"

You don't need to memset(0) and init your own stack to create the
"default fpu state" at every "flush_thread". You only need to execute an
fxrestor over a piece of ram initialized only once during boot, never
initialized ala init_fpu in the fast path.

> 
> Think about it - we do _not_ copy the FP registers over an execve().
> 
> > The fxrestor is probably just overkill, but the memset + the
> > initializations is completly superflous in a fast path,
> 
> That's no fast path, that's a "this process has never used the FPU before,
> so we'd better make sure that it starts off with a really clean slate".

it's executed by every single task using the fpu, I do use python or bc
to do a simple math from bash and exit, that's a fast path for me, the
math exception should return ASAP, doing a superflous memset in the math
exception over an array looks bad to me.

> There is not just "one" FP state per kernel - there is one per process.
> There _has_ to be.

There is only "one" FP _empty_ state per kernel. That's the only thing
you need to resolve the math exception, no need to replicate it.  Then
you set used_math and PF_USEDFPU and the rest of the matchanism will take
care of using the per-process backing store to save/restore it. No need
to memset to create the "empty FPU" state at every exception over and over again.

> 
> > I still think the xor will be faster, no dcache pollution at all and
> > less I/O to ram. Future features can require change to the "empty FPU"
> > state anyways.
> 
> But the point is that people may still use a 2.4.x kernel on a P4-SSE3,
> which only adds a few new instructions, and which re-uses the old SSE2
> save area.

If there's no new xmm and new control register that's fine. If there's
new control register the 2.4.x kernel will need modifications anyways.

Just adding new instructions is just fine, like between sse and sse2.

I think the only argument for that is that it will potentially clear the
xmm8-15 registers too, if they will be added to an x86 (they're just in
x86-64). The control part doesn't make much sense to be because it will
likely not be zero anyways.

> THAT is the reason we can't just zero the SSE registers - because if we
> do, we'll have the same problem next time around.

You are zeroing the SSE registers with the fxrestor way too. If a new
control register is added zero won't be guaranteed to be the right
initialization for it, most control registers aren't set to 0 by
default.

Unless they provide some future plan so that we can design the OS now to
support new feature it's pointless to spend cycles to try to be
backwards compatible with future cpus. Zeroing registers will be fine if
they add new instructions, everything else won't be backwards compatible
by definition because it wasn't documented that you had to do a certain
think to be backwards compatible.

Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20  5:07                     ` Andrea Arcangeli
@ 2002-04-20 16:27                       ` Linus Torvalds
  2002-04-20 17:27                         ` Andrea Arcangeli
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2002-04-20 16:27 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh


On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
> 
> Note that with init_fpu I meant the init_fpu written in the patch. All
> you need is a:
> 
> 	fxrstor "default fpu state"

Ok, that I agree with.

> > That's no fast path, that's a "this process has never used the FPU before,
> > so we'd better make sure that it starts off with a really clean slate".
> 
> it's executed by every single task using the fpu

Yes. _Once_ in their lifetimes.

> > But the point is that people may still use a 2.4.x kernel on a P4-SSE3,
> > which only adds a few new instructions, and which re-uses the old SSE2
> > save area.
> 
> If there's no new xmm and new control register that's fine. If there's
> new control register the 2.4.x kernel will need modifications anyways.
> 
> Just adding new instructions is just fine, like between sse and sse2.

If Intel makes the SSE3 registers twice as wide (or creates new ones), the 
xorps trick simply will not work.

> I think the only argument for that is that it will potentially clear the
> xmm8-15 registers too, if they will be added to an x86 (they're just in
> x86-64). The control part doesn't make much sense to be because it will
> likely not be zero anyways.

Actually, even control parts likely _will_ be be zero, the way people 
work.

> > THAT is the reason we can't just zero the SSE registers - because if we
> > do, we'll have the same problem next time around.
> 
> You are zeroing the SSE registers with the fxrestor way too.


Andrea, that's the whole _point_.

>							 If a new
> control register is added zero won't be guaranteed to be the right
> initialization for it, most control registers aren't set to 0 by
> default.

Even then, having a reliable failure that is easy to pinpoint it a lot 
better than random behaviour that has taken us more than two years to even 
_find_.

Besides, zeroes for initial values of control registers actually _is_ 
fairly likely, in my opinion. I've sent off an email to my Intel contacts 
to try to make this architected..

		Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 16:27                       ` Linus Torvalds
@ 2002-04-20 17:27                         ` Andrea Arcangeli
  2002-04-20 17:38                           ` Linus Torvalds
  0 siblings, 1 reply; 38+ messages in thread
From: Andrea Arcangeli @ 2002-04-20 17:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh

On Sat, Apr 20, 2002 at 09:27:11AM -0700, Linus Torvalds wrote:
> If Intel makes the SSE3 registers twice as wide (or creates new ones), the 
> xorps trick simply will not work.

Then the thing is different, I expected SSE3 not to mess the xmm layout.
If you just know SSE3 would break with the xorps the fxrestor way is
better. Anyways the problems I have about the implementation remains
(memset and duplicate efforts with ptrace in creating the empty fpu
state).

If they tell you the xmm registers won't change with SSE3 instead I
still prefer the xorps, that's 3bytes x 8 registers = 24 bytes of
icachce, compared to throwing away 512bytes/32 = 16 dcache cachelines so
it should be significantly faster.

Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 17:27                         ` Andrea Arcangeli
@ 2002-04-20 17:38                           ` Linus Torvalds
  2002-04-20 18:12                             ` Andrea Arcangeli
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2002-04-20 17:38 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh



On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
>
> Then the thing is different, I expected SSE3 not to mess the xmm layout.
> If you just know SSE3 would break with the xorps the fxrestor way is
> better. Anyways the problems I have about the implementation remains
> (memset and duplicate efforts with ptrace in creating the empty fpu
> state).

Hey, send a clean patch and it will definitely get fixed.. I don't
disagree with that part, although actual numbers are always good to have.

> If they tell you the xmm registers won't change with SSE3 instead I
> still prefer the xorps, that's 3bytes x 8 registers = 24 bytes of
> icachce, compared to throwing away 512bytes/32 = 16 dcache cachelines so
> it should be significantly faster.

Oh, if they promise to not add registers we have an easy time, I agree.

		Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 17:38                           ` Linus Torvalds
@ 2002-04-20 18:12                             ` Andrea Arcangeli
  2002-04-20 19:30                               ` Linus Torvalds
  2002-04-23 19:21                               ` Linus Torvalds
  0 siblings, 2 replies; 38+ messages in thread
From: Andrea Arcangeli @ 2002-04-20 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh

On Sat, Apr 20, 2002 at 10:38:59AM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
> >
> > Then the thing is different, I expected SSE3 not to mess the xmm layout.
> > If you just know SSE3 would break with the xorps the fxrestor way is
> > better. Anyways the problems I have about the implementation remains
> > (memset and duplicate efforts with ptrace in creating the empty fpu
> > state).
> 
> Hey, send a clean patch and it will definitely get fixed.. I don't
> disagree with that part, although actual numbers are always good to have.

Well, my preferred patch is still this one :)

--- linux/arch/i386/kernel/i387.c.org	Thu Apr 18 09:30:26 2002
+++ linux/arch/i386/kernel/i387.c	Thu Apr 18 09:38:23 2002
@@ -33,9 +33,30 @@
 void init_fpu(void)
 {
 	__asm__("fninit");
-	if ( cpu_has_xmm )
+
+	if ( cpu_has_mmx )
+		asm volatile(
+			"pxor %mm0, %mm0	\n"
+			"pxor %mm1, %mm1	\n"
+			"pxor %mm2, %mm2	\n"
+			"pxor %mm3, %mm3	\n"
+			"pxor %mm4, %mm4	\n"
+			"pxor %mm5, %mm5	\n"
+			"pxor %mm6, %mm6	\n"
+			"pxor %mm7, %mm7	\n"
+			"emms			\n");
+	if ( cpu_has_xmm ) {
 		load_mxcsr(0x1f80);
+		asm volatile(
+			"xorps %xmm0, %xmm0	\n"
+			"xorps %xmm1, %xmm1	\n"
+			"xorps %xmm2, %xmm2	\n"
+			"xorps %xmm3, %xmm3	\n"
+			"xorps %xmm4, %xmm4	\n"
+			"xorps %xmm5, %xmm5	\n"
+			"xorps %xmm6, %xmm6	\n"
+			"xorps %xmm7, %xmm7	");
-		
+	}		
 	current->used_math = 1;
 }
 
--- linux/include/asm-i386/processor.h~	Thu Apr 18 10:06:37 2002
+++ linux/include/asm-i386/processor.h	Thu Apr 18 10:06:37 2002
@@ -89,6 +89,7 @@
 #define cpu_has_vme	(test_bit(X86_FEATURE_VME,  boot_cpu_data.x86_capability))
 #define cpu_has_fxsr	(test_bit(X86_FEATURE_FXSR, boot_cpu_data.x86_capability))
 #define cpu_has_xmm	(test_bit(X86_FEATURE_XMM,  boot_cpu_data.x86_capability))
+#define cpu_has_mmx	(test_bit(X86_FEATURE_MMX,  boot_cpu_data.x86_capability))
 #define cpu_has_fpu	(test_bit(X86_FEATURE_FPU,  boot_cpu_data.x86_capability))
 #define cpu_has_apic	(test_bit(X86_FEATURE_APIC, boot_cpu_data.x86_capability))
 


My strongest argument is that even if it's true the fxrestor logic may
not need changes even if they add registers or control words, that
doesn't apply to the kernel as a whole too. The same 2.4 kernel binary
image with the fxrestor patch applied cannot enable SSE3 automatically.
Further patching, recompilation and in turn a new fresh binary image
will be necessary to enable SSE3, regardless of the universal fxrestor
logic.  This obviously assuming that they won't release again a chip
that is not backwards compatible 8).

I mean, if they change the registers layout, and so if they require a
different empty FPU state, they must as well add yet another bitflag to
enable SSE3, if they don't the chip isn't backwards compatible.

If they only add new instructions like in sse2 than everything is fine
and the same binary image will work regardless of xorps or
universal-rxrestor.

If the register layout changes, by the time we change the kernel to
enable SSE3, it will be trivial to add yet another series of xoprs in
init_fpu (or a control word initialization ala load_mxcsr) and it will
remain much faster than fxrestor, and actually much simpler as well
IMHO.

This is why I don't feel the need of compilcating the code creating the
"empty fpu" during boot etc..

Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 18:12                             ` Andrea Arcangeli
@ 2002-04-20 19:30                               ` Linus Torvalds
  2002-04-20 19:41                                 ` Andi Kleen
  2002-04-23 19:21                               ` Linus Torvalds
  1 sibling, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2002-04-20 19:30 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh


On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
> 
> Well, my preferred patch is still this one :)

But this one simply will not get applied, for all the reasons already
outlined. It _will_ cause the same problems that it tries to fix, just at
some later time.

Besides, I seriously doubt it is any faster than what is there already.

Time it, and notice how:

 - fninit takes about 200 cycles
 - fxrstor takes about 215 cycles

and your added 16*(pxor/xorps) likely takes at least 8 cycles.

In short, your "fast" code isn't actually any faster than doing it right.

		Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 19:30                               ` Linus Torvalds
@ 2002-04-20 19:41                                 ` Andi Kleen
  2002-04-20 21:28                                   ` Andrea Arcangeli
  2002-04-20 23:13                                   ` Linus Torvalds
  0 siblings, 2 replies; 38+ messages in thread
From: Andi Kleen @ 2002-04-20 19:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Arcangeli, Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh

> Besides, I seriously doubt it is any faster than what is there already.
> 
> Time it, and notice how:
> 
>  - fninit takes about 200 cycles
>  - fxrstor takes about 215 cycles

On what CPU? 

I checked the Athlon4 optimization manual and fxrstor is listed as 68/108
cycles (i guess depending on whether there is XMM state or not so 68 cycles
probably apply here) and fninit as 91 cycles. It doesn't list the SSE1 
timings, but i guess the instructions don't take more than 3 cycles
(MMX instructions take that long). So Andrea's way should be 
91+16*3=139+some cycles for emms (or 107 if sse ops take only a single cycle) 
vs 68 or 108.  So the fxrstor wins well. 

On x86-64 the difference is even bigger because it has 16 XMM registers instead
of 8.


> In short, your "fast" code isn't actually any faster than doing it right.

At least on Athlon it should be slower. 

-Andi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 19:41                                 ` Andi Kleen
@ 2002-04-20 21:28                                   ` Andrea Arcangeli
  2002-04-20 22:43                                     ` H. Peter Anvin
  2002-04-20 23:23                                     ` Linus Torvalds
  2002-04-20 23:13                                   ` Linus Torvalds
  1 sibling, 2 replies; 38+ messages in thread
From: Andrea Arcangeli @ 2002-04-20 21:28 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linus Torvalds, Brian Gerst, H. Peter Anvin, linux-kernel, jh

On Sat, Apr 20, 2002 at 09:41:14PM +0200, Andi Kleen wrote:
> > Besides, I seriously doubt it is any faster than what is there already.
> > 
> > Time it, and notice how:
> > 
> >  - fninit takes about 200 cycles
> >  - fxrstor takes about 215 cycles
> 
> On what CPU? 
> 
> I checked the Athlon4 optimization manual and fxrstor is listed as 68/108
> cycles (i guess depending on whether there is XMM state or not so 68 cycles
> probably apply here) and fninit as 91 cycles. It doesn't list the SSE1 
> timings, but i guess the instructions don't take more than 3 cycles
> (MMX instructions take that long). So Andrea's way should be 
> 91+16*3=139+some cycles for emms (or 107 if sse ops take only a single cycle) 
> vs 68 or 108.  So the fxrstor wins well. 
> 
> On x86-64 the difference is even bigger because it has 16 XMM registers instead
> of 8.
> 
> 
> > In short, your "fast" code isn't actually any faster than doing it right.
> 
> At least on Athlon it should be slower. 

pxor+xorps is definitely faster than fxrestor on athlon-mp.

fxrestor on athlon-mp 1600, on cold cache (the "default fpu state" will
be cold most of the time, it's only ever used at the first math fault of
a task):

dualathlon:/home/andrea # for i in 1 2 3 4 5 6 7 8 9 10 ; do ./a.out ;
done
cycles 3507
cycles 3293
cycles 3179
cycles 3267
cycles 3374
cycles 3213
cycles 2575
cycles 3306
cycles 2915
cycles 3078

pxor + xorps on the same hardware cold cache:

dualathlon:/home/andrea # for i in 1 2 3 4 5 6 7 8 9 10 ; do ./a.out ;
done
cycles 2444
cycles 1619
cycles 2396
cycles 2644
cycles 2322
cycles 2404
cycles 2266
cycles 1657
cycles 2182
cycles 2077

And here it is again on PIII 800mhz for confirmation on true Intel CPU:

fxrestor:

for i in 1 2 3 4 5 6 7 8 9 10 ; do ./a.out ; done
cycles 3461
cycles 3294
cycles 3339
cycles 3276
cycles 3620
cycles 3443
cycles 3401
cycles 3455
cycles 3515
cycles 3327

xor+xorps:

for i in 1 2 3 4 5 6 7 8 9 10 ; do ./a.out ; done
cycles 2288
cycles 2181
cycles 2430
cycles 2486
cycles 2360
cycles 2534
cycles 2243
cycles 2199
cycles 2153
cycles 2396

As said in an earlier email it is a matter of memory bandwith, 59 bytes
of icache and zero data, against 7 bytes of icache and 512bytes of data.
the 512bytes of data are visibly slower, period. Saving mem bandwith is
much more important than reducing the number of instructions, even more
on SMP!

On the x86-64 it will be exactly the same, the mem bandwith is much
higher, but it's still a major bottleneck compared to a few more
instructions to execute (also consider the current ia32 isn't probably
reading the xmm8-15 slots, so on x86-64 fxrestor will be potentially
even slower compared to the in-core xorps that never do any I/O to
memory).

Go ahead and try yourself setting the #if to 0 or 1. The benchmark is
very accurate.

#include <sys/mman.h>
#include <asm/msr.h>

struct i387_fxsave_struct {
	unsigned short	cwd;
	unsigned short	swd;
	unsigned short	twd;
	unsigned short	fop;
	long	fip;
	long	fcs;
	long	foo;
	long	fos;
	long	mxcsr;
	long	reserved;
	long	st_space[32];	/* 8*16 bytes for each FP-reg = 128 bytes */
	long	xmm_space[32];	/* 8*16 bytes for each XMM-reg = 128 bytes */
	long	padding[56];
} __attribute__ ((aligned (16)));

#define LOOPS 1

#define load_mxcsr( val ) do { \
	unsigned long __mxcsr = ((unsigned long)(val) & 0xffbf); \
	asm volatile( "ldmxcsr %0" : : "m" (__mxcsr) ); \
} while (0)

struct i387_fxsave_struct i387;
char buf[1024*1024*40];

static void cold_dcache(void)
{
	memset(buf, 0, 1024*1024*40);
}

main()
{
	unsigned long before, after;
	int i;

	iopl(3);
	mlockall(MCL_FUTURE);

#if 1
	cold_dcache();
	asm("cli");

	rdtscl(before);

	__asm__("fninit");

	asm volatile(
		     "pxor %mm0, %mm0	\n"
		     "xorps %xmm0, %xmm0	\n"
		     "pxor %mm1, %mm1	\n"
		     "xorps %xmm1, %xmm1	\n"
		     "pxor %mm2, %mm2	\n"
		     "xorps %xmm2, %xmm2	\n"
		     "pxor %mm3, %mm3	\n"
		     "xorps %xmm3, %xmm3	\n"
		     "pxor %mm4, %mm4	\n"
		     "xorps %xmm4, %xmm4	\n"
		     "pxor %mm5, %mm5	\n"
		     "xorps %xmm5, %xmm5	\n"
		     "pxor %mm6, %mm6	\n"
		     "xorps %xmm6, %xmm6	\n"
		     "pxor %mm7, %mm7	\n"
		     "xorps %xmm7, %xmm7	\n"
		     "emms			\n");
	load_mxcsr(0x1f80);
	rdtscl(after);
#else
	asm volatile("fxsave %0" : : "m" (i387));

	cold_dcache();
	asm("cli");
	rdtscl(before);

	asm volatile("fxrstor %0" : "=m" (i387));

	rdtscl(after);
#endif

	printf("cycles %lu\n", after-before);
}

As said for SSE3 a kernel modification will be required anyways if xorps
isn't enough to initialize the new fpu, so the current 2.4 kernel cannot
cope with it anyways regardless of fxrestor or not.

So I don't buy that fxrestor is better, nor more generic.

Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 21:28                                   ` Andrea Arcangeli
@ 2002-04-20 22:43                                     ` H. Peter Anvin
  2002-04-21  2:09                                       ` Andrea Arcangeli
  2002-04-20 23:23                                     ` Linus Torvalds
  1 sibling, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2002-04-20 22:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Linus Torvalds, Brian Gerst, linux-kernel, jh

Andrea Arcangeli wrote:
> 
> As said in an earlier email it is a matter of memory bandwith, 59 bytes
> of icache and zero data, against 7 bytes of icache and 512bytes of data.
> the 512bytes of data are visibly slower, period. Saving mem bandwith is
> much more important than reducing the number of instructions, even more
> on SMP!
> 

It's not 512 bytes of data -- only the part that's actually used is 
accessed.

	-hpa


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 19:41                                 ` Andi Kleen
  2002-04-20 21:28                                   ` Andrea Arcangeli
@ 2002-04-20 23:13                                   ` Linus Torvalds
  1 sibling, 0 replies; 38+ messages in thread
From: Linus Torvalds @ 2002-04-20 23:13 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrea Arcangeli, Brian Gerst, H. Peter Anvin, linux-kernel, jh



On Sat, 20 Apr 2002, Andi Kleen wrote:

> > Besides, I seriously doubt it is any faster than what is there already.
> >
> > Time it, and notice how:
> >
> >  - fninit takes about 200 cycles
> >  - fxrstor takes about 215 cycles
>
> On what CPU?

Sorry, should have mentioned. That's a P4.

> I checked the Athlon4 optimization manual and fxrstor is listed as 68/108
> cycles (i guess depending on whether there is XMM state or not so 68 cycles
> probably apply here) and fninit as 91 cycles. It doesn't list the SSE1
> timings, but i guess the instructions don't take more than 3 cycles
> (MMX instructions take that long). So Andrea's way should be
> 91+16*3=139+some cycles for emms (or 107 if sse ops take only a single cycle)
> vs 68 or 108.  So the fxrstor wins well.

The athlon should be able to do two MMX / cycle (the throughput is much
lower, but there are no data dependencies here).

> > In short, your "fast" code isn't actually any faster than doing it right.
>
> At least on Athlon it should be slower.

I suspect that the real answer is "the speed difference is _way_ in the
noise".

		Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 21:28                                   ` Andrea Arcangeli
  2002-04-20 22:43                                     ` H. Peter Anvin
@ 2002-04-20 23:23                                     ` Linus Torvalds
  2002-04-21  2:08                                       ` Andrea Arcangeli
  1 sibling, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2002-04-20 23:23 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Brian Gerst, H. Peter Anvin, linux-kernel, jh



On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
>
> pxor+xorps is definitely faster than fxrestor on athlon-mp.

Andrea, that's not the _comparison_.

The "fxrestor" replaces the "fninit" too, so you have to take that into
account.

> fxrestor on athlon-mp 1600, on cold cache (the "default fpu state" will
> be cold most of the time, it's only ever used at the first math fault of
> a task):

Except it's _never_ cold-cache the way it's coded now. In fact it's always
hot-cache - which are exactly the numbers I posted.

> Go ahead and try yourself setting the #if to 0 or 1. The benchmark is
> very accurate.

It may have high precision, but since it's testing something that has
nothing to do with the problem at hand, it's basically 100% useless.

		Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 23:23                                     ` Linus Torvalds
@ 2002-04-21  2:08                                       ` Andrea Arcangeli
  0 siblings, 0 replies; 38+ messages in thread
From: Andrea Arcangeli @ 2002-04-21  2:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andi Kleen, Brian Gerst, H. Peter Anvin, linux-kernel, jh

On Sat, Apr 20, 2002 at 04:23:50PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
> >
> > pxor+xorps is definitely faster than fxrestor on athlon-mp.
> 
> Andrea, that's not the _comparison_.
> 
> The "fxrestor" replaces the "fninit" too, so you have to take that into
> account.

Note that I just took it into account:

	rdtscl(before);

	__asm__("fninit");
	^^^^^^^^^^^^^^^^^

	asm volatile(
		     "pxor %mm0, %mm0	\n"
		     "xorps %xmm0, %xmm0	\n"
		     "pxor %mm1, %mm1	\n"
		     "xorps %xmm1, %xmm1	\n"
		     "pxor %mm2, %mm2	\n"
		     "xorps %xmm2, %xmm2	\n"
		     "pxor %mm3, %mm3	\n"
		     "xorps %xmm3, %xmm3	\n"
		     "pxor %mm4, %mm4	\n"
		     "xorps %xmm4, %xmm4	\n"
		     "pxor %mm5, %mm5	\n"
		     "xorps %xmm5, %xmm5	\n"
		     "pxor %mm6, %mm6	\n"
		     "xorps %xmm6, %xmm6	\n"
		     "pxor %mm7, %mm7	\n"
		     "xorps %xmm7, %xmm7	\n"
		     "emms			\n");
	load_mxcsr(0x1f80);
	rdtscl(after);

> 
> > fxrestor on athlon-mp 1600, on cold cache (the "default fpu state" will
> > be cold most of the time, it's only ever used at the first math fault of
> > a task):
> 
> Except it's _never_ cold-cache the way it's coded now. In fact it's always
> hot-cache - which are exactly the numbers I posted.

In the common case fork/clone are executed at a frequency that makes it
a cold cache.

Anyways since you are apparently constantly forking off a new task every
10 milliseconds on your machine, on the PIII with the hot cache it's 97
cycles for fxrestor, and 90 cycles of pxor/xorps/fninit so it's still
faster, but not much faster anymore. With the athlon-mp it's 120 cycles
the xorps/pxor/fninit and 85 cycles fxrestor, but that's only because
the fninit is very slow on the athlon-mp, probably not the case for
x86-64 (while the xorps/pxor is just much faster on the athlon-mp
compared to the PIII). The pxor/xorps alone on the athlon-mp takes only
17 cycles! the fninit alone instead takes 99 cycles.  512bytes of ram on
x86-64 (around half on x86) cannot be optimized away by a new cpu, the
bus will get the hit regardless, while fninit has the potential to be
much faster on the new cpu (just like it's much faster on the PIII and
infact it's a win).  I guess the P4 will give hot cache results similar
to the PIII with the hot-cache, and anyways there's no doubt on the huge
speedup with the _common_ case cold-cache, even more obviously on x86-64
that will really have to read the _whole_ 512bytes, not (yet) the case
for x86. I'm astonished you prefer to take the hit on the ram bus.

here the hot cache benchmark that I used so you can test yourself:

#include <sys/mman.h>
#include <asm/msr.h>

struct i387_fxsave_struct {
	unsigned short	cwd;
	unsigned short	swd;
	unsigned short	twd;
	unsigned short	fop;
	long	fip;
	long	fcs;
	long	foo;
	long	fos;
	long	mxcsr;
	long	reserved;
	long	st_space[32];	/* 8*16 bytes for each FP-reg = 128 bytes */
	long	xmm_space[32];	/* 8*16 bytes for each XMM-reg = 128 bytes */
	long	padding[56];
} __attribute__ ((aligned (16)));

#define LOOPS 200

#define load_mxcsr( val ) do { \
	unsigned long __mxcsr = ((unsigned long)(val) & 0xffbf); \
	asm volatile( "ldmxcsr %0" : : "m" (__mxcsr) ); \
} while (0)

struct i387_fxsave_struct i387;
char buf[1024*1024*40];

static void cold_dcache(void)
{
	memset(buf, 0, 1024*1024*40);
}

main()
{
	unsigned long before, after;
	int i;

#if 1
	for (i = 0; i < LOOPS; i++) {
		rdtscl(before);

 		__asm__("fninit");

#if 1
		asm volatile("pxor %mm0, %mm0	\n"
			     "pxor %mm1, %mm1	\n"
			     "pxor %mm2, %mm2	\n"
			     "pxor %mm3, %mm3	\n"
			     "pxor %mm4, %mm4	\n"
			     "pxor %mm5, %mm5	\n"
			     "pxor %mm6, %mm6	\n"
			     "pxor %mm7, %mm7	\n"
			     "xorps %xmm0, %xmm0	\n"
			     "xorps %xmm1, %xmm1	\n"
			     "xorps %xmm2, %xmm2	\n"
			     "xorps %xmm3, %xmm3	\n"
			     "xorps %xmm4, %xmm4	\n"
			     "xorps %xmm5, %xmm5	\n"
			     "xorps %xmm6, %xmm6	\n"
			     "xorps %xmm7, %xmm7	\n"
			     "emms			\n");
		load_mxcsr(0x1f80);
#endif
		rdtscl(after);
	}
#else
	asm volatile("fxsave %0" : : "m" (i387));

	for (i = 0; i < LOOPS; i++) {
		rdtscl(before);

		asm volatile("fxrstor %0" : "=m" (i387));

		rdtscl(after);
	}
#endif

	printf("cycles %lu\n", after-before);
}

> It may have high precision, but since it's testing something that has
> nothing to do with the problem at hand, it's basically 100% useless.

Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 22:43                                     ` H. Peter Anvin
@ 2002-04-21  2:09                                       ` Andrea Arcangeli
  0 siblings, 0 replies; 38+ messages in thread
From: Andrea Arcangeli @ 2002-04-21  2:09 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andi Kleen, Linus Torvalds, Brian Gerst, linux-kernel, jh

On Sat, Apr 20, 2002 at 03:43:17PM -0700, H. Peter Anvin wrote:
> Andrea Arcangeli wrote:
> >
> >As said in an earlier email it is a matter of memory bandwith, 59 bytes
> >of icache and zero data, against 7 bytes of icache and 512bytes of data.
> >the 512bytes of data are visibly slower, period. Saving mem bandwith is
> >much more important than reducing the number of instructions, even more
> >on SMP!
> >
> 
> It's not 512 bytes of data -- only the part that's actually used is 
> accessed.

On current x86 yes (so far), but the x86-64 the whole 512bytes will have
to be read from ram instead.

Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-20 18:12                             ` Andrea Arcangeli
  2002-04-20 19:30                               ` Linus Torvalds
@ 2002-04-23 19:21                               ` Linus Torvalds
  2002-04-23 20:05                                 ` H. Peter Anvin
  2002-04-24  0:32                                 ` Andrea Arcangeli
  1 sibling, 2 replies; 38+ messages in thread
From: Linus Torvalds @ 2002-04-23 19:21 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh



On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
>
> I mean, if they change the registers layout, and so if they require a
> different empty FPU state, they must as well add yet another bitflag to
> enable SSE3, if they don't the chip isn't backwards compatible.

I have unofficial confirmation from Intel that the way to architecturally
initialize the FPU state is indeed something like

        memset(&fxsave, 0, sizeof(struct i387_fxsave_struct));
        fxsave.cwd = 0x37f;
        fxsave.mxcsr = 0x1f80;
        fxrstor(&fxsave);

and the person in question is trying to make sure this is documented so
that we won't be bitten by this in the future.

			Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-23 19:21                               ` Linus Torvalds
@ 2002-04-23 20:05                                 ` H. Peter Anvin
  2002-04-24  0:32                                 ` Andrea Arcangeli
  1 sibling, 0 replies; 38+ messages in thread
From: H. Peter Anvin @ 2002-04-23 20:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrea Arcangeli, Brian Gerst, ak, linux-kernel, jh

Linus Torvalds wrote:
> 
> On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
> 
>>I mean, if they change the registers layout, and so if they require a
>>different empty FPU state, they must as well add yet another bitflag to
>>enable SSE3, if they don't the chip isn't backwards compatible.
> 
> 
> I have unofficial confirmation from Intel that the way to architecturally
> initialize the FPU state is indeed something like
> 
>         memset(&fxsave, 0, sizeof(struct i387_fxsave_struct));
>         fxsave.cwd = 0x37f;
>         fxsave.mxcsr = 0x1f80;
>         fxrstor(&fxsave);
> 
> and the person in question is trying to make sure this is documented so
> that we won't be bitten by this in the future.
> 

Great!

	-hpa



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-23 19:21                               ` Linus Torvalds
  2002-04-23 20:05                                 ` H. Peter Anvin
@ 2002-04-24  0:32                                 ` Andrea Arcangeli
  2002-04-24  2:10                                   ` Linus Torvalds
  1 sibling, 1 reply; 38+ messages in thread
From: Andrea Arcangeli @ 2002-04-24  0:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh

On Tue, Apr 23, 2002 at 12:21:29PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 20 Apr 2002, Andrea Arcangeli wrote:
> >
> > I mean, if they change the registers layout, and so if they require a
> > different empty FPU state, they must as well add yet another bitflag to
> > enable SSE3, if they don't the chip isn't backwards compatible.
> 
> I have unofficial confirmation from Intel that the way to architecturally
> initialize the FPU state is indeed something like
> 
>         memset(&fxsave, 0, sizeof(struct i387_fxsave_struct));
>         fxsave.cwd = 0x37f;
>         fxsave.mxcsr = 0x1f80;
>         fxrstor(&fxsave);
> 
> and the person in question is trying to make sure this is documented so
> that we won't be bitten by this in the future.

Ok, thanks for the info. The advantage (now that it is documented! :) is
that they can add the xmm8-15 reigsters of x86-64 to x86 too without
requiring any change to linux. On the linux side we obviously don't care
if they document it retroactive as an errata of sse docs, so for us it
doesn't matter even if they don't add an additional bitflag before
adding the xmm8-15 registers.  Basically they should only deal with the
other operative systems now.

Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-24  0:32                                 ` Andrea Arcangeli
@ 2002-04-24  2:10                                   ` Linus Torvalds
  2002-04-26  9:13                                     ` Pavel Machek
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2002-04-24  2:10 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh



On Wed, 24 Apr 2002, Andrea Arcangeli wrote:
>
>	  Basically they should only deal with the
> other operative systems now.

Yeah. And since the security hole is fairly small and insignificant in
comparison to some others, I suspect that the main "other operating
system" just won't care one way or the other.

		Linus


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-24  2:10                                   ` Linus Torvalds
@ 2002-04-26  9:13                                     ` Pavel Machek
  2002-04-26 11:55                                       ` Andrea Arcangeli
  0 siblings, 1 reply; 38+ messages in thread
From: Pavel Machek @ 2002-04-26  9:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Arcangeli, Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh

Hi!

> >	  Basically they should only deal with the
> > other operative systems now.
> 
> Yeah. And since the security hole is fairly small and insignificant in
> comparison to some others, I suspect that the main "other operating
> system" just won't care one way or the other.

*If* the other system uses fpu unit for memcpy, well, it may get
pretty big and significant.
									Pavel
-- 
(about SSSCA) "I don't say this lightly.  However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] Re: SSE related security hole
  2002-04-26  9:13                                     ` Pavel Machek
@ 2002-04-26 11:55                                       ` Andrea Arcangeli
  0 siblings, 0 replies; 38+ messages in thread
From: Andrea Arcangeli @ 2002-04-26 11:55 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Brian Gerst, H. Peter Anvin, ak, linux-kernel, jh

On Fri, Apr 26, 2002 at 11:13:42AM +0200, Pavel Machek wrote:
> *If* [..]

you missed it, you should read what's written behind the lines too 8)

Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2002-04-26 11:55 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20020418183639.20946.qmail@science.horizon.com.suse.lists.linux.kernel>
     [not found] ` <a9ncgs$2s2$1@cesium.transmeta.com.suse.lists.linux.kernel>
2002-04-19 14:06   ` SSE related security hole Andi Kleen
2002-04-19 18:00     ` Doug Ledford
2002-04-19 21:04       ` Andrea Arcangeli
2002-04-19 21:35         ` H. Peter Anvin
2002-04-19 21:42           ` Andi Kleen
2002-04-20  3:23             ` Andrea Arcangeli
2002-04-19 23:12           ` [PATCH] " Brian Gerst
2002-04-19 23:41             ` Linus Torvalds
2002-04-20  0:01               ` H. Peter Anvin
2002-04-20  0:09                 ` Linus Torvalds
2002-04-20  0:11                   ` Brian Gerst
2002-04-20  0:19                   ` H. Peter Anvin
2002-04-20  0:29                     ` Linus Torvalds
2002-04-20  0:31                   ` Alan Cox
2002-04-20  0:08               ` Brian Gerst
2002-04-20  0:21                 ` Linus Torvalds
2002-04-20  4:21                 ` Andrea Arcangeli
2002-04-20  4:35                   ` Linus Torvalds
2002-04-20  5:07                     ` Andrea Arcangeli
2002-04-20 16:27                       ` Linus Torvalds
2002-04-20 17:27                         ` Andrea Arcangeli
2002-04-20 17:38                           ` Linus Torvalds
2002-04-20 18:12                             ` Andrea Arcangeli
2002-04-20 19:30                               ` Linus Torvalds
2002-04-20 19:41                                 ` Andi Kleen
2002-04-20 21:28                                   ` Andrea Arcangeli
2002-04-20 22:43                                     ` H. Peter Anvin
2002-04-21  2:09                                       ` Andrea Arcangeli
2002-04-20 23:23                                     ` Linus Torvalds
2002-04-21  2:08                                       ` Andrea Arcangeli
2002-04-20 23:13                                   ` Linus Torvalds
2002-04-23 19:21                               ` Linus Torvalds
2002-04-23 20:05                                 ` H. Peter Anvin
2002-04-24  0:32                                 ` Andrea Arcangeli
2002-04-24  2:10                                   ` Linus Torvalds
2002-04-26  9:13                                     ` Pavel Machek
2002-04-26 11:55                                       ` Andrea Arcangeli
2002-04-19 22:18         ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).