All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch] espfix code cleanup more
       [not found] <200607300016.k6U0GYu4023664@shell0.pdx.osdl.net>
@ 2006-07-30 10:57 ` Stas Sergeev
       [not found] ` <44CE766D.6000705@vmware.com>
  1 sibling, 0 replies; 16+ messages in thread
From: Stas Sergeev @ 2006-07-30 10:57 UTC (permalink / raw)
  To: akpm; +Cc: Linux kernel

[-- Attachment #1: Type: text/plain, Size: 199 bytes --]

Hi Andrew.

Attached is a micro-optimization on top of the
previous cleanup patch. It removes the redundant
arguments from the C functions. No functionality changes.

Signed-off-by: <stsp@aknet.ru>


[-- Attachment #2: espfcln_a.diff --]
[-- Type: text/x-patch, Size: 2221 bytes --]

--- linux-2.6.18-rc2-mm1/arch/i386/kernel/traps.c	2006-07-29 15:32:14.000000000 +0400
+++ linux-2.6.18-rc2-mm1/arch/i386/kernel/traps.c	2006-07-30 02:19:59.000000000 +0400
@@ -1018,13 +1018,13 @@
 #endif
 }
 
-fastcall unsigned long patch_espfix_gdt(struct pt_regs *regs,
+fastcall unsigned long patch_espfix_base(unsigned long uesp,
 					  unsigned long kesp)
 {
 	int cpu = smp_processor_id();
 	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
 	struct desc_struct *gdt = (struct desc_struct *)cpu_gdt_descr->address;
-	unsigned long base = (kesp - regs->esp) & -THREAD_SIZE;
+	unsigned long base = (kesp - uesp) & -THREAD_SIZE;
 	__u64 desc = *(__u64 *)&gdt[GDT_ENTRY_ESPFIX_SS];
 	/* Set up base for espfix segment */
  	desc &= 0x00ffff000000ffffULL;
@@ -1034,14 +1034,13 @@
 	return kesp - base;
 }
 
-fastcall unsigned long get_orig_kesp(unsigned long kesp, unsigned long cpu)
+fastcall unsigned long get_espfix_base(unsigned long cpu)
 {
 	/* Since we are on a wrong stack, the smp_processor_id() cannot
 	 * be used. So the cpu number is passed from an assembly. */
 	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
 	struct desc_struct *gdt = (struct desc_struct *)cpu_gdt_descr->address;
-	unsigned long base = get_desc_base(&gdt[GDT_ENTRY_ESPFIX_SS].a);
-	return base + kesp;
+	return get_desc_base(&gdt[GDT_ENTRY_ESPFIX_SS].a);
 }
 
 /*
--- linux-2.6.18-rc2-mm1/arch/i386/kernel/entry.S	2006-07-29 15:29:00.000000000 +0400
+++ linux-2.6.18-rc2-mm1/arch/i386/kernel/entry.S	2006-07-30 02:19:47.000000000 +0400
@@ -401,9 +401,9 @@
 	 * This is an "official" bug of all the x86-compatible
 	 * CPUs, which we can try to work around to make
 	 * dosemu and wine happy. */
-	movl %esp, %eax		# pt_regs pointer
+	movl OLDESP(%esp), %eax
 	movl %esp, %edx
-	call patch_espfix_gdt
+	call patch_espfix_base
 	pushl $__ESPFIX_SS
 	CFI_ADJUST_CFA_OFFSET 4
 	pushl %eax
@@ -502,10 +502,10 @@
 	CFI_ENDPROC
 
 #define FIXUP_ESPFIX_STACK \
-	movl %esp, %eax; \
 	GET_THREAD_INFO(%ebp); \
-	movl TI_cpu(%ebp), %edx; \
-	call get_orig_kesp; \
+	movl TI_cpu(%ebp), %eax; \
+	call get_espfix_base; \
+	addl %esp, %eax; \
 	pushl $__KERNEL_DS; \
 	pushl %eax; \
 	lss (%esp), %esp;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
       [not found] ` <44CE766D.6000705@vmware.com>
@ 2006-08-01 12:21   ` Stas Sergeev
  2006-08-01 13:38     ` Jan Beulich
  2006-08-01 21:01     ` Zachary Amsden
  0 siblings, 2 replies; 16+ messages in thread
From: Stas Sergeev @ 2006-08-01 12:21 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: akpm, 76306.1226, ak, JBeulich, rohitseth, Linux kernel

[-- Attachment #1: Type: text/plain, Size: 4468 bytes --]

Hi Zach and thanks for you really great review!

Zachary Amsden wrote:
> Overall, this looks like a great cleanup.  But I think there are some 
> details with the patch that need fixing.  It is extremely hard to follow 
> the entry.S changes based on patches alone - which tree does this apply 
> cleanly to?
This should cleanly apply to 2.6.18-rc2-mm1.

> I would like to see the final result in entry.S, there are
I'll send it to you privately.

> still some details (dealing with eliminating the push/pop of %eax as a 
> temporary for the segment comparison) which I could not follow exactly 
> in patch format.
The reason for removal was that I also removed the xorl %eax,%eax, decl %eax
stuff from error_code (replaced with $-1, as you've seen), and so preserving
%eax became unnecessary I think.

>> iret_exc:
>> -    TRACE_IRQS_ON
>> -    sti
>>      pushl $0            # no error code
>>      pushl $do_iret_error
>>      jmp error_code
>>  
> Why did you remove the sti here?
Because I simply broke my head over that code. If you say sti should
stay - I beleive you. But please, explain me how it works, before I
went crazy! :)
I read the iret pseudo-code in the Intel manuals carefully. It first
pops the iret frame into temporaries, then checks the values for
validity, and then either faults or writes the values to the real
registers. Now, if we have a bad user's iret frame, the iret will
pop it, raise an exception and push a kernel's iret frame, while
the popped user's frame at that point should be completely lost,
as far as I can understand. Now we go into GPF handler, then to a
fixup, push $0 as an error code, but what iret frame is above that
error code? Where does it come from? Doesn't the iret loses the
user's iret frame completely? Certainly I am missing something obvious
here, but I just can't follow that magic...

> Did you test this with deliberately 
> faulting IRETs?  I think this is a bug.
Yes, I tested that and it works (I have a rather extensive test-suite
for that problem - faulting iret is always tested). But since I
don't understand the magic, I can't say why it works.

>>      pushl %eax; \                     <----
>> -    CFI_ADJUST_CFA_OFFSET 4; \
>> +    lss (%esp), %esp;
> <---- CFI adjustment here is missing.
Well, someone used that macro in a .fixup section, where the
CFI adjustments do not seem to work. But since I don't know
why this was done (Jan?), I reverted that to my original code and
added the adjustments now.

> Bigger problem!  (*!)  It is _unsafe_ to call C-code here.
Ow, holly shit!!! What a disaster... But you are right. Well,
this will make my new patch really big...

> You did not 
> introduce this bug, I just spotted it now, and the old code has the same 
> problem.
You are wrong here - I introduced that bug as the old code is
also mine. Linus attributed it to someone else though, probably
by mistake (or did he just wanted to rescue me from the rant about
that code? :). But it was mine.

> There are two options I see to fixing this problem.  One is to move the 
> GDT fixup code into assembler.  The other is to compile the GDT fixup 
> code in a separate .o file with exactly specified CFLAGS to make sure 
> -fomit-frame-pointer is not passed to it.
Well, unsafe is unsafe. If we can't explicitly tell gcc to prefix the
%ebp accesses with %ds, then I'd better go for an asm version. 

>>      FIXUP_ESPFIX_STACK        # %eax == %esp
>> -    CFI_ADJUST_CFA_OFFSET -20    # the frame has now moved
>> +    CFI_ADJUST_CFA_OFFSET -20
> Is this CFI adjustment still correct?
Hmm, I guess it wasn't correct also before, so I just
left it as is. Should now be fixed. 

>> -    .quad 0x0000920000000000    /* 0xd0 - ESPFIX 16-bit SS */
>> +    .quad 0x00cf92000000ffff    /* 0xd0 - ESPFIX SS */
> Seems a bit dangerous to allow access to full 4GB through this.  Can you 
> tighten the limit any?  I suppose not, because the high bits in %esp 
> really could be anything.  But it might be nice to try setting the limit 
> to regs->esp + THREAD_SIZE.  Of course, this is not strictly necessary, 
> just an extra paranoid protection mechanism.
Since, when calculating the base, I do &-THREAD_SIZE, I guess the minimal
safe limit is regs->esp + THREAD_SIZE*2... Well, may just I not do that please? :)
For what, btw? There are no such a things for __KERNEL_DS or anything, so
I just don't see the necessity.

The new patch is attached. Hopefully I addressed all of your concerns.
Thanks!


[-- Attachment #2: espfcln2.diff --]
[-- Type: text/x-patch, Size: 11221 bytes --]

--- linux-2.6.18-rc2-mm1/percpu.h	2004-01-09 10:00:03.000000000 +0300
+++ linux-2.6.18-rc2-mm1/percpu.h	2006-08-01 14:31:10.000000000 +0400
@@ -1,6 +1,27 @@
 #ifndef __ARCH_I386_PERCPU__
 #define __ARCH_I386_PERCPU__
 
+#ifndef __ASSEMBLY__
 #include <asm-generic/percpu.h>
+#else
+
+/*
+ * PER_CPU finds an address of a per-cpu variable.
+ *
+ * Args:
+ *    var - variable name
+ *    cpu - 32bit register containing the current CPU number
+ *
+ * The resulting address is stored in the "cpu" argument.
+ *
+ * Example:
+ *    PER_CPU(cpu_gdt_descr, %ebx)
+ */
+#define PER_CPU(var, cpu) \
+	shll $2, cpu; \
+	movl __per_cpu_offset(cpu), cpu; \
+	addl $per_cpu__/**/var, cpu;
+
+#endif /* !__ASSEMBLY__ */
 
 #endif /* __ARCH_I386_PERCPU__ */
--- linux-2.6.18-rc2-mm1/include/asm-i386/desc.h	2006-07-29 17:34:57.000000000 +0400
+++ linux-2.6.18-rc2-mm1/include/asm-i386/desc.h	2006-07-29 17:35:07.000000000 +0400
@@ -4,8 +4,6 @@
 #include <asm/ldt.h>
 #include <asm/segment.h>
 
-#define CPU_16BIT_STACK_SIZE 1024
-
 #ifndef __ASSEMBLY__
 
 #include <linux/preempt.h>
@@ -16,8 +14,6 @@
 
 extern struct desc_struct cpu_gdt_table[GDT_ENTRIES];
 
-DECLARE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
-
 struct Xgt_desc_struct {
 	unsigned short size;
 	unsigned long address __attribute__((packed));
@@ -162,6 +158,29 @@
 	return base;
 }
 
+#else /* __ASSEMBLY__ */
+
+/*
+ * GET_DESC_BASE reads the descriptor base of the specified segment.
+ *
+ * Args:
+ *    idx - descriptor index
+ *    gdt - GDT pointer
+ *    base - 32bit register to which the base will be written
+ *    lo_w - lo word of the "base" register
+ *    lo_b - lo byte of the "base" register
+ *    hi_b - hi byte of the low word of the "base" register
+ *
+ * Example:
+ *    GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah)
+ *    Will read the base address of GDT_ENTRY_ESPFIX_SS and put it into %eax.
+ */
+#define GET_DESC_BASE(idx, gdt, base, lo_w, lo_b, hi_b) \
+	movb idx*8+4(gdt), lo_b; \
+	movb idx*8+7(gdt), hi_b; \
+	shll $16, base; \
+	movw idx*8+2(gdt), lo_w;
+
 #endif /* !__ASSEMBLY__ */
 
 #endif
--- linux-2.6.18-rc2-mm1/arch/i386/kernel/head.S	2006-07-29 15:32:14.000000000 +0400
+++ linux-2.6.18-rc2-mm1/arch/i386/kernel/head.S	2006-07-29 17:17:51.000000000 +0400
@@ -590,7 +590,7 @@
 	.quad 0x00009a000000ffff	/* 0xc0 APM CS 16 code (16 bit) */
 	.quad 0x004092000000ffff	/* 0xc8 APM DS    data */
 
-	.quad 0x0000920000000000	/* 0xd0 - ESPFIX 16-bit SS */
+	.quad 0x00cf92000000ffff	/* 0xd0 - ESPFIX SS */
 	.quad 0x0000000000000000	/* 0xd8 - unused */
 	.quad 0x0000000000000000	/* 0xe0 - unused */
 	.quad 0x0000000000000000	/* 0xe8 - unused */
--- linux-2.6.18-rc2-mm1/arch/i386/kernel/traps.c	2006-07-29 15:32:14.000000000 +0400
+++ linux-2.6.18-rc2-mm1/arch/i386/kernel/traps.c	2006-07-30 02:19:59.000000000 +0400
@@ -1018,49 +1018,20 @@
 #endif
 }
 
-fastcall void setup_x86_bogus_stack(unsigned char * stk)
+fastcall unsigned long patch_espfix_base(unsigned long uesp,
+					  unsigned long kesp)
 {
-	unsigned long *switch16_ptr, *switch32_ptr;
-	struct pt_regs *regs;
-	unsigned long stack_top, stack_bot;
-	unsigned short iret_frame16_off;
 	int cpu = smp_processor_id();
-	/* reserve the space on 32bit stack for the magic switch16 pointer */
-	memmove(stk, stk + 8, sizeof(struct pt_regs));
-	switch16_ptr = (unsigned long *)(stk + sizeof(struct pt_regs));
-	regs = (struct pt_regs *)stk;
-	/* now the switch32 on 16bit stack */
-	stack_bot = (unsigned long)&per_cpu(cpu_16bit_stack, cpu);
-	stack_top = stack_bot +	CPU_16BIT_STACK_SIZE;
-	switch32_ptr = (unsigned long *)(stack_top - 8);
-	iret_frame16_off = CPU_16BIT_STACK_SIZE - 8 - 20;
-	/* copy iret frame on 16bit stack */
-	memcpy((void *)(stack_bot + iret_frame16_off), &regs->eip, 20);
-	/* fill in the switch pointers */
-	switch16_ptr[0] = (regs->esp & 0xffff0000) | iret_frame16_off;
-	switch16_ptr[1] = __ESPFIX_SS;
-	switch32_ptr[0] = (unsigned long)stk + sizeof(struct pt_regs) +
-		8 - CPU_16BIT_STACK_SIZE;
-	switch32_ptr[1] = __KERNEL_DS;
-}
-
-fastcall unsigned char * fixup_x86_bogus_stack(unsigned short sp)
-{
-	unsigned long *switch32_ptr;
-	unsigned char *stack16, *stack32;
-	unsigned long stack_top, stack_bot;
-	int len;
-	int cpu = smp_processor_id();
-	stack_bot = (unsigned long)&per_cpu(cpu_16bit_stack, cpu);
-	stack_top = stack_bot +	CPU_16BIT_STACK_SIZE;
-	switch32_ptr = (unsigned long *)(stack_top - 8);
-	/* copy the data from 16bit stack to 32bit stack */
-	len = CPU_16BIT_STACK_SIZE - 8 - sp;
-	stack16 = (unsigned char *)(stack_bot + sp);
-	stack32 = (unsigned char *)
-		(switch32_ptr[0] + CPU_16BIT_STACK_SIZE - 8 - len);
-	memcpy(stack32, stack16, len);
-	return stack32;
+	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
+	struct desc_struct *gdt = (struct desc_struct *)cpu_gdt_descr->address;
+	unsigned long base = (kesp - uesp) & -THREAD_SIZE;
+	__u64 desc = *(__u64 *)&gdt[GDT_ENTRY_ESPFIX_SS];
+	/* Set up base for espfix segment */
+ 	desc &= 0x00ffff000000ffffULL;
+ 	desc |=	((((__u64)base) << 16) & 0x000000ffffff0000ULL) |
+		((((__u64)base) << 32) & 0xff00000000000000ULL);
+	*(__u64 *)&gdt[GDT_ENTRY_ESPFIX_SS] = desc;
+	return kesp - base;
 }
 
 /*
--- linux-2.6.18-rc2-mm1/arch/i386/kernel/entry.S	2006-07-29 15:29:00.000000000 +0400
+++ linux-2.6.18-rc2-mm1/arch/i386/kernel/entry.S	2006-07-30 02:19:47.000000000 +0400
@@ -48,6 +48,7 @@
 #include <asm/smp.h>
 #include <asm/page.h>
 #include <asm/desc.h>
+#include <asm/percpu.h>
 #include <asm/dwarf2.h>
 #include "irq_vectors.h"
 
@@ -403,23 +404,18 @@
 	 * This is an "official" bug of all the x86-compatible
 	 * CPUs, which we can try to work around to make
 	 * dosemu and wine happy. */
-	subl $8, %esp		# reserve space for switch16 pointer
-	CFI_ADJUST_CFA_OFFSET 8
+	movl OLDESP(%esp), %eax
+	movl %esp, %edx
+	call patch_espfix_base
+	pushl $__ESPFIX_SS
+	CFI_ADJUST_CFA_OFFSET 4
+	pushl %eax
+	CFI_ADJUST_CFA_OFFSET 4
 	cli
 	TRACE_IRQS_OFF
-	movl %esp, %eax
-	/* Set up the 16bit stack frame with switch32 pointer on top,
-	 * and a switch16 pointer on top of the current frame. */
-	call setup_x86_bogus_stack
-	CFI_ADJUST_CFA_OFFSET -8	# frame has moved
-	TRACE_IRQS_IRET
-	RESTORE_REGS
-	lss 20+4(%esp), %esp	# switch to 16bit stack
-1:	iret
-.section __ex_table,"a"
-	.align 4
-	.long 1b,iret_exc
-.previous
+	lss (%esp), %esp
+	CFI_ADJUST_CFA_OFFSET -8
+	jmp restore_nocheck
 	CFI_ENDPROC
 
 	# perform work that needs to be done immediately before resumption
@@ -509,30 +505,30 @@
 	CFI_ENDPROC
 
 #define FIXUP_ESPFIX_STACK \
-	movl %esp, %eax; \
-	/* switch to 32bit stack using the pointer on top of 16bit stack */ \
-	lss %ss:CPU_16BIT_STACK_SIZE-8, %esp; \
-	/* copy data from 16bit stack to 32bit stack */ \
-	call fixup_x86_bogus_stack; \
-	/* put ESP to the proper location */ \
-	movl %eax, %esp;
-#define UNWIND_ESPFIX_STACK \
+	/* since we are on a wrong stack, we cant make it a C code :( */ \
+	GET_THREAD_INFO(%ebp); \
+	movl TI_cpu(%ebp), %ebx; \
+	PER_CPU(cpu_gdt_descr, %ebx); \
+	movl GDS_address(%ebx), %ebx; \
+	GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
+	addl %esp, %eax; \
+	pushl $__KERNEL_DS; \
+	CFI_ADJUST_CFA_OFFSET 4; \
 	pushl %eax; \
 	CFI_ADJUST_CFA_OFFSET 4; \
+	lss (%esp), %esp; \
+	CFI_ADJUST_CFA_OFFSET -8;
+#define UNWIND_ESPFIX_STACK \
 	movl %ss, %eax; \
-	/* see if on 16bit stack */ \
+	/* see if on espfix stack */ \
 	cmpw $__ESPFIX_SS, %ax; \
-	je 28f; \
-27:	popl %eax; \
-	CFI_ADJUST_CFA_OFFSET -4; \
-.section .fixup,"ax"; \
-28:	movl $__KERNEL_DS, %eax; \
+	jne 27f; \
+	movl $__KERNEL_DS, %eax; \
 	movl %eax, %ds; \
 	movl %eax, %es; \
-	/* switch to 32bit stack */ \
+	/* switch to normal stack */ \
 	FIXUP_ESPFIX_STACK; \
-	jmp 27b; \
-.previous
+27:;
 
 /*
  * Build the entry stubs and pointer table with
@@ -601,7 +597,6 @@
 	pushl %eax
 	CFI_ADJUST_CFA_OFFSET 4
 	CFI_REL_OFFSET eax, 0
-	xorl %eax, %eax
 	pushl %ebp
 	CFI_ADJUST_CFA_OFFSET 4
 	CFI_REL_OFFSET ebp, 0
@@ -614,7 +609,6 @@
 	pushl %edx
 	CFI_ADJUST_CFA_OFFSET 4
 	CFI_REL_OFFSET edx, 0
-	decl %eax			# eax = -1
 	pushl %ecx
 	CFI_ADJUST_CFA_OFFSET 4
 	CFI_REL_OFFSET ecx, 0
@@ -631,7 +625,7 @@
 	/*CFI_REGISTER es, ecx*/
 	movl ES(%esp), %edi		# get the function address
 	movl ORIG_EAX(%esp), %edx	# get the error code
-	movl %eax, ORIG_EAX(%esp)
+	movl $-1, ORIG_EAX(%esp)
 	movl %ecx, ES(%esp)
 	/*CFI_REL_OFFSET es, ES*/
 	movl $(__USER_DS), %ecx
@@ -733,7 +727,7 @@
 	cmpw $__ESPFIX_SS, %ax
 	popl %eax
 	CFI_ADJUST_CFA_OFFSET -4
-	je nmi_16bit_stack
+	je nmi_espfix_stack
 	cmpl $sysenter_entry,(%esp)
 	je nmi_stack_fixup
 	pushl %eax
@@ -772,14 +766,13 @@
 	FIX_STACK(24,nmi_stack_correct, 1)
 	jmp nmi_stack_correct
 
-nmi_16bit_stack:
+nmi_espfix_stack:
 	RING0_INT_FRAME
 	/* create the pointer to lss back */
 	pushl %ss
 	CFI_ADJUST_CFA_OFFSET 4
 	pushl %esp
 	CFI_ADJUST_CFA_OFFSET 4
-	movzwl %sp, %esp
 	addw $4, (%esp)
 	/* copy the iret frame of 12 bytes */
 	.rept 3
@@ -790,11 +783,11 @@
 	CFI_ADJUST_CFA_OFFSET 4
 	SAVE_ALL
 	FIXUP_ESPFIX_STACK		# %eax == %esp
-	CFI_ADJUST_CFA_OFFSET -20	# the frame has now moved
 	xorl %edx,%edx			# zero error code
 	call do_nmi
 	RESTORE_REGS
-	lss 12+4(%esp), %esp		# back to 16bit stack
+	lss 12+4(%esp), %esp		# back to espfix stack
+	CFI_ADJUST_CFA_OFFSET -24
 1:	iret
 	CFI_ENDPROC
 .section __ex_table,"a"
--- linux-2.6.18-rc2-mm1/arch/i386/kernel/asm-offsets.c	2006-07-29 15:29:00.000000000 +0400
+++ linux-2.6.18-rc2-mm1/arch/i386/kernel/asm-offsets.c	2006-08-01 12:38:25.000000000 +0400
@@ -58,6 +58,11 @@
 	OFFSET(TI_sysenter_return, thread_info, sysenter_return);
 	BLANK();
 
+	OFFSET(GDS_size, Xgt_desc_struct, size);
+	OFFSET(GDS_address, Xgt_desc_struct, address);
+	OFFSET(GDS_pad, Xgt_desc_struct, pad);
+	BLANK();
+
 	OFFSET(EXEC_DOMAIN_handler, exec_domain, handler);
 	OFFSET(RT_SIGFRAME_sigcontext, rt_sigframe, uc.uc_mcontext);
 	BLANK();
--- linux-2.6.18-rc2-mm1/arch/i386/kernel/cpu/common.c	2006-07-29 15:29:00.000000000 +0400
+++ linux-2.6.18-rc2-mm1/arch/i386/kernel/cpu/common.c	2006-07-29 17:39:01.000000000 +0400
@@ -24,9 +24,6 @@
 DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
 EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
 
-DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
-EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack);
-
 static int cachesize_override __cpuinitdata = -1;
 static int disable_x86_fxsr __cpuinitdata;
 static int disable_x86_serial_nr __cpuinitdata = 1;
@@ -594,7 +591,6 @@
 	struct tss_struct * t = &per_cpu(init_tss, cpu);
 	struct thread_struct *thread = &current->thread;
 	struct desc_struct *gdt;
-	__u32 stk16_off = (__u32)&per_cpu(cpu_16bit_stack, cpu);
 	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
 
 	if (cpu_test_and_set(cpu, cpu_initialized)) {
@@ -642,13 +638,6 @@
 	 * and set up the GDT descriptor:
 	 */
  	memcpy(gdt, cpu_gdt_table, GDT_SIZE);
-
-	/* Set up GDT entry for 16bit stack */
- 	*(__u64 *)(&gdt[GDT_ENTRY_ESPFIX_SS]) |=
-		((((__u64)stk16_off) << 16) & 0x000000ffffff0000ULL) |
-		((((__u64)stk16_off) << 32) & 0xff00000000000000ULL) |
-		(CPU_16BIT_STACK_SIZE - 1);
-
 	cpu_gdt_descr->size = GDT_SIZE - 1;
  	cpu_gdt_descr->address = (unsigned long)gdt;
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
  2006-08-01 12:21   ` + espfix-code-cleanup.patch added to -mm tree Stas Sergeev
@ 2006-08-01 13:38     ` Jan Beulich
  2006-08-01 14:37       ` Stas Sergeev
  2006-08-01 21:01     ` Zachary Amsden
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2006-08-01 13:38 UTC (permalink / raw)
  To: Stas Sergeev, Zachary Amsden
  Cc: 76306.1226, rohitseth, ak, akpm, Linux kernel

>>>      pushl %eax; \                     <----
>>> -    CFI_ADJUST_CFA_OFFSET 4; \
>>> +    lss (%esp), %esp;
>> <---- CFI adjustment here is missing.
>Well, someone used that macro in a .fixup section, where the
>CFI adjustments do not seem to work. But since I don't know
>why this was done (Jan?), I reverted that to my original code and
>added the adjustments now.

The macro in question is UNWIND_ESPFIX_STACK, which is used in exactly
one place (in normal .text). Even more, the macro itself switches to
.fixup,
so it would be rather odd if it was used in a .fixup section by itself.
Note
that FIXUP_ESPFIX_STACK doesn't have any annotations, and hence can
freely be used in .fixup.

>>>      FIXUP_ESPFIX_STACK        # %eax == %esp
>>> -    CFI_ADJUST_CFA_OFFSET -20    # the frame has now moved
>>> +    CFI_ADJUST_CFA_OFFSET -20
>> Is this CFI adjustment still correct?
>Hmm, I guess it wasn't correct also before, so I just
>left it as is. Should now be fixed. 

I would think it was correct, but surely annotating these pieces of
code
wasn't something that anybody but the original author (you) should
have
done, as it wasn't too difficult to get lost.

Jan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
  2006-08-01 13:38     ` Jan Beulich
@ 2006-08-01 14:37       ` Stas Sergeev
  2006-08-01 14:43         ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Stas Sergeev @ 2006-08-01 14:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zachary Amsden, 76306.1226, rohitseth, ak, akpm, Linux kernel

Hi.

Jan Beulich wrote:
>> Well, someone used that macro in a .fixup section, where the
>> CFI adjustments do not seem to work. But since I don't know
>> why this was done (Jan?), I reverted that to my original code and
>> added the adjustments now.
> The macro in question is UNWIND_ESPFIX_STACK, which is used in exactly
No, that was about FIXUP_ESPFIX_STACK in fact.

> Even more, the macro itself switches to .fixup,
... where it uses FIXUP_ESPFIX_STACK. I haven't done that.
Someone else added the .fixup section to UNWIND_ESPFIX_STACK,
and so the FIXUP_ESPFIX_STACK became used in that section.
I removed that now with my patch, unless someone can tell
me why it was needed.

> Note
> that FIXUP_ESPFIX_STACK doesn't have any annotations, and hence can
> freely be used in .fixup.
Yes, but now the annotations had to be added, and so was the problem.

>> Hmm, I guess it wasn't correct also before, so I just
>> left it as is. Should now be fixed.
> I would think it was correct,
But why? FIXUP_ESPFIX_STACK doesn't pop up 20 bytes or something,
it just used to copy the entire stack frame from 16bit to 32bit
stack. It is much more than 20 bytes, but at the end, %ss:%esp
points to the very same data. So where exactly did 20 come from?
Well, historical interest mainly, as that code is going to die
when we agree on the patch, and the new code is much cleaner and
won't raise such a questions.

> but surely annotating these pieces of
> code wasn't something that anybody but the original author (you) should
> have done, as it wasn't too difficult to get lost.
When it was written, such an annotations were not needed yet, so
I couldn't do that.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
  2006-08-01 14:37       ` Stas Sergeev
@ 2006-08-01 14:43         ` Jan Beulich
  2006-08-01 15:09           ` Stas Sergeev
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2006-08-01 14:43 UTC (permalink / raw)
  To: Stas Sergeev
  Cc: 76306.1226, rohitseth, ak, akpm, Linux kernel, Zachary Amsden

>>> Well, someone used that macro in a .fixup section, where the
>>> CFI adjustments do not seem to work. But since I don't know
>>> why this was done (Jan?), I reverted that to my original code and
>>> added the adjustments now.
>> The macro in question is UNWIND_ESPFIX_STACK, which is used in
exactly
>No, that was about FIXUP_ESPFIX_STACK in fact.
>
>> Even more, the macro itself switches to .fixup,
>... where it uses FIXUP_ESPFIX_STACK. I haven't done that.
>Someone else added the .fixup section to UNWIND_ESPFIX_STACK,
>and so the FIXUP_ESPFIX_STACK became used in that section.
>I removed that now with my patch, unless someone can tell
>me why it was needed.

That was me, in order to get the unwind annotations right without
complicating the code too much. Again, FIXUP_ESPFIX_STACK doesn't
use any unwind directives so can be used anywhere, including the
.fixup section UNWIND_ESPFIX_STACK switches to.

Jan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
  2006-08-01 14:43         ` Jan Beulich
@ 2006-08-01 15:09           ` Stas Sergeev
  0 siblings, 0 replies; 16+ messages in thread
From: Stas Sergeev @ 2006-08-01 15:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: 76306.1226, rohitseth, ak, akpm, Linux kernel, Zachary Amsden

Hi.

Jan Beulich wrote:
> That was me, in order to get the unwind annotations right without
> complicating the code too much. Again, FIXUP_ESPFIX_STACK doesn't
> use any unwind directives so can be used anywhere, including the
> .fixup section UNWIND_ESPFIX_STACK switches to.
Yes, but the new version of FIXUP_ESPFIX_STACK _will_
use the annotations (that was Zach's complain), and therefore
it can't be used in a .fixup, and so the new UNWIND_ESPFIX_STACK
_will not_ use the .fixup, just like it was before your change.
I am not saying that your change was wrong, but now I had to undo
it. Let me know if that makes a problem.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
  2006-08-01 12:21   ` + espfix-code-cleanup.patch added to -mm tree Stas Sergeev
  2006-08-01 13:38     ` Jan Beulich
@ 2006-08-01 21:01     ` Zachary Amsden
  2006-08-02 17:12       ` Stas Sergeev
  1 sibling, 1 reply; 16+ messages in thread
From: Zachary Amsden @ 2006-08-01 21:01 UTC (permalink / raw)
  To: Stas Sergeev; +Cc: akpm, 76306.1226, ak, JBeulich, rohitseth, Linux kernel

Stas Sergeev wrote:
> Hi Zach and thanks for you really great review!
>
> Zachary Amsden wrote:
>> Overall, this looks like a great cleanup.  But I think there are some 
>> details with the patch that need fixing.  It is extremely hard to 
>> follow the entry.S changes based on patches alone - which tree does 
>> this apply cleanly to?
> This should cleanly apply to 2.6.18-rc2-mm1.
>
>> I would like to see the final result in entry.S, there are
> I'll send it to you privately.
>
>> still some details (dealing with eliminating the push/pop of %eax as 
>> a temporary for the segment comparison) which I could not follow 
>> exactly in patch format.
> The reason for removal was that I also removed the xorl %eax,%eax, 
> decl %eax
> stuff from error_code (replaced with $-1, as you've seen), and so 
> preserving
> %eax became unnecessary I think.
>
>>> iret_exc:
>>> -    TRACE_IRQS_ON
>>> -    sti
>>>      pushl $0            # no error code
>>>      pushl $do_iret_error
>>>      jmp error_code
>>>  
>> Why did you remove the sti here?
> Because I simply broke my head over that code. If you say sti should
> stay - I beleive you. But please, explain me how it works, before I
> went crazy! :)

> I read the iret pseudo-code in the Intel manuals carefully. It first
> pops the iret frame into temporaries, then checks the values for
> validity, and then either faults or writes the values to the real
> registers. Now, if we have a bad user's iret frame, the iret will
> pop it, raise an exception and push a kernel's iret frame, while
> the popped user's frame at that point should be completely lost,
> as far as I can understand. Now we go into GPF handler, then to a
> fixup, push $0 as an error code, but what iret frame is above that
> error code? Where does it come from? Doesn't the iret loses the
> user's iret frame completely? Certainly I am missing something obvious
> here, but I just can't follow that magic...
>
>> Did you test this with deliberately faulting IRETs?  I think this is 
>> a bug.
> Yes, I tested that and it works (I have a rather extensive test-suite
> for that problem - faulting iret is always tested). But since I
> don't understand the magic, I can't say why it works.

You need to get a #GP or #NP on the faulting iret.  Several ways to do 
that -

1) Issue int $0x80 from last two bytes of an LDT code segment with 
truncated limit (#GP)
2) Move %ss to LDT, set %esp high.  Issue write LDT system call via int 
$0x80 that truncates the limit below %esp (#GP)
3) Move %cs / %ss to LDT.  Issue write LDT system call and make cs / ss 
segment not present (#NP)

Note that on the path back to the iret, interrupts are disabled.  The 
iret faults, but doesn't pop the user return frame.  The check for the 
fixup section will happen, and fixup_exception will return to the STI 
instruction in the fixup, with the user frame still above on the stack 
(but note the return pops EFLAGS, which disables interrupts as they were 
before for the iret).  The fixup returns, pushes a fake error code onto 
the stack, followed by a push of the exception handle, do_iret_error.  
It then re-enters the common error_code path, with saves registers and 
dispatches to do_iret_error, which follows the notify path, either 
killing the process with a SIGSEGV or delivering a signal.  The whole 
time the user return frame is sitting up there on the stack, so taking 
this exception:

DO_ERROR_INFO(32, SIGSEGV, "iret exception", iret_error, ILL_BADSTK, 0)

makes it appear as if the user instruction to which the kernel was 
trying to return caused a SIGSEGV / ILL_BADSTK.

I think you actually want interrupts enabled in that path, since if you 
look at the #GP handler, it is a trap gate, not an interrupt gate - so 
#GPs from userspace that take a very similar signal notifier path enter 
with interrupts enabled.  I don't think very much would go wrong if 
interrupts were left disabled, but it makes me nervous to change that 
semantic.


>> Bigger problem!  (*!)  It is _unsafe_ to call C-code here.
> Ow, holly shit!!! What a disaster... But you are right. Well,
> this will make my new patch really big...
>
>> You did not introduce this bug, I just spotted it now, and the old 
>> code has the same problem.
> You are wrong here - I introduced that bug as the old code is
> also mine. Linus attributed it to someone else though, probably
> by mistake (or did he just wanted to rescue me from the rant about
> that code? :). But it was mine.

Ah, ok.

>> There are two options I see to fixing this problem.  One is to move 
>> the GDT fixup code into assembler.  The other is to compile the GDT 
>> fixup code in a separate .o file with exactly specified CFLAGS to 
>> make sure -fomit-frame-pointer is not passed to it.
> Well, unsafe is unsafe. If we can't explicitly tell gcc to prefix the
> %ebp accesses with %ds, then I'd better go for an asm version.

Sounds like a plan.

>
>>> -    .quad 0x0000920000000000    /* 0xd0 - ESPFIX 16-bit SS */
>>> +    .quad 0x00cf92000000ffff    /* 0xd0 - ESPFIX SS */
>> Seems a bit dangerous to allow access to full 4GB through this.  Can 
>> you tighten the limit any?  I suppose not, because the high bits in 
>> %esp really could be anything.  But it might be nice to try setting 
>> the limit to regs->esp + THREAD_SIZE.  Of course, this is not 
>> strictly necessary, just an extra paranoid protection mechanism.
> Since, when calculating the base, I do &-THREAD_SIZE, I guess the minimal
> safe limit is regs->esp + THREAD_SIZE*2... Well, may just I not do 
> that please? :)
> For what, btw? There are no such a things for __KERNEL_DS or anything, so
> I just don't see the necessity.

It helps track down any bugs that could leak through otherwise and 
corrupt random memory.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
  2006-08-01 21:01     ` Zachary Amsden
@ 2006-08-02 17:12       ` Stas Sergeev
  2006-08-02 18:30         ` Zachary Amsden
  0 siblings, 1 reply; 16+ messages in thread
From: Stas Sergeev @ 2006-08-02 17:12 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Linux kernel

Hi,

Zachary Amsden wrote:
> You need to get a #GP or #NP on the faulting iret.  Several ways to do 
> that -
I do that much simpler - I provoke a SIGSEGV and in a signal handler
I put the wrong value to scp->cs or scp->ss, and that makes iret to fault.

> iret faults, but doesn't pop the user return frame.
But does it push the kernel frame after it or not?
If not - I don't understand how we go to a fixup.
If yes - I don't understand how the user's frame gets
accessed later, as it is above the kernel's frame.

>> safe limit is regs->esp + THREAD_SIZE*2... Well, may just I not do 
>> that please? :)
>> For what, btw? There are no such a things for __KERNEL_DS or anything, so
>> I just don't see the necessity.
> It helps track down any bugs that could leak through otherwise and 
> corrupt random memory.
I think regs->esp + THREAD_SIZE*2 is already very permissive,
and I'd like to avoid messing with granularity. So unless you
really insist, I'll better not do that. :)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
  2006-08-02 17:12       ` Stas Sergeev
@ 2006-08-02 18:30         ` Zachary Amsden
  2006-08-02 19:12           ` Stas Sergeev
  0 siblings, 1 reply; 16+ messages in thread
From: Zachary Amsden @ 2006-08-02 18:30 UTC (permalink / raw)
  To: Stas Sergeev; +Cc: Linux kernel

Stas Sergeev wrote:
> Hi,
>
> Zachary Amsden wrote:
>> You need to get a #GP or #NP on the faulting iret.  Several ways to 
>> do that -
> I do that much simpler - I provoke a SIGSEGV and in a signal handler
> I put the wrong value to scp->cs or scp->ss, and that makes iret to 
> fault.

Ok, that's a new trick ;)

>
>> iret faults, but doesn't pop the user return frame.
> But does it push the kernel frame after it or not?
> If not - I don't understand how we go to a fixup.
> If yes - I don't understand how the user's frame gets
> accessed later, as it is above the kernel's frame.

Yes.  The iret faults, the fault pushes a new kernel frame - and the 
fault handler's iret returns, removing the kernel frame.  So the kernel 
frame is gone by the time the fixup runs.

>
>>> safe limit is regs->esp + THREAD_SIZE*2... Well, may just I not do 
>>> that please? :)
>>> For what, btw? There are no such a things for __KERNEL_DS or 
>>> anything, so
>>> I just don't see the necessity.
>> It helps track down any bugs that could leak through otherwise and 
>> corrupt random memory.
> I think regs->esp + THREAD_SIZE*2 is already very permissive,
> and I'd like to avoid messing with granularity. So unless you
> really insist, I'll better not do that. :)

It's really hard to catch bugs that could otherwise happen when a 
non-zero based stack gets used (for example, C code which uses %ebp with 
-fomit-frame-pointer).  Setting the limit to THREAD_SIZE should 
guarantee that the non-zero based stack never is used to access anything 
but the stack and current thread.

Zach


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
  2006-08-02 18:30         ` Zachary Amsden
@ 2006-08-02 19:12           ` Stas Sergeev
  0 siblings, 0 replies; 16+ messages in thread
From: Stas Sergeev @ 2006-08-02 19:12 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Linux kernel

Hi.

Zachary Amsden wrote:
> Yes.  The iret faults, the fault pushes a new kernel frame - and the 
> fault handler's iret returns, removing the kernel frame.  So the kernel 
> frame is gone by the time the fixup runs.
OK, thanks! I wasn't completely realizing that the fixup runs
after an exception handler is already returned. Now it all looks
pretty obvious. :)

> It's really hard to catch bugs that could otherwise happen when a 
> non-zero based stack gets used (for example, C code which uses %ebp with 
> -fomit-frame-pointer).  Setting the limit to THREAD_SIZE should 
> guarantee that the non-zero based stack never is used to access anything 
> but the stack and current thread.
Yes, be there a possibility the set the *constant* limit (THREAD_SIZE),
I'd certainly do that, no questions. But as long as we are talking
about the nasty non-constant limit like regs->esp+THREAD_SIZE*2, is it
really worth an efforts? This limit is very unpredictable. I'll have
to add the code to deal with granularity. And its still very, very
permissive. Not even nearly something like just THREAD_SIZE.
Do you really, really think it is worth all the headache?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
  2006-08-12  2:27 Chuck Ebbert
@ 2006-08-12 10:35 ` Stas Sergeev
  0 siblings, 0 replies; 16+ messages in thread
From: Stas Sergeev @ 2006-08-12 10:35 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: Zachary Amsden, linux-kernel, Jan Beulich

Hello.

Chuck Ebbert wrote:
> It's really not that hard to get the limit:
>         limit_in_bytes = new_esp | (THREAD_SIZE - 1)
>         limit_in_pages = limit_in_bytes >> 12
I was worrying about a corner cases. The new_esp can
be just everything. It can be < THREAD_SIZE, in which
case the limit_in_pages will be 0. Or, I beleive, it
can even be a negative value, which will turn into a
a value larger than the old_esp.
But after calculating a few examples, I have almost
convinced myself that your technique will work in all
such a cases. So I'll try that as soon as the new -mm's
will boot for me again.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
@ 2006-08-12  2:27 Chuck Ebbert
  2006-08-12 10:35 ` Stas Sergeev
  0 siblings, 1 reply; 16+ messages in thread
From: Chuck Ebbert @ 2006-08-12  2:27 UTC (permalink / raw)
  To: Stas Sergeev; +Cc: Zachary Amsden, linux-kernel, Jan Beulich

In-Reply-To: <44CF474C.9070800@aknet.ru>

On Tue, 01 Aug 2006 16:21:32 +0400, Stas Sergeev wrote:

> >> -    .quad 0x0000920000000000    /* 0xd0 - ESPFIX 16-bit SS */
> >> +    .quad 0x00cf92000000ffff    /* 0xd0 - ESPFIX SS */

> > Seems a bit dangerous to allow access to full 4GB through this.  Can you 
> > tighten the limit any?  I suppose not, because the high bits in %esp 
> > really could be anything.  But it might be nice to try setting the limit 
> > to regs->esp + THREAD_SIZE.  Of course, this is not strictly necessary, 
> > just an extra paranoid protection mechanism.

> Since, when calculating the base, I do &-THREAD_SIZE, I guess the minimal
> safe limit is regs->esp + THREAD_SIZE*2... Well, may just I not do that please? :)
> For what, btw? There are no such a things for __KERNEL_DS or anything, so
> I just don't see the necessity.

It's really not that hard to get the limit:

        limit_in_bytes = new_esp | (THREAD_SIZE - 1)
        limit_in_pages = limit_in_bytes >> 12

And this will catch any bad accesses that assume zero-based pointers:

        kernel stack is at f7000000
        user stack is at   b7000000

        SS base =  40000000
        SS limit = b7001fff

All kernel pointers will be >c0000000 and will trap on access if they
try to use SS.  And it will work with any user/kernel split.

-- 
Chuck


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
  2006-08-02 19:14 Chuck Ebbert
@ 2006-08-02 19:31 ` Stas Sergeev
  0 siblings, 0 replies; 16+ messages in thread
From: Stas Sergeev @ 2006-08-02 19:31 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: Zachary Amsden, linux-kernel

Hi.

Chuck Ebbert wrote:
> Only problem I have with this is we lose the original fault info from
> the iret.  So we have no real way of knowing whether it was #GP, #NP, #SF
> or whatever, and no record of the offending iret's address.
Thanks for the precise explanation.
There was also a problem with me reading the Intel's manual:
it uses Pop() in their pseudo-code, and it Pop()'s the values
*before* checking them. The description of the Pop() is very
confusing:
---
Pop() removes the value from the top of the stack and returns it.
---
What "removes" means here is unclear. Whether it adjusts a stack
pointer, is unclear. Since it is Pop(), I was assuming "removes"
means it also adjusts the stack pointer, but now I see it was a
wrong guess.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
@ 2006-08-02 19:14 Chuck Ebbert
  2006-08-02 19:31 ` Stas Sergeev
  0 siblings, 1 reply; 16+ messages in thread
From: Chuck Ebbert @ 2006-08-02 19:14 UTC (permalink / raw)
  To: Stas Sergeev; +Cc: Zachary Amsden, linux-kernel

In-Reply-To: <44D0DCF5.8050906@aknet.ru>

On Wed, 02 Aug 2006 21:12:21 +0400, Stas Sergeev wrote:
> 
> > iret faults, but doesn't pop the user return frame.
> But does it push the kernel frame after it or not?
> If not - I don't understand how we go to a fixup.
> If yes - I don't understand how the user's frame gets
> accessed later, as it is above the kernel's frame.

Just before trying to return to userspace, we have a stack:

        user_regs [ebx ... es]
        orig_eax
        user_iret_frame [eip ... oldss]

After RESTORE_ALL and discarding orig_eax, we have this just
before doing iret (user's regs are in the CPU regs now):

        user_iret_frame [eip ... oldss]

iret faults and we get:

        kernel_iret_frame [eip(of iret) ... flags]
        user_iret_frame [eip ... oldss]

error_code then saves regs and we have:

        user_regs [ebx ... es]
        orig_eax [== -1]
        kernel_iret_frame [eip(iret) ... flags]
        user_iret_frame [eip ... oldss]

error_code then calls e.g. do_segment_not_present, which finds a fixup
and does:

        regs->eip = fixup_address;

now we have:

        user_regs [ebx ... es]
        orig_eax [== -1]
        kernel_iret_frame [eip(fixup) ... flags]
        user_iret_frame [eip ... oldss]

standard return sequence gives us (again user's regs are back in CPU):

        kernel_iret_frame [eip(fixup) ... flags]
        user_iret_frame [eip ... oldss]

iret returns to the fixup code which jumps to error_code and then we have:

        user_regs [ebx ... es]
        orig_eax [== -1]
        user_iret_frame [eip ... oldss]

So now there is a stack frame that looks like it came from userspace
and we call the iret fault handler with that.

Only problem I have with this is we lose the original fault info from
the iret.  So we have no real way of knowing whether it was #GP, #NP, #SF
or whatever, and no record of the offending iret's address.

-- 
Chuck


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + espfix-code-cleanup.patch added to -mm tree
  2006-08-01  2:24 Chuck Ebbert
@ 2006-08-01 12:39 ` Stas Sergeev
  0 siblings, 0 replies; 16+ messages in thread
From: Stas Sergeev @ 2006-08-01 12:39 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: akpm, zach, rohitseth, JBeulich, ak, linux-kernel

Hi.

Chuck Ebbert wrote:
> we are on a ring0 32-bit stack that's not zero-based.  If an exception
> occurs in that state, UNWIND_ESPFIX_STACK restores the proper kernel
> SS and ESP but on return from the exception nothing restores the espfix
> stack.  I guess this isn't a problem now because exceptions in kernel
> mode are fatal but a kernel debugger might have problems here?
Perhaps you are right, but... unless there is some quick
way to mark that part of code "undebuggable", I'll better
leave that for the debugger maintainers to think about.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* + espfix-code-cleanup.patch added to -mm tree
@ 2006-08-01  2:24 Chuck Ebbert
  2006-08-01 12:39 ` Stas Sergeev
  0 siblings, 1 reply; 16+ messages in thread
From: Chuck Ebbert @ 2006-08-01  2:24 UTC (permalink / raw)
  To: akpm; +Cc: zach, rohitseth, JBeulich, ak, stsp, linux-kernel

In-Reply-To: <200607300016.k6U0GYu4023664@shell0.pdx.osdl.net>

On Sat, 29 Jul 2006 17:16:34 -0700, Andrew Morton wrote:

>     espfix-code-cleanup.patch

After the fixup code does this:

       movl %esp, %eax         # pt_regs pointer
       movl %esp, %edx
       call patch_espfix_gdt
       pushl $__ESPFIX_SS
       CFI_ADJUST_CFA_OFFSET 4
       pushl %eax
       CFI_ADJUST_CFA_OFFSET 4
==>    lss (%esp), %esp
       CFI_ADJUST_CFA_OFFSET -8
       jmp restore_nocheck

we are on a ring0 32-bit stack that's not zero-based.  If an exception
occurs in that state, UNWIND_ESPFIX_STACK restores the proper kernel
SS and ESP but on return from the exception nothing restores the espfix
stack.  I guess this isn't a problem now because exceptions in kernel
mode are fatal but a kernel debugger might have problems here?

-- 
Chuck

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2006-08-12 10:36 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <200607300016.k6U0GYu4023664@shell0.pdx.osdl.net>
2006-07-30 10:57 ` [patch] espfix code cleanup more Stas Sergeev
     [not found] ` <44CE766D.6000705@vmware.com>
2006-08-01 12:21   ` + espfix-code-cleanup.patch added to -mm tree Stas Sergeev
2006-08-01 13:38     ` Jan Beulich
2006-08-01 14:37       ` Stas Sergeev
2006-08-01 14:43         ` Jan Beulich
2006-08-01 15:09           ` Stas Sergeev
2006-08-01 21:01     ` Zachary Amsden
2006-08-02 17:12       ` Stas Sergeev
2006-08-02 18:30         ` Zachary Amsden
2006-08-02 19:12           ` Stas Sergeev
2006-08-01  2:24 Chuck Ebbert
2006-08-01 12:39 ` Stas Sergeev
2006-08-02 19:14 Chuck Ebbert
2006-08-02 19:31 ` Stas Sergeev
2006-08-12  2:27 Chuck Ebbert
2006-08-12 10:35 ` Stas Sergeev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.