linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/8] Implement per-processor data areas for i386.
@ 2006-08-30 23:52 Jeremy Fitzhardinge
From: Jeremy Fitzhardinge @ 2006-08-30 23:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chuck Ebbert, Zachary Amsden, Jan Beulich, Andi Kleen, Andrew Morton

(Changes since previous post:
 - now works
 - fixed sys_vm86
 - performance measurements)

Implement per-processor data areas for i386.

This patch implements per-processor data areas by using %gs as the
base segment of the per-processor memory.  This has two principal
advantages:

- It allows very simple direct access to per-processor data by using
  an effective address of the form %gs:offset, where offset is the
  offset into struct i386_pda.  These sequences are faster and smaller
  than the current mechanism using current_thread_info().

- It also allows per-CPU data to be allocated as each CPU is brought
  up, rather than statically allocating it based on the maximum number
  of CPUs which could be brought up. (Though the existing per-cpu
  mechanism could be changed to do this.)
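To make the first point concrete, here is a userspace sketch of the two ways of finding "current".  It assumes 8 KiB kernel stacks (THREAD_SIZE) with thread_info at the bottom of each stack; the structs are heavily trimmed stand-ins and the helper names are hypothetical.  Only the masking arithmetic and the number of loads are meant to mirror the kernel: the old path masks the stack pointer and then loads ti->task, while the PDA path is a single load.

```c
#include <stdint.h>

/* Assumption: 8 KiB kernel stacks, thread_info at the lowest address. */
#define THREAD_SIZE 8192u

struct task { int pid; };
struct thread_info { struct task *task; };	/* trimmed stand-in */
struct pda { struct task *pcurrent; int cpu_number; };

/* A kernel stack whose bottom doubles as the thread_info. */
union thread_stack {
	struct thread_info info;
	unsigned char bytes[THREAD_SIZE];
};

/* Old scheme, step 1: round the stack pointer down to the stack base. */
uintptr_t thread_info_addr(uintptr_t sp)
{
	return sp & ~((uintptr_t)THREAD_SIZE - 1);
}

/* Old scheme: mask the stack pointer, then load ti->task. */
struct task *current_via_stack(uintptr_t sp)
{
	union thread_stack *s = (union thread_stack *)thread_info_addr(sp);
	return s->info.task;
}

/* PDA scheme: one load; in the kernel, mov %gs:PDA_pcurrent,%reg. */
struct task *current_via_pda(const struct pda *pda)
{
	return pda->pcurrent;
}

/* Build one task, one aligned stack and one PDA; check both paths agree. */
int both_paths_agree(void)
{
	static _Alignas(THREAD_SIZE) union thread_stack stack;
	static struct task task = { .pid = 42 };
	static struct pda pda = { .pcurrent = &task, .cpu_number = 0 };
	/* Pretend the stack pointer is somewhere inside the stack. */
	uintptr_t sp = (uintptr_t)&stack.bytes[THREAD_SIZE - 64];

	stack.info.task = &task;
	return current_via_stack(sp) == current_via_pda(&pda);
}
```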

Performance:

I've done some simple performance tests on an Intel Core Duo running
at 1GHz (to emphasize any performance delta).  The results for the
lmbench null syscall latency test, which should show the most negative
effect from this change, show a ~8-9ns slowdown (0.237us -> 0.245us).
This corresponds to around 8-9 CPU cycles, and correlates well with
the addition of the push/load/pop %gs into the hot path.

I have not yet measured the effect on other types of processor or
more complex syscalls (though I would expect the push/pop overhead
would be drowned by longer times spent in the kernel, and mitigated by
actual use of the PDA).

The size improvements on the kernel text are nice as well: 
    2889361 -> 2883936 = 5425 bytes saved
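The figures above can be cross-checked with a few lines of arithmetic.  At 1GHz one cycle is 1ns, so the rounded lmbench values give an 8ns (8-cycle) delta; since the microsecond figures are rounded to three places, that is consistent with the 8-9ns range quoted.  The helper names are made up for this check.

```c
/* Figures quoted in the mail. */
#define LAT_BEFORE_NS	237.0		/* lmbench null syscall, 0.237us */
#define LAT_AFTER_NS	245.0		/* with the PDA, 0.245us */
#define CLOCK_GHZ	1.0		/* Core Duo run at 1GHz: 1 cycle per ns */
#define TEXT_BEFORE	2889361L	/* kernel text size in bytes */
#define TEXT_AFTER	2883936L

double syscall_delta_ns(void)
{
	return LAT_AFTER_NS - LAT_BEFORE_NS;
}

/* cycles = nanoseconds * (cycles per nanosecond) */
double syscall_delta_cycles(void)
{
	return syscall_delta_ns() * CLOCK_GHZ;
}

long text_bytes_saved(void)
{
	return TEXT_BEFORE - TEXT_AFTER;
}

/* Savings as a percentage of the original text size (about 0.19%). */
double text_percent_saved(void)
{
	return 100.0 * (double)text_bytes_saved() / (double)TEXT_BEFORE;
}
```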


Some background for people unfamiliar with x86 segmentation:

This uses the x86 segmentation stuff in a way similar to NPTL's way of
implementing Thread-Local Storage.  It relies on the fact that each CPU
has its own Global Descriptor Table (GDT), which is basically an array
of base-length pairs (with some extra stuff).  When a segment register
is loaded with a selector (approximately, an index into the GDT), and
you use that segment register for memory access, the address has the
base added to it, and the resulting address is used.

In other words, if you imagine the GDT containing an entry:
	Index	Base
	123:	0xc0211000 (allocated PDA)
and you load %gs with this selector:
	mov $123, %gs
and then use GS later on:
	mov %gs:4, %eax
This has the effect of
	mov 0xc0211004, %eax
and because the GDT is per-CPU, the base (= 0xc0211000 = memory
allocated for this CPU's PDA) can be a CPU-specific value while leaving
everything else constant.

This means that something like "current" or "smp_processor_id()" can
collapse to a single instruction:
	mov %gs:PDA_current, %reg
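A toy model may help readers new to segmentation.  It keeps one GDT and one PDA per simulated CPU, resolves seg:offset by adding the base of the descriptor selected by `selector >> 3`, and reads a PDA field the way `mov %gs:offset, %reg` would.  This is only an illustration under simplifying assumptions: the real lookup is done by the CPU, the executing CPU is implicit rather than a parameter, and descriptor limits and access bits are ignored.  Index 27 is the GDT slot this series uses for the PDA; the other names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CPUS	2	/* hypothetical, kept small */
#define GDT_ENTRIES	32
#define PDA_GDT_INDEX	27	/* GDT slot used for the PDA in this series */

struct pda_model {
	void *pcurrent;
	int cpu_number;
};

/* Only descriptor bases are modelled. */
struct gdt_model {
	uintptr_t base[GDT_ENTRIES];
};

static struct gdt_model gdt[MODEL_CPUS];	/* one GDT per CPU */
static struct pda_model pda[MODEL_CPUS];	/* one PDA per CPU */

/* seg:offset -> linear address as seen by CPU 'cpu'.  The selector's
 * low three bits (TI and RPL) are dropped to get the GDT index. */
uintptr_t linear_address(int cpu, uint16_t selector, uintptr_t offset)
{
	return gdt[cpu].base[selector >> 3] + offset;
}

/* Model of "mov %gs:offset, %reg" for an int-sized field. */
int gs_read_int(int cpu, uint16_t gs, size_t offset)
{
	int val;

	memcpy(&val, (const void *)linear_address(cpu, gs, offset), sizeof(val));
	return val;
}

/* Per-CPU setup: the same GDT slot points at a different PDA on each CPU. */
void setup_model_cpus(void)
{
	for (int cpu = 0; cpu < MODEL_CPUS; cpu++) {
		pda[cpu].cpu_number = cpu;
		gdt[cpu].base[PDA_GDT_INDEX] = (uintptr_t)&pda[cpu];
	}
}
```

The same instruction (same selector, same offset) yields a different answer on each CPU, which is exactly what smp_processor_id() needs.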


TODO: 
- Modify more things to use the PDA.  The more that uses it, the more
  the cost of the %gs save/restore is amortized.  smp_processor_id and
  current are the obvious first choices, which are implemented in this
  series.
--



* [PATCH 1/8] Use asm-offsets for the offsets of registers into the pt_regs struct, rather than having hard-coded constants.
@ 2006-08-30 23:52 ` Jeremy Fitzhardinge
From: Jeremy Fitzhardinge @ 2006-08-30 23:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chuck Ebbert, Zachary Amsden, Jan Beulich, Andi Kleen,
	Andrew Morton, Keith Owens

[-- Attachment #1: pt_regs-asm-offsets.patch --]
[-- Type: text/plain, Size: 8372 bytes --]

I left the constants in the comments of entry.S because they're useful
for reference; the code in entry.S is very dependent on the layout of
pt_regs, even when using asm-offsets.
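The idea behind asm-offsets is to let the C compiler compute structure offsets and export them as constants for assembly code.  The sketch below does the same in plain C for the pt_regs layout used before this series, modelling every slot as 32 bits so the results match an i386 build on any host.  `struct pt_regs_i386`, `PT_OFFSET` and `find_pt_offset()` are hypothetical names; the real build step emits the values through inline-asm markers in asm-offsets.c that a script turns into a header.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* i386 struct pt_regs before this series; 32-bit slots throughout. */
struct pt_regs_i386 {
	uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
	uint32_t xds, xes;
	uint32_t orig_eax, eip, xcs, eflags, esp, xss;
};

struct offset_entry {
	const char *name;
	size_t offset;
};

#define PT_OFFSET(sym, field) { #sym, offsetof(struct pt_regs_i386, field) }

static const struct offset_entry pt_offsets[] = {
	PT_OFFSET(PT_EBX, ebx),		PT_OFFSET(PT_ECX, ecx),
	PT_OFFSET(PT_EDX, edx),		PT_OFFSET(PT_ESI, esi),
	PT_OFFSET(PT_EDI, edi),		PT_OFFSET(PT_EBP, ebp),
	PT_OFFSET(PT_EAX, eax),		PT_OFFSET(PT_DS, xds),
	PT_OFFSET(PT_ES, xes),		PT_OFFSET(PT_ORIG_EAX, orig_eax),
	PT_OFFSET(PT_EIP, eip),		PT_OFFSET(PT_CS, xcs),
	PT_OFFSET(PT_EFLAGS, eflags),	PT_OFFSET(PT_OLDESP, esp),
	PT_OFFSET(PT_OLDSS, xss),
};

#define N_PT_OFFSETS (sizeof(pt_offsets) / sizeof(pt_offsets[0]))

/* Print the table as #defines an assembler source could include. */
void print_pt_offsets(FILE *out)
{
	for (size_t i = 0; i < N_PT_OFFSETS; i++)
		fprintf(out, "#define %s 0x%02zx\n", pt_offsets[i].name,
			pt_offsets[i].offset);
}

/* Look up one generated constant; -1 if the name is unknown. */
long find_pt_offset(const char *name)
{
	for (size_t i = 0; i < N_PT_OFFSETS; i++)
		if (strcmp(pt_offsets[i].name, name) == 0)
			return (long)pt_offsets[i].offset;
	return -1;
}
```

The generated values equal the hard-coded constants removed from entry.S (EBX = 0x00 through OLDSS = 0x38); the benefit is that they follow automatically when the layout changes.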

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Keith Owens <kaos@ocs.com.au>

[ This is identical to the previously posted version of the patch. ]

---
 arch/i386/kernel/asm-offsets.c |   17 +++++
 arch/i386/kernel/entry.S       |  118 +++++++++++++++++-----------------------
 2 files changed, 68 insertions(+), 67 deletions(-)


===================================================================
--- a/arch/i386/kernel/asm-offsets.c
+++ b/arch/i386/kernel/asm-offsets.c
@@ -58,6 +58,23 @@ void foo(void)
 	OFFSET(TI_sysenter_return, thread_info, sysenter_return);
 	BLANK();
 
+	OFFSET(PT_EBX, pt_regs, ebx);
+	OFFSET(PT_ECX, pt_regs, ecx);
+	OFFSET(PT_EDX, pt_regs, edx);
+	OFFSET(PT_ESI, pt_regs, esi);
+	OFFSET(PT_EDI, pt_regs, edi);
+	OFFSET(PT_EBP, pt_regs, ebp);
+	OFFSET(PT_EAX, pt_regs, eax);
+	OFFSET(PT_DS,  pt_regs, xds);
+	OFFSET(PT_ES,  pt_regs, xes);
+	OFFSET(PT_ORIG_EAX, pt_regs, orig_eax);
+	OFFSET(PT_EIP, pt_regs, eip);
+	OFFSET(PT_CS,  pt_regs, xcs);
+	OFFSET(PT_EFLAGS, pt_regs, eflags);
+	OFFSET(PT_OLDESP, pt_regs, esp);
+	OFFSET(PT_OLDSS,  pt_regs, xss);
+	BLANK();
+
 	OFFSET(EXEC_DOMAIN_handler, exec_domain, handler);
 	OFFSET(RT_SIGFRAME_sigcontext, rt_sigframe, uc.uc_mcontext);
 	BLANK();
===================================================================
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -53,22 +53,6 @@
 
 #define nr_syscalls ((syscall_table_size)/4)
 
-EBX		= 0x00
-ECX		= 0x04
-EDX		= 0x08
-ESI		= 0x0C
-EDI		= 0x10
-EBP		= 0x14
-EAX		= 0x18
-DS		= 0x1C
-ES		= 0x20
-ORIG_EAX	= 0x24
-EIP		= 0x28
-CS		= 0x2C
-EFLAGS		= 0x30
-OLDESP		= 0x34
-OLDSS		= 0x38
-
 CF_MASK		= 0x00000001
 TF_MASK		= 0x00000100
 IF_MASK		= 0x00000200
@@ -92,7 +76,7 @@ VM_MASK		= 0x00020000
 
 .macro TRACE_IRQS_IRET
 #ifdef CONFIG_TRACE_IRQFLAGS
-	testl $IF_MASK,EFLAGS(%esp)     # interrupts off?
+	testl $IF_MASK,PT_EFLAGS(%esp)     # interrupts off?
 	jz 1f
 	TRACE_IRQS_ON
 1:
@@ -195,18 +179,18 @@ 4:	movl $0,(%esp);	\
 
 #define RING0_PTREGS_FRAME \
 	CFI_STARTPROC simple;\
-	CFI_DEF_CFA esp, OLDESP-EBX;\
-	/*CFI_OFFSET cs, CS-OLDESP;*/\
-	CFI_OFFSET eip, EIP-OLDESP;\
-	/*CFI_OFFSET es, ES-OLDESP;*/\
-	/*CFI_OFFSET ds, DS-OLDESP;*/\
-	CFI_OFFSET eax, EAX-OLDESP;\
-	CFI_OFFSET ebp, EBP-OLDESP;\
-	CFI_OFFSET edi, EDI-OLDESP;\
-	CFI_OFFSET esi, ESI-OLDESP;\
-	CFI_OFFSET edx, EDX-OLDESP;\
-	CFI_OFFSET ecx, ECX-OLDESP;\
-	CFI_OFFSET ebx, EBX-OLDESP
+	CFI_DEF_CFA esp, PT_OLDESP-PT_EBX;\
+	/*CFI_OFFSET cs, PT_CS-PT_OLDESP;*/\
+	CFI_OFFSET eip, PT_EIP-PT_OLDESP;\
+	/*CFI_OFFSET es, PT_ES-PT_OLDESP;*/\
+	/*CFI_OFFSET ds, PT_DS-PT_OLDESP;*/\
+	CFI_OFFSET eax, PT_EAX-PT_OLDESP;\
+	CFI_OFFSET ebp, PT_EBP-PT_OLDESP;\
+	CFI_OFFSET edi, PT_EDI-PT_OLDESP;\
+	CFI_OFFSET esi, PT_ESI-PT_OLDESP;\
+	CFI_OFFSET edx, PT_EDX-PT_OLDESP;\
+	CFI_OFFSET ecx, PT_ECX-PT_OLDESP;\
+	CFI_OFFSET ebx, PT_EBX-PT_OLDESP
 
 ENTRY(ret_from_fork)
 	CFI_STARTPROC
@@ -234,8 +218,8 @@ ret_from_intr:
 ret_from_intr:
 	GET_THREAD_INFO(%ebp)
 check_userspace:
-	movl EFLAGS(%esp), %eax		# mix EFLAGS and CS
-	movb CS(%esp), %al
+	movl PT_EFLAGS(%esp), %eax	# mix EFLAGS and CS
+	movb PT_CS(%esp), %al
 	andl $(VM_MASK | SEGMENT_RPL_MASK), %eax
 	cmpl $USER_RPL, %eax
 	jb resume_kernel		# not returning to v8086 or userspace
@@ -258,7 +242,7 @@ need_resched:
 	movl TI_flags(%ebp), %ecx	# need_resched set ?
 	testb $_TIF_NEED_RESCHED, %cl
 	jz restore_all
-	testl $IF_MASK,EFLAGS(%esp)     # interrupts off (exception path) ?
+	testl $IF_MASK,PT_EFLAGS(%esp)	# interrupts off (exception path) ?
 	jz restore_all
 	call preempt_schedule_irq
 	jmp need_resched
@@ -323,15 +307,15 @@ 1:	movl (%ebp),%ebp
 	cmpl $(nr_syscalls), %eax
 	jae syscall_badsys
 	call *sys_call_table(,%eax,4)
-	movl %eax,EAX(%esp)
+	movl %eax,PT_EAX(%esp)
 	DISABLE_INTERRUPTS
 	TRACE_IRQS_OFF
 	movl TI_flags(%ebp), %ecx
 	testw $_TIF_ALLWORK_MASK, %cx
 	jne syscall_exit_work
 /* if something modifies registers it must also disable sysexit */
-	movl EIP(%esp), %edx
-	movl OLDESP(%esp), %ecx
+	movl PT_EIP(%esp), %edx
+	movl PT_OLDESP(%esp), %ecx
 	xorl %ebp,%ebp
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS_SYSEXIT
@@ -345,7 +329,7 @@ ENTRY(system_call)
 	CFI_ADJUST_CFA_OFFSET 4
 	SAVE_ALL
 	GET_THREAD_INFO(%ebp)
-	testl $TF_MASK,EFLAGS(%esp)
+	testl $TF_MASK,PT_EFLAGS(%esp)
 	jz no_singlestep
 	orl $_TIF_SINGLESTEP,TI_flags(%ebp)
 no_singlestep:
@@ -357,7 +341,7 @@ no_singlestep:
 	jae syscall_badsys
 syscall_call:
 	call *sys_call_table(,%eax,4)
-	movl %eax,EAX(%esp)		# store the return value
+	movl %eax,PT_EAX(%esp)		# store the return value
 syscall_exit:
 	DISABLE_INTERRUPTS		# make sure we don't miss an interrupt
 					# setting need_resched or sigpending
@@ -368,12 +352,12 @@ syscall_exit:
 	jne syscall_exit_work
 
 restore_all:
-	movl EFLAGS(%esp), %eax		# mix EFLAGS, SS and CS
-	# Warning: OLDSS(%esp) contains the wrong/random values if we
+	movl PT_EFLAGS(%esp), %eax	# mix EFLAGS, SS and CS
+	# Warning: PT_OLDSS(%esp) contains the wrong/random values if we
 	# are returning to the kernel.
 	# See comments in process.c:copy_thread() for details.
-	movb OLDSS(%esp), %ah
-	movb CS(%esp), %al
+	movb PT_OLDSS(%esp), %ah
+	movb PT_CS(%esp), %al
 	andl $(VM_MASK | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
 	cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
 	CFI_REMEMBER_STATE
@@ -400,7 +384,7 @@ iret_exc:
 
 	CFI_RESTORE_STATE
 ldt_ss:
-	larl OLDSS(%esp), %eax
+	larl PT_OLDSS(%esp), %eax
 	jnz restore_nocheck
 	testl $0x00400000, %eax		# returning to 32bit stack?
 	jnz restore_nocheck		# allright, normal return
@@ -450,7 +434,7 @@ work_resched:
 
 work_notifysig:				# deal with pending signals and
 					# notify-resume requests
-	testl $VM_MASK, EFLAGS(%esp)
+	testl $VM_MASK, PT_EFLAGS(%esp)
 	movl %esp, %eax
 	jne work_notifysig_v86		# returning to kernel-space or
 					# vm86-space
@@ -475,14 +459,14 @@ work_notifysig_v86:
 	# perform syscall exit tracing
 	ALIGN
 syscall_trace_entry:
-	movl $-ENOSYS,EAX(%esp)
+	movl $-ENOSYS,PT_EAX(%esp)
 	movl %esp, %eax
 	xorl %edx,%edx
 	call do_syscall_trace
 	cmpl $0, %eax
 	jne resume_userspace		# ret != 0 -> running under PTRACE_SYSEMU,
 					# so must skip actual syscall
-	movl ORIG_EAX(%esp), %eax
+	movl PT_ORIG_EAX(%esp), %eax
 	cmpl $(nr_syscalls), %eax
 	jnae syscall_call
 	jmp syscall_exit
@@ -507,11 +491,11 @@ syscall_fault:
 	CFI_ADJUST_CFA_OFFSET 4
 	SAVE_ALL
 	GET_THREAD_INFO(%ebp)
-	movl $-EFAULT,EAX(%esp)
+	movl $-EFAULT,PT_EAX(%esp)
 	jmp resume_userspace
 
 syscall_badsys:
-	movl $-ENOSYS,EAX(%esp)
+	movl $-ENOSYS,PT_EAX(%esp)
 	jmp resume_userspace
 	CFI_ENDPROC
 
@@ -634,10 +618,10 @@ error_code:
 	popl %ecx
 	CFI_ADJUST_CFA_OFFSET -4
 	/*CFI_REGISTER es, ecx*/
-	movl ES(%esp), %edi		# get the function address
-	movl ORIG_EAX(%esp), %edx	# get the error code
-	movl %eax, ORIG_EAX(%esp)
-	movl %ecx, ES(%esp)
+	movl PT_ES(%esp), %edi		# get the function address
+	movl PT_ORIG_EAX(%esp), %edx	# get the error code
+	movl %eax, PT_ORIG_EAX(%esp)
+	movl %ecx, PT_ES(%esp)
 	/*CFI_REL_OFFSET es, ES*/
 	movl $(__USER_DS), %ecx
 	movl %ecx, %ds
@@ -928,26 +912,26 @@ ENTRY(arch_unwind_init_running)
 	movl	4(%esp), %edx
 	movl	(%esp), %ecx
 	leal	4(%esp), %eax
-	movl	%ebx, EBX(%edx)
+	movl	%ebx, PT_EBX(%edx)
 	xorl	%ebx, %ebx
-	movl	%ebx, ECX(%edx)
-	movl	%ebx, EDX(%edx)
-	movl	%esi, ESI(%edx)
-	movl	%edi, EDI(%edx)
-	movl	%ebp, EBP(%edx)
-	movl	%ebx, EAX(%edx)
-	movl	$__USER_DS, DS(%edx)
-	movl	$__USER_DS, ES(%edx)
-	movl	%ebx, ORIG_EAX(%edx)
-	movl	%ecx, EIP(%edx)
+	movl	%ebx, PT_ECX(%edx)
+	movl	%ebx, PT_EDX(%edx)
+	movl	%esi, PT_ESI(%edx)
+	movl	%edi, PT_EDI(%edx)
+	movl	%ebp, PT_EBP(%edx)
+	movl	%ebx, PT_EAX(%edx)
+	movl	$__USER_DS, PT_DS(%edx)
+	movl	$__USER_DS, PT_ES(%edx)
+	movl	%ebx, PT_ORIG_EAX(%edx)
+	movl	%ecx, PT_EIP(%edx)
 	movl	12(%esp), %ecx
-	movl	$__KERNEL_CS, CS(%edx)
-	movl	%ebx, EFLAGS(%edx)
-	movl	%eax, OLDESP(%edx)
+	movl	$__KERNEL_CS, PT_CS(%edx)
+	movl	%ebx, PT_EFLAGS(%edx)
+	movl	%eax, PT_OLDESP(%edx)
 	movl	8(%esp), %eax
 	movl	%ecx, 8(%esp)
-	movl	EBX(%edx), %ebx
-	movl	$__KERNEL_DS, OLDSS(%edx)
+	movl	PT_EBX(%edx), %ebx
+	movl	$__KERNEL_DS, PT_OLDSS(%edx)
 	jmpl	*%eax
 	CFI_ENDPROC
 ENDPROC(arch_unwind_init_running)

--



* [PATCH 2/8] Basic definitions for i386-pda
@ 2006-08-30 23:52 ` Jeremy Fitzhardinge
From: Jeremy Fitzhardinge @ 2006-08-30 23:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chuck Ebbert, Zachary Amsden, Jan Beulich, Andi Kleen, Andrew Morton

[-- Attachment #1: i386-pda-definitions.patch --]
[-- Type: text/plain, Size: 4404 bytes --]

This patch has the basic definitions of struct i386_pda, and the
segment selector in the GDT.
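The selector value follows from the segment.h definitions.  A selector is `index << 3 | TI << 2 | RPL`, so GDT entry 27 with TI = 0 (GDT) and RPL = 0 gives 0xd8, the slot relabelled "PDA" in head.S.  GDT_ENTRY_KERNEL_BASE is 12 in segment.h, which is consistent with ESPFIX_SS (entry 26) sitting at 0xd0; the decoding helpers are hypothetical.

```c
#include <stdint.h>

/* From include/asm-i386/segment.h (GDT_ENTRY_KERNEL_BASE is 12 there). */
#define GDT_ENTRY_KERNEL_BASE	12
#define GDT_ENTRY_ESPFIX_SS	(GDT_ENTRY_KERNEL_BASE + 14)
#define GDT_ENTRY_PDA		(GDT_ENTRY_KERNEL_BASE + 15)
#define __ESPFIX_SS		(GDT_ENTRY_ESPFIX_SS * 8)
#define __KERNEL_PDA		(GDT_ENTRY_PDA * 8)

/* Selector layout: bits 15..3 index, bit 2 table indicator, bits 1..0 RPL. */
unsigned selector_index(uint16_t sel) { return sel >> 3; }
unsigned selector_ti(uint16_t sel)    { return (sel >> 2) & 1; }	/* 0 = GDT, 1 = LDT */
unsigned selector_rpl(uint16_t sel)   { return sel & 3; }
```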

asm-i386/pda.h is more or less a direct copy of asm-x86_64/pda.h.  The
most interesting difference is the use of _proxy_pda, which is used to
give gcc a model for the actual memory operations on the real pda
structure.  No actual reference is ever made to _proxy_pda, so it is
never defined.
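The accessor macros pick the instruction form from the field's size at compile time and pass the field's offset as an immediate, while `_proxy_pda` only tells gcc which memory each asm touches.  The sketch below reproduces just the size/offset dispatch in userspace C by formatting the instruction `read_pda()` would emit.  It uses a hypothetical 32-bit model of struct i386_pda; where the kernel reports an unsupported size through a link error on the undefined `__bad_pda_field()`, this sketch returns an error code instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical 32-bit model of struct i386_pda (pointers are 4 bytes on i386). */
struct i386_pda_model {
	uint32_t pcurrent;	/* struct task_struct * in the kernel */
	int32_t cpu_number;
};

/* Same decision as the sizeof() switch in pda_from_op()/pda_to_op(). */
char pda_size_suffix(size_t size)
{
	switch (size) {
	case 2:
		return 'w';
	case 4:
		return 'l';
	default:
		return 0;	/* the kernel would fail to link instead */
	}
}

/* Write e.g. "movl %gs:4,%eax" into buf; returns -1 for unsupported sizes. */
int format_pda_read(char *buf, size_t len, size_t size, size_t offset,
		    const char *reg)
{
	char suffix = pda_size_suffix(size);

	if (!suffix)
		return -1;
	return snprintf(buf, len, "mov%c %%gs:%zu,%%%s", suffix, offset, reg);
}

/* sizeof/offsetof never evaluate the null pointer, just as
 * sizeof(_proxy_pda.field) never needs _proxy_pda to exist. */
#define describe_read_pda(buf, len, field, reg)				\
	format_pda_read((buf), (len),					\
			sizeof(((struct i386_pda_model *)0)->field),	\
			offsetof(struct i386_pda_model, field), (reg))
```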

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>




---
 arch/i386/kernel/asm-offsets.c |    4 ++
 arch/i386/kernel/head.S        |    2 -
 include/asm-i386/pda.h         |   67 ++++++++++++++++++++++++++++++++++++++++
 include/asm-i386/segment.h     |    5 ++
 4 files changed, 76 insertions(+), 2 deletions(-)


===================================================================
--- a/arch/i386/kernel/asm-offsets.c
+++ b/arch/i386/kernel/asm-offsets.c
@@ -15,6 +15,7 @@
 #include <asm/processor.h>
 #include <asm/thread_info.h>
 #include <asm/elf.h>
+#include <asm/pda.h>
 
 #define DEFINE(sym, val) \
         asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@@ -91,4 +92,7 @@ void foo(void)
 	DEFINE(VDSO_PRELINK, VDSO_PRELINK);
 
 	OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
+
+	BLANK();
+	OFFSET(PDA_pcurrent, i386_pda, pcurrent);
 }
===================================================================
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -591,7 +591,7 @@ ENTRY(cpu_gdt_table)
 	.quad 0x004092000000ffff	/* 0xc8 APM DS    data */
 
 	.quad 0x0000920000000000	/* 0xd0 - ESPFIX 16-bit SS */
-	.quad 0x0000000000000000	/* 0xd8 - unused */
+	.quad 0x0000000000000000	/* 0xd8 - PDA */
 	.quad 0x0000000000000000	/* 0xe0 - unused */
 	.quad 0x0000000000000000	/* 0xe8 - unused */
 	.quad 0x0000000000000000	/* 0xf0 - unused */
===================================================================
--- a/include/asm-i386/segment.h
+++ b/include/asm-i386/segment.h
@@ -39,7 +39,7 @@
  *  25 - APM BIOS support 
  *
  *  26 - ESPFIX small SS
- *  27 - unused
+ *  27 - PDA				[ per-cpu private data area ]
  *  28 - unused
  *  29 - unused
  *  30 - unused
@@ -73,6 +73,9 @@
 
 #define GDT_ENTRY_ESPFIX_SS		(GDT_ENTRY_KERNEL_BASE + 14)
 #define __ESPFIX_SS (GDT_ENTRY_ESPFIX_SS * 8)
+
+#define GDT_ENTRY_PDA			(GDT_ENTRY_KERNEL_BASE + 15)
+#define __KERNEL_PDA (GDT_ENTRY_PDA * 8)
 
 #define GDT_ENTRY_DOUBLEFAULT_TSS	31
 
===================================================================
--- /dev/null
+++ b/include/asm-i386/pda.h
@@ -0,0 +1,67 @@
+#ifndef _I386_PDA_H
+#define _I386_PDA_H
+
+struct i386_pda
+{
+	struct task_struct *pcurrent;	/* current process */
+	int cpu_number;
+};
+
+extern struct i386_pda *_cpu_pda[];
+
+#define cpu_pda(i)	(_cpu_pda[i])
+
+#define pda_offset(field) offsetof(struct i386_pda, field)
+
+extern void __bad_pda_field(void);
+
+extern struct i386_pda _proxy_pda;
+
+#define pda_to_op(op,field,val)						\
+	do {								\
+		typedef typeof(_proxy_pda.field) T__;			\
+		switch (sizeof(_proxy_pda.field)) {			\
+		case 2:							\
+			asm(op "w %1,%%gs:%P2"				\
+			    : "+m" (_proxy_pda.field)			\
+			    :"ri" ((T__)val),				\
+			     "i"(pda_offset(field)));			\
+			break;						\
+		case 4:							\
+			asm(op "l %1,%%gs:%P2"				\
+			    : "+m" (_proxy_pda.field)			\
+			    :"ri" ((T__)val),				\
+			     "i"(pda_offset(field)));			\
+			break;						\
+		default: __bad_pda_field();				\
+		}							\
+	} while (0)
+
+#define pda_from_op(op,field)						\
+	({								\
+		typeof(_proxy_pda.field) ret__;				\
+		switch (sizeof(_proxy_pda.field)) {			\
+		case 2:							\
+			asm(op "w %%gs:%P1,%0"				\
+			    : "=r" (ret__)				\
+			    : "i" (pda_offset(field)),			\
+			      "m" (_proxy_pda.field));			\
+			break;						\
+		case 4:							\
+			asm(op "l %%gs:%P1,%0"				\
+			    : "=r" (ret__)				\
+			    : "i" (pda_offset(field)),			\
+			      "m" (_proxy_pda.field));			\
+			break;						\
+		default: __bad_pda_field();				\
+		}							\
+		ret__; })
+
+
+#define read_pda(field) pda_from_op("mov",field)
+#define write_pda(field,val) pda_to_op("mov",field,val)
+#define add_pda(field,val) pda_to_op("add",field,val)
+#define sub_pda(field,val) pda_to_op("sub",field,val)
+#define or_pda(field,val) pda_to_op("or",field,val)
+
+#endif	/* _I386_PDA_H */

--



* [PATCH 3/8] Initialize the per-CPU data area.
@ 2006-08-30 23:52 ` Jeremy Fitzhardinge
From: Jeremy Fitzhardinge @ 2006-08-30 23:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chuck Ebbert, Zachary Amsden, Jan Beulich, Andi Kleen, Andrew Morton

[-- Attachment #1: i386-pda-init.patch --]
[-- Type: text/plain, Size: 8176 bytes --]

When a CPU is brought up, a PDA and GDT are allocated for it.  The
GDT's __KERNEL_PDA entry is set to point at the allocated PDA memory,
so that all references using this segment descriptor refer to the PDA.
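init_gdt() fills the PDA slot with pack_descriptor(), using the PDA's address as base, sizeof(*pda) - 1 as limit, and type 0x80 | DESCTYPE_S | 0x2 = 0x92 (present, code/data, read/write).  The standalone version below assumes the standard x86 descriptor layout and mirrors the kernel helper's bit placement; the decoding helpers are hypothetical additions used to check the round trip.  The base 0xc0211000 is the example address from the cover letter, and 8 bytes is the size of struct i386_pda on i386 (one pointer and one int).

```c
#include <stdint.h>

#define DESCTYPE_S	0x10	/* code/data segment (not a system segment) */

/* Split base/limit/type/flags into the two 32-bit descriptor words. */
void pack_descriptor(uint32_t *a, uint32_t *b, uint32_t base,
		     uint32_t limit, uint8_t type, uint8_t flags)
{
	*a = ((base & 0xffff) << 16) | (limit & 0xffff);
	*b = (base & 0xff000000) | ((base & 0x00ff0000) >> 16) |
	     (limit & 0x000f0000) | ((uint32_t)type << 8) |
	     ((uint32_t)(flags & 0xf) << 20);
}

/* Reassemble the 32-bit base scattered across both words. */
uint32_t descriptor_base(uint32_t a, uint32_t b)
{
	return (a >> 16) | ((b & 0xff) << 16) | (b & 0xff000000);
}

/* Reassemble the 20-bit limit. */
uint32_t descriptor_limit(uint32_t a, uint32_t b)
{
	return (a & 0xffff) | (b & 0x000f0000);
}

/* The descriptor as one 64-bit value, as written in head.S. */
uint64_t descriptor_quad(uint32_t a, uint32_t b)
{
	return ((uint64_t)b << 32) | a;
}
```

Packing base 0, limit 0xffff, type 0x92 and flags 0x4 reproduces the APM DS entry 0x004092000000ffff from head.S, which is a handy sanity check on the bit placement.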

This patch rearranges CPU initialization a bit, so that the GDT/PDA
are set up as early as possible in cpu_init().  Also for secondary
CPUs, GDT+PDA are preallocated so the secondary CPU doesn't need to do
any memory allocation while setting up the PDA.  This will be
important once smp_processor_id() and current use the PDA.
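The allocation policy can be sketched in userspace: allocate a CPU's GDT and PDA before it is started, reuse whatever is already there on hotplug re-insertion, and report failure so the caller can give up.  calloc() stands in for get_zeroed_page() and kmalloc_node(), and NR_CPUS_MODEL, cpu_gdt_page[], cpu_pda_ptr[] and prealloc_cpu_areas() are hypothetical names.  One deliberate choice in this sketch: the error path frees only what the current call allocated, so a GDT or PDA kept from an earlier bring-up is never freed while still installed.

```c
#include <stdlib.h>

#define NR_CPUS_MODEL	4	/* hypothetical */
#define GDT_BYTES	4096	/* the patch uses one page per GDT */

struct i386_pda {
	void *pcurrent;
	int cpu_number;
};

static void *cpu_gdt_page[NR_CPUS_MODEL];
static struct i386_pda *cpu_pda_ptr[NR_CPUS_MODEL];

/* Returns 1 on success and 0 on failure, like alloc_gdt(). */
int prealloc_cpu_areas(int cpu)
{
	void *gdt = cpu_gdt_page[cpu];
	struct i386_pda *pda = cpu_pda_ptr[cpu];

	/* On hotplug re-insertion, keep the existing allocations. */
	if (gdt == NULL)
		gdt = calloc(1, GDT_BYTES);
	if (pda == NULL)
		pda = calloc(1, sizeof(*pda));

	if (gdt == NULL || pda == NULL) {
		/* Free only what this call allocated; free(NULL) is a no-op. */
		if (gdt != cpu_gdt_page[cpu])
			free(gdt);
		if (pda != cpu_pda_ptr[cpu])
			free(pda);
		return 0;
	}

	cpu_gdt_page[cpu] = gdt;
	cpu_pda_ptr[cpu] = pda;
	return 1;
}
```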

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/cpu/common.c |  159 +++++++++++++++++++++++++++++------------
 arch/i386/kernel/smpboot.c    |   10 ++
 include/asm-i386/processor.h  |    1 
 3 files changed, 125 insertions(+), 45 deletions(-)


===================================================================
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -18,6 +18,7 @@
 #include <asm/apic.h>
 #include <mach_apic.h>
 #endif
+#include <asm/pda.h>
 
 #include "cpu.h"
 
@@ -26,6 +27,9 @@ EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
 
 DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
 EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack);
+
+struct i386_pda *_cpu_pda[NR_CPUS] __read_mostly;
+EXPORT_SYMBOL(_cpu_pda);
 
 static int cachesize_override __cpuinitdata = -1;
 static int disable_x86_fxsr __cpuinitdata;
@@ -582,6 +586,112 @@ void __init early_cpu_init(void)
 	disable_pse = 1;
 #endif
 }
+
+__cpuinit int alloc_gdt(int cpu)
+{
+	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
+	struct desc_struct *gdt;
+	struct i386_pda *pda;
+
+	gdt = (struct desc_struct *)cpu_gdt_descr->address;
+	pda = cpu_pda(cpu);
+
+	/*
+	 * This is a horrible hack to allocate the GDT.  The problem
+	 * is that cpu_init() is called really early for the boot CPU
+	 * (and hence needs bootmem) but much later for the secondary
+	 * CPUs, when bootmem will have gone away
+	 */
+	if (NODE_DATA(0)->bdata->node_bootmem_map) {
+		BUG_ON(gdt != NULL || pda != NULL);
+
+		gdt = alloc_bootmem_pages(PAGE_SIZE);
+		pda = alloc_bootmem(sizeof(*pda));
+		/* alloc_bootmem(_pages) panics on failure, so no check */
+
+		memset(gdt, 0, PAGE_SIZE);
+		memset(pda, 0, sizeof(*pda));
+	} else {
+		/* GDT and PDA might already have been allocated if
+		   this is a CPU hotplug re-insertion. */
+		if (gdt == NULL)
+			gdt = (struct desc_struct *)get_zeroed_page(GFP_KERNEL);
+
+		if (pda == NULL)
+			pda = kmalloc_node(sizeof(*pda), GFP_KERNEL, cpu_to_node(cpu));
+
+		if (unlikely(!gdt || !pda)) {
+			free_pages((unsigned long)gdt, 0);
+			kfree(pda);
+			return 0;
+		}
+	}
+	
+ 	cpu_gdt_descr->address = (unsigned long)gdt;
+	cpu_pda(cpu) = pda;
+
+	return 1;
+}
+
+static __cpuinit void pda_init(int cpu, struct task_struct *curr)
+{
+	struct i386_pda *pda = cpu_pda(cpu);
+
+	memset(pda, 0, sizeof(*pda));
+
+	pda->cpu_number = cpu;
+	pda->pcurrent = curr;
+
+	printk("cpu %d current %p\n", cpu, curr);
+}
+
+/* Initialize the CPU's GDT and PDA */
+static __cpuinit void init_gdt(void)
+{
+	int cpu = smp_processor_id();
+	struct task_struct *curr = current;
+	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
+	__u32 stk16_off = (__u32)&per_cpu(cpu_16bit_stack, cpu);
+	struct desc_struct *gdt;
+	struct i386_pda *pda;
+
+	/* For non-boot CPUs, the GDT and PDA should already have been
+	   allocated. */
+	if (!alloc_gdt(cpu)) {
+		printk(KERN_CRIT "CPU%d failed to allocate GDT or PDA\n", cpu);
+		for (;;)
+			local_irq_enable();
+	}
+
+	gdt = (struct desc_struct *)cpu_gdt_descr->address;
+	pda = cpu_pda(cpu);
+
+	BUG_ON(gdt == NULL || pda == NULL);
+
+	/*
+	 * Initialize the per-CPU GDT with the boot GDT,
+	 * and set up the GDT descriptor:
+	 */
+ 	memcpy(gdt, cpu_gdt_table, GDT_SIZE);
+	cpu_gdt_descr->size = GDT_SIZE - 1;
+
+	/* Set up GDT entry for 16bit stack */
+ 	*(__u64 *)(&gdt[GDT_ENTRY_ESPFIX_SS]) |=
+		((((__u64)stk16_off) << 16) & 0x000000ffffff0000ULL) |
+		((((__u64)stk16_off) << 32) & 0xff00000000000000ULL) |
+		(CPU_16BIT_STACK_SIZE - 1);
+
+	pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
+			(u32 *)&gdt[GDT_ENTRY_PDA].b,
+			(unsigned long)pda, sizeof(*pda) - 1,
+			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
+
+	load_gdt(cpu_gdt_descr);
+
+	/* Do this once everything GDT-related has been set up. */
+	pda_init(cpu, curr);
+}
+
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
  * initialized (naturally) in the bootstrap process, such as the GDT
@@ -593,14 +703,16 @@ void __cpuinit cpu_init(void)
 	int cpu = smp_processor_id();
 	struct tss_struct * t = &per_cpu(init_tss, cpu);
 	struct thread_struct *thread = &current->thread;
-	struct desc_struct *gdt;
-	__u32 stk16_off = (__u32)&per_cpu(cpu_16bit_stack, cpu);
-	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
 
 	if (cpu_test_and_set(cpu, cpu_initialized)) {
 		printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
 		for (;;) local_irq_enable();
 	}
+
+	/* Init the GDT and PDA early, before calling printk(),
+	   since it may end up using the PDA indirectly. */
+	init_gdt();
+
 	printk(KERN_INFO "Initializing CPU#%d\n", cpu);
 
 	if (cpu_has_vme || cpu_has_tsc || cpu_has_de)
@@ -612,47 +724,6 @@ void __cpuinit cpu_init(void)
 		set_in_cr4(X86_CR4_TSD);
 	}
 
-	/* The CPU hotplug case */
-	if (cpu_gdt_descr->address) {
-		gdt = (struct desc_struct *)cpu_gdt_descr->address;
-		memset(gdt, 0, PAGE_SIZE);
-		goto old_gdt;
-	}
-	/*
-	 * This is a horrible hack to allocate the GDT.  The problem
-	 * is that cpu_init() is called really early for the boot CPU
-	 * (and hence needs bootmem) but much later for the secondary
-	 * CPUs, when bootmem will have gone away
-	 */
-	if (NODE_DATA(0)->bdata->node_bootmem_map) {
-		gdt = (struct desc_struct *)alloc_bootmem_pages(PAGE_SIZE);
-		/* alloc_bootmem_pages panics on failure, so no check */
-		memset(gdt, 0, PAGE_SIZE);
-	} else {
-		gdt = (struct desc_struct *)get_zeroed_page(GFP_KERNEL);
-		if (unlikely(!gdt)) {
-			printk(KERN_CRIT "CPU%d failed to allocate GDT\n", cpu);
-			for (;;)
-				local_irq_enable();
-		}
-	}
-old_gdt:
-	/*
-	 * Initialize the per-CPU GDT with the boot GDT,
-	 * and set up the GDT descriptor:
-	 */
- 	memcpy(gdt, cpu_gdt_table, GDT_SIZE);
-
-	/* Set up GDT entry for 16bit stack */
- 	*(__u64 *)(&gdt[GDT_ENTRY_ESPFIX_SS]) |=
-		((((__u64)stk16_off) << 16) & 0x000000ffffff0000ULL) |
-		((((__u64)stk16_off) << 32) & 0xff00000000000000ULL) |
-		(CPU_16BIT_STACK_SIZE - 1);
-
-	cpu_gdt_descr->size = GDT_SIZE - 1;
- 	cpu_gdt_descr->address = (unsigned long)gdt;
-
-	load_gdt(cpu_gdt_descr);
 	load_idt(&idt_descr);
 
 	/*
===================================================================
--- a/arch/i386/kernel/smpboot.c
+++ b/arch/i386/kernel/smpboot.c
@@ -536,7 +536,7 @@ static void __devinit start_secondary(vo
 static void __devinit start_secondary(void *unused)
 {
 	/*
-	 * Dont put anything before smp_callin(), SMP
+	 * Don't put *anything* before cpu_init(), SMP
 	 * booting is too fragile that we want to limit the
 	 * things done here to the most necessary things.
 	 */
@@ -931,6 +931,14 @@ static int __devinit do_boot_cpu(int api
 	int timeout;
 	unsigned long start_eip;
 	unsigned short nmi_high = 0, nmi_low = 0;
+
+	/* Pre-allocate the CPU's GDT and PDA so it doesn't have to do
+	   any memory allocation during the delicate CPU-bringup
+	   phase. */
+	if (!alloc_gdt(cpu)) {
+		printk(KERN_INFO "Couldn't allocate GDT/PDA for CPU %d\n", cpu);
+		return -1;	/* ? */
+	}
 
 	++cpucount;
 	alternatives_smp_switch(1);
===================================================================
--- a/include/asm-i386/processor.h
+++ b/include/asm-i386/processor.h
@@ -726,5 +726,6 @@ extern unsigned long boot_option_idle_ov
 extern unsigned long boot_option_idle_override;
 extern void enable_sep_cpu(void);
 extern int sysenter_setup(void);
+extern int alloc_gdt(int cpu);
 
 #endif /* __ASM_I386_PROCESSOR_H */

--



* [PATCH 4/8] Use %gs as the PDA base-segment in the kernel.
@ 2006-08-30 23:52 ` Jeremy Fitzhardinge
From: Jeremy Fitzhardinge @ 2006-08-30 23:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chuck Ebbert, Zachary Amsden, Jan Beulich, Andi Kleen, Andrew Morton

[-- Attachment #1: i386-pda-use-gs.patch --]
[-- Type: text/plain, Size: 13474 bytes --]

This patch is the meat of the PDA change.  It makes several
related changes:

1: Most significantly, %gs is now used in the kernel.  This means that on
   entry, the old value of %gs is saved away, and it is reloaded with
   __KERNEL_PDA.

2: entry.S constructs the stack in the shape of struct pt_regs, and this
   is passed around the kernel so that the process's saved register
   state can be accessed.

   Unfortunately struct pt_regs doesn't currently have space for %gs
   (or %fs). This patch extends pt_regs to add space for %gs (no space
   is allocated for %fs, since it won't be used, and an unused slot
   would only complicate the code in entry.S).

3: Because %gs is now saved on the stack like %ds, %es and the integer
   registers, there are a number of places where it no longer needs to
   be handled specially; namely context switch, and saving/restoring the
   register state in a signal context.

4: And since kernel threads run in kernel space and call normal kernel
   code, they need to be created with their %gs == __KERNEL_PDA.
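The effect of item 2 on the frame layout can be checked with offsetof: SAVE_ALL pushes %gs before %es, so the new xgs slot lands at 0x24, just below orig_eax, and every field from orig_eax upward moves by 4 bytes, matching the updated comment at the top of entry.S.  The struct models each slot as 32 bits to match an i386 build, and idle_regs_model() is a hypothetical stand-in for the idle_regs() helper added in this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define __KERNEL_PDA	0xd8	/* GDT entry 27 * 8, per segment.h in this series */

/* i386 pt_regs after this patch; every slot is 32 bits on i386. */
struct pt_regs_i386 {
	uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
	uint32_t xds, xes;
	uint32_t xgs;		/* new: pushed first by SAVE_ALL */
	uint32_t orig_eax, eip, xcs, eflags, esp, xss;
};

/* Idle/kernel threads must start with %gs already selecting the PDA. */
struct pt_regs_i386 *idle_regs_model(struct pt_regs_i386 *regs)
{
	memset(regs, 0, sizeof(*regs));
	regs->xgs = __KERNEL_PDA;
	return regs;
}
```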

NOTE: even though it's called "ptrace-abi.h", this file does not
define a user-space visible ABI.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/asm-offsets.c |    1 
 arch/i386/kernel/cpu/common.c  |   21 +++++++++-
 arch/i386/kernel/entry.S       |   81 +++++++++++++++++++++++-----------------
 arch/i386/kernel/process.c     |   27 ++++++-------
 arch/i386/kernel/signal.c      |    6 --
 include/asm-i386/mmu_context.h |    4 -
 include/asm-i386/processor.h   |    4 +
 include/asm-i386/ptrace-abi.h  |    2 
 kernel/fork.c                  |    2 
 9 files changed, 91 insertions(+), 57 deletions(-)


===================================================================
--- a/arch/i386/kernel/asm-offsets.c
+++ b/arch/i386/kernel/asm-offsets.c
@@ -68,6 +68,7 @@ void foo(void)
 	OFFSET(PT_EAX, pt_regs, eax);
 	OFFSET(PT_DS,  pt_regs, xds);
 	OFFSET(PT_ES,  pt_regs, xes);
+	OFFSET(PT_GS,  pt_regs, xgs);
 	OFFSET(PT_ORIG_EAX, pt_regs, orig_eax);
 	OFFSET(PT_EIP, pt_regs, eip);
 	OFFSET(PT_CS,  pt_regs, xcs);
===================================================================
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -587,6 +587,14 @@ void __init early_cpu_init(void)
 #endif
 }
 
+/* Make sure %gs is initialized properly in idle threads */
+struct pt_regs * __devinit idle_regs(struct pt_regs *regs)
+{
+	memset(regs, 0, sizeof(struct pt_regs));
+	regs->xgs = __KERNEL_PDA;
+	return regs;
+}
+
 __cpuinit int alloc_gdt(int cpu)
 {
 	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
@@ -645,6 +653,14 @@ static __cpuinit void pda_init(int cpu, 
 	printk("cpu %d current %p\n", cpu, curr);
 }
 
+static inline void set_kernel_gs(void)
+{
+	/* Set %gs for this CPU's PDA.  Memory clobber is to create a
+	   barrier with respect to any PDA operations, so the compiler
+	   doesn't move any before here. */
+	asm volatile ("mov %0, %%gs" : : "r" (__KERNEL_PDA) : "memory");
+}
+
 /* Initialize the CPU's GDT and PDA */
 static __cpuinit void init_gdt(void)
 {
@@ -687,6 +703,7 @@ static __cpuinit void init_gdt(void)
 			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
 
 	load_gdt(cpu_gdt_descr);
+	set_kernel_gs();
 
 	/* Do this once everything GDT-related has been set up. */
 	pda_init(cpu, curr);
@@ -745,8 +762,8 @@ void __cpuinit cpu_init(void)
 	__set_tss_desc(cpu, GDT_ENTRY_DOUBLEFAULT_TSS, &doublefault_tss);
 #endif
 
-	/* Clear %fs and %gs. */
-	asm volatile ("movl %0, %%fs; movl %0, %%gs" : : "r" (0));
+	/* Clear %fs. */
+	asm volatile ("mov %0, %%fs" : : "r" (0));
 
 	/* Clear all 6 debug registers: */
 	set_debugreg(0, 0);
===================================================================
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -30,12 +30,13 @@
  *	18(%esp) - %eax
  *	1C(%esp) - %ds
  *	20(%esp) - %es
- *	24(%esp) - orig_eax
- *	28(%esp) - %eip
- *	2C(%esp) - %cs
- *	30(%esp) - %eflags
- *	34(%esp) - %oldesp
- *	38(%esp) - %oldss
+ *	24(%esp) - %gs
+ *	28(%esp) - orig_eax
+ *	2C(%esp) - %eip
+ *	30(%esp) - %cs
+ *	34(%esp) - %eflags
+ *	38(%esp) - %oldesp
+ *	3C(%esp) - %oldss
  *
  * "current" is in register %ebx during any slow entries.
  */
@@ -91,6 +92,9 @@ 1:
 
 #define SAVE_ALL \
 	cld; \
+	pushl %gs; \
+	CFI_ADJUST_CFA_OFFSET 4;\
+	/*CFI_REL_OFFSET gs, 0;*/\
 	pushl %es; \
 	CFI_ADJUST_CFA_OFFSET 4;\
 	/*CFI_REL_OFFSET es, 0;*/\
@@ -120,8 +124,10 @@ 1:
 	CFI_REL_OFFSET ebx, 0;\
 	movl $(__USER_DS), %edx; \
 	movl %edx, %ds; \
-	movl %edx, %es;
-
+	movl %edx, %es; \
+	movl $(__KERNEL_PDA), %edx; \
+	movl %edx, %gs
+	
 #define RESTORE_INT_REGS \
 	popl %ebx;	\
 	CFI_ADJUST_CFA_OFFSET -4;\
@@ -153,17 +159,22 @@ 2:	popl %es;	\
 2:	popl %es;	\
 	CFI_ADJUST_CFA_OFFSET -4;\
 	/*CFI_RESTORE es;*/\
-.section .fixup,"ax";	\
-3:	movl $0,(%esp);	\
+3:	popl %gs;	\
+	CFI_ADJUST_CFA_OFFSET -4;\
+	/*CFI_RESTORE gs;*/\
+.pushsection .fixup,"ax";	\
+4:	movl $0,(%esp);	\
 	jmp 1b;		\
-4:	movl $0,(%esp);	\
+5:	movl $0,(%esp);	\
 	jmp 2b;		\
-.previous;		\
+6:	movl $0,(%esp);	\
+	jmp 3b;		\
 .section __ex_table,"a";\
 	.align 4;	\
-	.long 1b,3b;	\
-	.long 2b,4b;	\
-.previous
+	.long 1b,4b;	\
+	.long 2b,5b;	\
+	.long 3b,6b;	\
+.popsection
 
 #define RING0_INT_FRAME \
 	CFI_STARTPROC simple;\
@@ -223,6 +234,7 @@ check_userspace:
 	andl $(VM_MASK | SEGMENT_RPL_MASK), %eax
 	cmpl $USER_RPL, %eax
 	jb resume_kernel		# not returning to v8086 or userspace
+
 ENTRY(resume_userspace)
  	DISABLE_INTERRUPTS		# make sure we don't miss an interrupt
 					# setting need_resched or sigpending
@@ -314,13 +326,20 @@ 1:	movl (%ebp),%ebp
 	testw $_TIF_ALLWORK_MASK, %cx
 	jne syscall_exit_work
 /* if something modifies registers it must also disable sysexit */
+1:	mov  PT_GS(%esp), %gs
 	movl PT_EIP(%esp), %edx
 	movl PT_OLDESP(%esp), %ecx
 	xorl %ebp,%ebp
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS_SYSEXIT
 	CFI_ENDPROC
-
+.pushsection .fixup,"ax";	\
+2:	movl $0,PT_GS(%esp);	\
+	jmp 1b;			\
+.section __ex_table,"a";\
+	.align 4;	\
+	.long 1b,2b;	\
+.popsection
 
 	# system call handler stub
 ENTRY(system_call)
@@ -366,7 +385,7 @@ restore_nocheck:
 	TRACE_IRQS_IRET
 restore_nocheck_notrace:
 	RESTORE_REGS
-	addl $4, %esp
+	addl $4, %esp			# skip orig_eax/error_code
 	CFI_ADJUST_CFA_OFFSET -4
 1:	INTERRUPT_RETURN
 .section .fixup,"ax"
@@ -508,14 +527,12 @@ syscall_badsys:
 	/* put ESP to the proper location */ \
 	movl %eax, %esp;
 #define UNWIND_ESPFIX_STACK \
-	pushl %eax; \
 	CFI_ADJUST_CFA_OFFSET 4; \
 	movl %ss, %eax; \
 	/* see if on 16bit stack */ \
-	cmpw $__ESPFIX_SS, %ax; \
+	cmp $__ESPFIX_SS, %eax; \
 	je 28f; \
-27:	popl %eax; \
-	CFI_ADJUST_CFA_OFFSET -4; \
+27:	CFI_ADJUST_CFA_OFFSET -4; \
 .section .fixup,"ax"; \
 28:	movl $__KERNEL_DS, %eax; \
 	movl %eax, %ds; \
@@ -584,13 +601,15 @@ KPROBE_ENTRY(page_fault)
 	CFI_ADJUST_CFA_OFFSET 4
 	ALIGN
 error_code:
+	/* the function address is in %gs's slot on the stack */
+	pushl %es
+	CFI_ADJUST_CFA_OFFSET 4
 	pushl %ds
 	CFI_ADJUST_CFA_OFFSET 4
 	/*CFI_REL_OFFSET ds, 0*/
 	pushl %eax
 	CFI_ADJUST_CFA_OFFSET 4
 	CFI_REL_OFFSET eax, 0
-	xorl %eax, %eax
 	pushl %ebp
 	CFI_ADJUST_CFA_OFFSET 4
 	CFI_REL_OFFSET ebp, 0
@@ -603,7 +622,6 @@ error_code:
 	pushl %edx
 	CFI_ADJUST_CFA_OFFSET 4
 	CFI_REL_OFFSET edx, 0
-	decl %eax			# eax = -1
 	pushl %ecx
 	CFI_ADJUST_CFA_OFFSET 4
 	CFI_REL_OFFSET ecx, 0
@@ -611,21 +629,17 @@ error_code:
 	CFI_ADJUST_CFA_OFFSET 4
 	CFI_REL_OFFSET ebx, 0
 	cld
-	pushl %es
-	CFI_ADJUST_CFA_OFFSET 4
-	/*CFI_REL_OFFSET es, 0*/
 	UNWIND_ESPFIX_STACK
-	popl %ecx
-	CFI_ADJUST_CFA_OFFSET -4
-	/*CFI_REGISTER es, ecx*/
-	movl PT_ES(%esp), %edi		# get the function address
+	movl PT_GS(%esp), %edi		# get the function address
 	movl PT_ORIG_EAX(%esp), %edx	# get the error code
-	movl %eax, PT_ORIG_EAX(%esp)
-	movl %ecx, PT_ES(%esp)
-	/*CFI_REL_OFFSET es, ES*/
+	movl $-1, PT_ORIG_EAX(%esp)	# no syscall to restart
+	mov  %gs, PT_GS(%esp)
+	/*CFI_REL_OFFSET gs, GS*/
 	movl $(__USER_DS), %ecx
 	movl %ecx, %ds
 	movl %ecx, %es
+	movl $(__KERNEL_PDA), %ecx
+	movl %ecx, %gs
 	movl %esp,%eax			# pt_regs pointer
 	call *%edi
 	jmp ret_from_exception
@@ -935,6 +949,7 @@ ENTRY(arch_unwind_init_running)
 	movl	%ebx, PT_EAX(%edx)
 	movl	$__USER_DS, PT_DS(%edx)
 	movl	$__USER_DS, PT_ES(%edx)
+	movl	$0, PT_GS(%edx)
 	movl	%ebx, PT_ORIG_EAX(%edx)
 	movl	%ecx, PT_EIP(%edx)
 	movl	12(%esp), %ecx
===================================================================
--- a/arch/i386/kernel/process.c
+++ b/arch/i386/kernel/process.c
@@ -56,6 +56,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/cpu.h>
+#include <asm/pda.h>
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
@@ -340,6 +341,7 @@ int kernel_thread(int (*fn)(void *), voi
 
 	regs.xds = __USER_DS;
 	regs.xes = __USER_DS;
+	regs.xgs = __KERNEL_PDA;
 	regs.orig_eax = -1;
 	regs.eip = (unsigned long) kernel_thread_helper;
 	regs.xcs = __KERNEL_CS | get_kernel_rpl();
@@ -425,7 +427,6 @@ int copy_thread(int nr, unsigned long cl
 	p->thread.eip = (unsigned long) ret_from_fork;
 
 	savesegment(fs,p->thread.fs);
-	savesegment(gs,p->thread.gs);
 
 	tsk = current;
 	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
@@ -656,16 +657,16 @@ struct task_struct fastcall * __switch_t
 	load_esp0(tss, next);
 
 	/*
-	 * Save away %fs and %gs. No need to save %es and %ds, as
-	 * those are always kernel segments while inside the kernel.
-	 * Doing this before setting the new TLS descriptors avoids
-	 * the situation where we temporarily have non-reloadable
-	 * segments in %fs and %gs.  This could be an issue if the
-	 * NMI handler ever used %fs or %gs (it does not today), or
-	 * if the kernel is running inside of a hypervisor layer.
+	 * Save away %fs. No need to save %gs, as it was saved on the
+	 * stack on entry.  No need to save %es and %ds, as those are
+	 * always kernel segments while inside the kernel.  Doing this
+	 * before setting the new TLS descriptors avoids the situation
+	 * where we temporarily have non-reloadable segments in %fs
+	 * and %gs.  This could be an issue if the NMI handler ever
+	 * used %fs or %gs (it does not today), or if the kernel is
+	 * running inside of a hypervisor layer.
 	 */
 	savesegment(fs, prev->fs);
-	savesegment(gs, prev->gs);
 
 	/*
 	 * Load the per-thread Thread-Local Storage descriptor.
@@ -673,16 +674,14 @@ struct task_struct fastcall * __switch_t
 	load_TLS(next, cpu);
 
 	/*
-	 * Restore %fs and %gs if needed.
+	 * Restore %fs if needed.
 	 *
-	 * Glibc normally makes %fs be zero, and %gs is one of
-	 * the TLS segments.
+	 * Glibc normally makes %fs be zero.
 	 */
 	if (unlikely(prev->fs | next->fs))
 		loadsegment(fs, next->fs);
 
-	if (prev->gs | next->gs)
-		loadsegment(gs, next->gs);
+	write_pda(pcurrent, next_p);
 
 	/*
 	 * Restore IOPL if needed.
===================================================================
--- a/arch/i386/kernel/signal.c
+++ b/arch/i386/kernel/signal.c
@@ -128,7 +128,7 @@ restore_sigcontext(struct pt_regs *regs,
 			 X86_EFLAGS_TF | X86_EFLAGS_SF | X86_EFLAGS_ZF | \
 			 X86_EFLAGS_AF | X86_EFLAGS_PF | X86_EFLAGS_CF)
 
-	GET_SEG(gs);
+	COPY_SEG(gs);
 	GET_SEG(fs);
 	COPY_SEG(es);
 	COPY_SEG(ds);
@@ -244,9 +244,7 @@ setup_sigcontext(struct sigcontext __use
 {
 	int tmp, err = 0;
 
-	tmp = 0;
-	savesegment(gs, tmp);
-	err |= __put_user(tmp, (unsigned int __user *)&sc->gs);
+	err |= __put_user(regs->xgs, (unsigned int __user *)&sc->gs);
 	savesegment(fs, tmp);
 	err |= __put_user(tmp, (unsigned int __user *)&sc->fs);
 
===================================================================
--- a/include/asm-i386/mmu_context.h
+++ b/include/asm-i386/mmu_context.h
@@ -62,8 +62,8 @@ static inline void switch_mm(struct mm_s
 #endif
 }
 
-#define deactivate_mm(tsk, mm) \
-	asm("movl %0,%%fs ; movl %0,%%gs": :"r" (0))
+#define deactivate_mm(tsk, mm)			\
+	asm("movl %0,%%fs": :"r" (0));
 
 #define activate_mm(prev, next) \
 	switch_mm((prev),(next),NULL)
===================================================================
--- a/include/asm-i386/processor.h
+++ b/include/asm-i386/processor.h
@@ -473,6 +473,7 @@ struct thread_struct {
 	.vm86_info = NULL,						\
 	.sysenter_cs = __KERNEL_CS,					\
 	.io_bitmap_ptr = NULL,						\
+	.gs = __KERNEL_PDA,						\
 }
 
 /*
@@ -500,7 +501,8 @@ static inline void load_esp0(struct tss_
 }
 
 #define start_thread(regs, new_eip, new_esp) do {		\
-	__asm__("movl %0,%%fs ; movl %0,%%gs": :"r" (0));	\
+	__asm__("movl %0,%%fs": :"r" (0));			\
+	regs->xgs = 0;						\
 	set_fs(USER_DS);					\
 	regs->xds = __USER_DS;					\
 	regs->xes = __USER_DS;					\
===================================================================
--- a/include/asm-i386/ptrace-abi.h
+++ b/include/asm-i386/ptrace-abi.h
@@ -33,6 +33,8 @@ struct pt_regs {
 	long eax;
 	int  xds;
 	int  xes;
+	/* int  xfs; */
+	int  xgs;
 	long orig_eax;
 	long eip;
 	int  xcs;
===================================================================
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1298,7 +1298,7 @@ fork_out:
 	return ERR_PTR(retval);
 }
 
-struct pt_regs * __devinit __attribute__((weak)) idle_regs(struct pt_regs *regs)
+noinline struct pt_regs * __devinit __attribute__((weak)) idle_regs(struct pt_regs *regs)
 {
 	memset(regs, 0, sizeof(struct pt_regs));
 	return regs;

--



* [PATCH 5/8] Fix places where using %gs changes the usermode ABI.
  2006-08-30 23:52 [PATCH 0/8] Implement per-processor data areas for i386 Jeremy Fitzhardinge
                   ` (3 preceding siblings ...)
  2006-08-30 23:52 ` [PATCH 4/8] Use %gs as the PDA base-segment in the kernel Jeremy Fitzhardinge
@ 2006-08-30 23:52 ` Jeremy Fitzhardinge
  2006-08-31  7:11   ` Andi Kleen
  2006-08-30 23:52 ` [PATCH 6/8] Update sys_vm86 to cope with changed pt_regs and %gs usage Jeremy Fitzhardinge
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-30 23:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chuck Ebbert, Zachary Amsden, Jan Beulich, Andi Kleen, Andrew Morton

[-- Attachment #1: i386-pda-fix-abi.patch --]
[-- Type: text/plain, Size: 3559 bytes --]

There are a few places where the change in struct pt_regs and the use
of %gs affect the userspace ABI.  These are primarily debugging
interfaces (show_regs(), core dumps and ptrace) where thread state can
be inspected or extracted.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/process.c |    6 +++---
 arch/i386/kernel/ptrace.c  |   18 ++++++------------
 include/asm-i386/elf.h     |    2 +-
 include/asm-i386/unwind.h  |    1 +
 4 files changed, 11 insertions(+), 16 deletions(-)


===================================================================
--- a/arch/i386/kernel/process.c
+++ b/arch/i386/kernel/process.c
@@ -309,8 +309,8 @@ void show_regs(struct pt_regs * regs)
 		regs->eax,regs->ebx,regs->ecx,regs->edx);
 	printk("ESI: %08lx EDI: %08lx EBP: %08lx",
 		regs->esi, regs->edi, regs->ebp);
-	printk(" DS: %04x ES: %04x\n",
-		0xffff & regs->xds,0xffff & regs->xes);
+	printk(" DS: %04x ES: %04x GS: %04x\n",
+	       0xffff & regs->xds,0xffff & regs->xes, 0xffff & regs->xgs);
 
 	cr0 = read_cr0();
 	cr2 = read_cr2();
@@ -504,7 +504,7 @@ void dump_thread(struct pt_regs * regs, 
 	dump->regs.ds = regs->xds;
 	dump->regs.es = regs->xes;
 	savesegment(fs,dump->regs.fs);
-	savesegment(gs,dump->regs.gs);
+	dump->regs.gs = regs->xgs;
 	dump->regs.orig_eax = regs->orig_eax;
 	dump->regs.eip = regs->eip;
 	dump->regs.cs = regs->xcs;
===================================================================
--- a/arch/i386/kernel/ptrace.c
+++ b/arch/i386/kernel/ptrace.c
@@ -94,13 +94,9 @@ static int putreg(struct task_struct *ch
 				return -EIO;
 			child->thread.fs = value;
 			return 0;
-		case GS:
-			if (value && (value & 3) != 3)
-				return -EIO;
-			child->thread.gs = value;
-			return 0;
 		case DS:
 		case ES:
+		case GS:
 			if (value && (value & 3) != 3)
 				return -EIO;
 			value &= 0xffff;
@@ -116,8 +112,8 @@ static int putreg(struct task_struct *ch
 			value |= get_stack_long(child, EFL_OFFSET) & ~FLAG_MASK;
 			break;
 	}
-	if (regno > GS*4)
-		regno -= 2*4;
+	if (regno > ES*4)
+		regno -= 1*4;
 	put_stack_long(child, regno - sizeof(struct pt_regs), value);
 	return 0;
 }
@@ -131,18 +127,16 @@ static unsigned long getreg(struct task_
 		case FS:
 			retval = child->thread.fs;
 			break;
-		case GS:
-			retval = child->thread.gs;
-			break;
 		case DS:
 		case ES:
+		case GS:
 		case SS:
 		case CS:
 			retval = 0xffff;
 			/* fall through */
 		default:
-			if (regno > GS*4)
-				regno -= 2*4;
+			if (regno > ES*4)
+				regno -= 1*4;
 			regno = regno - sizeof(struct pt_regs);
 			retval &= get_stack_long(child, regno);
 	}
===================================================================
--- a/include/asm-i386/elf.h
+++ b/include/asm-i386/elf.h
@@ -88,7 +88,7 @@ typedef struct user_fxsr_struct elf_fpxr
 	pr_reg[7] = regs->xds;				\
 	pr_reg[8] = regs->xes;				\
 	savesegment(fs,pr_reg[9]);			\
-	savesegment(gs,pr_reg[10]);			\
+	pr_reg[10] = regs->xgs;				\
 	pr_reg[11] = regs->orig_eax;			\
 	pr_reg[12] = regs->eip;				\
 	pr_reg[13] = regs->xcs;				\
===================================================================
--- a/include/asm-i386/unwind.h
+++ b/include/asm-i386/unwind.h
@@ -64,6 +64,7 @@ static inline void arch_unw_init_blocked
 	info->regs.xss = __KERNEL_DS;
 	info->regs.xds = __USER_DS;
 	info->regs.xes = __USER_DS;
+	info->regs.xgs = __KERNEL_PDA;
 }
 
 extern asmlinkage int arch_unwind_init_running(struct unwind_frame_info *,

--



* [PATCH 6/8] Update sys_vm86 to cope with changed pt_regs and %gs usage.
  2006-08-30 23:52 [PATCH 0/8] Implement per-processor data areas for i386 Jeremy Fitzhardinge
                   ` (4 preceding siblings ...)
  2006-08-30 23:52 ` [PATCH 5/8] Fix places where using %gs changes the usermode ABI Jeremy Fitzhardinge
@ 2006-08-30 23:52 ` Jeremy Fitzhardinge
  2006-08-30 23:52 ` [PATCH 7/8] Implement smp_processor_id() with the PDA Jeremy Fitzhardinge
  2006-08-30 23:52 ` [PATCH 8/8] Implement "current" " Jeremy Fitzhardinge
  7 siblings, 0 replies; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-30 23:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chuck Ebbert, Zachary Amsden, Jan Beulich, Andi Kleen,
	Andrew Morton, Al Viro, Jason Baron, Chris Wright

[-- Attachment #1: i386-pda-fix-vm86.patch --]
[-- Type: text/plain, Size: 11932 bytes --]

sys_vm86 uses struct kernel_vm86_regs, which has the same layout as
pt_regs but appends extra slots for the segment registers (%es, %ds,
%fs, %gs) that iret pops when returning into vm86 mode.

Previously this structure was completely independent, so changes in
pt_regs had to be manually reflected in kernel_vm86_regs.  This change
just embeds pt_regs in kernel_vm86_regs, and makes the appropriate
changes to vm86.c to deal with the new naming.

Also, since %gs is handled differently in the kernel, this change
adjusts vm86.c accordingly.  Namely, usermode %gs is now kept in the
on-stack saved regs->xgs rather than in the CPU's %gs register.

While making these changes, I also cleaned up some frankly bizarre
code that was introduced when auditing support was added to sys_vm86.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>

---
 arch/i386/kernel/vm86.c |  125 ++++++++++++++++++++++++++++-------------------
 include/asm-i386/vm86.h |   17 ------
 2 files changed, 78 insertions(+), 64 deletions(-)


===================================================================
--- a/arch/i386/kernel/vm86.c
+++ b/arch/i386/kernel/vm86.c
@@ -43,6 +43,7 @@
 #include <linux/highmem.h>
 #include <linux/ptrace.h>
 #include <linux/audit.h>
+#include <linux/stddef.h>
 
 #include <asm/uaccess.h>
 #include <asm/io.h>
@@ -72,10 +73,10 @@
 /*
  * 8- and 16-bit register defines..
  */
-#define AL(regs)	(((unsigned char *)&((regs)->eax))[0])
-#define AH(regs)	(((unsigned char *)&((regs)->eax))[1])
-#define IP(regs)	(*(unsigned short *)&((regs)->eip))
-#define SP(regs)	(*(unsigned short *)&((regs)->esp))
+#define AL(regs)	(((unsigned char *)&((regs)->pt.eax))[0])
+#define AH(regs)	(((unsigned char *)&((regs)->pt.eax))[1])
+#define IP(regs)	(*(unsigned short *)&((regs)->pt.eip))
+#define SP(regs)	(*(unsigned short *)&((regs)->pt.esp))
 
 /*
  * virtual flags (16 and 32-bit versions)
@@ -89,10 +90,37 @@
 #define SAFE_MASK	(0xDD5)
 #define RETURN_MASK	(0xDFF)
 
-#define VM86_REGS_PART2 orig_eax
-#define VM86_REGS_SIZE1 \
-        ( (unsigned)( & (((struct kernel_vm86_regs *)0)->VM86_REGS_PART2) ) )
-#define VM86_REGS_SIZE2 (sizeof(struct kernel_vm86_regs) - VM86_REGS_SIZE1)
+/* convert kernel_vm86_regs to vm86_regs */
+static int copy_vm86_regs_to_user(struct vm86_regs __user *user,
+				  const struct kernel_vm86_regs *regs)
+{
+	int ret = 0;
+
+	/* kernel_vm86_regs is missing xfs, so copy everything up to
+	   (but not including) xgs, then xgs onwards into __null_gs. */
+	ret += copy_to_user(user, regs, offsetof(struct kernel_vm86_regs, pt.xgs));
+	ret += copy_to_user(&user->__null_gs, &regs->pt.xgs,
+			    sizeof(struct kernel_vm86_regs) -
+			    offsetof(struct kernel_vm86_regs, pt.xgs));
+
+	return ret;
+}
+
+/* convert vm86_regs to kernel_vm86_regs */
+static int copy_vm86_regs_from_user(struct kernel_vm86_regs *regs,
+				    const struct vm86_regs __user *user,
+				    unsigned extra)
+{
+	int ret = 0;
+
+	ret += copy_from_user(regs, user, offsetof(struct kernel_vm86_regs, pt.xgs));
+	ret += copy_from_user(&regs->pt.xgs, &user->__null_gs,
+			      sizeof(struct kernel_vm86_regs) -
+			      offsetof(struct kernel_vm86_regs, pt.xgs) +
+			      extra);
+
+	return ret;
+}
 
 struct pt_regs * FASTCALL(save_v86_state(struct kernel_vm86_regs * regs));
 struct pt_regs * fastcall save_v86_state(struct kernel_vm86_regs * regs)
@@ -112,10 +140,8 @@ struct pt_regs * fastcall save_v86_state
 		printk("no vm86_info: BAD\n");
 		do_exit(SIGSEGV);
 	}
-	set_flags(regs->eflags, VEFLAGS, VIF_MASK | current->thread.v86mask);
-	tmp = copy_to_user(&current->thread.vm86_info->regs,regs, VM86_REGS_SIZE1);
-	tmp += copy_to_user(&current->thread.vm86_info->regs.VM86_REGS_PART2,
-		&regs->VM86_REGS_PART2, VM86_REGS_SIZE2);
+	set_flags(regs->pt.eflags, VEFLAGS, VIF_MASK | current->thread.v86mask);
+	tmp = copy_vm86_regs_to_user(&current->thread.vm86_info->regs,regs);
 	tmp += put_user(current->thread.screen_bitmap,&current->thread.vm86_info->screen_bitmap);
 	if (tmp) {
 		printk("vm86: could not access userspace vm86_info\n");
@@ -129,9 +155,11 @@ struct pt_regs * fastcall save_v86_state
 	current->thread.saved_esp0 = 0;
 	put_cpu();
 
+	ret = KVM86->regs32;
+
 	loadsegment(fs, current->thread.saved_fs);
-	loadsegment(gs, current->thread.saved_gs);
-	ret = KVM86->regs32;
+	ret->xgs = current->thread.saved_gs;
+
 	return ret;
 }
 
@@ -183,9 +211,9 @@ asmlinkage int sys_vm86old(struct pt_reg
 	tsk = current;
 	if (tsk->thread.saved_esp0)
 		goto out;
-	tmp  = copy_from_user(&info, v86, VM86_REGS_SIZE1);
-	tmp += copy_from_user(&info.regs.VM86_REGS_PART2, &v86->regs.VM86_REGS_PART2,
-		(long)&info.vm86plus - (long)&info.regs.VM86_REGS_PART2);
+	tmp = copy_vm86_regs_from_user(&info.regs, &v86->regs,
+				       offsetof(struct kernel_vm86_struct, vm86plus) -
+				       sizeof(info.regs));
 	ret = -EFAULT;
 	if (tmp)
 		goto out;
@@ -233,9 +261,9 @@ asmlinkage int sys_vm86(struct pt_regs r
 	if (tsk->thread.saved_esp0)
 		goto out;
 	v86 = (struct vm86plus_struct __user *)regs.ecx;
-	tmp  = copy_from_user(&info, v86, VM86_REGS_SIZE1);
-	tmp += copy_from_user(&info.regs.VM86_REGS_PART2, &v86->regs.VM86_REGS_PART2,
-		(long)&info.regs32 - (long)&info.regs.VM86_REGS_PART2);
+	tmp = copy_vm86_regs_from_user(&info.regs, &v86->regs,
+				       offsetof(struct kernel_vm86_struct, regs32) -
+				       sizeof(info.regs));
 	ret = -EFAULT;
 	if (tmp)
 		goto out;
@@ -252,15 +280,15 @@ static void do_sys_vm86(struct kernel_vm
 static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk)
 {
 	struct tss_struct *tss;
-	long eax;
 /*
  * make sure the vm86() system call doesn't try to do anything silly
  */
-	info->regs.__null_ds = 0;
-	info->regs.__null_es = 0;
-
-/* we are clearing fs,gs later just before "jmp resume_userspace",
- * because starting with Linux 2.1.x they aren't no longer saved/restored
+	info->regs.pt.xds = 0;
+	info->regs.pt.xes = 0;
+	info->regs.pt.xgs = 0;
+
+/* we are clearing fs later just before "jmp resume_userspace",
+ * because it is not saved/restored.
  */
 
 /*
@@ -268,10 +296,10 @@ static void do_sys_vm86(struct kernel_vm
  * has set it up safely, so this makes sure interrupt etc flags are
  * inherited from protected mode.
  */
- 	VEFLAGS = info->regs.eflags;
-	info->regs.eflags &= SAFE_MASK;
-	info->regs.eflags |= info->regs32->eflags & ~SAFE_MASK;
-	info->regs.eflags |= VM_MASK;
+ 	VEFLAGS = info->regs.pt.eflags;
+	info->regs.pt.eflags &= SAFE_MASK;
+	info->regs.pt.eflags |= info->regs32->eflags & ~SAFE_MASK;
+	info->regs.pt.eflags |= VM_MASK;
 
 	switch (info->cpu_type) {
 		case CPU_286:
@@ -294,7 +322,7 @@ static void do_sys_vm86(struct kernel_vm
 	info->regs32->eax = 0;
 	tsk->thread.saved_esp0 = tsk->thread.esp0;
 	savesegment(fs, tsk->thread.saved_fs);
-	savesegment(gs, tsk->thread.saved_gs);
+	tsk->thread.saved_gs = info->regs32->xgs;
 
 	tss = &per_cpu(init_tss, get_cpu());
 	tsk->thread.esp0 = (unsigned long) &info->VM86_TSS_ESP0;
@@ -306,19 +334,18 @@ static void do_sys_vm86(struct kernel_vm
 	tsk->thread.screen_bitmap = info->screen_bitmap;
 	if (info->flags & VM86_SCREEN_BITMAP)
 		mark_screen_rdonly(tsk->mm);
-	__asm__ __volatile__("xorl %eax,%eax; movl %eax,%fs; movl %eax,%gs\n\t");
-	__asm__ __volatile__("movl %%eax, %0\n" :"=r"(eax));
 
 	/*call audit_syscall_exit since we do not exit via the normal paths */
 	if (unlikely(current->audit_context))
-		audit_syscall_exit(AUDITSC_RESULT(eax), eax);
+		audit_syscall_exit(AUDITSC_RESULT(0), 0);
 
 	__asm__ __volatile__(
 		"movl %0,%%esp\n\t"
 		"movl %1,%%ebp\n\t"
+		"mov  %2, %%fs\n\t"
 		"jmp resume_userspace"
 		: /* no outputs */
-		:"r" (&info->regs), "r" (task_thread_info(tsk)));
+		:"r" (&info->regs), "r" (task_thread_info(tsk)), "r" (0));
 	/* we never return here */
 }
 
@@ -348,12 +375,12 @@ static inline void clear_IF(struct kerne
 
 static inline void clear_TF(struct kernel_vm86_regs * regs)
 {
-	regs->eflags &= ~TF_MASK;
+	regs->pt.eflags &= ~TF_MASK;
 }
 
 static inline void clear_AC(struct kernel_vm86_regs * regs)
 {
-	regs->eflags &= ~AC_MASK;
+	regs->pt.eflags &= ~AC_MASK;
 }
 
 /* It is correct to call set_IF(regs) from the set_vflags_*
@@ -370,7 +397,7 @@ static inline void set_vflags_long(unsig
 static inline void set_vflags_long(unsigned long eflags, struct kernel_vm86_regs * regs)
 {
 	set_flags(VEFLAGS, eflags, current->thread.v86mask);
-	set_flags(regs->eflags, eflags, SAFE_MASK);
+	set_flags(regs->pt.eflags, eflags, SAFE_MASK);
 	if (eflags & IF_MASK)
 		set_IF(regs);
 	else
@@ -380,7 +407,7 @@ static inline void set_vflags_short(unsi
 static inline void set_vflags_short(unsigned short flags, struct kernel_vm86_regs * regs)
 {
 	set_flags(VFLAGS, flags, current->thread.v86mask);
-	set_flags(regs->eflags, flags, SAFE_MASK);
+	set_flags(regs->pt.eflags, flags, SAFE_MASK);
 	if (flags & IF_MASK)
 		set_IF(regs);
 	else
@@ -389,7 +416,7 @@ static inline void set_vflags_short(unsi
 
 static inline unsigned long get_vflags(struct kernel_vm86_regs * regs)
 {
-	unsigned long flags = regs->eflags & RETURN_MASK;
+	unsigned long flags = regs->pt.eflags & RETURN_MASK;
 
 	if (VEFLAGS & VIF_MASK)
 		flags |= IF_MASK;
@@ -493,7 +520,7 @@ static void do_int(struct kernel_vm86_re
 	unsigned long __user *intr_ptr;
 	unsigned long segoffs;
 
-	if (regs->cs == BIOSSEG)
+	if (regs->pt.xcs == BIOSSEG)
 		goto cannot_handle;
 	if (is_revectored(i, &KVM86->int_revectored))
 		goto cannot_handle;
@@ -505,9 +532,9 @@ static void do_int(struct kernel_vm86_re
 	if ((segoffs >> 16) == BIOSSEG)
 		goto cannot_handle;
 	pushw(ssp, sp, get_vflags(regs), cannot_handle);
-	pushw(ssp, sp, regs->cs, cannot_handle);
+	pushw(ssp, sp, regs->pt.xcs, cannot_handle);
 	pushw(ssp, sp, IP(regs), cannot_handle);
-	regs->cs = segoffs >> 16;
+	regs->pt.xcs = segoffs >> 16;
 	SP(regs) -= 6;
 	IP(regs) = segoffs & 0xffff;
 	clear_TF(regs);
@@ -524,7 +551,7 @@ int handle_vm86_trap(struct kernel_vm86_
 	if (VMPI.is_vm86pus) {
 		if ( (trapno==3) || (trapno==1) )
 			return_to_32bit(regs, VM86_TRAP + (trapno << 8));
-		do_int(regs, trapno, (unsigned char __user *) (regs->ss << 4), SP(regs));
+		do_int(regs, trapno, (unsigned char __user *) (regs->pt.xss << 4), SP(regs));
 		return 0;
 	}
 	if (trapno !=1)
@@ -560,10 +587,10 @@ void handle_vm86_fault(struct kernel_vm8
 		handle_vm86_trap(regs, 0, 1); \
 	return; } while (0)
 
-	orig_flags = *(unsigned short *)&regs->eflags;
-
-	csp = (unsigned char __user *) (regs->cs << 4);
-	ssp = (unsigned char __user *) (regs->ss << 4);
+	orig_flags = *(unsigned short *)&regs->pt.eflags;
+
+	csp = (unsigned char __user *) (regs->pt.xcs << 4);
+	ssp = (unsigned char __user *) (regs->pt.xss << 4);
 	sp = SP(regs);
 	ip = IP(regs);
 
@@ -650,7 +677,7 @@ void handle_vm86_fault(struct kernel_vm8
 			SP(regs) += 6;
 		}
 		IP(regs) = newip;
-		regs->cs = newcs;
+		regs->pt.xcs = newcs;
 		CHECK_IF_IN_TRAP;
 		if (data32) {
 			set_vflags_long(newflags, regs);
===================================================================
--- a/include/asm-i386/vm86.h
+++ b/include/asm-i386/vm86.h
@@ -145,26 +145,13 @@ struct vm86plus_struct {
  * at the end of the structure. Look at ptrace.h to see the "normal"
  * setup. For user space layout see 'struct vm86_regs' above.
  */
+#include <asm/ptrace-abi.h>
 
 struct kernel_vm86_regs {
 /*
  * normal regs, with special meaning for the segment descriptors..
  */
-	long ebx;
-	long ecx;
-	long edx;
-	long esi;
-	long edi;
-	long ebp;
-	long eax;
-	long __null_ds;
-	long __null_es;
-	long orig_eax;
-	long eip;
-	unsigned short cs, __csh;
-	long eflags;
-	long esp;
-	unsigned short ss, __ssh;
+	struct pt_regs pt;
 /*
  * these are specific to v86 mode:
  */

--



* [PATCH 7/8] Implement smp_processor_id() with the PDA.
  2006-08-30 23:52 [PATCH 0/8] Implement per-processor data areas for i386 Jeremy Fitzhardinge
                   ` (5 preceding siblings ...)
  2006-08-30 23:52 ` [PATCH 6/8] Update sys_vm86 to cope with changed pt_regs and %gs usage Jeremy Fitzhardinge
@ 2006-08-30 23:52 ` Jeremy Fitzhardinge
  2006-08-31 12:35   ` Ian Campbell
  2006-08-30 23:52 ` [PATCH 8/8] Implement "current" " Jeremy Fitzhardinge
  7 siblings, 1 reply; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-30 23:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chuck Ebbert, Zachary Amsden, Jan Beulich, Andi Kleen, Andrew Morton

[-- Attachment #1: i386-pda-smp_processor_id.patch --]
[-- Type: text/plain, Size: 4618 bytes --]

Use the cpu_number field in the PDA to implement raw_smp_processor_id().
This is a little simpler than going through thread_info, though the cpu
field in thread_info cannot be removed, since common code uses it for
things other than getting the current CPU.

The slightly subtle part of this patch is dealing with very early uses
of smp_processor_id().  This is handled on the boot CPU by setting up
a very early PDA, which is later replaced when cpu_init() is called on
the boot CPU.  For other CPUs, it uses the thread_info cpu field until
the PDA has been set up.

This is more or less an example of how to use the PDA, and serves to
give it a proper workout.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/asm-offsets.c |    2 +-
 arch/i386/kernel/cpu/common.c  |   18 ++++++++++++++++--
 include/asm-i386/smp.h         |    6 +++++-
 init/main.c                    |    9 +++++++--
 4 files changed, 29 insertions(+), 6 deletions(-)


===================================================================
--- a/arch/i386/kernel/asm-offsets.c
+++ b/arch/i386/kernel/asm-offsets.c
@@ -52,7 +52,6 @@ void foo(void)
 	OFFSET(TI_exec_domain, thread_info, exec_domain);
 	OFFSET(TI_flags, thread_info, flags);
 	OFFSET(TI_status, thread_info, status);
-	OFFSET(TI_cpu, thread_info, cpu);
 	OFFSET(TI_preempt_count, thread_info, preempt_count);
 	OFFSET(TI_addr_limit, thread_info, addr_limit);
 	OFFSET(TI_restart_block, thread_info, restart_block);
@@ -96,4 +95,5 @@ void foo(void)
 
 	BLANK();
 	OFFSET(PDA_pcurrent, i386_pda, pcurrent);
+	OFFSET(PDA_cpu, i386_pda, cpu_number);
 }
===================================================================
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -664,7 +664,7 @@ static inline void set_kernel_gs(void)
 /* Initialize the CPU's GDT and PDA */
 static __cpuinit void init_gdt(void)
 {
-	int cpu = smp_processor_id();
+	int cpu = early_smp_processor_id();
 	struct task_struct *curr = current;
 	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
 	__u32 stk16_off = (__u32)&per_cpu(cpu_16bit_stack, cpu);
@@ -707,6 +707,20 @@ static __cpuinit void init_gdt(void)
 
 	/* Do this once everything GDT-related has been set up. */
 	pda_init(cpu, curr);
+}
+
+/* Set up a very early PDA for the boot CPU so that smp_processor_id will work */
+void __init smp_setup_processor_id(void)
+{
+	static const __initdata struct i386_pda boot_pda;
+
+	pack_descriptor((u32 *)&cpu_gdt_table[GDT_ENTRY_PDA].a,
+			(u32 *)&cpu_gdt_table[GDT_ENTRY_PDA].b,
+			(unsigned long)&boot_pda, sizeof(struct i386_pda) - 1,
+			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
+
+	/* Set %gs for this CPU's PDA */
+	set_kernel_gs();
 }
 
 /*
@@ -717,7 +731,7 @@ static __cpuinit void init_gdt(void)
  */
 void __cpuinit cpu_init(void)
 {
-	int cpu = smp_processor_id();
+	int cpu = early_smp_processor_id();
 	struct tss_struct * t = &per_cpu(init_tss, cpu);
 	struct thread_struct *thread = &current->thread;
 
===================================================================
--- a/include/asm-i386/smp.h
+++ b/include/asm-i386/smp.h
@@ -8,6 +8,7 @@
 #include <linux/kernel.h>
 #include <linux/threads.h>
 #include <linux/cpumask.h>
+#include <asm/pda.h>
 #endif
 
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -58,7 +59,10 @@ extern void cpu_uninit(void);
  * from the initial startup. We map APIC_BASE very early in page_setup(),
  * so this is correct in the x86 case.
  */
-#define raw_smp_processor_id() (current_thread_info()->cpu)
+#define raw_smp_processor_id() (read_pda(cpu_number))
+/* This is valid from the very earliest point in boot that we care
+   about. */
+#define early_smp_processor_id() (current_thread_info()->cpu)
 
 extern cpumask_t cpu_callout_map;
 extern cpumask_t cpu_callin_map;
===================================================================
--- a/init/main.c
+++ b/init/main.c
@@ -473,8 +473,13 @@ static void __init boot_cpu_init(void)
 	cpu_set(cpu, cpu_possible_map);
 }
 
-void __init __attribute__((weak)) smp_setup_processor_id(void)
-{
+/* Some versions of gcc seem to want to inline/eliminate the call to
+   this function, even though it is weak and could therefore be
+   replaced at link time.  Mark it noinline, and add an asm() to make
+   it harder to digest. */
+noinline void __init __attribute__((weak)) smp_setup_processor_id(void)
+{
+	asm volatile("" : : : "memory");
 }
 
 asmlinkage void __init start_kernel(void)

--


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 8/8] Implement "current" with the PDA.
  2006-08-30 23:52 [PATCH 0/8] Implement per-processor data areas for i386 Jeremy Fitzhardinge
                   ` (6 preceding siblings ...)
  2006-08-30 23:52 ` [PATCH 7/8] Implement smp_processor_id() with the PDA Jeremy Fitzhardinge
@ 2006-08-30 23:52 ` Jeremy Fitzhardinge
  7 siblings, 0 replies; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-30 23:52 UTC
  To: linux-kernel
  Cc: Chuck Ebbert, Zachary Amsden, Jan Beulich, Andi Kleen, Andrew Morton

[-- Attachment #1: i386-pda-current.patch --]
[-- Type: text/plain, Size: 4105 bytes --]

Use the pcurrent field in the PDA to implement the "current" macro.
This ends up compiling down to a single instruction to get the current
task.

This keeps the original definition of "get_current()" with the name
"early_current()", for use before the PDA has been set up.  On the
boot CPU, "current" will always work, but on secondary CPUs, it needs
the PDA to be explicitly set up first.
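The split between the two accessors can be modelled in user space. This is a sketch only, assuming simplified stand-in types: `struct task`, `struct pda`, `cur_ti`, `cur_pda` and `setup_pda()` are hypothetical names, and a plain pointer stands in for the %gs segment base. Only `early_current()` and `get_current()` mirror names from the patch; their bodies here are simulations, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins: NOT the kernel's definitions. */
struct task { int pid; };
struct thread_info { struct task *task; int cpu; };
struct pda { struct task *pcurrent; int cpu_number; };

/* Simulated per-CPU state: the thread_info found via the stack, and a
 * PDA pointer that stays NULL until this CPU's "%gs" has been set up. */
static struct thread_info *cur_ti;
static struct pda *cur_pda;

/* Slow path, valid from early boot: walk thread_info (cf. early_current()). */
static struct task *early_current(void)
{
	return cur_ti->task;
}

/* Fast path: a single load from the PDA (cf. the new get_current()).
 * Only valid once the PDA has been initialized. */
static struct task *get_current(void)
{
	assert(cur_pda != NULL); /* calling this before setup is a bug */
	return cur_pda->pcurrent;
}

/* Mirrors the boot ordering in the patch: seed pcurrent from the
 * early path, then make the PDA live. */
static void setup_pda(struct pda *p)
{
	p->pcurrent = early_current();
	cur_pda = p;
}
```

The design point the sketch illustrates is that the fast accessor is only correct after `setup_pda()`, which is why code that runs earlier (such as `cpu_init()` on secondary CPUs) keeps using the thread_info path.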

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>

---
 arch/i386/kernel/cpu/common.c |   19 ++++++++++++-------
 arch/i386/kernel/smpboot.c    |    4 +++-
 include/asm-i386/current.h    |   10 ++++++++--
 3 files changed, 23 insertions(+), 10 deletions(-)


===================================================================
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -665,7 +665,7 @@ static __cpuinit void init_gdt(void)
 static __cpuinit void init_gdt(void)
 {
 	int cpu = early_smp_processor_id();
-	struct task_struct *curr = current;
+	struct task_struct *curr = early_current();
 	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
 	__u32 stk16_off = (__u32)&per_cpu(cpu_16bit_stack, cpu);
 	struct desc_struct *gdt;
@@ -709,15 +709,18 @@ static __cpuinit void init_gdt(void)
 	pda_init(cpu, curr);
 }
 
-/* Set up a very early PDA for the boot CPU so that smp_processor_id will work */
+/* Set up a very early PDA for the boot CPU so that smp_processor_id()
+   and current will work. */
 void __init smp_setup_processor_id(void)
 {
-	static const __initdata struct i386_pda boot_pda;
+	static __initdata struct i386_pda boot_pda;
 
 	pack_descriptor((u32 *)&cpu_gdt_table[GDT_ENTRY_PDA].a,
 			(u32 *)&cpu_gdt_table[GDT_ENTRY_PDA].b,
 			(unsigned long)&boot_pda, sizeof(struct i386_pda) - 1,
 			0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
+
+	boot_pda.pcurrent = early_current();
 
 	/* Set %gs for this CPU's PDA */
 	set_kernel_gs();
@@ -732,8 +735,10 @@ void __cpuinit cpu_init(void)
 void __cpuinit cpu_init(void)
 {
 	int cpu = early_smp_processor_id();
+	struct task_struct *curr = early_current();
+
 	struct tss_struct * t = &per_cpu(init_tss, cpu);
-	struct thread_struct *thread = &current->thread;
+	struct thread_struct *thread = &curr->thread;
 
 	if (cpu_test_and_set(cpu, cpu_initialized)) {
 		printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
@@ -761,10 +766,10 @@ void __cpuinit cpu_init(void)
 	 * Set up and load the per-CPU TSS and LDT
 	 */
 	atomic_inc(&init_mm.mm_count);
-	current->active_mm = &init_mm;
-	if (current->mm)
+	curr->active_mm = &init_mm;
+	if (curr->mm)
 		BUG();
-	enter_lazy_tlb(&init_mm, current);
+	enter_lazy_tlb(&init_mm, curr);
 
 	load_esp0(t, thread);
 	set_tss_desc(cpu,t);
===================================================================
--- a/arch/i386/kernel/smpboot.c
+++ b/arch/i386/kernel/smpboot.c
@@ -590,6 +590,8 @@ static void __devinit start_secondary(vo
  */
 void __devinit initialize_secondary(void)
 {
+	struct task_struct *curr = early_current();
+
 	/*
 	 * We don't actually need to load the full TSS,
 	 * basically just the stack pointer and the eip.
@@ -599,7 +601,7 @@ void __devinit initialize_secondary(void
 		"movl %0,%%esp\n\t"
 		"jmp *%1"
 		:
-		:"r" (current->thread.esp),"r" (current->thread.eip));
+		:"r" (curr->thread.esp),"r" (curr->thread.eip));
 }
 
 extern struct {
===================================================================
--- a/include/asm-i386/current.h
+++ b/include/asm-i386/current.h
@@ -2,14 +2,20 @@
 #define _I386_CURRENT_H
 
 #include <linux/thread_info.h>
+#include <asm/pda.h>
 
 struct task_struct;
 
-static __always_inline struct task_struct * get_current(void)
+static __always_inline struct task_struct *early_current(void)
 {
 	return current_thread_info()->task;
 }
- 
+
+static __always_inline struct task_struct *get_current(void)
+{
+	return read_pda(pcurrent);
+}
+
 #define current get_current()
 
 #endif /* !(_I386_CURRENT_H) */

--


* Re: [PATCH 5/8] Fix places where using %gs changes the usermode ABI.
  2006-08-30 23:52 ` [PATCH 5/8] Fix places where using %gs changes the usermode ABI Jeremy Fitzhardinge
@ 2006-08-31  7:11   ` Andi Kleen
  2006-08-31  7:22     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2006-08-31  7:11 UTC
  To: Jeremy Fitzhardinge
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich, Andrew Morton

On Thursday 31 August 2006 01:52, Jeremy Fitzhardinge wrote:
> ===================================================================
> --- a/arch/i386/kernel/ptrace.c
> +++ b/arch/i386/kernel/ptrace.c
> @@ -94,13 +94,9 @@ static int putreg(struct task_struct *ch


[...] So did you check that ESP, EIP, EFLAGS now come out correctly again? 
(e.g. do gdb and strace still work?)

-Andi

* Re: [PATCH 5/8] Fix places where using %gs changes the usermode ABI.
  2006-08-31  7:11   ` Andi Kleen
@ 2006-08-31  7:22     ` Jeremy Fitzhardinge
  2006-08-31  7:36       ` Andi Kleen
  0 siblings, 1 reply; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-31  7:22 UTC
  To: Andi Kleen
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich, Andrew Morton

Andi Kleen wrote:
> [...] So did you check that ESP, EIP, EFLAGS now come out correctly again? 
> (e.g. do gdb and strace still work?)
>   
Yep.

    J

* Re: [PATCH 5/8] Fix places where using %gs changes the usermode ABI.
  2006-08-31  7:22     ` Jeremy Fitzhardinge
@ 2006-08-31  7:36       ` Andi Kleen
  2006-08-31  8:04         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2006-08-31  7:36 UTC
  To: Jeremy Fitzhardinge
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich, Andrew Morton

On Thursday 31 August 2006 09:22, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > [...] So did you check that ESP, EIP, EFLAGS now come out correctly again? 
> > (e.g. do gdb and strace still work?)
> >   
> Yep.

Ok it looks good then. I would apply it, but it seems to require the paravirt
patchkit first? 

-Andi


* Re: [PATCH 5/8] Fix places where using %gs changes the usermode ABI.
  2006-08-31  7:36       ` Andi Kleen
@ 2006-08-31  8:04         ` Jeremy Fitzhardinge
  2006-08-31  8:13           ` Andi Kleen
  0 siblings, 1 reply; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-31  8:04 UTC
  To: Andi Kleen
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich, Andrew Morton

Andi Kleen wrote:
> On Thursday 31 August 2006 09:22, Jeremy Fitzhardinge wrote:
>   
>> Andi Kleen wrote:
>>     
>>> [...] So did you check that ESP, EIP, EFLAGS now come out correctly again? 
>>> (e.g. do gdb and strace still work?)
>>>   
>>>       
>> Yep.
>>     
>
> Ok it looks good then. I would apply it, but it seems to require the paravirt
> patchkit first?

No, it's against -rc4-mm3.  How much does it conflict with your tree?

    J

* Re: [PATCH 5/8] Fix places where using %gs changes the usermode ABI.
  2006-08-31  8:04         ` Jeremy Fitzhardinge
@ 2006-08-31  8:13           ` Andi Kleen
  2006-08-31  8:39             ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2006-08-31  8:13 UTC
  To: Jeremy Fitzhardinge
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich, Andrew Morton

On Thursday 31 August 2006 10:04, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > On Thursday 31 August 2006 09:22, Jeremy Fitzhardinge wrote:
> >   
> >> Andi Kleen wrote:
> >>     
> >>> [...] So did you check that ESP, EIP, EFLAGS now come out correctly again? 
> >>> (e.g. do gdb and strace still work?)
> >>>   
> >>>       
> >> Yep.
> >>     
> >
> > Ok it looks good then. I would apply it, but it seems to require the paravirt
> > patchkit first?
> 
> No, it's against -rc4-mm3.  How much does it conflict with your tree?

The first entry.S patch already threw 4 rejects or so.  I didn't try
further. I guess I'll take it together with the rest of the paravirt
stuff after the .19 merge.

-Andi


* Re: [PATCH 5/8] Fix places where using %gs changes the usermode ABI.
  2006-08-31  8:13           ` Andi Kleen
@ 2006-08-31  8:39             ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-31  8:39 UTC
  To: Andi Kleen
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich, Andrew Morton

Andi Kleen wrote:
> The first entry.S patch already threw 4 rejects or so.  I didn't try
> further. I guess I'll take it together with the rest of the paravirt
> stuff after the .19 merge.
>   

Oh, that's a conflict with the "cli -> DISABLE_INTERRUPTS" (etc) patch 
which is already in Andrew's tree.  It should be pretty straightforward
to wiggle around it.

    J

* Re: [PATCH 7/8] Implement smp_processor_id() with the PDA.
  2006-08-30 23:52 ` [PATCH 7/8] Implement smp_processor_id() with the PDA Jeremy Fitzhardinge
@ 2006-08-31 12:35   ` Ian Campbell
  2006-08-31 16:04     ` Jeremy Fitzhardinge
  2006-08-31 19:10     ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 25+ messages in thread
From: Ian Campbell @ 2006-08-31 12:35 UTC
  To: Jeremy Fitzhardinge
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich,
	Andi Kleen, Andrew Morton

Hi Jeremy,

On Wed, 2006-08-30 at 16:52 -0700, Jeremy Fitzhardinge wrote:
> --- a/arch/i386/kernel/cpu/common.c
> +++ b/arch/i386/kernel/cpu/common.c
> @@ -664,7 +664,7 @@ static inline void set_kernel_gs(void)
>  /* Initialize the CPU's GDT and PDA */
>  static __cpuinit void init_gdt(void)
>  {
> -       int cpu = smp_processor_id();
> +       int cpu = early_smp_processor_id();
>         struct task_struct *curr = current;
>         struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
>         __u32 stk16_off = (__u32)&per_cpu(cpu_16bit_stack, cpu); 

This doesn't compile for me if CONFIG_SMP=n

          LD      .tmp_vmlinux1
        arch/i386/kernel/built-in.o: In function `cpu_init':
        (.init.text+0x1eda): undefined reference to `early_smp_processor_id'
        arch/i386/kernel/built-in.o: In function `cpu_init':
        (.init.text+0x1f11): undefined reference to `early_smp_processor_id'
        
smp_processor_id() is defined for !SMP in include/linux/smp.h, I don't
know if it would be appropriate to add early_smp_processor_id() there
since it seems i386 specific. asm/smp.h isn't included by linux/smp.h
when !SMP but you could add an explicit include to common.c I suppose.

Ian.


* Re: [PATCH 7/8] Implement smp_processor_id() with the PDA.
  2006-08-31 12:35   ` Ian Campbell
@ 2006-08-31 16:04     ` Jeremy Fitzhardinge
  2006-08-31 19:10     ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-31 16:04 UTC
  To: Ian Campbell
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich,
	Andi Kleen, Andrew Morton, Chris Wright

Ian Campbell wrote:
> This doesn't compile for me if CONFIG_SMP=n
>   

Ah, good point.

>           LD      .tmp_vmlinux1
>         arch/i386/kernel/built-in.o: In function `cpu_init':
>         (.init.text+0x1eda): undefined reference to `early_smp_processor_id'
>         arch/i386/kernel/built-in.o: In function `cpu_init':
>         (.init.text+0x1f11): undefined reference to `early_smp_processor_id'
>         
> smp_processor_id() is defined for !SMP in include/linux/smp.h, I don't
> know if it would be appropriate to add early_smp_processor_id() there
> since it seems i386 specific. asm/smp.h isn't included by linux/smp.h
> when !SMP but you could add an explicit include to common.c I suppose.
>   

I'll have a look.

I think my preferred solution would be to get rid of all the early* 
stuff, and try to arrange to have the PDA set up before C code gets 
run.  For the boot CPU, it really could be done statically (I'm not 
quite sure why the boot CPU's GDT is allocated, given that it already 
has a static one; I think this might have been a Xen-related change?).  
The secondary CPUs could have their GDT+PDA completely allocated and 
initialized in advance, making secondary CPU PDA setup just a matter of 
doing lgdt and setting %gs in head.S, even before hitting C code.

    J

* Re: [PATCH 7/8] Implement smp_processor_id() with the PDA.
  2006-08-31 12:35   ` Ian Campbell
  2006-08-31 16:04     ` Jeremy Fitzhardinge
@ 2006-08-31 19:10     ` Jeremy Fitzhardinge
  2006-08-31 21:34       ` Ian Campbell
  1 sibling, 1 reply; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-31 19:10 UTC
  To: Ian Campbell
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich,
	Andi Kleen, Andrew Morton

Ian Campbell wrote:
> smp_processor_id() is defined for !SMP in include/linux/smp.h, I don't
> know if it would be appropriate to add early_smp_processor_id() there
> since it seems i386 specific. asm/smp.h isn't included by linux/smp.h
> when !SMP but you could add an explicit include to common.c I suppose.
>   
The simple solution is to just define a !SMP version of 
early_smp_processor_id().  It's i386 specific, but that's the only arch 
that uses it:

diff -r 8a89489b3734 include/asm-i386/smp.h
--- a/include/asm-i386/smp.h    Thu Aug 31 12:06:44 2006 -0700
+++ b/include/asm-i386/smp.h    Thu Aug 31 12:07:48 2006 -0700
@@ -98,6 +98,7 @@ extern unsigned int num_processors;
 #else /* CONFIG_SMP */
 
 #define safe_smp_processor_id()                0
+#define early_smp_processor_id()       0
 #define cpu_physical_id(cpu)           boot_cpu_physical_apicid
 
 #define NO_PROC_ID             0xFF            /* No processor magic marker */
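The UP/SMP fallback pattern can be sketched in isolation. `FAKE_CONFIG_SMP` and `fake_ti_cpu` are hypothetical stand-ins for `CONFIG_SMP` and `current_thread_info()->cpu`. The point is that callers compile in both configurations, but only if the header carrying the macro is actually included in the translation unit.

```c
#include <assert.h>

/* Define FAKE_CONFIG_SMP to mimic an SMP build; leave it undefined for UP. */
#ifdef FAKE_CONFIG_SMP
static int fake_ti_cpu = 3; /* plays the role of current_thread_info()->cpu */
#define early_smp_processor_id() (fake_ti_cpu)
#else
/* UP: there is only CPU 0, so the macro can be a plain constant. */
#define early_smp_processor_id() 0
#endif

/* A caller written once, which builds in either configuration. */
static int cpu_for_init(void)
{
	return early_smp_processor_id();
}
```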

    J

* Re: [PATCH 7/8] Implement smp_processor_id() with the PDA.
  2006-08-31 19:10     ` Jeremy Fitzhardinge
@ 2006-08-31 21:34       ` Ian Campbell
  2006-08-31 21:39         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2006-08-31 21:34 UTC
  To: Jeremy Fitzhardinge
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich,
	Andi Kleen, Andrew Morton

On Thu, 2006-08-31 at 12:10 -0700, Jeremy Fitzhardinge wrote:
> Ian Campbell wrote:
> > smp_processor_id() is defined for !SMP in include/linux/smp.h, I don't
> > know if it would be appropriate to add early_smp_processor_id() there
> > since it seems i386 specific. asm/smp.h isn't included by linux/smp.h
> > when !SMP but you could add an explicit include to common.c I suppose.
> >   
> The simple solution is to just define a !SMP version of 
> early_smp_processor_id().  It's i386 specific, but that's the only arch 
> that uses it:

Are you sure that works? When I tried it didn't. I think because
asm/smp.h isn't included by linux/smp.h for !SMP.

I needed the below to make it work, but including linux/smp.h and
asm/smp.h in the same file smells a bit fishy to me... Probably
acceptable for now if you are thinking of redoing SMP processor bringup
anyway.

diff -r fa530c593b97 arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c     Thu Aug 31 22:28:11 2006 +0100
+++ b/arch/i386/kernel/cpu/common.c     Thu Aug 31 22:33:08 2006 +0100
@@ -13,6 +13,7 @@
 #include <asm/mmu_context.h>
 #include <asm/mtrr.h>
 #include <asm/mce.h>
+#include <asm/smp.h>
 #ifdef CONFIG_X86_LOCAL_APIC
 #include <asm/mpspec.h>
 #include <asm/apic.h>

Ian.



* Re: [PATCH 7/8] Implement smp_processor_id() with the PDA.
  2006-08-31 21:34       ` Ian Campbell
@ 2006-08-31 21:39         ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-08-31 21:39 UTC
  To: Ian Campbell
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich,
	Andi Kleen, Andrew Morton

Ian Campbell wrote:
> Are you sure that works? When I tried it didn't. I think because
> asm/smp.h isn't included by linux/smp.h for !SMP.
>   

Nah, testing is overrated.

> I needed the below to make it work, but including linux/smp.h and
> asm/smp.h in the same file smells a bit fishy to me... Probably
> acceptable for now if you are thinking of redoing SMP processor bringup
> anyway.
>   

That looks OK for now.  Rearranging CPU bringup looks a little bit 
complex to do immediately, so this seems like a reasonable fix for now.

    J

* Re: [PATCH 0/8] Implement per-processor data areas for i386.
  2006-09-01  8:30     ` Andi Kleen
@ 2006-09-01 19:08       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-01 19:08 UTC
  To: Andi Kleen
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich, Andrew Morton

Andi Kleen wrote:
>>> There unfortunately were still quite a lot of rejects because -mm* 
>>> is too different from mainline, but I fixed them all.
>>>   
>>>       
>> Thanks.  Were there more conflicts than entry.S?
>>     
>
> Yes ptrace-abi.h doesn't exist 

That's a bit of a surprise.  I've been working against -mm, so I hadn't 
noticed that it wasn't in mainline.  I had assumed the name was old and 
historical given how inaccurate it is.  I wonder what patch it's part of...

> and the ""s in the Subject of your last patch caused 
> quilt to freak out. I think there was one other too.
>
> I hope everything still works. At least one of my test machines 
> is currently completely unhappy on i386 with random hangs (even before 
> your patches), still bisecting it.
>   

Good luck.  The PDA/CPU startup stuff is all very touchy (I took your 
advice and had good success debugging it with simnow), but once you're 
past that it either works or it doesn't.

    J


* Re: [PATCH 0/8] Implement per-processor data areas for i386.
  2006-09-01  8:26   ` Jeremy Fitzhardinge
@ 2006-09-01  8:30     ` Andi Kleen
  2006-09-01 19:08       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2006-09-01  8:30 UTC
  To: Jeremy Fitzhardinge
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich, Andrew Morton

On Friday 01 September 2006 10:26, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > I applied it now, with one change. I replaced the %Ps with %cs because
> > that is apparently the more official way to do that in gcc. Please
> > change that in your copy too.
> >   
> 
> Do you mean the %P0, etc in the asms?

Yes.

> > There unfortunately were still quite a lot of rejects because -mm* 
> > is too different from mainline, but I fixed them all.
> >   
> Thanks.  Were there more conflicts than entry.S?

Yes ptrace-abi.h doesn't exist and the ""s in the Subject of your last patch caused 
quilt to freak out. I think there was one other too.

I hope everything still works. At least one of my test machines 
is currently completely unhappy on i386 with random hangs (even before 
your patches), still bisecting it.

-Andi
     

* Re: [PATCH 0/8] Implement per-processor data areas for i386.
  2006-09-01  8:16 ` Andi Kleen
@ 2006-09-01  8:26   ` Jeremy Fitzhardinge
  2006-09-01  8:30     ` Andi Kleen
  0 siblings, 1 reply; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-01  8:26 UTC
  To: Andi Kleen
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich, Andrew Morton

Andi Kleen wrote:
> I applied it now, with one change. I replaced the %Ps with %cs because
> that is apparently the more official way to do that in gcc. Please
> change that in your copy too.
>   

Do you mean the %P0, etc in the asms?

> There unfortunately were still quite a lot of rejects because -mm* 
> is too different from mainline, but I fixed them all.
>   
Thanks.  Were there more conflicts than entry.S?

    J

* Re: [PATCH 0/8] Implement per-processor data areas for i386.
  2006-09-01  6:47 [PATCH 0/8] Implement per-processor data areas for i386 Jeremy Fitzhardinge
@ 2006-09-01  8:16 ` Andi Kleen
  2006-09-01  8:26   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 25+ messages in thread
From: Andi Kleen @ 2006-09-01  8:16 UTC
  To: Jeremy Fitzhardinge
  Cc: linux-kernel, Chuck Ebbert, Zachary Amsden, Jan Beulich, Andrew Morton

On Friday 01 September 2006 08:47, Jeremy Fitzhardinge wrote:
> [ Changes since previous post:
>   - fixed UP build
>   - make compiler type-check for writes to PDA
>   - added pda_addr() to get the address of PDA fields ]

I applied it now, with one change. I replaced the %Ps with %cs because
that is apparently the more official way to do that in gcc. Please
change that in your copy too.
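For readers unfamiliar with the modifiers being discussed: in GCC's x86 inline asm, the `c` operand modifier prints a constant operand as a bare number, without the `$` immediate prefix, so it can serve as an address displacement. A minimal user-space sketch, assuming an x86 host (with a plain C fallback elsewhere); `struct fake_pda` and `read_counter()` are hypothetical, and the sketch uses a base register rather than %gs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout standing in for a per-CPU data area. */
struct fake_pda {
	int cpu_number;
	int counter;
};

/* Read base->counter with the field offset baked into the instruction.
 * In "mov %c1(%2), %0", %c1 prints operand 1 as a bare constant,
 * so the template assembles to something like "mov 4(%rdx), %eax". */
static int read_counter(const struct fake_pda *base)
{
	int val;
#if defined(__x86_64__) || defined(__i386__)
	__asm__("mov %c1(%2), %0"
		: "=r"(val)
		: "i"(offsetof(struct fake_pda, counter)), "r"(base),
		  "m"(*base)); /* tells the compiler the asm reads *base */
#else
	val = base->counter; /* non-x86 fallback so the sketch still builds */
#endif
	return val;
}
```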

There unfortunately were still quite a lot of rejects because -mm* 
is too different from mainline, but I fixed them all.

-Andi


* [PATCH 0/8] Implement per-processor data areas for i386.
@ 2006-09-01  6:47 Jeremy Fitzhardinge
  2006-09-01  8:16 ` Andi Kleen
  0 siblings, 1 reply; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-01  6:47 UTC
  To: linux-kernel
  Cc: Chuck Ebbert, Zachary Amsden, Jan Beulich, Andi Kleen, Andrew Morton

[ Changes since previous post:
  - fixed UP build
  - make compiler type-check for writes to PDA
  - added pda_addr() to get the address of PDA fields ]

This patch implements per-processor data areas by using %gs as the
base segment of the per-processor memory.  This has two principal
advantages:

- It allows very simple direct access to per-processor data by
  using an effective address of the form %gs:offset, where
  offset is the offset into struct i386_pda.  These sequences are faster
  and smaller than the current mechanism using current_thread_info().

- It also allows per-CPU data to be allocated as each CPU is brought
  up, rather than statically allocating it based on the maximum number
  of CPUs which could be brought up.
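The second advantage, allocating each CPU's area at bring-up time, can be sketched as follows. This is a simplified illustration, not the patch's code: `struct fake_pda`, `MAX_CPUS`, `cpu_pda[]` and `bring_up_cpu()` are hypothetical, and in the real patch the area's address ends up as the base of a descriptor in that CPU's GDT rather than in a pointer table.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-CPU record standing in for struct i386_pda. */
struct fake_pda {
	int cpu_number;
	void *pcurrent;
};

#define MAX_CPUS 64 /* assumed bound; only sizes the pointer table */

/* Only pointers are reserved up front; each full area is allocated
 * when its CPU comes up, not MAX_CPUS structures at build time. */
static struct fake_pda *cpu_pda[MAX_CPUS];

static struct fake_pda *bring_up_cpu(int cpu)
{
	struct fake_pda *p;

	if (cpu < 0 || cpu >= MAX_CPUS || cpu_pda[cpu] != NULL)
		return NULL; /* out of range, or already brought up */
	p = calloc(1, sizeof(*p));
	if (p == NULL)
		return NULL;
	p->cpu_number = cpu;
	cpu_pda[cpu] = p;
	return p;
}
```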

Performance:

I've done some simple performance tests on an Intel Core Duo running
at 1GHz (to emphasize any performance delta).  The results for the
lmbench null syscall latency test, which should show the most negative
effect from this change, show a ~9ns decline (.237uS -> .245uS).
This corresponds to around 9 CPU cycles, and correlates well with
the addition of the push/load/pop %gs into the hot path.
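As a rough cross-check of the cycle count: at a 1GHz clock one cycle takes 1ns, so the latency delta converts directly into cycles. Both latencies are quoted rounded to the nearest nanosecond, so the true difference could lie anywhere from about 7ns to 9ns, which brackets the figures above.

```latex
\begin{aligned}
\Delta t &\approx 0.245\,\mu\mathrm{s} - 0.237\,\mu\mathrm{s} = 0.008\,\mu\mathrm{s} = 8\,\mathrm{ns} \\
N_{\text{cycles}} &= \Delta t \cdot f = 8\times10^{-9}\,\mathrm{s} \times 1\times10^{9}\,\mathrm{Hz} = 8
\end{aligned}
```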

I have not yet measured the effect on other types of processor or
more complex syscalls (though I would expect the push/pop overhead
would be drowned by longer times spent in the kernel, and mitigated by
actual use of the PDA).

The size improvements on the kernel text are nice as well: 
    2889361 -> 2883936 = 5425 bytes saved


Some background for people unfamiliar with x86 segmentation:

This uses the x86 segmentation stuff in a way similar to NPTL's way of
implementing Thread-Local Storage.  It relies on the fact that each CPU
has its own Global Descriptor Table (GDT), which is basically an array
of base-length pairs (with some extra stuff).  When a segment register
is loaded with a selector (essentially an index into the GDT), and
you use that segment register for memory access, the address has the
base added to it, and the resulting address is used.

In other words, if you imagine the GDT containing an entry:
	Index	Base
	123:	0xc0211000 (allocated PDA)
and you load %gs with the selector for that entry (the index shifted
left by 3; a segment register can't be loaded from an immediate):
	mov $(123 << 3), %ax
	mov %ax, %gs
and then use %gs later on:
	mov %gs:4, %eax
this has the effect of
	mov 0xc0211004, %eax
and because the GDT is per-CPU, the base (= 0xc0211000 = memory
allocated for this CPU's PDA) can be a CPU-specific value while leaving
everything else constant.
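The descriptor lookup described above can be modelled in a few lines of C. This is a simplified sketch: real descriptors pack base, limit and flag bits into 8 bytes, limit granularity is ignored, and `struct seg_desc`, `struct cpu`, `GDT_ENTRIES` and `gs_linear()` are hypothetical names. The GDT index is taken as the selector shifted right by 3 (the low bits hold the table indicator and privilege level, ignored here).

```c
#include <assert.h>
#include <stdint.h>

/* Simplified descriptor: just a base and a limit (no flag bits). */
struct seg_desc {
	uint32_t base;
	uint32_t limit; /* highest valid offset within the segment */
};

#define GDT_ENTRIES 256 /* hypothetical table size for the sketch */

struct cpu {
	struct seg_desc gdt[GDT_ENTRIES]; /* each CPU has its own GDT */
	uint16_t gs;                      /* selector currently in %gs */
};

/* Translate %gs:offset to a linear address, or return 0 on a limit
 * violation (where real hardware would raise a fault instead). */
static uint32_t gs_linear(const struct cpu *c, uint32_t offset)
{
	const struct seg_desc *d = &c->gdt[c->gs >> 3]; /* index = selector >> 3 */

	if (offset > d->limit)
		return 0;
	return d->base + offset;
}
```

Giving two CPUs the same selector but different bases (for example 0xc0211000 and a hypothetical 0xc0212000) makes the same `%gs:4` resolve to a different address on each, which is the property the PDA relies on.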

This means that something like "current" or "smp_processor_id()" can
collapse to a single instruction:
	mov %gs:PDA_current, %reg
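A user-space approximation of that single-load access pattern, assuming a plain pointer (`sim_gs_base`) in place of the %gs base. `struct sim_pda`, `sim_read_pda()` and `sim_load_gs()` are hypothetical; the series' own accessors instead emit one %gs-relative `mov`. `__typeof__` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PDA layout for the sketch (not struct i386_pda). */
struct sim_pda {
	int cpu_number;
	void *pcurrent;
};

/* Stand-in for the %gs segment base: this CPU's PDA address. */
static char *sim_gs_base;

/* Read a field as one load at a compile-time offset from the base,
 * the plain-C analogue of "mov %gs:offset, %reg". */
#define sim_read_pda(field)                                   \
	(*(__typeof__(((struct sim_pda *)0)->field) *)         \
	  (sim_gs_base + offsetof(struct sim_pda, field)))

/* "Per-CPU switch": point the simulated segment base at another PDA. */
static void sim_load_gs(struct sim_pda *pda)
{
	sim_gs_base = (char *)pda;
}
```

The same `sim_read_pda(pcurrent)` expression yields a different task depending only on which PDA the base points at, mirroring how identical kernel code reads a CPU-specific value.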


TODO: 
- Modify more things to use the PDA.  The more that uses it, the more
  the cost of the %gs save/restore is amortized.  smp_processor_id and
  current are the obvious first choices, which are implemented in this
  series.
--


Thread overview: 25+ messages
2006-08-30 23:52 [PATCH 0/8] Implement per-processor data areas for i386 Jeremy Fitzhardinge
2006-08-30 23:52 ` [PATCH 1/8] Use asm-offsets for the offsets of registers into the pt_regs struct, rather than having hard-coded constants Jeremy Fitzhardinge
2006-08-30 23:52 ` [PATCH 2/8] Basic definitions for i386-pda Jeremy Fitzhardinge
2006-08-30 23:52 ` [PATCH 3/8] Initialize the per-CPU data area Jeremy Fitzhardinge
2006-08-30 23:52 ` [PATCH 4/8] Use %gs as the PDA base-segment in the kernel Jeremy Fitzhardinge
2006-08-30 23:52 ` [PATCH 5/8] Fix places where using %gs changes the usermode ABI Jeremy Fitzhardinge
2006-08-31  7:11   ` Andi Kleen
2006-08-31  7:22     ` Jeremy Fitzhardinge
2006-08-31  7:36       ` Andi Kleen
2006-08-31  8:04         ` Jeremy Fitzhardinge
2006-08-31  8:13           ` Andi Kleen
2006-08-31  8:39             ` Jeremy Fitzhardinge
2006-08-30 23:52 ` [PATCH 6/8] Update sys_vm86 to cope with changed pt_regs and %gs usage Jeremy Fitzhardinge
2006-08-30 23:52 ` [PATCH 7/8] Implement smp_processor_id() with the PDA Jeremy Fitzhardinge
2006-08-31 12:35   ` Ian Campbell
2006-08-31 16:04     ` Jeremy Fitzhardinge
2006-08-31 19:10     ` Jeremy Fitzhardinge
2006-08-31 21:34       ` Ian Campbell
2006-08-31 21:39         ` Jeremy Fitzhardinge
2006-08-30 23:52 ` [PATCH 8/8] Implement "current" " Jeremy Fitzhardinge
2006-09-01  6:47 [PATCH 0/8] Implement per-processor data areas for i386 Jeremy Fitzhardinge
2006-09-01  8:16 ` Andi Kleen
2006-09-01  8:26   ` Jeremy Fitzhardinge
2006-09-01  8:30     ` Andi Kleen
2006-09-01 19:08       ` Jeremy Fitzhardinge
