linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 3/3] Gdt page isolation
@ 2005-09-22  7:49 Zachary Amsden
  2005-09-22  8:59 ` Ingo Molnar
  2005-09-22 13:17 ` Andi Kleen
  0 siblings, 2 replies; 6+ messages in thread
From: Zachary Amsden @ 2005-09-22  7:49 UTC (permalink / raw)
  To: Linus Torvalds, Jeffrey Sheldon, Ole Agesen, Shai Fultheim,
	Andrew Morton, Jack Lo, Ingo Molnar, Linux Kernel Mailing List,
	Virtualization Mailing List, Chris Wright, Martin Bligh,
	Pratap Subrahmanyam, Christopher Li, H. Peter Anvin,
	Zwane Mwaikambo, Andi Kleen, Zachary Amsden

Make GDT page aligned and page padded to support running inside of a
hypervisor.  This prevents false sharing of the GDT page with other
hot data, which is not allowed in Xen, and causes performance problems
in VMware.

Rather than go back to the old method of statically allocating the
GDT (which wastes unneded space for non-present CPUs), the GDT for
APs is allocated dynamically.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Index: linux-2.6.14-rc1/include/asm-i386/desc.h
===================================================================
--- linux-2.6.14-rc1.orig/include/asm-i386/desc.h	2005-09-20 20:19:57.000000000 -0700
+++ linux-2.6.14-rc1/include/asm-i386/desc.h	2005-09-20 20:20:50.000000000 -0700
@@ -15,9 +15,6 @@
 #include <asm/mmu.h>
 
 extern struct desc_struct cpu_gdt_table[GDT_ENTRIES];
-DECLARE_PER_CPU(struct desc_struct, cpu_gdt_table[GDT_ENTRIES]);
-
-#define get_cpu_gdt_table(_cpu) (per_cpu(cpu_gdt_table,_cpu))
 
 DECLARE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
 
@@ -29,6 +26,8 @@ struct Xgt_desc_struct {
 
 extern struct Xgt_desc_struct idt_descr, cpu_gdt_descr[NR_CPUS];
 
+#define get_cpu_gdt_table(_cpu) ((struct desc_struct *)cpu_gdt_descr[(_cpu)].address)
+
 #define load_TR_desc() __asm__ __volatile__("ltr %w0"::"q" (GDT_ENTRY_TSS*8))
 #define load_LDT_desc() __asm__ __volatile__("lldt %w0"::"q" (GDT_ENTRY_LDT*8))
 
Index: linux-2.6.14-rc1/arch/i386/kernel/i386_ksyms.c
===================================================================
--- linux-2.6.14-rc1.orig/arch/i386/kernel/i386_ksyms.c	2005-09-20 20:19:57.000000000 -0700
+++ linux-2.6.14-rc1/arch/i386/kernel/i386_ksyms.c	2005-09-20 20:28:51.000000000 -0700
@@ -3,8 +3,7 @@
 #include <asm/checksum.h>
 #include <asm/desc.h>
 
-/* This is definitely a GPL-only symbol */
-EXPORT_SYMBOL_GPL(cpu_gdt_table);
+EXPORT_SYMBOL_GPL(cpu_gdt_descr);
 
 EXPORT_SYMBOL(__down_failed);
 EXPORT_SYMBOL(__down_failed_interruptible);
Index: linux-2.6.14-rc1/arch/i386/kernel/smpboot.c
===================================================================
--- linux-2.6.14-rc1.orig/arch/i386/kernel/smpboot.c	2005-09-20 20:19:57.000000000 -0700
+++ linux-2.6.14-rc1/arch/i386/kernel/smpboot.c	2005-09-20 20:38:22.000000000 -0700
@@ -898,6 +898,7 @@ static int __devinit do_boot_cpu(int api
 	 * This grunge runs the startup process for
 	 * the targeted processor.
 	 */
+	cpu_gdt_descr[cpu].address = __get_free_page(GFP_KERNEL|__GFP_ZERO);
 
 	atomic_set(&init_deasserted, 0);
 
Index: linux-2.6.14-rc1/arch/i386/kernel/head.S
===================================================================
--- linux-2.6.14-rc1.orig/arch/i386/kernel/head.S	2005-09-20 14:32:16.000000000 -0700
+++ linux-2.6.14-rc1/arch/i386/kernel/head.S	2005-09-20 20:44:38.000000000 -0700
@@ -525,3 +525,5 @@ ENTRY(cpu_gdt_table)
 	.quad 0x0000000000000000	/* 0xf0 - unused */
 	.quad 0x0000000000000000	/* 0xf8 - GDT entry 31: double-fault TSS */
 
+	/* Be sure this is zeroed to avoid false validations in Xen */
+	.fill PAGE_SIZE_asm / 8 - GDT_ENTRIES,8,0
Index: linux-2.6.14-rc1/arch/i386/kernel/cpu/common.c
===================================================================
--- linux-2.6.14-rc1.orig/arch/i386/kernel/cpu/common.c	2005-09-20 20:19:57.000000000 -0700
+++ linux-2.6.14-rc1/arch/i386/kernel/cpu/common.c	2005-09-20 20:37:31.000000000 -0700
@@ -18,9 +18,6 @@
 
 #include "cpu.h"
 
-DEFINE_PER_CPU(struct desc_struct, cpu_gdt_table[GDT_ENTRIES]);
-EXPORT_PER_CPU_SYMBOL(cpu_gdt_table);
-
 DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
 EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack);
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 3/3] Gdt page isolation
  2005-09-22  7:49 [PATCH 3/3] Gdt page isolation Zachary Amsden
@ 2005-09-22  8:59 ` Ingo Molnar
  2005-09-22 13:17 ` Andi Kleen
  1 sibling, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2005-09-22  8:59 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Linus Torvalds, Jeffrey Sheldon, Ole Agesen, Shai Fultheim,
	Andrew Morton, Jack Lo, Linux Kernel Mailing List,
	Virtualization Mailing List, Chris Wright, Martin Bligh,
	Pratap Subrahmanyam, Christopher Li, H. Peter Anvin,
	Zwane Mwaikambo, Andi Kleen


* Zachary Amsden <zach@vmware.com> wrote:

> Make GDT page aligned and page padded to support running inside of a 
> hypervisor.  This prevents false sharing of the GDT page with other 
> hot data, which is not allowed in Xen, and causes performance problems 
> in VMware.
> 
> Rather than go back to the old method of statically allocating the GDT 
> (which wastes unneded space for non-present CPUs), the GDT for APs is 
> allocated dynamically.
> 
> Signed-off-by: Zachary Amsden <zach@vmware.com>

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 3/3] Gdt page isolation
  2005-09-22  7:49 [PATCH 3/3] Gdt page isolation Zachary Amsden
  2005-09-22  8:59 ` Ingo Molnar
@ 2005-09-22 13:17 ` Andi Kleen
  2005-09-22 13:34   ` Ingo Molnar
  2005-09-23 19:00   ` Zachary Amsden
  1 sibling, 2 replies; 6+ messages in thread
From: Andi Kleen @ 2005-09-22 13:17 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Linus Torvalds, Jeffrey Sheldon, Ole Agesen, Shai Fultheim,
	Andrew Morton, Jack Lo, Ingo Molnar, Linux Kernel Mailing List,
	Virtualization Mailing List, Chris Wright, Martin Bligh,
	Pratap Subrahmanyam, Christopher Li, H. Peter Anvin,
	Zwane Mwaikambo

>  	 * This grunge runs the startup process for
>  	 * the targeted processor.
>  	 */
> +	cpu_gdt_descr[cpu].address = __get_free_page(GFP_KERNEL|__GFP_ZERO);

I can see why don't check it for NULL, but it's a ugly reason
and would be better fixed. It at least needs a comment.

-Andi (who would still prefer just going back to the array
in head.S - would work as well and waste less memory) 
>  
>  	atomic_set(&init_deasserted, 0);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 3/3] Gdt page isolation
  2005-09-22 13:17 ` Andi Kleen
@ 2005-09-22 13:34   ` Ingo Molnar
  2005-09-22 13:38     ` Andi Kleen
  2005-09-23 19:00   ` Zachary Amsden
  1 sibling, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2005-09-22 13:34 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Zachary Amsden, Linus Torvalds, Jeffrey Sheldon, Ole Agesen,
	Shai Fultheim, Andrew Morton, Jack Lo, Linux Kernel Mailing List,
	Virtualization Mailing List, Chris Wright, Martin Bligh,
	Pratap Subrahmanyam, Christopher Li, H. Peter Anvin,
	Zwane Mwaikambo


* Andi Kleen <ak@muc.de> wrote:

> >  	 * This grunge runs the startup process for
> >  	 * the targeted processor.
> >  	 */
> > +	cpu_gdt_descr[cpu].address = __get_free_page(GFP_KERNEL|__GFP_ZERO);
> 
> I can see why don't check it for NULL, but it's a ugly reason and 
> would be better fixed. It at least needs a comment.

it's so early in the bootup that any failure here would probably be 
fatal anyway. But yeah, a comment would be nice.

> -Andi (who would still prefer just going back to the array in head.S - 
> would work as well and waste less memory)

that doesnt really solve the problem for e.g. Xen, which needs a 
separate page for each GDT. (xenolinux is a separate arch right now, but 
it will/should be merged back into its base architectures) So why not 
solve all the problems at once?

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 3/3] Gdt page isolation
  2005-09-22 13:34   ` Ingo Molnar
@ 2005-09-22 13:38     ` Andi Kleen
  0 siblings, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2005-09-22 13:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Linus Torvalds, Jeffrey Sheldon, Ole Agesen,
	Shai Fultheim, Andrew Morton, Jack Lo, Linux Kernel Mailing List,
	Virtualization Mailing List, Chris Wright, Martin Bligh,
	Pratap Subrahmanyam, Christopher Li, H. Peter Anvin,
	Zwane Mwaikambo

On Thu, Sep 22, 2005 at 03:34:02PM +0200, Ingo Molnar wrote:
> 
> * Andi Kleen <ak@muc.de> wrote:
> 
> > >  	 * This grunge runs the startup process for
> > >  	 * the targeted processor.
> > >  	 */
> > > +	cpu_gdt_descr[cpu].address = __get_free_page(GFP_KERNEL|__GFP_ZERO);
> > 
> > I can see why don't check it for NULL, but it's a ugly reason and 
> > would be better fixed. It at least needs a comment.
> 
> it's so early in the bootup that any failure here would probably be 
> fatal anyway. But yeah, a comment would be nice.

Not true for AP boot, especially with CPU hotplug.
And CPU hotplug will be common because suspend/resume use it
and next years laptops will have dual core CPUs.
So it would be nicer to handle it. But getting out of that 
path is difficult.


> > -Andi (who would still prefer just going back to the array in head.S - 
> > would work as well and waste less memory)
> 
> that doesnt really solve the problem for e.g. Xen, which needs a 
> separate page for each GDT. (xenolinux is a separate arch right now, but 
> it will/should be merged back into its base architectures) So why not 
> solve all the problems at once?

Xen could use the same method vmware uses that handles having other
read only data in there too? Doesn't sound too bad.

-Andi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 3/3] Gdt page isolation
  2005-09-22 13:17 ` Andi Kleen
  2005-09-22 13:34   ` Ingo Molnar
@ 2005-09-23 19:00   ` Zachary Amsden
  1 sibling, 0 replies; 6+ messages in thread
From: Zachary Amsden @ 2005-09-23 19:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Jeffrey Sheldon, Ole Agesen, Shai Fultheim,
	Andrew Morton, Jack Lo, Ingo Molnar, Linux Kernel Mailing List,
	Virtualization Mailing List, Chris Wright, Martin Bligh,
	Pratap Subrahmanyam, Christopher Li, H. Peter Anvin,
	Zwane Mwaikambo

Andi Kleen wrote:

>> 	 * This grunge runs the startup process for
>> 	 * the targeted processor.
>> 	 */
>>+	cpu_gdt_descr[cpu].address = __get_free_page(GFP_KERNEL|__GFP_ZERO);
>>    
>>
>
>I can see why don't check it for NULL, but it's a ugly reason
>and would be better fixed. It at least needs a comment.
>
>-Andi (who would still prefer just going back to the array
>in head.S - would work as well and waste less memory) 
>  
>

The array in head.S does waste more memory if you compile for NR_CPUS >> 
actual cpus.  But the primary reason for allocating on individual pages 
is to preserve the hypervisor GDT entries for Xen.  Xen relies on a GDT 
virtualization technique which uses descriptors in the high part of the 
GDT.  Keep in mind, the GDT is a paged data structure.  So here is what 
they do:

Linear address space:

+---------------------------+  4GB
|                           | 
|  Xen code, heap           |
|                           |
|                           |
+---------------------------+
|                           |  GDT virtual mapping
|  Xen per-domain mappings  |==(page 15)====> hypervisor physical page
|                           |==(page 1-14)==> zeroed pages
|  GDT (16 pages)           |==(page 0)==+
+---------------------------+  -168 ? MB |
|                           |            |
|  MPT tables               |            |
|                           |            |
|                           |            |
+---------------------------+            |
|                           |            |
|  Guest kernel             |            |
|                           |            |
|                           |            |
|                           |            |
|                           |            |
|  GDT 256 bytes, read-only |============+==> guest physical page
|                           |
+---------------------------+  3GB


So, the GDT mapping which is live in the hypervisor consists of guest 
GDT pages following by blank pages, followed by a page which is reserved 
for Xen private GDT mappings.  The guest pages are mapped into the 
guest, read-only.

This imposes a strict requirement on the guest regarding sharing of data 
on the GDT pages; it is impossible to share arbitrary data, even if it 
is read-only.  All data on thes pages must conform to the rules for 
valid guest GDT descriptors, which only GDT and LDT entries are forced 
to do.

So while it is technically possible to share the per-cpu GDTs on the 
same set of pages, the entire thing must be padded, page-aligned, and 
zeroed of any unused data.  Unless you go to a complicated scheme where 
per-cpu GDTs are colored and shared, you have an arbitrary limit (240) 
on the number of virtual CPUs.

So for now, the approach Xen is currently using appears to be simplest 
and most flexible to implement in terms of one page per CPU for the GDT.

Zach

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-09-23 19:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-22  7:49 [PATCH 3/3] Gdt page isolation Zachary Amsden
2005-09-22  8:59 ` Ingo Molnar
2005-09-22 13:17 ` Andi Kleen
2005-09-22 13:34   ` Ingo Molnar
2005-09-22 13:38     ` Andi Kleen
2005-09-23 19:00   ` Zachary Amsden

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).