* [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel
@ 2007-04-28 17:58 Andi Kleen
  2007-04-28 17:58 ` [PATCH] [1/22] x86_64: dma_ops as const Andi Kleen
                   ` (21 more replies)
  0 siblings, 22 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: linux-kernel, patches


Here are the patches to support the relocatable kernel on x86-64, all done
by Eric Biederman and Vivek Goyal and following the i386 implementation.
They also include lots of cleanup to various boot paths.

Vivek already posted them a couple of times.

Please review.

-Andi


^ permalink raw reply	[flat|nested] 217+ messages in thread

* [PATCH] [1/22] x86_64: dma_ops as const
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:58 ` [PATCH] [2/22] x86_64: Assembly safe page.h and pgtable.h Andi Kleen
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Stephen Hemminger, linux-kernel, patches


From: Stephen Hemminger <shemminger@linux-foundation.org>
The dma_ops structure can be const since it never changes
after boot.
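A minimal userspace sketch of the idea, assuming a hypothetical one-hook ops table (the real struct dma_mapping_ops has many more members, and the backend names here are illustrative): the tables themselves become const, while dma_ops stays an ordinary pointer-to-const that boot code can still repoint at whichever backend it selects.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long dma_addr_t;	/* stand-in for the kernel type */

/* Hypothetical one-hook version of struct dma_mapping_ops. */
struct dma_mapping_ops {
	dma_addr_t (*map_single)(unsigned long paddr, size_t size);
};

/* Identity mapping, in the spirit of pci-nommu.c. */
static dma_addr_t nommu_map_single(unsigned long paddr, size_t size)
{
	(void)size;
	return paddr;
}

/* Fixed offset, standing in for an IOMMU backend. */
static dma_addr_t offset_map_single(unsigned long paddr, size_t size)
{
	(void)size;
	return paddr + 0x1000000UL;
}

/* The tables are const: the compiler rejects writes such as
 * nommu_dma_ops.map_single = NULL and may place them in read-only data. */
static const struct dma_mapping_ops nommu_dma_ops = {
	.map_single = nommu_map_single,
};
static const struct dma_mapping_ops offset_dma_ops = {
	.map_single = offset_map_single,
};

/* Pointer to const: it can still be repointed during boot, but the
 * table it points at cannot be modified through it. */
static const struct dma_mapping_ops *dma_ops = &nommu_dma_ops;

static void select_dma_ops(int have_iommu)
{
	dma_ops = have_iommu ? &offset_dma_ops : &nommu_dma_ops;
}

static dma_addr_t dma_map_single(unsigned long paddr, size_t size)
{
	assert(dma_ops != NULL);
	return dma_ops->map_single(paddr, size);
}
```

This mirrors why only the declarations change in the patch: the selection logic assigns dma_ops once, and nothing ever writes into the tables afterwards.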

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>

---
 arch/x86_64/kernel/pci-calgary.c |    2 +-
 arch/x86_64/kernel/pci-gart.c    |    2 +-
 arch/x86_64/kernel/pci-nommu.c   |    2 +-
 arch/x86_64/kernel/pci-swiotlb.c |    2 +-
 arch/x86_64/mm/init.c            |    2 +-
 include/asm-x86_64/dma-mapping.h |    2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

Index: linux/arch/x86_64/kernel/pci-calgary.c
===================================================================
--- linux.orig/arch/x86_64/kernel/pci-calgary.c
+++ linux/arch/x86_64/kernel/pci-calgary.c
@@ -507,7 +507,7 @@ error:
 	return ret;
 }
 
-static struct dma_mapping_ops calgary_dma_ops = {
+static const struct dma_mapping_ops calgary_dma_ops = {
 	.alloc_coherent = calgary_alloc_coherent,
 	.map_single = calgary_map_single,
 	.unmap_single = calgary_unmap_single,
Index: linux/arch/x86_64/kernel/pci-gart.c
===================================================================
--- linux.orig/arch/x86_64/kernel/pci-gart.c
+++ linux/arch/x86_64/kernel/pci-gart.c
@@ -556,7 +556,7 @@ static __init int init_k8_gatt(struct ag
 
 extern int agp_amd64_init(void);
 
-static struct dma_mapping_ops gart_dma_ops = {
+static const struct dma_mapping_ops gart_dma_ops = {
 	.mapping_error = NULL,
 	.map_single = gart_map_single,
 	.map_simple = gart_map_simple,
Index: linux/arch/x86_64/kernel/pci-nommu.c
===================================================================
--- linux.orig/arch/x86_64/kernel/pci-nommu.c
+++ linux/arch/x86_64/kernel/pci-nommu.c
@@ -79,7 +79,7 @@ void nommu_unmap_sg(struct device *dev, 
 {
 }
 
-struct dma_mapping_ops nommu_dma_ops = {
+const struct dma_mapping_ops nommu_dma_ops = {
 	.map_single = nommu_map_single,
 	.unmap_single = nommu_unmap_single,
 	.map_sg = nommu_map_sg,
Index: linux/arch/x86_64/kernel/pci-swiotlb.c
===================================================================
--- linux.orig/arch/x86_64/kernel/pci-swiotlb.c
+++ linux/arch/x86_64/kernel/pci-swiotlb.c
@@ -12,7 +12,7 @@
 int swiotlb __read_mostly;
 EXPORT_SYMBOL(swiotlb);
 
-struct dma_mapping_ops swiotlb_dma_ops = {
+const struct dma_mapping_ops swiotlb_dma_ops = {
 	.mapping_error = swiotlb_dma_mapping_error,
 	.alloc_coherent = swiotlb_alloc_coherent,
 	.free_coherent = swiotlb_free_coherent,
Index: linux/arch/x86_64/mm/init.c
===================================================================
--- linux.orig/arch/x86_64/mm/init.c
+++ linux/arch/x86_64/mm/init.c
@@ -46,7 +46,7 @@
 #define Dprintk(x...)
 #endif
 
-struct dma_mapping_ops* dma_ops;
+const struct dma_mapping_ops* dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
 static unsigned long dma_reserve __initdata;
Index: linux/include/asm-x86_64/dma-mapping.h
===================================================================
--- linux.orig/include/asm-x86_64/dma-mapping.h
+++ linux/include/asm-x86_64/dma-mapping.h
@@ -52,7 +52,7 @@ struct dma_mapping_ops {
 };
 
 extern dma_addr_t bad_dma_address;
-extern struct dma_mapping_ops* dma_ops;
+extern const struct dma_mapping_ops* dma_ops;
 extern int iommu_merge;
 
 static inline int dma_mapping_error(dma_addr_t dma_addr)

* [PATCH] [2/22] x86_64: Assembly safe page.h and pgtable.h
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
  2007-04-28 17:58 ` [PATCH] [1/22] x86_64: dma_ops as const Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:58 ` [PATCH] [3/22] x86_64: Kill temp boot pmds Andi Kleen
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


This patch makes pgtable.h and page.h safe to include
in assembly files like head.S, allowing us to use
symbolic constants instead of hard-coded numbers when
referring to the page tables.

This patch copies asm-sparc64/const.h to asm-x86_64 to
get a definition of _AC(), a very convenient macro that
allows us to force the type when we are compiling the
code in C and to drop all of the type information when
we are using the constant in assembly.  Previously this
was done with multiple definitions of the same constant.
const.h was modified slightly so that it works when given
CONFIG options as arguments.

This patch adds #ifndef __ASSEMBLY__ ... #endif
and _AC(1,UL) where appropriate so the assembler won't
choke on the header files.  Otherwise nothing
should have changed.
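A small C check of how the two-level _AC() behaves, assuming an illustrative CONFIG_PHYSICAL_START value of 0x200000 (a real build gets it from Kconfig). The extra __AC() indirection is what makes CONFIG options work as arguments: the argument is macro-expanded before the suffix is pasted on.

```c
#include <assert.h>

/* Same shape as the new include/asm-x86_64/const.h. */
#ifdef __ASSEMBLY__
#define _AC(X,Y)	X
#else
#define __AC(X,Y)	(X##Y)
#define _AC(X,Y)	__AC(X,Y)
#endif

/* Illustrative value only; a kernel build gets this from Kconfig. */
#define CONFIG_PHYSICAL_START	0x200000

#define PAGE_SHIFT	12
#define PAGE_SIZE	(_AC(1,UL) << PAGE_SHIFT)	/* ((1UL) << 12) in C */

/* _AC() expands CONFIG_PHYSICAL_START to 0x200000 first, then __AC()
 * pastes the suffix to form (0x200000UL).  Pasting directly with a
 * single (X##Y) macro would produce the undefined token
 * CONFIG_PHYSICAL_STARTUL instead. */
#define PHYS_START	_AC(CONFIG_PHYSICAL_START, UL)

static_assert(sizeof(PAGE_SIZE) == sizeof(unsigned long),
	      "PAGE_SIZE has type unsigned long when compiled as C");
static_assert(sizeof(PHYS_START) == sizeof(unsigned long),
	      "the UL suffix survives a CONFIG_ argument");
```

Under __ASSEMBLY__ the same headers would see plain 1 and 0x200000, which gas accepts.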

AK: added const.h to exported headers to fix headers_check

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 include/asm-x86_64/Kbuild    |    1 +
 include/asm-x86_64/const.h   |   20 ++++++++++++++++++++
 include/asm-x86_64/page.h    |   28 ++++++++++------------------
 include/asm-x86_64/pgtable.h |   33 +++++++++++++++++++++------------
 4 files changed, 52 insertions(+), 30 deletions(-)

Index: linux/include/asm-x86_64/const.h
===================================================================
--- /dev/null
+++ linux/include/asm-x86_64/const.h
@@ -0,0 +1,20 @@
+/* const.h: Macros for dealing with constants.  */
+
+#ifndef _X86_64_CONST_H
+#define _X86_64_CONST_H
+
+/* Some constant macros are used in both assembler and
+ * C code.  Therefore we cannot annotate them always with
+ * 'UL' and other type specifiers unilaterally.  We
+ * use the following macros to deal with this.
+ */
+
+#ifdef __ASSEMBLY__
+#define _AC(X,Y)	X
+#else
+#define __AC(X,Y)	(X##Y)
+#define _AC(X,Y)	__AC(X,Y)
+#endif
+
+
+#endif /* !(_X86_64_CONST_H) */
Index: linux/include/asm-x86_64/page.h
===================================================================
--- linux.orig/include/asm-x86_64/page.h
+++ linux/include/asm-x86_64/page.h
@@ -1,14 +1,11 @@
 #ifndef _X86_64_PAGE_H
 #define _X86_64_PAGE_H
 
+#include <asm/const.h>
 
 /* PAGE_SHIFT determines the page size */
 #define PAGE_SHIFT	12
-#ifdef __ASSEMBLY__
-#define PAGE_SIZE	(0x1 << PAGE_SHIFT)
-#else
-#define PAGE_SIZE	(1UL << PAGE_SHIFT)
-#endif
+#define PAGE_SIZE	(_AC(1,UL) << PAGE_SHIFT)
 #define PAGE_MASK	(~(PAGE_SIZE-1))
 #define PHYSICAL_PAGE_MASK	(~(PAGE_SIZE-1) & __PHYSICAL_MASK)
 
@@ -33,10 +30,10 @@
 #define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
 
 #define LARGE_PAGE_MASK (~(LARGE_PAGE_SIZE-1))
-#define LARGE_PAGE_SIZE (1UL << PMD_SHIFT)
+#define LARGE_PAGE_SIZE (_AC(1,UL) << PMD_SHIFT)
 
 #define HPAGE_SHIFT PMD_SHIFT
-#define HPAGE_SIZE	((1UL) << HPAGE_SHIFT)
+#define HPAGE_SIZE	(_AC(1,UL) << HPAGE_SHIFT)
 #define HPAGE_MASK	(~(HPAGE_SIZE - 1))
 #define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
 
@@ -76,29 +73,24 @@ typedef struct { unsigned long pgprot; }
 #define __pgd(x) ((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-#define __PHYSICAL_START	((unsigned long)CONFIG_PHYSICAL_START)
-#define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
-#define __START_KERNEL_map	0xffffffff80000000UL
-#define __PAGE_OFFSET           0xffff810000000000UL
+#endif /* !__ASSEMBLY__ */
 
-#else
 #define __PHYSICAL_START	CONFIG_PHYSICAL_START
 #define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
 #define __START_KERNEL_map	0xffffffff80000000
 #define __PAGE_OFFSET           0xffff810000000000
-#endif /* !__ASSEMBLY__ */
 
 /* to align the pointer to the (next) page boundary */
 #define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
 
 /* See Documentation/x86_64/mm.txt for a description of the memory map. */
 #define __PHYSICAL_MASK_SHIFT	46
-#define __PHYSICAL_MASK		((1UL << __PHYSICAL_MASK_SHIFT) - 1)
+#define __PHYSICAL_MASK		((_AC(1,UL) << __PHYSICAL_MASK_SHIFT) - 1)
 #define __VIRTUAL_MASK_SHIFT	48
-#define __VIRTUAL_MASK		((1UL << __VIRTUAL_MASK_SHIFT) - 1)
+#define __VIRTUAL_MASK		((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - 1)
 
-#define KERNEL_TEXT_SIZE  (40UL*1024*1024)
-#define KERNEL_TEXT_START 0xffffffff80000000UL 
+#define KERNEL_TEXT_SIZE  (40*1024*1024)
+#define KERNEL_TEXT_START 0xffffffff80000000
 
 #ifndef __ASSEMBLY__
 
@@ -106,7 +98,7 @@ typedef struct { unsigned long pgprot; }
 
 #endif /* __ASSEMBLY__ */
 
-#define PAGE_OFFSET		((unsigned long)__PAGE_OFFSET)
+#define PAGE_OFFSET		__PAGE_OFFSET
 
 /* Note: __pa(&symbol_visible_to_c) should be always replaced with __pa_symbol.
    Otherwise you risk miscompilation. */ 
Index: linux/include/asm-x86_64/pgtable.h
===================================================================
--- linux.orig/include/asm-x86_64/pgtable.h
+++ linux/include/asm-x86_64/pgtable.h
@@ -1,6 +1,9 @@
 #ifndef _X86_64_PGTABLE_H
 #define _X86_64_PGTABLE_H
 
+#include <asm/const.h>
+#ifndef __ASSEMBLY__
+
 /*
  * This file contains the functions and defines necessary to modify and use
  * the x86-64 page table tree.
@@ -30,6 +33,8 @@ extern void clear_kernel_mapping(unsigne
 extern unsigned long empty_zero_page[PAGE_SIZE/sizeof(unsigned long)];
 #define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
 
+#endif /* !__ASSEMBLY__ */
+
 /*
  * PGDIR_SHIFT determines what a top-level page table entry can map
  */
@@ -54,6 +59,8 @@ extern unsigned long empty_zero_page[PAG
  */
 #define PTRS_PER_PTE	512
 
+#ifndef __ASSEMBLY__
+
 #define pte_ERROR(e) \
 	printk("%s:%d: bad pte %p(%016lx).\n", __FILE__, __LINE__, &(e), pte_val(e))
 #define pmd_ERROR(e) \
@@ -117,22 +124,23 @@ static inline pte_t ptep_get_and_clear_f
 
 #define pte_pgprot(a)	(__pgprot((a).pte & ~PHYSICAL_PAGE_MASK))
 
-#define PMD_SIZE	(1UL << PMD_SHIFT)
+#endif /* !__ASSEMBLY__ */
+
+#define PMD_SIZE	(_AC(1,UL) << PMD_SHIFT)
 #define PMD_MASK	(~(PMD_SIZE-1))
-#define PUD_SIZE	(1UL << PUD_SHIFT)
+#define PUD_SIZE	(_AC(1,UL) << PUD_SHIFT)
 #define PUD_MASK	(~(PUD_SIZE-1))
-#define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
+#define PGDIR_SIZE	(_AC(1,UL) << PGDIR_SHIFT)
 #define PGDIR_MASK	(~(PGDIR_SIZE-1))
 
 #define USER_PTRS_PER_PGD	((TASK_SIZE-1)/PGDIR_SIZE+1)
 #define FIRST_USER_ADDRESS	0
 
-#ifndef __ASSEMBLY__
-#define MAXMEM		 0x3fffffffffffUL
-#define VMALLOC_START    0xffffc20000000000UL
-#define VMALLOC_END      0xffffe1ffffffffffUL
-#define MODULES_VADDR    0xffffffff88000000UL
-#define MODULES_END      0xfffffffffff00000UL
+#define MAXMEM		 0x3fffffffffff
+#define VMALLOC_START    0xffffc20000000000
+#define VMALLOC_END      0xffffe1ffffffffff
+#define MODULES_VADDR    0xffffffff88000000
+#define MODULES_END      0xfffffffffff00000
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
 
 #define _PAGE_BIT_PRESENT	0
@@ -158,7 +166,7 @@ static inline pte_t ptep_get_and_clear_f
 #define _PAGE_GLOBAL	0x100	/* Global TLB entry */
 
 #define _PAGE_PROTNONE	0x080	/* If not present */
-#define _PAGE_NX        (1UL<<_PAGE_BIT_NX)
+#define _PAGE_NX        (_AC(1,UL)<<_PAGE_BIT_NX)
 
 #define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY)
 #define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
@@ -220,6 +228,8 @@ static inline pte_t ptep_get_and_clear_f
 #define __S110	PAGE_SHARED_EXEC
 #define __S111	PAGE_SHARED_EXEC
 
+#ifndef __ASSEMBLY__
+
 static inline unsigned long pgd_bad(pgd_t pgd)
 {
 	return pgd_val(pgd) & ~(PTE_MASK | _KERNPG_TABLE | _PAGE_USER);
@@ -405,8 +415,6 @@ extern spinlock_t pgd_lock;
 extern struct page *pgd_list;
 void vmalloc_sync_all(void);
 
-#endif /* !__ASSEMBLY__ */
-
 extern int kern_addr_valid(unsigned long addr); 
 
 #define io_remap_pfn_range(vma, vaddr, pfn, size, prot)		\
@@ -436,5 +444,6 @@ extern int kern_addr_valid(unsigned long
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 #define __HAVE_ARCH_PTE_SAME
 #include <asm-generic/pgtable.h>
+#endif /* !__ASSEMBLY__ */
 
 #endif /* _X86_64_PGTABLE_H */
Index: linux/include/asm-x86_64/Kbuild
===================================================================
--- linux.orig/include/asm-x86_64/Kbuild
+++ linux/include/asm-x86_64/Kbuild
@@ -18,3 +18,4 @@ header-y += vsyscall32.h
 unifdef-y += mce.h
 unifdef-y += mtrr.h
 unifdef-y += vsyscall.h
+unifdef-y += const.h

* [PATCH] [3/22] x86_64: Kill temp boot pmds
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
  2007-04-28 17:58 ` [PATCH] [1/22] x86_64: dma_ops as const Andi Kleen
  2007-04-28 17:58 ` [PATCH] [2/22] x86_64: Assembly safe page.h and pgtable.h Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:58 ` [PATCH] [4/22] x86_64: Clean up the early boot page table Andi Kleen
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


Early in the boot process we need the ability to set
up temporary mappings, before our normal mechanisms are
initialized.  Currently this is used to map pages that
are part of the page tables we are building, and pages
accessed during the DMI scan.

The core problem is that we are using the user portion of
the page tables to implement this, which means that while
this mechanism is active we cannot catch NULL pointer dereferences,
and we deviate from the normal ways of handling things.

In this patch I modify early_ioremap to map pages into
the kernel portion of the address space, roughly where
we will later put modules.  I also make the discovery of
which addresses we can use dynamic, which removes all
kinds of static limits and removes the dependencies
on implementation details between different parts of the code.

Now alloc_low_page() and unmap_low_page() use
early_ioremap() and early_iounmap() to allocate/map and
unmap a page.
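A userspace model of the new early_ioremap() slot search, under simplifying assumptions: a bool array stands in for pmd_present() on level2_kernel_pgt, and the helper names are hypothetical. It reuses the patch's round-up expression for the number of 2 MB PMDs a request spans.

```c
#include <assert.h>
#include <stdbool.h>

#define PMD_SHIFT	21
#define PMD_SIZE	(1UL << PMD_SHIFT)	/* 2 MB per pmd entry */
#define PMD_MASK	(~(PMD_SIZE - 1))
#define PTRS_PER_PMD	512

/* Number of 2 MB pmd slots touched by [addr, addr + size): the offset
 * within the first large page plus the size, rounded up to whole PMDs. */
static unsigned long pmds_needed(unsigned long addr, unsigned long size)
{
	assert(size > 0);
	return ((addr & ~PMD_MASK) + size + ~PMD_MASK) / PMD_SIZE;
}

/* First index starting a run of `pmds` empty slots, or -1 if none.
 * present[] models pmd_present(); this model explicitly stops before
 * a run would extend past the end of the table. */
static int find_free_pmds(const bool present[PTRS_PER_PMD], unsigned long pmds)
{
	for (unsigned long start = 0; start + pmds <= PTRS_PER_PMD; start++) {
		unsigned long i;

		for (i = 0; i < pmds; i++)
			if (present[start + i])
				break;
		if (i == pmds)
			return (int)start;
	}
	return -1;
}

/* Virtual address returned for a run starting at slot `start`, where
 * `base` is the address mapped by slot 0 (__START_KERNEL_map here). */
static unsigned long slot_vaddr(unsigned long base, int start, unsigned long addr)
{
	return base + (unsigned long)start * PMD_SIZE + (addr & ~PMD_MASK);
}
```

For example, with the first 20 slots holding the 40 MB kernel mapping, a 2-slot request lands at slot 20, and a page at 0x1FF000 of size 0x2000 needs two slots because it straddles a 2 MB boundary.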

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/head.S |    3 -
 arch/x86_64/mm/init.c     |  100 ++++++++++++++++++++--------------------------
 2 files changed, 45 insertions(+), 58 deletions(-)

Index: linux/arch/x86_64/kernel/head.S
===================================================================
--- linux.orig/arch/x86_64/kernel/head.S
+++ linux/arch/x86_64/kernel/head.S
@@ -288,9 +288,6 @@ NEXT_PAGE(level2_ident_pgt)
 	.quad	i << 21 | 0x083
 	i = i + 1
 	.endr
-	/* Temporary mappings for the super early allocator in arch/x86_64/mm/init.c */
-	.globl temp_boot_pmds
-temp_boot_pmds:
 	.fill	492,8,0
 	
 NEXT_PAGE(level2_kernel_pgt)
Index: linux/arch/x86_64/mm/init.c
===================================================================
--- linux.orig/arch/x86_64/mm/init.c
+++ linux/arch/x86_64/mm/init.c
@@ -167,23 +167,9 @@ __set_fixmap (enum fixed_addresses idx, 
 
 unsigned long __initdata table_start, table_end; 
 
-extern pmd_t temp_boot_pmds[]; 
-
-static  struct temp_map { 
-	pmd_t *pmd;
-	void  *address; 
-	int    allocated; 
-} temp_mappings[] __initdata = { 
-	{ &temp_boot_pmds[0], (void *)(40UL * 1024 * 1024) },
-	{ &temp_boot_pmds[1], (void *)(42UL * 1024 * 1024) }, 
-	{}
-}; 
-
-static __meminit void *alloc_low_page(int *index, unsigned long *phys)
+static __meminit void *alloc_low_page(unsigned long *phys)
 { 
-	struct temp_map *ti;
-	int i; 
-	unsigned long pfn = table_end++, paddr; 
+	unsigned long pfn = table_end++;
 	void *adr;
 
 	if (after_bootmem) {
@@ -194,57 +180,63 @@ static __meminit void *alloc_low_page(in
 
 	if (pfn >= end_pfn) 
 		panic("alloc_low_page: ran out of memory"); 
-	for (i = 0; temp_mappings[i].allocated; i++) {
-		if (!temp_mappings[i].pmd) 
-			panic("alloc_low_page: ran out of temp mappings"); 
-	} 
-	ti = &temp_mappings[i];
-	paddr = (pfn << PAGE_SHIFT) & PMD_MASK; 
-	set_pmd(ti->pmd, __pmd(paddr | _KERNPG_TABLE | _PAGE_PSE)); 
-	ti->allocated = 1; 
-	__flush_tlb(); 	       
-	adr = ti->address + ((pfn << PAGE_SHIFT) & ~PMD_MASK); 
+
+	adr = early_ioremap(pfn * PAGE_SIZE, PAGE_SIZE);
 	memset(adr, 0, PAGE_SIZE);
-	*index = i; 
-	*phys  = pfn * PAGE_SIZE;  
-	return adr; 
-} 
+	*phys  = pfn * PAGE_SIZE;
+	return adr;
+}
 
-static __meminit void unmap_low_page(int i)
+static __meminit void unmap_low_page(void *adr)
 { 
-	struct temp_map *ti;
 
 	if (after_bootmem)
 		return;
 
-	ti = &temp_mappings[i];
-	set_pmd(ti->pmd, __pmd(0));
-	ti->allocated = 0; 
+	early_iounmap(adr, PAGE_SIZE);
 } 
 
 /* Must run before zap_low_mappings */
 __init void *early_ioremap(unsigned long addr, unsigned long size)
 {
-	unsigned long map = round_down(addr, LARGE_PAGE_SIZE); 
-
-	/* actually usually some more */
-	if (size >= LARGE_PAGE_SIZE) { 
-		return NULL;
+	unsigned long vaddr;
+	pmd_t *pmd, *last_pmd;
+	int i, pmds;
+
+	pmds = ((addr & ~PMD_MASK) + size + ~PMD_MASK) / PMD_SIZE;
+	vaddr = __START_KERNEL_map;
+	pmd = level2_kernel_pgt;
+	last_pmd = level2_kernel_pgt + PTRS_PER_PMD - 1;
+	for (; pmd <= last_pmd; pmd++, vaddr += PMD_SIZE) {
+		for (i = 0; i < pmds; i++) {
+			if (pmd_present(pmd[i]))
+				goto next;
+		}
+		vaddr += addr & ~PMD_MASK;
+		addr &= PMD_MASK;
+		for (i = 0; i < pmds; i++, addr += PMD_SIZE)
+			set_pmd(pmd + i,__pmd(addr | _KERNPG_TABLE | _PAGE_PSE));
+		__flush_tlb();
+		return (void *)vaddr;
+	next:
+		;
 	}
-	set_pmd(temp_mappings[0].pmd,  __pmd(map | _KERNPG_TABLE | _PAGE_PSE));
-	map += LARGE_PAGE_SIZE;
-	set_pmd(temp_mappings[1].pmd,  __pmd(map | _KERNPG_TABLE | _PAGE_PSE));
-	__flush_tlb();
-	return temp_mappings[0].address + (addr & (LARGE_PAGE_SIZE-1));
+	printk("early_ioremap(0x%lx, %lu) failed\n", addr, size);
+	return NULL;
 }
 
 /* To avoid virtual aliases later */
 __init void early_iounmap(void *addr, unsigned long size)
 {
-	if ((void *)round_down((unsigned long)addr, LARGE_PAGE_SIZE) != temp_mappings[0].address)
-		printk("early_iounmap: bad address %p\n", addr);
-	set_pmd(temp_mappings[0].pmd, __pmd(0));
-	set_pmd(temp_mappings[1].pmd, __pmd(0));
+	unsigned long vaddr;
+	pmd_t *pmd;
+	int i, pmds;
+
+	vaddr = (unsigned long)addr;
+	pmds = ((vaddr & ~PMD_MASK) + size + ~PMD_MASK) / PMD_SIZE;
+	pmd = level2_kernel_pgt + pmd_index(vaddr);
+	for (i = 0; i < pmds; i++)
+		pmd_clear(pmd + i);
 	__flush_tlb();
 }
 
@@ -289,7 +281,6 @@ static void __meminit phys_pud_init(pud_
 
 
 	for (; i < PTRS_PER_PUD; i++, addr = (addr & PUD_MASK) + PUD_SIZE ) {
-		int map; 
 		unsigned long pmd_phys;
 		pud_t *pud = pud_page + pud_index(addr);
 		pmd_t *pmd;
@@ -307,12 +298,12 @@ static void __meminit phys_pud_init(pud_
 			continue;
 		}
 
-		pmd = alloc_low_page(&map, &pmd_phys);
+		pmd = alloc_low_page(&pmd_phys);
 		spin_lock(&init_mm.page_table_lock);
 		set_pud(pud, __pud(pmd_phys | _KERNPG_TABLE));
 		phys_pmd_init(pmd, addr, end);
 		spin_unlock(&init_mm.page_table_lock);
-		unmap_low_page(map);
+		unmap_low_page(pmd);
 	}
 	__flush_tlb();
 } 
@@ -364,7 +355,6 @@ void __meminit init_memory_mapping(unsig
 	end = (unsigned long)__va(end);
 
 	for (; start < end; start = next) {
-		int map;
 		unsigned long pud_phys; 
 		pgd_t *pgd = pgd_offset_k(start);
 		pud_t *pud;
@@ -372,7 +362,7 @@ void __meminit init_memory_mapping(unsig
 		if (after_bootmem)
 			pud = pud_offset(pgd, start & PGDIR_MASK);
 		else
-			pud = alloc_low_page(&map, &pud_phys);
+			pud = alloc_low_page(&pud_phys);
 
 		next = start + PGDIR_SIZE;
 		if (next > end) 
@@ -380,7 +370,7 @@ void __meminit init_memory_mapping(unsig
 		phys_pud_init(pud, __pa(start), __pa(next));
 		if (!after_bootmem)
 			set_pgd(pgd_offset_k(start), mk_kernel_pgd(pud_phys));
-		unmap_low_page(map);   
+		unmap_low_page(pud);
 	} 
 
 	if (!after_bootmem)

* [PATCH] [4/22] x86_64: Clean up the early boot page table
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (2 preceding siblings ...)
  2007-04-28 17:58 ` [PATCH] [3/22] x86_64: Kill temp boot pmds Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:58 ` [PATCH] [5/22] x86_64: Fix early printk to use standard ISA mapping Andi Kleen
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


- Merge physmem_pgt and ident_pgt, removing physmem_pgt.  The merge
  is broken as soon as mm/init.c:init_memory_mapping is run.
- As physmem_pgt is gone, don't export it in pgtable.h.
- Use defines from pgtable.h for page permissions.
- Fix the physical memory identity mapping so it is at the correct
  address.
- Remove the physical memory mapping from wakeup_level4_pgt; it
  is at the wrong address, so we can't possibly be using it.
- Simplify NEXT_PAGE.  The work to calculate the phys_ alias
  of the labels was very cool.  Unfortunately it was a brittle,
  special-purpose hack that makes maintenance more difficult.
  Instead just use label - __START_KERNEL_map like we do
  everywhere else in assembly.
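A C sketch of the arithmetic behind the new tables, assuming the architectural x86-64 entry bits and a stand-in for __PAGE_KERNEL_LARGE_EXEC; the helper names are hypothetical. It shows what PMDS() emits, why the direct mapping at __PAGE_OFFSET belongs in level-4 slot 258 (the old table put it at slot 256), and what label - __START_KERNEL_map yields, assuming the kernel image is mapped at __START_KERNEL_map from physical address 0 as level2_kernel_pgt does here.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 page-table entry bits. */
#define _PAGE_PRESENT	0x001ULL
#define _PAGE_RW	0x002ULL
#define _PAGE_ACCESSED	0x020ULL
#define _PAGE_DIRTY	0x040ULL
#define _PAGE_PSE	0x080ULL	/* 2 MB page at the pmd level */

/* Assumed stand-in for __PAGE_KERNEL_LARGE_EXEC (NX left clear). */
#define LARGE_EXEC	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
			 _PAGE_DIRTY | _PAGE_PSE)

#define START_KERNEL_MAP	0xffffffff80000000ULL
#define PAGE_OFFSET		0xffff810000000000ULL

/* boot_level4_pgt: slot 0, 257 empty, slot 258, 252 empty, slot 511. */
static_assert(1 + 257 + 1 + 252 + 1 == 512, "one full page of entries");

/* C equivalent of PMDS(START, PERM, COUNT): entry i maps
 * START + i * 2 MB with permission bits PERM. */
static void fill_pmds(uint64_t *table, uint64_t start, uint64_t perm,
		      unsigned count)
{
	for (unsigned i = 0; i < count; i++)
		table[i] = start + ((uint64_t)i << 21) + perm;
}

/* Indices into the level-4 (pgd) and level-3 (pud) tables. */
static unsigned pgd_index(uint64_t vaddr) { return (vaddr >> 39) & 511; }
static unsigned pud_index(uint64_t vaddr) { return (vaddr >> 30) & 511; }

/* "label - __START_KERNEL_map": physical address of a kernel label. */
static uint64_t kernel_phys(uint64_t label_vaddr)
{
	return label_vaddr - START_KERNEL_MAP;
}
```

With PTRS_PER_PMD (512) entries of 2 MB each, level2_ident_pgt covers exactly the first 1 GB.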

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/head.S    |   61 +++++++++++++++++++------------------------
 include/asm-x86_64/pgtable.h |    1 
 2 files changed, 28 insertions(+), 34 deletions(-)

Index: linux/arch/x86_64/kernel/head.S
===================================================================
--- linux.orig/arch/x86_64/kernel/head.S
+++ linux/arch/x86_64/kernel/head.S
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <asm/desc.h>
 #include <asm/segment.h>
+#include <asm/pgtable.h>
 #include <asm/page.h>
 #include <asm/msr.h>
 #include <asm/cache.h>
@@ -260,52 +261,48 @@ ljumpvector:
 ENTRY(stext)
 ENTRY(_stext)
 
-	$page = 0
 #define NEXT_PAGE(name) \
-	$page = $page + 1; \
-	.org $page * 0x1000; \
-	phys_/**/name = $page * 0x1000 + __PHYSICAL_START; \
+	.balign	PAGE_SIZE; \
 ENTRY(name)
 
+/* Automate the creation of 1 to 1 mapping pmd entries */
+#define PMDS(START, PERM, COUNT)		\
+	i = 0 ;					\
+	.rept (COUNT) ;				\
+	.quad	(START) + (i << 21) + (PERM) ;	\
+	i = i + 1 ;				\
+	.endr
+
 NEXT_PAGE(init_level4_pgt)
 	/* This gets initialized in x86_64_start_kernel */
 	.fill	512,8,0
 
 NEXT_PAGE(level3_ident_pgt)
-	.quad	phys_level2_ident_pgt | 0x007
+	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.fill	511,8,0
 
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	510,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
-	.quad	phys_level2_kernel_pgt | 0x007
+	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 	.fill	1,8,0
 
 NEXT_PAGE(level2_ident_pgt)
-	/* 40MB for bootup. 	*/
-	i = 0
-	.rept 20
-	.quad	i << 21 | 0x083
-	i = i + 1
-	.endr
-	.fill	492,8,0
+	/* Since I easily can, map the first 1G.
+	 * Don't set NX because code runs from these pages.
+	 */
+	PMDS(0x0000000000000000, __PAGE_KERNEL_LARGE_EXEC, PTRS_PER_PMD)
 	
 NEXT_PAGE(level2_kernel_pgt)
 	/* 40MB kernel mapping. The kernel code cannot be bigger than that.
 	   When you change this change KERNEL_TEXT_SIZE in page.h too. */
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511)-((2^30)*510)) = 0 */
-	i = 0
-	.rept 20
-	.quad	i << 21 | 0x183
-	i = i + 1
-	.endr
+	PMDS(0x0000000000000000, __PAGE_KERNEL_LARGE_EXEC|_PAGE_GLOBAL,
+		KERNEL_TEXT_SIZE/PMD_SIZE)
 	/* Module mapping starts here */
-	.fill	492,8,0
-
-NEXT_PAGE(level3_physmem_pgt)
-	.quad	phys_level2_kernel_pgt | 0x007	/* so that __va works even before pagetable_init */
-	.fill	511,8,0
+	.fill	(PTRS_PER_PMD - (KERNEL_TEXT_SIZE/PMD_SIZE)),8,0
 
+#undef PMDS
 #undef NEXT_PAGE
 
 	.data
@@ -313,12 +310,10 @@ NEXT_PAGE(level3_physmem_pgt)
 #ifdef CONFIG_ACPI_SLEEP
 	.align PAGE_SIZE
 ENTRY(wakeup_level4_pgt)
-	.quad	phys_level3_ident_pgt | 0x007
-	.fill	255,8,0
-	.quad	phys_level3_physmem_pgt | 0x007
-	.fill	254,8,0
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	510,8,0
 	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-	.quad	phys_level3_kernel_pgt | 0x007
+	.quad	level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 #endif
 
 #ifndef CONFIG_HOTPLUG_CPU
@@ -332,12 +327,12 @@ ENTRY(wakeup_level4_pgt)
 	 */
 	.align PAGE_SIZE
 ENTRY(boot_level4_pgt)
-	.quad	phys_level3_ident_pgt | 0x007
-	.fill	255,8,0
-	.quad	phys_level3_physmem_pgt | 0x007
-	.fill	254,8,0
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	257,8,0
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	252,8,0
 	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-	.quad	phys_level3_kernel_pgt | 0x007
+	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
 	.data
 
Index: linux/include/asm-x86_64/pgtable.h
===================================================================
--- linux.orig/include/asm-x86_64/pgtable.h
+++ linux/include/asm-x86_64/pgtable.h
@@ -14,7 +14,6 @@
 #include <asm/pda.h>
 
 extern pud_t level3_kernel_pgt[512];
-extern pud_t level3_physmem_pgt[512];
 extern pud_t level3_ident_pgt[512];
 extern pmd_t level2_kernel_pgt[512];
 extern pgd_t init_level4_pgt[];

* [PATCH] [5/22] x86_64: Fix early printk to use standard ISA mapping
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (3 preceding siblings ...)
  2007-04-28 17:58 ` [PATCH] [4/22] x86_64: Clean up the early boot page table Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:58 ` [PATCH] [6/22] x86_64: modify copy_bootdata to use virtual addresses Andi Kleen
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>



Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/early_printk.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux/arch/x86_64/kernel/early_printk.c
===================================================================
--- linux.orig/arch/x86_64/kernel/early_printk.c
+++ linux/arch/x86_64/kernel/early_printk.c
@@ -11,11 +11,10 @@
 
 #ifdef __i386__
 #include <asm/setup.h>
-#define VGABASE		(__ISA_IO_base + 0xb8000)
 #else
 #include <asm/bootsetup.h>
-#define VGABASE		((void __iomem *)0xffffffff800b8000UL)
 #endif
+#define VGABASE		(__ISA_IO_base + 0xb8000)
 
 static int max_ypos = 25, max_xpos = 80;
 static int current_ypos = 25, current_xpos = 0;

* [PATCH] [6/22] x86_64: modify copy_bootdata to use virtual addresses
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (4 preceding siblings ...)
  2007-04-28 17:58 ` [PATCH] [5/22] x86_64: Fix early printk to use standard ISA mapping Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:58 ` [PATCH] [7/22] x86_64: cleanup segments Andi Kleen
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>

Use virtual addresses instead of physical addresses
in copy_bootdata().  In addition, fix the implementation
of the old bootloader convention.  Everything is
always at real_mode_data; it is just that sometimes
real_mode_data was relocated by setup.S to not sit at
0x90000.
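A userspace sketch of the command-line lookup, assuming the real-mode data block is available as a byte buffer together with its physical address; cmdline_phys() and the load helpers are hypothetical. The old-protocol offset is added to wherever the real-mode data actually sits rather than to a fixed 0x90000.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NEW_CL_POINTER		0x228	/* u32, relative to real mode data */
#define OLD_CL_MAGIC_ADDR	0x20	/* u16, relative to real mode data */
#define OLD_CL_MAGIC		0xA33F
#define OLD_CL_OFFSET		0x22	/* u16, relative to real mode data */

/* Unaligned-safe loads in host byte order (little-endian on x86). */
static uint16_t load16(const unsigned char *p)
{
	uint16_t v;
	memcpy(&v, p, sizeof v);
	return v;
}

static uint32_t load32(const unsigned char *p)
{
	uint32_t v;
	memcpy(&v, p, sizeof v);
	return v;
}

/* Physical address of the command line, or 0 if none was passed.
 * rmd is the real-mode data block, rmd_phys its physical address. */
static unsigned long cmdline_phys(const unsigned char *rmd,
				  unsigned long rmd_phys)
{
	unsigned long new_data;

	assert(rmd != NULL);
	new_data = load32(rmd + NEW_CL_POINTER);
	if (!new_data) {
		if (load16(rmd + OLD_CL_MAGIC_ADDR) != OLD_CL_MAGIC)
			return 0;
		/* Old protocol: offset is relative to the real-mode data. */
		new_data = rmd_phys + load16(rmd + OLD_CL_OFFSET);
	}
	return new_data;
}
```

The kernel then turns the result into a pointer with __va(), as the patch does.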

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/head64.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

Index: linux/arch/x86_64/kernel/head64.c
===================================================================
--- linux.orig/arch/x86_64/kernel/head64.c
+++ linux/arch/x86_64/kernel/head64.c
@@ -29,25 +29,24 @@ static void __init clear_bss(void)
 }
 
 #define NEW_CL_POINTER		0x228	/* Relative to real mode data */
-#define OLD_CL_MAGIC_ADDR	0x90020
+#define OLD_CL_MAGIC_ADDR	0x20
 #define OLD_CL_MAGIC            0xA33F
-#define OLD_CL_BASE_ADDR        0x90000
-#define OLD_CL_OFFSET           0x90022
+#define OLD_CL_OFFSET           0x22
 
 static void __init copy_bootdata(char *real_mode_data)
 {
-	int new_data;
+	unsigned long new_data;
 	char * command_line;
 
 	memcpy(x86_boot_params, real_mode_data, BOOT_PARAM_SIZE);
-	new_data = *(int *) (x86_boot_params + NEW_CL_POINTER);
+	new_data = *(u32 *) (x86_boot_params + NEW_CL_POINTER);
 	if (!new_data) {
-		if (OLD_CL_MAGIC != * (u16 *) OLD_CL_MAGIC_ADDR) {
+		if (OLD_CL_MAGIC != *(u16 *)(real_mode_data + OLD_CL_MAGIC_ADDR)) {
 			return;
 		}
-		new_data = OLD_CL_BASE_ADDR + * (u16 *) OLD_CL_OFFSET;
+		new_data = __pa(real_mode_data) + *(u16 *)(real_mode_data + OLD_CL_OFFSET);
 	}
-	command_line = (char *) ((u64)(new_data));
+	command_line = __va(new_data);
 	memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
 }
 
@@ -74,7 +73,7 @@ void __init x86_64_start_kernel(char * r
  		cpu_pda(i) = &boot_cpu_pda[i];
 
 	pda_init(0);
-	copy_bootdata(real_mode_data);
+	copy_bootdata(__va(real_mode_data));
 #ifdef CONFIG_SMP
 	cpu_set(0, cpu_online_map);
 #endif

* [PATCH] [7/22] x86_64: cleanup segments
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (5 preceding siblings ...)
  2007-04-28 17:58 ` [PATCH] [6/22] x86_64: modify copy_bootdata to use virtual addresses Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:58 ` [PATCH] [8/22] x86_64: Add EFER to the register set saved by save_processor_state Andi Kleen
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


Move __KERNEL32_CS up into the unused GDT entry.  __KERNEL32_CS is
used when entering the kernel, so putting it first is useful when
trying to keep boot GDT sizes to a minimum.

Set the accessed bit on all GDT entries.  We don't care about it,
so there is no need for the CPU to burn the extra cycles setting it,
and it potentially allows the pages to be immutable.  Plus
it is confusing when debugging and your GDT entries mysteriously
change.
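A small C sketch of the two descriptor details above, with hypothetical helper names: the accessed flag is bit 0 of the access byte (bit 40 of the descriptor), which turns 0x9a into 0x9b, and a selector's GDT index is selector >> 3, so moving __KERNEL32_CS from 0x38 to 0x08 moves it from slot 7 to slot 1.

```c
#include <assert.h>
#include <stdint.h>

/* Accessed flag: bit 0 of the access byte, which occupies bits 40-47
 * of an 8-byte segment descriptor. */
#define DESC_ACCESSED	(1ULL << 40)

static uint64_t set_accessed(uint64_t desc)
{
	return desc | DESC_ACCESSED;
}

/* Access byte, e.g. 0x9b for a present ring-0 code segment. */
static unsigned access_byte(uint64_t desc)
{
	return (unsigned)(desc >> 40) & 0xff;
}

/* The low 3 bits of a selector are RPL and TI; the rest is the index. */
static unsigned gdt_index(unsigned selector)
{
	return selector >> 3;
}

static_assert((0x08 >> 3) == 1, "__KERNEL32_CS now uses GDT slot 1");
```

Because the CPU only writes the accessed bit when it is clear, presetting it means loading these segments never dirties the GDT.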

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/head.S    |   12 ++++++------
 include/asm-x86_64/segment.h |    2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

Index: linux/arch/x86_64/kernel/head.S
===================================================================
--- linux.orig/arch/x86_64/kernel/head.S
+++ linux/arch/x86_64/kernel/head.S
@@ -362,13 +362,13 @@ gdt:
 	
 ENTRY(cpu_gdt_table)
 	.quad	0x0000000000000000	/* NULL descriptor */
+	.quad	0x00cf9b000000ffff	/* __KERNEL32_CS */
+	.quad	0x00af9b000000ffff	/* __KERNEL_CS */
+	.quad	0x00cf93000000ffff	/* __KERNEL_DS */
+	.quad	0x00cffb000000ffff	/* __USER32_CS */
+	.quad	0x00cff3000000ffff	/* __USER_DS, __USER32_DS  */
+	.quad	0x00affb000000ffff	/* __USER_CS */
 	.quad	0x0			/* unused */
-	.quad	0x00af9a000000ffff	/* __KERNEL_CS */
-	.quad	0x00cf92000000ffff	/* __KERNEL_DS */
-	.quad	0x00cffa000000ffff	/* __USER32_CS */
-	.quad	0x00cff2000000ffff	/* __USER_DS, __USER32_DS  */		
-	.quad	0x00affa000000ffff	/* __USER_CS */
-	.quad	0x00cf9a000000ffff	/* __KERNEL32_CS */
 	.quad	0,0			/* TSS */
 	.quad	0,0			/* LDT */
 	.quad   0,0,0			/* three TLS descriptors */ 
Index: linux/include/asm-x86_64/segment.h
===================================================================
--- linux.orig/include/asm-x86_64/segment.h
+++ linux/include/asm-x86_64/segment.h
@@ -6,7 +6,7 @@
 #define __KERNEL_CS	0x10
 #define __KERNEL_DS	0x18
 
-#define __KERNEL32_CS   0x38
+#define __KERNEL32_CS   0x08
 
 /* 
  * we cannot use the same code segment descriptor for user and kernel


* [PATCH] [8/22] x86_64: Add EFER to the register set saved by save_processor_state
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (6 preceding siblings ...)
  2007-04-28 17:58 ` [PATCH] [7/22] x86_64: cleanup segments Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:58 ` [PATCH] [9/22] x86_64: 64bit PIC SMP trampoline Andi Kleen
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


EFER, like %cr4, varies depending on the cpu capabilities and on which
of those capabilities we want to make use of.  So save/restore it to make
certain we have the same EFER value when we are done.
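To make the point concrete, here is a minimal user-space sketch (hypothetical, not kernel code): `running_efer()` computes an EFER value that differs depending on whether the CPU supports NX, and a plain variable stands in for the MSR in a model of the rdmsrl/wrmsrl pair. Bit numbers follow the `_EFER_*` definitions in `<asm/msr.h>` (SCE=0, LME=8, LMA=10, NX=11); the function and struct names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFER_SCE (1ULL << 0)   /* SYSCALL/SYSRET enable */
#define EFER_LME (1ULL << 8)   /* long mode enable */
#define EFER_LMA (1ULL << 10)  /* long mode active, set by the CPU */
#define EFER_NX  (1ULL << 11)  /* no-execute enable */

/* Hypothetical: EFER as read back while running in long mode. It depends
 * on CPU support (NX), so it is not a per-kernel-version constant. */
static uint64_t running_efer(bool cpu_has_nx)
{
	uint64_t efer = EFER_SCE | EFER_LME | EFER_LMA;

	if (cpu_has_nx)
		efer |= EFER_NX;
	return efer;
}

/* Illustrative model of the save/restore pair; *msr plays the register. */
struct saved_context_model { uint64_t efer; };

static void save_state(struct saved_context_model *ctxt, const uint64_t *msr)
{
	ctxt->efer = *msr;	/* like rdmsrl(MSR_EFER, ctxt->efer) */
}

static void restore_state(const struct saved_context_model *ctxt, uint64_t *msr)
{
	*msr = ctxt->efer;	/* like wrmsrl(MSR_EFER, ctxt->efer) */
}
```

If a resume path leaves only LME set, `restore_state()` puts back the full pre-suspend value, including NX.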

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/suspend.c |    3 ++-
 include/asm-x86_64/suspend.h |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux/arch/x86_64/kernel/suspend.c
===================================================================
--- linux.orig/arch/x86_64/kernel/suspend.c
+++ linux/arch/x86_64/kernel/suspend.c
@@ -33,7 +33,6 @@ void __save_processor_state(struct saved
 	asm volatile ("str %0"  : "=m" (ctxt->tr));
 
 	/* XMM0..XMM15 should be handled by kernel_fpu_begin(). */
-	/* EFER should be constant for kernel version, no need to handle it. */
 	/*
 	 * segment registers
 	 */
@@ -50,6 +49,7 @@ void __save_processor_state(struct saved
 	/*
 	 * control registers 
 	 */
+	rdmsrl(MSR_EFER, ctxt->efer);
 	asm volatile ("movq %%cr0, %0" : "=r" (ctxt->cr0));
 	asm volatile ("movq %%cr2, %0" : "=r" (ctxt->cr2));
 	asm volatile ("movq %%cr3, %0" : "=r" (ctxt->cr3));
@@ -75,6 +75,7 @@ void __restore_processor_state(struct sa
 	/*
 	 * control registers
 	 */
+	wrmsrl(MSR_EFER, ctxt->efer);
 	asm volatile ("movq %0, %%cr8" :: "r" (ctxt->cr8));
 	asm volatile ("movq %0, %%cr4" :: "r" (ctxt->cr4));
 	asm volatile ("movq %0, %%cr3" :: "r" (ctxt->cr3));
Index: linux/include/asm-x86_64/suspend.h
===================================================================
--- linux.orig/include/asm-x86_64/suspend.h
+++ linux/include/asm-x86_64/suspend.h
@@ -17,6 +17,7 @@ struct saved_context {
   	u16 ds, es, fs, gs, ss;
 	unsigned long gs_base, gs_kernel_base, fs_base;
 	unsigned long cr0, cr2, cr3, cr4, cr8;
+	unsigned long efer;
 	u16 gdt_pad;
 	u16 gdt_limit;
 	unsigned long gdt_base;


* [PATCH] [9/22] x86_64: 64bit PIC SMP trampoline
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (7 preceding siblings ...)
  2007-04-28 17:58 ` [PATCH] [8/22] x86_64: Add EFER to the register set saved by save_processor_state Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:58 ` [PATCH] [10/22] x86_64: Get rid of dead code in suspend resume Andi Kleen
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


This modifies the SMP trampoline and all of the associated code so
it can jump to a 64bit kernel loaded at an arbitrary address.
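The trampoline is copied to low memory and entered in real mode with IP=0, so it discovers its own linear address as CS * 16 (the `movzx %ax, %esi` / `shll $4, %esi` pair) and adds that base to the offsets assembled relative to `r_base` in `startup_32_vector`, `startup_64_vector` and the `tgdt` base. A minimal C model of that arithmetic, with hypothetical names and an example CS value:

```c
#include <assert.h>
#include <stdint.h>

/* In real mode, linear address = segment * 16 + offset. */
static uint32_t real_mode_base(uint16_t cs)
{
	return (uint32_t)cs << 4;
}

/* Hypothetical model of the 6-byte far pointer consumed by ljmpl:
 * a 32-bit offset followed by a 16-bit selector. */
struct far_ptr32 {
	uint32_t offset;
	uint16_t selector;
};

/* Equivalent of "addl %esi, startup_32_vector - r_base": turn an
 * r_base-relative offset into an absolute linear address. */
static void fixup_vector(struct far_ptr32 *v, uint32_t base)
{
	v->offset += base;
}
```

Because only the offset is patched at run time, the same trampoline image works wherever the kernel copies it below 1MB.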

The dependencies on having an identity mapped page in the kernel
page tables for SMP bootup have all been removed.
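The new `trampoline_level4_pgt` fills only PML4 slot 0 (`level3_ident_pgt`) and slot 511 (`level3_kernel_pgt`), with 510 empty slots in between. With 4-level paging, bits 47:39 of a virtual address select the PML4 slot, so low trampoline addresses land in slot 0 and the kernel mapping at `__START_KERNEL_map` (0xffffffff80000000) lands in slot 511. A small sketch of that lookup; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL	/* __START_KERNEL_map */

/* Each PML4 entry covers 512 GiB; bits 47:39 pick the entry. */
static unsigned pml4_index(uint64_t vaddr)
{
	return (unsigned)((vaddr >> 39) & 0x1ff);
}

/* True if the top-level table has a non-empty entry for vaddr. */
static bool pml4_maps(const uint64_t pml4[512], uint64_t vaddr)
{
	return pml4[pml4_index(vaddr)] != 0;
}
```

Two populated slots are enough to cover both the identity-mapped trampoline and the kernel image, so no kernel page table has to be modified for SMP bootup.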

In addition, the trampoline has been modified to verify
that long mode is supported.  Asking if long mode is implemented is
downright silly, but we have traditionally had some of these checks,
and they can't hurt anything.  So when the totally ludicrous happens
we just might handle it correctly.
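The `andl`/`xorl`/`jnz` sequences in `verify_cpu` accept the CPU only if every required bit is set. The sketch below models that decision on raw CPUID register values (leaf 1 EDX: FPU, PSE, TSC, MSR, PAE, CX8, PGE, CMOV, FXSR, SSE, SSE2; leaf 0x80000001 EDX bit 29: LM). The function names are hypothetical, and it assumes CPUID exists, which the assembly checks first by toggling EFLAGS.ID (the 0x200000 mask).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REQUIRED_MASK1 ((1u<<0)|(1u<<3)|(1u<<4)|(1u<<5)|(1u<<6)|(1u<<8)|\
			(1u<<13)|(1u<<15)|(1u<<24)|(1u<<25)|(1u<<26))
#define REQUIRED_MASK2 (1u<<29)

/* Same test as "andl mask, %edx; xorl mask, %edx; jnz fail". */
static bool has_all(uint32_t edx, uint32_t mask)
{
	return ((edx & mask) ^ mask) == 0;
}

/* Hypothetical model of verify_cpu given the register values it reads. */
static bool supports_long_mode(uint32_t max_std_leaf, uint32_t leaf1_edx,
			       uint32_t max_ext_leaf, uint32_t ext1_edx)
{
	if (max_std_leaf < 1)			/* cpuid 1 not implemented */
		return false;
	if (!has_all(leaf1_edx, REQUIRED_MASK1))
		return false;
	if (max_ext_leaf < 0x80000001u)		/* no extended leaf */
		return false;
	return has_all(ext1_edx, REQUIRED_MASK2);
}
```

All comparisons are unsigned, matching the `jb` branches in the assembly.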

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/head.S       |    1 
 arch/x86_64/kernel/setup.c      |    9 --
 arch/x86_64/kernel/trampoline.S |  168 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 156 insertions(+), 22 deletions(-)

Index: linux/arch/x86_64/kernel/head.S
===================================================================
--- linux.orig/arch/x86_64/kernel/head.S
+++ linux/arch/x86_64/kernel/head.S
@@ -101,6 +101,7 @@ startup_32:
 	.org 0x100	
 	.globl startup_64
 startup_64:
+ENTRY(secondary_startup_64)
 	/* We come here either from startup_32
 	 * or directly from a 64bit bootloader.
 	 * Since we may have come directly from a bootloader we
Index: linux/arch/x86_64/kernel/setup.c
===================================================================
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -329,15 +329,8 @@ void __init setup_arch(char **cmdline_p)
 #endif
 
 #ifdef CONFIG_SMP
-	/*
-	 * But first pinch a few for the stack/trampoline stuff
-	 * FIXME: Don't need the extra page at 4K, but need to fix
-	 * trampoline before removing it. (see the GDT stuff)
-	 */
-	reserve_bootmem_generic(PAGE_SIZE, PAGE_SIZE);
-
 	/* Reserve SMP trampoline */
-	reserve_bootmem_generic(SMP_TRAMPOLINE_BASE, PAGE_SIZE);
+	reserve_bootmem_generic(SMP_TRAMPOLINE_BASE, 2*PAGE_SIZE);
 #endif
 
 #ifdef CONFIG_ACPI_SLEEP
Index: linux/arch/x86_64/kernel/trampoline.S
===================================================================
--- linux.orig/arch/x86_64/kernel/trampoline.S
+++ linux/arch/x86_64/kernel/trampoline.S
@@ -3,6 +3,7 @@
  *	Trampoline.S	Derived from Setup.S by Linus Torvalds
  *
  *	4 Jan 1997 Michael Chastain: changed to gnu as.
+ *	15 Sept 2005 Eric Biederman: 64bit PIC support
  *
  *	Entry: CS:IP point to the start of our code, we are 
  *	in real mode with no stack, but the rest of the 
@@ -17,15 +18,20 @@
  *	and IP is zero.  Thus, data addresses need to be absolute
  *	(no relocation) and are taken with regard to r_base.
  *
+ *	With the addition of trampoline_level4_pgt this code can
+ *	now enter a 64bit kernel that lives at arbitrary 64bit
+ *	physical addresses.
+ *
  *	If you work on this file, check the object module with objdump
  *	--full-contents --reloc to make sure there are no relocation
- *	entries. For the GDT entry we do hand relocation in smpboot.c
- *	because of 64bit linker limitations.
+ *	entries.
  */
 
 #include <linux/linkage.h>
-#include <asm/segment.h>
+#include <asm/pgtable.h>
 #include <asm/page.h>
+#include <asm/msr.h>
+#include <asm/segment.h>
 
 .data
 
@@ -33,15 +39,31 @@
 
 ENTRY(trampoline_data)
 r_base = .
+	cli			# We should be safe anyway
 	wbinvd	
 	mov	%cs, %ax	# Code and data in the same place
 	mov	%ax, %ds
+	mov	%ax, %es
+	mov	%ax, %ss
 
-	cli			# We should be safe anyway
 
 	movl	$0xA5A5A5A5, trampoline_data - r_base
 				# write marker for master knows we're running
 
+					# Setup stack
+	movw	$(trampoline_stack_end - r_base), %sp
+
+	call	verify_cpu		# Verify the cpu supports long mode
+
+	mov	%cs, %ax
+	movzx	%ax, %esi		# Find the 32bit trampoline location
+	shll	$4, %esi
+
+					# Fixup the vectors
+	addl	%esi, startup_32_vector - r_base
+	addl	%esi, startup_64_vector - r_base
+	addl	%esi, tgdt + 2 - r_base	# Fixup the gdt pointer
+
 	/*
 	 * GDT tables in non default location kernel can be beyond 16MB and
 	 * lgdt will not be able to load the address as in real mode default
@@ -49,23 +71,141 @@ r_base = .
 	 * to 32 bit.
 	 */
 
-	lidtl	idt_48 - r_base	# load idt with 0, 0
-	lgdtl	gdt_48 - r_base	# load gdt with whatever is appropriate
+	lidtl	tidt - r_base	# load idt with 0, 0
+	lgdtl	tgdt - r_base	# load gdt with whatever is appropriate
 
 	xor	%ax, %ax
 	inc	%ax		# protected mode (PE) bit
 	lmsw	%ax		# into protected mode
-	# flaush prefetch and jump to startup_32 in arch/x86_64/kernel/head.S
-	ljmpl	$__KERNEL32_CS, $(startup_32-__START_KERNEL_map)
+
+	# flush prefetch and jump to startup_32
+	ljmpl	*(startup_32_vector - r_base)
+
+	.code32
+	.balign 4
+startup_32:
+	movl	$__KERNEL_DS, %eax	# Initialize the %ds segment register
+	movl	%eax, %ds
+
+	xorl	%eax, %eax
+	btsl	$5, %eax		# Enable PAE mode
+	movl	%eax, %cr4
+
+					# Setup trampoline 4 level pagetables
+	leal	(trampoline_level4_pgt - r_base)(%esi), %eax
+	movl	%eax, %cr3
+
+	movl	$MSR_EFER, %ecx
+	movl	$(1 << _EFER_LME), %eax	# Enable Long Mode
+	xorl	%edx, %edx
+	wrmsr
+
+	xorl	%eax, %eax
+	btsl	$31, %eax		# Enable paging and in turn activate Long Mode
+	btsl	$0, %eax		# Enable protected mode
+	movl	%eax, %cr0
+
+	/*
+	 * At this point we're in long mode but in 32bit compatibility mode
+	 * with EFER.LME = 1, CS.L = 0, CS.D = 1 (and in turn
+	 * EFER.LMA = 1). Now we want to jump in 64bit mode, to do that we use
+	 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
+	 */
+	ljmp	*(startup_64_vector - r_base)(%esi)
+
+	.code64
+	.balign 4
+startup_64:
+	# Now jump into the kernel using virtual addresses
+	movq	$secondary_startup_64, %rax
+	jmp	*%rax
+
+	.code16
+verify_cpu:
+	pushl	$0			# Kill any dangerous flags
+	popfl
+
+	/* minimum CPUID flags for x86-64 */
+	/* see http://www.x86-64.org/lists/discuss/msg02971.html */
+#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
+			   (1<<13)|(1<<15)|(1<<24)|(1<<25)|(1<<26))
+#define REQUIRED_MASK2 (1<<29)
+
+	pushfl				# check for cpuid
+	popl	%eax
+	movl	%eax, %ebx
+	xorl	$0x200000,%eax
+	pushl	%eax
+	popfl
+	pushfl
+	popl	%eax
+	pushl	%ebx
+	popfl
+	cmpl	%eax, %ebx
+	jz	no_longmode
+
+	xorl	%eax, %eax		# See if cpuid 1 is implemented
+	cpuid
+	cmpl	$0x1, %eax
+	jb	no_longmode
+
+	movl	$0x01, %eax		# Does the cpu have what it takes?
+	cpuid
+	andl	$REQUIRED_MASK1, %edx
+	xorl	$REQUIRED_MASK1, %edx
+	jnz	no_longmode
+
+	movl	$0x80000000, %eax	# See if extended cpuid is implemented
+	cpuid
+	cmpl	$0x80000001, %eax
+	jb	no_longmode
+
+	movl	$0x80000001, %eax	# Does the cpu have what it takes?
+	cpuid
+	andl	$REQUIRED_MASK2, %edx
+	xorl	$REQUIRED_MASK2, %edx
+	jnz	no_longmode
+
+	ret				# The cpu supports long mode
+
+no_longmode:
+	hlt
+	jmp no_longmode
+
 
 	# Careful these need to be in the same 64K segment as the above;
-idt_48:
+tidt:
 	.word	0			# idt limit = 0
 	.word	0, 0			# idt base = 0L
 
-gdt_48:
-	.short	GDT_ENTRIES*8 - 1	# gdt limit
-	.long	cpu_gdt_table-__START_KERNEL_map
+	# Duplicate the global descriptor table
+	# so the kernel can live anywhere
+	.balign 4
+tgdt:
+	.short	tgdt_end - tgdt		# gdt limit
+	.long	tgdt - r_base
+	.short 0
+	.quad	0x00cf9b000000ffff	# __KERNEL32_CS
+	.quad	0x00af9b000000ffff	# __KERNEL_CS
+	.quad	0x00cf93000000ffff	# __KERNEL_DS
+tgdt_end:
+
+	.balign 4
+startup_32_vector:
+	.long	startup_32 - r_base
+	.word	__KERNEL32_CS, 0
+
+	.balign 4
+startup_64_vector:
+	.long	startup_64 - r_base
+	.word	__KERNEL_CS, 0
+
+trampoline_stack:
+	.org 0x1000
+trampoline_stack_end:
+ENTRY(trampoline_level4_pgt)
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	510,8,0
+	.quad	level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 
-.globl trampoline_end
-trampoline_end:	
+ENTRY(trampoline_end)


* [PATCH] [10/22] x86_64: Get rid of dead code in suspend resume
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (8 preceding siblings ...)
  2007-04-28 17:58 ` [PATCH] [9/22] x86_64: 64bit PIC SMP trampoline Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:58 ` [PATCH] [11/22] x86_64: wakeup.S rename registers to reflect right names Andi Kleen
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


o Get rid of dead code in wakeup.S

o We never restore from saved_gdt, saved_idt, saved_ldt, saved_tss, saved_cr3,
  saved_cr4, saved_cr0, real_save_gdt, saved_efer, saved_efer2. Get rid
  of the associated code.

o Get rid of bogus_magic, bogus_31_magic and bogus_magic2. They are no
  longer used.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/acpi/wakeup.S |   57 ---------------------------------------
 1 file changed, 1 insertion(+), 56 deletions(-)

Index: linux/arch/x86_64/kernel/acpi/wakeup.S
===================================================================
--- linux.orig/arch/x86_64/kernel/acpi/wakeup.S
+++ linux/arch/x86_64/kernel/acpi/wakeup.S
@@ -258,8 +258,6 @@ gdt_48a:
 	.word	0, 0				# gdt base (filled in later)
 	
 	
-real_save_gdt:	.word 0
-		.quad 0
 real_magic:	.quad 0
 video_mode:	.quad 0
 video_flags:	.quad 0
@@ -272,10 +270,6 @@ bogus_32_magic:
 	movb	$0xb3,%al	;  outb %al,$0x80
 	jmp bogus_32_magic
 
-bogus_31_magic:
-	movb	$0xb1,%al	;  outb %al,$0x80
-	jmp bogus_31_magic
-
 bogus_cpu:
 	movb	$0xbc,%al	;  outb %al,$0x80
 	jmp bogus_cpu
@@ -346,16 +340,6 @@ check_vesaa:
 
 _setbada: jmp setbada
 
-	.code64
-bogus_magic:
-	movw	$0x0e00 + 'B', %ds:(0xb8018)
-	jmp bogus_magic
-
-bogus_magic2:
-	movw	$0x0e00 + '2', %ds:(0xb8018)
-	jmp bogus_magic2
-	
-
 wakeup_stack_begin:	# Stack grows down
 
 .org	0xff0
@@ -373,28 +357,11 @@ ENTRY(wakeup_end)
 #
 # Returned address is location of code in low memory (past data and stack)
 #
+	.code64
 ENTRY(acpi_copy_wakeup_routine)
 	pushq	%rax
-	pushq	%rcx
 	pushq	%rdx
 
-	sgdt	saved_gdt
-	sidt	saved_idt
-	sldt	saved_ldt
-	str	saved_tss
-
-	movq    %cr3, %rdx
-	movq    %rdx, saved_cr3
-	movq    %cr4, %rdx
-	movq    %rdx, saved_cr4
-	movq	%cr0, %rdx
-	movq	%rdx, saved_cr0
-	sgdt    real_save_gdt - wakeup_start (,%rdi)
-	movl	$MSR_EFER, %ecx
-	rdmsr
-	movl	%eax, saved_efer
-	movl	%edx, saved_efer2
-
 	movl	saved_video_mode, %edx
 	movl	%edx, video_mode - wakeup_start (,%rdi)
 	movl	acpi_video_flags, %edx
@@ -407,17 +374,8 @@ ENTRY(acpi_copy_wakeup_routine)
 	cmpl	$0x9abcdef0, %eax
 	jne	bogus_32_magic
 
-	# make sure %cr4 is set correctly (features, etc)
-	movl	saved_cr4 - __START_KERNEL_map, %eax
-	movq	%rax, %cr4
-
-	movl	saved_cr0 - __START_KERNEL_map, %eax
-	movq	%rax, %cr0
-	jmp	1f		# Flush pipelines
-1:
 	# restore the regs we used
 	popq	%rdx
-	popq	%rcx
 	popq	%rax
 ENTRY(do_suspend_lowlevel_s4bios)
 	ret
@@ -512,16 +470,3 @@ ENTRY(saved_eip)	.quad	0
 ENTRY(saved_esp)	.quad	0
 
 ENTRY(saved_magic)	.quad	0
-
-ALIGN
-# saved registers
-saved_gdt:	.quad	0,0
-saved_idt:	.quad	0,0
-saved_ldt:	.quad	0
-saved_tss:	.quad	0
-
-saved_cr0:	.quad 0
-saved_cr3:	.quad 0
-saved_cr4:	.quad 0
-saved_efer:	.quad 0
-saved_efer2:	.quad 0


* [PATCH] [11/22] x86_64: wakeup.S rename registers to reflect right names
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (9 preceding siblings ...)
  2007-04-28 17:58 ` [PATCH] [10/22] x86_64: Get rid of dead code in suspend resume Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:58 ` [PATCH] [12/22] x86_64: wakeup.S misc cleanups Andi Kleen
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


o Use appropriate names for 64bit registers.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/acpi/wakeup.S |   36 ++++++++++++++++++------------------
 include/asm-x86_64/suspend.h     |   12 ++++++------
 2 files changed, 24 insertions(+), 24 deletions(-)

Index: linux/arch/x86_64/kernel/acpi/wakeup.S
===================================================================
--- linux.orig/arch/x86_64/kernel/acpi/wakeup.S
+++ linux/arch/x86_64/kernel/acpi/wakeup.S
@@ -211,16 +211,16 @@ wakeup_long64:
 	movw	%ax, %es
 	movw	%ax, %fs
 	movw	%ax, %gs
-	movq	saved_esp, %rsp
+	movq	saved_rsp, %rsp
 
 	movw	$0x0e00 + 'x', %ds:(0xb8018)
-	movq	saved_ebx, %rbx
-	movq	saved_edi, %rdi
-	movq	saved_esi, %rsi
-	movq	saved_ebp, %rbp
+	movq	saved_rbx, %rbx
+	movq	saved_rdi, %rdi
+	movq	saved_rsi, %rsi
+	movq	saved_rbp, %rbp
 
 	movw	$0x0e00 + '!', %ds:(0xb801a)
-	movq	saved_eip, %rax
+	movq	saved_rip, %rax
 	jmp	*%rax
 
 .code32
@@ -408,13 +408,13 @@ do_suspend_lowlevel:
 	movq %r15, saved_context_r15(%rip)
 	pushfq ; popq saved_context_eflags(%rip)
 
-	movq	$.L97, saved_eip(%rip)
+	movq	$.L97, saved_rip(%rip)
 
-	movq %rsp,saved_esp
-	movq %rbp,saved_ebp
-	movq %rbx,saved_ebx
-	movq %rdi,saved_edi
-	movq %rsi,saved_esi
+	movq %rsp,saved_rsp
+	movq %rbp,saved_rbp
+	movq %rbx,saved_rbx
+	movq %rdi,saved_rdi
+	movq %rsi,saved_rsi
 
 	addq	$8, %rsp
 	movl	$3, %edi
@@ -461,12 +461,12 @@ do_suspend_lowlevel:
 	
 .data
 ALIGN
-ENTRY(saved_ebp)	.quad	0
-ENTRY(saved_esi)	.quad	0
-ENTRY(saved_edi)	.quad	0
-ENTRY(saved_ebx)	.quad	0
+ENTRY(saved_rbp)	.quad	0
+ENTRY(saved_rsi)	.quad	0
+ENTRY(saved_rdi)	.quad	0
+ENTRY(saved_rbx)	.quad	0
 
-ENTRY(saved_eip)	.quad	0
-ENTRY(saved_esp)	.quad	0
+ENTRY(saved_rip)	.quad	0
+ENTRY(saved_rsp)	.quad	0
 
 ENTRY(saved_magic)	.quad	0
Index: linux/include/asm-x86_64/suspend.h
===================================================================
--- linux.orig/include/asm-x86_64/suspend.h
+++ linux/include/asm-x86_64/suspend.h
@@ -45,12 +45,12 @@ extern unsigned long saved_context_eflag
 extern void fix_processor_context(void);
 
 #ifdef CONFIG_ACPI_SLEEP
-extern unsigned long saved_eip;
-extern unsigned long saved_esp;
-extern unsigned long saved_ebp;
-extern unsigned long saved_ebx;
-extern unsigned long saved_esi;
-extern unsigned long saved_edi;
+extern unsigned long saved_rip;
+extern unsigned long saved_rsp;
+extern unsigned long saved_rbp;
+extern unsigned long saved_rbx;
+extern unsigned long saved_rsi;
+extern unsigned long saved_rdi;
 
 /* routines for saving/restoring kernel state */
 extern int acpi_save_state_mem(void);


* [PATCH] [12/22] x86_64: wakeup.S misc cleanups
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (10 preceding siblings ...)
  2007-04-28 17:58 ` [PATCH] [11/22] x86_64: wakeup.S rename registers to reflect right names Andi Kleen
@ 2007-04-28 17:58 ` Andi Kleen
  2007-04-28 17:59 ` [PATCH] [13/22] x86_64: 64bit ACPI wakeup trampoline Andi Kleen
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:58 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


o Various cleanups. One of the main purposes of these cleanups is to make
  wakeup.S as close as possible to trampoline.S.

o Following are the changes:
	- Indentation fixes for comments.
	- Changed the gdt table to compact form to resemble the
	  one in trampoline.S.
	- Take the jump to 32bit from real mode using ljmpl. Makes the code
	  more readable.
	- After enabling long mode, directly take a long jump to 64bit
	  mode. No need to take an extra jump to "reach_compatibility_mode".
	- The stack is not used after real mode, so don't load a stack in
	  32 bit mode.
	- No need to enable PGE here.
	- No need to do an extra EFER read; we trash the read contents anyway.
	- No need to enable system call (EFER_SCE). It will be enabled
	  anyway when the original EFER is restored.
	- No need to set the MP, ET, NE, WP, AM bits in cr0. Very soon we will
	  reload the original cr0 while restoring the processor state.
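The control-register changes in the list above reduce to simple bit arithmetic. This sketch (hypothetical function names, mirroring the `btsl` sequences in the diff) shows the values the old and new wakeup code load: CR0 bits PG=31, PE=0, MP=1, ET=4, NE=5, WP=16, AM=18; CR4 bits PAE=5, PGE=7; EFER bits LME=8, NX=11.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Old wakeup code: PG, PE, MP, ET, NE, WP, AM. */
static uint32_t old_wakeup_cr0(void)
{
	return BIT(31) | BIT(0) | BIT(1) | BIT(4) | BIT(5) | BIT(16) | BIT(18);
}

/* New wakeup code: only PG and PE; the saved cr0 restores the rest. */
static uint32_t new_wakeup_cr0(void) { return BIT(31) | BIT(0); }

/* CR4: old code set PAE and PGE, new code sets only PAE. */
static uint32_t old_wakeup_cr4(void) { return BIT(5) | BIT(7); }
static uint32_t new_wakeup_cr4(void) { return BIT(5); }

/* EFER: LME always, NX only if cpuid reported it (btl $20, %edi). */
static uint32_t new_wakeup_efer(bool cpu_has_nx)
{
	return BIT(8) | (cpu_has_nx ? BIT(11) : 0);
}
```

The old CR0 value works out to 0x80050033 and the new one to 0x80000001; SCE is no longer set in the temporary EFER because the saved EFER is written back during processor state restore.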

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/acpi/wakeup.S |  112 +++++++++++++--------------------------
 1 file changed, 40 insertions(+), 72 deletions(-)

Index: linux/arch/x86_64/kernel/acpi/wakeup.S
===================================================================
--- linux.orig/arch/x86_64/kernel/acpi/wakeup.S
+++ linux/arch/x86_64/kernel/acpi/wakeup.S
@@ -30,11 +30,12 @@ wakeup_code:
 	cld
 	# setup data segment
 	movw	%cs, %ax
-	movw	%ax, %ds					# Make ds:0 point to wakeup_start
+	movw	%ax, %ds		# Make ds:0 point to wakeup_start
 	movw	%ax, %ss
-	mov	$(wakeup_stack - wakeup_code), %sp		# Private stack is needed for ASUS board
+					# Private stack is needed for ASUS board
+	mov	$(wakeup_stack - wakeup_code), %sp
 
-	pushl	$0						# Kill any dangerous flags
+	pushl	$0			# Kill any dangerous flags
 	popfl
 
 	movl	real_magic - wakeup_code, %eax
@@ -45,7 +46,7 @@ wakeup_code:
 	jz	1f
 	lcall   $0xc000,$3
 	movw	%cs, %ax
-	movw	%ax, %ds					# Bios might have played with that
+	movw	%ax, %ds		# Bios might have played with that
 	movw	%ax, %ss
 1:
 
@@ -75,9 +76,12 @@ wakeup_code:
 	jmp	1f
 1:
 
-	.byte 0x66, 0xea			# prefix + jmpi-opcode
-	.long	wakeup_32 - __START_KERNEL_map
-	.word	__KERNEL_CS
+	ljmpl   *(wakeup_32_vector - wakeup_code)
+
+	.balign 4
+wakeup_32_vector:
+	.long   wakeup_32 - __START_KERNEL_map
+	.word   __KERNEL32_CS, 0
 
 	.code32
 wakeup_32:
@@ -96,65 +100,50 @@ wakeup_32:
 	jnc	bogus_cpu
 	movl	%edx,%edi
 	
-	movw	$__KERNEL_DS, %ax
-	movw	%ax, %ds
-	movw	%ax, %es
-	movw	%ax, %fs
-	movw	%ax, %gs
+	movl	$__KERNEL_DS, %eax
+	movl	%eax, %ds
 
-	movw	$__KERNEL_DS, %ax	
-	movw	%ax, %ss
-
-	mov	$(wakeup_stack - __START_KERNEL_map), %esp
 	movl	saved_magic - __START_KERNEL_map, %eax
 	cmpl	$0x9abcdef0, %eax
 	jne	bogus_32_magic
 
+	movw	$0x0e00 + 'i', %ds:(0xb8012)
+	movb	$0xa8, %al	;  outb %al, $0x80;
+
 	/*
 	 * Prepare for entering 64bits mode
 	 */
 
-	/* Enable PAE mode and PGE */
+	/* Enable PAE */
 	xorl	%eax, %eax
 	btsl	$5, %eax
-	btsl	$7, %eax
 	movl	%eax, %cr4
 
 	/* Setup early boot stage 4 level pagetables */
 	movl	$(wakeup_level4_pgt - __START_KERNEL_map), %eax
 	movl	%eax, %cr3
 
-	/* Setup EFER (Extended Feature Enable Register) */
-	movl	$MSR_EFER, %ecx
-	rdmsr
-	/* Fool rdmsr and reset %eax to avoid dependences */
-	xorl	%eax, %eax
 	/* Enable Long Mode */
+	xorl    %eax, %eax
 	btsl	$_EFER_LME, %eax
-	/* Enable System Call */
-	btsl	$_EFER_SCE, %eax
 
-	/* No Execute supported? */	
+	/* No Execute supported? */
 	btl	$20,%edi
 	jnc     1f
 	btsl	$_EFER_NX, %eax
-1:	
 				
 	/* Make changes effective */
+1:	movl    $MSR_EFER, %ecx
+	xorl    %edx, %edx
 	wrmsr
-	wbinvd
 
 	xorl	%eax, %eax
 	btsl	$31, %eax			/* Enable paging and in turn activate Long Mode */
 	btsl	$0, %eax			/* Enable protected mode */
-	btsl	$1, %eax			/* Enable MP */
-	btsl	$4, %eax			/* Enable ET */
-	btsl	$5, %eax			/* Enable NE */
-	btsl	$16, %eax			/* Enable WP */
-	btsl	$18, %eax			/* Enable AM */
 
 	/* Make changes effective */
 	movl	%eax, %cr0
+
 	/* At this point:
 		CR4.PAE must be 1
 		CS.L must be 0
@@ -162,11 +151,6 @@ wakeup_32:
 		Next instruction must be a branch
 		This must be on identity-mapped page
 	*/
-	jmp	reach_compatibility_mode
-reach_compatibility_mode:
-	movw	$0x0e00 + 'i', %ds:(0xb8012)
-	movb	$0xa8, %al	;  outb %al, $0x80; 	
-		
 	/*
 	 * At this point we're in long mode but in 32bit compatibility mode
 	 * with EFER.LME = 1, CS.L = 0, CS.D = 1 (and in turn
@@ -174,24 +158,19 @@ reach_compatibility_mode:
 	 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
 	 */
 
-	movw	$0x0e00 + 'n', %ds:(0xb8014)
-	movb	$0xa9, %al	;  outb %al, $0x80
-	
-	/* Load new GDT with the 64bit segment using 32bit descriptor */
-	movl	$(pGDT32 - __START_KERNEL_map), %eax
-	lgdt	(%eax)
-
-	movl    $(wakeup_jumpvector - __START_KERNEL_map), %eax
 	/* Finally jump in 64bit mode */
-	ljmp	*(%eax)
+	ljmp	*(wakeup_long64_vector - __START_KERNEL_map)
 
-wakeup_jumpvector:
-	.long	wakeup_long64 - __START_KERNEL_map
-	.word	__KERNEL_CS
+	.balign 4
+wakeup_long64_vector:
+	.long   wakeup_long64 - __START_KERNEL_map
+	.word   __KERNEL_CS, 0
 
 .code64
 
-	/*	Hooray, we are in Long 64-bit mode (but still running in low memory) */
+	/* Hooray, we are in Long 64-bit mode (but still running in
+	 * low memory)
+	 */
 wakeup_long64:
 	/*
 	 * We must switch to a new descriptor in kernel space for the GDT
@@ -201,6 +180,9 @@ wakeup_long64:
 	 */
 	lgdt	cpu_gdt_descr - __START_KERNEL_map
 
+	movw	$0x0e00 + 'n', %ds:(0xb8014)
+	movb	$0xa9, %al	;  outb %al, $0x80
+
 	movw	$0x0e00 + 'u', %ds:(0xb8016)
 	
 	nop
@@ -227,33 +209,19 @@ wakeup_long64:
 
 	.align	64	
 gdta:
+	/* It's good to keep gdt in sync with one in trampoline.S */
 	.word	0, 0, 0, 0			# dummy
-
-	.word	0, 0, 0, 0			# unused
-
-	.word	0xFFFF				# 4Gb - (0x100000*0x1000 = 4Gb)
-	.word	0				# base address = 0
-	.word	0x9B00				# code read/exec. ??? Why I need 0x9B00 (as opposed to 0x9A00 in order for this to work?)
-	.word	0x00CF				# granularity = 4096, 386
-						#  (+5th nibble of limit)
-
-	.word	0xFFFF				# 4Gb - (0x100000*0x1000 = 4Gb)
-	.word	0				# base address = 0
-	.word	0x9200				# data read/write
-	.word	0x00CF				# granularity = 4096, 386
-						#  (+5th nibble of limit)
-# this is 64bit descriptor for code
-	.word	0xFFFF
-	.word	0
-	.word	0x9A00				# code read/exec
-	.word	0x00AF				# as above, but it is long mode and with D=0
+	/* ??? Why I need the accessed bit set in order for this to work? */
+	.quad   0x00cf9b000000ffff              # __KERNEL32_CS
+	.quad   0x00af9b000000ffff              # __KERNEL_CS
+	.quad   0x00cf93000000ffff              # __KERNEL_DS
 
 idt_48a:
 	.word	0				# idt limit = 0
 	.word	0, 0				# idt base = 0L
 
 gdt_48a:
-	.word	0x8000				# gdt limit=2048,
+	.word	0x800				# gdt limit=2048,
 						#  256 GDT entries
 	.word	0, 0				# gdt base (filled in later)
 	
@@ -263,7 +231,7 @@ video_mode:	.quad 0
 video_flags:	.quad 0
 
 bogus_real_magic:
-	movb	$0xba,%al	;  outb %al,$0x80		
+	movb	$0xba,%al	;  outb %al,$0x80
 	jmp bogus_real_magic
 
 bogus_32_magic:


* [PATCH] [13/22] x86_64: 64bit ACPI wakeup trampoline
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (11 preceding siblings ...)
  2007-04-28 17:58 ` [PATCH] [12/22] x86_64: wakeup.S misc cleanups Andi Kleen
@ 2007-04-28 17:59 ` Andi Kleen
  2007-04-28 17:59 ` [PATCH] [14/22] x86_64: Modify discover_ebda to use virtual addresses Andi Kleen
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:59 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


o Moved wakeup_level4_pgt into the wakeup routine so we can
  run the kernel above 4G.

o Now we first go to 64bit mode and continue to run from the trampoline,
  and then start accessing kernel symbols and restore the processor context.
  This enables us to resume even in a relocatable kernel context, when the
  kernel might not be loaded at the physical address it was compiled for.

o Removed the need for modifying any existing kernel page table.

o Increased the size of the wakeup routine to 8K. This is required as the
  wakeup page tables are in the trampoline itself and they have to be on a
  4K boundary, hence one page is not sufficient.
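Why one page is no longer enough can be shown with alignment arithmetic. Assuming (for illustration) that the wakeup area starts page-aligned, that code, data and stack occupy some prefix of it, and that the 4 KiB `wakeup_level4_pgt` must start on the next 4K boundary, the area needs at least two pages. The names below are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

/* Round x up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical layout: 'prefix' bytes of code/data/stack, then a
 * page-aligned 4 KiB top-level page table. */
static size_t wakeup_area_size(size_t prefix)
{
	return align_up(prefix, PAGE_SIZE) + PAGE_SIZE;
}

static bool wakeup_fits(size_t prefix, size_t reserved)
{
	return wakeup_area_size(prefix) <= reserved;
}
```

Even a 1-byte prefix needs 8K under this layout, which matches the switch to `alloc_bootmem_low(PAGE_SIZE*2)` and the enlarged size check in `acpi_reserve_bootmem()`.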

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/acpi/sleep.c  |   24 ++-------------
 arch/x86_64/kernel/acpi/wakeup.S |   59 ++++++++++++++++++++++++---------------
 arch/x86_64/kernel/head.S        |    9 -----
 3 files changed, 41 insertions(+), 51 deletions(-)

Index: linux/arch/x86_64/kernel/acpi/sleep.c
===================================================================
--- linux.orig/arch/x86_64/kernel/acpi/sleep.c
+++ linux/arch/x86_64/kernel/acpi/sleep.c
@@ -60,19 +60,6 @@ extern char wakeup_start, wakeup_end;
 
 extern unsigned long acpi_copy_wakeup_routine(unsigned long);
 
-static pgd_t low_ptr;
-
-static void init_low_mapping(void)
-{
-	pgd_t *slot0 = pgd_offset(current->mm, 0UL);
-	low_ptr = *slot0;
-	/* FIXME: We're playing with the current task's page tables here, which
-	 * is potentially dangerous on SMP systems.
-	 */
-	set_pgd(slot0, *pgd_offset(current->mm, PAGE_OFFSET));
-	local_flush_tlb();
-}
-
 /**
  * acpi_save_state_mem - save kernel state
  *
@@ -81,8 +68,6 @@ static void init_low_mapping(void)
  */
 int acpi_save_state_mem(void)
 {
-	init_low_mapping();
-
 	memcpy((void *)acpi_wakeup_address, &wakeup_start,
 	       &wakeup_end - &wakeup_start);
 	acpi_copy_wakeup_routine(acpi_wakeup_address);
@@ -95,8 +80,6 @@ int acpi_save_state_mem(void)
  */
 void acpi_restore_state_mem(void)
 {
-	set_pgd(pgd_offset(current->mm, 0UL), low_ptr);
-	local_flush_tlb();
 }
 
 /**
@@ -109,10 +92,11 @@ void acpi_restore_state_mem(void)
  */
 void __init acpi_reserve_bootmem(void)
 {
-	acpi_wakeup_address = (unsigned long)alloc_bootmem_low(PAGE_SIZE);
-	if ((&wakeup_end - &wakeup_start) > PAGE_SIZE)
+	acpi_wakeup_address = (unsigned long)alloc_bootmem_low(PAGE_SIZE*2);
+	if ((&wakeup_end - &wakeup_start) > (PAGE_SIZE*2))
 		printk(KERN_CRIT
-		       "ACPI: Wakeup code way too big, will crash on attempt to suspend\n");
+		       "ACPI: Wakeup code way too big, will crash on attempt"
+		       " to suspend\n");
 }
 
 static int __init acpi_sleep_setup(char *str)
Index: linux/arch/x86_64/kernel/acpi/wakeup.S
===================================================================
--- linux.orig/arch/x86_64/kernel/acpi/wakeup.S
+++ linux/arch/x86_64/kernel/acpi/wakeup.S
@@ -1,6 +1,7 @@
 .text
 #include <linux/linkage.h>
 #include <asm/segment.h>
+#include <asm/pgtable.h>
 #include <asm/page.h>
 #include <asm/msr.h>
 
@@ -62,12 +63,15 @@ wakeup_code:
 
 	movb	$0xa2, %al	;  outb %al, $0x80
 	
-	lidt	%ds:idt_48a - wakeup_code
-	xorl	%eax, %eax
-	movw	%ds, %ax			# (Convert %ds:gdt to a linear ptr)
-	shll	$4, %eax
-	addl	$(gdta - wakeup_code), %eax
-	movl	%eax, gdt_48a +2 - wakeup_code
+	mov	%ds, %ax			# Find 32bit wakeup_code addr
+	movzx   %ax, %esi			# (Convert %ds:gdt to a linear ptr)
+	shll    $4, %esi
+						# Fix up the vectors
+	addl    %esi, wakeup_32_vector - wakeup_code
+	addl    %esi, wakeup_long64_vector - wakeup_code
+	addl    %esi, gdt_48a + 2 - wakeup_code # Fixup the gdt pointer
+
+	lidtl	%ds:idt_48a - wakeup_code
 	lgdtl	%ds:gdt_48a - wakeup_code	# load gdt with whatever is
 						# appropriate
 
@@ -80,7 +84,7 @@ wakeup_code:
 
 	.balign 4
 wakeup_32_vector:
-	.long   wakeup_32 - __START_KERNEL_map
+	.long   wakeup_32 - wakeup_code
 	.word   __KERNEL32_CS, 0
 
 	.code32
@@ -103,10 +107,6 @@ wakeup_32:
 	movl	$__KERNEL_DS, %eax
 	movl	%eax, %ds
 
-	movl	saved_magic - __START_KERNEL_map, %eax
-	cmpl	$0x9abcdef0, %eax
-	jne	bogus_32_magic
-
 	movw	$0x0e00 + 'i', %ds:(0xb8012)
 	movb	$0xa8, %al	;  outb %al, $0x80;
 
@@ -120,7 +120,7 @@ wakeup_32:
 	movl	%eax, %cr4
 
 	/* Setup early boot stage 4 level pagetables */
-	movl	$(wakeup_level4_pgt - __START_KERNEL_map), %eax
+	leal    (wakeup_level4_pgt - wakeup_code)(%esi), %eax
 	movl	%eax, %cr3
 
 	/* Enable Long Mode */
@@ -159,11 +159,11 @@ wakeup_32:
 	 */
 
 	/* Finally jump in 64bit mode */
-	ljmp	*(wakeup_long64_vector - __START_KERNEL_map)
+        ljmp    *(wakeup_long64_vector - wakeup_code)(%esi)
 
 	.balign 4
 wakeup_long64_vector:
-	.long   wakeup_long64 - __START_KERNEL_map
+	.long   wakeup_long64 - wakeup_code
 	.word   __KERNEL_CS, 0
 
 .code64
@@ -178,11 +178,16 @@ wakeup_long64:
 	 * addresses where we're currently running on. We have to do that here
 	 * because in 32bit we couldn't load a 64bit linear address.
 	 */
-	lgdt	cpu_gdt_descr - __START_KERNEL_map
+	lgdt	cpu_gdt_descr
 
 	movw	$0x0e00 + 'n', %ds:(0xb8014)
 	movb	$0xa9, %al	;  outb %al, $0x80
 
+	movq    saved_magic, %rax
+	movq    $0x123456789abcdef0, %rdx
+	cmpq    %rdx, %rax
+	jne     bogus_64_magic
+
 	movw	$0x0e00 + 'u', %ds:(0xb8016)
 	
 	nop
@@ -223,20 +228,21 @@ idt_48a:
 gdt_48a:
 	.word	0x800				# gdt limit=2048,
 						#  256 GDT entries
-	.word	0, 0				# gdt base (filled in later)
-	
+	.long   gdta - wakeup_code              # gdt base (relocated later)
 	
 real_magic:	.quad 0
 video_mode:	.quad 0
 video_flags:	.quad 0
 
+.code16
 bogus_real_magic:
 	movb	$0xba,%al	;  outb %al,$0x80
 	jmp bogus_real_magic
 
-bogus_32_magic:
+.code64
+bogus_64_magic:
 	movb	$0xb3,%al	;  outb %al,$0x80
-	jmp bogus_32_magic
+	jmp bogus_64_magic
 
 bogus_cpu:
 	movb	$0xbc,%al	;  outb %al,$0x80
@@ -263,6 +269,7 @@ bogus_cpu:
 #define VIDEO_FIRST_V7 0x0900
 
 # Setting of user mode (AX=mode ID) => CF=success
+.code16
 mode_seta:
 	movw	%ax, %bx
 #if 0
@@ -313,6 +320,13 @@ wakeup_stack_begin:	# Stack grows down
 .org	0xff0
 wakeup_stack:		# Just below end of page
 
+.org   0x1000
+ENTRY(wakeup_level4_pgt)
+	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill   510,8,0
+	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	.quad   level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
+
 ENTRY(wakeup_end)
 	
 ##
@@ -338,9 +352,10 @@ ENTRY(acpi_copy_wakeup_routine)
 	movq	$0x123456789abcdef0, %rdx
 	movq	%rdx, saved_magic
 
-	movl	saved_magic - __START_KERNEL_map, %eax
-	cmpl	$0x9abcdef0, %eax
-	jne	bogus_32_magic
+	movq    saved_magic, %rax
+	movq    $0x123456789abcdef0, %rdx
+	cmpq    %rdx, %rax
+	jne     bogus_64_magic
 
 	# restore the regs we used
 	popq	%rdx
Index: linux/arch/x86_64/kernel/head.S
===================================================================
--- linux.orig/arch/x86_64/kernel/head.S
+++ linux/arch/x86_64/kernel/head.S
@@ -308,15 +308,6 @@ NEXT_PAGE(level2_kernel_pgt)
 
 	.data
 
-#ifdef CONFIG_ACPI_SLEEP
-	.align PAGE_SIZE
-ENTRY(wakeup_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	510,8,0
-	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-	.quad	level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
-#endif
-
 #ifndef CONFIG_HOTPLUG_CPU
 	__INITDATA
 #endif
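
In real mode, a segment value corresponds to the linear address segment << 4. That is what the new `mov %ds, %ax; movzx %ax, %esi; shll $4, %esi` sequence computes. Because wakeup_32_vector, wakeup_long64_vector and the gdt_48a base are now assembled as offsets from wakeup_code, adding that base once makes them valid wherever the trampoline was copied.

The following is a rough C model of the fixup, not kernel code. struct far_vector and the helper names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the ".long offset; .word selector, 0" far-pointer
 * layout used by wakeup_32_vector and wakeup_long64_vector. */
struct far_vector {
	uint32_t offset;   /* assembled as (label - wakeup_code) */
	uint16_t selector; /* e.g. __KERNEL32_CS or __KERNEL_CS */
	uint16_t pad;
};

/* Real mode: linear address of a segment = segment * 16. */
static uint32_t real_mode_linear_base(uint16_t ds)
{
	return (uint32_t)ds << 4;
}

/* Model of "addl %esi, vector - wakeup_code": turn an offset relative to
 * wakeup_code into a 32-bit linear address. */
static void relocate_vector(struct far_vector *v, uint32_t base)
{
	assert(base <= 0xffff0); /* a real-mode base fits in 20 bits */
	v->offset += base;
}
```

For example, with the trampoline at 0x9000 (%ds = 0x0900), a vector assembled as 0x120 becomes 0x9120.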

* [PATCH] [14/22] x86_64: Modify discover_ebda to use virtual addresses
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (12 preceding siblings ...)
  2007-04-28 17:59 ` [PATCH] [13/22] x86_64: 64bit ACPI wakeup trampoline Andi Kleen
@ 2007-04-28 17:59 ` Andi Kleen
  2007-04-28 17:59 ` [PATCH] [15/22] x86_64: Remove the identity mapping as early as possible Andi Kleen
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:59 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/setup.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/x86_64/kernel/setup.c
===================================================================
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -205,10 +205,10 @@ static void discover_ebda(void)
 	 * there is a real-mode segmented pointer pointing to the 
 	 * 4K EBDA area at 0x40E
 	 */
-	ebda_addr = *(unsigned short *)EBDA_ADDR_POINTER;
+	ebda_addr = *(unsigned short *)__va(EBDA_ADDR_POINTER);
 	ebda_addr <<= 4;
 
-	ebda_size = *(unsigned short *)(unsigned long)ebda_addr;
+	ebda_size = *(unsigned short *)__va(ebda_addr);
 
 	/* Round EBDA up to pages */
 	if (ebda_size == 0)
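
The word at physical address 0x40E holds the real-mode segment of the EBDA, so its physical address is that segment times 16. After this patch, both reads go through __va() instead of dereferencing the physical address directly.

The sketch below performs the same two reads in user space, against a byte array that stands in for low physical memory. Indexing the array plays the role of __va(). The names read_u16 and ebda_locate are illustrative only. The size word is returned as read, and its interpretation is left to the code that follows in setup.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EBDA_ADDR_POINTER 0x40E /* BDA word holding the EBDA segment */

/* Little-endian 16-bit read at a "physical" offset in the memory image;
 * in this model, low_mem + phys is what __va(phys) would give. */
static uint16_t read_u16(const uint8_t *low_mem, size_t len, size_t phys)
{
	assert(phys + 1 < len);
	return (uint16_t)(low_mem[phys] | (low_mem[phys + 1] << 8));
}

/* Mirrors discover_ebda(): segment -> physical address, then the first
 * 16-bit word stored at that address. */
static void ebda_locate(const uint8_t *low_mem, size_t len,
			unsigned long *ebda_addr, unsigned long *ebda_size)
{
	*ebda_addr = read_u16(low_mem, len, EBDA_ADDR_POINTER);
	*ebda_addr <<= 4;
	*ebda_size = read_u16(low_mem, len, *ebda_addr);
}
```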

* [PATCH] [15/22] x86_64: Remove the identity mapping as early as possible
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (13 preceding siblings ...)
  2007-04-28 17:59 ` [PATCH] [14/22] x86_64: Modify discover_ebda to use virtual addresses Andi Kleen
@ 2007-04-28 17:59 ` Andi Kleen
  2007-04-28 17:59 ` [PATCH] [16/22] x86: Move swsusp __pa() dependent code to arch portion Andi Kleen
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:59 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


With the rewrite of the SMP trampoline and the early page
allocator, nothing needs identity-mapped pages once we start
executing C code.

So add zap_identity_mappings() to head64.c and remove
zap_low_mappings(), which ran much later in the code.  The
functions are subtly different, hence the name change.

This also kills boot_level4_pgt, which was left over from an
earlier attempt to move the identity mappings as early as
possible and is no longer needed.  Essentially I have replaced
boot_level4_pgt with trampoline_level4_pgt in trampoline.S.
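
Each of the 512 top-level (PGD) entries covers 2^39 bytes and is selected by virtual-address bits 47..39. Clearing entry 0 therefore removes the mapping of virtual address 0, which is exactly the low identity mapping.

The sketch below computes that index. The 0xffff810000000000 direct-map base is an assumption: it is the x86_64 PAGE_OFFSET of this era. It corresponds to the second level3_ident_pgt entry (slot 258) in the init_level4_pgt layout above.

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT  39   /* each PGD entry maps 512 GiB */
#define PTRS_PER_PGD 512

/* Top-level page-table index for a 48-bit canonical virtual address. */
static unsigned int pgd_index_of(uint64_t vaddr)
{
	/* Canonical form: bits 63..47 are all copies of bit 47. */
	uint64_t top = vaddr >> 47;

	assert(top == 0 || top == 0x1ffff);
	return (unsigned int)((vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}
```

With these definitions, 0 and 0x100000 fall in slot 0, 0xffff810000000000 in slot 258 and 0xffffffff80000000 in slot 511. Slot 511 matches the "(2^48-(2*1024*1024*1024))/(2^39) = 511" comment.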

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/kernel/head.S    |   39 ++++++++++++++-------------------------
 arch/x86_64/kernel/head64.c  |   17 +++++++++++------
 arch/x86_64/kernel/setup.c   |    2 --
 arch/x86_64/kernel/setup64.c |    1 -
 arch/x86_64/mm/init.c        |   24 ------------------------
 include/asm-x86_64/pgtable.h |    1 -
 include/asm-x86_64/proto.h   |    2 --
 7 files changed, 25 insertions(+), 61 deletions(-)

Index: linux/arch/x86_64/kernel/head64.c
===================================================================
--- linux.orig/arch/x86_64/kernel/head64.c
+++ linux/arch/x86_64/kernel/head64.c
@@ -18,8 +18,16 @@
 #include <asm/setup.h>
 #include <asm/desc.h>
 #include <asm/pgtable.h>
+#include <asm/tlbflush.h>
 #include <asm/sections.h>
 
+static void __init zap_identity_mappings(void)
+{
+	pgd_t *pgd = pgd_offset_k(0UL);
+	pgd_clear(pgd);
+	__flush_tlb();
+}
+
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
    yet. */
 static void __init clear_bss(void)
@@ -57,18 +65,15 @@ void __init x86_64_start_kernel(char * r
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
+	/* Make NULL pointers segfault */
+	zap_identity_mappings();
+
 	for (i = 0; i < IDT_ENTRIES; i++)
 		set_intr_gate(i, early_idt_handler);
 	asm volatile("lidt %0" :: "m" (idt_descr));
 
 	early_printk("Kernel alive\n");
 
-	/*
-	 * switch to init_level4_pgt from boot_level4_pgt
-	 */
-	memcpy(init_level4_pgt, boot_level4_pgt, PTRS_PER_PGD*sizeof(pgd_t));
-	asm volatile("movq %0,%%cr3" :: "r" (__pa_symbol(&init_level4_pgt)));
-
  	for (i = 0; i < NR_CPUS; i++)
  		cpu_pda(i) = &boot_cpu_pda[i];
 
Index: linux/arch/x86_64/kernel/head.S
===================================================================
--- linux.orig/arch/x86_64/kernel/head.S
+++ linux/arch/x86_64/kernel/head.S
@@ -71,7 +71,7 @@ startup_32:
 	movl	%eax, %cr4
 
 	/* Setup early boot stage 4 level pagetables */
-	movl	$(boot_level4_pgt - __START_KERNEL_map), %eax
+	movl	$(init_level4_pgt - __START_KERNEL_map), %eax
 	movl	%eax, %cr3
 
 	/* Setup EFER (Extended Feature Enable Register) */
@@ -115,7 +115,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr4
 
 	/* Setup early boot stage 4 level pagetables. */
-	movq	$(boot_level4_pgt - __START_KERNEL_map), %rax
+	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 	movq	%rax, %cr3
 
 	/* Check if nx is implemented */
@@ -274,9 +274,19 @@ ENTRY(name)
 	i = i + 1 ;				\
 	.endr
 
+	/*
+	 * This default setting generates an ident mapping at address 0x100000
+	 * and a mapping for the kernel that precisely maps virtual address
+	 * 0xffffffff80000000 to physical address 0x000000. (always using
+	 * 2Mbyte large pages provided by PAE mode)
+	 */
 NEXT_PAGE(init_level4_pgt)
-	/* This gets initialized in x86_64_start_kernel */
-	.fill	512,8,0
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	257,8,0
+	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.fill	252,8,0
+	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
 NEXT_PAGE(level3_ident_pgt)
 	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
@@ -307,27 +317,6 @@ NEXT_PAGE(level2_kernel_pgt)
 #undef NEXT_PAGE
 
 	.data
-
-#ifndef CONFIG_HOTPLUG_CPU
-	__INITDATA
-#endif
-	/*
-	 * This default setting generates an ident mapping at address 0x100000
-	 * and a mapping for the kernel that precisely maps virtual address
-	 * 0xffffffff80000000 to physical address 0x000000. (always using
-	 * 2Mbyte large pages provided by PAE mode)
-	 */
-	.align PAGE_SIZE
-ENTRY(boot_level4_pgt)
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	257,8,0
-	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.fill	252,8,0
-	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
-
-	.data
-
 	.align 16
 	.globl cpu_gdt_descr
 cpu_gdt_descr:
Index: linux/arch/x86_64/kernel/setup64.c
===================================================================
--- linux.orig/arch/x86_64/kernel/setup64.c
+++ linux/arch/x86_64/kernel/setup64.c
@@ -201,7 +201,6 @@ void __cpuinit cpu_init (void)
 	/* CPU 0 is initialised in head64.c */
 	if (cpu != 0) {
 		pda_init(cpu);
-		zap_low_mappings(cpu);
 	} else 
 		estacks = boot_exception_stacks; 
 
Index: linux/arch/x86_64/kernel/setup.c
===================================================================
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -274,8 +274,6 @@ void __init setup_arch(char **cmdline_p)
 
 	dmi_scan_machine();
 
-	zap_low_mappings(0);
-
 #ifdef CONFIG_ACPI
 	/*
 	 * Initialize the ACPI boot-time table parser (gets the RSDP and SDT).
Index: linux/arch/x86_64/mm/init.c
===================================================================
--- linux.orig/arch/x86_64/mm/init.c
+++ linux/arch/x86_64/mm/init.c
@@ -378,21 +378,6 @@ void __meminit init_memory_mapping(unsig
 	__flush_tlb_all();
 }
 
-void __cpuinit zap_low_mappings(int cpu)
-{
-	if (cpu == 0) {
-		pgd_t *pgd = pgd_offset_k(0UL);
-		pgd_clear(pgd);
-	} else {
-		/*
-		 * For AP's, zap the low identity mappings by changing the cr3
-		 * to init_level4_pgt and doing local flush tlb all
-		 */
-		asm volatile("movq %0,%%cr3" :: "r" (__pa_symbol(&init_level4_pgt)));
-	}
-	__flush_tlb_all();
-}
-
 #ifndef CONFIG_NUMA
 void __init paging_init(void)
 {
@@ -569,15 +554,6 @@ void __init mem_init(void)
 		reservedpages << (PAGE_SHIFT-10),
 		datasize >> 10,
 		initsize >> 10);
-
-#ifdef CONFIG_SMP
-	/*
-	 * Sync boot_level4_pgt mappings with the init_level4_pgt
-	 * except for the low identity mappings which are already zapped
-	 * in init_level4_pgt. This sync-up is essential for AP's bringup
-	 */
-	memcpy(boot_level4_pgt+1, init_level4_pgt+1, (PTRS_PER_PGD-1)*sizeof(pgd_t));
-#endif
 }
 
 void free_init_pages(char *what, unsigned long begin, unsigned long end)
Index: linux/include/asm-x86_64/pgtable.h
===================================================================
--- linux.orig/include/asm-x86_64/pgtable.h
+++ linux/include/asm-x86_64/pgtable.h
@@ -17,7 +17,6 @@ extern pud_t level3_kernel_pgt[512];
 extern pud_t level3_ident_pgt[512];
 extern pmd_t level2_kernel_pgt[512];
 extern pgd_t init_level4_pgt[];
-extern pgd_t boot_level4_pgt[];
 extern unsigned long __supported_pte_mask;
 
 #define swapper_pg_dir init_level4_pgt
Index: linux/include/asm-x86_64/proto.h
===================================================================
--- linux.orig/include/asm-x86_64/proto.h
+++ linux/include/asm-x86_64/proto.h
@@ -11,8 +11,6 @@ struct pt_regs;
 extern void start_kernel(void);
 extern void pda_init(int); 
 
-extern void zap_low_mappings(int cpu);
-
 extern void early_idt_handler(void);
 
 extern void mcheck_init(struct cpuinfo_x86 *c);

* [PATCH] [16/22] x86: Move swsusp __pa() dependent code to arch portion
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (14 preceding siblings ...)
  2007-04-28 17:59 ` [PATCH] [15/22] x86_64: Remove the identity mapping as early as possible Andi Kleen
@ 2007-04-28 17:59 ` Andi Kleen
  2007-04-28 17:59 ` [PATCH] [17/22] x86_64: do not use virt_to_page on kernel data address Andi Kleen
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:59 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


o __pa() should be used only on kernel linearly mapped virtual addresses
  and not on kernel text and data addresses.

o Hibernation code needs to determine the physical address associated
  with a kernel symbol to mark the boundaries of the section containing
  pages which don't have to be saved and restored during a
  hibernate/resume operation.

o Move this piece of code into the arch-dependent portion, so that
  architectures which don't have kernel text/data mapped into the kernel
  linearly mapped region can come up with their own ways of determining
  the physical addresses associated with kernel text.
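
The check itself is plain page-frame arithmetic. The start of the nosave section is rounded down to a page frame and its end is rounded up with PAGE_ALIGN(), so any frame the section touches counts as nosave.

The following is a standalone sketch under two assumptions:

- Pages are 4 KiB.
- The two physical boundaries are passed in as parameters. Each architecture now obtains them its own way, e.g. via __pa_symbol().

The function name is hypothetical.

```c
#include <assert.h>

#define PAGE_SHIFT 12                 /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(addr) (((addr) + PAGE_SIZE - 1) & PAGE_MASK)

/* Same comparison as the arch pfn_is_nosave() helpers, with the section's
 * physical boundaries supplied by the caller. */
static int pfn_in_nosave(unsigned long pfn,
			 unsigned long nosave_begin_phys,
			 unsigned long nosave_end_phys)
{
	unsigned long begin_pfn = nosave_begin_phys >> PAGE_SHIFT;
	unsigned long end_pfn = PAGE_ALIGN(nosave_end_phys) >> PAGE_SHIFT;

	assert(nosave_begin_phys <= nosave_end_phys);
	return (pfn >= begin_pfn) && (pfn < end_pfn);
}
```

For a section spanning 0x300000..0x301800, frames 0x300 and 0x301 are nosave and frame 0x302 is not.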

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/i386/power/suspend.c     |   14 ++++++++++++++
 arch/powerpc/kernel/Makefile  |    1 +
 arch/powerpc/kernel/suspend.c |   24 ++++++++++++++++++++++++
 arch/x86_64/kernel/suspend.c  |   14 ++++++++++++++
 kernel/power/power.h          |    5 ++---
 kernel/power/snapshot.c       |   11 -----------
 6 files changed, 55 insertions(+), 14 deletions(-)

Index: linux/arch/i386/power/suspend.c
===================================================================
--- linux.orig/arch/i386/power/suspend.c
+++ linux/arch/i386/power/suspend.c
@@ -16,6 +16,9 @@
 /* Defined in arch/i386/power/swsusp.S */
 extern int restore_image(void);
 
+/* References to section boundaries */
+extern const void __nosave_begin, __nosave_end;
+
 /* Pointer to the temporary resume page tables */
 pgd_t *resume_pg_dir;
 
@@ -156,3 +159,14 @@ int swsusp_arch_resume(void)
 	restore_image();
 	return 0;
 }
+
+/*
+ *	pfn_is_nosave - check if given pfn is in the 'nosave' section
+ */
+
+int pfn_is_nosave(unsigned long pfn)
+{
+	unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
+	unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
+	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
+}
Index: linux/arch/powerpc/kernel/Makefile
===================================================================
--- linux.orig/arch/powerpc/kernel/Makefile
+++ linux/arch/powerpc/kernel/Makefile
@@ -37,6 +37,7 @@ obj-$(CONFIG_CRASH_DUMP)	+= crash_dump.o
 obj-$(CONFIG_6xx)		+= idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o
 obj-$(CONFIG_TAU)		+= tau_6xx.o
 obj32-$(CONFIG_SOFTWARE_SUSPEND) += swsusp_32.o
+obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend.o
 obj32-$(CONFIG_MODULES)		+= module_32.o
 
 ifeq ($(CONFIG_PPC_MERGE),y)
Index: linux/arch/powerpc/kernel/suspend.c
===================================================================
--- /dev/null
+++ linux/arch/powerpc/kernel/suspend.c
@@ -0,0 +1,24 @@
+/*
+ * Suspend support specific for power.
+ *
+ * Distribute under GPLv2
+ *
+ * Copyright (c) 2002 Pavel Machek <pavel@suse.cz>
+ * Copyright (c) 2001 Patrick Mochel <mochel@osdl.org>
+ */
+
+#include <asm/page.h>
+
+/* References to section boundaries */
+extern const void __nosave_begin, __nosave_end;
+
+/*
+ *	pfn_is_nosave - check if given pfn is in the 'nosave' section
+ */
+
+int pfn_is_nosave(unsigned long pfn)
+{
+	unsigned long nosave_begin_pfn = __pa(&__nosave_begin) >> PAGE_SHIFT;
+	unsigned long nosave_end_pfn = PAGE_ALIGN(__pa(&__nosave_end)) >> PAGE_SHIFT;
+	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
+}
Index: linux/arch/x86_64/kernel/suspend.c
===================================================================
--- linux.orig/arch/x86_64/kernel/suspend.c
+++ linux/arch/x86_64/kernel/suspend.c
@@ -13,6 +13,9 @@
 #include <asm/page.h>
 #include <asm/pgtable.h>
 
+/* References to section boundaries */
+extern const void __nosave_begin, __nosave_end;
+
 struct saved_context saved_context;
 
 unsigned long saved_context_eax, saved_context_ebx, saved_context_ecx, saved_context_edx;
@@ -220,4 +223,15 @@ int swsusp_arch_resume(void)
 	restore_image();
 	return 0;
 }
+
+/*
+ *	pfn_is_nosave - check if given pfn is in the 'nosave' section
+ */
+
+int pfn_is_nosave(unsigned long pfn)
+{
+	unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
+	unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
+	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
+}
 #endif /* CONFIG_SOFTWARE_SUSPEND */
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -23,6 +23,8 @@ static inline int pm_suspend_disk(void)
 }
 #endif
 
+extern int pfn_is_nosave(unsigned long);
+
 extern struct mutex pm_mutex;
 
 #define power_attr(_name) \
@@ -37,9 +39,6 @@ static struct subsys_attribute _name##_a
 
 extern struct subsystem power_subsys;
 
-/* References to section boundaries */
-extern const void __nosave_begin, __nosave_end;
-
 /* Preferred image size in bytes (default 500 MB) */
 extern unsigned long image_size;
 extern int in_suspend;
Index: linux/kernel/power/snapshot.c
===================================================================
--- linux.orig/kernel/power/snapshot.c
+++ linux/kernel/power/snapshot.c
@@ -651,17 +651,6 @@ static inline unsigned int count_highmem
 #endif /* CONFIG_HIGHMEM */
 
 /**
- *	pfn_is_nosave - check if given pfn is in the 'nosave' section
- */
-
-static inline int pfn_is_nosave(unsigned long pfn)
-{
-	unsigned long nosave_begin_pfn = __pa(&__nosave_begin) >> PAGE_SHIFT;
-	unsigned long nosave_end_pfn = PAGE_ALIGN(__pa(&__nosave_end)) >> PAGE_SHIFT;
-	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
-}
-
-/**
  *	saveable - Determine whether a non-highmem page should be included in
  *	the suspend image.
  *

* [PATCH] [17/22] x86_64: do not use virt_to_page on kernel data address
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (15 preceding siblings ...)
  2007-04-28 17:59 ` [PATCH] [16/22] x86: Move swsusp __pa() dependent code to arch portion Andi Kleen
@ 2007-04-28 17:59 ` Andi Kleen
  2007-04-28 17:59 ` [PATCH] [18/22] x86: __pa and __pa_symbol address space separation Andi Kleen
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:59 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


o virt_to_page() should be used on kernel linear addresses and not
  on kernel text and data addresses. The swsusp code uses it on kernel
  data (the statically allocated swsusp_header).

o Allocate swsusp_header dynamically so that virt_to_page() can be used
  safely.

o I am changing this because in the next few patches, __pa() on x86_64
  will no longer support kernel text and data addresses, and hibernation
  would break.
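
The header is sized so that the packed structure fills exactly one page, with the two signatures in the last 20 bytes. That is why a single bio_read_page()/bio_write_page() transfers it, and why clearing it takes PAGE_SIZE bytes rather than sizeof(PAGE_SIZE).

The user-space sketch below checks that layout. It assumes:

- 4 KiB pages.
- An 8-byte sector_t (true on x86_64).
- C11 aligned_alloc() as a stand-in for __get_free_page(GFP_KERNEL).

alloc_header_page() is a hypothetical name. Unlike __get_free_page(), it also clears the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */
typedef uint64_t sector_t; /* assumption: 8-byte sector_t */

struct swsusp_header {
	char reserved[PAGE_SIZE - 20 - sizeof(sector_t)];
	sector_t image;
	char orig_sig[10];
	char sig[10];
} __attribute__((packed));

/* The packed layout fills exactly one page ... */
_Static_assert(sizeof(struct swsusp_header) == PAGE_SIZE,
	       "swsusp_header must be exactly one page");
/* ... and the signature occupies its last 10 bytes. */
_Static_assert(offsetof(struct swsusp_header, sig) == PAGE_SIZE - 10,
	       "sig must end at the last byte of the page");

/* Hypothetical stand-in for __get_free_page(): one page-aligned page,
 * cleared here with PAGE_SIZE (sizeof(PAGE_SIZE) would be only 8). */
static struct swsusp_header *alloc_header_page(void)
{
	struct swsusp_header *h = aligned_alloc(PAGE_SIZE, PAGE_SIZE);

	if (h)
		memset(h, 0, PAGE_SIZE);
	return h;
}
```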

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 kernel/power/swap.c |   42 +++++++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 15 deletions(-)

Index: linux/kernel/power/swap.c
===================================================================
--- linux.orig/kernel/power/swap.c
+++ linux/kernel/power/swap.c
@@ -33,12 +33,14 @@ extern char resume_file[];
 
 #define SWSUSP_SIG	"S1SUSPEND"
 
-static struct swsusp_header {
+struct swsusp_header {
 	char reserved[PAGE_SIZE - 20 - sizeof(sector_t)];
 	sector_t image;
 	char	orig_sig[10];
 	char	sig[10];
-} __attribute__((packed, aligned(PAGE_SIZE))) swsusp_header;
+} __attribute__((packed));
+
+static struct swsusp_header *swsusp_header;
 
 /*
  * General things
@@ -141,14 +143,14 @@ static int mark_swapfiles(sector_t start
 {
 	int error;
 
-	bio_read_page(swsusp_resume_block, &swsusp_header, NULL);
-	if (!memcmp("SWAP-SPACE",swsusp_header.sig, 10) ||
-	    !memcmp("SWAPSPACE2",swsusp_header.sig, 10)) {
-		memcpy(swsusp_header.orig_sig,swsusp_header.sig, 10);
-		memcpy(swsusp_header.sig,SWSUSP_SIG, 10);
-		swsusp_header.image = start;
+	bio_read_page(swsusp_resume_block, swsusp_header, NULL);
+	if (!memcmp("SWAP-SPACE",swsusp_header->sig, 10) ||
+	    !memcmp("SWAPSPACE2",swsusp_header->sig, 10)) {
+		memcpy(swsusp_header->orig_sig,swsusp_header->sig, 10);
+		memcpy(swsusp_header->sig,SWSUSP_SIG, 10);
+		swsusp_header->image = start;
 		error = bio_write_page(swsusp_resume_block,
-					&swsusp_header, NULL);
+					swsusp_header, NULL);
 	} else {
 		printk(KERN_ERR "swsusp: Swap header not found!\n");
 		error = -ENODEV;
@@ -564,7 +566,7 @@ int swsusp_read(void)
 	if (error < PAGE_SIZE)
 		return error < 0 ? error : -EFAULT;
 	header = (struct swsusp_info *)data_of(snapshot);
-	error = get_swap_reader(&handle, swsusp_header.image);
+	error = get_swap_reader(&handle, swsusp_header->image);
 	if (!error)
 		error = swap_read_page(&handle, header, NULL);
 	if (!error)
@@ -591,17 +593,17 @@ int swsusp_check(void)
 	resume_bdev = open_by_devnum(swsusp_resume_device, FMODE_READ);
 	if (!IS_ERR(resume_bdev)) {
 		set_blocksize(resume_bdev, PAGE_SIZE);
-		memset(&swsusp_header, 0, sizeof(swsusp_header));
+		memset(swsusp_header, 0, PAGE_SIZE);
 		error = bio_read_page(swsusp_resume_block,
-					&swsusp_header, NULL);
+					swsusp_header, NULL);
 		if (error)
 			return error;
 
-		if (!memcmp(SWSUSP_SIG, swsusp_header.sig, 10)) {
-			memcpy(swsusp_header.sig, swsusp_header.orig_sig, 10);
+		if (!memcmp(SWSUSP_SIG, swsusp_header->sig, 10)) {
+			memcpy(swsusp_header->sig, swsusp_header->orig_sig, 10);
 			/* Reset swap signature now */
 			error = bio_write_page(swsusp_resume_block,
-						&swsusp_header, NULL);
+						swsusp_header, NULL);
 		} else {
 			return -EINVAL;
 		}
@@ -632,3 +634,13 @@ void swsusp_close(void)
 
 	blkdev_put(resume_bdev);
 }
+
+static int swsusp_header_init(void)
+{
+	swsusp_header = (struct swsusp_header*) __get_free_page(GFP_KERNEL);
+	if (!swsusp_header)
+		panic("Could not allocate memory for swsusp_header\n");
+	return 0;
+}
+
+core_initcall(swsusp_header_init);

* [PATCH] [18/22] x86: __pa and __pa_symbol address space separation
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (16 preceding siblings ...)
  2007-04-28 17:59 ` [PATCH] [17/22] x86_64: do not use virt_to_page on kernel data address Andi Kleen
@ 2007-04-28 17:59 ` Andi Kleen
  2007-04-28 17:59 ` [PATCH] [19/22] x86_64: Relocatable Kernel Support Andi Kleen
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:59 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


Currently __pa_symbol is for use with symbols in the kernel address
map and __pa is for use with pointers into the physical memory map.
But the code is implemented so you can usually interchange the two.

__pa, which is much more common, can be implemented much more cheaply
if it doesn't have to worry about any other kernel address
spaces.  This is especially true with a relocatable kernel, as
__pa_symbol needs to perform an extra variable read to resolve
the address.
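
The cost difference is easiest to see as arithmetic. This minimal sketch assumes the x86_64 constants of this era:

- PAGE_OFFSET = 0xffff810000000000, the direct mapping.
- __START_KERNEL_map = 0xffffffff80000000, the kernel image (spelled START_KERNEL_MAP here).

The relocatable case is modelled with a hypothetical variable phys_base that holds the physical load offset. __pa is then a single subtraction of a constant, while __pa_symbol must also read phys_base. The range checks exist only in this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET      0xffff810000000000ULL /* direct map of physical memory */
#define START_KERNEL_MAP 0xffffffff80000000ULL /* stands in for __START_KERNEL_map */

/* Hypothetical: where the kernel image was actually loaded.  A relocatable
 * kernel learns this only at boot, so it has to be a variable. */
static uint64_t phys_base;

/* Direct-mapped pointer -> physical: subtract a compile-time constant. */
static uint64_t pa(uint64_t vaddr)
{
	assert(vaddr >= PAGE_OFFSET && vaddr < START_KERNEL_MAP);
	return vaddr - PAGE_OFFSET;
}

/* Kernel-image symbol -> physical: needs the extra load of phys_base. */
static uint64_t pa_symbol(uint64_t vaddr)
{
	assert(vaddr >= START_KERNEL_MAP);
	return vaddr - START_KERNEL_MAP + phys_base;
}
```

Once phys_base is non-zero, feeding a kernel-image address to pa() would give a wrong answer. This is why the patch sorts the callers into the proper macro.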

A third macro, __pa_vsymbol, is added for the vsyscall data, for
finding the physical addresses of the vsyscall pages.

Most of this patch is simply sorting through the references to
__pa or __pa_symbol and using the proper one.  A little of it
keeps using a physical address once we have it, instead of
recalculating it several times.

swapper_pgd is now NULL.  leave_mm now uses init_mm.pgd, and
init_mm.pgd is initialized at boot (instead of at compile time)
to the physmem virtual mapping of init_level4_pgt.  The
physical address changed.
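
Setting init_mm.pgd at boot means converting the table's kernel-image address into its alias in the direct mapping, __va(__pa_symbol(&init_level4_pgt)). mark_rodata_ro() now uses the same composition. The resulting pointer is one that __pa() can later translate with a single subtraction.

The sketch below shows the composition. It uses the same assumed constants as above (PAGE_OFFSET, and START_KERNEL_MAP standing in for __START_KERNEL_map), with a hypothetical load offset phys_base passed as a parameter. The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET      0xffff810000000000ULL /* assumed direct-map base */
#define START_KERNEL_MAP 0xffffffff80000000ULL /* stands in for __START_KERNEL_map */

/* Physical -> direct-map virtual, like __va(). */
static uint64_t va(uint64_t paddr)
{
	return paddr + PAGE_OFFSET;
}

/* Kernel-image address -> its direct-map alias, like
 * __va(__pa_symbol(x)); phys_base is the assumed load offset. */
static uint64_t direct_map_alias(uint64_t image_vaddr, uint64_t phys_base)
{
	assert(image_vaddr >= START_KERNEL_MAP);
	return va(image_vaddr - START_KERNEL_MAP + phys_base);
}
```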

Except for the EMPTY_ZERO page, all of the remaining references
to __pa_symbol appear to be during kernel initialization.  So this
should reduce the cost of __pa in the common case, even on a relocated
kernel.

As this is technically a semantic change we need to be on the lookout
for anything I missed.  But it works for me (tm).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/i386/kernel/alternative.c     |    4 ++--
 arch/i386/mm/init.c                |   15 ++++++++-------
 arch/x86_64/kernel/machine_kexec.c |   14 +++++++-------
 arch/x86_64/kernel/setup.c         |    9 +++++----
 arch/x86_64/kernel/smp.c           |    2 +-
 arch/x86_64/kernel/vsyscall.c      |    9 +++++++--
 arch/x86_64/mm/init.c              |   21 +++++++++++----------
 arch/x86_64/mm/pageattr.c          |   16 ++++++++--------
 include/asm-x86_64/page.h          |    6 ++----
 include/asm-x86_64/pgtable.h       |    4 ++--
 10 files changed, 53 insertions(+), 47 deletions(-)

Index: linux/arch/i386/kernel/alternative.c
===================================================================
--- linux.orig/arch/i386/kernel/alternative.c
+++ linux/arch/i386/kernel/alternative.c
@@ -402,8 +402,8 @@ void __init alternative_instructions(voi
 						_text, _etext);
 		}
 		free_init_pages("SMP alternatives",
-				(unsigned long)__smp_alt_begin,
-				(unsigned long)__smp_alt_end);
+				__pa_symbol(&__smp_alt_begin),
+				__pa_symbol(&__smp_alt_end));
 	} else {
 		alternatives_smp_save(__smp_alt_instructions,
 				      __smp_alt_instructions_end);
Index: linux/arch/i386/mm/init.c
===================================================================
--- linux.orig/arch/i386/mm/init.c
+++ linux/arch/i386/mm/init.c
@@ -774,10 +774,11 @@ void free_init_pages(char *what, unsigne
 	unsigned long addr;
 
 	for (addr = begin; addr < end; addr += PAGE_SIZE) {
-		ClearPageReserved(virt_to_page(addr));
-		init_page_count(virt_to_page(addr));
-		memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE);
-		free_page(addr);
+		struct page *page = pfn_to_page(addr >> PAGE_SHIFT);
+		ClearPageReserved(page);
+		init_page_count(page);
+		memset(page_address(page), POISON_FREE_INITMEM, PAGE_SIZE);
+		__free_page(page);
 		totalram_pages++;
 	}
 	printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
@@ -786,14 +787,14 @@ void free_init_pages(char *what, unsigne
 void free_initmem(void)
 {
 	free_init_pages("unused kernel memory",
-			(unsigned long)(&__init_begin),
-			(unsigned long)(&__init_end));
+			__pa_symbol(&__init_begin),
+			__pa_symbol(&__init_end));
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-	free_init_pages("initrd memory", start, end);
+	free_init_pages("initrd memory", __pa(start), __pa(end));
 }
 #endif
 
Index: linux/arch/x86_64/kernel/machine_kexec.c
===================================================================
--- linux.orig/arch/x86_64/kernel/machine_kexec.c
+++ linux/arch/x86_64/kernel/machine_kexec.c
@@ -191,19 +191,19 @@ NORET_TYPE void machine_kexec(struct kim
 
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
 	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
-	page_list[PA_PGD] = __pa(kexec_pgd);
+	page_list[PA_PGD] = __pa_symbol(&kexec_pgd);
 	page_list[VA_PGD] = (unsigned long)kexec_pgd;
-	page_list[PA_PUD_0] = __pa(kexec_pud0);
+	page_list[PA_PUD_0] = __pa_symbol(&kexec_pud0);
 	page_list[VA_PUD_0] = (unsigned long)kexec_pud0;
-	page_list[PA_PMD_0] = __pa(kexec_pmd0);
+	page_list[PA_PMD_0] = __pa_symbol(&kexec_pmd0);
 	page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
-	page_list[PA_PTE_0] = __pa(kexec_pte0);
+	page_list[PA_PTE_0] = __pa_symbol(&kexec_pte0);
 	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
-	page_list[PA_PUD_1] = __pa(kexec_pud1);
+	page_list[PA_PUD_1] = __pa_symbol(&kexec_pud1);
 	page_list[VA_PUD_1] = (unsigned long)kexec_pud1;
-	page_list[PA_PMD_1] = __pa(kexec_pmd1);
+	page_list[PA_PMD_1] = __pa_symbol(&kexec_pmd1);
 	page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
-	page_list[PA_PTE_1] = __pa(kexec_pte1);
+	page_list[PA_PTE_1] = __pa_symbol(&kexec_pte1);
 	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
 
 	page_list[PA_TABLE_PAGE] =
Index: linux/arch/x86_64/kernel/setup.c
===================================================================
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -243,11 +243,12 @@ void __init setup_arch(char **cmdline_p)
 	init_mm.end_code = (unsigned long) &_etext;
 	init_mm.end_data = (unsigned long) &_edata;
 	init_mm.brk = (unsigned long) &_end;
+	init_mm.pgd = __va(__pa_symbol(&init_level4_pgt));
 
-	code_resource.start = virt_to_phys(&_text);
-	code_resource.end = virt_to_phys(&_etext)-1;
-	data_resource.start = virt_to_phys(&_etext);
-	data_resource.end = virt_to_phys(&_edata)-1;
+	code_resource.start = __pa_symbol(&_text);
+	code_resource.end = __pa_symbol(&_etext)-1;
+	data_resource.start = __pa_symbol(&_etext);
+	data_resource.end = __pa_symbol(&_edata)-1;
 
 	early_identify_cpu(&boot_cpu_data);
 
Index: linux/arch/x86_64/kernel/smp.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smp.c
+++ linux/arch/x86_64/kernel/smp.c
@@ -76,7 +76,7 @@ static inline void leave_mm(int cpu)
 	if (read_pda(mmu_state) == TLBSTATE_OK)
 		BUG();
 	cpu_clear(cpu, read_pda(active_mm)->cpu_vm_mask);
-	load_cr3(swapper_pg_dir);
+	load_cr3(init_mm.pgd);
 }
 
 /*
Index: linux/arch/x86_64/kernel/vsyscall.c
===================================================================
--- linux.orig/arch/x86_64/kernel/vsyscall.c
+++ linux/arch/x86_64/kernel/vsyscall.c
@@ -45,6 +45,11 @@
 
 #define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
 #define __syscall_clobber "r11","rcx","memory"
+#define __pa_vsymbol(x)			\
+	({unsigned long v;  		\
+	extern char __vsyscall_0; 	\
+	  asm("" : "=r" (v) : "0" (x)); \
+	  ((v - VSYSCALL_FIRST_PAGE) + __pa_symbol(&__vsyscall_0)); })
 
 struct vsyscall_gtod_data_t {
 	seqlock_t lock;
@@ -224,10 +229,10 @@ static int vsyscall_sysctl_change(ctl_ta
 		return ret;
 	/* gcc has some trouble with __va(__pa()), so just do it this
 	   way. */
-	map1 = ioremap(__pa_symbol(&vsysc1), 2);
+	map1 = ioremap(__pa_vsymbol(&vsysc1), 2);
 	if (!map1)
 		return -ENOMEM;
-	map2 = ioremap(__pa_symbol(&vsysc2), 2);
+	map2 = ioremap(__pa_vsymbol(&vsysc2), 2);
 	if (!map2) {
 		ret = -ENOMEM;
 		goto out;
Index: linux/arch/x86_64/mm/init.c
===================================================================
--- linux.orig/arch/x86_64/mm/init.c
+++ linux/arch/x86_64/mm/init.c
@@ -565,11 +565,11 @@ void free_init_pages(char *what, unsigne
 
 	printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
 	for (addr = begin; addr < end; addr += PAGE_SIZE) {
-		ClearPageReserved(virt_to_page(addr));
-		init_page_count(virt_to_page(addr));
-		memset((void *)(addr & ~(PAGE_SIZE-1)),
-			POISON_FREE_INITMEM, PAGE_SIZE);
-		free_page(addr);
+		struct page *page = pfn_to_page(addr >> PAGE_SHIFT);
+		ClearPageReserved(page);
+		init_page_count(page);
+		memset(page_address(page), POISON_FREE_INITMEM, PAGE_SIZE);
+		__free_page(page);
 		totalram_pages++;
 	}
 }
@@ -579,17 +579,18 @@ void free_initmem(void)
 	memset(__initdata_begin, POISON_FREE_INITDATA,
 		__initdata_end - __initdata_begin);
 	free_init_pages("unused kernel memory",
-			(unsigned long)(&__init_begin),
-			(unsigned long)(&__init_end));
+			__pa_symbol(&__init_begin),
+			__pa_symbol(&__init_end));
 }
 
 #ifdef CONFIG_DEBUG_RODATA
 
 void mark_rodata_ro(void)
 {
-	unsigned long addr = (unsigned long)__start_rodata;
+	unsigned long addr = (unsigned long)__va(__pa_symbol(&__start_rodata));
+	unsigned long end  = (unsigned long)__va(__pa_symbol(&__end_rodata));
 
-	for (; addr < (unsigned long)__end_rodata; addr += PAGE_SIZE)
+	for (; addr < end; addr += PAGE_SIZE)
 		change_page_attr_addr(addr, 1, PAGE_KERNEL_RO);
 
 	printk ("Write protecting the kernel read-only data: %luk\n",
@@ -608,7 +609,7 @@ void mark_rodata_ro(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
-	free_init_pages("initrd memory", start, end);
+	free_init_pages("initrd memory", __pa(start), __pa(end));
 }
 #endif
 
Index: linux/arch/x86_64/mm/pageattr.c
===================================================================
--- linux.orig/arch/x86_64/mm/pageattr.c
+++ linux/arch/x86_64/mm/pageattr.c
@@ -51,7 +51,6 @@ static struct page *split_large_page(uns
 	SetPagePrivate(base);
 	page_private(base) = 0;
 
-	address = __pa(address);
 	addr = address & LARGE_PAGE_MASK; 
 	pbase = (pte_t *)page_address(base);
 	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
@@ -101,13 +100,12 @@ static inline void save_page(struct page
  * No more special protections in this 2/4MB area - revert to a
  * large page again. 
  */
-static void revert_page(unsigned long address, pgprot_t ref_prot)
+static void revert_page(unsigned long address, unsigned long pfn, pgprot_t ref_prot)
 {
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t large_pte;
-	unsigned long pfn;
 
 	pgd = pgd_offset_k(address);
 	BUG_ON(pgd_none(*pgd));
@@ -115,7 +113,6 @@ static void revert_page(unsigned long ad
 	BUG_ON(pud_none(*pud));
 	pmd = pmd_offset(pud, address);
 	BUG_ON(pmd_val(*pmd) & _PAGE_PSE);
-	pfn = (__pa(address) & LARGE_PAGE_MASK) >> PAGE_SHIFT;
 	large_pte = pfn_pte(pfn, ref_prot);
 	large_pte = pte_mkhuge(large_pte);
 	set_pte((pte_t *)pmd, large_pte);
@@ -141,7 +138,8 @@ __change_page_attr(unsigned long address
  			 */
 			struct page *split;
 			ref_prot2 = pte_pgprot(pte_clrhuge(*kpte));
-			split = split_large_page(address, prot, ref_prot2);
+			split = split_large_page(pfn << PAGE_SHIFT, prot,
+							ref_prot2);
 			if (!split)
 				return -ENOMEM;
 			set_pte(kpte, mk_pte(split, ref_prot2));
@@ -160,7 +158,7 @@ __change_page_attr(unsigned long address
 
 	if (page_private(kpte_page) == 0) {
 		save_page(kpte_page);
-		revert_page(address, ref_prot);
+		revert_page(address, pfn, ref_prot);
  	}
 	return 0;
 } 
@@ -180,6 +178,7 @@ __change_page_attr(unsigned long address
  */
 int change_page_attr_addr(unsigned long address, int numpages, pgprot_t prot)
 {
+	unsigned long phys_base_pfn = __pa_symbol(__START_KERNEL_map) >> PAGE_SHIFT;
 	int err = 0; 
 	int i; 
 
@@ -192,10 +191,11 @@ int change_page_attr_addr(unsigned long 
 			break; 
 		/* Handle kernel mapping too which aliases part of the
 		 * lowmem */
-		if (__pa(address) < KERNEL_TEXT_SIZE) {
+		if ((pfn >= phys_base_pfn) &&
+			((pfn - phys_base_pfn) < (KERNEL_TEXT_SIZE >> PAGE_SHIFT))) {
 			unsigned long addr2;
 			pgprot_t prot2;
-			addr2 = __START_KERNEL_map + __pa(address);
+			addr2 = __START_KERNEL_map + ((pfn - phys_base_pfn) << PAGE_SHIFT);
 			/* Make sure the kernel mappings stay executable */
 			prot2 = pte_pgprot(pte_mkexec(pfn_pte(0, prot)));
 			err = __change_page_attr(addr2, pfn, prot2,
Index: linux/include/asm-x86_64/page.h
===================================================================
--- linux.orig/include/asm-x86_64/page.h
+++ linux/include/asm-x86_64/page.h
@@ -102,17 +102,15 @@ typedef struct { unsigned long pgprot; }
 
 /* Note: __pa(&symbol_visible_to_c) should be always replaced with __pa_symbol.
    Otherwise you risk miscompilation. */ 
-#define __pa(x)			(((unsigned long)(x)>=__START_KERNEL_map)?(unsigned long)(x) - (unsigned long)__START_KERNEL_map:(unsigned long)(x) - PAGE_OFFSET)
+#define __pa(x)			((unsigned long)(x) - PAGE_OFFSET)
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */ 
 #define __pa_symbol(x)		\
 	({unsigned long v;  \
 	  asm("" : "=r" (v) : "0" (x)); \
-	  __pa(v); })
+	  (v - __START_KERNEL_map); })
 
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
-#define __boot_va(x)		__va(x)
-#define __boot_pa(x)		__pa(x)
 #ifdef CONFIG_FLATMEM
 #define pfn_valid(pfn)		((pfn) < end_pfn)
 #endif
Index: linux/include/asm-x86_64/pgtable.h
===================================================================
--- linux.orig/include/asm-x86_64/pgtable.h
+++ linux/include/asm-x86_64/pgtable.h
@@ -19,7 +19,7 @@ extern pmd_t level2_kernel_pgt[512];
 extern pgd_t init_level4_pgt[];
 extern unsigned long __supported_pte_mask;
 
-#define swapper_pg_dir init_level4_pgt
+#define swapper_pg_dir ((pgd_t *)NULL)
 
 extern void paging_init(void);
 extern void clear_kernel_mapping(unsigned long addr, unsigned long size);
@@ -29,7 +29,7 @@ extern void clear_kernel_mapping(unsigne
  * for zero-mapped memory areas etc..
  */
 extern unsigned long empty_zero_page[PAGE_SIZE/sizeof(unsigned long)];
-#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
+#define ZERO_PAGE(vaddr) (pfn_to_page(__pa_symbol(&empty_zero_page) >> PAGE_SHIFT))
 
 #endif /* !__ASSEMBLY__ */
 

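Patch 18 above splits the two translations. __pa() now only subtracts PAGE_OFFSET, so it is valid only for direct-mapping addresses. __pa_symbol() subtracts __START_KERNEL_map, so it is valid only for kernel-image addresses. change_page_attr_addr() then works out from the pfn alone whether a page is also aliased by the kernel-text mapping.

The sketch below models both steps in user-space C. The constant values (PAGE_OFFSET, __START_KERNEL_map, KERNEL_TEXT_SIZE) are assumptions modeled on x86_64 kernels of this era, not values taken from this thread. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only; see include/asm-x86_64/page.h. */
#define PAGE_SHIFT        12
#define PAGE_OFFSET       0xffff810000000000ULL  /* start of direct mapping */
#define START_KERNEL_MAP  0xffffffff80000000ULL  /* start of kernel image map */
#define KERNEL_TEXT_SIZE  (40ULL << 20)          /* size of kernel image map */

/* __pa() after patch 18: only valid for direct-mapping addresses. */
static uint64_t pa_direct(uint64_t va)
{
	return va - PAGE_OFFSET;
}

/* __pa_symbol() after patch 18: only valid for kernel-image addresses. */
static uint64_t pa_symbol(uint64_t va)
{
	return va - START_KERNEL_MAP;
}

/* Same range test as the new change_page_attr_addr() code: if pfn lies
 * inside the kernel image, store its kernel-text alias in *alias and
 * return 1; otherwise return 0. */
static int kernel_text_alias(uint64_t pfn, uint64_t phys_base_pfn,
			     uint64_t *alias)
{
	if (pfn < phys_base_pfn ||
	    pfn - phys_base_pfn >= (KERNEL_TEXT_SIZE >> PAGE_SHIFT))
		return 0;
	*alias = START_KERNEL_MAP + ((pfn - phys_base_pfn) << PAGE_SHIFT);
	return 1;
}
```

With these assumed constants, the direct-map address of physical 3M gives pfn 0x300. That pfn lies inside the 40M kernel window, so its alias is START_KERNEL_MAP + 3M. In patch 18, phys_base_pfn (computed as __pa_symbol(__START_KERNEL_map) >> PAGE_SHIFT) is 0.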

* [PATCH] [19/22] x86_64: Relocatable Kernel Support
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (17 preceding siblings ...)
  2007-04-28 17:59 ` [PATCH] [18/22] x86: __pa and __pa_symbol address space separation Andi Kleen
@ 2007-04-28 17:59 ` Andi Kleen
  2007-04-28 17:59 ` [PATCH] [20/22] x86_64: build-time checking Andi Kleen
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:59 UTC
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


This patch modifies the x86_64 kernel so that it can be loaded and run
at any 2M-aligned address below 512G.  The technique used is to
compile the decompressor with -fPIC and modify it so the decompressor
is fully relocatable.  For the main kernel, the page tables are
modified so the kernel remains at the same virtual address.  In
addition, a variable phys_base is kept that holds the physical address
the kernel is loaded at, and __pa_symbol is modified to add it when
we take the address of a kernel symbol.
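
The following is a minimal user-space model of the relocatable __pa_symbol(). All names and constants in it are illustrative assumptions. Symbol virtual addresses do not move, so the value added must be zero when the kernel runs where it was linked. The sketch therefore models phys_base as the distance between the real load address and the link-time physical address (CONFIG_PHYSICAL_START). That is one reading of the description above, not a quote of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define START_KERNEL_MAP  0xffffffff80000000ULL
#define PHYSICAL_START    0x200000ULL  /* link-time physical address */

/* Hypothetical model of phys_base: how far the actual load address is
 * from the physical address the kernel was linked for. */
static uint64_t phys_base;

static void record_load_address(uint64_t load_addr)
{
	phys_base = load_addr - PHYSICAL_START;
}

/* Relocatable __pa_symbol(): the virtual side stays fixed, so only the
 * physical result moves, by phys_base. */
static uint64_t pa_symbol(uint64_t va)
{
	return va - START_KERNEL_MAP + phys_base;
}
```

For example, a kernel linked for 2M but loaded at 16M gets phys_base = 14M. The link-time address of _text then translates to physical 16M.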

When loaded with a normal bootloader, the decompressor will decompress
the kernel to 2M and it will run there.  This both ensures that the
relocation code is always exercised and makes it easier to use 2M
pages for the kernel and the cpu.
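
The "decompress to 2M" rule comes from the round-up done by the CONFIG_RELOCATABLE path of the new compressed/head.S: add LARGE_PAGE_SIZE - 1, then mask with LARGE_PAGE_MASK. The sketch below shows that arithmetic in C. The 2M constants are written out by hand rather than taken from the kernel headers, and the helper name is hypothetical. A conventional bootloader places the bzImage protected-mode code at 1M, and that load address rounds up to 2M.

```c
#include <assert.h>
#include <stdint.h>

#define LARGE_PAGE_SIZE  (2ULL << 20)               /* 2M */
#define LARGE_PAGE_MASK  (~(LARGE_PAGE_SIZE - 1))

/* Round the address we were loaded at up to the next 2M boundary. */
static uint64_t kernel_run_address(uint64_t load_addr)
{
	return (load_addr + LARGE_PAGE_SIZE - 1) & LARGE_PAGE_MASK;
}
```

An address that is already 2M aligned comes back unchanged. So a kernel that kexec loads at 16M also runs at 16M.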

AK: changed to not make RELOCATABLE default in Kconfig

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/Kconfig                     |   49 ++++
 arch/x86_64/boot/compressed/Makefile    |   12 -
 arch/x86_64/boot/compressed/head.S      |  322 +++++++++++++++++++++++---------
 arch/x86_64/boot/compressed/misc.c      |  251 +++++++++++++-----------
 arch/x86_64/boot/compressed/vmlinux.lds |   44 ++++
 arch/x86_64/boot/compressed/vmlinux.scr |    9 
 arch/x86_64/kernel/head.S               |  225 ++++++++++++----------
 arch/x86_64/kernel/suspend_asm.S        |    7 
 include/asm-x86_64/page.h               |    6 
 9 files changed, 595 insertions(+), 330 deletions(-)

Index: linux/arch/x86_64/boot/compressed/head.S
===================================================================
--- linux.orig/arch/x86_64/boot/compressed/head.S
+++ linux/arch/x86_64/boot/compressed/head.S
@@ -26,116 +26,262 @@
 
 #include <linux/linkage.h>
 #include <asm/segment.h>
+#include <asm/pgtable.h>
 #include <asm/page.h>
+#include <asm/msr.h>
 
+.section ".text.head"
 	.code32
 	.globl startup_32
-	
+
 startup_32:
 	cld
 	cli
-	movl $(__KERNEL_DS),%eax
-	movl %eax,%ds
-	movl %eax,%es
-	movl %eax,%fs
-	movl %eax,%gs
-
-	lss stack_start,%esp
-	xorl %eax,%eax
-1:	incl %eax		# check that A20 really IS enabled
-	movl %eax,0x000000	# loop forever if it isn't
-	cmpl %eax,0x100000
-	je 1b
+	movl	$(__KERNEL_DS), %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	%eax, %ss
+
+/* Calculate the delta between where we were compiled to run
+ * at and where we were actually loaded at.  This can only be done
+ * with a short local call on x86.  Nothing else will tell us what
+ * address we are running at.  The reserved chunk of the real-mode
+ * data at 0x34-0x3f is used as the stack for this calculation.
+ * Only 4 bytes are needed.
+ */
+	leal	0x40(%esi), %esp
+	call	1f
+1:	popl	%ebp
+	subl	$1b, %ebp
+
+/* Compute the delta between where we were compiled to run at
+ * and where the code will actually run at.
+ */
+/* %ebp contains the address we are loaded at by the boot loader and %ebx
+ * contains the address where we should move the kernel image temporarily
+ * for safe in-place decompression.
+ */
+
+#ifdef CONFIG_RELOCATABLE
+	movl	%ebp, %ebx
+	addl	$(LARGE_PAGE_SIZE -1), %ebx
+	andl	$LARGE_PAGE_MASK, %ebx
+#else
+	movl	$CONFIG_PHYSICAL_START, %ebx
+#endif
+
+	/* Replace the compressed data size with the uncompressed size */
+	subl	input_len(%ebp), %ebx
+	movl	output_len(%ebp), %eax
+	addl	%eax, %ebx
+	/* Add 8 bytes for every 32K input block */
+	shrl	$12, %eax
+	addl	%eax, %ebx
+	/* Add 32K + 18 bytes of extra slack and align on a 4K boundary */
+	addl	$(32768 + 18 + 4095), %ebx
+	andl	$~4095, %ebx
 
 /*
- * Initialize eflags.  Some BIOS's leave bits like NT set.  This would
- * confuse the debugger if this code is traced.
- * XXX - best to initialize before switching to protected mode.
+ * Prepare for entering 64 bit mode
  */
-	pushl $0
-	popfl
+
+	/* Load new GDT with the 64bit segments using 32bit descriptor */
+	leal	gdt(%ebp), %eax
+	movl	%eax, gdt+2(%ebp)
+	lgdt	gdt(%ebp)
+
+	/* Enable PAE mode */
+	xorl	%eax, %eax
+	orl	$(1 << 5), %eax
+	movl	%eax, %cr4
+
+ /*
+  * Build early 4G boot pagetable
+  */
+	/* Initialize Page tables to 0*/
+	leal	pgtable(%ebx), %edi
+	xorl	%eax, %eax
+	movl	$((4096*6)/4), %ecx
+	rep	stosl
+
+	/* Build Level 4 */
+	leal	pgtable + 0(%ebx), %edi
+	leal	0x1007 (%edi), %eax
+	movl	%eax, 0(%edi)
+
+	/* Build Level 3 */
+	leal	pgtable + 0x1000(%ebx), %edi
+	leal	0x1007(%edi), %eax
+	movl	$4, %ecx
+1:	movl	%eax, 0x00(%edi)
+	addl	$0x00001000, %eax
+	addl	$8, %edi
+	decl	%ecx
+	jnz	1b
+
+	/* Build Level 2 */
+	leal	pgtable + 0x2000(%ebx), %edi
+	movl	$0x00000183, %eax
+	movl	$2048, %ecx
+1:	movl	%eax, 0(%edi)
+	addl	$0x00200000, %eax
+	addl	$8, %edi
+	decl	%ecx
+	jnz	1b
+
+	/* Enable the boot page tables */
+	leal	pgtable(%ebx), %eax
+	movl	%eax, %cr3
+
+	/* Enable Long mode in EFER (Extended Feature Enable Register) */
+	movl	$MSR_EFER, %ecx
+	rdmsr
+	btsl	$_EFER_LME, %eax
+	wrmsr
+
+	/* Setup for the jump to 64bit mode
+	 *
+	 * When the jump is performed we will be in long mode but
+	 * in 32bit compatibility mode with EFER.LME = 1, CS.L = 0, CS.D = 1
+	 * (and in turn EFER.LMA = 1).  To jump into 64bit mode we use
+	 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
+	 * We place all of the values on our mini stack so lret can
+	 * be used to perform that far jump.
+	 */
+	pushl	$__KERNEL_CS
+	leal	startup_64(%ebp), %eax
+	pushl	%eax
+
+	/* Enter paged protected Mode, activating Long Mode */
+	movl	$0x80000001, %eax /* Enable Paging and Protected mode */
+	movl	%eax, %cr0
+
+	/* Jump from 32bit compatibility mode into 64bit mode. */
+	lret
+
+	/* Be careful here startup_64 needs to be at a predictable
+	 * address so I can export it in an ELF header.  Bootloaders
+	 * should look at the ELF header to find this address, as
+	 * it may change in the future.
+	 */
+	.code64
+	.org 0x100
+ENTRY(startup_64)
+	/* We come here either from startup_32 or directly from a
+	 * 64bit bootloader.  If we come here from a bootloader we depend on
+	 * an identity mapped page table being provided that maps our
+	 * entire text+data+bss and hopefully all of memory.
+	 */
+
+	/* Setup data segments. */
+	xorl	%eax, %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	%eax, %ss
+
+	/* Compute the decompressed kernel start address.  It is where
+	 * we were loaded at aligned to a 2M boundary. %rbp contains the
+	 * decompressed kernel start address.
+	 *
+	 * If it is a relocatable kernel then decompress and run the kernel
+	 * from load address aligned to 2MB addr, otherwise decompress and
+	 * run the kernel from CONFIG_PHYSICAL_START
+	 */
+
+	/* Start with the delta to where the kernel will run at. */
+#ifdef CONFIG_RELOCATABLE
+	leaq	startup_32(%rip) /* - $startup_32 */, %rbp
+	addq	$(LARGE_PAGE_SIZE - 1), %rbp
+	andq	$LARGE_PAGE_MASK, %rbp
+	movq	%rbp, %rbx
+#else
+	movq	$CONFIG_PHYSICAL_START, %rbp
+	movq	%rbp, %rbx
+#endif
+
+	/* Replace the compressed data size with the uncompressed size */
+	movl	input_len(%rip), %eax
+	subq	%rax, %rbx
+	movl	output_len(%rip), %eax
+	addq	%rax, %rbx
+	/* Add 8 bytes for every 32K input block */
+	shrq	$12, %rax
+	addq	%rax, %rbx
+	/* Add 32K + 18 bytes of extra slack and align on a 4K boundary */
+	addq	$(32768 + 18 + 4095), %rbx
+	andq	$~4095, %rbx
+
+/* Copy the compressed kernel to the end of our buffer
+ * where decompression in place becomes safe.
+ */
+	leaq	_end(%rip), %r8
+	leaq	_end(%rbx), %r9
+	movq	$_end /* - $startup_32 */, %rcx
+1:	subq	$8, %r8
+	subq	$8, %r9
+	movq	0(%r8), %rax
+	movq	%rax, 0(%r9)
+	subq	$8, %rcx
+	jnz	1b
+
+/*
+ * Jump to the relocated address.
+ */
+	leaq	relocated(%rbx), %rax
+	jmp	*%rax
+
+.section ".text"
+relocated:
+
 /*
  * Clear BSS
  */
-	xorl %eax,%eax
-	movl $_edata,%edi
-	movl $_end,%ecx
-	subl %edi,%ecx
+	xorq	%rax, %rax
+	leaq    _edata(%rbx), %rdi
+	leaq    _end(%rbx), %rcx
+	subq	%rdi, %rcx
 	cld
 	rep
 	stosb
+
+	/* Setup the stack */
+	leaq	user_stack_end(%rip), %rsp
+
+	/* zero EFLAGS after setting rsp */
+	pushq	$0
+	popfq
+
 /*
  * Do the decompression, and jump to the new kernel..
  */
-	subl $16,%esp	# place for structure on the stack
-	movl %esp,%eax
-	pushl %esi	# real mode pointer as second arg
-	pushl %eax	# address of structure as first arg
-	call decompress_kernel
-	orl  %eax,%eax 
-	jnz  3f
-	addl $8,%esp
-	xorl %ebx,%ebx
-	ljmp $(__KERNEL_CS), $__PHYSICAL_START
-
-/*
- * We come here, if we were loaded high.
- * We need to move the move-in-place routine down to 0x1000
- * and then start it with the buffer addresses in registers,
- * which we got from the stack.
- */
-3:
-	movl %esi,%ebx	
-	movl $move_routine_start,%esi
-	movl $0x1000,%edi
-	movl $move_routine_end,%ecx
-	subl %esi,%ecx
-	addl $3,%ecx
-	shrl $2,%ecx
-	cld
-	rep
-	movsl
+	pushq	%rsi			# Save the real mode argument
+	movq	%rsi, %rdi		# real mode address
+	leaq	_heap(%rip), %rsi	# _heap
+	leaq	input_data(%rip), %rdx  # input_data
+	movl	input_len(%rip), %eax
+	movq	%rax, %rcx		# input_len
+	movq	%rbp, %r8		# output
+	call	decompress_kernel
+	popq	%rsi
 
-	popl %esi	# discard the address
-	addl $4,%esp	# real mode pointer
-	popl %esi	# low_buffer_start
-	popl %ecx	# lcount
-	popl %edx	# high_buffer_start
-	popl %eax	# hcount
-	movl $__PHYSICAL_START,%edi
-	cli		# make sure we don't get interrupted
-	ljmp $(__KERNEL_CS), $0x1000 # and jump to the move routine
 
 /*
- * Routine (template) for moving the decompressed kernel in place,
- * if we were high loaded. This _must_ PIC-code !
+ * Jump to the decompressed kernel.
  */
-move_routine_start:
-	movl %ecx,%ebp
-	shrl $2,%ecx
-	rep
-	movsl
-	movl %ebp,%ecx
-	andl $3,%ecx
-	rep
-	movsb
-	movl %edx,%esi
-	movl %eax,%ecx	# NOTE: rep movsb won't move if %ecx == 0
-	addl $3,%ecx
-	shrl $2,%ecx
-	rep
-	movsl
-	movl %ebx,%esi	# Restore setup pointer
-	xorl %ebx,%ebx
-	ljmp $(__KERNEL_CS), $__PHYSICAL_START
-move_routine_end:
+	jmp	*%rbp
 
-
-/* Stack for uncompression */ 	
-	.align 32
-user_stack:	 	
+	.data
+gdt:
+	.word	gdt_end - gdt
+	.long	gdt
+	.word	0
+	.quad	0x0000000000000000	/* NULL descriptor */
+	.quad	0x00af9a000000ffff	/* __KERNEL_CS */
+	.quad	0x00cf92000000ffff	/* __KERNEL_DS */
+gdt_end:
+	.bss
+/* Stack for uncompression */
+	.balign 4
+user_stack:
 	.fill 4096,4,0
-stack_start:	
-	.long user_stack+4096
-	.word __KERNEL_DS
-
+user_stack_end:
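
The early 4G identity map that startup_32 builds above can be modeled in C. The layout is read off the assembly: six 4K pages at pgtable, one level 4 entry, four level 3 entries and 2048 level 2 entries of 2M each. The flag values 0x007 and 0x183 also come from the assembly. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PT_ENTRIES     512
#define PT_PAGE_BYTES  4096
#define LINK_FLAGS     0x007ULL  /* present | writable | user ("0x1007") */
#define PMD_2M_FLAGS   0x183ULL  /* present | writable | PSE | global */

/* tables[0] = level 4, tables[1] = level 3, tables[2..5] = level 2.
 * base is the physical address of tables[0] (pgtable in head.S). */
static void build_boot_pagetable(uint64_t tables[6][PT_ENTRIES], uint64_t base)
{
	assert(base % PT_PAGE_BYTES == 0);	/* page tables are 4K aligned */
	memset(tables, 0, 6 * PT_PAGE_BYTES);

	/* Level 4: a single entry pointing at the level 3 page. */
	tables[0][0] = (base + 0x1000) | LINK_FLAGS;

	/* Level 3: four entries, one per level 2 page, 1G each. */
	for (int i = 0; i < 4; i++)
		tables[1][i] = (base + 0x2000 + 0x1000ULL * i) | LINK_FLAGS;

	/* Level 2: 4 * 512 = 2048 entries of 2M, identity mapping 4G. */
	for (int pd = 0; pd < 4; pd++)
		for (int j = 0; j < PT_ENTRIES; j++) {
			uint64_t idx = (uint64_t)pd * PT_ENTRIES + j;
			tables[2 + pd][j] = (idx << 21) | PMD_2M_FLAGS;
		}
}
```

The last level 2 entry maps 0xffe00000, which covers physical memory up to 4G. This is why a 64bit bootloader entering at startup_64 must provide its own identity map instead.
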
Index: linux/arch/x86_64/boot/compressed/Makefile
===================================================================
--- linux.orig/arch/x86_64/boot/compressed/Makefile
+++ linux/arch/x86_64/boot/compressed/Makefile
@@ -8,16 +8,14 @@
 
 targets		:= vmlinux vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o
 EXTRA_AFLAGS	:= -traditional
-AFLAGS		:= $(subst -m64,-m32,$(AFLAGS))
 
 # cannot use EXTRA_CFLAGS because base CFLAGS contains -mkernel which conflicts with
 # -m32
-CFLAGS := -m32 -D__KERNEL__ -Iinclude -O2  -fno-strict-aliasing
-LDFLAGS := -m elf_i386
+CFLAGS := -m64 -D__KERNEL__ -Iinclude -O2  -fno-strict-aliasing -fPIC -mcmodel=small -fno-builtin
+LDFLAGS := -m elf_x86_64
 
-LDFLAGS_vmlinux := -Ttext $(IMAGE_OFFSET) -e startup_32 -m elf_i386
-
-$(obj)/vmlinux: $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE
+LDFLAGS_vmlinux := -T
+$(obj)/vmlinux: $(src)/vmlinux.lds $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE
 	$(call if_changed,ld)
 	@:
 
@@ -27,7 +25,7 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 $(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE
 	$(call if_changed,gzip)
 
-LDFLAGS_piggy.o := -r --format binary --oformat elf32-i386 -T
+LDFLAGS_piggy.o := -r --format binary --oformat elf64-x86-64 -T
 
 $(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.gz FORCE
 	$(call if_changed,ld)
Index: linux/arch/x86_64/boot/compressed/misc.c
===================================================================
--- linux.orig/arch/x86_64/boot/compressed/misc.c
+++ linux/arch/x86_64/boot/compressed/misc.c
@@ -9,10 +9,95 @@
  * High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
  */
 
+#define _LINUX_STRING_H_ 1
+#define __LINUX_BITMAP_H 1
+
+#include <linux/linkage.h>
 #include <linux/screen_info.h>
 #include <asm/io.h>
 #include <asm/page.h>
 
+/* WARNING!!
+ * This code is compiled with -fPIC and it is relocated dynamically
+ * at run time, but no relocation processing is performed.
+ * This means that it is not safe to place pointers in static structures.
+ */
+
+/*
+ * Getting to provably safe in-place decompression is hard.
+ * Worst-case behaviours need to be analyzed.
+ * Background information:
+ *
+ * The file layout is:
+ *    magic[2]
+ *    method[1]
+ *    flags[1]
+ *    timestamp[4]
+ *    extraflags[1]
+ *    os[1]
+ *    compressed data blocks[N]
+ *    crc[4] orig_len[4]
+ *
+ * resulting in 18 bytes of non compressed data overhead.
+ *
+ * Files are divided into blocks
+ * 1 bit (last block flag)
+ * 2 bits (block type)
+ *
+ * 1 block occurs every 32K - 1 bytes or when 50% compression has been achieved.
+ * The smallest block type encoding is always used.
+ *
+ * stored:
+ *    32 bits length in bytes.
+ *
+ * fixed:
+ *    magic fixed tree.
+ *    symbols.
+ *
+ * dynamic:
+ *    dynamic tree encoding.
+ *    symbols.
+ *
+ *
+ * The buffer for decompression in place is the length of the
+ * uncompressed data, plus a small amount extra to keep the algorithm safe.
+ * The compressed data is placed at the end of the buffer.  The output
+ * pointer is placed at the start of the buffer and the input pointer
+ * is placed where the compressed data starts.  Problems will occur
+ * when the output pointer overruns the input pointer.
+ *
+ * The output pointer can only overrun the input pointer if the input
+ * pointer is moving faster than the output pointer.  A condition only
+ * triggered by data whose compressed form is larger than the uncompressed
+ * form.
+ *
+ * The worst case at the block level is a growth of the compressed data
+ * of 5 bytes per 32767 bytes.
+ *
+ * The worst case internal to a compressed block is very hard to figure.
+ * The worst case can at least be bounded by having one bit that represents
+ * 32764 bytes and then all of the rest of the bytes representing the very
+ * very last byte.
+ *
+ * All of which is enough to compute an amount of extra data that is required
+ * to be safe.  To avoid problems at the block level, allocating 5 extra bytes
+ * per 32767 bytes of data is sufficient.  To avoid problems internal to a block,
+ * adding an extra 32767 bytes (the worst case uncompressed block size) is
+ * sufficient to ensure that in the worst case the decompressed data for a
+ * block will stop the byte before the compressed data for a block begins.
+ * To avoid problems with the compressed data's meta information an extra 18
+ * bytes are needed.  Leading to the formula:
+ *
+ * extra_bytes = (uncompressed_size >> 12) + 32768 + 18 + decompressor_size.
+ *
+ * Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
+ * Adding 32768 instead of 32767 just makes for round numbers.
+ * Adding the decompressor_size is necessary as it must live after all
+ * of the data as well.  Last I measured, the decompressor is about 14K:
+ * 10K of actual data and 4K of bss.
+ *
+ */
+
 /*
  * gzip declarations
  */
@@ -28,15 +113,20 @@ typedef unsigned char  uch;
 typedef unsigned short ush;
 typedef unsigned long  ulg;
 
-#define WSIZE 0x8000		/* Window size must be at least 32k, */
-				/* and a power of two */
-
-static uch *inbuf;	     /* input buffer */
-static uch window[WSIZE];    /* Sliding window buffer */
-
-static unsigned insize = 0;  /* valid bytes in inbuf */
-static unsigned inptr = 0;   /* index of next byte to be processed in inbuf */
-static unsigned outcnt = 0;  /* bytes in output buffer */
+#define WSIZE 0x80000000	/* Window size must be at least 32k,
+				 * and a power of two
+				 * We don't actually have a window just
+				 * a huge output buffer so I report
+				 * a 2G window size, as that should
+				 * always be larger than our output buffer.
+				 */
+
+static uch *inbuf;	/* input buffer */
+static uch *window;	/* Sliding window buffer, (and final output buffer) */
+
+static unsigned insize;  /* valid bytes in inbuf */
+static unsigned inptr;   /* index of next byte to be processed in inbuf */
+static unsigned outcnt;  /* bytes in output buffer */
 
 /* gzip flag byte */
 #define ASCII_FLAG   0x01 /* bit 0 set: file probably ASCII text */
@@ -87,8 +177,6 @@ extern unsigned char input_data[];
 extern int input_len;
 
 static long bytes_out = 0;
-static uch *output_data;
-static unsigned long output_ptr = 0;
 
 static void *malloc(int size);
 static void free(void *where);
@@ -98,17 +186,10 @@ static void *memcpy(void *dest, const vo
 
 static void putstr(const char *);
 
-extern int end;
-static long free_mem_ptr = (long)&end;
+static long free_mem_ptr;
 static long free_mem_end_ptr;
 
-#define INPLACE_MOVE_ROUTINE  0x1000
-#define LOW_BUFFER_START      0x2000
-#define LOW_BUFFER_MAX       0x90000
-#define HEAP_SIZE             0x3000
-static unsigned int low_buffer_end, low_buffer_size;
-static int high_loaded =0;
-static uch *high_buffer_start /* = (uch *)(((ulg)&end) + HEAP_SIZE)*/;
+#define HEAP_SIZE             0x6000
 
 static char *vidmem = (char *)0xb8000;
 static int vidport;
@@ -218,58 +299,31 @@ static void* memcpy(void* dest, const vo
  */
 static int fill_inbuf(void)
 {
-	if (insize != 0) {
-		error("ran out of input data");
-	}
-
-	inbuf = input_data;
-	insize = input_len;
-	inptr = 1;
-	return inbuf[0];
+	error("ran out of input data");
+	return 0;
 }
 
 /* ===========================================================================
  * Write the output window window[0..outcnt-1] and update crc and bytes_out.
  * (Used for the decompressed data only.)
  */
-static void flush_window_low(void)
-{
-    ulg c = crc;         /* temporary variable */
-    unsigned n;
-    uch *in, *out, ch;
-    
-    in = window;
-    out = &output_data[output_ptr]; 
-    for (n = 0; n < outcnt; n++) {
-	    ch = *out++ = *in++;
-	    c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
-    }
-    crc = c;
-    bytes_out += (ulg)outcnt;
-    output_ptr += (ulg)outcnt;
-    outcnt = 0;
-}
-
-static void flush_window_high(void)
-{
-    ulg c = crc;         /* temporary variable */
-    unsigned n;
-    uch *in,  ch;
-    in = window;
-    for (n = 0; n < outcnt; n++) {
-	ch = *output_data++ = *in++;
-	if ((ulg)output_data == low_buffer_end) output_data=high_buffer_start;
-	c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
-    }
-    crc = c;
-    bytes_out += (ulg)outcnt;
-    outcnt = 0;
-}
-
 static void flush_window(void)
 {
-	if (high_loaded) flush_window_high();
-	else flush_window_low();
+	/* With my window equal to my output buffer
+	 * I only need to compute the crc here.
+	 */
+	ulg c = crc;         /* temporary variable */
+	unsigned n;
+	uch *in, ch;
+
+	in = window;
+	for (n = 0; n < outcnt; n++) {
+		ch = *in++;
+		c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
+	}
+	crc = c;
+	bytes_out += (ulg)outcnt;
+	outcnt = 0;
 }
 
 static void error(char *x)
@@ -281,57 +335,8 @@ static void error(char *x)
 	while(1);	/* Halt */
 }
 
-static void setup_normal_output_buffer(void)
-{
-#ifdef STANDARD_MEMORY_BIOS_CALL
-	if (RM_EXT_MEM_K < 1024) error("Less than 2MB of memory");
-#else
-	if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < 1024) error("Less than 2MB of memory");
-#endif
-	output_data = (unsigned char *)__PHYSICAL_START; /* Normally Points to 1M */
-	free_mem_end_ptr = (long)real_mode;
-}
-
-struct moveparams {
-	uch *low_buffer_start;  int lcount;
-	uch *high_buffer_start; int hcount;
-};
-
-static void setup_output_buffer_if_we_run_high(struct moveparams *mv)
-{
-	high_buffer_start = (uch *)(((ulg)&end) + HEAP_SIZE);
-#ifdef STANDARD_MEMORY_BIOS_CALL
-	if (RM_EXT_MEM_K < (3*1024)) error("Less than 4MB of memory");
-#else
-	if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < (3*1024)) error("Less than 4MB of memory");
-#endif	
-	mv->low_buffer_start = output_data = (unsigned char *)LOW_BUFFER_START;
-	low_buffer_end = ((unsigned int)real_mode > LOW_BUFFER_MAX
-	  ? LOW_BUFFER_MAX : (unsigned int)real_mode) & ~0xfff;
-	low_buffer_size = low_buffer_end - LOW_BUFFER_START;
-	high_loaded = 1;
-	free_mem_end_ptr = (long)high_buffer_start;
-	if ( (__PHYSICAL_START + low_buffer_size) > ((ulg)high_buffer_start)) {
-		high_buffer_start = (uch *)(__PHYSICAL_START + low_buffer_size);
-		mv->hcount = 0; /* say: we need not to move high_buffer */
-	}
-	else mv->hcount = -1;
-	mv->high_buffer_start = high_buffer_start;
-}
-
-static void close_output_buffer_if_we_run_high(struct moveparams *mv)
-{
-	if (bytes_out > low_buffer_size) {
-		mv->lcount = low_buffer_size;
-		if (mv->hcount)
-			mv->hcount = bytes_out - low_buffer_size;
-	} else {
-		mv->lcount = bytes_out;
-		mv->hcount = 0;
-	}
-}
-
-int decompress_kernel(struct moveparams *mv, void *rmode)
+asmlinkage void decompress_kernel(void *rmode, unsigned long heap,
+	uch *input_data, unsigned long input_len, uch *output)
 {
 	real_mode = rmode;
 
@@ -346,13 +351,21 @@ int decompress_kernel(struct moveparams 
 	lines = RM_SCREEN_INFO.orig_video_lines;
 	cols = RM_SCREEN_INFO.orig_video_cols;
 
-	if (free_mem_ptr < 0x100000) setup_normal_output_buffer();
-	else setup_output_buffer_if_we_run_high(mv);
+	window = output;  		/* Output buffer (Normally at 2M) */
+	free_mem_ptr     = heap;	/* Heap  */
+	free_mem_end_ptr = heap + HEAP_SIZE;
+	inbuf  = input_data;		/* Input buffer */
+	insize = input_len;
+	inptr  = 0;
+
+	if ((ulg)output & 0x1fffffUL)
+		error("Destination address not 2M aligned");
+	if ((ulg)output >= 0xffffffffffUL)
+		error("Destination address too large");
 
 	makecrc();
 	putstr(".\nDecompressing Linux...");
 	gunzip();
 	putstr("done.\nBooting the kernel.\n");
-	if (high_loaded) close_output_buffer_if_we_run_high(mv);
-	return high_loaded;
+	return;
 }
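
The new compressed/head.S adds the safety margin derived in the misc.c comment when it chooses where to move the decompressor image; that address is the value it computes in %ebx/%rbx. The helper below, whose name is hypothetical, repeats that arithmetic step by step. The decompressor_size term is not added explicitly there. This is presumably because vmlinux.lds in this patch places the decompressor's .text, .data and .bss after the compressed payload (.text.compressed).

```c
#include <assert.h>
#include <stdint.h>

/* Models the %ebx/%rbx computation in compressed/head.S: the base the
 * decompressor image is copied to before decompressing to run_addr. */
static uint64_t decompress_image_base(uint64_t run_addr, uint32_t input_len,
				      uint32_t output_len)
{
	uint64_t addr = run_addr;

	addr -= input_len;		/* replace the compressed size...        */
	addr += output_len;		/* ...with the uncompressed size         */
	addr += output_len >> 12;	/* 8 bytes per 32K of uncompressed data  */
	addr += 32768 + 18;		/* worst-case block plus gzip overhead   */
	addr += 4095;			/* then round up to a 4K boundary        */
	return addr & ~4095ULL;
}
```

For a 1.5M compressed image that inflates to 4M at 2M, the image would be moved to 0x489000.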
Index: linux/arch/x86_64/boot/compressed/vmlinux.lds
===================================================================
--- /dev/null
+++ linux/arch/x86_64/boot/compressed/vmlinux.lds
@@ -0,0 +1,44 @@
+OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
+OUTPUT_ARCH(i386:x86-64)
+ENTRY(startup_64)
+SECTIONS
+{
+	/* Be careful parts of head.S assume startup_32 is at
+ 	 * address 0.
+	 */
+	. = 0;
+	.text :	{
+		_head = . ;
+		*(.text.head)
+		_ehead = . ;
+		*(.text.compressed)
+		_text = .; 	/* Text */
+		*(.text)
+		*(.text.*)
+		_etext = . ;
+	}
+	.rodata : {
+		_rodata = . ;
+		*(.rodata)	 /* read-only data */
+		*(.rodata.*)
+		_erodata = . ;
+	}
+	.data :	{
+		_data = . ;
+		*(.data)
+		*(.data.*)
+		_edata = . ;
+	}
+	.bss : {
+		_bss = . ;
+		*(.bss)
+		*(.bss.*)
+		*(COMMON)
+		. = ALIGN(8);
+		_end = . ;
+		. = ALIGN(4096);
+		pgtable = . ;
+		. = . + 4096 * 6;
+		_heap = .;
+	}
+}
Index: linux/arch/x86_64/boot/compressed/vmlinux.scr
===================================================================
--- linux.orig/arch/x86_64/boot/compressed/vmlinux.scr
+++ linux/arch/x86_64/boot/compressed/vmlinux.scr
@@ -1,9 +1,10 @@
 SECTIONS
 {
-  .data : { 
+  .text.compressed : {
 	input_len = .;
-	LONG(input_data_end - input_data) input_data = .; 
-	*(.data) 
-	input_data_end = .; 
+	LONG(input_data_end - input_data) input_data = .;
+	*(.data)
+	output_len = . - 4;
+	input_data_end = .;
 	}
 }
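
output_len = . - 4 works because a gzip stream ends with ISIZE. That field holds the uncompressed length modulo 2^32, stored little-endian in the last four bytes (RFC 1952), and head.S reads it with a plain 32-bit load. Below is a C equivalent with a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the gzip ISIZE trailer: the uncompressed length modulo 2^32,
 * stored little-endian in the last four bytes of the stream. */
static uint32_t gzip_isize(const unsigned char *gz, size_t len)
{
	assert(len >= 18);	/* fixed gzip header + trailer overhead */
	const unsigned char *p = gz + len - 4;

	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```
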
Index: linux/arch/x86_64/Kconfig
===================================================================
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -565,23 +565,56 @@ config CRASH_DUMP
 	  PHYSICAL_START.
           For more details see Documentation/kdump/kdump.txt
 
+config RELOCATABLE
+	bool "Build a relocatable kernel (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	help
+	  Builds a relocatable kernel. This enables loading and running
+	  a kernel binary from a different physical address than it has
+	  been compiled for.
+
+	  One use is for the kexec on panic case where the recovery kernel
+	  must live at a different physical address than the primary
+	  kernel.
+
+	  Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
+	  it has been loaded at and the compile-time physical address
+	  (CONFIG_PHYSICAL_START) is ignored.
+
 config PHYSICAL_START
 	hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
-	default "0x1000000" if CRASH_DUMP
 	default "0x200000"
 	help
-	  This gives the physical address where the kernel is loaded. Normally
-	  for regular kernels this value is 0x200000 (2MB). But in the case
-	  of kexec on panic the fail safe kernel needs to run at a different
-	  address than the panic-ed kernel. This option is used to set the load
-	  address for kernels used to capture crash dump on being kexec'ed
-	  after panic. The default value for crash dump kernels is
-	  0x1000000 (16MB). This can also be set based on the "X" value as
+	  This gives the physical address where the kernel is loaded. It
+	  should be aligned to 2MB boundary.
+
+	  If the kernel is not relocatable (CONFIG_RELOCATABLE=n) then
+	  bzImage will decompress itself to the above physical address and
+	  run from there. Otherwise, bzImage will run from the address where
+	  it has been loaded by the boot loader and will ignore the above
+	  physical address.
+
+	  In normal kdump cases one does not have to set/change this option
+	  as now bzImage can be compiled as a completely relocatable image
+	  (CONFIG_RELOCATABLE=y) and be used to load and run from a different
+	  address. This option is mainly useful for the folks who don't want
+	  to use a bzImage for capturing the crash dump and want to use a
+	  vmlinux instead.
+
+	  So if you are using bzImage for capturing the crash dump, leave
+	  the value here unchanged at 0x200000 and set CONFIG_RELOCATABLE=y.
+	  Otherwise, if you plan to use vmlinux for capturing the crash dump,
+	  change this value to the start of the reserved region (typically 16MB,
+	  0x1000000). In other words, it can be set based on the "X" value as
 	  specified in the "crashkernel=YM@XM" command line boot parameter
 	  passed to the panic-ed kernel. Typically this parameter is set as
 	  crashkernel=64M@16M. Please take a look at
 	  Documentation/kdump/kdump.txt for more details about crash dumps.
 
+	  Using bzImage for capturing the crash dump is advantageous as
+	  one does not have to build two kernels. The same kernel can be used
+	  as both the production kernel and the capture kernel.
+
 	  Don't change this unless you know what you are doing.
 
 config SECCOMP
Index: linux/arch/x86_64/kernel/head.S
===================================================================
--- linux.orig/arch/x86_64/kernel/head.S
+++ linux/arch/x86_64/kernel/head.S
@@ -5,6 +5,7 @@
  *  Copyright (C) 2000 Pavel Machek <pavel@suse.cz>
  *  Copyright (C) 2000 Karsten Keil <kkeil@suse.de>
  *  Copyright (C) 2001,2002 Andi Kleen <ak@suse.de>
+ *  Copyright (C) 2005 Eric Biederman <ebiederm@xmission.com>
  */
 
 
@@ -17,95 +18,127 @@
 #include <asm/page.h>
 #include <asm/msr.h>
 #include <asm/cache.h>
-	
+
 /* we are not able to switch in one step to the final KERNEL ADRESS SPACE
- * because we need identity-mapped pages on setup so define __START_KERNEL to
- * 0x100000 for this stage
- * 
+ * because we need identity-mapped pages.
+ *
  */
 
 	.text
 	.section .bootstrap.text
-	.code32
-	.globl startup_32
-/* %bx:	 1 if coming from smp trampoline on secondary cpu */ 
-startup_32:
-	
+	.code64
+	.globl startup_64
+startup_64:
+
 	/*
-	 * At this point the CPU runs in 32bit protected mode (CS.D = 1) with
-	 * paging disabled and the point of this file is to switch to 64bit
-	 * long mode with a kernel mapping for kerneland to jump into the
-	 * kernel virtual addresses.
- 	 * There is no stack until we set one up.
+	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
+	 * and someone has loaded an identity mapped page table
+	 * for us.  These identity mapped page tables map all of the
+	 * kernel pages and possibly all of memory.
+	 *
+	 * %esi holds a physical pointer to real_mode_data.
+	 *
+	 * We come here either directly from a 64bit bootloader, or from
+	 * arch/x86_64/boot/compressed/head.S.
+	 *
+	 * We only come here initially at boot; nothing else comes here.
+	 *
+	 * Since we may be loaded at an address different from what we were
+	 * compiled to run at we first fixup the physical addresses in our page
+	 * tables and then reload them.
 	 */
 
-	/* Initialize the %ds segment register */
-	movl $__KERNEL_DS,%eax
-	movl %eax,%ds
+	/* Compute the delta between the address I am compiled to run at and the
+	 * address I am actually running at.
+	 */
+	leaq	_text(%rip), %rbp
+	subq	$_text - __START_KERNEL_map, %rbp
 
-	/* Load new GDT with the 64bit segments using 32bit descriptor */
-	lgdt	pGDT32 - __START_KERNEL_map
+	/* Is the address not 2M aligned? */
+	movq	%rbp, %rax
+	andl	$~LARGE_PAGE_MASK, %eax
+	testl	%eax, %eax
+	jnz	bad_address
+
+	/* Is the address too large? */
+	leaq	_text(%rip), %rdx
+	movq	$PGDIR_SIZE, %rax
+	cmpq	%rax, %rdx
+	jae	bad_address
 
-	/* If the CPU doesn't support CPUID this will double fault.
-	 * Unfortunately it is hard to check for CPUID without a stack. 
+	/* Fixup the physical addresses in the page table
 	 */
-	
-	/* Check if extended functions are implemented */		
-	movl	$0x80000000, %eax
-	cpuid
-	cmpl	$0x80000000, %eax
-	jbe	no_long_mode
-	/* Check if long mode is implemented */
-	mov	$0x80000001, %eax
-	cpuid
-	btl	$29, %edx
-	jnc	no_long_mode
+	addq	%rbp, init_level4_pgt + 0(%rip)
+	addq	%rbp, init_level4_pgt + (258*8)(%rip)
+	addq	%rbp, init_level4_pgt + (511*8)(%rip)
+
+	addq	%rbp, level3_ident_pgt + 0(%rip)
+	addq	%rbp, level3_kernel_pgt + (510*8)(%rip)
+
+	/* Add an Identity mapping if I am above 1G */
+	leaq	_text(%rip), %rdi
+	andq	$LARGE_PAGE_MASK, %rdi
+
+	movq	%rdi, %rax
+	shrq	$PUD_SHIFT, %rax
+	andq	$(PTRS_PER_PUD - 1), %rax
+	jz	ident_complete
+
+	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	leaq	level3_ident_pgt(%rip), %rbx
+	movq	%rdx, 0(%rbx, %rax, 8)
+
+	movq	%rdi, %rax
+	shrq	$PMD_SHIFT, %rax
+	andq	$(PTRS_PER_PMD - 1), %rax
+	leaq	__PAGE_KERNEL_LARGE_EXEC(%rdi), %rdx
+	leaq	level2_spare_pgt(%rip), %rbx
+	movq	%rdx, 0(%rbx, %rax, 8)
+ident_complete:
 
-	/*
-	 * Prepare for entering 64bits mode
+	/* Fixup the kernel text+data virtual addresses
 	 */
+	leaq	level2_kernel_pgt(%rip), %rdi
+	leaq	4096(%rdi), %r8
+	/* See if it is a valid page table entry */
+1:	testq	$1, 0(%rdi)
+	jz	2f
+	addq	%rbp, 0(%rdi)
+	/* Go to the next page */
+2:	addq	$8, %rdi
+	cmp	%r8, %rdi
+	jne	1b
 
-	/* Enable PAE mode */
-	xorl	%eax, %eax
-	btsl	$5, %eax
-	movl	%eax, %cr4
-
-	/* Setup early boot stage 4 level pagetables */
-	movl	$(init_level4_pgt - __START_KERNEL_map), %eax
-	movl	%eax, %cr3
+	/* Fixup phys_base */
+	addq	%rbp, phys_base(%rip)
 
-	/* Setup EFER (Extended Feature Enable Register) */
-	movl	$MSR_EFER, %ecx
-	rdmsr
-
-	/* Enable Long Mode */
-	btsl	$_EFER_LME, %eax
-				
-	/* Make changes effective */
-	wrmsr
+#ifdef CONFIG_SMP
+	addq	%rbp, trampoline_level4_pgt + 0(%rip)
+	addq	%rbp, trampoline_level4_pgt + (511*8)(%rip)
+#endif
+#ifdef CONFIG_ACPI_SLEEP
+	addq	%rbp, wakeup_level4_pgt + 0(%rip)
+	addq	%rbp, wakeup_level4_pgt + (511*8)(%rip)
+#endif
 
-	xorl	%eax, %eax
-	btsl	$31, %eax			/* Enable paging and in turn activate Long Mode */
-	btsl	$0, %eax			/* Enable protected mode */
-	/* Make changes effective */
-	movl	%eax, %cr0
-	/*
-	 * At this point we're in long mode but in 32bit compatibility mode
-	 * with EFER.LME = 1, CS.L = 0, CS.D = 1 (and in turn
-	 * EFER.LMA = 1). Now we want to jump in 64bit mode, to do that we use
-	 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
+	/* Due to ENTRY(), sometimes the empty space gets filled with
+	 * zeros. Better take a jmp than relying on empty space being
+	 * filled with 0x90 (nop)
 	 */
-	ljmp	$__KERNEL_CS, $(startup_64 - __START_KERNEL_map)
-
-	.code64
-	.org 0x100	
-	.globl startup_64
-startup_64:
+	jmp secondary_startup_64
 ENTRY(secondary_startup_64)
-	/* We come here either from startup_32
-	 * or directly from a 64bit bootloader.
-	 * Since we may have come directly from a bootloader we
-	 * reload the page tables here.
+	/*
+	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
+	 * and someone has loaded a mapped page table.
+	 *
+	 * %esi holds a physical pointer to real_mode_data.
+	 *
+	 * We come here either from startup_64 (using physical addresses)
+	 * or from trampoline.S (using virtual addresses).
+	 *
+	 * Using virtual addresses from trampoline.S removes the need
+	 * to have any identity mapped pages in the kernel page table
+	 * after the boot processor executes this code.
 	 */
 
 	/* Enable PAE mode and PGE */
@@ -116,8 +149,14 @@ ENTRY(secondary_startup_64)
 
 	/* Setup early boot stage 4 level pagetables. */
 	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
+	addq	phys_base(%rip), %rax
 	movq	%rax, %cr3
 
+	/* Ensure I am executing from virtual addresses */
+	movq	$1f, %rax
+	jmp	*%rax
+1:
+
 	/* Check if nx is implemented */
 	movl	$0x80000001, %eax
 	cpuid
@@ -126,17 +165,11 @@ ENTRY(secondary_startup_64)
 	/* Setup EFER (Extended Feature Enable Register) */
 	movl	$MSR_EFER, %ecx
 	rdmsr
-
-	/* Enable System Call */
-	btsl	$_EFER_SCE, %eax
-
-	/* No Execute supported? */
-	btl	$20,%edi
+	btsl	$_EFER_SCE, %eax	/* Enable System Call */
+	btl	$20,%edi		/* No Execute supported? */
 	jnc     1f
 	btsl	$_EFER_NX, %eax
-1:
-	/* Make changes effective */
-	wrmsr
+1:	wrmsr				/* Make changes effective */
 
 	/* Setup cr0 */
 #define CR0_PM				1		/* protected mode */
@@ -163,7 +196,7 @@ ENTRY(secondary_startup_64)
 	 * addresses where we're currently running on. We have to do that here
 	 * because in 32bit we couldn't load a 64bit linear address.
 	 */
-	lgdt	cpu_gdt_descr
+	lgdt	cpu_gdt_descr(%rip)
 
 	/* set up data segments. actually 0 would do too */
 	movl $__KERNEL_DS,%eax
@@ -214,6 +247,9 @@ initial_code:
 init_rsp:
 	.quad  init_thread_union+THREAD_SIZE-8
 
+bad_address:
+	jmp bad_address
+
 ENTRY(early_idt_handler)
 	cmpl $2,early_recursion_flag(%rip)
 	jz  1f
@@ -242,23 +278,7 @@ early_idt_msg:
 early_idt_ripmsg:
 	.asciz "RIP %s\n"
 
-.code32
-ENTRY(no_long_mode)
-	/* This isn't an x86-64 CPU so hang */
-1:
-	jmp	1b
-
-.org 0xf00
-	.globl pGDT32
-pGDT32:
-	.word	gdt_end-cpu_gdt_table-1
-	.long	cpu_gdt_table-__START_KERNEL_map
-
-.org 0xf10	
-ljumpvector:
-	.long	startup_64-__START_KERNEL_map
-	.word	__KERNEL_CS
-
+.balign PAGE_SIZE
 ENTRY(stext)
 ENTRY(_stext)
 
@@ -303,7 +323,7 @@ NEXT_PAGE(level2_ident_pgt)
 	 * Don't set NX because code runs from these pages.
 	 */
 	PMDS(0x0000000000000000, __PAGE_KERNEL_LARGE_EXEC, PTRS_PER_PMD)
-	
+
 NEXT_PAGE(level2_kernel_pgt)
 	/* 40MB kernel mapping. The kernel code cannot be bigger than that.
 	   When you change this change KERNEL_TEXT_SIZE in page.h too. */
@@ -313,6 +333,9 @@ NEXT_PAGE(level2_kernel_pgt)
 	/* Module mapping starts here */
 	.fill	(PTRS_PER_PMD - (KERNEL_TEXT_SIZE/PMD_SIZE)),8,0
 
+NEXT_PAGE(level2_spare_pgt)
+	.fill   512,8,0
+
 #undef PMDS
 #undef NEXT_PAGE
 
@@ -330,6 +353,10 @@ gdt:
 	.endr
 #endif
 
+ENTRY(phys_base)
+	/* This must match the first entry in level2_kernel_pgt */
+	.quad   0x0000000000000000
+
 /* We need valid kernel segments for data and code in long mode too
  * IRET will check the segment types  kkeil 2000/10/28
  * Also sysret mandates a special GDT layout 
Index: linux/arch/x86_64/kernel/suspend_asm.S
===================================================================
--- linux.orig/arch/x86_64/kernel/suspend_asm.S
+++ linux/arch/x86_64/kernel/suspend_asm.S
@@ -71,9 +71,10 @@ loop:
 	jmp	loop
 done:
 	/* go back to the original page tables */
-	leaq	init_level4_pgt(%rip), %rax
-	subq	$__START_KERNEL_map, %rax
-	movq	%rax, %cr3
+	movq    $(init_level4_pgt - __START_KERNEL_map), %rax
+	addq    phys_base(%rip), %rax
+	movq    %rax, %cr3
+
 	/* Flush TLB, including "global" things (vmalloc) */
 	movq	mmu_cr4_features(%rip), %rax
 	movq	%rax, %rdx
Index: linux/include/asm-x86_64/page.h
===================================================================
--- linux.orig/include/asm-x86_64/page.h
+++ linux/include/asm-x86_64/page.h
@@ -61,6 +61,8 @@ typedef struct { unsigned long pgd; } pg
 
 typedef struct { unsigned long pgprot; } pgprot_t;
 
+extern unsigned long phys_base;
+
 #define pte_val(x)	((x).pte)
 #define pmd_val(x)	((x).pmd)
 #define pud_val(x)	((x).pud)
@@ -101,14 +103,14 @@ typedef struct { unsigned long pgprot; }
 #define PAGE_OFFSET		__PAGE_OFFSET
 
 /* Note: __pa(&symbol_visible_to_c) should be always replaced with __pa_symbol.
-   Otherwise you risk miscompilation. */ 
+   Otherwise you risk miscompilation. */
 #define __pa(x)			((unsigned long)(x) - PAGE_OFFSET)
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */ 
 #define __pa_symbol(x)		\
 	({unsigned long v;  \
 	  asm("" : "=r" (v) : "0" (x)); \
-	  (v - __START_KERNEL_map); })
+	  ((v - __START_KERNEL_map) + phys_base); })
 
 #define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
 #ifdef CONFIG_FLATMEM
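For readers following the new startup_64 code, the relocation arithmetic can be modelled in ordinary C. The sketch below is a host-side illustration only, not kernel code: the constant values are assumed to match x86_64 in this series, the helper names (`reloc_delta`, `fixup_pmds`, `pa_symbol`) are hypothetical, and it covers only the delta computation and its two sanity checks, the level2_kernel_pgt fixup loop, and the new `__pa_symbol()` formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL /* __START_KERNEL_map */
#define LARGE_PAGE_SIZE  0x200000ULL           /* one 2MB PMD mapping */
#define PGDIR_SIZE       (1ULL << 39)          /* 512GB mapped by one PGD entry */
#define PAGE_PRESENT     0x1ULL                /* bit 0 of a page table entry */

/*
 * Models the start of startup_64: the delta is the physical address _text
 * is running at minus the physical address it was linked for. Returns -1
 * where head.S would jump to bad_address.
 */
static int reloc_delta(uint64_t text_run_phys, uint64_t text_link_virt,
                       uint64_t *delta)
{
    /* Unsigned wraparound also handles loading below the link address. */
    uint64_t d = text_run_phys - (text_link_virt - START_KERNEL_MAP);

    if (d & (LARGE_PAGE_SIZE - 1))      /* not 2MB aligned */
        return -1;
    if (text_run_phys >= PGDIR_SIZE)    /* too large */
        return -1;
    *delta = d;
    return 0;
}

/* Models the level2_kernel_pgt loop: only present entries are adjusted. */
static void fixup_pmds(uint64_t *pmd, size_t n, uint64_t delta)
{
    for (size_t i = 0; i < n; i++)
        if (pmd[i] & PAGE_PRESENT)
            pmd[i] += delta;
}

/* Models the new __pa_symbol(): image virtual address to physical address. */
static uint64_t pa_symbol(uint64_t vaddr, uint64_t phys_base)
{
    assert(vaddr >= START_KERNEL_MAP);
    return (vaddr - START_KERNEL_MAP) + phys_base;
}
```

For example, with _text linked at __START_KERNEL_map + 2MB but loaded at 16MB, the delta is 0xe00000. A present PMD entry that mapped physical 2MB is moved to 16MB, and because phys_base starts at 0 and gets the same delta added, `__pa_symbol(_text)` then yields 0x1000000.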


* [PATCH] [20/22] x86_64: build-time checking
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (18 preceding siblings ...)
  2007-04-28 17:59 ` [PATCH] [19/22] x86_64: Relocatable Kernel Support Andi Kleen
@ 2007-04-28 17:59 ` Andi Kleen
  2007-04-28 17:59 ` [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage Andi Kleen
  2007-04-28 17:59 ` [PATCH] [22/22] x86_64: Move cpu verification code to common file Andi Kleen
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:59 UTC (permalink / raw)
  To: Vivek Goyal, Eric W. Biederman, Andi Kleen, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>

o The x86_64 kernel should run from a 2MB-aligned address for two reasons:
	- Performance.
	- For relocatable kernels, page tables are updated based on the difference
	  between the compile-time address and the load-time physical address.
	  This difference must be a multiple of 2MB because kernel text and data
	  are mapped using 2MB pages, and each PMD entry must point to a 2MB-aligned
	  address. Life is simpler if both the compile-time and load-time
	  kernel addresses are 2MB aligned.

o Flag the error at compile time if one is trying to build a kernel that
  does not meet the alignment restrictions.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86_64/boot/compressed/misc.c |    2 +-
 arch/x86_64/kernel/head64.c        |    7 +++++++
 include/asm-x86_64/page.h          |    1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

Index: linux/arch/x86_64/boot/compressed/misc.c
===================================================================
--- linux.orig/arch/x86_64/boot/compressed/misc.c
+++ linux/arch/x86_64/boot/compressed/misc.c
@@ -358,7 +358,7 @@ asmlinkage void decompress_kernel(void *
 	insize = input_len;
 	inptr  = 0;
 
-	if ((ulg)output & 0x1fffffUL)
+	if ((ulg)output & (__KERNEL_ALIGN - 1))
 		error("Destination address not 2M aligned");
 	if ((ulg)output >= 0xffffffffffUL)
 		error("Destination address too large");
Index: linux/arch/x86_64/kernel/head64.c
===================================================================
--- linux.orig/arch/x86_64/kernel/head64.c
+++ linux/arch/x86_64/kernel/head64.c
@@ -62,6 +62,13 @@ void __init x86_64_start_kernel(char * r
 {
 	int i;
 
+	/*
+	 * Make sure kernel is aligned to 2MB address. Catching it at compile
+	 * time is better. Change your config file and compile the kernel
+	 * for a 2MB aligned address (CONFIG_PHYSICAL_START)
+	 */
+	BUILD_BUG_ON(CONFIG_PHYSICAL_START & (__KERNEL_ALIGN - 1));
+
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
Index: linux/include/asm-x86_64/page.h
===================================================================
--- linux.orig/include/asm-x86_64/page.h
+++ linux/include/asm-x86_64/page.h
@@ -78,6 +78,7 @@ extern unsigned long phys_base;
 #endif /* !__ASSEMBLY__ */
 
 #define __PHYSICAL_START	CONFIG_PHYSICAL_START
+#define __KERNEL_ALIGN		0x200000
 #define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
 #define __START_KERNEL_map	0xffffffff80000000
 #define __PAGE_OFFSET           0xffff810000000000
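The two checks in this patch catch the same condition at different times: BUILD_BUG_ON() rejects a misaligned CONFIG_PHYSICAL_START when head64.c is compiled, while decompress_kernel() rejects a misaligned or too-large destination at run time. A minimal stand-alone sketch of both follows. `BUILD_BUG_ON_SKETCH` uses the classic negative-array-size trick and is an assumption, not a copy of the kernel header; `PHYSICAL_START` is a hypothetical stand-in for the config value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KERNEL_ALIGN   0x200000UL  /* __KERNEL_ALIGN from page.h */
#define PHYSICAL_START 0x200000UL  /* hypothetical stand-in for CONFIG_PHYSICAL_START */

/* Classic negative-array-size trick: the build fails when cond is non-zero. */
#define BUILD_BUG_ON_SKETCH(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Compile-time check, as added to x86_64_start_kernel(). */
static int physical_start_ok(void)
{
    /* Setting PHYSICAL_START to, say, 0x100000 turns this into a build error. */
    BUILD_BUG_ON_SKETCH(PHYSICAL_START & (KERNEL_ALIGN - 1));
    return 1;
}

/* Run-time check, as in decompress_kernel(); NULL means the address is usable. */
static const char *check_output(uint64_t output)
{
    if (output & (KERNEL_ALIGN - 1))
        return "Destination address not 2M aligned";
    if (output >= 0xffffffffffULL)
        return "Destination address too large";
    return NULL;
}
```

Using `__KERNEL_ALIGN - 1` as the mask, instead of the literal 0x1fffff, keeps both checks tied to one definition in page.h.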


* [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (19 preceding siblings ...)
  2007-04-28 17:59 ` [PATCH] [20/22] x86_64: build-time checking Andi Kleen
@ 2007-04-28 17:59 ` Andi Kleen
  2007-04-28 18:07   ` Jeff Garzik
  2007-04-28 17:59 ` [PATCH] [22/22] x86_64: Move cpu verification code to common file Andi Kleen
  21 siblings, 1 reply; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:59 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


o Extend the bzImage protocol (same as i386) to allow bzImage loaders to
  load the protected mode kernel at non-1MB address. Now protected mode
  component is relocatable and can be loaded at non-1MB addresses.

o As of today kdump uses it to run a second kernel from a reserved memory
  area.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/boot/setup.S |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Index: linux/arch/x86_64/boot/setup.S
===================================================================
--- linux.orig/arch/x86_64/boot/setup.S
+++ linux/arch/x86_64/boot/setup.S
@@ -80,7 +80,7 @@ start:
 # This is the setup header, and it must start at %cs:2 (old 0x9020:2)
 
 		.ascii	"HdrS"		# header signature
-		.word	0x0204		# header version number (>= 0x0105)
+		.word	0x0205		# header version number (>= 0x0105)
 					# or else old loadlin-1.5 will fail)
 realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
 start_sys_seg:	.word	SYSSEG
@@ -155,7 +155,16 @@ cmd_line_ptr:	.long 0			# (Header versio
 					# low memory 0x10000 or higher.
 
 ramdisk_max:	.long 0xffffffff
-	
+kernel_alignment:  .long 0x200000       # physical addr alignment required for
+					# protected mode relocatable kernel
+#ifdef CONFIG_RELOCATABLE
+relocatable_kernel:    .byte 1
+#else
+relocatable_kernel:    .byte 0
+#endif
+pad2:                  .byte 0
+pad3:                  .word 0
+
 trampoline:	call	start_of_setup
 		.align 16
 					# The offset at this point is 0x240
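From a boot loader's point of view, the new header fields reduce to a small decision. The sketch below assumes the loader has already parsed the setup header into a hypothetical `struct setup_hdr_sketch` (byte offsets and parsing are deliberately left out) and shows only the logic: protocol 2.05 or later with `relocatable_kernel` set means the protected-mode kernel may be placed at any address that satisfies `kernel_alignment`; otherwise the traditional 1MB address must be used.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_PM_LOAD_ADDR 0x100000u /* traditional 1MB protected-mode load address */

/* Hypothetical subset of setup header fields, already read by the loader. */
struct setup_hdr_sketch {
    uint16_t version;            /* 0x0205 after this patch */
    uint32_t kernel_alignment;   /* 0x200000 on x86_64 */
    uint8_t  relocatable_kernel; /* 1 when built with CONFIG_RELOCATABLE=y */
};

/* Returns the address at which to load the protected-mode kernel. */
static uint64_t choose_load_addr(const struct setup_hdr_sketch *h, uint64_t want)
{
    /* Older protocol or non-relocatable kernel: only the default address works. */
    if (h->version < 0x0205 || !h->relocatable_kernel)
        return DEFAULT_PM_LOAD_ADDR;

    uint64_t align = h->kernel_alignment;
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return (want + align - 1) & ~(align - 1);         /* round up */
}
```

A kdump loader that asks for 0x1000001 with a relocatable kernel would be moved up to 0x1200000, the next 2MB boundary; all actual relocation work still happens inside the kernel.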


* [PATCH] [22/22] x86_64: Move cpu verification code to common file
  2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
                   ` (20 preceding siblings ...)
  2007-04-28 17:59 ` [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage Andi Kleen
@ 2007-04-28 17:59 ` Andi Kleen
  21 siblings, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 17:59 UTC (permalink / raw)
  To: Vivek Goyal, linux-kernel, patches


From: Vivek Goyal <vgoyal@in.ibm.com>


o This patch moves the code that verifies long mode and SSE support to a
  common file. This code is now shared by trampoline.S, wakeup.S,
  boot/setup.S and boot/compressed/head.S.

o Until now, trampoline.S, wakeup.S and the 32bit entry point did only a
  very limited check. Because verify_cpu is now shared, all the entry paths
  are forced to do the exhaustive check, including SSE.

o I am keeping this patch last in the x86 relocatable series because the
  previous patches have had a fair amount of testing and I don't want to
  disturb that. If this patch introduces a problem, it can at least be
  easily isolated.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>

---

 arch/x86_64/boot/compressed/head.S |   19 ++++++
 arch/x86_64/boot/setup.S           |   65 ++-------------------
 arch/x86_64/kernel/acpi/wakeup.S   |   30 +++++-----
 arch/x86_64/kernel/trampoline.S    |   51 +----------------
 arch/x86_64/kernel/verify_cpu.S    |  110 +++++++++++++++++++++++++++++++++++++
 5 files changed, 152 insertions(+), 123 deletions(-)

Index: linux/arch/x86_64/boot/compressed/head.S
===================================================================
--- linux.orig/arch/x86_64/boot/compressed/head.S
+++ linux/arch/x86_64/boot/compressed/head.S
@@ -54,6 +54,15 @@ startup_32:
 1:	popl	%ebp
 	subl	$1b, %ebp
 
+/* setup a stack and make sure cpu supports long mode. */
+	movl	$user_stack_end, %eax
+	addl	%ebp, %eax
+	movl	%eax, %esp
+
+	call	verify_cpu
+	testl	%eax, %eax
+	jnz	no_longmode
+
 /* Compute the delta between where we were compiled to run at
  * and where the code will actually run at.
  */
@@ -159,13 +168,21 @@ startup_32:
 	/* Jump from 32bit compatibility mode into 64bit mode. */
 	lret
 
+no_longmode:
+	/* This isn't an x86-64 CPU so hang */
+1:
+	hlt
+	jmp     1b
+
+#include "../../kernel/verify_cpu.S"
+
 	/* Be careful here startup_64 needs to be at a predictable
 	 * address so I can export it in an ELF header.  Bootloaders
 	 * should look at the ELF header to find this address, as
 	 * it may change in the future.
 	 */
 	.code64
-	.org 0x100
+	.org 0x200
 ENTRY(startup_64)
 	/* We come here either from startup_32 or directly from a
 	 * 64bit bootloader.  If we come here from a bootloader we depend on
Index: linux/arch/x86_64/boot/setup.S
===================================================================
--- linux.orig/arch/x86_64/boot/setup.S
+++ linux/arch/x86_64/boot/setup.S
@@ -299,64 +299,10 @@ loader_ok:
 	movw	%cs,%ax
 	movw	%ax,%ds
 	
-	/* minimum CPUID flags for x86-64 */
-	/* see http://www.x86-64.org/lists/discuss/msg02971.html */		
-#define SSE_MASK ((1<<25)|(1<<26))
-#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
-					   (1<<13)|(1<<15)|(1<<24))
-#define REQUIRED_MASK2 (1<<29)
-
-	pushfl				/* standard way to check for cpuid */
-	popl	%eax
-	movl	%eax,%ebx
-	xorl	$0x200000,%eax
-	pushl	%eax
-	popfl
-	pushfl
-	popl	%eax
-	cmpl	%eax,%ebx
-	jz	no_longmode		/* cpu has no cpuid */
-	movl	$0x0,%eax
-	cpuid
-	cmpl	$0x1,%eax
-	jb	no_longmode		/* no cpuid 1 */
-	xor	%di,%di
-	cmpl	$0x68747541,%ebx	/* AuthenticAMD */
-	jnz	noamd
-	cmpl	$0x69746e65,%edx
-	jnz	noamd
-	cmpl	$0x444d4163,%ecx
-	jnz	noamd
-	mov	$1,%di			/* cpu is from AMD */
-noamd:		
-	movl    $0x1,%eax
-	cpuid
-	andl	$REQUIRED_MASK1,%edx
-	xorl	$REQUIRED_MASK1,%edx
-	jnz	no_longmode
-	movl    $0x80000000,%eax
-	cpuid
-	cmpl    $0x80000001,%eax
-	jb      no_longmode             /* no extended cpuid */
-	movl    $0x80000001,%eax
-	cpuid
-	andl    $REQUIRED_MASK2,%edx
-	xorl    $REQUIRED_MASK2,%edx
-	jnz     no_longmode
-sse_test:		
-	movl	$1,%eax
-	cpuid
-	andl	$SSE_MASK,%edx
-	cmpl	$SSE_MASK,%edx
-	je	sse_ok
-	test	%di,%di
-	jz	no_longmode	/* only try to force SSE on AMD */ 
-	movl	$0xc0010015,%ecx	/* HWCR */
-	rdmsr
-	btr	$15,%eax	/* enable SSE */
-	wrmsr
-	xor	%di,%di		/* don't loop */
-	jmp	sse_test	/* try again */	
+	call verify_cpu
+	testl %eax,%eax
+	jz sse_ok
+
 no_longmode:
 	call	beep
 	lea	long_mode_panic,%si
@@ -366,7 +312,8 @@ no_longmode_loop:		
 long_mode_panic:
 	.string "Your CPU does not support long mode. Use a 32bit distribution."
 	.byte 0
-	
+
+#include "../kernel/verify_cpu.S"
 sse_ok:
 	popw	%ds
 	
Index: linux/arch/x86_64/kernel/acpi/wakeup.S
===================================================================
--- linux.orig/arch/x86_64/kernel/acpi/wakeup.S
+++ linux/arch/x86_64/kernel/acpi/wakeup.S
@@ -43,6 +43,11 @@ wakeup_code:
 	cmpl	$0x12345678, %eax
 	jne	bogus_real_magic
 
+  	call	verify_cpu			# Verify the cpu supports long
+						# mode
+	testl	%eax, %eax
+	jnz	no_longmode
+
 	testl	$1, video_flags - wakeup_code
 	jz	1f
 	lcall   $0xc000,$3
@@ -92,18 +97,6 @@ wakeup_32:
 # Running in this code, but at low address; paging is not yet turned on.
 	movb	$0xa5, %al	;  outb %al, $0x80
 
-	/* Check if extended functions are implemented */		
-	movl	$0x80000000, %eax
-	cpuid
-	cmpl	$0x80000000, %eax
-	jbe	bogus_cpu
-	wbinvd
-	mov	$0x80000001, %eax
-	cpuid
-	btl	$29, %edx
-	jnc	bogus_cpu
-	movl	%edx,%edi
-	
 	movl	$__KERNEL_DS, %eax
 	movl	%eax, %ds
 
@@ -123,6 +116,11 @@ wakeup_32:
 	leal    (wakeup_level4_pgt - wakeup_code)(%esi), %eax
 	movl	%eax, %cr3
 
+        /* Check if nx is implemented */
+        movl    $0x80000001, %eax
+        cpuid
+        movl    %edx,%edi
+
 	/* Enable Long Mode */
 	xorl    %eax, %eax
 	btsl	$_EFER_LME, %eax
@@ -244,10 +242,12 @@ bogus_64_magic:
 	movb	$0xb3,%al	;  outb %al,$0x80
 	jmp bogus_64_magic
 
-bogus_cpu:
-	movb	$0xbc,%al	;  outb %al,$0x80
-	jmp bogus_cpu
+.code16
+no_longmode:
+	movb    $0xbc,%al       ;  outb %al,$0x80
+	jmp no_longmode
 
+#include "../verify_cpu.S"
 	
 /* This code uses an extended set of video mode numbers. These include:
  * Aliases for standard modes
Index: linux/arch/x86_64/kernel/trampoline.S
===================================================================
--- linux.orig/arch/x86_64/kernel/trampoline.S
+++ linux/arch/x86_64/kernel/trampoline.S
@@ -54,6 +54,8 @@ r_base = .
 	movw	$(trampoline_stack_end - r_base), %sp
 
 	call	verify_cpu		# Verify the cpu supports long mode
+	testl   %eax, %eax		# Check for return code
+	jnz	no_longmode
 
 	mov	%cs, %ax
 	movzx	%ax, %esi		# Find the 32bit trampoline location
@@ -121,57 +123,10 @@ startup_64:
 	jmp	*%rax
 
 	.code16
-verify_cpu:
-	pushl	$0			# Kill any dangerous flags
-	popfl
-
-	/* minimum CPUID flags for x86-64 */
-	/* see http://www.x86-64.org/lists/discuss/msg02971.html */
-#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
-			   (1<<13)|(1<<15)|(1<<24)|(1<<25)|(1<<26))
-#define REQUIRED_MASK2 (1<<29)
-
-	pushfl				# check for cpuid
-	popl	%eax
-	movl	%eax, %ebx
-	xorl	$0x200000,%eax
-	pushl	%eax
-	popfl
-	pushfl
-	popl	%eax
-	pushl	%ebx
-	popfl
-	cmpl	%eax, %ebx
-	jz	no_longmode
-
-	xorl	%eax, %eax		# See if cpuid 1 is implemented
-	cpuid
-	cmpl	$0x1, %eax
-	jb	no_longmode
-
-	movl	$0x01, %eax		# Does the cpu have what it takes?
-	cpuid
-	andl	$REQUIRED_MASK1, %edx
-	xorl	$REQUIRED_MASK1, %edx
-	jnz	no_longmode
-
-	movl	$0x80000000, %eax	# See if extended cpuid is implemented
-	cpuid
-	cmpl	$0x80000001, %eax
-	jb	no_longmode
-
-	movl	$0x80000001, %eax	# Does the cpu have what it takes?
-	cpuid
-	andl	$REQUIRED_MASK2, %edx
-	xorl	$REQUIRED_MASK2, %edx
-	jnz	no_longmode
-
-	ret				# The cpu supports long mode
-
 no_longmode:
 	hlt
 	jmp no_longmode
-
+#include "verify_cpu.S"
 
 	# Careful these need to be in the same 64K segment as the above;
 tidt:
Index: linux/arch/x86_64/kernel/verify_cpu.S
===================================================================
--- /dev/null
+++ linux/arch/x86_64/kernel/verify_cpu.S
@@ -0,0 +1,110 @@
+/*
+ *
+ *	verify_cpu.S - Code for cpu long mode and SSE verification. This
+ *	code has been borrowed from boot/setup.S and was introduced by
+ * 	Andi Kleen.
+ *
+ *	Copyright (c) 2007  Andi Kleen (ak@suse.de)
+ *	Copyright (c) 2007  Eric Biederman (ebiederm@xmission.com)
+ *	Copyright (c) 2007  Vivek Goyal (vgoyal@in.ibm.com)
+ *
+ * 	This source code is licensed under the GNU General Public License,
+ * 	Version 2.  See the file COPYING for more details.
+ *
+ *	This is common code for verifying whether the CPU supports
+ *	long mode and SSE. It is not called directly; instead, this
+ *	file is included at various places and compiled in that context.
+ *	The current users are listed below.
+ *
+ *	This file is included by both 16bit and 32bit code.
+ *
+ *	arch/x86_64/boot/setup.S : Boot cpu verification (16bit)
+ *	arch/x86_64/boot/compressed/head.S: Boot cpu verification (32bit)
+ *	arch/x86_64/kernel/trampoline.S: secondary processor verification (16bit)
+ *	arch/x86_64/kernel/acpi/wakeup.S: Verification at resume (16bit)
+ *
+ *	verify_cpu returns the status of the cpu check in register %eax.
+ *		0: Success    1: Failure
+ *
+ *	The caller needs to check the error code and take the appropriate
+ *	action: either display a message or halt.
+ */
+
+verify_cpu:
+
+	pushfl				# Save caller passed flags
+	pushl	$0			# Kill any dangerous flags
+	popfl
+
+	/* minimum CPUID flags for x86-64 */
+	/* see http://www.x86-64.org/lists/discuss/msg02971.html */
+#define SSE_MASK ((1<<25)|(1<<26))
+#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
+					   (1<<13)|(1<<15)|(1<<24))
+#define REQUIRED_MASK2 (1<<29)
+	pushfl				# standard way to check for cpuid
+	popl	%eax
+	movl	%eax,%ebx
+	xorl	$0x200000,%eax
+	pushl	%eax
+	popfl
+	pushfl
+	popl	%eax
+	cmpl	%eax,%ebx
+	jz	verify_cpu_no_longmode	# cpu has no cpuid
+
+	movl	$0x0,%eax		# See if cpuid 1 is implemented
+	cpuid
+	cmpl	$0x1,%eax
+	jb	verify_cpu_no_longmode	# no cpuid 1
+
+	xor	%di,%di
+	cmpl	$0x68747541,%ebx	# AuthenticAMD
+	jnz	verify_cpu_noamd
+	cmpl	$0x69746e65,%edx
+	jnz	verify_cpu_noamd
+	cmpl	$0x444d4163,%ecx
+	jnz	verify_cpu_noamd
+	mov	$1,%di			# cpu is from AMD
+
+verify_cpu_noamd:
+	movl    $0x1,%eax		# Does the cpu have what it takes
+	cpuid
+	andl	$REQUIRED_MASK1,%edx
+	xorl	$REQUIRED_MASK1,%edx
+	jnz	verify_cpu_no_longmode
+
+	movl    $0x80000000,%eax	# See if extended cpuid is implemented
+	cpuid
+	cmpl    $0x80000001,%eax
+	jb      verify_cpu_no_longmode	# no extended cpuid
+
+	movl    $0x80000001,%eax	# Does the cpu have what it takes
+	cpuid
+	andl    $REQUIRED_MASK2,%edx
+	xorl    $REQUIRED_MASK2,%edx
+	jnz     verify_cpu_no_longmode
+
+verify_cpu_sse_test:
+	movl	$1,%eax
+	cpuid
+	andl	$SSE_MASK,%edx
+	cmpl	$SSE_MASK,%edx
+	je	verify_cpu_sse_ok
+	test	%di,%di
+	jz	verify_cpu_no_longmode	# only try to force SSE on AMD
+	movl	$0xc0010015,%ecx	# HWCR
+	rdmsr
+	btr	$15,%eax		# enable SSE
+	wrmsr
+	xor	%di,%di			# don't loop
+	jmp	verify_cpu_sse_test	# try again
+
+verify_cpu_no_longmode:
+	popfl				# Restore caller passed flags
+	movl $1,%eax
+	ret
+verify_cpu_sse_ok:
+	popfl				# Restore caller passed flags
+	xorl %eax, %eax
+	ret
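The checks in verify_cpu.S are simple bit tests once the CPUID results are in hand. The following C sketch reproduces them on plain register values: no CPUID instruction is executed, and the caller supplies EDX from leaves 1 and 0x80000001. The masks are copied from the file above; the helper names are hypothetical, and the AMD-only attempt to enable SSE through the HWCR MSR is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Masks copied from verify_cpu.S. */
#define SSE_MASK       ((1u << 25) | (1u << 26))
#define REQUIRED_MASK1 ((1u << 0) | (1u << 3) | (1u << 4) | (1u << 5) | (1u << 6) | \
                        (1u << 8) | (1u << 13) | (1u << 15) | (1u << 24))
#define REQUIRED_MASK2 (1u << 29)  /* CPUID 0x80000001 EDX: long mode */

/* (edx & mask) ^ mask is zero exactly when every required bit is set. */
static int has_all(uint32_t edx, uint32_t mask)
{
    return ((edx & mask) ^ mask) == 0;
}

/* CPUID leaf 0 returns the vendor string in EBX, EDX, ECX, low byte first. */
static int is_amd(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    char vendor[13];

    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            vendor[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    vendor[12] = '\0';
    return strcmp(vendor, "AuthenticAMD") == 0;
}

/* Same return convention as verify_cpu: 0 on success, 1 on failure. */
static int verify_cpu_sketch(uint32_t leaf1_edx, uint32_t ext1_edx)
{
    if (!has_all(leaf1_edx, REQUIRED_MASK1))
        return 1;
    if (!has_all(ext1_edx, REQUIRED_MASK2))
        return 1;
    if (!has_all(leaf1_edx, SSE_MASK))  /* real code may first retry via HWCR on AMD */
        return 1;
    return 0;
}
```

The `andl`/`xorl`/`jnz` sequence in the assembly is the same test as `has_all()`, and the three vendor constants decode to "Auth", "enti" and "cAMD".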


* Re: [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-28 17:59 ` [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage Andi Kleen
@ 2007-04-28 18:07   ` Jeff Garzik
  2007-04-28 18:24     ` Andi Kleen
  2007-04-28 20:18     ` [patches] " Eric W. Biederman
  0 siblings, 2 replies; 217+ messages in thread
From: Jeff Garzik @ 2007-04-28 18:07 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Vivek Goyal, linux-kernel, patches, H. Peter Anvin

Andi Kleen wrote:
> From: Vivek Goyal <vgoyal@in.ibm.com>
> 
> 
> o Extend the bzImage protocol (same as i386) to allow bzImage loaders to
>   load the protected mode kernel at non-1MB address. Now protected mode
>   component is relocatable and can be loaded at non-1MB addresses.
> 
> o As of today kdump uses it to run a second kernel from a reserved memory
>   area.
> 
> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> Signed-off-by: Andi Kleen <ak@suse.de>

Can you point to / link to threads where the bootloader folks looked 
over the reloc changes from their side, and commented?

	Jeff





* Re: [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-28 18:07   ` Jeff Garzik
@ 2007-04-28 18:24     ` Andi Kleen
  2007-04-28 20:18     ` [patches] " Eric W. Biederman
  1 sibling, 0 replies; 217+ messages in thread
From: Andi Kleen @ 2007-04-28 18:24 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Vivek Goyal, linux-kernel, patches, H. Peter Anvin


> Can you point to / link to threads where the bootloader folks looked 
> over the reloc changes from their side, and commented?

What boot loader folks? The result still works with grub and pxelinux at least
and I guess the others will complain as they get around testing new kernels.

-Andi



* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-28 18:07   ` Jeff Garzik
  2007-04-28 18:24     ` Andi Kleen
@ 2007-04-28 20:18     ` Eric W. Biederman
  2007-04-28 20:38       ` H. Peter Anvin
                         ` (2 more replies)
  1 sibling, 3 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-04-28 20:18 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andi Kleen, patches, Vivek Goyal, linux-kernel, H. Peter Anvin

Jeff Garzik <jeff@garzik.org> writes:

> Andi Kleen wrote:
>> From: Vivek Goyal <vgoyal@in.ibm.com>
>> 
>> 
>> o Extend the bzImage protocol (same as i386) to allow bzImage loaders to
>>   load the protected mode kernel at non-1MB address. Now protected mode
>>   component is relocatable and can be loaded at non-1MB addresses.
>> 
>> o As of today kdump uses it to run a second kernel from a reserved memory
>>   area.
>> 
>> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
>> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
>> Signed-off-by: Andi Kleen <ak@suse.de>
>
> Can you point to / link to threads where the bootloader folks looked 
> over the reloc changes from their side, and commented?

Jeff, what is your concern?

The boot protocol change is in 2.6.21 for arch/i386.

HPA looked at it a while ago.

All it does is set a flag that tells a bootloader:
"Hey. I can run when loaded at a non-default address, and this is what
 you have to align me to."

All relocation processing happens in the kernel itself.

So it is all pretty trivial.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-28 20:18     ` [patches] " Eric W. Biederman
@ 2007-04-28 20:38       ` H. Peter Anvin
  2007-04-28 20:46         ` Eric W. Biederman
  2007-04-28 20:39       ` Jeff Garzik
  2007-04-29  7:24       ` Jeremy Fitzhardinge
  2 siblings, 1 reply; 217+ messages in thread
From: H. Peter Anvin @ 2007-04-28 20:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jeff Garzik, Andi Kleen, patches, Vivek Goyal, linux-kernel

Eric W. Biederman wrote:
> 
> The boot protocol change is in 2.6.21 for arch/i386.
> 
> HPA looked at it a while ago.
> 
> All it does is set a flag that tells a bootloader:
> "Hey. I can run when loaded at a non-default address, and this is what
>  you have to align me to."
> 
> All relocation processing happens in the kernel itself.
> 
> So it is all pretty trivial.
> 

Indeed.  We *did* find some problems with Grub with the early versions;
those were addressed.

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-28 20:18     ` [patches] " Eric W. Biederman
  2007-04-28 20:38       ` H. Peter Anvin
@ 2007-04-28 20:39       ` Jeff Garzik
  2007-04-29  7:24       ` Jeremy Fitzhardinge
  2 siblings, 0 replies; 217+ messages in thread
From: Jeff Garzik @ 2007-04-28 20:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andi Kleen, patches, Vivek Goyal, linux-kernel, H. Peter Anvin

No specific concern; the patch description did not say that bootloader 
people had ACK'd the change, or describe the testing regimen.

From the patch alone, the impact and preparation were unknowns.

	Jeff




^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-28 20:38       ` H. Peter Anvin
@ 2007-04-28 20:46         ` Eric W. Biederman
  2007-04-29  4:50           ` Vivek Goyal
  0 siblings, 1 reply; 217+ messages in thread
From: Eric W. Biederman @ 2007-04-28 20:46 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeff Garzik, Andi Kleen, patches, Vivek Goyal, linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> 
>> The boot protocol change is in 2.6.21 for arch/i386.
>> 
>> HPA looked at it a while ago.
>> 
>> All it does is set a flag that tells a bootloader:
>> "Hey. I can run when loaded at a non-default address, and this is what
>>  you have to align me to."
>> 
>> All relocation processing happens in the kernel itself.
>> 
>> So it is all pretty trivial.
>> 
>
> Indeed.  We *did* find some problems with Grub with the early versions;
> those were addressed.

We found some failures that weren't root-caused, so we went to this
more conservative version.

RHEL5 is actually shipping the original version of these patches if I
recall correctly.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-28 20:46         ` Eric W. Biederman
@ 2007-04-29  4:50           ` Vivek Goyal
  0 siblings, 0 replies; 217+ messages in thread
From: Vivek Goyal @ 2007-04-29  4:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: H. Peter Anvin, Jeff Garzik, Andi Kleen, patches, linux-kernel

On Sat, Apr 28, 2007 at 02:46:18PM -0600, Eric W. Biederman wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
> > Eric W. Biederman wrote:
> >> 
> >> The boot protocol change is in 2.6.21 for arch/i386.
> >> 
> >> HPA looked at it a while ago.
> >> 
> >> All it does is set a flag that tells a bootloader:
> >> "Hey. I can run when loaded at a non-default address, and this is what
> >>  you have to align me to."
> >> 
> >> All relocation processing happens in the kernel itself.
> >> 
> >> So it is all pretty trivial.
> >> 
> >
> > Indeed.  We *did* find some problems with Grub with the early versions;
> > those were addressed.
> 
> We found some failures that weren't root-caused, so we went to this
> more conservative version.
> 
> RHEL5 is actually shipping the original version of these patches if I
> recall correctly.
> 

Yes, RHEL5 is shipping the original version of the patches, where an ELF
header has been added to describe the bzImage.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-28 20:18     ` [patches] " Eric W. Biederman
  2007-04-28 20:38       ` H. Peter Anvin
  2007-04-28 20:39       ` Jeff Garzik
@ 2007-04-29  7:24       ` Jeremy Fitzhardinge
  2007-04-29 15:11           ` Eric W. Biederman
                           ` (2 more replies)
  2 siblings, 3 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-29  7:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jeff Garzik, Andi Kleen, patches, Vivek Goyal, linux-kernel,
	H. Peter Anvin

Eric W. Biederman wrote:
> All it does is set a flag that tells a bootloader:
> "Hey. I can run when loaded at a non-default address, and this is what
>  you have to align me to."
>
> All relocation processing happens in the kernel itself.
>   

Is it possible to decompress and extract the kernel image from the
bzImage without executing it?  Ie, is there enough information to find
the compressed data part of the bzImage by inspection?

At some point we'll need to change the Xen domain builder to handle
bzImage files, and it would be best if we didn't need to run them.

    J

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-29  7:24       ` Jeremy Fitzhardinge
@ 2007-04-29 15:11           ` Eric W. Biederman
  2007-04-29 17:51         ` H. Peter Anvin
  2007-04-30  4:41         ` Rusty Russell
  2 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-04-29 15:11 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Jeff Garzik, Andi Kleen, patches, Vivek Goyal, linux-kernel,
	H. Peter Anvin, Rusty Russell, virtualization

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Eric W. Biederman wrote:
>> All it does is set a flag that tells a bootloader:
>> "Hey. I can run when loaded at a non-default address, and this is what
>>  you have to align me to."
>>
>> All relocation processing happens in the kernel itself.
>>   
>
> Is it possible to decompress and extract the kernel image from the
> bzImage without executing it?  Ie, is there enough information to find
> the compressed data part of the bzImage by inspection?
>
> At some point we'll need to change the Xen domain builder to handle
> bzImage files, and it would be best if we didn't need to run them.

/sbin/kexec bzImage ....

There is a reasonably well-specified 32-bit entry point you can
use.  I'm guessing you want to recover the 64-bit ELF,
with all of your nice ELF notes etc.  vmlinux is run through
"objcopy -O binary" as part of the bzImage build process, so all
of the ELF metadata is lost.

I have several ideas on how we can make this work, but first I have to
ask: what is it that you are trying to accomplish?

<rant>
  Right now I'm a little frustrated that the insanity below slipped into
  head.S, right after the 32-bit entry point, after the review said to
  use the normal Linux parameters for parameter passing.
  
  > #ifdef CONFIG_PARAVIRT
  >         movl %cs, %eax
  >         testl $0x3, %eax
  >         jnz startup_paravirt
  > #endif
  
  The whole thing should be based on a value in the Linux parameter block
  pointed to by %esi, instead of the insane approach of preserving all
  registers and attempting to be super-compatible with everyone.

  Yes, we do need a branch there, but no, we don't need to change
  the format in which we pass arguments to the kernel.
</rant>

Anyway, since there seems to be interest in going further in tweaking
the Linux boot process, let's open this thread up to what is needed/wanted
for future enhancements to Linux booting, and let's see if we can
design something that works and isn't brain-dead.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-29  7:24       ` Jeremy Fitzhardinge
  2007-04-29 15:11           ` Eric W. Biederman
@ 2007-04-29 17:51         ` H. Peter Anvin
  2007-04-29 18:10           ` Eric W. Biederman
  2007-04-30  4:41         ` Rusty Russell
  2 siblings, 1 reply; 217+ messages in thread
From: H. Peter Anvin @ 2007-04-29 17:51 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Eric W. Biederman, Jeff Garzik, Andi Kleen, patches, Vivek Goyal,
	linux-kernel

Jeremy Fitzhardinge wrote:
> Eric W. Biederman wrote:
>> All it does is set a flag that tells a bootloader:
>> "Hey. I can run when loaded at a non-default address, and this is what
>>  you have to align me to."
>>
>> All relocation processing happens in the kernel itself.
>>   
> 
> Is it possible to decompress and extract the kernel image from the
> bzImage without executing it?  Ie, is there enough information to find
> the compressed data part of the bzImage by inspection?
> 
> At some point we'll need to change the Xen domain builder to handle
> bzImage files, and it would be best if we didn't need to run them.
> 

Probabilistically, you might be able to (search for a gzip header), but
it is *definitely* not guaranteed by protocol.

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-29 17:51         ` H. Peter Anvin
@ 2007-04-29 18:10           ` Eric W. Biederman
  0 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-04-29 18:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen, patches,
	Vivek Goyal, linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Jeremy Fitzhardinge wrote:
>> Eric W. Biederman wrote:
>>> All it does is set a flag that tells a bootloader:
>>> "Hey. I can run when loaded at a non-default address, and this is what
>>>  you have to align me to."
>>>
>>> All relocation processing happens in the kernel itself.
>>>   
>> 
>> Is it possible to decompress and extract the kernel image from the
>> bzImage without executing it?  Ie, is there enough information to find
>> the compressed data part of the bzImage by inspection?
>> 
>> At some point we'll need to change the Xen domain builder to handle
>> bzImage files, and it would be best if we didn't need to run them.
>> 
>
> Probabilistically, you might be able to (search for a gzip header), but
> it is *definitely* not guaranteed by protocol.

I suspect the issue isn't so much skipping the decompression as
either getting at the Xen ELF notes or bypassing privileged instructions.


Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-29 15:11           ` Eric W. Biederman
  (?)
@ 2007-04-30  3:03           ` Rusty Russell
  2007-04-30  4:38               ` H. Peter Anvin
  -1 siblings, 1 reply; 217+ messages in thread
From: Rusty Russell @ 2007-04-30  3:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen, patches,
	Vivek Goyal, linux-kernel, H. Peter Anvin, virtualization

On Sun, 2007-04-29 at 09:11 -0600, Eric W. Biederman wrote:
>   Right now I'm a little frustrated that the insanity below slipped into
>   head.S, right after the 32-bit entry point, after the review said to
>   use the normal Linux parameters for parameter passing.
>   
>   > #ifdef CONFIG_PARAVIRT
>   >         movl %cs, %eax
>   >         testl $0x3, %eax
>   >         jnz startup_paravirt
>   > #endif
>   
>   The whole thing should be based on a value in the Linux parameter block
>   pointed to by %esi, instead of the insane approach of preserving all
>   registers and attempting to be super-compatible with everyone.

Dammit, Eric, you spend a lot of time using words like "insane" where
you mean we didn't do everything all at once.

It's *not* clear that using %esi is sane, but nothing in the current
code prevents that.

I was trying to get Xen to use this entry point rather than their own: I
failed, and only lguest now uses it.

Fortunately, this also means it's trivial to change.
Rusty.


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30  3:03           ` Rusty Russell
@ 2007-04-30  4:38               ` H. Peter Anvin
  0 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-04-30  4:38 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

Rusty Russell wrote:
> 
> Dammit, Eric, you spend a lot of time using words like "insane" where
> you mean we didn't do everything all at once.
> 
> It's *not* clear that using %esi is sane, but nothing in the current
> code prevents that.
> 

Why not?

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-29  7:24       ` Jeremy Fitzhardinge
  2007-04-29 15:11           ` Eric W. Biederman
  2007-04-29 17:51         ` H. Peter Anvin
@ 2007-04-30  4:41         ` Rusty Russell
  2 siblings, 0 replies; 217+ messages in thread
From: Rusty Russell @ 2007-04-30  4:41 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Eric W. Biederman, Jeff Garzik, Andi Kleen, patches, Vivek Goyal,
	linux-kernel, H. Peter Anvin

On Sun, 2007-04-29 at 00:24 -0700, Jeremy Fitzhardinge wrote:
> Is it possible to decompress and extract the kernel image from the
> bzImage without executing it?  Ie, is there enough information to find
> the compressed data part of the bzImage by inspection?
> 
> At some point we'll need to change the Xen domain builder to handle
> bzImage files, and it would be best if we didn't need to run them.

Almost.  See lguest's launcher code: load_bzimage().  You'll hate it 8)

Rusty.



^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30  4:38               ` H. Peter Anvin
  (?)
@ 2007-04-30  5:03               ` Rusty Russell
  2007-04-30  5:25                   ` Eric W. Biederman
                                   ` (2 more replies)
  -1 siblings, 3 replies; 217+ messages in thread
From: Rusty Russell @ 2007-04-30  5:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

On Sun, 2007-04-29 at 21:38 -0700, H. Peter Anvin wrote:
> Rusty Russell wrote:
> > 
> > Dammit, Eric, you spend a lot of time using words like "insane" where
> > you mean we didn't do everything all at once.
> > 
> > It's *not* clear that using %esi is sane, but nothing in the current
> > code prevents that.
> 
> Why not?

(I assume you mean why isn't it clear?)

Because VMI uses the presence of a ROM to indicate it's not native.  KVM
uses a magic MSR IIRC.

I think it makes sense for lguest to change over, tho.  Patches welcome
8)

Rusty.



^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30  5:03               ` Rusty Russell
@ 2007-04-30  5:25                   ` Eric W. Biederman
  2007-04-30 15:34                 ` Eric W. Biederman
  2007-04-30 15:34                 ` Eric W. Biederman
  2 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-04-30  5:25 UTC (permalink / raw)
  To: Rusty Russell
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

Rusty Russell <rusty@rustcorp.com.au> writes:

> On Sun, 2007-04-29 at 21:38 -0700, H. Peter Anvin wrote:
>> Rusty Russell wrote:
>> > 
>> > Dammit, Eric, you spend a lot of time using words like "insane" where
>> > you mean we didn't do everything all at once.
>> > 
>> > It's *not* clear that using %esi is sane, but nothing in the current
>> > code prevents that.
>> 
>> Why not?
>
> (I assume you mean why isn't it clear?)
>
> Because VMI uses the presence of a ROM to indicate it's not native.  KVM
> uses a magic MSR IIRC.

Reasonable, if you don't mind a little hardware emulation.

> I think it makes sense for lguest to change over, tho.  Patches welcome
> 8)

Sure.

Peter, do we want to use the bootloader byte and assign lguest its own
bootloader type, or do we want to add another field specific to
paravirtualized environments?

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30  5:03               ` Rusty Russell
  2007-04-30  5:25                   ` Eric W. Biederman
  2007-04-30 15:34                 ` Eric W. Biederman
@ 2007-04-30 15:34                 ` Eric W. Biederman
  2007-05-01  3:38                   ` Rusty Russell
  2007-05-01  3:38                   ` Rusty Russell
  2 siblings, 2 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-04-30 15:34 UTC (permalink / raw)
  To: Rusty Russell
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

Rusty Russell <rusty@rustcorp.com.au> writes:

> On Sun, 2007-04-29 at 21:38 -0700, H. Peter Anvin wrote:
>> Rusty Russell wrote:
>> > 
>> > Dammit, Eric, you spend a lot of time using words like "insane" where
>> > you mean we didn't do everything all at once.
>> > 
>> > It's *not* clear that using %esi is sane, but nothing in the current
>> > code prevents that.
>> 
>> Why not?
>
> (I assume you mean why isn't it clear?)
>
> Because VMI uses the presence of a ROM to indicate it's not native.  KVM
> uses a magic MSR IIRC.
>
> I think it makes sense for lguest to change over, tho.  Patches welcome
> 8)

Reading this, it occurs to me that what I object to wasn't clear.

I have no problem with testing %cs to see if we are not in ring 0.
That part, while a little odd, is fine, and we will certainly need a test
to skip the privileged instructions in head.S.

What I object to in particular is having a separate structure (struct lguest_info?)
instead of using the standard format for kernel parameters pointed to by %esi.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30  5:25                   ` Eric W. Biederman
  (?)
@ 2007-04-30 16:03                   ` H. Peter Anvin
  2007-04-30 16:47                       ` Eric W. Biederman
  -1 siblings, 1 reply; 217+ messages in thread
From: H. Peter Anvin @ 2007-04-30 16:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rusty Russell, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

Eric W. Biederman wrote:
> 
> Sure.
> 
> Peter, do we want to use the bootloader byte and assign lguest its own
> bootloader type, or do we want to add another field specific to 
> paravirtualized environments?
> 

The bootloader byte is already a bit too overused; I'm a little scared
that we're going to run out of boot loader IDs as it is.

We probably should add another field, and while we're at it maybe we
should add a boot loader extension field.

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30 16:03                   ` H. Peter Anvin
@ 2007-04-30 16:47                       ` Eric W. Biederman
  0 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-04-30 16:47 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Rusty Russell, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization,
	James Bottomley

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> 
>> Sure.
>> 
>> Peter, do we want to use the bootloader byte and assign lguest its own
>> bootloader type, or do we want to add another field specific to 
>> paravirtualized environments?
>> 
>
> The bootloader byte is already a bit too overused; I'm a little scared
> that we're going to run out of boot loader IDs as it is.
>
> We probably should add another field, and while we're at it maybe we
> should add a boot loader extension field.

A dedicated subarchitecture field would make sense.  It would also be
nice if we could detect other, non-paravirt subarchitectures.

James, is there a reasonable way to detect Voyager at boot time,
so we could potentially have a generic kernel that can also boot on
Voyager?

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-29 15:11           ` Eric W. Biederman
@ 2007-04-30 18:50             ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-30 18:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jeff Garzik, Andi Kleen, patches, Vivek Goyal, linux-kernel,
	H. Peter Anvin, Rusty Russell, virtualization

Eric W. Biederman wrote:
> I have several ideas on how we can make this work but first I have to
> ask what is it that you are trying to accomplish?
>   

The requirements are:

   1. the domain builder needs to get various information about the
      guest kernel by inspecting its ELF notes
   2. we start the kernel in 32-bit mode with paging enabled, in ring 1
   3. the guest kernel needs various pieces of runtime information from
      the hypervisor about its runtime environment

At the moment we just load a bare vmlinux kernel with a xen-specific
entrypoint, so 1 and 2 are easy.  3 is achieved by having the domain
builder start the kernel with %esi pointing to a Xen info structure
which tells the kernel what it needs to know.

That works OK for a kernel which is compiled to run under Xen and can't
run in any other environment, but now that we can generate a single
kernel which can run in any number of different environments, it's
unfortunate that we still need multiple variants of the kernel image.

Clearly the Xen domain builder needs to be extended to deal with a
bzImage format kernel directly, so we can use the same actual kernel
image for native and Xen booting.  Since we're changing the domain
builder anyway, I can change the details of how it works so long as the
3 requirements can still be met.

So, I have no problem in also building a boot protocol info structure,
and passing that in %esi, so long as I can store a pointer to the
Xen-specific info as well.  Some info will be duplicated, like the
initrd location, but that's OK.

I'd already reserved a Xen bootloader ID specifically with this in mind;
all I really need is a place where I can stash the pointer.

I think I'd prefer to have the domain builder decompress/relocate the
kernel from the bzImage and start it directly, rather than have it
decompress/relocate itself, but I'm not really set on that.  It depends
on how well it can deal with having paging enabled and being in ring 1. 
Looks like it might just be a matter of starting up with "enough" memory
mapped.

So the biggest unknown is where to put the Xen ELF notes.

    J

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30 18:50             ` Jeremy Fitzhardinge
  (?)
  (?)
@ 2007-04-30 22:10             ` Eric W. Biederman
  2007-04-30 22:42               ` Jeremy Fitzhardinge
  2007-04-30 22:42               ` Jeremy Fitzhardinge
  -1 siblings, 2 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-04-30 22:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Jeff Garzik, Andi Kleen, patches, Vivek Goyal, linux-kernel,
	H. Peter Anvin, Rusty Russell, virtualization

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Eric W. Biederman wrote:
>> I have several ideas on how we can make this work but first I have to
>> ask what is it that you are trying to accomplish?
>>   
>
> The requirements are:
>
>    1. the domain builder needs to get various information about the
>       guest kernel by inspecting its ELF notes
>    2. we start the kernel in 32-bit mode with paging enabled, in ring 1
>    3. the guest kernel needs various pieces of runtime information from
>       the hypervisor about its runtime environment
>
> At the moment we just load a bare vmlinux kernel with a xen-specific
> entrypoint, so 1 and 2 are easy.  3 is achieved by having the domain
> builder start the kernel with %esi pointing to a Xen info structure
> which tells the kernel what it needs to know.
>
> That works OK for a kernel which is compiled to run under Xen and can't
> run in any other environment, but now that we can generate a single
> kernel which can run in any number of different environments, it's
> unfortunate that we still need multiple variants of the kernel image.

Yes.

> Clearly the Xen domain builder needs to be extended to deal with a
> bzImage format kernel directly, so we can use the same actual kernel
> image for native and Xen booting.  Since we're changing the domain
> builder anyway, I can change the details of how it works so long as the
> 3 requirements can still be met.

Ok.

> So, I have no problem in also building a boot protocol info structure,
> and passing that in %esi, so long as I can store a pointer to the
> Xen-specific info as well.  Some info will be duplicated, like the
> initrd location, but that's OK.

Reasonable.

> I'd already reserved a Xen bootloader ID specifically with this in mind;
> all I really need is a place where I can stash the pointer.

Right.

So if you are using the standard Linux kernel calling convention and
placing the standard arguments in %esi, supporting bzImage gets easier.

> I think I'd prefer to have the domain builder decompress/relocate the
> kernel from the bzImage and start it directly, rather than have it
> decompress/relocate itself, but I'm not really set on that.

We can change a lot more implementation details arbitrarily if you don't
know what needs to happen for decompression and relocation.  That said,
I think there is a reasonable argument for supporting a bImage format:
a bzImage without compression.  Current bootloaders would not care, but
in the embedded space you can save space by not including the
decompressor in the kernel.

We have to avoid the writes in the decompressor print routines and
possibly the reload of the segment registers.  But otherwise
we should be fine.  I don't see any other privileged instructions
in arch/i386/boot/compressed/{head.S, misc.c}

> It depends
> on how well it can deal with having paging enabled and being in ring 1. 
> Looks like it might just be a matter of starting up with "enough" memory
> mapped.

Yes, I think so.  There is the additional issue of exactly how we get
the fixmap region allocated so we can use it, but that is minor.

> So the biggest unknown is where to put the Xen ELF notes.

What I really want to do is go back to sticking an ELF header on the
bzImage.  We still can't support multiple entry points that way but we
can include ELF notes fairly easily.
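To make the ELF-notes point concrete: a note section is just a sequence of records, each an Elf32_Nhdr (n_namesz, n_descsz, n_type) followed by the owner name and the descriptor, both padded to a 4-byte boundary, so a domain builder can find its notes with a short linear scan. The following is a minimal sketch under that layout only; find_elf_note() is a hypothetical helper, and the owner name and type value used to exercise it are made up for illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Note names and descriptors are padded to 4-byte boundaries. */
static size_t note_align(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: scan a buffer of ELF notes for one whose owner
 * is `name` and whose type is `type`.  Returns a pointer to its
 * descriptor and stores the descriptor size in *descsz, or returns
 * NULL if there is no match or a note runs past the end of the buffer.
 */
static const void *find_elf_note(const void *buf, size_t len,
				 const char *name, uint32_t type,
				 uint32_t *descsz)
{
	assert(name != NULL && descsz != NULL);

	const unsigned char *p = buf;
	const unsigned char *end = p + len;
	size_t namesz = strlen(name) + 1;	/* n_namesz counts the NUL */

	while ((size_t)(end - p) >= sizeof(Elf32_Nhdr)) {
		Elf32_Nhdr nh;
		size_t desc_off, next_off;

		memcpy(&nh, p, sizeof(nh));	/* avoid unaligned access */
		desc_off = sizeof(nh) + note_align(nh.n_namesz);
		next_off = desc_off + note_align(nh.n_descsz);
		if (next_off > (size_t)(end - p))
			return NULL;		/* truncated note */

		if (nh.n_type == type && nh.n_namesz == namesz &&
		    memcmp(p + sizeof(nh), name, namesz) == 0) {
			*descsz = nh.n_descsz;
			return p + desc_off;
		}
		p += next_off;
	}
	return NULL;
}
```

Because notes are self-describing, a loader that does not recognise a given owner simply skips over it, which is what makes it cheap to add new notes later.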

It looks like, for the next version of booting, lguest and Xen are
actually coming closer together again.  Yay.

For boot protocol 2.07 we currently need a subarchitecture field (16bits?):
default == 0, Xen, lguest, voyager?, visws?, numaq?, efi?

We need a subarchitecture data pointer field (32bits).

We need some subarchitecture kernel information for the different
bootloaders.
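The two proposed fields could be written down roughly as below: a 16-bit subarchitecture ID plus a 32-bit pointer to subarchitecture-specific data. This is only a sketch; the enum names, the struct name and the packed layout are assumptions for illustration, not an agreed boot protocol definition. The one property it tries to capture is that 0 stays the default, so existing bootloaders that leave the field zeroed keep getting the normal path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subarchitecture IDs, following the list proposed above. */
enum boot_subarch {
	SUBARCH_DEFAULT = 0,	/* zeroed field: normal PC boot path */
	SUBARCH_XEN,
	SUBARCH_LGUEST,
	SUBARCH_VOYAGER,
	SUBARCH_VISWS,
	SUBARCH_NUMAQ,
	SUBARCH_EFI,
};

/* Hypothetical layout of the two proposed setup-header additions. */
struct subarch_fields {
	uint16_t subarch;	/* one of enum boot_subarch */
	uint32_t subarch_data;	/* 32-bit physical pointer to subarch info */
} __attribute__((packed));

/* The setup header has a fixed byte layout, so no padding is allowed. */
static_assert(sizeof(struct subarch_fields) == 6, "fields must be packed");
```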

We need to target .23 because it is too late for .22.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30 22:10             ` Eric W. Biederman
  2007-04-30 22:42               ` Jeremy Fitzhardinge
@ 2007-04-30 22:42               ` Jeremy Fitzhardinge
  2007-04-30 22:51                 ` Jeremy Fitzhardinge
                                   ` (3 more replies)
  1 sibling, 4 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-30 22:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jeff Garzik, Andi Kleen, patches, Vivek Goyal, linux-kernel,
	H. Peter Anvin, Rusty Russell, virtualization

Eric W. Biederman wrote:
>> I think I'd prefer to have the domain builder decompress/relocate the
>> kernel from the bzImage and start it directly, rather than have it
>> decompress/relocate itself, but I'm not really set on that.
>>     
>
> We can change a lot more implementation details arbitrarily if you don't
> know what needs to happen for decompression and relocation.

Yes, and if it can be made to work, it ultimately means less work for me ;)

> We have to avoid the writes in the decompressor print routines

At worst, we could set up a chunk of memory as a dummy framebuffer.  That
might be useful for debugging anyway.
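A dummy framebuffer of the kind suggested here could be as simple as the sketch below: the print routine writes character/attribute pairs into an ordinary memory buffer laid out like an 80x25 text screen, so it needs no port I/O and no access to real video memory. The struct and function names, the dimensions and the 0x07 attribute are assumptions for illustration; this is not the decompressor's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COLS 80
#define ROWS 25

/* Hypothetical in-memory "screen": a character byte followed by an
 * attribute byte per cell, as in VGA text mode. */
struct dummy_fb {
	uint8_t cells[ROWS * COLS * 2];
	int x, y;		/* cursor position */
};

static void fb_scroll(struct dummy_fb *fb)
{
	/* Move rows 1..ROWS-1 up one row, then blank the last row. */
	memmove(fb->cells, fb->cells + COLS * 2, (ROWS - 1) * COLS * 2);
	for (int i = (ROWS - 1) * COLS; i < ROWS * COLS; i++) {
		fb->cells[i * 2] = ' ';
		fb->cells[i * 2 + 1] = 0x07;
	}
}

/* Print a string into the buffer; only plain memory writes are used. */
static void fb_putstr(struct dummy_fb *fb, const char *s)
{
	assert(fb != NULL && s != NULL);
	for (; *s; s++) {
		if (*s == '\n' || fb->x >= COLS) {
			fb->x = 0;
			if (++fb->y >= ROWS) {
				fb_scroll(fb);
				fb->y = ROWS - 1;
			}
			if (*s == '\n')
				continue;
		}
		fb->cells[(fb->y * COLS + fb->x) * 2] = (uint8_t)*s;
		fb->cells[(fb->y * COLS + fb->x) * 2 + 1] = 0x07;
		fb->x++;
	}
}
```

A debugger or the domain builder could later read the buffer back to see what the decompressor printed.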

> and 
> possibly the reload of the segment registers.  But otherwise
> we should be fine.  I don't see any other privileged instructions
> in arch/i386/boot/compressed/{head.S, misc.c}
>   

Xen will start the domain with a GDT loaded, and all the segment
registers loaded with flat segments.  I guess boot/compressed/head.S
could do the %cs ring check before deciding to do privileged operations.
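The ring check amounts to looking at the low two bits of the selector in %cs, which hold the current privilege level: 0 when booted natively, 1 when a hypervisor runs the guest kernel in ring 1. A minimal sketch follows; the helper names are hypothetical, and it is only a heuristic, since a paravirtualized kernel running inside an SVM/VT container could still see CPL 0.

```c
#include <assert.h>
#include <stdint.h>

/* The low two bits of the %cs selector are the current privilege level. */
static unsigned int cpl_from_cs(uint16_t cs)
{
	return cs & 3;
}

#if defined(__i386__) || defined(__x86_64__)
/* Read the live %cs selector (x86 only). */
static __attribute__((unused)) uint16_t read_cs(void)
{
	uint16_t cs;

	__asm__ volatile("mov %%cs, %0" : "=r"(cs));
	return cs;
}
#endif

/* Only attempt segment reloads or port I/O when running in ring 0. */
static int may_do_privileged_ops(uint16_t cs)
{
	return cpl_from_cs(cs) == 0;
}
```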

I presume bzImage jumps straight to startup_32 on the newly decompressed
kernel?

>> It depends
>> on how well it can deal with having paging enabled and being in ring 1. 
>> Looks like it might just be a matter of starting up with "enough" memory
>> mapped.
>>     
>
> Yes.  I think so.  There is an additional issue of exactly how do we
> get the fixmap region allocated so we can use it but that is minor.
>   

I haven't checked if it already has this, but it would be nice if the
bzImage had a memory range/list of memory ranges it needs mapped to get
the kernel on its feet, so that the domain builder can just go and map
those areas for it (either P==V mappings, or with a constant offset;
whichever is more useful).
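Such a list could be little more than (start, size) pairs that the domain builder rounds out to page boundaries and maps, either P==V (offset 0) or at a constant offset. The sketch below assumes 4 KiB pages and ranges that neither overlap nor share pages; the struct and function names are hypothetical, not an existing bzImage field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical entry in a list of physical ranges the kernel needs mapped. */
struct boot_map_range {
	uint32_t start;		/* physical start address */
	uint32_t size;		/* length in bytes */
};

/* 4 KiB pages needed for one range, with its start rounded down and its
 * end rounded up to page boundaries. */
static uint32_t range_pages(const struct boot_map_range *r)
{
	uint64_t first = r->start / PAGE_SIZE;
	uint64_t last = ((uint64_t)r->start + r->size + PAGE_SIZE - 1) / PAGE_SIZE;

	return (uint32_t)(last - first);
}

/* Offset 0 gives a P==V mapping; non-zero gives a constant-offset one. */
static uint64_t phys_to_virt(uint32_t phys, uint64_t offset)
{
	return (uint64_t)phys + offset;
}

/* Total pages to map, assuming the ranges do not overlap or share pages. */
static uint32_t total_pages(const struct boot_map_range *r, size_t n)
{
	uint32_t total = 0;

	for (size_t i = 0; i < n; i++) {
		assert(r[i].size > 0);
		total += range_pages(&r[i]);
	}
	return total;
}
```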

Also, if it's a PAE kernel, Xen will start with PAE mode enabled, so
bzImage will have to deal with that.  But if it's not touching
pagetables, it won't matter.

> What I really want to do is go back to sticking an ELF header on the
> bzImage.  We still can't support multiple entry points that way but we
> can include ELF notes fairly easily.
>   

That's OK.  We'll be able to use the boot info to go into the
Xen-specific path shortly after startup_32 anyway.

BTW, the test for a non-ring 0 %cs won't always be a good test for
paravirtualization; we're likely to start seeing hybrid execution models
where we run a largely paravirtualized kernel in a SVM/VT container.  If
we can just unconditionally use the bootloader arch definition to
determine the entry path into the kernel, it will clean things up nicely.

> It looks like, for the next version of booting, lguest and Xen are
> actually coming closer together again.  Yay.
>
> For boot protocol 2.07 we currently need a subarchitecture field (16bits?):
> default == 0, Xen, lguest, voyager?, visws?, numaq?, efi?
>
> We need a subarchitecture data pointer field (32bits).
>   

Do we want to support starting a 64-bit guest in 64-bit mode?

> We need to target .23 because it is too late for .22.

Yes.  I'll need to do a moderate amount of work on the Xen side to make
this work, I think.

    J

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30 22:42               ` Jeremy Fitzhardinge
  2007-04-30 22:51                 ` Jeremy Fitzhardinge
@ 2007-04-30 22:51                 ` Jeremy Fitzhardinge
  2007-04-30 23:10                 ` Eric W. Biederman
  2007-04-30 23:10                 ` Eric W. Biederman
  3 siblings, 0 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-30 22:51 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Eric W. Biederman, Jeff Garzik, Andi Kleen, patches, Vivek Goyal,
	linux-kernel, H. Peter Anvin, Rusty Russell, virtualization

Jeremy Fitzhardinge wrote:
> I haven't checked if it already has this, but it would be nice if the
> bzImage had a memory range/list of memory ranges it needs mapped to get
> the kernel on its feet, so that the domain builder can just go and map
> those areas for it (either P==V mappings, or with a constant offset;
> whichever is more useful).
>   

Of course if it were a properly formed ELF file, you could encode this
in the PHDRs.
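With program headers, "what must be mapped" falls out of the PT_LOAD entries directly. A minimal sketch of a loader computing the physical span to map; using p_paddr (rather than p_vaddr) is an assumption that suits P==V mappings, and load_span() is a hypothetical helper.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the lowest and highest physical addresses covered by PT_LOAD
 * segments, so a loader could map [*lo, *hi) before jumping in.
 * Returns 0 if there are no loadable segments. */
static int load_span(const Elf32_Phdr *ph, size_t n, uint64_t *lo, uint64_t *hi)
{
	int found = 0;

	assert(lo != NULL && hi != NULL);
	for (size_t i = 0; i < n; i++) {
		if (ph[i].p_type != PT_LOAD)
			continue;
		uint64_t start = ph[i].p_paddr;
		uint64_t end = start + ph[i].p_memsz;	/* memsz includes .bss */

		if (!found || start < *lo)
			*lo = start;
		if (!found || end > *hi)
			*hi = end;
		found = 1;
	}
	return found;
}
```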

    J

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30 22:42               ` Jeremy Fitzhardinge
  2007-04-30 22:51                 ` Jeremy Fitzhardinge
  2007-04-30 22:51                 ` Jeremy Fitzhardinge
@ 2007-04-30 23:10                 ` Eric W. Biederman
  2007-04-30 23:16                     ` H. Peter Anvin
  2007-04-30 23:10                 ` Eric W. Biederman
  3 siblings, 1 reply; 217+ messages in thread
From: Eric W. Biederman @ 2007-04-30 23:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Jeff Garzik, Andi Kleen, patches, Vivek Goyal, linux-kernel,
	H. Peter Anvin, Rusty Russell, virtualization

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Eric W. Biederman wrote:
>>> I think I'd prefer to have the domain builder decompress/relocate the
>>> kernel from the bzImage and start it directly, rather than have it
>>> decompress/relocate itself, but I'm not really set on that.
>>>     
>>
>> We can change a lot more implementation details arbitrarily if you don't
>> know what needs to happen for decompression and relocation.
>
> Yes, and if it can be made to work, it ultimately means less work
> for me ;)

Now you are beginning to sound like a bootloader author.
Make it work and forget about it :)

>> We have to avoid the writes in the decompressor print routines
>
> At worst, we could set up a chunk of memory as a dummy framebuffer.  That
> might be useful for debugging anyway.

I'm trying to recall how we handle this on the LinuxBIOS side, because
we have machines without a framebuffer set up.  Oh yeah, we started in
32-bit mode...

I do have some command-line parameters, parsed in misc.c, that would
accomplish this goal.

>> and 
>> possibly the reload of the segment registers.  But otherwise
>> we should be fine.  I don't see any other privileged instructions
>> in arch/i386/boot/compressed/{head.S, misc.c}
>>   
>
> Xen will start the domain with a GDT loaded, and all the segment
> registers loaded with flat segments.  I guess boot/compressed/head.S
> could do the %cs ring check before deciding to do privileged
> operations.

I'm tempted to just reload the segments in setup.S, but that might
break loadlin support, or one of the other bootloaders that start the
kernel in 32-bit mode, so we need to be careful.


> I presume bzImage jumps straight to startup_32 on the newly decompressed
> kernel?

Straight isn't the way I would put it, but yes.  startup_32 in
arch/i386/head.S is the first code outside of the decompressor that it runs.

> I haven't checked if it already has this, but it would be nice if the
> bzImage had a memory range/list of memory ranges it needs mapped to get
> the kernel on its feet, so that the domain builder can just go and map
> those areas for it (either P==V mappings, or with a constant offset;
> whichever is more useful).

P==V mappings I suspect.

> Also, if it's a PAE kernel, Xen will start with PAE mode enabled, so
> bzImage will have to deal with that.  But if it's not touching
> pagetables, it won't matter.

Exactly.

>> What I really want to do is go back to sticking an ELF header on the
>> bzImage.  We still can't support multiple entry points that way but we
>> can include ELF notes fairly easily.
>>   
>
> That's OK.  We'll be able to use the boot info to go into the
> Xen-specific path shortly after startup_32 anyway.

Yes.

> BTW, the test for a non-ring 0 %cs won't always be a good test for
> paravirtualization; we're likely to start seeing hybrid execution models
> where we run a largely paravirtualized kernel in a SVM/VT container.  If
> we can just unconditionally use the bootloader arch definition to
> determine the entry path into the kernel, it will clean things up nicely.

Yes.  That is why we need a distinct field for this, rather than
overloading the bootloader ID.  That way, if the field is non-zero, we
know we need to do something special.
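Dispatching on a dedicated boot-protocol field rather than on %cs could look roughly like the following table-driven selection. The subarch IDs and the stand-in entry functions are hypothetical; real entry paths would set up the environment instead of returning a name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subarchitecture IDs; 0 is the default PC path. */
enum { SUBARCH_DEFAULT = 0, SUBARCH_XEN = 1, SUBARCH_LGUEST = 2 };

typedef const char *(*entry_fn)(uint32_t data);

/* Stand-ins for real entry paths; they only name the path taken. */
static const char *native_entry(uint32_t data) { (void)data; return "native"; }
static const char *xen_entry(uint32_t data)    { (void)data; return "xen"; }
static const char *lguest_entry(uint32_t data) { (void)data; return "lguest"; }

/* Choose the entry path from the boot-protocol field, not from %cs:
 * zero keeps the normal path, anything known gets its own path. */
static entry_fn pick_entry(uint16_t subarch)
{
	static const entry_fn table[] = {
		[SUBARCH_DEFAULT] = native_entry,
		[SUBARCH_XEN]     = xen_entry,
		[SUBARCH_LGUEST]  = lguest_entry,
	};

	if (subarch >= sizeof(table) / sizeof(table[0]))
		return NULL;	/* unknown subarch: caller should refuse to boot */
	return table[subarch];
}
```

Unknown values return NULL rather than falling back to the native path, on the assumption that booting natively inside an unrecognised environment is worse than stopping.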

>> It looks like, for the next version of booting, lguest and Xen are
>> actually coming closer together again.  Yay.
>>
>> For boot protocol 2.07 we currently need a subarchitecture field (16bits?):
>> default == 0, Xen, lguest, voyager?, visws?, numaq?, efi?
>>
>> We need a subarchitecture data pointer field (32bits).
>>   
>
> Do we want to support starting a 64-bit guest in 64-bit mode?

It's the only sane way.  When I was doing my ELF-header-on-bzImage work
I had that working, but it got dropped due to some unexplained,
unreproducible test failures and because it wasn't really necessary at
the time.  We also need it if we want to be certain we don't have to
play with page tables.

The other thing the ELF headers gave was a precise accounting of where
the kernel was, which is essentially what needs to be mapped at boot
time.

The hard part, I suspect, is going to be handling Xen when we set up the
physical == virtual page tables.

>> We need to target .23 because it is too late for .22.
>
> Yes.  I'll need to do a moderate amount of work on the Xen side to make
> this work, I think.

I think we all will, but the upside, if we design this carefully, is
something that we won't have to change.

Eric


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30 22:42               ` Jeremy Fitzhardinge
                                   ` (2 preceding siblings ...)
  2007-04-30 23:10                 ` Eric W. Biederman
@ 2007-04-30 23:10                 ` Eric W. Biederman
  3 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-04-30 23:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Jeff Garzik, patches, linux-kernel, Vivek Goyal, H. Peter Anvin,
	virtualization

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Eric W. Biederman wrote:
>>> I think I'd prefer to have the domain builder decompress/relocate the
>>> kernel from the bzImage and start it directly, rather than have it
>>> decompress/relocate itself, but I'm not really set on that.
>>>     
>>
>> We can change a lot more implementation details arbitrarily if you don't
>> know what needs to happen for decompression and relocation.
>
> Yes, and if it can be made to work, it ultimately means less work
> for me ;)

Now you are beginning to sound like a bootloader author.
Make it work and forget about it :)

>> We have to avoid the writes decompressor-prinnt routines 
>
> At worst, we could set up chunk of memory as a dummy framebuffer.  That
> might be useful for debugging anyway.

I'm trying to recall how we handle this on the LinuxBIOS side.
Because we have machines without a framebuffer setup.  Oh yeah,
we started in 32bit mode...

I do have some parameters to parse the command line in misc.c that
would accomplish this goal.

>> and 
>> possibly the reload of the segment registers.  But otherwise
>> we should be fine.  I don't see any other privileged instructions
>> in arch/i386/boot/compressed/{head.S, misc.c}
>>   
>
> Xen will start the domain with a GDT loaded, and all the segment
> registers loaded with flat segments.  I guess boot/compressed/head.S
> could do the %cs ring check before deciding to do privileged
> operations.

I'm tempted to just reload the segments in setup.S, but that might
break loadlin support or one of the other bootloaders that starts the
kernel in 32bit mode so we need to be careful.
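Jeremy's %cs ring check depends on the requested privilege level (RPL), which x86 keeps in the low two bits of a segment selector. The sketch below shows that test together with a helper for building selectors. All helper names are hypothetical. The check only answers the question "may I execute privileged instructions here?". As the thread notes further down, it is not a reliable way to detect paravirtualization.

```c
#include <assert.h>
#include <stdint.h>

/* The requested privilege level lives in bits 0-1 of a selector. */
static unsigned int selector_rpl(uint16_t sel)
{
    return sel & 3u;
}

/* Build a selector from a descriptor-table index, table indicator and RPL. */
static uint16_t make_selector(unsigned int index, unsigned int ti, unsigned int rpl)
{
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

/*
 * Gate for privileged setup (lgdt, segment reloads, control-register
 * writes): only attempt it when running in ring 0.  This must not pick
 * the entry path, since a paravirtualized kernel may also run in ring 0.
 */
static int may_run_privileged_insns(uint16_t cs)
{
    return selector_rpl(cs) == 0;
}
```

For example, lguest's `setup_regs()` in Rusty's patch below loads `__KERNEL_CS|GUEST_PL` into %cs. The guest therefore sees a non-zero RPL and would skip the privileged setup.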


> I presume bzImage jumps straight to startup_32 on the newly decompressed
> kernel?

Straight isn't the way I would put it, but yes.  startup_32 in arch/i386/head.S
is the first piece of code outside of the decompressor that it runs.

> I haven't checked if it already has this, but it would be nice if the
> bzImage had a memory range/list of memory ranges it needs mapped to get
> the kernel on its feet, so that the domain builder can just go and map
> those areas for it (either P==V mappings, or with a constant offset;
> whichever is more useful).

P==V mappings I suspect.

> Also, if it's a PAE kernel, Xen will start with PAE mode enabled, so
> bzImage will have to deal with that.  But if it's not touching
> pagetables, it won't matter.

Exactly.

>> What I really want to do is go back to sticking an ELF header on the
>> bzImage.  We still can't support multiple entry points that way but we
>> can include ELF notes fairly easily.
>>   
>
> That's OK.  We'll be able to use the boot info to go into the
> Xen-specific path shortly after startup_32 anyway.

Yes.

> BTW, the test for a non-ring 0 %cs won't always be a good test for
> paravirtualization; we're likely to start seeing hybrid execution models
> where we run a largely paravirtualized kernel in an SVM/VT container.  If
> we can just unconditionally use the bootloader arch definition to
> determine the entry path into the kernel, it will clean things up nicely.

Yes.  That is why we need a distinct field for this rather than overloading
the bootloader id.  That way, if the field is non-zero, we know we need to do
something special.

>> It looks like for the next version of booting lguest and Xen are
>> actually coming closer together again.  Yay.
>>
>> For boot protocol 2.07 we currently need a subarchitecture field (16 bits?).
>> default == 0, Xen, lguest, voyager?, visws?, numaq?, efi?
>>
>> We need a subarchitecture data pointer field (32bits).
>>   
>
> Do we want to support starting a 64-bit guest in 64-bit mode?

It's the only way that will be sane.  When I was doing my ELF header on
bzImage work I had that working.  But it got dropped due to some unexplained,
unreproducible testing failures and not really being necessary at the
time.  We also need that if we want to be certain we don't play with
page tables.

The other thing the ELF headers gave was a precise accounting of where
the kernel was, which is essentially what needs to be mapped at boot
time.
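The "precise accounting" that ELF headers provide can be read directly from the PT_LOAD program headers. The sketch below uses the standard `Elf32_Phdr` type from `<elf.h>` to compute the page-aligned physical span that a domain builder would need to map P==V. It assumes 4 KiB pages and that `p_paddr` holds the physical load address. `struct phys_range` and `load_span()` are hypothetical names.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_PAGE_SIZE 4096u  /* assumed 4 KiB pages */

struct phys_range {
    uint64_t start;  /* page-aligned, inclusive */
    uint64_t end;    /* page-aligned, exclusive */
};

/* Page-aligned physical span of all PT_LOAD segments; 0 if there are none. */
static int load_span(const Elf32_Phdr *ph, size_t n, struct phys_range *out)
{
    uint64_t lo = UINT64_MAX, hi = 0;
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD || ph[i].p_memsz == 0)
            continue;
        /* p_memsz covers .bss as well as the bytes present in the file. */
        assert(ph[i].p_filesz <= ph[i].p_memsz);
        uint64_t start = ph[i].p_paddr;
        uint64_t end = start + ph[i].p_memsz;
        if (start < lo)
            lo = start;
        if (end > hi)
            hi = end;
        found = 1;
    }
    if (!found)
        return 0;
    out->start = lo & ~(uint64_t)(MAP_PAGE_SIZE - 1);                      /* round down */
    out->end = (hi + MAP_PAGE_SIZE - 1) & ~(uint64_t)(MAP_PAGE_SIZE - 1);  /* round up */
    return 1;
}
```

Mapping one contiguous span is the simplest policy. A builder that wants tighter mappings could map each PT_LOAD segment separately instead.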

The hard part, I suspect, is going to be handling Xen when we set up the
physical == virtual page tables.

>> We need to target .23 because it is too late for .22.
>
> Yes.  I'll need to do a moderate amount of work on the Xen side to make
> this work, I think.

I think we all will, but the upside, if we design this carefully, is
something that we won't have to change.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30 23:10                 ` Eric W. Biederman
@ 2007-04-30 23:16                     ` H. Peter Anvin
  0 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-04-30 23:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen, patches,
	Vivek Goyal, linux-kernel, Rusty Russell, virtualization

Eric W. Biederman wrote:
> 
> I'm tempted to just reload the segments in setup.S, but that might
> break loadlin support or one of the other bootloaders that starts the
> kernel in 32bit mode so we need to be careful.
> 

We already load all the segments in setup.S.  I'm retaining this in my
rewrite.

Given that I'm rewriting the whole thing, if there are things you want
setup.S to do, now is the time to ask.

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30 23:16                     ` H. Peter Anvin
@ 2007-04-30 23:35                       ` Eric W. Biederman
  -1 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-04-30 23:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen, patches,
	Vivek Goyal, linux-kernel, Rusty Russell, virtualization

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> 
>> I'm tempted to just reload the segments in setup.S, but that might
>> break loadlin support or one of the other bootloaders that starts the
>> kernel in 32bit mode so we need to be careful.
>> 
>
> We already load all the segments in setup.S.  I'm retaining this in my
> rewrite.

Good.  I guess I had moved it there in my last cleanup, but I wasn't
brave enough yet to remove the reloads from both versions of head.S.

We should probably load the segment registers in trampoline.S as
well; that would simplify things a little bit.

> Given that I'm rewriting the whole thing, if there are things you want
> setup.S to do, now is the time to ask.

The big wish list item would be subarch detection (at least if we need
it early), as we are quickly moving to infrastructure that can switch
between subarches at runtime, provided we can detect the difference.

For the paravirt subarchitectures we are actually going to skip past
setup.S.

So the big thing we need to start doing is to document the 32bit entry
point and what can be expected of it etc, especially in the paravirt
context.  But that isn't a setup.S problem.

Oh.  Yes.  We need a parameter structure in the kernel that documents
what %esi points to later, i.e. what arguments can be found in the
bootloader data.  If you are playing with setup.S in C, that tends to
be a setup.S function.

Andi already has a call to verify_cpu so we abort if the cpu can't
handle our current kernel.  While that is not perfect, aborting
gracefully in setup.S is a lot better than a lot of the alternatives.

Mostly what we need is to sort through the requirements of this
next boot protocol revision for paravirt loaders so we can do that
cleanly.  It will probably need a subarch type field and a subarch
data pointer field, but apart from being stored in setup.S, that
isn't much of a setup.S problem either.
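The sketch below shows the two fields Eric expects the next boot protocol revision to need, followed by an explicit dispatch on them. The field names, the enum values other than `default == 0`, and the stub entry points are hypothetical. The value lguest == 1 mirrors what Rusty's patch stores at offset 0x23c. The point of the sketch is that the entry path is read from a documented field, not inferred from magic register values or from the %cs ring.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical IDs: only "default == 0" comes from the thread. */
enum subarch_id {
    SUBARCH_DEFAULT = 0,
    SUBARCH_LGUEST  = 1,
    SUBARCH_XEN     = 2,
    SUBARCH_NR
};

/* The two proposed header fields, with hypothetical names. */
struct subarch_fields {
    uint16_t type;  /* 16 bits, as suggested for the type field */
    uint32_t data;  /* physical address of subarch-specific data, 0 if none */
};

typedef int (*entry_fn)(uint32_t data);

/* Stub entry points for illustration; each just reports which one ran. */
static int native_entry(uint32_t data) { (void)data; return SUBARCH_DEFAULT; }
static int lguest_entry(uint32_t data) { (void)data; return SUBARCH_LGUEST; }
static int xen_entry(uint32_t data)    { (void)data; return SUBARCH_XEN; }

/* Pick the entry path from the explicit type field; NULL if unknown. */
static entry_fn pick_entry(const struct subarch_fields *f)
{
    static const entry_fn table[SUBARCH_NR] = {
        [SUBARCH_DEFAULT] = native_entry,
        [SUBARCH_LGUEST]  = lguest_entry,
        [SUBARCH_XEN]     = xen_entry,
    };

    if (f->type >= SUBARCH_NR)
        return 0;  /* caller should stop, as head.S does with ud2 */
    assert(table[f->type] != 0);
    return table[f->type];
}
```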

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  3:39                     ` Andi Kleen
@ 2007-05-01  2:48                         ` H. Peter Anvin
  0 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-01  2:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, Jeff Garzik, patches,
	Vivek Goyal, linux-kernel, Rusty Russell, virtualization

Andi Kleen wrote:
> 
> It's still unclear to me why exactly you want to rewrite it?
> Are there any particular bugs in the current code you want to fix?
> 

It's more the sheer degree of unmaintainability which is grating on my
nerves.  There is way too much hocus-pocus going on, and I dare say
that probably no one understands what actually happens in there.

In response to the question about what the code looks like, I have put a
development snapshot patch at:

http://userweb.kernel.org/~hpa/setup-snapshot-2007.04.30.patch

Note that it compiles, but it doesn't work yet.  This is not a
submission, just a "what does the code actually look like" sample.

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30 15:34                 ` Eric W. Biederman
@ 2007-05-01  3:38                   ` Rusty Russell
  2007-05-01  3:45                       ` H. Peter Anvin
  2007-05-01  3:57                       ` Eric W. Biederman
  2007-05-01  3:38                   ` Rusty Russell
  1 sibling, 2 replies; 217+ messages in thread
From: Rusty Russell @ 2007-05-01  3:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

On Mon, 2007-04-30 at 09:34 -0600, Eric W. Biederman wrote:
> Reading this it occurs to me what I object to wasn't that clear.
> 
> I have no problem with the testing of %cs to see if we are not in ring0.
> That part while a little odd is fine, and we will certainly need a test
> to skip the protected instructions in head.S
> 
> What I object to in particular is having (struct lguest_info?) instead
> of using the standard format for kernel parameters pointed to in %esi.

Here's a rough patch to see what it looks like from an lguest POV.  It's
an improvement in many ways: I chose to hardcode the search for matching
backend rather than use paravirt_probe-style magic.

It'd be nicer if there were a "struct boot_params" declaration, but we
can't have everything.
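The launcher hunk below writes the zero page through raw offsets. This sketch gives those stores names, which is roughly what a `struct boot_params` declaration would provide. Offsets 0x218, 0x21c, 0x228 and 0x23c are copied from the patch, where 0x23c is the paravirt-type field the patch proposes. Offsets 0x1e8 and 0x2d0 are assumed to match the i386 `E820NR` and `E820MAP` definitions and should be checked against `include/asm-i386/e820.h`. The helper names are hypothetical. Writing each field byte by byte in little-endian order keeps the result independent of the host's `unsigned long` width and alignment rules.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ZERO_PAGE_SIZE   4096u
#define BP_E820_NR       0x1e8  /* u8: number of e820 entries (assumed E820NR) */
#define BP_RAMDISK_IMAGE 0x218  /* u32: initrd physical address */
#define BP_RAMDISK_SIZE  0x21c  /* u32: initrd size in bytes */
#define BP_CMD_LINE_PTR  0x228  /* u32: physical address of the command line */
#define BP_PARAVIRT_TYPE 0x23c  /* u32: 1 == lguest in the patch */
#define BP_E820_MAP      0x2d0  /* packed 20-byte e820 entries (assumed E820MAP) */
#define E820_TYPE_RAM    1

/* Store `bytes` bytes of `v` at `off`, least significant byte first. */
static void put_le(uint8_t *page, unsigned off, uint64_t v, unsigned bytes)
{
    assert(bytes <= 8 && off + bytes <= ZERO_PAGE_SIZE);
    for (unsigned i = 0; i < bytes; i++)
        page[off + i] = (uint8_t)(v >> (8 * i));
}

static uint64_t get_le(const uint8_t *page, unsigned off, unsigned bytes)
{
    uint64_t v = 0;

    assert(bytes <= 8 && off + bytes <= ZERO_PAGE_SIZE);
    for (unsigned i = 0; i < bytes; i++)
        v |= (uint64_t)page[off + i] << (8 * i);
    return v;
}

/* One RAM region [0, mem), initrd at top of memory, cmdline at 4096. */
static void fill_zero_page(uint8_t *page, uint32_t mem, uint32_t initrd_size)
{
    memset(page, 0, ZERO_PAGE_SIZE);
    if (initrd_size) {
        put_le(page, BP_RAMDISK_IMAGE, mem - initrd_size, 4);
        put_le(page, BP_RAMDISK_SIZE, initrd_size, 4);
    }
    page[BP_E820_NR] = 1;
    put_le(page, BP_E820_MAP + 0, 0, 8);               /* entry addr */
    put_le(page, BP_E820_MAP + 8, mem, 8);             /* entry size */
    put_le(page, BP_E820_MAP + 16, E820_TYPE_RAM, 4);  /* entry type */
    put_le(page, BP_CMD_LINE_PTR, 4096, 4);
    put_le(page, BP_PARAVIRT_TYPE, 1, 4);
}
```

A launcher would apply `fill_zero_page()` to the guest page at physical address 0 and then copy the command line to physical 4096, which is the layout the patch uses.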

Cheers,
Rusty.

diff -r 9a673a220ad6 Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c	Mon Apr 30 20:10:26 2007 +1000
+++ b/Documentation/lguest/lguest.c	Tue May 01 13:19:02 2007 +1000
@@ -30,10 +30,12 @@
 #include <termios.h>
 #include <getopt.h>
 #include <zlib.h>
+typedef unsigned long long u64;
 typedef uint32_t u32;
 typedef uint16_t u16;
 typedef uint8_t u8;
 #include "../../include/linux/lguest_launcher.h"
+#include "../../include/asm-i386/e820.h"
 
 #define PAGE_PRESENT 0x7 	/* Present, RW, Execute */
 #define NET_PEERNUM 1
@@ -915,10 +917,10 @@ static void usage(void)
 
 int main(int argc, char *argv[])
 {
-	unsigned long mem, pgdir, start, page_offset;
+	unsigned long mem, pgdir, start, page_offset, initrd_size = 0;
 	int c, lguest_fd, waker_fd;
 	struct device_list device_list;
-	struct lguest_boot_info *boot = (void *)0;
+	void *boot = (void *)0;
 	const char *initrd_name = NULL;
 
 	device_list.max_infd = -1;
@@ -966,15 +968,24 @@ int main(int argc, char *argv[])
 	map_device_descriptors(&device_list, mem);
 
 	/* Map the initrd image if requested */
-	if (initrd_name)
-		boot->initrd_size = load_initrd(initrd_name, mem);
+	if (initrd_name) {
+		initrd_size = load_initrd(initrd_name, mem);
+		*(unsigned long *)(boot+0x218) = mem - initrd_size;
+		*(unsigned long *)(boot+0x21c) = initrd_size;
+	}
 
 	/* Set up the initial linar pagetables. */
-	pgdir = setup_pagetables(mem, boot->initrd_size, page_offset);
-
-	/* Give the guest the boot information it needs. */
-	concat(boot->cmdline, argv+optind+2);
-	boot->max_pfn = mem/getpagesize();
+	pgdir = setup_pagetables(mem, initrd_size, page_offset);
+
+	/* E820 memory map: ours is a simple, single region. */
+	*(char*)(boot+E820NR) = 1;
+	*((struct e820entry *)(boot+E820MAP)) 
+		= ((struct e820entry) { 0, mem, E820_RAM });
+	/* Command line pointer and command line (at 4096) */
+	*(void **)(boot + 0x228) = boot + 4096;
+	concat(boot + 4096, argv+optind+2);
+	/* Paravirt type: 1 == lguest */
+	*(int *)(boot + 0x23c) = 1;
 
 	lguest_fd = tell_kernel(pgdir, start, page_offset);
 	waker_fd = setup_waker(&device_list);
diff -r 9a673a220ad6 arch/i386/kernel/head.S
--- a/arch/i386/kernel/head.S	Mon Apr 30 20:10:26 2007 +1000
+++ b/arch/i386/kernel/head.S	Tue May 01 12:29:55 2007 +1000
@@ -504,34 +504,17 @@ ignore_int:
 #ifdef CONFIG_PARAVIRT
 startup_paravirt:
 	cld
+	movl %esi, %eax
+	addl $__PAGE_OFFSET, %eax
  	movl $(init_thread_union+THREAD_SIZE),%esp
 
-	/* We take pains to preserve all the regs. */
-	pushl	%edx
-	pushl	%ecx
-	pushl	%eax
-
-	pushl	$__start_paravirtprobe
-1:
-	movl	0(%esp), %eax
-	cmpl	$__stop_paravirtprobe, %eax
-	je	unhandled_paravirt
-	pushl	(%eax)
-	movl	8(%esp), %eax
-	call	*(%esp)
-	popl	%eax
-
-	movl	4(%esp), %eax
-	movl	8(%esp), %ecx
-	movl	12(%esp), %edx
-
-	addl	$4, (%esp)
-	jmp	1b
-
-unhandled_paravirt:
+#ifdef CONFIG_LGUEST_GUEST
+	cmpl	$1, 0x23c(%eax)
+	je	lguest_init
+#endif
 	/* Nothing wanted us: we're screwed. */
 	ud2
-#endif
+#endif /* CONFIG_PARAVIRT */
 
 /*
  * Real beginning of normal "text" segment
diff -r 9a673a220ad6 drivers/lguest/lguest.c
--- a/drivers/lguest/lguest.c	Mon Apr 30 20:10:26 2007 +1000
+++ b/drivers/lguest/lguest.c	Tue May 01 13:28:06 2007 +1000
@@ -53,7 +53,6 @@ struct lguest_data lguest_data = {
 	.blocked_interrupts = { 1 }, /* Block timer interrupts */
 };
 struct lguest_device_desc *lguest_devices;
-static __initdata const struct lguest_boot_info *boot = __va(0);
 
 static enum paravirt_lazy_mode lazy_mode;
 static void lguest_lazy_mode(enum paravirt_lazy_mode mode)
@@ -378,8 +377,7 @@ static __init char *lguest_memory_setup(
 	/* We do this here because lockcheck barfs if before start_kernel */
 	atomic_notifier_chain_register(&panic_notifier_list, &paniced);
 
-	e820.nr_map = 0;
-	add_memory_region(0, PFN_PHYS(boot->max_pfn), E820_RAM);
+	add_memory_region(E820_MAP->addr, E820_MAP->size, E820_MAP->type);
 	return "LGUEST";
 }
 
@@ -410,8 +408,14 @@ static unsigned lguest_patch(u8 type, u1
 	return insn_len;
 }
 
-__init void lguest_init(void)
-{
+__init void lguest_init(void *boot)
+{
+	/* Copy boot parameters first. */
+	memcpy(boot_params, boot, PARAM_SIZE);
+	memcpy(boot_command_line,
+	       __va(*(unsigned long *)(boot_params + NEW_CL_POINTER)),
+	       COMMAND_LINE_SIZE);
+
 	paravirt_ops.name = "lguest";
 	paravirt_ops.paravirt_enabled = 1;
 	paravirt_ops.kernel_rpl = 1;
@@ -460,7 +464,6 @@ __init void lguest_init(void)
 	paravirt_ops.wbinvd = lguest_wbinvd;
 
 	hcall(LHCALL_LGUEST_INIT, __pa(&lguest_data), 0, 0);
-	strncpy(boot_command_line, boot->cmdline, COMMAND_LINE_SIZE);
 
 	/* We use top of mem for initial pagetables. */
 	init_pg_tables_end = __pa(pg0);
@@ -487,14 +490,6 @@ __init void lguest_init(void)
 
 	add_preferred_console("hvc", 0, NULL);
 
-	if (boot->initrd_size) {
-		/* We stash this at top of memory. */
-		INITRD_START = boot->max_pfn*PAGE_SIZE - boot->initrd_size;
-		INITRD_SIZE = boot->initrd_size;
-		LOADER_TYPE = 0xFF;
-	}
-
 	pm_power_off = lguest_power_off;
 	start_kernel();
 }
-paravirt_probe(lguest_maybe_init);
diff -r 9a673a220ad6 drivers/lguest/lguest_asm.S
--- a/drivers/lguest/lguest_asm.S	Mon Apr 30 20:10:26 2007 +1000
+++ b/drivers/lguest/lguest_asm.S	Tue May 01 12:37:25 2007 +1000
@@ -5,31 +5,12 @@
 /* FIXME: Once asm/processor-flags.h goes in, include that */
 #define X86_EFLAGS_IF 0x00000200
 
-/*
- * This is where we begin: head.S notes that paging is already enabled (which
- * doesn't happen in native boot) and calls the registered paravirt_probe
- * functions one at a time.  Ours is a simple assembler test for magic
- * registers.  If they're correct we jump to lguest_init.
- *
- * We put it in .init.text will be discarded after boot.
- */
-.section .init.text, "ax", @progbits
-ENTRY(lguest_maybe_init)
-	cmpl $LGUEST_MAGIC_EBP, %ebp
-	jne out
-	cmpl $LGUEST_MAGIC_EDI, %edi
-	jne out
-	cmpl $LGUEST_MAGIC_ESI, %esi
-	jne out
-	je lguest_init
-out:
-	ret
-
 /* The templates for inline patching. */
 #define LGUEST_PATCH(name, insns...)			\
 	lgstart_##name:	insns; lgend_##name:;		\
 	.globl lgstart_##name; .globl lgend_##name
 
+.section .init.text, "ax", @progbits
 LGUEST_PATCH(cli, movl $0, lguest_data+LGUEST_DATA_irq_enabled)
 LGUEST_PATCH(sti, movl $X86_EFLAGS_IF, lguest_data+LGUEST_DATA_irq_enabled)
 LGUEST_PATCH(popf, movl %eax, lguest_data+LGUEST_DATA_irq_enabled)
diff -r 9a673a220ad6 drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c	Mon Apr 30 20:10:26 2007 +1000
+++ b/drivers/lguest/lguest_user.c	Tue May 01 12:11:50 2007 +1000
@@ -7,13 +7,11 @@ static void setup_regs(struct lguest_reg
 static void setup_regs(struct lguest_regs *regs, unsigned long start)
 {
 	/* Write out stack in format lguest expects, so we can switch to it. */
-	regs->edi = LGUEST_MAGIC_EDI;
-	regs->ebp = LGUEST_MAGIC_EBP;
-	regs->esi = LGUEST_MAGIC_ESI;
 	regs->ds = regs->es = regs->ss = __KERNEL_DS|GUEST_PL;
 	regs->cs = __KERNEL_CS|GUEST_PL;
 	regs->eflags = 0x202; 	/* Interrupts enabled. */
 	regs->eip = start;
+	/* esi points to our boot information (physical address 0) */
 }
 
 /* + addr */
diff -r 9a673a220ad6 include/asm-i386/paravirt.h
--- a/include/asm-i386/paravirt.h	Mon Apr 30 20:10:26 2007 +1000
+++ b/include/asm-i386/paravirt.h	Tue May 01 11:35:17 2007 +1000
@@ -221,11 +221,6 @@ struct paravirt_ops
 	void (*irq_enable_sysexit)(void);
 	void (*iret)(void);
 };
-
-/* Mark a paravirt probe function. */
-#define paravirt_probe(fn)						\
- static asmlinkage void (*__paravirtprobe_##fn)(void) __attribute_used__ \
-		__attribute__((__section__(".paravirtprobe"))) = fn
 
 extern struct paravirt_ops paravirt_ops;
 
diff -r 9a673a220ad6 include/linux/lguest.h
--- a/include/linux/lguest.h	Mon Apr 30 20:10:26 2007 +1000
+++ b/include/linux/lguest.h	Tue May 01 12:11:07 2007 +1000
@@ -2,11 +2,6 @@
  * this is subject to wild and random change between versions. */
 #ifndef _ASM_LGUEST_H
 #define _ASM_LGUEST_H
-
-/* These are randomly chosen numbers which indicate we're an lguest at boot */
-#define LGUEST_MAGIC_EBP 0x4C687970
-#define LGUEST_MAGIC_EDI 0x652D4D65
-#define LGUEST_MAGIC_ESI 0xFFFFFFFF
 
 #ifndef __ASSEMBLY__
 #include <asm/irq.h>
diff -r 9a673a220ad6 include/linux/lguest_launcher.h
--- a/include/linux/lguest_launcher.h	Mon Apr 30 20:10:26 2007 +1000
+++ b/include/linux/lguest_launcher.h	Tue May 01 11:33:59 2007 +1000
@@ -15,14 +15,6 @@ struct lguest_dma
  	u32 used_len;
 	u32 addr[LGUEST_MAX_DMA_SECTIONS];
 	u16 len[LGUEST_MAX_DMA_SECTIONS];
-};
-
-/* This is found at address 0. */
-struct lguest_boot_info
-{
-	u32 max_pfn;
-	u32 initrd_size;
-	char cmdline[256];
 };
 
 struct lguest_block_page



^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30 23:16                     ` H. Peter Anvin
                                       ` (2 preceding siblings ...)
  (?)
@ 2007-05-01  3:39                     ` Andi Kleen
  2007-05-01  2:48                         ` H. Peter Anvin
  -1 siblings, 1 reply; 217+ messages in thread
From: Andi Kleen @ 2007-05-01  3:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, Rusty Russell,
	virtualization

On Mon, Apr 30, 2007 at 04:16:10PM -0700, H. Peter Anvin wrote:
> Eric W. Biederman wrote:
> > 
> > I'm tempted to just reload the segments in setup.S, but that might
> > break loadlin support or one of the other bootloaders that starts the
> > kernel in 32bit mode so we need to be careful.
> > 
> 
> We already load all the segments in setup.S.  I'm retaining this in my
> rewrite.

It's still unclear to me why exactly you want to rewrite it.
Are there any particular bugs in the current code you want to fix?

-Andi

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  3:38                   ` Rusty Russell
@ 2007-05-01  3:45                       ` H. Peter Anvin
  2007-05-01  3:57                       ` Eric W. Biederman
  1 sibling, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-01  3:45 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

Rusty Russell wrote:
> 
> It'd be nicer if there were a "struct boot_params" declaration, but we
> can't have everything.

It's in my patchset-under-development.

(Preview snapshot:
http://userweb.kernel.org/~hpa/setup-snapshot-2007.04.30.patch)

> diff -r 9a673a220ad6 Documentation/lguest/lguest.c
> --- a/Documentation/lguest/lguest.c	Mon Apr 30 20:10:26 2007 +1000
> +++ b/Documentation/lguest/lguest.c	Tue May 01 13:19:02 2007 +1000
> @@ -30,10 +30,12 @@
>  #include <termios.h>
>  #include <getopt.h>
>  #include <zlib.h>
> +typedef unsigned long long u64;
>  typedef uint32_t u32;
>  typedef uint16_t u16;
>  typedef uint8_t u8;

Why not uint64_t to go along with all the other defines?

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  3:38                   ` Rusty Russell
@ 2007-05-01  3:57                       ` Eric W. Biederman
  2007-05-01  3:57                       ` Eric W. Biederman
  1 sibling, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-01  3:57 UTC (permalink / raw)
  To: Rusty Russell
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

Rusty Russell <rusty@rustcorp.com.au> writes:

> On Mon, 2007-04-30 at 09:34 -0600, Eric W. Biederman wrote:
>> Reading this it occurs to me what I object to wasn't that clear.
>> 
>> I have no problem with the testing of %cs to see if we are not in ring0.
>> That part while a little odd is fine, and we will certainly need a test
>> to skip the protected instructions in head.S
>> 
>> What I object to in particular is having (struct lguest_info?) instead
>> of using the standard format for kernel parameters pointed to in %esi.
>
> Here's a rough patch to see what it looks like from an lguest POV.  It's
> an improvement in many ways: I chose to hardcode the search for matching
> backend rather than use paravirt_probe-style magic.

Cool.

> It'd be nicer if there were a "struct boot_params" declaration, but we
> can't have everything.

Well it will come.  I have an old one in kexec-tools and HPA looks like
he has one in his C rewrite.

I'm not going to worry about going farther until the patches in flight
settle down a little bit, but this looks promising.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  3:45                       ` H. Peter Anvin
@ 2007-05-01  3:59                         ` Rusty Russell
  -1 siblings, 0 replies; 217+ messages in thread
From: Rusty Russell @ 2007-05-01  3:59 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

On Mon, 2007-04-30 at 20:45 -0700, H. Peter Anvin wrote:
> Rusty Russell wrote:
> > 
> > It'd be nicer if there were a "struct boot_params" declaration, but we
> > can't have everything.
> 
> It's in my patchset-under-development.

Ah ha: excellent!

> > +typedef unsigned long long u64;
> >  typedef uint32_t u32;
> >  typedef uint16_t u16;
> >  typedef uint8_t u8;
> 
> Why not uint64_t to go along with all the other defines?

Because then it's a PITA to printf(): x86-64 has uint64_t as "unsigned
long".  So the lguest64 guys will add all kinds of horrible casts.  This
has the same effect, but ironically is more portable.
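A minimal sketch of the printf difference described above, assuming a C11 hosted compiler: with the "unsigned long long" typedef the conversion is always %llu, while uint64_t is "unsigned long" on x86-64 and "unsigned long long" on i386, so printing it portably needs the <inttypes.h> PRIu64 macro or a cast. The helper names format_u64 and format_uint64 are illustrative only and do not come from lguest.c.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The launcher's choice: unsigned long long is 64 bits on both i386
 * and x86-64, so "%llu" is the right conversion everywhere. */
typedef unsigned long long u64;
static_assert(sizeof(u64) == 8, "u64 must be 64 bits wide");

/* Illustrative helper: format the typedef with no cast and no macro. */
int format_u64(char *buf, size_t len, u64 val)
{
	return snprintf(buf, len, "%llu", val);
}

/* The uint64_t alternative: its underlying type differs between i386
 * and x86-64, so the portable conversion comes from PRIu64. */
int format_uint64(char *buf, size_t len, uint64_t val)
{
	return snprintf(buf, len, "%" PRIu64, val);
}
```

Both helpers produce the same text; the difference is only in how much ceremony each call site needs.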

Rusty.


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  3:45                       ` H. Peter Anvin
@ 2007-05-01  4:00                         ` H. Peter Anvin
  -1 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-01  4:00 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

H. Peter Anvin wrote:
> Rusty Russell wrote:
>> It'd be nicer if there were a "struct boot_params" declaration, but we
>> can't have everything.
> 
> It's in my patchset-under-development.
> 
> (Preview snapshot:
> http://userweb.kernel.org/~hpa/setup-snapshot-2007.04.30.patch)

Just pushed out a git tree:

http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-newsetup.git;a=summary

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  4:00                         ` H. Peter Anvin
  (?)
  (?)
@ 2007-05-01  4:50                         ` Rusty Russell
  2007-05-01  5:28                             ` H. Peter Anvin
  -1 siblings, 1 reply; 217+ messages in thread
From: Rusty Russell @ 2007-05-01  4:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

On Mon, 2007-04-30 at 21:00 -0700, H. Peter Anvin wrote:
> H. Peter Anvin wrote:
> > Rusty Russell wrote:
> >> It'd be nicer if there were a "struct boot_params" declaration, but we
> >> can't have everything.
> > 
> > It's in my patchset-under-development.
> > 
> > (Preview snapshot:
> > http://userweb.kernel.org/~hpa/setup-snapshot-2007.04.30.patch)
> 
> Just pushed out a git tree:
> 
> http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-newsetup.git;a=summary

Any chance of splitting off a "struct boot_params" header (includable by
non-kernel code) for inclusion immediately?  The rest can wait until
2.6.23, but it'd be sweet to get this in place sooner.

BTW, wrt. a new "platform type" field, should it go something like this?

-0235/3	N/A	pad2		Unused
+0235/1	2.07+	platform_type	Runtime platform (see below)
+0236/2	N/A	pad2		Unused
...
+  platform_type:
+	For kernels which can boot on multiple platforms.  Currently
+	0 == native (normal), 1 == lguest (paravirtualized).
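A rough sketch of how early code that is handed the boot_params page (the pointer passed in %esi) might read setup-header fields by offset, including the platform_type byte proposed above. The 0x21c (ramdisk_size) and 0x228 (cmd_line_ptr) offsets are from the existing boot protocol; 0x235 is only the proposal in this mail. The helper names are hypothetical, and treating a zero byte as "native" assumes the loader zeroes the unused pad2 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets into the boot_params ("zero page") block. */
#define BP_RAMDISK_SIZE  0x21c /* existing field */
#define BP_CMD_LINE_PTR  0x228 /* existing field */
#define BP_PLATFORM_TYPE 0x235 /* proposed field, was pad2 */

/* Values from the proposal: 0 == native, 1 == lguest. */
enum platform_type { PLATFORM_NATIVE = 0, PLATFORM_LGUEST = 1 };

/* Hypothetical helper: assemble a little-endian 32-bit field byte by
 * byte, so the result is the same whatever the host byte order. */
static uint32_t bp_read32(const uint8_t *bp, size_t off)
{
	return (uint32_t)bp[off] | (uint32_t)bp[off + 1] << 8 |
	       (uint32_t)bp[off + 2] << 16 | (uint32_t)bp[off + 3] << 24;
}

uint32_t bp_ramdisk_size(const uint8_t *bp)
{
	return bp_read32(bp, BP_RAMDISK_SIZE);
}

uint32_t bp_cmd_line_ptr(const uint8_t *bp)
{
	return bp_read32(bp, BP_CMD_LINE_PTR);
}

/* A loader that zeroes pad2 reads back as "native". */
enum platform_type bp_platform_type(const uint8_t *bp)
{
	return (enum platform_type)bp[BP_PLATFORM_TYPE];
}
```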

Thanks,
Rusty.



^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  4:50                         ` Rusty Russell
@ 2007-05-01  5:28                             ` H. Peter Anvin
  0 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-01  5:28 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

Rusty Russell wrote:
> 
> BTW, wrt. a new "platform type" field, should it go something like this?
> 
> -0235/3	N/A	pad2		Unused
> +0235/1	2.07+	platform_type	Runtime platform (see below)
> +0236/2	N/A	pad2		Unused
> ...
> +  platform_type:
> +	For kernels which can boot on multiple platforms.  Currently
> +	0 == native (normal), 1 == lguest (paravirtualized).
> 

Well, yes, but we need to think about whether there are more things that
should be added.  There *definitely* should be space for a platform data
pointer, to start out with.  I would also like to see a platform data
field, as well as a bootloader extension field (we're going to have that
problem soon enough.)

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  3:57                       ` Eric W. Biederman
  (?)
@ 2007-05-01  5:37                       ` Jeremy Fitzhardinge
  2007-05-01  6:11                         ` Eric W. Biederman
                                           ` (3 more replies)
  -1 siblings, 4 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-01  5:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rusty Russell, H. Peter Anvin, Jeff Garzik, Andi Kleen, patches,
	Vivek Goyal, linux-kernel, virtualization

Eric W. Biederman wrote:
> I'm not going to worry about going farther until the patches in flight
> settle down a little bit, but this looks promising.
>   

Is there any value in adding an "early-putchar" function pointer into
the structure somehow?  I could easily arrange for the domain builder to
put a bit of code into the domain so that the early boot code can emit
something.

    J

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  5:28                             ` H. Peter Anvin
  (?)
  (?)
@ 2007-05-01  6:05                             ` Eric W. Biederman
  -1 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-01  6:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Rusty Russell, Jeremy Fitzhardinge, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

"H. Peter Anvin" <hpa@zytor.com> writes:

> Rusty Russell wrote:
>> 
>> BTW, wrt. a new "platform type" field, should it go something like this?
>> 
>> -0235/3	N/A	pad2		Unused
>> +0235/1	2.07+	platform_type	Runtime platform (see below)
>> +0236/2	N/A	pad2		Unused
>> ...
>> +  platform_type:
>> +	For kernels which can boot on multiple platforms.  Currently
>> +	0 == native (normal), 1 == lguest (paravirtualized).
>> 
>
> Well, yes, but we need to think about whether there are more things that
> should be added.  There *definitely* should be space for a platform data
> pointer, to start out with.  I would also like to see a platform data
> field, as well as a bootloader extension field (we're going to have that
> problem soon enough.)

Well, in the paravirt case we start with paging already enabled, so
if we boot from a bzImage rather than a vmlinux we need to define what
that initial virtual address space should contain, and where in the
address space it is safe to put those page tables.

Of the requirements I have heard so far, that is the trickiest one,
because it basically requires us to have a reasonable worst-case
estimate of how much memory we are going to use before we start using
the bootmem allocator.
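To make the worst-case estimate concrete, here is a small sketch assuming 32-bit non-PAE i386 paging with 4 KiB pages, no large pages, and a 4 MiB-aligned start: each page table holds 1024 entries and so maps 4 MiB, and one page directory sits above them. The helper name pt_pages_needed is hypothetical; PAE or x86-64 would need a different calculation.

```c
#include <stdint.h>

#define PAGE_SIZE_4K    4096ULL
#define PTES_PER_TABLE  1024ULL                          /* 32-bit PTEs */
#define BYTES_PER_TABLE (PAGE_SIZE_4K * PTES_PER_TABLE)  /* 4 MiB */

/* Hypothetical helper: pages of page-table memory needed to map `bytes`
 * contiguously with 4 KiB pages, i.e. one page table per started
 * 4 MiB plus a single page directory. */
uint64_t pt_pages_needed(uint64_t bytes)
{
	uint64_t tables = (bytes + BYTES_PER_TABLE - 1) / BYTES_PER_TABLE;
	return tables + 1; /* + the page directory */
}
```

For example, mapping 64 MiB this way costs 16 page tables plus the directory, 17 pages (68 KiB), which is the kind of figure a loader would have to reserve before bootmem is available.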

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  5:37                       ` Jeremy Fitzhardinge
  2007-05-01  6:11                         ` Eric W. Biederman
@ 2007-05-01  6:11                         ` Eric W. Biederman
  2007-05-01  7:34                         ` Rusty Russell
  2007-05-01  7:34                         ` Rusty Russell
  3 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-01  6:11 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Rusty Russell, H. Peter Anvin, Jeff Garzik, Andi Kleen, patches,
	Vivek Goyal, linux-kernel, virtualization

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Eric W. Biederman wrote:
>> I'm not going to worry about going farther until the patches in flight
>> settle down a little bit, but this looks promising.
>>   
>
> Is there any value in adding an "early-putchar" function pointer into
> the structure somehow?  I could easily arrange for the domain builder to
> put a bit of code into the domain so that the early boot code can emit
> something.

I don't think so.  Once we know which subarch we are running on, we can
make a subarch-specific hypervisor call for early printing if we need
to.  There are awkward issues like physical vs. virtual addresses that
would make anything more generic very difficult to get right, because
the code pointed at would need to be fully PIC.

So as a trivial hypervisor call, certainly, but I'm pretty doubtful
about a function pointer.

Then we can do:

if (xen)
   blah
else if (lguest)
   blah2
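The "if (xen) ... else if (lguest)" idea might look roughly like the sketch below: the kernel picks its own early output path by subarch instead of calling code supplied by the domain builder. Everything here is hypothetical; the backend functions are stand-ins that just record which path ran (real ones would issue the hypervisor call), and the PLATFORM_XEN value is invented for illustration alongside the 0 == native, 1 == lguest numbering proposed earlier in the thread.

```c
/* Hypothetical platform identifiers; PLATFORM_XEN is invented here. */
enum platform { PLATFORM_NATIVE = 0, PLATFORM_LGUEST = 1, PLATFORM_XEN = 2 };

/* Stand-ins for the per-subarch output paths; they only record which
 * backend was chosen so the dispatch itself can be checked. */
static enum platform last_backend = PLATFORM_NATIVE;
static char last_char;

static void xen_early_putchar(char c)    { last_backend = PLATFORM_XEN;    last_char = c; }
static void lguest_early_putchar(char c) { last_backend = PLATFORM_LGUEST; last_char = c; }
static void native_early_putchar(char c) { last_backend = PLATFORM_NATIVE; last_char = c; }

/* Dispatch on the subarch at the call site: only code linked into the
 * kernel runs, so no position-independent foreign code is needed. */
void early_putchar(enum platform p, char c)
{
	if (p == PLATFORM_XEN)
		xen_early_putchar(c);
	else if (p == PLATFORM_LGUEST)
		lguest_early_putchar(c);
	else
		native_early_putchar(c);
}
```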


Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  5:37                       ` Jeremy Fitzhardinge
  2007-05-01  6:11                         ` Eric W. Biederman
  2007-05-01  6:11                         ` Eric W. Biederman
@ 2007-05-01  7:34                         ` Rusty Russell
  2007-05-01  8:03                           ` Jeremy Fitzhardinge
  2007-05-01  8:03                           ` Jeremy Fitzhardinge
  2007-05-01  7:34                         ` Rusty Russell
  3 siblings, 2 replies; 217+ messages in thread
From: Rusty Russell @ 2007-05-01  7:34 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Eric W. Biederman, H. Peter Anvin, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

On Mon, 2007-04-30 at 22:37 -0700, Jeremy Fitzhardinge wrote:
> Eric W. Biederman wrote:
> > I'm not going to worry about going farther until the patches in flight
> > settle down a little bit, but this looks promising.
> >   
> 
> Is there any value in adding an "early-putchar" function pointer into
> the structure somehow?  I could easily arrange for the domain builder to
> put a bit of code into the domain so that the early boot code can emit
> something.

Well there aren't that many instructions between startup_32 and
lguest_init at the moment, but I guess if we end up going through
bzImage decompression it makes more sense...

Rusty.



^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-01  7:34                         ` Rusty Russell
  2007-05-01  8:03                           ` Jeremy Fitzhardinge
@ 2007-05-01  8:03                           ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-01  8:03 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Eric W. Biederman, H. Peter Anvin, Jeff Garzik, Andi Kleen,
	patches, Vivek Goyal, linux-kernel, virtualization

Rusty Russell wrote:
> Well there aren't that many instructions between startup_32 and
> lguest_init at the moment, but I guess if we end up going through
> bzImage decompression it makes more sense...

Yes, that's what I was thinking.  If we boot compressed kernels in a
novel environment, there's going to be plenty of early debugging going
on.  Fortunately for me, I can just start it up under gdb, so it isn't
all that arduous.

    J


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-04-30 18:50             ` Jeremy Fitzhardinge
                               ` (2 preceding siblings ...)
  (?)
@ 2007-05-02  9:31             ` Gerd Hoffmann
  2007-05-02 15:16               ` Jeremy Fitzhardinge
  2007-05-02 15:16               ` Jeremy Fitzhardinge
  -1 siblings, 2 replies; 217+ messages in thread
From: Gerd Hoffmann @ 2007-05-02  9:31 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Eric W. Biederman, Jeff Garzik, patches, linux-kernel,
	Vivek Goyal, H. Peter Anvin, virtualization

Jeremy Fitzhardinge wrote:
> Eric W. Biederman wrote:
>> I have several ideas on how we can make this work but first I have to
>> ask what is it that you are trying to accomplish?
> 
> The requirements are:
> 
>    1. the domain builder needs to get various information about the
>       guest kernel by inspecting its ELF notes

Doesn't need to be ELF notes.  The current (3.0.5+) domain builder has 
pluggable binary parsers.  Right now there are two:  ELF (obviously ...) 
and binary (with a multiboot-like header).  Filling in information
such as virt_base is the job of the parser, so a new parser added to
the domain builder for bzImage kernels could do something completely
different to gather the needed information ...
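A minimal sketch of the pluggable-parser idea, where each parser probes the image and the first match wins. The struct and function names are hypothetical and are not libxc's actual interface; the magic values themselves are real: ELF files start with 0x7f 'E' 'L' 'F', and a bzImage carries "HdrS" at offset 0x202 in its setup header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parser table entry. */
struct image_parser {
	const char *name;
	int (*probe)(const unsigned char *img, size_t len);
};

/* ELF magic; the string is split so "\x7f" is not parsed as "\x7fE". */
static int probe_elf(const unsigned char *img, size_t len)
{
	return len >= 4 && memcmp(img, "\x7f" "ELF", 4) == 0;
}

/* bzImage: the setup header magic "HdrS" lives at offset 0x202. */
static int probe_bzimage(const unsigned char *img, size_t len)
{
	return len >= 0x206 && memcmp(img + 0x202, "HdrS", 4) == 0;
}

static const struct image_parser parsers[] = {
	{ "elf", probe_elf },
	{ "bzImage", probe_bzimage },
};

/* Return the index of the first parser that accepts the image, or -1. */
int find_parser(const unsigned char *img, size_t len)
{
	for (size_t i = 0; i < sizeof(parsers) / sizeof(parsers[0]); i++)
		if (parsers[i].probe(img, len))
			return (int)i;
	return -1;
}
```

Adding bzImage support then means adding one table entry whose parse step reads the setup header instead of ELF notes.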

> That works OK for a kernel which is compiled to run under Xen and can't
> run in any other environment, but now that we can generate a single
> kernel which can run in any number of different environments, its
> unfortunate that we still need multiple variants of the kernel image.

Yep, although already much better than completely different kernels. 
Most space of a typical distro kernel is modules which are shared even 
with different kernel binaries.

> So, I have no problem in also building a boot protocol info structure,
> and passing that in %esi, so long as I can store a pointer to the
> Xen-specific info as well.

Yep, should work fine.

> I think I'd prefer to have the domain builder decompress/relocate the
> kernel from the bzImage and start it directly, rather than have it
> decompress/relocate itself,

I'd expect that to work better too.

> It depends
> on how well it can deal with having paging enabled and being in ring 1. 

Xen's direct paging mode, which requires (leaf) page tables to be mapped
read-only, makes page table manipulation a bit difficult.  Under Xen you
have to care whenever the memory you map is a page table.  Native doesn't.

Also, switching to a completely different set of page tables isn't easy
under Xen.  My Xen guest kexec patches have to perform some interesting
tricks because of that ...

> Looks like it might just be a matter of starting up with "enough" memory
> mapped.

That doesn't solve the problem of having to switch from the identity
mapping to the 0xc0000000 one ...
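To make the two mappings concrete, a small sketch assuming the default non-PAE i386 layout (PAGE_OFFSET = 0xC0000000, 4 MB per page-directory entry). The same physical page sits in page-directory slot 0 under the identity mapping but in slot 768 under the kernel mapping, so code running across the switch typically needs both slots populated at once; under Xen each of those page-table pages is subject to the read-only rules above.

```c
#include <stdint.h>

/* Assumed default i386 3G/1G split. */
#define PAGE_OFFSET 0xC0000000u
/* Non-PAE i386: each page-directory entry covers 4 MB (1 << 22). */
#define PGDIR_SHIFT 22

/* Before the switch: virtual address == physical address. */
static uint32_t identity_va(uint32_t pa) { return pa; }

/* After the switch: the kernel sees physical memory at PAGE_OFFSET. */
static uint32_t kernel_va(uint32_t pa) { return pa + PAGE_OFFSET; }

/* Which page-directory slot translates a given virtual address. */
static uint32_t pgd_index(uint32_t va) { return va >> PGDIR_SHIFT; }
```

For a kernel loaded at 1 MB, identity_va(0x100000) is 0x100000 (slot 0) while kernel_va(0x100000) is 0xC0100000 (slot 768).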

cheers,
   Gerd



^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02  9:31             ` Gerd Hoffmann
  2007-05-02 15:16               ` Jeremy Fitzhardinge
@ 2007-05-02 15:16               ` Jeremy Fitzhardinge
  2007-05-02 20:51                 ` H. Peter Anvin
  2007-05-02 20:51                 ` H. Peter Anvin
  1 sibling, 2 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-02 15:16 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Eric W. Biederman, Jeff Garzik, patches, linux-kernel,
	Vivek Goyal, H. Peter Anvin, virtualization

Gerd Hoffmann wrote:
> Doesn't need to be ELF notes.  The current (3.0.5+) domain builder has
> pluggable binary parsers.  Right now there are two:  ELF (obviously
> ...) and binary (with a multiboot-like header).  Filling the
> informations such as virt_base is a function of the parser, so when
> adding one more parser to the domain builder for bzImage kernels the
> parser could do something completely different to gather the needed
> information ...

True.  But the plan is already to make bzImage an ELF file, so notes
would seem to be the best option.  At worst, it could be ELF notes
wrapped in some other container, but that's not pretty.

>> That works OK for a kernel which is compiled to run under Xen and can't
>> run in any other environment, but now that we can generate a single
>> kernel which can run in any number of different environments, its
>> unfortunate that we still need multiple variants of the kernel image.
>
> Yep, although already much better than completely different kernels.
> Most space of a typical distro kernel is modules which are shared even
> with different kernel binaries.

Yep.

>> So, I have no problem in also building a boot protocol info structure,
>> and passing that in %esi, so long as I can store a pointer to the
>> Xen-specific info as well.
>
> Yep, should work fine.
>
>> I think I'd prefer to have the domain builder decompress/relocate the
>> kernel from the bzImage and start it directly, rather than have it
>> decompress/relocate itself,
>
> I'd expect that work better too.
>
>> It depends
>> on how well it can deal with having paging enabled and being in ring 1. 
>
> Xen direct paging mode requiring (leaf) page tables being mapped
> read-only makes page table manipulation a bit difficult.  Xen has to
> care whenever the memory it maps is a page table.  Native hasn't.
>
> Also switching to a completely different set of page tables isn't easy
> under Xen.  My xen guest kexec patches have to perform some intresting
> tricks because of that ...

Yeah, that's tricky.  I ended up copying the Xen pagetable's pmd into
the kernel's so that they could share ptes.  Making a completely new
pagetable means you need to update the RO state on both old and new.

>> Looks like it might just be a matter of starting up with "enough" memory
>> mapped.
>
> Doesn't solve the problem of having to switch from identity mapping to
> the 0xc0000000 one ...

Hm.  That's right.  Xen will boot a vmlinux with its pagetable
pre-constructed to map it at its virtual address.  Going through bzImage
would mean it would be identity mapped, and some early code would need to
construct the virtual mapping.

But if the path is:

   1. enter bzImage in 32-bit mode
   2. decompress kernel
   3. jump to startup_32
   4. detect paravirt and choose appropriate backend
   5. run Xen startup code

then the Xen startup code can construct the virtual mapping before going
on with the rest of the kernel boot - steps 1-4 can be run with identity
mapping.
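Step 4 of that path could be as simple as a table-driven dispatch. This is only a sketch: the enum values, names, and the idea that a numeric environment identifier is available at that point are all hypothetical, since the thread does not say how detection would work. Unknown values fall back to native so an unrecognised environment still boots the normal way.

```c
/* Hypothetical environment identifiers; not defined anywhere in the thread. */
enum platform { PLATFORM_NATIVE = 0, PLATFORM_LGUEST = 1, PLATFORM_XEN = 2 };

/* Backend names indexed by platform; only the Xen entry would go on to
 * step 5 (building the PAGE_OFFSET mapping) in the proposal above. */
static const char *const backend_names[] = {
    [PLATFORM_NATIVE] = "native",
    [PLATFORM_LGUEST] = "lguest",
    [PLATFORM_XEN]    = "xen",
};

/* Step 4: pick the paravirt backend; unknown values fall back to native. */
static enum platform choose_backend(unsigned int detected)
{
    if (detected >= sizeof(backend_names) / sizeof(backend_names[0]))
        return PLATFORM_NATIVE;
    return (enum platform)detected;
}
```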


    J

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02 15:16               ` Jeremy Fitzhardinge
  2007-05-02 20:51                 ` H. Peter Anvin
@ 2007-05-02 20:51                 ` H. Peter Anvin
  2007-05-02 21:01                   ` Jeremy Fitzhardinge
                                     ` (3 more replies)
  1 sibling, 4 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-02 20:51 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Gerd Hoffmann, Eric W. Biederman, Jeff Garzik, patches,
	linux-kernel, Vivek Goyal, virtualization

Jeremy Fitzhardinge wrote:
> 
> True.  But the plan is already to make bzImage an ELF file, so notes
> would seem to be the best option.  At worst, it could be ELF notes
> wrapped in some other container, but that's not pretty.
> 

It's not going to happen.  Too many boot loaders make assumptions about
ELF files which aren't really compatible; the entry conditions for an
ELF from a boot loader are pretty ill-defined, so I think this is a bad
idea.

At the very least, it shouldn't present the ELF magic number IMNSHO.

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02 20:51                 ` H. Peter Anvin
  2007-05-02 21:01                   ` Jeremy Fitzhardinge
@ 2007-05-02 21:01                   ` Jeremy Fitzhardinge
  2007-05-02 21:09                     ` H. Peter Anvin
  2007-05-02 21:09                     ` H. Peter Anvin
  2007-05-02 21:17                   ` Eric W. Biederman
  2007-05-02 21:17                   ` Eric W. Biederman
  3 siblings, 2 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-02 21:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Gerd Hoffmann, Eric W. Biederman, Jeff Garzik, patches,
	linux-kernel, Vivek Goyal, virtualization

H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>   
>> True.  But the plan is already to make bzImage an ELF file, so notes
>> would seem to be the best option.  At worst, it could be ELF notes
>> wrapped in some other container, but that's not pretty.
>>
>>     
>
> It's not going to happen.  Too many boot loaders make assumptions about
> ELF files which aren't really compatible; the entry conditions for an
> ELF from a boot loader are pretty ill-defined, so I think this is a bad
> idea.
>
> At the very least, it shouldn't present the ELF magic number IMNSHO.
>   

Hm, that's unfortunate.  How about an ELF file wrapped in some other
container, so that we can easily extract a properly formed ELF file?

    J

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02 21:01                   ` Jeremy Fitzhardinge
@ 2007-05-02 21:09                     ` H. Peter Anvin
  2007-05-02 21:39                       ` Jeremy Fitzhardinge
                                         ` (3 more replies)
  2007-05-02 21:09                     ` H. Peter Anvin
  1 sibling, 4 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-02 21:09 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Gerd Hoffmann, Eric W. Biederman, Jeff Garzik, patches,
	linux-kernel, Vivek Goyal, virtualization

Jeremy Fitzhardinge wrote:
> 
> Hm, that's unfortunate.  How about an ELF file wrapped in some other
> container, so that we can easily extract a properly formed ELF file?
> 

Effectively the same thing as changing the magic number.  Note that the
format for bzImage is pretty rigid, and it would be *highly* undesirable
to muck that up.

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02 20:51                 ` H. Peter Anvin
                                     ` (2 preceding siblings ...)
  2007-05-02 21:17                   ` Eric W. Biederman
@ 2007-05-02 21:17                   ` Eric W. Biederman
  2007-05-02 21:24                     ` H. Peter Anvin
  2007-05-02 21:24                     ` H. Peter Anvin
  3 siblings, 2 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-02 21:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Gerd Hoffmann, Jeff Garzik, patches,
	linux-kernel, Vivek Goyal, virtualization

"H. Peter Anvin" <hpa@zytor.com> writes:

> Jeremy Fitzhardinge wrote:
>> 
>> True.  But the plan is already to make bzImage an ELF file, so notes
>> would seem to be the best option.  At worst, it could be ELF notes
>> wrapped in some other container, but that's not pretty.
>> 
>
> It's not going to happen.  Too many boot loaders make assumptions about
> ELF files which aren't really compatible; the entry conditions for an
> ELF from a boot loader are pretty ill-defined, so I think this is a bad
> idea.
>
> At the very least, it shouldn't present the ELF magic number IMNSHO.

I agree that there are some issues.

However we need the information that is contained in ELF headers or
a semantic equivalent so we might as well play with the possibility.

There are two practical issues for ELF and bootloaders.  The first is
virtual vs. physical addresses.  In a bzImage header all we will present
will be physical addresses, so that isn't an issue.

The other issue is the format of the arguments that the executable
expects.  There seems to be no consensus on this, so bootloaders simply
can't agree, and any bootloader that is prepared to deal with kernels
from different locations is going to have to cope.

So I figure we keep our current calling conventions and have a
note saying that we are linux so the format can be auto-detected.
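A sketch of how a bootloader might auto-detect such a note by walking a PT_NOTE segment. It uses the Elf32_Nhdr type from <elf.h> and the standard note layout (name and descriptor each padded to 4 bytes, namesz counting the trailing NUL). The note type value is hypothetical; the thread does not define one.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN4(x) (((size_t)(x) + 3u) & ~(size_t)3u)

/* Hypothetical note type; no value is specified in the thread. */
#define NOTE_TYPE_LINUX_BOOT 0x4c42u

/* Return 1 if the note area holds a note named "Linux" with the given type. */
static int has_linux_note(const uint8_t *notes, size_t len, uint32_t type)
{
    size_t off = 0;

    while (off + sizeof(Elf32_Nhdr) <= len) {
        Elf32_Nhdr nh;
        memcpy(&nh, notes + off, sizeof nh);   /* avoid unaligned access */

        size_t name_off = off + sizeof nh;
        size_t desc_off = name_off + ALIGN4(nh.n_namesz);
        size_t next = desc_off + ALIGN4(nh.n_descsz);
        if (next > len)
            break;                              /* truncated note */

        if (nh.n_type == type && nh.n_namesz == sizeof("Linux") &&
            memcmp(notes + name_off, "Linux", sizeof("Linux")) == 0)
            return 1;
        off = next;
    }
    return 0;
}
```

A loader that finds the note would use the Linux calling convention; one that does not would fall back to its generic ELF handling.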

There are of course plenty of bootloaders that load whatever happens
to be their OS kernel however they managed to get ld to spit it out,
and there are some really weird things going on there.  But that doesn't
matter because those bootloaders can make no pretense at being general
purpose.

There is a lot of future flexibility that comes from this in addition
to making x86 closer to the other architectures.

I do agree we need to tread carefully, but I have yet to hear about
any show stopper bugs, and it works well enough that at least one major
distro has shipped a linux kernel bzImage with an ELF header.

So we won't do this casually, and if there are real problems we will
remove the ELF magic number.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02 21:17                   ` Eric W. Biederman
@ 2007-05-02 21:24                     ` H. Peter Anvin
  2007-05-02 21:36                       ` Eric W. Biederman
  2007-05-02 21:36                       ` Eric W. Biederman
  2007-05-02 21:24                     ` H. Peter Anvin
  1 sibling, 2 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-02 21:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jeremy Fitzhardinge, Gerd Hoffmann, Jeff Garzik, patches,
	linux-kernel, Vivek Goyal, virtualization

Eric W. Biederman wrote:
> 
> So we won't do this casually and if it there are real problems we will
> remove the ELF magic number.
> 

I think we can use ELF-compatible format just fine, but it would make
more sense to use a non-ELF magic number from the start, instead of
signalling it with a note.  Since bootloaders need to be aware anyway,
they can just detect this magic and treat it as a Linux calling
convention ELF image, or they can not detect it, and treat it as a
bzImage.  As a side benefit, we:

a) can use a magic number that contains a jump instruction (to keep the
non-bootsector happy);
b) get a proper Linux kernel magic number.
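A sketch of why the magic number matters: a loader that tests for the ELF magic first takes the generic-ELF path even when a valid bzImage header is also present. The bzImage checks use boot-protocol facts (boot_flag 0xAA55 at 0x1FE, a short jump opcode 0xEB at 0x200, "HdrS" at 0x202); the classifier itself and its ordering are illustrative, not any particular bootloader's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum image_kind { IMAGE_UNKNOWN, IMAGE_ELF, IMAGE_BZIMAGE };

/* Generic-loader order: ELF magic at offset 0 wins over everything else. */
static enum image_kind classify_image(const uint8_t *img, size_t len)
{
    if (len >= 4 && memcmp(img, "\x7f" "ELF", 4) == 0)
        return IMAGE_ELF;
    if (len >= 0x206 &&
        img[0x1fe] == 0x55 && img[0x1ff] == 0xaa &&  /* boot_flag 0xAA55, little-endian */
        img[0x200] == 0xeb &&                         /* short jmp into the setup code */
        memcmp(img + 0x202, "HdrS", 4) == 0)          /* setup header signature */
        return IMAGE_BZIMAGE;
    return IMAGE_UNKNOWN;
}
```

With a non-ELF magic, an image carrying both structures would never reach the first branch, which is the behaviour being argued for.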

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02 21:24                     ` H. Peter Anvin
  2007-05-02 21:36                       ` Eric W. Biederman
@ 2007-05-02 21:36                       ` Eric W. Biederman
  1 sibling, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-02 21:36 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Gerd Hoffmann, Jeff Garzik, patches,
	linux-kernel, Vivek Goyal, virtualization

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> 
>> So we won't do this casually and if it there are real problems we will
>> remove the ELF magic number.
>> 
>
> I think we can use ELF-compatible format just fine, but it would make
> more sense to use a non-ELF magic number from the start, instead of
> signalling it with a note.  Since bootloaders need to be aware, anyway,
> they can just detect this magic and treat is as an Linux calling
> convention ELF image, or they can not detect it, and treat it as a
> bzImage.  As a side benefit, we:
>
> a) can use a magic number that contains a jump instruction (to keep the
> non-bootsector happy);
> b) get a proper Linux kernel magic number.

To the best of my knowledge I have already resolved both of those concerns
in my current code.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02 21:09                     ` H. Peter Anvin
  2007-05-02 21:39                       ` Jeremy Fitzhardinge
@ 2007-05-02 21:39                       ` Jeremy Fitzhardinge
  2007-05-02 21:59                         ` H. Peter Anvin
  2007-05-02 21:59                         ` H. Peter Anvin
  2007-05-03  2:01                       ` Rusty Russell
  2007-05-03  2:01                       ` Rusty Russell
  3 siblings, 2 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-02 21:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Gerd Hoffmann, Eric W. Biederman, Jeff Garzik, patches,
	linux-kernel, Vivek Goyal, virtualization

H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>   
>> Hm, that's unfortunate.  How about an ELF file wrapped in some other
>> container, so that we can easily extract a properly formed ELF file?
>>
>>     
>
> Effectively the same thing as changing the magic number.  Note that the
> format for bzImage is pretty rigid, and it would be *highly* undesirable
> to muck that up.

So the bzImage structure is currently:

   1. old-style boot sector
   2. old-style boot info, followed by 0xaa55 at the end of the sector
   3. the HdrS boot param block
   4. setup.S boot code
   5. the self-decompressing kernel

If we make 5 actually an ELF file, containing properly formed Ehdr,
Phdrs (for all the mappings required), and the actual kernel
decompressor, relocator and compressed kernel data, then it would be
easy for the Xen domain builder to find that and use it as a basis for
loading.  I think it would just require the bzImage boot param block to
contain an offset of the start of the ELF file.  The contents of the ELF
file would be in a form where the normal boot code could just jump over
the ELF headers, directly into the segment data itself.

ie:

   1. old-style boot sector
   2. old-style boot info, followed by 0xaa55 at the end of the sector
   3. the HdrS boot param block
   4. setup.S boot code (jumps directly into 5.3)
   5. 32-bit self-decompressing kernel:
         1. Ehdr
         2. Phdrs for all necessary mappings
         3. decompressor/relocator .text
         4. compressed kernel data
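A sketch of what the domain builder's side of this layout might look like, assuming a hypothetical elf_offset value read from the boot param block (the proposal says such a field is needed but does not name it). It validates the embedded Ehdr and counts PT_LOAD program headers, treating e_phoff as relative to the start of the embedded ELF file. Types and constants come from <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the number of PT_LOAD segments in the ELF file embedded at
 * elf_offset (a hypothetical boot-param field), or -1 if it is invalid. */
static int count_load_segments(const uint8_t *img, size_t len, size_t elf_offset)
{
    Elf32_Ehdr eh;

    if (elf_offset > len || len - elf_offset < sizeof eh)
        return -1;
    memcpy(&eh, img + elf_offset, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32 ||
        eh.e_phentsize != sizeof(Elf32_Phdr))
        return -1;

    /* Program headers must fit inside the image. */
    if (eh.e_phoff > len - elf_offset)
        return -1;
    size_t phoff = elf_offset + eh.e_phoff;
    if ((len - phoff) / sizeof(Elf32_Phdr) < eh.e_phnum)
        return -1;

    int loads = 0;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        memcpy(&ph, img + phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_LOAD)
            loads++;
    }
    return loads;
}
```

The normal boot path would ignore all of this and jump straight past the headers into the decompressor text.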


Does that sound reasonable?

    J

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02 21:39                       ` Jeremy Fitzhardinge
@ 2007-05-02 21:59                         ` H. Peter Anvin
  2007-05-02 23:03                           ` Jeremy Fitzhardinge
                                             ` (3 more replies)
  2007-05-02 21:59                         ` H. Peter Anvin
  1 sibling, 4 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-02 21:59 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Gerd Hoffmann, Eric W. Biederman, Jeff Garzik, patches,
	linux-kernel, Vivek Goyal, virtualization

Jeremy Fitzhardinge wrote:
> 
> So the bzImage structure is currently:
> 
>    1. old-style boot sector
>    2. old-style boot info, followed by 0xaa55 at the end of the sector
>    3. the HdrS boot param block
>    4. setup.S boot code
>    5. the self-decompressing kernel
> 
> If we make 5 actually an ELF file, containing properly formed Ehdr,
> Phdrs (for all the mappings required), and the actual kernel
> decompressor, relocator and compressed kernel data, then it would be
> easy for the Xen domain builder to find that and use it as a basis for
> loading.  I think it would just require the bzImage boot param block to
> contain an offset of the start of the ELF file.  The contents of the ELF
> file would be in a form where the normal boot code could just jump over
> the ELF headers, directly into the segment data itself.
> 
> ie:
> 
>    1. old-style boot sector
>    2. old-style boot info, followed by 0xaa55 at the end of the sector
>    3. the HdrS boot param block
>    4. setup.S boot code (jumps directly into 5.3)
>    5. 32-bit self-decompressing kernel:
>          1. Ehdr
>          2. Phdrs for all necessary mappings
>          3. decompressor/relocator .text
>          4. compressed kernel data
> 
> Does that sound reasonable?
> 

I don't know if that would break any programs that are currently
bypassing the setup.  The existing setup protocol definitely allows
invoking an entry point which isn't 0x100000 (rather, the 32-bit
entrypoint is defined by code32_start); I'm not sure how Eric's
relocatable kernel patches (2.05 protocol) affect that, mostly because I
haven't seen any boot loaders which actually use it so I can't comment
on what their code looks like.
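For reference, here is a minimal sketch of how a loader that bypasses setup might read that entry point. The offsets come from Documentation/i386/boot.txt: `HdrS` is at 0x202, the protocol version at 0x206, and `code32_start` (2.00+) at 0x214. The function name is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_MAGIC        0x202	/* "HdrS" */
#define HDR_VERSION      0x206	/* boot protocol version, 2 bytes */
#define HDR_CODE32_START 0x214	/* 32-bit entry point, 4 bytes, 2.00+ */

static uint16_t hdr_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t hdr_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Return the 32-bit entry point named by code32_start, or 0 if the image
 * has no HdrS block or predates protocol 2.00.
 */
uint32_t bz_code32_start(const unsigned char *img, size_t len)
{
	if (len < HDR_CODE32_START + 4 ||
	    memcmp(img + HDR_MAGIC, "HdrS", 4) != 0 ||
	    hdr_le16(img + HDR_VERSION) < 0x0200)
		return 0;
	return hdr_le32(img + HDR_CODE32_START);
}
```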

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02 21:59                         ` H. Peter Anvin
@ 2007-05-02 23:03                           ` Jeremy Fitzhardinge
  2007-05-02 23:03                           ` Jeremy Fitzhardinge
                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-02 23:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Gerd Hoffmann, Eric W. Biederman, Jeff Garzik, patches,
	linux-kernel, Vivek Goyal, virtualization

H. Peter Anvin wrote:
> I don't know if that would break any programs that are currently
> bypassing the setup.  The existing setup protocol definitely allows
> invoking an entry point which isn't 0x100000 (rather, the 32-bit
> entrypoint is defined by code32_start); I'm not sure how Eric's
> relocatable kernel patches (2.05 protocol) affect that, mostly because I
> haven't seen any boot loaders which actually use it so I can't comment
> on what their code looks like.

Yes, I'd expect that code32_start would point into the ELF text
segment.   You could align things so that the entrypoint is still
actually 0x100000, or bump it up a bit to fit the ELF headers.  I have
to admit I don't quite understand how all that fits together at the moment.
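To put numbers on the "bump it up" option, suppose the ELF headers sit at the load address and the first segment is padded to the next alignment boundary. The entry point then moves up by the header size, rounded up to that boundary. This is a sketch under two assumptions: the Phdrs immediately follow the Ehdr, and the alignment is a power of two. The function names are hypothetical.

```c
#include <elf.h>
#include <stdint.h>

/* Ehdr plus phnum Phdrs laid out back to back (52 + 32 * phnum for ELF32). */
static uint32_t elf32_headers_size(unsigned int phnum)
{
	return (uint32_t)sizeof(Elf32_Ehdr) + phnum * (uint32_t)sizeof(Elf32_Phdr);
}

/*
 * Headers at load_base; first segment (and so code32_start) at the next
 * align boundary after them.  ASSUMPTION: align is a power of two.
 */
uint32_t entry_after_elf_headers(uint32_t load_base, unsigned int phnum,
				 uint32_t align)
{
	uint32_t end = load_base + elf32_headers_size(phnum);

	return (end + align - 1) & ~(align - 1);
}
```

For example, with three Phdrs and 4 KB alignment, 148 bytes of headers at 0x100000 would push the entry point to 0x101000. Keeping the entry at exactly 0x100000 would instead mean placing the headers just below 1 MB.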

    J


^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02 21:09                     ` H. Peter Anvin
                                         ` (2 preceding siblings ...)
  2007-05-03  2:01                       ` Rusty Russell
@ 2007-05-03  2:01                       ` Rusty Russell
  3 siblings, 0 replies; 217+ messages in thread
From: Rusty Russell @ 2007-05-03  2:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Jeff Garzik, patches, linux-kernel,
	virtualization, Vivek Goyal, Gerd Hoffmann, Eric W. Biederman

On Wed, 2007-05-02 at 14:09 -0700, H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
> > 
> > Hm, that's unfortunate.  How about an ELF file wrapped in some other
> > container, so that we can easily extract a properly formed ELF file?
> > 
> 
> Effectively the same thing as changing the magic number.  Note that the
> format for bzImage is pretty rigid, and it would be *highly* undesirable
> to muck that up.

To add some code to the debate, here's how lguest loads a bzImage (from
my draft documentation).  Almost anything would be an improvement:


/* A bzImage, unlike an ELF file, is not meant to be loaded.  You're
 * supposed to jump into it and it will unpack itself.  We can't do that
 * because the Guest can't run the unpacking code, and adding features to
 * lguest kills puppies, so we don't want to.
 *
 * The bzImage is formed by putting the decompressing code in front of the
 * compressed kernel code.  So we can simple scan through it looking for the
 * first "gzip" header, and start decompressing from there. */
static unsigned long load_bzimage(int fd, unsigned long *page_offset)
{
	unsigned char c;
	int state = 0;

	/* GZIP header is 0x1F 0x8B <method> <flags>... <compressed-by>. */
	while (read(fd, &c, 1) == 1) {
		switch (state) {
		case 0:
			if (c == 0x1F)
				state++;
			break;
		case 1:
			if (c == 0x8B)
				state++;
			else
				state = 0;
			break;
		case 2 ... 8:
			state++;
			break;
		case 9:
			/* One final check: "compressed under UNIX". */
			if (c == 0x03) {
				/* Seek back to the start of the gzip header. */
				lseek(fd, -10, SEEK_CUR);
				return unpack_bzimage(fd, page_offset);
			}
			/* False match: resume scanning just after the 0x1F. */
			lseek(fd, -9, SEEK_CUR);
			state = 0;
			break;
		}
	}
	errx(1, "Could not find kernel in bzImage");
}

/* Unfortunately the entire ELF image isn't compressed: the segments
 * which need loading are extracted and compressed raw.  This denies us the
 * information we need to make a fully-general loader. */
static unsigned long unpack_bzimage(int fd, unsigned long *page_offset)
{
	gzFile f;
	int ret, len = 0;
	/* A bzImage always gets loaded at physical address 1M.  This is
	 * actually configurable as CONFIG_PHYSICAL_START, but as the comment
	 * there says, "Don't change this unless you know what you are doing".
	 * Indeed. */
	void *img = (void *)0x100000;

	/* gzdopen takes our file descriptor (carefully placed at the start of
	 * the GZIP header we found) and returns a gzFile. */
	f = gzdopen(fd, "rb");
	/* Unfortunately, if we made a mistake and it wasn't really a gzip
	 * header, it will still read the file, but directly without
	 * decompressing it.  For us, that's a misfeature. */
	if (gzdirect(f))
		errx(1, "did not find correct gzip header");
	/* We read it into memory in 64k chunks until we hit the end. */
	while ((ret = gzread(f, img + len, 65536)) > 0)
		len += ret;
	if (ret < 0)
		err(1, "reading image from bzImage");

	verbose("Unpacked size %i addr %p\n", len, img);

	/* Without the ELF header, we can't tell the virtual-physical gap.  This is
	 * CONFIG_PAGE_OFFSET, and people do actually change it.  Fortunately,
	 * I have a clever way of figuring it out from the code itself.  */
	*page_offset = intuit_page_offset(img, len);

	/* Entry is physical address: convert to virtual */
	return (unsigned long)img + *page_offset;
}

/* Prepare to be SHOCKED and AMAZED.  And possibly a trifle nauseated.
 *
 * We know that CONFIG_PAGE_OFFSET sets what virtual address the kernel expects
 * to run at.  We don't know what that option was, but we can figure it out
 * approximately by looking at the addresses in the code.  I chose the common
 * case of reading a memory location into the %eax register:
 *
 *  movl <some-address>, %eax
 *
 * This gets encoded as five bytes: "0xA1 <4-byte-address>".  For example,
 * "0xA1 0x18 0x60 0x47 0xC0" loads the value at address 0xC0476018 into %eax.
 *
 * In this example we can guess that the kernel was compiled with
 * CONFIG_PAGE_OFFSET set to 0xC0000000 (it's always a round number).  If the
 * kernel were larger than 16MB, we might see 0xC1 addresses show up, but our
 * kernel isn't that bloated yet.
 *
 * Unfortunately, x86 has variable-length instructions, so finding this
 * particular instruction properly involves writing a disassembler.  Instead,
 * we rely on statistics.  We look for "0xA1" and tally the different bytes
 * which occur 4 bytes later (the "0xC0" in our example above).  When one of
 * those bytes appears more than three times, we can be reasonably confident
 * that it forms the start of CONFIG_PAGE_OFFSET.
 *
 * This is amazingly reliable. */
static unsigned long intuit_page_offset(unsigned char *img, unsigned long len)
{
	unsigned int i, possibilities[256] = { 0 };

	for (i = 0; i + 4 < len; i++) {
		/* mov 0xXXXXXXXX,%eax */
		if (img[i] == 0xA1 && ++possibilities[img[i+4]] > 3)
			return (unsigned long)img[i+4] << 24;
	}
	errx(1, "could not determine page offset");
}
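As an aside, the same search can be done on an in-memory buffer, which makes resuming after a false match trivial. This is a hypothetical variant, not lguest code. Unlike load_bzimage() it also requires the compression-method byte to be 8 (deflate). The gzip member header is ID1=0x1F, ID2=0x8B, CM, FLG, MTIME (4 bytes), XFL, OS.

```c
#include <stddef.h>

/*
 * Return the offset of the first gzip header with CM == 8 (deflate) and
 * OS == 3 (Unix), or -1 if none.  A false match simply continues at i + 1.
 */
long find_gzip_header(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 10 <= len; i++) {
		if (buf[i] == 0x1F && buf[i + 1] == 0x8B &&
		    buf[i + 2] == 8 && buf[i + 9] == 0x03)
			return (long)i;
	}
	return -1;
}
```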

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-02 21:59                         ` H. Peter Anvin
                                             ` (2 preceding siblings ...)
  2007-05-03  4:50                           ` Vivek Goyal
@ 2007-05-03  4:50                           ` Vivek Goyal
  2007-05-03  6:42                             ` Eric W. Biederman
  2007-05-03  6:42                             ` Eric W. Biederman
  3 siblings, 2 replies; 217+ messages in thread
From: Vivek Goyal @ 2007-05-03  4:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Gerd Hoffmann, Eric W. Biederman,
	Jeff Garzik, patches, linux-kernel, virtualization

On Wed, May 02, 2007 at 02:59:11PM -0700, H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
> > 
> > So the bzImage structure is currently:
> > 
> >    1. old-style boot sector
> >    2. old-style boot info, followed by 0xaa55 at the end of the sector
> >    3. the HdrS boot param block
> >    4. setup.S boot code
> >    5. the self-decompressing kernel
> > 
> > If we make 5 actually an ELF file, containing properly formed Ehdr,
> > Phdrs (for all the mappings required), and the actual kernel
> > decompressor, relocator and compressed kernel data, then it would be
> > easy for the Xen domain builder to find that and use it as a basis for
> > loading.  I think it would just require the bzImage boot param block to
> > contain an offset of the start of the ELF file.  The contents of the ELF
> > file would be in a form where the normal boot code could just jump over
> > the ELF headers, directly into the segment data itself.
> > 
> > ie:
> > 
> >    1. old-style boot sector
> >    2. old-style boot info, followed by 0xaa55 at the end of the sector
> >    3. the HdrS boot param block
> >    4. setup.S boot code (jumps directly into 5.3)
> >    5. 32-bit self-decompressing kernel:
> >          1. Ehdr
> >          2. Phdrs for all necessary mappings
> >          3. decompressor/relocator .text
> >          4. compressed kernel data
> > 
> > Does that sound reasonable?
> > 
> 
> I don't know if that would break any programs that are currently
> bypassing the setup.

I think the kexec bzImage loader will break. It bypasses the setup code and
jumps directly to the code present after the setup sectors (the decompressor).
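The code such a loader jumps to begins right after the setup sectors. Here is a sketch of computing its file offset from `setup_sects` (byte 0x1F1, where 0 means 4, per Documentation/i386/boot.txt). The helper name is made up.

```c
#include <stddef.h>

/* Boot sector + setup_sects 512-byte sectors; setup_sects == 0 means 4. */
size_t bz_protected_mode_offset(const unsigned char *img)
{
	unsigned int setup_sects = img[0x1F1];

	if (setup_sects == 0)
		setup_sects = 4;
	return (size_t)(setup_sects + 1) * 512;
}
```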

> The existing setup protocol definitely allows
> invoking an entry point which isn't 0x100000 (rather, the 32-bit
> entrypoint is defined by code32_start); I'm not sure how Eric's
> relocatable kernel patches (2.05 protocol) affect that, mostly because I
> haven't seen any boot loaders which actually use it so I can't comment
> on what their code looks like.

With the relocatable patches, if a boot loader decides to load the protected-mode
component at an address other than 1MB, it has to modify code32_start to
reflect the new location of the protected-mode code.
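Here is a minimal sketch of that step for a 2.05 loader. It uses the `kernel_alignment` (0x230) and `relocatable_kernel` (0x234) fields from Documentation/i386/boot.txt. The function name is hypothetical. The sketch only patches the header; the loader still has to copy the protected-mode code to `load_addr` itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RH_MAGIC            0x202	/* "HdrS" */
#define RH_VERSION          0x206	/* protocol version, 2 bytes */
#define RH_CODE32_START     0x214	/* 32-bit entry point, 4 bytes */
#define RH_KERNEL_ALIGNMENT 0x230	/* 4 bytes, protocol 2.05+ */
#define RH_RELOCATABLE      0x234	/* 1 byte, protocol 2.05+ */

static uint16_t rh_get16(const unsigned char *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t rh_get32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void rh_put32(unsigned char *p, uint32_t v)
{
	p[0] = v & 0xFF;
	p[1] = (v >> 8) & 0xFF;
	p[2] = (v >> 16) & 0xFF;
	p[3] = (v >> 24) & 0xFF;
}

/*
 * hdr is the loader's in-memory copy of the setup sectors.  Returns 0 after
 * pointing code32_start at load_addr, or -1 if the kernel is not
 * relocatable to that address.
 */
int bz_relocate_code32(unsigned char *hdr, size_t len, uint32_t load_addr)
{
	uint32_t align;

	if (len < RH_RELOCATABLE + 1 ||
	    memcmp(hdr + RH_MAGIC, "HdrS", 4) != 0 ||
	    rh_get16(hdr + RH_VERSION) < 0x0205 || !hdr[RH_RELOCATABLE])
		return -1;

	/* load_addr must be a multiple of kernel_alignment (a power of two). */
	align = rh_get32(hdr + RH_KERNEL_ALIGNMENT);
	if (align == 0 || (align & (align - 1)) != 0 ||
	    (load_addr & (align - 1)) != 0)
		return -1;

	rh_put32(hdr + RH_CODE32_START, load_addr);
	return 0;
}
```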

Thanks
Vivek

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-03  4:50                           ` Vivek Goyal
  2007-05-03  6:42                             ` Eric W. Biederman
@ 2007-05-03  6:42                             ` Eric W. Biederman
  2007-05-03  7:05                               ` Jeremy Fitzhardinge
                                                 ` (3 more replies)
  1 sibling, 4 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-03  6:42 UTC (permalink / raw)
  To: vgoyal
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, Gerd Hoffmann, Jeff Garzik,
	patches, linux-kernel, virtualization

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Wed, May 02, 2007 at 02:59:11PM -0700, H. Peter Anvin wrote:
>> Jeremy Fitzhardinge wrote:
>> > 
>> > So the bzImage structure is currently:
>> > 
>> >    1. old-style boot sector
>> >    2. old-style boot info, followed by 0xaa55 at the end of the sector
>> >    3. the HdrS boot param block
>> >    4. setup.S boot code
>> >    5. the self-decompressing kernel
>> > 
>> > If we make 5 actually an ELF file, containing properly formed Ehdr,
>> > Phdrs (for all the mappings required), and the actual kernel
>> > decompressor, relocator and compressed kernel data, then it would be
>> > easy for the Xen domain builder to find that and use it as a basis for
>> > loading.  I think it would just require the bzImage boot param block to
>> > contain an offset of the start of the ELF file.  The contents of the ELF
>> > file would be in a form where the normal boot code could just jump over
>> > the ELF headers, directly into the segment data itself.
>> > 
>> > ie:
>> > 
>> >    1. old-style boot sector
>> >    2. old-style boot info, followed by 0xaa55 at the end of the sector
>> >    3. the HdrS boot param block
>> >    4. setup.S boot code (jumps directly into 5.3)
>> >    5. 32-bit self-decompressing kernel:
>> >          1. Ehdr
>> >          2. Phdrs for all necessary mappings
>> >          3. decompressor/relocator .text
>> >          4. compressed kernel data
>> > 
>> > Does that sound reasonable?
>> > 
>> 
>> I don't know if that would break any programs that are currently
>> bypassing the setup.

In the above design, I think everything will break unless we make 5.1
and 5.2 into 4.2 and 4.3.

> I think kexec bzImage loader will break. It bypasses the setup code and
> directly jumps to the code present after setup sectors(decompressor).

Quite likely.  Except for a handful of bytes, the boot sector actually
goes unused, so we can put extra header information there; I actually
have patches for placing an ELF header there.
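Here is a quick way to gauge how much room that leaves. Assume the ELF header starts at offset 0 and its program headers follow it. Everything then has to end before the setup header at 0x1F1. The sketch below computes that upper bound for ELF32 and ELF64. It assumes nothing else in the sector is needed, so the "handful of bytes" above would lower the real figure. It is not taken from those patches, and the names are illustrative.

```c
#include <elf.h>
#include <stddef.h>

/* ASSUMPTION: Ehdr at offset 0, Phdrs right after, all below 0x1F1. */
#define BOOTSECT_FREE 0x1F1

unsigned int max_phdrs32_in_bootsector(void)
{
	return (BOOTSECT_FREE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr);
}

unsigned int max_phdrs64_in_bootsector(void)
{
	return (BOOTSECT_FREE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr);
}
```

That works out to at most 13 ELF32 or 7 ELF64 program headers, which should be plenty for a kernel image.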

If we wanted to put an ELF header in the middle, we would have to put
it at the end of the setup sectors rather than at the beginning of the
raw protected-mode kernel image.

>> The existing setup protocol definitely allows
>> invoking an entry point which isn't 0x100000 (rather, the 32-bit
>> entrypoint is defined by code32_start); I'm not sure how Eric's
>> relocatable kernel patches (2.05 protocol) affect that, mostly because I
>> haven't seen any boot loaders which actually use it so I can't comment
>> on what their code looks like.
>
> With relocatable patches, if a boot loader decides to load protected mode
> component at non-1MB address, then it shall have to modify code32_start to
> reflect the new location of protected mode code.

Yes.  And this aspect of the relocatable kernel is all Vivek.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-03  4:50                           ` Vivek Goyal
@ 2007-05-03  6:42                             ` Eric W. Biederman
  2007-05-03  6:42                             ` Eric W. Biederman
  1 sibling, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-03  6:42 UTC (permalink / raw)
  To: vgoyal
  Cc: Jeff Garzik, patches, linux-kernel, virtualization,
	Gerd Hoffmann, H. Peter Anvin

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Wed, May 02, 2007 at 02:59:11PM -0700, H. Peter Anvin wrote:
>> Jeremy Fitzhardinge wrote:
>> > 
>> > So the bzImage structure is currently:
>> > 
>> >    1. old-style boot sector
>> >    2. old-style boot info, followed by 0xaa55 at the end of the sector
>> >    3. the HdrS boot param block
>> >    4. setup.S boot code
>> >    5. the self-decompressing kernel
>> > 
>> > If we make 5 actually an ELF file, containing properly formed Ehdr,
>> > Phdrs (for all the mappings required), and the actual kernel
>> > decompressor, relocator and compressed kernel data, then it would be
>> > easy for the Xen domain builder to find that and use it as a basis for
>> > loading.  I think it would just require the bzImage boot param block to
>> > contain an offset of the start of the ELF file.  The contents of the ELF
>> > file would be in a form where the normal boot code could just jump over
>> > the ELF headers, directly into the segment data itself.
>> > 
>> > ie:
>> > 
>> >    1. old-style boot sector
>> >    2. old-style boot info, followed by 0xaa55 at the end of the sector
>> >    3. the HdrS boot param block
>> >    4. setup.S boot code (jumps directly into 5.3)
>> >    5. 32-bit self-decompressing kernel:
>> >          1. Ehdr
>> >          2. Phdrs for all necessary mappings
>> >          3. decompressor/relocator .text
>> >          4. compressed kernel data
>> > 
>> > Does that sound reasonable?
>> > 
>> 
>> I don't know if that would break any programs that are currently
>> bypassing the setup.

I think everything will break, unless we make 5.1 and 5.2 
into 4.2 and 4.3.  In the above design.

> I think kexec bzImage loader will break. It bypasses the setup code and
> directly jumps to the code present after setup sectors(decompressor).

Quite likely. Except for a handful of bytes, the boot sector actually
goes unused, so we can put extra header information there; I actually
have patches for placing an ELF header there.

If we wanted to put an ELF header in the middle, we would have to put
it at the end of the setup sectors rather than at the beginning of the
raw protected mode kernel image.
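To make "the end of the setup sectors" concrete, here is a minimal sketch (not code from this thread) of how a loader finds where the setup sectors stop and the raw protected-mode image begins. It assumes only the documented x86 boot protocol fields: `setup_sects` at offset 0x1F1 (0 meaning 4), the 0xAA55 boot flag at 0x1FE, and the "HdrS" magic at 0x202. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Setup-header offsets from the x86 Linux boot protocol. */
#define SETUP_SECTS_OFF 0x1F1 /* u8: number of 512-byte setup sectors */
#define BOOT_FLAG_OFF   0x1FE /* u16: must be 0xAA55 */
#define HDRS_MAGIC_OFF  0x202 /* "HdrS" signature */

/* Return the file offset at which the protected-mode kernel begins
 * (just past the boot sector and the setup sectors), or 0 if the
 * buffer does not look like a bzImage. */
size_t bzimage_pm_offset(const uint8_t *img, size_t len)
{
    if (len < HDRS_MAGIC_OFF + 4)
        return 0;

    uint16_t flag = (uint16_t)(img[BOOT_FLAG_OFF] | (img[BOOT_FLAG_OFF + 1] << 8));
    if (flag != 0xAA55 || memcmp(img + HDRS_MAGIC_OFF, "HdrS", 4) != 0)
        return 0;

    unsigned setup_sects = img[SETUP_SECTS_OFF];
    if (setup_sects == 0)
        setup_sects = 4; /* historical default when the field is 0 */

    /* +1 accounts for the boot sector itself. */
    size_t off = (size_t)(setup_sects + 1) * 512;
    return off <= len ? off : 0;
}
```

An ELF header placed "at the end of the setup sectors" would sit just before the offset this returns, leaving the protected-mode entry that loaders like kexec jump to untouched.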

>> The existing setup protocol definitely allows
>> invoking an entry point which isn't 0x100000 (rather, the 32-bit
>> entrypoint is defined by code32_start); I'm not sure how Eric's
>> relocatable kernel patches (2.05 protocol) affect that, mostly because I
>> haven't seen any boot loaders which actually use it so I can't comment
>> on what their code looks like.
>
> With relocatable patches, if a boot loader decides to load protected mode
> component at non-1MB address, then it shall have to modify code32_start to
> reflect the new location of protected mode code.

Yes.  And this aspect of the relocatable kernel is all Vivek.
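A minimal sketch of the loader side Vivek describes, assuming the 2.05 setup-header layout: `code32_start` (u32) at 0x214, `kernel_alignment` (u32) at 0x230, `relocatable_kernel` (u8) at 0x234, and the protocol version (u16) at 0x206. The function name is hypothetical; verify the offsets against the boot protocol documentation in your tree.

```c
#include <stddef.h>
#include <stdint.h>

#define VERSION_OFF      0x206 /* u16: boot protocol version */
#define CODE32_START_OFF 0x214 /* u32: 32-bit entry point */
#define KERNEL_ALIGN_OFF 0x230 /* u32: required alignment (2.05+) */
#define RELOCATABLE_OFF  0x234 /* u8: nonzero if relocatable (2.05+) */

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xFF;
    p[1] = (v >> 8) & 0xFF;
    p[2] = (v >> 16) & 0xFF;
    p[3] = (v >> 24) & 0xFF;
}

/* Record a non-1MB load address for the protected-mode image by
 * rewriting code32_start. Returns 0 on success, -1 if the kernel is
 * not relocatable or the address violates kernel_alignment. */
int bzimage_set_load_addr(uint8_t *hdr, size_t len, uint32_t load_addr)
{
    if (len < RELOCATABLE_OFF + 1)
        return -1;

    uint16_t version = (uint16_t)(hdr[VERSION_OFF] | hdr[VERSION_OFF + 1] << 8);
    if (version < 0x0205 || !hdr[RELOCATABLE_OFF])
        return -1;

    uint32_t align = get_le32(hdr + KERNEL_ALIGN_OFF);
    if (align == 0 || load_addr % align != 0)
        return -1;

    put_le32(hdr + CODE32_START_OFF, load_addr);
    return 0;
}
```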

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-03  6:42                             ` Eric W. Biederman
  2007-05-03  7:05                               ` Jeremy Fitzhardinge
@ 2007-05-03  7:05                               ` Jeremy Fitzhardinge
  2007-05-03 13:23                                 ` Eric W. Biederman
  2007-05-03 13:23                                 ` Eric W. Biederman
  2007-05-08 16:41                               ` yhlu
  2007-05-08 16:41                               ` yhlu
  3 siblings, 2 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-03  7:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, H. Peter Anvin, Gerd Hoffmann, Jeff Garzik, patches,
	linux-kernel, virtualization

Eric W. Biederman wrote:
> Vivek Goyal <vgoyal@in.ibm.com> writes:
>
>   
>> On Wed, May 02, 2007 at 02:59:11PM -0700, H. Peter Anvin wrote:
>>     
>>> Jeremy Fitzhardinge wrote:
>>>       
>>>> So the bzImage structure is currently:
>>>>
>>>>    1. old-style boot sector
>>>>    2. old-style boot info, followed by 0xaa55 at the end of the sector
>>>>    3. the HdrS boot param block
>>>>    4. setup.S boot code
>>>>    5. the self-decompressing kernel
>>>>
>>>> If we make 5 actually an ELF file, containing properly formed Ehdr,
>>>> Phdrs (for all the mappings required), and the actual kernel
>>>> decompressor, relocator and compressed kernel data, then it would be
>>>> easy for the Xen domain builder to find that and use it as a basis for
>>>> loading.  I think it would just require the bzImage boot param block to
>>>> contain an offset of the start of the ELF file.  The contents of the ELF
>>>> file would be in a form where the normal boot code could just jump over
>>>> the ELF headers, directly into the segment data itself.
>>>>
>>>> ie:
>>>>
>>>>    1. old-style boot sector
>>>>    2. old-style boot info, followed by 0xaa55 at the end of the sector
>>>>    3. the HdrS boot param block
>>>>    4. setup.S boot code (jumps directly into 5.3)
>>>>    5. 32-bit self-decompressing kernel:
>>>>          1. Ehdr
>>>>          2. Phdrs for all necessary mappings
>>>>          3. decompressor/relocator .text
>>>>          4. compressed kernel data
>>>>
>>>> Does that sound reasonable?
>>>>
>>>>         
>>> I don't know if that would break any programs that are currently
>>> bypassing the setup.
>>>       
>
> I think everything will break, unless we make 5.1 and 5.2 
> into 4.2 and 4.3.  In the above design.
>
>   
>> I think kexec bzImage loader will break. It bypasses the setup code and
>> directly jumps to the code present after setup sectors(decompressor).
>>     
>
> Quite likely.    The boot sector except for a handful of bytes actually
> goes unused so we can put extra header information there, I actually
> have patches for placing an ELF header there.

OK, whatever you think will work.  But I do think it should be a proper
ELF file with a correct magic number, so that you can just point an ELF
file parser at it and have it work (which means, of course, that all the
file offsets are offsets from the start of the Ehdr, rather than from
the start of the bzImage).
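A minimal sketch of what "point an ELF file parser at it" could mean, assuming a hypothetical boot-param field (`elf_off` here) that gives the offset of the embedded ELF file. Every offset inside the ELF file, such as `e_phoff`, is relative to the Ehdr and only becomes a bzImage offset after `elf_off` is added. It uses the standard `<elf.h>` types.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* elf_off is a hypothetical boot-param value: where the embedded ELF
 * file starts inside the bzImage. Returns the absolute bzImage offset
 * of the Phdr table (0 on error) and optionally copies out the Ehdr. */
size_t embedded_elf_phdrs(const uint8_t *img, size_t len, size_t elf_off,
                          Elf32_Ehdr *out)
{
    Elf32_Ehdr eh;

    if (elf_off > len || len - elf_off < sizeof eh)
        return 0;
    memcpy(&eh, img + elf_off, sizeof eh); /* avoid unaligned casts */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32)
        return 0;
    if (eh.e_phoff == 0 || eh.e_phentsize != sizeof(Elf32_Phdr))
        return 0;

    /* Bounds are checked against the ELF file, not the whole image. */
    size_t elf_len = len - elf_off;
    size_t table = (size_t)eh.e_phnum * eh.e_phentsize;
    if (eh.e_phoff > elf_len || table > elf_len - eh.e_phoff)
        return 0;

    if (out)
        *out = eh;
    return elf_off + eh.e_phoff; /* e_phoff is Ehdr-relative */
}
```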

You haven't specifically commented on using the Phdrs as a way of
specifying the mappings required for decompression and early kernel
execution.  It seems pretty natural to me, but I guess that raises the
general question of what execution environment the kernel can expect to
find itself in, and which modes of booting will actually enable paging
and establish any kinds of mapping at all.

In the Xen case, it's obviously the domain builder who creates the
mappings, and we can easily implement p != v mappings.  But when booting
native, presumably paging is off at this stage, and only identity maps
can be implemented.  I guess the rough rule is that if paging is enabled
on entry, the kernel should expect all the bzImage mappings to be in
place, but if paging is off, well, the question is moot.
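The "rough rule" above can be expressed as a check over the Phdrs. A minimal sketch, assuming the 32-bit ELF layout discussed here: a loader entering with paging disabled can only honour `PT_LOAD` segments whose virtual and physical addresses match, so any p != v segment means someone (the domain builder, or the kernel itself) has to build page tables first. The function name is hypothetical.

```c
#include <elf.h>
#include <stddef.h>

/* Returns nonzero if any loadable segment asks for a p != v mapping,
 * which a loader that enters with paging disabled cannot provide. */
int phdrs_need_paging(const Elf32_Phdr *ph, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ph[i].p_type == PT_LOAD && ph[i].p_vaddr != ph[i].p_paddr)
            return 1;
    return 0; /* identity mappings only: fine with paging off */
}
```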

    J

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-03  7:05                               ` Jeremy Fitzhardinge
  2007-05-03 13:23                                 ` Eric W. Biederman
@ 2007-05-03 13:23                                 ` Eric W. Biederman
  2007-05-03 16:23                                   ` Jeremy Fitzhardinge
  2007-05-03 16:23                                   ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-03 13:23 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: vgoyal, H. Peter Anvin, Gerd Hoffmann, Jeff Garzik, patches,
	linux-kernel, virtualization

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> OK, whatever you think will work.  But I do think it should be a proper
> ELF file with a correct magic number, so that you can just point an ELF
> file parser at it and have it work (which means, of course, that all the
> file offsets are offsets from the start of the Ehdr, rather than from
> the start of the bzImage).

Yes.  I guess in this context, I am generally for building the ELF
headers by hand instead of with a linker script, because then we
know exactly what is happening and can ensure everything is just so.
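As an illustration of "building the ELF headers by hand", here is a minimal sketch that spells out every Ehdr field explicitly rather than relying on a linker script. In the kernel this would more likely be emitted with assembler data directives; the C form, the function name, and the choice of `EM_386`/`ET_EXEC` are assumptions made for the example.

```c
#include <elf.h>
#include <string.h>

/* Build a 32-bit Ehdr field by field, so nothing depends on linker
 * behaviour. Phdrs are assumed to follow the Ehdr immediately. */
Elf32_Ehdr make_ehdr(Elf32_Addr entry, Elf32_Half phnum)
{
    Elf32_Ehdr eh;

    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);
    eh.e_ident[EI_CLASS]   = ELFCLASS32;
    eh.e_ident[EI_DATA]    = ELFDATA2LSB;
    eh.e_ident[EI_VERSION] = EV_CURRENT;

    eh.e_type      = ET_EXEC;
    eh.e_machine   = EM_386;
    eh.e_version   = EV_CURRENT;
    eh.e_entry     = entry;
    eh.e_phoff     = sizeof(Elf32_Ehdr); /* Ehdr-relative offset */
    eh.e_ehsize    = sizeof(Elf32_Ehdr);
    eh.e_phentsize = sizeof(Elf32_Phdr);
    eh.e_phnum     = phnum;
    return eh;
}
```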

> You haven't specifically commented on using the Phdrs as a way of
> specifying the mappings required for decompression and early kernel
> execution.  It seems pretty natural to me, but I guess that raises the
> general question of what execution environment the kernel can expect to
> find itself in, and which modes of booting will actually enable paging
> and establish any kinds of mapping at all.

Sorry for not being clear: I have been expecting to do this for years;
it is one of the reasons I keep coming back to putting an ELF header
on the bzImage.  arch/x86_64/kernel already does this to some extent,
as it has to set up some identity page mappings for itself in the
case where it has to do the switch from real to protected mode itself.
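For a feel of what "some identity page mappings" involves on x86_64, here is a minimal sketch, similar in spirit to (but not taken from) the early boot code: it fills a PML4, PDPT and page directory so that virtual equals physical for the first 1 GiB using 2 MiB pages. The function name, the 1 GiB extent, and the table physical addresses are assumptions for illustration.

```c
#include <stdint.h>

#define PTE_PRESENT 0x001ULL
#define PTE_RW      0x002ULL
#define PTE_PS      0x080ULL /* 2 MiB page in a page-directory entry */
#define PAGE_2M     0x200000ULL

/* Fill three 4 KiB tables so that virtual == physical for the first
 * 1 GiB. pdpt_phys and pd_phys are the physical addresses at which
 * the pdpt and pd tables will live. */
void build_identity_1g(uint64_t pml4[512], uint64_t pdpt[512],
                       uint64_t pd[512], uint64_t pdpt_phys,
                       uint64_t pd_phys)
{
    for (int i = 0; i < 512; i++)
        pml4[i] = pdpt[i] = 0;

    pml4[0] = pdpt_phys | PTE_PRESENT | PTE_RW; /* VA 0..512 GiB */
    pdpt[0] = pd_phys | PTE_PRESENT | PTE_RW;   /* VA 0..1 GiB */

    for (int i = 0; i < 512; i++) /* 512 x 2 MiB = 1 GiB */
        pd[i] = (uint64_t)i * PAGE_2M | PTE_PRESENT | PTE_RW | PTE_PS;
}
```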

> In the Xen case, its obviously the domain builder who creates the
> mappings, and we can easily implement p != v mappings.  But when booting
> native, presumably paging is off at this stage, and only identity maps
> can be implemented.  I guess the rough rule is that if paging is enabled
> on entry, the kernel should expect all the bzImage mappings to be in
> place, but if paging is off, well, the question is moot.

Right.  Except that there is a bit of a catch-22 in setting up the page
tables in para-virtualized environments, and I'm not at all certain
what the gain of setting up p != v mappings is.

Having just written some C code that runs fairly successfully with
p != v, I'm not too concerned about arch/i386.  arch/x86_64 ought to
work with a similar level of effort, although the expectations there
are a little different.  So while I don't necessarily consider running
at p != v, when compiled to run at v, to be safe in general, it should
work for the cases we are interested in.  Setting up the page tables
for arch/x86_64 will be more interesting.

Part of what I find compelling about this is that our initial page
tables for Linux have always had more going on than the virtual
addresses just being at a constant offset from the physical addresses,
so the actions of the current domain builders have me concerned that
they may be violating some early Linux booting assumptions and are
currently just getting lucky.  Moving the page table setup code into
the kernel removes that dependency from the domain builders.
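One example of "more going on than a constant offset": early i386 page tables map the same low physical memory twice, once identity-mapped (so the code that turns paging on keeps running) and once at `PAGE_OFFSET`. A minimal non-PAE sketch, assuming the default 3G/1G split (`PAGE_OFFSET` = 0xC0000000, which is configurable); the function name is hypothetical.

```c
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u /* default 3G/1G split; configurable */
#define PDE_PRESENT 0x001u
#define PDE_RW      0x002u

/* Non-PAE i386: each page-directory entry covers 4 MiB (va >> 22).
 * Point both the identity slot and the PAGE_OFFSET slot at the same
 * page table, so the 4 MiB chunk is reachable at va == pa and at
 * va == pa + PAGE_OFFSET. Requires phys_4m_index < 256. */
void map_low_twice(uint32_t pgd[1024], uint32_t pt_phys,
                   uint32_t phys_4m_index)
{
    uint32_t entry = pt_phys | PDE_PRESENT | PDE_RW;

    pgd[phys_4m_index] = entry;                       /* identity */
    pgd[phys_4m_index + (PAGE_OFFSET >> 22)] = entry; /* kernel VA */
}
```

A domain builder that provides only the second of these mappings would break any early code that still relies on the first.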

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-03 13:23                                 ` Eric W. Biederman
@ 2007-05-03 16:23                                   ` Jeremy Fitzhardinge
  2007-05-03 16:23                                   ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-03 16:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, H. Peter Anvin, Gerd Hoffmann, Jeff Garzik, patches,
	linux-kernel, virtualization

Eric W. Biederman wrote:
> Yes.  I guess in this context, I am generally for building the ELF
> headers by hand instead of with a linker script, because then we
> know exactly what is happening and can ensure everything is just so.
>   

Yes, it seems easiest - particularly given how flaky binutils can get
when you really try to control its ELF generation.

> Sorry, for not being clear I have been expecting to do this for years,
> it is one of the reasons I keep coming back to putting an ELF header
> on the bzImage. 
>   

OK.  It seems obvious, but I just wanted to make sure ;)

>> In the Xen case, its obviously the domain builder who creates the
>> mappings, and we can easily implement p != v mappings.  But when booting
>> native, presumably paging is off at this stage, and only identity maps
>> can be implemented.  I guess the rough rule is that if paging is enabled
>> on entry, the kernel should expect all the bzImage mappings to be in
>> place, but if paging is off, well, the question is moot.
>>     
>
> Right.  Except that there is a bit of a catch 22 in the
> para-virtualized environments of setting up the page tables, I'm not
> at all certain what the gain of setting up p != v mappings are.
>   

Well, that's more or less it.  If the decompressor ends up jumping to
startup_32, and that immediately goes into xen_start_kernel(), then
we're still running on the initial bzImage p=v pagetables.  At the
moment, the domain builder maps the kernel's vmlinux to the vaddrs
in its Phdrs, so there's no need to do any more boot-time pagetable
manipulation.  If we come out of bzImage with only identity mappings,
then obviously the Xen case will need to do the same pagetable setup as
the native case, which is good so long as we can work out how to share
the code to do so.

For i386, it looks like this will be tricky because at this point:

    * we're not running at the linked address, so C code will be tricky
      and non-standard
    * we need to deal with multiple hypervisors and their constraints on
      what can be in a pagetable
    * we could be running with no paging, or paging in either non-PAE or
      PAE modes

Writing some code which can deal with all of those at once will be an
interesting exercise.
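To show why the third bullet alone complicates shared code: the same 32-bit virtual address indexes the tables differently in non-PAE mode (10 + 10 + 12 bits, 4-byte entries) and PAE mode (2 + 9 + 9 + 12 bits, 8-byte entries). A minimal sketch with hypothetical names:

```c
#include <stdint.h>

struct va_split {
    unsigned pdpt; /* PAE only: bits 31-30 */
    unsigned pd;   /* page-directory index */
    unsigned pt;   /* page-table index */
    unsigned off;  /* offset within the 4 KiB page */
};

/* Non-PAE: 1024-entry directory and tables. */
struct va_split split_nonpae(uint32_t va)
{
    struct va_split s = { 0, va >> 22, (va >> 12) & 0x3FF, va & 0xFFF };
    return s;
}

/* PAE: 4-entry PDPT, then 512-entry directory and tables. */
struct va_split split_pae(uint32_t va)
{
    struct va_split s = { va >> 30, (va >> 21) & 0x1FF,
                          (va >> 12) & 0x1FF, va & 0xFFF };
    return s;
}
```

For example, 0xC0100000 lands in directory slot 768 without PAE, but in PDPT slot 3, directory slot 0 with PAE, so pagetable-building code has to be written separately for each mode.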

> Part of what I find compelling about this is our initial page tables
> for linux have always had more going on than the virtual addresses
> just being at a constant offset from of the physical addresses, so
> the actions of the current domain builders have me concerned that they
> may be violating some early linux booting assumptions and are
> currently just getting lucky.  Moving the page table setup code into
> the kernel removes that dependency from the domain builders.

The nice thing about having the domain builder create the pagetables is
that it turns it from a tricky bootstrap problem into a relatively easy
job.  The main thing is that the domain builder can create a scaffolding
pagetable which is enough to get everything started.  Once you have that
in place, it's pretty easy to update it to set precisely the right bits
in the ptes, etc.

It also means that the path for Xen vs native will be more similar,
because the bzImage code won't need to deal with pagetable setup at all:
for native it won't matter, and for Xen it has already been done.  It
only matters once we hit the 32-bit kernel-proper code, and we diverge
at that point anyway.

    J

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-03  6:42                             ` Eric W. Biederman
  2007-05-03  7:05                               ` Jeremy Fitzhardinge
  2007-05-03  7:05                               ` Jeremy Fitzhardinge
@ 2007-05-08 16:41                               ` yhlu
  2007-05-08 17:18                                 ` Eric W. Biederman
                                                   ` (3 more replies)
  2007-05-08 16:41                               ` yhlu
  3 siblings, 4 replies; 217+ messages in thread
From: yhlu @ 2007-05-08 16:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, H. Peter Anvin, Jeremy Fitzhardinge, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization

On 5/2/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Vivek Goyal <vgoyal@in.ibm.com> writes:
>
> > On Wed, May 02, 2007 at 02:59:11PM -0700, H. Peter Anvin wrote:
> >> Jeremy Fitzhardinge wrote:
> >> >
> >> > So the bzImage structure is currently:
> >> >
> >> >    1. old-style boot sector
> >> >    2. old-style boot info, followed by 0xaa55 at the end of the sector
> >> >    3. the HdrS boot param block
> >> >    4. setup.S boot code
> >> >    5. the self-decompressing kernel
> >> >

Eric,

With the latest changes that make vmlinux an ELF64 file and make bzImage
do the switch to 64-bit long mode, the kernel started via kexec cannot
get a VGA console, but the serial console works well. I wonder if
setup.S is skipped in bzImage via the kexec path.

Or did I miss something?

#!/bin/bash
./kexec -t bzImage -l bzImage_2.6.22_k8.1 \
    --command-line="apic=debug acpi_dbg_level=0x00000007 pci=routeirq snd-hda-intel.enable_msi=1 ramdisk_size=65536 root=/dev/ram0 rw ip=dhcp console=tty0 console=ttyS0,9600n8" \
    --ramdisk=mydisk8_x86_64.gz


YH

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 16:41                               ` yhlu
  2007-05-08 17:18                                 ` Eric W. Biederman
@ 2007-05-08 17:18                                 ` Eric W. Biederman
  2007-05-08 17:33                                     ` yhlu
  2007-05-08 17:24                                 ` Vivek Goyal
  2007-05-08 17:24                                 ` Vivek Goyal
  3 siblings, 1 reply; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-08 17:18 UTC (permalink / raw)
  To: yhlu
  Cc: vgoyal, H. Peter Anvin, Jeremy Fitzhardinge, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization

yhlu <yhlu.kernel@gmail.com> writes:

> Eric,
>
> With the latest change that make vmlinux to be elf64 and make bzImage
> do switch to 64bit long mode, the kernel started via kexec can not get
> VGA console. but the serial console works well. I wonder if the
> setup.S is skipped in bzImage via kexec path.

Yes.  setup.S has always been skipped by bzImage via the kexec path
unless you explicitly tell /sbin/kexec to use the 16-bit entry point.

Is not having a VGA console a new thing, or is it something you just noticed?

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 16:41                               ` yhlu
                                                   ` (2 preceding siblings ...)
  2007-05-08 17:24                                 ` Vivek Goyal
@ 2007-05-08 17:24                                 ` Vivek Goyal
  2007-05-08 17:34                                   ` yhlu
  2007-05-08 17:34                                   ` yhlu
  3 siblings, 2 replies; 217+ messages in thread
From: Vivek Goyal @ 2007-05-08 17:24 UTC (permalink / raw)
  To: yhlu
  Cc: Eric W. Biederman, H. Peter Anvin, Jeremy Fitzhardinge,
	Gerd Hoffmann, Jeff Garzik, patches, linux-kernel,
	virtualization

On Tue, May 08, 2007 at 09:41:09AM -0700, yhlu wrote:
> On 5/2/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >Vivek Goyal <vgoyal@in.ibm.com> writes:
> >
> >> On Wed, May 02, 2007 at 02:59:11PM -0700, H. Peter Anvin wrote:
> >>> Jeremy Fitzhardinge wrote:
> >>> >
> >>> > So the bzImage structure is currently:
> >>> >
> >>> >    1. old-style boot sector
> >>> >    2. old-style boot info, followed by 0xaa55 at the end of the sector
> >>> >    3. the HdrS boot param block
> >>> >    4. setup.S boot code
> >>> >    5. the self-decompressing kernel
> >>> >
> 
> Eric,
> 
> With the latest change that make vmlinux to be elf64 and make bzImage
> do switch to 64bit long mode, the kernel started via kexec can not get
> VGA console. but the serial console works well. I wonder if the
> setup.S is skipped in bzImage via kexec path.
> 
> or i missed sth?
> 

Hi,

setup.S is never executed while doing kexec (unless somebody chooses to
do a real mode entry), and these patches don't change this behaviour.

Tomorrow I will test VGA behaviour on my machine. Are you using some
special frame buffer mode etc?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 17:18                                 ` Eric W. Biederman
@ 2007-05-08 17:33                                     ` yhlu
  0 siblings, 0 replies; 217+ messages in thread
From: yhlu @ 2007-05-08 17:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, H. Peter Anvin, Jeremy Fitzhardinge, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization

On 5/8/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Yes.  setup.S has always been skipped by bzImage via the kexec path
> unless you explicitly tell /sbin/kexec to use the 16bit entry point.
>
> Is not having a VGA console a new thing, or is it something you just noticed?
>
> Eric
>

Before the changes, it worked well.

With --real-mode, the machine resets.
With --reset-vga, I get

Kernel alive
kernel direct mapping tables up to 100000000 @ 8000-d000

on the VGA monitor.

YH

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 17:24                                 ` Vivek Goyal
@ 2007-05-08 17:34                                   ` yhlu
  2007-05-08 17:34                                   ` yhlu
  1 sibling, 0 replies; 217+ messages in thread
From: yhlu @ 2007-05-08 17:34 UTC (permalink / raw)
  To: vgoyal
  Cc: Eric W. Biederman, H. Peter Anvin, Jeremy Fitzhardinge,
	Gerd Hoffmann, Jeff Garzik, patches, linux-kernel,
	virtualization

On 5/8/07, Vivek Goyal <vgoyal@in.ibm.com> wrote:
> setup.S is never executed while doing kexec (unless somebody chooses to
> do a real-mode entry), and these patches don't change this behaviour.
>
> Tomorrow I will test VGA behaviour on my machine. Are you using some
> special frame buffer mode etc?
>

I disabled the FB in the kernel.

YH

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 17:33                                     ` yhlu
  (?)
@ 2007-05-08 18:51                                     ` yhlu
  2007-05-08 19:01                                       ` yhlu
                                                         ` (5 more replies)
  -1 siblings, 6 replies; 217+ messages in thread
From: yhlu @ 2007-05-08 18:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, H. Peter Anvin, Jeremy Fitzhardinge, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization

Eric,

I tried to load vmlinux with kexec and got
Ramdisks not supported with generic elf arguments

So I used mkelfImage with my patch (which converts ELF64 to ELF32) to make
an ELF32 image, loaded that with kexec, and still could not get a VGA
console; the serial console works well.

The mkelfImage 2.7 patch is at
http://72.14.253.104/search?q=cache:fuxOvFw3ZIIJ:lists.osdl.org/pipermail/fastboot/attachments/20061108/009064a6/attachment.obj+mkelfImage+2.7+patch&hl=en&ct=clnk&cd=4&gl=us

So the problem is not bzImage related; it is somewhere in vmlinux.

YH

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 18:51                                     ` yhlu
  2007-05-08 19:01                                       ` yhlu
@ 2007-05-08 19:01                                       ` yhlu
  2007-05-08 19:11                                       ` Eric W. Biederman
                                                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 217+ messages in thread
From: yhlu @ 2007-05-08 19:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, H. Peter Anvin, Jeremy Fitzhardinge, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization

> the mkelfImage 2.7 patch is at

https://lists.linux-foundation.org/pipermail/fastboot/attachments/20061108/009064a6/mkelfImage_2.7_amd64_1108-0001.obj

YH

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 18:51                                     ` yhlu
  2007-05-08 19:01                                       ` yhlu
  2007-05-08 19:01                                       ` yhlu
@ 2007-05-08 19:11                                       ` Eric W. Biederman
  2007-05-08 22:00                                         ` yhlu
  2007-05-08 22:00                                         ` yhlu
  2007-05-08 19:11                                       ` Eric W. Biederman
                                                         ` (2 subsequent siblings)
  5 siblings, 2 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-08 19:11 UTC (permalink / raw)
  To: yhlu
  Cc: vgoyal, H. Peter Anvin, Jeremy Fitzhardinge, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization

yhlu <yhlu.kernel@gmail.com> writes:

> Eric,
>
> I tried to load vmlinux with kexec and got
> Ramdisks not supported with generic elf arguments
>
> So I used mkelfImage with my patch (which converts ELF64 to ELF32) to make
> an ELF32 image, loaded that with kexec, and still could not get a VGA
> console; the serial console works well.
>
> The mkelfImage 2.7 patch is at
> http://72.14.253.104/search?q=cache:fuxOvFw3ZIIJ:lists.osdl.org/pipermail/fastboot/attachments/20061108/009064a6/attachment.obj+mkelfImage+2.7+patch&hl=en&ct=clnk&cd=4&gl=us
>
> So the problem is not bzImage related; it is somewhere in vmlinux.

Odd.   Is it specifically these patches?
Or is it just the recent kernel from Linus?

You might try a git-bisect or, if it is just these patches,
walk through them one by one.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 19:11                                       ` Eric W. Biederman
  2007-05-08 22:00                                         ` yhlu
@ 2007-05-08 22:00                                         ` yhlu
  2007-05-08 22:07                                           ` Jeremy Fitzhardinge
  2007-05-08 22:07                                           ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 217+ messages in thread
From: yhlu @ 2007-05-08 22:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, H. Peter Anvin, Jeremy Fitzhardinge, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization

On 5/8/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
> You might try a git-bisect or, if it is just these patches,
> walk through them one by one.

f82af20e1a028e16b9bb11da081fa1148d40fa6a is first bad commit
commit f82af20e1a028e16b9bb11da081fa1148d40fa6a
Author: Gerd Hoffmann <kraxel@suse.de>
Date:   Wed May 2 19:27:19 2007 +0200

    [PATCH] x86-64: ignore vgacon if hardware not present

    Avoid trying to set up vgacon if there's no vga hardware present.

    Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: Andi Kleen <ak@suse.de>
    Cc: Alan <alan@lxorguk.ukuu.org.uk>
    Acked-by: Ingo Molnar <mingo@elte.hu>

YH

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 22:00                                         ` yhlu
  2007-05-08 22:07                                           ` Jeremy Fitzhardinge
@ 2007-05-08 22:07                                           ` Jeremy Fitzhardinge
  2007-05-08 22:35                                             ` H. Peter Anvin
  2007-05-08 22:35                                             ` H. Peter Anvin
  1 sibling, 2 replies; 217+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-08 22:07 UTC (permalink / raw)
  To: yhlu
  Cc: Eric W. Biederman, vgoyal, H. Peter Anvin, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization

yhlu wrote:
> On 5/8/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> You might try a git-bisect or, if it is just these patches,
>> walk through them one by one.
>
> f82af20e1a028e16b9bb11da081fa1148d40fa6a is first bad commit
> commit f82af20e1a028e16b9bb11da081fa1148d40fa6a
> Author: Gerd Hoffmann <kraxel@suse.de>
> Date:   Wed May 2 19:27:19 2007 +0200
>
>    [PATCH] x86-64: ignore vgacon if hardware not present
>
>    Avoid trying to set up vgacon if there's no vga hardware present.
>
>    Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
>    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>    Signed-off-by: Andi Kleen <ak@suse.de>
>    Cc: Alan <alan@lxorguk.ukuu.org.uk>
>    Acked-by: Ingo Molnar <mingo@elte.hu>

Interesting.  I haven't really been following this thread, but doesn't
it mean something isn't being initialized properly if this patch makes a
difference?

    J

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 22:07                                           ` Jeremy Fitzhardinge
@ 2007-05-08 22:35                                             ` H. Peter Anvin
  2007-05-08 22:41                                               ` yhlu
  2007-05-08 22:41                                               ` yhlu
  2007-05-08 22:35                                             ` H. Peter Anvin
  1 sibling, 2 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-08 22:35 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: yhlu, Eric W. Biederman, vgoyal, Gerd Hoffmann, Jeff Garzik,
	patches, linux-kernel, virtualization

Jeremy Fitzhardinge wrote:
> 
> Interesting.  I haven't really been following this thread, but doesn't
> it mean something isn't being initialized properly if this patch makes a
> difference?
> 

Specifically boot_params.screen_info isn't being properly set up by the
caller.

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 22:35                                             ` H. Peter Anvin
@ 2007-05-08 22:41                                               ` yhlu
  2007-05-08 23:13                                                 ` H. Peter Anvin
  2007-05-08 23:13                                                 ` H. Peter Anvin
  2007-05-08 22:41                                               ` yhlu
  1 sibling, 2 replies; 217+ messages in thread
From: yhlu @ 2007-05-08 22:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Eric W. Biederman, vgoyal, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization

On 5/8/07, H. Peter Anvin <hpa@zytor.com> wrote:
> Jeremy Fitzhardinge wrote:
> Specifically boot_params.screen_info isn't being properly set up by the
> caller.

 will setup real_mode_data in kexec path?

YH

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 22:41                                               ` yhlu
  2007-05-08 23:13                                                 ` H. Peter Anvin
@ 2007-05-08 23:13                                                 ` H. Peter Anvin
  2007-05-09  1:44                                                   ` Eric W. Biederman
  2007-05-09  1:44                                                   ` Eric W. Biederman
  1 sibling, 2 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-08 23:13 UTC (permalink / raw)
  To: yhlu
  Cc: Jeremy Fitzhardinge, Eric W. Biederman, vgoyal, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization

yhlu wrote:
> On 5/8/07, H. Peter Anvin <hpa@zytor.com> wrote:
>> Jeremy Fitzhardinge wrote:
>> Specifically boot_params.screen_info isn't being properly set up by the
>> caller.
> 
> will setup real_mode_data in kexec path?

-ENOPARSE

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 23:13                                                 ` H. Peter Anvin
@ 2007-05-09  1:44                                                   ` Eric W. Biederman
  2007-05-09  2:23                                                     ` H. Peter Anvin
                                                                       ` (3 more replies)
  2007-05-09  1:44                                                   ` Eric W. Biederman
  1 sibling, 4 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-09  1:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: yhlu, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann, Jeff Garzik,
	patches, linux-kernel, virtualization, Rusty Russell, Andi Kleen,
	Ingo Molnar

"H. Peter Anvin" <hpa@zytor.com> writes:

> yhlu wrote:
>> On 5/8/07, H. Peter Anvin <hpa@zytor.com> wrote:
>>> Jeremy Fitzhardinge wrote:
>>> Specifically boot_params.screen_info isn't being properly set up by the
>>> caller.
>> 
>> will setup real_mode_data in kexec path?
>
> -ENOPARSE

I believe YH is asking how we set up real_mode_data in /sbin/kexec.
The setup is:
>         real_mode->orig_x = 0;
>         real_mode->orig_y = 0;
>         real_mode->orig_video_page = 0;
>         real_mode->orig_video_mode = 0;
>         real_mode->orig_video_cols = 80;
>         real_mode->orig_video_lines = 25;
>         real_mode->orig_video_ega_bx = 0;
>         real_mode->orig_video_isVGA = 1;
>         real_mode->orig_video_points = 16;

Silly but generally safe.

More relevant, because the code is in the kernel, we have:

arch/arm/kernel/setup.c:
> struct screen_info screen_info = {
>  .orig_video_lines	= 30,
>  .orig_video_cols	= 80,
>  .orig_video_mode	= 0,
>  .orig_video_ega_bx	= 0,
>  .orig_video_isVGA	= 1,
>  .orig_video_points	= 8
> };


arch/alpha/kernel/sys_sio.c:
> 	/* The AlphaBook1 has LCD video fixed at 800x600,
> 	   37 rows and 100 cols. */
> 	screen_info.orig_y = 37;
> 	screen_info.orig_video_cols = 100;
> 	screen_info.orig_video_lines = 37;


I expect I can find a few more examples where we specify
video_cols and video_lines but we use video_mode == 0.

Going further, mode 0x00 is a BIOS 40x25 mode.  So the patch below is
not always safe, even if we boot the bzImage.  It is just highly
unlikely that anyone would start the kernel in 40x25 text mode.

Therefore I think the test should check several additional
fields, in particular video lines and columns, before we
decide that we have an uninitialized screen_info and give up.


> commit f82af20e1a028e16b9bb11da081fa1148d40fa6a
> Author: Gerd Hoffmann <kraxel@suse.de>
> Date:   Wed May 2 19:27:19 2007 +0200
> 
>     [PATCH] x86-64: ignore vgacon if hardware not present
>     
>     Avoid trying to set up vgacon if there's no vga hardware present.
>     
>     Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
>     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>     Signed-off-by: Andi Kleen <ak@suse.de>
>     Cc: Alan <alan@lxorguk.ukuu.org.uk>
>     Acked-by: Ingo Molnar <mingo@elte.hu>
> 
> diff --git a/drivers/video/console/vgacon.c b/drivers/video/console/vgacon.c
> index 91a2078..3e67c34 100644
> --- a/drivers/video/console/vgacon.c
> +++ b/drivers/video/console/vgacon.c
> @@ -371,7 +371,8 @@ static const char *vgacon_startup(void)
>         }
>  
>         /* VGA16 modes are not handled by VGACON */
> -       if ((ORIG_VIDEO_MODE == 0x0D) ||        /* 320x200/4 */
> +       if ((ORIG_VIDEO_MODE == 0x00) ||        /* SCREEN_INFO not initialized */
> +           (ORIG_VIDEO_MODE == 0x0D) ||        /* 320x200/4 */
>             (ORIG_VIDEO_MODE == 0x0E) ||        /* 640x200/4 */
>             (ORIG_VIDEO_MODE == 0x10) ||        /* 640x350/4 */
>             (ORIG_VIDEO_MODE == 0x12) ||        /* 640x480/4 */

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  1:44                                                   ` Eric W. Biederman
  2007-05-09  2:23                                                     ` H. Peter Anvin
@ 2007-05-09  2:23                                                     ` H. Peter Anvin
  2007-05-09  3:30                                                       ` Eric W. Biederman
  2007-05-09  3:30                                                       ` Eric W. Biederman
  2007-05-09  2:44                                                     ` yhlu
  2007-05-09  2:44                                                     ` yhlu
  3 siblings, 2 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-09  2:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: yhlu, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann, Jeff Garzik,
	patches, linux-kernel, virtualization, Rusty Russell, Andi Kleen,
	Ingo Molnar

Eric W. Biederman wrote:
> 
> I expect I can find a few more examples where we specify
> video_cols and video_lines but we use video_mode == 0.
> 
> Going further, mode 0x00 is a BIOS 40x25 mode.  So the patch below is
> not always safe even if we boot the bzImage.  It is just highly
> unlikely anyone would start the kernel in 40x25 text mode. 
> 

Mode 0x00 is, at least theoretically, BIOS 40x25 *grayscale*; this mode
(and mode 0x02 which is the same thing in 80x25) were as far as I know
only ever used with composite monitors off CGA cards, i.e. functionally
never.  Actual monochrome monitors used mode 0x07.
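
For reference, the classic IBM PC BIOS (INT 10h) text modes mentioned here, written as a small lookup table. The table and `lookup_text_mode()` are illustrative, not kernel code; modes 0x00 and 0x02 are the "color burst off" variants that appear as grayscale on a composite monitor.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic BIOS text modes referred to in this discussion. */
struct bios_text_mode {
	uint8_t mode;
	uint8_t cols;
	uint8_t rows;
	const char *kind;
};

static const struct bios_text_mode bios_text_modes[] = {
	{ 0x00, 40, 25, "color burst off (grayscale on composite)" },
	{ 0x01, 40, 25, "color" },
	{ 0x02, 80, 25, "color burst off (grayscale on composite)" },
	{ 0x03, 80, 25, "color" },
	{ 0x07, 80, 25, "monochrome (MDA/Hercules)" },
};

/* Return the table entry for a mode number, or NULL if unknown. */
static const struct bios_text_mode *lookup_text_mode(uint8_t mode)
{
	size_t n = sizeof(bios_text_modes) / sizeof(bios_text_modes[0]);

	for (size_t i = 0; i < n; i++)
		if (bios_text_modes[i].mode == mode)
			return &bios_text_modes[i];
	return NULL;
}
```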

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  1:44                                                   ` Eric W. Biederman
                                                                       ` (2 preceding siblings ...)
  2007-05-09  2:44                                                     ` yhlu
@ 2007-05-09  2:44                                                     ` yhlu
  3 siblings, 0 replies; 217+ messages in thread
From: yhlu @ 2007-05-09  2:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization,
	Rusty Russell, Andi Kleen, Ingo Molnar

On 5/8/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> I believe YH is asking how we setup real_mode_data in /sbin/kexec.

pxelinux:

SCREEN_INFO.orig_video_mode =  3
SCREEN_INFO.orig_x =  0
SCREEN_INFO.orig_y =  24
x86_boot_params[] :
0000: 00 18 ff ff 08 00 03 50 8c c8 03 00 8e c0 19 01
0010: 10 00 7c fb fc be 31 00 ac 20 c0 74 09 b4 0e bb
0020: 07 00 cd 10 eb f2 31 c0 cd 16 cd 19 ea f0 ff 00
0030: f0 44 69 72 65 63 74 20 62 6f 6f 74 15 00 10 00

current kexec:
SCREEN_INFO.orig_video_mode =  0
SCREEN_INFO.orig_x =  0
SCREEN_INFO.orig_y =  3
x86_boot_params[] :
0000: 00 03 00 fc 00 00 00 50 8c c8 00 00 8e c0 19 01
0010: 10 00 7c fb fc be 31 00 ac 20 c0 74 09 b4 0e bb
0020: 3f a3 00 16 eb f2 31 c0 cd 16 cd 19 ea f0 ff 00
0030: f0 44 69 72 65 63 74 20 62 6f 6f 74 15 00 20 00
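
The first 0x12 bytes of each dump can be decoded against the leading fields of struct screen_info, which sits at the start of the boot parameters. This is a sketch assuming the 2.6.2x field layout (offsets noted in the comments); `struct screen_info_head` and `decode_screen_info()` are hypothetical names, and the unused fields are skipped.

```c
#include <stdint.h>

/* Leading fields of struct screen_info (assumed 2.6.2x layout). */
struct screen_info_head {
	uint8_t  orig_x;            /* 0x00 */
	uint8_t  orig_y;            /* 0x01 */
	uint16_t ext_mem_k;         /* 0x02 */
	uint16_t orig_video_page;   /* 0x04 */
	uint8_t  orig_video_mode;   /* 0x06 */
	uint8_t  orig_video_cols;   /* 0x07 */
	uint16_t orig_video_ega_bx; /* 0x0a */
	uint8_t  orig_video_lines;  /* 0x0e */
	uint8_t  orig_video_isVGA;  /* 0x0f */
	uint16_t orig_video_points; /* 0x10 */
};

/* Read a little-endian 16-bit value. */
static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode from a raw dump; b must hold at least 0x12 bytes. */
static struct screen_info_head decode_screen_info(const uint8_t *b)
{
	struct screen_info_head si;

	si.orig_x            = b[0x00];
	si.orig_y            = b[0x01];
	si.ext_mem_k         = le16(b + 0x02);
	si.orig_video_page   = le16(b + 0x04);
	si.orig_video_mode   = b[0x06];
	si.orig_video_cols   = b[0x07];
	si.orig_video_ega_bx = le16(b + 0x0a);
	si.orig_video_lines  = b[0x0e];
	si.orig_video_isVGA  = b[0x0f];
	si.orig_video_points = le16(b + 0x10);
	return si;
}
```

Decoded this way, both dumps describe an 80x25 VGA text console (cols 0x50, lines 0x19, isVGA 1, points 16); the kexec data differs mainly in reporting mode 0 and cursor row 3, so a mode-only check would treat its working console as absent.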

YH

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  2:23                                                     ` H. Peter Anvin
@ 2007-05-09  3:30                                                       ` Eric W. Biederman
  2007-05-09  4:52                                                         ` yhlu
                                                                           ` (4 more replies)
  2007-05-09  3:30                                                       ` Eric W. Biederman
  1 sibling, 5 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-09  3:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: yhlu, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann, Jeff Garzik,
	patches, linux-kernel, virtualization, Rusty Russell, Andi Kleen,
	Ingo Molnar

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> 
>> I expect I can find a few more examples where we specify
>> video_cols and video_lines but we use video_mode == 0.
>> 
>> Going further, mode 0x00 is a BIOS 40x25 mode.  So the patch below is
>> not always safe even if we boot the bzImage.  It is just highly
>> unlikely anyone would start the kernel in 40x25 text mode. 
>> 
>
> Mode 0x00 is, at least theoretically, BIOS 40x25 *grayscale*; this mode
> (and mode 0x02 which is the same thing in 80x25) were as far as I know
> only ever used with composite monitors off CGA cards, i.e. functionally
> never.  Actual monochrome monitors used mode 0x07.

I agree.  We are not at all likely to see it in practice, even
if my memory is correct that VGA cards and non-monochrome CGA
cards supported that mode.

That doesn't mean checking for 0x00 is sufficient to detect
an uninitialized struct screen_info, or a lack of a video screen.

We have in-kernel historical precedent for using 0x00 to mean
simply a text mode.  I'm fairly certain that if I looked more closely
I could find this convention of using 0x00 to mean a text mode on
ia64, mips, and ppc, in addition to the instances I found on alpha
and arm.

Since the whole point is to detect the case where we don't have
a screen at all, it makes sense to check several additional variables
and make certain that they are all 0.  Agreed?
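
One way to express "make certain that they are all 0" directly on the raw boot-parameter bytes, assuming the 2.6.2x struct screen_info offsets listed in the comments. `video_fields_all_zero()` and the particular choice of fields are assumptions for illustration, not an agreed kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets (from the start of the boot parameters) of video fields
 * that a loader which set up a text console would normally fill in. */
static const size_t video_field_offsets[] = {
	0x06,       /* orig_video_mode */
	0x07,       /* orig_video_cols */
	0x0e,       /* orig_video_lines */
	0x0f,       /* orig_video_isVGA */
	0x10, 0x11, /* orig_video_points (u16) */
};

/* Report "no screen information" only if every listed byte is zero. */
static bool video_fields_all_zero(const uint8_t *boot_params)
{
	size_t n = sizeof(video_field_offsets) / sizeof(video_field_offsets[0]);

	for (size_t i = 0; i < n; i++)
		if (boot_params[video_field_offsets[i]] != 0)
			return false;
	return true;
}
```

Applied to YH's kexec dump it returns false (the mode is 0, but cols, lines, isVGA and points are not), so that boot would keep its console.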

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-08 18:51                                     ` yhlu
                                                         ` (4 preceding siblings ...)
  2007-05-09  3:33                                       ` Vivek Goyal
@ 2007-05-09  3:33                                       ` Vivek Goyal
  2007-05-09  4:42                                         ` yhlu
                                                           ` (3 more replies)
  5 siblings, 4 replies; 217+ messages in thread
From: Vivek Goyal @ 2007-05-09  3:33 UTC (permalink / raw)
  To: yhlu
  Cc: Eric W. Biederman, H. Peter Anvin, Jeremy Fitzhardinge,
	Gerd Hoffmann, Jeff Garzik, patches, linux-kernel,
	virtualization

On Tue, May 08, 2007 at 11:51:35AM -0700, yhlu wrote:
> Eric,
> 
> i tried to load vmlinux with kexec and got
> Ramdisks not supported with generic elf arguments
> 

This message generally appears if you did not specify --args-linux
on the kexec command line while loading vmlinux.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  3:33                                       ` Vivek Goyal
  2007-05-09  4:42                                         ` yhlu
  2007-05-09  4:42                                         ` yhlu
@ 2007-05-09  4:42                                         ` yhlu
  2007-05-09  4:58                                           ` ebiederm
                                                             ` (5 more replies)
  2007-05-09  4:42                                         ` yhlu
  3 siblings, 6 replies; 217+ messages in thread
From: yhlu @ 2007-05-09  4:42 UTC (permalink / raw)
  To: vgoyal
  Cc: Eric W. Biederman, H. Peter Anvin, Jeremy Fitzhardinge,
	Gerd Hoffmann, Jeff Garzik, patches, linux-kernel,
	virtualization

On 5/8/07, Vivek Goyal <vgoyal@in.ibm.com> wrote:
> On Tue, May 08, 2007 at 11:51:35AM -0700, yhlu wrote:
> This message generally appears if you did not specify --args-linux
> on kexec command line while loading vmlinux.
>
Besides elf-x86_64, do I still need --args-linux to pass something?
And how do I get it to load the ramdisk?

YH

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  3:30                                                       ` Eric W. Biederman
  2007-05-09  4:52                                                         ` yhlu
@ 2007-05-09  4:52                                                         ` yhlu
  2007-05-09  5:04                                                           ` H. Peter Anvin
                                                                             ` (5 more replies)
  2007-05-09  4:52                                                         ` yhlu
                                                                           ` (2 subsequent siblings)
  4 siblings, 6 replies; 217+ messages in thread
From: yhlu @ 2007-05-09  4:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization,
	Rusty Russell, Andi Kleen, Ingo Molnar

On 5/8/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Since the whole point is to detect the case where we don't have
> a screen at all it makes sense to check several additional variables
> and make certain that they are all 0.  Agreed?

We need a good way to find out whether a VGA console is supported.

YH

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  4:42                                         ` yhlu
  2007-05-09  4:58                                           ` ebiederm
@ 2007-05-09  4:58                                           ` Eric W. Biederman
  2007-05-09  4:58                                           ` ebiederm
                                                             ` (3 subsequent siblings)
  5 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-09  4:58 UTC (permalink / raw)
  To: yhlu
  Cc: vgoyal, H. Peter Anvin, Jeremy Fitzhardinge, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization

yhlu <yhlu.kernel@gmail.com> writes:

> On 5/8/07, Vivek Goyal <vgoyal@in.ibm.com> wrote:
>> On Tue, May 08, 2007 at 11:51:35AM -0700, yhlu wrote:
>> This message generally appears if you did not specify --args-linux
>> on kexec command line while loading vmlinux.
>>
> Besides elf-x86_64, do I still need --args-linux to pass something?
> And how do I get it to load the ramdisk?

Same arguments; just add --args-linux.

Basically the calling convention needs to be specified because
there isn't a universal one, and /sbin/kexec can't yet detect
that a vmlinux is Linux.

Eric

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  4:52                                                         ` yhlu
  2007-05-09  5:04                                                           ` H. Peter Anvin
@ 2007-05-09  5:04                                                           ` H. Peter Anvin
  2007-05-09  5:08                                                             ` H. Peter Anvin
                                                                               ` (5 more replies)
  2007-05-09  5:04                                                           ` H. Peter Anvin
                                                                             ` (3 subsequent siblings)
  5 siblings, 6 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-09  5:04 UTC (permalink / raw)
  To: yhlu
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization,
	Rusty Russell, Andi Kleen, Ingo Molnar

yhlu wrote:
> On 5/8/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Since the whole point is to detect the case where we don't have
>> a screen at all it makes sense to check several additional variables
>> and make certain that they are all 0.  Agreed?
> 
> We need a good way to find out whether a VGA console is supported.

There really isn't one, at least not given the current data structure;
the data structure has an "isVGA" flag, but if that is 0 it's supposed
to mean CGA/MDA/HGC/EGA, as opposed to VGA...

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  4:52                                                         ` yhlu
                                                                             ` (3 preceding siblings ...)
  2007-05-09  5:04                                                           ` H. Peter Anvin
@ 2007-05-09  5:04                                                           ` H. Peter Anvin
  2007-05-09  5:04                                                           ` H. Peter Anvin
  5 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-09  5:04 UTC (permalink / raw)
  To: yhlu
  Cc: Jeff Garzik, patches, linux-kernel, virtualization, vgoyal,
	Eric W. Biederman, Ingo Molnar

yhlu wrote:
> On 5/8/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Since the whole point is to detect the case where we don't have
>> a screen at all it makes sense to check several additional variables
>> and make certain that they are all 0.  Agreed?
> 
> need one good way to find if there is support vga console.

There really isn't one, at least not given the current data structure;
the data structure has an "isVGA" flag, but if that is 0 it's supposed
to mean CGA/MDA/HGC/EGA, as opposed to VGA...

	-hpa

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  5:04                                                           ` H. Peter Anvin
                                                                               ` (3 preceding siblings ...)
  2007-05-09  5:08                                                             ` H. Peter Anvin
@ 2007-05-09  5:08                                                             ` H. Peter Anvin
  2007-05-09  5:48                                                               ` yhlu
                                                                                 ` (3 more replies)
  2007-05-09  5:08                                                             ` H. Peter Anvin
  5 siblings, 4 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-09  5:08 UTC (permalink / raw)
  To: yhlu
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization,
	Rusty Russell, Andi Kleen, Ingo Molnar

H. Peter Anvin wrote:
> yhlu wrote:
>> On 5/8/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> Since the whole point is to detect the case where we don't have
>>> a screen at all it makes sense to check several additional variables
>>> and make certain that they are all 0.  Agreed?
>> need one good way to find if there is support vga console.
> 
> There really isn't one, at least not given the current data structure;
> the data structure has an "isVGA" flag, but if that is 0 it's supposed
> to mean CGA/MDA/HGC/EGA, as opposed to VGA...

Of course, one could argue that since all of those were obsolete by the
time Linux was first created, it probably doesn't matter, and that
isVGA == 0 pretty much means the more obvious thing (no VGA console).

MDA/HGC stuck around for quite a while for debugging, since it didn't
conflict with VGA, but even when one is present, the reason people put it
in their system is to have something that the OS doesn't readily see.

	-hpa
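The two readings of isVGA == 0 in this exchange can be contrasted in a small sketch, again using a hypothetical struct whose field names are modeled on `struct screen_info`. The strict reading treats a zero flag with a populated text geometry as a pre-VGA adapter, while the pragmatic reading argued for here treats any zero flag as "no VGA console". The enum and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the boot-time video fields (not the kernel's type). */
struct video_fields {
	uint8_t orig_video_cols;
	uint8_t orig_video_lines;
	uint8_t orig_video_isVGA;
};

enum display_kind {
	DISPLAY_NONE,
	DISPLAY_LEGACY,	/* CGA/MDA/HGC/EGA under the strict reading */
	DISPLAY_VGA,
};

/* strict == true follows the documented meaning of the flag;
 * strict == false assumes pre-VGA adapters no longer matter, so a
 * zero flag is taken to mean there is no VGA console at all. */
static enum display_kind classify_display(const struct video_fields *vf,
					   bool strict)
{
	if (vf->orig_video_isVGA)
		return DISPLAY_VGA;
	if (strict && (vf->orig_video_cols || vf->orig_video_lines))
		return DISPLAY_LEGACY;
	return DISPLAY_NONE;
}
```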

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  5:08                                                             ` H. Peter Anvin
  2007-05-09  5:48                                                               ` yhlu
  2007-05-09  5:48                                                               ` yhlu
@ 2007-05-09  5:48                                                               ` yhlu
  2007-05-09  5:54                                                                 ` H. Peter Anvin
                                                                                   ` (11 more replies)
  2007-05-09  5:48                                                               ` yhlu
  3 siblings, 12 replies; 217+ messages in thread
From: yhlu @ 2007-05-09  5:48 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization,
	Rusty Russell, Andi Kleen, Ingo Molnar

On 5/8/07, H. Peter Anvin <hpa@zytor.com> wrote:
> H. Peter Anvin wrote:
> Of course, one could argue that since all of those were obsolete by the
> time Linux was first created, it probably doesn't matter, and that
> isVGA == 0 pretty much means the more obvious thing (no VGA console).
>
> MDA/HGC stuck around for quite a while for debugging, since it didn't
> conflict with VGA, but even when one is present, the reason people put it
> in their system is to have something that the OS doesn't readily see.
>
So the kexec tools need to scan the PCI device list to find out how
to set real_mode.isVGA and orig_video_mode, and also need to parse the
command line for the vga console settings.

YH
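One way a loader such as kexec might do the PCI scan described above is sketched below. It assumes the device list is read from Linux sysfs, where each device directory under /sys/bus/pci/devices has a `class` file holding a 24-bit class code such as `0x030000`; base class 0x03 with subclass 0x00 is a VGA-compatible display controller. The helper names are hypothetical, and finding a VGA-class device does not by itself say what video mode the firmware left it in.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* PCI class code layout: base class (bits 23-16), subclass (15-8),
 * programming interface (7-0). 0x0300xx is a VGA-compatible controller. */
static bool is_vga_class(unsigned long class_code)
{
	return (class_code >> 8) == 0x0300;
}

/* Parse the contents of a sysfs "class" file, e.g. "0x030000\n".
 * Returns 0 on success, -1 if the text is not a hex number. */
static int parse_class(const char *text, unsigned long *out)
{
	char *end;
	unsigned long v = strtoul(text, &end, 16);

	if (end == text)
		return -1;
	*out = v;
	return 0;
}

/* Count VGA-class devices under a sysfs-style directory such as
 * "/sys/bus/pci/devices". Returns -1 if the directory cannot be opened. */
static int count_vga_devices(const char *root)
{
	DIR *dir = opendir(root);
	struct dirent *de;
	int count = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		char path[512], buf[32];
		unsigned long cls;
		FILE *f;

		if (de->d_name[0] == '.')	/* skip "." and ".." */
			continue;
		snprintf(path, sizeof(path), "%s/%s/class", root, de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (fgets(buf, sizeof(buf), f) && parse_class(buf, &cls) == 0 &&
		    is_vga_class(cls))
			count++;
		fclose(f);
	}
	closedir(dir);
	return count;
}
```

On a Linux host with sysfs mounted, `count_vga_devices("/sys/bus/pci/devices")` would report how many such controllers exist; choosing orig_video_mode would still be a separate problem.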

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  5:48                                                               ` yhlu
                                                                                   ` (2 preceding siblings ...)
  2007-05-09  5:54                                                                 ` H. Peter Anvin
@ 2007-05-09  5:54                                                                 ` H. Peter Anvin
  2007-05-09  5:54                                                                 ` H. Peter Anvin
                                                                                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-09  5:54 UTC (permalink / raw)
  To: yhlu
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization,
	Rusty Russell, Andi Kleen, Ingo Molnar

yhlu wrote:
> On 5/8/07, H. Peter Anvin <hpa@zytor.com> wrote:
>> H. Peter Anvin wrote:
>> Of course, one could argue that since all of those were obsolete by the
>> time Linux was first created, it probably doesn't matter, and that
>> isVGA == 0 pretty much means the more obvious thing (no VGA console).
>>
>> MDA/HGC stuck around for quite a while for debugging, since it didn't
>> conflict with VGA, but even when one is present, the reason people put it
>> in their system is to have something that the OS doesn't readily see.
>>
> So the kexec tools need to scan the PCI device list to find out how
> to set real_mode.isVGA and orig_video_mode, and also need to parse the
> command line for the vga console settings.

A better way, probably, would be for the kernel to export the
boot_params structure so kexec can reuse it.

	-hpa
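To illustrate why reusing the kernel's own boot_params is simpler than reconstructing it, the sketch below reads the video fields straight out of a raw copy of the structure. It assumes the x86 layout in which `struct screen_info` sits at offset 0 of boot_params, with orig_video_mode at 0x06, orig_video_cols at 0x07, orig_video_lines at 0x0e and orig_video_isVGA at 0x0f; check these offsets against the headers of the kernel in question. How the kernel would export the blob is left open, as it is in this message, and the helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed offsets of the video fields within the zero page
 * (struct screen_info at offset 0 of boot_params on x86). */
#define SI_ORIG_VIDEO_MODE   0x06
#define SI_ORIG_VIDEO_COLS   0x07
#define SI_ORIG_VIDEO_LINES  0x0e
#define SI_ORIG_VIDEO_ISVGA  0x0f

struct video_summary {
	uint8_t mode, cols, lines, is_vga;
};

/* Extract the video fields from a raw boot_params copy.
 * Returns 0 on success, -1 if the buffer is too short. */
static int read_video_fields(const uint8_t *bp, size_t len,
			     struct video_summary *out)
{
	if (len <= SI_ORIG_VIDEO_ISVGA)
		return -1;
	out->mode   = bp[SI_ORIG_VIDEO_MODE];
	out->cols   = bp[SI_ORIG_VIDEO_COLS];
	out->lines  = bp[SI_ORIG_VIDEO_LINES];
	out->is_vga = bp[SI_ORIG_VIDEO_ISVGA];
	return 0;
}
```

A loader holding such a copy could pass these bytes through to the next kernel unchanged instead of rediscovering them from a PCI scan and command-line parsing.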

^ permalink raw reply	[flat|nested] 217+ messages in thread

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  5:48                                                               ` yhlu
                                                                                   ` (9 preceding siblings ...)
  2007-05-09  5:55                                                                 ` H. Peter Anvin
@ 2007-05-09  5:55                                                                 ` H. Peter Anvin
  2007-05-09 10:52                                                                   ` Eric W. Biederman
  2007-05-09 10:52                                                                   ` Eric W. Biederman
  2007-05-09  5:55                                                                 ` H. Peter Anvin
  11 siblings, 2 replies; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-09  5:55 UTC
  To: yhlu
  Cc: Eric W. Biederman, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization,
	Rusty Russell, Andi Kleen, Ingo Molnar

yhlu wrote:
> so the kexec tools need to scan the pci devices list, and find out how
> to set real_mode.isVGA and orig_video_mode, also need to parse the
> command line about vga console.

BTW, welcome to the hell of bypassing setup.

	-hpa

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  3:30                                                       ` Eric W. Biederman
                                                                           ` (2 preceding siblings ...)
  2007-05-09  4:52                                                         ` yhlu
@ 2007-05-09  7:58                                                         ` Gerd Hoffmann
  2007-05-09 11:21                                                           ` Eric W. Biederman
                                                                             ` (3 more replies)
  2007-05-09  7:58                                                         ` Gerd Hoffmann
  4 siblings, 4 replies; 217+ messages in thread
From: Gerd Hoffmann @ 2007-05-09  7:58 UTC
  To: Eric W. Biederman
  Cc: H. Peter Anvin, yhlu, Jeremy Fitzhardinge, vgoyal, Jeff Garzik,
	patches, linux-kernel, virtualization, Rusty Russell, Andi Kleen,
	Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 240 bytes --]

   Hi,

> Since the whole point is to detect the case where we don't have
> a screen at all it makes sense to check several additional variables
> and make certain that they are all 0.  Agreed?

Like in the attached patch?

cheers,
   Gerd

[-- Attachment #2: vgacon-fixup-check --]
[-- Type: text/plain, Size: 1422 bytes --]

Refine SCREEN_INFO sanity check for vgacon initialization.

Checking only the video mode field to see whether SCREEN_INFO is
initialized is not enough; in some cases it is zero although
a VGA card is present.  Let's additionally check cols and lines.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Alan <alan@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/video/console/vgacon.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: vanilla-2.6.21-git11/drivers/video/console/vgacon.c
===================================================================
--- vanilla-2.6.21-git11.orig/drivers/video/console/vgacon.c
+++ vanilla-2.6.21-git11/drivers/video/console/vgacon.c
@@ -368,9 +368,14 @@ static const char *vgacon_startup(void)
 #endif
 	}
 
+	/* SCREEN_INFO initialized? */
+	if ((ORIG_VIDEO_MODE  == 0) &&
+	    (ORIG_VIDEO_LINES == 0) &&
+	    (ORIG_VIDEO_COLS  == 0))
+		goto no_vga;
+
 	/* VGA16 modes are not handled by VGACON */
-	if ((ORIG_VIDEO_MODE == 0x00) ||	/* SCREEN_INFO not initialized */
-	    (ORIG_VIDEO_MODE == 0x0D) ||	/* 320x200/4 */
+	if ((ORIG_VIDEO_MODE == 0x0D) ||	/* 320x200/4 */
 	    (ORIG_VIDEO_MODE == 0x0E) ||	/* 640x200/4 */
 	    (ORIG_VIDEO_MODE == 0x10) ||	/* 640x350/4 */
 	    (ORIG_VIDEO_MODE == 0x12) ||	/* 640x480/4 */

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  5:55                                                                 ` H. Peter Anvin
  2007-05-09 10:52                                                                   ` Eric W. Biederman
@ 2007-05-09 10:52                                                                   ` Eric W. Biederman
  2007-05-09 16:31                                                                     ` yhlu
                                                                                       ` (3 more replies)
  1 sibling, 4 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-09 10:52 UTC
  To: H. Peter Anvin
  Cc: yhlu, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann, Jeff Garzik,
	patches, linux-kernel, virtualization, Rusty Russell, Andi Kleen,
	Ingo Molnar

"H. Peter Anvin" <hpa@zytor.com> writes:

> yhlu wrote:
>> so the kexec tools need to scan the pci devices list, and find out how
>> to set real_mode.isVGA and orig_video_mode, also need to parse the
>> command line about vga console.
>
> BTW, welcome to the hell of bypassing setup.

Well, in this case things are so very much better than attempting to
use setup.S that it isn't a real option.

In general, BIOS calls just don't work reliably after Linux has
been running for a while.

As for YH's point, it does look like there are a few ways
to poke the Linux kernel to see what is happening.

We can look in /proc/ioports and see what has reserved
the video resources.  That should give us a reasonable
estimate of the video adapter.  We can do an ioctl to
the console and see how many lines and columns we have.

Reusing boot_params could be nice, but if we have the information
available in other ways, digging it out that way is quite possibly
better.


Eric
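Both probes Eric suggests (who reserved the video I/O ports, and how many lines and columns the console has) are reachable from userspace. A minimal sketch, assuming the usual /proc/ioports line format ("03c0-03df : vga+") and that vgacon registers its region under a name such as "vga+"; the helper names are hypothetical, and kexec-tools may well do this differently. The console size uses the standard TIOCGWINSZ ioctl; the tty path passed in (for example /dev/tty0) is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Parse one /proc/ioports line such as "  03c0-03df : vga+".
 * Returns 0 and fills start/end/name on success, -1 otherwise. */
static int parse_ioports_line(const char *line, unsigned *start,
			      unsigned *end, char *name, size_t name_len)
{
	char buf[64];

	assert(line && start && end && name && name_len > 0);
	if (sscanf(line, " %x-%x : %63[^\n]", start, end, buf) != 3)
		return -1;
	snprintf(name, name_len, "%s", buf);
	return 0;
}

/* Return 1 if the file lists a region named "vga+", 0 if not,
 * -1 if it cannot be opened. */
static int have_vga_ioports(const char *path)
{
	char line[256], name[64];
	unsigned s, e;
	int found = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof line, f)) {
		if (parse_ioports_line(line, &s, &e, name, sizeof name) == 0 &&
		    strcmp(name, "vga+") == 0) {
			found = 1;
			break;
		}
	}
	fclose(f);
	return found;
}

/* Ask a console tty for its size; returns 0 on success, -1 on failure. */
static int console_size(const char *tty, unsigned short *rows,
			unsigned short *cols)
{
	struct winsize ws;
	int fd = open(tty, O_RDONLY);

	if (fd < 0)
		return -1;
	if (ioctl(fd, TIOCGWINSZ, &ws) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	*rows = ws.ws_row;
	*cols = ws.ws_col;
	return 0;
}
```

Called as `have_vga_ioports("/proc/ioports")` and `console_size("/dev/tty0", &rows, &cols)`, this only reflects what the running kernel claimed, which is the trade-off Eric accepts in preferring it over a boot_params export.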

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  7:58                                                         ` Gerd Hoffmann
  2007-05-09 11:21                                                           ` Eric W. Biederman
@ 2007-05-09 11:21                                                           ` Eric W. Biederman
  2007-05-10  0:55                                                           ` yhlu
  2007-05-10  0:55                                                           ` yhlu
  3 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-09 11:21 UTC
  To: Gerd Hoffmann
  Cc: H. Peter Anvin, yhlu, Jeremy Fitzhardinge, vgoyal, Jeff Garzik,
	patches, linux-kernel, virtualization, Rusty Russell, Andi Kleen,
	Ingo Molnar

Gerd Hoffmann <kraxel@redhat.com> writes:

>   Hi,
>
>> Since the whole point is to detect the case where we don't have
>> a screen at all it makes sense to check several additional variables
>> and make certain that they are all 0.  Agreed?
>
> Like in the attached patch?

Looks good to me.

> cheers,
>   Gerd
> Refine SCREEN_INFO sanity check for vgacon initialization.
>
> Checking only the video mode field to see whether SCREEN_INFO is
> initialized is not enough; in some cases it is zero although
> a VGA card is present.  Let's additionally check cols and lines.
>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Andi Kleen <ak@suse.de>
> Cc: Alan <alan@lxorguk.ukuu.org.uk>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  drivers/video/console/vgacon.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> Index: vanilla-2.6.21-git11/drivers/video/console/vgacon.c
> ===================================================================
> --- vanilla-2.6.21-git11.orig/drivers/video/console/vgacon.c
> +++ vanilla-2.6.21-git11/drivers/video/console/vgacon.c
> @@ -368,9 +368,14 @@ static const char *vgacon_startup(void)
>  #endif
>  	}
>  
> +	/* SCREEN_INFO initialized? */
> +	if ((ORIG_VIDEO_MODE  == 0) &&
> +	    (ORIG_VIDEO_LINES == 0) &&
> +	    (ORIG_VIDEO_COLS  == 0))
> +		goto no_vga;
> +
>  	/* VGA16 modes are not handled by VGACON */
> -	if ((ORIG_VIDEO_MODE == 0x00) ||	/* SCREEN_INFO not initialized */
> -	    (ORIG_VIDEO_MODE == 0x0D) ||	/* 320x200/4 */
> +	if ((ORIG_VIDEO_MODE == 0x0D) ||	/* 320x200/4 */
>  	    (ORIG_VIDEO_MODE == 0x0E) ||	/* 640x200/4 */
>  	    (ORIG_VIDEO_MODE == 0x10) ||	/* 640x350/4 */
>  	    (ORIG_VIDEO_MODE == 0x12) ||	/* 640x480/4 */

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09 10:52                                                                   ` Eric W. Biederman
@ 2007-05-09 16:31                                                                     ` yhlu
  2007-05-09 16:31                                                                     ` yhlu
                                                                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 217+ messages in thread
From: yhlu @ 2007-05-09 16:31 UTC
  To: Eric W. Biederman
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, vgoyal, Gerd Hoffmann,
	Jeff Garzik, patches, linux-kernel, virtualization,
	Rusty Russell, Andi Kleen, Ingo Molnar

On 5/9/07, Eric W. Biederman <ebiederm@xmission.com> wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> We can look in /proc/ioports and see what has reserved
> the video resources.  That should give us a reasonable
> estimate of the video adapter.  We can do an ioctl to
> the console and see how many lines and columns we have.
>
> Reusing boot_params could be nice but if we have the information
> available in other ways digging it out that way is quite possibly
> better.

Another path:
LinuxBIOS + elfboot + payload, where the payload is an LZMA-compressed
ELF image (vmlinux + initrd), and then kexec boots the final production
kernel. We don't need to use boot_params from the first tiny kernel.

YH

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09 10:52                                                                   ` Eric W. Biederman
  2007-05-09 16:31                                                                     ` yhlu
  2007-05-09 16:31                                                                     ` yhlu
@ 2007-05-09 19:21                                                                     ` H. Peter Anvin
  2007-05-10  0:52                                                                         ` Eric W. Biederman
  2007-05-09 19:21                                                                     ` H. Peter Anvin
  3 siblings, 1 reply; 217+ messages in thread
From: H. Peter Anvin @ 2007-05-09 19:21 UTC
  To: ebiederm
  Cc: Jeremy Fitzhardinge, Jeff Garzik, patches, Rusty Russell,
	linux-kernel, virtualization, vgoyal, yhlu, Ingo Molnar,
	Gerd Hoffmann

ebiederm@xmission.com wrote:
> 
> Well, in this case things are so very much better than attempting to
> use setup.S that it isn't a real option.
> 

Obviously not, but it was more of a comment on the apparent assumption
that doing so would be simple.

	-hpa

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09 19:21                                                                     ` H. Peter Anvin
@ 2007-05-10  0:52                                                                         ` Eric W. Biederman
  0 siblings, 0 replies; 217+ messages in thread
From: Eric W. Biederman @ 2007-05-10  0:52 UTC
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Jeff Garzik, patches, Rusty Russell,
	linux-kernel, virtualization, vgoyal, yhlu, Ingo Molnar,
	Gerd Hoffmann

"H. Peter Anvin" <hpa@zytor.com> writes:

> Obviously not, but it was more of a comment on the apparent assumption
> that doing so would be simple.

Reasonable comment then.

Eric

* Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage
  2007-05-09  7:58                                                         ` Gerd Hoffmann
                                                                             ` (2 preceding siblings ...)
  2007-05-10  0:55                                                           ` yhlu
@ 2007-05-10  0:55                                                           ` yhlu
  3 siblings, 0 replies; 217+ messages in thread
From: yhlu @ 2007-05-10  0:55 UTC
  To: Gerd Hoffmann
  Cc: Eric W. Biederman, H. Peter Anvin, Jeremy Fitzhardinge, vgoyal,
	Jeff Garzik, patches, linux-kernel, virtualization,
	Rusty Russell, Andi Kleen, Ingo Molnar

On 5/9/07, Gerd Hoffmann <kraxel@redhat.com> wrote:
> Refine SCREEN_INFO sanity check for vgacon initialization.
>
> Checking only the video mode field to see whether SCREEN_INFO is
> initialized is not enough; in some cases it is zero although
> a VGA card is present.  Let's additionally check cols and lines.
>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Andi Kleen <ak@suse.de>
> Cc: Alan <alan@lxorguk.ukuu.org.uk>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  drivers/video/console/vgacon.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> Index: vanilla-2.6.21-git11/drivers/video/console/vgacon.c
> ===================================================================
> --- vanilla-2.6.21-git11.orig/drivers/video/console/vgacon.c
> +++ vanilla-2.6.21-git11/drivers/video/console/vgacon.c
> @@ -368,9 +368,14 @@ static const char *vgacon_startup(void)
>  #endif
>         }
>
> +       /* SCREEN_INFO initialized? */
> +       if ((ORIG_VIDEO_MODE  == 0) &&
> +           (ORIG_VIDEO_LINES == 0) &&
> +           (ORIG_VIDEO_COLS  == 0))
> +               goto no_vga;
> +
>         /* VGA16 modes are not handled by VGACON */
> -       if ((ORIG_VIDEO_MODE == 0x00) ||        /* SCREEN_INFO not initialized */
> -           (ORIG_VIDEO_MODE == 0x0D) ||        /* 320x200/4 */
> +       if ((ORIG_VIDEO_MODE == 0x0D) ||        /* 320x200/4 */
>             (ORIG_VIDEO_MODE == 0x0E) ||        /* 640x200/4 */
>             (ORIG_VIDEO_MODE == 0x10) ||        /* 640x350/4 */
>             (ORIG_VIDEO_MODE == 0x12) ||        /* 640x480/4 */
>
It works.

YH


YH

^ permalink raw reply	[flat|nested] 217+ messages in thread

end of thread, other threads:[~2007-05-10  0:55 UTC | newest]

Thread overview: 217+ messages
2007-04-28 17:58 [PATCH] [0/22] x86 candidate patches for review II: 64bit relocatable kernel Andi Kleen
2007-04-28 17:58 ` [PATCH] [1/22] x86_64: dma_ops as const Andi Kleen
2007-04-28 17:58 ` [PATCH] [2/22] x86_64: Assembly safe page.h and pgtable.h Andi Kleen
2007-04-28 17:58 ` [PATCH] [3/22] x86_64: Kill temp boot pmds Andi Kleen
2007-04-28 17:58 ` [PATCH] [4/22] x86_64: Clean up the early boot page table Andi Kleen
2007-04-28 17:58 ` [PATCH] [5/22] x86_64: Fix early printk to use standard ISA mapping Andi Kleen
2007-04-28 17:58 ` [PATCH] [6/22] x86_64: modify copy_bootdata to use virtual addresses Andi Kleen
2007-04-28 17:58 ` [PATCH] [7/22] x86_64: cleanup segments Andi Kleen
2007-04-28 17:58 ` [PATCH] [8/22] x86_64: Add EFER to the register set saved by save_processor_state Andi Kleen
2007-04-28 17:58 ` [PATCH] [9/22] x86_64: 64bit PIC SMP trampoline Andi Kleen
2007-04-28 17:58 ` [PATCH] [10/22] x86_64: Get rid of dead code in suspend resume Andi Kleen
2007-04-28 17:58 ` [PATCH] [11/22] x86_64: wakeup.S rename registers to reflect right names Andi Kleen
2007-04-28 17:58 ` [PATCH] [12/22] x86_64: wakeup.S misc cleanups Andi Kleen
2007-04-28 17:59 ` [PATCH] [13/22] x86_64: 64bit ACPI wakeup trampoline Andi Kleen
2007-04-28 17:59 ` [PATCH] [14/22] x86_64: Modify discover_ebda to use virtual addresses Andi Kleen
2007-04-28 17:59 ` [PATCH] [15/22] x86_64: Remove the identity mapping as early as possible Andi Kleen
2007-04-28 17:59 ` [PATCH] [16/22] x86: Move swsusp __pa() dependent code to arch portion Andi Kleen
2007-04-28 17:59 ` [PATCH] [17/22] x86_64: do not use virt_to_page on kernel data address Andi Kleen
2007-04-28 17:59 ` [PATCH] [18/22] x86: __pa and __pa_symbol address space separation Andi Kleen
2007-04-28 17:59 ` [PATCH] [19/22] x86_64: Relocatable Kernel Support Andi Kleen
2007-04-28 17:59 ` [PATCH] [20/22] x86_64: build-time checking Andi Kleen
2007-04-28 17:59 ` [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage Andi Kleen
2007-04-28 18:07   ` Jeff Garzik
2007-04-28 18:24     ` Andi Kleen
2007-04-28 20:18     ` [patches] " Eric W. Biederman
2007-04-28 20:38       ` H. Peter Anvin
2007-04-28 20:46         ` Eric W. Biederman
2007-04-29  4:50           ` Vivek Goyal
2007-04-28 20:39       ` Jeff Garzik
2007-04-29  7:24       ` Jeremy Fitzhardinge
2007-04-29 15:11         ` Eric W. Biederman
2007-04-29 15:11           ` Eric W. Biederman
2007-04-30  3:03           ` Rusty Russell
2007-04-30  4:38             ` H. Peter Anvin
2007-04-30  4:38               ` H. Peter Anvin
2007-04-30  5:03               ` Rusty Russell
2007-04-30  5:25                 ` Eric W. Biederman
2007-04-30  5:25                   ` Eric W. Biederman
2007-04-30 16:03                   ` H. Peter Anvin
2007-04-30 16:47                     ` Eric W. Biederman
2007-04-30 16:47                       ` Eric W. Biederman
2007-04-30 16:03                   ` H. Peter Anvin
2007-04-30 15:34                 ` Eric W. Biederman
2007-05-01  3:38                   ` Rusty Russell
2007-05-01  3:45                     ` H. Peter Anvin
2007-05-01  3:45                       ` H. Peter Anvin
2007-05-01  3:59                       ` Rusty Russell
2007-05-01  3:59                         ` Rusty Russell
2007-05-01  4:00                       ` H. Peter Anvin
2007-05-01  4:00                         ` H. Peter Anvin
2007-05-01  4:50                         ` Rusty Russell
2007-05-01  5:28                           ` H. Peter Anvin
2007-05-01  5:28                             ` H. Peter Anvin
2007-05-01  6:05                             ` Eric W. Biederman
2007-05-01  3:57                     ` Eric W. Biederman
2007-05-01  3:57                       ` Eric W. Biederman
2007-05-01  5:37                       ` Jeremy Fitzhardinge
2007-05-01  6:11                         ` Eric W. Biederman
2007-05-01  7:34                         ` Rusty Russell
2007-05-01  8:03                           ` Jeremy Fitzhardinge
2007-05-01  7:34                         ` Rusty Russell
2007-05-01  5:37                       ` Jeremy Fitzhardinge
2007-05-01  3:38                   ` Rusty Russell
2007-04-30  5:03               ` Rusty Russell
2007-04-30  3:03           ` Rusty Russell
2007-04-30 18:50           ` Jeremy Fitzhardinge
2007-04-30 18:50             ` Jeremy Fitzhardinge
2007-04-30 22:10             ` Eric W. Biederman
2007-04-30 22:42               ` Jeremy Fitzhardinge
2007-04-30 22:51                 ` Jeremy Fitzhardinge
2007-04-30 23:10                 ` Eric W. Biederman
2007-04-30 23:16                   ` H. Peter Anvin
2007-04-30 23:16                     ` H. Peter Anvin
2007-04-30 23:35                     ` Eric W. Biederman
2007-04-30 23:35                       ` Eric W. Biederman
2007-05-01  3:39                     ` Andi Kleen
2007-05-01  2:48                       ` H. Peter Anvin
2007-05-01  2:48                         ` H. Peter Anvin
2007-04-30 23:10                 ` Eric W. Biederman
2007-05-02  9:31             ` Gerd Hoffmann
2007-05-02 15:16               ` Jeremy Fitzhardinge
2007-05-02 20:51                 ` H. Peter Anvin
2007-05-02 21:01                   ` Jeremy Fitzhardinge
2007-05-02 21:09                     ` H. Peter Anvin
2007-05-02 21:39                       ` Jeremy Fitzhardinge
2007-05-02 21:59                         ` H. Peter Anvin
2007-05-02 23:03                           ` Jeremy Fitzhardinge
2007-05-03  4:50                           ` Vivek Goyal
2007-05-03  6:42                             ` Eric W. Biederman
2007-05-03  7:05                               ` Jeremy Fitzhardinge
2007-05-03 13:23                                 ` Eric W. Biederman
2007-05-03 16:23                                   ` Jeremy Fitzhardinge
2007-05-08 16:41                               ` yhlu
2007-05-08 17:18                                 ` Eric W. Biederman
2007-05-08 17:33                                   ` yhlu
2007-05-08 17:33                                     ` yhlu
2007-05-08 18:51                                     ` yhlu
2007-05-08 19:01                                       ` yhlu
2007-05-08 19:11                                       ` Eric W. Biederman
2007-05-08 22:00                                         ` yhlu
2007-05-08 22:07                                           ` Jeremy Fitzhardinge
2007-05-08 22:35                                             ` H. Peter Anvin
2007-05-08 22:41                                               ` yhlu
2007-05-08 23:13                                                 ` H. Peter Anvin
2007-05-09  1:44                                                   ` Eric W. Biederman
2007-05-09  2:23                                                     ` H. Peter Anvin
2007-05-09  3:30                                                       ` Eric W. Biederman
2007-05-09  4:52                                                         ` yhlu
2007-05-09  5:04                                                           ` H. Peter Anvin
2007-05-09  5:08                                                             ` H. Peter Anvin
2007-05-09  5:48                                                               ` yhlu
2007-05-09  5:54                                                                 ` H. Peter Anvin
2007-05-09  5:55                                                                 ` H. Peter Anvin
2007-05-09 10:52                                                                   ` Eric W. Biederman
2007-05-09 16:31                                                                     ` yhlu
2007-05-09 19:21                                                                     ` H. Peter Anvin
2007-05-10  0:52                                                                       ` Eric W. Biederman
2007-05-10  0:52                                                                         ` Eric W. Biederman
2007-05-09 19:21                                                                     ` H. Peter Anvin
2007-05-09  5:55                                                                 ` H. Peter Anvin
2007-05-09  5:48                                                               ` yhlu
2007-05-09  5:08                                                             ` H. Peter Anvin
2007-05-09  5:04                                                           ` H. Peter Anvin
2007-05-09  4:52                                                         ` yhlu
2007-05-09  7:58                                                         ` Gerd Hoffmann
2007-05-09 11:21                                                           ` Eric W. Biederman
2007-05-10  0:55                                                           ` yhlu
2007-05-09  7:58                                                         ` Gerd Hoffmann
2007-05-09  3:30                                                       ` Eric W. Biederman
2007-05-09  2:44                                                     ` yhlu
2007-05-09  2:44                                                     ` yhlu
2007-05-09  1:44                                                   ` Eric W. Biederman
2007-05-08 22:41                                               ` yhlu
2007-05-08 22:35                                             ` H. Peter Anvin
2007-05-08 19:11                                       ` Eric W. Biederman
2007-05-09  3:33                                       ` Vivek Goyal
2007-05-09  4:42                                         ` yhlu
2007-05-09  4:58                                           ` ebiederm
2007-05-09  4:58                                           ` Eric W. Biederman
2007-05-09  4:58                                           ` ebiederm
2007-05-09  4:58                                           ` Eric W. Biederman
2007-05-09  4:42                                         ` yhlu
2007-05-08 18:51                                     ` yhlu
2007-05-08 17:24                                 ` Vivek Goyal
2007-05-08 17:34                                   ` yhlu
2007-05-08 16:41                               ` yhlu
2007-05-02 21:59                         ` H. Peter Anvin
2007-05-03  2:01                       ` Rusty Russell
2007-05-02 21:09                     ` H. Peter Anvin
2007-05-02 21:17                   ` Eric W. Biederman
2007-05-02 21:24                     ` H. Peter Anvin
2007-05-02 21:36                       ` Eric W. Biederman
2007-05-02 21:24                     ` H. Peter Anvin
2007-05-02  9:31             ` Gerd Hoffmann
2007-04-29 17:51         ` H. Peter Anvin
2007-04-29 18:10           ` Eric W. Biederman
2007-04-30  4:41         ` Rusty Russell
2007-04-28 17:59 ` [PATCH] [22/22] x86_64: Move cpu verification code to common file Andi Kleen