linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
@ 2012-12-13 22:01 Yinghai Lu
  2012-12-13 22:01 ` [PATCH v6 01/27] x86, mm: Fix page table early allocation offset checking Yinghai Lu
                   ` (27 more replies)
  0 siblings, 28 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:01 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

Currently kdump's reserved region must stay below 896M because of a kexec
limitation, and the bzImage also needs to stay under 4G.

To let kexec/kdump use ranges above 4G, we need to make the bzImage and
ramdisk loadable above 4G. During boot the bzImage is then unpacked in
place and stays high.

The patches add fields to setup_header and boot_params to:
1. get the ramdisk position above 4G from the bootloader/kexec
2. get the cmd_line_ptr above 4G from the bootloader/kexec
3. set xloadflags bit0 in the header, which the bootloader/kexec loader
   can check to decide whether it may put the bzImage high
4. use a sentinel to make sure the ext_* fields in boot_params are valid
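As a rough illustration of point 3, a bootloader-side check might look like
the sketch below. Only the bit position (bit0 of xloadflags) comes from the
series; the flag name here is an assumption for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for xloadflags bit0; the series only fixes the bit
 * position, not this identifier. */
#define XLF_CAN_LOAD_ABOVE_4G (1 << 0)

/* Bootloader-side sketch: may this bzImage be placed above 4G? */
static int can_load_high(uint16_t xloadflags)
{
	return (xloadflags & XLF_CAN_LOAD_ABOVE_4G) != 0;
}
```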

This patch set was tested with kexec-tools carrying local changes; those
changes will be sent to the kexec list later.

The series can be found at:

        git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot

It is on top of Linus's tree as of 2012-12-13 (after the PCI merge),
plus tip:x86/mm2.

-v2: add ext_cmd_line_ptr support, and handle the case where
     boot_params/cmd_line is above 4G.
-v3: per hpa, use xloadflags instead of code32_start_offset;
     0x200 will not be changed.
-v4: move ext_ramdisk_image/ext_ramdisk_size/ext_cmd_line_ptr to boot_params;
     add handling for the cross-GB-boundary case.
-v5: put the spare pages in the BRK, to avoid wasting about 4 pages;
     add a check for the USE_EXT_BOOT_PARAMS bit in xloadflags.
-v6: use a sentinel, per hpa;
     add kdump load-high support.

Yinghai Lu (27):
  x86, mm: Fix page table early allocation offset checking
  x86, mm: make pgd next calculation consistent with pud/pmd
  x86, boot: move verify_cpu.S and no_longmode after 0x200
  x86, boot: Move lldt/ltr out of 64bit code section
  x86, 64bit: clear ident mapping when kernel is above 512G
  x86, 64bit: Set extra ident mapping for whole kernel range
  x86: Merge early_reserve_initrd for 32bit and 64bit
  x86: add get_ramdisk_image/size()
  x86, boot: add get_cmd_line_ptr()
  x86, boot: move checking of cmd_line_ptr out of common path
  x86, boot: update cmd_line_ptr to unsigned long
  x86: use io_remap to access real_mode_data
  x86: use rsi/rdi to pass realmode_data pointer
  x86, kexec: remove 1024G limitation for kexec buffer on 64bit
  x86, kexec: set ident mapping for kernel that is above max_pfn
  x86, kexec: Merge ident_mapping_init and init_level4_page
  x86, kexec: only set ident mapping for ram.
  x86, boot: add fields to support load bzImage and ramdisk above 4G
  x86, boot: update comments about entries for 64bit image
  x86, 64bit: Print init kernel lowmap correctly
  x86, boot: Not need to check setup_header version
  mm: Add alloc_bootmem_low_pages_nopanic()
  x86: Don't panic if can not alloc buffer for swiotlb
  x86: Add swiotlb force off support
  x86, kdump: remove crashkernel range find limit for 64bit
  x86: add Crash kernel low reservation
  x86: Merge early kernel reserve for 32bit and 64bit

 Documentation/kernel-parameters.txt |   10 ++
 Documentation/x86/boot.txt          |   15 ++-
 Documentation/x86/zero-page.txt     |    4 +
 arch/x86/boot/boot.h                |   18 ++-
 arch/x86/boot/cmdline.c             |   12 +-
 arch/x86/boot/compressed/cmdline.c  |   12 +-
 arch/x86/boot/compressed/head_64.S  |   48 +++++---
 arch/x86/boot/compressed/misc.c     |   12 ++
 arch/x86/boot/header.S              |   12 +-
 arch/x86/boot/setup.ld              |    8 +-
 arch/x86/include/asm/bootparam.h    |   12 +-
 arch/x86/include/asm/kexec.h        |    6 +-
 arch/x86/include/asm/page.h         |    4 +
 arch/x86/kernel/head32.c            |   20 ---
 arch/x86/kernel/head64.c            |   63 +++++++---
 arch/x86/kernel/head_64.S           |  207 ++++++++++++++++++++++++++++---
 arch/x86/kernel/machine_kexec_64.c  |  228 ++++++++++++++++-------------------
 arch/x86/kernel/pci-swiotlb.c       |   15 ++-
 arch/x86/kernel/setup.c             |  144 +++++++++++++++++-----
 arch/x86/mm/init.c                  |    8 +-
 arch/x86/mm/init_64.c               |   12 +-
 drivers/iommu/amd_iommu.c           |    1 +
 include/linux/bootmem.h             |    5 +
 include/linux/kexec.h               |    3 +
 include/linux/swiotlb.h             |    3 +-
 kernel/kexec.c                      |   34 +++++-
 lib/swiotlb.c                       |   22 ++--
 mm/bootmem.c                        |    8 ++
 mm/nobootmem.c                      |    8 ++
 29 files changed, 673 insertions(+), 281 deletions(-)

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH v6 01/27] x86, mm: Fix page table early allocation offset checking
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
@ 2012-12-13 22:01 ` Yinghai Lu
  2012-12-14 10:53   ` Borislav Petkov
  2012-12-13 22:01 ` [PATCH v6 02/27] x86, mm: make pgd next calculation consistent with pud/pmd Yinghai Lu
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:01 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

While debugging loading the kernel above 4G, we found that one page in the
BRK was left unused when the early page table allocation should have been
able to use it.

Fix that check, and also print every page table allocation made from the
BRK.
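The off-by-one being fixed can be seen in isolation. Names mirror the
kernel's pgt_buf_* variables, but this is an illustrative sketch, not
kernel code:

```c
#include <assert.h>

/* Does an allocation of 'num' pages still fit in the BRK page-table
 * buffer?  The old check used '>=', which wrongly treated an exact fit
 * (pgt_buf_end + num == pgt_buf_top) as out of space, leaving the last
 * page unused. */
static int brk_fits(unsigned long pgt_buf_end, unsigned int num,
		    unsigned long pgt_buf_top)
{
	return pgt_buf_end + num <= pgt_buf_top;	/* fail only past top */
}
```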

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 6f85de8..c4293cf 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -47,7 +47,7 @@ __ref void *alloc_low_pages(unsigned int num)
 						__GFP_ZERO, order);
 	}
 
-	if ((pgt_buf_end + num) >= pgt_buf_top) {
+	if ((pgt_buf_end + num) > pgt_buf_top) {
 		unsigned long ret;
 		if (min_pfn_mapped >= max_pfn_mapped)
 			panic("alloc_low_page: ran out of memory");
@@ -61,6 +61,8 @@ __ref void *alloc_low_pages(unsigned int num)
 	} else {
 		pfn = pgt_buf_end;
 		pgt_buf_end += num;
+		printk(KERN_DEBUG "BRK [%#010lx, %#010lx] PGTABLE\n",
+			pfn << PAGE_SHIFT, (pgt_buf_end << PAGE_SHIFT) - 1);
 	}
 
 	for (i = 0; i < num; i++) {
-- 
1.7.10.4



* [PATCH v6 02/27] x86, mm: make pgd next calculation consistent with pud/pmd
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
  2012-12-13 22:01 ` [PATCH v6 01/27] x86, mm: Fix page table early allocation offset checking Yinghai Lu
@ 2012-12-13 22:01 ` Yinghai Lu
  2012-12-14 14:34   ` Borislav Petkov
  2012-12-13 22:01 ` [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200 Yinghai Lu
                   ` (25 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:01 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

Make the pgd "next" calculation match the PUD_SIZE and PMD_SIZE ones:
round the start down and add the unit size.

Also remove the no-longer-needed clamping of "next" and just pass "end"
instead; phys_pud_init() already uses a PTRS_PER_PUD check to exit early
when "end" is too big.
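The new form of the calculation, sketched on its own with the x86-64
4-level paging constants hard-coded (illustrative only):

```c
#include <assert.h>

#define PGDIR_SHIFT	39			/* each pgd entry covers 512G */
#define PGDIR_SIZE	(1ULL << PGDIR_SHIFT)
#define PGDIR_MASK	(~(PGDIR_SIZE - 1))

/* Round start down to its pgd boundary, then add PGDIR_SIZE; this is
 * the same shape as the pud/pmd loops' "next" computation. */
static unsigned long long pgd_next(unsigned long long start)
{
	return (start & PGDIR_MASK) + PGDIR_SIZE;
}
```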

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/mm/init_64.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4178530..91f116a 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -530,9 +530,7 @@ kernel_physical_mapping_init(unsigned long start,
 		pgd_t *pgd = pgd_offset_k(start);
 		pud_t *pud;
 
-		next = (start + PGDIR_SIZE) & PGDIR_MASK;
-		if (next > end)
-			next = end;
+		next = (start & PGDIR_MASK) + PGDIR_SIZE;
 
 		if (pgd_val(*pgd)) {
 			pud = (pud_t *)pgd_page_vaddr(*pgd);
@@ -542,7 +540,7 @@ kernel_physical_mapping_init(unsigned long start,
 		}
 
 		pud = alloc_low_page();
-		last_map_addr = phys_pud_init(pud, __pa(start), __pa(next),
+		last_map_addr = phys_pud_init(pud, __pa(start), __pa(end),
 						 page_size_mask);
 
 		spin_lock(&init_mm.page_table_lock);
-- 
1.7.10.4



* [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
  2012-12-13 22:01 ` [PATCH v6 01/27] x86, mm: Fix page table early allocation offset checking Yinghai Lu
  2012-12-13 22:01 ` [PATCH v6 02/27] x86, mm: make pgd next calculation consistent with pud/pmd Yinghai Lu
@ 2012-12-13 22:01 ` Yinghai Lu
  2012-12-15 17:06   ` Borislav Petkov
  2012-12-13 22:01 ` [PATCH v6 04/27] x86, boot: Move lldt/ltr out of 64bit code section Yinghai Lu
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:01 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu, Matt Fleming

We are short of space before 0x200, which is the entry point for
startup_64.

According to hpa, we cannot move startup_64 to another offset; that
offset has become ABI.

We can move the verify_cpu and no_longmode code down instead, because
verify_cpu is only reached via call and no_longmode never returns.
Moving other lines would require extra jumps back and forth.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Matt Fleming <matt.fleming@intel.com>
---
 arch/x86/boot/compressed/head_64.S |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2c4b171..fb984c0 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -176,14 +176,6 @@ ENTRY(startup_32)
 	lret
 ENDPROC(startup_32)
 
-no_longmode:
-	/* This isn't an x86-64 CPU so hang */
-1:
-	hlt
-	jmp     1b
-
-#include "../../kernel/verify_cpu.S"
-
 	/*
 	 * Be careful here startup_64 needs to be at a predictable
 	 * address so I can export it in an ELF header.  Bootloaders
@@ -349,6 +341,15 @@ relocated:
  */
 	jmp	*%rbp
 
+	.code32
+no_longmode:
+	/* This isn't an x86-64 CPU so hang */
+1:
+	hlt
+	jmp     1b
+
+#include "../../kernel/verify_cpu.S"
+
 	.data
 gdt:
 	.word	gdt_end - gdt
-- 
1.7.10.4



* [PATCH v6 04/27] x86, boot: Move lldt/ltr out of 64bit code section
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (2 preceding siblings ...)
  2012-12-13 22:01 ` [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200 Yinghai Lu
@ 2012-12-13 22:01 ` Yinghai Lu
  2012-12-15 17:28   ` Borislav Petkov
  2012-12-13 22:01 ` [PATCH v6 05/27] x86, 64bit: clear ident mapping when kernel is above 512G Yinghai Lu
                   ` (23 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:01 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu,
	Zachary Amsden, Matt Fleming

Commit 08da5a2ca

    x86_64: Early segment setup for VT

added lldt/ltr to clear more segments.

That code was placed in the 64-bit section, but it uses a GDT that is
only loaded on the 32-bit path.

That breaks booting with a 64-bit bootloader that does not go through
the 32-bit path: such a loader enters at startup_64 directly, with a
different GDT.

Move those instructions into the 32-bit path, after its GDT is loaded.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Zachary Amsden <zamsden@gmail.com>
Cc: Matt Fleming <matt.fleming@intel.com>
---
 arch/x86/boot/compressed/head_64.S |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fb984c0..5c80b94 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -154,6 +154,12 @@ ENTRY(startup_32)
 	btsl	$_EFER_LME, %eax
 	wrmsr
 
+	/* After gdt is loaded */
+	xorl	%eax, %eax
+	lldt	%ax
+	movl    $0x20, %eax
+	ltr	%ax
+
 	/*
 	 * Setup for the jump to 64bit mode
 	 *
@@ -239,9 +245,6 @@ preferred_addr:
 	movl	%eax, %ss
 	movl	%eax, %fs
 	movl	%eax, %gs
-	lldt	%ax
-	movl    $0x20, %eax
-	ltr	%ax
 
 	/*
 	 * Compute the decompressed kernel start address.  It is where
-- 
1.7.10.4



* [PATCH v6 05/27] x86, 64bit: clear ident mapping when kernel is above 512G
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (3 preceding siblings ...)
  2012-12-13 22:01 ` [PATCH v6 04/27] x86, boot: Move lldt/ltr out of 64bit code section Yinghai Lu
@ 2012-12-13 22:01 ` Yinghai Lu
  2012-12-16 17:49   ` Borislav Petkov
  2012-12-13 22:02 ` [PATCH v6 06/27] x86, 64bit: Set extra ident mapping for whole kernel range Yinghai Lu
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:01 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

The following patch:
	x86, 64bit: Set extra ident mapping for whole kernel range

adds an extra ident mapping for a kernel loaded above 1G.

So we also need to clear the extra pgd entries when the kernel is loaded
above 512G.
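pgd_index() here picks the 512G slot an address falls in; a sketch with
the x86-64 constants spelled out (illustrative, not the kernel macro):

```c
#include <assert.h>

#define PGDIR_SHIFT	39		/* each pgd entry covers 512G */
#define PTRS_PER_PGD	512

/* Which pgd slot does a physical address fall in? */
static unsigned long pgd_index(unsigned long long addr)
{
	return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}
```

A kernel below 512G only dirties slot 0 (already cleared by the existing
pgd_clear()); above 512G, _text and _end land in nonzero slots that must
be cleared too.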

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/head64.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..3ef9ce6 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -29,7 +29,17 @@
 static void __init zap_identity_mappings(void)
 {
 	pgd_t *pgd = pgd_offset_k(0UL);
+	unsigned long pa_text = __pa_symbol(_text);
+	unsigned long pa_end = __pa_symbol(_end);
+
 	pgd_clear(pgd);
+
+	/* When kernel is loaded above 512G */
+	if (pa_text >= PGDIR_SIZE)
+		pgd_clear(pgd + pgd_index(pa_text));
+	if (pa_end - 1 >= PGDIR_SIZE)
+		pgd_clear(pgd + pgd_index(pa_end - 1));
+
 	__flush_tlb_all();
 }
 
-- 
1.7.10.4



* [PATCH v6 06/27] x86, 64bit: Set extra ident mapping for whole kernel range
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (4 preceding siblings ...)
  2012-12-13 22:01 ` [PATCH v6 05/27] x86, 64bit: clear ident mapping when kernel is above 512G Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 07/27] x86: Merge early_reserve_initrd for 32bit and 64bit Yinghai Lu
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

Currently, when the kernel is loaded above 1G, only [_text, _text+2M] is
covered by the extra ident page table.
That is not enough: some variables used early, such as the BRK area for
early page tables, are outside that range.
We need to map [_text, _end], covering text/data/bss/brk.

Also, the kernel currently refuses to be loaded above 512G, treating
that address as too big.
We add one spare page for a level3 table to cover such a 512G range:
check the _text range, point the level4 entry at that spare level3 page,
and point the level3 entries at level2 pages so the extra mapping covers
[_text, _end].

Finally, to handle crossing a GB boundary we need another spare level2
page, and to handle crossing a 512GB boundary we need another spare
level3 page for the next 512G range.

Tested with kexec-tools plus local test code that forces loading the
kernel across the 1G, 5G, 512G, and 513G boundaries.

We need this to put a relocatable 64-bit bzImage high above 1G.

-v4: add crossing-GB-boundary handling.
-v5: use spare pages from the BRK, to save pages when the kernel is not
	loaded above 1G.
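How many of the four reserved spare pages actually get consumed depends on
which boundaries [_text, _end] crosses. The following is my reading of the
case analysis in this commit message, as a sketch rather than kernel code,
and it assumes the kernel sits above 1G at all:

```c
#include <assert.h>

/* For a kernel mapped at [pa_text, pa_end) above 1G, count the spare
 * level3 and level2 pages consumed from the BRK.  level3_ident_pgt
 * already covers the first 512G, so only non-first 512G ranges need a
 * spare level3; each 1G range touched needs one spare level2. */
static void spare_pages(unsigned long long pa_text, unsigned long long pa_end,
			int *level3, int *level2)
{
	unsigned long long s512 = pa_text >> 39, e512 = (pa_end - 1) >> 39;
	unsigned long long s1g  = pa_text >> 30, e1g  = (pa_end - 1) >> 30;

	*level3 = (s512 != 0) + (e512 != s512 && e512 != 0);
	*level2 = 1 + (e1g != s1g);
}
```

In the worst case (crossing a 512G boundary away from the first 512G) all
four pages are used; in the common case well under 512G, only one or two
are, which is why -v5 moved them into the BRK.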

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/kernel/head_64.S |  203 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 187 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 980053c..7d13874 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -20,6 +20,7 @@
 #include <asm/processor-flags.h>
 #include <asm/percpu.h>
 #include <asm/nops.h>
+#include <asm/setup.h>
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/asm-offsets.h>
@@ -42,6 +43,13 @@ L3_PAGE_OFFSET = pud_index(__PAGE_OFFSET)
 L4_START_KERNEL = pgd_index(__START_KERNEL_map)
 L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
+/* two for level3, and two for level2 */
+SPARE_MAP_SIZE = (4 * PAGE_SIZE)
+RESERVE_BRK(spare_map, SPARE_MAP_SIZE)
+
+#define spare_page(x)	(__brk_base + (x) * PAGE_SIZE)
+#define add_one_spare_page	addq $PAGE_SIZE, _brk_end(%rip)
+
 	.text
 	__HEAD
 	.code64
@@ -78,12 +86,6 @@ startup_64:
 	testl	%eax, %eax
 	jnz	bad_address
 
-	/* Is the address too large? */
-	leaq	_text(%rip), %rdx
-	movq	$PGDIR_SIZE, %rax
-	cmpq	%rax, %rdx
-	jae	bad_address
-
 	/* Fixup the physical addresses in the page table
 	 */
 	addq	%rbp, init_level4_pgt + 0(%rip)
@@ -97,25 +99,196 @@ startup_64:
 
 	addq	%rbp, level2_fixmap_pgt + (506*8)(%rip)
 
-	/* Add an Identity mapping if I am above 1G */
+	/* Add an Identity mapping if _end is above 1G */
+	leaq	_end(%rip), %r9
+	decq	%r9
+	cmp	$PUD_SIZE, %r9
+	jl	ident_complete
+
+	/* Clear spare pages */
+	leaq	__brk_base(%rip), %rdi
+	xorq	%rax, %rax
+	movq	$(SPARE_MAP_SIZE/8), %rcx
+1:	decq	%rcx
+	movq	%rax, (%rdi)
+	leaq	8(%rdi), %rdi
+	jnz	1b
+
+	/* get end */
+	andq	$PMD_PAGE_MASK, %r9
+	/* round start to 1G if it is below 1G */
 	leaq	_text(%rip), %rdi
 	andq	$PMD_PAGE_MASK, %rdi
+	cmp	$PUD_SIZE, %rdi
+	jg	1f
+	movq	$PUD_SIZE, %rdi
+1:
+	/* get 512G index */
+	movq	%r9, %r8
+	shrq	$PGDIR_SHIFT, %r8
+	andq	$(PTRS_PER_PGD - 1), %r8
+	movq	%rdi, %rax
+	shrq	$PGDIR_SHIFT, %rax
+	andq	$(PTRS_PER_PGD - 1), %rax
+
+	/* cross two 512G ? */
+	cmp	%r8, %rax
+	jne	set_level3_other_512g
+
+	/* all in first 512G ? */
+	cmp	$0, %rax
+	je	skip_level3_spare
+
+	/* same 512G other than first 512g */
+	/*
+	 * We need one level3, one or two level 2,
+	 * so use first one for level3.
+	 */
+	leaq    (spare_page(0) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	leaq    init_level4_pgt(%rip), %rbx
+	movq    %rdx, 0(%rbx, %rax, 8)
+	addq    $L4_PAGE_OFFSET, %rax
+	movq    %rdx, 0(%rbx, %rax, 8)
+	/* one level3 in BRK */
+	add_one_spare_page
+
+	/* get 1G index */
+	movq    %r9, %r8
+	shrq    $PUD_SHIFT, %r8
+	andq    $(PTRS_PER_PUD - 1), %r8
+	movq    %rdi, %rax
+	shrq    $PUD_SHIFT, %rax
+	andq    $(PTRS_PER_PUD - 1), %rax
+
+	/* same 1G ? */
+	cmp     %r8, %rax
+	je	set_level2_start_only_not_first_512g
+
+	/* set level2 for end */
+	leaq    spare_page(0)(%rip), %rbx
+	leaq    (spare_page(2) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	movq    %rdx, 0(%rbx, %r8, 8)
+	/* second one level2 in BRK */
+	add_one_spare_page
+
+set_level2_start_only_not_first_512g:
+	leaq    spare_page(0)(%rip), %rbx
+	leaq    (spare_page(1) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	movq    %rdx, 0(%rbx, %rax, 8)
+	/* first one level2 in BRK */
+	add_one_spare_page
+
+	/* one spare level3 before level2*/
+	leaq    spare_page(1)(%rip), %rbx
+	jmp	set_level2_spare
+
+set_level3_other_512g:
+	/*
+	 * We need one or two level3, and two level2,
+	 * so use first two for level2.
+	 */
+	/* for level2 last on first 512g */
+	leaq	level3_ident_pgt(%rip), %rcx
+	/* start is in first 512G ? */
+	cmp	$0, %rax
+	je	set_level2_start_other_512g
+
+	/* Set level3 for _text */
+	leaq	(spare_page(3) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	leaq	init_level4_pgt(%rip), %rbx
+	movq	%rdx, 0(%rbx, %rax, 8)
+	addq	$L4_PAGE_OFFSET, %rax
+	movq	%rdx, 0(%rbx, %rax, 8)
+	/* first one level3 in BRK */
+	add_one_spare_page
+
+	/* for level2 last not on first 512G */
+	leaq	spare_page(3)(%rip), %rcx
 
+set_level2_start_other_512g:
+	/* always need to set level2 */
 	movq	%rdi, %rax
 	shrq	$PUD_SHIFT, %rax
 	andq	$(PTRS_PER_PUD - 1), %rax
-	jz	ident_complete
+	movq	%rcx, %rbx  /* %rcx : level3 spare or level3_ident_pgt */
+	leaq	(spare_page(0) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	movq	%rdx, 0(%rbx, %rax, 8)
+	/* first one level2 in BRK */
+	add_one_spare_page
+
+set_level3_end_other_512g:
+	leaq	(spare_page(2) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	leaq	init_level4_pgt(%rip), %rbx
+	movq	%rdx, 0(%rbx, %r8, 8)
+	addq	$L4_PAGE_OFFSET, %r8
+	movq	%rdx, 0(%rbx, %r8, 8)
+	/* second one level3 in BRK */
+	add_one_spare_page
+
+	/* always need to set level2 */
+	movq	%r9, %r8
+	shrq	$PUD_SHIFT, %r8
+	andq	$(PTRS_PER_PUD - 1), %r8
+	leaq	spare_page(2)(%rip), %rbx
+	leaq	(spare_page(1) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	movq	%rdx, 0(%rbx, %r8, 8)
+	/* second one level2 in BRK */
+	add_one_spare_page
+
+	/* no spare level3 before level2 */
+	leaq    spare_page(0)(%rip), %rbx
+	jmp	set_level2_spare
+
+skip_level3_spare:
+	/* We have one or two level2 */
+	/* get 1G index */
+	movq	%r9, %r8
+	shrq	$PUD_SHIFT, %r8
+	andq	$(PTRS_PER_PUD - 1), %r8
+	movq	%rdi, %rax
+	shrq	$PUD_SHIFT, %rax
+	andq	$(PTRS_PER_PUD - 1), %rax
+
+	/* same 1G ? */
+	cmp	%r8, %rax
+	je	set_level2_start_only_first_512g
 
-	leaq	(level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	/* set level2 without level3 spare */
 	leaq	level3_ident_pgt(%rip), %rbx
+	leaq	(spare_page(1) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+	movq	%rdx, 0(%rbx, %r8, 8)
+	/* second one level2 in BRK */
+	add_one_spare_page
+
+set_level2_start_only_first_512g:
+	/*  set level2 without level3 spare */
+	leaq	level3_ident_pgt(%rip), %rbx
+	leaq	(spare_page(0) - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
 	movq	%rdx, 0(%rbx, %rax, 8)
+	/* first one level2 in BRK */
+	add_one_spare_page
+
+	/* no spare level3 */
+	leaq    spare_page(0)(%rip), %rbx
 
+set_level2_spare:
 	movq	%rdi, %rax
 	shrq	$PMD_SHIFT, %rax
 	andq	$(PTRS_PER_PMD - 1), %rax
 	leaq	__PAGE_KERNEL_IDENT_LARGE_EXEC(%rdi), %rdx
-	leaq	level2_spare_pgt(%rip), %rbx
-	movq	%rdx, 0(%rbx, %rax, 8)
+	/* %rbx is set before */
+	movq	%r9, %r8
+	shrq	$PMD_SHIFT, %r8
+	andq	$(PTRS_PER_PMD - 1), %r8
+	cmp	%r8, %rax
+	jl	1f
+	addq	$PTRS_PER_PMD, %r8
+1:	movq	%rdx, 0(%rbx, %rax, 8)
+	addq	$PMD_SIZE, %rdx
+	incq	%rax
+	cmp	%r8, %rax
+	jle	1b
+
 ident_complete:
 
 	/*
@@ -439,11 +612,9 @@ NEXT_PAGE(level2_kernel_pgt)
 	 *  If you want to increase this then increase MODULES_VADDR
 	 *  too.)
 	 */
-	PMDS(0, __PAGE_KERNEL_LARGE_EXEC,
-		KERNEL_IMAGE_SIZE/PMD_SIZE)
-
-NEXT_PAGE(level2_spare_pgt)
-	.fill   512, 8, 0
+	PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)
+	/* hold the whole page */
+	.fill (PTRS_PER_PMD - (KERNEL_IMAGE_SIZE/PMD_SIZE)), 8, 0
 
 #undef PMDS
 #undef NEXT_PAGE
-- 
1.7.10.4



* [PATCH v6 07/27] x86: Merge early_reserve_initrd for 32bit and 64bit
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (5 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 06/27] x86, 64bit: Set extra ident mapping for whole kernel range Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 08/27] x86: add get_ramdisk_image/size() Yinghai Lu
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

The 32-bit and 64-bit versions are the same, so move them out of
head32.c/head64.c into setup.c.

We are using memblock, which handles overlapping reservations properly,
so we don't need an early placeholder reservation; we just need to make
sure the initrd is reserved before memblock is used to find free memory.
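The reservation in the patch assumes only the end of the initrd may be
unaligned; a minimal sketch of that rounding, with a 4K page size
hard-coded for illustration:

```c
#include <assert.h>

#define PAGE_SIZE	4096ULL
#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* [image, image+size) with a possibly unaligned end becomes the
 * whole-page reservation [image, PAGE_ALIGN(image + size)). */
static unsigned long long reserve_len(unsigned long long image,
				      unsigned long long size)
{
	return PAGE_ALIGN(image + size) - image;
}
```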

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
---
 arch/x86/kernel/head32.c |   11 -----------
 arch/x86/kernel/head64.c |   11 -----------
 arch/x86/kernel/setup.c  |   22 ++++++++++++++++++----
 3 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index c18f59d..4c52efc 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -33,17 +33,6 @@ void __init i386_start_kernel(void)
 	memblock_reserve(__pa_symbol(&_text),
 			 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
 
-#ifdef CONFIG_BLK_DEV_INITRD
-	/* Reserve INITRD */
-	if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
-		/* Assume only end is not page aligned */
-		u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-		u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
-		u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-		memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
-	}
-#endif
-
 	/* Call the subarch specific early setup function */
 	switch (boot_params.hdr.hardware_subarch) {
 	case X86_SUBARCH_MRST:
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 3ef9ce6..fbb68d4 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -110,17 +110,6 @@ void __init x86_64_start_reservations(char *real_mode_data)
 	memblock_reserve(__pa_symbol(&_text),
 			 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
 
-#ifdef CONFIG_BLK_DEV_INITRD
-	/* Reserve INITRD */
-	if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
-		/* Assume only end is not page aligned */
-		unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
-		unsigned long ramdisk_size  = boot_params.hdr.ramdisk_size;
-		unsigned long ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-		memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
-	}
-#endif
-
 	reserve_ebda_region();
 
 	/*
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 0982dc5..2bdcb0f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -360,6 +360,19 @@ static u64 __init get_mem_size(unsigned long limit_pfn)
 
 	return mapped_pages << PAGE_SHIFT;
 }
+static void __init early_reserve_initrd(void)
+{
+	/* Assume only end is not page aligned */
+	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
+
+	if (!boot_params.hdr.type_of_loader ||
+	    !ramdisk_image || !ramdisk_size)
+		return;		/* No initrd provided by bootloader */
+
+	memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
+}
 static void __init reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
@@ -386,10 +399,6 @@ static void __init reserve_initrd(void)
 	if (pfn_range_is_mapped(PFN_DOWN(ramdisk_image),
 				PFN_DOWN(ramdisk_end))) {
 		/* All are mapped, easy case */
-		/*
-		 * don't need to reserve again, already reserved early
-		 * in i386_start_kernel
-		 */
 		initrd_start = ramdisk_image + PAGE_OFFSET;
 		initrd_end = initrd_start + ramdisk_size;
 		return;
@@ -400,6 +409,9 @@ static void __init reserve_initrd(void)
 	memblock_free(ramdisk_image, ramdisk_end - ramdisk_image);
 }
 #else
+static void __init early_reserve_initrd(void)
+{
+}
 static void __init reserve_initrd(void)
 {
 }
@@ -661,6 +673,8 @@ early_param("reservelow", parse_reservelow);
 
 void __init setup_arch(char **cmdline_p)
 {
+	early_reserve_initrd();
+
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	visws_early_detect();
-- 
1.7.10.4



* [PATCH v6 08/27] x86: add get_ramdisk_image/size()
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (6 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 07/27] x86: Merge early_reserve_initrd for 32bit and 64bit Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 09/27] x86, boot: add get_cmd_line_ptr() Yinghai Lu
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

There are several places that look up ramdisk information early, for
reserving and relocating it.

Use helper functions to make the code more readable and consistent.

Later patches will add ext_ramdisk_image/size handling to those helpers
to support loading the ramdisk above 4G.
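The later extension these helpers prepare for would combine the existing
32-bit header field with a new ext_* high half, roughly as below. The
field names and the high/low split are assumptions based on the cover
letter, not code from this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: combine hdr.ramdisk_image (low 32 bits) with an
 * ext_ramdisk_image high half into a full 64-bit address. */
static uint64_t ramdisk_image_64(uint32_t ramdisk_image,
				 uint32_t ext_ramdisk_image)
{
	return (uint64_t)ramdisk_image | ((uint64_t)ext_ramdisk_image << 32);
}
```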

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/setup.c |   29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 2bdcb0f..9546c90 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -294,12 +294,25 @@ static void __init reserve_brk(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
+static u64 __init get_ramdisk_image(void)
+{
+	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+
+	return ramdisk_image;
+}
+static u64 __init get_ramdisk_size(void)
+{
+	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+
+	return ramdisk_size;
+}
+
 #define MAX_MAP_CHUNK	(NR_FIX_BTMAPS << PAGE_SHIFT)
 static void __init relocate_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_image = get_ramdisk_image();
+	u64 ramdisk_size  = get_ramdisk_size();
 	u64 area_size     = PAGE_ALIGN(ramdisk_size);
 	u64 ramdisk_here;
 	unsigned long slop, clen, mapaddr;
@@ -338,8 +351,8 @@ static void __init relocate_initrd(void)
 		ramdisk_size  -= clen;
 	}
 
-	ramdisk_image = boot_params.hdr.ramdisk_image;
-	ramdisk_size  = boot_params.hdr.ramdisk_size;
+	ramdisk_image = get_ramdisk_image();
+	ramdisk_size  = get_ramdisk_size();
 	printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
 		" [mem %#010llx-%#010llx]\n",
 		ramdisk_image, ramdisk_image + ramdisk_size - 1,
@@ -363,8 +376,8 @@ static u64 __init get_mem_size(unsigned long limit_pfn)
 static void __init early_reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_image = get_ramdisk_image();
+	u64 ramdisk_size  = get_ramdisk_size();
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 
 	if (!boot_params.hdr.type_of_loader ||
@@ -376,8 +389,8 @@ static void __init early_reserve_initrd(void)
 static void __init reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
-	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-	u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+	u64 ramdisk_image = get_ramdisk_image();
+	u64 ramdisk_size  = get_ramdisk_size();
 	u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 	u64 mapped_size;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v6 09/27] x86, boot: add get_cmd_line_ptr()
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (7 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 08/27] x86: add get_ramdisk_image/size() Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 10/27] x86, boot: move checking of cmd_line_ptr out of common path Yinghai Lu
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

A later patch will extend get_cmd_line_ptr() to also check ext_cmd_line_ptr.
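
The eventual combination the accessor prepares for — low 32 bits from the legacy header field, high 32 bits from the new ext_ field — can be sketched in user space (field widths follow the boot protocol described in this series; the helper name is illustrative):

```c
#include <stdint.h>

/* Assemble a full 64-bit pointer from the legacy 32-bit
 * cmd_line_ptr plus the high half carried in ext_cmd_line_ptr. */
static uint64_t combine_cmd_line_ptr(uint32_t cmd_line_ptr,
				     uint32_t ext_cmd_line_ptr)
{
	return (uint64_t)cmd_line_ptr |
	       ((uint64_t)ext_cmd_line_ptr << 32);
}
```

A loader that keeps everything below 4G simply leaves the ext_ field zero, so the combined value degenerates to the old 32-bit pointer.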

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/cmdline.c |   10 ++++++++--
 arch/x86/kernel/head64.c           |   13 +++++++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index 10f6b11..b4c913c 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -13,13 +13,19 @@ static inline char rdfs8(addr_t addr)
 	return *((char *)(fs + addr));
 }
 #include "../cmdline.c"
+static unsigned long get_cmd_line_ptr(void)
+{
+	unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
+
+	return cmd_line_ptr;
+}
 int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-	return __cmdline_find_option(real_mode->hdr.cmd_line_ptr, option, buffer, bufsize);
+	return __cmdline_find_option(get_cmd_line_ptr(), option, buffer, bufsize);
 }
 int cmdline_find_option_bool(const char *option)
 {
-	return __cmdline_find_option_bool(real_mode->hdr.cmd_line_ptr, option);
+	return __cmdline_find_option_bool(get_cmd_line_ptr(), option);
 }
 
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index fbb68d4..0e83fc9 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -51,13 +51,22 @@ static void __init clear_bss(void)
 	       (unsigned long) __bss_stop - (unsigned long) __bss_start);
 }
 
+static unsigned long get_cmd_line_ptr(void)
+{
+	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+	return cmd_line_ptr;
+}
+
 static void __init copy_bootdata(char *real_mode_data)
 {
 	char * command_line;
+	unsigned long cmd_line_ptr;
 
 	memcpy(&boot_params, real_mode_data, sizeof boot_params);
-	if (boot_params.hdr.cmd_line_ptr) {
-		command_line = __va(boot_params.hdr.cmd_line_ptr);
+	cmd_line_ptr = get_cmd_line_ptr();
+	if (cmd_line_ptr) {
+		command_line = __va(cmd_line_ptr);
 		memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
 	}
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v6 10/27] x86, boot: move checking of cmd_line_ptr out of common path
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (8 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 09/27] x86, boot: add get_cmd_line_ptr() Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 11/27] x86, boot: update cmd_line_ptr to unsigned long Yinghai Lu
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

The __cmdline_find_option*() helpers in cmdline.c are shared between
the 16-bit setup code and the 32/64-bit decompressor code.

On the 32/64-bit-only path via kexec, we should not check whether the
pointer is below 1M, as the command line could be placed above 1M, or
even above 4G.

Move the accessibility check out of __cmdline_find_option() into the
16-bit callers, so the decompressor in misc.c can parse the command
line correctly.
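
The reason the < 1M check belongs only on the 16-bit side is that the setup code addresses memory as segment:offset (note the `cptr = cmdline_ptr & 0xf; set_fs(cmdline_ptr >> 4)` pattern in the diff below). A user-space sketch of that constraint, with an illustrative helper name:

```c
#include <stdint.h>

/* Real-mode segment:offset addressing: only linear addresses below
 * 0x100000 are representable, which is why the 16-bit callers must
 * reject anything at or above 1M before calling into cmdline.c. */
static int linear_to_seg_off(uint32_t linear,
			     uint16_t *seg, uint16_t *off)
{
	if (linear >= 0x100000)
		return -1;		/* inaccessible in real mode */
	*seg = linear >> 4;
	*off = linear & 0xf;
	return 0;
}
```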

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/boot.h    |   14 ++++++++++++--
 arch/x86/boot/cmdline.c |    8 ++++----
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index 18997e5..7fadf80 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -289,12 +289,22 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
 int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option);
 static inline int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-	return __cmdline_find_option(boot_params.hdr.cmd_line_ptr, option, buffer, bufsize);
+	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+	if (cmd_line_ptr >= 0x100000)
+		return -1;      /* inaccessible */
+
+	return __cmdline_find_option(cmd_line_ptr, option, buffer, bufsize);
 }
 
 static inline int cmdline_find_option_bool(const char *option)
 {
-	return __cmdline_find_option_bool(boot_params.hdr.cmd_line_ptr, option);
+	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+	if (cmd_line_ptr >= 0x100000)
+		return -1;      /* inaccessible */
+
+	return __cmdline_find_option_bool(cmd_line_ptr, option);
 }
 
 
diff --git a/arch/x86/boot/cmdline.c b/arch/x86/boot/cmdline.c
index 6b3b6f7..768f00f 100644
--- a/arch/x86/boot/cmdline.c
+++ b/arch/x86/boot/cmdline.c
@@ -41,8 +41,8 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
 		st_bufcpy	/* Copying this to buffer */
 	} state = st_wordstart;
 
-	if (!cmdline_ptr || cmdline_ptr >= 0x100000)
-		return -1;	/* No command line, or inaccessible */
+	if (!cmdline_ptr)
+		return -1;      /* No command line */
 
 	cptr = cmdline_ptr & 0xf;
 	set_fs(cmdline_ptr >> 4);
@@ -111,8 +111,8 @@ int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option)
 		st_wordskip,	/* Miscompare, skip */
 	} state = st_wordstart;
 
-	if (!cmdline_ptr || cmdline_ptr >= 0x100000)
-		return -1;	/* No command line, or inaccessible */
+	if (!cmdline_ptr)
+		return -1;      /* No command line */
 
 	cptr = cmdline_ptr & 0xf;
 	set_fs(cmdline_ptr >> 4);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v6 11/27] x86, boot: update cmd_line_ptr to unsigned long
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (9 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 10/27] x86, boot: move checking of cmd_line_ptr out of common path Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 12/27] x86: use io_remap to access real_mode_data Yinghai Lu
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

boot/compressed/misc.c can be built 64-bit, and cmd_line_ptr can point
above 4G.

So change the type to unsigned long, which is 64-bit on the 64-bit
path and 32-bit on the 32-bit path.
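
What goes wrong without the type change is plain truncation — a u32 silently drops the high bits of an above-4G pointer. A minimal demonstration (helper names are illustrative):

```c
#include <stdint.h>

/* Storing an above-4G pointer value in a u32 keeps only the low
 * 32 bits, so the decompressor would read the wrong address. */
static uint32_t as_u32(uint64_t cmd_line_ptr)
{
	return (uint32_t)cmd_line_ptr;
}

static int fits_in_u32(uint64_t cmd_line_ptr)
{
	return cmd_line_ptr <= 0xffffffffULL;
}
```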

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/boot.h    |    8 ++++----
 arch/x86/boot/cmdline.c |    4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index 7fadf80..5b75319 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -285,11 +285,11 @@ struct biosregs {
 void intcall(u8 int_no, const struct biosregs *ireg, struct biosregs *oreg);
 
 /* cmdline.c */
-int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int bufsize);
-int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option);
+int __cmdline_find_option(unsigned long cmdline_ptr, const char *option, char *buffer, int bufsize);
+int __cmdline_find_option_bool(unsigned long cmdline_ptr, const char *option);
 static inline int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
 	if (cmd_line_ptr >= 0x100000)
 		return -1;      /* inaccessible */
@@ -299,7 +299,7 @@ static inline int cmdline_find_option(const char *option, char *buffer, int bufs
 
 static inline int cmdline_find_option_bool(const char *option)
 {
-	u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
 	if (cmd_line_ptr >= 0x100000)
 		return -1;      /* inaccessible */
diff --git a/arch/x86/boot/cmdline.c b/arch/x86/boot/cmdline.c
index 768f00f..625d21b 100644
--- a/arch/x86/boot/cmdline.c
+++ b/arch/x86/boot/cmdline.c
@@ -27,7 +27,7 @@ static inline int myisspace(u8 c)
  * Returns the length of the argument (regardless of if it was
  * truncated to fit in the buffer), or -1 on not found.
  */
-int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int bufsize)
+int __cmdline_find_option(unsigned long cmdline_ptr, const char *option, char *buffer, int bufsize)
 {
 	addr_t cptr;
 	char c;
@@ -99,7 +99,7 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
  * Returns the position of that option (starts counting with 1)
  * or 0 on not found
  */
-int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option)
+int __cmdline_find_option_bool(unsigned long cmdline_ptr, const char *option)
 {
 	addr_t cptr;
 	char c;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v6 12/27] x86: use io_remap to access real_mode_data
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (10 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 11/27] x86, boot: update cmd_line_ptr to unsigned long Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 13/27] x86: use rsi/rdi to pass realmode_data pointer Yinghai Lu
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

When a 64-bit bootloader puts the real mode data above 4G, we cannot
access it directly yet, because arch/x86/kernel/head_64.S only sets up
an identity mapping for 0-1G plus the kernel code/data/bss.

Move the early_ioremap_init() call as early as possible, into
x86_64_start_kernel(), so copy_bootdata() can reach the data through a
temporary mapping.
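
The access pattern copy_bootdata() switches to is map, copy, unmap. A user-space model of that shape (sim_memremap()/sim_unmap() are stand-ins for early_memremap()/early_iounmap() and are identity operations here — this is a sketch of the call pattern, not of the mapping machinery):

```c
#include <stddef.h>
#include <string.h>

static void *sim_memremap(void *phys, size_t size)
{
	(void)size;
	return phys;		/* real code returns a temporary VA */
}

static void sim_unmap(void *virt, size_t size)
{
	(void)virt; (void)size;	/* real code tears the mapping down */
}

/* Map the source, copy it out, drop the mapping — the data is never
 * touched through a direct pointer that might not be mapped. */
static void copy_through_mapping(void *dst, void *src, size_t size)
{
	void *p = sim_memremap(src, size);

	memcpy(dst, p, size);
	sim_unmap(p, size);
}

static int copy_bootdata_model(void)
{
	char real_mode_data[16] = "HdrS";
	char boot_params[16] = { 0 };

	copy_through_mapping(boot_params, real_mode_data,
			     sizeof(boot_params));
	return memcmp(boot_params, real_mode_data, sizeof(boot_params));
}
```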

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/head64.c |   26 +++++++++++++++++++++++---
 arch/x86/kernel/setup.c  |    2 ++
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 0e83fc9..16eb325 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -62,12 +62,21 @@ static void __init copy_bootdata(char *real_mode_data)
 {
 	char * command_line;
 	unsigned long cmd_line_ptr;
+	char *p;
 
-	memcpy(&boot_params, real_mode_data, sizeof boot_params);
+	/*
+	 * for 64bit bootloader path, those data could be above 4G,
+	 * and we do not set ident mapping for them in head_64.S.
+	 * So need to use ioremap to access them.
+	 */
+	p = early_memremap((unsigned long)real_mode_data, sizeof(boot_params));
+	memcpy(&boot_params, p, sizeof(boot_params));
+	early_iounmap(p, sizeof(boot_params));
 	cmd_line_ptr = get_cmd_line_ptr();
 	if (cmd_line_ptr) {
-		command_line = __va(cmd_line_ptr);
+		command_line = early_memremap(cmd_line_ptr, COMMAND_LINE_SIZE);
 		memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
+		early_iounmap(command_line, COMMAND_LINE_SIZE);
 	}
 }
 
@@ -92,6 +101,10 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
+	/* boot_params is in bss */
+	early_ioremap_init();
+	copy_bootdata(real_mode_data);
+
 	/* Make NULL pointers segfault */
 	zap_identity_mappings();
 
@@ -114,7 +127,14 @@ void __init x86_64_start_kernel(char * real_mode_data)
 
 void __init x86_64_start_reservations(char *real_mode_data)
 {
-	copy_bootdata(__va(real_mode_data));
+	/*
+	 * hdr.version is always not 0, so check it to see
+	 *  if boot_params is copied or not.
+	 */
+	if (!boot_params.hdr.version) {
+		early_ioremap_init();
+		copy_bootdata(real_mode_data);
+	}
 
 	memblock_reserve(__pa_symbol(&_text),
 			 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 9546c90..e636c83 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -714,7 +714,9 @@ void __init setup_arch(char **cmdline_p)
 
 	early_trap_init();
 	early_cpu_init();
+#ifdef CONFIG_X86_32
 	early_ioremap_init();
+#endif
 
 	setup_olpc_ofw_pgd();
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v6 13/27] x86: use rsi/rdi to pass realmode_data pointer
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (11 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 12/27] x86: use io_remap to access real_mode_data Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 14/27] x86, kexec: remove 1024G limitation for kexec buffer on 64bit Yinghai Lu
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

When a 64-bit bootloader puts the real mode data above 4G, the pointer
is 64-bit instead of 32-bit.

Use rsi/rdi instead of esi/edi to pass the real_mode_data pointer
between asm code and C code.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/head_64.S |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 7d13874..4630d20 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -409,9 +409,9 @@ ENTRY(secondary_startup_64)
 	movl	initial_gs+4(%rip),%edx
 	wrmsr	
 
-	/* esi is pointer to real mode structure with interesting info.
+	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
-	movl	%esi, %edi
+	movq	%rsi, %rdi
 	
 	/* Finally jump to run C code and to be on real kernel address
 	 * Since we are running on identity-mapped space we have to jump
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v6 14/27] x86, kexec: remove 1024G limitation for kexec buffer on 64bit
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (12 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 13/27] x86: use rsi/rdi to pass realmode_data pointer Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 15/27] x86, kexec: set ident mapping for kernel that is above max_pfn Yinghai Lu
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

The 64-bit kernel now supports more than 1T of RAM, and kexec-tools
can find a buffer above 1T, so remove that obsolete limitation and use
MAXMEM instead.

Tested on a system with more than 1024G of RAM.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/include/asm/kexec.h |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 317ff17..11bfdc5 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -48,11 +48,11 @@
 # define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
 #else
 /* Maximum physical address we can use pages from */
-# define KEXEC_SOURCE_MEMORY_LIMIT      (0xFFFFFFFFFFUL)
+# define KEXEC_SOURCE_MEMORY_LIMIT      (MAXMEM-1)
 /* Maximum address we can reach in physical address mode */
-# define KEXEC_DESTINATION_MEMORY_LIMIT (0xFFFFFFFFFFUL)
+# define KEXEC_DESTINATION_MEMORY_LIMIT (MAXMEM-1)
 /* Maximum address we can use for the control pages */
-# define KEXEC_CONTROL_MEMORY_LIMIT     (0xFFFFFFFFFFUL)
+# define KEXEC_CONTROL_MEMORY_LIMIT     (MAXMEM-1)
 
 /* Allocate one page for the pdp and the second for the code */
 # define KEXEC_CONTROL_PAGE_SIZE  (4096UL + 4096UL)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v6 15/27] x86, kexec: set ident mapping for kernel that is above max_pfn
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (13 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 14/27] x86, kexec: remove 1024G limitation for kexec buffer on 64bit Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 16/27] x86, kexec: Merge ident_mapping_init and init_level4_page Yinghai Lu
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

When the first kernel is booted with memmap= or mem= to limit max_pfn,
kexec can load the second kernel above that max_pfn.

In that case we need to set up an identity mapping for the whole
image, not just for the first 2M.
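
The loop bounds in ident_mapping_init() below — round the range out to 2M boundaries, then step one PMD at a time — can be checked in isolation. A user-space sketch (the macros mirror the x86-64 values; the counting helper is illustrative):

```c
#define PMD_SIZE	(2UL << 20)	/* 2 MiB large pages on x86-64 */
#define round_down(x, a)	((x) & ~((a) - 1))
#define round_up(x, a)		(((x) + (a) - 1) & ~((a) - 1))

/* Mirror the rounding in ident_mapping_init() and count how many
 * 2M level-2 entries a [mstart, mend) range needs. */
static unsigned long pmd_pages_for_range(unsigned long mstart,
					 unsigned long mend)
{
	unsigned long n = 0;

	mstart = round_down(mstart, PMD_SIZE);
	mend   = round_up(mend - 1, PMD_SIZE);
	while (mstart < mend) {
		n++;
		mstart += PMD_SIZE;
	}
	return n;
}
```

Note the `mend - 1` before rounding up: a range ending exactly on a 2M boundary does not pull in an extra page.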

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/machine_kexec_64.c |   43 +++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index b3ea9db..be14ee1 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -56,6 +56,25 @@ out:
 	return result;
 }
 
+static int ident_mapping_init(struct kimage *image, pgd_t *level4p,
+				unsigned long mstart, unsigned long mend)
+{
+	int result;
+
+	mstart = round_down(mstart, PMD_SIZE);
+	mend   = round_up(mend - 1, PMD_SIZE);
+
+	while (mstart < mend) {
+		result = init_one_level2_page(image, level4p, mstart);
+		if (result)
+			return result;
+
+		mstart += PMD_SIZE;
+	}
+
+	return 0;
+}
+
 static void init_level2_page(pmd_t *level2p, unsigned long addr)
 {
 	unsigned long end_addr;
@@ -184,22 +203,34 @@ err:
 	return result;
 }
 
-
 static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 {
+	unsigned long mstart, mend;
 	pgd_t *level4p;
 	int result;
+	int i;
+
 	level4p = (pgd_t *)__va(start_pgtable);
 	result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
 	if (result)
 		return result;
+
 	/*
-	 * image->start may be outside 0 ~ max_pfn, for example when
-	 * jump back to original kernel from kexeced kernel
+	 * segments's mem ranges could be outside 0 ~ max_pfn,
+	 * for example when jump back to original kernel from kexeced kernel.
+	 * or first kernel is booted with user mem map, and second kernel
+	 * could be loaded out of that range.
 	 */
-	result = init_one_level2_page(image, level4p, image->start);
-	if (result)
-		return result;
+	for (i = 0; i < image->nr_segments; i++) {
+		mstart = image->segment[i].mem;
+		mend   = mstart + image->segment[i].memsz;
+
+		result = ident_mapping_init(image, level4p, mstart, mend);
+
+		if (result)
+			return result;
+	}
+
 	return init_transition_pgtable(image, level4p);
 }
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v6 16/27] x86, kexec: Merge ident_mapping_init and init_level4_page
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (14 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 15/27] x86, kexec: set ident mapping for kernel that is above max_pfn Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 17/27] x86, kexec: only set ident mapping for ram Yinghai Lu
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

ident_mapping_init() currently checks whether the pgd/pud is present
for every 2M, so when several 2M ranges share the same PUD it keeps
re-checking the same pud entry.

init_level4_page() does not check for existing pgd/pud entries at all.

We will need to use ident_mapping_init() with the pfn_mapped array to
map RAM only, and two entries in pfn_mapped could share the same
pgd/pud, so we need the presence checks rather than init_level4_page().

So merge the two: the new ident_mapping_init() does not re-check the
pgd/pud for every pmd within the same pgd/pud, and it replaces
init_level4_page().
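
The key piece that avoids the per-2M re-check is the `next` computation in the merged walker: advance in whole-PUD (or whole-PGD) steps, clamped to the end of the range. Extracted as a user-space sketch (macro values mirror x86-64; the helper name is illustrative):

```c
#define PUD_SIZE	(1UL << 30)	/* 1 GiB per PUD entry on x86-64 */
#define PUD_MASK	(~(PUD_SIZE - 1))

/* Same arithmetic as 'next' in ident_pud_init(): jump to the next
 * 1G boundary, but never past the end of the requested range. */
static unsigned long pud_next(unsigned long addr, unsigned long end)
{
	unsigned long next = (addr & PUD_MASK) + PUD_SIZE;

	return next > end ? end : next;
}
```

Each PUD entry is then looked up once per 1G step instead of once per 2M page.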

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/machine_kexec_64.c |  214 ++++++++++++++----------------------
 1 file changed, 80 insertions(+), 134 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index be14ee1..a0bf7fb 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -21,139 +21,6 @@
 #include <asm/mmu_context.h>
 #include <asm/debugreg.h>
 
-static int init_one_level2_page(struct kimage *image, pgd_t *pgd,
-				unsigned long addr)
-{
-	pud_t *pud;
-	pmd_t *pmd;
-	struct page *page;
-	int result = -ENOMEM;
-
-	addr &= PMD_MASK;
-	pgd += pgd_index(addr);
-	if (!pgd_present(*pgd)) {
-		page = kimage_alloc_control_pages(image, 0);
-		if (!page)
-			goto out;
-		pud = (pud_t *)page_address(page);
-		clear_page(pud);
-		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
-	}
-	pud = pud_offset(pgd, addr);
-	if (!pud_present(*pud)) {
-		page = kimage_alloc_control_pages(image, 0);
-		if (!page)
-			goto out;
-		pmd = (pmd_t *)page_address(page);
-		clear_page(pmd);
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
-	}
-	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd))
-		set_pmd(pmd, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
-	result = 0;
-out:
-	return result;
-}
-
-static int ident_mapping_init(struct kimage *image, pgd_t *level4p,
-				unsigned long mstart, unsigned long mend)
-{
-	int result;
-
-	mstart = round_down(mstart, PMD_SIZE);
-	mend   = round_up(mend - 1, PMD_SIZE);
-
-	while (mstart < mend) {
-		result = init_one_level2_page(image, level4p, mstart);
-		if (result)
-			return result;
-
-		mstart += PMD_SIZE;
-	}
-
-	return 0;
-}
-
-static void init_level2_page(pmd_t *level2p, unsigned long addr)
-{
-	unsigned long end_addr;
-
-	addr &= PAGE_MASK;
-	end_addr = addr + PUD_SIZE;
-	while (addr < end_addr) {
-		set_pmd(level2p++, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
-		addr += PMD_SIZE;
-	}
-}
-
-static int init_level3_page(struct kimage *image, pud_t *level3p,
-				unsigned long addr, unsigned long last_addr)
-{
-	unsigned long end_addr;
-	int result;
-
-	result = 0;
-	addr &= PAGE_MASK;
-	end_addr = addr + PGDIR_SIZE;
-	while ((addr < last_addr) && (addr < end_addr)) {
-		struct page *page;
-		pmd_t *level2p;
-
-		page = kimage_alloc_control_pages(image, 0);
-		if (!page) {
-			result = -ENOMEM;
-			goto out;
-		}
-		level2p = (pmd_t *)page_address(page);
-		init_level2_page(level2p, addr);
-		set_pud(level3p++, __pud(__pa(level2p) | _KERNPG_TABLE));
-		addr += PUD_SIZE;
-	}
-	/* clear the unused entries */
-	while (addr < end_addr) {
-		pud_clear(level3p++);
-		addr += PUD_SIZE;
-	}
-out:
-	return result;
-}
-
-
-static int init_level4_page(struct kimage *image, pgd_t *level4p,
-				unsigned long addr, unsigned long last_addr)
-{
-	unsigned long end_addr;
-	int result;
-
-	result = 0;
-	addr &= PAGE_MASK;
-	end_addr = addr + (PTRS_PER_PGD * PGDIR_SIZE);
-	while ((addr < last_addr) && (addr < end_addr)) {
-		struct page *page;
-		pud_t *level3p;
-
-		page = kimage_alloc_control_pages(image, 0);
-		if (!page) {
-			result = -ENOMEM;
-			goto out;
-		}
-		level3p = (pud_t *)page_address(page);
-		result = init_level3_page(image, level3p, addr, last_addr);
-		if (result)
-			goto out;
-		set_pgd(level4p++, __pgd(__pa(level3p) | _KERNPG_TABLE));
-		addr += PGDIR_SIZE;
-	}
-	/* clear the unused entries */
-	while (addr < end_addr) {
-		pgd_clear(level4p++);
-		addr += PGDIR_SIZE;
-	}
-out:
-	return result;
-}
-
 static void free_transition_pgtable(struct kimage *image)
 {
 	free_page((unsigned long)image->arch.pud);
@@ -203,6 +70,84 @@ err:
 	return result;
 }
 
+static void ident_pmd_init(pmd_t *pmd_page,
+			  unsigned long addr, unsigned long end)
+{
+	addr &= PMD_MASK;
+	for (; addr < end; addr += PMD_SIZE) {
+		pmd_t *pmd = pmd_page + pmd_index(addr);
+
+		if (!pmd_present(*pmd))
+			set_pmd(pmd, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
+	}
+}
+static int ident_pud_init(struct kimage *image, pud_t *pud_page,
+			  unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	struct page *page;
+
+	for (; addr < end; addr = next) {
+		pud_t *pud = pud_page + pud_index(addr);
+		pmd_t *pmd;
+
+		next = (addr & PUD_MASK) + PUD_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pud_present(*pud)) {
+			pmd = pmd_offset(pud, 0);
+			ident_pmd_init(pmd, addr, next);
+			continue;
+		}
+		page = kimage_alloc_control_pages(image, 0);
+		if (!page)
+			return -ENOMEM;
+		pmd = (pmd_t *)page_address(page);
+		clear_page(pmd);
+		ident_pmd_init(pmd, addr, next);
+		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+static int ident_mapping_init(struct kimage *image, pgd_t *pgd_page,
+				unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	struct page *page;
+	int result;
+
+	for (; addr < end; addr = next) {
+		pgd_t *pgd = pgd_page + pgd_index(addr);
+		pud_t *pud;
+
+		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
+		if (next > end)
+			next = end;
+
+		if (pgd_present(*pgd)) {
+			pud = pud_offset(pgd, 0);
+			result = ident_pud_init(image, pud, addr, next);
+			if (result)
+				return result;
+			continue;
+		}
+
+		page = kimage_alloc_control_pages(image, 0);
+		if (!page)
+			return -ENOMEM;
+		pud = (pud_t *)page_address(page);
+		clear_page(pud);
+		result = ident_pud_init(image, pud, addr, next);
+		if (result)
+			return result;
+		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+	}
+
+	return 0;
+}
+
 static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 {
 	unsigned long mstart, mend;
@@ -211,7 +156,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 	int i;
 
 	level4p = (pgd_t *)__va(start_pgtable);
-	result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+	clear_page(level4p);
+	result = ident_mapping_init(image, level4p, 0, max_pfn << PAGE_SHIFT);
 	if (result)
 		return result;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v6 17/27] x86, kexec: only set ident mapping for ram.
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (15 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 16/27] x86, kexec: Merge ident_mapping_init and init_level4_page Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 18/27] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

We should not set up a mapping for everything under max_pfn; that
causes the same problem fixed by

	x86, mm: Only direct map addresses that are marked as E820_RAM

This patch exposes the pfn_mapped array and only sets the identity
mapping for the ranges in that array.

It relies on the new ident_mapping_init(), which can handle pgd/pud
entries shared between calls.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/include/asm/page.h        |    4 ++++
 arch/x86/kernel/machine_kexec_64.c |   13 ++++++++++---
 arch/x86/mm/init.c                 |    4 ++--
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 8ca8283..100a20c 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -17,6 +17,10 @@
 
 struct page;
 
+#include <linux/range.h>
+extern struct range pfn_mapped[];
+extern int nr_pfn_mapped;
+
 static inline void clear_user_page(void *page, unsigned long vaddr,
 				   struct page *pg)
 {
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index a0bf7fb..cc6d0e3 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -157,9 +157,16 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 
 	level4p = (pgd_t *)__va(start_pgtable);
 	clear_page(level4p);
-	result = ident_mapping_init(image, level4p, 0, max_pfn << PAGE_SHIFT);
-	if (result)
-		return result;
+
+	for (i = 0; i < nr_pfn_mapped; i++) {
+		mstart = pfn_mapped[i].start << PAGE_SHIFT;
+		mend   = pfn_mapped[i].end << PAGE_SHIFT;
+
+		result = ident_mapping_init(image, level4p, mstart, mend);
+
+		if (result)
+			return result;
+	}
 
 	/*
 	 * segments's mem ranges could be outside 0 ~ max_pfn,
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index c4293cf..7621772 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -300,8 +300,8 @@ static int __meminit split_mem_range(struct map_range *mr, int nr_range,
 	return nr_range;
 }
 
-static struct range pfn_mapped[E820_X_MAX];
-static int nr_pfn_mapped;
+struct range pfn_mapped[E820_X_MAX];
+int nr_pfn_mapped;
 
 static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
 {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v6 18/27] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (16 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 17/27] x86, kexec: only set ident mapping for ram Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:54   ` H. Peter Anvin
  2012-12-13 22:02 ` [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image Yinghai Lu
                   ` (9 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu,
	Rob Landley, Matt Fleming

ext_ramdisk_image/size record the high 32 bits of the ramdisk address
and size.

xloadflags bit0 is set if the kernel is relocatable with 64-bit
support.

Let get_ramdisk_image/size() use ext_ramdisk_image/size to compute the
right position of the ramdisk.

The bootloader fills in ext_ramdisk_image/size when it loads the
ramdisk above 4G, and it checks whether xloadflags bit0 is set to
decide whether it may load the ramdisk high, above 4G.

A sentinel is used to make sure the kernel sees valid ext_* values.

Update the header version to 2.12.

-v2: add ext_cmd_line_ptr for above 4G support.
-v3: update to xloadflags from HPA.
-v4: use fields from bootparam instead setup_header according to HPA.
-v5: add checking for USE_EXT_BOOT_PARAMS
-v6: use sentinel to check if ext_* are valid suggested by HPA.
     HPA said:
	1. add a field in the uninitialized portion, call it "sentinel";
	2. make sure the byte position corresponding to the "sentinel" field is
	   nonzero in the bzImage file;
	3. if the kernel boots up and sentinel is nonzero, erase those fields
	   that you identified as uninitialized;

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rob Landley <rob@landley.net>
Cc: Matt Fleming <matt.fleming@intel.com>
---
 Documentation/x86/boot.txt         |   15 ++++++++++++++-
 Documentation/x86/zero-page.txt    |    4 ++++
 arch/x86/boot/compressed/cmdline.c |    2 ++
 arch/x86/boot/compressed/misc.c    |   12 ++++++++++++
 arch/x86/boot/header.S             |   12 ++++++++++--
 arch/x86/boot/setup.ld             |    8 +++++++-
 arch/x86/include/asm/bootparam.h   |   12 +++++++++---
 arch/x86/kernel/head64.c           |    2 ++
 arch/x86/kernel/setup.c            |    4 ++++
 9 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index f15cb74..696da56 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -57,6 +57,9 @@ Protocol 2.10:	(Kernel 2.6.31) Added a protocol for relaxed alignment
 Protocol 2.11:	(Kernel 3.6) Added a field for offset of EFI handover
 		protocol entry point.
 
+Protocol 2.12:	(Kernel 3.9) Added three fields for loading bzImage and
+		 ramdisk above 4G with 64bit in bootparam.
+
 **** MEMORY LAYOUT
 
 The traditional memory map for the kernel loader, used for Image or
@@ -182,7 +185,7 @@ Offset	Proto	Name		Meaning
 0230/4	2.05+	kernel_alignment Physical addr alignment required for kernel
 0234/1	2.05+	relocatable_kernel Whether kernel is relocatable or not
 0235/1	2.10+	min_alignment	Minimum alignment, as a power of two
-0236/2	N/A	pad3		Unused
+0236/2	2.12+	xloadflags	Boot protocol option flags
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
 023C/4	2.07+	hardware_subarch Hardware subarchitecture
 0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
@@ -581,6 +584,16 @@ Protocol:	2.10+
   misaligned kernel.  Therefore, a loader should typically try each
   power-of-two alignment from kernel_alignment down to this alignment.
 
+Field name:     xloadflags
+Type:           modify (obligatory)
+Offset/size:    0x236/2
+Protocol:       2.12+
+
+  This field is a bitmask.
+
+  Bit 0 (read): CAN_BE_LOADED_ABOVE_4G
+        - If 1, kernel/boot_params/cmdline/ramdisk can be above 4g,
+
 Field name:	cmdline_size
 Type:		read
 Offset/size:	0x238/4
diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
index cf5437d..bc713a5 100644
--- a/Documentation/x86/zero-page.txt
+++ b/Documentation/x86/zero-page.txt
@@ -19,6 +19,9 @@ Offset	Proto	Name		Meaning
 090/010	ALL	hd1_info	hd1 disk parameter, OBSOLETE!!
 0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table)
 0B0/010	ALL	olpc_ofw_header	OLPC's OpenFirmware CIF and friends
+0C0/004	ALL	ext_ramdisk_image ramdisk_image high 32bits
+0C4/004	ALL	ext_ramdisk_size  ramdisk_size high 32bits
+0C8/004	ALL	ext_cmd_line_ptr  cmd_line_ptr high 32bits
 140/080	ALL	edid_info	Video mode setup (struct edid_info)
 1C0/020	ALL	efi_info	EFI 32 information (struct efi_info)
 1E0/004	ALL	alk_mem_k	Alternative mem check, in KB
@@ -27,6 +30,7 @@ Offset	Proto	Name		Meaning
 1E9/001	ALL	eddbuf_entries	Number of entries in eddbuf (below)
 1EA/001	ALL	edd_mbr_sig_buf_entries	Number of entries in edd_mbr_sig_buffer
 				(below)
+1F0/001	ALL	sentinel	0: states _ext_* fields are valid
 290/040	ALL	edd_mbr_sig_buffer EDD MBR signatures
 2D0/A00	ALL	e820_map	E820 memory map table
 				(array of struct e820entry)
diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index b4c913c..bffd73b 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -17,6 +17,8 @@ static unsigned long get_cmd_line_ptr(void)
 {
 	unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
 
+	cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
+
 	return cmd_line_ptr;
 }
 int cmdline_find_option(const char *option, char *buffer, int bufsize)
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 88f7ff6..f714576 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -318,6 +318,16 @@ static void parse_elf(void *output)
 	free(phdrs);
 }
 
+static void sanitize_real_mode(struct boot_params *real_mode)
+{
+	if (real_mode->sentinel) {
+		/* ext_* fields in boot_params are not valid, clear them */
+		real_mode->ext_ramdisk_image = 0;
+		real_mode->ext_ramdisk_size  = 0;
+		real_mode->ext_cmd_line_ptr  = 0;
+	}
+}
+
 asmlinkage void decompress_kernel(void *rmode, memptr heap,
 				  unsigned char *input_data,
 				  unsigned long input_len,
@@ -325,6 +335,8 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
 {
 	real_mode = rmode;
 
+	sanitize_real_mode(real_mode);
+
 	if (real_mode->screen_info.orig_video_mode == 7) {
 		vidmem = (char *) 0xb0000;
 		vidport = 0x3b4;
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 8c132a6..0d5790f 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -279,7 +279,7 @@ _start:
 	# Part 2 of the header, from the old setup.S
 
 		.ascii	"HdrS"		# header signature
-		.word	0x020b		# header version number (>= 0x0105)
+		.word	0x020c		# header version number (>= 0x0105)
 					# or else old loadlin-1.5 will fail)
 		.globl realmode_swtch
 realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
@@ -369,7 +369,15 @@ relocatable_kernel:    .byte 1
 relocatable_kernel:    .byte 0
 #endif
 min_alignment:		.byte MIN_KERNEL_ALIGN_LG2	# minimum alignment
-pad3:			.word 0
+
+xloadflags:
+CAN_BE_LOADED_ABOVE_4G	= 1		# If set, the kernel/boot_param/
+					# ramdisk could be loaded above 4g
+#if defined(CONFIG_X86_64) && defined(CONFIG_RELOCATABLE)
+			.word CAN_BE_LOADED_ABOVE_4G
+#else
+			.word 0
+#endif
 
 cmdline_size:   .long   COMMAND_LINE_SIZE-1     #length of the command line,
                                                 #added with boot protocol
diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index 03c0683..cbfefa8 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -13,7 +13,13 @@ SECTIONS
 	.bstext		: { *(.bstext) }
 	.bsdata		: { *(.bsdata) }
 
-	. = 497;
+	/* sentinel: make sure if boot_params from bootloader is right */
+	. = 496;
+	.sentinel	: {
+		sentinel = .;
+		BYTE(0xff);
+	}
+
 	.header		: { *(.header) }
 	.entrytext	: { *(.entrytext) }
 	.inittext	: { *(.inittext) }
diff --git a/arch/x86/include/asm/bootparam.h b/arch/x86/include/asm/bootparam.h
index 92862cd..ae2708b 100644
--- a/arch/x86/include/asm/bootparam.h
+++ b/arch/x86/include/asm/bootparam.h
@@ -58,7 +58,9 @@ struct setup_header {
 	__u32	initrd_addr_max;
 	__u32	kernel_alignment;
 	__u8	relocatable_kernel;
-	__u8	_pad2[3];
+	__u8	min_alignment;
+	__u16	xloadflags;
+#define CAN_BE_LOADED_ABOVE_4G	(1<<0)
 	__u32	cmdline_size;
 	__u32	hardware_subarch;
 	__u64	hardware_subarch_data;
@@ -106,7 +108,10 @@ struct boot_params {
 	__u8  hd1_info[16];	/* obsolete! */		/* 0x090 */
 	struct sys_desc_table sys_desc_table;		/* 0x0a0 */
 	struct olpc_ofw_header olpc_ofw_header;		/* 0x0b0 */
-	__u8  _pad4[128];				/* 0x0c0 */
+	__u32 ext_ramdisk_image;			/* 0x0c0 */
+	__u32 ext_ramdisk_size;				/* 0x0c4 */
+	__u32 ext_cmd_line_ptr;				/* 0x0c8 */
+	__u8  _pad4[116];				/* 0x0cc */
 	struct edid_info edid_info;			/* 0x140 */
 	struct efi_info efi_info;			/* 0x1c0 */
 	__u32 alt_mem_k;				/* 0x1e0 */
@@ -115,7 +120,8 @@ struct boot_params {
 	__u8  eddbuf_entries;				/* 0x1e9 */
 	__u8  edd_mbr_sig_buf_entries;			/* 0x1ea */
 	__u8  kbd_status;				/* 0x1eb */
-	__u8  _pad6[5];					/* 0x1ec */
+	__u8  _pad6[4];					/* 0x1ec */
+	__u8  sentinel;					/* 0x1f0 */
 	struct setup_header hdr;    /* setup header */	/* 0x1f1 */
 	__u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
 	__u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];	/* 0x290 */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 16eb325..b8b6ad9 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -55,6 +55,8 @@ static unsigned long get_cmd_line_ptr(void)
 {
 	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
+	cmd_line_ptr |= (u64)boot_params.ext_cmd_line_ptr << 32;
+
 	return cmd_line_ptr;
 }
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e636c83..efb33dd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -298,12 +298,16 @@ static u64 __init get_ramdisk_image(void)
 {
 	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
 
+	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+
 	return ramdisk_image;
 }
 static u64 __init get_ramdisk_size(void)
 {
 	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
 
+	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+
 	return ramdisk_size;
 }
 
-- 
1.7.10.4



* [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (17 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 18/27] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 23:27   ` H. Peter Anvin
  2012-12-13 22:02 ` [PATCH v6 20/27] x86, 64bit: Print init kernel lowmap correctly Yinghai Lu
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

Now the 64-bit entry point is fixed at 0x200 and cannot be changed anymore.

Update the comments to reflect that.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/boot/compressed/head_64.S |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 5c80b94..5ba0c95 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -37,6 +37,12 @@
 	__HEAD
 	.code32
 ENTRY(startup_32)
+	/*
+	 * 32bit entry is 0, could not be changed!
+	 * If we come here directly from a bootloader,
+	 * kernel(text+data+bss+brk) ramdisk, zero_page, command line
+	 * all need to be under 4G limit.
+	 */
 	cld
 	/*
 	 * Test KEEP_SEGMENTS flag to see if the bootloader is asking
@@ -182,20 +188,18 @@ ENTRY(startup_32)
 	lret
 ENDPROC(startup_32)
 
-	/*
-	 * Be careful here startup_64 needs to be at a predictable
-	 * address so I can export it in an ELF header.  Bootloaders
-	 * should look at the ELF header to find this address, as
-	 * it may change in the future.
-	 */
 	.code64
 	.org 0x200
 ENTRY(startup_64)
 	/*
+	 * 64bit entry is 0x200, could not be changed!
 	 * We come here either from startup_32 or directly from a
-	 * 64bit bootloader.  If we come here from a bootloader we depend on
-	 * an identity mapped page table being provied that maps our
-	 * entire text+data+bss and hopefully all of memory.
+	 * 64bit bootloader.
+	 * If we come here from a bootloader, kernel(text+data+bss+brk),
+	 * ramdisk, zero_page, command line could be above 4G.
+	 * We depend on an identity mapped page table being provided
+	 * that maps our entire kernel(text+data+bss+brk), and hopefully
+	 * all of memory.
 	 */
 #ifdef CONFIG_EFI_STUB
 	/*
-- 
1.7.10.4



* [PATCH v6 20/27] x86, 64bit: Print init kernel lowmap correctly
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (18 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 21/27] x86, boot: Not need to check setup_header version Yinghai Lu
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

When we reach x86_64_start_kernel() from arch/x86/kernel/head_64.S,
we have:
1. kernel highmap: 512M (KERNEL_IMAGE_SIZE) from the kernel load address.
2. kernel lowmap: [0, 1024M), plus (_end - _text) from the kernel load
   address.

For example, if the kernel bzImage is loaded high at 8G, we get:
1. kernel highmap: [8G, 8G+512M)
2. kernel lowmap: [0, 1024M), and [8G, 8G + _end - _text)

So max_pfn_mapped, which records the low-mapped pfns, is not simply
512M on 64-bit.

Print both ranges when the kernel is loaded high.

Also use KERNEL_IMAGE_SIZE directly for the highmap cleanup.
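The range-selection logic this patch adds can be sketched as a small
standalone function; PUD_SIZE and the addresses here are illustrative
constants, not kernel API:

```c
#include <stdint.h>

#define PUD_SIZE (1ULL << 30) /* 1 GiB, as on x86_64 */

/*
 * Sketch of the decision in print_init_mem_mapped(): given the
 * kernel's physical text start and PMD-rounded end, report how many
 * initial low mappings exist. One contiguous range when the image sits
 * at or straddles the first 1G mapping, two disjoint ranges when the
 * kernel is loaded high.
 */
static int initial_mapped_ranges(uint64_t text, uint64_t end)
{
	if (end <= PUD_SIZE)  /* whole image under the 1G lowmap */
		return 1;
	if (text <= PUD_SIZE) /* image straddles 1G: one merged range */
		return 1;
	return 2;             /* loaded high: [0,1G) plus [text,end) */
}
```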

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/head64.c |    2 --
 arch/x86/kernel/setup.c  |   23 +++++++++++++++++++++--
 arch/x86/mm/init_64.c    |    6 +++++-
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index b8b6ad9..17978b2 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -110,8 +110,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	/* Make NULL pointers segfault */
 	zap_identity_mappings();
 
-	max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
-
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
 #ifdef CONFIG_EARLY_PRINTK
 		set_intr_gate(i, &early_idt_handlers[i]);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index efb33dd..c4e7aaa 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -675,6 +675,26 @@ static int __init parse_reservelow(char *p)
 
 early_param("reservelow", parse_reservelow);
 
+static __init void print_init_mem_mapped(void)
+{
+#ifdef CONFIG_X86_32
+	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
+			(max_pfn_mapped<<PAGE_SHIFT) - 1);
+#else
+	unsigned long text = __pa_symbol(&_text);
+	unsigned long end = round_up(__pa_symbol(_end) - 1, PMD_SIZE);
+
+	if (end <= PUD_SIZE)
+		printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
+			PUD_SIZE - 1);
+	else if (text <= PUD_SIZE)
+		printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
+			end - 1);
+	else
+		printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx] [mem %#010lx-%#010lx]\n",
+			PUD_SIZE - 1, text, end - 1);
+#endif
+}
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -943,8 +963,7 @@ void __init setup_arch(char **cmdline_p)
 	setup_bios_corruption_check();
 #endif
 
-	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
-			(max_pfn_mapped<<PAGE_SHIFT) - 1);
+	print_init_mem_mapped();
 
 	setup_real_mode();
 
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 91f116a..11c49b8 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -304,10 +304,14 @@ void __init init_extra_mapping_uc(unsigned long phys, unsigned long size)
 void __init cleanup_highmap(void)
 {
 	unsigned long vaddr = __START_KERNEL_map;
-	unsigned long vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
+	unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
 	unsigned long end = roundup((unsigned long)_brk_end, PMD_SIZE) - 1;
 	pmd_t *pmd = level2_kernel_pgt;
 
+	/* Xen has its own end somehow with abused max_pfn_mapped */
+	if (max_pfn_mapped)
+		vaddr_end = __START_KERNEL_map + (max_pfn_mapped << PAGE_SHIFT);
+
 	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
 		if (pmd_none(*pmd))
 			continue;
-- 
1.7.10.4



* [PATCH v6 21/27] x86, boot: Not need to check setup_header version
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (19 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 20/27] x86, 64bit: Print init kernel lowmap correctly Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 22/27] mm: Add alloc_bootmem_low_pages_nopanic() Yinghai Lu
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

That check was for bootloaders.

setup_data is in setup_header, and every bootloader copies that region
for bzImage, so old bootloaders already leave it as 0.

kexec, which until now only handles ELF images, sets setup_data to 0.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/setup.c |    6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4e7aaa..d0082a0 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -439,8 +439,6 @@ static void __init parse_setup_data(void)
 	struct setup_data *data;
 	u64 pa_data;
 
-	if (boot_params.hdr.version < 0x0209)
-		return;
 	pa_data = boot_params.hdr.setup_data;
 	while (pa_data) {
 		u32 data_len, map_len;
@@ -476,8 +474,6 @@ static void __init e820_reserve_setup_data(void)
 	u64 pa_data;
 	int found = 0;
 
-	if (boot_params.hdr.version < 0x0209)
-		return;
 	pa_data = boot_params.hdr.setup_data;
 	while (pa_data) {
 		data = early_memremap(pa_data, sizeof(*data));
@@ -501,8 +497,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 	struct setup_data *data;
 	u64 pa_data;
 
-	if (boot_params.hdr.version < 0x0209)
-		return;
 	pa_data = boot_params.hdr.setup_data;
 	while (pa_data) {
 		data = early_memremap(pa_data, sizeof(*data));
-- 
1.7.10.4



* [PATCH v6 22/27] mm: Add alloc_bootmem_low_pages_nopanic()
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (20 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 21/27] x86, boot: Not need to check setup_header version Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 23/27] x86: Don't panic if can not alloc buffer for swiotlb Yinghai Lu
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

We don't need to panic in some cases, such as swiotlb preallocation.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 include/linux/bootmem.h |    5 +++++
 mm/bootmem.c            |    8 ++++++++
 mm/nobootmem.c          |    8 ++++++++
 3 files changed, 21 insertions(+)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 7b74452..858f743 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -99,6 +99,9 @@ void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
 extern void *__alloc_bootmem_low(unsigned long size,
 				 unsigned long align,
 				 unsigned long goal);
+void *__alloc_bootmem_low_nopanic(unsigned long size,
+				 unsigned long align,
+				 unsigned long goal);
 extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 				      unsigned long size,
 				      unsigned long align,
@@ -132,6 +135,8 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 
 #define alloc_bootmem_low(x) \
 	__alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
+#define alloc_bootmem_low_pages_nopanic(x) \
+	__alloc_bootmem_low_nopanic(x, PAGE_SIZE, 0)
 #define alloc_bootmem_low_pages(x) \
 	__alloc_bootmem_low(x, PAGE_SIZE, 0)
 #define alloc_bootmem_low_pages_node(pgdat, x) \
diff --git a/mm/bootmem.c b/mm/bootmem.c
index ecc4595..29322e2 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -830,6 +830,14 @@ void * __init __alloc_bootmem_low(unsigned long size, unsigned long align,
 	return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT);
 }
 
+void * __init __alloc_bootmem_low_nopanic(unsigned long size,
+					  unsigned long align,
+					  unsigned long goal)
+{
+	return ___alloc_bootmem_nopanic(size, align, goal,
+					ARCH_LOW_ADDRESS_LIMIT);
+}
+
 /**
  * __alloc_bootmem_low_node - allocate low boot memory from a specific node
  * @pgdat: node to allocate from
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index ecc2f13..abb1e6f 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -370,6 +370,14 @@ void * __init __alloc_bootmem_low(unsigned long size, unsigned long align,
 	return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT);
 }
 
+void * __init __alloc_bootmem_low_nopanic(unsigned long size,
+					  unsigned long align,
+					  unsigned long goal)
+{
+	return ___alloc_bootmem_nopanic(size, align, goal,
+					ARCH_LOW_ADDRESS_LIMIT);
+}
+
 /**
  * __alloc_bootmem_low_node - allocate low boot memory from a specific node
  * @pgdat: node to allocate from
-- 
1.7.10.4



* [PATCH v6 23/27] x86: Don't panic if can not alloc buffer for swiotlb
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (21 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 22/27] mm: Add alloc_bootmem_low_pages_nopanic() Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-22  2:21   ` Konrad Rzeszutek Wilk
  2012-12-13 22:02 ` [PATCH v6 24/27] x86: Add swiotlb force off support Yinghai Lu
                   ` (4 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu, Yinghai Lu

Normal boot path on a system with IOMMU support: the swiotlb buffer is
allocated early, then the IOMMU is initialized; if the Intel or AMD
IOMMU sets itself up properly, the swiotlb buffer is freed.

The early allocation uses bootmem and can panic when we try to use
kdump with the crash buffer above 4G only.

Replace the panic with a WARN; the kernel can then go on without
swiotlb and may still initialize an IOMMU later.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/pci-swiotlb.c |    5 ++++-
 include/linux/swiotlb.h       |    2 +-
 lib/swiotlb.c                 |   17 +++++++++++------
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 6c483ba..6f93eb7 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -91,7 +91,10 @@ IOMMU_INIT(pci_swiotlb_detect_4gb,
 void __init pci_swiotlb_init(void)
 {
 	if (swiotlb) {
-		swiotlb_init(0);
+		if (swiotlb_init(0)) {
+			swiotlb = 0;
+			return;
+		}
 		dma_ops = &swiotlb_dma_ops;
 	}
 }
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8d08b3e..f7535d1 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -22,7 +22,7 @@ extern int swiotlb_force;
  */
 #define IO_TLB_SHIFT 11
 
-extern void swiotlb_init(int verbose);
+int swiotlb_init(int verbose);
 extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index f114bf6..6b99ea7 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -170,7 +170,7 @@ void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
  * Statically reserve bounce buffer space and initialize bounce buffer data
  * structures for the software IO TLB used to implement the DMA API.
  */
-static void __init
+static int __init
 swiotlb_init_with_default_size(size_t default_size, int verbose)
 {
 	unsigned long bytes;
@@ -185,17 +185,22 @@ swiotlb_init_with_default_size(size_t default_size, int verbose)
 	/*
 	 * Get IO TLB memory from the low pages
 	 */
-	io_tlb_start = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
-	if (!io_tlb_start)
-		panic("Cannot allocate SWIOTLB buffer");
+	io_tlb_start = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
+	if (!io_tlb_start) {
+		WARN(1, "Cannot allocate SWIOTLB buffer");
+		return -1;
+	}
 
 	swiotlb_init_with_tbl(io_tlb_start, io_tlb_nslabs, verbose);
+
+	return 0;
 }
 
-void __init
+int __init
 swiotlb_init(int verbose)
 {
-	swiotlb_init_with_default_size(64 * (1<<20), verbose);	/* default to 64MB */
+	/* default to 64MB */
+	return swiotlb_init_with_default_size(64 * (1<<20), verbose);
 }
 
 /*
-- 
1.7.10.4



* [PATCH v6 24/27] x86: Add swiotlb force off support
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (22 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 23/27] x86: Don't panic if can not alloc buffer for swiotlb Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-22  2:18   ` Konrad Rzeszutek Wilk
  2012-12-13 22:02 ` [PATCH v6 25/27] x86, kdump: remove crashkernel range find limit for 64bit Yinghai Lu
                   ` (3 subsequent siblings)
  27 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

So the user can disable swiotlb from the command line even when swiotlb
support is compiled in, just like intel_iommu=on and intel_iommu=off.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 Documentation/kernel-parameters.txt |    7 +++++++
 arch/x86/kernel/pci-swiotlb.c       |   10 +++++-----
 drivers/iommu/amd_iommu.c           |    1 +
 include/linux/swiotlb.h             |    1 +
 lib/swiotlb.c                       |    5 ++++-
 5 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 20e248c..08b4c9d 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2832,6 +2832,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 
 	swiotlb=	[IA-64] Number of I/O TLB slabs
 
+	swiotlb=[force|off|on] [KNL] disable or enable swiotlb.
+		force
+		on
+			Enable swiotlb.
+		off
+			Disable swiotlb.
+
 	switches=	[HW,M68k]
 
 	sysfs.deprecated=0|1 [KNL]
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 6f93eb7..80afd3b 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -58,12 +58,12 @@ static struct dma_map_ops swiotlb_dma_ops = {
  */
 int __init pci_swiotlb_detect_override(void)
 {
-	int use_swiotlb = swiotlb | swiotlb_force;
-
 	if (swiotlb_force)
 		swiotlb = 1;
+	else if (swiotlb_force_off)
+		swiotlb = 0;
 
-	return use_swiotlb;
+	return swiotlb;
 }
 IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
 		  pci_xen_swiotlb_detect,
@@ -76,9 +76,9 @@ IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
  */
 int __init pci_swiotlb_detect_4gb(void)
 {
-	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
+	/* don't initialize swiotlb if iommu=off (no_iommu=1) or force off */
 #ifdef CONFIG_X86_64
-	if (!no_iommu && max_pfn > MAX_DMA32_PFN)
+	if (!no_iommu && !swiotlb_force_off && max_pfn > MAX_DMA32_PFN)
 		swiotlb = 1;
 #endif
 	return swiotlb;
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 55074cb..4f370d3 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3082,6 +3082,7 @@ int __init amd_iommu_init_dma_ops(void)
 	unhandled = device_dma_ops_init();
 	if (unhandled && max_pfn > MAX_DMA32_PFN) {
 		/* There are unhandled devices - initialize swiotlb for them */
+		WARN(swiotlb_force_off, "Please remove swiotlb=off\n");
 		swiotlb = 1;
 	}
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f7535d1..dd7cf65 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -8,6 +8,7 @@ struct dma_attrs;
 struct scatterlist;
 
 extern int swiotlb_force;
+extern int swiotlb_force_off;
 
 /*
  * Maximum allowable number of contiguous slabs to map,
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 6b99ea7..3f51b2c 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -51,6 +51,7 @@
 #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
 
 int swiotlb_force;
+int swiotlb_force_off;
 
 /*
  * Used to do a quick range check in swiotlb_tbl_unmap_single and
@@ -102,8 +103,10 @@ setup_io_tlb_npages(char *str)
 	}
 	if (*str == ',')
 		++str;
-	if (!strcmp(str, "force"))
+	if (!strcmp(str, "force") || !strcmp(str, "on"))
 		swiotlb_force = 1;
+	if (!strcmp(str, "off"))
+		swiotlb_force_off = 1;
 
 	return 1;
 }
-- 
1.7.10.4



* [PATCH v6 25/27] x86, kdump: remove crashkernel range find limit for 64bit
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (23 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 24/27] x86: Add swiotlb force off support Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 26/27] x86: add Crash kernel low reservation Yinghai Lu
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

Now the kexeced kernel/ramdisk can be above 4G, so remove the 896M
limit for 64-bit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/setup.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d0082a0..e73928a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -515,13 +515,11 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 /*
  * Keep the crash kernel below this limit.  On 32 bits earlier kernels
  * would limit the kernel to the low 512 MiB due to mapping restrictions.
- * On 64 bits, kexec-tools currently limits us to 896 MiB; increase this
- * limit once kexec-tools are fixed.
  */
 #ifdef CONFIG_X86_32
 # define CRASH_KERNEL_ADDR_MAX	(512 << 20)
 #else
-# define CRASH_KERNEL_ADDR_MAX	(896 << 20)
+# define CRASH_KERNEL_ADDR_MAX	MAXMEM
 #endif
 
 static void __init reserve_crashkernel(void)
-- 
1.7.10.4



* [PATCH v6 26/27] x86: add Crash kernel low reservation
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (24 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 25/27] x86, kdump: remove crashkernel range find limit for 64bit Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 22:02 ` [PATCH v6 27/27] x86: Merge early kernel reserve for 32bit and 64bit Yinghai Lu
  2012-12-13 23:47 ` [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G H. Peter Anvin
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

During the kdump kernel's boot stage, it needs to find low RAM for the
swiotlb buffer when the system does not support Intel IOMMU/DMAR
remapping.

kexec-tools appends memmap=exactmap and the range from /proc/iomem
with "Crash kernel", and that range can be above 4G for 64-bit after
boot protocol 2.12.

We need to add another range in /proc/iomem, "Crash kernel low", so
kexec-tools can find that info and append it to the kdump kernel
command line.

Try to reserve some memory under 4G if the normal "Crash kernel" range
is above 4G.

The user can specify the size with crashkernel_low=XX[KMG]. If the
user does not specify it, 72M is used instead.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 Documentation/kernel-parameters.txt |    3 ++
 arch/x86/kernel/setup.c             |   57 +++++++++++++++++++++++++----------
 include/linux/kexec.h               |    3 ++
 kernel/kexec.c                      |   34 ++++++++++++++++++---
 4 files changed, 76 insertions(+), 21 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 08b4c9d..0fab0da 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -600,6 +600,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			is selected automatically. Check
 			Documentation/kdump/kdump.txt for further details.
 
+	crashkernel_low=size[KMG]
+			[KNL, x86] parts under 4G.
+
 	crashkernel=range1:size1[,range2:size2,...][@offset]
 			[KNL] Same as above, but depends on the memory
 			in the running system. The syntax of range is
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e73928a..d62069b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -292,6 +292,21 @@ static void __init reserve_brk(void)
 	_brk_start = 0;
 }
 
+static u64 __init get_mem_size(unsigned long limit_pfn)
+{
+	int i;
+	u64 pages = 0;
+	unsigned long start_pfn, end_pfn;
+
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
+		start_pfn = min_t(unsigned long, start_pfn, limit_pfn);
+		end_pfn = min_t(unsigned long, end_pfn, limit_pfn);
+		pages += end_pfn - start_pfn;
+	}
+
+	return pages << PAGE_SHIFT;
+}
+
 #ifdef CONFIG_BLK_DEV_INITRD
 
 static u64 __init get_ramdisk_image(void)
@@ -363,20 +378,6 @@ static void __init relocate_initrd(void)
 		ramdisk_here, ramdisk_here + ramdisk_size - 1);
 }
 
-static u64 __init get_mem_size(unsigned long limit_pfn)
-{
-	int i;
-	u64 mapped_pages = 0;
-	unsigned long start_pfn, end_pfn;
-
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
-		start_pfn = min_t(unsigned long, start_pfn, limit_pfn);
-		end_pfn = min_t(unsigned long, end_pfn, limit_pfn);
-		mapped_pages += end_pfn - start_pfn;
-	}
-
-	return mapped_pages << PAGE_SHIFT;
-}
 static void __init early_reserve_initrd(void)
 {
 	/* Assume only end is not page aligned */
@@ -524,8 +525,11 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 static void __init reserve_crashkernel(void)
 {
+	const unsigned long long alignment = 16<<20;	/* 16M */
 	unsigned long long total_mem;
 	unsigned long long crash_size, crash_base;
+	unsigned long long low_base = 0, low_size = 0;
+	unsigned long total_low_mem = 0;
 	int ret;
 
 	total_mem = memblock_phys_mem_size();
@@ -537,8 +541,6 @@ static void __init reserve_crashkernel(void)
 
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
-		const unsigned long long alignment = 16<<20;	/* 16M */
-
 		/*
 		 *  kexec want bzImage is below CRASH_KERNEL_ADDR_MAX
 		 */
@@ -549,6 +551,7 @@ static void __init reserve_crashkernel(void)
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
+
 	} else {
 		unsigned long long start;
 
@@ -570,6 +573,28 @@ static void __init reserve_crashkernel(void)
 	crashk_res.start = crash_base;
 	crashk_res.end   = crash_base + crash_size - 1;
 	insert_resource(&iomem_resource, &crashk_res);
+
+	if (crash_base >= (1ULL<<32)) {
+		unsigned long long base;
+
+		total_low_mem = get_mem_size(1UL<<(32-PAGE_SHIFT));
+		ret = parse_crashkernel_low(boot_command_line, total_low_mem,
+						&low_size, &base);
+		if (ret != 0 || low_size <= 0)
+			low_size = (72UL<<20);  /* 72M */
+		low_base = memblock_find_in_range(low_size, (1ULL<<32),
+					low_size, alignment);
+	}
+	if (low_base) {
+		memblock_reserve(low_base, low_size);
+		pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
+			(unsigned long)(low_size >> 20),
+			(unsigned long)(low_base >> 20),
+			(unsigned long)(total_low_mem >> 20));
+		crashk_low_res.start = low_base;
+		crashk_low_res.end   = low_base + low_size - 1;
+		insert_resource(&iomem_resource, &crashk_low_res);
+	}
 }
 #else
 static void __init reserve_crashkernel(void)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d0b8458..d2e6927 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -191,6 +191,7 @@ extern struct kimage *kexec_crash_image;
 /* Location of a reserved region to hold the crash kernel.
  */
 extern struct resource crashk_res;
+extern struct resource crashk_low_res;
 typedef u32 note_buf_t[KEXEC_NOTE_BYTES/4];
 extern note_buf_t __percpu *crash_notes;
 extern u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
@@ -199,6 +200,8 @@ extern size_t vmcoreinfo_max_size;
 
 int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
+int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
+		unsigned long long *crash_size, unsigned long long *crash_base);
 int crash_shrink_memory(unsigned long new_size);
 size_t crash_get_memory_size(void);
 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..2436ffc 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -54,6 +54,12 @@ struct resource crashk_res = {
 	.end   = 0,
 	.flags = IORESOURCE_BUSY | IORESOURCE_MEM
 };
+struct resource crashk_low_res = {
+	.name  = "Crash kernel low",
+	.start = 0,
+	.end   = 0,
+	.flags = IORESOURCE_BUSY | IORESOURCE_MEM
+};
 
 int kexec_should_crash(struct task_struct *p)
 {
@@ -1369,10 +1375,11 @@ static int __init parse_crashkernel_simple(char 		*cmdline,
  * That function is the entry point for command line parsing and should be
  * called from the arch-specific code.
  */
-int __init parse_crashkernel(char 		 *cmdline,
+static int __init __parse_crashkernel(char *cmdline,
 			     unsigned long long system_ram,
 			     unsigned long long *crash_size,
-			     unsigned long long *crash_base)
+			     unsigned long long *crash_base,
+				const char *name)
 {
 	char 	*p = cmdline, *ck_cmdline = NULL;
 	char	*first_colon, *first_space;
@@ -1382,16 +1389,16 @@ int __init parse_crashkernel(char 		 *cmdline,
 	*crash_base = 0;
 
 	/* find crashkernel and use the last one if there are more */
-	p = strstr(p, "crashkernel=");
+	p = strstr(p, name);
 	while (p) {
 		ck_cmdline = p;
-		p = strstr(p+1, "crashkernel=");
+		p = strstr(p+1, name);
 	}
 
 	if (!ck_cmdline)
 		return -EINVAL;
 
-	ck_cmdline += 12; /* strlen("crashkernel=") */
+	ck_cmdline += strlen(name);
 
 	/*
 	 * if the commandline contains a ':', then that's the extended
@@ -1409,6 +1416,23 @@ int __init parse_crashkernel(char 		 *cmdline,
 	return 0;
 }
 
+int __init parse_crashkernel(char *cmdline,
+			     unsigned long long system_ram,
+			     unsigned long long *crash_size,
+			     unsigned long long *crash_base)
+{
+	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+					"crashkernel=");
+}
+
+int __init parse_crashkernel_low(char *cmdline,
+			     unsigned long long system_ram,
+			     unsigned long long *crash_size,
+			     unsigned long long *crash_base)
+{
+	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+					"crashkernel_low=");
+}
 
 static void update_vmcoreinfo_note(void)
 {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v6 27/27] x86: Merge early kernel reserve for 32bit and 64bit
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (25 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 26/27] x86: add Crash kernel low reservation Yinghai Lu
@ 2012-12-13 22:02 ` Yinghai Lu
  2012-12-13 23:47 ` [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G H. Peter Anvin
  27 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 22:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Eric W. Biederman, Andrew Morton, linux-kernel, Yinghai Lu

They are the same, so move them out of head32.c/head64.c into setup.c.

We are using memblock, which handles overlapping ranges properly, so
we don't need an early reservation just to hold the location; we only
need to make sure the kernel image is reserved before memblock is used
to find free memory.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
 arch/x86/kernel/head32.c |    9 ---------
 arch/x86/kernel/head64.c |    3 ---
 arch/x86/kernel/setup.c  |    9 +++++++++
 3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index 4c52efc..17f7792 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -30,9 +30,6 @@ static void __init i386_default_early_setup(void)
 
 void __init i386_start_kernel(void)
 {
-	memblock_reserve(__pa_symbol(&_text),
-			 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
-
 	/* Call the subarch specific early setup function */
 	switch (boot_params.hdr.hardware_subarch) {
 	case X86_SUBARCH_MRST:
@@ -46,11 +43,5 @@ void __init i386_start_kernel(void)
 		break;
 	}
 
-	/*
-	 * At this point everything still needed from the boot loader
-	 * or BIOS or kernel text should be early reserved or marked not
-	 * RAM in e820. All other memory is free game.
-	 */
-
 	start_kernel();
 }
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 17978b2..934b122 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -136,9 +136,6 @@ void __init x86_64_start_reservations(char *real_mode_data)
 		copy_bootdata(real_mode_data);
 	}
 
-	memblock_reserve(__pa_symbol(&_text),
-			 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
-
 	reserve_ebda_region();
 
 	/*
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d62069b..fcfcaef 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -727,8 +727,17 @@ static __init void print_init_mem_mapped(void)
 
 void __init setup_arch(char **cmdline_p)
 {
+	memblock_reserve(__pa_symbol(_text),
+			 (unsigned long)__bss_stop - (unsigned long)_text);
+
 	early_reserve_initrd();
 
+	/*
+	 * At this point everything still needed from the boot loader
+	 * or BIOS or kernel text should be early reserved or marked not
+	 * RAM in e820. All other memory is free game.
+	 */
+
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	visws_early_detect();
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 18/27] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2012-12-13 22:02 ` [PATCH v6 18/27] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
@ 2012-12-13 22:54   ` H. Peter Anvin
  2012-12-13 23:28     ` Yinghai Lu
  0 siblings, 1 reply; 66+ messages in thread
From: H. Peter Anvin @ 2012-12-13 22:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel, Rob Landley, Matt Fleming

On 12/13/2012 02:02 PM, Yinghai Lu wrote:
>  1EA/001	ALL	edd_mbr_sig_buf_entries	Number of entries in edd_mbr_sig_buffer
>  				(below)
> +1F0/001	ALL	sentinel	0: states _ext_* fields are valid

0x1f0 is unsuitable for use as sentinel -- or in fact for any purpose --
because it is quite plausible that someone may (fairly sanely) start the
copy range at 0x1f0 instead of 0x1f1... we really should have documented
it that way but it is too late now.

However, we can use 0x1ef.

	-hpa


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image
  2012-12-13 22:02 ` [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image Yinghai Lu
@ 2012-12-13 23:27   ` H. Peter Anvin
  2012-12-14  0:13     ` Yinghai Lu
  2012-12-14  2:15     ` Yinghai Lu
  0 siblings, 2 replies; 66+ messages in thread
From: H. Peter Anvin @ 2012-12-13 23:27 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel

On 12/13/2012 02:02 PM, Yinghai Lu wrote:
> +	 * If we come here from a bootloader, kernel(text+data+bss+brk),
> +	 * ramdisk, zero_page, command line could be above 4G.
> +	 * We depend on an identity mapped page table being provided
> +	 * that maps our entire kernel(text+data+bss+brk), and hopefully
> +	 * all of memory.

We should make it explicit what we depend on.  We certainly *can* depend
only on text+data+bss+brk ... with the dynamic page table approach we
can do that, and that would be most conservative; if we depend on other
things we should make that explicit, not just here but in boot.txt.

	-hpa



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 18/27] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2012-12-13 22:54   ` H. Peter Anvin
@ 2012-12-13 23:28     ` Yinghai Lu
  2012-12-13 23:38       ` H. Peter Anvin
  0 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-13 23:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel, Rob Landley, Matt Fleming

[-- Attachment #1: Type: text/plain, Size: 414 bytes --]

On Thu, Dec 13, 2012 at 2:54 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> 0x1f0 is unsuitable for use as sentinel -- or in fact for any purpose --
> because it is quite plausible that someone may (fairly sanely) start the
> copy range at 0x1f0 instead of 0x1f1... we really should have documented
> it that way but it is too late now.
>
> However, we can use 0x1ef.

right. updated to use 0x1ef.

Thanks

Yinghai

[-- Attachment #2: ext_ramdisk_image_v6_1.patch --]
[-- Type: application/octet-stream, Size: 10212 bytes --]

Subject: [PATCH v5 10/11] x86, boot: add fields to support load bzImage and ramdisk above 4G

ext_ramdisk_image/size record the high 32 bits of the ramdisk info.

xloadflags bit0 is set if the kernel is relocatable and 64-bit.

Let get_ramdisk_image/size use ext_ramdisk_image/size to get the
right position for the ramdisk.

The bootloader fills in ext_ramdisk_image/size when it loads the
ramdisk above 4G.

The bootloader also checks whether xloadflags bit0 is set to decide
if it can load the ramdisk high, above 4G.

The sentinel is used to make sure the kernel only uses ext_* values
that were actually set.

Update the header version to 2.12.

-v2: add ext_cmd_line_ptr for above 4G support.
-v3: update to xloadflags from HPA.
-v4: use fields from bootparam instead setup_header according to HPA.
-v5: add checking for USE_EXT_BOOT_PARAMS
-v6: use sentinel to check if ext_* are valid suggested by HPA.
     HPA said:
	1. add a field in the uninitialized portion, call it "sentinel";
	2. make sure the byte position corresponding to the "sentinel" field is
	   nonzero in the bzImage file;
	3. if the kernel boots up and sentinel is nonzero, erase those fields
	   that you identified as uninitialized;
-v7: change to 0x1ef instead of 0x1f0, HPA said:
	it is quite plausible that someone may (fairly sanely) start the
	copy range at 0x1f0 instead of 0x1f1

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Rob Landley <rob@landley.net>
Cc: Matt Fleming <matt.fleming@intel.com>

---
 Documentation/x86/boot.txt         |   15 ++++++++++++++-
 Documentation/x86/zero-page.txt    |    4 ++++
 arch/x86/boot/compressed/cmdline.c |    2 ++
 arch/x86/boot/compressed/misc.c    |   12 ++++++++++++
 arch/x86/boot/header.S             |   12 ++++++++++--
 arch/x86/boot/setup.ld             |    7 +++++++
 arch/x86/include/asm/bootparam.h   |   13 ++++++++++---
 arch/x86/kernel/head64.c           |    2 ++
 arch/x86/kernel/setup.c            |    4 ++++
 9 files changed, 65 insertions(+), 6 deletions(-)

Index: linux-2.6/Documentation/x86/boot.txt
===================================================================
--- linux-2.6.orig/Documentation/x86/boot.txt
+++ linux-2.6/Documentation/x86/boot.txt
@@ -57,6 +57,9 @@ Protocol 2.10:	(Kernel 2.6.31) Added a p
 Protocol 2.11:	(Kernel 3.6) Added a field for offset of EFI handover
 		protocol entry point.
 
+Protocol 2.12:	(Kernel 3.9) Added three fields for loading bzImage and
+		 ramdisk above 4G with 64bit in bootparam.
+
 **** MEMORY LAYOUT
 
 The traditional memory map for the kernel loader, used for Image or
@@ -182,7 +185,7 @@ Offset	Proto	Name		Meaning
 0230/4	2.05+	kernel_alignment Physical addr alignment required for kernel
 0234/1	2.05+	relocatable_kernel Whether kernel is relocatable or not
 0235/1	2.10+	min_alignment	Minimum alignment, as a power of two
-0236/2	N/A	pad3		Unused
+0236/2	2.12+	xloadflags	Boot protocol option flags
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
 023C/4	2.07+	hardware_subarch Hardware subarchitecture
 0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
@@ -582,6 +585,16 @@ Protocol:	2.10+
   misaligned kernel.  Therefore, a loader should typically try each
   power-of-two alignment from kernel_alignment down to this alignment.
 
+Field name:     xloadflags
+Type:           modify (obligatory)
+Offset/size:    0x236/2
+Protocol:       2.12+
+
+  This field is a bitmask.
+
+  Bit 0 (read): CAN_BE_LOADED_ABOVE_4G
+        - If 1, kernel/boot_params/cmdline/ramdisk can be above 4g,
+
 Field name:	cmdline_size
 Type:		read
 Offset/size:	0x238/4
Index: linux-2.6/arch/x86/boot/header.S
===================================================================
--- linux-2.6.orig/arch/x86/boot/header.S
+++ linux-2.6/arch/x86/boot/header.S
@@ -279,7 +279,7 @@ _start:
 	# Part 2 of the header, from the old setup.S
 
 		.ascii	"HdrS"		# header signature
-		.word	0x020b		# header version number (>= 0x0105)
+		.word	0x020c		# header version number (>= 0x0105)
 					# or else old loadlin-1.5 will fail)
 		.globl realmode_swtch
 realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
@@ -369,7 +369,15 @@ relocatable_kernel:    .byte 1
 relocatable_kernel:    .byte 0
 #endif
 min_alignment:		.byte MIN_KERNEL_ALIGN_LG2	# minimum alignment
-pad3:			.word 0
+
+xloadflags:
+CAN_BE_LOADED_ABOVE_4G	= 1		# If set, the kernel/boot_param/
+					# ramdisk could be loaded above 4g
+#if defined(CONFIG_X86_64) && defined(CONFIG_RELOCATABLE)
+			.word CAN_BE_LOADED_ABOVE_4G
+#else
+			.word 0
+#endif
 
 cmdline_size:   .long   COMMAND_LINE_SIZE-1     #length of the command line,
                                                 #added with boot protocol
Index: linux-2.6/arch/x86/include/asm/bootparam.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/bootparam.h
+++ linux-2.6/arch/x86/include/asm/bootparam.h
@@ -58,7 +58,9 @@ struct setup_header {
 	__u32	initrd_addr_max;
 	__u32	kernel_alignment;
 	__u8	relocatable_kernel;
-	__u8	_pad2[3];
+	__u8	min_alignment;
+	__u16	xloadflags;
+#define CAN_BE_LOADED_ABOVE_4G	(1<<0)
 	__u32	cmdline_size;
 	__u32	hardware_subarch;
 	__u64	hardware_subarch_data;
@@ -106,7 +108,10 @@ struct boot_params {
 	__u8  hd1_info[16];	/* obsolete! */		/* 0x090 */
 	struct sys_desc_table sys_desc_table;		/* 0x0a0 */
 	struct olpc_ofw_header olpc_ofw_header;		/* 0x0b0 */
-	__u8  _pad4[128];				/* 0x0c0 */
+	__u32 ext_ramdisk_image;			/* 0x0c0 */
+	__u32 ext_ramdisk_size;				/* 0x0c4 */
+	__u32 ext_cmd_line_ptr;				/* 0x0c8 */
+	__u8  _pad4[116];				/* 0x0cc */
 	struct edid_info edid_info;			/* 0x140 */
 	struct efi_info efi_info;			/* 0x1c0 */
 	__u32 alt_mem_k;				/* 0x1e0 */
@@ -115,7 +120,9 @@ struct boot_params {
 	__u8  eddbuf_entries;				/* 0x1e9 */
 	__u8  edd_mbr_sig_buf_entries;			/* 0x1ea */
 	__u8  kbd_status;				/* 0x1eb */
-	__u8  _pad6[5];					/* 0x1ec */
+	__u8  _pad5[3];					/* 0x1ec */
+	__u8  sentinel;					/* 0x1ef */
+	__u8  _pad6[1];					/* 0x1f0 */
 	struct setup_header hdr;    /* setup header */	/* 0x1f1 */
 	__u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
 	__u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];	/* 0x290 */
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -301,12 +301,16 @@ static u64 __init get_ramdisk_image(void
 {
 	u64 ramdisk_image = boot_params.hdr.ramdisk_image;
 
+	ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+
 	return ramdisk_image;
 }
 static u64 __init get_ramdisk_size(void)
 {
 	u64 ramdisk_size = boot_params.hdr.ramdisk_size;
 
+	ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+
 	return ramdisk_size;
 }
 
Index: linux-2.6/arch/x86/boot/compressed/cmdline.c
===================================================================
--- linux-2.6.orig/arch/x86/boot/compressed/cmdline.c
+++ linux-2.6/arch/x86/boot/compressed/cmdline.c
@@ -17,6 +17,8 @@ static unsigned long get_cmd_line_ptr(vo
 {
 	unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
 
+	cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
+
 	return cmd_line_ptr;
 }
 int cmdline_find_option(const char *option, char *buffer, int bufsize)
Index: linux-2.6/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head64.c
+++ linux-2.6/arch/x86/kernel/head64.c
@@ -55,6 +55,8 @@ static unsigned long get_cmd_line_ptr(vo
 {
 	unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
+	cmd_line_ptr |= (u64)boot_params.ext_cmd_line_ptr << 32;
+
 	return cmd_line_ptr;
 }
 
Index: linux-2.6/Documentation/x86/zero-page.txt
===================================================================
--- linux-2.6.orig/Documentation/x86/zero-page.txt
+++ linux-2.6/Documentation/x86/zero-page.txt
@@ -19,6 +19,9 @@ Offset	Proto	Name		Meaning
 090/010	ALL	hd1_info	hd1 disk parameter, OBSOLETE!!
 0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table)
 0B0/010	ALL	olpc_ofw_header	OLPC's OpenFirmware CIF and friends
+0C0/004	ALL	ext_ramdisk_image ramdisk_image high 32bits
+0C4/004	ALL	ext_ramdisk_size  ramdisk_size high 32bits
+0C8/004	ALL	ext_cmd_line_ptr  cmd_line_ptr high 32bits
 140/080	ALL	edid_info	Video mode setup (struct edid_info)
 1C0/020	ALL	efi_info	EFI 32 information (struct efi_info)
 1E0/004	ALL	alk_mem_k	Alternative mem check, in KB
@@ -27,6 +30,7 @@ Offset	Proto	Name		Meaning
 1E9/001	ALL	eddbuf_entries	Number of entries in eddbuf (below)
 1EA/001	ALL	edd_mbr_sig_buf_entries	Number of entries in edd_mbr_sig_buffer
 				(below)
+1EF/001	ALL	sentinel	0: states _ext_* fields are valid
 290/040	ALL	edd_mbr_sig_buffer EDD MBR signatures
 2D0/A00	ALL	e820_map	E820 memory map table
 				(array of struct e820entry)
Index: linux-2.6/arch/x86/boot/compressed/misc.c
===================================================================
--- linux-2.6.orig/arch/x86/boot/compressed/misc.c
+++ linux-2.6/arch/x86/boot/compressed/misc.c
@@ -318,6 +318,16 @@ static void parse_elf(void *output)
 	free(phdrs);
 }
 
+static void sanitize_real_mode(struct boot_params *real_mode)
+{
+	if (real_mode->sentinel) {
+		/* ext_* fields in boot_params are not valid, clear them */
+		real_mode->ext_ramdisk_image = 0;
+		real_mode->ext_ramdisk_size  = 0;
+		real_mode->ext_cmd_line_ptr  = 0;
+	}
+}
+
 asmlinkage void decompress_kernel(void *rmode, memptr heap,
 				  unsigned char *input_data,
 				  unsigned long input_len,
@@ -325,6 +335,8 @@ asmlinkage void decompress_kernel(void *
 {
 	real_mode = rmode;
 
+	sanitize_real_mode(real_mode);
+
 	if (real_mode->screen_info.orig_video_mode == 7) {
 		vidmem = (char *) 0xb0000;
 		vidport = 0x3b4;
Index: linux-2.6/arch/x86/boot/setup.ld
===================================================================
--- linux-2.6.orig/arch/x86/boot/setup.ld
+++ linux-2.6/arch/x86/boot/setup.ld
@@ -13,6 +13,13 @@ SECTIONS
 	.bstext		: { *(.bstext) }
 	.bsdata		: { *(.bsdata) }
 
+	/* sentinel: make sure if boot_params from bootloader is right */
+	. = 495;
+	.sentinel	: {
+		sentinel = .;
+		BYTE(0xff);
+	}
+
 	. = 497;
 	.header		: { *(.header) }
 	.entrytext	: { *(.entrytext) }

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 18/27] x86, boot: add fields to support load bzImage and ramdisk above 4G
  2012-12-13 23:28     ` Yinghai Lu
@ 2012-12-13 23:38       ` H. Peter Anvin
  0 siblings, 0 replies; 66+ messages in thread
From: H. Peter Anvin @ 2012-12-13 23:38 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel, Rob Landley, Matt Fleming

On 12/13/2012 03:28 PM, Yinghai Lu wrote:
> On Thu, Dec 13, 2012 at 2:54 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> 0x1f0 is unsuitable for use as sentinel -- or in fact for any purpose --
>> because it is quite plausible that someone may (fairly sanely) start the
>> copy range at 0x1f0 instead of 0x1f1... we really should have documented
>> it that way but it is too late now.
>>
>> However, we can use 0x1ef.
>
> right. updated to use 0x1ef.
>

> +1EF/001	ALL	sentinel	0: states _ext_* fields are valid

Not the correct documentation.  What this does is detect broken 
32/64-bit bootloaders, and the remediation code should zero not just the 
ext_* fields but all the fields that were identified as uninitialized -- 
we're kind of assuming that kexec is representative here, since it 
doesn't help us by giving an ID.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
                   ` (26 preceding siblings ...)
  2012-12-13 22:02 ` [PATCH v6 27/27] x86: Merge early kernel reserve for 32bit and 64bit Yinghai Lu
@ 2012-12-13 23:47 ` H. Peter Anvin
  2012-12-14  0:00   ` Yinghai Lu
  2012-12-21 22:38   ` Konrad Rzeszutek Wilk
  27 siblings, 2 replies; 66+ messages in thread
From: H. Peter Anvin @ 2012-12-13 23:47 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel, Konrad Rzeszutek Wilk, Stefano Stabellini

There are obviously good bits to this patchset, but I'm really starting 
to think the "pseudo-linear mode" via a trap handler -- meaning we can 
access all of memory without any extra effort -- makes more sense.  In 
fact, that way we could just build the full page tables without worrying 
about incremental bootstrap, depending on if that is a complexity win or 
not.

Either way, this is for native only: the Xen domain builder or other 
similar entry paths should be setting up page tables that cover all of 
memory; I'm hoping Konrad and Stefano can confirm this.

The only reason to go with another approach I can think of is if it 
makes 32/64-bit unification cleaner.

	-hpa


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2012-12-13 23:47 ` [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G H. Peter Anvin
@ 2012-12-14  0:00   ` Yinghai Lu
  2012-12-21 22:38   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-14  0:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel, Konrad Rzeszutek Wilk, Stefano Stabellini

On Thu, Dec 13, 2012 at 3:47 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> There are obviously good bits to this patchset, but I'm really starting to
> think the "pseudo-linear mode" via a trap handler -- meaning we can access
> all of memory without any extra effort -- makes more sense.  In fact, that
> way we could just build the full page tables without worrying about
> incremental bootstrap, depending on if that is a complexity win or not.
>
> Either way, this is for native only: the Xen domain builder or other similar
> entry paths should be setting up page tables that cover all of memory; I'm
> hoping Konrad and Stefano can confirm this.
>
> The only reason to go with another approach I can think of is if it makes
> 32/64-bit unification cleaner.

ok, let's wait until the new approach is working.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image
  2012-12-13 23:27   ` H. Peter Anvin
@ 2012-12-14  0:13     ` Yinghai Lu
  2012-12-14  0:38       ` H. Peter Anvin
  2012-12-14  2:15     ` Yinghai Lu
  1 sibling, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-14  0:13 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel

On Thu, Dec 13, 2012 at 3:27 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/13/2012 02:02 PM, Yinghai Lu wrote:
>> +      * If we come here from a bootloader, kernel(text+data+bss+brk),
>> +      * ramdisk, zero_page, command line could be above 4G.
>> +      * We depend on an identity mapped page table being provided
>> +      * that maps our entire kernel(text+data+bss+brk), and hopefully
>> +      * all of memory.
>
> We should make it explicit what we depend on.  We certainly *can* depend
> only on text+data+bss+brk ... with the dynamic page table approach we
> can do that, and that would be most conservative; if we depend on other
> things we should make that explicit, not just here but in boot.txt.

yes, in my version only the kernel (text+data+bss+brk) needs to be
mapped, aka the INIT_SIZE for decompressing.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image
  2012-12-14  0:13     ` Yinghai Lu
@ 2012-12-14  0:38       ` H. Peter Anvin
  2012-12-14  0:44         ` Yinghai Lu
  0 siblings, 1 reply; 66+ messages in thread
From: H. Peter Anvin @ 2012-12-14  0:38 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel

On 12/13/2012 04:13 PM, Yinghai Lu wrote:
> On Thu, Dec 13, 2012 at 3:27 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 12/13/2012 02:02 PM, Yinghai Lu wrote:
>>> +      * If we come here from a bootloader, kernel(text+data+bss+brk),
>>> +      * ramdisk, zero_page, command line could be above 4G.
>>> +      * We depend on an identity mapped page table being provided
>>> +      * that maps our entire kernel(text+data+bss+brk), and hopefully
>>> +      * all of memory.
>>
>> We should make it explicit what we depend on.  We certainly *can* depend
>> only on text+data+bss+brk ... with the dynamic page table approach we
>> can do that, and that would be most conservative; if we depend on other
>> things we should make that explicit, not just here but in boot.txt.
>
> yes, in my version, only need kernel(text+data+bss+brk) get mapped.
> aka the INIT_SIZE for decompressing.
>

It is definitely the minimum we can rely on, and so is the minimum we 
should rely on.  In fact, we don't even need .bss/.brk to be mapped, but 
we probably should require that as a matter of protocol.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image
  2012-12-14  0:38       ` H. Peter Anvin
@ 2012-12-14  0:44         ` Yinghai Lu
  2012-12-14  0:51           ` H. Peter Anvin
  2012-12-14  0:51           ` Yinghai Lu
  0 siblings, 2 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-14  0:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel

On Thu, Dec 13, 2012 at 4:38 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/13/2012 04:13 PM, Yinghai Lu wrote:
>
> It is definitely the minimum we can rely on, and so it is the minimum we
> should rely on.  In fact, we don't even need .bss/.brk to be mapped, but we
> probably should require that as a matter of protocol.

In my version, arch/x86/kernel/head_64.S uses BRK to do the
ident/high kernel mapping
for a kernel that is above 4G,
so .brk is needed.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image
  2012-12-14  0:44         ` Yinghai Lu
@ 2012-12-14  0:51           ` H. Peter Anvin
  2012-12-14  0:51           ` Yinghai Lu
  1 sibling, 0 replies; 66+ messages in thread
From: H. Peter Anvin @ 2012-12-14  0:51 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel

On 12/13/2012 04:44 PM, Yinghai Lu wrote:
> On Thu, Dec 13, 2012 at 4:38 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 12/13/2012 04:13 PM, Yinghai Lu wrote:
>>
>> It is definitely the minimum we can rely on, and so it is the minimum we
>> should rely on.  In fact, we don't even need .bss/.brk to be mapped, but we
>> probably should require that as a matter of protocol.
>
> In my version, arch/x86/kernel/head_64.S uses BRK to do the
> ident/high kernel mapping
> for a kernel that is above 4G,
> so .brk is needed.
>

Yes, with the page fault approach we wouldn't need to do that, so that 
version is the minimum that can, practically, be required (one can 
constrain that even further, down to only needing a handful of pages, 
but that gets progressively more painful for little to no gain.)

However, as I said, rather than tying our hands for the future we should 
include .bss/.brk in the requirement.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image
  2012-12-14  0:44         ` Yinghai Lu
  2012-12-14  0:51           ` H. Peter Anvin
@ 2012-12-14  0:51           ` Yinghai Lu
  2012-12-14  0:54             ` H. Peter Anvin
  1 sibling, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-14  0:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel

On Thu, Dec 13, 2012 at 4:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Thu, Dec 13, 2012 at 4:38 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 12/13/2012 04:13 PM, Yinghai Lu wrote:
>>
>> It is definitely the minimum we can rely on, and so it is the minimum we
>> should rely on.  In fact, we don't even need .bss/.brk to be mapped, but we
>> probably should require that as a matter of protocol.
>
> In my version, arch/x86/kernel/head_64.S uses BRK to do the
> ident/high kernel mapping
> for a kernel that is above 4G,
> so .brk is needed.

We also need to make sure the zero page and the command line get an
ident mapping, because arch/x86/boot/compressed/head_64.S uses them.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image
  2012-12-14  0:51           ` Yinghai Lu
@ 2012-12-14  0:54             ` H. Peter Anvin
  2012-12-14  1:00               ` Yinghai Lu
  0 siblings, 1 reply; 66+ messages in thread
From: H. Peter Anvin @ 2012-12-14  0:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel

On 12/13/2012 04:51 PM, Yinghai Lu wrote:
> On Thu, Dec 13, 2012 at 4:44 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Thu, Dec 13, 2012 at 4:38 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 12/13/2012 04:13 PM, Yinghai Lu wrote:
>>>
>>> It is definitely the minimum we can rely on, and so it is the minimum we
>>> should rely on.  In fact, we don't even need .bss/.brk to be mapped, but we
>>> probably should require that as a matter of protocol.
>>
>> In my version, arch/x86/kernel/head_64.S uses BRK to do the
>> ident/high kernel mapping
>> for a kernel that is above 4G,
>> so .brk is needed.
>
> We also need to make sure the zero page and the command line get an
> ident mapping, because arch/x86/boot/compressed/head_64.S uses them.
>

... or we need to do the same kind of thing there.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image
  2012-12-14  0:54             ` H. Peter Anvin
@ 2012-12-14  1:00               ` Yinghai Lu
  2012-12-14  1:04                 ` H. Peter Anvin
  0 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-14  1:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel

On Thu, Dec 13, 2012 at 4:54 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/13/2012 04:51 PM, Yinghai Lu wrote:

>> We also need to make sure the zero page and the command line get an
>> ident mapping, because arch/x86/boot/compressed/head_64.S uses them.
>>
>
> ... or we need to do the same kind of thing there.
>
Your #PF handler approach will handle accesses to kernel code together
with the zero_page, command_line and ramdisk?
So we would only need to map the first 2M?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image
  2012-12-14  1:00               ` Yinghai Lu
@ 2012-12-14  1:04                 ` H. Peter Anvin
  0 siblings, 0 replies; 66+ messages in thread
From: H. Peter Anvin @ 2012-12-14  1:04 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, Eric W. Biederman, Andrew Morton,
	linux-kernel

On 12/13/2012 05:00 PM, Yinghai Lu wrote:
> On Thu, Dec 13, 2012 at 4:54 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 12/13/2012 04:51 PM, Yinghai Lu wrote:
> 
>>> We also need to make sure the zero page and the command line get an
>>> ident mapping, because arch/x86/boot/compressed/head_64.S uses them.
>>>
>>
>> ... or we need to do the same kind of thing there.
>>
> Your #PF handler approach will handle accesses to kernel code together
> with the zero_page, command_line and ramdisk?
> So we would only need to map the first 2M?
> 

Well, it needs to map enough that it can bootstrap itself -- 2M is
certainly sufficient, but right now it touches memory which isn't
necessarily guaranteed to be in the same 2M chunk (specifically
.init.data and .init.text).

We can do the equivalent in the decompressor; there the memory layout is
even simpler, and it may very well be practical to say we only need the
caller to map a single 2M page RWX.
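As an illustration, the "same 2M chunk" condition discussed above boils
down to a simple alignment check. A minimal userspace sketch (the name
same_2m_page() is made up for this example; it is not a kernel function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two addresses can be served by one 2M identity-mapped page only if
 * they fall into the same 2M-aligned region. */
static bool same_2m_page(uint64_t a, uint64_t b)
{
	return (a >> 21) == (b >> 21);	/* 2M = 1 << 21 */
}
```

So touching .init.data or .init.text is only safe under a single-2M-page
requirement if they land in the same 2M-aligned region as the code doing
the touching.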

	-hpa


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image
  2012-12-13 23:27   ` H. Peter Anvin
  2012-12-14  0:13     ` Yinghai Lu
@ 2012-12-14  2:15     ` Yinghai Lu
  1 sibling, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-14  2:15 UTC (permalink / raw)
  To: H. Peter Anvin, Eric W. Biederman
  Cc: Thomas Gleixner, Ingo Molnar, Andrew Morton, linux-kernel

On Thu, Dec 13, 2012 at 3:27 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> if we depend on other
> things we should make that explicit, not just here but in boot.txt.

Please check these proposed lines for boot.txt:

---

**** 64-bit BOOT PROTOCOL

For machines with 64-bit CPUs and a 64-bit kernel, we can use a 64-bit
bootloader, and we need a 64-bit boot protocol.

In the 64-bit boot protocol, the first step in loading a Linux kernel
should be to set up the boot parameters (struct boot_params,
traditionally known as the "zero page"). The memory for struct
boot_params may be allocated anywhere, below or above 4G, and must be
initialized to all zero. Then the setup header, starting at offset
0x01f1 of the kernel image, should be loaded into struct boot_params
and examined. The end of the setup header can be calculated as follows:

        0x0202 + byte value at offset 0x0201

In addition to reading/modifying/writing the setup header of struct
boot_params as with the 16-bit boot protocol, the boot loader should
also fill in the additional fields of struct boot_params as described
in zero-page.txt.

After setting up struct boot_params, the boot loader can load the
64-bit kernel in the same way as in the 16-bit boot protocol, but the
kernel may be loaded above 4G.

In the 64-bit boot protocol, the kernel is started by jumping to the
64-bit kernel entry point, which is the start address of the loaded
64-bit kernel plus 0x200.

At entry, the CPU must be in 64-bit mode with paging enabled.
The range from the start address of the loaded kernel up to
setup_header.init_size, as well as the zero page and the command line
buffer, must be identity-mapped; a GDT must be loaded with the
descriptors for selectors __BOOT_CS(0x10) and __BOOT_DS(0x18); both
descriptors must be 4G flat segments; __BOOT_CS must have execute/read
permission and __BOOT_DS must have read/write permission; CS must be
__BOOT_CS and DS, ES, SS must be __BOOT_DS; interrupts must be
disabled; %rsi must hold the base address of the struct boot_params.
---
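For illustration, the two computations described above can be sketched
as a bootloader might implement them. This is a minimal userspace
sketch; setup_header_end() and kernel_entry_64() are names made up for
this example, not part of any real loader:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* End of the setup header: 0x202 plus the byte stored at offset 0x201
 * of the kernel image (the displacement of the short jump that starts
 * the setup header). */
static size_t setup_header_end(const unsigned char *image)
{
	return 0x202 + image[0x201];
}

/* 64-bit entry point: start address of the loaded 64-bit kernel plus
 * 0x200.  With this protocol the load address itself may be above 4G. */
static uint64_t kernel_entry_64(uint64_t load_addr)
{
	return load_addr + 0x200;
}
```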

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 01/27] x86, mm: Fix page table early allocation offset checking
  2012-12-13 22:01 ` [PATCH v6 01/27] x86, mm: Fix page table early allocation offset checking Yinghai Lu
@ 2012-12-14 10:53   ` Borislav Petkov
  2012-12-19  3:30     ` Yinghai Lu
  0 siblings, 1 reply; 66+ messages in thread
From: Borislav Petkov @ 2012-12-14 10:53 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel

On Thu, Dec 13, 2012 at 02:01:55PM -0800, Yinghai Lu wrote:
> During debug load kernel above 4G, found one page if is not used in BRK
> and it should be with early page allocation.

What does that mean?

I see that this patch adds a change to not use the page at pgt_buf_top
anymore but why? Is pgt_buf_top the first invalid address we cannot
reserve anymore?

Generally, can we get this whole deal described in a bit more detail for
the mere mortals among us, maybe a short ascii art thing showing what
all those pgt_buf_{start,end,top} mean.

> Fix that checking and also add print out for every allocation from BRK
> page table allocation.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/mm/init.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 6f85de8..c4293cf 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -47,7 +47,7 @@ __ref void *alloc_low_pages(unsigned int num)
>  						__GFP_ZERO, order);
>  	}
>  
> -	if ((pgt_buf_end + num) >= pgt_buf_top) {
> +	if ((pgt_buf_end + num) > pgt_buf_top) {
>  		unsigned long ret;
>  		if (min_pfn_mapped >= max_pfn_mapped)
>  			panic("alloc_low_page: ran out of memory");
> @@ -61,6 +61,8 @@ __ref void *alloc_low_pages(unsigned int num)
>  	} else {
>  		pfn = pgt_buf_end;
>  		pgt_buf_end += num;
> +		printk(KERN_DEBUG "BRK [%#010lx, %#010lx] PGTABLE\n",

		pr_debug

> +			pfn << PAGE_SHIFT, (pgt_buf_end << PAGE_SHIFT) - 1);
>  	}
>  
>  	for (i = 0; i < num; i++) {
> -- 
> 1.7.10.4

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 02/27] x86, mm: make pgd next calculation consistent with pud/pmd
  2012-12-13 22:01 ` [PATCH v6 02/27] x86, mm: make pgd next calculation consistent with pud/pmd Yinghai Lu
@ 2012-12-14 14:34   ` Borislav Petkov
  2012-12-19  3:37     ` Yinghai Lu
  0 siblings, 1 reply; 66+ messages in thread
From: Borislav Petkov @ 2012-12-14 14:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel

On Thu, Dec 13, 2012 at 02:01:56PM -0800, Yinghai Lu wrote:
> Just like PUD_SIZE, and PMD_SIZE next calculation, aka
> round down and add size.

Why? Please explain more verbosely.

> also remove not need next checking, just pass end instead.
> later phys_pud_init uses PTRS_PER_PUD checking to exit early
> if end is too big.

Where? In the for-loop? Where does it check 'end'?

> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/mm/init_64.c |    6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 4178530..91f116a 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -530,9 +530,7 @@ kernel_physical_mapping_init(unsigned long start,
>  		pgd_t *pgd = pgd_offset_k(start);
>  		pud_t *pud;
>  
> -		next = (start + PGDIR_SIZE) & PGDIR_MASK;
> -		if (next > end)
> -			next = end;
> +		next = (start & PGDIR_MASK) + PGDIR_SIZE;
>  
>  		if (pgd_val(*pgd)) {
>  			pud = (pud_t *)pgd_page_vaddr(*pgd);
> @@ -542,7 +540,7 @@ kernel_physical_mapping_init(unsigned long start,
>  		}
>  
>  		pud = alloc_low_page();
> -		last_map_addr = phys_pud_init(pud, __pa(start), __pa(next),
> +		last_map_addr = phys_pud_init(pud, __pa(start), __pa(end),
>  						 page_size_mask);
>  
>  		spin_lock(&init_mm.page_table_lock);
> -- 
> 1.7.10.4

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200
  2012-12-13 22:01 ` [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200 Yinghai Lu
@ 2012-12-15 17:06   ` Borislav Petkov
  2012-12-19  3:44     ` Yinghai Lu
  0 siblings, 1 reply; 66+ messages in thread
From: Borislav Petkov @ 2012-12-15 17:06 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel, Matt Fleming

On Thu, Dec 13, 2012 at 02:01:57PM -0800, Yinghai Lu wrote:
> We are short of space before 0x200 that is entry for startup_64.

And you're moving this down because of the couple of bytes the next
patch is adding? If so, then explain that here.

> According to hpa, we can not change startup_64 to other offset and
> that become ABI now.
> 
> We could move function verify_cpu and no_longmode down, because one is
> used via call and another will not return.
> So could avoid extra code of jmp back and forth if we would move other
> lines.

What does that sentence even mean? Why is it in the commit message?

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Matt Fleming <matt.fleming@intel.com>
> ---
>  arch/x86/boot/compressed/head_64.S |   17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 2c4b171..fb984c0 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -176,14 +176,6 @@ ENTRY(startup_32)
>  	lret
>  ENDPROC(startup_32)
>  
> -no_longmode:
> -	/* This isn't an x86-64 CPU so hang */
> -1:
> -	hlt
> -	jmp     1b
> -
> -#include "../../kernel/verify_cpu.S"
> -
>  	/*
>  	 * Be careful here startup_64 needs to be at a predictable
>  	 * address so I can export it in an ELF header.  Bootloaders
> @@ -349,6 +341,15 @@ relocated:
>   */
>  	jmp	*%rbp
>  
> +	.code32
> +no_longmode:
> +	/* This isn't an x86-64 CPU so hang */
> +1:
> +	hlt
> +	jmp     1b
> +
> +#include "../../kernel/verify_cpu.S"
> +
>  	.data
>  gdt:
>  	.word	gdt_end - gdt
> -- 
> 1.7.10.4

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 04/27] x86, boot: Move lldt/ltr out of 64bit code section
  2012-12-13 22:01 ` [PATCH v6 04/27] x86, boot: Move lldt/ltr out of 64bit code section Yinghai Lu
@ 2012-12-15 17:28   ` Borislav Petkov
  2012-12-19  3:53     ` Yinghai Lu
  0 siblings, 1 reply; 66+ messages in thread
From: Borislav Petkov @ 2012-12-15 17:28 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel, Zachary Amsden, Matt Fleming

On Thu, Dec 13, 2012 at 02:01:58PM -0800, Yinghai Lu wrote:
> commit 08da5a2ca
> 
>     x86_64: Early segment setup for VT
> 
> add lldt/ltr to clean more segments.
> 
> Those code are put in code64, and it is using gdt that is only
> loaded from code32 path.
> 
> That breaks booting with 64bit bootloader that does not go through
> code32 path. It get at startup_64 directly,  and it has different
> gdt.
> 
> Move those lines into code32 after their gdt is loaded.

Let me rewrite that commit message for ya, you tell me whether I got it
right:

"08da5a2ca479 ("x86_64: Early segment setup for VT") sets up LDT and TR
into a valid state in order to speed up boot decompression under VT. The
code which loads the GDT is executed in the 32-bit startup code while
the above change in the 64-bit part.

However, this breaks 64-bit bootloaders which jump straight to the
64-bit startup entry point and thus skip LDR and TR setup because they
use a different GDT.

Fix this by moving the LDT and TR setup to the 32-bit section."

Is that correct?

If so, why not take the time and try to write your commits more
understandably so that bystanders like me don't have to look at the code
first and scramble to understand what you mean?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 05/27] x86, 64bit: clear ident mapping when kernel is above 512G
  2012-12-13 22:01 ` [PATCH v6 05/27] x86, 64bit: clear ident mapping when kernel is above 512G Yinghai Lu
@ 2012-12-16 17:49   ` Borislav Petkov
  2012-12-16 18:04     ` Yinghai Lu
  2012-12-19  3:57     ` Yinghai Lu
  0 siblings, 2 replies; 66+ messages in thread
From: Borislav Petkov @ 2012-12-16 17:49 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel

On Thu, Dec 13, 2012 at 02:01:59PM -0800, Yinghai Lu wrote:
> After following patch:
> 	x86, 64bit: Set extra ident mapping for whole kernel range
> 
> We have extra ident mapping for kernel that is loaded above 1G.

What?

/me looks at next patch

Aaah, the *next* patch adds an extra ident mapping. Why don't you say
so?

> So need to clear extra pgd entry when kernel is loaded above 512g.

Why then isn't that patch following the next patch instead of coming
before it?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 05/27] x86, 64bit: clear ident mapping when kernel is above 512G
  2012-12-16 17:49   ` Borislav Petkov
@ 2012-12-16 18:04     ` Yinghai Lu
  2012-12-19  3:57     ` Yinghai Lu
  1 sibling, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-16 18:04 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, linux-kernel

On Sun, Dec 16, 2012 at 9:49 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Dec 13, 2012 at 02:01:59PM -0800, Yinghai Lu wrote:
>> After following patch:
>>       x86, 64bit: Set extra ident mapping for whole kernel range
>>
>> We have extra ident mapping for kernel that is loaded above 1G.
>
> What?
>
> /me looks at next patch
>
> Aaah, the *next* patch adds an extra ident mapping. Why don't you say
> so?
>
>> So need to clear extra pgd entry when kernel is loaded above 512g.
>
> Why then isn't that patch following the next patch instead of coming
> before it?
>

Thanks a lot for checking the patch.
This patch is obsoleted by the #PF handler version.

So if possible, please check my for-x86-boot branch instead:
   http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=shortlog;h=refs/heads/for-x86-boot

   git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-x86-boot

I will repost them after HPA is OK with my changes to his #PF handler patch.


Thanks

Yinghai

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 01/27] x86, mm: Fix page table early allocation offset checking
  2012-12-14 10:53   ` Borislav Petkov
@ 2012-12-19  3:30     ` Yinghai Lu
  2012-12-19 17:16       ` Borislav Petkov
  0 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-19  3:30 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, linux-kernel

On Fri, Dec 14, 2012 at 2:53 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Dec 13, 2012 at 02:01:55PM -0800, Yinghai Lu wrote:
>> During debug load kernel above 4G, found one page if is not used in BRK
>> and it should be with early page allocation.
>
> What does that mean?
>
> I see that this patch adds a change to not use the page at pgt_buf_top
> anymore but why? Is pgt_buf_top the first invalid address we cannot
> reserve anymore?
>
> Generally, can we get this whole deal described in a bit more detail for
> the mere mortals among us, maybe a short ascii art thing showing what
> all those pgt_buf_{start,end,top} mean.
>

I'll change that to:
---
Subject: [PATCH] x86, mm: Fix page table early allocation offset checking

During debug load kernel above 4G, found one page if is not used in BRK
and it should be with early page allocation.

pgt_buf_top is address that can not be used, so should check if then new
end is above than that top, otherwise last page will not used.

Fix that checking and also add print out for every allocation from BRK
---
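The off-by-one can be seen in a standalone sketch (hypothetical
page-frame numbers, and a made-up helper name, not the kernel code
itself):

```c
#include <assert.h>
#include <stdbool.h>

/* pgt_buf_top is the first pfn that may NOT be used, i.e. the BRK page
 * table area is [pgt_buf_start, pgt_buf_top).  The allocation only
 * overflows when end + num goes strictly above top; with the old ">="
 * check the last page of the area could never be handed out. */
static bool brk_exhausted(unsigned long pgt_buf_end, unsigned int num,
			  unsigned long pgt_buf_top)
{
	return pgt_buf_end + num > pgt_buf_top;
}
```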

>>  arch/x86/mm/init.c |    4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>> index 6f85de8..c4293cf 100644
>> --- a/arch/x86/mm/init.c
>> +++ b/arch/x86/mm/init.c
>> @@ -47,7 +47,7 @@ __ref void *alloc_low_pages(unsigned int num)
>>                                               __GFP_ZERO, order);
>>       }
>>
>> -     if ((pgt_buf_end + num) >= pgt_buf_top) {
>> +     if ((pgt_buf_end + num) > pgt_buf_top) {
>>               unsigned long ret;
>>               if (min_pfn_mapped >= max_pfn_mapped)
>>                       panic("alloc_low_page: ran out of memory");
>> @@ -61,6 +61,8 @@ __ref void *alloc_low_pages(unsigned int num)
>>       } else {
>>               pfn = pgt_buf_end;
>>               pgt_buf_end += num;
>> +             printk(KERN_DEBUG "BRK [%#010lx, %#010lx] PGTABLE\n",
>
>                 pr_debug
>

I really hate pr_debug.

pr_debug is useless here: by default it will not print out anything.

Yinghai

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 02/27] x86, mm: make pgd next calculation consistent with pud/pmd
  2012-12-14 14:34   ` Borislav Petkov
@ 2012-12-19  3:37     ` Yinghai Lu
  2012-12-19 20:48       ` Borislav Petkov
  0 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-19  3:37 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, linux-kernel

On Fri, Dec 14, 2012 at 6:34 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Dec 13, 2012 at 02:01:56PM -0800, Yinghai Lu wrote:
>> Just like PUD_SIZE, and PMD_SIZE next calculation, aka
>> round down and add size.
>
> Why? Please explain more verbosely.
>
>> also remove not need next checking, just pass end instead.
>> later phys_pud_init uses PTRS_PER_PUD checking to exit early
>> if end is too big.
>
> Where? In the for-loop? Where does it check 'end'?
>

Updated to:

---
Subject: [PATCH] x86, mm: make pgd next calculation consistent with pud/pmd

Just like the way we calculate next for pud and pmd, aka
round down and add size.

also remove not needed next checking, just pass end with phys_pud_init.

phys_pud_init() uses PTRS_PER_PUD to stop its loop early so it could handle
big end properly.
---
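The recalculation can be sketched in isolation. The macro values below
mirror 4-level x86-64 paging, and pgd_next() is an illustrative name
for the expression in the patch:

```c
#include <assert.h>

#define PGDIR_SHIFT	39
#define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
#define PGDIR_MASK	(~(PGDIR_SIZE - 1))

/* New calculation: round start down to a PGDIR boundary and add
 * PGDIR_SIZE, exactly like the pud/pmd levels.  No clamping to 'end'
 * is needed because phys_pud_init() stops after PTRS_PER_PUD entries. */
static unsigned long pgd_next(unsigned long start)
{
	return (start & PGDIR_MASK) + PGDIR_SIZE;
}
```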

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200
  2012-12-15 17:06   ` Borislav Petkov
@ 2012-12-19  3:44     ` Yinghai Lu
  2012-12-19 20:57       ` Borislav Petkov
  0 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-19  3:44 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, linux-kernel,
	Matt Fleming

On Sat, Dec 15, 2012 at 9:06 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Dec 13, 2012 at 02:01:57PM -0800, Yinghai Lu wrote:
>> We are short of space before 0x200 that is entry for startup_64.
>
> And you're moving this down because of the couple of bytes the next
> patch is adding? If so, then explain that here.

better?

---
Subject: [PATCH] x86, boot: move verify_cpu.S and no_longmode down

We need to move some code into the 32-bit section in the following patch:

   x86, boot: Move lldt/ltr out of 64bit code section

but that will push startup_64 down from 0x200.

According to hpa, we can not change startup_64 to another offset, as
that has become ABI now.

We could move function verify_cpu and no_longmode down, because
verify_cpu is used via function call and no_longmode will not
return.
---

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 04/27] x86, boot: Move lldt/ltr out of 64bit code section
  2012-12-15 17:28   ` Borislav Petkov
@ 2012-12-19  3:53     ` Yinghai Lu
  0 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-19  3:53 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, linux-kernel,
	Zachary Amsden, Matt Fleming

On Sat, Dec 15, 2012 at 9:28 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Dec 13, 2012 at 02:01:58PM -0800, Yinghai Lu wrote:
>> commit 08da5a2ca
>>
>>     x86_64: Early segment setup for VT
>>
>> add lldt/ltr to clean more segments.
>>
>> Those code are put in code64, and it is using gdt that is only
>> loaded from code32 path.
>>
>> That breaks booting with 64bit bootloader that does not go through
>> code32 path. It get at startup_64 directly,  and it has different
>> gdt.
>>
>> Move those lines into code32 after their gdt is loaded.
>
> Let me rewrite that commit message for ya, you tell me whether I got it
> right:
>
> "08da5a2ca479 ("x86_64: Early segment setup for VT") sets up LDT and TR
> into a valid state in order to speed up boot decompression under VT. The
> code which loads the GDT is executed in the 32-bit startup code while
> the above change in the 64-bit part.
>
> However, this breaks 64-bit bootloaders which jump straight to the
> 64-bit startup entry point and thus skip LDR and TR setup because they
> use a different GDT.
>
> Fix this by moving the LDT and TR setup to the 32-bit section."
>
> Is that correct?
Yes.

Updated to:

---
Subject: [PATCH] x86, boot: Move lldt/ltr out of 64bit code section

commit 08da5a2ca

    x86_64: Early segment setup for VT

sets up LDT and TR into a valid state in order to speed up boot
decompression under VT.

That code is in the code64 section and uses a GDT that is only
loaded on the code32 path.

That breaks booting with a 64-bit bootloader that does not go through
the code32 path but jumps to startup_64 directly, since such a
bootloader has a different GDT.

Move those lines into code32, after the GDT is loaded.
---

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 05/27] x86, 64bit: clear ident mapping when kernel is above 512G
  2012-12-16 17:49   ` Borislav Petkov
  2012-12-16 18:04     ` Yinghai Lu
@ 2012-12-19  3:57     ` Yinghai Lu
  1 sibling, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-19  3:57 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, linux-kernel

On Sun, Dec 16, 2012 at 9:49 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Dec 13, 2012 at 02:01:59PM -0800, Yinghai Lu wrote:
>> After following patch:
>>       x86, 64bit: Set extra ident mapping for whole kernel range
>>
>> We have extra ident mapping for kernel that is loaded above 1G.
>
> What?
>
> /me looks at next patch
>
> Aaah, the *next* patch adds an extra ident mapping. Why don't you say
> so?
>
>> So need to clear extra pgd entry when kernel is loaded above 512g.
>
> Why then isn't that patch following the next patch instead of coming
> before it?

That was to make the transition smooth: zap_ident runs after the setup
in head_64.S.

Anyway, the whole zap_ident code gets deleted, and we don't need this
patch anymore.

Yinghai

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 01/27] x86, mm: Fix page table early allocation offset checking
  2012-12-19  3:30     ` Yinghai Lu
@ 2012-12-19 17:16       ` Borislav Petkov
  0 siblings, 0 replies; 66+ messages in thread
From: Borislav Petkov @ 2012-12-19 17:16 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel

On Tue, Dec 18, 2012 at 07:30:06PM -0800, Yinghai Lu wrote:
> change that too:
> ---
> Subject: [PATCH] x86, mm: Fix page table early allocation offset checking
> 
> During debug load kernel above 4G, found one page if is not used in BRK
> and it should be with early page allocation.
> 
> pgt_buf_top is address that can not be used, so should check if then new
> end is above than that top, otherwise last page will not used.

Oh oh, I'm starting to slowly see the light. :-) You mean that at least
on 64-bit, we're calling alloc_low_pages with num=1 and the comparison
">=" is wrong because in that case we fall back to memblock allocation
even if we have the last page in BRK and can use it, correct?

If so, why don't you write the commit message like this instead:

"pgt_buf_top is the top BRK address which can not be used. We check
it before falling back to memblock allocation. However, the check to
fall back is wrongly off-by-one, leading to us not using the last BRK
page even though we could. Fix that."

Makes sense?

[ … ]

> I really hate pr_debug.
> 
> pr_debug is useless here: by default it will not print out anything.

Ok, I agree, pr_debug can mean different things depending on .config
settings.
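The behavior conceded here can be modeled with a tiny sketch
(my_pr_debug and debug_lines are illustrative stand-ins, not the kernel
macros): unless DEBUG or dynamic debug is enabled for the file,
pr_debug() compiles away to nothing, whereas printk(KERN_DEBUG ...)
always emits the message.

```c
#include <assert.h>

static int debug_lines;	/* counts messages that would be emitted */

/* Rough model of pr_debug(): without -DDEBUG (or dynamic debug) the
 * call expands to nothing, so the message is never printed. */
#ifdef DEBUG
#define my_pr_debug(...)	(debug_lines++)
#else
#define my_pr_debug(...)	((void)0)
#endif
```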

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 02/27] x86, mm: make pgd next calculation consistent with pud/pmd
  2012-12-19  3:37     ` Yinghai Lu
@ 2012-12-19 20:48       ` Borislav Petkov
  2012-12-19 21:55         ` Yinghai Lu
  0 siblings, 1 reply; 66+ messages in thread
From: Borislav Petkov @ 2012-12-19 20:48 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel

On Tue, Dec 18, 2012 at 07:37:14PM -0800, Yinghai Lu wrote:
> update to :
> 
> ---
> Subject: [PATCH] x86, mm: make pgd next calculation consistent with pud/pmd
> 
> Just like the way we calculate next for pud and pmd, aka
> round down and add size.
> 
> also remove not needed next checking, just pass end with phys_pud_init.
> 
> pyhs_pud_init() uses PTRS_PER_PUD to stop its loop early so it could handle
> big end properly.

Almost there, let's merge the last two sentences:

"Also, do not boundary-check 'next' but pass 'end' down to
phys_pud_init() instead because the loop in there terminates at
PTRS_PER_PUD and thus can handle a possibly bigger 'end' properly."

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200
  2012-12-19  3:44     ` Yinghai Lu
@ 2012-12-19 20:57       ` Borislav Petkov
  2012-12-19 21:58         ` Yinghai Lu
  0 siblings, 1 reply; 66+ messages in thread
From: Borislav Petkov @ 2012-12-19 20:57 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel, Matt Fleming

On Tue, Dec 18, 2012 at 07:44:55PM -0800, Yinghai Lu wrote:
> On Sat, Dec 15, 2012 at 9:06 AM, Borislav Petkov <bp@alien8.de> wrote:
> > On Thu, Dec 13, 2012 at 02:01:57PM -0800, Yinghai Lu wrote:
> >> We are short of space before 0x200 that is entry for startup_64.
> >
> > And you're moving this down because of the couple of bytes the next
> > patch is adding? If so, then explain that here.
> 
> better?
> 
> ---
> Subject: [PATCH] x86, boot: move verify_cpu.S and no_longmode down
> 
> We need to move some code with a 32-bit section in the following patch:
> 
>    x86, boot: Move lldt/ltr out of 64bit code section
> 
> but that will push startup_64 down from 0x200.
> 
> According to hpa, we cannot move startup_64 to another offset, as it
> has become ABI now.
> 
> We can move the functions verify_cpu and no_longmode down instead,
> because verify_cpu is reached via a function call and no_longmode will
> not return.
> ---

Almost.

So this explains what you're doing but I'd like to know why?

Why do you need to free some more room between startup_32 and
startup_64? Do you need this room in another patch, maybe the next one:

"[PATCH v7 14/27] x86, boot: Move lldt/ltr out of 64bit code section"

Is that so? If yes, please write that in the commit message so that we
know why you're doing that change.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 02/27] x86, mm: make pgd next calculation consistent with pud/pmd
  2012-12-19 20:48       ` Borislav Petkov
@ 2012-12-19 21:55         ` Yinghai Lu
  0 siblings, 0 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-19 21:55 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, linux-kernel

On Wed, Dec 19, 2012 at 12:48 PM, Borislav Petkov <bp@alien8.de> wrote:
>
> "Also, do not boundary-check 'next' but pass 'end' down to
> phys_pud_init() instead because the loop in there terminates at
> PTRS_PER_PUD and thus can handle a possibly bigger 'end' properly."

---
Just like the way we calculate next for pud and pmd, aka
round down and add size.

Also, do not do boundary-checking with 'next'; just pass 'end'
down to phys_pud_init() instead, because the loop in phys_pud_init()
terminates at PTRS_PER_PUD and thus can handle a possibly bigger
'end' properly.
---
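[ Editor's sketch: what "round down and add size" means for the pgd step, as a stand-alone model. The PGDIR_* constants are the standard x86-64 4-level values, but this is an illustration, not the kernel's arch/x86/mm/init_64.c code; unsigned long long is used so the model also builds on 32-bit hosts. ]

```c
#include <assert.h>

/* x86-64 4-level paging: one pgd entry covers 512G. */
#define PGDIR_SHIFT 39
#define PGDIR_SIZE  (1ULL << PGDIR_SHIFT)
#define PGDIR_MASK  (~(PGDIR_SIZE - 1))

/* 'next' computed the same way the pud/pmd loops do it: round the
 * current address down to its entry's base, then add the entry size.
 * No min(next, end) clamp is needed, because phys_pud_init()'s loop
 * stops at PTRS_PER_PUD on its own. */
static unsigned long long pgd_next(unsigned long long addr)
{
	return (addr & PGDIR_MASK) + PGDIR_SIZE;
}
```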

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200
  2012-12-19 20:57       ` Borislav Petkov
@ 2012-12-19 21:58         ` Yinghai Lu
  2012-12-19 22:04           ` Borislav Petkov
  2012-12-22  2:24           ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 66+ messages in thread
From: Yinghai Lu @ 2012-12-19 21:58 UTC (permalink / raw)
  To: Borislav Petkov, Yinghai Lu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Eric W. Biederman, Andrew Morton, linux-kernel,
	Matt Fleming

On Wed, Dec 19, 2012 at 12:57 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, Dec 18, 2012 at 07:44:55PM -0800, Yinghai Lu wrote:
>
> So this explains what you're doing but I'd like to know why?
>
> Why do you need to free some more room between startup_32 and
> startup_64? Do you need this room in another patch, maybe the next one:
>
> "[PATCH v7 14/27] x86, boot: Move lldt/ltr out of 64bit code section"
>
> Is that so? If yes, please write that in the commit message so that we
> know why you're doing that change.

Duplicate the next patch's commit log here? No, that's too long.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200
  2012-12-19 21:58         ` Yinghai Lu
@ 2012-12-19 22:04           ` Borislav Petkov
  2012-12-22  2:24           ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 66+ messages in thread
From: Borislav Petkov @ 2012-12-19 22:04 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel, Matt Fleming

On Wed, Dec 19, 2012 at 01:58:57PM -0800, Yinghai Lu wrote:
> On Wed, Dec 19, 2012 at 12:57 PM, Borislav Petkov <bp@alien8.de> wrote:
> > On Tue, Dec 18, 2012 at 07:44:55PM -0800, Yinghai Lu wrote:
> >
> > So this explains what you're doing but I'd like to know why?
> >
> > Why do you need to free some more room between startup_32 and
> > startup_64? Do you need this room in another patch, maybe the next one:
> >
> > "[PATCH v7 14/27] x86, boot: Move lldt/ltr out of 64bit code section"
> >
> > Is that so? If yes, please write that in the commit message so that we
> > know why you're doing that change.
> 
> Duplicate the next patch's commit log here? No, that's too long.

Sorry, I'm not suggesting to duplicate the patch commit log here -
simply say instead:

"We are short of space before address 0x200 which is the 64-bit entry
point (startup_64). Since we're going to need that space in the next
patch, and, according to hpa, startup_64 has become an ABI and thus
cannot be moved, move function verify_cpu and no_longmode further down."

See, clear and simple.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G
  2012-12-13 23:47 ` [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G H. Peter Anvin
  2012-12-14  0:00   ` Yinghai Lu
@ 2012-12-21 22:38   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-12-21 22:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Andrew Morton, linux-kernel, Stefano Stabellini

On Thu, Dec 13, 2012 at 03:47:36PM -0800, H. Peter Anvin wrote:
> There are obviously good bits to this patchset, but I'm really
> starting to think the "pseudo-linear mode" via a trap handler --
> meaning we can access all of memory without any extra effort --
> makes more sense.  In fact, that way we could just build the full
> page tables without worrying about incremental bootstrap, depending
> on if that is a complexity win or not.
> 
> Either way, this is for native only: the Xen domain builder or other
> similar entry paths should be setting up page tables that cover all
> of memory; I'm hoping Konrad and Stefano can confirm this.

We do set up __ka space for everything we need, to a fault.

Prior to 3.5 we could have blown over into MODULES_SPACE if the guest
was booted with more than 128GB. That is now fixed, so most of the
space _after_ the ramdisk in the __ka space can be blown away
and we use __va. We do this "cleanup" in the xen_pagetable_init stage.


Which works as long as cleanup_highmap only cleans up to
max_pfn_mapped (which in the Xen case covers up to the ramdisk).

The layout of __ka space is as follows:

 kernel
 ramdisk
 the P2M array, for which we use __ka addresses until xen_pagetable_init,
          at which point we blow away any __ka entries for it
 start_info (we swap over to __va before generic code is run)
 the initial page tables (for which the kernel uses __va addresses); we blow
         away the __ka entries when xen_pagetable_init is called

This __ka cleanup business was done for 64-bit only as the 32-bit
code scared me.


Hm, that this function is called xen_pagetable_init seems a bit silly.
> 
> The only reason to go with another approach I can think of is if it
> makes 32/64-bit unification cleaner.
> 
> 	-hpa
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 24/27] x86: Add swiotlb force off support
  2012-12-13 22:02 ` [PATCH v6 24/27] x86: Add swiotlb force off support Yinghai Lu
@ 2012-12-22  2:18   ` Konrad Rzeszutek Wilk
  2012-12-22  5:00     ` Yinghai Lu
  0 siblings, 1 reply; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-12-22  2:18 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel

On Thu, Dec 13, 2012 at 02:02:18PM -0800, Yinghai Lu wrote:
> So the user can disable swiotlb from the command line, even when swiotlb
> support is compiled in.  Just like we have intel_iommu=on and intel_iommu=off.

Does this have any usage besides testing?

And also pls in the future use scripts/get_maintainer.pl so
that you can extract the email of the maintainer (which would be me).

> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  Documentation/kernel-parameters.txt |    7 +++++++
>  arch/x86/kernel/pci-swiotlb.c       |   10 +++++-----
>  drivers/iommu/amd_iommu.c           |    1 +
>  include/linux/swiotlb.h             |    1 +
>  lib/swiotlb.c                       |    5 ++++-
>  5 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 20e248c..08b4c9d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2832,6 +2832,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  
>  	swiotlb=	[IA-64] Number of I/O TLB slabs
>  
> +	swiotlb=[force|off|on] [KNL] disable or enable swiotlb.
> +		force
> +		on
> +			Enable swiotlb.
> +		off
> +			Disable swiotlb.
> +
>  	switches=	[HW,M68k]
>  
>  	sysfs.deprecated=0|1 [KNL]
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> index 6f93eb7..80afd3b 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -58,12 +58,12 @@ static struct dma_map_ops swiotlb_dma_ops = {
>   */
>  int __init pci_swiotlb_detect_override(void)
>  {
> -	int use_swiotlb = swiotlb | swiotlb_force;
> -
>  	if (swiotlb_force)
>  		swiotlb = 1;
> +	else if (swiotlb_force_off)
> +		swiotlb = 0;
>  
> -	return use_swiotlb;
> +	return swiotlb;
>  }
>  IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
>  		  pci_xen_swiotlb_detect,
> @@ -76,9 +76,9 @@ IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
>   */
>  int __init pci_swiotlb_detect_4gb(void)
>  {
> -	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
> +	/* don't initialize swiotlb if iommu=off (no_iommu=1) or force off */
>  #ifdef CONFIG_X86_64
> -	if (!no_iommu && max_pfn > MAX_DMA32_PFN)
> +	if (!no_iommu && !swiotlb_force_off && max_pfn > MAX_DMA32_PFN)
>  		swiotlb = 1;
>  #endif
>  	return swiotlb;
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 55074cb..4f370d3 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -3082,6 +3082,7 @@ int __init amd_iommu_init_dma_ops(void)
>  	unhandled = device_dma_ops_init();
>  	if (unhandled && max_pfn > MAX_DMA32_PFN) {
>  		/* There are unhandled devices - initialize swiotlb for them */
> +		WARN(swiotlb_force_off, "Please remove swiotlb=off\n");
>  		swiotlb = 1;
>  	}
>  
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index f7535d1..dd7cf65 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -8,6 +8,7 @@ struct dma_attrs;
>  struct scatterlist;
>  
>  extern int swiotlb_force;
> +extern int swiotlb_force_off;
>  
>  /*
>   * Maximum allowable number of contiguous slabs to map,
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index 6b99ea7..3f51b2c 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -51,6 +51,7 @@
>  #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
>  
>  int swiotlb_force;
> +int swiotlb_force_off;
>  
>  /*
>   * Used to do a quick range check in swiotlb_tbl_unmap_single and
> @@ -102,8 +103,10 @@ setup_io_tlb_npages(char *str)
>  	}
>  	if (*str == ',')
>  		++str;
> -	if (!strcmp(str, "force"))
> +	if (!strcmp(str, "force") || !strcmp(str, "on"))
>  		swiotlb_force = 1;
> +	if (!strcmp(str, "off"))
> +		swiotlb_force_off = 1;
>  
>  	return 1;
>  }
> -- 
> 1.7.10.4
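[ Editor's sketch: the option handling this patch adds, as a stand-alone model. The names mirror the patch's setup_io_tlb_npages() tail, but the harness itself is hypothetical. ]

```c
#include <assert.h>
#include <string.h>

static int swiotlb_force;
static int swiotlb_force_off;

/* Mirrors the parsing the patch adds: "force" and "on" both enable
 * swiotlb, "off" disables it even when support is compiled in. */
static void parse_swiotlb(const char *str)
{
	if (!strcmp(str, "force") || !strcmp(str, "on"))
		swiotlb_force = 1;
	if (!strcmp(str, "off"))
		swiotlb_force_off = 1;
}
```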
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 23/27] x86: Don't panic if can not alloc buffer for swiotlb
  2012-12-13 22:02 ` [PATCH v6 23/27] x86: Don't panic if can not alloc buffer for swiotlb Yinghai Lu
@ 2012-12-22  2:21   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-12-22  2:21 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel, Yinghai Lu

On Thu, Dec 13, 2012 at 02:02:17PM -0800, Yinghai Lu wrote:
> Normal boot path on a system with iommu support:
> the swiotlb buffer is allocated early at first, and then we try to
> initialize the iommu; if the Intel or AMD iommu sets up properly, the
> swiotlb buffer is freed.
> 
> The early allocation is done with bootmem and can panic when we try to use
> kdump with the buffer above 4G only.
> 
> Replace the panic with a WARN, so the kernel can go on without swiotlb
> and can still set up an iommu later.


What if SWIOTLB is the only option? Meaning there are no other IOMMUs?


> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> ---
>  arch/x86/kernel/pci-swiotlb.c |    5 ++++-
>  include/linux/swiotlb.h       |    2 +-
>  lib/swiotlb.c                 |   17 +++++++++++------
>  3 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> index 6c483ba..6f93eb7 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -91,7 +91,10 @@ IOMMU_INIT(pci_swiotlb_detect_4gb,
>  void __init pci_swiotlb_init(void)
>  {
>  	if (swiotlb) {
> -		swiotlb_init(0);
> +		if (swiotlb_init(0)) {
> +			swiotlb = 0;
> +			return;
> +		}
>  		dma_ops = &swiotlb_dma_ops;
>  	}
>  }
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 8d08b3e..f7535d1 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -22,7 +22,7 @@ extern int swiotlb_force;
>   */
>  #define IO_TLB_SHIFT 11
>  
> -extern void swiotlb_init(int verbose);
> +int swiotlb_init(int verbose);
>  extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
>  extern unsigned long swiotlb_nr_tbl(void);
>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index f114bf6..6b99ea7 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -170,7 +170,7 @@ void __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>   * Statically reserve bounce buffer space and initialize bounce buffer data
>   * structures for the software IO TLB used to implement the DMA API.
>   */
> -static void __init
> +static int __init
>  swiotlb_init_with_default_size(size_t default_size, int verbose)
>  {
>  	unsigned long bytes;
> @@ -185,17 +185,22 @@ swiotlb_init_with_default_size(size_t default_size, int verbose)
>  	/*
>  	 * Get IO TLB memory from the low pages
>  	 */
> -	io_tlb_start = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
> -	if (!io_tlb_start)
> -		panic("Cannot allocate SWIOTLB buffer");
> +	io_tlb_start = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
> +	if (!io_tlb_start) {
> +		WARN(1, "Cannot allocate SWIOTLB buffer");
> +		return -1;
> +	}
>  
>  	swiotlb_init_with_tbl(io_tlb_start, io_tlb_nslabs, verbose);
> +
> +	return 0;
>  }
>  
> -void __init
> +int __init
>  swiotlb_init(int verbose)
>  {
> -	swiotlb_init_with_default_size(64 * (1<<20), verbose);	/* default to 64MB */
> +	/* default to 64MB */
> +	return swiotlb_init_with_default_size(64 * (1<<20), verbose);
>  }
>  
>  /*
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200
  2012-12-19 21:58         ` Yinghai Lu
  2012-12-19 22:04           ` Borislav Petkov
@ 2012-12-22  2:24           ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 66+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-12-22  2:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Eric W. Biederman, Andrew Morton, linux-kernel, Matt Fleming

On Wed, Dec 19, 2012 at 01:58:57PM -0800, Yinghai Lu wrote:
> On Wed, Dec 19, 2012 at 12:57 PM, Borislav Petkov <bp@alien8.de> wrote:
> > On Tue, Dec 18, 2012 at 07:44:55PM -0800, Yinghai Lu wrote:
> >
> > So this explains what you're doing but I'd like to know why?
> >
> > Why do you need to free some more room between startup_32 and
> > startup_64? Do you need this room in another patch, maybe the next one:
> >
> > "[PATCH v7 14/27] x86, boot: Move lldt/ltr out of 64bit code section"
> >
> > Is that so? If yes, please write that in the commit message so that we
> > know why you're doing that change.
> 
> Duplicate the next patch's commit log here? No, that's too long.

Why is that a problem? Long patch commit logs are OK.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 24/27] x86: Add swiotlb force off support
  2012-12-22  2:18   ` Konrad Rzeszutek Wilk
@ 2012-12-22  5:00     ` Yinghai Lu
  2012-12-23  5:00       ` H. Peter Anvin
  0 siblings, 1 reply; 66+ messages in thread
From: Yinghai Lu @ 2012-12-22  5:00 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, linux-kernel

On Fri, Dec 21, 2012 at 6:18 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Thu, Dec 13, 2012 at 02:02:18PM -0800, Yinghai Lu wrote:
>> So the user can disable swiotlb from the command line, even when swiotlb
>> support is compiled in.  Just like we have intel_iommu=on and intel_iommu=off.
>
> Does this have any usage besides testing?

for kdump.

>
> And also pls in the future use scripts/get_maintainer.pl so
> that you can extract the email of the maintainer (which would be me).
ok.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v6 24/27] x86: Add swiotlb force off support
  2012-12-22  5:00     ` Yinghai Lu
@ 2012-12-23  5:00       ` H. Peter Anvin
  0 siblings, 0 replies; 66+ messages in thread
From: H. Peter Anvin @ 2012-12-23  5:00 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Konrad Rzeszutek Wilk, Thomas Gleixner, Ingo Molnar,
	Eric W. Biederman, Andrew Morton, linux-kernel

On 12/21/2012 09:00 PM, Yinghai Lu wrote:
> On Fri, Dec 21, 2012 at 6:18 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> On Thu, Dec 13, 2012 at 02:02:18PM -0800, Yinghai Lu wrote:
>>> So the user can disable swiotlb from the command line, even when swiotlb
>>> support is compiled in.  Just like we have intel_iommu=on and intel_iommu=off.
>>
>> Does this have any usage besides testing?
>
> for kdump.
>

"For kdump" isn't an answer.  There is a reason why kdump needs this.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2012-12-23  5:00 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-13 22:01 [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G Yinghai Lu
2012-12-13 22:01 ` [PATCH v6 01/27] x86, mm: Fix page table early allocation offset checking Yinghai Lu
2012-12-14 10:53   ` Borislav Petkov
2012-12-19  3:30     ` Yinghai Lu
2012-12-19 17:16       ` Borislav Petkov
2012-12-13 22:01 ` [PATCH v6 02/27] x86, mm: make pgd next calculation consistent with pud/pmd Yinghai Lu
2012-12-14 14:34   ` Borislav Petkov
2012-12-19  3:37     ` Yinghai Lu
2012-12-19 20:48       ` Borislav Petkov
2012-12-19 21:55         ` Yinghai Lu
2012-12-13 22:01 ` [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200 Yinghai Lu
2012-12-15 17:06   ` Borislav Petkov
2012-12-19  3:44     ` Yinghai Lu
2012-12-19 20:57       ` Borislav Petkov
2012-12-19 21:58         ` Yinghai Lu
2012-12-19 22:04           ` Borislav Petkov
2012-12-22  2:24           ` Konrad Rzeszutek Wilk
2012-12-13 22:01 ` [PATCH v6 04/27] x86, boot: Move lldt/ltr out of 64bit code section Yinghai Lu
2012-12-15 17:28   ` Borislav Petkov
2012-12-19  3:53     ` Yinghai Lu
2012-12-13 22:01 ` [PATCH v6 05/27] x86, 64bit: clear ident mapping when kernel is above 512G Yinghai Lu
2012-12-16 17:49   ` Borislav Petkov
2012-12-16 18:04     ` Yinghai Lu
2012-12-19  3:57     ` Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 06/27] x86, 64bit: Set extra ident mapping for whole kernel range Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 07/27] x86: Merge early_reserve_initrd for 32bit and 64bit Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 08/27] x86: add get_ramdisk_image/size() Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 09/27] x86, boot: add get_cmd_line_ptr() Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 10/27] x86, boot: move checking of cmd_line_ptr out of common path Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 11/27] x86, boot: update cmd_line_ptr to unsigned long Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 12/27] x86: use io_remap to access real_mode_data Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 13/27] x86: use rsi/rdi to pass realmode_data pointer Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 14/27] x86, kexec: remove 1024G limitation for kexec buffer on 64bit Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 15/27] x86, kexec: set ident mapping for kernel that is above max_pfn Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 16/27] x86, kexec: Merge ident_mapping_init and init_level4_page Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 17/27] x86, kexec: only set ident mapping for ram Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 18/27] x86, boot: add fields to support load bzImage and ramdisk above 4G Yinghai Lu
2012-12-13 22:54   ` H. Peter Anvin
2012-12-13 23:28     ` Yinghai Lu
2012-12-13 23:38       ` H. Peter Anvin
2012-12-13 22:02 ` [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image Yinghai Lu
2012-12-13 23:27   ` H. Peter Anvin
2012-12-14  0:13     ` Yinghai Lu
2012-12-14  0:38       ` H. Peter Anvin
2012-12-14  0:44         ` Yinghai Lu
2012-12-14  0:51           ` H. Peter Anvin
2012-12-14  0:51           ` Yinghai Lu
2012-12-14  0:54             ` H. Peter Anvin
2012-12-14  1:00               ` Yinghai Lu
2012-12-14  1:04                 ` H. Peter Anvin
2012-12-14  2:15     ` Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 20/27] x86, 64bit: Print init kernel lowmap correctly Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 21/27] x86, boot: Not need to check setup_header version Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 22/27] mm: Add alloc_bootmem_low_pages_nopanic() Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 23/27] x86: Don't panic if can not alloc buffer for swiotlb Yinghai Lu
2012-12-22  2:21   ` Konrad Rzeszutek Wilk
2012-12-13 22:02 ` [PATCH v6 24/27] x86: Add swiotlb force off support Yinghai Lu
2012-12-22  2:18   ` Konrad Rzeszutek Wilk
2012-12-22  5:00     ` Yinghai Lu
2012-12-23  5:00       ` H. Peter Anvin
2012-12-13 22:02 ` [PATCH v6 25/27] x86, kdump: remove crashkernel range find limit for 64bit Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 26/27] x86: add Crash kernel low reservation Yinghai Lu
2012-12-13 22:02 ` [PATCH v6 27/27] x86: Merge early kernel reserve for 32bit and 64bit Yinghai Lu
2012-12-13 23:47 ` [PATCH v6 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G H. Peter Anvin
2012-12-14  0:00   ` Yinghai Lu
2012-12-21 22:38   ` Konrad Rzeszutek Wilk
